Selenium + bs4 parsing + MySQL: capturing BiliBili Tarot data
2022-07-07 09:44:00 【-Coffee-】
Git source code: https://github.com/xuyanhuiwelcome/python-spider
The repository is updated continuously with data-crawling spiders.
The data can be used for Tarot-related analysis. Without further ado, here is the code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup
import pymysql

browser = webdriver.Chrome()

def run(page):
    # Build the UP-user search URL for the keyword "塔罗" (Tarot) and the given page
    url = 'https://search.XXX.com/upuser?keyword=%E5%A1%94%E7%BD%97&page={page}'.format(
        page=page)
    browser.get(url)
    # Wait up to 30 seconds for the result list to appear before parsing
    WebDriverWait(browser, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'user-item')))
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')
    save_data(soup)

def save_data(soup):
    # Each search result is a .user-item block inside .body-contain
    lsts = soup.find(class_="body-contain").find_all(class_="user-item")
    for item in lsts:
        username = item.find(class_="title").get('title')
        level = item.find(class_="title").find("i")
        desc = item.find(class_="desc").text
        spans = item.find(class_="up-info clearfix").find_all("span")
        # Strip the Chinese labels ("稿件:" = submissions, "粉丝:" = fans) from the span text
        funs = spans[0].text.strip("稿件:")
        work = spans[1].text.strip("粉丝:")
        # Convert values such as "1.2万" (万 = 10,000) to a plain number
        if work.count('万') > 0:
            work = float(work.strip('万')) * 10000
        centerUrl = item.find("a").get('href').strip("//")
        save_mysql(username, level, desc, funs, work, centerUrl)

def save_mysql(username, level, desc, funs, work, contenturl):
    # Open the database connection (utf8mb4 so Chinese text is stored correctly)
    db = pymysql.connect(host='localhost',
                         user='root',
                         password='root',
                         database='tesst',
                         charset='utf8mb4')
    # Use cursor() to get an operation cursor
    cursor = db.cursor()
    # Parameterised INSERT statement (avoids quoting problems and SQL injection)
    sql = ("INSERT INTO `tarotuser`(`userName`,`funs`,`work`,`desc`,`centerUrl`) "
           "VALUES (%s, %s, %s, %s, %s)")
    try:
        # Execute the SQL statement
        cursor.execute(sql, (username, funs, str(work), desc, contenturl))
        # Commit the transaction
        db.commit()
    except Exception:
        # Roll back if an error occurs
        db.rollback()
    finally:
        # Close the database connection
        db.close()

def main():
    for page in range(1, 50):
        run(page)

if __name__ == '__main__':
    main()
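The script assumes that the tesst database already contains a tarotuser table, but the schema is not shown in the article. Below is a minimal sketch of one possible layout, created through pymysql: the column names come from the INSERT statement above, while the column types and the id primary key are assumptions.

import pymysql

# Hypothetical schema for the `tarotuser` table used by save_mysql();
# only the column names are taken from the INSERT statement, the types are guesses.
DDL = """
CREATE TABLE IF NOT EXISTS `tarotuser` (
    `id`        INT AUTO_INCREMENT PRIMARY KEY,
    `userName`  VARCHAR(100),
    `funs`      VARCHAR(50),
    `work`      VARCHAR(50),
    `desc`      VARCHAR(500),
    `centerUrl` VARCHAR(255)
) DEFAULT CHARSET=utf8mb4
"""

db = pymysql.connect(host='localhost', user='root',
                     password='root', database='tesst', charset='utf8mb4')
try:
    with db.cursor() as cursor:
        cursor.execute(DDL)
    db.commit()
finally:
    db.close()

utf8mb4 is chosen so that Chinese usernames and descriptions are stored without encoding errors; adjust the column lengths to whatever your data actually needs.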