Scraping BiliBili Tarot data with Selenium, bs4 parsing, and MySQL
2022-07-07 09:44:00 【-Coffee-】
Source code on GitHub: https://github.com/xuyanhuiwelcome/python-spider
The repository collects data-crawling spiders and will be updated continuously.
The scraped Tarot data can then be used for analysis. Without further ado, here is the code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup
import lxml  # parser backend used by BeautifulSoup(html, 'lxml')
import pymysql

browser = webdriver.Chrome()


def run(page):
    # keyword=%E5%A1%94%E7%BD%97 is the URL-encoded search term 塔罗 (Tarot)
    url = 'https://search.XXX.com/upuser?keyword=%E5%A1%94%E7%BD%97&page={page}'.format(
        page=page)
    browser.get(url)
    # Wait up to 30 seconds for the result list to be rendered
    WebDriverWait(browser, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'user-item')))
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')
    save_data(soup)


def save_data(soup):
    lsts = soup.find(class_="body-contain").find_all(class_="user-item")
    for item in lsts:
        username = item.find(class_="title").get('title')
        # The level icon is collected here but is not stored in the table
        level = item.find(class_="title").find("i")
        desc = item.find(class_="desc").text
        spans = item.find(class_="up-info clearfix").find_all("span")
        # spans[0] is the submission count (稿件), spans[1] the follower count (粉丝)
        work = spans[0].text.strip().strip("稿件::").strip()
        funs = spans[1].text.strip().strip("粉丝::").strip()
        if '万' in funs:
            # 万 means ten thousand, so "1.2万" becomes 12000
            funs = int(round(float(funs.strip('万')) * 10000))
        centerUrl = item.find("a").get('href').strip("/")
        save_mysql(username, level, desc, funs, work, centerUrl)


def save_mysql(username, level, desc, funs, work, centerUrl):
    # Open the database connection
    db = pymysql.connect(host='localhost',
                         user='root',
                         password='root',
                         database='tesst')
    # Use cursor() to get an operation cursor
    cursor = db.cursor()
    # Parameterized INSERT statement; the driver escapes quotes in the values
    sql = ("INSERT INTO `tarotuser`(`userName`,`funs`,`work`,`desc`,`centerUrl`) "
           "VALUES(%s,%s,%s,%s,%s)")
    try:
        # Execute the SQL statement
        cursor.execute(sql, (username, str(funs), str(work), desc, centerUrl))
        # Commit the transaction
        db.commit()
    except Exception:
        # Roll back if an error occurs
        db.rollback()
    finally:
        # Close the database connection
        db.close()


def main():
    try:
        # Crawl result pages 1 through 49
        for page in range(1, 50):
            run(page)
    finally:
        browser.quit()


if __name__ == '__main__':
    main()
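The count conversion in save_data is the most fragile part of the script, because the page mixes plain numbers with values abbreviated by 万 (ten thousand). Below is a minimal sketch of a standalone helper that handles both cases. The name parse_count and the sample label strings are assumptions made for illustration; they are not part of the original repository, and the real page text may differ.

```python
import re

# Hypothetical helper, not part of the original script.
# Matches the first number in the text, with an optional 万 suffix.
_COUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(万)?")


def parse_count(text):
    """Convert text such as '粉丝:1.2万' or '稿件:35' to an integer.

    Returns 0 when the text contains no number.
    """
    match = _COUNT_RE.search(text)
    if match is None:
        return 0
    value = float(match.group(1))
    if match.group(2):
        # 万 means ten thousand
        value *= 10000
    # round() avoids off-by-one results from floating-point error
    return int(round(value))


if __name__ == "__main__":
    # Sample strings are assumed, not taken from the live page
    for sample in ("粉丝:1.2万", "稿件:35", "3万"):
        print(sample, parse_count(sample))
```

Because the helper searches for the number instead of stripping a fixed label, it should keep working if the label text or the colon style changes, and it returns an integer that fits a numeric column.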