Scraping BiliBili Tarot data with Selenium, bs4 parsing, and MySQL
2022-07-07 09:44:00 【-Coffee-】
Source code (git): https://github.com/xuyanhuiwelcome/python-spider
The repository will keep being updated with more data crawlers.
The Tarot search results collected here can be used for data analysis. Without further ado, here is the code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
from selenium.webdriver.support.wait import WebDriverWait
import lxml
import pymysql
browser = webdriver.Chrome()
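def run(page):
    # keyword=%E5%A1%94%E7%BD%97 is the URL-encoded search term "塔罗" (Tarot)
    url = 'https://search.XXX.com/upuser?keyword=%E5%A1%94%E7%BD%97&page={page}'.format(
        page=page)
    browser.get(url)
    # Wait up to 30 s for the user list to render; constructing WebDriverWait
    # without calling until() does not wait at all
    WebDriverWait(browser, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, "user-item")))
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')
    save_data(soup)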
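def save_data(soup):
    lsts = soup.find(class_="body-contain").find_all(class_="user-item")
    for item in lsts:
        username = item.find(class_="title").get('title')
        level = item.find(class_="title").find("i")  # level icon tag; not stored in the table
        desc = item.find(class_="desc").text
        spans = item.find(class_="up-info clearfix").find_all("span")
        # The labels on the live page are Chinese, so the literals "稿件" (uploads),
        # "粉丝" (fans) and "万" (ten thousand) are restored here on that assumption.
        # The first span holds the upload count and the second the fan count.
        work = spans[0].text.strip().strip("稿件::")
        funs = spans[1].text.strip().strip("粉丝::")
        if '万' in funs:
            # e.g. "1.2万" becomes "12000"; kept as a string for the INSERT
            funs = str(round(float(funs.strip('万')) * 10000))
        # href is protocol-relative ("//..."), so drop the leading slashes
        centerUrl = item.find("a").get('href').lstrip("/")
        save_mysql(username, level, desc, funs, work, centerUrl)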
def save_mysql(username, level, desc, funs, work, contenturl):
    # Open database connection
    db = pymysql.connect(host='localhost',
                         user='root',
                         password='root',
                         database='tesst')
    # Use cursor() to get an operation cursor
    cursor = db.cursor()
    # Parameterized INSERT: the driver escapes the values, so a quote in a
    # user name or description cannot break the statement
    sql = ("INSERT INTO `tarotuser`(`userName`,`funs`,`work`,`desc`,`centerUrl`) "
           "VALUES(%s,%s,%s,%s,%s)")
    try:
        # Execute the SQL statement
        cursor.execute(sql, (username, str(funs), str(work), desc, contenturl))
        # Commit the transaction
        db.commit()
    except pymysql.MySQLError:
        # Roll back if an error occurs
        db.rollback()
    finally:
        # Close database connection
        db.close()
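def main():
    try:
        # range(1, 50) requests pages 1 through 49
        for page in range(1, 50):
            run(page)
    finally:
        # Close the Chrome window even if a page fails
        browser.quit()
if __name__ == '__main__':
    main()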
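The count handling in save_data relies on str.strip(), which removes a set of characters rather than a fixed prefix, so it is sensitive to the exact label text. As a rough sketch of a more tolerant alternative, the helper below pulls the number out with a regular expression. The name parse_count is hypothetical and not part of the original repository, and the label formats ("稿件:35", "粉丝:1.2万") are an assumption about what the search page shows.

```python
import re

# Hypothetical helper, not part of the original script.
# Matches an integer or decimal number, optionally followed by "万" (ten thousand).
_COUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(万)?")


def parse_count(text):
    """Convert a label such as '粉丝:1.2万' or '稿件:35' into an integer.

    Returns None when the text contains no number.
    """
    match = _COUNT_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2):
        # The "万" suffix multiplies the number by ten thousand
        value *= 10000
    # round() avoids off-by-one results caused by float multiplication
    return int(round(value))
```

In save_data this could replace both strip() calls, for example funs = parse_count(spans[1].text), with the caller deciding how to store a None result.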