Scraping Bilibili Tarot data with Selenium, bs4 parsing, and MySQL
2022-07-07 09:44:00 【-Coffee-】
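Git source code: https://github.com/xuyanhuiwelcome/python-spider
The repository will be updated continuously with more crawlers.
The script searches Bilibili UP users for the keyword 塔罗 (Tarot), parses each result page with BeautifulSoup, and inserts every user into a MySQL table, so the collected data can then be analyzed. Without further ado, here is the code: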
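import pymysql
from bs4 import BeautifulSoup  # the 'lxml' parser used below needs the lxml package installed
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Chrome()


def run(page):
    # Search UP users for the keyword 塔罗 ("Tarot", URL-encoded in the query string)
    url = 'https://search.XXX.com/upuser?keyword=%E5%A1%94%E7%BD%97&page={page}'.format(
        page=page)
    browser.get(url)
    # A bare WebDriverWait(...) does nothing; .until() actually blocks (up to 30 s)
    # until the result container has been rendered
    WebDriverWait(browser, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, "body-contain")))
    soup = BeautifulSoup(browser.page_source, 'lxml')
    save_data(soup)


def save_data(soup):
    items = soup.find(class_="body-contain").find_all(class_="user-item")
    for item in items:
        username = item.find(class_="title").get('title')
        desc_tag = item.find(class_="desc")
        desc = desc_tag.text.strip() if desc_tag else ''
        spans = item.find(class_="up-info").find_all("span")
        # spans[0] is the upload (稿件) count and spans[1] the follower (粉丝) count;
        # the original code assigned them to swapped variable names.
        # str.strip() removes a set of characters, not a prefix, so split on the colon
        works = spans[0].text.replace('：', ':').split(':')[-1].strip()
        fans = spans[1].text.replace('：', ':').split(':')[-1].strip()
        if fans.endswith('万'):  # 万 = ten thousand, e.g. "1.2万" -> 12000
            fans = str(round(float(fans[:-1]) * 10000))
        # href is protocol-relative ("//..."), so drop the leading slashes
        center_url = item.find("a").get('href').lstrip("/")
        save_mysql(username, desc, fans, works, center_url)


def save_mysql(username, desc, fans, works, center_url):
    # Open a database connection (one per row, as in the original)
    db = pymysql.connect(host='localhost',
                         user='root',
                         password='root',
                         database='tesst')
    # Parameterized query: pymysql escapes the values, so a quote in a
    # username or description can no longer break (or inject into) the SQL
    sql = ("INSERT INTO `tarotuser`(`userName`,`funs`,`work`,`desc`,`centerUrl`) "
           "VALUES (%s, %s, %s, %s, %s)")
    try:
        with db.cursor() as cursor:
            cursor.execute(sql, (username, fans, works, desc, center_url))
        # Commit to the database
        db.commit()
    except pymysql.MySQLError as e:
        # Roll back if an error occurs, and report it instead of hiding it
        db.rollback()
        print('Insert failed for {}: {}'.format(username, e))
    finally:
        # Close the database connection
        db.close()


def main():
    try:
        for page in range(1, 50):  # result pages 1-49
            run(page)
    finally:
        browser.quit()  # always close Chrome, even if a page fails


if __name__ == '__main__':
    main()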
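The search URL hard-codes the keyword in percent-encoded form. A minimal sketch of building the same URL from readable parameters with the standard library follows. `build_search_url` and `SEARCH_URL` are hypothetical names, and the host stays masked as `search.XXX.com`, exactly as in the post.

```python
from urllib.parse import urlencode

# Host is masked in the original post and kept as-is here
SEARCH_URL = 'https://search.XXX.com/upuser'


def build_search_url(keyword: str, page: int) -> str:
    """Build the UP-user search URL for one result page (hypothetical helper)."""
    # urlencode percent-encodes the keyword as UTF-8, e.g. 塔罗 -> %E5%A1%94%E7%BD%97
    return SEARCH_URL + '?' + urlencode({'keyword': keyword, 'page': page})


if __name__ == '__main__':
    print(build_search_url('塔罗', 1))
    # prints https://search.XXX.com/upuser?keyword=%E5%A1%94%E7%BD%97&page=1
```

This makes it easy to switch to a different search keyword without re-encoding it by hand.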
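The count handling in `save_data` only converts the 万 suffix, and only on the follower count. A slightly more general converter is sketched below, under the assumption that each span's text ends with the number, optionally followed by 万 (ten thousand) or 亿 (hundred million). `parse_count` is a hypothetical helper, not part of the original code.

```python
import re

# Chinese magnitude suffixes that may follow a displayed count
UNITS = {'万': 10_000, '亿': 100_000_000}


def parse_count(text: str) -> int:
    """Turn span text such as "稿件：123" or "粉丝：1.2万" into an int (sketch)."""
    # Take the trailing number plus an optional magnitude suffix
    match = re.search(r'(\d+(?:\.\d+)?)\s*([万亿]?)\s*$', text.strip())
    if match is None:
        raise ValueError(f'no count found in {text!r}')
    number = float(match.group(1))
    # An empty suffix group means a plain number (multiplier 1);
    # round() avoids float artifacts such as 11999.999...
    return round(number * UNITS.get(match.group(2), 1))
```

For example, `parse_count("粉丝：1.2万")` gives 12000. Both `works` and `fans` in `save_data` could then be stored as integers instead of strings.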
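`save_mysql` opens a new connection and commits once per user. The following is a sketch of writing a whole result page in one transaction with `executemany`. So that it runs without a MySQL server, it uses the standard-library `sqlite3` module as a stand-in. The table schema in `create_table` is hypothetical, because the post does not show the real one. With pymysql the placeholders would be `%s` instead of `?`, and pymysql cursors also provide `executemany()`.

```python
import sqlite3

# Backtick quoting is accepted by MySQL and, for compatibility, by SQLite;
# `desc` must be quoted because it is a reserved word
INSERT_SQL = ("INSERT INTO `tarotuser`(`userName`,`funs`,`work`,`desc`,`centerUrl`) "
              "VALUES (?, ?, ?, ?, ?)")


def create_table(conn):
    """Hypothetical schema used only for this demo."""
    conn.execute("CREATE TABLE IF NOT EXISTS `tarotuser` ("
                 "`userName` TEXT, `funs` TEXT, `work` TEXT, "
                 "`desc` TEXT, `centerUrl` TEXT)")


def save_rows(conn, rows):
    """Insert all rows of one page in a single transaction."""
    try:
        conn.executemany(INSERT_SQL, rows)
        conn.commit()
    except sqlite3.Error:
        # Undo the partial page, then let the caller see the error
        conn.rollback()
        raise


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    create_table(conn)
    save_rows(conn, [("O'Brien", '12000', '35', 'a "quoted" desc', 'space.example/1')])
    print(conn.execute('SELECT COUNT(*) FROM `tarotuser`').fetchone()[0])  # prints 1
```

With this layout, `save_data` would collect one tuple per user into a list and call the batch writer once per page, which also keeps a single connection open instead of reconnecting for every row.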