Selenium + bs4 parsing + MySQL: scraping Bilibili Tarot data
2022-07-07 09:44:00 【-Coffee-】
Git source code: https://github.com/xuyanhuiwelcome/python-spider
The repository will be updated continuously with more data-crawling spiders.
The scraped Tarot data can then be analyzed. Without further ado, here is the code:
```python
import re

import pymysql
from bs4 import BeautifulSoup  # the 'lxml' parser below needs `pip install lxml`
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Chrome()


def run(page):
    # keyword=%E5%A1%94%E7%BD%97 is the URL-encoded search term "塔罗" (Tarot)
    url = 'https://search.XXX.com/upuser?keyword=%E5%A1%94%E7%BD%97&page={page}'.format(
        page=page)
    browser.get(url)
    # Wait up to 30 s for the result container to render;
    # creating WebDriverWait without calling .until() does not wait at all
    WebDriverWait(browser, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, "body-contain")))
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')
    save_data(soup)


def parse_count(text):
    # Extract the number from labels such as "稿件：123" or "粉丝：1.2万";
    # "万" means ten thousand, so "1.2万" becomes 12000
    match = re.search(r'(\d+(?:\.\d+)?)\s*(万?)', text)
    if match is None:
        return 0
    value = float(match.group(1))
    return round(value * 10000) if match.group(2) else round(value)


def save_data(soup):
    lsts = soup.find(class_="body-contain").find_all(class_="user-item")
    for item in lsts:
        username = item.find(class_="title").get('title')
        level = item.find(class_="title").find("i")  # level badge, not stored yet
        desc = item.find(class_="desc").text
        spans = item.find(class_="up-info clearfix").find_all("span")
        work = parse_count(spans[0].text)  # first span: number of uploads (稿件)
        funs = parse_count(spans[1].text)  # second span: number of fans (粉丝)
        centerUrl = item.find("a").get('href').strip("//")
        save_mysql(username, level, desc, funs, work, centerUrl)


def save_mysql(username, level, desc, funs, work, contenturl):
    # Open database connection
    db = pymysql.connect(host='localhost',
                         user='root',
                         password='root',
                         database='tesst')
    # %s placeholders let pymysql escape the values, so a quote in a
    # description can no longer break the statement (or inject SQL)
    sql = ("INSERT INTO `tarotuser`(`userName`,`funs`,`work`,`desc`,`centerUrl`) "
           "VALUES (%s, %s, %s, %s, %s)")
    try:
        with db.cursor() as cursor:
            cursor.execute(sql, (username, funs, work, desc, contenturl))
        # Commit to database execution
        db.commit()
    except pymysql.MySQLError as exc:
        # Roll back if an error occurs
        db.rollback()
        print('insert failed for', username, exc)
    finally:
        # Close database connection
        db.close()


def main():
    try:
        for page in range(1, 50):  # result pages 1 to 49
            run(page)
    finally:
        browser.quit()


if __name__ == '__main__':
    main()
```
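To try the storage step without a MySQL server, here is a minimal sketch that swaps in Python's built-in `sqlite3` with an in-memory database. The column names follow the `tarotuser` INSERT above; the column types, the `insert_user` helper, and the sample values are hypothetical. It illustrates why placeholders matter: a description containing a single quote, which would break a string-concatenated INSERT, is stored unchanged.

```python
import sqlite3

# Hypothetical offline stand-in for the MySQL `tarotuser` table.
# Column names match the INSERT statement; the types are assumptions.
SCHEMA = """
CREATE TABLE tarotuser (
    userName  TEXT,
    funs      INTEGER,
    work      INTEGER,
    "desc"    TEXT,
    centerUrl TEXT
)
"""


def insert_user(conn, username, funs, work, desc, center_url):
    # "?" placeholders let sqlite3 bind the values itself,
    # so quotes inside desc cannot terminate the SQL string early
    conn.execute(
        'INSERT INTO tarotuser (userName, funs, work, "desc", centerUrl) '
        "VALUES (?, ?, ?, ?, ?)",
        (username, funs, work, desc, center_url),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    insert_user(conn, "demo_user", 12000, 35, "It's a demo", "space.example.com/1")
    print(conn.execute('SELECT userName, "desc" FROM tarotuser').fetchall())
```

SQLite uses `?` placeholders where pymysql uses `%s`, and `desc` is double-quoted because it is an SQL keyword, just as the MySQL statement backtick-quotes it.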