Scraping Bilibili Tarot data with Selenium, bs4 parsing, and MySQL
2022-07-07 09:44:00 【-Coffee-】
Source code on GitHub: https://github.com/xuyanhuiwelcome/python-spider
The repository collects data-scraping spiders and will be updated continuously.
The scraped Tarot data can then be analyzed. Without further ado, here is the code:
from bs4 import BeautifulSoup
import pymysql
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

browser = webdriver.Chrome()


def run(page):
    """Load one page of search results and store every user on it."""
    url = 'https://search.XXX.com/upuser?keyword=%E5%A1%94%E7%BD%97&page={page}'.format(
        page=page)
    browser.get(url)
    # Block (up to 30 s) until at least one result card has been rendered;
    # constructing WebDriverWait without calling until() waits for nothing
    WebDriverWait(browser, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'user-item')))
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser needs the lxml package installed
    save_data(soup)


def save_data(soup):
    """Extract the fields of each user card and write them to MySQL."""
    lsts = soup.find(class_="body-contain").find_all(class_="user-item")
    for item in lsts:
        username = item.find(class_="title").get('title')
        level = item.find(class_="title").find("i")  # level icon tag; not stored
        desc = item.find(class_="desc").text
        spans = item.find(class_="up-info clearfix").find_all("span")
        # spans[0] is the upload ("manuscript") count, spans[1] the follower count.
        # Assumption: the page shows the Chinese labels 稿件 (manuscripts) and
        # 粉丝 (fans), and 万 means "ten thousand"
        work = spans[0].text.strip().strip("稿件:： ")
        funs = spans[1].text.strip().strip("粉丝:： ")
        if '万' in funs:
            funs = float(funs.strip('万')) * 10000
        centerUrl = item.find("a").get('href').lstrip("/")
        save_mysql(username, level, desc, funs, work, centerUrl)


def save_mysql(username, level, desc, funs, work, contenturl):
    # Open the database connection
    db = pymysql.connect(host='localhost',
                         user='root',
                         password='root',
                         database='tesst')
    # Parameterized INSERT: the driver escapes the values, so a quote in a
    # user name or description cannot break (or inject into) the statement
    sql = ("INSERT INTO `tarotuser`(`userName`,`funs`,`work`,`desc`,`centerUrl`) "
           "VALUES (%s, %s, %s, %s, %s)")
    try:
        with db.cursor() as cursor:
            # Execute the SQL statement
            cursor.execute(sql, (username, funs, work, desc, contenturl))
        # Commit the transaction
        db.commit()
    except pymysql.MySQLError:
        # Roll back if a database error occurs
        db.rollback()
    finally:
        # Close the database connection
        db.close()


def main():
    try:
        for page in range(1, 50):  # pages 1 through 49
            run(page)
    finally:
        browser.quit()  # always shut the browser down


if __name__ == '__main__':
    main()
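The count fields are the most fragile part of save_data: str.strip removes a set of characters from both ends rather than a fixed prefix, and only the "ten thousand" unit is handled. A minimal sketch of a more tolerant parser follows. The helper name parse_count, the 亿 (hundred million) unit, and the sample label strings are assumptions for illustration and do not come from the original spider.

```python
import re

# Assumed unit suffixes as they appear on Chinese-language pages:
# 万 = ten thousand, 亿 = one hundred million.
_UNITS = {"万": 10_000, "亿": 100_000_000}


def parse_count(text):
    """Return the first number in `text` as an int, applying a unit suffix.

    Returns None when the text contains no number at all.
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*([万亿]?)", text)
    if match is None:
        return None
    number, unit = match.groups()
    # round() absorbs floating-point noise from products such as 1.2 * 10000
    return round(float(number) * _UNITS.get(unit, 1))
```

Because the regular expression searches for the number anywhere in the string, the label in front of it does not need to be stripped first, so parse_count(spans[1].text) could replace the strip-and-multiply logic.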