Capturing BiliBili Tarot data with Selenium, bs4 parsing, and MySQL
2022-07-07 09:44:00 【-Coffee-】
Git source code: https://github.com/xuyanhuiwelcome/python-spider
The repository of data-crawling spiders will be updated continuously.
The collected Tarot data can then be analyzed. Without further ado, here is the code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup
import lxml  # not used directly; must be installed for BeautifulSoup's 'lxml' parser
import pymysql

# One shared Chrome session, reused for every results page
browser = webdriver.Chrome()


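def run(page):
    # keyword is the URL-encoded search term 塔罗 (Tarot)
    url = 'https://search.XXX.com/upuser?keyword=%E5%A1%94%E7%BD%97&page={page}'.format(
        page=page)
    browser.get(url)
    # Block for up to 30 seconds until at least one user card is in the DOM
    WebDriverWait(browser, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'user-item')))
    html = browser.page_source
    soup = BeautifulSoup(html, 'lxml')
    save_data(soup)

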
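def save_data(soup):
    lsts = soup.find(class_="body-contain").find_all(class_="user-item")
    for item in lsts:
        username = item.find(class_="title").get('title')
        level = item.find(class_="title").find("i")  # level icon tag; not stored
        desc = item.find(class_="desc").text
        spans = item.find(class_="up-info clearfix").find_all("span")
        # Assumption: the page labels are 稿件 (uploads) and 粉丝 (fans), which
        # machine translation had turned into " Manuscript :" and " fans :"
        work = spans[0].text.strip().strip("稿件::")  # upload count
        funs = spans[1].text.strip().strip("粉丝::")  # fan count
        # Counts such as 1.2万 are in units of ten thousand (万)
        if '万' in funs:
            funs = str(round(float(funs.strip('万')) * 10000))
        # href is protocol-relative (//...); drop the surrounding slashes
        centerUrl = item.find("a").get('href').strip("/")
        save_mysql(username, level, desc, funs, work, centerUrl)

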
def save_mysql(username, level, desc, funs, work, contenturl):
    # level is accepted but not written to the table
    # Open the database connection
    db = pymysql.connect(host='localhost',
                         user='root',
                         password='root',
                         database='tesst')
    # Get a cursor for executing statements
    cursor = db.cursor()
    # Parameterized INSERT: the driver escapes the values, so quotes in a
    # user name or description cannot break or inject into the statement
    sql = ("INSERT INTO `tarotuser`(`userName`,`funs`,`work`,`desc`,`centerUrl`) "
           "VALUES (%s, %s, %s, %s, %s)")
    try:
        # Execute the SQL statement and commit it
        cursor.execute(sql, (username, funs, work, desc, contenturl))
        db.commit()
    except pymysql.MySQLError:
        # Roll back if an error occurs
        db.rollback()
    finally:
        # Close the database connection
        db.close()


def main():
    # Crawl result pages 1 through 49, then close the browser
    try:
        for page in range(1, 50):
            run(page)
    finally:
        browser.quit()


if __name__ == '__main__':
    main()
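
The scraper cleans the upload and fan counts with str.strip, which removes any of the listed characters from both ends of the string rather than a fixed prefix, so it depends on the exact label text. Below is a minimal alternative sketch that pulls the number out with a regular expression instead. It assumes the page shows labels such as 稿件:35 and 粉丝:1.2万 (万 meaning ten thousand); the helper name parse_count is hypothetical and is not part of the original repository.

```python
import re
from decimal import Decimal

# A number with an optional decimal part, optionally followed by 万 (ten thousand)
_COUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(万)?")


def parse_count(text):
    """Return the integer count found in a label such as '粉丝:1.2万', or None."""
    match = _COUNT_RE.search(text)
    if match is None:
        return None
    # Decimal avoids binary floating-point rounding when scaling by 10000
    number = Decimal(match.group(1))
    if match.group(2):
        number *= 10000
    return int(number)
```

With a helper like this, save_data could pass spans[0].text and spans[1].text straight to parse_count and store integers, regardless of which label or colon style precedes the number.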