Use Selenium to scrape the annual box office from Yien (endata.com.cn)
2022-07-03 06:15:00 【Black~boy】
1. Overview
1.1 Selenium
Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as if a real user were operating it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, and Edge. Its main uses are compatibility testing (checking that an application works well across different browsers and operating systems) and functional testing (creating regression tests that verify software functionality and user requirements). It also supports recording actions and automatically generating test scripts in languages such as .NET, Java, and Perl. (Source: Baidu Baike)
2. How the scraper works
Selenium drives a browser to scrape the data from the website, and the results are saved to a MySQL database.
3. Preparation
3.1 WebDriver: works much like a device driver (explained below)
WebDriver implementations are browser-specific: each browser has its own driver. Chrome, for example, uses chromedriver.
Note: the WebDriver version must match the browser version!
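The version check can be sketched in a few lines of Python. This is only an illustration, assuming that "matching" means the same major version, which is how chromedriver is normally paired with Chrome; the helper names `major_version` and `versions_match` are made up for this example. With a running Chrome session, the two strings are typically available as `driver.capabilities["browserVersion"]` and `driver.capabilities["chrome"]["chromedriverVersion"]`.

```python
def major_version(version: str) -> int:
    """Return the leading number of a version string such as '103.0.5060.53 (abc)'."""
    return int(version.split(".", 1)[0])


def versions_match(browser_version: str, driver_version: str) -> bool:
    """True when browser and driver share the same major version (assumed rule)."""
    return major_version(browser_version) == major_version(driver_version)


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only
    print(versions_match("103.0.5060.66", "103.0.5060.53 (abc)"))
    print(versions_match("104.0.5112.79", "103.0.5060.53 (abc)"))
```

If the check fails, download the chromedriver build that corresponds to the installed browser before running the scraper.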
3.2 The selenium library
Install the selenium library, for example with pip install selenium.
3.3 Installing MySQL
For installation details, see a MySQL installation tutorial.
3.4 A library for connecting Python to MySQL (plays a role similar to WebDriver)
There are many connection libraries to choose from; see the linked "Connection Library" page for details.
This example uses pymysql, which can be installed with pip install pymysql.
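The script in section 4 inserts nine values per row into a table named data in the dianying database, but the post does not show that table's schema. A minimal sketch of one possible definition follows. Every column name here is a guess at what the scraped fields hold, and all columns are VARCHAR because the script inserts plain strings; `create_table` is a hypothetical helper, not part of pymysql.

```python
# Hypothetical schema: the post never shows the real column names or types.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS data (
    box_year         VARCHAR(4),
    rank_no          VARCHAR(8),
    title            VARCHAR(255),
    genre            VARCHAR(64),
    total_box_office VARCHAR(32),
    avg_ticket_price VARCHAR(32),
    avg_audience     VARCHAR(32),
    region           VARCHAR(64),
    release_date     VARCHAR(32)
)
"""


def create_table(conn) -> None:
    """Create the table through any DB-API connection, e.g. one from pymysql.connect()."""
    cursor = conn.cursor()
    try:
        cursor.execute(CREATE_TABLE_SQL)
    finally:
        cursor.close()
    conn.commit()
```

Calling `create_table(db)` once, with `db` being the connection returned by `pymysql.connect(...)`, would set up the table before the first INSERT. Adjust the names and widths to whatever the page actually lists.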
3.5 The re (regular expression) library
A regular expression is a special sequence of characters that makes it easy to check whether a string matches a given pattern.
Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.
The re module gives Python full regular expression support.
The compile function builds a regular expression object from a pattern string and optional flag arguments. That object has a set of methods for matching and replacing.
The re module also provides module-level functions that do exactly the same things as those methods; they take the pattern string as their first argument.
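A short sketch of the two equivalent styles, using the same pattern the scraper relies on (split on a newline or a single space). The sample string is invented for illustration and is not real box office data.

```python
import re

# Compile the pattern once and reuse the resulting object
splitter = re.compile(r"[\n ]")

# Made-up sample text standing in for a scraped table
sample = "1 FilmA 100\n2 FilmB 90"

# Method on the compiled object
tokens = splitter.split(sample)

# Equivalent module-level function: the pattern string is the first argument
tokens_again = re.split(r"[\n ]", sample)

print(tokens)  # ['1', 'FilmA', '100', '2', 'FilmB', '90']
print(tokens == tokens_again)  # True
```

Compiling is mostly worthwhile when the same pattern is applied many times; for a one-off split the module-level function reads just as well.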
4. Code example
import re
import time

import pymysql
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select

# Use the database name and password you defined yourself
db = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='123456',
                     database='dianying', charset='utf8')
driver = webdriver.Chrome()
driver.get('https://www.endata.com.cn/BoxOffice/BO/Year/index.html')

# The year drop-down; selecting an option loads that year's table.
# find_element_by_xpath is gone from recent Selenium 4 releases, so use find_element(By.XPATH, ...)
sel = Select(driver.find_element(By.XPATH, '//*[@id="OptionDate"]'))
for i in range(len(sel.options)):
    sel.select_by_index(i)
    time.sleep(2)  # give the table time to reload
    table2 = driver.find_element(By.XPATH, '/html/body/section[1]/div/div[2]/div/div/div[2]/table/tbody')
    # Split the table text on newlines and spaces; each row is assumed to yield 8 fields
    ss1 = re.split(r'[\n ]', table2.text)
    with db.cursor() as cursor:
        # Up to 25 rows per year, as in the original; the bound avoids an IndexError on short tables
        for j in range(min(25, len(ss1) // 8)):
            # The year is derived from the option index, assuming the first option is 2021
            cursor.execute('INSERT INTO data VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)',
                           (str(2021 - i), *ss1[j * 8:j * 8 + 8]))
    db.commit()
    print("==================================")
db.close()
driver.quit()
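The row-splitting step can be tried on its own, without a browser or database. The sketch below assumes, as the script does, that every table row yields a fixed number of whitespace-separated fields; `rows_from_table_text` is a hypothetical helper and the sample text is invented.

```python
import re
from typing import List


def rows_from_table_text(text: str, fields_per_row: int = 8) -> List[List[str]]:
    """Split scraped table text into rows of `fields_per_row` tokens each."""
    # Drop empty tokens produced by consecutive separators
    tokens = [t for t in re.split(r"[\n ]", text) if t]
    # Ignore a trailing partial row instead of raising IndexError
    usable = len(tokens) - len(tokens) % fields_per_row
    return [tokens[k:k + fields_per_row] for k in range(0, usable, fields_per_row)]


if __name__ == "__main__":
    # Made-up text with 3 fields per row, to keep the example short
    sample = "1 FilmA 100\n2 FilmB 90\n3"
    for row in rows_from_table_text(sample, fields_per_row=3):
        print(row)
```

One caveat with splitting on spaces: a film title that itself contains a space produces an extra token and shifts every later column. Reading the individual td cells of each row is more robust if that turns out to be a problem.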
5. Result screenshot

6. Notes
If this post infringes any rights, please contact [email protected] to have it removed.