Use Selenium to scrape the Yien annual box office
2022-07-03 06:15:00 【Black~boy】
1. Overview
1.1 Selenium
Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, and Edge. Its main uses are testing browser compatibility (checking whether an application works well across different browsers and operating systems) and testing system functionality (creating regression tests that verify software functionality and user requirements). It also supports automatic recording of actions and automatic generation of test scripts in languages such as .NET, Java, and Perl. (Source: Baidu Baike)
2. How the scraper works
Use Selenium to scrape the data from the website and save it to a MySQL database.
3. Preparation
3.1 WebDriver: plays a role similar to a device driver (see below)
A WebDriver is developed for each browser, so different browsers use different WebDrivers. For Chrome, for example, use chromedriver.
Note: the WebDriver version must match the browser version!
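The version check above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, the version strings are made-up examples, and it assumes that agreeing on the major version number is the rule being applied, which is the usual rule of thumb for chromedriver.

```python
def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(browser_version: str, driver_version: str) -> bool:
    """True when the browser and the driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# Hypothetical version strings, for illustration only.
print(versions_match("103.0.5060.114", "103.0.5060.53"))  # True
print(versions_match("103.0.5060.114", "102.0.5005.61"))  # False
```

In practice the two strings would come from the browser's "About" page and from running the driver with its version flag.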
3.2 selenium library
Install the selenium library, for example with pip install selenium.
3.3 MySQL installation
For installation details, see a MySQL installation tutorial.
3.4 Library for connecting MySQL and Python (similar in role to WebDriver)
Several connection libraries are available; see the link below for details.
Connection libraries
This example uses pymysql.
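As a rough illustration of what a connection library does, the sketch below walks through the connect, execute, commit, close cycle. To keep it runnable without a MySQL server, it uses Python's built-in sqlite3 module as a stand-in, which is an assumption of this sketch and not what the article uses. pymysql follows the same DB-API pattern, but its placeholders are written %s instead of ?. The two-column table is hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL connection.
db = sqlite3.connect(":memory:")
cursor = db.cursor()

# Hypothetical two-column table; the article's table has nine columns.
cursor.execute("CREATE TABLE data (year TEXT, title TEXT)")

# Parameterized insert: sqlite3 uses "?" placeholders, pymysql uses "%s".
cursor.execute("INSERT INTO data VALUES (?, ?)", ("2021", "example film"))
db.commit()

# Read the row back to confirm it was stored.
cursor.execute("SELECT year, title FROM data")
rows = cursor.fetchall()
print(rows)  # [('2021', 'example film')]

db.close()
```

Passing values as parameters, instead of formatting them into the SQL string, lets the library handle quoting and escaping.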
3.5 re (regular expression) library
A regular expression is a special sequence of characters that makes it easy to check whether a string matches a given pattern.
Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.
The re module gives Python the full set of regular expression features.
The compile function builds a regular expression object from a pattern string and optional flag arguments. That object has a set of methods for regular expression matching and substitution.
The re module also provides functions that do exactly what these methods do, taking a pattern string as their first argument.
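A short sketch of the two styles described above: methods on a compiled pattern object, and the equivalent module-level functions that take the pattern string as their first argument. The sample text and the variable names are made up for illustration.

```python
import re

# Compile once, reuse many times: a pattern matching a four-digit number.
year_pattern = re.compile(r"\d{4}")

text = "box office 2021 and 2020"

# Methods on the compiled pattern object.
print(year_pattern.findall(text))      # ['2021', '2020']
print(year_pattern.sub("YYYY", text))  # box office YYYY and YYYY

# Equivalent module-level functions take the pattern string first.
print(re.findall(r"\d{4}", text))      # ['2021', '2020']
print(re.split(r"[\n ]", "a b\nc"))    # ['a', 'b', 'c']
```

Compiling is mainly worthwhile when the same pattern is applied many times; for a one-off split, the module-level function is enough.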
4. Code example
import re
import time

import pymysql
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select

# Set the database name and password to match your own setup.
db = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='123456',
                     database='dianying', charset='utf8')
cursor = db.cursor()

driver = webdriver.Chrome()
driver.get('https://www.endata.com.cn/BoxOffice/BO/Year/index.html')

# The year drop-down; each option loads one year's ranking.
sel = Select(driver.find_element(By.XPATH, '//*[@id="OptionDate"]'))

for i in range(len(sel.options)):
    sel.select_by_index(i)
    time.sleep(2)  # wait for the table to refresh
    table2 = driver.find_element(By.XPATH, '/html/body/section[1]/div/div[2]/div/div/div[2]/table/tbody')
    # Split the table text into tokens; this assumes no field contains a space.
    ss1 = re.split(r'[\n ]', table2.text)
    # Each row has 8 fields; read at most 25 rows and never run past the data.
    for j in range(min(25, len(ss1) // 8)):
        # 2021 - i assumes the first option in the drop-down is 2021.
        cursor.execute('INSERT INTO data VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)',
                       (str(2021 - i), *ss1[j * 8:j * 8 + 8]))
    db.commit()
    print("==================================")

cursor.close()
db.close()
driver.quit()
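The row-slicing step in the script can be pulled out into a helper that runs without a browser or a database. This is a sketch under stated assumptions: the function name is hypothetical, the sample text is a made-up placeholder and not real box office data, and, unlike the script, it drops empty tokens and ignores a trailing incomplete row.

```python
import re

FIELDS_PER_ROW = 8  # the script above reads eight tokens per table row


def rows_from_table_text(text, year):
    """Split table text on newlines/spaces and group tokens into rows."""
    # Drop empty tokens produced by consecutive separators.
    tokens = [t for t in re.split(r"[\n ]", text) if t]
    # Only keep rows that have all eight fields.
    complete = len(tokens) // FIELDS_PER_ROW
    return [
        (str(year), *tokens[k * FIELDS_PER_ROW:(k + 1) * FIELDS_PER_ROW])
        for k in range(complete)
    ]


# Made-up placeholder text, not real box office data.
sample = "1 a b c d e f g\n2 h i j k l m n\n3 incomplete"
rows = rows_from_table_text(sample, 2021)
print(len(rows))  # 2
print(rows[0])    # ('2021', '1', 'a', 'b', 'c', 'd', 'e', 'f', 'g')
```

Each returned tuple has nine values, matching the nine placeholders in the INSERT statement, so the list could be passed to cursor.executemany in one call per year.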
5. Results
6. Notes
If this post infringes on any rights, please get in touch to have it removed: [email protected]