Use Selenium to scrape the annual box office data from Yien (endata.com.cn)
2022-07-03 06:15:00 【Black~boy】
1. Overview
1.1 Selenium
Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, and Edge. Its main functions are: testing browser compatibility, that is, checking whether the application works well on different browsers and operating systems; and testing system functionality, that is, creating regression tests to verify software functionality and user requirements. It can also record actions automatically and generate test scripts in .NET, Java, Perl, and other languages. (Source: Baidu Baike)
2. How the crawler works
Selenium is used to scrape the data from the website, and the data is then saved to a MySQL database.
3. Preparation
3.1 WebDriver: similar to a device driver (how it works is described below)
WebDriver is developed separately for each browser, so different browsers have different WebDrivers. For example, Chrome uses chromedriver.
Reminder: the WebDriver version must match the browser version!
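As a rough illustration of the version reminder above, the sketch below compares the major version numbers of a browser and a driver. The helper names major_version and versions_match are hypothetical, the version strings are only examples, and treating "same major version" as the compatibility rule is an assumption rather than something the original article states.

```python
def major_version(version):
    """Return the leading number of a dotted version string such as '103.0.5060.53'."""
    return int(version.strip().split(".")[0])


def versions_match(browser_version, driver_version):
    """Assumed rule: browser and driver are compatible when their major versions agree."""
    return major_version(browser_version) == major_version(driver_version)


# Example version strings, used only for illustration.
print(versions_match("103.0.5060.53", "103.0.5060.134"))  # True
print(versions_match("103.0.5060.53", "102.0.5005.61"))   # False
```

With a running Chrome session, the two strings could probably be read from driver.capabilities, but the exact keys depend on the browser and driver in use, so check them before relying on this.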
3.2 The selenium library
Install the selenium library: pip install selenium
3.3 Installing the MySQL database
For installation details, see the MySQL installation tutorial.
3.4 A library that connects MySQL and Python (similar in role to WebDriver)
There are many connection libraries; see the link below for details.
Connection libraries
This example uses pymysql: pip install pymysql
3.5 The re (regular expression) library
A regular expression is a special sequence of characters that makes it easy to check whether a string matches a given pattern.
Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.
The re module gives the Python language full regular expression functionality.
The compile function builds a regular expression object from a pattern string and optional flag arguments. That object has a set of methods for regular expression matching and replacement.
The re module also provides functions that do exactly what these methods do; they take the pattern string as their first argument.
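A small sketch of the two styles just described: a compiled pattern object with its own methods, and the equivalent module-level function that takes the pattern string as its first argument. The pattern is the one the script in section 4 uses; the input text is a made-up placeholder.

```python
import re

# Compile once, then reuse the pattern object's methods.
separator = re.compile(r"[\n ]")
text = "1 FilmA\n2 FilmB"  # placeholder text, not real data

via_object = separator.split(text)       # method on the compiled object
via_module = re.split(r"[\n ]", text)    # module-level function, pattern first

print(via_object)                # ['1', 'FilmA', '2', 'FilmB']
print(via_object == via_module)  # True
```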
4. Code example
import re
import time

import pymysql
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select

# Set the database name and password to match your own MySQL setup.
db = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='123456',
                     database='dianying', charset='utf8')
driver = webdriver.Chrome()
driver.get('https://www.endata.com.cn/BoxOffice/BO/Year/index.html')

# The year drop-down; selecting an option loads that year's box office table.
# Selenium 4 locates elements with find_element(By.XPATH, ...).
sel_el = driver.find_element(By.XPATH, '//*[@id="OptionDate"]')
sel = Select(sel_el)
cursor = db.cursor()
for i in range(len(sel.options)):
    sel.select_by_index(i)
    time.sleep(2)  # give the table time to reload
    table2 = driver.find_element(By.XPATH, '/html/body/section[1]/div/div[2]/div/div/div[2]/table/tbody')
    ss = table2.text
    # Split the table text on newlines and spaces into a flat list of fields.
    ss1 = re.split(r'[\n ]', ss)
    # Each row is assumed to have 8 fields, with 25 rows per year.
    for j in range(25):
        # 2021 - i assumes the first option in the drop-down is 2021.
        cursor.execute('INSERT INTO data VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)',
                       (str(2021 - i), *ss1[j*8:j*8 + 8]))
    db.commit()
    print("==================================")
cursor.close()
db.close()
driver.quit()
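The script above hardcodes 25 rows of 8 fields each. As a hedged alternative, the sketch below groups the split tokens into complete rows however many there are, and prepends the year so that each tuple lines up with the nine placeholders in the INSERT statement. The function name build_rows and the sample text are hypothetical and not part of the original article.

```python
import re

FIELDS_PER_ROW = 8  # matches the ss1[j*8 + k] indexing used in the script


def build_rows(table_text, year, fields_per_row=FIELDS_PER_ROW):
    """Turn the text of the table body into tuples ready for an INSERT."""
    tokens = [t for t in re.split(r"[\n ]+", table_text.strip()) if t]
    rows = []
    # Step through the tokens one full row at a time;
    # an incomplete trailing row is dropped.
    for start in range(0, len(tokens) - fields_per_row + 1, fields_per_row):
        rows.append((str(year), *tokens[start:start + fields_per_row]))
    return rows


# Placeholder text, not real box office data.
sample = "1 FilmA a b c d e f\n2 FilmB a b c d e f\n3 FilmC"
rows = build_rows(sample, 2021)
print(len(rows))     # 2
print(len(rows[0]))  # 9
```

Like the original split on newlines and spaces, this misaligns the columns if a field (for example a film title) contains a space, so it is only a starting point.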
5. Result screenshot

6. Notes
If there is any infringement, please contact [email protected] to have this removed.