Using Selenium to scrape the annual box office data from Yien (endata.com.cn)
2022-07-03 06:15:00 【Black~boy】
1. Overview
1.1 Selenium
Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as if a real user were operating it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, and Edge. Its main uses are compatibility testing (checking that an application works well across different browsers and operating systems) and functional testing (creating regression tests that verify software functionality and user requirements). It also supports automatic recording of actions and automatic generation of test scripts in languages such as .NET, Java, and Perl. (Source: Baidu Baike)
2. How the scraper works
Selenium is used to scrape the data from the website, and the results are saved to a MySQL database.
3. Preparation
3.1 WebDriver: the browser driver
A WebDriver is built for a specific browser, so different browsers have different drivers. Chrome, for example, uses chromedriver.
Note: the WebDriver version must match the browser version!
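A quick way to catch a mismatch before launching the browser is to compare major version numbers. The sketch below is only an illustration: `major_version` and `same_major_version` are hypothetical helpers, the version strings are made-up examples, and it assumes versions use the usual dotted form.

```python
def major_version(version: str) -> int:
    """Return the leading number of a dotted version string."""
    return int(version.strip().split(".")[0])


def same_major_version(browser_version: str, driver_version: str) -> bool:
    """True when browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


if __name__ == "__main__":
    # Example strings only; read the real ones from the browser's About page
    # and from `chromedriver --version` on your machine.
    print(same_major_version("103.0.5060.53", "103.0.5060.134"))
    print(same_major_version("103.0.5060.53", "102.0.5005.61"))
```

If the check fails, download the chromedriver release that corresponds to the installed browser before running the scraper.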
3.2 The selenium library
Install the selenium library with pip: pip install selenium
3.3 Installing MySQL
For installation details, see a MySQL installation tutorial.
3.4 A connection library between MySQL and Python (it plays a role similar to the WebDriver)
Many connection libraries are available.
This example uses pymysql, which can be installed with pip: pip install pymysql
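The INSERT statement in section 4 writes nine values per row into a table named data, so that table has to exist before the scraper runs. A minimal sketch of creating it follows; the column names (box_year, field1 to field8) and the VARCHAR(64) type are placeholders chosen for illustration, not the author's actual schema.

```python
# Hypothetical schema: one year column plus eight text fields per table row.
COLUMNS = ["box_year"] + [f"field{n}" for n in range(1, 9)]

CREATE_TABLE_SQL = (
    "CREATE TABLE IF NOT EXISTS data ("
    + ", ".join(f"{name} VARCHAR(64)" for name in COLUMNS)
    + ")"
)


def create_table(conn):
    """Create the target table on an already open pymysql connection."""
    with conn.cursor() as cursor:
        cursor.execute(CREATE_TABLE_SQL)
    conn.commit()
```

To use it, open a connection with pymysql.connect(...) using your own host, user, password, and database, then call create_table(conn) once. Rename the columns once you know what each scraped field means.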
3.5 The re (regular expression) library
A regular expression is a special sequence of characters that makes it easy to check whether a string matches a given pattern.
Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.
The re module gives Python full regular expression support.
The compile function builds a regular expression object from a pattern string and optional flags. That object has a set of methods for matching and substitution.
The re module also provides module-level functions that do exactly what these methods do; they take the pattern string as their first argument.
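The two styles described above can be compared side by side. This small sketch uses the same pattern as the scraper in section 4; the sample text is invented to resemble two table rows.

```python
import re

# Made-up sample: fields separated by spaces, rows separated by newlines.
text = "1 FilmA 100\n2 FilmB 200"

# Module-level function: the pattern string is the first argument.
parts_a = re.split(r"[\n ]", text)

# Compiled pattern object: the same operation as a method call.
splitter = re.compile(r"[\n ]")
parts_b = splitter.split(text)

print(parts_a)             # ['1', 'FilmA', '100', '2', 'FilmB', '200']
print(parts_a == parts_b)  # True
```

Compiling is mainly worthwhile when the same pattern is reused many times, for example once per page of results.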
4. Code example
import re
import time

import pymysql
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select

# The database name and password are whatever you defined yourself
db = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='123456',
                     database='dianying', charset='utf8')
driver = webdriver.Chrome()
driver.get('https://www.endata.com.cn/BoxOffice/BO/Year/index.html')

# The year drop-down; each option corresponds to one year
sel_el = driver.find_element(By.XPATH, '//*[@id="OptionDate"]')
sel = Select(sel_el)
for i in range(len(sel.options)):
    sel.select_by_index(i)
    time.sleep(2)  # give the table time to reload
    table2 = driver.find_element(By.XPATH, '/html/body/section[1]/div/div[2]/div/div/div[2]/table/tbody')
    ss = table2.text
    # Split the table text into a flat list of fields (8 per row)
    ss1 = re.split(r'[\n ]', ss)
    # At most 25 rows per year; stop early if the page has fewer
    for j in range(min(25, len(ss1) // 8)):
        with db.cursor() as cursor:
            # Assumes the drop-down lists years in descending order starting at 2021
            cursor.execute('INSERT INTO data VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)',
                           (str(2021 - i), *ss1[j * 8:j * 8 + 8]))
    db.commit()
    print("==================================")
db.close()
driver.quit()
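The flat split in the code above relies on every row containing exactly eight space-free fields. A slightly more defensive sketch splits the text line by line and skips rows with an unexpected field count. `parse_rows` is a hypothetical helper and the sample text is invented; it assumes each table row arrives as one line of text.

```python
def parse_rows(table_text, n_fields=8):
    """Split table text into rows, keeping only rows with n_fields fields."""
    rows = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) == n_fields:
            rows.append(fields)
    return rows


if __name__ == "__main__":
    # Invented sample: two well-formed rows and one malformed line.
    sample = "1 A b c d e f g\nbroken row\n2 B b c d e f g"
    for row in parse_rows(sample):
        print(row)
```

Each returned row can be passed to cursor.execute with the year prepended, which also removes the need for the hard-coded row count.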
5. Result

6. Notes
If any content here infringes your rights, contact [email protected] to have it removed.