Use Selenium to scrape Yien's annual box office data
2022-07-03 06:15:00 【Black~boy】
1. Overview
1.1 Selenium
Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, and Edge. Its main functions are browser compatibility testing (checking whether an application works correctly across different browsers and operating systems) and functional testing (building regression tests that verify software functionality and user requirements). It also supports automatic recording of actions and automatic generation of test scripts in languages such as .NET, Java, and Perl. (Source: Baidu Baike)
2. How the scraper works
Selenium drives a browser to read the data from the website, and the data is then saved to a MySQL database.
3. Preparation
3.1 WebDriver: similar to a device driver (how it works is described below)
WebDriver is developed separately for each browser, so different browsers have different WebDriver implementations. For example, Chrome uses chromedriver.
Reminder: the WebDriver version must match the browser version!
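As a rough illustration of the version reminder above, the check below compares two dotted version strings. It assumes that "matching" means the first three components (major.minor.build) are equal, which is how chromedriver releases have been numbered against Chrome; other browsers' drivers may follow different rules. The function names and the sample version strings are hypothetical and only for illustration.

```python
def version_prefix(version: str, parts: int = 3) -> tuple:
    """Return the first `parts` numeric components of a dotted version string."""
    return tuple(int(p) for p in version.strip().split(".")[:parts])


def versions_match(browser_version: str, driver_version: str, parts: int = 3) -> bool:
    """True when browser and driver agree on the first `parts` components."""
    return version_prefix(browser_version, parts) == version_prefix(driver_version, parts)


# Sample version strings, used only as an example
print(versions_match("103.0.5060.114", "103.0.5060.53"))  # True
print(versions_match("103.0.5060.114", "102.0.5005.61"))  # False
```

The real version strings can be read from the browser's "About" page and from the name of the chromedriver download.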
3.2 The selenium library
Install the selenium library with pip (pip install selenium).
3.3 MySQL database installation
For installation details, see the MySQL installation tutorial.
3.4 Connection library between MySQL and Python (similar in role to WebDriver)
There are many connection libraries; see the link below for details.
Connection libraries
This example uses pymysql (pip install pymysql).
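A minimal sketch of writing rows with pymysql, assuming a table named data whose column count equals the length of each row (the table layout is not given in this article, so treat it as an assumption). The helper names make_insert_sql and save_rows are hypothetical. pymysql is imported inside save_rows so that the SQL-building helper can be tried without a database.

```python
def make_insert_sql(column_count: int, table: str = "data") -> str:
    """Build an INSERT statement with one %s placeholder per column."""
    placeholders = ",".join(["%s"] * column_count)
    # The table name is interpolated directly, so it must come from trusted code
    return f"INSERT INTO {table} VALUES ({placeholders})"


def save_rows(rows, **connect_kwargs):
    """Insert all rows in one transaction; connect_kwargs go to pymysql.connect."""
    import pymysql  # third-party: pip install pymysql

    conn = pymysql.connect(**connect_kwargs)
    try:
        with conn.cursor() as cursor:
            # The driver escapes each value; never format values into the SQL string
            cursor.executemany(make_insert_sql(len(rows[0])), rows)
        conn.commit()
    finally:
        conn.close()


print(make_insert_sql(9))
# INSERT INTO data VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)
```

A call might look like save_rows(rows, host='127.0.0.1', user='root', password='...', database='dianying'), with the credentials replaced by your own.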
3.5 The re (regular expression) library
A regular expression is a special sequence of characters that lets you easily check whether a string matches a given pattern.
Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.
The re module gives Python full regular expression functionality.
The compile function builds a regular expression object from a pattern string and optional flag arguments. The object has a set of methods for regular expression matching and substitution.
The re module also provides module-level functions that do the same job as these methods; they take the pattern string as their first argument.
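The two styles described above can be compared side by side. This is a small sketch using only the standard library; the sample string is made up for illustration.

```python
import re

sample = "1 FilmA 2021\n2 FilmB 2020"  # made-up text, not real box office data

# Style 1: compile once, then call methods on the pattern object
splitter = re.compile(r"[\n ]")
tokens_from_object = splitter.split(sample)

# Style 2: module-level function, pattern string as the first argument
tokens_from_module = re.split(r"[\n ]", sample)

print(tokens_from_object)  # ['1', 'FilmA', '2021', '2', 'FilmB', '2020']
print(tokens_from_object == tokens_from_module)  # True

digits = re.compile(r"\d+")
numbers = digits.findall(sample)          # matching
masked = digits.sub("#", sample)          # substitution via the object
masked_again = re.sub(r"\d+", "#", sample)  # same substitution via the module

print(numbers)  # ['1', '2021', '2', '2020']
print(masked == masked_again)  # True
```

Compiling is mainly worthwhile when the same pattern is reused many times, for example inside a loop over pages.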
4. Code example
import re
import time

import pymysql
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select

# The database name and password are whatever you defined for your own setup
db = pymysql.connect(host='127.0.0.1', port=3306, user='root', password='123456',
                     database='dianying', charset='utf8')
driver = webdriver.Chrome()
driver.get('https://www.endata.com.cn/BoxOffice/BO/Year/index.html')

# Year drop-down. find_element_by_xpath was removed in Selenium 4.3, so By.XPATH is used instead
sel = Select(driver.find_element(By.XPATH, '//*[@id="OptionDate"]'))
for i in range(len(sel.options)):
    sel.select_by_index(i)
    time.sleep(2)  # give the table time to reload for the selected year
    table2 = driver.find_element(By.XPATH, '/html/body/section[1]/div/div[2]/div/div/div[2]/table/tbody')
    # Split the table text on newlines and spaces; each film is assumed to yield 8 fields
    ss1 = re.split(r'[\n ]', table2.text)
    cursor = db.cursor()
    # Top 25 films at most; the min() guards against an IndexError on shorter tables
    for j in range(min(25, len(ss1) // 8)):
        row = ss1[j * 8:j * 8 + 8]
        # Option 0 is assumed to be 2021, so option i corresponds to the year 2021 - i
        cursor.execute('INSERT INTO data VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)',
                       (str(2021 - i), *row))
    db.commit()
    print("==================================")
db.close()
driver.quit()  # quit() also shuts down the chromedriver process
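The splitting step in the code above can be tried without a browser or a database. The sketch below groups the tokens into rows of 8 fields, as the scraper assumes; parse_rows is a hypothetical helper and the sample text is made up. Unlike the original split, it drops empty tokens produced by consecutive separators.

```python
import re


def parse_rows(table_text: str, fields_per_row: int = 8, max_rows: int = 25):
    """Split table text on newlines/spaces and group tokens into fixed-size rows."""
    tokens = [t for t in re.split(r"[\n ]", table_text) if t]
    rows = []
    # Step through complete groups only; a trailing partial group is ignored
    for start in range(0, len(tokens) - fields_per_row + 1, fields_per_row):
        rows.append(tuple(tokens[start:start + fields_per_row]))
        if len(rows) == max_rows:
            break
    return rows


# Made-up sample with 8 space-separated fields per line
sample = "1 FilmA 100 10 1 30 20 2021-01-01\n2 FilmB 90 9 1 28 18 2021-02-01"
for row in parse_rows(sample):
    print(row)
```

Note that a film title containing a space would produce 9 tokens and shift every later field, so reading the individual table cells may be more robust than splitting the combined text.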
5. Results
6. Notes
If this article infringes any rights, contact [email protected] to have it removed.