当前位置:网站首页>使用selenium自动化测试工具爬取高考相关院校专业招生分数线及排名情况
使用selenium自动化测试工具爬取高考相关院校专业招生分数线及排名情况
2022-07-01 03:19:00 【黄钢】
随着高考分数公布,填报大学和专业成了各位家长最重要的事情,这两天有好几位亲戚朋友咨询专业填报的事情,发现了一个网站内容不错,提供了各个学校各个专业的最低分数线和最低录取名次,网站链接在这里,这个就是计算机类专业在浙江招生的情况,专业可以换掉。
这个页面的内容还是很简单的,但是他的分页(不同年份)通过get请求没法体现,应该是用前后端分离的模式开发的,所以通过网页请求来爬虫可能不太容易实现,所以使用了selenium进行自动化提取,并自动化跳转页面。
代码如下:
from selenium import webdriver
import time
import pandas as pd
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(r'C:\Users\HP\Downloads\chromedriver_win32\chromedriver.exe')
#time.sleep(5)
driver.get("https://www.zjut.cc/zhuanye/fsx-0809-33.html")
# time.sleep(15)
# url = driver.find_element_by_xpath("/html/body/div/div/section/main/div/div[4]/div/div[1]/div/div/div[3]/table/tbody/tr[1]")
# url = driver.find_element_by_xpath("/html/body/div/div/section/main/div/div[4]/div/div[1]/div/div/div[3]/table/tbody/tr[1]/td[2]/div")
# scqy = driver.find_element_by_xpath("/html/body/div/div/section/main/div/div[4]/div/div[1]/div/div/div[3]/table/tbody/tr[1]/td[2]/div").text
vehicles = []
res = []
for j in range(4):
schools = []
if j < 2:
for i in range(100):
series = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/div/div[1]/div/div[2]/table/tbody/tr[{}]/th".format(1+i)).text
school_name = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/div/div[1]/div/div[2]/table/tbody/tr[{}]/td[1]/a".format(1+i)).text
major = driver.find_element_by_xpath('//*[@id="pills-2021"]/div/div[2]/table/tbody/tr[{}]/td[1]/small[2]'.format(1+i)).text
min_score = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/div/div[1]/div/div[2]/table/tbody/tr[{}]/td[2]".format(1+i)).text
min_rank = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/div/div[1]/div/div[2]/table/tbody/tr[{}]/td[3]".format(1+i)).text
plan = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/div/div[1]/div/div[2]/table/tbody/tr[{}]/td[4]".format(1+i)).text
schools.append([series, school_name, major, min_score, min_rank, plan])
else:
for i in range(100):
series = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/div/div[3]/div/div[2]/table/tbody/tr[{}]/th".format(1+i)).text
school_name = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/div/div[3]/div/div[2]/table/tbody/tr[{}]/td[1]/a".format(1+i)).text
major = driver.find_element_by_xpath('//*[@id="pills-2021"]/div/div[2]/table/tbody/tr[{}]/td[1]/small[2]'.format(1+i)).text
min_score = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/div/div[3]/div/div[2]/table/tbody/tr[{}]/td[2]".format(1+i)).text
min_rank = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/div/div[3]/div/div[2]/table/tbody/tr[{}]/td[3]".format(1+i)).text
plan = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/div/div[3]/div/div[2]/table/tbody/tr[{}]/td[4]".format(1+i)).text
schools.append([series, school_name, major, min_score, min_rank, plan])
df = pd.DataFrame(schools, columns=['排序', '院校', '专业', '最低分', '最低排名', '计划招录人数'])
df.to_excel("%d.xlsx" % (-j + 2021), index=False)
# res.append(schools)
a = driver.find_element_by_xpath("/html/body/div[3]/div[1]/div/ul/li[{}]/a".format(1+j))
driver.execute_script("arguments[0].click();", a)
time.sleep(3)
可以看出来,绝大多数用的xpath,但也有一些细节需要解释,等空了再来解释。
边栏推荐
- 数据库中COMMENT关键字的使用
- Detailed explanation of ES6 deconstruction grammar
- Feature pyramid networks for object detection
- IPv4 and IPv6, LAN and WAN, gateway, public IP and private IP, IP address, subnet mask, network segment, network number, host number, network address, host address, and IP segment / number - what does
- Ouc2021 autumn - Software Engineering - end of term (recall version)
- Use of comment keyword in database
- C # realize solving the shortest path of unauthorized graph based on breadth first BFS -- complete program display
- File upload and download
- RSN:Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs
- 衡量两个向量相似度的方法:余弦相似度、pytorch 求余弦相似度:torch.nn.CosineSimilarity(dim=1, eps=1e-08)
猜你喜欢

数据库中COMMENT关键字的使用

Feature Pyramid Networks for Object Detection论文理解

监听器 Listener

Binary tree god level traversal: Morris traversal

The method to measure the similarity of two vectors: cosine similarity, pytorch calculate cosine similarity: torch nn. CosineSimilarity(dim=1, eps=1e-08)

线程数据共享和安全 -ThreadLocal

小程序容器技术与物联网IoT的结合点

pytorch nn. AdaptiveAvgPool2d(1)

终极套娃 2.0 | 云原生交付的封装

Home online shopping project
随机推荐
Avalanche problem and the use of sentinel
shell脚本使用两个横杠接收外部参数
How to use hybrid format to output ISO files? isohybrid:command not found
RSN:Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs
Binary tree god level traversal: Morris traversal
E15 solution for cx5120 controlling Huichuan is620n servo error
Finally in promise
Split(), split(), slice(), can't you tell?
Subnet division (10)
监听器 Listener
The method to measure the similarity of two vectors: cosine similarity, pytorch calculate cosine similarity: torch nn. CosineSimilarity(dim=1, eps=1e-08)
Cookie&Session
Analyze datahub, a new generation metadata platform of 4.7K star
串口接收数据方案设计
The preorder traversal of leetcode 144 binary tree and the expansion of leetcode 114 binary tree into a linked list
Feature Pyramid Networks for Object Detection论文理解
LeetCode 31下一个排列、LeetCode 64最小路径和、LeetCode 62不同路径、LeetCode 78子集、LeetCode 33搜索旋转排序数组(修改二分法)
[deep learning] activation function (sigmoid, etc.), forward propagation, back propagation and gradient optimization; optimizer. zero_ grad(), loss. backward(), optimizer. Function and principle of st
split(),splice(),slice()傻傻分不清楚?
C语言多线程编程入门学习笔记