Scraping gaokao admission score lines and rankings by school and major with the Selenium test automation tool
2022-07-01 03:19:00 【黄钢】
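With the gaokao scores released, choosing universities and majors has become the top priority for parents. Over the past two days several relatives and friends asked me about choosing majors, and I found a website with good content: it lists the minimum admission score and the minimum admission rank for each major at each school. The page used in the code, https://www.zjut.cc/zhuanye/fsx-0809-33.html, shows admissions for computer science majors in Zhejiang; the major can be swapped for another.
The page content itself is simple, but its pagination (one tab per year) is not reflected in the URL of a GET request. The site was probably built with a separated front end and back end, so scraping it with plain HTTP requests would not be easy. I therefore used Selenium to extract the data and to switch between the year tabs automatically.
The code is as follows: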
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
import pandas as pd

# Selenium 4 API: the chromedriver path is passed through a Service object.
driver = webdriver.Chrome(service=Service(r'C:\Users\HP\Downloads\chromedriver_win32\chromedriver.exe'))
driver.get("https://www.zjut.cc/zhuanye/fsx-0809-33.html")

for j in range(4):  # one pass per year tab; results go to 2021.xlsx ... 2018.xlsx
    schools = []
    # The table is addressed under div[1] for the first two passes and under div[3] for the last two.
    pane = 1 if j < 2 else 3
    base = "/html/body/div[3]/div[1]/div/div/div[{}]/div/div[2]/table/tbody".format(pane)
    for i in range(100):  # rows 1..100 of the table
        row = "{}/tr[{}]".format(base, 1 + i)
        series = driver.find_element(By.XPATH, row + "/th").text
        school_name = driver.find_element(By.XPATH, row + "/td[1]/a").text
        # The major is read from the pane with id "pills-2021" on every pass.
        major = driver.find_element(By.XPATH, '//*[@id="pills-2021"]/div/div[2]/table/tbody/tr[{}]/td[1]/small[2]'.format(1 + i)).text
        min_score = driver.find_element(By.XPATH, row + "/td[2]").text
        min_rank = driver.find_element(By.XPATH, row + "/td[3]").text
        plan = driver.find_element(By.XPATH, row + "/td[4]").text
        schools.append([series, school_name, major, min_score, min_rank, plan])
    # Columns: rank, school, major, minimum score, minimum rank, planned enrollment.
    df = pd.DataFrame(schools, columns=['排序', '院校', '专业', '最低分', '最低排名', '计划招录人数'])
    # Writing .xlsx requires an Excel writer engine such as openpyxl.
    df.to_excel("%d.xlsx" % (-j + 2021), index=False)
    # Click the year tab link through JavaScript, then give the page time to render.
    a = driver.find_element(By.XPATH, "/html/body/div[3]/div[1]/div/ul/li[{}]/a".format(1 + j))
    driver.execute_script("arguments[0].click();", a)
    time.sleep(3)
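As you can see, almost every element is located with XPath. A few details still need explaining; I will come back to them when I have time.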
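One detail worth noting is that the row loop is fixed at 100 rows, so `find_element` raises `NoSuchElementException` as soon as a table turns out to be shorter. Below is a minimal sketch of a more tolerant variant, assuming the same table layout as in the script above. It fetches every `tr` under the table body with `find_elements` and reads each cell relative to its row. The names `PANE_XPATH` and `scrape_table` are hypothetical, reading the major from the same row is an assumption, and the string `"xpath"` is the value of Selenium's `By.XPATH` constant.

```python
# Hypothetical helper; the XPath layout is copied from the script above.
PANE_XPATH = "/html/body/div[3]/div[1]/div/div/div[{pane}]/div/div[2]/table/tbody/tr"


def scrape_table(driver, pane):
    """Return one [rank, school, major, min_score, min_rank, plan] list per row."""
    # find_elements returns an empty list instead of raising when nothing matches.
    rows = driver.find_elements("xpath", PANE_XPATH.format(pane=pane))
    records = []
    for row in rows:
        def cell(path):
            # A leading "./" makes the XPath relative to the current row element.
            return row.find_element("xpath", path).text

        records.append([
            cell("./th"),                # rank
            cell("./td[1]/a"),           # school name
            cell("./td[1]/small[2]"),    # major (assumed to sit in the same row)
            cell("./td[2]"),             # minimum score
            cell("./td[3]"),             # minimum rank
            cell("./td[4]"),             # planned enrollment
        ])
    return records
```

With a live driver, `pd.DataFrame(scrape_table(driver, 1), columns=[...])` would take the place of the inner `for i in range(100)` loop; the number of rows then follows whatever the page actually renders.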