2021 ShanghaiRanking (Soft Science) university ranking crawler
2022-06-26 08:56:00 【ML_ python_ get√】
# -*- coding: utf-8 -*-
# data.py
# @author ML_get
# @created 2021-04-26T16:00:54.397Z+08:00
# @last modified 2021-04-26T22:12:42.172Z+08:00
# ShanghaiRanking (Soft Science) university ranking crawler
import csv
import json

import requests
class FindRank:
    def __init__(self, num):
        # Number of universities to display
        self.num = num
        self.headers = {
            'User-Agent':
            'Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1 Edg/90.0.4430.85'
        }
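    def parse(self, url):
        # Fetch the URL and return the decoded JSON as a dictionary ({} on failure)
        try:
            response = requests.get(url, headers=self.headers, timeout=20)
            response.raise_for_status()
            return json.loads(response.content.decode())
        except (requests.RequestException, ValueError) as e:
            # Network/HTTP errors, or a body that is not valid UTF-8 JSON
            print('Request failed:', e)
            return {}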
    def get_data(self, ulist):
        # Print the header, then one row per university
        print("{:^10}\t{:^20}\t{:^10}".format('Ranking', 'School name', 'Score'))
        # Slice instead of indexing so a list shorter than self.num is not an error
        for u in ulist[:self.num]:
            print("{:^10}\t{:^20}\t{:^10}".format(u['rankOverall'], u['univNameCn'], u['score']))
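    def store_data(self, ulist):
        # Write every field of every entry to rank.csv; utf-8-sig lets Excel
        # detect the encoding of the Chinese school names
        with open('rank.csv', 'w', newline='', encoding='utf-8-sig') as f:
            w = csv.DictWriter(f, ulist[0].keys())
            w.writeheader()
            w.writerows(ulist)
        print('Write successful')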
    def run(self):
        # Main logic
        # 1. Fetch the ranking data
        url = 'https://www.shanghairanking.cn/api/pub/v1/bcur?bcur_type=11&year=2021'
        dict1 = self.parse(url)
        if not dict1:
            # parse() returns an empty value when the request fails
            return
        # 2. Extract the ranking list and display it
        ulist = dict1['data']['rankings']
        self.get_data(ulist)
        # 3. Save it locally
        self.store_data(ulist)
if __name__ == '__main__':
    rank = FindRank(300)
    rank.run()
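The extract, display and store steps can also be tried without touching the network. The block below is only a sketch under assumptions: the payload shape (`data` -> `rankings` -> list of dicts with `rankOverall`, `univNameCn` and `score`) is inferred from the script above, the helper names `top_rows`, `format_row` and `rows_to_csv` are made up for illustration, and the values in `SAMPLE` are placeholders rather than real ranking results.

```python
import csv
import io

# Hypothetical payload mirroring the shape the script reads
# (data -> rankings -> list of dicts). Values are placeholders, not real results.
SAMPLE = {
    "data": {
        "rankings": [
            {"rankOverall": "1", "univNameCn": "University A", "score": 99.0},
            {"rankOverall": "2", "univNameCn": "University B", "score": 95.5},
            {"rankOverall": "3", "univNameCn": "University C", "score": None},
        ]
    }
}

# The three fields the script prints
FIELDS = ("rankOverall", "univNameCn", "score")


def top_rows(payload, num):
    """Return at most `num` ranking entries, or [] if the payload is malformed."""
    if not isinstance(payload, dict):
        return []
    data = payload.get("data")
    rankings = data.get("rankings") if isinstance(data, dict) else None
    if not isinstance(rankings, list):
        return []
    return rankings[:num]


def format_row(row):
    """Centre the three displayed fields; missing or None values print as '-'."""
    cells = ["-" if row.get(k) is None else str(row.get(k)) for k in FIELDS]
    return "{:^10}\t{:^20}\t{:^10}".format(*cells)


def rows_to_csv(rows):
    """Write only the displayed fields to CSV text, ignoring any other keys."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = top_rows(SAMPLE, 300)  # asking for more rows than exist is safe
    for row in rows:
        print(format_row(row))
    print(rows_to_csv(rows))
```

Restricting the CSV to a fixed set of columns is a design choice of this sketch, not something the original script does: it avoids depending on every entry having exactly the same keys as the first one.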