A detailed walkthrough of crawling the Best Chinese Universities Ranking
2022-07-05 04:49:00 【Prosperity comes to an end and the city is ruined 891】
```python
# 2021/10/16, Saturday
# Crawl the top 20 universities of the Best Chinese Universities Ranking at
# https://www.shanghairanking.cn/rankings/bcur/202111 and extract four fields:
# ranking, university name, province, and total score.
# Before crawling, study the page source around the data you want: note which tags
# (<tbody>, <tr>, <td>, <div>, <a>) wrap each field, and work out which fields share a tag.
# The site is revised every year, so its HTML changes too. Recent versions pad the
# text inside the tags we need with a lot of whitespace, which must be stripped.
import requests  # sends the HTTP request for the page URL
import bs4  # needed for bs4.element.Tag in the isinstance() check below
from bs4 import BeautifulSoup  # parses raw HTML into a searchable tree


def getHTMLText(url):
    """Download the ranking page and return its HTML, or "" on failure."""
    try:  # see Remark 1
        r = requests.get(url, timeout=30)  # send a GET request, giving up after 30 s
        r.raise_for_status()  # raise HTTPError for a 4xx/5xx status code
        r.encoding = r.apparent_encoding  # use the encoding detected from the content to avoid garbled text
        return r.text  # on success, return the page's HTML
    except requests.RequestException:
        return ""  # on any request error, return an empty string


def fillUnivList(ulist, html):
    """Extract ranking, name, province and score from each table row into ulist."""
    soup = BeautifulSoup(html, "html.parser")  # parse with Python's built-in HTML parser
    tbody = soup.find('tbody')
    if tbody is None:  # empty or unexpected page: nothing to extract
        return
    for tr in tbody.children:  # see Remark 2
        if isinstance(tr, bs4.element.Tag):  # see Remark 3: skip non-tag children such as whitespace
            a = tr('a')  # list of all <a> tags in the row
            tds = tr('td')  # list of all <td> tags in the row
            # The <td> text is padded with whitespace; strip() removes leading and
            # trailing characters (whitespace and newlines by default).
            ulist.append([tds[0].text.strip(), a[0].text.strip(),
                          tds[2].text.strip(), tds[4].text.strip()])


def printUnivList(ulist, num):
    """Print the first num entries of ulist as an aligned table."""
    # ^ centers each field in a width of 10 characters, padding with spaces.
    # {1:{4}^10} instead pads the name column with format() argument 4, chr(12288),
    # the full-width (Chinese) space, so that Chinese names line up.
    tplt = "{0:^10}\t{1:{4}^10}\t{2:^10}\t{3:^10}"
    print(tplt.format("Rank", "University", "Province", "Total score", chr(12288)))
    for i in range(min(num, len(ulist))):  # guard against fewer than num rows
        u = ulist[i]
        print(tplt.format(u[0], u[1], u[2], u[3], chr(12288)))
    print("Suc" + str(num))  # status line, e.g. "Suc20"


def main():
    uinfo = []  # holds one [ranking, name, province, score] list per university
    url = "https://www.shanghairanking.cn/rankings/bcur/202111"
    html = getHTMLText(url)  # fetch the page content
    fillUnivList(uinfo, html)  # parse the page and store the rows in uinfo
    printUnivList(uinfo, 20)  # print the top 20 entries


if __name__ == "__main__":
    main()
```
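The output template in `printUnivList()` relies on a nested format specification. As a minimal sketch (the row values are made-up placeholders such as 示例大学, "sample university", not scraped data, and `format_row` and `CJK_SPACE` are hypothetical names), `{1:{4}^10}` takes its fill character from argument 4, so the name column is padded with the full-width space `chr(12288)` while the other columns use ordinary spaces:

```python
# Same template as printUnivList(); format_row and the sample values are hypothetical.
tplt = "{0:^10}\t{1:{4}^10}\t{2:^10}\t{3:^10}"
CJK_SPACE = chr(12288)  # U+3000 IDEOGRAPHIC SPACE, a full-width space


def format_row(rank, name, province, score):
    """Return one aligned row; argument 4 supplies the fill character of the name column."""
    return tplt.format(rank, name, province, score, CJK_SPACE)


if __name__ == "__main__":
    row = format_row("1", "示例大学", "示例省", "100.0")  # placeholder values
    print(row)
    # Each field is 10 characters long; the name column is padded with U+3000.
    print([len(col) for col in row.split("\t")])  # [10, 10, 10, 10]
```

`str.format` pads by character count, not by on-screen width. Assuming a terminal that draws each Chinese character and `chr(12288)` at double width, padding the name column with the full-width space keeps Chinese names aligned where ASCII spaces would leave the column ragged.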
Remark 1: how a `try`/`except` statement executes. Python first runs the code in the `try` block. If an exception occurs there, Python creates an exception object of the matching type and hands it to the interpreter; this is called raising an exception. The interpreter then looks for an `except` block that can handle that exception type. If it finds one, the exception object is passed to that block; this is called catching, or handling, the exception. If no suitable `except` block is found anywhere up the call stack, the program terminates and the interpreter exits with a traceback.
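To see this flow in isolation, here is a small sketch built around a hypothetical helper, `parse_score()`, which is not part of the crawler: a `ValueError` is caught by the matching `except` block inside the function, while an `AttributeError` has no matching block there and propagates to the caller.

```python
def parse_score(text):
    """Hypothetical helper: convert score text such as ' 100.0 ' to float, or return None."""
    try:
        return float(text.strip())  # float() raises ValueError for non-numeric text
    except ValueError:
        # A matching except block exists, so the exception is handled here
        # and the function returns normally.
        return None


if __name__ == "__main__":
    print(parse_score(" 100.0 "))  # 100.0: no exception raised
    print(parse_score("N/A"))      # None: ValueError was caught inside parse_score()
    try:
        # None has no strip() method, so AttributeError is raised; parse_score()
        # has no matching except block, so the exception leaves the function.
        parse_score(None)
    except AttributeError as exc:
        # Caught by the caller; without this block the program would stop with a traceback.
        print("propagated:", type(exc).__name__)
```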
Remark 2: `fillUnivList()` was written by reading the page source (in the page-source view, press Ctrl+F to search for a tag). The source shows that one `<tr></tr>` holds all the information for one university, and each `<td></td>` inside it holds one field, such as the ranking, name, or province. The `<tr>` elements are children of `<tbody>`, so the code iterates over the child nodes of `<tbody>` to reach every `<tr>`, collects the `<td>` tags of each row, and stores `tds[0]` (ranking), the text of the row's first `<a>` tag (university name), `tds[2]` (province), and `tds[4]` (total score) in `ulist`.
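The traversal can be tried offline on a static snippet. The sketch below assumes bs4 is installed, as the crawler already requires, and uses hypothetical, simplified markup (`SAMPLE_HTML`) and a hypothetical helper `parse_rows()`. The live page's markup is more complex and changes from year to year, so the column indices should be re-checked against its current source.

```python
import bs4
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one <tr> per university, one <td> per field,
# with the university name inside an <a> tag. Not copied from the real page.
SAMPLE_HTML = """
<table>
<tbody>
  <tr><td> 1 </td><td><a href="#"> University A </a></td><td> Province A </td><td> (other) </td><td> 100.0 </td></tr>
  <tr><td> 2 </td><td><a href="#"> University B </a></td><td> Province B </td><td> (other) </td><td> 90.5 </td></tr>
</tbody>
</table>
"""


def parse_rows(html):
    """Return [ranking, name, province, score] for each <tr> directly under <tbody>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find("tbody").children:
        # The children also include whitespace strings (NavigableString) between rows.
        if not isinstance(tr, bs4.element.Tag):
            continue
        tds = tr("td")  # calling a Tag is shorthand for tr.find_all("td")
        name = tr("a")[0].text.strip()
        rows.append([tds[0].text.strip(), name, tds[2].text.strip(), tds[4].text.strip()])
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE_HTML):
        print(row)
```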
Remark 3: `isinstance(object, classinfo)` checks whether `object` is an instance of a given type. `classinfo` may be a class (the check also succeeds if the object's class inherits from it, directly or indirectly), a built-in type, or a tuple of these. It is similar to comparing `type(object)`, except that it also accepts subclasses. `bs4.element.Tag` is the class the bs4 library uses for tags, so the check skips the non-tag children of `<tbody>`, such as strings of whitespace between the `<tr>` tags.
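The following sketch uses hypothetical classes (`Element`, `TagNode`, `TextNode`), loosely modelled on the way bs4 separates tags from text nodes, to show the three kinds of `classinfo` match and how the same check filters a mixed list of children:

```python
# Hypothetical stand-ins for a parser's node classes (not the real bs4 classes).
class Element:
    pass


class TagNode(Element):   # an HTML tag node
    pass


class TextNode(Element):  # a text node, e.g. whitespace between tags
    pass


def keep_tags(children):
    """Keep only TagNode instances, mirroring the isinstance() filter in fillUnivList()."""
    return [c for c in children if isinstance(c, TagNode)]


children = [TextNode(), TagNode(), TextNode(), TagNode(), TextNode()]

if __name__ == "__main__":
    print(isinstance(TagNode(), Element))               # True: Element is a base class of TagNode
    print(type(TagNode()) is Element)                   # False: a type() comparison ignores inheritance
    print(isinstance(TextNode(), (TagNode, TextNode)))  # True: matches one type in the tuple
    print(len(keep_tags(children)))                     # 2
```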