Using the requests and re libraries to crawl web pages

2022-06-26 04:50:00 I am a little monster

Contents

Introduction

Examples

POST example

GET example


The following are my own study notes and understanding; I hope you find them useful.

Introduction

First, be clear about what each library is for. The requests library fetches the source code of a web page; the re library's regular-expression matching then extracts the information we need from that source.
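A minimal sketch of this division of labor, using a hard-coded HTML string in place of a downloaded page so that it runs without network access. The sample markup and the `extract_titles` helper are illustrative assumptions, not part of the original post.

```python
import re

# Stand-in for r.text returned by requests.get(...); the markup is made up.
SAMPLE_HTML = """
<ul>
  <li><a href="/a">First headline</a></li>
  <li><a href="/b">Second headline</a></li>
</ul>
"""

# Capture the text between <a ...> and </a>.
# .*? is non-greedy, so each link is matched separately.
LINK_TEXT = re.compile(r'<a [^>]*>(?P<title>.*?)</a>')


def extract_titles(html):
    """Return the link texts found in html (hypothetical helper)."""
    return [m.group('title') for m in LINK_TEXT.finditer(html)]


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

In a real crawl, `SAMPLE_HTML` would be replaced by the `text` attribute of the response object.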

If you need to learn the re library, see my earlier post "Regular expressions: searching and matching strings in Python" on my CSDN blog (I am a little monster).

The two most common ways to send a request with requests are get and post. The most visible difference is that with get the data can appear in the URL, whereas with post the data must be passed through the data parameter of the post method. The URL and the other parameters can be found by right-clicking the page to be crawled and choosing Inspect.
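The difference can be seen without sending anything by preparing the two requests and inspecting them. This is a sketch only: the `example.com` endpoint and the `kw` field are placeholders chosen for illustration.

```python
import requests

# Hypothetical endpoint; nothing is sent over the network in this sketch.
URL = 'https://example.com/search'
payload = {'kw': 'tiger'}

# GET: params are encoded into the URL's query string.
get_req = requests.Request('GET', URL, params=payload).prepare()

# POST: data is form-encoded into the request body; the URL is unchanged.
post_req = requests.Request('POST', URL, data=payload).prepare()

print(get_req.url)
print(post_req.url)
print(post_req.body)
```

`requests.get(URL, params=payload)` and `requests.post(URL, data=payload)` build the same requests and then send them.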

Examples

POST example

The simplest example is using Baidu Translate to translate a word. This year is the Year of the Tiger, so we look up "tiger" and fetch the translation data the page returns:

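import requests

# Baidu Translate endpoint
url = 'https://fanyi.baidu.com/sug'

s = input('Please enter an English word\n')
dic = {'kw': s}  # form data: the word to look up
r = requests.post(url, data=dic)  # POST data is passed via the data parameter

tt = r.json()  # decode the JSON response into a dict
print(tt)

r.close()  # close the response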

The output is as follows:

Please enter an English word
tiger
{'errno': 0, 'data': [{'k': 'tiger', 'v': 'n. The tiger ; All kinds of cats ; A vicious man , Tiger wolf disciple '}, {'k': 'Tiger', 'v': '[ The person's name ] Tiger ; [ Place names ] [ The United States ] Tiger '}, {'k': 'TIGER', 'v': 'abbr. testabitily insertion guidance expert system'}, {'k': 'tigers', 'v': 'n. The tiger ( tiger The plural of a noun ); A fierce man ; The warriors ; brave warrior '}, {'k': 'tigery', 'v': 'adj. tiger( to … Draw tiger stripes ) Deformation of '}]}

***Repl Closed***
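Since the response is a dictionary, the individual entries can be pulled out with ordinary indexing. The sketch below works on a shortened copy of the output above instead of a live response; the `suggestions` helper is hypothetical, and treating `errno == 0` as success is an assumption based only on that output.

```python
# Shortened copy of the response printed above, standing in for r.json().
# The 'v' values are truncated here for brevity.
response = {
    'errno': 0,
    'data': [
        {'k': 'tiger', 'v': 'n. The tiger ...'},
        {'k': 'tigers', 'v': 'n. The tiger ...'},
    ],
}


def suggestions(payload):
    """Return (word, meaning) pairs (hypothetical helper).

    Assumes errno == 0 signals success, as in the sample output.
    """
    if payload.get('errno') != 0:
        return []
    return [(item['k'], item['v']) for item in payload.get('data', [])]


pairs = suggestions(response)
for word, meaning in pairs:
    print(f'{word}: {meaning}')
```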

GET example

The following uses the get method to fetch the source of Tencent's home page and extract the text that begins with the character "北" (north).

A few notes on the parameters of get:

If the server checks whether a request comes from a normal browser, it may reject the crawler; in that case pass the headers parameter to disguise the request.
verify=False skips the security (certificate) verification.
params supplies query values; it has the same effect as appending a parameter string to the URL after the question mark.

import requests
import re

url = 'https://www.qq.com/'

# If the server checks whether the request comes from a normal browser, it may
# reject the crawler; pass the headers parameter to disguise the request.
# verify=False skips the security (certificate) verification.
# params supplies query values; same effect as appending them to the URL after '?'.
r = requests.get(url, verify=False)

r.encoding = 'gb2312'
tt = r.text  # the page source we need

# Match text that starts with 北 and runs up to the next '<'
p = re.compile('(?P<wenzi>北.*?)<')
results = p.findall(tt)
print(results)

r.close()  # finally, close the response

The output is as follows (the matched strings are shown in English translation):

 D:\python\lib\site-packages\urllib3\connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.qq.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
[' The Beijing municipal ', ' The Beijing municipal ', ' Northern Province ', ' Northern Province ', ' The Beijing municipal ', ' The Beijing municipal ', ' Beijing ”', ' Beijing has been 10 There is no local addition ! newly added 2 Cases of imported asymptomatic infection ', ' Beijing added a new report yesterday 3 Of the confirmed cases All in Daxing District ', ' Thunderstorms are frequent in Beijing Pay attention to lightning protection and rain protection when going out ', ' The Beijing epidemic is only a small-scale rebound , China rejects the second wave ', ' Beijing releases case details More than one quarantined person is not reported ', ' Beijing multi person isolation 14 Days later, the diagnosis was made , Experts say there are two reasons ', ' The Bank of Beijing was closed for a week ? Five lines refute rumors : Only individual risk area outlets are suspended ', ' Beijing 6 month 30 The day has 3 The risk level of local epidemic situation is degraded ', ' Three focal points of epidemic control in Beijing ', ' Beijing : A patient who has been discharged from the hospital with COVID-19 No human to human transmission has been found ', ' Beijing : The proportion of patients with severe and critical illness is obviously low ', ' Beijing : The non emergency comprehensive appointment mechanism of medical institutions above the second level shall be normalized ', ' The balance of Beijing provident fund account can be directly used to repay the loan !', ' Beijing 57 The appointment telephone number for nucleic acid testing of public medical institutions was announced ', ' Beijing is near 4 The day has 37 Cases of confirmed cases came from centralized isolation sites ', ' Beida Xueba talks about the Winter Olympics , I am looking forward to alpine speed skating ', ' Beijing Winter Olympics ', ' Beijing Internet court legal service workstation ']
[Finished in 0.5s]
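The code comments mention headers and params, but the call itself does not use them. The sketch below shows how they might be passed. The User-Agent string and the `q` query parameter are placeholders, not values from the original post, and the request is only prepared, not sent.

```python
import requests

URL = 'https://www.qq.com/'

# Placeholder browser-like User-Agent; copy a real one from the browser's
# developer tools if the server rejects the default one.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# Hypothetical query parameter, for illustration only.
params = {'q': 'news'}

# Passing params ...
with_params = requests.Request('GET', URL, headers=headers, params=params).prepare()
# ... gives the same URL as writing the query string by hand.
by_hand = requests.Request('GET', URL + '?q=news', headers=headers).prepare()

print(with_params.url)
print(with_params.url == by_hand.url)
print(with_params.headers['User-Agent'])

# To actually send it:
# r = requests.get(URL, headers=headers, params=params, timeout=10)
```

The InsecureRequestWarning in the output above is triggered by verify=False; leaving verify at its default keeps certificate checking on and avoids the warning.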

Original site

Copyright notice
This article was written by [I am a little monster]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/02/202202180510153386.html