Using requests library and re library to crawl web pages
2022-06-26 04:50:00 【I am a little monster】
The following is my own understanding from studying the topic. I hope it is useful to you.
Introduction
First, be clear about what each library is for. The requests library fetches the source code of a web page, and the re library uses regular expression matching to pull the information we need out of that source code.
To learn the re library, see my earlier post on the I am a little monster CSDN blog: "Regular expressions: finding and matching strings in Python".
requests can send a request with either get or post. The most visible difference is that with get the data appears in the URL, while with post the data must be passed through the data parameter of the post method. The URL and the other parameters can be found by right-clicking the page to be crawled and choosing Inspect.
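To make the get/post difference concrete, here is a minimal sketch that only builds the two requests and sends nothing. It uses the standard library's urllib rather than requests so that it runs without a network connection or third-party packages; the encoding it shows is the same kind that requests applies to a dict passed as params= (get) or data= (post). The URL and the kw field are the ones used in the post example in this article.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'https://fanyi.baidu.com/sug'
fields = {'kw': 'tiger'}
encoded = urlencode(fields)  # form-encode the dict into a key=value string

# GET: the data travels in the URL, after the question mark.
get_req = Request(url + '?' + encoded)

# POST: the URL stays unchanged and the data travels in the request body.
post_req = Request(url, data=encoded.encode('ascii'))

# Nothing is sent; we only inspect how each request was built.
print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

The first line printed shows the word inside the URL and no body; the second shows a plain URL with the word in the body.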
Examples
post example
As a simple example, we use Baidu Translate to translate a word. This year is the Year of the Tiger, so we look up "tiger" and fetch the data returned by the translation page:
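import requests

# Suggestion endpoint of Baidu Translate
url = 'https://fanyi.baidu.com/sug'
s = input('Please enter an English word\n')
dic = {'kw': s}  # form data sent in the body of the POST request
r = requests.post(url, data=dic)
tt = r.json()  # the response is JSON; parse it into a dict
print(tt)
r.close()  # close the response when done
The output is as follows: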
Please enter an English word
tiger
{'errno': 0, 'data': [{'k': 'tiger', 'v': 'n. The tiger ; All kinds of cats ; A vicious man , Tiger wolf disciple '}, {'k': 'Tiger', 'v': '[ The person's name ] Tiger ; [ Place names ] [ The United States ] Tiger '}, {'k': 'TIGER', 'v': 'abbr. testabitily insertion guidance expert system'}, {'k': 'tigers', 'v': 'n. The tiger ( tiger The plural of a noun ); A fierce man ; The warriors ; brave warrior '}, {'k': 'tigery', 'v': 'adj. tiger( to … Draw tiger stripes ) Deformation of '}]}
***Repl Closed***
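The returned dict can be reduced to just the words and their meanings. The sketch below works on a trimmed copy of the output above, so it needs no network access. The helper name extract_suggestions is hypothetical, and treating errno equal to 0 as success is an assumption based only on this one response.

```python
# Trimmed copy of the response shown above (values shortened for readability).
tt = {
    'errno': 0,
    'data': [
        {'k': 'tiger', 'v': 'n. The tiger'},
        {'k': 'TIGER', 'v': 'abbr. testabitily insertion guidance expert system'},
    ],
}


def extract_suggestions(response):
    """Return {word: meaning} from a parsed response, or {} on error.

    Assumption: errno == 0 signals success (inferred from the sample only).
    """
    if response.get('errno') != 0:
        return {}
    return {item['k']: item['v'] for item in response.get('data', [])}


suggestions = extract_suggestions(tt)
for word, meaning in suggestions.items():
    print(f'{word}: {meaning}')
```

Each entry in data carries the word under k and its meaning under v, so a dict comprehension is enough to flatten the list.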
get example
The following uses the get method to fetch the source code of Tencent's home page and extract the text that starts with the character "北" (north).
import re
import requests

url = 'https://www.qq.com/'
# Some servers check whether a request comes from a normal browser and reject
# crawlers; in that case, pass a headers argument to disguise the request.
# verify=False skips security (certificate) validation.
# params supplies parameter values; it has the same effect as appending the
# parameters after the question mark in the URL.
r = requests.get(url, verify=False)
r.encoding = 'gb2312'
tt = r.text  # the page source we need
# Capture text that starts with "北" and runs up to the next "<"
p = re.compile('(?P<wenzi>北.*?)<')
results = p.findall(tt)
print(results)
r.close()  # finally, close the response
The output is as follows:
D:\python\lib\site-packages\urllib3\connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.qq.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
[' The Beijing municipal ', ' The Beijing municipal ', ' Northern Province ', ' Northern Province ', ' The Beijing municipal ', ' The Beijing municipal ', ' Beijing ”', ' Beijing has been 10 There is no local addition ! newly added 2 Cases of imported asymptomatic infection ', ' Beijing added a new report yesterday 3 Of the confirmed cases All in Daxing District ', ' Thunderstorms are frequent in Beijing Pay attention to lightning protection and rain protection when going out ', ' The Beijing epidemic is only a small-scale rebound , China rejects the second wave ', ' Beijing releases case details More than one quarantined person is not reported ', ' Beijing multi person isolation 14 Days later, the diagnosis was made , Experts say there are two reasons ', ' The Bank of Beijing was closed for a week ? Five lines refute rumors : Only individual risk area outlets are suspended ', ' Beijing 6 month 30 The day has 3 The risk level of local epidemic situation is degraded ', ' Three focal points of epidemic control in Beijing ', ' Beijing : A patient who has been discharged from the hospital with COVID-19 No human to human transmission has been found ', ' Beijing : The proportion of patients with severe and critical illness is obviously low ', ' Beijing : The non emergency comprehensive appointment mechanism of medical institutions above the second level shall be normalized ', ' The balance of Beijing provident fund account can be directly used to repay the loan !', ' Beijing 57 The appointment telephone number for nucleic acid testing of public medical institutions was announced ', ' Beijing is near 4 The day has 37 Cases of confirmed cases came from centralized isolation sites ', ' Beida Xueba talks about the Winter Olympics , I am looking forward to alpine speed skating ', ' Beijing Winter Olympics ', ' Beijing Internet court legal service workstation ']
[Finished in 0.5s]
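To see what the pattern in the get example does (it searches for the character 北, "north"), it helps to run it on a small string. The HTML fragment below is made up for illustration; only the pattern comes from the example. The sketch shows that findall returns the contents of the capture group, not the whole match, and that the non-greedy .*? stops at the first "<".

```python
import re

# Hypothetical HTML fragment, made up for illustration.
html = '<li><a href="#">北京冬奥会</a></li><li>上海</li><li><span>北大学霸</span></li>'

p = re.compile('(?P<wenzi>北.*?)<')

# With one capture group, findall returns only the group's text,
# so the trailing "<" is not part of the results.
results = p.findall(html)
print(results)

# finditer yields match objects, where the group can be read by name.
for m in p.finditer(html):
    print(m.group('wenzi'), m.start())
```

Because "." does not match a newline by default, text that continues onto the next line before reaching "<" is not matched by this pattern.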