Requests template
2022-07-28 15:15:00 【Demon YoY】
Preface
This article only covers the simple parts of requests. If you want to learn web crawling in depth, see another article of mine.
First, let's go over the general workflow for crawling a site:
1) Get the URL of the site to crawl;
2) Inspect the page (F12) and check the source for the resources you want to scrape (text, links, etc.);
3) Write the code: send a GET request to the URL and locate the data (by tag/index);
4) Save the scraped data locally.
A simple crawl involves just these steps.
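The four steps above can be sketched in a few lines. The HTML below is hard-coded so the sketch runs without a network connection; the URL, tag names, and file name are all hypothetical stand-ins:

```python
import re

url = 'https://example.com/chart'  # step 1: the URL to crawl (hypothetical)

# step 2/3: normally html = requests.get(url).text; a fixed snippet stands in here
html = '''
<ol>
  <li><a class="title" href="/movie/1">Movie A</a></li>
  <li><a class="title" href="/movie/2">Movie B</a></li>
</ol>
'''

# step 3: locate the data by its tag/attribute
titles = re.findall(r'class="title"[^>]*>([^<]+)</a>', html)

# step 4: save the scraped data locally
with open('titles.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(titles))

print(titles)  # ['Movie A', 'Movie B']
```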
Page elements
Inspect elements, view the page source, or press F12.
Panels (the main ones):
| Panel | Description |
|---|---|
| Elements | The page's content; lets you freely inspect and edit the DOM and CSS to iterate on a page's layout and design |
| Console | Logs warnings and error messages, and doubles as a shell for interacting with the page via JavaScript |
| Network | Shows the page's requests and downloaded resource files, and helps with optimizing page load performance |
Panel details
Any data visible on the page can be found in the Elements panel.
The User-Agent and cookies are obtained from the Network panel; both will be needed later.
After opening the site and locating the data you need in the Elements panel, check the following:
1. Open the Network tab;
2. Find the request for the current page;
3. Check its URL;
4. Check the request method (GET or POST; it is usually GET);
5. Check the response data format (usually text or JSON).
Making requests with the requests library
Straight to the code:
```python
import requests

url = 'https://movie.douban.com/chart'  # Douban movie chart
# Set the request header
hd = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36 Edg/90.0.818.62'
}
# Send the GET request
r = requests.get(url, headers=hd)
# Get the page text
text = r.content.decode('utf-8')
# r.content is the raw bytes of the response body
# decode() takes the encoding name ('utf-8' and 'gbk' are the usual two)
# These two lines are equivalent to the line above:
# r.encoding = 'utf-8'
# text = r.text
print(text)
```
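The equivalence claimed in the comments can be checked without any network call by building a Response by hand. Setting `_content` directly is an internal detail of requests, used here purely for illustration:

```python
import requests

r = requests.models.Response()
r._content = '豆瓣电影'.encode('utf-8')  # pretend these bytes came off the wire

r.encoding = 'utf-8'
# r.content.decode('utf-8') and r.text (with r.encoding set) agree
assert r.content.decode('utf-8') == r.text
print(r.text)  # 豆瓣电影
```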
Output: the chart page's HTML is printed.
A GET request can send information to the server: besides the page being requested (the URL), we can also attach content of our own.
The headers parameter (dict) simulates a browser. Common keys: User-Agent (identifies the client) and Cookie (maintains the user's session, used for logged-in access).
```python
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
    'Cookie': '_octooz7h3ni%2FGlaRFR3ETrDvuhKiiUgJP8jStjNLNiFfpvWQy7U10IGCY15XhHxudYu3tRlt%2Fawt4SHEaDct0LNUQ%2B%2Fi2rHiCLVsL1Y8w%2BC9HpTtd2S6gxDLzfHK5dvBPc4TB6WDn%2BaRt9ljs4lSdlT0mn--qas9T0w68J9araMi--Se2WerqwQ6PlXV5xa2W9lw%3D%3D'
}
```
Cookies can be copied from the Network panel.
The cookies parameter
Example:
```python
import requests

cookie = {
    '__cfduid': "d3dbaf9b94d0d23daa6b3cb26cf79242c1621331340",
    '__yjs_duid': "1_ad976dbfb5297c53ae1f2b995a9fe7371620895017180",
    'Hm_lpvt_2c6cc9163dcd6f496c48a6b8ac043309': "1621336831",
    'Hm_lvt_2c6cc9163dcd6f496c48a6b8ac043309': "1621316608,1621336294,1621336595,1621336821",
    'PHPSESSID': "k39e67jndp98visgouhvclrb77",
    'trennlastsearchtime': "1621331611",
    'trennmlauth': "a45766ed6525d97dae8c035f3fd05ef8",
    'trennmlgroupid': "3",
    'trennmlrnd': "CCL36aDXfcQqN7D5X5kn",
    'trennmluserid': "13977",
    'trennmlusername': "LZT"
}
# A cookie like this can only be obtained after logging in
re = requests.get("https://www.4kbizhi.com/wallpaper/7433-original.html", cookies=cookie)
# This lets you access the site as a logged-in user
```
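How the cookies parameter ends up in the request can be seen offline with a PreparedRequest; the cookie name and value below are made up:

```python
import requests

# prepare() builds the request without sending it, so we can inspect the headers
req = requests.Request('GET', 'https://example.com/', cookies={'sessionid': 'abc123'})
prepared = req.prepare()
print(prepared.headers['Cookie'])  # sessionid=abc123
```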
The params parameter (dict) is appended to the URL; it is often used to simulate a search.
```python
import requests

url = 'https://www.baidu.com/s'  # the trailing /s is Baidu's search endpoint
hd = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36 Edg/90.0.818.62'
}
params = {
    'wd': '明日方舟'  # the search keyword ("Arknights")
}
re = requests.get(url, params=params, headers=hd)
print(re.url)
```
Result:
https://www.baidu.com/s?wd=%E6%98%8E%E6%97%A5%E6%96%B9%E8%88%9F
This is the URL-encoded form of:
https://www.baidu.com/s?wd=明日方舟
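The same encoding can be reproduced offline with a PreparedRequest, without actually contacting Baidu:

```python
import requests

# prepare() applies the params encoding without sending the request
req = requests.Request('GET', 'https://www.baidu.com/s', params={'wd': '明日方舟'})
print(req.prepare().url)
# https://www.baidu.com/s?wd=%E6%98%8E%E6%97%A5%E6%96%B9%E8%88%9F
```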
The timeout parameter (number of seconds):
To avoid the program hanging forever while waiting on a slow server, pass timeout; if the server has not responded within the given number of seconds, requests stops waiting and raises a Timeout exception.
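A minimal sketch of using timeout, wrapped in a hypothetical helper so failures come back as None instead of crashing the program. 192.0.2.1 is a reserved test address that never answers:

```python
import requests

def fetch(url, seconds=1):
    """Return the response, or None if the server is too slow or unreachable."""
    try:
        return requests.get(url, timeout=seconds)
    except (requests.exceptions.Timeout, requests.exceptions.ConnectionError):
        return None

print(fetch('http://192.0.2.1/'))  # None, after giving up in about a second
```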
proxies (proxy parameter):
Definition: routes the request through another IP address on the network instead of your own.
Purpose: hides your real IP and helps you avoid being banned.
Syntax:

```python
proxies = {
    'protocol': '[protocol://]IP:port'
}
```
Example (note that each protocol key can appear only once in the dict):

```python
proxies = {
    'http': 'http://112.85.164.220:9999',
    'https': 'https://112.85.164.220:9999'
}
```
Of course, most of the proxy IPs published on free proxy pages are unusable because too many people share them. For example, of 1000 IPs crawled from the Xici proxy site, barely 80 worked, and a couple of days later even fewer did. Larger companies usually budget for purchasing dedicated proxy IPs; otherwise you have to build up your own IP pool.
verify (SSL certificate parameter)
Applicable sites: HTTPS sites whose certificate was not issued by a trusted certificate authority.
Applicable scenario: consider this parameter when a request raises an SSLError.
verify=True (default): verify the certificate.
verify=False (common): skip certificate verification.
Usually this parameter can be left at its default.
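One common pattern, sketched here as a hypothetical helper: verify normally first, and fall back to verify=False only when an SSLError is raised:

```python
import requests

def fetch_lenient(url):
    """Hypothetical helper: retry with certificate checks disabled on SSLError."""
    try:
        return requests.get(url, timeout=5)
    except requests.exceptions.SSLError:
        # Skipping verification will emit an InsecureRequestWarning
        return requests.get(url, timeout=5, verify=False)
```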
auth (HTTP client authentication parameter)
1. For sites that require HTTP Basic username/password authentication.
2. auth = ('username', 'password')
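What the auth tuple does can be inspected offline: requests turns it into an `Authorization: Basic` header containing base64('username:password'). The credentials below are made up:

```python
import requests

# prepare() applies Basic auth without sending the request
req = requests.Request('GET', 'https://example.com/admin', auth=('alice', 's3cret'))
prepared = req.prepare()
print(prepared.headers['Authorization'])  # Basic YWxpY2U6czNjcmV0
```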