UA camouflage, passing parameters with GET and POST in requests, and getting JSON content
2022-07-03 07:36:00 【start field】
First, let's look at UA camouflage, a way around a common anti-crawling strategy.
UA stands for User-Agent, the identity of the request carrier.
Many websites use UA detection as an anti-crawling mechanism: the server checks the identity of the request carrier. If the identity is a browser, the request is treated as normal. If it is not a browser, the request is treated as coming from a crawler, and the server is likely to reject it.
UA camouflage: disguise the crawler's identity as a browser.
How do you disguise it?
First open the browser, then right-click and choose Inspect, or press Fn+F12, to open the developer tools.

Then open the Network tab, select a request, scroll down through its request headers to find User-Agent, and copy the value into a dictionary.

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55"
}
Pass it in when you make the request:
page_text = requests.get(url=url, headers=headers)
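To see what UA camouflage changes without sending anything over the network, a minimal sketch can compare the default identity of requests with the header attached to a prepared request. The URL https://example.com/ is a placeholder and is not part of the original article.

```python
import requests
from requests.utils import default_user_agent

# Browser User-Agent copied from the developer tools.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55"
)
headers = {"user-agent": BROWSER_UA}

# Without a custom header, requests identifies itself as "python-requests/<version>".
default_ua = default_user_agent()

# Build the request but do not send it; example.com is a placeholder URL.
prepared = requests.Request("GET", "https://example.com/", headers=headers).prepare()
sent_ua = prepared.headers["User-Agent"]  # header lookup is case-insensitive

print("default :", default_ua)
print("disguise:", sent_ua)
```

The default value is what a UA check on the server would see from an unmodified crawler, which is why the browser string is passed in explicitly.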
Next, how to carry parameters with a request.
The parameters are also packed into a dictionary. For a GET request the dictionary is passed as params; for a POST request it is passed as data.
response = requests.get(url=url, params=param, headers=headers)
response = requests.post(url=url, data=data, headers=headers)
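The difference between params and data can be checked offline by preparing the requests instead of sending them. This is only a sketch: the URL and the parameter names below are placeholders chosen for illustration.

```python
import requests

# Placeholder URL and parameter names, used only for illustration.
url = "https://example.com/search"
param = {"query": "dog", "page": "1"}
data = {"kw": "dog"}

# GET: the params dictionary is encoded into the query string of the URL.
get_req = requests.Request("GET", url, params=param).prepare()

# POST: the data dictionary is form-encoded and placed in the request body.
post_req = requests.Request("POST", url, data=data).prepare()

print(get_req.url)                       # URL with the query string appended
print(post_req.url)                      # URL is left unchanged
print(post_req.body)                     # form-encoded body
print(post_req.headers["Content-Type"])  # set automatically for form data
```

So params ends up in the URL, while data travels in the body of the request.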
Finally, how to get JSON-formatted data.
response.json() returns a Python object (obj). Only call json() once you have confirmed that the response content is JSON.
How do you know whether the response content is JSON? Open the developer tools again, go to the Network tab, select the request, and look for Content-Type in the response headers. It tells you which type the content is.
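The same check can be done in code instead of by eye. The helper below is a hypothetical function written for this note, not part of requests; it only inspects a Content-Type header value.

```python
def is_json_content_type(content_type):
    """Return True if a Content-Type header value describes JSON."""
    # Drop optional parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";")[0].strip().lower()
    # Accept plain JSON and structured types that end in "+json".
    return media_type == "application/json" or media_type.endswith("+json")


print(is_json_content_type("application/json; charset=utf-8"))
print(is_json_content_type("text/html; charset=utf-8"))
```

With a real response, one possible use is to call response.json() only when is_json_content_type(response.headers.get("Content-Type", "")) is true, and fall back to response.text otherwise.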
The following code crawls Baidu Translate:
# -*- coding: UTF-8 -*-
import json
import requests

if __name__ == "__main__":
    url = 'https://fanyi.baidu.com/sug'
    word = input('enter a word:')
    # POST parameters are packed into a dictionary and passed as data
    data = {
        'kw': word
    }
    # UA camouflage: present the crawler as a browser
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55"
    }
    response = requests.post(url=url, data=data, headers=headers)
    # json() parses the JSON response body into a Python object
    dic_obj = response.json()
    filename = word + '.json'
    with open(filename, 'w', encoding='utf-8') as fp:
        json.dump(dic_obj, fp=fp, ensure_ascii=False)
    print('over')
Using json.dump requires importing the json library. dump() encodes a Python object as JSON and writes it to a file.
ensure_ascii: this parameter takes a Boolean value. It defaults to True, in which case non-ASCII characters are written as \uXXXX escape sequences. Set ensure_ascii to False to write Chinese characters as they are.
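The effect of ensure_ascii is easy to see with json.dumps, which returns the JSON text as a string instead of writing it to a file. The dictionary here is made-up sample data, not a real Baidu Translate response.

```python
import json

# Made-up sample data containing a Chinese character.
dic_obj = {"kw": "狗", "meaning": "dog"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(dic_obj)

# ensure_ascii=False: Chinese characters are kept as they are.
readable = json.dumps(dic_obj, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same Python object; the setting only changes how the text looks on disk.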