UA camouflage, and carrying parameters in GET and POST requests with requests to obtain JSON content
2022-07-03 07:36:00 【start field】
First, let's learn a countermeasure against anti-crawling: UA camouflage.
UA stands for User-Agent, which states the identity of the request carrier (the client that sends the request).
Most websites have an anti-crawling mechanism that checks the UA. It looks at the identity of the request carrier: if the identity is a browser, the request is treated as normal; if the identity is not a browser, the request is treated as coming from a crawler, and the server is likely to reject it.
UA camouflage: make the crawler's identity look like a browser.
How do we disguise it?
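To see why an undisguised crawler is easy to spot, here is a minimal sketch (assuming the requests library is installed) that shows the User-Agent requests sends when you do not set one. It only builds the request with Session.prepare_request and never sends it; https://example.com/ is a placeholder URL.

```python
import requests

# The User-Agent string requests uses when you do not override it.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # python-requests/<installed version>

# Build (but do not send) a request through a Session so the session's
# default headers are merged in, then inspect what the server would see.
session = requests.Session()
prepared = session.prepare_request(requests.Request("GET", "https://example.com/"))
print(prepared.headers["User-Agent"])  # the same python-requests/... string
```

A server that checks this header can tell at a glance that the request did not come from a browser.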
First, open the browser and right-click and choose Inspect, or press F12 (Fn+F12 on some laptops) to open the developer tools.
Then go to the Network tab, select a request, scroll down in its request headers to find User-Agent, then copy it and wrap it in a dictionary:
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55"}
Then pass it in when making the request:
response = requests.get(url=url, headers=headers)
page_text = response.text
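A quick way to confirm the camouflaged UA is really used, without hitting a real site, is to prepare the request and inspect its headers. This is a sketch assuming requests is installed; the URL is a hypothetical placeholder, and the header dictionary is the one from this article.

```python
import requests

# UA copied from the browser's developer tools (the one used in this article).
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55"
}

url = "https://example.com/"  # hypothetical placeholder URL

session = requests.Session()
# Prepare the request without sending it, to check which UA would go out.
prepared = session.prepare_request(requests.Request("GET", url, headers=headers))

# Header names are case-insensitive, so "user-agent" replaces the default "User-Agent".
print(prepared.headers["User-Agent"])  # the Edge UA above, not python-requests/...

# To actually send it: response = requests.get(url=url, headers=headers)
```

Because requests stores headers in a case-insensitive dictionary, writing the key as "user-agent" or "User-Agent" makes no difference.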
Next, let's handle how to carry parameters in the request instead of hard-coding them into the url.
The parameters are also wrapped in a dictionary, but for a GET request the dictionary is passed as params, and for a POST request it is passed as data:
response = requests.get(url=url, params=param, headers=headers)
response = requests.post(url=url, data=data, headers=headers)
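To see what params and data actually change, the sketch below prepares (but does not send) one GET and one POST with requests.Request(...).prepare(). It assumes requests is installed; the endpoint and the kw value are placeholders for illustration.

```python
import requests

url = "https://example.com/search"  # hypothetical endpoint

# GET: params are URL-encoded into the query string.
get_req = requests.Request("GET", url, params={"kw": "dog"}).prepare()
print(get_req.url)   # https://example.com/search?kw=dog
print(get_req.body)  # None: a GET built this way has no body

# POST: data is form-encoded into the request body.
post_req = requests.Request("POST", url, data={"kw": "dog"}).prepare()
print(post_req.url)                      # https://example.com/search
print(post_req.body)                     # kw=dog
print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded
```

So params changes the URL, while data leaves the URL alone and puts the form fields in the body.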
Finally, how to get data in JSON format.
response.json() returns the parsed object (obj). Only call json() after confirming that the response content is of JSON type; otherwise it raises an error.
How do you know whether the response content is JSON? Open the developer tools again, go to the Network tab, select the request, and find Content-Type in its response headers; it tells you which type the content is.
Below is the code for crawling Baidu Translate:
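The same Content-Type check can be done in code before calling json(). A minimal sketch using only the standard library; is_json_content_type and json_or_none are hypothetical helper names, and treating "+json" suffix types as JSON is an assumption of this sketch.

```python
def is_json_content_type(content_type: str) -> bool:
    """Return True if a Content-Type header value denotes JSON."""
    # Drop parameters such as "; charset=utf-8" and normalise case/whitespace.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")


def json_or_none(response):
    """Hypothetical helper: parse the body only if the server says it is JSON.

    `response` is expected to be a requests.Response; anything with
    .headers and .json() works.
    """
    if is_json_content_type(response.headers.get("Content-Type", "")):
        return response.json()
    return None
```

Usage: after response = requests.post(...), dic_obj = json_or_none(response) gives the parsed object, or None when the server sent something else (for example an HTML error page).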
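# -*- coding: utf-8 -*-
import json

import requests

if __name__ == "__main__":
    # Baidu Translate's suggestion endpoint, queried with a POST form field "kw"
    url = 'https://fanyi.baidu.com/sug'
    word = input('enter a word:')
    # POST parameters are passed via data
    data = {
        'kw': word
    }
    # UA camouflage: present the crawler as a browser
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55"
    }
    response = requests.post(url=url, data=data, headers=headers)
    # The response is JSON (checked via Content-Type), so parse it with json()
    dic_obj = response.json()
    filename = word + '.json'
    # Save the result; ensure_ascii=False keeps Chinese characters readable
    with open(filename, 'w', encoding='utf-8') as fp:
        json.dump(dic_obj, fp=fp, ensure_ascii=False)
    print('over')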
To use json.dump, import the json library. dump() encodes a Python object as a JSON string and writes it to a file object (fp here).
ensure_ascii: this parameter takes a Boolean value and defaults to True, in which case every non-ASCII character is escaped as a \uXXXX sequence. If you set ensure_ascii to False, Chinese characters are written as-is.
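The effect of ensure_ascii is easy to see with json.dumps, which returns the same text that json.dump would write to a file. A small sketch using only the standard library and a made-up sample dictionary:

```python
import io
import json

data = {"word": "你好"}  # made-up sample data

# json.dumps returns a string; by default non-ASCII characters are escaped.
escaped = json.dumps(data)
readable = json.dumps(data, ensure_ascii=False)
print(escaped)   # {"word": "\u4f60\u597d"}
print(readable)  # {"word": "你好"}

# json.dump writes the same text to a file-like object instead of returning it.
buffer = io.StringIO()
json.dump(data, buffer, ensure_ascii=False)
print(buffer.getvalue())  # {"word": "你好"}
```

Both forms load back to the same Python object; ensure_ascii only changes how readable the saved file is.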