Crawler request module
2022-07-06 01:17:00 【horizonTel】
The requests module
1、 Basic operation
'''
- Specify the URL
- Send the request
- Get the response data
- Persist the data
'''
import requests

if __name__ == "__main__":
    # Specify the URL
    url = "https://www.sogou.com/"
    # Send the request
    response = requests.get(url=url)
    # Get the response data; .text returns the response body as a string
    page_text = response.text
    print(page_text)
    # Persistent storage
    with open("./sougou.html", "w", encoding="UTF-8") as fp:
        fp.write(page_text)
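The four steps above can also be split into two small functions. This is only a sketch: the function names `fetch_text` and `save_text` and the 10-second timeout are assumptions for illustration, not part of the original script. It adds a timeout and a status check, which the basic version leaves out.

```python
from pathlib import Path

import requests


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the response body as a string."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of saving an error page
    response.raise_for_status()
    return response.text


def save_text(text: str, path: str) -> Path:
    """Write text to path as UTF-8 and return the path that was written."""
    target = Path(path)
    target.write_text(text, encoding="utf-8")
    return target
```

A call such as `save_text(fetch_text("https://www.sogou.com/"), "./sougou.html")` would then reproduce the script above.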
2、 UA (User-Agent) spoofing
# UA: User-Agent
# UA spoofing: send a browser's User-Agent so the request looks like it comes from a browser
import requests

if __name__ == "__main__":
    url = "https://www.sogou.com/web"
    kw = input('enter a word:')
    # UA spoofing
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0'
    }
    # Dictionary of query parameters, equivalent to the parameters in the request URL
    param = {
        'query': kw
    }
    # Proxy; https and socks5 proxies are the usual choices.
    # Keys must be lowercase scheme names. This "http" entry only applies to
    # http:// URLs; add an "https" entry to proxy https:// URLs as well.
    proxies = {
        "http": "http://123.169.122.201:9999"
    }
    response = requests.get(url=url, params=param, headers=headers, proxies=proxies)
    page_text = response.text
    with open("./sougou.html", "w", encoding="UTF-8") as fp:
        fp.write(page_text)
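To see what the `params` dictionary amounts to, the query string can be built by hand with the standard library. This is a sketch for illustration only: `build_search_url` is a hypothetical helper, and requests does this encoding internally, so the script above does not need it.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, params: dict) -> str:
    """Append params to base_url as a percent-encoded query string."""
    return f"{base_url}?{urlencode(params)}"


# The search term here is an arbitrary example value
url = build_search_url("https://www.sogou.com/web", {"query": "machine learning"})
print(url)  # https://www.sogou.com/web?query=machine+learning
```

Passing the dictionary through `params=` is usually preferable to concatenating strings, because spaces and non-ASCII characters in the search term are encoded for you.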
3、 A small example
# Fetch partial page data instead of the whole page
# Scrape Baidu Translate
'''
- POST request (with parameters)
- The response body is JSON; json.loads('json string') parses it into a Python object
'''
import requests
import json

if __name__ == "__main__":
    post_url = "https://fanyi.baidu.com/sug"
    # POST request parameters
    data = {
        'kw': 'dog'
    }
    # UA spoofing
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0'
    }
    response = requests.post(url=post_url, data=data, headers=headers)
    # .json() returns a Python object (the response body must be JSON)
    dic_obj = response.json()
    # print(dic_obj)
    # Persistent storage
    with open('./dog.json', 'w', encoding='UTF-8') as fp:
        json.dump(dic_obj, fp=fp, ensure_ascii=False, indent=2)
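The difference between parsing and serializing JSON can be checked without a network request. The sketch below uses a made-up payload; its field names are an assumption for illustration and are not taken from the real Baidu response.

```python
import json

# Hypothetical JSON text, standing in for a response body
raw = '{"errno": 0, "data": [{"k": "dog", "v": "n. 狗"}]}'

# json.loads: JSON string -> Python object (here a dict)
parsed = json.loads(raw)

# json.dumps: Python object -> JSON string.
# ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes.
readable = json.dumps(parsed, ensure_ascii=False, indent=2)
escaped = json.dumps(parsed)

print(parsed["data"][0]["k"])
print(readable)
```

`json.load` and `json.dump` do the same conversions but read from and write to a file object, which is why the script above passes `fp` to `json.dump`.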