Crawler request module
2022-07-06 01:17:00 【horizonTel】
The requests module
1. Basic operation
'''
- Specify the URL
- Send the request
- Get the response data
- Persist it to disk
'''
import requests

if __name__ == "__main__":
    # Specify the URL
    url = "https://www.sogou.com/"
    # Send the request
    response = requests.get(url=url)
    # Get the response data; .text returns the body as a string
    page_text = response.text
    print(page_text)
    # Persist it to disk
    with open("./sougou.html", "w", encoding="UTF-8") as fp:
        fp.write(page_text)
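The four steps above can also be wrapped in a small helper that fails loudly on HTTP errors and never hangs on a slow server. This is only a sketch: `fetch_and_save` and its `get` parameter are hypothetical names introduced here (the `get` hook exists solely so the helper can be tried without network access), while `timeout=` and `raise_for_status()` are standard parts of requests.

```python
from pathlib import Path


def fetch_and_save(url, path, get=None, timeout=10):
    """Fetch url, write the decoded body to path, return the number of characters."""
    if get is None:
        # Imported lazily so the helper can be exercised with a stand-in `get`.
        import requests
        get = requests.get
    # Send the request; give up if the server does not answer within `timeout` seconds.
    response = get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx instead of silently saving an error page.
    response.raise_for_status()
    # Persist the decoded text to disk.
    Path(path).write_text(response.text, encoding="utf-8")
    return len(response.text)
```

With network access, `fetch_and_save("https://www.sogou.com/", "./sougou.html")` performs the same fetch-and-store as the script above.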
2. UA spoofing
# UA: the User-Agent request header
# UA spoofing: send a browser's User-Agent so the request looks like it comes from a browser
import requests

if __name__ == "__main__":
    url = "https://www.sogou.com/web"
    kw = input('enter a word:')
    # UA spoofing
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0'
    }
    # Dictionary of query-string parameters appended to the URL
    param = {
        'query': kw
    }
    # Proxy; https and socks5 proxies are the usual choice.
    # Keys are lowercase URL schemes. An "http" entry only applies to http:// URLs,
    # so the https:// URL above also needs an "https" entry to be proxied.
    proxies = {
        "http": "http://123.169.122.201:9999"
    }
    response = requests.get(url=url, params=param, headers=headers, proxies=proxies)
    page_text = response.text
    with open("./sougou.html", "w", encoding="UTF-8") as fp:
        fp.write(page_text)
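To see what the `params` dictionary does to the URL, the sketch below builds the query string with the standard library. It only approximates the encoding requests performs; `build_search_url` is a hypothetical helper and the search term is made up for illustration.

```python
from urllib.parse import urlencode


def build_search_url(base_url, params):
    """Append params to base_url as a URL-encoded query string."""
    # urlencode escapes reserved characters and turns spaces into '+'.
    return f"{base_url}?{urlencode(params, doseq=True)}"


search_url = build_search_url("https://www.sogou.com/web", {"query": "web crawler"})
print(search_url)
```

After a real request, `response.url` shows the final URL that was used and `response.request.headers` shows the headers that were sent, which is a quick way to confirm that the spoofed User-Agent went out.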
3. A small example
# Fetching part of a page's data rather than the whole page
# Scraping Baidu Translate
'''
- POST request (with parameters)
- The response body is JSON; response.json() parses it into a Python object,
  and json.dump() writes that object to a file
'''
import requests
import json

if __name__ == "__main__":
    post_url = "https://fanyi.baidu.com/sug"
    # Parameters of the POST request
    data = {
        'kw': 'dog'
    }
    # UA spoofing
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0'
    }
    response = requests.post(url=post_url, data=data, headers=headers)
    # .json() returns a Python object (the response body must be JSON)
    dic_obj = response.json()
    # print(dic_obj)
    # Store the result
    with open('./dog.json', 'w', encoding='UTF-8') as fp:
        json.dump(dic_obj, fp=fp, ensure_ascii=False, indent=2)
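The storage step relies on `ensure_ascii=False` so that Chinese text is written as readable characters rather than `\uXXXX` escapes. The sketch below shows the difference and reads the file back. The `sample` dictionary is a made-up stand-in, not the actual shape of the Baidu response, and `save_json` / `load_json` are hypothetical helper names.

```python
import json


def save_json(obj, path):
    """Write obj to path as human-readable UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(obj, fp, ensure_ascii=False, indent=2)


def load_json(path):
    """Read a JSON file back into a Python object."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


# Made-up sample data; the real response may be structured differently.
sample = {"data": [{"k": "dog", "v": "n. 狗"}]}

escaped = json.dumps(sample)                       # non-ASCII becomes \uXXXX escapes
readable = json.dumps(sample, ensure_ascii=False)  # non-ASCII is kept as-is
print(escaped)
print(readable)
```

Both forms parse back to the same object; the readable form simply makes the saved file easier to inspect by hand.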