The requests module for web crawlers
2022-07-06 01:17:00 【horizonTel】
The requests module
1、 Basic operation
'''
- Specify the URL
- Send the request
- Get the response data
- Persist the data to disk
'''
import requests

if __name__ == "__main__":
    # Specify the URL
    url = "https://www.sogou.com/"
    # Send a GET request
    response = requests.get(url=url)
    # Get the response data; .text returns the response body as a string
    page_text = response.text
    print(page_text)
    # Persist the page to disk
    with open("./sougou.html", "w", encoding="UTF-8") as fp:
        fp.write(page_text)
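The four steps above can be wrapped in a small helper. The sketch below is not from the original post: the name `fetch_and_save`, its `session` parameter, and the 10-second timeout are illustrative assumptions. It adds a timeout and a call to `raise_for_status()` so that an error response is not silently saved as if it were the page.

```python
from pathlib import Path


def fetch_and_save(url, path, session=None, timeout=10):
    """Fetch url with a GET request and write the body to path as UTF-8 text.

    session is a hypothetical hook: any object with a requests-style
    get(url, timeout=...) method. If omitted, the requests module is used.
    """
    if session is None:
        import requests  # imported lazily so a stub session can be passed instead
        session = requests
    # Send the request; give up if the server does not answer in time
    response = session.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of saving an error page
    response.raise_for_status()
    # Persist the decoded body to disk
    Path(path).write_text(response.text, encoding="utf-8")
    return len(response.text)
```

Calling `fetch_and_save("https://www.sogou.com/", "./sougou.html")` would perform the same request and save as the script above, with the extra checks.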
2、 User-Agent (UA) spoofing
# UA: User-Agent
# UA spoofing: send a browser's User-Agent so the request looks like it comes from a browser
import requests

if __name__ == "__main__":
    url = "https://www.sogou.com/web"
    kw = input('enter a word:')
    # UA spoofing
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0'
    }
    # Dictionary of query parameters; requests appends them to the URL
    param = {
        'query': kw
    }
    # Proxy; https and socks5 proxies are the usual choices.
    # Keys are lowercase URL schemes, so an "http" entry applies only to http:// URLs.
    proxies = {
        "http": "http://123.169.122.201:9999"
    }
    response = requests.get(url=url, params=param, headers=headers, proxies=proxies)
    page_text = response.text
    with open("./sougou.html", "w", encoding="UTF-8") as fp:
        fp.write(page_text)
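To see what the parameter dictionary turns into, the query string can be built by hand with the standard library. This is only a sketch of the encoding step that requests performs when it is given `params`; the helper name `build_url` is hypothetical and not part of requests.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append params to base_url as a URL-encoded query string."""
    query = urlencode(params)
    # With no parameters, leave the URL untouched
    return f"{base_url}?{query}" if query else base_url


print(build_url("https://www.sogou.com/web", {"query": "hello world"}))
# prints: https://www.sogou.com/web?query=hello+world
```

Special characters, spaces, and non-ASCII keywords are escaped, which is why passing a dictionary is safer than concatenating the keyword into the URL yourself.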
3、 A small example
# Fetch part of a page's data instead of the whole page
# Example: Baidu Translate suggestions
'''
- POST request (with parameters)
- The response body is JSON; response.json() parses it into Python objects
  (json.loads('JSON string') does the same for a JSON string)
'''
import requests
import json

if __name__ == "__main__":
    post_url = "https://fanyi.baidu.com/sug"
    # Parameters of the POST request
    data = {
        'kw': 'dog'
    }
    # UA spoofing
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0'
    }
    response = requests.post(url=post_url, data=data, headers=headers)
    # .json() returns a Python object (the response body must be JSON)
    dic_obj = response.json()
    # print(dic_obj)
    # Persist to disk
    with open('./dog.json', 'w', encoding='UTF-8') as fp:
        json.dump(dic_obj, fp=fp, ensure_ascii=False, indent=2)
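The `ensure_ascii=False` argument matters when the JSON contains Chinese text. The short sketch below shows the difference; the `sample` dictionary is made-up illustration data and is not the actual structure returned by the Baidu endpoint.

```python
import json

# Hypothetical data, for illustration only
sample = {"kw": "dog", "meaning": "狗"}

# Default: non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(sample)
# ensure_ascii=False: characters are written as-is, so the file stays readable
readable = json.dumps(sample, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms parse back to the same dictionary
assert json.loads(escaped) == json.loads(readable) == sample
```

Because the readable form contains raw non-ASCII characters, the output file should be opened with an explicit encoding such as UTF-8, as the script above does.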