Web Scraping Exercises (1)
2022-07-04 12:37:00 【InfoQ】
import requests
word = input("Enter the search term: ")
start = int(input("Enter the first page: "))
end = int(input("Enter the last page: "))
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44'
}
for n in range(start, end + 1):
    url = f'https://www.sogou.com/web?query={word}&page={n}'
    # print(url)
    response = requests.get(url, headers=headers)
    # Save each results page to its own HTML file
    with open(f'{word}_page_{n}.html', "w", encoding="utf-8") as file:
        file.write(response.content.decode("utf-8"))




https://www.sogou.com/web?query=python&page=2&ie=utf8
url = f'https://www.sogou.com/web?query={word}&page={n}'
https://www.sogou.com/web?query=Python&_asf=www.sogou.com&_ast=&w=01019900&p=40040100&ie=utf8&from=index-nologin&s_from=index&sut=12736&sst0=1650428312860&lkt=0%2C0%2C0&sugsuv=1650427656976942&sugtime=1650428312860
https://www.sogou.com/web?query=java&_ast=1650428313&_asf=www.sogou.com&w=01029901&p=40040100&dp=1&cid=&s_from=result_up&sut=10734&sst0=1650428363389&lkt=0%2C0%2C0&sugsuv=1650427656976942&sugtime=1650428363389
https://www.sogou.com/web?query=C%E8%AF%AD%E8%A8%80&_ast=1650428364&_asf=www.sogou.com&w=01029901&p=40040100&dp=1&cid=&s_from=result_up&sut=11662&sst0=1650428406805&lkt=0%2C0%2C0&sugsuv=1650427656976942&sugtime=1650428406805
https://www.sogou.com/web?
https://www.sogou.com/web?query=Python&
https://www.sogou.com/web?query=Python&page=2&ie=utf8
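Comparing the URLs above suggests that only query and page (plus ie) are needed, and that the other parameters can be dropped. A minimal sketch of building such a URL with the standard library follows; the helper name build_search_url is hypothetical and not part of the original exercise. It also shows how a non-ASCII search term such as C语言 ends up percent-encoded, as in the captured URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.sogou.com/web"


def build_search_url(word, page):
    """Return the search URL for `word` on result page `page` (hypothetical helper)."""
    # urlencode percent-encodes non-ASCII text as UTF-8 by default
    params = {"query": word, "page": page, "ie": "utf8"}
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("Python", 2))
print(build_search_url("C语言", 1))
```

requests.get also accepts a params argument that performs the same encoding, so passing the dictionary there is an alternative to building the string by hand.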
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44'
'cookie' = "IPLOC=CN3600; SUID=191166B6364A910A00000000625F8708; SUV=1650427656976942; browerV=3; osV=1; ABTEST=0|1650428297|v17; SNUID=636A1DCD7B7EA775332A80CB7B347D43; sst0=663; [email protected]@@@@@@@@@; LSTMV=229,37; LCLKINT=1424"
'URl' = "https://www.sogou.com/web?query=Python&_ast=1650429998&_asf=www.sogou.com&w=01029901&cid=&s_from=result_up&sut=5547&sst0=1650430005573&lkt=0,0,0&sugsuv=1650427656976942&sugtime=1650430005573"
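url = "https://www.sogou.com/web?query={}&page={}"
" ":" ",
# Format of a dictionary entry; never forget the trailing ','
# headers is the keyword argument name and must be spelled correctly; a misspelling produces the error below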
import requests
url = "https://www.bxwxorg.com/"
hearders = {
'cookie':'Hm_lvt_46329db612a10d9ae3a668a40c152e0e=1650361322; mc_user={"id":"20812","name":"20220415","avatar":"0","pass":"2a5552bf13f8fa04f5ea26d15699233e","time":1650363349}; Hm_lpvt_46329db612a10d9ae3a668a40c152e0e=1650363378',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44'
}
response = requests.get(url, hearders=hearders)
print(response.content.decode("UTF-8"))
Traceback (most recent call last):
  File "D:/pythonproject/第二次作业.py", line 141, in <module>
    response = requests.get(url, hearders=hearders)
  File "D:\python37\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "D:\python37\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
TypeError: request() got an unexpected keyword argument 'hearders'
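# Cause: all three occurrences of hearders are spelled the same, so the variable exists, but requests.get expects the keyword argument headers, so a TypeError is raised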
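The same failure can be reproduced without any network access. The sketch below uses two hypothetical stand-in functions, get and send, that only imitate the forwarding of keyword arguments visible in the traceback; they are not the requests implementation.

```python
def send(method, url, headers=None):
    """Hypothetical stand-in for the function that finally receives the keywords."""
    return method, url, headers or {}


def get(url, **kwargs):
    """Hypothetical wrapper that forwards every extra keyword unchanged."""
    return send("get", url, **kwargs)


try:
    # 'hearders' is accepted by get() via **kwargs but rejected by send()
    get("https://www.bxwxorg.com/", hearders={"User-Agent": "demo"})
except TypeError as exc:
    message = str(exc)
    print(message)
```

Because the wrapper accepts arbitrary keywords, the misspelling is only detected one level further down, which is why the traceback points into the library rather than at the script alone.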
# Misspelling the variable as heades, however, produces a different error
import requests
word = input("Enter the search term: ")
start = int(input("Enter the first page: "))
end = int(input("Enter the last page: "))
heades = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44'
}
for n in range(start, end + 1):
    url = f'https://www.sogou.com/web?query={word}&page={n}'
    # print(url)
    response = requests.get(url, headers=headers)
    # Save each results page to its own HTML file
    with open(f'{word}_page_{n}.html', "w", encoding="utf-8") as file:
        file.write(response.content.decode("utf-8"))
Traceback (most recent call last):
  File "D:/pythonproject/第二次作业.py", line 117, in <module>
    response = requests.get(url, headers=headers)
NameError: name 'headers' is not defined
# Cause: the names are not spelled consistently (heades is defined, but headers is used), so a NameError is raised
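A minimal sketch of this second case, again without a real request: the dictionary is assigned to heades, and the function use_headers (a hypothetical name) then refers to headers, which was never assigned.

```python
heades = {"User-Agent": "demo"}  # misspelled on purpose; 'headers' is never assigned


def use_headers():
    # Looking up an unassigned name fails at run time, before any request is sent
    return headers


try:
    use_headers()
except NameError as exc:
    message = str(exc)
    print(message)
```

Unlike the TypeError case, the script never reaches the library here, so the traceback contains only the line from the script itself.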
# The correct version is shown below; take care to spell headers correctly!
import requests
url = "https://www.bxwxorg.com/"
headers = {
'cookie':'Hm_lvt_46329db612a10d9ae3a668a40c152e0e=1650361322; mc_user={"id":"20812","name":"20220415","avatar":"0","pass":"2a5552bf13f8fa04f5ea26d15699233e","time":1650363349}; Hm_lpvt_46329db612a10d9ae3a668a40c152e0e=1650363378',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44'
}
response = requests.get(url, headers=headers)
print(response.content.decode("UTF-8"))

