Web Scraping Exercises (1)
2022-07-04 12:37:00 【InfoQ】
import requests

word = input("Enter the search term: ")
start = int(input("Enter the start page: "))
end = int(input("Enter the end page: "))
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44'
}
for n in range(start, end + 1):
    url = f'https://www.sogou.com/web?query={word}&page={n}'
    # print(url)
    response = requests.get(url, headers=headers)
    # Save each results page to its own HTML file
    with open(f'{word}_page_{n}.html', "w", encoding="utf-8") as file:
        file.write(response.content.decode("utf-8"))
https://www.sogou.com/web?query=python&page=2&ie=utf8
url = f'https://www.sogou.com/web?query={word}&page={n}'
https://www.sogou.com/web?query=Python&_asf=www.sogou.com&_ast=&w=01019900&p=40040100&ie=utf8&from=index-nologin&s_from=index&sut=12736&sst0=1650428312860&lkt=0%2C0%2C0&sugsuv=1650427656976942&sugtime=1650428312860
https://www.sogou.com/web?query=java&_ast=1650428313&_asf=www.sogou.com&w=01029901&p=40040100&dp=1&cid=&s_from=result_up&sut=10734&sst0=1650428363389&lkt=0%2C0%2C0&sugsuv=1650427656976942&sugtime=1650428363389
https://www.sogou.com/web?query=C%E8%AF%AD%E8%A8%80&_ast=1650428364&_asf=www.sogou.com&w=01029901&p=40040100&dp=1&cid=&s_from=result_up&sut=11662&sst0=1650428406805&lkt=0%2C0%2C0&sugsuv=1650427656976942&sugtime=1650428406805
https://www.sogou.com/web?
https://www.sogou.com/web?query=Python&
https://www.sogou.com/web?query=Python&page=2&ie=utf8
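The URLs above suggest that only the `query` and `page` parameters are needed, and that a non-ASCII search term such as C语言 appears percent-encoded in the address. A minimal sketch of building such a URL with the standard library follows; `build_search_url` and `BASE_URL` are hypothetical names introduced for illustration and are not part of the original exercise.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.sogou.com/web"


def build_search_url(word, page):
    """Hypothetical helper: keep only the query and page parameters."""
    # urlencode percent-encodes non-ASCII text as UTF-8 by default
    return f"{BASE_URL}?{urlencode({'query': word, 'page': page})}"


print(build_search_url("Python", 2))
print(build_search_url("C语言", 1))
```

The same effect can probably be had by passing a dictionary to the `params` argument of `requests.get`, which lets requests do the encoding.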
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44'
'cookie' = "IPLOC=CN3600; SUID=191166B6364A910A00000000625F8708; SUV=1650427656976942; browerV=3; osV=1; ABTEST=0|1650428297|v17; SNUID=636A1DCD7B7EA775332A80CB7B347D43; sst0=663; [email protected]@@@@@@@@@; LSTMV=229,37; LCLKINT=1424"
'URL' = "https://www.sogou.com/web?query=Python&_ast=1650429998&_asf=www.sogou.com&w=01029901&cid=&s_from=result_up&sut=5547&sst0=1650430005573&lkt=0,0,0&sugsuv=1650427656976942&sugtime=1650430005573"
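url = "https://www.sogou.com/web?query={}&page={}"
" ": " ",
# Format of a dictionary entry; never forget the ',' after each entry
# headers is the keyword argument name and must be spelled exactly; a misspelling produces the error below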
import requests
url = "https://www.bxwxorg.com/"
hearders = {
'cookie':'Hm_lvt_46329db612a10d9ae3a668a40c152e0e=1650361322; mc_user={"id":"20812","name":"20220415","avatar":"0","pass":"2a5552bf13f8fa04f5ea26d15699233e","time":1650363349}; Hm_lpvt_46329db612a10d9ae3a668a40c152e0e=1650363378',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44'
}
response = requests.get(url, hearders=hearders)
print(response.content.decode("UTF-8"))
Traceback (most recent call last):
File "D:/pythonproject/第二次作业.py", line 141, in <module>
response = requests.get(url, hearders=hearders)
File "D:\python37\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "D:\python37\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
TypeError: request() got an unexpected keyword argument 'hearders'
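# Cause: hearders is spelled the same way in all three places, but requests.get only accepts the keyword argument headers, so a TypeError is raised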
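The traceback above shows the extra keyword arguments being forwarded from `get` through `request` to `session.request`, which is where the unknown name is rejected. The sketch below imitates that forwarding chain offline; the `request` and `get` functions here are simplified stand-ins written for illustration, not the real requests source.

```python
def request(method, url, headers=None):
    """Stand-in with a fixed parameter list (hypothetical, not the requests code)."""
    return f"{method} {url} headers={headers}"


def get(url, **kwargs):
    """Stand-in wrapper: forwards any extra keyword arguments unchanged."""
    return request("get", url, **kwargs)


message = None
try:
    # The misspelled keyword passes through get() and is rejected by request()
    get("https://www.bxwxorg.com/", hearders={"User-Agent": "demo"})
except TypeError as exc:
    message = str(exc)
    print(message)

# The correctly spelled keyword is accepted
print(get("https://www.bxwxorg.com/", headers={"User-Agent": "demo"}))
```

The variable on the right-hand side can have any name; only the keyword on the left of `=` has to match a parameter of the function being called.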
# Misspelling only the variable as heades produces a different error
import requests

word = input("Enter the search term: ")
start = int(input("Enter the start page: "))
end = int(input("Enter the end page: "))
heades = {  # deliberately misspelled; the loop below refers to headers
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44'
}
for n in range(start, end + 1):
    url = f'https://www.sogou.com/web?query={word}&page={n}'
    # print(url)
    response = requests.get(url, headers=headers)
    with open(f'{word}_page_{n}.html', "w", encoding="utf-8") as file:
        file.write(response.content.decode("utf-8"))
Traceback (most recent call last):
File "D:/pythonproject/第二次作业.py", line 117, in <module>
response = requests.get(url, headers=headers)
NameError: name 'headers' is not defined
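# Cause: the name is not spelled the same way in all three places (heades is defined, headers is used), so a NameError is raised
# The correct version follows; take care not to misspell it!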
import requests
url = "https://www.bxwxorg.com/"
headers = {
'cookie':'Hm_lvt_46329db612a10d9ae3a668a40c152e0e=1650361322; mc_user={"id":"20812","name":"20220415","avatar":"0","pass":"2a5552bf13f8fa04f5ea26d15699233e","time":1650363349}; Hm_lpvt_46329db612a10d9ae3a668a40c152e0e=1650363378',
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.44'
}
response = requests.get(url, headers=headers)
print(response.content.decode("UTF-8"))
for n in range(start, end + 1):  # end + 1 because range() excludes its upper bound, so the end page is included
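The paging loop can be separated from the network call so that it is easy to try out offline. The following is a minimal sketch under that assumption: `save_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` merely stands in for the real `requests.get` call used in the exercise.

```python
import os
import tempfile


def save_pages(word, start, end, fetch, out_dir="."):
    """Hypothetical helper: fetch pages start..end (inclusive) and save each as HTML."""
    saved = []
    for n in range(start, end + 1):  # end + 1 so the end page is included
        html = fetch(word, n)
        path = os.path.join(out_dir, f"{word}_page_{n}.html")
        with open(path, "w", encoding="utf-8") as file:
            file.write(html)
        saved.append(path)
    return saved


def fake_fetch(word, n):
    """Stand-in for the network call so the sketch runs offline."""
    return f"<html><body>{word} page {n}</body></html>"


with tempfile.TemporaryDirectory() as tmp:
    paths = save_pages("python", 2, 4, fake_fetch, out_dir=tmp)
    names = [os.path.basename(p) for p in paths]
    print(names)
```

To use it against the real site, `fetch` could be a small function that calls `requests.get` with the `headers` dictionary shown earlier and returns the decoded response body.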