Beginner crawler notes (collecting data)
2022-08-04 15:39:00 【Sweat always outweighs talent】
import urllib.error
import urllib.request


def main():
    # 1. Crawl the pages (each page will be parsed inside getData)
    baseurl = 'https://movie.douban.com/top250?start='
    datalist = getData(baseurl)
    # 2. Save the data (not implemented yet)
    print()


# Crawl the pages
def getData(baseurl):
    # Fetch one page at a time, looping over all 10 pages of 25 entries each
    datalist = []
    for i in range(0, 10):
        url = baseurl + str(i * 25)
        html = askURL(url)
        # Parsing is not written yet, so nothing is added to datalist here
    return datalist


# Request one web page and return its HTML ("" if the request fails)
def askURL(url):
    header = {"User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Mobile Safari/537.36 Edg/103.0.1264.77"}
    request = urllib.request.Request(url, headers=header)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("utf-8")
        print(html)
    except urllib.error.URLError as e:
        # HTTPError carries a status code; every URLError carries a reason
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html


if __name__ == '__main__':
    main()

This code only handles the data collection step so far. It is not finished, and I will keep updating it!

(The tutorial I followed is from Bilibili. If this post infringes on anything, please send me a private message and I will delete it.)
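Since getData still returns an empty list, here is a minimal sketch of how the paging loop could be tried out without touching the site at all. The names build_page_urls, collect_pages, and fake_fetch are my own placeholders, not part of the tutorial, and keeping the raw HTML of each page in the list is only an assumption about what the later parsing step might want.

```python
def build_page_urls(baseurl, pages=10, page_size=25):
    """Return the URL of every list page: start=0, 25, ..., 225 by default."""
    return [baseurl + str(i * page_size) for i in range(pages)]


def collect_pages(baseurl, fetch, pages=10):
    """Call fetch(url) for each page and keep the non-empty HTML strings."""
    datalist = []
    for url in build_page_urls(baseurl, pages):
        html = fetch(url)
        if html:  # a fetcher like askURL returns "" on failure, so skip those
            datalist.append(html)
    return datalist


def fake_fetch(url):
    # Hypothetical stand-in for askURL so the loop can run offline
    return "<html>" + url + "</html>"


urls = build_page_urls('https://movie.douban.com/top250?start=')
pages = collect_pages('https://movie.douban.com/top250?start=', fake_fetch, pages=3)
print(urls[0], urls[-1], len(pages))
```

In the real script, askURL would be passed in place of fake_fetch, for example collect_pages(baseurl, askURL). Passing the fetch function as an argument is just a convenience for testing the loop on its own.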