Beginner crawler notes (collecting data)
2022-08-04 15:39:00 【Sweat always outweighs talent】
```python
import urllib.request
import urllib.error


def main():
    # 1. Crawl the web (the data will be parsed one by one here)
    baseurl = 'https://movie.douban.com/top250?start='
    datalist = getData(baseurl)
    # 2. Save data (not implemented yet)
    print()


# Crawl the web
def getData(baseurl):
    # First get one page of data, then use a loop to get every page
    datalist = []
    for i in range(0, 10):
        url = baseurl + str(i * 25)
        html = askURL(url)
    return datalist


# Request a web page
def askURL(url):
    header = {
        "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/103.0.5060.134 Mobile Safari/537.36 Edg/103.0.1264.77"
    }
    request = urllib.request.Request(url, headers=header)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode()
        print(html)
    except urllib.error.URLError as e:  # note: URLError, not URLerror
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html


if __name__ == '__main__':
    main()
```

The code above has only completed the task of collecting data; it has not been polished yet and will continue to be updated in the future. (The tutorial comes from Bilibili; if anything here gives offense, please contact me by private message and I will delete it.)
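The page loop in `getData` relies on how Douban paginates its Top 250 list: each page shows 25 movies, and the `start` query parameter is the offset of the first movie on that page. A standalone sketch of that URL construction, separated out so it can be checked without hitting the network:

```python
# Douban's Top250 list shows 25 movies per page, so page i starts
# at offset i * 25. Ten pages cover all 250 entries.
baseurl = 'https://movie.douban.com/top250?start='
urls = [baseurl + str(i * 25) for i in range(10)]

print(urls[0])   # offset 0: movies 1-25
print(urls[-1])  # offset 225: movies 226-250
```

Printing the first and last URLs confirms the offsets run from 0 to 225 in steps of 25, which is why `range(0, 10)` with `i * 25` covers the whole list.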
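The "save data" step above is still a placeholder (`print()`). As a minimal sketch of what that step could look like, assuming `datalist` ends up holding one raw HTML string per page (the `saveData` name, directory name, and file-naming scheme are all hypothetical, not from the original tutorial):

```python
import os


def saveData(datalist, savedir="douban_pages"):
    # Hypothetical helper: write each collected page's HTML to its
    # own file so the raw data survives between runs.
    os.makedirs(savedir, exist_ok=True)
    for i, html in enumerate(datalist):
        path = os.path.join(savedir, "page_%d.html" % i)
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)


# Demo with fake page contents instead of real crawled HTML:
saveData(["<html>page one</html>", "<html>page two</html>"], savedir="demo_pages")
print(sorted(os.listdir("demo_pages")))
```

Keeping the raw pages on disk also means the parsing step, once written, can be developed and re-run without re-downloading anything.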