Beginner crawler notes (collecting data)
2022-08-04 15:39:00 【Sweat always outweighs talent】
import urllib.error
import urllib.request


def main():
    # 1. Crawl the web pages (each page is parsed inside getData)
    baseurl = 'https://movie.douban.com/top250?start='
    datalist = getData(baseurl)
    # 2. Save the data (not implemented yet)
    print()


# Crawl the web pages
def getData(baseurl):
    # Fetch one page of data at a time, looping over all ten pages
    datalist = []
    for i in range(0, 10):
        url = baseurl + str(i * 25)
        html = askURL(url)
    return datalist


# Request a single web page and return its HTML
def askURL(url):
    # Pretend to be a browser so the site does not reject the request
    header = {"User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Mobile Safari/537.36 Edg/103.0.1264.77"}
    request = urllib.request.Request(url, headers=header)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode()
        print(html)
    except urllib.error.URLError as e:
        # HTTPError carries a status code; every URLError carries a reason
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html


if __name__ == '__main__':
    main()
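As written, getData requests each page but always returns an empty datalist. Here is a minimal sketch of one way the loop could keep what it downloads. The names build_page_urls, collect_pages and fake_fetch are hypothetical and not from the tutorial, and the fetch function is passed in as a parameter so the sketch runs without network access (askURL could be passed instead).

```python
def build_page_urls(baseurl, pages=10, page_size=25):
    """Return the URL of every list page: start=0, 25, ..., 225 by default."""
    return [baseurl + str(i * page_size) for i in range(pages)]


def collect_pages(baseurl, fetch, pages=10):
    """Fetch every page with `fetch` and keep the non-empty HTML strings."""
    datalist = []
    for url in build_page_urls(baseurl, pages):
        html = fetch(url)
        if html:  # askURL returns "" when the request fails
            datalist.append(html)
    return datalist


if __name__ == "__main__":
    # Stand-in fetcher (an assumption for illustration) so no request is sent.
    def fake_fetch(url):
        return "<html>" + url + "</html>"

    pages = collect_pages("https://movie.douban.com/top250?start=", fake_fetch)
    print(len(pages))
    print(pages[-1])
```

Calling collect_pages(baseurl, askURL) would make real requests. Skipping empty strings means a failed page is simply left out of the list rather than stored as a blank entry.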
This code only handles the data-collection step. It is not finished yet, and I will keep updating it! (It follows a tutorial from Bilibili (station B). If there is any infringement, please send me a private message and I will delete it.)