Beginner crawler notes (collecting data)
2022-08-04 15:39:00 【Sweat always outweighs talent】
import urllib.error
import urllib.request


def main():
    # 1. Crawl the pages (each page is meant to be parsed inside getData)
    baseurl = 'https://movie.douban.com/top250?start='
    datalist = getData(baseurl)
    # 2. Save the data (not implemented yet)
    print()


# Crawl the web pages
def getData(baseurl):
    # Fetch one page at a time, looping over all ten pages (25 entries each)
    datalist = []
    for i in range(0, 10):
        url = baseurl + str(i * 25)
        html = askURL(url)
        # Parsing html into datalist is not implemented yet
    return datalist


# Request a single web page and return its HTML ("" on failure)
def askURL(url):
    # Send a browser-like User-Agent with the request
    header = {"User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Mobile Safari/537.36 Edg/103.0.1264.77"}
    request = urllib.request.Request(url, headers=header)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("utf-8")
        print(html)
    except urllib.error.URLError as e:
        # HTTPError (a subclass of URLError) carries a status code
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html


if __name__ == '__main__':
    main()
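To make the paging and error-handling logic easier to check in isolation, here is a minimal sketch that splits them into small helpers. The names `build_page_urls`, `describe_error` and `fetch`, the shortened User-Agent and the 10-second timeout are assumptions for illustration and are not part of the original notes.

```python
import urllib.error
import urllib.request

BASE_URL = "https://movie.douban.com/top250?start="


def build_page_urls(base_url, pages=10, page_size=25):
    """Return the list URLs: start=0, 25, ..., (pages - 1) * page_size."""
    return [base_url + str(i * page_size) for i in range(pages)]


def describe_error(err):
    """Summarize a urllib error using the same hasattr checks as the notes."""
    parts = []
    if hasattr(err, "code"):      # only HTTPError has a status code
        parts.append(f"code={err.code}")
    if hasattr(err, "reason"):    # URLError and HTTPError both have a reason
        parts.append(f"reason={err.reason}")
    return ", ".join(parts)


def fetch(url, timeout=10):
    """Fetch one page and return its HTML, or "" if the request fails."""
    # Hypothetical shortened User-Agent; the notes use a full browser string
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as err:
        print(describe_error(err))
        return ""
```

Calling `[fetch(u) for u in build_page_urls(BASE_URL)]` would then collect the raw HTML of all ten pages. Because `urllib.error.HTTPError` is a subclass of `urllib.error.URLError`, the single `except` clause handles both HTTP status errors and connection failures.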
So far this code only covers the data-collection step. It is not finished, and I will keep updating it. (The tutorial comes from Bilibili; if this post infringes on anything, please send me a private message and I will delete it.)