Beginner crawler notes (collecting data)
2022-08-04 15:39:00 【Sweat always outweighs talent】
import urllib.error
import urllib.request


def main():
    # 1. Crawl the pages (each page will be parsed inside getData)
    baseurl = 'https://movie.douban.com/top250?start='
    datalist = getData(baseurl)
    # 2. Save the data (not implemented yet)
    print()


# Crawl the pages
def getData(baseurl):
    # Fetch one page at a time, looping over every page of the list
    datalist = []
    for i in range(0, 10):
        url = baseurl + str(i * 25)
        html = askURL(url)
    return datalist


# Request a web page
def askURL(url):
    header = {"User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Mobile Safari/537.36 Edg/103.0.1264.77"}
    request = urllib.request.Request(url, headers=header)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode()
        print(html)
    except urllib.error.URLError as e:
        # HTTPError carries a status code; a plain URLError only has a reason
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html


if __name__ == '__main__':
    main()
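The paging arithmetic in getData can be checked without sending any requests. The sketch below is only an illustration: build_page_urls is a hypothetical helper that is not part of the original code, and it assumes the list has 10 pages of 25 entries each, as the loop above does.

```python
BASE_URL = "https://movie.douban.com/top250?start="


def build_page_urls(base_url, pages=10, page_size=25):
    """Hypothetical helper: return the URL of every list page, in order."""
    # Page i starts at entry i * page_size, so the offsets are 0, 25, ..., 225
    return [base_url + str(i * page_size) for i in range(pages)]


page_urls = build_page_urls(BASE_URL)
print(page_urls[0])    # first page, offset 0
print(page_urls[-1])   # last page, offset 225
print(len(page_urls))  # number of pages to request
```

Printing the URLs first is a cheap way to confirm the offsets before passing each one to askURL.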
So far the code only handles the data-collection step. It is not finished, and I will keep updating it! (The tutorial this is based on comes from Bilibili; if anything here infringes on it, please send me a private message and I will take it down.)