Beginner crawler notes (collecting data)
2022-08-04 15:39:00 【Sweat always outweighs talent】
```python
import urllib.request
import urllib.error


def main():
    # 1. Crawl the web (the data will be parsed one by one here)
    baseurl = 'https://movie.douban.com/top250?start='
    datalist = getData(baseurl)
    # 2. Save data (not implemented yet)
    print()


# Crawl the web
def getData(baseurl):
    # First get one page of data, then use a loop to get every page
    datalist = []
    for i in range(0, 10):
        url = baseurl + str(i * 25)
        html = askURL(url)
    return datalist


# Request a web page
def askURL(url):
    header = {
        "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/103.0.5060.134 Mobile Safari/537.36 Edg/103.0.1264.77"
    }
    request = urllib.request.Request(url, headers=header)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode()
        print(html)
    except urllib.error.URLError as e:  # note: URLError, not URLerror
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html


if __name__ == '__main__':
    main()
```

The code above has only completed the task of collecting data; it has not been polished yet and will continue to be updated in the future. (The tutorial comes from Bilibili; if anything here gives offense, please contact me by private message and I will delete it.)
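The page loop in `getData` relies on how Douban paginates its Top 250 list: each page shows 25 movies, and the `start` query parameter is the offset of the first movie on that page. A standalone sketch of that URL construction, separated out so it can be checked without hitting the network:

```python
# Douban's Top250 list shows 25 movies per page, so page i starts
# at offset i * 25. Ten pages cover all 250 entries.
baseurl = 'https://movie.douban.com/top250?start='
urls = [baseurl + str(i * 25) for i in range(10)]

print(urls[0])   # offset 0: movies 1-25
print(urls[-1])  # offset 225: movies 226-250
```

Printing the first and last URLs confirms the offsets run from 0 to 225 in steps of 25, which is why `range(0, 10)` with `i * 25` covers the whole list.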
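The "save data" step above is still a placeholder (`print()`). As a minimal sketch of what that step could look like, assuming `datalist` ends up holding one raw HTML string per page (the `saveData` name, directory name, and file-naming scheme are all hypothetical, not from the original tutorial):

```python
import os


def saveData(datalist, savedir="douban_pages"):
    # Hypothetical helper: write each collected page's HTML to its
    # own file so the raw data survives between runs.
    os.makedirs(savedir, exist_ok=True)
    for i, html in enumerate(datalist):
        path = os.path.join(savedir, "page_%d.html" % i)
        with open(path, "w", encoding="utf-8") as f:
            f.write(html)


# Demo with fake page contents instead of real crawled HTML:
saveData(["<html>page one</html>", "<html>page two</html>"], savedir="demo_pages")
print(sorted(os.listdir("demo_pages")))
```

Keeping the raw pages on disk also means the parsing step, once written, can be developed and re-run without re-downloading anything.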