当前位置:网站首页>Basic teaching of crawler code
Basic teaching of crawler code
2022-07-03 06:42:00 【pjiang000】
from http.client import ResponseNotReady
import json
from unicodedata import name
import requests
from lxml import etree
import csv
import xlwt
# base_url = 'https://www.basketball-reference.com'
base_url = https://www.baidu.com
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
}
response = requests.get(base_url, headers=headers)
html = etree.HTML(response.text)
url_list = html.xpath('//*[@id="site_menu"]/ul/li[2]/div/a/@href')
team_my_names = html.xpath('//*[@id="site_menu"]/ul/li[2]/div/a/text()')
teams_list = url_list
# print(teams_list)
for h in range(len(teams_list)):
base_url = 'https://www.basketball-reference.com' + teams_list[h]
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
}
xls = xlwt.Workbook()
sht1 = xls.add_sheet('Sheet1')
response = requests.get(base_url, headers=headers)
html = etree.HTML(response.text)
url_list = html.xpath('//*[@id="roster"]/tbody/tr/td[1]/a/@href')
names = html.xpath('//*[@id="roster"]/tbody/tr/td[1]/a')
# print(url_list)
# print(names)
name_lst = []
for j in range(len(names)):
name_lst.append(names[j].text)
names = name_lst
person_list = []
for j in range(len(url_list)):
person_list.append('https://www.basketball-reference.com' + url_list[j])
url_list = person_list
print(team_my_names[h], end=": ")
file_name = 'files/' + team_my_names[h] + ".xls"
count = 0
print()
for i in range(len(url_list)):
print(names[i], end=',')
response = requests.get(url_list[i], headers=headers)
html = etree.HTML(response.text)
year_lst = html.xpath('//*[@id="per_game"]/tbody/tr/th/a/text()')
team_name = html.xpath('//*[@id="per_game"]/tbody/tr/td[@data-stat="team_id"]//text()')
for j in range(len(year_lst)):
sht1.write(count, 0, names[i])
sht1.write(count, 1, year_lst[j])
sht1.write(count, 2, 'NULL' if len(team_name) < j else team_name[j])
count += 1
xls.save(file_name)
print()
边栏推荐
猜你喜欢
随机推荐
Selenium ide installation recording and local project maintenance
Install VM tools
opencv
2022-06-23 VGMP-OSPF-域间安全策略-NAT策略(更新中)
Summary of remote connection of MySQL
DNS forward query:
The list of "I'm crazy about open source" was released in the first week, with 160 developers on the list
每日刷題記錄 (十一)
【类和对象】深入浅出类和对象
POI dealing with Excel learning
YOLOV2学习与总结
vmware虚拟机C盘扩容
Pytorch exercise items
Some thoughts on machine learning
VMware virtual machine C disk expansion
Scroll view specifies the starting position of the scrolling element
Derivation of variance iteration formula
UTC time, GMT time, CST time
Selenium - 改变窗口大小,不同机型呈现的宽高长度会不一样
Pdf files can only print out the first page