A basic web crawler code tutorial
2022-07-03 06:42:00 【pjiang000】
import os

import requests
import xlwt
from lxml import etree

# Root of the site being scraped; relative links found on its pages are appended to it.
base_url = 'https://www.basketball-reference.com'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
}

# The workbooks are saved under files/, so make sure the directory exists.
os.makedirs('files', exist_ok=True)

# Fetch the home page and read the team links and team names from the site menu.
response = requests.get(base_url, headers=headers)
html = etree.HTML(response.text)
teams_list = html.xpath('//*[@id="site_menu"]/ul/li[2]/div/a/@href')
team_my_names = html.xpath('//*[@id="site_menu"]/ul/li[2]/div/a/text()')
# print(teams_list)

for h in range(len(teams_list)):
    # One workbook per team.
    xls = xlwt.Workbook()
    sht1 = xls.add_sheet('Sheet1')

    # Fetch the team page and collect each player's link and name from the roster table.
    response = requests.get(base_url + teams_list[h], headers=headers)
    html = etree.HTML(response.text)
    url_list = [base_url + href for href in html.xpath('//*[@id="roster"]/tbody/tr/td[1]/a/@href')]
    names = [a.text for a in html.xpath('//*[@id="roster"]/tbody/tr/td[1]/a')]
    # print(url_list)
    # print(names)

    print(team_my_names[h], end=": ")
    print()
    file_name = 'files/' + team_my_names[h] + '.xls'
    count = 0  # index of the next free row in the sheet

    for i in range(len(url_list)):
        print(names[i], end=',')
        # Fetch the player page and read the season and team columns of the per-game table.
        response = requests.get(url_list[i], headers=headers)
        html = etree.HTML(response.text)
        year_lst = html.xpath('//*[@id="per_game"]/tbody/tr/th/a/text()')
        team_name = html.xpath('//*[@id="per_game"]/tbody/tr/td[@data-stat="team_id"]//text()')
        for j in range(len(year_lst)):
            sht1.write(count, 0, names[i])
            sht1.write(count, 1, year_lst[j])
            # Fall back to 'NULL' when no team name was scraped for this season.
            sht1.write(count, 2, team_name[j] if j < len(team_name) else 'NULL')
            count += 1

    xls.save(file_name)
    print()
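
Two small pieces of the script above can be pulled out into helpers that are easy to check without touching the network: turning relative links into absolute URLs, and pairing each season with its team name while falling back to 'NULL' when a team name is missing. The following is only a sketch under assumptions: the function names `absolute_urls` and `season_rows`, the sample path, and the sample player are hypothetical and are not part of the original script.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.basketball-reference.com'


def absolute_urls(hrefs, base=BASE_URL):
    """Turn relative hrefs (as returned by an @href XPath query) into absolute URLs."""
    return [urljoin(base, href) for href in hrefs]


def season_rows(player, seasons, teams, missing='NULL'):
    """Build one (player, season, team) row per season.

    If fewer team names than seasons were scraped, the missing
    entries are filled with `missing` instead of raising IndexError.
    """
    rows = []
    for j, season in enumerate(seasons):
        team = teams[j] if j < len(teams) else missing
        rows.append((player, season, team))
    return rows


# Hypothetical sample data, for illustration only.
print(absolute_urls(['/teams/XYZ/']))
print(season_rows('Some Player', ['2019-20', '2020-21'], ['XYZ']))
```

Each tuple returned by `season_rows` maps directly onto one row of the sheet, so the inner loop of the script could write the three fields of a row with `sht1.write(count, col, value)`.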