A basic tutorial on crawler code
2022-07-03 06:42:00 【pjiang000】
import os

import requests
import xlwt
from lxml import etree

# Root of the site being scraped; relative links found on its pages are joined onto it.
base_url = 'https://www.basketball-reference.com'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
}
# One .xls file per team is written to files/, so make sure the directory exists.
os.makedirs('files', exist_ok=True)

# Fetch the home page and read the team links and team names from the site menu.
response = requests.get(base_url, headers=headers)
html = etree.HTML(response.text)
url_list = html.xpath('//*[@id="site_menu"]/ul/li[2]/div/a/@href')
team_my_names = html.xpath('//*[@id="site_menu"]/ul/li[2]/div/a/text()')
teams_list = url_list
# print(teams_list)
for h in range(len(teams_list)):
    # Each team gets its own workbook with a single sheet.
    team_url = 'https://www.basketball-reference.com' + teams_list[h]
    xls = xlwt.Workbook()
    sht1 = xls.add_sheet('Sheet1')

    # The roster table on the team page links to every player's page.
    response = requests.get(team_url, headers=headers)
    html = etree.HTML(response.text)
    url_list = html.xpath('//*[@id="roster"]/tbody/tr/td[1]/a/@href')
    names = html.xpath('//*[@id="roster"]/tbody/tr/td[1]/a')
    # print(url_list)
    # print(names)
    names = [a.text for a in names]
    url_list = ['https://www.basketball-reference.com' + u for u in url_list]

    print(team_my_names[h], end=": ")
    file_name = 'files/' + team_my_names[h] + ".xls"
    count = 0  # next free row in the sheet
    print()
    for i in range(len(url_list)):
        print(names[i], end=',')
        response = requests.get(url_list[i], headers=headers)
        html = etree.HTML(response.text)
        # Season labels and team abbreviations from the player's per-game table.
        year_lst = html.xpath('//*[@id="per_game"]/tbody/tr/th/a/text()')
        team_name = html.xpath('//*[@id="per_game"]/tbody/tr/td[@data-stat="team_id"]//text()')
        for j in range(len(year_lst)):
            sht1.write(count, 0, names[i])
            sht1.write(count, 1, year_lst[j])
            # Fall back to 'NULL' when a season has no matching team cell.
            sht1.write(count, 2, team_name[j] if j < len(team_name) else 'NULL')
            count += 1
    # Save once per team, after all of its players have been written.
    xls.save(file_name)
    print()
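The row-building step in the innermost loop can be pulled out of the network and spreadsheet code so it can be checked offline. The following is only a minimal sketch: the helper name `build_rows`, its `missing` parameter, and the sample player and team values are made up for illustration and are not part of the original script.

```python
def build_rows(player, seasons, teams, missing='NULL'):
    """Pair each season with its team; pad with `missing` when `teams` is shorter."""
    rows = []
    for j, season in enumerate(seasons):
        # Same guard as in the crawler: only index `teams` when j is in range.
        team = teams[j] if j < len(teams) else missing
        rows.append((player, season, team))
    return rows


# Hypothetical sample data: three seasons but only two team cells.
for row in build_rows('Player A', ['2019-20', '2020-21', '2021-22'], ['BOS', 'LAL']):
    print(row)
```

In the crawler, each returned tuple would map to one `sht1.write` call per column, with `count` incremented once per row.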