Basic Web Scraping Code Tutorial
2022-07-03 06:20:00 【pjiang000】
```python
import os

import requests
import xlwt
from lxml import etree

BASE_URL = 'https://www.basketball-reference.com'
HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
}

# Home page: the second item of the site menu holds the team links
response = requests.get(BASE_URL, headers=HEADERS)
html = etree.HTML(response.text)
team_urls = html.xpath('//*[@id="site_menu"]/ul/li[2]/div/a/@href')
team_names = html.xpath('//*[@id="site_menu"]/ul/li[2]/div/a/text()')

os.makedirs('files', exist_ok=True)  # xls.save() fails if the output folder does not exist

for team_name, team_href in zip(team_names, team_urls):
    # Team page: player links sit in the first column of the roster table
    response = requests.get(BASE_URL + team_href, headers=HEADERS)
    html = etree.HTML(response.text)
    player_links = html.xpath('//*[@id="roster"]/tbody/tr/td[1]/a')
    names = [a.text for a in player_links]
    player_urls = [BASE_URL + a.get('href') for a in player_links]

    # One workbook per team, one row per player-season
    xls = xlwt.Workbook()
    sht1 = xls.add_sheet('Sheet1')
    count = 0  # index of the next free row
    print(team_name + ':')
    for name, player_url in zip(names, player_urls):
        print(name, end=',')
        # Player page: season and team columns of the per-game stats table
        response = requests.get(player_url, headers=HEADERS)
        html = etree.HTML(response.text)
        year_lst = html.xpath('//*[@id="per_game"]/tbody/tr/th/a/text()')
        team_ids = html.xpath('//*[@id="per_game"]/tbody/tr/td[@data-stat="team_id"]//text()')
        for j, year in enumerate(year_lst):
            sht1.write(count, 0, name)
            sht1.write(count, 1, year)
            # Write 'NULL' when there are fewer team values than seasons
            sht1.write(count, 2, team_ids[j] if j < len(team_ids) else 'NULL')
            count += 1
    xls.save('files/' + team_name + '.xls')
    print()
```
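The script builds team and player URLs by plain string concatenation with the site root, which assumes every scraped href is root-relative (starts with `/`). A slightly more robust sketch uses the standard library's `urllib.parse.urljoin`, which also handles hrefs that are already absolute. The helper name `absolute_urls` and the sample paths below are made up for illustration; they are not taken from the site.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.basketball-reference.com/'

def absolute_urls(hrefs, base=BASE_URL):
    """Resolve each scraped href against the site's base URL (hypothetical helper)."""
    return [urljoin(base, href) for href in hrefs]

# Illustrative hrefs: one root-relative path, one already-absolute URL
print(absolute_urls(['/teams/ATL/2022.html', 'https://example.com/x']))
```

`urljoin` leaves absolute URLs untouched and resolves relative ones against the base, so the same helper works for both the team links and the player links.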
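The innermost loop writes one row per season: player name, season, and the team value at the same index, falling back to `'NULL'` when the team column yields fewer values than the season column. Pulling that logic into a pure function makes the index check easy to test offline. `build_rows` is a hypothetical helper and the sample data is invented.

```python
def build_rows(player, seasons, team_ids, missing='NULL'):
    """Return one (player, season, team) row per season, padding with `missing` when team_ids is shorter."""
    rows = []
    for j, season in enumerate(seasons):
        team = team_ids[j] if j < len(team_ids) else missing
        rows.append((player, season, team))
    return rows

for row in build_rows('Player A', ['2019-20', '2020-21', '2021-22'], ['ATL', 'BOS']):
    print(row)
```

Because `//text()` only returns text nodes that actually exist, a season row with an empty team cell shifts every later pairing by one position; reading the season and team from the same `tr` element would avoid that drift.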
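The script saves each team with `xlwt`, which only writes the legacy `.xls` format. If a plain CSV file per team is enough, the standard-library `csv` module can write the same rows. This is a swapped-in alternative, not what the script above does; the function name `write_team_csv` and the header names are assumptions.

```python
import csv
import io

def write_team_csv(rows, out):
    """Write (player, season, team) rows to a file-like object as CSV, preceded by a header row."""
    writer = csv.writer(out)
    writer.writerow(['player', 'season', 'team'])  # assumed column names
    writer.writerows(rows)

buf = io.StringIO()
write_team_csv([('Player A', '2021-22', 'ATL')], buf)
print(buf.getvalue())
```

For a real file, open it with `newline=''` (for example `open(path, 'w', newline='', encoding='utf-8')`), as the `csv` documentation recommends, so the writer controls line endings itself.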