Taiyuan bus route crawling
2022-07-29 07:59:00 【Zhao [email protected]】
Life is short, I use Python.
```python
# -*- coding: utf-8 -*-
import requests
from lxml import etree
from fake_useragent import UserAgent

# List that collects the information of every bus route
items = []

# Send a Chrome User-Agent string so the requests look like a normal browser
headers = {
    "User-Agent": UserAgent().chrome
}


def parse_navigation():
    """Fetch the home page and return the links of all route-list pages."""
    start_url = "https://taiyuan.8684.cn/"
    r = requests.get(url=start_url, headers=headers, timeout=10)
    # Parse the page to get all navigation links
    tree = etree.HTML(r.text)
    # Route lists whose names begin with a digit
    number_href_list = tree.xpath('//div[@class="bus-layer depth w120"]/div[1]/div/a/@href')
    # Route lists whose names begin with a letter
    char_href_list = tree.xpath('//div[@class="bus-layer depth w120"]/div[2]/div/a/@href')
    # Return all the links together
    return number_href_list + char_href_list
```
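`parse_navigation` returns relative hrefs, and the crawler later turns them into full URLs by string concatenation. As a minimal sketch of a standard-library alternative, assuming the hrefs are root-relative paths (the `/list1` and `/listA` values below are made up): `urllib.parse.urljoin` resolves both relative and already-absolute links, and a seen-set drops duplicate links while keeping their order. The helper name `to_absolute_urls` is hypothetical.

```python
from urllib.parse import urljoin


def to_absolute_urls(hrefs, base="https://taiyuan.8684.cn/"):
    """Resolve each href against base and drop duplicates, keeping order.

    Hypothetical helper; the hrefs are assumed to come from parse_navigation().
    """
    seen = set()
    urls = []
    for href in hrefs:
        # urljoin handles "/list1" as well as full "https://..." links
        url = urljoin(base, href)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    # "/list1" and "/listA" are made-up example paths
    print(to_absolute_urls(["/list1", "/listA", "/list1"]))
```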
```python
def parse_erji_route(content):
    """Parse a second-level (erji) route-list page and crawl every route on it."""
    tree = etree.HTML(content)
    # Get the link and the name of every route on this page
    route_list = tree.xpath('//div[@class="list clearfix"]/a/@href')
    route_name = tree.xpath('//div[@class="list clearfix"]/a/text()')
    # Walk the links and the names in parallel
    for route, name in zip(route_list, route_name):
        print("Start crawling information about %s" % name)
        # Build the absolute URL of the route page
        route = "https://taiyuan.8684.cn" + route
        # Send a request and parse the route detail page
        r = requests.get(url=route, headers=headers, timeout=10)
        parse_sanji_route(r.text)
        print("Finished crawling information about %s" % name)
```
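`parse_erji_route` reads link targets and link texts with two separate XPath queries and relies on the two lists lining up. An `<a>` element with no direct text node contributes nothing to the `a/text()` result, which would shift every later name. A small sketch with a hypothetical helper, `pair_routes`, that pairs the two lists explicitly, strips stray whitespace from the names, and fails loudly when the counts differ instead of silently mislabeling routes:

```python
from urllib.parse import urljoin


def pair_routes(hrefs, names, base="https://taiyuan.8684.cn/"):
    """Return a list of (route name, absolute URL) tuples.

    Hypothetical helper: hrefs and names are assumed to be the results of the
    two XPath queries in parse_erji_route(), in document order.
    """
    if len(hrefs) != len(names):
        # A link without text (or vice versa) would misalign every later pair
        raise ValueError("got %d links but %d names" % (len(hrefs), len(names)))
    return [(name.strip(), urljoin(base, href)) for href, name in zip(hrefs, names)]
```

Calling it as `pair_routes(route_list, route_name)` would give the loop ready-made `(name, url)` pairs.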
```python
def parse_sanji_route(content):
    """Parse a third-level (sanji) page, i.e. the detail page of one route."""
    tree = etree.HTML(content)
    # Route name
    bus_num = tree.xpath('//div[@class="info"]/h1/text()')[0]
    # Operating hours
    run_time = tree.xpath('//ul[@class="bus-desc"]/li[1]/text()')[0]
    # Fare information
    ticket_price_info = tree.xpath('//ul[@class="bus-desc"]/li[2]/text()')[0]
    # Last update time
    update_time = tree.xpath('//ul[@class="bus-desc"]/li[4]/text()')[0]
    # Station counts: the first entry is the up direction, the second (if any)
    # is the down direction
    bus_station_list = tree.xpath('//div[@class="bus-excerpt mb15"]/div/div[@class="total"]/text()')
    up_total = bus_station_list[0]
    # Station names in the up direction
    # (assumes each station is rendered as an <a> link inside the list)
    up_station_name = tree.xpath('(//div[@class="bus-lzlist mb15"])[1]//a/text()')
    try:
        # Number of stations in the down direction
        down_total = bus_station_list[1]
        # Station names in the down direction
        down_station_name_list = tree.xpath('(//div[@class="bus-lzlist mb15"])[2]//a/text()')
    except IndexError:
        # Routes that run in only one direction have no second entry
        down_total = ""
        down_station_name_list = []
    # Store this route's information in a dictionary
    item = {
        "Bus name": bus_num,
        "Operating hours": run_time,
        "Fare information": ticket_price_info,
        "Update time": update_time,
        "Up-direction station count": up_total,
        "Up-direction station names": up_station_name,
        "Down-direction station count": down_total,
        "Down-direction station names": down_station_name_list,
    }
    items.append(item)
```
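Every field in `parse_sanji_route` is read with `[0]`, which raises `IndexError` as soon as a page lacks that node, while the station counts use a try/except for the same problem. Since lxml's `xpath()` returns a plain Python list for `text()` queries, one hypothetical helper can give every field a default instead, so a single unusual page does not stop the whole crawl. This is a sketch, not the author's approach:

```python
def nth_or_default(results, index=0, default=""):
    """Return results[index] with surrounding whitespace stripped, or default.

    Hypothetical helper; `results` is assumed to be the list returned by
    tree.xpath(...) for a text() query.
    """
    if 0 <= index < len(results):
        value = results[index]
        # lxml text results are str subclasses, so strip() applies to them
        return value.strip() if isinstance(value, str) else value
    return default
```

With it, the try/except collapses to `up_total = nth_or_default(bus_station_list, 0)` and `down_total = nth_or_default(bus_station_list, 1)`.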
```python
def parse_erji(navi_list):
    """Request every navigation link and crawl the routes listed on each page."""
    for first_url in navi_list:
        print("Start crawling all bus information for %s" % first_url)
        # Build the complete URL
        first_url = "https://taiyuan.8684.cn" + first_url
        print(first_url)
        # Send a request and get the response
        r = requests.get(first_url, headers=headers, timeout=10)
        # Extract every route URL on the page and crawl it
        parse_erji_route(r.text)
        print("Finished crawling all bus information for %s" % first_url)


if __name__ == "__main__":
    navi_list = parse_navigation()
    parse_erji(navi_list)
    print("Crawled %d routes" % len(items))
```
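The script keeps everything in the global `items` list but never writes it anywhere. A minimal sketch of saving the results as JSON with the standard library, assuming `items` holds dictionaries of strings and lists of strings (lxml `Element` objects are not JSON-serializable). `ensure_ascii=False` keeps Chinese route and station names readable in the file instead of `\uXXXX` escapes; the file name `taiyuan_bus.json` and both helper names are hypothetical.

```python
import json


def save_items(items, path="taiyuan_bus.json"):
    """Write the crawled route dictionaries to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes non-ASCII text as-is
        json.dump(items, f, ensure_ascii=False, indent=2)


def load_items(path="taiyuan_bus.json"):
    """Read the saved route dictionaries back."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `save_items(items)` right after `parse_erji(navi_list)` would persist one crawl.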
Copyright notice
This article was written by [Zhao [email protected]]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/210/202207290520358277.html