Taiyuan bus route crawling
2022-07-29 07:59:00 【Zhao [email protected]】
Life is too short, so I learned Python
# -*- coding: utf-8 -*-
import requests
from lxml import etree
from fake_useragent import UserAgent

# List used to collect the information for every route
items = []
# Send a Chrome User-Agent string with every request
headers = {
    "User-Agent": UserAgent().chrome
}
def parse_navigation():
    start_url = "https://taiyuan.8684.cn/"
    r = requests.get(url=start_url, headers=headers)
    # Parse the page to get all navigation links
    tree = etree.HTML(r.text)
    # Links for routes whose names begin with a digit
    number_href_list = tree.xpath('//div[@class="bus-layer depth w120"]/div[1]/div/a/@href')
    # Links for routes whose names begin with a letter
    char_href_list = tree.xpath('//div[@class="bus-layer depth w120"]/div[2]/div/a/@href')
    # Return all navigation links
    return number_href_list + char_href_list
def parse_erji_route(content):
    tree = etree.HTML(content)
    # Get the link and the name of every route on the page
    route_list = tree.xpath('//div[@class="list clearfix"]/a/@href')
    route_name = tree.xpath('//div[@class="list clearfix"]/a/text()')
    # Visit each route in turn
    for route, name in zip(route_list, route_name):
        print("Start crawling information for %s" % name)
        # Build the complete URL for the route
        route = "https://taiyuan.8684.cn" + route
        # Send the request and get the response
        r = requests.get(url=route, headers=headers)
        parse_sanji_route(r.text)
        print("Finished crawling information for %s" % name)
def parse_sanji_route(content):
    tree = etree.HTML(content)
    # Extract the fields we want
    # Bus name
    bus_num = tree.xpath('//div[@class="info"]/h1/text()')[0]
    # Operating hours
    run_time = tree.xpath('//ul[@class="bus-desc"]/li[1]/text()')[0]
    # Fare information
    ticket_price_info = tree.xpath('//ul[@class="bus-desc"]/li[2]/text()')[0]
    # Last update time
    update_time = tree.xpath('//ul[@class="bus-desc"]/li[4]/text()')[0]
    # Stop counts: the first entry is the up direction, the second (if present) the down direction
    bus_station_list = tree.xpath('//div[@class="bus-excerpt mb15"]/div/div[@class="total"]/text()')
    up_total = bus_station_list[0]
    # Up-direction stops. Note: this XPath returns the matching <div> element(s), not the stop names as text
    up_station_name = tree.xpath('//div[@class="bus-lzlist mb15"][1]')
    try:
        # Number of down-direction stops
        down_total = bus_station_list[1]
        # Down-direction stops (again the <div> element(s), not text)
        down_station_name_list = tree.xpath('//div[@class="bus-lzlist mb15"][2]')
    except IndexError:
        # The route has no down direction
        down_total = ""
        down_station_name_list = []
    # Store this route's information in a dictionary
    item = {
        "Bus name": bus_num,
        "Operating hours": run_time,
        "Fare information": ticket_price_info,
        "Update time": update_time,
        "Number of up-direction stops": up_total,
        "Up-direction stops": up_station_name,
        "Number of down-direction stops": down_total,
        "Down-direction stops": down_station_name_list,
    }
    items.append(item)
def parse_erji(navi_list):
    # Request each navigation link in turn and crawl every bus route listed on the resulting page
    for first_url in navi_list:
        print("Start crawling all bus information for %s" % first_url)
        # Build the complete URL
        first_url = "https://taiyuan.8684.cn" + first_url
        # Send the request and get the response
        print(first_url)
        r = requests.get(first_url, headers=headers)
        content = r.text
        # Parse the URL of every bus route on the page
        parse_erji_route(content)
        print("Finished crawling all bus information for %s" % first_url)


if __name__ == "__main__":
    # Entry point (an assumption: the original listing never calls its functions).
    # Collect the navigation links, then crawl every route behind them.
    parse_erji(parse_navigation())
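The listing above collects every route into the `items` list but never writes it anywhere. Here is a minimal sketch of one way to persist it, assuming every value in each dictionary is JSON-serializable (strings or lists of strings, not lxml element objects). The helper names `save_items` and `load_items` are hypothetical and not part of the original post.

```python
import json
from pathlib import Path


def save_items(items, path):
    """Write the scraped route dictionaries to `path` as UTF-8 JSON.

    Hypothetical helper (not in the original post). Assumes every value
    is JSON-serializable, e.g. strings or lists of strings.
    Returns the number of records written.
    """
    # ensure_ascii=False keeps Chinese route and stop names readable in the file
    text = json.dumps(items, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
    return len(items)


def load_items(path):
    """Read the route dictionaries back from `path`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

With the crawler above, a call such as `save_items(items, "taiyuan_bus.json")` (the file name is only an example) could follow `parse_erji(parse_navigation())` once the stop lists have been reduced to plain text.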
Copyright notice
This article was written by [Zhao [email protected]]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/210/202207290520358277.html