Crawler case 05 - parsing websites using XPath
2022-06-12 00:47:00 【Smart Aries】
In this case we use XPath to parse the page source and extract the data we want.
1. Get the page source
import requests
from lxml import etree
# Enter the keyword and click Search; the URL you land on is our target URL
url = 'https://beijing.zbj.com/search/f/?kw=python%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90'
# Send a browser User-Agent instead of the default python-requests one
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0"
}
resp = requests.get(url, headers=headers)
print(resp.text)
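The `kw` parameter in the target URL is just the search keyword `python数据分析` ("Python data analysis") percent-encoded as UTF-8. A minimal sketch of building that URL from the keyword instead of copying it from the address bar, assuming the site keeps this `?kw=` pattern; `build_search_url` is a hypothetical helper, not part of any library:

```python
from urllib.parse import quote

# Search endpoint from the article; assumes the keyword goes in the "kw" query parameter.
BASE_URL = "https://beijing.zbj.com/search/f/"


def build_search_url(keyword: str) -> str:
    """Hypothetical helper: percent-encode the keyword as UTF-8, as the browser does."""
    return f"{BASE_URL}?kw={quote(keyword)}"


print(build_search_url("python数据分析"))
```

Alternatively, passing `params={"kw": keyword}` to `requests.get` lets requests do the encoding for you.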
2. Load the HTML source into an element tree
html = etree.HTML(resp.text)  # parse the HTML source into an lxml element tree
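`etree.HTML` parses the response string and returns the root `<html>` element, adding any missing `<html>`/`<body>` wrappers along the way, which is why copied paths start with `/html/body`. A small sketch on a made-up fragment standing in for `resp.text`:

```python
from lxml import etree

# Made-up fragment standing in for resp.text (no <html> or <body> tags on purpose)
snippet = "<div class='item'><p>Hello</p></div>"

root = etree.HTML(snippet)
print(root.tag)                               # the parser wrapped the fragment in <html>
print(etree.tostring(root).decode())          # the repaired document
print(root.xpath("/html/body/div/p/text()"))  # xpath() always returns a list
```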
3. Extract the content
3.1 Locate the content you need in the page


3.2 Copy the XPath
Copying the element's XPath gives the following path:
/html/body/div[6]/div/div/div[2]/div[5]/div[1]/div
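A copied path like this is absolute: it counts `div` positions from the root, so if the site adds an element above the listing (a banner, a notice), the indices shift and the expression silently returns nothing. That is one reason step 3.3 locates the inner fields by their `class` attribute. A sketch with two invented versions of a page shows the difference; the markup and class names are hypothetical:

```python
from lxml import etree

# Two invented versions of the same page; v2 adds a banner <div> at the top.
PAGE_V1 = """<html><body>
<div>header</div>
<div class="list"><div class="card">A</div><div class="card">B</div></div>
</body></html>"""
PAGE_V2 = PAGE_V1.replace("<div>header</div>", "<div>banner</div><div>header</div>")

ABSOLUTE = "/html/body/div[2]/div"     # position-based, like a copied XPath
BY_CLASS = "//div[@class='list']/div"  # anchored on the container's class


def count_matches(page: str) -> tuple[int, int]:
    """Return how many cards each expression finds in the given page."""
    tree = etree.HTML(page)
    return len(tree.xpath(ABSOLUTE)), len(tree.xpath(BY_CLASS))


print("v1:", count_matches(PAGE_V1))
print("v2:", count_matches(PAGE_V2))
```

In v2 the absolute path lands on the header `div`, which has no child `div`s, while the class-based path still finds both cards.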
3.3 Use XPath expressions to extract the content
# Get the div of every service provider
divs = html.xpath("/html/body/div[6]/div/div/div[2]/div[5]/div[1]/div")
for div in divs:
    # The full path would be:
    # prices = div.xpath("./div/div/a[2]/div[2]/div[1]/span[1]/text()")
    # Locating the div by its class attribute is shorter
    price = div.xpath(".//div[@class='service-price clearfix']/span[1]/text()")[0].strip("¥")  # remove the ¥ sign in front of the price
    title = "".join(div.xpath(".//div[@class='service-title']/p//text()"))
    company_name = div.xpath("./div/div/a[1]/div/p/text()")[1].strip()  # the first text node is a newline, so take the second one and strip the line breaks
    loc = div.xpath("./div/div/a[1]/div/div/span/text()")
    print(price)
    print(title)
    print(company_name)
    print(loc)
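Indexing with `[0]` or `[1]` raises `IndexError` as soon as one listing lacks a price or lays out its text nodes differently, and an exact `@class='service-price clearfix'` match fails if the site adds or reorders a class. A more defensive sketch of the same loop, run against invented markup: only the class names `service-price` and `service-title` come from the article, while the rest of the structure and the `first` and `parse_items` helpers are hypothetical:

```python
from lxml import etree

# Invented markup for illustration; only the class names are taken from the article.
SAMPLE = """
<div class="result">
  <div class="item">
    <div class="service-price clearfix"><span>¥1200</span><span>/job</span></div>
    <div class="service-title"><p>Python <em>data analysis</em> service</p></div>
    <p class="company">
      <i>icon</i>
      Acme Studio
    </p>
  </div>
  <div class="item">
    <div class="service-title"><p>Crawler development</p></div>
  </div>
</div>
"""


def first(nodes, default=""):
    """Return the first XPath result, or a default instead of raising IndexError."""
    return nodes[0] if nodes else default


def parse_items(html_text):
    html = etree.HTML(html_text)
    items = []
    for div in html.xpath("//div[@class='item']"):
        # contains() still matches if classes are added (but would also match "service-price-old")
        price = first(div.xpath(".//div[contains(@class, 'service-price')]/span[1]/text()"))
        # //text() collects text from nested tags such as <em>, so join the pieces
        title = "".join(div.xpath(".//div[@class='service-title']/p//text()"))
        # keep only non-blank text nodes instead of relying on a fixed index like [1]
        company = [t.strip() for t in div.xpath(".//p[@class='company']/text()") if t.strip()]
        items.append({
            "price": price.strip().lstrip("¥"),
            "title": title.strip(),
            "company": first(company),
        })
    return items


for item in parse_items(SAMPLE):
    print(item)
```

The second listing has no price or company, and it comes back with empty strings instead of crashing the loop.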