[crawler] XPath for data extraction
2022-07-04 23:10:00 【Speech unrecognized】
Install
pip install lxml
Import
from lxml import etree
Usage
Convert an HTML string to an element object
# Convert an HTML string to an element object
from lxml import etree
element = etree.HTML(html_str)
Everything below extracts content by calling element.xpath('matching rule') on that element object.
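As a minimal sketch of the conversion step, the snippet below parses a small made-up HTML string (the markup and the variable names other than `element` are assumptions for illustration) and runs two simple queries against it.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration
html_str = """
<html>
  <body>
    <h1>Movie list</h1>
    <p>First paragraph</p>
    <p>Second paragraph</p>
  </body>
</html>
"""

# Parse the string; the result is the root <html> element
element = etree.HTML(html_str)

# For node-selecting expressions like these, .xpath() returns a list,
# even when only one node matches
headings = element.xpath('//h1')
paragraphs = element.xpath('//p')

print(element.tag)      # html
print(len(headings))    # 1
print(len(paragraphs))  # 2
```

Because the result is a list, an empty list (rather than an exception) is what you get when nothing matches, so check the length before indexing into it.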
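Get tags
Use /
At the start of a path, / selects from the root node; between steps, it separates one level of the path from the next.
/html/xx/xx/xxx
Use //
Selects across levels, jumping straight to the desired tag or text wherever it sits in the document.
//xxx # Get all xxx tags
Use .
./ # The current node
Use ..
../ # The parent of the current node
Use .//
When you are working from an element rather than the complete HTML document, use .// to search by a path relative to that element.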
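The path operators above can be compared side by side. This is a rough sketch on an invented document (the markup and variable names are assumptions); the last query shows why the leading dot matters when you query from a sub-element.

```python
from lxml import etree

# Hypothetical sample markup: one list inside a div, one outside it
html_str = """
<html>
  <body>
    <div id="main">
      <ul>
        <li>one</li>
        <li>two</li>
      </ul>
    </div>
    <ul>
      <li>three</li>
    </ul>
  </body>
</html>
"""
element = etree.HTML(html_str)

# / walks one level at a time, starting from the root
abs_items = element.xpath('/html/body/div/ul/li')

# // matches at any depth
all_items = element.xpath('//li')

# Take one node, then query relative to it
main_div = element.xpath('//div')[0]
inside_div = main_div.xpath('.//li')   # only the li tags below this div
parent = main_div.xpath('..')[0]       # the parent of the div
whole_doc = main_div.xpath('//li')     # no dot: searches the whole document again

print(len(abs_items), len(all_items))    # 2 3
print(len(inside_div), len(whole_doc))   # 2 3
print(parent.tag)                        # body
```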
Get attributes
@attribute_name
Gets the value of that attribute on the current tag.
//img/@src # The src attribute of every img tag
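To see what an attribute query hands back, here is a small sketch on made-up markup (the file paths and variable names are assumptions). Tags that lack the attribute simply contribute nothing to the result.

```python
from lxml import etree

# Hypothetical sample markup; the second img has no alt attribute
html_str = """
<div>
  <img src="/static/a.png" alt="first">
  <img src="/static/b.png">
  <a href="/detail/1">detail</a>
</div>
"""
element = etree.HTML(html_str)

# Attribute queries return a list of strings, one per matching attribute
srcs = element.xpath('//img/@src')
alts = element.xpath('//img/@alt')
hrefs = element.xpath('//a/@href')

print(srcs)   # ['/static/a.png', '/static/b.png']
print(alts)   # ['first']
print(hrefs)  # ['/detail/1']
```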
Get text
/text()
Gets the text content inside the tag.
//tag_name[contains(text(), 'some text')]
Gets the tags whose text contains the given words.
//ol/li//span[contains(text(), 'Playable')]
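A short sketch of both text queries, assuming an invented movie list (the markup, titles, and variable names are all hypothetical). Note that /text() returns strings, while the contains() query returns the matching elements themselves.

```python
from lxml import etree

# Hypothetical sample markup: only two of the three movies are marked playable
html_str = """
<ol>
  <li><a>Movie A</a><span>[Playable]</span></li>
  <li><a>Movie B</a></li>
  <li><a>Movie C</a><span>[Playable]</span></li>
</ol>
"""
element = etree.HTML(html_str)

# /text() returns the text of each matched tag as a string
titles = element.xpath('//ol/li/a/text()')

# contains() keeps only the span elements whose text includes the word
playable_spans = element.xpath("//ol/li//span[contains(text(), 'Playable')]")

# From each matching span, step up to its li and read the sibling title
playable_titles = [span.xpath('../a/text()')[0] for span in playable_spans]

print(titles)           # ['Movie A', 'Movie B', 'Movie C']
print(playable_titles)  # ['Movie A', 'Movie C']
```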
Get tags matching specific conditions
//tag_name[@attribute_name='value']
Locates specific tags by their attribute values.
//span[@class='title'] # Select by class name
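One thing worth checking when selecting by class: the comparison is against the whole attribute string. The sketch below uses invented markup (names and values are assumptions) to show a tag with two classes being skipped by the = form and picked up by contains().

```python
from lxml import etree

# Hypothetical sample markup; the last span carries two class names
html_str = """
<div>
  <span class="title">Movie A</span>
  <span class="rating">9.1</span>
  <span class="title">Movie B</span>
  <span class="title other">Movie C</span>
</div>
"""
element = etree.HTML(html_str)

# = compares the full attribute value, so class="title other" does not match
titles = element.xpath("//span[@class='title']/text()")

# contains() is looser: it matches any class value that includes 'title'
loose = element.xpath("//span[contains(@class, 'title')]/text()")

print(titles)  # ['Movie A', 'Movie B']
print(loose)   # ['Movie A', 'Movie B', 'Movie C']
```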
//tag_name[index]
Indexes start from 1.
From the front: //parent_tag/tag_name[position()>3]
Starts from the 4th tag.
From the back: //parent_tag/tag_name[last()]
Gets the last tag.
//parent_tag/tag_name[last()-2]
Gets the third tag from the end.
Combined: //ol/li[position()>1][position()<last()-2]
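The index rules are easier to trust after running them on a list of known length. This sketch builds a seven-item list (the item names and variable names are assumptions) and applies each form from above.

```python
from lxml import etree

# Hypothetical list with seven entries: item1 .. item7
items = ''.join(f'<li>item{i}</li>' for i in range(1, 8))
element = etree.HTML(f'<ol>{items}</ol>')

first = element.xpath('//ol/li[1]/text()')                    # indexes start at 1
after_third = element.xpath('//ol/li[position()>3]/text()')   # 4th tag onward
last = element.xpath('//ol/li[last()]/text()')                # the last tag
third_from_last = element.xpath('//ol/li[last()-2]/text()')   # third from the end

# Chained predicates: drop the first tag, then drop the last three
middle = element.xpath('//ol/li[position()>1][position()<last()-2]/text()')

print(first)            # ['item1']
print(after_third)      # ['item4', 'item5', 'item6', 'item7']
print(last)             # ['item7']
print(third_from_last)  # ['item5']
print(middle)           # ['item2', 'item3', 'item4']
```

In the combined form the second predicate is evaluated against the tags that survived the first one, so position() and last() are counted within that shorter list.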
//tag_name[text()='value']
Locates a specific tag by the exact text inside it; the whole text must match, character for character.
//ol/li//span[text()='[Playable]'] # Matches tags whose text is exactly [Playable]
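To make the difference between an exact match and contains() concrete, here is a small sketch on made-up markup (the span texts and variable names are assumptions). Only the span whose text is exactly [Playable] passes the = test.

```python
from lxml import etree

# Hypothetical sample markup with three near-identical span texts
html_str = """
<ol>
  <li><span>[Playable]</span></li>
  <li><span>[Playable] today</span></li>
  <li><span>Playable</span></li>
</ol>
"""
element = etree.HTML(html_str)

# text()= requires the whole text to be equal, brackets included
exact = element.xpath("//ol/li//span[text()='[Playable]']")

# contains() accepts any text that includes the word
partial = element.xpath("//ol/li//span[contains(text(), 'Playable')]")

print(len(exact))    # 1
print(len(partial))  # 3
```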