当前位置:网站首页>[crawler] XPath for data extraction
[crawler] XPath for data extraction
2022-07-04 23:10:00 【Speech unrecognized】
install
pip install lxml
Guide pack
from lxml import etree
Use
take html character string Convert to element object
# take html character string Convert to element object
from lxml import etree
element = etree.HTML(html_str)
The following is through element object .xpath(' Matching rules ') To extract content
Get tag
Use / Represents the root node , Path and transition between paths
/html/xx/xx/xxx
Use // Cross node selection , Go directly to the desired label or text
//xxx # Get all xxx label
Use .
./ Current node
Use ..
../ # The upper node of the current node
.// When not complete html when , Use , Get relative path
get attribute
@ Property name Get the current tag The attribute value corresponding to this attribute
//img/@src # all img Of scr attribute
Get text
/text() Get the text content in the tag // Tag name [contains( text() , ' written words ' ) ] Get contains In words label
//ol/li//span[contains(text(),' Playable ')]
Get specific condition tags
// Tag name [@ Property name = value ] Locate specific tags according to their attribute values
//span[@class='title'] # You can get it by class name
// Tag name [ Indexes ] Index from 1 Start
Get from the front // Upper label / Tag name [position()>3] From 4 Start
Get from the back // Upper label / Tag name [last()] Get the last // Upper label / Tag name [last() - 2] Last but not least 3 individual
combination //ol/li[position()>1][position()<last()-2]
// Tag name [text()=' value '] Locate the specific label according to the specific text content in the label , You need to match every word
//ol/li//span[text()='[ Playable ]'] # The matching tag content is [ Playable ] The label of
边栏推荐
- Redis getting started complete tutorial: Geo
- On-off and on-off of quality system construction
- Redis入门完整教程:哈希说明
- 常用技术指标之一文读懂BOLL布林线指标
- A complete tutorial for getting started with redis: redis shell
- mamp下缺少pcntl扩展的解决办法,Fatal error: Call to undefined function pcntl_signal()
- Wechat official account solves the cache problem of entering from the customized menu
- Redis入门完整教程:初识Redis
- Notepad++--编辑的技巧
- Redis入门完整教程:HyperLogLog
猜你喜欢

【剑指offer】1-5题

小程序vant tab组件解决文字过多显示不全的问题

SHP data making 3dfiles white film

Duplicate ADMAS part name

C语言快速解决反转链表

Three stage operations in the attack and defense drill of the blue team

LabVIEW中比较两个VI

debug和release的区别

vim编辑器知识总结

Set up a website with a sense of ceremony, and post it to 1/2 of the public network through the intranet
随机推荐
Servlet服务器端和客户端中文输出乱码问题
Google Earth engine (GEE) - tasks upgrade enables run all to download all images in task types with one click
云服务器设置ssh密钥登录
MySQL数据库备份与恢复--mysqldump命令
Docker镜像的缓存特性和Dockerfile
JS card style countdown days
How can enterprises cross the digital divide? In cloud native 2.0
Redis introduction complete tutorial: List explanation
MariaDB的Galera集群应用场景--数据库多主多活
ScriptableObject
Redis入门完整教程:列表讲解
【ODX Studio编辑PDX】-0.2-如何对比Compare两个PDX/ODX文件
[ODX studio edit PDX] - 0.2-how to compare two pdx/odx files of compare
常用技术指标之一文读懂BOLL布林线指标
String类中的常用方法
HMS core unified scanning service
LIst 相关待整理的知识点
Redis入门完整教程:Bitmaps
ETCD数据库源码分析——处理Entry记录简要流程
该如何去选择证券公司,手机上开户安不安全