[crawler] XPath for data extraction
2022-07-04 23:10:00 【Speech unrecognized】
Install
pip install lxml
Import
from lxml import etree
Usage
Convert an HTML string to an element object
# Convert an HTML string to an element object
from lxml import etree
element = etree.HTML(html_str)
Everything below extracts content by calling element.xpath('matching rule') on the element object.
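As a minimal sketch of that workflow, the block below parses a small HTML string and runs two XPath queries against it. The sample markup and the variable names are made up for illustration.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
html_str = """
<html>
  <body>
    <h1>Movies</h1>
    <p>First paragraph</p>
    <p>Second paragraph</p>
  </body>
</html>
"""

# Convert the HTML string to an element object.
element = etree.HTML(html_str)

# For node-selecting expressions like these, .xpath() returns a list,
# even when only one node matches.
paragraphs = element.xpath("//p")
heading_text = element.xpath("/html/body/h1/text()")

print(len(paragraphs))
print(heading_text)
```

Because the result is a list, it is usually worth checking that it is non-empty before indexing into it.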
Select tags
Use / for the root node and as the separator between steps in a path
/html/xx/xx/xxx
Use // to select across levels and go straight to the desired tag or text
//xxx # all xxx tags
Use . for the current node
./ # the current node
Use .. for the parent node
../ # the parent of the current node
Use .// for a path relative to the current node, for example when working on an element that is only part of the HTML document
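The difference between // and .// matters most when .xpath() is called on a sub-element rather than on the whole document. The sketch below uses invented markup to show it: from a sub-element, // still searches the entire document, while .// stays inside that element.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
html_str = """
<html><body>
  <div class="item"><span>A</span></div>
  <div class="item"><span>B</span></div>
  <p><span>outside</span></p>
</body></html>
"""

element = etree.HTML(html_str)
first_div = element.xpath("//div")[0]

# .// is relative: only spans inside first_div are matched.
relative = first_div.xpath(".//span/text()")

# // is absolute: the search restarts from the document root.
absolute = first_div.xpath("//span/text()")

# .. selects the parent of the current node.
parent_tag = first_div.xpath("..")[0].tag

print(relative)
print(absolute)
print(parent_tag)
```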
Get attributes
@attribute-name returns the value of that attribute on the current tag
//img/@src # the src attribute of every img tag
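A short sketch of attribute extraction, using invented markup. It also shows that a tag without the requested attribute simply contributes nothing to the result, and that an attribute can be read from an element object with .get().

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
html_str = (
    '<html><body>'
    '<img src="a.png" alt="first">'
    '<img src="b.png">'
    '<a href="/next">next</a>'
    '</body></html>'
)

element = etree.HTML(html_str)

# One string per matching attribute.
srcs = element.xpath("//img/@src")

# The second img has no alt attribute, so only one value comes back.
alts = element.xpath("//img/@alt")

# Alternative: select the element first, then read the attribute from it.
first_img = element.xpath("//img")[0]
first_src = first_img.get("src")

print(srcs)
print(alts)
print(first_src)
```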
Get text
/text() returns the text content of the tag
//tagname[contains(text(),'some text')] selects the tags whose text contains the given string
//ol/li//span[contains(text(),'Playable')]
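The two forms can be sketched side by side. The list markup below is invented for illustration: /text() returns strings, while the contains() predicate returns the matching element objects.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
html_str = """
<ol>
  <li><span class="title">Film One</span><span class="playable">[Playable]</span></li>
  <li><span class="title">Film Two</span></li>
</ol>
"""

# etree.HTML wraps a fragment in <html><body> automatically.
element = etree.HTML(html_str)

# /text() yields the text of each matching tag as a string.
titles = element.xpath("//ol/li/span[@class='title']/text()")

# contains() filters tags by a substring of their text and yields elements.
playable = element.xpath("//ol/li//span[contains(text(),'Playable')]")

print(titles)
print([span.text for span in playable])
```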
Select tags by condition
//tagname[@attribute='value'] Locate tags by attribute value
//span[@class='title'] # select by class name
//tagname[index] Select by index; indexes start at 1
Counting from the front
//parent/tagname[position()>3] # from the 4th onward
Counting from the back
//parent/tagname[last()] # the last one
//parent/tagname[last()-2] # the third from last
Combined
//ol/li[position()>1][position()<last()-2]
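The index predicates are easy to get off by one, so here is a small sketch against an invented six-item list. The helper name texts is hypothetical and exists only to keep the example short.

```python
from lxml import etree

# Hypothetical sample markup: six list items, item1 through item6.
html_str = "<ol>" + "".join(f"<li>item{i}</li>" for i in range(1, 7)) + "</ol>"
element = etree.HTML(html_str)


def texts(path):
    """Return the text of every tag matched by the given XPath."""
    return element.xpath(path + "/text()")


first = texts("//ol/li[1]")                  # indexes start at 1
after_third = texts("//ol/li[position()>3]")  # from the 4th onward
last = texts("//ol/li[last()]")               # the last one
third_from_last = texts("//ol/li[last()-2]")  # the third from last

# Chained predicates: each one filters the result of the previous one.
middle = texts("//ol/li[position()>1][position()<last()-2]")

print(first, after_third, last, third_from_last, middle)
```

In the chained form, position() and last() in the second predicate are evaluated against the items that survived the first predicate, not against the original list.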
//tagname[text()='value'] Locate tags by their text content; the text must match exactly
//ol/li//span[text()='[Playable]'] # tags whose text is exactly [Playable]
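To illustrate the "must match exactly" point, the sketch below compares text()= with contains() on invented markup. Only the span whose text is exactly [Playable] passes the equality test.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
html_str = """
<ol>
  <li><span>[Playable]</span></li>
  <li><span>[Playable] HD</span></li>
  <li><span>Unavailable</span></li>
</ol>
"""

element = etree.HTML(html_str)

# Exact match: the whole text must equal the given string.
exact = element.xpath("//ol/li//span[text()='[Playable]']")

# Substring match: any text containing the given string qualifies.
partial = element.xpath("//ol/li//span[contains(text(),'Playable')]")

print([span.text for span in exact])
print([span.text for span in partial])
```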