[crawler] XPath for data extraction
2022-07-04 23:10:00 【Speech unrecognized】
Install
pip install lxml
Import
from lxml import etree
Usage
Convert the HTML string into an Element object
# Convert the HTML string into an Element object
from lxml import etree
element = etree.HTML(html_str)
Content is then extracted by calling element.xpath('matching rule') on that object.
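The two steps above can be sketched as follows. This is a minimal example, assuming lxml is installed; the sample markup in html_str is hypothetical and only there to give the parser something to work on.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
html_str = """
<html>
  <body>
    <ol>
      <li><span class="title">First</span></li>
      <li><span class="title">Second</span></li>
    </ol>
  </body>
</html>
"""

# etree.HTML parses the string and returns the root <html> element.
element = etree.HTML(html_str)

# xpath() returns a list of matches; here, every <li> element.
items = element.xpath('//li')

print(element.tag)   # html
print(len(items))    # 2
```

Note that xpath() always returns a list for node selections, so an expression that matches nothing gives an empty list rather than None.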
Selecting tags
Use /
Starts at the root node and separates successive steps in a path
/html/xx/xx/xxx
Use //
Selects across nodes at any depth, going straight to the desired tag or text
//xxx # all xxx tags
Use .
./ # the current node
Use ..
../ # the parent of the current node
.//
Use this when you hold only part of the HTML (an element rather than the whole document) and want a path relative to that element
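A short sketch of how these path forms behave in lxml. The markup and the variable names (root, first_div and so on) are hypothetical. It also shows why .// matters: on an element, a path that starts with // still searches the whole document.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
html_str = """
<html><body>
  <div id="a"><p>one</p><p>two</p></div>
  <div id="b"><p>three</p></div>
</body></html>
"""
root = etree.HTML(html_str)

# / walks an explicit path down from the root node.
absolute = root.xpath('/html/body/div/p')

# // matches at any depth.
anywhere = root.xpath('//p')

# Take the first <div> and query relative to it.
first_div = root.xpath('//div')[0]
inside = first_div.xpath('.//p')   # only <p> tags under first_div
leaked = first_div.xpath('//p')    # starts from the document root again

# .. steps up to the parent of the current node.
parent = inside[0].xpath('..')[0]

print(len(absolute), len(anywhere))   # 3 3
print(len(inside), len(leaked))       # 2 3
print(parent.get('id'))               # a
```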
Getting attributes
@attribute_name
Returns the value of that attribute on the current tag
//img/@src # the src attribute of every img tag
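A minimal sketch of attribute extraction, using hypothetical markup. Tags that lack the attribute are simply absent from the @src result, whereas calling get() on each element returns None for them.

```python
from lxml import etree

# Hypothetical sample markup: the third <img> has no src attribute.
html_str = '<div><img src="a.png" alt="A"><img src="b.png"><img alt="no source"></div>'
root = etree.HTML(html_str)

# @src yields the attribute values as strings.
srcs = root.xpath('//img/@src')

# Alternative: select the elements, then read the attribute from each one.
per_element = [img.get('src') for img in root.xpath('//img')]

print(srcs)          # ['a.png', 'b.png']
print(per_element)   # ['a.png', 'b.png', None]
```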
Getting text
/text()
Returns the text content of the tag
//tag_name[contains(text(), 'text')]
Selects the tags whose text contains the given string
//ol/li//span[contains(text(),'Playable')]
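The sketch below shows both forms on hypothetical markup: /text() returns strings, while a contains() predicate returns the matching elements, from which the text can then be read.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
html_str = """
<ol>
  <li><span>[Playable]</span></li>
  <li><span>Coming soon</span></li>
  <li><span>Playable now</span></li>
</ol>
"""
root = etree.HTML(html_str)

# /text() returns the text nodes themselves.
texts = root.xpath('//ol/li/span/text()')

# contains() keeps the spans whose text includes the substring.
spans = root.xpath("//ol/li//span[contains(text(), 'Playable')]")
matched = [s.text for s in spans]

print(texts)     # ['[Playable]', 'Coming soon', 'Playable now']
print(matched)   # ['[Playable]', 'Playable now']
```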
Selecting tags by condition
//tag_name[@attribute_name='value']
Selects tags by the value of an attribute
//span[@class='title'] # select by class name
//tag_name[index]
Indexes start at 1
From the front: //parent_tag/tag_name[position()>3]
Selects from the 4th tag onward
From the back: //parent_tag/tag_name[last()]
Selects the last tag
//parent_tag/tag_name[last() - 2]
Selects the third tag from the end
Combined: //ol/li[position()>1][position()<last()-2]
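To make the index rules concrete, here is a small sketch that runs each form against a hypothetical list of seven items numbered 1 to 7, so the output shows directly which positions were selected. The helper pick() is introduced only for this example.

```python
from lxml import etree

# Hypothetical list: <li>1</li> ... <li>7</li>
html_str = "<ol>" + "".join(f"<li>{n}</li>" for n in range(1, 8)) + "</ol>"
root = etree.HTML(html_str)

def pick(expr):
    """Return the text of every node matched by expr."""
    return root.xpath(expr + '/text()')

first = pick('//ol/li[1]')                    # indexes start at 1
from_fourth = pick('//ol/li[position()>3]')   # 4th item onward
last = pick('//ol/li[last()]')                # last item
third_from_end = pick('//ol/li[last() - 2]')  # third from the end
# Chained predicates: the second one is applied to the result of the first.
combined = pick('//ol/li[position()>1][position()<last()-2]')

print(first)            # ['1']
print(from_fourth)      # ['4', '5', '6', '7']
print(last)             # ['7']
print(third_from_end)   # ['5']
print(combined)         # ['2', '3', '4']
```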
//tag_name[text()='value']
Selects tags by their exact text content; the whole text must match, character for character
//ol/li//span[text()='[Playable]'] # matches tags whose text is exactly [Playable]
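A minimal sketch of the attribute-value and exact-text predicates on hypothetical markup. The second span shows that text()= does not match when the tag holds any extra text, and the last query chains both predicates.

```python
from lxml import etree

# Hypothetical sample markup, used only for illustration.
html_str = """
<ol>
  <li><span class="title">[Playable]</span></li>
  <li><span class="title">[Playable] trailer</span></li>
  <li><span class="note">[Playable]</span></li>
</ol>
"""
root = etree.HTML(html_str)

# Attribute predicate: every span whose class is exactly "title".
titles = root.xpath("//span[@class='title']")

# Exact-text predicate: the text must equal the string in full.
exact = root.xpath("//ol/li//span[text()='[Playable]']")
exact_classes = [s.get('class') for s in exact]

# Both conditions at once.
both = root.xpath("//span[@class='title'][text()='[Playable]']")

print(len(titles))     # 2
print(exact_classes)   # ['title', 'note']
print(len(both))       # 1
```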