[crawler] XPath for data extraction
2022-07-04 23:10:00 【Speech unrecognized】
Install
pip install lxml
Import
from lxml import etree
Usage
Convert an HTML string into an element object
# Convert an HTML string into an element object (html_str holds the page source)
from lxml import etree
element = etree.HTML(html_str)
Everything below extracts content by calling element.xpath('matching rule') on that element object.
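As a minimal sketch of the parse-then-query flow described above: the sample markup and the variable name paragraphs are assumptions made up for illustration; only etree.HTML and .xpath come from the text.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
html_str = """
<html>
  <body>
    <h1>Top films</h1>
    <p>First paragraph</p>
    <p>Second paragraph</p>
  </body>
</html>
"""

# etree.HTML parses the string and returns the root <html> element.
element = etree.HTML(html_str)

# For expressions that select nodes, .xpath() returns a list,
# even when only one node (or none) matches.
paragraphs = element.xpath("//p")

print(element.tag)         # html
print(len(paragraphs))     # 2
print(paragraphs[0].text)  # First paragraph
```

Because the result is a list, index into it or loop over it before reading .text or attributes.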
Get tags
Use / for the root node and as the separator between the steps of a path
/html/xx/xx/xxx
Use // to skip across levels and go straight to the desired tag or text
//xxx # Get all xxx tags
Use .
./ # The current node
Use ..
../ # The parent of the current node
.// # Use this when querying from a sub-element rather than the whole HTML document; the path is then relative to the current node
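The path operators above can be compared side by side. This is a sketch on invented markup (the div ids and variable names are hypothetical); it also shows why the leading dot matters when querying from a sub-element.

```python
from lxml import etree

# Hypothetical sample markup, for illustration only.
html_str = """
<html><body>
  <div id="a"><span>one</span></div>
  <div id="b"><span>two</span><p><span>three</span></p></div>
</body></html>
"""
root = etree.HTML(html_str)

# Absolute path: every step from the root is spelled out,
# so only spans that are direct children of a div match.
absolute = root.xpath("/html/body/div/span")

# // skips intermediate levels and finds every span in the document.
anywhere = root.xpath("//span")

# Narrow down to one element, then query relative to it.
div_b = root.xpath("//div[@id='b']")[0]
inside_b = div_b.xpath(".//span")  # only spans below div b
leaked = div_b.xpath("//span")     # without the dot, // still searches the whole document
parent = div_b.xpath("..")[0]      # the parent of div b

print([s.text for s in absolute])  # ['one', 'two']
print(len(anywhere))               # 3
print([s.text for s in inside_b])  # ['two', 'three']
print(len(leaked))                 # 3
print(parent.tag)                  # body
```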
Get attributes
@attribute-name gets the value of that attribute on the current tag
//img/@src # The src attribute of every img
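A short sketch of attribute extraction, assuming some made-up img tags (the paths and alt text are invented). It also shows the equivalent .get() call on an element, which is part of lxml's element API.

```python
from lxml import etree

# Hypothetical sample markup, for illustration only.
html_str = """
<html><body>
  <img src="/static/a.png" alt="first">
  <img src="/static/b.png">
</body></html>
"""
root = etree.HTML(html_str)

# @src yields the attribute values themselves, as strings.
srcs = root.xpath("//img/@src")

# Tags that lack the attribute simply contribute nothing to the result.
alts = root.xpath("//img/@alt")

# Alternative: select the element first, then read the attribute in Python.
first_img = root.xpath("//img")[0]

print(srcs)                  # ['/static/a.png', '/static/b.png']
print(alts)                  # ['first']
print(first_img.get("src"))  # /static/a.png
```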
Get text
/text() gets the text content of a tag
//tag-name[contains(text(), 'text')] gets the tags whose text contains the given text
//ol/li//span[contains(text(),'Playable')]
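The two text rules above can be sketched as follows. The film list, class names, and variable names are assumptions for illustration, not the page the post was written against.

```python
from lxml import etree

# Hypothetical sample markup, for illustration only.
html_str = """
<html><body><ol>
  <li><span class="title">Film A</span><span class="playable">[Playable]</span></li>
  <li><span class="title">Film B</span></li>
  <li><span class="title">Film C</span><span class="playable">[Playable]</span></li>
</ol></body></html>
"""
root = etree.HTML(html_str)

# /text() returns the text nodes as a list of strings.
titles = root.xpath("//ol/li/span[@class='title']/text()")

# contains() keeps the spans whose text includes the substring.
marked = root.xpath("//ol/li//span[contains(text(),'Playable')]")

# From each matching span, step up to its li and read that item's title.
playable_titles = [s.xpath("../span[@class='title']/text()")[0] for s in marked]

print(titles)           # ['Film A', 'Film B', 'Film C']
print(len(marked))      # 2
print(playable_titles)  # ['Film A', 'Film C']
```

Note that in XPath 1.0, which lxml implements, contains(text(), ...) tests only the first text node of each tag.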
Get tags that meet specific conditions
//tag-name[@attribute-name='value'] locates specific tags by attribute value
//span[@class='title'] # Select by class name
//tag-name[index] The index starts from 1
Counting from the front: //parent-tag/tag-name[position()>3] selects from the 4th onward
Counting from the back: //parent-tag/tag-name[last()] selects the last one; //parent-tag/tag-name[last() - 2] selects the third from last
Combined: //ol/li[position()>1][position()<last()-2]
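To make the index rules concrete, here is a sketch that runs each positional predicate against a generated list of six items. The list contents and variable names are invented for illustration.

```python
from lxml import etree

# Hypothetical list of six items, generated for illustration only.
items = "".join(f"<li><span class='title'>Film {i}</span></li>" for i in range(1, 7))
root = etree.HTML(f"<html><body><ol>{items}</ol></body></html>")

# Indexes start at 1, so li[2] is the second item.
second = root.xpath("//ol/li[2]/span/text()")

# position()>3 selects from the fourth item onward.
after_third = root.xpath("//ol/li[position()>3]/span/text()")

# last() is the final item; last()-2 is the third from last.
last_one = root.xpath("//ol/li[last()]/span/text()")
third_from_last = root.xpath("//ol/li[last()-2]/span/text()")

# Chained predicates apply in order: the second one counts positions
# within whatever the first one kept.
combined = root.xpath("//ol/li[position()>1][position()<last()-2]/span/text()")

print(second)           # ['Film 2']
print(after_third)      # ['Film 4', 'Film 5', 'Film 6']
print(last_one)         # ['Film 6']
print(third_from_last)  # ['Film 4']
print(combined)         # ['Film 2', 'Film 3']
```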
//tag-name[text()='value'] locates specific tags by their exact text content; the whole text must match
//ol/li//span[text()='[Playable]'] # Matches tags whose content is exactly [Playable]
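The difference between an exact text match and a substring match is easiest to see on a small case. This sketch assumes invented markup in which one span carries extra words after [Playable].

```python
from lxml import etree

# Hypothetical sample markup, for illustration only.
html_str = """
<html><body><ol>
  <li><span class="title">Film A</span><span>[Playable]</span></li>
  <li><span class="title">Film B</span><span>[Playable] from Friday</span></li>
</ol></body></html>
"""
root = etree.HTML(html_str)

# text()='...' requires the text to equal the value character for character.
exact = root.xpath("//ol/li//span[text()='[Playable]']")

# contains() accepts any text that includes the value.
loose = root.xpath("//ol/li//span[contains(text(),'[Playable]')]")

# An attribute predicate picks out the title spans by class.
titles = root.xpath("//span[@class='title']/text()")

print(len(exact))  # 1
print(len(loose))  # 2
print(titles)      # ['Film A', 'Film B']
```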