Lxml module (data extraction)
2022-07-01 06:27:00 【HHYZBC】
lxml is a third-party parsing library for Python. Before using it for the first time, install it with the following command:
pip install lxml
lxml usage flow
The lxml package provides an etree module, which is dedicated to parsing HTML/XML documents.
- Import the module
from lxml import etree
- Initialize the parsing object
parse_html = etree.HTML(html)
The HTML() function parses an HTML string into an HTML document and automatically repairs malformed markup, such as missing closing tags. parse_html is just a variable name; all subsequent XPath calls are made on this object.
- Call an XPath expression
r_list = parse_html.xpath('xpath expression')
The xpath() method returns everything that matches the XPath expression as a list.
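The three steps above can be put together in a minimal sketch. The HTML string here is made-up sample data (an assumption for illustration, not from any real page), and its closing tags are left out on purpose to show the automatic repair:

```python
from lxml import etree

# Hypothetical sample markup; the last </li> and the </ul> are missing on purpose.
html = """
<ul class="menu">
  <li><a href="/home">Home</a></li>
  <li><a href="/about">About</a>
"""

# Step 2: parse the string. etree.HTML() repairs the broken markup
# and wraps it in <html><body>...</body></html>.
parse_html = etree.HTML(html)

# Step 3: run an XPath expression. The result is always a list.
r_list = parse_html.xpath('//a')

print(len(r_list))            # 2
print(parse_html.tag)         # html

# Serialize the repaired document to inspect what the parser produced.
print(etree.tostring(parse_html, encoding='unicode'))
```

Because the result is a list, an expression that matches nothing returns an empty list rather than raising an error, so it is worth checking the length before indexing.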
XPath expressions
Common path expressions
- nodename
- Select the child elements of the current node that have this name.
- /
- Select from root node .
- //
- Select matching nodes anywhere in the document, regardless of their location.
- .
- Select the current node .
- ..
- Select the parent of the current node .
- @
- Select attributes.
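Each of the path expressions listed above can be tried on a small document. The following is a sketch only; the markup, the id and class values, and the URL are invented for illustration:

```python
from lxml import etree

# Hypothetical sample document.
html = """
<html><body>
  <div id="main">
    <p class="intro">First</p>
    <p>Second</p>
    <a href="https://example.com/a">Link A</a>
  </div>
</body></html>
"""
root = etree.HTML(html)

# /  : absolute path starting at the root node
body = root.xpath('/html/body')

# // : match anywhere in the document
paragraphs = root.xpath('//p')

# Use the <div> as the current (context) node for the relative expressions.
div = root.xpath('//div')[0]

# nodename : child elements of the current node with that name
child_ps = div.xpath('p')

# .  : the current node itself
current = div.xpath('.')

# .. : the parent of the current node
parent = div.xpath('..')

# @  : attribute values
hrefs = root.xpath('//a/@href')
classes = root.xpath('//p/@class')

print(len(paragraphs), len(child_ps))   # 2 2
print(parent[0].tag)                    # body
print(hrefs)                            # ['https://example.com/a']
print(classes)                          # ['intro']
```

Note that only the first `<p>` has a class attribute, so `//p/@class` returns a single value even though two `<p>` elements exist.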
Common methods
- text()
- Returns the text content of the node. For example, given:
<a href="#">Ha ha ha</a>
- applying text() to this node returns Ha ha ha
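As a small sketch of text() in use, the snippet below extracts the link text from two anchors. The second anchor and the wrapping `<div>` are assumptions added for illustration:

```python
from lxml import etree

# Hypothetical fragment; etree.HTML() wraps it in <html><body> automatically.
html = '<div><a href="#">Ha ha ha</a><a href="/next">Next</a></div>'
parse_html = etree.HTML(html)

# text() selects the text nodes of each matched <a>; the result is a list of strings.
texts = parse_html.xpath('//a/text()')
print(texts)            # ['Ha ha ha', 'Next']

# The same text is also available from an element object through its .text attribute.
first_link = parse_html.xpath('//a')[0]
print(first_link.text)  # Ha ha ha
```

Even when only one node matches, the result is still a list, so take index 0 (after checking that the list is not empty) to get the string itself.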
For more functions, see the tutorial:
XPath tutorial (w3school.com.cn) https://www.w3school.com.cn/xpath/index.asp