lxml module (data extraction)
2022-07-01 06:27:00 【HHYZBC】
lxml is a third-party parsing library for Python. Before using it for the first time, install it with the following command:
pip install lxml
lxml usage flow
The lxml package provides an etree module, which is dedicated to parsing HTML/XML documents.
- Import the module
from lxml import etree
- Initialize the parsing object
parse_html = etree.HTML(html)
The HTML() method parses an HTML string into an HTML document object, automatically repairing malformed HTML along the way. parse_html is just a variable name; all subsequent xpath calls are made on this object.
- Call an xpath expression
r_list = parse_html.xpath('xpath expression')
The xpath() method returns the nodes that match the xpath expression as a list.
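The three steps above can be put together in a short sketch. The sample markup, the class name item, and the link targets are made up for illustration; the markup is deliberately left unclosed to show the automatic correction mentioned above.

```python
from lxml import etree

# Hypothetical sample markup: the second <li> and the <ul> are never closed.
html = """
<div>
  <ul>
    <li class="item"><a href="/a">first</a></li>
    <li class="item"><a href="/b">second</a>
</div>
"""

# Step 1: build the parsing object; etree.HTML() repairs the broken markup
# and returns the root <html> element.
parse_html = etree.HTML(html)

# Step 2: call xpath(); a node-selecting expression returns a list of elements.
r_list = parse_html.xpath('//li[@class="item"]/a')

# Step 3: read data off each matched element.
for a in r_list:
    print(a.get("href"), a.text)

# Serialize the repaired tree to inspect what the parser corrected.
print(etree.tostring(parse_html, pretty_print=True).decode())
```

Printing the serialized tree is a convenient way to check how the parser closed the missing tags before writing xpath expressions against it.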
xpath expression
Common path expressions
- nodename
- Select all child nodes of the current node that are named nodename.
- /
- Select from root node .
- //
- Select matching nodes anywhere in the document, regardless of their position.
- .
- Select the current node .
- ..
- Select the parent of the current node .
- @
- Select attributes.
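As a rough illustration of the path expressions listed above, the following sketch runs each one against a small document. The markup, the id main, and the class intro are assumptions chosen only for this example.

```python
from lxml import etree

# Hypothetical sample document.
html = """
<html><body>
  <div id="main">
    <p class="intro">hello</p>
    <p>world</p>
  </div>
</body></html>
"""
root = etree.HTML(html)

# /  : absolute path starting from the root node
body = root.xpath('/html/body')

# // : match nodes anywhere in the document
paragraphs = root.xpath('//p')

# @  : select an attribute; the result is a list of strings
classes = root.xpath('//p/@class')

# nodename : child elements with that name, relative to the node xpath() is called on
div = root.xpath('//div')[0]
children = div.xpath('p')

# .  : the current node
first_p = paragraphs[0]
same = first_p.xpath('.')

# .. : the parent of the current node
parent = first_p.xpath('..')

print(len(paragraphs), classes, parent[0].get("id"))
```

Relative expressions such as p, . and .. are evaluated against the element that xpath() is called on, which is why the sketch first selects div and first_p.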
Common methods
- text()
- Returns the text content of the node. For example, given:
<a href="#">Ha ha ha</a>
- text() returns Ha ha ha.
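A minimal sketch of text() in lxml, using the same anchor tag as the example above. Note that xpath() still returns a list here, so the text is the first item of the result rather than a bare string.

```python
from lxml import etree

root = etree.HTML('<a href="#">Ha ha ha</a>')

# text() selects the text nodes of <a>, so the result is a list of strings.
texts = root.xpath('//a/text()')
print(texts)      # ['Ha ha ha']

# The @ expression extracts the attribute value in the same way.
hrefs = root.xpath('//a/@href')
print(hrefs)      # ['#']

# Guard against an empty result before indexing.
link_text = texts[0] if texts else None
```

Checking for an empty list before indexing avoids an IndexError when the expression matches nothing.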
For more functions, see the XPath tutorial on w3school:
XPath tutorial (w3school.com.cn)
https://www.w3school.com.cn/xpath/index.asp