lxml module (data extraction)
2022-07-01 06:27:00 【HHYZBC】
lxml is a third-party Python parsing library. Before using it for the first time, install it with the following command:
pip install lxml
lxml usage flow
The lxml module provides an etree module, which is dedicated to parsing HTML/XML documents.
- Import the module
from lxml import etree
- Initialize the parsing object
parse_html = etree.HTML(html)
The HTML() method parses an HTML string into an HTML document and automatically corrects malformed HTML. parse_html is just a variable name; all subsequent XPath calls are made on this object.
- Call an XPath expression
r_list = parse_html.xpath('xpath expression')
The xpath() method returns the nodes that match the XPath expression as a list.
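The three steps above can be put together as a minimal sketch. The HTML string here is a made-up sample (not from any real page), and it is deliberately broken so that the automatic correction is visible in the serialized output.

```python
from lxml import etree

# Hypothetical sample markup: the second <li> and the <ul> are never closed.
html = """
<div id="menu">
  <ul>
    <li><a href="/home">Home</a></li>
    <li><a href="/about">About</a>
</div>
"""

# Initialize the parsing object; etree.HTML() repairs the broken markup
# and wraps it in <html><body>...</body></html>.
parse_html = etree.HTML(html)

# Call an XPath expression; matching elements come back as a list.
r_list = parse_html.xpath('//li/a')
print(type(r_list), len(r_list))

# Serialize the corrected document to inspect what the parser built.
fixed = etree.tostring(parse_html, encoding='unicode')
print(fixed)
```

Printing the serialized tree is a convenient way to check how the parser repaired the input before writing XPath expressions against it.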
XPath expressions
Common path expressions
- nodename
  - Selects the child elements of the current node that are named nodename.
- /
  - Selects from the root node.
- //
  - Selects matching nodes anywhere in the document, regardless of their location.
- .
  - Selects the current node.
- ..
  - Selects the parent of the current node.
- @
  - Selects attributes.
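As a rough illustration of the path expressions listed above, the sketch below applies each one to a small, made-up document. The markup and variable names are assumptions chosen for the example only.

```python
from lxml import etree

# Hypothetical sample document used only for illustration.
html = """
<html><body>
  <div class="box">
    <p>first</p>
    <a href="/one">one</a>
  </div>
  <p>second</p>
</body></html>
"""
root = etree.HTML(html)

root_nodes = root.xpath('/html')   # /  : start from the root node
all_p = root.xpath('//p')          # // : every <p>, wherever it sits

div = root.xpath('//div')[0]       # take one element as the current node
child_p = div.xpath('p')           # nodename : child elements named p
current = div.xpath('.')           # .  : the current node itself
parent = div.xpath('..')           # .. : the parent of the current node

hrefs = root.xpath('//a/@href')    # @  : attribute values, as strings

print(len(all_p), len(child_p), parent[0].tag, hrefs)
```

Note that calling xpath() on an element evaluates relative expressions such as `p`, `.` and `..` against that element, while expressions starting with `/` or `//` still search from the document root.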
Common methods
- text()
  - Returns the text of the node. For example, for the element
    <a href="#">Ha ha ha</a>
    text() returns "Ha ha ha".
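A short sketch of text() in lxml, using the same kind of anchor element as the example above. One detail worth noting is that the result is still a list, so a single match has to be taken out by index, and a non-matching expression yields an empty list.

```python
from lxml import etree

# Sample element mirroring the example in the text.
html = '<a href="#">Ha ha ha</a>'
parse_html = etree.HTML(html)

# text() selects the text nodes of the matched elements.
texts = parse_html.xpath('//a/text()')
print(texts)

# Guard against an empty result before indexing.
first = texts[0] if texts else None
print(first)

# No <span> exists in this document, so the result is an empty list.
missing = parse_html.xpath('//span/text()')
print(missing)
```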
For more functions, see the W3School tutorial:
XPath tutorial (w3school.com.cn)
https://www.w3school.com.cn/xpath/index.asp