lxml module (data extraction)

2022-07-01 06:27:00 HHYZBC

lxml is a third-party parsing library for Python. Before using it for the first time, install it with the following command:

pip install lxml

lxml usage flow

The lxml package provides an etree module, which is dedicated to parsing HTML/XML documents.

  • Import the module
from lxml import etree
  • Create the parse object
parse_html = etree.HTML(html)

The HTML() function parses an HTML string into an element tree and automatically corrects malformed HTML, for example by adding missing closing tags. parse_html is just a variable name; all subsequent xpath() calls are made on this object.
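To make the automatic correction concrete, here is a minimal sketch. The broken HTML fragment is a made-up example, not taken from a real page; it leaves its <li> tags unclosed and has no <html> or <body> wrapper.

```python
from lxml import etree

# Hypothetical, deliberately broken fragment: the <li> tags are never closed
# and there is no <html> or <body> wrapper.
html = "<ul><li>first<li>second</ul>"

parse_html = etree.HTML(html)

# etree.HTML returns the root element of the parsed tree.
print(parse_html.tag)

# Serialize the tree back to text to inspect what the parser repaired.
repaired = etree.tostring(parse_html, encoding="unicode")
print(repaired)
```

The printed markup should show the list wrapped in <html> and <body> elements, with each <li> properly closed.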

  • Call the xpath() method with an XPath expression
r_list = parse_html.xpath('xpath expression')

The xpath() method returns the nodes that match the XPath expression as a list.

XPath expressions
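A short sketch of the list return value follows. The HTML snippet and the class name "title" are invented for illustration. Note that an expression with no matches yields an empty list rather than None, so it is worth checking the length before indexing.

```python
from lxml import etree

# Hypothetical sample document used only for this illustration.
html = """
<div>
  <p class="title">lxml</p>
  <p class="title">XPath</p>
</div>
"""
parse_html = etree.HTML(html)

# Two <p> elements match, so the list has two entries.
r_list = parse_html.xpath('//p[@class="title"]')
print(type(r_list), len(r_list))

# Nothing matches here, so the result is an empty list.
no_match = parse_html.xpath('//span')
print(no_match)
```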

Common path expressions

  • nodename
    • Selects the child elements of the current node that are named nodename.
  • /
    • Selects from the root node.
  • //
    • Selects nodes that match the selection anywhere in the document, regardless of their location.
  • .
    • Selects the current node.
  • ..
    • Selects the parent of the current node.
  • @
    • Selects attributes.
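The path expressions above can be tried out with a sketch like the following. The document, the id "menu", and the link targets are all assumptions made up for the example.

```python
from lxml import etree

# Hypothetical sample document used only for this illustration.
html = """
<html><body>
  <div id="menu">
    <a href="/home">Home</a>
    <a href="/about">About</a>
  </div>
</body></html>
"""
root = etree.HTML(html)

# "/" walks an absolute path starting at the root node.
divs = root.xpath('/html/body/div')

# "//" finds matching nodes anywhere in the document.
links = root.xpath('//a')

# "@" selects attributes; the result is a list of attribute values.
hrefs = root.xpath('//a/@href')

# "." and ".." are evaluated relative to the element xpath() is called on.
first_link = links[0]
same = first_link.xpath('.')
parent = first_link.xpath('..')

# A bare node name selects child elements with that name.
children = divs[0].xpath('a')

print(hrefs)
print(parent[0].tag, parent[0].get('id'))
print(len(children))
```

Calling xpath() on an element rather than on the root, as done with first_link and divs[0], makes that element the current node for relative expressions.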

Common methods

  • text()
    • Returns the text content of the node, for example:
    • <a href="#">Ha ha ha</a>
    • Using text() on this node returns "Ha ha ha".
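As a minimal sketch of text() in lxml, using the link from the example: the result is still a list, here a list of strings, so the first item has to be taken out explicitly. The guard against an empty list is an addition for safety, not part of the original example.

```python
from lxml import etree

parse_html = etree.HTML('<a href="#">Ha ha ha</a>')

# text() selects the text nodes of the matched element.
texts = parse_html.xpath('//a/text()')
print(texts)

# Take the first string if there is one, otherwise fall back to None.
first_text = texts[0] if texts else None
print(first_text)
```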

For more functions, see the XPath tutorial:

XPath tutorial (w3school.com.cn): https://www.w3school.com.cn/xpath/index.asp
