12 Initializing the BeautifulSoup class
2022-06-22 00:55:00 【Andy Python】
beautifulsoup4 is usually abbreviated to bs4.
bs4 is a third-party Python library.
Its purpose is to extract data from HTML or XML documents.
bs4 is the library.
BeautifulSoup is a class in that library.
[Knowledge review]
By convention, a class name starts with a capital letter.
Class instantiation syntax: object = ClassName(arguments)
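As a quick refresher on the instantiation syntax, here is a minimal sketch. The Student class and its name and age attributes are hypothetical and exist only for illustration; they are not part of bs4.

```python
# Hypothetical class used only to illustrate instantiation syntax
class Student:
    def __init__(self, name, age):
        # Store the arguments on the new object
        self.name = name
        self.age = age


# object = ClassName(arguments)
student = Student("Andy", 18)
print(student.name, student.age)
```

BeautifulSoup objects are created in the same way: call the class name and pass the arguments in parentheses.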
12.1 How to initialize the BeautifulSoup class
1. Initialization steps

2. Initialize a BeautifulSoup object
# Import the BeautifulSoup class from the bs4 library
from bs4 import BeautifulSoup
# Pass in the two parameters markup and features to get an instantiated object
# object = ClassName(arguments)
# html_str (the HTML to parse) and 'html.parser' are placeholder values
soup = BeautifulSoup(markup=html_str, features='html.parser')
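To make the initialization concrete, the following is a minimal runnable sketch. The html_str content is an invented sample, and 'html.parser' (the parser bundled with Python) is chosen here only so that no extra package is required.

```python
from bs4 import BeautifulSoup

# Sample HTML, assumed for illustration only
html_str = (
    "<html><head><title>Test page</title></head>"
    "<body><p>Hello</p></body></html>"
)

# Pass both parameters by keyword: markup and features
soup = BeautifulSoup(markup=html_str, features="html.parser")

# soup is an instance of the BeautifulSoup class
print(type(soup).__name__)
# Text of the <title> tag
print(soup.title.string)
```

Both arguments can also be passed positionally, as in BeautifulSoup(html_str, "html.parser").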
12.2 Meaning of the BeautifulSoup parameters
1. The markup parameter
The markup parameter is the HTML string or file content to be parsed.
1. Use a string variable
# Import the BeautifulSoup class from the bs4 library
from bs4 import BeautifulSoup
# html_str is a string variable, usually the HTML code obtained in a previous step
soup = BeautifulSoup(html_str)
2. Use the open() function to open a file
# Import the BeautifulSoup class from the bs4 library
from bs4 import BeautifulSoup
# Use the open() function to open the file and get a file object
# A file object can also be used as the initialization argument
# 'index.html' is the path of a file containing HTML code
soup = BeautifulSoup(open('index.html'))
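The file-based approach can be sketched as follows. To keep the example self-contained, it first writes a small index.html into a temporary directory; the file content, the UTF-8 encoding, and the choice of 'html.parser' are assumptions for illustration. A with block is used so the file is closed once parsing is done.

```python
import os
import tempfile

from bs4 import BeautifulSoup

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "index.html")

    # Create a sample HTML file (illustrative content)
    with open(path, "w", encoding="utf-8") as f:
        f.write("<html><head><title>Demo page</title></head><body></body></html>")

    # Pass the open file object as the markup argument
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")

# The document was read during initialization, so soup is still usable here
page_title = soup.title.string
print(page_title)
```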
2. The features parameter
The features parameter specifies which parser to use.
1. Specify the parser
# Import the BeautifulSoup class from the bs4 library
from bs4 import BeautifulSoup
# html_str is the HTML code to parse (a string)
# The parser is 'lxml'; note the quotation marks around the parser name
# object = ClassName(arguments)
soup = BeautifulSoup(html_str, 'lxml')
# Import the BeautifulSoup class from the bs4 library
from bs4 import BeautifulSoup
# html_str is the HTML code to parse (a string)
# The parser is 'html.parser'; note the quotation marks around the parser name
# object = ClassName(arguments)
soup = BeautifulSoup(html_str, 'html.parser')
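The 'lxml' parser is a separate third-party package, and bs4 raises FeatureNotFound when the requested parser is not installed. One possible way to handle this, shown as a sketch rather than a recommended pattern, is to fall back to the built-in 'html.parser'. The html_str content and the parser_used variable are assumptions for illustration.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Sample HTML, assumed for illustration only
html_str = "<html><body><p class='intro'>Hello, bs4</p></body></html>"

try:
    # Prefer lxml if it is installed
    soup = BeautifulSoup(html_str, "lxml")
    parser_used = "lxml"
except FeatureNotFound:
    # Fall back to the parser that ships with Python
    soup = BeautifulSoup(html_str, "html.parser")
    parser_used = "html.parser"

print(parser_used, soup.p.get_text())
```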
2. If no parser is specified, BeautifulSoup chooses a default parser to parse the document
# Import the BeautifulSoup class from the bs4 library
from bs4 import BeautifulSoup
# html_str is the HTML code to parse (a string)
# No parser is given here, so BeautifulSoup picks the best one installed and may issue a warning
soup = BeautifulSoup(html_str)
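To see which parser was actually chosen when none is specified, one option is to inspect the NAME attribute of soup.builder. The sketch below also records any warnings raised during initialization instead of printing them to the console. The html_str content is an invented sample, and the result depends on which parsers are installed on the machine.

```python
import warnings

from bs4 import BeautifulSoup

# Sample HTML, assumed for illustration only
html_str = "<p>Hello</p>"

# Record warnings instead of letting them print to stderr
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    soup = BeautifulSoup(html_str)  # no parser specified

# Name of the parser that BeautifulSoup selected
chosen_parser = soup.builder.NAME
print("Parser picked by BeautifulSoup:", chosen_parser)

# Show the categories of any warnings that were recorded
for w in caught:
    print("Warning:", w.category.__name__)
```

Because the chosen parser can differ between machines, and different parsers may build different trees from the same invalid HTML, naming the parser explicitly keeps results consistent.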
12.3 Summary
