Simple page request and parsing cases
2022-07-05 13:01:00 【South Lake Fishing Song】
<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title> Webpage title </title>
</head>
<body>
<h1> title 1</h1>
<h2> title 2</h2>
<h3> title 3</h3>
<h4> title 4</h4>
<div id="content" class="default">
<p> The paragraph </p>
<a href="http://www.baidu.com"> Baidu </a> <br/>
<a href="http://www.crazyant.net"> Crazy ant </a> <br/>
<a href="http://www.iqiyi.com"> Iqiyi </a> <br/>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body>
</html>

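# -*- coding=utf-8 -*-
from bs4 import BeautifulSoup

# test.html is assumed to hold the sample page shown above
with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# Print the tag name, target URL, and text of every link on the page
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# Print the source URL of the first image
img = soup.find('img')
print(img['src'])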
# Upgraded version: narrow the search to the content block first
from bs4 import BeautifulSoup

with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# Find the enclosing block first, then search only inside it
div_node = soup.find('div', id='content')
print(div_node)
print("#" * 50)

links = div_node.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

img = div_node.find('img')
print(img['src'])
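
The "find the large block first" idea can also be sketched without BeautifulSoup, using only the standard library's html.parser module. This is a rough illustration rather than the approach used above: ContentLinkParser and SAMPLE_HTML are hypothetical names, the HTML is an inline copy of the sample page so no test.html is needed, and the example.com link outside the div is added only to show that the scoping works.

```python
from html.parser import HTMLParser

# Inline copy of the sample page body. The first link (example.com) is NOT
# in the original page; it is added here to show that links outside the
# target div are skipped.
SAMPLE_HTML = """
<html><body>
<a href="http://example.com/outside">Outside link</a>
<div id="content" class="default">
<p>The paragraph</p>
<a href="http://www.baidu.com">Baidu</a> <br/>
<a href="http://www.crazyant.net">Crazy ant</a> <br/>
<a href="http://www.iqiyi.com">Iqiyi</a> <br/>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body></html>
"""


class ContentLinkParser(HTMLParser):
    """Hypothetical helper: collects <a> and <img> inside <div id="content">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # greater than 0 while inside the target div
        self.links = []      # (href, text) pairs
        self.images = []     # src values
        self._href = None    # href of the <a> currently being read, if any
        self._text = []      # text fragments of that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Enter the target div, or track a nested div inside it
            if self.div_depth > 0 or attrs.get("id") == "content":
                self.div_depth += 1
        elif self.div_depth > 0:
            if tag == "a":
                self._href = attrs.get("href")
                self._text = []
            elif tag == "img":
                self.images.append(attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

    def handle_data(self, data):
        # Only keep text that appears inside a tracked <a>
        if self._href is not None:
            self._text.append(data)


parser = ContentLinkParser()
parser.feed(SAMPLE_HTML)
for href, text in parser.links:
    print("a", href, text)
print(parser.images[0])
```

The depth counter plays the role of div_node: tags are only recorded while the parser is inside the div with id="content", so the outside link is ignored. BeautifulSoup remains the shorter option when it is installed.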
