Simple page request and parsing cases
2022-07-05 13:01:00 【South Lake Fishing Song】
The sample page, saved as test.html:
<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title> Webpage title </title>
</head>
<body>
<h1> title 1</h1>
<h2> title 2</h2>
<h3> title 3</h3>
<h4> title 4</h4>
<div id="content" class="default">
<p> The paragraph </p>
<a href="http://www.baidu.com"> Baidu </a> <br/>
<a href="http://www.crazyant.net"> Crazy ant </a> <br/>
<a href="http://www.iqiyi.com"> Iqiyi </a> <br/>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body>
</html>

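# -*- coding: utf-8 -*-
# Basic version: parse test.html and print every link plus the image URL.
from bs4 import BeautifulSoup

with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# For each <a> tag: tag name, href attribute, visible text
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# First <img> tag in the document
img = soup.find('img')
print(img['src'])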

# Improved version: locate the enclosing block first, then search inside it.
from bs4 import BeautifulSoup

with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# Find the large block first, so later searches cover only its contents
div_node = soup.find('div', id='content')
print(div_node)
print("#" * 50)

links = div_node.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

img = div_node.find('img')
print(img['src'])
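
Both scripts above assume that the div, every href, and the image are always present. On real pages `find` returns `None` when nothing matches, and `link['href']` raises `KeyError` for an anchor without that attribute. The following is only a sketch of one more defensive variant; the helper names `extract_links` and `extract_image_src` and the inline `HTML_DOC` string are assumptions made for illustration and are not part of the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample (not the original test.html): one valid link,
# one anchor without an href, and one image.
HTML_DOC = """
<div id="content" class="default">
  <a href="http://www.baidu.com"> Baidu </a>
  <a> No link target </a>
  <img src="https://www.python.org/static/img/python-logo.png"/>
</div>
"""


def extract_links(html_doc, container_id="content"):
    """Return (href, text) pairs for anchors inside the container div."""
    soup = BeautifulSoup(html_doc, "html.parser")
    container = soup.find("div", id=container_id)
    if container is None:
        # The block is missing, so there is nothing to search.
        return []
    # href=True keeps only <a> tags that actually carry an href attribute.
    return [
        (a["href"], a.get_text(strip=True))
        for a in container.find_all("a", href=True)
    ]


def extract_image_src(html_doc, container_id="content"):
    """Return the src of the first image in the container div, or None."""
    soup = BeautifulSoup(html_doc, "html.parser")
    container = soup.find("div", id=container_id)
    img = container.find("img") if container is not None else None
    # .get() returns None instead of raising when the attribute is absent.
    return img.get("src") if img is not None else None


print(extract_links(HTML_DOC))
print(extract_image_src(HTML_DOC))
```

Returning an empty list or `None` keeps the caller free to decide whether a missing element is an error; raising an exception instead would be an equally reasonable choice.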
