A Simple, Beginner-Friendly Example of Requesting and Parsing a Page
2022-07-05 12:40:00 【南湖渔歌】
<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title>Page title</title>
</head>
<body>
<h1>Heading 1</h1>
<h2>Heading 2</h2>
<h3>Heading 3</h3>
<h4>Heading 4</h4>
<div id="content" class="default">
<p>Paragraph</p>
<a href="http://www.baidu.com">Baidu</a> <br/>
<a href="http://www.crazyant.net">Crazy Ant</a> <br/>
<a href="http://www.iqiyi.com">iQIYI</a> <br/>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body>
</html>

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Read the sample page above, saved as test.html
with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# Print the tag name, URL, and text of every link
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# Print the address of the first image
img = soup.find('img')
print(img['src'])
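The loop above indexes `link['href']` directly, which raises `KeyError` for an `<a>` tag that has no `href`, and `soup.find('img')` returns `None` when the page has no image. The following is a minimal sketch of a more defensive variant, not part of the original post: the helper names `extract_links` and `extract_image_src` and the inline `SAMPLE_HTML` string are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample so the sketch runs without test.html
SAMPLE_HTML = """
<div id="content" class="default">
  <a href="http://www.baidu.com">Baidu</a>
  <a name="top">anchor without href</a>
  <img src="https://www.python.org/static/img/python-logo.png"/>
</div>
"""


def extract_links(html_doc):
    """Return (href, text) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html_doc, 'html.parser')
    pairs = []
    for link in soup.find_all('a'):
        href = link.get('href')  # None instead of KeyError when href is absent
        if href is not None:
            pairs.append((href, link.get_text(strip=True)))
    return pairs


def extract_image_src(html_doc):
    """Return the src of the first <img>, or None if there is no image."""
    soup = BeautifulSoup(html_doc, 'html.parser')
    img = soup.find('img')  # find() returns None when nothing matches
    return img.get('src') if img is not None else None


if __name__ == '__main__':
    print(extract_links(SAMPLE_HTML))
    print(extract_image_src(SAMPLE_HTML))
```

Returning plain tuples instead of printing keeps the parsing step easy to reuse and to check.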
# Improved version:
from bs4 import BeautifulSoup

with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')
div_node = soup.find('div', id='content')  # first locate the enclosing block
print(div_node)
print("#" * 50)

# Then search for the links and the image only inside that block
links = div_node.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())
img = div_node.find('img')
print(img['src'])
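The title mentions requesting a page, but both scripts read `test.html` from disk. As a rough sketch of the request step, assuming only the standard library's `urllib.request` (the original post does not say which HTTP client it has in mind), a helper like the hypothetical `fetch_html` below could supply the HTML string. The timeout and fallback encoding are illustrative defaults.

```python
import urllib.request


def fetch_html(url, timeout=10, default_encoding='utf-8'):
    """Download url and return the response body decoded to str."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Prefer the charset declared in the Content-Type header, if any
        charset = response.headers.get_content_charset() or default_encoding
    # errors='replace' keeps going if a few bytes do not match the charset
    return raw.decode(charset, errors='replace')
```

The returned string can be passed to `BeautifulSoup(html_doc, 'html.parser')` in place of `fin.read()`, for example `html_doc = fetch_html('https://www.python.org/')`; any URL used this way should be a page you are permitted to fetch.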
