A Simple, Easy-to-Follow Example of Requesting and Parsing a Page
2022-07-05 12:40:00 【南湖渔歌】
<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title>Page title</title>
</head>
<body>
<h1>Heading 1</h1>
<h2>Heading 2</h2>
<h3>Heading 3</h3>
<h4>Heading 4</h4>
<div id="content" class="default">
<p>Paragraph</p>
<a href="http://www.baidu.com">百度</a> <br/>
<a href="http://www.crazyant.net">疯狂的蚂蚁</a> <br/>
<a href="http://www.iqiyi.com">爱奇艺</a> <br/>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body>
</html>

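# -*- coding=utf-8 -*-
from bs4 import BeautifulSoup

# Read the sample page (the HTML above) from test.html
with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# Print the tag name, target URL and link text of every <a> in the document
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# find() returns only the first matching tag
img = soup.find('img')
print(img['src'])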
# Improved version: restrict the search to one block of the page
from bs4 import BeautifulSoup

with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# First locate the enclosing block, then search only inside it
div_node = soup.find('div', id='content')
print(div_node)
print("#" * 50)

links = div_node.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

img = div_node.find('img')
print(img['src'])
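
The scoped lookup can also be wrapped in a small helper that tolerates missing nodes and attributes. The following is a minimal sketch and not part of the original example: the function name `extract_links` and the inline `SAMPLE_HTML` string are hypothetical, chosen so the snippet runs without `test.html`. It uses `Tag.get()` so that an `<a>` without an `href` yields `None` instead of raising `KeyError`, and it checks for `None` because `find()` returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample, so the sketch does not depend on test.html.
SAMPLE_HTML = """
<div id="content" class="default">
  <a href="http://www.baidu.com">百度</a>
  <a>no href here</a>
</div>
<a href="http://example.com/outside">outside the div</a>
"""


def extract_links(html, container_id="content"):
    """Return (href, text) pairs for every <a> inside the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    if container is None:  # find() returns None when nothing matches
        return []
    # Tag.get() returns None for a missing attribute instead of raising KeyError.
    return [(a.get("href"), a.get_text(strip=True)) for a in container.find_all("a")]


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

Returning plain tuples keeps the parsing step separate from printing, so the same helper could be reused on HTML obtained from a file or from an HTTP response.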
