A Simple, Beginner-Friendly Example of Requesting and Parsing a Page
2022-07-05 12:40:00 【南湖渔歌】
The sample page below is saved as test.html, the file the scripts read:
<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title>Page Title</title>
</head>
<body>
<h1>Heading 1</h1>
<h2>Heading 2</h2>
<h3>Heading 3</h3>
<h4>Heading 4</h4>
<div id="content" class="default">
<p>Paragraph</p>
<a href="http://www.baidu.com">Baidu</a> <br/>
<a href="http://www.crazyant.net">Crazy Ant</a> <br/>
<a href="http://www.iqiyi.com">iQIYI</a> <br/>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body>
</html>

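# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Read the sample page saved as test.html
with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# Print the tag name, href, and text of every link on the page
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# Print the src of the first image on the page
img = soup.find('img')
print(img['src'])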
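Indexing a tag with `link['href']` raises `KeyError` when the attribute is missing, and `soup.find('img')` returns `None` when nothing matches. The following is a minimal defensive sketch of the same extraction. The helper name `extract_links` and the inline sample markup are assumptions made for illustration; they are not part of the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, inlined so the sketch does not depend on test.html.
SAMPLE_HTML = """
<div id="content">
  <a href="http://www.baidu.com">Baidu</a>
  <a>no link here</a>
  <a href="http://www.iqiyi.com">iQIYI</a>
</div>
"""


def extract_links(html_doc):
    """Return (href, text) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html_doc, 'html.parser')
    pairs = []
    for link in soup.find_all('a'):
        href = link.get('href')  # .get() returns None instead of raising KeyError
        if href is not None:
            pairs.append((href, link.get_text()))
    return pairs


links = extract_links(SAMPLE_HTML)
for href, text in links:
    print(href, text)

# find() returns None when there is no match, so check before indexing.
img = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('img')
print(img['src'] if img is not None else 'no <img> found')
```

Returning a list of pairs rather than printing inside the loop makes the result easier to reuse, for example to write the links to a file.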
# Improved version:
from bs4 import BeautifulSoup

with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')
div_node = soup.find('div', id='content')  # first locate the enclosing block
print(div_node)
print("#" * 50)

# Then search for the links and the image only inside that block
links = div_node.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())
img = div_node.find('img')
print(img['src'])
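The same "find the block first, then search inside it" idea can also be written with CSS selectors through Beautiful Soup's `select()` and `select_one()` methods. This is an alternative sketch, not the author's approach; the inline sample markup, including the `example.com` link placed outside the block, is an assumption for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with one link inside and one outside the target block.
SAMPLE_HTML = """
<a href="http://example.com/outside">outside</a>
<div id="content" class="default">
  <a href="http://www.baidu.com">Baidu</a>
  <img src="https://www.python.org/static/img/python-logo.png"/>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# '#content a' matches only <a> tags nested inside the element with id="content".
scoped_hrefs = [a['href'] for a in soup.select('#content a')]
print(scoped_hrefs)

# select_one() returns the first match, or None when nothing matches.
img = soup.select_one('#content img')
print(img['src'])
```

The selector form keeps the scoping in a single expression, while the two-step `find` version makes it easier to print and inspect the intermediate block.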
