A Simple Hands-On Example of Requesting and Parsing a Page
2022-07-05 12:40:00 【南湖渔歌】
<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title>Page title</title>
</head>
<body>
<h1>Heading 1</h1>
<h2>Heading 2</h2>
<h3>Heading 3</h3>
<h4>Heading 4</h4>
<div id="content" class="default">
<p>Paragraph</p>
<a href="http://www.baidu.com">百度</a> <br/>
<a href="http://www.crazyant.net">疯狂的蚂蚁</a> <br/>
<a href="http://www.iqiyi.com">爱奇艺</a> <br/>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body>
</html>
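# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Read the sample page above, saved as test.html next to this script
with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# Print the tag name, target URL, and text of every link
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# Print the source URL of the first image
img = soup.find('img')
print(img['src'])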
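The script above indexes attributes directly (`link['href']`, `img['src']`), which raises an error if a tag lacks the attribute or if `find` matches nothing. A minimal sketch of a more defensive variant follows. The helper names `extract_links` and `first_image_src` and the inline `SAMPLE_HTML` string are assumptions made for illustration; they are not part of the original example.

```python
from bs4 import BeautifulSoup

# Inline sample (hypothetical) so the sketch runs without test.html.
SAMPLE_HTML = """
<div id="content">
<a href="http://www.baidu.com">Baidu</a>
<a>no href here</a>
</div>
"""


def extract_links(html):
    """Return (href, text) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, 'html.parser')
    pairs = []
    for link in soup.find_all('a'):
        href = link.get('href')  # returns None instead of raising KeyError
        if href is not None:
            pairs.append((href, link.get_text(strip=True)))
    return pairs


def first_image_src(html):
    """Return the src of the first <img>, or None if there is no image."""
    soup = BeautifulSoup(html, 'html.parser')
    img = soup.find('img')  # find() returns None when nothing matches
    return img.get('src') if img is not None else None


print(extract_links(SAMPLE_HTML))
print(first_image_src(SAMPLE_HTML))
```

Using `Tag.get` and checking for `None` keeps the script from crashing on pages that are less tidy than the sample.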
# Improved version: locate the enclosing block first, then search inside it
from bs4 import BeautifulSoup

with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

div_node = soup.find('div', id='content')  # find the larger block first
print(div_node)
print("#" * 50)

# Only links and images inside div#content are considered
links = div_node.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

img = div_node.find('img')
print(img['src'])
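The same "find the block, then search inside it" idea can also be written with CSS selectors, which Beautiful Soup exposes through `select` and `select_one`. The sketch below is one possible alternative, not the author's method. The inline `SAMPLE_HTML` string and the extra link outside the `div` are assumptions added to show what scoping filters out.

```python
from bs4 import BeautifulSoup

# Inline sample (hypothetical): one link sits outside div#content on purpose.
SAMPLE_HTML = """
<a href="http://example.com/outside">outside</a>
<div id="content" class="default">
<a href="http://www.baidu.com">Baidu</a>
<a href="http://www.iqiyi.com">iQIYI</a>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Links inside div#content that carry an href attribute
scoped_hrefs = [a['href'] for a in soup.select('div#content a[href]')]

# Every link in the document, for comparison
all_hrefs = [a['href'] for a in soup.find_all('a', href=True)]

# First image inside div#content, or None if there is none
img = soup.select_one('div#content img')

print(scoped_hrefs)
print(len(all_hrefs))
print(img['src'] if img is not None else None)
```

A selector keeps the scoping in a single expression, while the two-step `find` then `find_all` form above makes it easier to inspect the intermediate block.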