A simple, hands-on example of requesting and parsing a page
2022-07-05 12:40:00 【南湖渔歌】
Save the following HTML as test.html in the directory you run the scripts from:
<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title>Page title</title>
</head>
<body>
<h1>Heading 1</h1>
<h2>Heading 2</h2>
<h3>Heading 3</h3>
<h4>Heading 4</h4>
<div id="content" class="default">
<p>Paragraph</p>
<a href="http://www.baidu.com">Baidu</a> <br/>
<a href="http://www.crazyant.net">Crazy Ant</a> <br/>
<a href="http://www.iqiyi.com">iQIYI</a> <br/>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body>
</html>
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Read the sample page saved as test.html
with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# Print the tag name, target URL, and text of every link
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# Print the source URL of the first image
img = soup.find('img')
print(img['src'])
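The script above works because every `<a>` in test.html has an `href` and the page contains an `<img>`. On other pages, `link['href']` raises `KeyError` when the attribute is missing, and `img['src']` fails when `find` returns `None`. Below is a minimal sketch of a more defensive variant. It uses an inline sample string in place of test.html, and the helper names `extract_links` and `first_image_src` are hypothetical, chosen only for this illustration.

```python
from bs4 import BeautifulSoup

# Inline sample standing in for test.html (an assumption for this sketch);
# the second <a> deliberately has no href attribute, and there is no <img>.
SAMPLE_HTML = """
<div id="content" class="default">
<a href="http://www.baidu.com">Baidu</a>
<a>no link here</a>
</div>
"""


def extract_links(html):
    """Return (href, text) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, 'html.parser')
    pairs = []
    for link in soup.find_all('a'):
        href = link.get('href')  # None instead of KeyError when missing
        if href is not None:
            pairs.append((href, link.get_text(strip=True)))
    return pairs


def first_image_src(html):
    """Return the src of the first <img>, or None if there is none."""
    soup = BeautifulSoup(html, 'html.parser')
    img = soup.find('img')  # find() returns None when nothing matches
    return img.get('src') if img is not None else None


print(extract_links(SAMPLE_HTML))
print(first_image_src(SAMPLE_HTML))
```

Using `Tag.get` trades an exception for a `None` check, which is usually easier to handle when scraping pages whose markup you do not control.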
# Improved version:
from bs4 import BeautifulSoup

with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# First locate the enclosing block, then search only inside it
div_node = soup.find('div', id='content')
print(div_node)
print("#" * 50)

links = div_node.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

img = div_node.find('img')
print(img['src'])
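The same "find the block first, then search inside it" idea can also be written as a single CSS selector using BeautifulSoup's `select` and `select_one`. The sketch below is an alternative, not the approach the script above takes. It assumes an inline sample string in place of test.html, and the extra link outside the div (with its example.com URL) is made up to show that the selector skips it.

```python
from bs4 import BeautifulSoup

# Inline sample standing in for test.html (an assumption for this sketch).
# The first <a> sits outside the div and should not be matched.
SAMPLE_HTML = """
<html><body>
<a href="http://example.com/outside">outside the block</a>
<div id="content" class="default">
<a href="http://www.baidu.com">Baidu</a>
<a href="http://www.iqiyi.com">iQIYI</a>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# 'div#content a' matches only <a> tags nested inside <div id="content">
hrefs = [a['href'] for a in soup.select('div#content a')]

# select_one() returns the first match, or None when nothing matches
img = soup.select_one('div#content img')
img_src = img['src'] if img is not None else None

print(hrefs)
print(img_src)
```

Selectors keep the scoping and the lookup in one expression, while the two-step `find` version makes it easier to inspect the intermediate block, as the `print(div_node)` call does.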