Simple page request and parsing cases
2022-07-05 13:01:00 【South Lake Fishing Song】
<html>
<head>
<meta http-equiv=Content-Type content="text/html;charset=utf-8">
<title> Webpage title </title>
</head>
<body>
<h1> title 1</h1>
<h2> title 2</h2>
<h3> title 3</h3>
<h4> title 4</h4>
<div id="content" class="default">
<p> The paragraph </p>
<a href="http://www.baidu.com"> Baidu </a> <br/>
<a href="http://www.crazyant.net"> Crazy ant </a> <br/>
<a href="http://www.iqiyi.com"> Iqiyi </a> <br/>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body>
</html>
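# -*- coding=utf-8 -*-
from bs4 import BeautifulSoup

# Read the sample page (the HTML above, saved as test.html)
with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

# Print the tag name, href, and text of every link on the page
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# Print the src of the first image
img = soup.find('img')
print(img['src'])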
# Upgraded version: find the enclosing block first, then search inside it
from bs4 import BeautifulSoup

with open('./test.html', encoding='utf-8') as fin:
    html_doc = fin.read()

soup = BeautifulSoup(html_doc, 'html.parser')

div_node = soup.find('div', id='content')  # Find the large block first
print(div_node)
print("#" * 50)

# Search for links and the image only within that block
links = div_node.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

img = div_node.find('img')
print(img['src'])
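
The upgraded version assumes the `content` block and every `href` are present. If they might be missing, a guarded variant could look like the minimal sketch below. The `extract_links` helper and the inline `HTML_DOC` string are hypothetical names added for illustration (they are not part of the original script); only `find`, `find_all`, `get`, and `get_text` from BeautifulSoup are used.

```python
from bs4 import BeautifulSoup

# Inline copy of part of the sample page, so the sketch runs without test.html.
HTML_DOC = """
<html>
<body>
<div id="content" class="default">
<p> The paragraph </p>
<a href="http://www.baidu.com"> Baidu </a> <br/>
<a href="http://www.crazyant.net"> Crazy ant </a> <br/>
<a href="http://www.iqiyi.com"> Iqiyi </a> <br/>
<img src="https://www.python.org/static/img/python-logo.png"/>
</div>
</body>
</html>
"""


def extract_links(html_doc, block_id="content"):
    """Return (href, text) pairs for every <a> inside the div with the given id."""
    soup = BeautifulSoup(html_doc, "html.parser")
    block = soup.find("div", id=block_id)
    if block is None:  # find() returns None when nothing matches
        return []
    # a.get("href") yields None instead of raising KeyError when href is absent
    return [
        (a.get("href"), a.get_text(strip=True))
        for a in block.find_all("a")
    ]


links = extract_links(HTML_DOC)
for href, text in links:
    print(href, text)
```

Passing `strip=True` to `get_text` drops the padding spaces around the link text, and an unknown `block_id` yields an empty list rather than an `AttributeError` on `None`.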