Use mitmproxy to cache 360 degree panoramic web pages offline
2022-07-06 23:11:00 【Xiaoming - code entity】
Blog home page: https://blog.csdn.net/as604049322
Likes, bookmarks, and comments are welcome, and so is discussion!
This article is an original work by Xiaoming - Code Entity, first published on CSDN.
Yesterday I ran into a problem:

Some web pages load their resources dynamically, so the browser's built-in "save page" function cannot save all of them.
Saving the files one by one by hand is not practical either, because there are too many:

There are too many folders, nested layer upon layer.
I wanted to cache the target page offline and thought of a good way to do it: run a proxy that can be scripted in Python, and have it save the response to every request as a local file whose path is derived from the URL.
Installing MitmProxy
I recommend MitmProxy. To install it, run the following on the command line:
pip install mitmproxy
MitmProxy provides three commands: mitmproxy, mitmdump and mitmweb. Of these, mitmdump can process every request with a Python script that you specify with the -s option.
After installation, we need to install MitmProxy's certificate. Visit http://mitm.it/
Visiting it directly shows: If you can see this, traffic is not passing through mitmproxy.
So first run mitmweb to start a web proxy server:

Next, set the proxy server address in the browser we use. Taking 360 Secure Browser as an example:

Once the browser is set to use the proxy server provided by MitmProxy, visit http://mitm.it/ again and you can download and install the certificate:

After downloading, open the certificate and keep clicking Next to complete the installation.
Now visit Baidu, and you can see MitmProxy's certificate information:

Writing the script for mitmdump
The template for a script supported by mitmdump is as follows:
# All outgoing requests are processed by this function
def request(flow):
    # Get the request object
    request = flow.request


# All server responses are processed by this function
def response(flow):
    # Get the response object
    response = flow.response
The request and response objects are almost the same as the corresponding objects in the requests library.
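To get a feel for these objects, here is a minimal sketch that pulls a few fields out of a flow. The attributes it reads (request.method, request.url, response.status_code, response.content) are ones mitmproxy exposes on its flows. The helper name summarize is hypothetical, and the SimpleNamespace object only stands in for a real flow so the snippet can be tried without a running proxy.

```python
from types import SimpleNamespace


def summarize(flow):
    """Return a one-line summary of a request/response pair."""
    request, response = flow.request, flow.response
    size = len(response.content or b"")
    return f"{request.method} {request.url} -> {response.status_code} ({size} bytes)"


def response(flow):
    # mitmdump calls this hook once for every completed response.
    print(summarize(flow))


# Stand-in flow for trying the helper outside mitmdump (not a real mitmproxy object).
fake_flow = SimpleNamespace(
    request=SimpleNamespace(method="GET", url="https://www.baidu.com/"),
    response=SimpleNamespace(status_code=200, content=b"<html></html>"),
)
print(summarize(fake_flow))
```

Loaded with mitmdump -s, a script like this would print one line per response, which is a quick way to check which URLs actually pass through the proxy before deciding what to save.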
Our requirement is to save files according to their URL, so we only need to handle responses. Let's first try caching the Baidu home page:
import os
import re

dest_url = "https://www.baidu.com/"
# Root folder: dest_url without its scheme, with characters that are unsafe in file names replaced by "_"
root = re.sub(r'[/\\:*?<>|"\s]', "_", re.sub(r"^https?://", "", dest_url).strip("/"))


def response(flow):
    url = flow.request.url
    response = flow.response
    # Only cache successful responses under dest_url
    if response.status_code != 200 or not url.startswith(dest_url):
        return
    # Drop the query string
    url = url.split("?", 1)[0]
    # A URL that ends with "/" is saved as index.html
    if url.endswith("/"):
        url += "index.html"
    file = root + "/" + url[len(dest_url):].strip("/")
    # Create the parent folders, then write the response body
    os.makedirs(os.path.dirname(file), exist_ok=True)
    with open(file, "wb") as f:
        f.write(response.content)
Save the script above as dump.py, then start the proxy with the following command (close the mitmweb instance started earlier first):
>mitmdump -s dump.py
Loading script dump.py
Proxy server listening at http://*:8080
After refreshing the page, the Baidu home page has been cached successfully.
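To predict where a given response ends up on disk, the path rules that dump.py applies can be isolated into a pure function. The following is only a sketch of those rules: the function name local_path is hypothetical, and the img/logo.png URL is a made-up example rather than a file from the original page.

```python
import re


def local_path(url, dest_url):
    """Map a URL under dest_url to the relative path used for its cached copy."""
    url = url.split("?", 1)[0]      # drop the query string
    if url.endswith("/"):
        url += "index.html"         # directory URLs are saved as index.html
    # Root folder: dest_url without its scheme, unsafe characters replaced by "_".
    host_part = re.sub(r"^https?://", "", dest_url).strip("/")
    root = re.sub(r'[/\\:*?<>|"\s]', "_", host_part)
    return root + "/" + url[len(dest_url):].strip("/")


dest = "https://www.baidu.com/"
print(local_path("https://www.baidu.com/", dest))
print(local_path("https://www.baidu.com/img/logo.png?v=2", dest))  # made-up URL
```

Because every "/" in dest_url is replaced with "_", the whole prefix collapses into a single top-level folder, and only the part of the URL after dest_url produces nested subfolders.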

Serve the cached folder with Python's built-in HTTP server and visit it:
You can see that the local copy of Baidu loads successfully.
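One way to run such a local test server is sketched below with the standard library's http.server module, which is roughly what running python -m http.server inside the cached folder does. The function names make_server and serve, the port 8000, and the choice to bind only to 127.0.0.1 are assumptions made for this illustration.

```python
import functools
import http.server


def make_server(directory, port=8000):
    """Build an HTTP server that serves the files under `directory` on localhost."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve(directory, port=8000):
    """Serve `directory` until interrupted with Ctrl+C."""
    with make_server(directory, port) as server:
        print(f"Serving {directory} on http://127.0.0.1:{port}/")
        server.serve_forever()
```

Calling serve("www.baidu.com") from the folder where dump.py ran and then opening http://127.0.0.1:8000/ in a browser that is no longer using the proxy should show the cached page.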
Caching the 360-degree panoramic web page offline
Change dest_url to the following address and save:
dest_url = "https://img360wcs.soufunimg.com/2022/03/25/gz/720/3943919a3a7b46769db6f2db1f4250e5/html"
Then visit the page again: https://img360wcs.soufunimg.com/2022/03/25/gz/720/3943919a3a7b46769db6f2db1f4250e5/html/index.html
If you find that the saved files are incomplete, open the developer tools, tick "Disable cache" on the Network tab, and refresh the page again:

The main files have now been cached:

Now look in every direction on the original page, and zoom in and out, so that as many high-definition detail images as possible are requested and cached.
Testing with the local server shows that the cached page loads successfully:

However, the script so far only caches ordinary files returned with response code 200. The site above also returns music files with response code 206 (partial content). Caching those as well is a little more complicated, so let's look at how to cache the music files.
Caching 206 partial-content files
After some research, I changed the code above to the following:
import os
import re

dest_url = "https://img360wcs.soufunimg.com/2022/03/25/gz/720/3943919a3a7b46769db6f2db1f4250e5/html"
# Root folder: dest_url without its scheme, with characters that are unsafe in file names replaced by "_"
root = re.sub(r'[/\\:*?<>|"\s]', "_", re.sub(r"^https?://", "", dest_url).strip("/"))


def response(flow):
    url = flow.request.url
    response = flow.response
    # Cache both complete (200) and partial (206) responses under dest_url
    if response.status_code not in (200, 206) or not url.startswith(dest_url):
        return
    # Drop the query string
    url = url.split("?", 1)[0]
    # A URL that ends with "/" is saved as index.html
    if url.endswith("/"):
        url += "index.html"
    file = root + "/" + url[len(dest_url):].strip("/")
    os.makedirs(os.path.dirname(file), exist_ok=True)
    if response.status_code == 206:
        # Content-Range has the form "bytes <start>-<end>/<total>"
        s, e, length = map(int, re.fullmatch(
            r"bytes (\d+)-(\d+)/(\d+)", response.headers['Content-Range']).groups())
        # Create an empty file for the first chunk so it can be opened with "rb+"
        if not os.path.exists(file):
            with open(file, "wb") as f:
                pass
        # Write this chunk at its offset without truncating the other chunks
        with open(file, "rb+") as f:
            f.seek(s)
            f.write(response.content)
    elif response.status_code == 200:
        with open(file, "wb") as f:
            f.write(response.content)
Save the modified script; mitmdump reloads it automatically:

After clearing the cache and visiting the page again, the music files were downloaded successfully.
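The core of the 206 handling is writing each chunk at the offset announced in its Content-Range header, so chunks can arrive in any order. The sketch below isolates that step so it can be tried without a proxy. The function name write_range and the length check are additions made for this illustration and are not part of the script above.

```python
import os
import re


def write_range(path, content_range, body):
    """Write one 206 response body at the offset given by its Content-Range header."""
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", content_range)
    if match is None:
        raise ValueError(f"unsupported Content-Range: {content_range!r}")
    start, end, total = map(int, match.groups())
    if len(body) != end - start + 1:
        raise ValueError("body length does not match the declared range")
    # "r+b" keeps bytes that are already in the file; create it first if missing.
    if not os.path.exists(path):
        open(path, "wb").close()
    with open(path, "r+b") as f:
        f.seek(start)
        f.write(body)
    return total  # full size of the resource, useful for a completeness check
```

For example, calling write_range("demo.bin", "bytes 6-10/11", b"world") and then write_range("demo.bin", "bytes 0-5/11", b"hello ") leaves an 11-byte file containing "hello world". Comparing the returned total with the size of the file on disk is one rough way to notice that the final chunk has not arrived yet, although it cannot detect gaps in the middle.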

Summary
With mitmdump we have successfully cached a specified website. To cache other websites locally in the future, you only need to change the URL in dest_url.