Use mitmproxy to cache 360-degree panoramic web pages offline
2022-07-06 23:11:00 【Xiaoming - Code Entity】
Blog home page: https://blog.csdn.net/as604049322
Likes, bookmarks, and comments are welcome. Feel free to discuss!
This article is an original work by Xiaoming (Code Entity) and first appeared on CSDN.
Yesterday I ran into a problem:

Some pages load their resources dynamically, so the browser's built-in "Save page" feature cannot capture all of them.
Saving the files by hand, one at a time, is also out of the question; there are far too many:

Too many folders, nested layer after layer.
So I wanted to cache the target page offline, and a good way to do it is a proxy that can be scripted in Python: for every request, save the corresponding file locally based on its URL.
MitmProxy Installation
MitmProxy is what I would recommend here. Install it from the command line:
pip install mitmproxy
MitmProxy provides three commands: mitmproxy, mitmdump, and mitmweb. Of these, mitmdump can process every request with a user-specified Python script (passed with the -s parameter).
After installation we also need to install MitmProxy's CA certificate. Visit: http://mitm.it/
If you open this address directly (without going through the proxy), the page only shows: If you can see this, traffic is not passing through mitmproxy.
So let's first start a proxy server with mitmweb:
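For reference (defaults of recent mitmproxy releases: the proxy listens on port 8080 and the web UI opens at http://127.0.0.1:8081), the command takes no arguments:
mitmweb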

Next, set the proxy server's address in the browser we are using; here 360 Secure Browser is taken as the example:

With the proxy configured, visit http://mitm.it/ again through the MitmProxy proxy server and you can download and install the certificate:

After downloading, open the certificate and click Next through the wizard to finish the installation.
If you visit Baidu now, you can see MitmProxy's certificate information:

Writing the script required by mitmdump
The template for a script that mitmdump loads looks like this:
# Every request sent out is processed by this hook
def request(flow):
    # Get the request object
    request = flow.request


# Every response returned by the server is processed by this hook
def response(flow):
    # Get the response object
    response = flow.response
The request and response objects behave much like their counterparts in the requests library.
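As a quick orientation before the full script, here is a minimal sketch of the standard mitmproxy flow attributes used below (nothing here is specific to this article):
# Commonly used attributes inside a response hook
def response(flow):
    url = flow.request.url               # full request URL, including any query string
    status = flow.response.status_code   # e.g. 200 or 206
    headers = flow.response.headers      # dict-like, e.g. headers.get("Content-Range")
    body = flow.response.content         # response body as bytes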
Our requirement is to save files based on the URL, so only the response needs to be handled. Let's try caching the Baidu home page first:
import os
import re

dest_url = "https://www.baidu.com/"


def response(flow):
    url = flow.request.url
    response = flow.response
    # Only handle successful responses for URLs under the target site
    if response.status_code != 200 or not url.startswith(dest_url):
        return
    # Drop the query string and map a trailing "/" to index.html
    r_pos = url.rfind("?")
    url = url if r_pos == -1 else url[:r_pos]
    url = url if url[-1] != "/" else url+"index.html"
    # Turn dest_url into a directory name, replacing characters that are
    # not allowed in file names
    path = re.sub("[/\\\\:\\*\\?\\<\\>\\|\"\s]", "_", dest_url.strip("htps:/"))
    file = path + "/" + url.replace(dest_url, "").strip("/")
    # Create the parent directories, then write the response body
    r_pos = file.rfind("/")
    if r_pos != -1:
        path, file_name = file[:r_pos], file[r_pos+1:]
    os.makedirs(path, exist_ok=True)
    with open(file, "wb") as f:
        f.write(response.content)
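To make the naming scheme concrete (the image path below is only an illustrative example): with dest_url = "https://www.baidu.com/", a response for https://www.baidu.com/img/bd_logo1.png is written to www.baidu.com/img/bd_logo1.png under the current working directory, and https://www.baidu.com/ itself is saved as www.baidu.com/index.html.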
Save the script above as dump.py, then start the proxy with the following command (close the mitmweb started earlier first):
>mitmdump -s dump.py
Loading script dump.py
Proxy server listening at http://*:8080
After refreshing the page, the Baidu home page has been cached successfully:

Start Python's built-in HTTP server to test it, then open it in the browser:
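A minimal way to do this, assuming Python 3.7+ (for the --directory option) and that mitmdump was run in the directory where the www.baidu.com folder was created:
python -m http.server 8000 --directory www.baidu.com
Then open http://localhost:8000/ in the browser.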
You can see that the local copy of Baidu loads successfully.
Caching the 360° panoramic page offline
Change dest_url to the following address and save:
dest_url = "https://img360wcs.soufunimg.com/2022/03/25/gz/720/3943919a3a7b46769db6f2db1f4250e5/html"
Then revisit: https://img360wcs.soufunimg.com/2022/03/25/gz/720/3943919a3a7b46769db6f2db1f4250e5/html/index.html
If you find that the saved files are incomplete, open the developer tools, tick "Disable cache" on the Network tab, and refresh the page again:

At this point the main files have been cached:

Now just look around in every direction on the original page and zoom in and out, so that as many of the high-definition detail images as possible get cached.
Starting the local server to test shows the page is served successfully:

However, the script so far only caches ordinary files whose response code is 200. The site above also returns music files with response code 206 (Partial Content); caching those takes a little more work, so let's look at how to handle them next.
Caching 206 partial-content files
After some research, I modified the code above into the following form:
import os
import re

dest_url = "https://img360wcs.soufunimg.com/2022/03/25/gz/720/3943919a3a7b46769db6f2db1f4250e5/html"


def response(flow):
    url = flow.request.url
    response = flow.response
    if response.status_code not in (200, 206) or not url.startswith(dest_url):
        return
    r_pos = url.rfind("?")
    url = url if r_pos == -1 else url[:r_pos]
    url = url if url[-1] != "/" else url+"index.html"
    path = re.sub("[/\\\\:\\*\\?\\<\\>\\|\"\s]", "_", dest_url.strip("htps:/"))
    file = path + "/" + url.replace(dest_url, "").strip("/")
    r_pos = file.rfind("/")
    if r_pos != -1:
        path, file_name = file[:r_pos], file[r_pos+1:]
    os.makedirs(path, exist_ok=True)
    if response.status_code == 206:
        # Partial content: parse the byte range from Content-Range and
        # write this chunk at its offset inside the target file
        s, e, length = map(int, re.fullmatch(
            r"bytes (\d+)-(\d+)/(\d+)", response.headers['Content-Range']).groups())
        if not os.path.exists(file):
            # Create an empty file first so it can be opened in "rb+" mode
            with open(file, "wb") as f:
                pass
        with open(file, "rb+") as f:
            f.seek(s)
            f.write(response.content)
    elif response.status_code == 200:
        with open(file, "wb") as f:
            f.write(response.content)
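The key point is that each 206 response carries a Content-Range header saying which byte span of the full file it contains, so writing every chunk at its start offset reassembles the file. A standalone sketch of just the parsing step (the header value is made up for illustration):
import re

# e.g. a chunk covering bytes 65536-131071 of a 2,097,152-byte file
content_range = "bytes 65536-131071/2097152"
s, e, length = map(int, re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", content_range).groups())
print(s, e, length)  # 65536 131071 2097152 -> seek to offset s and write the chunk there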
Save the modified script; mitmdump reloads it automatically:

After clearing the browser cache and visiting the page again, the music files are downloaded successfully:

Summary
With mitmdump we have successfully cached the target website offline. To cache another website locally in the future, just change dest_url to its address.