Reverse-Engineering the Protobuf Protocol Behind Station B's Bullet Comments
2022-07-06 23:22:00 【VIP_ CQCRE】
This is the 657th technical post from 「Attacking Coder」
Author: TheWeiJun
Source: The Story of Reverse Engineering and Crawlers
Reading time: about 3 minutes.
Contents
I. What is protobuf?
II. Debugging and analyzing the website
III. Restoring the protobuf protocol
IV. Complete code implementation
V. Takeaways and summary
A bit of fun
Xiao Hong is a data analysis engineer. Since she solved the font-based anti-crawling problem last time, she had not run into anything difficult. But surprises happen: today, while analyzing the "king of bullet comments", she hit a new problem. The data is garbled and irregular, and is said to be protobuf. Let's analyze the new problem Xiao Hong ran into!
I. What is the protobuf protocol?
Preface: Protobuf (Protocol Buffers) is a platform-neutral, language-neutral, extensible, lightweight and efficient format for serializing structured data, developed by Google. It serializes user-defined data structures into byte streams and deserializes byte streams back into data structures. This makes it well suited to data storage and to exchanging data between applications written in different languages: as long as both sides use the same message definition, the .proto file can be compiled for each language and added to the respective projects, so that each language can parse data that another language serialized with Protobuf. Official support covers C++, Java, Python and Go, among other languages.
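To make "serializing a data structure into a byte stream" concrete, here is a minimal sketch that encodes two fields by hand using only the standard library. It follows the documented protobuf wire format (a varint tag built from the field number and wire type, followed by the value), but the message layout (field 1 as an integer, field 2 as a string) and the helper names are made up for illustration and are not taken from the site.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode an integer field (wire type 0, varint)."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string field (wire type 2, length-delimited)."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Hypothetical message: field 1 is an integer, field 2 is a string.
encoded = encode_int_field(1, 150) + encode_string_field(2, "hi")
print(encoded.hex(" "))  # 08 96 01 12 02 68 69
```

Note that the byte stream carries only field numbers, never field names, which is why the field ids have to be recovered from the site's JavaScript later on.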
II. Debugging and analyzing the website
1. First, open the website under analysis and search for the text of a specific bullet comment (danmaku), as shown in the screenshot:

Note: Because the bullet-comment content is encoded with protobuf, it cannot be found by a direct search. We need to inspect the network requests to locate the specific URL.
2. Inspect the network requests and locate the bullet-comment request, as shown in the screenshot:

Note: The screenshot clearly shows that this response is the bullet-comment content. However, it is encoded with protobuf, so to restore the plaintext we next need to set breakpoints in the JS and debug.
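To see why such a response looks garbled yet still has structure, a protobuf payload can be split into fields without any schema. The sketch below is a minimal standard-library decoder for the documented wire types 0, 1, 2 and 5; the sample bytes are a hypothetical payload built for this example, not a real response from the site.

```python
def read_varint(data: bytes, pos: int):
    """Return (value, next_pos) for the varint starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_raw(data: bytes):
    """Split a payload into (field_number, wire_type, value) tuples."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # fixed 64-bit, kept as raw bytes
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes or sub-message
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, kept as raw bytes
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Hypothetical payload: field 1 wraps a sub-message with an int and a string.
sample = bytes.fromhex("0a0a08e807120568656c6c6f")
outer = decode_raw(sample)
inner = decode_raw(outer[0][2])
print(outer)
print(inner)  # [(1, 0, 1000), (2, 2, b'hello')]
```

The decoder recovers field numbers and raw values, but not field names or exact types. That missing mapping is what the breakpoint debugging in the following steps is meant to find.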
3. Set an XHR/fetch breakpoint on the request, as shown in the screenshot:

Note: Because it is the response that is protobuf-encoded, once we have located where the request is sent we only need to follow the decoding logic that comes after it.
4. After clicking the resume button at the breakpoint, the result is as shown in the screenshot:

Note: At this point the variable r holds the URL of the bullet-comment request. Next, resume execution.
5. Keep stepping through the breakpoints, as shown in the screenshot:

Now we print the value of the variable r, as shown in the screenshot:

Note: Isn't this exactly the plaintext we wanted? Next, we only need to find where the protobuf message fields and their ids are defined in order to restore the plaintext ourselves.
6. After further JS breakpoint debugging, we finally locate the protobuf field definitions, shown below:

7. Copy the data from the Console and format it with an online JSON formatter, as shown in the screenshot:

Summary: Now that we know the plaintext of the response and the fields and ids defined by the protobuf protocol, we only need to build a .proto file to restore the complete plaintext.
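One possible way to turn the formatted JSON into .proto text is sketched below. The descriptor shape is an assumption, loosely modeled on protobuf.js JSON descriptors, and the real Console output may be structured differently. The names Feed, message and content follow the Python code later in this article; the field ids and the extra id field are placeholders that must be replaced with the values actually seen in the Console.

```python
import json

# Hypothetical descriptor; the field ids below are placeholders, not the
# site's real ids.
DESCRIPTOR_JSON = """
{
  "Feed": {
    "fields": {
      "message": {"rule": "repeated", "type": "Message", "id": 1}
    }
  },
  "Message": {
    "fields": {
      "id": {"type": "int64", "id": 1},
      "content": {"type": "string", "id": 2}
    }
  }
}
"""


def descriptor_to_proto(descriptor: dict) -> str:
    """Render a {message: {"fields": {...}}} mapping as proto3 source text."""
    lines = ['syntax = "proto3";', ""]
    for message_name, message in descriptor.items():
        lines.append(f"message {message_name} {{")
        for field_name, field in message["fields"].items():
            rule = field.get("rule")  # e.g. "repeated"; absent for plain fields
            prefix = f"{rule} " if rule else ""
            lines.append(f"  {prefix}{field['type']} {field_name} = {field['id']};")
        lines.append("}")
        lines.append("")
    return "\n".join(lines)


proto_text = descriptor_to_proto(json.loads(DESCRIPTOR_JSON))
print(proto_text)
```

Generating the text from the copied JSON avoids transcription mistakes in the field ids, which matter more than the field names: a wrong id silently yields empty or misparsed fields.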
III. Restoring the protobuf protocol
1. Restore the protobuf protocol by writing a .proto file with the following structure:

2. Run the following command to compile it into a Python protobuf module:
protoc --python_out=. *.proto
3. After running the command, the protobuf Python file is generated, as shown in the screenshot:

Summary: At this point the protobuf protocol has been fully restored; the complete code follows in the next section.
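If the compile step needs to run from a script rather than a shell, the same protoc invocation can be wrapped in Python. This is a sketch under two assumptions: protoc is installed and on PATH, and the definition file is named feed.proto (suggested by the feed_pb2 import in the code that follows, since protoc names its Python output <name>_pb2.py). The helper names are hypothetical.

```python
import glob
import shutil
import subprocess


def build_protoc_command(proto_files, out_dir="."):
    """Build the argument list equivalent to `protoc --python_out=. *.proto`."""
    return ["protoc", f"--python_out={out_dir}", *proto_files]


def compile_protos(pattern="*.proto", out_dir="."):
    """Run protoc on every file matching pattern; return the files compiled."""
    # Without a shell, the *.proto wildcard must be expanded explicitly.
    proto_files = sorted(glob.glob(pattern))
    if not proto_files:
        return []
    if shutil.which("protoc") is None:
        raise RuntimeError("protoc was not found on PATH")
    subprocess.run(build_protoc_command(proto_files, out_dir), check=True)
    return proto_files


# Show the command that would be run for the assumed file name.
print(build_protoc_command(["feed.proto"]))
```

Calling compile_protos() in the directory that holds the .proto file should then produce feed_pb2.py next to it.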
IV. Complete code implementation
1. The complete code of the whole project is as follows:
# -*- coding: utf-8 -*-
# --------------------------------------
# @author : official account: The Story of Reverse Engineering and Crawlers
# --------------------------------------
import requests
from feed_pb2 import Feed  # module generated by protoc from the .proto file
from google.protobuf.json_format import MessageToDict


def start_requests():
    # Cookies copied from the browser session
    cookies = {
        'rpdid': '|(J~RkYYY|k|0J\'uYulYRlJl)',
        'buvid3': '794669E2-CEBC-4737-AB8F-73CB9D9C0088184988infoc',
        'buvid4': '046D34538-767A-526A-8625-7D1F04E0183673538-022021413-+yHNrXw7i70NUnsrLeJd2Q%3D%3D',
        'DedeUserID': '481849275',
        'DedeUserID__ckMd5': '04771b27fae39420',
        'sid': 'ij1go1j8',
        'i-wanna-go-back': '-1',
        'b_ut': '5',
        'CURRENT_BLACKGAP': '0',
        'buvid_fp_plain': 'undefined',
        'blackside_state': '0',
        'nostalgia_conf': '-1',
        'PVID': '2',
        'b_lsid': '55BA153F_18190A78A34',
        'bsource': 'search_baidu',
        'innersign': '1',
        'CURRENT_FNVAL': '4048',
        'b_timer': '%7B%22ffp%22%3A%7B%22333.1007.fp.risk_794669E2%22%3A%2218190A78B5F%22%2C%22333.788.fp.risk_794669E2%22%3A%2218190A797FF%22%2C%22333.42.fp.risk_794669E2%22%3A%2218190A7A6C5%22%7D%7D',
    }
    # Request headers copied from the browser (host names are masked)
    headers = {
        'authority': 'xxxxxx',
        'accept': '*/*',
        'accept-language': 'zh-CN,zh;q=0.9',
        'cache-control': 'no-cache',
        'origin': 'https://www.xxxxx.com',
        'pragma': 'no-cache',
        'referer': 'https://www.xxxxxx.li.com/video/BV1434y1L7rb?spm_id_from=333.851.b_7265636f6d6d656e64.1&vd_source=8d45ec9ed78652f966b3625afe95e904',
        'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"macOS"',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-site',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    }
    # Query parameters of the bullet-comment request
    params = {
        'type': '1',
        'oid': '729126061',
        'pid': '896926231',
        'segment_index': '1',
    }
    response = requests.get('https://xxxx.xxxx.com/x/v2/dm/web/seg.so', params=params, cookies=cookies,
                            headers=headers)
    # Fail early on an HTTP error instead of parsing an error page as protobuf
    response.raise_for_status()
    # Parse the binary response body with the restored message definition
    info = Feed()
    info.ParseFromString(response.content)
    # Convert to a plain dict, keeping the field names used in the .proto file
    _data = MessageToDict(info, preserving_proto_field_name=True)
    messages = _data.get("message") or []
    for message in messages:
        print(message.get("content"))


if __name__ == '__main__':
    start_requests()
2. After running the code, the output is as shown in the screenshot:

V. Takeaways and summary
Looking back over the whole analysis, the main difficulties were:
Quickly locating where the response is decoded
Understanding and mastering the protobuf protocol
Restoring the .proto file from the site's source code
Using protobuf in Python
