Reverse-engineering the protobuf protocol behind Station B's bullet-screen comments
2022-07-06 23:22:00 【VIP_ CQCRE】
This is the 657th technical post from 「Attacking Coder」
Author: TheWeiJun
Source: The Story of Reverse Engineering and Web Crawlers
Reading time: about 3 minutes.
Contents
I. What is protobuf?
II. Website debugging analysis
III. Restoring the protobuf protocol
IV. Complete code implementation
V. Lessons learned and summary
A fun aside
Xiao Hong is a data analysis engineer. Since she solved the font-based anti-crawling problem last time, she has not run into anything difficult. But the unexpected happened: today, while analyzing a bullet-screen comment site, she hit a new problem. The data looks garbled and irregular, and is said to be protobuf. Let's analyze the new problem Xiao Hong has run into!
I. What is the protobuf protocol?
Preface: Protobuf (Protocol Buffers) is a platform-independent, language-neutral, extensible, lightweight and efficient format for serializing structured data, developed by Google. It serializes custom data structures into byte streams and deserializes byte streams back into data structures. This makes it well suited to data storage and as an exchange format between applications written in different languages: as long as both sides implement the same protocol definition, a file with the .proto suffix can be compiled for each language and added to the respective project, so that one language can parse data serialized with Protobuf by another. Official support covers C++, Java, Go, Python, and several other languages.
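To make "serializing into a byte stream" concrete, here is a minimal pure-Python sketch of how two kinds of field are laid out on the wire. It is not the official library and covers only non-negative varints and UTF-8 strings. The helper names (encode_varint, encode_varint_field, encode_string_field) are made up for this illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Wire type 0: a tag varint followed by the value varint."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2: tag, payload length, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


if __name__ == "__main__":
    print(encode_varint_field(1, 150).hex())        # 089601
    print(encode_string_field(2, "testing").hex())  # 120774657374696e67
```

Only the field number and a wire type are written to the stream, never the field name. This is why the field ids have to be recovered before the plaintext can be labelled.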
II. Website debugging analysis
1. First, open the site being analyzed and search for the text of a specific bullet comment, as shown in the screenshot:
Note: Because the bullet-comment content is protobuf-encoded, it cannot be found by a direct text search. We need to analyze the network requests to locate the specific URL.
2. Analyze the network requests and locate the bullet-comment link, as shown in the screenshot:
Note: The screenshot clearly shows that this is the bullet-comment content. However, it is protobuf-encoded, so to restore the plaintext we next need to set breakpoints in the JS and debug.
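Before reaching for the debugger, it can help to walk the raw bytes without any schema, similar in spirit to what protoc --decode_raw does. The following is a minimal sketch. The function names are hypothetical, and the sample bytes are made up for illustration rather than taken from a real response.

```python
def read_varint(buf: bytes, pos: int):
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def decode_raw(buf: bytes):
    """Return a list of (field_number, wire_type, value) without a schema."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, kept as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, kept as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((field_number, wire_type, value))
    return fields


if __name__ == "__main__":
    # Made-up sample: field 1 wraps a nested message whose field 2 is "hello".
    sample = bytes.fromhex("0a07120568656c6c6f")
    outer = decode_raw(sample)
    print(outer)
    print(decode_raw(outer[0][2]))
```

A walk like this reveals field numbers and string payloads but not field names. The names still have to be recovered from the site's JS.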
3. Set an XHR/fetch breakpoint on the request, as shown in the screenshot:
Note: Because it is the response that is protobuf-encoded, once we have located where the request is sent we only need to follow the decoding logic that comes after it.
4. After clicking the resume button at the breakpoint, the screenshot is as follows:
Note: At this point the variable r holds the URL of the bullet-comment request. Next, resume execution to the next breakpoint.
5. Keep stepping through the breakpoints to get closer to the decoding logic, as shown in the screenshot:
Now print the value of the variable r, as shown in the screenshot:
Note: Isn't this exactly the plaintext we want? Next, we only need to find where the protobuf initialization parameters and field ids are defined to be able to restore the plaintext ourselves.
6. After further JS breakpoint debugging, we finally locate the protobuf initialization parameters, shown below:
7. Copy the data from the Console and format it with an online JSON formatter, as shown in the screenshot:
Summary: Now that we know the plaintext of the response and the parameters and field ids defined by the protobuf protocol, all that remains is to write a proto file to restore the whole plaintext.
III. Restoring the protobuf protocol
1. Reconstruct the protobuf definition. The proto file is written as follows:
2. Run the following command to compile the proto file into a Python protobuf module:
protoc --python_out=. *.proto
3. After running the command, the Python protobuf file is generated, as shown in the screenshot:
Summary: At this point the protobuf protocol has been fully restored. Next, let's move on to the complete code.
IV. Complete code implementation
1. The complete code for the project is as follows:
# -*- coding: utf-8 -*-
# --------------------------------------
# @author : official account: The Story of Reverse Engineering and Web Crawlers
# --------------------------------------
import requests
from feed_pb2 import Feed  # module generated by protoc from the restored proto file
from google.protobuf.json_format import MessageToDict


def start_requests():
    # Cookies and headers copied from the browser session used during debugging.
    cookies = {
        'rpdid': '|(J~RkYYY|k|0J\'uYulYRlJl)',
        'buvid3': '794669E2-CEBC-4737-AB8F-73CB9D9C0088184988infoc',
        'buvid4': '046D34538-767A-526A-8625-7D1F04E0183673538-022021413-+yHNrXw7i70NUnsrLeJd2Q%3D%3D',
        'DedeUserID': '481849275',
        'DedeUserID__ckMd5': '04771b27fae39420',
        'sid': 'ij1go1j8',
        'i-wanna-go-back': '-1',
        'b_ut': '5',
        'CURRENT_BLACKGAP': '0',
        'buvid_fp_plain': 'undefined',
        'blackside_state': '0',
        'nostalgia_conf': '-1',
        'PVID': '2',
        'b_lsid': '55BA153F_18190A78A34',
        'bsource': 'search_baidu',
        'innersign': '1',
        'CURRENT_FNVAL': '4048',
        'b_timer': '%7B%22ffp%22%3A%7B%22333.1007.fp.risk_794669E2%22%3A%2218190A78B5F%22%2C%22333.788.fp.risk_794669E2%22%3A%2218190A797FF%22%2C%22333.42.fp.risk_794669E2%22%3A%2218190A7A6C5%22%7D%7D',
    }
    headers = {
        'authority': 'xxxxxx',
        'accept': '*/*',
        'accept-language': 'zh-CN,zh;q=0.9',
        'cache-control': 'no-cache',
        'origin': 'https://www.xxxxx.com',
        'pragma': 'no-cache',
        'referer': 'https://www.xxxxxx.li.com/video/BV1434y1L7rb?spm_id_from=333.851.b_7265636f6d6d656e64.1&vd_source=8d45ec9ed78652f966b3625afe95e904',
        'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"macOS"',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-site',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    }
    params = {
        'type': '1',
        'oid': '729126061',
        'pid': '896926231',
        'segment_index': '1',
    }
    response = requests.get('https://xxxx.xxxx.com/x/v2/dm/web/seg.so', params=params, cookies=cookies,
                            headers=headers, timeout=10)
    response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page

    # Parse the raw protobuf bytes with the generated message class.
    info = Feed()
    info.ParseFromString(response.content)

    # Convert to a plain dict, keeping the field names as written in the proto file.
    _data = MessageToDict(info, preserving_proto_field_name=True)
    messages = _data.get("message") or []
    for message in messages:
        print(message.get("content"))


if __name__ == '__main__':
    start_requests()
2. After running the code, the output is as shown in the screenshot:
V. Lessons learned and summary
Looking back over the whole analysis, the main difficulties were:
Quickly locating where the encoded parameters are handled
Understanding and mastering the protobuf protocol
Being able to reconstruct the proto file from the source code
Using protobuf in Python
End