Restoring the protobuf protocol behind Station B's bullet comments
2022-07-06 23:22:00 【VIP_ CQCRE】
This is the 657th technical post from 「Attacking Coder」
Author: TheWeiJun
Source: The Story of Reverse Engineering and Crawlers
Reading time: about 3 minutes.
Contents
I. What is protobuf?
II. Debugging and analyzing the website
III. Restoring the protobuf protocol
IV. Complete code implementation
V. Lessons learned and summary
A fun aside
Xiao Hong is a data analysis engineer. Since she solved the font-based anti-crawling problem last time, she had not run into anything difficult. But the unexpected happened: today, while analyzing the "bullet-comment king" site, she hit a new problem. The data looks garbled and irregular, and is said to be protobuf. Let's analyze Xiao Hong's new problem!
I. What is the protobuf protocol?
Preface: Protobuf (Protocol Buffers) is a platform-neutral, language-neutral, extensible, lightweight and efficient format for serializing structured data, developed by Google. It serializes custom data structures into byte streams and deserializes byte streams back into data structures. This makes it well suited to data storage and to exchanging data between applications written in different languages. As long as both sides implement the same protocol format, a file with the .proto suffix can be compiled into code for each language and added to the respective projects, so that one language can parse data that another language serialized with Protobuf. Official support covers C++, Java, Go, Python and several other languages.
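To make the "byte stream" concrete, here is a minimal sketch of how a single string field is laid out on the wire, written in plain Python without the protobuf library. The field number 1 and the sample text are chosen only for illustration; they are not taken from the site analyzed in this article.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one length-delimited (wire type 2) string field."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # tag = field number plus wire type
    return encode_varint(tag) + encode_varint(len(payload)) + payload


raw = encode_string_field(1, "hello")
print(raw)  # b'\n\x05hello'
```

Only the field number and the wire type travel with the data; the field name and its exact type live in the .proto file. That is why a protobuf response looks like garbage until the schema is recovered.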
II. Debugging and analyzing the website
1. First, open the website we are analyzing and search for the text of a specific bullet comment. The screenshot is as follows:
Note: Because the bullet-comment content is protobuf-encoded, a direct text search cannot locate it. We need to analyze the network requests to find the specific URL.
2. Analyze the network requests and locate the bullet-comment link. The screenshot is as follows:
Note: The screenshot clearly shows that this is the bullet-comment content. Since it is protobuf-encoded, though, recovering the plaintext requires breakpoint debugging in the JS.
3. Set an XHR/fetch breakpoint on the request. The screenshot is as follows:
Note: It is the response that is protobuf-encoded, so once we have located where the request is sent, we only need to follow the decoding logic that comes after it.
4. After resuming execution from the breakpoint, the screenshot is as follows:
Note: At this point the variable r holds the URL of the bullet-comment request. Next, continue stepping from the breakpoint.
5. Keep stepping through the breakpoints to get closer to the decoding logic. The screenshot is as follows:
Now print the value of the variable r. The screenshot is as follows:
Note: Isn't this exactly the plaintext we want? Next, we only need to find the field and id definitions in the protobuf initialization parameters to restore the plaintext.
6. After further JS breakpoint debugging, we finally locate the protobuf initialization parameters, as follows:
7. Copy the data from the Console and parse it with an online JSON formatter. The screenshot is as follows:
Summary: Once we know the plaintext of the response and the parameters and ids defined for the protobuf protocol, we only need to build the proto file to restore the full plaintext.
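As a cross-check on the field ids found by debugging, the raw response bytes can also be dumped without any schema, similar in spirit to `protoc --decode_raw`. This is not part of the author's workflow. The sketch below is a minimal reader for the top level of a message only, and the `sample` bytes are hand-made for illustration rather than captured from the site.

```python
def read_varint(buf: bytes, pos: int):
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def dump_fields(buf: bytes):
    """List top-level (field_number, wire_type, value) triples of a message."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, kept as raw bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit, kept as raw bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Hand-made sample: field 1 = varint 150, field 2 = the string "hello".
sample = b"\x08\x96\x01\x12\x05hello"
print(dump_fields(sample))  # [(1, 0, 150), (2, 2, b'hello')]
```

A length-delimited value may itself be a nested message, so calling `dump_fields` again on such a value is one way to explore deeper levels by hand.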
III. Restoring the protobuf protocol
1. Restore the protobuf protocol by writing the proto file. Its structure is as follows:
2. Run the following command to compile it into a Python protobuf module:
protoc --python_out=. *.proto
3. After running the command, the protobuf file is generated. The screenshot is as follows:
Summary: At this point the protobuf protocol is fully restored. Next, let's move on to the complete code implementation.
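The restoration step above can be sketched as a small converter from the JSON copied out of the Console into .proto text. This is only an illustration under assumptions: the layout of `DESCRIPTOR_JSON` is modeled on a protobuf.js-style definition, the names `Feed`, `message` and `content` mirror the ones used in the code later in this article, and the `Message` type name, the `id` field and all field ids are made up for the example.

```python
import json

# Hypothetical descriptor; the field ids and the "Message" type are invented.
DESCRIPTOR_JSON = """
{
  "Feed": {
    "fields": {
      "message": {"rule": "repeated", "type": "Message", "id": 1}
    }
  },
  "Message": {
    "fields": {
      "id": {"type": "int64", "id": 1},
      "content": {"type": "string", "id": 2}
    }
  }
}
"""


def descriptor_to_proto(descriptor: dict) -> str:
    """Render a {message_name: {"fields": {...}}} mapping as proto3 text."""
    lines = ['syntax = "proto3";', ""]
    for message_name, body in descriptor.items():
        lines.append(f"message {message_name} {{")
        # Sort by field id so the output follows the numbering.
        fields = sorted(body["fields"].items(), key=lambda kv: kv[1]["id"])
        for field_name, spec in fields:
            prefix = "repeated " if spec.get("rule") == "repeated" else ""
            lines.append(f"  {prefix}{spec['type']} {field_name} = {spec['id']};")
        lines.append("}")
        lines.append("")
    return "\n".join(lines)


proto_text = descriptor_to_proto(json.loads(DESCRIPTOR_JSON))
print(proto_text)
```

Saving the printed text as feed.proto and compiling it with the protoc command shown earlier would produce a feed_pb2 module. The real field ids must come from the parameters found while debugging.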
IV. Complete code implementation
1. The complete code for the whole project is as follows:
# -*- coding: utf-8 -*-
# --------------------------------------
# @author : official account: The Story of Reverse Engineering and Crawlers
# --------------------------------------
import requests
from feed_pb2 import Feed  # module generated by protoc from the restored .proto file
from google.protobuf.json_format import MessageToDict


def start_requests():
    # Cookies copied from the browser session
    cookies = {
        'rpdid': '|(J~RkYYY|k|0J\'uYulYRlJl)',
        'buvid3': '794669E2-CEBC-4737-AB8F-73CB9D9C0088184988infoc',
        'buvid4': '046D34538-767A-526A-8625-7D1F04E0183673538-022021413-+yHNrXw7i70NUnsrLeJd2Q%3D%3D',
        'DedeUserID': '481849275',
        'DedeUserID__ckMd5': '04771b27fae39420',
        'sid': 'ij1go1j8',
        'i-wanna-go-back': '-1',
        'b_ut': '5',
        'CURRENT_BLACKGAP': '0',
        'buvid_fp_plain': 'undefined',
        'blackside_state': '0',
        'nostalgia_conf': '-1',
        'PVID': '2',
        'b_lsid': '55BA153F_18190A78A34',
        'bsource': 'search_baidu',
        'innersign': '1',
        'CURRENT_FNVAL': '4048',
        'b_timer': '%7B%22ffp%22%3A%7B%22333.1007.fp.risk_794669E2%22%3A%2218190A78B5F%22%2C%22333.788.fp.risk_794669E2%22%3A%2218190A797FF%22%2C%22333.42.fp.risk_794669E2%22%3A%2218190A7A6C5%22%7D%7D',
    }
    # Request headers copied from the browser
    headers = {
        'authority': 'xxxxxx',
        'accept': '*/*',
        'accept-language': 'zh-CN,zh;q=0.9',
        'cache-control': 'no-cache',
        'origin': 'https://www.xxxxx.com',
        'pragma': 'no-cache',
        'referer': 'https://www.xxxxxx.li.com/video/BV1434y1L7rb?spm_id_from=333.851.b_7265636f6d6d656e64.1&vd_source=8d45ec9ed78652f966b3625afe95e904',
        'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"macOS"',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-site',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    }
    # Query parameters of the bullet-comment request
    params = {
        'type': '1',
        'oid': '729126061',
        'pid': '896926231',
        'segment_index': '1',
    }
    response = requests.get('https://xxxx.xxxx.com/x/v2/dm/web/seg.so', params=params, cookies=cookies,
                            headers=headers, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors instead of parsing an error page
    # Decode the protobuf-encoded response body with the generated message class
    info = Feed()
    info.ParseFromString(response.content)
    # Convert the message to a plain dict, keeping the field names from the .proto file
    _data = MessageToDict(info, preserving_proto_field_name=True)
    messages = _data.get("message") or []
    for message in messages:
        print(message.get("content"))


if __name__ == '__main__':
    start_requests()
2. After running the code, the screenshot is as follows:
V. Lessons learned and summary
Looking back over the whole analysis, the difficulties can be summarized as follows:
How to quickly locate the encryption parameters
Understanding and mastering the protobuf protocol
Being able to restore the proto file from the source code
How to use protobuf in Python
End