Parsing Bilibili real-time and historical danmaku in protobuf format
2022-07-06 15:59:00 【擒賊先擒王】
References:
- https://zhuanlan.zhihu.com/p/392931611
- https://gitee.com/nbody1996/bilibili-API-collect/blob/master/danmaku/danmaku_proto.md
- Bilibili historical danmaku: https://www.cnblogs.com/mollnn/p/14964905.html
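Bilibili has switched the transport format of its danmaku (bullet comments) from XML to protobuf. Protobuf is a binary encoding and is far more efficient to transmit than XML, which reduces network load and is a real advantage on mobile. The catch is that danmaku in this format become much harder to parse: the data returned by the API looks like gibberish when viewed directly, and it has to be decoded in a specific way before the actual content shows up, which is a headache.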
(Figure: Bilibili's danmaku API before it adopted protobuf)
1. What is Protobuf
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.
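The passage above comes from Google's official Protobuf site. Put simply, protobuf is a format for encoding structured data for transmission that is smaller, faster, and simpler than XML. More information: https://developers.google.com/protocol-buffers/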
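To make "smaller" and the gibberish-looking payload concrete: on the wire, protobuf writes each field as a key (field number plus wire type) followed by the value, and integers are stored as base-128 varints. The stdlib-only sketch below implements just that piece of the encoding; `encode_varint`, `decode_varint` and `field_key` are hypothetical helper names written for this post, not functions of the protobuf library.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F              # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)      # high bit set: more bytes follow
        else:
            out.append(byte)             # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at data[pos]; return (value, position after it)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def field_key(field_number: int, wire_type: int) -> bytes:
    """A field's key is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


# Field number 2 (the number `progress` gets in this post's dm.proto), wire type 0 = varint
encoded = field_key(2, 0) + encode_varint(300)
print(encoded.hex())  # 10ac02
```

Three bytes carry both the field identity and the value 300, whereas even a short text attribute such as `progress="300"` takes 14 bytes. Without the schema, though, `10 ac 02` means nothing to a human reader, which is why the raw API response looks like garbage.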
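2. How to parse protobuf danmaku
2.1 Download the protoc compiler
protoc is the compiler that turns .proto files into source code for various programming languages (such as Python and Go); the approach in this post depends on it. It can be downloaded from: https://github.com/protocolbuffers/protobuf
On Windows, the download is an archive that extracts to an exe (`protoc.exe`, in the `bin` folder). It needs no installation, but its folder has to be added to `PATH` manually.
To check that it is set up correctly, run the following in a terminal: `protoc --version`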
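If you drive the rest of the workflow from Python, the same check can be scripted. A minimal stdlib-only sketch; `protoc_version` is a hypothetical helper name, and it simply runs `protoc --version` when `protoc` is found on `PATH`.

```python
import shutil
import subprocess
from typing import Optional


def protoc_version() -> Optional[str]:
    """Return the output of `protoc --version`, or None if protoc is not on PATH."""
    path = shutil.which("protoc")  # same PATH lookup the terminal performs
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()  # a line of the form "libprotoc <version>"


if __name__ == "__main__":
    print(protoc_version() or "protoc not found: add its bin folder to PATH")
```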
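2.2 Download protobuf for Python, so that protobuf can be parsed in Python
Download: https://github.com/protocolbuffers/protobuf
After downloading, extract the archive, change into its `python` directory, and run:
```shell
python setup.py clean
python setup.py build
python setup.py install
python setup.py test
```
Alternatively, `pip install protobuf` (see section 2.4) installs the runtime from PyPI without building it from source.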
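To confirm from Python which protobuf runtime ended up installed (whether via `setup.py` or pip), a small sketch using `importlib.metadata` (Python 3.8+); the helper name is hypothetical. The generated `dm_pb2.py` should come from a `protoc` release compatible with this runtime version, since mismatches typically surface as errors when importing `dm_pb2`.

```python
from importlib import metadata
from typing import Optional


def protobuf_runtime_version() -> Optional[str]:
    """Return the installed version of the `protobuf` package, or None if it is missing."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = protobuf_runtime_version()
    print(version or "protobuf is not installed; try: pip install protobuf")
```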
2.3 Defining and compiling the danmaku proto
The danmaku format as a protobuf message definition, `dm.proto`:
```protobuf
syntax = "proto3";

package dm;

message DmSegMobileReply {
    repeated DanmakuElem elems = 1;
}

message DanmakuElem {
    int64 id = 1;
    int32 progress = 2;
    int32 mode = 3;
    int32 fontsize = 4;
    uint32 color = 5;
    string midHash = 6;
    string content = 7;
    int64 ctime = 8;
    int32 weight = 9;
    string action = 10;
    int32 pool = 11;
    string idStr = 12;
}
```
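Only the field numbers in this definition travel on the wire; the names come from the `.proto` file. As an illustration of what the generated `dm_pb2` code does for us, here is a stdlib-only sketch that walks the protobuf wire format by hand and maps field numbers to the names above. It assumes well-formed input and non-negative integer fields (negative int32/int64 values, which protobuf encodes as 10-byte two's-complement varints, are not converted), and `decode_seg` and the other helper names are hypothetical; for real use, the generated code from section 2.4 is the safer choice.

```python
from typing import Dict, Iterator, List, Tuple

# Field numbers of DanmakuElem in dm.proto -> field names
ELEM_FIELDS = {1: "id", 2: "progress", 3: "mode", 4: "fontsize", 5: "color",
               6: "midHash", 7: "content", 8: "ctime", 9: "weight",
               10: "action", 11: "pool", 12: "idStr"}
STRING_FIELDS = {6, 7, 10, 12}  # declared as `string` in dm.proto


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read a base-128 varint at data[pos]; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(data: bytes) -> Iterator[Tuple[int, int, object]]:
    """Yield (field_number, wire_type, raw_value) for every field in a message."""
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                      # varint: int32, int64, uint32, ...
            value, pos = read_varint(data, pos)
        elif wire_type == 2:                    # length-delimited: string, nested message
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 1:                    # fixed 64-bit
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 5:                    # fixed 32-bit
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield number, wire_type, value


def decode_elem(data: bytes) -> Dict[str, object]:
    """Decode one DanmakuElem; varints are left unsigned."""
    elem: Dict[str, object] = {}
    for number, _, value in iter_fields(data):
        name = ELEM_FIELDS.get(number)
        if name is None:
            continue  # field number not in dm.proto: ignore it
        elem[name] = value.decode("utf-8") if number in STRING_FIELDS else value
    return elem


def decode_seg(data: bytes) -> List[Dict[str, object]]:
    """Decode DmSegMobileReply: field 1 (`elems`) is a repeated, length-delimited DanmakuElem."""
    return [decode_elem(value) for number, wire_type, value in iter_fields(data)
            if number == 1 and wire_type == 2]
```

Passing the raw bytes of a seg.so response to `decode_seg` should yield a list of dicts keyed by the field names in `dm.proto`; fields left at their default value are simply absent, because proto3 does not serialize them.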
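Name | Meaning | Type | Notes |
---|---|---|---|
id | Danmaku dmID | int64 | Unique; can be used as an operation parameter |
progress | Time at which the danmaku appears in the video | int32 | Milliseconds |
mode | Danmaku type | int32 | 1, 2, 3: normal; 4: bottom; 5: top; 6: reverse; 7: advanced; 8: code; 9: BAS |
fontsize | Font size | int32 | 18: small; 25: standard; 36: large |
color | Danmaku color | uint32 | RGB888 value as a decimal integer |
midHash | Hash of the sender's UID | string | Used to block a user or list all danmaku they sent; can also be used to look up the user ID |
content | Danmaku text | string | UTF-8 encoded |
ctime | Send time | int64 | Timestamp |
weight | Weight | int32 | Used for the smart-blocking level |
action | Action | string | Unknown |
pool | Danmaku pool | int32 | 0: normal pool; 1: subtitle pool; 2: special pool (code/BAS danmaku) |
idStr | dmID as a string | string | Unique; can be used as an operation parameter |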
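The encodings in this table can be turned into something readable. A small sketch, assuming an element dict like those produced by `MessageToDict` in `dm_pj.py` below; `describe` and the lookup tables are hypothetical names, with values copied from the table. Fields at their default value are absent from such dicts (proto3 does not serialize zero values), hence the `.get()` defaults; int64 fields such as `ctime` come out of `MessageToDict` as strings, so this sketch leaves them out.

```python
from typing import Dict

# Lookup tables copied from the field table above
MODE_NAMES = {1: "normal", 2: "normal", 3: "normal", 4: "bottom", 5: "top",
              6: "reverse", 7: "advanced", 8: "code", 9: "BAS"}
FONT_SIZES = {18: "small", 25: "standard", 36: "large"}
POOL_NAMES = {0: "normal", 1: "subtitle", 2: "special"}


def describe(elem: Dict) -> str:
    """Render one decoded DanmakuElem as a readable line."""
    progress_ms = elem.get("progress", 0)            # milliseconds into the video
    minutes, seconds = divmod(progress_ms // 1000, 60)
    color = "#{:06X}".format(elem.get("color", 0))   # decimal RGB888 -> #RRGGBB
    mode = MODE_NAMES.get(elem.get("mode"), "unknown")
    size = FONT_SIZES.get(elem.get("fontsize"), "unknown")
    pool = POOL_NAMES.get(elem.get("pool", 0), "unknown")
    return f"[{minutes:02d}:{seconds:02d}] {mode}/{size}/{pool} {color} {elem.get('content', '')}"


print(describe({"progress": 65000, "mode": 1, "fontsize": 25,
                "color": 16777215, "content": "hi"}))
# [01:05] normal/standard/normal #FFFFFF hi
```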
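2.4 Parsing seg.so danmaku data
Example video: https://www.bilibili.com/video/av98919207
Before parsing, install the Python protobuf package: `pip install protobuf`
Compile the proto definition:
```shell
protoc --python_out=. dm.proto
```
This generates `dm_pb2.py`, which the code imports.
Note:
- Real-time danmaku need no cookie; requesting seg.so directly returns the data.
- Historical danmaku require a cookie (a logged-in `SESSDATA`) to get seg.so.

The code of `dm_pj.py`: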
```python
# -*- coding: utf-8 -*-
# @File        : dm_pj.py
# @description : Fetch Bilibili danmaku (seg.so) and decode them with protobuf
import json

import requests
from google.protobuf.json_format import MessageToDict

from dm_pb2 import DmSegMobileReply  # generated by: protoc --python_out=. dm.proto

# Historical danmaku need a logged-in cookie: replace the placeholder with your own SESSDATA
b_web_cookie = 'SESSDATA=<your SESSDATA>;'


def get_date_list():
    """Print the dates in 2022-02 that have historical danmaku (requires the cookie)."""
    url = "https://api.bilibili.com/x/v2/dm/history/index?type=1&oid=168855206&month=2022-02"
    headers = {'cookie': b_web_cookie}
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    print(json.dumps(response.json(), ensure_ascii=False, indent=4))


def parse_seg(content):
    """Decode a binary seg.so payload into a plain dict."""
    dm = DmSegMobileReply()
    dm.ParseFromString(content)
    return MessageToDict(dm)


def print_contents(data_dict):
    # proto3 omits empty fields, so use .get() rather than indexing
    for elem in data_dict.get('elems', []):
        print(elem.get('content', ''))


def dm_real_time():
    """Real-time danmaku: no cookie needed."""
    url_real_time = 'https://api.bilibili.com/x/v2/dm/web/seg.so?type=1&oid=168855206&pid=98919207&segment_index=1'
    resp = requests.get(url_real_time)
    resp.raise_for_status()
    print_contents(parse_seg(resp.content))


def dm_history():
    """Historical danmaku for one day: requires the cookie."""
    url_history = 'https://api.bilibili.com/x/v2/dm/web/history/seg.so?type=1&oid=168855206&date=2022-02-23'
    headers = {'cookie': b_web_cookie}
    resp = requests.get(url_history, headers=headers)
    resp.raise_for_status()
    print_contents(parse_seg(resp.content))


if __name__ == '__main__':
    # Uncomment the request you want to try
    # dm_real_time()
    get_date_list()
    # dm_history()
```
(Figure: screenshot of the execution result)
(Figure: danmaku comparison)
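The script above fetches only the first real-time segment. Here is a sketch for building the request URLs for a whole video, under two assumptions worth checking against the API notes listed in the references: `oid` is the video's cid and `pid` its av number (98919207 matches the example video av98919207), and each `segment_index` (starting at 1) covers 6 minutes of video. `SEGMENT_SECONDS`, `real_time_urls` and `history_url` are hypothetical names; the endpoints and query parameters are the ones used in `dm_pj.py`.

```python
import math
from typing import List, Optional
from urllib.parse import urlencode

REAL_TIME_URL = "https://api.bilibili.com/x/v2/dm/web/seg.so"
HISTORY_URL = "https://api.bilibili.com/x/v2/dm/web/history/seg.so"
SEGMENT_SECONDS = 6 * 60  # assumption: one segment_index covers 6 minutes


def real_time_urls(oid: int, duration_s: int, pid: Optional[int] = None) -> List[str]:
    """One real-time seg.so URL per segment (no cookie needed)."""
    count = max(1, math.ceil(duration_s / SEGMENT_SECONDS))
    urls = []
    for index in range(1, count + 1):  # segment_index starts at 1
        params = {"type": 1, "oid": oid}
        if pid is not None:
            params["pid"] = pid
        params["segment_index"] = index
        urls.append(f"{REAL_TIME_URL}?{urlencode(params)}")
    return urls


def history_url(oid: int, date: str) -> str:
    """Historical seg.so URL for one day (YYYY-MM-DD); send it with the SESSDATA cookie."""
    return f"{HISTORY_URL}?{urlencode({'type': 1, 'oid': oid, 'date': date})}"


for url in real_time_urls(168855206, 400, pid=98919207):
    print(url)
```

Each URL can then be fetched with `requests.get` and decoded with `DmSegMobileReply.ParseFromString`, exactly as `dm_pj.py` does for the first segment.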