Parsing the Protobuf Format of Bilibili Real-Time and Historical Danmaku
2022-07-06 15:59:00 【擒賊先擒王】
References:
- https://zhuanlan.zhihu.com/p/392931611
- https://gitee.com/nbody1996/bilibili-API-collect/blob/master/danmaku/danmaku_proto.md
- Bilibili historical danmaku: https://www.cnblogs.com/mollnn/p/14964905.html
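Bilibili has changed the transport format of its danmaku (bullet comments) from XML to protobuf. Protobuf is a binary encoding, and its transfer efficiency is much higher than that of the original XML, so it has an advantage on mobile clients, where it reduces network load. The downside is that danmaku in this format are much harder to parse: the data returned by the API looks like garbage when viewed directly, and it has to be decoded in a specific way before the real content becomes visible.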
The danmaku API Bilibili used before adopting the protobuf protocol
1. What is Protobuf
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.
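The passage above comes from the introduction on Google's official Protobuf site. Put simply, Protobuf is a serialization format for structured data that is smaller, faster, and simpler than XML. More information: https://developers.google.com/protocol-buffers/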
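One reason protobuf messages are smaller than XML is that integers are written as variable-length "varints": each byte carries 7 bits of the number, and the high bit signals that more bytes follow. The sketch below only illustrates that idea. The function names `encode_varint` and `decode_varint` are made up for this example and are not part of the protobuf library.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


if __name__ == "__main__":
    encoded = encode_varint(16777215)      # 0xFFFFFF, white as an RGB888 value
    print(encoded.hex(), len(encoded))     # 4 bytes as a varint
    print(len(str(16777215)))              # 8 bytes as decimal text
```

The number 16777215 takes 4 bytes as a varint and 8 bytes as decimal text, before any XML tag or attribute overhead is counted.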
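2. How to parse Protobuf danmaku
2.1 Download the protoc compiler
protoc is the compiler that turns .proto files into source code for various programming languages (such as Python and Golang). It is required for parsing Protobuf data and can be downloaded from: https://github.com/protocolbuffers/protobuf
On Windows, the downloaded archive extracts to an exe file. It needs no installation, but its directory has to be added to Path manually.
To confirm that the installation succeeded, run the following in a terminal: protoc --version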
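The same check can be scripted. This is a minimal sketch using only the Python standard library. The helper name `protoc_version` is hypothetical, and it returns None when protoc cannot be found on the PATH.

```python
import shutil
import subprocess
from typing import Optional


def protoc_version(executable: str = "protoc") -> Optional[str]:
    """Return the output of `protoc --version`, or None if protoc is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return completed.stdout.strip() or None


if __name__ == "__main__":
    version = protoc_version()
    print(version if version else "protoc not found; check your PATH")
```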
2.2 Download Protobuf-Python to parse Protobuf in Python
Download: https://github.com/protocolbuffers/protobuf
After downloading, extract the archive, change into its python directory,
and run the following commands:
python setup.py clean
python setup.py build
python setup.py install
python setup.py test
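Whichever way the package ends up installed, a quick way to confirm that Python can see it is to import it and read its version. A small sketch follows. `protobuf_version` is a hypothetical helper name, and it returns None instead of raising when the package is missing.

```python
import importlib
from typing import Optional


def protobuf_version() -> Optional[str]:
    """Return the installed protobuf package version, or None if it cannot be imported."""
    try:
        module = importlib.import_module("google.protobuf")
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    print(protobuf_version() or "protobuf is not installed")
```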
2.3 Define and compile the danmaku proto
The danmaku format as a protobuf structure:
dm.proto
syntax = "proto3";

package dm;

message DmSegMobileReply {
    repeated DanmakuElem elems = 1;
}

message DanmakuElem {
    int64 id = 1;
    int32 progress = 2;
    int32 mode = 3;
    int32 fontsize = 4;
    uint32 color = 5;
    string midHash = 6;
    string content = 7;
    int64 ctime = 8;
    int32 weight = 9;
    string action = 10;
    int32 pool = 11;
    string idStr = 12;
}
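Name | Meaning | Type | Notes |
---|---|---|---|
id | danmaku dmID | int64 | unique; can be used as an operation parameter |
progress | time at which the danmaku appears in the video | int32 | milliseconds |
mode | danmaku type | int32 | 1, 2, 3: normal; 4: bottom; 5: top; 6: reverse; 7: advanced; 8: code; 9: BAS |
fontsize | danmaku font size | int32 | 18: small; 25: standard; 36: large |
color | danmaku color | uint32 | decimal RGB888 value |
midHash | hash of the sender's UID | string | used to block a user and to view all danmaku sent by a user; can also be reversed to look up the user ID |
content | danmaku text | string | UTF-8 encoded |
ctime | time the danmaku was sent | int64 | timestamp |
weight | weight | int32 | used for the smart-blocking level |
action | action | string | unknown |
pool | danmaku pool | int32 | 0: normal pool; 1: subtitle pool; 2: special pool (code/BAS danmaku) |
idStr | danmaku dmID as a string | string | unique; can be used as an operation parameter |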
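The raw field values in the table are easier to read after a little conversion. The sketch below turns `progress` into a minutes:seconds offset, `color` into a hex RGB string, and `mode` into a label taken from the table. The helper names and the `MODE_NAMES` mapping are illustrative only and are not part of any Bilibili or protobuf API.

```python
# Labels follow the "mode" row of the table above
MODE_NAMES = {
    1: "normal", 2: "normal", 3: "normal",
    4: "bottom", 5: "top", 6: "reverse",
    7: "advanced", 8: "code", 9: "BAS",
}


def format_progress(progress_ms: int) -> str:
    """Convert a progress value in milliseconds to 'MM:SS.mmm'."""
    seconds, millis = divmod(progress_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def format_color(color: int) -> str:
    """Convert a decimal RGB888 value to a '#RRGGBB' string."""
    return f"#{color & 0xFFFFFF:06X}"


def mode_name(mode: int) -> str:
    """Map a mode number to a readable label, or 'unknown' if it is not in the table."""
    return MODE_NAMES.get(mode, "unknown")


if __name__ == "__main__":
    # A top danmaku shown in white at 1 minute 23.5 seconds
    print(format_progress(83500), format_color(16777215), mode_name(5))
```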
2.4 Parse danmaku data in seg.so format
Sample video: https://www.bilibili.com/video/av98919207
Before parsing, install the Python protobuf package: pip install protobuf
Compile the proto definition file:
protoc --python_out=. dm.proto
This generates dm_pb2.py, which the script imports.
Note:
- Real-time danmaku need no cookie; a plain request returns seg.so
- Historical danmaku require a cookie to obtain seg.so
The code of dm_pj.py is as follows:
# -*- coding: utf-8 -*-
# @Author :
# @Date :
# @File : dm_pj.py
# @description : XXX
import json

import requests
from dm_pb2 import DmSegMobileReply
from google.protobuf.json_format import MessageToJson

# Login cookie, needed only by the historical danmaku endpoints; replace with your own
b_web_cookie = 'SESSDATA=fd25e2e6%2C1660373048%2C287c9%2A21;'


def get_date_list():
    """Print the response of the history index endpoint for one month."""
    url = "https://api.bilibili.com/x/v2/dm/history/index?type=1&oid=168855206&month=2022-02"
    headers = {
        'cookie': b_web_cookie
    }
    response = requests.get(url, headers=headers)
    print(json.dumps(response.json(), ensure_ascii=False, indent=4))


def print_danmaku(raw_bytes):
    """Decode a seg.so response body and print the text of every danmaku."""
    dm = DmSegMobileReply()
    dm.ParseFromString(raw_bytes)
    data_dict = json.loads(MessageToJson(dm))
    # print(data_dict)
    for elem in data_dict.get('elems', []):
        # MessageToJson omits empty strings by default, so 'content' may be absent
        print(elem.get('content', ''))


def dm_real_time():
    """Fetch and print real-time danmaku (no cookie needed)."""
    url_real_time = 'https://api.bilibili.com/x/v2/dm/web/seg.so?type=1&oid=168855206&pid=98919207&segment_index=1'
    resp = requests.get(url_real_time)
    print_danmaku(resp.content)


def dm_history():
    """Fetch and print historical danmaku for one date (cookie required)."""
    url_history = 'https://api.bilibili.com/x/v2/dm/web/history/seg.so?type=1&oid=168855206&date=2022-02-23'
    headers = {
        'cookie': b_web_cookie
    }
    resp = requests.get(url_history, headers=headers)
    print_danmaku(resp.content)


if __name__ == '__main__':
    # dm_real_time()
    get_date_list()
    # dm_history()
[Screenshot: execution result]
[Screenshot: danmaku comparison]
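To see what ParseFromString does with the bytes in the script above, the sketch below decodes a DmSegMobileReply by hand, using only the field numbers from dm.proto. It is a deliberately minimal illustration and not a replacement for the generated dm_pb2 code: it handles only the two wire types that dm.proto needs (varint and length-delimited), and every function name in it is made up for this example.

```python
from typing import Dict, Iterator, List, Tuple, Union

# Field numbers and names copied from DanmakuElem in dm.proto
ELEM_FIELDS = {
    1: "id", 2: "progress", 3: "mode", 4: "fontsize", 5: "color",
    6: "midHash", 7: "content", 8: "ctime", 9: "weight",
    10: "action", 11: "pool", 12: "idStr",
}


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one base-128 varint; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(data: bytes) -> Iterator[Tuple[int, Union[int, bytes]]]:
    """Yield (field number, raw value) for every field in a message."""
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_no, wire_type = key >> 3, key & 0x07
        if wire_type == 0:            # varint: int32, int64, uint32
            value, pos = read_varint(data, pos)
            yield field_no, value
        elif wire_type == 2:          # length-delimited: string, sub-message
            length, pos = read_varint(data, pos)
            yield field_no, data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")


def decode_elem(data: bytes) -> Dict[str, Union[int, str]]:
    """Decode one DanmakuElem into a dict keyed by field name."""
    elem: Dict[str, Union[int, str]] = {}
    for field_no, value in iter_fields(data):
        name = ELEM_FIELDS.get(field_no)
        if name is None:
            continue                  # skip fields this sketch does not know
        if isinstance(value, bytes):
            elem[name] = value.decode("utf-8")
        elif value >= 1 << 63:
            elem[name] = value - (1 << 64)  # negatives are 64-bit two's complement
        else:
            elem[name] = value
    return elem


def decode_reply(data: bytes) -> List[Dict[str, Union[int, str]]]:
    """Decode DmSegMobileReply: field 1 is the repeated DanmakuElem list."""
    return [decode_elem(value) for field_no, value in iter_fields(data)
            if field_no == 1 and isinstance(value, bytes)]


if __name__ == "__main__":
    # Hand-built sample: one element with progress=1000 and content="hi"
    elem = bytes([0x10, 0xE8, 0x07, 0x3A, 0x02]) + b"hi"
    reply = bytes([0x0A, len(elem)]) + elem
    print(decode_reply(reply))
```

For real responses the generated dm_pb2 classes remain the safer choice, since they also cope with wire types and fields that this sketch ignores.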