Parsing the Protobuf format of Bilibili real-time and historical danmaku
2022-07-06 09:27:00 【擒贼先擒王】
References:
- https://zhuanlan.zhihu.com/p/392931611
- https://gitee.com/nbody1996/bilibili-API-collect/blob/master/danmaku/danmaku_proto.md
- Bilibili historical danmaku: https://www.cnblogs.com/mollnn/p/14964905.html
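Bilibili has changed the transport format of its danmaku (bullet comments) from XML to protobuf. Protobuf is a binary encoding whose transmission efficiency is far higher than that of XML, so it reduces network load, which is an advantage on mobile clients. The drawback is that danmaku in this format are much harder to parse: the raw data returned by the API looks like garbage when viewed directly, and it has to be decoded with the matching message definition before the actual content becomes readable.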
The danmaku API that Bilibili used before adopting the protobuf protocol
1. What is Protobuf
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.
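The passage above comes from the introduction on Google's Protobuf site. In short, Protobuf is a data serialization format that is smaller, faster, and simpler than XML. For more information, see: https://developers.google.com/protocol-buffers/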
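To make the "smaller than XML" claim concrete, here is a minimal sketch that hand-encodes one string field the way the protobuf wire format does (a varint tag, a varint length, then the UTF-8 bytes) and compares its size with an XML fragment. The helper names and the sample XML element are illustrative assumptions, not part of any library or of Bilibili's API.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string field: tag, payload length, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Field number 7 with a short text, next to a hypothetical XML equivalent.
pb_bytes = encode_string_field(7, "hello")
xml_bytes = '<d content="hello"/>'.encode("utf-8")
print(pb_bytes.hex(), len(pb_bytes), len(xml_bytes))
```

The binary form carries only a field number and a length in place of tag and attribute names, which is also why the raw response is unreadable without the message definition.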
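2. How to parse Protobuf danmaku
2.1 Download the protoc compiler
protoc is the compiler that turns .proto files into source code for various programming languages (such as Python and Golang). It is a prerequisite for parsing Protobuf data and can be downloaded from: https://github.com/protocolbuffers/protobuf
After downloading, unpack the archive. On Windows it contains an exe file that needs no installation, but its directory must be added to Path manually.
Run the following command in a terminal to confirm that the installation succeeded: protoc --version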
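The same check can be scripted. The sketch below looks for protoc on the PATH and returns whatever `protoc --version` prints; the function name `protoc_version` is an illustrative assumption.

```python
import shutil
import subprocess


def protoc_version(executable: str = "protoc"):
    """Return the output of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Fall back to stderr in case the tool reports its version there.
    return result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    print(protoc_version() or "protoc was not found on PATH")
```

If this prints the "not found" message, the directory containing the protoc executable has not been added to Path yet.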
2.2 Download Protobuf-Python to parse Protobuf in Python
Download: https://github.com/protocolbuffers/protobuf
After downloading, unpack the archive and change into its python directory,
then run the following commands:
python setup.py clean
python setup.py build
python setup.py install
python setup.py test
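To confirm that the Python runtime ended up installed, one option is to query the package metadata. This is a small sketch using only the standard library (Python 3.8 or later); the function name `installed_version` is an illustrative assumption.

```python
from importlib import metadata


def installed_version(package: str = "protobuf"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version() or "protobuf is not installed")
```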
2.3 Define and compile the danmaku proto
The danmaku format as a protobuf message definition:
dm.proto
syntax = "proto3";

package dm;

message DmSegMobileReply {
    repeated DanmakuElem elems = 1;
}

message DanmakuElem {
    int64 id = 1;
    int32 progress = 2;
    int32 mode = 3;
    int32 fontsize = 4;
    uint32 color = 5;
    string midHash = 6;
    string content = 7;
    int64 ctime = 8;
    int32 weight = 9;
    string action = 10;
    int32 pool = 11;
    string idStr = 12;
}
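Name | Meaning | Type | Notes |
---|---|---|---|
id | Danmaku dmID | int64 | Unique; can be used as a parameter for operations |
progress | Time at which the danmaku appears in the video | int32 | Milliseconds |
mode | Danmaku type | int32 | 1, 2, 3: normal danmaku; 4: bottom danmaku; 5: top danmaku; 6: reverse danmaku; 7: advanced danmaku; 8: code danmaku; 9: BAS danmaku |
fontsize | Danmaku font size | int32 | 18: small; 25: standard; 36: large |
color | Danmaku color | uint32 | RGB888 value in decimal |
midHash | Hash of the sender's UID | string | Used to block a user and to view all danmaku sent by a user; can also be reversed to look up the user ID |
content | Danmaku text | string | UTF-8 encoded |
ctime | Time the danmaku was sent | int64 | Timestamp |
weight | Weight | int32 | Used for the smart-blocking level |
action | Action | string | Unknown |
pool | Danmaku pool | int32 | 0: normal pool; 1: subtitle pool; 2: special pool (code/BAS danmaku) |
idStr | dmID as a string | string | Unique; can be used as a parameter for operations |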
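The raw field values in the table are not very readable on their own. The sketch below shows one possible way to present them: `progress` as a mm:ss.mmm offset, `color` as a hex RGB string, and `mode` as a label. All helper names are illustrative assumptions, and `ctime` is assumed to be a Unix timestamp in seconds, which the table does not state explicitly.

```python
from datetime import datetime, timezone

# Labels follow the mode values listed in the table above.
MODE_NAMES = {
    1: "normal", 2: "normal", 3: "normal",
    4: "bottom", 5: "top", 6: "reverse",
    7: "advanced", 8: "code", 9: "BAS",
}


def format_progress(ms: int) -> str:
    """Convert a millisecond offset into mm:ss.mmm."""
    minutes, rem = divmod(ms, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def color_to_hex(color: int) -> str:
    """Convert a decimal RGB888 value into #RRGGBB."""
    return f"#{color & 0xFFFFFF:06X}"


def ctime_to_utc(ctime: int) -> datetime:
    """Interpret ctime as Unix seconds (an assumption) and return a UTC datetime."""
    return datetime.fromtimestamp(ctime, tz=timezone.utc)


def describe(elem: dict) -> str:
    """Summarize one danmaku given as a dict, e.g. one entry of a JSON dump."""
    # int() guards against numeric fields that arrive as strings in JSON output.
    progress = int(elem.get("progress", 0))
    mode = MODE_NAMES.get(int(elem.get("mode", 0)), "unknown")
    color = color_to_hex(int(elem.get("color", 0)))
    return f"[{format_progress(progress)}] {mode} {color} {elem.get('content', '')}"


sample = {"progress": 83500, "mode": 5, "color": 16777215, "content": "hi"}
print(describe(sample))
```

The `.get` defaults matter because proto3 JSON output omits fields that hold their default value, and 64-bit integers such as `id` and `ctime` are emitted as strings there.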
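2.4 Parsing danmaku data in seg.so format
Sample video: https://www.bilibili.com/video/av98919207
Before parsing, install the Python protobuf package: pip install protobuf
Compile the proto definition file:
protoc --python_out=. dm.proto
This generates dm_pb2.py, which the code then imports.
The code of dm_pj.py follows.
Note:
- Real-time danmaku do not require a cookie; seg.so can be requested directly
- Historical danmaku require a cookie to obtain seg.so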
# -*- coding: utf-8 -*-
# @Author :
# @Date :
# @File : dm_pj.py
# @description : XXX

import json

import requests
from dm_pb2 import DmSegMobileReply
from google.protobuf.json_format import MessageToJson

# Replace with the SESSDATA cookie of your own logged-in session.
b_web_cookie = 'SESSDATA=fd25e2e6%2C1660373048%2C287c9%2A21;'


def get_date_list():
    # Query the history index for one month (cookie required).
    url = "https://api.bilibili.com/x/v2/dm/history/index?type=1&oid=168855206&month=2022-02"
    headers = {
        'cookie': b_web_cookie
    }
    response = requests.get(url, headers=headers)
    print(json.dumps(response.json(), ensure_ascii=False, indent=4))


def dm_real_time():
    # Real-time danmaku: no cookie needed.
    url_real_time = 'https://api.bilibili.com/x/v2/dm/web/seg.so?type=1&oid=168855206&pid=98919207&segment_index=1'
    resp = requests.get(url_real_time)
    DM = DmSegMobileReply()
    DM.ParseFromString(resp.content)
    data_dict = json.loads(MessageToJson(DM))
    # print(data_dict)
    for elem in data_dict.get('elems', []):
        # .get avoids a KeyError when the content field is empty and therefore omitted
        print(elem.get('content', ''))


def dm_history():
    # Historical danmaku for one date: cookie required.
    url_history = 'https://api.bilibili.com/x/v2/dm/web/history/seg.so?type=1&oid=168855206&date=2022-02-23'
    headers = {
        'cookie': b_web_cookie
    }
    resp = requests.get(url_history, headers=headers)
    DM = DmSegMobileReply()
    DM.ParseFromString(resp.content)
    data_dict = json.loads(MessageToJson(DM))
    # print(data_dict)
    for elem in data_dict.get('elems', []):
        print(elem.get('content', ''))


if __name__ == '__main__':
    # dm_real_time()
    get_date_list()
    # dm_history()
[Screenshot: execution result]
[Screenshot: danmaku comparison]
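The three request URLs in dm_pj.py are hard-coded for one video. As a rough sketch, they could be built from parameters instead. The function names are illustrative assumptions; the endpoints and query parameters are exactly those used in the script, where `pid` matches the av number of the sample video.

```python
from urllib.parse import urlencode

API_BASE = "https://api.bilibili.com/x/v2/dm"


def realtime_url(oid: int, pid: int, segment_index: int = 1) -> str:
    """URL of one real-time danmaku segment (no cookie needed)."""
    query = urlencode(
        {"type": 1, "oid": oid, "pid": pid, "segment_index": segment_index}
    )
    return f"{API_BASE}/web/seg.so?{query}"


def history_url(oid: int, date: str) -> str:
    """URL of the historical danmaku for one date, YYYY-MM-DD (cookie needed)."""
    query = urlencode({"type": 1, "oid": oid, "date": date})
    return f"{API_BASE}/web/history/seg.so?{query}"


def history_index_url(oid: int, month: str) -> str:
    """URL of the history index for one month, YYYY-MM (cookie needed)."""
    query = urlencode({"type": 1, "oid": oid, "month": month})
    return f"{API_BASE}/history/index?{query}"


print(realtime_url(168855206, 98919207))
print(history_url(168855206, "2022-02-23"))
print(history_index_url(168855206, "2022-02"))
```

With these helpers, fetching another segment or another date only means changing the arguments rather than editing the URL strings.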