WebRTC audio anti-weak-network technology (Part 1)
2022-07-07 14:34:00 【51CTO】
Everyone claims to be socially anxious, yet everyone loves to chat. Voice social products, with chat rooms as the typical example, have opened up new ways for strangers to socialize and keep producing stories that break out of their original circles. Follow 【Rongyun global Internet communication cloud】 to learn more.
Stuttering, intermittent, fast-forwarded, or slowed-down audio seriously degrades the user experience and directly drives users away. These are all common problems caused by weak networks.
This article analyzes the commonly used weak-network countermeasures from the perspective of audio applications. The main ones are:
- Forward error correction (FEC, RED, etc.)
- Backward error correction (ARQ, PLC, etc.)
- Weak-network resilience features of the encoder (this article focuses on the OPUS encoder)
- Anti-jitter technology (JitterBuffer)
In two articles, and with reference to the weak-network techniques that WebRTC uses or supports for audio, we analyze the technologies above, with the goal of keeping audio communication services highly available in weak network environments.
Part 1 covers forward error correction, backward error correction, and the weak-network features of the OPUS codec. Part 2 covers NetEQ, the anti-jitter module used by WebRTC.
Forward error correction technology
FEC
The most typical forward error correction technique is FEC.
The sender generates redundant packets to counter packet loss during transmission.
The receiver uses the redundant packets and the normally received packets to recover the packets lost in transit.
FEC comes in in-band and out-of-band forms. In WebRTC, video uses out-of-band FEC (ULPFEC[1], FLEXFEC[2]) to generate redundant packets, while audio uses OPUS in-band FEC to generate them.
In-band FEC takes up part of the encoder bit rate, so audio quality is reduced. Out-of-band FEC does not affect sound quality but consumes additional network bandwidth. Each has its own advantages and disadvantages.
Typical FEC coding methods are XOR and Reed-Solomon[3]. WebRTC's out-of-band FEC (ULPFEC and FLEXFEC) uses XOR coding, which is computationally cheap but limited in its ability to resist packet loss.
In WebRTC's out-of-band FEC, both ULPFEC and FLEXFEC use a mask to determine the mapping between FEC packets and the source RTP packets they protect. Two types of mask are defined, RandMask and BurstMask: the former protects better against random packet loss, the latter against consecutive loss caused by bursts. Either way, both have drawbacks. Take the 7-4 mask (7 original packets generating 4 redundant packets) as an example:
Expanding the hexadecimal mask above into binary gives:
The mask above indicates that, from the 7 original packets S1-S7, the sender generates 4 redundant packets R1-R4, where:
- R1 protects the three original packets S3, S4, S5
- R2 protects the three original packets S1, S5, S7
- R3 protects the three original packets S1, S2, S6
- R4 protects the three original packets S2, S3, S7
As can be seen, every original packet is protected by redundant packets. When packets are lost, they can generally be recovered from the redundant packets and the original packets that did arrive. For example, suppose the sender sends 11 packets (S1-S7 and R1-R4) and the receiver gets 8 of them (S1, S3, S5, S7, R1, R2, R3, R4), losing S2, S4, and S6. They are repaired as follows:
- S2 is repaired from R4, S3, and S7: S2 = R4 XOR S3 XOR S7
- S4 is repaired from R1, S3, and S5: S4 = R1 XOR S3 XOR S5
- S6 is repaired from R3, S1, and the already recovered S2: S6 = R3 XOR S1 XOR S2
Some loss patterns, however, cannot be repaired. For example, if S1, S2, and S7 are lost, none of them can be recovered, for the following reasons:
According to the mask, S1 can be recovered through R2, S5, S7 or through R3, S2, S6. But because S7 and S2 are both lost, recovering S1 requires first recovering S2 or S7.
Likewise, S2 can be recovered through R3, S1, S6 or through R4, S3, S7, but because S1 and S7 are lost, one of them must be recovered first.
Similarly, S7 can be recovered through R2, S1, S5 or through R4, S2, S3, but S1 or S2 must be recovered first.
The three lost packets therefore depend on one another, and none of S1, S2, and S7 can be recovered.
In the same way, if S3, S5, and S7 are lost, they cannot be recovered. This is the technical drawback of WebRTC using a mask to determine the protection relationship between redundant packets and original packets.
That is, for a group of (M original packets + N redundant packets), even when the number of lost packets is less than or equal to N, it may be impossible to recover them.
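The repair walk-through above can be reproduced with a short simulation. This is a minimal sketch, not WebRTC's implementation: the `MASK` table is simply the R1-R4 protection list from the text, the function names are hypothetical, and the decoder is a plain iterative one that repairs a packet only when a redundant packet has exactly one missing member.

```python
from functools import reduce

# Protection relationships of the 7-4 mask as listed in the text:
# each redundant packet is the XOR of the original packets it protects.
MASK = {
    "R1": ["S3", "S4", "S5"],
    "R2": ["S1", "S5", "S7"],
    "R3": ["S1", "S2", "S6"],
    "R4": ["S2", "S3", "S7"],
}


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length payloads byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(sources: dict) -> dict:
    """Build R1-R4 from S1-S7 according to MASK."""
    return {r: reduce(xor_bytes, (sources[s] for s in prot))
            for r, prot in MASK.items()}


def recover(received: dict) -> dict:
    """Iteratively repair lost source packets.

    A redundant packet can repair a source packet only when exactly one
    of the packets it protects is missing.
    """
    have = dict(received)
    progress = True
    while progress:
        progress = False
        for r, prot in MASK.items():
            if r not in have:
                continue
            missing = [s for s in prot if s not in have]
            if len(missing) == 1:
                known = [have[s] for s in prot if s in have]
                have[missing[0]] = reduce(xor_bytes, known, have[r])
                progress = True
    return {k: v for k, v in have.items() if k.startswith("S")}


def simulate(lost: set) -> list:
    """Return the sorted names of source packets still missing after repair."""
    sources = {f"S{i}": bytes([i] * 4) for i in range(1, 8)}  # dummy payloads
    sent = {**sources, **encode(sources)}
    received = {k: v for k, v in sent.items() if k not in lost}
    repaired = recover(received)
    assert all(repaired[k] == sources[k] for k in repaired)
    return sorted(set(sources) - set(repaired))


print(simulate({"S2", "S4", "S6"}))  # [] (everything repaired)
print(simulate({"S1", "S2", "S7"}))  # ['S1', 'S2', 'S7']
```

In the second case every redundant packet that covers a lost packet is missing two of its members, so the decoder never finds a starting point, which is exactly the circular dependency described above.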
Reed-Solomon coding, by contrast, can recover the lost packets of a group of (M original packets + N redundant packets) whenever the number of lost packets is less than or equal to N.
RS FEC mainly uses a Vandermonde matrix or a Cauchy matrix for encoding and decoding[4]. The Cauchy matrix is cheaper to compute with than the Vandermonde matrix and performs better. Whichever matrix is used, they share one property: the matrix is invertible, and so is any of its submatrices. This guarantees that RS can restore the data whenever N or fewer packets are lost.
The following briefly describes the Vandermonde approach, taking (7,4) as an example: 7 original packets generate 4 redundant packets. The original packets are S1, S2, S3, S4, S5, S6, S7 and the redundant packets are R1, R2, R3, R4. The relationship between the original and redundant packets is as follows:
Graph
The Vandermonde matrix above is denoted A, as shown below:
Graph
The identity matrix is expressed as follows:
Graph
Suppose the two packets S2 and S4 are lost. Deleting the corresponding rows of the identity matrix from Formula 1 gives:
Graph
Denote the matrix on the left side of Formula 2 as B:
Graph
Because the Vandermonde matrix is invertible, B is also invertible; denote its inverse as B'. Recovering the packets is essentially the process of solving for B'. Deriving from Formula 2 as follows yields the original packets:
Graph
That is, any of the packets (S1, S2, S3, S4, S5, S6, S7) can be obtained from the matrix B' and the received packets. RS therefore offers stronger protection.
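The matrix-based recovery can be illustrated with a small erasure code. The sketch below makes several simplifying assumptions that are not from the article: it uses a Cauchy matrix rather than the Vandermonde matrix of the figures (every square submatrix of a Cauchy matrix is invertible, which keeps the demo short), it works over the prime field GF(257) instead of the GF(2^8) arithmetic that practical RS codecs use, and all names (`encode`, `decode`, `solve_mod`) are hypothetical.

```python
P = 257      # small prime field, chosen only to keep the arithmetic readable
M, N = 7, 4  # 7 original symbols, 4 redundant symbols

# Cauchy matrix A[i][j] = 1 / (x_i - y_j) mod P with all x_i, y_j distinct.
X = list(range(N))         # 0..3
Y = list(range(N, N + M))  # 4..10
A = [[pow((x - y) % P, -1, P) for y in Y] for x in X]


def encode(data):
    """Return the N redundant symbols R = A * S (mod P)."""
    return [sum(a * s for a, s in zip(row, data)) % P for row in A]


def solve_mod(matrix, rhs):
    """Solve matrix * x = rhs over GF(P) by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * p) % P for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def decode(received_data, received_parity):
    """Recover all M symbols.

    received_data: {index: value} for the original symbols that arrived.
    received_parity: {index: value} for the redundant symbols that arrived.
    """
    lost = [j for j in range(M) if j not in received_data]
    rows = list(received_parity)[:len(lost)]
    if len(rows) < len(lost):
        raise ValueError("more losses than redundant symbols received")
    # Move the known symbols to the right-hand side; what remains is a
    # square Cauchy submatrix acting on the lost symbols only.
    matrix = [[A[i][j] for j in lost] for i in rows]
    rhs = [(received_parity[i]
            - sum(A[i][j] * v for j, v in received_data.items())) % P
           for i in rows]
    out = dict(received_data)
    out.update(zip(lost, solve_mod(matrix, rhs)))
    return [out[j] for j in range(M)]


data = [10, 20, 30, 40, 50, 60, 70]
parity = encode(data)
# Lose S1, S2, S7 and R4: four losses, including the pattern that the
# XOR mask example could not repair.
got_data = {j: data[j] for j in (2, 3, 4, 5)}
got_parity = {i: parity[i] for i in (0, 1, 2)}
print(decode(got_data, got_parity) == data)  # True
```

The inverse matrix B' from the derivation corresponds here to the Gauss-Jordan step inside `solve_mod`. A parity value can reach 256 in this toy field, which does not fit in one byte; that is one reason real codecs work in GF(2^8).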
RED[5]
RED is also a form of forward error correction: the sender proactively sends redundant data to resist, to some extent, packet loss in the transmission network. The decoder can recover lost packets from the redundant packets. RED is specified in RFC 2198 and can be used to generate redundant packets for both video and audio. WebRTC audio has RED turned on as of M96.
The RED payload contains not only the current packet but also historical packets, so the payload carries a degree of redundancy that provides packet loss resistance.
Here is a brief introduction to the RED packet format, starting with the RED block header:
Here is an example of a RED packet:
WebRTC uses RED packets to generate redundant audio packets. The principle is as follows:
Graph
In the figure above, each time the sender sends the current packet it also carries the previous packet as redundancy. When RED4 (which carries packets 4 and 3) is lost, the subsequent RED5 arrives carrying packets 5 and 4. Combined with the earlier RED3 (carrying packets 3 and 2), the lost packets can be recovered.
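To make the RED payload layout concrete, here is a minimal sketch that packs and unpacks it following the block header layout in RFC 2198: each redundant block has a 4-byte header (1-bit F flag, 7-bit block payload type, 14-bit timestamp offset, 10-bit block length), and the primary block has a 1-byte header with F = 0. The function names, the payload type 111, and the 960-sample timestamp offset (20 ms at 48 kHz) are assumptions for illustration, not values taken from the article.

```python
import struct


def pack_red(primary_pt, primary, redundant=()):
    """Build a RED payload (RFC 2198).

    redundant: iterable of (payload_type, timestamp_offset, data),
    oldest first. The primary block always comes last.
    """
    header = b""
    body = b""
    for pt, ts_offset, data in redundant:
        if not (0 <= ts_offset < 1 << 14 and len(data) < 1 << 10):
            raise ValueError("offset or length does not fit the RED header")
        # F=1 | 7-bit PT | 14-bit timestamp offset | 10-bit block length
        word = (1 << 31) | (pt & 0x7F) << 24 | ts_offset << 10 | len(data)
        header += struct.pack("!I", word)
        body += data
    header += bytes([primary_pt & 0x7F])  # F = 0 marks the last header
    return header + body + primary


def unpack_red(payload):
    """Return a list of (payload_type, timestamp_offset, data), primary last."""
    headers = []
    pos = 0
    while payload[pos] & 0x80:  # F bit set: a 4-byte header follows
        (word,) = struct.unpack_from("!I", payload, pos)
        headers.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF))
        pos += 4
    primary_pt = payload[pos] & 0x7F
    pos += 1
    blocks = []
    for pt, ts_offset, length in headers:
        blocks.append((pt, ts_offset, payload[pos:pos + length]))
        pos += length
    blocks.append((primary_pt, 0, payload[pos:]))
    return blocks


# RED5 carries frame 5 as primary data and frame 4 as redundancy.
red5 = pack_red(111, b"frame5", [(111, 960, b"frame4")])
print(unpack_red(red5))
# [(111, 960, b'frame4'), (111, 0, b'frame5')]
```

If RED4 is lost, a receiver that unpacks RED5 this way still obtains frame 4 from the redundant block, which is the recovery path described above.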
Backward error correction technology
ARQ
ARQ is packet loss retransmission: the receiver recovers lost packets by asking the sender to resend them.
Compared with forward error correction it adds more delay, so it is the more appropriate choice when the network delay is small.
The principle is as follows:
Graph
When packet 3 is sent for the first time, the receiver does not receive it and sends the sender a retransmission request for packet 3 (WebRTC uses the NACK RTCP packet). On receiving the retransmission request, the sender resends packet 3.
The following is a brief introduction to the NACK[6] RTCP format used in WebRTC. NACK RTCP is described in RFC 4585. NACK is a feedback message, with the following format:
There are two main PT (packet type) values:
The FCI format for NACK is as follows:
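As a sketch of how lost sequence numbers map onto the NACK FCI, the code below follows the generic NACK layout in RFC 4585: each 4-byte FCI entry holds a 16-bit PID (the sequence number of a lost packet) and a 16-bit BLP bitmask in which bit i (counting from 1 at the least significant bit) marks packet PID + i as lost too. In RFC 4585 the generic NACK is carried in a transport-layer feedback message (RTPFB, PT = 205) with FMT = 1; payload-specific feedback uses PSFB, PT = 206. The function names are hypothetical, and only the FCI part is built, not the full RTCP header.

```python
import struct


def build_nack_fci(lost_seqs):
    """Pack lost RTP sequence numbers into generic NACK FCI entries.

    Each entry is 4 bytes: PID (first lost sequence number) and BLP, a
    bitmask in which bit i-1 set means packet PID + i is also lost.
    """
    entries = []
    pid, blp = None, 0
    for seq in lost_seqs:  # assumed unique and in sending order
        if pid is not None:
            diff = (seq - pid) & 0xFFFF  # handles 16-bit wrap-around
            if 1 <= diff <= 16:
                blp |= 1 << (diff - 1)
                continue
            entries.append((pid, blp))
        pid, blp = seq & 0xFFFF, 0
    if pid is not None:
        entries.append((pid, blp))
    return b"".join(struct.pack("!HH", p, b) for p, b in entries)


def parse_nack_fci(fci):
    """Expand FCI entries back into the list of lost sequence numbers."""
    lost = []
    for pid, blp in struct.iter_unpack("!HH", fci):
        lost.append(pid)
        lost.extend((pid + i) & 0xFFFF
                    for i in range(1, 17) if blp >> (i - 1) & 1)
    return lost


fci = build_nack_fci([3, 5, 6, 100])
print(fci.hex())            # 0003000600640000
print(parse_nack_fci(fci))  # [3, 5, 6, 100]
```

Packets 3, 5, and 6 fit into one entry (PID = 3, BLP = 0b110), while packet 100 is more than 16 sequence numbers away and needs an entry of its own.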
PLC
PLC stands for packet loss concealment. It runs at the receiving end, that is, in the decoder. The decoder analyzes the historical speech frames and models them with linear prediction coefficients (LPC) to predict the lost speech frames. The technique is feasible because speech is similar over short time spans.
Its advantage is that it needs no extra bandwidth. PLC can handle small packet loss rates (<15%).
Packet loss concealment in NetEQ builds an LPC model from the linear prediction coefficients of the previous speech frame, reconstructs the speech signal from the historical speech signal, and then adds a certain amount of random noise.
For consecutive concealed packets, the same LPC coefficients are used to reconstruct the speech signal. Note that the correlation between successive reconstructed signals needs to be reduced, so the energy of the packets generated by concealment decreases progressively.
Finally, the result is smoothed for speech continuity. When packet loss compensation is needed, the most recent frame is taken from the speech buffer that stores the latest 70 ms and used to compute the LPC coefficients.
Both WebRTC's NetEQ module and the OPUS decoder have PLC. If the decoder supports PLC, the decoder's PLC is preferred; otherwise NetEQ's PLC is used. The next article explains this in more detail when covering the NetEQ module.
Weak-network resilience features of the OPUS encoder[7]
OPUS is not only an open-source, patent-free codec but also outperforms other codecs. This is why WebRTC usually uses it for audio.
The following describes some OPUS features that are of great help against weak networks.
Support for the full range of audio bandwidths
OPUS supports bit rates from 6 kbps narrowband to 510 kbps high-quality stereo. The figure below shows OPUS covering the range from narrowband to high-quality audio bandwidths, with higher quality at the same bit rate.
Graph
OPUS supports dynamic bit rate adjustment
It can adjust the bit rate seamlessly, and at the same bit rate OPUS sounds better. In addition, when the packet loss rate exceeds a certain range, it switches the coding mode to SILK mode, that is, the low-bit-rate mode, to adapt to the network.
OPUS has lower latency
OPUS combines two codec technologies, SILK (for speech) and CELT (for music), and has the advantage of low latency.
This is essential for use as part of a low-delay audio communication link. The OPUS algorithmic delay can be reduced to 5 milliseconds.
Existing music codecs (for example MP3, Vorbis, and HE-AAC) have delays of 100 milliseconds or more, while OPUS has much lower delay at comparable quality for a given bit rate, as shown in the figure below:
Graph
OPUS supports in-band FEC
OPUS supports in-band FEC. With FEC enabled, redundant packets are generated according to the packet loss rate, improving the audio's resistance to packet loss.
OPUS in-band FEC works in a way similar to RED: when sending the current packet, it also carries the content of the previous packet, except that the previous packet is re-encoded at a low bit rate to form the redundancy. It looks like this:
|1| | -> |2|1| -> |3|2| -> |4|3| -> |5|4| -> |6|5|
Here are several OPUS interfaces related to FEC:
Note that OPUS built-in FEC packets are produced only in SILK mode; in CELT coding mode no redundant packets are generated.
In WebRTC, FEC is enabled through SDP negotiation, as shown below:
The figure below compares OPUS with FEC turned on and turned off[8]:
Graph
As the figure shows, with FEC enabled the audio MOS improves noticeably at 20% packet loss.
The OPUS decoder supports PLC
The OPUS decoder supports packet loss concealment. It relies on the short-term similarity of speech signals: it analyzes the previous frame's normal or recovered speech signal and reconstructs a prediction of the currently lost speech frame.
OPUS supports DTX for voice
When not in music mode, that is, in VoIP mode, DTX can be turned on to save bandwidth when no voice is detected for a certain period.
In that case, when no speech is detected, OPUS sends a silence packet periodically, every 400 ms, which reduces bandwidth. WebRTC does not enable this feature by default. To turn DTX on, add usedtx=1 to the a=fmtp line during SDP negotiation.
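One way to apply the SDP settings mentioned above is to rewrite the OPUS a=fmtp line before the description is set. The following is a minimal sketch under stated assumptions: `set_opus_fmtp` is a hypothetical helper, the sample SDP fragment and payload type 111 are illustrative only, and the parameter names `useinbandfec` and `usedtx` are the ones defined for the Opus RTP payload format.

```python
def set_opus_fmtp(sdp: str, **params) -> str:
    """Return sdp with the given parameters set on the opus a=fmtp line."""
    lines = sdp.split("\r\n")
    # Find the a=rtpmap line that maps a payload type to opus.
    rtpmap_idx = next(
        (i for i, line in enumerate(lines)
         if line.startswith("a=rtpmap:") and " opus/" in line.lower()),
        None,
    )
    if rtpmap_idx is None:
        return sdp  # no opus in this description
    pt = lines[rtpmap_idx][len("a=rtpmap:"):].split(" ", 1)[0]
    prefix = f"a=fmtp:{pt} "
    fmtp_idx = next(
        (i for i, line in enumerate(lines) if line.startswith(prefix)), None
    )
    # Parse the existing key=value pairs, if any, keeping their order.
    current = {}
    if fmtp_idx is not None:
        for item in lines[fmtp_idx][len(prefix):].split(";"):
            key, sep, value = item.strip().partition("=")
            if sep:
                current[key] = value
    current.update({k: str(v) for k, v in params.items()})
    new_line = prefix + ";".join(f"{k}={v}" for k, v in current.items())
    if fmtp_idx is None:
        lines.insert(rtpmap_idx + 1, new_line)
    else:
        lines[fmtp_idx] = new_line
    return "\r\n".join(lines)


sdp = "\r\n".join([
    "m=audio 9 UDP/TLS/RTP/SAVPF 111",
    "a=rtpmap:111 opus/48000/2",
    "a=fmtp:111 minptime=10;useinbandfec=1",
    "",
])
print(set_opus_fmtp(sdp, usedtx=1).split("\r\n")[2])
# a=fmtp:111 minptime=10;useinbandfec=1;usedtx=1
```

The helper looks up the payload type from a=rtpmap instead of hard-coding it, because the number assigned to OPUS is negotiated and can differ between endpoints.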
OPUS itself has many features for resisting weak networks. Combined with packet loss retransmission, these features give audio strong resilience on weak networks.
Drawing on practical experience with weak networks, this article has briefly described and summarized some common audio techniques for weak networks: forward error correction, backward error correction, and the features of the OPUS encoder itself.
Weak-network handling has one more key technique, anti-jitter, which the next article in this series covers in detail.
Reference material :
[1]: https://datatracker.ietf.org/doc/html/rfc5109
[2]: https://datatracker.ietf.org/doc/html/draft-ietf-payload-flexible-fec-scheme-03
[3]: https://tex2e.github.io/rfctranslater/html/rfc5510.html
[4]: https://www.scirp.org/pdf/6-2.16.pdf
[5]: https://datatracker.ietf.org/doc/html/rfc2198
[6]: https://tex2e.github.io/rfc-translater/html/rfc4585.html
[7]: https://ja.wikipedia.org/wiki/OPUS_(%E9%9F%B3%E5%A3%B0%E5%9C%A7%E7%B8%AE)
[8]: https://www.OPUScodec.org/static/presentations/OPUS_voice_aes135.pdf