WebRTC Audio Anti-Weak-Network Technology (Part 1)
2022-07-07 14:34:00 【51CTO】
Nearly everyone claims to be socially anxious, yet everyone loves to chat. Voice social products, typified by chat rooms, have opened up new ways for strangers to socialize and keep producing new breakout stories.
Audio that stutters, cuts in and out, or plays back too fast or too slow seriously degrades the user experience and directly drives users away. These are common problems caused by weak networks.
This article analyzes the commonly used techniques for countering weak networks from the perspective of audio applications. The main ones are:
- Forward error correction (FEC, RED, etc.)
- Backward error correction (ARQ, PLC, etc.)
- Weak-network resilience features of the encoder (this article focuses on the OPUS encoder)
- Anti-jitter technology (JitterBuffer)
Across two articles, we analyze these techniques in the context of the audio anti-weak-network technologies that WebRTC uses or supports, with the goal of keeping audio communication services highly available in weak network environments.
This first part covers forward error correction, backward error correction, and the weak-network resilience features of the OPUS codec. The second part covers NetEQ, the anti-jitter module used by WebRTC.
Forward error correction
FEC
The most typical forward error correction technique is FEC.
Sender: generates redundant packets to counter packet loss during transmission.
Receiver: uses the redundant packets together with the normally received packets to recover the packets lost in transit.
FEC comes in in-band and out-of-band forms. In WebRTC, video uses out-of-band FEC (ULPFEC[1], FLEXFEC[2]) to generate redundant packets, while audio uses OPUS in-band FEC.
In-band FEC takes up part of the encoder's bitrate, so audio quality drops somewhat. Out-of-band FEC does not affect audio quality but consumes additional network bandwidth. Each has its own trade-offs.
Typical FEC coding schemes are XOR and Reed-Solomon[3]. WebRTC's out-of-band FEC (ULPFEC and FLEXFEC) uses XOR coding, which is computationally cheap but has limited ability to withstand packet loss.
In WebRTC's out-of-band FEC, both ULPFEC and FLEXFEC use a mask to define the mapping between FEC packets and the source RTP packets they protect. Two mask types are defined, RandMask and BurstMask. The former protects better against random packet loss; the latter is better for consecutive packet loss caused by bursts. Either way, both have shortcomings. Take the 7-4 mask (7 source packets producing 4 redundant packets) as an example:
[Figure: the 7-4 mask in hexadecimal]
Expanding the hexadecimal mask into binary, with one row per redundant packet and one column per source packet S1 to S7, gives:
R1: 0 0 1 1 1 0 0
R2: 1 0 0 0 1 0 1
R3: 1 1 0 0 0 1 0
R4: 0 1 1 0 0 0 1
The mask says that, from the 7 source packets S1-S7, the sender generates 4 redundant packets R1-R4, where:
- R1 protects the three source packets S3, S4, S5
- R2 protects the three source packets S1, S5, S7
- R3 protects the three source packets S1, S2, S6
- R4 protects the three source packets S2, S3, S7
As the list shows, every source packet is protected by at least one redundant packet. When packets are lost, they can generally be recovered from the redundant packets and the source packets that did arrive. For example, suppose the sender sends the 11 packets S1-S7 and R1-R4, and the receiver gets the 8 packets S1, S3, S5, S7, R1, R2, R3, R4, losing S2, S4, and S6. They are repaired as follows:
- S2 is repaired from R4, S3, S7: S2 = R4 XOR S3 XOR S7
- S4 is repaired from R1, S3, S5: S4 = R1 XOR S3 XOR S5
- S6 is repaired from R3, S1, S2 (using the S2 just recovered): S6 = R3 XOR S1 XOR S2
Some loss patterns, however, cannot be repaired. If S1, S2, and S7 are lost, none of them can be recovered, for the following reasons:
According to the mask, S1 can be recovered from R2, S5, S7 or from R3, S2, S6. But S7 and S2 are both lost, so recovering S1 requires first recovering S2 or S7.
Likewise, S2 can be recovered from R3, S1, S6 or from R4, S3, S7, but S1 and S7 are both lost, so one of them has to be recovered first.
Similarly, S7 can be recovered from R2, S1, S5 or from R4, S2, S3, but S1 or S2 has to be recovered first.
Because of this circular dependency, none of S1, S2, and S7 can be recovered.
By the same reasoning, losing S3, S5, and S7 is also unrecoverable. This is the technical weakness of using a mask to define the protection relationship between redundant and source packets in WebRTC.
In other words, for a group of M source packets plus N redundant packets, the lost packets may not be recoverable even when N or fewer packets are lost.
Reed-Solomon coding, by contrast, can recover the lost packets in a group of M source packets plus N redundant packets whenever N or fewer packets are lost.
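The two loss patterns discussed above can be checked with a small simulation. The following is a minimal sketch, not WebRTC's implementation: the mask rows are taken from the protection list above, while the helper names (`encode`, `recover`, `drop`) and the 4-byte toy payloads are assumptions made for illustration.

```python
from functools import reduce

# Rows are R1..R4, columns are S1..S7, taken from the protection list above.
MASK = [
    [0, 0, 1, 1, 1, 0, 0],  # R1 protects S3, S4, S5
    [1, 0, 0, 0, 1, 0, 1],  # R2 protects S1, S5, S7
    [1, 1, 0, 0, 0, 1, 0],  # R3 protects S1, S2, S6
    [0, 1, 1, 0, 0, 0, 1],  # R4 protects S2, S3, S7
]


def xor_bytes(a, b):
    """XOR two equal-length payloads byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(sources):
    """Build one redundant payload per mask row by XOR-ing the protected sources."""
    return [
        reduce(xor_bytes, (s for s, bit in zip(sources, row) if bit))
        for row in MASK
    ]


def recover(received, redundant):
    """Iteratively repair lost sources (None entries) and return the result.

    A redundant packet can repair a source only when exactly one of the
    sources it protects is missing.
    """
    sources = list(received)
    progress = True
    while progress:
        progress = False
        for row, r in zip(MASK, redundant):
            protected = [i for i, bit in enumerate(row) if bit]
            missing = [i for i in protected if sources[i] is None]
            if len(missing) == 1:
                present = [sources[i] for i in protected if sources[i] is not None]
                sources[missing[0]] = reduce(xor_bytes, present, r)
                progress = True
    return sources


def drop(sources, lost):
    """Simulate loss: replace the 1-based packet numbers in `lost` with None."""
    return [None if i + 1 in lost else s for i, s in enumerate(sources)]


sources = [bytes([i] * 4) for i in range(1, 8)]  # toy payloads for S1..S7
redundant = encode(sources)

print(recover(drop(sources, {2, 4, 6}), redundant) == sources)  # all repaired
print(None in recover(drop(sources, {1, 2, 7}), redundant))     # still missing
```

Both lines print True: the first loss pattern is fully repaired (S6 only after S2 has been recovered), while in the second no redundant packet ever has exactly one missing source, so the loop makes no progress.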
RS FEC encodes and decodes mainly with a Vandermonde matrix or a Cauchy matrix[4]. The Cauchy matrix requires less computation than the Vandermonde matrix and therefore performs better. Whichever matrix is used, the key property is invertibility: the matrix and any of its square submatrices are invertible, which guarantees that RS can recover the data as long as no more than N packets are lost.
The following briefly illustrates the Vandermonde approach using (7,4) as an example, i.e. 7 source packets producing 4 redundant packets. The source packets are S1, S2, S3, S4, S5, S6, S7 and the redundant packets are R1, R2, R3, R4. The relationship between the source and redundant packets is as follows:
[Formula 1: the identity matrix and the Vandermonde matrix, multiplied by the source packets, give the source packets followed by the redundant packets]
The Vandermonde matrix above is denoted A, as shown below:
[Figure: the Vandermonde matrix A]
The identity matrix is as follows:
[Figure: the identity matrix]
Suppose the two packets S2 and S4 are lost. Deleting the corresponding rows of the identity matrix from Formula 1 gives:
[Formula 2: Formula 1 with the rows for S2 and S4 removed]
Denote the matrix on the left-hand side of Formula 2 by B:
[Figure: the matrix B]
Because of the invertibility property of the Vandermonde construction, B is also invertible; denote its inverse by B'. Recovering the packets is essentially the process of computing B'. Deriving from Formula 2, the source packets can be solved for as follows:
[Formula 3: the source packets expressed as B' multiplied by the received packets]
That is, any packet in (S1, S2, S3, S4, S5, S6, S7) can be obtained from the matrix B' and the received packets. RS therefore offers stronger protection.
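The recovery procedure described above can be sketched numerically. This is an illustration under stated assumptions, not a production RS codec: real implementations do the arithmetic in a finite field such as GF(2^8) so that symbols stay byte-sized, whereas this sketch uses exact rational numbers to stay short. The choice of Vandermonde rows (powers of 1, 2, 3, 4) and the function names are also assumptions.

```python
from fractions import Fraction

M, N = 7, 4  # 7 source symbols, 4 redundant symbols

# Assumed Vandermonde-style matrix A (N x M): row for x is [x^0, x^1, ..., x^(M-1)].
A = [[Fraction(x) ** j for j in range(M)] for x in range(1, N + 1)]
IDENTITY = [[Fraction(int(i == j)) for j in range(M)] for i in range(M)]
G = IDENTITY + A  # (M + N) x M matrix: identity stacked on top of A


def encode(sources):
    """Multiply G by the source vector: returns S1..S7 followed by R1..R4."""
    return [sum(g * s for g, s in zip(row, sources)) for row in G]


def solve(B, y):
    """Solve B x = y by Gauss-Jordan elimination with exact fractions."""
    n = len(B)
    aug = [list(row) + [val] for row, val in zip(B, y)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def decode(received):
    """Recover the sources from a dict {row index of G: received value}.

    Any M surviving rows of G form the square matrix B from the text;
    solving B x = y is equivalent to multiplying y by the inverse B'.
    """
    rows = sorted(received)[:M]
    B = [G[i] for i in rows]
    y = [received[i] for i in rows]
    return solve(B, y)


sources = [Fraction(v) for v in (10, 20, 30, 40, 50, 60, 70)]
coded = encode(sources)

# Lose S2 and S4 (rows 1 and 3 of G); everything else arrives.
received = {i: v for i, v in enumerate(coded) if i not in (1, 3)}
print(decode(received) == sources)
```

The same `decode` call also handles the S1, S2, S7 loss pattern that the XOR mask could not repair, because any 7 surviving rows of G are linearly independent.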
RED[5]
RED is another form of forward error correction. The sender proactively transmits redundant encodings to resist, to some extent, packet loss in the transport network, and the receiver can recover lost packets from the redundant data. RED is specified in RFC 2198 and can be used to generate redundant packets for both video and audio. WebRTC enabled RED for audio in M96.
A RED payload contains not only the current packet but also earlier packets, so the payload carries a degree of redundancy that protects against packet loss.
Here is a brief introduction to the RED packaging format. In RFC 2198, each redundant block is described by a 4-byte block header consisting of an F bit (1 means another header follows), a 7-bit block payload type, a 14-bit timestamp offset, and a 10-bit block length. The final header, for the primary block, is a single byte containing F = 0 and the payload type.
[Figure: example of a RED packet]
WebRTC uses RED packets to carry redundant audio. The principle is as follows:
[Figure: each RED packet carries the current audio packet and the previous one]
In the figure above, each time the sender sends the current packet it also carries the previous packet as redundancy. If the RED4 packet (carrying packets 4 and 3) is lost, then when the subsequent RED5 packet arrives carrying packets 5 and 4, it can be combined with the earlier RED3 packet (carrying packets 3 and 2) to recover the lost packets.
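The RED scenario above can be sketched as follows. The header layout follows RFC 2198 (a 4-byte header per redundant block with F bit, 7-bit payload type, 14-bit timestamp offset, and 10-bit length, then a 1-byte header for the primary block); everything else is an assumption for illustration: the payload type 111, the 960-sample frame spacing, the toy frame contents, and the function names. RTP headers are left out.

```python
import struct


def pack_red(primary_pt, primary, redundant=()):
    """Build a RED payload; `redundant` is a list of (pt, ts_offset, data)."""
    headers = b""
    body = b""
    for pt, ts_offset, data in redundant:
        # F=1 | 7-bit PT | 14-bit timestamp offset | 10-bit block length
        word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data)
        headers += struct.pack("!I", word)
        body += data
    headers += bytes([primary_pt & 0x7F])  # final header: F=0 | 7-bit PT
    return headers + body + primary


def unpack_red(payload):
    """Split a RED payload into (pt, ts_offset, data) blocks, primary last."""
    blocks, pos = [], 0
    while payload[pos] & 0x80:  # F bit set: a 4-byte redundant header
        (word,) = struct.unpack_from("!I", payload, pos)
        blocks.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF))
        pos += 4
    primary_pt = payload[pos] & 0x7F
    pos += 1
    out = []
    for pt, ts_offset, length in blocks:
        out.append((pt, ts_offset, payload[pos:pos + length]))
        pos += length
    out.append((primary_pt, 0, payload[pos:]))
    return out


OPUS_PT = 111            # assumed dynamic payload type
SAMPLES_PER_FRAME = 960  # assumed 20 ms frames at 48 kHz

frames = {n: f"frame{n}".encode() for n in range(1, 6)}
red = {}
for n in frames:
    prev = [(OPUS_PT, SAMPLES_PER_FRAME, frames[n - 1])] if n > 1 else []
    red[n] = pack_red(OPUS_PT, frames[n], prev)

recovered = {}
for n in (1, 2, 3, 5):  # RED4 is lost in transit
    rtp_ts = n * SAMPLES_PER_FRAME
    for _pt, ts_offset, data in unpack_red(red[n]):
        recovered.setdefault(rtp_ts - ts_offset, data)

print(recovered[4 * SAMPLES_PER_FRAME])  # frame 4, taken from RED5
```

Keying the recovered frames by RTP timestamp minus the timestamp offset is what lets the receiver place the redundant copy of frame 4 in the right slot even though RED4 never arrived.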
Backward error correction
ARQ
ARQ is packet-loss retransmission: the receiver recovers lost packets by asking the sender to resend them.
Compared with forward error correction, it incurs higher delay, so it is the more suitable choice when the network delay is small.
The principle is as follows:
[Figure: ARQ retransmission of packet 3]
When packet 3 is sent for the first time, the receiver does not receive it, so it sends the sender a retransmission request for packet 3 (WebRTC uses a NACK RTCP packet). On receiving the retransmission request, the sender resends packet 3.
The following briefly introduces the NACK[6] RTCP format used in WebRTC. NACK RTCP is described in RFC 4585; NACK is a feedback message, with the following format:
PT has two main types: RTPFB (205) for transport-layer feedback and PSFB (206) for payload-specific feedback.
The FCI for a NACK consists of a 16-bit packet ID (PID) followed by a 16-bit bitmask of following lost packets (BLP):
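As a small illustration of the generic NACK FCI from RFC 4585, the sketch below packs a list of lost RTP sequence numbers into 4-byte (PID, BLP) entries, where PID is the sequence number of one lost packet and bit i of BLP (counting from 1 at the least significant bit) marks packet PID + i as lost too. It builds only the FCI, not the surrounding RTCP header and SSRC fields, and the function names are assumptions.

```python
import struct


def build_nack_fci(lost_seqs):
    """Pack lost RTP sequence numbers into (PID, BLP) pairs, 4 bytes each.

    Sequence numbers are sorted numerically, so a loss run that wraps
    around 65535 is still encoded correctly, just less compactly.
    """
    fci = b""
    pending = sorted(set(s & 0xFFFF for s in lost_seqs))
    while pending:
        pid = pending.pop(0)
        blp = 0
        rest = []
        for seq in pending:
            diff = (seq - pid) & 0xFFFF
            if 1 <= diff <= 16:
                blp |= 1 << (diff - 1)  # lowest bit means PID + 1 is lost
            else:
                rest.append(seq)
        pending = rest
        fci += struct.pack("!HH", pid, blp)
    return fci


def parse_nack_fci(fci):
    """Expand (PID, BLP) pairs back into the list of lost sequence numbers."""
    lost = []
    for off in range(0, len(fci), 4):
        pid, blp = struct.unpack_from("!HH", fci, off)
        lost.append(pid)
        lost += [(pid + i) & 0xFFFF for i in range(1, 17) if blp & (1 << (i - 1))]
    return lost


fci = build_nack_fci([100, 101, 103, 120])
print(fci.hex())
print(parse_nack_fci(fci))
```

In this example packets 100, 101, and 103 fit into one entry (PID 100 with bits 1 and 3 set), while 120 is more than 16 packets away and needs a second entry.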
PLC
PLC stands for packet loss concealment. It runs at the receiving end, i.e. in the decoder: based on historical speech frames, the decoder analyzes the signal and models it with linear prediction coefficients (LPC) to predict the lost speech frames. The technique is feasible because speech is similar over short time spans.
Its advantage is that it needs no extra bandwidth. PLC can handle small packet loss rates (<15%).
Packet loss concealment in NetEQ builds an LPC model from the linear prediction coefficients of the previous speech frame, reconstructs the speech signal from the historical speech signal, and then adds a certain amount of random noise.
During consecutive packet loss concealment, the same LPC coefficients are used to reconstruct every frame. Note that the correlation between consecutive reconstructed signals has to be reduced, so the energy of the packets generated by concealment decreases progressively.
Finally, the result needs to be smoothed to keep the speech continuous. When packet loss compensation is needed, the latest frame is taken from a speech buffer that stores the most recent 70 ms, and its LPC coefficients are computed.
Both WebRTC's NetEQ module and the OPUS decoder provide PLC. If the decoder supports PLC, the decoder's PLC is preferred; otherwise NetEQ's PLC is used. The next article explains this in more detail when covering the NetEQ module.
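To make the idea of LPC-based concealment concrete, here is a heavily simplified sketch. It is not NetEQ's algorithm: it fits LPC coefficients to the latest frame with the Levinson-Durbin recursion, extrapolates the signal with those same coefficients for every lost frame, adds a little random noise, and lowers the gain frame by frame. Smoothing at the frame boundaries is omitted. The sample rate, frame length, LPC order, decay factor, and noise level are all assumptions chosen for illustration.

```python
import math
import random


def autocorr(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion; returns a[1..order] with x[n] ~ sum(a[k] * x[n-k])."""
    r = autocorr(frame, order)
    r[0] = r[0] * 1.0001 + 1e-9  # small regularization, avoids division by zero
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def conceal(history, frame_len, lost_frames, order=8, decay=0.7, noise=0.01, seed=0):
    """Extrapolate `lost_frames` frames from `history` with one set of LPC coefficients."""
    rng = random.Random(seed)
    latest = history[-frame_len:]          # analyse the latest frame only
    coeffs = lpc(latest, order)
    state = list(history[-order:])         # most recent samples, newest last
    peak = max(abs(v) for v in latest) or 1.0
    out, gain = [], 1.0
    for _ in range(lost_frames):
        gain *= decay                      # energy decreases with each lost frame
        for _ in range(frame_len):
            pred = sum(c * state[-k] for k, c in enumerate(coeffs, start=1))
            state.append(pred)
            state.pop(0)
            out.append(gain * (pred + noise * peak * rng.uniform(-1, 1)))
    return out


FS = 8000                                  # assumed sample rate
history = [                                # 70 ms of a toy voiced-like signal
    math.sin(2 * math.pi * 200 * n / FS) + 0.5 * math.sin(2 * math.pi * 400 * n / FS)
    for n in range(FS * 70 // 1000)
]
concealed = conceal(history, frame_len=160, lost_frames=2)
print(len(concealed))
```

Reusing one coefficient set while shrinking the gain mirrors the behaviour described above for consecutive losses; a real concealer would also cross-fade into the next correctly received frame.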
Weak-network resilience features of the OPUS encoder[7]
OPUS is an open-source, royalty-free codec that also outperforms other codecs, which is why WebRTC usually uses it for audio.
The following OPUS features are of great help against weak networks.
OPUS supports the full range of audio bandwidths
OPUS supports bitrates from 6 kbps narrowband up to 510 kbps high-quality stereo. The figure below shows that OPUS covers everything from narrowband to high-quality fullband audio, with higher quality at the same bitrate.
[Figure: quality versus bitrate for OPUS and other codecs]
OPUS supports dynamic bitrate adjustment
OPUS can adjust its bitrate up and down seamlessly, and it delivers better sound quality at the same bitrate. In addition, when the packet loss rate exceeds a certain threshold, it switches the coding mode to SILK, i.e. the low-bitrate mode, to adapt to the network.
OPUS has lower latency
OPUS combines two codec technologies, SILK (for speech) and CELT (for music), and has the advantage of low latency.
This is essential for use as part of a low-delay audio communication link. The algorithmic delay of OPUS can be as low as 5 milliseconds.
Existing music codecs (for example MP3, Vorbis, and HE-AAC) have delays of 100 milliseconds or more. OPUS has much lower delay while delivering comparable quality at a given bitrate, as shown in the figure below:
[Figure: delay comparison of OPUS and other codecs]
OPUS supports in-band FEC
OPUS supports in-band FEC. With FEC enabled, redundant data is generated according to the packet loss rate, improving the audio's resistance to packet loss.
OPUS in-band FEC works in a way similar to RED: each packet carries the content of the previous packet alongside the current one, except that the previous packet is re-encoded at a low bitrate to form the redundancy, like this:
|1| | -> |2|1| -> |3|2| -> |4|3| -> |5|4| -> |6|5|
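Several libopus interfaces relate to FEC: on the encoder side, opus_encoder_ctl with OPUS_SET_INBAND_FEC (turn in-band FEC on or off) and OPUS_SET_PACKET_LOSS_PERC (the expected packet loss percentage); on the decoder side, the decode_fec argument of opus_decode.
Note that OPUS's built-in FEC data is generated only in SILK mode; in CELT mode no redundant data is generated.
In WebRTC, FEC is enabled through SDP negotiation, using the useinbandfec=1 parameter on the OPUS a=fmtp line.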
The figure below compares OPUS with FEC on and off[8]:
[Figure: audio MOS with OPUS FEC on and off under packet loss]
As the figure shows, with FEC on, the audio MOS improves markedly even at 20% packet loss.
The OPUS decoder supports PLC
The OPUS decoder supports packet loss concealment. Based on the short-term similarity of speech signals, it analyzes the normal or recovered speech signal of the previous frame to reconstruct and predict the currently lost speech frame.
OPUS supports DTX for speech
Outside music mode, i.e. in VoIP mode, DTX can be turned on to save bandwidth when no speech is detected for a certain period.
In that case, when no speech is detected, OPUS sends a silence packet periodically, every 400 ms, which reduces bandwidth. WebRTC does not enable this feature by default. To turn DTX on, just add usedtx=1 to the a=fmtp line during SDP negotiation.
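The SDP change described above is a plain text edit on the OPUS a=fmtp line. The sketch below shows one way to do it; in a browser application the same edit would typically be made on the SDP string in JavaScript before it is applied. The helper name, the payload type 111, and the sample SDP lines are assumptions, while usedtx and useinbandfec are the OPUS format parameters defined in RFC 7587.

```python
import re


def set_opus_fmtp(sdp, **params):
    """Return `sdp` with the given key=value pairs set on the Opus a=fmtp line."""
    match = re.search(r"^a=rtpmap:(\d+) opus/48000", sdp, re.MULTILINE | re.IGNORECASE)
    if not match:
        return sdp  # no Opus in this SDP, leave it untouched
    pt = match.group(1)
    prefix = f"a=fmtp:{pt} "
    lines = sdp.split("\r\n")
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            current = dict(
                item.strip().split("=", 1)
                for item in line[len(prefix):].split(";")
                if "=" in item
            )
            current.update({k: str(v) for k, v in params.items()})
            lines[i] = prefix + ";".join(f"{k}={v}" for k, v in current.items())
            break
    else:
        # No fmtp line yet: add one right after the rtpmap line.
        idx = next(i for i, l in enumerate(lines) if l.startswith(f"a=rtpmap:{pt} "))
        lines.insert(idx + 1, prefix + ";".join(f"{k}={v}" for k, v in params.items()))
    return "\r\n".join(lines)


offer = "\r\n".join([
    "m=audio 9 UDP/TLS/RTP/SAVPF 111",
    "a=rtpmap:111 opus/48000/2",
    "a=fmtp:111 minptime=10;useinbandfec=1",
    "",
])
print(set_opus_fmtp(offer, usedtx=1))
```

The function keeps existing parameters such as useinbandfec=1 and only adds or overwrites the requested keys, so the same helper can also be used to switch in-band FEC on.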
OPUS itself has many features for resisting weak networks. Combined with packet-loss retransmission, these features give audio strong resilience to weak networks.
Drawing on practical experience with weak-network handling, this article has briefly described and summarized some common audio techniques for weak networks: forward error correction, backward error correction, and the features of the OPUS encoder itself.
Weak-network handling also relies on another key technique, anti-jitter, which the next article in this series covers in detail.
References:
[1]: https://datatracker.ietf.org/doc/html/rfc5109
[2]: https://datatracker.ietf.org/doc/html/draft-ietf-payload-flexible-fec-scheme-03
[3]: https://tex2e.github.io/rfctranslater/html/rfc5510.html
[4]: https://www.scirp.org/pdf/6-2.16.pdf
[5]: https://datatracker.ietf.org/doc/html/rfc2198
[6]: https://tex2e.github.io/rfc-translater/html/rfc4585.html
[7]: https://ja.wikipedia.org/wiki/OPUS_(%E9%9F%B3%E5%A3%B0%E5%9C%A7%E7%B8%AE)
[8]: https://www.OPUScodec.org/static/presentations/OPUS_voice_aes135.pdf