WebRTC AEC Process Analysis
2022-06-12 05:32:00 【Atypical nonsense】

Today we introduce one of the most difficult of the 3A algorithms, and also the last algorithm in this WebRTC process-analysis series: acoustic echo cancellation (Acoustic Echo Cancellation, AEC). If you are somewhat familiar with WebRTC, you know that its AEC algorithm can be roughly divided into three parts: delay estimation, linear echo cancellation, and nonlinear processing.
I. Introduction
The basic principle of echo cancellation has been introduced before; see the earlier posts on adaptive-filter echo cancellation and Kalman-filter-based echo cancellation. WebRTC AEC uses a frequency-domain correlation method for delay estimation. The linear part uses a partitioned block frequency domain adaptive filter (Partitioned Block Frequency Domain Adaptive Filter, PBFDAF); in Speex the same idea is called the multidelay block frequency-domain filter (MDF), and the underlying principle is identical. One difference is that Speex's AEC uses two filters (a foreground and a background filter), which gives its linear stage better performance; AEC3 also introduces two filters, but I will not expand on that here and may cover it in a later post. Finally, nonlinear processing (NonLinear Processing, NLP) is performed based on the frequency-domain coherence among the near-end signal, the error signal, and the far-end signal.
The overall flow of WebRTC AEC is similar to the other algorithms in this series: first, an instance must be created.

int32_t WebRtcAec_Create(void** aecInst)

Inside this function, the AEC core and a resampler instance are created:

int WebRtcAec_CreateAec(AecCore** aecInst)
int WebRtcAec_CreateResampler(void** resampInst)

WebRtcAec_CreateAec allocates a number of buffers, including near-end, far-end, output, and delay-estimation buffers. It is worth noting that the WebRTC AEC buffer structure is defined as follows; besides the data itself, it carries variables that track the read and write positions.
struct RingBuffer {
  size_t read_pos;
  size_t write_pos;
  size_t element_count;
  size_t element_size;
  enum Wrap rw_wrap;
  char* data;
};

The near-end and output buffers have the same size (FRAME_LEN: 80, PART_LEN: 64):
aec->nearFrBuf = WebRtc_CreateBuffer(FRAME_LEN + PART_LEN, sizeof(int16_t));
aec->outFrBuf = WebRtc_CreateBuffer(FRAME_LEN + PART_LEN, sizeof(int16_t));

The far-end buffers are somewhat larger (kBufSizePartitions: 250, PART_LEN1: 64+1):

aec->far_buf = WebRtc_CreateBuffer(kBufSizePartitions, sizeof(float) * 2 * PART_LEN1);
aec->far_buf_windowed = WebRtc_CreateBuffer(kBufSizePartitions, sizeof(float) * 2 * PART_LEN1);

The delay-estimation state is also created here:

void* WebRtc_CreateDelayEstimatorFarend(int spectrum_size, int history_size)
void* WebRtc_CreateDelayEstimator(void* farend_handle, int lookahead)

Next comes initialization. There are two sampling rates: the original sampling rate and the sampling rate after resampling. The original rate only supports 8/16/32 kHz, while the resampled rate can range from 1 to 96 kHz.
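Before moving on to initialization, the bookkeeping that the RingBuffer structure above performs with read_pos and write_pos can be illustrated with a minimal stand-alone sketch. This is a simplified stand-in, not WebRTC's actual implementation: it stores single-byte elements and tracks occupancy with a count instead of the rw_wrap flag.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal ring buffer: read_pos/write_pos advance modulo the capacity,
 * and `count` tracks how many elements are currently stored. */
typedef struct {
  size_t read_pos;
  size_t write_pos;
  size_t size;   /* capacity in elements */
  size_t count;  /* elements currently stored */
  char data[256];
} MiniRingBuffer;

static void rb_init(MiniRingBuffer* rb, size_t size) {
  rb->read_pos = rb->write_pos = rb->count = 0;
  rb->size = size;
}

/* Write up to n elements; returns how many actually fit. */
static size_t rb_write(MiniRingBuffer* rb, const char* src, size_t n) {
  size_t written = 0;
  while (written < n && rb->count < rb->size) {
    rb->data[rb->write_pos] = src[written++];
    rb->write_pos = (rb->write_pos + 1) % rb->size;  /* wrap around */
    rb->count++;
  }
  return written;
}

/* Read up to n elements; returns how many were available. */
static size_t rb_read(MiniRingBuffer* rb, char* dst, size_t n) {
  size_t read = 0;
  while (read < n && rb->count > 0) {
    dst[read++] = rb->data[rb->read_pos];
    rb->read_pos = (rb->read_pos + 1) % rb->size;
    rb->count--;
  }
  return read;
}
```

The real WebRTC buffer works the same way but on arbitrary element sizes, which is why RingBuffer stores element_count and element_size separately.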
int32_t WebRtcAec_Init(void* aecInst, int32_t sampFreq, int32_t scSampFreq)

The following function sets the relevant parameters according to the original sampling rate, and initializes the buffers allocated in WebRtcAec_Create, the various parameter variables, and the FFT machinery:

WebRtcAec_InitAec(AecCore* aec, int sampFreq)

Since resampling is involved, the resampling state must be initialized as well; resampling shows up in many WebRTC algorithms.

int WebRtcAec_InitResampler(void* resampInst, int deviceSampleRateHz)

Finally comes parameter configuration. The WebRTC AEC configuration structure is as follows:
typedef struct {
  int16_t nlpMode;     // default kAecNlpModerate
  int16_t skewMode;    // default kAecFalse
  int16_t metricsMode; // default kAecFalse
  int delay_logging;   // default kAecFalse
} AecConfig;

During initialization these are set to the following defaults:

aecConfig.nlpMode = kAecNlpModerate;
aecConfig.skewMode = kAecFalse;
aecConfig.metricsMode = kAecFalse;
aecConfig.delay_logging = kAecFalse;

The parameters can be changed through WebRtcAec_set_config:

int WebRtcAec_set_config(void* handle, AecConfig config)

When processing each frame, WebRTC AEC first puts the far-end signal into its buffer:
int32_t WebRtcAec_BufferFarend(void* aecInst,
                               const int16_t* farend,
                               int16_t nrOfSamples)

If resampling is required, the resampling function is called inside this function. The AEC resampler is very simple: direct linear interpolation, with no anti-imaging (mirror-suppression) filter. The skew parameter appears to compensate for clock drift between exotic sampling-rate pairs such as 44.1 and 44 kHz (see [4] for details).

void WebRtcAec_ResampleLinear(void* resampInst,
                              const short* inspeech,
                              int size,
                              float skew,
                              short* outspeech,
                              int* size_out)

When the far-end buffer holds enough data, the FFT is computed, twice: once with a window and once without. For the effect of the window function, see the earlier post on framing, windowing and the DFT.
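Backing up briefly, the plain linear interpolation used by the resampler above can be sketched as follows. This is an illustration only, not the actual WebRtcAec_ResampleLinear; skew handling is omitted, and, as noted, there is no anti-imaging filtering.

```c
#include <assert.h>

/* Linear-interpolation resampler sketch.
 * ratio = input_rate / output_rate; each output sample is interpolated
 * between the two input samples straddling the fractional read position.
 * Returns the number of output samples produced. */
static int resample_linear(const short* in, int in_len,
                           float ratio, short* out, int out_max) {
  int n = 0;
  float pos = 0.0f;  /* fractional read position into `in` */
  while (n < out_max) {
    int i = (int)pos;
    if (i + 1 >= in_len) break;  /* need two samples to interpolate */
    float frac = pos - (float)i;
    out[n++] = (short)((1.0f - frac) * in[i] + frac * in[i + 1]);
    pos += ratio;
  }
  return n;
}
```

Because there is no lowpass/anti-imaging stage, upsampling this way leaves spectral images in place, which is exactly the simplification the article points out.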
void WebRtcAec_BufferFarendPartition(AecCore* aec, const float* farend)

II. Delay Estimation
At the software level, for various reasons, the near-end signal captured by the microphone is not aligned with the far-end signal received over the network. When the delay between the near-end and far-end signals is large, a longer linear filter is needed, which increases the computational load. If the near-end and far-end signals can be aligned first, the number of filter coefficients can be reduced, cutting the cost of the algorithm.
Then the processing function is run. Here msInSndCardBuf is the time difference between the sound card's actual input and output, i.e. the offset between the local audio and the buffered far-end reference audio. For 8 kHz and 16 kHz data there is no high band to worry about: just pass in the low-band data. Data at higher sampling rates (32 kHz) must first be split into a high band and a low band through the filterbank interface; that is the role of nearend and nearendH.
int32_t WebRtcAec_Process(void* aecInst,
                          const int16_t* nearend,
                          const int16_t* nearendH,
                          int16_t* out,
                          int16_t* outH,
                          int16_t nrOfSamples,
                          int16_t msInSndCardBuf,
                          int32_t skew)

The function first performs some sanity checks to make sure its input arguments are valid, then uses the extended_filter_enabled flag to decide whether to use extended mode; the two modes differ in the number of partitions and in how processing is done.
enum {
  kExtendedNumPartitions = 32
};
static const int kNormalNumPartitions = 12;

In extended mode, the delay must be set manually (reported_delay_ms):
static void ProcessExtended(aecpc_t* self,
                            const int16_t* near,
                            const int16_t* near_high,
                            int16_t* out,
                            int16_t* out_high,
                            int16_t num_samples,
                            int16_t reported_delay_ms,
                            int32_t skew)

The delay is converted into samples and the far-end buffer read pointer is moved accordingly; the delay estimate is then screened and filtered:

int WebRtcAec_MoveFarReadPtr(AecCore* aec, int elements)
static void EstBufDelayExtended(aecpc_t* self)

In normal mode:
static int ProcessNormal(aecpc_t* aecpc,
                         const int16_t* nearend,
                         const int16_t* nearendH,
                         int16_t* out,
                         int16_t* outH,
                         int16_t nrOfSamples,
                         int16_t msInSndCardBuf,
                         int32_t skew)

There is a startup_phase stage: once the system delay has stabilized, the startup phase ends and the AEC takes effect. After that, delay estimation is performed first, and the delay is buffered, screened and filtered.

static void EstBufDelayNormal(aecpc_t* aecpc)

Then we enter the most important part of the AEC:
void WebRtcAec_ProcessFrame(AecCore* aec,
                            const short* nearend,
                            const short* nearendH,
                            int knownDelay,
                            int16_t* out,
                            int16_t* outH)

A clear comment in the code explains the core AEC steps:
For each frame the process is as follows:
1) If the system_delay indicates on being too small for processing a
   frame we stuff the buffer with enough data for 10 ms.
2) Adjust the buffer to the system delay, by moving the read pointer.
3) TODO(bjornv): Investigate if we need to add this:
   If we can't move read pointer due to buffer size limitations we
   flush/stuff the buffer.
4) Process as many partitions as possible.
5) Update the |system_delay| with respect to a full frame of FRAME_LEN
   samples. Even though we will have data left to process (we work with
   partitions) we consider updating a whole frame, since that's the
   amount of data we input and output in audio_processing.
6) Update the outputs.

Let us look directly at the processing module, i.e. step 4:
static void ProcessBlock(AecCore* aec)

Keep in mind these three variables: the near-end signal d, the linear filter output (the echo estimate) y, and the error signal e.

d[PART_LEN], y[PART_LEN], e[PART_LEN]

The first step is to estimate and smooth the noise power spectrum used for comfort noise; then comes the delay estimation.
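To illustrate the noise-spectrum smoothing idea, here is a minimal recursive tracker. The constants and the attack/release scheme are assumptions for illustration; WebRTC's actual estimator is a minimum-statistics-style tracker with its own ramp constants.

```c
#include <assert.h>

#define NUM_BINS 65  /* PART_LEN1 in the WebRTC source */

/* Simplified noise power-spectrum tracker: each bin falls quickly
 * toward the current frame power when that power is lower (so it locks
 * onto noise-only minima), but rises only very slowly otherwise (so
 * speech bursts barely pull it up). */
static void update_noise_psd(float noise_psd[NUM_BINS],
                             const float frame_psd[NUM_BINS]) {
  const float down = 0.9f;  /* smoothing when tracking a minimum */
  const float up = 0.999f;  /* smoothing when the frame is louder */
  for (int k = 0; k < NUM_BINS; ++k) {
    float a = frame_psd[k] < noise_psd[k] ? down : up;
    noise_psd[k] = a * noise_psd[k] + (1.0f - a) * frame_psd[k];
  }
}
```

The resulting per-bin noise power is what a comfort-noise generator can later shape its random spectrum with.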
int WebRtc_DelayEstimatorProcessFloat(void* handle,
                                      float* near_spectrum,
                                      int spectrum_size)

(The original post summarizes the algorithm in a table here; the image is missing.)

First, each subband amplitude is compared against a threshold derived from the power spectra of the far-end and near-end signals, yielding a binarized spectrum for each signal.
static uint32_t BinarySpectrumFloat(float* spectrum,
                                    SpectrumType* threshold_spectrum,
                                    int* threshold_initialized)

Then, by taking the bitwise XOR of the two binary spectra, the far-end candidate with the highest similarity is selected and the corresponding delay is computed.
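The binarize-and-XOR idea can be sketched as follows. This is a simplified illustration with fixed thresholds and a short history; the real BinarySpectrumFloat adapts its thresholds over time, and WebRtc_ProcessBinarySpectrum adds further robustness logic on top of the raw bit counts.

```c
#include <assert.h>
#include <stdint.h>

#define HIST 16  /* number of candidate delays kept in the far-end history */

/* Bit k of the result is 1 when band k of the spectrum exceeds its
 * threshold: the "binary spectrum". */
static uint32_t binarize(const float* spectrum, const float* threshold,
                         int bands) {
  uint32_t bits = 0;
  for (int k = 0; k < bands; ++k)
    if (spectrum[k] > threshold[k]) bits |= (1u << k);
  return bits;
}

static int popcount32(uint32_t v) {
  int n = 0;
  while (v) { v &= v - 1; ++n; }
  return n;
}

/* Pick the delay whose stored far-end binary spectrum differs from the
 * near-end one in the fewest bits (smallest XOR popcount): the core
 * matching step of the binary delay estimator. */
static int best_delay(const uint32_t far_history[HIST],
                      uint32_t near_bits) {
  int best = 0, best_cost = 33;
  for (int d = 0; d < HIST; ++d) {
    int cost = popcount32(far_history[d] ^ near_bits);
    if (cost < best_cost) { best_cost = cost; best = d; }
  }
  return best;
}
```

Comparing 32 bands costs one XOR and one popcount per candidate delay, which is why this method is so much cheaper than full cross-correlation.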
int WebRtc_ProcessBinarySpectrum(BinaryDelayEstimator* self,
                                 uint32_t binary_near_spectrum)

III. PBFDAF
Next comes the NLMS part; the overall flow is shown in a figure in the original post (image missing here).
Each step of PBFDAF can be matched to the flow above. First, the far-end signal is filtered in the frequency domain; the result is brought back by an IFFT, and the time-domain error is obtained by subtracting this echo estimate from the near-end signal.

static void FilterFar(AecCore* aec, float yf[2][PART_LEN1])

Then the error signal is FFT-transformed and normalized:

static void ScaleErrorSignal(AecCore* aec, float ef[2][PART_LEN1])

Finally, via an FFT/IFFT pair in which half of the time-domain block is zeroed (the gradient constraint), the filter weights are updated in the frequency domain.
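PBFDAF is a block frequency-domain implementation of NLMS, so the adaptation idea is easiest to see in a minimal time-domain NLMS sketch. This is an illustration only; the real code performs the same update per partition on FFT blocks, and ScaleErrorSignal's normalization corresponds to the division by the input power here.

```c
#include <assert.h>
#include <math.h>

#define TAPS 8

/* One NLMS step. x: far-end history (x[0] newest), d: near-end (mic)
 * sample, w: adaptive filter weights. Returns e = d - y, the
 * echo-cancelled output. */
static float nlms_step(float w[TAPS], const float x[TAPS], float d,
                       float mu) {
  float y = 0.0f, power = 1e-6f;  /* small floor avoids divide-by-zero */
  for (int i = 0; i < TAPS; ++i) {
    y += w[i] * x[i];             /* echo estimate */
    power += x[i] * x[i];
  }
  float e = d - y;
  float g = mu * e / power;       /* normalized step size */
  for (int i = 0; i < TAPS; ++i)
    w[i] += g * x[i];             /* weight update */
  return e;
}
```

With a simulated echo path the weights converge to the path and the error goes to (numerically) zero; the frequency-domain version does the same thing with O(N log N) cost per block instead of O(N^2).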
static void FilterAdaptation(AecCore* aec, float* fft, float ef[2][PART_LEN1])

IV. NLP
An NLMS linear filter cannot eliminate all of the echo, because the echo path is not necessarily linear, so nonlinear processing is needed to remove the residual echo. Its basic principle is the frequency-domain coherence of the signals: if the coherence between the near-end signal and the error signal is high, no processing is needed; if the coherence between the far-end and near-end signals is high, suppression is needed. The nonlinearity shows up as an exponential decay. WebRTC AEC's NLP lives in this function:

static void NonLinearProcessing(AecCore* aec, short* output, short* outputH)

First the power spectra of the near-end, far-end and error signals are computed, then their cross power spectra, from which the near-end/error subband coherence and the far-end/near-end subband coherence are calculated. From these an average coherence is derived, the echo state is estimated, a suppression factor is computed, and the nonlinear processing is applied.
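The subband coherence at the heart of the NLP can be sketched as below. This is illustrative; in the real code the auto and cross spectra are first smoothed across frames with a forgetting factor, and the division operates on those smoothed quantities.

```c
#include <assert.h>
#include <math.h>

#define BANDS 65  /* PART_LEN1 */

/* Magnitude-squared coherence per band:
 *   c[k] = |Sxy[k]|^2 / (Sxx[k] * Syy[k]),  in [0, 1].
 * High near-end/error coherence => little echo, pass the band.
 * High far-end/near-end coherence => echo present, suppress it. */
static void coherence(const float* sxx, const float* syy,
                      const float* sxy_re, const float* sxy_im,
                      float* c, int bands) {
  for (int k = 0; k < bands; ++k) {
    float num = sxy_re[k] * sxy_re[k] + sxy_im[k] * sxy_im[k];
    float den = sxx[k] * syy[k] + 1e-10f;  /* guard against 0/0 */
    c[k] = num / den;
  }
}
```

Averaging c[k] over a preferred band range gives the scalar coherence values the code uses to classify the echo state.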
static void OverdriveAndSuppress(AecCore* aec,
                                 float hNl[PART_LEN1],
                                 const float hNlFb,
                                 float efw[2][PART_LEN1])

Finally, comfort noise is added, an IFFT is performed, and overlap-add produces the final output.
static void ComfortNoise(AecCore* aec,
                         float efw[2][PART_LEN1],
                         complex_t* comfortNoiseHband,
                         const float* noisePow,
                         const float* lambda)

Let us look at the result. The first channel is the far-end data, the second channel the near-end data (waveform image missing here).

The effect of WebRTC AEC processing (waveform image missing here).
V. Conclusion

WebRTC AEC consists of a linear part and a nonlinear part; personally, I feel it reflects the roles of both theory and engineering in algorithm implementation. Overall, WebRTC AEC performs well, but as we know the performance of an AEC is closely tied to the hardware, so a great deal of time and energy goes into tuning its parameters.
The code related to this article can be obtained through the "Code" entry in the menu bar of the official speech-algorithm account.
References:
[1] Real-time speech processing practice guide
[2] https://blog.csdn.net/VideoCloudTech/article/details/110956140
[3] https://bobondemon.github.io/2019/06/08/Adaptive-Filters-Notes-2/
[4] https://groups.google.com/g/discuss-webrtc/c/T8j0CT_NBvs/m/aLmJ3YwEiYAJ
[5] https://www.bbcyw.com/p-25726216.html