
WebRTC AEC process analysis

2022-06-12 05:32:00 Atypical nonsense

 

Today we introduce one of the most difficult of the 3A algorithms, and also the last algorithm in this WebRTC process-analysis series: acoustic echo cancellation (Acoustic Echo Cancellation, AEC). Readers with some understanding of WebRTC will know that its AEC algorithm can be roughly divided into three parts: delay estimation, linear echo cancellation, and nonlinear processing.

I. Introduction

The basic principle of echo cancellation has been introduced before; see "Analysis of adaptive-filtering echo cancellation" and "Echo cancellation based on the Kalman filter". WebRTC AEC uses a frequency-domain correlation method for delay estimation. The linear part uses a partitioned block frequency-domain adaptive filter (Partitioned Block Frequency Domain Adaptive Filter, PBFDAF); in Speex this filter is called the multidelay block frequency-domain filter (Multidelay Block Frequency Domain Filter, MDF), and the two are the same in principle. The difference is that Speex's AEC uses two filters (a foreground filter and a background filter), so its linear echo cancellation part performs better; AEC3 also introduces two filters, but I will not expand on that here and may introduce it later. Finally, nonlinear processing (NonLinear Processing, NLP) is performed based on the frequency-domain coherence among the near-end signal, the error signal, and the far-end signal.

WebRTC AEC The flow of is similar to other algorithms , First of all, we need to create An instance .

int32_t WebRtcAec_Create(void** aecInst)

Inside the function above, the AEC and resampler instances are created:

int WebRtcAec_CreateAec(AecCore** aecInst) 
int WebRtcAec_CreateResampler(void** resampInst)

WebRtcAec_CreateAec allocates several buffers, including near-end, far-end, output, and delay-estimation buffers. It is worth mentioning that WebRTC AEC's buffer structure is defined as follows; besides the data itself, it holds variables that track the read/write positions.

struct RingBuffer {
  size_t read_pos;
  size_t write_pos;
  size_t element_count;
  size_t element_size;
  enum Wrap rw_wrap;
  char* data;
};
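As an illustration of how those position fields work, here is a minimal hypothetical ring buffer (my simplification for this article: it ignores the rw_wrap full/empty tracking that the real RingBuffer uses, and fixes the storage size):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of RingBuffer: read_pos/write_pos advance and wrap
 * modulo element_count; data is addressed in units of element_size. */
typedef struct {
  size_t read_pos;
  size_t write_pos;
  size_t element_count;  /* capacity in elements */
  size_t element_size;   /* bytes per element */
  char data[1024];       /* fixed storage for this sketch */
} MiniRingBuffer;

size_t MiniRing_Write(MiniRingBuffer* rb, const void* src, size_t n) {
  const char* p = (const char*)src;
  size_t i;
  for (i = 0; i < n; ++i) {
    memcpy(rb->data + rb->write_pos * rb->element_size,
           p + i * rb->element_size, rb->element_size);
    rb->write_pos = (rb->write_pos + 1) % rb->element_count;  /* wrap */
  }
  return n;
}

size_t MiniRing_Read(MiniRingBuffer* rb, void* dst, size_t n) {
  char* p = (char*)dst;
  size_t i;
  for (i = 0; i < n; ++i) {
    memcpy(p + i * rb->element_size,
           rb->data + rb->read_pos * rb->element_size, rb->element_size);
    rb->read_pos = (rb->read_pos + 1) % rb->element_count;
  }
  return n;
}
```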

The near-end and output buffers have the same size (FRAME_LEN: 80, PART_LEN: 64):

aec->nearFrBuf = WebRtc_CreateBuffer(FRAME_LEN + PART_LEN, sizeof(int16_t));
aec->outFrBuf = WebRtc_CreateBuffer(FRAME_LEN + PART_LEN, sizeof(int16_t));

The far-end buffer is a little larger (kBufSizePartitions: 250, PART_LEN1: 64+1):

aec->far_buf = WebRtc_CreateBuffer(kBufSizePartitions, sizeof(float) * 2 * PART_LEN1);
aec->far_buf_windowed = WebRtc_CreateBuffer(kBufSizePartitions, sizeof(float) * 2 * PART_LEN1);

The delay-estimation state is also initialized here:

void* WebRtc_CreateDelayEstimatorFarend(int spectrum_size, int history_size) 
void* WebRtc_CreateDelayEstimator(void* farend_handle, int lookahead)

Next comes initialization. There are two sampling rates: the original sampling rate and the sampling rate after resampling. The original sampling rate only supports 8, 16, or 32 kHz, while the resampled rate can range from 1 to 96 kHz.

int32_t WebRtcAec_Init(void* aecInst, int32_t sampFreq, int32_t scSampFreq)

The following function sets the corresponding parameters according to the original sampling rate, and initializes the various buffers allocated in WebRtcAec_Create as well as the parameter variables and the FFT computation.

WebRtcAec_InitAec(AecCore* aec, int sampFreq)

Since resampling is involved, the resampling state must be initialized as well; resampling appears in many WebRTC algorithms.

int WebRtcAec_InitResampler(void* resampInst, int deviceSampleRateHz)

Finally, parameter setup. The WebRTC AEC configuration structure is as follows:

typedef struct {
  int16_t nlpMode;      // default kAecNlpModerate
  int16_t skewMode;     // default kAecFalse
  int16_t metricsMode;  // default kAecFalse
  int delay_logging;    // default kAecFalse
} AecConfig;

During initialization they are given the following default values:

  aecConfig.nlpMode = kAecNlpModerate;
  aecConfig.skewMode = kAecFalse;
  aecConfig.metricsMode = kAecFalse;
  aecConfig.delay_logging = kAecFalse;

The parameters can be changed via WebRtcAec_set_config:

int WebRtcAec_set_config(void* handle, AecConfig config)

When processing each frame, WebRTC AEC first puts the far-end signal into a buffer:

int32_t WebRtcAec_BufferFarend(void* aecInst,
                               const int16_t* farend,
                               int16_t nrOfSamples)

If resampling is required, the resampling function is called inside this function. The AEC resampler is simple, direct linear interpolation with no mirror-image suppression filter. The skew parameter appears to compensate for clock drift between exotic sampling-rate pairs such as 44.1 vs. 44 kHz (see [4] for details).

void WebRtcAec_ResampleLinear(void* resampInst,
                              const short* inspeech,
                              int size,
                              float skew,
                              short* outspeech,
                              int* size_out)
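The linear-interpolation idea can be sketched as follows (a hypothetical simplification of what the text describes, not WebRTC's actual code: no image-suppression filter, no skew handling; ratio is input_rate / output_rate):

```c
/* Linear-interpolation resampler sketch: for each output sample, compute the
 * fractional read position in the input and blend the two neighboring
 * samples. */
void ResampleLinearSketch(const short* in, int in_len, float ratio,
                          short* out, int* out_len) {
  int n = (int)((float)(in_len - 1) / ratio) + 1;  /* output length */
  int i;
  for (i = 0; i < n; ++i) {
    float pos = (float)i * ratio;   /* fractional position in the input */
    int idx = (int)pos;
    float frac = pos - (float)idx;
    if (idx >= in_len - 1) {
      out[i] = in[in_len - 1];      /* clamp at the last input sample */
    } else {
      out[i] = (short)((1.0f - frac) * (float)in[idx] +
                       frac * (float)in[idx + 1]);
    }
  }
  *out_len = n;
}
```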

When there is enough data in the far-end buffer, the FFT is computed twice: once with a window and once without. For the influence of the window function, refer to "Framing, Windowing and DFT".

void WebRtcAec_BufferFarendPartition(AecCore* aec, const float* farend)

II. Delay Estimation

At the software level, for various reasons the near-end signal captured by the microphone is not aligned with the far-end signal delivered over the network. When the delay between the near-end and far-end signals is large, a longer linear filter is needed, which undoubtedly increases the computational load. If the near-end and far-end signals can be aligned first, the number of filter coefficients can be reduced, lowering the algorithm's cost.

Then the processing function is run. msInSndCardBuf is the time difference between the sound card's actual input and output, i.e. the offset between the local audio and the far-end reference audio. For data sampled at 8 kHz or 16 kHz the high-frequency part can be ignored and only the low-band data passed in, but data at a 32 kHz sampling rate must be split into low and high bands through the filterbank interface; this is the role of nearend and nearendH.

int32_t WebRtcAec_Process(void* aecInst,
                          const int16_t* nearend,
                          const int16_t* nearendH,
                          int16_t* out,
                          int16_t* outH,
                          int16_t nrOfSamples,
                          int16_t msInSndCardBuf,
                          int32_t skew)

First, some sanity checks ensure the function's input parameters are valid. Then, based on the value of extended_filter_enabled, it is decided whether to use the extended mode; the two modes differ in the number of partitions and in how they are processed.

enum {
  kExtendedNumPartitions = 32
};
static const int kNormalNumPartitions = 12;
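Since each partition holds PART_LEN = 64 samples, the partition count determines how long an echo tail the linear filter can cover. A small illustrative helper (my own arithmetic sketch, not a WebRTC function):

```c
#define PART_LEN 64

/* Echo-tail length (in ms) covered by a partitioned filter:
 * num_partitions * PART_LEN samples at the given sampling rate. */
int FilterTailMs(int num_partitions, int sample_rate_hz) {
  return num_partitions * PART_LEN * 1000 / sample_rate_hz;
}
```

At 16 kHz, the normal mode's 12 partitions cover 48 ms of echo tail and the extended mode's 32 partitions cover 128 ms.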

If the extended mode is used, the delay must be set manually (reported_delay_ms):

static void ProcessExtended(aecpc_t* self,
                            const int16_t* near,
                            const int16_t* near_high,
                            int16_t* out,
                            int16_t* out_high,
                            int16_t num_samples,
                            int16_t reported_delay_ms,
                            int32_t skew) 

The delay is converted into samples and the far-end buffer's read pointer is moved accordingly; the delay is then screened and filtered:

int WebRtcAec_MoveFarReadPtr(AecCore* aec, int elements)
static void EstBufDelayExtended(aecpc_t* self)

If the normal mode is used:

static int ProcessNormal(aecpc_t* aecpc,
                         const int16_t* nearend,
                         const int16_t* nearendH,
                         int16_t* out,
                         int16_t* outH,
                         int16_t nrOfSamples,
                         int16_t msInSndCardBuf,
                         int32_t skew)

There is a startup_phase stage; once the system delay is stable, this stage ends and the AEC takes effect. After the AEC takes effect, delay estimation is carried out first, and the delay is buffered, screened, and filtered:

static void EstBufDelayNormal(aecpc_t* aecpc) 

Then we enter a very important AEC routine:

void WebRtcAec_ProcessFrame(AecCore* aec,
                            const short* nearend,
                            const short* nearendH,
                            int knownDelay,
                            int16_t* out,
                            int16_t* outH)

The code contains clear comments explaining the core AEC steps:

   For each frame the process is as follows:
   1) If the system_delay indicates on being too small for processing a
      frame we stuff the buffer with enough data for 10 ms.
   2) Adjust the buffer to the system delay, by moving the read pointer.
   3) TODO(bjornv): Investigate if we need to add this:
      If we can't move read pointer due to buffer size limitations we
      flush/stuff the buffer.
   4) Process as many partitions as possible.
   5) Update the |system_delay| with respect to a full frame of FRAME_LEN
      samples. Even though we will have data left to process (we work with
      partitions) we consider updating a whole frame, since that's the
      amount of data we input and output in audio_processing.
   6) Update the outputs.

Let's look directly at the processing module, i.e. step 4:

static void ProcessBlock(AecCore* aec)

First, note these three variables: the near-end signal d, the linear echo estimate y, and the error signal e.

d[PART_LEN], y[PART_LEN], e[PART_LEN]
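In standard adaptive-filter notation these arrays are related per block by e = d - y, i.e. the error is the microphone (near-end) block minus the linear echo estimate. A trivial sketch:

```c
#define PART_LEN 64

/* Block-level relation among the three arrays: e[i] = d[i] - y[i]. */
void ComputeError(const float d[PART_LEN], const float y[PART_LEN],
                  float e[PART_LEN]) {
  int i;
  for (i = 0; i < PART_LEN; ++i) e[i] = d[i] - y[i];
}
```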

The first step estimates and smooths the noise power spectrum used for comfort noise; then comes the delay estimation:

int WebRtc_DelayEstimatorProcessFloat(void* handle,
                                      float* near_spectrum,
                                      int spectrum_size)

The algorithm works as follows. First, each subband amplitude is compared against a threshold computed from the far-end and near-end power spectra, which yields binarized spectra of the far-end and near-end signals:

static uint32_t BinarySpectrumFloat(float* spectrum,
                                    SpectrumType* threshold_spectrum,
                                    int* threshold_initialized)
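The binarization idea can be sketched as follows (an assumed simplification of BinarySpectrumFloat: the adaptive threshold update is omitted, and only up to 32 bands fit in the 32-bit result):

```c
#include <stdint.h>

/* Bit k of the result is set when subband k's magnitude exceeds its
 * threshold for that band. */
uint32_t BinarizeSpectrum(const float* spectrum, const float* threshold,
                          int num_bands) {
  uint32_t out = 0;
  int k;
  for (k = 0; k < num_bands && k < 32; ++k) {
    if (spectrum[k] > threshold[k]) out |= (1u << k);
  }
  return out;
}
```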

Then, by computing the bitwise XOR of the two, the candidate far-end spectrum with the highest similarity is selected and the corresponding delay is derived:

int WebRtc_ProcessBinarySpectrum(BinaryDelayEstimator* self,
                                 uint32_t binary_near_spectrum) 
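The XOR-based search can be sketched like this (an assumed simplification of WebRtc_ProcessBinarySpectrum: the real code also smooths the bit counts and validates the candidate; here the best delay simply minimizes the XOR population count against the far-end history):

```c
#include <stdint.h>

/* Count set bits (Kernighan's trick). */
int CountSetBits(uint32_t v) {
  int n = 0;
  while (v) { v &= v - 1; ++n; }
  return n;
}

/* Return the history index (candidate delay) whose far-end binary spectrum
 * differs from the near-end binary spectrum in the fewest bits. */
int BestDelay(const uint32_t* far_history, int history_size,
              uint32_t binary_near_spectrum) {
  int best = 0, best_cost = 33, i;
  for (i = 0; i < history_size; ++i) {
    int cost = CountSetBits(far_history[i] ^ binary_near_spectrum);
    if (cost < best_cost) { best_cost = cost; best = i; }
  }
  return best;
}
```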

III. PBFDAF

Next comes the NLMS part of the algorithm.

PBFDAF proceeds step by step as follows. First, the far-end signal is filtered in the frequency domain; an IFFT is then applied to the result, and the time-domain error is obtained by subtracting the scaled result from the near-end signal:

static void FilterFar(AecCore* aec, float yf[2][PART_LEN1])
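The frequency-domain filtering step can be sketched as a complex multiply-accumulate over all partitions (an illustrative re-implementation under the assumption that, as in the signatures above, the [2][PART_LEN1] layout holds real and imaginary parts):

```c
#define PART_LEN1 65

/* Partitioned frequency-domain filtering: accumulate the complex product of
 * each far-end spectrum partition xf[p] with the matching filter partition
 * wf[p] into the echo-estimate spectrum yf. */
void FilterFarSketch(int num_partitions,
                     float xf[][2][PART_LEN1],  /* far-end spectrum history */
                     float wf[][2][PART_LEN1],  /* filter weights */
                     float yf[2][PART_LEN1]) {
  int p, k;
  for (k = 0; k < PART_LEN1; ++k) { yf[0][k] = 0.0f; yf[1][k] = 0.0f; }
  for (p = 0; p < num_partitions; ++p) {
    for (k = 0; k < PART_LEN1; ++k) {
      /* complex multiply-accumulate: y += x * w */
      yf[0][k] += xf[p][0][k] * wf[p][0][k] - xf[p][1][k] * wf[p][1][k];
      yf[1][k] += xf[p][0][k] * wf[p][1][k] + xf[p][1][k] * wf[p][0][k];
    }
  }
}
```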

Then an FFT is applied to the error signal, and the error signal is normalized:

static void ScaleErrorSignal(AecCore* aec, float ef[2][PART_LEN1])

Finally, the filter weights are updated in the frequency domain, with an FFT/IFFT round trip in which half of the values are zeroed:

static void FilterAdaptation(AecCore* aec, float* fft, float ef[2][PART_LEN1])
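The zeroing inside FilterAdaptation is the gradient constraint: after the IFFT of the weight update, the second half of the 2*PART_LEN time-domain buffer is cleared before transforming back, which keeps the adaptation a linear rather than circular convolution. An illustrative fragment (the real function also performs the FFTs and accumulates the weights):

```c
#define PART_LEN 64

/* Gradient constraint: zero the second half of the 2*PART_LEN time-domain
 * buffer holding the IFFT of the weight update. */
void ConstrainGradient(float fft[PART_LEN * 2]) {
  int i;
  for (i = PART_LEN; i < PART_LEN * 2; ++i) fft[i] = 0.0f;
}
```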

IV. NLP

An NLMS linear filter cannot eliminate all of the echo, because the echo path is not necessarily linear, so nonlinear processing is needed to suppress the residual echo. Its basic principle is the frequency-domain coherence of the signals: if the similarity between the near-end signal and the error signal is high, no processing is needed; if the similarity between the far-end and near-end signals is high, processing is needed. The nonlinearity lies in the use of exponential decay. WebRTC AEC's NLP is in this function:

static void NonLinearProcessing(AecCore* aec, short* output, short* outputH)

First, the power spectra of the near-end, far-end, and error signals are computed, then their cross power spectra, from which the near-end/error subband coherence and the far-end/near-end subband coherence are calculated. The average coherence is then obtained, the echo state is estimated, and a suppression factor is computed for the nonlinear processing.

static void OverdriveAndSuppress(AecCore* aec,
                                 float hNl[PART_LEN1],
                                 const float hNlFb,
                                 float efw[2][PART_LEN1])

Finally, comfort noise is added, an IFFT is performed, and overlap-add produces the final output:

static void ComfortNoise(AecCore* aec,
                         float efw[2][PART_LEN1],
                         complex_t* comfortNoiseHband,
                         const float* noisePow,
                         const float* lambda)

Let's look at the effect; in the figure, the first channel is the far-end data and the second channel is the near-end data.

(Figure: the result of WebRTC AEC processing.)

V. Conclusion

WebRTC AEC consists of a linear part and a nonlinear part; personally, I feel it illustrates the respective roles of theory and engineering in algorithm implementation. On the whole, WebRTC AEC works well, but as we know, AEC performance is closely tied to the hardware, so a great deal of time and energy must be spent tuning parameters.


The code related to this article can be obtained via the "Code" item in the menu bar of the official account (Speech Algorithm Group).


References:

[1] Real-Time Speech Processing Practice Guide

[2] https://blog.csdn.net/VideoCloudTech/article/details/110956140

[3] https://bobondemon.github.io/2019/06/08/Adaptive-Filters-Notes-2/

[4] https://groups.google.com/g/discuss-webrtc/c/T8j0CT_NBvs/m/aLmJ3YwEiYAJ

[5] https://www.bbcyw.com/p-25726216.html

Copyright notice: this article was written by Atypical nonsense; please include a link to the original when reprinting. Original: https://yzsam.com/2022/163/202206120524312614.html