Audio alignment using cross-correlation
2022-07-31 21:37:00 【atypical nonsense】
When calculating audio metrics such as SNR, the audio signal must be aligned with the reference signal. Processed or recorded audio is sometimes offset from the reference, so we need a way to align the two.
I. Cross-correlation function
Audio alignment can be treated as a delay estimation problem. We previously introduced GCC-PHAT for delay estimation; here we use a simpler estimator, the cross-correlation function. The article on time-domain analysis of speech signals introduced the autocorrelation function, and the cross-correlation function of discrete time-domain signals is computed by a similar formula:
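One common form of formula (1) is sketched below. It assumes the convention in which the lag $m$ shifts $f$; the symbols $f$, $g$ and $R_{fg}$ are chosen here for illustration, and other texts apply the lag to $g$ instead, which flips the sign of the estimated delay.

$$R_{fg}(m) = \sum_{n} f(n+m)\, g(n)$$

Under this convention, a peak at a lag $m > 0$ means that $f$ is delayed by $m$ samples relative to $g$.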

In frequency-domain cross-correlation, generalized cross-correlation applies a weighting to the spectrum, and the PHAT weighting whitens it, which makes the peak of the cross-correlation function more prominent. Formula (1) can be processed in a similar way to make its peaks more pronounced:
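One operation of this kind is energy normalization, as used in the correlation coefficient. The form below is an assumption for illustration, written with the same illustrative symbols $f$, $g$ and lag $m$, and with the lag applied to $f$:

$$\rho_{fg}(m) = \frac{\sum_{n} f(n+m)\, g(n)}{\sqrt{\sum_{n} f^{2}(n)\,\sum_{n} g^{2}(n)}}$$

For finite-length signals the Cauchy-Schwarz inequality bounds $\rho_{fg}(m)$ to the range $[-1, 1]$, so peak heights become comparable across recordings with different levels.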

The calculation of cross-correlation is actually similar to convolution. I found a video to explain the calculation process.
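The sliding multiply-and-sum procedure can be sketched in plain Python. This is a minimal illustration, not the code that accompanies this article: the function names `cross_correlate` and `estimate_delay` are hypothetical, the lag is assumed to be applied to the first argument, and the direct double loop is only practical for short signals.

```python
def cross_correlate(f, g):
    """Return (lags, values) with values[i] = sum_n f[n + lags[i]] * g[n]."""
    # Lags run from -(len(g) - 1) to len(f) - 1: len(f) + len(g) - 1 values.
    lags = list(range(-(len(g) - 1), len(f)))
    values = []
    for m in lags:
        total = 0.0
        for n, gn in enumerate(g):
            k = n + m
            if 0 <= k < len(f):  # samples outside f are treated as zero
                total += f[k] * gn
        values.append(total)
    return lags, values


def estimate_delay(f, g):
    """Lag in samples at the correlation peak; positive means f lags g."""
    lags, values = cross_correlate(f, g)
    peak = max(range(len(values)), key=lambda i: values[i])
    return lags[peak]


# Toy signals (made up for illustration): the recording is the
# reference delayed by 3 samples.
reference = [0.0, 1.0, 0.5, -0.3, 0.0, 0.0, 0.0, 0.0]
recorded = [0.0, 0.0, 0.0] + reference[:5]
delay = estimate_delay(recorded, reference)
print(delay)
```

For real recordings an FFT-based implementation is the usual choice, because the direct loop costs on the order of len(f) × len(g) multiplications.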
II. Praat
Many programming languages provide built-in cross-correlation functions. Here we use Praat, a program widely used in speech analysis. Because the signals have finite length, Praat adapts the cross-correlation function accordingly. In simple terms, the start time of the cross-correlation sequence is the start time of f minus the end time of g, and its end time is the end time of f minus the start time of g. That is, the time of the first sample is the first sample of f minus the last sample of g, and the time of the last sample is the last sample of f minus the first sample of g. The length of the cross-correlation sequence is the sum of the numbers of samples of f and g, minus 1.
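The bookkeeping described above can be written out as a short sketch. It only restates the rules in the paragraph and does not call Praat; the function names, parameter names and numbers are hypothetical, and both signals are assumed to share the same sampling period `dt`.

```python
def correlation_time_axis(f_first, n_f, g_first, n_g, dt):
    """Return (first, last, n) for the cross-correlation sequence.

    f_first and g_first are the times of the first samples of f and g,
    n_f and n_g their numbers of samples, dt the shared sampling period.
    """
    f_last = f_first + (n_f - 1) * dt
    g_last = g_first + (n_g - 1) * dt
    first = f_first - g_last   # first sample of f minus last sample of g
    last = f_last - g_first    # last sample of f minus first sample of g
    n = n_f + n_g - 1          # length of the cross-correlation sequence
    return first, last, n


def peak_lag_seconds(values, first, dt):
    """Time of the largest cross-correlation value, in seconds."""
    peak = max(range(len(values)), key=lambda i: values[i])
    return first + peak * dt


# Made-up numbers chosen so that the arithmetic is exact in floating point.
first, last, n = correlation_time_axis(0.125, 4, 0.125, 3, 0.25)
lag = peak_lag_seconds([0.0, 1.0, 5.0, 2.0, 0.0, 0.0], first, 0.25)
print(first, last, n, lag)
```

The first and last times are consistent with the length: they are separated by exactly n - 1 sampling periods.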
Let's look at the effect. We start with two audio clips, shown in the figure below. The audio on the two tracks is clearly offset by a significant delay.

Using Praat, we calculated the delay between the two clips to be about 1.284 s.

We advance the audio on the second track by 1.284 s.
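Advancing a track amounts to dropping samples from its start. The sketch below is one way to do it, assuming the track is a plain list of samples; the function name `advance` and the 16 kHz sample rate are illustrative assumptions, since the article does not state the sample rate of the recordings.

```python
def advance(samples, delay_seconds, sample_rate):
    """Shift a track earlier by delay_seconds, keeping its length.

    The leading samples are dropped and the end is padded with zeros.
    """
    shift = round(delay_seconds * sample_rate)
    if shift < 0:
        raise ValueError("delay must be non-negative to advance a track")
    shift = min(shift, len(samples))
    return samples[shift:] + [0.0] * shift


# Tiny made-up track: a 0.5 s delay at 4 samples per second is 2 samples.
aligned = advance([0.0, 0.0, 1.0, 2.0], 0.5, 4)
print(aligned)

# Number of samples that 1.284 s corresponds to at an assumed 16 kHz rate.
shift_16k = round(1.284 * 16000)
print(shift_16k)
```

Padding with zeros keeps the two tracks the same length, which is convenient when a sample-by-sample metric such as SNR is computed afterwards.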

The result is shown below; the two tracks are now essentially aligned.

The relevant code of this article can be obtained by clicking Code in the menu bar of the voice algorithm group of the official account.
References:
[1] http://paulbourke.net/miscellaneous/correlate/
[2] UCBS, Digital Speech Process
[3] http://www.dsg-bielefeld.de/dsg_wp/wp-content/uploads/2014/10/video_syncing_fun.pdf
[4] https://www.fon.hum.uva.nl/praat/manual/Sounds__Cross-correlate___.html