Audio alignment using cross-correlation
2022-07-31 21:37:00 【atypical nonsense】
When computing audio metrics such as SNR, the audio signal must be aligned with the reference signal. Processed or recorded audio is sometimes not aligned with the reference, so we need a way to align the two.
I. Cross-correlation function
Audio alignment can be treated as a delay estimation problem. We previously introduced GCC-PHAT for delay estimation; here we use a simpler estimator, the cross-correlation function. In our earlier article on time-domain analysis of speech signals we introduced the autocorrelation function, and the cross-correlation function of two discrete time-domain signals is computed by a similar formula:
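Under one common sign convention (an assumption here; some texts index the lag the other way, which only mirrors the result about zero), formula (1) can be written as:

```latex
R_{fg}(m) = \sum_{n=-\infty}^{\infty} f(n + m)\, g(n) \tag{1}
```

Here f and g are the two discrete-time signals and m is the lag in samples. Setting g = f gives the autocorrelation function, and the lag at which R_{fg}(m) peaks is the delay estimate.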
Recall that generalized cross-correlation applies a weighting to the cross-correlation in the frequency domain, and that the PHAT weighting whitens the result so that the peak of the cross-correlation function stands out more clearly. Formula (1) can be treated in a similar way to make its peaks more pronounced:
The computation of cross-correlation is in fact similar to that of convolution. I found a video that explains the calculation process.
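As a rough illustration of delay estimation with the time-domain cross-correlation described in this section, here is a minimal NumPy sketch. The function name `estimate_delay`, its arguments, and the choice of `numpy.correlate` are assumptions made for illustration, not the code used in this article.

```python
import numpy as np

def estimate_delay(ref, sig, fs):
    """Estimate how many seconds `sig` lags behind `ref` (hypothetical helper).

    A positive result means `sig` starts later than `ref`.
    """
    ref = np.asarray(ref, dtype=float)
    sig = np.asarray(sig, dtype=float)
    # Full cross-correlation: corr[i] = sum_n sig[n + lag_i] * ref[n]
    corr = np.correlate(sig, ref, mode="full")
    # Lag (in samples) that each output index corresponds to
    lags = np.arange(-(len(ref) - 1), len(sig))
    # The lag with the largest correlation is taken as the delay
    best_lag = lags[np.argmax(corr)]
    return best_lag / fs

# Example: a noise signal delayed by 300 samples at an assumed 1 kHz rate
rng = np.random.default_rng(0)
ref = rng.standard_normal(2000)
sig = np.concatenate([np.zeros(300), ref])
delay = estimate_delay(ref, sig, fs=1000)
```

`numpy.correlate` works directly in the time domain, which is fine for short clips. For long recordings an FFT-based routine such as `scipy.signal.correlate` with `method="fft"` is usually a faster choice.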
II. Praat
Many programming languages provide ready-made cross-correlation functions. Here we use Praat, a program widely used in speech analysis. Because speech signals have finite length, Praat adapts the cross-correlation function accordingly. In simple terms, the start time of the cross-correlation sequence is the start time of f minus the end time of g, and its end time is the end time of f minus the start time of g. That is, the time of the first sample is the time of the first sample of f minus that of the last sample of g, the time of the last sample is the time of the last sample of f minus that of the first sample of g, and the length of the cross-correlation sequence is the number of samples in f plus the number of samples in g, minus 1.
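The time axis described above can be sketched in a few lines of NumPy. This follows the description in the text rather than Praat's source code; the function name `correlation_time_axis` and the parameters `t0_f` and `t0_g` (times of the first samples of f and g) are hypothetical.

```python
import numpy as np

def correlation_time_axis(n_f, n_g, fs, t0_f=0.0, t0_g=0.0):
    """Sample times (in seconds) of the full cross-correlation of f and g.

    n_f, n_g: number of samples in f and g
    fs: sampling rate in Hz
    t0_f, t0_g: times of the first samples of f and g (assumed parameters)
    """
    # Time of the last sample of g
    last_g = t0_g + (n_g - 1) / fs
    # First output sample: first sample of f minus last sample of g
    first = t0_f - last_g
    # Output length: samples of f plus samples of g, minus 1
    n_out = n_f + n_g - 1
    return first + np.arange(n_out) / fs

# Example: 5 samples of f and 3 samples of g at an assumed 1 Hz rate
axis = correlation_time_axis(5, 3, fs=1.0)
```

In this example the axis runs from -2 s (first sample of f at 0 s minus last sample of g at 2 s) to 4 s (last sample of f at 4 s minus first sample of g at 0 s), with 5 + 3 - 1 = 7 points.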
Let's look at the result. First, we have two audio tracks, as shown in the figure below. The audio on the two tracks is clearly offset by a significant delay.
Using Praat, we find the delay between the two tracks to be about 1.284 s.
We then advance the audio on the second track by 1.284 s.
The result is shown below; the two tracks now appear to be essentially aligned.
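The advance step can be sketched as follows. This is a minimal illustration, not the procedure used in Praat: the helper name `advance` is hypothetical, and it assumes a non-negative delay and keeps the original length by zero-padding the tail.

```python
import numpy as np

def advance(signal, delay_s, fs):
    """Shift `signal` earlier by `delay_s` seconds (hypothetical helper).

    The first `delay_s * fs` samples are dropped and the tail is
    zero-padded so the output has the same length as the input.
    """
    signal = np.asarray(signal)
    # Convert the delay to a whole number of samples
    n = int(round(delay_s * fs))
    if n <= 0:
        # Nothing to advance; negative delays are not handled here
        return signal.copy()
    n = min(n, len(signal))
    return np.concatenate([signal[n:], np.zeros(n, dtype=signal.dtype)])

# Example: advance a 10-sample ramp by 0.3 s at an assumed 10 Hz rate
x = np.arange(10.0)
y = advance(x, 0.3, fs=10)
```

With the 1.284 s delay measured above and an assumed sampling rate of 16 kHz, the shift would be 1.284 × 16000 = 20544 samples.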
The code for this article can be obtained by clicking Code in the menu bar of the voice algorithm group of the official account.
References:
[1]. http://paulbourke.net/miscellaneous/correlate/
[2]. UCBS, Digital Speech Process
[3]. http://www.dsg-bielefeld.de/dsg_wp/wp-content/uploads/2014/10/video_syncing_fun.pdf
[4]. https://www.fon.hum.uva.nl/praat/manual/Sounds__Cross-correlate___.html