librosa | The Most Humorous Explanation of the Mel Spectrogram
2022-06-21 05:32:00 【Begonia_ cat】
Preface
On Medium, I read a post about the mel spectrogram. The author explains it in an accessible, vivid, and humorous way, so I translated it to share: first, so that I can refer back to it later, and second, to help other confused readers learn along with me.
Of course, if you are able to access it (as you know), I recommend following the original link and reading the English version. Without further ado, let's read it together!
Original article: "Understanding the Mel Spectrogram"
Author: Leland Roberts
Main text

If you are like me, trying to understand the mel spectrogram has not been easy. You read one article, only to be led to another… and another… and another… and on it goes. I hope this short post clears up some of the confusion and explains the mel spectrogram from the ground up.
The Signal
A signal is a variation in a certain quantity over time. For audio, the quantity that varies is air pressure. How do we capture this information digitally? We can take samples of the air pressure over time. The rate at which we sample can vary, but the most common is 44.1 kHz, or 44,100 samples per second. What we capture is the waveform of the signal, which can be interpreted, modified, and analyzed with computer software.
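import librosa
import librosa.display
import matplotlib.pyplot as plt
# Load the audio as a 1-D array of samples (y) plus its sample rate (sr).
# Note: librosa.load resamples to 22,050 Hz by default; pass sr=None to keep the file's native rate.
y, sr = librosa.load('./example_data/blues.00000.wav')
# Plot amplitude against sample index
plt.plot(y);
plt.title('Signal');
plt.xlabel('Time (samples)');
plt.ylabel('Amplitude');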
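
To make the sampling idea concrete without needing an audio file, here is a minimal sketch. It assumes a synthetic 440 Hz sine tone as a stand-in for real air-pressure readings; the tone, its duration, and the helper name `sample_tone` are illustrative choices, not part of the original article.

```python
import math

SAMPLE_RATE = 44_100   # samples per second (the common rate mentioned above)
DURATION = 0.5         # seconds; an arbitrary length chosen for illustration
FREQ = 440.0           # Hz; a hypothetical test tone, not from the article


def sample_tone(freq, duration, sample_rate):
    """Return amplitude readings of a sine tone taken at regular time steps."""
    n_samples = int(duration * sample_rate)
    return [math.sin(2 * math.pi * freq * n / sample_rate) for n in range(n_samples)]


samples = sample_tone(FREQ, DURATION, SAMPLE_RATE)

# The number of samples is simply duration * sample rate
print(len(samples))
# Dividing by the sample rate converts a sample count back to seconds
print(len(samples) / SAMPLE_RATE)
```

The list `samples` plays the same role as `y` in the snippet above: one amplitude value per time step, with the sample rate telling you how far apart those steps are.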

This is great! We have a digital representation of an audio signal that we can work with. Welcome to the field of signal processing! You may be wondering, though, how we extract useful information from this. It looks like a jumbled mess. This is where our friend Fourier comes in.
The Fourier Transform
An audio signal is made up of several single-frequency sound waves. When we sample the signal over time, we only capture the resulting amplitudes. The Fourier transform is a mathematical formula that lets us decompose a signal into its individual frequencies and the amplitude of each frequency. In other words, it converts the signal from the time domain to the frequency domain. The result is called a spectrum.
This is possible because every signal can be decomposed into a set of sine and cosine waves that add up to the original signal. This remarkable result is known as Fourier's theorem.
For a good intuition of why this theorem is true, see https://youtu.be/UKHBWzoOKsY.
There is also an excellent video on the Fourier transform by 3Blue1Brown: https://www.youtube.com/watch?v=spUNpyF58BY.
The fast Fourier transform (FFT) is an algorithm that computes the Fourier transform efficiently. It is widely used in signal processing. I will apply it to a windowed segment of our example audio.
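import numpy as np
n_fft = 2048  # window length in samples
# Analyze only the first n_fft samples; a hop longer than the window keeps the result to a single frame
ft = np.abs(librosa.stft(y[:n_fft], n_fft=n_fft, hop_length=n_fft + 1))
plt.plot(ft);
plt.title('Spectrum');
plt.xlabel('Frequency Bin');
plt.ylabel('Amplitude');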
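
The decomposition idea can also be checked on a signal whose ingredients we already know. The sketch below assumes a toy signal built from two sine tones (440 Hz and 1000 Hz, with amplitudes 1.0 and 0.5, all values chosen purely for illustration) and uses NumPy's FFT routines directly rather than librosa.

```python
import numpy as np

sr = 8000                    # hypothetical sample rate for this toy signal
t = np.arange(sr) / sr       # one second of sample times

# Two pure tones with different amplitudes (illustrative values)
signal = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

# Magnitude spectrum, scaled so a pure tone of amplitude A shows up as roughly A
spectrum = np.abs(np.fft.rfft(signal)) / (len(signal) / 2)
# Frequency in Hz that each FFT bin corresponds to
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)

# The two largest peaks should sit at the two tone frequencies
peak_bins = np.argsort(spectrum)[-2:]
peak_freqs = sorted(freqs[peak_bins].tolist())
print(peak_freqs)
```

The time-domain samples contain no explicit frequency information, yet the spectrum recovers both tones and their relative strengths, which is the "time domain to frequency domain" conversion described above.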

The Spectrogram
The fast Fourier transform is a powerful tool for analyzing the frequency content of a signal, but what if that frequency content changes over time? This is the case for most audio signals, such as music and speech. These signals are known as non-periodic signals. We need a way to represent the spectrum of these signals as it varies over time. You may be thinking, "Hey, can't we compute several spectra by performing the FFT on several windowed segments of the signal?" Yes! That is exactly what is done, and it is called the short-time Fourier transform. The FFT is computed on overlapping windowed segments of the signal, and the result is what we call the spectrogram. Wow! That's a lot to take in. There is a lot going on here. A good visual is in order.
You can think of a spectrogram as a stack of FFTs placed side by side. It is a way to visually represent a signal's loudness, or amplitude, as it varies over time at different frequencies. Some additional details happen behind the scenes when computing the spectrogram. The y-axis is converted to a log scale, and the color dimension is converted to decibels (you can think of this as a log scale of the amplitude). This is because humans can only perceive a very small and concentrated range of frequencies and amplitudes.
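# Magnitude of the short-time Fourier transform: one FFT per overlapping window
spec = np.abs(librosa.stft(y, hop_length=512))
# Convert amplitude to decibels relative to the loudest value
spec = librosa.amplitude_to_db(spec, ref=np.max)
# Draw time on the x-axis and log-scaled frequency on the y-axis
librosa.display.specshow(spec, sr=sr, hop_length=512, x_axis='time', y_axis='log');
plt.colorbar(format='%+2.0f dB');
plt.title('Spectrogram');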
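
To see what "FFT on overlapping windowed segments" means in practice, here is a simplified sketch written with NumPy only. It is not librosa's implementation: it skips the padding and centering that `librosa.stft` applies and uses a plain symmetric Hann window, and the helper names `simple_stft` and `to_db` as well as the two-tone test signal are assumptions made for illustration.

```python
import numpy as np


def simple_stft(y, n_fft=2048, hop_length=512):
    """Magnitude STFT: slice overlapping windows, taper each one, FFT each one."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack(
        [y[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    # Rows are frequency bins, columns are time frames
    return np.abs(np.fft.rfft(frames, axis=0))


def to_db(magnitude, floor=1e-10):
    """Convert magnitudes to decibels relative to the loudest value."""
    ref = magnitude.max()
    return 20.0 * np.log10(np.maximum(magnitude, floor) / ref)


sr = 22050
t = np.arange(2 * sr) / sr
# Toy signal whose frequency changes over time: 440 Hz, then 880 Hz
y = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))

spec = simple_stft(y)
spec_db = to_db(spec)

# Strongest frequency in the first and the last frame
freqs = np.fft.rfftfreq(2048, d=1 / sr)
first_peak = freqs[spec[:, 0].argmax()]
last_peak = freqs[spec[:, -1].argmax()]
print(spec.shape, first_peak, last_peak)
```

A single FFT over the whole signal would show both tones at once with no hint of their order. The frame-by-frame view shows the peak moving from roughly 440 Hz to roughly 880 Hz, within the resolution of one frequency bin.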

Voilà! With just a few lines of code, we have created a spectrogram. OK, we are almost there! We have a solid grasp of the "spectrogram" part, but what about "Mel"? Who is he?
The Mel Scale
Studies have shown that humans do not perceive frequencies on a linear scale. We are better at detecting differences at lower frequencies than at higher frequencies. For example, we can easily tell the difference between 500 and 1000 Hz, but we can hardly tell the difference between 10,000 and 10,500 Hz, even though the distance between the two pairs is the same.
In 1937, Stevens, Volkmann, and Newman proposed a unit of pitch such that equal distances in pitch sound equally distant to the listener. This is called the mel scale. We perform a mathematical operation on frequencies to convert them to the mel scale.
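
The text does not say which conversion formula is meant, so the sketch below assumes one commonly used variant, the HTK-style formula m = 2595 · log10(1 + f / 700). Other variants exist; librosa's own `hz_to_mel` uses a different (Slaney-style) formula unless `htk=True` is passed. The function names here are defined locally for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """Hz -> mel using the HTK-style formula m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The same 500 Hz gap, measured in mels, at low and at high frequencies
low_gap = hz_to_mel(1000) - hz_to_mel(500)
high_gap = hz_to_mel(10500) - hz_to_mel(10000)

print(round(low_gap, 1), round(high_gap, 1))
```

On this scale the 500 to 1000 Hz pair ends up several times farther apart than the 10,000 to 10,500 Hz pair, which matches the perceptual observation above: equal steps in Hz are not equal steps in perceived pitch.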

The Mel Spectrogram
A mel spectrogram is a spectrogram in which the frequencies are converted to the mel scale. I know, right? Who would have thought? What's amazing is that after all those mental gymnastics to understand the mel spectrogram, it takes only a few lines of code to implement.
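# Mel-scaled power spectrogram
mel_spect = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=1024)
# Convert power to decibels relative to the loudest value
mel_spect = librosa.power_to_db(mel_spect, ref=np.max)
# Pass sr and hop_length so the time axis matches the hop used above
librosa.display.specshow(mel_spect, sr=sr, hop_length=1024, y_axis='mel', fmax=8000, x_axis='time');
plt.title('Mel Spectrogram');
plt.colorbar(format='%+2.0f dB');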
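
One way to picture "converting the frequencies to the mel scale" is as a matrix of triangular filters applied to an ordinary power spectrogram. The sketch below is a simplified illustration under stated assumptions, not librosa's implementation: it uses the HTK-style mel formula, leaves the filters unnormalized, and feeds in random numbers as a stand-in spectrogram. The helper `mel_filterbank` and all parameter values are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style Hz -> mel conversion (one common variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, centre, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (centre - lo)     # ramp up to the centre
        falling = (hi - fft_freqs) / (hi - centre)    # ramp down after it
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


sr, n_fft, n_mels = 22050, 2048, 40
bank = mel_filterbank(sr, n_fft, n_mels)

# Stand-in power spectrogram (random values, used only to show the shapes)
rng = np.random.default_rng(0)
power_spec = rng.random((1 + n_fft // 2, 10))

# Each mel band is a weighted sum of the linear-frequency bins it covers
mel_spec = bank @ power_spec
print(bank.shape, mel_spec.shape)   # (40, 1025) (40, 10)
```

The low-frequency filters are narrow and the high-frequency filters are wide, so the 1025 linear bins collapse into 40 bands that give more resolution where hearing is more sensitive.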

Summary
That was a lot of information to take in, especially if you are new to signal processing like me. However, if you keep reviewing the concepts laid out in this post (and spend enough time staring at the corner of a wall thinking about them), it will begin to make sense! Let's briefly review what we have done.
- We took samples of air pressure over time to digitally represent an audio signal.
- We mapped the audio signal from the time domain to the frequency domain using the fast Fourier transform, and we performed this on overlapping windowed segments of the audio signal.
- We converted the y-axis (frequency) to a log scale and the color dimension (amplitude) to decibels to form the spectrogram.
- We mapped the y-axis (frequency) onto the mel scale to form the mel spectrogram.
That's it! Sounds easy, right? Well, not quite, but I hope this post made the mel spectrogram a little less intimidating. It took me quite a while to understand it. At the end of the day, though, I found out that Mel wasn't so standoffish.
(End)
Thanks again to Leland Roberts for writing this article; it cleared up some of my own confusion!
