The most accessible explanation of librosa | Mel spectrogram

2022-06-21 05:32:00 Begonia_ cat

Preface

I came across a post about the mel spectrogram on Medium. The author's explanation is easy to follow, vivid, and funny, so I translated it to share. First, it makes it convenient for me to look things up later; second, it may help other confused readers learn along with me.

Of course, if you are able to (as you know), I recommend following the original link and reading the English version. Enough talk; let's read it together!


Original article: "Understanding the Mel Spectrogram"
Author: Leland Roberts

Main text

If you are anything like me, trying to understand the mel spectrogram has not been easy. You read one article, only to be led to another… and another… and another… on and on it goes. I hope this short post clears up some of the confusion and explains the mel spectrogram from the ground up.

The Signal

A signal is a variation in some quantity over time. For audio, the quantity that varies is air pressure. How do we capture this information digitally? We can take samples of the air pressure over time. The rate at which we sample can vary, but the most common is 44.1 kHz, or 44,100 samples per second. What we capture is the waveform of the signal, which can be interpreted, modified, and analyzed with computer software.
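To make "44,100 samples per second" concrete, here is a minimal sketch that builds one second of a pure tone by hand. The 440 Hz pitch, the 0.5 amplitude, and the variable names are assumptions chosen for illustration; they do not come from the original article.

```python
import numpy as np

sr = 44100            # sampling rate in samples per second (44.1 kHz)
duration = 1.0        # length of the tone in seconds (assumed)

# Time stamp of every sample: 0, 1/sr, 2/sr, ...
t = np.arange(int(sr * duration)) / sr

# A 440 Hz sine wave stands in for the air-pressure variation (assumed tone)
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

print(len(tone))      # number of samples captured in one second
print(t[1] - t[0])    # spacing between consecutive samples, in seconds
```

Loading a real recording works the same way, except that the samples come from a file instead of a formula.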

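import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load the audio as a waveform `y`. librosa.load resamples to 22,050 Hz
# by default (pass sr=None to keep the file's native sampling rate).
y, sr = librosa.load('./example_data/blues.00000.wav')
plt.plot(y)
plt.title('Signal')
plt.xlabel('Time (samples)')
plt.ylabel('Amplitude')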

This is great! We have a digital representation of an audio signal that we can work with. Welcome to the field of signal processing! You may be wondering, though, how do we extract useful information from this? It looks like a jumbled mess. This is where our friend Fourier comes in.

The Fourier Transform

An audio signal is made up of several single-frequency sound waves. When we sample the signal over time, we capture only the resulting amplitudes. The Fourier transform is a mathematical formula that lets us decompose a signal into its individual frequencies and each frequency's amplitude. In other words, it converts the signal from the time domain into the frequency domain. The result is called a spectrum.

This is possible because every signal can be decomposed into a set of sine and cosine waves that add up to the original signal. This remarkable result is known as Fourier's theorem.
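A small numerical check can make the decomposition tangible. The sketch below mixes two sine waves and uses NumPy's FFT to recover their frequencies and amplitudes. The toy sampling rate of 1,000 Hz and the 50 Hz and 120 Hz components are assumptions picked so that each frequency falls exactly on an FFT bin.

```python
import numpy as np

sr = 1000                      # toy sampling rate (assumed), samples per second
t = np.arange(sr) / sr         # exactly one second of time stamps

# Mix two sine waves: 50 Hz at amplitude 1.0 and 120 Hz at amplitude 0.5
signal = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Magnitude spectrum, scaled so a sine of amplitude A shows up as A
spectrum = np.abs(np.fft.rfft(signal)) / (len(signal) / 2)
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)

# The two largest peaks are the frequencies we mixed in
top_two = np.sort(freqs[np.argsort(spectrum)[-2:]])
print(top_two)                      # peaks at 50 Hz and 120 Hz
print(spectrum[50], spectrum[120])  # approximately 1.0 and 0.5
```

All that is visible in the time domain is the summed waveform; the frequency domain shows the individual ingredients.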

For a good intuition as to why this theorem is true, see https://youtu.be/UKHBWzoOKsY.

There is also an excellent video on the Fourier transform by 3Blue1Brown: https://www.youtube.com/watch?v=spUNpyF58BY.

The fast Fourier transform (FFT) is an algorithm that computes the Fourier transform efficiently. It is widely used in signal processing. I will apply it to a single windowed segment of our example audio.

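import numpy as np

n_fft = 2048
# Use only the first n_fft samples; a hop longer than the window
# leaves a single frame, so the result is one spectrum.
ft = np.abs(librosa.stft(y[:n_fft], n_fft=n_fft, hop_length=n_fft + 1))
plt.plot(ft)
plt.title('Spectrum')
plt.xlabel('Frequency Bin')
plt.ylabel('Amplitude')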
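The x-axis of the spectrum plot is in frequency bins rather than Hz. As a rough sketch of how the two relate, bin k of an n_fft-point FFT corresponds to k * sr / n_fft Hz. The value sr = 22050 below assumes the audio was loaded with librosa.load's default sampling rate.

```python
import numpy as np

sr = 22050      # assumed: librosa.load's default sampling rate
n_fft = 2048    # same window length as in the spectrum example

# Centre frequency in Hz of each bin of a real-input FFT
freqs_hz = np.fft.rfftfreq(n_fft, d=1.0 / sr)

print(freqs_hz.size)   # n_fft // 2 + 1 bins
print(freqs_hz[1])     # width of one bin in Hz (sr / n_fft)
print(freqs_hz[-1])    # highest representable frequency (sr / 2)
```

The last bin sits at half the sampling rate, which is why a higher sampling rate is needed to capture higher frequencies.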

The Spectrogram

The fast Fourier transform is a powerful tool for analyzing the frequency content of a signal, but what if that frequency content changes over time? That is the case for most audio signals, such as music and speech. These are known as non-periodic signals. We need a way to represent the spectrum of such signals as it varies over time. You may be thinking, "Hey, can't we compute several spectra by running the FFT on several windowed segments of the signal?" Yes! That is exactly what is done, and it is called the short-time Fourier transform (STFT). The FFT is computed on overlapping windowed segments of the signal, and we get what is called the spectrogram. Wow! That is a lot to take in. There is a lot going on here, so a good visual is in order.

You can think of a spectrogram as a bunch of FFTs stacked on top of each other. It is a way to visually represent a signal's loudness, or amplitude, as it varies over time at different frequencies. There are some additional details going on behind the scenes when the spectrogram is computed. The y-axis is converted to a log scale, and the color dimension is converted to decibels (you can think of this as the log scale of the amplitude). This is because humans can perceive only a very small and concentrated range of frequencies and amplitudes.
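The "stack of FFTs" idea can be sketched in plain NumPy before reaching for a library. This is a simplified illustration, not librosa's implementation: it does not pad the signal, and the helper name `stft_magnitude` and the two-tone test signal are assumptions made for the example.

```python
import numpy as np

def stft_magnitude(signal, n_fft=2048, hop_length=512):
    """Magnitude STFT: one windowed FFT per overlapping frame (no padding)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # FFT of each frame; transpose so rows are frequency bins, columns are frames
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 22050
t = np.arange(2 * sr) / sr
# Test signal whose frequency changes over time: 440 Hz, then 880 Hz
signal = np.where(t < 1.0,
                  np.sin(2 * np.pi * 440 * t),
                  np.sin(2 * np.pi * 880 * t))

spec = stft_magnitude(signal)

# Amplitude to decibels relative to the loudest value (floor avoids log of 0)
spec_db = 20.0 * np.log10(np.maximum(spec, 1e-10) / spec.max())

freqs = np.fft.rfftfreq(2048, d=1.0 / sr)
first_peak = freqs[spec[:, 0].argmax()]    # dominant frequency in the first frame
last_peak = freqs[spec[:, -1].argmax()]    # dominant frequency in the last frame
print(spec.shape, first_peak, last_peak)
```

A single FFT over the whole signal would show both tones at once; the per-frame peaks show when each one occurs, to within one frequency bin.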

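# Magnitude of the STFT: rows are frequency bins, columns are time frames
spec = np.abs(librosa.stft(y, hop_length=512))
# Convert amplitude to decibels relative to the loudest value
spec = librosa.amplitude_to_db(spec, ref=np.max)
librosa.display.specshow(spec, sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')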

Voilà! With just a few lines of code, we have created a spectrogram. OK, we are almost there! We have a solid grasp of the "spectrogram" part, but what about "mel"? Who is he?

The Mel Scale

Studies have shown that humans do not perceive frequencies on a linear scale. We are better at detecting differences between lower frequencies than between higher ones. For example, we can easily tell the difference between 500 and 1,000 Hz, but we can hardly tell the difference between 10,000 and 10,500 Hz, even though the distance between the two pairs is the same.

In 1937, Stevens, Volkmann, and Newman proposed a unit of pitch such that equal distances in pitch sound equally distant to the listener. This is called the mel scale. We perform a mathematical operation on frequencies to convert them to the mel scale.
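The article does not spell out the mathematical operation. One widely used formula is m = 2595 * log10(1 + f / 700), and the sketch below assumes that variant. Note that librosa's default conversion uses a different (Slaney-style) formula, so these numbers will not match librosa's output exactly; the helper name `hz_to_mel` here is my own function, not the library's.

```python
import numpy as np

def hz_to_mel(freq_hz):
    """Convert Hz to mels using the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

# The same 500 Hz gap, once at low frequencies and once at high frequencies
low_gap = hz_to_mel(1000) - hz_to_mel(500)
high_gap = hz_to_mel(10500) - hz_to_mel(10000)

print(hz_to_mel(1000))     # close to 1000 mels by construction of the formula
print(low_gap, high_gap)   # the low-frequency gap is several times larger
```

On this scale, the 500 to 1,000 Hz pair ends up much farther apart than the 10,000 to 10,500 Hz pair, which matches how differently the two gaps sound.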


The Mel Spectrogram

A mel spectrogram is a spectrogram in which the frequencies are converted to the mel scale. I know, right? Who would have thought? What is amazing is that, after all the mental gymnastics of trying to understand the mel spectrogram, it takes only a couple of lines of code to implement.

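mel_spect = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=1024)
# Convert the power spectrogram to decibels relative to its maximum
mel_spect = librosa.power_to_db(mel_spect, ref=np.max)
# Pass sr and hop_length so the time axis matches the values used above
librosa.display.specshow(mel_spect, sr=sr, hop_length=1024, y_axis='mel', fmax=8000, x_axis='time')
plt.title('Mel Spectrogram')
plt.colorbar(format='%+2.0f dB')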
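One way to picture the frequency-to-mel conversion is as a bank of triangular filters, evenly spaced on the mel scale, that pools the FFT bins of a power spectrogram. The sketch below is a simplified illustration under stated assumptions: it uses the 2595 * log10(1 + f / 700) formula with unnormalized triangles, the helper names are my own, and the input is random stand-in data. It is not librosa's exact implementation, whose defaults differ.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    # Centre frequency of each FFT bin, from 0 Hz up to sr / 2
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge points, evenly spaced in mels, converted back to Hz
    hz_points = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        # Triangle: rises from left to centre, falls to right, zero elsewhere
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

sr, n_fft, n_mels = 22050, 2048, 40
fb = mel_filterbank(sr, n_fft, n_mels)

# Stand-in power spectrogram (random values), shape (n_fft // 2 + 1, n_frames)
rng = np.random.default_rng(0)
power_spec = rng.random((n_fft // 2 + 1, 10))

# Each mel band is a weighted sum of the FFT bins under its triangle
mel_spec = fb @ power_spec
print(fb.shape, mel_spec.shape)
```

The filters get wider toward high frequencies, so many high-frequency bins collapse into a single band while low frequencies keep finer resolution.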


Summary

That was a lot of information to take in, especially if you are new to signal processing like me. However, if you keep reviewing the concepts laid out in this post (and spend enough time staring at the corner of a wall thinking about them), it will begin to make sense! Let's briefly review what we have done.

  • We took samples of air pressure over time to represent an audio signal digitally.
  • We mapped the audio signal from the time domain to the frequency domain using the fast Fourier transform, and we performed this on overlapping windowed segments of the audio signal.
  • We converted the y-axis (frequency) to a log scale and the color dimension (amplitude) to decibels to form the spectrogram.
  • We mapped the y-axis (frequency) onto the mel scale to form the mel spectrogram.

That's it! Sounds easy, right? Well, not quite, but I hope this post made the mel spectrogram a little less intimidating. It took me quite a while to understand it. At the end of the day, though, I found out that Mel wasn't so standoffish.


(End)
Thanks again to Leland Roberts for writing this article; it cleared up some of my own confusion!
