Speech enhancement - spectrum mapping
2022-06-28 06:24:00 【Salute=】
1. Introduction
The main goal of speech enhancement is to extract the clean speech signal from a noisy speech signal. It is widely used in automatic speech recognition and hearing aids. Deep-learning-based speech enhancement methods fall into two categories: 1) mapping-based speech enhancement; 2) mask-based speech enhancement.
2. Mapping-based speech enhancement
Depending on the domain in which they operate (time domain or frequency domain), mapping-based speech enhancement methods fall into two categories:
1) Spectrum-mapping speech enhancement: a neural network learns the mapping from the spectrum of the noisy speech signal to the spectrum of the clean speech signal.
2) End-to-end speech enhancement: a neural network learns the mapping from the time-domain waveform of the noisy speech signal to the time-domain waveform of the clean speech signal.
2.1 Spectrum mapping system model
The spectrum mapping system model is shown in the figure below.
The speech feature extraction and time-domain reconstruction steps are as follows.
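The feature extraction and time-domain reconstruction steps can be sketched roughly as follows. This is a minimal NumPy sketch, not the post's actual code: the helper names (`stft`, `istft`, `log_magnitude`, `reconstruct`) are hypothetical, the natural logarithm of the magnitude is assumed, and reusing the noisy-speech phase for reconstruction is an assumption (a common choice, but not stated in this post). The STFT values match the parameter settings given in section 3.1.

```python
import numpy as np

# STFT settings taken from the parameter settings in section 3.1.
N_FFT, HOP_LENGTH = 512, 128


def stft(x, n_fft=N_FFT, hop=HOP_LENGTH):
    """Hamming-windowed STFT without padding; shape (n_frames, n_fft // 2 + 1)."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP_LENGTH):
    """Weighted overlap-add inverse of stft()."""
    window = np.hamming(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)


def log_magnitude(spec, eps=1e-8):
    """Log-magnitude spectrum (natural log assumed); eps avoids log(0)."""
    return np.log(np.abs(spec) + eps)


def reconstruct(log_mag, phase):
    """Combine a (predicted) log-magnitude spectrum with a phase and invert."""
    return istft(np.exp(log_mag) * np.exp(1j * phase))
```

At inference time, `log_mag` would be the network's prediction and `phase` would be `np.angle(stft(noisy))`, under the noisy-phase assumption above.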
Training phase:
1) Input: the input feature used in this experiment is the log-magnitude spectrum of the noisy speech signal. Following [1], frame expansion is used: for example, when 5 frames of log-magnitude spectrum are input, the network output is the predicted log-magnitude spectrum of the 3rd (center) frame, as shown in the figure below.
2) Label: the log-magnitude spectrum of the clean speech signal. For example, when 5 frames are input, the label is the 3rd frame of the clean log-magnitude spectrum, which is the frame the network predicts.
3) Loss function: MSE loss, $L_{\text{Loss}} = \|\hat{\mathbf{L}} - \mathbf{L}\|_{2}^{2}$, where $\hat{\mathbf{L}}$ is the predicted log-magnitude spectrum and $\mathbf{L}$ is the clean one.
Remark: normalizing the input log-magnitude spectrum speeds up network convergence. In this experiment, a BN (batch normalization) layer normalizes the input features.
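Frame expansion and the loss can be illustrated with a short NumPy sketch. The function names `expand_frames` and `mse_loss` are hypothetical, and padding the utterance edges by repeating the first and last frame is an assumption (the post does not say how edge frames are handled). Note that `mse_loss` averages the squared error, which differs from the squared norm above only by a constant factor.

```python
import numpy as np


def expand_frames(log_mag, n_expand):
    """Stack each frame with n_expand neighbours on both sides.

    log_mag: (n_frames, n_bins) -> (n_frames, (2 * n_expand + 1) * n_bins).
    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    n_frames = log_mag.shape[0]
    padded = np.pad(log_mag, ((n_expand, n_expand), (0, 0)), mode="edge")
    # Slice k holds, for every frame t, the frame at offset (k - n_expand).
    context = [padded[k:k + n_frames] for k in range(2 * n_expand + 1)]
    return np.concatenate(context, axis=1)


def mse_loss(predicted, target):
    """Mean squared error between predicted and clean log-magnitude frames."""
    return float(np.mean((predicted - target) ** 2))
```

With `n_expand=2`, each network input holds 5 frames and the target is the clean center (3rd) frame, which matches the example in the text.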
3. Experimental analysis
3.1 Experimental data set and parameter setting
Clean speech signals for training: all clean speech signals in DR1 of TIMIT-TRAIN; clean speech signals for testing: the first 10 clean speech signals in DR1 of TIMIT-TEST; SNRs of the synthesized noisy speech signals (dB): [-5, 0, 5, 10]; noise sources used to synthesize the noisy speech signals: 3 noise types from NOISEX-92, ['babble', 'destroyerengine', 'factory1'].
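One common way to synthesize noisy speech at a target SNR is to scale the noise so that the clean-to-noise power ratio matches the requested value. The sketch below assumes that approach; the function name `mix_at_snr`, the random choice of noise segment, and the tiling of short noise files are illustrative assumptions, not details taken from this post. It also assumes the chosen noise segment is not silent.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db, rng=None):
    """Return clean + scaled noise so that the mixture has the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    # Repeat the noise if it is shorter than the clean utterance.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    # Pick a random noise segment of the same length as the clean signal.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(segment ** 2)
    # Solve 10 * log10(clean_power / (scale**2 * noise_power)) = snr_db for scale.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * segment
```

Looping over `[-5, 0, 5, 10]` and the three noise types would then give the noisy set described above.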
Parameter settings: short-time Fourier transform length: N_fft = 512; window length: win_length = 512; hop length: hop_length = 128; window function: 'hamming'. Training parameters: epoch = 30, lr = 1e-4, batch_size = 16.
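As a quick sanity check, the quantities implied by these STFT settings can be computed directly. The sampling rate of 16 kHz is an assumption (it is TIMIT's native rate but is not stated in this post), and the frame count assumes no padding at the signal edges; libraries that center-pad frames will report a few more frames.

```python
N_FFT = 512
WIN_LENGTH = 512
HOP_LENGTH = 128
SAMPLE_RATE = 16000  # assumption: TIMIT's native sampling rate, not stated in the post

n_bins = N_FFT // 2 + 1                       # frequency bins per frame
window_ms = 1000 * WIN_LENGTH / SAMPLE_RATE   # window duration in milliseconds
hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE      # hop duration in milliseconds
overlap = 1 - HOP_LENGTH / WIN_LENGTH         # fraction of overlap between frames


def n_frames(n_samples):
    """Number of full frames for a signal of n_samples, without padding."""
    return 1 + (n_samples - WIN_LENGTH) // HOP_LENGTH
```

Under these assumptions each frame has 257 bins, spans 32 ms, and advances by 8 ms, so adjacent frames overlap by 75%.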
3.2 Experimental results
3.2.1 Frame expansion parameter (n_expand=3)
With the frame expansion parameter $n\_expand = 3$, the number of frames input to the network is $2 \times n\_expand + 1 = 7$. The PESQ scores and STOI values for $n\_expand = 3$ are shown below.


3.2.2 Different frame expansion parameters (n_expand=1, 3, 5, 7)
This section examines how the frame expansion parameter affects the performance of spectrum-mapping speech enhancement:
(1) The PESQ and STOI values at each SNR for n_expand=1, 3, 5, 7 are shown in the figures below.




Conclusion: under the current experimental conditions, n_expand=3 gives the best speech enhancement performance.
4. References
[1] An Experimental Study on Speech Enhancement Based on Deep Neural Networks
[2] Lan Tian, Peng Chuan, Li Sen, Qian Yuxin, Chen Cong, Liu Qiao. End-to-end speech enhancement method based on RefineNet [J]. Acta Automatica Sinica, 2022, 48(02): 554-563.
[3] Single-channel speech enhancement based on deep learning
[4] Speech enhancement course by Yu Hong, Ludong University
[5] Reference code