Speech recognition learning summary
2022-07-05 08:30:00 【... Manmu mountains and rivers】
Learning summary
After a semester of study, I have a basic understanding of speech recognition. I am writing this blog post to organize what I have learned. The content may be somewhat scattered, since it is mainly written for myself, and I will keep updating it as my understanding improves.
Speech recognition process
1. Traditional speech recognition
First, a microphone picks up the sound. Sound is a wave that propagates through vibration: the sound wave makes the microphone element vibrate, producing an electrical signal whose amplitude follows the sound. This analog signal is then sampled and quantized (analog-to-digital conversion) into a digital signal, a one-dimensional sequence in the time domain that can be plotted as a waveform. The computer then processes this waveform, filters out useless information, extracts useful information, and finally produces a text sequence.
The human ear distinguishes sounds by their frequency content, and utterances that sound alike can still have very different waveforms, so it is hard to find pronunciation patterns directly in the waveform. The waveform therefore needs further processing: the Fourier transform converts the time-domain signal into a frequency-domain spectrum, and patterns are then learned from these frequency-domain features.
Because speech is a short-term stationary signal, it is processed in small segments called frames; within one frame (typically a few tens of milliseconds) the properties of the sound can be treated as constant. The frames are recognized as corresponding states, several states are combined into a phoneme, and phonemes are combined into the pronunciation of a word. In Chinese speech recognition, for example, the phonemes correspond to the initials and finals of a syllable. The text is then predicted from the word pronunciations, and the recognized words are joined into a sentence, which completes the recognition of one utterance.
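To make the framing and Fourier-transform steps more concrete, here is a minimal NumPy sketch. The 16 kHz sample rate, 25 ms frame length, 10 ms hop, Hamming window, and 512-point FFT are common choices assumed for illustration, not settings from this post, and a synthetic 440 Hz tone stands in for recorded speech. Real front ends usually go further, for example with mel filterbanks or MFCCs.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split a 1-D signal into overlapping frames (assumed 25 ms window, 10 ms hop)."""
    frame_len = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)      # 160 samples at 16 kHz
    if len(signal) < frame_len:
        # Zero-pad very short signals so at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(num_frames)[:, None]
    return signal[idx]

def magnitude_spectrum(frames, n_fft=512):
    """Apply a Hamming window to each frame and return the FFT magnitude per frame."""
    window = np.hamming(frames.shape[1])
    # rfft keeps only non-negative frequencies: n_fft // 2 + 1 bins per frame.
    return np.abs(np.fft.rfft(frames * window, n=n_fft))

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr              # one second of audio
    wave = np.sin(2 * np.pi * 440 * t)  # synthetic tone standing in for speech
    frames = frame_signal(wave, sr)
    spec = magnitude_spectrum(frames)
    peak_hz = spec[0].argmax() * sr / 512  # frequency of the strongest bin in frame 0
    print(frames.shape, spec.shape, peak_hz)
```

Each row of `spec` is the spectrum of one frame; for the 440 Hz tone, the largest value falls in the FFT bin nearest 440 Hz (the bin spacing here is 16000 / 512 = 31.25 Hz).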
Completing the traditional pipeline above usually requires three independent models:
1. Acoustic model: recognizes each frame as a state; three states are then combined to form a phoneme.
2. Pronunciation model (lexicon): combines phonemes into the pronunciation of the corresponding word.
3. Language model: predicts the most likely text for the word pronunciations by scoring how plausible each word sequence is, which, for example, helps choose among homophones.
These three models are trained independently, which makes training complicated and raises the barrier to entry for speech recognition.
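The division of labor between the pronunciation model and the language model can be illustrated with a toy decoder. In this sketch the lexicon, the bigram probabilities, and the English homophone example are all hypothetical, and the acoustic model is assumed to have already produced one pronunciation per word, so acoustic scores are ignored. A real decoder combines acoustic and language-model scores in a search over much larger models.

```python
import math

# Hypothetical pronunciation lexicon: phoneme string -> candidate words (homophones).
LEXICON = {
    "AY": ["I", "eye"],
    "R AY T": ["write", "right"],
    "T UW": ["to", "two", "too"],
    "Y UW": ["you", "ewe"],
}

# Hypothetical bigram language model P(word | previous word); "<s>" marks sentence start.
BIGRAM = {
    ("<s>", "I"): 0.5, ("<s>", "eye"): 0.01,
    ("I", "write"): 0.2, ("I", "right"): 0.01, ("eye", "right"): 0.05,
    ("write", "to"): 0.3, ("right", "to"): 0.1, ("write", "too"): 0.01,
    ("to", "you"): 0.4, ("two", "you"): 0.01, ("too", "you"): 0.02,
}
FLOOR = 1e-4  # probability given to bigrams not listed above

def log_p(prev, word):
    return math.log(BIGRAM.get((prev, word), FLOOR))

def decode(pronunciations):
    """Viterbi search over homophones: keep the best-scoring sequence ending in each word."""
    best = {"<s>": (0.0, [])}  # word -> (log probability, word sequence)
    for pron in pronunciations:
        new_best = {}
        for word in LEXICON[pron]:
            new_best[word] = max(
                (score + log_p(prev, word), seq + [word])
                for prev, (score, seq) in best.items()
            )
        best = new_best
    return max(best.values())[1]

if __name__ == "__main__":
    print(decode(["AY", "R AY T", "T UW", "Y UW"]))  # ['I', 'write', 'to', 'you']
```

With these made-up probabilities the decoder picks "I write to you" rather than sequences such as "eye right two ewe", because the language model rates that word sequence as far more likely.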
2. End-to-end speech recognition
In recent years, thanks to advances in neural networks, improvements in software and hardware, and the availability of large speech corpora, end-to-end systems have emerged. To simplify the pipeline, such a system converts speech directly into text within a single model, hence the name end-to-end. The general idea is to use one jointly optimized model, which simplifies training: the model's input is speech and its output is the corresponding text, where the output units can be letters, subwords, or words. The main approaches to end-to-end speech recognition are based on CTC (Connectionist Temporal Classification), RNNs, and attention mechanisms.
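As one example of how CTC turns frame-level outputs into text, here is a minimal sketch of greedy (best-path) CTC decoding: take the most likely label in each frame, merge consecutive repeats, then remove blanks. The blank index 0, the toy character vocabulary, and the one-hot scores are assumptions for illustration; practical systems often use beam search, possibly with a language model, instead of greedy decoding.

```python
import numpy as np

BLANK = 0  # index of the CTC blank label (assumed to be 0 here)

def ctc_greedy_decode(scores, vocab):
    """Greedy (best-path) CTC decoding.

    scores: array of shape (num_frames, vocab_size), e.g. per-frame log-probabilities.
    vocab:  list mapping label index -> output token; vocab[BLANK] is never emitted.
    """
    best_path = scores.argmax(axis=1).tolist()  # most likely label in each frame
    tokens = []
    prev = None
    for label in best_path:
        # Emit a label only if it differs from the previous frame's label
        # (merging repeats) and is not the blank.
        if label != prev and label != BLANK:
            tokens.append(vocab[label])
        prev = label
    return "".join(tokens)

if __name__ == "__main__":
    vocab = ["<blank>", "c", "a", "t"]
    # Hypothetical frame-level path c c _ a _ _ t t, with one-hot rows as stand-in scores.
    path = [1, 1, 0, 2, 0, 0, 3, 3]
    scores = np.eye(len(vocab))[path]
    print(ctc_greedy_decode(scores, vocab))  # cat
```

Because only consecutive repeats are merged, a blank between two identical labels keeps both of them: the path `c a a _ a` decodes to "caa".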
My next tasks are to read the code of the front-end beamformer, learn how to prepare multi-channel data, and build a baseline multi-channel speech recognition system.
This is my first time writing an article on CSDN, and I am not yet proficient with Markdown, so the layout is simple and the content is brief. I will keep studying and keep writing on CSDN; next time I will write about how to prepare multi-channel data.