Speech Recognition Series 1: Speech Recognition Overview
2022-07-02 23:28:00 【Mr anhydrous】

Terminology used in this article:
Voice recognition (VOICE RECOGNITION): identifying who is speaking
Speech recognition (SPEECH RECOGNITION): identifying what is being said
1 What is voice recognition?
Voice recognition, also called speaker recognition, is the ability of a program to identify a person from their unique voiceprint. It works by scanning a voice sample and matching it against the stored voiceprint of the expected speaker. Advances in artificial intelligence have opened up broad opportunities for this subfield of computer science. It lets us interact with machines without touching them. The field is developing rapidly, and developers keep finding new ways to apply it in various areas.
2 What is the difference between voice recognition and speech recognition?
It is important to understand the difference between the two disciplines. The purpose of voice recognition is to identify who is speaking. The purpose of speech recognition is to identify the words being spoken. In the first case, the program compares the speaker's unique voiceprint against stored ones. In the second case, the program needs a large vocabulary to work out what the speaker is saying.
3 Types of voice recognition systems
There are two kinds of voice recognition:
- Text-dependent: the system is trained to recognize a predetermined voice passphrase spoken by the speaker;
- Text-independent: no predetermined passphrase is required. The system analyzes conversational speech.
4 Types of speech recognition systems
Automatic speech recognition (ASR) systems can be divided into several categories. The first classification is by dependence on the speaker. Two types are known:
- Speaker-dependent: the program is trained to recognize a specific voice, much as in voice recognition. The speaker must first "talk" to the program so that it can analyze their voice. Such systems are easier to implement and recognize speech with high accuracy;
- Speaker-independent: this type of speech recognition software has a wider range of uses. It does not need to be trained on an individual voice. The focus is on recognizing the words spoken. A typical example of such a program is an interactive voice response (IVR) system.
Another classification is based on how the user speaks:
- Discrete speech recognition: ASR applications have used this method since their earliest versions. The speaker must pronounce each word separately, with a pause between words. Such programs are harder to work with, because fluent speech is difficult to maintain;
- Continuous speech recognition: this is a relatively new ASR method that takes more effort to develop. In this case, the speaker can talk at close to normal speed.
Another well-known technology in AI-based speech recognition is natural language processing (NLP). The task of a speech recognition system is to recognize words. The task of an NLP system is to understand the speaker and respond, that is, to imitate human communication between a person and a machine. NLP is closely related to voice and speech recognition but is based on different algorithms.
5 A brief history of speech recognition
The first important steps in this technology were taken by Bell Labs and IBM in the United States. In 1952, Bell Labs introduced Audrey, the first documented speech recognizer. Audrey was a fully analog system that could understand single digits spoken with pauses in between. Ten years later, IBM introduced Shoebox, which could recognize 16 English words, including the digits 0 to 9. In the early 1970s, the technology made a leap forward, largely thanks to DARPA, the research and development agency of the U.S. Department of Defense. After five years of research, Carnegie Mellon University produced Harpy, a machine that could understand 1,011 words. Harpy also differed markedly from its predecessors in that it could understand sentences. In the 1980s, the vocabulary of speech recognition systems grew to thousands of words, mainly thanks to the hidden Markov model, a statistical technique. Speech recognition moved from pattern-based digital signal processing to statistical models that predict words from unknown sounds.
Machines also became more accurate at recognizing words. In the mid-1980s, IBM's speech recognition team introduced Tangora, an experimental transcription system that could recognize 20,000 words. In the 1990s, with the spread of personal computers, speech recognition products such as DragonDictate became available to consumers. Over the past 20 years, many technology giants have worked on this technology. Their products are described later in this article.
6 How speech recognition works
Modern ASR systems are based on three models: acoustic, pronunciation, and language.
- Acoustic modeling makes it possible to map a speech signal to phonemes (units of sound). The hidden Markov model (HMM) is a common acoustic modeling method. Other methods use deep neural networks or convolutional neural networks;
- The pronunciation model defines how phonemes combine to form words;
- Language modeling helps to distinguish between words and phrases that sound the same.
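To make the role of the language model concrete, here is a minimal sketch of how word statistics can pick between words that sound the same. It assumes a tiny made-up corpus and a simple bigram model with add-one smoothing. The names `CORPUS`, `train_bigrams`, and `pick_homophone` are hypothetical and only for illustration; production ASR systems use far larger models.

```python
from collections import Counter

# Tiny hypothetical training corpus; a real language model is trained on far more text.
CORPUS = [
    "i want to go to the store",
    "i have two dogs",
    "she has two cats",
    "we want to eat",
    "it is too late",
    "that is too much",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # <s> marks the sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing so unseen pairs are not zero."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def pick_homophone(prev, candidates, unigrams, bigrams):
    """Choose the candidate that is most likely to follow the previous word."""
    return max(candidates, key=lambda w: bigram_prob(prev, w, unigrams, bigrams))

unigrams, bigrams = train_bigrams(CORPUS)
homophones = ["to", "two", "too"]  # identical pronunciation, different spelling
for prev in ["want", "have", "is"]:
    print(prev, "->", pick_homophone(prev, homophones, unigrams, bigrams))
```

The acoustic model alone cannot separate "to", "two", and "too", so the choice here rests entirely on which word the corpus has seen after the preceding word.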
After the voice is recorded, noise is removed and the useful signal is extracted from the recording. The recording is then divided into small segments, and each segment is passed through the acoustic model. The segments are compared with phoneme models, which are statistical models built in advance to describe how each sound in speech is pronounced. Based on these matches, words are assembled from phonemes. How well words are found depends largely on the size of the phoneme database prepared in advance.
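The segmentation step can be sketched roughly as follows. This is only an illustration under stated assumptions: a 16 kHz sample rate, 25 ms frames with a 10 ms hop, a synthetic signal in place of a real recording, and a crude energy threshold standing in for real noise removal. The function names and the threshold value are hypothetical.

```python
import math

SAMPLE_RATE = 16000    # samples per second (assumed)
FRAME_LEN = 400        # 25 ms at 16 kHz (assumed)
HOP_LEN = 160          # 10 ms at 16 kHz (assumed)
ENERGY_THRESHOLD = 0.01  # hypothetical cutoff between silence and sound

def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

# Synthetic stand-in for a recording: 0.1 s of silence, then 0.1 s of a 440 Hz tone.
silence = [0.0] * 1600
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(1600)]
signal = silence + tone

frames = frame_signal(signal, FRAME_LEN, HOP_LEN)
energies = [frame_energy(f) for f in frames]
active = [e > ENERGY_THRESHOLD for e in energies]  # True where the frame carries sound

print("frames:", len(frames))
print("first frame active:", active[0])
print("last frame active:", active[-1])
```

In a full system, each active frame would be converted into acoustic features and scored against the phoneme models; the silent frames would be discarded or used to mark pauses.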
6.1 Recording the voice
Every device records through a microphone. If the device does not have one, you need to connect a headset or a standalone microphone. You can record with a preinstalled application, such as Voice Recorder on Windows 10 or Voice Memos on Apple products. Many other applications offer more advanced features, such as a choice of recording quality, bit rate, or file format. Some use artificial intelligence to remove unwanted noise from the recording.
6.2 Enrollment
In every speaker recognition system, the first stage is user enrollment: the speaker's voice is recorded and a unique voiceprint is extracted. The next stage is verification. The recorded voice is compared either with a database of different voices to find the best match, or with one specific voiceprint.
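The enrollment and verification stages can be illustrated with a small sketch. It assumes each recording has already been reduced to a fixed-length feature vector (the short vectors below are invented), averages the enrollment vectors into a voiceprint, and compares voiceprints by cosine similarity. The class `VoiceprintStore`, its methods, and the 0.9 threshold are hypothetical choices, not the method of any particular product.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class VoiceprintStore:
    """Toy voiceprint database keyed by speaker ID."""

    def __init__(self):
        self._prints = {}

    def enroll(self, speaker_id, feature_vectors):
        """Store the average of several recordings as the speaker's voiceprint."""
        count = len(feature_vectors)
        dim = len(feature_vectors[0])
        self._prints[speaker_id] = [
            sum(vec[i] for vec in feature_vectors) / count for i in range(dim)
        ]

    def verify(self, speaker_id, features, threshold=0.9):
        """One-to-one check: does the sample match one specific voiceprint?"""
        return cosine_similarity(self._prints[speaker_id], features) >= threshold

    def identify(self, features):
        """One-to-many search: which enrolled voiceprint matches best?"""
        return max(
            self._prints,
            key=lambda sid: cosine_similarity(self._prints[sid], features),
        )

store = VoiceprintStore()
# Invented 3-dimensional features; real voiceprints have many more dimensions.
store.enroll("alice", [[0.9, 0.1, 0.3], [1.0, 0.2, 0.2]])
store.enroll("bob", [[0.1, 0.8, 0.7], [0.2, 0.9, 0.6]])

sample = [0.95, 0.15, 0.25]
print("best match:", store.identify(sample))
print("verified as alice:", store.verify("alice", sample))
print("verified as bob:", store.verify("bob", sample))
```

`verify` corresponds to comparing against one specific voice, while `identify` corresponds to searching the whole database for the best match.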
7 Speech recognition tools
If you don't want to build a speech recognition system from scratch, you can use one of several open-source tools, including:
- CMU Sphinx: a speaker-independent continuous speech recognition system developed at Carnegie Mellon University. CMU Sphinx comprises a set of packages designed for different purposes. It can be downloaded from GitHub, where the user documentation is also available. It supports many popular programming languages, such as C/C++, C#, Java, and Python;
- HTK toolkit: a toolkit for working with hidden Markov models. It was developed by the Machine Intelligence Laboratory at Cambridge University and is used mainly in speech recognition research. It is not fully open source. Information on using it is available on the official HTK website. The supported programming languages are C and Python;
- Kaldi: an open-source toolkit for speech recognition and signal processing. The toolkit can be downloaded from its GitHub repository, and the documentation is on the official website. The supported programming languages are C++ and Python.
8 How speech recognition is used
Thanks to the rapid development of personal computers, smartphones, and artificial intelligence, voice and speech recognition software has entered our daily lives. It lets us control our devices by talking to them. The first product worth mentioning is the virtual assistant. Google and Apple ship operating systems with built-in virtual assistants, and Microsoft has added its virtual assistant Cortana to Windows. Smart speakers integrate virtual assistants as well; examples include the Amazon Echo with Alexa and the Apple HomePod running Siri. Speech recognition is also used in call center IVR systems and in medical devices, and voice biometrics is used in security systems. Wherever people need to interact with machines, this technology is helpful.
9 Why is speech recognition useful?
Speech recognition technology makes users more productive. It captures speech much faster than we can type. When your hands are busy with other work, you can talk to your device and do two things at once. For people with disabilities who cannot use their hands, this is essential. Voice recognition also adds an extra layer of security, because a unique voiceprint is not easy to forge.
10 Advantages and disadvantages of speech recognition
Speech recognition is a relatively new science. It has grown from simple programs that could recognize a few dozen words in a single language into complex systems based on artificial intelligence. Over the decades it has made great progress and has begun to tackle broader tasks. Even so, much work remains to improve it. Its advantages and disadvantages are summarized below.
10.1 Advantages of speech recognition
- Improves the productivity of enterprises;
- Automates interaction between enterprises and their customers;
- Adds an extra level of security;
- Captures speech faster than a person can type;
- Helps people with disabilities;
- Helps control home devices;
- Assists drivers through in-vehicle ASR systems, and so on.
10.2 Disadvantages of speech recognition
- If the speaker talks quickly or indistinctly, the system will not fully recognize the speech;
- A large vocabulary is needed to improve recognition accuracy;
- Each language requires separate ASR training;
- Enterprises may collect and use users' voice data without their permission;
- High costs in time and money;
- ASR software is resource-intensive and requires a lot of RAM.
11 Applications of speech recognition technology
We have mentioned how widely speech recognition systems are used. Here are its applications in specific areas.
11.1 Health care
In medicine, speech recognition is used mainly to write patient documentation. There are two different documentation workflows:
- Front-end documentation transcribes speech into text in real time. The system is more likely to make mistakes in this mode, and the doctor must correct the text, so it is best used for personal notes;
- Back-end documentation works the same way, but the speaker's recording is also attached to the text. The system produces a draft, which the doctor can then correct.
11.2 Military
In this field, speech recognition is used mainly for the command and control of machines and equipment. Voice commands are much faster, which can be decisive in combat.
11.3 Education
Students can check their pronunciation while learning a language. Speech recognition can help them avoid grammar and punctuation mistakes. Writing long texts becomes less of a challenge, because students can dictate them without tiring.
11.4 People with disabilities
Students with physical disabilities or visual impairments can write without restrictions. ASR lets them keep up with their studies.
11.5 In-vehicle systems
Speech recognition in cars reduces the risk of road accidents. Operations such as dialing a number or using an MP3 player or radio no longer require taking your hands off the steering wheel.
11.6 Voice-controlled video games
Speech recognition can make games easier to learn. Instead of spending time memorizing the game controls, players can use voice commands.
12 Speech recognition (virtual assistant) software
Virtual assistant systems are complex and expensive to build, so solutions from the technology giants dominate the market. Let's get to know them.
APPLE'S SIRI
This personal assistant is available only to Apple users. It first appeared in the iPhone 4S and has become an integral part of new Apple products. Siri can post to Twitter or Facebook, solve complex math problems, save notes, make reservations, and more.
AMAZON ALEXA
Amazon ships smart speakers with Alexa, which made its debut in 2014. Unlike Siri, it can be integrated into third-party devices. It supports voice interaction, manages online shopping and music playback, and can control multiple smart devices.
MICROSOFT'S CORTANA
Cortana is a virtual assistant released by Microsoft in 2014. It is aimed mainly at Windows users but is also available to Android and iOS users. Cortana lets you manage calendars, join meetings in Microsoft Teams, set reminders, and open applications on your computer.
GOOGLE ASSISTANT
Google started its journey toward a virtual assistant with Google Now, a feature of Google Search that let users search for information by voice. A few years later, Google stopped developing that project and, in 2016, released Google Assistant. It was initially integrated into the Google Home smart speaker and Google Pixel smartphones.
NUANCE'S DRAGON ASSISTANT AND DRAGON NATURALLYSPEAKING
Dragon NaturallySpeaking is speech recognition software developed by Nuance Communications. Earlier in this article we mentioned the DragonDictate application. Over the years it has been improved, and it is now called Dragon NaturallySpeaking. The company also offers Dragon Assistant, a personal assistant for PCs.
13 Does speech recognition require training?
Using a speech recognition system does not require lengthy training. Plenty of information on how to enable and use these systems is available online, on manufacturers' official websites and on other platforms. Here are some useful resources:
- Apple's guide to using Voice Control on a Mac, and a YouTube video on the topic;
- An article on using voice control on Windows, and a YouTube video on the topic;
- Nuance Communications' online university for its products.
14 Future uses of speech recognition technology
The future of speech recognition is very promising. ASR systems will be able not only to recognize words but also to identify a person's emotions. Speech recognition will be applied in aerospace, home automation, robotics, telematics, and video games.
Reference article:
What is Voice Recognition? Voice & Speech Recognition Overview — RecFaces