Speech recognition Series 1: speech recognition overview

2022-07-02 23:28:00 【Mr anhydrous】

Terminology:

Voice recognition (speaker recognition): identifying who is speaking.

Speech recognition: identifying what is being said.

1 What is voice recognition?

Voice recognition, or speaker recognition, is the ability of a program to identify people by their unique voiceprints. It works by scanning a voice and matching it against the expected voiceprint. The development of artificial intelligence has opened up broad opportunities for this subfield of computer science. It lets us interact with machines without touching them. The field is developing rapidly, and developers keep finding new ways to apply it in various areas.

2 What is the difference between voice recognition and speech recognition?

It is important to understand the difference between the two disciplines. The purpose of voice recognition is to identify who is speaking. The purpose of speech recognition is to identify the words being spoken. In the first case, the program needs to compare the speaker's unique voiceprint. In the second case, the program needs a large dictionary to work out what the speaker is saying.

3 Types of voice recognition systems

There are two kinds of voice recognition:

Text-dependent: the system is trained to recognize a predetermined voice passphrase from the speaker;
Text-independent: no predetermined passphrase is required. The system analyzes conversational speech.

4 Types of speech recognition systems

Automatic speech recognition (ASR) can be divided into several categories. The first classification depends on the speaker. Two types are known:

Speaker-dependent: the program is trained to recognize a specific voice, similar to voice recognition. The speaker must "talk" to the program so that it can analyze the voice. Such systems are easier to implement and provide high recognition accuracy;
Speaker-independent: this type of speech recognition software has a wider range of uses. It needs no training on a particular voice. The focus is on recognizing the speaker's words. A typical example of such a program is an IVR system.

Another classification is based on the way users speak. The categories are:

Discrete speech recognition: ASR applications have used this method since their early versions. The speaker must pronounce each word separately, with a pause between words. Such programs are harder to work with, because natural, fluent speech is difficult to maintain;
Continuous speech recognition: this is a relatively new ASR method that takes more effort to develop. In this case, the speaker can talk at close to normal speed.
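To illustrate why discrete recognition relies on pauses, here is a minimal sketch of how a program might split a recording into words at silent gaps. It assumes the audio has already been reduced to one energy value per short frame. The function name, the threshold, and the minimum pause length are hypothetical choices for illustration, not taken from any particular ASR product.

```python
def split_on_pauses(energies, threshold=0.1, min_pause=3):
    """Return (start, end) frame index pairs for stretches of speech.

    A word ends once at least `min_pause` consecutive frames fall below
    `threshold`. `end` is exclusive. Shorter dips are kept inside the word.
    """
    segments = []
    start = None   # frame index where the current word began
    quiet = 0      # consecutive quiet frames seen inside the current word
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause:
                # The pause is long enough: close the word before the pause.
                segments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # The recording ended while a word was still open.
        segments.append((start, len(energies) - quiet))
    return segments


if __name__ == "__main__":
    frame_energies = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0, 0.7, 0.8, 0, 0.9, 0, 0]
    print(split_on_pauses(frame_energies))
```

Continuous speech offers no such reliable gaps between words, which is one reason it is harder to handle.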

Another well-known technology in the field of AI speech recognition is natural language processing (NLP). The task of a speech recognition system is to understand words. The task of an NLP system is to understand the speaker and respond, that is, to imitate human communication with a machine. NLP is close to voice and speech recognition but is based on different algorithms.

5 A brief history of speech recognition

The first important steps in this technology were taken at IBM and Bell Labs in the United States. In 1952, Bell Labs introduced Audrey, the first documented speech recognizer. Audrey was a fully analog system that could recognize single digits spoken with pauses in between. Ten years later, IBM introduced Shoebox, which could recognize 16 English words, including the digits 0 to 9. In the early 1970s, the technology took a leap forward, largely thanks to DARPA, the research agency of the U.S. Department of Defense. After five years of research, Carnegie Mellon University produced Harpy, a machine that could understand 1011 words. Harpy also differed from its predecessors in that it could understand sentences. In the 1980s, the vocabulary of speech recognition systems grew to thousands of words, mainly thanks to hidden Markov models. Speech recognition moved from pattern-based digital signal processing to statistical models that predict words from unknown sounds.

Machines also became more accurate at recognizing words. In the mid-1980s, IBM's speech recognition team introduced Tangora, an experimental transcription system that could recognize 20,000 words. In 1990, with the help of the personal computer, consumers began to use speech recognition products such as DragonDictate. Over the past 20 years, many technology giants have worked on this technology. Later in this article, you will become familiar with their products.

6 How speech recognition works

Modern ASR systems are based on three models: acoustic, pronunciation, and language.

Acoustic modeling makes it possible to map the speech signal to phonemes (units of sound). The hidden Markov model (HMM) is a common acoustic modeling method. Other methods use deep neural networks or convolutional neural networks;
The pronunciation model defines how phonemes are combined to make words;
Language modeling helps to distinguish words and phrases that sound the same.
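The role of the language model can be sketched with a toy bigram counter: given two words that sound alike, it prefers the one that more often followed the previous word in some training text. This is only an illustration under simple assumptions. The tiny corpus and the function names are made up for the example, and real systems use far larger models.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count how often each word follows another in the training text."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_word(previous, candidates, bigrams):
    """Choose the candidate seen most often after `previous`.

    Ties go to the first candidate in the list.
    """
    return max(candidates, key=lambda word: bigrams[(previous, word)])


if __name__ == "__main__":
    corpus = [
        "turn right at the corner",
        "turn right now",
        "please write a letter",
        "write a note",
    ]
    bigrams = train_bigrams(corpus)
    # "right" and "write" sound the same; the context decides between them.
    print(pick_word("turn", ["right", "write"], bigrams))
    print(pick_word("please", ["right", "write"], bigrams))
```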

After the voice is recorded, noise is removed and the useful signal is extracted from the recording. The recording is then divided into short segments, and each segment is passed through the acoustic model. The segments are compared with phoneme models, which are statistical models built in advance to describe how each sound in speech is pronounced. Based on these matches, words are assembled from the phonemes. How well words are found depends largely on the size of the phoneme database prepared in advance.
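The step of dividing a recording into small pieces can be sketched as follows. The 25 ms frame length and 10 ms step are values commonly used in speech processing and are assumed here for illustration. The article itself does not specify them, and the function name is hypothetical.

```python
def frame_signal(samples, sample_rate, frame_ms=25, step_ms=10):
    """Split a list of audio samples into short, overlapping frames.

    Each frame covers `frame_ms` milliseconds and starts `step_ms`
    milliseconds after the previous one. A trailing partial frame is dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    step = int(sample_rate * step_ms / 1000)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, step):
        frames.append(samples[start:start + frame_len])
    return frames


if __name__ == "__main__":
    one_second = [0.0] * 16000           # one second of silence at 16 kHz
    frames = frame_signal(one_second, 16000)
    print(len(frames), len(frames[0]))   # number of frames, samples per frame
```

Each frame would then be turned into features and scored by the acoustic model.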

6.1 Record your voice

Every device records through a microphone. If the device does not have one, you need to connect a headset or a professional microphone. To record, you can use pre-installed applications, for example Voice Recorder on Windows 10 or Voice Memos on Apple products. There are also many applications with advanced features. They let you choose the recording quality, the bit rate, or the file format. Some are based on artificial intelligence and can help remove unwanted noise from the recording.

6.2 Enrollment

Enrollment is the first stage of any speaker recognition software: the speaker's voice is recorded and a unique voiceprint is extracted. The next stage is verification. The recorded voice is compared either with a database of different voices to find the best match, or with one specific voiceprint.
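The two kinds of comparison described above can be sketched with cosine similarity between voiceprint vectors. This assumes each voiceprint has already been reduced to a fixed-length list of numbers. The vectors, names, and the 0.9 threshold below are invented for illustration, and real systems derive voiceprints with far more elaborate models.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length, non-zero vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(sample, enrolled):
    """Return (name, score) of the enrolled voiceprint closest to `sample`."""
    scored = ((name, cosine_similarity(sample, voiceprint))
              for name, voiceprint in enrolled.items())
    return max(scored, key=lambda pair: pair[1])


def verify(sample, voiceprint, threshold=0.9):
    """Accept the speaker only if the sample is close enough to one voiceprint."""
    return cosine_similarity(sample, voiceprint) >= threshold


if __name__ == "__main__":
    # Hypothetical voiceprints stored at enrollment time.
    enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.3]}
    sample = [0.9, 0.1, 0.2]
    print(identify(sample, enrolled))
    print(verify(sample, enrolled["bob"]))
```

The threshold trades off false accepts against false rejects, so it would need tuning on real data.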

7 Speech recognition tools

If you don't want to build a speech recognition system from scratch, you can use various open source tools. Among them are:

CMU Sphinx: a speaker-independent continuous speech recognition system developed at Carnegie Mellon University. CMU Sphinx includes a set of products designed for different purposes. It can be downloaded from GitHub, where you can also find the user documentation. It supports many popular programming languages, such as C/C++, C#, Java, and Python;
HTK toolkit: a toolkit for working with hidden Markov models. It was developed by the Machine Intelligence Laboratory at Cambridge University and is mainly used in speech recognition research. It is not completely open source. Information about using the product can be found on the official HTK website. The supported programming languages are C and Python;
Kaldi: an open source toolkit for speech recognition and signal processing. The toolkit itself can be downloaded from its GitHub repository, and the documentation can be found on the official website. The supported programming languages are C++ and Python.

8 How to use speech recognition

Thanks to the rapid development of personal computers, smartphones, and artificial intelligence, voice and speech recognition software has entered our daily lives. It lets us control our devices by talking to them. The first product worth mentioning is the virtual assistant. Google and Apple ship operating systems with built-in virtual assistants. Microsoft has added its virtual assistant Cortana to Windows. Smart speakers integrate virtual assistants. Examples of such devices include the Amazon Echo with Alexa built in and the Apple HomePod running Siri. Speech recognition is used in call center IVR systems and in medical devices. Voice biometrics are used in security systems. Wherever people need to interact with machines, this technology can be very helpful.

9 Why is speech recognition good?

Speech recognition technology makes users more efficient. It captures speech much faster than we can type. In addition, when your hands are busy with other work, you can talk to your device and do two things at once. For people with disabilities who cannot use their hands, this is essential. Voice recognition also adds an extra layer of security, because a unique voiceprint is not easy to forge.

10 Advantages and disadvantages of speech recognition

Speech recognition is a relatively new science. It has developed from simple programs that could recognize dozens of words in a single language into complex systems based on artificial intelligence. Over the decades, it has made great progress and has begun to solve broader tasks. Even so, there is still a lot of work to be done to improve it. Let's summarize its advantages and disadvantages.

10.1 Advantages of speech recognition

Improves the productivity of enterprises;
Automates interaction between enterprises and customers;
Adds an extra level of security;
Captures speech faster than a person can type;
Helps people with disabilities;
Helps control home devices;
Assists drivers through in-vehicle ASR systems, and so on.

10.2 Disadvantages of speech recognition

If the speaker talks quickly or unclearly, the system may not fully recognize the speech;
A large vocabulary is needed to improve recognition accuracy;
Each language requires separate ASR training;
Enterprises can collect and use users' voice data without their permission;
High time and financial costs;
ASR software is resource-intensive and requires a large amount of RAM.

11 Application of speech recognition technology

We have talked about the widespread use of speech recognition systems. Let's see what applications it has in specific areas.

11.1 Health care

In medicine, speech recognition is mainly used to write patient documentation. There are two different documentation processes.

Front-end documentation is the process of converting speech to text in real time. In this case the system is more likely to make mistakes, and the doctor must correct the text, so it is best suited to personal notes;
Back-end documentation does the same, but also attaches the speaker's recording to the text. The system produces a draft of the text so that the doctor can fix mistakes.

11.2 Military

In this field, speech recognition is mainly used for the command and control of machines and equipment. Voice commands are much faster. In combat, this can play a key role in winning the battle.

11.3 Educational use

Students can check their pronunciation while learning a language. Speech recognition can help them avoid grammar and punctuation mistakes. Writing long texts becomes less of a challenge, because students can dictate large amounts of text without getting tired.

11.4 People with disabilities

Students with disabilities or blind people can write without restrictions. ASR lets them keep up with their studies.

11.5 In-vehicle systems

Speech recognition in cars reduces the risk of accidents on the road. Operations such as dialing a number or using the MP3 player or radio no longer require taking your hands off the steering wheel.

11.6 Voice-controlled video games

Speech recognition can make games easier to learn. Players need time to memorize the game controls. With voice commands, they can skip that step.

12 Different speech recognition (virtual assistant) software

Virtual assistant systems are quite complex and expensive. Solutions from the technology giants dominate the market. Let's get to know them.

APPLE'S SIRI

This personal assistant is available only to Apple users. It first appeared in the iPhone 4S and has become an integral part of new Apple products. Siri can post on Twitter or Facebook, solve complex mathematical problems, save notes, make reservations, and more.

AMAZON ALEXA

Amazon ships smart speakers with Alexa, which made its debut in 2014. Unlike Siri, it can be integrated into third-party devices. It supports voice interaction and can manage online shopping and music playback. It can also control multiple smart devices.

MICROSOFT'S CORTANA

Cortana is a virtual assistant released by Microsoft in 2014. It is aimed mainly at users of the Windows operating system, but is also available to Android and iOS users. Cortana lets you manage calendars, join meetings in Microsoft Teams, set reminders, and open applications on your computer.

GOOGLE ASSISTANT

Google started its journey toward a virtual assistant with Google Now, a feature of Google Search that let users search for information by voice. A few years later, Google stopped developing the project and released Google Assistant in 2016. It was initially integrated into the Google Home smart speaker and Google Pixel smartphones.

NUANCE'S DRAGON ASSISTANT AND DRAGON NATURALLYSPEAKING

Dragon NaturallySpeaking is speech recognition software developed by Nuance Communications. Earlier in this article, we mentioned the DragonDictate application. Over the years it has been improved, and it is now called Dragon NaturallySpeaking. The company also offers Dragon Assistant, a personal assistant for personal computers.

13 Does speech recognition need training?

You do not need a long training course to use a speech recognition system. There is plenty of information online about how to enable and use these systems. It can be found on the manufacturers' official websites or on other platforms. Here are some useful resources.

Apple's guide to using Voice Control on a Mac, and a YouTube video on the topic;
An article on using voice control in Windows, and a YouTube video on the topic;
The online university for Nuance Communications products.

14 Future uses of speech recognition technology

The future of speech recognition is very promising. ASR systems will be able not only to recognize words but also to identify a person's emotions. Speech recognition will be applied in aerospace, home automation, robotics, telematics, and video games.