Xiaodu Xiaodu is here!

2022-08-05 06:15:00 【Chengyun Technology】

What is Intelligent Speech Recognition?

Simply put, intelligent speech recognition is the process of converting human voice signals into text.

The technologies we commonly encounter, such as speech recognition, face recognition, and OCR, all belong to perceptual intelligence within artificial intelligence. Their core function is to transform information from the physical world into information that a computer can process, providing the foundation for subsequent cognitive intelligence.

The hierarchy of needs that speech recognition can meet

01 Information synchronization between people

Once voice information is converted into text, it is no longer bound to the time axis, so the same amount of information can be taken in by eye much faster than by ear.

02 Search & Semantic Extraction

Semantic modeling is used to retrieve the words or meanings that matter most in a given business scenario, or to extract them and record them in a structured form.

03 Human-Machine Interaction

Users interact with machines or virtual assistants in a more natural way: holding human-like conversations, controlling devices, or getting answers to questions.

04 Data Mining

By clustering the data, or by linking it with data systems that cover other dimensions, value can be mined from the semantic data of individuals, groups, or specific fields.
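The search and extraction need (02) above can be illustrated with a minimal sketch. It assumes the speech has already been converted to text, and it swaps the semantic modeling the article mentions for a plain keyword lookup. The keyword table, the categories, and the `Record` type are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical business keywords for a customer-service scenario
# (assumed for illustration, not taken from the article).
KEYWORDS = {
    "refund": "billing",
    "invoice": "billing",
    "delivery": "logistics",
    "complaint": "service",
}


@dataclass
class Record:
    """Structured record of the keywords found in one transcript."""
    transcript: str
    hits: dict = field(default_factory=dict)  # category -> matched keywords


def extract(transcript: str) -> Record:
    """Look up each word of the transcript in the keyword table."""
    record = Record(transcript)
    # Split the lowercased transcript into plain words.
    for word in re.findall(r"[a-z']+", transcript.lower()):
        category = KEYWORDS.get(word)
        if category is not None:
            record.hits.setdefault(category, []).append(word)
    return record


record = extract("I want a refund because the delivery was late and the invoice is wrong")
print(record.hits)
```

A production system would more likely match on meaning rather than exact words, but the output shape (a transcript plus structured fields) is the same idea.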

Closed-Domain Recognition

1. Definition:

The recognition range is a pre-specified set of characters/words.

The algorithm recognizes speech only within the closed-domain vocabulary preset by the developer and rejects speech outside that range.

2. Product form:

Streaming upload - synchronous acquisition.

3. Typical application scenarios:

Scenarios that do not involve multi-turn interaction or many different ways of phrasing the same meaning.

For example, smart home devices and TV boxes use simple command interaction, where the voice control commands are generally limited to phrases such as "open the curtains" or "turn on the central TV channel".
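The accept-or-reject behavior of closed-domain recognition can be sketched as follows. This is only a text-level illustration: it assumes a recognizer has already produced a text hypothesis, whereas a real closed-domain engine typically constrains recognition itself to the preset set. The command list, the `THRESHOLD` value, and the function name are assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical preset command set defined by the developer.
COMMANDS = ["open the curtains", "close the curtains", "turn on the tv"]
# Assumed similarity cut-off; anything below it is rejected.
THRESHOLD = 0.8


def recognize_closed_domain(hypothesis: str) -> Optional[str]:
    """Return the closest preset command, or None to reject the input."""
    text = hypothesis.lower().strip()
    best_command, best_score = None, 0.0
    for command in COMMANDS:
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, text, command).ratio()
        if score > best_score:
            best_command, best_score = command, score
    return best_command if best_score >= THRESHOLD else None


print(recognize_closed_domain("Open the curtains"))
print(recognize_closed_domain("what is the weather tomorrow"))
```

The first call matches a preset command; the second falls outside the set and is rejected with `None`.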

Open-Domain Recognition

1. Definition

There is no need to specify a set of recognized words in advance; the algorithm recognizes speech across the full vocabulary of the language.

2. Product form

1. Streaming upload - synchronous acquisition

The application/software automatically records the speaker's voice and uploads it to the cloud continuously, and the speaker sees the returned text in real time.

2. Recorded audio file upload - asynchronous acquisition

The audio duration is generally under 3 or 5 hours. The user calls a software interface or hardware platform to pre-record the audio in the specified format, then uploads it through the interface provided by the voice cloud service provider. Once the upload is complete, the connection can be closed. The user obtains the result by polling the voice cloud server or through a callback interface.

3. Recorded audio file upload - synchronous acquisition

The audio duration is generally less than 1 minute. The user pre-records the audio in the specified format and uploads it through the interface provided by the voice cloud service provider.
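The asynchronous form described above (upload, disconnect, then poll for the result) can be sketched as follows. No real provider API is used: `FakeSpeechService`, its `upload` and `query` methods, and the status strings are hypothetical stand-ins that only imitate the flow in memory.

```python
import itertools
import time


class FakeSpeechService:
    """In-memory stand-in for a voice cloud service (hypothetical, not a real API)."""

    def __init__(self, polls_until_done: int = 3):
        self._ids = itertools.count(1)
        self._tasks = {}
        self._polls_until_done = polls_until_done

    def upload(self, audio: bytes) -> str:
        """Accept the audio and return a task id; the caller may then disconnect."""
        task_id = f"task-{next(self._ids)}"
        self._tasks[task_id] = {"polls": 0, "size": len(audio)}
        return task_id

    def query(self, task_id: str) -> dict:
        """Report 'processing' until the task has been polled enough times."""
        task = self._tasks[task_id]
        task["polls"] += 1
        if task["polls"] < self._polls_until_done:
            return {"status": "processing"}
        return {"status": "done", "text": f"<transcript of {task['size']} bytes>"}


def transcribe_async(service, audio: bytes, interval: float = 0.01, max_polls: int = 10) -> str:
    """Upload once, then poll at a fixed interval until the result is ready."""
    task_id = service.upload(audio)
    for _ in range(max_polls):
        result = service.query(task_id)
        if result["status"] == "done":
            return result["text"]
        time.sleep(interval)
    raise TimeoutError(f"{task_id} not finished after {max_polls} polls")


print(transcribe_async(FakeSpeechService(), b"\x00" * 16))
```

With a callback interface, the polling loop would be replaced by the service calling an endpoint the user supplies once the result is ready.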

3. Typical application scenarios

1. Mainly input scenarios, such as input methods and real-time subtitles for conferences or court trials.

2. Adding subtitles to already-recorded audio/video, as well as customer service voice quality inspection and UGC voice content review, where real-time requirements are low.

3. A supplement to the first two, suited to scenarios where an audio recording interface cannot be used to upload a real-time audio stream, or where the real-time requirements for obtaining results are relatively high.

Original site

Copyright notice
This article was written by [Chengyun Technology]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/217/202208050514264964.html
