Xiaodu Xiaodu is here!
2022-08-05 06:15:00 【Chengyun Technology】
What is intelligent speech recognition?
Simply put, intelligent speech recognition is the process of converting human voice signals into text.
The technologies we usually come into contact with, such as speech recognition, face recognition, and OCR, all belong to the perceptual intelligence branch of artificial intelligence.
Their core function is to transform information from the physical world into information a computer can process, providing the foundation for subsequent cognitive intelligence.
The hierarchy of needs that speech recognition can meet
01 Information synchronization between people
Once voice information is converted into text, it is no longer bound to the time axis of the audio, so the same amount of information can be read by eye much faster than it can be listened to.
02 Search & semantic extraction
Semantic modeling is used to retrieve the words or meanings that matter most in a given business scenario, or to extract them and record them in a structured form.
03 Human-machine interaction
Users interact with machines or virtual assistants in a more natural way: holding human-like conversations, controlling devices, or getting answers to questions.
04 Data mining
By clustering the data, or by linking it with data systems covering other dimensions, value can be mined from the semantic data of individuals, groups, or specific fields.
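The "search & semantic extraction" level can be illustrated with a minimal sketch. It assumes the transcript is already available as plain text and swaps in simple case-insensitive keyword matching for real semantic modeling; the `Hit` record and the `extract_keywords` function are hypothetical names introduced only for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    keyword: str   # the business keyword that was found
    start: int     # character offset of the match in the transcript
    context: str   # surrounding text, kept for later review


def extract_keywords(transcript: str, keywords: list[str], window: int = 20) -> list[Hit]:
    """Find each keyword in the transcript and return structured records.

    This is plain substring matching, not semantic modeling; it only
    shows how recognized text can be turned into structured data.
    """
    hits = []
    for kw in keywords:
        for m in re.finditer(re.escape(kw), transcript, flags=re.IGNORECASE):
            lo = max(0, m.start() - window)
            hi = min(len(transcript), m.end() + window)
            hits.append(Hit(kw, m.start(), transcript[lo:hi]))
    # Order the records by where they occur in the transcript.
    return sorted(hits, key=lambda h: h.start)
```

In a customer service setting, for instance, calling `extract_keywords(transcript, ["refund", "late"])` would give one record per mention, which could then be stored alongside the call for review.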
Closed-Domain Recognition
1. Definition
The recognition range is a pre-specified set of words/phrases.
The algorithm only performs speech recognition within the closed-domain set preset by the developer, and rejects speech that falls outside that range.
2. Product form
Streaming upload - synchronous acquisition.
3. Typical application scenarios
Scenarios that do not involve multiple rounds of interaction or a wide variety of semantic expressions.
For example, for smart home devices and TV boxes with simple command interaction, the voice control commands are generally limited to "open the curtains", "open the central station" and so on.
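The accept-or-reject behavior of closed-domain recognition can be sketched as follows. This is a simplified illustration, not how a production engine works: a real system typically constrains the acoustic decoder itself, whereas this sketch assumes a text hypothesis is already available and simply checks it against the preset set. The names `COMMANDS`, `normalize`, and `recognize_closed_domain` are hypothetical.

```python
from typing import Optional

# The closed-domain set preset by the developer (commands taken from the example above).
COMMANDS = {"open the curtains", "open the central station"}


def normalize(text: str) -> str:
    """Lowercase the text and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def recognize_closed_domain(hypothesis: str, commands: set[str] = COMMANDS) -> Optional[str]:
    """Return the matched command, or None to reject out-of-range speech."""
    candidate = normalize(hypothesis)
    return candidate if candidate in commands else None
```

Returning `None` for anything outside the set mirrors the rejection described in the definition: the device does nothing rather than guess at an unsupported command.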
Open-Domain Recognition
1. Definition
There is no need to specify a set of recognized words in advance; the algorithm recognizes speech across the entire vocabulary of the language.
2. Product form
1. Streaming upload - synchronous acquisition
The application/software will automatically record the speaker's voice and upload it to the cloud continuously, and the speaker can see the returned text in real time after speaking.
2. Recorded audio file upload - asynchronous acquisition
Audio duration is generally less than 3/5 hours. The user needs to call the software interface or the hardware platform to pre-record the audio in the specified format, and then upload it using the interface provided by the voice cloud service provider. After the upload is complete, the connection can be closed. The user obtains the result by polling the voice cloud server or through a callback interface.
3. Recorded audio file upload - synchronous acquisition
Audio duration is generally less than 1 minute. The user needs to pre-record the audio in the specified format and upload it using the interface provided by the voice cloud service provider.
3. Typical application scenarios
1. Streaming upload: mainly input scenarios, such as input methods and real-time subtitles for conferences or court trials.
2. Asynchronous file upload: adding subtitles to audio/video that has already been recorded, as well as customer service voice quality inspection and UGC voice content review, where real-time requirements are low.
3. Synchronous file upload: a supplement to the first two, suitable for scenarios where the audio recording interface cannot be used to upload a real-time audio stream, or where results must be obtained relatively quickly.
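The polling step of the asynchronous product form can be sketched as below. No specific provider API is assumed: `fetch_status` stands in for whatever call a voice cloud service offers to query a task, and the `state`, `text`, and `error` fields are hypothetical names chosen for this example.

```python
import time
from typing import Callable, Optional


def poll_for_result(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll a transcription task until it finishes, fails, or times out.

    fetch_status is a caller-supplied function that queries the service
    and returns a dict such as {"state": "running"} or
    {"state": "done", "text": "..."} (hypothetical response shape).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("state") == "done":
            return status.get("text")
        if status.get("state") == "failed":
            raise RuntimeError(status.get("error", "transcription failed"))
        sleep(interval_s)  # wait before asking again
    return None  # timed out without a final result
```

Because the connection is closed after upload, the client only needs the task reference wrapped inside `fetch_status`. A callback interface removes the loop entirely, at the cost of exposing an endpoint the service can reach.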