A new round of competition in speech recognition has begun: will natural dialogue be the next commanding height?
2022-08-03 15:32:00 【51CTO】
At present, intelligent voice companies around the world have reached roughly the same word error rate on read-style speech. As application scenarios broaden, more and more companies are increasing their R&D investment in speech recognition for natural conversation.
A market worth tens of billions of dollars
Over the years, speech recognition technology has attracted more and more attention. It is becoming a common part of everyday life, tied to computers, smartphones, and smart devices.
The rapid growth of voice-enabled devices, rising consumer demand for smart devices, and the integration of voice into in-car infotainment systems are key factors driving the growth of the speech recognition market. In addition, the increasingly frequent use of artificial intelligence in cars, healthcare, and consumer electronics has increased demand for voice devices. Meanwhile, rising demand for voice applications in smart speakers, consumer electronics, smart wearables, connected cars, smart homes, and healthcare equipment is one of the key forces driving the speech recognition market.
According to the latest report from the market research firm Meticulous Market Research, the speech recognition market will reach US$26.79 billion by 2025, growing at a compound annual growth rate (CAGR) of 17.2% from 2019 to 2025.
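As a quick sanity check on these figures, the compound growth formula final = base × (1 + CAGR)^years can be inverted to estimate the implied 2019 market size. The sketch below assumes the 17.2% CAGR applies uniformly over the six years from 2019 to 2025; the back-calculated base is simple arithmetic on the reported numbers, not a figure stated in the report, and the function name implied_base is hypothetical.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Invert final = base * (1 + cagr) ** years to recover the starting value."""
    return final_value / (1 + cagr) ** years


# Reported figures: US$26.79 billion in 2025, 17.2% CAGR starting from 2019.
# Assumption: growth compounds evenly over all six years.
base_2019 = implied_base(26.79, 0.172, 2025 - 2019)
print(f"Implied 2019 market size: about US${base_2019:.1f} billion")
```

Under that assumption the printout is about US$10.3 billion, meaning the market would grow to roughly 2.6 times its 2019 size by 2025.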
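Word error rate: the metric that matters
As is well known, the evaluation criterion most commonly used for speech recognition systems is the word error rate (WER); when it is computed over Chinese characters it is also called the character error rate. To make the recognized word sequence match the reference word sequence, some words must be substituted (Substitution), deleted (Deletion), or inserted (Insertion). The total number of substituted, deleted, and inserted words, divided by the total number of words in the reference sequence and expressed as a percentage, is the WER. The formula is:
WER = (S + D + I) / N × 100%
where S, D, and I are the numbers of substitutions, deletions, and insertions, and N is the number of words in the reference sequence.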
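The WER definition above can be computed by aligning the reference and the recognized sequence with a minimum edit distance (Levenshtein) dynamic program: the cost of the optimal alignment is exactly S + D + I. The sketch below is an illustrative implementation, not the scoring tool used in the evaluations in this article; the function name error_rate is hypothetical, and it assumes the text has already been normalized and split into tokens (words for WER, individual characters for Chinese).

```python
def error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Return (S + D + I) / N using a Levenshtein alignment over token lists.

    Pass word lists for WER, or lists of characters for a character-level
    error rate on Chinese text.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one token")
    # dp[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all i reference tokens
    for j in range(m + 1):
        dp[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match (cost 0) or substitution
                dp[i - 1][j] + 1,         # deletion of a reference token
                dp[i][j - 1] + 1,         # insertion of an extra token
            )
    return dp[n][m] / n


# Word level: one substitution (sat -> sit) and one deletion ("the").
print(f"WER = {error_rate('the cat sat on the mat'.split(), 'the cat sit on mat'.split()):.1%}")
# Character level: one substitution (气 -> 汽) and one deletion (很).
print(f"CER = {error_rate(list('今天天气很好'), list('今天天汽好')):.1%}")
```

Both lines print 33.3%, since each case has 2 errors against a 6-token reference. Note that WER is not capped at 100%: a hypothesis with many insertions can score above 1.0.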
Beyond these professional metrics, anyone who frequently uses the intelligent voice products around them can clearly perceive how well speech recognition works, and not all recognition results are satisfactory. Let us look at two cases.
Case 1: News broadcast speech recognition evaluation
Data source
All 2019 episodes of CCTV's Xinwen Lianbo (the national evening news) were crawled from CCTV's official channel on YouTube. Two episodes were taken from each of the 12 months, 24 episodes in total, and their audio was extracted, for a total duration of about 12 hours.
Scenario characteristics
Environment: mainly an enclosed studio, quiet, with no background noise, plus a small amount of on-site and outdoor interview recording.
Equipment: professional high-fidelity microphones, effectively near-field, with good sound quality.
Speakers: mainly professional news anchors, with a small number of officials, reporters, and interviewees.
Speaking style: mainly read speech at medium speed, with almost no slips of the tongue, repetitions, or pauses.
Accent and dialect: none; very standard Mandarin.
Content domain: national political news.
Evaluation results
Case 2: Deyunshe crosstalk speech recognition evaluation
Data source
Five episodes were randomly selected from the playlists of Deyunshe's official YouTube channel, about 2.5 hours in total.
Scenario characteristics
Environment: mostly a performance stage; the space is open and reverberant, with background noise (audience laughter, applause, and so on) but no background music.
Recording equipment: a stand microphone or clip-on microphone in front of the crosstalk performers, near-field.
Speakers: Deyunshe comedians such as Guo Degang, Yu Qian, and Yue Yunpeng.
Speaking style: typical crosstalk, with two performers alternating, at medium to fast speed.
Dialect: mostly Mandarin, with a few fragments of imitated dialects.
Content domain: entertainment, crosstalk routines.
Evaluation results
Why is the difference so large?
Comparing the two cases above, it is easy to see that the news broadcast scenario is very close to the ideal speech recognition scenario and can essentially represent the performance ceiling of existing Chinese speech recognition systems: a word error rate of 1% to 2%, meaning only one or two words out of every 100 are wrong.
However, most real-world scenarios are closer to the second case. Speakers' pronunciation is closer to everyday communication: their speech contains a lot of linked sounds, swallowed syllables, distorted pronunciation, and unclear articulation, as well as unconscious fillers such as "嗯, 啊, 呃" (um, ah, er), and people do not deliberately control their voice or pronunciation habits. Combined with the external environment and the influence of dialect and language, recognition of this everyday, natural conversational style of speech is far from ideal.
Expecting users to speak as clearly and standardly as a news anchor before an intelligent voice product can recognize them well is essentially impossible. Recognition performance on natural conversational speech is therefore the highest standard for judging whether a speech recognition platform is good.
Datatang's solution
Good AI needs better training data. Datatang currently has 200,000 hours of off-the-shelf speech datasets, of which nearly 40,000 hours are natural conversational speech, covering Mandarin Chinese, Chinese dialects, English, Japanese, Korean, Hindi, Vietnamese, Arabic, Spanish, French, German, Italian, and more.
Because the recording channel affects recognition rate, the Mandarin Chinese natural conversational speech data covers mobile phone, telephone, network, and other channel types.
Datatang's natural conversational speech datasets also cover the seven major Chinese dialect regions, with speakers from different regions and cities and a balanced distribution of age and gender. In terms of languages, they include Asian languages such as Japanese, Korean, Hindi, Vietnamese, and Arabic; European languages such as French, German, Italian, and Spanish; and English conversations by speakers from around the world.
During data collection, no preset script was used. Only a list of topics was provided, and speakers chose topics they were interested in and familiar with, so that the conversations would be relaxed and natural.
All audio has been strictly transcribed and quality-checked by hand, with annotations for the text content, the start and end times of valid sentences, speaker identity, and so on, and accuracy reaches 95% or higher.
Datatang's off-the-shelf conversational speech datasets have served 100 companies worldwide in their speech recognition products and have been successfully applied in scenarios such as intelligent customer service, smart meetings, and automatic video subtitle generation.
AI empowerment in the post-pandemic era
The pandemic has changed our lives and our ways of living. Application scenarios for artificial intelligence have become richer and more widely deployed in practice.
Unlike previous editions, the 2020 World Manufacturing Convention highlighted how widely artificial intelligence can empower industry. At the opening ceremony of the main forum, Volkswagen Group chairman Herbert Diess, Whirlpool global CEO Marc Bitzer, Alibaba Group chairman Daniel Zhang, and Huawei executive director Richard Yu attended and gave speeches, while iFlytek Tingjian displayed real-time bilingual captions on both sides of the main screen, providing technical support for barrier-free communication at the international meeting.
At Huawei's Ascend AI global launch event, iFlytek Tingjian provided real-time transcription of natural-style Chinese speech and multilingual subtitles in German, Russian, French, Korean, and other languages.
On February 10, 2022, Cerence announced that it will provide natural conversational speech recognition technology to Japan's Pioneer Corporation. Whatever type of car Japanese consumers drive, Pioneer's smart products can give them a safe, efficient, and personalized Japanese voice experience.
Artificial intelligence is a great historical process, and it has now entered its first stage of large-scale deployment. In the future, with the parallel development of technologies such as 5G, increasingly rich speech recognition applications will also promote barrier-free communication across languages, skin colors, and regions.
Appendix: list of Datatang natural conversational speech datasets: