
A new round of competition in speech recognition has begun. Will natural dialogue be the next commanding height?

2022-08-03 15:32:00 【51CTO】

At present, the word error rates that intelligent voice companies worldwide achieve on read-style speech have largely converged to the same level. As the gaps between application scenarios widen, more and more companies are stepping up R&D in natural conversational speech recognition.

A huge trillion-scale market

Over the years, speech recognition technology has drawn more and more attention. Through computers, smartphones, and smart devices, it is becoming a common part of everyday life.


The rapid growth of voice-enabled devices, rising consumer demand for smart devices, and the integration of in-car infotainment systems are key factors driving the growth of the speech recognition market. In addition, the increasingly frequent use of artificial intelligence in automobiles, health care, and consumer electronics has increased demand for voice-enabled devices. At the same time, rising demand for voice applications in smart speakers, consumer electronics, smart wearables, connected cars, smart homes, and health care equipment is another key driver of the speech recognition market.

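According to the latest report from the market research firm Meticulous Market Research, the speech recognition market is forecast to reach USD 26.79 billion by 2025, growing at a compound annual growth rate of 17.2% from 2019 to 2025.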

Word error rates that won't come down

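As is well known, the standard evaluation metric for speech recognition systems is the word error rate (WER); in Chinese contexts it is also called the character error rate. To make the recognized word sequence match the reference word sequence, certain words must be substituted, deleted, or inserted. The total number of these inserted, substituted, and deleted words, divided by the total number of words in the reference sequence and expressed as a percentage, is the WER. The formula is as follows: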

WER = (S + D + I) / N × 100%, where S, D, and I are the numbers of substituted, deleted, and inserted words, and N is the total number of words in the reference sequence.
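The WER definition above can be sketched in a few lines of Python. This is only an illustrative sketch, not the evaluation tool used for the cases in this article: the function name `wer` and the choice of tokens are assumptions. It aligns the reference and the recognized sequence with a standard edit-distance dynamic program and counts substitutions, deletions, and insertions along the way. For Chinese, the tokens would typically be individual characters rather than space-separated words.

```python
from typing import Sequence, Tuple


def wer(reference: Sequence[str], hypothesis: Sequence[str]) -> Tuple[float, int, int, int]:
    """Return (error_rate, substitutions, deletions, insertions).

    error_rate is (S + D + I) / N, where N is the reference length.
    `wer` is an illustrative name, not part of any particular toolkit.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one token")

    # dp[i][j] = (total_edits, S, D, I) for aligning reference[:i] with hypothesis[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)      # delete every reference token
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)      # insert every hypothesis token

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if reference[i - 1] == hypothesis[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]          # match, no cost
                continue
            c, s, d, ins = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, ins)
            c, s, d, ins = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            insertion = (c + 1, s, d, ins + 1)
            # Tuples compare by total edit count first, so min() keeps a cheapest alignment.
            dp[i][j] = min(substitution, deletion, insertion)

    total, s, d, ins = dp[n][m]
    return total / n, s, d, ins


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    rate, s, d, ins = wer(ref, hyp)
    print(f"WER = {rate:.1%} (S={s}, D={d}, I={ins}, N={len(ref)})")
```

Passing `list(text)` instead of `text.split()` scores the same function per character, which is the usual choice for Chinese transcripts. Note that the rate can exceed 100% when the hypothesis contains many insertions.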

Beyond this professional metric, frequent use of the intelligent voice products around us also gives a clear sense of how well speech recognition works, and not every recognition result is satisfactory. Let us look at two cases.

Case 1: Speech recognition evaluation on news broadcasts


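Data source

All 2019 episodes of Xinwen Lianbo (CCTV's evening news program) were crawled from CCTV's official YouTube channel. Two episodes were sampled from each of the 12 months, for 24 episodes in total, and the audio was extracted, giving a total duration of about 12 hours.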

Scenario characteristics

Environment: mainly an enclosed studio, quiet, with no background noise, plus a small amount of audio picked up at venues and in outdoor interviews.

Equipment: professional high-fidelity microphones, effectively near field, with good sound quality.

Speakers: mainly professional announcers, with a small amount of speech from leaders, journalists, and interviewees.

Speaking style: mainly read speech at medium speed, with almost no slips of the tongue, repetitions, or pauses.

Accent and dialect: none; very standard Mandarin.

Content domain: national political news.

Evaluation results

[Figure: evaluation results for the news broadcast test set]

Case 2: Speech recognition evaluation on Deyunshe crosstalk


Data source

Five episodes, about 2.5 hours in total, were randomly selected from the playlists of Deyunshe's official YouTube channel.

Scenario characteristics

Environment: mostly performance stages in open venues, with reverberation and background noise (audience laughter, applause, and similar sounds) but no background music.

Pickup equipment: stand microphones in front of the performers or clip-on microphones, near field.

Speakers: Deyunshe crosstalk performers such as Guo Degang, Yu Qian, and Yue Yunpeng.

Speaking style: typical of crosstalk, with two performers alternating at medium to fast speed.

Dialect: mostly Mandarin, with a small number of passages imitating dialects.

Content domain: entertainment, crosstalk routines.

Evaluation results

[Figure: evaluation results for the Deyunshe crosstalk test set]

Why such a large difference?

Comparing the two cases above, it is easy to see that the news broadcast scenario is very close to ideal conditions for speech recognition and largely represents the performance ceiling of existing Chinese speech recognition systems: a word error rate of 1% to 2%, that is, only one or two errors in every 100 words.


However, most real scenarios are likely closer to the second case. The speaker's pronunciation habits are closer to everyday communication: speech contains a great deal of liaison, swallowed sounds, distorted or unclear pronunciation, and unconscious fillers such as "um", "ah", and "uh". Speakers do not deliberately control their voice or pronunciation habits. Combined with the influence of the external environment, dialect, and language, recognition rates for this everyday, natural conversational style of speech are far from ideal.

If an intelligent voice product could only deliver adequate recognition results when users spoke with the standard delivery of a news anchor, it would be basically unusable. Clearly, recognition performance on natural conversational speech is the highest standard for judging whether a speech recognition platform is good.


A solution from Datatang

Good AI needs better training data. Datatang currently has 200,000 hours of off-the-shelf speech datasets, of which nearly 40,000 hours are natural conversational speech, covering Mandarin Chinese, Chinese dialects, English, Japanese, Korean, Hindi, Vietnamese, Arabic, Spanish, French, German, Italian, and more.

Because the recording channel affects recognition rates, the Mandarin natural conversational speech data covers mobile phone, telephone, network, and other channel types.


Datatang's natural conversational speech datasets also cover the seven major Chinese dialect regions, with speakers from different regions and cities and balanced coverage of age and gender. In terms of languages, they include Asian languages such as Japanese, Korean, Hindi, Vietnamese, and Arabic; European languages such as French, German, Italian, and Spanish; and English conversations by speakers from around the world.


During data collection, no scripted corpus is preset; only a list of topics is provided. Speakers choose the topics they find interesting and are familiar with, which keeps the conversational speech natural and relaxed.


All audio undergoes strict manual transcription and quality inspection, with annotations for the text content, the start and end times of valid sentences, speaker identity, and so on. Accuracy reaches 95% or higher.

Datatang's off-the-shelf conversational speech datasets have served 100 companies worldwide in building speech recognition products, and have been successfully applied in scenarios such as intelligent customer service, intelligent meetings, and automatic video subtitle generation.

AI empowerment in the post-pandemic era

The pandemic has changed our lives and the way we live. Application scenarios for artificial intelligence have become richer and more practical.

Unlike previous editions, the 2020 World Manufacturing Convention highlighted the broad enabling role of artificial intelligence technology. At the main forum of the opening ceremony, Diess, chairman of the board of Volkswagen Group; Marc Bitzer, global chief executive of Whirlpool; Zhang, chairman of the board of Alibaba Group; and Yu, executive director of Huawei Technologies Co., Ltd., attended and delivered speeches. Real-time bilingual subtitles from iFlytek Tingjian (讯飞听见) were displayed on both sides of the main screen, providing technical support for barrier-free communication at the international conference.


At Huawei's Ascend AI global new product launch, iFlytek Tingjian provided real-time transcription of natural-style Chinese speech, along with translated subtitles in German, Russian, French, Korean, and other languages.


On February 10, 2022, Cerence announced that it would provide natural conversational speech recognition technology to Japan's Pioneer Corporation. Whatever type of car Japanese consumers drive, Pioneer's smart products can give them a safe, efficient, and personalized Japanese voice experience.


Artificial intelligence is a great historical process. Today is its starting point, and we have entered the first year of large-scale AI deployment. In the future, alongside the development of technologies such as 5G, increasingly rich speech recognition applications will also promote barrier-free communication across different languages, skin colors, and regions.

Attached list of Datatang natural conversational speech datasets:

[Figure: dataset list; image not preserved]