Solution Sharing | Experts Gather to Explore Accented AI Speech Recognition
2022-07-28 03:09:00 【Magic Data】

On July 6 and July 14, the online awards ceremony and winning-solution sharing sessions for the "MagicHub Heavy-Accent Conversational ASR Challenge" came to a successful close. The two technical live streams attracted more than 600 AI algorithm engineers and drew more than 5,000 interactions.
In addition to presentations by representatives of the winning teams, the live streams featured talks by Magic Data founder and CEO Zhang Qingqing and by Xiaomi's Next-gen Kaldi team, who joined online to explore technology and trends in AI speech recognition.
RAMC open-source dataset
Introduction to the MagicData-RAMC open-source dataset
First, Magic Data founder and CEO Zhang Qingqing gave a talk titled "Introduction to the MagicData-RAMC Open-Source Dataset". With the rapid development of the artificial intelligence industry, demand for natural conversational speech recognition is growing, yet research on conversational speech recognition has faced many challenges in recent years. The challenge is built on MagicData-RAMC, a 180-hour dataset open-sourced by Magic Data. The data was recorded in real-world settings and consists of spontaneous conversations between people, capturing natural language phenomena. The conversations were unscripted and the topics are natural and varied: there are 351 multi-turn dialogues in total, each centered on a single topic. In addition, the speakers are balanced in gender and geographic distribution, which makes the dataset well suited to accented speech recognition.


The paper on this dataset passed rigorous review and has been accepted by INTERSPEECH 2022, a top conference in the speech field. This September, Magic Data will also take part in the conference as a silver sponsor, strengthening cooperation between industry and academia and supporting the smooth running of the event.

Data download: MagicData-RAMC Conversational Speech Dataset - MagicHub
Paper: https://arxiv.org/abs/2203.16844
Baseline: https://github.com/MagicHub-io/MagicData-RAMC-Challenge
Explore more in the MagicHub open-source community: MagicHub - Datasets Download | Open-Source Datasets
First prize team: Xiaomi Technology
Accented Mandarin speech recognition based on an end-to-end Conformer model
The MITC team's solution was presented by Chen Junjie of Xiaomi AI Lab. The team used the same approach as its online voice service: an end-to-end Conformer model with a hybrid CTC/attention structure. After a thorough analysis of the competition data, the team performed targeted data augmentation based on the data's characteristics, using Kaldi-based augmentation methods and also experimenting with personalized speech synthesis. Because the competition data is spoken conversation, the team drew on algorithm-optimization experience from earlier products and achieved good experimental results in a short time. For final decoding, the team used TLG decoding provided by k2 together with attention rescoring, which was an important factor in the team's first-place finish.


Second prize team: RoyalFlush & Tianjin University
RoyalFlush-CCA: introduction to the heavy-accent conversational ASR solution
The RoyalFlush-CCA team, formed by RoyalFlush and Tianjin University, was represented by Song Tongtong of Tianjin University. The team built its models with WeNet, using a Conformer with a bi-decoder structure. Speed perturbation and noise perturbation were applied to the data. Decoding used decoder rescoring, with shallow fusion of a Transformer language model to assist. Because the task involves low-resource model adaptation, fine-tuning the whole model on low-resource data tends to overfit and reduce the model's ability to generalize, so the team introduced adapters [1] to address this. The whole model is first trained on Mandarin and accented data; the fine-tuning stage then trains only the adapter parameters, which improves model performance while greatly reducing the training time needed for fine-tuning. Finally, several different models were combined with ROVER to produce the final result.


[1] Chen, S., Ge, C., Tong, Z., Wang, J., Song, Y., Wang, J., & Luo, P. (2022). AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition. ArXiv, abs/2205.13535.
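
The adapter idea described above can be illustrated with a minimal sketch. This is not the team's implementation: the `Adapter` class, the dimensions, and the zero-initialized up-projection are assumptions chosen for illustration. It shows a bottleneck module added on top of a frozen layer's output, where only the adapter's small weight matrices would be trained during fine-tuning.

```python
import random


def matvec(weights, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


class Adapter:
    """Hypothetical bottleneck adapter: x + W_up * relu(W_down * x)."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # The down-projection starts with small random weights.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Assumption: the up-projection starts at zero, so the adapter is an
        # identity map before fine-tuning and cannot disturb the base model.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def forward(self, x):
        hidden = [max(0.0, h) for h in matvec(self.w_down, x)]  # ReLU
        delta = matvec(self.w_up, hidden)
        return [xi + di for xi, di in zip(x, delta)]  # residual connection

    def num_parameters(self):
        return (sum(len(row) for row in self.w_down)
                + sum(len(row) for row in self.w_up))


dim, bottleneck = 8, 2
adapter = Adapter(dim, bottleneck)
frame = [0.5] * dim                 # stand-in for one frozen-layer output frame
output = adapter.forward(frame)
full_layer_params = dim * dim       # one dense dim x dim layer, for comparison
```

With these toy sizes the adapter holds 32 parameters against 64 for a single dense layer of the same width; the gap widens as the model dimension grows relative to the bottleneck, which is why training only the adapter is cheaper and less prone to overfitting on small accent datasets.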
Third prize team: NetEase Youdao
Technical summary of the heavy-accent recognition task based on ESPnet
NetEase Youdao's DAO team was represented by Gao Shengzhou. The team's solution is built mainly on the hybrid CTC/attention framework and tackles the task by merging Mandarin data with accented data. Decoding uses CTC/attention joint decoding, in which the decoder's attention score is added into the CTC prefix beam search. Data augmentation, model averaging, and other methods further improve the model's robustness and accuracy.
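
Model averaging, mentioned above, is commonly done by averaging the parameters of several saved checkpoints element by element. The sketch below assumes that interpretation; the function name, the checkpoint layout (a dict of flat float lists), and the parameter names are hypothetical and not taken from the team's code or from ESPnet.

```python
def average_checkpoints(checkpoints):
    """Average parameters element-wise across checkpoints.

    Each checkpoint is a dict mapping a parameter name to a flat list of
    floats. All checkpoints are assumed to share the same names and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    count = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        # zip(*...) walks the same position of this parameter in every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(column) / count for column in columns]
    return averaged


# Three hypothetical checkpoints saved near the end of training.
checkpoints = [
    {"encoder.weight": [1.0, 2.0], "decoder.bias": [0.0]},
    {"encoder.weight": [3.0, 4.0], "decoder.bias": [1.0]},
    {"encoder.weight": [5.0, 6.0], "decoder.bias": [2.0]},
]
averaged = average_checkpoints(checkpoints)
```

Averaging smooths out the noise of any single checkpoint, which is one plausible reason it helps robustness.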

Third prize team: China Mobile Online
Application of WeNet-based end-to-end technology to accent recognition
China Mobile Online's AIzyzx team was represented by Ren Yuling. The team's heavy-accent speech recognition solution is designed around WeNet. The model has three main parts: a shared encoder, a CTC decoder, and an attention decoder, where the attention decoder uses the U2++ structure. To enrich the acoustic variety of the training data and improve noise robustness, data preprocessing adds speed perturbation, volume perturbation, and spectral masking. The dictionary, compiled from common Chinese characters and English words, has 5,967 entries; out-of-vocabulary words are marked <unk> during tokenization. Training jointly optimizes the CTC loss and the attention loss, and dynamic chunk training allows the shared encoder to handle chunks of any size. At decoding time, the CTC decoder first produces the highest-scoring candidate results, the attention decoder then rescores those candidates, and the candidate with the highest weighted score after rescoring is chosen as the final recognition result.
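
The two-pass decoding described above can be sketched as follows. This is an illustration under stated assumptions, not WeNet's code: the `rescore` function, the interpolation formula, the weight of 0.3, and all scores in the example are made up. It assumes each candidate already carries a CTC log-probability from the first pass and an attention-decoder log-probability from the second pass.

```python
def rescore(nbest, ctc_weight=0.3):
    """Return the candidate with the highest weighted score.

    nbest: list of (text, ctc_log_prob, attention_log_prob) tuples.
    The weighted score here is a simple interpolation (an assumption):
        ctc_weight * ctc_log_prob + (1 - ctc_weight) * attention_log_prob
    """
    def weighted(candidate):
        _, ctc_score, att_score = candidate
        return ctc_weight * ctc_score + (1.0 - ctc_weight) * att_score

    return max(nbest, key=weighted)


# Hypothetical first-pass candidates with invented scores.
nbest = [
    ("jin tian tian qi hen hao", -4.0, -9.0),
    ("jin tian tian qi hen hou", -3.5, -12.0),
    ("jin tian tian chi hen hao", -6.0, -10.0),
]
best_by_ctc = max(nbest, key=lambda candidate: candidate[1])
best_after_rescoring = rescore(nbest)
```

In this toy example the CTC pass alone prefers the second candidate, while the weighted score after rescoring selects the first, which shows how the attention decoder can correct the first-pass ranking.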


k2 sharing: Xiaomi AI Lab
k2 core algorithms and their applications
The live stream also featured Kang Wei of Xiaomi AI Lab, who gave a talk titled "k2 Core Algorithms: Principles and Applications of Differentiable Finite-State Automata" and presented the team's research progress on RNN-T models.
In the first part, Kang Wei described in detail the characteristics and functions of FSAs in k2, used a simple CTC modeling example to illustrate how differentiable finite-state automata are used for sequence modeling, and then explained how to decode efficiently with the k2 framework. The second part focused on the team's optimizations and improvements to RNN-T training and decoding. For example, the team proposed the Pruned RNN-T loss function, which greatly speeds up RNN-T model training, and its self-developed GPU-based parallel decoding method makes deploying RNN-T models more efficient. Finally, Kang Wei shared the series of explorations and iterations the team has made on RNN-T models; according to the talk, experimental results show that the RNN-T model achieves the best results in the industry on the major open-source datasets.
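
To make the CTC modeling example more concrete, here is a small pure-Python sketch that does not use k2. It computes the CTC log-likelihood of a label sequence with the standard forward recursion, which sums the scores of every valid alignment path; in FSA terms this corresponds to the total score of all paths through the CTC graph for that label sequence. The function names and the toy input are assumptions for illustration.

```python
import math


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_log_likelihood(log_probs, labels, blank=0):
    """Sum over all CTC alignments of `labels`.

    log_probs[t][k] is the log-probability of symbol k at frame t.
    labels is the target sequence of symbol ids without blanks.
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank].
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][blank]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [-math.inf] * size
        for s in range(size):
            total = alpha[s]                          # stay on the same state
            if s >= 1:
                total = log_add(total, alpha[s - 1])  # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total = log_add(total, alpha[s - 2])
            if total != -math.inf:
                new_alpha[s] = total + log_probs[t][ext[s]]
        alpha = new_alpha

    # A valid path ends on the last label or on the trailing blank.
    result = alpha[size - 1]
    if size > 1:
        result = log_add(result, alpha[size - 2])
    return result


# Two frames, two symbols (0 = blank, 1 = "a"), uniform probabilities.
uniform = [[math.log(0.5), math.log(0.5)] for _ in range(2)]
likelihood = math.exp(ctc_log_likelihood(uniform, [1]))
```

For this input, three alignments collapse to "a" (a-blank, blank-a, a-a), each with probability 0.25, so the likelihood is 0.75. Because the result is a sum of products of the input probabilities, it is differentiable with respect to them, which is the property the talk's title refers to.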
