当前位置:网站首页>Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022
Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022
2022-07-06 19:49:00 【My name is Yongqiang】

The paper statistics are updated once a month , It mainly tracks the development of speech synthesis and speech recognition ( Many articles are sent out after the meeting , But it does not affect the statistics . There are inevitable omissions in the statistical process , Therefore, the statistical results are only for reference . For the statistical list of all articles in the field of speech synthesis, please visit http://yqli.tech/page/tts_paper.html, For the statistics of papers in the field of speech recognition, please visit http://yqli.tech/page/asr_paper.html. Open source voice data query http://yqli.tech/page/data.html.
How to find voice information, please refer to the article https://mp.weixin.qq.com/s/eJcpsfs3OuhrccJ7_BvKOg). Readers can send me a message directly if they have any suggestions , I will constantly revise the statistics . If reproduced , Please indicate the source . Welcome to WeChat official account. : Keep a low profile .
One Speech synthesis
Table 1 gives specific classification instructions .2022 year 6 This month's speech synthesis related articles are 43 piece , comparison 5 The month doubled. See Figure 1, But less than 2021 Year of 54 piece . Table II and figure 2 Is the specific direction of speech synthesis, the situation of the article . This month's article is on acoustic models 、 emotional tts、 Voice conversion and multi language and multi speaker direction are more .
Table 1 Speech synthesis classification description
classification | explain |
front end | Polyphony , rhythm ,g2p wait . |
Acoustic models | From linguistic features to acoustic features ,attention Work , Multi speaker and dual learning |
Vocoder | Waveform generation |
Individualization | Less data , Dirty data application and other adaptive methods |
Multilingual and multi speaker | Multilingual model 、 Multi speaker model |
Singing synthesis | The combination of singing and music |
emotional | Style and emotion |
Multimodal | Mainly collect talking head article |
Voice conversion | be based on GAN Scheme and feature decoupling scheme |
S2S | speech-to-speech |
Other | be based on EEG synthesis , Open source data ,MOS Evaluation and application of speech synthesis |
chart 1 Total number of speech synthesis papers

Table two Distribution of speech synthesis papers
| 1 month | 2 month | 3 month | 4 month | 5 month | 6 month | |
| front end | 2 | 0 | 3 | 0 | 0 | 2 |
| Acoustic models | 4 | 5 | 17 | 8 | 2 | 7 |
| Vocoder | 1 | 5 | 7 | 5 | 3 | 4 |
| Individualization | 1 | 2 | 4 | 3 | 3 | 1 |
| Multilingual | 1 | 1 | 0 | 3 | 0 | 5 |
| Singing synthesis | 5 | 3 | 5 | 2 | 2 | 3 |
| Emotional style | 2 | 2 | 1 | 3 | 2 | 6 |
| Multimodal | 4 | 3 | 2 | 5 | 3 | 3 |
| Voice conversion | 4 | 2 | 11 | 3 | 2 | 6 |
| s2s | 1 | 0 | 2 | 1 | 2 | 0 |
| Other | 2 | 0 | 4 | 12 | 3 | 6 |
chart 2 Histogram of speech synthesis papers distribution

For a list of speech synthesis articles, please visit http://yqli.tech/page/tts_paper.html

2022.06 Month's article

Two Speech recognition
The classification of speech recognition articles is described in Table 3 , Direction speech translation and multimodal There was no statistics in the previous few months , So don't put it in the chart first . chart 3 The total number of speech recognition articles is , This month 55 piece . The research direction of speech recognition is shown in table 4 Sum graph 4, Obviously , Unsupervised learning is still the most popular direction . in addition , Tal open source some data , Especially this open source 500 Hours of mixed Chinese and English data , See https://ai.100tal.com/dataset. Open source data summary is accessible http://yqli.tech/page/data.html.
Table 3 Speech recognition classification description
classification | explain |
general | Including tradition 、 Hybrid speech recognition , And right asr The optimization of the |
ctc | ctc Optimize |
rnn-t | rnn-t The optimization of the |
aed | aed Optimize |
dataset | Open source database |
data aug | Data augmentation |
lm | Language model research |
multilingual | Multi voice system and code-switch |
personal | Less data, adaptive and personalized ASR |
rescoring | Joint scoring of multiple models |
unsupervised | Unsupervised or self supervised learning |
accent ,dialect | Accent and dialect |
other | Other research directions , Including system evaluation criteria and so on |
| robust | Robustness |
| speaker diarization | speaker diarization |
multichannel | Multichannel |
| speech translation | Voice translation |
| multi-modal | Multimodal |
chart 3 Statistics on the number of speech recognition articles ( Company : piece )

surface 4 Distribution of speech recognition research directions
| 1 month | 2 month | 3 month | 4 month | 5 month | 6 month | |
| general | 12 | 10 | 13 | 9 | 6 | 7 |
| ctc | 1 | 0 | 2 | 5 | 1 | 1 |
| rnn-t | 3 | 1 | 2 | 3 | 0 | 2 |
| aed | 1 | 1 | 1 | 1 | 0 | 1 |
| dataset | 3 | 0 | 3 | 2 | 1 | 4 |
| data augmentation | 1 | 1 | 1 | 2 | 2 | 0 |
| lm | 2 | 2 | 4 | 3 | 0 | 3 |
| multilingual | 2 | 1 | 2 | 1 | 2 | 2 |
| personal | adaptation | 0 | 7 | 3 | 1 | 2 | 2 |
| rescoring | 1 | 1 | 2 | 0 | 0 | 2 |
| unsupervised | 2 | 3 | 17 | 19 | 7 | 9 |
| accent | 1 | 0 | 0 | 2 | 2 | 0 |
| multichannel | 0 | 4 | 1 | 1 | 0 | 0 |
| robust | 0 | 0 | 5 | 2 | 2 | 1 |
| other | 6 | 13 | 22 | 13 | 9 | 10 |
| speaker diarization | 0 | 3 | 4 | 5 | 2 | 2 |
| speech translation | - | - | - | - | 6 | 4 |
| multimodal | - | - | - | - | 3 | 5 |
chart 4 Histogram of speech recognition research direction

For a list of speech recognition articles, please visit http://yqli.tech/page/asr_paper.html

2022.06 Specific articles on speech recognition in January

边栏推荐
- Transformer model (pytorch code explanation)
- 系统性详解Redis操作Hash类型数据(带源码分析及测试结果)
- Hudi vs Delta vs Iceberg
- Lick the dog until the last one has nothing (simple DP)
- 学习探索-无缝轮播图
- 测试用里hi
- Microservice architecture debate between radical technologists vs Project conservatives
- 腾讯T2大牛亲自讲解,跳槽薪资翻倍
- 深入浅出,面试突击版
- MySQL information schema learning (II) -- InnoDB table
猜你喜欢

10 schemes to ensure interface data security

思維導圖+源代碼+筆記+項目,字節跳動+京東+360+網易面試題整理

JDBC details

学习探索-无缝轮播图

手把手教你学会js的原型与原型链,猴子都能看懂的教程
![[calculating emotion and thought] floor sweeper, typist, information panic and Oppenheimer](/img/8c/afb90128e7a523bbee4c6c4166363f.png)
[calculating emotion and thought] floor sweeper, typist, information panic and Oppenheimer
算法面试经典100题,Android程序员最新职业规划

Reflection and illegalaccessexception exception during application

MySQL information schema learning (I) -- general table

Leetcode 30. Concatenate substrings of all words
随机推荐
部门树递归实现
(3) Web security | penetration testing | basic knowledge of network security construction, IIS website construction, EXE backdoor generation tool quasar, basic use of
Pay attention to the partners on the recruitment website of fishing! The monitoring system may have set you as "high risk of leaving"
Analysis of rainwater connection
学习探索-使用伪元素清除浮动元素造成的高度坍塌
Appx代码签名指南
小微企业难做账?智能代账小工具快用起来
腾讯T3大牛手把手教你,大厂内部资料
零基础入门PolarDB-X:搭建高可用系统并联动数据大屏
社招面试心得,2022最新Android高频精选面试题分享
语音识别(ASR)论文优选:全球最大的中英混合开源数据TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech
HDU 1026 search pruning problem within the labyrinth of Ignatius and the prince I
Carte de réflexion + code source + notes + projet, saut d'octets + jd + 360 + tri des questions d'entrevue Netease
MySQL information Schema Learning (i) - - General table
logstash高速入口
Chic Lang: attributeerror: partially initialized module 'CV2' has no attribute 'GAPI_ wip_ gst_ GStreamerPipe
Cesium 点击绘制圆形(动态绘制圆形)
POJ3617 Best Cow Line 馋
Blue Bridge Cup microbial proliferation C language
short i =1; I=i+1 and short i=1; Difference of i+=1