当前位置:网站首页>Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022
Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022
2022-07-06 19:49:00 【My name is Yongqiang】

The paper statistics are updated once a month , It mainly tracks the development of speech synthesis and speech recognition ( Many articles are sent out after the meeting , But it does not affect the statistics . There are inevitable omissions in the statistical process , Therefore, the statistical results are only for reference . For the statistical list of all articles in the field of speech synthesis, please visit http://yqli.tech/page/tts_paper.html, For the statistics of papers in the field of speech recognition, please visit http://yqli.tech/page/asr_paper.html. Open source voice data query http://yqli.tech/page/data.html.
How to find voice information, please refer to the article https://mp.weixin.qq.com/s/eJcpsfs3OuhrccJ7_BvKOg). Readers can send me a message directly if they have any suggestions , I will constantly revise the statistics . If reproduced , Please indicate the source . Welcome to WeChat official account. : Keep a low profile .
One Speech synthesis
Table 1 gives specific classification instructions .2022 year 6 This month's speech synthesis related articles are 43 piece , comparison 5 The month doubled. See Figure 1, But less than 2021 Year of 54 piece . Table II and figure 2 Is the specific direction of speech synthesis, the situation of the article . This month's article is on acoustic models 、 emotional tts、 Voice conversion and multi language and multi speaker direction are more .
Table 1 Speech synthesis classification description
classification | explain |
front end | Polyphony , rhythm ,g2p wait . |
Acoustic models | From linguistic features to acoustic features ,attention Work , Multi speaker and dual learning |
Vocoder | Waveform generation |
Individualization | Less data , Dirty data application and other adaptive methods |
Multilingual and multi speaker | Multilingual model 、 Multi speaker model |
Singing synthesis | The combination of singing and music |
emotional | Style and emotion |
Multimodal | Mainly collect talking head article |
Voice conversion | be based on GAN Scheme and feature decoupling scheme |
S2S | speech-to-speech |
Other | be based on EEG synthesis , Open source data ,MOS Evaluation and application of speech synthesis |
chart 1 Total number of speech synthesis papers

Table two Distribution of speech synthesis papers
| 1 month | 2 month | 3 month | 4 month | 5 month | 6 month | |
| front end | 2 | 0 | 3 | 0 | 0 | 2 |
| Acoustic models | 4 | 5 | 17 | 8 | 2 | 7 |
| Vocoder | 1 | 5 | 7 | 5 | 3 | 4 |
| Individualization | 1 | 2 | 4 | 3 | 3 | 1 |
| Multilingual | 1 | 1 | 0 | 3 | 0 | 5 |
| Singing synthesis | 5 | 3 | 5 | 2 | 2 | 3 |
| Emotional style | 2 | 2 | 1 | 3 | 2 | 6 |
| Multimodal | 4 | 3 | 2 | 5 | 3 | 3 |
| Voice conversion | 4 | 2 | 11 | 3 | 2 | 6 |
| s2s | 1 | 0 | 2 | 1 | 2 | 0 |
| Other | 2 | 0 | 4 | 12 | 3 | 6 |
chart 2 Histogram of speech synthesis papers distribution

For a list of speech synthesis articles, please visit http://yqli.tech/page/tts_paper.html

2022.06 Month's article

Two Speech recognition
The classification of speech recognition articles is described in Table 3 , Direction speech translation and multimodal There was no statistics in the previous few months , So don't put it in the chart first . chart 3 The total number of speech recognition articles is , This month 55 piece . The research direction of speech recognition is shown in table 4 Sum graph 4, Obviously , Unsupervised learning is still the most popular direction . in addition , Tal open source some data , Especially this open source 500 Hours of mixed Chinese and English data , See https://ai.100tal.com/dataset. Open source data summary is accessible http://yqli.tech/page/data.html.
Table 3 Speech recognition classification description
classification | explain |
general | Including tradition 、 Hybrid speech recognition , And right asr The optimization of the |
ctc | ctc Optimize |
rnn-t | rnn-t The optimization of the |
aed | aed Optimize |
dataset | Open source database |
data aug | Data augmentation |
lm | Language model research |
multilingual | Multi voice system and code-switch |
personal | Less data, adaptive and personalized ASR |
rescoring | Joint scoring of multiple models |
unsupervised | Unsupervised or self supervised learning |
accent ,dialect | Accent and dialect |
other | Other research directions , Including system evaluation criteria and so on |
| robust | Robustness |
| speaker diarization | speaker diarization |
multichannel | Multichannel |
| speech translation | Voice translation |
| multi-modal | Multimodal |
chart 3 Statistics on the number of speech recognition articles ( Company : piece )

surface 4 Distribution of speech recognition research directions
| 1 month | 2 month | 3 month | 4 month | 5 month | 6 month | |
| general | 12 | 10 | 13 | 9 | 6 | 7 |
| ctc | 1 | 0 | 2 | 5 | 1 | 1 |
| rnn-t | 3 | 1 | 2 | 3 | 0 | 2 |
| aed | 1 | 1 | 1 | 1 | 0 | 1 |
| dataset | 3 | 0 | 3 | 2 | 1 | 4 |
| data augmentation | 1 | 1 | 1 | 2 | 2 | 0 |
| lm | 2 | 2 | 4 | 3 | 0 | 3 |
| multilingual | 2 | 1 | 2 | 1 | 2 | 2 |
| personal | adaptation | 0 | 7 | 3 | 1 | 2 | 2 |
| rescoring | 1 | 1 | 2 | 0 | 0 | 2 |
| unsupervised | 2 | 3 | 17 | 19 | 7 | 9 |
| accent | 1 | 0 | 0 | 2 | 2 | 0 |
| multichannel | 0 | 4 | 1 | 1 | 0 | 0 |
| robust | 0 | 0 | 5 | 2 | 2 | 1 |
| other | 6 | 13 | 22 | 13 | 9 | 10 |
| speaker diarization | 0 | 3 | 4 | 5 | 2 | 2 |
| speech translation | - | - | - | - | 6 | 4 |
| multimodal | - | - | - | - | 3 | 5 |
chart 4 Histogram of speech recognition research direction

For a list of speech recognition articles, please visit http://yqli.tech/page/asr_paper.html

2022.06 Specific articles on speech recognition in January

边栏推荐
- 小微企业难做账?智能代账小工具快用起来
- 【翻译】Linkerd在欧洲和北美的采用率超过了Istio,2021年增长118%。
- 【计算情与思】扫地僧、打字员、信息恐慌与奥本海默
- js实现力扣71题简化路径
- leetcode先刷_Maximum Subarray
- 面试突击63:MySQL 中如何去重?
- Hudi vs Delta vs Iceberg
- Lick the dog until the last one has nothing (simple DP)
- Chic Lang: attributeerror: partially initialized module 'CV2' has no attribute 'GAPI_ wip_ gst_ GStreamerPipe
- 精彩编码 【进制转换】
猜你喜欢

企业精益管理体系介绍

利用 clip-path 绘制不规则的图形

Blue Bridge Cup microbial proliferation C language

LeetCode_ Double pointer_ Medium_ 61. rotating linked list

10 schemes to ensure interface data security

Mysql Information Schema 学习(一)--通用表

Phoenix Architecture 3 - transaction processing
![[玩转Linux] [Docker] MySQL安装和配置](/img/04/6253ef9fdf7d2242b42b4c7fb2c607.png)
[玩转Linux] [Docker] MySQL安装和配置

How to do smoke test

深入浅出,面试突击版
随机推荐
Recursive implementation of department tree
DaGAN论文解读
Mysql Information Schema 学习(一)--通用表
Transformer model (pytorch code explanation)
系统性详解Redis操作Hash类型数据(带源码分析及测试结果)
[infrastructure] deployment and configuration of Flink / Flink CDC (MySQL / es)
beegfs高可用模式探讨
手把手教你学会js的原型与原型链,猴子都能看懂的教程
MySQL information schema learning (II) -- InnoDB table
Classic 100 questions of algorithm interview, the latest career planning of Android programmers
How to do smoke test
方法关键字Deprecated,ExternalProcName,Final,ForceGenerate
如何自定义动漫头像?这6个免费精品在线卡通头像生成器,看一眼就怦然心动!
句号压缩过滤器
Alibaba数据源Druid可视化监控配置
深入浅出,面试突击版
In depth analysis, Android interview real problem analysis is popular all over the network
腾讯T3手把手教你,真的太香了
Finally, there is no need to change a line of code! Shardingsphere native driver comes out
Vscode debug run fluent message: there is no extension for debugging yaml. Should we find yaml extensions in the market?