当前位置:网站首页>Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022
Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022
2022-07-06 19:49:00 【My name is Yongqiang】

The paper statistics are updated once a month , It mainly tracks the development of speech synthesis and speech recognition ( Many articles are sent out after the meeting , But it does not affect the statistics . There are inevitable omissions in the statistical process , Therefore, the statistical results are only for reference . For the statistical list of all articles in the field of speech synthesis, please visit http://yqli.tech/page/tts_paper.html, For the statistics of papers in the field of speech recognition, please visit http://yqli.tech/page/asr_paper.html. Open source voice data query http://yqli.tech/page/data.html.
How to find voice information, please refer to the article https://mp.weixin.qq.com/s/eJcpsfs3OuhrccJ7_BvKOg). Readers can send me a message directly if they have any suggestions , I will constantly revise the statistics . If reproduced , Please indicate the source . Welcome to WeChat official account. : Keep a low profile .
One Speech synthesis
Table 1 gives specific classification instructions .2022 year 6 This month's speech synthesis related articles are 43 piece , comparison 5 The month doubled. See Figure 1, But less than 2021 Year of 54 piece . Table II and figure 2 Is the specific direction of speech synthesis, the situation of the article . This month's article is on acoustic models 、 emotional tts、 Voice conversion and multi language and multi speaker direction are more .
Table 1 Speech synthesis classification description
classification | explain |
front end | Polyphony , rhythm ,g2p wait . |
Acoustic models | From linguistic features to acoustic features ,attention Work , Multi speaker and dual learning |
Vocoder | Waveform generation |
Individualization | Less data , Dirty data application and other adaptive methods |
Multilingual and multi speaker | Multilingual model 、 Multi speaker model |
Singing synthesis | The combination of singing and music |
emotional | Style and emotion |
Multimodal | Mainly collect talking head article |
Voice conversion | be based on GAN Scheme and feature decoupling scheme |
S2S | speech-to-speech |
Other | be based on EEG synthesis , Open source data ,MOS Evaluation and application of speech synthesis |
chart 1 Total number of speech synthesis papers

Table two Distribution of speech synthesis papers
| 1 month | 2 month | 3 month | 4 month | 5 month | 6 month | |
| front end | 2 | 0 | 3 | 0 | 0 | 2 |
| Acoustic models | 4 | 5 | 17 | 8 | 2 | 7 |
| Vocoder | 1 | 5 | 7 | 5 | 3 | 4 |
| Individualization | 1 | 2 | 4 | 3 | 3 | 1 |
| Multilingual | 1 | 1 | 0 | 3 | 0 | 5 |
| Singing synthesis | 5 | 3 | 5 | 2 | 2 | 3 |
| Emotional style | 2 | 2 | 1 | 3 | 2 | 6 |
| Multimodal | 4 | 3 | 2 | 5 | 3 | 3 |
| Voice conversion | 4 | 2 | 11 | 3 | 2 | 6 |
| s2s | 1 | 0 | 2 | 1 | 2 | 0 |
| Other | 2 | 0 | 4 | 12 | 3 | 6 |
chart 2 Histogram of speech synthesis papers distribution

For a list of speech synthesis articles, please visit http://yqli.tech/page/tts_paper.html

2022.06 Month's article

Two Speech recognition
The classification of speech recognition articles is described in Table 3 , Direction speech translation and multimodal There was no statistics in the previous few months , So don't put it in the chart first . chart 3 The total number of speech recognition articles is , This month 55 piece . The research direction of speech recognition is shown in table 4 Sum graph 4, Obviously , Unsupervised learning is still the most popular direction . in addition , Tal open source some data , Especially this open source 500 Hours of mixed Chinese and English data , See https://ai.100tal.com/dataset. Open source data summary is accessible http://yqli.tech/page/data.html.
Table 3 Speech recognition classification description
classification | explain |
general | Including tradition 、 Hybrid speech recognition , And right asr The optimization of the |
ctc | ctc Optimize |
rnn-t | rnn-t The optimization of the |
aed | aed Optimize |
dataset | Open source database |
data aug | Data augmentation |
lm | Language model research |
multilingual | Multi voice system and code-switch |
personal | Less data, adaptive and personalized ASR |
rescoring | Joint scoring of multiple models |
unsupervised | Unsupervised or self supervised learning |
accent ,dialect | Accent and dialect |
other | Other research directions , Including system evaluation criteria and so on |
| robust | Robustness |
| speaker diarization | speaker diarization |
multichannel | Multichannel |
| speech translation | Voice translation |
| multi-modal | Multimodal |
chart 3 Statistics on the number of speech recognition articles ( Company : piece )

surface 4 Distribution of speech recognition research directions
| 1 month | 2 month | 3 month | 4 month | 5 month | 6 month | |
| general | 12 | 10 | 13 | 9 | 6 | 7 |
| ctc | 1 | 0 | 2 | 5 | 1 | 1 |
| rnn-t | 3 | 1 | 2 | 3 | 0 | 2 |
| aed | 1 | 1 | 1 | 1 | 0 | 1 |
| dataset | 3 | 0 | 3 | 2 | 1 | 4 |
| data augmentation | 1 | 1 | 1 | 2 | 2 | 0 |
| lm | 2 | 2 | 4 | 3 | 0 | 3 |
| multilingual | 2 | 1 | 2 | 1 | 2 | 2 |
| personal | adaptation | 0 | 7 | 3 | 1 | 2 | 2 |
| rescoring | 1 | 1 | 2 | 0 | 0 | 2 |
| unsupervised | 2 | 3 | 17 | 19 | 7 | 9 |
| accent | 1 | 0 | 0 | 2 | 2 | 0 |
| multichannel | 0 | 4 | 1 | 1 | 0 | 0 |
| robust | 0 | 0 | 5 | 2 | 2 | 1 |
| other | 6 | 13 | 22 | 13 | 9 | 10 |
| speaker diarization | 0 | 3 | 4 | 5 | 2 | 2 |
| speech translation | - | - | - | - | 6 | 4 |
| multimodal | - | - | - | - | 3 | 5 |
chart 4 Histogram of speech recognition research direction

For a list of speech recognition articles, please visit http://yqli.tech/page/asr_paper.html

2022.06 Specific articles on speech recognition in January

边栏推荐
- [calculating emotion and thought] floor sweeper, typist, information panic and Oppenheimer
- 手把手教你学会js的原型与原型链,猴子都能看懂的教程
- Leetcode 30. 串联所有单词的子串
- MySQL must know and learn
- Web开发小妙招:巧用ThreadLocal规避层层传值
- 凤凰架构3——事务处理
- Alibaba data source Druid visual monitoring configuration
- 思維導圖+源代碼+筆記+項目,字節跳動+京東+360+網易面試題整理
- 350. 两个数组的交集 II
- 凤凰架构2——访问远程服务
猜你喜欢

Leetcode 30. 串联所有单词的子串

How to customize animation avatars? These six free online cartoon avatar generators are exciting at a glance!

学习探索-无缝轮播图

接雨水问题解析

企业精益管理体系介绍

【计算情与思】扫地僧、打字员、信息恐慌与奥本海默

零基础入门PolarDB-X:搭建高可用系统并联动数据大屏

Low CPU load and high loadavg processing method

系统性详解Redis操作Hash类型数据(带源码分析及测试结果)

Swiftui game source code Encyclopedia of Snake game based on geometryreader and preference
随机推荐
Leetcode 30. Concatenate substrings of all words
[translation] Digital insider. Selection process of kubecon + cloudnativecon in Europe in 2022
Vmware虚拟机无法打开内核设备“\\.\Global\vmx86“的解决方法
企业精益管理体系介绍
颜色(color)转换为三刺激值(r/g/b)(干股)
Selenium advanced operations
LeetCode_ Double pointer_ Medium_ 61. rotating linked list
MySql必知必会学习
小微企业难做账?智能代账小工具快用起来
蓝桥杯 微生物增殖 C语言
激进技术派 vs 项目保守派的微服务架构之争
社招面试心得,2022最新Android高频精选面试题分享
LeetCode_双指针_中等_61. 旋转链表
利用 clip-path 绘制不规则的图形
【云小课】EI第47课 MRS离线数据分析-通过Flink作业处理OBS数据
VMware virtual machine cannot open the kernel device "\.\global\vmx86"
腾讯T4架构师,android面试基础
Leetcode 30. 串联所有单词的子串
How to customize animation avatars? These six free online cartoon avatar generators are exciting at a glance!
About image reading and processing, etc