Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022
2022-07-06 19:49:00 【My name is Yongqiang】

The paper statistics are updated once a month and mainly track developments in speech synthesis and speech recognition. (Many papers are released after a conference, but this does not affect the statistics. Omissions are inevitable in the counting process, so the results are for reference only. For the full list of speech synthesis papers, visit http://yqli.tech/page/tts_paper.html; for speech recognition papers, visit http://yqli.tech/page/asr_paper.html. Open-source speech data can be looked up at http://yqli.tech/page/data.html.
For how to find speech-related information, see the article https://mp.weixin.qq.com/s/eJcpsfs3OuhrccJ7_BvKOg.) Readers with suggestions can message me directly, and I will keep revising the statistics. If you repost this, please credit the source. You are welcome to follow the WeChat official account: Keep a low profile.
1. Speech synthesis
Table 1 describes the classification. In June 2022 there were 43 speech synthesis papers, roughly double the number in May (see Figure 1), but below the 2021 figure of 54. Table 2 and Figure 2 show how the papers are distributed across the specific directions of speech synthesis. This month's papers concentrate on acoustic models, emotional TTS, voice conversion, and multilingual and multi-speaker synthesis.
Table 1 Speech synthesis classification description
| Category | Description |
| Front end | Polyphone disambiguation, prosody, G2P, etc. |
| Acoustic models | From linguistic features to acoustic features, attention work, multi-speaker and dual learning |
| Vocoder | Waveform generation |
| Personalization | Low-resource data, noisy data, and other adaptation methods |
| Multilingual and multi-speaker | Multilingual models, multi-speaker models |
| Singing synthesis | Combining singing and music |
| Emotional | Style and emotion |
| Multimodal | Mainly talking-head papers |
| Voice conversion | GAN-based approaches and feature disentanglement approaches |
| S2S | Speech-to-speech |
| Other | EEG-based synthesis, open-source data, MOS evaluation, and applications of speech synthesis |
Figure 1 Total number of speech synthesis papers

Table 2 Distribution of speech synthesis papers
| Category | Jan | Feb | Mar | Apr | May | Jun |
| Front end | 2 | 0 | 3 | 0 | 0 | 2 |
| Acoustic models | 4 | 5 | 17 | 8 | 2 | 7 |
| Vocoder | 1 | 5 | 7 | 5 | 3 | 4 |
| Personalization | 1 | 2 | 4 | 3 | 3 | 1 |
| Multilingual | 1 | 1 | 0 | 3 | 0 | 5 |
| Singing synthesis | 5 | 3 | 5 | 2 | 2 | 3 |
| Emotional style | 2 | 2 | 1 | 3 | 2 | 6 |
| Multimodal | 4 | 3 | 2 | 5 | 3 | 3 |
| Voice conversion | 4 | 2 | 11 | 3 | 2 | 6 |
| S2S | 1 | 0 | 2 | 1 | 2 | 0 |
| Other | 2 | 0 | 4 | 12 | 3 | 6 |
Figure 2 Histogram of the distribution of speech synthesis papers
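The monthly totals behind Figure 1 can be recomputed from Table 2. The snippet below is a minimal sketch, not the author's own tooling: the dictionary `tts_counts` and the helper `monthly_totals` are names introduced here for illustration, and the numbers are copied by hand from the table.

```python
# Monthly paper counts copied from Table 2 (January to June 2022).
# The variable and function names are illustrative assumptions.
tts_counts = {
    "front end":         [2, 0, 3, 0, 0, 2],
    "acoustic models":   [4, 5, 17, 8, 2, 7],
    "vocoder":           [1, 5, 7, 5, 3, 4],
    "personalization":   [1, 2, 4, 3, 3, 1],
    "multilingual":      [1, 1, 0, 3, 0, 5],
    "singing synthesis": [5, 3, 5, 2, 2, 3],
    "emotional style":   [2, 2, 1, 3, 2, 6],
    "multimodal":        [4, 3, 2, 5, 3, 3],
    "voice conversion":  [4, 2, 11, 3, 2, 6],
    "s2s":               [1, 0, 2, 1, 2, 0],
    "other":             [2, 0, 4, 12, 3, 6],
}
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]


def monthly_totals(counts):
    """Sum each month's column across all categories."""
    return [sum(column) for column in zip(*counts.values())]


totals = dict(zip(months, monthly_totals(tts_counts)))
print(totals)                                  # per-month totals
print(round(totals["Jun"] / totals["May"], 2))  # June-to-May ratio
```

The June column sums to the 43 papers quoted in the text, and May sums to 22, which is where the "doubled" comparison comes from.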

For a list of speech synthesis articles, please visit http://yqli.tech/page/tts_paper.html

Speech synthesis papers from June 2022

2. Speech recognition
Table 3 describes the classification of speech recognition papers. The speech translation and multimodal directions were not tracked in earlier months, so they are left out of the charts for now. Figure 3 shows the total number of speech recognition papers; there were 55 this month. The distribution across research directions is shown in Table 4 and Figure 4. Unsupervised learning is clearly still the most popular direction. In addition, TAL has open-sourced some data, notably 500 hours of mixed Chinese-English data; see https://ai.100tal.com/dataset. A summary of open-source data is available at http://yqli.tech/page/data.html.
Table 3 Speech recognition classification description
| Category | Description |
| general | Traditional and hybrid speech recognition, and optimization of ASR |
| ctc | CTC optimization |
| rnn-t | RNN-T optimization |
| aed | AED optimization |
| dataset | Open-source datasets |
| data aug | Data augmentation |
| lm | Language model research |
| multilingual | Multilingual systems and code-switching |
| personal | Low-resource, adaptive, and personalized ASR |
| rescoring | Joint scoring of multiple models |
| unsupervised | Unsupervised or self-supervised learning |
| accent, dialect | Accent and dialect |
| other | Other research directions, including system evaluation criteria and so on |
| robust | Robustness |
| speaker diarization | Speaker diarization |
| multichannel | Multichannel |
| speech translation | Speech translation |
| multi-modal | Multimodal |
Figure 3 Number of speech recognition papers (unit: papers)

Table 4 Distribution of speech recognition research directions
| Category | Jan | Feb | Mar | Apr | May | Jun |
| general | 12 | 10 | 13 | 9 | 6 | 7 |
| ctc | 1 | 0 | 2 | 5 | 1 | 1 |
| rnn-t | 3 | 1 | 2 | 3 | 0 | 2 |
| aed | 1 | 1 | 1 | 1 | 0 | 1 |
| dataset | 3 | 0 | 3 | 2 | 1 | 4 |
| data augmentation | 1 | 1 | 1 | 2 | 2 | 0 |
| lm | 2 | 2 | 4 | 3 | 0 | 3 |
| multilingual | 2 | 1 | 2 | 1 | 2 | 2 |
| personal/adaptation | 0 | 7 | 3 | 1 | 2 | 2 |
| rescoring | 1 | 1 | 2 | 0 | 0 | 2 |
| unsupervised | 2 | 3 | 17 | 19 | 7 | 9 |
| accent | 1 | 0 | 0 | 2 | 2 | 0 |
| multichannel | 0 | 4 | 1 | 1 | 0 | 0 |
| robust | 0 | 0 | 5 | 2 | 2 | 1 |
| other | 6 | 13 | 22 | 13 | 9 | 10 |
| speaker diarization | 0 | 3 | 4 | 5 | 2 | 2 |
| speech translation | - | - | - | - | 6 | 4 |
| multimodal | - | - | - | - | 3 | 5 |
Figure 4 Histogram of speech recognition research directions
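The 55 speech recognition papers reported for this month can be reconciled with the June column of Table 4. The sketch below is only an illustration with hand-copied numbers; the names `asr_june`, `new_directions`, and `named` are assumptions introduced here, not part of the original statistics.

```python
# June 2022 counts copied from the last column of Table 4.
# Directions tracked since January (all names here are illustrative):
asr_june = {
    "general": 7, "ctc": 1, "rnn-t": 2, "aed": 1, "dataset": 4,
    "data augmentation": 0, "lm": 3, "multilingual": 2,
    "personal/adaptation": 2, "rescoring": 2, "unsupervised": 9,
    "accent": 0, "multichannel": 0, "robust": 1, "other": 10,
    "speaker diarization": 2,
}
# Directions that only have counts from May onward.
new_directions = {"speech translation": 4, "multimodal": 5}

charted = sum(asr_june.values())               # long-running directions only
total = charted + sum(new_directions.values())  # all June papers

# Largest named direction, ignoring the catch-all "other" bucket.
named = {name: n for name, n in asr_june.items() if name != "other"}
top = max(named, key=named.get)

print(charted, total, top)
```

Under these numbers the long-running directions account for 46 papers and the two newer directions for 9, which together give 55. Excluding the catch-all "other" row, unsupervised learning is the largest single direction in June.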

For a list of speech recognition articles, please visit http://yqli.tech/page/asr_paper.html

Speech recognition papers from June 2022
