当前位置:网站首页>Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022
Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022
2022-07-06 19:49:00 【My name is Yongqiang】
The paper statistics are updated once a month , It mainly tracks the development of speech synthesis and speech recognition ( Many articles are sent out after the meeting , But it does not affect the statistics . There are inevitable omissions in the statistical process , Therefore, the statistical results are only for reference . For the statistical list of all articles in the field of speech synthesis, please visit http://yqli.tech/page/tts_paper.html, For the statistics of papers in the field of speech recognition, please visit http://yqli.tech/page/asr_paper.html. Open source voice data query http://yqli.tech/page/data.html.
How to find voice information, please refer to the article https://mp.weixin.qq.com/s/eJcpsfs3OuhrccJ7_BvKOg). Readers can send me a message directly if they have any suggestions , I will constantly revise the statistics . If reproduced , Please indicate the source . Welcome to WeChat official account. : Keep a low profile .
One Speech synthesis
Table 1 gives specific classification instructions .2022 year 6 This month's speech synthesis related articles are 43 piece , comparison 5 The month doubled. See Figure 1, But less than 2021 Year of 54 piece . Table II and figure 2 Is the specific direction of speech synthesis, the situation of the article . This month's article is on acoustic models 、 emotional tts、 Voice conversion and multi language and multi speaker direction are more .
Table 1 Speech synthesis classification description
classification | explain |
front end | Polyphony , rhythm ,g2p wait . |
Acoustic models | From linguistic features to acoustic features ,attention Work , Multi speaker and dual learning |
Vocoder | Waveform generation |
Individualization | Less data , Dirty data application and other adaptive methods |
Multilingual and multi speaker | Multilingual model 、 Multi speaker model |
Singing synthesis | The combination of singing and music |
emotional | Style and emotion |
Multimodal | Mainly collect talking head article |
Voice conversion | be based on GAN Scheme and feature decoupling scheme |
S2S | speech-to-speech |
Other | be based on EEG synthesis , Open source data ,MOS Evaluation and application of speech synthesis |
chart 1 Total number of speech synthesis papers
Table two Distribution of speech synthesis papers
1 month | 2 month | 3 month | 4 month | 5 month | 6 month | |
front end | 2 | 0 | 3 | 0 | 0 | 2 |
Acoustic models | 4 | 5 | 17 | 8 | 2 | 7 |
Vocoder | 1 | 5 | 7 | 5 | 3 | 4 |
Individualization | 1 | 2 | 4 | 3 | 3 | 1 |
Multilingual | 1 | 1 | 0 | 3 | 0 | 5 |
Singing synthesis | 5 | 3 | 5 | 2 | 2 | 3 |
Emotional style | 2 | 2 | 1 | 3 | 2 | 6 |
Multimodal | 4 | 3 | 2 | 5 | 3 | 3 |
Voice conversion | 4 | 2 | 11 | 3 | 2 | 6 |
s2s | 1 | 0 | 2 | 1 | 2 | 0 |
Other | 2 | 0 | 4 | 12 | 3 | 6 |
chart 2 Histogram of speech synthesis papers distribution
For a list of speech synthesis articles, please visit http://yqli.tech/page/tts_paper.html
2022.06 Month's article
Two Speech recognition
The classification of speech recognition articles is described in Table 3 , Direction speech translation and multimodal There was no statistics in the previous few months , So don't put it in the chart first . chart 3 The total number of speech recognition articles is , This month 55 piece . The research direction of speech recognition is shown in table 4 Sum graph 4, Obviously , Unsupervised learning is still the most popular direction . in addition , Tal open source some data , Especially this open source 500 Hours of mixed Chinese and English data , See https://ai.100tal.com/dataset. Open source data summary is accessible http://yqli.tech/page/data.html.
Table 3 Speech recognition classification description
classification | explain |
general | Including tradition 、 Hybrid speech recognition , And right asr The optimization of the |
ctc | ctc Optimize |
rnn-t | rnn-t The optimization of the |
aed | aed Optimize |
dataset | Open source database |
data aug | Data augmentation |
lm | Language model research |
multilingual | Multi voice system and code-switch |
personal | Less data, adaptive and personalized ASR |
rescoring | Joint scoring of multiple models |
unsupervised | Unsupervised or self supervised learning |
accent ,dialect | Accent and dialect |
other | Other research directions , Including system evaluation criteria and so on |
robust | Robustness |
speaker diarization | speaker diarization |
multichannel | Multichannel |
speech translation | Voice translation |
multi-modal | Multimodal |
chart 3 Statistics on the number of speech recognition articles ( Company : piece )
surface 4 Distribution of speech recognition research directions
1 month | 2 month | 3 month | 4 month | 5 month | 6 month | |
general | 12 | 10 | 13 | 9 | 6 | 7 |
ctc | 1 | 0 | 2 | 5 | 1 | 1 |
rnn-t | 3 | 1 | 2 | 3 | 0 | 2 |
aed | 1 | 1 | 1 | 1 | 0 | 1 |
dataset | 3 | 0 | 3 | 2 | 1 | 4 |
data augmentation | 1 | 1 | 1 | 2 | 2 | 0 |
lm | 2 | 2 | 4 | 3 | 0 | 3 |
multilingual | 2 | 1 | 2 | 1 | 2 | 2 |
personal | adaptation | 0 | 7 | 3 | 1 | 2 | 2 |
rescoring | 1 | 1 | 2 | 0 | 0 | 2 |
unsupervised | 2 | 3 | 17 | 19 | 7 | 9 |
accent | 1 | 0 | 0 | 2 | 2 | 0 |
multichannel | 0 | 4 | 1 | 1 | 0 | 0 |
robust | 0 | 0 | 5 | 2 | 2 | 1 |
other | 6 | 13 | 22 | 13 | 9 | 10 |
speaker diarization | 0 | 3 | 4 | 5 | 2 | 2 |
speech translation | - | - | - | - | 6 | 4 |
multimodal | - | - | - | - | 3 | 5 |
chart 4 Histogram of speech recognition research direction
For a list of speech recognition articles, please visit http://yqli.tech/page/asr_paper.html
2022.06 Specific articles on speech recognition in January
边栏推荐
- Method keywords deprecated, externalprocname, final, forcegenerate
- Alibaba数据源Druid可视化监控配置
- Systematic and detailed explanation of redis operation hash type data (with source code analysis and test results)
- Tensorflow2.0 自定义训练的方式求解函数系数
- Recursive implementation of department tree
- Cf960g - bandit Blues (type I Stirling number +ogf)
- [玩转Linux] [Docker] MySQL安装和配置
- Hudi vs Delta vs Iceberg
- 腾讯T2大牛亲自讲解,跳槽薪资翻倍
- Dom 操作
猜你喜欢
ZABBIX proxy server and ZABBIX SNMP monitoring
[translation] linkerd's adoption rate in Europe and North America exceeded istio, with an increase of 118% in 2021.
(3) Web security | penetration testing | basic knowledge of network security construction, IIS website construction, EXE backdoor generation tool quasar, basic use of
MySQL information schema learning (II) -- InnoDB table
[calculating emotion and thought] floor sweeper, typist, information panic and Oppenheimer
2022年6月语音合成(TTS)和语音识别(ASR)论文月报
腾讯T3手把手教你,真的太香了
利用 clip-path 绘制不规则的图形
Mysql Information Schema 学习(二)--Innodb表
在解决了 2961 个用户反馈后,我做出了这样的改变...
随机推荐
Classic 100 questions of algorithm interview, the latest career planning of Android programmers
Learning and Exploration - Seamless rotation map
1805. 字符串中不同整数的数目
beegfs高可用模式探讨
Configuration and simple usage of the EXE backdoor generation tool quasar
在解决了 2961 个用户反馈后,我做出了这样的改变...
潇洒郎: AttributeError: partially initialized module ‘cv2‘ has no attribute ‘gapi_wip_gst_GStreamerPipe
Method keywords deprecated, externalprocname, final, forcegenerate
Leetcode 30. 串联所有单词的子串
CF960G - Bandit Blues(第一类斯特林数+OGF)
redisson bug分析
【翻译】云原生观察能力微调查。普罗米修斯引领潮流,但要了解系统的健康状况仍有障碍...
Finally, there is no need to change a line of code! Shardingsphere native driver comes out
AddressSanitizer 技术初体验
Unbalance balance (dynamic programming, DP)
方法关键字Deprecated,ExternalProcName,Final,ForceGenerate
The slave i/o thread stops because master and slave have equal MySQL serv
Yyds dry goods inventory leetcode question set 751 - 760
LeetCode_双指针_中等_61. 旋转链表
腾讯T3手把手教你,真的太香了