当前位置：网站首页>Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022

Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022

2022-07-06 19:49:00 【My name is Yongqiang】

The paper statistics are updated once a month , It mainly tracks the development of speech synthesis and speech recognition ( Many articles are sent out after the meeting , But it does not affect the statistics . There are inevitable omissions in the statistical process , Therefore, the statistical results are only for reference . For the statistical list of all articles in the field of speech synthesis, please visit http://yqli.tech/page/tts_paper.html, For the statistics of papers in the field of speech recognition, please visit http://yqli.tech/page/asr_paper.html. Open source voice data query http://yqli.tech/page/data.html.

How to find voice information, please refer to the article https://mp.weixin.qq.com/s/eJcpsfs3OuhrccJ7_BvKOg）. Readers can send me a message directly if they have any suggestions , I will constantly revise the statistics . If reproduced , Please indicate the source . Welcome to WeChat official account. ： Keep a low profile .

One Speech synthesis

Table 1 gives specific classification instructions .2022 year 6 This month's speech synthesis related articles are 43 piece , comparison 5 The month doubled. See Figure 1, But less than 2021 Year of 54 piece . Table II and figure 2 Is the specific direction of speech synthesis, the situation of the article . This month's article is on acoustic models 、 emotional tts、 Voice conversion and multi language and multi speaker direction are more .

Table 1 Speech synthesis classification description

classification	explain
front end	Polyphony , rhythm ,g2p wait .
Acoustic models	From linguistic features to acoustic features ,attention Work , Multi speaker and dual learning
Vocoder	Waveform generation
Individualization	Less data , Dirty data application and other adaptive methods
Multilingual and multi speaker	Multilingual model 、 Multi speaker model
Singing synthesis	The combination of singing and music
emotional	Style and emotion
Multimodal	Mainly collect talking head article
Voice conversion	be based on GAN Scheme and feature decoupling scheme
S2S	speech-to-speech
Other	be based on EEG synthesis , Open source data ,MOS Evaluation and application of speech synthesis

chart 1 Total number of speech synthesis papers

Table two Distribution of speech synthesis papers

	1 month	2 month	3 month	4 month	5 month	6 month
front end	2	0	3	0	0	2
Acoustic models	4	5	17	8	2	7
Vocoder	1	5	7	5	3	4
Individualization	1	2	4	3	3	1
Multilingual	1	1	0	3	0	5
Singing synthesis	5	3	5	2	2	3
Emotional style	2	2	1	3	2	6
Multimodal	4	3	2	5	3	3
Voice conversion	4	2	11	3	2	6
s2s	1	0	2	1	2	0
Other	2	0	4	12	3	6

chart 2 Histogram of speech synthesis papers distribution

For a list of speech synthesis articles, please visit http://yqli.tech/page/tts_paper.html

2022.06 Month's article

Two Speech recognition

The classification of speech recognition articles is described in Table 3 , Direction speech translation and multimodal There was no statistics in the previous few months , So don't put it in the chart first . chart 3 The total number of speech recognition articles is , This month 55 piece . The research direction of speech recognition is shown in table 4 Sum graph 4, Obviously , Unsupervised learning is still the most popular direction . in addition , Tal open source some data , Especially this open source 500 Hours of mixed Chinese and English data , See https://ai.100tal.com/dataset. Open source data summary is accessible http://yqli.tech/page/data.html.

Table 3 Speech recognition classification description

classification	explain
general	Including tradition 、 Hybrid speech recognition , And right asr The optimization of the
ctc	ctc Optimize
rnn-t	rnn-t The optimization of the
aed	aed Optimize
dataset	Open source database
data aug	Data augmentation
lm	Language model research
multilingual	Multi voice system and code-switch
personal	Less data, adaptive and personalized ASR
rescoring	Joint scoring of multiple models
unsupervised	Unsupervised or self supervised learning
accent ,dialect	Accent and dialect
other	Other research directions , Including system evaluation criteria and so on
robust	Robustness
speaker diarization	speaker diarization
multichannel	Multichannel
speech translation	Voice translation
multi-modal	Multimodal

chart 3 Statistics on the number of speech recognition articles （ Company ： piece ）

surface 4 Distribution of speech recognition research directions

	1 month	2 month	3 month	4 month	5 month	6 month
general	12	10	13	9	6	7
ctc	1	0	2	5	1	1
rnn-t	3	1	2	3	0	2
aed	1	1	1	1	0	1
dataset	3	0	3	2	1	4
data augmentation	1	1	1	2	2	0
lm	2	2	4	3	0	3
multilingual	2	1	2	1	2	2
personal \| adaptation	0	7	3	1	2	2
rescoring	1	1	2	0	0	2
unsupervised	2	3	17	19	7	9
accent	1	0	0	2	2	0
multichannel	0	4	1	1	0	0
robust	0	0	5	2	2	1
other	6	13	22	13	9	10
speaker diarization	0	3	4	5	2	2
speech translation	-	-	-	-	6	4
multimodal	-	-	-	-	3	5

chart 4 Histogram of speech recognition research direction

For a list of speech recognition articles, please visit http://yqli.tech/page/asr_paper.html

2022.06 Specific articles on speech recognition in January

原网站

版权声明
本文为[My name is Yongqiang]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/187/202207061148030644.html

当前位置：网站首页>Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022

Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022

边栏推荐

猜你喜欢

随机推荐