Monthly report of speech synthesis (TTS) and speech recognition (ASR) papers in June 2022

2022-07-06 19:49:00 My name is Yongqiang

These paper statistics are updated once a month and mainly track developments in speech synthesis and speech recognition. Many papers are posted only after a conference, but this does not affect the statistics. Omissions are inevitable in the counting process, so the results are for reference only. For the full list of speech synthesis papers, see http://yqli.tech/page/tts_paper.html; for speech recognition papers, see http://yqli.tech/page/asr_paper.html. Open-source speech data can be looked up at http://yqli.tech/page/data.html.

For how to find speech-related information, see the article https://mp.weixin.qq.com/s/eJcpsfs3OuhrccJ7_BvKOg. Readers with suggestions can message me directly, and I will keep revising the statistics. If you repost this, please credit the source. You are welcome to follow the WeChat official account: Keep a low profile.


1. Speech synthesis

Table 1 describes the classification scheme. In June 2022 there were 43 speech synthesis papers, double the May count (see Figure 1) but fewer than the 54 recorded in 2021. Table 2 and Figure 2 show how the papers are distributed across specific directions. This month, most papers fall under acoustic models, emotional TTS, voice conversion, and multilingual and multi-speaker synthesis.

Table 1  Speech synthesis classification description

Category                         Description
Front end                        Polyphone disambiguation, prosody, G2P, etc.
Acoustic models                  Mapping from linguistic features to acoustic features, attention work, multi-speaker and dual learning
Vocoder                          Waveform generation
Individualization                Low-data and noisy-data scenarios, and other adaptation methods
Multilingual and multi-speaker   Multilingual models, multi-speaker models
Singing synthesis                Combining singing and music
Emotional                        Style and emotion
Multimodal                       Mainly talking-head papers
Voice conversion                 GAN-based approaches and feature disentanglement approaches
S2S                              Speech-to-speech
Other                            EEG-based synthesis, open-source data, MOS evaluation, and applications of speech synthesis

Figure 1  Total number of speech synthesis papers

Table 2  Distribution of speech synthesis papers

Category            Jan  Feb  Mar  Apr  May  Jun
Front end             2    0    3    0    0    2
Acoustic models     4517827
Vocoder               1    5    7    5    3    4
Individualization     1    2    4    3    3    1
Multilingual          1    1    0    3    0    5
Singing synthesis     5    3    5    2    2    3
Emotional style       2    2    1    3    2    6
Multimodal            4    3    2    5    3    3
Voice conversion    4211326
S2S                   1    0    2    1    2    0
Other               2041236

Figure 2  Histogram of the distribution of speech synthesis papers
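As a rough consistency check on the June total, the single-digit rows of the distribution table can be summed and compared with the 43 papers reported in the text. The sketch below is illustrative only: the names `TTS_COUNTS` and `column_total` are hypothetical, and leaving out the three rows that contain a two-digit count (acoustic models, voice conversion, other) is an assumption made for this example, since their column boundaries are not recoverable from the text.

```python
# Monthly counts (Jan..Jun 2022) copied from the speech synthesis distribution
# table, restricted to rows whose six values are all single digits.
# Assumption: the three rows containing a two-digit count are left out.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
TTS_COUNTS = {
    "front end": [2, 0, 3, 0, 0, 2],
    "vocoder": [1, 5, 7, 5, 3, 4],
    "individualization": [1, 2, 4, 3, 3, 1],
    "multilingual": [1, 1, 0, 3, 0, 5],
    "singing synthesis": [5, 3, 5, 2, 2, 3],
    "emotional style": [2, 2, 1, 3, 2, 6],
    "multimodal": [4, 3, 2, 5, 3, 3],
    "s2s": [1, 0, 2, 1, 2, 0],
}
REPORTED_JUNE_TOTAL = 43  # total stated in the text for June 2022


def column_total(counts, month):
    """Sum one month's column across all categories in `counts`."""
    idx = MONTHS.index(month)
    return sum(row[idx] for row in counts.values())


june_partial = column_total(TTS_COUNTS, "Jun")
# Papers that must belong to the three omitted rows
# (acoustic models, voice conversion, other).
june_remainder = REPORTED_JUNE_TOTAL - june_partial

print(f"June, listed rows:  {june_partial}")
print(f"June, omitted rows: {june_remainder}")
```

With these values, the listed rows account for 24 of June's 43 papers, which leaves 19 for acoustic models, voice conversion, and other.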

For a list of speech synthesis articles, please visit http://yqli.tech/page/tts_paper.html

Speech synthesis papers for June 2022

2. Speech recognition

The classification of speech recognition papers is described in Table 3. The speech translation and multimodal directions were not tracked in earlier months, so they are left out of the charts for now. Figure 3 shows the total number of speech recognition papers; there were 55 this month. The distribution across research directions is shown in Table 4 and Figure 4. Unsupervised learning is still the most popular direction. In addition, TAL has open-sourced some data, most notably 500 hours of mixed Chinese-English speech; see https://ai.100tal.com/dataset. A summary of open-source data is available at http://yqli.tech/page/data.html.

Table 3  Speech recognition classification description

Category              Description
general               Traditional and hybrid speech recognition, and optimizations of ASR
ctc                   CTC optimization
rnn-t                 RNN-T optimization
aed                   AED optimization
dataset               Open-source datasets
data aug              Data augmentation
lm                    Language model research
multilingual          Multilingual systems and code-switching
personal              Low-data, adaptive, and personalized ASR
rescoring             Joint scoring with multiple models
unsupervised          Unsupervised or self-supervised learning
accent, dialect       Accent and dialect
other                 Other research directions, including system evaluation criteria
robust                Robustness
speaker diarization   Speaker diarization
multichannel          Multichannel
speech translation    Speech translation
multi-modal           Multimodal

Figure 3  Number of speech recognition papers (unit: papers)

Table 4  Distribution of speech recognition research directions

Category                Jan  Feb  Mar  Apr  May  Jun
general                 121013967
ctc                       1    0    2    5    1    1
rnn-t                     3    1    2    3    0    2
aed                       1    1    1    1    0    1
dataset                   3    0    3    2    1    4
data augmentation         1    1    1    2    2    0
lm                        2    2    4    3    0    3
multilingual              2    1    2    1    2    2
personal | adaptation     0    7    3    1    2    2
rescoring                 1    1    2    0    0    2
unsupervised            23171979
accent                    1    0    0    2    2    0
multichannel              0    4    1    1    0    0
robust                    0    0    5    2    2    1
other                   6132213910
speaker diarization       0    3    4    5    2    2
speech translation        -    -    -    -    6    4
multimodal                -    -    -    -    3    5

Figure 4  Histogram of speech recognition research directions
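To see which directions moved between May and June, the single-digit rows of the distribution table can be compared month to month. The sketch below is an illustration rather than part of the original report: the names `ASR_COUNTS` and `month_delta` are hypothetical, the rows general, unsupervised, and other are omitted on the assumption that their two-digit counts cannot be split reliably, and months in which a direction was not tracked are represented as `None`.

```python
# Monthly counts (Jan..Jun 2022) copied from the speech recognition
# distribution table. Only rows with one character per month are included;
# None marks a month in which a direction was not yet tracked ("-").
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
ASR_COUNTS = {
    "ctc": [1, 0, 2, 5, 1, 1],
    "rnn-t": [3, 1, 2, 3, 0, 2],
    "aed": [1, 1, 1, 1, 0, 1],
    "dataset": [3, 0, 3, 2, 1, 4],
    "data augmentation": [1, 1, 1, 2, 2, 0],
    "lm": [2, 2, 4, 3, 0, 3],
    "multilingual": [2, 1, 2, 1, 2, 2],
    "personal | adaptation": [0, 7, 3, 1, 2, 2],
    "rescoring": [1, 1, 2, 0, 0, 2],
    "accent": [1, 0, 0, 2, 2, 0],
    "multichannel": [0, 4, 1, 1, 0, 0],
    "robust": [0, 0, 5, 2, 2, 1],
    "speaker diarization": [0, 3, 4, 5, 2, 2],
    "speech translation": [None, None, None, None, 6, 4],
    "multimodal": [None, None, None, None, 3, 5],
}


def month_delta(counts, earlier, later):
    """Return {direction: later - earlier} for directions tracked in both months."""
    i, j = MONTHS.index(earlier), MONTHS.index(later)
    return {
        name: row[j] - row[i]
        for name, row in counts.items()
        if row[i] is not None and row[j] is not None
    }


deltas = month_delta(ASR_COUNTS, "May", "Jun")
# Sort from largest increase to largest decrease for a quick overview.
for name, change in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {change:+d}")
```

Under these assumptions, dataset and lm show the largest May-to-June increase among the included rows (+3 each).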

For a list of speech recognition articles, please visit http://yqli.tech/page/asr_paper.html

Speech recognition papers for June 2022

Copyright notice
This article was written by [My name is Yongqiang]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/187/202207061148030644.html