Speech Synthesis Model Cheat Sheet (1)

2022-08-03 01:32:00 Andy Dennis

Foreword

Speech is an increasingly popular field. Given a piece of text, we want it read aloud. This requires speech synthesis, also called Text-to-Speech, or TTS for short. Here are some interesting models I have come across.

One-stage speech synthesis is generally called end-to-end.
Two-stage speech synthesis usually consists of:
stage 1: Text -(FFT)-> spectrogram -(filtering)-> mel spectrogram / linear spectrogram
stage 2: generate the waveform (audio) from the mel spectrogram / linear spectrogram
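
To make the "(filtering)" step more concrete, here is a minimal NumPy sketch of how a linear spectrogram is turned into a mel spectrogram. It works on an existing waveform, which is how spectrogram targets are typically computed. It is not the text-to-spectrogram model itself. The function names, the Hann window, and the parameter values (`n_fft=1024`, `hop=256`, `n_mels=80`, 22050 Hz) are assumptions chosen for illustration and do not come from the original post.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel formula (one of several conventions in use).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def linear_spectrogram(wave, n_fft=1024, hop=256):
    # Short-time Fourier transform magnitude: frame, window, FFT each frame.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft // 2 + 1, n_frames)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mel_spectrogram(wave, sr=22050, n_fft=1024, hop=256, n_mels=80):
    # "Filtering": project the linear spectrogram onto the mel filterbank.
    spec = linear_spectrogram(wave, n_fft, hop)
    return mel_filterbank(sr, n_fft, n_mels) @ spec

# Usage: one second of a 440 Hz sine wave as a stand-in for real speech.
sr = 22050
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)
spec = linear_spectrogram(wave)
mel = mel_spectrogram(wave, sr)
print(spec.shape, mel.shape)
```

In a two-stage system, a stage 2 vocoder would then map an array shaped like `mel` back to a waveform.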


Papers

VITS

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
ICML 2021
Paper: https://arxiv.org/abs/2106.06103
Code: https://github.com/jaywalnut310/vits

Conditional VAE + flow + GAN
For background on flows, see the two papers VFlow and Flow++.
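
For readers new to the "flow" component, the sketch below shows a generic affine coupling layer, a common building block of normalizing flows. It is an illustration only and is not taken from the VITS code. The class name `AffineCoupling`, the tiny untrained random-weight network, and the dimensions are all hypothetical. The point is that the transform is exactly invertible and its log-determinant is cheap to compute.

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """Generic affine coupling layer (hypothetical, untrained weights)."""

    def __init__(self, dim, hidden=16):
        self.half = dim // 2
        # Tiny two-layer network that predicts log-scale and shift
        # for the second half from the first half.
        self.w1 = rng.normal(scale=0.1, size=(self.half, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, 2 * (dim - self.half)))

    def _log_scale_and_shift(self, xa):
        h = np.tanh(xa @ self.w1) @ self.w2
        log_s, shift = np.split(h, 2, axis=-1)
        return np.tanh(log_s), shift  # bound the log-scale for stability

    def forward(self, x):
        xa, xb = x[..., : self.half], x[..., self.half :]
        log_s, shift = self._log_scale_and_shift(xa)
        yb = xb * np.exp(log_s) + shift
        # The Jacobian is triangular, so log|det J| is the sum of log-scales.
        log_det = log_s.sum(axis=-1)
        return np.concatenate([xa, yb], axis=-1), log_det

    def inverse(self, y):
        ya, yb = y[..., : self.half], y[..., self.half :]
        log_s, shift = self._log_scale_and_shift(ya)
        xb = (yb - shift) * np.exp(-log_s)
        return np.concatenate([ya, xb], axis=-1)

# Usage: the inverse recovers the input up to floating-point error.
layer = AffineCoupling(dim=4)
x = rng.normal(size=(5, 4))
y, log_det = layer.forward(x)
x_rec = layer.inverse(y)
print(np.allclose(x, x_rec), log_det.shape)
```

Because the first half of the input passes through unchanged, the inverse can recompute the same scale and shift. This is what makes the layer invertible without inverting the network itself.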

I found two sets of paper notes on Zhihu:
More detailed: "Read the classic: VITS, a conditional variational autoencoder with adversarial learning for speech synthesis"
Shorter: "[Paper Notes] VITS_OlaWod"

Copyright notice
This article was written by [Andy Dennis]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/215/202208022243041294.html