Speech Synthesis Model Cheat Sheet (1)

2022-08-03 01:32:00 【Andy Dennis】

Foreword

Speech is an increasingly popular field. Given a piece of text, we want it read aloud, which requires speech synthesis technology: Text-to-Speech, or TTS for short. Here are some interesting models I have come across.

One-stage speech synthesis is generally called end-to-end.
Two-stage speech synthesis usually consists of:
stage 1: Text -(FFT)-> spectrogram -(filtering)-> mel spectrogram / linear spectrogram
stage 2: generate the waveform (audio) from the mel spectrogram / linear spectrogram
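
To make the terms "linear spectrogram" and "mel spectrogram" concrete, here is a minimal NumPy sketch that computes both from a waveform. It only illustrates the FFT and filtering steps on the audio side; it is not a TTS model, and the function names, the frame sizes (`n_fft=512`, `hop=128`) and the 80 mel bands are assumptions chosen for illustration. The mel scale uses the common HTK-style formula.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; others exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def stft_magnitude(wave, n_fft=512, hop=128):
    """Linear spectrogram: magnitude of a Hann-windowed short-time FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Shape: (n_fft // 2 + 1, n_frames)
    return np.abs(np.fft.rfft(frames, axis=1)).T

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        up = (fft_freqs - left) / (center - left)
        down = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(up, down))
    return fb

# Illustrative input: one second of a 440 Hz sine at 16 kHz
sr, n_fft, hop, n_mels = 16000, 512, 128, 80
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)

linear = stft_magnitude(wave, n_fft=n_fft, hop=hop)  # linear spectrogram
mel = mel_filterbank(sr, n_fft, n_mels) @ linear     # mel spectrogram
```

In a two-stage system, a stage 1 model predicts a matrix shaped like `mel` or `linear` from text, and a stage 2 vocoder turns that matrix back into audio.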

Papers

VITS

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
ICML 2021
Paper: https://arxiv.org/abs/2106.06103
Code: https://github.com/jaywalnut310/vits

Conditional VAE + flow + GAN
For background on flows, see the two papers on v-flow and Flow++.
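
For readers new to flows, the sketch below shows a generic affine coupling layer, a standard building block of normalizing flows. It is a toy NumPy illustration of the idea (an invertible transform with a cheap log-determinant), not the flow used in the VITS code; the fixed random matrices `W_s` and `W_t` are hypothetical stand-ins for a learned network.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4
HALF = DIM // 2

# Hypothetical stand-in for a learned network: two fixed random linear maps
W_s = rng.normal(scale=0.1, size=(HALF, HALF))
W_t = rng.normal(scale=0.1, size=(HALF, HALF))

def scale_and_shift(x_a):
    """Compute log-scale and shift from the untouched half of the input."""
    log_s = np.tanh(x_a @ W_s)  # bounded log-scale keeps exp() stable
    t = x_a @ W_t
    return log_s, t

def forward(x):
    """x -> y. The first half passes through; the second half is rescaled and shifted."""
    x_a, x_b = x[:, :HALF], x[:, HALF:]
    log_s, t = scale_and_shift(x_a)
    y_b = x_b * np.exp(log_s) + t
    # The Jacobian is triangular, so log|det J| is just the sum of log-scales
    log_det = log_s.sum(axis=1)
    return np.concatenate([x_a, y_b], axis=1), log_det

def inverse(y):
    """y -> x. Exact inverse, since the first half is unchanged by forward()."""
    y_a, y_b = y[:, :HALF], y[:, HALF:]
    log_s, t = scale_and_shift(y_a)
    x_b = (y_b - t) * np.exp(-log_s)
    return np.concatenate([y_a, x_b], axis=1)

x = rng.normal(size=(3, DIM))
y, log_det = forward(x)
x_rec = inverse(y)  # recovers x up to floating-point error
```

Stacking several such layers, with the two halves swapped between layers, gives an expressive invertible mapping whose exact likelihood is still cheap to compute.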

I found two sets of paper notes on Zhihu:
More detailed: "Reading the Classics: VITS, a Conditional Variational Autoencoder with Adversarial Learning for Speech Synthesis"
Shorter: "[Paper Notes] VITS_OlaWod"

Copyright notice
This article was written by [Andy Dennis]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/215/202208022243041294.html
