Speech Synthesis Model Cheat Sheet (1)
2022-08-03 01:32:00, Andy Dennis
Foreword
Voice is an increasingly popular field. Given a piece of text, we often want it read aloud, and that calls for speech synthesis, i.e. Text-to-Speech (TTS). Below are some interesting models I have come across.
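One-stage speech synthesis, which goes directly from text to waveform, is generally called end-to-end.
Two-stage speech synthesis usually splits the work as follows.
Stage 1: text -> mel spectrogram or linear spectrogram. (The training targets come from the recorded audio: waveform -(FFT)-> linear spectrogram -(mel filtering)-> mel spectrogram.)
Stage 2: generate the waveform (audio) from the mel or linear spectrogram; the model for this stage is usually called a vocoder.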
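To make the spectrogram step concrete, here is a minimal NumPy sketch of how stage-1 training targets are typically computed from audio: frame the waveform, apply a Hann window and an FFT to get a linear (magnitude) spectrogram, then project onto triangular mel filters and take the log. The function names, the 22050 Hz sample rate, FFT size 1024, hop length 256, 80 mel bands and the HTK-style mel formula are all assumptions chosen for illustration. Real pipelines normally use library routines with their own padding and normalization conventions.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; libraries also offer the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def stft_magnitude(wave, n_fft=1024, hop=256):
    """Linear (magnitude) spectrogram: Hann-windowed frames, one FFT per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop  # no padding in this sketch
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (n_fft // 2 + 1, n_frames)

def mel_filterbank(sr, n_fft, n_mels=80, fmin=0.0, fmax=None):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    if fmax is None:
        fmax = sr / 2
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=22050, n_fft=1024, hop=256, n_mels=80):
    linear = stft_magnitude(wave, n_fft, hop)         # waveform -> linear spectrogram
    mel = mel_filterbank(sr, n_fft, n_mels) @ linear  # linear -> mel spectrogram
    return np.log(np.clip(mel, 1e-5, None))           # log compression, floor avoids log(0)
```

For example, one second of a 440 Hz sine at 22050 Hz gives an (80, 83) log-mel matrix whose energy concentrates in the band centered near 440 Hz. A stage-1 acoustic model learns to predict matrices like this from text, and a stage-2 vocoder learns to turn them back into audio.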
Papers
VITS
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
ICML 2021
Paper: https://arxiv.org/abs/2106.06103
Code: https://github.com/jaywalnut310/vits

conditional VAE + normalizing flow + GAN
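As a rough illustration of the "conditional VAE" part: a posterior encoder predicts a mean and log standard deviation for the latent z, z is sampled with the reparameterization trick, and a KL term pulls the posterior toward a prior conditioned on the text. The NumPy sketch below shows only those two generic building blocks; the function names are hypothetical and this is not VITS's code. In VITS the prior is additionally transformed by a normalizing flow, so its KL term is evaluated on a sampled z rather than with this plain Gaussian-to-Gaussian closed form.

```python
import numpy as np

def reparameterize(mu, log_sigma, rng):
    """Sample z ~ N(mu, sigma^2) as mu + sigma * eps, so gradients could flow through mu and sigma."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(log_sigma) * eps

def kl_diag_gaussians(mu_q, log_sigma_q, mu_p, log_sigma_p):
    """Closed-form KL(q || p) between diagonal Gaussians, summed over dimensions."""
    var_q = np.exp(2 * log_sigma_q)
    var_p = np.exp(2 * log_sigma_p)
    kl = (log_sigma_p - log_sigma_q
          + (var_q + (mu_q - mu_p) ** 2) / (2 * var_p)
          - 0.5)
    return float(np.sum(kl))
```

The KL is zero when posterior and prior coincide and grows as their means or scales drift apart, which is what ties the audio-derived latent to the text-conditioned prior during training.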
For background on the flow part, two papers worth reading are VFlow and Flow++.
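The core idea of a normalizing flow is an invertible transform with a cheap Jacobian determinant. A common way to get that is an affine coupling layer (RealNVP-style). Below is a toy NumPy sketch. The class name is hypothetical, and the fixed random linear maps stand in for the neural networks a real flow would use. Half of the input passes through unchanged and parameterizes a scale and shift for the other half, so the inverse is exact and log|det J| is just the sum of the log-scales.

```python
import numpy as np

class AffineCoupling:
    """Toy affine coupling layer: y_a = x_a, y_b = x_b * exp(log_s(x_a)) + t(x_a)."""

    def __init__(self, dim, seed=0):
        assert dim % 2 == 0, "toy layer assumes an even dimension"
        rng = np.random.default_rng(seed)
        half = dim // 2
        # Hypothetical stand-in for a neural network: fixed random linear maps.
        self.w_scale = 0.1 * rng.standard_normal((half, half))
        self.w_shift = rng.standard_normal((half, half))

    def _params(self, xa):
        log_s = np.tanh(xa @ self.w_scale)  # bounded log-scale for numerical stability
        t = xa @ self.w_shift
        return log_s, t

    def forward(self, x):
        xa, xb = np.split(x, 2, axis=-1)
        log_s, t = self._params(xa)
        yb = xb * np.exp(log_s) + t
        log_det = np.sum(log_s, axis=-1)  # Jacobian is triangular: sum of log-scales
        return np.concatenate([xa, yb], axis=-1), log_det

    def inverse(self, y):
        ya, yb = np.split(y, 2, axis=-1)
        log_s, t = self._params(ya)  # ya == xa, so the same scale/shift are recovered
        xb = (yb - t) * np.exp(-log_s)
        return np.concatenate([ya, xb], axis=-1)
```

Stacking several such layers and alternating which half gets transformed gives a more expressive flow. The VITS paper describes its prior flow as volume-preserving (Jacobian determinant of one). In this sketch, that would correspond to fixing log_s at 0 and keeping only the shift.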
Two sets of paper notes I found on Zhihu:
More detailed: "Read the classic: VITS, a conditional variational autoencoder with adversarial learning for speech synthesis"
Shorter: [Paper Notes] VITS_OlaWod