Speech Synthesis Model Cheat Sheet (1)
2022-08-03 01:32:00 【Andy Dennis】
Foreword
Voice is also an increasingly popular industry. Given a piece of text, we want it read aloud. This requires speech synthesis, that is, Text-to-Speech, or TTS for short. Here are some interesting models I have come across.
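One-stage speech synthesis is generally called end-to-end: text goes directly to a waveform.
Two-stage speech synthesis splits the work into two steps:
Stage 1: text -> mel spectrogram / linear spectrogram. On the audio side these features come from: waveform -(FFT)-> spectrogram -(mel filtering)-> mel spectrogram; the unfiltered spectrogram is the linear spectrogram.
Stage 2: generate the waveform (audio) from the mel spectrogram / linear spectrogram.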
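The FFT and filtering steps named above, as they apply to one frame of audio when computing a mel spectrogram, can be sketched as follows. This is a minimal illustration using only the Python standard library, not the feature extraction of any particular model: the function names, the HTK-style mel formula, the Hann window, and the parameters (8 kHz sample rate, 256-sample frame, 20 mel bands) are all assumptions chosen for the example.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert Hz to mel (HTK-style formula, assumed for this sketch)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT of one frame; returns the power in bins 0..N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    # n_mels + 2 edge points: filter m uses edges (m-1, m, m+1).
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))    # rising slope
            elif center < f < right:
                row.append((right - f) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_frame(frame, n_mels, sample_rate):
    """Window -> DFT power spectrum -> mel filtering, for a single frame."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]  # Hann
    spec = power_spectrum([x * w for x, w in zip(frame, window)])
    bank = mel_filterbank(n_mels, n, sample_rate)
    return [sum(w * p for w, p in zip(row, spec)) for row in bank]


# Example input: one frame of a 1 kHz sine wave.
sample_rate = 8000
n_fft = 256
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n_fft)]
mel = mel_frame(frame, n_mels=20, sample_rate=sample_rate)
peak_band = max(range(len(mel)), key=lambda i: mel[i])
print(len(mel), peak_band)
```

A full mel spectrogram repeats this over overlapping frames and usually takes the logarithm of each band energy. In practice one would use an FFT library rather than the naive DFT shown here.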
Papers
VITS
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
ICML 2021
Paper: https://arxiv.org/abs/2106.06103
Code: https://github.com/jaywalnut310/vits
Conditional VAE + flow + GAN
For background on flows, see the two papers VFlow and Flow++.
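As a rough illustration of what a "flow" is, the sketch below implements one generic affine coupling layer, a common building block of normalizing flows. It is not the VITS architecture: the conditioner functions `toy_shift` and `toy_log_scale` are hypothetical stand-ins for the neural networks a real model would use. The point is only that the transform is exactly invertible and has a cheap log-determinant.

```python
import math


def coupling_forward(x, shift_fn, log_scale_fn):
    """Affine coupling: the first half passes through unchanged and
    parameterizes an elementwise affine map of the second half."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    shift, log_scale = shift_fn(x_a), log_scale_fn(x_a)
    y_b = [b * math.exp(s) + t for b, s, t in zip(x_b, log_scale, shift)]
    log_det = sum(log_scale)  # log |det Jacobian| of the transform
    return x_a + y_b, log_det


def coupling_inverse(y, shift_fn, log_scale_fn):
    """Exact inverse: y_a equals x_a, so the same shift and scale can be recomputed."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    shift, log_scale = shift_fn(y_a), log_scale_fn(y_a)
    x_b = [(b - t) * math.exp(-s) for b, s, t in zip(y_b, log_scale, shift)]
    return y_a + x_b


# Hypothetical toy conditioners (a real flow would use neural networks).
def toy_shift(a):
    return [math.tanh(v) for v in a]


def toy_log_scale(a):
    return [0.5 * math.sin(v) for v in a]


x = [0.3, -1.2, 2.0, 0.5]  # even length so both halves match
y, log_det = coupling_forward(x, toy_shift, toy_log_scale)
x_rec = coupling_inverse(y, toy_shift, toy_log_scale)
print(max(abs(a - b) for a, b in zip(x, x_rec)))  # round-trip error
```

Stacking several such layers, swapping which half is transformed each time, gives an invertible map whose density can be evaluated exactly through the summed log-determinants.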
I found two sets of paper notes on Zhihu:
More detailed: "Reading the Classics: VITS, a Conditional Variational Autoencoder with Adversarial Learning for Speech Synthesis"
Short: [Paper Notes] VITS_OlaWod