EdiTTS: Score-based Editing for Controllable Text-to-Speech

Overview

EdiTTS: Score-based Editing for Controllable Text-to-Speech

Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech. Audio samples are available on our demo page.

Abstract

We present EdiTTS, an off-the-shelf speech editing methodology based on score-based generative modeling for text-to-speech synthesis. EdiTTS allows for targeted, granular editing of audio, both in terms of content and pitch, without the need for any additional training, task-specific optimization, or architectural modifications to the score-based model backbone. Specifically, we apply coarse yet deliberate perturbations in the Gaussian prior space to induce desired behavior from the diffusion model, while applying masks and softening kernels to ensure that iterative edits are applied only to the target region. Listening tests demonstrate that EdiTTS is capable of reliably generating natural-sounding audio that satisfies user-imposed requirements.

Citation

Please cite this work as follows.

@misc{tae&kim2021editts,
      title={EdiTTS: Score-based Editing for Controllable Text-to-Speech}, 
      author={Jaesung Tae and Hyeongju Kim and Taesu Kim},
      year={2021}
}

Setup

  1. Create a Python virtual environment (venv or conda) and install package requirements as specified in requirements.txt.

    python -m venv venv
    source venv/bin/activate
    pip install -U pip
    pip install -r requirements.txt
  2. Build the monotonic alignment module.

    cd model/monotonic_align
    python setup.py build_ext --inplace

For more information, refer to the official repository of Grad-TTS.

Checkpoints

The following checkpoints are already included as part of this repository, under checkpts.

Pitch Shifting

  1. Prepare an input file containing samples for speech generation. Mark the segment to be edited via a vertical bar separator, |. For instance, a single sample might look like

    In | the face of impediments confessedly discouraging |

    We provide a sample input file in resources/filelists/edit_pitch_example.txt.

  2. To run inference, type

    CUDA_VISIBLE_DEVICES=0 python edit_pitch.py \
        -f resources/filelists/edit_pitch_example.txt \
        -c checkpts/grad-tts-old.pt -t 1000 \
        -s out/pitch/wavs

    Adjust CUDA_VISIBLE_DEVICES as appropriate.

Content Replacement

  1. Prepare an input file containing pairs of sentences. Concatenate each pair with # and mark the parts to be replaced with a vertical bar separator. For instance, a single pair might look like

    Three others subsequently | identified | Oswald from a photograph. #Three others subsequently | recognized | Oswald from a photograph.

    We provide a sample input file in resources/filelists/edit_content_example.txt.

  2. To run inference, type

    CUDA_VISIBLE_DEVICES=0 python edit_content.py \
        -f resources/filelists/edit_content_example.txt \
        -c checkpts/grad-tts-old.pt -t 1000 \
        -s out/content/wavs

References

License

Released under the modified GNU General Public License.

Owner
Neosapience
Neosapience, an artificial being enabled by artificial intelligence, will soon be everywhere in our daily lives.
Neosapience
A Semi-Intelligent ChatBot filled with statistical and economical data for the Premier League.

MONEYBALL - ChatBot Module: 4006CEM, Class: B, Group: 5 Contributors: Jonas Djondo Roshan Kc Cole Samson Daniel Rodrigues Ihteshaam Naseer Kind remind

Jonas Djondo 1 Nov 18, 2021
Final Project Bootcamp Zero

The Quest (Pygame) Descripción Este es el repositorio de código The-Quest para el proyecto final Bootcamp Zero de KeepCoding. El juego consiste en la

Seven-z01 1 Mar 02, 2022
This simple Python program calculates a love score based on your and your crush's full names in English

This simple Python program calculates a love score based on your and your crush's full names in English. There is no logic or reason in the calculation behind the love score. The calculation could ha

p.katekomol 1 Jan 24, 2022
This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Abstractive Text Summarization for 1500+ Language Pairs".

CrossSum This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Abstractive Text Summ

BUET CSE NLP Group 29 Nov 19, 2022
A2T: Towards Improving Adversarial Training of NLP Models (EMNLP 2021 Findings)

A2T: Towards Improving Adversarial Training of NLP Models This is the source code for the EMNLP 2021 (Findings) paper "Towards Improving Adversarial T

QData 17 Oct 15, 2022
Python functions for summarizing and improving voice dictation input.

Helpmespeak Help me speak uses Python functions for summarizing and improving voice dictation input. Get started with OpenAI gpt-3 OpenAI is a amazing

Margarita Humanitarian Foundation 6 Dec 17, 2022
An A-SOUL Text Generator Based on CPM-Distill.

ASOUL-Generator-Backend 本项目为 https://asoul.infedg.xyz/ 的后端。 模型为基于 CPM-Distill 的 transformers 转化版本 CPM-Generate-distill 训练而成。

infinityedge 46 Dec 11, 2022
Implementation of TTS with combination of Tacotron2 and HiFi-GAN

Tacotron2-HiFiGAN-master Implementation of TTS with combination of Tacotron2 and HiFi-GAN for Mandarin TTS. Inference In order to inference, we need t

SunLu Z 7 Nov 11, 2022
pytorch implementation of Attention is all you need

A Pytorch Implementation of the Transformer: Attention Is All You Need Our implementation is largely based on Tensorflow implementation Requirements N

230 Dec 07, 2022
Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

Francis R. Willett 305 Dec 22, 2022
Simple and efficient RevNet-Library with DeepSpeed support

RevLib Simple and efficient RevNet-Library with DeepSpeed support Features Half the constant memory usage and faster than RevNet libraries Less memory

Lucas Nestler 112 Dec 05, 2022
Mlcode - Continuous ML API Integrations

mlcode Basic APIs for ML applications. Django REST Application Contains REST API

Sujith S 1 Jan 01, 2022
API for the GPT-J language model 🦜. Including a FastAPI backend and a streamlit frontend

gpt-j-api 🦜 An API to interact with the GPT-J language model. You can use and test the model in two different ways: Streamlit web app at http://api.v

Víctor Gallego 276 Dec 31, 2022
VD-BERT: A Unified Vision and Dialog Transformer with BERT

VD-BERT: A Unified Vision and Dialog Transformer with BERT PyTorch Code for the following paper at EMNLP2020: Title: VD-BERT: A Unified Vision and Dia

Salesforce 44 Nov 01, 2022
Implementation of paper Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa.

RoBERTaABSA This repo contains the code for NAACL 2021 paper titled Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoB

106 Nov 28, 2022
Creating an Audiobook (mp3 file) using a Ebook (epub) using BeautifulSoup and Google Text to Speech

epub2audiobook Creating an Audiobook (mp3 file) using a Ebook (epub) using BeautifulSoup and Google Text to Speech Input examples qual a pasta do seu

7 Aug 25, 2022
The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models

Graformer The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models Graformer (also named BridgeTransformer in t

22 Dec 14, 2022
Task-based datasets, preprocessing, and evaluation for sequence models.

SeqIO: Task-based datasets, preprocessing, and evaluation for sequence models. SeqIO is a library for processing sequential data to be fed into downst

Google 290 Dec 26, 2022
PyJPBoatRace: Python-based Japanese boatrace tools 🚤

pyjpboatrace :speedboat: provides you with useful tools for data analysis and auto-betting for boatrace.

5 Oct 29, 2022
ConvBERT: Improving BERT with Span-based Dynamic Convolution

ConvBERT Introduction In this repo, we introduce a new architecture ConvBERT for pre-training based language model. The code is tested on a V100 GPU.

YITUTech 237 Dec 10, 2022