Ukrainian TTS (text-to-speech) using Coqui TTS

Overview
Hugging Face Space configuration:

    title: Ukrainian TTS
    emoji: 🐸
    colorFrom: green
    colorTo: green
    sdk: gradio
    app_file: app.py
    pinned: false

Ukrainian TTS 📢 🤖

Ukrainian TTS (text-to-speech) using Coqui TTS.

Trained on the M-AILABS Ukrainian dataset using the "sumska" voice.

Link to online demo -> https://huggingface.co/spaces/robinhad/ukrainian-tts

Support

If you like my work, please support -> SUPPORT LINK

Example

test.mp4

How to use:

  1. Install the dependencies: pip install -r requirements.txt
  2. Download the model, config.json, and speakers.pth (needed for multi-speaker checkpoints) from the "Releases" tab.
  3. Launch as a one-time command:
tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --speaker_idx mykyta \
    --out_path folder/to/save/output.wav

(Multi-speaker checkpoints require a speaker, so pass one of the released voices, e.g. mykyta, olena, or dmytro; the exact flag name may differ between Coqui TTS versions.)

or alternatively launch a web server using:

tts-server --model_path path/to/model.pth.tar \
    --config_path path/to/config.json
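
For programmatic use, the following is a minimal Python sketch of the same one-time synthesis. It assumes Coqui TTS's Synthesizer API (constructor and method names can differ between TTS versions); the paths and the speaker name are placeholders.

from TTS.utils.synthesizer import Synthesizer

# Point the synthesizer at the downloaded checkpoint, config, and speakers file.
synthesizer = Synthesizer(
    tts_checkpoint="path/to/model.pth.tar",
    tts_config_path="path/to/config.json",
    tts_speakers_file="path/to/speakers.pth",  # required for multi-speaker checkpoints
)

# Synthesize and save; speaker_name must match one of the released voices.
wav = synthesizer.tts("Текст для синтезу", speaker_name="mykyta")
synthesizer.save_wav(wav, "output.wav")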

How to train:

  1. Refer to the "Nervous beginner guide" in the Coqui TTS docs.
  2. Instead of the provided config.json, use the one from this repo.
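
As a sanity check before training, the repo's config can be loaded programmatically. This is a hedged sketch assuming a Coqui TTS version that ships TTS.tts.configs.vits_config (the import path differs in older versions):

from TTS.tts.configs.vits_config import VitsConfig

# Load the config.json from this repo (not the one from the tutorial).
config = VitsConfig()
config.load_json("config.json")
print(config.audio.sample_rate)  # verify the audio settings match your dataset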

Attribution

Code for app.py taken from https://huggingface.co/spaces/julien-c/coqui

Comments
  • Error with file: speakers.pth

    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Soft/Python/mamba1/TTS/vits_mykyta_latest-September-12-2022_12+38AM-829e2c24/speakers.pth'

    opened by akirsoft 4
  • doc: fix examples in README

    Problem

    The one-time snippet does not work as is and complains that the speaker is not defined:

     > initialization of speaker-embedding layers.
     > Text: Перевірка мікрофона
     > Text splitted to sentences.
    ['Перевірка мікрофона']
    Traceback (most recent call last):
      File "/home/serg/.local/bin/tts", line 8, in <module>
        sys.exit(main())
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/bin/synthesize.py", line 350, in main
        wav = synthesizer.tts(
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/utils/synthesizer.py", line 228, in tts
        raise ValueError(
    ValueError:  [!] Look like you use a multi-speaker model. You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model.
    

    Also, speakers.pth should be downloaded.

    Fix

    Just a few documentation changes:

    • make instructions on what to download from Releases more precise
    • add --speaker_id argument with one of the speakers
    opened by seriar 2
  • One vowel words in the end of the sentence aren't stressed

    Input:

    
    Бобер на березі з бобренятами бублики пік.
    
    Боронила борона по боронованому полю.
    
    Ішов Прокіп, кипів окріп, прийшов Прокіп - кипить окріп, як при Прокопі, так і при Прокопі і при Прокопенятах.
    
    Сидить Прокоп — кипить окроп, Пішов Прокоп — кипить окроп. Як при Прокопові кипів окроп, Так і без Прокопа кипить окроп.
    

    Result:

    
    Боб+ер н+а березі з бобрен+ятами б+ублики пік.
    
    Борон+ила борон+а п+о борон+ованому п+олю.
    
    Іш+ов Пр+окіп, кип+ів окр+іп, прийш+ов Пр+окіп - кип+ить окр+іп, +як пр+и Пр+окопі, т+ак +і пр+и Пр+окопі +і пр+и Прокопенятах.
    
    Сид+ить Прок+оп — кип+ить окроп, Піш+ов Прок+оп — кип+ить окроп. +Як пр+и Пр+окопові кип+ів окроп, Т+ак +і б+ез Пр+окопа кип+ить окроп.
    opened by robinhad 0
  • Error import StressOption

    Traceback (most recent call last):
      File "/home/user/Soft/Python/mamba1/test.py", line 1, in <module>
        from ukrainian_tts.tts import TTS, Voices, StressOption
    ImportError: cannot import name 'StressOption' from 'ukrainian_tts.tts'

    (A possible fix is sketched after this comments list.)

    opened by akirsoft 0
  • Vits improvements

    vitsArgs = VitsArgs(
        # HiFi-GAN V3 generator settings for the VITS decoder (lighter than the default V1):
        resblock_type_decoder = '2',                 # ResBlock2, the smaller residual block variant
        upsample_rates_decoder = [8, 8, 4],          # total upsampling factor of 8*8*4 = 256
        upsample_kernel_sizes_decoder = [16, 16, 8],
        upsample_initial_channel_decoder = 256,      # narrower initial channel width
        resblock_kernel_sizes_decoder = [3, 5, 7],
        resblock_dilation_sizes_decoder = [[1, 2], [2, 6], [3, 12]],
    )
    
    opened by robinhad 0
  • Model improvement checklist

    • [x] Add Ukrainian accentor - https://github.com/egorsmkv/ukrainian-accentor
    • [ ] Fine-tune from existing checkpoint (e.g. VITS Ljspeech)
    • [ ] Try to increase fft_size, hop_length to match sample_rate accordingly
    • [ ] Include more dataset samples into model
    opened by robinhad 0
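
Regarding the StressOption import error above: newer versions of the ukrainian_tts package appear to expose the stress setting as Stress instead of StressOption. The sketch below is an assumption based on that rename; the TTS constructor and tts() signature may differ, so check the package's own README.

from ukrainian_tts.tts import TTS, Voices, Stress  # Stress replaces the old StressOption

tts = TTS(device="cpu")  # assumed constructor; a GPU device string may also work
with open("test.wav", mode="wb") as file:
    # assumed signature: text, voice, stress mode, and an output file handle;
    # returns the audio and the accented text
    _, accented_text = tts.tts("Перевірка мікрофона", Voices.Mykyta.value, Stress.Dictionary.value, file)
print(accented_text)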
Releases (v4.0.0)
  • v4.0.0 (Dec 10, 2022)

  • v3.0.0 (Sep 14, 2022)

    This is a release of the Ukrainian TTS model and checkpoint. The model is licensed under the GNU GPL v3. This release supports stress marking with a + sign before vowels. The model was trained for 280,000 steps by @robinhad. Kudos to @egorsmkv for providing the dataset for this model, to @proger for providing alignment scripts, and to @dchaplinsky for the Dmytro voice.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/190852232-34956a1d-77a9-42b9-b96d-39d0091e3e34.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/190852238-366782c1-9472-45fc-8fea-31346242f927.mp4

    Dmytro (male):

    https://user-images.githubusercontent.com/5759207/190852251-db105567-52ba-47b5-8ec6-5053c3baac8c.mp4

    Olha (female):

    https://user-images.githubusercontent.com/5759207/190852259-c6746172-05c4-4918-8286-a459c654eef1.mp4

    Lada (female):

    https://user-images.githubusercontent.com/5759207/190852270-7aed2db9-dc08-4a9f-8775-07b745657ca1.mp4

    Source code (tar.gz)
    Source code (zip)
    config.json (12.07 KB)
    model-inference.pth (329.95 MB)
    model.pth (989.97 MB)
    speakers.pth (495 bytes)
  • v2.0.0 (Jul 10, 2022)

    This is a release of the Ukrainian TTS model and checkpoint using a single voice (7 hours) from the Mykyta dataset. The model is licensed under the GNU GPL v3. This release supports stress marking with a + sign before vowels. The model was trained for 140,000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta and Olena datasets.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/178158485-29a5d496-7eeb-4938-8ea7-c345bc9fed57.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/178158492-8504080e-2f13-43f1-83f0-489b1f9cd66b.mp4

    Source code (tar.gz)
    Source code (zip)
    config.json (9.97 KB)
    model-inference.pth (329.95 MB)
    model.pth (989.72 MB)
    optimized.pth (329.95 MB)
    speakers.pth (431 bytes)
  • v2.0.0-beta (May 8, 2022)

    This is a beta release of the Ukrainian TTS model and checkpoint using a single voice (7 hours) from the Mykyta dataset. The model is licensed under the GNU GPL v3. This release supports stress marking with a + sign before vowels. The model was trained for 150,000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta dataset.

    Example:

    https://user-images.githubusercontent.com/5759207/167305810-2b023da7-0657-44ac-961f-5abf1aa6ea7d.mp4

    Source code (tar.gz)
    Source code (zip)
    config.json (8.85 KB)
    LICENSE (34.32 KB)
    model-inference.pth (317.15 MB)
    model.pth (951.32 MB)
    tts_output.wav (1.11 MB)
  • v1.0.0 (Jan 14, 2022)

  • v0.0.1 (Oct 14, 2021)
