Silero Models: pre-trained speech-to-text, text-to-speech models and benchmarks made embarrassingly simple

Overview

License: CC BY-NC 4.0

Silero Models

Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks.

Enterprise-grade STT made refreshingly simple (seriously, see benchmarks). We provide quality comparable to Google's STT (and sometimes even better) and we are not Google.

As a bonus:

  • No Kaldi;
  • No compilation;
  • No 20-step instructions;

Also we have published TTS models that satisfy the following criteria:

  • One-line usage;
  • A large library of voices;
  • A fully end-to-end pipeline;
  • Natural-sounding speech;
  • No GPU or training required;
  • Minimalism and lack of dependencies;
  • Faster than real-time on one CPU thread (!!!);
  • Support for 16kHz and 8kHz out of the box;
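
"Faster than real-time" is commonly expressed as a real-time factor (RTF): processing time divided by the duration of the audio. The sketch below only illustrates that ratio. The helper name `real_time_factor` and the numbers are made up for illustration and are not benchmark results.

```python
def real_time_factor(processing_seconds, num_samples, sample_rate):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    audio_seconds = num_samples / sample_rate
    return processing_seconds / audio_seconds


# Illustrative numbers only: 2 seconds of 16 kHz audio produced in 0.5 s of compute.
print(real_time_factor(0.5, 32000, 16000))  # 0.25
```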

Speech-To-Text

All of the provided models are listed in the models.yml file. Any meta-data and newer versions will be added there.
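
The examples in this README resolve a model through the nesting stt_models → language → version → flavour. The following is a minimal sketch of that lookup on a plain dictionary. The `resolve_model_url` helper and the example.com URLs are hypothetical, and the real models.yml may contain additional keys.

```python
# Hypothetical excerpt mirroring the nesting of models.yml
# (stt_models -> language -> version -> flavour). URLs are placeholders.
MODELS = {
    "stt_models": {
        "en": {"latest": {"jit": "https://example.com/en_v5.jit",
                          "onnx": "https://example.com/en_v5.onnx"}},
        "de": {"latest": {"jit": "https://example.com/de_v4.jit"}},
    }
}


def resolve_model_url(models, language, version="latest", flavour="jit"):
    """Return the URL for a language/version/flavour, or raise a clear error."""
    stt = models["stt_models"]
    if language not in stt:
        raise KeyError(f"unknown language {language!r}; available: {sorted(stt)}")
    versions = stt[language]
    if version not in versions:
        raise KeyError(f"unknown version {version!r} for {language!r}")
    flavours = versions[version]
    if flavour not in flavours:
        raise KeyError(f"{language}/{version} has no {flavour!r} flavour; "
                       f"available: {sorted(flavours)}")
    return flavours[flavour]


print(resolve_model_url(MODELS, "en", flavour="onnx"))
```

Failing early with the list of available keys makes a missing flavour (for example, a language without an ONNX export) easier to diagnose than a bare attribute error.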

Currently we provide the following checkpoints:

PyTorch ONNX Quantization Quality Colab
English (en_v5) ✔️ ✔️ ✔️ link Open In Colab
German (de_v4) ✔️ ✔️ link Open In Colab
English (en_v3) ✔️ ✔️ ✔️ link Open In Colab
German (de_v3) ✔️ link Open In Colab
German (de_v1) ✔️ ✔️ link Open In Colab
Spanish (es_v1) ✔️ ✔️ link Open In Colab
Ukrainian (ua_v3) ✔️ ✔️ ✔️ N/A Open In Colab

Model flavours:

jit jit jit jit jit_q jit_q onnx onnx onnx onnx
xsmall small large xlarge xsmall small xsmall small large xlarge
English en_v5 ✔️ ✔️ ✔️ ✔️ ✔️
English en_v4_0 ✔️ ✔️
English en_v3 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
German de_v4 ✔️ ✔️
German de_v3 ✔️
German de_v1 ✔️ ✔️
Spanish es_v1 ✔️ ✔️
Ukrainian ua_v3 ✔️ ✔️ ✔️

Dependencies

  • All examples:
    • torch, 1.8+ (also used to clone the repo in the TensorFlow and ONNX examples); there are breaking changes for versions older than 1.6
    • torchaudio, the latest version bound to your PyTorch version should work
    • omegaconf, the latest version should work
  • Additional for ONNX examples:
    • onnx, the latest version should work
    • onnxruntime, the latest version should work
  • Additional for TensorFlow examples:
    • tensorflow, the latest version should work
    • tensorflow_hub, the latest version should work

Please see the provided Colab for details for each example below. All examples are maintained to work with the latest major packaged versions of the installed libraries.
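
Before running the examples, it can help to confirm which of the listed packages are installed. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The `installed_versions` helper is a hypothetical name introduced for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from the dependency list above.
for name, version in installed_versions(["torch", "torchaudio", "omegaconf"]).items():
    print(f"{name}: {version or 'not installed'}")
```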

PyTorch

import torch
import torchaudio
from glob import glob

device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                       model='silero_stt',
                                       language='en', # also available 'de', 'es'
                                       device=device)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils  # see function signature for details

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav',
                               dst='speech_orig.wav', progress=True)
test_files = glob('speech_orig.wav')
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]),
                            device=device)

output = model(input)
for example in output:
    print(decoder(example.cpu()))
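
For intuition, the batching step can be pictured as slicing the file list into fixed-size chunks. The function below is an illustrative stand-in written for this note, not the code shipped in the repository's utils; the real `split_into_batches` may behave differently.

```python
def chunk_into_batches(items, batch_size=10):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


files = [f"audio_{i}.wav" for i in range(23)]
batches = chunk_into_batches(files, batch_size=10)
print([len(batch) for batch in batches])  # [10, 10, 3]
```

In the example above only `batches[0]` is passed to the model, so with more than `batch_size` files you would loop over all batches.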

ONNX

You can run our models anywhere you can import an ONNX model or run ONNX Runtime.

import onnx
import torch
import onnxruntime
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual ONNX model
torch.hub.download_url_to_file(models.stt_models[language].latest.onnx, 'model.onnx', progress=True)  # use the selected language
onnx_model = onnx.load('model.onnx')
onnx.checker.check_model(onnx_model)
ort_session = onnxruntime.InferenceSession('model.onnx')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# actual onnx inference and decoding
onnx_input = input.detach().cpu().numpy()
ort_inputs = {'input': onnx_input}
ort_outs = ort_session.run(None, ort_inputs)
decoded = decoder(torch.Tensor(ort_outs[0])[0])
print(decoded)

TensorFlow

SavedModel example

import torch
import subprocess
import tensorflow as tf
import tensorflow_hub as tf_hub
from omegaconf import OmegaConf

language = 'en' # also available 'de', 'es'

# load provided utils using torch.hub for brevity
_, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_stt', language=language)
(read_batch, split_into_batches,
 read_audio, prepare_model_input) = utils

# see available models
torch.hub.download_url_to_file('https://raw.githubusercontent.com/snakers4/silero-models/master/models.yml', 'models.yml')
models = OmegaConf.load('models.yml')
available_languages = list(models.stt_models.keys())
assert language in available_languages

# load the actual tf model
torch.hub.download_url_to_file(models.stt_models[language].latest.tf, 'tf_model.tar.gz')  # use the selected language
subprocess.run('rm -rf tf_model && mkdir tf_model && tar xzfv tf_model.tar.gz -C tf_model',  shell=True, check=True)
tf_model = tf.saved_model.load('tf_model')

# download a single file, any format compatible with TorchAudio
torch.hub.download_url_to_file('https://opus-codec.org/static/examples/samples/speech_orig.wav', dst='speech_orig.wav', progress=True)
test_files = ['speech_orig.wav']
batches = split_into_batches(test_files, batch_size=10)
input = prepare_model_input(read_batch(batches[0]))

# tf inference
res = tf_model.signatures["serving_default"](tf.constant(input.numpy()))['output_0']
print(decoder(torch.Tensor(res.numpy())[0]))

Text-To-Speech

Models and Speakers

All of the provided models are listed in the models.yml file. Any meta-data and newer versions will be added there.

Currently we provide the following speakers:

Speaker Auto-stress Language SR Colab
aidar_v2 yes ru (Russian) 8000, 16000 Open In Colab
baya_v2 yes ru (Russian) 8000, 16000 Open In Colab
irina_v2 yes ru (Russian) 8000, 16000 Open In Colab
kseniya_v2 yes ru (Russian) 8000, 16000 Open In Colab
natasha_v2 yes ru (Russian) 8000, 16000 Open In Colab
ruslan_v2 yes ru (Russian) 8000, 16000 Open In Colab
lj_v2 no en (English) 8000, 16000 Open In Colab
thorsten_v2 no de (German) 8000, 16000 Open In Colab
tux_v2 no es (Spanish) 8000, 16000 Open In Colab
gilles_v2 no fr (French) 8000, 16000 Open In Colab
multi_v2 no ru, en, de, es, fr, tt 8000, 16000 Open In Colab
aigul_v2 no ba (Bashkir) 8000, 16000 Open In Colab
erdni_v2 no xal (Kalmyk) 8000, 16000 Open In Colab
dilyara_v2 no tt (Tatar) 8000, 16000 Open In Colab
dilnavoz_v2 no uz (Uzbek) 8000, 16000 Open In Colab

(!!!) In multi_v2, all speakers can speak all of the languages (with varying levels of fidelity).

Dependencies

Basic dependencies for colab examples:

  • torch, 1.9+;
  • torchaudio, the latest version bound to your PyTorch version should work (needed only because the TTS models are hosted together with the STT models; TTS itself does not require it);
  • omegaconf, latest (can be removed as well, if you do not load all of the configs);

PyTorch

import torch

language = 'ru'
speaker = 'kseniya_v2'
sample_rate = 16000
device = torch.device('cpu')

model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                     model='silero_tts',
                                     language=language,
                                     speaker=speaker)
model.to(device)  # gpu or cpu

audio = model.apply_tts(texts=[example_text],
                        sample_rate=sample_rate)
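
If you need the synthesized audio on disk, one option is to write it as 16-bit PCM with the standard library. The sketch below assumes the samples are available as a flat sequence of floats in [-1, 1]; how to obtain that from the value returned by `apply_tts` is an assumption not covered here. `write_wav` is a hypothetical helper, and a generated sine tone stands in for model output.

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


# Stand-in for model output: one second of a 440 Hz tone.
sample_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
write_wav("tone.wav", tone, sample_rate)
```

The sample rate written to the file header should match the `sample_rate` used for synthesis (8000 or 16000 for the speakers listed above); otherwise playback speed and pitch will be wrong.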

Standalone Use

  • Standalone usage requires only PyTorch 1.9+ and the Python standard library;
  • Please see the detailed examples in Colab;

import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v2_kseniya.pt',
                                   local_file)  

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

example_batch = ['В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.',
                 'Котики - это жидкость!',
                 'М+ама М+илу м+ыла с м+ылом.']
sample_rate = 16000

audio_paths = model.save_wav(texts=example_batch,
                             sample_rate=sample_rate)

FAQ

Wiki

Also check out our wiki.

Performance and Quality

Please refer to the relevant wiki sections.

Adding new Languages

Please refer here.

Contact

Get in Touch

Try our models, create an issue, join our chat, email us, read our news.

Commercial Inquiries

Please see our wiki and tiers for relevant information and email us.

Citations

@misc{SileroModels,
  author = {Silero Team},
  title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-models}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Further reading

English

  • STT:

    • Towards an Imagenet Moment For Speech-To-Text - link
    • A Speech-To-Text Practitioners Criticisms of Industry and Academia - link
    • Modern Google-level STT Models Released - link
  • TTS:

    • High-Quality Text-to-Speech Made Accessible, Simple and Fast - link
  • VAD:

    • Modern Portable Voice Activity Detector Released - link

Chinese

  • STT:
    • 迈向语音识别领域的 ImageNet 时刻 - link
    • 语音领域学术界和工业界的七宗罪 - link

Russian

  • STT:

    • Мы опубликовали современные STT модели сравнимые по качеству с Google - link
    • Понижаем барьеры на вход в распознавание речи - link
    • Огромный открытый датасет русской речи версия 1.0 - link
    • Насколько Быстрой Можно Сделать Систему STT? - link
    • Наша система Speech-To-Text - link
    • Speech To Text - link
  • TTS:

    • Мы Опубликовали Качественный, Простой, Доступный и Быстрый Синтез Речи - link
  • VAD:

    • Мы опубликовали современный Voice Activity Detector и не только - link

Donations

Please use the "sponsor" button.

Comments
  • Feature request - Adding Proper TF 2.0 Checkpoints (not onnx-tensorflow) + Batching + TF JS

    Hello, guys! Your models are brilliant and I want to use them in my project via TensorFlow Serving, but that does not work without batching. Could you please save the models with batching? Thank you!

    enhancement help wanted 
    opened by aleks73337 28
  • README's Standalone Use misses to mention NumPy

    Currently, https://github.com/snakers4/silero-models#standalone-use states that:

    • Standalone usage just requires PyTorch 1.10+ and python standard library;

    but I had to install NumPy as well to make the example work.

    bug 
    opened by ghost 23
  • Bug report - RuntimeError: Unknown qengine

    Hello. Great project! I would like to test a standard example, but at the line

    model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")

    I get an error:

    \lib\site-packages\torch\jit_script.py", line 351, in unpackage_script_module
        cpp_module = torch._C._import_ir_module_from_package(
    RuntimeError: Unknown qengine

    Python 10.4, Torch 11.0, device='cpu', Windows 10
    Model: torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/ru_v3.pt', local_file)

    Could you please tell me how to fix it?

    bug 
    opened by lik2129 14
  • Bug report -problem loading STT model on Windows

    Hi, I decided to try silero_models. I did everything as described in the docs, but I get an error. How can I fix it?

    code:

    import torch
    import zipfile
    import torchaudio
    from glob import glob
    
    device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
    model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                           model='silero_stt',
                                           language='en', # also available 'de', 'es'
                                           device=device)
    

    Error:

    RuntimeError                              Traceback (most recent call last)
    C:\Users\E786~1\AppData\Local\Temp/ipykernel_9444/3004546653.py in
          1 device = torch.device('cpu')  # gpu also works, but our models are fast enough for CPU
    ----> 2 model, decoder, utils = torch.hub.load(repo_or_dir='snakers4/silero-models',
          3                                        model='silero_stt',
          4                                        language='en', # also available 'de', 'es'
          5                                        device=device)

    c:\PY\asistent.venv\lib\site-packages\torch\hub.py in load(repo_or_dir, model, source, force_reload, verbose, skip_validation, *args, **kwargs)
        397     repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
        398
    --> 399     model = _load_local(repo_or_dir, model, *args, **kwargs)
        400     return model
        401

    c:\PY\asistent.venv\lib\site-packages\torch\hub.py in _load_local(hubconf_dir, model, *args, **kwargs)
        426
        427     entry = _load_entry_from_hubconf(hub_module, model)
    --> 428     model = entry(*args, **kwargs)
        429
        430     sys.path.remove(hubconf_dir)

    ~/.cache\torch\hub\snakers4_silero-models_master\hubconf.py in silero_stt(language, version, jit_model, **kwargs)
         32     assert language in available_languages
         33
    ---> 34     model, decoder = init_jit_model(model_url=models.stt_models.get(language).get(version).get(jit_model),
         35                                     **kwargs)
         36     utils = (read_batch,

    ~/.cache\torch\hub\snakers4_silero-models_master\utils.py in init_jit_model(model_url, device)
        128                                        progress=True)
        129
    --> 130     model = torch.jit.load(model_path, map_location=device)
        131     model.eval()
        132     return model, Decoder(model.labels)

    c:\PY\asistent.venv\lib\site-packages\torch\jit_serialization.py in load(f, map_location, _extra_files)
        159     cu = torch._C.CompilationUnit()
        160     if isinstance(f, str) or isinstance(f, pathlib.Path):
    --> 161         cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
        162     else:
        163         cpp_module = torch._C.import_ir_module_from_buffer(

    RuntimeError: open file failed because of errno 2 on fopen: No such file or directory, file path: C:\Users\Дом/.cache\torch\hub\snakers4_silero-models_master\model\en_v5.jit

    bug 
    opened by lev007-ops 13
  • Feature request - SAPI5

    SAPI5 compatibility

    🚀 Feature

    Motivation

    Mostly enough for screen readers (Windows). But this interface is for integration by its nature. Ready to help!

    enhancement 
    opened by studennikov-serg 11
  • How to obtain an intermediate layer output?

    How do we obtain the output of an intermediate layer of the pre-trained model? For example, the output at the end of the convolution encoder, or the output just after the transformer encoder layers.

    help wanted 
    opened by prajwalkr 11
  • Feature request - Expressiveness

    🚀 Feature

    Right now, in French TTS, there is no decay at the end of a sentence. So if you have two sentences, the prosody is wrong and painful to hear. Each sentence by itself is almost perfect, but at the end of a sentence the pitch should decrease, the rate should also decrease, and a short pause is required before starting a new sentence.

    Motivation

    This is useful as soon as you have more than one sentence to synthesize. Otherwise, the current, excellent quality of the TTS engine is wasted, since no human speaks continuously across sentences.

    enhancement 
    opened by X-Ryl669 9
  • Errors running example.ipynb locally or in Colab (PyTorch 1.10 issues)

    Hi,

    I am unable to run example.ipynb notebook locally (on CPU machine) or any of the Google Colab notebooks (either on CPU or GPU runtime).

    Following error occurs for example.ipynb notebook:

    model_url = model_conf.get('package')
    
    model_dir = "downloaded_model"
    os.makedirs(model_dir, exist_ok=True)
    model_path = os.path.join(model_dir, os.path.basename(model_url))
    
    if not os.path.isfile(model_path):
        torch.hub.download_url_to_file(model_url,
                                       model_path,
                                       progress=True)
    
    imp = package.PackageImporter(model_path)
    model = imp.load_pickle("te_model", "model")
    example_texts = model.examples
    
    def apply_te(text, lan='en'):
        return model.enhance_text(text, lan)
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    /tmp/ipykernel_2498123/2005539933.py in <module>
         10                                    progress=True)
         11 
    ---> 12 imp = package.PackageImporter(model_path)
         13 model = imp.load_pickle("te_model", "model")
         14 example_texts = model.examples
    
    ~/miniconda3/lib/python3.8/site-packages/torch/package/importer.py in __init__(self, file_or_buffer, module_allowed)
         59             self.filename = str(file_or_buffer)
         60             if not os.path.isdir(self.filename):
    ---> 61                 self.zip_reader = torch._C.PyTorchFileReader(self.filename)
         62             else:
         63                 self.zip_reader = MockZipReader(self.filename)
    
    RuntimeError: [enforce fail at inline_container.cc:222] . file not found: v1_4lang_q/version
    

    For any of the Google Colab notebooks, I get the following error when executing the very first cell:

         |████████████████████████████████| 74 kB 2.2 MB/s 
         |████████████████████████████████| 2.9 MB 11.8 MB/s 
         |████████████████████████████████| 112 kB 35.0 MB/s 
         |████████████████████████████████| 596 kB 46.5 MB/s 
      Building wheel for antlr4-python3-runtime (setup.py) ... done
    /content/silero-models
    ---------------------------------------------------------------------------
    OSError                                   Traceback (most recent call last)
    <ipython-input-1-5d873de0231f> in <module>()
         16 from glob import glob
         17 from omegaconf import OmegaConf
    ---> 18 from utils import (init_jit_model, 
         19                    split_into_batches,
         20                    read_audio,
    
    5 frames
    /usr/lib/python3.7/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
        362 
        363         if handle is None:
    --> 364             self._handle = _dlopen(self._name, mode)
        365         else:
        366             self._handle = handle
    
    OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory
    

    Thus, as a result, I am unable to run any examples - either locally or in Google Colab.

    Thanks!

    bug 
    opened by abhinavkulkarni 9
  • Bug report - running on ARM / RPI

    🐛 Bug

    I tried to use the model on a Raspberry Pi 3B and I get the following error: fft: ATen not compiled with MKL support. So I tried to modify the stft function in torch/functional.py to use the librosa stft instead, but it seems that the model uses another torch stft rather than the one in my package.

    The function used instead of torch stft

    def stft(input: Tensor, n_fft: int, hop_length: Optional[int] = None,
             win_length: Optional[int] = None, window: Optional[Tensor] = None,
             center: bool = True, pad_mode: str = 'reflect', normalized: bool = False,
             onesided: Optional[bool] = None,
             return_complex: Optional[bool] = None):
        S = librosa.stft(np.array(input),n_fft,hop_length,win_length,window,center,pad_mode)
        s_real = np.real(S)
        s_real_shape = np.shape(s_real)
        s_real = np.reshape(s_real,(s_real_shape[0],s_real_shape[1],1))
        s_imag = np.imag(S)
        s_imag_shape = np.shape(s_imag)
        s_imag = np.reshape(s_imag,(s_imag_shape[0],s_imag_shape[1],1))
        S = np.concatenate((s_real,s_imag),axis=2)
        return torch.tensor(S)

    stack traces

    File "/home/Salim/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
    RuntimeError: The following operation failed in the TorchScript interpreter.
    Traceback of TorchScript, serialized code (most recent call last):
      File "code/torch/stt_pretrained/models/model.py", line 27, in forward
        _2 = self.win_length
        _3 = torch.hann_window(self.n_fft, dtype=ops.prim.dtype(x), layout=None, device=ops.prim.device(x), pin_memory=None)
        x0 = torch.torch.functional.stft(x, _0, _1, _2, _3, True, "reflect", False, True, )
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        _4 = torch.slice(x0, 0, 0, 9223372036854775807, 1)
        _5 = torch.slice(_4, 1, 0, 9223372036854775807, 1)
      File "code/torch/torch/functional.py", line 21, in stft
        input0 = input
        print("test ok")
        _2 = torch.stft(input0, n_fft, hop_length, win_length, window, normalized, onesided)
             ~~~~~~~~~~ <--- HERE
        return _2

    Traceback of TorchScript, original code (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/torch/functional.py", line 465, in stft
        input = F.pad(input.view(extended_shape), (pad, pad), pad_mode)
        input = input.view(input.shape[-signal_dim:])
        return _VF.stft(input, n_fft, hop_length, win_length, window, normalized, onesided)
               ~~~~~~~~ <--- HERE
    RuntimeError: fft: ATen not compiled with MKL support

    Expected behavior

    Is it possible to modify the forward function so that it uses the librosa stft for Raspberry Pi users?

    Environment

    PyTorch version: 1.7.0a0+e85d494
    Is debug build: True
    CUDA used to build PyTorch: None
    ROCM used to build PyTorch: N/A

    OS: Raspbian GNU/Linux 10 (buster) (armv7l)
    GCC version: (Raspbian 8.3.0-6+rpi1) 8.3.0
    Clang version: Could not collect
    CMake version: version 3.13.4

    Python version: 3.7 (32-bit runtime)
    Is CUDA available: False
    CUDA runtime version: No CUDA
    GPU models and configuration: No CUDA
    Nvidia driver version: No CUDA
    cuDNN version: No CUDA
    HIP runtime version: N/A
    MIOpen runtime version: N/A

    Versions of relevant libraries:
    [pip3] numpy==1.20.2
    [pip3] numpydoc==0.7.0
    [pip3] torch==1.7.0a0
    [pip3] torchaudio==0.7.0a0+ac17b64
    [pip3] torchvision==0.8.0a0+291f7e2
    [conda] Could not collect

    bug 
    opened by Salim-alileche 9
  • Feature request - Offline use of model

    At the moment it is nearly impossible to create a docker container that works offline (without internet access). Even if you include this line during docker build:

    RUN python -c "import torch; torch.backends.quantized.engine='qnnpack'; torch.hub.load(repo_or_dir='snakers4/silero-models', model='silero_te', force_reload=True)"

    During execution of the docker container (without internet) you load it locally:

    torch.hub.load(repo_or_dir='/root/.cache/torch/hub/snakers4_silero-models_master', model='silero_te', source='local', force_reload=False)

    Then you have the problem that the hubconf.py is called again (and fails due to no internet access) and it tries to download the files in hubconf.py Lines 21, 49, 101, even though they already exist.

    So my suggestion would be to also include checks at lines 21, 49, and 101 to see whether the file already exists locally and, if so, skip the download (as is done at line 114).

    Any reasons against that?

    enhancement 
    opened by Phil1108 7
  • Issue getting from silero model tried for text enhancement

    Issue

    File "<torch_package_104>.release_module.py", line 122, in enhance_text
    File "<torch_package_104>.release_module.py", line 101, in enhance_long_textblock
    File "<torch_package_104>.release_module.py", line 72, in enhance_textblock
    File "<torch_package_104>.release_module.py", line 165, in enhance_tokens
    IndexError: string index out of range

    Details

    I added punctuation to the text using Silero models over the PyTorch hub, and everything was going smoothly until the attached text example appeared. I have no idea why this is occurring. I'm using this model to add punctuation to transcripts that I collect from YouTube; some of them have a few missing punctuation marks (supplied by the video author), while others have no punctuation at all (auto-generated by youtube).

    Transcript throwing Error

    transcript 1: ""Hey there. How's it going everybody in this video? We'll be learning about python Data types and specifically We'll be learning about how to work with textual data and textual data in python are represented with strings So we currently have [opened] our intro pi file that we were working with in the last video Where we just printed out hello world and I'll go ahead and run this so that we can see that down here It does print out hello [world] [now] This line here is using the print function and we're passing this text value into that print function now if we wanted to create a Variable that holds that text value then we could say now I'll just get rid of this comment for now So if I wanted a variable to hold that value then I can just create a variable and we'll call that"

    transcript 2: "you're now ready to see how to go one layer of a convolution on your network let's go through the example you've seen in the previous video how to take a 3d volume and convolve it with say two different filters in order to get in this example two different 4x4 outputs so let's say convolving with the first filter gives this first 4x4 output and convolving with this second filter gives a different 4x4 output the final thing to turn this into a convolutional neural net layer is that for each of these we're going to add it bias so this is going to be a real number and what - broadcasting you kind of had the same number - every you know one of these sixteen elements and then apply a non-linearity which for illustration that says there a luna mini arity and this gives you a 4x4 output after applying the bias and the non-linearity and then for this thing at the bottom as well you had some different buyers again this is a real number so you had the same row number - all 16 numbers and then applies some non-linearity that fairly non-linearity and this gives you a different 4x4 output then same as we did before if you take this and stack it up as follows so they end up with a 4 by 4 by 2 output then this computation where you've gone from 6 by 6 by 3 to a 4 by 4 by 4 this is one layer of a convolutional neural network Center mapped is back to one layer of for propagation in the standard neural network when a non convolutional neural network remember that one step afford prot was something like this right z1 equals w1 times a0 a0 was also equal to X right and then plus b1 and he applied the non-linearity to get a 1 so that's G of Z 1" Please review the above transcript that is and let us know what the problem is.

    opened by Kishan-Sahu 6
  • Model getting stuck on some texts.

    There hasn't been a debugging message to explain why the model keeps getting stuck for a very long period. Please assist us in adding a debugging message to the model so we can identify the cause of the problem.

    The text for which the model stuck is given below:

    Text: "we're going to set this by saying export Python path all uppercase and then equals and now we want to set that location so I'm just going to come over here and grab that location and paste that in those quotes and we want it to look just like that no space in between the equals and the path so to save that we can just hit ctrl X and then Y to save and then enter to keep the same file name and now we can either restart our terminal or run a source command on that file but I'll just restart the terminal here and pull this up and now if we run Python then let's see if we can import that module so import my module and we can see that that worked and the reason that worked is that if we import sis and look at our sis then we can see that after our current directory that we have the directory that was added there and the reason that it's added is that we added it to our Python path environment variable so now let's take a look at how to set"

    I tested manually by removing unusual letters and words, and found that removing "ctrl" from the text resolved the problem.

    opened by Kishan-Sahu 2
  • Feature request - `<phoneme>` support for SSML

    🚀 Feature

    Allow phonetic pronunciation for necessary words

    Motivation

    Sometimes it's necessary to customize pronunciation of words with non-standard spelling or word borrowed from other languages. In that case having transcription in IPA or X-SAMPA would be nice (see e.g. Polly for explanation of the syntax)

    Pitch

    Wrapping IPA or X-SAMPA transcription into a <phoneme> tag makes the engine pronounce the word according to its specification.

    Alternatives

    Not sure if there are any within the project. Using other projects supporting <phoneme> is possible.

    Additional context

    enhancement 
    opened by lagleki 1
  • Packaging and PyPI releases

    Hello,

    Thank you for your hard work.

    Is there any chance of getting an installable Python package from PyPI for the project?

    For example, it might look like this for installing STT models with PyTorch:

    pip install silero-models-stt[torch]
    

    This would be very handy for using the models in the production projects and environments.

    help wanted 
    opened by espdev 9
  • Feature request - [Wake Word Detection]

    🚀 Feature

    It would be helpful if we could easily use wake word detection to complement the STT functionality. At present I'm using a third-party tool for wake word detection which then records audio for 4 seconds which is processed through silero for home automation purposes.

    Motivation & Pitch

    Adding a simple method for custom wake word detection would allow seamless integration for the purposes of home automation where an always listening device waits for a given wake word or phrase and then listens for a sentence for STT purposes, the text of which is then passed on to a different step in the chain.

    Additionally, while waiting a fixed amount of time for the follow-up sentence is straight-forward, it would be a helpful addition to also use the length of silence in a sentence to determine its termination.

    Alternatives

    These things can be done at present, but only by using multiple tools. Being able to do this in one place would make this use case seamless and easier to process.

    I do understand if this is too far outside of your scope for this project.

    enhancement 
    opened by waytotheweb 1
Releases(v0.4.1)
  • v0.4.1(Jun 12, 2022)

    What's Changed

    • Fix models.yml loading by @rominf in https://github.com/snakers4/silero-models/pull/162

    New Contributors

    • @rominf made their first contribution in https://github.com/snakers4/silero-models/pull/162

    Full Changelog: https://github.com/snakers4/silero-models/compare/v0.4...v0.4.1

  • v0.4(Jun 6, 2022)

    What's Changed

    • Add version 3.1 by @Islanna in https://github.com/snakers4/silero-models/pull/157
    • Fx by @Islanna in https://github.com/snakers4/silero-models/pull/158
    • Fx by @Islanna in https://github.com/snakers4/silero-models/pull/159

    Full Changelog: https://github.com/snakers4/silero-models/compare/v0.3...v0.4

  • v0.3(May 23, 2022)

    What's Changed

    • Testing the auto-build functionality
    • Update examples by @snakers4 in https://github.com/snakers4/silero-models/pull/137
    • Fx ssml and model loading by @Islanna in https://github.com/snakers4/silero-models/pull/140
    • Update README.md by @Islanna in https://github.com/snakers4/silero-models/pull/138
    • Tts v3 by @Islanna in https://github.com/snakers4/silero-models/pull/141

    Full Changelog: https://github.com/snakers4/silero-models/compare/v0.1...v0.2

  • v0.1(Feb 28, 2022)

  • v1(Sep 16, 2020)

    We publish the following models in this release:

    • English V1
    • German V1
    • Spanish V1

    |                 | PyTorch | ONNX | TensorFlow | Quantization | Quality | Colab         |
    |-----------------|---------|------|------------|--------------|---------|---------------|
    | English (en_v1) | ✔️      | ✔️   | ✔️         | ⌛           | link    | Open In Colab |
    | German (de_v1)  | ✔️      | ✔️   | ✔️         | ⌛           | link    | Open In Colab |
    | Spanish (es_v1) | ✔️      | ✔️   | ✔️         | ⌛           | link    | Open In Colab |

Owner
Alexander Veysov