NeMo: a toolkit for conversational AI

Overview

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Introduction

NeMo is a toolkit for creating Conversational AI applications.

NeMo product page.

Introductory video.

The toolkit comes with extendable collections of pre-built modules and ready-to-use models for:

  • Automatic Speech Recognition (ASR)
  • Natural Language Processing (NLP)
  • Speech Synthesis (TTS)

Built for speed, NeMo can utilize NVIDIA's Tensor Cores and scale out training to multiple GPUs and multiple nodes.
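
For example, a pretrained model can be loaded and used for inference in a few lines of Python (a minimal sketch; the model name and audio path are illustrative placeholders):

# Minimal sketch: load a pretrained ASR model and transcribe an audio file.
# "QuartzNet15x5Base-En" and "sample.wav" are illustrative placeholders.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
print(asr_model.transcribe(paths2audio_files=["sample.wav"]))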

Requirements

  1. Python 3.6 or above
  2. PyTorch 1.7.1 or above

Installation

Pip

Use this installation mode if you want the latest released version.

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
pip install nemo_toolkit[all]==1.0.0b3
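
You can sanity-check the installation from a Python shell (a quick check; the package should import cleanly):

# Quick sanity check after installation.
import nemo
print(nemo.__version__)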

Pip from source

Use this installation mode if you want a version from a particular GitHub branch (e.g. main).

apt-get update && apt-get install -y libsndfile1 ffmpeg
pip install Cython
python -m pip install git+https://github.com/NVIDIA/NeMo.git@{BRANCH}#egg=nemo_toolkit[all]

From source

Use this installation mode if you are contributing to NeMo.

apt-get update && apt-get install -y libsndfile1 ffmpeg
git clone https://github.com/NVIDIA/NeMo
cd NeMo
./reinstall.sh

Docker containers

The easiest way to start training with NeMo is by using NeMo's container. It has all requirements and NeMo 1.0.0b3 already installed.

docker run --gpus all -it --rm --shm-size=8g \
-p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \
stack=67108864 --device=/dev/snd nvcr.io/nvidia/nemo:1.0.0b3

If you choose to work with the main branch, we recommend using NVIDIA's PyTorch container version 20.11-py3 and then installing NeMo from GitHub.

docker run --gpus all -it --rm -v <nemo_github_folder>:/NeMo --shm-size=8g \
-p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit \
stack=67108864 --device=/dev/snd nvcr.io/nvidia/pytorch:20.11-py3

Examples

Simplest application with NeMo (runs in Google Colab, no local installation necessary).

Many other examples can be found in the "Examples" folder.

Documentation

Version | Description
Latest | Documentation of the latest (i.e. main) branch
Stable | Documentation of the stable (i.e. v1.0.0b1) branch

Getting help with NeMo

FAQ can be found on NeMo's Discussions board. You are welcome to ask questions or start discussions there.

Tutorials

The best way to get started with NeMo is to check out one of our tutorials.

Most NeMo tutorials can be run on Google's Colab.

To run tutorials:

  • Click on Colab link (see table below)
  • Connect to an instance with a GPU (Runtime -> Change runtime type -> select "GPU" for hardware accelerator)
Domain | Title | GitHub URL
NeMo | Simple Application with NeMo | Voice swap app
NeMo | Exploring NeMo Fundamentals | NeMo primer
NeMo Models | Exploring NeMo Model Construction | NeMo models
ASR | ASR with NeMo | ASR with NeMo
ASR | ASR with Subword Tokenization | ASR with Subword Tokenization
ASR | Speech Commands | Speech commands
ASR | Speaker Recognition and Verification | Speaker Recognition and Verification
ASR | Online Noise Augmentation | Online noise augmentation
ASR | Beam Search and External Language Model Rescoring | Beam search and external language model rescoring
NLP | Using Pretrained Language Models for Downstream Tasks | Pretrained language models for downstream tasks
NLP | Exploring NeMo NLP Tokenizers | NLP tokenizers
NLP | Text Classification (Sentiment Analysis) with BERT | Text Classification (Sentiment Analysis)
NLP | Question Answering with SQuAD | Question answering SQuAD
NLP | Token Classification (Named Entity Recognition) | Token classification: named entity recognition
NLP | Joint Intent Classification and Slot Filling | Joint Intent and Slot Classification
NLP | GLUE Benchmark | GLUE benchmark
NLP | Punctuation and Capitalization | Punctuation and capitalization
NLP | Named Entity Recognition - BioMegatron | Named Entity Recognition - BioMegatron
NLP | Relation Extraction - BioMegatron | Relation Extraction - BioMegatron
TTS | Speech Synthesis | TTS inference
TTS | Speech Synthesis | Tacotron2 training
Tools | CTC Segmentation | CTC Segmentation
Tools | Text Normalization for Text To Speech | Text Normalization

Contributing

We welcome community contributions! Please refer to CONTRIBUTING.md for the process.

License

NeMo is released under the Apache 2.0 license.

Comments
  • T5 pipeline parallel

    T5 pipeline parallel

    What does this PR do ?

    Adds pipeline parallel training support to T5.

    Collection: NLP

    Changelog

    • TBD

    Usage

    Set pipeline_model_parallel_size=2 (or 4) in megatron_t5_config.yaml.
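
    For illustration, the same setting can be applied programmatically with OmegaConf (a sketch; it assumes the flag lives under the config's model section):

    # Illustrative only: set the pipeline parallel size in the T5 config.
    from omegaconf import OmegaConf

    cfg = OmegaConf.load("megatron_t5_config.yaml")
    cfg.model.pipeline_model_parallel_size = 2  # or 4; assumes a "model" section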

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    opened by MaximumEntropy 70
  • NLP refactoring - Stage 2

    NLP refactoring - Stage 2

    Signed-off-by: Evelina Bakhturina [email protected]

    Stage 2 of NLP refactoring:

    • Cleaning up and restructuring of functions and files in the nlp collection.
    • Cleaning up losses (a toy sketch of the weighted aggregation idea follows this list):
      • Added weighting option to LossAggregatorNM
      • Moved LossAggregatorNM to losses.py in the common backend
      • Split JointIntentSlotLoss into two separate common losses and removed it
      • Merged MaskedLanguageModelingLossNM, PaddedSmoothedCrossEntropyLossNM and SmoothedCrossEntropyLoss into a unified SmoothedCrossEntropyLoss
      • Changed QuestionAnsweringLoss to the more general name SpanningLoss
      • Changed TRADEMaskedCrossEntropy to the more general name MaskedXEntropyLoss
      • Removed TokenClassificationLoss, CrossEntropyLoss3D and JointIntentSlotLoss
      • Added weighting and masking support to CrossEntropyLossNM
      • Added dynamic port sizes to CrossEntropyLossNM
      • Renamed CrossEntropyLoss to CrossEntropyLossNM to prevent confusion with PyTorch's CrossEntropyLoss
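
    A toy sketch of the weighted loss aggregation idea (illustrative only; not the actual LossAggregatorNM code):

    # Toy illustration of weighted loss aggregation; not the PR's actual code.
    def aggregate_losses(losses, weights=None):
        weights = weights if weights is not None else [1.0] * len(losses)
        return sum(w * l for w, l in zip(weights, losses))

    # e.g. total = aggregate_losses([intent_loss, slot_loss], weights=[0.6, 0.4])
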
    opened by ekmb 60
  • Dialogue task

    Dialogue task

    What does this PR do ?

    Add various functionalities to dialogue domain for NeMo

    Collection: NLP

    Changelog

    1. Support Zero Shot Intent Recognition
    2. Further refactored Dialogue module
    3. Implement Dialogue GPT Generation Model
    4. Support MS Marco Data Processor
    5. Implement Dialogue S2S Generation Model (HF fully supported, Megatron training supported, inference pending integration of common generation API)
    6. Support System Response Generation using user utterance and system slots based on SGD dataset
    7. Support Design Data Processor
    8. Implement HF BART based classifier into zero shot intent model
    9. Implement Dialogue Nearest Neighbour Model
    10. Refactor Dialogue SGD Data Processor to make interface with models cleaner
    11. Update Nearest Neighbour Model and ZeroShotIntentModel to support SGD dataset and ZeroShot Datasets
    12. Support Mellon QA Data Processor
    13. Add Documentation and Tutorial

    See details in the NVIDIA-only dev log

    Usage

    • You can potentially add a usage example below
    # Add a code snippet demonstrating how to use this 
    

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [x] Did you write any new necessary tests?
    • [x] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [x] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    opened by Zhilin123 58
  • Binarized memmap dataloader for Megatron NMT, Inference and checkpoint -> nemo

    Binarized memmap dataloader for Megatron NMT, Inference and checkpoint -> nemo

    What does this PR do ?

    1. Adds megatron memory-mapped dataloaders to NMT.
    2. Inference script/config with a translate() method.

    Collection: NLP

    Changelog

    • Add a new dataset class for megatron memmap dataset.
    • Add an inference script with the associated yaml config.
    • Change the use_tarred_dataset arg to a generic dataset_type arg that can take [text, tarred, bin_memmap, text_memmap]

    Usage

    • Set dataset_type: bin_memmap in the YAML config.

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    opened by MaximumEntropy 54
  • Adding Conformer model

    Adding Conformer model

    Here are some of the main changes:

    • Added the modules needed for Conformer
    • Added use_start_end_tokens to the data layers to support dropping these tokens
    • Updated our CTC loss to support different reduction methods, including 'mean_batch'. Users may select other reduction approaches supported by PyTorch via the ctc_reduction param added to the model config.
    • Added log_prediction parameter to model's config to control if we want to see prediction samples in output or not
    • Added LSTM decoder and Swish activation
    • Added NoamScheduler
    • Added subsampling module which supports VGGNet and striding approach for subsampling
    • Added multi-head attention, relative multi-head attention along with positional embedding and relative positional embedding
    • Fixed the bug in the data layer which added some padding after normalization (fixing this bug makes the tests fail! going to investigate it)
    opened by VahidooX 52
  • Adding cache-aware streaming Conformer with look-ahead support

    Adding cache-aware streaming Conformer with look-ahead support

    What does this PR do ?

    Adding cache-aware streaming Conformer training and inference with look-ahead support. It is achieved by training a model with a limited effective right context and then performing streaming with activation-caching support. Limiting the right context reduces accuracy compared to an offline model, but it gives better accuracy and significantly higher throughput than buffer-based streaming by dropping the duplicated computations that buffering incurs. A larger right context decreases the WER while increasing the latency.

    It supports the following three modes:

    1. Fully causal model with zero look-ahead and zero latency
    2. Regular look-ahead
    3. Chunk-aware look-ahead with a small duplication in computations

    It supports both Conformer-CTC and Conformer-Transducer; they can be trained with the regular scripts using the config files in the following folder: NeMo/examples/asr/conf/conformer/streaming/

    A model trained in streaming mode can be evaluated with the following script: NeMo/examples/asr/conf/conformer/streaming/speech_to_text_streaming_infer.py

    This script simulates streaming inference for a single audio file or a manifest of audio files. For a manifest, streaming can be run in multi-stream mode (batched inference) to speed it up. It can also compare the results with offline evaluation and report the differences in both WER and model outputs.

    The accuracy of the model is exactly the same in offline evaluation and in streaming. In offline mode, the whole audio is passed through the model, while in streaming the audio is passed chunk by chunk.
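
    Conceptually, the caching mechanism carries a bounded left context between chunk steps instead of recomputing it. A toy sketch of this pattern (illustrative; not NeMo's actual API):

    # Toy illustration of activation caching in chunked streaming; not NeMo's API.
    import torch

    def stream_step(chunk, cache, context=16):
        # Prepend cached frames so this step sees a limited left context.
        x = chunk if cache is None else torch.cat([cache, chunk], dim=-1)
        out = x.mean()                 # stand-in for the real encoder computation
        return out, x[..., -context:]  # updated cache: the last `context` frames

    cache = None
    for chunk in torch.randn(80, 50).chunk(5, dim=-1):  # five chunks of 10 frames
        out, cache = stream_step(chunk, cache)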

    Changelog

    • Added frame-wise streaming Conformer models with look-ahead support and caching mechanism for streaming inference.

    Usage

    • You can potentially add a usage example below
    # Add a code snippet demonstrating how to use this 
    

    PR Type:

    • [x] New Feature
    • [ ] Bugfix
    • [ ] Documentation
    opened by VahidooX 49
  • [NeMoMegatron] Pipeline parallelism for GPT

    [NeMoMegatron] Pipeline parallelism for GPT

    PR to add pipeline parallelism to GPT using fwd/bwd functions from Apex.

    FP32, FP16, and BF16 are all working now.

    When using pipeline parallelism, it is recommended to use BF16 + Megatron amp O2:

    model.megatron_amp_O2=True
    trainer.precision='bf16'
    

    TODOs

    • under review

    Known issues

    • complete method will be supported in a subsequent PR
    • prompt tuning temporarily disabled, use NeMo 1.6 if needed
    • when using tensor parallel only, we're still using sync grad all-reduce which reduces perf. Will be fixed in NeMo 1.8.
    opened by ericharper 49
  • Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training

    Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training

    What does this PR do ?

    1. Trains Megatron-based NMT models based on a maximum number of samples.
    2. Adds support for text_memmap and csv_memmap datasets in Megatron encoder-decoder models (T5, BART, UL2)

    Collection: NLP

    Usage

    Add to the command line:

      model.data.data_impl=text_mmap \
      +model.data.data_impl_kwargs.newline_int=10 \
      +model.data.data_impl_kwargs.header_lines=0 \
      +model.data.data_impl_kwargs.workers=null \
      +model.data.data_impl_kwargs.sort_dataset_paths=False
    

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    opened by MaximumEntropy 46
  • Tn es

    Tn es

    What does this PR do ?

    Adds Text Normalization for the Spanish language to nemo_text_normalization

    Collection: NeMo Text Normalization

    Changelog

    • Adds Text Normalization for the Spanish language. Verbalizers and classifiers are available for the following classes:
      • Cardinal
      • Decimal
      • Ordinal
      • Fraction
      • Money
      • Measure
      • Date
      • Time
      • Electronic
      • Whitelist

    Also includes a localization option, es-amer, which changes formatting rules to accommodate tendencies in Central American orthography (e.g. the use of periods to group cardinals instead of commas, as is customary for other Spanish-speaking locales), as illustrated below.
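
    A toy illustration of that grouping difference (a hypothetical helper, not the PR's grammar code):

    # Illustrative only: es-amer groups cardinals with periods rather than commas.
    def group_cardinal(n, sep="."):
        return f"{n:,}".replace(",", sep)

    print(group_cardinal(1234567))  # -> 1.234.567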

    Includes updated es_pytests for text normalization and edits to export_grammar.sh and normalize.py to allow deployment. All tests have passed in the NeMo Docker environment.


    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [x] Did you write any new necessary tests?
    • [x] Did you add or update any necessary documentation?
    • [x] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [x] New Feature

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

      • N.B. Sparrowhawk deployment is altering whitespace following periods, so TN is unable to manage am/pm together with time zones. That is, "10 a.m. est" will cause an error, while "10.00 h est" is stable. This issue is Sparrowhawk-related, so we have been unable to debug it.
    opened by bonham79 46
  • Neural Graphs

    Neural Graphs

    This (3rd!) PR follows the proposal from the "Neural Graphs" design doc: https://docs.google.com/document/d/1218tRm2XtfLbYJvoepnbg3ET1CkJ72CCBTGAUZ1G7gY/edit#

    Additionally, it assumes that a graph is developed for training or inference, so it changes the mode of the "connected" modules during its build.

    • [x] Application State (singleton)
    • [x] Registration of a Neural Graph
    • [x] Recording of operation/modules forming a Graph
    • [x] Input port binding - with default port name and option to provide a new name (manual)
    • [x] Output port binding - with default port name and option to provide a new name (manual)
    • [x] Graph nesting
    • [x] Summary of graph/modules in a graph
    • [x] Export of a graph to YML file
    • [x] Import of a graph from YML file
    • [x] Built-in handling of training/inference modes
    • [x] Serialization of NeuralTypes for connections/inputs/outputs
    • [x] Graphs with loops
    • [x] Extended train() signature, enabling passing of the "training_graph"

    And a whole bunch of unit tests covering different aspects, from simple binding to "nesting of deserialized graph with input and output port bound into a graph with different ports bound" to "a graph with a loop"...

    opened by tkornuta-nvidia 46
  • Text memmap dataset

    Text memmap dataset

    Signed-off-by: Micha Livne [email protected]

    What does this PR do ?

    Has a mechanism to retire older index files by updating the internal idx version.

    Indexing speed of 1443990774 samples in 147 files using 6 workers

    Loading speed

    [NeMo I 2022-04-29 00:23:22 text_memmap_dataset:85] Time loading 147 mem-mapped files: 0:00:04.395558
    
    In [9]: len(ds)
    Out[9]: 1443990774
    # Timing without tokenizer
    In [10]: %timeit -n 1000  ds[np.random.randint(len(ds))]
    555 µs ± 16.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    # Timing with 'byte-level' tokenizer
    In [20]: %timeit -n 1000  ds[np.random.randint(len(ds))]
    724 µs ± 19.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    

    Add a one line overview of what this PR aims to accomplish.

    Collection: [Note which collection this PR will affect]

    Changelog

    • Added TextMemMapDataset
    • Added CSVMemMapDataset
    • Retired MegatronDataset
    • Added nemo/collections/nlp/data/language_modeling/text_memmap_dataset.py to preprocess indices (else this happens on the fly at first run)
    • Added nemo/collections/nlp/data/machine_translation/sequence_to_sequence_dataset.py
    • Added scripts/nlp_language_modeling/build_index_memmap_data.py

    Usage

    Example for caching index files:

    NeMo/scripts/nlp_language_modeling/build_index_memmap_data.py *.txt
    

    Index files will be created when instantiating a memory-mapped dataset if they are missing.
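
    A hypothetical usage sketch based on the changelog above (the class path comes from the changelog; the argument name is an assumption):

    # Illustrative sketch; dataset_paths is an assumed argument name.
    from nemo.collections.nlp.data.language_modeling.text_memmap_dataset import TextMemMapDataset

    ds = TextMemMapDataset(dataset_paths=["corpus.txt"])  # index is built on first use if missing
    print(len(ds), ds[0])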

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    opened by michalivne 45
  • Don't add output directory twice when creating shared sentencepiece tokenizer

    Don't add output directory twice when creating shared sentencepiece tokenizer

    Signed-off-by: Patrick Simianer [email protected]

    What does this PR do ?

    As the title says, this is a small fix: the output dir was already added to encoder_tokenizer_model on line 789 of the same file.

    Collection: NLP

    Changelog

    • Bugfix for creating shared sentencepiece tokenizer.

    Usage

    The error can be triggered by running preprocessing with NeMo for an MT dataset:

    #!/bin/bash
    
    python ../nemo/examples/nlp/machine_translation/enc_dec_nmt.py \
      -cn aayn_base \
      do_training=false \
      model.preproc_out_dir=./preproc_dir/ \
      model.train_ds.use_tarred_dataset=true \
      model.train_ds.lines_per_dataset_fragment=1000000 \
      model.train_ds.num_batches_per_tarfile=200 \
      model.train_ds.src_file_name=../europarl-v7.de-en.en \
      model.train_ds.tgt_file_name=../europarl-v7.de-en.de \
      model.validation_ds.src_file_name=../valid.en \
      model.validation_ds.tgt_file_name=../valid.de \
      model.encoder_tokenizer.vocab_size=32000 \
      model.decoder_tokenizer.vocab_size=32000 \
      model.encoder_tokenizer.library=sentencepiece \
      model.encoder_tokenizer.training_sample_size=9999 \
      model.decoder_tokenizer.library=sentencepiece \
      model.decoder_tokenizer.training_sample_size=9999 \
      ~model.test_ds \
      trainer.accelerator='cpu' \
      +trainer.fast_dev_run=true \
      exp_manager=null
    

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [x] Bugfix
    • [ ] Documentation

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    NLP 
    opened by pks 0
  • Sanitize params before DLLogger log_hyperparams

    Sanitize params before DLLogger log_hyperparams

    What does this PR do ?

    Allows DLLogger to work with hyperparameters stored in non-builtin container types, as sketched below.
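
    A minimal sketch of the sanitization idea (illustrative; not the PR's actual code):

    # Recursively convert non-builtin containers and values to builtins before logging.
    from dataclasses import asdict, is_dataclass

    def sanitize_params(params):
        if is_dataclass(params):
            params = asdict(params)
        if isinstance(params, dict):
            return {k: sanitize_params(v) for k, v in params.items()}
        if isinstance(params, (list, tuple)):
            return [sanitize_params(v) for v in params]
        return params if isinstance(params, (bool, int, float, str, type(None))) else str(params)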

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [x] Bugfix
    • [ ] Documentation

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    opened by milesial 0
  • Esperanto example

    Esperanto example

    What does this PR do ?

    Adds an ASR example for training an Esperanto Conformer-CTC-large model.

    Collection: ASR

    Changelog

    • Adds Esperanto example to docs/source/asr/examples/

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [x] Documentation
    ASR 
    opened by andrusenkoau 0
  • fix: clamp keep input size in update_cache for causal conv

    fix: clamp keep input size in update_cache for causal conv

    What does this PR do ?

    Sometimes in CausalConv1D.update_cache, input_x_keep ends up having no frames (i.e. a size of [M, N, 0]). Make sure that we keep at least one frame.
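
    A sketch of the clamping idea (illustrative; not the exact patch):

    # Illustrative only: guarantee that at least one frame is kept.
    import torch

    def keep_input(input_x, cache_keep_size):
        keep = max(input_x.size(-1) - cache_keep_size, 1)  # clamp so the slice is never empty
        return input_x[:, :, :keep]

    x = torch.randn(16, 176, 4)
    print(keep_input(x, 10).shape)  # torch.Size([16, 176, 1]) instead of an empty slice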

    Collection: asr

    Changelog

    • Add specific line by line info of high level changes in this PR.

    Usage

    • You can potentially add a usage example below
    # Add a code snippet demonstrating how to use this 
    

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [x] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas. @VahidooX

    Additional Information

    To reproduce the original issue on main, run:

    #!/bin/bash
    python examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py \
        --asr_model=stt_en_conformer_ctc_small \
        --chunk_size=100 \
        --shift_size=50 \
        --left_chunks=2 \
        --online_normalization \
        --manifest_file=/datasets/ls_test_other/transcripts.local.json \
        --batch_size=16 \
        --compare_vs_offline \
        --use_amp \
        --debug_mode
    

    Error output:

    ...
    Traceback (most recent call last):
      File "examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py", line 393, in <module>
        main()
      File "examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py", line 349, in main
        streaming_tran, offline_tran = perform_streaming(
      File "examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py", line 154, in perform_streaming
        ) = asr_model.conformer_stream_step(
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/mixins/mixins.py", line 475, in conformer_stream_step
        (encoded, encoded_len, cache_last_channel_next, cache_last_time_next) = self.encoder.cache_aware_stream_step(
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/mixins/streaming.py", line 61, in cache_aware_stream_step
        encoder_output = self(
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/core/classes/common.py", line 1087, in __call__
        outputs = wrapped(*args, **kwargs)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/modules/conformer_encoder.py", line 471, in forward
        audio_signal = layer(
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/submodules/conformer_modules.py", line 191, in forward
        x = self.conv(x, pad_mask=pad_mask, cache=cache_last_time, cache_next=cache_last_time_next)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/submodules/conformer_modules.py", line 350, in forward
        x = self.depthwise_conv(x, cache=cache, cache_next=cache_next)
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/submodules/causal_convs.py", line 162, in forward
        x = self.update_cache(x, cache=cache, cache_next=cache_next)
      File "/home/grclark/code/NeMo.git/streaming-conformer/nemo/collections/asr/parts/submodules/causal_convs.py", line 158, in update_cache
        cache_next[self._cache_id, :, :, -cache_keep_size:] = input_x_kept[:, :, -cache_keep_size:]
    RuntimeError: The expanded size of the tensor (1) must match the existing size (0) at non-singleton dimension 2.  Target sizes: [16, 176, 1].  Tensor sizes: [16, 176, 0]
    
    ASR 
    opened by messiaen 0
  • ASR evaluator

    ASR evaluator

    What does this PR do ?

    Add a one line overview of what this PR aims to accomplish.

    Collection: [Note which collection this PR will affect]

    Changelog

    • Add specific line by line info of high level changes in this PR.

    Usage

    • You can potentially add a usage example below
    # Add a code snippet demonstrating how to use this 
    

    Before your PR is "Ready for review"

    Pre checks:

    • [ ] Make sure you read and followed Contributor guidelines
    • [ ] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [ ] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    ASR 
    opened by fayejf 0
  • [ASR] Use a subset of manifest when using `scatter` shard strategy

    [ASR] Use a subset of manifest when using `scatter` shard strategy

    What does this PR do ?

    When using shard_strategy='scatter', this PR loads only a subset of lines from the manifest file. This may reduce the time to process a manifest file by an order of magnitude.

    Opening as a draft to get feedback if there are any underlying assumptions which may be broken by this change.

    Collection: ASR

    Changelog

    | File | Change |
    | --- | --- |
    | manifest.py::item_iter | Load only a subset of lines if shard_strategy == 'scatter' |
    | collections.py::ASRAudioText | Forward shard_strategy, global_rank and world_size to manifest.item_iter |
    | collections.py::AudioText | Use rank and world size to restore the original data list length for shard_strategy='scatter' |
    | audio_to_text.py | Forward shard_strategy, global_rank and world_size to collections.ASRAudioText |
    | utils.py | Added a function to get the number of lines from a text file |
    | test_utils.py | Added a unit test for the utility function added above |
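
    A minimal sketch of the idea described above, assuming a strided per-rank split (the actual split logic in the PR may differ):

    # Illustrative only: each rank reads just its own subset of manifest lines
    # instead of loading the full manifest and discarding most of it afterwards.
    def iter_manifest_subset(path, global_rank, world_size):
        with open(path, encoding="utf-8") as f:
            for i, line in enumerate(f):
                if i % world_size == global_rank:
                    yield line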

    Before your PR is "Ready for review"

    Pre checks:

    • [x] Make sure you read and followed Contributor guidelines
    • [x] Did you write any new necessary tests?
    • [ ] Did you add or update any necessary documentation?
    • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
      • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

    PR Type:

    • [ ] New Feature
    • [x] Bugfix
    • [ ] Documentation

    If you haven't finished some of the above items you can still open "Draft" PR.

    Who can review?

    Anyone in the community is free to review the PR once the checks have passed. Contributor guidelines contains specific people who can review PRs to various areas.

    Additional Information

    • Related to # (issue)
    ASR common 
    opened by anteju 0
Releases (v1.14.0)
  • v1.14.0 (Dec 24, 2022)

    Highlights

    NeMo ASR

    • Hybrid CTC + Transducer loss ASR #5364
    • Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
    • ASR Adapters hyper parameter search scripts #5159
    • RNNT {ONNX, TorchScript} x GPU export infer #5248
    • Exportable MelSpectrogram (TorchScript) #5512
    • Audio To Audio Dataset Processor #5196
    • Multi Channel Audio Transcription #5479
    • Silence Augmentation #5476

    NeMo Megatron

    • Support for the Mixture of Experts for T5
    • Fix PTL model size output for GPT-3 and BERT
    • BERT with Tensor Parallelism & Pipeline Parallel Support

    NeMo Core

    • Hydra Multirun core support + NeMo HP optim in YAML #5159

    NeMo Models

    Detailed Changelogs

    ASR

    Changelog
    • [Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
    • Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
    • Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
    • Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
    • Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
    • bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
    • Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
    • Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
    • Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
    • Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
    • [ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
    • Add Silence Augmentation by @fayejf :: PR: #5476
    • add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
    • add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
    • [ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
    • Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
    • Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
    • Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
    • Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
    • Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
    • Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
    • [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639

    TTS

    Changelog
    • [TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
    • [TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
    • [TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
    • [TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
    • [TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
    • [TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
    • Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
    • [TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
    • [TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
    • [TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
    • [TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
    • [TTS] Add Spanish model documentation by @rlangman :: PR: #5390
    • [TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
    • [TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
    • [TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
    • [TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
    • [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
    • JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
    • [TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
    • TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
    • [TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
    • [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
    • [TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
    • [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
    • [TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
    • [TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666

    NLP / NMT

    Changelog
    • Option to pad the last validation input sequence if its smaller than the encoder sequence length for MegatronGPT by @anmolgupt :: PR: #5243
    • Fixes bugs with loss averaging with for Megatron GPT by @shanmugamr1992 :: PR: #5329
    • Fixing bug in Megatron BERT when loss mask is all zeros by @shanmugamr1992 :: PR: #5424
    • support to disable sequence length + 1 input tokens for each sample in MegatronGPT by @anmolgupt :: PR: #5363
    • [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414
    • Bug fix/gpt by @shanmugamr1992 :: PR: #5493
    • prompt tuning fix for unscale grad errors by @arendu :: PR: #5523
    • Bert sequence parallel support by @shanmugamr1992 :: PR: #5494
    • NLP docs fixes by @vsl9 :: PR: #5528
    • Switch order of args in optimizer_step override by @ericharper :: PR: #5549
    • Upgrade to 22.11 by @ericharper :: PR: #5550
    • Merge r1.13.0 main by @ericharper :: PR: #5570
    • some tokenizers do not have additional_special_tokens_ids attribute by @arendu :: PR: #5642
    • Remove cell output from tutorial by @ericharper :: PR: #5689

    Text Normalization / Inverse Text Normalization

    Changelog
    • [ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
    • [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414

    Export

    Changelog
    • Fixed the onnx bug in conformer for non-streaming models. by @VahidooX :: PR: #5242
    • Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
    • Fixes for Conformer-xl export by @borisfom :: PR: #5309
    • Remove onnx graphsurgery from Dockerfile by @titu1994 :: PR: #5320
    • add exportable mel spec by @1-800-BAD-CODE :: PR: #5512

    General Improvements

    Changelog
    • bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
    • Fix setting up of learning rate scheduler by @PeganovAnton :: PR: #5444
    • Better patch hydra by @titu1994 :: PR: #5591
    • [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
    • Add fully torch.jit.script-able speaker clustering module by @tango4j :: PR: #5191
    • Update perturb.py by @stevehuang52 :: PR: #5231
    • remove CV requirements. by @XuesongYang :: PR: #5233
    • checks for accepted adapter type at module level by @arendu :: PR: #5194
    • fix hypotheses return by @nithinraok :: PR: #5253
    • Support for inserting additional subsampling in conformer encoder by @shan18 :: PR: #5224
    • update tutorials to use meeting config as default and VAD by @nithinraok :: PR: #5237
    • Specifying audio signal dropout separately for the Conformer Encoder by @shan18 :: PR: #5263
    • created by @bmwshop :: PR: #5268
    • Fix failing speaker counting for short audio samples by @tango4j :: PR: #5267
    • O2bert + apex pipeline functions by @shanmugamr1992 :: PR: #5221
    • Upperbound PTL by @titu1994 :: PR: #5302
    • Update Interface(s) phonetic entry by @blisc :: PR: #5212
    • add label inference support to EncDecSpeakerLabel class by @nithinraok :: PR: #5278
    • Add italian model checkpoints by @Kipok :: PR: #5315
    • Text Memmap Parsing Improvements by @michalivne :: PR: #5265
    • Update librosa signature in HF processing script by @titu1994 :: PR: #5321
    • Force wav file format for audio_filepath by @titu1994 :: PR: #5323
    • Updates to T0 Dataset and Model by @MaximumEntropy :: PR: #5201
    • [DOC] add sphinx-copybutton requirement to copy button on code snippets. by @XuesongYang :: PR: #5326
    • Add support for Hydra multirun to NeMo by @titu1994 :: PR: #5159
    • typo fix by @arendu :: PR: #5328
    • add precommit hood to automatic sort entries in requirements. by @XuesongYang :: PR: #5333
    • Add speaker clustering arguments to forward function by @tango4j :: PR: #5306
    • Fixing de-autocast by @borisfom :: PR: #5319
    • [Bugfix] Added rm -f / wget- nc command to avoid bash error in multispeaker sim notebook by @tango4j :: PR: #5292
    • [DOC] added ipython dependency to support IPython.sphinxext extension by @XuesongYang :: PR: #5345
    • Bug fix (removing old compute consumed samples) by @shanmugamr1992 :: PR: #5355
    • removed uninstall nemo_cv and nemo_simple_gan and relax numba version… by @XuesongYang :: PR: #5332
    • Enable mlflow logger by @whrichd :: PR: #4893
    • Fix Python type hints according to Python Docs by @artbataev :: PR: #5370
    • Distributed optimizer support for BERT by @timmoon10 :: PR: #5305
    • SpeakerClustering: fix tensor dimennsions in forward() by @virajkarandikar :: PR: #5387
    • add squad by @arendu :: PR: #5407
    • added python and c++ alignment code by @yzhang123 :: PR: #5346
    • Add MoE support for T5 model (w/o expert parallel) by @aklife97 :: PR: #5409
    • Fix for concat map dataset by @1-800-BAD-CODE :: PR: #5133
    • Support for finetuning and finetuning inference with .ckpt files & batch size refactoring by @MaximumEntropy :: PR: #5339
    • update doc in terms of get_label for lang id model by @fayejf :: PR: #5366
    • Debug support for interleaved pipeline parallelism with the distributed Adam optimizer by @timmoon10 :: PR: #5236
    • Create codeql.yml by @titu1994 :: PR: #5445
    • Update codeql.yml by @titu1994 :: PR: #5449
    • Fix support for legacy sentencepiece models by @Numeri :: PR: #5406
    • Update docs with Comparison tool info, and slightly change .sh for ea… by @Jorjeous :: PR: #5182
    • Add float32 type casting for get_samples function by @tango4j :: PR: #5399
    • Add missing import in transcribe_utils.py by @jonghwanhyeon :: PR: #5487
    • Add auto-labeler by @SeanNaren :: PR: #5498
    • Add more glob patterns for labeler by @SeanNaren :: PR: #5504
    • Fix issues with PL 1.8 by @SeanNaren :: PR: #5353
    • [BugFix] Removing tokens from decoding timestamp by @tango4j :: PR: #5481
    • Upperbound the torchmetrics version by @SeanNaren :: PR: #5537
    • Data parallel collect results by @michalivne :: PR: #5547
    • Fix log-rank-0-only logic by @mikolajblaz :: PR: #5555
    • Fixed Docker build by @borisfom :: PR: #5562
    • Patch hydra launch by @titu1994 :: PR: #5589
    • Fix race condition bug with hydra multirun by @titu1994 :: PR: #5594
    • Update Dockerfile to use numba==0.53.1 by @stevehuang52 :: PR: #5614
    • Fixed a missing import for gather_objects by @michalivne :: PR: #5622
  • v1.13.0 (Dec 7, 2022)

    Highlights

    NeMo ASR

    • Spoken Language Understanding (SLU) models based on Conformer encoder and transformer decoder
    • Support for codeswitched manifests during training
    • Support for Language ID during inference for ML models
    • Support of cache-aware streaming for offline models
    • Word confidence estimation for CTC & RNNT greedy decoding

    NeMo Megatron

    • Interleaved Pipeline schedule
    • Transformer Engine for GPT
    • HF T5v1.1 -> NeMo-Megatron conversion and finetuning/p-tuning
    • IA3 and Adapter Tuning (Tensor + Pipeline Parallel)
    • Pipeline Parallel Support for T5 Prompt Learning
    • MegatronNMT export

    NeMo TTS

    • TTS introductory tutorial
    • Phonemizer/espeak removal (Spanish/German)
    • Char-only support for Spanish/German models
    • Documentation Refactor

    NeMo Core

    • Upgrade to NGC PyTorch 22.09 container
    • Add pre-commit hooks
    • Exponential moving average (EMA) of weights during training (see the sketch below)
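
    A minimal sketch of the EMA update rule (illustrative; not NeMo's actual callback implementation):

    # ema_w <- decay * ema_w + (1 - decay) * w, applied after each optimizer step.
    import torch

    @torch.no_grad()
    def update_ema(ema_model, model, decay=0.999):
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1 - decay)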

    NeMo Models

    Detailed Changelogs

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.09
    

    Known Issues

    Issues
    • The pytest for RadTTSModel_export_to_torchscript fails intermittently due to random input values. Fixed in main.

    ASR

    Changelog
    • Add docs tutorial on kinyarwanda asr by @bene-ges :: PR: #4953
    • Asr codeswitch by @bmwshop :: PR: #4821
    • Add test for nested ASR model by @titu1994 :: PR: #5002
    • Greedy decoding confidence for CTC and RNNT by @GNroy :: PR: #4931
    • [ASR][Tools] RIR corpus generator by @anteju :: PR: #4927
    • Add Squeezeformer CTC model checkpoints on Librispeech by @titu1994 :: PR: #5121
    • adding loss normalization options to rnnt joint by @bmwshop :: PR: #4829
    • Asr concat dataloader by @bmwshop :: PR: #5108
    • Added ASR model comparison to SDE by @Jorjeous :: PR: #5043
    • Add scripts for converting Spoken Wikipedia to asr dataset by @bene-ges :: PR: #5138
    • ASR confidence bug fix for older Python versions by @GNroy :: PR: #5180
    • Update ASR Scores and Results by @titu1994 :: PR: #5254
    • [STT] Add Ru ASR Conformer-CTC and Conformer-Transducer by @ssh-meister :: PR: #5340

    TTS

    Changelog
    • [TTS] Adding speaker embedding conditioning in fastpitch by @subhankar-ghosh :: PR: #4986
    • [TTS] Remove PhonemizerTokenizer by @rlangman :: PR: #4990
    • [TTS] FastPitch speaker interpolation by @subhankar-ghosh :: PR: #4997
    • RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
    • [TTS] remove phonemizer.py by @XuesongYang :: PR: #5090
    • [TTS] Add NeMo TTS Primer Tutorial by @rlangman :: PR: #4933
    • [TTS] Add SpanishCharsTokenizer by @rlangman :: PR: #5135
    • Fixes for docs/typos + remove max_utts parameter from tarred datasets as it causes hang in training by @Kipok :: PR: #5118
    • refactor TTS documentation organization and add new contents. by @XuesongYang :: PR: #5137
    • [TTS][DOC] update models trained on HifiTTS dataset. by @XuesongYang :: PR: #5173
    • [TTS] Fix TTS Primer image markup by @rlangman :: PR: #5192
    • [TTS] deprecate TextToWaveform base class. by @XuesongYang :: PR: #5205
    • [TTS] remove the avoidance of circular imports by @XuesongYang :: PR: #5214
    • [TTS] remove LinVocoder and apply Vocoder as parent class. by @XuesongYang :: PR: #5206
    • [TTS] unify requirements_tts.txt and requirements_torch_tts.txt by @XuesongYang :: PR: #5232
    • Minor typo fixes in TTS tutorial by @redoctopus :: PR: #5266
    • Radtts 1.13 by @borisfom :: PR: #5451
    • Radtts 1.13 plus by @borisfom :: PR: #5457

    NLP / NMT

    Changelog
    • IA3 support for GPT and T5 by @arendu :: PR: #4909
    • Fix and refactor consumed samples save/restore for Megatron models. by @MaximumEntropy :: PR: #5077
    • Remove unsupported arguments from MegatronNMT by @MaximumEntropy :: PR: #5065
    • Update megatron interface to dialogue by @Zhilin123 :: PR: #4936
    • gpt ia3 CI tests by @arendu :: PR: #5140
    • Fix NMT Eval Sampler by @aklife97 :: PR: #5154
    • Add interleaved pipeline schedule to GPT by @ericharper :: PR: #5025
    • fix for bug in bignlp by @arendu :: PR: #5172
    • Fixes some args that were not removed properly for multilingual Megatron NMT by @MaximumEntropy :: PR: #5142
    • Fix absolute path in GPT Adapter CI tests by @arendu :: PR: #5184
    • Add ability to configure drop last batch for validation datasets with MegatronGPT by @shanmugamr1992 :: PR: #5067
    • Megatron Export Update by @Davood-M :: PR: #5343
    • Fix GPT generation when using sentencepiece tokenizer by @MaximumEntropy :: PR: #5413
    • Disable sync_batch_comm in validation_step for GPT by @ericharper :: PR: #5397
    • Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
    • Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475

    Text Normalization / Inverse Text Normalization

    Changelog
    • [Chinese text normalization] speed up graph building by @pengzhendong :: PR: #5128

    NeMo Tools

    Changelog
    • Added ASR model comparison to SDE by @Jorjeous :: PR: #5043

    Export

    Changelog
    • Fix export bug by @VahidooX :: PR: #5009
    • RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
    • Support TorchScript export for Squeezeformer by @titu1994 :: PR: #5164
    • Expose keep_initializers_as_inputs to Exportable class by @pks :: PR: #5052
    • Fix the self-attention export bug for cache-aware streaming Conformer by @VahidooX :: PR: #5114
    • replace ColumnParallelLinear with nn.Linear in export_utils by @arendu :: PR: #5217
    • Megatron Export Update by @Davood-M :: PR: #5343
    • Fix Conformer Export in 1.13.0 (cherry-pick from main) by @artbataev :: PR: #5446
    • export_utils bugfix by @Davood-M :: PR: #5480
    • Export fixes for Riva by @borisfom :: PR: #5496

    General Improvements and Bugfixes

    Changelog
    • don't use bfloat16 when in jit by @bmwshop :: PR: #5051
    • Set sync_batch_comm=False in prompt learning and inference by @MaximumEntropy :: PR: #5448
    • Fix a bug with positional vs key-word based argument passing in the transformer layer by @MaximumEntropy :: PR: #5475
    • Pin Transformers version to fix CI by @SeanNaren :: PR: #4955
    • Fix changelog builder (#4962) by @titu1994 :: PR: #4963
    • Checkpoint averaging class fix by @michalivne :: PR: #4946
    • Add ability to give seperate datasets for test, train and validation by @shanmugamr1992 :: PR: #4798
    • Add simple pre-commit file by @SeanNaren :: PR: #4983
    • Import pycuda.autoprimaryctx or pycuda.autoinit to init pycuda execut… by @liji-nv :: PR: #4951
    • Improvements to AMI script by @SeanNaren :: PR: #4974
    • clean warnings from tests and CI runs, and prepare for upgrade to PTL 1.8 by @nithinraok :: PR: #4830
    • Update libraries by @titu1994 :: PR: #5010
    • add close inactive issues and PRs github action. by @XuesongYang :: PR: #5015
    • Fix filename extraction in vad_utils.py by @GKPr0 :: PR: #4999
    • Add black to pre-commit by @SeanNaren :: PR: #5027
    • [CI] Enable previous build abort when new commit pushed by @SeanNaren :: PR: #5041
    • Tutorials and Docs for Multi-scale Diarization Decoder by @tango4j :: PR: #4930
    • Refactor output directory for MSDD Inference Notebook by @SeanNaren :: PR: #5044
    • text_memmap dataset index range testing fix by @michalivne :: PR: #5034
    • fix undefined constant in code example by @bene-ges :: PR: #5046
    • Text generation refactor and RETRO text generation implementation by @yidong72 :: PR: #4985
    • Lids by @bmwshop :: PR: #4820
    • Add datasets folder, add diarization datasets voxconverse/aishell by @SeanNaren :: PR: #5042
    • Fix the bugs in cache-aware streaming Conformer by @VahidooX :: PR: #5032
    • Bug fix - Limit val batches set to 1.0 by @shanmugamr1992 :: PR: #5023
    • [bug_fix] kv_channels is used when available by @arendu :: PR: #5066
    • Add spe_split_by_unicode_script arg by @piraka9011 :: PR: #5072
    • Transformer Engine Integration by @ericharper :: PR: #5104
    • Text memmap dataset index memory efficiency by @michalivne :: PR: #5056
    • Add NGC links for Aligner and FastPitch by @redoctopus :: PR: #5235
    • Fix link to inference notebook by @redoctopus :: PR: #5247
    • Fix links to speaker identification notebook by @SeanNaren :: PR: #5260
    • Fix bug into Dialogue tutorial by @Zhilin123 :: PR: #5277
    • PCLA tutorial typo fix by @jubick1337 :: PR: #5288
    • Fix dialogue tutorial bug by @Zhilin123 :: PR: #5297
    • small bugfix for r1.13.0 by @fayejf :: PR: #5310
    • Add italian model checkpoints by @Kipok :: PR: #5316
    • Pcla tutorial fixes by @jubick1337 :: PR: #5313
    • Fix issue with HF Model upload tutorial by @titu1994 :: PR: #5359
    • P&C LA tutorial fixes by @jubick1337 :: PR: #5354
    • Add SDP documentation by @erastorgueva-nv :: PR: #5274
    • [Bugfix] Added rm -f / wget- nc command in multispeaker sim notebook to r1.13.0 by @tango4j :: PR: #5375
    • Rename Speech Dataset Processor to Speech Data Processor by @erastorgueva-nv :: PR: #5378
    • fix for num worker 0 causing issues in losses after 1 epoch by @arendu :: PR: #5379
    • Fixed bug in notebook by @vadam5 :: PR: #5382
    • Force MHA QKV onto fp32 by @titu1994 :: PR: #5391
    • Fix for prompt table restore error by @vadam5 :: PR: #5393
    • Fix activation checkpoint args for T5 by @MaximumEntropy :: PR: #5410
    • Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5421
    • disable pc test by @ekmb :: PR: #5426
    • Revert Temporary hard code fix in PTL for CUDA Error by @yaoyu-33 :: PR: #5431
    • Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False by @MaximumEntropy :: PR: #5420
    • Add num layers check for full activation checkpointing by @MaximumEntropy :: PR: #5470
    • Cherry Pick T5 finetuning changes into 1.13 by @MaximumEntropy :: PR: #5478
    • T5 Eval bugfix by @Davood-M :: PR: #5521
    • added set_start_method + function param bugfix by @Davood-M :: PR: #5539
    • Remove notebook by @ericharper :: PR: #5548
    • Remove broadcast from T5 prompt learning inference by @MaximumEntropy :: PR: #5558
    • Fix all gather while writing to a file during T5 finetuning by @MaximumEntropy :: PR: #5561
  • v1.12.0 (Oct 10, 2022)

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.08

    ASR

    Changelog
    • Add support for RNNT Char/Word Timestamp Calculation by @titu1994 :: PR: #4665
    • add conditional logic to rnnt_wer to handle when arrays have no elements by @mgoldey :: PR: #4776
    • fix handling of the final word for rnnt word timestamps by @mgoldey :: PR: #4779
    • amend rnnt word timestamps by @mgoldey :: PR: #4782
    • fix type error in rnnt_wer.py, rnnt_wer_bpe.py, wer_bpe.py by @hainan-xv :: PR: #4822
    • add kab language asr models by @nithinraok :: PR: #4819
    • [Tutorial][ASR][Fix] Data paths in ASR with NeMo tutorial by @anteju :: PR: #4845
    • [ASR] Fix for multi-channel signals in AudioSegment by @anteju :: PR: #4824
    • [ASR] Generate multichannel noise by @anteju :: PR: #4870
    • Fix asr model order by @nithinraok :: PR: #4959
    • Fix ASR issues by @titu1994 :: PR: #4984
    • Fix diarization ASR inference link in notebook by @SeanNaren :: PR: #5016
    • Code switching by @KunalDhawan :: PR: #4784
    • Release SOTA Lang ID model by @fayejf :: PR: #5080
    • Stateless decoder for RNN-T by @hainan-xv :: PR: #4710
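
    As a hedged illustration of the RNNT char/word timestamp support added in PR #4665 above, the sketch below shows one way to request word timestamps from an RNN-T model. The model name, audio path, and the compute_timestamps key are assumptions drawn from NeMo's decoding configs, not an exact recipe from this release:

    import nemo.collections.asr as nemo_asr
    from omegaconf import open_dict

    model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_transducer_large")
    # Assumed decoding flag: ask the decoder to compute char/word timestamps.
    with open_dict(model.cfg.decoding):
        model.cfg.decoding.compute_timestamps = True
    model.change_decoding_strategy(model.cfg.decoding)

    hyps = model.transcribe(["audio.wav"], return_hypotheses=True)
    if isinstance(hyps, tuple):  # RNN-T transcribe may return (best, all) hypotheses
        hyps = hyps[0]
    print(hyps[0].timestep.get("word"))  # word-level offsets (assumed field layout)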

    TTS

    Changelog
    • [TTS] use consistent spline interpolation for fastpitch and hifigan. by @XuesongYang :: PR: #4679
    • TTS tokenizers moved to collections.common.tokenizers by @AlexGrinch :: PR: #4690
    • [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
    • ARP to IPA mapping, g2p_encode for IPATokenizer by @ekmb :: PR: #4850
    • IPA G2P bugfixes by @redoctopus :: PR: #4869
    • [TTS] add missing WikiHomograph data entries to CMUdict, updates to match new ipa set by @ekmb :: PR: #4886
    • [TTS] fix wrong g2p path. by @XuesongYang :: PR: #4902
    • [TTS] FastPitch training: speed up align_prior_matrix calculation by @racoiaws :: PR: #4718
    • [TTS] fix broken tutorial for MixerTTS. by @XuesongYang :: PR: #4949
    • [TTS] bugfix 'EnglishPhonemesTokenizer' object has no attribute 'encode_from_g2p' by @XuesongYang :: PR: #4992
    • [TTS] added missing German phoneme tokenizer by @XuesongYang :: PR: #5070
    • [TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names by @XuesongYang :: PR: #5087

    NLP / NMT

    Changelog
    • Fix bug intent slot classification tokenizer to dialogue by @Zhilin123 :: PR: #4694
    • Intent slot model onnx export test by @Zhilin123 :: PR: #4731
    • Fix megatron p tuning notebook by @nithinraok :: PR: #4741
    • Add support for Apex distributed Adam optimizer with GPT-3 by @timmoon10 :: PR: #4487
    • Fixes NLPModel's load from checkpoint due to PTL private function changes by @MaximumEntropy :: PR: #4755
    • Adapter tuning for Megatron GPT models by @arendu :: PR: #4717
    • Megatron Encoder Decoder models with RPE and PP > 2 by @MaximumEntropy :: PR: #4663
    • add kab language asr models by @nithinraok :: PR: #4819
    • add chinese to language doc and fix bug by @yzhang123 :: PR: #4834
    • Spoken Language Identification by @fayejf :: PR: #4846
    • Fix decoding bug for megatron enc-dec models with O2 by @MaximumEntropy :: PR: #4989
    • Updating Megatron LM conversion according to PTL 1.7 by @Davood-M :: PR: #5038
    • Adding RETRO model Faiss sharding index and KNN sharding index by @yidong72 :: PR: #4713
    • MLP Prompt Learning Encoder by @vadam5 :: PR: #4849
    • Update the prompt learning to handle large language model by @yidong72 :: PR: #4906

    Text Normalization / Inverse Text Normalization

    Changelog
    • [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
    • [Chinese text normalization]Chinese TN part in text_normalization by @mzxcpp :: PR: #4826
    • Fix zh tn by @yzhang123 :: PR: #5035
    • Bug fixes for parallel mp3 to wav conversion, PC notebook, update Readme for TN requirements by @ekmb :: PR: #5047
    • Added P&C lexical audio model by @jubick1337 :: PR: #4802

    Export

    Changelog
    • Intent slot model onnx export test by @Zhilin123 :: PR: #4731

    General Improvements

    Changelog
    • Fix logger reference by @SeanNaren :: PR: #4786
    • Fix error with class method reference in msdd by @SeanNaren :: PR: #4865
    • Add sync for logging calls to ensure aggregation across devices by @SeanNaren :: PR: #4876
    • Fix saving the last checkpoint when using val check interval by @SeanNaren :: PR: #4905
    • Add support for skipping validation on resume + extend saving last ckpt test by @SeanNaren :: PR: #4922
    • Move trainer calls for ssl models to training and validation steps only by @sam1373 :: PR: #4685
    • Change Num Partitions size expansion fix by @aklife97 :: PR: #4719
    • upgrade to PTL 1.7 by @nithinraok :: PR: #4672
    • Fixing outputs of infer() and use of NeMo length regulator helper by @borisfom :: PR: #4724
    • bug fix: enable async grad reduction when DP > 1 by @erhoo82 :: PR: #4740
    • Add LayerNorm1P, weight decay for LN and unscaled initialization by @mikolajblaz :: PR: #4743
    • Data Simulator by @chooper1 :: PR: #4686
    • jenkins data simulator fix by @nithinraok :: PR: #4751
    • Multiscale Diarization Decoder (MSDD) model and module files by @tango4j :: PR: #4650
    • Fix logging in gradient clipping with PTL 1.7.2 by @MaximumEntropy :: PR: #4769
    • Fix checkpoint restoring by @nithinraok :: PR: #4777
    • avoid data clipping after convolution with rir samples by @nithinraok :: PR: #4806
    • Fixed in_features dim if bidirectional is True by @farisalasmary :: PR: #4588
    • Fix float/integer type error in WER.update() by @fujimotos :: PR: #4816
    • [Speech Data Explorer] An option to explicitly specify the base dir by @anteju :: PR: #4678
    • adding instancenorm as an option for conv normalization by @bmwshop :: PR: #4827
    • Fix small spelling mistakes by @SeanNaren :: PR: #4839
    • [Tutorials] Fix matplotlib version and directory name in Multispeaker_Simulator by @anteju :: PR: #4804
    • Update diarization folder structure by @tango4j :: PR: #4823
    • Missing types in clustering by @SeanNaren :: PR: #4858
    • add new models by @Jorjeous :: PR: #4852
    • Fix decoding for T5 models with RPE by @MaximumEntropy :: PR: #4847
    • Update Speaker Diarization notebooks with unknown oracle_num_speakers by @fayejf :: PR: #4861
    • Fix mha bug by @yzhang123 :: PR: #4859
    • Updates to adapter training by @arendu :: PR: #4842
    • Changes to MSDD code after review, fix test log call by @SeanNaren :: PR: #4881
    • Fixed output of BERT to be [batch x seq x hidden] by @michalivne :: PR: #4887
    • Add AMI dataset script by @SeanNaren :: PR: #4864
    • Update label_models.py by @stevehuang52 :: PR: #4891
    • Update tutorials.rst for question answering by @Zhilin123 :: PR: #4895
    • removed unused imports for all domains. by @XuesongYang :: PR: #4901
    • Fix ptl_load_state not providing cls by @MaximumEntropy :: PR: #4914
    • Remove unused cv collection by @okuchaiev :: PR: #4907
    • Add mixed-representation config to PhonemizerTokenizer by @rlangman :: PR: #4904
    • Fix implicit bug in _AudioLabelDataset by @stevehuang52 :: PR: #4923
    • Fix and refactor label models by @fayejf :: PR: #4913
    • Sparrowhawk deployment fix by @ekmb :: PR: #4928
    • Upgrade to NGC PyTorch 22.08 Container by @ericharper :: PR: #4929
    • Fixes for Cherry Picked PRs by @titu1994 :: PR: #4962
    • Fix cherry pick workflow by @ericharper :: PR: #4964
    • check for active conda environment by @nithinraok :: PR: #4970
    • fix label models restoring issue from weighted cross entropy by @nithinraok :: PR: #4968
    • Add simple pre-commit file (#4983) by @SeanNaren :: PR: #4995
    • Fix bug in Squeezeformer Conv block by @titu1994 :: PR: #5011
    • Fix bugs by @Zhilin123 :: PR: #5036
    • Add black to pre-commit (#5027) by @SeanNaren :: PR: #5045
    • Fix bug in question answering tutorial by @Zhilin123 :: PR: #5049
    • Missing fixes from r1.11.0 to T5 finetuning eval by @MaximumEntropy :: PR: #5054
    • P&C docs by @jubick1337 :: PR: #5068
    • probabilites -> probabilities by @nithinraok :: PR: #5078
    • Notebook bug fixes by @vadam5 :: PR: #5084
    • update strategy in notebook from ddp_fork to dp by @Zhilin123 :: PR: #5088
    • Fix Unhashable type list for Numba Cuda spec augment kernel by @titu1994 :: PR: #5093
    • Remove numba import by @titu1994 :: PR: #5095
    • T5 prompt learning fixes missing from r1.11.0 merge by @MaximumEntropy :: PR: #5075
    • T5 Decoding with PP > 2 fix by @MaximumEntropy :: PR: #5091
    • Multiprocessing fix by @jubick1337 :: PR: #5106
    • [Bug fix] PC lexical + audio by @ekmb :: PR: #5109
    • bugfix: pybtex.database.InvalidNameString: Too many commas in author … by @XuesongYang :: PR: #5112

  • v1.11.0(Sep 8, 2022)

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.07

    ASR

    Changelog
    • Add ASR CTC Decoding module by @titu1994 :: PR: #4342
    • Fixing bugs in calling method ctc_decoder_predictions_tensor. by @VahidooX :: PR: #4414
    • Fixed WER initialization in ASR_with_Nemo notebook by @anteju :: PR: #4523
    • Update signature of Hypothesis alignments by @titu1994 :: PR: #4511
    • Add support for ASR Adapter Auxiliary Losses by @titu1994 :: PR: #4480
    • Catalan ASR NGC Resource by @stevehuang52 :: PR: #4576
    • Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
    • Add DALI char dataset support to SSL model by @piraka9011 :: PR: #4592
    • Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
    • Update Offline ASR with CTC Decoding by @titu1994 :: PR: #4608
    • Add Squeezeformer to ASR by @titu1994 :: PR: #4416
    • Fix ASR notebooks by @titu1994 :: PR: #4738
    • Add pretrained ASR models for Croatian by @anteju :: PR: #4682
    • Dataloader, collector, loss and metric for multiscale diarization decoder by @tango4j :: PR: #4187
    • Multilingual VAD model by @fayejf :: PR: #4734
    • Adding support for models trained with full context for cache-aware streaming. by @VahidooX :: PR: #4687
    • Fp16 support for Conformer by @bmwshop :: PR: #4571
    • Tiny VAD refactoring for postprocessing by @fayejf :: PR: #4625
    • Add silence handling for speaker diarization pipeline by @nithinraok :: PR: #4512
    • Add Bucketing support to TarredAudioToClassificationLabelDataset by @entn-at :: PR: #4465

    TTS

    Changelog
    • Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
    • Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
    • Add static method decorator. by @XuesongYang :: PR: #4443
    • Fix typo in HiFi-GAN config's max steps by @XuesongYang :: PR: #4450
    • Relaxed support for both CPUs and GPUs by @XuesongYang :: PR: #4461
    • Multi-speaker fastpitch model training recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4413
    • Created the finetuning Hifigan 44100Hz recipe on HUI-Audio-Corpus-German by @XuesongYang :: PR: #4478
    • Fix dataset parameter typo on tacotron2 example yaml by @saarus72 :: PR: #4471
    • Update cmudict by @jasro23 :: PR: #4510
    • Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
    • Fix off-by-1 bug in Beta Binomial Prior by @rlangman :: PR: #4616
    • G2P Aligner by @redoctopus :: PR: #4604
    • RADTTS ADLR-NEMO porting by @MikyasDesta :: PR: #4538
    • Fixed wrong pronunciations for r1.11. by @XuesongYang :: PR: #4677
    • Incremented the version number to 22.08 in tutorials. by @XuesongYang :: PR: #4684
    • Bugfix for missing configs. by @XuesongYang :: PR: #4725
    • Fix pynini install in TTS tutorials by @redoctopus :: PR: #4729
    • Updated config with a German IPA phoneme tokenizer by @XuesongYang :: PR: #4756
    • Add multi-speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4763
    • Add single male speaker German FastPitch and HiFiGAN NGC checkpoints by @XuesongYang :: PR: #4770
    • Deprecated old scripts for ljspeech. by @XuesongYang :: PR: #4780
    • Fix MixerTTS data loading index error by @redoctopus :: PR: #4811
    • G2P docs by @ekmb :: PR: #4841
    • NMESC speaker counting algorithm update by @tango4j :: PR: #4500

    NLP / NMT

    Changelog
    • Add O2 support for RETRO model by @yidong72 :: PR: #4411
    • Add MTEncDec Finetune support by @aklife97 :: PR: #4540
    • Fix metric setup for finetuning without a test set by @MaximumEntropy :: PR: #4585
    • T0 model and dataset by @MaximumEntropy :: PR: #4598
    • Add prompt learning for T5 by @HeyyyyyyG :: PR: #4391
    • Add MuTransfer Capability to RETRO model pretraining by @yidong72 :: PR: #4643
    • Label Smoothing in VocabParallelCrossEntropy by @MaximumEntropy :: PR: #4602
    • Megatron BART BOS / EOS bug fix by @michalivne :: PR: #4495
    • GPT Prompt Learning Improvements by @vadam5 :: PR: #4496
    • Megatron perceiver with tensor parallelism only by @MaximumEntropy :: PR: #4318
    • Refactor for punctuation model by @jubick1337 :: PR: #4367
    • Update megatron prompt learning interface to dialogue by @Zhilin123 :: PR: #4545
    • Removed NLPDDPPlugin Import check by @vadam5 :: PR: #4555
    • Option to disregard document boundaries for t5, bart, ul2 by @MaximumEntropy :: PR: #4481
    • Add Tokenization and Normalization pre-processing script for NMT by @aklife97 :: PR: #4557
    • Integrating support for GPT/T5/BART for Question Answering by @ameyasm1154 :: PR: #4532
    • NeMo Megatron: Add sequence parallelism and selective activation checkpointing (rebased) by @ericharper :: PR: #4380
    • Update megatron t5 interface to dialogue by @Zhilin123 :: PR: #4626
    • Additional sentencepiece args - Byte fallback, split digits, split_on_whitespace by @MaximumEntropy :: PR: #4525
    • Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training by @MaximumEntropy :: PR: #4396
    • NeMo Megatron Doc updates1 by @okuchaiev :: PR: #4633
    • Asymmetric Encoder and Decoder Configuration for Megatron Models by @MaximumEntropy :: PR: #4568
    • Add sentencepiece legacy arg to megatron tokenizer configs by @MaximumEntropy :: PR: #4659
    • Megatron encode function with RPE fix by @MaximumEntropy :: PR: #4692
    • Updates to NeMo Megatron OSS docs by @okuchaiev :: PR: #4709
    • Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
    • fix bug relating to ddp strategy in joint intent slot classification … by @Zhilin123 :: PR: #4762
    • Fix qa notebook typos and branch by @ericharper :: PR: #4788
    • Colab py37 compatibility megatron by @Zhilin123 :: PR: #4791
    • added/fixed export for Megatron models by @Davood-M :: PR: #4712
    • Fix providing glue in seq2seq eval by @MaximumEntropy :: PR: #4843
    • Fix Megatron NMT consumed samples and ckpt_to_nemo split rank by @MaximumEntropy :: PR: #4884
    • Fixing Megatron BERT output dimensions to [batch x sec x hidden] by @michalivne :: PR: #4894
    • Prompt Learning Inference Improvements by @vadam5 :: PR: #4566
    • MegaMolBART Compatibility by @michalivne :: PR: #4603

    Text Normalization / Inverse Text Normalization

    Changelog
    • Add ITN pt by @guidefloripa :: PR: #4516
    • add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
    • Fix ITN pt by @guidefloripa :: PR: #4623
    • Bug fix hundred in Audio-based, added method to split text into sentences by @ekmb :: PR: #4610
    • Fix itn pt time by @guidefloripa :: PR: #4630
    • Pin lightning version to be < 1.7.0 by @MaximumEntropy :: PR: #4660
    • G2P for OOV and heteronyms by @ekmb :: PR: #4624
    • Publish pretrained itn t5 model for English by @bene-ges :: PR: #4748
    • Added MLM Scoring by @yzhang123 :: PR: #4476

    Export

    Changelog
    • update fastpitch to add export controls by @blisc :: PR: #4509
    • Fix Fastpitch Export by @blisc :: PR: #4676
    • Changes to make Megatron NMT exportable by @Davood-M :: PR: #4499
    • Added/fixed export for Megatron models by @Davood-M :: PR: #4712

    Bugfixes

    Changelog
    • Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
    • Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
    • Fix tarred dataset len when num shards is not divisible by workers by @itzsimpl :: PR: #4553
    • Fix multiple dev/test datasets after restoring from checkpoint by @PeganovAnton :: PR: #4636
    • Fix/need different cache dirs for different datasets by @PeganovAnton :: PR: #4640
    • Improve mAES algorithm with patches by @titu1994 :: PR: #4662

    General Improvements

    Changelog
    • Option to disable mp in VAD via num_workers=1 by @gkucsko :: PR: #4317
    • Remove redundant bias expand by @xrennvidia :: PR: #4382
    • Add option for specifying wandb save_dir from config by @shan18 :: PR: #4379
    • Quick wav2vec fix. In-place operation adding convolutional positions … by @bonham79 :: PR: #4383
    • Fixing import error in some cases by @borisfom :: PR: #4401
    • Update with new conformer checkpoints. by @VahidooX :: PR: #4417
    • Wav2vec fix by @bonham79 :: PR: #4467
    • Relative Audio Paths by @stevehuang52 :: PR: #4470
    • Allow Noam lr scheduler to run for more than max_steps by @alancucki :: PR: #4472
    • Support for Different LRs with Param Groups by @stevehuang52 :: PR: #4508
    • Fix runtime check by @borisfom :: PR: #4501
    • Update finetune label models by @nithinraok :: PR: #4504
    • Weighted bucketing by @tbartley94 :: PR: #4530
    • Relative Audio Path by @stevehuang52 :: PR: #4520
    • Fix duplex inference with grammars by @ekmb :: PR: #4517
    • Add nsys profiling by @ericharper :: PR: #4539
    • Remove the variable that is not used in the context. by @XuesongYang :: PR: #4547
    • Adding multispeaker fastpitch and hifigan en model links to available… by @subhankar-ghosh :: PR: #4550
    • Add length ratio filtering script by @MaximumEntropy :: PR: #4551
    • Relative audio path in speech data explorer by @anteju :: PR: #4570
    • Dividing generative question-answering CI tests by @ameyasm1154 :: PR: #4600
    • Updating the default parameters in the example adapters config file by @shan18 :: PR: #4607
    • Improve normalize_batch ValueError message by @piraka9011 :: PR: #4614
    • Support listing Hugging Face model info by @titu1994 :: PR: #4619
    • Update diarization data loader to train meeting data by @tango4j :: PR: #4567
    • Fix HF check for model card info by @titu1994 :: PR: #4628
    • Add Github Action for auto webpage build by @titu1994 :: PR: #4645
    • Empty commit by @titu1994 :: PR: #4646
    • Force git config for doc build by @titu1994 :: PR: #4647
    • Correct branch name for github page source by @titu1994 :: PR: #4648
    • Adding lang id to shard by @bmwshop :: PR: #4649
    • Fix special tokens in vocab to arguments of constructor by @gwarmstrong :: PR: #4631
    • Fix apex for r1.11 by @michalivne :: PR: #4666
    • Update readme by @nithinraok :: PR: #4667
    • Removed trailing spaces in CI test by @vadam5 :: PR: #4671
    • Pynini dependency fix by @ekmb :: PR: #4674
    • Fix for incorrect batch size issue while decoding by @rilango :: PR: #4675
    • Fix to fetch config file by @nithinraok :: PR: #4699
    • Fix notebook for buffered inference by @titu1994 :: PR: #4703
    • Prompt Learning Notebook Bug Fix by @vadam5 :: PR: #4689
    • Add psutils to mock imports by @ericharper :: PR: #4728
    • Update Aligner model and tutorial to add NGC checkpoint loading by @redoctopus :: PR: #4714
    • Updated docs and doc paths by @vadam5 :: PR: #4754
    • Update r1.11 to new heteronyms list by @redoctopus :: PR: #4745
    • Update CMUdict with more recent 0.7b entries by @redoctopus :: PR: #4768
    • Add pynini to Docker container by @artbataev :: PR: #4733
    • Fix tutorial formatting by @redoctopus :: PR: #4778
    • Fix initializing weights from ptl ckpt with exclude by @sam1373 :: PR: #4807
    • T5 prompt learning fixes by @MaximumEntropy :: PR: #4771
    • Updated inference code and squad scripts by @vadam5 :: PR: #4835
    • Fix uppercasing mismatch for IPA heteronyms by @redoctopus :: PR: #4860
    • Set the number of workers to 0 for validation and test sets in all enc-dec models by @MaximumEntropy :: PR: #4790
    • Fix mha by @yzhang123 :: PR: #4866
    • ipa bug fix by @ekmb :: PR: #4871
    • Added utf8 encoding by @vadam5 :: PR: #4892
    • Fix question answering docs r1p11 by @Zhilin123 :: PR: #4897
  • v1.10.0(Jul 1, 2022)

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.05

    Known Issues

    Issues
    • Tutorial: Fastpitch_Training_GermanTTS.ipynb is experimental and still being tested.

    ASR

    Changelog
    • Multilang asr tutorial by @bmwshop :: PR: #3931
    • Add ASR with Adapters Tutorial by @titu1994 :: PR: #4149
    • Add support for Decoder + Joint Adapters for ASR by @titu1994 :: PR: #4189
    • updating PretrainedModelInfo and benchmark sheet for ASR models by @krishnacpuvvada :: PR: #4259
    • Remove verbose flag from Dali Index Creator by @titu1994 :: PR: #4309
    • updating PretrainedModelInfo for ASR SSL models by @krishnacpuvvada :: PR: #4292
    • Adding docs for ASR SSL by @krishnacpuvvada :: PR: #4303
    • Add ASR Scores to Docs by @titu1994 :: PR: #4412
    • [ASR] Replace all paths with /content/ by @titu1994 :: PR: #4427
    • added conformer mandarin model. by @VahidooX :: PR: #4201
    • Runtime audio segment sampling for SSL by @krishnacpuvvada :: PR: #4126

    TTS

    Changelog
    • [TTS] Add volume passthrough to fp for riva by @blisc :: PR: #4167
    • Update TTS Configs from LAMB to AdamW by @redoctopus :: PR: #4233
    • Add benchmark=false to all TTS configs by @redoctopus :: PR: #4263
    • [TTS] add staticmethod decoration for BetaBinomialInterpolator by @XuesongYang :: PR: #4319
    • [TTS] capture exception of non-supported windows. by @XuesongYang :: PR: #4320
    • [TTS] enforced pin_memory = True by @XuesongYang :: PR: #4341
    • [TTS] Training Fastpitch on German text and phonemes and finetuning HiFi-GAN on predicted mels by @aroraakshit :: PR: #4266
    • IPA support for TTS by @redoctopus :: PR: #4310
    • Bits of RADTTS support by @borisfom :: PR: #4343

    NLP / NMT

    Changelog
    • Megatron NMT Restore from T5/BART and finetune by @MaximumEntropy :: PR: #3977
    • Binarized memmap dataloader for Megatron NMT, Inference and checkpoint -> nemo by @MaximumEntropy :: PR: #4137
    • Use unique names for temporary directories in punctuation and capitalization tests by @PeganovAnton :: PR: #4298
    • Removes debug logging statements in Megatron NMT by @MaximumEntropy :: PR: #4312
    • Raise error if trainer object is None for MegatronBaseModel by @MaximumEntropy :: PR: #4356
    • Punctuation and capitalization tests race condition by @PeganovAnton :: PR: #4399
    • unify intent slot dataset util functions in tutorials by @Zhilin123 :: PR: #4445
    • Fix for TP=2,PP=2 decoding with megatron encoder-decoder models by @MaximumEntropy :: PR: #4484
    • Add RETRO model for pretraining by @yidong72 :: PR: #4121
    • Add async grad allreduce and chunk optimization by @xrennvidia :: PR: #4084
    • Implements the UL2 Dataset and config by @MaximumEntropy :: PR: #4184
    • Add RETRO indexed dataset and inference by @yidong72 :: PR: #4220
    • Finetune T5 on the prefix-lm objective by @MaximumEntropy :: PR: #4328
    • Fuse bias with geglu in ParallelMLP by @xrennvidia :: PR: #4213
    • Support larger datasets for question answering by @Zhilin123 :: PR: #4205
    • Refactor bias act fusion by @MaximumEntropy :: PR: #4376
    • Prompt Learning Pipeline Parallel by @vadam5 :: PR: #4291
    • Text memmap dataset by @michalivne :: PR: #4068
    • Fuse grad division into async grad allreduce by @xrennvidia :: PR: #4327

    Text Normalization / Inverse Text Normalization

    Changelog
    • [TN] WFST to normalize punctuation by @ekmb :: PR: #4108
    • [TN/TTS] Add graph to tag IPA words/sentences in square brackets and leave them unchanged by @ekmb :: PR: #4323
    • Tn tutorial by @yzhang123 :: PR: #4090
    • Tn add rules by @yzhang123 :: PR: #4302
    • Tn install by @yzhang123 :: PR: #4055
    • Fix electronic bug, new time ITN rule by @ekmb :: PR: #4355
    • [TN] Bug fix: expand serial coverage of unknown symbol, remove constraints from word graph by @ekmb :: PR: #4463
    • Configure T5 finetuning metrics by @MaximumEntropy :: PR: #4122

    Export

    Changelog
    • Added support for subnet export by @borisfom :: PR: #4299

    Core

    Changelog
    • Add Module-level Adapters, Save-Restore and tests by @titu1994 :: PR: #4114
    • Add NeMo Adapters tutorial to Core by @titu1994 :: PR: #4311 (see the sketch after this list)
    • NeMo Model to HF Hub Upload Tutorial by @titu1994 :: PR: #4322
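
    To make the adapter items above concrete, here is a minimal sketch of the module-level adapter API that the tutorial covers; the checkpoint name, adapter name, and hyperparameters are illustrative assumptions rather than values from the tutorial:

    from nemo.collections.asr.models import ASRModel
    from nemo.collections.common.parts.adapter_modules import LinearAdapterConfig

    model = ASRModel.from_pretrained("stt_en_conformer_ctc_small")  # assumed checkpoint name
    adapter_cfg = LinearAdapterConfig(in_features=model.cfg.encoder.d_model, dim=32)
    model.add_adapter(name="demo_adapter", cfg=adapter_cfg)  # hypothetical adapter name
    model.set_enabled_adapters(name="demo_adapter", enabled=True)
    model.freeze()                     # keep base weights fixed
    model.unfreeze_enabled_adapters()  # train only the adapter parameters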

    General Improvements and Fixes

    Changelog
    • Update container to 22.05 by @ericharper :: PR: #4329
    • Fix PTL step calculation by @titu1994 :: PR: #4307
    • [NLP] P&C Fix multi node cache issue, add pynini guard by @ekmb :: PR: #4410
    • NeMo Megatron GPT Unit Tests by @ericharper :: PR: #4099
    • Add the PP2 GPT eval CI test by @yidong72 :: PR: #4168
    • BigNLP perf regression fix by @MaximumEntropy :: PR: #4267
    • Fixes for Megatron Base Model Artifacts by @MaximumEntropy :: PR: #4248
    • Fix a wrong description in offline_diarization_with_asr.yaml by @tango4j :: PR: #4141
    • bugfix for import error in Offline_ASR_with_VAD_for_CTC_models by @fayejf :: PR: #4424
    • [Fix] ASR RNNT Tutorial by @stevehuang52 :: PR: #4352
    • [TTS] Fix Hifigan finetune tutorial by @subhankar-ghosh :: PR: #4182
    • [Bugfix][TTS] wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4432
    • [bugfix][TTS] pitch, voiced_mask, prob_voiced have the same values. by @XuesongYang :: PR: #4435
    • [TTS] [bugfix] German FastPitch HiFi-GAN tutorial and lr by @aroraakshit :: PR: #4459
    • [TTS] [bugfix] update indentation by @aroraakshit :: PR: #4468
    • Fix some 's' cases for IPA G2P by @redoctopus :: PR: #4460
    • Fix ASR Typos in tutorials by @titu1994 :: PR: #4384
    • Use unique names for temporary directories in punctuation and capitalization tests by @PeganovAnton :: PR: #4298
    • Punctuation and capitalization tests race condition by @PeganovAnton :: PR: #4399
    • Dialogue tasks unit test by @Zhilin123 :: PR: #4112
    • fix error by @yzhang123 :: PR: #4120
    • fix typo by @stevehuang52 :: PR: #4134
    • Fix cmudict typo: phoneme YI1 -> IY1 in NVME by @redoctopus :: PR: #4139
    • transcribe: scan directories recursively by @virajkarandikar :: PR: #4159
    • Add 44KHz yaml file for Fastpitch training by @subhankar-ghosh :: PR: #4161
    • [bugfix] consistent highfreq to both fastpitch and hifigan in their 44100 configs. by @XuesongYang :: PR: #4177
    • Upperbound OmegaConf by @titu1994 :: PR: #4191
    • Prompt tokenization bugfix by @vadam5 :: PR: #4197
    • Updated to Prompt Learning Model to Use Distributed Sampler by @vadam5 :: PR: #4208
    • Freesound fixes by @virajkarandikar :: PR: #4155
    • Patch Hydra by @titu1994 :: PR: #4202
    • Prompt Learning Model Saving Changes by @vadam5 :: PR: #4212
    • Speakertasks manifest by @yzhang123 :: PR: #4185
    • SSL Multi-loss Update by @sam1373 :: PR: #4186
    • Support load_adapters with just adapter_name by @titu1994 :: PR: #4255
    • Add special tokens to existing (trained) SentencePiece models by @aklife97 :: PR: #4203
    • Fixing the speed slow-down for speech models. by @VahidooX :: PR: #4260
    • Fix and add functions in speaker utils by @tango4j :: PR: #4138
    • pt container 1.10->1.11.0 by @ekmb :: PR: #4273
    • ssl fixes by @sam1373 :: PR: #4268
    • Save Virtual Prompt Weights Only by @vadam5 :: PR: #4237
    • add 'relative positional embedding (RPE)' feature - re-creating after… by @khcs :: PR: #4256
    • Docs CSS: Update h4 tag style for the right side bar by @nickolyamba :: PR: #4284
    • Fix Docs CSS: align docs left and increase width for large screens by @nickolyamba :: PR: #4154
    • remove redundant condition for fastpitch. by @XuesongYang :: PR: #4281
    • [Add] automatically resolving relative audio path by @stevehuang52 :: PR: #4277
    • forcing conv subsampling to 32 bit by @bmwshop :: PR: #4293
    • Add library name and version when downloading from the Hugging Face Hub by @osanseviero :: PR: #4304
    • clear access registry when adding if not empty by @sam1373 :: PR: #4306
    • [collections] bugfix for capturing NotImplementedError of non-supported sup data types. by @XuesongYang :: PR: #4297
    • Adjust lr for AdamW from LAMB default by @redoctopus :: PR: #4308
    • Fix bugs in indexed dataset exam script by @yidong72 :: PR: #4325
    • Torchaudio installation fix by @GNroy :: PR: #4330
    • Speedup the speech commands dataset processing script by @shan18 :: PR: #4347
    • fix wrong requirement by @yzhang123 :: PR: #4349
    • Refactored path to manifest by @treacker :: PR: #4251
    • Fix the post LN bug by @yidong72 :: PR: #4350
    • [Fix] Hanging for Fully Randomized Bucketing by @stevehuang52 :: PR: #4348
    • Auto-switch the input dimensions in the conformer encoder adapter to correct value by @shan18 :: PR: #4354
    • Set headscale false by @MaximumEntropy :: PR: #4364
    • Add wandb as dependency by @titu1994 :: PR: #4365
    • Fix trainer.global_steps in WandB logging by @titu1994 :: PR: #4366
    • Finetuning changes for BART by @MaximumEntropy :: PR: #4003
    • Make position embedding expansion specific to a batch to avoid checkpoint size mismatches by @MaximumEntropy :: PR: #4357
    • Correct support for dataclasses in default module dim by @titu1994 :: PR: #4372
    • Fix no attribute 'pad_id' bug when pre-processing by @yidong72 :: PR: #4377
    • Question answering bug fix by @Zhilin123 :: PR: #4381
    • Docs for NeMo Adapters by @titu1994 :: PR: #4369
    • Update NeMo docs by @titu1994 :: PR: #4397
    • Fixing import error in some cases by @borisfom :: PR: #4402
    • Fix tutorial typos and docs by @titu1994 :: PR: #4415
    • Add reconfigure on validation epoch start by @MaximumEntropy :: PR: #4393
    • Re-apply fixes from r1.9.0 by @redoctopus :: PR: #4425
    • Fix hanging issue by multiprocessing in SD tutorial and add ETA for VAD processing by @fayejf :: PR: #4405
    • Fix notebook text by @yidong72 :: PR: #4438
    • Update dialogue tutorial version by @Zhilin123 :: PR: #4437
    • Docs: Add table overflow handling by @nickolyamba :: PR: #4441
    • Docs: Decrease Font Size on Tables by @nickolyamba :: PR: #4444
    • Notebook bug fix: add subfolder by @ekmb :: PR: #4442
    • Fix typo in HiFi-GAN config's max steps by @redoctopus :: PR: #4446
    • Updated notebook to fix batch configuration and precision bugs by @vadam5 :: PR: #4447
    • fix branch in link by @ekmb :: PR: #4454
    • t5-rpe-fix targeting r1.10.0; raise exception for PP>2. by @khcs :: PR: #4469
    • Add kwargs to exact string match by @MaximumEntropy :: PR: #4479
  • v1.9.0(Jun 3, 2022)

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.04

    ASR

    Changelog
    • Fix changed function name in offline vad asr notebook by @fayejf :: PR: #4007
    • NeMo Adapters Support + ASR Adapters by @titu1994 :: PR: #3942
    • Update ASR configs with num_workers and pin_memory by @titu1994 :: PR: #4270
    • Verbose k2 install, skip if failed by @GNroy :: PR: #4289
    • Torch conversion for VAD-Diarization pipeline by @tango4j :: PR: #3930
    • Multiprocess improvements by @nithinraok :: PR: #4127

    TTS

    Changelog
    • Tn tts e by @ekmb :: PR: #3988
    • Remove AudioToCharWithPriorAndPitchDataset dependency from fastpitch by @subhankar-ghosh :: PR: #4008
    • Deprecation by @blisc :: PR: #4082
    • FastPitch FT notebook - Improving Speech Quality clarifications by @redoctopus :: PR: #3954

    NLP / NMT

    Changelog
    • Option to remove bias terms from Megatron transformers by @MaximumEntropy :: PR: #3973
    • Add NMT method to translate with TN/ITN pre/post-processing by @MaximumEntropy :: PR: #4009
    • Fix Punctuation and Capitalization model batching. An issue with shuffling. by @PeganovAnton :: PR: #4050
    • Fix GPT model parallel eval by @yidong72 :: PR: #4054
    • Updating with main by @jpilaul :: PR: #4073
    • Cherry-pick fix for megatron ckpt conversion script when using BCP by @ericharper :: PR: #4089
    • Check implicit grad acc in GLUE dataset building by @MaximumEntropy :: PR: #4123
    • Fix/punctuation avoid overwritting tmp files by @PeganovAnton :: PR: #4144
    • Fix/punctuation/trainer required for setting test data by @PeganovAnton :: PR: #4199
    • Raise error if bicleaner is not installed in NMT Data preprocessing notebook by @MaximumEntropy :: PR: #4264
    • Fix epoch end for NeMo NMT by @MaximumEntropy :: PR: #4265
    • Update YAML with trainer.benchmark=False for NLP by @MaximumEntropy :: PR: #4261
    • Continuous prompt refactor by @vadam5 :: PR: #3877
    • T5 finetuning for generic small text-to-text datasets by @MaximumEntropy :: PR: #4032

    Text Normalization / Inverse Text Normalization

    Changelog
    • Tn special text support by @yzhang123 :: PR: #3969
    • Tn update numbers by @yzhang123 :: PR: #3992
    • Tn tts e by @ekmb :: PR: #3988
    • Itn vi by @yzhang123 :: PR: #4029
    • Refactor tn data folder, and update of measure by @yzhang123 :: PR: #4028
    • Remove conda dependency for tn by @yzhang123 :: PR: #4057
    • Tn electronic by @yzhang123 :: PR: #4053
    • ThutmoseTaggerModel, a new model for inverse text normalization by @bene-ges :: PR: #4011
    • Tutorial on ITN with Thutmose tagger and small fixes by @bene-ges :: PR: #4117
    • Cleaned up TN/ITN doc by @yzhang123 :: PR: #4119
    • Update default for SH by @ekmb :: PR: #4135
    • Update ContextNet version by @titu1994 :: PR: #4207

    NeMo Tools

    Changelog
    • Added exception handling for audio player in SDE by @vsl9 :: PR: #4077

    NeMo Core

    Changelog
    • Support pre-extracted nemo checkpoint for restoration by @titu1994 :: PR: #4061 (see the sketch after this list)
    • Fix type checking to be compatible with named tuples by @artbataev :: PR: #3986
    • Update num worker calculation due to PTL flag changes by @redoctopus :: PR: #4056
    • Refresh NeMo documentation to Sphinx Book Theme by @titu1994 :: PR: #3996
    • Generalize adapter merge strategy for future adapters by @titu1994 :: PR: #4091
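
    As a hedged sketch of the pre-extracted checkpoint restoration above (PR #4061), the connector below points restore_from at an already-untarred .nemo directory; all paths are illustrative:

    from nemo.collections.asr.models import ASRModel
    from nemo.core.connectors.save_restore_connector import SaveRestoreConnector

    connector = SaveRestoreConnector()
    # Assumed directory produced beforehand with: tar -xf model.nemo -C extracted/
    connector.model_extracted_dir = "extracted/"
    model = ASRModel.restore_from("model.nemo", save_restore_connector=connector)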

    General Improvements

    Changelog
    • Fix Punctuation and Capitalization model batching. An issue with shuffling. by @PeganovAnton :: PR: #4050
    • Fix restoring from checkpoint for case when is provided by @PeganovAnton :: PR: #4136
    • Fix/punctuation avoid overwritting tmp files by @PeganovAnton :: PR: #4144
    • Fix/punctuation/trainer required for setting test data by @PeganovAnton :: PR: #4199
    • Ability to set log_prediction to false by @bmwshop :: PR: #3929
    • Glu activation variants by @MaximumEntropy :: PR: #3951
    • Ranking merge by @yzhang123 :: PR: #3906
    • Fix path in doc by @nithinraok :: PR: #3979
    • Adding fisher audio conversion script from old NeMo branch by @jbalam-nv :: PR: #3991
    • improvements to get_commonvoice_data script by @bmwshop :: PR: #3999
    • Bugfix and variable name change for clustering code by @tango4j :: PR: #4023
    • Exp manager log rank 0 only arguments by @MaximumEntropy :: PR: #4026
    • Force import test on PR by @titu1994 :: PR: #4037
    • Drop support for kaldi-io by @titu1994 :: PR: #4042
    • Cherry pick HF integration and bug fixes from 1.8.1 by @ericharper :: PR: #4052
    • Make saving prompt encoder embeddings non-configurable by @vadam5 :: PR: #4071
    • Replace sampled tokens with EOD after EOD has been sampled once by @vadam5 :: PR: #4070
    • Added answer only loss for prompt learning by @vadam5 :: PR: #4069
    • added stacking support to conformer. by @VahidooX :: PR: #4045
    • Update LJSpeech whitelist file path by @redoctopus :: PR: #4078
    • Added check for microbatch calculator by @vadam5 :: PR: #4043
    • Prompt Learning Docs by @vadam5 :: PR: #4046
    • Fix link to prompt tuning page by @SeanNaren :: PR: #4081
    • Add docs for by @titu1994 :: PR: #4079
    • Dialogue task by @Zhilin123 :: PR: #3884
    • RMSNorm, Normformer and fixes from merging 1.8.0 into main by @MaximumEntropy :: PR: #4048
    • Correct link to PTL by @titu1994 :: PR: #4088
    • Added encoder and decoder modules for RETRO model by @yidong72 :: PR: #4038
    • Upgrade container to NGC PyTorch 22.04 by @ericharper :: PR: #4085
    • Tarred fix label models by @nithinraok :: PR: #4092
    • Fix link to tutorial in dialogue docs by @Zhilin123 :: PR: #4093
    • Prompt learning Notebook by @vadam5 :: PR: #4031
    • Add more papers by @yzhang123 :: PR: #4097
    • Ignore speakers with few utterances by @nithinraok :: PR: #3722
    • Access mixin by @sam1373 :: PR: #4098
    • Add CharParser for Cyrillic letters by @karpov-nick :: PR: #4101
    • Restored tests previously disabled for 22.03 base by @borisfom :: PR: #4109
    • Add augmentation to label models by @nithinraok :: PR: #4113
    • Fix register artifacts by @ramanathan831 :: PR: #4116
    • Fix typo by @yzhang123 :: PR: #4140
    • bug_fix_diarization_manifest_creation by @yzhang123 :: PR: #4125
    • Tacotron2 retrain by @treacker :: PR: #4103
    • WaveGlow input type fixes by @redoctopus :: PR: #4151
    • Notebooks' link, typo and import fix by @fayejf :: PR: #4158
    • Thutmose tagger bug fixes by @bene-ges :: PR: #4162
    • Update speaker docs by @nithinraok :: PR: #4164
    • Set plugin to None when no apex by @ekmb :: PR: #4171
    • Fix doc by @yzhang123 :: PR: #4152
    • Small import name fix by @fayejf :: PR: #4180
    • Rename folder VAD -> vad by @fayejf :: PR: #4163
    • Fix the server key value problem in the notebook by @yidong72 :: PR: #4196
    • Pin omegaconf for r1.9.0 by @ericharper :: PR: #4195
    • Fix cherrypicks by @titu1994 :: PR: #4204
    • Fix bugs for dialogue tutorial by @Zhilin123 :: PR: #4211
    • Tacotron2 1.9.0 bugfixes by @redoctopus :: PR: #4209
    • Add docs for Thutmose Tagger by @bene-ges :: PR: #4173
    • Dialogue tutorial fix by @Zhilin123 :: PR: #4221
    • Fix syntax error in ipynb-file by @bene-ges :: PR: #4228
    • Fix JSON serialization problem by @yidong72 :: PR: #4235
    • Prompt Learning Typo Fixes by @vadam5 :: PR: #4238
    • Fixing bug 3642622 by @pasandi20 :: PR: #4250
    • Fix broken link in the tutorial by @bene-ges :: PR: #4257
    • Prompt learning notebook bugfix by @vadam5 :: PR: #4262
    • Fix missing validation dataset, whitelist certain keywords for datasets by @titu1994 :: PR: #4269
    • Set Save on train end to false by @vadam5 :: PR: #4274
    • Updated config to fix CI test OOM error by @vadam5 :: PR: #4279
    • Changed total virtual prompt tokens by @vadam5 :: PR: #4295
  • v1.8.2(Apr 26, 2022)

    Known Issues

    • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.

    TTS

    • Fastpitch Tutorial fix by @subhankar-ghosh :: PR: #4044
  • v1.8.1(Apr 22, 2022)

    Known Issues

    • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.

    TTS

    • Restore_buffer bug fix and update NeMo checkpoint URL by @subhankar-ghosh :: PR: #4041

    Hugging Face Hub Integration

    • Add support for Huggingface Hub to NeMo by @titu1994 :: PR: #4030
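
    As a hedged illustration of the Hub support above, one simple path is to download a .nemo checkpoint from the Hugging Face Hub and restore it locally; the repo id and filename below are assumptions for illustration only:

    from huggingface_hub import hf_hub_download
    from nemo.collections.asr.models import ASRModel

    path = hf_hub_download(
        repo_id="nvidia/stt_en_conformer_ctc_large",  # assumed Hub repo
        filename="stt_en_conformer_ctc_large.nemo",   # assumed checkpoint filename
    )
    model = ASRModel.restore_from(path)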

    Bug Fixes

    • Added apex import guard back
    • Patch commons.py by @ericharper :: PR: #4039
    • Fixing pretrained name by @borisfom :: PR: #4022
    • Add back Citrinet zh by @titu1994 :: PR: #4040
  • v1.8.0(Apr 20, 2022)

    Known Issues

    Issues
    • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.
    • Pytests for Vietnamese inverse text normalization are failing; fixed in main

    Container

    For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.03
    

    ASR

    Changelog
    • ASR SSL Update by @sam1373 :: PR: #3714
    • Polylang asr by @bmwshop :: PR: #3721
    • Test grad accumulation for RNNT loss by @titu1994 :: PR: #3731
    • Add readme files describing model execution flow for ASR tasks by @titu1994 :: PR: #3812
    • add fr asr ckpt to doc by @yzhang123 :: PR: #3809
    • Fix asr tests in 22.02 by @titu1994 :: PR: #3823
    • Add new pretrained Spanish ASR models by @erastorgueva-nv :: PR: #3830
    • Documentation updates for ASR by @titu1994 :: PR: #3846
    • Offline VAD+ASR tutorial by @fayejf :: PR: #3828
    • Added Hindi and Marathi Models in Nemo pretrained ASR_CTC_BPE models … by @meghmak13 :: PR: #3856
    • Add a missing line to ASR_with_NeMo.ipynb by @lifefeel :: PR: #3908
    • Multilang asr models by @bmwshop :: PR: #3907
    • added stt_en_conformer_transducer_large_ls to NGC by @VahidooX :: PR: #3920
    • Fix DALI test on 22.03 by @titu1994 :: PR: #3911
    • Adding RNN encoder for LSTM-Transducer and LSTM-CTC models by @VahidooX :: PR: #3886
    • Fix issue with Segfault in ASR models by @titu1994 :: PR: #3956
    • Added Mandarin pretrained Conformer-Transducer-Large model trained on AISHELL2. by @VahidooX :: PR: #3970

    TTS

    Changelog
    • Bump TTS deprecation version to 1.9 by @blisc :: PR: #3955
    • Add pinned pynini and scipy installs to TTS training tutorial by @redoctopus :: PR: #3967
    • Compatability override to load_state_dict for old TTS checkpoints by @redoctopus :: PR: #3978

    NLP / NMT

    Changelog
    • Use worker processes for data preprocessing by @crcrpar :: PR: #3665
    • Set find_unused_parameters to False in GPT example script by @ericharper :: PR: #3837
    • GPT multinode eval by @ericharper :: PR: #3821
    • Fix MegatronPretrainingRandomSampler by taking into account by @crcrpar :: PR: #3826
    • Add slot filling into DST Generative model by @Zhilin123 :: PR: #3695
    • Disable nvfuser for gpt by @ericharper :: PR: #3845
    • Multi-Label Joint Intent Slot Classification by @chenrichard10 :: PR: #3742
    • fix bug in intent/slot model reloading by @carolmanderson :: PR: #3874
    • Make test_gpt_eval unit test less strict by @yidong72 :: PR: #3898
    • Comment gpt resume ci test by @MaximumEntropy :: PR: #3901
    • Neural Machine Translation with Megatron Transformer Models (Tensor Parallel and Tarred Datasets Only) by @MaximumEntropy :: PR: #3861
    • Megatron support by @ramanathan831 :: PR: #3893
    • Populate the GPT/BERT with uploaded models by @yidong72 :: PR: #3885
    • Megatron BART by @michalivne :: PR: #3666
    • Additional Japanese processor for NMT that uses MeCab segmentation. Fix for BLEU in one-many NMT by @MaximumEntropy :: PR: #3889
    • NMT GRPC sever URL fix by @MaximumEntropy :: PR: #3918
    • Megatron legacy conversion support by @ramanathan831 :: PR: #3919
    • Update max_epochs on megatron configs by @ericharper :: PR: #3958
    • Fix NMT variable passing bug by @aklife97 :: PR: #3985
    • Fix nemo megatron restore with artifacts by @ericharper :: PR: #3997
    • Fix megatron notebook by @ramanathan831 :: PR: #4004
    • Megatron work-arounds by @borisfom :: PR: #3998
    • Add T5 model P-tuning support by @yidong72 :: PR: #3768
    • Make index mappings dir configurable by @ericharper :: PR: #3868
    • T5 pipeline parallel by @MaximumEntropy :: PR: #3750

    Text Normalization / Inverse Text Normalization

    Changelog
    • Tn es by @bonham79 :: PR: #3632
    • Fix single GPU training issue + change deprecated Lightning args by @aklife97 :: PR: #4010

    Export

    Changelog
    • Conformer WARs for TRT8.2 by @borisfom :: PR: #3787
    • bert_module: fix inputs of export model by @virajkarandikar :: PR: #3815
    • Exports 22.03 war by @borisfom :: PR: #3957

    Bugfixes

    Changelog
    • patch librosa deprecation and fix by @fayejf :: PR: #3818

    General Improvements

    Changelog
    • Pynini pip by @yzhang123 :: PR: #3726
    • upgrade PTL trainer flags by @nithinraok :: PR: #3589
    • Updated Speech Data Explorer by @vsl9 :: PR: #3710
    • Fix spelling error in num_workers parameter to actually set number of dataset workers specified in yaml configs by @themikem :: PR: #3800
    • Support for Camembert Huggingface bert-like models by @itzsimpl :: PR: #3799
    • Update to 22.02 by @ericharper :: PR: #3771
    • Fixing the defaults of conformer models in the config files by @VahidooX :: PR: #3836
    • Fix T5 Encoder Mask while decoding by @MaximumEntropy :: PR: #3838
    • fix: multilingual transcribe does not require lang id param by @bmwshop :: PR: #3833
    • Misc improvements by @titu1994 :: PR: #3843
    • Change container by @MaximumEntropy :: PR: #3844
    • Making gender assignment random for cardinals, fractions, and decimal… by @bonham79 :: PR: #3759
    • Jenkinsfile test changes by @chenrichard10 :: PR: #3879
    • Adding a RegEx tokenizer by @michalivne :: PR: #3839
    • enable bias+dropout+add fusion with nvfuser at inference by @erhoo82 :: PR: #3869
    • Add text_generation_util to support TopK, TopP sampling + Tabular Data Generation. by @yidong72 :: PR: #3834
    • Ptl requirements bound by @MaximumEntropy :: PR: #3903
    • doc links update by @ekmb :: PR: #3891
    • add citations by @yzhang123 :: PR: #3902
    • Update NeMo CI to 22.03 by @MaximumEntropy :: PR: #3900
    • Add domain groups to changelog builder by @titu1994 :: PR: #3904
    • add input threshold by @yzhang123 :: PR: #3913
    • improvements to commonvoice data script by @bmwshop :: PR: #3892
    • fixes to the cleanup flag by @bmwshop :: PR: #3921
    • Upgrade to PTL 1.6.0 by @ericharper :: PR: #3890
    • JSON output from diarization now includes sentences. Optimized senten… by @demsarjure :: PR: #3897
    • Stateless timer fix for PTL 1.6 by @MaximumEntropy :: PR: #3925
    • fix save_best missing ckpt bug, update for setup_tokenizer() changes by @ekmb :: PR: #3932
    • Fix tarred sentence dataset length by @MaximumEntropy :: PR: #3941
    • remove old doc by @ekmb :: PR: #3946
    • Fix issues with librosa deprecations by @titu1994 :: PR: #3950
    • Fix notebook bugs for branch r1.8.0 by @yidong72 :: PR: #3948
    • Fix global batch fit loop by @ericharper :: PR: #3936
    • Refactor restorefrom by @ramanathan831 :: PR: #3927
    • Fix variable name and move models to CPU in Change partition by @aklife97 :: PR: #3972
    • Fix notebook error by @yidong72 :: PR: #3975
    • Notebook Bug Fixes for r1.8.0 by @vadam5 :: PR: #3989
    • Fix compat override for TalkNet Aligner by @redoctopus :: PR: #3993
    • docs fixes by @ekmb :: PR: #3987
    • Fixes val_check_interval, skip loading train data during eval by @MaximumEntropy :: PR: #3968
    • LogProb calculation performance fix by @yidong72 :: PR: #3984
    • Fix P-Tune T5 model by @yidong72 :: PR: #4001
    • Fix the broadcast shape mismatch by @yidong72 :: PR: #4017
    • Add known issues to notebook by @ericharper :: PR: #4024
  • v1.7.2(Mar 17, 2022)

    GPT Bugfixes

    • GPT dataloader improvements and fixes by @crcrpar :: PRs #3826 , #3665
    • Disable nvfuser by @ericharper :: PR #3845
    • Set find_unused_parameters to False by @ericharper :: PR #3837

    T5 XNLI Example

    • T5 xnli eval by @yaoyu-33 :: PR: #3848
  • v1.7.1(Mar 8, 2022)

    Known Issues

    • find_unused_parameters should be False when training GPT: #3837

    Bugfixes

    • revert changes by @yzhang123 :: PR: #3785
    • Fixed soft prompt eval loading bug by @vadam5 :: PR: #3805
    • mT5 whole word masking and T5 finetuning config fixes by @MaximumEntropy :: PR: #3776
    • Raise error if FP16 training is tried with O2 recipe. by @ericharper :: PR: #3806
  • v1.7.0(Mar 2, 2022)

    Known Issues

    • Megatron GPT training with O2 and FP16 is bugged. FP16 and O1 still works.
    • find_unused_parameters should be False when training GPT: #3837 (see the sketch after this list)
    • FastPitch training may result in stalled GPUs. Users will have to manually kill their runs and continue training from the latest checkpoint.
    • mT5 issue with whole word masking, see #3776
    • T5 finetuning config issue, see #3776
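
    For the find_unused_parameters known issue above, here is a minimal sketch of the recommended setting; the import path matches NeMo around this release, and the trainer arguments are illustrative:

    from pytorch_lightning import Trainer
    from nemo.collections.nlp.parts.nlp_overrides import NLPDDPPlugin

    trainer = Trainer(
        devices=2,
        accelerator="gpu",
        # Keep find_unused_parameters=False when training GPT (issue #3837).
        plugins=[NLPDDPPlugin(find_unused_parameters=False)],
    )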

    Container

    NOTE: From NeMo 1.7.0 onwards, NeMo containers will follow the YY.MM naming convention, where the YY.MM value matches the base container version. For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

    docker pull nvcr.io/nvidia/nemo:22.01
    

    ASR

    • Wav2vec by @tbartley94 :: PR: #3297
    • Fix bug in multi-checkpoint loading by @sam1373 :: PR: #3536
    • Add HuggingFace Datasets to NeMo ASR Dataset script by @titu1994 :: PR: #3513
    • Add support for Gradient Clipping (clamp) in RNNT Numba loss by @titu1994 :: PR: #3550
    • Enable Tarred Dataset Support for NVIDIA DALI by @titu1994 :: PR: #3485
    • Add initial support for Buffered RNNT Scripts by @titu1994 :: PR: #3602
    • Significantly speed up RNNT loss on CUDA by @titu1994 :: PR: #3653
    • Fixing the bug in the stateful rnnt decoder. by @VahidooX :: PR: #3673
    • Add Buffered RNNT with LCS Merge algorithm by @titu1994 :: PR: #3669
    • Asr noise data scripts by @jbalam-nv :: PR: #3660
    • ASR SSL update by @sam1373 :: PR: #3746
    • Add randomized bucketing by @VahidooX :: PR: #3445
    • Self-supervised tutorial & update by @sam1373 :: PR: #3344
    • Updated conformer models. by @VahidooX :: PR: #3741
    • Added speaker identification script with cosine and neural classifier… by @nithinraok :: PR: #3672
    • Fix in clustering diarizer by @nithinraok :: PR: #3701
    • Add a function that writes cluster label in diarization pipeline by @tango4j :: PR: #3643

    TTS

    • port UnivNet to NeMo TTS collection by @L0SG :: PR: #3186
    • E2E TTS fixes by @redoctopus :: PR: #3508
    • New structure for TTS datasets in scripts/dataset_processing, VocoderDataset, update TTSDataset by @Oktai15 :: PR: #3484
    • Deprecate some TTS models and TTS datasets by @Oktai15 :: PR: #3576
    • Fix bugs in HiFi-GAN (scheduler, optimizers) and add input_example() in Mixer-TTS/Mixer-TTS-X by @Oktai15 :: PR: #3564
    • Update UnivNet, HiFi-GAN and WaveGlow, small fixes in Mixer-TTS, FastPitch and Exportable by @Oktai15 :: PR: #3585
    • Fix typo in FastPitch config (pitch_avg -> pitch_mean) by @eyentei :: PR: #3593
    • Fix incorrect usage of TTSDataset in some files and fix one-line bug in NVIDIA's CMUDict by @Oktai15 :: PR: #3594
    • Convert entry from UTF-16 to UTF-8 by @redoctopus :: PR: #3597
    • remove CheckInstall by @blisc :: PR: #3577
    • Fix UnivNet LibriTTS pretrained location by @m-toman :: PR: #3615
    • FastPitch training tutorial by @subhankar-ghosh :: PR: #3631
    • Update Aligner, add new methods to AlignmentEncoder by @Oktai15 :: PR: #3641
    • Add Mixed Representation Training by @blisc :: PR: #3473
    • Add speakerID to libritts/get_data.py by @subhankar-ghosh :: PR: #3662
    • Update TTS tutorials, Simplification of testing Mixer-TTS and FastPitch by @Oktai15 :: PR: #3680
    • Clean FastPitch_Finetuning.ipynb notebook by @Oktai15 :: PR: #3698
    • Add cache_size to BetaBinomialInterpolator, fix bugs in TTS tutorials and FastPitch by @Oktai15 :: PR: #3706
    • Fix bugs in VocoderDataset and TTSDataset by @Oktai15 :: PR: #3713
    • Fix bugs in E2E TTS, Mixer-TTS and FastPitch by @Oktai15 :: PR: #3740

    NLP / NMT

    • NLPDDPPlugin find_unused_parameters is configurable by @mlgill :: PR: #3478
    • Megatron encoder-decoder refactor by @michalivne :: PR: #3542
    • Finetuning NeMo Megatron T5 Models on GLUE by @MaximumEntropy :: PR: #3408
    • Pipeline parallelism for GPT by @ericharper :: PR: #3388
    • Generalized the P-tuning method to support various NLP tasks by @yidong72 :: PR: #3623
    • Megatron_LM checkpoint to NeMo checkpoint support by @yidong72 :: PR: #3692
    • Bugfix for GPT eval by @ericharper :: PR: #3744
    • Yuya/megatron t5 glue eval by @yaoyu-33 :: PR: #3751
    • Enforce legacy tokenizer for sentencepiece to add special tokens for T5 by @MaximumEntropy :: PR: #3457
    • Added P-Tuning method by @yidong72 :: PR: #3488
    • O2 style mixed precision training for T5 by @MaximumEntropy :: PR: #3664
    • LM adapted T5 dataset by @MaximumEntropy :: PR: #3654
    • Fix consumed samples calculation + PTune Model bugs by @yidong72 :: PR: #3738
    • Add pipeline support to eval methods by @ericharper :: PR: #3684
    • XNli benchmark by @yidong72 :: PR: #3693
    • Refactor dialogue state tracking for modelling/dataset interoperability by @Zhilin123 :: PR: #3526
    • Changes to support mean n-gram size masking for T5 by @MaximumEntropy :: PR: #3646
    • Dialogue state tracking refactor by @Zhilin123 :: PR: #3667
    • Parallel prompt tuning by @vadam5 :: PR: #3670
    • GEGLU activation for T5 by @MaximumEntropy :: PR: #3694

    Text Normalization / Inverse Text Normalization

    • Text normalization takes too much time for a string which contains a lot of dates by @PeganovAnton :: PR: #3451
    • ITN bug fixes, ip address, card num support, whitelist clean up by @ekmb :: PR: #3574
    • Fix tn bugs by @yzhang123 :: PR: #3580
    • add serial number to itn by @yzhang123 :: PR: #3584
    • ITN: SH bug fixes for telephone by @ekmb :: PR: #3592
    • Tn bug 1.7.0 by @yzhang123 :: PR: #3730
    • TN docs update by @ekmb :: PR: #3735

    Export

    • Update UnivNet, HiFi-GAN and WaveGlow, small fixes in Mixer-TTS, FastPitch and Exportable by @Oktai15 :: PR: #3585
    • Conformer onnx fix by @borisfom :: PR: #3524
    • Add onnx support for speaker models by @nithinraok :: PR: #3650
    • Jasper mask/export fix by @borisfom :: PR: #3691

    Bugfixes

    • Text normalization takes too much time for a string which contains a lot of dates by @PeganovAnton :: PR: #3451
    • Dialogue state tracking refactor/ SGDGEN patch 2 by @Zhilin123 :: PR: #3674
    • lower bound PTL to 1.5.10 and remove last ckpt patch fix by @nithinraok :: PR: #3690

    Improvements

    • Wfst tutorial by @tbartley94 :: PR: #3479
    • Update CMUdict with ADLR version pronunciations by @redoctopus :: PR: #3446
    • Fix docs by @yzhang123 :: PR: #3523
    • Add docstring to UnivNetModel by @L0SG :: PR: #3529
    • Increase lower bound due to security vulnerability by @ericharper :: PR: #3537
    • Add Change Log builder to NeMo by @titu1994 :: PR: #3527
    • Bugfix, need to freeze the model by @yidong72 :: PR: #3540
    • Bucketing quick fix by @tbartley94 :: PR: #3543
    • More fixes to SentencePiece for T5 by @MaximumEntropy :: PR: #3515
    • Update CONTRIBUTING.md by @Oktai15 :: PR: #3569
    • Update pr template and re-add Changelog builder by @titu1994 :: PR: #3575
    • Apex quick fix by @ekmb :: PR: #3591
    • Upgrade to 22.01 container by @ericharper :: PR: #3571
    • Fix typo and update minimal version of scipy by @Oktai15 :: PR: #3604
    • Add env variable to force transformers to run offline during CI by @ericharper :: PR: #3607
    • Correctly install NeMo wheel by @titu1994 :: PR: #3599
    • Fix wheel build by @titu1994 :: PR: #3610
    • Fixed EH and error reporting in restore_from by @borisfom :: PR: #3583
    • Clarifying documentation by @itzsimpl :: PR: #3616
    • Improve docs for finetuning by @titu1994 :: PR: #3622
    • Add NeMo version to all new .nemo files by @titu1994 :: PR: #3605
    • Update numba if NVIDIA_PYTORCH_VERSION not correct by @itzsimpl :: PR: #3614
    • Remove @experimental decorator in diarization related files. by @tango4j :: PR: #3625
    • Remove compression from .nemo files by @okuchaiev :: PR: #3626
    • Update adobe analytics by @ericharper :: PR: #3645
    • Add ssl tutorial to tutorial docs page by @sam1373 :: PR: #3649
    • Fix number of channels>1 issue by @ekmb :: PR: #3652
    • Fixed the bug in bucketing. by @VahidooX :: PR: #3663
    • Adding guard by @yzhang123 :: PR: #3655
    • Add tutorial paths by @titu1994 :: PR: #3651
    • Folder name update by @ekmb :: PR: #3671
    • Test HF online for SGD-GEN only by @MaximumEntropy :: PR: #3681
    • Update Librosa support to 0.9 by @titu1994 :: PR: #3682
    • Comment out numba in 22.01 release by @titu1994 :: PR: #3685
    • Fix failing tests inside of the 22.01 container in PR 3571 by @fayejf :: PR: #3609
    • Fixed Apex guard when imported classes are used for default values by @michalivne :: PR: #3700
    • Update citrinet_512.yaml by @Jorjeous :: PR: #3642
    • update torchaudio in Dockerfile to match torch version by @GNroy :: PR: #3637
    • Enforce import tests on the three domains by @titu1994 :: PR: #3702
    • Audio based norm speed up by @ekmb :: PR: #3703
    • Fix device on notebook by @titu1994 :: PR: #3732
    • pynini pip by @yzhang123 :: PR: #3729
    • Removed fp16 converting in complete method by @dimapihtar :: PR: #3709
    • Mirror AN4 while CMU servers are down by @titu1994 :: PR: #3743
    • Fix SSL configs for 1.7 by @sam1373 :: PR: #3748
    • Punct process bug fix by @ekmb :: PR: #3747
    • Specify gpus in SSL notebook by @sam1373 :: PR: #3753
    • Duplex model inference fix, money encoder fix by @ekmb :: PR: #3754
    • Update decoding strategy docs and override general value for tutorials by @titu1994 :: PR: #3755
    • Fix directories in ssl notebook by @sam1373 :: PR: #3758
    • Update Tacotron2_Training.ipynb by @blisc :: PR: #3769
    • Fix dockerfile by @yzhang123 :: PR: #3778
    • Prompt-Tuning-Documentation by @vadam5 :: PR: #3777
    • Prompt tuning bug fix by @vadam5 :: PR: #3780
    Source code(tar.gz)
    Source code(zip)
  • v1.6.2(Feb 5, 2022)

  • v1.6.1(Feb 2, 2022)

    Bug Fixes

    • Fix embedding name for verifying speakers #3578
    • Add rank check and barrier helpers compilation for megatron dataset #3581
    • Add apex import guards #3579
    Source code(tar.gz)
    Source code(zip)
  • v1.6.0(Jan 29, 2022)

    ASR

    • Add new features to ASR with diarization with modified tutorial and README. by @tango4j :: PR: #3007
    • Enable stateful decoding of RNNT over multiple transcribe calls by @titu1994 :: PR: #3037
    • Move vocabs from asr to common by @Oktai15 :: PR: #3084
    • Adding parallel transcribe for ASR models - supports multi-gpu/multi-node by @VahidooX :: PR: #3017
    • CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
    • Adding pretrained French ASR models to ctc_bpe and rnnt_bpe listings by @tbartley94 :: PR: #3225
    • adding german conformer ctc and rnnt by @yzhang123 :: PR: #3242
    • Add aishell and fisher dataset processing scripts for ASR by @jbalam-nv :: PR: #3203
    • Better default for RNNT greedy decoding by @titu1994 :: PR: #3332
    • Add uniform ASR evaluation script for all models by @titu1994 :: PR: #3334
    • CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
    • Updates on ASR with diarization util files by @tango4j :: PR: #3359
    • Asr fr by @tbartley94 :: PR: #3404
    • Refactor ASR Examples Directory by @titu1994 :: PR: #3392
    • Asr patches by @titu1994 :: PR: #3443
    • Properly support -1 for labels in ctc char models by @titu1994 :: PR: #3487

    TTS

    • MixerTTS, MixerTTSDataset and small updates in tts tokenizers by @Oktai15 :: PR: #2859
    • ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
    • Update name of files to one style in TTS folder by @Oktai15 :: PR: #3189
    • Update TTS Dataset, FastPitch with TTS dataset and small improvements in HiFiGAN by @Oktai15 :: PR: #3205
    • Add Beta-binomial Interpolator to TTSDataset by @Oktai15 :: PR: #3230
    • Normalizer to TTS models, TTS tokenizer updates, AxisKind updates by @Oktai15 :: PR: #3271
    • Update Mixer-TTS, FastPitch and TTSDataset by @Oktai15 :: PR: #3366
    • Minor Updates to TTS Finetuning by @blisc :: PR: #3455

    NLP / NMT

    • NMT timing and tokenizer stats utils by @michalivne :: PR: #3004
    • Add offsets calculation to MegatronGPTModel.complete method by @dimapihtar :: PR: #3117
    • NMT checkpoint averaging by @michalivne :: PR: #3096
    • NMT validation examples with inputs by @michalivne :: PR: #3194
    • Improve data pipeline for punctuation capitalization model and make other useful changes by @PeganovAnton :: PR: #3159
    • Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
    • NLP text augmentation by @michalivne :: PR: #3291
    • Adding Megatron NeMo Bert support by @yidong72 :: PR: #3303
    • Added script to convert Megatron LM to .nemo file by @yidong72 :: PR: #3371
    • Support Changing Number of Tensor Parallel Partitions for Megatron by @aklife97 :: PR: #3365
    • Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
    • T5 Pre-training in NeMo using Megatron by @MaximumEntropy :: PR: #3036
    • NMT MIM mean variance fix by @michalivne :: PR: #3385
    • NMT Shared Embeddings Weights by @michalivne :: PR: #3340
    • Make saving .nemo during on_train_end configurable by @ericharper :: PR: #3427
    • Byte-level Multilingual NMT by @aklife97 :: PR: #3368
    • BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
    • NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
    • (1) O2-style mixed precision recipe, (2) Persistent layer-norm, (3) Grad scale hysteresis, (4) gradient_as_bucket_view by @erhoo82 :: PR: #3259

    Text Normalization / Inverse Text Normalization

    • Tn clean upsample by @yzhang123 :: PR: #3024
    • Tn add nn wfst and doc by @yzhang123 :: PR: #3135
    • Update english tn ckpt by @yzhang123 :: PR: #3143
    • WFST_tutorial for ITN development by @tbartley94 :: PR: #3128
    • German TN wfst by @yzhang123 :: PR: #3174
    • Add ITN Vietnamese by @binh234 :: PR: #3217
    • WFST TN updates by @ekmb :: PR: #3235
    • Itn german refactor by @yzhang123 :: PR: #3262
    • Tn german deterministic by @yzhang123 :: PR: #3308
    • TN updates by @ekmb :: PR: #3285
    • Added double digits to EN ITN by @yzhang123 :: PR: #3321
    • TN_non_deterministic optimized by @ekmb :: PR: #3343
    • Missing init for TN German by @ekmb :: PR: #3355
    • Ru TN by @ekmb :: PR: #3390
    • Update ContextNet models trained on more datasets by @titu1994 :: PR: #3440

    NeMo Tools

    • CTC Segmentation-Citrinet support by @ekmb :: PR: #3279
    • Updated NumPy SDE requirement by @vsl9 :: PR: #3442

    Export

    • ONNX and TorchScript support for Mixer-TTS by @Oktai15 :: PR: #3082
    • CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072

    Documentation

    • Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
    • Tn add nn wfst and doc by @yzhang123 :: PR: #3135
    • Add apex into by @PeganovAnton :: PR: #3214
    • Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
    • Nemo container docker building instruction - merge to main by @fayejf :: PR: #3236
    • Doc link fixes by @nithinraok :: PR: #3264
    • French ASR Doc updates by @tbartley94 :: PR: #3322
    • german asr doc page update by @yzhang123 :: PR: #3325
    • update docs and replace speakernet with titanet in tutorials by @nithinraok :: PR: #3405
    • Asr fr by @tbartley94 :: PR: #3404
    • Update copyright to 2022 by @ericharper :: PR: #3426
    • Update Speech Classification - VAD doc by @fayejf :: PR: #3430
    • Update speaker diarization docs by @tango4j :: PR: #3419
    • NMT documentation for bottleneck architecture by @michalivne :: PR: #3464
    • Add verification helper function and update docs by @nithinraok :: PR: #3514
    • Prompt tuning documentation by @vadam5 :: PR: #3541

    Bugfixes

    • Fixed wrong tgt_length for timing by @michalivne :: PR: #3050
    • Update nltk version with a CVE fix by @thomasdhc :: PR: #3054
    • Fix README by @ericharper :: PR: #3070
    • Transformer Decoder: Fix swapped input name issue by @aklife97 :: PR: #3066
    • Fixes bugs in collect_tokenizer_dataset_stats.py by @michalivne :: PR: #3060
    • Attribute is not working in . by @PeganovAnton :: PR: #3099
    • Merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3133
    • A quick fix for issue #3094 index out-of-bound when truncating long text to max_seq_length by @bugface :: PR: #3131
    • Fixed two typos by @bene-ges :: PR: #3157
    • Merge r1.5.0 bugfixes to main by @ericharper :: PR: #3173
    • LJSpeech alignment scripts fixed for latest MFA by @m-toman :: PR: #3177
    • Add apex into by @PeganovAnton :: PR: #3214
    • Patch omegaconf for cfg by @fayejf :: PR: #3224
    • Final merge r1.5.0 bugfixes and doc updates to main by @ericharper :: PR: #3232
    • CTC Conformer fixes for ONNX/TS export by @borisfom :: PR: #3072
    • Fix Masked SE for Citrinets + export Limited Context Citrinet by @titu1994 :: PR: #3216
    • Fix text length type in TTSDataset for beta_binomial_interpolator by @Oktai15 :: PR: #3233
    • Fix cast type in _se_pool_step_script related functions by @Oktai15 :: PR: #3239
    • Doc link fixes by @nithinraok :: PR: #3264
    • Escape chars fix by @ekmb :: PR: #3253
    • Fix asr output - eval mode by @nithinraok :: PR: #3274
    • Remove ArrayLike because it is not supported in numpy 1.18 by @PeganovAnton :: PR: #3282
    • Fix megatron_gpt_ckpt_to_nemo.py with torch distributed by @yaoyu-33 :: PR: #3278
    • Reduce test time of punctuation and capitalization model by @PeganovAnton :: PR: #3286
    • Tn en money fix by @yzhang123 :: PR: #3290
    • Fixing the bucketing_batch_size bug. by @VahidooX :: PR: #3294
    • Adaptive fixed positional embeddings by @michalivne :: PR: #3263
    • Fix specaugment time start for numba kernel by @titu1994 :: PR: #3299
    • Fix for Stalled ASR training/eval on Pytorch 1.10+ (multigpu/multinode) by @titu1994 :: PR: #3304
    • Fix bucketing list bug. by @VahidooX :: PR: #3315
    • Fix MixerTTS types and dimensions by @Oktai15 :: PR: #3330
    • Fix German and Vietnamese grammar by @yzhang123 :: PR: #3331
    • Fix readme to show cmd by @yzhang123 :: PR: #3345
    • Fix speaker label models training convergence by @nithinraok :: PR: #3354
    • Tqdm get datasets by @bmwshop :: PR: #3358
    • Fixed future masking in cross attention of Perceiver by @michalivne :: PR: #3314
    • Fixed the bug of fixed-size bucketing. by @VahidooX :: PR: #3364
    • Fix minor problems in punctuation and capitalization model by @PeganovAnton :: PR: #3376
    • Megatron AMP fix for scheduler step counter by @titu1994 :: PR: #3293
    • fixed the bug of bucketing when fixed-size batch is used. by @VahidooX :: PR: #3399
    • TalkNet Fix by @stasbel :: PR: #3092
    • Fix linear annealing not annealing lr to min_lr by @MaximumEntropy :: PR: #3400
    • Resume training on SLURM multi-node multi-gpu by @itzsimpl :: PR: #3374
    • Fix running token classification in multinode setting by @PeganovAnton :: PR: #3413
    • Fix order of lang checking to ignore input langs by @MaximumEntropy :: PR: #3417
    • NMT MIM mean variance fix by @michalivne :: PR: #3385
    • Fix bug for missing variable by @MaximumEntropy :: PR: #3437
    • Asr patches by @titu1994 :: PR: #3443
    • Prompt tuning loss mask fix by @vadam5 :: PR: #3438
    • BioMegatron token classification tutorial fix to be compatible with current Megatron BERT by @yidong72 :: PR: #3435
    • Fix hysteresis loading by @MaximumEntropy :: PR: #3460
    • Fix the tutorial notebooks bug by @yidong72 :: PR: #3465
    • Fix the errors/bugs in ASR with diarization tutorial by @tango4j :: PR: #3461
    • WFST Punct post fix + punct tutorial fixes by @ekmb :: PR: #3469
    • Process correctly label ids dataset parameter + standardize type of label ids model attribute + minor changes (error messages, typing) by @PeganovAnton :: PR: #3471
    • file name fix - Segmentation tutorial by @ekmb :: PR: #3474
    • Patch fix for the multiple last checkpoints issue by @nithinraok :: PR: #3468
    • Fix bug with arguments for TalkNet's preprocessor by @Oktai15 :: PR: #3481
    • Fix description by @PeganovAnton :: PR: #3482
    • typo fix in diarization notebooks by @nithinraok :: PR: #3480
    • Fix checkpoint converter in O2 style by @yaoyu-33 :: PR: #3486
    • Remove pickled features from tarred dataset by @PeganovAnton :: PR: #3491
    • Fix link to NGC page for ASR by @titu1994 :: PR: #3512
    • vad typo fix by @fayejf :: PR: #3490
    • fixed the num_classes bug of conv decoder. by @VahidooX :: PR: #3525
    • Fixed section typo by @vadam5 :: PR: #3522
    • Fixed duplicate cell bug by @vadam5 :: PR: #3518
    • Fix bug in inference tts notebook by @Oktai15 :: PR: #3532
    • Fix nmt resume by @ericharper :: PR: #3539
    • TN bug fix by @ekmb :: PR: #3538
    • Fix bug with pretrained method in Inference_ModelSelect.ipynb by @Oktai15 :: PR: #3546
    • Fix an issue with wandb not displaying updated config changes by @titu1994 :: PR: #3552

    Improvements

    • Remove STFT checks due to min PT version of 1.10 by @titu1994 :: PR: #3034
    • Add a stateless timer to specify max_time per run instead of global m… by @MaximumEntropy :: PR: #3056
    • (1) reduce the validation loss within an epoch, (2) convert global-bat… by @erhoo82 :: PR: #3055
    • Timer class monitors total time (train + validation + testing) to monitor when to end training by @MaximumEntropy :: PR: #3061
    • Add new by @PeganovAnton :: PR: #2963
    • Add PUBLICATIONS.md by @titu1994 :: PR: #3051
    • Hg cache by @yzhang123 :: PR: #3080
    • Add sequence axis to AxisKind.from_str() and improve time axis by @Oktai15 :: PR: #3090
    • Add logging to LS script by @titu1994 :: PR: #3141
    • Modify speaker input by @nithinraok :: PR: #3100
    • Typo correction in README.rst by @satpalsr :: PR: #3103
    • Self-supervised pre-training for speech models by @sam1373 :: PR: #3139
    • Add AISHELL 2 processing script by @titu1994 :: PR: #3195
    • Add support for multi-speaker FastPitch export by @ryanleary :: PR: #3192
    • Reduce number of log files for large runs by @blisc :: PR: #3191
    • Add support to modify nemo cache directory by @titu1994 :: PR: #3208
    • Add Pitch, Duration Tensors for Riva by @blisc :: PR: #3207
    • Upgrade to NVIDIA PyTorch 21.11 Container by @ericharper :: PR: #3234
    • Add WMT21 paper to Publications by @MaximumEntropy :: PR: #3256
    • Support for gecko tool by @nithinraok :: PR: #3266
    • Adding adaptive bucketing for tarred datasets. by @VahidooX :: PR: #3222
    • Initial refactor by @borisfom :: PR: #3272
    • Refactored prepare_for_export calls to ensure input size of example i… by @borisfom :: PR: #3305
    • Replacing outdated exports scripts by @borisfom :: PR: #3311
    • Batch implementation by @dimapihtar :: PR: #3276
    • Multiscale processing feature for speaker diarization by @tango4j :: PR: #3296
    • Add titanet by @nithinraok :: PR: #3333
    • update sparrowhawk export grammars to able to skip pynini by @yzhang123 :: PR: #3346
    • Prompt tuning by @vadam5 :: PR: #3309
    • Remove wordninja by @ekmb :: PR: #3363
    • Repair arbitrary file or folder deletion vulnerability by @haby0 :: PR: #3362
    • Moved shebangs to the first line by @davidalami :: PR: #3361
    • Added new method for logprobs computation by @dimapihtar :: PR: #3329
    • Update speaker collate functions by @nithinraok :: PR: #3381
    • Cache_hf by @ekmb :: PR: #3406
    • Update to NVIDIA PyTorch 21.12 Container by @ericharper :: PR: #3424
    • Working around Pytorch exporter issue with expand() by @borisfom :: PR: #3422
    • Remove apex by @ekmb :: PR: #3428
    • Vad infer refactor by @fayejf :: PR: #3394
    • Update LJSpeech preprocessing by @Oktai15 :: PR: #3423
    • Preprocess an entire folder of .json or .json.gz files into a single .bin and .idx file. by @MaximumEntropy :: PR: #3425
    • TimingCallback default buffer_size=1 by @michalivne :: PR: #3439
    • Extending input_example() to take max batch and dimension arguments by @borisfom :: PR: #3429
    • Refactor data preprocessing script by @yzhang123 :: PR: #3444
    • Test only if the model was trained on single GPU for accurate results. by @titu1994 :: PR: #3470
    • Upper bound ptl for r1.6.0, lower bound numpy in general by @ericharper :: PR: #3466
    • Add Apex import guard by @ericharper :: PR: #3467
    • Adding missing init files by @yzhang123 :: PR: #3505
    • Typos by @ekmb :: PR: #3504
    • Update titanet conf by @nithinraok :: PR: #3507
    • Raise PTL upper bound on r1.6.0 by @ericharper :: PR: #3510
    • Enforce utf-8 on all file r/w by @titu1994 :: PR: #3520
    • Pushing updated WFST Tutorial to r1.6.0 by @tbartley94 :: PR: #3521
    • WFST tutorial update by @tbartley94 :: PR: #3531
    • Update nvidia container check by @ericharper :: PR: #3535
    • Remove extra instance during restore by @ericharper :: PR: #3551
    • Remove wordtokenizer example from NLP tokenizer notebook by @aklife97 :: PR: #3477
    Source code(tar.gz)
    Source code(zip)
  • v1.5.1(Dec 4, 2021)

    Features

    • Minor updates to expose speaker id, pitch, and duration on export of FastPitch #3192, #3207

    Known Issues

    • Training of speaker models converges very slowly due to a bug (fixed in main: #3354)
    • ASR training does not reach adequate WER due to a bug in Numba SpecAugment (fixed in main: #3299). For details, refer to https://github.com/NVIDIA/NeMo/issues/3288#issuecomment-1000766337 . As a temporary workaround, disable Numba SpecAugment by setting the flag at https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/modules/audio_preprocessing.py#L471 to False in the SpecAugment section of the YAML config, as sketched below. The fix will be part of 1.6.0.
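    A minimal workaround sketch in Python (both the config file name and the key name below are illustrative assumptions; check the linked audio_preprocessing.py line for the exact flag in your installed NeMo version):

    from omegaconf import OmegaConf

    # Load a training config and turn off the Numba SpecAugment kernel before
    # building the model. File name and key name are assumptions, not a
    # confirmed API -- verify them against the linked source line.
    cfg = OmegaConf.load("citrinet_512.yaml")
    cfg.model.spec_augment.use_numba_spec_augment = False  # hypothetical key
    print(OmegaConf.to_yaml(cfg.model.spec_augment))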
    Source code(tar.gz)
    Source code(zip)
  • v1.5.0(Nov 20, 2021)

    Features

    • Megatron GPT pre-training with tensor model parallelism #2975
    • NMT encoder and decoder with different hidden size #2856
    • Logging timing of train/val/test steps #2936
    • Logging NMT encoder and decoder timing #2956
    • Logging timing per sentence length and tokenized text statistics #3004
    • Upgrade to PyTorch Lightning 1.5.0, bfloat support #2975
    • French Inverse Text Normalization #2921
    • Bucketing of tarred datasets for ASR models #2999
    • ASR with diarization #3007
    • Adding parallel transcribe for ASR models - supports multi-gpu/multi-node #3017

    Documentation Updates

    • RNNT

    Contributors

    @ericharper @michalivne @MaximumEntropy @VahidooX @titu1994 @blisc @okuchaiev @tango4j @erastorgueva-nv @fayejf @vadam5 @ekmb @yaoyu-33 @nithinraok @erhoo82 @tbartley94 @PeganovAnton @madhukarkm @yzhang123 (Please let us know if you have contributed to this release and we have missed you here.)

    Source code(tar.gz)
    Source code(zip)
  • v1.4.0(Oct 2, 2021)

    Features

    • Improved speaker clustering #2729
    • Upgrade to NVIDIA PyTorch 21.08 container #2799
    • RNNT mAES beam search support #2802
    • Transfer learning for new speakers #2684
    • Simplify speaker scripts #2777
    • Perceiver-encoder architecture #2737
    • Relative paths in tarred datasets #2776
    • Torch only TTS package #2643
    • Inverse text normalization for Spanish #2489

    Tutorial Notebooks

    • Duration and pitch control for TTS #2700

    Bug fixes

    • Fixed max delta generation #2727
    • Waveglow export #2671, #2699

    Contributors

    @tango4j @titu1994 @paarthneekhara @nithinraok @michalivne @erastorgueva-nv @borisfom @blisc (some contributors may not be listed explicitly)

    Source code(tar.gz)
    Source code(zip)
  • v1.3.0(Aug 27, 2021)

    Added

    • RNNT Exportable to ONNX #2510
    • Multi-batch inference support for speaker diarization #2522
    • DALI Integration for char/subword ASR #2567
    • VAD Postprocessing #2636
    • Perceiver encoder for NMT #2621
    • gRPC NMT server #2656
    • German ITN #2486
    • Russian TN and ITN #2519
    • Save/restore connector #2592
    • PTL 1.4+ #2600

    Tutorial Notebooks

    • Non-English downstream NLP task #2532
    • RNNT Basics #2651

    Bug Fixes

    • NME-SC clustering for very small audio files #2566

    Contributors

    @pasandi20 @ekmb @nithinraok @titu1994 @ryanleary @yzhang123 @ericharper @michalivne @MaximumEntropy @fayejf (some contributors may not be listed explicitly)

    Source code(tar.gz)
    Source code(zip)
  • v1.2.0(Jul 30, 2021)

    Added

    • Improve performance of speaker clustering (#2445)
    • Update Conformer for ONNX conversion (#2439)
    • Mean and length normalization for better embeddings in speaker verification and diarization (#2397)
    • FastEmit RNNT Loss Numba for reducing latency (#2374)
    • Multiple datasets, right to left models, noisy channel re-ranking, ensembling for NMT (#2379)
    • Byte level tokenization (#2365)
    • Bottleneck with attention bridge for more efficient NMT training (#2390)
    • Tutorial notebook for NMT data cleaning and preprocessing (#2467)
    • Streaming Conformer inference script for long audio files (#2373)
    • Res2Net Ecapa equivalent implementation for speaker verification and diarization (#2468)
    • Update end-to-end tutorial notebook to use CitriNet (#2457)

    Contributors

    @nithinraok @tango4j @jbalam-nv @titu1994 @MaximumEntropy @mchrzanowski @michalivne @fayejf @okuchaiev

    (some contributors may not be listed explicitly)

    Known Issues

    • import nemo.collections.nlp as nemo_nlp will result in an error. This will be patched in the upcoming version. As a workaround, try importing the individual files, as sketched below.
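    A hedged workaround sketch (the module path below is an assumption based on the 1.2.0-era source layout, not a confirmed public API):

    # Import the model class from its individual file instead of the
    # collection root, as suggested above. Adjust the path to your install.
    from nemo.collections.nlp.models.token_classification.punctuation_capitalization_model import (
        PunctuationCapitalizationModel,
    )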
    Source code(tar.gz)
    Source code(zip)
  • v1.1.0(Jul 2, 2021)

    The NeMo 1.1.0 release is the first in our new monthly release cadence. Monthly releases will focus on adding new features that enable new NeMo models or improve existing ones.

    Added

    • Pretrained Megatron-LM encoders (including model parallel) for NMT (#2238)
    • RNNT Numba loss (#1995)
    • Enable multiple models to be restored (#2245)
    • Audio based text normalization (#2285)
    • Multilingual NMT (#2160)
    • FastPitch export (#2355)
    • ASR fine-tuning tutorial for other languages (#2346)

    Bugfixes

    • HiFiGan Export (#2279)
    • OmegaConf forward compatibility (#2319)

    Documentation

    • ONNX export documentation (#2330)

    Contributors

    @borisfom @MaximumEntropy @ericharper @aklife97 @titu1994 @ekmb @yzhang123 @blisc

    (some contributors may not be listed explicitly)

    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jun 11, 2021)

  • v1.0.1(Jun 9, 2021)

  • v1.0.0(Jun 3, 2021)

    Release 1.0.0

    The NeMo 1.0.0 release is the stable version of the 1.0.0 release candidate. It substantially improves overall quality and documentation, and adds support for new tasks such as neural machine translation along with many new models pretrained in different languages. Now a mature tool for ASR and TTS, it also adds new features for text normalization and denormalization, a dataset creation tool based on CTC-segmentation, and a speech data explorer. These updates will benefit researchers in academia and industry by making it easier to develop and train new conversational AI models.

    To install this specific version from pip do:

    apt-get update && apt-get install -y libsndfile1 ffmpeg
    pip install Cython
    pip install nemo-toolkit['all']==1.0.0
    
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0rc1(Apr 7, 2021)

    Release 1.0.0rc1

    This release contains major new models, features and docs improvements. It is a "candidate" release for 1.0.0.

    To install from Pip do:

    apt-get update && apt-get install -y libsndfile1 ffmpeg
    pip install Cython
    pip install nemo_toolkit['all']==1.0.0rc1
    

    It adds the following model architectures:

    • CitriNet and Conformer-CTC for ASR
    • HiFiGan, MelGan, GlowTTS, UniGlow, and SqueezeWave for TTS

    In NLP collections, a neural machine translation task (NMT) has been added with Transformer-based models. This release includes pre-trained NMT models for these language pairs (in both directions):

    • En<->Es
    • En<->Ru
    • En<->Zh
    • En<->De
    • En<->Fr

    For the ASR task, we also added QuartzNet models trained on the following languages from Mozilla's Common Voice dataset: Zh, Ru, Es, Pl, Ca, It, Fr, and De. In total, this release adds 60 new pre-trained models.

    This release also adds new NeMo tools for:

    • Text normalization
    • Dataset Creation Tool Based on CTC-Segmentation
    • Speech Data Explorer

    Known Issues

    This version is not compatible with PyTorch 1.8.*. Please use PyTorch 1.7.* or our container.

    Source code(tar.gz)
    Source code(zip)
    1-100_roman_numeral_table_spanish.csv(8.87 KB)
    Screen.Shot.2021-04-08.at.2.23.25.PM.png(86.93 KB)
    test_data.tar.gz(9.96 MB)
    test_data.tar.gz-stable.gz(7.00 MB)
  • v1.0.0b4(Feb 16, 2021)

    Release 1.0.0b4

    This release is compatible with Jarvis and TLT public beta. It also updates versions of many dependencies and contains minor bug fixes over 1.0.0b3.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0b3(Dec 11, 2020)

    Release 1.0.0b3

    This release contains minor bug fixes over 1.0.0b2. It sets compatible version ranges for Hugging Face Transformers and Pytorch Lightning packages.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0b2(Nov 17, 2020)

    Release 1.0.0b2

    This release contains stability improvements and bug fixes. It also adds beam search support for CTC based ASR models.

    Highlights

    • Added beam search and external LM rescoring support for character-based CTC ASR models.
    • Switch to Pytorch Lightning version 1.0.5 or above.
    • Switch to Hydra version 1.0.3 or above.
    • Increase NVIDIA Pytorch container version to 20.09

    Known Issues

    This version will not work with Hugging Face transformers library versions >=4.0.0. Please make sure your installed transformers version satisfies >=3.1.0,<4.0.0.
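    A quick version-check sketch (assumes the packaging library is available, which it usually is as a setuptools dependency):

    import transformers
    from packaging import version

    # NeMo 1.0.0b2 expects transformers >=3.1.0 and <4.0.0.
    v = version.parse(transformers.__version__)
    assert version.parse("3.1.0") <= v < version.parse("4.0.0"), f"unsupported transformers version: {v}"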

    The toolkit is early-version software.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0b1(Oct 5, 2020)

    Release 1.0.0b1

    This release is a major redesign compared to the previous version. All NeMo models and modules are now compatible out of the box with PyTorch and PyTorch Lightning. Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility, along with an example configuration file and a corresponding script containing all configurations needed for training. NeMo, PyTorch Lightning, and Hydra give all NeMo models the same look and feel, making it easy to do conversational AI research across multiple domains. New models such as speaker verification and Megatron have been added.
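    A minimal sketch of the new workflow (the model name is illustrative, and from_pretrained assumes network access to NGC):

    import pytorch_lightning as pl
    import nemo.collections.asr as nemo_asr

    # Every NeMo model is a LightningModule, so it drops straight into a Trainer.
    model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
    assert isinstance(model, pl.LightningModule)
    trainer = pl.Trainer(gpus=1, max_epochs=5)
    # trainer.fit(model) would then train it like any other Lightning model.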

    Highlights

    • Pytorch Lightning based Core
    • Hydra and Omegaconf configuration management
    • All of a model's files are tarred together into a single .nemo file, making it easy for users to download models automatically from NGC (see the sketch after this list)
    • NGC now hosts a collection of all NeMo assets in one place
    • New Models & tutorials
      • ASR: SpeakerNet speaker verification model
      • NLP: Bio Megatron state of the art model trained on bio medical tasks
    • ASR, NLP and TTS tutorials as interactive notebooks
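    A sketch of the .nemo round trip mentioned above (the file path is illustrative):

    import nemo.collections.asr as nemo_asr

    # A .nemo archive bundles a model's weights and config in a single file.
    model = nemo_asr.models.EncDecCTCModel.restore_from("quartznet.nemo")
    model.save_to("quartznet_copy.nemo")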

    Known Issues

    The toolkit is early-version software, and this release contains breaking changes compared to the previous version.

    Resolved Issues

    All models and modules can be used anywhere torch.nn.Module is expected.

    Source code(tar.gz)
    Source code(zip)
  • v0.11.0(Jul 10, 2020)
