HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

Overview

I have no intention of building a very complex tool here. I just want an easy-to-use toolkit for my speech-related experiments. I hope this library can be helpful for someone else too :)

Requirements

  • Python 3.7+

Installation

$ pip install huggingsound

How to use it?

I'll try to summarize the usage of this toolkit here, though much is still missing from the documentation below; I promise to improve it soon. For now, you can open an issue if you have questions or look at the source code to see how it works. You can find more usage examples in the repository's examples folder.

Speech recognition

For speech recognition you can use any CTC model hosted on the Hugging Face Hub; several ready-to-use models are available there.

Inference

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")
audio_paths = ["/path/to/sagan.mp3", "/path/to/asimov.wav"]

transcriptions = model.transcribe(audio_paths)

print(transcriptions)

# transcriptions format (a list of dicts, one for each audio file):
# [
#  {
#   "transcription": "extraordinary claims require extraordinary evidence", 
#   "start_timestamps": [100, 120, 140, 180, ...],
#   "end_timestamps": [120, 140, 180, 200, ...],
#   "probabilities": [0.95, 0.88, 0.9, 0.97, ...]
# },
# ...]
#
# note that the model returns not only the transcription but also the timestamps (in milliseconds)
# and probabilities of each character of the transcription.
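
Since the timestamps and probabilities are aligned one-to-one with the characters of the transcription, you can pair them up directly. Here's a minimal sketch that relies only on the output format shown above:

# pair each character with its timestamps and probability
for item in transcriptions:
    aligned = zip(item["transcription"], item["start_timestamps"], item["end_timestamps"], item["probabilities"])
    for char, start_ms, end_ms, prob in aligned:
        print(f"{char!r}: {start_ms}-{end_ms} ms (p={prob:.2f})")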

Inference (boosted by a language model)

from huggingsound import SpeechRecognitionModel, KenshoLMDecoder

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")
audio_paths = ["/path/to/sagan.mp3", "/path/to/asimov.wav"]

# The LM format used by the LM decoders is the KenLM format (arpa or binary file).
# You can download some example LM files here: https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english/tree/main/language_model
lm_path = "path/to/your/lm_files/lm.binary"
unigrams_path = "path/to/your/lm_files/unigrams.txt"

# We implemented three different decoders for LM-boosted decoding: KenshoLMDecoder, ParlanceLMDecoder, and FlashlightLMDecoder.
# In this example, we'll use the KenshoLMDecoder.
# To use this decoder, you'll need to install Kensho's pyctcdecode first (https://github.com/kensho-technologies/pyctcdecode)
decoder = KenshoLMDecoder(model.token_set, lm_path=lm_path, unigrams_path=unigrams_path)

transcriptions = model.transcribe(audio_paths, decoder=decoder)

print(transcriptions)
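
If you don't want to download the LM files by hand, you can also fetch them from the Hub programmatically. A minimal sketch using the huggingface_hub library (the filenames under language_model/ are an assumption based on the repository linked above, so check the repo for the actual names):

from huggingface_hub import hf_hub_download

# assumed filenames; check the repository's language_model/ folder for the real ones
lm_path = hf_hub_download(repo_id="jonatasgrosman/wav2vec2-large-xlsr-53-english",
                          filename="language_model/lm.binary")
unigrams_path = hf_hub_download(repo_id="jonatasgrosman/wav2vec2-large-xlsr-53-english",
                                filename="language_model/unigrams.txt")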

Evaluation

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-english")

references = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]

evaluation = model.evaluate(references)

print(evaluation)

# evaluation format: {"wer": 0.08, "cer": 0.02}
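
If you want to sanity-check those numbers, you can recompute the metrics from the raw transcriptions yourself. A minimal sketch using the third-party jiwer package (not a huggingsound API; install it separately with pip install jiwer):

import jiwer

# recompute WER/CER from the model's own transcriptions
hypotheses = [t["transcription"] for t in model.transcribe([r["path"] for r in references])]
ground_truth = [r["transcription"] for r in references]

print("wer:", jiwer.wer(ground_truth, hypotheses))
print("cer:", jiwer.cer(ground_truth, hypotheses))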

Fine-tuning

from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet

model = SpeechRecognitionModel("facebook/wav2vec2-large-xlsr-53")
output_dir = "my/finetuned/model/output/dir"

# first of all, you need to define your model's token set
# however, the token set is only needed for non-fine-tuned models
# if you pass a new token set to an already fine-tuned model, it'll be ignored during training
tokens = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]
token_set = TokenSet(tokens)

# define your train/eval data
train_data = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]
eval_data = [
    {"path": "/path/to/sagan2.mp3", "transcription": "absence of evidence is not evidence of absence"},
    {"path": "/path/to/asimov2.wav", "transcription": "the true delight is in the finding out rather than in the knowing"},
]

# and finally, fine-tune your model
model.finetune(
    output_dir, 
    train_data=train_data, 
    eval_data=eval_data, # the eval_data is optional
    token_set=token_set,
)
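
The imports above include TrainingArguments and ModelArguments even though this example doesn't use them. Based on usage reported in the comments below, you can tweak the training setup by passing a training_args object to finetune(). A hedged sketch (the attribute names follow the Hugging Face TrainingArguments conventions; double-check them against the huggingsound source):

training_arguments = TrainingArguments()
training_arguments.overwrite_output_dir = True
training_arguments.per_device_train_batch_size = 16
training_arguments.per_device_eval_batch_size = 16

model.finetune(
    output_dir,
    train_data=train_data,
    eval_data=eval_data,
    token_set=token_set,
    training_args=training_arguments,
)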

Troubleshooting

  • If you have trouble loading MP3 files: $ sudo apt-get install ffmpeg (see the snippet below if that alone doesn't fix it)
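
If installing ffmpeg alone doesn't help, you can check whether the audio backend is the problem by loading the file directly with librosa, which is what huggingsound uses under the hood (as the tracebacks in the comments below show):

import librosa

# if this raises audioread.exceptions.NoBackendError or a format error,
# the problem is the audio backend (e.g. missing ffmpeg), not huggingsound
waveform, sample_rate = librosa.load("/path/to/sagan.mp3", sr=16_000)
print(waveform.shape, sample_rate)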

Want to help?

See the contribution guidelines if you'd like to contribute to the HuggingSound project.

You don't even need to know how to code to contribute. Even improving our documentation is an outstanding contribution.

If this project has been useful for you, please share it with your friends. This project could be helpful for them too.

If you like this project and want to motivate the maintainers, give us a ⭐. This kind of recognition will make us very happy with the work that we've done with ❤️

You can also support the project via "Buy Me A Coffee".

Citation

If you want to cite the tool, you can use this:

@misc{grosman2022huggingsound,
  title={HuggingSound},
  author={Grosman, Jonatas},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/jonatasgrosman/huggingsound}},
  year={2022}
}
Comments
  • Compatibility with Python 3.10

    This package cannot be installed with python 3.10.

    When trying to install the wheel manually, it complains that the required numba version is not available. Is numba<0.54.0,>=0.53.1 really required, instead of e.g. numba==0.55.0 or any other version that is available for Python 3.10?

    I would love to use this library, but currently it does not seem to be possible to install it on Ubuntu 22.04.

    opened by FredHaa 5
  • 'CTCTrainer' object has no attribute 'use_amp'

    I'm using the latest huggingsound:

    #!pip list | grep huggingsound
    huggingsound 0.1.4
    

    An AttributeError occurs when finetune is performed as shown in the fine-tuning example: https://github.com/jonatasgrosman/huggingsound#fine-tuning

    /usr/local/lib/python3.7/dist-packages/huggingsound/trainer.py in training_step(self, model, inputs)
        432         inputs = self._prepare_inputs(inputs)
        433
    --> 434         if self.use_amp:
        435             with torch.cuda.amp.autocast():
        436                 loss = self.compute_loss(model, inputs)

    AttributeError: 'CTCTrainer' object has no attribute 'use_amp'

    Can you find the cause?

    opened by its-ogawa 5
  • Solved issue related to prediction padding and pad_token_id

    This is a relatively hidden bug and took me quite some time to debug :)

    Issue: _compute_metrics() in the evaluation loop calculates wrong CER & WER metrics when using a TokenSet where the pad_token_id is not equal to 0. This doesn't affect the loss calculation / training as such, but the metrics logged during training will be wrong and won't match the metrics calculated using model.evaluate() after training.

    Reason: Similar to the label_ids, the prediction logits are passed to _compute_metrics() as a matrix padded with -100 values. Currently, the argmax call which maps logits to token ids converts these -100 values to 0, so after the argmax, pred_ids will be 0-padded. For most wav2vec2 models this is not an issue, because their vocab.json assigns the ID 0 to the <pad> token. However, if you use a custom TokenSet for fine-tuning, <pad> will most probably not be mapped to 0, so the obtained "0-padding" values will wrongly correspond to another token. See the relevant code here: https://github.com/jonatasgrosman/huggingsound/blob/main/huggingsound/trainer.py#L599

    pred_logits = pred.predictions
    pred_ids = np.argmax(pred_logits, axis=-1)
    
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
    

    Proposed Solution: Save a padding_mask which stores the locations of the -100 padding in the prediction logits. Then, after applying the argmax, use the mask to set the padded entries to the ID corresponding to the padding token, as in the sketch below.
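
    A minimal sketch of the masking described above, assuming pred.predictions is the -100-padded logits matrix from the quoted trainer.py snippet (variable names are hypothetical):

    import numpy as np

    pred_logits = pred.predictions

    # remember where the -100 padding rows are before the argmax,
    # then map those positions to the real pad token id instead of 0
    padding_mask = np.all(pred_logits == -100, axis=-1)
    pred_ids = np.argmax(pred_logits, axis=-1)
    pred_ids[padding_mask] = processor.tokenizer.pad_token_id

    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id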

    opened by nkaenzig 2
  • Question about '1b' model

    Dear Jonatas,

    Question, not a bug-report. The jonatasgrosman/wav2vec2-xls-r-1b-german model removes all numbers. Is there a way to recognize numbers?

    Thank you for your great models! Best wishes from Vienna Markus

    Test case: output.zip. Spoken content: "etwa 20000 euro - ungefähr 12000 euro"; 1b result: "etwa euro - ungefähr euro".

    import torch, transformers, librosa
    filepath = 'output.wav'
    for MODEL_ID in ['jonatasgrosman/wav2vec2-large-xlsr-53-german','jonatasgrosman/wav2vec2-xls-r-1b-german']:
        processor = transformers.Wav2Vec2Processor.from_pretrained(MODEL_ID)
        model = transformers.Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
        speech_array, sampling_rate = librosa.load(filepath, sr=16_000)
        inputs = processor(speech_array, sampling_rate=16_000, return_tensors="pt", padding=True)
        with torch.no_grad():
            logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
        predicted_ids = torch.argmax(logits, dim=-1)
        predicted_sentences = processor.batch_decode(predicted_ids)
        print( MODEL_ID, predicted_sentences[0] )
    
    opened by doublex 2
  • Getting error during training

    RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

    This means that the data was not moved to the GPU.

    My code:

    torch.device("cuda")

    model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-spanish", device="cuda")
    processor_ref = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-spanish")
    token_list = list(processor_ref.tokenizer.encoder.keys())
    token_set = TokenSet(token_list)
    
    train_set = []
    eval_set = []
    
    train_set, eval_set = add_sealed_data_set(train_set, eval_set, config[environment][SAMPLES_DIR])
    
    training_arguments = TrainingArguments()
    training_arguments.overwrite_output_dir = True
    training_arguments.per_device_train_batch_size = 128
    training_arguments.per_device_eval_batch_size = 128
    
    model.finetune(
        config[environment][MODEL_OUTPUT_DIR],
        train_data=train_set,
        eval_data=eval_set,  # the eval_data is optional
        token_set=token_set,
        training_args=training_arguments
    )
    

    I managed to work around this by moving my dataset to CUDA inside the huggingsound code. If I can make it work, I'll create a PR.

    opened by arikhalperin 2
  • Pre-trained uppercase models don't work

    First of all, thanks for this great library, it's really helpful :)

    I just tried to fine-tune a model by facebook that they previously fine-tuned on English transcription tasks: facebook/wav2vec2-large-960h-lv60-self.

    During the training I get WERs of 100%, and after training, model.transcribe() returns empty results.

    The issue seems to be that this model was trained with an upper-case character vocabulary.

    To overcome this, I found this very easy fix, which just converts the vocabulary of the encoder/decoder to lower case:

    from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet
    model = SpeechRecognitionModel(model_name, device='cuda')
    
    model.processor.tokenizer.encoder = {k.lower(): v for k, v in model.processor.tokenizer.encoder.items()}
    model.processor.tokenizer.decoder = {k: v.lower() for k, v in model.processor.tokenizer.decoder.items()} 
    

    Would be great to integrate this somehow into the library.

    opened by nkaenzig 2
  • raise NoBackendError() audioread.exceptions.NoBackendError

    First of all, thank you for your work! I am not able to run the transcribe() method.

    This is my code:

    from huggingsound import SpeechRecognitionModel
    
    model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-large-xlsr-53-german")
    path = r"C:/Users/johndoe/PycharmProjects/kedro_pipeline/data/01_raw/Bond_ueber_das_wetter_und_berlin.mp3"
    audio_paths = [path]
    
    transcriptions = model.transcribe(audio_paths)
    

    I assume my path is not correct, but I already tried different formats:

    r"C:\\Users\\johndoe\\..."
    r"C:\Users\johndoe\..."
    

    -> did not work either.

    This is the output:

    02/24/2022 11:40:11 - INFO - huggingsound.speech_recognition.model - Loading model...
      0%|          | 0/1 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\librosa\core\audio.py", line 149, in load
        with sf.SoundFile(path) as sf_desc:
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\soundfile.py", line 629, in __init__
        self._file = self._open(file, mode_int, closefd)
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\soundfile.py", line 1183, in _open
        _error_check(_snd.sf_error(file_ptr),
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\soundfile.py", line 1357, in _error_check
        raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
    RuntimeError: Error opening 'C:/Users/johndoe/PycharmProjects/kedro_pipeline/data/01_raw/Bond_ueber_das_wetter_und_berlin.mp3': File contains data in an unknown format.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:/Users/johndoe/PycharmProjects/main.py", line 7, in <module>
        transcriptions = model.transcribe(audio_paths)
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\huggingsound\speech_recognition\model.py", line 108, in transcribe
        waveforms = get_waveforms(paths_batch, sampling_rate)
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\huggingsound\utils.py", line 52, in get_waveforms
        waveform, sr = librosa.load(path, sr=sampling_rate)
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\librosa\core\audio.py", line 166, in load
        y, sr_native = __audioread_load(path, offset, duration, dtype)
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\librosa\core\audio.py", line 190, in __audioread_load
        with audioread.audio_open(path) as input_file:
      File "C:\Users\johndoe\anaconda3\envs\lib\site-packages\audioread\__init__.py", line 116, in audio_open
        raise NoBackendError()
    audioread.exceptions.NoBackendError
    
    Process finished with exit code 1
    
    opened by DanielGuo1 2
  • Fine-tuned version of model - a raised exception

    Hello.

    I am getting the following exception:

    ValueError: Not fine-tuned model! Please, fine-tune the model first.
    

    I have looked into the code and see that it needs to have Wav2Vec2ForPreTraining (self.model_config.architectures) in the ctc_finetuded_architectures variable.

    Now this variable has these values:

    {'WavLMForCTC', 'HubertForCTC', 'UniSpeechSatForCTC', 'Wav2Vec2ForCTC', 'UniSpeechForCTC', 'SEWForCTC', 'SEWDForCTC'}

    I am running the code with this model - https://huggingface.co/Yehor/wav2vec2-xls-r-300m-uk-with-lm

    I disabled the code that raises that exception and it seems there is no issue.

    I would like to use some type of configuration to be able to run the code without changing the library code.

    opened by egorsmkv 2
  • Bump datasets from 1.18.3 to 2.6.1

    Bumps datasets from 1.18.3 to 2.6.1.

    Release notes

    Sourced from datasets's releases.

    2.6.1

    Bug fixes

    New Contributors

    Full Changelog: https://github.com/huggingface/datasets/compare/2.6.0...2.6.1

    2.6.0

    Important

    Datasets features

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 1
  • Bump transformers from 4.16.2 to 4.23.1

    Bumps transformers from 4.16.2 to 4.23.1.

    Release notes

    Sourced from transformers's releases.

    v4.23.1 Patch release

    Fix a revert introduced by mistake that broke the "automatic-speech-recognition" pipeline for Whisper.

    v4.23.0: Whisper, Deformable DETR, Conditional DETR, MarkupLM, MSN, safetensors

    Whisper

    The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

    Whisper is an encoder-decoder Transformer trained on 680,000 hours of labeled (transcribed) audio. The model shows impressive performance and robustness in a zero-shot setting, in multiple languages.

    Deformable DETR

    The Deformable DETR model was proposed in Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

    Deformable DETR mitigates the slow convergence issues and limited feature spatial resolution of the original DETR by leveraging a new deformable attention module which only attends to a small set of key sampling points around a reference.

    Conditional DETR

    The Conditional DETR model was proposed in Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.

    Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than DETR.

    Time Series Transformer

    The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

    The model is trained in a similar way to how one would train an encoder-decoder Transformer (like T5 or BART) for machine translation; i.e. teacher forcing is used. At inference time, one can autoregressively generate samples, one time step at a time.

    :warning: This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs or slight breaking changes to fix it in the future. If you see something strange, file a Github Issue.

    Masked Siamese Networks

    The ViTMSN model was proposed in Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

    MSN (masked siamese networks) consists of a joint-embedding architecture to match the prototypes of masked patches with that of the unmasked patches. With this setup, the method yields excellent performance in the low-shot and extreme low-shot regimes for image classification, outperforming other self-supervised methods such as DINO. For instance, with 1% of ImageNet-1K labels, the method achieves 75.7% top-1 accuracy.

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 1
  • Bump datasets from 1.18.3 to 2.5.2

    Bumps datasets from 1.18.3 to 2.5.2.

    Release notes

    Sourced from datasets's releases.

    2.5.2

    Bug fixes

    • Revert task removal in folder-based builders (#5051)
    • Support hfh 0.10 implicit auth (#5031)

    Full Changelog: https://github.com/huggingface/datasets/compare/2.5.1...2.5.2

    2.5.1

    Bug fixes

    Full Changelog: https://github.com/huggingface/datasets/compare/2.5.0...2.5.1

    2.5.0

    Important

    Datasets features

    No-code loaders

    Dataset methods

    Parquet support

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 1
  • Error during fine-tuning

    I have code:

    from huggingsound import TrainingArguments, ModelArguments, SpeechRecognitionModel, TokenSet
    from transformers import Wav2Vec2Processor
    
    processor_ref = Wav2Vec2Processor.from_pretrained("/my/dir/wav2vec2-large-xlsr-53-kalmyk")
    token_list = list(processor_ref.tokenizer.encoder.keys())
    print(len(token_list))
    
    model = SpeechRecognitionModel("/my/dir/wav2vec2-large-xlsr-53-kalmyk")
    output_dir = "/my/dir/tuned"
    
    token_set = TokenSet(token_list)
    
    model.finetune(
        output_dir, 
        train_data=train_data,
        token_set=token_set
    )
    

    I have a list of dicts like this in my train_data:

    train_data = [
        {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
        {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
    ]
    

    Then I get some errors. Can someone help me with that?

    size mismatch for lm_head.weight: copying a param with shape torch.Size([41, 1024]) from checkpoint, the shape in current model is torch.Size([45, 1024]).
    size mismatch for lm_head.bias: copying a param with shape torch.Size([41]) from checkpoint, the shape in current model is torch.Size([45]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
    opened by utnasun 0
  • Bump transformers from 4.23.1 to 4.24.0

    Bumps transformers from 4.23.1 to 4.24.0.

    Release notes

    Sourced from transformers's releases.

    v4.24.0: ESM-2/ESMFold, LiLT, Flan-T5, Table Transformer and Contrastive search decoding

    ESM-2/ESMFold

    ESM-2 and ESMFold are new state-of-the-art Transformer protein language and folding models from Meta AI's Fundamental AI Research Team (FAIR). ESM-2 is trained with a masked language modeling objective, and it can be easily transferred to sequence and token classification tasks for proteins. Checkpoints exist in various sizes, from 8 million parameters up to a huge 15 billion parameter model.

    ESMFold is a state-of-the-art single sequence protein folding model which produces high accuracy predictions significantly faster. Unlike previous protein folding tools like AlphaFold2 and openfold, ESMFold uses a pretrained protein language model to generate token embeddings that are used as input to the folding model, and so does not require a multiple sequence alignment (MSA) of related proteins as input. As a result, proteins can be folded in a single forward pass of the model without requiring any external databases or search/alignment tools to be present at inference time. This hugely reduces the time and compute requirements for folding.

    Transformer protein language models were introduced in the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.

    ESMFold was introduced in the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives.

    LiLT

    LiLT allows combining any pre-trained RoBERTa text encoder with a lightweight Layout Transformer, to enable LayoutLM-like document understanding for many languages.

    It was proposed in LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.

    Flan-T5

    FLAN-T5 is an enhanced version of T5 that has been finetuned on a mixture of tasks.

    It was released in the paper Scaling Instruction-Finetuned Language Models by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

    Table Transformer

    Table Transformer is a model that can perform table extraction and table structure recognition from unstructured documents based on the DETR architecture.

    It was proposed in PubTables-1M: Towards comprehensive table extraction from unstructured documents by Brandon Smock, Rohith Pesala, Robin Abraham.

    Contrastive search decoding

    Contrastive search decoding is a new state-of-the-art generation method which aims at reducing the repetitive patterns in which generation models often fall.

    It was introduced in A Contrastive Framework for Neural Text Generation by Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.

    • Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py by @​gmftbyGMFTBY in #19477

    Safety and security

    We continue to explore the new serialization format not using Pickle via the safetensors library, this time by adding support for TensorFlow models. More checkpoints have been converted to this format. Support is still experimental.

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Bump pytest from 5.4.3 to 7.2.0

    Bumps pytest from 5.4.3 to 7.2.0.

    Release notes

    Sourced from pytest's releases.

    7.2.0

    pytest 7.2.0 (2022-10-23)

    Deprecations

    • #10012: Update pytest.PytestUnhandledCoroutineWarning to a deprecation; it will raise an error in pytest 8.

    • #10396: pytest no longer depends on the py library. pytest provides a vendored copy of py.error and py.path modules but will use the py library if it is installed. If you need other py.* modules, continue to install the deprecated py library separately, otherwise it can usually be removed as a dependency.

    • #4562: Deprecate configuring hook specs/impls using attributes/marks.

      Instead use pytest.hookimpl and pytest.hookspec. For more details, see the docs.

    • #9886: The functionality for running tests written for nose has been officially deprecated.

      This includes:

      • Plain setup and teardown functions and methods: this might catch users by surprise, as setup() and teardown() are not pytest idioms, but part of the nose support.
      • Setup/teardown using the @with_setup decorator.

      For more details, consult the deprecation docs.

    Features

    • #9897: Added shell-style wildcard support to testpaths.

    Improvements

    • #10218: @pytest.mark.parametrize() (and similar functions) now accepts any Sequence[str] for the argument names, instead of just list[str] and tuple[str, ...].

      (Note that str, which is itself a Sequence[str], is still treated as a comma-delimited name list, as before).

    • #10381: The --no-showlocals flag has been added. This can be passed directly to tests to override --showlocals declared through addopts.

    • #3426: Assertion failures with strings in NFC and NFD forms that normalize to the same string now have a dedicated error message detailing the issue, and their UTF-8 representation is expressed instead.

    • #7337: A warning is now emitted if a test function returns something other than None. This prevents a common mistake among beginners who expect that returning a bool (for example return foo(a, b) == result) would cause a test to pass or fail, instead of using assert.

    • #8508: Introduce multiline display for warning matching via pytest.warns and enhance match comparison for _pytest._code.ExceptionInfo.match as returned by pytest.raises.

    • #8646: Improve pytest.raises. Previously passing an empty tuple would give a confusing error. We now raise immediately with a more helpful message.

    • #9741: On Python 3.11, use the standard library's tomllib to parse TOML.

      tomli is no longer a dependency on Python 3.11.

    • #9742: Display assertion message without escaped newline characters with -vv.

    • #9823: Improved error message that is shown when no collector is found for a given file.

    ... (truncated)

    Commits
    • 3af3f56 Prepare release version 7.2.0
    • bc2c3b6 Merge pull request #10408 from NateMeyvis/patch-2
    • d84ed48 Merge pull request #10409 from pytest-dev/asottile-patch-1
    • ffe49ac Merge pull request #10396 from pytest-dev/pylib-hax
    • d352098 allow jobs to pass if codecov.io fails
    • c5c562b Fix typos in CONTRIBUTING.rst
    • d543a45 add deprecation changelog for py library vendoring
    • f341a5c Merge pull request #10407 from NateMeyvis/patch-1
    • 1027dc8 [pre-commit.ci] auto fixes from pre-commit.com hooks
    • 6b905ee Add note on tags to CONTRIBUTING.rst
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump torch from 1.10.2 to 1.13.0

    Bumps torch from 1.10.2 to 1.13.0.

    Release notes

    Sourced from torch's releases.

    PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available

    Pytorch 1.13 Release Notes

    • Highlights
    • Backwards Incompatible Changes
    • New Features
    • Improvements
    • Performance
    • Documentation
    • Developers

    Highlights

    We are excited to announce the release of PyTorch 1.13! This includes stable versions of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed migration of CUDA 11.6 and 11.7. Beta includes improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, being included in-tree with the PyTorch release. This release is composed of over 3,749 commits and 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.

    Summary:

    • The BetterTransformer feature set supports fastpath execution for common Transformer models during Inference out-of-the-box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models and Nested Tensors is now enabled by default.

    • Timely deprecating older CUDA versions allows us to proceed with introducing the latest CUDA version as they are introduced by Nvidia®, and hence allows support for C++17 in PyTorch and new NVIDIA Open GPU Kernel Modules.

    • Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, a user will be able to import functorch and use functorch without needing to install another package.

    • PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.

    Stable: Better Transformer; CUDA 10.2 and 11.3 CI/CD deprecation
    Beta: Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs; extend NNC to support channels last and bf16; functorch now in the PyTorch core library; beta support for M1 devices
    Prototype: Arm® Compute Library backend support for AWS Graviton; CUDA Sanitizer

    You can check the blogpost that shows the new features here.

    Backwards Incompatible changes

    Python API

    uint8 and all integer dtype masks are no longer allowed in Transformer (#87106)

    Prior to 1.13, key_padding_mask could be set to uint8 or other integer dtypes in TransformerEncoder and MultiheadAttention, which might generate unexpected results. In this release, these dtypes are not allowed for the mask anymore. Please convert them to torch.bool before using.

    1.12.1

    >>> layer = nn.TransformerEncoderLayer(2, 4, 2)
    >>> encoder = nn.TransformerEncoder(layer, 2)
    >>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)
    >>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
    # works before 1.13
    >>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)
    

    ... (truncated)

    Changelog

    Sourced from torch's changelog.

    Releasing PyTorch

    General Overview

    Releasing a new version of PyTorch generally entails 4 major steps:

    1. Cutting a release branch preparations
    2. Cutting a release branch and making release branch specific changes
    3. Drafting RCs (Release Candidates), and merging cherry picks
    4. Promoting RCs to stable and performing release day tasks

    Cutting a release branch preparations

    The following requirements need to be met prior to the final RC cut:

    • Resolve all outstanding issues in the milestones (for example, 1.11.0) before the first RC cut is completed. After the RC cut is completed, the following script should be executed from the builder repo in order to validate the presence of the fixes in the release branch: python github_analyze.py --repo-path ~/local/pytorch --remote upstream --branch release/1.11 --milestone-id 26 --missing-in-branch

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Different evaluation results on HuggingFace and locally

    Hello, I encountered a problem.

    Model: jonatasgrosman/wav2vec2-xls-r-1b-russian

    Example 1

    On HuggingFace using Hosted inference API (good):

    рекомендуем при обращении в контактный центр использовать код клиента

    Locally using the huggingsound library (bad, missing whitespace):

    рекомендуем приобращение в контактный центр использовать кодклиента

    Example 2

    On HuggingFace using Hosted inference API (good):

    в настоящий момент по техническим причинам купюры номиналом пять тысяч рублей действительно не принимаются в некоторых банкоматах

    Locally using the huggingsound library (bad: wrong word endings and spelling):

    в настоящий момент по техническим причинам купюра номеналом пять тысяч рублей действительно не принимаются в некоторых банкоматах

    opened by kirillrybin 0
  • Bump coverage from 5.5 to 6.5.0

    Bumps coverage from 5.5 to 6.5.0.

    Release notes

    Sourced from coverage's releases.

    coverage-5.6b1

    • Third-party packages are now ignored in coverage reporting. This solves a few problems:
      • Coverage will no longer report about other people’s code (issue 876). This is true even when using --source=. with a venv in the current directory.
      • Coverage will no longer generate “Already imported a file that will be measured” warnings about coverage itself (issue 905).
    • The HTML report uses j/k to move up and down among the highlighted chunks of code. They used to highlight the current chunk, but 5.0 broke that behavior. Now the highlighting is working again.
    • The JSON report now includes percent_covered_display, a string with the total percentage, rounded to the same number of decimal places as the other reports’ totals.
    Changelog

    Sourced from coverage's changelog.

    Version 6.5.0 — 2022-09-29

    • The JSON report now includes details of which branches were taken, and which are missing for each file. Thanks, Christoph Blessing (pull 1438). Closes issue 1425.

    • Starting with coverage.py 6.2, class statements were marked as a branch. This wasn't right, and has been reverted, fixing issue 1449_. Note this will very slightly reduce your coverage total if you are measuring branch coverage.

    • Packaging is now compliant with PEP 517, closing issue 1395.

    • A new debug option --debug=pathmap shows details of the remapping of paths that happens during combine due to the [paths] setting.

    • Fix an internal problem with caching of invalid Python parsing. Found by OSS-Fuzz, fixing their bug 50381_.

    .. _bug 50381: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=50381
    .. _PEP 517: https://peps.python.org/pep-0517/
    .. _issue 1395: nedbat/coveragepy#1395
    .. _issue 1425: nedbat/coveragepy#1425
    .. _pull 1438: nedbat/coveragepy#1438
    .. _issue 1449: nedbat/coveragepy#1449

    Version 6.4.4 — 2022-08-16

    • Wheels are now provided for Python 3.11.

    Version 6.4.3 — 2022-08-06

    • Fix a failure when combining data files if the file names contained glob-like patterns (pull 1405_). Thanks, Michael Krebs and Benjamin Schubert.

    • Fix a messaging failure when combining Windows data files on a different drive than the current directory. (pull 1430, fixing issue 1428). Thanks, Lorenzo Micò.

    • Fix path calculations when running in the root directory, as you might do in

    ... (truncated)

    Commits
    • 0ac2453 docs: sample html report
    • 0954c85 build: prep for 6.5.0
    • 95195b1 docs: changelog for json report branch details
    • 789f175 fix: keep negative arc values
    • aabc540 feat: include branches taken and missed in JSON report. #1425
    • a59fc44 docs: minor tweaks to db docs
    • d296083 docs: add a note to the class-branch change
    • 7f07df6 chore: make upgrade
    • 6bc29a9 build: use the badge action coloring
    • fd36918 fix: class statements shouldn't be branches. #1449
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
Releases: v0.1.6
Owner: Jonatas Grosman, PhD student in Computer Science at the Pontifical Catholic University of Rio de Janeiro