This repository describes our reproducible framework for assessing self-supervised representation learning from speech.

Last update: Aug 24, 2022

Related tags

Text Data & NLP Interspeech2021

Overview

LeBenchmark: a reproducible framework for assessing SSL from speech

Self-Supervised Learning (SSL) using large amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech and were notably successful in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, they were evaluated mostly on ASR, under multiple heterogeneous experimental settings (most of them for English). This makes it difficult to compare SSL approaches objectively and to evaluate their impact on building speech systems.

In this repository, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR tasks (high- and low-resource) but also spoken language understanding, speech translation, and emotion recognition. It also targets speech technologies in a language other than English: French. SSL models of different sizes are trained on carefully sourced and documented datasets.

Our pre-trained SSL models for French are available through this HuggingFace link: https://huggingface.co/LeBenchmark

Our benchmark tasks are available in the following directories:

ASR: Automatic Speech Recognition

SLU: Spoken Language Understanding

AER: Automatic Emotion Recognition

AST: Automatic Speech Translation

Detailed descriptions of the experiments and results are given in our paper: https://arxiv.org/pdf/2104.11462.pdf

(this page is still under construction)
