This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks.

Overview

slue-toolkit

made-with-python License: MIT

We introduce Spoken Language Understanding Evaluation (SLUE) benchmark. This toolkit provides codes to download and pre-process the SLUE datasets, train the baseline models, and evaluate SLUE tasks. Refer https://arxiv.org/abs/2111.10367 for more details.

News

  • Nov. 22: We release the SLUE paper on arXiv along with the slue-toolkit repository. The repository contains data processing and evaluation scripts. We will publish the scripts for trainig the baseline models soon.

Installation

  1. git clone this repository and install slue-toolkit (development mode)
git clone https://github.com/asappresearch/slue-toolkit.git
pip install -e .

or install directly from Github

pip install git+https://github.com/asappresearch/slue-toolkit.git
  1. Install additional dependency based on your choice (e.g. you need fairseq and transformers for baselines)

SLUE Tasks

Automatic Speech Recognition (ASR)

Although this is not a SLU task, ASR can help analyze performance of downstream SLU tasks on the same domain. Additionally, pipeline approaches depend on ASR outputs, making ASR relevant to SLU. ASR is evaluated using word error rate (WER).

Named Entity Recognition (NER)

Named entity recognition involves detecting the named entities and their tags (types) in a given sentence. We evaluate performance using micro-averaged F1 and label-F1 scores. The F1 score evaluates an unordered list of named entity phrase and tag pairs predicted for each sentence. Only the tag predictions are considered for label-F1.

Sentiment Analysis (SA)

Sentiment analysis refers to classifying a given speech segment as having negative, neutral, or positive sentiment. We evaluate SA using macro-averaged (unweighted) recall and F1 scores.

Datasets

Corpus Size - utts (hours) Tasks License
Fine-tune Dev Test
SLUE-VoxPopuli 5,000 (14.5) 1,753 (5.0) 1,842 (4.9) ASR, NER CC0 (check complete license here)
SLUE-VoxCeleb 5,777 (12.8) 955 (2.1) 4,052 (9.0) ASR, SA CC-BY 4.0 (check complete license here)

For SLUE, you need VoxCeleb and VoxPopuli dataset. We carefully curated subset of those dataset for fine-tuning and evaluation for SLUE tasks, and we re-distribute the the subsets. Thus, you don't need to download a whole gigantic datasets. In the dataset, we also includes the human annotation and transcription for SLUE tasks. All you need to do is just running the script below and it will download and pre-process the dataset.

Download and pre-process dataset

bash scripts/download_datasets.sh

SLUE score evaluation

The test set data and annotation will be used for the official SLUE score evaluation, however we will not release the test set annotation. Thus, the SLUE score can be evaluated by submitting your prediction result in tsv format. We will prepare the website to accept your submission. Please stay tuned for this.

Model development rule

To train model, You can use fine-tuning and dev sets (audio, transcription and annotation) except the test set of SLUE task. Additionally you can use any kind of external dataset whether it is labeled or unlabeled for any purpose of training (e.g. pre-training and fine-tuning).

For vadidation of your model, you can use official dev set we provide, or you can make your own splits or cross-validation splits by mixing fine-tuning and dev set all together.

Baselines

ASR

Fine-tuning

Assuming that the preprocessed manifest files are in manifest/slue-voxceleb and manifest/slue-voxpopuli for SLUE-VoxCeleb and SLUE-VoxPopuli. This command fine-tune a wav2vec 2.0 base model on these two datasets using one GPU.

bash baselines/asr/ft-w2v2-base.sh manifest/slue-voxceleb save/asr/w2v2-base-vc
bash baselines/asr/ft-w2v2-base.sh manifest/slue-voxpopuli save/asr/w2v2-base-vp

Evaluation

To evaluate the fine-tuned wav2vec 2.0 ASR models on the dev set, please run the following commands.

python slue_toolkit/eval/eval_w2v.py eval_asr save/asr/w2v2-base-vc --data manifest/slue-voxceleb --subset dev
python slue_toolkit/eval/eval_w2v.py eval_asr save/asr/w2v2-base-vp --data manifest/slue-voxpopuli --subset dev

The WER will be printed directly. The predictions are saved in save/asr/w2v2-base-vc/pred-dev.wrd and save/asr/w2v2-base-vp/pred-dev.wrd and can be used for pipeline models.

More detail baseline experiment described here

NER

Fine-tuning End-to-end model

Assuming that the preprocessed manifest files are in manifest/slue-voxpopuli for SLUE-VoxPopuli. This command fine-tune a wav2vec 2.0 base model using one GPU.

bash baselines/ner/e2e_scripts/ft-w2v2-base.sh manifest/slue-voxpopuli/e2e_ner save/e2e_ner/w2v2-base

Evaluating End-to-End model

To evaluate the fine-tuned wav2vec 2.0 E2E NER model on the dev set, please run the following command. (decoding without language model)

bash baselines/ner/e2e_scripts/eval-ner.sh w2v2-base dev combined nolm

More detail baseline experiment described here

Sentiment Analysis

Fine-tuning

This command fine-tune a wav2vec 2.0 base model on the voxceleb dataset

bash baselines/sentiment/ft-w2v2-base-senti.sh manifest/slue-voxceleb save/sentiment/w2v2-base

Evaluation

To evaluate the fine-tuned wav2vec 2.0 sentiment model, run following commands or run baselines/sentiment/e2e_scripts/eval.sh

python3 slue_toolkit/eval/eval_w2v_sentiment.py --save-dir save/sentiment/w2v2-base --data manifest/slue-voxceleb --subset dev

More detail baseline experiment described here

Owner
ASAPP Research
AI for Enterprise
ASAPP Research
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

STAM - Pytorch Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in

Phil Wang 109 Dec 28, 2022
Object recognition using Azure Custom Vision AI and Azure Functions

Step by Step on how to create an object recognition model using Custom Vision, export the model and run the model in an Azure Function

El Bruno 11 Jul 08, 2022
Pyramid Pooling Transformer for Scene Understanding

Pyramid Pooling Transformer for Scene Understanding Requirements: torch 1.6+ torchvision 0.7.0 timm==0.3.2 Validated on torch 1.6.0, torchvision 0.7.0

Yu-Huan Wu 119 Dec 29, 2022
Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CVPR 2021)

Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CAC) Xin Lai*, Zhuotao Tian*, Li Jiang, Shu Liu, Hengshuang Zhao, Li

Jia Research Lab 137 Dec 14, 2022
Asymmetric Bilateral Motion Estimation for Video Frame Interpolation, ICCV2021

ABME (ICCV2021) Junheum Park, Chul Lee, and Chang-Su Kim Official PyTorch Code for "Asymmetric Bilateral Motion Estimation for Video Frame Interpolati

Junheum Park 86 Dec 28, 2022
Data Engineering ZoomCamp

Data Engineering ZoomCamp I'm partaking in a Data Engineering Bootcamp / Zoomcamp and will be tracking my progress here. I can't promise these notes w

Aaron 61 Jan 06, 2023
The official repo of the CVPR 2021 paper Group Collaborative Learning for Co-Salient Object Detection .

GCoNet The official repo of the CVPR 2021 paper Group Collaborative Learning for Co-Salient Object Detection . Trained model Download final_gconet.pth

Qi Fan 46 Nov 17, 2022
Melanoma Skin Cancer Detection using Convolutional Neural Networks and Transfer Learning🕵🏻‍♂️

This is a Kaggle competition in which we have to identify if the given lesion image is malignant or not for Melanoma which is a type of skin cancer.

Vipul Shinde 1 Jan 27, 2022
PyTorch code for SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised DA

PyTorch Code for SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised Domain Adaptation Viraj Prabhu, Shivam Khare, Deeks

Viraj Prabhu 46 Dec 24, 2022
🔪 Elimination based Lightweight Neural Net with Pretrained Weights

ELimNet ELimNet: Eliminating Layers in a Neural Network Pretrained with Large Dataset for Downstream Task Removed top layers from pretrained Efficient

snoop2head 4 Jul 12, 2022
SeisComP/SeisBench interface to enable deep-learning (re)picking in SeisComP

scdlpicker SeisComP/SeisBench interface to enable deep-learning (re)picking in SeisComP Objective This is a simple deep learning (DL) repicker module

Joachim Saul 6 May 13, 2022
The official pytorch implemention of the CVPR paper "Temporal Modulation Network for Controllable Space-Time Video Super-Resolution".

This is the official PyTorch implementation of TMNet in the CVPR 2021 paper "Temporal Modulation Network for Controllable Space-Time VideoSuper-Resolu

Gang Xu 95 Oct 24, 2022
A Pytree Module system for Deep Learning in JAX

Treex A Pytree-based Module system for Deep Learning in JAX Intuitive: Modules are simple Python objects that respect Object-Oriented semantics and sh

Cristian Garcia 216 Dec 20, 2022
MMRazor: a model compression toolkit for model slimming and AutoML

Documentation: https://mmrazor.readthedocs.io/ English | 简体中文 Introduction MMRazor is a model compression toolkit for model slimming and AutoML, which

OpenMMLab 899 Jan 02, 2023
Codes for "Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier"

Deep-RTC [project page] This repository contains the source code accompanying our ECCV 2020 paper. Solving Long-tailed Recognition with Deep Realistic

Gina Wu 16 May 26, 2022
Official Code Release for "TIP-Adapter: Training-free clIP-Adapter for Better Vision-Language Modeling"

Official Code Release for "TIP-Adapter: Training-free clIP-Adapter for Better Vision-Language Modeling" Pipeline of Tip-Adapter Tip-Adapter can provid

peng gao 187 Dec 28, 2022
Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples

Welcome to the cuQuantum repository! This public repository contains two sets of files related to the NVIDIA cuQuantum SDK: samples: All C/C++ sample

NVIDIA Corporation 147 Dec 27, 2022
Quick program made to generate alpha and delta tables for Hidden Markov Models

HMM_Calc Functions for generating Alpha and Delta tables from a Hidden Markov Model. Parameters: a: Matrix of transition probabilities. a[i][j] = a_{i

Adem Odza 1 Dec 04, 2021
Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation

OSCAR Project Page | Paper This repository contains the codebase used in OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Ma

NVIDIA Research Projects 74 Dec 22, 2022
Extremely simple and fast extreme multi-class and multi-label classifiers.

napkinXC napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification, that focus of implementing various m

Marek Wydmuch 43 Nov 14, 2022