wenet-kws

Production First and Production Ready End-to-End Keyword Spotting Toolkit.

The goal of this toolkit is to...

Small-footprint keyword spotting (KWS), or more specifically wake-up word (WuW) detection, is a typical and important module in Internet of Things (IoT) devices. It gives users a hands-free way to control IoT devices. A WuW detection system usually runs locally and persistently on the device. It therefore needs low power consumption, few model parameters, and low computational complexity, and it must detect predefined keywords in a streaming fashion, i.e., with low latency.
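To make the streaming, low-latency requirement concrete, here is a minimal sketch of one common back-end technique: smoothing per-frame keyword posteriors with a moving average and firing when the smoothed score crosses a threshold. This is not wenet-kws code. It assumes some acoustic model already emits one keyword posterior per frame. The class name `StreamingDetector` and its parameters (`window`, `threshold`, `refractory`) are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Iterable, List


class StreamingDetector:
    """Hypothetical streaming wake-up word detector (illustrative only).

    Consumes one keyword posterior per frame (assumed to come from an
    acoustic model), smooths it with a moving average over the last
    `window` frames, and fires when the smoothed score reaches
    `threshold`. After a detection it stays silent for `refractory`
    frames so one spoken keyword is not reported repeatedly.
    """

    def __init__(self, window: int = 5, threshold: float = 0.8, refractory: int = 10):
        self.threshold = threshold
        self.refractory = refractory
        # Only the last `window` posteriors are kept: constant memory per stream.
        self.buffer = deque(maxlen=window)
        self.cooldown = 0

    def push(self, posterior: float) -> bool:
        """Feed one frame; return True if the keyword is detected at this frame."""
        self.buffer.append(posterior)
        if self.cooldown > 0:
            # Still inside the refractory period after the last detection.
            self.cooldown -= 1
            return False
        score = sum(self.buffer) / len(self.buffer)  # moving-average smoothing
        if score >= self.threshold:
            self.cooldown = self.refractory
            return True
        return False


def detect(posteriors: Iterable[float], **kwargs) -> List[int]:
    """Run the detector over a posterior stream and return the triggering frame indices."""
    det = StreamingDetector(**kwargs)
    return [i for i, p in enumerate(posteriors) if det.push(p)]


if __name__ == "__main__":
    # Keyword "spoken" during frames 10-17, background elsewhere.
    stream = [0.1] * 10 + [0.95] * 8 + [0.1] * 10
    print(detect(stream))  # [14]
```

The detector fires at frame 14 although the keyword starts at frame 10: a larger `window` suppresses isolated posterior spikes (fewer false alarms) but adds up to `window - 1` frames of latency, which is the trade-off the paragraph above refers to.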

Typical Scenario

We are going to support the following typical wake-up word applications:

Single wake-up word
Multiple wake-up words
Customizable wake-up word
Personalized wake-up word, i.e., a combination of wake-up word detection and voiceprint (speaker) verification

Dataset

We plan to support a variety of open-source wake-up word datasets, including but not limited to:

All well-trained models on these datasets will be made publicly available.

Runtime

We plan to support a variety of hardware and platforms, including:

Web browser
x86
Android
Raspberry Pi
