Crowd sourced training data for Rasa NLU models

Overview

Open in Streamlit

NLU Training Data

Crowd-sourced training data for the development and testing of Rasa NLU models.

If you're interested in grabbing some data feel free to check out our live data fetching ui.


About this repository

This is an experiment with the goal of providing basic training data for developing chatbots, therefore, this repository is open for contributions!

We need your help to create an open source dataset to empower chatbot makers and conversational AI enthusiasts alike, and we very much appreciate your support in expanding the collection of data available to the community.

How do I donate my training data?

Each folder should contain a list of multiple intents, consider if the set of training data you're contributing could fit within an existing folder before creating a new one.

To contribute via pull request, follow these steps:

  1. Create an issue describing the training data you would like to contribute.

  2. Create a new file with a folder title and a NLU.yml file, or contribute to an existing folder.

  3. In the NLU.yml file, format your training data using YAML, remove all entities (see script), title each section with the intent types and add a short description e.g.intent:inform_rain <!--The user says that it is currently raining somewhere.-->

  4. Update the README.md file, include a list of the intent types added.

  5. Create a pull request describing your changes.

Your pull request will be reviewed by a maintainer, who will get back to you about any necessary changes or questions. You will also be asked to sign a Contributor License Agreement.

FAQs

How should I label my intents?

Please always put the domain at the end of each intent. For example: ask_transport

What do I do about multi-intent utterences?

If you would like to contribute multi-intent utterences, please add a + to indicate an additional intent, for example: affirm+ask_transport

What about training data that’s not in English?

Currently, we are unable to evaluate the quality of all language contributions, and therefore, during the initial phase we can only accept English training data to the repository. However, we understand that the Rasa community is a global one, and in the long-term we would like to find a solution for this in collaboration with the community.

Why do I need to remove entities from my training data?

We would like to make the training data as easy as possible to adopt to new training models and annotating entities highly dependent on your bot’s purpose. Therefore, we will first focus on collecting training data that only includes intents.

To help you remove the annotated entities from your training data, you can run this script.


About Rasa

Owner
Rasa
Open source machine learning tools for developers to build, improve, and deploy text-and voice-based chatbots and assistants
Rasa
TFIDF-based QA system for AIO2 competition

AIO2 TF-IDF Baseline This is a very simple question answering system, which is developed as a lightweight baseline for AIO2 competition. In the traini

Masatoshi Suzuki 4 Feb 19, 2022
Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipB

Jie Lei 雷杰 612 Jan 04, 2023
Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!

Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti

Sidharth Pal 20 Dec 14, 2022
Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Bethge Lab 61 Dec 21, 2022
Natural Language Processing at EDHEC, 2022

Natural Language Processing Here you will find the teaching materials for the "Natural Language Processing" course at EDHEC Business School, 2022 What

1 Feb 04, 2022
Asr abc - Automatic speech recognition(ASR),中文语音识别

语音识别的简单示例,主要在课堂演示使用 创建python虚拟环境 在linux 和macos 上验证通过 # 如果已经有pyhon3.6 环境,跳过该步骤,使用

LIyong.Guo 8 Nov 11, 2022
DeBERTa: Decoding-enhanced BERT with Disentangled Attention

DeBERTa: Decoding-enhanced BERT with Disentangled Attention This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Dis

Microsoft 1.2k Jan 03, 2023
Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration

Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration This is the official repository for the EMNLP 2021 long pa

70 Dec 11, 2022
A python script to prefab your scripts/text files, and re create them with ease and not have to open your browser to copy code or write code yourself

Scriptfab - What is it? A python script to prefab your scripts/text files, and re create them with ease and not have to open your browser to copy code

DevNugget 3 Jul 28, 2021
A collection of GNN-based fake news detection models.

This repo includes the Pytorch-Geometric implementation of a series of Graph Neural Network (GNN) based fake news detection models. All GNN models are implemented and evaluated under the User Prefere

SafeGraph 251 Jan 01, 2023
GPT-3: Language Models are Few-Shot Learners

GPT-3: Language Models are Few-Shot Learners arXiv link Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-trainin

OpenAI 12.5k Jan 05, 2023
Package for controllable summarization

summarizers summarizers is package for controllable summarization based CTRLsum. currently, we only supports English. It doesn't work in other languag

Hyunwoong Ko 72 Dec 07, 2022
PG-19 Language Modelling Benchmark

PG-19 Language Modelling Benchmark This repository contains the PG-19 language modeling benchmark. It includes a set of books extracted from the Proje

DeepMind 161 Oct 30, 2022
Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields

Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields [project page][paper][cite] Geometry-Consistent Neural Shape Represe

Yifan Wang 100 Dec 19, 2022
justCTF [*] 2020 challenges sources

justCTF [*] 2020 This repo contains sources for justCTF [*] 2020 challenges hosted by justCatTheFish. TLDR: Run a challenge with ./run.sh (requires Do

justCatTheFish 25 Dec 27, 2022
Non-Autoregressive Predictive Coding

Non-Autoregressive Predictive Coding This repository contains the implementation of Non-Autoregressive Predictive Coding (NPC) as described in the pre

Alexander H. Liu 43 Nov 15, 2022
Code release for NeX: Real-time View Synthesis with Neural Basis Expansion

NeX: Real-time View Synthesis with Neural Basis Expansion Project Page | Video | Paper | COLAB | Shiny Dataset We present NeX, a new approach to novel

537 Jan 05, 2023
Official PyTorch implementation of "Dual Path Learning for Domain Adaptation of Semantic Segmentation".

Dual Path Learning for Domain Adaptation of Semantic Segmentation Official PyTorch implementation of "Dual Path Learning for Domain Adaptation of Sema

27 Dec 22, 2022
Korean stereoypte detector with TUNiB-Electra and K-StereoSet

Korean Stereotype Detector Korean stereotype sentence classifier using K-StereoSet with TUNiB-Electra Web demo you can test this model easily in demo

Sae_Chan_Oh 11 Feb 18, 2022
InferSent sentence embeddings

InferSent InferSent is a sentence embeddings method that provides semantic representations for English sentences. It is trained on natural language in

Facebook Research 2.2k Dec 27, 2022