AllenNLP integration for Shiba: Japanese CANINE model

Last update: Feb 16, 2022

Overview

AllenNLP Integration for Shiba

allennlp-shiba-model is a Python library that provides AllenNLP integration for shiba-model.

SHIBA is an approximate reimplementation of CANINE [1] in raw PyTorch, pretrained on the Japanese Wikipedia corpus using random span masking. If you are unfamiliar with CANINE, you can think of it as a very efficient (approximately 4x as efficient) character-level BERT model. Of course, the name SHIBA comes from the identically named Japanese canine.
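
To make "character-level" concrete: CANINE-style models read Unicode code points rather than subword pieces. The sketch below only illustrates that idea with Python built-ins. `to_codepoints` and `from_codepoints` are hypothetical helper names, not part of shiba-model or this library, and the real SHIBA tokenizer may add special tokens or padding that are not shown here.

```python
from typing import List


def to_codepoints(text: str) -> List[int]:
    """Map each character to its Unicode code point (illustrative only)."""
    return [ord(ch) for ch in text]


def from_codepoints(ids: List[int]) -> str:
    """Inverse mapping: rebuild the string from its code points."""
    return "".join(chr(i) for i in ids)


text = "柴犬はかわいい"
ids = to_codepoints(text)

# One ID per character, so no vocabulary file or word segmentation is needed.
print(len(text), len(ids))
print(from_codepoints(ids) == text)
```

Because every character maps to exactly one ID, the input sequence is longer than with subword tokenization, which is why the efficiency of the encoder matters for this family of models.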

Installation

Installing the library and dependencies is simple using pip.

pip install allennlp-shiba

Example

This library enables users to specify the SHIBA tokenizer, token indexer, and token embedder in a Jsonnet config file. Here is an example of the relevant parts of such a config:

{
    "dataset_reader": {
        "tokenizer": {
            "type": "shiba",
        },
        "token_indexers": {
            "tokens": {
                "type": "shiba",
            }
        },
    },
    "model": {
        "shiba_embedder": {
            "type": "basic",
            "token_embedders": {
                // This key must match the key used under "token_indexers" above.
                "tokens": {
                    "type": "shiba",
                    "eval_model": true,
                }
            }
        }
    }
}
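
Since Jsonnet is a superset of JSON, a fragment like the one above can also be generated from Python when configs are built programmatically. The following is a minimal sketch under stated assumptions: `SHARED_KEY`, `build_config`, and `render_config` are hypothetical names introduced here, and the sketch deliberately uses one shared key for the indexer and the embedder, because AllenNLP's basic text field embedder pairs embedders with indexers by key. A full AllenNLP config would also need entries (reader type, model type, data paths, trainer) that the source does not show.

```python
import json

# Assumption: one key shared by the token indexer and the token embedder.
SHARED_KEY = "tokens"


def build_config(eval_model: bool = True) -> dict:
    """Build the SHIBA-related part of an AllenNLP config as a plain dict."""
    return {
        "dataset_reader": {
            "tokenizer": {"type": "shiba"},
            "token_indexers": {SHARED_KEY: {"type": "shiba"}},
        },
        "model": {
            "shiba_embedder": {
                "type": "basic",
                "token_embedders": {
                    SHARED_KEY: {"type": "shiba", "eval_model": eval_model},
                },
            },
        },
    }


def render_config(cfg: dict) -> str:
    """Serialize to JSON, which any Jsonnet evaluator accepts as-is."""
    return json.dumps(cfg, indent=4, ensure_ascii=False)


config = build_config()
rendered = render_config(config)
print(rendered)
```

The printed text can be saved with a .jsonnet extension and extended by hand with the remaining sections of the experiment config.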

Reference

Joshua Tanner and Masato Hagiwara (2021). SHIBA: Japanese CANINE model. GitHub repository.

Releases(v0.1.1)

v0.1.1(Jun 26, 2021)

v0.1.0(Jun 26, 2021)

v0.0.1(Jun 26, 2021)

Owner

Shunsuke KITADA
