TextFlint is a multilingual robustness evaluation platform for natural language processing tasks.

Overview

Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing

[TextFlint Documentation on ReadTheDocs]

About

TextFlint is a multilingual robustness evaluation platform for natural language processing tasks, which unifies general text transformation, task-specific transformation, adversarial attack, sub-population, and their combinations to provide a comprehensive robustness analysis.

Features:

There are lots of reasons to use TextFlint:

  • Full coverage of transformation types: 20 general transformations, 8 subpopulations, and 60 task-specific transformations, as well as thousands of their combinations, covering a broad range of text transformations for comprehensively evaluating the robustness of your model. TextFlint also supports adversarial attacks to generate model-specific transformed data.
  • Targeted data augmentation: the generated data can be used to train or fine-tune your model to improve its robustness.
  • Automatic analytical reports that explain where your model falls short, such as problems with lexical rules or syntactic rules.

Setup

Installation

You can either use pip or clone this repo to install TextFlint.

  1. Using pip (recommended)
pip install textflint
  2. Cloning this repo
git clone https://github.com/textflint/textflint.git
cd textflint
python setup.py install

Usage

Workflow

The general workflow of TextFlint evaluates a target model in three steps:

  1. Input preparation: the original test dataset, which is loaded by Dataset, should first be formatted as a series of JSON objects. The TextFlint configuration is specified by Config, and the target model is loaded as FlintModel.
  2. Adversarial sample generation: multi-perspective transformations (i.e., Transformation, Subpopulation, and AttackRecipe) are performed on the Dataset to generate transformed samples. To ensure the semantic and grammatical correctness of the transformed samples, Validator calculates a confidence score for each sample and filters out unacceptable ones.
  3. Analysis: Analyzer collects the evaluation results, and ReportGenerator automatically generates a comprehensive report of model robustness.

Quick Start

The following code snippet shows how to generate transformed data on the Sentiment Analysis task.

from textflint.engine import Engine

# load the data samples
sample1 = {'x': 'Titanic is my favorite movie.', 'y': 'pos'}
sample2 = {'x': 'I don\'t like the actor Tim Hill', 'y': 'neg'}
data_samples = [sample1, sample2]

# define the output directory
out_dir_path = './test_result/'

# run transformation/subpopulation/attack and save the transformed data to out_dir_path in json format;
# no config is passed here, so the predefined SA configuration (SA.json) is assumed to apply
engine = Engine('SA')
engine.run(data_samples, out_dir_path)

You can also feed data to the Engine in other ways (e.g., a JSON or CSV file) where one line represents one sample. Some transformations and subpopulations are predefined in SA.json, and you can also pass your own configuration file as needed.
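
As a rough illustration of the "one line per sample" layout mentioned above, the sketch below writes the two samples to a file with one JSON object per line and reads them back using only the standard library. The file name `sa_samples.json` and the helper functions are hypothetical; whether a given TextFlint version accepts this exact file should be checked against its documentation.

```python
import json
import os
import tempfile

samples = [
    {"x": "Titanic is my favorite movie.", "y": "pos"},
    {"x": "I don't like the actor Tim Hill", "y": "neg"},
]


def write_json_lines(path, rows):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_json_lines(path):
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical file name, placed in a temporary directory for the demo.
data_path = os.path.join(tempfile.mkdtemp(), "sa_samples.json")
write_json_lines(data_path, samples)
loaded = read_json_lines(data_path)
print(len(loaded), "samples loaded from", data_path)
```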

Transformed Datasets

After transformation, here are the contents in ./test_result/:

ori_AddEntitySummary-movie_1.json
ori_AddEntitySummary-person_1.json
trans_AddEntitySummary-movie_1.json
trans_AddEntitySummary-person_1.json
...

Here, trans_AddEntitySummary-movie_1.json contains the 1 sample successfully transformed by the AddEntitySummary transformation, and ori_AddEntitySummary-movie_1.json contains the corresponding original sample. The content of ori_AddEntitySummary-movie_1.json:

{"x": "Titanic is my favorite movie.", "y": "pos", "sample_id": 0}

The content of trans_AddEntitySummary-movie_1.json:

{"x": "Titanic (A seventeen-year-old aristocrat falls in love with a kind but poor artist aboard the luxurious, ill-fated R.M.S. Titanic .) is my favorite movie.", "y": "pos", "sample_id": 0}
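
To compare each transformed sample with its source, the two files can be joined on `sample_id`. The following is a minimal sketch using only the standard library; it embeds the example contents as strings so that it runs on its own, and the helper name `pair_by_sample_id` is hypothetical rather than part of TextFlint.

```python
import json

# Contents copied from the two example files above, one JSON object per line.
ori_lines = [
    '{"x": "Titanic is my favorite movie.", "y": "pos", "sample_id": 0}',
]
trans_lines = [
    '{"x": "Titanic (A seventeen-year-old aristocrat falls in love with a kind '
    'but poor artist aboard the luxurious, ill-fated R.M.S. Titanic .) is my '
    'favorite movie.", "y": "pos", "sample_id": 0}',
]


def pair_by_sample_id(ori_rows, trans_rows):
    """Return (original, transformed) pairs that share a sample_id."""
    originals = {row["sample_id"]: row for row in ori_rows}
    return [
        (originals[row["sample_id"]], row)
        for row in trans_rows
        if row["sample_id"] in originals
    ]


pairs = pair_by_sample_id(
    [json.loads(line) for line in ori_lines],
    [json.loads(line) for line in trans_lines],
)
for ori, trans in pairs:
    # Show how much text the transformation added to each sample.
    print(ori["sample_id"], len(ori["x"]), "->", len(trans["x"]))
```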

Design

Architecture

Input layer: receives textual datasets and models as input, represented as Dataset and FlintModel respectively.

  • Dataset: a container for Sample that provides efficient and convenient operation interfaces for Sample. Dataset supports loading, verifying, and saving data in JSON or CSV format for various NLP tasks.
  • FlintModel: a target model used in an adversarial attack.

Generation layer: consists of four main parts:

  • Subpopulation: generates a subset of a Dataset.
  • Transformation: transforms each sample of a Dataset if it can be transformed.
  • AttackRecipe: attacks the FlintModel and generates a Dataset of adversarial examples.
  • Validator: verifies the quality of samples generated by Transformation and AttackRecipe.

Report layer: analyzes model testing results and provides a robustness report for users.

Transformation

To verify robustness comprehensively, TextFlint offers 20 universal transformations and 60 task-specific transformations, covering 12 NLP tasks. The following table summarizes the transformations currently supported; examples for each transformation can be found on our website.

| Task | Transformation | Description | Reference |
|---|---|---|---|
| UT (Universal Transformation) | AppendIrr | Extends sentences with irrelevant sentences. | - |
| UT | BackTrans | BackTrans (Trans is short for translation) replaces test data with paraphrases obtained through back translation, which helps reveal whether the target model merely captures literal features instead of semantic meaning. | - |
| UT | Contraction | Replaces phrases like `will not` and `he has` with their contracted forms, namely `won’t` and `he’s`. | - |
| UT | InsertAdv | Transforms an input by adding an adverb before a verb. | - |
| UT | Keyboard | Simulates how people type and changes tokens into mistaken ones with errors caused by keyboard use, like `word → worf` and `ambiguous → amviguius`. | - |
| UT | MLMSuggestion | MLMSuggestion (MLM is short for masked language model) generates new sentences in which one syntactic category element of the original sentence is replaced by what a masked language model predicts. | - |
| UT | Ocr | Simulates OCR errors with random values. | - |
| UT | Prejudice | Transforms an input by reversing gender or place names in sentences. | - |
| UT | Punctuation | Transforms an input by adding punctuation at the end of the sentence. | - |
| UT | ReverseNeg | Transforms an affirmative sentence into a negative sentence, or vice versa. | - |
| UT | SpellingError | Leverages a predefined spelling-mistake dictionary to simulate spelling mistakes. | Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs (https://arxiv.org/ftp/arxiv/papers/1812/1812.04718.pdf) |
| UT | SwapAntWordNet | Transforms an input by replacing its words with antonyms provided by WordNet. | - |
| UT | SwapNamedEnt | Swaps entities with other entities of the same category. | - |
| UT | SwapNum | Transforms an input by replacing the numbers in it. | - |
| UT | SwapSynWordEmbedding | Transforms an input by replacing its words with synonyms provided by GloVe. | - |
| UT | SwapSynWordNet | Transforms an input by replacing its words with synonyms provided by WordNet. | - |
| UT | Tense | Transforms all verb tenses in a sentence. | - |
| UT | TwitterType | Transforms an input using common Twitter-style abbreviations. | - |
| UT | Typos | Randomly inserts, deletes, swaps, or replaces a single letter within one word (Ireland → Irland). | Synthetic and Natural Noise Both Break Neural Machine Translation (https://arxiv.org/pdf/1711.02173.pdf) |
| UT | WordCase | Transforms an input to upper case, lower case, or capitalized case. | - |
| RE (Relation Extraction) | InsertClause | Inserts entity descriptions for the head and tail entities. | - |
| RE | SwapEnt-LowFreq | A sub-transformation of EntitySwap that replaces entities in the text with random low-frequency entities of the same type. | - |
| RE | SwapTriplePos-Birth | Specially designed for the birth relation; paraphrases the sentence while keeping the original birth relation between the entity pairs. | - |
| RE | SwapTriplePos-Employee | Specially designed for the employee relation; deletes the TITLE description of each employee while keeping the original employee relation between the entity pairs. | - |
| RE | SwapEnt-SamEtype | A sub-transformation of EntitySwap that replaces entities in the text with random entities of the same type. | - |
| RE | SwapTriplePos-Age | Specially designed for the age relation; paraphrases the sentence while keeping the original age relation between the entity pairs. | - |
| RE | SwapEnt-MultiType | A sub-transformation of EntitySwap that replaces entities in the text with random same-typed entities that have multiple possible types. | - |
| NER (Named Entity Recognition) | EntTypos | Swaps, deletes, or adds a random character in entities. | - |
| NER | ConcatSent | Concatenates sentences into a longer one. | - |
| NER | SwapLonger | Substitutes short entities with longer ones. | - |
| NER | CrossCategory | Swaps entities with ones that can be labeled with different labels. | - |
| NER | OOV | Swaps entities with out-of-vocabulary (OOV) entities. | - |
| POS (Part-of-Speech Tagging) | SwapMultiPOSRB | Some words hold multiple parts of speech (the phenomenon of conversion), which might confuse language models in POS tagging. This transformation replaces adverbs with words that hold multiple parts of speech. | - |
| POS | SwapPrefix | Swaps the prefix of a word while keeping its part-of-speech tag. | - |
| POS | SwapMultiPOSVB | Same idea as SwapMultiPOSRB, but replaces verbs with words that hold multiple parts of speech. | - |
| POS | SwapMultiPOSNN | Same idea as SwapMultiPOSRB, but replaces nouns with words that hold multiple parts of speech. | - |
| POS | SwapMultiPOSJJ | Same idea as SwapMultiPOSRB, but replaces adjectives with words that hold multiple parts of speech. | - |
| COREF (Coreference Resolution) | RndConcat | Randomly retrieves an irrelevant paragraph from the corpus and concatenates it after the original document. | - |
| COREF | RndDelete | Each sentence in the original document has a probability (20% by default) of being deleted, and at least one sentence is deleted; the related coreference labels are deleted as well. | - |
| COREF | RndReplace | Randomly retrieves irrelevant sentences from the corpus and replaces sentences of the original document with them (the proportion of replaced sentences to original sentences is 20% by default). | - |
| COREF | RndShuffle | Performs a number of swaps, each exchanging the order of two adjacent sentences of the original document (the number of swaps is 20% of the number of original sentences by default). | - |
| COREF | RndInsert | Randomly retrieves irrelevant sentences from the corpus and inserts them into the original document (the proportion of inserted sentences to original sentences is 20% by default). | - |
| COREF | RndRepeat | Randomly picks sentences from the original document and inserts them elsewhere in the document (the proportion of inserted sentences to original sentences is 20% by default). | - |
| ABSA (Aspect-based Sentiment Analysis) | RevTgt | Reverses the sentiment of the target aspect. | Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis (https://www.aclweb.org/anthology/2020.emnlp-main.292.pdf) |
| ABSA | RevNon | Reverses the sentiment of the non-target aspects that originally have the same sentiment as the target. | (as above) |
| ABSA | AddDiff | Adds aspects with the opposite sentiment from the target aspect. | (as above) |
| CWS (Chinese Word Segmentation) | SwapContraction | Replaces some common abbreviations in the sentence with complete words of the same meaning. | - |
| CWS | SwapNum | Replaces the numerals in the sentence with other numerals of similar size. | - |
| CWS | SwapSyn | Replaces some words in the sentence with very similar words. | - |
| CWS | SwapName | Replaces the last name or first name of a person in the sentence to produce local ambiguity that has nothing to do with the sentence. | - |
| CWS | SwapVerb | Transforms some of the verbs in the sentence into other forms in Chinese. | - |
| SM (Semantic Matching) | SwapWord | Adds a meaningless sentence to the premise without changing the semantics. | - |
| SM | SwapNum | Finds number words in sentences and replaces them with different number words. | - |
| SM | Overlap | Generates data from templates in which the hypothesis and sentence1 overlap heavily but differ in meaning. | - |
| SA (Sentiment Analysis) | SwapSpecialEnt-Person | Identifies special person names in the sentence and randomly replaces them with other entity names of the same kind. | - |
| SA | SwapSpecialEnt-Movie | Identifies special movie names in the sentence and randomly replaces them with other movie names. | - |
| SA | AddSum-Movie | Identifies special movie names in the sentence and inserts a summary of each entity after it (the summary content is from Wikipedia). | - |
| SA | AddSum-Person | Identifies special person names in the sentence and inserts a summary of each entity after it (the summary content is from Wikipedia). | - |
| SA | DoubleDenial | Finds some special words in the sentence and replaces them with double negation. | - |
| NLI (Natural Language Inference) | NumWord | Finds number words in sentences and replaces them with different number words. | Stress Test Evaluation for Natural Language Inference (https://www.aclweb.org/anthology/C18-1198/) |
| NLI | SwapAnt | Finds keywords in sentences and replaces them with their antonyms. | (as above) |
| NLI | AddSent | Adds a meaningless sentence to the premise without changing the semantics. | (as above) |
| NLI | Overlap | Generates data from templates in which the hypothesis and premise overlap heavily but differ in meaning. | Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference (https://www.aclweb.org/anthology/P19-1334/) |
| MRC (Machine Reading Comprehension) | PerturbQuestion-MLM | Paraphrases the question. | - |
| MRC | PerturbQuestion-BackTrans | Paraphrases the question. | - |
| MRC | AddSentDiverse | Generates a distractor with an altered question and a fake answer. | Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension (https://arxiv.org/pdf/2004.06076) |
| MRC | PerturbAnswer | Transforms the sentence containing the golden answer based on specific rules. | (as above) |
| MRC | ModifyPos | Rotates the sentences of the context. | - |
| DP (Dependency Parsing) | AddSubtree | Transforms the input sentence by adding a subordinate clause from WikiData. | - |
| DP | RemoveSubtree | Transforms the input sentence by removing a subordinate clause. | - |
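
To give a feel for what a character-level universal transformation does, here is a toy version of the idea behind Typos (insert, delete, swap, or replace a single letter within one word). It is an independent sketch, not TextFlint's implementation: the function names, the four-letter minimum, and the seeding scheme are all assumptions made for illustration.

```python
import random
import string


def typo(word, rng):
    """Apply one random character edit; the result may occasionally equal the input."""
    if len(word) < 2:
        return word
    op = rng.choice(["insert", "delete", "swap", "replace"])
    i = rng.randrange(len(word) - 1)
    letter = rng.choice(string.ascii_lowercase)
    if op == "insert":
        return word[:i] + letter + word[i:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "swap":
        # Exchange the characters at positions i and i + 1.
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    return word[:i] + letter + word[i + 1:]


def add_typo(sentence, seed=0):
    """Perturb one randomly chosen alphabetic word of at least four letters."""
    rng = random.Random(seed)
    tokens = sentence.split()
    candidates = [k for k, tok in enumerate(tokens) if tok.isalpha() and len(tok) >= 4]
    if not candidates:
        return sentence  # nothing to transform
    k = rng.choice(candidates)
    tokens[k] = typo(tokens[k], rng)
    return " ".join(tokens)


print(add_typo("Titanic is my favorite movie."))
```

Passing a fixed `seed` keeps the perturbation reproducible, which makes it easier to compare model predictions across runs.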

Subpopulation

Subpopulation identifies the specific part of a dataset on which the target model performs poorly. To retrieve a subset that meets the configuration, Subpopulation divides the dataset by sorting samples according to certain attributes. The following Subpopulation types are supported:

| Subpopulation | Description | Reference |
|---|---|---|
| LMSubPopulation_0%-20% | Filters samples based on the text perplexity from a language model (i.e., GPT-2); 0-20% is the lower part of the scores. | Robustness Gym: Unifying the NLP Evaluation Landscape (https://arxiv.org/pdf/2101.04840) |
| LMSubPopulation_80%-100% | Filters samples based on the text perplexity from a language model (i.e., GPT-2); 80-100% is the higher part of the scores. | (as above) |
| LengthSubPopulation_0%-20% | Filters samples based on text length; 0-20% is the lower part of the lengths. | (as above) |
| LengthSubPopulation_80%-100% | Filters samples based on text length; 80-100% is the higher part of the lengths. | (as above) |
| PhraseSubPopulation-negation | Filters samples based on a group of phrases; the remaining samples contain negation words (e.g., not, don't, aren't, no). | (as above) |
| PhraseSubPopulation-question | Filters samples based on a group of phrases; the remaining samples contain question words (e.g., what, which, how, when). | (as above) |
| PrejudiceSubpopulation-man | Filters samples based on gender bias; the chosen samples only contain words related to males (e.g., he, his, father, boy). | (as above) |
| PrejudiceSubpopulation-woman | Filters samples based on gender bias; the chosen samples only contain words related to females (e.g., she, her, mother, girl). | (as above) |
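
The "sort by an attribute, then slice" idea behind the length-based and phrase-based subpopulations can be sketched in a few lines of plain Python. This is an illustration only: the function names are hypothetical, text length is assumed to mean the whitespace token count, and the negation word list is the short one quoted in the table.

```python
def length_subpopulation(samples, low_pct, high_pct, key="x"):
    """Keep samples whose length rank falls in [low_pct, high_pct) percent."""
    ranked = sorted(samples, key=lambda s: len(s[key].split()))
    start = len(ranked) * low_pct // 100
    stop = len(ranked) * high_pct // 100
    return ranked[start:stop]


NEGATION_WORDS = {"not", "don't", "aren't", "no"}


def negation_subpopulation(samples, key="x"):
    """Keep samples that contain at least one negation word."""
    return [
        s for s in samples
        if NEGATION_WORDS & set(s[key].lower().split())
    ]


data = [
    {"x": "Great.", "y": "pos"},
    {"x": "I don't like the actor Tim Hill", "y": "neg"},
    {"x": "Titanic is my favorite movie.", "y": "pos"},
    {"x": "Not bad", "y": "pos"},
    {"x": "The plot is thin but the cast is strong and the score is lovely", "y": "pos"},
]

shortest = length_subpopulation(data, 0, 20)
longest = length_subpopulation(data, 80, 100)
negated = negation_subpopulation(data)
print(len(shortest), len(longest), len(negated))
```

Evaluating a model separately on `shortest`, `longest`, and `negated` then shows whether its accuracy depends on these attributes.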

AttackRecipe

AttackRecipe aims to find a perturbation of an input text that satisfies the attack's goal of fooling the given FlintModel. In contrast to Transformation, AttackRecipe requires the prediction scores of the target model. TextFlint provides an interface for integrating the easy-to-use adversarial attack recipes implemented in textattack. Users can refer to textattack for more information about the supported AttackRecipe.
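
The dependence on prediction scores can be illustrated with a toy greedy word-substitution search. Everything here is hypothetical: `toy_positive_score` merely stands in for a real model wrapped as FlintModel, `CANDIDATES` is a made-up substitution table, and the search is far simpler than the recipes textattack actually provides.

```python
def toy_positive_score(text):
    """Stand-in model: a score in [0, 1] computed from a tiny word list."""
    positive = {"favorite", "great", "love"}
    negative = {"boring", "bad", "hate"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = sum(w in positive for w in words) - sum(w in negative for w in words)
    return max(0.0, min(1.0, 0.5 + 0.25 * hits))


# Hypothetical meaning-preserving substitutions.
CANDIDATES = {"favorite": ["favourite", "preferred"], "movie": ["film"]}


def greedy_word_attack(text, score_fn, candidates):
    """Return the single-word substitution that lowers the score the most."""
    tokens = text.split()
    best_text, best_score = text, score_fn(text)
    for i, token in enumerate(tokens):
        core = token.strip(".,!?")
        for replacement in candidates.get(core.lower(), []):
            trial = tokens[:i] + [token.replace(core, replacement)] + tokens[i + 1:]
            trial_text = " ".join(trial)
            trial_score = score_fn(trial_text)  # the attack queries the model
            if trial_score < best_score:
                best_text, best_score = trial_text, trial_score
    return best_text, best_score


adv_text, adv_score = greedy_word_attack(
    "Titanic is my favorite movie.", toy_positive_score, CANDIDATES
)
print(adv_text, adv_score)
```

Unlike a Transformation, the search cannot run without `score_fn`, which is why an attack needs access to the target model.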

Validator

It is crucial to verify the quality of samples generated by Transformation and AttackRecipe. TextFlint provides several metrics to calculate confidence:

| Validator | Description | Reference |
|---|---|---|
| MaxWordsPerturbed | Word replacement ratio in the generated text compared with the original text, based on LCS. | - |
| LevenshteinDistance | The edit distance between the original text and the generated text. | - |
| DeCLUTREncoder | Semantic similarity calculated based on Universal Sentence Encoder. | Universal sentence encoder (https://arxiv.org/pdf/1803.11175.pdf) |
| GPT2Perplexity | Language model perplexity calculated based on the GPT2 model. | Language models are unsupervised multitask learners (http://www.persagen.com/files/misc/radford2019language.pdf) |
| TranslateScore | BLEU/METEOR/chrF score. | Bleu: a method for automatic evaluation of machine translation (https://www.aclweb.org/anthology/P02-1040.pdf); METEOR: An automatic metric for MT evaluation with improved correlation with human judgments (https://www.aclweb.org/anthology/W05-0909.pdf); chrF: character n-gram F-score for automatic MT evaluation (https://www.aclweb.org/anthology/W15-3049.pdf) |
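
The two string-based metrics in the table are standard dynamic-programming computations. The sketch below shows one way to compute an edit distance and an LCS-based word-change ratio; it is not TextFlint's code, and the exact normalization used by MaxWordsPerturbed is an assumption (here, the fraction of original words missing from the longest common subsequence).

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def words_perturbed_ratio(original, generated):
    """Fraction of original words that are not part of the word-level LCS."""
    ori, gen = original.split(), generated.split()
    if not ori:
        return 0.0
    return 1.0 - lcs_length(ori, gen) / len(ori)


original = "Titanic is my favorite movie."
generated = "Titanic is my favourite movie."
print(levenshtein(original, generated))
print(words_perturbed_ratio(original, generated))
```

A validator built on these values would typically reject a generated sample when the distance or ratio exceeds a chosen threshold.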

Report

In the Generation Layer, TextFlint can generate three types of adversarial samples and verify the robustness of the target model. Based on the results from the Generation Layer, the Report Layer aims to provide users with a standard analysis report at the lexical, syntactic, and semantic levels. For example, on the Sentiment Analysis (SA) task, the report includes a statistical chart of the performance of XLNet under different types of Transformation/Subpopulation/AttackRecipe on the IMDB dataset. The model's performance is lower than the original results on all of the transformed datasets.
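
The core comparison behind such a report is accuracy on the original samples versus accuracy on their transformed counterparts, grouped by transformation. The sketch below computes that from a handful of made-up records; the record layout and the predictions are hypothetical toy data, not TextFlint output or results for any real model.

```python
from collections import defaultdict

# Hypothetical evaluation records: gold label, prediction on the original
# sample, and prediction on the transformed sample.
records = [
    {"trans": "AddSum-Movie", "y": "pos", "ori_pred": "pos", "trans_pred": "pos"},
    {"trans": "AddSum-Movie", "y": "neg", "ori_pred": "neg", "trans_pred": "pos"},
    {"trans": "DoubleDenial", "y": "pos", "ori_pred": "pos", "trans_pred": "neg"},
    {"trans": "DoubleDenial", "y": "neg", "ori_pred": "pos", "trans_pred": "pos"},
]


def accuracy_by_transformation(rows):
    """Return {transformation: (original accuracy, transformed accuracy)}."""
    counts = defaultdict(lambda: [0, 0, 0])  # total, ori correct, trans correct
    for row in rows:
        c = counts[row["trans"]]
        c[0] += 1
        c[1] += row["ori_pred"] == row["y"]
        c[2] += row["trans_pred"] == row["y"]
    return {name: (c[1] / c[0], c[2] / c[0]) for name, c in counts.items()}


report = accuracy_by_transformation(records)
for name, (ori_acc, trans_acc) in report.items():
    print(f"{name}: {ori_acc:.2f} -> {trans_acc:.2f}")
```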

Citation

If you are using TextFlint for your work, please cite:

@article{gui2021textflint,
  title={TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing},
  author={Gui, Tao and Wang, Xiao and Zhang, Qi and Liu, Qin and Zou, Yicheng and Zhou, Xin and Zheng, Rui and Zhang, Chong and Wu, Qinzhuo and Ye, Jiacheng and others},
  journal={arXiv preprint arXiv:2103.11441},
  year={2021}
}
Comments
  • Quick start Error

    Hi, I just installed textflint using 'pip install textflint' as recommended, under a Python 3.7 environment on CentOS Linux release 7.8.2003. I then ran the quick start command 'from textflint import Engine'. The process downloaded a 764M model.zip and a 10.8M wordnet.zip, and then an error appeared:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/data/users/wuchen/anaconda3/envs/textflint/lib/python3.7/site-packages/textflint/__init__.py", line 5, in <module>
        from .input_layer import *
      File "/data/users/wuchen/anaconda3/envs/textflint/lib/python3.7/site-packages/textflint/input_layer/__init__.py", line 3, in <module>
        from .dataset import Dataset
      File "/data/users/wuchen/anaconda3/envs/textflint/lib/python3.7/site-packages/textflint/input_layer/dataset/__init__.py", line 1, in <module>
        from .dataset import Dataset, sample_map
      File "/data/users/wuchen/anaconda3/envs/textflint/lib/python3.7/site-packages/textflint/input_layer/dataset/dataset.py", line 23, in <module>
        sample_map = get_sample_map()
      File "/data/users/wuchen/anaconda3/envs/textflint/lib/python3.7/site-packages/textflint/input_layer/dataset/dataset.py", line 20, in get_sample_map
        filter_str='_sample')
      File "/data/users/wuchen/anaconda3/envs/textflint/lib/python3.7/site-packages/textflint/common/utils/load.py", line 34, in task_class_load
        for module in modules:
      File "/data/users/wuchen/anaconda3/envs/textflint/lib/python3.7/site-packages/textflint/common/utils/load.py", line 27, in module_loader
        yield importlib.import_module(module_name)
      File "/data/users/wuchen/anaconda3/envs/textflint/lib/python3.7/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
    ModuleNotFoundError: No module named 'textflint.lib'

    opened by wuchen95 10
  • NER Task Specific Transformed CoNLL 2003 and OntoNotes v5 Training Sets

    Hi,

    I would like to use the TextFlint task-specific transformed CoNLL 2003 and OntoNotes v5 Training Sets for experiments in my paper. Are these transformed training sets readily downloadable? On textflint.io I was only able to find the transformed test sets of these datasets.

    Thank you for the help,

    Aaron

    opened by agr505 5
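  • Does textflint support the Chinese Semantic Matching task?

    First of all, many thanks to the textflint open-source project and its authors. I saw the demo on the homepage, so textflint should already support the Chinese Semantic Matching task. However, I ran into some difficulties when trying to use it, possibly because I am not yet familiar enough with the textflint framework. I wrote some test code modeled on the Chinese Word Segmentation task, but I did not get the corresponding robustness-augmented data. Here is my code:

    from textflint.engine import Engine
    from textflint.adapter import auto_config
    from textflint.input.config import Config
    import os
    engine = Engine()
    # Is only English SM supported, or do Chinese and English share SM (unlike CWS, which refers specifically to Chinese word segmentation)?
    config = auto_config(task='SM')
    config.trans_methods = [
        "SwapWord",
        ]
    config.sub_methods = []
    engine.run(os.path.normcase('test.json'), config=config)

    The test data test.json follows the example on the homepage: { "sentence1": "我喜欢这本书。", "sentence2": "这本书是我喜欢的。", "y": "1" } I hope to get a reply soon. Thank you!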

    opened by guanghuixu 4
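  • Chinese NER question

    @qzhangFDU @gpengzhi @Tribleave @jiacheng-ye @Tannidy What are the differences between Chinese named entity augmentation and English named entity augmentation?

    Chinese is processed word by word. When punctuation is inserted in the middle of an entity, should the predicted labels split the entity into two parts, or does the span including the punctuation also count as an entity? Similarly, when a substituted homophone, synonym, or antonym falls in the middle of an entity, should it be predicted as a non-entity or as an entity?

    For cross-category entity swap augmentation, how should the replacement category be determined?

    Thanks!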

    good first issue 
    opened by 447428054 4
  • CrossCategory, OOV, SwapLonger not working for NER Task

    Hi,

    I was able to use TextFlint to augment the CoNLL2003 training data with the EntTypos and the ConcatSent transformations for the NER task. Thank you very much for providing this. It is presenting an error however when trying to use the other task-specific transformations: CrossCategory, OOV, SwapLonger on the NER Task showing this for each of those transformations:

    ValueError: Method CrossCategory is not allowed in task NER

    Are these transformations supported?

    Best Regards,

    Aaron

    opened by agr505 3
  • Unable to download detachable_word file

    https://textflint.oss-cn-beijing.aliyuncs.com/download/CWS_DATA/detachable_word seems broken, which leads to CWS SwapVerb does not work properly. Could you help me check it? Thank you!

    opened by JackGao09 3
  • problems of "pip install textflint"

    I have successfully installed textflint. But when I import textflint, I get the error: ModuleNotFoundError: No module named 'textflint.lib'. Could you help me solve this problem? Thank you.

    opened by random2719 3
  • Run Quick start Error

    Traceback (most recent call last):
      File "d:/pyscript/test_text.py", line 1, in <module>
        from TextFlint import Engine
      File "D:\pyscript\work\lib\site-packages\textflint-0.0.1-py3.6.egg\TextFlint\__init__.py", line 7, in <module>
        from .generation_layer.generator import Generator
      File "D:\pyscript\work\lib\site-packages\textflint-0.0.1-py3.6.egg\TextFlint\generation_layer\generator\__init__.py", line 1, in <module>
        from .generator import Generator
      File "D:\pyscript\work\lib\site-packages\textflint-0.0.1-py3.6.egg\TextFlint\generation_layer\generator\generator.py", line 12, in <module>
        from ...input_layer.dataset import Dataset
      File "D:\pyscript\work\lib\site-packages\textflint-0.0.1-py3.6.egg\TextFlint\input_layer\dataset\__init__.py", line 1, in <module>
        from .dataset import Dataset, sample_map
      File "D:\pyscript\work\lib\site-packages\textflint-0.0.1-py3.6.egg\TextFlint\input_layer\dataset\dataset.py", line 23, in <module>
        sample_map = get_sample_map()
      File "D:\pyscript\work\lib\site-packages\textflint-0.0.1-py3.6.egg\TextFlint\input_layer\dataset\dataset.py", line 20, in get_sample_map
        filter_str='_sample')
      File "D:\pyscript\work\lib\site-packages\textflint-0.0.1-py3.6.egg\TextFlint\common\utils\load.py", line 34, in task_class_load
        for module in modules:
      File "D:\pyscript\work\lib\site-packages\textflint-0.0.1-py3.6.egg\TextFlint\common\utils\load.py", line 27, in module_loader
        yield importlib.import_module(module_name)
      File "c:\users\huisuan016\appdata\local\programs\python\python36\lib\importlib\__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
    ModuleNotFoundError: No module named 'TextFlint-0'

    opened by ValarMorghulis2018 3
  • Click dependency

    Hello,

    I am wondering why textflint has a pinned dependency on click (click==7.1.1 in the requirements.txt), as from my understanding click is not even used in the project, the cli script seems to use argparse instead. Would it be possible to get rid of that dependency?

    opened by helpmefindaname 2
  • Is anyone still maintaining this project?

    There are so many typo in code misleading users. For example: In 4_Sample_Dataset.ipynb, the first line is "from TextFlint.input_layer.component.sample.sa_sample import SASample", which should be "from textflint.input.component.sample.sa_sample import SASample". Try to be a user-friendly project, at least for the tutorial material.

    documentation 
    opened by EdwardMao 2
  • Error about installing textflint with setup.py

    I installed textflint in my project using setup.py. However, an error occurred when I tried to import the package. Below is the screenshot:

    [Screenshot: Untitled]

    I have checked the variable module_name in importlib.import_module(module_name), which is textflint-0.0.3-py3.8.egg.textflint.input_layer.component.sample.absa_sample

    Any suggestions to solve this issue?

    opened by JackGao09 2
  • POS Swap failing with VB a lot

    I am running the pos swap with the parameter 'VB' and I am just concerned with how much it is failing. Of 1000 samples, only 82 were successful. Here are some examples of failures:

    Failed to get result for ['two', 'child', 'playing', 'in', 'the', 'house'] with transform SwapMultiPOS-VB
    Failed to get result for ['a', 'goat', 'attacks', 'a', 'man', 'and', 'the', 'man', 'fights', 'back'] with transform SwapMultiPOS-VB
    Failed to get result for ['a', 'man', 'shows', 'how', 'a', 'video', 'game', 'works'] with transform SwapMultiPOS-VB
    Failed to get result for ['someone', 'is', 'frying', 'food'] with transform SwapMultiPOS-VB
    Failed to get result for ['a', 'woman', 'is', 'ripping', 'off', 'a', 'man', 'clothes'] with transform SwapMultiPOS-VB
    Failed to get result for ['a', 'diver', 'goes', 'underwater'] with transform SwapMultiPOS-VB
    Failed to get result for ['a', 'young', 'boy', 'rocks', 'out', 'on', 'a', 'guitar'] with transform SwapMultiPOS-VB
    Failed to get result for ['someone', 'is', 'drawing', 'pictures'] with transform SwapMultiPOS-VB

    There are clearly verbs present, so maybe not strict verbs but maybe also including pos like VBG???

    opened by Maddy12 0
  • Negate where add at end

    There was an internal error reported that you wanted me to create an issue for.

    This is for transforming ReverseNeg in UT.

    The input tokens were ['guy', 'explaining', 'what', 'stiff', 'person', 'syndrome', 'is']. It is trying to add next to the root id, which in this case is 'is'. However, there is an index out of range error because 'is' is at the end of the phrase.

    This occurs here: https://github.com/textflint/textflint/blob/master/textflint/generation/transformation/UT/reverse_neg.py#L150

    opened by Maddy12 0
  • About OCR generation

    Hi, guys! I am trying to reuse the OCR transformation module in TextFlint, but I somehow find it rather trivial... I quote the code about the OCR rules in the source code as below:

    mapping = {
                '0': ['8', '9', 'o', 'O', 'D'],
                '1': ['4', '7', 'l', 'I'],
                '2': ['z', 'Z'],
                '5': ['8'],
                '6': ['b'],
                '8': ['s', 'S', '@', '&'],
                '9': ['g'],
                'o': ['u'],
                'r': ['k'],
                'C': ['G'],
                'O': ['D', 'U'],
                'E': ['B']
            }
    

    Here, the rules do not even cover the alphabet... And there are for sure more rules, e.g., "w" => "vv", "m" => "rn". I have found a dataset here (https://github.com/jie-mei/MiBio-OCR-dataset), which contains some OCR errors retrieved from the real world. Although I find it quite annoying to parse the files in the aforementioned dataset, I believe that it may be beneficial to this work!

    good first issue TODO 
    opened by LC-John 2
Releases(v0.1.0)
  • v0.1.0(Mar 15, 2022)

    New Features

    Add support for 6 Chinese NLP tasks

    This update adds preprocessing and transformations for 6 Chinese NLP tasks, including Machine Reading Comprehension, Semantic Matching, Named Entity Recognition, and Sentiment Analysis.

    It provides 15 universal transformations and 12 specific transformations.

    Add support for 3 English NLP tasks

    Now supports transformations for Neural Machine Translation between English and German.

    Now supports transformations for Word Sense Disambiguation.

    Now supports transformations for the Winograd Schema Challenge.

    Fix

    Update requirements.

    Update tutorial docs to synchronize with toolset version.

  • v0.0.5(Aug 30, 2021)

    Performance

    1> Update README: provide more tutorial docs and related links

    Fix

    1> Fix a bug where the POS tagging component was not initialized

    2> Fix the CSV loading bug for NER samples

    3> Fix several bugs in FlintmodelNER and update its tutorial

  • v0.0.4(Jul 9, 2021)

    Performance

    1> Streamline installation by removing textattack from the requirements, because textattack depends on many packages that may cause installation to fail. To use adversarial attacks, install textattack manually.

    2> Speed up the loading process of textflint from 1 minute to 3 seconds.

  • v0.0.3(Apr 21, 2021)

    Features

    1. Add command-line support
    2. Reconstruct Engine interfaces

    Fix

    1. Incomplete bar chart display
    2. UT sample is_legal bug
    3. Specify importlib-metadata lib version
    4. Flintmodel load bug
  • v0.0.2(Apr 9, 2021)

    Input layer: receives textual datasets and models as input, represented as Dataset and FlintModel respectively.

    • Dataset: a container for Sample that provides efficient and convenient operation interfaces. Dataset supports loading, verifying, and saving data in JSON or CSV format for various NLP tasks.
    • FlintModel: a target model used in an adversarial attack.

    Generation layer: consists of four main parts:

    • Subpopulation: generates a subset of a Dataset.
    • Transformation: transforms each sample of a Dataset if it can be transformed.
    • AttackRecipe: attacks the FlintModel and generates a Dataset of adversarial examples.
    • Validator: verifies the quality of samples generated by Transformation and AttackRecipe.

    Report layer: analyzes model testing results and provides a robustness report for users.
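To make the division of labor in the generation layer concrete, here is a toy sketch in plain Python. The function names and the dict-based sample format are assumptions chosen for illustration; they are not TextFlint's classes or signatures. AttackRecipe is left out because it needs a target model.

```python
# Toy stand-ins for the generation-layer roles; all names are hypothetical.

def subpopulation(dataset, predicate):
    """Subpopulation: keep the subset of samples that satisfy predicate."""
    return [s for s in dataset if predicate(s)]


def transformation(sample):
    """Transformation: return a changed sample, or None if not applicable."""
    if "good" not in sample["x"]:
        return None
    return {**sample, "x": sample["x"].replace("good", "great")}


def validator(original, transformed):
    """Validator: accept only results that keep the original token count."""
    return len(original["x"].split()) == len(transformed["x"].split())


dataset = [
    {"x": "a good movie", "y": "pos"},
    {"x": "boring", "y": "neg"},
    {"x": "good acting but a weak and overlong plot", "y": "neg"},
]

# Select short samples, transform those that can be transformed,
# and keep only the results that pass validation.
short = subpopulation(dataset, lambda s: len(s["x"].split()) <= 3)
generated = []
for sample in short:
    result = transformation(sample)
    if result is not None and validator(sample, result):
        generated.append(result)
print(generated)
```

The second sample is skipped rather than failing, which mirrors the "if it can be transformed" condition in the description above.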

Owner
TextFlint
Text Robustness Evaluation Toolkit