An easy to use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving up state-of-the-art NLP models.

Last update: Jan 03, 2023

Overview

Welcome to AdaptNLP

A high level framework and library for running, training, and deploying state-of-the-art Natural Language Processing (NLP) models for end to end tasks.

What is AdaptNLP?

AdaptNLP is a python package that allows users ranging from beginner python coders to experienced Machine Learning Engineers to leverage state-of-the-art Natural Language Processing (NLP) models and training techniques in one easy-to-use python package.

Utilizing fastai with HuggingFace's Transformers library and Humboldt University of Berlin's Flair library, AdaptNLP provides Machine Learning Researchers and Scientists a modular and adaptive approach to a variety of NLP tasks simplifying what it takes to train, perform inference, and deploy NLP-based models and microservices.

What is the Benefit of AdaptNLP Rather Than Just Using Transformers?

Although transformers offers quick inference through its pipeline API, that API is not always as flexible or as fast as it could be. AdaptNLP's Easy* inference modules tend to be slightly faster than the pipeline interface (and at a bare minimum match its speed), and they return simple, intuitive results without any unneeded extras.

In addition, because AdaptNLP integrates the fastai library, the code needed to train or run inference on your models has a completely modular API through the fastai Callback system. Rather than writing an entire torch training loop, you can write a Callback, often in fewer than 10 lines of code, to add any special behavior a model needs.
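To make the callback idea concrete, here is a minimal, library-free sketch of a callback-driven loop. It is not fastai's actual Callback API: the class names, hook names, and the `fit` function are all hypothetical and only illustrate how behavior can be added without touching the loop itself.

```python
class Callback:
    """Base class: every hook is a no-op unless a subclass overrides it."""
    def before_fit(self, state): pass
    def after_batch(self, state): pass
    def after_fit(self, state): pass


class LossRecorder(Callback):
    """Hypothetical callback that records the loss after every batch."""
    def before_fit(self, state):
        state["losses"] = []

    def after_batch(self, state):
        state["losses"].append(state["loss"])


def fit(batches, step_fn, callbacks):
    """Toy loop: the loop never changes, extra behavior comes from callbacks."""
    state = {}
    for cb in callbacks:
        cb.before_fit(state)
    for batch in batches:
        state["loss"] = step_fn(batch)  # stand-in for forward/backward/step
        for cb in callbacks:
            cb.after_batch(state)
    for cb in callbacks:
        cb.after_fit(state)
    return state


# A fake "training step" that uses the batch mean as the loss.
state = fit(
    [[1.0, 3.0], [2.0, 2.0], [0.0, 1.0]],
    step_fn=lambda batch: sum(batch) / len(batch),
    callbacks=[LossRecorder()],
)
print(state["losses"])
```

The recorder above is under ten lines, which is the scale of customization the paragraph describes.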

Finally, when training your model, fastai is at the forefront of bringing in best practices for achieving state-of-the-art training, with new research methodologies heavily tested before integration. As such, AdaptNLP fully supports training with the One-Cycle policy and with newer optimizer combinations, such as the Ranger optimizer with Cosine Annealing, through simple one-line fitting functions (fit_one_cycle and fit_flat_cos).
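As a rough illustration of what these two schedules do to the learning rate, the sketch below computes both curves in plain Python. It is an approximation of the schedule shapes, not fastai's code: the function names are hypothetical, and the default values (`div`, `div_final`, `pct_start`) are assumptions chosen to resemble fastai's defaults.

```python
import math


def cos_anneal(start, end, pos):
    """Cosine interpolation from `start` (pos=0) to `end` (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2


def one_cycle_lr(pos, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Warm up from lr_max/div to lr_max, then anneal to lr_max/div_final.

    `pos` is the fraction of training completed, in [0, 1].
    """
    if pos < pct_start:
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    return cos_anneal(lr_max, lr_max / div_final,
                      (pos - pct_start) / (1 - pct_start))


def flat_cos_lr(pos, lr, div_final=1e5, pct_start=0.75):
    """Hold the learning rate flat, then cosine-anneal to lr/div_final."""
    if pos < pct_start:
        return lr
    return cos_anneal(lr, lr / div_final, (pos - pct_start) / (1 - pct_start))


# Print a few points along each schedule for a peak learning rate of 1e-3.
for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(pos, one_cycle_lr(pos, 1e-3), flat_cos_lr(pos, 1e-3))
```

One-cycle spends the first part of training warming up and the rest annealing, while flat-cos keeps the rate constant for most of training and anneals only at the end.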

Installation Directions

PyPi

To install with pypi, please use:

pip install adaptnlp

Or if you have pip3:

pip3 install adaptnlp

Conda (Coming Soon)

Developmental Builds

To install any developmental style builds, please follow the below directions to install directly from git:

Stable Master Branch

The master branch generally is not updated much except for hotfixes and new releases. To install please use:

pip install git+https://github.com/Novetta/adaptnlp

Developmental Branch

Note: This branch can become unstable, and it is only recommended for contributors or those who really want to test out new technology. Please check that the latest tests are passing (a green checkmark on the commit message) before trying this branch out.

You can install the developmental builds with:

pip install git+https://github.com/Novetta/adaptnlp@dev

Docker Images

There are actively updated Docker images hosted on Novetta's DockerHub

The guide to each tag is as follows:

latest: This is the latest pypi release and installs a complete package that is CUDA capable
dev: These are occasionally built developmental builds at certain stages. They are built by the dev branch and are generally stable
*api: The API builds are for the REST-API

To pull and run any AdaptNLP image immediately you can run:

docker run -itp 8888:8888 novetta/adaptnlp:TAG

Replace TAG with any of the aforementioned tags.

Afterwards check localhost:8888 or localhost:8888/lab to access the notebook containers.

Navigating the Documentation

The AdaptNLP library is built with nbdev, so any documentation page you find (including this one!) can be directly run as a Jupyter Notebook. Each page at the top includes an "Open in Colab" button as well that will open the notebook in Google Colaboratory to allow for immediate access to the code.

The documentation is split into six sections, each with a specific purpose:

Getting Started

This group contains quick access to the homepage, an explanation of what the AdaptNLP Cookbooks are, and how to contribute.

Models and Model Hubs

These contain any relevant documentation for the AdaptiveModel class, the HuggingFace Hub model search integration, and the Result class that various inference APIs return.

Class API

This section contains the module documentation for the inference framework, the tuning framework, as well as the utilities and foundations for the AdaptNLP library.

Inference and Training Cookbooks

These two sections provide quick access to single-use recipes for starting any AdaptNLP project for a particular task, with easy-to-use code designed for that specific use case. There are currently over 13 different tutorials available, with more coming soon.

NLP Services with FastAPI

This section provides directions on how to use the AdaptNLP REST API for deploying your models quickly with FastAPI

Contributing

There is a contribution guide available here

Testing

AdaptNLP is run on the nbdev framework. To run all tests please do the following:

pip install nbverbose
git clone https://github.com/Novetta/adaptnlp
cd adaptnlp
pip install -e .
nbdev_test_nbs

This will run every notebook and ensure that all tests have passed. Please see the nbdev documentation for more information about it.

Contact

Please contact Zachary Mueller at [email protected] with questions or comments regarding AdaptNLP.

License

This project is licensed under the terms of the Apache 2.0 license.

Comments

multi-label classification / paperswithcode dataset
Hi guys,

Hope you are all well !

I was wondering if adaptnlp can handle multi-label classification with 1560 labels.

More precisely, I would like to apply it to paperswithcode dataset where labels are called tasks.

Refs:

Repository

Dataset

Thanks for any insights or inputs on that.

Cheers, X
opened by ghost 7
cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'
Describe the bug: The demo Colab notebook "Custom Fine-Tuning and Training with Transformer Models" does not work and fails with the import error named in the title.

bug
opened by lematmat 5
Significant slowdown in EasyTokenTagger release 0.2.0

I'm experiencing a slowdown in NER performance using EasyTokenTagger and 'ner-ontonotes' after updating to release 0.2.0. Have there been any underlying changes to how the tagger object works?

Specifically, I am dealing with a very large chunk of text. Prior to this release, the NER tagging took around 15 seconds for this particular text. Now, it's taking 15+ minutes the first time but subsequent calls on that text are very quick. Is there some sort of caching or indexing that's being done now? I'd imagine this could create a lot of overhead for large chunks of text.

opened by mkongsiri-Novetta 5
Can't load big dataset

Describe the bug: It happens when I run learning_rate = finetuner.find_learning_rate(**learning_rate_finder_configs) in the tutorial. I have a big dataset with 200k rows, each containing a text of around 200 words.

In your code, when you instantiate the TextDataset, the line tokenized_text = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text)) takes an eternity for a text of 20 million words. Do you think this could be done in a better or faster way, for example by keeping the rows as they are?

For the record:

Time for 100 characters: 0.0003399848937988281s
Time for 1000 characters: 0.00124359130859375s
Time for 10 000 characters: 0.012135982513427734s
Time for 100 000 characters: 0.2131056785583496s
Time for 1 000 000 characters: 8.782422542572021s
Time for 10 000 000 characters: 734.5397665500641s

Can't reach the end of the full TextDataset (109 610 928 characters).

To Reproduce Tutorial with a big dataset
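The timings in this report grow much faster than the input (going from 1 000 000 to 10 000 000 characters takes roughly 84 times longer), which suggests that tokenizing bounded-size pieces could help. The sketch below shows the row-by-row idea the reporter asks about. It is not AdaptNLP's implementation: `tokenize_rows` is a hypothetical helper, and `str.split` stands in for a real tokenizer.

```python
def tokenize_rows(rows, tokenize):
    """Tokenize each row separately and yield the tokens in order.

    Each call to `tokenize` sees one short row instead of one giant string,
    so its cost stays bounded per call.
    """
    for row in rows:
        yield from tokenize(row)


rows = ["the first row of text", "a second row", "and a third"]

# Row-by-row tokenization with a whitespace tokenizer as a stand-in.
per_row_tokens = list(tokenize_rows(rows, str.split))

# Tokenizing the concatenated text in one call, for comparison.
whole_text_tokens = " ".join(rows).split()

print(per_row_tokens == whole_text_tokens)
```

With a subword tokenizer the two approaches may differ slightly at row boundaries, so this is only a starting point for experimentation.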

opened by NicoLivesey 5
AttributeError: 'CamembertForMaskedLM' object has no attribute 'cls'
Describe the bug: Trying to freeze an LMFineTuner based on Camembert weights raises:

AttributeError Traceback (most recent call last) in
      6 }
      7 finetuner = LMFineTuner(**ft_configs)
----> 8 finetuner.freeze()

~/anaconda3/envs/pe_adaptnlp/lib/python3.8/site-packages/adaptnlp/transformers/finetuning.py in freeze(self)
   1630 """Freeze last classification layer group only
   1631 """
-> 1632 layers_len = len(list(self.model.cls.parameters()))
   1633 self.freeze_to(-layers_len)
   1634

~/anaconda3/envs/pe_adaptnlp/lib/python3.8/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    573 if name in modules:
    574 return modules[name]
--> 575 raise AttributeError("'{}' object has no attribute '{}'".format(
    576 type(self).__name__, name))
    577

AttributeError: 'CamembertForMaskedLM' object has no attribute 'cls'

To Reproduce

from adaptnlp import LMFineTuner

train_file = "path/to/train"
valid_file = "path/to/valid"
ft_configs = {
    "train_data_file": train_file,
    "eval_data_file": valid_file,
    "model_type": "camembert",
    "model_name_or_path": "camembert-base",
}
finetuner = LMFineTuner(**ft_configs)
finetuner.freeze()

Expected behavior No error
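The traceback shows freeze() assuming the model exposes its head as `cls`. To my understanding, BERT-style masked-LM classes in transformers use that attribute name, while RoBERTa-style classes (which CamemBERT follows) use `lm_head`. The sketch below shows one defensive way to look the head up. It is not the fix the maintainers applied: the stand-in classes and the `find_head` helper are hypothetical.

```python
class FakeBertStyleModel:
    """Stand-in for a model that stores its head under `cls`."""
    cls = ["head-weight", "head-bias"]


class FakeRobertaStyleModel:
    """Stand-in for a model that stores its head under `lm_head`."""
    lm_head = ["head-weight", "head-bias", "decoder-weight"]


def find_head(model, candidates=("cls", "lm_head")):
    """Return (attribute_name, head) for the first candidate that exists."""
    for name in candidates:
        head = getattr(model, name, None)
        if head is not None:
            return name, head
    raise AttributeError(
        f"{type(model).__name__} has none of the head attributes {candidates}"
    )


for model in (FakeBertStyleModel(), FakeRobertaStyleModel()):
    name, head = find_head(model)
    print(type(model).__name__, name, len(head))
```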

Desktop (please complete the following information):

OS: Amazon Linux

Browser Chrome
opened by NicoLivesey 4

AttributeError: 'EasyDocumentEmbeddings' object has no attribute 'rnn_embeddings'

Cannot use pool option to generate embeddings (instead of the default rnn).

A snippet for the problem:

embedding_type='albert-xxlarge-v2'
embedding_methods=["pool"]
doc_embeddings = EasyDocumentEmbeddings(embedding_type, methods = embedding_methods)

This is the error I get:

  File "env/lib/python3.7/site-packages/adaptnlp/training.py", line 91, in __init__
   self._initial_setup(self.label_dict, **kwargs)
 File "env/lib/python3.7/site-packages/adaptnlp/training.py", line 97, in _initial_setup
   document_embeddings: DocumentRNNEmbeddings = self.encoder.rnn_embeddings
AttributeError: 'EasyDocumentEmbeddings' object has no attribute 'rnn_embeddings'

Expected behavior would be to successfully obtain an easy document embeddings object with no errors

Running on debian buster, python3.7

If someone could give me a fix or a workaround or if I'm using this incorrectly, then please let me know

opened by blerstpub 3

EasySequenceClassifier tag_text function returns None for FlairSequenceClassifier model
Hi! I tried to follow the tutorial for training a custom sequence classifier: https://novetta.github.io/adaptnlp/tutorial/training-sequence-classification.html The last step returns an empty result (None) where labeled sentences were expected:

sentences = classifier.tag_text(example_text, model_name_or_path=OUTPUT_DIR)

To reproduce the behavior:

from adaptnlp import EasySequenceClassifier
from flair.data import Sentence

OUTPUT_DIR = "…/best-model.pt" # my custom model
classifier = EasySequenceClassifier()
ex_text = "This is a good text example"
example_text = [Sentence(ex_text)]
sentences = classifier.tag_text(text=example_text, model_name_or_path=OUTPUT_DIR, mini_batch_size=1)
print("Label output:\n")
print(sentences)

Returns

2020-12-28 17:44:31,111 loading file .../best-model.pt
Label output:

None

Surprisingly, the labels were added to example_text:

print(example_text)

Returns

[Sentence: " This is a good text example " [− Tokens: 17 − Sentence-Labels: {'label': [0 (0.8812)]}]]

Proposed explanation/ contribution: I think I know the reason for unexpected behavior and will be happy to help. classifier.tag_text creates FlairSequenceClassifier classifier. FlairSequenceClassifier initiates flair.models.TextClassifier classifier and uses TextClassifier predict method within its own predict method. But flair.models.TextClassifier predict method returns None because the labels are directly added to the sentences. I can re-write FlairSequenceClassifier predict method to return Sentences with labels instead of None.
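The proposed change can be sketched without flair. In the sketch below, `InPlaceClassifier` is a hypothetical stand-in that mimics the behavior described above (labels are attached to the inputs and None is returned), and `predict_and_return` is a hypothetical wrapper, not the code that was eventually merged.

```python
class InPlaceClassifier:
    """Stand-in for a model whose predict() labels inputs in place."""

    def predict(self, sentences):
        for sentence in sentences:
            sentence["labels"] = ["0"]  # mutate the input, as described above
        return None


def predict_and_return(classifier, sentences):
    """Call predict() and hand back the labeled sentences.

    If the underlying predict() returns None, the inputs were labeled in
    place, so return them; otherwise pass the return value through.
    """
    result = classifier.predict(sentences)
    return sentences if result is None else result


example = [{"text": "This is a good text example"}]
labeled = predict_and_return(InPlaceClassifier(), example)
print(labeled)
```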
opened by DinaraN 3
Sequence classification using REST API fails with models except en-sentiment

Sequence classification over REST API using any model except for en-sentiment fails with:

File "/usr/local/lib/python3.6/dist-packages/starlette/routing.py", line 41, in app
    response = await func(request)
File "/usr/local/lib/python3.6/dist-packages/fastapi/routing.py", line 197, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
File "/usr/local/lib/python3.6/dist-packages/fastapi/routing.py", line 147, in run_endpoint_function
    return await dependant.call(**values)
File "./app/main.py", line 87, in sequence_classifier
    text=text, mini_batch_size=1, model_name_or_path=_SEQUENCE_CLASSIFICATION_MODEL
File "/adaptnlp/adaptnlp/sequence_classification.py", line 285, in tag_text
    return classifier.predict(text=text, mini_batch_size=mini_batch_size, **kwargs,)
File "/adaptnlp/adaptnlp/sequence_classification.py", line 140, in predict
    text_sent.add_label(label)
TypeError: add_label() missing 1 required positional argument: 'value'

Reproducible with:

docker run -itp 5000:5000 -e TOKEN_TAGGING_MODE='ner' \
    -e TOKEN_TAGGING_MODEL='ner-ontonotes-fast' \
    -e SEQUENCE_CLASSIFICATION_MODEL='nlptown/bert-base-multilingual-uncased-sentiment' \
    achangnovetta/adaptnlp-rest:latest \
    bash

opened by VogtAI 3
AdaptNLP v0.2.x Additional Features Discussion
There are a lot of ideas that may be floating for feature implementations, so this thread just provides a mini roadmap and environment to think about adaptnlp's progression.

Ideas can be stated freely in this thread and do not replace feature-request issue posts.

[x] Tokenizer: Start integrating tokenizers all across adaptnlp for speed and performance enhancements for training and inference.

[x] Summarization: Add the NLP task of summarization using a document-level encoder based on transformer language models.

[x] GPU: Multi-GPU and mixed-precision are prevalent in AdaptNLP, but their implementation can be improved and debugged.

~~FastAPI Batch-Serving: Improve on the concurrent calls with batch processing from the NLP models (maybe try to make it CPU and GPU agnostic for ease-of-use)~~

~~Model Downloading: Start structuring a way to download and potentially upload pre-trained NLP-task models~~

enhancement
opened by aychang95 3
Data API
We probably should have a data API of some form, that ties into https://github.com/Novetta/adaptnlp/issues/128

Ideally it should simply prep a dataset for tokenization of a model, or tokenize the data itself.

For now we cover two inputs:

Individual texts

CSV

We should support something akin to fastai's get_y, but with decent defaults so that customization is available, but not needed.

Ideally something like:

dset = TaskDataset.from_df(
    df, # Can be fname or dataframe
    get_x = ColReader('text'),
    get_y = ColReader('label'),
    splitter = RandomSplitter(),
    model = 'bert-base-uncased', # The name/type of downstream model
    task = "ner" # Or use a `Task.NER` namespace class
)

And further:

dset.dataloaders(bs=8, collate_fn=data_collator)

It reads very much like the fastai API, but we do not use the fastai API itself, because for text this approach is a bit easier.

The highest level API would look like so:

dls = TaskDataLoaders.from_df(df, 'text', 'label', model='bert-base-uncased')

We should note the model used, and when integrating it with the tuning API if something is off with the model entered, we make note of that
enhancement
opened by muellerzr 2

ImportError: cannot import name 'EasyTokenTagger'

Describe the bug: I tried to run the code in the tutorial

from adaptnlp import EasyTokenTagger


## Example Text
example_text = "Novetta's headquarters is located in Mclean, Virginia."

## Load the token tagger module and tag text with the NER model 
tagger = EasyTokenTagger()
sentences = tagger.tag_text(text=example_text, model_name_or_path="ner")

## Output tagged token span results in Flair's Sentence object model
for sentence in sentences:
    for entity in sentence.get_spans("ner"):
        print(entity)

and it gave me the error:

...
  File "/home/rajiv/Documents/dev/python/nltk-trial/adaptnlp.py", line 2, in <module>
    from adaptnlp import EasyTokenTagger
ImportError: cannot import name 'EasyTokenTagger'

Desktop (please complete the following information):

OS: Ubuntu
Version: 20.04
Python: 3.6.9
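One possible cause is visible in the traceback: the failing script is itself named adaptnlp.py. Python places the script's directory first on the import path, so `from adaptnlp import EasyTokenTagger` can resolve to the script rather than the installed package. The check below is a small sketch of how to detect that situation; `shadows_package` is a hypothetical helper, and renaming the script would be the usual remedy if this is indeed the cause.

```python
from pathlib import Path


def shadows_package(script_path, package_name):
    """Return True if a script's file name collides with a package name."""
    path = Path(script_path)
    return path.suffix == ".py" and path.stem == package_name


# The script path from the traceback above.
script = "/home/rajiv/Documents/dev/python/nltk-trial/adaptnlp.py"
print(shadows_package(script, "adaptnlp"))
```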

opened by RAbraham 2

classifier.tag_text on GPU!

Hi, I want to classify texts:

classifier = EasySequenceClassifier()
hub = HFModelHub()
hub.search_model_by_task('text-classification')
model = hub.search_model_by_name('nlptown/bert-base', user_uploaded=True)[0]
sentence = classifier.tag_text(text=inputs, model_name_or_path=model, mini_batch_size=1)

Q1: How can I force it to run on the CPU?

Q2: I now have a GPU but cannot get it to run. My errors:

...
FileNotFoundError: [Errno 2] No such file or directory: 'nlptown/bert-base-multilingual-uncased-sentiment'
During handling of the above exception, another exception occurred:
...
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
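For the first question, one general approach (a CUDA and PyTorch convention, not an AdaptNLP option) is to hide all GPUs from the process by setting the CUDA_VISIBLE_DEVICES environment variable to an empty string before torch is imported. The helper name below is hypothetical.

```python
import os


def force_cpu(env=os.environ):
    """Hide all CUDA devices from libraries that are imported afterwards.

    Call this before importing torch (or anything that imports torch),
    otherwise CUDA may already have been initialized.
    """
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


force_cpu()
print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```

The same effect can be had from the shell by prefixing the command with CUDA_VISIBLE_DEVICES="".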

opened by topliftarm 0
Unified Training API
Training API will use fastai under the hood, and we'll make a few functions to build general datasets.

Tasks and sample datasets to use:

[x] Language Models:

IMDB_SAMPLE text (fastai)

[x] NER/Token Classification

Annotated Corpus for Named Entity Recognition

[ ] Question/Answering

SQuAD dataset (HuggingFace Datasets)

[x] Sequence Classification

IMDB_SAMPLE classification (fastai)

[ ] Summarization

Amazon Fine Food Reviews

[ ] Translation

English to French

Other Information

Task APIs should have a simple user interface, i.e. the high-level API accepts only specific options, while the mid-level API has access to the full fastai Learner params.

Example mid-level API I'm thinking about:

dls = some_build_data_thing()
tuner = QAFineTuner(dls, 'bert-base-cased')
tuner.tune(
    scheduler = 'fit_flat_cos',
    n_epochs = 3,
    lr = None,
    suggest_method = 'valley', # Triggers if lr is None
    additional_callbacks = []
)

And its high-level:

tuner = QAFineTuner.from_csv(
    question_column_name = "question",
    answer_column_name = "answer",
    model = "bert-base-cased"
)
tuner.tune(...)

We should automatically pull in proper metrics for each task, but users have the option to bring in their own as well and pass it to QAFineTuner (good defaults)

Tuners should also have a func like QAFineTuner.from_csv() to build the dataset in-house
enhancement
opened by muellerzr 2

Save context in QuestionAnswering and re-use it

I noticed that when we run any code snippet, it converts the text to vectors or something similar. For example, in this code snippet:

from adaptnlp import EasyQuestionAnswering 
from pprint import pprint

## Example Query and Context 
query = "What is the meaning of life?"
context = "Machine Learning is the meaning of life."
top_n = 5

## Load the QA module and run inference on results 
qa = EasyQuestionAnswering()
best_answer, best_n_answers = qa.predict_qa(query=query, context=context, n_best_size=top_n, mini_batch_size=1, model_name_or_path="distilbert-base-uncased-distilled-squad")

## Output top answer as well as top 5 answers
print(best_answer)
pprint(best_n_answers)

It converts both the query and the context to vectors first. If we have a very long context and many queries, the context is converted to vectors again for every query. I think there should be a way to save the context vectors and reuse them instead of recreating them each time.
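The reuse idea can be sketched with standard-library memoization. Everything below is hypothetical: `encode_context` is a toy stand-in for an expensive preprocessing step, and `answer` is a toy scorer. Note also that extractive QA models of the kind used here typically encode the question and context together as one sequence, so in practice only the context's preprocessing (such as tokenization) may be cacheable rather than its final vectors.

```python
from functools import lru_cache

calls = {"n": 0}  # counts how often the expensive step actually runs


@lru_cache(maxsize=32)
def encode_context(context):
    """Toy stand-in for expensive context preprocessing."""
    calls["n"] += 1
    return tuple(word.strip(".").lower() for word in context.split())


def answer(query, context):
    """Toy scorer: context words that also appear in the query."""
    tokens = encode_context(context)  # cached after the first call
    query_words = {word.strip("?").lower() for word in query.split()}
    return [token for token in tokens if token in query_words]


context = "Machine Learning is the meaning of life."
for query in ("What is the meaning of life?", "What is Machine Learning?"):
    print(answer(query, context))

print(calls["n"])
```

Because the context string is hashable, the second query reuses the cached result and the preprocessing runs only once.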

enhancement

opened by talhaanwarch 1

Stretch Goals
[x] HuggingFace raw embeddings over Flair

[x] Try and integrate Callbacks for text generation and other classes that aren't using it

Note: Didn't do this for text generation, as it is more complex than it's worth

[x] Use fastrelease (with conda)

[x] Improve test coverage

[x] GH CI for testing Mac, Windows, and Linux, similar to how fastai has it setup

[x] nbdev?

[x] Windows support

[x] Use Pipeline for inference

Note: Pipeline is slower on many tasks that AdaptNLP covers, and tests are in place to ensure that this remains true

[ ] 1.0.0: Unified training framework for at least 4 NLP tasks

enhancement
opened by muellerzr 0

Releases(v0.3.7)

v0.3.7(Nov 10, 2021)
Bug Fixes

Fixes bug introduced in https://github.com/Novetta/adaptnlp/issues/149 with https://github.com/Novetta/adaptnlp/pull/151

v0.3.6(Nov 9, 2021)
New Features:

NER Fine-Tuner added

v0.3.3(Sep 3, 2021)
Bug Squashed

Embeddings were conjoined rather than separated out by word

Question Answering Results would only return the first instance, rather than top n instances

AdaptiveTuner can accept a label_names parameter for where the labels in a batch are present

v0.3.0(Aug 11, 2021)
New Features

A new Data API that integrates with HuggingFace's Dataset class

A new Tuner API for training and fine-tuning Transformer models

Full integration of the latest fastai library for full access to state-of-the-art practices when training and fine-tuning a model. As improvements are made to the library, AdaptNLP will update to accommodate them

A new Result API that most inference modules return. This is a filterable result ensuring that you only get the most relevant information when returning a prediction from the Easy* modules

Breaking Changes

The train and eval capabilities in the Easy* modules no longer exist, and all training related functionalities have migrated to the Tuner API

LanguageModelFineTuner no longer exists, and the same tuning functionality is in LanguageModelTuner

Bugs Squashed

max_len Attribute Error (127)

Integrate a complete Data API (milestone) (129)

Use the latest fastcore (132)

Fix unused kwarg arguments in text generation (134)

Fix name 'df' is not defined (135)

0.2.3(May 5, 2021)
Breaking Changes:

New versions of AdaptNLP will require a minimum torch version of 1.7, and flair of 0.9 (currently we install via git until 0.9/0.81 is released)

New Features

Complete conversion to the nbdev library format and actions

Complete revamp of the documentation

Inference API entirely relies on fastai_minima and is now built on fastai's Callback System

Integration with fastcore to simplify logic

HuggingFace and Flair ModelHubs, an easier API to interact, search, and download HF and Flair models. Uses huggingface_hub as a backend. Has logged every single Flair model, including those not in the HuggingFace API

Bugs Squashed

Fix accessing bart-large-cnn (110)

Fix SAVE_STATE_WARNING (114)

v0.2.2(Jan 11, 2021)
Official AdaptNLP Docker Images updated

Using NVIDIA NGC Container Registry Cuda base images #101

All images should be deployable via Kubeflow Jupyter Servers

Cleaner python virtualenv setup #101

Official readme can be found at https://github.com/Novetta/adaptnlp/blob/master/docker/README.md

Minor Bug Fixes

Fix token tagging REST application type check #92

Semantic fixes in readme #94

Standalone microservice REST application images #93

Python 3.7+ is now an official requirement #97

v0.2.1(Sep 17, 2020)
Updated to nlp 0.4 -> datasets 1.0+ and multi-label training for sequence classification fixes.

EasySequenceClassifier.train() Updates

Integrates datasets.Dataset now

Swapped order of formatting and label column renaming due to labels not showing up from torch data batches #87

Tutorials and Documentation

Documentation and sequence classification tutorials have been updated to address nlp->datasets name change

Broken links also updated

ODSC Europe Workshop 2020: Notebooks and Colab

ODSC Europe 2020 workshop materials now available in repository "/tutorials/Workshop"

Easy to run notebooks and colab links aligned with the tutorials are available

v0.2.0(Sep 1, 2020)
Updated to transformers 3+, nlp 0.4+, flair 0.6+, pandas 1+

New Features!

New and "easier" training framework with easy modules: EasySequenceClassifier.train() and EasySequenceClassifier.evaluate()

Integrates nlp.Dataset and transformers.Trainer for a streamlined training workflow

Tutorials, notebooks, and colab links available

Sequence Classification task has been implemented, other NLP tasks are in the works

SequenceClassifierTrainer is still available, but will be transitioned into the EasySequenceClassifier and deprecated

New and "easier" LMFineTuner

Integrates transformers.Trainer for a streamlined training workflow

Older LMFineTuner is still available as LMFineTunerManual, but will be deprecated in later releases

Tutorials, notebooks, and colab links available

EasyTextGenerator

New module for text generation. GPT models are currently supported, other models may work but still experimental

Tutorials, notebooks, and colab links available

Tutorials and Documentation

Documentation has been edited and updated to include additional features like the change in training frameworks and fine-tuning

The sequence classification tutorial is a good indicator of the direction we are going with the training and fine-tuning framework

Notebooks and Colab

Easy to run notebooks and colab links aligned with the tutorials are available

Bug fixes

Minor bug and implementation error fixes from flair upgrades

v0.1.6(May 1, 2020)

Split dev requirements #29 #66

Pinned torch #70
v0.1.5(Apr 17, 2020)
Updated to Transformers 2.8.0 which now includes the ELECTRA language model

EasySummarizer and EasyTranslator Bug Fix #63

Address mini batch output format issue for language model heads for the summarization and translation task

Tutorials and Workshop #64

Add the ODSC Timeline Generator notebooks along with colab links

Small touch ups in tutorial notebooks

Documentation

Address missing model_name_or_path param in some easy modules

v0.1.4(Apr 2, 2020)
Updated to Transformers 2.7.0 which includes the Bart and T5 Language Models!

EasySummarizer #47

New module for summarizing documents. These support both the T5 and Bart pre-trained models provided by Hugging Face.

Helper objects for the easy module that can be run as standalone instances TransformersSummarizer

EasyTranslator #49

New module for translating documents with T5 pre-trained models provided by Hugging Face.

Helper objects for the easy module that can be run as standalone instances TransformersTranslator

Documentation and Tutorials #52

New Class API documentation for EasySummarizer and EasyTranslator

New tutorial guides, initial notebooks, and links to colab for the above as well

Readme provides quickstart samples that show examples from the notebooks #53

Other

Dockerhub repo for adaptnlp-rest added here https://hub.docker.com/r/achangnovetta/adaptnlp-rest

Upgraded CircleCI allowing us to run #40

Added Nightly build #39

v0.1.3(Mar 6, 2020)
Sequence Classification and Question Answering updates to integrate Hugging Face's public models.

EasySequenceClassifier

Can now take Flair and Transformers pre-trained sequence classification models as input in the model_name_or_path param

Helper objects for the easy module that can be run as standalone instances TransformersSequenceClassifier FlairSequenceClassifier

EasyQuestionAnswering

Can now take Transformers pre-trained sequence classification models as input in the model_name_or_path param

Helper objects for the easy module that can be run as standalone instances TransformersQuestionAnswering

Documentation and Tutorials

Documentation has been updated with the above implementations

Tutorials updated with better examples to convey changes

Class API docs updated

Tutorial notebooks updated

Colab notebooks better displayed on readme

FastAPI Rest

FastAPI updated to latest (0.52.0)

FastAPI endpoints can now be stood up and deployed with any huggingface sequence classification or question answering model specified as an env var arg.

Dependencies

Transformers pinned for stable updates
v0.1.2(Feb 19, 2020)
AdaptNLP's first published release on github.

Easy API:

EasyTokenTagger

EasySequenceClassifier

EasyWordEmbeddings

EasyStackedEmbeddings

EasyDocumentEmbeddings

Training and Fine-tuning Interface

SequenceClassifierTrainer

LMFineTuner

FastAPI AdaptNLP App for Streamlined Rapid NLP-Model Deployment

adaptnlp/rest

configured to run any pretrained and custom trained flair/adaptnlp models

compatible with nvidia-docker for GPU use

AdaptNLP integration but loosely coupled

Documentation

Documentation release with walk-through guides, tutorials, and Class API docs of the above

Built with mkdocs, material for mkdocs, and mkautodoc

Tutorials

IPython/Colab Notebooks provided and updated to showcase AdaptNLP Modules

Continuous Integration

CircleCI build and tests running successfully and minimally

Github workflow for pypi publishing added

Formatting

Flake8 and Black adherence


Owner

Novetta

GitHub Repository https://novetta.github.io/adaptnlp/

A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.

Multilingual Latent Dirichlet Allocation (LDA) Pipeline This project is for text clustering using the Latent Dirichlet Allocation (LDA) algorithm. It

74 Oct 07, 2022

A Fast Command Analyser based on Dict and Pydantic

Alconna Alconna 隶属于ArcletProject，在Cesloi内有内置 Alconna 是 Cesloi-CommandAnalysis 的高级版，支持解析消息链一般情况下请当作简易的消息链解析器/命令解析器文档暂时的文档 Example from arclet.alcon

19 Jan 03, 2023

Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2021).

Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER. @inproceedings{tedes

40 Dec 11, 2022

Unsupervised text tokenizer focused on computational efficiency

YouTokenToMe YouTokenToMe is an unsupervised text tokenizer focused on computational efficiency. It currently implements fast Byte Pair Encoding (BPE)

847 Dec 19, 2022

Amazon Multilingual Counterfactual Dataset (AMCD)

35 Sep 20, 2022

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing pororo performs Natural Language Processing and Speech-related tasks. It is easy to

1.2k Dec 21, 2022

Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

61 Dec 21, 2022

Python library for interactive topic model visualization. Port of the R LDAvis package.

pyLDAvis Python library for interactive topic model visualization. This is a port of the fabulous R package by Carson Sievert and Kenny Shirley. pyLDA

1.7k Dec 20, 2022

Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

PythonTextObfuscator Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense. Requi

2 Aug 29, 2022

Code for ACL 2020 paper "Rigid Formats Controlled Text Generation"

SongNet SongNet: SongCi + Song (Lyrics) + Sonnet + etc. @inproceedings{li-etal-2020-rigid, title = "Rigid Formats Controlled Text Generation",

212 Dec 17, 2022

Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch

N-Grammer - Pytorch Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch Install $ pip install n-grammer-pytorch Usage

66 Dec 29, 2022

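An entertainment QQ bot based on Nonebot2 and go-cqhttp

Takker - An ordinary QQ bot. This project is a QQ group entertainment bot built on Nonebot2 and go-cqhttp, with Sqlite as its database. About: developed purely out of interest, with some features borrowing code from more experienced developers; it serves as an entertainment and utility bot for QQ groups. Disclaimer: this project is for learning and exchange only; do not use it for illegal purposes. This is the developer's first Pytho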

79 Dec 29, 2022

An A-SOUL Text Generator Based on CPM-Distill.

ASOUL-Generator-Backend This project is the backend of https://asoul.infedg.xyz/. The model was trained from CPM-Generate-distill, a transformers conversion of CPM-Distill.

46 Dec 11, 2022

Synthetic data for the people.

zpy: Synthetic data in Blender. Website • Install • Docs • Examples • CLI • Contribute • Licence Abstract Collecting, labeling, and cleaning data for

253 Dec 21, 2022

🗣️ NALP is a library that covers Natural Adversarial Language Processing.

NALP: Natural Adversarial Language Processing Welcome to NALP. Have you ever wanted to create natural text from raw sources? If yes, NALP is for you!

21 Aug 12, 2022

Fine-tune GPT-3 with a Google Chat conversation history

Google Chat GPT-3 This repo will help you fine-tune GPT-3 with a Google Chat conversation history. The trained model will be able to converse as one o

7 Dec 10, 2022

🦅 Pretrained BigBird Model for Korean (up to 4096 tokens)

Pretrained BigBird Model for Korean What is BigBird • How to Use • Pretraining • Evaluation Result • Docs • Citation 한국어 | English What is BigBird? Bi

183 Dec 14, 2022

Seq2seq attn - Use the Seq2Seq method to implement machine translation and introduce an attention mechanism to improve the results

Seq2seq_attn Use the Seq2Seq method to implement machine translation and use the

1 Jun 28, 2022

SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognit

5.1k Jan 09, 2023

Text vectorization tool to outperform TFIDF for classification tasks

WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth

186 Dec 29, 2022



Releases(v0.3.7)

v0.3.7(Nov 10, 2021)

Bug Fixes

v0.3.6(Nov 9, 2021)

New Features:

v0.3.3(Sep 3, 2021)

v0.3.0(Aug 11, 2021)

New Features

Breaking Changes

Bugs Squashed

0.2.3(May 5, 2021)

Breaking Changes:

New Features

Bugs Squashed

v0.2.2(Jan 11, 2021)

Official AdaptNLP Docker Images updated

Minor Bug Fixes

v0.2.1(Sep 17, 2020)

EasySequenceClassifier.train() Updates

Tutorials and Documentation

ODSC Europe Workshop 2020: Notebooks and Colab

v0.2.0(Sep 1, 2020)

New and "easier" training framework with easy modules: EasySequenceClassifier.train() and EasySequenceClassifier.evaluate()

New and "easier" LMFineTuner

EasyTextGenerator

Tutorials and Documentation

Notebooks and Colab

Bug fixes

v0.1.6(May 1, 2020)

v0.1.5(Apr 17, 2020)

EasySummarizer and EasyTranslator Bug Fix #63

Tutorials and Workshop #64

Documentation

v0.1.4(Apr 2, 2020)

EasySummarizer #47

EasyTranslator #49

Documentation and Tutorials #52

Other

v0.1.3(Mar 6, 2020)

EasySequenceClassifier

EasyQuestionAnswering

Documentation and Tutorials

FastAPI Rest

Dependencies

v0.1.2(Feb 19, 2020)

Easy API:

Training and Fine-tuning Interface

FastAPI AdaptNLP App for Streamlined Rapid NLP-Model Deployment

Documentation

Tutorials

Continuous Integration

Formatting

Owner

Novetta

