An easy-to-use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving state-of-the-art NLP models.

Overview

Welcome to AdaptNLP

A high-level framework and library for running, training, and deploying state-of-the-art Natural Language Processing (NLP) models for end-to-end tasks.

What is AdaptNLP?

AdaptNLP is a Python package that allows users ranging from beginner Python coders to experienced Machine Learning Engineers to leverage state-of-the-art Natural Language Processing (NLP) models and training techniques in one easy-to-use package.

Utilizing fastai with HuggingFace's Transformers library and Humboldt University of Berlin's Flair library, AdaptNLP provides Machine Learning Researchers and Scientists a modular and adaptive approach to a variety of NLP tasks, simplifying what it takes to train, perform inference with, and deploy NLP-based models and microservices.

What is the Benefit of AdaptNLP Rather Than Just Using Transformers?

Although transformers offers quick inference functionality such as the pipeline API, it is still not as flexible or as fast as it could be. AdaptNLP's Easy* inference modules tend to be slightly faster than the pipeline interface (at a bare minimum, the same speed), while also giving the user simple, intuitive return values free of any unneeded output.

Along with this, the integration of the fastai library gives the code needed to train or run inference on your models a completely modular API through the fastai Callback system. Rather than writing your entire torch loop, if a model needs anything special, a Callback can be written in fewer than 10 lines of code to achieve that specific functionality.
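
As a rough illustration of why a callback-driven loop keeps custom behavior short, here is a framework-free sketch. The Callback, Trainer, and LossRecorder classes and the event names below are hypothetical stand-ins, not the fastai API; fastai's real Callback class exposes many more events and a richer training state.

```python
# Hypothetical, framework-free sketch of a callback-driven training loop.
# These classes are NOT fastai's; they only mirror the general idea.

class Callback:
    """Base class: subclasses override only the events they care about."""
    def before_fit(self, trainer): pass
    def after_batch(self, trainer): pass
    def after_fit(self, trainer): pass


class Trainer:
    """Owns the loop once and calls every callback at each event."""
    def __init__(self, batches, callbacks=()):
        self.batches = list(batches)
        self.callbacks = list(callbacks)
        self.loss = None
        self.step = 0

    def _call(self, event):
        for cb in self.callbacks:
            getattr(cb, event)(self)

    def fit(self):
        self._call("before_fit")
        for batch in self.batches:
            # Stand-in for a forward/backward pass: the "loss" is the batch mean.
            self.loss = sum(batch) / len(batch)
            self.step += 1
            self._call("after_batch")
        self._call("after_fit")


class LossRecorder(Callback):
    """Custom behavior in a handful of lines: record the loss of each batch."""
    def before_fit(self, trainer):
        self.losses = []

    def after_batch(self, trainer):
        self.losses.append(trainer.loss)


recorder = LossRecorder()
Trainer(batches=[[1.0, 3.0], [2.0, 2.0], [0.0, 1.0]], callbacks=[recorder]).fit()
print(recorder.losses)
```

The loop itself never changes; only the small LossRecorder class is specific to the task, which is the property the paragraph above describes.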

Finally, when training your model, fastai is at the forefront of constantly bringing in the best practices for achieving state-of-the-art training, with new research methodologies heavily tested before integration. As such, AdaptNLP fully supports training with the One-Cycle policy, and using new optimizer combinations such as the Ranger optimizer with Cosine Annealing training, through simple one-line fitting functions (fit_one_cycle and fit_flat_cos).
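
To make the two schedules named above more concrete, here is a minimal, framework-free sketch of the learning-rate curves they are commonly described as following: a warm-up followed by cosine annealing (one-cycle), and a flat phase followed by cosine annealing (flat-cos). This is not fastai's implementation; the function names and the default values for pct_start, div, and div_final are assumptions chosen for illustration.

```python
import math


def cos_anneal(start, end, pos):
    """Cosine interpolation from `start` (pos=0) to `end` (pos=1)."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pos))


def one_cycle_lr(pos, lr_max, pct_start=0.25, div=25.0, div_final=1e5):
    """Learning rate at training progress `pos` in [0, 1].

    Warm up from lr_max/div to lr_max, then anneal to lr_max/div_final.
    The default values are illustrative assumptions.
    """
    if pos < pct_start:
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    return cos_anneal(lr_max, lr_max / div_final,
                      (pos - pct_start) / (1 - pct_start))


def flat_cos_lr(pos, lr, pct_start=0.75, div_final=1e5):
    """Hold `lr` flat for the first `pct_start` of training, then anneal."""
    if pos < pct_start:
        return lr
    return cos_anneal(lr, lr / div_final, (pos - pct_start) / (1 - pct_start))


# Print both schedules at a few points of training progress.
for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(pos, one_cycle_lr(pos, lr_max=1e-3), flat_cos_lr(pos, lr=1e-3))
```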

Installation Directions

PyPi

To install from PyPI, please use:

pip install adaptnlp

Or if you have pip3:

pip3 install adaptnlp

Conda (Coming Soon)

Developmental Builds

To install any developmental style builds, please follow the below directions to install directly from git:

Stable Master Branch

The master branch is generally not updated much except for hotfixes and new releases. To install it, please use:

pip install git+https://github.com/Novetta/adaptnlp

Developmental Branch

Note: This branch can become unstable, and it is only recommended for contributors or those who really want to test out new technology. Please check that the latest tests are passing (a green checkmark on the commit message) before trying this branch out.

You can install the developmental builds with:

pip install git+https://github.com/Novetta/adaptnlp@dev

Docker Images

There are actively updated Docker images hosted on Novetta's DockerHub

The guide to each tag is as follows:

  • latest: This is the latest PyPI release and installs a complete package that is CUDA capable
  • dev: These are occasionally built developmental builds at certain stages. They are built by the dev branch and are generally stable
  • *api: The API builds are for the REST-API

To pull and run any AdaptNLP image immediately, you can run:

docker run -itp 8888:8888 novetta/adaptnlp:TAG

Replace TAG with any of the aforementioned tags.

Afterwards, check localhost:8888 or localhost:8888/lab to access the notebook containers.

Navigating the Documentation

The AdaptNLP library is built with nbdev, so any documentation page you find (including this one!) can be directly run as a Jupyter Notebook. Each page at the top includes an "Open in Colab" button as well that will open the notebook in Google Colaboratory to allow for immediate access to the code.

The documentation is split into six sections, each with a specific purpose:

Getting Started

This group contains quick access to the homepage, an explanation of what the AdaptNLP Cookbooks are, and how to contribute.

Models and Model Hubs

These contain any relevant documentation for the AdaptiveModel class, the HuggingFace Hub model search integration, and the Result class that various inference APIs return.

Class API

This section contains the module documentation for the inference framework, the tuning framework, as well as the utilities and foundations for the AdaptNLP library.

Inference and Training Cookbooks

These two sections provide quick access to single-use recipes for starting any AdaptNLP project for a particular task, with easy-to-use code designed for that specific use case. There are currently over 13 different tutorials available, with more coming soon.

NLP Services with FastAPI

This section provides directions on how to use the AdaptNLP REST API for deploying your models quickly with FastAPI

Contributing

There is a contribution guide available here

Testing

AdaptNLP is run on the nbdev framework. To run all tests please do the following:

  1. pip install nbverbose
  2. git clone https://github.com/Novetta/adaptnlp
  3. cd adaptnlp
  4. pip install -e .
  5. nbdev_test_nbs

This will run every notebook and ensure that all tests have passed. Please see the nbdev documentation for more information about it.

Contact

Please contact Zachary Mueller at [email protected] with questions or comments regarding AdaptNLP.

Follow us on Twitter at @TheZachMueller and @AdaptNLP for updates and NLP dialogue.

License

This project is licensed under the terms of the Apache 2.0 license.

Comments
  • multi-label classification / paperswithcode dataset

    Hi guys,

    Hope you are all well !

    I was wondering if adaptnlp can handle multi-label classification with 1560 labels.

    More precisely, I would like to apply it to paperswithcode dataset where labels are called tasks.

    Refs:

    Thanks for any insights or inputs on that.

    Cheers, X

    opened by ghost 7
  • cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'

    Describe the bug: Your demo Colab notebook "Custom Fine-Tuning and Training with Transformer Models" doesn't work and generates the error shown in the title.

    bug 
    opened by lematmat 5
  • Significant slowdown in EasyTokenTagger release 0.2.0

    I'm experiencing a slowdown in NER performance using EasyTokenTagger and 'ner-ontonotes' after updating to release 0.2.0. Have there been any underlying changes to how the tagger object works?

    Specifically, I am dealing with a very large chunk of text. Prior to this release, the NER tagging took around 15 seconds for this particular text. Now, it's taking 15+ minutes the first time but subsequent calls on that text are very quick. Is there some sort of caching or indexing that's being done now? I'd imagine this could create a lot of overhead for large chunks of text.

    opened by mkongsiri-Novetta 5
  • Can't load big dataset

    Describe the bug: It happens when I call learning_rate = finetuner.find_learning_rate(**learning_rate_finder_configs) in the tutorial. I have a big dataset with 200k rows, each containing a text of around 200 words.

    In your code, when you instantiate the TextDataset, the line tokenized_text = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text)) takes an eternity for a text of 20 million words. Do you think it could be done in a better/faster way, for example by keeping the rows as they are?

    For the record:

    • Time for 100 characters: 0.0003399848937988281s
    • Time for 1000 characters: 0.00124359130859375s
    • Time for 10 000 characters: 0.012135982513427734s
    • Time for 100 000 characters: 0.2131056785583496s
    • Time for 1 000 000 characters: 8.782422542572021s
    • Time for 10 000 000 characters: 734.5397665500641s

    Can't reach the end of the full TextDataset (109 610 928 characters).

    To Reproduce Tutorial with a big dataset
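
    A minimal sketch of the row-by-row idea raised in this report: tokenize each row separately and concatenate the IDs, so that no single tokenizer call receives the whole corpus as one string. The tokenize_rows helper and the toy tokenizer below are hypothetical stand-ins, not AdaptNLP or transformers code, and whether this is actually faster depends on the tokenizer in use.

```python
from typing import Callable, Dict, Iterable, List


def tokenize_rows(
    rows: Iterable[str],
    tokenize: Callable[[str], List[str]],
    convert_tokens_to_ids: Callable[[List[str]], List[int]],
) -> List[int]:
    """Tokenize each row on its own and concatenate the resulting IDs."""
    ids: List[int] = []
    for row in rows:
        ids.extend(convert_tokens_to_ids(tokenize(row)))
    return ids


# Toy stand-ins for a real tokenizer (assumptions for illustration only).
vocab: Dict[str, int] = {}


def toy_tokenize(text: str) -> List[str]:
    return text.lower().split()


def toy_convert(tokens: List[str]) -> List[int]:
    # Assign the next free ID to each previously unseen token.
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]


rows = ["The cat sat", "the dog sat"]
token_ids = tokenize_rows(rows, toy_tokenize, toy_convert)
print(token_ids)
```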

    opened by NicoLivesey 5
  • AttributeError: 'CamembertForMaskedLM' object has no attribute 'cls'

    Describe the bug: Trying to freeze an LMFineTuner based on Camembert weights raises:

    AttributeError                            Traceback (most recent call last)
          6 }
          7 finetuner = LMFineTuner(**ft_configs)
    ----> 8 finetuner.freeze()

    ~/anaconda3/envs/pe_adaptnlp/lib/python3.8/site-packages/adaptnlp/transformers/finetuning.py in freeze(self)
       1630         """Freeze last classification layer group only
       1631         """
    -> 1632         layers_len = len(list(self.model.cls.parameters()))
       1633         self.freeze_to(-layers_len)
       1634

    ~/anaconda3/envs/pe_adaptnlp/lib/python3.8/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
        573         if name in modules:
        574             return modules[name]
    --> 575         raise AttributeError("'{}' object has no attribute '{}'".format(
        576             type(self).__name__, name))
        577

    AttributeError: 'CamembertForMaskedLM' object has no attribute 'cls'

    To Reproduce

    from adaptnlp import LMFineTuner
    train_file = "path/to/train" 
    valid_file = "path/to/valid"
    ft_configs = {
                  "train_data_file": train_file,
                  "eval_data_file": valid_file,
                  "model_type": "camembert",
                  "model_name_or_path": "camembert-base",
                 }
    finetuner = LMFineTuner(**ft_configs)
    finetuner.freeze()
    

    Expected behavior No error

    Desktop (please complete the following information):

    • OS: Amazon Linux
    • Browser Chrome
    opened by NicoLivesey 4
  • AttributeError: 'EasyDocumentEmbeddings' object has no attribute 'rnn_embeddings'

    Cannot use pool option to generate embeddings (instead of the default rnn).

    A snippet for the problem:

    from adaptnlp import EasyDocumentEmbeddings

    embedding_type = 'albert-xxlarge-v2'
    embedding_methods = ["pool"]
    doc_embeddings = EasyDocumentEmbeddings(embedding_type, methods=embedding_methods)
    
    

    This is the error I get:

      File "env/lib/python3.7/site-packages/adaptnlp/training.py", line 91, in __init__
        self._initial_setup(self.label_dict, **kwargs)
      File "env/lib/python3.7/site-packages/adaptnlp/training.py", line 97, in _initial_setup
        document_embeddings: DocumentRNNEmbeddings = self.encoder.rnn_embeddings
    AttributeError: 'EasyDocumentEmbeddings' object has no attribute 'rnn_embeddings'
    

    Expected behavior would be to successfully obtain an easy document embeddings object with no errors

    Running on debian buster, python3.7

    If someone could give me a fix or a workaround or if I'm using this incorrectly, then please let me know

    opened by blerstpub 3
  • EasySequenceClassifier tag_text function returns None for FlairSequenceClassifier model

    Hi! I tried to follow the tutorial for training a custom sequence classifier: https://novetta.github.io/adaptnlp/tutorial/training-sequence-classification.html

    The last step returns None instead of the expected labels:

    sentences = classifier.tag_text(example_text, model_name_or_path=OUTPUT_DIR)

    To Reproduce the behavior:

    from adaptnlp import EasySequenceClassifier
    from flair.data import Sentence
    
    OUTPUT_DIR = "…/best-model.pt"    # my custom model
    classifier = EasySequenceClassifier()
    
    ex_text = "This is a good text example"
    example_text=[Sentence(ex_text)]
    
    sentences = classifier.tag_text(text=example_text, model_name_or_path=OUTPUT_DIR, mini_batch_size=1)
    print("Label output:\n")
    print(sentences)
    

    Returns

    2020-12-28 17:44:31,111 loading file .../best-model.pt
    Label output:
    
    None
    

    Surprisingly, the labels were added to example_text:

    print(example_text)

    Returns

    [Sentence: " This is a good text example " [− Tokens: 17 − Sentence-Labels: {'label': [0 (0.8812)]}]]

    Proposed explanation/ contribution: I think I know the reason for unexpected behavior and will be happy to help. classifier.tag_text creates FlairSequenceClassifier classifier. FlairSequenceClassifier initiates flair.models.TextClassifier classifier and uses TextClassifier predict method within its own predict method. But flair.models.TextClassifier predict method returns None because the labels are directly added to the sentences. I can re-write FlairSequenceClassifier predict method to return Sentences with labels instead of None.
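
    The behavior described in this explanation can be reproduced with stand-in classes. The sketch below is only an illustration of the pattern (a predict method that annotates its inputs in place and returns None, plus a wrapper that hands the inputs back); FakeSentence, InPlaceClassifier, and ReturningClassifier are hypothetical names and are not Flair's or AdaptNLP's actual implementations.

```python
# Stand-in classes only: they mimic the reported behavior and are not
# Flair's or AdaptNLP's actual implementations.

class FakeSentence:
    def __init__(self, text):
        self.text = text
        self.labels = []


class InPlaceClassifier:
    """Mimics a predict() that annotates its inputs and returns nothing."""
    def predict(self, sentences):
        for sentence in sentences:
            sentence.labels.append(("label", 0.88))
        # No return statement, so the call evaluates to None.


class ReturningClassifier:
    """Wrapper in the spirit of the proposed fix: hand the inputs back."""
    def __init__(self, inner):
        self.inner = inner

    def predict(self, sentences):
        self.inner.predict(sentences)  # labels are attached in place
        return sentences


sentences = [FakeSentence("This is a good text example")]
result = InPlaceClassifier().predict(sentences)
print(result)               # the call itself yields None
print(sentences[0].labels)  # but the input object now carries the label

wrapped = ReturningClassifier(InPlaceClassifier()).predict(
    [FakeSentence("Another example")]
)
print(wrapped[0].labels)
```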

    opened by DinaraN 3
  • Sequence classification using REST API fails with models except en-sentiment

    Sequence classification over REST API using any model except for en-sentiment fails with:

    File "/usr/local/lib/python3.6/dist-packages/starlette/routing.py", line 41, in app
      response = await func(request)
    File "/usr/local/lib/python3.6/dist-packages/fastapi/routing.py", line 197, in app
      dependant=dependant, values=values, is_coroutine=is_coroutine
    File "/usr/local/lib/python3.6/dist-packages/fastapi/routing.py", line 147, in run_endpoint_function
      return await dependant.call(**values)
    File "./app/main.py", line 87, in sequence_classifier
      text=text, mini_batch_size=1, model_name_or_path=_SEQUENCE_CLASSIFICATION_MODEL
    File "/adaptnlp/adaptnlp/sequence_classification.py", line 285, in tag_text
      return classifier.predict(text=text, mini_batch_size=mini_batch_size, **kwargs,)
    File "/adaptnlp/adaptnlp/sequence_classification.py", line 140, in predict
      text_sent.add_label(label)
    TypeError: add_label() missing 1 required positional argument: 'value'

    Reproducible with:

    docker run -itp 5000:5000 -e TOKEN_TAGGING_MODE='ner' \
      -e TOKEN_TAGGING_MODEL='ner-ontonotes-fast' \
      -e SEQUENCE_CLASSIFICATION_MODEL='nlptown/bert-base-multilingual-uncased-sentiment' \
      achangnovetta/adaptnlp-rest:latest \
      bash

    opened by VogtAI 3
  • AdaptNLP v0.2.x Additional Features Discussion

    There are a lot of ideas that may be floating for feature implementations, so this thread just provides a mini roadmap and environment to think about adaptnlp's progression.

    Ideas can be stated freely in this thread and do not replace feature-request issue posts.

    • [x] Tokenizer: Start integrating tokenizers all across adaptnlp for speed and performance enhancements for training and inference.
    • [x] Summarization: Add the NLP task of summarization using a document-level encoder based on transformer language models
    • [x] GPU: Multi-GPU and mixed-precision is prevalent in AdaptNLP, but its implementation can be improved and debugged
    • ~~FastAPI Batch-Serving: Improve on the concurrent calls with batch processing from the NLP models (maybe try to make it CPU and GPU agnostic for ease-of-use)~~
    • ~~Model Downloading: Start structuring a way to download and potentially upload pre-trained NLP-task models~~
    enhancement 
    opened by aychang95 3
  • Data API

    We probably should have a data API of some form, that ties into https://github.com/Novetta/adaptnlp/issues/128

    Ideally it should simply prep a dataset for tokenization of a model, or tokenize the data itself.

    For now we cover two inputs:

    1. Individual texts
    2. CSV

    We should support something akin to fastai's get_y, but with decent defaults so that customization is available, but not needed.

    Ideally something like:

    dset = TaskDataset.from_df(
      df,  # Can be fname or dataframe
      get_x = ColReader('text'),
      get_y = ColReader('label'),
      splitter = RandomSplitter(),
      model = 'bert-base-uncased', # The name/type of downstream model
      task = "ner" # Or use a `Task.NER` namespace class
    )
    

    And further:

    dset.dataloaders(bs=8, collate_fn=data_collator)
    

    It reads very similarly to the fastai API, but we do not use the fastai API, since for text doing it like this is a bit easier.

    The highest level API would look like so:

    dls = TaskDataLoaders.from_df(df, 'text', 'label', model='bert-base-uncased')
    

    We should note the model used, and when integrating it with the tuning API if something is off with the model entered, we make note of that

    enhancement 
    opened by muellerzr 2
  • ImportError: cannot import name 'EasyTokenTagger'

    Describe the bug: I tried to run the code in the tutorial

    from adaptnlp import EasyTokenTagger
    
    
    ## Example Text
    example_text = "Novetta's headquarters is located in Mclean, Virginia."
    
    ## Load the token tagger module and tag text with the NER model 
    tagger = EasyTokenTagger()
    sentences = tagger.tag_text(text=example_text, model_name_or_path="ner")
    
    ## Output tagged token span results in Flair's Sentence object model
    for sentence in sentences:
        for entity in sentence.get_spans("ner"):
            print(entity)
    

    and it gave me the error:

    ...
      File "/home/rajiv/Documents/dev/python/nltk-trial/adaptnlp.py", line 2, in <module>
        from adaptnlp import EasyTokenTagger
    ImportError: cannot import name 'EasyTokenTagger'
    

    Desktop (please complete the following information):

    • OS: Ubuntu
    • Version: 20.04
    • Python: 3.6.9
    opened by RAbraham 2
  • classifier.tag_text on GPU!

    Hi, I want to classify texts:

    classifier = EasySequenceClassifier()
    hub = HFModelHub()
    hub.search_model_by_task('text-classification')
    model = hub.search_model_by_name('nlptown/bert-base', user_uploaded=True)[0]
    sentence = classifier.tag_text(text=inputs, model_name_or_path=model, mini_batch_size=1)

    Q1: How can I force it to run on CPU?

    Q2: I now have a GPU but cannot get it to run. My errors:

    ...
    FileNotFoundError: [Errno 2] No such file or directory: 'nlptown/bert-base-multilingual-uncased-sentiment'
    During handling of the above exception, another exception occurred:
    ...
    RuntimeError: CUDA error: out of memory
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by topliftarm 0
  • Unified Training API

    Training API will use fastai under the hood, and we'll make a few functions to build general datasets.

    Tasks and sample datasets to use:

    Other Information

    Task APIs should have a simple user interface, i.e. the high-level API only accepts specific options, while the mid-level API has access to the full fastai Learner params.

    Example mid-level API I'm thinking about:

    dls = some_build_data_thing()
    tuner = QAFineTuner(dls, 'bert-base-cased')
    tuner.tune(
      scheduler = 'fit_flat_cos',
      n_epochs = 3,
      lr = None,
      suggest_method = 'valley', # Triggers if lr is None
      additional_callbacks = []
    )
    

    And its high-level:

    tuner = QAFineTuner.from_csv(
      question_column_name = "question",
      answer_column_name = "answer",
      model = "bert-base-cased"
    )
    tuner.tune(...)
    

    We should automatically pull in proper metrics for each task, but users have the option to bring in their own as well and pass it to QAFineTuner (good defaults)

    Tuners should also have a func like QAFineTuner.from_csv() to build the dataset in-house

    enhancement 
    opened by muellerzr 2
  • Save context in QuestionAnswering and re-use it

    I noticed that when we run any code snippet, it converts the text to vectors or something similar. For example, in this code snippet:

    from adaptnlp import EasyQuestionAnswering 
    from pprint import pprint
    
    ## Example Query and Context 
    query = "What is the meaning of life?"
    context = "Machine Learning is the meaning of life."
    top_n = 5
    
    ## Load the QA module and run inference on results 
    qa = EasyQuestionAnswering()
    best_answer, best_n_answers = qa.predict_qa(query=query, context=context, n_best_size=top_n, mini_batch_size=1, model_name_or_path="distilbert-base-uncased-distilled-squad")
    
    ## Output top answer as well as top 5 answers
    print(best_answer)
    pprint(best_n_answers)
    

    It converts both the query and the context to vectors first. If we have a very long context and a lot of queries, the context is converted to vectors again on every call. I think there should be a way to save the context vectors and re-use them instead of creating them again and again.

    enhancement 
    opened by talhaanwarch 1
  • Stretch Goals

    • [x] HuggingFace raw embeddings over Flair
    • [x] Try and integrate Callbacks for text generation and other classes that aren't using it

      Note: Didn't do this for text generation, more complex than it's worth

    • [x] Use fastrelease (with conda)
    • [x] Improve test coverage
    • [x] GH CI for testing Mac, Windows, and Linux, similar to how fastai has it setup
    • [x] nbdev?
    • [x] Windows support
    • [x] Use Pipeline for inference

      Note: Pipeline is slower on many tasks that AdaptNLP covers, tests are in place to ensure that this is always true

    • [ ] 1.0.0: Unified training framework for at least 4 NLP tasks
    enhancement 
    opened by muellerzr 0
Releases(v0.3.7)
  • v0.3.7(Nov 10, 2021)

  • v0.3.6(Nov 9, 2021)

  • v0.3.3(Sep 3, 2021)

    Bugs Squashed

    • Embeddings were conjoined rather than separated out by word
    • Question Answering Results would only return the first instance, rather than top n instances
    • AdaptiveTuner can accept a label_names parameter for where the labels in a batch are present
  • v0.3.0(Aug 11, 2021)

    New Features

    • A new Data API that integrates with HuggingFace's Dataset class

    • A new Tuner API for training and fine-tuning Transformer models

    • Full integration of the latest fastai library for full access to state-of-the-art practices when training and fine-tuning a model. As improvements are made to the library, AdaptNLP will update to accommodate them

    • A new Result API that most inference modules return. This is a filterable result ensuring that you only get the most relevant information when returning a prediction from the Easy* modules

    Breaking Changes

    • The train and eval capabilities in the Easy* modules no longer exist, and all training related functionalities have migrated to the Tuner API
    • LanguageModelFineTuner no longer exists, and the same tuning functionality is in LanguageModelTuner

    Bugs Squashed

    • max_len Attribute Error (127)
    • Integrate a complete Data API (milestone) (129)
    • Use the latest fastcore (132)
    • Fix unused kwarg arguments in text generation (134)
    • Fix name 'df' is not defined (135)
  • 0.2.3(May 5, 2021)

    Breaking Changes:

    • New versions of AdaptNLP will require a minimum torch version of 1.7, and flair of 0.9 (currently we install via git until 0.9/0.81 is released)

    New Features

    Bugs Squashed

    • Fix accessing bart-large-cnn (110)
    • Fix SAVE_STATE_WARNING (114)
  • v0.2.2(Jan 11, 2021)

    Official AdaptNLP Docker Images updated

    • Using NVIDIA NGC Container Registry Cuda base images #101
    • All images should be deployable via Kubeflow Jupyter Servers
    • Cleaner Python virtualenv setup #101
    • Official readme can be found at https://github.com/Novetta/adaptnlp/blob/master/docker/README.md

    Minor Bug Fixes

    • Fix token tagging REST application type check #92
    • Semantic fixes in readme #94
    • Standalone microservice REST application images #93
    • Python 3.7+ is now an official requirement #97
  • v0.2.1(Sep 17, 2020)

    Updated to nlp 0.4 -> datasets 1.0+ and multi-label training for sequence classification fixes.

    EasySequenceClassifier.train() Updates

    • Integrates datasets.Dataset now
    • Swapped order of formatting and label column renaming due to labels not showing up from torch data batches #87

    Tutorials and Documentation

    • Documentation and sequence classification tutorials have been updated to address nlp->datasets name change
    • Broken links also updated

    ODSC Europe Workshop 2020: Notebooks and Colab

    • ODSC Europe 2020 workshop materials now available in repository "/tutorials/Workshop"
    • Easy to run notebooks and colab links aligned with the tutorials are available
  • v0.2.0(Sep 1, 2020)

    Updated to transformers 3+, nlp 0.4+, flair 0.6+, pandas 1+

    New Features!

    New and "easier" training framework with easy modules: EasySequenceClassifier.train() and EasySequenceClassifier.evaluate()

    • Integrates nlp.Dataset and transformers.Trainer for a streamlined training workflow
    • Tutorials, notebooks, and colab links available
    • Sequence Classification task has been implemented, other NLP tasks are in the works
    • SequenceClassifierTrainer is still available, but will be transitioned into the EasySequenceClassifier and deprecated

    New and "easier" LMFineTuner

    • Integrates transformers.Trainer for a streamlined training workflow
    • Older LMFineTuner is still available as LMFineTunerManual, but will be deprecated in later releases
    • Tutorials, notebooks, and colab links available

    EasyTextGenerator

    • New module for text generation. GPT models are currently supported, other models may work but still experimental
    • Tutorials, notebooks, and colab links available

    Tutorials and Documentation

    • Documentation has been edited and updated to include additional features like the change in training frameworks and fine-tuning
    • The sequence classification tutorial is a good indicator of the direction we are going with the training and fine-tuning framework

    Notebooks and Colab

    • Easy to run notebooks and colab links aligned with the tutorials are available

    Bug fixes

    • Minor bug and implementation error fixes from flair upgrades
  • v0.1.6(May 1, 2020)

  • v0.1.5(Apr 17, 2020)

    Updated to Transformers 2.8.0 which now includes the ELECTRA language model

    EasySummarizer and EasyTranslator Bug Fix #63

    • Address mini batch output format issue for language model heads for the summarization and translation task

    Tutorials and Workshop #64

    • Add the ODSC Timeline Generator notebooks along with colab links
    • Small touch ups in tutorial notebooks

    Documentation

    • Address missing model_name_or_path param in some easy modules
  • v0.1.4(Apr 2, 2020)

    Updated to Transformers 2.7.0 which includes the Bart and T5 Language Models!

    EasySummarizer #47

    • New module for summarizing documents. These support both the T5 and Bart pre-trained models provided by Hugging Face.
    • Helper objects for the easy module that can be run as standalone instances TransformersSummarizer

    EasyTranslator #49

    • New module for translating documents with T5 pre-trained models provided by Hugging Face.
    • Helper objects for the easy module that can be run as standalone instances TransformersTranslator

    Documentation and Tutorials #52

    • New Class API documentation for EasySummarizer and EasyTranslator
    • New tutorial guides, initial notebooks, and links to colab for the above as well
    • Readme provides quickstart samples that show examples from the notebooks #53

    Other

    • Dockerhub repo for adaptnlp-rest added here https://hub.docker.com/r/achangnovetta/adaptnlp-rest
    • Upgraded CircleCI allowing us to run #40
    • Added Nightly build #39
  • v0.1.3(Mar 6, 2020)

    Sequence Classification and Question Answering updates to integrate Hugging Face's public models.

    EasySequenceClassifier

    • Can now take Flair and Transformers pre-trained sequence classification models as input in the model_name_or_path param
    • Helper objects for the easy module that can be run as standalone instances TransformersSequenceClassifier FlairSequenceClassifier

    EasyQuestionAnswering

    • Can now take Transformers pre-trained question answering models as input in the model_name_or_path param
    • Helper objects for the easy module that can be run as standalone instances TransformersQuestionAnswering

    Documentation and Tutorials

    Documentation has been updated with the above implementations

    • Tutorials updated with better examples to convey changes
    • Class API docs updated
    • Tutorial notebooks updated
    • Colab notebooks better displayed on readme

    FastAPI Rest

    FastAPI updated to latest (0.52.0). FastAPI endpoints can now be stood up and deployed with any huggingface sequence classification or question answering model specified as an env var arg.

    Dependencies

    Transformers pinned for stable updates

  • v0.1.2(Feb 19, 2020)

    AdaptNLP's first published release on github.

    Easy API:

    • EasyTokenTagger
    • EasySequenceClassifier
    • EasyWordEmbeddings
    • EasyStackedEmbeddings
    • EasyDocumentEmbeddings

    Training and Fine-tuning Interface

    • SequenceClassifierTrainer
    • LMFineTuner

    FastAPI AdaptNLP App for Streamlined Rapid NLP-Model Deployment

    • adaptnlp/rest
    • configured to run any pretrained and custom trained flair/adaptnlp models
    • compatible with nvidia-docker for GPU use
    • AdaptNLP integration but loosely coupled

    Documentation

    • Documentation release with walk-through guides, tutorials, and Class API docs of the above
    • Built with mkdocs, material for mkdocs, and mkautodoc

    Tutorials

    • IPython/Colab Notebooks provided and updated to showcase AdaptNLP Modules

    Continuous Integration

    • CircleCI build and tests running successfully and minimally
    • Github workflow for pypi publishing added

    Formatting

    • Flake8 and Black adherence

N-Grammer - Pytorch Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch Install $ pip install n-grammer-pytorch Usage

Phil Wang 66 Dec 29, 2022
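An entertainment QQ bot based on Nonebot2 and go-cqhttp

Takker - An ordinary QQ bot. This project is a QQ group entertainment bot built on Nonebot2 and go-cqhttp, with Sqlite as its database. About: developed purely out of interest, with some features borrowing from code by more experienced developers; it serves as an entertainment and utility bot for QQ groups. Disclaimer: this project is for learning and exchange only; do not use it for illegal purposes. This is the developer's first Pytho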

风屿 79 Dec 29, 2022
An A-SOUL Text Generator Based on CPM-Distill.

ASOUL-Generator-Backend This project is the backend of https://asoul.infedg.xyz/. The model was trained from CPM-Generate-distill, a transformers-converted version of CPM-Distill.

infinityedge 46 Dec 11, 2022
Synthetic data for the people.

zpy: Synthetic data in Blender. Website • Install • Docs • Examples • CLI • Contribute • Licence Abstract Collecting, labeling, and cleaning data for

Zumo Labs 253 Dec 21, 2022
🗣️ NALP is a library that covers Natural Adversarial Language Processing.

NALP: Natural Adversarial Language Processing Welcome to NALP. Have you ever wanted to create natural text from raw sources? If yes, NALP is for you!

Gustavo Rosa 21 Aug 12, 2022
Fine-tune GPT-3 with a Google Chat conversation history

Google Chat GPT-3 This repo will help you fine-tune GPT-3 with a Google Chat conversation history. The trained model will be able to converse as one o

Nate Baer 7 Dec 10, 2022
🦅 Pretrained BigBird Model for Korean (up to 4096 tokens)

Pretrained BigBird Model for Korean What is BigBird • How to Use • Pretraining • Evaluation Result • Docs • Citation 한국어 | English What is BigBird? Bi

Jangwon Park 183 Dec 14, 2022
SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognit

SpeechBrain 5.1k Jan 09, 2023
Text vectorization tool to outperform TFIDF for classification tasks

WHAT: Supervised text vectorization tool Textvec is a text vectorization tool, with the aim to implement all the "classic" text vectorization NLP meth

186 Dec 29, 2022