Natural Language Processing library built with AllenNLP 🌲🌱

Overview



Features

  • State-of-the-art and not so state-of-the-art models trained with your own data with simple workflows.

  • Efficient data reading for (large) datasets in multiple formats and sources (CSV, Parquet, JSON, etc.).

  • Modular configuration and extensibility of models, datasets and training runs programmatically or via config files.

  • Use via cli or as plain Python (e.g., inside a Jupyter Notebook)

  • Compatible with AllenNLP

Installation

For the installation we recommend setting up a fresh conda environment:

conda create -n biome "python~=3.7.0" "pip>=20.3.0"
conda activate biome

Once the conda environment is activated, you can install the latest release via pip:

pip install -U biome-text

After installing biome.text, the best way to test your installation is by running the biome.text cli command:

biome --help
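
The same check can be done from Python. The following is a minimal sketch, assuming the package is importable as `biome.text`; the helper `is_importable` is a hypothetical name used only for illustration and is not part of biome.text:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current Python path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    print("biome.text importable:", is_importable("biome.text"))
```

If this reports that the module is not importable, the conda environment is probably not activated or the pip installation did not finish.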

Get started

The best way to see how biome.text works is to go through our first tutorial.

Please refer to our documentation for more tutorials, detailed user guides and how you can contribute to biome.text.

Licensing

The code in this project is licensed under the Apache License 2.0.

Comments
  • feat: package redefinition

    Introduction

    This PR creates a new pipeline design based on internal discussions.

    This PR contains a lot of changes, so tracking a review will be hard. For this reason, every commit comes from a separate pull request where you can check the partial changes (although it is too late to review them).

    Keep in mind

    • The old pipeline implementations remain fully operational
    • The command line still works using the old implementation/configuration
    • You can use the new features by using biome.text as a Python library

    An example of use

    See examples folder

    opened by frascuchon 13
  • Adding a new HPO component, making it compatible with Datasets

    This PR introduces a new HPO class (RayTuneTrainable) that is compatible with Datasets and is intended to replace the HpoParams and HpoExperiment classes.

    Personally i find it confusing that you can define the same parameters in the tune.Experiment and in tune.run, but the latter will be ignored if they were specified in the former. So the new implementation does not make use of the tune.Experiment and relies more on the parameters provided directly to tune.run. Also working with @ignacioct we noticed that it is more intuitive and faster to just copy the configs of your pipeline or trainer, and replace the parameters with the search spaces, than to write a new dict with only the search spaces. So the merging capabilities of the HpoParams are not really needed.
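
The two styles compared above can be sketched in plain Python. This is only an illustration: the config keys and the `SearchSpace` placeholder are hypothetical and stand in for real pipeline/trainer configs and Ray Tune search-space objects, and `deep_merge` is not the actual HpoParams implementation.

```python
import copy
from typing import Any, Dict, NamedTuple


class SearchSpace(NamedTuple):
    """Hypothetical stand-in for a Ray Tune search space (e.g. a log-uniform range)."""

    kind: str
    low: float
    high: float


# Hypothetical trainer config with fixed values.
trainer_config: Dict[str, Any] = {
    "optimizer": {"type": "adam", "lr": 0.001},
    "batch_size": 16,
}

# Style 1 (the approach favored in this PR): copy the config and replace
# the parameters you want to search over directly.
hpo_config = copy.deepcopy(trainer_config)
hpo_config["optimizer"]["lr"] = SearchSpace("loguniform", 1e-4, 1e-2)


# Style 2 (roughly what a merging helper has to do): write a dict holding
# only the search spaces and merge it into the base config.
def deep_merge(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


only_search_spaces = {"optimizer": {"lr": SearchSpace("loguniform", 1e-4, 1e-2)}}
merged_config = deep_merge(trainer_config, only_search_spaces)
```

Both styles end up with the same configuration; the first one simply avoids the extra merging step.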

    A minimal usage example of the new class would be:

    my_trainable = RayTuneTrainable(pipeline_config, trainer_config, train_dataset, valid_dataset)
    tune.run(my_trainable.func, config=my_trainable.config)
    

    For a more detailed usage you can have a look at the updated tutorial. I think it's a bit more elegant than the three step flow we have right now:

    HpoParams -> HpoExperiment -> ray.tune(HpoExperiment.as_tune_experiment())
    

    @frascuchon If you are OK with this proposal i would go ahead and remove the old HPO components in a follow-up PR.

    opened by dcfidalgo 9
  • [Draft] Add a `Pipeline.evaluate` method

    This PR adds an evaluate method to our Pipeline class addressing issue #406 . I left some todos, since i want to have a quick discussion first, before considering this for a merge.

    Right now the Pipeline class has the predict/explain/evaluate methods that only really make sense for the _PreTrainedPipeline class. To make a meaningful prediction you have the following flow:

    • pl = Pipeline.from_config() -> pl.train() -> pl = Pipeline.from_pretrained() -> pl.predict /pl.evaluate

    The reason is that we do not modify the weights "in-place", but create a copy of the pipeline when we train. The advantages of an "in-place" modification would be:

    • allows for a straightforward flow: pl.from_config -> pl.train -> pl.predict/pl.evaluate (right now, this flow will "work", but with an unexpected result);
    • smaller memory footprint, no need to keep two models in memory when training;

    Disadvantages would be (maybe i am missing some!):

    • consecutive trainings from scratch would need an intermediate reset step: pl.from_config() -> pl.train -> pl.reset -> pl.train

    In summary i vote for either:

    • implementing some sort of reset method that allows to reset a model to its initial state after a training, and modify the weights in-place,
    • or moving the inference methods to the _PreTrainedPipeline class, although this will break the possibility of calling the predict method to test a pipeline configuration.
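
A toy sketch of the first option, in-place training plus a reset method, is shown below. `ToyPipeline`, its `weights` list and the fake `train` update are hypothetical and only illustrate the flow; they are not the biome.text Pipeline API.

```python
import copy
from typing import List


class ToyPipeline:
    """Hypothetical pipeline that trains in place and can be reset."""

    def __init__(self, weights: List[float]):
        self.weights = list(weights)
        # Keep a snapshot of the initial state so a later reset is possible.
        self._initial_weights = copy.deepcopy(self.weights)

    def train(self, steps: int = 1) -> None:
        # Fake "training": update this object's own weights, no second model.
        for _ in range(steps):
            self.weights = [w + 1.0 for w in self.weights]

    def predict(self) -> float:
        # Uses whatever the current (possibly trained) weights are.
        return sum(self.weights)

    def reset(self) -> None:
        # Restore the state the pipeline had right after construction.
        self.weights = copy.deepcopy(self._initial_weights)


pl = ToyPipeline([0.0, 0.0])
pl.train(steps=2)       # from_config -> train
trained = pl.predict()  # predict sees the trained weights
pl.reset()              # needed before training from scratch again
```

Note that keeping a snapshot of the initial weights costs memory; a real implementation might instead re-initialize the model from its configuration.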

    @frascuchon what do you think?

    opened by dcfidalgo 6
  • Ray Tune tutorial

    This PR adds a HPO tutorial in which we use Ray Tune to perform a hyperparameter search. With this link you can have a look at it in Google Colab, i think it is the best way to review this.

    There is still a section missing (Checking results), but maybe we can have a quick call tomorrow to have a look at this together.

    opened by dcfidalgo 6
  • quick pass over API doc strings

    Another quick pass over the API doc strings. The main additions are the TrainerConfiguration, WordFeatures and CharFeatures doc strings. I am not sure if the arguments cache_instances and in_memory_batches are used at all at the moment, have to check this.

    I also propose a slight change in the format: i would avoid specifying the type in the doc string, so this:

    def get_example(argument: int) -> str:
        """Gets an example name.

        Parameters
        ----------
        argument : int
            An argument

        Returns
        -------
        example : str
            Name of the example
        """

    becomes this:

    def get_example(argument: int) -> str:
        """Gets an example name.

        Parameters
        ----------
        argument
            An argument

        Returns
        -------
        example
            Name of the example
        """

    Since we consistently use type annotations, the information is already in the signature of the method. Also, the rendered HTML files look prettier without all the colored boxes, in my opinion.
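
To illustrate that the types remain available when they are dropped from the doc string, here is a small sketch using the standard library's `typing.get_type_hints`. The function body (`return f"example-{argument}"`) is made up for the example.

```python
from typing import get_type_hints


def get_example(argument: int) -> str:
    """Gets an example name.

    Parameters
    ----------
    argument
        An argument

    Returns
    -------
    example
        Name of the example
    """
    # Hypothetical body, only here to make the function runnable.
    return f"example-{argument}"


# The annotations in the signature carry the type information.
hints = get_type_hints(get_example)
print(hints)
```

Documentation tools can read the same annotations, so repeating the types in the doc string adds no information.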

    @dvsrepo @frascuchon what do you think? I would be willing to change the format for all present doc strings in a follow-up PR.

    opened by dcfidalgo 6
  • Move to_yaml/from_yaml logic to PipelineConfiguration

    Just a small refactoring: this PR moves the Pipeline.to_yaml() method and from_yaml() logic to the PipelineConfiguration.

    I think it is more explicit to write my_pipeline.config.to_yaml() than my_pipeline.to_yaml(), since you really just serialize the configuration, and not the whole pipeline with its model/weights. @dvsrepo would that be ok for you?

    opened by dcfidalgo 6
  • integration test for the text classification

    This PR adds an integration test using the TextClassification head.

    On my machine it takes <1 min and the numbers are reproducible. It covers only a small part of the functionality, but with this test we would have caught the embedding bug, for example. The idea is that over time we extend the test to cover more functionality, and maybe it can serve as a blueprint for other integration tests.

    opened by dcfidalgo 6
  • Feat/precommit hook

    Here is a little idea I discussed with @dcfidalgo; we found it interesting for optimizing the way we use code formatters and make commits. It is based on pre-commit, a Python package that allows small scripts to be run before each commit. It can be configured in the .yaml file attached to this PR.

    To test it, I added hooks for:

    • Three predefined hooks of the pre-commit package: checking the config file integrity, EOF fixer and trailing whitespace fixer.
    • Black hook, as if we typed black ourscript.py in the terminal.
    • Reorder python imports, another pip package capable of reordering the imports of all scripts in a logical way.

    I work in VSCode and push using its built-in GUI. In my experience, once I commit a script, I get a warning that not all changes have been committed, and I can recommit the file with all those changes applied. I expect this works in a similar way via the terminal or PyCharm.

    In order to introduce this functionality to our workflow, we could make the dev version of biome require the pre-commit package (and the import reordering package, if we want), or anyone interested could include it in their personal repositories and add it to .git/info/exclude (a .gitignore that does not get uploaded to the repository).

    Tell me your thoughts and opinions 😃

    opened by ignacioct 5
  • Feat/add slot filling tutorial

    This PR adds the slot filling tutorial. You can find the tutorial here

    Apart from the tutorial there are several fixes:

    • bug fix in the tokenclassifier
    • i set flatten by default to False when reading json (cc @frascuchon )
    • moved the basicConfig for the logging module to the init of the package (cc @frascuchon ). this one was driving me crazy to figure out how logging works when no handler is specified ...

    I still have to improve the test, so it does not take ages, but tests a fair amount of functionality.

    opened by dcfidalgo 5
  • [BUG] Check empty instances on _model.predict()

    Describe the bug

    Predicting on examples which generate empty instances should raise an error.

    This type of instance makes explore and predict fail (at least when using char features).
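
One way to fail early is sketched below. This is a minimal illustration under stated assumptions: `EmptyInstanceError` and `tokens_or_raise` are hypothetical names, and whitespace splitting stands in for the real tokenizer; it is not the fix that was applied in biome.text.

```python
from typing import List


class EmptyInstanceError(ValueError):
    """Hypothetical error for inputs that produce no tokens."""


def tokens_or_raise(text: str) -> List[str]:
    """Tokenize on whitespace and fail early if nothing is left to embed."""
    tokens = text.split()
    if not tokens:
        raise EmptyInstanceError(
            f"Input {text!r} produced an empty instance; nothing to predict on."
        )
    return tokens
```

Raising at this point gives the user a readable message instead of the low-level tensor shape error shown in the traceback below.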

    To Reproduce

    pipeline = Pipeline.from_pretrained('runs/v1.text.classifier/model.tar.gz')
    pipeline.predict('')
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-9-002150b8da6a> in <module>
    ----> 1 pipeline.predict('')
    
    ~/recognai/biome/text/src/biome/text/helpers.py in wrapper(*args, **kwargs)
         59 
         60     def wrapper(*args, **kwargs):
    ---> 61         return to_method(*args, **kwargs)
         62 
         63     wrapper.__signature__ = signature
    
    ~/recognai/biome/text/src/biome/text/pipeline.py in predict(self, *args, **kwargs)
        284             A dictionary containing the predictions and additional information
        285         """
    --> 286         return self._model.predict(*args, **kwargs)
        287 
        288     def explain(self, *args, **kwargs) -> Dict[str, Any]:
    
    ~/recognai/biome/text/src/biome/text/_model.py in predict(self, *args, **kwargs)
        277         inputs = self._model_inputs_from_args(*args, **kwargs)
        278         instance = self.text_to_instance(**inputs)
    --> 279         prediction = self.forward_on_instance(instance)
        280         self.log_prediction(inputs, prediction)
        281 
    
    /anaconda3/lib/python3.7/site-packages/allennlp/models/model.py in forward_on_instance(self, instance)
        144         `torch.Tensors` into numpy arrays and remove the batch dimension.
        145         """
    --> 146         return self.forward_on_instances([instance])[0]
        147 
        148     def forward_on_instances(self, instances: List[Instance]) -> List[Dict[str, numpy.ndarray]]:
    
    /anaconda3/lib/python3.7/site-packages/allennlp/models/model.py in forward_on_instances(self, instances)
        170             dataset.index_instances(self.vocab)
        171             model_input = util.move_to_device(dataset.as_tensor_dict(), cuda_device)
    --> 172             outputs = self.make_output_human_readable(self(**model_input))
        173 
        174             instance_separated_output: List[Dict[str, numpy.ndarray]] = [
    
    /anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        548             result = self._slow_forward(*input, **kwargs)
        549         else:
    --> 550             result = self.forward(*input, **kwargs)
        551         for hook in self._forward_hooks.values():
        552             hook_result = hook(self, input, result)
    
    ~/recognai/biome/text/src/biome/text/_model.py in forward(self, *args, **kwargs)
        134     def forward(self, *args, **kwargs) -> Dict[str, torch.Tensor]:
        135         """The main forward method. Wraps the head forward method and converts the head output into a dictionary"""
    --> 136         head_output: TaskOutput = self._head.forward(*args, **kwargs)
        137         # we don't want to break AllenNLP API: TaskOutput -> as_dict()
        138         return head_output.as_dict()
    
    ~/recognai/biome/text/src/biome/text/modules/heads/classification/text_classification.py in forward(self, text, label)
         66 
         67         mask = get_text_field_mask(text)
    ---> 68         embedded_text = self.backbone.forward(text, mask)
         69         embedded_text = self.pooler(embedded_text, mask=mask)
         70 
    
    ~/recognai/biome/text/src/biome/text/backbone.py in forward(self, text, mask, num_wrapping_dims)
         51     ) -> torch.Tensor:
         52         """Applies embedding + encoder layers"""
    ---> 53         embeddings = self.embedder(text, num_wrapping_dims=num_wrapping_dims)
         54         return self.encoder(embeddings, mask=mask)
         55 
    
    /anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        548             result = self._slow_forward(*input, **kwargs)
        549         else:
    --> 550             result = self.forward(*input, **kwargs)
        551         for hook in self._forward_hooks.values():
        552             hook_result = hook(self, input, result)
    
    /anaconda3/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py in forward(self, text_field_input, num_wrapping_dims, **kwargs)
         82                 # If there's only one tensor argument to the embedder, and we just have one tensor to
         83                 # embed, we can just pass in that tensor, without requiring a name match.
    ---> 84                 token_vectors = embedder(list(tensors.values())[0], **forward_params_values)
         85             else:
         86                 # If there are multiple tensor arguments, we have to require matching names from the
    
    /anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        548             result = self._slow_forward(*input, **kwargs)
        549         else:
    --> 550             result = self.forward(*input, **kwargs)
        551         for hook in self._forward_hooks.values():
        552             hook_result = hook(self, input, result)
    
    /anaconda3/lib/python3.7/site-packages/allennlp/modules/token_embedders/token_characters_encoder.py in forward(self, token_characters)
         35     def forward(self, token_characters: torch.Tensor) -> torch.Tensor:
         36         mask = (token_characters != 0).long()
    ---> 37         return self._dropout(self._encoder(self._embedding(token_characters), mask))
    
    /anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
        548             result = self._slow_forward(*input, **kwargs)
        549         else:
    --> 550             result = self.forward(*input, **kwargs)
        551         for hook in self._forward_hooks.values():
        552             hook_result = hook(self, input, result)
    
    /anaconda3/lib/python3.7/site-packages/allennlp/modules/time_distributed.py in forward(self, pass_through, *inputs, **kwargs)
         33         pass_through = pass_through or []
         34 
    ---> 35         reshaped_inputs = [self._reshape_tensor(input_tensor) for input_tensor in inputs]
         36 
         37         # Need some input to then get the batch_size and time_steps.
    
    /anaconda3/lib/python3.7/site-packages/allennlp/modules/time_distributed.py in <listcomp>(.0)
         33         pass_through = pass_through or []
         34 
    ---> 35         reshaped_inputs = [self._reshape_tensor(input_tensor) for input_tensor in inputs]
         36 
         37         # Need some input to then get the batch_size and time_steps.
    
    /anaconda3/lib/python3.7/site-packages/allennlp/modules/time_distributed.py in _reshape_tensor(input_tensor)
         66         input_size = input_tensor.size()
         67         if len(input_size) <= 2:
    ---> 68             raise RuntimeError(f"No dimension to distribute: {input_size}")
         69         # Squash batch_size and time_steps into a single axis; result has shape
         70         # (batch_size * time_steps, **input_size).
    
    RuntimeError: No dimension to distribute: torch.Size([1, 0])
    
    

    OS environment

    • OS: macOS
    • biome.text Version 1.0.0rc

    bug 
    opened by dvsrepo 5
  • feat(record-pair): allow compare (explain) record with missing keys

    This PR makes a minimal change to the record featurization of RecordPairClassification: both records are evened out before the features are generated.

    This change makes explain available for records with missing keys.

    opened by frascuchon 5
  • Invalid metric warning in HPO

    Is your feature request related to a problem? Please describe.

    When specifying a metric in an HPO run with Ray Tune, we could try to check whether that metric exists and can be used. Maybe this is out of our hands, as the metric is passed to tune.run(), but biome is running in the background, so maybe the metric can be checked.

    Describe the solution you'd like

    Imagine:

    analysis_frozen = tune.run(
        tune_exp,
        scheduler=tune.schedulers.ASHAScheduler(),
        metric="totally_fake_metric",
        mode="max",
        progress_reporter=tune.JupyterNotebookReporter(overwrite=True),
    )

    We could print a warning/error.
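
A minimal sketch of such a check, in plain Python. The helper name `warn_if_unknown_metric` and the metric name `validation_loss` are hypothetical; how the list of available metrics would be obtained from the pipeline is left open.

```python
import warnings
from typing import Iterable


def warn_if_unknown_metric(metric: str, available_metrics: Iterable[str]) -> bool:
    """Warn and return False if `metric` is not among `available_metrics`."""
    available = sorted(set(available_metrics))
    if metric not in available:
        warnings.warn(
            f"Metric {metric!r} is not reported by the pipeline. "
            f"Available metrics: {available}",
            UserWarning,
        )
        return False
    return True


if __name__ == "__main__":
    # Hypothetical list of metrics reported during training.
    warn_if_unknown_metric("totally_fake_metric", ["validation_loss"])
```

Such a check would have to run before the call to tune.run(), since the metric argument is consumed by Ray Tune itself.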

    enhancement 
    opened by ignacioct 1
  • Search algorithm requires config argument in `tune.run`

    When we want to use a search algorithm in an HPO run, we need to provide a config to the tune.run method. This should be properly documented in our docs:

    analysis = tune.run(
        hpo_experiment,
        config=hpo_experiment.config,
        scheduler=tune.schedulers.ASHAScheduler(),
        search_alg=search_alg,
        metric="validation_valid_ner/f1-measure-overall",
        mode="max",
    )
    opened by dcfidalgo 0
  • Improve error message when the "label" column is missing in the dataset

    At the moment, if the dataset is missing the "label" column (or whatever column is necessary to train the model) and you want to train the model with it, the error message is:

    RuntimeError: The model you are trying to optimize does not contain a 'loss' key in the output of model.forward(inputs).
    

    I think we should catch this failure earlier and print out a more precise error message.

    One idea would be to have a bool argument for_training in the Dataset.to_instance method. Depending on this argument, it checks for the necessary columns. Edit: This idea is actually bull**** since we only call to_instance when we want to create the vocab or train the pipeline ...
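
Catching the failure earlier could look roughly like the sketch below. The helper `check_training_columns` and the way the required column names are passed in are assumptions made for illustration, not part of the biome.text API.

```python
from typing import Iterable, List


def check_training_columns(column_names: Iterable[str], required: Iterable[str]) -> None:
    """Raise a descriptive error if a column needed for training is missing."""
    present = set(column_names)
    missing: List[str] = [name for name in required if name not in present]
    if missing:
        raise ValueError(
            f"The dataset is missing the column(s) {missing} required for training. "
            f"Found columns: {sorted(present)}"
        )


# Passes silently: both required columns are present.
check_training_columns(["text", "label"], required=["text", "label"])
```

Calling a check like this before training starts would replace the generic 'loss' key error with a message that names the missing column.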

    opened by dcfidalgo 0
  • Investigate possibility of making entities a feature

    Is your feature request related to a problem? Please describe.

    We introduced a RelationClassifier in #370, but the implementation is not optimal. As discussed in that PR, we may want to treat the entities as a feature and not as a direct input to the forward method.

    Describe the solution you'd like

    Treat entities as a feature (like word, char or transformers).

    Describe alternatives you've considered

    Leave as is, if there are major obstacles.

    Additional context

    It would be nice to have a general solution to eventually add other token features, like POS tags, for example.

    enhancement 
    opened by dcfidalgo 0
  • [head] Relation extraction + NER multitask head

    Is your feature request related to a problem? Please describe.

    In order to better support information extraction use cases, joint models performing relation extraction + NER typically perform better and simplify extraction problems.

    Describe the solution you'd like

    The solution will be to create a joint task head performing NER -> Relation Extraction (Classification). This can be done by combining our current TokenClassification and RelationClassification heads.

    I include a working implementation draft (https://gist.github.com/dvsrepo/a33bcd1c4e7074fbf15aefdccca5b46f) with several caveats:

    • We need to extend our current vocabulary handling so that heads can have custom label namespaces (right now it is fixed in vocabulary.LABEL_NAMESPACE). When you start combining heads with different label domains (e.g., labels for a classifier and tags for a token classifier), they will basically overwrite each other, leading to indexing issues. Ideally, the label namespace could be set in the head (although I would not recommend requesting this from the user in the init or configuration).

    • Loss could be calculated with different coefficients, e.g. loss_classiffier + 0.5*loss_ner. This is a hyperparam which could be optimized with HPO so it should go to the head config.

    • We need to think about the TaskOutput and metrics report (see the implementation for a rough idea).

    • This is the first implementation of a multitask head so we should set the basis for other multitask models (e.g., classification + lm loss term)

    • Backbone forward pass is done twice (or N times if we had N heads).

    • There are some issues with the default_mapping functionality when we have several optional params (entities and labels in our case); see the data creation in the gist.
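
The weighted loss mentioned in the caveats above can be written as a one-line function. This is a sketch with plain floats; the function name `combined_loss` and the parameter `ner_weight` are hypothetical, and in a real head the losses would be tensors rather than floats.

```python
def combined_loss(loss_classifier: float, loss_ner: float, ner_weight: float = 0.5) -> float:
    """Weighted sum of the two task losses; `ner_weight` is a tunable hyperparameter."""
    return loss_classifier + ner_weight * loss_ner


if __name__ == "__main__":
    print(combined_loss(1.0, 2.0))
```

Exposing the coefficient as a parameter of the head configuration is what would make it searchable in an HPO run.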

    enhancement 
    opened by dvsrepo 0
Releases(v3.3.0)
  • v3.3.0(Sep 8, 2021)

    Added:

    • Create parent dirs with Trainer.fit(output_dir) https://github.com/recognai/biome-text/commit/b6de84a43989c55c8b8da70cfabd172af4694ce4
    • Add a vocab_config argument in the TuneExperiment class https://github.com/recognai/biome-text/commit/16ef230b4f9631f4a2dbd57f04ed2e79a8265f36

    Removed:

    Changed:

    • Bumped up versions for a lot of dependencies (including AllenNLP to 2.7.0) and broadened the versions of the spacy, ray tune, datasets and mlflow dependencies.
  • v3.2.1(Jun 28, 2021)

    Added:

    Removed:

    • vocab parameter in Pipeline.from_config as well as for the TuneExperiment

    Changed:

    • If inference encounters an unexpected error, return an empty prediction instead of a None
  • v3.2.0(Jun 24, 2021)

    Added:

    • Added sentence splitting feature to our TransformersTokenizer

    Removed:

    • Removed PredictionError, instead simply return None

    Changed:

    • fix new Dataset methods
  • v3.1.0(Jun 20, 2021)

    Added:

    • Added dropout to our TextClassification and DocumentClassification head
    • Added max/min_sentence_length and truncate_sentence parameters in the TokenizerConfiguration

    Removed:

    Changed:

    • changed documentation url in docs/readme
    • max_sequence_length -> truncate_input (TokenizerConfiguration)
    • bump up allennlp version to 2.5
    • bump up datasets version to 1.8
  • v3.0.0(Jun 7, 2021)

    This release can break backward compatibility with some older models!

    Added:

    Removed:

    • completely removed allennlp trainer stuff

    Changed:

    • fix logit nans in TokenClassification
    • bump up spacy version to 3
    • bump up allennlp version to 2
    • Pipeline.evaluate method now uses PyTorch Lightning
  • v2.2.0(May 7, 2021)

    Added:

    • New Lightning trainer (#543), deprecate AllenNLPTrainer

    Removed:

    • UI, explore (#557), will be handled by Rubrix.

    Changed:

    • improve to_mlflow (#534)
    • divide training/validation metrics to allow for inter-epoch validation runs (#531)
    • apply max_sequence_length also to the TransformersTokenizer (#554)
    • activate warnings (#547)
    • Improved multi label metrics for the classification heads
  • v2.1.0(Feb 18, 2021)

    • Introduction of the TaskPrediction class that defines the output of a given task
    • improved Pipeline.predict method with add_tokens and add_attributions parameters
    • Ability to easily export your pipeline as an MLFlow model via Pipeline.to_mlflow
    • Improvement of the biome serve cli command and removal of the Pipeline.serve method
    • minor changes + bug fixes
  • v2.0.0(Dec 29, 2020)

    • Replaced DataSource with Dataset
    • Vocab creation is now automatically done when executing Pipeline.train()
    • Introduced TuneExperiment class
    • Added the transformers feature
    • Move Pipeline.explore() command to its own module
    • Pipeline.train() modifies the pipeline inplace instead of creating a copy for the training
    • TokenClassification accepts entities
    • Added a RelationClassification head
    • A LOT of minor and not so minor changes ...
  • 0.3.0(Apr 27, 2020)

  • 0.3.0.rc1(Apr 27, 2020)

  • v0.2.1(Dec 5, 2019)

Owner
Recognai
A software company building Natural Language Processing and Machine Learning tools