Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

Overview

keytotext

pypi Version Downloads Open In Colab Streamlit App API Call Docker Call HuggingFace Documentation Status Code style: black CodeFactor

keytotext

Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

Potential use case can include:

  • Marketing
  • Search Engine Optimization
  • Topic generation etc.
  • Fine tuning of topic modeling models

Model:

Keytotext is based on the Amazing T5 Model: HuggingFace

  • k2t: Model
  • k2t-base: Model
  • mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training Notebooks can be found in the Training Notebooks Folder

Note: To add your own model to keytotext Please read Models Documentation

Usage:

Example usage: Open In Colab

Example Notebooks can be found in the Notebooks Folder

pip install keytotext

carbon (3)

Trainer:

Keytotext now has a trainer class than be used to train and finetune any T5 based model on new data. Updated Trainer docs here: Docs

Trainer example here: Open In Colab

from keytotext import trainer

carbon (6)

UI:

UI: Streamlit App

pip install streamlit-tags

This uses a custom streamlit component built by me: GitHub

image

API:

API: API Call Docker Call

The API is hosted in the Docker container and it can be run quickly. Follow instructions below to get started

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This will start the api at port 8000 visit the url below to get the results as below:

http://localhost:8000/api?data=["India","Capital","New Delhi"]

k2t_json

Note: The Hosted API is only available on demand

BibTex:

To quote keytotext please use this citation

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

References

Articles about keytotext:

Comments
  • ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)

    ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)

    Hi,

    I tried to install keytotext via pip install keytotext --upgrade in local machine.

    but came across the following :

    ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
    ERROR: No matching distribution found for keytotext
    

    My pip version is the latest. However, the above works just fine in colab. Please guide me through the fix?

    opened by abhijithneilabraham 6
  • Add finetuning model to keytotext

    Add finetuning model to keytotext

    Is your feature request related to a problem? Please describe. Its difficult to use it without fine-tuning on new corpus so we need to build script to finetune it on new corpus

    enhancement good first issue 
    opened by gagan3012 2
  • "Oh no." ?

    "Error running app. If this keeps happening, please file an issue."

    Ok,...sure? I know nothing about this app.

    Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

    Chrome browser, Linux.

    opened by drscotthawley 2
  • Add Citations

    Add Citations

    Is your feature request related to a problem? Please describe. Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45

    Describe the solution you'd like A clear and concise description of what you want to happen.

    Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

    Additional context Add any other context or screenshots about the feature request here.

    opened by gagan3012 1
  • Adding new models to keytotext

    Adding new models to keytotext

    Is your feature request related to a problem? Please describe. Adding new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

    Describe the solution you'd like A clear and concise description of what you want to happen.

    Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

    Additional context Add any other context or screenshots about the feature request here.

    enhancement good first issue 
    opened by gagan3012 1
  • Inference API for Keytotext

    Inference API for Keytotext

    Is your feature request related to a problem? Please describe. It is difficult to host the UI on streamlit without API

    Describe the solution you'd like Inference API

    enhancement good first issue 
    opened by gagan3012 1
  • Create Better UI

    Create Better UI

    Is your feature request related to a problem? Please describe. The current UI is not functional It needs to be fixed

    Describe the solution you'd like Better UI with a nicer design

    enhancement 
    opened by gagan3012 1
  • Add `st.cache` to load model

    Add `st.cache` to load model

    Hi @gagan3012,

    Johannes from the Streamlit team here :) I am currently investigating why apps run over the resource limits of Streamlit Sharing and saw that your app was affected in the past few days.

    Thought I'd send you a small PR which should fix this. You've already been on a good way with using st.cache but it gets even better if you use it once more to load the model. This makes sure the model and tokenizer are only loaded once, which should make the app consume less memory (and not run into resource limits again! Plus, I've seen that it also works a bit faster now ;).

    Hope this works for you and let me know if you have any other questions! 🎈

    Cheers, Johannes

    opened by jrieke 1
  • ValueError: transformers.models.auto.__spec__ is None

    ValueError: transformers.models.auto.__spec__ is None

    'from keytotext import pipeline'

    While running the above line, it is showing this error . "ValueError: transformers.models.auto.spec is None"

    opened by varunakk 0
  • Update README.md

    Update README.md

    Description

    Motivation and Context

    How Has This Been Tested?

    Screenshots (if appropriate):

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [ ] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have read the CONTRIBUTING document.
    opened by gagan3012 0
  • Update trainer.py

    Update trainer.py

    Description

    Motivation and Context

    How Has This Been Tested?

    Screenshots (if appropriate):

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [ ] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have read the CONTRIBUTING document.
    opened by gagan3012 0
  • Pipeline error on fresh install

    Pipeline error on fresh install

    Hi I'm getting this on a first run and fresh install

    Global seed set to 42 Traceback (most recent call last): File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module> from keytotext import pipeline File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module> from .dataset import make_dataset File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module> from cv2 import randShuffle ModuleNotFoundError: No module named 'cv2'

    opened by skintflickz 0
  • New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

    New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

    I have imported the model and necessary libraries. I am getting the below error in google colab. I have used this model earlier also few months back and it was working fine. This is the new issue I am facing recently with the same code.


    TypeError: init() got an unexpected keyword argument 'progress_bar_refresh_rate'

    Imported libraries:

    !pip install keytotext --upgrade !sudo apt-get install git-lfs

    from keytotext import trainer

    Training Model:

    model = trainer() model.from_pretrained(model_name="t5-small") model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5,use_gpu=True) model.save_model()

    Have attached error screenshot

    • OS: Windows
    • Browser Chrome Error
    opened by aishwaryapisal9 2
  • Update trainer.py

    Update trainer.py

    Delete progress_bar_refresh_rate in trainer.py

    Description

    delete progress_bar_refresh_rate=5, since this keyword argument is no longer supported by the latest version (1.7.0) of PyTorch.Lightning.Trainer module

    Motivation and Context

    having this argument fails the training process

    How Has This Been Tested?

    Ran key to text on the custom dataset before and after August 2nd, 2022. Changes in the new version of Pytorch Lightning's Trainer were put into effect on that date where the above argument was removed and hence, the custom training failed since that day.

    Screenshots (if appropriate):

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have read the CONTRIBUTING document.
    opened by anath2110benten 0
  • Why is cv2 required?

    Why is cv2 required?

    https://github.com/gagan3012/keytotext/blob/6f807b940f5e2fdeb755ed085b40af7c0fa5e87e/keytotext/dataset.py#L1

    I'm using this framework to generate text from knowlege graph. Python interpreter keeps throwing "cv2 not installed" exception. Looks like the pip package doesn't contains cv2 as dependancy. I tried to delete this line in source code, the model works well. Is this line necessary for this project? Concerning about adding opencv to pip package? Thanks for your concern.

    opened by ChunxuYang 0
  • Hi, I notice that given the same input keywords, across different runs, the generated text are the same, even setting different seeds by 'pl.seed_everything(..)'.

    Hi, I notice that given the same input keywords, across different runs, the generated text are the same, even setting different seeds by 'pl.seed_everything(..)'.

    Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

    Describe the solution you'd like A clear and concise description of what you want to happen.

    Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

    Additional context Add any other context or screenshots about the feature request here.

    opened by RuiFeiHe 6
Releases(v1.5.0)
Owner
Gagan Bhatia
Software Developer | Machine Learning Enthusiast
Gagan Bhatia
translate using your voice

speech-to-text-translator Usage translate using your voice description this project makes translating a word easy, all you have to do is speak and...

1 Oct 18, 2021
☀️ Measuring the accuracy of BBC weather forecasts in Honolulu, USA

Accuracy of BBC Weather forecasts for Honolulu This repository records the forecasts made by BBC Weather for the city of Honolulu, USA. Essentially, t

Max Halford 12 Oct 15, 2022
A repo for materials relating to the tutorial of CS-332 NLP

CS-332-NLP A repo for materials relating to the tutorial of CS-332 NLP Contents Tutorial 1: Introduction Corpus Regular expression Tokenization Tutori

Alok singh 9 Feb 15, 2022
Words-per-minute - A terminal app written in python utilizing the curses module that tests the user's ability to type

words-per-minute A terminal app written in python utilizing the curses module th

Tanim Islam 1 Jan 14, 2022
Official Stanford NLP Python Library for Many Human Languages

Official Stanford NLP Python Library for Many Human Languages

Stanford NLP 6.4k Jan 02, 2023
Ceaser-Cipher - The Caesar Cipher technique is one of the earliest and simplest method of encryption technique

Ceaser-Cipher The Caesar Cipher technique is one of the earliest and simplest me

Lateefah Ajadi 2 May 12, 2022
Library for fast text representation and classification.

fastText fastText is a library for efficient learning of word representations and sentence classification. Table of contents Resources Models Suppleme

Facebook Research 24.1k Jan 05, 2023
chaii - hindi & tamil question answering

chaii - hindi & tamil question answering This is the solution for rank 5th in Kaggle competition: chaii - Hindi and Tamil Question Answering. The comp

abhishek thakur 33 Dec 18, 2022
Speech Recognition Database Management with python

Speech Recognition Database Management The main aim of this project is to recogn

Abhishek Kumar Jha 2 Feb 02, 2022
🍊 PAUSE (Positive and Annealed Unlabeled Sentence Embedding), accepted by EMNLP'2021 🌴

PAUSE: Positive and Annealed Unlabeled Sentence Embedding Sentence embedding refers to a set of effective and versatile techniques for converting raw

EQT 21 Dec 15, 2022
Perform sentiment analysis on textual data that people generally post on websites like social networks and movie review sites.

Sentiment Analyzer The goal of this project is to perform sentiment analysis on textual data that people generally post on websites like social networ

Madhusudan.C.S 53 Mar 01, 2022
Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

Disfl-QA is a targeted dataset for contextual disfluencies in an information seeking setting, namely question answering over Wikipedia passages. Disfl-QA builds upon the SQuAD-v2 (Rajpurkar et al., 2

Google Research Datasets 52 Jun 21, 2022
RecipeReduce: Simplified Recipe Processing for Lazy Programmers

RecipeReduce This repo will help you figure out the amount of ingredients to buy for a certain number of meals with selected recipes. RecipeReduce Get

Qibin Chen 9 Apr 22, 2022
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Soohwan Kim 26 Dec 14, 2022
This is the 25 + 1 year anniversary version of the 1995 Rachford-Rice contest

Rachford-Rice Contest This is the 25 + 1 year anniversary version of the 1995 Rachford-Rice contest. Can you solve the Rachford-Rice problem for all t

13 Sep 20, 2022
A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

wav2vec-toolkit A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models This repository accompanies the

Anton Lozhkov 29 Oct 23, 2022
Honor's thesis project analyzing whether the GPT-2 model can more effectively generate free-verse or structured poetry.

gpt2-poetry The following code is for my senior honor's thesis project, under the guidance of Dr. Keith Holyoak at the University of California, Los A

Ashley Kim 2 Jan 09, 2022
DaCy: The State of the Art Danish NLP pipeline using SpaCy

DaCy: A SpaCy NLP Pipeline for Danish DaCy is a Danish preprocessing pipeline trained in SpaCy. At the time of writing it has achieved State-of-the-Ar

Kenneth Enevoldsen 71 Jan 06, 2023
TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

Yixuan Su 26 Oct 17, 2022
Ongoing research training transformer language models at scale, including: BERT & GPT-2

What is this fork of Megatron-LM and Megatron-DeepSpeed This is a detached fork of https://github.com/microsoft/Megatron-DeepSpeed, which in itself is

BigScience Workshop 316 Jan 03, 2023