A number of methods in order to perform Natural Language Processing on live data derived from Twitter

Last update: Nov 24, 2021

Twitter_NLP

Link to Project: https://twitoff-amadou.herokuapp.com/

==Description==

This project combines several methods to perform Natural Language Processing (NLP) on live data from Twitter. The goal is to demonstrate, at a basic level, how NLP can classify a piece of text by which Twitter user is most likely to 'tweet' (or post) it. Twitter API access was granted for this project and is used through the Tweepy wrapper for Python.

The web app is built with the Flask framework and deployed on Heroku. Data is extracted from Twitter through its API using the Tweepy library and stored in tables defined with SQLAlchemy. These tables, which hold the information we're concerned with, such as usernames and past tweets, live in our PostgreSQL database. The spaCy library then vectorizes each tweet into a numeric form our model can operate on. Finally, a random forest classifier is trained on these vectors.
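The storage step described above might look roughly like the following sketch. It is not the project's actual code: the `User` and `Tweet` models, their columns, and the `add_user` and `embed` helpers are hypothetical names chosen for illustration. To keep the example self-contained, an in-memory SQLite database stands in for PostgreSQL, a simple hashed bag-of-words vector stands in for spaCy's vectors, and the tweet texts are passed in directly rather than fetched with Tweepy.

```python
import zlib

from sqlalchemy import Column, ForeignKey, Integer, PickleType, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


class User(Base):
    """A Twitter user whose tweets have been pulled (hypothetical schema)."""
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    username = Column(String(50), unique=True, nullable=False)
    tweets = relationship("Tweet", back_populates="user")


class Tweet(Base):
    """One tweet plus the vector the classifier trains on (hypothetical schema)."""
    __tablename__ = "tweets"
    id = Column(Integer, primary_key=True)
    text = Column(String(500), nullable=False)
    vect = Column(PickleType, nullable=False)
    user_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    user = relationship("User", back_populates="tweets")


def embed(text, dim=8):
    """Stand-in for spaCy: a deterministic hashed bag-of-words vector.
    With spaCy and a model installed, `nlp(text).vector` could be used instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def add_user(session, username, tweet_texts):
    """Store a user and their vectorized tweets.
    In the app the texts would come from Tweepy; here they are passed in."""
    user = User(username=username)
    for text in tweet_texts:
        user.tweets.append(Tweet(text=text, vect=embed(text)))
    session.add(user)
    session.commit()
    return user


# In-memory SQLite keeps the sketch self-contained; the app itself uses PostgreSQL.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

add_user(session, "example_user", ["hello world", "flask on heroku"])
stored = session.query(User).filter_by(username="example_user").one()
print(stored.username, len(stored.tweets))
```

Storing the vector next to the tweet text means the classifier can be retrained from the database without re-vectorizing every tweet, which is one plausible reason to keep both in the same table.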

The interface of the app is simple. There are two text boxes, one labeled "User to add" and the other "Tweet text to predict". Typing a Twitter username into the 'add' box has Tweepy fetch that user and their tweets and store them in our PostgreSQL database. The random forest then trains live on the stored data. Once at least two Twitter users are in the database, one can enter text into the 'predict' box, select the two users to compare, and let the model predict which of them is more likely to have written it.
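The prediction step can be sketched as a two-class problem: the tweets of the two selected users are the training data, and the text from the 'predict' box is the sample to classify. The code below is a minimal illustration under stated assumptions, not the app's implementation. The `predict_user` and `embed` functions are hypothetical, the tweets are hard-coded instead of read from the database, and a hashed bag-of-words vector again stands in for spaCy. Only the use of scikit-learn's `RandomForestClassifier` mirrors the random forest named in the description.

```python
import zlib

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def embed(text, dim=16):
    """Stand-in for spaCy vectors: a deterministic hashed bag-of-words vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def predict_user(name_a, tweets_a, name_b, tweets_b, text):
    """Train a random forest on two users' tweets and return the name of
    the user judged more likely to have written `text`."""
    # One row per tweet; label 0 for the first user, 1 for the second.
    X = np.vstack([embed(t) for t in tweets_a + tweets_b])
    y = np.array([0] * len(tweets_a) + [1] * len(tweets_b))

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)

    label = model.predict(embed(text).reshape(1, -1))[0]
    return name_a if label == 0 else name_b


# Hard-coded sample tweets; in the app these would be loaded from the database.
space_tweets = ["rocket launch today", "mars mission update", "rocket landed safely"]
food_tweets = ["fresh pasta recipe", "garlic bread tonight", "pasta with garlic"]

print(predict_user("space_fan", space_tweets, "home_cook", food_tweets, "new rocket to mars"))
```

Training a fresh model for each comparison keeps the logic simple and matches the "trains live" behavior described above, at the cost of refitting on every request.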
