NLP - Machine learning

Last update: Oct 29, 2021

Overview

Flipkart-product-reviews

About

Product reviews are an essential part of the branding and marketing of an online store like Flipkart. They help build trust and loyalty, and they typically describe what sets a product apart from others. Savvy shoppers almost never purchase a product without knowing how it is going to work for them. The more reviews a platform has, the more convinced a user will be that they are making the right decision.

Online reviews are very important to e-commerce businesses because they ultimately increase sales by giving consumers the information they need to decide whether to purchase a product. Product ratings are another important factor in raising the reputation and perceived standard of an e-commerce store.

NLP

Natural Language Processing (NLP) helps machines “read” text by simulating the human ability to understand language. It is a field of Artificial Intelligence that gives machines the ability to read, understand and derive meaning from human languages.

#1 Tokenization

Tokenization is the process of breaking sentences or paragraphs down into smaller units called tokens, typically individual words.
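As a rough illustration of this step, the sketch below splits a review into lowercase word tokens with a regular expression. The function name `tokenize` and the sample review are assumptions made for this example; they are not taken from the project's code, which may use a library tokenizer instead.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    # \w+ matches runs of letters, digits and underscores; the optional
    # group keeps contractions such as "doesn't" as a single token.
    return re.findall(r"\w+(?:'\w+)?", text.lower())


# Hypothetical review text, used only to show the output shape.
review = "Great phone, but the battery doesn't last long!"
tokens = tokenize(review)
print(tokens)
```

Punctuation is dropped and every token is lowercased, so "Phone" and "phone" end up as the same token in later steps.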

#2 Stop Words Removal

Some words, such as "and" and "am", can be removed without changing the meaning of a sentence. These words are called stop words and should be removed before the text is fed to any algorithm. In a given dataset, some words that are not standard stop words also repeat very frequently. Those words should be removed too, so that they do not bias the algorithm's results.
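Both kinds of removal can be sketched in a few lines. The stop-word set, the `max_doc_ratio` threshold and the sample documents below are assumptions chosen for illustration; a real pipeline would normally use a fuller stop-word list and tune the threshold on its own data.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "am", "and", "i", "is", "it", "on", "the", "this", "to", "with"}


def remove_stop_words(docs, max_doc_ratio=0.8):
    """Drop stop words and words found in more than max_doc_ratio of the documents."""
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for doc in docs for word in set(doc))
    too_common = {w for w, n in doc_freq.items() if n / len(docs) > max_doc_ratio}
    return [
        [w for w in doc if w not in STOP_WORDS and w not in too_common]
        for doc in docs
    ]


# Already-tokenized sample reviews (made up for this example).
docs = [
    ["i", "am", "happy", "with", "this", "phone"],
    ["the", "phone", "camera", "is", "sharp"],
    ["battery", "drains", "fast", "on", "this", "phone"],
]
cleaned = remove_stop_words(docs)
print(cleaned)
```

Here "phone" is not a stop word, but it appears in every review, so it carries no information for telling the reviews apart and is filtered out by the frequency rule.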

#3 Vectorization

After tokenization and stop-word removal, our content is still in string format. We need to convert those strings to numbers that reflect the importance of each word (the features). We use TF-IDF vectorization to convert the text into a vector of importance weights, which lets us extract the important words in our data. It assigns a high weight to words that occur rarely across the dataset and a very low weight to words that occur frequently.
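To make the weighting concrete, here is a minimal sketch of the textbook form, tf × log(N / df), written with the standard library only. This is an assumption about the variant: library vectorizers such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their numbers will differ. The function name `tf_idf` and the sample documents are made up for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # tf = share of the document taken up by the word;
            # idf = log(N / df), which is 0 for a word found in every document.
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Tokenized sample reviews (hypothetical).
docs = [
    ["phone", "camera", "sharp"],
    ["phone", "battery", "weak"],
    ["phone", "battery", "camera", "good"],
]
weights = tf_idf(docs)
for doc_weights in weights:
    print({word: round(w, 3) for word, w in doc_weights.items()})
```

In the first review, "phone" gets a weight of zero because it appears in every document, "camera" gets a small weight because it appears in two of three, and "sharp" gets the largest weight because it appears only once in the whole set.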

Topic Modelling - LDA

Topic modeling in Python involves counting words and grouping similar word patterns to infer topics within unstructured data. Take the example of Flipkart, where you might want to know what customers are saying about a particular product from a given seller. Instead of spending hours going through heaps of feedback to find the best-reviewed product, you can analyze the feedback with a topic modeling algorithm.

By detecting patterns such as word frequency and the distance between words, a topic model clusters similar feedback together, along with the words and expressions that appear most often in it. With this information, you can quickly deduce what each set of texts is talking about.

There are various topic modelling algorithms; we will be using the Latent Dirichlet Allocation (LDA) algorithm.
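To give a feel for how LDA groups words into topics, the sketch below implements a toy collapsed Gibbs sampler using only the standard library. It is not the project's implementation: the function name `lda_gibbs`, the hyperparameter values (`alpha`, `beta`, `n_iter`) and the sample documents are all assumptions for illustration. In practice one would typically reach for a library such as scikit-learn (`LatentDirichletAllocation`) or gensim (`LdaModel`).

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler; returns (topic-word, doc-topic) distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant:
                # (doc-topic count + alpha) * (topic-word count + beta) / (topic size + V * beta)
                probs = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=probs)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed estimates of the word distribution per topic ...
    phi = [
        {vocab[w]: (topic_word[k][w] + beta) / (topic_total[k] + n_words * beta)
         for w in range(n_words)}
        for k in range(n_topics)
    ]
    # ... and of the topic distribution per document.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta


# Tokenized, stop-word-free sample reviews (hypothetical).
docs = [
    ["battery", "charge", "battery", "drain"],
    ["charge", "battery", "drain", "charge"],
    ["camera", "photo", "lens", "camera"],
    ["photo", "lens", "camera", "photo"],
]
phi, theta = lda_gibbs(docs, n_topics=2)
for k, dist in enumerate(phi):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top_words}")
```

Each topic is a probability distribution over the vocabulary, and each review is a mixture of topics. Reading the top words of a topic is how you would label it by hand, for example as "battery" or "camera" feedback. Because sampling is random, the topics found on such a tiny dataset can vary with the seed.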

Owner

Harshith VH
