IMDB film review sentiment classification based on BERT's supervised learning model.

Overview

IMDB-sentiment-classification-based-on-BERT

IMDB film review sentiment classification based on BERT's supervised learning model. On the other hand, the model can be extended to other natural language multi-classification tasks.


Documents description

ALL_OUTPUT:4组实验运行结果。

BERT_BASE_DIR:谷歌预训练BERT模型文件。

DATA_DIR:训练集、验证集、测试集文件。

output_models:空文件夹,运行程序时存储输出文件。

Raw_Data:原始数据集以及数据预处理过程涉及到的一些数据文件。

IMDB Parameters:运行‘run.py’文件时需将该文件中的参数传入程序。

run.py:训练、验证、测试模型时运行的文件。


Parameters

--task_name=mrpc
--do_train=true
--do_eval=true
--do_predict=true
--data_dir=D:\XXXXXXX\IMDB\DATA_DIR
--vocab_file=BERT_BASE_DIR/vocab.txt
--bert_config_file=BERT_BASE_DIR/bert_config.json
--init_checkpoint=BERT_BASE_DIR/bert_model.ckpt
--max_seq_length=30
--train_batch_size=1
--learning_rate=2e-6
--num_train_epochs=5
--output_dir=output_models/


The optimization result

Accuracy:
0.922(validation set)
0.928(test set)


File distribution

image

You might also like...
An example project using OpenPrompt under pytorch-lightning for prompt-based SST2 sentiment analysis model

pl_prompt_sst An example project using OpenPrompt under the framework of pytorch-lightning for a training prompt-based text classification model on SS

AI-powered literature discovery and review engine for medical/scientific papers
AI-powered literature discovery and review engine for medical/scientific papers

AI-powered literature discovery and review engine for medical/scientific papers paperai is an AI-powered literature discovery and review engine for me

Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.
Predicting the usefulness of reviews given the review text and metadata surrounding the reviews.

Predicting Yelp Review Quality Table of Contents Introduction Motivation Goal and Central Questions The Data Data Storage and ETL EDA Data Pipeline Da

When doing audio and video sentiment recognition, I found that a lot of code is duplicated, often a function in different time debugging for a long time, based on this problem, I want to manage all the previous work, organized into an open source library can be iterative. For their own use and others.
Implementation of paper Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa.

RoBERTaABSA This repo contains the code for NAACL 2021 paper titled Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoB

A paper list for aspect based sentiment analysis.

Aspect-Based-Sentiment-Analysis A paper list for aspect based sentiment analysis. Survey [IEEE-TAC-20]: Issues and Challenges of Aspect-based Sentimen

MRC approach for Aspect-based Sentiment Analysis (ABSA)
MRC approach for Aspect-based Sentiment Analysis (ABSA)

B-MRC MRC approach for Aspect-based Sentiment Analysis (ABSA) Paper: Bidirectional Machine Reading Comprehension for Aspect Sentiment Triplet Extracti

This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and twitter sentiment in regards to COVID-19.
This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and twitter sentiment in regards to COVID-19.

Twitter COVID-19 Sentiment Analysis Members: Christopher Bach | Khalid Hamid Fallous | Jay Hirpara | Jing Tang | Graham Thomas | David Wetherhold Pro

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Releases(IMDB-big-file-v1.0)
Owner
Paris
Paris
Lumped-element impedance calculator and frequency-domain plotter.

fastZ: Lumped-Element Impedance Calculator fastZ is a small tool for calculating and visualizing electrical impedance in Python. Features include: Sup

Wesley Hileman 47 Nov 18, 2022
Edge-Augmented Graph Transformer

Edge-augmented Graph Transformer Introduction This is the official implementation of the Edge-augmented Graph Transformer (EGT) as described in https:

Md Shamim Hussain 21 Dec 14, 2022
Constituency Tree Labeling Tool

Constituency Tree Labeling Tool The purpose of this package is to solve the constituency tree labeling problem. Look from the dataset labeled by NLTK,

张宇 6 Dec 20, 2022
Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations

Expediting Vision Transformers via Token Reorganizations This repository contain

Youwei Liang 101 Dec 26, 2022
Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment Analysis with Affective Knowledge. Proceedings of EMNLP 2021

AAGCN-ACSA EMNLP 2021 Introduction This repository was used in our paper: Beta Distribution Guided Aspect-aware Graph for Aspect Category Sentiment An

Akuchi 36 Dec 18, 2022
Searching keywords in PDF file folders

keyword_searching Steps to use this Python scripts: (1)Paste this script into the file folder containing the PDF files you need to search from; (2)Thi

1 Nov 08, 2021
A desktop GUI providing an audio interface for GPT3.

Jabberwocky neil_degrasse_tyson_with_audio.mp4 Project Description This GUI provides an audio interface to GPT-3. My main goal was to provide a conven

16 Nov 27, 2022
Code for our paper "Mask-Align: Self-Supervised Neural Word Alignment" in ACL 2021

Mask-Align: Self-Supervised Neural Word Alignment This is the implementation of our work Mask-Align: Self-Supervised Neural Word Alignment. @inproceed

THUNLP-MT 46 Dec 15, 2022
Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision Training Efficiency We show the training efficiency of our DSLP model b

Chenyang Huang 37 Jan 04, 2023
kochat

Kochat 챗봇 빌더는 성에 안차고, 자신만의 딥러닝 챗봇 애플리케이션을 만드시고 싶으신가요? Kochat을 이용하면 손쉽게 자신만의 딥러닝 챗봇 애플리케이션을 빌드할 수 있습니다. # 1. 데이터셋 객체 생성 dataset = Dataset(ood=True) #

1 Oct 25, 2021
TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

TweebankNLP This repo contains the new Tweebank-NER dataset and off-the-shelf Twitter-Stanza pipeline for state-of-the-art Tweet NLP, as described in

Laboratory for Social Machines 84 Dec 20, 2022
Abhijith Neil Abraham 2 Nov 05, 2021
Binary LSTM model for text classification

Text Classification The purpose of this repository is to create a neural network model of NLP with deep learning for binary classification of texts re

Nikita Elenberger 1 Mar 11, 2022
A spaCy wrapper of OpenTapioca for named entity linking on Wikidata

spaCyOpenTapioca A spaCy wrapper of OpenTapioca for named entity linking on Wikidata. Table of contents Installation How to use Local OpenTapioca Vizu

Universitätsbibliothek Mannheim 80 Jan 03, 2023
PyTorch implementation of NATSpeech: A Non-Autoregressive Text-to-Speech Framework

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

760 Jan 03, 2023
Voilà turns Jupyter notebooks into standalone web applications

Rendering of live Jupyter notebooks with interactive widgets. Introduction Voilà turns Jupyter notebooks into standalone web applications. Unlike the

Voilà Dashboards 4.5k Jan 03, 2023
Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea

Gagan Bhatia 364 Jan 03, 2023
Pretty-doc - Composable text objects with python

pretty-doc from __future__ import annotations from dataclasses import dataclass

Taine Zhao 2 Jan 17, 2022
Sentiment Analysis Project using Count Vectorizer and TF-IDF Vectorizer

Sentiment Analysis Project This project contains two sentiment analysis programs for Hotel Reviews using a Hotel Reviews dataset from Datafiniti. The

Simran Farrukh 0 Mar 28, 2022
CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener

fastNLP 342 Jan 05, 2023