APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

Last update: Dec 06, 2022

Related tags

Overview

APEACH - Korean Hate Speech Evaluation Datasets

APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. Sentences of the dataset are created by anonymous participants using an online crowdsourcing platform DeepNatural AI.

Sample Code :

Download

You can download benchmark set APEACH. APEACH/test.csv in this repository.

Dataset Description

APEACH : A hate-speech evaluation dataset generated in 2021, using generation method followd by APEACH paper.

Guidelines

APEACH-GUIDELINE

Topics

Lengths

Paper

https://arxiv.org/pdf/2202.12459.pdf

Experiment Code

Experiment Results

Name	Beep! Dev Dataset	Apeach (Ours)
SoongsilBERT-Base	0.8261	0.8424
SoongsilBERT-Small	0.8149	0.8228
KcBERT-base	0.8088	0.8086
KcBERT-large	0.8295	0.8116
DistillKoBERT	0.7570	0.7715
KoELECTRA-V3	0.7920	0.8101
KoBERT	0.8030	0.7885

We also share BEST model of our dataset which we trained in this experiment as checkpoint, demo webite and api.

Citation

@article{yang2022apeach,
  title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
  author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
  journal={arXiv preprint arXiv:2202.12459},
  year={2022}
}

Contributors

The main contributors of the work ( * : equal contribution) :

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

Related tags

Overview

APEACH - Korean Hate Speech Evaluation Datasets

Download

Dataset Description

Guidelines

Topics

Lengths

Paper

Experiment Code

Experiment Results

Citation

Contributors

License

Owner

Kevin-Yang

Journey is a NLP-Powered Developer assistant

Official code for "Parser-Free Virtual Try-on via Distilling Appearance Flows", CVPR 2021

Multilingual finetuning of Machine Translation model on low-resource languages. Project for Deep Natural Language Processing course.

An automated program that helps customers of Pizza Palour place their pizza orders

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Modeling cumulative cases of Covid-19 in the US during the Covid 19 Delta wave using Bayesian methods.

Nmt - TensorFlow Neural Machine Translation Tutorial

Utilizing RBERT model for KLUE Relation Extraction task

Code for Text Prior Guided Scene Text Image Super-Resolution

Mednlp - Medical natural language parsing and utility library

Chinese NewsTitle Generation Project by GPT2.带有超级详细注释的中文GPT2新闻标题生成项目。

Flaxformer: transformer architectures in JAX/Flax

Global Rhythm Style Transfer Without Text Transcriptions

nlp基础任务

MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification approach proposed in 2018. Unlike LSBert, MILES uses the bert-base-multilingual-uncased model, as well as simple language-agnostic approaches to complex word identification (CWI) and candidate ranking.

Dust model dichotomous performance analysis

A script that automatically creates a branch name using google translation api and jira api

Natural Language Processing with transformers

NLP-based analysis of poor Chinese movie reviews on Douban

Predict the spans of toxic posts that were responsible for the toxic label of the posts