UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

Overview

UNION

Automatic Evaluation Metric described in the paper UNION: An UNreferenced MetrIc for Evaluating Open-eNded Story Generation (EMNLP 2020). Please refer to the Paper List for more information about Open-eNded Language Generation (ONLG) tasks. Hopefully the paper list will help you know more about this field.

Contents

Prerequisites

The code is written in TensorFlow library. To use the program the following prerequisites need to be installed.

  • Python 3.7.0
  • tensorflow-gpu 1.14.0
  • numpy 1.18.1
  • regex 2020.2.20
  • nltk 3.4.5

Computing Infrastructure

We train UNION based on the platform:

  • OS: Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-98-generic x86_64)
  • GPU: NVIDIA TITAN Xp

Quick Start

1. Constructing Negative Samples

Execute the following command:

cd ./Data
python3 ./get_vocab.py your_mode
python3 ./gen_train_data.py your_mode
  • your_mode is roc for ROCStories corpus or wp for WritingPrompts dataset. Then the summary of vocabulary and the corresponding frequency and pos-tagging will be found under ROCStories/ini_data/entitiy_vocab.txt or WritingPrompts/ini_data/entity_vocab.txt.
  • Negative samples and human-written stories will be constructed based on the original training set. The training set will be found under ROCStories/train_data or WritingPrompts/train_data.
  • Note: currently only 10 samples of the full original data and training data are provided. The full data can be downloaded from THUcloud or GoogleDrive.

2. Training of UNION

Execute the following command:

python3 ./run_union.py --data_dir your_data_dir \
    --output_dir ./model/union \
    --task_name train \
    --init_checkpoint ./model/uncased_L-12_H-768_A-12/bert_model.ckpt
  • your_data_dir is ./Data/ROCStories or ./Data/WritingPrompts.
  • The initial checkpoint of BERT can be downloaded from bert. We use the uncased base version of BERT (about 110M parameters). We train the model for 40000 steps at most. The training process will task about 1~2 days.

3. Prediction with UNION

Execute the following command:

python3 ./run_union.py --data_dir your_data_dir \
    --output_dir ./model/output \
    --task_name pred \
    --init_checkpoint your_model_name
  • your_data_dir is ./Data/ROCStories or ./Data/WritingPrompts. If you want to evaluate your custom texts, you only need tp change your file format into ours.

  • your_model_name is ./model/union_roc/union_roc or ./model/union_wp/union_wp. The fine-tuned checkpoint can be downloaded from the following link:

Dataset Fine-tuned Model
ROCStories THUcloud; GoogleDrive
WritingPrompts THUcloud; GoogleDrive
  • The union score of the stories under your_data_dir/ant_data can be found under the output_dir ./model/output.

4. Correlation Calculation

Execute the following command:

python3 ./correlation.py your_mode

Then the correlation between the human judgements under your_data_dir/ant_data and the scores of metrics under your_data_dir/metric_output will be output. The figures under "./figure" show the score graph between metric scores and human judgments for ROCStories corpus.

Data Instruction for files under ./Data

├── Data
   └── `negation.txt`             # manually constructed negation word vocabulary.
   └── `conceptnet_antonym.txt`   # triples with antonym relations extracted from ConceptNet.
   └── `conceptnet_entity.csv`    # entities acquired from ConceptNet.
   └── `ROCStories`
       ├── `ant_data`        # sampled stories and corresponding human annotation.
              └── `ant_data.txt`        # include only binary annotation for reasonable(1) or unreasonable(0)
              └── `ant_data_all.txt`    # include the annotation for specific error types: reasonable(0), repeated plots(1), bad coherence(2), conflicting logic(3), chaotic scenes(4), and others(5). 
              └── `reference.txt`       # human-written stories with the same leading context with annotated stories.
              └── `reference_ipt.txt`
              └── `reference_opt.txt`
       ├── `ini_data`        # original dataset for training/validation/testing.
              └── `train.txt`
              └── `dev.txt`
              └── `test.txt`
              └── `entity_vocab.txt`    # generated by `get_vocab.py`, consisting of all the entities and the corresponding tagged POS followed by the mention frequency in the dataset.
       ├── `train_data`      # negative samples and corresponding human-written stories for training, which are constructed by `gen_train_data.py`.
              └── `train_human.txt`
              └── `train_negative.txt`
              └── `dev_human.txt`
              └── `dev_negative.txt`
              └── `test_human.txt`
              └── `test_negative.txt`
       ├── `metric_output`   # the scores of different metrics, which can be used to replicate the correlation in Table 5 of the paper. 
              └── `bleu.txt`
              └── `bleurt.txt`
              └── `ppl.txt`             # the sign of the result of Perplexity needs to be changed to get the result for *minus* Perplexity.
              └── `union.txt`
              └── `union_recon.txt`     # the ablated model without the reconstruction task
              └── ...
   └── `WritingPrompts`
       ├── ...
 
  • The annotated data file ant_data.txt and ant_data_all.txt are formatted as Story ID ||| Story ||| Seven Annotated Scores.
  • ant_data_all.txt is only available for ROCStories corpus. ant_data_all.txt is the same with ant_data.txt for WrintingPrompts dataset.

Citation

Please kindly cite our paper if this paper and the code are helpful.

@misc{guan2020union,
    title={UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation},
    author={Jian Guan and Minlie Huang},
    year={2020},
    eprint={2009.07602},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Owner
Conversational AI groups from Tsinghua University
Rename Images with Auto Generated Neural Image Captions

Recaption Images with Generated Neural Image Caption Example Usage: Commandline: Recaption all images from folder /home/feng/Downloads/images to folde

feng wang 3 May 01, 2022
Attentive Implicit Representation Networks (AIR-Nets)

Attentive Implicit Representation Networks (AIR-Nets) Preprint | Supplementary | Accepted at the International Conference on 3D Vision (3DV) teaser.mo

29 Dec 07, 2022
RL agent to play μRTS with Stable-Baselines3

Gym-μRTS with Stable-Baselines3/PyTorch This repo contains an attempt to reproduce Gridnet PPO with invalid action masking algorithm to play μRTS usin

Oleksii Kachaiev 24 Nov 11, 2022
Testing and Estimation of structural breaks in Stata

xtbreak estimating and testing for many known and unknown structural breaks in time series and panel data. For an overview of xtbreak test see xtbreak

Jan Ditzen 13 Jun 19, 2022
Pytorch implementation of One-Shot Affordance Detection

One-shot Affordance Detection PyTorch implementation of our one-shot affordance detection models. This repository contains PyTorch evaluation code, tr

46 Dec 12, 2022
Official Implementation of "Learning Disentangled Behavior Embeddings"

DBE: Disentangled-Behavior-Embedding Official implementation of Learning Disentangled Behavior Embeddings (NeurIPS 2021). Environment requirement The

Mishne Lab 12 Sep 28, 2022
Predicting Tweet Sentiment Maching Learning and streamlit

Predicting-Tweet-Sentiment-Maching-Learning-and-streamlit (I prefere using Visual Studio Code ) Open the folder in VS Code Run the first cell in requi

1 Nov 20, 2021
ReferFormer - Official Implementation of ReferFormer

The official implementation of the paper: Language as Queries for Referring Video Object Segmentation Language as Queries for Referring Video Object S

Jonas Wu 232 Dec 29, 2022
Real-Time Multi-Contact Model Predictive Control via ADMM

Here, you can find the code for the paper 'Real-Time Multi-Contact Model Predictive Control via ADMM'. Code is currently being cleared up and optimize

17 Dec 28, 2022
Pose Transformers: Human Motion Prediction with Non-Autoregressive Transformers

Pose Transformers: Human Motion Prediction with Non-Autoregressive Transformers This is the repo used for human motion prediction with non-autoregress

Idiap Research Institute 26 Dec 14, 2022
an implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

This work has now been superseded by: https://github.com/sniklaus/revisiting-sepconv sepconv-slomo This is a reference implementation of Video Frame I

Simon Niklaus 985 Jan 08, 2023
Repository for the semantic WMI loss

Installation: pip install -e . Installing DL2: First clone DL2 in a separate directory and install it using the following commands: git clone https:/

Nick Hoernle 4 Sep 15, 2022
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking

Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking We revisit and address issues with Oxford 5k and Paris 6k image retrieval benchm

Filip Radenovic 188 Dec 17, 2022
Training code and evaluation benchmarks for the "Self-Supervised Policy Adaptation during Deployment" paper.

Self-Supervised Policy Adaptation during Deployment PyTorch implementation of PAD and evaluation benchmarks from Self-Supervised Policy Adaptation dur

Nicklas Hansen 101 Nov 01, 2022
A toolkit for developing and comparing reinforcement learning algorithms.

Status: Maintenance (expect bug fixes and minor updates) OpenAI Gym OpenAI Gym is a toolkit for developing and comparing reinforcement learning algori

OpenAI 29.6k Jan 08, 2023
Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020

XDVioDet Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020. The proj

peng 64 Dec 12, 2022
Notebooks for my "Deep Learning with TensorFlow 2 and Keras" course

Deep Learning with TensorFlow 2 and Keras – Notebooks This project accompanies my Deep Learning with TensorFlow 2 and Keras trainings. It contains the

Aurélien Geron 1.9k Dec 15, 2022
A quick recipe to learn all about Transformers

Transformers have accelerated the development of new techniques and models for natural language processing (NLP) tasks.

DAIR.AI 772 Dec 31, 2022
Video lie detector using xgboost - A video lie detector using OpenFace and xgboost

video_lie_detector_using_xgboost a video lie detector using OpenFace and xgboost

2 Jan 11, 2022
Source code for our paper "Empathetic Response Generation with State Management"

Source code for our paper "Empathetic Response Generation with State Management" this repository is maintained by both Jun Gao and Yuhan Liu Model Ove

Yuhan Liu 3 Oct 08, 2022