Deal or No Deal? End-to-End Learning for Negotiation Dialogues

Overview

Introduction

This is a PyTorch implementation of the following research papers:

The code is developed by Facebook AI Research.

The code trains neural networks to hold negotiations in natural language, and allows reinforcement learning self play and rollout-based planning.

Citation

If you want to use this code in your research, please cite:

@inproceedings{DBLP:conf/icml/YaratsL18,
  author    = {Denis Yarats and
               Mike Lewis},
  title     = {Hierarchical Text Generation and Planning for Strategic Dialogue},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning,
               {ICML} 2018, Stockholmsm{\"{a}}ssan, Stockholm, Sweden, July
               10-15, 2018},
  pages     = {5587--5595},
  year      = {2018},
  crossref  = {DBLP:conf/icml/2018},
  url       = {http://proceedings.mlr.press/v80/yarats18a.html},
  timestamp = {Fri, 13 Jul 2018 14:58:25 +0200},
  biburl    = {https://dblp.org/rec/bib/conf/icml/YaratsL18},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Dataset

We release our dataset together with the code, you can find it under data/negotiate. This dataset consists of 5808 dialogues, based on 2236 unique scenarios. Take a look at §2.3 of the paper to learn about data collection.

Each dialogue is converted into two training examples in the dataset, showing the complete conversation from the perspective of each agent. The perspectives differ on their input goals, output choice, and in special tokens marking whether a statement was read or written. See §3.1 for the details on data representation.

# Perspective of Agent 1
<input> 1 4 4 1 1 2 </input>
<dialogue> THEM: i would like 4 hats and you can have the rest . <eos> YOU: deal <eos> THEM: <selection> </dialogue>
<output> item0=1 item1=0 item2=1 item0=0 item1=4 item2=0 </output> 
<partner_input> 1 0 4 2 1 2 </partner_input>

# Perspective of Agent 2
<input> 1 0 4 2 1 2 </input>
<dialogue> YOU: i would like 4 hats and you can have the rest . <eos> THEM: deal <eos> YOU: <selection> </dialogue>
<output> item0=0 item1=4 item2=0 item0=1 item1=0 item2=1 </output>
<partner_input> 1 4 4 1 1 2 </partner_input>

Setup

All code was developed with Python 3.0 on CentOS Linux 7, and tested on Ubuntu 16.04. In addition, we used PyTorch 1.0.0, CUDA 9.0, and Visdom 0.1.8.4.

We recommend to use Anaconda. In order to set up a working environment follow the steps below:

# Install anaconda
conda create -n py30 python=3 anaconda
# Activate environment
source activate py30
# Install PyTorch
conda install pytorch torchvision cuda90 -c pytorch
# Install Visdom if you want to use visualization
pip install visdom

Usage

Supervised Training

Action Classifier

We use an action classifier to compare performance of various models. The action classifier is described in section 3 of (2). It can be trained by running the following command:

python train.py \
--cuda \
--bsz 16 \
--clip 2.0 \
--decay_every 1 \
--decay_rate 5.0 \
--domain object_division \
--dropout 0.1 \
--init_range 0.2 \
--lr 0.001 \
--max_epoch 7 \
--min_lr 1e-05 \
--model_type selection_model \
--momentum 0.1 \
--nembed_ctx 128 \
--nembed_word 128 \
--nhid_attn 128 \
--nhid_ctx 64 \
--nhid_lang 128 \
--nhid_sel 128 \
--nhid_strat 256 \
--unk_threshold 20 \
--skip_values \
--sep_sel \
--model_file selection_model.th

Baseline RNN Model

This is the baseline RNN model that we describe in (1):

python train.py \
--cuda \
--bsz 16 \
--clip 0.5 \
--decay_every 1 \
--decay_rate 5.0 \
--domain object_division \
--dropout 0.1 \
--model_type rnn_model \
--init_range 0.2 \
--lr 0.001 \
--max_epoch 30 \
--min_lr 1e-07 \
--momentum 0.1 \
--nembed_ctx 64 \
--nembed_word 256 \
--nhid_attn 64 \
--nhid_ctx 64 \
--nhid_lang 128 \
--nhid_sel 128 \
--sel_weight 0.6 \
--unk_threshold 20 \
--sep_sel \
--model_file rnn_model.th

Hierarchical Latent Model

In this section we provide guidelines on how to train the hierarchical latent model from (2). The final model requires two sub-models: the clustering model, which learns compact representations over intents; and the language model, which translates intent representations into language. Please read sections 5 and 6 of (2) for more details.

Clustering Model

python train.py \
--cuda \
--bsz 16 \
--clip 2.0 \
--decay_every 1 \
--decay_rate 5.0 \
--domain object_division \
--dropout 0.2 \
--init_range 0.3 \
--lr 0.001 \
--max_epoch 15 \
--min_lr 1e-05 \
--model_type latent_clustering_model \
--momentum 0.1 \
--nembed_ctx 64 \
--nembed_word 256 \
--nhid_ctx 64 \
--nhid_lang 256 \
--nhid_sel 128 \
--nhid_strat 256 \
--unk_threshold 20 \
--num_clusters 50 \
--sep_sel \
--skip_values \
--nhid_cluster 256 \
--selection_model_file selection_model.th \
--model_file clustering_model.th

Language Model

python train.py \
--cuda \
--bsz 16 \
--clip 2.0 \
--decay_every 1 \
--decay_rate 5.0 \
--domain object_division \
--dropout 0.1 \
--init_range 0.2 \
--lr 0.001 \
--max_epoch 15 \
--min_lr 1e-05 \
--model_type latent_clustering_language_model \
--momentum 0.1 \
--nembed_ctx 64 \
--nembed_word 256 \
--nhid_ctx 64 \
--nhid_lang 256 \
--nhid_sel 128 \
--nhid_strat 256 \
--unk_threshold 20 \
--num_clusters 50 \
--sep_sel \
--nhid_cluster 256 \
--skip_values \
--selection_model_file selection_model.th \
--cluster_model_file clustering_model.th \
--model_file clustering_language_model.th

Full Model

python train.py \
--cuda \
--bsz 16 \
--clip 2.0 \
--decay_every 1 \
--decay_rate 5.0 \
--domain object_division \
--dropout 0.2 \
--init_range 0.3 \
--lr 0.001 \
--max_epoch 10 \
--min_lr 1e-05 \
--model_type latent_clustering_prediction_model \
--momentum 0.2 \
--nembed_ctx 64 \
--nembed_word 256 \
--nhid_ctx 64 \
--nhid_lang 256 \
--nhid_sel 128 \
--nhid_strat 256 \
--unk_threshold 20 \
--num_clusters 50 \
--sep_sel \
--selection_model_file selection_model.th \
--lang_model_file clustering_language_model.th \
--model_file full_model.th

Selfplay

If you want to have two pretrained models to negotiate against each another, use selfplay.py. For example, lets have two rnn models to play against each other:

python selfplay.py \
--cuda \
--alice_model_file rnn_model.th \
--bob_model_file rnn_model.th \
--context_file data/negotiate/selfplay.txt  \
--temperature 0.5 \
--selection_model_file selection_model.th

The script will output generated dialogues, as well as some statistics. For example:

================================================================================
Alice : book=(count:3 value:1) hat=(count:1 value:5) ball=(count:1 value:2)
Bob   : book=(count:3 value:1) hat=(count:1 value:1) ball=(count:1 value:6)
--------------------------------------------------------------------------------
Alice : i would like the hat and the ball . <eos>
Bob   : i need the ball and the hat <eos>
Alice : i can give you the ball and one book . <eos>
Bob   : i can't make a deal without the ball <eos>
Alice : okay then i will take the hat and the ball <eos>
Bob   : okay , that's fine . <eos>
Alice : <selection>
Alice : book=0 hat=1 ball=1 book=3 hat=0 ball=0
Bob   : book=3 hat=0 ball=0 book=0 hat=1 ball=1
--------------------------------------------------------------------------------
Agreement!
Alice : 7 points
Bob   : 3 points
--------------------------------------------------------------------------------
dialog_len=4.47 sent_len=6.93 agree=86.67% advantage=3.14 time=2.069s comb_rew=10.93 alice_rew=6.93 alice_sel=60.00% alice_unique=26 bob_rew=4.00 bob_sel=40.00% bob_unique=25 full_match=0.78 
--------------------------------------------------------------------------------
debug: 3 1 1 5 1 2 item0=0 item1=1 item2=1
debug: 3 1 1 1 1 6 item0=3 item1=0 item2=0
================================================================================

Reinforcement Learning

To fine-tune a pretrained model with RL use the reinforce.py script:

python reinforce.py \
--cuda \
--alice_model_file rnn_model.th \
--bob_model_file rnn_model.th \
--output_model_file rnn_rl_model.th \
--context_file data/negotiate/selfplay.txt  \
--temperature 0.5 \
--verbose \
--log_file rnn_rl.log \
--sv_train_freq 4 \
--nepoch 4 \
--selection_model_file selection_model.th  \
--rl_lr 0.00001 \
--rl_clip 0.0001 \
--sep_sel

License

This project is licenced under CC-by-NC, see the LICENSE file for details.

Owner
Facebook Research
Facebook Research
Off-policy continuous control in PyTorch, with RDPG, RTD3 & RSAC

arXiv technical report soon available. we are updating the readme to be as comprehensive as possible Please ask any questions in Issues, thanks. Intro

Zhihan 31 Dec 30, 2022
ImageNet Adversarial Image Evaluation

ImageNet Adversarial Image Evaluation This repository contains the code and some materials used in the experimental work presented in the following pa

Utku Ozbulak 11 Dec 26, 2022
Joint Learning of 3D Shape Retrieval and Deformation, CVPR 2021

Joint Learning of 3D Shape Retrieval and Deformation Joint Learning of 3D Shape Retrieval and Deformation Mikaela Angelina Uy, Vladimir G. Kim, Minhyu

Mikaela Uy 38 Oct 18, 2022
Vector Quantization, in Pytorch

Vector Quantization - Pytorch A vector quantization library originally transcribed from Deepmind's tensorflow implementation, made conveniently into a

Phil Wang 665 Jan 08, 2023
CVPR 2021: "The Spatially-Correlative Loss for Various Image Translation Tasks"

Spatially-Correlative Loss arXiv | website We provide the Pytorch implementation of "The Spatially-Correlative Loss for Various Image Translation Task

Chuanxia Zheng 89 Jan 04, 2023
Bayesian dessert for Lasagne

Gelato Bayesian dessert for Lasagne Recent results in Bayesian statistics for constructing robust neural networks have proved that it is one of the be

Maxim Kochurov 84 May 11, 2020
Torchlight2 lan game server tool - A message forwarding tool for Torchlight 2 lan game

Torchlight 2 Lan Game Server Tool A message forwarding tool for Torchlight 2 lan

Huaijun Jiang 3 Nov 01, 2022
I decide to sync up this repo and self-critical.pytorch. (The old master is in old master branch for archive)

An Image Captioning codebase This is a codebase for image captioning research. It supports: Self critical training from Self-critical Sequence Trainin

Ruotian(RT) Luo 1.3k Dec 31, 2022
AoT is a system for automatically generating off-target test harness by using build information.

AoT: Auto off-Target Automatically generating off-target test harness by using build information. Brought to you by the Mobile Security Team at Samsun

Samsung 10 Oct 19, 2022
Deep learning library featuring a higher-level API for TensorFlow.

TFLearn: Deep learning library featuring a higher-level API for TensorFlow. TFlearn is a modular and transparent deep learning library built on top of

TFLearn 9.6k Jan 02, 2023
Steerable discovery of neural audio effects

Steerable discovery of neural audio effects Christian J. Steinmetz and Joshua D. Reiss Abstract Applications of deep learning for audio effects often

Christian J. Steinmetz 182 Dec 29, 2022
SegNet model implemented using keras framework

keras-segnet Implementation of SegNet-like architecture using keras. Current version doesn't support index transferring proposed in SegNet article, so

185 Aug 30, 2022
Live Hand Tracking Using Python

Live-Hand-Tracking-Using-Python Project Description: In this project, we will be

Hassan Shahzad 2 Jan 06, 2022
Pixel-Perfect Structure-from-Motion with Featuremetric Refinement (ICCV 2021, Oral)

Pixel-Perfect Structure-from-Motion (ICCV 2021 Oral) We introduce a framework that improves the accuracy of Structure-from-Motion by refining keypoint

Computer Vision and Geometry Lab 831 Dec 29, 2022
CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer

CycleTransGAN-EVC CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer Demo emotion CycleTransGAN CycleTransGAN Cycle

24 Dec 15, 2022
A MNIST-like fashion product database. Benchmark

Fashion-MNIST Table of Contents Why we made Fashion-MNIST Get the Data Usage Benchmark Visualization Contributing Contact Citing Fashion-MNIST License

Zalando Research 10.5k Jan 08, 2023
General Multi-label Image Classification with Transformers

General Multi-label Image Classification with Transformers Jack Lanchantin, Tianlu Wang, Vicente Ordóñez Román, Yanjun Qi Conference on Computer Visio

QData 154 Dec 21, 2022
CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability

This is the official repository of the paper: CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability A private copy of the

Fadi Boutros 33 Dec 31, 2022
An original implementation of "MetaICL Learning to Learn In Context" by Sewon Min, Mike Lewis, Luke Zettlemoyer and Hannaneh Hajishirzi

MetaICL: Learning to Learn In Context This includes an original implementation of "MetaICL: Learning to Learn In Context" by Sewon Min, Mike Lewis, Lu

Meta Research 141 Jan 07, 2023
2021 National Underwater Robotics Vision Optics

2021-National-Underwater-Robotics-Vision-Optics 2021年全国水下机器人算法大赛-光学赛道-B榜精度第18名 (Kilian_Di的团队:A榜[email pro

Di Chang 9 Nov 04, 2022