Source code of SIGIR2021 Paper 'One Chatbot Per Person: Creating Personalized Chatbots based on Implicit Profiles'

Overview

DHAP

Source code of SIGIR2021 Long Paper:

One Chatbot Per Person: Creating Personalized Chatbots based on Implicit User Profiles .

Preinstallation

First, install the python packages in your Python3 environment:

  git clone https://github.com/zhengyima/DHAP.git DHAP
  cd DHAP
  pip install -r requirements.txt

Then, you should download the pre-trained word embeddings to initialize the model training. We provide two word embeddings in the Google Drive:

  • sgns.weibo.bigram-char, folloing Li et al., Chinese word embeddings pre-trained on Weibo. Google Drive
  • Fasttext embeddings, English word embedding pre-trained on Reddit set. Google Drive

You can pre-train your own embeddings(with the same format, i.e., the standard txt format), and use it in the model.

After downloading, you should put the embedding file to the path EMB_FILE.

Data

You should provide the dialogue history of users for training the model. For convenience, we provide a very small subset of PChatbot in the data/ as the demo data. In the direcotry, each user's dialogue history is saved in one text file. Each line in the file should contain post text, user id of post, post timestamp, response text, user id of response, response timestamp, _, _ , with tab as the seperator.

You can refer to seq2seq/dataset/perdialogDatasets.py for more details about the data processing.

If you are interested in the dataset PChatbot, please go to its official repository for more details.

Model Training

We provide a shell script scripts/train_chat.sh to start model pre-training. You should modify the DATA_DIR and EMB_FILE to your own paths. Then, you can start training by the following command:

bash scripts/train_chat.sh

The hyper-parameters are defined and set in the configParser.py.

After training, the trained checkpoints are saved in outputs. The inferenced result is saved in RESULT_FILE, which you define in bash scripts/train_chat.sh

Evaluating

For calculating varities of evaluation metrics(e.g. BLEU, P-Cover...), we provide a shell script scripts/eval.sh. You should modify the EMB_FILE to your own path, then evaluate the results by the following command:

bash scripts/eval.sh

Citations

If our code helps you, please cite our work by:

@inproceedings{DBLP:conf/sigir/madousigir21,
     author = {Zhengyi Ma and Zhicheng Dou and Yutao Zhu Hanxun Zhong and Ji-Rong Wen}, 
     title = {One Chatbot Per Person: Creating Personalized Chatbots based onImplicit User Profiles}, 
     booktitle = {Proceedings of the {SIGIR} 2021}, 
     publisher = {{ACM}}, 
     year = {2021}, 
     url = {https://doi.org/10.1145/3404835.3462828}, 
     doi = {10.1145/3404835.3462828}}

Links

Owner
ZYMa
Master candidate. IR and NLP.
ZYMa
LaneAF: Robust Multi-Lane Detection with Affinity Fields

LaneAF: Robust Multi-Lane Detection with Affinity Fields This repository contains Pytorch code for training and testing LaneAF lane detection models i

155 Dec 17, 2022
I decide to sync up this repo and self-critical.pytorch. (The old master is in old master branch for archive)

An Image Captioning codebase This is a codebase for image captioning research. It supports: Self critical training from Self-critical Sequence Trainin

Ruotian(RT) Luo 1.3k Dec 31, 2022
A motion detection system with RaspberryPi, OpenCV, Python

Human Detection System using Raspberry Pi Functionality Activates a relay on detecting motion. You may need following components to get the expected R

Omal Perera 55 Dec 04, 2022
Pytorch implementation for "Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion" (NeurIPS 2021)

Density-aware Chamfer Distance This repository contains the official PyTorch implementation of our paper: Density-aware Chamfer Distance as a Comprehe

Tong WU 93 Dec 15, 2022
Python implementation of "Elliptic Fourier Features of a Closed Contour"

PyEFD An Python/NumPy implementation of a method for approximating a contour with a Fourier series, as described in [1]. Installation pip install pyef

Henrik Blidh 71 Dec 09, 2022
The codebase for our paper "Generative Occupancy Fields for 3D Surface-Aware Image Synthesis" (NeurIPS 2021)

Generative Occupancy Fields for 3D Surface-Aware Image Synthesis (NeurIPS 2021) Project Page | Paper Xudong Xu, Xingang Pan, Dahua Lin and Bo Dai GOF

xuxudong 97 Nov 10, 2022
Code for the paper Progressive Pose Attention for Person Image Generation in CVPR19 (Oral).

Pose-Transfer Code for the paper Progressive Pose Attention for Person Image Generation in CVPR19(Oral). The paper is available here. Video generation

Tengteng Huang 679 Jan 04, 2023
Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech The family of UniSpeech: UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR UniSpeech-

Microsoft 282 Jan 09, 2023
RM Operation can equivalently convert ResNet to VGG, which is better for pruning; and can help RepVGG perform better when the depth is large.

RM Operation can equivalently convert ResNet to VGG, which is better for pruning; and can help RepVGG perform better when the depth is large.

184 Jan 04, 2023
Generate indoor scenes with Transformers

SceneFormer: Indoor Scene Generation with Transformers Initial code release for the Sceneformer paper, contains models, train and test scripts for the

Chandan Yeshwanth 110 Dec 06, 2022
[NeurIPS-2021] Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation

Efficient Graph Similarity Computation - (EGSC) This repo contains the source code and dataset for our paper: Slow Learning and Fast Inference: Effici

24 Dec 31, 2022
cl;asification problem using classification models in supervised learning

wine-quality-predition---classification cl;asification problem using classification models in supervised learning Wine Quality Prediction Analysis - C

Vineeth Reddy Gangula 1 Jan 18, 2022
LBK 26 Dec 28, 2022
My coursework for Machine Learning (2021 Spring) at National Taiwan University (NTU)

Machine Learning 2021 Machine Learning (NTU EE 5184, Spring 2021) Instructor: Hung-yi Lee Course Website : (https://speech.ee.ntu.edu.tw/~hylee/ml/202

100 Dec 26, 2022
Hepsiburada - Hepsiburada Urun Bilgisi Cekme

Hepsiburada Urun Bilgisi Cekme from hepsiburada import Marka nike = Marka("nike"

Ilker Manap 8 Oct 26, 2022
Julia package for contraction of tensor networks, based on the sweep line algorithm outlined in the paper General tensor network decoding of 2D Pauli codes

Julia package for contraction of tensor networks, based on the sweep line algorithm outlined in the paper General tensor network decoding of 2D Pauli codes

Christopher T. Chubb 35 Dec 21, 2022
Learning-Augmented Dynamic Power Management

Learning-Augmented Dynamic Power Management This repository contains source code accompanying paper Learning-Augmented Dynamic Power Management with M

Adam 0 Feb 22, 2022
Caffe: a fast open framework for deep learning.

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berke

Berkeley Vision and Learning Center 33k Dec 28, 2022
"Domain Adaptive Semantic Segmentation without Source Data" (ACM MM 2021)

LDBE Pytorch implementation for two papers (the paper will be released soon): "Domain Adaptive Semantic Segmentation without Source Data", ACM MM2021.

benfour 16 Sep 28, 2022
Tiny Object Detection in Aerial Images.

AI-TOD AI-TOD is a dataset for tiny object detection in aerial images. [Paper] [Dataset] Description AI-TOD comes with 700,621 object instances for ei

jwwangchn 116 Dec 30, 2022