Versatile Generative Language Model

Overview

License: MIT

This is the implementation of the paper:

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning. Zhaojiang Lin, Andrea Madotto, Pascale Fung. Findings of EMNLP 2020. [PDF]

If you use any source code or datasets included in this toolkit in your work, please cite the following paper. The BibTeX is listed below:

@article{lin2020exploring,
  title={Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning},
  author={Lin, Zhaojiang and Madotto, Andrea and Fung, Pascale},
  journal={arXiv preprint arXiv:2004.03829},
  year={2020}
}

Abstract

Fine-tuning pre-trained generative language models on downstream language generation tasks has shown promising results. However, this comes at the cost of having a single, large model for each task, which is not ideal in low-memory/power scenarios (e.g., mobile). In this work, we propose an effective way to fine-tune multiple downstream generation tasks simultaneously using a single, large pre-trained model. Experiments on five diverse language generation tasks show that, by using just an additional 2-3% of parameters per task, our model can maintain or even improve the performance of fine-tuning the whole model.

Versatile Generative Language Model (VLM):

Versatile Language Model (VLM) is composed of three components: a pre-trained language model backbone (e.g., GPT-2) and, for each generation task, two kinds of specialized parameters: low-rank residual adapters and task embeddings.
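
For intuition, here is a minimal PyTorch sketch of a low-rank residual adapter; the class and argument names are hypothetical, and the exact placement inside the GPT-2 blocks follows the released code, not this sketch.

```python
import torch
import torch.nn as nn

class LowRankResidualAdapter(nn.Module):
    """Illustrative low-rank residual adapter (hypothetical names).

    A small bottleneck MLP whose output is added back to the frozen
    backbone's hidden states, so only the adapter (plus a task
    embedding) needs to be trained and stored per task.
    """
    def __init__(self, hidden_size: int = 768, bottleneck: int = 300):
        super().__init__()
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.down = nn.Linear(hidden_size, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden_size)    # project back up

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the backbone's output passes through
        # unchanged, and the adapter learns only a low-rank correction.
        x = self.layer_norm(hidden_states)
        return hidden_states + self.up(torch.relu(self.down(x)))
```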

Dependencies

Check the required packages in requirements.txt, or simply run:

❱❱❱ pip install -r requirements.txt

Experiments

Dataset

Download the preprocessed datasets

Reproducibility

We provide the trained checkpoint of our VLM.

Test the model: choose one task from (mt, summarization, dialogue, qa, nlg).

❱❱❱ python ./evaluate_vlm.py --task mt --no_sample --model_checkpoint $model_path
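
For example, to evaluate the checkpoint on all five tasks in one pass, a small driver script along these lines should work (the checkpoint path below is an assumption; point it at wherever you downloaded the model):

```python
import subprocess

model_path = "runs/vlm_checkpoint"  # hypothetical path to the downloaded checkpoint

# Run evaluate_vlm.py once per task with greedy decoding (--no_sample).
for task in ["mt", "summarization", "dialogue", "qa", "nlg"]:
    subprocess.run(
        ["python", "./evaluate_vlm.py", "--task", task,
         "--no_sample", "--model_checkpoint", model_path],
        check=True,
    )
```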

Fine-tune GPT-2

Train machine translation:

❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json

Test machine translation:

❱❱❱ python ./evaluate.py --task mt --no_sample --max_history=2 --model_checkpoint runs/$model_checkpoint

Check run.sh to run the other tasks.

VLM: train adapters and task embeddings

Train machine translation without knowledge distillation:

❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json --adapter_bottleneck 300 --lr 0.0005
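
For intuition on --adapter_bottleneck, here is a back-of-envelope estimate of the per-task parameter overhead, assuming one adapter (a down- and an up-projection) in each of GPT-2 small's 12 blocks with hidden size 768; the released code may place adapters differently:

```python
hidden, bottleneck, n_layers = 768, 300, 12
backbone_params = 124_000_000  # GPT-2 small, approximate

# Down- and up-projection weight matrices (biases omitted for brevity).
adapter_params = n_layers * 2 * hidden * bottleneck
print(f"extra parameters per task: {adapter_params / backbone_params:.1%}")
# ~4.5% at bottleneck 300; smaller bottlenecks give the 2-3% figure
# quoted in the abstract.
```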

Train machine translation using sentence level knowledge distillation:

❱❱❱ python ./sentence_distiller.py --task mt --max_history=2 --model_checkpoint runs/$fully_finetuned_gpt2_checkpoint --no_sample
❱❱❱ python ./train.py --gradient_accumulation_steps=4 --max_history=2 --train_batch_size=8 --valid_batch_size=8 --n_epochs 8 --task mt --dataset_path data/NMT/data_en_ge.json --adapter_bottleneck 300 --lr 0.0005 --distillation
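
Conceptually, sentence-level knowledge distillation lets the fully fine-tuned GPT-2 (the teacher) re-generate the training targets, and the adapter (the student) is then trained on the generated sentences instead of the gold ones, which is what the --distillation flag selects. A rough sketch, with hypothetical function names:

```python
# Hypothetical sketch of sentence-level distillation, not the repo's exact API.
def distill_dataset(teacher_generate, dataset):
    """Replace each gold target with the teacher's (greedy) output."""
    return [(source, teacher_generate(source)) for source, _gold in dataset]

# sentence_distiller.py plays the role of teacher_generate above; train.py
# with --distillation then fits the adapter on the distilled pairs.
```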

Test machine translation:

❱❱❱ python ./evaluate.py --task mt --no_sample --adapter_bottleneck 300 --model_checkpoint runs/$model_checkpoint

Check run.sh to run the other tasks.

Combine all the adapters and task embeddings into a single model

Edit line 68 of combine_all.py to provide the list of checkpoints, then run:

❱❱❱ python combine_all.py
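
Conceptually, combining loads each per-task checkpoint, keeps its adapter weights and task embedding under a task-specific key, and stores the shared frozen backbone once. A hedged sketch (paths and key names hypothetical):

```python
import torch

task_checkpoints = {  # hypothetical paths, one entry per trained task
    "mt": "runs/mt_checkpoint/pytorch_model.bin",
    "summarization": "runs/sum_checkpoint/pytorch_model.bin",
}

combined = {}
for task, path in task_checkpoints.items():
    state = torch.load(path, map_location="cpu")
    for key, tensor in state.items():
        if "adapter" in key or "task_embed" in key:
            combined[f"{task}.{key}"] = tensor   # keep task-specific weights
        else:
            combined.setdefault(key, tensor)     # shared backbone stored once

torch.save(combined, "runs/vlm_combined.bin")
```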

Test to verify that the results are the same:

❱❱❱ python ./evaluate_vlm.py --task mt --no_sample --model_checkpoint $model_path

The above scripts illustrate how to train VLM continuously when tasks arrive sequentially.

Multitask training VLM

When all the tasks are available at the same time:

❱❱❱ python ./train_vlm.py --gradient_accumulation_steps=16 --train_batch_size=1 --valid_batch_size=1 --n_epochs 3

Acknowledgement

This repository is implemented based on Huggingface.
