PyTorch implementation of the ACL, 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Overview

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

This repo contains the PyTorch implementation of the ACL, 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Installation

python setup.py install 

How to run the models

We provide example scripts for each model in hyperformer/scripts/ folder with their config files in hyperformer/configs. To run the models, please do cd hyperformer and:

  • To run hyperformer++ model (This model generates the task-specific adapters using a shared hypernetwork, which is shared across the tasks and layers of a transformer.):

    bash scripts/hyperformer++.sh
    
  • To run hyperformer model (This model generates the task-specific adapters using a shared hypernetwork, which is shared across the tasks, but this is specific to each layer of a transformer. This model is less efficient compared to hyperformer++.):

    bash scripts/hyperformer.sh
    
  • To run adapter\dagger model (This model share the layer normalization between adapters across the tasks, and train adapters in a multi-task setting.):

    bash scripts/adapters_dagger.sh   
    
  • To run adapter model (This model trains a single-adapter per task and trains the adapters in a single-task learning.):

    bash scripts/adapters.sh 
    
  • To run T5 finetuning model in a multi-task learning setup:

    bash scripts/finetune.sh
    
  • To run T5 finetuning model in a single-task learning setup:

    bash scripts/finetune_single_task.sh
    

We run all the models on 4 GPUs, while this is not necessary and one can run the models on 1 GPU. In case running on one GPU, in all the scripts, please remove the -m torch.distributed.launch --nproc_per_node=4 part.

Bibliography

If you find this repo useful, please cite our paper.

@inproceedings{karimi2021parameterefficient,
  title={Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks},
  author={Karimi Mahabadi, Rabeeh and Ruder, Sebastian and Dehghani, Mostafa and Henderson, James},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2021}
}

Final words

Hope this repo is useful for your research. For any questions, please create an issue or email [email protected], and I will get back to you as soon as possible.

Owner
Rabeeh Karimi Mahabadi
PhD student in NLP, working on representation learning and textual entailment
Rabeeh Karimi Mahabadi
Solutions of Reinforcement Learning 2nd Edition

Solutions of Reinforcement Learning, An Introduction

YIFAN WANG 1.4k Dec 30, 2022
Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks"

Easy-To-Hard The official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks". Gett

Avi Schwarzschild 52 Sep 08, 2022
Anti-UAV base on PaddleDetection

Paddle-Anti-UAV Anti-UAV base on PaddleDetection Background UAVs are very popular and we can see them in many public spaces, such as parks and playgro

Qingzhong Wang 2 Apr 20, 2022
ICCV2021 Oral SA-ConvONet: Sign-Agnostic Optimization of Convolutional Occupancy Networks

Sign-Agnostic Convolutional Occupancy Networks Paper | Supplementary | Video | Teaser Video | Project Page This repository contains the implementation

64 Jan 05, 2023
MMRazor: a model compression toolkit for model slimming and AutoML

Documentation: https://mmrazor.readthedocs.io/ English | 简体中文 Introduction MMRazor is a model compression toolkit for model slimming and AutoML, which

OpenMMLab 899 Jan 02, 2023
Retinal Vessel Segmentation with Pixel-wise Adaptive Filters (ISBI 2022)

Official code of Retinal Vessel Segmentation with Pixel-wise Adaptive Filters and Consistency Training (ISBI 2022)

anonymous 14 Oct 27, 2022
Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker This repository contai

Nikita 12 Dec 14, 2022
POPPY (Physical Optics Propagation in Python) is a Python package that simulates physical optical propagation including diffraction

POPPY: Physical Optics Propagation in Python POPPY (Physical Optics Propagation in Python) is a Python package that simulates physical optical propaga

Space Telescope Science Institute 132 Dec 15, 2022
Pytorch implementation of VAEs for heterogeneous likelihoods.

Heterogeneous VAEs Beware: This repository is under construction 🛠️ Pytorch implementation of different VAE models to model heterogeneous data. Here,

Adrián Javaloy 35 Nov 29, 2022
Main Results on ImageNet with Pretrained Models

This repository contains Pytorch evaluation code, training code and pretrained models for the following projects: SPACH (A Battle of Network Structure

Microsoft 151 Dec 14, 2022
FPGA: Fast Patch-Free Global Learning Framework for Fully End-to-End Hyperspectral Image Classification

FPGA & FreeNet Fast Patch-Free Global Learning Framework for Fully End-to-End Hyperspectral Image Classification by Zhuo Zheng, Yanfei Zhong, Ailong M

Zhuo Zheng 92 Jan 03, 2023
Classification Modeling: Probability of Default

Credit Risk Modeling in Python Introduction: If you've ever applied for a credit card or loan, you know that financial firms process your information

Aktham Momani 2 Nov 07, 2022
Concept drift monitoring for HA model servers.

{Fast, Correct, Simple} - pick three Easily compare training and production ML data & model distributions Goals Boxkite is an instrumentation library

98 Dec 15, 2022
Machine Translation Implement By Bi-GRU And Transformer

Seq2Seq Translation Implement By Bidirectional GRU And Transformer In Pytorch Before You Run The Code You should download the data through the link be

He Wang 2 Oct 27, 2021
Real-CUGAN - Real Cascade U-Nets for Anime Image Super Resolution

Real Cascade U-Nets for Anime Image Super Resolution 中文 | English 🔥 Real-CUGAN

tarsin 111 Dec 28, 2022
OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis Overview OpenABC-D is a large-scale labeled dataset generate

NYU Machine-Learning guided Design Automation (MLDA) 31 Nov 22, 2022
[ICCV 2021 Oral] Deep Evidential Action Recognition

DEAR (Deep Evidential Action Recognition) Project | Paper & Supp Wentao Bao, Qi Yu, Yu Kong International Conference on Computer Vision (ICCV Oral), 2

Wentao Bao 80 Jan 03, 2023
Changing the Mind of Transformers for Topically-Controllable Language Generation

We will first introduce the how to run the IPython notebook demo by downloading our pretrained models. Then, we will introduce how to run our training and evaluation code.

IESL 20 Dec 06, 2022
This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

SO-Pose This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation This paper is basically an

shangbuhuan 52 Nov 25, 2022
Repo for EMNLP 2021 paper "Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression"

beyond-preserved-accuracy Repo for EMNLP 2021 paper "Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression" How to implemen

Kevin Canwen Xu 10 Dec 23, 2022