PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models

Last update: Dec 06, 2022

This repository is the official implementation of the following paper:

PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models
Chaoyang He (USC), Shen Li (Facebook AI Research), Mahdi Soltanolkotabi (USC), Salman Avestimehr (USC)
Accepted to ICML 2021 (International Conference on Machine Learning 2021)

1. Introduction

The size of Transformer models is growing at an unprecedented rate. It has taken less than one year to reach trillion-level parameters since the release of GPT-3 (175B). Training such models requires both substantial engineering effort and enormous computing resources, which are luxuries most research teams cannot afford. In this paper, we propose PipeTransformer, which leverages automated elastic pipelining for efficient distributed training of Transformer models. In PipeTransformer, we design an adaptive on-the-fly freeze algorithm that identifies and gradually freezes some layers during training, and an elastic pipelining system that dynamically allocates resources to train the remaining active layers. More specifically, PipeTransformer automatically excludes frozen layers from the pipeline, packs active layers into fewer GPUs, and forks more replicas to increase data-parallel width. We evaluate PipeTransformer using Vision Transformer (ViT) on ImageNet and BERT on the SQuAD and GLUE datasets. Our results show that, compared to the state-of-the-art baseline, PipeTransformer attains up to a 2.83-fold speedup without losing accuracy. We also provide various performance analyses for a more comprehensive understanding of our algorithmic and system-wise design. Finally, we have modularized our training system with flexible APIs and made the source code publicly available.
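The resource reallocation described above can be illustrated with a small back-of-the-envelope sketch. It is not the repository's API: the function `plan_resources` and its parameters are hypothetical, and it assumes that every layer costs the same and that a GPU holds a fixed number of active layers. It only shows how freezing layers shortens the pipeline and frees GPUs for more data-parallel replicas.

```python
import math


def plan_resources(num_layers, num_frozen, num_gpus, layers_per_gpu):
    """Return (pipeline_len, dp_width) for the remaining active layers.

    Hypothetical helper for illustration only. Assumptions: all layers
    have equal cost, and each GPU can hold `layers_per_gpu` active layers.
    """
    if not 0 <= num_frozen < num_layers:
        raise ValueError("num_frozen must be in [0, num_layers)")
    active = num_layers - num_frozen
    # Pack the active layers into as few GPUs as possible.
    pipeline_len = min(num_gpus, math.ceil(active / layers_per_gpu))
    # Use the freed GPUs to fork additional pipeline replicas.
    dp_width = num_gpus // pipeline_len
    return pipeline_len, dp_width


if __name__ == "__main__":
    # 12 Transformer layers on 8 GPUs, 3 active layers per GPU (assumed).
    for frozen in (0, 6, 9):
        print(frozen, plan_resources(12, frozen, 8, 3))
    # prints:
    # 0 (4, 2)
    # 6 (2, 4)
    # 9 (1, 8)
```

In this toy setting, freezing 9 of 12 layers shrinks the pipeline from 4 GPUs to 1 and raises the data-parallel width from 2 to 8. The actual system decides when to freeze and how to repartition adaptively during training; see the paper and the examples in this repository for the real behavior.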

2. Overall Design

3. Slides

https://docs.google.com/presentation/d/1t6HWL33KIQo2as0nSHeBpXYtTBcy0nXCoLiKd0EashY/edit?usp=sharing

4. Understanding PipeTransformer by Animation

https://videos.files.wordpress.com/3vsRzoiw/pipetransformer-animation_m4v_hd.mp4

5. Installation

Please follow INSTALL-CONDA.md.

6. Experiments

Check the README.md in each of the following directories:

examples/image_classification

examples/question_answering

examples/text_classification

7. Citation

If you use any part of this code in your research or any engineering project, please cite our paper:

@article{he2021pipetransformer,
  title={Pipetransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models},
  author={He, Chaoyang and Li, Shen and Soltanolkotabi, Mahdi and Avestimehr, Salman},
  journal={Thirty-eighth International Conference on Machine Learning},
  year={2021}
}

8. Contacts

Chaoyang He
https://chaoyanghe.com
