Code repo for "Transformer on a Diet" paper

Overview

Transformer on a Diet

Reference: C Wang, Z Ye, A Zhang, Z Zhang, A Smola. "Transformer on a Diet". arXiv preprint arXiv (2020).

Installation

pip install --pre --upgrade mxnet
pip install gluonnlp

Results

The results and the command line to reproduce the results on PTB dataset are as follows.

[1] Full (Val PPL 109.19 Test PPL 103.72)

$ cd scripts/language_model/
$ python transformer_language_model.py --model full --data ptb --emsize 320 --nhid 2000 --nlayers 3 --lr 10 --epochs 500 --batch_size 20 --bptt 70 --dropout 0.4 --dropout_h 0.25 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --alpha 0 --beta 0 --lr_update_interval 100 --lr_update_factor 1 --num_heads 16 --scaled --units 320 --use_residual --max_src_length 1000 --warmup_steps 0 --first_window_size 1 --kernel_size 3 --d_base 2

[2] Dilated (Val PPL 115.67 Test PPL 110.92)

$ cd scripts/language_model/
$ python transformer_language_model.py --model dilated --data ptb --emsize 320 --nhid 2000 --nlayers 3 --lr 10 --epochs 500 --batch_size 20 --bptt 70 --dropout 0.4 --dropout_h 0.25 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --alpha 0 --beta 0 --lr_update_interval 100 --lr_update_factor 1 --num_heads 16 --scaled --units 320 --use_residual --max_src_length 1000 --warmup_steps 0 --first_window_size 1 --kernel_size 3 --d_base 2

[3] Dilated-Memory (Val PPL 115.35 Test PPL 110.98)

$ cd scripts/language_model/
$ python transformer_language_model.py --model dilated_mem --data ptb --emsize 320 --nhid 2000 --nlayers 3 --lr 10 --epochs 500 --batch_size 20 --bptt 70 --dropout 0.4 --dropout_h 0.25 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --alpha 0 --beta 0 --lr_update_interval 100 --lr_update_factor 1 --num_heads 16 --scaled --units 320 --use_residual --max_src_length 1000 --warmup_steps 0 --first_window_size 1 --kernel_size 3 --d_base 2

[4] Cascade (Val PPL 109.16 Test PPL 105.27)

$ cd scripts/language_model/
$ python transformer_language_model.py --model cascade --data ptb --emsize 320 --nhid 2000 --nlayers 3 --lr 10 --epochs 500 --batch_size 20 --bptt 70 --dropout 0.4 --dropout_h 0.25 --dropout_i 0 --dropout_e 0 --weight_drop 0 --tied --alpha 0 --beta 0 --lr_update_interval 100 --lr_update_factor 1 --num_heads 16 --scaled --units 320 --use_residual --max_src_length 1000 --warmup_steps 0 --first_window_size 4 --window_size_multiplier 2 --kernel_size 3 --d_base 2

Note that the command to reproduce the results on wikitext-2 would be updated soon.

Reference Paper

The bibtext entry of the reference paper is:

@article{transformerdiet2020,
   title={Transformer on a Diet},
   author={Chenguang Wang and Zihao Ye and Aston Zhang and Zheng Zhang and Alexander J. Smola},
   journal={ArXiv},
   year={2020},
   volume={abs/2002.06170}
}
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.

Conceptual 12M We introduce the Conceptual 12M (CC12M), a dataset with ~12 million image-text pairs meant to be used for vision-and-language pre-train

Google Research Datasets 226 Dec 07, 2022
Azion the best solution of Edge Computing in the world.

Azion Edge Function docker action Create or update an Edge Functions on Azion Edge Nodes. The domain name is the key for decision to a create or updat

8 Jul 16, 2022
Code-free deep segmentation for computational pathology

NoCodeSeg: Deep segmentation made easy! This is the official repository for the manuscript "Code-free development and deployment of deep segmentation

André Pedersen 26 Nov 23, 2022
MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera

MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera

Felix Wimbauer 494 Jan 06, 2023
Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch

Retrieval-Augmented Denoising Diffusion Probabilistic Models (wip) Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in P

Phil Wang 55 Jan 01, 2023
Keyword spotting on Arm Cortex-M Microcontrollers

Keyword spotting for Microcontrollers This repository consists of the tensorflow models and training scripts used in the paper: Hello Edge: Keyword sp

Arm Software 1k Dec 30, 2022
The codes reproduce the figures and statistics in the paper, "Controlling for multiple covariates," by Mark Tygert.

The accompanying codes reproduce all figures and statistics presented in "Controlling for multiple covariates" by Mark Tygert. This repository also pr

Meta Research 1 Dec 02, 2021
Testing and Estimation of structural breaks in Stata

xtbreak estimating and testing for many known and unknown structural breaks in time series and panel data. For an overview of xtbreak test see xtbreak

Jan Ditzen 13 Jun 19, 2022
This is the repo for the paper `SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization'. (published in Bioinformatics'21)

SumGNN: Multi-typed Drug Interaction Prediction via Efficient Knowledge Graph Summarization This is the code for our paper ``SumGNN: Multi-typed Drug

Yue Yu 58 Dec 21, 2022
Generate vibrant and detailed images using only text.

CLIP Guided Diffusion From RiversHaveWings. Generate vibrant and detailed images using only text. See captions and more generations in the Gallery See

Clay M. 401 Dec 28, 2022
PyTorch implementation of PP-LCNet

PP-LCNet-Pytorch Pre-Trained Models Google Drive p018 Accuracy Models Top1 Top5 PPLCNet_x0_25 0.5186 0.7565 PPLCNet_x0_35 0.5809 0.8083 PPLCNet_x0_5 0

24 Dec 12, 2022
Codes for ACL-IJCNLP 2021 Paper "Zero-shot Fact Verification by Claim Generation"

Zero-shot-Fact-Verification-by-Claim-Generation This repository contains code and models for the paper: Zero-shot Fact Verification by Claim Generatio

Liangming Pan 47 Jan 01, 2023
Predicting Auction Sale Price using the kaggle bulldozer auction sales data: Modeling with Ensembles vs Neural Network

Predicting Auction Sale Price using the kaggle bulldozer auction sales data: Modeling with Ensembles vs Neural Network The performances of tree ensemb

Mustapha Unubi Momoh 2 Sep 13, 2022
List some popular DeepFake models e.g. DeepFake, FaceSwap-MarekKowal, IPGAN, FaceShifter, FaceSwap-Nirkin, FSGAN, SimSwap, CihaNet, etc.

deepfake-models List some popular DeepFake models e.g. DeepFake, CihaNet, SimSwap, FaceSwap-MarekKowal, IPGAN, FaceShifter, FaceSwap-Nirkin, FSGAN, Si

Mingcan Xiang 100 Dec 17, 2022
Magic tool for managing internet connection in local network by @zalexdev

Megacut ✂️ A new powerful Python3 tool for managing internet on a local network Installation git clone https://github.com/stryker-project/megacut cd m

Stryker 12 Dec 15, 2022
Repository for publicly available deep learning models developed in Rosetta community

trRosetta2 This package contains deep learning models and related scripts used by Baker group in CASP14. Installation Linux/Mac clone the package git

81 Dec 29, 2022
Implementation detail for paper "Multi-level colonoscopy malignant tissue detection with adversarial CAC-UNet"

Multi-level-colonoscopy-malignant-tissue-detection-with-adversarial-CAC-UNet Implementation detail for our paper "Multi-level colonoscopy malignant ti

CVSM Group - email: <a href=[email protected]"> 84 Nov 22, 2022
This is an official implementation for "PlaneRecNet".

PlaneRecNet This is an official implementation for PlaneRecNet: A multi-task convolutional neural network provides instance segmentation for piece-wis

yaxu 50 Nov 17, 2022
Joint Learning of 3D Shape Retrieval and Deformation, CVPR 2021

Joint Learning of 3D Shape Retrieval and Deformation Joint Learning of 3D Shape Retrieval and Deformation Mikaela Angelina Uy, Vladimir G. Kim, Minhyu

Mikaela Uy 38 Oct 18, 2022
Forecasting directional movements of stock prices for intraday trading using LSTM and random forest

Forecasting directional movements of stock-prices for intraday trading using LSTM and random-forest https://arxiv.org/abs/2004.10178 Pushpendu Ghosh,

Pushpendu Ghosh 270 Dec 24, 2022