# Shunted Transformer
This is the official implementation of "Shunted Self-Attention via Multi-Scale Token Aggregation" by Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, and Xinchao Wang.
## Training from scratch

```bash
bash dist_train.sh
```
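Scripts like `dist_train.sh` typically wrap a PyTorch distributed launch. The sketch below is a hypothetical illustration of what such a wrapper might look like; the entry point (`main.py`), the `--config` flag, and the config path are assumptions, not taken from this repository — check `dist_train.sh` itself for the actual command and arguments.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a distributed-training wrapper.
# NUM_GPUS defaults to 8 but can be overridden from the environment,
# e.g. NUM_GPUS=4 bash dist_train.sh
NUM_GPUS=${NUM_GPUS:-8}

# torchrun spawns one training process per GPU on this node.
# main.py and --config are placeholders for the repo's real entry point.
torchrun --nproc_per_node="$NUM_GPUS" main.py --config configs/example.py
```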
## Citation

```bibtex
@misc{ren2021shunted,
    title={Shunted Self-Attention via Multi-Scale Token Aggregation},
    author={Sucheng Ren and Daquan Zhou and Shengfeng He and Jiashi Feng and Xinchao Wang},
    year={2021},
    eprint={2111.15193},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```