# Bottleneck Transformers for Visual Recognition

## Experiments

| Model | Params (M) | Acc (%) |
|:--|:-:|:-:|
| ResNet50 baseline (ref) | 23.5 | 93.62 |
| BoTNet-50 | 18.8 | 95.11 |
| BoTNet-S1-50 | 18.8 | 95.67 |
| BoTNet-S1-59 | 27.5 | 95.98 |
| BoTNet-S1-77 | 44.9 | wip |

## Usage (example)

- Model

      import torch
      from model import ResNet50

      model = ResNet50(num_classes=1000, resolution=(224, 224))
      x = torch.randn([2, 3, 224, 224])
      print(model(x).size())

- Module

      from model import MHSA

      planes = 512     # number of input channels (example value, not from the source)
      resolution = 14  # spatial size of the feature map fed to the attention layer
      mhsa = MHSA(planes, width=resolution, height=resolution)

## Reference

- Paper: Bottleneck Transformers for Visual Recognition
- Authors: Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
- Organizations: UC Berkeley, Google Research
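
## MHSA sketch (illustrative)

The `MHSA` module in the usage example takes a channel count plus the width and height of the feature map. As a rough illustration of what such a layer could compute, here is a minimal sketch of 2D multi-head self-attention whose logits are the sum of a content-content term and a content-position term. The class name `MHSASketch`, the `heads=4` default, the unscaled logits, and the learned row/column position embeddings are assumptions made for this example; this is not the code in `model.py`.

```python
import torch
import torch.nn as nn


class MHSASketch(nn.Module):
    """Illustrative 2D multi-head self-attention (hypothetical, not the repo's MHSA)."""

    def __init__(self, n_dims, width=14, height=14, heads=4):
        super().__init__()
        if n_dims % heads != 0:
            raise ValueError("n_dims must be divisible by heads")
        self.heads = heads
        self.head_dim = n_dims // heads
        # 1x1 convolutions act as per-pixel linear projections.
        self.query = nn.Conv2d(n_dims, n_dims, kernel_size=1)
        self.key = nn.Conv2d(n_dims, n_dims, kernel_size=1)
        self.value = nn.Conv2d(n_dims, n_dims, kernel_size=1)
        # Assumed position term: learned embeddings split into a per-row and
        # a per-column part, broadcast-added into a (height, width) grid.
        self.rel_h = nn.Parameter(torch.randn(1, heads, self.head_dim, height, 1))
        self.rel_w = nn.Parameter(torch.randn(1, heads, self.head_dim, 1, width))

    def forward(self, x):
        # x: (batch, channels, height, width); the spatial size must match
        # the height and width given to the constructor.
        b, c, h, w = x.shape
        n = h * w
        q = self.query(x).reshape(b, self.heads, self.head_dim, n)
        k = self.key(x).reshape(b, self.heads, self.head_dim, n)
        v = self.value(x).reshape(b, self.heads, self.head_dim, n)

        # Content-content logits: entry [i, j] is q_i . k_j, shape (b, heads, n, n).
        content_content = torch.matmul(q.transpose(-2, -1), k)
        # Content-position logits: entry [i, j] is q_i . r_j.
        pos = (self.rel_h + self.rel_w).reshape(1, self.heads, self.head_dim, n)
        content_position = torch.matmul(q.transpose(-2, -1), pos)

        # Softmax over the key positions (logits are left unscaled for brevity).
        attn = torch.softmax(content_content + content_position, dim=-1)
        # Weighted sum of values for each query position: (b, heads, head_dim, n).
        out = torch.matmul(v, attn.transpose(-2, -1))
        return out.reshape(b, c, h, w)


if __name__ == "__main__":
    layer = MHSASketch(64, width=14, height=14)
    x = torch.randn(2, 64, 14, 14)
    print(layer(x).shape)  # same shape as the input
```

Because the output has the same shape as the input, a layer like this can stand in for a 3x3 convolution inside a residual bottleneck block. The attention matrix grows with the square of `height * width`, which is one reason to apply it only to low-resolution feature maps such as the 14x14 example above.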