Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

Last update: Dec 17, 2022

Overview

Introduction

This repository is for X-Linear Attention Networks for Image Captioning (CVPR 2020). The original paper can be found here.

Please cite with the following BibTeX:

@inproceedings{xlinear2020cvpr,
  title={X-Linear Attention Networks for Image Captioning},
  author={Pan, Yingwei and Yao, Ting and Li, Yehao and Mei, Tao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Requirements

Python 3
CUDA 10
numpy
tqdm
easydict
PyTorch (>1.0)
torchvision
coco-caption

Data preparation

Download the bottom up features and convert them to npz files

python2 tools/create_feats.py --infeats bottom_up_tsv --outfolder ./mscoco/feature/up_down_10_100

Download the annotations into the mscoco folder. More details about data preparation can be referred to self-critical.pytorch
Download coco-caption and setup the path of __C.INFERENCE.COCO_PATH in lib/config.py
The pretrained models and results can be downloaded here.
The pretrained SENet-154 model can be downloaded here.

Training

Train X-LAN model

bash experiments/xlan/train.sh

Train X-LAN model using self critical

Copy the pretrained model into experiments/xlan_rl/snapshot and run the script

bash experiments/xlan_rl/train.sh

Train X-LAN transformer model

bash experiments/xtransformer/train.sh

Train X-LAN transformer model using self critical

Copy the pretrained model into experiments/xtransformer_rl/snapshot and run the script

bash experiments/xtransformer_rl/train.sh

Evaluation

CUDA_VISIBLE_DEVICES=0 python3 main_test.py --folder experiments/model_folder --resume model_epoch

Acknowledgements

Thanks the contribution of self-critical.pytorch and awesome PyTorch team.

Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]

Related tags

Overview

Introduction

Requirements

Data preparation

Training

Train X-LAN model

Train X-LAN model using self critical

Train X-LAN transformer model

Train X-LAN transformer model using self critical

Evaluation

Acknowledgements

Owner

JDAI-CV

The official PyTorch implementation of paper BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition

Revealing and Protecting Labels in Distributed Training

Text Generation by Learning from Demonstrations

Implementation of ProteinBERT in Pytorch

Reinforcement Learning via Supervised Learning

Unbalanced Feature Transport for Exemplar-based Image Translation (CVPR 2021)

MagFace: A Universal Representation for Face Recognition and Quality Assessment

Python library for computer vision labeling tasks. The core functionality is to translate bounding box annotations between different formats-for example, from coco to yolo.

Official Pytorch Code for the paper TransWeather

This repository provides an unified frameworks to train and test the state-of-the-art few-shot font generation (FFG) models.

Official TensorFlow code for the forthcoming paper

Data augmentation for NLP, accepted at EMNLP 2021 Findings

The undersampled DWI image using Slice-Interleaved Diffusion Encoding (SIDE) method can be reconstructed by the UNet network.

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Adaptive FNO transformer - official Pytorch implementation

PASSL包含 SimCLR，MoCo，BYOL，CLIP等基于对比学习的图像自监督算法以及 Vision-Transformer，Swin-Transformer，BEiT，CVT，T2T，MLP_Mixer等视觉Transformer算法

Jingju baseline - A baseline model of our project of Beijing opera script generation

Official Chainer implementation of GP-GAN: Towards Realistic High-Resolution Image Blending (ACMMM 2019, oral)

Code repository for "Stable View Synthesis".

A general-purpose programming language, focused on simplicity, safety and stability.