Code for paper Adaptively Aligned Image Captioning via Adaptive Attention Time

Last update: Aug 27, 2022

Overview

Adaptively Aligned Image Captioning via Adaptive Attention Time

This repository includes the implementation for Adaptively Aligned Image Captioning via Adaptive Attention Time.

Requirements

Python 3.6
Java 1.8.0
PyTorch 1.0
cider
coco-caption
tensorboardX

Training AAT

Prepare data (with Python 2)

See details in data/README.md.

Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.
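To illustrate how a word-count threshold shapes the vocabulary, here is a minimal sketch. It is not the code in scripts/prepro_labels.py; the function names `build_vocab` and `encode` are hypothetical, and it assumes that words occurring more than `word_count_threshold` times are kept while all others are mapped to a single `UNK` token.

```python
from collections import Counter


def build_vocab(captions, word_count_threshold):
    """Keep words seen more than `word_count_threshold` times (assumed rule)."""
    counts = Counter(word for caption in captions for word in caption)
    vocab = sorted(w for w, n in counts.items() if n > word_count_threshold)
    # Add a catch-all token only if at least one word was filtered out.
    if any(n <= word_count_threshold for n in counts.values()):
        vocab.append("UNK")
    return vocab


def encode(caption, vocab):
    """Replace out-of-vocabulary words with UNK."""
    known = set(vocab)
    return [w if w in known else "UNK" for w in caption]


# Toy captions, for illustration only.
captions = [
    ["a", "dog", "runs"],
    ["a", "dog", "sleeps"],
    ["a", "cat", "sleeps"],
]
vocab = build_vocab(captions, word_count_threshold=1)
print(vocab)
print(encode(["a", "cat", "runs"], vocab))
```

A higher threshold drops more rare words and therefore yields a smaller vocabulary.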

You should also preprocess the dataset to build the n-gram cache used to compute the CIDEr score during SCST (self-critical sequence training):

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
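As a rough picture of what such a cache holds, the sketch below counts, for each n-gram, how many images have it in at least one reference caption. CIDEr uses this document frequency for its TF-IDF weighting. This is an assumption-laden illustration, not the contents of scripts/prepro_ngrams.py; the helper names `ngrams` and `document_frequency` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(tokens, max_n=4):
    """Count all n-grams of length 1..max_n in one tokenized caption."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def document_frequency(refs_per_image, max_n=4):
    """For each n-gram, count the images whose references contain it."""
    df = defaultdict(int)
    for refs in refs_per_image:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref, max_n).keys())
        # Each image contributes at most 1 per n-gram.
        for gram in seen:
            df[gram] += 1
    return df


# Toy reference captions: two images, for illustration only.
refs_per_image = [
    [["a", "dog", "runs"], ["the", "dog", "runs"]],
    [["a", "cat", "sleeps"]],
]
df = document_frequency(refs_per_image, max_n=2)
print(df[("a",)], df[("dog", "runs")])
```

Computing these counts once on the training split avoids recomputing them for every reward evaluation during training.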

Training

$ sh train-aat.sh

See opts.py for the options.

Evaluation

$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aat_rl/model.pth --infos_path log/log_aat_rl/infos_aat.pkl --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test

Reference

If you find this repo helpful, please consider citing:

@inproceedings{huang2019adaptively,
  title = {Adaptively Aligned Image Captioning via Adaptive Attention Time},
  author = {Huang, Lun and Wang, Wenmin and Xia, Yaxian and Chen, Jie},
  booktitle = {Advances in Neural Information Processing Systems 32},
  year={2019}
}

Acknowledgements

This repository is based on Ruotian Luo's self-critical.pytorch.

Owner

Lun Huang