Code accompanying the paper Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs (Chen et al., CVPR 2020, Oral).

Last update: Dec 29, 2022

Overview

Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

This repository contains the PyTorch implementation of our paper Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs (CVPR 2020).

Prerequisites

Python 3 and PyTorch 1.3.

# clone the repository
git clone https://github.com/cshizhe/asg2cap.git
cd asg2cap
# clone the caption evaluation code
git clone https://github.com/cshizhe/eval_cap.git
export PYTHONPATH=$(pwd):${PYTHONPATH}

Training & Inference

cd controlimcap/driver

# supported caption models: [node, node.role,
# rgcn, rgcn.flow, rgcn.memory, rgcn.flow.memory]
# see our paper for details
mtype=rgcn.flow.memory

# setup config files
# you should modify data paths in configs/prepare_*_imgsg_config.py
python configs/prepare_coco_imgsg_config.py $mtype
resdir='' # copy the output string of the previous step

# training
python asg2caption.py $resdir/model.json $resdir/path.json $mtype --eval_loss --is_train --num_workers 8

# inference
python asg2caption.py $resdir/model.json $resdir/path.json $mtype --eval_set tst --num_workers 8

Datasets

Annotations

Annotations for the MSCOCO and VisualGenome datasets can be downloaded from Google Drive.

(Image, ASG, Caption) annotations: regionfiles/image_id.json

JSON Format:
{
	"region_id": {
		"objects": [
			{
				"object_id": int,
				"name": str,
				"attributes": [str],
				"x": int,
				"y": int,
				"w": int,
				"h": int
			}],
		"relationships": [
			{
				"relationship_id": int,
				"subject_id": int,
				"object_id": int,
				"name": str
			}],
		"phrase": str
	}
}

vocabularies: int2word.npy ([word]), word2int.json ({word: int})

data splits: public_split directory (trn_names.npy, val_names.npy, tst_names.npy)
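
As a rough illustration of the annotation format above, the following sketch turns one region entry into a node list and an edge list and encodes its phrase with a word2int-style mapping. The sample record, the helper names region_to_graph and encode_phrase, the whitespace tokenization, and the unk_id fallback are all assumptions made for this example; they are not taken from the repository code.

```python
import json

# Invented sample that follows the JSON format described above.
# It is NOT real dataset content.
SAMPLE = """
{
  "1001": {
    "objects": [
      {"object_id": 1, "name": "dog", "attributes": ["brown"],
       "x": 10, "y": 20, "w": 50, "h": 40},
      {"object_id": 2, "name": "frisbee", "attributes": [],
       "x": 70, "y": 15, "w": 20, "h": 10}
    ],
    "relationships": [
      {"relationship_id": 7, "subject_id": 1, "object_id": 2,
       "name": "catching"}
    ],
    "phrase": "a brown dog catching a frisbee"
  }
}
"""


def region_to_graph(region):
    """Return (nodes, edges) for one region entry (hypothetical helper).

    nodes: {object_id: (name, attributes)}
    edges: [(subject_id, relationship_name, object_id)]
    """
    nodes = {o["object_id"]: (o["name"], o["attributes"])
             for o in region["objects"]}
    edges = [(r["subject_id"], r["name"], r["object_id"])
             for r in region["relationships"]]
    return nodes, edges


def encode_phrase(phrase, word2int, unk_id=None):
    """Map a phrase to integer ids (hypothetical helper).

    Whitespace tokenization and the unk_id fallback are assumptions.
    """
    return [word2int.get(word, unk_id) for word in phrase.split()]


regions = json.loads(SAMPLE)
for region_id, region in regions.items():
    nodes, edges = region_to_graph(region)
    print(region_id, nodes, edges, region["phrase"])
```

For real data, the string passed to json.loads would be replaced by the contents of a file under regionfiles/, and word2int would be loaded from word2int.json.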

Features

Features for MSCOCO and VisualGenome datasets are available at BaiduNetdisk (code: 6q32).

We also provide pretrained models and code to extract features for new images.

Global Image Feature: the last mean pooling feature of ResNet101 pretrained on ImageNet

format: npy array, shape=(num_fts, dim_ft) corresponding to the order in data_split names

Region Image Feature: fc7 layer of Faster-RCNN pretrained on VisualGenome

format: hdf5 files, "image_id".jpg.hdf5

key: 'image_id'.jpg

attrs: {"image_w": int, "image_h": int, "boxes": 4d array (x1, y1, x2, y2)}
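
The annotation objects store boxes as (x, y, w, h), while the region feature files store them as (x1, y1, x2, y2). A minimal sketch of converting between the two is shown below. It assumes that (x, y) is the top-left corner of the box, which this README does not state, and the helper names are hypothetical. The normalization step is only one possible use of the image_w and image_h attributes, not necessarily what the repository does.

```python
def xywh_to_xyxy(obj):
    """Convert an annotation object's box to (x1, y1, x2, y2).

    Assumption: (x, y) is the top-left corner and (w, h) the box size.
    """
    return (obj["x"], obj["y"], obj["x"] + obj["w"], obj["y"] + obj["h"])


def normalize_box(box, image_w, image_h):
    """Scale (x1, y1, x2, y2) to the [0, 1] range using the image size."""
    x1, y1, x2, y2 = box
    return (x1 / image_w, y1 / image_h, x2 / image_w, y2 / image_h)


# Example with invented numbers.
box = xywh_to_xyxy({"x": 10, "y": 20, "w": 50, "h": 40})
print(box)
print(normalize_box(box, image_w=100, image_h=80))
```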

Result Visualization

Citations

If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:

@article{chen2020say,
  title={Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs},
  author={Chen, Shizhe and Jin, Qin and Wang, Peng and Wu, Qi},
  journal={CVPR},
  year={2020}
}

License

MIT License

Owner

Shizhe Chen
