GoodNews Everyone! Context driven entity aware captioning for news images

Related tags

Deep LearningGoodNews
Overview

This is the code for a CVPR 2019 paper, called GoodNews Everyone! Context driven entity aware captioning for news images. Enjoy!

Model preview:

GoodNews Model!

Huge Thanks goes to New York Times API for providing such a service for FREE!

Another Thanks to @ruotianluo for providing the captioning code.

Dependencies/Requirements:

pytorch==1.0.0
spacy==2.0.11
h5py==2.7.0
bs4==4.5.3
joblib==0.12.2
nltk==3.2.3
tqdm==4.19.5
urllib2==2.7
goose==1.0.25
urlparse
unidecode

Introduction

We took the first steps to move the captioning systems to interpretation (see the paper for more detail). To this end, we have used New York Times API to retrieve the articles, images and captions.

The structure of this repo is as follows:

  1. Getting the data
  2. Cleaning and formating the data
  3. How to train models

Get the data

You have 3 options to get the data.

Images only

If you want to download the images only and directly start working on the same dataset as ours, then download the cleaned version of the dataset without images: article+caption.json and put it to data/ folder and download the img_urls.json and put it in the get_data/get_images_only/ folder.

Then run

python get_images.py --num_thread 16

Then, you will get the images. After that move to Clean and Format Data section.

PS: I have recieved numerous emails regarding some of the images not present/broken in the img_urls.json. Which is why I decided to put the images on the drive to download in the name of open science. Download all images

Images + articles

If you would like the get the raw version of the article and captions to do your own cleaning and processing, no worries! First download the article_urls and go to folder get_data/with_article_urls/ and run

python get_data_with_urls.py --num_thread 16
python combine_dataset.py 

This will get you the raw version of the caption, articles and also the images. After that move to Clean and Format Data section.

I want more!

As you know, New York Times is huge. Their articles starts from 1881 (It is crazy!) until well today. So in case you want to get ALL the data or expand the data to more years, then first step is go to New York Times API and get an API key. All you have to do is just sign up for the API key.

Once you have the key go to folder get_data/with_api/ and run

python retrieve_all_urls.py --api-key XXXX --start_year XXX --end_year XXX 

This is for getting the article urls and then saving in the format of month-year. Once you have the all urls from the API, then you run

python get_data_api.py
python combine_dataset.py

get_data_api.py retrieves the articles, captions and images. combine_dataset.py combines yearly data into one file after removing data points if they have corrupt image, empty articles or empty captions. After that move to Clean and Format Data section.

Small Note

I also provide the links to images and their data splits (train, val, test). Even though I always use random seed to decide the split, just in case If the GODS meddles with the random seed, here is the link to a json where you can find each image and its split: img_splits.json

Clean and Format the Data

Now that we have the data, it is time to clean, preprocess and format the data.

Preprocess

When you reach this part, you must have captioning_dataset.json in your data/ folder.

Captions

This part is for cleaning the captions (tokenizing, removing non-ascii characters, etc.), splitting train, val, and test and creating anonymize captions.

In other words, we change the caption "Alber Einstein taught in Princeton in 1926" to "PERSON_ taught in ORGANIZATION_ in DATE_." Move to preprocess/ folder and run

python clean_captions.py

Resize Images

To resize the images to 256x256:

python resize.py --root XXXX --img_size 256

Articles

Get the article format that is needed for the encoding methods by running: create_article_set.py

python create_article_set.py

Format

Now to create H5 file for captions, images and articles, just need to go to scripts/ folder and run in order

python prepro_labels.py --max_length 31 --word_count_threshold 4
python prepro_images.py

We proposed 3 different article encoding method. You can download each of encoded article methods, articles_full_avg_, articles_full_wavg, articles_full_TBB.

Or you can use the code to obtain them:

python prepro_articles_avg.py
python prepro_articles_wavg.py
python prepro_articles_tbb.py

Train

Finally we are ready to train. Magical words are:

python train.py --cnn_weight [YOUR HOME DIRECTORY]/.torch/resnet152-b121ed2d.pth 

You can check the opt.py for changing a lot of the options such dimension size, different models, hyperparameters, etc.

Evaluate

After you train your models, you can get the score according commonly used metrics: Bleu, Cider, Spice, Rouge, Meteor. Be sure to specify model_path, cnn_model_path, infos_path and sen_embed_path when runing eval.py. eval.py is usually used in training but it is necessary to run it to get the insertion.

Insertion

Last but not least insert.py. After you run eval.py, it will produce you a json file with the ids and their template captions. To fill the correct named entity, you have to run insert.py:

python insert.py --output [XXX] --dump [True/False] --insertion_method ['ctx', 'att', 'rand']

PS: I have been requested to provide model's output, so I thought it would be best to share it with everyone. Model Output In this folder, you have:

test.json: Test set with raw and template version of the caption.

article.json: Article sentences which is needed in the insert.py.

w/o article folder: All the models output on template captions, without articles.

with article folder: Our models output in the paper with sentence attention(sen_att) and image attention(vis_att), provided in the json. Hope this is helpful to more of you.

Conclusion

Thank you and sorry for the bugs!

Official code for MPG2: Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN

This is the official code for Multi-attribute Pizza Generator (MPG2): Cross-domain Attribute Control with Conditional StyleGAN. Paper Demo Setup Envir

Fangda Han 5 Sep 01, 2022
YuNetのPythonでのONNX、TensorFlow-Lite推論サンプル

YuNet-ONNX-TFLite-Sample YuNetのPythonでのONNX、TensorFlow-Lite推論サンプルです。 TensorFlow-LiteモデルはPINTO0309/PINTO_model_zoo/144_YuNetのものを使用しています。 Requirement Op

KazuhitoTakahashi 8 Nov 17, 2021
Transfer Learning for Pose Estimation of Illustrated Characters

bizarre-pose-estimator Transfer Learning for Pose Estimation of Illustrated Characters Shuhong Chen *, Matthias Zwicker * WACV2022 [arxiv] [video] [po

Shuhong Chen 142 Dec 28, 2022
sktime companion package for deep learning based on TensorFlow

NOTE: sktime-dl is currently being updated to work correctly with sktime 0.6, and wwill be fully relaunched over the summer. The plan is Refactor and

sktime 573 Jan 05, 2023
Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

ORB-SLAM2 Authors: Raul Mur-Artal, Juan D. Tardos, J. M. M. Montiel and Dorian Galvez-Lopez (DBoW2) 13 Jan 2017: OpenCV 3 and Eigen 3.3 are now suppor

Raul Mur-Artal 7.8k Dec 30, 2022
Western-3DSlicer-Modules - Point-Set Registrations for Ultrasound Probe Calibrations

Point-Set Registrations for Ultrasound Probe Calibrations -Undergraduate Thesis-

Matteo Tanzi 0 May 04, 2022
Lipschitz-constrained Unsupervised Skill Discovery

Lipschitz-constrained Unsupervised Skill Discovery This repository is the official implementation of Seohong Park, Jongwook Choi*, Jaekyeom Kim*, Hong

Seohong Park 17 Dec 18, 2022
[TOG 2021] PyTorch implementation for the paper: SofGAN: A Portrait Image Generator with Dynamic Styling.

This repository contains the official PyTorch implementation for the paper: SofGAN: A Portrait Image Generator with Dynamic Styling. We propose a SofGAN image generator to decouple the latent space o

Anpei Chen 694 Dec 23, 2022
Deep Learning for Human Part Discovery in Images - Chainer implementation

Deep Learning for Human Part Discovery in Images - Chainer implementation NOTE: This is not official implementation. Original paper is Deep Learning f

Shintaro Shiba 63 Sep 25, 2022
This repository contains the code needed to train Mega-NeRF models and generate the sparse voxel octrees

Mega-NeRF This repository contains the code needed to train Mega-NeRF models and generate the sparse voxel octrees used by the Mega-NeRF-Dynamic viewe

cmusatyalab 260 Dec 28, 2022
Styled Augmented Translation

SAT Style Augmented Translation Introduction By collecting high-quality data, we were able to train a model that outperforms Google Translate on 6 dif

139 Dec 29, 2022
Code for NAACL 2021 full paper "Efficient Attentions for Long Document Summarization"

LongDocSum Code for NAACL 2021 paper "Efficient Attentions for Long Document Summarization" This repository contains data and models needed to reprodu

56 Jan 02, 2023
a general-purpose Transformer based vision backbone

Swin Transformer By Ze Liu*, Yutong Lin*, Yue Cao*, Han Hu*, Yixuan Wei, Zheng Zhang, Stephen Lin and Baining Guo. This repo is the official implement

Microsoft 9.9k Jan 08, 2023
Everything about being a TA for ITP/AP course!

تی‌ای بودن! تی‌ای یا دستیار استاد از نقش‌های رایج بین دانشجویان مهندسی است، این ریپوزیتوری قرار است نکات مهم درمورد تی‌ای بودن و تی ای شدن را به ما نش

<a href=[email protected]"> 14 Sep 10, 2022
Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter Code and checkpoints for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling

274 Dec 06, 2022
VGGVox models for Speaker Identification and Verification trained on the VoxCeleb (1 & 2) datasets

VGGVox models for speaker identification and verification This directory contains code to import and evaluate the speaker identification and verificat

338 Dec 27, 2022
Einshape: DSL-based reshaping library for JAX and other frameworks.

Einshape: DSL-based reshaping library for JAX and other frameworks. The jnp.einsum op provides a DSL-based unified interface to matmul and tensordot o

DeepMind 62 Nov 30, 2022
Code for Max-Margin Contrastive Learning - AAAI 2022

Max-Margin Contrastive Learning This is a pytorch implementation for the paper Max-Margin Contrastive Learning accepted to AAAI 2022. This repository

Anshul Shah 12 Oct 22, 2022
Exe-to-xlsm - Simple script to create VBscript of exe and inject to xlsm

🎁 Exe To Office Executable file injection to Office documents: .xlsm, .docm, .p

3 Jan 25, 2022