NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.

Overview

#NeuralTalk

Warning: Deprecated. Hi there, this code is now quite old and inefficient, and now deprecated. I am leaving it on Github for educational purposes, but if you would like to run or train image captioning I warmly recommend my new code release NeuralTalk2. NeuralTalk2 is written in Torch and is SIGNIFICANTLY (I mean, ~100x+) faster because it is batched and runs on the GPU. It also supports CNN finetuning, which helps a lot with performance.

This project contains Python+numpy source code for learning Multimodal Recurrent Neural Networks that describe images with sentences.

This line of work was recently featured in a New York Times article and has been the subject of multiple academic papers from the research community over the last few months. This code currently implements the models proposed by Vinyals et al. from Google (CNN + LSTM) and by Karpathy and Fei-Fei from Stanford (CNN + RNN). Both models take an image and predict its sentence description with a Recurrent Neural Network (either an LSTM or an RNN).

Overview

The pipeline for the project looks as follows:

  • The input is a dataset of images and 5 sentence descriptions that were collected with Amazon Mechanical Turk. In particular, this code base is set up for Flickr8K, Flickr30K, and MSCOCO datasets.
  • In the training stage, the images are fed as input to RNN and the RNN is asked to predict the words of the sentence, conditioned on the current word and previous context as mediated by the hidden layers of the neural network. In this stage, the parameters of the networks are trained with backpropagation.
  • In the prediction stage, a witheld set of images is passed to RNN and the RNN generates the sentence one word at a time. The results are evaluated with BLEU score. The code also includes utilities for visualizing the results in HTML.

Dependencies

Python 2.7, modern version of numpy/scipy, perl (if you want to do BLEU score evaluation), argparse module. Most of these are okay to install with pip. To install all dependencies at once, run the command pip install -r requirements.txt

I only tested this code with Ubuntu 12.04, but I tried to make it as generic as possible (e.g. use of os module for file system interactions etc. So it might work on Windows and Mac relatively easily.)

Protip: you really want to link your numpy to use a BLAS implementation for its matrix operations. I use virtualenv and link numpy against a system installation of OpenBLAS. Doing this will make this code almost an order of time faster because it relies very heavily on large matrix multiplies.

Getting started

  1. Get the code. $ git clone the repo and install the Python dependencies
  2. Get the data. I don't distribute the data in the Git repo, instead download the data/ folder from here. Also, this download does not include the raw image files, so if you want to visualize the annotations on raw images, you have to obtain the images from Flickr8K / Flickr30K / COCO directly and dump them into the appropriate data folder.
  3. Train the model. Run the training $ python driver.py (see many additional argument settings inside the file) and wait. You'll see that the learning code writes checkpoints into cv/ and periodically reports its status in status/ folder.
  4. Monitor the training. The status can be inspected manually by reading the JSON and printing whatever you wish in a second process. In practice I run cross-validations on a cluster, so my cv/ folder fills up with a lot of checkpoints that I further filter and inspect with other scripts. I am including my cluster training status visualization utility as well if you like. Run a local webserver (e.g. $ python -m SimpleHTTPServer 8123) and then open monitorcv.html in your browser on http://localhost:8123/monitorcv.html, or whatever the web server tells you the path is. You will have to edit the file to setup the paths properly and point it at the right json files.
  5. Evaluate model checkpoints. To evaluate a checkpoint from cv/, run the evaluate_sentence_predctions.py script and pass it the path to a checkpoint.
  6. Visualize the predictions. Use the included html file visualize_result_struct.html to visualize the JSON struct produced by the evaluation code. This will visualize the images and their predictions. Note that you'll have to download the raw images from the individual dataset pages and place them into the corresponding data/ folder.

Lastly, note that this is currently research code, so a lot of the documentation is inside individual Python files. If you wish to work with this code, you'll have to get familiar with it and be comfortable reading Python code.

Pretrained model

Some pretrained models can be found in the NeuralTalk Model Zoo. The slightly hairy part is that if you wish to apply these models to some arbitrary new image (one not from Flickr8k/30k/COCO) you have to first extract the CNN features. I use the 16-layer VGG network from Simonyan and Zisserman, because the model is beautiful, powerful and available with Caffe. There is opportunity for putting the preprocessing and inference into a single nice function that uses the Python wrapper to get the features and then runs the pretrained sentence model. I might add this in the future.

Using the model to predict on new images

The code allows you to easily predict and visualize results of running the model on COCO/Flickr8K/Flick30K images. If you want to run the code on arbitrary image (e.g. on your file system), things get a little more complicated because we need to first need to pipe your image through the VGG CNN to get the 4096-D activations on top.

Have a look inside the folder example_images for instructions on how to do this. Currently, the code for extracting the raw features from each image is in Matlab, so you will need it installed on your system. Caffe also has a wrapper for Python, but I wasn't yet able to use the Python wrapper to exactly reproduce the features I get from Matlab. The example_images will walk you through the process, and you will eventually use predict_on_images.py to run the prediction.

Using your own data

The input to the system is the data folder, which contains the Flickr8K, Flickr30K and MSCOCO datasets. In particular, each folder (e.g. data/flickr8k) contains a dataset.json file that stores the image paths and sentences in the dataset (all images, sentences, raw preprocessed tokens, splits, and the mappings between images and sentences). Each folder additionally contains vgg_feats.mat , which is a .mat file that stores the CNN features from all images, one per row, using the VGG Net from ILSVRC 2014. Finally, there is the imgs/ folder that holds the raw images. I also provide the Matlab script that I used to extract the features, which you may find helpful if you wish to use a different dataset. This is inside the matlab_features_reference/ folder, and see the Readme file in that folder for more information.

License

BSD license.

Owner
Andrej
I like to train Deep Neural Nets on large datasets.
Andrej
百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline

项目说明: 百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline 比赛链接:https://aistudio.baidu.com/aistudio/competition/detail/66?isFromLuge=true 官方的baseline版本是基于paddlepadd

周俊贤 54 Nov 23, 2022
Implementation of ResMLP, an all MLP solution to image classification, in Pytorch

ResMLP - Pytorch Implementation of ResMLP, an all MLP solution to image classification out of Facebook AI, in Pytorch Install $ pip install res-mlp-py

Phil Wang 178 Dec 02, 2022
A Novel Incremental Learning Driven Instance Segmentation Framework to Recognize Highly Cluttered Instances of the Contraband Items

A Novel Incremental Learning Driven Instance Segmentation Framework to Recognize Highly Cluttered Instances of the Contraband Items This repository co

Taimur Hassan 3 Mar 16, 2022
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers Results results on COCO val Backbone Method Lr Schd PQ Config Download

155 Dec 20, 2022
AFLNet: A Greybox Fuzzer for Network Protocols

AFLNet: A Greybox Fuzzer for Network Protocols AFLNet is a greybox fuzzer for protocol implementations. Unlike existing protocol fuzzers, it takes a m

626 Jan 06, 2023
Code for the paper "Reinforced Active Learning for Image Segmentation"

Reinforced Active Learning for Image Segmentation (RALIS) Code for the paper Reinforced Active Learning for Image Segmentation Dependencies python 3.6

Arantxa Casanova 79 Dec 19, 2022
Adversarial Reweighting for Partial Domain Adaptation

Adversarial Reweighting for Partial Domain Adaptation Code for paper "Xiang Gu, Xi Yu, Yan Yang, Jian Sun, Zongben Xu, Adversarial Reweighting for Par

12 Dec 01, 2022
Learn other languages ​​using artificial intelligence with python.

The main idea of ​​the project is to facilitate the learning of other languages. We created a simple AI that will interact with you. Just ask questions that if she knows, she will answer.

Pedro Rodrigues 2 Jun 07, 2022
This is the official code of our paper "Diversity-based Trajectory and Goal Selection with Hindsight Experience Relay" (PRICAI 2021)

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay This is the official implementation of our paper "Diversity-based Traje

Tianhong Dai 6 Jul 18, 2022
FrankMocap: A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

FrankMocap pursues an easy-to-use single view 3D motion capture system developed by Facebook AI Research (FAIR). FrankMocap provides state-of-the-art 3D pose estimation outputs for body, hand, and bo

Facebook Research 1.9k Jan 07, 2023
AI Face Mesh: This is a simple face mesh detection program based on Artificial intelligence.

AI Face Mesh: This is a simple face mesh detection program based on Artificial Intelligence which made with Python. It's able to detect 468 different

Md. Rakibul Islam 1 Jan 13, 2022
Fast, flexible and easy to use probabilistic modelling in Python.

Please consider citing the JMLR-MLOSS Manuscript if you've used pomegranate in your academic work! pomegranate is a package for building probabilistic

Jacob Schreiber 3k Dec 29, 2022
Pytorch implementation for RelTransformer

RelTransformer Our Architecture This is a Pytorch implementation for RelTransformer The implementation for Evaluating on VG200 can be found here Requi

Vision CAIR Research Group, KAUST 21 Nov 22, 2022
Python package for Bayesian Machine Learning with scikit-learn API

Python package for Bayesian Machine Learning with scikit-learn API Installing & Upgrading package pip install https://github.com/AmazaspShumik/sklearn

Amazasp Shaumyan 482 Jan 04, 2023
Integrated physics-based and ligand-based modeling.

ComBind ComBind integrates data-driven modeling and physics-based docking for improved binding pose prediction and binding affinity prediction. Given

Dror Lab 44 Oct 26, 2022
Text Summarization - WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

Text Summarization WCN — Weighted Contextual N-gram method for evaluation of Text Summarization In this project, I fine tune T5 model on Extreme Summa

Aditya Shah 1 Jan 03, 2022
Tensorflow AffordanceNet and AffContext implementations

AffordanceNet and AffContext This is tensorflow AffordanceNet and AffContext implementations. Both are implemented and tested with tensorflow 2.3. The

Beatriz Pérez 6 Dec 01, 2022
Implement Decoupled Neural Interfaces using Synthetic Gradients in Pytorch

disclaimer: this code is modified from pytorch-tutorial Image classification with synthetic gradient in Pytorch I implement the Decoupled Neural Inter

Andrew 114 Dec 22, 2022
[SIGGRAPH Asia 2021] Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN

Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN [Paper] [Project Website] [Output resutls] Official Pytorch i

Badour AlBahar 215 Dec 17, 2022
Minimal PyTorch implementation of Generative Latent Optimization from the paper "Optimizing the Latent Space of Generative Networks"

Minimal PyTorch implementation of Generative Latent Optimization This is a reimplementation of the paper Piotr Bojanowski, Armand Joulin, David Lopez-

Thomas Neumann 117 Nov 27, 2022