Text to image synthesis using thought vectors

Overview

Text To Image Synthesis Using Thought Vectors

Join the chat at https://gitter.im/text-to-image/Lobby

This is an experimental tensorflow implementation of synthesizing images from captions using Skip Thought Vectors. The images are synthesized using the GAN-CLS Algorithm from the paper Generative Adversarial Text-to-Image Synthesis. This implementation is built on top of the excellent DCGAN in Tensorflow. The following is the model architecture. The blue bars represent the Skip Thought Vectors for the captions.

Model architecture

Image Source : Generative Adversarial Text-to-Image Synthesis Paper

Requirements

Datasets

  • All the steps below for downloading the datasets and models can be performed automatically by running python download_datasets.py. Several gigabytes of files will be downloaded and extracted.
  • The model is currently trained on the flowers dataset. Download the images from this link and save them in Data/flowers/jpg. Also download the captions from this link. Extract the archive, copy the text_c10 folder and paste it in Data/flowers.
  • Download the pretrained models and vocabulary for skip thought vectors as per the instructions given here. Save the downloaded files in Data/skipthoughts.
  • Make empty directories in Data, Data/samples, Data/val_samples and Data/Models. They will be used for sampling the generated images and saving the trained models.

Usage

  • Data Processing : Extract the skip thought vectors for the flowers data set using :
python data_loader.py --data_set="flowers"
  • Training

    • Basic usage python train.py --data_set="flowers"
    • Options
      • z_dim: Noise Dimension. Default is 100.
      • t_dim: Text feature dimension. Default is 256.
      • batch_size: Batch Size. Default is 64.
      • image_size: Image dimension. Default is 64.
      • gf_dim: Number of conv in the first layer generator. Default is 64.
      • df_dim: Number of conv in the first layer discriminator. Default is 64.
      • gfc_dim: Dimension of gen untis for for fully connected layer. Default is 1024.
      • caption_vector_length: Length of the caption vector. Default is 1024.
      • data_dir: Data Directory. Default is Data/.
      • learning_rate: Learning Rate. Default is 0.0002.
      • beta1: Momentum for adam update. Default is 0.5.
      • epochs: Max number of epochs. Default is 600.
      • resume_model: Resume training from a pretrained model path.
      • data_set: Data Set to train on. Default is flowers.
  • Generating Images from Captions

    • Write the captions in text file, and save it as Data/sample_captions.txt. Generate the skip thought vectors for these captions using:
    python generate_thought_vectors.py --caption_file="Data/sample_captions.txt"
    
    • Generate the Images for the thought vectors using:
    python generate_images.py --model_path=<path to the trained model> --n_images=8
    

    n_images specifies the number of images to be generated per caption. The generated images will be saved in Data/val_samples/. python generate_images.py --help for more options.

Sample Images Generated

Following are the images generated by the generative model from the captions.

Caption Generated Images
the flower shown has yellow anther red pistil and bright red petals
this flower has petals that are yellow, white and purple and has dark lines
the petals on this flower are white with a yellow center
this flower has a lot of small round pink petals.
this flower is orange in color, and has petals that are ruffled and rounded.
the flower has yellow petals and the center of it is brown

Implementation Details

  • Only the uni-skip vectors from the skip thought vectors are used. I have not tried training the model with combine-skip vectors.
  • The model was trained for around 200 epochs on a GPU. This took roughly 2-3 days.
  • The images generated are 64 x 64 in dimension.
  • While processing the batches before training, the images are flipped horizontally with a probability of 0.5.
  • The train-val split is 0.75.

Pre-trained Models

  • Download the pretrained model from here and save it in Data/Models. Use this path for generating the images.

TODO

  • Train the model on the MS-COCO data set, and generate more generic images.
  • Try different embedding options for captions(other than skip thought vectors). Also try to train the caption embedding RNN along with the GAN-CLS model.

References

Alternate Implementations

License

MIT

Owner
Paarth Neekhara
PhD student, Computer Science, UCSD
Paarth Neekhara
DROPO: Sim-to-Real Transfer with Offline Domain Randomization

DROPO: Sim-to-Real Transfer with Offline Domain Randomization Gabriele Tiboni, Karol Arndt, Ville Kyrki. This repository contains the code for the pap

Gabriele Tiboni 8 Dec 19, 2022
A vanilla 3D face modeling on pose-invariant and multi-lightning image data

3D-Face-Modeling A vanilla 3D face modeling on pose-invariant and multi-lightning image data Table of Contents Background Install Usage Contributing B

Haochen Zhang 1 Mar 12, 2022
Ludwig is a toolbox that allows to train and evaluate deep learning models without the need to write code.

Translated in 🇰🇷 Korean/ Ludwig is a toolbox that allows users to train and test deep learning models without the need to write code. It is built on

Ludwig 8.7k Dec 31, 2022
Locally Most Powerful Bayesian Test for Out-of-Distribution Detection using Deep Generative Models

LMPBT Supplementary code for the Paper entitled ``Locally Most Powerful Bayesian Test for Out-of-Distribution Detection using Deep Generative Models"

1 Sep 29, 2022
This is the code for "HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields".

HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields This is the code for "HyperNeRF: A Higher-Dimensional

Google 702 Jan 02, 2023
Revisting Open World Object Detection

Revisting Open World Object Detection Installation See INSTALL.md. Dataset Our n

58 Dec 23, 2022
TensorFlow implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? Source: Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

Aritra Roy Gosthipaty 23 Dec 24, 2022
Trajectory Extraction of road users via Traffic Camera

Traffic Monitoring Citation The associated paper for this project will be published here as soon as possible. When using this software, please cite th

Julian Strosahl 14 Dec 17, 2022
Explicable Reward Design for Reinforcement Learning Agents [NeurIPS'21]

Explicable Reward Design for Reinforcement Learning Agents [NeurIPS'21]

3 May 12, 2022
Accurate Phylogenetic Inference with Symmetry-Preserving Neural Networks

Accurate Phylogenetic Inference with a Symmetry-preserving Neural Network Model Claudia Solis-Lemus Shengwen Yang Leonardo Zepeda-Núñez This repositor

Leonardo Zepeda-Núñez 2 Feb 11, 2022
Face recognition with trained classifiers for detecting objects using OpenCV

Face_Detector Face recognition with trained classifiers for detecting objects using OpenCV Libraries required to be installed using pip Command: cv2 n

Chumui Tripura 0 Oct 31, 2021
GestureSSD CBAM - A gesture recognition web system based on SSD and CBAM, using pytorch, flask and node.js

GestureSSD_CBAM A gesture recognition web system based on SSD and CBAM, using pytorch, flask and node.js SSD implementation is based on https://github

xue_senhua1999 2 Jan 06, 2022
Official PyTorch implementation of "The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation" (ICCV 21).

CenterGroup This the official implementation of our ICCV 2021 paper The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person P

Dynamic Vision and Learning Group 43 Dec 25, 2022
Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices

Intro Real-time object detection and classification. Paper: version 1, version 2. Read more about YOLO (in darknet) and download weight files here. In

Trieu 6.1k Dec 30, 2022
Implementation of the Chamfer Distance as a module for pyTorch

Chamfer Distance for pyTorch This is an implementation of the Chamfer Distance as a module for pyTorch. It is written as a custom C++/CUDA extension.

Christian Diller 205 Jan 05, 2023
graph-theoretic framework for robust pairwise data association

CLIPPER: A Graph-Theoretic Framework for Robust Data Association Data association is a fundamental problem in robotics and autonomy. CLIPPER provides

MIT Aerospace Controls Laboratory 118 Dec 28, 2022
Python program that works as a contact list

Lista de Contatos Programa em Python que funciona como uma lista de contatos. Features Adicionar novo contato Remover contato Atualizar contato Pesqui

Victor B. Lino 3 Dec 16, 2021
A unified 3D Transformer Pipeline for visual synthesis

Overview This is the official repo for the paper: "NÃœWA: Visual Synthesis Pre-training for Neural visUal World creAtion". NÃœWA is a unified multimodal

Microsoft 2.6k Jan 03, 2023