CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

Last update: Dec 29, 2022

Overview

CLIP-GEN

[简体中文][English]

本项目在萤火二号集群上用 PyTorch 实现了论文《CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP》。

CLIP-GEN 是一个 Language-Free 的文本生成图像的方法，它不依赖图文训练样本，通过预训练 CLIP 模型的强大表征能力，只需要图片数据就可以训练出一个文本生成图像的模型。该方法的基本原理是：CLIP-GEN 首先会训练一个 VQ-GAN，把图片映射到离散空间；然后再训练一个 GPT 模型，把 CLIP embedding 映射到 VQ-GAN 的离散空间；由于在 CLIP 中，文本和图像共享一个特征空间，在 inference 的时候我们就可以通过同样的方法把文本映射到 VQ-GAN 的离散空间，然后 decode 为 RGB 图像。

Requirements

hfai (to be released soon)
torch>=1.8

Training

支持的数据集：coco, imagenet, googlecc

下载 CLIP 预训练模型

下载 CLIP 后放至 pretrained/clip_vit_b32.pt，该预训练模型来自 OpenAI.

在 COCO 上训练 VQGAN

提交任务至萤火集群：

hfai python train_vqgan.py --ds coco -- -n 1 -p 30

本地运行：

python train_vqgan.py --ds coco

在 COCO 上训练 Conditional GPT

提交任务至萤火集群：

hfai python train_gpt.py --ds coco --vqgan_ckpt /path/to/vqgan/ckpt -- -n 4 -p 30

本地运行：

python train_gpt.py --ds coco --vqgan_ckpt /path/to/vqgan/ckpt

Demo

下载在 COCO 上训练好的 VQGAN 和 GPT 模型，分别放到 pretrained/vqgan_coco.pt 和 pretrained/gpt_coco.pt；然后运行：

python demo.py --text "A city bus driving on the city street" --out "bus.jpg"

NOTE: demo 的运行不依赖 hfai，用户可以在装有 PyTorch 的环境下直接使用

Samples

下面是一些文本生成图像的样本：

References

Citation

@article{wang2022clip,
  title={CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP},
  author={Wang, Zihao and Liu, Wei and He, Qian and Wu, Xinglong and Yi, Zili},
  journal={arXiv preprint arXiv:2203.00386},
  year={2022}
}

TODO

预训练模型
FFRecord 数据

You might also like...

Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

VQGAN-CLIP-Docker About Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized This is a stripped and minimal dependency repository for running loca

73 Sep 11, 2022

A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

175 Dec 29, 2022

Text2Art is an AI art generator powered with VQGAN + CLIP and CLIPDrawer models

Text2Art is an AI art generator powered with VQGAN + CLIP and CLIPDrawer models. You can easily generate all kind of art from drawing, painting, sketch, or even a specific artist style just using a text input. You can also specify the dimensions of the image. The process can take 3-20 mins and the results will be emailed to you.

643 Dec 30, 2022

A 1.3B text-to-image generation model trained on 14 million image-text pairs

minDALL-E on Conceptual Captions minDALL-E, named after minGPT, is a 1.3B text-to-image generation model trained on 14 million image-text pairs for no

604 Dec 14, 2022

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.

train-CLIP 📎 A PyTorch Lightning solution to training CLIP from scratch. Goal ⚽ Our aim is to create an easy to use Lightning implementation of OpenA

396 Dec 30, 2022

[IJCAI-2021] A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation"

DataFree A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation" Authors: Gongfa

47 Jan 9, 2023

Free-duolingo-plus - Duolingo account creator that uses your invite code to get you free duolingo plus

free-duolingo-plus duolingo account creator that uses your invite code to get yo

1 Jan 6, 2022

Official implementation of SynthTIGER (Synthetic Text Image GEneratoR) ICDAR 2021

🐯 SynthTIGER: Synthetic Text Image GEneratoR Official implementation of SynthTIGER | Paper | Datasets Moonbin Yim1, Yoonsik Kim1, Han-cheol Cho1, Sun

256 Jan 5, 2023

The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

SuperGen The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding. Requirements Before running, you

38 Dec 12, 2022

Comments

"nn.TransformerEncoderLayer" is adopted to construct the "conditonal transformer" in your paper.

Thanks for your great work.

I noticed that you utilize "nn.TransformerEncoderLayer" when constructing "conditional transformer". Since it is used to predict the next token index, I am wondering whether the decoder of transformer is more appropriate for the construction of your conditional transformer? or what's the reason that you don't adopt "nn.TransformerdecoderLayer" ?

Because of the structure of "nn.TransformerEncoderLayer" is simpler or more concise than that of "nn.TransformerDEcoderLayer" ?

opened by fido20160817 0
Add Web Demo & Docker environment

This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

This also means we can make a web page where other people can try out your model, view it here: https://replicate.com/hfailab/clip-gen. You can find the docker file under the tab ‘run model with docker’.

We have added some examples to the page, but do claim the page so you can own the page, customise the Example gallery as you like, push any future update to the web demo, and we'll feature it on our website and tweet about it too. You can find the 'Claim this model' button on the top of the page. Any member of the HFAiLab organization on GitHub can claim the model ~ When the page is claimed, it will be automatically linked to the arXiv website as well (under “Demos”).

In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊

opened by chenxwh 0

Releases(v0.1.0)

v0.1.0(May 5, 2022)

Source code(tar.gz)
Source code(zip)
clip_vit_b32.pt(577.18 MB)
gpt_coco.pt(1285.89 MB)
vqgan_coco.pt(300.86 MB)

Owner

GitHub Repository

COLMAP - Structure-from-Motion and Multi-View Stereo

COLMAP About COLMAP is a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface.

4.7k Jan 07, 2023

Improving Query Representations for DenseRetrieval with Pseudo Relevance Feedback:A Reproducibility Study.

APR The repo for the paper Improving Query Representations for DenseRetrieval with Pseudo Relevance Feedback:A Reproducibility Study. Environment setu

8 Nov 26, 2022

PyTorch implementation of the REMIND method from our ECCV-2020 paper "REMIND Your Neural Network to Prevent Catastrophic Forgetting"

REMIND Your Neural Network to Prevent Catastrophic Forgetting This is a PyTorch implementation of the REMIND algorithm from our ECCV-2020 paper. An ar

72 Nov 27, 2022

(NeurIPS 2021) Pytorch implementation of paper "Re-ranking for image retrieval and transductive few-shot classification"

SSR (NeurIPS 2021) Pytorch implementation of paper "Re-ranking for image retrieval and transductivefew-shot classification" [Paper] [Project webpage]

29 Dec 06, 2022

CyTran: Cycle-Consistent Transformers for Non-Contrast to Contrast CT Translation

CyTran: Cycle-Consistent Transformers for Non-Contrast to Contrast CT Translation We propose a novel approach to translate unpaired contrast computed

13 Jan 02, 2023

PyTorch Implementation of AnimeGANv2

PyTorch implementation of AnimeGANv2

4k Jan 07, 2023

ObjectDrawer-ToolBox: a graphical image annotation tool to generate ground plane masks for a 3D object reconstruction system

ObjectDrawer-ToolBox is a graphical image annotation tool to generate ground plane masks for a 3D object reconstruction system, Object Drawer.

77 Jan 05, 2023

A 1.3B text-to-image generation model trained on 14 million image-text pairs

minDALL-E on Conceptual Captions minDALL-E, named after minGPT, is a 1.3B text-to-image generation model trained on 14 million image-text pairs for no

604 Dec 14, 2022

A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

WILDS is a benchmark of in-the-wild distribution shifts spanning diverse data modalities and applications, from tumor identification to wildlife monitoring to poverty mapping.

437 Dec 30, 2022

End-to-end machine learning project for rices detection

Basmatinet Welcome to this project folks ! Whether you like it or not this project is all about riiiiice or riz in french. It is also about Deep Learn

47 Jun 18, 2022

A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

collie_recs Collie is a library for preparing, training, and evaluating implicit deep learning hybrid recommender systems, named after the Border Coll

97 Jan 03, 2023

A minimal TPU compatible Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

NeRF Minimal Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Result of Tiny-NeRF RGB Depth

11 Jul 24, 2022

NLG evaluation via Statistical Measures of Similarity: BaryScore, DepthScore, InfoLM

NLG evaluation via Statistical Measures of Similarity: BaryScore, DepthScore, InfoLM Automatic Evaluation Metric described in the papers BaryScore (EM

28 Dec 28, 2022

This repository contains demos I made with the Transformers library by HuggingFace.

Transformers-Tutorials Hi there! This repository contains demos I made with the Transformers library by 🤗 HuggingFace. Currently, all of them are imp

3.5k Jan 01, 2023

PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

4 May 08, 2022

On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021))

PTvsBT On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021) Citation Please cite a

10 Nov 25, 2022

CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

Related tags

Overview

CLIP-GEN

Requirements

Training

Demo

Samples

References

Citation

TODO

You might also like...

Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

A Jupyter notebook to play with NVIDIA's StyleGAN3 and OpenAI's CLIP for a text-based guided image generation.

Text2Art is an AI art generator powered with VQGAN + CLIP and CLIPDrawer models

A 1.3B text-to-image generation model trained on 14 million image-text pairs

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.

[IJCAI-2021] A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation"

Free-duolingo-plus - Duolingo account creator that uses your invite code to get you free duolingo plus

Official implementation of SynthTIGER (Synthetic Text Image GEneratoR) ICDAR 2021

The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

Comments

"nn.TransformerEncoderLayer" is adopted to construct the "conditonal transformer" in your paper.

Add Web Demo & Docker environment

Releases(v0.1.0)

v0.1.0(May 5, 2022)

Owner

COLMAP - Structure-from-Motion and Multi-View Stereo

Improving Query Representations for DenseRetrieval with Pseudo Relevance Feedback:A Reproducibility Study.

PyTorch implementation of the REMIND method from our ECCV-2020 paper "REMIND Your Neural Network to Prevent Catastrophic Forgetting"

(NeurIPS 2021) Pytorch implementation of paper "Re-ranking for image retrieval and transductive few-shot classification"

CyTran: Cycle-Consistent Transformers for Non-Contrast to Contrast CT Translation

PyTorch Implementation of AnimeGANv2

ObjectDrawer-ToolBox: a graphical image annotation tool to generate ground plane masks for a 3D object reconstruction system

A 1.3B text-to-image generation model trained on 14 million image-text pairs

A machine learning benchmark of in-the-wild distribution shifts, with data loaders, evaluators, and default models.

End-to-end machine learning project for rices detection

A library for preparing, training, and evaluating scalable deep learning hybrid recommender systems using PyTorch.

A minimal TPU compatible Jax implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

NLG evaluation via Statistical Measures of Similarity: BaryScore, DepthScore, InfoLM

This repository contains demos I made with the Transformers library by HuggingFace.

PyTorch implementation of our ICCV 2021 paper, Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents.

On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021))

The implementation for paper Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets.

👨‍💻 run nanosaur in simulation with Gazebo/Ingnition

Implement the Pareto Optimizer and pcgrad to make a self-adaptive loss for multi-task

Baseline of DCASE 2020 task 4