๐Ÿ‡ฐ๐Ÿ‡ท Text to Image in Korean

Overview

KoDALLE

Open In Colab Wandb Log

image-20211227151557604

Utilizing pretrained language modelโ€™s token embedding layer and position embedding layer as DALLEโ€™s text encoder.

Background

  • Training DALLE model from scratch demands large size paired dataset of images and captions. For example, OpenAI DALLE is trained with more than 250 million text-image pairs for the training.
  • If the dataset isnโ€™t large enough or is limited to specific domains, number of vocabularies in the trained DALLE model are insufficient. For instance, 1 million text captions of K-Fashion dataset only consists of more or less than 300 tokens.
  • Therefore, inferencing from such DALLE models could be problematic if the given sentence query is unconnected to the originally trained captionsโ€™ text dataset.

KoDALLE's Result on Small Size Fashion Dataset

OpenAIโ€™s DALLE KoDALLE of HappyFace
Train Dataset Size 250 Million Pairs 0.8 Million Pairs
#Params 12 Billion 428 Million
#Layers 64 Layers 16 Layers
Computing Resource 1024 x V100 16GB 1 x V100 32GB
Text Encoder 16384 Vocab x 512 Dim BPE 32000 Vocab x 1024 Dim klue/roberta-large
Image Encoder VQVAE VQGAN
Optimizer AdamW AdamW
Learning Rate 4.5e-5 3.0e-5
Weight Decay 4.5e-3 3.0e-3
LR Scheduler ReduceLROnPlateau -

The team constructed Text to Fashion Design DALLE model in Korean language with less than 100k text-image sampled pairs.

Caption ํ•˜์˜์—์„œ ์ƒ‰์ƒ์€ ์Šค์นด์ด๋ธ”๋ฃจ์ด๋‹ค. ์ƒ์˜์—์„œ ๊ธฐ์žฅ์€ ๋กฑ์ด๋‹ค. ์ƒ‰์ƒ์€ ํ™”์ดํŠธ์ด๋‹ค. ์นดํ…Œ๊ณ ๋ฆฌ๋Š” ๋ธ”๋ผ์šฐ์Šค์ด๋‹ค. ๋””ํ…Œ์ผ์—๋Š” ์…”๋ง์ด๋‹ค. ์†Œ๋งค๊ธฐ์žฅ์€ ๋ฐ˜ํŒ”์ด๋‹ค. ์†Œ์žฌ์—๋Š” ์‹คํฌ์ด๋‹ค. ํ”„๋ฆฐํŠธ์—๋Š” ๋ฌด์ง€์ด๋‹ค. ๋„ฅ๋ผ์ธ์€ ๋ธŒ์ด๋„ฅ์ด๋‹ค. ํ•์€ ๋…ธ๋ฉ€
Generated Image image
Caption ์•„์šฐํ„ฐ๋Š” ์ƒ‰์ƒ์ด ์นดํ‚ค ์†Œ์žฌ๊ฐ€ ์šฐ๋ธ ํ•์ด ๋ฃจ์ฆˆ์ธ ์ฝ”ํŠธ์ด๋‹ค. ํ•˜์˜๋Š” ์ƒ‰์ƒ์ด ๋„ค์ด๋น„ ์†Œ์žฌ๊ฐ€ ๋ฐ๋‹˜ ํ•์ด ์Šคํ‚ค๋‹ˆ์ธ ์ฒญ๋ฐ”์ง€์ด๋‹ค.
Generated Image image
Caption ํ•˜์˜์—์„œ ๊ธฐ์žฅ์€ ๋ฐœ๋ชฉ์ด๋‹ค. ์ƒ‰์ƒ์€ ๋ธ”๋ฃจ์ด๋‹ค. ์นดํ…Œ๊ณ ๋ฆฌ๋Š” ์Šค์ปคํŠธ์ด๋‹ค. ์†Œ์žฌ์—๋Š” ๋ฐ๋‹˜์ด๋‹ค. ํ•์€ ์™€์ด๋“œ์ด๋‹ค. ์ƒ์˜์—์„œ ์ƒ‰์ƒ์€ ํ™”์ดํŠธ์ด๋‹ค. ์นดํ…Œ๊ณ ๋ฆฌ๋Š” ๋ธ”๋ผ์šฐ์Šค์ด๋‹ค. ๋””ํ…Œ์ผ์—๋Š” ์…”๋ง์ด๋‹ค. ์†Œ๋งค๊ธฐ์žฅ์€ ๋ฐ˜ํŒ”์ด๋‹ค. ์†Œ์žฌ์—๋Š” ์šฐ๋ธ์ด๋‹ค.
Generated Image image
Caption ์ƒ์˜์—์„œ ๊ธฐ์žฅ์€ ๋…ธ๋ฉ€์ด๋‹ค. ์ƒ์˜์—์„œ ์ƒ‰์ƒ์€ ํ™”์ดํŠธ์ด๋‹ค. ์ƒ์˜์—์„œ ์„œ๋ธŒ์ƒ‰์ƒ์€ ๋ธ”๋ž™์ด๋‹ค. ์ƒ์˜์—์„œ ์นดํ…Œ๊ณ ๋ฆฌ๋Š” ํ‹ฐ์…”์ธ ์ด๋‹ค. ์ƒ์˜์—์„œ ์†Œ๋งค๊ธฐ์žฅ์€ ๋ฐ˜ํŒ”์ด๋‹ค. ์ƒ์˜์—์„œ ์†Œ์žฌ์—๋Š” ์ €์ง€์ด๋‹ค. ์ƒ์˜์—์„œ ํ”„๋ฆฐํŠธ์—๋Š” ๋ ˆํ„ฐ๋ง์ด๋‹ค. ์ƒ์˜์—์„œ ๋„ฅ๋ผ์ธ์€ ๋ผ์šด๋“œ๋„ฅ์ด๋‹ค. ์ƒ์˜์—์„œ ํ•์€ ๋ฃจ์ฆˆ์ด๋‹ค.
Generated Image image

Methodology

Experimentations were conducted with the following Korean Transformers Modelsโ€™ embedding layers. The team selected klue/roberta-large as baseline in the repository considering the size of the model.

KoDALLE with klue/roberta-large's wpe and wte which is trainable on 16GB GPU Google Colab environment. Hyperparams related to the DALLE's model size are following.

'BATCH_SIZE': 32
'DEPTH': 2
'TEXT_SEQ_LEN': 128
'VOCAB_SIZE': 32000
'MODEL_DIM': 1024
'ATTN_TYPES': 'full'
'DIM_HEAD': 64
'HEADS': 8

Significance

  • Offers promising result for training from scratch on specific domains with small size dataset.
  • Introduces solution for domain specific DALLE & CLIP models to be robust on input sentence.
  • Recommends adequate text-to-image model size for given computation resource.
  • Suggests effortless method of creating DALLE & CLIP model for own languages if pretrained language model is available.

WIP

  • Add image-caption reranker(EfficientNet + Klue/roberta-large)
  • Model trained with 500k text-image pairs.
  • Modulize in python code.
  • Update Inference code.
  • Update FID and IS metrics on test and validation dataset.
You might also like...
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach This is the repo to host the dataset TextSeg and code for TexRNe

BARTScore: Evaluating Generated Text as Text Generation
BARTScore: Evaluating Generated Text as Text Generation

This is the Repo for the paper: BARTScore: Evaluating Generated Text as Text Generation Updates 2021.06.28 Release online evaluation Demo 2021.06.25 R

Code for EMNLP 2021 main conference paper
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific

a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LSTM layers

RNN-Playwrite a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LS

Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

Siamese-nn-semantic-text-similarity - A repository containing comprehensive Neural Networks based PyTorch implementations for the semantic text similarity task Automatic number plate recognition using tech:  Yolo, OCR, Scene text detection, scene text recognation, flask, torch
Automatic number plate recognition using tech: Yolo, OCR, Scene text detection, scene text recognation, flask, torch

Automatic Number Plate Recognition Automatic Number Plate Recognition (ANPR) is the process of reading the characters on the plate with various optica

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

DALL-E in Pytorch Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch. It will also contain CLIP for ranking the ge

Comments
  • Koclip apply in KoDALLE

    Koclip apply in KoDALLE

    ๋ณ€๊ฒฝ์‚ฌํ•ญ

    add) model.py

    ํ˜„์ˆ˜๋‹˜์˜ KoCLIP์ด DALLE Roberta ์—์„œ ์ž‘๋™ํ•˜๊ฒŒ๋” ์ฝ”๋“œ๋ฅผ ์ˆ˜์ •ํ•œ ํŒŒ์ผ์ž…๋‹ˆ๋‹ค.

    dev branch์— ์กด์žฌํ•˜๋Š” model.py ๋น„๊ตํ•˜๋ฉด์„œ ์ˆ˜์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

    add) generate.ipynb

    KoCLIP์ด ์ž‘๋™ํ•˜๋Š”๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋„๋ก ๋งŒ๋“  ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.

    opened by JoonHong-Kim 1
  • add: KoCLIP codes

    add: KoCLIP codes

    ๋ณ€๊ฒฝ์‚ฌํ•ญ:

    refactor) clipmodel.py

    • CLIPModel ์ตœ์ข… ๋ฒ„์ „์œผ๋กœ ์ˆ˜์ •
    • clip folder๋กœ ์ด๋™

    add) clip/train_clip.py

    • CLIP ๋ชจ๋ธ ํ•™์Šต์— ์‚ฌ์šฉํ•œ ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค

    add) clip/dataloader.py

    • CLIP ๋ชจ๋ธ ํ•™์Šต์— ์‚ฌ์šฉํ•œ dataloader ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค.
    opened by shawnhyeonsoo 0
  • add skip_sample in TextImageDataset

    add skip_sample in TextImageDataset

    ๋ณ€๊ฒฝ์‚ฌํ•ญ

    modify) loader.py

    • TextImageDataset์—์„œ texts, image๋ฅผ ๋ถˆ๋Ÿฌ์˜ฌ ๋•Œ, data๊ฐ€ ์—†์„ ๊ฒฝ์šฐ ๋ฐœ์ƒํ•˜๋Š” ์—๋Ÿฌ ์ฒ˜๋ฆฌ
    • skip_sample ํ•จ์ˆ˜๋ฅผ ํ™œ์šฉํ•˜์—ฌ error๊ฐ€ ๋ฐœ์ƒํ•  ๊ฒฝ์šฐ, random ํ˜น์€ ๋‹ค์Œ index๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ skip
    • ๊ธฐ์กด train_dalle_gpt_roberta.py๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ˆ˜์ •
    opened by jjonhwa 0
Releases(v0.1.0-beta)
Official implementation for the paper: Multi-label Classification with Partial Annotations using Class-aware Selective Loss

Multi-label Classification with Partial Annotations using Class-aware Selective Loss Paper | Pretrained models Official PyTorch Implementation Emanuel

99 Dec 27, 2022
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ELECTRA Introduction ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using

Google Research 2.1k Dec 28, 2022
Computational Methods Course at UdeA. Forked and size reduced from:

Computational Methods for Physics & Astronomy Book version at: https://restrepo.github.io/ComputationalMethods by: Sebastian Bustamante 2014/2015 Dieg

Diego Restrepo 11 Sep 10, 2022
Official implementation for paper: A Latent Transformer for Disentangled Face Editing in Images and Videos.

A Latent Transformer for Disentangled Face Editing in Images and Videos Official implementation for paper: A Latent Transformer for Disentangled Face

InterDigital 108 Dec 09, 2022
FedScale: Benchmarking Model and System Performance of Federated Learning

FedScale: Benchmarking Model and System Performance of Federated Learning (Paper) This repository contains scripts and instructions of building FedSca

268 Jan 01, 2023
This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

This is the implementation of our work Deep Extreme Cut (DEXTR), for object segmentation from extreme points.

Sergi Caelles 828 Jan 05, 2023
auto-tuning momentum SGD optimizer

YellowFin YellowFin is an auto-tuning optimizer based on momentum SGD which requires no manual specification of learning rate and momentum. It measure

Jian Zhang 288 Nov 19, 2022
GarmentNets: Category-Level Pose Estimation for Garments via Canonical Space Shape Completion

GarmentNets This repository contains the source code for the paper GarmentNets: Category-Level Pose Estimation for Garments via Canonical Space Shape

Columbia Artificial Intelligence and Robotics Lab 43 Nov 21, 2022
Regulatory Instruments for Fair Personalized Pricing.

Fair pricing Source code for WWW 2022 paper Regulatory Instruments for Fair Personalized Pricing. Installation Requirements Linux with Python = 3.6 p

Renzhe Xu 6 Oct 26, 2022
Pytorch implementation of various High Dynamic Range (HDR) Imaging algorithms

Deep High Dynamic Range Imaging Benchmark This repository is the pytorch impleme

Tianhong Dai 5 Nov 16, 2022
Reinforcement learning algorithms in RLlib

raylab Reinforcement learning algorithms in RLlib and PyTorch. Installation pip install raylab Quickstart Raylab provides agents and environments to b

ร‚ngelo 50 Sep 08, 2022
Lux AI environment interface for RLlib multi-agents

Lux AI interface to RLlib MultiAgentsEnv For Lux AI Season 1 Kaggle competition. LuxAI repo RLlib-multiagents docs Kaggle environments repo Please let

Jaime 12 Nov 07, 2022
Implementation of Memory-Efficient Neural Networks with Multi-Level Generation, ICCV 2021

Memory-Efficient Multi-Level In-Situ Generation (MLG) By Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen and David Z. Pan

Jiaqi Gu 2 Jan 04, 2022
TensorFlow 2 implementation of the Yahoo Open-NSFW model

TensorFlow 2 implementation of the Yahoo Open-NSFW model

Bosco Yung 101 Jan 01, 2023
Official PyTorch Implementation of HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning (NeurIPS 2021 Spotlight)

[NeurIPS 2021 Spotlight] HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning [Paper] This is Official PyTorch implementatio

42 Nov 01, 2022
Unofficial PyTorch implementation of Fastformer based on paper "Fastformer: Additive Attention Can Be All You Need"."

Fastformer-PyTorch Unofficial PyTorch implementation of Fastformer based on paper Fastformer: Additive Attention Can Be All You Need. Usage : import t

Hong-Jia Chen 126 Dec 06, 2022
Machine learning algorithms for many-body quantum systems

NetKet NetKet is an open-source project delivering cutting-edge methods for the study of many-body quantum systems with artificial neural networks and

NetKet 413 Dec 31, 2022
[ICML 2021] "Graph Contrastive Learning Automated" by Yuning You, Tianlong Chen, Yang Shen, Zhangyang Wang

Graph Contrastive Learning Automated PyTorch implementation for Graph Contrastive Learning Automated [talk] [poster] [appendix] Yuning You, Tianlong C

Shen Lab at Texas A&M University 80 Nov 23, 2022
Advbox is a toolbox to generate adversarial examples that fool neural networks in PaddlePaddleใ€PyTorchใ€Caffe2ใ€MxNetใ€Kerasใ€TensorFlow and Advbox can benchmark the robustness of machine learning models.

Advbox is a toolbox to generate adversarial examples that fool neural networks in PaddlePaddleใ€PyTorchใ€Caffe2ใ€MxNetใ€Kerasใ€TensorFlow and Advbox can benchmark the robustness of machine learning models

AdvBox 1.3k Dec 25, 2022
Active learning for Mask R-CNN in Detectron2

MaskAL - Active learning for Mask R-CNN in Detectron2 Summary MaskAL is an active learning framework that automatically selects the most-informative i

49 Dec 20, 2022