๐Ÿ‡ฐ๐Ÿ‡ท Text to Image in Korean

Overview

KoDALLE

Open In Colab Wandb Log

image-20211227151557604

Utilizing pretrained language modelโ€™s token embedding layer and position embedding layer as DALLEโ€™s text encoder.

Background

  • Training DALLE model from scratch demands large size paired dataset of images and captions. For example, OpenAI DALLE is trained with more than 250 million text-image pairs for the training.
  • If the dataset isnโ€™t large enough or is limited to specific domains, number of vocabularies in the trained DALLE model are insufficient. For instance, 1 million text captions of K-Fashion dataset only consists of more or less than 300 tokens.
  • Therefore, inferencing from such DALLE models could be problematic if the given sentence query is unconnected to the originally trained captionsโ€™ text dataset.

KoDALLE's Result on Small Size Fashion Dataset

OpenAIโ€™s DALLE KoDALLE of HappyFace
Train Dataset Size 250 Million Pairs 0.8 Million Pairs
#Params 12 Billion 428 Million
#Layers 64 Layers 16 Layers
Computing Resource 1024 x V100 16GB 1 x V100 32GB
Text Encoder 16384 Vocab x 512 Dim BPE 32000 Vocab x 1024 Dim klue/roberta-large
Image Encoder VQVAE VQGAN
Optimizer AdamW AdamW
Learning Rate 4.5e-5 3.0e-5
Weight Decay 4.5e-3 3.0e-3
LR Scheduler ReduceLROnPlateau -

The team constructed Text to Fashion Design DALLE model in Korean language with less than 100k text-image sampled pairs.

Caption ํ•˜์˜์—์„œ ์ƒ‰์ƒ์€ ์Šค์นด์ด๋ธ”๋ฃจ์ด๋‹ค. ์ƒ์˜์—์„œ ๊ธฐ์žฅ์€ ๋กฑ์ด๋‹ค. ์ƒ‰์ƒ์€ ํ™”์ดํŠธ์ด๋‹ค. ์นดํ…Œ๊ณ ๋ฆฌ๋Š” ๋ธ”๋ผ์šฐ์Šค์ด๋‹ค. ๋””ํ…Œ์ผ์—๋Š” ์…”๋ง์ด๋‹ค. ์†Œ๋งค๊ธฐ์žฅ์€ ๋ฐ˜ํŒ”์ด๋‹ค. ์†Œ์žฌ์—๋Š” ์‹คํฌ์ด๋‹ค. ํ”„๋ฆฐํŠธ์—๋Š” ๋ฌด์ง€์ด๋‹ค. ๋„ฅ๋ผ์ธ์€ ๋ธŒ์ด๋„ฅ์ด๋‹ค. ํ•์€ ๋…ธ๋ฉ€
Generated Image image
Caption ์•„์šฐํ„ฐ๋Š” ์ƒ‰์ƒ์ด ์นดํ‚ค ์†Œ์žฌ๊ฐ€ ์šฐ๋ธ ํ•์ด ๋ฃจ์ฆˆ์ธ ์ฝ”ํŠธ์ด๋‹ค. ํ•˜์˜๋Š” ์ƒ‰์ƒ์ด ๋„ค์ด๋น„ ์†Œ์žฌ๊ฐ€ ๋ฐ๋‹˜ ํ•์ด ์Šคํ‚ค๋‹ˆ์ธ ์ฒญ๋ฐ”์ง€์ด๋‹ค.
Generated Image image
Caption ํ•˜์˜์—์„œ ๊ธฐ์žฅ์€ ๋ฐœ๋ชฉ์ด๋‹ค. ์ƒ‰์ƒ์€ ๋ธ”๋ฃจ์ด๋‹ค. ์นดํ…Œ๊ณ ๋ฆฌ๋Š” ์Šค์ปคํŠธ์ด๋‹ค. ์†Œ์žฌ์—๋Š” ๋ฐ๋‹˜์ด๋‹ค. ํ•์€ ์™€์ด๋“œ์ด๋‹ค. ์ƒ์˜์—์„œ ์ƒ‰์ƒ์€ ํ™”์ดํŠธ์ด๋‹ค. ์นดํ…Œ๊ณ ๋ฆฌ๋Š” ๋ธ”๋ผ์šฐ์Šค์ด๋‹ค. ๋””ํ…Œ์ผ์—๋Š” ์…”๋ง์ด๋‹ค. ์†Œ๋งค๊ธฐ์žฅ์€ ๋ฐ˜ํŒ”์ด๋‹ค. ์†Œ์žฌ์—๋Š” ์šฐ๋ธ์ด๋‹ค.
Generated Image image
Caption ์ƒ์˜์—์„œ ๊ธฐ์žฅ์€ ๋…ธ๋ฉ€์ด๋‹ค. ์ƒ์˜์—์„œ ์ƒ‰์ƒ์€ ํ™”์ดํŠธ์ด๋‹ค. ์ƒ์˜์—์„œ ์„œ๋ธŒ์ƒ‰์ƒ์€ ๋ธ”๋ž™์ด๋‹ค. ์ƒ์˜์—์„œ ์นดํ…Œ๊ณ ๋ฆฌ๋Š” ํ‹ฐ์…”์ธ ์ด๋‹ค. ์ƒ์˜์—์„œ ์†Œ๋งค๊ธฐ์žฅ์€ ๋ฐ˜ํŒ”์ด๋‹ค. ์ƒ์˜์—์„œ ์†Œ์žฌ์—๋Š” ์ €์ง€์ด๋‹ค. ์ƒ์˜์—์„œ ํ”„๋ฆฐํŠธ์—๋Š” ๋ ˆํ„ฐ๋ง์ด๋‹ค. ์ƒ์˜์—์„œ ๋„ฅ๋ผ์ธ์€ ๋ผ์šด๋“œ๋„ฅ์ด๋‹ค. ์ƒ์˜์—์„œ ํ•์€ ๋ฃจ์ฆˆ์ด๋‹ค.
Generated Image image

Methodology

Experimentations were conducted with the following Korean Transformers Modelsโ€™ embedding layers. The team selected klue/roberta-large as baseline in the repository considering the size of the model.

KoDALLE with klue/roberta-large's wpe and wte which is trainable on 16GB GPU Google Colab environment. Hyperparams related to the DALLE's model size are following.

'BATCH_SIZE': 32
'DEPTH': 2
'TEXT_SEQ_LEN': 128
'VOCAB_SIZE': 32000
'MODEL_DIM': 1024
'ATTN_TYPES': 'full'
'DIM_HEAD': 64
'HEADS': 8

Significance

  • Offers promising result for training from scratch on specific domains with small size dataset.
  • Introduces solution for domain specific DALLE & CLIP models to be robust on input sentence.
  • Recommends adequate text-to-image model size for given computation resource.
  • Suggests effortless method of creating DALLE & CLIP model for own languages if pretrained language model is available.

WIP

  • Add image-caption reranker(EfficientNet + Klue/roberta-large)
  • Model trained with 500k text-image pairs.
  • Modulize in python code.
  • Update Inference code.
  • Update FID and IS metrics on test and validation dataset.
You might also like...
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach This is the repo to host the dataset TextSeg and code for TexRNe

BARTScore: Evaluating Generated Text as Text Generation
BARTScore: Evaluating Generated Text as Text Generation

This is the Repo for the paper: BARTScore: Evaluating Generated Text as Text Generation Updates 2021.06.28 Release online evaluation Demo 2021.06.25 R

Code for EMNLP 2021 main conference paper
Code for EMNLP 2021 main conference paper "Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification"

Text-AutoAugment (TAA) This repository contains the code for our paper Text AutoAugment: Learning Compositional Augmentation Policy for Text Classific

a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LSTM layers

RNN-Playwrite a reccurrent neural netowrk that when trained on a peice of text and fed a starting prompt will write its on 250 character text using LS

Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

Siamese-nn-semantic-text-similarity - A repository containing comprehensive Neural Networks based PyTorch implementations for the semantic text similarity task Automatic number plate recognition using tech:  Yolo, OCR, Scene text detection, scene text recognation, flask, torch
Automatic number plate recognition using tech: Yolo, OCR, Scene text detection, scene text recognation, flask, torch

Automatic Number Plate Recognition Automatic Number Plate Recognition (ANPR) is the process of reading the characters on the plate with various optica

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

DALL-E in Pytorch Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch. It will also contain CLIP for ranking the ge

Comments
  • Koclip apply in KoDALLE

    Koclip apply in KoDALLE

    ๋ณ€๊ฒฝ์‚ฌํ•ญ

    add) model.py

    ํ˜„์ˆ˜๋‹˜์˜ KoCLIP์ด DALLE Roberta ์—์„œ ์ž‘๋™ํ•˜๊ฒŒ๋” ์ฝ”๋“œ๋ฅผ ์ˆ˜์ •ํ•œ ํŒŒ์ผ์ž…๋‹ˆ๋‹ค.

    dev branch์— ์กด์žฌํ•˜๋Š” model.py ๋น„๊ตํ•˜๋ฉด์„œ ์ˆ˜์ •์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

    add) generate.ipynb

    KoCLIP์ด ์ž‘๋™ํ•˜๋Š”๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋„๋ก ๋งŒ๋“  ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.

    opened by JoonHong-Kim 1
  • add: KoCLIP codes

    add: KoCLIP codes

    ๋ณ€๊ฒฝ์‚ฌํ•ญ:

    refactor) clipmodel.py

    • CLIPModel ์ตœ์ข… ๋ฒ„์ „์œผ๋กœ ์ˆ˜์ •
    • clip folder๋กœ ์ด๋™

    add) clip/train_clip.py

    • CLIP ๋ชจ๋ธ ํ•™์Šต์— ์‚ฌ์šฉํ•œ ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค

    add) clip/dataloader.py

    • CLIP ๋ชจ๋ธ ํ•™์Šต์— ์‚ฌ์šฉํ•œ dataloader ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค.
    opened by shawnhyeonsoo 0
  • add skip_sample in TextImageDataset

    add skip_sample in TextImageDataset

    ๋ณ€๊ฒฝ์‚ฌํ•ญ

    modify) loader.py

    • TextImageDataset์—์„œ texts, image๋ฅผ ๋ถˆ๋Ÿฌ์˜ฌ ๋•Œ, data๊ฐ€ ์—†์„ ๊ฒฝ์šฐ ๋ฐœ์ƒํ•˜๋Š” ์—๋Ÿฌ ์ฒ˜๋ฆฌ
    • skip_sample ํ•จ์ˆ˜๋ฅผ ํ™œ์šฉํ•˜์—ฌ error๊ฐ€ ๋ฐœ์ƒํ•  ๊ฒฝ์šฐ, random ํ˜น์€ ๋‹ค์Œ index๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ skip
    • ๊ธฐ์กด train_dalle_gpt_roberta.py๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์ˆ˜์ •
    opened by jjonhwa 0
Releases(v0.1.0-beta)
GndNet: Fast ground plane estimation and point cloud segmentation for autonomous vehicles using deep neural networks.

GndNet: Fast Ground plane Estimation and Point Cloud Segmentation for Autonomous Vehicles. Authors: Anshul Paigwar, Ozgur Erkent, David Sierra Gonzale

Anshul Paigwar 114 Dec 29, 2022
B-cos Networks: Attention is All we Need for Interpretability

Convolutional Dynamic Alignment Networks for Interpretable Classifications M. Bรถhle, M. Fritz, B. Schiele. B-cos Networks: Alignment is All we Need fo

58 Dec 23, 2022
OverFeat is a Convolutional Network-based image classifier and feature extractor.

OverFeat OverFeat is a Convolutional Network-based image classifier and feature extractor. OverFeat was trained on the ImageNet dataset and participat

593 Dec 08, 2022
This repository accompanies our paper โ€œDo Prompt-Based Models Really Understand the Meaning of Their Prompts?โ€

This repository accompanies our paper โ€œDo Prompt-Based Models Really Understand the Meaning of Their Prompts?โ€ Usage To replicate our results in Secti

Albert Webson 64 Dec 11, 2022
Source code for the GPT-2 story generation models in the EMNLP 2020 paper "STORIUM: A Dataset and Evaluation Platform for Human-in-the-Loop Story Generation"

Storium GPT-2 Models This is the official repository for the GPT-2 models described in the EMNLP 2020 paper [STORIUM: A Dataset and Evaluation Platfor

Nader Akoury 27 Dec 20, 2022
Computational inteligence project on faces in the wild dataset

Table of Contents The general idea How these scripts work? Loading data Needed modules and global variables Parsing the arrays in dataset Extracting a

tooraj taraz 4 Oct 21, 2022
ใ€ŠLightXML: Transformer with dynamic negative sampling for High-Performance Extreme Multi-label Text Classi๏ฌcationใ€‹(AAAI 2021) GitHub:

LightXML: Transformer with dynamic negative sampling for High-Performance Extreme Multi-label Text Classi๏ฌcation

76 Dec 05, 2022
Patch SVDD for Image anomaly detection

Patch SVDD Patch SVDD for Image anomaly detection. Paper: https://arxiv.org/abs/2006.16067 (published in ACCV 2020). Original Code : https://github.co

Hong-Jeongmin 0 Dec 03, 2021
Learning to See by Looking at Noise

Learning to See by Looking at Noise This is the official implementation of Learning to See by Looking at Noise. In this work, we investigate a suite o

Manel Baradad Jurjo 82 Dec 24, 2022
๐Ÿงฎ Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model after All

Accompanying source code to the paper "Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model A

Florian Wilhelm 39 Dec 03, 2022
Simple ray intersection library similar to coldet - succedeed by libacc

Ray Intersection This project offers a header only acceleration structure library including implementations for a BVH- and KD-Tree. Applications may i

Nils Moehrle 29 Jun 23, 2022
Content shared at DS-OX Meetup

Streamlit-Projects Streamlit projects available in this repo: An introduction to Streamlit presented at DS-OX (Feb 26, 2020) meetup Streamlit 101 - Ja

Arvindra 69 Dec 23, 2022
๐Ÿค— Paper Style Guide

๐Ÿค— Paper Style Guide (Work in progress, send a PR!) Libraries to Know booktabs natbib cleveref Either seaborn, plotly or altair for graphs algorithmic

Hugging Face 66 Dec 12, 2022
Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

Unofficial Pytorch Lightning implementation of Contrastive Syn-to-Real Generalization (ICLR, 2021)

Gyeongjae Choi 17 Sep 23, 2021
This repository contains the source code of an efficient 1D probabilistic model for music time analysis proposed in ICASSP2022 venue.

Jump Reward Inference for 1D Music Rhythmic State Spaces An implementation of the probablistic jump reward inference model for music rhythmic informat

Mojtaba Heydari 25 Dec 16, 2022
LBBA-boosted WSOD

LBBA-boosted WSOD Summary Our code is based on ruotianluo/pytorch-faster-rcnn and WSCDN Sincerely thanks for your resources. Newer version of our code

Martin Dong 20 Sep 19, 2022
Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle

Knover Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and developers to carry out eff

607 Dec 31, 2022
Controlling Hill Climb Racing with Hand Tacking

Controlling Hill Climb Racing with Hand Tacking Opened Palm for Gas Closed Palm for Brake

Rohit Ingole 3 Jan 18, 2022
Deep Learning and Reinforcement Learning Library for Scientists and Engineers ๐Ÿ”ฅ

TensorLayer is a novel TensorFlow-based deep learning and reinforcement learning library designed for researchers and engineers. It provides an extens

TensorLayer Community 7.1k Dec 29, 2022
EfficientNetV2-with-TPU - Cifar-10 case study

EfficientNetV2-with-TPU EfficientNet EfficientNetV2 adalah jenis jaringan saraf convolutional yang memiliki kecepatan pelatihan lebih cepat dan efisie

Sultan syach 1 Dec 28, 2021