Multimodal unsupervised image to image translation
2022-07-29 02:37:00 【A fan boy addicted to bicycles】
Preface: GAN-based image translation has become a very active direction. Last time I introduced SketchyGAN, which I could not reproduce and found very disappointing. This time we look at MUNIT, an unsupervised GAN image-translation method from NVIDIA research. The next article will also cover unsupervised image translation, 《Unsupervised Sketch-to-Photo Synthesis》; comparing the similarities and differences between the two may bring inspiration to my current work.
Contents

Partially shared latent space assumption
Bidirectional reconstruction loss
Main contributions
Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any examples of corresponding image pairs. We assume that the image representation can be decomposed into a content code that is domain invariant and a style code that captures domain-specific properties. To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain.
Sketch-to-photo synthesis is challenging for two reasons:

1. Sketches are inconsistent with photos in shape: sketches drawn by amateurs have large spatial and geometric deformations. Translating a sketch into a photo therefore requires correcting these deformations.

2. Sketches are colorless and lack visual detail. A sketch is drawn with black strokes on white paper, with internal marks that mainly outline the boundaries and characteristic features of objects. To synthesize a photo, shading and colored textures must be filled in correctly.
In this paper, we propose a principled framework for multimodal unsupervised image-to-image translation. As shown in Figure 1(a), our framework makes several assumptions. We first assume that the latent space of images can be decomposed into a content space and a style space. We further assume that images in different domains share a common content space but not the style space. To translate an image to the target domain, we recombine its content code with a random style code in the target style space (Figure 1(b)). The content code encodes the information that should be preserved during translation, while the style code represents the remaining variations that are not contained in the input image. By sampling different style codes, our model is able to produce diverse, multimodal outputs. Extensive experiments demonstrate the effectiveness of our method in modeling multimodal output distributions, and its image quality is superior to state-of-the-art methods. Moreover, the decomposition of content and style spaces allows our framework to perform example-guided image translation, in which the style of the translation output is controlled by a user-provided example image from the target domain.
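The sampling procedure can be sketched numerically. Below is a toy illustration (my own sketch with made-up split/concatenate stand-ins for the encoder and decoder, not the authors' networks) of how recombining one content code with different sampled style codes yields multiple translations of the same input:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_content(x):
    # Stand-in for the content encoder: keeps the domain-invariant part.
    return x[:4]

def decode_target(content, style):
    # Stand-in for the target-domain decoder: combines content and style.
    return np.concatenate([content, style])

x1 = rng.normal(size=8)              # an image from the source domain
c1 = encode_content(x1)              # its domain-invariant content code

# Sampling several style codes from the target-domain prior N(0, I)
# gives several distinct translations of the same input image.
translations = [decode_target(c1, rng.normal(size=4)) for _ in range(3)]
```

All three outputs share the same content but differ in style, which is exactly the multimodality the paper is after.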

Methods
Partially shared latent space assumption
Assume $x_1 \in \mathcal{X}_1$ and $x_2 \in \mathcal{X}_2$ are images from two different domains, sampled from the two marginal distributions $p(x_1)$ and $p(x_2)$. The goal of translation is to estimate the conditional distributions $p(x_2 \mid x_1)$ and $p(x_1 \mid x_2)$.

Suppose each image $x_i$ is generated from a content latent code $c$ shared by both domains and a style latent code $s_i$ specific to its own domain, i.e. $x_i = G_i^*(c, s_i)$. The goal of the network is to learn the underlying generator and encoder functions with neural networks.
This assumption is closely related to the shared latent space assumption proposed in UNIT. While UNIT assumes a fully shared latent space, we assume that only part of the latent space (the content) can be shared across domains, while the rest (the style) is domain specific. This is a more reasonable assumption when the cross-domain mapping is many-to-many.
Encoder-decoder structure
The model consists of two autoencoders (denoted by red and blue arrows respectively), one for each domain. The latent code of each autoencoder is composed of a content code c and a style code s. We train the model with adversarial objectives (dotted lines) that ensure the translated images are indistinguishable from real images in the target domain, and with bidirectional reconstruction objectives (dashed lines) that reconstruct both images and latent codes.

The latent code of each autoencoder is factorized into a content code $c_i$ and a style code $s_i$. Image-to-image translation is performed by swapping encoder-decoder pairs: an image's content code is decoded by the other domain's decoder together with a style code drawn from that domain's style prior. Although the style prior is unimodal, the output image distribution can be multimodal thanks to the nonlinearity of the decoder.
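As a concrete toy version of the swap (my own sketch; the real encoders and decoders are convolutional networks, while here both are trivially invertible split/concatenate functions):

```python
import numpy as np

CONTENT_DIM, STYLE_DIM = 4, 2
rng = np.random.default_rng(1)

def encode(x):
    # Toy encoder for either domain: split the latent into (content, style).
    return x[:CONTENT_DIM], x[CONTENT_DIM:]

def decode(c, s):
    # Toy decoder for either domain: reassemble an "image" from the codes.
    return np.concatenate([c, s])

x1 = rng.normal(size=CONTENT_DIM + STYLE_DIM)   # image in domain 1
c1, s1 = encode(x1)                              # its content and style codes

s2 = rng.normal(size=STYLE_DIM)   # style sampled from domain 2's prior
x12 = decode(c1, s2)              # translation of x1 into domain 2
c12, s12 = encode(x12)            # re-encoding recovers the swapped codes
```

Because the toy encoder and decoder are exact inverses, re-encoding the translation recovers the content and style codes that produced it; this is precisely the property the bidirectional reconstruction losses are designed to enforce in the learned networks.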
The loss function includes a bidirectional reconstruction loss (which ensures that the encoders and decoders are inverses of each other) and an adversarial loss (which matches the distribution of translated images to the image distribution of the target domain).
Bidirectional reconstruction loss
To learn encoder and decoder pairs that are inverses of each other, we use objective functions that encourage reconstruction in both directions: image -> latent -> image and latent -> image -> latent.
Image reconstruction loss. Given an image sampled from the data distribution, we should be able to reconstruct it after encoding and decoding:

$$\mathcal{L}_{\text{recon}}^{x_1} = \mathbb{E}_{x_1 \sim p(x_1)}\big[\, \lVert G_1(E_1^c(x_1), E_1^s(x_1)) - x_1 \rVert_1 \,\big]$$
Latent reconstruction loss. Given a latent code (style and content) sampled from the latent distribution at translation time, we should be able to reconstruct it after decoding and encoding:

$$\mathcal{L}_{\text{recon}}^{c_1} = \mathbb{E}_{c_1 \sim p(c_1),\, s_2 \sim q(s_2)}\big[\, \lVert E_2^c(G_2(c_1, s_2)) - c_1 \rVert_1 \,\big]$$

$$\mathcal{L}_{\text{recon}}^{s_2} = \mathbb{E}_{c_1 \sim p(c_1),\, s_2 \sim q(s_2)}\big[\, \lVert E_2^s(G_2(c_1, s_2)) - s_2 \rVert_1 \,\big]$$

where $q(s_2)$ is the standard normal prior $\mathcal{N}(0, I)$ and $p(c_1)$ is given by $c_1 = E_1^c(x_1)$ with $x_1 \sim p(x_1)$.
The authors use L1 reconstruction losses because they encourage sharp output images.
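Both reconstruction directions can be checked on toy arrays (my own sketch; the split/concatenate encoder and decoder are placeholders, not the paper's networks):

```python
import numpy as np

def l1(a, b):
    # L1 reconstruction loss (mean absolute error), as used in the paper.
    return np.mean(np.abs(a - b))

# Placeholder encoder/decoder pair: split and concatenate a flat vector.
enc = lambda x: (x[:4], x[4:])            # x -> (content, style)
dec = lambda c, s: np.concatenate([c, s]) # (content, style) -> x

rng = np.random.default_rng(2)
x1 = rng.normal(size=6)

# Image reconstruction: image -> latent -> image should return the input.
c1, s1 = enc(x1)
loss_img = l1(dec(c1, s1), x1)

# Latent reconstruction: latent -> image -> latent should return the codes.
s2 = rng.normal(size=2)                   # style sampled from the prior
c_rec, s_rec = enc(dec(c1, s2))
loss_content = l1(c_rec, c1)
loss_style = l1(s_rec, s2)
```

For this perfectly invertible toy pair all three losses are exactly zero; during training, driving these terms toward zero is what pushes the learned encoders and decoders toward being inverses.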
Adversarial loss. GANs are used to match the distribution of translated images to the target data distribution:

$$\mathcal{L}_{\text{GAN}}^{x_2} = \mathbb{E}_{c_1 \sim p(c_1),\, s_2 \sim q(s_2)}\big[\log\big(1 - D_2(G_2(c_1, s_2))\big)\big] + \mathbb{E}_{x_2 \sim p(x_2)}\big[\log D_2(x_2)\big]$$
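The adversarial term can be checked numerically with a made-up logistic discriminator score (a sketch only; the paper actually uses multi-scale convolutional discriminators):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gan_d_loss(d_logits_real, d_logits_fake):
    # Discriminator side of the standard GAN objective:
    # maximize E[log D(real)] + E[log(1 - D(fake))],
    # written here as a loss to minimize (negated).
    return -(np.mean(np.log(sigmoid(d_logits_real))) +
             np.mean(np.log(1.0 - sigmoid(d_logits_fake))))

rng = np.random.default_rng(3)
real = rng.normal(loc=2.0, size=100)    # logits for real target images
fake = rng.normal(loc=-2.0, size=100)   # logits for translated images

loss_confident = gan_d_loss(real, fake)                  # D separates well
loss_chance = gan_d_loss(np.zeros(100), np.zeros(100))   # D at chance level
```

A discriminator at chance (D = 0.5 everywhere) incurs a loss of 2 log 2; a discriminator that separates real from translated images scores lower, which is what the generator tries to prevent.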
Total loss. The encoders, decoders and discriminators are trained jointly by optimizing a weighted sum of the adversarial and bidirectional reconstruction terms:

$$\min_{E_1, E_2, G_1, G_2}\ \max_{D_1, D_2}\ \mathcal{L}_{\text{GAN}}^{x_1} + \mathcal{L}_{\text{GAN}}^{x_2} + \lambda_x\big(\mathcal{L}_{\text{recon}}^{x_1} + \mathcal{L}_{\text{recon}}^{x_2}\big) + \lambda_c\big(\mathcal{L}_{\text{recon}}^{c_1} + \mathcal{L}_{\text{recon}}^{c_2}\big) + \lambda_s\big(\mathcal{L}_{\text{recon}}^{s_1} + \mathcal{L}_{\text{recon}}^{s_2}\big)$$
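Putting the terms together is just a weighted sum. A minimal sketch with placeholder loss values (the default weights below are my assumption, not quoted from the paper; check the official config for the actual values):

```python
# Weighted sum of the MUNIT-style objective: two GAN terms plus image,
# content and style reconstruction terms for both domains. The lambda
# defaults here are assumptions for illustration only.
def total_loss(gan_1, gan_2, img_1, img_2, cont_1, cont_2, sty_1, sty_2,
               lam_x=10.0, lam_c=1.0, lam_s=1.0):
    return (gan_1 + gan_2
            + lam_x * (img_1 + img_2)
            + lam_c * (cont_1 + cont_2)
            + lam_s * (sty_1 + sty_2))

# Example with placeholder per-term loss values:
loss = total_loss(0.5, 0.5, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3)
```

In practice the generator side (encoders and decoders) minimizes this sum while the discriminators maximize the GAN terms, alternating updates as in standard GAN training.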
Reproducing the code
I have to say that papers from NVIDIA research are very conscientious: the code can be reproduced quickly, unlike SketchyGAN last time, where emailing the author about code problems and opening issues got no reply……
Code address :GitHub - NVlabs/MUNIT: Multimodal Unsupervised Image-to-Image Translation
Use the address :imaginaire/projects/munit at master · NVlabs/imaginaire · GitHub
I went through this code. The authors provide a model pretrained on the shoes dataset; although it works very well on edge maps, when I switched to the Sketchy dataset the results were mediocre. That is reasonable: the authors propose a general framework that is not optimized for sketch data.
I also computed FID and IS; the scores come out higher than those of other unsupervised GAN methods, which is a bit awkward.
Personal thoughts
Unfortunately, my reading of this paper is still rough. The general framework the authors propose is somewhat complex, and beyond being able to use it directly I have not studied it in depth; I will come back to this part when I have time.
