Multimodal unsupervised image to image translation
2022-07-29 02:37:00 【A fan boy addicted to bicycles】
Preface: GAN-based image translation has been a very popular direction. Last time I covered SketchyGAN, which I could not reproduce, and I was very disappointed. This time I introduce MUNIT, an unsupervised GAN-based image translation method from NVIDIA Research. The next post will cover another unsupervised image translation paper, "Unsupervised Sketch-to-Photo Synthesis", and compare the similarities and differences between the two, in the hope that this brings some inspiration to my current work.
Contents
Partially shared latent space assumption
Bidirectional reconstruction loss
Main contributions
Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any examples of corresponding image pairs. The paper assumes that an image representation can be decomposed into a content code that is domain-invariant and a style code that captures domain-specific properties. To translate an image to another domain, its content code is recombined with a random style code sampled from the style space of the target domain.
Sketch-to-photo synthesis is challenging for two reasons:
1. Sketches do not match photos in shape. Sketches drawn by amateurs are often heavily deformed in space and geometry, so translating a sketch into a photo requires correcting this deformation.
2. Sketches are colorless and lack visual detail. A sketch is drawn with black strokes on white paper and mainly outlines the object's boundaries and characteristic internal markings. To synthesize a photo, shading and colored textures must be filled in correctly.
In this paper, the authors propose a principled framework for multimodal unsupervised image-to-image translation. As shown in Figure 1(a), the framework makes several assumptions. It first assumes that the latent space of images can be decomposed into a content space and a style space. It further assumes that images in different domains share a common content space but not the style space. To translate an image to the target domain, its content code is recombined with a random style code from the target style space (Figure 1(b)). The content code encodes the information that should be preserved during translation, while the style code represents the remaining variations that are not contained in the input image. By sampling different style codes, the model can produce diverse, multimodal outputs. Extensive experiments show that the method is effective at modeling multimodal output distributions and that its image quality is superior to that of state-of-the-art methods. In addition, the decomposition into content and style spaces allows the framework to perform example-guided image translation, in which the style of the translation output is controlled by an example image that the user provides from the target domain.
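The recombination step described above can be sketched in a few lines of plain Python. This is a toy illustration under strong assumptions, not the authors' implementation: images are flat lists of floats, a single encoder and decoder stand in for the per-domain networks, and the names `encode`, `decode`, `translate`, `CONTENT_DIM` and `STYLE_DIM` are hypothetical. The toy encoder simply splits a vector into a content part and a style part, whereas MUNIT learns its encoders and decoders as neural networks.

```python
import random

CONTENT_DIM = 4  # hypothetical size of the content code
STYLE_DIM = 2    # hypothetical size of the style code


def encode(image):
    """Toy encoder: split a flat image vector into (content, style)."""
    return image[:CONTENT_DIM], image[CONTENT_DIM:]


def decode(content, style):
    """Toy decoder: recombine a content code and a style code."""
    return content + style


def translate(image, rng, reference=None):
    """Translate `image` into the target domain.

    Without `reference`, the style code is sampled from N(0, I), so
    repeated calls give different outputs for the same input.
    With `reference` (an image from the target domain), its style code
    is reused, which corresponds to example-guided translation.
    """
    content, _ = encode(image)  # keep the content, discard the source style
    if reference is None:
        style = [rng.gauss(0.0, 1.0) for _ in range(STYLE_DIM)]
    else:
        _, style = encode(reference)
    return decode(content, style)


rng = random.Random(0)
source = [0.1, 0.2, 0.3, 0.4, 9.0, 9.0]
target_example = [5.0, 5.0, 5.0, 5.0, -1.0, 1.0]

sample_a = translate(source, rng)  # random style
sample_b = translate(source, rng)  # another random style
guided = translate(source, rng, reference=target_example)
```

Both sampled outputs keep the first four (content) entries of `source` and differ only in the style entries, while `guided` takes its style entries from `target_example`.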

Methods
Partially shared latent space assumption
Suppose $x_1 \in \mathcal{X}_1$ and $x_2 \in \mathcal{X}_2$ are images from two different domains, sampled from the two marginal distributions $p(x_1)$ and $p(x_2)$. The goal is then to estimate the two conditional distributions $p(x_2 \mid x_1)$ and $p(x_1 \mid x_2)$.
Assume that each image $x_i$ is generated from a content latent code $c$ shared by both domains and a style latent code $s_i$ specific to its own domain. The goal of the network is to learn the underlying generator and encoder functions with neural networks.
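Written out, the generative assumption looks roughly as follows. The notation is taken from the MUNIT paper as I recall it and is not shown in this post: $G_1^*$ and $G_2^*$ are the unknown underlying generators, $E_i^*$ their inverse encoders, $c$ the shared content code, and $s_1$, $s_2$ the domain-specific style codes.

```latex
\[
x_1 = G_1^*(c, s_1), \qquad x_2 = G_2^*(c, s_2), \qquad E_i^* = (G_i^*)^{-1}
\]
```

A pair of corresponding images $(x_1, x_2)$ therefore shares the same $c$ and differs only through $s_1$ and $s_2$.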
This assumption is closely related to the shared latent space assumption proposed in UNIT. UNIT assumes a fully shared latent space, whereas here only part of the latent space (the content) is shared across domains and the rest (the style) is domain-specific. This is a more reasonable assumption when the cross-domain mapping is many-to-many.
Encoder-decoder structure
The model consists of two autoencoders (indicated by red and blue arrows respectively), one for each domain. The latent code of each autoencoder consists of a content code c and a style code s. The model is trained with adversarial objectives (dotted lines), which ensure that translated images are indistinguishable from real images in the target domain, and with bidirectional reconstruction objectives (dashed lines), which reconstruct both images and latent codes.

The latent code of each autoencoder is factorized into a content code $c_i$ and a style code $s_i$. Image-to-image translation is performed by swapping encoder-decoder pairs. Although the prior distribution of the style code is unimodal, the output image distribution can be multimodal thanks to the nonlinearity of the decoder.
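As a sketch of what swapping the encoder-decoder pair means when translating from domain 1 to domain 2, using notation assumed from the MUNIT paper ($E_1^c$ is the content encoder of domain 1, $G_2$ the decoder of domain 2, and $q(s_2)$ the style prior):

```latex
\[
c_1 = E_1^c(x_1), \qquad s_2 \sim q(s_2) = \mathcal{N}(0, I), \qquad x_{1 \to 2} = G_2(c_1, s_2)
\]
```

The reverse direction is symmetric: encode $x_2$ with $E_2^c$, sample $s_1$ from $q(s_1)$, and decode with $G_1$.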
The loss function consists of a bidirectional reconstruction loss (which ensures that the encoder and decoder are inverses of each other) and an adversarial loss (which matches the distribution of translated images to that of the target domain).
Bidirectional reconstruction loss
To learn encoder and decoder pairs that are inverses of each other, the objective encourages reconstruction in both directions: image -> latent -> image and latent -> image -> latent.
Image reconstruction loss. Given an image sampled from the data distribution, we should be able to reconstruct it after encoding and decoding.
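The equation did not survive in this post. For domain 1, the image reconstruction loss in the MUNIT paper has the following form (notation assumed from the paper: $E_1^c$ and $E_1^s$ are the content and style encoders, $G_1$ the decoder):

```latex
\[
\mathcal{L}_{\mathrm{recon}}^{x_1} = \mathbb{E}_{x_1 \sim p(x_1)}\left[ \left\| G_1\big(E_1^c(x_1), E_1^s(x_1)\big) - x_1 \right\|_1 \right]
\]
```

$\mathcal{L}_{\mathrm{recon}}^{x_2}$ is defined in the same way for domain 2.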

Latent reconstruction loss. Given a latent code (style and content) sampled from the latent distribution at translation time, we should be able to reconstruct it after decoding and encoding.
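The equations are missing here as well. In the MUNIT paper, the latent reconstruction losses for a content code $c_1$ from domain 1 and a style code $s_2$ sampled from the prior $q(s_2)$ are written as follows (notation assumed from the paper):

```latex
\[
\mathcal{L}_{\mathrm{recon}}^{c_1} = \mathbb{E}_{c_1 \sim p(c_1),\, s_2 \sim q(s_2)}\left[ \left\| E_2^c\big(G_2(c_1, s_2)\big) - c_1 \right\|_1 \right]
\]
\[
\mathcal{L}_{\mathrm{recon}}^{s_2} = \mathbb{E}_{c_1 \sim p(c_1),\, s_2 \sim q(s_2)}\left[ \left\| E_2^s\big(G_2(c_1, s_2)\big) - s_2 \right\|_1 \right]
\]
```

Here $p(c_1)$ is the distribution of $c_1 = E_1^c(x_1)$ with $x_1 \sim p(x_1)$. The terms $\mathcal{L}_{\mathrm{recon}}^{c_2}$ and $\mathcal{L}_{\mathrm{recon}}^{s_1}$ are defined symmetrically.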

The authors use an L1 reconstruction loss because it encourages sharp output images.
Adversarial loss. GANs are used to match the distribution of translated images to the target data distribution.
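As a rough illustration of the two reconstruction directions with an L1 penalty, the sketch below uses plain Python and toy stand-ins for the networks. It is not the authors' code: `l1_loss`, `toy_encode` and `toy_decode` are hypothetical names, and the toy pair is exactly invertible, so both reconstruction losses come out as zero. In real training the encoder and decoder are neural networks and these losses are what the optimizer drives down.

```python
def l1_loss(a, b):
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def toy_encode(image, content_dim=2):
    """Hypothetical encoder: split the vector into (content, style)."""
    return image[:content_dim], image[content_dim:]


def toy_decode(content, style):
    """Hypothetical decoder: concatenate content and style."""
    return content + style


# Direction 1: image -> latent -> image
image = [0.5, -1.0, 2.0]
content, style = toy_encode(image)
image_recon_loss = l1_loss(toy_decode(content, style), image)

# Direction 2: latent -> image -> latent
content_in, style_in = [1.0, 2.0], [0.25]
content_out, style_out = toy_encode(toy_decode(content_in, style_in))
latent_recon_loss = l1_loss(content_out, content_in) + l1_loss(style_out, style_in)

# A non-zero case for reference: (|0 - 1| + |0 - 3|) / 2
example_l1 = l1_loss([0.0, 0.0], [1.0, 3.0])
```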
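The equation is missing from this post. In the MUNIT paper, the adversarial loss for images translated into domain 2 is written as follows (notation assumed from the paper: $D_2$ is a discriminator that tries to tell translated images apart from real images in $\mathcal{X}_2$):

```latex
\[
\mathcal{L}_{\mathrm{GAN}}^{x_2} = \mathbb{E}_{c_1 \sim p(c_1),\, s_2 \sim q(s_2)}\left[ \log\big(1 - D_2(G_2(c_1, s_2))\big) \right] + \mathbb{E}_{x_2 \sim p(x_2)}\left[ \log D_2(x_2) \right]
\]
```

$\mathcal{L}_{\mathrm{GAN}}^{x_1}$ and the discriminator $D_1$ are defined in the same way for the other direction.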

Total loss
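The total objective combines the adversarial and reconstruction terms. Its form in the MUNIT paper, as far as I recall it, is shown below; $\lambda_x$, $\lambda_c$ and $\lambda_s$ are weights for the image, content and style reconstruction terms:

```latex
\[
\min_{E_1, E_2, G_1, G_2} \; \max_{D_1, D_2} \;
\mathcal{L}_{\mathrm{GAN}}^{x_1} + \mathcal{L}_{\mathrm{GAN}}^{x_2}
+ \lambda_x \big( \mathcal{L}_{\mathrm{recon}}^{x_1} + \mathcal{L}_{\mathrm{recon}}^{x_2} \big)
+ \lambda_c \big( \mathcal{L}_{\mathrm{recon}}^{c_1} + \mathcal{L}_{\mathrm{recon}}^{c_2} \big)
+ \lambda_s \big( \mathcal{L}_{\mathrm{recon}}^{s_1} + \mathcal{L}_{\mathrm{recon}}^{s_2} \big)
\]
```

The encoders and decoders minimize this objective while the discriminators maximize it.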


Code reproduction
I have to say that papers from NVIDIA Research are very conscientious: the code can be reproduced quickly. That is unlike SketchyGAN last time, where the code had problems and neither emails to the author nor GitHub issues got any reply.
Code address :GitHub - NVlabs/MUNIT: Multimodal Unsupervised Image-to-Image Translation
Code I used: imaginaire/projects/munit at master · NVlabs/imaginaire · GitHub
I tried this code. It provides a model pretrained on the shoes dataset. The results on edge maps are very good, but when I switched to the Sketchy dataset the results were mediocre. The authors propose a general framework that is not optimized for sketch data, so mediocre results there are understandable.
I also computed FID and IS. The scores came out higher than those of the unsupervised GAN method, which is a bit awkward.
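For reference when reading such scores, the standard definitions of the two metrics are given below. The post does not say which implementation was used, so the symbols are my own: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of Inception features for real and generated images, and $p(y \mid x)$ is the Inception classifier's label distribution for a generated image $x$.

```latex
\[
\mathrm{FID} = \left\| \mu_r - \mu_g \right\|_2^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2 \big( \Sigma_r \Sigma_g \big)^{1/2} \right)
\]
\[
\mathrm{IS} = \exp\left( \mathbb{E}_{x \sim p_g} \left[ D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \right] \right)
\]
```

A lower FID is better, while a higher IS is better, so "higher" means opposite things for the two metrics.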
Personal thoughts
Unfortunately, I only read this paper roughly. The general framework the authors propose is a little complicated, and since their code can be used directly I have not studied it in depth. I will come back to this part when I have time.
