CycleGAN Explained

2022-07-27 06:57:00 【Mr_ health】

Preface

In the previous post we covered pix2pix (see "Pix2Pix Principle Analysis"). pix2pix performs style transfer on paired data, as shown on the left side of the figure below. In most cases, however, we have no style-B counterpart for a given style-A image. What we have instead is one set of images in style A (the source domain) and another set of images in style B (the target domain), and pix2pix cannot be applied. CycleGAN's innovation is that it learns the translation between the source domain and the target domain without requiring a one-to-one mapping between training samples. The method was proposed in 2017 and is now a classic, foundational approach.

Paper: https://arxiv.org/abs/1703.10593

Basic framework

The principle of CycleGAN is shown in the figure below. The architecture consists of the following parts:

(1) Inputs:

x: an image from the source domain (style A)
y: an image from the target domain (style B)

(2) Two generators:

G: converts a style-A image x into a style-B image
F: converts a style-B image y into a style-A image

The "cycle" can be understood as follows:

G converts the style-A image x into a style-B image $\widehat{Y}$; passing $\widehat{Y}$ through F should then bring it back to style A with the image content preserved.
F converts the style-B image y into a style-A image $\widehat{X}$; passing $\widehat{X}$ through G should then bring it back to style B with the image content preserved.

In other words, once G and F are well trained, images can be converted freely between styles A and B.

Loss function

During training we introduce two discriminators:

Dy: distinguishes real style-B images from fake style-B images produced by G
Dx: distinguishes real style-A images from fake style-A images produced by F

The loss function consists of the following parts:

(1) GAN loss at Dy:
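
In the notation of the paper linked above, with $D_Y$ denoting the discriminator for style B, this adversarial loss is usually written as:

$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$$

G tries to minimize this objective while $D_Y$ tries to maximize it.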

(2) GAN loss at Dx:
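
Following the paper's notation, with $D_X$ denoting the discriminator for style A, the adversarial loss for the mapping F is usually written as:

$$\mathcal{L}_{GAN}(F, D_X, Y, X) = \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))]$$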

(3) Cycle-consistency loss, which is where the "cycle" described earlier comes from:
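
As given in the paper, this term penalizes the L1 distance between each image and its reconstruction after a full round trip through both generators:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\|F(G(x)) - x\|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\|G(F(y)) - y\|_1]$$

The paper's full objective then adds it to the two GAN losses with a weight $\lambda$:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F)$$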

(4) Identity loss

This loss is not one of the main terms listed above; it shows up in the actual code. Generator G is meant to produce images in the style of y, so if a real y is fed into G, the output should still be y. Only then does G demonstrate that it really generates style y. G(y) should therefore be as close to y as possible. According to the paper, without this loss the generator is free to change the tint of the image, which shifts the overall color.
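
Written in the same notation as the paper, and applying the same argument to F, the identity term reads:

$$\mathcal{L}_{identity}(G, F) = \mathbb{E}_{y \sim p_{data}(y)}[\|G(y) - y\|_1] + \mathbb{E}_{x \sim p_{data}(x)}[\|F(x) - x\|_1]$$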

Code

Official PyTorch implementation: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

(1) Forward pass:

netG_A is G; it performs the A->B style transformation (source domain to target domain)
netG_B is F; it performs the B->A style transformation (target domain to source domain)

    def forward(self):
        """Run forward pass; called by both functions <optimize_parameters> and <test>."""
        self.fake_B = self.netG_A(self.real_A)  # G_A(A)
        self.rec_A = self.netG_B(self.fake_B)   # G_B(G_A(A))
        self.fake_A = self.netG_B(self.real_B)  # G_B(B)
        self.rec_B = self.netG_A(self.fake_A)   # G_A(G_B(B))

(2) Updating G:

The if lambda_idt > 0: branch implements the identity loss.

It is followed by the GAN losses (loss_G_A, loss_G_B) and the cycle-consistency losses (loss_cycle_A, loss_cycle_B).

Note: the discriminator netD_A in the code judges whether a style-B image is real or generated (it corresponds to Dy).

Likewise, netD_B judges whether a style-A image is real or generated (it corresponds to Dx).

    def backward_G(self):
        """Calculate the loss for generators G_A and G_B"""
        lambda_idt = self.opt.lambda_identity
        lambda_A = self.opt.lambda_A
        lambda_B = self.opt.lambda_B
        # Identity loss
        if lambda_idt > 0:
            # G_A should be identity if real_B is fed: ||G_A(B) - B||
            self.idt_A = self.netG_A(self.real_B)  # feeding real B into netG_A (the A->B generator) should still yield style B
            self.loss_idt_A = self.criterionIdt(self.idt_A, self.real_B) * lambda_B * lambda_idt
            # G_B should be identity if real_A is fed: ||G_B(A) - A||
            self.idt_B = self.netG_B(self.real_A)  # feeding real A into netG_B (the B->A generator) should still yield style A
            self.loss_idt_B = self.criterionIdt(self.idt_B, self.real_A) * lambda_A * lambda_idt
        else:
            self.loss_idt_A = 0
            self.loss_idt_B = 0

        # GAN loss D_A(G_A(A))
        self.loss_G_A = self.criterionGAN(self.netD_A(self.fake_B), True)
        # GAN loss D_B(G_B(B))
        self.loss_G_B = self.criterionGAN(self.netD_B(self.fake_A), True)
        # Forward cycle loss || G_B(G_A(A)) - A||
        self.loss_cycle_A = self.criterionCycle(self.rec_A, self.real_A) * lambda_A
        # Backward cycle loss || G_A(G_B(B)) - B||
        self.loss_cycle_B = self.criterionCycle(self.rec_B, self.real_B) * lambda_B
        # combined loss and calculate gradients
        self.loss_G = self.loss_G_A + self.loss_G_B + self.loss_cycle_A + self.loss_cycle_B + self.loss_idt_A + self.loss_idt_B
        self.loss_G.backward()
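
To make the weighting in backward_G concrete, here is a minimal pure-Python sketch that works on flat lists of floats instead of tensors. It assumes a least-squares GAN criterion and the weights lambda_A = lambda_B = 10 and lambda_identity = 0.5, which I believe are the repository defaults. The helper names l1, lsgan and generator_loss are hypothetical and not part of the official code.

```python
def l1(a, b):
    """Mean absolute error between two equally sized lists of floats."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def lsgan(pred, target_is_real):
    """Least-squares GAN loss (assumed criterion): mean squared distance to the label."""
    target = 1.0 if target_is_real else 0.0
    return sum((p - target) ** 2 for p in pred) / len(pred)


def generator_loss(d_a_fake_b, d_b_fake_a, rec_a, real_a, rec_b, real_b,
                   idt_a, idt_b, lambda_a=10.0, lambda_b=10.0, lambda_idt=0.5):
    """Sum the six generator terms in the same way backward_G does."""
    loss_g_a = lsgan(d_a_fake_b, True)   # G_A wants D_A to label fake_B as real
    loss_g_b = lsgan(d_b_fake_a, True)   # G_B wants D_B to label fake_A as real
    loss_cycle_a = l1(rec_a, real_a) * lambda_a
    loss_cycle_b = l1(rec_b, real_b) * lambda_b
    # idt_a = G_A(real_B) is compared with real_B; idt_b = G_B(real_A) with real_A
    loss_idt_a = l1(idt_a, real_b) * lambda_b * lambda_idt
    loss_idt_b = l1(idt_b, real_a) * lambda_a * lambda_idt
    return (loss_g_a + loss_g_b + loss_cycle_a + loss_cycle_b
            + loss_idt_a + loss_idt_b)
```

With these weights, a reconstruction error of a given size costs the generators ten times as much as a GAN error of the same size, and an identity error costs five times as much.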

(3) Updating D:

    def backward_D_A(self):
        """Calculate GAN loss for discriminator D_A"""
        fake_B = self.fake_B_pool.query(self.fake_B)
        self.loss_D_A = self.backward_D_basic(self.netD_A, self.real_B, fake_B)

    def backward_D_B(self):
        """Calculate GAN loss for discriminator D_B"""
        fake_A = self.fake_A_pool.query(self.fake_A)
        self.loss_D_B = self.backward_D_basic(self.netD_B, self.real_A, fake_A)
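
The fake_A_pool and fake_B_pool objects above are history buffers: the discriminators are trained on a mix of the newest generated image and older ones, a trick the CycleGAN paper uses to reduce oscillation during training. Below is a simplified pure-Python sketch of that idea, assuming a buffer of 50 items and a 50% chance of swapping. The class name HistoryPool is hypothetical, and it handles one item per query, whereas the repository's version works on tensor batches.

```python
import random


class HistoryPool:
    """Simplified image history buffer (illustrative, one item per query)."""

    def __init__(self, pool_size=50, rng=None):
        self.pool_size = pool_size
        self.items = []
        self.rng = rng if rng is not None else random.Random()

    def query(self, item):
        # A pool of size 0 disables the buffer: always return the new item.
        if self.pool_size == 0:
            return item
        # Until the buffer is full, store the new item and return it unchanged.
        if len(self.items) < self.pool_size:
            self.items.append(item)
            return item
        # Once full: about half of the time, swap the new item with a randomly
        # chosen stored one and return the older item; otherwise return the
        # new item and leave the buffer untouched.
        if self.rng.random() > 0.5:
            idx = self.rng.randrange(self.pool_size)
            old = self.items[idx]
            self.items[idx] = item
            return old
        return item
```

The discriminator therefore does not always see the generator's latest output, which keeps it from overfitting to the most recent generator state.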

Generator structure

Finally, a note on the structure of the generator used in CycleGAN. It comes from the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution; search for it if you are interested. The basic structure is as follows.

The network consists of 3 convolution layers, 5 residual blocks, and 3 more convolution layers.
No pooling is used. Downsampling is performed in the initial convolutions (the second and third layers), and upsampling is performed within the final 3 convolution layers. The most direct benefit is lower computational cost. Another is a larger effective receptive field, since strided convolution enlarges the input region that each output position sees. The 5 residual blocks all use the same number of filters (128), and each block contains 2 convolution layers with 3×3 kernels. These convolutions do not use standard zero padding, because zero padding causes severe artifacts at the borders of the generated image. To keep the output the same size as the input, reflection padding is applied to the input image at the start of the network.
The residual blocks here do not follow Kaiming He's original design (there is no ReLU after the final convolution of the block); they use the design of Gross and Wilber, which has been shown to work well for image classification.

For 256×256 input images, 9 residual blocks are used; for 128×128 images, 6 residual blocks are used.
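
The size bookkeeping for the 5-block layout described above can be checked with a few lines of arithmetic. The sketch below traces the spatial size of a square input through that layout (3 convolutions, 5 unpadded residual blocks, 3 convolutions). The 9×9 kernels for the first and last layers, the padding values and the helper names are assumptions made for illustration and are not taken from the text above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(size, n_blocks=5):
    """Return (reflection padding per side, sizes after each stage)."""
    # Each residual block has two unpadded 3x3 convolutions, each trimming
    # 1 pixel per side. The blocks run at 1/4 resolution, so the input needs
    # 4 times that margin.
    pad = n_blocks * 2 * 1 * 4
    sizes = []
    size = size + 2 * pad                  # reflection padding on the input
    sizes.append(size)
    size = conv_out(size, 9, 1, 4)         # first conv (assumed 9x9, stride 1)
    sizes.append(size)
    for _ in range(2):                     # second and third convs downsample
        size = conv_out(size, 3, 2, 1)
        sizes.append(size)
    for _ in range(n_blocks):              # unpadded residual blocks
        size = conv_out(conv_out(size, 3), 3)
        sizes.append(size)
    for _ in range(2):                     # two upsampling convs, each doubling the size
        size *= 2
        sizes.append(size)
    size = conv_out(size, 9, 1, 4)         # last conv (assumed 9x9, stride 1)
    sizes.append(size)
    return pad, sizes
```

Under these assumptions, trace_sizes(256) gives 40 pixels of reflection padding per side, and the size returns to 256 at the output, which is why the padding has to be added up front when the residual blocks themselves are unpadded.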