Style transfer model style_transformer: project example with a PyTorch implementation

2022-07-28 12:00:00 【Mr. Xiaocai】

Have you ever thought about using machine learning to paint? Today I will walk you step by step through the code of the deep learning model neural-style.
neural-style is a style transfer model and a great project on GitHub. So what is style transfer? Let's look at a simple example:
[Figure: style transfer example]
The theory behind this project comes from the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution.

One. Related work

Neural network architecture: the feed-forward network architecture in this paper builds on two references, “Deep residual learning for image recognition” and “Training and investigating residual nets”.
Image generation method: the image generation approach follows the paper “Inverting visual representations with convolutional networks”, but instead of a per-pixel loss function it uses a perceptual loss function in place of the pixel-by-pixel difference. Because it applies a feed-forward network, it gives results comparable to the optimization-based approach (as in “Understanding deep image representations by inverting them” and in the work of Gatys et al.) while being faster.

Two. Implementation details

As shown in Figure 2, our system consists of two parts: an image conversion network $f_W$ and a loss network $\phi$ (used to define a series of loss functions $l_1, l_2, l_3$). The image conversion network is a deep residual network parameterized by weights $W$. It converts an input image $x$ into an output image $\hat y$ through the mapping $\hat y = f_W(x)$. Each loss function computes a scalar value $l_i(\hat y, y_i)$ that measures the gap between the output $\hat y$ and a target image $y_i$. The image conversion network is trained with SGD (the code implementation uses Adam) to minimize a weighted sum of these loss functions.
Figure 2: System overview. On the left is the generator; on the right is the pre-trained VGG16 network (always kept fixed).
The loss network $\phi$ is used to define a feature (content) loss $l_{feat}^{\phi}$ and a style loss $l_{style}^{\phi}$, which measure the gap in content and in style respectively. For each input image $x$ we have a content target $y_c$ and a style target $y_s$. For style transfer, the content target $y_c$ is the input image $x$, and the output image $\hat y$ should combine the style of $y_s$ with the content of $x = y_c$. We train one network per target style. For single-image super-resolution, the input image $x$ is a low-resolution input, the content target is the real high-resolution image, and the style reconstruction loss is not used. We train one network per super-resolution factor.
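
The training objective described here can be sketched as a single optimization step. This is a minimal sketch under stated assumptions, not the project's training script: `loss_net` and `transformer` are tiny stand-ins for the pre-trained VGG16 and for the image conversion network, and `content_weight` and `style_weight` are hypothetical values chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the fixed loss network phi (assumption: the project uses pre-trained VGG16).
loss_net = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
loss_net.eval()
for p in loss_net.parameters():
    p.requires_grad_(False)  # phi is never updated during training

# Stand-in for the image conversion network f_W (assumption: a single conv layer).
transformer = nn.Conv2d(3, 3, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(transformer.parameters(), lr=1e-3)

def gram(feat):
    # (N, C, H, W) -> (N, C, C), normalized by C * H * W
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    return f.bmm(f.transpose(1, 2)) / (c * h * w)

content_weight, style_weight = 1.0, 10.0  # hypothetical weights

x = torch.rand(2, 3, 32, 32)      # batch of content images, y_c = x
y_s = torch.rand(1, 3, 32, 32)    # a single style image
style_gram = gram(loss_net(y_s))  # computed once, shape (1, 8, 8)

before = transformer.weight.detach().clone()

optimizer.zero_grad()
y_hat = transformer(x)            # y_hat = f_W(x)
feat_hat = loss_net(y_hat)
# mse_loss averages over elements, so it matches the squared-norm losses up to a constant factor
content_loss = F.mse_loss(feat_hat, loss_net(x))
style_loss = F.mse_loss(gram(feat_hat), style_gram.expand(x.size(0), -1, -1))
total_loss = content_weight * content_loss + style_weight * style_loss
total_loss.backward()             # gradients flow through phi into f_W only
optimizer.step()
```

Only the parameters of `transformer` are handed to the optimizer, so the loss network stays fixed even though gradients pass through it.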

Three. Image conversion network

Highlight: the network first downsamples the input image, processes it with residual blocks, and then generates the output image by upsampling with interpolation.

1. The residual block used is as follows:

[Figure: structure of the residual block]

Note: the author compares residual blocks with a plain convolutional network. As shown in Fig. 1, the residual blocks converge faster, but the final results are similar. The author speculates that residual networks may perform better in deeper networks.

Four. Loss function details

1. Content Loss Function

Rather than comparing images pixel by pixel, we use VGG to compute high-level feature (content) representations. This is the same idea as in the artistic style work, which uses VGG-19 to extract style features. Writing $\phi_j$ for the activations of layer $j$ of the loss network, with shape $C_j \times H_j \times W_j$, the formula is:
$l_{feat}^{\phi,j}(\hat y, y) = \frac{1}{C_j H_j W_j} \| \phi_j(\hat y) - \phi_j(y) \|_2^2$
Finding an image $\hat y$ that minimizes the feature loss for lower layers tends to produce images that are visually indistinguishable from $y$. When reconstructing from higher layers, the content and global structure are preserved, but color, texture, and exact shape are not. Training our image conversion network with a feature loss pushes the output to be very close to the target image $y$, without forcing the two to match exactly.
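
The feature (content) loss can be written in a few lines of PyTorch. This is a minimal sketch assuming the loss is the squared Euclidean distance between two feature maps divided by their number of elements, as in the referenced paper. The function name `feature_reconstruction_loss` and the constant toy tensors are illustrative assumptions; in the project the inputs would be VGG activations.

```python
import torch

def feature_reconstruction_loss(feat_hat, feat_target):
    """Squared Euclidean distance between two feature maps, divided by C * H * W.

    Both tensors have shape (C, H, W): the activations of one layer of the
    loss network for the output image and for the content target.
    """
    c, h, w = feat_hat.shape
    return ((feat_hat - feat_target) ** 2).sum() / (c * h * w)

# Toy feature maps with constant values (stand-ins for real VGG activations).
feat_y = torch.ones(4, 5, 5)
feat_y_hat = feat_y + 0.5  # every element is off by 0.5

loss = feature_reconstruction_loss(feat_y_hat, feat_y)
print(loss.item())  # 0.25, since each squared difference is 0.25
```

Because of the normalization by $C_j H_j W_j$, the value does not grow with the size of the feature map.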

2. Style Reconstruction Loss

The feature (content) loss penalizes the output image when it deviates in content from the target $y$. We also want to penalize deviations in style: color, texture, common patterns, and so on. To achieve this, Gatys et al. proposed the following style reconstruction loss.

Let $\phi_j(x)$ denote the activations of layer $j$ of the network $\phi$ for input $x$, a feature map of shape $C_j \times H_j \times W_j$. Define the Gram matrix $G^{\phi}_j(x)$ as the $C_j \times C_j$ matrix (feature matrix) whose elements are given by:
$G^{\phi}_j(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h,w,c} \, \phi_j(x)_{h,w,c'}$
If we interpret $\phi_j(x)$ as giving a $C_j$-dimensional feature at each point of an $H_j \times W_j$ grid, then $G^{\phi}_j(x)$ is proportional to the uncentered covariance of the $C_j$-dimensional features, with each grid location treated as an independent sample. It therefore captures which features tend to activate together.
The style reconstruction loss is the squared Frobenius norm of the difference between the Gram matrices of the output image $\hat y$ and the target image $y$:
$l_{style}^{\phi,j}(\hat y, y) = \| G^{\phi}_j(\hat y) - G^{\phi}_j(y) \|_F^2$
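
The Gram matrix and the style loss can be sketched directly from these definitions. This is a minimal sketch for a single (C, H, W) feature map. The function names `gram_matrix` and `style_reconstruction_loss` and the constant toy tensors are assumptions for illustration, not the project's code.

```python
import torch

def gram_matrix(feat):
    """Gram matrix of a (C, H, W) feature map, normalized by C * H * W."""
    c, h, w = feat.shape
    psi = feat.reshape(c, h * w)  # each row is one channel, flattened
    return psi @ psi.t() / (c * h * w)

def style_reconstruction_loss(feat_hat, feat_target):
    """Squared Frobenius norm of the difference between the two Gram matrices."""
    diff = gram_matrix(feat_hat) - gram_matrix(feat_target)
    return (diff ** 2).sum()

# Toy feature maps with 2 channels and different spatial sizes.
feat_small = torch.ones(2, 3, 3)
feat_large = torch.ones(2, 5, 5)

g = gram_matrix(feat_small)
print(g.shape)  # torch.Size([2, 2]), i.e. C x C regardless of H and W
loss = style_reconstruction_loss(feat_small, feat_large)
```

Since the Gram matrix is always $C_j \times C_j$, the style loss is well defined even when the output image and the style image have different sizes, as the two toy tensors show.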

Five. Code implementation

1. Style conversion network:

import torch.nn as nn


# Style conversion network (the image conversion network f_W)
class TransformerNet(nn.Module):
    def __init__(self):
        super(TransformerNet, self).__init__()
        # Initial convolution layers
        self.conv1 = ConvLayer(3, 32, kernel_size=9, stride=1)
        self.in1 = nn.InstanceNorm2d(32, affine=True)
        self.conv2 = ConvLayer(32, 64, kernel_size=3, stride=2)
        self.in2 = nn.InstanceNorm2d(64, affine=True)
        self.conv3 = ConvLayer(64, 128, kernel_size=3, stride=2)
        self.in3 = nn.InstanceNorm2d(128, affine=True)
        # Residual layers
        self.res1 = ResidualBlock(128)
        self.res2 = ResidualBlock(128)
        self.res3 = ResidualBlock(128)
        self.res4 = ResidualBlock(128)
        self.res5 = ResidualBlock(128)
        # Upsampling Layers
        self.deconv1 = UpsampleConvLayer(128, 64, kernel_size=3, stride=1, upsample=2)
        self.in4 = nn.InstanceNorm2d(64, affine=True)
        self.deconv2 = UpsampleConvLayer(64, 32, kernel_size=3, stride=1, upsample=2)
        self.in5 = nn.InstanceNorm2d(32, affine=True)
        self.deconv3 = ConvLayer(32, 3, kernel_size=9, stride=1)
        # Non-linearities
        self.relu = nn.ReLU()

    def forward(self, x):
        y = self.relu(self.in1(self.conv1(x)))
        y = self.relu(self.in2(self.conv2(y)))
        y = self.relu(self.in3(self.conv3(y)))
        y = self.res1(y)
        y = self.res2(y)
        y = self.res3(y)
        y = self.res4(y)
        y = self.res5(y)
        y = self.relu(self.in4(self.deconv1(y)))
        y = self.relu(self.in5(self.deconv2(y)))
        y = self.deconv3(y)
        return y
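
To see how the spatial size changes through `TransformerNet`, the arithmetic can be traced in plain Python. This is a sketch that assumes square inputs and uses the standard convolution output-size formula. The helpers `conv_out_size` and `trace_sizes` are hypothetical names introduced only for this illustration.

```python
def conv_out_size(size, kernel_size, stride):
    """Side length after ReflectionPad2d(kernel_size // 2) followed by Conv2d."""
    padded = size + 2 * (kernel_size // 2)
    return (padded - kernel_size) // stride + 1

def trace_sizes(size):
    """Side length after each size-relevant stage of TransformerNet."""
    sizes = [size]
    # conv1, conv2, conv3 as (kernel_size, stride), taken from the class above
    for k, s in [(9, 1), (3, 2), (3, 2)]:
        sizes.append(conv_out_size(sizes[-1], k, s))
    # The residual blocks use 3x3 stride-1 convolutions and keep the size unchanged.
    # deconv1, deconv2: nearest-neighbour upsample by 2, then a 3x3 stride-1 conv
    for _ in range(2):
        sizes.append(conv_out_size(sizes[-1] * 2, 3, 1))
    # deconv3: 9x9 stride-1 conv back to 3 channels
    sizes.append(conv_out_size(sizes[-1], 9, 1))
    return sizes

print(trace_sizes(256))  # [256, 256, 128, 64, 128, 256, 256]
print(trace_sizes(255))  # [255, 255, 128, 64, 128, 256, 256]
```

The output matches the input size when the side length is a multiple of 4. For other sizes, such as 255, the two stride-2 convolutions round up and the output comes out slightly larger than the input.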

2. Residual module

class ResidualBlock(nn.Module):
    """ResidualBlock
    introduced in: https://arxiv.org/abs/1512.03385
    recommended architecture: http://torch.ch/blog/2016/02/04/resnets.html
    """
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.conv1 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.in1 = nn.InstanceNorm2d(channels, affine=True)
        self.conv2 = ConvLayer(channels, channels, kernel_size=3, stride=1)
        self.in2 = nn.InstanceNorm2d(channels, affine=True)
        self.relu = nn.ReLU()
    
    def forward(self, x):
        residual = x
        out = self.relu(self.in1(self.conv1(x)))
        out = self.in2(self.conv2(out))
        out = out + residual
        return out

3. Up sampling module

# Upsampling module
class UpsampleConvLayer(nn.Module):
    """UpsampleConvLayer
    Upsamples the input and then does a convolution. This method gives better results
    compared to ConvTranspose2d.
    ref: http://distill.pub/2016/deconv-checkerboard/
    """
    def __init__(self, in_channels, out_channels, kernel_size, stride, upsample=None):
        super(UpsampleConvLayer, self).__init__()
        self.upsample = upsample
        reflection_padding = kernel_size // 2
        self.reflection_pad = nn.ReflectionPad2d(reflection_padding)
        self.conv2d = nn.Conv2d(in_channels, out_channels, kernel_size, stride)

    def forward(self, x):
        x_in = x
        if self.upsample:
            x_in = nn.functional.interpolate(x_in, mode='nearest', scale_factor=self.upsample)
        out = self.reflection_pad(x_in)
        out = self.conv2d(out)
        return out
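
The two upsampling routes mentioned in the docstring can be compared side by side. This is a sketch only: the upsample-then-convolve path mirrors `UpsampleConvLayer` as used for `deconv1`, while the `ConvTranspose2d` hyperparameters are hypothetical, picked so that both routes double the spatial size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(1, 128, 16, 16)

# Route used by UpsampleConvLayer: nearest-neighbour upsample, reflection pad, 3x3 conv.
up = F.interpolate(x, mode='nearest', scale_factor=2)
resize_conv = nn.Sequential(
    nn.ReflectionPad2d(1),
    nn.Conv2d(128, 64, kernel_size=3, stride=1),
)
out_resize = resize_conv(up)

# Route it replaces: a stride-2 transposed convolution (hypothetical settings).
deconv = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1)
out_deconv = deconv(x)

print(out_resize.shape, out_deconv.shape)  # both are (1, 64, 32, 32)
```

Both routes produce the same output shape. With nearest-neighbour interpolation each input pixel is first copied into a 2x2 block, so the convolution that follows sees every position covered equally often, which is the property the linked article relates to fewer checkerboard artifacts.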

4. Basic network module

# Convolution module: reflection padding followed by a convolution
class ConvLayer(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super(ConvLayer, self).__init__()
        reflection_padding = kernel_size // 2
        self.reflection_pad = nn.ReflectionPad2d(reflection_padding)
        self.conv2d = nn.Conv2d(in_channels, out_channels, kernel_size, stride)
        
    def forward(self, x):
        out = self.reflection_pad(x)
        out = self.conv2d(out)
        return out
