[Machine learning notes] [Style transfer] deeplearning.ai Course 4 Week 4 programming assignment (TensorFlow 2)
2022-07-26 10:39:00 【LittleSeedling】
Special applications - Style transfer
Goals:
1. Use the pre-trained VGG-19 model to transfer image style.
2. Modify the code from the 【reference articles】 to re-implement it with TensorFlow 2.
References:
1. 【Chinese/English】【Andrew Ng's programming assignments】Course 4 - Convolutional Neural Networks - Week 4 assignment
2. TensorFlow 2.0 neural style transfer
3. TensorFlow 2.0: transfer learning with tf.keras.applications
4. TensorFlow 2.0: how to specify multiple outputs in a network
5. Andrew Ng's Coursera deep learning course deeplearning.ai (4-4) Face recognition and neural style transfer - programming assignment
Neural Style Transfer (NST)
NST merges two images: it uses a 【content image】 and a 【style image】 to generate a 【combined image】.

Transfer learning
We use a pre-trained convolutional network and build on top of it. The idea of taking a network trained on one task and applying it to a new task is called transfer learning.
Here we use the VGG-19 that ships with Keras. This model has already been trained on the very large ImageNet database, so it has learned to recognize a variety of low-level features (shallow layers) and high-level features (deep layers).
Load model
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.summary()
include_top: whether to keep the fully connected layers at the top of the network
weights: None means random initialization; "imagenet" loads the pre-trained weights
input_tensor: optional Keras tensor to use as the model's image input
input_shape: optional, only effective when include_top=False; a tuple of length 3 giving the input image shape. Width and height must be no smaller than 32, e.g. (150, 150, 3)
classes: optional, the number of image classes; only usable when include_top=True and no pre-trained weights are loaded
vgg.trainable = False means the VGG parameters are frozen and will not be trained.
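To decide which layer names to pass in later, it helps to list them first. A minimal sketch (block1_conv1, block5_conv2, etc. are the standard VGG19 layer names):

import tensorflow as tf

vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
# Print every layer name: block1_conv1, block1_conv2, block1_pool, block2_conv1, ...
for layer in vgg.layers:
    print(layer.name)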
Build a model with multiple outputs
selected_layers = ['block1_conv1',
                   'block2_conv1',
                   'block3_conv1',
                   'block4_conv1',
                   'block5_conv1']
outputs = [vgg.get_layer(name).output for name in selected_layers]
model = tf.keras.Model([vgg.input], outputs)
This way, the model's output contains the outputs of all the selected layers.
Load the model
From VGG19, select the outputs for the style layers and the content layers.
Selecting several layers averages the style across them.
def vgg_layers(layer_names):
""" Select the layer to output Parameters : layer_names -- Select the layer as the output return : model -- Models with multiple outputs """
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
outputs = [vgg.get_layer(name).output for name in layer_names]
# Build a model with multiple outputs
model = tf.keras.Model([vgg.input], outputs)
return model
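A quick check of the multi-output model (a sketch with a hypothetical 256x256 random input; spatial sizes shrink by a factor of 2 at each pooling stage):

style_extractor = vgg_layers(['block1_conv1', 'block5_conv1'])
x = tf.random.uniform([1, 256, 256, 3]) * 255.0
outs = style_extractor(tf.keras.applications.vgg19.preprocess_input(x))
for name, out in zip(['block1_conv1', 'block5_conv1'], outs):
    print(name, out.shape)  # e.g. block1_conv1 (1, 256, 256, 64), block5_conv1 (1, 16, 16, 512)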
Build the model
class StyleContentModel(tf.keras.models.Model):
def get_config(self):
pass
def __init__(self, style_layers, content_layers):
super(StyleContentModel, self).__init__()
        # VGG model rebuilt to expose the specified output layers
self.vgg = vgg_layers(style_layers + content_layers)
self.style_layers = style_layers
self.content_layers = content_layers
        # Number of selected style-layer outputs
self.num_style_layers = len(style_layers)
# No training
self.vgg.trainable = False
def call(self, inputs, training=None, mask=None):
"""Expects float input in [0,1]"""
inputs = inputs * 255.0
        # Preprocess the input for VGG19 (expects [0, 255] pixel values)
preprocess_input = tf.keras.applications.vgg19.preprocess_input(inputs)
# Get the output
outputs = self.vgg(preprocess_input)
        # Split the outputs into 【style layer outputs】 and 【content layer outputs】
style_outputs, content_outputs = outputs[:self.num_style_layers], outputs[self.num_style_layers:]
        # Convert the 【style layer outputs】 into 【style (Gram) matrices】
style_outputs = [gram_matrix(style_output) for style_output in style_outputs]
        # Organize the 【content layer outputs】 into a dictionary
content_dict = {
content_name: value for content_name, value in zip(self.content_layers, content_outputs)
}
        # Organize the 【style layer outputs】 into a dictionary
style_dict = {
style_name: value for style_name, value in zip(self.style_layers, style_outputs)
}
return {
'content': content_dict, 'style': style_dict}
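A short usage sketch (gram_matrix, used inside call, is defined in the next section; the image here is a random placeholder):

extractor = StyleContentModel(style_layers=['block1_conv1', 'block2_conv1'],
                              content_layers=['block5_conv2'])
img = tf.random.uniform([1, 256, 256, 3])  # float input in [0, 1]
results = extractor(img)
print(results['style'].keys())    # one Gram matrix per style layer
print(results['content'].keys())  # one raw feature map per content layer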
Define style matrix
$Gram\_matrix(A) = AA^T$
def gram_matrix(input_tensor):
""" matrix A The style matrix of is AA^T """
result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
input_shape = tf.shape(input_tensor)
num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
    # Divide by width * height so the Gram values don't get too large
return result / num_locations
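A tiny numerical check: for feature maps with c channels, gram_matrix returns one symmetric c x c matrix per batch element:

x = tf.random.normal([1, 4, 4, 3])
g = gram_matrix(x)
print(g.shape)  # (1, 3, 3): a channels-by-channels matrix
print(tf.reduce_max(tf.abs(g - tf.transpose(g, perm=[0, 2, 1]))))  # ~0, i.e. symmetric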
Define the loss function
$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$
Because the raw loss values are very large,
we take $\alpha = 10^4$ and $\beta = 10^{-2}$ here.
$J_{content}(C,G) = \frac{1}{nm}\sum_j^m \sum_i^n \left(C_i^l - G_i^l\right)^2$
where $i$ indexes the elements of the matrix and $l$ indexes the outputs of different layers.
$J_{style}(S,G) = \frac{1}{nm}\sum_j^m \sum_i^n \left(S_i^l - G_i^l\right)^2$
where $S$ is the style matrix $Gram\_matrix(S)$, $i$ indexes the elements of the matrix, and $l$ indexes the outputs of different layers.
def style_content_loss2(outputs, target, num_style_layers, num_content_layers):
""" Calculate the loss Parameters : output -- Output after model . Use 【 Content picture 】 Iterate step by step . target -- Goals that need to be approached . It is divided into 【 Content 】 and 【 style 】 Two parts . Namely 【 Content picture 】 and 【 Style picture 】 Output . num_style_layers -- 【 Style layer output 】 The number of num_content_layers -- 【 Content layer output 】 The number of """
style_outputs = outputs["style"]
content_outputs = outputs["content"]
style_target = target["style"]
content_target = target["content"]
# Computing style loss
style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_target[name])**2)
for name in style_outputs.keys()])
style_loss /= num_style_layers
# Calculate content loss
content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_target[name])**2)
for name in content_outputs.keys()])
content_loss /= num_content_layers
# Calculate the total loss
    loss = total_cost(content_loss, style_loss, alpha=1e4, beta=1e-2)
return loss
def total_cost(J_content, J_style, alpha=1e1, beta=1e2):
""" Calculate the total loss function Parameters : J_content -- Content loss J_style -- Style loss alpha -- Hyperparameters , Weight of content loss beta -- Hyperparameters , Weight of style loss return : J -- Total loss """
J = alpha * J_content + beta * J_style
return J
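A quick arithmetic check with made-up loss values shows how strongly the weights rebalance the two terms:

J = total_cost(J_content=2.0, J_style=300.0, alpha=1e4, beta=1e-2)
print(J)  # 1e4 * 2.0 + 1e-2 * 300.0 = 20003.0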
Loading images
The longest side of a loaded image is capped at 256 (anything larger is too slow to run and blows up memory).
def load_img(path_to_img):
""" Loading pictures """
# The largest dimension of the picture
max_dim = 256
img = tf.io.read_file(path_to_img)
img = tf.image.decode_image(img, channels=3)
img = tf.image.convert_image_dtype(img, tf.float32)
shape = tf.cast(tf.shape(img)[:-1], tf.float32)
long_dim = max(shape)
scale = max_dim / long_dim
new_shape = tf.cast(shape * scale, tf.int32)
img = tf.image.resize(img, new_shape)
img = img[tf.newaxis, :]
return img
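A usage sketch (the path is hypothetical; this assumes matplotlib.pyplot is imported as plt, as in the full code below):

img = load_img("images/cat.jpg")  # hypothetical path
print(img.shape, img.dtype)       # (1, H, W, 3) float32, longest side 256
plt.imshow(img[0])
plt.show()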
Total variation regularization
def high_pass_x_y(image):
x_var = image[:, :, 1:, :] - image[:, :, :-1, :]
y_var = image[:, 1:, :, :] - image[:, :-1, :, :]
return x_var, y_var
def total_variation_loss(image):
x_deltas, y_deltas = high_pass_x_y(image)
return tf.reduce_mean(x_deltas ** 2) + tf.reduce_mean(y_deltas ** 2)
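For reference, TensorFlow has a built-in tf.image.total_variation, but it sums absolute neighbor differences instead of averaging squared ones, so its scale is different:

img = tf.random.uniform([1, 64, 64, 3])
print(total_variation_loss(img))      # mean of squared differences (version above)
print(tf.image.total_variation(img))  # built-in: sum of absolute differences, shape (1,)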
The main function
def main1(epochs=5, steps_per_epoch=100):
# Starting time
start_time = time.perf_counter()
    # Choose the VGG output layers
content_layers = ["block5_conv2"]
style_layers = [
"block1_conv1",
"block2_conv1",
"block3_conv1",
"block4_conv1",
"block5_conv1"
]
    # Count how many outputs were selected
num_style_layers = len(style_layers)
num_content_layers = len(content_layers)
    # Feature extractor: VGG with the specified outputs
extractor = StyleContentModel(style_layers, content_layers)
    # Load the content image and the style image
content_image = load_img("images/cat.jpg")
style_image = load_img("images/monet.jpg")
    # Run the extractor once to get the encoded 【target style】 and 【target content】
style_targets = extractor(style_image)["style"]
content_targets = extractor(content_image)["content"]
targets = {
"style": style_targets,
"content": content_targets
}
    # Use the 【content image】 as the variable to optimize
image = tf.Variable(content_image)
# Define optimizer Adam
opt = tf.optimizers.Adam(learning_rate=0.02)
    # Weights of the loss terms
# style_weight = 1e-2
# content_weight = 1e4
total_variation_weight = 1e8
costs = []
step = 0
for n in range(epochs):
for m in range(steps_per_epoch):
step += 1
with tf.GradientTape() as tape:
outputs = extractor(image)
loss = style_content_loss2(outputs, targets, num_style_layers, num_content_layers)
                # Total variation regularization term
loss += total_variation_weight * total_variation_loss(image)
            # Update the input image from the gradients
grads = tape.gradient(loss, image)
opt.apply_gradients(grads_and_vars=[(grads,image)])
            # Keep image values in [0, 1]
image.assign(tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0))
# Record the loss
costs.append(loss)
print(f"step{step}--loss:{loss}")
imshow2(image.read_value())
plt.title("Train step:{}".format(step))
plt.show()
plt.plot(np.squeeze(costs))
plt.ylabel("cost")
plt.xlabel("iterations")
plt.title("learning rate="+str(0.02))
plt.show()
# End time
end_time = time.perf_counter()
# Elapsed time
    elapsed = end_time - start_time
    # Print the total time
    print("Elapsed: " + str(int(elapsed / 60)) + " min " + str(int(elapsed % 60)) + " s")
Combination 1
[image: content + style = generated result after 1000 iterations]

Combination 2
[image: content + style = generated result after 500 iterations]
step1--loss:206808352.0
step2--loss:138015312.0
step3--loss:76444464.0
step4--loss:55079300.0
step5--loss:52182004.0
step6--loss:52179800.0
step7--loss:49280824.0
step8--loss:45222588.0
step9--loss:40886236.0
step10--loss:37080472.0
step11--loss:33747848.0
step12--loss:31121796.0
step13--loss:29348120.0
step14--loss:27991062.0
step15--loss:26776242.0
step16--loss:25650356.0
step17--loss:24728126.0
step18--loss:23919458.0
...
step99--loss:8370326.0
step100--loss:8396298.0
step101--loss:8440048.0
...
step499--loss:5766706.5
step500--loss:5699691.5
Elapsed: 14 min 32 s

Full code
Environment: TensorFlow 2.3, Python 3.8.5
import time
import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
import nst_utils
import numpy as np
import tensorflow as tf
# Don't use GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
""" The model used , yes VGG Online 19 Layer version , Already in very big ImageNet Training on the database , Learned to recognize various low-level features and high-level features """
def load_my_model():
# model = nst_utils.load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.summary()
def gram_matrix(input_tensor):
""" matrix A The style matrix of is AA^T """
result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
input_shape = tf.shape(input_tensor)
num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
    # Divide by width * height so the Gram values don't get too large
return result / num_locations
# tf.random.set_seed(1)
# A = tf.random.normal([3,2*1],mean=1,stddev=4)
# GA = gram_matrix(A)
# print("GA ="+str(GA))
def total_cost(J_content, J_style, alpha=1e1, beta=1e2):
""" Calculate the total loss function Parameters : J_content -- Content loss J_style -- Style loss alpha -- Hyperparameters , Weight of content loss beta -- Hyperparameters , Weight of style loss return : J -- Total loss """
J = alpha * J_content + beta * J_style
return J
# np.random.seed(3)
# J_content = np.random.randn()
# J_style = np.random.randn()
# J = total_cost(J_content,J_style)
# print("J=" + str(J))
def load_img(path_to_img):
""" Loading pictures """
# The largest dimension of the picture
max_dim = 256
img = tf.io.read_file(path_to_img)
img = tf.image.decode_image(img, channels=3)
img = tf.image.convert_image_dtype(img, tf.float32)
shape = tf.cast(tf.shape(img)[:-1], tf.float32)
long_dim = max(shape)
scale = max_dim / long_dim
new_shape = tf.cast(shape * scale, tf.int32)
img = tf.image.resize(img, new_shape)
img = img[tf.newaxis, :]
return img
def vgg_layers(layer_names):
""" Select the layer to output Parameters : layer_names -- Select the layer as the output return : model -- Models with multiple outputs """
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
outputs = [vgg.get_layer(name).output for name in layer_names]
# Build a model with multiple outputs
model = tf.keras.Model([vgg.input], outputs)
return model
class StyleContentModel(tf.keras.models.Model):
def get_config(self):
pass
def __init__(self, style_layers, content_layers):
super(StyleContentModel, self).__init__()
        # VGG model rebuilt to expose the specified output layers
self.vgg = vgg_layers(style_layers + content_layers)
self.style_layers = style_layers
self.content_layers = content_layers
        # Number of selected style-layer outputs
self.num_style_layers = len(style_layers)
# No training
self.vgg.trainable = False
def call(self, inputs, training=None, mask=None):
"""Expects float input in [0,1]"""
inputs = inputs * 255.0
        # Preprocess the input for VGG19 (expects [0, 255] pixel values)
preprocess_input = tf.keras.applications.vgg19.preprocess_input(inputs)
# Get the output
outputs = self.vgg(preprocess_input)
        # Split the outputs into 【style layer outputs】 and 【content layer outputs】
style_outputs, content_outputs = outputs[:self.num_style_layers], outputs[self.num_style_layers:]
        # Convert the 【style layer outputs】 into 【style (Gram) matrices】
style_outputs = [gram_matrix(style_output) for style_output in style_outputs]
        # Organize the 【content layer outputs】 into a dictionary
content_dict = {
content_name: value for content_name, value in zip(self.content_layers, content_outputs)
}
        # Organize the 【style layer outputs】 into a dictionary
style_dict = {
style_name: value for style_name, value in zip(self.style_layers, style_outputs)
}
return {
'content': content_dict, 'style': style_dict}
def style_content_loss(outputs, target, num_style_layers, num_content_layers):
""" Calculate the loss Parameters : output -- Output after model . Use 【 Content picture 】 Iterate step by step . target -- Goals that need to be approached . It is divided into 【 Content 】 and 【 style 】 Two parts . Namely 【 Content picture 】 and 【 Style picture 】 Output . num_style_layers -- 【 Style layer output 】 The number of num_content_layers -- 【 Content layer output 】 The number of """
style_outputs = outputs["style"]
content_outputs = outputs["content"]
style_target = target["style"]
content_target = target["content"]
# Computing style loss
style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_target[name])**2)
for name in style_outputs.keys()])
style_loss /= num_style_layers
# Calculate content loss
content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_target[name])**2)
for name in content_outputs.keys()])
content_loss /= num_content_layers
# Calculate the total loss
    loss = total_cost(content_loss, style_loss, alpha=1e4, beta=1e-2)
return loss
# Plot function
def imshow2(image, title=None):
if len(image.shape) > 3:
image = tf.squeeze(image, axis=0)
plt.imshow(image)
if title:
plt.title(title)
def main1(epochs=5, steps_per_epoch=100):
# Starting time
start_time = time.perf_counter()
    # Choose the VGG output layers
content_layers = ["block5_conv2"]
style_layers = [
"block1_conv1",
"block2_conv1",
"block3_conv1",
"block4_conv1",
"block5_conv1"
]
    # Count how many outputs were selected
num_style_layers = len(style_layers)
num_content_layers = len(content_layers)
    # Feature extractor: VGG with the specified outputs
extractor = StyleContentModel(style_layers, content_layers)
    # Load the content image and the style image
content_image = load_img("images/cat.jpg")
style_image = load_img("images/monet.jpg")
    # Run the extractor once to get the encoded 【target style】 and 【target content】
style_targets = extractor(style_image)["style"]
content_targets = extractor(content_image)["content"]
targets = {
"style": style_targets,
"content": content_targets
}
    # Use the 【content image】 as the variable to optimize
image = tf.Variable(content_image)
# Define optimizer Adam
opt = tf.optimizers.Adam(learning_rate=0.02)
    # Weights of the loss terms
# style_weight = 1e-2
# content_weight = 1e4
total_variation_weight = 1e8
costs = []
step = 0
for n in range(epochs):
for m in range(steps_per_epoch):
step += 1
with tf.GradientTape() as tape:
outputs = extractor(image)
loss = style_content_loss(outputs, targets, num_style_layers, num_content_layers)
                # Total variation regularization term
loss += total_variation_weight * total_variation_loss(image)
            # Update the input image from the gradients
grads = tape.gradient(loss, image)
opt.apply_gradients(grads_and_vars=[(grads,image)])
            # Keep image values in [0, 1]
image.assign(tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0))
# Record the loss
costs.append(loss)
print(f"step{step}--loss:{loss}")
imshow2(image.read_value())
plt.title("Train step:{}".format(step))
plt.show()
plt.plot(np.squeeze(costs))
plt.ylabel("cost")
plt.xlabel("iterations")
plt.title("learning rate="+str(0.02))
plt.show()
# End time
end_time = time.perf_counter()
# Elapsed time
    elapsed = end_time - start_time
    # Print the total time
    print("Elapsed: " + str(int(elapsed / 60)) + " min " + str(int(elapsed % 60)) + " s")
def high_pass_x_y(image):
x_var = image[:, :, 1:, :] - image[:, :, :-1, :]
y_var = image[:, 1:, :, :] - image[:, :-1, :, :]
return x_var, y_var
def total_variation_loss(image):
x_deltas, y_deltas = high_pass_x_y(image)
return tf.reduce_mean(x_deltas ** 2) + tf.reduce_mean(y_deltas ** 2)
def main():
# load_my_model()
main1()
if __name__ == '__main__':
main()