Tensorflow—Neural Style Transfer
2022-07-03 10:27:00 【JallinRichel】
Style transfer [Style_transfer] learning notes - TensorFlow
- Results of the tutorial
- Overview of style transfer & the implementation idea of this tutorial
- Preparation
- Code implementation
- Thank you for watching !!!
Note: This article is the author's study notes from working through the official TensorFlow tutorial, organized here for reference. You can treat it as a translated walkthrough of the official tutorial. The code is consistent with the official code, with only a few minor changes.
A link to the official TensorFlow tutorial is attached at the end of the article.
Results of the tutorial
This tutorial uses Image 1 and Image 2 to generate Image 3.
Image 1: content image
Image 2: style image
Image 3: result image (parameters: epochs=10, steps_per_epoch=100)
Overview of style transfer & The implementation idea of this tutorial
Overview of style transfer
The covariance matrix of the feature maps produced by a convolutional layer characterizes the texture of an image well, but it loses location information. For style transfer this drawback does not matter: we only need a way to represent the texture of the style image and transfer that texture onto the image being stylized, which completes the style transfer task.
In this tutorial we use the Gram matrix rather than the covariance matrix; it describes the autocorrelation of features across the whole image.
The implementation idea of this tutorial
In this tutorial, we select intermediate layers of the pretrained image-classification network VGG19 to extract the texture features of images.
The official tutorial also shows how to do fast style transfer directly with TensorFlow Hub; that example is not repeated in this article.
Preparation
Software preparation
- Install the latest Anaconda 3: click Download and select the corresponding version. (The GPU edition of TensorFlow is recommended; the author's CPU is an Intel Core i5-8300H @ 2.30 GHz.)
- This tutorial uses Python 3.7.7.
- Install TensorFlow, Keras, Matplotlib, and NumPy.
Data preparation (download the content and style images)
content_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
style_path = tf.keras.utils.get_file('kandinsky5.jpg','https://storage.googleapis.com/download.tensorflow.org/example_images/Vassily_Kandinsky%2C_1913_-_Composition_7.jpg')
Code implementation
Import the required modules
import os
import tensorflow as tf
# Load compressed models from tensorflow_hub
os.environ['TFHUB_MODEL_LOAD_FORMAT'] = 'COMPRESSED'
import IPython.display as display
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12, 12)
mpl.rcParams['axes.grid'] = False
import numpy as np
import PIL.Image
import time
import functools
def tensor_to_image(tensor):  # Define the transformation function from tensor to image
    tensor = tensor*255
    tensor = np.array(tensor, dtype=np.uint8)
    if np.ndim(tensor) > 3:
        assert tensor.shape[0] == 1
        tensor = tensor[0]
    return PIL.Image.fromarray(tensor)
Display the two downloaded images
- Define a function that loads an image and limits its longest side to 512 pixels.
def load_img(path_to_img):
    max_dim = 512
    img = tf.io.read_file(path_to_img)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)

    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    long_dim = max(shape)
    scale = max_dim / long_dim

    new_shape = tf.cast(shape * scale, tf.int32)
    img = tf.image.resize(img, new_shape)
    img = img[tf.newaxis, :]
    return img
- Define a function to display pictures
def imshow(image, title=None):
    if len(image.shape) > 3:
        image = tf.squeeze(image, axis=0)

    plt.imshow(image)
    if title:
        plt.title(title)
- Display images
content_image = load_img(content_path)
style_image = load_img(style_path)
plt.subplot(1, 2, 1)
imshow(content_image, 'Content Image')
plt.subplot(1, 2, 2)
imshow(style_image, 'Style Image')
The two images should now be displayed on your screen.
Define content and style representations
We can use the intermediate layers of the model to obtain the content and style representations of an image.
Starting from the input layer, the first few layers of the network represent low-level features such as edges and textures. As you go deeper into the network, the last few layers represent high-level features of the image, such as wheels or eyes.
In this tutorial we use intermediate layers of the VGG19 architecture to define the content and style of images, and try to match the corresponding style and content target representations at those layers.
- Load a VGG19 without the classification head and list its layer names
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
# To load the classification head as well, change include_top=False to True
print()
for layer in vgg.layers:
print(layer.name)
- Choose intermediate layers from the network to represent the content and style of the image
content_layers = ['block5_conv2']
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']
num_content_layers = len(content_layers)
num_style_layers = len(style_layers)
Why can these intermediate layers define the content and style representations of an image?
At a high level, for a network to perform image classification it must understand the image. It takes the raw pixels as input and builds internal representations that turn those pixels into an increasingly complex understanding of the features present in the image.
This is also one reason convolutional neural networks generalize well: they capture the invariances and defining features of classes that are unaffected by background noise and other nuisances.
So when an image is fed into the model, the model acts as a complex feature extractor. By accessing its intermediate layers, we can describe the content and style of the input image.
Build a model
- The network is available in tf.keras.applications, so we can extract intermediate-layer values with the Keras functional API.
- We can define a model with model = Model(inputs, outputs).
The following function builds a VGG19 model that returns a list of intermediate-layer outputs:
def vgg_layers(layer_names):
    """ Creates a vgg model that returns a list of intermediate output values."""
    # Load our model. Load pretrained VGG, trained on imagenet data
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
    vgg.trainable = False

    outputs = [vgg.get_layer(name).output for name in layer_names]

    model = tf.keras.Model([vgg.input], outputs)
    return model
style_extractor = vgg_layers(style_layers)
style_outputs = style_extractor(style_image*255)
# Look at the statistics of each layer's output
for name, output in zip(style_layers, style_outputs):
    print(name)
    print("  shape: ", output.numpy().shape)
    print("  min: ", output.numpy().min())
    print("  max: ", output.numpy().max())
    print("  mean: ", output.numpy().mean())
    print()
Computing style (calculating the Gram matrix)
The Gram matrix that encodes this texture information is computed by taking the outer product of the feature vector with itself at each location, and then averaging that outer product over all locations.
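In symbols, matching the einsum implementation below, where $F^{l}_{ijc}(x)$ is the feature map of layer $l$ at spatial position $(i, j)$ and channel $c$:

$$G^{l}_{cd} = \frac{\sum_{ij} F^{l}_{ijc}(x)\, F^{l}_{ijd}(x)}{IJ}$$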
This can be implemented with the tf.linalg.einsum function:
def gram_matrix(input_tensor):
    result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
    input_shape = tf.shape(input_tensor)
    num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
    return result/(num_locations)
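As a quick sanity check, using random example data that is not part of the tutorial, the Gram matrix collapses the spatial dimensions and leaves one entry per pair of channels:

# A hypothetical feature map: batch x height x width x channels
example = tf.random.normal([1, 32, 32, 64])
print(gram_matrix(example).shape)  # TensorShape([1, 64, 64])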
Extract the style and content of the image
Build a model that returns the style and content tensors
class StyleContentModel(tf.keras.models.Model):
    def __init__(self, style_layers, content_layers):
        super(StyleContentModel, self).__init__()
        self.vgg = vgg_layers(style_layers + content_layers)
        self.style_layers = style_layers
        self.content_layers = content_layers
        self.num_style_layers = len(style_layers)
        self.vgg.trainable = False

    def call(self, inputs):
        "Expects float input in [0,1]"
        inputs = inputs*255.0
        preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
        outputs = self.vgg(preprocessed_input)
        style_outputs, content_outputs = (outputs[:self.num_style_layers],
                                          outputs[self.num_style_layers:])

        style_outputs = [gram_matrix(style_output)
                         for style_output in style_outputs]

        content_dict = {content_name: value
                        for content_name, value
                        in zip(self.content_layers, content_outputs)}

        style_dict = {style_name: value
                      for style_name, value
                      in zip(self.style_layers, style_outputs)}

        return {'content': content_dict, 'style': style_dict}
extractor = StyleContentModel(style_layers, content_layers)
results = extractor(tf.constant(content_image))
When we feed an image into this model, it returns the Gram matrices of the style_layers and the content of the content_layers.
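To see what the extractor returns, you can print the keys and shapes of the two result dictionaries, mirroring the per-layer statistics printed earlier (a quick inspection step, not part of the algorithm itself):

print('Styles:')
for name, output in sorted(results['style'].items()):
    print("  ", name, output.numpy().shape)

print('Contents:')
for name, output in sorted(results['content'].items()):
    print("  ", name, output.numpy().shape)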
Run gradient descent
With this style and content extractor, we can now implement the style transfer algorithm: compute the mean squared error of the image's outputs relative to each target, then take a weighted sum of these losses.
Set the style and content targets
style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']
Define a tf.Variable to hold the image being optimized. (The tf.Variable must have the same shape as the content image; the earlier code already converted the images to float32.)
Since this is a float image, we also define a function to keep its pixel values between 0 and 1:
image = tf.Variable(content_image)
def clip_0_1(image):
    return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)
Create an optimizer. The total loss is a weighted combination of the style and content losses.
This tutorial uses Adam, although the original paper recommends LBFGS.
opt = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
style_weight=1e-2
content_weight=1e4
def style_content_loss(outputs):
    style_outputs = outputs['style']
    content_outputs = outputs['content']
    style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2)
                           for name in style_outputs.keys()])
    style_loss *= style_weight / num_style_layers

    content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2)
                             for name in content_outputs.keys()])
    content_loss *= content_weight / num_content_layers
    loss = style_loss + content_loss
    return loss
Use tf.GradientTape to update the image
@tf.function()
def train_step(image):
    with tf.GradientTape() as tape:
        outputs = extractor(image)
        loss = style_content_loss(outputs)

    grad = tape.gradient(loss, image)
    opt.apply_gradients([(grad, image)])
    image.assign(clip_0_1(image))
Now we can run a few training steps to test it:
train_step(image)
train_step(image)
train_step(image)
tensor_to_image(image)
Output results :
Then run it for many more steps to get a better result:
import time
start = time.time()

epochs = 10
steps_per_epoch = 100

step = 0
for n in range(epochs):
    for m in range(steps_per_epoch):
        step += 1
        train_step(image)
        print(".", end='', flush=True)
    display.clear_output(wait=True)
    display.display(tensor_to_image(image))
    print("Train step: {}".format(step))

end = time.time()
print("Total time: {:.1f}".format(end-start))
If you run this step with the CPU version of TensorFlow, producing results will be relatively slow; you can set lower values for epochs or steps_per_epoch.
Output results :
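If you want to keep the result, the generated image can be written to disk with the tensor_to_image helper defined earlier; a minimal example (the file name is arbitrary):

file_name = 'stylized-image.png'
tensor_to_image(image).save(file_name)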
- This tutorial ends here. The official tutorial goes on to a further optimization (adding a total variation loss, sketched briefly below); interested readers can follow the link at the end of the article.
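For reference, a minimal sketch of that optimization, reusing the image, opt, extractor, style_content_loss, and clip_0_1 defined above; total_variation_weight is only an example value, not taken from this article:

total_variation_weight = 30  # example weight for the regularization term

@tf.function()
def train_step_tv(image):
    # Same as train_step above, plus a total variation penalty that
    # discourages high-frequency artifacts in the generated image.
    with tf.GradientTape() as tape:
        outputs = extractor(image)
        loss = style_content_loss(outputs)
        loss += total_variation_weight * tf.image.total_variation(image)

    grad = tape.gradient(loss, image)
    opt.apply_gradients([(grad, image)])
    image.assign(clip_0_1(image))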
Thank you for watching !!!