[Machine learning notes] [Style transfer] deeplearning.ai Course 4 Week 4 programming assignment (TensorFlow 2)
2022-07-26 10:39:00 【LittleSeedling】
Special applications - Style transfer
Goals:
1. Use the pre-trained VGG-19 model to transfer image style.
2. Modify the code from the 【reference articles】 to re-implement it with TensorFlow 2.
References:
1. 【Chinese/English】【Andrew Ng's programming assignments】Course 4 - Convolutional Neural Networks - Week 4 assignment
2. TensorFlow 2.0 neural style transfer
3. TensorFlow 2.0: transfer learning with tf.keras.applications
4. TensorFlow 2.0: how to specify multiple outputs in a network
5. Andrew Ng's Coursera deep learning course deeplearning.ai (4-4) Face recognition and neural style transfer - programming assignment
Neural Style Transfer (NST)
NST merges two images: it uses a 【content image】 and a 【style image】 to generate a 【combined image】.

Transfer learning
We use a pre-trained convolutional network and build on top of it. The idea of taking a network trained on one task and applying it to a new task is called transfer learning.
Here we use the VGG-19 that ships with Keras. This model has already been trained on the very large ImageNet database, so it has learned to recognize a variety of low-level features (shallow layers) and high-level features (deep layers).
Load model
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.summary()
include_top: whether to keep the fully connected layers at the top of the network
weights: None means random initialization; "imagenet" loads the pre-trained weights
input_tensor: optional Keras tensor to use as the model's image input
input_shape: optional, only effective when include_top=False; a tuple of length 3 giving the input image shape. Width and height must be no smaller than 32, e.g. (150, 150, 3)
classes: optional, the number of image classes; only usable when include_top=True and no pre-trained weights are loaded
vgg.trainable = False means the VGG parameters are frozen and will not be trained.
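To decide which layer names to pass in later, it helps to list them first. A minimal sketch (block1_conv1, block5_conv2, etc. are the standard VGG19 layer names):

import tensorflow as tf

vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
# Print every layer name: block1_conv1, block1_conv2, block1_pool, block2_conv1, ...
for layer in vgg.layers:
    print(layer.name)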
Build a model with multiple outputs
selected_layers = ['block1_conv1',
                   'block2_conv1',
                   'block3_conv1',
                   'block4_conv1',
                   'block5_conv1']
outputs = [vgg.get_layer(name).output for name in selected_layers]
model = tf.keras.Model([vgg.input], outputs)
This way, the model's output contains the outputs of all the selected layers.
Load the model
From VGG19, select the outputs for the style layers and the content layers.
Selecting several layers averages the style across them.
def vgg_layers(layer_names):
""" Select the layer to output Parameters : layer_names -- Select the layer as the output return : model -- Models with multiple outputs """
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
outputs = [vgg.get_layer(name).output for name in layer_names]
# Build a model with multiple outputs
model = tf.keras.Model([vgg.input], outputs)
return model
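A quick check of the multi-output model (a sketch with a hypothetical 256x256 random input; spatial sizes shrink by a factor of 2 at each pooling stage):

style_extractor = vgg_layers(['block1_conv1', 'block5_conv1'])
x = tf.random.uniform([1, 256, 256, 3]) * 255.0
outs = style_extractor(tf.keras.applications.vgg19.preprocess_input(x))
for name, out in zip(['block1_conv1', 'block5_conv1'], outs):
    print(name, out.shape)  # e.g. block1_conv1 (1, 256, 256, 64), block5_conv1 (1, 16, 16, 512)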
Build the model
class StyleContentModel(tf.keras.models.Model):
def get_config(self):
pass
def __init__(self, style_layers, content_layers):
super(StyleContentModel, self).__init__()
        # VGG model rebuilt to expose the specified output layers
self.vgg = vgg_layers(style_layers + content_layers)
self.style_layers = style_layers
self.content_layers = content_layers
        # Number of selected style-layer outputs
self.num_style_layers = len(style_layers)
# No training
self.vgg.trainable = False
def call(self, inputs, training=None, mask=None):
"""Expects float input in [0,1]"""
inputs = inputs * 255.0
        # Preprocess the input for VGG19 (expects [0, 255] pixel values)
preprocess_input = tf.keras.applications.vgg19.preprocess_input(inputs)
# Get the output
outputs = self.vgg(preprocess_input)
        # Split the outputs into 【style layer outputs】 and 【content layer outputs】
style_outputs, content_outputs = outputs[:self.num_style_layers], outputs[self.num_style_layers:]
        # Convert the 【style layer outputs】 into 【style (Gram) matrices】
style_outputs = [gram_matrix(style_output) for style_output in style_outputs]
        # Organize the 【content layer outputs】 into a dictionary
content_dict = {
content_name: value for content_name, value in zip(self.content_layers, content_outputs)
}
        # Organize the 【style layer outputs】 into a dictionary
style_dict = {
style_name: value for style_name, value in zip(self.style_layers, style_outputs)
}
return {
'content': content_dict, 'style': style_dict}
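A short usage sketch (gram_matrix, used inside call, is defined in the next section; the image here is a random placeholder):

extractor = StyleContentModel(style_layers=['block1_conv1', 'block2_conv1'],
                              content_layers=['block5_conv2'])
img = tf.random.uniform([1, 256, 256, 3])  # float input in [0, 1]
results = extractor(img)
print(results['style'].keys())    # one Gram matrix per style layer
print(results['content'].keys())  # one raw feature map per content layer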
Define style matrix
$Gram\_matrix(A) = AA^T$
def gram_matrix(input_tensor):
""" matrix A The style matrix of is AA^T """
result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
input_shape = tf.shape(input_tensor)
num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
    # Divide by width * height so the Gram values don't get too large
return result / num_locations
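A tiny numerical check: for feature maps with c channels, gram_matrix returns one symmetric c x c matrix per batch element:

x = tf.random.normal([1, 4, 4, 3])
g = gram_matrix(x)
print(g.shape)  # (1, 3, 3): a channels-by-channels matrix
print(tf.reduce_max(tf.abs(g - tf.transpose(g, perm=[0, 2, 1]))))  # ~0, i.e. symmetric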
Define the loss function
$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$
Because the raw loss values are very large,
we take $\alpha = 10^4$ and $\beta = 10^{-2}$ here.
$J_{content}(C,G) = \frac{1}{nm}\sum_j^m \sum_i^n \left(C_i^l - G_i^l\right)^2$
where $i$ indexes the elements of the matrix and $l$ indexes the outputs of different layers.
$J_{style}(S,G) = \frac{1}{nm}\sum_j^m \sum_i^n \left(S_i^l - G_i^l\right)^2$
where $S$ is the style matrix $Gram\_matrix(S)$, $i$ indexes the elements of the matrix, and $l$ indexes the outputs of different layers.
def style_content_loss2(outputs, target, num_style_layers, num_content_layers):
""" Calculate the loss Parameters : output -- Output after model . Use 【 Content picture 】 Iterate step by step . target -- Goals that need to be approached . It is divided into 【 Content 】 and 【 style 】 Two parts . Namely 【 Content picture 】 and 【 Style picture 】 Output . num_style_layers -- 【 Style layer output 】 The number of num_content_layers -- 【 Content layer output 】 The number of """
style_outputs = outputs["style"]
content_outputs = outputs["content"]
style_target = target["style"]
content_target = target["content"]
# Computing style loss
style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_target[name])**2)
for name in style_outputs.keys()])
style_loss /= num_style_layers
# Calculate content loss
content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_target[name])**2)
for name in content_outputs.keys()])
content_loss /= num_content_layers
# Calculate the total loss
    loss = total_cost(content_loss, style_loss, alpha=1e4, beta=1e-2)
return loss
def total_cost(J_content, J_style, alpha=1e1, beta=1e2):
""" Calculate the total loss function Parameters : J_content -- Content loss J_style -- Style loss alpha -- Hyperparameters , Weight of content loss beta -- Hyperparameters , Weight of style loss return : J -- Total loss """
J = alpha * J_content + beta * J_style
return J
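A quick arithmetic check with made-up loss values shows how strongly the weights rebalance the two terms:

J = total_cost(J_content=2.0, J_style=300.0, alpha=1e4, beta=1e-2)
print(J)  # 1e4 * 2.0 + 1e-2 * 300.0 = 20003.0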
Loading images
The longest side of a loaded image is capped at 256 (anything larger is too slow to run and blows up memory).
def load_img(path_to_img):
""" Loading pictures """
# The largest dimension of the picture
max_dim = 256
img = tf.io.read_file(path_to_img)
img = tf.image.decode_image(img, channels=3)
img = tf.image.convert_image_dtype(img, tf.float32)
shape = tf.cast(tf.shape(img)[:-1], tf.float32)
long_dim = max(shape)
scale = max_dim / long_dim
new_shape = tf.cast(shape * scale, tf.int32)
img = tf.image.resize(img, new_shape)
img = img[tf.newaxis, :]
return img
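A usage sketch (the path is hypothetical; this assumes matplotlib.pyplot is imported as plt, as in the full code below):

img = load_img("images/cat.jpg")  # hypothetical path
print(img.shape, img.dtype)       # (1, H, W, 3) float32, longest side 256
plt.imshow(img[0])
plt.show()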
Total variation regularization
def high_pass_x_y(image):
x_var = image[:, :, 1:, :] - image[:, :, :-1, :]
y_var = image[:, 1:, :, :] - image[:, :-1, :, :]
return x_var, y_var
def total_variation_loss(image):
x_deltas, y_deltas = high_pass_x_y(image)
return tf.reduce_mean(x_deltas ** 2) + tf.reduce_mean(y_deltas ** 2)
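For reference, TensorFlow has a built-in tf.image.total_variation, but it sums absolute neighbor differences instead of averaging squared ones, so its scale is different:

img = tf.random.uniform([1, 64, 64, 3])
print(total_variation_loss(img))      # mean of squared differences (version above)
print(tf.image.total_variation(img))  # built-in: sum of absolute differences, shape (1,)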
The main function
def main1(epochs=5, steps_per_epoch=100):
# Starting time
start_time = time.perf_counter()
    # Choose the VGG output layers
content_layers = ["block5_conv2"]
style_layers = [
"block1_conv1",
"block2_conv1",
"block3_conv1",
"block4_conv1",
"block5_conv1"
]
    # Count how many outputs were selected
num_style_layers = len(style_layers)
num_content_layers = len(content_layers)
    # Feature extractor: VGG with the specified outputs
extractor = StyleContentModel(style_layers, content_layers)
    # Load the content image and the style image
content_image = load_img("images/cat.jpg")
style_image = load_img("images/monet.jpg")
    # Run the extractor once to get the encoded 【target style】 and 【target content】
style_targets = extractor(style_image)["style"]
content_targets = extractor(content_image)["content"]
targets = {
"style": style_targets,
"content": content_targets
}
    # Use the 【content image】 as the variable to optimize
image = tf.Variable(content_image)
# Define optimizer Adam
opt = tf.optimizers.Adam(learning_rate=0.02)
    # Weights of the loss terms
# style_weight = 1e-2
# content_weight = 1e4
total_variation_weight = 1e8
costs = []
step = 0
for n in range(epochs):
for m in range(steps_per_epoch):
step += 1
with tf.GradientTape() as tape:
outputs = extractor(image)
loss = style_content_loss2(outputs, targets, num_style_layers, num_content_layers)
                # Total variation regularization term
loss += total_variation_weight * total_variation_loss(image)
            # Update the input image from the gradients
grads = tape.gradient(loss, image)
opt.apply_gradients(grads_and_vars=[(grads,image)])
            # Keep image values in [0, 1]
image.assign(tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0))
# Record the loss
costs.append(loss)
print(f"step{step}--loss:{loss}")
imshow2(image.read_value())
plt.title("Train step:{}".format(step))
plt.show()
plt.plot(np.squeeze(costs))
plt.ylabel("cost")
plt.xlabel("iterations")
plt.title("learning rate="+str(0.02))
plt.show()
# End time
end_time = time.perf_counter()
# Elapsed time
    elapsed = end_time - start_time
    # Print the total time
    print("Elapsed: " + str(int(elapsed / 60)) + " min " + str(int(elapsed % 60)) + " s")
Combination 1
[image: content + style = generated result after 1000 iterations]

Combination 2
[image: content + style = generated result after 500 iterations]
step1--loss:206808352.0
step2--loss:138015312.0
step3--loss:76444464.0
step4--loss:55079300.0
step5--loss:52182004.0
step6--loss:52179800.0
step7--loss:49280824.0
step8--loss:45222588.0
step9--loss:40886236.0
step10--loss:37080472.0
step11--loss:33747848.0
step12--loss:31121796.0
step13--loss:29348120.0
step14--loss:27991062.0
step15--loss:26776242.0
step16--loss:25650356.0
step17--loss:24728126.0
step18--loss:23919458.0
...
step99--loss:8370326.0
step100--loss:8396298.0
step101--loss:8440048.0
...
step499--loss:5766706.5
step500--loss:5699691.5
Elapsed: 14 min 32 s

Full code
Environment: TensorFlow 2.3, Python 3.8.5
import time
import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
import nst_utils
import numpy as np
import tensorflow as tf
# Don't use GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
""" The model used , yes VGG Online 19 Layer version , Already in very big ImageNet Training on the database , Learned to recognize various low-level features and high-level features """
def load_my_model():
# model = nst_utils.load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.summary()
def gram_matrix(input_tensor):
""" matrix A The style matrix of is AA^T """
result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
input_shape = tf.shape(input_tensor)
num_locations = tf.cast(input_shape[1]*input_shape[2], tf.float32)
    # Divide by width * height so the Gram values don't get too large
return result / num_locations
# tf.random.set_seed(1)
# A = tf.random.normal([3,2*1],mean=1,stddev=4)
# GA = gram_matrix(A)
# print("GA ="+str(GA))
def total_cost(J_content, J_style, alpha=1e1, beta=1e2):
""" Calculate the total loss function Parameters : J_content -- Content loss J_style -- Style loss alpha -- Hyperparameters , Weight of content loss beta -- Hyperparameters , Weight of style loss return : J -- Total loss """
J = alpha * J_content + beta * J_style
return J
# np.random.seed(3)
# J_content = np.random.randn()
# J_style = np.random.randn()
# J = total_cost(J_content,J_style)
# print("J=" + str(J))
def load_img(path_to_img):
""" Loading pictures """
# The largest dimension of the picture
max_dim = 256
img = tf.io.read_file(path_to_img)
img = tf.image.decode_image(img, channels=3)
img = tf.image.convert_image_dtype(img, tf.float32)
shape = tf.cast(tf.shape(img)[:-1], tf.float32)
long_dim = max(shape)
scale = max_dim / long_dim
new_shape = tf.cast(shape * scale, tf.int32)
img = tf.image.resize(img, new_shape)
img = img[tf.newaxis, :]
return img
def vgg_layers(layer_names):
""" Select the layer to output Parameters : layer_names -- Select the layer as the output return : model -- Models with multiple outputs """
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
outputs = [vgg.get_layer(name).output for name in layer_names]
# Build a model with multiple outputs
model = tf.keras.Model([vgg.input], outputs)
return model
class StyleContentModel(tf.keras.models.Model):
def get_config(self):
pass
def __init__(self, style_layers, content_layers):
super(StyleContentModel, self).__init__()
        # VGG model rebuilt to expose the specified output layers
self.vgg = vgg_layers(style_layers + content_layers)
self.style_layers = style_layers
self.content_layers = content_layers
        # Number of selected style-layer outputs
self.num_style_layers = len(style_layers)
# No training
self.vgg.trainable = False
def call(self, inputs, training=None, mask=None):
"""Expects float input in [0,1]"""
inputs = inputs * 255.0
        # Preprocess the input for VGG19 (expects [0, 255] pixel values)
preprocess_input = tf.keras.applications.vgg19.preprocess_input(inputs)
# Get the output
outputs = self.vgg(preprocess_input)
        # Split the outputs into 【style layer outputs】 and 【content layer outputs】
style_outputs, content_outputs = outputs[:self.num_style_layers], outputs[self.num_style_layers:]
        # Convert the 【style layer outputs】 into 【style (Gram) matrices】
style_outputs = [gram_matrix(style_output) for style_output in style_outputs]
        # Organize the 【content layer outputs】 into a dictionary
content_dict = {
content_name: value for content_name, value in zip(self.content_layers, content_outputs)
}
        # Organize the 【style layer outputs】 into a dictionary
style_dict = {
style_name: value for style_name, value in zip(self.style_layers, style_outputs)
}
return {
'content': content_dict, 'style': style_dict}
def style_content_loss(outputs, target, num_style_layers, num_content_layers):
""" Calculate the loss Parameters : output -- Output after model . Use 【 Content picture 】 Iterate step by step . target -- Goals that need to be approached . It is divided into 【 Content 】 and 【 style 】 Two parts . Namely 【 Content picture 】 and 【 Style picture 】 Output . num_style_layers -- 【 Style layer output 】 The number of num_content_layers -- 【 Content layer output 】 The number of """
style_outputs = outputs["style"]
content_outputs = outputs["content"]
style_target = target["style"]
content_target = target["content"]
# Computing style loss
style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_target[name])**2)
for name in style_outputs.keys()])
style_loss /= num_style_layers
# Calculate content loss
content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_target[name])**2)
for name in content_outputs.keys()])
content_loss /= num_content_layers
# Calculate the total loss
    loss = total_cost(content_loss, style_loss, alpha=1e4, beta=1e-2)
return loss
# Plot function
def imshow2(image, title=None):
if len(image.shape) > 3:
image = tf.squeeze(image, axis=0)
plt.imshow(image)
if title:
plt.title(title)
def main1(epochs=5, steps_per_epoch=100):
# Starting time
start_time = time.perf_counter()
    # Choose the VGG output layers
content_layers = ["block5_conv2"]
style_layers = [
"block1_conv1",
"block2_conv1",
"block3_conv1",
"block4_conv1",
"block5_conv1"
]
    # Count how many outputs were selected
num_style_layers = len(style_layers)
num_content_layers = len(content_layers)
    # Feature extractor: VGG with the specified outputs
extractor = StyleContentModel(style_layers, content_layers)
    # Load the content image and the style image
content_image = load_img("images/cat.jpg")
style_image = load_img("images/monet.jpg")
    # Run the extractor once to get the encoded 【target style】 and 【target content】
style_targets = extractor(style_image)["style"]
content_targets = extractor(content_image)["content"]
targets = {
"style": style_targets,
"content": content_targets
}
    # Use the 【content image】 as the variable to optimize
image = tf.Variable(content_image)
# Define optimizer Adam
opt = tf.optimizers.Adam(learning_rate=0.02)
    # Weights of the loss terms
# style_weight = 1e-2
# content_weight = 1e4
total_variation_weight = 1e8
costs = []
step = 0
for n in range(epochs):
for m in range(steps_per_epoch):
step += 1
with tf.GradientTape() as tape:
outputs = extractor(image)
loss = style_content_loss(outputs, targets, num_style_layers, num_content_layers)
                # Total variation regularization term
loss += total_variation_weight * total_variation_loss(image)
            # Update the input image from the gradients
grads = tape.gradient(loss, image)
opt.apply_gradients(grads_and_vars=[(grads,image)])
            # Keep image values in [0, 1]
image.assign(tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0))
# Record the loss
costs.append(loss)
print(f"step{step}--loss:{loss}")
imshow2(image.read_value())
plt.title("Train step:{}".format(step))
plt.show()
plt.plot(np.squeeze(costs))
plt.ylabel("cost")
plt.xlabel("iterations")
plt.title("learning rate="+str(0.02))
plt.show()
# End time
end_time = time.perf_counter()
# Elapsed time
    elapsed = end_time - start_time
    # Print the total time
    print("Elapsed: " + str(int(elapsed / 60)) + " min " + str(int(elapsed % 60)) + " s")
def high_pass_x_y(image):
x_var = image[:, :, 1:, :] - image[:, :, :-1, :]
y_var = image[:, 1:, :, :] - image[:, :-1, :, :]
return x_var, y_var
def total_variation_loss(image):
x_deltas, y_deltas = high_pass_x_y(image)
return tf.reduce_mean(x_deltas ** 2) + tf.reduce_mean(y_deltas ** 2)
def main():
# load_my_model()
main1()
if __name__ == '__main__':
main()