[Machine Learning Notes] [Style Transfer] deeplearning.ai Course 4 Week 4 Programming (TensorFlow 2)
2022-07-26 10:39:00 【LittleSeedling】
Special Applications: Style Transfer
Goals:
1. Use the pre-trained VGG19 model to transfer the style of one image onto another.
2. Adapt the code from the reference articles below to run on TensorFlow 2.
References:
1. [Chinese/English] [Andrew Ng's programming assignments] Course 4 - Convolutional Neural Networks - Week 4 assignment
2. TensorFlow 2.0 neural style transfer
3. TensorFlow 2.0: transfer learning with tf.keras.applications
4. TensorFlow 2.0: how to specify multiple outputs in a network
5. Andrew Ng's Coursera deep learning course deeplearning.ai (4-4) Face recognition and neural style transfer - programming assignment
Neural Style Transfer (NST)
NST merges two images: it takes a [content image] and a [style image] and generates a [combined image].

Transfer learning
We use a pre-trained convolutional network and build on top of it. The idea of taking a network trained on one task and applying it to a new task is called transfer learning.
Here we use the VGG-19 that ships with Keras. This model has already been trained on the very large ImageNet database, so it has learned to recognize both low-level features (shallow layers) and high-level features (deep layers).
Load model
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.summary()
include_top: whether to keep the fully connected layers at the top of the network
weights: None means random initialization; "imagenet" loads the pre-trained weights
input_tensor: optional Keras tensor to use as the model's image input
input_shape: optional, only effective when include_top=False; a tuple of length 3 giving the input image shape, e.g. (150, 150, 3); the image width and height must be no smaller than 32
classes: optional, the number of image classes; only usable when include_top=True and no pre-trained weights are loaded
vgg.trainable = False means the parameters of vgg will not be trained (the weights stay frozen).
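To see how these arguments fit together, here is a minimal sketch (the (256, 256, 3) input_shape is an arbitrary example, not required by the rest of the code):
vgg = tf.keras.applications.VGG19(include_top=False,
                                  weights="imagenet",
                                  input_shape=(256, 256, 3))
vgg.trainable = False  # freeze the pre-trained weights
print(len(vgg.trainable_variables))  # 0: nothing left to train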
Build a model with multiple outputs
selected_layers = ['block1_conv1',
                   'block2_conv1',
                   'block3_conv1',
                   'block4_conv1',
                   'block5_conv1']
# Collect the output tensor of each selected layer
outputs = [vgg.get_layer(name).output for name in selected_layers]
model = tf.keras.Model([vgg.input], outputs)
This way, a single forward pass of the model returns the outputs of all the selected layers.
Select the output layers
From VGG19, select the outputs of the style layers and the content layer.
Choosing more style layers averages the style over multiple scales.
def vgg_layers(layer_names):
    """Build a model that outputs the selected layers.

    Parameters:
        layer_names -- the layers selected as outputs
    Returns:
        model -- a model with multiple outputs
    """
    vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
    vgg.trainable = False
    outputs = [vgg.get_layer(name).output for name in layer_names]
    # Build a model with multiple outputs
    model = tf.keras.Model([vgg.input], outputs)
    return model
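A quick smoke test (the layer names and image size here are arbitrary) to confirm the returned model yields one tensor per selected layer:
layers = ['block1_conv1', 'block2_conv1', 'block5_conv1']
style_extractor = vgg_layers(layers)
dummy = tf.keras.applications.vgg19.preprocess_input(
    tf.random.uniform((1, 256, 256, 3)) * 255.0)
for name, out in zip(layers, style_extractor(dummy)):
    print(name, out.shape)  # e.g. block1_conv1 (1, 256, 256, 64)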
Build the model
class StyleContentModel(tf.keras.models.Model):
    def get_config(self):
        pass

    def __init__(self, style_layers, content_layers):
        super(StyleContentModel, self).__init__()
        # A vgg model modified to expose the specified output layers
        self.vgg = vgg_layers(style_layers + content_layers)
        self.style_layers = style_layers
        self.content_layers = content_layers
        # Number of selected style-layer outputs
        self.num_style_layers = len(style_layers)
        # No training
        self.vgg.trainable = False

    def call(self, inputs, training=None, mask=None):
        """Expects float input in [0, 1]."""
        inputs = inputs * 255.0
        # Input preprocessing (VGG19 expects mean-centered BGR pixels)
        preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
        # Get the outputs
        outputs = self.vgg(preprocessed_input)
        # Split the outputs into [style-layer outputs] and [content-layer outputs]
        style_outputs, content_outputs = (outputs[:self.num_style_layers],
                                          outputs[self.num_style_layers:])
        # Convert each [style-layer output] into its [style (Gram) matrix]
        style_outputs = [gram_matrix(style_output) for style_output in style_outputs]
        # Arrange the [content-layer outputs] as a dictionary
        content_dict = {content_name: value
                        for content_name, value in zip(self.content_layers, content_outputs)}
        # Arrange the [style-layer outputs] as a dictionary
        style_dict = {style_name: value
                      for style_name, value in zip(self.style_layers, style_outputs)}
        return {'content': content_dict, 'style': style_dict}
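A minimal usage sketch (assuming the same layer lists chosen later in main1) to show what the returned dictionaries contain:
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']
content_layers = ['block5_conv2']
extractor = StyleContentModel(style_layers, content_layers)
results = extractor(tf.random.uniform((1, 256, 256, 3)))  # input in [0, 1]
print(results.keys())                            # dict_keys(['content', 'style'])
print(results['style']['block1_conv1'].shape)    # Gram matrix: (1, 64, 64)
print(results['content']['block5_conv2'].shape)  # raw features: (1, 16, 16, 512)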
Define the style (Gram) matrix
$Gram\_matrix(A) = AA^T$
def gram_matrix(input_tensor):
    """The style (Gram) matrix of a matrix A is AA^T."""
    result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
    input_shape = tf.shape(input_tensor)
    num_locations = tf.cast(input_shape[1] * input_shape[2], tf.float32)
    # Divide by width * height so the style-matrix values do not grow too large
    return result / num_locations
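To double-check that the einsum matches the definition above, a small comparison against an explicit reshape-and-matmul (toy shapes chosen arbitrarily):
x = tf.random.uniform((1, 4, 4, 3))   # toy feature map: H = W = 4, C = 3
g1 = gram_matrix(x)
flat = tf.reshape(x, (1, 16, 3))      # flatten the 4x4 spatial positions
g2 = tf.matmul(flat, flat, transpose_a=True) / 16.0  # divide by H*W
print(tf.reduce_max(tf.abs(g1 - g2)).numpy())  # ~0, up to float rounding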
Define the loss function
$J(G) = \alpha J_{content}(C,G) + \beta J_{style}(S,G)$
Because the content and style losses differ by several orders of magnitude, here we take $\alpha = 10000$ and $\beta = 0.01$.
$J_{content}(C,G) = \frac{1}{nm}\sum_{j}^{m}\sum_{i}^{n}\left(C_{ij}^{l}-G_{ij}^{l}\right)^2$
where $i, j$ index the elements of the matrix and $l$ indexes the layer output.
$J_{style}(S,G) = \frac{1}{nm}\sum_{j}^{m}\sum_{i}^{n}\left(S_{ij}^{l}-G_{ij}^{l}\right)^2$
where $S$ is the style matrix $Gram\_matrix(S)$, $i, j$ index the elements of the matrix, and $l$ indexes the layer output.
def style_content_loss2(outputs, target, num_style_layers, num_content_layers):
    """Compute the loss.

    Parameters:
        outputs -- the model outputs for the image being optimized (starting
                   from the [content picture] and iterated step by step)
        target -- the targets to approach, split into a [content] part and a
                  [style] part: the model outputs for the [content picture]
                  and the [style picture] respectively
        num_style_layers -- number of [style-layer outputs]
        num_content_layers -- number of [content-layer outputs]
    """
    style_outputs = outputs["style"]
    content_outputs = outputs["content"]
    style_target = target["style"]
    content_target = target["content"]
    # Compute the style loss
    style_loss = tf.add_n([tf.reduce_mean((style_outputs[name] - style_target[name]) ** 2)
                           for name in style_outputs.keys()])
    style_loss /= num_style_layers
    # Compute the content loss
    content_loss = tf.add_n([tf.reduce_mean((content_outputs[name] - content_target[name]) ** 2)
                             for name in content_outputs.keys()])
    content_loss /= num_content_layers
    # Compute the total loss
    loss = total_cost(content_loss, style_loss, alpha=1e4, beta=1e-2)
    return loss
def total_cost(J_content, J_style, alpha=1e1, beta=1e2):
    """Compute the total loss.

    Parameters:
        J_content -- content loss
        J_style -- style loss
        alpha -- hyperparameter, weight of the content loss
        beta -- hyperparameter, weight of the style loss
    Returns:
        J -- total loss
    """
    J = alpha * J_content + beta * J_style
    return J
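A hypothetical numeric example of the weighting: the raw style loss is usually several orders of magnitude larger than the content loss, which is what the asymmetric alpha and beta compensate for:
J = total_cost(J_content=10.0, J_style=1e6, alpha=1e4, beta=1e-2)
print(J)  # 1e4 * 10 + 1e-2 * 1e6 = 110000.0: comparable contributions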
Loading pictures
The longest side of the loaded image is capped at 256 pixels (anything much larger is too slow to run and exhausts memory).
def load_img(path_to_img):
    """Load a picture and scale its longest side to max_dim."""
    # Maximum size of the longest image dimension
    max_dim = 256
    img = tf.io.read_file(path_to_img)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    long_dim = max(shape)
    scale = max_dim / long_dim
    new_shape = tf.cast(shape * scale, tf.int32)
    img = tf.image.resize(img, new_shape)
    # Add a batch dimension
    img = img[tf.newaxis, :]
    return img
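Hypothetical usage (the file path is only an example; the exact shape depends on the source image's aspect ratio):
content_image = load_img("images/cat.jpg")
print(content_image.shape)  # e.g. (1, 192, 256, 3): longest side scaled to 256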
Regularization (total variation loss)
def high_pass_x_y(image):
    # Differences between horizontally adjacent and vertically adjacent pixels
    x_var = image[:, :, 1:, :] - image[:, :, :-1, :]
    y_var = image[:, 1:, :, :] - image[:, :-1, :, :]
    return x_var, y_var

def total_variation_loss(image):
    x_deltas, y_deltas = high_pass_x_y(image)
    return tf.reduce_mean(x_deltas ** 2) + tf.reduce_mean(y_deltas ** 2)
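For reference, TensorFlow also ships a built-in tf.image.total_variation. Note that it sums absolute (L1) pixel differences per image, while the version above averages squared differences, so the two are related but not numerically identical:
img = tf.random.uniform((1, 64, 64, 3))
print(total_variation_loss(img).numpy())      # mean of squared deltas (scalar)
print(tf.image.total_variation(img).numpy())  # sum of absolute deltas, shape (1,)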
The main function
def main1(epochs=5, steps_per_epoch=100):
    # Start time
    start_time = time.perf_counter()
    # Choose the output layers of the vgg model
    content_layers = ["block5_conv2"]
    style_layers = [
        "block1_conv1",
        "block2_conv1",
        "block3_conv1",
        "block4_conv1",
        "block5_conv1"
    ]
    # Count how many outputs were selected
    num_style_layers = len(style_layers)
    num_content_layers = len(content_layers)
    # vgg-based extractor with the specified outputs
    extractor = StyleContentModel(style_layers, content_layers)
    # Load the content picture and the style picture
    content_image = load_img("images/cat.jpg")
    style_image = load_img("images/monet.jpg")
    # Run once first to get the encoded [target style] and [target content]
    style_targets = extractor(style_image)["style"]
    content_targets = extractor(content_image)["content"]
    targets = {
        "style": style_targets,
        "content": content_targets
    }
    # Use the [content picture] as the model input
    image = tf.Variable(content_image)
    # Define the Adam optimizer
    opt = tf.optimizers.Adam(learning_rate=0.02)
    # Weights of the loss terms
    # style_weight = 1e-2
    # content_weight = 1e4
    total_variation_weight = 1e8
    costs = []
    step = 0
    for n in range(epochs):
        for m in range(steps_per_epoch):
            step += 1
            with tf.GradientTape() as tape:
                outputs = extractor(image)
                loss = style_content_loss2(outputs, targets, num_style_layers, num_content_layers)
                # Regularization term
                loss += total_variation_weight * total_variation_loss(image)
            # Update the input image
            grads = tape.gradient(loss, image)
            opt.apply_gradients(grads_and_vars=[(grads, image)])
            # Keep image in [0, 1]
            image.assign(tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0))
            # Record the loss
            costs.append(loss)
            print(f"step{step}--loss:{loss}")
        imshow2(image.read_value())
        plt.title("Train step:{}".format(step))
        plt.show()
    plt.plot(np.squeeze(costs))
    plt.ylabel("cost")
    plt.xlabel("iterations")
    plt.title("learning rate=" + str(0.02))
    plt.show()
    # End time
    end_time = time.perf_counter()
    # Elapsed time
    elapsed = end_time - start_time
    # Print the total time
    print("Elapsed: " + str(int(elapsed / 60)) + " min " + str(int(elapsed % 60)) + " s")
Combination 1: [content image] + [style image] = [generated image], after 1000 iterations
Combination 2: [content image] + [style image] = [generated image], after 500 iterations
step1--loss:206808352.0
step2--loss:138015312.0
step3--loss:76444464.0
step4--loss:55079300.0
step5--loss:52182004.0
step6--loss:52179800.0
step7--loss:49280824.0
step8--loss:45222588.0
step9--loss:40886236.0
step10--loss:37080472.0
step11--loss:33747848.0
step12--loss:31121796.0
step13--loss:29348120.0
step14--loss:27991062.0
step15--loss:26776242.0
step16--loss:25650356.0
step17--loss:24728126.0
step18--loss:23919458.0
...
step99--loss:8370326.0
step100--loss:8396298.0
step101--loss:8440048.0
...
step499--loss:5766706.5
step500--loss:5699691.5
Elapsed: 14 min 32 s

Code
Environment: TensorFlow 2.3, Python 3.8.5
import time
import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import nst_utils
import numpy as np
import tensorflow as tf
from PIL import Image, ImageDraw, ImageFont

# Do not use the GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

"""
The model used is the 19-layer version of VGG. It has already been trained on the
very large ImageNet database, and has learned to recognize a variety of low-level
and high-level features.
"""
def load_my_model():
    # model = nst_utils.load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
    vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
    vgg.summary()
def gram_matrix(input_tensor):
    """The style (Gram) matrix of a matrix A is AA^T."""
    result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
    input_shape = tf.shape(input_tensor)
    num_locations = tf.cast(input_shape[1] * input_shape[2], tf.float32)
    # Divide by width * height so the style-matrix values do not grow too large
    return result / num_locations

# tf.random.set_seed(1)
# A = tf.random.normal([3,2*1],mean=1,stddev=4)
# GA = gram_matrix(A)
# print("GA ="+str(GA))
def total_cost(J_content, J_style, alpha=1e1, beta=1e2):
    """Compute the total loss.

    Parameters:
        J_content -- content loss
        J_style -- style loss
        alpha -- hyperparameter, weight of the content loss
        beta -- hyperparameter, weight of the style loss
    Returns:
        J -- total loss
    """
    J = alpha * J_content + beta * J_style
    return J

# np.random.seed(3)
# J_content = np.random.randn()
# J_style = np.random.randn()
# J = total_cost(J_content,J_style)
# print("J=" + str(J))
def load_img(path_to_img):
    """Load a picture and scale its longest side to max_dim."""
    # Maximum size of the longest image dimension
    max_dim = 256
    img = tf.io.read_file(path_to_img)
    img = tf.image.decode_image(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    shape = tf.cast(tf.shape(img)[:-1], tf.float32)
    long_dim = max(shape)
    scale = max_dim / long_dim
    new_shape = tf.cast(shape * scale, tf.int32)
    img = tf.image.resize(img, new_shape)
    # Add a batch dimension
    img = img[tf.newaxis, :]
    return img
def vgg_layers(layer_names):
    """Build a model that outputs the selected layers.

    Parameters:
        layer_names -- the layers selected as outputs
    Returns:
        model -- a model with multiple outputs
    """
    vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
    vgg.trainable = False
    outputs = [vgg.get_layer(name).output for name in layer_names]
    # Build a model with multiple outputs
    model = tf.keras.Model([vgg.input], outputs)
    return model
class StyleContentModel(tf.keras.models.Model):
    def get_config(self):
        pass

    def __init__(self, style_layers, content_layers):
        super(StyleContentModel, self).__init__()
        # A vgg model modified to expose the specified output layers
        self.vgg = vgg_layers(style_layers + content_layers)
        self.style_layers = style_layers
        self.content_layers = content_layers
        # Number of selected style-layer outputs
        self.num_style_layers = len(style_layers)
        # No training
        self.vgg.trainable = False

    def call(self, inputs, training=None, mask=None):
        """Expects float input in [0, 1]."""
        inputs = inputs * 255.0
        # Input preprocessing (VGG19 expects mean-centered BGR pixels)
        preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
        # Get the outputs
        outputs = self.vgg(preprocessed_input)
        # Split the outputs into [style-layer outputs] and [content-layer outputs]
        style_outputs, content_outputs = (outputs[:self.num_style_layers],
                                          outputs[self.num_style_layers:])
        # Convert each [style-layer output] into its [style (Gram) matrix]
        style_outputs = [gram_matrix(style_output) for style_output in style_outputs]
        # Arrange the [content-layer outputs] as a dictionary
        content_dict = {content_name: value
                        for content_name, value in zip(self.content_layers, content_outputs)}
        # Arrange the [style-layer outputs] as a dictionary
        style_dict = {style_name: value
                      for style_name, value in zip(self.style_layers, style_outputs)}
        return {'content': content_dict, 'style': style_dict}
def style_content_loss(outputs, target, num_style_layers, num_content_layers):
    """Compute the loss.

    Parameters:
        outputs -- the model outputs for the image being optimized (starting
                   from the [content picture] and iterated step by step)
        target -- the targets to approach, split into a [content] part and a
                  [style] part: the model outputs for the [content picture]
                  and the [style picture] respectively
        num_style_layers -- number of [style-layer outputs]
        num_content_layers -- number of [content-layer outputs]
    """
    style_outputs = outputs["style"]
    content_outputs = outputs["content"]
    style_target = target["style"]
    content_target = target["content"]
    # Compute the style loss
    style_loss = tf.add_n([tf.reduce_mean((style_outputs[name] - style_target[name]) ** 2)
                           for name in style_outputs.keys()])
    style_loss /= num_style_layers
    # Compute the content loss
    content_loss = tf.add_n([tf.reduce_mean((content_outputs[name] - content_target[name]) ** 2)
                             for name in content_outputs.keys()])
    content_loss /= num_content_layers
    # Compute the total loss
    loss = total_cost(content_loss, style_loss, alpha=1e4, beta=1e-2)
    return loss
# Plotting helper
def imshow2(image, title=None):
    # Drop the batch dimension if present
    if len(image.shape) > 3:
        image = tf.squeeze(image, axis=0)
    plt.imshow(image)
    if title:
        plt.title(title)
def main1(epochs=5, steps_per_epoch=100):
    # Start time
    start_time = time.perf_counter()
    # Choose the output layers of the vgg model
    content_layers = ["block5_conv2"]
    style_layers = [
        "block1_conv1",
        "block2_conv1",
        "block3_conv1",
        "block4_conv1",
        "block5_conv1"
    ]
    # Count how many outputs were selected
    num_style_layers = len(style_layers)
    num_content_layers = len(content_layers)
    # vgg-based extractor with the specified outputs
    extractor = StyleContentModel(style_layers, content_layers)
    # Load the content picture and the style picture
    content_image = load_img("images/cat.jpg")
    style_image = load_img("images/monet.jpg")
    # Run once first to get the encoded [target style] and [target content]
    style_targets = extractor(style_image)["style"]
    content_targets = extractor(content_image)["content"]
    targets = {
        "style": style_targets,
        "content": content_targets
    }
    # Use the [content picture] as the model input
    image = tf.Variable(content_image)
    # Define the Adam optimizer
    opt = tf.optimizers.Adam(learning_rate=0.02)
    # Weights of the loss terms
    # style_weight = 1e-2
    # content_weight = 1e4
    total_variation_weight = 1e8
    costs = []
    step = 0
    for n in range(epochs):
        for m in range(steps_per_epoch):
            step += 1
            with tf.GradientTape() as tape:
                outputs = extractor(image)
                loss = style_content_loss(outputs, targets, num_style_layers, num_content_layers)
                # Regularization term
                loss += total_variation_weight * total_variation_loss(image)
            # Update the input image
            grads = tape.gradient(loss, image)
            opt.apply_gradients(grads_and_vars=[(grads, image)])
            # Keep image in [0, 1]
            image.assign(tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0))
            # Record the loss
            costs.append(loss)
            print(f"step{step}--loss:{loss}")
        imshow2(image.read_value())
        plt.title("Train step:{}".format(step))
        plt.show()
    plt.plot(np.squeeze(costs))
    plt.ylabel("cost")
    plt.xlabel("iterations")
    plt.title("learning rate=" + str(0.02))
    plt.show()
    # End time
    end_time = time.perf_counter()
    # Elapsed time
    elapsed = end_time - start_time
    # Print the total time
    print("Elapsed: " + str(int(elapsed / 60)) + " min " + str(int(elapsed % 60)) + " s")
def high_pass_x_y(image):
    # Differences between horizontally adjacent and vertically adjacent pixels
    x_var = image[:, :, 1:, :] - image[:, :, :-1, :]
    y_var = image[:, 1:, :, :] - image[:, :-1, :, :]
    return x_var, y_var

def total_variation_loss(image):
    x_deltas, y_deltas = high_pass_x_y(image)
    return tf.reduce_mean(x_deltas ** 2) + tf.reduce_mean(y_deltas ** 2)
def main():
    # load_my_model()
    main1()

if __name__ == '__main__':
    main()