A complete image recognition case: training a DNN (deep neural network) / CNN with TensorFlow
2022-07-03 13:36:00 【Haibao 7】
In deep learning, one of the more successful applications compared with traditional machine learning is image recognition. Here we use the widely used MNIST handwritten digit image dataset.
Dataset official site:
http://yann.lecun.com/exdb/mnist/
We use TensorFlow to train a DNN (deep neural network) to recognize handwritten digit images.
The data is loaded from the MNIST_data_bak directory.





The comments in the program explain each step clearly; the figures illustrate the idea and should be self-explanatory.




Backpropagation
Backpropagation: gradient descent using reverse-mode autodiff.
• Forward propagation makes the predictions and computes the output error; backpropagation then computes how much each neuron contributed to that error.
• Computing a neuron's contribution means computing the gradient of the forward-pass error with respect to its weights.
• The weights are then adjusted according to those contributions (a gradient-descent step); a tiny sketch of this in TensorFlow follows.
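As a small illustration of reverse-mode autodiff in TensorFlow 1.x (a minimal sketch, separate from the program below; the variable and numbers are made up for illustration): tf.gradients builds the backward pass automatically, and optimizer.minimize(loss) is just compute_gradients followed by apply_gradients.

import tensorflow as tf

# A tiny graph: one weight and a quadratic "loss"
w = tf.Variable(3.0, name="w")
loss = tf.square(w - 1.0)

# Reverse-mode autodiff: TensorFlow builds the gradient ops for us
grad = tf.gradients(loss, [w])[0]          # d(loss)/dw = 2 * (w - 1)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# minimize(loss) is shorthand for compute_gradients + apply_gradients
grads_and_vars = optimizer.compute_gradients(loss, var_list=[w])
training_op = optimizer.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))      # 4.0 at w = 3.0
    sess.run(training_op)      # one gradient-descent step
    print(sess.run(w))         # w moves toward 1.0 (3.0 - 0.1 * 4.0 = 2.6)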
The complete program is as follows:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
from tensorflow.contrib.layers import fully_connected
# Construction phase: build the graph
n_inputs = 28*28
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
y = tf.placeholder(tf.int64, shape=(None), name='y')
# Build the neural network layers. There are two hidden layers here; they are essentially
# the same, differing only in the number of input connections per neuron and the number of neurons.
# The output layer is also very similar; only the activation function changes from ReLU to Softmax.
def neuron_layer(X, n_neurons, name, activation=None):
    # name_scope groups all of this layer's computation nodes; writing it is optional
    with tf.name_scope(name):
        # The second dimension of the input matrix is the number of input connections of this layer
        n_inputs = int(X.get_shape()[1])
        stddev = 2 / np.sqrt(n_inputs)
        # Within this layer, w is a 2-D array: each neuron has its own group of w parameters
        # A truncated normal distribution yields smaller values than a regular normal distribution,
        # so there are no large weight values, which keeps training slow and steady;
        # using this standard deviation also makes convergence faster.
        # w must be initialized randomly, not to 0; otherwise every output is 0 and
        # the subsequent adjustments are meaningless
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        w = tf.Variable(init, name='weights')
        b = tf.Variable(tf.zeros([n_neurons]), name='biases')
        # The vectorized form is more efficient than computing each neuron one by one
        z = tf.matmul(X, w) + b
        if activation == "relu":
            return tf.nn.relu(z)
        else:
            return z
with tf.name_scope("dnn"):
hidden1 = neuron_layer(X, n_hidden1, "hidden1", activation="relu")
hidden2 = neuron_layer(hidden1, n_hidden2, "hidden2", activation="relu")
# Enter into softmax Previous results
logits = neuron_layer(hidden2, n_outputs, "outputs")
# with tf.name_scope("dnn"):
# # tensorflow Using this function helps us use the appropriate initialization w and b The strategy of , By default ReLU Activation function
# hidden1 = fully_connected(X, n_hidden1, scope="hidden1")
# hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2")
# logits = fully_connected(hidden2, n_outputs, scope="outputs", activation_fn=None)
with tf.name_scope("loss"):
# Define the cross entropy loss function , And average a sample
# Function is equivalent to using first softmax Loss function , And then we calculate the cross entropy , And more efficient
# Allied softmax_cross_entropy_with_logits Only for one-hot code , What we use will give 0-9 Classification number
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
loss = tf.reduce_mean(xentropy, name="loss")
learning_rate = 0.01
with tf.name_scope("train"):
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
training_op = optimizer.minimize(loss)
with tf.name_scope("eval"):
# obtain logits The biggest one in it 1 Bit and y Compare whether the categories are the same , return True perhaps False A set of values
correct = tf.nn.in_top_k(logits, y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
init = tf.global_variables_initializer()
saver = tf.train.Saver()
# Execution phase: run the graph
mnist = input_data.read_data_sets("MNIST_data_bak/")
n_epochs = 400
batch_size = 50

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images,
                                            y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)
    save_path = saver.save(sess, "./my_dnn_model_final.ckpt")
'''
# Using the model for prediction
with tf.Session() as sess:
    saver.restore(sess, "./my_dnn_model_final.ckpt")
    X_new_scaled = [...]  # new images, scaled the same way as the training data
    Z = logits.eval(feed_dict={X: X_new_scaled})
    y_pred = np.argmax(Z, axis=1)  # pick the class with the largest logit
'''

Tuning neural network hyperparameters

• Number of hidden layers
• For many problems you can start with just a single hidden layer and still get good results: even complex problems can be handled with a single hidden layer, as long as it has enough neurons. For a long time this satisfied people, and deep neural networks were not explored further.
• But deep neural networks have much higher parameter efficiency: they can use exponentially fewer neurons and train faster.
• It is like drawing a forest: drawing it stroke by stroke is slow, but if you draw a branch, copy-paste branches into a tree, and copy-paste trees into a forest, it is very fast. The real world usually has this kind of hierarchical structure, and DNNs exploit this advantage.
• The lower hidden layers build low-level structure (line segments of various shapes and orientations), the middle hidden layers combine the low-level structure into shapes such as squares and circles, and the upper hidden layers and the output layer assemble these into higher-level structures such as faces.
• This hierarchical structure not only helps a DNN converge faster, it also improves reusability on new data sets. For example, if you have already trained a neural network to recognize faces and now want to train a new network to recognize hairstyles, you can reuse the lower layers: instead of randomly initializing the new network's weights and biases, you initialize them with the weight values of the lower layers of the first network, and then start training.
• That way the new network does not have to learn the low-level structure from scratch; it only has to learn the higher-level structure.
• For many problems one or two hidden layers are enough. On MNIST you can reach about 97% accuracy with a single hidden layer of a few hundred neurons, and about 98% with two hidden layers. For more complex problems you can gradually add hidden layers until you start overfitting the training set.
• Very complex tasks such as image classification and speech recognition need dozens or even hundreds of layers (though not all fully connected) and large amounts of data. However, you rarely need to train such networks from scratch: it is much more convenient to reuse parts of a pre-trained classic network that does a similar job. Training is then much faster and needs far less data.
• Number of neurons per hidden layer
• The number of neurons in the input and output layers is determined by the task: for MNIST the input layer has 28*28 = 784 neurons and the output layer has 10.
• The usual practice used to be a funnel shape, with fewer and fewer neurons in each hidden layer (for example 300 neurons in the first hidden layer and 100 in the second). Nowadays it is more common to give every hidden layer the same number of neurons, for example 150, which leaves fewer hyperparameters to tune. As with the number of hidden layers, you can gradually increase the number of neurons until the network starts to overfit; finding the perfect number remains something of a black art.
• A simpler approach is to choose more layers and neurons than you actually need and then use early stopping to prevent overfitting, together with L1/L2 regularization and dropout.
Preventing overfitting through regularization
A typical deep network has a huge number of parameters, sometimes even millions. With so many parameters the network has an enormous amount of freedom and can fit a wide variety of complex data sets. But this flexibility also means it is prone to overfitting the training set.
Early Stopping
References:
https://www.jianshu.com/p/9ab695d91459
https://keras.io/callbacks/#earlystopping
https://zhuanlan.zhihu.com/p/186458993
https://tensorflow.google.cn/guide/migrate/early_stopping
To avoid overfitting you often need early stopping: when the loss is close to convergence, you can stop training ahead of time.
EarlyStopping is one kind of Callback; callbacks specify what should be done at the start and end of each epoch. Callbacks provide a number of built-in monitored quantities that can be used directly, such as 'acc', 'val_acc', 'loss' and 'val_loss'.
EarlyStopping is the callback used to stop training early. Concretely, it can stop training when the training loss stops decreasing (that is, when the decrease falls below a given threshold).
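A minimal sketch of such a callback, assuming tf.keras is available (the network layout mirrors the program above; the monitored quantity, threshold and patience are only example values):

from tensorflow import keras

# Stop when the validation loss has not improved by at least min_delta for `patience` epochs,
# and roll back to the best weights seen so far
early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss',
                                               min_delta=0.001,
                                               patience=5,
                                               restore_best_weights=True)

model = keras.Sequential([
    keras.layers.Dense(300, activation='relu', input_shape=(784,)),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['acc'])

# history = model.fit(X_train, y_train,
#                     validation_data=(X_valid, y_valid),
#                     epochs=400, batch_size=50,
#                     callbacks=[early_stopping])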
To prevent overfitting the training set, a good method is early stopping: interrupt training when performance on the validation set starts to drop.
• One way to implement this with TensorFlow is to evaluate the model on the validation set at regular intervals (for example every 50 steps) and save a snapshot if it performs better than the previous snapshot. Remember the step at which you last saved a snapshot, and once the number of steps since then reaches a limit, stop and restore the last winning snapshot (see the sketch below).
• Even though early stopping works well in practice, you can usually get even better performance by combining it with other regularization techniques.

Relevant code parameters 
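A minimal sketch of this interval-based scheme in TensorFlow 1.x, reusing the graph, placeholders and Saver from the program above (X, y, loss, accuracy, training_op, saver, init, mnist, n_epochs, batch_size); the interval and patience values are only illustrative:

best_loss = float("inf")
checks_without_progress = 0
max_checks_without_progress = 20    # stop after this many checks with no improvement
check_interval = 50                 # evaluate on the validation set every 50 steps

with tf.Session() as sess:
    init.run()
    step = 0
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            step += 1
            if step % check_interval == 0:
                val_loss = loss.eval(feed_dict={X: mnist.validation.images,
                                                y: mnist.validation.labels})
                if val_loss < best_loss:
                    best_loss = val_loss
                    checks_without_progress = 0
                    saver.save(sess, "./my_best_model.ckpt")   # snapshot of the best model so far
                else:
                    checks_without_progress += 1
        if checks_without_progress > max_checks_without_progress:
            print("Early stopping at epoch", epoch)
            break
    # restore the last winning snapshot before the final evaluation
    saver.restore(sess, "./my_best_model.ckpt")
    acc_test = accuracy.eval(feed_dict={X: mnist.test.images, y: mnist.test.labels})
    print("Test accuracy of the restored best model:", acc_test)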
L1 and L2 regularization
• Use L1 and L2 regularization to constrain the neural network's connection weights.
• One way to do this with TensorFlow is to add the appropriate regularization terms to the loss function yourself. But if there are many layers this becomes inconvenient. Fortunately, TensorFlow provides a better option: many functions such as get_variable() or fully_connected() accept a *_regularizer argument. You can pass any function that takes the weights as its argument and returns the corresponding regularization loss; l1_regularizer(), l2_regularizer() and l1_l2_regularizer() return such functions.
With this approach the neural network has two hidden layers and one output layer, and nodes that compute the L1 loss of each layer's weights are also created in the graph. TensorFlow automatically adds these nodes to a special collection that contains all the regularization losses. You just need to add these regularization losses to the overall loss; do not forget this step, otherwise they will simply be ignored. A sketch is shown below.
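A minimal sketch of this approach, assuming TensorFlow 1.x with tf.contrib.layers (the layer sizes reuse the ones from the program above; the regularization scale is only an example value):

import tensorflow as tf
from tensorflow.contrib.layers import fully_connected, l1_regularizer

n_inputs, n_hidden1, n_hidden2, n_outputs = 28*28, 300, 100, 10

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
y = tf.placeholder(tf.int64, shape=(None), name='y')

# Passing weights_regularizer makes each layer create an L1-loss node for its weights
reg = l1_regularizer(scale=0.001)
hidden1 = fully_connected(X, n_hidden1, scope="hidden1", weights_regularizer=reg)
hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2", weights_regularizer=reg)
logits = fully_connected(hidden2, n_outputs, scope="out", activation_fn=None,
                         weights_regularizer=reg)

xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
base_loss = tf.reduce_mean(xentropy, name="base_loss")

# The regularization-loss nodes were collected automatically; they must be added
# to the overall loss, otherwise they are simply ignored
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([base_loss] + reg_losses, name="loss")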
Dropout
• keep_prob is the proportion of neurons kept; 1 - keep_prob is the dropout rate.
• During training, set is_training to True; during testing, set it to False.
Principle diagram:
The code is as follows :
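A minimal sketch, assuming TensorFlow 1.x: tf.layers.dropout takes the dropout rate (1 - keep_prob) and a training flag, and the layer sizes follow the program above.

import tensorflow as tf

n_inputs, n_hidden1, n_hidden2, n_outputs = 28*28, 300, 100, 10
dropout_rate = 0.5                      # = 1 - keep_prob

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
is_training = tf.placeholder_with_default(False, shape=(), name='is_training')

# Dropout is applied to the inputs and to the output of every hidden layer;
# it is only active when is_training is fed as True
X_drop = tf.layers.dropout(X, rate=dropout_rate, training=is_training)
hidden1 = tf.layers.dense(X_drop, n_hidden1, activation=tf.nn.relu, name="hidden1")
hidden1_drop = tf.layers.dropout(hidden1, rate=dropout_rate, training=is_training)
hidden2 = tf.layers.dense(hidden1_drop, n_hidden2, activation=tf.nn.relu, name="hidden2")
hidden2_drop = tf.layers.dropout(hidden2, rate=dropout_rate, training=is_training)
logits = tf.layers.dense(hidden2_drop, n_outputs, name="outputs")

# During training (with training_op and y defined as in the program above):
#     sess.run(training_op, feed_dict={X: X_batch, y: y_batch, is_training: True})
# During testing, is_training defaults to False, so dropout is switched off.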
Tuning neural network hyperparameters: activation functions
• In most cases the activation function to use is ReLU: it is faster to compute, and gradient descent does not get stuck on plateaus. For large input values it does not saturate, in contrast to the logistic function and the hyperbolic tangent function, which saturate at 1.
• For the output layer, the softmax activation function is usually a good choice for classification tasks, because the classes are mutually exclusive. For regression tasks, use no activation function at all.
ReLU activation function (Rectified Linear Units)
• ReLU is a nonlinear function built from a linear one: if the input is greater than 0 the output is the input itself, otherwise it is 0, i.e. ReLU(z) = max(0, z).
• The response of biological neurons looks like the Sigmoid activation function, so the experts were stuck on Sigmoid for a long time. It turns out that ReLU is more suitable for artificial neural networks; insisting on imitating biology was a misconception.
Several activation functions and their derivatives:
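For reference, a small NumPy sketch of these activation functions and their derivatives (illustration only):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # derivative of the logistic function

def tanh(z):
    return np.tanh(z)

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2  # derivative of the hyperbolic tangent

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)  # 1 for positive inputs, 0 otherwise

z = np.linspace(-5, 5, 11)
print(relu(z), d_relu(z))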
Softmax processing
• A multilayer perceptron is usually used for classification problems, often binary classification.
• It is also often used for multi-class classification; in that case the activation function of the output layer is changed to the shared softmax function.
• The outputs become probability values used to estimate which class the input belongs to.
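For reference, softmax maps the output-layer scores z to probabilities: softmax(z)_i = exp(z_i) / sum_j exp(z_j). A small, numerically stable NumPy sketch (illustration only):

import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())   # probabilities sum to 1; the largest logit gets the largest probability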

Data augmentation
• Generate new training samples from the existing data, artificially enlarging the training set; this reduces overfitting.
• For example, if your model classifies pictures of mushrooms, you can shift, rotate and resize them slightly and add the modified pictures to the training set; the model then becomes tolerant to changes in position, orientation and size. If you want the model to be robust to lighting conditions, you can likewise produce pictures with different contrast. Assuming the mushrooms are symmetric, you can also flip the pictures horizontally.
As shown in the figure.
• TensorFlow provides image operators for this, for example transposing (shifting), rotating, resizing, flipping, cropping, and adjusting brightness, contrast, saturation and hue (see the sketch below).
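A minimal sketch using a few of these tf.image operators (TensorFlow 1.x; the placeholder shape and parameter values are just examples):

import tensorflow as tf

image = tf.placeholder(tf.float32, shape=(200, 200, 3), name='image')

flipped = tf.image.flip_left_right(image)                 # horizontal flip
rotated = tf.image.rot90(image, k=1)                      # rotate by 90 degrees
brighter = tf.image.adjust_brightness(image, delta=0.2)   # brightness
contrasted = tf.image.adjust_contrast(image, contrast_factor=1.5)
cropped = tf.image.central_crop(image, central_fraction=0.8)
resized = tf.image.resize_images(image, size=[100, 100])

# Random versions are handy for augmentation inside an input pipeline
augmented = tf.image.random_flip_left_right(image)
augmented = tf.image.random_brightness(augmented, max_delta=0.2)
augmented = tf.image.random_contrast(augmented, lower=0.8, upper=1.2)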
To be continued.