Batch normalization (batch_normalization)
Hello everyone, good to see you again. I'm the Full Stack King.
To reduce the vanishing/exploding gradient problems that appear at the start of deep neural network training, Sergey Ioffe and Christian Szegedy proposed batch normalization. The technique adds an operation to the model just before each layer's activation function: it zero-centers and normalizes the inputs, then scales and shifts the result using two new parameters per layer (one for scaling, one for shifting). In other words, the operation lets the model learn the optimal scale and mean of the inputs for each layer.
Batch normalization principle
(1) \(\mu_B = \frac{1}{m_B}\sum_{i=1}^{m_B}x^{(i)}\)  # empirical mean, evaluated over the whole mini-batch B
(2) \(\sigma_B^2 = \frac{1}{m_B}\sum_{i=1}^{m_B}(x^{(i)} - \mu_B)^2\)  # variance over the whole mini-batch B
(3) \(\hat{x}^{(i)} = \frac{x^{(i)} - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}\)  # zero-center and normalize
(4) \(z^{(i)} = \gamma\,\hat{x}^{(i)} + \beta\)  # scale and shift the normalized input
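A minimal NumPy sketch of the four steps above (function and variable names are only illustrative; gamma and beta play the role of the learnable scale and shift):

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    mu = x.mean(axis=0)                     # (1) empirical mean over the mini-batch
    var = x.var(axis=0)                     # (2) variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # (3) zero-center and normalize
    return gamma * x_hat + beta             # (4) scale and shift with the learnable parameters

# usage on a random mini-batch of 100 samples with 20 features
x = np.random.randn(100, 20)
out = batch_norm_forward(x, gamma=np.ones(20), beta=np.zeros(20))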
At test time there is no mini-batch from which to compute the empirical mean and standard deviation, so you simply use the mean and standard deviation of the whole training set instead; these are usually estimated efficiently during training with a moving average.
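A small sketch of how such moving averages can be maintained during training (the momentum value and names are illustrative; tf.layers.batch_normalization keeps equivalent moving_mean / moving_variance variables for you):

def update_moving_stats(moving_mean, moving_var, batch_mean, batch_var, momentum=0.99):
    # exponential moving averages, updated at every training step;
    # at test time they replace the per-batch statistics
    moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean
    moving_var = momentum * moving_var + (1.0 - momentum) * batch_var
    return moving_mean, moving_var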
However, batch normalization does add some complexity and runtime cost to the model and slows down prediction. So if you need fast predictions, you may want to check how plain ELU activations with He initialization perform before adding batch normalization.
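For reference, a single hidden layer with ELU activation and He initialization in the TF1 layers API could look like this (the layer size and names are just for illustration):

import tensorflow as tf

he_init = tf.contrib.layers.variance_scaling_initializer()  # He initialization
X = tf.placeholder(tf.float32, shape=(None, 784), name='X')
hidden = tf.layers.dense(X, 100, activation=tf.nn.elu, kernel_initializer=he_init)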
Using tf.layers.batch_normalization
The function prototype
def batch_normalization(inputs,
                        axis=-1,
                        momentum=0.99,
                        epsilon=1e-3,
                        center=True,
                        scale=True,
                        beta_initializer=init_ops.zeros_initializer(),
                        gamma_initializer=init_ops.ones_initializer(),
                        moving_mean_initializer=init_ops.zeros_initializer(),
                        moving_variance_initializer=init_ops.ones_initializer(),
                        beta_regularizer=None,
                        gamma_regularizer=None,
                        beta_constraint=None,
                        gamma_constraint=None,
                        training=False,
                        trainable=True,
                        name=None,
                        reuse=None,
                        renorm=False,
                        renorm_clipping=None,
                        renorm_momentum=0.99,
                        fused=None,
                        virtual_batch_size=None,
                        adjustment=None):

Precautions for use
(1) Using batch_normalization takes three steps:
a. Set the layer's activation function to None
b. Apply batch_normalization
c. Apply the activation function
Example:

inputs = tf.layers.dense(inputs, self.n_neurons,
                         kernel_initializer=self.initializer,
                         name='hidden%d' % (layer+1))
if self.batch_normal_momentum:
    inputs = tf.layers.batch_normalization(inputs, momentum=self.batch_normal_momentum, training=self._training)
inputs = self.activation(inputs, name='hidden%d_out' % (layer+1))

(2) During training, set the parameter training to True; at test time, set training to False. Also pay special attention to the use of update_ops:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
These ops must be run at every training step; you can call sess.run(update_ops) explicitly.
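For example (reusing the sess, train_op and feed_dict names from the examples in this article, so this is just a sketch):

sess.run([train_op, update_ops], feed_dict=feed_dict)  # run the moving-average updates together with the training op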
Alternatively, add them as a control dependency of the training op:

with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)

A simple test with the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
x_train, y_train = mnist.train.images, mnist.train.labels
x_test, y_test = mnist.test.images, mnist.test.labels

Output:
Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz

he_init = tf.contrib.layers.variance_scaling_initializer()
def dnn(inputs, n_hiddens=1, n_neurons=100, initializer=he_init, activation=tf.nn.elu, batch_normalization=None, training=None):
    for layer in range(n_hiddens):
        inputs = tf.layers.dense(inputs, n_neurons, kernel_initializer=initializer, name='hidden%d' % (layer+1))
        if batch_normalization is not None:
            inputs = tf.layers.batch_normalization(inputs, momentum=batch_normalization, training=training)
        inputs = activation(inputs, name='hidden%d_out' % (layer+1))
    return inputs

tf.reset_default_graph()
n_inputs = 28*28
n_hidden = 100
n_outputs = 10

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
Y = tf.placeholder(tf.float32, shape=(None, n_outputs), name='Y')  # one-hot labels
training = tf.placeholder_with_default(False, shape=(), name='training')

# enable batch normalization and pass the training flag, as described in the precautions above
dnn_outputs = dnn(X, batch_normalization=0.9, training=training)
logits = tf.layers.dense(dnn_outputs, n_outputs, kernel_initializer=he_init, name='logits')
y_proba = tf.nn.softmax(logits, name='y_proba')

# the cross-entropy op expects the raw logits, not the softmax output
xentropy = tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits)
loss = tf.reduce_mean(xentropy, name='loss')

# make sure the moving-average update ops run at every training step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)

correct = tf.equal(tf.argmax(Y, 1), tf.argmax(y_proba, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
epoches = 20
batch_size = 100
np.random.seed(42)
init = tf.global_variables_initializer()
rnd_index = np.random.permutation(len(x_train))
n_batches = len(x_train) // batch_size
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epoches):
        for batch_index in np.array_split(rnd_index, n_batches):
            x_batch, y_batch = x_train[batch_index], y_train[batch_index]
            feed_dict = {X: x_batch, Y: y_batch, training: True}
            sess.run(train_op, feed_dict=feed_dict)
        loss_val, accuracy_val = sess.run([loss, accuracy], feed_dict={X: x_test, Y: y_test, training: False})
        print('epoch:{},loss:{},accuracy:{}'.format(epoch, loss_val, accuracy_val))

Publisher: Full Stack Programmer. Please credit the source when reprinting: https://javaforall.cn/120021.html Original link: https://javaforall.cn