Batch Normalization (Standardization)
We already experimented with normalizing sample data in the earlier Kaggle house-price prediction practice (with K-fold cross-validation) and obtained good results. Here the focus is on how to normalize, why we do it, and what the benefits are.
When we receive data samples, there are usually some outliers (values noticeably larger or smaller than the rest), or the samples are highly dispersed. So before training we need some extra preprocessing, such as normalization, which brings two obvious benefits:
1. The output of each layer in a deep model becomes more stable. After normalization, the features are concentrated in a fixed range (for example, mean 0 and standard deviation 1), which removes the adverse influence of "abnormal samples". Because the distribution is relatively uniform, it is easier to train an effective model.
2. Training converges faster, which is very helpful for deeper models.
There are many ways to normalize, for example: min-max normalization, log-function normalization, arctangent normalization, L2-norm normalization, and so on. This article mainly introduces the method commonly used in neural networks. A picture makes it intuitive, so let's look at that first (it is pretty simple: find the mean and variance, then do a division):
[Figure: the batch normalization layer]
First find the mean and variance, then subtract the mean from each feature value and divide by the standard deviation to obtain the normalized data.
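Written out as formulas (this is standard batch normalization and matches the code below; the symbols are mine rather than the original figure's), for a mini-batch $\mathcal{B}$:

$$\mu_{\mathcal{B}} = \frac{1}{|\mathcal{B}|}\sum_{x\in\mathcal{B}} x, \qquad \sigma_{\mathcal{B}}^2 = \frac{1}{|\mathcal{B}|}\sum_{x\in\mathcal{B}} (x-\mu_{\mathcal{B}})^2$$

$$\hat{x} = \frac{x-\mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2+\epsilon}}, \qquad y = \gamma\,\hat{x} + \beta$$

Here $\epsilon$ is a small constant for numerical stability ($10^{-5}$ in the code below), and $\gamma$ and $\beta$ are the learnable scale and shift parameters introduced next.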
import d2lzh as d2l
from mxnet import gluon,init,nd,autograd
from mxnet.gluon import nn
# Batch normalization
'''
There are two extra parameters, gamma and beta: learnable scale (stretch) and
shift (offset) parameters. If batch normalization turns out not to be
beneficial, these two parameters can learn to undo the normalization of
the input X.
moving_mean and moving_var are the moving-average mean and variance,
estimated over the whole training data set.
Therefore the computation differs between training mode and prediction mode.
'''
def batch_norm(X, gamma, beta, moving_mean, moving_var, eps, momentum):
    if not autograd.is_training():
        # Prediction mode: use the estimated moving-average mean and variance directly
        X_hat = (X - moving_mean) / nd.sqrt(moving_var + eps)
    else:
        # Training mode: input is either 2-D (fully connected layer)
        # or 4-D (convolutional layer)
        assert X.ndim in (2, 4)
        if X.ndim == 2:
            meanV = X.mean(axis=0)
            var = ((X - meanV) ** 2).mean(axis=0)
        else:
            meanV = X.mean(axis=(0, 2, 3), keepdims=True)
            var = ((X - meanV) ** 2).mean(axis=(0, 2, 3), keepdims=True)
        X_hat = (X - meanV) / nd.sqrt(var + eps)
        # Update the moving-average mean and variance
        moving_mean = momentum * moving_mean + (1.0 - momentum) * meanV
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    Y = gamma * X_hat + beta  # scale and shift
    return Y, moving_mean, moving_var
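As a quick sanity check (a toy example of my own, not from the original), we can call batch_norm on a small 2-D batch inside autograd.record(), which puts autograd into training mode; each feature column should come out with mean close to 0 and variance close to 1:

# Toy check (my own example, reusing the imports above):
# a batch of 3 samples with 2 features
X = nd.array([[1, 2], [3, 4], [5, 6]])
gamma, beta = nd.ones((1, 2)), nd.zeros((1, 2))
mm, mv = nd.zeros((1, 2)), nd.zeros((1, 2))
with autograd.record():  # autograd.is_training() is True inside record()
    Y, mm, mv = batch_norm(X, gamma, beta, mm, mv, eps=1e-5, momentum=0.9)
print(Y.mean(axis=0))                            # ~[0. 0.]
print(((Y - Y.mean(axis=0)) ** 2).mean(axis=0))  # ~[1. 1.]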
Custom BatchNorm layer
# num_features: number of outputs for a fully connected layer,
# or number of output channels for a convolutional layer
class BatchNorm(nn.Block):
    def __init__(self, num_features, num_dims, **kwargs):
        super(BatchNorm, self).__init__(**kwargs)
        if num_dims == 2:
            shape = (1, num_features)
        else:
            shape = (1, num_features, 1, 1)
        # Scale and shift parameters take part in gradients and iteration;
        # they are initialized to 1 and 0 respectively
        self.gamma = self.params.get('gamma', shape=shape, init=init.One())
        self.beta = self.params.get('beta', shape=shape, init=init.Zero())
        # Variables that do not take part in gradients or iteration,
        # initialized to 0 in main memory
        self.moving_mean = nd.zeros(shape)
        self.moving_var = nd.zeros(shape)

    def forward(self, X):
        # If X is not in main memory, copy moving_mean and moving_var
        # to the device (e.g. GPU memory) where X lives
        if self.moving_mean.context != X.context:
            self.moving_mean = self.moving_mean.copyto(X.context)
            self.moving_var = self.moving_var.copyto(X.context)
        # Save the updated moving_mean and moving_var
        Y, self.moving_mean, self.moving_var = batch_norm(
            X, self.gamma.data(), self.beta.data(), self.moving_mean,
            self.moving_var, eps=1e-5, momentum=0.9)
        return Y
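A quick smoke test of the custom layer (my own example, not from the original): the output shape should match the input shape.

# Smoke test (my own): the layer preserves the input shape
layer = BatchNorm(6, num_dims=4)
layer.initialize()
X = nd.random.uniform(shape=(2, 6, 8, 8))  # a small NCHW batch
with autograd.record():
    Y = layer(X)
print(Y.shape)  # (2, 6, 8, 8)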
LeNet model with BN layers added
# LeNet with the custom BatchNorm layers
net = nn.Sequential()
net.add(nn.Conv2D(6, kernel_size=5),
        BatchNorm(6, num_dims=4), nn.Activation('sigmoid'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Conv2D(16, kernel_size=5),
        BatchNorm(16, num_dims=4), nn.Activation('sigmoid'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Dense(120),
        BatchNorm(120, num_dims=2), nn.Activation('sigmoid'),
        nn.Dense(10))
lr, num_epochs, batch_size, ctx = 1.0, 5, 256, d2l.try_gpu()
net.initialize(ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)
'''
epoch 1, loss 0.7461, train acc 0.748, test acc 0.827, time 8.1 sec
epoch 2, loss 0.4090, train acc 0.853, test acc 0.858, time 7.9 sec
epoch 3, loss 0.3635, train acc 0.867, test acc 0.822, time 7.8 sec
epoch 4, loss 0.3268, train acc 0.881, test acc 0.775, time 7.7 sec
epoch 5, loss 0.3099, train acc 0.888, test acc 0.857, time 7.6 sec
'''
# Print gamma and beta data
print(net[1].gamma.data())
'''
[[[[1.5982468]]
[[1.6550801]]
[[1.4356986]]
[[1.1882782]]
[[1.2812225]]
[[1.8739824]]]]
<NDArray 1x6x1x1 @gpu(0)>
'''
print(net[1].beta.data())
'''
[[[[ 1.1335251 ]]
[[-0.18426114]]
[[-0.02497273]]
[[ 0.99639875]]
[[-1.2256573 ]]
[[-2.2048857 ]]]]
'''
Concise implementation of the LeNet model
As can be seen above, the BN layers are placed before the activation functions. The framework already defines a batch normalization layer, nn.BatchNorm, and with it there is no need to specify num_features or num_dims: these are obtained automatically through deferred initialization. Let's look at the effect.
net = nn.Sequential()
net.add(nn.Conv2D(6, kernel_size=5),
        nn.BatchNorm(), nn.Activation('sigmoid'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Conv2D(16, kernel_size=5),
        nn.BatchNorm(), nn.Activation('sigmoid'),
        nn.MaxPool2D(pool_size=2, strides=2),
        nn.Dense(120),
        nn.BatchNorm(), nn.Activation('sigmoid'),
        nn.Dense(84),
        nn.BatchNorm(), nn.Activation('sigmoid'),
        nn.Dense(10))
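The original does not repeat the training code here; presumably the new model is initialized and trained with the same calls as before, along these lines:

# Assumed to mirror the earlier training setup (not shown in the original)
net.initialize(ctx=ctx, init=init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': lr})
d2l.train_ch5(net, train_iter, test_iter, batch_size, trainer, ctx, num_epochs)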
'''
training on gpu(0)
epoch 1, loss 0.6276, train acc 0.779, test acc 0.799, time 5.9 sec
epoch 2, loss 0.3885, train acc 0.859, test acc 0.856, time 5.8 sec
epoch 3, loss 0.3456, train acc 0.875, test acc 0.815, time 5.9 sec
epoch 4, loss 0.3201, train acc 0.885, test acc 0.873, time 5.9 sec
epoch 5, loss 0.3053, train acc 0.888, test acc 0.855, time 6.0 sec
'''
Examples of taking the mean over different dimensions
To make it more intuitive how the mean (or variance) along a given dimension is computed, here are examples of operating on different axes.
import numpy as np

a1 = np.arange(10).reshape(2, 5)
print(a1)
print(a1[:, 0])         # the first column: [0 5]
print(a1.mean(axis=0))  # mean over axis 0: [2.5 3.5 4.5 5.5 6.5]
'''
[[0 1 2 3 4]
[5 6 7 8 9]]
[0 5]
[2.5 3.5 4.5 5.5 6.5]
'''
a2 = np.arange(30).reshape(2, 3, 1, 5)
print(a2)
print(a2[:, 0, :, :])           # the first slice along the channel axis (NCHW, axis 1)
print(a2.mean(axis=(0, 2, 3)))  # per-channel mean: [ 9.5 14.5 19.5]
'''
[[[[ 0 1 2 3 4]]
[[ 5 6 7 8 9]]
[[10 11 12 13 14]]]
[[[15 16 17 18 19]]
[[20 21 22 23 24]]
[[25 26 27 28 29]]]]
[[[ 0 1 2 3 4]]
[[15 16 17 18 19]]]
[ 9.5 14.5 19.5]
'''
# keepdims=True keeps the number of dimensions unchanged
print(a2.mean(axis=(0, 2, 3), keepdims=True))
'''
[[[[ 9.5]]
[[14.5]]
[[19.5]]]]
shape: (1, 3, 1, 1)
'''
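Connecting this back to batch normalization (my own check, reusing a2 from above): because keepdims=True keeps the shape (1, 3, 1, 1), the per-channel statistics broadcast against the original (2, 3, 1, 5) array, so every channel can be normalized in one expression, exactly as in batch_norm above.

# Normalize each channel of a2 by broadcasting the per-channel statistics
mean = a2.mean(axis=(0, 2, 3), keepdims=True)                # shape (1, 3, 1, 1)
var = ((a2 - mean) ** 2).mean(axis=(0, 2, 3), keepdims=True)
a2_hat = (a2 - mean) / np.sqrt(var + 1e-5)
print(a2_hat.mean(axis=(0, 2, 3)))  # ~[0. 0. 0.], per-channel mean is now 0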