
Understanding the BatchNorm2d() function in PyTorch

2022-07-27 10:14:00 【Chen Zhuangshi's programming life】


1. Brief introduction

In machine learning, data is usually normalized before model training so that its features follow a common distribution. When training a deep neural network, each training step typically processes one batch rather than the whole dataset. Because every batch has a different distribution, this gives rise to the internal covariate shift problem: the distribution of the data seen by each layer keeps changing during training, which makes learning harder for the next layer. Batch Normalization forces the data back toward a distribution with mean 0 and variance 1. On the one hand this keeps the data distribution consistent; on the other hand it helps avoid vanishing gradients.

2. Calculation

The figure (not reproduced here) shows the input data, whose shape is [5, 3, h, w]: a batch of 5 samples, each with 3 channels of size h × w.

Step 1: Compute the mean over all values in the same channel (the red blocks in the figure mark values that belong to the same channel). For channel $c$ of an input with $N$ samples:
$$\mu_c = \frac{1}{N H W} \sum_{n,h,w} X_{n,c,h,w}$$

Step 2: Compute the variance over all values in the same channel:
$$\sigma_c^2 = \frac{1}{N H W} \sum_{n,h,w} \left(X_{n,c,h,w} - \mu_c\right)^2$$

Step 3: Normalize each value in the current channel:
$$\hat{x} = \frac{x - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}}$$
Here x is a single data point, such as x = X[0][0][0][0].

Step 4: Apply the scale and shift variables $\gamma$ and $\beta$. The final normalized value is
$$y = \gamma \hat{x} + \beta$$
where $\epsilon$ is a fixed constant, 1e-5 by default, whose purpose is to prevent division by 0. We generally do not need to set $\gamma$ and $\beta$ ourselves: when affine=True (the default), PyTorch creates them as learnable per-channel parameters, initialized to 1 and 0, and updates them during training.
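The four steps above can be sketched for a whole tensor at once. This is only a minimal sketch, not PyTorch's internal implementation: the helper name `manual_batch_norm` and the random input are assumptions made for illustration. It reduces over the batch, height and width dimensions so that each channel gets one mean and one variance, then compares the result with nn.BatchNorm2d in training mode.

```python
import torch


def manual_batch_norm(x, gamma, beta, eps=1e-5):
    """Hypothetical helper: batch-normalize x of shape [N, C, H, W] per channel."""
    # Step 1: one mean per channel, reduced over batch, height and width.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    # Step 2: one biased variance per channel (divide by N*H*W, not N*H*W - 1).
    var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
    # Step 3: normalize every value in the channel.
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # Step 4: scale and shift with the per-channel gamma and beta.
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)


torch.manual_seed(0)
x = torch.randn(5, 3, 4, 4)                 # shape [5, 3, h, w] with h = w = 4
bn = torch.nn.BatchNorm2d(num_features=3)   # a new module starts in training mode

with torch.no_grad():
    expected = bn(x)
    result = manual_batch_norm(x, bn.weight, bn.bias, eps=bn.eps)

print(torch.allclose(result, expected, atol=1e-5))
```

The comparison is expected to hold up to floating-point tolerance. Using the biased variance in Step 2 matters here: dividing by N*H*W - 1 instead would give slightly different values from the layer's output.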

3. The nn.BatchNorm2d() function in PyTorch

It mainly takes 4 parameters:
(1) num_features: the input data usually has shape [batch_size, channel, height, width]; num_features is the number of channels.
(2) eps: a value added to the denominator for numerical stability. Default: 1e-5.
(3) momentum: the factor used to update the running estimates of the mean and variance during training. The default value is 0.1. The update rule is
$$\hat{x}_{\text{new}} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t$$
where $\hat{x}$ is the running estimate and $x_t$ is the statistic observed on the current batch.
(4) affine: when set to True, the layer has learnable per-channel parameters $\gamma$ and $\beta$ (exposed as weight and bias).
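As a rough illustration of momentum and affine, the sketch below runs one batch through the layer in training mode and then inspects its state. The variable names and the random input are assumptions for illustration; `weight`, `bias`, `running_mean` and `running_var` are attributes of nn.BatchNorm2d. One detail worth knowing: the running variance is updated with the unbiased batch variance, while the normalization itself uses the biased one.

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 3, 4, 4)

bn = torch.nn.BatchNorm2d(num_features=3, eps=1e-5, momentum=0.1, affine=True)

# With affine=True the layer owns one gamma (weight) and one beta (bias) per channel.
print(bn.weight.shape, bn.bias.shape)

# Batch statistics per channel, computed by hand.
n = x.numel() // x.size(1)                      # values per channel: N * H * W
batch_mean = x.mean(dim=(0, 2, 3))
centered = x - batch_mean.view(1, -1, 1, 1)
batch_var_unbiased = (centered ** 2).sum(dim=(0, 2, 3)) / (n - 1)

with torch.no_grad():
    bn(x)  # one forward pass in training mode updates the running statistics

# Running statistics start at mean 0 and variance 1, then move toward the batch values.
expected_mean = (1 - bn.momentum) * torch.zeros(3) + bn.momentum * batch_mean
expected_var = (1 - bn.momentum) * torch.ones(3) + bn.momentum * batch_var_unbiased
mean_matches = torch.allclose(bn.running_mean, expected_mean, atol=1e-5)
var_matches = torch.allclose(bn.running_var, expected_var, atol=1e-5)
print(mean_matches, var_matches)

# With affine=False the layer has no learnable gamma and beta.
bn_plain = torch.nn.BatchNorm2d(num_features=3, affine=False)
print(bn_plain.weight is None, bn_plain.bias is None)
```

The running statistics only matter at inference time: after calling `bn.eval()`, the layer normalizes with `running_mean` and `running_var` instead of the statistics of the current batch.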

4. Code example:

import torch

data = torch.ones(size=(2, 2, 3, 4))
data[0][0][0][0] = 25
print("data = ", data)

print("\n")

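print("========================= Computed with the built-in BatchNorm2d() ================================")
# eps=0 lets the manual check below match the layer's output exactly.
# Note: channel 1 is all ones, so its variance is 0; with eps=0 its output is 0/0 and may show up as nan.
BN = torch.nn.BatchNorm2d(num_features=2, eps=0, momentum=0)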
BN_data = BN(data)
print("BN_data = ", BN_data)

print("\n")

print("========================= Computed by hand ================================")
x = torch.cat((data[0][0], data[1][0]), dim=1)      # 1. Concatenate channel 0 of both samples (treat the channel as one set of values)
x_mean = x.mean()                                   # 2. Mean of all values in the channel (the mean after concatenation)
x_var = x.var(unbiased=False)                       # 3. Biased variance of all values in the channel (the variance after concatenation)

# 4. Compute the BatchNorm value of the first element
bn_first = ((data[0][0][0][0] - x_mean) / torch.sqrt(x_var)) * BN.weight[0] + BN.bias[0]
print("bn_first = ", bn_first)

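Running results:
(1) The original data (screenshot not reproduced here)
(2) Output of the BatchNorm2d() layer (screenshot not reproduced here)
(3) The normalized value computed by hand (screenshot not reproduced here)

The two values marked in red in the screenshots, BN_data[0][0][0][0] and bn_first, are equal. The arithmetic can also be checked on paper: channel 0 contains 23 ones and a single 25, so its mean is 48 / 24 = 2 and its biased variance is (23 × 1 + 23²) / 24 = 23, which gives (25 - 2) / sqrt(23) ≈ 4.7958 for the first element. Done!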