BatchNorm2d in PyTorch: principle, purpose, and explanation of the function parameters
2022-06-28 16:46:00 【Full stack programmer webmaster】
Hello everyone, good to see you again. I'm your friend Quan Jun.
BN principle and purpose:
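For reference, batch normalization is usually written as the four steps below, applied per channel over a mini-batch of m values x_1, ..., x_m (for BatchNorm2d, m = batch_size × height × width). This is the standard textbook form, and it is assumed here to be the formula that the parameter discussion refers to; the symbols μ_B, σ_B², and x̂ are the conventional ones rather than names taken from this post.

```latex
\begin{aligned}
\mu_B &= \frac{1}{m}\sum_{i=1}^{m} x_i \\
\sigma_B^2 &= \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \\
\hat{x}_i &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \\
y_i &= \gamma\,\hat{x}_i + \beta
\end{aligned}
```

Here ε corresponds to the eps argument, and γ and β are the learnable coefficients controlled by affine.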
Function parameters:
BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
1. num_features: the input is generally of shape batch_size × num_features × height × width, so num_features is the number of features, that is, the number of channels entering the BN layer.
2. eps: a value added to the denominator for numerical stability, so that the denominator is never 0. The default is 1e-5.
3. momentum: the coefficient used to estimate the running mean and variance during the computation. It acts as a smoothing coefficient: running = (1 - momentum) × running + momentum × batch statistic. Despite the name, this is not the same convention as the momentum coefficient in SGD.
4. affine: when set to True, the layer is given learnable per-channel coefficients gamma and beta.
Generally speaking, models in PyTorch inherit from the nn.Module class, and every module has a training attribute that specifies whether it is in training mode. The training state affects whether the state of some layers is fixed, for example BN layers or Dropout layers. Usually model.train() puts the current model in training mode, and model.eval() puts it in test mode.
The BN API also has several parameters to pay attention to. One is affine, which specifies whether to apply the affine transform; another is track_running_stats, which specifies whether to track the statistics of the batches seen so far. These three settings are the ones most prone to problems: training, affine, and track_running_stats.
affine specifies whether the fourth step of the formula above, the affine transform y = γx̂ + β, is needed. If affine=False, then γ=1 and β=0, and they cannot be learned or updated. It is usually set to affine=True.
As for training and track_running_stats: track_running_stats=True means the layer tracks the batch statistics over the whole training process and accumulates a running mean and variance, instead of relying only on the statistics of the current input batch. These running estimates are what the layer normalizes with in eval mode.
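The effect of these constructor flags can be checked directly on the layer object. The following is a minimal sketch, assuming a recent PyTorch install; the variable names and the dummy input shape are made up for illustration.

```python
import torch
import torch.nn as nn

# The layer from the call quoted above: 256 input channels.
bn = nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

# affine=True: gamma (weight) and beta (bias) are learnable, one value per channel.
gamma_shape = tuple(bn.weight.shape)
gamma_learnable = bn.weight.requires_grad

# affine=False: the layer has no gamma/beta parameters at all.
bn_no_affine = nn.BatchNorm2d(256, affine=False)
no_affine_params = bn_no_affine.weight is None and bn_no_affine.bias is None

# track_running_stats=False: no running_mean / running_var buffers are kept.
bn_no_stats = nn.BatchNorm2d(256, track_running_stats=False)
no_running_buffers = bn_no_stats.running_mean is None and bn_no_stats.running_var is None

# momentum: after one training-mode forward pass,
# running_mean = (1 - momentum) * running_mean + momentum * batch_mean.
torch.manual_seed(0)
x = torch.randn(8, 256, 4, 4) + 2.0       # dummy input of shape (N, C, H, W)
old_mean = bn.running_mean.clone()        # running_mean starts at zero
bn.train()
bn(x)
expected_mean = 0.9 * old_mean + 0.1 * x.mean(dim=(0, 2, 3))
momentum_rule_holds = torch.allclose(bn.running_mean, expected_mean, atol=1e-5)
```

Note that a larger momentum gives more weight to the most recent batch, which is the opposite of how the momentum term is usually read in an optimizer.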
If track_running_stats=False, the layer instead computes only the mean and variance of the current input batch. At inference time, if track_running_stats=False and batch_size is relatively small, the batch statistics can deviate substantially from the global statistics, which may lead to bad results. If the track_running_stats parameter of BatchNorm2d is set to False, then after loading a pre-trained model the results on the test set can differ from run to run, since each output depends on which samples share its batch; when track_running_stats is set to True, the result is the same every time.
running_mean and running_var are computed from the statistics of the input batches. They are not exactly "learned" parameters, but they are very important for the whole computation. The BN layer updates running_mean and running_var during the forward pass, not in optimizer.step(), so in training mode the BN statistics change even if step() is never called manually.
model.train()  # training mode
for data, label in self.dataloader:
    pred = model(data)  # the forward pass updates the BN statistics running_mean and running_var in model
    loss = self.loss(pred, label)
    # even without the following three lines, the BN statistics still change
    opt.zero_grad()
    loss.backward()
    opt.step()
At this point, switching to the test phase with model.eval() fixes running_mean and running_var. Sometimes a pre-trained model is loaded and rerun on the test data, yet the results differ and there is a slight loss of performance. This is almost always caused by wrong settings of training and track_running_stats.
Suppose two models are trained jointly. To make convergence easier to control, first pre-train model_A, which contains several BN layers. Later, model_A is used as an inference model and trained jointly with model_B, and at that point the BN statistics running_mean and running_var in model_A should no longer change. So model_A.eval() must be called to put it in test mode. Otherwise, in training mode, even if the parameters of the model are not updated, its BN statistics still change, which leads to results that differ from what is expected.
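The joint-training setup described above can be sketched as follows. This is only an illustration under stated assumptions: the two small networks standing in for model_A and model_B, the random data, and the loss are all hypothetical, chosen to keep the example self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the pre-trained model_A and the trainable model_B.
model_A = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.BatchNorm2d(4), nn.ReLU())
model_B = nn.Sequential(nn.Flatten(), nn.Linear(4 * 8 * 8, 2))

# Freeze model_A's weights and switch it to eval so its BN statistics stay fixed.
for p in model_A.parameters():
    p.requires_grad_(False)
model_A.eval()
model_B.train()

opt = torch.optim.SGD(model_B.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
bn = model_A[1]
stats_before = bn.running_mean.clone()

# Train model_B on top of the frozen model_A for a few steps on random data.
for _ in range(3):
    data = torch.randn(16, 3, 8, 8)
    label = torch.randint(0, 2, (16,))
    loss = loss_fn(model_B(model_A(data)), label)
    opt.zero_grad()
    loss.backward()
    opt.step()

# In eval mode the running statistics of model_A are untouched.
stats_frozen = torch.equal(stats_before, bn.running_mean)

# For contrast: in train mode a single forward pass already moves the statistics,
# even with no backward pass and no optimizer step.
model_A.train()
with torch.no_grad():
    model_A(torch.randn(16, 3, 8, 8))
stats_moved = not torch.equal(stats_before, bn.running_mean)
```

One thing to watch for: calling train() on a parent module that contains model_A switches every submodule back to training mode, so model_A.eval() has to be called again afterwards.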
Publisher: Full stack programmer webmaster. When reprinting, please credit the source: https://javaforall.cn/132951.html Original link: https://javaforall.cn