BatchNorm2d: principle, effect, and explanation of the BatchNorm2d parameters in PyTorch
2022-06-28 16:46:00 【Full stack programmer webmaster】
Hello everyone, nice to see you again. I'm your friend, Quan Jun.
BN principle and effect:
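In brief, BN normalizes each channel with the statistics of the current mini-batch and then applies a learnable affine transform; this keeps the distribution of the activations stable across batches and makes training faster and easier. In the standard formulation, for a mini-batch $B = \{x_1, \dots, x_m\}$ the computation has four steps:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i \qquad\qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_B\right)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \qquad\qquad y_i = \gamma\,\hat{x}_i + \beta$$

The fourth step, $y_i = \gamma\,\hat{x}_i + \beta$, is the learnable affine transform; it is this step that the affine parameter discussed below switches on and off.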
Function parameters:
BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

1. num_features: the input is generally of shape batch_size × num_features × height × width, i.e. num_features is the number of features, which is the number of channels of the input fed to the BN layer;
2. eps: a value added to the denominator for numerical stability; the default is 1e-5, which avoids a denominator of 0;
3. momentum: the coefficient used to estimate the running mean and variance during training (my understanding is that it is a stability coefficient, similar to the momentum coefficient in SGD); the default is 0.1;
4. affine: when set to True, the layer is given learnable coefficients gamma and beta.

Generally speaking, models in PyTorch inherit from nn.Module and therefore have a `training` attribute that specifies whether they are in training state. The training state determines whether the behaviour of some layers is fixed, such as BN layers or Dropout layers. Usually model.train() puts the model in training state and model.eval() puts it in test state.

Meanwhile, the BN API has several parameters that deserve attention: affine specifies whether the affine transform is applied, and track_running_stats specifies whether the batch statistics are tracked. Three settings are prone to problems: training, affine, and track_running_stats.

affine specifies whether the affine transform is applied, i.e. whether the fourth of the formulas above is needed. If affine=False, then γ=1 and β=0, and they cannot be learned or updated. It is usually set to affine=True.

training and track_running_stats: track_running_stats=True means that the batch statistics are tracked over the whole training process to obtain the running variance and mean, instead of relying only on the statistics of the batch currently being fed in. Conversely, if track_running_stats=False, only the mean and variance of the current input batch are computed. In the inference stage, if track_running_stats=False and batch_size is relatively small, the statistics will deviate considerably from the global statistics, which may lead to bad results. If track_running_stats of BatchNorm2d is set to False, the test-set results differ from run to run after loading a pretrained model; when it is set to True, the result is the same every time.

running_mean and running_var are obtained from the statistics of the input batches; they are not exactly "learned" parameters, but they are very important for the whole computation. The running_mean and running_var of a BN layer are updated during forward(), not in optimizer.step(), so during training, even if you never call step() manually, BN's statistics will still change.
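To make these settings concrete, the following is a minimal sketch (the 256-channel layer and the random 4×256×8×8 input are just example values) of what each flag creates on the module and how a forward pass touches the running statistics:

```python
import torch
import torch.nn as nn

# Default configuration: learnable gamma/beta plus tracked running statistics.
bn = nn.BatchNorm2d(256, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True)
print(bn.weight.shape, bn.bias.shape)        # gamma and beta: one learnable value per channel
print(bn.running_mean.shape)                 # running statistics, initialized to zeros / ones

# affine=False: gamma and beta are simply not created (fixed to 1 and 0, not learnable).
bn_no_affine = nn.BatchNorm2d(256, affine=False)
print(bn_no_affine.weight is None, bn_no_affine.bias is None)             # True True

# track_running_stats=False: no running buffers, only the current batch statistics are used.
bn_no_stats = nn.BatchNorm2d(256, track_running_stats=False)
print(bn_no_stats.running_mean is None, bn_no_stats.running_var is None)  # True True

# In training mode a single forward pass already updates the running statistics:
# running_mean = (1 - momentum) * running_mean + momentum * batch_mean
x = torch.randn(4, 256, 8, 8)                # batch_size x num_features x height x width
bn.train()
y = bn(x)
print(bn.running_mean[:3])                   # no longer zeros after one forward pass
```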
In an ordinary training loop, the statistics are therefore updated by the forward pass alone:

```python
model.train()                      # put the model in training state
for data, label in self.dataloader:
    pred = model(data)             # this forward pass already updates the BN statistics
                                   # running_mean and running_var inside model
    loss = self.loss(pred, label)
    # even without the following three lines, the BN statistics would already have changed
    opt.zero_grad()
    loss.backward()
    opt.step()
```

When moving to the test phase, calling model.eval() fixes running_mean and running_var. Sometimes, after a pretrained model is loaded and the test data is run again, the results differ slightly from before and a little performance is lost; this is usually caused by a wrong setting of training and track_running_stats.

If two models are trained jointly, then to make convergence easier to control, model_A is pretrained first, and model_A also contains several BN layers. Later, model_A is to be used as an inference model and trained jointly with model_B. At this point we want the BN statistics running_mean and running_var of model_A not to keep changing, so model_A must be put into test mode with model_A.eval(). Otherwise, in training mode, even if the parameters of model_A are never updated, its BN statistics will still change, which leads to results different from what is expected.
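A minimal sketch of that joint-training setup, with throw-away stand-ins for model_A, model_B, the data, and the loss (in practice these are the real pretrained and trainable networks):

```python
import torch
import torch.nn as nn

# Placeholder models for illustration: model_A is the pretrained network containing BN
# layers, model_B is the network being trained jointly with it.
model_A = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
model_B = nn.Sequential(nn.Conv2d(16, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))

model_A.eval()                         # keeps model_A's running_mean / running_var fixed
model_B.train()
opt = torch.optim.SGD(model_B.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(3):                     # stand-in for iterating over a real dataloader
    data = torch.randn(4, 3, 32, 32)
    label = torch.randint(0, 10, (4,))
    with torch.no_grad():
        feat = model_A(data)           # no BN statistics update because of eval()
    pred = model_B(feat)
    loss = loss_fn(pred, label)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because model_A stays in eval() mode, its BN layers use the stored running_mean and running_var instead of the statistics of the current batch, so the features it produces stay reproducible from run to run.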
Publisher: Full Stack Programmer. Please indicate the source when reprinting: https://javaforall.cn/132951.html. Original link: https://javaforall.cn