Neural network parameter initialization
2022-07-27 07:07:00 【Mr_ health】
1. Parameter initialization of the BN layer
The parameters of a BN layer that need to be initialized are scale (called weight in PyTorch) and bias. bias is usually initialized to 0; scale can be initialized in two ways:
- Initialize to 1
- Initialize to 0: at the start of training the BN output is then all zeros, as if the branch containing the BN layer did not exist. This is generally used in structures with residual connections (resblock): because there is a shortcut, the input still passes through, so this initialization is feasible
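To make the zero-scale case concrete, here is a minimal sketch. `TinyResidualBlock` is a hypothetical block written only for this illustration (it is not the torchvision `BasicBlock` or `Bottleneck`), and the input is kept non-negative so that the final ReLU does not change it. Under those assumptions the block should return its input unchanged right after initialization.

```python
import torch
import torch.nn as nn


class TinyResidualBlock(nn.Module):
    """Hypothetical minimal residual block: relu(x + BN(conv(x)))."""

    def __init__(self, channels: int) -> None:
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()
        # Zero the BN scale (`weight`); the BN bias already defaults to 0,
        # so the residual branch outputs zeros at the start of training.
        nn.init.constant_(self.bn.weight, 0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(x + self.bn(self.conv(x)))


torch.manual_seed(0)
block = TinyResidualBlock(channels=4)
x = torch.rand(2, 4, 8, 8)  # values in [0, 1), so ReLU leaves them unchanged
out = block(x)

# The residual branch contributes nothing yet, so the block acts as an identity.
is_identity = torch.allclose(out, x)
print(is_identity)
```

With scale initialized to 1 instead, the same check would generally fail, because the residual branch would already add a non-zero signal to the shortcut.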
# Excerpt from a ResNet-style __init__; requires `import torch.nn as nn`.
# Zero-initialize the last BN in each residual branch,
# so that the residual branch starts with zeros, and each residual block behaves like an identity.
# This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
if zero_init_residual:
    for m in self.modules():
        if isinstance(m, Bottleneck):
            nn.init.constant_(m.bn3.weight, 0)
        elif isinstance(m, BasicBlock):
            nn.init.constant_(m.bn2.weight, 0)

2. Parameter initialization of the FC layer
- weight: initialized from a normal distribution with mean 0; the standard deviation can be adjusted
- bias: 0.01 or 0, usually 0
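One detail that is easy to get wrong: in PyTorch the third positional argument of `nn.init.normal_` is the standard deviation, not the variance. The sketch below checks this empirically. The layer size of 1024 x 1024 is an arbitrary choice for illustration, picked only so that the sample statistics are stable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(1024, 1024)  # sizes are arbitrary, chosen only for illustration

# Positional arguments are (tensor, mean, std): 0.01 is the standard deviation.
nn.init.normal_(fc.weight, 0, 0.01)
nn.init.zeros_(fc.bias)

empirical_std = fc.weight.std().item()
empirical_var = fc.weight.var().item()
print(f"std={empirical_std:.4f}, var={empirical_var:.6f}")
```

The measured standard deviation should come out close to 0.01, which puts the variance near 0.0001.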
if isinstance(m, nn.Linear):
    nn.init.normal_(m.weight, 0, 0.01)  # mean 0, standard deviation 0.01 (the third argument is std, not variance)
    nn.init.zeros_(m.bias)

3. Parameter initialization of the conv layer
- bias: 0.01 or 0, usually 0
- weight:
(1) Xavier initialization: see the posts "Xavier parameter initialization" and "Deep learning parameter initialization (1): Xavier initialization, with code"
(2) He initialization: see the post "Deep learning parameter initialization (2): Kaiming initialization, with code"
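For quick reference, the two schemes differ mainly in how the standard deviation is derived from the layer's fan. The sketch below uses the formulas given in the PyTorch `torch.nn.init` documentation for the normal variants. The helper names `xavier_normal_std` and `he_normal_std` and the layer shape are hypothetical, introduced only for this comparison.

```python
import math


def xavier_normal_std(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Std of Xavier (Glorot) normal init: gain * sqrt(2 / (fan_in + fan_out))."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))


def he_normal_std(fan: int, gain: float = math.sqrt(2.0)) -> float:
    """Std of He (Kaiming) normal init: gain / sqrt(fan); gain = sqrt(2) for ReLU."""
    return gain / math.sqrt(fan)


# Hypothetical layer: 3x3 convolution with 64 input and 128 output channels.
receptive_field = 3 * 3
fan_in = 64 * receptive_field    # 576
fan_out = 128 * receptive_field  # 1152

xavier_std = xavier_normal_std(fan_in, fan_out)
he_std = he_normal_std(fan_in)
print(f"Xavier std={xavier_std:.4f}, He std (fan_in)={he_std:.4f}")
```

For this shape the He value is the larger of the two, since it includes the sqrt(2) gain for ReLU and divides by a single fan rather than by the sum of both.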
With He initialization you can choose mode as fan_in or fan_out. Some bloggers have summarized the choice as follows:
- If the weights are determined implicitly through a linear layer (convolution or fully connected), set mode='fan_in';
- If the weights are created explicitly as a random matrix, set mode='fan_out'.
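The rule of thumb above is one blogger's summary. For comparison, the PyTorch documentation describes the two modes this way: fan_in preserves the magnitude of the variance in the forward pass, and fan_out preserves it in the backward pass. The sketch below shows what the switch does numerically for a convolution. The channel counts are illustrative assumptions, and the expected values assume the ReLU gain of sqrt(2).

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=64, out_channels=256, kernel_size=3)  # illustrative sizes

# conv.weight has shape (out_channels, in_channels, kH, kW) = (256, 64, 3, 3).
fan_in = 64 * 3 * 3    # number of inputs feeding one output unit
fan_out = 256 * 3 * 3  # number of outputs fed by one input unit

nn.init.kaiming_normal_(conv.weight, mode="fan_in", nonlinearity="relu")
std_fan_in = conv.weight.std().item()

nn.init.kaiming_normal_(conv.weight, mode="fan_out", nonlinearity="relu")
std_fan_out = conv.weight.std().item()

print(f"fan_in:  expected {math.sqrt(2 / fan_in):.4f}, got {std_fan_in:.4f}")
print(f"fan_out: expected {math.sqrt(2 / fan_out):.4f}, got {std_fan_out:.4f}")
```

When a layer widens the channel count, as here, fan_out is larger than fan_in, so mode='fan_out' yields smaller initial weights.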
I checked mainstream models such as mobilenet_v2 and resnet: they all use mode='fan_out'.
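Putting the three layer types together, a single pass over `model.modules()` can apply all of the rules from this post. The helper `init_weights` and the toy model below are hypothetical and written only for illustration. They follow the general pattern described above and are not copied from the mobilenet_v2 or resnet source.

```python
import torch.nn as nn


def init_weights(model: nn.Module) -> None:
    """Hypothetical helper applying the per-layer rules discussed in this post."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # He initialization with mode='fan_out', bias set to 0
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.BatchNorm2d):
            # scale = 1, bias = 0
            nn.init.ones_(m.weight)
            nn.init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            # normal distribution with mean 0 and std 0.01, bias set to 0
            nn.init.normal_(m.weight, 0, 0.01)
            if m.bias is not None:
                nn.init.zeros_(m.bias)


# Toy model used only to exercise the helper.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
init_weights(model)
```

Zero-initializing the last BN of each residual branch would be a separate, optional step after this pass, since it depends on the block structure rather than on the layer type.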