Deep learning - Residual Networks (ResNets)
2022-06-30 07:43:00 【Hair will grow again without it】
Residual network
ResNets are built out of residual blocks, so first let me explain what a residual block is.
This is a two-layer neural network: starting from the activation at layer 𝑙, one activation gives 𝑎[𝑙+1], and activating again two layers later gives 𝑎[𝑙+2]. The computation begins at 𝑎[𝑙] with a linear step, 𝑧[𝑙+1] = 𝑊[𝑙+1]𝑎[𝑙] + 𝑏[𝑙+1]: 𝑎[𝑙] is multiplied by the weight matrix and the bias is added to obtain 𝑧[𝑙+1]. Then the ReLU nonlinearity gives 𝑎[𝑙+1] = 𝑔(𝑧[𝑙+1]). Next comes another linear step, 𝑧[𝑙+2] = 𝑊[𝑙+2]𝑎[𝑙+1] + 𝑏[𝑙+2], followed by one more ReLU activation, 𝑎[𝑙+2] = 𝑔(𝑧[𝑙+2]), where 𝑔 denotes the ReLU nonlinearity. In other words, for information to flow from 𝑎[𝑙] to 𝑎[𝑙+2] it must pass through all of these steps; this is the main path of these network layers.
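To make the main path concrete, here is a minimal NumPy sketch of these two layers. The layer size, the random weights, and the variable names are made-up assumptions purely for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Plain main-path computation from a[l] to a[l+2] (hypothetical shapes/weights).
a_l = np.random.randn(4, 1)                         # a[l]
W1, b1 = np.random.randn(4, 4), np.zeros((4, 1))    # W[l+1], b[l+1]
W2, b2 = np.random.randn(4, 4), np.zeros((4, 1))    # W[l+2], b[l+2]

z1 = W1 @ a_l + b1    # z[l+1] = W[l+1] a[l] + b[l+1]
a1 = relu(z1)         # a[l+1] = g(z[l+1])
z2 = W2 @ a1 + b2     # z[l+2] = W[l+2] a[l+1] + b[l+2]
a2 = relu(z2)         # a[l+2] = g(z[l+2])  -- no shortcut yet
```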
In a residual network there is a small change: we take 𝑎[𝑙] and pass it directly ahead, copying it to a deeper point in the network and adding it just before the ReLU nonlinearity. This is a shortcut (skip connection). The information in 𝑎[𝑙] now reaches the deeper layer directly rather than traveling only along the main path. This means the last equation, 𝑎[𝑙+2] = 𝑔(𝑧[𝑙+2]), is replaced: we still apply 𝑔 to 𝑧[𝑙+2], but this time we add 𝑎[𝑙] first, i.e. 𝑎[𝑙+2] = 𝑔(𝑧[𝑙+2] + 𝑎[𝑙]).
Adding this 𝑎[𝑙] is what produces a residual block.
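With the shortcut, only the last step changes: 𝑎[𝑙] is added to 𝑧[𝑙+2] before the second ReLU. A minimal sketch, reusing the relu helper and the hypothetical weights from the snippet above, and assuming 𝑎[𝑙] and 𝑧[𝑙+2] have the same shape so they can be added:

```python
def residual_block(a_l, W1, b1, W2, b2):
    """Compute a[l+2] = g(z[l+2] + a[l]) for one residual block."""
    z1 = W1 @ a_l + b1      # z[l+1] = W[l+1] a[l] + b[l+1]
    a1 = relu(z1)           # a[l+1] = g(z[l+1])
    z2 = W2 @ a1 + b2       # z[l+2] = W[l+2] a[l+1] + b[l+2]
    return relu(z2 + a_l)   # shortcut: add a[l] before the ReLU

a2_res = residual_block(a_l, W1, b1, W2, b2)   # a[l+2] with the shortcut
```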
ResNet was invented by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. They found that using residual blocks makes it possible to train much deeper neural networks, so a ResNet is built by stacking many such residual blocks together to form a deep network.
The following network is not a residual network but an ordinary one, a plain network, a term that comes from the ResNet paper.
To turn it into a ResNet, we add all the skip connections: as shown on the previous slide, a shortcut is added every two layers to form a residual block. As shown in the figure, five residual blocks are connected in series to form a residual network.
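A residual network is then just these blocks chained one after another. A minimal sketch that stacks five of the hypothetical residual_block calls from above, assuming every layer keeps the same dimension so each shortcut can be added directly:

```python
def stacked_resnet(a0, params):
    """params is a list of (W1, b1, W2, b2) tuples, one per residual block."""
    a = a0
    for W1, b1, W2, b2 in params:
        a = residual_block(a, W1, b1, W2, b2)
    return a

# five residual blocks, matching the figure
params = [(np.random.randn(4, 4), np.zeros((4, 1)),
           np.random.randn(4, 4), np.zeros((4, 1))) for _ in range(5)]
out = stacked_resnet(a_l, params)
```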
Why use residual networks?
If we train an ordinary network with a standard optimization algorithm such as gradient descent (or another popular optimizer), with no residuals, no shortcuts or skip connections, then empirically you will find that as the network gets deeper, the training error first decreases and then increases. In theory, as depth grows the network should train better and better; that is, in theory deeper is better. In practice, however, without residual connections a plain network becomes harder and harder for the optimizer to train as depth increases, and the training error eventually grows with depth.
With ResNets this is not the case: even as the network gets much deeper, training still goes well and the training error keeps decreasing, even when training networks of 100+ layers. Because the input 𝑥, or these intermediate activations, can reach the deeper layers of the network directly, residual connections really do help with the vanishing and exploding gradient problems, letting us train much deeper networks while maintaining good performance. Put another way, as a network gets deeper it becomes more unwieldy, yet ResNet remains genuinely effective at training deep networks.