ResNet Explained: What problem is ResNet actually solving?
2022-08-04 07:18:00 【hot-blooded chef】
Original author's open-source code: https://github.com/KaimingHe/deep-residual-networks
Paper: https://arxiv.org/pdf/1512.03385.pdf
1. The network degradation problem
Before ResNet, mainstream networks such as AlexNet and VGG were simple stacks of layers, and the obvious trend was that deeper networks gave better recognition accuracy. In practice, however, once the depth reaches a certain point, accuracy saturates and then degrades rapidly.
2. Causes of network degradation
Because of the chain rule in the backpropagation algorithm, if the gradients between layers fall in (0, 1), they shrink layer by layer and the gradient vanishes; conversely, if the layer-wise gradients are greater than 1, they grow layer by layer and the gradient explodes. It is therefore tempting to conclude that simply stacking layers must degrade the network.
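As a quick numerical illustration (my own sketch, not from the paper), the backpropagated gradient is roughly a product of per-layer factors, so factors below 1 make it vanish with depth and factors above 1 make it explode:

```python
# Toy illustration of vanishing/exploding gradients as a product of
# per-layer factors (values chosen for illustration only).
depth = 50
shrink_factor, grow_factor = 0.9, 1.1
print(shrink_factor ** depth)  # ~0.005 -> vanishing gradient
print(grow_factor ** depth)    # ~117   -> exploding gradient
```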
Although vanishing/exploding gradients do arise from very deep hidden layers, the paper points out that this particular problem is largely addressed by normalized initialization and intermediate normalization layers. So network degradation is not caused by vanishing/exploding gradients. What, then, causes it? Another paper gives an answer: The Shattered Gradients Problem: If resnets are the answer, then what is the question?
The gist is that as the network gets deeper, the gradients propagated back become less and less correlated with each other, eventually approaching white noise. Since images exhibit local correlation, it is reasonable to expect gradients to exhibit similar correlation for the updates to be meaningful; if the gradients approach white noise, the updates amount to little more than random perturbations.
3. The residual network
To address the degradation problem, the authors of the paper proposed the residual block, whose mathematical model is shown in the figure below. The biggest difference from earlier networks is the extra identity shortcut branch. Because of this branch, during backpropagation the loss can pass gradients directly to earlier layers through the shortcut, which mitigates the degradation problem.
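For concreteness, here is a minimal sketch of such an identity residual block in tf.keras (my own illustration, not the author's attached code; the input and layer sizes are assumptions):

```python
# Minimal identity residual block: y = F(x) + x, with F two 3x3 convolutions.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                      # identity branch
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                   # the shortcut addition
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(56, 56, 64))
model = tf.keras.Model(inputs, residual_block(inputs))
```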
When analyzing the causes of network degradation in Section 2, we learned that the correlation between gradients matters. With a metric for gradient correlation in hand, the authors of that paper analyzed a range of structures and activation functions and found that ResNet is particularly good at preserving gradient correlation: the decay of the correlation improves from $\frac{1}{\sqrt{2^L}}$ to $\frac{1}{\sqrt{L}}$. This is quite intuitive from the perspective of gradient flow: part of the gradient is passed back untouched through the shortcut, and that part remains strongly correlated.
In addition, the residual connection introduces no new parameters, only one extra addition, and with GPU acceleration this extra computation is almost negligible.
Note, however, that because the residual block ends with the operation $F(x) + x$, the shapes of $F(x)$ and $x$ must match. In the actual network, a 1x1 convolution can also be used to change the number of channels. The structure on the left of the figure above is the one used in ResNet-34, while the bottleneck structure on the right is the one used in ResNet-50/101/152.
The design on the right also reduces the number of parameters substantially. Comparing the two:
- Parameters on the left: 3x3x256x256 + 3x3x256x256 = 1,179,648
- Parameters on the right: 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69,632
As you can see, the parameter count of a single residual block drops by roughly an order of magnitude (about 17x), and the ResNet family of networks is built by stacking such structures, as sketched below.
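A bottleneck block of this kind can be sketched in tf.keras as follows (my own illustration using the 256-channel input and 64-channel bottleneck from the comparison above, not the author's code):

```python
# Bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 restore, plus shortcut.
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, bottleneck_channels=64, out_channels=256):
    shortcut = x
    # 1x1 conv reduces channels: 1*1*256*64 weights
    y = layers.Conv2D(bottleneck_channels, 1, padding="same", activation="relu")(x)
    # 3x3 conv operates on the narrow representation: 3*3*64*64 weights
    y = layers.Conv2D(bottleneck_channels, 3, padding="same", activation="relu")(y)
    # 1x1 conv restores channels: 1*1*64*256 weights
    y = layers.Conv2D(out_channels, 1, padding="same")(y)
    y = layers.Add()([y, shortcut])
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(28, 28, 256))
model = tf.keras.Model(inputs, bottleneck_block(inputs))
model.summary()  # conv weights total 69,632 (excluding biases)
```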
4. Experimental results
The recommended ResNet configurations are shown in the figure above. The authors also replaced the fully connected layers with global average pooling. On the one hand this reduces the number of parameters; on the other hand, fully connected layers are prone to overfitting and rely heavily on dropout regularization, whereas global average pooling itself acts as a regularizer and helps prevent overfitting of the overall structure. Moreover, global average pooling aggregates spatial information, so it is more robust to spatial translations of the input.
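A minimal sketch of such a classification head in tf.keras (the feature-map size and class count here are my assumptions, not values from the post):

```python
# Global average pooling head instead of large fully connected layers.
import tensorflow as tf
from tensorflow.keras import layers

features = tf.keras.Input(shape=(7, 7, 512))           # final feature map
x = layers.GlobalAveragePooling2D()(features)           # (7, 7, 512) -> (512,)
logits = layers.Dense(1000, activation="softmax")(x)    # only 512*1000 weights
head = tf.keras.Model(features, logits)
```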
The final experimental comparison is also very clear: ResNet-34 effectively alleviates the vanishing/exploding gradient phenomenon seen in the plain network. As for exploring even deeper networks, ResNet has been stacked to over 1000 layers; although such depths are not widely used in industry, they are of great theoretical significance in academia.
5. Summary
Finally, to answer our question, "What problem is ResNet actually solving?", let us look again at the structure of the residual block.
Now suppose $x = 5$ and $H(x) = 5.1$.
- With a plain (non-residual) structure, the network learns the mapping $F'(5) = 5.1$.
- With a residual structure, the network learns $H(5) = F(5) + 5 = 5.1$, i.e. $F(5) = 0.1$.
Here $F'$ and $F$ both denote mappings realized by the network parameters, and the residual mapping is far more sensitive to changes in the output. For example, when the target output moves from 5.1 to 5.2, the output of the plain mapping $F'$ increases by $0.1/5.1 \approx 2\%$, whereas the residual mapping $F$ goes from 0.1 to 0.2, an increase of 100%. The latter clearly exerts a much stronger pull on the weight updates, so it trains better. (Adapted from: resnet(残差网络)的F(x)究竟长什么样子?) Subsequent experiments also confirmed this hypothesis: residual networks are easier to train than plain networks. So the problem ResNet solves is making networks easier to train well.
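In symbols, restating the numbers above:

$$
H(x) = F(x) + x, \qquad
\frac{\Delta F'}{F'} = \frac{5.2 - 5.1}{5.1} \approx 2\%, \qquad
\frac{\Delta F}{F} = \frac{0.2 - 0.1}{0.1} = 100\%.
$$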
Finally, the original post shares the author's own ResNet implementation written with Keras and TF2.
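Purely as a rough illustration of how the blocks above could be assembled, here is a minimal self-contained sketch of a small ResNet-style classifier in Keras/TF2 (my own sketch, not the author's code; the stage widths and depths are assumptions and do not correspond to any official configuration):

```python
# Minimal ResNet-style classifier assembled from simple residual blocks.
import tensorflow as tf
from tensorflow.keras import layers

def block(x, filters, downsample=False):
    stride = 2 if downsample else 1
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if downsample or x.shape[-1] != filters:
        # 1x1 conv so the shortcut matches F(x) in shape
        shortcut = layers.Conv2D(filters, 1, strides=stride)(x)
    y = layers.Add()([y, shortcut])
    return layers.ReLU()(y)

def build_resnet(input_shape=(224, 224, 3), num_classes=1000):
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(64, 7, strides=2, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
    for filters, downsample in [(64, False), (64, False),
                                (128, True), (128, False),
                                (256, True), (256, False),
                                (512, True), (512, False)]:
        x = block(x, filters, downsample)
    x = layers.GlobalAveragePooling2D()(x)            # GAP head from Section 4
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_resnet()
```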