Interpretation of EfficientNet: the compound scaling method for neural networks (based on tf-Keras reproduction code)
2022-08-04 07:01:00 【hot-blooded chef】
Paper: https://arxiv.org/pdf/1905.11946.pdf
Code: https://github.com/qubvel/efficientnet
1. Introduction
The EfficientNet paper attracted a lot of attention when it was published, because its results dramatically outperformed existing networks: it reached higher accuracy with far fewer parameters, dominating the ImageNet leaderboard.
Seeing such a striking result, many readers might assume the paper proposes an entirely new architecture in order to be both fast and accurate. In fact it does not. The authors take a novel angle that previous work had overlooked: quantifying the relationship among three dimensions of a network, its width, depth, and input resolution.
In previous work, networks were usually scaled along only one of the three dimensions: depth, width, or image resolution. Although two or three dimensions can be scaled at once, arbitrary scaling requires tedious manual tuning and often yields sub-optimal accuracy and efficiency.
In this paper, the authors rethink the process of scaling up CNNs and study one core question: is there a principled way to scale up ConvNets that achieves better accuracy and efficiency?
As shown in the figure above, given a baseline model, if we want to use $2^N$ times more computational resources, we can increase the network depth by $\alpha^N$, the network width by $\beta^N$, and the image resolution by $\gamma^N$. Panels (b)-(d) each scale a single coefficient arbitrarily, while (e) is the compound scaling method proposed by the authors.
2. Problem Formulation
Definition:
Layer $i$ of a ConvNet is defined as $Y_i = F_i(X_i)$, where $Y_i$ is the output tensor, $X_i$ is the input tensor with shape $\langle H_i, W_i, C_i \rangle$, and $F_i$ is the layer operator (e.g., a convolution). A complete ConvNet is then the composition $N = F_k \odot \dots \odot F_2 \odot F_1(X_1) = \bigodot_{j=1 \dots k} F_j(X_1)$. In practice, a convolutional network is usually divided into multiple stages; ResNet, for example, is divided into 5 stages.
A ConvNet can therefore be defined uniformly as:

$$N = \bigodot_{i=1 \dots s} F_i^{L_i}\big(X_{\langle H_i, W_i, C_i \rangle}\big)$$
where the subscript $i$ (from 1 to $s$) is the stage index, and $F_i^{L_i}$ denotes the $i$-th stage, which consists of the layer $F_i$ repeated $L_i$ times.
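As a purely illustrative tf.keras sketch of this formulation: a stage is just the same layer repeated $L_i$ times, and the network is a composition of stages. The `stage` helper and the layer choices below are arbitrary placeholders of my own, not EfficientNet's architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers

def stage(x, block_fn, num_repeats):
    # F_i^{L_i}: the same layer F_i applied L_i times
    for _ in range(num_repeats):
        x = block_fn(x)
    return x

# N = F_2^{L_2} ⊙ F_1^{L_1}(X), with X of shape <H, W, C> = <224, 224, 3>
inputs = tf.keras.Input(shape=(224, 224, 3))
x = stage(inputs, lambda t: layers.Conv2D(32, 3, padding="same", activation="relu")(t), num_repeats=2)
x = stage(x, lambda t: layers.Conv2D(64, 3, padding="same", activation="relu")(t), num_repeats=3)
model = tf.keras.Model(inputs, x)
model.summary()
```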
Unlike conventional ConvNet design, which mostly focuses on finding the best layer architecture $F_i$, the authors do the opposite: they fix $F_i$ and scale the network width, depth, and resolution together by constant ratios, rather than scaling a single coefficient in isolation, so that memory usage and FLOPS stay below the target thresholds while accuracy is maximized.
Expressed as an optimization problem:

$$\max_{d,w,r} \ \text{Accuracy}\big(N(d, w, r)\big)$$

$$\text{s.t. } N(d,w,r) = \bigodot_{i=1 \dots s} \hat{F}_i^{\,d \cdot \hat{L}_i}\big(X_{\langle r \cdot \hat{H}_i,\ r \cdot \hat{W}_i,\ w \cdot \hat{C}_i \rangle}\big), \quad \text{Memory}(N) \le \text{target memory}, \quad \text{FLOPS}(N) \le \text{target FLOPS}$$

where $w, d, r$ are the coefficients for scaling the network width, depth, and resolution, respectively, and the hatted parameters $\hat{F}_i, \hat{L}_i, \hat{H}_i, \hat{W}_i, \hat{C}_i$ are the predefined values of the baseline network, as above.
3. EfficientNet
The authors demonstrate on MobileNets and ResNets that the compound scaling method works well, and that the accuracy of the scaled model depends heavily on the baseline network. They therefore use NAS (neural architecture search) to find a better baseline network, namely EfficientNet-B0.
Efficient block
The overall design is straightforward: first a 1x1 convolution expands the channel dimension, then a depthwise convolution is applied, followed by an attention mechanism (squeeze-and-excitation, from SENet); finally another 1x1 convolution projects the channels back down, and the result is merged with the block input through a residual connection. It is essentially a combination of the strengths of earlier networks.
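Below is a minimal tf.keras sketch of such a block, written from the description above. The function name and default hyperparameters are my own, and details like drop-connect are omitted, so this is a simplification rather than the official implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

def mbconv_block(x, filters_in, filters_out, kernel_size=3, strides=1,
                 expand_ratio=6, se_ratio=0.25):
    """Simplified MBConv: 1x1 expand -> depthwise conv -> SE -> 1x1 project -> residual."""
    inputs = x
    expanded = filters_in * expand_ratio

    # 1x1 convolution to expand the channel dimension
    x = layers.Conv2D(expanded, 1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("swish")(x)

    # depthwise convolution
    x = layers.DepthwiseConv2D(kernel_size, strides=strides, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("swish")(x)

    # squeeze-and-excitation attention (SENet-style channel reweighting)
    se_filters = max(1, int(filters_in * se_ratio))
    se = layers.GlobalAveragePooling2D()(x)
    se = layers.Reshape((1, 1, expanded))(se)
    se = layers.Conv2D(se_filters, 1, activation="swish")(se)
    se = layers.Conv2D(expanded, 1, activation="sigmoid")(se)
    x = layers.Multiply()([x, se])

    # 1x1 convolution to project the channels back down
    x = layers.Conv2D(filters_out, 1, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)

    # residual connection with the block input when shapes match
    if strides == 1 and filters_in == filters_out:
        x = layers.Add()([x, inputs])
    return x
```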
EfficientNet
Finally, these blocks are repeatedly stacked to form the complete EfficientNet.
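To illustrate how the stacking is scaled across the B0-B7 family, here is a small sketch of the rounding helpers used when applying the width and depth coefficients to the B0 stage configuration. The helper names follow the qubvel/efficientnet reference code, and the stage table is from Table 1 of the paper; the multiplier values chosen below are just for demonstration:

```python
import math

def round_filters(filters, width_coefficient, divisor=8):
    """Scale channel count by the width coefficient, rounded to a multiple of `divisor`."""
    filters *= width_coefficient
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * filters:  # don't round down by more than 10%
        new_filters += divisor
    return int(new_filters)

def round_repeats(repeats, depth_coefficient):
    """Scale the number of block repeats by the depth coefficient, rounding up."""
    return int(math.ceil(depth_coefficient * repeats))

# B0 baseline: (repeats, output filters) per stage, per Table 1 of the paper
b0_stages = [(1, 16), (2, 24), (2, 40), (3, 80), (3, 112), (4, 192), (1, 320)]

# e.g. width=1.1, depth=1.2 (the coefficients reported for EfficientNet-B2)
width, depth = 1.1, 1.2
scaled = [(round_repeats(r, depth), round_filters(f, width)) for r, f in b0_stages]
print(scaled)
```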
4. Experimental Results
The authors first scale $w$, $d$, and $r$ separately. Scaling up any single dimension does improve accuracy, but the gain saturates quickly once accuracy reaches about 80%, which shows the limitation of scaling only one dimension.
The authors then try compound scaling. Intuitively, for higher-resolution images we should increase the network depth, so that the larger receptive field can capture similar features that span more pixels. Correspondingly, at higher resolution we should also increase the network width, in order to capture more fine-grained patterns from the additional pixels. This suggests that we need to coordinate and balance the scaling of the different dimensions, rather than scale a single dimension as in conventional practice.
To verify this idea, the authors compare width scaling under different network depths and resolutions, as shown in the figure below. If we only scale the network width $w$ while keeping depth ($d=1.0$) and resolution ($r=1.0$) unchanged, accuracy saturates quickly. With greater depth ($d=2.0$) and higher resolution ($r=2.0$), width scaling achieves much better accuracy at the same computational cost.
5. Conclusion
Based on the experimental results above, the paper proposes a new compound scaling method that uses a compound coefficient $\phi$ to uniformly scale the network width, depth, and resolution in a principled way.
Definition:
$$\text{depth: } d = \alpha^\phi \qquad \text{width: } w = \beta^\phi \qquad \text{resolution: } r = \gamma^\phi$$
subject to the constraint on the scaling factors:
$$\text{s.t. } \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, \qquad \alpha \geq 1,\ \beta \geq 1,\ \gamma \geq 1$$
Thus, for any new $\phi$, the total FLOPS increases by approximately $2^\phi$, since the FLOPS of a regular convolution scale proportionally to $d$, $w^2$, and $r^2$, and $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$.
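As a quick check of this arithmetic, here is a small sketch using $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$, the values the paper reports finding for EfficientNet-B0 (note $1.2 \cdot 1.1^2 \cdot 1.15^2 \approx 1.92 \approx 2$):

```python
# Compound scaling constants reported in the paper for EfficientNet-B0
alpha, beta, gamma = 1.2, 1.1, 1.15

for phi in range(1, 4):
    d = alpha ** phi   # depth multiplier
    w = beta ** phi    # width multiplier
    r = gamma ** phi   # resolution multiplier
    # FLOPS scale roughly with d * w^2 * r^2, i.e. about 2^phi
    flops_ratio = d * (w ** 2) * (r ** 2)
    print(f"phi={phi}: d={d:.3f}, w={w:.3f}, r={r:.3f}, "
          f"FLOPS x{flops_ratio:.2f} (~2^{phi}={2 ** phi})")
```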