EfficientNet_V1
2022-07-28 06:22:00 【A tavern on the mountain】
EfficientNet_v1: efficiently scaling up convolutional networks for better accuracy and efficiency.
1. Introduction
Most earlier papers improved network performance by changing one of three factors: the number of channels, the number of layers, or the input image resolution. This paper explores scaling all three factors at the same time. The paper reports that the proposed EfficientNet-B7 reached 84.3% top-1 accuracy on ImageNet, the best result at the time; compared with GPipe, the previous best model, its parameter count (Params) is only 1/8.4, and its inference speed is 6.1 times faster. The figure below compares EfficientNet with other networks (note that a small parameter count does not necessarily mean fast inference).

2. Scaling up the model

Figure (b): increasing the number of convolution kernels, i.e. the number of channels of the output feature maps, increases the width. Wider networks can capture finer-grained features and are easier to train, but networks that are very wide yet shallow have difficulty learning higher-level features.
Figure (c): increasing the number of layers (the number of stages or blocks) increases the depth. Deeper networks can capture richer and more complex features and transfer well to other tasks, but if the network is too deep it suffers from vanishing gradients and becomes hard to train.
Figure (d): increasing the input image resolution (e.g. from 224x224 to 600x600) can potentially yield finer-grained feature maps, but the accuracy gain diminishes at very high resolutions, and large images increase the computational cost.

The figure above shows the results obtained by separately scaling the width, depth, and resolution of the baseline EfficientNet-B0. In each case the accuracy saturates at around 80%.
The authors then ran another experiment: using different combinations of d and r and continuously varying the network width, which yields the four curves shown in the figure below. Analysis shows that, under the same FLOPs, increasing d and r together works best.
3. Compound scaling and NAS search

The authors then propose a compound scaling method, which uses a single compound coefficient ϕ to scale the width, depth, and resolution uniformly. The formula is given below, where s.t. denotes the constraints:
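Reproduced here from equation (3) of the original paper:

$$
\begin{aligned}
\text{depth: } & d = \alpha^{\phi} \\
\text{width: } & w = \beta^{\phi} \\
\text{resolution: } & r = \gamma^{\phi} \\
\text{s.t. } & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}
$$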

The relationship between FLOPs (theoretical computation) and depth: when the depth doubles, the FLOPs double.
The relationship between FLOPs and width: when the width doubles (i.e. the channels double), the FLOPs increase 4x, because for a convolution layer FLOPs = feature_w × feature_h × feature_c × kernel_w × kernel_h × kernel_number (assuming the height and width of the input and output feature maps stay the same); when the width doubles, both the input channels feature_c and the output channels (the number of kernels) double.
The relationship between FLOPs and resolution: when the resolution doubles, the FLOPs also increase 4x, because, similarly to the above, both the feature-map width feature_w and height feature_h double.
So overall, the FLOPs multiplier can be approximated by (α·β²·γ²)^ϕ; when α·β²·γ² ≈ 2, the FLOPs grow by roughly 2^ϕ for any ϕ.
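Putting the three relations together in one line:

$$
\text{FLOPs} \propto d \cdot w^{2} \cdot r^{2}
\quad\Longrightarrow\quad
\frac{\text{FLOPs}(d{=}\alpha^{\phi},\, w{=}\beta^{\phi},\, r{=}\gamma^{\phi})}{\text{FLOPs}(1,1,1)}
= \left(\alpha \cdot \beta^{2} \cdot \gamma^{2}\right)^{\phi} \approx 2^{\phi}
$$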
Next, the authors use NAS on the baseline network EfficientNet-B0 (its detailed structure is described later) to search for the three parameters α, β, γ.
(Step 1) First fix ϕ = 1 and search based on equations (2) and (3) above; for EfficientNet-B0 the best values found are α = 1.2, β = 1.1, γ = 1.15.
(Step 2) Then fix α = 1.2, β = 1.1, γ = 1.15 and, starting from EfficientNet-B0, use different values of ϕ to obtain EfficientNet-B1 through EfficientNet-B7.
Note that the α, β, γ searched from different baseline networks are not necessarily the same. The paper also mentions that searching α, β, γ directly on a larger model might give better results, but the search cost on a larger model would be too high, which is why the search is done on the smaller EfficientNet-B0.
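As a rough illustration of Step 2 (a sketch only, not the official implementation; the officially released B1-B7 use hand-adjusted width/depth coefficients rather than exact powers of the searched α, β, γ):

```python
# Sketch of compound scaling (Step 2): derive depth/width/resolution multipliers
# from a single compound coefficient phi, using the values searched on B0.
# Note: the released B1-B7 coefficients are hand-adjusted, not exact powers.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # searched with phi = 1 on EfficientNet-B0


def compound_scale(phi: float, base_resolution: int = 224):
    depth_coefficient = ALPHA ** phi                      # multiplies layers per stage
    width_coefficient = BETA ** phi                       # multiplies channels per stage
    resolution = round(base_resolution * GAMMA ** phi)    # new input image size
    return depth_coefficient, width_coefficient, resolution


if __name__ == "__main__":
    for phi in range(1, 8):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, input {r}x{r}")
```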
4. Detailed network structure
The table below shows the EfficientNet-B0 architecture (B1-B7 are obtained from B0 by modifying the Resolution, Channels, and Layers). The network is divided into 9 stages.

Stage 1 is an ordinary convolution layer with a 3x3 kernel and stride 2 (with BN and the Swish activation).
Stages 2-8 repeatedly stack the MBConv block (the Layers column gives how many times the MBConv block is repeated in that stage).
Stage 9 consists of an ordinary 1x1 convolution layer (with BN and Swish), an average pooling layer, and a fully connected layer.
Each MBConv is followed by the number 1 or 6, which is the expansion factor n (the channel-expansion ratio of the first 1x1 convolution): the first 1x1 convolution inside MBConv expands the channels of the input feature map to n times. k3x3 or k5x5 denotes the kernel size of the depthwise conv used inside MBConv. Channels is the number of output channels of the feature map produced by that stage.
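To make this description concrete, here is a plain-Python summary of Stages 2-8 of EfficientNet-B0 (a sketch transcribed from Table 1 of the paper; verify the exact numbers against the paper or the official code):

```python
# EfficientNet-B0 backbone, Stages 2-8 (sketch transcribed from Table 1 of the paper).
# expand  : MBConv expansion factor n (1 or 6)
# kernel  : depthwise convolution kernel size (3 or 5)
# stride  : stride of the first MBConv in the stage (the remaining blocks use stride 1)
# channels: output channels of the stage
# layers  : how many times the MBConv block is repeated in the stage
EFFICIENTNET_B0_STAGES = [
    dict(expand=1, kernel=3, stride=1, channels=16,  layers=1),  # Stage 2
    dict(expand=6, kernel=3, stride=2, channels=24,  layers=2),  # Stage 3
    dict(expand=6, kernel=5, stride=2, channels=40,  layers=2),  # Stage 4
    dict(expand=6, kernel=3, stride=2, channels=80,  layers=3),  # Stage 5
    dict(expand=6, kernel=5, stride=1, channels=112, layers=3),  # Stage 6
    dict(expand=6, kernel=5, stride=2, channels=192, layers=4),  # Stage 7
    dict(expand=6, kernel=3, stride=1, channels=320, layers=1),  # Stage 8
]
# Stage 1 is a 3x3 stride-2 convolution with 32 channels; Stage 9 is a 1x1
# convolution with 1280 channels, global average pooling, and a fully connected layer.
```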
5. MBConv structure
MBConv is essentially the inverted residual block (InvertedResidualBlock) of the MobileNetV3 network, with some differences: the activation function is different (EfficientNet's MBConv uses the Swish activation throughout), and every MBConv includes an SE (Squeeze-and-Excitation) module.

An MBConv block consists mainly of: a 1x1 ordinary convolution (for channel expansion, with BN and Swish); a k×k depthwise convolution (with BN and Swish, where k is 3x3 or 5x5 in EfficientNet-B0, see the table above); an SE module; a 1x1 ordinary convolution (for dimensionality reduction, with BN only); and a Dropout layer. A few points to note when building the block (the pieces are assembled in the sketch after these notes):
The number of kernels in the first (expansion) 1x1 convolution layer is n times the input channels, with n = 1 or n = 6. When n = 1, the first 1x1 expansion layer is omitted, i.e. the MBConv blocks in Stage 2 have no expansion layer (similar to MobileNetV3).
The shortcut connection is used only when the feature map entering the MBConv block and the output feature map have the same shape.
The SE module, shown below, consists of a global average pooling layer followed by two fully connected layers. The first fully connected layer has 1/4 as many nodes as the channels of the feature map entering the MBConv block and uses the Swish activation. The second fully connected layer has as many nodes as the output channels of the depthwise conv layer and uses the Sigmoid activation.
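A minimal PyTorch-style sketch of this SE module (names and the 1/4 squeeze ratio follow the description above; the official code is in TensorFlow, so treat this as illustrative):

```python
import torch
import torch.nn as nn


class SqueezeExcite(nn.Module):
    """SE module as described above: global average pooling + two FC layers.
    in_channels     : channels entering the MBConv block (sets the squeeze width)
    expand_channels : channels of the depthwise conv output being re-weighted
    """

    def __init__(self, in_channels: int, expand_channels: int, se_ratio: float = 0.25):
        super().__init__()
        squeeze_channels = max(1, int(in_channels * se_ratio))
        # 1x1 convolutions on a 1x1 feature map behave exactly like fully connected layers
        self.reduce = nn.Conv2d(expand_channels, squeeze_channels, kernel_size=1)
        self.act = nn.SiLU()      # Swish
        self.expand = nn.Conv2d(squeeze_channels, expand_channels, kernel_size=1)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = x.mean(dim=(2, 3), keepdim=True)   # global average pooling
        scale = self.act(self.reduce(scale))
        scale = self.gate(self.expand(scale))
        return x * scale                           # channel-wise re-weighting
```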

The dropout_rate of this Dropout layer corresponds to drop_connect_rate in the TensorFlow Keras source code (discussed further below). Note that in the source implementation, the Dropout layer is only present when the shortcut is used.
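Putting the pieces together, a PyTorch-style sketch of one MBConv block (illustrative only, not the official implementation; it reuses the SqueezeExcite sketch above, and the stochastic-depth dropout is shown separately further below):

```python
import torch.nn as nn


class MBConv(nn.Module):
    """Sketch of one MBConv block as described above (not the official code)."""

    def __init__(self, in_ch: int, out_ch: int, kernel: int, stride: int, expand: int):
        super().__init__()
        # shortcut only when the input and output feature maps have the same shape
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        mid_ch = in_ch * expand

        layers = []
        if expand != 1:  # no 1x1 expansion conv when n = 1 (the Stage 2 blocks)
            layers += [nn.Conv2d(in_ch, mid_ch, 1, bias=False),
                       nn.BatchNorm2d(mid_ch), nn.SiLU()]
        layers += [
            # k x k depthwise convolution (groups = channels)
            nn.Conv2d(mid_ch, mid_ch, kernel, stride, padding=kernel // 2,
                      groups=mid_ch, bias=False),
            nn.BatchNorm2d(mid_ch), nn.SiLU(),
            SqueezeExcite(in_ch, mid_ch),              # from the SE sketch above
            # 1x1 projection convolution: BN only, no activation
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        out = self.block(x)
        if self.use_shortcut:
            # in the source code, the stochastic-depth "dropout" would be applied to
            # `out` here during training -- see the drop_path sketch below
            out = out + x
        return out
```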

The EfficientNet-B0 to B7 variants differ in the following hyper-parameters:
input_size is the input image size used when training the network.
width_coefficient is the scaling factor on the channel dimension. For example, the 3x3 convolution layer of Stage 1 in EfficientNet-B0 uses 32 kernels, so in B6 this becomes 32 × 1.8 = 57.6, which is then rounded to the nearest multiple of 8, i.e. 56; the other stages are handled the same way.
depth_coefficient is the scaling factor on the depth dimension (applied only to Stage 2 through Stage 8). For example, Stage 7 of EfficientNet-B0 has L = 4 layers, so in B6 this becomes 4 × 2.6 = 10.4, which is rounded up to 11.
drop_connect_rate is the drop rate used by the dropout layer inside the MBConv blocks (again, in the source implementation this layer only exists when the shortcut is used). Also note that this Dropout layer is actually Stochastic Depth: it randomly drops the entire main branch of a block, leaving only the shortcut branch (which is equivalent to skipping that block), so it can also be understood as randomly reducing the network depth.
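A minimal sketch of this stochastic-depth "dropout" (assuming it is applied to the main-branch output just before the shortcut addition, as in common implementations):

```python
import torch


def drop_path(x: torch.Tensor, drop_rate: float, training: bool) -> torch.Tensor:
    """Stochastic depth: for a fraction of the samples in the batch, zero the whole
    main-branch output so only the shortcut remains (i.e. the block is skipped)."""
    if drop_rate == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_rate
    # one Bernoulli draw per sample; broadcast over channels and spatial dimensions
    mask = torch.empty(x.shape[0], 1, 1, 1, device=x.device, dtype=x.dtype)
    mask.bernoulli_(keep_prob)
    return x / keep_prob * mask  # rescale so the expected activation is unchanged
```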
dropout_rate is the drop rate of the dropout layer before the final fully connected layer (between the pooling and FC layers of Stage 9).
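The rounding described above and the per-variant settings can be sketched as follows (the rounding logic follows the commonly used round_filters / round_repeats helpers, as in the official TensorFlow implementation; the B0-B7 values are the commonly cited ones and should be checked against the official repository):

```python
import math

# (width_coefficient, depth_coefficient, input_size, dropout_rate) per variant
EFFICIENTNET_PARAMS = {
    "b0": (1.0, 1.0, 224, 0.2),
    "b1": (1.0, 1.1, 240, 0.2),
    "b2": (1.1, 1.2, 260, 0.3),
    "b3": (1.2, 1.4, 300, 0.3),
    "b4": (1.4, 1.8, 380, 0.4),
    "b5": (1.6, 2.2, 456, 0.4),
    "b6": (1.8, 2.6, 528, 0.5),
    "b7": (2.0, 3.1, 600, 0.5),
}


def round_filters(filters: int, width_coefficient: float, divisor: int = 8) -> int:
    """Scale a channel count and round to the nearest multiple of `divisor`
    (e.g. 32 * 1.8 = 57.6 -> 56 for B6), never rounding down by more than 10%."""
    filters *= width_coefficient
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * filters:
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats: int, depth_coefficient: float) -> int:
    """Scale the number of block repeats per stage and round up
    (e.g. 4 * 2.6 = 10.4 -> 11 for B6)."""
    return int(math.ceil(depth_coefficient * repeats))


print(round_filters(32, 1.8))  # 56
print(round_repeats(4, 2.6))   # 11
```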
Paper address:
https://arxiv.org/abs/1905.11946
Recommended references:
9.1 EfficientNet network details (bilibili video): https://www.bilibili.com/video/BV1XK4y1U7PX?spm_id_from=333.999.0.0
EfficientNet network details (Sunflower's little mung bean, CSDN blog): https://blog.csdn.net/qq_37541097/article/details/114434046
Official code from the paper: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet