EfficientNet_V1
2022-07-28 06:22:00 【A tavern on the mountain】
EfficientNet_v1: efficiently scaling up convolutional networks for better accuracy and efficiency.
1. Introduction
Most earlier papers improved network performance by changing one of three factors: the number of channels, the number of layers, or the input image resolution. This paper explores scaling all three factors at the same time. The paper reports that the proposed EfficientNet-B7 reached 84.3% top-1 accuracy on ImageNet, the best result at the time; compared with GPipe, the previous best model, its parameter count (Params) is only 1/8.4, and its inference speed is 6.1 times faster. The figure below compares EfficientNet with other networks (note that a small parameter count does not necessarily mean fast inference).

2. Scaling up the model

Figure (b): increasing the number of convolution kernels, i.e. the number of channels of the output feature maps, increases the width. Wider networks can capture finer-grained features and are easier to train, but networks that are very wide yet shallow have difficulty learning higher-level features.
Figure (c): increasing the number of layers (the number of stages or blocks) increases the depth. Deeper networks can capture richer and more complex features and transfer well to other tasks, but if the network is too deep it suffers from vanishing gradients and becomes hard to train.
Figure (d): increasing the input image resolution (e.g. from 224x224 to 600x600) can potentially yield finer-grained feature maps, but the accuracy gain diminishes at very high resolutions, and large images increase the computational cost.

The figure above shows the results obtained by separately scaling the width, depth, and resolution of the baseline EfficientNet-B0. In each case the accuracy saturates at around 80%.
The authors then ran another experiment: using different combinations of d and r and continuously varying the network width, which yields the four curves shown in the figure below. Analysis shows that, under the same FLOPs, increasing d and r together works best.
3. Compound scaling and NAS search

The authors then propose a compound scaling method, which uses a single compound coefficient ϕ to scale the width, depth, and resolution uniformly. The formula is given below, where s.t. denotes the constraints:
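Reproduced here from equation (3) of the original paper:

$$
\begin{aligned}
\text{depth: } & d = \alpha^{\phi} \\
\text{width: } & w = \beta^{\phi} \\
\text{resolution: } & r = \gamma^{\phi} \\
\text{s.t. } & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}
$$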

The relationship between FLOPs (theoretical computation) and depth: when the depth doubles, the FLOPs double.
The relationship between FLOPs and width: when the width doubles (i.e. the channels double), the FLOPs increase 4x, because for a convolution layer FLOPs = feature_w × feature_h × feature_c × kernel_w × kernel_h × kernel_number (assuming the height and width of the input and output feature maps stay the same); when the width doubles, both the input channels feature_c and the output channels (the number of kernels) double.
The relationship between FLOPs and resolution: when the resolution doubles, the FLOPs also increase 4x, because, similarly to the above, both the feature-map width feature_w and height feature_h double.
So overall, the FLOPs multiplier can be approximated by (α·β²·γ²)^ϕ; when α·β²·γ² ≈ 2, the FLOPs grow by roughly 2^ϕ for any ϕ.
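Putting the three relations together in one line:

$$
\text{FLOPs} \propto d \cdot w^{2} \cdot r^{2}
\quad\Longrightarrow\quad
\frac{\text{FLOPs}(d{=}\alpha^{\phi},\, w{=}\beta^{\phi},\, r{=}\gamma^{\phi})}{\text{FLOPs}(1,1,1)}
= \left(\alpha \cdot \beta^{2} \cdot \gamma^{2}\right)^{\phi} \approx 2^{\phi}
$$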
Next, the authors use NAS on the baseline network EfficientNet-B0 (its detailed structure is described later) to search for the three parameters α, β, γ.
(Step 1) First fix ϕ = 1 and search based on equations (2) and (3) above; for EfficientNet-B0 the best values found are α = 1.2, β = 1.1, γ = 1.15.
(Step 2) Then fix α = 1.2, β = 1.1, γ = 1.15 and, starting from EfficientNet-B0, use different values of ϕ to obtain EfficientNet-B1 through EfficientNet-B7.
Note that the α, β, γ searched from different baseline networks are not necessarily the same. The paper also mentions that searching α, β, γ directly on a larger model might give better results, but the search cost on a larger model would be too high, which is why the search is done on the smaller EfficientNet-B0.
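As a rough illustration of Step 2 (a sketch only, not the official implementation; the officially released B1-B7 use hand-adjusted width/depth coefficients rather than exact powers of the searched α, β, γ):

```python
# Sketch of compound scaling (Step 2): derive depth/width/resolution multipliers
# from a single compound coefficient phi, using the values searched on B0.
# Note: the released B1-B7 coefficients are hand-adjusted, not exact powers.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # searched with phi = 1 on EfficientNet-B0


def compound_scale(phi: float, base_resolution: int = 224):
    depth_coefficient = ALPHA ** phi                      # multiplies layers per stage
    width_coefficient = BETA ** phi                       # multiplies channels per stage
    resolution = round(base_resolution * GAMMA ** phi)    # new input image size
    return depth_coefficient, width_coefficient, resolution


if __name__ == "__main__":
    for phi in range(1, 8):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, input {r}x{r}")
```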
4. Detailed network structure
The table below shows the EfficientNet-B0 architecture (B1-B7 are obtained from B0 by modifying the Resolution, Channels, and Layers). The network is divided into 9 stages.

Stage 1 is an ordinary convolution layer with a 3x3 kernel and stride 2 (with BN and the Swish activation).
Stages 2-8 repeatedly stack the MBConv block (the Layers column gives how many times the MBConv block is repeated in that stage).
Stage 9 consists of an ordinary 1x1 convolution layer (with BN and Swish), an average pooling layer, and a fully connected layer.
Each MBConv is followed by the number 1 or 6, which is the expansion factor n (the channel-expansion ratio of the first 1x1 convolution): the first 1x1 convolution inside MBConv expands the channels of the input feature map to n times. k3x3 or k5x5 denotes the kernel size of the depthwise conv used inside MBConv. Channels is the number of output channels of the feature map produced by that stage.
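To make this description concrete, here is a plain-Python summary of Stages 2-8 of EfficientNet-B0 (a sketch transcribed from Table 1 of the paper; verify the exact numbers against the paper or the official code):

```python
# EfficientNet-B0 backbone, Stages 2-8 (sketch transcribed from Table 1 of the paper).
# expand  : MBConv expansion factor n (1 or 6)
# kernel  : depthwise convolution kernel size (3 or 5)
# stride  : stride of the first MBConv in the stage (the remaining blocks use stride 1)
# channels: output channels of the stage
# layers  : how many times the MBConv block is repeated in the stage
EFFICIENTNET_B0_STAGES = [
    dict(expand=1, kernel=3, stride=1, channels=16,  layers=1),  # Stage 2
    dict(expand=6, kernel=3, stride=2, channels=24,  layers=2),  # Stage 3
    dict(expand=6, kernel=5, stride=2, channels=40,  layers=2),  # Stage 4
    dict(expand=6, kernel=3, stride=2, channels=80,  layers=3),  # Stage 5
    dict(expand=6, kernel=5, stride=1, channels=112, layers=3),  # Stage 6
    dict(expand=6, kernel=5, stride=2, channels=192, layers=4),  # Stage 7
    dict(expand=6, kernel=3, stride=1, channels=320, layers=1),  # Stage 8
]
# Stage 1 is a 3x3 stride-2 convolution with 32 channels; Stage 9 is a 1x1
# convolution with 1280 channels, global average pooling, and a fully connected layer.
```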
5. MBConv structure
MBConv is essentially the inverted residual block (InvertedResidualBlock) of the MobileNetV3 network, with some differences: the activation function is different (EfficientNet's MBConv uses the Swish activation throughout), and every MBConv includes an SE (Squeeze-and-Excitation) module.

An MBConv block consists mainly of: a 1x1 ordinary convolution (for channel expansion, with BN and Swish); a k×k depthwise convolution (with BN and Swish, where k is 3x3 or 5x5 in EfficientNet-B0, see the table above); an SE module; a 1x1 ordinary convolution (for dimensionality reduction, with BN only); and a Dropout layer. A few points to note when building the block (the pieces are assembled in the sketch after these notes):
The number of kernels in the first (expansion) 1x1 convolution layer is n times the input channels, with n = 1 or n = 6. When n = 1, the first 1x1 expansion layer is omitted, i.e. the MBConv blocks in Stage 2 have no expansion layer (similar to MobileNetV3).
The shortcut connection is used only when the feature map entering the MBConv block and the output feature map have the same shape.
The SE module, shown below, consists of a global average pooling layer followed by two fully connected layers. The first fully connected layer has 1/4 as many nodes as the channels of the feature map entering the MBConv block and uses the Swish activation. The second fully connected layer has as many nodes as the output channels of the depthwise conv layer and uses the Sigmoid activation.
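A minimal PyTorch-style sketch of this SE module (names and the 1/4 squeeze ratio follow the description above; the official code is in TensorFlow, so treat this as illustrative):

```python
import torch
import torch.nn as nn


class SqueezeExcite(nn.Module):
    """SE module as described above: global average pooling + two FC layers.
    in_channels     : channels entering the MBConv block (sets the squeeze width)
    expand_channels : channels of the depthwise conv output being re-weighted
    """

    def __init__(self, in_channels: int, expand_channels: int, se_ratio: float = 0.25):
        super().__init__()
        squeeze_channels = max(1, int(in_channels * se_ratio))
        # 1x1 convolutions on a 1x1 feature map behave exactly like fully connected layers
        self.reduce = nn.Conv2d(expand_channels, squeeze_channels, kernel_size=1)
        self.act = nn.SiLU()      # Swish
        self.expand = nn.Conv2d(squeeze_channels, expand_channels, kernel_size=1)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = x.mean(dim=(2, 3), keepdim=True)   # global average pooling
        scale = self.act(self.reduce(scale))
        scale = self.gate(self.expand(scale))
        return x * scale                           # channel-wise re-weighting
```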

The dropout_rate of this Dropout layer corresponds to drop_connect_rate in the TensorFlow Keras source code (discussed further below). Note that in the source implementation, the Dropout layer is only present when the shortcut is used.
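Putting the pieces together, a PyTorch-style sketch of one MBConv block (illustrative only, not the official implementation; it reuses the SqueezeExcite sketch above, and the stochastic-depth dropout is shown separately further below):

```python
import torch.nn as nn


class MBConv(nn.Module):
    """Sketch of one MBConv block as described above (not the official code)."""

    def __init__(self, in_ch: int, out_ch: int, kernel: int, stride: int, expand: int):
        super().__init__()
        # shortcut only when the input and output feature maps have the same shape
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        mid_ch = in_ch * expand

        layers = []
        if expand != 1:  # no 1x1 expansion conv when n = 1 (the Stage 2 blocks)
            layers += [nn.Conv2d(in_ch, mid_ch, 1, bias=False),
                       nn.BatchNorm2d(mid_ch), nn.SiLU()]
        layers += [
            # k x k depthwise convolution (groups = channels)
            nn.Conv2d(mid_ch, mid_ch, kernel, stride, padding=kernel // 2,
                      groups=mid_ch, bias=False),
            nn.BatchNorm2d(mid_ch), nn.SiLU(),
            SqueezeExcite(in_ch, mid_ch),              # from the SE sketch above
            # 1x1 projection convolution: BN only, no activation
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        out = self.block(x)
        if self.use_shortcut:
            # in the source code, the stochastic-depth "dropout" would be applied to
            # `out` here during training -- see the drop_path sketch below
            out = out + x
        return out
```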

The EfficientNet-B0 to B7 variants differ in the following hyper-parameters:
input_size is the input image size used when training the network.
width_coefficient is the scaling factor on the channel dimension. For example, the 3x3 convolution layer of Stage 1 in EfficientNet-B0 uses 32 kernels, so in B6 this becomes 32 × 1.8 = 57.6, which is then rounded to the nearest multiple of 8, i.e. 56; the other stages are handled the same way.
depth_coefficient is the scaling factor on the depth dimension (applied only to Stage 2 through Stage 8). For example, Stage 7 of EfficientNet-B0 has L = 4 layers, so in B6 this becomes 4 × 2.6 = 10.4, which is rounded up to 11.
drop_connect_rate is the drop rate used by the dropout layer inside the MBConv blocks (again, in the source implementation this layer only exists when the shortcut is used). Also note that this Dropout layer is actually Stochastic Depth: it randomly drops the entire main branch of a block, leaving only the shortcut branch (which is equivalent to skipping that block), so it can also be understood as randomly reducing the network depth.
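A minimal sketch of this stochastic-depth "dropout" (assuming it is applied to the main-branch output just before the shortcut addition, as in common implementations):

```python
import torch


def drop_path(x: torch.Tensor, drop_rate: float, training: bool) -> torch.Tensor:
    """Stochastic depth: for a fraction of the samples in the batch, zero the whole
    main-branch output so only the shortcut remains (i.e. the block is skipped)."""
    if drop_rate == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_rate
    # one Bernoulli draw per sample; broadcast over channels and spatial dimensions
    mask = torch.empty(x.shape[0], 1, 1, 1, device=x.device, dtype=x.dtype)
    mask.bernoulli_(keep_prob)
    return x / keep_prob * mask  # rescale so the expected activation is unchanged
```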
dropout_rate is the drop rate of the dropout layer before the final fully connected layer (between the pooling and FC layers of Stage 9).
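The rounding described above and the per-variant settings can be sketched as follows (the rounding logic follows the commonly used round_filters / round_repeats helpers, as in the official TensorFlow implementation; the B0-B7 values are the commonly cited ones and should be checked against the official repository):

```python
import math

# (width_coefficient, depth_coefficient, input_size, dropout_rate) per variant
EFFICIENTNET_PARAMS = {
    "b0": (1.0, 1.0, 224, 0.2),
    "b1": (1.0, 1.1, 240, 0.2),
    "b2": (1.1, 1.2, 260, 0.3),
    "b3": (1.2, 1.4, 300, 0.3),
    "b4": (1.4, 1.8, 380, 0.4),
    "b5": (1.6, 2.2, 456, 0.4),
    "b6": (1.8, 2.6, 528, 0.5),
    "b7": (2.0, 3.1, 600, 0.5),
}


def round_filters(filters: int, width_coefficient: float, divisor: int = 8) -> int:
    """Scale a channel count and round to the nearest multiple of `divisor`
    (e.g. 32 * 1.8 = 57.6 -> 56 for B6), never rounding down by more than 10%."""
    filters *= width_coefficient
    new_filters = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * filters:
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats: int, depth_coefficient: float) -> int:
    """Scale the number of block repeats per stage and round up
    (e.g. 4 * 2.6 = 10.4 -> 11 for B6)."""
    return int(math.ceil(depth_coefficient * repeats))


print(round_filters(32, 1.8))  # 56
print(round_repeats(4, 2.6))   # 11
```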
Paper address:
https://arxiv.org/abs/1905.11946
Recommended references:
9.1 EfficientNet network details (bilibili video): https://www.bilibili.com/video/BV1XK4y1U7PX?spm_id_from=333.999.0.0
EfficientNet network details (Sunflower's little mung bean, CSDN blog): https://blog.csdn.net/qq_37541097/article/details/114434046
Official code from the paper: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet