
In-Depth Analysis of MobileNet and Its Variants

2022-06-23 13:39:00 Xiaobai learns vision


I have been reading about lightweight networks recently and found this overview to be very good, so I translated it. It summarizes the main variants, and the diagrams are very clear. I hope it gives you some inspiration; if you find it useful, please like, save, and share!

Introduction

In this article, I outline the building blocks used in efficient CNN models such as MobileNet and its variants, and explain why they are so efficient. In particular, I give a visual explanation of how convolution is performed in the spatial and channel domains.

Components used in efficient models

Before explaining specific efficient CNN models, let's review the computational cost of the building blocks they use, and see how convolution is performed in the spatial and channel domains.

[Figure: notation — output feature map H × W, N input channels, K × K kernel, M output channels]

Suppose H × W is the spatial size of the output feature map, N is the number of input channels, K × K is the size of the convolution kernel, and M is the number of output channels. The computational cost of standard convolution is then HWNK²M.

The important point here is that the computational cost of standard convolution is proportional to (1) the spatial size H × W of the output feature map, (2) the square of the kernel size, K², and (3) the product of the numbers of input and output channels, N × M.
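
As a quick sanity check, here is a small helper (my own sketch, not from the original article) that evaluates this cost formula; the sizes are illustrative:

```python
def conv_cost(H, W, N, K, M, groups=1):
    """Multiply-adds of a KxK convolution with an HxW output map,
    N input channels, M output channels, and optional grouping."""
    return H * W * N * K * K * M // groups

# Example: conv3x3 on a 56x56 output map, 64 -> 128 channels.
print(conv_cost(56, 56, 64, 3, 128))  # 231,211,008 multiply-adds
```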

Standard convolution performs this computation jointly over the spatial and channel domains. As shown below, decomposing the convolution can speed up CNNs considerably.

Convolution

 

First, I give an intuitive explanation of how standard convolution operates over both the spatial and channel domains, and why its computational cost is HWNK²M.

I draw lines between the inputs and the outputs to visualize their dependencies; the number of lines roughly corresponds to the computational cost of the convolution in the spatial and channel domains.

[Figure: standard conv3x3 — connections between input and output in the spatial and channel domains]

For example, the most common convolution, conv3x3, can be visualized as in the figure above. We can see that input and output are locally connected in the spatial domain, while fully connected in the channel domain.

[Figure: conv1x1 (pointwise) convolution]

Next, shown above, is conv1x1, or pointwise convolution, which is used to change the number of channels. Because the kernel size is 1x1, its computational cost is HWNM, i.e. 1/9 of that of conv3x3. This convolution is used to "mix" information across channels.
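
In PyTorch terms, a pointwise convolution is simply nn.Conv2d with kernel_size=1; a minimal sketch with assumed sizes (64 -> 128 channels on a 56x56 map):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)                             # N = 64 channels, 56x56 map
pointwise = nn.Conv2d(64, 128, kernel_size=1, bias=False)  # conv1x1 mixes channels only
print(pointwise(x).shape)                                  # torch.Size([1, 128, 56, 56])
# Cost: HWNM = 56*56*64*128 ≈ 25.7M multiply-adds, 1/9 of the conv3x3 example above.
```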

Grouped Convolution

 

Grouped convolution is a variant of convolution in which the channels of the input feature map are split into groups and convolution is performed independently for each group of channels.

Let G denote the number of groups. The computational cost of grouped convolution is then HWNK²M/G, i.e. 1/G of the cost of standard convolution.
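
In PyTorch, grouping is exposed through the groups argument of nn.Conv2d; a small sketch with assumed sizes:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)
# Grouped conv3x3 with G=2: each group maps 32 input channels to 64 output channels.
grouped = nn.Conv2d(64, 128, kernel_size=3, padding=1, groups=2, bias=False)
print(grouped(x).shape)                              # torch.Size([1, 128, 56, 56])
print(sum(p.numel() for p in grouped.parameters()))  # 36,864 weights, half of the 73,728 with G=1
```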

[Figure: grouped conv3x3 with G=2]

The case of conv3x3 with G=2. We can see that the number of connections in the channel domain is smaller than for standard convolution, which indicates a lower computational cost.

[Figure: grouped conv3x3 with G=3]

With conv3x3 and G=3, the connections become even sparser.

[Figure: grouped conv1x1 with G=2]

With conv1x1 and G=2, the conv1x1 can also be grouped. This type of convolution is used in ShuffleNet.

[Figure: grouped conv1x1 with G=3]

The case of conv1x1 with G=3.

Depthwise Convolution

 

In depthwise convolution, each input channel is convolved separately. It can also be defined as a special case of grouped convolution in which the numbers of input and output channels are equal and G equals the number of channels.

[Figure: depthwise convolution]

As shown above, depthwise convolution greatly reduces the computational cost by omitting the convolution in the channel domain.
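
In PyTorch, a depthwise convolution is written as a grouped convolution with groups equal to the number of channels; a minimal sketch:

```python
import torch
import torch.nn as nn

N = 64
x = torch.randn(1, N, 56, 56)
# Depthwise conv3x3: one 3x3 filter per channel, no mixing across channels.
depthwise = nn.Conv2d(N, N, kernel_size=3, padding=1, groups=N, bias=False)
print(depthwise(x).shape)  # torch.Size([1, 64, 56, 56])
# Cost: HWNK² = 56*56*64*9 ≈ 1.8M multiply-adds (the factor M disappears).
```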

Channel Shuffle

 

Channel shuffle is an operation (layer) used in ShuffleNet that changes the order of the channels. It is implemented with tensor reshape and transpose operations.

More precisely, let GN' (= N) denote the number of input channels. The channel dimension is first reshaped into (G, N'), then (G, N') is transposed into (N', G), and finally the result is flattened back into the same shape as the input. Here G denotes the number of groups of the grouped convolution, which is used together with the channel shuffle layer in ShuffleNet.
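
The reshape/transpose/flatten sequence maps directly onto a few tensor operations; a sketch of the idea (not the original ShuffleNet code):

```python
import torch

def channel_shuffle(x, groups):
    """Reorder channels (reshape -> transpose -> flatten) so that the next
    grouped convolution sees channels coming from every group."""
    b, n, h, w = x.shape                      # n = G * N'
    x = x.view(b, groups, n // groups, h, w)  # (G, N')
    x = x.transpose(1, 2).contiguous()        # (N', G)
    return x.view(b, n, h, w)                 # flatten back to the input shape

x = torch.arange(6.0).view(1, 6, 1, 1)
print(channel_shuffle(x, groups=2).flatten().tolist())  # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
```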

Although channel shuffle cannot be counted in terms of multiply-add operations (MACs), it should still incur some overhead.

[Figure: channel shuffle with G=2]

The case of channel shuffle with G=2. No convolution is performed; only the order of the channels is changed.

[Figure: channel shuffle with G=3]

The case of channel shuffle with G=3.

Efficient Models

Below, for each efficient CNN model, I explain intuitively why it is efficient and how it performs convolution in the spatial and channel domains.

ResNet (Bottleneck Version)

 

The residual unit of ResNet with the bottleneck architecture is a good starting point for the comparison with the other models.

[Figure: ResNet residual unit with the bottleneck architecture]

As shown above, a residual unit with the bottleneck architecture consists of conv1x1, conv3x3, and conv1x1. The first conv1x1 reduces the channel dimension of the input, which lowers the cost of the subsequent conv3x3, and the final conv1x1 restores the channel dimension of the output.
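
A minimal PyTorch sketch of such a bottleneck unit (the channel sizes are illustrative assumptions, and the stride/projection variants are omitted):

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """conv1x1 (reduce) -> conv3x3 -> conv1x1 (restore), with an identity shortcut."""
    def __init__(self, channels=256, reduced=64, groups=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, reduced, 1, bias=False),
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, 3, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.block(x))
```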

ResNeXt

 

ResNeXt is an efficient CNN model that can be seen as a special case of ResNet in which the conv3x3 is replaced by a grouped conv3x3. Thanks to the efficient grouped conv, the channel reduction performed by the conv1x1 can be milder than in ResNet, which yields better accuracy at the same computational cost.

[Figure: ResNeXt residual unit with grouped conv3x3]
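
Reusing the Bottleneck sketch above, a ResNeXt-style unit is roughly the same block with a grouped conv3x3 and a wider internal width (the numbers below are illustrative, not the exact paper configuration):

```python
# ResNet-style unit:  256 -> 64 -> conv3x3 (G=1) -> 256
resnet_unit = Bottleneck(channels=256, reduced=64, groups=1)
# ResNeXt-style unit: 256 -> 128 -> grouped conv3x3 (G=32) -> 256
# The wider, grouped conv3x3 keeps the total cost roughly the same as above,
# while the conv1x1 channel reduction is milder (256 -> 128 instead of 256 -> 64).
resnext_unit = Bottleneck(channels=256, reduced=128, groups=32)
```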

MobileNet (Separable Conv)

 

MobileNet is a stack of separable convolution modules, each composed of a depthwise conv and a conv1x1 (pointwise conv).

[Figure: MobileNet separable convolution module (depthwise conv + conv1x1)]

Separable convolution performs convolution independently in the spatial domain and the channel domain. This factorization significantly reduces the computational cost, from HWNK²M to HWNK² (depthwise) + HWNM (conv1x1), i.e. HWN(K² + M) in total. Since in general M >> K² (e.g. K² = 9 and M ≥ 32), the cost is reduced to roughly 1/8 to 1/9 of the original.

The important point here is that the computational bottleneck is now the conv1x1!
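
A minimal sketch of this separable convolution module in PyTorch (the layer ordering follows the depthwise-then-pointwise description above; the BatchNorm/ReLU placement is an assumption):

```python
import torch.nn as nn

def separable_conv(in_ch, out_ch, stride=1):
    """MobileNet-v1 style module: depthwise conv3x3 followed by pointwise conv1x1."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

# Cost: HW*in_ch*9 (depthwise) + HW*in_ch*out_ch (pointwise); the pointwise term dominates.
```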

ShuffleNet

 

The motivation for ShuffleNet is, as mentioned above, that the conv1x1 is the bottleneck of separable convolution. Although conv1x1 is already efficient and seems to leave no room for improvement, grouped conv1x1 can be used for exactly this purpose!

[Figure: ShuffleNet module]

The figure above illustrates the ShuffleNet module. The important building block here is the channel shuffle layer, which "shuffles" the order of the channels across groups in grouped convolution. Without channel shuffle, the outputs of the grouped convolutions are never exchanged between groups, which leads to lower accuracy.
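
A rough sketch of a ShuffleNet-style unit built from the pieces above (grouped conv1x1, channel shuffle, depthwise conv3x3, grouped conv1x1); the stride-2 variant and other details of the original unit are omitted, and the channel sizes are illustrative:

```python
import torch.nn as nn

class ShuffleUnit(nn.Module):
    """grouped conv1x1 -> channel shuffle -> depthwise conv3x3 -> grouped conv1x1,
    with an identity shortcut. channel_shuffle is the function sketched earlier."""
    def __init__(self, channels=240, mid=60, groups=3):
        super().__init__()
        self.groups = groups
        self.gconv1 = nn.Sequential(
            nn.Conv2d(channels, mid, 1, groups=groups, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.dwconv = nn.Sequential(
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid))
        self.gconv2 = nn.Sequential(
            nn.Conv2d(mid, channels, 1, groups=groups, bias=False),
            nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.gconv1(x)
        out = channel_shuffle(out, self.groups)  # mix information across groups
        out = self.dwconv(out)
        out = self.gconv2(out)
        return self.relu(x + out)
```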

MobileNet-v2

MobileNet-v2 uses a module architecture similar to the bottleneck residual unit of ResNet: it is a modified version of the residual unit in which the conv3x3 is replaced by a depthwise convolution.

[Figure: MobileNet-v2 bottleneck module]

As shown above, and in contrast to the standard bottleneck architecture, the first conv1x1 increases the channel dimension, then the depthwise conv is performed, and the last conv1x1 decreases the channel dimension.
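
A minimal sketch of this inverted bottleneck in PyTorch (the expansion factor T and channel count are illustrative; only the stride-1 case with a residual connection is shown):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """conv1x1 (expand by T) -> depthwise conv3x3 -> conv1x1 (project back)."""
    def __init__(self, channels=64, T=6):
        super().__init__()
        hidden = channels * T
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),  # linear bottleneck: no activation here
        )

    def forward(self, x):
        return x + self.block(x)
```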

[Figure: MobileNet-v2 module with building blocks reordered for comparison with separable convolution]

By reordering the building blocks as above and comparing the result with MobileNet-v1 (separable conv), we can see how this architecture works (the reordering does not change the overall model architecture, because MobileNet-v2 is a stack of these modules).

In other words, the module above can be viewed as a modified version of separable convolution in which the single conv1x1 of the separable conv is factorized into two conv1x1s. Let T denote the expansion factor of the channel dimension. The computational cost of the two conv1x1s is then 2HWN²/T, compared with HWN² for the conv1x1 of the separable conv. In [5], T = 6 is used, reducing the cost of the conv1x1 by a factor of 3 (T/2 in general).

FD-MobileNet

Finally, let me introduce Fast-Downsampling MobileNet (FD-MobileNet) [10]. In this model, downsampling is performed in earlier layers than in MobileNet. This simple trick reduces the total computational cost. The reason lies in the traditional downsampling strategy combined with the cost structure of separable convolution.

Starting with VGGNet, many models have adopted the same downsampling strategy: perform downsampling and then double the number of channels in the subsequent layers. For standard convolution the computational cost stays the same after downsampling, because (H/2)(W/2)(2N)K²(2M) = HWNK²M. For separable convolution, however, the cost decreases after downsampling, from HWN(K² + M) to (H/2)(W/2)·2N·(K² + 2M) = HWN(K²/2 + M). The saving is relatively large when M is not big, i.e. in the earlier layers.
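
Plugging illustrative numbers into the separable-convolution cost HWN(K² + M) makes the effect concrete (my own arithmetic sketch):

```python
def separable_cost(H, W, N, K, M):
    """HWN(K^2 + M): depthwise term plus pointwise term."""
    return H * W * N * (K * K + M)

H, W, K = 56, 56, 3
for N in (16, 64):  # small channel counts ~ earlier layers
    same_resolution = separable_cost(H, W, N, K, N)
    downsampled = separable_cost(H // 2, W // 2, 2 * N, K, 2 * N)  # half resolution, double channels
    print(N, round(downsampled / same_resolution, 2))  # 16 -> 0.82, 64 -> 0.94
```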

Here is a summary of the whole article:

[Figure: summary table of the building blocks and models discussed above]


Copyright notice
This article was created by [Xiaobai learns vision]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/174/202206231254280741.html