
6.6 Separable Convolution


1 Separable Convolution

Separable convolution comes in two varieties: spatially separable convolution and depthwise separable convolution.

1.1 Spatially Separable Convolutions

Here, "space" refers to the two dimensions [height, width]. A spatially separable convolution splits an [n*n] convolution into an [n*1] and a [1*n] convolution computed in two steps. For example:

A 3*3 convolution kernel applied to a 5*5 feature map (stride 1, no padding) produces a 3*3 output; each of the 9 output positions takes 3*3=9 multiplications, so the total is 9*9=81 operations.

With a spatially separable convolution, the 3*3 kernel is replaced by a 3*1 kernel followed by a 1*3 kernel. The first step (3*1) produces a 3*5 output of 15 positions at 3 multiplications each, i.e. 15*3=45 operations; the second step (1*3) produces the final 3*3 output of 9 positions at 3 multiplications each, i.e. 9*3=27 operations. The total is 45+27=72 operations, fewer than 81.

Spatially separable convolution therefore reduces the number of operations and lowers the computational cost.
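To make the factorization concrete, here is a minimal sketch assuming PyTorch (an assumption of this example, not code from the original post), confirming that a 3*1 convolution followed by a 1*3 convolution produces the same 3*3 output as a single 3*3 convolution on a 5*5 input:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)  # one single-channel 5*5 feature map

conv3x3 = nn.Conv2d(1, 1, kernel_size=3, bias=False)
conv3x1 = nn.Conv2d(1, 1, kernel_size=(3, 1), bias=False)
conv1x3 = nn.Conv2d(1, 1, kernel_size=(1, 3), bias=False)

print(conv3x3(x).shape)           # torch.Size([1, 1, 3, 3])
print(conv1x3(conv3x1(x)).shape)  # torch.Size([1, 1, 3, 3])

# Multiplications match the text: 9 outputs * 9 = 81 for the 3*3 kernel,
# versus 15*3 + 9*3 = 72 for the factored pair. Note that only a rank-1
# 3*3 kernel can be factored exactly, so the saving limits expressiveness.
```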

1.2 Depthwise Separable Convolutions

MobileNetV1 essentially takes the standard convolutions of VGG and replaces them with depthwise separable convolutions.

The core idea of depthwise separable convolution is to split a complete convolution into two steps: depthwise convolution (Depthwise Convolution) and pointwise convolution (Pointwise Convolution).

Suppose a conventional convolution layer with 4 filters processes a 64*64 three-channel image and outputs 4 feature maps of the same spatial size as the input. With 3*3 kernels, the layer's parameter count is 4*3*3*3=108.

A depthwise separable convolution proceeds in two steps. First, the depthwise convolution applies a single 3*3 filter to each input channel, producing 3 feature maps with 3*3*3=27 parameters. Then the pointwise convolution uses 1*1 convolutions to combine the depthwise outputs: each 1*1*3 kernel forms a weighted combination of the previous maps along the depth dimension and generates one new feature map. With 4 such kernels the parameter count is 1*1*3*4=12, so the total is 27+12=39 parameters instead of 108.
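A minimal sketch of the parameter counts above, assuming PyTorch (groups=3 makes the first convolution depthwise; bias terms are omitted so the counts match the text):

```python
import torch.nn as nn

depthwise = nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3, bias=False)
pointwise = nn.Conv2d(3, 4, kernel_size=1, bias=False)
standard  = nn.Conv2d(3, 4, kernel_size=3, padding=1, bias=False)

print(sum(p.numel() for p in depthwise.parameters()))  # 27  = 3*3*3
print(sum(p.numel() for p in pointwise.parameters()))  # 12  = 1*1*3*4
print(sum(p.numel() for p in standard.parameters()))   # 108 = 4*3*3*3
```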

 

2 Group Convolution vs. Depthwise Separable Convolution

2.1 Group Convolution

Group convolution (Group Convolution) first appeared in AlexNet; it is a technique for reducing parameter and computation counts.

The main idea: split the input feature channels into groups, convolve each group separately, and then concatenate the results. A group convolution has 1/G of the parameters of the corresponding ordinary convolution, where G is the number of groups.
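A quick sketch of the 1/G saving, assuming PyTorch; the channel counts and group number below are illustrative choices, not values from the text:

```python
import torch.nn as nn

C_in, C_out, G = 64, 64, 4  # illustrative sizes

regular = nn.Conv2d(C_in, C_out, kernel_size=3, padding=1, bias=False)
grouped = nn.Conv2d(C_in, C_out, kernel_size=3, padding=1, groups=G, bias=False)

n_reg = sum(p.numel() for p in regular.parameters())  # 64*64*3*3 = 36864
n_grp = sum(p.numel() for p in grouped.parameters())  # 36864 / 4 = 9216
print(n_reg, n_grp, n_reg // n_grp)                   # ratio equals G
```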

Advantages: group convolution reduces the number of parameters; it can also be viewed as a form of structured sparsity, which acts like a regularizer.

When the number of groups equals the input/output channel dimension, i.e. G=Din=Dout, group convolution is equivalent to the depthwise convolution used in MobileNet and Xception.

When G=Din=Dout and, in addition, the kernel size equals the spatial size of the input feature map (K=W=H), so that the output feature map is C*1*1, the operation becomes the Global Depthwise Convolution (GDC) of MobileFaceNet, i.e. global weighted pooling. The difference from GAP (global average pooling) is that GDC assigns each spatial position a learnable weight.
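A sketch contrasting GAP and GDC, assuming PyTorch; the 512-channel, 7*7 input below is an illustrative assumption:

```python
import torch
import torch.nn as nn

C = 512
x = torch.randn(1, C, 7, 7)

# GAP: fixed, uniform averaging over spatial positions.
gap = nn.AdaptiveAvgPool2d(1)
# GDC: depthwise conv whose kernel covers the whole feature map, so each
# spatial position gets its own learnable weight per channel.
gdc = nn.Conv2d(C, C, kernel_size=7, groups=C, bias=False)

print(gap(x).shape)  # torch.Size([1, 512, 1, 1])
print(gdc(x).shape)  # torch.Size([1, 512, 1, 1])
```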

3 MobileNet Series

3.1 MobileNet_V1

1 Innovations

① Proposed the MobileNet architecture, which replaces standard convolutions with depthwise separable convolutions to reduce computation.

② Introduced two shrinking hyperparameters (Shrinking Hyperparameters): the width multiplier, which makes every layer of the network thinner by scaling its channel count, and the resolution multiplier, which reduces the input resolution and thereby the amount of computation; a cost sketch follows below.
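To make the effect of the two multipliers concrete, the following back-of-envelope sketch uses the MobileNetV1 cost formula for one depthwise separable layer, D_K*D_K*(a*M)*(r*D_F)*(r*D_F) + (a*M)*(a*N)*(r*D_F)*(r*D_F), where a is the width multiplier and r the resolution multiplier; the function name and the concrete layer sizes are illustrative assumptions:

```python
# Hypothetical helper: multiply-accumulate count of one depthwise separable
# layer with kernel size d_k, m input channels, n output channels, and a
# d_f*d_f feature map, scaled by width multiplier alpha and resolution
# multiplier rho (per the MobileNetV1 cost formula).
def dw_separable_macs(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    m, n, d_f = int(alpha * m), int(alpha * n), int(rho * d_f)
    depthwise = d_k * d_k * m * d_f * d_f   # one filter per channel
    pointwise = m * n * d_f * d_f           # 1*1 channel mixing
    return depthwise + pointwise

print(dw_separable_macs(3, 32, 64, 112))             # baseline
print(dw_separable_macs(3, 32, 64, 112, alpha=0.5))  # thinner layers
print(dw_separable_macs(3, 32, 64, 112, rho=0.5))    # smaller feature map
```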

2 Problems

After training, the depthwise convolutions tend to lose information, with the weights of some kernels collapsing to 0.

3.2 MobileNet_V2

1 Innovations

① Replaced the ReLU6 after the last layer of each block with a linear activation, introducing the Linear Bottleneck;

② Introduced a feature-reuse structure, borrowing the shortcut idea from ResNet;

③ Used inverted residual blocks (inverted residuals) to avoid the defects of ReLU.

ReLU can leave neurons dead; combined with ResNet-style feature reuse, the inverted residual structure greatly alleviates the problem of feature degradation.
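A minimal sketch of an inverted residual block, assuming PyTorch; the expansion factor of 6 follows the common MobileNetV2 setting, and the class is an illustration rather than the original post's code:

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, c_in, c_out, stride=1, expand=6):
        super().__init__()
        hidden = c_in * expand
        self.use_shortcut = stride == 1 and c_in == c_out  # feature reuse
        self.block = nn.Sequential(
            nn.Conv2d(c_in, hidden, 1, bias=False),            # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),              # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, c_out, 1, bias=False),           # linear bottleneck
            nn.BatchNorm2d(c_out),                             # no ReLU here
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out

print(InvertedResidual(32, 32)(torch.randn(1, 32, 56, 56)).shape)
```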

2 Defects

① The final layers of the network are computationally expensive.

3.3 MobileNet_V3

1 Innovations

① Combined complementary search techniques: NAS performs module-level (block-wise) search, while NetAdapt performs local layer-wise search;

② Improved the network structure: moved the final average pooling forward, removed the last convolution layer, and introduced the h-swish activation function.

While maintaining accuracy, h-swish ① is easy to implement in software and hardware frameworks, ② avoids loss of numerical precision, and ③ runs fast.
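For reference, h-swish(x) = x * ReLU6(x + 3) / 6, which replaces the sigmoid of swish with a cheap piecewise-linear term; a short sketch assuming PyTorch:

```python
import torch
import torch.nn.functional as F

def h_swish(x):
    # Piecewise-linear approximation of swish (x * sigmoid(x)).
    return x * F.relu6(x + 3.0) / 6.0

print(h_swish(torch.tensor([-4.0, 0.0, 4.0])))  # tensor([-0., 0., 4.])
```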

