How many convolution methods does deep learning have? (including drawings)
2022-07-03 18:21:00 【ZRX_ GIS】
1 Focus
How to improve an existing network architecture through the choice of convolution type.
2 Why do modern networks use small convolution kernels instead of large ones? (VGGNet)

| kernel size | advantages | examples | disadvantages |
|---|---|---|---|
| large | large receptive field per layer | AlexNet and LeNet use relatively large kernels such as 5×5 and 11×11 | many parameters; heavy computation |
| small | fewer parameters; less computation; stacking three small layers inserts three nonlinear activations instead of one, increasing the model's discriminative power | VGG and later networks | a single layer has an insufficient receptive field; deep stacks of convolutions are harder to control |
3 Can a fixed-size convolution kernel see a larger area? (Dilated convolution)
A standard 3×3 kernel only sees a 3×3 region, but dilated convolution lets the same kernel cover a larger range. The information lost by pooling-based downsampling is irreversible, which hurts pixel-level tasks; replacing pooling with dilated convolution enlarges the receptive field (multiplying it per layer) without discarding resolution, which makes it well suited to semantic segmentation.
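The receptive-field growth described above follows a simple formula: a k×k kernel with dilation rate d behaves like a kernel of size k + (k−1)(d−1). A minimal sketch:

```python
def effective_kernel(k, d):
    """Effective receptive field of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

# A 3x3 kernel with dilation 2 covers a 5x5 region using the same 9 weights.
print(effective_kernel(3, 1))  # 3
print(effective_kernel(3, 2))  # 5
print(effective_kernel(3, 4))  # 9
```

So stacking dilated layers with growing rates expands the receptive field rapidly while the parameter count per layer stays fixed.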
4 Must the convolution kernel be square? (Asymmetric convolution)
A standard 3×3 convolution can be split into a 1×3 convolution followed by a 3×1 convolution, reducing the amount of computation without changing the receptive field.
5 Must convolution mix all input channels? (Group convolution & depthwise separable convolution)
Group convolution splits the input feature maps into groups and convolves each group independently. Suppose the input feature map is C×H×W (12×5×5), the number of output feature maps is N (6), and the channels are split into G (3) groups. Then each group has C/G (4) input feature maps and N/G (2) output feature maps, each kernel has size (C/G)×K×K (4×5×5), and the total number of kernels is still N (6), with N/G (2) kernels per group. Each kernel convolves only with the input feature maps of its own group, so the total parameter count is N×(C/G)×K×K, i.e. reduced to 1/G of a standard convolution.
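Plugging in the numbers from the example above confirms the 1/G reduction:

```python
# Numbers from the example: C = 12 input channels, N = 6 output maps,
# G = 3 groups, K = 5 kernel size (bias ignored).
C, N, G, K = 12, 6, 3, 5

standard = N * C * K * K           # every kernel sees all C channels
grouped = N * (C // G) * K * K     # each kernel sees only C/G channels

print(standard, grouped)           # 1800 600
assert grouped == standard // G    # parameters shrink to 1/G
```

Depthwise separable convolution is the extreme case G = C, with one kernel per input channel, usually followed by a 1×1 pointwise convolution to mix channels.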
6 Can group convolution shuffle channels across groups? (ShuffleNet)
To let information flow between groups, besides a dense pointwise convolution you can use a channel shuffle: after the group convolution, the output feature maps are "reorganized" so that each group fed into the next convolution contains channels drawn from every previous group. Figure (c) in the original paper illustrates the process, which amounts to an even interleaving.
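The "even interleaving" is just a reshape-transpose-flatten on the channel axis. A minimal sketch on a flat list of channel indices:

```python
def channel_shuffle(channels, groups):
    """Interleave channels: reshape to (groups, c_per_group), transpose, flatten."""
    per_group = len(channels) // groups
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # read column by column: position i of every group, for each i
    return [grid[g][i] for i in range(per_group) for g in range(groups)]

# 6 channels in 3 groups: after the shuffle, each consecutive block of
# channels contains one channel from every original group.
print(channel_shuffle([0, 1, 2, 3, 4, 5], 3))  # [0, 2, 4, 1, 3, 5]
```

In a real network the same permutation is applied to the channel dimension of the feature tensor; it costs no parameters and no FLOPs beyond a memory reorder.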
7 Must each convolution layer use a single kernel size? (Inception)
Traditional networks are basically stacks of convolutions with a single kernel size per layer, for example VGG's long runs of 3×3 convolutions. In fact, one layer can apply several kernels of different sizes to the same feature map to capture features at different scales, and concatenating those features usually works better than using a single kernel size. To keep the parameter count down, a 1×1 convolution is usually applied first to project the feature map into a lower-dimensional hidden space, and the larger convolutions are done there.
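The 1×1 bottleneck pays for itself in the weight count. A sketch with hypothetical channel counts (256-channel input, reduced to 64 before a 5×5 branch, biases ignored):

```python
def conv_params(k, c_in, c_out):
    # weights of a k x k convolution (bias ignored)
    return k * k * c_in * c_out

# direct 5x5 convolution from 256 to 256 channels
direct = conv_params(5, 256, 256)
# versus a 1x1 reduction to 64 channels, then the 5x5 back to 256
bottleneck = conv_params(1, 256, 64) + conv_params(5, 64, 256)

print(direct)      # 1638400
print(bottleneck)  # 425984
```

Here the bottleneck version uses roughly a quarter of the weights, which is what makes the multi-branch Inception block affordable.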
8 Are features across channels equally important? (SENet)
Whether in Inception, DenseNet, or ShuffleNet, the features produced for all channels are combined without weighting. But why should every channel contribute equally to the model? A convolution layer can have thousands of kernels, each producing one feature, so how do we distinguish among so many features? SENet learns the importance of each feature channel automatically, then uses the computed importance to enhance useful features and suppress those that are irrelevant to the current task.
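The "enhance and suppress" step is just a per-channel scaling by the learned weights. A minimal sketch with flattened feature maps and illustrative (not learned) weights:

```python
def se_reweight(feature_maps, channel_weights):
    """Scale each channel's (flattened) feature map by its learned weight."""
    return [[v * w for v in fmap] for fmap, w in zip(feature_maps, channel_weights)]

# Two channels; the sigmoid of the excitation step has judged the second
# channel half as useful. The weights here are illustrative values.
fmaps = [[1.0, 2.0], [3.0, 4.0]]
weights = [1.0, 0.5]
print(se_reweight(fmaps, weights))  # [[1.0, 2.0], [1.5, 2.0]]
```

In the full SE block the weights come from a global average pool over each channel, a small two-layer MLP, and a sigmoid; only the final scaling is shown here.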
9 Must the convolution kernel be rectangular? (Deformable convolution)
A kernel of regular shape (for example the usual square 3×3 kernel) can limit feature extraction. If the kernel is given the ability to deform, the network can use the error back-propagated from the labels to adjust the kernel's shape automatically, adapting to the regions it cares about and extracting better features. For example, starting from the regular sampling grid (a), the network learns an offset for each sampling point, producing new sampling patterns (b), (c), (d); some special cases then become instances of this more general model: pattern (c) corresponds to recognizing objects at different scales, and pattern (d) to recognizing rotated objects.
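The core idea is that each of the 9 sampling positions of a 3×3 kernel gets its own learned 2-D offset. A minimal sketch of where such a kernel samples (offsets here are hand-picked, not learned; real implementations also bilinearly interpolate at fractional positions):

```python
def deformable_samples(center, offsets):
    """Positions a deformable 3x3 kernel samples: regular grid plus offsets."""
    base = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    cy, cx = center
    return [(cy + dy + oy, cx + dx + ox) for (dy, dx), (oy, ox) in zip(base, offsets)]

# With zero offsets this is the ordinary 3x3 grid around (5, 5);
# a nonzero offset lets a sampling point reach outside the square.
print(deformable_samples((5, 5), [(0, 0)] * 9)[0])             # (4, 4)
print(deformable_samples((5, 5), [(0, 2)] + [(0, 0)] * 8)[0])  # (4, 6)
```

Because the offsets are themselves produced by a small convolution over the input, they vary per location, which is what lets the kernel stretch, scale, or rotate as in patterns (b), (c), (d).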
10 Ideas for rewriting a network
(1) kernel:
First, replace a large convolution kernel with several small ones.
Second, replace a single kernel size with multiple kernel sizes.
Also, replace fixed-shape kernels with deformable convolution.
Or, add 1×1 convolutions to the network.
(2) channels:
First, introduce depthwise separable convolution.
Second, introduce group convolution.
Also, introduce channel shuffle.
Or, weight the feature maps (as in SENet).
(3) connections:
First, introduce skip connections.
Second, introduce dense connections, so that each layer is fused with the other layers (DenseNet).
11 Summary