(CVPR 2019) Selective Kernel Networks
Paper: Selective Kernel Networks
A CVPR 2019 work from Nanjing University of Science and Technology
Paper link
Code: link
Abstract
In standard convolutional neural networks (CNNs), the receptive fields of the artificial neurons in each layer are designed to share the same size. It is well known in the neuroscience community that the receptive field size of neurons in the visual cortex is modulated by the stimulus, which has rarely been considered when constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called the Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked into a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects of different scales, which verifies the capability of neurons to adaptively adjust their receptive field size according to the input.
1. Introduction
The local receptive fields (RFs) of neurons in the primary visual cortex (V1) of cats [14], discovered in the last century, inspired the construction of convolutional neural networks (CNNs) [26], and they continue to inspire modern CNN architectures. For example, it is well known that in the visual cortex, the RF sizes of neurons in the same area (e.g., V1) are different, which enables the neurons to collect multi-scale spatial information at the same processing stage. This mechanism has been widely adopted in recent CNNs. A typical example is InceptionNets [42, 15, 43, 41], in which a simple concatenation is designed in the "inception" building block to aggregate multi-scale information from, e.g., $3 \times 3$, $5 \times 5$ and $7 \times 7$ convolution kernels.
However, when designing CNNs, some other RF properties of cortical neurons have not been emphasized, one of which is the adaptive change of RF size. Abundant experimental evidence suggests that the RF sizes of neurons in the visual cortex are not fixed but modulated by the stimulus. The classical RFs (CRFs) of neurons in the V1 area were discovered by Hubel and Wiesel [14], as determined by single oriented bars. Later, many studies (e.g., [30]) found that stimuli outside the CRF also affect the responses of neurons; such neurons are said to have non-classical RFs (nCRFs). In addition, the size of the nCRF is related to the contrast of the stimulus: the lower the contrast, the larger the effective nCRF size [37]. Surprisingly, by stimulating the nCRF for a period of time, the CRF of the neuron is also enlarged after these stimuli are removed [33]. All these experiments suggest that the RF sizes of neurons are not fixed but modulated by the stimulus [38]. Unfortunately, this property has received little attention in the construction of deep learning models. Models with multi-scale information in the same layer, such as InceptionNets, do have an inherent mechanism to adjust the RF sizes of neurons in the next convolutional layer according to the contents of the input, because the next convolutional layer linearly aggregates multi-scale information from different branches. But this linear aggregation may be insufficient to provide neurons with strong adaptation ability.
In this paper, we present a nonlinear approach to aggregate information from multiple kernels so as to realize adaptive RF sizes of neurons. We introduce "Selective Kernel" (SK) convolution, which consists of a triplet of operators: Split, Fuse and Select. The Split operator generates multiple paths with different kernel sizes, corresponding to different RF sizes of neurons. The Fuse operator combines and aggregates the information from the multiple paths to obtain a global and comprehensive representation for the selection weights. The Select operator aggregates the feature maps of the differently sized kernels according to the selection weights.
SK convolutions can be computationally lightweight, imposing only a slight increase in parameters and computational cost. We show that on the ImageNet 2012 dataset [35], SKNets outperform previous state-of-the-art models with similar model complexity. Based on SKNet-50, we find the best settings for SK convolution and show the contribution of each component. To demonstrate its general applicability, we also provide compelling results on the smaller datasets CIFAR-10 and CIFAR-100 [22], and successfully embed SK into small models (e.g., ShuffleNetV2 [27]).
To verify that the proposed model indeed has the ability to adjust the RF sizes of its neurons, we simulate the stimulus by enlarging the target object in natural images and shrinking the background, keeping the image size unchanged. It is found that most neurons collect more and more information from the larger-kernel path as the target object becomes larger. These results suggest that the neurons in the proposed SKNet have adaptive RF sizes, which may underlie the model's superior performance in object recognition.
2. Related Work
Multi-branch convolutional networks. Highway networks [39] introduce bypassing paths along with gating units. The two-branch architecture eases the difficulty of training networks with hundreds of layers. The idea is also used in ResNet [9, 10], but the bypassing path is a pure identity mapping. Besides the identity mapping, the shake-shake networks [7] and multi-residual networks [1] extend the major transformation with more identical paths. Deep neural decision forests [21] form a tree-structured multi-branch principle with learned splitting functions. FractalNets [25] and Multilevel ResNets [52] are designed in such a way that multiple paths can be expanded fractally and recursively. InceptionNets [42, 15, 43, 41] carefully configure each branch with customized kernel filters in order to aggregate more informative and multifarious features. Note that the proposed SKNets follow the idea of InceptionNets of configuring various filters for multiple branches, but differ in at least two important aspects: 1) the scheme of SKNets is much simpler, without heavy customized design; 2) an adaptive selection mechanism over these multiple branches is utilized to realize adaptive RF sizes of neurons.
Grouped/depthwise/dilated convolutions. Grouped convolutions have become popular due to their low computational cost. Denoting the group size by $G$, the number of parameters and the computational cost are divided by $G$ compared with ordinary convolution. They were first adopted in AlexNet [23] with the purpose of distributing the model over more GPU resources. Surprisingly, using grouped convolutions, ResNeXts [47] can also improve accuracy. This $G$ is called "cardinality", which characterizes the model along with depth and width.
Many compact models such as IGCV1 [53], IGCV2 [46] and IGCV3 [40] have been developed based on interleaved grouped convolutions. A special case of grouped convolution is depthwise convolution, where the number of groups equals the number of channels. Xception [3] and MobileNetV1 [11] introduce the depthwise separable convolution, which decomposes an ordinary convolution into a depthwise convolution and a pointwise convolution. The effectiveness of depthwise convolutions has been validated in subsequent works such as MobileNetV2 [36] and ShuffleNet [54, 27]. Beyond grouped/depthwise convolutions, dilated convolutions [50, 51] support exponential expansion of the RF without loss of coverage. For example, a $3 \times 3$ convolution with dilation 2 can approximately cover the RF of a $5 \times 5$ filter, while consuming less than half of the computation and memory. In SK convolutions, kernels of larger sizes (e.g., $>1$) are designed to be integrated with grouped/depthwise/dilated convolutions in order to avoid heavy overheads.
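To make the parameter/RF trade-off concrete, here is a minimal PyTorch sketch (ours, not from the paper) comparing an ordinary $5 \times 5$ convolution, a dilated $3 \times 3$ convolution, and a grouped $3 \times 3$ convolution; the channel count C = 64 and group size G = 32 are illustrative choices:

```python
# A minimal sketch (ours, not from the paper): comparing parameter counts
# and output shapes of an ordinary 5x5 conv, a dilated 3x3 conv, and a
# grouped 3x3 conv. C = 64 and G = 32 are illustrative choices.
import torch
import torch.nn as nn

C, G = 64, 32
x = torch.randn(1, C, 32, 32)

conv5 = nn.Conv2d(C, C, kernel_size=5, padding=2)                  # 5x5 RF
conv3_d2 = nn.Conv2d(C, C, kernel_size=3, padding=2, dilation=2)   # ~5x5 RF
conv3_g = nn.Conv2d(C, C, kernel_size=3, padding=1, groups=G)      # params / G

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

# 102464 vs 36928 vs 1216 parameters; all preserve the 32x32 spatial size.
print(n_params(conv5), n_params(conv3_d2), n_params(conv3_g))
print(conv5(x).shape, conv3_d2(x).shape, conv3_g(x).shape)
```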
Attention mechanisms. Recently, the benefits of attention mechanisms have been shown across a range of tasks, from neural machine translation [2] in natural language processing to image captioning [49] in image understanding. Attention biases the allocation of the most informative feature expressions [16, 17, 24, 28, 31] and suppresses less useful ones. It has been widely used in recent applications such as person re-identification [4], image recovery [55], text summarization [34] and lip reading [48]. To boost the performance of image classification, Wang et al. [44] propose a trunk-and-mask attention between intermediate stages of a CNN; an hourglass module is introduced to achieve global emphasis across both spatial and channel dimensions. Furthermore, SENet [12] brings an effective, lightweight gating mechanism that self-recalibrates the feature map via channel-wise importances. Beyond channels, BAM [32] and CBAM [45] introduce spatial attention in a similar way. In contrast, our proposed SKNets are the first to explicitly focus on the adaptive RF sizes of neurons by introducing an attention mechanism.
Dynamic convolutions. Spatial Transformer Networks [18] learn a parametric transformation to warp the feature map, which is considered difficult to train. Dynamic Filter networks [20] can only adaptively modify the parameters of filters, without adjusting kernel sizes. Active Convolution [19] augments the sampling locations in convolution with offsets. These offsets are learned end-to-end but become static after training, whereas in SKNet the RF sizes of neurons can adaptively change during inference. Deformable Convolutional Networks [6] further make the location offsets dynamic, but they do not aggregate multi-scale information in the same way as SKNet.
3. Methods
3.1. Selective Kernel Convolution
To enable neurons to adaptively adjust their RF sizes, we propose an automatic selection operation, "Selective Kernel" (SK) convolution, among multiple kernels with different kernel sizes. Specifically, we implement the SK convolution via three operators, Split, Fuse and Select, as illustrated in Figure 1, which shows a two-branch case. In this example there are only two kernels of different sizes, but it is easy to extend to the multi-branch case.

Figure 1. Selective Kernel Convolution.
Split: For any given feature map $\mathbf{X} \in \mathbb{R}^{H' \times W' \times C'}$, by default we first conduct two transformations $\tilde{\mathcal{F}}: \mathbf{X} \rightarrow \tilde{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}$ and $\widehat{\mathcal{F}}: \mathbf{X} \rightarrow \widehat{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}$ with kernel sizes 3 and 5, respectively. Note that both $\tilde{\mathcal{F}}$ and $\widehat{\mathcal{F}}$ are composed of efficient grouped/depthwise convolutions, batch normalization [15] and ReLU [29] in sequence. For further efficiency, the conventional convolution with a $5 \times 5$ kernel is replaced with a dilated convolution with a $3 \times 3$ kernel and dilation size 2.
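As a concrete illustration of the Split operator, here is a minimal PyTorch sketch under our own assumptions: stride 1, equal input/output channels, and group size G = 32 (the conv-BN-ReLU composition follows the description above, but the exact layer configuration is ours):

```python
# A minimal sketch of the Split operator (ours): two conv-BN-ReLU branches,
# one plain 3x3 and one dilated 3x3 standing in for the 5x5 kernel. We
# assume stride 1, equal input/output channels, and group size G = 32.
import torch
import torch.nn as nn

def sk_branch(channels: int, dilation: int, groups: int = 32) -> nn.Module:
    # Padding equals the dilation so that H x W is preserved.
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=dilation,
                  dilation=dilation, groups=groups, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )

C = 64
x = torch.randn(2, C, 56, 56)
u_tilde = sk_branch(C, dilation=1)(x)  # F~: plain 3x3 kernel
u_hat = sk_branch(C, dilation=2)(x)    # F^: dilated 3x3, ~5x5 RF
print(u_tilde.shape, u_hat.shape)      # both torch.Size([2, 64, 56, 56])
```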
Fuse: As stated in the Introduction, our goal is to enable neurons to adaptively adjust their RF sizes according to the stimulus content. The basic idea is to use gates to control the information flows from the multiple branches that carry different scales of information into the neurons of the next layer. To achieve this goal, the gates need to integrate information from all branches. We first fuse the results from the multiple branches (two in Figure 1) via an element-wise summation:
$$\mathbf{U} = \tilde{\mathbf{U}} + \widehat{\mathbf{U}}.$$
Then we embed the global information by simply using global average pooling to generate channel-wise statistics $\mathbf{s} \in \mathbb{R}^{C}$. Specifically, the $c$-th element of $\mathbf{s}$ is calculated by shrinking $\mathbf{U}$ through the spatial dimensions $H \times W$:
$$s_c = \mathcal{F}_{gp}(\mathbf{U}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \mathbf{U}_c(i, j).$$
Furthermore, a compact feature $\mathbf{z} \in \mathbb{R}^{d \times 1}$ is created to enable guidance for the precise and adaptive selection. This is achieved by a simple fully connected (fc) layer, with a reduction of dimensionality for better efficiency:
$$\mathbf{z} = \mathcal{F}_{fc}(\mathbf{s}) = \delta(\mathcal{B}(\mathbf{W}\mathbf{s})),$$
where $\delta$ is the ReLU function [29], $\mathcal{B}$ denotes batch normalization [15], and $\mathbf{W} \in \mathbb{R}^{d \times C}$. To study the impact of $d$ on the efficiency of the model, we use a reduction ratio $r$ to control its value:
$$d = \max(C/r, L),$$
where $L$ denotes the minimal value of $d$ ($L = 32$ is a typical setting in our experiments).
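A minimal sketch (ours) of the Fuse step under the same assumptions: element-wise summation, global average pooling, then the fc layer $\mathbf{z} = \delta(\mathcal{B}(\mathbf{W}\mathbf{s}))$ with reduction ratio $r$; the random tensors merely stand in for the two branch outputs:

```python
# A minimal sketch of the Fuse operator (ours): element-wise sum, global
# average pooling, then z = ReLU(BN(W s)) with reduction ratio r. The two
# random tensors below merely stand in for the branch outputs U~ and U^.
import torch
import torch.nn as nn

C, r, L = 64, 16, 32
d = max(C // r, L)                    # d = max(C/r, L) -> 32 here

u_tilde = torch.randn(2, C, 56, 56)   # stand-in for U~
u_hat = torch.randn(2, C, 56, 56)     # stand-in for U^
u = u_tilde + u_hat                   # U = U~ + U^

s = u.mean(dim=(2, 3))                # global average pooling -> (2, C)

fc = nn.Sequential(nn.Linear(C, d, bias=False),
                   nn.BatchNorm1d(d),
                   nn.ReLU(inplace=True))
z = fc(s)                             # compact feature -> (2, d)
print(z.shape)                        # torch.Size([2, 32])
```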
Select: A soft attention across channels is used to adaptively select different spatial scales of information, guided by the compact feature descriptor $\mathbf{z}$. Specifically, a softmax operator is applied on the channel-wise digits:
$$a_c = \frac{e^{\mathbf{A}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}, \quad b_c = \frac{e^{\mathbf{B}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}},$$
where $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{C \times d}$, and $\mathbf{a}, \mathbf{b}$ denote the soft attention vectors for $\tilde{\mathbf{U}}$ and $\widehat{\mathbf{U}}$, respectively. Note that $\mathbf{A}_c \in \mathbb{R}^{1 \times d}$ is the $c$-th row of $\mathbf{A}$ and $a_c$ is the $c$-th element of $\mathbf{a}$, and likewise for $\mathbf{B}_c$ and $b_c$. In the two-branch case the matrix $\mathbf{B}$ is redundant, because $a_c + b_c = 1$. The final feature map $\mathbf{V}$ is obtained through the attention weights on the various kernels:
$$\mathbf{V}_c = a_c \cdot \tilde{\mathbf{U}}_c + b_c \cdot \widehat{\mathbf{U}}_c, \quad a_c + b_c = 1,$$
where $\mathbf{V} = [\mathbf{V}_1, \mathbf{V}_2, \ldots, \mathbf{V}_C]$ and $\mathbf{V}_c \in \mathbb{R}^{H \times W}$.
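Putting the three operators together, here is a minimal two-branch SKConv sketch (ours, not the authors' code; the paper embeds SK convolutions inside ResNeXt-style bottlenecks, which is omitted here). The hyper-parameters G = 32, r = 16, L = 32 mirror the typical settings quoted above, and the per-branch Linear layers play the roles of $\mathbf{A}$ and $\mathbf{B}$:

```python
# A minimal two-branch SKConv sketch (ours) combining Split, Fuse and
# Select. The per-branch Linear layers play the roles of A and B; their
# softmax-normalized outputs are the attention vectors a and b.
import torch
import torch.nn as nn

class SKConv(nn.Module):
    def __init__(self, channels: int, groups: int = 32,
                 r: int = 16, L: int = 32):
        super().__init__()
        d = max(channels // r, L)
        # Split: plain 3x3 branch and dilated-3x3 (~5x5) branch.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=dil, dilation=dil,
                          groups=groups, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True))
            for dil in (1, 2)])
        # Fuse: z = ReLU(BN(W s)).
        self.fc = nn.Sequential(nn.Linear(channels, d, bias=False),
                                nn.BatchNorm1d(d),
                                nn.ReLU(inplace=True))
        # Select: one logit vector per branch (A z and B z).
        self.attn = nn.ModuleList([nn.Linear(d, channels) for _ in range(2)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]     # U~, U^
        u = feats[0] + feats[1]                             # U = U~ + U^
        s = u.mean(dim=(2, 3))                              # F_gp -> (N, C)
        z = self.fc(s)                                      # (N, d)
        logits = torch.stack([a(z) for a in self.attn], 1)  # (N, 2, C)
        w = logits.softmax(dim=1)[..., None, None]          # a_c + b_c = 1
        return w[:, 0] * feats[0] + w[:, 1] * feats[1]      # V

x = torch.randn(2, 64, 56, 56)
print(SKConv(64)(x).shape)   # torch.Size([2, 64, 56, 56])
```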