EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network
2022-08-02 08:35:00 【ZZE15832206526】
I. EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network
Abstract: Recent studies have shown that embedding an attention module into a deep convolutional neural network can effectively improve its performance. This paper proposes a new lightweight and effective attention method, the Pyramid Split Attention (PSA) module. Replacing the 3x3 convolution in the ResNet bottleneck block with the PSA module yields a new block, the Efficient Pyramid Split Attention (EPSA) block. The EPSA block can easily be added to a well-established backbone network as a plug-and-play component and brings significant improvements in model performance. By stacking these ResNet-style EPSA blocks, we develop a simple and efficient backbone architecture named EPSANet. EPSANet provides stronger multi-scale representation ability for a variety of computer vision tasks, including but not limited to image classification, object detection, and instance segmentation. Without bells and whistles, the proposed EPSANet outperforms most state-of-the-art channel attention methods.
Attention mechanisms are widely used in many computer vision areas, such as image classification, object detection, instance segmentation, semantic segmentation, scene parsing, and action localization. There are two types of attention methods: channel attention and spatial attention. Recent studies have shown that using channel attention, spatial attention, or both can significantly improve performance. The most commonly used channel attention method is the Squeeze-and-Excitation (SE) module, which significantly improves performance at a fairly low cost. The drawback of SENet is that it ignores the importance of spatial information. Therefore, the Bottleneck Attention Module (BAM) and the Convolutional Block Attention Module (CBAM) were proposed to enrich the attention map by effectively combining spatial attention and channel attention.
However, two important and challenging problems remain. The first is how to efficiently capture and exploit the spatial information of feature maps at different scales in order to enrich the feature space. The second is that channel or spatial attention can only effectively capture local information and cannot establish a long-range channel dependency. Many methods have been proposed to address these two problems. Methods based on multi-scale feature representation and cross-channel information interaction have been proposed, such as PyConv, Res2Net, and HS-ResNet. On the other hand, a long-range channel dependency can also be established. However, the above methods have high model complexity and a heavy computational burden. Based on these observations, we believe it is necessary to develop a low-cost but effective attention module.
This paper proposes a low-cost, high-performance new module named Pyramid Split Attention (PSA). The proposed PSA module can process the input tensor at multiple scales. Specifically, the input feature map is split into S groups, each with C/S channels. Then, a multi-scale pyramid convolution structure integrates information at different scales on each channel-wise feature map. By doing so, the neighboring scales of the context features can be incorporated more precisely. Finally, cross-dimension interaction is established by extracting the channel-wise attention weights of the multi-scale feature maps. A Softmax operation recalibrates the attention weights of the corresponding channels, thereby establishing long-range channel dependency. Replacing the 3x3 convolution in the ResNet bottleneck block with the PSA module yields a new block named Efficient Pyramid Split Attention (EPSA). Furthermore, by stacking these EPSA blocks in ResNet style, a network named EPSANet is proposed. As shown in Figure 1, the proposed EPSANet not only outperforms the prior state of the art in Top-1 accuracy but is also more efficient in terms of required parameters. The main contributions of this work are summarized as follows:
- A new Efficient Pyramid Split Attention (EPSA) block is proposed. It effectively extracts multi-scale spatial information at a finer granularity and develops long-range channel dependency. The proposed EPSA block is flexible and scalable, so it can be applied to a wide variety of network architectures for many computer vision tasks.
- A new backbone architecture named EPSANet is proposed. It learns richer multi-scale feature representations and adaptively recalibrates the cross-dimension channel-wise attention weights.
- Extensive experiments show that the proposed EPSANet achieves good results in image classification, object detection, and instance segmentation on the ImageNet and COCO datasets.
2 Related Work
Attention mechanism. Attention mechanisms are widely used to strengthen the allocation of the most informative feature expressions while suppressing the less useful ones, so that the model adaptively attends to the important regions within a context. The Squeeze-and-Excitation (SE) attention in SENet captures channel correlations by selectively modulating the scale of the channels. CBAM enriches the attention map by adding max-pooled features to the channel attention, using large-size kernels. Motivated by CBAM, GSoP proposed a second-order pooling method to extract richer feature aggregations. More recently, the non-local block was proposed to build a dense spatial feature map and capture long-range dependencies through non-local operations. Based on the non-local block, the Double Attention Network (A2Net) introduces a new relation function that embeds attention with spatial information into the feature map. Subsequently, SKNet introduced a dynamic selection attention mechanism that allows each neuron to adaptively adjust its receptive field size based on multiple scales of the input feature map. ResNeSt proposed a similar Split-Attention block that enables attention across groups of feature maps. FcaNet proposed a new multi-spectral channel attention that realizes the pre-processing of the channel attention mechanism in the frequency domain. GCNet introduced a simple spatial attention module and thereby developed a long-range channel dependency. ECANet uses a one-dimensional convolution layer to reduce the redundancy of fully connected layers. DANet adaptively integrates local features with their global dependencies by summing two attention modules from different branches. The above methods either focus on designing more sophisticated attention modules, which inevitably brings a greater computational cost, or cannot establish a long-range channel dependency. Therefore, to further improve efficiency and reduce model complexity, a new attention module named PSA is proposed. It aims to learn attention weights with low model complexity and to effectively integrate local and global attention in order to establish a long-range channel dependency.
Multi-scale feature representation. The ability to represent multi-scale features is essential for various vision tasks, such as instance segmentation, face analysis, object detection, salient object detection, and semantic segmentation. Extracting multi-scale features more efficiently is critical for visual recognition tasks. Embedding an operator capable of multi-scale feature extraction into a CNN yields a more effective feature representation. On the other hand, CNNs naturally learn coarse-to-fine multi-scale features through a stack of convolution operators. Therefore, designing a better convolution operator is the key to improving the multi-scale representations of CNNs.
3 Method
3.1 Revisiting Channel Attention
Channel attention. The channel attention mechanism allows the network to selectively weight the importance of each channel and thus produce more informative outputs. Let $X \in \mathbb{R}^{C \times H \times W}$ denote the input feature map, where H, W, and C denote its height, width, and number of input channels, respectively. The SE block consists of two parts, squeeze and excitation, which are designed to encode the global information and to adaptively recalibrate the channel-wise relationships, respectively. Generally, channel-wise statistics can be generated with global average pooling, which embeds the global spatial information into a channel descriptor. The global average pooling operator is computed as:
$$g_c = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} x_c(i,j)$$
The attention weight of the c-th channel in the SE block can be written as:
$$w_c = \sigma(W_1\,\delta(W_0(g_c)))$$
where $\delta$ denotes the Rectified Linear Unit (ReLU) operation, and $W_0 \in \mathbb{R}^{C \times \frac{C}{r}}$ and $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ denote the fully connected (FC) layers. With two fully connected layers, the linear information among channels can be combined more efficiently, which helps the interaction between high and low channel-dimension information. The symbol $\sigma$ denotes the excitation function; in practice a Sigmoid function is usually used. With the excitation function, weights can be assigned to the channels after they interact, so that information is extracted more efficiently. The process of generating the channel attention weights described above is named the SEWeight module, and its diagram is shown in Figure 2.
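The squeeze and excitation steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `se_weight`, the random weights, the tensor sizes, and the omission of bias terms are choices made for the example, not details taken from the paper or its code.

```python
import numpy as np

def se_weight(x, w0, w1):
    """Channel attention weights for a feature map x of shape (C, H, W)."""
    # Squeeze: global average pooling gives one descriptor g_c per channel.
    g = x.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: FC (C -> C/r) followed by ReLU.
    hidden = np.maximum(g @ w0, 0.0)             # shape (C/r,)
    # Second FC (C/r -> C) followed by a sigmoid, so each weight lies in [0, 1].
    return 1.0 / (1.0 + np.exp(-(hidden @ w1)))  # shape (C,)

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                  # illustrative sizes (assumption)
x = rng.standard_normal((C, H, W))
w0 = rng.standard_normal((C, C // r))    # plays the role of W_0
w1 = rng.standard_normal((C // r, C))    # plays the role of W_1
w = se_weight(x, w0, w1)                 # one attention weight per channel
```

An all-zero input gives a weight of 0.5 for every channel, since the sigmoid of zero is 0.5; this is a quick sanity check on the excitation path.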
The motivation of this work is to build a more efficient channel attention mechanism. Therefore, a new Pyramid Split Attention (PSA) module is proposed. As shown in Figure 3, the PSA module is implemented in four steps. First, the proposed Split and Concat (SPC) module produces a channel-wise multi-scale feature map. Second, the SEWeight module extracts attention from the feature maps at different scales, giving channel-wise attention vectors. Third, Softmax recalibrates the channel-wise attention vectors, giving the recalibrated weights of the multi-scale channels. Fourth, an element-wise product is applied to the recalibrated weights and the corresponding feature maps. Finally, a refined feature map with richer multi-scale feature information is obtained as the output.
As shown in Figure 4, the basic operator for multi-scale feature extraction in the proposed PSA is SPC. The input feature map X is split into S parts, denoted by $[X_0, X_1, \cdots, X_{S-1}]$. Each split part has $C' = \frac{C}{S}$ channels, and the i-th feature map is $X_i \in \mathbb{R}^{C' \times H \times W}$ with $i = 0, 1, \cdots, S-1$. Note that C should be divisible by S. With this split, the input tensor can be processed in parallel at multiple scales, so that each branch yields a feature map produced by a single type of kernel. Correspondingly, the spatial information on each channel-wise feature map can be extracted. Using multi-scale convolution kernels in a pyramid structure generates different spatial resolutions and depths. Each split part learns multi-scale spatial information independently and establishes cross-channel interaction in a local manner.
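The split step can be illustrated as follows. This is a minimal NumPy sketch; the helper name `split_channels` and the tensor sizes are assumptions made for the example.

```python
import numpy as np

def split_channels(x, s):
    """Split a (C, H, W) feature map into s parts of C // s channels each."""
    c = x.shape[0]
    if c % s != 0:
        # The text requires C to be divisible by S.
        raise ValueError(f"C={c} must be divisible by S={s}")
    return np.split(x, s, axis=0)

# Illustrative input: C = 64 channels, 8 x 8 spatial size, S = 4 parts.
x = np.arange(64 * 8 * 8, dtype=np.float64).reshape(64, 8, 8)
parts = split_channels(x, 4)   # list of 4 arrays, each of shape (16, 8, 8)
```

Concatenating the parts along the channel axis recovers the original tensor, so the split itself loses no information.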
However, the number of parameters grows considerably as the kernel size increases. To process the input tensor at different kernel scales without increasing the computational cost, group convolution is introduced and applied to the convolution kernels. Further, we design a new criterion for choosing the group size without increasing the number of parameters. The relationship between the multi-scale kernel size and the group size can be written as:
$$G = 2^{\frac{K-1}{2}}$$
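where K is the kernel size and G is the group size. Our ablation experiments verify the above formula, in particular for the case where $k \times k$ equals $3 \times 3$ and the default value of G is 1. Finally, the multi-scale feature map generation function is given by: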
$$F_i = \mathrm{Conv}(k_i \times k_i, G_i)(X_i), \quad i = 0, 1, 2, \cdots, S-1$$
where the i-th kernel size is $k_i = 2 \times (i+1) + 1$, the i-th group size is $G_i = 2^{\frac{k_i - 1}{2}}$, and $F_i \in \mathbb{R}^{C' \times H \times W}$, with $C' = C/S$ channels per split, denotes the feature maps at different scales. The whole multi-scale pre-processed feature map is obtained by concatenation:
$$F = \mathrm{Cat}([F_0, F_1, \cdots, F_{S-1}])$$
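To make the kernel-size and group-size rules concrete, the sketch below evaluates them for S = 4 branches and compares the weight count of each branch with and without grouping. The helper names and the choice of 16 channels per split are assumptions for illustration. Evaluated literally, the formula gives G = 2 for K = 3, whereas the text mentions a default of G = 1 for the 3 × 3 case; the sketch just evaluates the formula as written.

```python
def kernel_size(i):
    # k_i = 2 * (i + 1) + 1
    return 2 * (i + 1) + 1

def group_size(k):
    # G = 2 ** ((K - 1) / 2); K is odd, so the exponent is an integer.
    return 2 ** ((k - 1) // 2)

def conv_params(k, c_in, c_out, groups):
    # Number of weights in a k x k grouped convolution without bias.
    return k * k * (c_in // groups) * c_out

S = 4
c_split = 16  # hypothetical channels per split, e.g. C = 64 and S = 4
kernels = [kernel_size(i) for i in range(S)]   # 3, 5, 7, 9
groups = [group_size(k) for k in kernels]      # 2, 4, 8, 16
dense = [conv_params(k, c_split, c_split, 1) for k in kernels]
grouped = [conv_params(k, c_split, c_split, g) for k, g in zip(kernels, groups)]
```

With these illustrative sizes the ungrouped weight count grows from 2304 (3 × 3) to 20736 (9 × 9), while the grouped count stays between 1152 and 1600 across all four branches. This is the effect the group-size criterion is meant to achieve.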
Here, $F \in \mathbb{R}^{C \times H \times W}$ is the resulting multi-scale feature map. By extracting the channel attention weight information from the multi-scale pre-processed feature map, the attention weight vectors at different scales are obtained. Mathematically, the attention weight vector can be expressed as:
$$Z = Z_0 \oplus Z_1 \oplus \cdots \oplus Z_{S-1}$$
where $\oplus$ is the concatenation operator, $Z_i$ is the attention value obtained from $F_i$, and Z is the multi-scale attention weight vector. Guided by the compact feature descriptor $Z_i$, soft attention is used across channels to adaptively select different spatial scales. The soft assignment weight is given by:
$$att_i = \mathrm{Softmax}(Z_i) = \frac{\exp(Z_i)}{\sum_{j=0}^{S-1}\exp(Z_j)}$$
where Softmax is used to obtain the recalibrated weight $att_i$ of the multi-scale channels, which contains all the location information in space and the attention weights in the channel. By doing so, the interaction between local and global channel attention is realized. Next, the recalibrated channel attention weights are fused and spliced by concatenation, so the whole channel attention vector is obtained as:
$$att = att_0 \oplus att_1 \oplus \cdots \oplus att_{S-1}$$
where att denotes the multi-scale channel weights after the attention interaction. Then, the recalibrated multi-scale channel attention weights $att_i$ are multiplied with the feature maps $F_i$ of the corresponding scale:
$$Y_i = F_i \odot att_i, \quad i = 0, 1, 2, \cdots, S-1$$
where $\odot$ denotes channel-wise multiplication and $Y_i$ denotes the feature map with the resulting multi-scale channel-wise attention weights. The concatenation operator is more effective than summation because it maintains the feature representation in full without destroying the information of the original feature maps. In summary, the process of obtaining the refined output can be written as:
$$Out = \mathrm{Cat}([Y_0, Y_1, \cdots, Y_{S-1}])$$
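The last three steps (Softmax across scales, channel-wise reweighting, and concatenation) can be sketched as below. This is a hedged illustration, not the authors' implementation: the function name `psa_reweight` is hypothetical, the branch feature maps and attention values are random stand-ins for the outputs of the grouped convolutions and the SEWeight module, and the Softmax is taken over the S scales, following the sum over the scale index in the formula.

```python
import numpy as np

def psa_reweight(feats, z):
    """feats: (S, C', H, W) multi-scale maps F_i; z: (S, C') attention values Z_i."""
    # Softmax over the scale axis: for each channel position the S weights sum to 1.
    z = z - z.max(axis=0, keepdims=True)         # shift for numerical stability
    e = np.exp(z)
    att = e / e.sum(axis=0, keepdims=True)       # (S, C')
    # Y_i = F_i * att_i, with each weight broadcast over the H x W positions.
    y = feats * att[:, :, None, None]
    # Out = Cat([Y_0, ..., Y_{S-1}]) along the channel axis: (S * C', H, W).
    s, c, h, w = feats.shape
    return y.reshape(s * c, h, w), att

rng = np.random.default_rng(0)
S, c_split, H, W = 4, 2, 3, 3                    # illustrative sizes (assumption)
feats = rng.standard_normal((S, c_split, H, W))  # stand-in for F_0 ... F_{S-1}
z = rng.standard_normal((S, c_split))            # stand-in for Z_0 ... Z_{S-1}
out, att = psa_reweight(feats, z)
```

Merging the scale and channel axes with `reshape` places the channels of $Y_0$ first, then $Y_1$, and so on, which matches concatenation along the channel dimension.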