In-depth understanding of the SE module of the attention mechanism in CV
2022-07-08 02:18:00 【Strawberry sauce toast】
A summary of attention mechanisms in CV (Part 1): the SE module
Squeeze-and-Excitation Networks
Paper link: Squeeze-and-Excitation Networks
1. Abstract
In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels.
The SE module belongs to the family of channel attention mechanisms: it adaptively learns the dependencies between different channels.
2. A detailed look at the SE module
The illustration of the SE module given in the original paper is as follows:
Combining this with the contents of Section 3 of the paper, the following two questions are examined in detail:
- How does the SE module learn the dependencies between different channels?
- How does the SE module use channel information to guide the model to weight features differently during learning?
2.1 Multiple input and multiple output channels
Part ① of Figure 1 depicts a convolutional layer with multiple input and multiple output channels.
Multiple input channels: each channel of the input feature map corresponds to a two-dimensional convolution kernel, and the sum of the convolution results over all input channels gives the final convolution result, as shown below (for simplicity, the bias is omitted):
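Following the notation of Eq. (1) of the paper (with the uppercase channel indices used in this post), the convolution producing output channel $C$ can be written as
$\bold u_C=\bold v_C * \bold X=\sum_{S}\bold v_C^S * \bold x^S$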
In the formula, $C$ denotes the $C$-th output channel and $S$ denotes the $S$-th input channel.
Each input channel corresponds to a two-dimensional convolution kernel, so the number of channels of a three-dimensional convolution kernel equals the number of channels of the input feature map.
2.2 Multiple output channels
Each output channel corresponds to an independent three-dimensional convolution kernel, so the number of channels of the output feature map equals the number of three-dimensional kernels. The number of output channels is usually a hyperparameter.
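These shape relations can be checked directly with PyTorch's nn.Conv2d (a quick sketch; the channel counts below are arbitrary):

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, bias=False)
# Weight shape: (out_channels, in_channels, kH, kW) = (8, 3, 3, 3),
# i.e. 8 three-dimensional kernels (one per output channel), and each
# kernel has as many channels as the input feature map.
print(conv.weight.shape)  # torch.Size([8, 3, 3, 3])
```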
From the way multi-input, multi-output convolution works, it is not hard to see that in conventional convolution the correlations between the input channels are buried inside each output channel through nothing more than a simple summation; the different output channels correspond to independent three-dimensional kernels, so the correlations between input channels are not exploited in any principled way.
The authors therefore propose the SE module to make explicit use of the information shared across input channels.
2.3 Squeeze-and-Excitation Block
2.3.1 Squeeze: Global Information Embedding
The authors use global average pooling (Global Average Pooling) to summarize the information of each channel:
$z_c=\bold F_{sq}(\bold u_c)=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}u_c(i,j)$
Why do this? The original paper explains:
Each of the learned filters operates with a local receptive field and consequently each unit of the transformation output $U$ is unable to exploit contextual information outside of this region.
On an $H\times W$ feature map, each element corresponds only to a local region of the input feature map (its receptive field), so each element of the output feature map carries only local rather than global information.
To mitigate this problem, we propose to squeeze global spatial information into a channel descriptor. This is achieved by using global average pooling to generate channel-wise statistics.
The authors use global average pooling to obtain a global descriptor, i.e. to fuse local information into global information per channel. Global average pooling is chosen because it is simple to implement; other, more refined but more complex aggregation operations could also be used.
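As a minimal sketch (the tensor shapes are illustrative, not from the post), the squeeze step is just a per-channel spatial mean:

```python
import torch

u = torch.randn(2, 64, 32, 32)   # (batch, C, H, W): output of some convolution
z = u.mean(dim=(2, 3))           # shape (2, 64): one scalar z_c per channel
# Equivalent to nn.AdaptiveAvgPool2d((1, 1)) followed by flattening,
# which is how the implementations later in this post do it.
```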
2.3.2 Excitation: Adaptive Recalibration
The Excitation module is designed to capture the dependencies between the channels as well as possible, and it must satisfy two requirements:
- it must be able to learn a nonlinear interaction between the channels;
- it must give every channel a corresponding output, i.e. a soft assignment of weights rather than a one-hot vector.
Therefore, the authors use two fully connected layers to learn the nonlinear relationship, followed by a sigmoid activation function.
To reduce the number of parameters and the model complexity, the fully connected layers follow a "bottleneck" design, which introduces a reduction-ratio hyperparameter $r$; the paper sets $r=16$.
$\color{red}{\text{Some thoughts on why the sigmoid function is used}}$
Sigmoid is one of the common activation functions. The final output of the SE module amounts to a learned weight for each channel. First, the weights should not be driven to exactly 0, because zero weights would throw away a lot of information, so ReLU cannot be used. Second, what is wanted here are weights in the range $[0,1]$ that do not single out one channel at the expense of the others; the situation resembles a "multi-label classification" problem rather than a "multi-class classification" problem, so softmax is not appropriate either.
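A tiny illustration of the difference, using hypothetical channel scores:

```python
import torch

w = torch.tensor([2.0, 2.0, -1.0])  # hypothetical pre-activation channel scores
print(torch.sigmoid(w))             # ~[0.88, 0.88, 0.27]: each channel weighted independently in (0, 1)
print(torch.softmax(w, dim=0))      # ~[0.49, 0.49, 0.02]: channels compete and the weights sum to 1
```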
The Excitation module is expressed by the following formula:
$s=\bold F_{ex}(\bold z, \bold W)=\sigma(g(\bold z,\bold W))=\sigma(\bold W_2\delta(\bold W_1 \bold z))$
Here, $\delta(\bullet)$ denotes the ReLU activation function and $\sigma(\bullet)$ denotes the sigmoid activation function.
2.3.3 Weighting
Finally, the output of the SE module is applied to the output of the convolutional layer, yielding an output feature map weighted by channel attention.
The channel-wise vector obtained above is used to weight every element of the corresponding channel of the feature map (once formula (4) is understood, this is simply the product of a scalar and a matrix).
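For reference, formula (4) of the paper is:
$\tilde{\bold x}_c=\bold F_{scale}(\bold u_c, s_c)=s_c\bold u_c$
where $s_c$ is the learned weight of channel $c$ and $\bold u_c$ is the corresponding channel of the convolution output.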
3. Using the SE module: a PyTorch implementation
3.1 Implementing Excitation with fully connected layers
import torch
import torch.nn as nn


class SE(nn.Module):
    def __init__(self, channels, reduction=16):  # r = 16 as in the paper; if the feature map has few channels, it can be reduced accordingly
        super(SE, self).__init__()
        # Squeeze: global average pooling, one scalar per channel
        self.squeeze = nn.AdaptiveAvgPool2d((1, 1))
        # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.squeeze(x).view(b, c)            # (b, c, 1, 1) -> (b, c)
        y = self.excitation(y).view(b, c, 1, 1)   # channel weights in (0, 1)
        return x * y.expand_as(x)                 # reweight each channel of x
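A quick usage check of the class above (the input shape is an arbitrary example):

```python
x = torch.randn(8, 64, 32, 32)      # (batch, channels, H, W)
se = SE(channels=64, reduction=16)
out = se(x)
print(out.shape)                    # torch.Size([8, 64, 32, 32]): same shape, channels reweighted
```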
3.2 Implementing Excitation with $1\times 1$ convolutions
Using $1\times 1$ convolutions in place of the fully connected layers avoids the reshaping between feature maps and vectors.
class SE(nn.Module):
    def __init__(self, channels, reduction=2):
        super(SE, self).__init__()
        self.squeeze = nn.AdaptiveAvgPool2d((1, 1))
        # 1x1 convolutions operate on the (b, c, 1, 1) tensor directly,
        # so no view()/reshape between 4-D and 2-D tensors is needed
        self.excitation = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, stride=1, padding=0, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, stride=1, padding=0, bias=False),
            nn.Sigmoid())

    def forward(self, x):
        y = self.squeeze(x)       # (b, c, H, W) -> (b, c, 1, 1)
        y = self.excitation(y)    # (b, c, 1, 1), channel weights in (0, 1)
        return x * y              # broadcasting reweights each channel of x
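Finally, a minimal sketch of how the SE block is most commonly used in practice: inserted into the residual branch of a ResNet-style block, before the addition with the identity shortcut (as in SE-ResNet). The block below is my own simplified illustration, not code from the paper or from this post; `SE` refers to either implementation above, and `BasicBlockSE` and its layer sizes are hypothetical.

```python
import torch.nn as nn


class BasicBlockSE(nn.Module):
    """A simplified residual block with an SE block on its residual branch."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.se = SE(channels, reduction)   # channel attention on the residual branch
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.se(out)                  # reweight channels before the skip connection
        return self.relu(out + x)           # identity shortcut, then final activation
```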