Understanding Attention Mechanisms in CV in Depth: CBAM
2022-07-08 02:18:00 【Strawberry sauce toast】
A Summary of Attention Mechanisms in CV (Part 2): CBAM
CBAM: Convolutional Block Attention Module
Paper link: CBAM (ECCV 2018)
1. Abstract
1.1 CBAM Summary
Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement.
Unlike the SE module, CBAM combines channel attention with spatial attention. The authors argue that channel attention determines "what" is important, while spatial attention determines "where" is important.
1.2 The Role of Attention Mechanisms in CV
In addition, the Introduction concisely states the role of the attention mechanism:
Attention not only tells where to focus, it also improves the representation of interests.
Our goal is to increase representation power by using attention mechanism: focusing on important features and suppressing unnecessary ones.
In short, attention mechanisms improve the feature representation ability of the network.
1.3 Advantages of the CBAM Module
CBAM has the following two advantages:
- Compared with SE, it improves the channel attention module and adds a spatial attention module;
- Compared with BAM, it is not limited to the bottleneck; it can be inserted after any intermediate convolutional block, making it a plug-and-play attention module.
2. Module Details
The structure of the CBAM module is shown in the figure below:
The following, based on Section 3 of the paper, elaborates on the implementation details of the CBAM module.
2.1 Channel Attention Module: Focusing on "What"
The difference from the SE module is that the authors add a max-pooling branch, and the AvgPool and MaxPool branches share the same multi-layer perceptron (MLP) to reduce the number of learnable parameters.
Therefore, the channel attention of CBAM can be expressed by the following formula:
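$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$
where $F$ is the input feature map, $\sigma$ denotes the sigmoid function, and the shared MLP consists of a dimensionality-reduction layer followed by a dimensionality-restoration layer.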
2.2 Spatial Attention Module: Focusing on "Where"
First, average pooling and max pooling are applied along the channel dimension, each producing an $H \times W$ feature map. The two maps are concatenated, and a convolutional layer with 2 input channels and 1 output channel extracts the spatial attention. The formula is as follows:
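$$M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F);\,\mathrm{MaxPool}(F)])\big)$$
where $f^{7\times 7}$ denotes a convolution with a $7\times 7$ kernel and $[\,\cdot\,;\,\cdot\,]$ denotes concatenation along the channel dimension.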
2.3 Arrangement of attention modules
Channel attention and spatial attention are combined to obtain the refined, re-weighted feature map.
The possible orders and modes of combining channel and spatial attention (as shown in Figure 1) are:
- channel attention first, then spatial attention
- spatial attention first, then channel attention
- sequential arrangement
- parallel arrangement
The authors compare these orders and modes through ablation experiments; the sequential arrangement with channel attention applied first performs best.
2.4 Usage
Combination with a residual network: CBAM is inserted into each residual block and applied to the output of the convolutional branch before the addition with the identity shortcut (see the sketch below).
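Below is a minimal sketch of this placement, not the authors' official code: the class name BasicBlockCBAM and the layer configuration are illustrative assumptions, and the CBAM module it uses is the one implemented in Section 3.

from torch import nn

# Illustrative sketch only: a ResNet-style basic block with CBAM applied to the
# convolutional branch before the identity shortcut is added. `CBAM` is the module
# implemented in Section 3 below; the class name and layer configuration here are
# assumptions for demonstration.
class BasicBlockCBAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.cbam = CBAM(channels, reduction=16, kernel_size=7, padding=3)

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.cbam(out)               # refine features before the residual addition
        out = self.relu(out + identity)
        return out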
3. PyTorch Implementation
import torch
from torch import nn


class ChannelAttentionModule(nn.Module):
    """Channel attention: a shared MLP over average- and max-pooled channel descriptors."""
    def __init__(self, channel, reduction=16):
        super(ChannelAttentionModule, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.max_pool = nn.AdaptiveMaxPool2d((1, 1))
        # 1x1 convolutions act as the shared MLP (reduce channels, then restore them)
        self.shared_MLP = nn.Sequential(
            nn.Conv2d(channel, channel // reduction, kernel_size=1, stride=1, padding=0, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channel // reduction, channel, kernel_size=1, stride=1, padding=0, bias=False)
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.shared_MLP(self.avg_pool(x))
        max_out = self.shared_MLP(self.max_pool(x))
        out = avg_out + max_out
        return self.sigmoid(out)  # channel attention map, shape (B, C, 1, 1)


class SpatialAttentionModule(nn.Module):
    """Spatial attention: a convolution over channel-wise average and max maps."""
    def __init__(self, kernel_size=7, padding=3):
        super(SpatialAttentionModule, self).__init__()
        self.conv2d = nn.Conv2d(in_channels=2, out_channels=1,
                                kernel_size=kernel_size, stride=1, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)    # (B, 1, H, W)
        max_out, _ = torch.max(x, dim=1, keepdim=True)  # torch.max returns (values, indices)
        out = torch.cat([avg_out, max_out], dim=1)      # (B, 2, H, W)
        out = self.conv2d(out)
        return self.sigmoid(out)  # spatial attention map, shape (B, 1, H, W)


class CBAM(nn.Module):
    """CBAM: channel attention followed by spatial attention."""
    def __init__(self, channel, reduction=16, kernel_size=7, padding=3):
        super(CBAM, self).__init__()
        self.channel_attention = ChannelAttentionModule(channel, reduction)
        self.spatial_attention = SpatialAttentionModule(kernel_size, padding)

    def forward(self, x):
        out = self.channel_attention(x) * x
        out = self.spatial_attention(out) * out
        return out
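A minimal usage sketch (the input tensor shape is an arbitrary example):

if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)  # (batch, channels, height, width)
    cbam = CBAM(channel=64, reduction=16, kernel_size=7, padding=3)
    y = cbam(x)
    print(y.shape)  # torch.Size([2, 64, 32, 32]): same shape as the input

Because the attention maps are applied multiplicatively, the output keeps the input shape, which is what makes the module plug-and-play.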