Improving YOLOX, Part 1: Adding CBAM, SE, and ECA Attention Mechanisms
2022-07-27 12:54:00 [AI Algorithm Research Institute]
Preface: An earlier series covered improvements to YOLOv5 (released in 2020), and many readers have since asked how to improve YOLOX. This series walks through YOLOX modifications in detail; the process is largely the same as for YOLOv5, with only minor differences. The aim is to offer a modest reference both to students who need novelty for their research and to engineers who want better results on their projects.
For more code, materials, and Q&A, follow the WeChat official account: 人工智能AI算法工程师 (AI Algorithm Engineer).
Problem addressed: Taking the CBAM channel-plus-spatial attention mechanism as the main example, attention lets the network focus on the targets to be detected, improving detection performance and reducing missed and false detections against complex backgrounds.

How to add it:
Step 1: Choose the insertion point. As a plug-and-play module, an attention block can be inserted anywhere in the YOLOX network; this article uses insertion around the Conv modules as an example.
Step 2: Build the CBAM module (plus the SE and ECA variants) in darknet.py.
```python
import math

import torch
import torch.nn as nn


class SE(nn.Module):
    def __init__(self, channel, ratio=16):
        super(SE, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // ratio, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // ratio, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y
```
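SE is a purely channel-wise reweighting (each channel is multiplied by a gate in (0, 1)), so the output tensor must have exactly the same shape as the input. A minimal sanity check, repeating the SE definition so the snippet runs on its own:

```python
import torch
import torch.nn as nn

class SE(nn.Module):
    """Squeeze-and-Excitation: global average pool -> bottleneck MLP -> sigmoid gate."""
    def __init__(self, channel, ratio=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // ratio, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // ratio, channel, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)   # (b, c): one scalar per channel
        y = self.fc(y).view(b, c, 1, 1)   # channel gates in (0, 1)
        return x * y                      # broadcast over H and W

x = torch.randn(2, 64, 8, 8)
out = SE(64)(x)
print(out.shape)  # torch.Size([2, 64, 8, 8])
```

Because the gates lie in (0, 1), the output is never larger in magnitude than the input, which is an easy property to verify when debugging an insertion.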
```python
class ECA(nn.Module):
    def __init__(self, channel, b=1, gamma=2):
        super(ECA, self).__init__()
        kernel_size = int(abs((math.log(channel, 2) + b) / gamma))
        kernel_size = kernel_size if kernel_size % 2 else kernel_size + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size,
                              padding=(kernel_size - 1) // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        y = self.sigmoid(y)
        return x * y.expand_as(x)
```
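ECA replaces SE's fully connected bottleneck with a cheap 1-D convolution across channels, and the kernel size adapts to the channel count: k = |log2(C) + b| / gamma, rounded down and then forced odd. A small demo of the sizes this rule produces for typical YOLOX channel widths:

```python
import math

def eca_kernel_size(channel, b=1, gamma=2):
    """Adaptive 1-D kernel size used by ECA: odd value derived from log2(channels)."""
    k = int(abs((math.log(channel, 2) + b) / gamma))
    return k if k % 2 else k + 1

sizes = {c: eca_kernel_size(c) for c in (64, 256, 512, 1024)}
print(sizes)  # {64: 3, 256: 5, 512: 5, 1024: 5}
```

Wider layers thus get a slightly larger receptive field over the channel dimension, while the parameter count stays tiny (a single 1-D kernel) regardless of C.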
```python
class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=8):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # use 1x1 convolutions instead of fully connected layers
        self.fc1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        out = avg_out + max_out
        return self.sigmoid(out)
```
```python
class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), 'kernel size must be 3 or 7'
        padding = 3 if kernel_size == 7 else 1
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        return self.sigmoid(x)
```
```python
# CBAM attention: channel attention followed by spatial attention
class CBAM(nn.Module):
    def __init__(self, channel, ratio=8, kernel_size=7):
        super(CBAM, self).__init__()
        self.channelattention = ChannelAttention(channel, ratio=ratio)
        self.spatialattention = SpatialAttention(kernel_size=kernel_size)

    def forward(self, x):
        x = x * self.channelattention(x)
        x = x * self.spatialattention(x)
        return x
```

Step 3: Register the new CBAM modules in yolo_pafpn.py.
```python
# in YOLOPAFPN.__init__(): one CBAM per FPN input scale
self.cbam_1 = CBAM(int(in_channels[2] * width))  # dark5 output, 1024 channels
self.cbam_2 = CBAM(int(in_channels[1] * width))  # dark4 output, 512 channels
self.cbam_3 = CBAM(int(in_channels[0] * width))  # dark3 output, 256 channels

def forward(self, input):
    """
    Args:
        inputs: input images.
    Returns:
        Tuple[Tensor]: FPN features.
    """
    # backbone
    out_features = self.backbone(input)
    features = [out_features[f] for f in self.in_features]
    [x2, x1, x0] = features
    # apply attention directly to the input feature maps
    x0 = self.cbam_1(x0)
    x1 = self.cbam_2(x1)
    x2 = self.cbam_3(x2)
```

Results: I have run extensive experiments on multiple datasets. The gains vary by dataset, and even on the same dataset the effect depends on where the modules are inserted, so you will need to experiment yourself; in most cases there is an improvement.
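Since all three sub-modules only reweight features, CBAM preserves feature-map shape and can therefore sit in front of any of the three FPN inputs without touching the rest of the network. A self-contained sketch verifying this (module definitions repeated so it runs stand-alone; the 80/40/20 spatial sizes assume a 640x640 input and width 1.0):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=8):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.fc1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))
        return self.sigmoid(avg_out + max_out)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        return self.sigmoid(self.conv1(torch.cat([avg_out, max_out], dim=1)))

class CBAM(nn.Module):
    def __init__(self, channel, ratio=8, kernel_size=7):
        super().__init__()
        self.channelattention = ChannelAttention(channel, ratio=ratio)
        self.spatialattention = SpatialAttention(kernel_size=kernel_size)

    def forward(self, x):
        x = x * self.channelattention(x)
        return x * self.spatialattention(x)

# one CBAM per FPN scale, matching the dark3/dark4/dark5 channel counts
shapes = [(1, 256, 80, 80), (1, 512, 40, 40), (1, 1024, 20, 20)]
outs = [CBAM(s[1])(torch.randn(*s)).shape for s in shapes]
print(outs)  # shapes are unchanged
```

If a shape mismatch error appears after inserting CBAM, the channel count passed to its constructor does not match the feature map it receives, which is the most common integration mistake.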
PS: CBAM and the other attention mechanisms are not limited to YOLOX. They can be added to any other deep network, whether for classification, detection, or segmentation, mainly in computer vision, and may bring varying degrees of improvement.
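To illustrate that plug-and-play point outside of YOLOX, here is a hypothetical minimal classifier (the `TinyClassifier` name and its layer sizes are made up for this sketch) with the SE block from above dropped in between backbone and head; SE is repeated so the example runs on its own:

```python
import torch
import torch.nn as nn

class SE(nn.Module):
    def __init__(self, channel, ratio=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // ratio, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // ratio, channel, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.fc(self.avg_pool(x).view(b, c)).view(b, c, 1, 1)
        return x * y

class TinyClassifier(nn.Module):
    """Hypothetical classifier: attention slots in between backbone and head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.attn = SE(64)  # plug-and-play: swap in ECA or CBAM the same way
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes)
        )

    def forward(self, x):
        return self.head(self.attn(self.features(x)))

logits = TinyClassifier()(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 10])
```

Because the attention module preserves shape, swapping SE for ECA or CBAM requires changing only the one `self.attn = ...` line.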
Finally, I hope we can follow each other, make friends, and learn and exchange ideas together.