An Analysis of PANet, the Enhanced Feature Extraction Network in YOLOX
2022-07-02 22:13:00 【牧羊女说】
In the previous article I covered YOLOX's CSPDarknet network; see "YOLOX backbone: implementing CSPDarknet" for details.
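CSPDarknet produces outputs at three levels: dark5 (20x20x1024), dark4 (40x40x512), and dark3 (80x80x256); these sizes correspond to a 640x640 input. The three outputs are fed into PANet, an enhanced feature extraction network, for further feature extraction; see the part marked with the red box in the figure below: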
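The sizes above follow from the downsampling strides of the three levels. Here is a minimal sketch, assuming strides of 8, 16 and 32 for dark3, dark4 and dark5 and base channel counts of 256, 512 and 1024; the helper name `backbone_output_shapes` is hypothetical and not part of YOLOX:

```python
def backbone_output_shapes(input_size=640, width=1.0):
    """Return {level: (height, width, channels)} for dark3/dark4/dark5.

    Assumptions: strides 8/16/32, base channels 256/512/1024 scaled by
    `width`, and an input size divisible by 32.
    """
    strides = {"dark3": 8, "dark4": 16, "dark5": 32}
    base_channels = {"dark3": 256, "dark4": 512, "dark5": 1024}
    shapes = {}
    for name, stride in strides.items():
        side = input_size // stride
        shapes[name] = (side, side, int(base_channels[name] * width))
    return shapes


for name, shape in backbone_output_shapes().items():
    print(name, shape)
```

With the defaults this prints 80x80x256, 40x40x512 and 20x20x1024 for dark3, dark4 and dark5, matching the sizes quoted above.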
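The basic idea of PANet is to upsample the deep features and fuse them with the shallow features (steps 1 to 6 in the figure), then downsample the fused shallow features and fuse them with the deep features again (steps 6 to 10 in the figure).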
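The two passes can be traced on shapes alone. The sketch below is only an illustration of the data flow: feature maps are represented by `(channels, height, width)` tuples, no real convolution is computed, a plain `conv` stands in for the fusion block, and all function names are hypothetical rather than taken from YOLOX.

```python
def conv(shape, out_channels, stride=1):
    """Shape after a 'same'-padded convolution (sizes assumed divisible by stride)."""
    _, h, w = shape
    return (out_channels, h // stride, w // stride)


def upsample(shape, scale=2):
    """Shape after nearest-neighbour upsampling by `scale`."""
    c, h, w = shape
    return (c, h * scale, w * scale)


def concat(a, b):
    """Channel-wise concatenation; spatial sizes must match."""
    assert a[1:] == b[1:], "spatial sizes differ"
    return (a[0] + b[0], a[1], a[2])


def panet_shapes(dark3, dark4, dark5):
    # Top-down pass: deep features are upsampled and fused into shallow ones.
    p5 = conv(dark5, dark4[0])                          # reduce channels
    p4 = conv(concat(upsample(p5), dark4), dark4[0])    # fuse at the middle level
    p4_reduced = conv(p4, dark3[0])                     # reduce channels again
    out3 = conv(concat(upsample(p4_reduced), dark3), dark3[0])

    # Bottom-up pass: fused shallow features are downsampled and fused back.
    out4 = conv(concat(conv(out3, dark3[0], stride=2), p4_reduced), dark4[0])
    out5 = conv(concat(conv(out4, dark4[0], stride=2), p5), dark5[0])
    return out3, out4, out5


print(panet_shapes((256, 80, 80), (512, 40, 40), (1024, 20, 20)))
# → ((256, 80, 80), (512, 40, 40), (1024, 20, 20))
```

Each output keeps the spatial size and channel count of the backbone level it corresponds to, but now mixes information from both deeper and shallower levels.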
In the official YOLOX implementation, PANet lives in the file yolo_pafpn.py. I have annotated the official code below, following the numbered steps above:
# Imports as in the official yolox/models/yolo_pafpn.py
import torch
import torch.nn as nn

from .darknet import CSPDarknet
from .network_blocks import BaseConv, CSPLayer, DWConv


class YOLOPAFPN(nn.Module):
    """
    YOLOv3 model. Darknet 53 is the default backbone of this model.
    """
    # Note: the docstring above is kept from the source; the class itself builds
    # a CSPDarknet backbone followed by the PAFPN neck.
    # The shape comments below assume a 640x640 input and width=1.0.

    def __init__(
        self,
        depth=1.0,
        width=1.0,
        in_features=("dark3", "dark4", "dark5"),
        in_channels=[256, 512, 1024],
        depthwise=False,
        act="silu",
    ):
        super().__init__()
        self.backbone = CSPDarknet(depth, width, depthwise=depthwise, act=act)
        self.in_features = in_features
        self.in_channels = in_channels
        Conv = DWConv if depthwise else BaseConv

        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")

        # 20x20x1024 -> 20x20x512
        self.lateral_conv0 = BaseConv(
            int(in_channels[2] * width), int(in_channels[1] * width), 1, 1, act=act
        )
        # 40x40x1024 -> 40x40x512
        self.C3_p4 = CSPLayer(
            int(2 * in_channels[1] * width),
            int(in_channels[1] * width),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )  # cat

        # 40x40x512 -> 40x40x256
        self.reduce_conv1 = BaseConv(
            int(in_channels[1] * width), int(in_channels[0] * width), 1, 1, act=act
        )
        # 80x80x512 -> 80x80x256
        self.C3_p3 = CSPLayer(
            int(2 * in_channels[0] * width),  # 2*256
            int(in_channels[0] * width),  # 256
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        # 80x80x256 -> 40x40x256
        self.bu_conv2 = Conv(
            int(in_channels[0] * width), int(in_channels[0] * width), 3, 2, act=act
        )
        # 40x40x512 -> 40x40x512
        self.C3_n3 = CSPLayer(
            int(2 * in_channels[0] * width),  # 2*256
            int(in_channels[1] * width),  # 512
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        # 40x40x512 -> 20x20x512
        self.bu_conv1 = Conv(
            int(in_channels[1] * width), int(in_channels[1] * width), 3, 2, act=act
        )
        # 20x20x1024 -> 20x20x1024
        self.C3_n4 = CSPLayer(
            int(2 * in_channels[1] * width),  # 2*512
            int(in_channels[2] * width),  # 1024
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

    def forward(self, input):
        """
        Args:
            input: input images.
        Returns:
            Tuple[Tensor]: FPN feature.
        """
        # backbone
        out_features = self.backbone(input)
        features = [out_features[f] for f in self.in_features]
        [x2, x1, x0] = features  # x2 = dark3, x1 = dark4, x0 = dark5

        # Step 1: apply a 1x1 convolution to the deepest feature map (dark5)
        # 20x20x1024 -> 20x20x512
        fpn_out0 = self.lateral_conv0(x0)  # 1024->512/32

        # Step 2: upsample the feature map produced in step 1
        # Upsampling, 20x20x512 -> 40x40x512
        f_out0 = self.upsample(fpn_out0)  # 512/16

        # Step 3: concat + CSPLayer
        # 40x40x512 + 40x40x512 -> 40x40x1024
        f_out0 = torch.cat([f_out0, x1], 1)  # 512->1024/16
        # 40x40x1024 -> 40x40x512
        f_out0 = self.C3_p4(f_out0)  # 1024->512/16

        # Step 4: apply a 1x1 convolution to the feature map produced in step 3
        # 40x40x512 -> 40x40x256
        fpn_out1 = self.reduce_conv1(f_out0)  # 512->256/16

        # Step 5: upsample again
        # 40x40x256 -> 80x80x256
        f_out1 = self.upsample(fpn_out1)  # 256/8

        # Step 6: concat + CSPLayer; the result is output to the YOLO head
        # 80x80x256 + 80x80x256 -> 80x80x512
        f_out1 = torch.cat([f_out1, x2], 1)  # 256->512/8
        # 80x80x512 -> 80x80x256
        pan_out2 = self.C3_p3(f_out1)  # 512->256/8

        # Step 7: downsample with a 3x3 stride-2 convolution
        # 80x80x256 -> 40x40x256
        p_out1 = self.bu_conv2(pan_out2)  # 256->256/16

        # Step 8: concat + CSPLayer; the result is output to the YOLO head
        # 40x40x256 + 40x40x256 -> 40x40x512
        p_out1 = torch.cat([p_out1, fpn_out1], 1)  # 256->512/16
        # 40x40x512 -> 40x40x512
        pan_out1 = self.C3_n3(p_out1)  # 512->512/16

        # Step 9: downsample again
        # 40x40x512 -> 20x20x512
        p_out0 = self.bu_conv1(pan_out1)  # 512->512/32

        # Step 10: concat + CSPLayer; the result is output to the YOLO head
        # 20x20x512 + 20x20x512 -> 20x20x1024
        p_out0 = torch.cat([p_out0, fpn_out0], 1)  # 512->1024/32
        # 20x20x1024 -> 20x20x1024
        pan_out0 = self.C3_n4(p_out0)  # 1024->1024/32

        outputs = (pan_out2, pan_out1, pan_out0)
        return outputs
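The constructor scales every channel count with `int(in_channels[i] * width)` and every CSPLayer depth with `round(3 * depth)`, so the shape comments in the code only hold for `width=1.0`. The small sketch below shows what those two expressions give for a few (depth, width) pairs. The pairs are the values I understand the YOLOX-s/m/l/x configurations to use, so treat them as assumptions and check them against the official exp files; the helper name `pafpn_config` is hypothetical.

```python
def pafpn_config(depth, width, in_channels=(256, 512, 1024)):
    """Return (scaled channel counts, bottlenecks per CSPLayer).

    Mirrors the two scaling expressions used in YOLOPAFPN.__init__.
    """
    channels = [int(c * width) for c in in_channels]
    n_bottlenecks = round(3 * depth)
    return channels, n_bottlenecks


# Assumed (depth, width) pairs; verify against the official configs.
variants = {"s": (0.33, 0.50), "m": (0.67, 0.75), "l": (1.0, 1.0), "x": (1.33, 1.25)}
for name, (depth, width) in variants.items():
    print(name, pafpn_config(depth, width))
```

For example, a width of 0.5 halves the three channel counts to 128, 256 and 512, while the spatial sizes (80x80, 40x40, 20x20) stay the same.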
Reference: "Pytorch 搭建自己的YoloX目标检测平台" (Building your own YOLOX object detection platform in PyTorch), a deep learning tutorial by Bubbliiiing on bilibili