YOLOX enhanced feature extraction network (PANet) analysis
2022-07-02 23:22:00 【Said the shepherdess】
In the previous article I went through YOLOX's CSPDarknet network; see the earlier post on the implementation of the YOLOX backbone, CSPDarknet.
CSPDarknet produces three levels of output: dark5 (20x20x1024), dark4 (40x40x512) and dark3 (80x80x256). These three outputs are fed into the enhanced feature extraction network, PANet, for further feature fusion; see the part marked by the red box in the figure below.
The basic idea of PANet is to upsample the deep features and fuse them with the shallower features (steps 1 to 6 annotated in the figure above), then downsample the fused shallow features and fuse them back with the deep features (steps 6 to 10 in the figure above).
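To make the fusion order concrete before reading the official modules, here is a minimal sketch of the same top-down / bottom-up idea on dummy tensors, using only plain PyTorch ops. The 1x1 and stride-2 convolutions below are illustrative stand-ins for YOLOX's lateral_conv0 / reduce_conv1 / bu_conv layers and CSPLayer blocks, not the official implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Dummy backbone outputs: dark3 (80x80x256), dark4 (40x40x512), dark5 (20x20x1024)
d3 = torch.randn(1, 256, 80, 80)
d4 = torch.randn(1, 512, 40, 40)
d5 = torch.randn(1, 1024, 20, 20)

# Top-down path: shrink channels, upsample, concatenate with the shallower level
lat5 = nn.Conv2d(1024, 512, 1)(d5)                                   # 20x20, 512 ch
td4 = torch.cat([F.interpolate(lat5, scale_factor=2), d4], dim=1)    # 40x40, 1024 ch
td4 = nn.Conv2d(1024, 256, 1)(td4)                                   # stand-in for CSPLayer + reduce conv
td3 = torch.cat([F.interpolate(td4, scale_factor=2), d3], dim=1)     # 80x80, 512 ch

# Bottom-up path: stride-2 conv to downsample, concatenate with the deeper level
bu4 = torch.cat([nn.Conv2d(512, 256, 3, stride=2, padding=1)(td3), td4], dim=1)   # 40x40, 512 ch
bu5 = torch.cat([nn.Conv2d(512, 512, 3, stride=2, padding=1)(bu4), lat5], dim=1)  # 20x20, 1024 ch

print(td3.shape, bu4.shape, bu5.shape)
# torch.Size([1, 512, 80, 80]) torch.Size([1, 512, 40, 40]) torch.Size([1, 1024, 20, 20])

In the real network each torch.cat is followed by a CSPLayer that fuses the concatenated features and brings the channel count back down, as the annotated code below shows.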
In the official YOLOX code, PANet is implemented in yolo_pafpn.py. The code below is annotated with the step numbers used in the figure above:
# yolox/models/yolo_pafpn.py
import torch
import torch.nn as nn

from .darknet import CSPDarknet
from .network_blocks import BaseConv, CSPLayer, DWConv


class YOLOPAFPN(nn.Module):
    """
    YOLOv3 model. Darknet 53 is the default backbone of this model.
    """

    def __init__(
        self,
        depth=1.0,
        width=1.0,
        in_features=("dark3", "dark4", "dark5"),
        in_channels=[256, 512, 1024],
        depthwise=False,
        act="silu",
    ):
        super().__init__()
        self.backbone = CSPDarknet(depth, width, depthwise=depthwise, act=act)
        self.in_features = in_features
        self.in_channels = in_channels
        Conv = DWConv if depthwise else BaseConv

        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        # 20x20x1024 -> 20x20x512
        self.lateral_conv0 = BaseConv(
            int(in_channels[2] * width), int(in_channels[1] * width), 1, 1, act=act
        )
        # 40x40x1024 -> 40x40x512
        self.C3_p4 = CSPLayer(
            int(2 * in_channels[1] * width),
            int(in_channels[1] * width),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )  # cat

        # 40x40x512 -> 40x40x256
        self.reduce_conv1 = BaseConv(
            int(in_channels[1] * width), int(in_channels[0] * width), 1, 1, act=act
        )
        # 80x80x512 -> 80x80x256
        self.C3_p3 = CSPLayer(
            int(2 * in_channels[0] * width),  # 2*256
            int(in_channels[0] * width),      # 256
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        # 80x80x256 -> 40x40x256
        self.bu_conv2 = Conv(
            int(in_channels[0] * width), int(in_channels[0] * width), 3, 2, act=act
        )
        # 40x40x512 -> 40x40x512
        self.C3_n3 = CSPLayer(
            int(2 * in_channels[0] * width),  # 2*256
            int(in_channels[1] * width),      # 512
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        # 40x40x512 -> 20x20x512
        self.bu_conv1 = Conv(
            int(in_channels[1] * width), int(in_channels[1] * width), 3, 2, act=act
        )
        # 20x20x1024 -> 20x20x1024
        self.C3_n4 = CSPLayer(
            int(2 * in_channels[1] * width),  # 2*512
            int(in_channels[2] * width),      # 1024
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )
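
    # Note on depth/width: these two arguments rescale the network for the different
    # YOLOX model sizes. The shapes quoted in the comments (e.g. 20x20x1024) assume a
    # 640x640 input and width = 1.0; with the YOLOX-s settings from the official exps
    # (depth = 0.33, width = 0.50), the deepest level has int(1024 * 0.50) = 512
    # channels and each CSPLayer uses round(3 * 0.33) = 1 bottleneck instead of 3.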
    def forward(self, input):
        """
        Args:
            inputs: input images.

        Returns:
            Tuple[Tensor]: FPN feature.
        """

        # backbone
        out_features = self.backbone(input)
        features = [out_features[f] for f in self.in_features]
        [x2, x1, x0] = features

        # Step 1: 1x1 conv on the deepest feature map
        # 20x20x1024 -> 20x20x512
        fpn_out0 = self.lateral_conv0(x0)  # 1024->512/32

        # Step 2: upsample the feature map from step 1
        # 20x20x512 -> 40x40x512
        f_out0 = self.upsample(fpn_out0)  # 512/16

        # Step 3: concat with dark4, then CSPLayer
        # 40x40x512 + 40x40x512 -> 40x40x1024
        f_out0 = torch.cat([f_out0, x1], 1)  # 512->1024/16
        # 40x40x1024 -> 40x40x512
        f_out0 = self.C3_p4(f_out0)  # 1024->512/16

        # Step 4: 1x1 conv on the feature map from step 3
        # 40x40x512 -> 40x40x256
        fpn_out1 = self.reduce_conv1(f_out0)  # 512->256/16

        # Step 5: upsample again
        # 40x40x256 -> 80x80x256
        f_out1 = self.upsample(fpn_out1)  # 256/8

        # Step 6: concat with dark3, then CSPLayer; output goes to the YOLO head
        # 80x80x256 + 80x80x256 -> 80x80x512
        f_out1 = torch.cat([f_out1, x2], 1)  # 256->512/8
        # 80x80x512 -> 80x80x256
        pan_out2 = self.C3_p3(f_out1)  # 512->256/8

        # Step 7: downsample
        # 80x80x256 -> 40x40x256
        p_out1 = self.bu_conv2(pan_out2)  # 256->256/16

        # Step 8: concat + CSPLayer; output goes to the YOLO head
        # 40x40x256 + 40x40x256 -> 40x40x512
        p_out1 = torch.cat([p_out1, fpn_out1], 1)  # 256->512/16
        # 40x40x512 -> 40x40x512
        pan_out1 = self.C3_n3(p_out1)  # 512->512/16

        # Step 9: downsample again
        # 40x40x512 -> 20x20x512
        p_out0 = self.bu_conv1(pan_out1)  # 512->512/32

        # Step 10: concat + CSPLayer; output goes to the YOLO head
        # 20x20x512 + 20x20x512 -> 20x20x1024
        p_out0 = torch.cat([p_out0, fpn_out0], 1)  # 512->1024/32
        # 20x20x1024 -> 20x20x1024
        pan_out0 = self.C3_n4(p_out0)  # 1024->1024/32

        outputs = (pan_out2, pan_out1, pan_out0)
        return outputs
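As a quick sanity check of the shape annotations, the module can be run on a dummy input. This is a minimal sketch, assuming it is executed inside the official YOLOX repository (so that CSPDarknet, CSPLayer, BaseConv and DWConv resolve) with the default depth=1.0 and width=1.0:

import torch

from yolox.models.yolo_pafpn import YOLOPAFPN  # module path in the official repo

model = YOLOPAFPN(depth=1.0, width=1.0).eval()
dummy = torch.randn(1, 3, 640, 640)  # one 640x640 image

with torch.no_grad():
    pan_out2, pan_out1, pan_out0 = model(dummy)

print(pan_out2.shape)  # torch.Size([1, 256, 80, 80])   stride-8 branch
print(pan_out1.shape)  # torch.Size([1, 512, 40, 40])   stride-16 branch
print(pan_out0.shape)  # torch.Size([1, 1024, 20, 20])  stride-32 branch

The three tensors are then consumed by the YOLO head, one branch per stride.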
Reference: "Build your own YOLOX object detection platform with PyTorch" (Bubbliiiing's deep learning course), Bilibili