YOLOX enhanced feature extraction network (PANet) analysis
2022-07-02 23:22:00 【Said the shepherdess】
In the previous article I went through YOLOX's CSPDarknet network; see the earlier post on the implementation of the YOLOX backbone, CSPDarknet.
CSPDarknet produces three levels of output: dark5 (20x20x1024), dark4 (40x40x512) and dark3 (80x80x256). These three outputs are fed into the enhanced feature extraction network, PANet, for further feature fusion; see the part marked by the red box in the figure below.
The basic idea of PANet is to upsample the deep features and fuse them with the shallower features (steps 1 to 6 annotated in the figure above), then downsample the fused shallow features and fuse them back with the deep features (steps 6 to 10 in the figure above).
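To make the fusion order concrete before reading the official modules, here is a minimal sketch of the same top-down / bottom-up idea on dummy tensors, using only plain PyTorch ops. The 1x1 and stride-2 convolutions below are illustrative stand-ins for YOLOX's lateral_conv0 / reduce_conv1 / bu_conv layers and CSPLayer blocks, not the official implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Dummy backbone outputs: dark3 (80x80x256), dark4 (40x40x512), dark5 (20x20x1024)
d3 = torch.randn(1, 256, 80, 80)
d4 = torch.randn(1, 512, 40, 40)
d5 = torch.randn(1, 1024, 20, 20)

# Top-down path: shrink channels, upsample, concatenate with the shallower level
lat5 = nn.Conv2d(1024, 512, 1)(d5)                                   # 20x20, 512 ch
td4 = torch.cat([F.interpolate(lat5, scale_factor=2), d4], dim=1)    # 40x40, 1024 ch
td4 = nn.Conv2d(1024, 256, 1)(td4)                                   # stand-in for CSPLayer + reduce conv
td3 = torch.cat([F.interpolate(td4, scale_factor=2), d3], dim=1)     # 80x80, 512 ch

# Bottom-up path: stride-2 conv to downsample, concatenate with the deeper level
bu4 = torch.cat([nn.Conv2d(512, 256, 3, stride=2, padding=1)(td3), td4], dim=1)   # 40x40, 512 ch
bu5 = torch.cat([nn.Conv2d(512, 512, 3, stride=2, padding=1)(bu4), lat5], dim=1)  # 20x20, 1024 ch

print(td3.shape, bu4.shape, bu5.shape)
# torch.Size([1, 512, 80, 80]) torch.Size([1, 512, 40, 40]) torch.Size([1, 1024, 20, 20])

In the real network each torch.cat is followed by a CSPLayer that fuses the concatenated features and brings the channel count back down, as the annotated code below shows.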
In the official YOLOX code, PANet is implemented in yolo_pafpn.py. The code below is annotated with the step numbers used in the figure above:
# yolox/models/yolo_pafpn.py
import torch
import torch.nn as nn

from .darknet import CSPDarknet
from .network_blocks import BaseConv, CSPLayer, DWConv


class YOLOPAFPN(nn.Module):
    """
    YOLOv3 model. Darknet 53 is the default backbone of this model.
    """

    def __init__(
        self,
        depth=1.0,
        width=1.0,
        in_features=("dark3", "dark4", "dark5"),
        in_channels=[256, 512, 1024],
        depthwise=False,
        act="silu",
    ):
        super().__init__()
        self.backbone = CSPDarknet(depth, width, depthwise=depthwise, act=act)
        self.in_features = in_features
        self.in_channels = in_channels
        Conv = DWConv if depthwise else BaseConv

        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        # 20x20x1024 -> 20x20x512
        self.lateral_conv0 = BaseConv(
            int(in_channels[2] * width), int(in_channels[1] * width), 1, 1, act=act
        )
        # 40x40x1024 -> 40x40x512
        self.C3_p4 = CSPLayer(
            int(2 * in_channels[1] * width),
            int(in_channels[1] * width),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )  # cat

        # 40x40x512 -> 40x40x256
        self.reduce_conv1 = BaseConv(
            int(in_channels[1] * width), int(in_channels[0] * width), 1, 1, act=act
        )
        # 80x80x512 -> 80x80x256
        self.C3_p3 = CSPLayer(
            int(2 * in_channels[0] * width),  # 2*256
            int(in_channels[0] * width),      # 256
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        # 80x80x256 -> 40x40x256
        self.bu_conv2 = Conv(
            int(in_channels[0] * width), int(in_channels[0] * width), 3, 2, act=act
        )
        # 40x40x512 -> 40x40x512
        self.C3_n3 = CSPLayer(
            int(2 * in_channels[0] * width),  # 2*256
            int(in_channels[1] * width),      # 512
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        # 40x40x512 -> 20x20x512
        self.bu_conv1 = Conv(
            int(in_channels[1] * width), int(in_channels[1] * width), 3, 2, act=act
        )
        # 20x20x1024 -> 20x20x1024
        self.C3_n4 = CSPLayer(
            int(2 * in_channels[1] * width),  # 2*512
            int(in_channels[2] * width),      # 1024
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )
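
    # Note on depth/width: these two arguments rescale the network for the different
    # YOLOX model sizes. The shapes quoted in the comments (e.g. 20x20x1024) assume a
    # 640x640 input and width = 1.0; with the YOLOX-s settings from the official exps
    # (depth = 0.33, width = 0.50), the deepest level has int(1024 * 0.50) = 512
    # channels and each CSPLayer uses round(3 * 0.33) = 1 bottleneck instead of 3.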
    def forward(self, input):
        """
        Args:
            inputs: input images.

        Returns:
            Tuple[Tensor]: FPN feature.
        """

        # backbone
        out_features = self.backbone(input)
        features = [out_features[f] for f in self.in_features]
        [x2, x1, x0] = features

        # Step 1: 1x1 conv on the deepest feature map
        # 20x20x1024 -> 20x20x512
        fpn_out0 = self.lateral_conv0(x0)  # 1024->512/32

        # Step 2: upsample the feature map from step 1
        # 20x20x512 -> 40x40x512
        f_out0 = self.upsample(fpn_out0)  # 512/16

        # Step 3: concat with dark4, then CSPLayer
        # 40x40x512 + 40x40x512 -> 40x40x1024
        f_out0 = torch.cat([f_out0, x1], 1)  # 512->1024/16
        # 40x40x1024 -> 40x40x512
        f_out0 = self.C3_p4(f_out0)  # 1024->512/16

        # Step 4: 1x1 conv on the feature map from step 3
        # 40x40x512 -> 40x40x256
        fpn_out1 = self.reduce_conv1(f_out0)  # 512->256/16

        # Step 5: upsample again
        # 40x40x256 -> 80x80x256
        f_out1 = self.upsample(fpn_out1)  # 256/8

        # Step 6: concat with dark3, then CSPLayer; output goes to the YOLO head
        # 80x80x256 + 80x80x256 -> 80x80x512
        f_out1 = torch.cat([f_out1, x2], 1)  # 256->512/8
        # 80x80x512 -> 80x80x256
        pan_out2 = self.C3_p3(f_out1)  # 512->256/8

        # Step 7: downsample
        # 80x80x256 -> 40x40x256
        p_out1 = self.bu_conv2(pan_out2)  # 256->256/16

        # Step 8: concat + CSPLayer; output goes to the YOLO head
        # 40x40x256 + 40x40x256 -> 40x40x512
        p_out1 = torch.cat([p_out1, fpn_out1], 1)  # 256->512/16
        # 40x40x512 -> 40x40x512
        pan_out1 = self.C3_n3(p_out1)  # 512->512/16

        # Step 9: downsample again
        # 40x40x512 -> 20x20x512
        p_out0 = self.bu_conv1(pan_out1)  # 512->512/32

        # Step 10: concat + CSPLayer; output goes to the YOLO head
        # 20x20x512 + 20x20x512 -> 20x20x1024
        p_out0 = torch.cat([p_out0, fpn_out0], 1)  # 512->1024/32
        # 20x20x1024 -> 20x20x1024
        pan_out0 = self.C3_n4(p_out0)  # 1024->1024/32

        outputs = (pan_out2, pan_out1, pan_out0)
        return outputs
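As a quick sanity check of the shape annotations, the module can be run on a dummy input. This is a minimal sketch, assuming it is executed inside the official YOLOX repository (so that CSPDarknet, CSPLayer, BaseConv and DWConv resolve) with the default depth=1.0 and width=1.0:

import torch

from yolox.models.yolo_pafpn import YOLOPAFPN  # module path in the official repo

model = YOLOPAFPN(depth=1.0, width=1.0).eval()
dummy = torch.randn(1, 3, 640, 640)  # one 640x640 image

with torch.no_grad():
    pan_out2, pan_out1, pan_out0 = model(dummy)

print(pan_out2.shape)  # torch.Size([1, 256, 80, 80])   stride-8 branch
print(pan_out1.shape)  # torch.Size([1, 512, 40, 40])   stride-16 branch
print(pan_out0.shape)  # torch.Size([1, 1024, 20, 20])  stride-32 branch

The three tensors are then consumed by the YOLO head, one branch per stride.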
Reference: "Build your own YOLOX object detection platform with PyTorch" (Bubbliiiing's deep learning course), Bilibili