Neck modules of the YOLO series

2022-08-04 12:12:00 【Sand side dishes】

Study notes on: 【Make YOLO Great Again】 A full analysis of the YOLOv1-v7 series (Neck part)

This article looks at the Neck modules of the YOLO series. YOLOv1 and YOLOv2 have no Neck module; it first appears in YOLOv3. The purpose of the Neck is to fuse features from different layers so that large, medium, and small objects can all be detected.

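Version	Neck modules
YOLOv3	FPN
YOLOv4	SPP + FPN
YOLOv5	SPP + FPN, with the CBL module after the Concat layer replaced by the CSP_V5 module
YOLOX	SPP + FPN
YOLOv7	SPPCSPC + optimized PAN (the CBL before the Concat layer is replaced by MPConv, and E-ELAN is used after the Concat layer)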

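Before studying the Neck modules of the YOLO series, we first look at the FPN, SPP, and PAN modules.

1. FPN (Feature Pyramid Networks)

Purpose: improve the detection of small objects.

Many earlier object detectors made predictions from high-level features only. High-level features carry rich semantic information, but their resolution is low, so object locations are coarse. Suppose that in a deep network one pixel of the final high-level feature map corresponds to a 20*20 pixel region of the input image; the features of objects smaller than 20*20 pixels have then most likely been lost. Low-level features, by contrast, carry less semantic information but locate objects accurately, which helps small-object detection. FPN fuses high-level features with low-level features, so it uses the high resolution of the low-level features and the rich semantics of the high-level features at the same time. It also makes an independent prediction at each scale, which clearly improves the detection of small objects.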
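The top-down fusion described above can be sketched in a few lines of PyTorch. This is only a minimal illustration, not the Neck of any particular YOLO version: the class name `TinyFPN`, the channel counts, and the feature-map sizes are assumptions chosen for the example, and the fusion uses element-wise addition as in the original FPN design (YOLO-style necks typically concatenate instead).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyFPN(nn.Module):
    """Illustrative top-down FPN; names and channel counts are assumptions."""

    def __init__(self, in_channels=(64, 128, 256), out_channels=32):
        super().__init__()
        # 1x1 lateral convs bring every backbone level to the same channel count
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )
        # 3x3 convs smooth each fused map before it is used for prediction
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels]
        )

    def forward(self, feats):
        # feats: backbone maps ordered from shallow (high-res) to deep (low-res)
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample the deeper map and add it to the shallower one
        for i in range(len(laterals) - 1, 0, -1):
            up = F.interpolate(laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
            laterals[i - 1] = laterals[i - 1] + up
        # One output per scale, each usable for an independent prediction
        return [conv(p) for conv, p in zip(self.smooth, laterals)]


# Hypothetical backbone outputs at three scales
feats = [torch.randn(1, 64, 80, 80), torch.randn(1, 128, 40, 40), torch.randn(1, 256, 20, 20)]
fpn_outs = TinyFPN()(feats)
fpn_shapes = [tuple(o.shape) for o in fpn_outs]
```

Each output keeps the resolution of its own level but now also contains information propagated down from the deeper, more semantic levels.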

2. SPP (Spatial Pyramid Pooling)

SPP stands for spatial pyramid pooling. Its original purpose is to handle input data of arbitrary size. In YOLOv4, SPP is used to increase the receptive field of the network.
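As a rough illustration of the receptive-field use, here is a simplified sketch of the SPP variant commonly found in YOLO necks: several stride-1 max-pooling branches with different kernel sizes, concatenated with the input along the channel axis. The class name `YoloStyleSPP` is an assumption, the kernel sizes 5, 9, and 13 are the commonly used ones, and the convolutions that surround the block in real models are omitted.

```python
import torch
import torch.nn as nn


class YoloStyleSPP(nn.Module):
    """Simplified YOLO-style SPP block (surrounding convolutions omitted)."""

    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        # stride=1 with padding=k//2 keeps the spatial size unchanged
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, x):
        # Concatenate the input with its pooled versions along the channel axis
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)


x = torch.randn(2, 64, 20, 20)
spp_out = YoloStyleSPP()(x)  # channels grow 4x, height and width stay the same
```

Unlike the classic SPP described next, this variant does not produce a fixed-length vector; it keeps the feature map size and only mixes in context from larger windows.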

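How SPP is used:

First, divide the input: the input feature map is split into several parts. In the figure, the leftmost grid of 16 small blue cells means the input features are split into 16 parts. The 256 in 16X256 is the number of channels, i.e. SPP splits the feature map of every channel into 16 parts (not necessarily of equal size). The 4 small green cells in the middle and the single large purple cell on the right work the same way: the input features are split into 4X256 and 1X256 parts respectively. (The number of parts at each level can be customized.)
Next, pool each part: max pooling is the usual choice, applied to every part. After the SPP layer, the input features become a matrix of 16X256 + 4X256 + 1X256 = 21X256.
Finally, attach a fully connected layer: the 21X256 matrix is flattened into a 1X5376 vector and fed to a fully connected layer. Because this length does not depend on the input size, the problem of arbitrary input size is solved.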
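The three steps above can be sketched with PyTorch's adaptive pooling, which chooses the window sizes automatically. This is a minimal sketch rather than the original SPP implementation; the function name `spp_fixed_length` and the input sizes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def spp_fixed_length(x, grid_sizes=(4, 2, 1)):
    """Pool x into 4x4, 2x2 and 1x1 grids and concatenate the flattened results."""
    pooled = [F.adaptive_max_pool2d(x, output_size=g).flatten(1) for g in grid_sizes]
    return torch.cat(pooled, dim=1)


# Two inputs with different spatial sizes but the same 256 channels
small = spp_fixed_length(torch.randn(2, 256, 13, 17))
large = spp_fixed_length(torch.randn(2, 256, 40, 31))
# Both have (16 + 4 + 1) * 256 = 5376 features per sample
```

Since both results have the same length, the same fully connected layer can follow regardless of the input resolution.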

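How the pooling kernel size and stride are computed in SPP:

For an input of size (h, w) pooled into an n × n grid:
kernel size = (⌈h/n⌉, ⌈w/n⌉)
stride = (⌈h/n⌉, ⌈w/n⌉)
padding = (⌊(k_h·n − h + 1)/2⌋, ⌊(k_w·n − w + 1)/2⌋), where (k_h, k_w) is the kernel size

Suppose the input size is (7, 11) and the pooling grid is (4, 4).
Then the kernel size is (2, 3), the stride is (2, 3), and the padding is (1, 1), and the pooled output is indeed 4 × 4.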
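The worked example can be checked numerically. The helper below is a small sketch (the name `spp_pool_params` is an assumption) that applies the same ceil/floor rules along one axis and then runs an actual max pooling to confirm the output size.

```python
import math
import torch
import torch.nn.functional as F


def spp_pool_params(size, bins):
    """Return (kernel, stride, padding) along one axis for `bins` pooling windows."""
    kernel = math.ceil(size / bins)
    stride = kernel
    padding = math.floor((kernel * bins - size + 1) / 2)
    return kernel, stride, padding


kh, sh, ph = spp_pool_params(7, 4)    # height axis
kw, sw, pw = spp_pool_params(11, 4)   # width axis

pooled = F.max_pool2d(
    torch.randn(1, 1, 7, 11),
    kernel_size=(kh, kw), stride=(sh, sw), padding=(ph, pw),
)
```

Note that PyTorch requires the padding to be at most half the kernel size, so these formulas do not work for every combination of input size and grid size.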

PyTorch implementation of SPP:

#coding=utf-8

import math
import torch
import torch.nn.functional as F

# Build the SPP (spatial pyramid pooling) layer
class SPPLayer(torch.nn.Module):

    def __init__(self, num_levels, pool_type='max_pool'):
        super().__init__()

        self.num_levels = num_levels  # level i pools the input into an i x i grid
        self.pool_type = pool_type    # 'max_pool', anything else means average pooling

    def forward(self, x):
        num, c, h, w = x.size() # num: batch size, c: channels, h: height, w: width
        for i in range(self.num_levels):
            level = i+1
            kernel_size = (math.ceil(h / level), math.ceil(w / level))
            stride = (math.ceil(h / level), math.ceil(w / level))
            # Padding so that `level` windows cover the whole input.
            # PyTorch requires padding <= kernel_size / 2, so some combinations
            # of input size and num_levels raise an error.
            padding = (math.floor((kernel_size[0]*level-h+1)/2), math.floor((kernel_size[1]*level-w+1)/2))

            # Choose the pooling type and flatten the result to (num, c * level * level)
            if self.pool_type == 'max_pool':
                tensor = F.max_pool2d(x, kernel_size=kernel_size, stride=stride, padding=padding).view(num, -1)
            else:
                tensor = F.avg_pool2d(x, kernel_size=kernel_size, stride=stride, padding=padding).view(num, -1)

            # Concatenate the flattened features of all levels
            if i == 0:
                x_flatten = tensor
            else:
                x_flatten = torch.cat((x_flatten, tensor), 1)
        return x_flatten

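SPPCSPC

An optimization of SPP. On top of the SPP module, a concat operation is added at the end to fuse the SPP output with the feature map from before the SPP module, which further enriches the feature information.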
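The idea of concatenating the SPP output with the pre-SPP features can be sketched as follows. This is a deliberately simplified illustration and not the exact YOLOv7 module: the class name `SPPWithShortcut`, the channel split, and the number of convolutions are assumptions, and normalization and activation layers are left out.

```python
import torch
import torch.nn as nn


class SPPWithShortcut(nn.Module):
    """Simplified SPP block with a shortcut branch that bypasses the pooling."""

    def __init__(self, channels, kernel_sizes=(5, 9, 13)):
        super().__init__()
        hidden = channels // 2
        self.reduce = nn.Conv2d(channels, hidden, kernel_size=1)    # branch that goes through SPP
        self.shortcut = nn.Conv2d(channels, hidden, kernel_size=1)  # branch that skips SPP
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes]
        )
        self.merge_spp = nn.Conv2d(hidden * (len(kernel_sizes) + 1), hidden, kernel_size=1)
        self.fuse = nn.Conv2d(hidden * 2, channels, kernel_size=1)

    def forward(self, x):
        y = self.reduce(x)
        # SPP part: concatenate y with its max-pooled versions, then compress
        y = self.merge_spp(torch.cat([y] + [pool(y) for pool in self.pools], dim=1))
        # Final concat: fuse the SPP output with features taken from before the SPP
        return self.fuse(torch.cat([y, self.shortcut(x)], dim=1))


x_in = torch.randn(1, 128, 20, 20)
sppcsp_out = SPPWithShortcut(128)(x_in)
```

The output has the same shape as the input, so the block can be dropped into a network wherever a plain SPP block was used.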

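3. PANet

The network structure is shown in the figure below. Compared with FPN, PANet adds a DownSample step after the UpSample step. PANet fuses features from different levels aggressively: on top of the FPN module it adds a bottom-up feature pyramid, which preserves more of the shallow location features and further improves the overall feature extraction capability.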
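The extra bottom-up path can be sketched as a module that takes the FPN outputs and passes information from the high-resolution level back up to the low-resolution levels. This is a minimal sketch under assumptions: the class name `BottomUpPath`, the channel count, and the use of element-wise addition are illustrative choices (YOLO-style necks usually concatenate instead of adding).

```python
import torch
import torch.nn as nn


class BottomUpPath(nn.Module):
    """Illustrative PANet-style bottom-up path applied to FPN outputs."""

    def __init__(self, channels=32, num_levels=3):
        super().__init__()
        # Stride-2 3x3 convs halve the resolution (the DownSample step)
        self.down = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=1)
             for _ in range(num_levels - 1)]
        )
        # 3x3 convs applied after each fusion
        self.fuse = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3, padding=1)
             for _ in range(num_levels - 1)]
        )

    def forward(self, pyramid):
        # pyramid: FPN outputs ordered from high resolution to low resolution
        outs = [pyramid[0]]
        for i in range(1, len(pyramid)):
            down = self.down[i - 1](outs[-1])               # downsample the previous output
            outs.append(self.fuse[i - 1](down + pyramid[i]))  # fuse with the FPN level
        return outs


# Hypothetical FPN outputs at three scales with 32 channels each
pyramid = [torch.randn(1, 32, 80, 80), torch.randn(1, 32, 40, 40), torch.randn(1, 32, 20, 20)]
pan_outs = BottomUpPath()(pyramid)
pan_shapes = [tuple(o.shape) for o in pan_outs]
```

With this path in place, the low-resolution outputs receive shallow location information through only a few layers instead of through the whole backbone.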