YOLOv5-Lite: Experiments with and Thoughts on RepVGG Re-parameterization for Industrial Deployment of YOLO

2022-07-08 02:20:00 [pogg_]

QQ discussion group: 993965802

The copyright of this article belongs to GiantPandaCV. Please do not reprint without permission.

This experiment mainly borrows RepVGG's idea of structural re-parameterization: the original 3×3 convolutions are replaced with RepVGG blocks to improve the accuracy of the original YOLO model.

Preface: I previously combined ShuffleNetV2 with yolov5 to target ARM-series chips, so that yolov5 could also run in real time on edge devices. I have also been experimenting on GPUs and NPUs. The goal of these experiments is clear and modest: mainly, I want yolov5 to run faster while keeping its original accuracy.

Experiment

This model mainly follows RepVGG's re-parameterization idea: the original 3×3 convolutions are replaced with RepVGG blocks. A multi-branch model is used during training, and for deployment and inference the multiple branches are converted into a single-path model.
Following the view expressed in the RepVGG paper, the baseline chosen here is yolov5s: each 3×3 conv of yolov5s is restructured so that a 1×1 conv side branch runs in parallel with it.

At inference time, the side branch is fused into the 3×3 convolution, and the resulting model is structurally no different from the original yolov5s.
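Why the fused model matches the two-branch one can be checked numerically. The NumPy sketch below uses two hypothetical helpers written for illustration (`conv2d`, stride 1 and groups 1 only, and `pad_1x1_to_3x3`). It sums a padded 3×3 branch and an unpadded 1×1 branch, then compares that with a single 3×3 convolution whose kernel is the 3×3 kernel plus the 1×1 kernel zero-padded to 3×3. BatchNorm is omitted; each branch is assumed to have already been folded into a plain kernel and bias.

```python
import numpy as np

def conv2d(x, w, b=None, padding=0):
    """Naive stride-1, groups=1 conv. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))  # zero padding
    h_out, w_out = xp.shape[1] - k + 1, xp.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) input window
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    if b is not None:
        y += b[:, None, None]  # per-output-channel bias
    return y

def pad_1x1_to_3x3(k1):
    """Zero-pad a (C_out, C_in, 1, 1) kernel so its single tap sits at the 3×3 centre."""
    return np.pad(k1, ((0, 0), (0, 0), (1, 1), (1, 1)))

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
k3, b3 = rng.standard_normal((6, 4, 3, 3)), rng.standard_normal(6)
k1, b1 = rng.standard_normal((6, 4, 1, 1)), rng.standard_normal(6)

# Training-time view: two parallel branches whose outputs are summed
y_multi = conv2d(x, k3, b3, padding=1) + conv2d(x, k1, b1, padding=0)
# Deployment view: one 3×3 conv with the merged kernel and bias
y_single = conv2d(x, k3 + pad_1x1_to_3x3(k1), b3 + b1, padding=1)
print(np.allclose(y_multi, y_single))  # True
```

Zero-padding places the 1×1 tap at the centre of the 3×3 window, which is exactly the pixel the 1×1 branch reads, so the two outputs agree up to floating-point rounding.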
Before this, I tried the most direct way of modifying yolov5s, namely replacing its backbone outright, but found that the parameter count and FLOPs were higher. Among the RepVGG variants, the one whose reproduced accuracy came closest to yolov5s was RepVGG-A1; below is yolov5s with its backbone replaced by A1:
Then, to curb the increase in FLOPs and parameters, I instead used RepVGG blocks to replace the 3×3 convolutions of yolov5s.

Between the two approaches, the FLOPs ratio and the parameter ratio are about 2.75 and 1.85, respectively.
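A rough per-layer count helps explain why replacing individual 3×3 convolutions is much cheaper than replacing the whole backbone. The sketch below uses hypothetical helpers and an arbitrary layer shape (128 channels in and out, 40×40 output, stride 1); these numbers are illustrative, not taken from the experiments, BN parameters are ignored, and FLOPs are counted as multiply-accumulates (MACs).

```python
def conv_params(c_in, c_out, k, bias=False):
    """Weight count of a k×k conv (groups=1), plus one bias per output channel if bias=True."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates of a k×k conv (groups=1) producing an h_out×w_out map."""
    return c_in * c_out * k * k * h_out * w_out

c, h, w = 128, 40, 40  # hypothetical layer shape

# Training: 3×3 branch + 1×1 side branch (both bias-free, each followed by BN)
train_params = conv_params(c, c, 3) + conv_params(c, c, 1)
train_macs = conv_macs(c, c, 3, h, w) + conv_macs(c, c, 1, h, w)

# Deployment: a single 3×3 conv with bias after re-parameterization
deploy_params = conv_params(c, c, 3, bias=True)
deploy_macs = conv_macs(c, c, 3, h, w)

print(train_params, deploy_params)         # 163840 147584
print(round(train_macs / deploy_macs, 4))  # 1.1111
```

During training, the 1×1 side branch adds only 1/9 of the 3×3 layer's weights and MACs; after fusion the block is a single 3×3 conv with bias, the same shape as a yolov5s Conv after its own BN fusion.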

Performance

The ablation experiments show the following performance differences between the original yolov5s and yolov5s with fused RepVGG blocks:
The yolov5s mAP evaluated here differs from the figures on the official site: two evaluations gave 55.8 and 35.8. These results are, however, almost identical to those reported in https://github.com/midasklr/yolov5prune and in ultralytics/yolov5 issue #3168. Rebuilding the 3×3 convolutions of yolov5s with RepVGG blocks improves both mAP@.5 and mAP@.5:.95 by at least one point.

After training, repyolov5s must be converted so that the 1×1 side branch is merged into the 3×3 convolution; otherwise inference is about 20% slower than the original yolov5s.

convert.py re-parameterizes the RepVGG blocks. The main code is shown below, adapted from https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py:

# -------------------- RepVGG re-parameterization + Conv/BN fusion --------------------
    # Method of the Model class (models/yolo.py). Assumes nn (torch.nn), Conv, RepVGGBlock
    # and fuse_conv_and_bn are already imported in that module.
    def reparam_conv(self):  # merge RepVGG branches, then fuse Conv2d() + BatchNorm2d() layers
        """Convert the trained multi-branch model into its single-path deployment form.

        rbr_dense: the 3×3 convolution branch (conv + BN) of a RepVGGBlock
        rbr_1x1: the 1×1 side branch (conv + BN)
        _pad_1x1_to_3x3_tensor: helper that zero-pads the 1×1 kernel to 3×3
        :return: the fused model (self)
        """
        print('Reparam and Fusing Block... ')
        for m in self.model.modules():
            if type(m) is RepVGGBlock:
                if hasattr(m, 'rbr_1x1'):
                    # Single 3×3 kernel/bias equivalent to (3×3 branch + padded 1×1 branch)
                    kernel, bias = m.get_equivalent_kernel_bias()
                    conv_reparam = nn.Conv2d(in_channels=m.rbr_dense.conv.in_channels,
                                             out_channels=m.rbr_dense.conv.out_channels,
                                             kernel_size=m.rbr_dense.conv.kernel_size,
                                             stride=m.rbr_dense.conv.stride,
                                             padding=m.rbr_dense.conv.padding,
                                             dilation=m.rbr_dense.conv.dilation,
                                             groups=m.rbr_dense.conv.groups, bias=True)
                    conv_reparam.weight.data = kernel
                    conv_reparam.bias.data = bias
                    for para in self.parameters():
                        para.detach_()  # inference only: stop tracking gradients
                    m.rbr_dense = conv_reparam  # the fused conv replaces the 3×3 branch
                    m.__delattr__('rbr_1x1')  # the side branch is no longer needed
                    m.deploy = True
                    m.forward = m.fusevggforward  # single-path forward
                continue  # RepVGGBlocks are handled; skip the Conv check below
            if type(m) is Conv and hasattr(m, 'bn'):
                m.conv = fuse_conv_and_bn(m.conv, m.bn)  # fold BN into the conv weights
                delattr(m, 'bn')  # remove batchnorm
                m.forward = m.fuseforward  # update forward
        self.info()
        return self
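The snippet above calls `m.get_equivalent_kernel_bias()` without showing it. The NumPy sketch below illustrates the arithmetic such a method has to perform, assuming each branch is a bias-free conv followed by BN: fold each BN into its conv, zero-pad the 1×1 kernel to 3×3, and add kernels and biases. The dict-based interface and the names `fuse_bn`, `get_equivalent_kernel_bias` and `bn` here are hypothetical; the sketch follows the same idea as the referenced repvgg.py but is not that code.

```python
import numpy as np

def fuse_bn(kernel, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta into a bias-free conv."""
    t = gamma / np.sqrt(running_var + eps)  # per-output-channel scale
    return kernel * t[:, None, None, None], beta - running_mean * t

def get_equivalent_kernel_bias(dense, side):
    """dense/side: dicts holding 'kernel' plus BN statistics of the 3×3 and 1×1 branches."""
    k3, b3 = fuse_bn(**dense)
    k1, b1 = fuse_bn(**side)
    k1 = np.pad(k1, ((0, 0), (0, 0), (1, 1), (1, 1)))  # 1×1 -> 3×3, tap at the centre
    return k3 + k1, b3 + b1

rng = np.random.default_rng(0)

def branch(k):  # random k×k kernel plus random BN statistics, 4 -> 6 channels
    return dict(kernel=rng.standard_normal((6, 4, k, k)), gamma=rng.standard_normal(6),
                beta=rng.standard_normal(6), running_mean=rng.standard_normal(6),
                running_var=rng.random(6) + 0.5)

def bn(y, p, eps=1e-5):  # BatchNorm in inference mode
    return p['gamma'] * (y - p['running_mean']) / np.sqrt(p['running_var'] + eps) + p['beta']

dense, side = branch(3), branch(1)
patch = rng.standard_normal((4, 3, 3))  # one 3×3 input window (C_in, 3, 3)

# Two-branch output at the window centre vs. the single fused kernel
y_two = (bn(np.einsum('oikl,ikl->o', dense['kernel'], patch), dense)
         + bn(np.einsum('oi,i->o', side['kernel'][:, :, 0, 0], patch[:, 1, 1]), side))
k, b = get_equivalent_kernel_bias(dense, side)
y_one = np.einsum('oikl,ikl->o', k, patch) + b
print(np.allclose(y_two, y_one))  # True
```

Because BN in inference mode is a per-channel affine map, it can always be absorbed into the preceding conv's weights and bias; this is also what `fuse_conv_and_bn` does for the plain Conv layers.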

By exporting the models before and after conversion to ONNX, we can visualize them:

Inference

mAP is only part of the picture; the other part is whether yolov5s, after reparam and fuse, slows down because RepVGG blocks were implanted. In theory, a re-parameterized RepVGG block is equivalent to a 3×3 convolution, and that convolution is more compact than an ordinary 3×3 convolution.

Over three runs on the COCO val2017 dataset (5000 images, single-image inference), the per-image inference time was 14/14/14 ms for repyolov5s and 16/16/16 ms for yolov5s. I discussed this with Bai Shen, who thinks that with inference times this close, the difference may come from measurement error and is not convincing on its own.
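When two latencies differ by only a couple of milliseconds, a repeatable harness makes it easier to judge whether the gap exceeds run-to-run noise. Below is a minimal, framework-agnostic sketch (the function `time_per_image` is hypothetical): it discards warm-up calls, times several full passes, and reports the median per-image latency together with every pass. For a GPU model, `infer` should call `torch.cuda.synchronize()` before returning; otherwise asynchronous kernel launches make the measured time misleadingly small.

```python
import statistics
import time

def time_per_image(infer, images, warmup=10, repeats=3):
    """Return (median ms/image, list of ms/image for each pass).

    `infer` is any callable taking one image. Warm-up calls are excluded so
    one-off costs (lazy initialization, autotuning, cold caches) don't skew results.
    """
    for img in images[:warmup]:
        infer(img)
    per_pass = []
    for _ in range(repeats):
        start = time.perf_counter()
        for img in images:
            infer(img)
        per_pass.append((time.perf_counter() - start) * 1000 / len(images))
    return statistics.median(per_pass), per_pass

# Usage with a dummy workload standing in for a detector
median_ms, runs = time_per_image(lambda img: sum(img), [list(range(1000))] * 200)
print(f"{median_ms:.4f} ms/image, passes: {[round(r, 4) for r in runs]}")
```

Reporting every pass, not just one number, shows how large the run-to-run spread is compared with the 14 ms vs. 16 ms gap.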

What is certain, though, is that the converted yolov5s does not slow down because of the implanted RepVGG blocks. To rule out chance and measurement error, inference was tested on 500, 5000, 64115 and 118287 images:
The test results are as follows:

Detection

Detection quality is another metric worth watching. Using the two models above with all other parameters kept identical, I ran detection on images; the results are as follows:

Summary

From the ablation experiments on improving yolov5s with RepVGG blocks, I draw the following conclusions:

yolov5s with fused RepVGG blocks gains accuracy on both large and small objects;
yolov5s with fused RepVGG blocks and LeakyReLU is 0.5 points lower in mAP than the original yolov5s, but about 15% faster (mainly thanks to replacing the SiLU activation);
Without convert, I personally consider this fusion experiment meaningless, since the side branches seriously slow the model down;
C3 blocks and RepVGG blocks are not cost-effective on CPU; their full benefit is only realized on GPU and NPU;
Re-parameterized yolov5 comes at a price, paid entirely during training: it uses about 5-10% more GPU memory, and training takes longer;
Consider using RepVGG blocks to reconstruct the 3×3 convolutions of yolov3-spp and yolov4.

The code and pretrained models will be published in my repository later:

https://github.com/ppogg/YOLOv5-Lite
