YOLOv5-Lite: ncnn + int8 deployment and quantization, real-time even on a Raspberry Pi

2022-07-08 02:19:00 【pogg_】

The copyright of this article belongs to GiantPandaCV. Please do not reprint without permission.

Preface: you may remember the article I wrote two months ago, a detailed tutorial on yolov4-tiny + ncnn + int8 quantization:
https://zhuanlan.zhihu.com/p/372278785

Afterwards I planned to write a yolov5 + ncnn + int8 quantization tutorial, but I ran into trouble quantizing yolov5. On the one hand, the model was slower after quantization; on the other hand, accuracy dropped badly, with detection boxes filling the whole screen. After many attempts, it all ended in failure.

In the end I decided to approach yolov5 quantization differently, for two reasons. First, even if the smallest model, yolov5s, did speed up after quantization, it still would not meet my speed requirements. Second, no matter which inference framework is used, the Focus layer needs extra slicing and concatenation code, which is too cumbersome for me.
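For readers unfamiliar with it, the Focus layer in yolov5 is essentially a space-to-depth slice: every second pixel is sampled at four offsets and the four sub-images are concatenated along the channel axis. This is the extra step an inference framework has to reproduce. Below is a minimal pure-Python sketch on a nested-list CHW image. `focus_slice` is an illustrative name, not a function from yolov5 or ncnn, and the convolution that follows the slice in the real layer is omitted.

```python
def focus_slice(chw):
    """Space-to-depth slice used by the yolov5 Focus layer.

    chw is a nested list indexed as [channel][row][col] with even height
    and width. Returns 4*C channels of size (H/2, W/2).
    """
    # (row offset, col offset) in the same order as the yolov5 concat.
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    out = []
    for dy, dx in offsets:
        for channel in chw:
            # Keep every second row starting at dy, every second column at dx.
            out.append([row[dx::2] for row in channel[dy::2]])
    return out


image = [[[0, 1, 2, 3],
          [4, 5, 6, 7],
          [8, 9, 10, 11],
          [12, 13, 14, 15]]]  # 1 channel, 4x4
sliced = focus_slice(image)
print(len(sliced), len(sliced[0]), len(sliced[0][0]))  # prints: 4 2 2
```

A 1x4x4 input becomes 4x2x2: the spatial size halves and the channel count quadruples.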

So I made a series of lightweight changes to yolov5 that simplify its network structure and genuinely speed it up (for example, at least three times faster on ARM-based Raspberry Pi boards, and about twice as fast on x86 Intel processors):

The model structure is described here: https://zhuanlan.zhihu.com/p/400545131

This post continues the yolov4 quantization work from the previous article and covers ncnn deployment and quantization of yolov5.

1. Environment preparation

Two main tools are needed:

The ncnn inference framework
Link: https://github.com/Tencent/ncnn

The YOLOv5-Lite source code and weights
Link: https://github.com/ppogg/YOLOv5-Lite

The performance of the model is as follows:
[Figure: model performance comparison]

There are many online tutorials on building and installing ncnn. I recommend working in a Linux environment; Windows works too, but you may hit more pitfalls.

2. Exporting the ONNX model

git clone https://github.com/ppogg/YOLOv5-Lite.git
cd YOLOv5-Lite
python models/export.py --weights weights/yolov5-lite.pt --img 640 --batch 1
python -m onnxsim weights/yolov5-lite.onnx weights/yolov5-lite-sim.onnx

This step usually goes smoothly~

3. Converting to an ncnn model

./onnx2ncnn yolov5-lite-sim.onnx yolov5-lite.param yolov5-lite.bin
./ncnnoptimize yolov5-lite.param yolov5-lite.bin yolov5-lite-opt.param yolov5-lite-opt.bin 65536

This step also goes through without getting stuck. At this point there are fp32 and fp16 versions, four model files in total: yolov5-lite.param, yolov5-lite.bin, yolov5-lite-opt.param, and yolov5-lite-opt.bin.

To support dynamically sized input images, the Reshape operations in yolov5-lite-opt.param need to be modified:

In the three Reshape layers concerned, change the fixed size to -1:

Nothing else needs to be changed.
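The edit can be done by hand in a text editor. As a rough sketch, assuming the three output Reshape layers store their fixed size in parameter `0=` (as in stock yolov5 exports) and that you have looked up their names in your own .param file, a small script could apply it. The layer names and numbers in the sample are hypothetical placeholders, and `set_reshape_dynamic` is an illustrative helper, not part of ncnn. A ShuffleNet-style backbone may contain other Reshape layers, which should be left untouched, so the script only edits the layers you name.

```python
def set_reshape_dynamic(param_text, layer_names):
    """Set parameter 0 to -1 on the named Reshape layers of an ncnn .param text.

    Fields in a .param line are whitespace separated, so an edited line is
    rejoined with single spaces. Other lines are returned unchanged.
    """
    out = []
    for line in param_text.splitlines():
        fields = line.split()
        is_target = (
            len(fields) >= 2
            and fields[0] == "Reshape"
            and fields[1] in layer_names
        )
        if is_target:
            fields = ["0=-1" if f.startswith("0=") else f for f in fields]
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out)


# Hypothetical excerpt; real layer names come from your own .param file.
sample = """Convolution      Conv_0       1 1 images 100 0=32 1=3
Reshape          Reshape_out0 1 1 200 201 0=6400 1=85 2=3
Reshape          Reshape_mid  1 1 150 151 0=20 1=20 2=116"""

patched = set_reshape_dynamic(sample, {"Reshape_out0"})
print(patched)
```

Only the named output layer is rewritten; the Convolution line and the other Reshape layer keep their original parameters.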

4. Post-processing changes

Two things need to be changed in ncnn's official yolov5.cpp.

The anchor information is in models/yolov5-lite.yaml. If you clustered anchors on your own dataset, update them accordingly:

The output layer IDs are found on the Permute layers and also need to be changed accordingly:

After the change:

With only these items changed (the Focus layer code can also be removed, depending on your situation), run make again and you can test.
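To see why the anchors must match training, it helps to look at how a yolov5-style head turns raw outputs into boxes: the anchor width and height scale the predicted size directly. The sketch below follows the standard yolov5 decode formula; it is not copied from yolov5.cpp, and the stride and anchor values in the usage line are example numbers only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(raw, grid_x, grid_y, stride, anchor_w, anchor_h):
    """Decode one raw prediction (tx, ty, tw, th) into (x0, y0, x1, y1)
    in input-image pixels, using the yolov5-style formula."""
    tx, ty, tw, th = raw
    # Centre: offset inside the grid cell, scaled by the feature-map stride.
    cx = (sigmoid(tx) * 2.0 - 0.5 + grid_x) * stride
    cy = (sigmoid(ty) * 2.0 - 0.5 + grid_y) * stride
    # Size: a multiplier in (0, 4) applied to the anchor dimensions.
    w = (sigmoid(tw) * 2.0) ** 2 * anchor_w
    h = (sigmoid(th) * 2.0) ** 2 * anchor_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Example values only: stride 8 and a 10x13 anchor.
box = decode_box((0.0, 0.0, 0.0, 0.0), grid_x=0, grid_y=0,
                 stride=8, anchor_w=10, anchor_h=13)
print(box)  # prints: (-1.0, -2.5, 9.0, 10.5)
```

With all raw outputs at zero, the box is exactly the anchor size centred in the cell, so a wrong anchor table shifts every predicted box size by the same factor.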

The detection results of the fp16 model are as follows:

5. Int8 quantization

For a more detailed walkthrough, see my Zhihu tutorial on yolov4-tiny (linked in the preface); many details are not repeated in this article.

Here are some additional points:

Use the 5,000-image coco_val set as the calibration dataset;
The mean and norm values must match those used when training the original model, and must also match the values in yolov5ss.cpp;
The calibration process takes quite a long time, so please be patient.

Run the following:

find images/ -type f > imagelist.txt
./ncnn2table yolov5-lite-opt.param yolov5-lite-opt.bin imagelist.txt yolov5-lite.table mean=[104,117,123] norm=[0.017,0.017,0.017] shape=[640,640,3] pixel=BGR thread=8 method=kl
./ncnn2int8 yolov5-lite-opt.param yolov5-lite-opt.bin yolov5-lite-opt-int8.param yolov5-lite-opt-int8.bin yolov5-lite.table
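The mean and norm arguments describe a per-channel preprocessing step: subtract the mean, then multiply by the norm factor. If calibration, training, and the inference code disagree on these numbers, the calibration table is built for a different input distribution than the one seen at runtime. A minimal sketch of that arithmetic, using the values from the ncnn2table command above (`normalize_pixel` is an illustrative helper, not an ncnn function):

```python
def normalize_pixel(pixel, mean_vals, norm_vals):
    """Apply (value - mean) * norm to each channel of one pixel."""
    return [(p - m) * n for p, m, n in zip(pixel, mean_vals, norm_vals)]


MEAN = [104.0, 117.0, 123.0]
NORM = [0.017, 0.017, 0.017]

# A pixel equal to the mean maps to zero in every channel.
print(normalize_pixel([104.0, 117.0, 123.0], MEAN, NORM))
# The same pixel under a different convention (no mean, scale by 1/255)
# lands in a very different range.
print(normalize_pixel([104.0, 117.0, 123.0], [0.0, 0.0, 0.0], [1 / 255.0] * 3))
```

The two conventions give clearly different inputs for the same pixel, which is why the values passed to ncnn2table should be the ones the network was actually trained with.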

The quantized model is as follows:

The quantized model is about 1.7 MB, which should satisfy any obsession with small model size;

Here the quantized shufflev2-yolov5 model is tested:

There is a slight loss of accuracy after quantization, but it is still within an acceptable range. It is impossible for a model to lose no accuracy at all after quantization. For large targets with obvious features, shufflev2-yolov5 keeps the score roughly unchanged (in practice it still drops a little), but for distant, small targets the score drops by anywhere from 10% to 30%. There is no way around this, so please keep your expectations of the model realistic.
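Some loss is unavoidable because int8 has only a few hundred representable levels. As a simplified illustration, assuming plain symmetric quantization with a single absolute-maximum threshold (the `method=kl` option of ncnn2table chooses thresholds differently, so this is not what ncnn computes), a round trip through int8 looks like this:

```python
def quantize_int8(values, absmax):
    """Symmetric int8 quantization: map [-absmax, absmax] onto [-127, 127]."""
    scale = 127.0 / absmax
    quantized = [max(-127, min(127, round(v * scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to floating point."""
    return [q / scale for q in quantized]


values = [1.0, 0.25, 0.003]
codes, scale = quantize_int8(values, absmax=1.0)
restored = dequantize(codes, scale)
print(codes)     # prints: [127, 32, 0]
print(restored)
```

The large value survives exactly, the middle one picks up a small error, and the smallest one collapses to zero: weak responses lose proportionally more information than strong ones.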

Discarding the first three warm-up runs, and with the Raspberry Pi at a temperature above 45°C, the benchmark of the quantized model is as follows:

# Fourth run
~/Downloads/ncnn/build/benchmark $ ./benchncnn 8 4 0
loop_count = 8
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
            v5lite-s  min =   90.86  max =   93.53  avg =   91.56
       v5lite-s-int8  min =   83.15  max =   84.17  avg =   83.65
        v5lite-s-416  min =  154.51  max =  155.59  avg =  155.09
         yolov4-tiny  min =  298.94  max =  302.47  avg =  300.69
           nanodet_m  min =   86.19  max =  142.79  avg =   99.61
          squeezenet  min =   59.89  max =   60.75  avg =   60.41
     squeezenet_int8  min =   50.26  max =   51.31  avg =   50.75
           mobilenet  min =   73.52  max =   74.75  avg =   74.05
      mobilenet_int8  min =   40.48  max =   40.73  avg =   40.63
        mobilenet_v2  min =   72.87  max =   73.95  avg =   73.31
        mobilenet_v3  min =   57.90  max =   58.74  avg =   58.34
          shufflenet  min =   40.67  max =   41.53  avg =   41.15
       shufflenet_v2  min =   30.52  max =   31.29  avg =   30.88
             mnasnet  min =   62.37  max =   62.76  avg =   62.56
     proxylessnasnet  min =   62.83  max =   64.70  avg =   63.90
     efficientnet_b0  min =   94.83  max =   95.86  avg =   95.35
   efficientnetv2_b0  min =  103.83  max =  105.30  avg =  104.74
        regnety_400m  min =   76.88  max =   78.28  avg =   77.46
           blazeface  min =   13.99  max =   21.03  avg =   15.37
           googlenet  min =  144.73  max =  145.86  avg =  145.19
      googlenet_int8  min =  123.08  max =  124.83  avg =  123.96
            resnet18  min =  181.74  max =  183.07  avg =  182.37
       resnet18_int8  min =  103.28  max =  105.02  avg =  104.17
             alexnet  min =  162.79  max =  164.04  avg =  163.29
               vgg16  min =  867.76  max =  911.79  avg =  889.88
          vgg16_int8  min =  466.74  max =  469.51  avg =  468.15
            resnet50  min =  333.28  max =  338.97  avg =  335.71
       resnet50_int8  min =  239.71  max =  243.73  avg =  242.54
      squeezenet_ssd  min =  179.55  max =  181.33  avg =  180.74
 squeezenet_ssd_int8  min =  131.71  max =  133.34  avg =  132.54
       mobilenet_ssd  min =  151.74  max =  152.67  avg =  152.32
  mobilenet_ssd_int8  min =   85.51  max =   86.19  avg =   85.77
      mobilenet_yolo  min =  327.67  max =  332.85  avg =  330.36
  mobilenetv2_yolov3  min =  221.17  max =  224.84  avg =  222.60

This is a speedup of roughly 5-10%. I don't have any RV or RK series boards, so tests on other boards will have to come from friends in the community~
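The speedup figure can be checked directly from the benchncnn output. The sketch below parses the `avg` column with a regular expression and computes the relative latency reduction for the two v5lite-s rows listed above; the helper names are illustrative.

```python
import re

# Matches lines such as "  v5lite-s  min =   90.86  max =   93.53  avg =   91.56".
LINE = re.compile(
    r"^\s*(\S+)\s+min\s*=\s*([\d.]+)\s+max\s*=\s*([\d.]+)\s+avg\s*=\s*([\d.]+)"
)


def parse_benchncnn(text):
    """Return {model_name: average latency in ms} from benchncnn output."""
    averages = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            averages[match.group(1)] = float(match.group(4))
    return averages


def latency_reduction_percent(fp_ms, int8_ms):
    """Relative latency reduction of the int8 model, in percent."""
    return (fp_ms - int8_ms) / fp_ms * 100.0


log = """
            v5lite-s  min =   90.86  max =   93.53  avg =   91.56
       v5lite-s-int8  min =   83.15  max =   84.17  avg =   83.65
"""
averages = parse_benchncnn(log)
reduction = latency_reduction_percent(averages["v5lite-s"], averages["v5lite-s-int8"])
print(f"{reduction:.1f}% lower latency")  # prints: 8.6% lower latency
```

For this run the int8 model is about 8.6% faster, which sits inside the 5-10% range quoted above.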

As for why yolov5s became slower after quantization, with a serious drop in accuracy as well, the only explanation I can offer is the Focus layer. It collapses easily if anything is slightly misaligned, and it is a headache to deal with, so I simply removed it.

Summary:

This article provides a deployment and quantization tutorial for yolov5-lite (the s model);
It analyzes why the earlier yolov5s quantization collapsed so easily;
The ncnn fp16 model keeps the same accuracy as the native torch model;

[Figure above: left, the original torch model; right, the fp16 model]
The ncnn int8 model loses a little accuracy, and on the Raspberry Pi the speed improves by only 5-10%; other boards have not been tested yet;

[Figure above: left, the original torch model; right, the int8 model]

For scenes in small spaces, such as face detection in an elevator, the resolution is generally 240*180. At that resolution a single forward pass of the s model on the Raspberry Pi takes 55-60 ms, so even on a Raspberry Pi with about 0.1T of compute it is basically real-time!
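As a quick sanity check on the real-time claim, the quoted forward-pass latency converts to a frame rate as follows. This counts inference only; pre-processing and post-processing time are assumed to be excluded from the 55-60 ms figure.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second achievable if each frame takes latency_ms."""
    return 1000.0 / latency_ms


for ms in (55, 60):
    print(f"{ms} ms -> {fps_from_latency_ms(ms):.1f} FPS")
# prints:
# 55 ms -> 18.2 FPS
# 60 ms -> 16.7 FPS
```

That is roughly 17 to 18 frames per second from the model itself.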

Project address: https://github.com/ppogg/YOLOv5-Lite

Feel free to use it at no cost~

Update, 2021-08-20: ----------------------------------------------------------

I have finished the Android adaptation.

This is my Redmi phone, which has a Qualcomm Snapdragon 730G processor. The test results are as follows:

This is the detection result of the quantized int8 model:

Outdoor scene detection: