[Object Detection] YOLOv7: Theoretical Introduction and Practical Test

2022-08-01 04:48:00 【zstar-_】

Overview

YOLOv7 was proposed by the author team of YOLOv4, and the paper's first author is also the author of YOLOR.
The paper is written in the same style as YOLOR and is fairly hard to follow. I therefore did not study the theory in depth; below I only translate and summarize the few innovations proposed in the paper.

Theoretical Innovation

Extended Efficient Layer Aggregation Network

The authors propose E-ELAN, a network structure that uses expand, shuffle, and merge cardinality to keep improving the network's learning ability without destroying the original gradient path.

Model scaling for concatenation-based models

The authors propose a model scaling method that preserves the properties the model had in its original design and maintains the optimal structure.
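
A rough sketch of why this matters, as far as I understand the paper's argument: in a concatenation-based block, adding layers also widens the concatenated output, so the transition layer that follows has to be widened by the same ratio. The block layout, function names, and channel counts below are hypothetical simplifications, not taken from the YOLOv7 code.

```python
def concat_block_out_channels(branch_channels, depth):
    """Channels after concatenation in a simplified block.

    Assumption: the block stacks `depth` layers that each emit
    `branch_channels` channels, and its output concatenates the
    input branch with the output of every layer.
    """
    return branch_channels * (1 + depth)


def scale_block(branch_channels, depth, transition_channels, depth_factor):
    """Scale depth, then widen the transition layer by the resulting ratio."""
    new_depth = round(depth * depth_factor)
    old_out = concat_block_out_channels(branch_channels, depth)
    new_out = concat_block_out_channels(branch_channels, new_depth)
    # The transition layer follows the change in the block's output width.
    new_transition = round(transition_channels * new_out / old_out)
    return new_depth, new_out, new_transition


new_depth, new_out, new_transition = scale_block(
    branch_channels=64, depth=4, transition_channels=256, depth_factor=1.5
)
print(new_depth, new_out, new_transition)
```

Scaling only the depth here would leave the transition layer sized for the old, narrower concatenation, which is the mismatch the compound scaling is meant to avoid.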

Planned re-parameterized convolution

Although RepConv achieves excellent performance on VGG, its accuracy drops significantly when it is applied directly to ResNet, DenseNet, and other architectures. The authors use gradient flow propagation paths to analyze how re-parameterized convolution should be combined with different networks, and they design the planned re-parameterized convolution accordingly.
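
For context, RepConv trains with parallel 3x3, 1x1, and identity branches and folds them into a single 3x3 convolution for inference. The sketch below shows only that folding step, for a single channel in plain Python and ignoring batch normalization. The helper names are hypothetical and are not from the YOLOv7 repository.

```python
def conv3x3(image, kernel):
    """3x3 cross-correlation with zero padding of 1 (output keeps the input size)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    iy, ix = y + ky - 1, x + kx - 1
                    if 0 <= iy < h and 0 <= ix < w:
                        acc += image[iy][ix] * kernel[ky][kx]
            out[y][x] = acc
    return out


def merge_branches(k3, k1, use_identity=True):
    """Fold a 3x3 kernel, a 1x1 weight, and an optional identity into one 3x3 kernel."""
    merged = [row[:] for row in k3]
    # A 1x1 conv and an identity both act only on the center tap.
    merged[1][1] += k1 + (1 if use_identity else 0)
    return merged


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k3 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
k1 = 3

# Training-time form: three parallel branches, summed.
branch_3x3 = conv3x3(image, k3)
multi_branch = [
    [branch_3x3[y][x] + k1 * image[y][x] + image[y][x] for x in range(3)]
    for y in range(3)
]

# Inference-time form: one merged 3x3 kernel.
single_branch = conv3x3(image, merge_branches(k3, k1))
print(multi_branch == single_branch)
```

As far as I can tell from the paper, the planned variant drops the identity branch in layers that already sit on a residual or concatenation connection; `use_identity=False` mimics that choice in this toy version.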

Label assignment

In the past, when training deep networks, label assignment usually referred directly to the ground truth (GT) and generated hard labels according to given rules. In recent years, however, taking object detection as an example, researchers often take the quality and distribution of the network's prediction output, consider it together with the GT, and use some calculation and optimization methods to generate reliable soft labels. For example, YOLO uses the IoU between the predicted bounding box and the GT as the soft label for objectness. In this paper, the authors call the mechanism that considers the network prediction results together with the GT and then assigns soft labels the "label assigner".
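
To make the hard/soft distinction concrete, here is a minimal sketch of the IoU-as-objectness idea mentioned above. The box format `(x1, y1, x2, y2)` and the example coordinates are assumptions for illustration; this is not the label assigner from the paper.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


gt_box = (0, 0, 10, 10)     # hypothetical ground-truth box
pred_box = (5, 0, 15, 10)   # hypothetical predicted box, shifted right by half a width

hard_label = 1.0                        # rule-based: "this cell is responsible for an object"
soft_label = box_iou(pred_box, gt_box)  # prediction-aware: how good the box actually is
print(hard_label, round(soft_label, 3))
```

A hard label stays at 1.0 however poor the prediction is, while the soft label shrinks as the predicted box drifts away from the GT.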

Finally, the author conducted a series of model comparison experiments, and the results are shown in the following table:

[Image: model comparison results table]

Experimental test

No matter how fancy the theory is, what counts in the end is how the model performs in practice.
Since YOLOv7 is modified from the YOLOv5 code, anyone who has trained a YOLOv5 model can get it running easily.
I will not repeat the specific process here, because it is exactly the same as in my earlier post, "[Object Detection] YOLOv5: Running Through the VisDrone Dataset".

Here I again used the VisDrone dataset with the YOLOv7 model and the same training parameters as in the previous blog post. After 1 epoch of training, it ran out of memory.

I then changed batch_size to 1. Training held out for 2 epochs, but it still ran out of memory.

So I switched to my own dataset and compared YOLOv7 with YOLOv5-5.0. The results are as follows:

Algorithm	mAP@0.5	mAP@0.5:0.95
yolov5-5.0	95.6%	67.6%
yolov7	94.8%	67.4%
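
For readers less familiar with the two metrics: mAP@0.5 uses a single IoU threshold of 0.5, while mAP@0.5:0.95 follows the COCO convention of averaging over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The short sketch below lists those thresholds and computes the gaps between the two models from the table; the variable names are my own.

```python
# IoU thresholds averaged by mAP@0.5:0.95 (COCO convention): 0.50, 0.55, ..., 0.95.
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Scores copied from the table, in percent: (mAP@0.5, mAP@0.5:0.95).
results = {
    "yolov5-5.0": (95.6, 67.6),
    "yolov7": (94.8, 67.4),
}

# Gap in percentage points, yolov5-5.0 minus yolov7.
gap_at_50 = round(results["yolov5-5.0"][0] - results["yolov7"][0], 1)
gap_at_50_95 = round(results["yolov5-5.0"][1] - results["yolov7"][1], 1)

print(iou_thresholds)
print(gap_at_50, gap_at_50_95)
```

The gap is under one percentage point on both metrics, so the two models are close on this dataset.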

As the table shows, YOLOv7 does slightly worse than YOLOv5 on my own dataset. This may be because the targets in my dataset are relatively large and sparse, so detection is not very difficult. In addition, my input images are 640x640, while the input size recommended for the stronger YOLOv7 models is 1280x1280. With my low-end 6GB graphics card, however, I cannot test and verify this. I will try again later when I get a chance to switch devices.
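
A back-of-the-envelope check on why 1280x1280 is out of reach for a small card: the number of input pixels, and with it (roughly) the size of the convolutional feature maps kept during training, grows with the square of the side length. The function name is hypothetical, and actual memory use also depends on the model, batch size, and framework overhead.

```python
def pixel_ratio(side_a, side_b):
    """How many times more pixels a square side_a input has than a side_b input."""
    return (side_a * side_a) / (side_b * side_b)


ratio = pixel_ratio(1280, 640)
print(ratio)
```

So even at the same batch size, a 1280x1280 run should need on the order of four times the activation memory of a 640x640 run.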