Heavy update! The latest YOLOv4 paper! Interpreting the YOLOv4 framework
2022-06-25 20:37:00 【SophiaCV】
Paper and code
Paper: https://arxiv.org/abs/2004.10934v1
Code: https://github.com/AlexeyAB/darknet
This post is a translation and framework interpretation of the YOLOv4 paper, "YOLOv4: Optimal Speed and Accuracy of Object Detection". A PDF version of these YOLOv4 reading notes (with a mind map) is also available for download.
Abstract:
A large number of features are said to improve the accuracy of convolutional neural networks (CNNs). Combinations of these features need to be tested in practice on large datasets, and the results need theoretical justification. Some features work only with certain models, only for certain problems, or only on small-scale datasets. Others, such as batch normalization and residual connections, apply to most models, tasks and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-Adversarial Training (SAT) and Mish activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) on the MS COCO dataset at a real-time speed of about 65 FPS on a Tesla V100.

The core idea in one sentence: the authors combine Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-Adversarial Training (SAT), Mish activation, Mosaic data augmentation, DropBlock and CIoU loss in YOLOv4, which puts it ahead of every detector it is compared against. On MS COCO it reaches 43.5% AP (65.7% AP50) while running at about 65 FPS on a Tesla V100.



Contributions
From the start, the authors' goal for YOLO has been a fast and efficient object detector. The main contributions of this paper are:
The authors design a fast and powerful object detector that anyone can train to high speed and accuracy with a single 1080 Ti or 2080 Ti GPU;
They verify the influence of state-of-the-art bag-of-freebies and bag-of-specials object detection methods during detector training;
They modify state-of-the-art methods (including CBN, PAN and SAM) to make them more efficient and better suited to single-GPU training.
Method
Building on existing real-time networks, the authors make two observations:
For GPUs, use a small number of groups (1-8) in convolutional layers, as in CSPResNeXt50 / CSPDarknet53;
For VPUs, use grouped convolution rather than SE (Squeeze-and-Excitation) modules.
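Grouped convolution splits the input and output channels into `groups` independent slices, so each output channel only sees `c_in / groups` inputs and the weight count drops by the same factor. The arithmetic below is a minimal sketch. The 256-channel, 3x3 layer is an illustrative choice, not a layer taken from the paper, and `conv_weights` is a hypothetical helper name:

```python
def conv_weights(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count of a k x k 2-D convolution (bias excluded)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Every output channel only connects to the c_in / groups inputs of its own group.
    return c_out * (c_in // groups) * k * k


for g in (1, 8, 32):
    print(g, conv_weights(256, 256, 3, groups=g))
# 1 589824
# 8 73728
# 32 18432
```

A smaller weight count does not automatically mean faster GPU inference, because many small groups make poor use of the hardware. That is one plausible reading of why the GPU recommendation stops at 1-8 groups.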
Network structure selection
Choosing a network structure means finding a balance among input resolution, number of layers, parameter count and number of output filters. The authors' experiments show that CSPResNeXt50 outperforms CSPDarkNet53 on classification but performs worse on detection.
Once the backbone is fixed, the next goal is to choose additional blocks that enlarge the receptive field, and the best method for aggregating features from different levels (such as FPN, PAN, ASFF or BiFPN). The best classification model is not necessarily the best detection model. A detector needs:
Higher input resolution, to detect small objects better;
More layers, for a larger receptive field;
More parameters, so that a higher-capacity model can detect objects of different sizes in the same image.
In a word: use the model with the larger receptive field and more parameters as the backbone. The figure below compares this information across backbones. CSPResNeXt50 has only 16 convolutional layers (3x3), a 425x425 receptive field and 20.6M parameters. CSPDarkNet53 has 29 convolutional layers (3x3), a 725x725 receptive field and 27.6M parameters. Both the theory and the experiments therefore point to CSPDarkNet53 as the better backbone for a detector.
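Receptive-field figures like these come from a standard recurrence. Each k x k layer widens the field by (k - 1) times the current spacing between units measured in input pixels (the "jump"), and each stride multiplies that spacing. The sketch below implements the recurrence for a plain chain of layers and ignores shortcut branches. The layer lists in the usage lines are made-up examples, not the real CSPResNeXt50 or CSPDarkNet53 configurations, so it does not reproduce the 425x425 and 725x725 numbers:

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Theoretical receptive field of a plain stack of conv/pool layers.

    layers: (kernel_size, stride) pairs, ordered from input to output.
    """
    rf = 1    # before any layer, a unit "sees" a single input pixel
    jump = 1  # spacing between adjacent units of the current layer, in input pixels
    for k, s in layers:
        rf += (k - 1) * jump  # the extra kernel taps reach (k - 1) * jump pixels further
        jump *= s             # striding spreads the next layer's units further apart
    return rf


print(receptive_field([(3, 1)] * 3))                                 # 7
print(receptive_field([(3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]))   # 21
```

Both lists contain the same kind of 3x3 convolution. The second one grows much faster because the strided layers multiply the jump for every layer after them. This is why depth combined with downsampling, as in CSPDarkNet53, yields such large receptive fields.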

On top of CSPDarkNet53, the authors add an SPP block because it enlarges the receptive field, separates out the most significant context features and barely reduces inference speed. They also replace FPN with PANet as the method for aggregating parameters from different backbone levels.
The final model is: CSPDarkNet53 + SPP + PANet (path-aggregation neck) + YOLOv3 head = YOLOv4.
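The SPP block used here keeps the spatial size and concatenates the feature map with several stride-1 max-pooled copies of itself. Each copy summarises a larger neighbourhood, and pooling adds no weights. Below is a minimal pure-Python sketch on nested lists (one list of rows per channel). It assumes the kernel sizes 5, 9 and 13 commonly cited for YOLOv4's SPP. The function names and the channel order are choices made for this illustration:

```python
from typing import List, Sequence

Channel = List[List[float]]


def max_pool_same(ch: Channel, k: int) -> Channel:
    """Stride-1 k x k max pooling. Windows are clipped at the borders (as if padded
    with -inf), so the output keeps the input's height and width."""
    h, w, r = len(ch), len(ch[0]), k // 2
    return [[max(ch[y][x]
                 for y in range(max(0, i - r), min(h, i + r + 1))
                 for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def spp(feature_map: List[Channel], kernels: Sequence[int] = (5, 9, 13)) -> List[Channel]:
    """Concatenate the input with max-pooled copies of itself along the channel axis."""
    pooled = [max_pool_same(ch, k) for k in kernels for ch in feature_map]
    # Resulting channel order: [input, pool5, pool9, pool13] (an illustrative choice).
    return feature_map + pooled


fm = [[[float(4 * i + j) for j in range(4)] for i in range(4)]]  # one 4x4 channel
out = spp(fm)
print(len(out), out[1][0][0], out[2][0][0])  # 4 10.0 15.0
```

The output has four times as many channels at the same resolution. The following convolution can then mix local and wide-context features.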
Choosing the tricks
To train object detectors better, CNN models typically draw on the following kinds of modules:
Activations: ReLU, Leaky-ReLU, PReLU, ReLU6, SELU, Swish or Mish
Bounding box regression loss: MSE, IoU, GIoU, CIoU, DIoU
Data augmentation: CutOut, MixUp, CutMix
Regularization: DropOut, DropPath, Spatial DropOut, DropBlock
Normalization: BN, SyncBN, FRN, CBN
Skip connections: residual connections, weighted residual connections, cross-stage partial connections
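The IoU-family losses in the list differ only in the penalty added to 1 - IoU. GIoU penalises the empty part of the smallest enclosing box. DIoU penalises the squared centre distance divided by the squared diagonal of that box. CIoU adds an aspect-ratio term alpha * v with v = (4 / pi^2) * (arctan(w_gt / h_gt) - arctan(w / h))^2 and alpha = v / ((1 - IoU) + v). The sketch below follows these published definitions for one pair of boxes in (x1, y1, x2, y2) form with positive width and height. It returns loss values only, whereas a training implementation would be vectorised in an autodiff framework. `box_losses` is a hypothetical name:

```python
import math


def box_losses(pred, gt, eps=1e-9):
    """IoU, GIoU, DIoU and CIoU losses for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # GIoU: penalise the part of the enclosing box not covered by the union
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c_area = cw * ch
    giou = iou - (c_area - union) / (c_area + eps)

    # DIoU: squared centre distance over squared enclosing-box diagonal
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    diou = iou - rho2 / (cw ** 2 + ch ** 2 + eps)

    # CIoU: additionally penalise aspect-ratio mismatch
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    ciou = diou - alpha * v

    return {"iou": 1 - iou, "giou": 1 - giou, "diou": 1 - diou, "ciou": 1 - ciou}
```

For two disjoint boxes, the plain IoU loss is stuck at 1 no matter how far apart they are. The GIoU and DIoU terms still grow with distance and therefore give a useful training signal. The CIoU term only kicks in when the aspect ratios differ.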
From these modules, the authors choose Mish as the activation function and DropBlock for regularization. Because the focus is single-GPU training, SyncBN is not considered.
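The two modules picked here are short enough to sketch directly. Mish follows its usual definition x * tanh(softplus(x)). DropBlock follows the DropBlock paper's recipe: sample block positions at a seed rate gamma, zero out block_size x block_size squares, then rescale the surviving units. Here it runs on a single channel. The function names, the nested-list feature map and the `random.Random` argument are assumptions of this sketch. Frameworks apply DropBlock to whole batched tensors and only during training:

```python
import math
import random
from typing import List


def softplus(x: float) -> float:
    # log(1 + e^x), rearranged so that large |x| cannot overflow math.exp
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish activation: smooth and non-monotonic, approaches x for large x
    return x * math.tanh(softplus(x))


def drop_block(fmap: List[List[float]], block_size: int, keep_prob: float,
               rng: random.Random) -> List[List[float]]:
    """Training-time DropBlock on a single H x W channel."""
    h, w, b = len(fmap), len(fmap[0]), block_size
    if keep_prob >= 1.0:
        return [row[:] for row in fmap]  # nothing to drop
    if b > min(h, w):
        raise ValueError("block_size must fit inside the feature map")
    # Seed rate chosen so that roughly (1 - keep_prob) of the units end up dropped.
    gamma = (1 - keep_prob) / b ** 2 * (h * w) / ((h - b + 1) * (w - b + 1))
    mask = [[1] * w for _ in range(h)]
    # Sample only positions where a whole b x b block fits inside the map.
    for i in range(h - b + 1):
        for j in range(w - b + 1):
            if rng.random() < gamma:
                for y in range(i, i + b):
                    for x in range(j, j + b):
                        mask[y][x] = 0
    kept = sum(map(sum, mask))
    if kept == 0:
        return [[0.0] * w for _ in range(h)]
    scale = h * w / kept  # rescale so the overall activation magnitude is preserved
    return [[fmap[y][x] * mask[y][x] * scale for x in range(w)] for y in range(h)]
```

Unlike ordinary dropout, DropBlock removes contiguous regions. Neighbouring activations in a feature map are strongly correlated, so dropping isolated units would remove little information.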
Other improvement strategies
To make the detector better suited to single-GPU training, the authors also add several designs and improvements:
A new data augmentation method, Mosaic, plus Self-Adversarial Training (SAT);
Optimal hyperparameters selected with a genetic algorithm (GA);
Existing methods modified for efficient training and inference: modified SAM, modified PAN, and Cross mini-Batch Normalization (CmBN).
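Mosaic mixes four training images into one. The detector therefore sees objects in unusual contexts and at several scales within a single sample. The pure-Python sketch below shows the core bookkeeping: it picks a random split point, fills each quadrant from a different image, then clips and shifts that image's boxes onto the canvas. Several simplifications are assumptions made here, not the darknet implementation. Images are grayscale nested lists at least `size` x `size`, each quadrant takes the top-left crop of its image, and the split point is drawn from the middle half of the canvas. Real pipelines typically also rescale and randomly crop each source image:

```python
import random
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def mosaic(images: List[Image], boxes: List[List[Box]], size: int,
           rng: random.Random) -> Tuple[Image, List[Box]]:
    """Combine 4 images into one size x size canvas around a random split point."""
    assert len(images) == 4 and len(boxes) == 4
    cx = rng.randint(size // 4, 3 * size // 4)
    cy = rng.randint(size // 4, 3 * size // 4)
    canvas = [[0] * size for _ in range(size)]
    out_boxes: List[Box] = []
    # (x offset, y offset, width, height) of each quadrant: TL, TR, BL, BR
    regions = [(0, 0, cx, cy), (cx, 0, size - cx, cy),
               (0, cy, cx, size - cy), (cx, cy, size - cx, size - cy)]
    for img, img_boxes, (ox, oy, qw, qh) in zip(images, boxes, regions):
        # Copy the top-left qw x qh crop of this image into its quadrant.
        for y in range(qh):
            for x in range(qw):
                canvas[oy + y][ox + x] = img[y][x]
        # Clip each box to the crop, shift it onto the canvas, drop empty ones.
        for x1, y1, x2, y2 in img_boxes:
            x1, x2 = max(x1, 0), min(x2, qw)
            y1, y2 = max(y1, 0), min(y2, qh)
            if x2 > x1 and y2 > y1:
                out_boxes.append((x1 + ox, y1 + oy, x2 + ox, y2 + oy))
    return canvas, out_boxes
```

Clipping, rather than dropping, boxes that cross a quadrant border keeps partially visible objects as training targets. This is part of what makes Mosaic useful for small and truncated objects.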




YOLOv4
In short, YOLOv4 consists of:
Backbone: CSPDarkNet53
Neck: SPP, PAN
Head: YOLOv3
Tricks (backbone): CutMix, Mosaic, DropBlock, Label Smoothing
Modified (backbone): Mish, CSP, MiWRC
Tricks (detector): CIoU, CmBN, DropBlock, Mosaic, SAT, Eliminate grid sensitivity, Multiple anchors, Cosine annealing scheduler, Random training shapes
Modified (detector): Mish, SPP, SAM, PAN, DIoU-NMS
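DIoU-NMS, the last item above, changes one line of ordinary greedy NMS. A lower-scoring box is suppressed when IoU minus the DIoU centre-distance penalty (rho^2 / c^2) reaches the threshold, not when plain IoU does. Two overlapping boxes whose centres are far apart, as with two occluded objects, are therefore more likely to both survive. The sketch below assumes boxes in (x1, y1, x2, y2) form with positive area. The threshold of 0.45 is a hypothetical default, not a value from the paper:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def diou(a: Box, b: Box) -> float:
    """IoU minus the DIoU penalty rho^2 / c^2."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # Squared distance between the two box centres
    rho2 = ((a[0] + a[2] - b[0] - b[2]) ** 2 + (a[1] + a[3] - b[1] - b[3]) ** 2) / 4
    # Squared diagonal of the smallest box enclosing both
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return inter / union - rho2 / c2


def diou_nms(boxes: Sequence[Box], scores: Sequence[float],
             threshold: float = 0.45) -> List[int]:
    """Greedy NMS that suppresses a box once its DIoU with a kept box reaches `threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order if diou(boxes[best], boxes[i]) < threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(diou_nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```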
Experiments
A model's quality has to be proven by experiments, so here are the comparison tables directly:




