
Anchor-free detector: CenterNet

2022-07-27 10:01:00 (yfy2022yfy)

2019/07/15. Please credit the source when reprinting.

Paper: https://arxiv.org/abs/1904.07850

Source code: https://github.com/xingyizhou/CenterNet

I. Abstract

Most successful detectors today enumerate a large number of potential object locations, classify each one, and then still need post-processing. The authors consider this approach wasteful and inefficient.

In this paper, the authors model an object as a single point: the center of its bounding box. The detector uses keypoint estimation to find center points and regresses all other object properties from them, such as size, 3D location, orientation, and even pose.

CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset:

28.1% AP at 142 FPS
37.4% AP at 52 FPS
45.1% AP at 1.4 FPS (multi-scale testing)

II. Main content

1. Method overview

The core idea is to represent each object by one point, the center of its bounding box, and to read every other property off the features at that point.

Object detection thus becomes a standard keypoint estimation problem:

The image is fed into a fully convolutional network that produces a heatmap; peaks in the heatmap correspond to object centers.
The image features at each peak are used to predict the width and height of the object's bounding box.

In addition, by outputting extra predictions at each center point, the same framework handles 3D object detection and multi-person pose estimation:

For 3D bounding-box estimation, we regress the object's absolute depth, 3D box dimensions, and orientation.
For human pose estimation, each 2D joint location is treated as the center position plus an offset, and the offsets are regressed directly.
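The "center + offset" rule for joints can be sketched in a few lines. This is a minimal illustration, not code from the CenterNet repository: `joints_from_center` is a hypothetical helper, and the center and offset values are made up. It assumes the network has already produced one (dx, dy) offset per joint at the detected center.

```python
def joints_from_center(center, offsets):
    """Hypothetical helper: recover 2D joints as center + regressed offset.

    center:  (cx, cy) of a detected person center
    offsets: list of (dx, dy), one per joint, read from the offset head
    """
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in offsets]


center = (50.0, 40.0)
offsets = [(0.0, -20.0), (-8.0, 5.0), (8.0, 5.0)]  # made-up values for three joints
print(joints_from_center(center, offsets))
# [(50.0, 20.0), (42.0, 45.0), (58.0, 45.0)]
```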

2. Comparison with earlier anchor-based methods

This method is similar to anchor-based one-stage methods: a center point can itself be viewed as a single, shape-agnostic anchor.
The differences are:

The center-point "anchor" is assigned purely by location, not by IoU overlap, so there are no manual thresholds for foreground/background classification;
Each object has only one positive anchor (its center), so NMS is not needed; detections are simply the peaks extracted from the heatmap;
CenterNet uses a higher output resolution, with an output stride of 4 compared with the traditional stride of 16.
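Replacing NMS with peak extraction can be illustrated on a toy heatmap. At stride 4 a 512x512 input yields a 128x128 heatmap (versus 32x32 at stride 16), and a cell counts as a peak if it is at least as large as its 8-connected neighbors. The sketch below is a plain-Python illustration under stated assumptions: `local_peaks` is a hypothetical name, the 5x5 heatmap is made up, and the `threshold` parameter exists only to skip the flat zero background in this toy example.

```python
def local_peaks(heatmap, threshold=0.1):
    """Return (score, y, x) for cells >= all 8-connected neighbors and > threshold.

    This stands in for NMS: each object should produce one local maximum.
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v <= threshold:  # illustrative cut-off, an assumption
                continue
            # 3x3 window clipped at the borders; it includes v itself
            window = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            if v >= max(window):
                peaks.append((v, y, x))
    return sorted(peaks, reverse=True)


heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.3, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.4, 0.2],
    [0.0, 0.0, 0.3, 0.8, 0.3],
    [0.0, 0.0, 0.0, 0.2, 0.0],
]
print(local_peaks(heatmap))  # [(0.9, 1, 1), (0.8, 3, 3)]
```

The two objects each leave one surviving peak, and the lower responses around them (0.3, 0.4, ...) are suppressed without any IoU computation.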

3. Output heatmap

The network output is downsampled with a stride of 4. The authors try several fully convolutional encoder-decoder networks to predict the heatmap: Stacked Hourglass networks, up-convolutional ResNets, and Deep Layer Aggregation (DLA).

First, each ground-truth (GT) keypoint is mapped to the output resolution and splatted onto the heatmap with a Gaussian kernel, so the GT point and its neighboring pixels receive values that decay with distance. Where Gaussians overlap, the element-wise maximum is taken at each pixel. Training uses a focal loss.
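Target generation and the loss can be sketched in plain Python. This is a minimal illustration, not the official implementation: `draw_gaussian` and `focal_loss` are hypothetical names, the centers are assumed to be integer cells already in heatmap coordinates, and σ is fixed at 1.0 for simplicity (the paper adapts σ to object size). The loss follows the penalty-reduced focal-loss form from the paper, with α=2 and β=4, normalized by the number of GT centers.

```python
import math


def draw_gaussian(heatmap, center, sigma):
    """Splat exp(-d^2 / (2 sigma^2)) around integer `center` (x, y).

    Overlapping Gaussians are merged with an element-wise max, not a sum.
    """
    cx, cy = center
    for y in range(len(heatmap)):
        for x in range(len(heatmap[0])):
            g = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            heatmap[y][x] = max(heatmap[y][x], g)
    return heatmap


def focal_loss(pred, gt, alpha=2, beta=4, eps=1e-12):
    """Penalty-reduced pixel-wise focal loss, averaged over GT centers."""
    loss, num_pos = 0.0, 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            p = min(max(p, eps), 1 - eps)  # keep log() finite
            if g == 1.0:  # exact GT center
                loss -= (1 - p) ** alpha * math.log(p)
                num_pos += 1
            else:  # negatives near a center are down-weighted by (1 - g)^beta
                loss -= (1 - g) ** beta * p ** alpha * math.log(1 - p)
    return loss / max(num_pos, 1)


gt = [[0.0] * 8 for _ in range(8)]
draw_gaussian(gt, (2, 2), sigma=1.0)
draw_gaussian(gt, (4, 3), sigma=1.0)  # overlaps the first Gaussian
print(gt[2][2], gt[3][4])  # 1.0 1.0
uniform = [[0.5] * 8 for _ in range(8)]
print(focal_loss(gt, gt) < focal_loss(uniform, gt))  # True
```

A prediction matching the GT heatmap scores a much lower loss than an uninformative 0.5 everywhere, because the `(1 - g) ** beta` factor barely penalizes high scores close to a true center.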

To recover the quantization error caused by downsampling (stride=4), we additionally predict an offset for each center point, trained with an L1 loss.
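The offset target is the fractional part lost when an input-image center is divided by the stride. The sketch assumes a stride of 4 and uses a hypothetical helper `center_and_offset`; the point (101, 58) is a made-up example.

```python
import math

STRIDE = 4  # output stride R


def center_and_offset(px, py, stride=STRIDE):
    """Map an input-image center to its heatmap cell plus the offset target."""
    sx, sy = px / stride, py / stride
    cell = (math.floor(sx), math.floor(sy))
    offset = (sx - cell[0], sy - cell[1])  # each component lies in [0, 1)
    return cell, offset


def l1_loss(pred, target):
    """L1 distance between a predicted and a target offset."""
    return sum(abs(p - t) for p, t in zip(pred, target))


cell, offset = center_and_offset(101.0, 58.0)
print(cell, offset)  # (25, 14) (0.25, 0.5)
```

Without the offset head, the center would be recovered as cell * 4 = (100, 56), a 1-2 pixel error that the predicted offset removes.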

4. Objects as center points

Suppose the bounding box of object $k$ is $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$. Its center point $p_k = \left(\frac{x_1^{(k)}+x_2^{(k)}}{2}, \frac{y_1^{(k)}+y_2^{(k)}}{2}\right)$ is predicted by the heatmap network. In addition, the network regresses the object size $s_k = (x_2^{(k)}-x_1^{(k)}, y_2^{(k)}-y_1^{(k)})$. To limit computation, a single size prediction is shared by all object categories rather than one per class, and it is trained with an L1 loss.
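The definitions of $p_k$ and $s_k$, plus an L1 size loss averaged over objects, can be written directly. `box_to_center_size` and `size_l1_loss` are hypothetical helper names, and the box and predicted size are made-up values.

```python
def box_to_center_size(x1, y1, x2, y2):
    """Return center p_k and size s_k for box (x1, y1, x2, y2)."""
    center = ((x1 + x2) / 2, (y1 + y2) / 2)
    size = (x2 - x1, y2 - y1)
    return center, size


def size_l1_loss(pred_sizes, gt_sizes):
    """Mean over objects of |w_pred - w| + |h_pred - h|."""
    total = sum(abs(pw - w) + abs(ph - h)
                for (pw, ph), (w, h) in zip(pred_sizes, gt_sizes))
    return total / max(len(gt_sizes), 1)


center, size = box_to_center_size(10, 20, 50, 80)
print(center, size)  # (30.0, 50.0) (40, 60)
print(size_l1_loss([(38.0, 61.0)], [size]))  # 3.0
```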

The authors use a single network to predict keypoints, center offsets, and box sizes at the same time, giving C+4 outputs at each location (C heatmap channels for the keypoints, one per class, plus 2 offset and 2 size channels). The heatmap values lie between 0 and 1 and serve as confidence scores. All outputs share one backbone; each output head then applies its own 3x3 convolution, ReLU, and 1x1 convolution.

[Figure: heatmap (left), center offset (middle), box size (right)]

There are C heatmaps in total, one for each of the C categories.
Each heatmap can contain n peaks; the authors keep at most the top 100, which I understand as allowing multiple objects to be detected per image.
Each object has its own center point, offset, and box size.
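Putting the three heads together, decoding reads the offset and size at each kept peak and builds the box from $(\hat{x}+\delta\hat{x} \pm \hat{w}/2,\ \hat{y}+\delta\hat{y} \pm \hat{h}/2)$. The sketch below is an illustration under stated assumptions, not the repository's code: `decode` is a hypothetical function, the top-K is taken across all classes, offsets and sizes are assumed to be in heatmap-grid units, and the result is scaled by the stride back to input pixels.

```python
import heapq


def decode(heatmaps, offsets, sizes, k=100, stride=4):
    """heatmaps[c][y][x] scores; offsets[y][x] = (dx, dy); sizes[y][x] = (w, h).

    Returns up to k (score, class, (x1, y1, x2, y2)) in input-pixel units.
    """
    candidates = []
    for c, hm in enumerate(heatmaps):
        h, w = len(hm), len(hm[0])
        for y in range(h):
            for x in range(w):
                # peak = at least as large as every cell in its 3x3 window
                window = max(hm[ny][nx]
                             for ny in range(max(0, y - 1), min(h, y + 2))
                             for nx in range(max(0, x - 1), min(w, x + 2)))
                if hm[y][x] >= window:
                    candidates.append((hm[y][x], c, y, x))
    boxes = []
    for score, c, y, x in heapq.nlargest(k, candidates):
        dx, dy = offsets[y][x]
        bw, bh = sizes[y][x]
        cx, cy = x + dx, y + dy  # sub-cell center in heatmap units
        box = ((cx - bw / 2) * stride, (cy - bh / 2) * stride,
               (cx + bw / 2) * stride, (cy + bh / 2) * stride)
        boxes.append((score, c, box))
    return boxes


hm = [[[0.1, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.1]]]  # one class
offs = [[(0.0, 0.0)] * 3 for _ in range(3)]
szs = [[(0.0, 0.0)] * 3 for _ in range(3)]
offs[1][1], szs[1][1] = (0.5, 0.25), (2.0, 1.0)
print(decode(hm, offs, szs, k=2))  # [(0.9, 0, (2.0, 3.0, 10.0, 7.0))]
```

Since only one cell in this toy example is a local maximum, a single box comes out even though `k=2`, which mirrors why no NMS step is needed.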

5. Extensions

Beyond 2D object detection, the same formulation can add extra output branches for 3D detection and human pose estimation.