[CVPR 2022] A close reading of the QueryDet paper

2022-06-11 08:28:00 【Marlowee】

Paper: https://arxiv.org/abs/2103.09136

Source code: https://github.com/ChenhongyiYang/QueryDet-PyTorch

1 Preface

Recently I have been thinking about how to improve the detection of small and medium-sized objects in remote sensing images, and I happened to come across QueryDet, a small object detection method proposed by TuSimple. Its main idea is to use cascaded sparse queries to speed up small object detection on high-resolution features, which greatly reduces the network's computation and memory cost. Below are my understanding of and thoughts on the paper.

2 Research background

What I like most about the paper is that the authors do not mechanically list prior work. Instead, they focus on the task itself and clearly lay out its inherent challenges and difficulties.

2.1 The challenge: low accuracy on small objects

On the COCO dataset, the mainstream detector RetinaNet reaches 51.2 mAP on large objects and 44.1 mAP on medium objects, but only 24.1 mAP on small objects. The authors attribute the accuracy drop on small objects to three main causes:

The downsampling operations in a CNN wipe out the feature information of small objects and also let background contaminate their features;
The receptive field of a low-resolution feature map may not match the size of a small object;
A small perturbation of the bounding box affects the detection result far more for a small object than for a large one, so localization is harder;

Challenges in small object detection
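
To make the third point concrete, here is a minimal sketch of my own (not from the paper) that shifts a square box diagonally by the same 4 pixels and measures its IoU with the original box. The box sizes and the shift are hypothetical values chosen only for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def iou_after_shift(size, shift):
    """IoU between a size x size box and the same box shifted diagonally by `shift` pixels."""
    gt = (0.0, 0.0, float(size), float(size))
    pred = (float(shift), float(shift), float(size + shift), float(size + shift))
    return iou(gt, pred)


# Hypothetical box sizes (in pixels), all shifted by the same 4 pixels.
for size in (16, 64, 256):
    print(size, round(iou_after_shift(size, shift=4), 3))
```

With the same 4-pixel shift, the 16-pixel box already falls below an IoU of 0.5, while the 256-pixel box is barely affected.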

2.2 Motivation for the improvement

Existing small object detection methods usually keep high-resolution features by enlarging the input image or reducing the downsampling rate. This introduces a large amount of redundant computation and makes detection on low-level features very expensive, as shown in the figure below.
Comparison of computation cost across different structures
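
A rough way to see where the cost comes from is to count feature-map positions per pyramid level, since a dense detection head shared across levels does the same work at every position. The sketch below is my own back-of-the-envelope calculation, not the paper's measurement: the 800x1344 input size is hypothetical, strides 8 to 128 stand for the usual P3-P7 levels, and stride 4 stands for an added high-resolution level.

```python
import math


def positions_per_level(height, width, strides):
    """Number of feature-map positions at each pyramid level for a given input size."""
    return {s: math.ceil(height / s) * math.ceil(width / s) for s in strides}


def share_of_finest_level(height, width, strides):
    """Fraction of all pyramid positions that lie on the highest-resolution level."""
    counts = positions_per_level(height, width, strides)
    return counts[min(strides)] / sum(counts.values())


# Hypothetical input size; stride 4 is the assumed extra high-resolution level.
base_strides = (8, 16, 32, 64, 128)
hires_strides = (4,) + base_strides

base_total = sum(positions_per_level(800, 1344, base_strides).values())
hires_total = sum(positions_per_level(800, 1344, hires_strides).values())

print("finest-level share:", share_of_finest_level(800, 1344, base_strides))
print("total positions, base vs. with stride 4:", base_total, hires_total)
print("growth factor:", hires_total / base_total)
```

Under these assumptions the finest level alone accounts for 75% of all positions, and adding the stride-4 level makes the total four times larger, which is why dense heads on low-level features dominate the cost.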

The authors reject this brute-force approach (I had planned to use exactly this clumsy approach myself; fortunately I read this paper first) and describe two key observations:

Feature computation on high-resolution, low-level feature maps is highly redundant: small objects are sparsely distributed in space and occupy only a small part of the feature map. As the figure below shows, aircraft take up a very small fraction of a remote sensing image;
In the FPN structure, even though low-resolution feature layers cannot detect small objects accurately, they can still judge with high confidence whether small objects exist and roughly where they are. Sampling in a feature pyramid behaves much like convolution in a CNN (invariance to translation, scaling, and distortion), so features can be inferred across levels through its downsampling and upsampling;

Example image containing small objects
Starting from these observations, QueryDet proposes the Cascade Sparse Query (CSQ) mechanism. Query means that the queries passed down from the previous layer (a higher-level feature with lower resolution) guide small object detection at the current layer, which in turn predicts queries that are passed on to guide small object detection at the next layer. Cascade refers to this level-by-level chaining. Sparse means that sparse convolution is used to significantly reduce the computation of the detection heads on the low-level feature layers.

Put simply, the previous layer's feature map carries high-level features at low resolution and handles the initial screening for small objects; the resulting queries are then passed to the lower level, which has high-resolution information, for a finer search. This two-stage "glance and focus" structure enables efficient dynamic inference and produces the final detections.

2 Model structure

As mentioned earlier, in previous detectors built on feature pyramids, small objects tend to be detected from high-resolution, low-level feature maps. However, because small objects are usually sparsely distributed in space, the dense computation paradigm is very inefficient on high-resolution feature maps. Inspired by this observation, the authors propose a coarse-to-fine approach to reduce the computation on the low-level pyramid layers: first, the rough positions of small objects are predicted on the coarse feature map; then computation is carried out only at the corresponding positions on the fine feature map. This can be seen as a query process: the rough positions are the query keys, and the high-resolution features used to detect small objects are the query values. The whole process is shown in the figure below.
QueryDet detection pipeline
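
The query process can be sketched in a few lines of plain Python. This is a toy illustration under my own assumptions, not the authors' implementation: the score map, the 0.15 threshold, and the function names are hypothetical, and I assume the fine map has exactly twice the resolution of the coarse one, so each coarse position covers a 2x2 block of fine positions.

```python
def select_queries(coarse_scores, threshold):
    """Return the (x, y) coarse positions whose small-object score exceeds the threshold."""
    return [(x, y)
            for y, row in enumerate(coarse_scores)
            for x, score in enumerate(row)
            if score > threshold]


def queries_to_keys(queries, fine_height, fine_width):
    """Map each coarse query to the 2x2 block it covers on the fine map (the keys)."""
    keys = set()
    for x, y in queries:
        for j in (0, 1):
            for i in (0, 1):
                kx, ky = 2 * x + i, 2 * y + j
                if kx < fine_width and ky < fine_height:
                    keys.add((kx, ky))
    return sorted(keys)


def gather_values(fine_features, keys):
    """Look up the high-resolution features (the values) only at the key positions."""
    return {(x, y): fine_features[y][x] for x, y in keys}


# Hypothetical 4x4 coarse score map with one likely small object at (x=1, y=1).
coarse_scores = [
    [0.05, 0.10, 0.02, 0.01],
    [0.03, 0.80, 0.04, 0.02],
    [0.01, 0.06, 0.03, 0.02],
    [0.02, 0.01, 0.02, 0.01],
]
# Hypothetical 8x8 fine feature map; each "feature" is just the number 10 * y + x.
fine_features = [[10 * y + x for x in range(8)] for y in range(8)]

queries = select_queries(coarse_scores, threshold=0.15)
keys = queries_to_keys(queries, fine_height=8, fine_width=8)
values = gather_values(fine_features, keys)

print("queries:", queries)
print("keys:", keys)
print("fraction of fine positions processed:", len(keys) / 64)
```

In this toy case only 4 of the 64 fine positions are touched, which is the whole point: the expensive high-resolution computation is spent only where the coarse level suspects a small object.
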
The paper defines this process strictly with formulas, which is not easy to follow. Below I borrow the figure from the author's homepage and try to explain the detection process in plain language:
Schematic of the QueryDet detection process
The figure above contains two cascaded query operations: Large->Medium and Medium->Small. Take Large->Medium as an example. First, the network marks small objects on the Large level (an object whose scale is below a preset threshold s is defined as a small object), and the Large level predicts a small-object confidence for each position, which gives the grid cells that contain small objects. Second, during inference, the network selects the positions whose predicted score exceeds a score threshold as queries and maps them onto the Medium feature map; the exact mapping is given in formula (1). Finally, the three heads at the Medium level are computed only at the positions in the key position set, producing both the head outputs and the queries for the next layer. This computation is implemented with sparse convolution.

Mathematical description
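
For readers who prefer symbols, the query-to-key mapping can be written roughly as follows. This is my own reconstruction under the assumption that adjacent pyramid levels differ by a factor of 2 in resolution; the symbols $q_l^o$, $k_{l-1}^o$, $S_l$ and $\sigma$ are notation I chose here and may not match the paper's formula (1) exactly.

```latex
% A position o on level l becomes a query if its predicted small-object score S_l exceeds a threshold \sigma:
q_l^o = (x^o, y^o) \quad \text{with} \quad S_l(x^o, y^o) > \sigma
% Each query is mapped to its four corresponding positions (keys) on the next finer level l-1:
\{ k_{l-1}^o \} = \{ (2x^o + i,\; 2y^o + j) \mid i, j \in \{0, 1\} \}
```

The heads on level $l-1$ are then evaluated only on the union of these key sets over all queries.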

3 Experimental results

The paper includes a series of experiments and ablations, mainly:

RetinaNet vs. QueryDet on COCO mini-val
RetinaNet vs. QueryDet on VisDrone
An ablation on COCO mini-val comparing HR (high-resolution feature), RB (loss re-balance, i.e. weighting the losses of different layers), and QH (an additional Query Head)
The trade-off between AP, AR, and FPS under different query thresholds on COCO and VisDrone 2018
On COCO mini-val, a comparison between running without queries and the different query methods; CSQ performs best
AP and FPS on COCO mini-val when the query starts from different layers
Results with different backbones (MobileNet V2 & ShuffleNet V2)
Results on COCO mini-val for FCOS with QueryDet embedded
A comparison of different methods on COCO test-dev & VisDrone validation
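
To get a feel for the query-threshold trade-off mentioned in the list above, here is a toy sweep of my own. It is not the paper's experiment: the score map and the ground-truth cells are made-up values, the fraction of positions kept as queries serves only as a crude proxy for speed, and recall over ground-truth cells is only a crude proxy for AR.

```python
def sweep_threshold(scores, small_object_cells, thresholds):
    """For each threshold, return (threshold, fraction of positions kept as queries, recall)."""
    total = sum(len(row) for row in scores)
    results = []
    for t in thresholds:
        kept = {(x, y)
                for y, row in enumerate(scores)
                for x, score in enumerate(row)
                if score > t}
        recall = len(kept & small_object_cells) / len(small_object_cells)
        results.append((t, len(kept) / total, recall))
    return results


# Hypothetical 4x4 score map and the (x, y) cells that really contain small objects.
scores = [
    [0.05, 0.12, 0.02, 0.01],
    [0.30, 0.80, 0.04, 0.02],
    [0.01, 0.06, 0.20, 0.02],
    [0.02, 0.01, 0.02, 0.45],
]
small_object_cells = {(1, 1), (2, 2), (3, 3)}

results = sweep_threshold(scores, small_object_cells, (0.1, 0.25, 0.5))
for threshold, kept_fraction, recall in results:
    print(f"threshold={threshold:.2f} kept={kept_fraction:.4f} recall={recall:.2f}")
```

Raising the threshold keeps fewer positions (less computation on the next level) but starts to miss real small objects, so the threshold has to balance speed against recall.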

Performance comparison of CSQ, CQ, and CCQ
I will not list the numbers here; see the visualizations.

4 Summary

QueryDet uses high-resolution features to improve small object detection. Through the CSQ mechanism, the high-level, low-resolution features first screen for regions that contain small objects; the positions found by this initial screening are then processed on the high-resolution feature layer with sparse convolution, which greatly reduces computation. That said, the SOFT in the paper is still open to discussion; I will share how it actually performs after studying the source code carefully.

Copyright notice
This article was written by [Marlowee]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/162/202206110820006101.html