Topping the COCO leaderboard! DINO: letting object detection embrace the Transformer
2022-07-25 17:05:00 【PaperWeekly】

Author | Li Feng
Affiliation | PhD student, Hong Kong University of Science and Technology
Research direction | Object detection, multimodal learning
Let me introduce DINO (DETR with Improved deNoising anchOr boxes), our object detection model that recently topped the COCO leaderboard. From early March until now (July), this model has kept DETR (DEtection TRansformer) type detectors at SOTA on object detection. It reaches 63.3 AP on COCO while using more than ten times fewer model parameters and less training data than the previous SOTA detector.

Paper title:
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Paper link:
https://arxiv.org/abs/2203.03605
Code link:
https://github.com/IDEACVR/DINO

Main features
SOTA performance: among large models, DINO obtains the best detection results with relatively small data and model size (about 1/10 of the previous SwinV2). Under the standard ResNet-50 setting it reaches 51.3 AP.
End-to-end learning: DINO is a DETR-type detector and is end-to-end learnable, which avoids many hand-designed modules that traditional detectors need (such as NMS).
Fast convergence: under the standard ResNet-50 setting, DINO with 5-scale features reaches 49.4 AP in 12 epochs and 51.3 AP in 24 epochs. DINO with 4-scale features achieves similar performance and runs at 23 FPS.


Demo results
▲ La La Land, trained on COCO
▲ 007, trained on COCO

Motivation
The Transformer is now widely used in natural language processing and computer vision, and has achieved the best performance on many mainstream tasks. In object detection, however, DETR, a highly innovative Transformer-based detector, has not been widely adopted as a mainstream detector. For example, almost all models on the Papers with Code leaderboard use traditional CNN detection heads (such as HTC [1]).
The question we were most interested in was therefore this: can DETR, a simple, end-to-end learnable object detector backed by the stronger modeling power of the Transformer, achieve better performance?
The answer is yes.

Background
Before working on DINO, several members of our lab had completed DAB-DETR [2] and DN-DETR [3]. DINO is a joint follow-up to these two works by several of us, and it builds on their designs.
DAB-DETR rethinks how DETR queries should be understood. It explicitly models DETR's positional query as a four-dimensional box (x, y, w, h). Each decoder layer predicts a relative offset and uses it to update the box, yielding a more accurate box prediction. The box is updated dynamically in this way and is used to help the decoder's cross-attention extract features. DN-DETR rethinks the bipartite matching problem in DETR, in other words label assignment. We found that bipartite matching in DETR is very unstable in the early stage of training, which leads to inconsistent optimization targets and slow convergence. We therefore use a denoising task that feeds noised ground-truth boxes directly into the decoder as a shortcut for learning the relative offsets. This skips the matching process and learns directly (see my previous article for details).
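The layer-by-layer box update described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes boxes are stored as normalized (cx, cy, w, h) values in (0, 1) and that each layer's offset is added in logit (inverse-sigmoid) space, a convention common in DETR-style implementations. The names `refine_box` and `run_decoder`, and the offset values in the usage lines, are hypothetical.

```python
import math


def inverse_sigmoid(x, eps=1e-5):
    """Map a value in (0, 1) to logit space, clamping for numerical safety."""
    x = min(max(x, eps), 1.0 - eps)
    return math.log(x / (1.0 - x))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def refine_box(box, delta):
    """Apply one decoder layer's predicted offset to a normalized box.

    Assumption: the offset is added in logit space, so the updated
    coordinates always stay inside (0, 1).
    """
    return tuple(sigmoid(inverse_sigmoid(b) + d) for b, d in zip(box, delta))


def run_decoder(anchor, per_layer_deltas):
    """Refine an anchor box layer by layer and return every intermediate box."""
    boxes = []
    box = anchor
    for delta in per_layer_deltas:
        box = refine_box(box, delta)
        boxes.append(box)
    return boxes


# Hypothetical offsets standing in for the outputs of two decoder layers.
refined = run_decoder((0.5, 0.5, 0.2, 0.2),
                      [(0.1, -0.1, 0.2, 0.0), (0.0, 0.0, 0.0, 0.0)])
```

In a real model the offsets come from a small prediction head on each decoder layer's output, and the refined box of one layer serves as the anchor for the next.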
These two works deepened our understanding of DETR considerably, and brought DETR-type models to a level comparable with traditional CNN models in both convergence speed and results. How can detector performance and convergence speed be improved further? We can think along the lines of DAB and DN:
DAB made us realize the importance of queries, so how can we learn or initialize better queries?
DN introduced denoising training to stabilize label assignment, so how can label assignment be optimized further?

Method

▲ Framework
To address the questions above, DINO proposes three further improvements. The model architecture is shown in the figure above.
5.1 Contrastive denoising (CDN)
The noised samples introduced in DN's denoising training are all positive samples. However, the model needs to learn not only how to regress positive samples but also how to distinguish negative samples. For example, DINO's decoder uses 900 queries, while an image usually contains only a few objects, so the vast majority of queries are negative samples.

We therefore designed a way to train the model to recognize negative samples. As shown in the figure above, we improved DN so that the model must not only regress the ground-truth boxes but also identify negatives. When DN adds relatively large noise to a ground-truth box, we treat the result as a negative sample, and during denoising training it is supervised to predict no object. Because these negatives lie close to ground-truth boxes, they are hard negatives that are relatively difficult to tell apart, which forces the model to learn the distinction between positive and negative samples.
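The idea can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the authors' implementation. It assumes two noise thresholds, here called `lambda1` and `lambda2` with hypothetical values, and perturbs only the box center: shifts below `lambda1` (as a fraction of half the box size) give positive queries, and shifts between `lambda1` and `lambda2` give negative queries. `make_cdn_group`, `noise_box`, and `NO_OBJECT` are illustrative names.

```python
import random

NO_OBJECT = -1  # hypothetical label id meaning "predict no object"


def noise_box(box, lo, hi, rng):
    """Shift the center of a (cx, cy, w, h) box by a fraction in [lo, hi]
    of half its width/height, in a random direction per axis."""
    cx, cy, w, h = box
    dx = rng.choice((-1.0, 1.0)) * rng.uniform(lo, hi) * w / 2
    dy = rng.choice((-1.0, 1.0)) * rng.uniform(lo, hi) * h / 2
    return (cx + dx, cy + dy, w, h)


def make_cdn_group(gt_boxes, gt_labels, lambda1=0.4, lambda2=0.8, seed=0):
    """Build one positive and one negative denoising query per ground-truth box."""
    rng = random.Random(seed)
    queries = []
    for box, label in zip(gt_boxes, gt_labels):
        # Positive: small noise, supervised to reconstruct the ground truth.
        queries.append({
            "box": noise_box(box, 0.0, lambda1, rng),
            "target_box": box,
            "target_label": label,
        })
        # Negative: larger noise near the same object, supervised as "no object".
        queries.append({
            "box": noise_box(box, lambda1, lambda2, rng),
            "target_box": None,
            "target_label": NO_OBJECT,
        })
    return queries


group = make_cdn_group([(0.5, 0.5, 0.2, 0.4)], [3])
```

Because each target is fixed by construction, these queries need no bipartite matching. The negatives sit close to real objects, which is what makes them hard negatives.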
5.2 Mixed query selection
In most DETR models, queries are learned from the dataset and are independent of the input image. To initialize decoder queries better, Deformable DETR [4] proposed predicting classes and boxes from the encoder's dense features and selecting meaningful ones from these dense predictions to initialize the decoder queries.
However, this approach was not widely adopted in later work. We made some improvements to it and re-emphasize its importance. Within a query, what we actually care about more is the positional query, that is, the box. Meanwhile, features selected from the encoder output are not the best choice of content query for detection: they are coarse and have not been refined, so they may be ambiguous. For the category "person", for example, a selected feature may cover only part of a person, or objects around the person, because it is a grid feature.
We therefore let query selection choose only the positional queries, and keep learnable content queries.
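A minimal sketch of this selection step follows. It assumes the encoder yields one class-score list and one predicted box per token, and that the model owns a fixed set of learnable content embeddings. The function name `mixed_query_selection` and all inputs are hypothetical, and plain Python lists stand in for tensors.

```python
def mixed_query_selection(enc_scores, enc_boxes, learned_content, k):
    """Pick positional queries from the encoder, keep content queries learnable.

    enc_scores:      per-token class scores from the encoder's dense prediction
    enc_boxes:       per-token predicted boxes (cx, cy, w, h)
    learned_content: image-independent learnable content embeddings
    k:               number of decoder queries
    """
    if k > len(learned_content) or k > len(enc_boxes):
        raise ValueError("k exceeds the number of available queries or tokens")
    # Rank encoder tokens by their most confident class score.
    ranked = sorted(range(len(enc_scores)),
                    key=lambda i: max(enc_scores[i]), reverse=True)
    top_idx = ranked[:k]
    # Positional queries: boxes of the top-k tokens (image dependent).
    positional = [enc_boxes[i] for i in top_idx]
    # Content queries: the learned embeddings, NOT the selected encoder features.
    content = [list(vec) for vec in learned_content[:k]]
    return top_idx, positional, content


# Toy inputs: four encoder tokens, two classes, two decoder queries.
scores = [[0.1, 0.2], [0.9, 0.05], [0.3, 0.6], [0.05, 0.01]]
boxes = [(0.1, 0.1, 0.2, 0.2), (0.5, 0.5, 0.3, 0.3),
         (0.7, 0.2, 0.1, 0.4), (0.9, 0.9, 0.05, 0.05)]
idx, positional, content = mixed_query_selection(
    scores, boxes, [[1.0, 0.0], [0.0, 1.0]], k=2)
```

The point of the design is visible in the return values: `positional` changes with the image, while `content` is the same learned set for every image.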
5.3 Look forward twice
This method optimizes gradient propagation in the decoder. We will not go into it here; please refer to our paper for details.

Summary
We hope DINO offers some inspiration. It combines SOTA performance, the simplicity of end-to-end optimization, fast convergence, and fast training and inference.
We also hope that more people will use DETR-type detectors and recognize that they are not just a novel method but also deliver robust performance.

References

[1] HTC https://arxiv.org/abs/1901.07518
[2] DAB-DETR https://arxiv.org/abs/2201.12329
[3] DN-DETR https://arxiv.org/pdf/2203.01305.pdf
[4] Deformable DETR https://arxiv.org/abs/2010.04159


