GWD: Rotated Object Detection Based on Gaussian Wasserstein Distance | ICML 2021
2022-06-29 22:45:00 【VincentLee】
This paper analyzes the main problems of rotated object detection in detail, models the rotated regression target as a Gaussian distribution, and trains with the Wasserstein distance between Gaussian distributions. Many conventional object detection methods also recast regression as a probability distribution, and this paper reaches a similar result by a different route. It is worth reading.
Paper: Rethinking Rotated Object Detection with Gaussian Wasserstein Distance Loss
- Paper: https://arxiv.org/abs/2101.11952
- Code: https://github.com/yangxue0827/RotationDetection
Introduction
Objects with arbitrary orientation are everywhere in detection datasets, yet compared with horizontal object detection, rotated object detection is still in its infancy. Most current SOTA work focuses on regressing the rotation angle of the object, and regressing the angle brings new problems: i) the metric is inconsistent with the loss; ii) the regression range of the rotation angle is discontinuous at its boundary; iii) the square-like problem. None of these problems has a good solution so far, which significantly hurts model performance, especially when the angle lies at the boundary of its range.
To address these problems, the paper proposes the GWD method. It first models the rotated object as a two-dimensional Gaussian distribution, and then uses the Gaussian Wasserstein Distance (GWD) in place of the non-differentiable rotated IoU. The loss is computed from the GWD, which aligns model training with the evaluation metric.
The main contributions of the paper are as follows:
- It summarizes three main problems of rotated object detection.
- It uses the Gaussian Wasserstein Distance (GWD) to describe the distance between rotated bboxes, and uses GWD in place of IoU to compute a loss that is differentiable.
- The GWD-based loss resolves the boundary discontinuity of the rotation angle and the square-like problem, and places no requirement on how the bbox is defined.
- In tests on multiple public datasets, the proposed method performs well.
Rotated Object Regression Detector Revisit
Bounding Box Definition
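Figure 2 shows two ways to define a rotated bbox: the OpenCV form $D_{oc}$ and the long-edge form $D_{le}$. In the former, the angle is the one between the side $h_{oc}$ and the horizontal axis, with $\theta\in[-90^{\circ},0^{\circ})$. In the latter, it is the angle between the long side and the horizontal axis, with $\theta\in[-90^{\circ},90^{\circ})$. The two definitions can be converted into each other (ignoring the center point):

$$D_{le}(h_{le},w_{le},\theta_{le})=\begin{cases}D_{oc}(h_{oc},w_{oc},\theta_{oc}), & h_{oc}\geq w_{oc}\\ D_{oc}(w_{oc},h_{oc},\theta_{oc}+90^{\circ}), & \text{otherwise}\end{cases}$$

$$D_{oc}(h_{oc},w_{oc},\theta_{oc})=\begin{cases}D_{le}(h_{le},w_{le},\theta_{le}), & \theta_{le}\in[-90^{\circ},0^{\circ})\\ D_{le}(w_{le},h_{le},\theta_{le}-90^{\circ}), & \text{otherwise}\end{cases}$$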
The main difference between the two representations lies in the order of the edges and the angle in $(h,w,\theta)$: to express the same bbox in the other form, you may need to swap the two edges or add or subtract $90^{\circ}$ from the angle. Many current studies tie the model design to a particular bbox definition in order to avoid a specific problem. For example, $D_{oc}$ avoids the square-like problem, while $D_{le}$ avoids the edge-exchange problem.
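The conversion between the two definitions can be sketched in a few lines of Python. This is only an illustration: the function names `oc_to_le` and `le_to_oc` are hypothetical, angles are assumed to be in degrees, and the first size argument is assumed to be the side the angle is measured against. The rule used is "swap the two sides and shift the angle by 90° whenever the box falls outside the target definition".

```python
def oc_to_le(h_oc, w_oc, theta_oc):
    """OpenCV form -> long-edge form (hypothetical helper, angles in degrees)."""
    if h_oc >= w_oc:
        # The reference side is already the long side: nothing to change.
        return h_oc, w_oc, theta_oc
    # Otherwise swap the sides and rotate the reference by +90 degrees.
    return w_oc, h_oc, theta_oc + 90.0


def le_to_oc(h_le, w_le, theta_le):
    """Long-edge form -> OpenCV form (hypothetical helper, angles in degrees)."""
    if -90.0 <= theta_le < 0.0:
        # The angle is already inside the OpenCV range [-90, 0).
        return h_le, w_le, theta_le
    # Otherwise swap the sides and rotate the reference by -90 degrees.
    return w_le, h_le, theta_le - 90.0


# The GT box used later in the article: sides (10, 70) at -25 degrees in OpenCV form.
print(oc_to_le(10, 70, -25))
print(le_to_oc(70, 10, 65))
```

The example box is the one that appears in the boundary discussion: its OpenCV angle of -25° corresponds to 65° in the long-edge form.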
Inconsistency between Metric and Loss
IoU is an important evaluation metric in detection, but the regression loss actually used in training (such as $l_n$-norms) is often inconsistent with it: a smaller loss does not necessarily mean higher performance. In horizontal object detection, some measures have already been taken to reduce this inconsistency, such as DIoU and GIoU. In rotated object detection, the added angle regression makes the inconsistency more prominent, and there is still no good solution. The paper lists some examples comparing IoU loss and smooth L1 loss:
- Case 1: relationship between the angle difference and the loss. Both curves are monotonic, but only the smooth L1 curve is convex, so it can be optimized to the global optimum.
- Case 2: relationship between the aspect-ratio difference and the loss. The smooth L1 loss stays constant (it comes mainly from the angle difference), while the IoU loss changes sharply along the horizontal axis.
- Case 3: effect of the center-point offset on the loss. Both curves are monotonic, but the two are not highly consistent.
From this analysis, IoU loss bridges the gap between the evaluation criterion and the regression loss better in rotated object detection. Unfortunately, the IoU between two rotated bboxes is not differentiable and cannot be used for training. The paper therefore proposes a differentiable loss based on the Wasserstein distance to replace IoU loss. As a side benefit, it also resolves the boundary discontinuity of the rotation angle and the square-like problem.
Boundary Discontinuity and Square-Like Problem
Cases 1-2 in the paper's figure summarize the boundary discontinuity of rotation angle regression. Take Case 2 in OpenCV form as an example: for the anchor $(0,0,70,10,-90^{\circ})$ and the GT $(0,0,10,70,-25^{\circ})$, there are two ways to regress:
- way1 rotates the anchor counterclockwise by a small angle, giving the prediction $(0,0,70,10,-115^{\circ})$. However, because of the periodicity of angle (PoA) and the exchangeability of edges (EoE), this result incurs a very large smooth L1 loss against the GT. In addition, the angle falls outside the predefined angle range.
- way2 has to scale the width and height at the same time and rotate clockwise by a large angle.
These problems usually occur when the angles of the anchor and the GT lie at the boundary of the angle range. When they are not at the boundary, way1 does not produce a large loss. For smooth L1, the optimal handling of boundary and non-boundary angles is therefore inconsistent, which hinders training.
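A small numeric sketch makes the boundary effect concrete. It evaluates a parameter-wise smooth L1 loss for the way1 prediction, which the text describes as landing on the GT. The encoding of the differences (log ratios for the two sides, radians for the angle, `beta = 1`) and the helper names are assumptions made for illustration, not the exact loss of any particular detector.

```python
import math


def smooth_l1(diff, beta=1.0):
    """Smooth L1 penalty on a single difference."""
    a = abs(diff)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def param_loss(pred, target):
    """Sum of smooth L1 terms over (x, y, first side, second side, angle in degrees)."""
    px, py, ps1, ps2, p_theta = pred
    tx, ty, ts1, ts2, t_theta = target
    diffs = [
        px - tx,
        py - ty,
        math.log(ps1 / ts1),              # assumed log-ratio encoding of the sides
        math.log(ps2 / ts2),
        math.radians(p_theta - t_theta),  # assumed angle difference in radians
    ]
    return sum(smooth_l1(d) for d in diffs)


gt_box = (0, 0, 10, 70, -25)      # GT in OpenCV form
way1_pred = (0, 0, 70, 10, -115)  # way1 prediction from the text

boundary_loss = param_loss(way1_pred, gt_box)
perfect_loss = param_loss(gt_box, gt_box)
print(boundary_loss, perfect_loss)
```

Even though the way1 prediction is described as covering the GT, the swapped sides and the 90° angle gap each contribute a large term, while a prediction with identical parameters gives zero loss.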
The square-like problem mainly occurs in detection methods that use the long-edge form. A square-like object has no absolute long side, so its long-edge representation is not unique. Take Case 3 as an example, with the anchor $(0,0,45,44,0^{\circ})$ and the GT $(0,0,45,43,-60^{\circ})$. way1 can rotate clockwise by a small angle to reach $(0,0,45,43,30^{\circ})$, a position that coincides with the GT. However, because the angle difference is large, way1 incurs a high regression loss, so the model has to regress as in way2 and rotate counterclockwise by a large angle. The main cause of the square-like problem is not the PoA and EoE mentioned above, but the inconsistency between the metric and the loss.
The Proposed Method
Based on the analysis above, the paper wants the regression loss of a new rotated object detection method to satisfy the following requirements:
- Requirement 1: highly consistent with the IoU metric.
- Requirement 2: differentiable, allowing direct learning.
- Requirement 3: smooth at the boundary of the angle regression range.
Wasserstein Distance for Rotating Box
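Most IoU-based losses can be regarded as distance functions. Building on this, the paper proposes a new regression loss based on the Wasserstein distance. First, the rotated bbox $\mathcal{B}(x,y,h,w,\theta)$ is converted into a 2-D Gaussian distribution $\mathcal{N}(m,\Sigma)$:

$$\Sigma^{1/2}=RSR^{\top}=\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}\frac{w}{2} & 0\\ 0 & \frac{h}{2}\end{pmatrix}\begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix},\qquad m=(x,y)^{\top}$$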
$R$ is the rotation matrix and $S$ is the diagonal matrix of eigenvalues. For any two probability measures $\mu$ and $\upsilon$ on $\mathbb{R}^n$, the Wasserstein distance $W$ can be expressed as:

$$W(\mu;\upsilon):=\inf\left(\mathbb{E}\,\|X-Y\|_2^2\right)^{1/2}$$
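The infimum in Eq. 2 is taken over all pairs of random vectors $(X,Y)\in\mathbb{R}^n\times\mathbb{R}^n$ with $X\sim\mu$ and $Y\sim\upsilon$. Substituting Gaussian distributions, $d:=W(\mathcal{N}(m_1,\Sigma_1);\mathcal{N}(m_2,\Sigma_2))$, gives:

$$d^2=\|m_1-m_2\|_2^2+\mathrm{Tr}\left(\Sigma_1+\Sigma_2-2\left(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\right)^{1/2}\right)$$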
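Note in particular that:

$$\mathrm{Tr}\left(\left(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\right)^{1/2}\right)=\mathrm{Tr}\left(\left(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2}\right)^{1/2}\right)$$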
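The squared distance between two rotated boxes can be sketched with NumPy as follows. This is an illustrative sketch, not the code from the paper's repository: the helper names are hypothetical, boxes are assumed to be given as `(x, y, w, h, theta)` with the angle in degrees, and the matrix square root is computed through an eigendecomposition of a symmetric positive semi-definite matrix.

```python
import numpy as np


def box_to_gaussian(x, y, w, h, theta_deg):
    """Map a rotated box to the mean and covariance of a 2-D Gaussian."""
    t = np.deg2rad(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    S = np.diag([w / 2.0, h / 2.0])
    sigma_sqrt = R @ S @ R.T  # this is Sigma^(1/2)
    return np.array([x, y], dtype=float), sigma_sqrt @ sigma_sqrt


def sqrtm_psd(A):
    """Square root of a symmetric positive semi-definite matrix."""
    A = (A + A.T) / 2.0  # remove tiny asymmetries from rounding
    vals, vecs = np.linalg.eigh(A)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T


def gwd2(box1, box2):
    """Squared Wasserstein distance between the Gaussians of two boxes."""
    m1, s1 = box_to_gaussian(*box1)
    m2, s2 = box_to_gaussian(*box2)
    s1_sqrt = sqrtm_psd(s1)
    cross = sqrtm_psd(s1_sqrt @ s2 @ s1_sqrt)
    return float(np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross))


a = (0, 0, 70, 10, 65)
b = (5, -3, 40, 20, 10)
print(gwd2(a, b), gwd2(b, a))
```

By the trace identity, swapping the two boxes should leave the distance unchanged up to floating-point error, which the two printed values let you check.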
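In the commutative case $\Sigma_1\Sigma_2=\Sigma_2\Sigma_1$ (horizontal object detection), Eq. 3 becomes:

$$d^2=\|m_1-m_2\|_2^2+\left\|\Sigma_1^{1/2}-\Sigma_2^{1/2}\right\|_F^2=(x_1-x_2)^2+(y_1-y_2)^2+\frac{(w_1-w_2)^2+(h_1-h_2)^2}{4}$$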
$\|\cdot\|_F$ is the Frobenius norm. Here both bboxes are horizontal, and Eq. 5 is approximately the $l_2$-norm loss. This shows that the Wasserstein distance is consistent with the loss commonly used in horizontal detection and can serve as a regression loss. The derivation is fairly involved; interested readers can consult the references.
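As a worked check of the horizontal case, the sketch below compares the Frobenius-norm form with the expanded sum of squares. It assumes horizontal boxes given as `(x, y, w, h)` with $\Sigma^{1/2}=\mathrm{diag}(w/2,h/2)$; the function names are made up for this example.

```python
import numpy as np


def gwd2_horizontal(box1, box2):
    """Squared distance via means and the Frobenius norm of Sigma^(1/2) differences."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box1, box2
    m1 = np.array([x1, y1], dtype=float)
    m2 = np.array([x2, y2], dtype=float)
    s1_sqrt = np.diag([w1 / 2.0, h1 / 2.0])
    s2_sqrt = np.diag([w2 / 2.0, h2 / 2.0])
    return float(np.sum((m1 - m2) ** 2) + np.linalg.norm(s1_sqrt - s2_sqrt, "fro") ** 2)


def l2_closed_form(box1, box2):
    """The same quantity written out as a plain sum of squared differences."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box1, box2
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 + ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4.0


box_a = (0, 0, 4, 2)
box_b = (1, 2, 6, 4)
print(gwd2_horizontal(box_a, box_b), l2_closed_form(box_a, box_b))
```

For these two boxes the center term is $1+4=5$ and the size term is $(4+4)/4=2$, so both expressions should agree at 7.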
Gaussian Wasserstein Distance Regression Loss
The paper applies a nonlinear function $f$ to map the GWD to $\frac{1}{\tau+f(d^2)}$, which yields a loss function similar to IoU loss:

$$L_{gwd}=1-\frac{1}{\tau+f(d^2)},\qquad \tau\geq 1$$
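The mapping from the squared distance to a bounded loss can be sketched as below, taking the loss as one minus the mapped value. The choices `f = sqrt` and `f = log1p` and the default `tau = 1.0` are assumptions picked for illustration; the text only says that $f$ is a nonlinear function.

```python
import math


def gwd_loss(d2, tau=1.0, f=math.sqrt):
    """IoU-like loss built from the squared distance d2: 1 - 1 / (tau + f(d2))."""
    return 1.0 - 1.0 / (tau + f(d2))


# Compare two candidate nonlinear functions on a few distances.
for d2 in (0.0, 1.0, 100.0):
    print(d2, gwd_loss(d2), gwd_loss(d2, f=math.log1p))
```

With `tau = 1` the loss is zero when the two Gaussians coincide and approaches one as the distance grows, which mirrors the range of an IoU loss.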
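The earlier figure also shows the loss curves under different nonlinear functions $f$. Eq. 6 is very close to the IoU loss curve and can also measure non-overlapping bboxes, so it clearly satisfies Requirement 1 and Requirement 2. To analyze Requirement 3, first consider the properties of Eq. 1:

$$\text{Property 1: }\Sigma^{1/2}(w,h,\theta)=\Sigma^{1/2}(h,w,\theta-\tfrac{\pi}{2})$$

$$\text{Property 2: }\Sigma^{1/2}(w,h,\theta)=\Sigma^{1/2}(w,h,\theta-\pi)$$

$$\text{Property 3: }\Sigma^{1/2}(w,h,\theta)\approx\Sigma^{1/2}(w,h,\theta-\tfrac{\pi}{2})\ \text{ if } w\approx h$$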
By Property 1, the GWD loss is the same for the OpenCV form and the long-edge form, so the model does not need to be trained with one fixed bbox definition. Take way1 of Case 2 as an example: the GT $(0,0,70,10,65^{\circ})$ and the prediction $(0,0,70,10,-115^{\circ})$ have the same mean $m$ and covariance $\Sigma$, so the GWD loss does not output a large value. Likewise, by Property 2 and Property 3, way1 in Case 2 and Case 3 does not produce a large loss, so the GWD loss also satisfies Requirement 3.
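These invariances are easy to check numerically on the boxes from Case 2 and Case 3. The sketch assumes $\Sigma^{1/2}=R\,\mathrm{diag}(w/2,h/2)\,R^{\top}$ with the angle in degrees; `sigma_sqrt` is a hypothetical helper name.

```python
import numpy as np


def sigma_sqrt(w, h, theta_deg):
    """Sigma^(1/2) of the Gaussian assigned to a box with sides (w, h) at theta degrees."""
    t = np.deg2rad(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return R @ np.diag([w / 2.0, h / 2.0]) @ R.T


# Swapping the sides and shifting the angle by 90 degrees (long-edge vs OpenCV form).
prop1 = np.allclose(sigma_sqrt(70, 10, 65), sigma_sqrt(10, 70, -25))
# Shifting the angle by 180 degrees (Case 2, way1).
prop2 = np.allclose(sigma_sqrt(70, 10, 65), sigma_sqrt(70, 10, -115))
# Near-square box rotated by 90 degrees (Case 3): largest entry-wise gap.
gap3 = np.abs(sigma_sqrt(45, 43, 30) - sigma_sqrt(45, 43, -60)).max()
print(prop1, prop2, gap3)
```

The first two comparisons are exact up to rounding, while the third leaves a gap on the order of $|w-h|/2$, small compared with the side lengths of about 45.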
Overall, GWD brings the following advantages to rotated object detection:
- GWD makes the different bbox definitions equivalent, so the model can focus on improving accuracy without worrying about the definition in use.
- GWD is a differentiable alternative to IoU loss and is highly consistent with the detection metric. It can also measure the distance between non-overlapping bboxes, a property it shares with GIoU and DIoU.
- GWD avoids the boundary discontinuity of the rotation angle and the square-like problem, which makes the model easier to train.
Overall Loss Function Design
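The paper uses RetinaNet as the base detector, with the bbox expressed as $(x,y,w,h,\theta)$. The experiments mainly use the OpenCV form, and the regression targets are defined as:

$$t_x=(x-x_a)/w_a,\quad t_y=(y-y_a)/h_a,\quad t_w=\log(w/w_a),\quad t_h=\log(h/h_a),\quad t_\theta=\theta-\theta_a$$

$$t_x^{*}=(x^{*}-x_a)/w_a,\quad t_y^{*}=(y^{*}-y_a)/h_a,\quad t_w^{*}=\log(w^{*}/w_a),\quad t_h^{*}=\log(h^{*}/h_a),\quad t_\theta^{*}=\theta^{*}-\theta_a$$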
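The variables $x$, $x_a$, $x^{*}$ denote the GT, the anchor, and the prediction, respectively (likewise for $y$, $w$, $h$, $\theta$). The final multi-task loss is:

$$L=\frac{\lambda_1}{N}\sum_{n=1}^{N}obj_n\cdot L_{gwd}(b_n,gt_n)+\frac{\lambda_2}{N}\sum_{n=1}^{N}L_{cls}(p_n,t_n)$$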
$N$ is the number of anchors, $obj_n$ is a binary indicator of foreground or background, $b_n$ is the predicted bbox, $gt_n$ is the GT bbox, $t_n$ is the GT label, $p_n$ is the predicted label, $\lambda_1=1$ and $\lambda_2=2$ are hyperparameters, and $L_{cls}$ is the focal loss.
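The box encoding and the weighting of the two loss terms can be sketched as below. This is a sketch under assumptions: offsets are taken to be normalized by the anchor size, sizes are encoded as log ratios, the angle as a plain difference, and the per-anchor regression and classification losses are assumed to be already computed. All function names are hypothetical.

```python
import math


def encode(box, anchor):
    """Regression targets of a box (x, y, w, h, theta) relative to an anchor."""
    x, y, w, h, theta = box
    xa, ya, wa, ha, theta_a = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha), theta - theta_a)


def decode(t, anchor):
    """Inverse of encode: recover the box from targets and the anchor."""
    tx, ty, tw, th, tt = t
    xa, ya, wa, ha, theta_a = anchor
    return (tx * wa + xa, ty * ha + ya,
            wa * math.exp(tw), ha * math.exp(th), tt + theta_a)


def total_loss(reg_losses, cls_losses, is_foreground, lambda1=1.0, lambda2=2.0):
    """Weighted sum: regression on foreground anchors only, classification on all."""
    n = len(cls_losses)
    reg = sum(l for l, fg in zip(reg_losses, is_foreground) if fg)
    return lambda1 * reg / n + lambda2 * sum(cls_losses) / n


anchor = (0.0, 0.0, 70.0, 10.0, -90.0)
gt_box = (2.0, -1.0, 10.0, 70.0, -25.0)
print(decode(encode(gt_box, anchor), anchor))
print(total_loss([0.5, 0.9], [0.1, 0.3], [True, False]))
```

In the second call only the first anchor is foreground, so its regression loss is the only one that counts, while both classification losses are averaged.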
Experiments
Comparison with other solutions to the specific problems.
Comparison of multiple models on the DOTA dataset. The paper contains many more experiments; interested readers can take a look.
Conclusion
This paper analyzes the main problems of rotated object detection in detail, models the rotated regression target as a Gaussian distribution, and trains with the Wasserstein distance between Gaussian distributions. Many conventional object detection methods also recast regression as a probability distribution, and this paper reaches a similar result by a different route. It is worth reading.
Reference Content
- Wasserstein distance between two Gaussians - https://djalil.chafai.net/blog/2010/04/30/wasserstein-distance-between-two-gaussians/
If this article helped you, please give it a like or a follow ~