Focal Loss Explanation
2022-07-28 01:12:00 【@BangBang】
1. Overview
Paper: Focal Loss for Dense Object Detection
There is a lot of controversy about Focal Loss online: some people think it is useful, others think it is not.
In YOLOv3, the author tried Focal Loss but found that mAP dropped by about 2 points after using it, which also left him puzzled.

The original paper gives a set of ablations over the Focal Loss parameters. In the first row, $\gamma = 0$ (i.e., Focal Loss is not used) gives an AP of 31.1, while with Focal Loss the AP reaches 34.0, an improvement of about 3 points, which is quite significant.
The paper's author notes that Focal Loss is mainly aimed at one-stage object detectors, which face the class imbalance problem, i.e., the imbalance between positive and negative samples. In an image, the candidate boxes that match a ground-truth object (positive samples) usually number only a dozen or a few dozen, while the unmatched candidate boxes (negative samples) number roughly $10^4$ to $10^5$.
As the figure above shows, the red boxes do not match any object, while the yellow box does. So when positive and negative samples are matched, most candidate boxes are unmatched, i.e., negative samples; positive samples are actually very few.
A natural question arises here: why do we never hear about the class imbalance problem for two-stage networks?
- My personal understanding: a two-stage detector works in two steps, and class imbalance certainly also exists in the first step. However, the final result (the target's final coordinates and whether it is a target at all) is determined by the second-stage detector. The first stage, e.g. the RPN in Faster R-CNN, ultimately passes only about 2000 proposals to the second-stage network. Besides keeping relatively high-quality boxes, this also raises the probability that a proposal is a positive sample, whereas a one-stage detector faces tens or even hundreds of thousands of candidates. The second stage of a two-stage detector still suffers from positive/negative imbalance, but far less than a one-stage detector, which is why the paper proposes Focal Loss mainly for one-stage networks.
- Among the $10^4$ to $10^5$ unmatched candidate boxes of a one-stage detector, most are easy negative samples (they contribute little to training, but in such numbers they drown out the small number of samples that are actually useful for training). So training directly on all samples gives poor results.
- Earlier one-stage networks also screened positive and negative samples, namely hard negative mining: instead of using all negative samples to train the network, only the negatives that contribute the most to the loss are selected, which indeed achieves better results.
In the table above, the author also makes a series of comparisons. The first rows of the table use the hard negative mining method to select positive and negative samples. But if we use Focal Loss directly, we find that it still works very well: relative to hard negative mining, AP improves by about 3 points. Why Focal Loss works better is explained in the theory below.
2. Focal Loss
The paper states that Focal Loss is designed to address the extreme imbalance between foreground and background samples in one-stage object detection (for example, 1:1000). For the binary cross-entropy loss, the formula is:

$CE(p, y) = -\log(p)$ if $y = 1$, and $-\log(1-p)$ otherwise.

To simplify, we define $p_t$:

$p_t = p$ if $y = 1$, and $p_t = 1 - p$ otherwise.

Then:

$CE(p, y) = CE(p_t) = -\log(p_t)$
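As a quick sanity check of the $p_t$ notation, here is a minimal PyTorch sketch (my own illustration, not code from the paper) showing that the piecewise binary cross-entropy and the $-\log(p_t)$ form give the same values:

```python
import torch

# Toy predictions and labels, chosen only for illustration
p = torch.tensor([0.9, 0.3, 0.7, 0.1])   # predicted probability of the positive class
y = torch.tensor([1.0, 1.0, 0.0, 0.0])   # ground-truth labels

# Piecewise binary cross-entropy: -log(p) for positives, -log(1-p) for negatives
ce_piecewise = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))

# The same loss written through p_t
p_t = torch.where(y == 1, p, 1 - p)
ce_pt = -torch.log(p_t)

print(torch.allclose(ce_piecewise, ce_pt))  # True
```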
Balanced cross-entropy
A common way to deal with sample imbalance is to introduce a weighting factor $\alpha \in [0, 1]$: for a positive sample ($y = 1$) the weight is $\alpha$, and for a negative sample it is $1 - \alpha$. Writing this weight as $\alpha_t$ (by analogy with $p_t$), the $\alpha$-weighted loss is:

$CE(p_t) = -\alpha_t \log(p_t)$
In the figure, the author runs an experiment and finds that $\alpha = 0.75$ works best. This shows that $\alpha$ is not simply the ratio of positive to negative samples, because that ratio is tiny (perhaps 1:1000), not 0.75.
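Continuing the toy example above, a minimal sketch of the $\alpha$-balanced cross-entropy (the value 0.75 is simply taken from the experiment mentioned here):

```python
import torch

alpha = 0.75  # weight for positive samples; negatives get 1 - alpha

p = torch.tensor([0.9, 0.3, 0.7, 0.1])
y = torch.tensor([1.0, 1.0, 0.0, 0.0])

p_t = torch.where(y == 1, p, 1 - p)
alpha_t = torch.where(y == 1, torch.full_like(p, alpha), torch.full_like(p, 1 - alpha))

balanced_ce = -alpha_t * torch.log(p_t)  # alpha-balanced cross-entropy per sample
print(balanced_ce)
```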
Focal Loss Definition
$\alpha$ balances the weight of positive and negative samples, but it does not distinguish easy samples from hard samples. So the author improves the loss function so that it down-weights easy samples and lets training focus on hard negatives (negative samples that are difficult to classify). The author introduces a new modulating factor $(1 - p_t)^\gamma$, and Focal Loss is defined as:

$FL(p_t) = -(1 - p_t)^\gamma \log(p_t)$

The factor $(1 - p_t)^\gamma$ reduces the loss contribution of easy samples.
When $\gamma = 0$, this reduces to the original $CE(p_t) = -\log(p_t)$, corresponding to the blue curve in the figure.
The horizontal axis in the figure is $p_t$. For a positive sample, $p_t = p$, and we want $p$ to be as large as possible: the higher the probability, the more accurate the prediction. For a negative sample, we want $p$ to be as small as possible, i.e., $1 - p$ as large as possible. So for both positive and negative samples, we want $p_t$ to be as large as possible.
When $p_t$ lies in the interval $[0.6, 1]$, the sample is already well classified; such easy samples do not need much weight. From the figure we can see that for $\gamma > 0$ (e.g. $\gamma = 1, 2, 5$), the loss drops faster and faster as $p_t$ grows, and the larger $\gamma$ is, the smaller the weight of easy samples.
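To make the down-weighting concrete, here is a small numeric sketch (the $p_t$ values are made up for illustration): with $\gamma = 2$, a well-classified sample with $p_t = 0.9$ gets the factor $(1 - 0.9)^2 = 0.01$, so its loss shrinks 100-fold, while a hard sample with $p_t = 0.1$ gets the factor $0.81$ and is almost unaffected:

```python
import math

gamma = 2.0
for p_t in (0.1, 0.5, 0.9, 0.99):
    factor = (1 - p_t) ** gamma   # modulating factor (1 - p_t)^gamma
    ce = -math.log(p_t)           # plain cross-entropy -log(p_t)
    fl = factor * ce              # focal loss (alpha omitted)
    print(f"p_t={p_t:.2f}  factor={factor:.4f}  CE={ce:.3f}  FL={fl:.5f}")
```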
In practice, Focal Loss also uses $\alpha$ to balance the samples. The final Focal Loss is:

$FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$

Expanding $p_t$, Focal Loss takes the following form:

$FL(p) = -\alpha (1 - p)^\gamma \log(p)$ if $y = 1$, and $-(1 - \alpha)\, p^\gamma \log(1 - p)$ otherwise.

With Focal Loss, training focuses more on the hard samples; for easy samples, Focal Loss reduces their loss weight.
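Putting the pieces together, here is a minimal sketch of a binary focal loss in PyTorch, written directly from the formulas above (not the paper's reference code); the function name and the defaults `gamma=2.0`, `alpha=0.25` are illustrative choices:

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25, reduction="mean"):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    logits:  raw scores of shape (N,), before the sigmoid
    targets: ground-truth labels of shape (N,), values 0.0 or 1.0
    """
    p = torch.sigmoid(logits)
    # -log(p_t), computed in a numerically stable way from the logits
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # p for positives, 1 - p for negatives
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # alpha for positives, 1 - alpha for negatives
    loss = alpha_t * (1 - p_t) ** gamma * ce                 # down-weight the easy samples
    if reduction == "mean":
        return loss.mean()
    if reduction == "sum":
        return loss.sum()
    return loss

# Tiny usage example with made-up logits and labels
logits = torch.tensor([2.0, -1.0, 0.5, -3.0])
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(binary_focal_loss(logits, targets))
```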
When using Focal Loss, try to make sure the training set is labeled correctly. If a sample is mislabeled, it will inevitably look like a hard sample, and Focal Loss will keep trying to fit these wrongly labeled samples, so the model gets worse and worse.