"Parse" focalloss to solve the problem of data imbalance
2022-07-07 06:18:00 【ViatorSun】
Focal Loss was proposed mainly to address the classification branch of anchor-based (one-stage) object detection networks. It was later also widely adopted in instance segmentation.
Note
What is discussed here is the classification branch of an object detection network, not a plain classification problem; the two are different. In a plain classification problem, an image must belong to some class, whereas in the classification within a detection task, a large number of anchors contain no object at all (these can be called negative samples).
Classification task
For an ordinary K-class classification task, the label is a length-K vector encoded one-hot (possibly plus label smoothing, which we ignore here), so the final label looks like [1, …, 0, …, 0]. If we also want to represent background, it is natural to add one extra dimension: if the detection task has K classes, use K+1 dimensions for classification, with the extra dimension meaning "no object". For a classification task, the output is usually normalized with softmax so that the outputs of all categories sum to 1.
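As a minimal sketch of this labeling scheme (the tensor names here are ours, purely for illustration):

```python
import torch
import torch.nn.functional as F

K = 3                                 # number of object classes
labels = torch.tensor([0, 2, 3])      # class indices; index K (= 3) means "no object"
one_hot = F.one_hot(labels, num_classes=K + 1).float()
# one_hot[2] == [0., 0., 0., 1.]      -> the background row

logits = torch.randn(3, K + 1)        # a (K+1)-way softmax head
probs = F.softmax(logits, dim=-1)     # outputs over all categories sum to 1
```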
But in a detection task, for an anchor with no object we do not want the outputs to sum to 1; we want every class probability to be 0. This can be achieved by treating the multi-class task as multiple binary tasks (sigmoid): for each class, the network outputs one probability, where a value close to 0 means "not this class" and a value close to 1 means this anchor belongs to this class.
So the network output is not normalized with softmax; instead, sigmoid is applied to each component of the length-K vector, so that each output value represents a binary-classification probability. For an anchor with no object, every component of the ground truth is 0, i.e., the probability of belonging to each class is 0, which marks it as background.
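A minimal sketch of this target construction (the convention that `-1` marks a background anchor is our assumption for illustration):

```python
import torch
import torch.nn.functional as F

K = 3                                     # object classes only, no background slot
anchor_labels = torch.tensor([1, -1, 0])  # -1: anchor with no object (negative sample)

targets = torch.zeros(len(anchor_labels), K)
fg = anchor_labels >= 0                   # foreground anchors
targets[fg] = F.one_hot(anchor_labels[fg], num_classes=K).float()
# targets[1] == [0., 0., 0.]              -> background: every class probability is 0

probs = torch.sigmoid(torch.randn(3, K))  # K independent binary probabilities per anchor
```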
Thus, the problem Focal Loss addresses is not one multi-class problem, but multiple binary classification problems.
Formula analysis
First look at the formulas. The plain cross-entropy term $-\log(p)$ is written for the label $y = 1$; to cover both cases we use $p_t$, the probability the model assigns to the correct class:

$$p_t = \mathrm{pred\_sigmoid} \cdot \mathrm{target} + (1 - \mathrm{pred\_sigmoid}) \cdot (1 - \mathrm{target})$$
$$CE(p_t) = -\alpha_t \log(p_t)$$

$$FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)$$

$$FL(p) = \begin{cases} -\alpha (1-p)^\gamma \log(p), & \text{if } y = 1 \\ -(1-\alpha)\, p^\gamma \log(1-p), & \text{if } y = 0 \end{cases}$$
- Parameter $p$ (third formula): when $p \to 0$ (the predicted probability of the true class is very low, i.e., a hard sample), the modulating factor $(1-p)^\gamma$ is close to 1 and the loss is nearly unaffected; when $p \to 1$, $(1-p)^\gamma$ approaches 0, which scales down the total loss contribution of easy samples.
- Parameter $\gamma$: when $\gamma = 0$, Focal Loss reduces to the ordinary cross entropy; as $\gamma$ increases, the influence of the modulating factor $(1-p_t)^\gamma$ also increases. With a fixed $\gamma$, say $\gamma = 2$: for an easy example ($p > 0.5$) with $p = 0.9$, the loss is 100 times smaller than the standard cross entropy, and at $p = 0.968$ about 1000 times smaller; for a hard example ($p < 0.5$) the loss shrinks by at most 4 times (these ratios are verified numerically in the sketch after this list).
In this way, the relative weight of hard examples rises sharply, increasing the importance of misclassified samples.
Experiments show that $\gamma = 2, \alpha = 0.75$ works best: $\alpha$ adjusts the imbalance between positive and negative samples, while $\gamma$ controls the imbalance between hard and easy samples.
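A quick numeric check of the ratios above (a throwaway script, not from the original post):

```python
import math

gamma = 2
for p in [0.9, 0.968, 0.5]:
    ce = -math.log(p)               # standard cross entropy for a y = 1 sample
    fl = (1 - p) ** gamma * ce      # focal loss without the alpha term
    print(f"p={p:.3f}  CE/FL = {ce / fl:.0f}x")
# p=0.900  CE/FL = 100x
# p=0.968  CE/FL = 977x   (about 1000 times)
# p=0.500  CE/FL = 4x
```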
Code implementation
In the official code there is no line like `target = F.one_hot(target, num_clas)`; this is because `targets` is already expected to be a binary tensor of the same shape as `inputs`, i.e., the one-hot encoding is done upstream:
```python
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
import torch
from torch.nn import functional as F


def sigmoid_focal_loss(inputs: torch.Tensor, targets: torch.Tensor, alpha: float = -1,
                       gamma: float = 2, reduction: str = "none") -> torch.Tensor:
    # inputs: raw logits of shape [N, K]; targets: binary tensor of the same
    # shape (1 for the positive class, 0 otherwise; already one-hot encoded)
    inputs = inputs.float()
    targets = targets.float()
    p = torch.sigmoid(inputs)
    ce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction="none")
    # p_t: probability the model assigns to the true label of each element
    p_t = p * targets + (1 - p) * (1 - targets)
    # modulating factor (1 - p_t)^gamma down-weights easy examples
    loss = ce_loss * ((1 - p_t) ** gamma)
    if alpha >= 0:
        # alpha_t balances positive vs. negative samples
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        loss = alpha_t * loss
    if reduction == "mean":
        loss = loss.mean()
    elif reduction == "sum":
        loss = loss.sum()
    return loss


sigmoid_focal_loss_jit: "torch.jit.ScriptModule" = torch.jit.script(sigmoid_focal_loss)
```
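A quick usage sketch of the function above (shapes and values made up for illustration):

```python
# 4 anchors, 3 classes; the second row is a background anchor (all-zero target)
logits = torch.randn(4, 3)
targets = torch.tensor([[0., 1., 0.],
                        [0., 0., 0.],
                        [1., 0., 0.],
                        [0., 0., 1.]])
loss = sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2, reduction="mean")
print(loss)  # a scalar tensor
```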
Besides, torchvision also ships a focal loss implementation (`torchvision.ops.sigmoid_focal_loss`).
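Its interface is essentially the same (shown here with its default `alpha=0.25`):

```python
import torch
from torchvision.ops import sigmoid_focal_loss

logits = torch.randn(4, 3)
targets = torch.zeros(4, 3)
targets[0, 1] = 1.0  # one positive label; all other anchor/class pairs are negative
loss = sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2, reduction="sum")
```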
Complete code
Official full code: https://github.com/facebookresearch/
References
- https://zhuanlan.zhihu.com/p/391186824