A Brief Discussion of Label Smoothing
2022-07-05 09:06:00 【aelum】
About the author: I switched into programming from a non-CS background and am continuously expanding my technology stack.
Blog home page: https://raelum.blog.csdn.net
Main areas: NLP, RS, GNN
If this article helps you, a follow, like, bookmark, or comment would be the greatest motivation for my writing.
1. From One-Hot to Label Smoothing
Consider the cross-entropy loss of a single sample:

$$H(p,q) = -\sum_{i=1}^{C} p_i \log q_i$$

where $C$ is the number of classes, $p_i$ is the true distribution (i.e., the target), and $q_i$ is the predicted distribution (i.e., the output of the neural network).
If the true distribution is the traditional one-hot vector, every component is either $0$ or $1$. Suppose the $k$-th component is $1$ and the rest are $0$; the cross-entropy loss then reduces to

$$H(p,q) = -\log q_k$$
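As a quick numeric sanity check (plain Python, with an arbitrarily chosen predicted distribution), the full sum collapses to the single $-\log q_k$ term when the target is one-hot:

```python
import math

q = [0.1, 0.2, 0.6, 0.1]  # predicted distribution (e.g., a softmax output)
p = [0, 0, 1, 0]          # one-hot target, true class k = 2

# Full cross entropy: H(p, q) = -sum_i p_i * log(q_i)
H = -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Only the true-class term survives, so H(p, q) = -log(q_k)
print(H, -math.log(q[2]))  # both ≈ 0.5108
```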
Several problems are apparent from this expression:

- The relationship between the true label and the other labels is ignored, so useful knowledge about inter-class similarity cannot be learned;
- One-hot targets tend to make the model overconfident, which easily leads to overfitting and thus degraded generalization;
- Mislabeled samples (i.e., erroneous targets) influence training more strongly;
- One-hot encoding represents ambiguous samples poorly.
These problems can be alleviated with label smoothing, which is also a regularization technique. It modifies the target as follows:

$$p_i := \begin{cases} 1-\epsilon, & i = k \\ \epsilon/(C-1), & i \neq k \end{cases}$$

where $\epsilon$ is a small positive number.
For example, given the original target $[0, 0, 1, 0, 0, 0]$ and $\epsilon = 0.1$, the smoothed target becomes $[0.02, 0.02, 0.9, 0.02, 0.02, 0.02]$.
The original one-hot vector is often called a hard target (or hard label); after label smoothing it is usually called a soft target (or soft label).
2. A Simple Implementation of Label Smoothing
```python
import torch

def label_smoothing(label: torch.Tensor, eps: float) -> torch.Tensor:
    """Smooth a one-hot vector: 1 -> 1 - eps, 0 -> eps / (C - 1)."""
    label = label.clone()  # avoid mutating the caller's tensor in place
    label[label == 1] = 1 - eps
    label[label == 0] = eps / (len(label) - 1)
    return label

a = torch.tensor([0, 0, 1, 0, 0, 0], dtype=torch.float)
print(label_smoothing(a, 0.1))
# tensor([0.0200, 0.0200, 0.9000, 0.0200, 0.0200, 0.0200])
```
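For training, recent PyTorch versions (1.10+) expose label smoothing directly through the `label_smoothing` argument of `nn.CrossEntropyLoss`. Note that PyTorch's variant spreads $\epsilon$ uniformly over all $C$ classes (the target becomes $(1-\epsilon)\cdot\text{one-hot} + \epsilon/C$), which differs slightly from the $\epsilon/(C-1)$ formulation above; the logits below are arbitrary illustrative values:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, 0.3, -1.0, 0.0, 1.0]])  # one sample, C = 6 classes
target = torch.tensor([0])                                 # true class index

# Built-in label smoothing: no need to build soft targets by hand
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
loss = criterion(logits, target)
print(loss.item())
```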
3. Advantages and Disadvantages of Label Smoothing
Advantages:

- It alleviates model overconfidence to some extent and gives the model some robustness to label noise;
- It provides a relationship between the true class and the other classes in the training data (acting somewhat like data augmentation);
- It may improve the model's generalization ability to some extent.
Disadvantages:

- It simply adds uniform noise and cannot reflect the actual relationships between labels, so the improvement to the model is limited, and there is even a risk of underfitting;
- In some cases soft labels do not help us build better neural networks and can underperform hard labels.
4. When to Use Label Smoothing?

- Huge datasets inevitably contain noise (i.e., labeling errors); label smoothing can be added to keep the model from fitting that noise;
- It can be introduced for ambiguous cases (e.g., in a cat-vs-dog classification task, some pictures may look like both a dog and a cat);
- To prevent model overconfidence.