A Brief Look at Label Smoothing
2022-07-05 09:06:00 【aelum】
About the author: a programmer who switched into the field without a formal CS background and is steadily building out a technology stack
Blog home page: https://raelum.blog.csdn.net
Main areas: NLP, RS, GNN

1. From One-Hot to Label Smoothing
Consider the cross-entropy loss of a single sample:
$$H(p,q)=-\sum_{i=1}^C p_i\log q_i$$
where $C$ is the number of classes, $p_i$ is the true distribution (i.e. the target), and $q_i$ is the predicted distribution (i.e. the network's output, the prediction).
If the true distribution is a conventional One-Hot vector, each of its components is either $0$ or $1$. Suppose the $k$-th component is $1$ and the rest are $0$. The cross-entropy loss then reduces to
$$H(p,q)=-\log q_k$$
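As a quick numerical check of this reduction, here is a minimal sketch in plain Python. The prediction `q` and the helper name `cross_entropy` are illustrative assumptions, not part of the original article.

```python
import math


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i); terms with p_i == 0 contribute nothing."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)


# Hypothetical prediction over C = 3 classes; the true class is index 2 (0-based).
q = [0.1, 0.2, 0.7]
p = [0.0, 0.0, 1.0]  # One-Hot target

loss = cross_entropy(p, q)
# Both printed values agree: only the true class's probability enters the loss.
print(loss, -math.log(q[2]))
```

The probabilities assigned to the other classes never appear in the result, which is why a One-Hot target carries no information about them.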
The expression $H(p,q)=-\log q_k$ exposes several problems:
- The relationship between the true label and the other labels is ignored, so some useful knowledge cannot be learned;
- One-Hot targets tend to make the model overconfident (Overconfidence), which easily causes overfitting and in turn degrades generalization;
- Mislabeled samples (i.e. samples whose `target` is wrong) affect training more easily;
- One-Hot targets represent ambiguous samples, those that could plausibly belong to more than one class, poorly.
One way to alleviate these problems is Label Smoothing, which is also a regularization technique. It is defined as follows:
$$p_i:=\begin{cases} 1-\epsilon, & i=k \\ \epsilon/(C-1), & i\neq k \end{cases}$$
where $\epsilon$ is a small positive number.
For example, let the original target be $[0,0,1,0,0,0]$ and take $\epsilon=0.1$. After Label Smoothing the target becomes $[0.02,0.02,0.9,0.02,0.02,0.02]$.
The original One-Hot vector is usually called a Hard Target (or Hard Label), and the smoothed label is usually called a Soft Target (or Soft Label).
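Substituting the smoothed target into the cross-entropy loss shows how the objective changes. This is a short derivation that uses only the definitions above:

$$H(p,q)=-(1-\epsilon)\log q_k-\frac{\epsilon}{C-1}\sum_{i\neq k}\log q_i$$

The first term is the One-Hot loss scaled by $1-\epsilon$. The second term penalizes predictions that drive any $q_i$ with $i\neq k$ toward $0$. Because cross-entropy is minimized when $q=p$, the optimum now sits at $q_k=1-\epsilon$ rather than at $q_k\to 1$. With $C=6$ and $\epsilon=0.1$ this becomes

$$H(p,q)=-0.9\log q_k-0.02\sum_{i\neq k}\log q_i$$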
2. A Simple Implementation of Label Smoothing
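import torch


def label_smoothing(label, eps):
    """Smooth a 1-D One-Hot target: 1 -> 1 - eps, 0 -> eps / (C - 1)."""
    num_classes = label.numel()
    smoothed = label.clone()  # work on a copy so the caller's tensor is not modified
    # Both masks come from the original label, so the two assignments cannot interfere.
    smoothed[label == 1] = 1 - eps
    smoothed[label == 0] = eps / (num_classes - 1)
    return smoothed


a = torch.tensor([0, 0, 1, 0, 0, 0], dtype=torch.float)
print(label_smoothing(a, 0.1))
# tensor([0.0200, 0.0200, 0.9000, 0.0200, 0.0200, 0.0200])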
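The function above handles a single 1-D One-Hot vector. In training, targets usually arrive as a batch of integer class indices, so the following is a minimal sketch of that case under the same definition of the smoothed target. The names `smooth_targets` and `soft_cross_entropy`, the batch, and the all-zero logits are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def smooth_targets(labels, num_classes, eps):
    """Turn class indices of shape (N,) into smoothed targets of shape (N, C)."""
    # Every class starts at eps / (C - 1); the true class is then set to 1 - eps.
    targets = torch.full((labels.size(0), num_classes), eps / (num_classes - 1))
    targets.scatter_(1, labels.unsqueeze(1), 1 - eps)
    return targets


def soft_cross_entropy(logits, targets):
    """Mean over the batch of -sum_i p_i * log q_i, with q = softmax(logits)."""
    log_q = F.log_softmax(logits, dim=1)
    return -(targets * log_q).sum(dim=1).mean()


labels = torch.tensor([2, 0])  # hypothetical batch of two samples
logits = torch.zeros(2, 6)     # uniform prediction, purely for illustration
targets = smooth_targets(labels, num_classes=6, eps=0.1)
loss = soft_cross_entropy(logits, targets)
print(targets)
print(loss)
```

Note that the `label_smoothing` argument of PyTorch's built-in `torch.nn.functional.cross_entropy` mixes the One-Hot target with a uniform distribution over all $C$ classes, so each non-target class receives $\epsilon/C$ rather than $\epsilon/(C-1)$. Its values therefore differ slightly from the definition used in this article.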
3. Advantages and Disadvantages of Label Smoothing
Advantages:
- It alleviates model Overconfidence to some extent and also provides some robustness to label noise;
- It supplies information about the relationship between classes in the training data (a form of data augmentation);
- It may improve the model's generalization to some extent.
Disadvantages:
- It merely adds uniform noise to the target and cannot reflect the real relationships between labels, so the improvement to the model is limited and there is even a risk of underfitting;
- In some cases Soft Labels do not help build a better neural network (they perform worse than Hard Labels).
4. When Should Label Smoothing Be Used?
- Huge datasets inevitably contain noise (i.e. labeling errors), and adding Label Smoothing helps keep the model from learning that noise;
- For ambiguous cases, Label Smoothing is usually worth introducing (for example, in a cat-vs-dog classification task, some images may look like both a dog and a cat);
- To prevent model Overconfidence.
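The Overconfidence point can be made concrete with a small experiment. The sketch below, in plain Python, raises only the logit of the true class and compares the loss under a Hard Target with the loss under a Soft Target. The logit values, $C=6$, $k=2$ and $\epsilon=0.1$ are illustrative assumptions, as are the helper names.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i), skipping terms with p_i == 0."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)


C, k, eps = 6, 2, 0.1
hard = [1.0 if i == k else 0.0 for i in range(C)]
soft = [1 - eps if i == k else eps / (C - 1) for i in range(C)]

# Raise only the true-class logit; every other logit stays at 0.
logits = [0.0, 2.0, 4.0, 8.0, 16.0]
hard_losses, soft_losses = [], []
for z in logits:
    q = softmax([z if i == k else 0.0 for i in range(C)])
    hard_losses.append(cross_entropy(hard, q))
    soft_losses.append(cross_entropy(soft, q))
    print(f"logit={z:5.1f}  hard={hard_losses[-1]:.4f}  soft={soft_losses[-1]:.4f}")
```

In this setup the Hard Target loss keeps shrinking as the logit grows, so the model is always rewarded for becoming more confident. The Soft Target loss is smallest near $q_k=1-\epsilon$ and rises again beyond that point, which is the mechanism that discourages Overconfidence.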