[Deep Learning] A Review of PyTorch's 19 Loss Functions
2022-07-04 20:27:00 【Demeanor 78】
Shared for academic purposes only; it does not represent the position of this account. Contact us for removal in case of infringement.
Reposted; original author: mingo_敏
Original link: https://blog.csdn.net/shanglianlm/article/details/85019768
Overview
This article summarizes nineteen loss functions, introducing the mathematical formula and usage of each. I hope it helps you master them.
01
Basic usage
criterion = LossCriterion() # the constructor takes its own arguments
loss = criterion(x, y)      # calling the criterion also takes arguments
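A minimal runnable sketch of this pattern, using MSELoss as an example (shapes and values are illustrative):

import torch
import torch.nn as nn

criterion = nn.MSELoss()                    # construct the criterion
x = torch.randn(8, 4, requires_grad=True)   # predictions
y = torch.randn(8, 4)                       # targets
loss = criterion(x, y)                      # scalar loss
loss.backward()                             # gradients flow back to x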
02
Loss function
2-1 L1 norm loss L1Loss
Computes the absolute value of the difference between output and target.
torch.nn.L1Loss(reduction='mean')
Parameters:
reduction – one of three values. 'none': no reduction is applied; 'mean': return the mean of the loss; 'sum': return the sum of the loss. Default: 'mean'.
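A small usage sketch (shapes illustrative):

import torch, torch.nn as nn

loss_fn = nn.L1Loss(reduction='mean')
pred = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
loss = loss_fn(pred, target)   # mean of |pred - target| over all elements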
2-2 Mean squared error loss MSELoss
Computes the squared difference between output and target.
torch.nn.MSELoss(reduction='mean')
Parameters:
reduction – one of three values. 'none': no reduction is applied; 'mean': return the mean of the loss; 'sum': return the sum of the loss. Default: 'mean'.
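Usage mirrors L1Loss; a sketch using the 'sum' reduction instead of the default (shapes illustrative):

import torch, torch.nn as nn

loss_fn = nn.MSELoss(reduction='sum')
pred = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
loss = loss_fn(pred, target)   # sum of (pred - target)**2 over all elements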
2-3 Cross-entropy loss CrossEntropyLoss
Effective when training a classification problem with C classes. The optional argument weight must be a 1-D Tensor that assigns a weight to each class, which is very useful for imbalanced training sets.
In multi-class tasks, the softmax activation is usually paired with the cross-entropy loss: cross-entropy describes the difference between two probability distributions, but a neural network outputs a raw vector that is not a probability distribution. Softmax first "normalizes" the vector into a probability distribution, and the cross-entropy loss is then computed against the target. In PyTorch this pairing is built in: CrossEntropyLoss applies log-softmax internally, so it should be given raw logits.
torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean')
Parameters:
weight (Tensor, optional) – a manual rescaling weight for each class. Must be a Tensor of length C.
ignore_index (int, optional) – a target value that is ignored, so that it does not contribute to the input gradient.
reduction – one of three values. 'none': no reduction is applied; 'mean': return the mean of the loss; 'sum': return the sum of the loss. Default: 'mean'.
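A sketch for a 10-class problem; the input is raw logits and the target holds class indices (shapes illustrative):

import torch, torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(4, 10, requires_grad=True)  # (N, C), no softmax applied
target = torch.tensor([1, 0, 9, 3])              # (N,), class indices in [0, C-1]
loss = loss_fn(logits, target)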
2-4 KL divergence loss KLDivLoss
Computes the KL divergence between input and target. KL divergence can be used to measure the distance between different continuous distributions, and is very effective when regressing directly over the space of a (discretely sampled) continuous output distribution.
torch.nn.KLDivLoss(reduction='mean')
Parameters:
reduction – one of three values. 'none': no reduction is applied; 'mean': return the mean of the loss; 'sum': return the sum of the loss. Default: 'mean'.
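A sketch; note that KLDivLoss expects the input as log-probabilities while the target is given as probabilities (the 'batchmean' reduction used here is the one that matches the mathematical definition of KL divergence):

import torch, torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.KLDivLoss(reduction='batchmean')
input = F.log_softmax(torch.randn(4, 10), dim=1)  # log-probabilities
target = F.softmax(torch.randn(4, 10), dim=1)     # probabilities
loss = loss_fn(input, target)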
2-5 Binary cross-entropy loss BCELoss
The cross-entropy function for binary classification tasks. Also used to measure reconstruction error, for example in autoencoders. Note that the target values t[i] must lie in the range 0 to 1.
torch.nn.BCELoss(weight=None, reduction='mean')
Parameters:
weight (Tensor, optional) – a manual rescaling weight for the loss of each batch element. Must be a Tensor of length nbatch.
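A sketch; the input must already be a probability in [0, 1], typically produced by a sigmoid:

import torch, torch.nn as nn

loss_fn = nn.BCELoss()
logits = torch.randn(8, requires_grad=True)
prob = torch.sigmoid(logits)         # squash into [0, 1]
target = torch.empty(8).random_(2)   # 0/1 targets (float)
loss = loss_fn(prob, target)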
2-6 BCEWithLogitsLoss
BCEWithLogitsLoss integrates a Sigmoid layer into the BCELoss class. This version is more numerically stable than using a separate Sigmoid followed by BCELoss, because merging the two operations into one layer allows the log-sum-exp trick to be used for numerical stability.
torch.nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None)
Parameters:
weight (Tensor, optional) – a manual rescaling weight for the loss of each batch element. Must be a Tensor of length nbatch.
pos_weight (Tensor, optional) – a weight for the loss of positive examples. Must be a Tensor of length equal to the number of classes.
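A sketch feeding raw logits directly; the pos_weight value of 2.0 is an arbitrary illustration of up-weighting positive examples:

import torch, torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([2.0]))
logits = torch.randn(8, 1, requires_grad=True)   # raw scores, no sigmoid
target = torch.empty(8, 1).random_(2)
loss = loss_fn(logits, target)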
2-7 MarginRankingLoss
torch.nn.MarginRankingLoss(margin=0.0, reduction='mean')
For each sample in the mini-batch, the loss is:
loss(x1, x2, y) = max(0, -y * (x1 - x2) + margin)
Parameters:
margin – default: 0.
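A sketch: y = 1 means x1 should rank higher than x2, y = -1 the reverse (the margin value is illustrative):

import torch, torch.nn as nn

loss_fn = nn.MarginRankingLoss(margin=0.5)
x1 = torch.randn(6, requires_grad=True)
x2 = torch.randn(6, requires_grad=True)
y = torch.tensor([1., -1., 1., -1., 1., -1.])
loss = loss_fn(x1, x2, y)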
2-8 HingeEmbeddingLoss
torch.nn.HingeEmbeddingLoss(margin=1.0, reduction='mean')
For each sample in the mini-batch, the loss is:
l_n = x_n if y_n = 1, and l_n = max(0, margin - x_n) if y_n = -1
Parameters:
margin – default: 1.
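A sketch; x is typically a distance (e.g. a pairwise L1 distance), and y in {1, -1} marks similar/dissimilar pairs:

import torch, torch.nn as nn

loss_fn = nn.HingeEmbeddingLoss(margin=1.0)
x = torch.randn(8).abs()                                # e.g. pairwise distances
y = torch.tensor([1., -1., 1., 1., -1., -1., 1., -1.])  # 1: similar, -1: dissimilar
loss = loss_fn(x, y)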
2-9 Multi-label classification loss MultiLabelMarginLoss
torch.nn.MultiLabelMarginLoss(reduction='mean')
For each sample in the mini-batch, the loss is:
loss(x, y) = sum_{ij} max(0, 1 - (x[y[j]] - x[i])) / x.size(0)
where j runs over the target class indices and i over the non-target class indices.
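A sketch; the target has the same shape as the input, lists the true class indices per sample, and is padded with -1:

import torch, torch.nn as nn

loss_fn = nn.MultiLabelMarginLoss()
x = torch.randn(2, 4, requires_grad=True)
y = torch.tensor([[3, 0, -1, -1],    # sample 0: classes 3 and 0 are positive
                  [1, 2, 3, -1]])    # sample 1: classes 1, 2, 3 are positive
loss = loss_fn(x, y)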
2-10 Smooth L1 loss SmoothL1Loss
Also known as the Huber loss function.
torch.nn.SmoothL1Loss(reduction='mean')
loss(x, y) = (1/n) * sum_i z_i
where z_i = 0.5 * (x_i - y_i)^2 if |x_i - y_i| < 1, and z_i = |x_i - y_i| - 0.5 otherwise.
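A sketch (shapes illustrative):

import torch, torch.nn as nn

loss_fn = nn.SmoothL1Loss()
pred = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
loss = loss_fn(pred, target)   # quadratic for small errors, linear for large ones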
2-11 Two-class logistic loss SoftMarginLoss
torch.nn.SoftMarginLoss(reduction='mean')
loss(x, y) = sum_i log(1 + exp(-y[i] * x[i])) / x.nelement()
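A sketch; targets take values in {1, -1}:

import torch, torch.nn as nn

loss_fn = nn.SoftMarginLoss()
x = torch.randn(6, requires_grad=True)
y = torch.tensor([1., -1., 1., 1., -1., -1.])
loss = loss_fn(x, y)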
2-12 Multi-label one-versus-all loss MultiLabelSoftMarginLoss
torch.nn.MultiLabelSoftMarginLoss(weight=None, reduction='mean')
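A sketch; the target is a multi-hot 0/1 matrix of the same shape as the input:

import torch, torch.nn as nn

loss_fn = nn.MultiLabelSoftMarginLoss()
x = torch.randn(3, 5, requires_grad=True)   # (N, C) raw scores
y = torch.empty(3, 5).random_(2)            # (N, C) multi-hot targets
loss = loss_fn(x, y)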
2-13 Cosine loss CosineEmbeddingLoss
torch.nn.CosineEmbeddingLoss(margin=0.0, reduction='mean')
Parameters:
margin – default: 0.
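A sketch; y = 1 marks pairs that should be similar, y = -1 pairs that should be dissimilar (the margin value is illustrative):

import torch, torch.nn as nn

loss_fn = nn.CosineEmbeddingLoss(margin=0.2)
x1 = torch.randn(4, 16, requires_grad=True)
x2 = torch.randn(4, 16, requires_grad=True)
y = torch.tensor([1., -1., 1., -1.])
loss = loss_fn(x1, x2, y)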
2-14 Multi-class hinge loss MultiMarginLoss
torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, reduction='mean')
Parameters:
p – 1 or 2; default: 1.
margin – default: 1.
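A sketch for a 10-class problem (shapes illustrative):

import torch, torch.nn as nn

loss_fn = nn.MultiMarginLoss(p=1, margin=1.0)
x = torch.randn(4, 10, requires_grad=True)   # (N, C) scores
y = torch.tensor([3, 0, 7, 1])               # (N,) class indices
loss = loss_fn(x, y)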
2-15 Triplet loss TripletMarginLoss
torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, reduction='mean')
L(a, p, n) = max(d(a_i, p_i) - d(a_i, n_i) + margin, 0), where d(x_i, y_i) = ||x_i - y_i||_p
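A sketch with 128-dimensional embeddings (dimensions illustrative):

import torch, torch.nn as nn

loss_fn = nn.TripletMarginLoss(margin=1.0, p=2.0)
anchor = torch.randn(8, 128, requires_grad=True)
positive = torch.randn(8, 128, requires_grad=True)  # same identity as anchor
negative = torch.randn(8, 128, requires_grad=True)  # different identity
loss = loss_fn(anchor, positive, negative)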
2-16 Connectionist temporal classification loss CTCLoss
CTC (connectionist temporal classification) loss can automatically align data that is not pre-aligned. It is mainly used for training on sequential data without prior alignment, such as speech recognition and OCR.
torch.nn.CTCLoss(blank=0, reduction='mean')
Parameters:
reduction – one of three values. 'none': no reduction is applied; 'mean': return the mean of the loss; 'sum': return the sum of the loss. Default: 'mean'.
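A sketch following the shape conventions of the PyTorch docs: log_probs is a (T, N, C) tensor of log-probabilities, and class 0 is the blank (all sizes illustrative):

import torch, torch.nn as nn

T, N, C = 50, 4, 20                    # input length, batch size, classes
ctc = nn.CTCLoss(blank=0)
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
targets = torch.randint(1, C, (N, 30), dtype=torch.long)   # labels, no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(10, 30, (N,), dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)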
2-17 Negative log-likelihood loss NLLLoss
Negative log-likelihood loss, used for training a classification problem with C classes.
torch.nn.NLLLoss(weight=None, ignore_index=-100, reduction='mean')
Parameters:
weight (Tensor, optional) – a manual rescaling weight for each class. Must be a Tensor of length C.
ignore_index (int, optional) – a target value that is ignored, so that it does not contribute to the input gradient.
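A sketch; NLLLoss expects log-probabilities, so it is usually paired with LogSoftmax (LogSoftmax + NLLLoss is equivalent to CrossEntropyLoss):

import torch, torch.nn as nn

log_softmax = nn.LogSoftmax(dim=1)
loss_fn = nn.NLLLoss()
logits = torch.randn(4, 10, requires_grad=True)
target = torch.tensor([2, 0, 9, 5])
loss = loss_fn(log_softmax(logits), target)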
2-18 NLLLoss2d
Negative log-likelihood loss for image inputs: it computes a negative log-likelihood loss per pixel.
torch.nn.NLLLoss2d(weight=None, ignore_index=-100, reduction='mean')
Parameters:
weight (Tensor, optional) – a manual rescaling weight for each class. Must be a Tensor of length C.
reduction – one of three values. 'none': no reduction is applied; 'mean': return the mean of the loss; 'sum': return the sum of the loss. Default: 'mean'.
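Note that recent PyTorch versions deprecate NLLLoss2d, since nn.NLLLoss itself accepts image-shaped (N, C, H, W) input. A per-pixel sketch using that form (shapes illustrative):

import torch, torch.nn as nn

loss_fn = nn.NLLLoss()
logits = torch.randn(2, 5, 8, 8, requires_grad=True)  # (N, C, H, W)
log_probs = torch.log_softmax(logits, dim=1)
target = torch.randint(0, 5, (2, 8, 8))               # (N, H, W) class map
loss = loss_fn(log_probs, target)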
2-19 PoissonNLLLoss
Negative log-likelihood loss with the target assumed to follow a Poisson distribution.
torch.nn.PoissonNLLLoss(log_input=True, full=False, eps=1e-08, reduction='mean')
Parameters:
log_input (bool, optional) – if True, the loss is computed as exp(input) - target * input; if False, as input - target * log(input + eps).
full (bool, optional) – whether to compute the full loss, i.e. add the Stirling approximation term target * log(target) - target + 0.5 * log(2 * pi * target).
eps (float, optional) – default: 1e-8.
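A sketch with log_input=True, so the input is interpreted as the log of the Poisson rate (values illustrative):

import torch, torch.nn as nn

loss_fn = nn.PoissonNLLLoss(log_input=True)
log_rate = torch.randn(6, requires_grad=True)   # log of the predicted rate
target = torch.poisson(torch.rand(6) * 5)       # non-negative counts
loss = loss_fn(log_rate, target)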