[Deep Learning] A Review of PyTorch's 19 Loss Functions
2022-07-04 20:27:00 【Demeanor 78】
For academic sharing only; this does not represent the position of this official account. Contact us for deletion in case of infringement.
Reprinted from: mingo_敏
Original link: https://blog.csdn.net/shanglianlm/article/details/85019768
Introduction
This article summarizes nineteen loss functions, introducing their mathematical formulas and code usage. I hope you find them useful.
01
Basic usage
criterion = LossCriterion() # the constructor takes its own arguments
loss = criterion(x, y)      # calling the criterion also takes arguments
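Every loss below follows this pattern. A minimal runnable sketch (dummy tensors; MSELoss is chosen arbitrarily as the criterion):

import torch
import torch.nn as nn

criterion = nn.MSELoss()                   # constructor takes its own arguments
x = torch.randn(4, 3, requires_grad=True)  # e.g. model output
y = torch.randn(4, 3)                      # target
loss = criterion(x, y)                     # the call also takes arguments
loss.backward()                            # gradients flow back to x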
02
Loss functions
2-1 L1 norm loss: L1Loss
Computes the absolute value of the element-wise difference between output and target.
torch.nn.L1Loss(reduction='mean')
Parameters:
reduction: one of three values. 'none': apply no reduction; 'mean': return the mean of the losses; 'sum': return the sum of the losses. Default: 'mean'.
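A quick hand-checkable sketch (values chosen only for illustration):

import torch
import torch.nn as nn

loss_fn = nn.L1Loss(reduction='mean')
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.5, 2.0, 5.0])
print(loss_fn(x, y))  # (|1-1.5| + |2-2| + |3-5|) / 3 = 0.8333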
2-2 Mean squared error loss: MSELoss
Computes the mean squared error between output and target.
torch.nn.MSELoss(reduction='mean')
Parameters:
reduction: one of three values. 'none': apply no reduction; 'mean': return the mean of the losses; 'sum': return the sum of the losses. Default: 'mean'.
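The same toy tensors, now under MSELoss:

import torch
import torch.nn as nn

loss_fn = nn.MSELoss()
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.5, 2.0, 5.0])
print(loss_fn(x, y))  # (0.25 + 0 + 4) / 3 = 1.4167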
2-3 Cross-entropy loss: CrossEntropyLoss
Very effective when training a classification problem with C classes. The optional argument weight must be a 1-D Tensor that assigns a weight to each class; this is very useful for unbalanced training sets.
In multi-class tasks we often use a softmax activation together with a cross-entropy loss, because cross-entropy describes the difference between two probability distributions while the network outputs a raw vector that is not a probability distribution. Softmax "normalizes" the vector into a probability distribution, and the cross-entropy loss is then computed on it. Note that PyTorch's nn.CrossEntropyLoss already combines LogSoftmax and NLLLoss internally, so it should be fed raw, unnormalized logits.
torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduction='mean')
Parameters:
weight (Tensor, optional): a manual rescaling weight for each class. Must be a Tensor of length C.
ignore_index (int, optional): specifies a target value that is ignored, so that it does not contribute to the input gradient.
reduction: one of three values. 'none': apply no reduction; 'mean': return the mean of the losses; 'sum': return the sum of the losses. Default: 'mean'.
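A minimal sketch; note that the inputs are raw logits and the targets are class indices:

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(4, 5, requires_grad=True)  # batch of 4, C = 5 classes; no softmax applied
target = torch.tensor([1, 0, 4, 2])             # class indices in [0, C-1]
loss = loss_fn(logits, target)
loss.backward()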
2-4 KL divergence loss: KLDivLoss
Computes the KL divergence between input and target. KL divergence is useful for measuring the distance between continuous distributions, and is often effective when performing direct regression over the space of (discretely sampled) continuous output distributions. Note that the input is expected to contain log-probabilities, while the target contains probabilities.
torch.nn.KLDivLoss(reduction='mean')
Parameters:
reduction: one of three values. 'none': apply no reduction; 'mean': return the mean of the losses; 'sum': return the sum of the losses. Default: 'mean'.
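A minimal sketch; reduction='batchmean' is chosen here because it matches the mathematical definition of KL divergence more closely than the element-wise 'mean':

import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.KLDivLoss(reduction='batchmean')
input = F.log_softmax(torch.randn(4, 5), dim=1)  # input must be log-probabilities
target = F.softmax(torch.randn(4, 5), dim=1)     # target is plain probabilities
print(loss_fn(input, target))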
2-5 Binary cross entropy loss BCELoss
The calculation function of cross entropy in binary classification task . Error used to measure reconstruction , For example, automatic encoders . Pay attention to the value of the target t[i] For the range of 0 To 1 Between .
torch.nn.BCELoss(weight=None, reduction='mean')
Parameters :
weight (Tensor, optional) – Custom each batch Elemental loss The weight of . It has to be a length of “nbatch” Of Of Tensor
pos_weight(Tensor, optional) – Customized for each positive sample loss The weight of . It has to be a length by “classes” Of Tensor
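A minimal sketch; the inputs must already be probabilities, so a sigmoid is applied first:

import torch
import torch.nn as nn

loss_fn = nn.BCELoss()
probs = torch.sigmoid(torch.randn(4, requires_grad=True))  # probabilities in [0, 1]
target = torch.tensor([1.0, 0.0, 1.0, 0.0])                # float targets in [0, 1]
loss = loss_fn(probs, target)
loss.backward()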
2-6 BCEWithLogitsLoss
BCEWithLogitsLoss integrates a Sigmoid layer into the BCELoss class. This version is numerically more stable than a separate Sigmoid layer followed by BCELoss, because merging the two operations into one layer allows the log-sum-exp trick to be exploited for numerical stability.
torch.nn.BCEWithLogitsLoss(weight=None, reduction='mean', pos_weight=None)
Parameters:
weight (Tensor, optional): a manual rescaling weight for the loss of each batch element. Must be a Tensor of length nbatch.
pos_weight (Tensor, optional): a weight for the loss of positive examples. Must be a Tensor of length equal to the number of classes.
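A minimal sketch; the pos_weight value 3.0 is an arbitrary illustration of up-weighting positive examples under class imbalance:

import torch
import torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.0]))
logits = torch.randn(4, 1, requires_grad=True)       # raw logits; no sigmoid needed
target = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
loss = loss_fn(logits, target)
loss.backward()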
2-7 MarginRankingLoss
torch.nn.MarginRankingLoss(margin=0.0, reduction='mean')
For each instance in the mini-batch, the loss is:

loss(x1, x2, y) = max(0, -y * (x1 - x2) + margin)

where y = 1 means x1 should be ranked higher than x2, and y = -1 means the opposite.
Parameters:
margin: default 0.
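A minimal sketch of ranking two pairs of scores:

import torch
import torch.nn as nn

loss_fn = nn.MarginRankingLoss(margin=0.5)
x1 = torch.tensor([1.0, 2.0], requires_grad=True)
x2 = torch.tensor([2.0, 1.0], requires_grad=True)
y = torch.tensor([1.0, -1.0])   # y=1: x1 should rank higher; y=-1: x2 should rank higher
loss = loss_fn(x1, x2, y)       # per element: max(0, -y*(x1-x2) + margin)
loss.backward()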
2-8 HingeEmbeddingLoss
torch.nn.HingeEmbeddingLoss(margin=1.0, reduction='mean')
For each instance in the mini-batch, the loss is:

l_n = x_n, if y_n = 1
l_n = max(0, margin - x_n), if y_n = -1

It is typically used to measure whether two inputs are similar or dissimilar, e.g. with a pairwise distance as x and labels y in {1, -1}.
Parameters:
margin: default 1.
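A minimal sketch, using pairwise distances as the input x:

import torch
import torch.nn as nn

loss_fn = nn.HingeEmbeddingLoss(margin=1.0)
d = torch.tensor([0.3, 2.0], requires_grad=True)  # e.g. pairwise distances
y = torch.tensor([1.0, -1.0])                     # 1 = similar pair, -1 = dissimilar pair
loss = loss_fn(d, y)  # 0.3 for the similar pair; max(0, 1 - 2.0) = 0 for the dissimilar pair
loss.backward()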
2-9 Multi-label classification loss: MultiLabelMarginLoss
torch.nn.MultiLabelMarginLoss(reduction='mean')
For each sample in the mini-batch, the loss is:

loss(x, y) = sum_{ij} max(0, 1 - (x[y[j]] - x[i])) / x.size(0)

where j runs over the target class indices stored in y, i runs over all other class indices, and y is padded with -1 (only entries before the first -1 count as targets).
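A minimal sketch for one sample belonging to classes 3 and 0 (the target vector is padded with -1):

import torch
import torch.nn as nn

loss_fn = nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]], requires_grad=True)
y = torch.tensor([[3, 0, -1, -1]])  # target classes 3 and 0; -1 pads the rest
loss = loss_fn(x, y)
loss.backward()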
2-10 Smooth L1 loss: SmoothL1Loss
Also known as the Huber loss function. It is less sensitive to outliers than MSELoss.
torch.nn.SmoothL1Loss(reduction='mean')

loss(x, y) = (1/n) * sum_i z_i

where

z_i = 0.5 * (x_i - y_i)^2, if |x_i - y_i| < 1
z_i = |x_i - y_i| - 0.5, otherwise
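A hand-checkable sketch with one residual in the quadratic region and one in the linear region:

import torch
import torch.nn as nn

loss_fn = nn.SmoothL1Loss()
x = torch.tensor([0.0, 0.0])
y = torch.tensor([0.5, 3.0])
# per element: 0.5 * 0.5**2 = 0.125 (quadratic), 3.0 - 0.5 = 2.5 (linear)
print(loss_fn(x, y))  # (0.125 + 2.5) / 2 = 1.3125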
2-11 Two-class logistic loss: SoftMarginLoss
torch.nn.SoftMarginLoss(reduction='mean')
Optimizes a two-class classification logistic loss between input x and target y (containing 1 or -1):

loss(x, y) = sum_i log(1 + exp(-y_i * x_i)) / x.nelement()
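A minimal sketch:

import torch
import torch.nn as nn

loss_fn = nn.SoftMarginLoss()
x = torch.tensor([0.7, -1.2], requires_grad=True)  # raw scores
y = torch.tensor([1.0, -1.0])                      # targets in {1, -1}
loss = loss_fn(x, y)
loss.backward()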
2-12 Multi-label one-versus-all loss: MultiLabelSoftMarginLoss
Optimizes a multi-label one-versus-all loss based on max-entropy between input x and target y of size (N, C), with targets in {0, 1}.
torch.nn.MultiLabelSoftMarginLoss(weight=None, reduction='mean')
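A minimal sketch with multi-hot targets:

import torch
import torch.nn as nn

loss_fn = nn.MultiLabelSoftMarginLoss()
x = torch.randn(2, 4, requires_grad=True)  # raw scores for 4 labels
y = torch.tensor([[1., 0., 1., 0.],        # multi-hot targets in {0, 1}
                  [0., 1., 1., 1.]])
loss = loss_fn(x, y)
loss.backward()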
2-13 Cosine loss: CosineEmbeddingLoss
Measures whether two inputs are similar using the cosine distance; commonly used for learning embeddings.
torch.nn.CosineEmbeddingLoss(margin=0.0, reduction='mean')

loss(x1, x2, y) = 1 - cos(x1, x2), if y = 1
loss(x1, x2, y) = max(0, cos(x1, x2) - margin), if y = -1

Parameters:
margin: should be a number in [-1, 1]. Default: 0.
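A minimal sketch on random embeddings:

import torch
import torch.nn as nn

loss_fn = nn.CosineEmbeddingLoss(margin=0.2)
x1 = torch.randn(3, 8, requires_grad=True)
x2 = torch.randn(3, 8, requires_grad=True)
y = torch.tensor([1.0, -1.0, 1.0])  # 1 = pull together, -1 = push apart
loss = loss_fn(x1, x2, y)
loss.backward()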
2-14 Multi-class hinge loss: MultiMarginLoss
torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, reduction='mean')
Optimizes a multi-class classification hinge loss (margin-based loss) between input x and a target class index y:

loss(x, y) = sum_{i != y} max(0, margin - x[y] + x[i])^p / x.size(0)

Parameters:
p: 1 or 2. Default: 1.
margin: default 1.
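A hand-checkable sketch:

import torch
import torch.nn as nn

loss_fn = nn.MultiMarginLoss(p=1, margin=1.0)
x = torch.tensor([[0.1, 0.8, 0.3]], requires_grad=True)  # scores for 3 classes
y = torch.tensor([1])                                    # correct class index
loss = loss_fn(x, y)  # (max(0, 1-0.8+0.1) + max(0, 1-0.8+0.3)) / 3 = 0.2667
loss.backward()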
2-15 Triplet loss: TripletMarginLoss
Used in metric learning: given an anchor a, a positive example p, and a negative example n, it pulls the anchor towards the positive and pushes it away from the negative.
torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, reduction='mean')

L(a, p, n) = max{ d(a_i, p_i) - d(a_i, n_i) + margin, 0 }

where d(x_i, y_i) = || x_i - y_i ||_p.
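A minimal sketch on random anchor/positive/negative embeddings (shapes are arbitrary):

import torch
import torch.nn as nn

loss_fn = nn.TripletMarginLoss(margin=1.0, p=2.0)
anchor   = torch.randn(16, 128, requires_grad=True)
positive = torch.randn(16, 128, requires_grad=True)
negative = torch.randn(16, 128, requires_grad=True)
loss = loss_fn(anchor, positive, negative)  # max(d(a,p) - d(a,n) + margin, 0)
loss.backward()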
2-16 Connection timing classification loss CTCLoss
CTC Connection timing classification loss , You can automatically align data that is not aligned , It is mainly used for training serialization data without prior alignment . For example, speech recognition 、ocr Identification and so on .
torch.nn.CTCLoss(blank=0, reduction='mean')
Parameters :
reduction- Three values ,none: Do not use reduction ;mean: return loss The average of and ;sum: return loss And . Default :mean.
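A minimal sketch; log_probs follows the (T, N, C) convention, and all sizes here are arbitrary:

import torch
import torch.nn as nn

T, C, N, S = 50, 20, 4, 10   # input length, classes (incl. blank), batch, max target length
ctc = nn.CTCLoss(blank=0)
log_probs = torch.randn(T, N, C).log_softmax(2).requires_grad_()  # (T, N, C) log-probabilities
targets = torch.randint(1, C, (N, S), dtype=torch.long)           # labels exclude the blank index 0
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(5, S + 1, (N,), dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()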
2-17 Negative log-likelihood loss: NLLLoss
Negative log-likelihood loss, used to train a classification problem with C classes. The input is expected to contain log-probabilities of each class, e.g. the output of a LogSoftmax layer.
torch.nn.NLLLoss(weight=None, ignore_index=-100, reduction='mean')
Parameters:
weight (Tensor, optional): a manual rescaling weight for each class. Must be a Tensor of length C.
ignore_index (int, optional): specifies a target value that is ignored, so that it does not contribute to the input gradient.
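A minimal sketch; pairing LogSoftmax with NLLLoss is equivalent to CrossEntropyLoss on the raw logits:

import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fn = nn.NLLLoss()
logits = torch.randn(4, 5, requires_grad=True)
log_probs = F.log_softmax(logits, dim=1)   # NLLLoss expects log-probabilities
target = torch.tensor([1, 0, 4, 2])
loss = loss_fn(log_probs, target)
loss.backward()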
2-18 NLLLoss2d
Negative log-likelihood loss for image inputs: it computes the negative log-likelihood loss per pixel. Note that NLLLoss2d is deprecated in recent PyTorch versions; nn.NLLLoss itself accepts higher-dimensional inputs of shape (N, C, H, W).
torch.nn.NLLLoss2d(weight=None, ignore_index=-100, reduction='mean')
Parameters:
weight (Tensor, optional): a manual rescaling weight for each class. Must be a Tensor of length C.
reduction: one of three values. 'none': apply no reduction; 'mean': return the mean of the losses; 'sum': return the sum of the losses. Default: 'mean'.
2-19 PoissonNLLLoss
Negative log-likelihood loss with the target assumed to follow a Poisson distribution.
torch.nn.PoissonNLLLoss(log_input=True, full=False, eps=1e-08, reduction='mean')
Parameters:
log_input (bool, optional): if True, the loss is computed as exp(input) - target * input; if False, as input - target * log(input + eps).
full (bool, optional): whether to compute the full loss, i.e. add the Stirling approximation term target * log(target) - target + 0.5 * log(2 * pi * target).
eps (float, optional): default 1e-8.
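A minimal sketch with log_input=True, so the input is interpreted as the log of the Poisson rate:

import torch
import torch.nn as nn

loss_fn = nn.PoissonNLLLoss(log_input=True)
log_rate = torch.randn(5, requires_grad=True)  # log of the predicted Poisson rate
target = torch.poisson(torch.rand(5) * 4)      # counts drawn from a Poisson distribution
loss = loss_fn(log_rate, target)               # exp(input) - target * input (+ optional Stirling term)
loss.backward()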
Reference material
http://www.voidcn.com/article/p-rtzqgqkz-bpg.html