Common loss functions of deep learning
2022-07-02 00:35:00 【Falling flowers and rain】

In deep learning, the loss function measures how good the model's parameters are by comparing the difference between the network's output and the true output. The same function goes by several names in the literature, mainly: loss function, cost function, objective function, and error function.

1. Classification tasks
The cross-entropy loss is the most widely used loss for classification tasks in deep learning, so this section focuses on it.
1.1 Multi-class classification tasks
In multi-class tasks we usually use softmax to turn the logits into probabilities, so the multi-class cross-entropy loss is also called the softmax loss. It is computed as:

$L = -\sum_{i} y_i \log\big(S(f(x))_i\big)$

where $y$ is the one-hot vector of true class probabilities for sample $x$, $f(x)$ is the vector of predicted class scores, $S$ is the softmax function, and $L$ measures the difference between the true distribution $p$ and the predicted distribution $q$.
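To make the formula concrete, here is a minimal NumPy sketch (an illustration of the formula only, not the actual tf.keras internals), evaluated on the same values as the tf.keras example further below:
# Import the corresponding package
import numpy as np
# Set true (one-hot) and predicted probability values
y_true = np.array([[0, 1, 0], [0, 0, 1]], dtype=np.float64)
y_pred = np.array([[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]], dtype=np.float64)
# Clip to avoid log(0) on zero-probability entries
p = np.clip(y_pred, 1e-7, 1.0)
# L = -sum_i y_i * log(p_i), averaged over the batch
loss = -np.sum(y_true * np.log(p), axis=1).mean()
print(loss)  # ~1.176939, matching the tf.keras result below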
Example: take the first sample from the code below, whose true label is y = (0, 1, 0) and whose predicted probabilities are (0.05, 0.95, 0). Only the true class contributes to the sum, so its cross-entropy loss is:

$L = -\log(0.95) \approx 0.0513$

Likewise, the second sample contributes $-\log(0.1) \approx 2.3026$, and the batch loss is the average of the two.
From a probability perspective, the training objective is to minimize the negative logarithm of the predicted probability assigned to the correct class.
In tf.keras this is implemented by CategoricalCrossentropy, as shown below:
# Import the corresponding package
import tensorflow as tf
# Set true and predicted values
y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
# Instantiation cross entropy loss
cce = tf.keras.losses.CategoricalCrossentropy()
# Calculate the loss result
cce(y_true, y_pred).numpy()
The result is:
1.176939
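Note that CategoricalCrossentropy expects probabilities by default. If the model outputs raw logits (i.e., there is no softmax layer at the end), you can pass from_logits=True so the loss applies softmax internally, which is also more numerically stable. A short sketch with hypothetical logit values:
# Import the corresponding package
import tensorflow as tf
# Hypothetical raw scores straight from the network (no softmax applied)
y_true = [[0, 1, 0], [0, 0, 1]]
logits = [[1.0, 3.0, 1.0], [0.5, 2.0, 0.3]]
# from_logits=True tells the loss to apply softmax internally
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
cce(y_true, logits).numpy()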
1.2 Binary classification tasks
When dealing with binary classification tasks we do not use the softmax activation function but the sigmoid activation function, and the loss is adjusted accordingly to the binary cross-entropy loss:

$L = -\big[y \log \hat{y} + (1 - y)\log(1 - \hat{y})\big]$

where $y$ is the true probability that sample $x$ belongs to the positive class, $\hat{y}$ is the predicted probability, and $L$ measures the difference between the true value and the predicted value.
In tf.keras this is implemented by BinaryCrossentropy(), as shown below:
# Import the corresponding package
import tensorflow as tf
# Set true and predicted values
y_true = [[0], [1]]
y_pred = [[0.4], [0.6]]
# Cross entropy loss of instantiated binary classification
bce = tf.keras.losses.BinaryCrossentropy()
# Calculate the loss result
bce(y_true, y_pred).numpy()
The result is:
0.5108254
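The result can be checked against the formula above with a minimal NumPy sketch (illustrative only, not the tf.keras internals):
# Import the corresponding package
import numpy as np
# Same values as the tf.keras example above
y_true = np.array([0.0, 1.0])
y_pred = np.array([0.4, 0.6])
# L = -[y*log(y_hat) + (1-y)*log(1-y_hat)], averaged over the batch
loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)).mean()
print(loss)  # ~0.5108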
2. Regression tasks
The loss functions commonly used in regression tasks are the following:
2.1 MAE Loss
Mean Absolute Error (MAE), also known as L1 loss, uses the absolute error as the distance:

$L = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - f(x_i) \rvert$

Its curve is V-shaped: linear everywhere, with a sharp corner at zero error.
Its characteristics are: because the L1 loss induces sparsity, it is often added to other losses as a regularization term to penalize large values. Its biggest problem is that the gradient is not smooth at zero, so updates can skip over the minimum.
In tf.keras this is implemented by MeanAbsoluteError, as shown below:
# Import the corresponding package
import tensorflow as tf
# Set true and predicted values
y_true = [[0.], [0.]]
y_pred = [[1.], [1.]]
# Instantiation MAE Loss
mae = tf.keras.losses.MeanAbsoluteError()
# Calculate the loss result
mae(y_true, y_pred).numpy()
The result is:
1.0
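The gradient problem is easy to see numerically: the gradient of the absolute error has magnitude 1 no matter how small the error is, so the update size does not shrink as the prediction approaches the target. A small sketch using tf.GradientTape (illustrative values):
# Import the corresponding package
import tensorflow as tf
y_true = tf.constant([[0.0]])
mae = tf.keras.losses.MeanAbsoluteError()
for p in [2.0, 0.5, 0.1]:
    y_pred = tf.Variable([[p]])
    with tf.GradientTape() as tape:
        loss = mae(y_true, y_pred)
    # The gradient stays at 1.0 regardless of how close y_pred is to y_true
    print(p, tape.gradient(loss, y_pred).numpy())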
2.2 MSE Loss
Mean Squared Error (MSE), also known as L2 loss or quadratic loss, uses the squared error as the distance:

$L = \frac{1}{n}\sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2$

Its curve is a smooth parabola centered at zero error.
Its characteristics are: the L2 loss is also often used as a regularization term. When the prediction differs greatly from the target, the gradient easily explodes.
In tf.keras it is implemented by MeanSquaredError:
# Import the corresponding package
import tensorflow as tf
# Set true and predicted values
y_true = [[0.], [1.]]
y_pred = [[1.], [1.]]
# Instantiation MSE Loss
mse = tf.keras.losses.MeanSquaredError()
# Calculate the loss result
mse(y_true, y_pred).numpy()
The result is:
0.5
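Conversely, the gradient of the squared error grows linearly with the error, which is why large residuals (outliers) can blow up the updates. The same kind of sketch as for MAE (illustrative values):
# Import the corresponding package
import tensorflow as tf
y_true = tf.constant([[0.0]])
mse = tf.keras.losses.MeanSquaredError()
for p in [0.5, 2.0, 10.0]:
    y_pred = tf.Variable([[p]])
    with tf.GradientTape() as tape:
        loss = mse(y_true, y_pred)
    # The gradient is 2 * error, so it grows with the residual: 1.0, 4.0, 20.0
    print(p, tape.gradient(loss, y_pred).numpy())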
2.3 Smooth L1 Loss
The smooth L1 loss function is defined as:

$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2 & \text{if } \lvert x \rvert < 1 \\ \lvert x \rvert - 0.5 & \text{otherwise} \end{cases}$

where $x = f(x) - y$ is the difference between the predicted value and the true value.

As the piecewise definition shows, on $[-1, 1]$ the function is effectively an L2 loss, which fixes the non-smoothness of L1 at zero, while outside $[-1, 1]$ it is effectively an L1 loss, which fixes the exploding gradients that L2 suffers on outliers. This loss function is commonly used in object detection.
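The piecewise definition can be written directly in a few lines of NumPy (a sketch of the formula above, which reproduces the tf.keras result computed below):
# Import the corresponding package
import numpy as np
# A direct sketch of the piecewise definition above
def smooth_l1(y_true, y_pred):
    x = y_pred - y_true
    return np.where(np.abs(x) < 1.0, 0.5 * x ** 2, np.abs(x) - 0.5).mean()
print(smooth_l1(np.array([0.0, 1.0]), np.array([0.6, 0.4])))  # 0.18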
In tf.keras, the Huber loss is used to compute it, as shown below:
# Import the corresponding package
import tensorflow as tf
# Set true and predicted values
y_true = [[0], [1]]
y_pred = [[0.6], [0.4]]
# Instantiation smooth L1 Loss
h = tf.keras.losses.Huber()
# Calculate the loss result
h(y_true, y_pred).numpy()
The result is:
0.18
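Note that smooth L1 is the Huber loss with its threshold parameter delta fixed at 1.0, which happens to be the default of tf.keras.losses.Huber; passing a different value (e.g. Huber(delta=2.0)) moves the point where the loss switches from quadratic to linear.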
Summary
- Know the loss functions for classification tasks: the multi-class cross-entropy loss and the binary cross-entropy loss
- Know the loss functions for regression tasks: the MAE, MSE, and smooth L1 loss functions