L1, L2, and Smooth L1 Loss Functions
2022-07-05 11:42:00 【Network starry sky (LUOC)】
1. Common MSE and MAE Loss Functions
1.1 Mean Squared Error (Squared Loss)
Mean squared error (MSE) is the most commonly used regression loss function. It is the mean of the squared differences between the predicted values and the target values:

$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

Plotted against the prediction, the MSE curve is a smooth bowl whose minimum lies where the predicted value equals the target value.

Advantages: the function is continuous and smooth at every point, which makes it convenient to differentiate, and it has a more stable solution.

Disadvantages: it is not particularly robust. Why? Because when the input of the function is far from the central value, the gradient used by gradient descent is very large, which may cause an exploding gradient.
What is an exploding gradient?

The error gradient is the direction and magnitude computed during neural network training; it is used to update the network weights in the right direction by the right amount.

In deep networks or recurrent neural networks, error gradients can accumulate across updates into very large gradients. These cause large updates to the network weights, which in turn make the network unstable. In extreme cases, the weight values become so large that they overflow and produce NaN values.

Gradients that are repeatedly multiplied across network layers by factors greater than 1.0 grow exponentially, producing an exploding gradient.

Problems caused by exploding gradients:

In deep multilayer perceptron networks, exploding gradients make the network unstable; at best the network cannot learn from the training data, and at worst it ends up with NaN weight values that can no longer be updated.
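As a minimal numeric sketch of this compounding effect (the layer count and per-layer factor below are made-up illustrative values, not from the original post), repeatedly multiplying a gradient by a factor greater than 1.0 at every layer makes it grow exponentially:

# Hypothetical example: a 50-layer network in which backpropagation
# multiplies the gradient by 1.5 at every layer (any factor > 1.0 works).
layers = 50
per_layer_factor = 1.5
gradient = 1.0
for _ in range(layers):
    gradient *= per_layer_factor
print(gradient)  # roughly 6.4e8 -- the gradient has exploded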
1.2 Mean Absolute Error
Mean absolute error (MAE) is another commonly used regression loss function. It is the mean of the absolute differences between the target values and the predicted values, and it measures the average magnitude of the prediction error without regard to its direction. Its range is 0 to ∞:

$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$

Advantages: the gradient is stable no matter what the input value is, so it does not cause exploding-gradient problems and yields a more robust solution.

Disadvantages: there is a kink at the center point where the function is not differentiable, which makes it less convenient to optimize.
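The gradient behavior behind these advantages and disadvantages can be checked directly. A minimal NumPy sketch (the residual values are made up for illustration): the gradient magnitude of the squared loss grows linearly with the error, while that of the absolute loss stays constant at 1:

import numpy as np

# Prediction errors ranging from small to a large outlier (made-up values).
residual = np.array([0.1, 1.0, 10.0, 100.0])

# For the squared loss (y - yhat)^2, the gradient magnitude is 2|y - yhat|:
# it grows with the error, which is what can make gradients explode.
grad_l2_magnitude = 2 * residual

# For the absolute loss |y - yhat|, the gradient magnitude is a constant 1
# (undefined only at the kink, where y == yhat).
grad_l1_magnitude = np.abs(np.sign(residual))

print(grad_l2_magnitude)  # [  0.2   2.   20.  200. ]
print(grad_l1_magnitude)  # [ 1.  1.  1.  1. ]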
These two loss functions are also known as the L2 loss and the L1 loss, respectively.
2. L1_Loss and L2_Loss
2.1 Formulas for L1_Loss and L2_Loss
The L1-norm loss function is also known as least absolute deviations (LAD) or least absolute errors (LAE). Overall, it minimizes the sum S of the absolute differences between the target values $Y_i$ and the estimates $f(x_i)$:

$S = \sum_{i=1}^{n}|Y_i - f(x_i)|$

The L2-norm loss function is also known as least squares error (LSE). In general, it minimizes the sum S of the squared differences between the target values $Y_i$ and the estimates $f(x_i)$:

$S = \sum_{i=1}^{n}(Y_i - f(x_i))^2$
import numpy as np

def L1(yhat, y):
    # Sum of absolute differences between targets and predictions.
    loss = np.sum(np.abs(y - yhat))
    return loss

def L2(yhat, y):
    # Sum of squared differences between targets and predictions.
    loss = np.sum(np.power((y - yhat), 2))
    return loss

# Example call
yhat = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
y = np.array([1, 1, 0, 1, 1])
print("L1 = ", L1(yhat, y))  # L1 ≈ 3.1
print("L2 = ", L2(yhat, y))  # L2 ≈ 2.15
The differences between the L1-norm and L2-norm loss functions can be summarized quickly:

L1 loss: robust to outliers, but its solution is unstable and may not be unique.
L2 loss: less robust to outliers, but its solution is stable and always unique.
2.2 Several Key Concepts
(1) Robustness
Least absolute deviations is robust because it can handle outliers in the data. This can be useful in studies where outliers may be safely and effectively ignored. If any or all outliers need to be taken into account, then least squares is the better choice.
Intuitively: because the L2 norm squares the error (an error greater than 1 is magnified considerably), the model's error on such a sample is much larger than under the L1 norm, so the model is far more sensitive to that sample and must be adjusted to reduce its error. If that sample is an outlier, the model ends up adjusting itself to accommodate a single outlier at the expense of many normal samples, since those normal samples have smaller errors than the one outlier.
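To make this concrete, here is a small sketch with made-up error values, showing that a single outlier dominates the total L2 loss far more than the total L1 loss:

import numpy as np

# Four well-fit samples plus one outlier with error 10 (made-up values).
errors = np.array([0.5, -0.3, 0.2, -0.4, 10.0])

l1_total = np.sum(np.abs(errors))  # 11.4   -> the outlier contributes ~88%
l2_total = np.sum(errors ** 2)     # 100.54 -> the outlier contributes ~99.5%

print(l1_total, l2_total)

Minimizing the L2 total therefore forces the fit to chase the outlier, while the L1 total leaves more weight on the normal samples.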
(2) Stability
The instability of the least absolute deviations method means that, for a small horizontal perturbation of the dataset, the regression line may jump a large distance (for example, because of the kink, where the derivative is discontinuous). On some data configurations the method has many continuous solutions; then, with a small shift of the dataset, the whole region of continuous solutions for that configuration may be skipped. After the solutions in that region are skipped, the least absolute deviations line may have a much larger slope than the previous one.

By contrast, the solution of the least squares method is stable: for any small perturbation of a data point, the regression line always moves only slightly; that is, the regression parameters are continuous functions of the dataset.
3. Smooth L1 Loss Function
As the name suggests, smooth L1 is L1 after smoothing. As noted above, the drawback of the L1 loss is its kink: the function is not smooth there, which leads to instability. How can it be made smooth? The smooth L1 loss function is:

$\text{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
Plotted, the smooth L1 loss follows the L2 curve near zero and the L1 curve away from zero. The author's purpose in designing it was to make the loss more robust to outliers: compared with the L2 loss function, it is insensitive to outliers (points far from the center), and the gradient does not blow up as easily during training.
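A minimal NumPy sketch of this piecewise definition (the function name and test values are my own, not from the original post):

import numpy as np

def smooth_l1(x):
    # Quadratic (L2-like) near zero, linear (L1-like) away from zero.
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

x = np.array([-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0])
print(smooth_l1(x))  # [2.5   0.5   0.125 0.    0.125 0.5   2.5  ]

At |x| = 1 the two pieces meet with the same value (0.5) and the same slope (1), so the loss is differentiable everywhere. Near zero the gradient shrinks like L2's instead of jumping between ±1, and far from zero its magnitude is capped at 1 like L1's, which is what keeps outliers from dominating.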