
[PyTorch] nn.CrossEntropyLoss() and nn.NLLLoss()

2022-07-01 09:03:00 【Enzo 想砸电脑】

I. An intuitive understanding of cross-entropy as a loss function

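Cross-entropy loss is mostly used for multi-class classification. Below, we take its formula apart to understand why it makes sense as a loss function.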

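Suppose we are working on an n-class classification problem and the model's output for one sample is $[x_1, x_2, x_3, \dots, x_n]$.
We then need to define a loss function so that backpropagation can adjust the model's weights. Here we choose the cross-entropy loss.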

The formula behind nn.CrossEntropyLoss() is:
$-\log(\frac{e^{x_{[class]}}}{\sum_j e^{x_j}}) = -x_{[class]} + \log(\sum_j e^{x_j})$

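x is the prediction, a vector. The model must guarantee that it has exactly as many elements as there are classes.
class is the sample's true label. For example, if the sample actually belongs to class 2, then class=2.
$x_{[class]}$ is then $x_2$: the second element of the prediction vector, i.e. the predicted score for the sample's true class. (The formula counts from 1; in PyTorch code, class indices start at 0.)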
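To check the identity in the formula above, here is a minimal sketch in plain Python. The function name `cross_entropy_two_ways` and the scores are made up for illustration; it simply evaluates the left-hand and right-hand sides for a single sample.

```python
import math

def cross_entropy_two_ways(x, cls):
    """Evaluate both sides of the cross-entropy identity for one sample."""
    sum_exp = sum(math.exp(v) for v in x)
    lhs = -math.log(math.exp(x[cls]) / sum_exp)  # -log(softmax(x)[cls])
    rhs = -x[cls] + math.log(sum_exp)            # -x[cls] + log(sum_j exp(x_j))
    return lhs, rhs

scores = [0.5, 2.0, -1.0]  # made-up scores for a 3-class problem
lhs, rhs = cross_entropy_two_ways(scores, cls=1)  # 0-based index, as in PyTorch
print(lhs, rhs)
```

Both values should agree up to floating-point rounding.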

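With the groundwork done, let's take the formula apart and make sense of it.
1. First, the cross-entropy formula contains a basic building block: $softmax(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$
softmax normalizes the classification scores: $e^x$ first maps each score to a positive number in $(0, +\infty)$, and dividing by the sum then makes the values for all classes add up to 1. softmax does not change the size of the input; each output value is the probability that the sample is assigned to that class.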
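As a rough illustration of the softmax step, the sketch below implements it with the standard library only. The `softmax` helper and the input scores are assumptions made for the example, not part of the original post.

```python
import math

def softmax(x):
    exps = [math.exp(v) for v in x]   # every exp(v) is strictly positive
    total = sum(exps)
    return [e / total for e in exps]  # dividing by the sum makes them add up to 1

scores = [-3.0, 0.0, 4.0]  # made-up scores, including a negative one
probs = softmax(scores)
print(probs)
print(sum(probs))
```

The output has the same length as the input, and a larger score always gets a larger probability.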

2. We want the probability assigned to the true class to be close to 100%. So we pick out the value for the true class:
$\frac{e^{x_{[class]}}}{\sum_j e^{x_j}}$, and we want it to be 100%.

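3. What a loss function should do: the closer the prediction is to the ground truth, the closer the loss should be to 0.
If we take the log of $\frac{e^{x_{[class]}}}{\sum_j e^{x_j}}$ and then negate it, we get exactly that: the closer $\frac{e^{x_{[class]}}}{\sum_j e^{x_j}}$ is to 100%, the closer $loss=-\log(\frac{e^{x_{[class]}}}{\sum_j e^{x_j}})$ is to 0.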
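The behavior of $-\log(p)$ can be seen numerically. The snippet below is only a sketch with hand-picked probabilities; `neg_log` is a hypothetical helper name.

```python
import math

def neg_log(p):
    """Loss contributed by a sample whose true class received probability p."""
    return -math.log(p)

probs = [0.1, 0.5, 0.9, 0.99, 1.0]  # hand-picked probabilities of the true class
losses = [neg_log(p) for p in probs]
for p, loss in zip(probs, losses):
    print(p, loss)
```

The loss shrinks as the probability of the true class grows, and reaches 0 when that probability is exactly 1.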

II. Application

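Suppose we have 4 images, i.e. batch_size=4, and we need to classify them into 5 categories, for example: bird, dog, cat, car, boat.
After running them through the network we get the prediction, predict, with size [4, 5].
The ground-truth labels are label, with size [4].
Next, we use nn.CrossEntropyLoss() to compute the cross-entropy loss between the prediction predict and the ground truth label: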

import torch
import torch.nn as nn

# -----------------------------------------
# Define the data: batch_size=4; 5 classes in total
# label.size() : torch.Size([4])
# predict.size(): torch.Size([4, 5])
# -----------------------------------------
torch.manual_seed(100)
predict = torch.rand(4, 5)
label = torch.tensor([4, 3, 3, 2])
print(predict)
print(label)

# -----------------------------------------
# Call nn.CrossEntropyLoss() directly to compute the loss
# -----------------------------------------
criterion = nn.CrossEntropyLoss()
loss = criterion(predict, label)
print(loss)
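The printed loss is a single scalar because nn.CrossEntropyLoss() averages over the batch by default. As a small sketch reusing the same seed and labels as above, passing `reduction='none'` exposes the per-sample losses, and their mean should match the default result.

```python
import torch
import torch.nn as nn

torch.manual_seed(100)
predict = torch.rand(4, 5)
label = torch.tensor([4, 3, 3, 2])

# reduction='none' keeps one loss value per sample (shape [4])
per_sample = nn.CrossEntropyLoss(reduction='none')(predict, label)
# the default reduction='mean' averages those values over the batch
mean_loss = nn.CrossEntropyLoss()(predict, label)

print(per_sample)
print(torch.allclose(per_sample.mean(), mean_loss))
```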


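nn.CrossEntropyLoss() can be broken down into the following 3 steps, or put differently, it can be replaced by the following 3 operations with exactly the same result: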

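1. softmax: apply softmax to the classification scores of each image.
2. log: take the log of the result above.
(Steps 1 and 2 can be merged into nn.LogSoftmax().)
3. NLL: nn.NLLLoss()(a, b) picks, for each sample, the value in a at the index stored in b (b holds class indices), negates it, and then averages over the batch (the default reduction is the mean).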

import torch
import torch.nn as nn

torch.manual_seed(100)
predict = torch.rand(4, 5)
label = torch.tensor([4, 3, 3, 2])

softmax = nn.Softmax(dim=1)
nll = nn.NLLLoss()

temp1 = softmax(predict)
temp2 = torch.log(temp1)
output = nll(temp2, label)
print(output)   # tensor(1.5230)
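Steps 1 and 2 can also be written in their merged form. The sketch below swaps nn.Softmax plus torch.log for nn.LogSoftmax and compares the result against nn.CrossEntropyLoss(); the variable names `merged` and `reference` are just for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(100)
predict = torch.rand(4, 5)
label = torch.tensor([4, 3, 3, 2])

log_softmax = nn.LogSoftmax(dim=1)  # softmax and log in one module
nll = nn.NLLLoss()

merged = nll(log_softmax(predict), label)
reference = nn.CrossEntropyLoss()(predict, label)

print(merged)
print(torch.allclose(merged, reference))
```

Computing the log-softmax in one call is generally the numerically safer choice, since it avoids taking the log of a very small probability.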

Pure hand-written version

import torch

torch.manual_seed(100)
predict = torch.rand(4, 5)
label = torch.tensor([4, 3, 3, 2])

# softmax
temp1 = torch.exp(predict) / torch.sum(torch.exp(predict), dim=1, keepdim=True)

# log
temp2 = torch.log(temp1)

# nll
temp3 = torch.gather(temp2, dim=1, index=label.view(-1, 1))
temp4 = -temp3
output = torch.mean(temp4)

print(output)    # tensor(1.5230)
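The right-hand side of the formula from Part I, $-x_{[class]} + \log(\sum_j e^{x_j})$, gives another hand-written route. This is a sketch under the same seed and labels, using torch.logsumexp for the second term and F.cross_entropy as the reference.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(100)
predict = torch.rand(4, 5)
label = torch.tensor([4, 3, 3, 2])

# x[class] for each sample: row i, column label[i]
picked = predict[torch.arange(predict.size(0)), label]
# -x[class] + log(sum_j exp(x_j)), averaged over the batch
loss = (-picked + torch.logsumexp(predict, dim=1)).mean()

print(loss)
print(torch.allclose(loss, F.cross_entropy(predict, label)))
```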