Loss functions
2022-06-28 08:23:00 【Beiluo sect XY】
This blog post records the various loss functions I have encountered. If you want to learn more about different losses and their implementations, you can also look at the loss module of the mmdet project.
Cross entropy
Cross entropy is the most common classification loss and applies to multi-class tasks. The loss is the mean of -y*log(P); when the predicted probability P of the true class is 1, the loss is 0. In BERT's MLM pre-training task the ignore_index argument is used so that the loss is computed only at part of the positions (the roughly 15% masked tokens). In the actual computation the label is converted to one-hot; positions where y=0 contribute -y*log(P)=0 and do not take part in the loss. See the red example in the following link: Cross entropy loss and binary cross entropy loss - Plane train Barrett's blog - CSDN Blog
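A minimal sketch of the ignore_index behavior mentioned above; the tensor values are made up for illustration, and -100 is the default ignore_index of nn.CrossEntropyLoss:
import torch
import torch.nn as nn
logits = torch.randn(4, 10)                  # 4 token positions, vocabulary of 10
labels = torch.tensor([3, -100, 7, -100])    # -100 marks positions excluded from the loss
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)
print(loss_fn(logits, labels))               # averaged only over the two non-ignored positions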
The computation pipeline is: softmax -> log -> nllloss
softmax: outputs lie between 0 and 1
log: outputs lie between negative infinity and 0; softmax and log can be fused into a single F.log_softmax operation
nllloss: takes the log outputs at the target positions, averages them and negates the result, so the output lies between 0 and positive infinity
There are several equivalent ways to compute it; all of them give loss = tensor(1.3077) in the example below.
import torch
import torch.nn as nn
import torch.nn.functional as F
# input = torch.randn(3,3)
# print(input)
input = torch.tensor([[0.2, 0.3, 0.4],
                      [-0.02, -0.13, 0.2],
                      [0.5, -0.3, 0.73]])
target = torch.tensor([0, 2, 1])              # class-index labels
target1 = torch.FloatTensor([[1, 0, 0],       # the same labels in one-hot form (for BCE-style losses)
                             [0, 0, 1],
                             [0, 1, 0]])
##############################################################
"""
softmax + use only the probability at the correct class position
nn.Softmax -> torch.log -> nn.NLLLoss
= F.log_softmax -> nn.NLLLoss
= nn.CrossEntropyLoss
= F.cross_entropy
"""
# 1. nllloss: softmax -> log -> nllloss
softmax=nn.Softmax(dim=1)  # softmax along each row (dim=1)
nllloss=nn.NLLLoss()
softmax_log = torch.log(softmax(input))
softmax_log2=F.log_softmax(input, dim=1)
"""
print(softmax_log)
tensor([[-1.2019, -1.1019, -1.0019],
[-1.1448, -1.2548, -0.9248],
[-0.9962, -1.7962, -0.7662]])
"""
loss1 = nllloss(softmax_log,target)
print(loss1)
loss2 = nllloss(softmax_log2,target)
print(loss2)
# 2. Manual nllloss: take the values at the target positions, average, and negate
loss3 = -(softmax_log[0][target[0]] +
          softmax_log[1][target[1]] +
          softmax_log[2][target[2]]) / 3
print(loss3)
# 3. Call CrossEntropyLoss / cross_entropy directly
crossEntropyLoss=nn.CrossEntropyLoss()
loss4=crossEntropyLoss(input, target)
print(loss4)
loss5=F.cross_entropy(input, target)
print(loss5)
Binary cross entropy / binary classification cross entropy
Suitable for binary classification and multi-label classification tasks. Whereas cross entropy applies softmax and then uses only the value at the correct position to compute the loss, binary cross entropy treats the classes as independent: after sigmoid, the values at all positions are used, and the loss is the mean of -(y*log(P) + (1-y)*log(1-P)). When y is 0 or 1, one of the two added terms is always 0.
"""
sigmoid + use the probabilities of all classes
F.sigmoid + F.binary_cross_entropy
= F.binary_cross_entropy_with_logits
= nn.BCEWithLogitsLoss
= F.sigmoid + nn.BCELoss
"""
softmax=nn.Softmax(dim=1)
print('softmax=',softmax(input))
sigmoid = F.sigmoid(input)
print('sigmoid=',sigmoid)
# softmax= tensor([[0.3006, 0.3322, 0.3672],
# [0.3183, 0.2851, 0.3966],
# [0.3693, 0.1659, 0.4648]])
# sigmoid= tensor([[0.5498, 0.5744, 0.5987],
# [0.4950, 0.4675, 0.5498],
# [0.6225, 0.4256, 0.6748]])
loss6 = F.binary_cross_entropy(sigmoid, target1)
print(loss6)
loss7 = F.binary_cross_entropy_with_logits(input, target1)
print(loss7)
loss = nn.BCELoss(reduction='mean')
loss8 =loss(sigmoid, target1)
print(loss8)
loss =nn.BCEWithLogitsLoss()
loss9 =loss(input, target1)
print(loss9)
def binary_cross_entropyloss(prob, target, weight=None):
    weight = torch.ones((3, 3))  # positive-sample weights
    loss = -weight * (target * torch.log(prob) + (1 - target) * (torch.log(1 - prob)))
    # print('torch.numel(target)=', torch.numel(target))  # number of elements: 9
    loss = torch.sum(loss) / torch.numel(target)  # take the mean
    return loss
loss10=binary_cross_entropyloss(sigmoid, target1)
print(loss10)
Balanced Cross-Entropy: set the positive/negative weight with a hyperparameter
Often used in semantic segmentation to deal with the imbalance between positive and negative samples (binary cross entropy also provides a parameter for weighting positive samples, e.g. pos_weight in nn.BCEWithLogitsLoss).
Balanced and focal are best understood as ideas that can be applied to optimize many losses. Adding a coefficient beta (B) to -(y*log(P) + (1-y)*log(1-P)) turns it into -(B*y*log(P) + (1-B)*(1-y)*log(1-P)).
Implementing Balanced Cross-Entropy in PyTorch - fpan98's blog - CSDN Blog
In the example below, 1 - sum(y)/(w*h) is used as beta; in practice you can also pick the hyperparameter yourself.
input = torch.tensor([[0.2, 0.3, 0.4],
                      [-0.02, -0.13, 0.2],
                      [0.5, -0.3, 0.73]])
target2 = torch.FloatTensor([[1, 0, 0],
                             [1, 1, 1],
                             [0, 1, 1]])
def balanced_loss(input, target):
    input = input.view(input.shape[0], -1)
    target = target.view(target.shape[0], -1)
    loss = 0.0
    for i in range(input.shape[0]):
        # In this example beta is tensor(0.6667), tensor(0.) and tensor(0.3333) respectively
        beta = 1 - torch.sum(target[i]) / target.shape[1]  # fraction of non-positive positions in the sample
        print('beta=', beta)
        x = torch.max(torch.log(input[i]), torch.tensor([-100.0]))
        y = torch.max(torch.log(1 - input[i]), torch.tensor([-100.0]))
        l = -(beta * target[i] * x + (1 - beta) * (1 - target[i]) * y)
        print('l=', l)
        loss += torch.sum(l)
    return loss
loss11 = balanced_loss(sigmoid, target2)
Balanced Cross-Entropy: online hard example mining with a fixed positive/negative ratio
In this example the negative samples that participate in the loss are limited to 3 times the number of positive samples, and the hardest negatives (those with the largest loss values) are selected.
Another blog post also covers online hard example mining and its code: psenet based on FPN (with a PyTorch ResNet backbone) - Beiluo school XY's blog - CSDN Blog
class BalanceCrossEntropyLoss(nn.Module):
    '''
    Balanced cross entropy loss.
    Shape:
        - Input: :math:`(N, 1, H, W)`
        - GT: :math:`(N, 1, H, W)`, same shape as the input
        - Mask: :math:`(N, H, W)`, same spatial shape as the input
        - Output: scalar.
    Examples::
        >>> m = nn.Sigmoid()
        >>> loss = nn.BCELoss()
        >>> input = torch.randn(3, requires_grad=True)
        >>> target = torch.empty(3).random_(2)
        >>> output = loss(m(input), target)
        >>> output.backward()
    '''
    def __init__(self, negative_ratio=3.0, eps=1e-6):
        super(BalanceCrossEntropyLoss, self).__init__()
        self.negative_ratio = negative_ratio
        self.eps = eps

    def forward(self,
                pred: torch.Tensor,
                gt: torch.Tensor,
                mask: torch.Tensor,
                return_origin=False):
        '''
        Args:
            pred: shape :math:`(N, 1, H, W)`, the prediction of network
            gt: shape :math:`(N, 1, H, W)`, the target
            mask: shape :math:`(N, H, W)`, the mask indicates positive regions
        '''
        positive = (gt * mask).byte()
        negative = ((1 - gt) * mask).byte()
        positive_count = int(positive.float().sum())
        negative_count = min(int(negative.float().sum()),
                             int(positive_count * self.negative_ratio))
        loss = nn.functional.binary_cross_entropy(
            pred, gt, reduction='none')[:, 0, :, :]
        positive_loss = loss * positive.float()
        negative_loss = loss * negative.float()
        negative_loss, _ = torch.topk(negative_loss.view(-1), negative_count)
        balance_loss = (positive_loss.sum() + negative_loss.sum()) / \
            (positive_count + negative_count + self.eps)
        if return_origin:
            return balance_loss, loss
        return balance_loss
Focal Loss
Deals with hard samples: the weight of each term is adjusted according to the model's output probability P, adding one more hyperparameter gamma:
-(B*((1-P)**gamma)*y*log(P) + (1-B)*(P**gamma)*(1-y)*log(1-P))
class BCEFocalLosswithLogits(nn.Module):
    def __init__(self, gamma=0.2, alpha=0.6, reduction='mean'):
        super(BCEFocalLosswithLogits, self).__init__()
        self.gamma = gamma
        self.alpha = alpha
        self.reduction = reduction

    def forward(self, logits, target):
        # logits: [N, H, W], target: [N, H, W]
        logits = F.sigmoid(logits)
        alpha = self.alpha
        gamma = self.gamma
        loss = - alpha * (1 - logits) ** gamma * target * torch.log(logits) - \
               (1 - alpha) * logits ** gamma * (1 - target) * torch.log(1 - logits)
        if self.reduction == 'mean':
            loss = loss.mean()
        elif self.reduction == 'sum':
            loss = loss.sum()
        return loss
L = BCEFocalLosswithLogits()
loss12 = L(input, target2)  # the module applies sigmoid internally, so pass the raw logits rather than sigmoid(input)
Weighting based on the number of samples
Count the number of samples in each class to obtain the class frequency p, then compute each class's weight as 1/log(a+p), where a is a smoothing hyperparameter; a minimal sketch follows.
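A sketch of this weighting scheme under the assumption of class-index labels; the labels and the value a = 1.02 below are made up for illustration, and the resulting weights are passed to nn.CrossEntropyLoss:
labels = torch.tensor([0, 0, 0, 0, 1, 1, 2])      # example labels, class 0 dominates
counts = torch.bincount(labels).float()           # number of samples per class
p = counts / counts.sum()                         # class frequencies
a = 1.02                                          # smoothing hyperparameter (assumed value)
weights = 1.0 / torch.log(a + p)                  # rarer classes receive larger weights
criterion = nn.CrossEntropyLoss(weight=weights)   # per-class weights applied to the loss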
DICE LOSS
Suitable for positive/negative sample imbalance; commonly used in semantic segmentation, where the numbers of foreground and background pixels differ greatly. The calculation is
1 - 2*abs(y*p) / (abs(y) + abs(p)). In the actual computation y is always >= 0 and p after sigmoid is also >= 0, so the code does not need the abs function.
You can also refer to the psenet blog post above for code that combines online hard example mining (with a 1:3 positive/negative ratio) and dice loss.
def dice_loss(input, target):
    input = input.contiguous().view(input.size()[0], -1)
    target = target.contiguous().view(target.size()[0], -1).float()
    a = torch.sum(input * target, 1)             # intersection
    b = torch.sum(input * input, 1) + 0.001      # smoothing term avoids division by zero
    c = torch.sum(target * target, 1) + 0.001
    d = (2 * a) / (b + c)                        # dice coefficient per sample
    return 1 - d
L1 loss
A regression loss that ignores direction; it is computed as the mean of abs(y - p).
from torch.nn import L1Loss
loss = L1Loss()
loss13 = loss(input, target1)
print(loss13)
tmp = abs(input- target1)
loss14 = torch.sum(tmp)/torch.numel(tmp)
print(loss14)
Loss becomes NaN
The symptom: the loss decreases normally at the start of training, then NaN appears sporadically, and eventually every loss value is NaN. This is typically a gradient explosion; it can happen with all kinds of models and losses. For example, the situations these bloggers ran into:
Problems and solutions when training your own data with yolov3 on VOC - Gary zdh's blog - CSDN Blog
They modified the learning-rate schedule to lower the learning rate.
PyTorch MultiheadAttention producing NaN - qq_37650969's blog - CSDN Blog
They hit the case where an entire row of the attention is masked out.
Common causes include:
1) A division by 0 in the loss computation produces NaN
2) The loss computes log(0), e.g. in the binary cross entropy loss = -(y*ln(p)+(1-y)*ln(1-p))
3) All inputs to softmax are negative infinity, so the output is NaN
Solutions generally include:
1) Add a small number to the cross entropy input p so that it is bounded away from 0
crossentropy(out+1e-8, target)
2) Clamp the range on both sides (see the small demonstration after this list)
q_logits = torch.clamp(q_logits, min=1e-7, max=1 - 1e-7)  # truncate to a safe interval
q_log_prob = torch.nn.functional.log_softmax(q_logits, dim=1)
3) Lower the learning rate to reduce oscillation
4) Evaluate whether the loss function is appropriate for the task
5) Check whether there are annotation problems
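A minimal demonstration of cause 2) and fix 2), assuming a hand-written binary cross entropy on made-up probabilities:
p = torch.tensor([1.0, 0.5, 0.0])     # predictions of exactly 0 or 1 make log(0) = -inf appear
y = torch.tensor([1.0, 1.0, 0.0])
bad = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))
print(bad)                            # contains NaN because 0 * (-inf) is NaN
p_safe = torch.clamp(p, min=1e-7, max=1 - 1e-7)   # keep p away from 0 and 1
ok = -(y * torch.log(p_safe) + (1 - y) * torch.log(1 - p_safe))
print(ok)                             # all values are now finite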
Q: Why does the formula not take a mean, while the code usually does?
A: The mean is taken over the samples in a batch rather than within a single sample; reduction can be set to
'none', 'mean' or 'sum' (a small sketch follows).
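A quick look at the three reduction modes, reusing the input and target tensors defined in the cross entropy example above:
for reduction in ('none', 'mean', 'sum'):
    criterion = nn.CrossEntropyLoss(reduction=reduction)
    print(reduction, criterion(input, target))
# 'none' returns one loss value per sample; 'mean' and 'sum' reduce them to a scalar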
Q: What methods are there to deal with class imbalance?
A: Resampling; weighting based on the number of samples; balanced cross entropy with a positive/negative weight hyperparameter; balanced cross entropy that controls the positive/negative ratio (online hard example mining); dice loss.
Recommended reading:
PyTorch learning notes (5): cross entropy loss and Focal Loss - Jianshu