当前位置：网站首页>Entropy and full connection layer

Entropy and full connection layer

2022-07-03 23:50:00 【onlywishes】

entropy

Entropy is the expected value of information , It is a measure of the certainty of random variables . The greater the entropy , The more uncertain the value of the variable , Otherwise, the more certain it is .

import torch
a = torch.full([4],1/4.)
print(-(a*torch.log2(a)).sum())     #a,b,c Suppose four options choose the right probability 
#tensor(2.)                           # The more obvious entropy is, the less unexpected it is , The greater the entropy, the greater the uncertainty 
b = torch.tensor([0.1,0.1,0.1,0.7])
print(-(b*torch.log2(b)).sum())
#tensor(1.3568)
c = torch.tensor([0.001,0.001,0.001,0.999])
print(-(c*torch.log2(c)).sum())
#tensor(0.0313)

Cross entropy

The relative entropy is also known as KL The divergence ,KL distance , It's a measure of the distance between two random distributions . Write it down as DKL(p||q)DKL(p||q). It measures when the true distribution is p when , Hypothetical distribution q Invalidity of

For two distributions p（ Real sample distribution ）,q（ Model to be estimated ） All are （0-1） Yes

Yes H（p,q）= H(p) + Dkl(p|q ) , When p When known , You can put H(p) As a constant , At this time, cross entropy and KL Distance is equivalent in behavior , All reflect the distribution p,q The degree of similarity . Minimizing cross entropy is equal to minimizing KL distance . They will all be in p=q Get the minimum H(p)（p=q when KL A distance of 0）

Fully connected layer

nn.Linear(input,output,bias=TRUE)

It contains w The transpose , and b（ Personal understanding ）

Used to set up the full connection layer in the network , It should be noted that the input and output of the full connection layer are two-dimensional tensors

The general shape is [batch_size, size], Unlike the convolution layer, the input and output are required to be four-dimensional tensors

input： Enter the size of the two-dimensional tensor

output： Output the size of the two-dimensional tensor

If not used nn.Linear, Create two dimensions tensor The input and output of is the opposite , namely 【output,input】 To transpose

import torch
import torch.nn as nn
x = torch.randn(1,784)
print(x.shape)      #torch.Size([1, 784])
layer1 = nn.Linear(784,200)     # The first is in, The second is out
layer2 = nn.Linear(200,200)
layer3 = nn.Linear(200,10)

x = layer1(x)
print(x.shape)      #torch.Size([1, 200])   , Dimension reduction 
x = layer2(x)
print(x.shape)      #torch.Size([1, 200])   , feature extraction 
x = layer3(x)
print(x.shape)      #torch.Size([1, 10])    , Reduced to 10

nn.Relu vs F.relu

nn.Relu Class methods are used ,F.relu The function method is used , Pay attention to the case of letters

relu Medium inplace The default is FALSE

inplace = False when , The value of the input object will not be modified , Instead, it returns a newly created object

inplace = True when , Will modify the value of the input object , So the printed object has the same storage address , Save space and time for repeated application and memory release , Just pass the original address , More efficient

import torch
import torch.nn as nn
from torch.nn import functional as F
x = torch.randn(1,10)

x = F.relu(x,inplace=True)
print(x)    #tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.9487, 1.0720, 0.0000, 0.0000, 0.3956, 0.0000]])
layer = nn.ReLU()
x = layer(x)
print(x)    #tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.9487, 1.0720, 0.0000, 0.0000, 0.3956, 0.0000]])

About activation functions relu

sigmoid and tanh yes “ Saturation activation function ”, and ReLU And its variants are “ Unsaturated activation function ”. Use “ Unsaturated activation function ” There are two advantages ：
1.“ Unsaturated activation function ” Can solve the so-called “ The gradient disappears ” problem .
2. It can speed up convergence .

leaky relu

ReLU Is to set all negative values to zero ,Leaky ReLU Is to give all negative values a non-zero slope , Default 0.01

Usage and relu identical , Angle can be set

SELU function

It's a composite function

Softplus

take relu function x Axis approach 0 Partial smoothing

Use GPU Speed up

device = torch.device('cuda:0')    #  Use equipment , You can choose to move what you need to calculate to the equipment you need .

# Will be accelerated to GPU
o = nn.CrossEntropyLoss().to(device)    # Use .to() Method specification GPU Speed up , Will return inference, His type depends on the original type 
t = nn.CrossEntropyLoss().cuda()        # Use .cuda() Method

Calculation accuracy

logits = torch.rand(4,10)
pred = F.softmax(logits,dim=1)

pred_label = pred.argmax(dim=1)
print(pred_label)       #tensor([5, 5, 2, 6])

p = logits.argmax(dim=1)
print(p)        #tensor([5, 5, 2, 6])

label = torch.tensor([3,9,2,0])
correct = torch.eq(pred_label,label) #
print(correct)  #tensor([False, False,  True, False])

i = correct.sum().float().item()/4  # Calculation accuracy ,item Get the elements inside 
print(i)    #0.25

原网站

版权声明
本文为[onlywishes]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/02/202202142042211998.html