Entropy and the fully connected layer
2022-07-03 23:50:00 【onlywishes】
Entropy
Entropy is the expected value of information; it measures the uncertainty of a random variable. The greater the entropy, the more uncertain the variable's value; the smaller the entropy, the more certain it is. For a discrete distribution p, the entropy is H(p) = -Σᵢ pᵢ · log₂(pᵢ).
import torch

# a, b, c are probability distributions over four options (e.g. the probability of each choice being correct)
a = torch.full([4], 1/4.)
print(-(a * torch.log2(a)).sum())
# tensor(2.) -- uniform distribution: maximum uncertainty, hence maximum entropy
b = torch.tensor([0.1, 0.1, 0.1, 0.7])
print(-(b * torch.log2(b)).sum())
# tensor(1.3568)
c = torch.tensor([0.001, 0.001, 0.001, 0.999])
print(-(c * torch.log2(c)).sum())
# tensor(0.0313) -- nearly deterministic: little surprise, hence low entropy
Cross entropy
Relative entropy, also known as KL divergence or KL distance, measures the distance between two probability distributions and is written D_KL(p‖q). It quantifies the inefficiency of assuming the distribution is q when the true distribution is p.

For two distributions p (the true sample distribution) and q (the model to be estimated), both taking values in (0, 1):

H(p, q) = H(p) + D_KL(p‖q)

When p is known, H(p) can be treated as a constant, so the cross entropy and the KL divergence behave equivalently: both reflect how similar p and q are, and minimizing the cross entropy is equivalent to minimizing the KL divergence. Both reach their minimum when p = q, where H(p, q) = H(p) (the KL divergence is 0 when p = q).
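The identity can be checked numerically with torch; a minimal sketch (not from the original post), reusing the distribution b from the entropy example above:

import torch
p = torch.tensor([0.1, 0.1, 0.1, 0.7])   # true distribution
q = torch.full([4], 1/4.)                # hypothesized (uniform) distribution
h_p = -(p * torch.log2(p)).sum()         # entropy H(p) = 1.3568
h_pq = -(p * torch.log2(q)).sum()        # cross entropy H(p, q) = 2.
kl = (p * torch.log2(p / q)).sum()       # KL divergence D_KL(p||q) = 0.6432
print(torch.allclose(h_pq, h_p + kl))    # True: H(p,q) = H(p) + D_KL(p||q)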
Fully connected layer
nn.Linear(in_features, out_features, bias=True)

Internally it holds a weight matrix W (applied as its transpose) and a bias b, computing y = x·Wᵀ + b (the author's personal understanding).

It sets up a fully connected layer in a network. Note that the input and output of a fully connected layer are two-dimensional tensors, generally of shape [batch_size, size], unlike a convolutional layer, whose input and output are required to be four-dimensional tensors.

in_features: size of the input two-dimensional tensor (its second dimension)
out_features: size of the output two-dimensional tensor (its second dimension)

nn.Linear stores its weight with the input and output dimensions reversed, i.e. with shape [out_features, in_features]; if you create such a weight tensor by hand instead of using nn.Linear, it must be transposed before multiplying.
import torch
import torch.nn as nn

x = torch.randn(1, 784)
print(x.shape)                # torch.Size([1, 784])
layer1 = nn.Linear(784, 200)  # the first argument is in_features, the second is out_features
layer2 = nn.Linear(200, 200)
layer3 = nn.Linear(200, 10)
x = layer1(x)
print(x.shape)                # torch.Size([1, 200]) -- dimensionality reduction
x = layer2(x)
print(x.shape)                # torch.Size([1, 200]) -- feature extraction
x = layer3(x)
print(x.shape)                # torch.Size([1, 10])  -- reduced to 10 outputs
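Continuing with the imports above, the stored weight shape and the transpose just described can be checked directly (a small sketch, not in the original):

layer = nn.Linear(784, 200)
print(layer.weight.shape)                      # torch.Size([200, 784]) -- stored as [out_features, in_features]
print(layer.bias.shape)                        # torch.Size([200])
x = torch.randn(1, 784)
y_manual = x @ layer.weight.t() + layer.bias   # the forward pass applies the transpose: y = x @ W.T + b
print(torch.allclose(layer(x), y_manual))      # True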
nn.ReLU vs F.relu

nn.ReLU is used as a class (a module), while F.relu is used as a function; note the capitalization.

relu's inplace argument defaults to False.

With inplace=False, the input tensor's values are not modified; a newly created tensor is returned instead.

With inplace=True, the input tensor's values are modified in place, so the printed object has the same storage address. This saves the time and memory cost of repeated allocation and release: only the original address is passed along, which is more efficient.
import torch
import torch.nn as nn
from torch.nn import functional as F

x = torch.randn(1, 10)
x = F.relu(x, inplace=True)  # functional form
print(x)  # tensor([[0.0000, 0.0000, 0.0000, 0.0000, 0.9487, 1.0720, 0.0000, 0.0000, 0.3956, 0.0000]])
layer = nn.ReLU()            # module (class) form
x = layer(x)
print(x)  # same output -- applying relu a second time changes nothing
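The in-place behavior can be made visible by comparing storage addresses; a sketch, not from the original post:

import torch
from torch.nn import functional as F

x = torch.randn(1, 10)
out = F.relu(x, inplace=True)
print(out.data_ptr() == x.data_ptr())    # True -- the same storage is reused

y = torch.randn(1, 10)
out2 = F.relu(y)                         # inplace=False (the default)
print(out2.data_ptr() == y.data_ptr())   # False -- a new tensor was allocated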
About the activation function relu

sigmoid and tanh are "saturating activation functions", while ReLU and its variants are "non-saturating activation functions". Non-saturating activation functions have two advantages (see the sketch below):
1. They mitigate the so-called "vanishing gradient" problem.
2. They speed up convergence.
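A small illustration of the saturation point (not from the original post): for large inputs the sigmoid gradient is nearly zero, while the ReLU gradient stays at 1 for positive inputs.

import torch

x = torch.tensor([10.0], requires_grad=True)
torch.sigmoid(x).backward()
print(x.grad)   # tensor([4.5398e-05]) -- saturated: the gradient nearly vanishes

y = torch.tensor([10.0], requires_grad=True)
torch.relu(y).backward()
print(y.grad)   # tensor([1.]) -- non-saturating for positive inputs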
Leaky ReLU

ReLU sets all negative values to zero; Leaky ReLU instead gives negative values a small non-zero slope, 0.01 by default. Usage is the same as relu, and the slope (negative_slope) can be set.
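A minimal usage sketch (not from the original), setting the slope explicitly:

import torch
import torch.nn as nn
from torch.nn import functional as F

x = torch.tensor([[-1.0, 2.0]])
print(F.leaky_relu(x))                    # tensor([[-0.0100, 2.0000]]) -- default slope 0.01
layer = nn.LeakyReLU(negative_slope=0.1)  # module form with a custom slope
print(layer(x))                           # tensor([[-0.1000, 2.0000]])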
SELU function

SELU is a composite function: a scaled ELU, where SELU(x) = λ·x for x > 0 and λ·α·(eˣ − 1) for x ≤ 0, with fixed constants λ ≈ 1.0507 and α ≈ 1.6733.
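A quick check with torch (not from the original):

import torch
from torch.nn import functional as F

x = torch.tensor([-1.0, 0.0, 1.0])
print(F.selu(x))  # tensor([-1.1113, 0.0000, 1.0507])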
Softplus

Softplus smooths the relu function near x = 0: Softplus(x) = ln(1 + eˣ).
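A quick comparison with relu (a sketch, not from the original):

import torch
from torch.nn import functional as F

x = torch.tensor([-2.0, 0.0, 2.0])
print(F.softplus(x))  # tensor([0.1269, 0.6931, 2.1269]) -- smooth everywhere
print(F.relu(x))      # tensor([0., 0., 2.]) -- kinked at 0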
Using the GPU for acceleration

import torch
import torch.nn as nn

device = torch.device('cuda:0')       # pick the device; move whatever needs computing onto it
# move to the GPU for acceleration
o = nn.CrossEntropyLoss().to(device)  # the .to() method specifies the device; the returned object's type matches the original (a module is moved in place, a tensor gets a new copy on the device)
t = nn.CrossEntropyLoss().cuda()      # the .cuda() method does the same
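A common pattern (not in the original post) that also guards against machines without a GPU:

import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(784, 10).to(device)         # move the model's parameters to the device
criterion = nn.CrossEntropyLoss().to(device)
x = torch.randn(8, 784).to(device)            # inputs must live on the same device
target = torch.randint(0, 10, (8,)).to(device)
loss = criterion(model(x), target)
print(loss.item())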
Calculating accuracy

import torch
from torch.nn import functional as F

logits = torch.rand(4, 10)
pred = F.softmax(logits, dim=1)
pred_label = pred.argmax(dim=1)
print(pred_label)  # tensor([5, 5, 2, 6])
p = logits.argmax(dim=1)  # softmax is monotonic, so the argmax of the raw logits is the same
print(p)  # tensor([5, 5, 2, 6])
label = torch.tensor([3, 9, 2, 0])
correct = torch.eq(pred_label, label)
print(correct)  # tensor([False, False, True, False])
i = correct.sum().float().item() / 4  # accuracy; .item() extracts the Python scalar
print(i)  # 0.25