Review of neural network related knowledge (pytorch)
2022-07-29 06:11:00 【Quinn-ntmy】
I. Common concepts
1. Batch
(1) As a description of the training method, "batch" means that the weights/parameters are updated once, after all of the data have been processed;
(2) As a description of the data used in training, a "batch" is the amount of data fed into the model for one computation.
Steps of batch-based model training:
a) Initialize the parameters
b) Repeat: process all of the data, then update the parameters
Its counterpart is the incremental (online) algorithm, whose steps are:
a) Initialize the parameters
b) Repeat: process one data point (or a small group of data points), then update the parameters
(In the BP algorithm, "process" concretely means computing the gradient of the loss function. The batch algorithm computes the average (or total) gradient of the loss over all the data; the incremental algorithm computes the gradient of the loss for only the current observation (or a few observations). "Update" means subtracting the product of the gradient and the learning rate from the current parameter values. A minimal sketch contrasting the two follows.)
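A minimal sketch (not from the original post) contrasting the two update styles for a single linear parameter w under a squared-error loss; the data and learning rate are made up for illustration:

import torch

xs = torch.tensor([1.0, 2.0, 3.0])
ys = torch.tensor([2.0, 4.0, 6.0])
lr = 0.1

# Batch update: average the gradient over ALL samples, then update once
w = torch.tensor(0.0)
grad = torch.mean(2 * (w * xs - ys) * xs)   # gradient of the mean squared error w.r.t. w
w = w - lr * grad

# Incremental (online) update: update after EACH sample's gradient
w_inc = torch.tensor(0.0)
for x_i, y_i in zip(xs, ys):
    grad = 2 * (w_inc * x_i - y_i) * x_i
    w_inc = w_inc - lr * grad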
2. Online Learning and Offline Learning
(1) Offline Learning: all of the data are available and can be processed repeatedly, as in the batch algorithm above.
Advantages: for any fixed set of parameters the objective function can be computed directly, so it is easy to check whether training is moving in the desired direction; the computation can reach any reasonable precision; various algorithms can be used to avoid local optima; the train/validation/test three-way split can be used to verify how well the model generalizes; confidence intervals for the predicted values can be computed.
(2) Online Learning: each observation is discarded after it is processed, and the parameters are updated at the same time (a kind of incremental algorithm).
3. Bias / threshold
A neuron in a hidden or output layer that uses an activation function usually adds a bias term when computing its net input. For a linear output neuron, the bias term is the intercept term of a regression.
Each neuron in the hidden and output layers has its own bias term b. However, if the input data have already been rescaled into a bounded range such as [0, 1], then once the neurons of the first hidden layer have bias terms, neurons in any later layer that are connected to these biased neurons do not need additional bias terms.
4. Standardizing data
Common ways of "standardizing" data:
(1) Rescaling: adding/subtracting a constant to a vector, and multiplying/dividing it by a constant.
(2) Normalization: dividing a vector by its norm (e.g. its Euclidean length). In deep learning, the range is often used as the norm: subtract the minimum from the vector and divide by its range, so the values fall between 0 and 1.
Commonly used normalization layers (BN, LN, IN, GN) are described in detail at https://cloud.tencent.com/developer/article/1526775
(3) Standardization: removing the location and scale of a vector. For example, for a vector that follows a normal distribution, subtract its mean and divide by its standard deviation, which yields a vector following the standard normal distribution. A small sketch of the three operations follows.
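A minimal sketch (assumed, not from the original) of the three operations above applied to a 1-D tensor:

import torch

v = torch.tensor([2.0, 4.0, 6.0, 8.0])

# (1) Rescaling: add/subtract and multiply/divide by constants
rescaled = (v - 1.0) * 0.5

# (2) Min-max normalization: subtract the minimum, divide by the range -> values in [0, 1]
normalized = (v - v.min()) / (v.max() - v.min())

# (3) Standardization: subtract the mean, divide by the standard deviation
standardized = (v - v.mean()) / v.std()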
II. torch.nn
The classes in the torch.nn package cover what is commonly needed when building a deep neural network model and optimizing its parameters: construction of convolutional layers, pooling layers, fully connected layers, etc.; regularization and Dropout methods to prevent overfitting; and the linear and nonlinear activation functions, among others.
1. Imports
import torch
# torch.autograd provides classes and functions for automatic differentiation of any scalar function.
# To use automatic differentiation, only minor changes to existing code are needed:
# wrap all the tensors in Variable objects.
from torch.autograd import Variable

# Number of input samples in one batch
batch_n = 100
# Number of features output by the hidden layer
hidden_layer = 100
# Number of features of the input data
input_data = 1000
# Number of output classes
output_data = 10

x = Variable(torch.randn(batch_n, input_data), requires_grad=False)
y = Variable(torch.randn(batch_n, output_data), requires_grad=False)

'''
# The weight-parameter code is deleted here, because the classes in the torch.nn
# package used below automatically create and initialize weight parameters of the
# corresponding dimensions.
w1 = Variable(torch.randn(input_data, hidden_layer), requires_grad=True)
w2 = Variable(torch.randn(hidden_layer, output_data), requires_grad=True)
'''
2. Building the model
# ******** Model construction ********
models = torch.nn.Sequential(
    # Linear transformation from the input layer to the hidden layer
    torch.nn.Linear(input_data, hidden_layer),
    # Activation function
    torch.nn.ReLU(),
    # Linear transformation from the hidden layer to the output layer
    torch.nn.Linear(hidden_layer, output_data)
)
print(models)
- torch.nn.Sequential is a sequential container: the neural network model is built by nesting inside it the classes that implement the specific parts of the network, and data are passed through the modules automatically in the order in which they were defined.
The parts inside the container can be viewed as separate modules, which can be combined freely. There are generally two ways to add modules:
(1) Direct nesting (the code above).
Printed model structure:
Sequential(
(0): Linear(in_features=1000, out_features=100, bias=True)
(1): ReLU()
(2): Linear(in_features=100, out_features=10, bias=True)
)
Process finished with exit code 0
By default, each module is named with its zero-based index in the sequence.
(2) Passing in an OrderedDict (an ordered dictionary).
import torch
from torch.autograd import Variable
from collections import OrderedDict
# Number of input samples in one batch
batch_n = 100
# Number of features output by the hidden layer
hidden_layer = 100
# Number of features of the input data
input_data = 1000
# Number of output classes
output_data = 10

models = torch.nn.Sequential(OrderedDict([
    ("Line1", torch.nn.Linear(input_data, hidden_layer)),
    ("ReLU1", torch.nn.ReLU()),
    ("Line2", torch.nn.Linear(hidden_layer, output_data))
]))
print(models)
Printed model structure:
Sequential(
(Line1): Linear(in_features=1000, out_features=100, bias=True)
(ReLU1): ReLU()
(Line2): Linear(in_features=100, out_features=10, bias=True)
)
Process finished with exit code 0
Here each module uses the custom name we gave it, which is clearer.
The torch.nn.Linear class implements the linear transformation between layers. It takes three arguments: the number of input features, the number of output features, and whether to use a bias.
In practice, only the numbers of input and output features need to be passed to torch.nn.Linear; the weight parameters and biases of the corresponding dimensions are generated automatically (see the small check below). The torch.nn.ReLU class belongs to the nonlinear activations and by default needs no arguments when defined. Many more are available (PReLU, LeakyReLU, Tanh, Sigmoid, Softmax, ...).
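A small illustrative check (not in the original) that the Linear layers above were given weights and biases of the matching shapes; Sequential supports integer indexing even when the modules are named:

first_linear = models[0]           # Linear(in_features=1000, out_features=100)
print(first_linear.weight.shape)   # torch.Size([100, 1000])
print(first_linear.bias.shape)     # torch.Size([100])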
3. Optimizing the model
# ******** Model optimization ********
epoch_n = 10000
learning_rate = 1e-4
loss_fn = torch.nn.MSELoss()
- torch.nn.MSELoss (mean squared error): no parameters are needed when the class object is defined, but when the instance is called it takes two inputs of the same dimensions (x, y):
loss_fn = torch.nn.MSELoss()
x = Variable(torch.randn(100,100))
y = Variable(torch.randn(100,100))
loss = loss_fn(x, y)
print(loss)
Result:
tensor(1.9493)
Process finished with exit code 0
- torch.nn.L1Loss (mean absolute error): used in the same way as above.
- torch.nn.CrossEntropyLoss (cross entropy): when the instance is called, it takes two inputs that satisfy the requirements of the cross-entropy computation.
Categorical cross-entropy loss is usually used for multi-class classification: the predicted probability of the correct class should be close to 1, and the probabilities of the other classes close to 0.
loss_fn = torch.nn.CrossEntropyLoss()
x = Variable(torch.randn(3, 5))
y = Variable(torch.LongTensor(3).random_(5))  # 3 random integers in the range 0~4
loss = loss_fn(x, y)
print(loss)
Result:
tensor(1.5172)
Process finished with exit code 0
For the details of cross entropy, see https://finisky.github.io/2020/07/09/crossentropyloss/ . A quick numerical check of what CrossEntropyLoss computes is sketched below.
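A small illustrative check (not from the original) that torch.nn.CrossEntropyLoss is equivalent to log_softmax followed by the negative log-likelihood loss:

import torch
import torch.nn.functional as F

x = torch.randn(3, 5)
y = torch.tensor([0, 2, 4])
print(torch.nn.CrossEntropyLoss()(x, y))
print(F.nll_loss(F.log_softmax(x, dim=1), y))   # same value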
Train the model we built and optimize its parameters:
# ******** Model training ********
for epoch in range(epoch_n):
    y_pred = models(x)
    loss = loss_fn(y_pred, y)
    if epoch % 1000 == 0:
        print("Epoch:{}, loss:{:.4f}".format(epoch, loss.item()))
    models.zero_grad()
    loss.backward()
    # Manually update every parameter in the model (iterate over models.parameters())
    for param in models.parameters():
        param.data -= param.grad.data * learning_rate
III. torch.optim
In the code above, the optimization and updating of the network's weight parameters were not yet automated. The torch.optim package provides many classes that implement automatic parameter optimization (SGD, AdaGrad, RMSProp, Adam, etc.).
import torch
# torch.autograd provides classes and functions for automatic differentiation of any scalar function.
# To use automatic differentiation, only minor changes to existing code are needed:
# wrap all the tensors in Variable objects.
from torch.autograd import Variable

# Number of input samples in one batch
batch_n = 100
# Number of features output by the hidden layer
hidden_layer = 100
# Number of features of the input data
input_data = 1000
# Number of output classes
output_data = 10

x = Variable(torch.randn(batch_n, input_data), requires_grad=False)
y = Variable(torch.randn(batch_n, output_data), requires_grad=False)

'''
# The weight-parameter code is deleted here, because the classes in the torch.nn
# package used below automatically create and initialize weight parameters of the
# corresponding dimensions.
w1 = Variable(torch.randn(input_data, hidden_layer), requires_grad=True)
w2 = Variable(torch.randn(hidden_layer, output_data), requires_grad=True)
'''
# ******** Model construction ********
models = torch.nn.Sequential(
    # Linear transformation from the input layer to the hidden layer
    torch.nn.Linear(input_data, hidden_layer),
    # Activation function
    torch.nn.ReLU(),
    # Linear transformation from the hidden layer to the output layer
    torch.nn.Linear(hidden_layer, output_data)
)
# print(models)
# ******** Model optimization ********
epoch_n = 20
learning_rate = 1e-4
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(models.parameters(), lr=learning_rate)

# ******** Model training ********
for epoch in range(epoch_n):
    y_pred = models(x)
    loss = loss_fn(y_pred, y)
    print("Epoch:{}, loss:{:.4f}".format(epoch, loss.item()))
    # Because an optimizer is used here, optimizer.zero_grad() is called directly
    # to zero the gradients of the model parameters
    optimizer.zero_grad()
    loss.backward()
    # Use the computed gradients to update the parameters of each node
    optimizer.step()
Printed results:
Epoch:0, loss:1.1384
Epoch:1, loss:1.1160
Epoch:2, loss:1.0941
Epoch:3, loss:1.0727
Epoch:4, loss:1.0517
Epoch:5, loss:1.0311
Epoch:6, loss:1.0109
Epoch:7, loss:0.9913
Epoch:8, loss:0.9720
Epoch:9, loss:0.9532
Epoch:10, loss:0.9349
Epoch:11, loss:0.9171
Epoch:12, loss:0.8996
Epoch:13, loss:0.8827
Epoch:14, loss:0.8663
Epoch:15, loss:0.8503
Epoch:16, loss:0.8347
Epoch:17, loss:0.8194
Epoch:18, loss:0.8045
Epoch:19, loss:0.7900
Process finished with exit code 0
Adam is used here as the optimization function. The class takes the parameters to be optimized and the learning rate as inputs; if no learning rate is given, it defaults to 0.001. Adam can adaptively adjust the learning rate used for the gradient updates. Other optimizers can be swapped in with a one-line change, as sketched below.
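An illustrative sketch (not from the original) of other optimizers from torch.optim; any one of these lines can replace the Adam constructor above, and the rest of the training loop stays the same:

optimizer = torch.optim.SGD(models.parameters(), lr=1e-4)       # classical stochastic gradient descent
optimizer = torch.optim.Adagrad(models.parameters(), lr=1e-4)   # adaptive per-parameter learning rates
optimizer = torch.optim.RMSprop(models.parameters(), lr=1e-4)   # RMSProp
optimizer = torch.optim.Adam(models.parameters())               # Adam, lr defaults to 0.001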
IV. torch and torchvision
The main purpose of the torchvision package is to implement data processing, loading, and previewing:
import torch
import torchvision
from torchvision import datasets
from torchvision import transforms
from torch.autograd import Variable
1. torchvision.transforms
torchvision.transforms provides a rich set of classes for transforming the data (e.g. images in CV datasets, text in NLP datasets). A large part of them can be used for data augmentation.
transform = transforms.Compose([transforms.ToTensor(),
transforms.Normalize(mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5])])
transforms.Compose can be viewed as a container that combines several data transformations at once. The argument passed in is a list whose elements are the transformation operations applied to the loaded data (the code above only uses the type conversion ToTensor and the normalization transform Normalize).
Mean / standard-deviation normalization:
Normalize subtracts the given mean from each channel and divides by the given standard deviation; if the supplied values are the true mean and standard deviation of the data, the transformed data have mean 0 and standard deviation 1. A small sketch of the computation follows.
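A small sketch (illustrative, not from the original) of what Normalize computes: per channel, output = (input - mean) / std. With mean=0.5 and std=0.5, the values produced by ToTensor (in [0, 1]) are mapped into [-1, 1]:

import torch
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
img = torch.rand(3, 8, 8)                  # fake 3-channel image with values in [0, 1]
out = normalize(img)
print(out.min().item(), out.max().item())  # roughly within [-1, 1]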
Other data transformation operations:
- torchvision.transforms.Resize: scales the loaded image to the required size; the argument can be a single integer or a sequence (h, w), i.e. (height, width).
- torchvision.transforms.Scale: same as above (a legacy alias of Resize).
- torchvision.transforms.CenterCrop: crops the loaded image to the required size around its center; the arguments are the same as for Resize.
- torchvision.transforms.RandomCrop: as the name says, crops at a random location.
- torchvision.transforms.RandomHorizontalFlip: flips the image horizontally with a given probability; the probability can be passed as an argument and defaults to 0.5.
- torchvision.transforms.RandomVerticalFlip: vertical flip; arguments as above.
- torchvision.transforms.ToTensor: converts the image data into the Tensor data type.
- torchvision.transforms.ToPILImage: converts a Tensor back into PIL image data so that the image content can be displayed.
Several of these can be chained into an augmentation pipeline, as sketched below.
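An illustrative augmentation pipeline (assumed, not taken from the original) combining several of the transforms listed above; the sizes are arbitrary examples:

augment = transforms.Compose([
    transforms.Resize(40),                    # scale the shorter side to 40 pixels
    transforms.RandomCrop(32),                # crop a random 32x32 patch
    transforms.RandomHorizontalFlip(p=0.5),   # horizontal flip with probability 0.5
    transforms.ToTensor(),                    # convert the PIL image to a Tensor
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])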
2. torch.nn
The typical workflow of a neural network is:
(1) Define a network structure with learnable parameters (design the stack of layers);
(2) Feed in the dataset;
(3) Process the input through the defined network layers; this is the forward pass of the network;
(4) Compute the loss, produced by the loss layer;
(5) Back-propagate the gradients;
(6) Update the parameter values according to the gradients (e.g. SGD); the simplest implementation is:
w = w - lr * gradient
CNN model construction:
# ******* Model construction and parameter optimization *******
class Model(torch.nn.Module):
    # Two convolution layers, one max-pooling layer, and two fully connected layers
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(stride=2, kernel_size=2)
        )
        self.dense = torch.nn.Sequential(
            torch.nn.Linear(14*14*128, 1024),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1024, 10)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = x.view(-1, 14*14*128)
        x = self.dense(x)
        return x
- torch.nn.Conv2d: convolution layer. Main arguments: number of input channels, number of output channels, kernel size, stride, and padding.
- torch.nn.MaxPool2d: max-pooling layer. Main arguments: pooling window size, window stride, and padding.
- torch.nn.Dropout: prevents the convolutional neural network from overfitting during training (how it works: some parameters of the model are zeroed with a given random probability, which reduces the connections between two adjacent layers).
- The forward function (forward propagation)
Call flow:
(1) The module's __call__ method is invoked;
(2) The module's __call__ calls the module's forward method;
(3) If forward encounters a Module subclass, go back to step (1); if it encounters a Function subclass, continue;
(4) The Function's __call__ method is invoked;
(5) The Function's __call__ calls the Function's forward method;
(6) The Function's forward returns its value;
(7) The module's forward returns its value;
(8) The module's __call__ runs the forward_hook operations and then returns the value.
A quick check of the forward pass for the model above follows.
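A quick illustrative check (not in the original) of the forward pass of the Model class above, assuming single-channel 28x28 inputs (e.g. MNIST): after two 3x3 convolutions with padding=1 and one 2x2 max pooling, the feature map is 128 x 14 x 14, which matches the first Linear layer:

model = Model()
dummy = torch.randn(4, 1, 28, 28)   # a batch of 4 fake images
out = model(dummy)                  # calling the module invokes forward via __call__
print(out.shape)                    # torch.Size([4, 10])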
Other important knowledge
Prepare the data -> choose a model framework; there are also two important parts of supervised training: the loss function and the optimization algorithm.
- Choosing the loss function:
When the model outputs probabilities, the loss function should be based on cross entropy.
- Choosing the optimization algorithm:
The optimization algorithm uses the error signal to update the model's weights. In the simplest case, a single hyperparameter controls the optimizer: the learning rate (lr). During training, several different learning rates should be tried and compared.
Method: classical stochastic gradient descent (SGD); but for complex optimization problems it can have convergence issues. The alternatives are adaptive optimization algorithms (Adagrad or Adam); for Adam, see section III above.
The supervised training loop:
It is a nested loop: an inner loop over the dataset or over a set of batches, and an outer loop that repeats the inner loop for a fixed number of epochs or until some other termination condition.
After many batches have been processed, the training loop completes one epoch (one full pass over the training data). The model is trained for a certain number of epochs; the number of epochs is not arbitrary, and there are ways to decide when to stop.
The core idea: 1. define the model; 2. compute the output; 3. compute the gradients with the loss function; 4. apply the optimization algorithm to update the model parameters according to the gradients.
Splitting the dataset:
Split into training, validation, and test sets, or use k-fold cross validation (suitable for smaller datasets).
Make sure the three sets keep the same distribution. A common precaution: group the dataset by class label, then randomly split each class-label group into training, validation, and test sets (commonly: 70% training, 15% validation, 15% test). A sketch of such a stratified split follows.
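A sketch (using scikit-learn, which the original does not mention) of a stratified 70/15/15 split that keeps the class distribution the same in every subset; X and y here are synthetic placeholders:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 20)             # fake features
y = np.random.randint(0, 3, size=1000)    # fake class labels

# First split off 30%, then split that half-and-half into validation and test,
# stratifying by label each time so every subset keeps the class proportions
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0)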
When to stop training:
The "early stopping" heuristic: track the performance on the validation set and notice when it stops improving. As long as performance keeps improving, training continues; once it no longer improves, training ends.
The number of epochs to wait after the last improvement before ending training is called the patience (tolerance). A minimal sketch follows.
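A minimal early-stopping sketch (illustrative, not from the original). The per-epoch validation losses are faked with a list here; in practice val_loss would come from evaluating the model on the validation set:

val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74]   # fake validation losses per epoch

best_val_loss = float("inf")
patience = 3            # the "tolerance": epochs to wait without improvement
epochs_no_improve = 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1
        if epochs_no_improve >= patience:
            print("Early stop at epoch", epoch)
            break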
Appropriate hyperparameters:
The loss function, the optimization algorithm, the learning rate, the sizes of the layers, the early-stopping patience, and the various regularization/normalization decisions.