Review of neural network related knowledge (pytorch)
2022-07-29 06:11:00 【Quinn-ntmy】
I. Common concepts
1. Batch
(1) As a description of the training method, "batch" means that the weights/parameters are updated once, after all of the data have been processed;
(2) As a description of the data used in training, a "batch" is the amount of data fed into the model for one computation.
Steps of batch-based model training:
a) Initialize the parameters
b) Repeat: process all of the data, then update the parameters
Its counterpart is the incremental (online) algorithm, whose steps are:
a) Initialize the parameters
b) Repeat: process one data point (or a small group of data points), then update the parameters
(In the BP algorithm, "process" concretely means computing the gradient of the loss function. The batch algorithm computes the average (or total) gradient of the loss over all the data; the incremental algorithm computes the gradient of the loss for only the current observation (or a few observations). "Update" means subtracting the product of the gradient and the learning rate from the current parameter values. A minimal sketch contrasting the two follows.)
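A minimal sketch (not from the original post) contrasting the two update styles for a single linear parameter w under a squared-error loss; the data and learning rate are made up for illustration:

import torch

xs = torch.tensor([1.0, 2.0, 3.0])
ys = torch.tensor([2.0, 4.0, 6.0])
lr = 0.1

# Batch update: average the gradient over ALL samples, then update once
w = torch.tensor(0.0)
grad = torch.mean(2 * (w * xs - ys) * xs)   # gradient of the mean squared error w.r.t. w
w = w - lr * grad

# Incremental (online) update: update after EACH sample's gradient
w_inc = torch.tensor(0.0)
for x_i, y_i in zip(xs, ys):
    grad = 2 * (w_inc * x_i - y_i) * x_i
    w_inc = w_inc - lr * grad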
2. Online Learning and Offline Learning
(1) Offline Learning: all of the data are available and can be processed repeatedly, as in the batch algorithm above.
Advantages: for any fixed set of parameters the objective function can be computed directly, so it is easy to check whether training is moving in the desired direction; the computation can reach any reasonable precision; various algorithms can be used to avoid local optima; the train/validation/test three-way split can be used to verify how well the model generalizes; confidence intervals for the predicted values can be computed.
(2) Online Learning: each observation is discarded after it is processed, and the parameters are updated at the same time (a kind of incremental algorithm).
3. Bias / threshold
A neuron in a hidden or output layer that uses an activation function usually adds a bias term when computing its net input. For a linear output neuron, the bias term is the intercept term of a regression.
Each neuron in the hidden and output layers has its own bias term b. However, if the input data have already been rescaled into a bounded range such as [0, 1], then once the neurons of the first hidden layer have bias terms, neurons in any later layer that are connected to these biased neurons do not need additional bias terms.
4. Standardizing data
Common ways of "standardizing" data:
(1) Rescaling: adding/subtracting a constant to a vector, and multiplying/dividing it by a constant.
(2) Normalization: dividing a vector by its norm (e.g. its Euclidean length). In deep learning, the range is often used as the norm: subtract the minimum from the vector and divide by its range, so the values fall between 0 and 1.
Commonly used normalization layers (BN, LN, IN, GN) are described in detail at https://cloud.tencent.com/developer/article/1526775
(3) Standardization: removing the location and scale of a vector. For example, for a vector that follows a normal distribution, subtract its mean and divide by its standard deviation, which yields a vector following the standard normal distribution. A small sketch of the three operations follows.
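A minimal sketch (assumed, not from the original) of the three operations above applied to a 1-D tensor:

import torch

v = torch.tensor([2.0, 4.0, 6.0, 8.0])

# (1) Rescaling: add/subtract and multiply/divide by constants
rescaled = (v - 1.0) * 0.5

# (2) Min-max normalization: subtract the minimum, divide by the range -> values in [0, 1]
normalized = (v - v.min()) / (v.max() - v.min())

# (3) Standardization: subtract the mean, divide by the standard deviation
standardized = (v - v.mean()) / v.std()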
II. torch.nn
The classes in the torch.nn package cover what is commonly needed when building a deep neural network model and optimizing its parameters: construction of convolutional layers, pooling layers, fully connected layers, etc.; regularization and Dropout methods to prevent overfitting; and the linear and nonlinear activation functions, among others.
1. Imports
import torch
# torch.autograd provides classes and functions for automatic differentiation of any scalar function.
# To use automatic differentiation, only minor changes to existing code are needed:
# wrap all the tensors in Variable objects.
from torch.autograd import Variable

# Number of input samples in one batch
batch_n = 100
# Number of features output by the hidden layer
hidden_layer = 100
# Number of features of the input data
input_data = 1000
# Number of output classes
output_data = 10

x = Variable(torch.randn(batch_n, input_data), requires_grad=False)
y = Variable(torch.randn(batch_n, output_data), requires_grad=False)

'''
# The weight-parameter code is deleted here, because the classes in the torch.nn
# package used below automatically create and initialize weight parameters of the
# corresponding dimensions.
w1 = Variable(torch.randn(input_data, hidden_layer), requires_grad=True)
w2 = Variable(torch.randn(hidden_layer, output_data), requires_grad=True)
'''
2. Building the model
# ******** Model construction ********
models = torch.nn.Sequential(
    # Linear transformation from the input layer to the hidden layer
    torch.nn.Linear(input_data, hidden_layer),
    # Activation function
    torch.nn.ReLU(),
    # Linear transformation from the hidden layer to the output layer
    torch.nn.Linear(hidden_layer, output_data)
)
print(models)
- torch.nn.Sequential is a sequential container: the neural network model is built by nesting inside it the classes that implement the specific parts of the network, and data are passed through the modules automatically in the order in which they were defined.
The parts inside the container can be viewed as separate modules, which can be combined freely. There are generally two ways to add modules:
(1) Direct nesting (the code above).
Printed model structure:
Sequential(
(0): Linear(in_features=1000, out_features=100, bias=True)
(1): ReLU()
(2): Linear(in_features=100, out_features=10, bias=True)
)
Process finished with exit code 0
By default, each module is named with its zero-based index in the sequence.
(2) Passing in an OrderedDict (an ordered dictionary).
import torch
from torch.autograd import Variable
from collections import OrderedDict
# Number of input samples in one batch
batch_n = 100
# Number of features output by the hidden layer
hidden_layer = 100
# Number of features of the input data
input_data = 1000
# Number of output classes
output_data = 10

models = torch.nn.Sequential(OrderedDict([
    ("Line1", torch.nn.Linear(input_data, hidden_layer)),
    ("ReLU1", torch.nn.ReLU()),
    ("Line2", torch.nn.Linear(hidden_layer, output_data))
]))
print(models)
Printed model structure:
Sequential(
(Line1): Linear(in_features=1000, out_features=100, bias=True)
(ReLU1): ReLU()
(Line2): Linear(in_features=100, out_features=10, bias=True)
)
Process finished with exit code 0
Here each module uses the custom name we gave it, which is clearer.
The torch.nn.Linear class implements the linear transformation between layers. It takes three arguments: the number of input features, the number of output features, and whether to use a bias.
In practice, only the numbers of input and output features need to be passed to torch.nn.Linear; the weight parameters and biases of the corresponding dimensions are generated automatically (see the small check below). The torch.nn.ReLU class belongs to the nonlinear activations and by default needs no arguments when defined. Many more are available (PReLU, LeakyReLU, Tanh, Sigmoid, Softmax, ...).
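A small illustrative check (not in the original) that the Linear layers above were given weights and biases of the matching shapes; Sequential supports integer indexing even when the modules are named:

first_linear = models[0]           # Linear(in_features=1000, out_features=100)
print(first_linear.weight.shape)   # torch.Size([100, 1000])
print(first_linear.bias.shape)     # torch.Size([100])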
3. Optimizing the model
# ******** Model optimization ********
epoch_n = 10000
learning_rate = 1e-4
loss_fn = torch.nn.MSELoss()
- torch.nn.MSELoss (mean squared error): no parameters are needed when the class object is defined, but when the instance is called it takes two inputs of the same dimensions (x, y):
loss_fn = torch.nn.MSELoss()
x = Variable(torch.randn(100,100))
y = Variable(torch.randn(100,100))
loss = loss_fn(x, y)
print(loss)
Result:
tensor(1.9493)
Process finished with exit code 0
- torch.nn.L1Loss (mean absolute error): used in the same way as above.
- torch.nn.CrossEntropyLoss (cross entropy): when the instance is called, it takes two inputs that satisfy the requirements of the cross-entropy computation.
Categorical cross-entropy loss is usually used for multi-class classification: the predicted probability of the correct class should be close to 1, and the probabilities of the other classes close to 0.
loss_fn = torch.nn.CrossEntropyLoss()
x = Variable(torch.randn(3, 5))
y = Variable(torch.LongTensor(3).random_(5))  # 3 random integers in the range 0~4
loss = loss_fn(x, y)
print(loss)
Result:
tensor(1.5172)
Process finished with exit code 0
For the details of cross entropy, see https://finisky.github.io/2020/07/09/crossentropyloss/ . A quick numerical check of what CrossEntropyLoss computes is sketched below.
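A small illustrative check (not from the original) that torch.nn.CrossEntropyLoss is equivalent to log_softmax followed by the negative log-likelihood loss:

import torch
import torch.nn.functional as F

x = torch.randn(3, 5)
y = torch.tensor([0, 2, 4])
print(torch.nn.CrossEntropyLoss()(x, y))
print(F.nll_loss(F.log_softmax(x, dim=1), y))   # same value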
Train the model we built and optimize its parameters:
# ******** Model training ********
for epoch in range(epoch_n):
    y_pred = models(x)
    loss = loss_fn(y_pred, y)
    if epoch % 1000 == 0:
        print("Epoch:{}, loss:{:.4f}".format(epoch, loss.item()))
    models.zero_grad()
    loss.backward()
    # Manually update every parameter in the model (iterate over models.parameters())
    for param in models.parameters():
        param.data -= param.grad.data * learning_rate
III. torch.optim
In the code above, the optimization and updating of the network's weight parameters were not yet automated. The torch.optim package provides many classes that implement automatic parameter optimization (SGD, AdaGrad, RMSProp, Adam, etc.).
import torch
# torch.autograd provides classes and functions for automatic differentiation of any scalar function.
# To use automatic differentiation, only minor changes to existing code are needed:
# wrap all the tensors in Variable objects.
from torch.autograd import Variable

# Number of input samples in one batch
batch_n = 100
# Number of features output by the hidden layer
hidden_layer = 100
# Number of features of the input data
input_data = 1000
# Number of output classes
output_data = 10

x = Variable(torch.randn(batch_n, input_data), requires_grad=False)
y = Variable(torch.randn(batch_n, output_data), requires_grad=False)

'''
# The weight-parameter code is deleted here, because the classes in the torch.nn
# package used below automatically create and initialize weight parameters of the
# corresponding dimensions.
w1 = Variable(torch.randn(input_data, hidden_layer), requires_grad=True)
w2 = Variable(torch.randn(hidden_layer, output_data), requires_grad=True)
'''
# ******** Model construction ********
models = torch.nn.Sequential(
    # Linear transformation from the input layer to the hidden layer
    torch.nn.Linear(input_data, hidden_layer),
    # Activation function
    torch.nn.ReLU(),
    # Linear transformation from the hidden layer to the output layer
    torch.nn.Linear(hidden_layer, output_data)
)
# print(models)
# ******** Model optimization ********
epoch_n = 20
learning_rate = 1e-4
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(models.parameters(), lr=learning_rate)

# ******** Model training ********
for epoch in range(epoch_n):
    y_pred = models(x)
    loss = loss_fn(y_pred, y)
    print("Epoch:{}, loss:{:.4f}".format(epoch, loss.item()))
    # Because an optimizer is used here, optimizer.zero_grad() is called directly
    # to zero the gradients of the model parameters
    optimizer.zero_grad()
    loss.backward()
    # Use the computed gradients to update the parameters of each node
    optimizer.step()
Printed results:
Epoch:0, loss:1.1384
Epoch:1, loss:1.1160
Epoch:2, loss:1.0941
Epoch:3, loss:1.0727
Epoch:4, loss:1.0517
Epoch:5, loss:1.0311
Epoch:6, loss:1.0109
Epoch:7, loss:0.9913
Epoch:8, loss:0.9720
Epoch:9, loss:0.9532
Epoch:10, loss:0.9349
Epoch:11, loss:0.9171
Epoch:12, loss:0.8996
Epoch:13, loss:0.8827
Epoch:14, loss:0.8663
Epoch:15, loss:0.8503
Epoch:16, loss:0.8347
Epoch:17, loss:0.8194
Epoch:18, loss:0.8045
Epoch:19, loss:0.7900
Process finished with exit code 0
Adam is used here as the optimization function. The class takes the parameters to be optimized and the learning rate as inputs; if no learning rate is given, it defaults to 0.001. Adam can adaptively adjust the learning rate used for the gradient updates. Other optimizers can be swapped in with a one-line change, as sketched below.
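An illustrative sketch (not from the original) of other optimizers from torch.optim; any one of these lines can replace the Adam constructor above, and the rest of the training loop stays the same:

optimizer = torch.optim.SGD(models.parameters(), lr=1e-4)       # classical stochastic gradient descent
optimizer = torch.optim.Adagrad(models.parameters(), lr=1e-4)   # adaptive per-parameter learning rates
optimizer = torch.optim.RMSprop(models.parameters(), lr=1e-4)   # RMSProp
optimizer = torch.optim.Adam(models.parameters())               # Adam, lr defaults to 0.001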
IV. torch and torchvision
The main purpose of the torchvision package is to implement data processing, loading, and previewing:
import torch
import torchvision
from torchvision import datasets
from torchvision import transforms
from torch.autograd import Variable
1. torchvision.transforms
torchvision.transforms provides a rich set of classes for transforming the data (e.g. images in CV datasets, text in NLP datasets). A large part of them can be used for data augmentation.
transform = transforms.Compose([transforms.ToTensor(),
transforms.Normalize(mean=[0.5, 0.5, 0.5],
std=[0.5, 0.5, 0.5])])
transforms.Compose can be viewed as a container that combines several data transformations at once. The argument passed in is a list whose elements are the transformation operations applied to the loaded data (the code above only uses the type conversion ToTensor and the normalization transform Normalize).
Mean / standard-deviation normalization:
Normalize subtracts the given mean from each channel and divides by the given standard deviation; if the supplied values are the true mean and standard deviation of the data, the transformed data have mean 0 and standard deviation 1. A small sketch of the computation follows.
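A small sketch (illustrative, not from the original) of what Normalize computes: per channel, output = (input - mean) / std. With mean=0.5 and std=0.5, the values produced by ToTensor (in [0, 1]) are mapped into [-1, 1]:

import torch
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
img = torch.rand(3, 8, 8)                  # fake 3-channel image with values in [0, 1]
out = normalize(img)
print(out.min().item(), out.max().item())  # roughly within [-1, 1]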
Other data transformation operations:
- torchvision.transforms.Resize: scales the loaded image to the required size; the argument can be a single integer or a sequence (h, w), i.e. (height, width).
- torchvision.transforms.Scale: same as above (a legacy alias of Resize).
- torchvision.transforms.CenterCrop: crops the loaded image to the required size around its center; the arguments are the same as for Resize.
- torchvision.transforms.RandomCrop: as the name says, crops at a random location.
- torchvision.transforms.RandomHorizontalFlip: flips the image horizontally with a given probability; the probability can be passed as an argument and defaults to 0.5.
- torchvision.transforms.RandomVerticalFlip: vertical flip; arguments as above.
- torchvision.transforms.ToTensor: converts the image data into the Tensor data type.
- torchvision.transforms.ToPILImage: converts a Tensor back into PIL image data so that the image content can be displayed.
Several of these can be chained into an augmentation pipeline, as sketched below.
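An illustrative augmentation pipeline (assumed, not taken from the original) combining several of the transforms listed above; the sizes are arbitrary examples:

augment = transforms.Compose([
    transforms.Resize(40),                    # scale the shorter side to 40 pixels
    transforms.RandomCrop(32),                # crop a random 32x32 patch
    transforms.RandomHorizontalFlip(p=0.5),   # horizontal flip with probability 0.5
    transforms.ToTensor(),                    # convert the PIL image to a Tensor
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])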
2. torch.nn
The typical workflow of a neural network is:
(1) Define a network structure with learnable parameters (design the stack of layers);
(2) Feed in the dataset;
(3) Process the input through the defined network layers; this is the forward pass of the network;
(4) Compute the loss, produced by the loss layer;
(5) Back-propagate the gradients;
(6) Update the parameter values according to the gradients (e.g. SGD); the simplest implementation is:
w = w - lr * gradient
CNN model construction:
# ******* Model construction and parameter optimization *******
class Model(torch.nn.Module):
    # Two convolution layers, one max-pooling layer, and two fully connected layers
    def __init__(self):
        super(Model, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(stride=2, kernel_size=2)
        )
        self.dense = torch.nn.Sequential(
            torch.nn.Linear(14*14*128, 1024),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1024, 10)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = x.view(-1, 14*14*128)
        x = self.dense(x)
        return x
- torch.nn.Conv2d: convolution layer. Main arguments: number of input channels, number of output channels, kernel size, stride, and padding.
- torch.nn.MaxPool2d: max-pooling layer. Main arguments: pooling window size, window stride, and padding.
- torch.nn.Dropout: prevents the convolutional neural network from overfitting during training (how it works: some parameters of the model are zeroed with a given random probability, which reduces the connections between two adjacent layers).
- The forward function (forward propagation)
Call flow:
(1) The module's __call__ method is invoked;
(2) The module's __call__ calls the module's forward method;
(3) If forward encounters a Module subclass, go back to step (1); if it encounters a Function subclass, continue;
(4) The Function's __call__ method is invoked;
(5) The Function's __call__ calls the Function's forward method;
(6) The Function's forward returns its value;
(7) The module's forward returns its value;
(8) The module's __call__ runs the forward_hook operations and then returns the value.
A quick check of the forward pass for the model above follows.
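A quick illustrative check (not in the original) of the forward pass of the Model class above, assuming single-channel 28x28 inputs (e.g. MNIST): after two 3x3 convolutions with padding=1 and one 2x2 max pooling, the feature map is 128 x 14 x 14, which matches the first Linear layer:

model = Model()
dummy = torch.randn(4, 1, 28, 28)   # a batch of 4 fake images
out = model(dummy)                  # calling the module invokes forward via __call__
print(out.shape)                    # torch.Size([4, 10])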
Other important knowledge
Prepare the data -> choose a model framework; there are also two important parts of supervised training: the loss function and the optimization algorithm.
- Choosing the loss function:
When the model outputs probabilities, the loss function should be based on cross entropy.
- Choosing the optimization algorithm:
The optimization algorithm uses the error signal to update the model's weights. In the simplest case, a single hyperparameter controls the optimizer: the learning rate (lr). During training, several different learning rates should be tried and compared.
Method: classical stochastic gradient descent (SGD); but for complex optimization problems it can have convergence issues. The alternatives are adaptive optimization algorithms (Adagrad or Adam); for Adam, see section III above.
The supervised training loop:
It is a nested loop: an inner loop over the dataset or over a set of batches, and an outer loop that repeats the inner loop for a fixed number of epochs or until some other termination condition.
After many batches have been processed, the training loop completes one epoch (one full pass over the training data). The model is trained for a certain number of epochs; the number of epochs is not arbitrary, and there are ways to decide when to stop.
The core idea: 1. define the model; 2. compute the output; 3. compute the gradients with the loss function; 4. apply the optimization algorithm to update the model parameters according to the gradients.
Splitting the dataset:
Split into training, validation, and test sets, or use k-fold cross validation (suitable for smaller datasets).
Make sure the three sets keep the same distribution. A common precaution: group the dataset by class label, then randomly split each class-label group into training, validation, and test sets (commonly: 70% training, 15% validation, 15% test). A sketch of such a stratified split follows.
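A sketch (using scikit-learn, which the original does not mention) of a stratified 70/15/15 split that keeps the class distribution the same in every subset; X and y here are synthetic placeholders:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 20)             # fake features
y = np.random.randint(0, 3, size=1000)    # fake class labels

# First split off 30%, then split that half-and-half into validation and test,
# stratifying by label each time so every subset keeps the class proportions
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0)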
When to stop training:
The "early stopping" heuristic: track the performance on the validation set and notice when it stops improving. As long as performance keeps improving, training continues; once it no longer improves, training ends.
The number of epochs to wait after the last improvement before ending training is called the patience (tolerance). A minimal sketch follows.
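A minimal early-stopping sketch (illustrative, not from the original). The per-epoch validation losses are faked with a list here; in practice val_loss would come from evaluating the model on the validation set:

val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74]   # fake validation losses per epoch

best_val_loss = float("inf")
patience = 3            # the "tolerance": epochs to wait without improvement
epochs_no_improve = 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_no_improve = 0
    else:
        epochs_no_improve += 1
        if epochs_no_improve >= patience:
            print("Early stop at epoch", epoch)
            break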
Appropriate hyperparameters:
The loss function, the optimization algorithm, the learning rate, the sizes of the layers, the early-stopping patience, and the various regularization/normalization decisions.