Week 2: convolutional neural network
2022-07-25 22:59:00 【The Pleiades of CangMao】
Preface

Basic structure
I. Convolution (Convolutional Layer)
1. Traditional neural networks vs. convolutional neural networks

2. Concept
Convolution is a mathematical operation on two functions of real variables.
3. Computation of the convolution layer
Since images are two-dimensional, we apply two-dimensional convolution to them.
How does a convolution kernel work?
Example 1: input: 5x5
Convolution kernel: 3x3, $\begin{bmatrix}1&0&1\\0&1&0\\1&0&1\end{bmatrix}$
Stride: 1
Feature map: 3x3
The 3x3 convolution kernel slides over the input data; at each position the overlapping values are multiplied elementwise and summed, as shown in the figure below:
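To make the multiply-and-add concrete, here is a minimal NumPy sketch (my own illustration, not code from the original post); the input values are made up, and only the 3x3 kernel comes from example 1:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2D convolution (cross-correlation): slide the kernel over the
    image, multiply the overlapping values elementwise, and sum them."""
    H, W = image.shape
    F = kernel.shape[0]
    out = (H - F) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i * stride:i * stride + F, j * stride:j * stride + F]
            result[i, j] = np.sum(patch * kernel)
    return result

image = np.arange(25).reshape(5, 5)      # made-up 5x5 input
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])           # the 3x3 kernel from example 1
print(conv2d(image, kernel).shape)       # (3, 3) -- the 3x3 feature map
```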
When the stride does not fit the size of the input picture (or to control the output size), the input can be zero-padded around the border, which enlarges the effective input. E.g., the padding in example 1 is 0, and in example 2 it is 1.
Example 2: 2 channels (2 filters, so the final output is 2 feature maps), padding = 1
Input: 7x7x3 (each filter has 3 different weight matrices, one per input channel)
The channel count is the number of feature maps produced; here it is 2.
With padding, the size of the output feature map (the side length of the feature map) is: $\frac{N + 2 \times \text{padding} - F}{\text{stride}} + 1$
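The formula is easy to sanity-check with a small helper (my own sketch; for example 2 I assume the classic setup of a 5x5x3 input padded to 7x7x3 with 3x3 filters and stride 2, which the post does not state explicitly):

```python
def conv_output_size(N, F, padding=0, stride=1):
    """Side length of the output feature map."""
    return (N + 2 * padding - F) // stride + 1

print(conv_output_size(5, 3))                       # example 1: (5 - 3)/1 + 1 = 3
print(conv_output_size(5, 3, padding=1, stride=2))  # assumed example 2: (5 + 2 - 3)/2 + 1 = 3
```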
II. Pooling
- Pooling retains the main features while reducing the number of parameters and the amount of computation; it helps prevent overfitting and improves the model's generalization ability.
- It is usually placed between convolution layers, or between fully connected layers.

Types of pooling:
- Max pooling (commonly used in classification tasks)
- Average pooling
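A minimal PyTorch sketch of both pooling types (my own example; the 4x4 input is arbitrary), showing how each halves the spatial size while keeping the number of feature maps:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 2, 4, 4)                       # 1 sample, 2 feature maps, 4x4 each
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)  # keeps the strongest response per window
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)  # keeps the average response per window
print(max_pool(x).shape)                          # torch.Size([1, 2, 2, 2])
print(avg_pool(x).shape)                          # torch.Size([1, 2, 2, 2])
```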

III. Fully connected layer
Usually the last part of the network: it flattens the multi-dimensional feature maps into a one-dimensional column vector for classification.
IV. Summary

Typical convolutional neural network architectures
I. AlexNet
The AlexNet network brought deep learning back onto the stage of history.
- To prevent overfitting:
(1) Used Dropout (random deactivation): some neurons are randomly shut off during training, and all neurons are used together at test time;
(2) Data augmentation:
· flipping, translation, symmetry: random crop (rescaling then cropping the picture), horizontal flip (multiplies the number of samples);
· changing the intensity of the RGB channels, applying Gaussian perturbations in RGB space.
- Used the nonlinear activation function ReLU (addresses the vanishing-gradient problem; very fast to compute, since it only has to check whether the input is > 0; converges much faster than sigmoid). See the sketch below.
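A small sketch of these two ingredients (my own toy classifier head, not AlexNet's actual architecture): ReLU as the activation, and Dropout toggled between training and evaluation:

```python
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),          # cheap nonlinearity: only checks whether the input is > 0
    nn.Dropout(p=0.5),  # randomly shuts off half the activations during training
    nn.Linear(128, 10),
)

head.train()            # Dropout active: random neurons are dropped
out = head(torch.randn(4, 256))
head.eval()             # Dropout disabled: all neurons are used (activations rescaled)
out = head(torch.randn(4, 256))
```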

II. ZFNet

III. VGG
VGG is a deeper network. To combat the vanishing gradients caused by the extra depth, an auxiliary classifier can be added.
IV. GoogLeNet
GoogLeNet is built around the Inception module (the network stacks multiple Inception blocks). Within an Inception block, several convolution kernels of different sizes run in parallel to increase the diversity of features (figure below, left), but concatenating their outputs makes the result very large, leading to high computational complexity.
The solution: insert 1x1 convolution kernels to perform dimensionality reduction on the channels (figure below, right).
Small convolution kernels can then replace large ones to further reduce the number of parameters.
For example: two stacked 3x3 kernels can replace one 5x5 kernel with the same receptive field but fewer parameters (2 x 3 x 3 = 18 weights vs. 5 x 5 = 25 per channel pair).
Adding a nonlinear activation function between the stacked convolutions lets the network produce more independent features, stronger representations, and faster training.
The output has no additional fully connected layers (except the final classification output layer). A simplified sketch of an Inception block follows.
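A simplified Inception-style block as a sketch (the channel counts are my own illustrative choices, not GoogLeNet's exact configuration): four parallel branches, with 1x1 convolutions reducing the channel dimension before the expensive 3x3 and 5x5 kernels:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 / pool branches, concatenated along channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),           # 1x1 reduces channels first
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),
            nn.Conv2d(8, 16, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(InceptionBlock(32)(x).shape)   # torch.Size([1, 64, 28, 28])
```

The channel concatenation is exactly what makes the raw output large; the 1x1 reductions keep the 3x3 and 5x5 branches cheap.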
V. ResNet
Depth: up to 152 layers.
Deep residual learning network.
The residual idea: remove the identical main part so as to highlight the small changes; this makes it possible to train very deep networks.
Let $F(x) + x = f(g(h(x) + x) + x) + x$. Differentiating this expression yields factors of the form $(f' + 1)(g' + 1)(h' + 1)$. Even if a layer's derivative (or the function itself) is zero, the 1 added by each skip connection keeps the gradient from vanishing.
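A minimal residual block sketch (my own; real ResNet blocks also include batch normalization and sometimes a projection on the shortcut):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """out = F(x) + x: the block only has to learn the small change F(x)."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)    # the "+ x" contributes the +1 in the derivative

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)    # torch.Size([1, 16, 8, 8])
```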
Programming
torchvision.datasets.MNIST(root, train=True, transform=None, target_transform=None, download=False)
- root: the root directory where the dataset is downloaded locally, containing the training.pt and test.pt files
- train: if set to True, create the dataset from training.pt; otherwise create it from test.pt
- transform: a function/transform that takes a PIL image as input and returns the transformed data
- target_transform: a function/transform that takes the target as input and transforms it
- download: if set to True, download the data from the internet and put it under the root folder
DataLoader is another important class; the common options it provides are:
- batch_size (the size of each batch), shuffle (whether to randomly shuffle the order),
- num_workers (how many subprocesses to use when loading data)
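Putting the dataset class and DataLoader together, a typical usage sketch (the root path and batch size are my own choices):

```python
import torch
from torchvision import datasets, transforms

train_set = datasets.MNIST(root='./data', train=True,
                           transform=transforms.ToTensor(), download=True)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64,
                                           shuffle=True, num_workers=0)

images, labels = next(iter(train_loader))  # draw one batch
print(images.shape)                        # torch.Size([64, 1, 28, 28])
```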
transforms.Compose
Composes several transforms together. This transform does not support torchscript.
- That is, it combines several transformation methods and applies them to the data in order.
- torchscript is a script module used to package models for cross-platform use; if that needs to be supported, torch.nn.Sequential must be used instead of Compose.
- In the code from the exercise, ToTensor() is applied first to map [0, 255] to [0, 1], and then Normalize applies a custom standardization.
transforms.ToTensor()
Convert a PIL Image or numpy.ndarray to tensor
Converts a PIL image or a numpy array into a tensor of type torch.Tensor, scaling values from [0, 255] to [0, 1].
- How it works: it handles the different input types by dividing each value by 255, and finally uses torch.from_numpy to convert the PIL image or numpy.ndarray (with concrete value types such as int32, int16, float) into the torch.Tensor data type.
transforms.Normalize()
Normalize a tensor image with mean and standard deviation
Normalizes a tensor image using a mean and standard deviation (standardizing the image). The formula is: output[channel] = (input[channel] - mean[channel]) / std[channel]
Example: transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
- The first (0.5, 0.5, 0.5) is the mean of the three channels
- The second (0.5, 0.5, 0.5) is the standard deviation of the three channels
Because ToTensor() has already mapped the image to [0, 1], this maps it to [-1, 1]. Taking the first channel as an example and substituting the extreme values into the formula:
- (0 - 0.5) / 0.5 = -1
- (1 - 0.5) / 0.5 = 1
That is, values are mapped to [-1, 1].
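The whole preprocessing pipeline discussed above, in one sketch (the random RGB image is a stand-in for a real picture):

```python
import numpy as np
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),                  # [0, 255] -> [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5),   # per-channel mean
                         (0.5, 0.5, 0.5)),  # per-channel std: [0, 1] -> [-1, 1]
])

img = Image.fromarray(np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8))
x = transform(img)
print(x.min().item(), x.max().item())       # values now lie in [-1, 1]
```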
I. MNIST dataset classification: build a simple CNN to classify the MNIST dataset (a rough sketch follows this list).
II. CIFAR10 dataset classification: use a CNN to classify the CIFAR10 dataset.
III. Use VGG16 to classify CIFAR10.
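As a rough idea of what exercise I involves, a minimal CNN sketch (the layer sizes are my own guesses, not the course's exact network):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)      # flatten, then classify

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

print(SimpleCNN()(torch.randn(4, 1, 28, 28)).shape)      # torch.Size([4, 10])
```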
Questions
- What difference do different values of shuffle in the dataloader make?
When shuffle=True, the order of the images is shuffled, which increases diversity. Generally, shuffle=True is set for the training set and shuffle=False for the test set. In PyTorch's DataLoader, shuffle reshuffles the data before the batches are drawn.
If shuffle is not set to True for the training set, the trained model does not generalize: it is only suited to predicting this particular dataset, and if it performs poorly on other datasets it may perform poorly on this one too. The purpose of shuffling is to increase the generalization ability of the model; the test set is used to evaluate the performance of the trained model on unknown data, so it is kept in its original order.
- What difference do different values in transform make?
These are common data-preprocessing methods. As in the sketch below, the training set applies data augmentation (random cropping and random flipping) to strengthen the model's generalization ability and prevent overfitting.
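A sketch of such a pipeline (the crop padding and normalization constants are illustrative values for 32x32 RGB images like CIFAR10):

```python
from torchvision import transforms

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # random crop after padding the border
    transforms.RandomHorizontalFlip(),      # flip with probability 0.5
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

transform_test = transforms.Compose([       # no augmentation at test time
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```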
- What is the difference between epoch and batch?
One epoch is one complete pass of the full dataset forward through the neural network and back. (Multiple epochs are needed: in deep learning, feeding the entire dataset through the network once is not enough; it must be trained on many times.)
When the dataset is large, it is hard to read the whole dataset into memory at once for each epoch, so the dataset is split into several parts that are read one at a time; each part is called a batch.
batch_size: the number of samples in a batch.
- What is the difference between a 1x1 convolution and FC? What role does each play?
There is no difference in the underlying mathematics, but a 1x1 convolution kernel does not fix the input size, whereas a fully connected layer does. A 1x1 convolution can be used to change (e.g., increase) the channel dimension; FC layers usually come at the end.
- Why can residual learning improve accuracy?
It addresses the vanishing-gradient problem and can therefore be used to train deeper networks.
- In code exercise 2, what is the difference between the network and the LeNet proposed by LeCun in 1989?
The network in code exercise 2 uses ReLU as its activation function, whereas LeNet's activation function is sigmoid.
- In code exercise 2, the feature maps shrink after convolution; how can residual learning be applied?
1x1 convolutions, padding, and similar methods can be used to adjust the feature-map size (see the sketch after this list).
- What methods can further improve accuracy?
Prevent overfitting: Dropout, data augmentation on the training set (data preprocessing).
Appropriately increase the depth of the model, use a more suitable activation function, or change the network architecture.
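A sketch of that size-matching trick (my own illustration, with arbitrary channel counts): when the main path halves the spatial size and changes the channel count, a strided 1x1 convolution reshapes the shortcut so the two can still be added:

```python
import torch
import torch.nn as nn

class DownsampleResidual(nn.Module):
    """Residual block whose main path halves the size and doubles the channels;
    a strided 1x1 conv reshapes x so it can still be added to F(x)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2)

    def forward(self, x):
        return torch.relu(self.main(x) + self.shortcut(x))

x = torch.randn(1, 16, 32, 32)
print(DownsampleResidual(16, 32)(x).shape)   # torch.Size([1, 32, 16, 16])
```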