Week 2: convolutional neural network
2022-07-25 22:59:00 【The Pleiades of CangMao】
Preface

Basic structure
I. Convolution (Convolutional Layer)
1. Traditional neural networks vs. convolutional neural networks

2. Concept
Convolution is a mathematical operation on two functions of real variables.
3. Computation of the convolution layer
Since images are two-dimensional, we apply two-dimensional convolution to them.
How does a convolution kernel work?
Example 1: input: 5x5
Convolution kernel: 3x3, $\begin{bmatrix}1&0&1\\0&1&0\\1&0&1\end{bmatrix}$
Stride: 1
Feature map: 3x3
The 3x3 convolution kernel slides over the input data; at each position the overlapping values are multiplied elementwise and summed, as shown in the figure below:
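To make the multiply-and-add concrete, here is a minimal NumPy sketch (my own illustration, not code from the original post); the input values are made up, and only the 3x3 kernel comes from example 1:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid 2D convolution (cross-correlation): slide the kernel over the
    image, multiply the overlapping values elementwise, and sum them."""
    H, W = image.shape
    F = kernel.shape[0]
    out = (H - F) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i * stride:i * stride + F, j * stride:j * stride + F]
            result[i, j] = np.sum(patch * kernel)
    return result

image = np.arange(25).reshape(5, 5)      # made-up 5x5 input
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])           # the 3x3 kernel from example 1
print(conv2d(image, kernel).shape)       # (3, 3) -- the 3x3 feature map
```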
When the stride does not fit the size of the input picture (or to control the output size), the input can be zero-padded around the border, which enlarges the effective input. E.g., the padding in example 1 is 0, and in example 2 it is 1.
Example 2: 2 channels (2 filters, so the final output is 2 feature maps), padding = 1
Input: 7x7x3 (each filter has 3 different weight matrices, one per input channel)
The channel count is the number of feature maps produced; here it is 2.
With padding, the size of the output feature map (the side length of the feature map) is: $\frac{N + 2 \times \text{padding} - F}{\text{stride}} + 1$
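The formula is easy to sanity-check with a small helper (my own sketch; for example 2 I assume the classic setup of a 5x5x3 input padded to 7x7x3 with 3x3 filters and stride 2, which the post does not state explicitly):

```python
def conv_output_size(N, F, padding=0, stride=1):
    """Side length of the output feature map."""
    return (N + 2 * padding - F) // stride + 1

print(conv_output_size(5, 3))                       # example 1: (5 - 3)/1 + 1 = 3
print(conv_output_size(5, 3, padding=1, stride=2))  # assumed example 2: (5 + 2 - 3)/2 + 1 = 3
```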
II. Pooling
- Pooling retains the main features while reducing the number of parameters and the amount of computation; it helps prevent overfitting and improves the model's generalization ability.
- It is usually placed between convolution layers, or between fully connected layers.

Types of pooling:
- Max pooling (commonly used in classification tasks)
- Average pooling
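A minimal PyTorch sketch of both pooling types (my own example; the 4x4 input is arbitrary), showing how each halves the spatial size while keeping the number of feature maps:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 2, 4, 4)                       # 1 sample, 2 feature maps, 4x4 each
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)  # keeps the strongest response per window
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)  # keeps the average response per window
print(max_pool(x).shape)                          # torch.Size([1, 2, 2, 2])
print(avg_pool(x).shape)                          # torch.Size([1, 2, 2, 2])
```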

III. Fully connected layer
Usually the last part of the network: it flattens the multi-dimensional feature maps into a one-dimensional column vector for classification.
IV. Summary

Typical convolutional neural network architectures
I. AlexNet
The AlexNet network brought deep learning back onto the stage of history.
- To prevent overfitting:
(1) Used Dropout (random deactivation): some neurons are randomly shut off during training, and all neurons are used together at test time;
(2) Data augmentation:
· flipping, translation, symmetry: random crop (rescaling then cropping the picture), horizontal flip (multiplies the number of samples);
· changing the intensity of the RGB channels, applying Gaussian perturbations in RGB space.
- Used the nonlinear activation function ReLU (addresses the vanishing-gradient problem; very fast to compute, since it only has to check whether the input is > 0; converges much faster than sigmoid). See the sketch below.
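A small sketch of these two ingredients (my own toy classifier head, not AlexNet's actual architecture): ReLU as the activation, and Dropout toggled between training and evaluation:

```python
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),          # cheap nonlinearity: only checks whether the input is > 0
    nn.Dropout(p=0.5),  # randomly shuts off half the activations during training
    nn.Linear(128, 10),
)

head.train()            # Dropout active: random neurons are dropped
out = head(torch.randn(4, 256))
head.eval()             # Dropout disabled: all neurons are used (activations rescaled)
out = head(torch.randn(4, 256))
```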

II. ZFNet

III. VGG
VGG is a deeper network. To combat the vanishing gradients caused by the extra depth, an auxiliary classifier can be added.
IV. GoogLeNet
GoogLeNet is built around the Inception module (the network stacks multiple Inception blocks). Within an Inception block, several convolution kernels of different sizes run in parallel to increase the diversity of features (figure below, left), but concatenating their outputs makes the result very large, leading to high computational complexity.
The solution: insert 1x1 convolution kernels to perform dimensionality reduction on the channels (figure below, right).
Small convolution kernels can then replace large ones to further reduce the number of parameters.
For example: two stacked 3x3 kernels can replace one 5x5 kernel with the same receptive field but fewer parameters (2 x 3 x 3 = 18 weights vs. 5 x 5 = 25 per channel pair).
Adding a nonlinear activation function between the stacked convolutions lets the network produce more independent features, stronger representations, and faster training.
The output has no additional fully connected layers (except the final classification output layer). A simplified sketch of an Inception block follows.
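A simplified Inception-style block as a sketch (the channel counts are my own illustrative choices, not GoogLeNet's exact configuration): four parallel branches, with 1x1 convolutions reducing the channel dimension before the expensive 3x3 and 5x5 kernels:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 / pool branches, concatenated along channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),           # 1x1 reduces channels first
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),
            nn.Conv2d(8, 16, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 16, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

x = torch.randn(1, 32, 28, 28)
print(InceptionBlock(32)(x).shape)   # torch.Size([1, 64, 28, 28])
```

The channel concatenation is exactly what makes the raw output large; the 1x1 reductions keep the 3x3 and 5x5 branches cheap.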
V. ResNet
Depth: up to 152 layers.
Deep residual learning network.
The residual idea: remove the identical main part so as to highlight the small changes; this makes it possible to train very deep networks.
Let $F(x) + x = f(g(h(x) + x) + x) + x$. Differentiating this expression yields factors of the form $(f' + 1)(g' + 1)(h' + 1)$. Even if a layer's derivative (or the function itself) is zero, the 1 added by each skip connection keeps the gradient from vanishing.
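A minimal residual block sketch (my own; real ResNet blocks also include batch normalization and sometimes a projection on the shortcut):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """out = F(x) + x: the block only has to learn the small change F(x)."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)    # the "+ x" contributes the +1 in the derivative

x = torch.randn(1, 16, 8, 8)
print(ResidualBlock(16)(x).shape)    # torch.Size([1, 16, 8, 8])
```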
Programming
torchvision.datasets.MNIST(root, train=True, transform=None, target_transform=None, download=False)
- root: the root directory where the dataset is downloaded locally, containing the training.pt and test.pt files
- train: if set to True, create the dataset from training.pt; otherwise create it from test.pt
- transform: a function/transform that takes a PIL image as input and returns the transformed data
- target_transform: a function/transform that takes the target as input and transforms it
- download: if set to True, download the data from the internet and put it under the root folder
DataLoader is another important class; the common options it provides are:
- batch_size (the size of each batch), shuffle (whether to randomly shuffle the order),
- num_workers (how many subprocesses to use when loading data)
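Putting the dataset class and DataLoader together, a typical usage sketch (the root path and batch size are my own choices):

```python
import torch
from torchvision import datasets, transforms

train_set = datasets.MNIST(root='./data', train=True,
                           transform=transforms.ToTensor(), download=True)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64,
                                           shuffle=True, num_workers=0)

images, labels = next(iter(train_loader))  # draw one batch
print(images.shape)                        # torch.Size([64, 1, 28, 28])
```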
transforms.Compose
Composes several transforms together. This transform does not support torchscript.
- That is, it combines several transformation methods and applies them to the data in order.
- torchscript is a script module used to package models for cross-platform use; if that needs to be supported, torch.nn.Sequential must be used instead of Compose.
- In the code from the exercise, ToTensor() is applied first to map [0, 255] to [0, 1], and then Normalize applies a custom standardization.
transforms.ToTensor()
Convert a PIL Image or numpy.ndarray to tensor
Converts a PIL image or a numpy array into a tensor of type torch.Tensor, scaling values from [0, 255] to [0, 1].
- How it works: it handles the different input types by dividing each value by 255, and finally uses torch.from_numpy to convert the PIL image or numpy.ndarray (with concrete value types such as int32, int16, float) into the torch.Tensor data type.
transforms.Normalize()
Normalize a tensor image with mean and standard deviation
Normalizes a tensor image using a mean and standard deviation (standardizing the image). The formula is: output[channel] = (input[channel] - mean[channel]) / std[channel]
Example: transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
- The first (0.5, 0.5, 0.5) is the mean of the three channels
- The second (0.5, 0.5, 0.5) is the standard deviation of the three channels
Because ToTensor() has already mapped the image to [0, 1], this maps it to [-1, 1]. Taking the first channel as an example and substituting the extreme values into the formula:
- (0 - 0.5) / 0.5 = -1
- (1 - 0.5) / 0.5 = 1
That is, values are mapped to [-1, 1].
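The whole preprocessing pipeline discussed above, in one sketch (the random RGB image is a stand-in for a real picture):

```python
import numpy as np
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),                  # [0, 255] -> [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5),   # per-channel mean
                         (0.5, 0.5, 0.5)),  # per-channel std: [0, 1] -> [-1, 1]
])

img = Image.fromarray(np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8))
x = transform(img)
print(x.min().item(), x.max().item())       # values now lie in [-1, 1]
```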
I. MNIST dataset classification: build a simple CNN to classify the MNIST dataset (a rough sketch follows this list).
II. CIFAR10 dataset classification: use a CNN to classify the CIFAR10 dataset.
III. Use VGG16 to classify CIFAR10.
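As a rough idea of what exercise I involves, a minimal CNN sketch (the layer sizes are my own guesses, not the course's exact network):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)      # flatten, then classify

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

print(SimpleCNN()(torch.randn(4, 1, 28, 28)).shape)      # torch.Size([4, 10])
```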
Questions
- What difference do different values of shuffle in the dataloader make?
When shuffle=True, the order of the images is shuffled, which increases diversity. Generally, shuffle=True is set for the training set and shuffle=False for the test set. In PyTorch's DataLoader, shuffle reshuffles the data before the batches are drawn.
If shuffle is not set to True for the training set, the trained model does not generalize: it is only suited to predicting this particular dataset, and if it performs poorly on other datasets it may perform poorly on this one too. The purpose of shuffling is to increase the generalization ability of the model; the test set is used to evaluate the performance of the trained model on unknown data, so it is kept in its original order.
- What difference do different values in transform make?
These are common data-preprocessing methods. As in the sketch below, the training set applies data augmentation (random cropping and random flipping) to strengthen the model's generalization ability and prevent overfitting.
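A sketch of such a pipeline (the crop padding and normalization constants are illustrative values for 32x32 RGB images like CIFAR10):

```python
from torchvision import transforms

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # random crop after padding the border
    transforms.RandomHorizontalFlip(),      # flip with probability 0.5
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

transform_test = transforms.Compose([       # no augmentation at test time
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```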
- What is the difference between epoch and batch?
One epoch is one complete pass of the full dataset forward through the neural network and back. (Multiple epochs are needed: in deep learning, feeding the entire dataset through the network once is not enough; it must be trained on many times.)
When the dataset is large, it is hard to read the whole dataset into memory at once for each epoch, so the dataset is split into several parts that are read one at a time; each part is called a batch.
batch_size: the number of samples in a batch.
- What is the difference between a 1x1 convolution and FC? What role does each play?
There is no difference in the underlying mathematics, but a 1x1 convolution kernel does not fix the input size, whereas a fully connected layer does. A 1x1 convolution can be used to change (e.g., increase) the channel dimension; FC layers usually come at the end.
- Why can residual learning improve accuracy?
It addresses the vanishing-gradient problem and can therefore be used to train deeper networks.
- In code exercise 2, what is the difference between the network and the LeNet proposed by LeCun in 1989?
The network in code exercise 2 uses ReLU as its activation function, whereas LeNet's activation function is sigmoid.
- In code exercise 2, the feature maps shrink after convolution; how can residual learning be applied?
1x1 convolutions, padding, and similar methods can be used to adjust the feature-map size (see the sketch after this list).
- What methods can further improve accuracy?
Prevent overfitting: Dropout, data augmentation on the training set (data preprocessing).
Appropriately increase the depth of the model, use a more suitable activation function, or change the network architecture.
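A sketch of that size-matching trick (my own illustration, with arbitrary channel counts): when the main path halves the spatial size and changes the channel count, a strided 1x1 convolution reshapes the shortcut so the two can still be added:

```python
import torch
import torch.nn as nn

class DownsampleResidual(nn.Module):
    """Residual block whose main path halves the size and doubles the channels;
    a strided 1x1 conv reshapes x so it can still be added to F(x)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2)

    def forward(self, x):
        return torch.relu(self.main(x) + self.shortcut(x))

x = torch.randn(1, 16, 32, 32)
print(DownsampleResidual(16, 32)(x).shape)   # torch.Size([1, 32, 16, 16])
```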