LeNet5、AlexNet、VGGNet、ResNet
2022-07-28 07:21:00 [Sauerkraut]
1. Classical convolutional neural network frameworks: LeNet5, AlexNet, VGGNet
1.1 LeNet5
Excluding the input layer, LeNet5 has 7 layers in total, 5 of which have trainable parameters.
INPUT - Convolution C1 - Pooling S2 - Convolution C3 - Pooling S4 - Convolution C5 - Fully connected F6 - Fully connected OUTPUT

The INPUT is normalized to 32×32×1.
Convolution: kernel size 5×5, stride 1
Pooling: average pooling
Activation function: hyperbolic tangent (Tanh) or sigmoid
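The layer sizes above can be checked with the standard output-size formula, floor((n + 2p - k) / s) + 1. A minimal sketch in Python, assuming 2×2 average-pooling windows with stride 2 (the window size is not stated above; it is the one used in the original LeNet5):

```python
def out_size(n, k, s=1, p=0):
    """Spatial output size of a convolution/pooling layer:
    floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 32                     # INPUT: 32x32x1
n = out_size(n, 5)         # C1: 5x5 conv, stride 1 -> 28
n = out_size(n, 2, s=2)    # S2: 2x2 average pool   -> 14
n = out_size(n, 5)         # C3: 5x5 conv           -> 10
n = out_size(n, 2, s=2)    # S4: 2x2 average pool   -> 5
n = out_size(n, 5)         # C5: 5x5 conv           -> 1
print(n)  # 1: C5 collapses the map to 1x1, so F6 can be fully connected
```

The trace shows why C5, although a convolution, behaves like a fully connected layer: its 5×5 kernel exactly covers the 5×5 input it receives.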
1.2 AlexNet

AlexNet structure:
- Input layer: image size 227×227×3, where 3 is the number of input channels (R, G, B).
- Convolution layer: filter size 11×11, 96 filters, stride s=4. (The filter size lists only width and height; the filter's channel count always matches the input's channel count, so it is omitted here.)
- Pooling layer: max pooling, filter size 3×3, stride s=2.
- Convolution layer: filter size 5×5, 256 filters, stride s=1, same padding, so the convolution output keeps the input's width and height.
- Pooling layer: max pooling, filter size 3×3, stride s=2.
- Convolution layer: filter size 3×3, 384 filters, stride s=1, same padding.
- Convolution layer: filter size 3×3, 384 filters, stride s=1, same padding.
- Convolution layer: filter size 3×3, 256 filters, stride s=1, same padding.
- Pooling layer: max pooling, filter size 3×3, stride s=2; after this pooling, the 6×6×256 output matrix is flattened into a 9216-dimensional vector.
- Fully connected layer: 4096 neurons.
- Fully connected layer: 4096 neurons.
- Fully connected output layer: softmax activation, 1000 neurons, one per class.
Pooling: max pooling
Activation function: ReLU
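The spatial sizes in the list above can be verified with the same output-size formula; in particular this confirms the 6×6×256 = 9216-dimensional flatten. A quick sketch (same-padding layers keep the size, so they are noted in comments only):

```python
def out_size(n, k, s=1, p=0):
    # floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

n = 227
n = out_size(n, 11, s=4)   # conv 11x11, 96 filters, s=4 -> 55
n = out_size(n, 3, s=2)    # max pool 3x3, s=2           -> 27
# conv 5x5, 256 filters, same padding: size unchanged    -> 27
n = out_size(n, 3, s=2)    # max pool 3x3, s=2           -> 13
# three 3x3 same-padding convs (384, 384, 256 filters)   -> 13
n = out_size(n, 3, s=2)    # max pool 3x3, s=2           -> 6
print(n, n * n * 256)  # 6 9216
```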
1.3 VGGNet

VGG-16 structure:
- Input layer
- Convolution layer - Convolution layer
- Pooling layer
- Convolution layer - Convolution layer
- Pooling layer
- Convolution layer - Convolution layer - Convolution layer
- Pooling layer
- Convolution layer - Convolution layer - Convolution layer
- Pooling layer
- Convolution layer - Convolution layer - Convolution layer
- Pooling layer
- Fully connected layer
- Fully connected layer
- Fully connected layer, output layer
Convolution layers: filter width and height are both 3, stride 1, same padding throughout.
Pooling layers: filter width and height are both 2, stride 2.
As the network deepens, the image's height and width shrink according to a regular pattern, halving after each pooling, while the number of channels grows, doubling after each convolution stage. In other words, the spatial shrinkage and the channel growth follow a consistent rule.
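That regularity is easy to verify numerically. A sketch assuming the standard 224×224×3 VGG-16 input (the input size is not stated above) and the usual per-stage channel counts 64/128/256/512/512:

```python
size, channels = 224, 3
widths = [64, 128, 256, 512, 512]   # channels per conv stage (standard VGG-16)
for w in widths:
    channels = w     # each stage's convs set its channel count (doubling up to 512)
    size //= 2       # each 2x2 stride-2 pool halves height and width
print(size, channels)  # 7 512
```

After five pool layers the 224×224 input has shrunk to 7×7×512, which is flattened to a 7·7·512 = 25088-dimensional vector before the fully connected layers.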
2. ResNet
- Why does network depth matter?
Because a CNN extracts low-, mid-, and high-level features, and the more layers the network has, the richer the features it can extract at different levels. Moreover, the deeper the network, the more abstract the extracted features become and the more semantic information they carry.
- Why not simply stack more layers?
Naively increasing the depth of a plain network causes vanishing or exploding gradients. This problem, however, has been largely addressed by normalized initialization and intermediate normalization layers, which enable networks with tens of layers to start converging under stochastic gradient descent (SGD) with backpropagation. But another problem remains: the degradation problem. As the number of layers increases, accuracy on the training set saturates and then degrades. (This cannot be explained by overfitting, since an overfit model would do better on the training set.)
The fix for degradation: the deep residual network.

ResNet uses two kinds of mapping. One is the identity mapping, the curved shortcut arc in Figure 1; the other is the residual mapping, everything except that arc. The final output is therefore y = F(x) + x.
The identity mapping is the x in the formula, and the residual mapping is the "difference", i.e. y − x, so the residual is the F(x) part.
There are two kinds of shortcut connection:
(1) When the dimensions match, the shortcut is an identity mapping and F(x) and x are added element-wise:
y = F(x, {Wi}) + x
F = W2·σ(W1·x)
(2) When the two dimensions differ, there are two options:
(2.1) Add the extra dimensions (channels) directly through zero padding.
(2.2) Apply a linear projection to x to match the dimensions. This is implemented with a 1×1 convolution, whose number of filters directly sets the output channel count. It adds parameters.

- If F(x) and x have different channel counts, e.g. 3×3×64 versus 3×3×128 convolution operations (64 versus 128 channels), the computation becomes:
y = F(x) + Ws·x
where Ws is a convolution operation (128 filters of size 3×3×64) used to adjust the channel dimension of x.
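The two shortcut options can be illustrated with a toy 1-D sketch: plain Python lists stand in for feature maps (a real implementation adds whole tensors channel-wise). When dimensions match, y = F(x) + x is a plain element-wise add; when F(x) is wider, option (2.1) zero-pads x before adding.

```python
def shortcut_add(fx, x):
    """y = F(x) + x.  If F(x) is wider than x, zero-pad x (option 2.1)
    so the element-wise addition is defined."""
    if len(fx) < len(x):
        raise ValueError("F(x) must be at least as wide as x")
    padded = x + [0.0] * (len(fx) - len(x))
    return [a + b for a, b in zip(fx, padded)]

print(shortcut_add([1.0, 2.0], [0.5, 0.5]))   # [1.5, 2.5]  (identity shortcut)
print(shortcut_add([1.0, 2.0, 3.0], [0.5]))   # [1.5, 2.0, 3.0]  (zero-padded x)
```

Option (2.2) would replace the zero padding with a learned linear projection Ws·x; the addition itself is unchanged.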

ResNet network structures
