
CV learning notes cnn-vgg

2022-06-09 07:56:00 Moresweet

CNN-VGG

1. The process of image recognition

  • **Acquiring the original information:** information obtained from the outside world through sensors (such as images) is converted into signals that a computer can process.

  • **Preprocessing:** operations such as translation, rotation, and de-noising, whose purpose is to enhance the features of interest in the image.

  • **Feature extraction and selection:** in pattern recognition, features must be extracted and then selected; this is one of the key technologies in image recognition.

  • **Classifier design:** a recognition rule obtained through training; with this rule the features can be classified so that the recognition system achieves a high recognition rate. The classification decision assigns the recognized object to a class in the feature space, so as to better determine which class the object belongs to.

2. Classification and detection

Classification: given an image, determine which category it belongs to.

Detection: find the objects of a given category in an image and give the regions in the image where they exist.


3. VGG neural network

Common CNNs (convolutional neural networks)

(Figure: common CNN architectures and their development branches; original image unavailable)

As mentioned in my last blog post, AlexNet is the most classic CNN model, and VGG is a network model improved on the basis of AlexNet. The branches that developed later (blocks of different colors in the figure are different branches) each have their own optimization characteristics; VGG is characterized by a deep network, reaching 16-19 layers, and by the use of small (3x3) convolution kernels.

VGG19 (the number indicates the number of layers; the network is composed of convolution, pooling, fully connected and other operations) is shown below:

(Figure: VGG19 network structure; original image unavailable)

Let's take VGG16 as an example for analysis:

The VGG16 network has 16 layers with parameters: 13 convolution layers and 3 fully connected layers. It also contains 5 pooling layers and activation layers, but these are not counted. (Note: pooling and activation layers contain no parameters, so the "16" in VGG16 counts only the convolution and fully connected layers.)
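As a quick check, here is a minimal sketch (assuming PyTorch and torchvision are installed) that counts the parameterized layers in the stock VGG16 model; the module names come from torchvision, not from the original post:

```python
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16()  # architecture only, no pretrained weights
n_conv = sum(isinstance(m, nn.Conv2d) for m in model.modules())
n_fc   = sum(isinstance(m, nn.Linear) for m in model.modules())
n_pool = sum(isinstance(m, nn.MaxPool2d) for m in model.modules())
print(n_conv, n_fc, n_pool)  # 13 3 5 -> 13 conv + 3 FC = 16 parameterized layers
```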

(Figure: VGG16 network structure; original image unavailable)

Example :

(Figure: VGG16 example; original image unavailable)

Layer-by-layer analysis:

The input image is (224,224,3), i.e. a three-channel image with a resolution of 224x224; if the image size differs, it must be resized first.

Output size formula (applied per spatial dimension): N = (W - F + 2P)/S + 1, where W is the input size, F the kernel size, P the padding, and S the stride.
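For example, with the 3x3 convolutions used throughout VGG (P = 1, S = 1) on a 224-wide input: N = (224 - 3 + 2·1)/1 + 1 = 224, so the spatial size is preserved; the 2x2 max pooling with stride 2 and no padding gives N = (224 - 2 + 0)/2 + 1 = 112, halving it.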

conv1: two [3,3] convolutions with 64 output channels, output (224,224,64); then 2x2 max pooling, output (112,112,64).
conv2: two [3,3] convolutions with 128 output channels, output (112,112,128); then 2x2 max pooling, output (56,56,128).
conv3: three [3,3] convolutions with 256 output channels, output (56,56,256); then 2x2 max pooling, output (28,28,256).
conv4: three [3,3] convolutions with 512 output channels, output (28,28,512); then 2x2 max pooling, output (14,14,512).
conv5: three [3,3] convolutions with 512 output channels, output (14,14,512); then 2x2 max pooling, output (7,7,512).
The first two fully connected layers are simulated with convolutions (the effect is equivalent); each outputs (1,1,4096).
The last fully connected layer is simulated with a convolution as well, outputting (1,1,1000).

The final output is the prediction score for each of the 1000 classes.
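Below is a minimal sketch (assuming PyTorch is available) that builds the VGG16 convolutional stack described above and prints the feature-map shape after each pooling step; the configuration list and variable names are my own, introduced only for illustration:

```python
import torch
import torch.nn as nn

# Each tuple: (number of 3x3 conv layers in the block, output channels).
cfg = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

layers, in_ch = [], 3
for num_convs, out_ch in cfg:
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves H and W
features = nn.Sequential(*layers)

x = torch.randn(1, 3, 224, 224)      # a dummy 224x224 RGB image
for layer in features:
    x = layer(x)
    if isinstance(layer, nn.MaxPool2d):
        print(tuple(x.shape))         # (1,64,112,112) ... (1,512,7,7)
```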

VGG's optimization strategies

Replacing fully connected layers with convolution layers

A fully connected layer is mathematically a convolution whose kernel size equals the size of the feature map; so to replace a fully connected layer with a convolution layer, simply set the convolution kernel to the size of the input feature map.

Why do this? Because many compute resources (such as CUDA) are heavily optimized for convolution operations and can accelerate them, while fully connected layers benefit less, so the substitution can improve efficiency.

For example, the input to VGG16's first fully connected layer is 7x7x512 and its output is 1x1x4096. This can be represented equivalently by a convolution layer with kernel size 7x7, stride 1, no padding, and 4096 output channels; its output of 1x1x4096 is equivalent to that of the fully connected layer. The subsequent fully connected layers can likewise be replaced equivalently with 1x1 convolutions.
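A minimal sketch (assuming PyTorch) of this equivalence: copy the weights of a 7x7x512 → 4096 fully connected layer into a 7x7 convolution and check that both produce the same output. The variable names are illustrative only:

```python
import torch
import torch.nn as nn

fc = nn.Linear(7 * 7 * 512, 4096)                   # original fully connected layer
conv = nn.Conv2d(512, 4096, kernel_size=7, stride=1, padding=0)

# Reshape the FC weights (4096, 7*7*512) into conv weights (4096, 512, 7, 7).
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(4096, 512, 7, 7))
    conv.bias.copy_(fc.bias)

x = torch.randn(1, 512, 7, 7)                       # a dummy 7x7x512 feature map
out_fc = fc(x.flatten(1))                           # shape (1, 4096)
out_conv = conv(x).flatten(1)                       # shape (1, 4096)
print(torch.allclose(out_fc, out_conv, atol=1e-5))  # True: the two are equivalent
```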

1x1 Convolution

A 1x1 convolution can increase or reduce the number of feature channels: the number of convolution kernels directly determines the number of output channels. By contrast, a pooling layer cannot change the number of channels; it can only reduce the spatial dimensions (height and width) of the feature map.
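A minimal sketch (assuming PyTorch) contrasting channel reduction with a 1x1 convolution against pooling, which only shrinks the spatial size; the channel counts here are illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 512, 14, 14)              # a dummy feature map: 512 channels, 14x14

reduce = nn.Conv2d(512, 256, kernel_size=1)  # 256 kernels -> 256 output channels
pool = nn.MaxPool2d(kernel_size=2, stride=2) # pooling cannot change the channel count

print(reduce(x).shape)  # torch.Size([1, 256, 14, 14]) - channels reduced, spatial size kept
print(pool(x).shape)    # torch.Size([1, 512, 7, 7])   - spatial size halved, channels kept
```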

These are personal study notes, shared for learning and exchange only; please indicate the source when reprinting!

Original article: https://yzsam.com/2022/160/202206090741273726.html

Copyright notice: this article was created by [Moresweet]; please include a link to the original when reprinting. Thank you.