Paper Reading (0): AlexNet for Classification
2022-07-28 22:54:00, by leu_mon
AlexNet
Author: Alex Krizhevsky (with Ilya Sutskever and Geoffrey E. Hinton)
Affiliation: University of Toronto
Venue: NeurIPS 2012; ILSVRC 2012 champion
Title: ImageNet Classification with Deep Convolutional Neural Networks
Abstract
We trained a large deep convolutional neural network to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, considerably better than the previous state of the art. The network has 60 million parameters and 650,000 neurons, and consists of 5 convolutional layers (some followed by pooling layers) and 3 fully connected layers, ending in a 1000-way softmax. To make training faster, we used non-saturating neurons (ReLU) and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers, we used a recently developed regularization method called dropout, which proved very effective. We also entered a variant of this model in the ILSVRC-2012 competition and won, achieving a top-5 error rate of 15.3%, compared to 26.2% for the second-best entry.
Background
The immense complexity of the object recognition task means that the problem cannot be fully specified even by very large datasets, so a model needs strong prior knowledge to compensate. The capacity of convolutional neural networks (CNNs) can be controlled by varying their depth and breadth, and they make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies).
Compared with standard feedforward neural networks with similarly sized layers, CNNs have far fewer connections and parameters, making them easier to train, yet training them on high-resolution images has historically remained expensive.
Current GPUs, paired with a highly optimized implementation of 2D convolution, are powerful enough to train large CNNs, and recent datasets such as ImageNet contain enough labeled examples to train such models without severe overfitting.
Model
(Figure: the AlexNet architecture from the paper, split across two GPUs)
The first 5 layers of the network are convolutional and the remaining 3 are fully connected. The output of the last fully connected layer is fed to a 1000-way softmax, which produces a distribution over the 1000 class labels. The network maximizes the multinomial logistic regression objective, which is equivalent to maximizing the average log-probability of the correct label under the predicted distribution over the training examples.
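For concreteness, here is a minimal single-GPU PyTorch sketch of that 5-conv + 3-FC layout, using the channel sizes reported in the paper (the original splits these layers across two GPUs, and its LRN layers are omitted here; see the Highlights section for those):

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """Single-GPU sketch of the AlexNet layout (input: 3x227x227)."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),    # conv1
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),         # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2),  # conv2
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(256 * 6 * 6, 4096),  # fc6
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),         # fc7
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # fc8; softmax applied by the loss
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

# nn.CrossEntropyLoss applies log-softmax internally, so minimizing it
# maximizes the average log-probability of the correct label, i.e. the
# multinomial logistic regression objective described above.
model = AlexNet()
loss_fn = nn.CrossEntropyLoss()
```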
Results
On the ILSVRC-2012 training set, the authors trained five CNNs and averaged their predictions, obtaining a 16.4% error rate. They also pre-trained on the entire ImageNet Fall 2011 release, appended an extra convolutional layer after the last pooling layer of the network above, fine-tuned on the ILSVRC-2012 training set, and trained two such CNNs. Averaging the predictions of all seven CNNs gave a 15.3% top-5 error rate; the second-place entry achieved 26.2%.
The paper also visualizes the convolutional kernels learned by the network's first layer on the two GPUs. The network has learned a variety of frequency- and orientation-selective kernels, as well as assorted colored blobs.
(Left) What the network has learned is assessed qualitatively by computing the top-5 predictions on 8 test images. Note that even objects that are off-center can be recognized by the network, such as the mite in the top-left corner. (Right) The first column shows 5 ILSVRC-2010 test images. The remaining columns show 6 training images whose feature vectors in the last hidden layer have the smallest Euclidean distance to the feature vector of the test image.
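A minimal sketch of that feature-space retrieval idea, assuming last-hidden-layer features have already been extracted into tensors (the tensor names and shapes here are illustrative):

```python
import torch

# Hypothetical features: one 4096-d last-hidden-layer vector per image.
train_feats = torch.randn(10_000, 4096)  # features of 10,000 training images
test_feat = torch.randn(4096)            # feature of one test image

# Euclidean distance in feature space; the six smallest distances pick
# out the six most similar training images, as in the paper's figure.
dists = torch.norm(train_feats - test_feat, dim=1)
nearest = torch.topk(dists, k=6, largest=False).indices
print(nearest)
```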
Training & Testing
- Mini-batch size of 128, momentum of 0.9, weight decay of 0.0005
- Weights in every layer initialized from a zero-mean Gaussian with standard deviation 0.01
- Learning rate initialized to 0.01 and multiplied by 0.1 whenever the validation error stops improving (see the sketch after this list)
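A minimal PyTorch sketch of that training configuration, reusing the `AlexNet` module from the earlier sketch (`ReduceLROnPlateau` is a modern stand-in for the paper's manual divide-by-10 rule; the paper also initializes some biases to 1, zeroed here for simplicity):

```python
import torch
import torch.nn as nn

model = AlexNet()  # from the sketch in the Model section

# Zero-mean Gaussian initialization with standard deviation 0.01
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        nn.init.zeros_(m.bias)

# SGD with momentum 0.9, weight decay 0.0005, initial learning rate 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)

# Multiply the learning rate by 0.1 when the validation error plateaus
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1)

# Inside the training loop, call scheduler.step(val_error) once per epoch.
```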
Highlights
Ranked by the authors' judgment of importance:
- ReLU activation function: trains several times faster than an equivalent network with tanh activations
- Multi-GPU model-parallel training: the pattern of connectivity between GPUs is tuned by cross-validation
- Local response normalization (LRN): rarely seen in follow-up work
- Overlapping pooling: earlier pooling schemes used non-overlapping windows; with overlapping pooling the model is slightly harder to overfit
- Data augmentation: image translations and horizontal flips, plus altering the intensities of the RGB channels
- Dropout: reduces overfitting. Model ensembling also reduces test error but costs too much time; dropout sets each hidden neuron's output to 0 with probability 0.5 and only roughly doubles training time (see the sketch after this list)
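A minimal PyTorch sketch of these ingredients side by side (the LRN hyperparameters follow the paper; the augmentation pipeline is a simplified stand-in for the paper's 224×224 random crops and PCA-based RGB perturbation, with `ColorJitter` used as an approximation):

```python
import torch.nn as nn
from torchvision import transforms

# ReLU: non-saturating, so gradient descent converges much faster than tanh
relu = nn.ReLU(inplace=True)

# Local response normalization with the paper's hyperparameters
# (n=5, alpha=1e-4, beta=0.75, k=2)
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

# Overlapping pooling: window 3, stride 2, so neighboring windows overlap
pool = nn.MaxPool2d(kernel_size=3, stride=2)

# Dropout: zero each hidden neuron's output with probability 0.5
drop = nn.Dropout(p=0.5)

# Simplified augmentation: random crop + horizontal flip; ColorJitter
# approximates the paper's PCA-based RGB intensity perturbation.
augment = transforms.Compose([
    transforms.RandomResizedCrop(227),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])
```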
Authors' Outlook
- Results can be improved simply by waiting for faster GPUs and larger datasets to become available
- Very large deep convolutional networks could be applied to video sequences, where the temporal structure provides very helpful information