Paper Reading (2): VGGNet for Classification
2022-07-28 23:02:00 【leu_mon】
VGGNet
Authors: Karen Simonyan & Andrew Zisserman
Affiliations: University of Oxford and Google DeepMind
Result: ILSVRC 2014 classification runner-up (and localization winner)
Title: Very Deep Convolutional Networks for Large-Scale Image Recognition
Abstract
In this work, we investigate the effect of convolutional network depth on accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3×3) convolution filters, which shows that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured first and second places in the localization and classification tracks respectively. We also show that our representations generalize well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on deep visual representations in computer vision.
Background
As ConvNets have become increasingly popular in computer vision, many attempts have been made to improve the original architecture of Krizhevsky et al. (2012) in pursuit of better accuracy. This paper addresses another important aspect of ConvNet architecture design: its depth. To this end, the other parameters of the architecture are fixed, very small (3×3) convolution filters are used in all layers, and the depth of the network is steadily increased by adding more convolutional layers.
Model
All hidden layers are equipped with the ReLU activation function. None of the networks (except for one) contain local response normalization (LRN). The configurations range from network A with 11 weight layers (8 convolutional and 3 fully connected layers) to network E with 19 weight layers (16 convolutional and 3 fully connected layers).
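As an illustration, here is a minimal PyTorch sketch of how the convolutional parts of configurations A and E can be built from layer lists. The channel numbers follow Table 1 of the paper; the helper function and variable names are my own, not the authors' code:

```python
import torch.nn as nn

# Channel configurations from Table 1 of the paper; 'M' marks a 2x2 max-pool.
cfg_A = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
cfg_E = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
         512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']

def make_features(cfg):
    layers, in_ch = [], 3
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            # 3x3 convolution with padding 1 preserves spatial resolution.
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

features_A = make_features(cfg_A)  # 8 conv layers (+3 FC layers = 11 weight layers)
features_E = make_features(cfg_E)  # 16 conv layers (+3 FC layers = 19 weight layers)
```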
A stack of three 3×3 convolutional layers is used in place of a single 7×7 layer. First, three non-linear rectification layers are incorporated instead of one, which makes the decision function more discriminative. Second, the number of parameters is reduced: this can be seen as imposing a regularization on the 7×7 convolution filters, forcing them to be decomposed through the 3×3 filters (with non-linearities injected in between).
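To make the parameter saving concrete: assuming both input and output have C channels, the three-layer 3×3 stack has 3·(3²C²) = 27C² weights, while a single 7×7 layer has 7²C² = 49C², i.e. about 81% more. A quick sketch verifying this (the channel count C = 64 is just an example of mine; counts below include biases):

```python
import torch.nn as nn

C = 64  # example channel count

# Three stacked 3x3 convolutions: same 7x7 effective receptive field.
stack_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)
single_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3)

n_stack = sum(p.numel() for p in stack_3x3.parameters())   # 3*(9*C*C + C) = 110,784
n_7x7   = sum(p.numel() for p in single_7x7.parameters())  # 49*C*C + C    = 200,768
print(n_stack, n_7x7)
```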
Incorporating 1×1 convolutional layers (configuration C, Table 1) is a way to increase the non-linearity of the decision function without affecting the receptive fields of the convolutional layers.
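A minimal sketch of such a layer (the channel count 256 is illustrative): since the kernel covers a single spatial position, the receptive field is unchanged, and the following ReLU adds the extra non-linearity.

```python
import torch.nn as nn

# A 1x1 convolution is a per-pixel linear projection across channels;
# the ReLU after it injects extra non-linearity without enlarging the
# receptive field.
conv_1x1 = nn.Sequential(nn.Conv2d(256, 256, kernel_size=1),
                         nn.ReLU(inplace=True))
```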
Results
- The network with local response normalization (A-LRN) does not improve on model A, which contains no normalization layers.
- The classification error decreases as the ConvNet depth increases; configuration C (which contains three 1×1 convolutional layers) performs worse than configuration D, which uses 3×3 convolutions throughout the network.
- The error rate saturates when the depth reaches 19 layers, but even deeper models might be beneficial for larger datasets.
- A shallow network (with each pair of 3×3 layers in B replaced by a single 5×5 layer) was measured to have a top-1 error rate (on center-crop images) 7% higher than that of network B, confirming that a deep network with small convolution filters outperforms a shallow network with larger filters.
- Training with scale jitter (S ∈ [256; 512]) gives better results than training with a fixed smallest side (S = 256 or S = 384), confirming that training-set augmentation by scale jittering indeed helps capture multi-scale image statistics.
The table above uses multi-scale testing; comparison with the previous table shows that scale jitter at test time leads to better performance.
Using multiple crops (multi-crop) performs slightly better than dense evaluation, and the two approaches are complementary, since their combination outperforms either one alone. The authors attribute this to the different treatment of convolution boundary conditions.
- multi-crop: take multiple crops of the image, run the network on each crop, and average all the resulting predictions.
- dense: following the FCN idea, feed the whole image into the network directly; the fully connected layers are converted to convolutions (the first FC layer to a 7×7 convolution, the last two to 1×1 convolutions), so the network produces a class score map whose entries are then averaged (see the sketch below).
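A minimal sketch of this conversion, assuming the 512×7×7 feature-map size the VGG networks produce for a 224×224 input. Only the first FC layer is shown; the last two (4096→4096→1000) convert to 1×1 convolutions the same way:

```python
import torch
import torch.nn as nn

# Original classifier layer: operates on a flattened 512*7*7 feature vector.
fc1 = nn.Linear(512 * 7 * 7, 4096)

# Equivalent convolution: a 7x7 kernel applied to the 512-channel feature map.
conv1 = nn.Conv2d(512, 4096, kernel_size=7)
conv1.weight.data.copy_(fc1.weight.data.view(4096, 512, 7, 7))
conv1.bias.data.copy_(fc1.bias.data)

# On an input larger than 224x224 the converted network yields a spatial
# score map instead of a single vector, which is then spatially averaged.
scores = conv1(torch.randn(1, 512, 9, 9))  # e.g. feature map of a 288x288 image
avg = scores.mean(dim=(2, 3))              # average the score map -> (1, 4096)
```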
Six single-scale networks and one multi-scale network were trained; the ensemble of all 7 networks achieves a 7.3% ILSVRC test error (using multiple test scales). Combining only the two best-performing multi-scale models (configurations D and E), with both dense and multi-crop evaluation, reduces the test error to 6.8%.
The model proposed in this paper significantly outperforms the best results of the ILSVRC-2012 and ILSVRC-2013 competitions.
Training & Testing
- Batch size 256, momentum 0.9, weight decay (L2 penalty coefficient set to 5·10⁻⁴).
- Dropout regularization for the first two fully connected layers (dropout ratio set to 0.5).
- The learning rate is initially set to 0.01 and decreased by a factor of 10 when validation accuracy stops improving; it is decreased 3 times in total, and training runs for 74 epochs (see the sketch after this list).
- The weights are initialized with the method of Glorot & Bengio (2010), without pre-training.
- Fixed-size 224×224 input images are obtained by randomly cropping rescaled training images.
- During training, the crops undergo random horizontal flipping and random RGB color shift; at test time, the test set is augmented by horizontal flipping of the images.
- Using a large number of crops can improve accuracy, since it results in a finer sampling of the input image compared to the fully convolutional network.
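A minimal PyTorch sketch of the training setup described above. Assumptions of mine: torchvision's vgg16 stands in for the paper's configuration D, ReduceLROnPlateau stands in for the "decrease when validation accuracy stops improving" rule, and ColorJitter stands in for the paper's PCA-based RGB color shift:

```python
import random
import torch
import torch.nn as nn
import torchvision.transforms as T
import torchvision.transforms.functional as TF
from torchvision.models import vgg16

class ScaleJitter:
    """Rescale so the smallest image side is a random S in [256, 512]."""
    def __call__(self, img):
        return TF.resize(img, random.randint(256, 512))

# Augmentation: scale jitter, random 224x224 crop, horizontal flip, color shift.
train_transform = T.Compose([
    ScaleJitter(),
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1),
    T.ToTensor(),
])

model = vgg16()  # configuration D; dropout 0.5 in the classifier by default

# Glorot/Xavier initialization, as in the paper's no-pretraining variant.
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
# Divide the learning rate by 10 when validation accuracy plateaus
# (this happened 3 times over the paper's 74 epochs).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max',
                                                       factor=0.1)
# Per epoch: train, evaluate, then call scheduler.step(val_accuracy).
```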
Highlights of the paper
- The depth of the network is increased while the topology stays simple.
- Large kernels are replaced by stacks of small ones, and 1×1 convolutional layers are used; both ideas had been tried in earlier papers.
- Experiments show that the local response normalization layer has no obvious effect.
- Using multiple test scales at test time improves performance.
Authors' outlook
- Network depth is important for visual representations.