Paper reading (2): VGGNet for classification
2022-07-28 23:02:00 【leu_mon】
VGGNet
Authors: Karen Simonyan & Andrew Zisserman
Affiliations: University of Oxford and Google DeepMind
Result: ILSVRC 2014 classification runner-up
Title: Very Deep Convolutional Networks for Large-Scale Image Recognition
Abstract
In this work, we investigate the effect of convolutional network depth on accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3×3) convolution filters, which shows that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured first and second places in the localization and classification tracks respectively. We also show that our representations generalize well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
Background
As ConvNets have become increasingly popular in computer vision, many attempts have been made to improve the original architecture of Krizhevsky et al. (2012) in pursuit of better accuracy. This paper addresses another important aspect of ConvNet architecture design: depth. To this end, the other parameters of the architecture are fixed, very small (3×3) convolution filters are used in all layers, and the depth of the network is steadily increased by adding more convolutional layers.
Model
[Table 1: ConvNet configurations A–E, from the paper]
All hidden layers are equipped with the ReLU activation function, and none of the networks (except one) contain local response normalization (LRN). The configurations range from network A, with 11 weight layers (8 convolutional and 3 fully connected), to network E, with 19 weight layers (16 convolutional and 3 fully connected).
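For concreteness, here is a minimal PyTorch sketch of the convolutional stack of configuration D (VGG-16). The layer widths (64-128-256-512) and pooling positions follow the paper; the `cfg_d` list and `make_layers` helper are illustrative names, not code from the authors.

```python
import torch
import torch.nn as nn

# Configuration D (VGG-16): numbers are output channels of 3x3 convolutions,
# 'M' marks a 2x2 max-pooling layer.
cfg_d = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']

def make_layers(cfg, in_channels=3):
    """Build the convolutional stack from a config list (illustrative helper)."""
    layers = []
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers.append(nn.Conv2d(in_channels, v, kernel_size=3, padding=1))
            layers.append(nn.ReLU(inplace=True))
            in_channels = v
    return nn.Sequential(*layers)

features = make_layers(cfg_d)
x = torch.randn(1, 3, 224, 224)
print(features(x).shape)  # torch.Size([1, 512, 7, 7])
```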
A stack of three 3×3 convolutional layers is used in place of a single 7×7 layer. First, three non-linear rectification layers are incorporated instead of a single one, which makes the decision function more discriminative. Second, the number of parameters is reduced: this can be seen as imposing a regularization on the 7×7 convolution filters, forcing them to decompose through the 3×3 filters (with non-linearity injected in between).
Incorporating 1×1 convolutional layers (configuration C, Table 1) is a way to increase the non-linearity of the decision function without affecting the receptive fields of the convolutional layers.
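To make the parameter saving concrete: for C input and output channels, a three-layer 3×3 stack has 3·(3²C²) = 27C² weights, versus 7²C² = 49C² for a single 7×7 layer (arithmetic from the paper). Below is a minimal verification sketch; the channel count C = 512 is chosen only for illustration.

```python
import torch.nn as nn

C = 512  # channel count, chosen for illustration

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# Single 7x7 convolution: 49*C^2 weights (plus biases).
single_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3)

# Stack of three 3x3 convolutions with ReLUs in between: 27*C^2 weights.
stack_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1),
)

print(n_params(single_7x7))  # 12,845,568 = 49*512*512 + 512
print(n_params(stack_3x3))   # 7,079,424  = 3*(9*512*512 + 512)
```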
Results
[Table: ConvNet performance at a single test scale]
- The local response normalization network (A-LRN) does not improve on model A, which contains no normalization layers.
- The classification error decreases as the ConvNet depth increases. Configuration C (which contains three 1×1 convolutional layers) performs worse than configuration D, which uses 3×3 convolutions throughout the network.
- The error rate saturates once the depth reaches 19 layers, but even deeper models might be beneficial for larger datasets.
- A shallow network (derived from B by replacing each pair of 3×3 convolutional layers with a single 5×5 layer) has a measured top-1 error rate 7% higher than that of network B (on center-crop images), which confirms that a deep network with small convolution kernels outperforms a shallow network with larger kernels.
- Scale jittering at training time (S ∈ [256; 512]) gives better results than training with a fixed smallest image side (S = 256 or S = 384), confirming that training-set augmentation by scale jittering indeed helps to capture multi-scale image statistics; a minimal augmentation sketch follows this list.
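A minimal sketch of the training-time scale jittering and cropping with torchvision, assuming S denotes the length of the smallest image side; the `ScaleJitter` class is an illustrative helper, not part of any library.

```python
import random
from torchvision import transforms
import torchvision.transforms.functional as TF

class ScaleJitter:
    """Resize so the smallest image side equals a random S in [s_min, s_max]."""
    def __init__(self, s_min=256, s_max=512):
        self.s_min, self.s_max = s_min, s_max

    def __call__(self, img):
        s = random.randint(self.s_min, self.s_max)
        return TF.resize(img, s)  # int size -> smallest side matched to s

train_transform = transforms.Compose([
    ScaleJitter(256, 512),              # training-scale jitter, S ~ U[256, 512]
    transforms.RandomCrop(224),         # fixed 224x224 crop
    transforms.RandomHorizontalFlip(),  # random horizontal flip
    # (the paper additionally applies a random RGB color shift, omitted here)
    transforms.ToTensor(),
])
```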
[Table: ConvNet performance at multiple test scales]
The table above uses multi-scale testing; comparison with the previous table shows that scale jittering at test time leads to better performance.
Multi-crop evaluation performs slightly better than dense evaluation, and the two approaches are complementary, since their combination outperforms each of them. The authors attribute this to the different treatment of convolution boundary conditions.
- multi-crop: multiple crops are sampled from each image, the network produces a prediction for each crop, and all results are averaged.
- dense: following the FCN idea, the whole image is fed directly into the network, with the fully connected layers converted to convolutions (the first FC layer to a 7×7 convolution, the last two to 1×1 convolutions); this yields a prediction score map, whose values are then averaged (a conversion sketch follows this list).
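A minimal sketch of the dense-evaluation idea, assuming a VGG-style classifier head whose first fully connected layer sees a 512×7×7 feature map; the `fc_to_conv` helper name is illustrative. Each `nn.Linear` is re-expressed as a convolution with the same weights, so the network can process images of arbitrary size and output a spatial map of class scores.

```python
import torch
import torch.nn as nn

def fc_to_conv(fc, in_channels, spatial):
    """Re-express a fully connected layer as an equivalent convolution (illustrative helper)."""
    conv = nn.Conv2d(in_channels, fc.out_features, kernel_size=spatial)
    conv.weight.data = fc.weight.data.view(fc.out_features, in_channels, spatial, spatial)
    conv.bias.data = fc.bias.data
    return conv

# VGG-style classifier head: FC layers that expect a 512x7x7 feature map.
fc1, fc2, fc3 = nn.Linear(512 * 7 * 7, 4096), nn.Linear(4096, 4096), nn.Linear(4096, 1000)

dense_head = nn.Sequential(
    fc_to_conv(fc1, 512, 7), nn.ReLU(inplace=True),   # 7x7 conv replaces first FC
    fc_to_conv(fc2, 4096, 1), nn.ReLU(inplace=True),  # 1x1 conv replaces second FC
    fc_to_conv(fc3, 4096, 1),                         # 1x1 conv yields class score map
)

feat = torch.randn(1, 512, 9, 9)        # features of a larger-than-224 test image
scores = dense_head(feat)               # shape (1, 1000, 3, 3): spatial score map
class_scores = scores.mean(dim=(2, 3))  # spatially average the map
print(class_scores.shape)               # torch.Size([1, 1000])
```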
[Table: multiple ConvNet fusion (ensemble) results]
Six single-scale networks and one multi-scale network were trained; an ensemble of these 7 networks achieves a 7.3% ILSVRC test error (using multiple test scales). Combining only the two best-performing multi-scale models (configurations D and E), with a combination of dense and multi-crop evaluation, reduces the test error to 6.8%.
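The paper combines models by averaging their soft-max class posteriors. A minimal sketch, assuming `models` is a list of trained networks that return raw class scores:

```python
import torch

@torch.no_grad()
def ensemble_predict(models, x):
    """Average the soft-max class posteriors of several trained networks (illustrative)."""
    probs = [model(x).softmax(dim=1) for model in models]
    return torch.stack(probs).mean(dim=0)  # (batch, num_classes)
```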
[Table: comparison with the state of the art in ILSVRC classification]
The model proposed in this paper significantly outperforms the previous generation of models that achieved the best results in the ILSVRC-2012 and ILSVRC-2013 competitions.
Training & Testing
- Batch size 256, momentum 0.9, weight decay (the L2 penalty multiplier is set to 5·10⁻⁴); a minimal optimizer sketch follows this list.
- Dropout regularization for the first two fully connected layers (dropout ratio set to 0.5).
- The learning rate is initially set to 10⁻² and is decreased by a factor of 10 whenever the validation accuracy stops improving; it is decreased 3 times in total, and training runs for 74 epochs.
- Weights are initialized with the random initialization procedure of Glorot & Bengio (2010), without pre-training.
- Fixed-size 224×224 input images are obtained by randomly cropping the rescaled training images.
- During training, the crops additionally undergo random horizontal flipping and random RGB color shift; at test time, the test set is augmented by horizontal flipping of the images.
- Using a large number of crops can improve accuracy, since it results in a finer sampling of the input image compared to the fully convolutional network.
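A minimal sketch of the optimization setup above in PyTorch; torchvision's `vgg16` stands in for the authors' model, and `ReduceLROnPlateau` approximates the paper's rule of dividing the learning rate by 10 when validation accuracy stops improving.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision.models import vgg16  # stand-in VGG-style model for illustration

model = vgg16(num_classes=1000)  # torchvision's head already uses dropout p=0.5

# Mini-batch SGD with momentum and weight decay, as described above.
optimizer = optim.SGD(model.parameters(), lr=1e-2,
                      momentum=0.9, weight_decay=5e-4)

# Divide the learning rate by 10 when validation accuracy plateaus (done 3 times in the paper).
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.1)

criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# After each epoch: scheduler.step(validation_accuracy)
```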
Highlights of the paper
- The depth of the network is increased while the network topology remains simple.
- Large kernels are replaced with small ones, and 1×1 convolutional layers are used; both ideas had been tried in earlier papers.
- Experiments show that the effect of the local response normalization layer is not obvious.
- Using multiple test scales at test time improves performance.
Authors' outlook
- The depth of the network is important for visual representations.