Paper Reading (2): VGGNet for Classification
2022-07-28 23:02:00 【leu_mon】
VGGNet
Authors: Karen Simonyan & Andrew Zisserman
Affiliation: University of Oxford and Google DeepMind
Result: ILSVRC 2014 classification runner-up
Title: Very Deep Convolutional Networks for Large-Scale Image Recognition
Abstract
In this work, we investigate the effect of convolutional network depth on accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3×3) convolution filters, which shows that a significant improvement on prior-art configurations can be achieved by pushing the depth to 16–19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured first and second place in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on deep visual representations in computer vision.
Background
As ConvNets become increasingly popular in computer vision, many attempts have been made to improve the original architecture of Krizhevsky et al. (2012) in pursuit of better accuracy. This paper addresses another important aspect of ConvNet architecture design: depth. To this end, the other parameters of the architecture are fixed, very small (3×3) convolution filters are used in all layers, and the depth of the network is steadily increased by adding more convolutional layers.
Model

All hidden layers are equipped with the ReLU activation function, and none of the networks (except one) contains Local Response Normalisation (LRN). The configurations range from network A with 11 weight layers (8 convolutional and 3 fully connected) to network E with 19 weight layers (16 convolutional and 3 fully connected).
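As a concrete illustration, here is a minimal PyTorch sketch (not the authors' original code; the layer list follows Table 1 of the paper) that builds the convolutional part of configuration D, i.e. VGG-16:

```python
import torch.nn as nn

# Configuration D (VGG-16): 13 conv layers + 3 FC layers.
# Numbers are output channel counts; 'M' marks a 2x2 max-pool, stride 2.
cfg_D = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']

def make_features(cfg):
    layers, in_ch = [], 3  # RGB input
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            # every conv is 3x3 with padding 1, followed by ReLU
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

features = make_features(cfg_D)
# The classifier then stacks three FC layers: 4096 -> 4096 -> 1000 (soft-max).
```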
A stack of three 3×3 convolutional layers replaces a single 7×7 layer while covering the same effective receptive field. First, three non-linear rectification layers are incorporated instead of a single one, which makes the decision function more discriminative. Second, the number of parameters is reduced: this can be seen as imposing a regularisation on the 7×7 filters, forcing them to be decomposed through 3×3 filters (with non-linearities injected in between), as the sketch below makes concrete.
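A quick check of the parameter saving, in a small sketch with an assumed channel count C = 64 (any C works; the paper's argument gives 27C² vs 49C² weights):

```python
import torch
import torch.nn as nn

C = 64  # channel count chosen for illustration

single_7x7 = nn.Conv2d(C, C, kernel_size=7, padding=3)
stacked_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(single_7x7))   # 7*7*C*C + C     = 200,768 for C = 64
print(n_params(stacked_3x3))  # 3*(3*3*C*C + C) = 110,784 for C = 64

x = torch.randn(1, C, 56, 56)
assert single_7x7(x).shape == stacked_3x3(x).shape  # same output size
```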
Incorporating 1×1 convolutional layers (configuration C, Table 1) is a way to increase the non-linearity of the decision function without affecting the receptive fields of the convolutional layers; a short sketch follows.
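A minimal illustration of the idea (the channel count 256 is an arbitrary choice):

```python
import torch
import torch.nn as nn

# A 1x1 convolution is a per-pixel linear map across channels; the ReLU
# after it adds non-linearity while leaving the receptive field untouched.
block = nn.Sequential(nn.Conv2d(256, 256, kernel_size=1),
                      nn.ReLU(inplace=True))
y = block(torch.randn(1, 256, 28, 28))
print(y.shape)  # (1, 256, 28, 28): spatial size unchanged
```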
Results

- Local response normalisation (the A-LRN network) does not improve on model A, which contains no normalisation layers.
- The classification error decreases as the ConvNet depth increases; configuration C (which contains three 1×1 convolutional layers) performs worse than configuration D, which uses 3×3 convolutions throughout the network.
- The error rate saturates when the depth reaches 19 layers, but even deeper models might be beneficial for larger datasets.
- A shallow network (derived from B by replacing each pair of 3×3 layers with a single 5×5 layer) was measured to have a top-1 error 7% higher than that of network B on centre-crop images, confirming that a deep network with small filters outperforms a shallow network with larger filters.
- Training with scale jittering (S ∈ [256; 512]) gives better results than training with a fixed smallest side (S = 256 or S = 384), confirming that training-set augmentation by scale jittering is indeed helpful for capturing multi-scale image statistics.

The table above uses multi-scale testing; comparison with the previous table shows that scale jittering at test time leads to better performance.
Using multiple crops (multi-crop) performs slightly better than dense evaluation, and the two approaches are complementary: their combination outperforms each of them alone. The authors attribute this to the different treatment of convolution boundary conditions:
- multi-crop: take multiple random crops of the image, run each crop through the network, and average the resulting predictions.
- dense: following the FCN idea, feed the whole uncropped image into the network, with the fully connected layers recast as convolutions (the first as a 7×7 conv, the rest as 1×1 convs); the network outputs a class score map, whose entries are then averaged (see the sketch after this list).
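To illustrate the dense scheme, here is a minimal sketch that recasts the fully connected layers as convolutions, assuming torchvision's vgg16 weight layout (the fc_to_conv helper is hypothetical, not part of the paper's code):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

def fc_to_conv(fc: nn.Linear, in_ch: int, k: int) -> nn.Conv2d:
    """Copy a Linear layer's weights into an equivalent Conv2d."""
    conv = nn.Conv2d(in_ch, fc.out_features, kernel_size=k)
    conv.weight.data.copy_(fc.weight.data.view(fc.out_features, in_ch, k, k))
    conv.bias.data.copy_(fc.bias.data)
    return conv

vgg = vgg16(weights=None).eval()
fc1, fc2, fc3 = vgg.classifier[0], vgg.classifier[3], vgg.classifier[6]

fully_conv = nn.Sequential(
    vgg.features,              # conv backbone, unchanged
    fc_to_conv(fc1, 512, 7),   # FC-4096 -> 7x7 conv
    nn.ReLU(inplace=True),
    fc_to_conv(fc2, 4096, 1),  # FC-4096 -> 1x1 conv
    nn.ReLU(inplace=True),
    fc_to_conv(fc3, 4096, 1),  # FC-1000 -> 1x1 conv
)

with torch.no_grad():
    score_map = fully_conv(torch.randn(1, 3, 384, 384))  # whole image
print(score_map.shape)  # (1, 1000, 6, 6); average spatially for class scores
```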

Six single-scale networks and one multi-scale network were trained; an ensemble of all 7 networks achieves a 7.3% ILSVRC test error (with multiple scales used at test time). Combining only the two best-performing multi-scale models (configurations D and E), and using both dense and multi-crop evaluation, reduces the test error to 6.8%.
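The combination averages the soft-max class posteriors of the individual models, roughly as in this minimal sketch:

```python
import torch

def ensemble_predict(models, x):
    """Average the soft-max class posteriors of several trained models."""
    with torch.no_grad():
        probs = [m(x).softmax(dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)  # (batch, num_classes)
```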

The model proposed in this paper significantly outperforms the best-performing models of the previous generation from the ILSVRC-2012 and ILSVRC-2013 competitions.
Training & Testing
- Batch size 256, momentum 0.9, weight decay (L2 penalty multiplier set to 5·10⁻⁴).
- Dropout regularisation for the first two fully connected layers (dropout ratio set to 0.5).
- The learning rate is initially set to 0.01 and decreased by a factor of 10 when validation accuracy stops improving; it is decreased 3 times in total, and training runs for 74 epochs.
- Weights are initialised with the method of Glorot & Bengio (2010), without pre-training.
- Fixed-size 224×224 input images are obtained by random crops from the rescaled training images.
- During training, the crops are additionally augmented by random horizontal flipping and random RGB colour shift; at test time, the test set is augmented by horizontal flipping of the images.
- Using a large number of crops can improve accuracy, as it results in a finer sampling of the input image compared to the fully convolutional net (a rough modern analogue of these training settings is sketched after this list).
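A rough, present-day PyTorch analogue of the settings above (hypothetical helper code, not the authors' original setup; the paper's PCA-based RGB colour shift is omitted):

```python
import random
import torch
from torchvision import models, transforms as T
from torchvision.transforms import functional as F

class ScaleJitter:
    """Rescale so the shorter side is a random S in [256, 512]."""
    def __call__(self, img):
        return F.resize(img, random.randint(256, 512))

train_tf = T.Compose([
    ScaleJitter(),
    T.RandomCrop(224),         # fixed-size 224x224 crop
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

model = models.vgg16(weights=None)  # dropout(0.5) on the FC layers by default

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=5e-4)
# Divide the learning rate by 10 when validation accuracy plateaus
# (the paper did this three times over 74 epochs).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.1)
```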
Highlights of the paper
- Greater network depth with a simple, homogeneous topology.
- Large kernels are replaced with stacks of small ones, and 1×1 convolutional layers are used; both ideas had already been tried in work predating this paper (e.g. Ciresan et al., 2011, and the Network in Network of Lin et al., 2014).
- Experiments show that the local response normalisation layer brings no obvious benefit.
- Using multiple scales at test time improves performance.
Authors' outlook
- Network depth is important for visual representations.