Paper Reading (2): VGGNet for Classification
2022-07-28 23:02:00 [leu_mon]
VGGNet
Authors: Karen Simonyan & Andrew Zisserman
Affiliation: University of Oxford and Google DeepMind
Result: runner-up in the ILSVRC 2014 classification task
Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition
Abstract
In this work, we investigate the effect of convolutional network depth on accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3×3) convolution filters, which shows that a significant improvement on prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured first and second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on deep visual representations in computer vision.
Background
As ConvNets have become increasingly popular in computer vision, many attempts have been made to improve the original architecture of Krizhevsky et al. (2012) in pursuit of better accuracy. This paper addresses another important aspect of ConvNet architecture design: depth. To this end, the other architectural parameters are fixed, very small (3×3) convolution filters are used in all layers, and the depth of the network is steadily increased by adding more convolutional layers.
Model
[Table 1 of the paper: ConvNet configurations A-E]
All hidden layers are equipped with the ReLU activation function. None of the networks above (except for one, A-LRN) contain Local Response Normalisation (LRN). The configurations range from network A with 11 weight layers (8 convolutional and 3 fully connected) to network E with 19 weight layers (16 convolutional and 3 fully connected).
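For concreteness, here is a minimal PyTorch sketch of configuration D (VGG-16) assembled from its layer list; the helper name `make_features` and the config encoding are my own, not from the paper:

```python
import torch
import torch.nn as nn

# Configuration D (VGG-16): numbers are output channels of 3x3 convs,
# 'M' marks a 2x2 max-pool. Depth grows while every kernel stays 3x3.
CFG_D = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']

def make_features(cfg):
    layers, in_ch = [], 3
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

model = nn.Sequential(
    make_features(CFG_D),
    nn.Flatten(),  # 224x224 input -> 512 x 7 x 7 after five pools
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
    nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
    nn.Linear(4096, 1000),
)
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```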
A stack of three 3×3 convolutional layers is used in place of a single 7×7 layer. First, it incorporates three non-linear rectification layers instead of one, which makes the decision function more discriminative. Second, it reduces the number of parameters: this can be seen as imposing a regularisation on the 7×7 convolution filters, forcing them to decompose through 3×3 filters (with non-linearities injected in between).
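A quick back-of-the-envelope check of the parameter saving, with C input and output channels and biases ignored:

```python
# One 7x7 conv vs. three stacked 3x3 convs, C channels in and out:
# 49*C^2 vs. 3*(9*C^2) = 27*C^2, roughly a 45% reduction, while the
# stack covers the same effective 7x7 receptive field.
C = 512
single_7x7 = 7 * 7 * C * C
stacked_3x3 = 3 * (3 * 3 * C * C)
print(single_7x7, stacked_3x3)  # 12845056 7077888
```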
Incorporating 1×1 convolutional layers (configuration C, Table 1 of the paper) is a way to increase the non-linearity of the decision function without affecting the receptive fields of the convolutional layers.
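A minimal illustration (my own, not from the paper) that a 1×1 convolution plus ReLU changes neither the spatial size nor the receptive field:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 56, 56)
# A 1x1 conv is a per-pixel linear map across channels; followed by
# ReLU it adds non-linearity while spatial resolution (and hence the
# receptive field of surrounding layers) is left untouched.
block = nn.Sequential(nn.Conv2d(256, 256, kernel_size=1), nn.ReLU())
print(block(x).shape)  # torch.Size([1, 256, 56, 56])
```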
Results
[Table: single-scale test error for configurations A-E]
- Using local response normalisation (the A-LRN network) does not improve on model A, which has no normalisation layers.
- The classification error decreases as the ConvNet depth increases; however, configuration C (which contains three 1×1 convolutional layers) performs worse than configuration D, which uses 3×3 convolutions throughout the network.
- The error rate saturates when the depth reaches 19 layers, but even deeper models might be beneficial for larger datasets.
- A shallow network with larger (5×5) filters was measured to have a top-1 error rate (on center-crop images) 7% higher than that of network B, which confirms that a deep network with small convolution kernels outperforms a shallow network with large kernels.
- Training with scale jittering (S ∈ [256; 512]) gives better results than training with a fixed smallest side (S = 256 or S = 384), confirming that training-set augmentation by scale jittering is indeed helpful for capturing multi-scale image statistics.

The table above reports multi-scale testing; comparing it with the previous table shows that scale jittering at test time leads to better performance.
Using multiple crops (multi-crop) performs slightly better than dense evaluation, and the two approaches are complementary, as their combination outperforms each of them alone. The authors attribute this to the different treatment of convolution boundary conditions.
- multi-crop: take multiple random crops of the image, run each sample through the network, and average all the resulting predictions.
- dense: following the FCN idea, feed the whole image into the network directly, with the final fully connected layers converted to 1×1 convolutions; this yields a prediction score map whose entries are then averaged (see the sketch below).
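Here is a minimal sketch of the dense-evaluation idea (my own illustration, not the authors' code): the first fully connected layer is re-interpreted as a 7×7 convolution and the other two as 1×1 convolutions, so the network can ingest images larger than 224×224 and produce a spatial class-score map that is then averaged:

```python
import torch
import torch.nn as nn

# VGG's fully connected head rewritten as convolutions:
# fc1 (512*7*7 -> 4096) becomes a 7x7 conv; fc2/fc3 become 1x1 convs.
conv_head = nn.Sequential(
    nn.Conv2d(512, 4096, kernel_size=7), nn.ReLU(inplace=True),
    nn.Conv2d(4096, 4096, kernel_size=1), nn.ReLU(inplace=True),
    nn.Conv2d(4096, 1000, kernel_size=1),
)

# Feature map from a 384x384 input after the conv stack: 512 x 12 x 12.
feats = torch.randn(1, 512, 12, 12)
score_map = conv_head(feats)                # 1 x 1000 x 6 x 6 class scores
class_scores = score_map.mean(dim=(2, 3))   # spatial average, as in dense eval
print(score_map.shape, class_scores.shape)
```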

Six single-scale networks and one multi-scale network were trained; the ensemble of these 7 networks achieves a 7.3% ILSVRC test error (using multiple test scales). Combining only the two best-performing multi-scale models (configurations D and E), with dense and multi-crop evaluation combined, reduces the test error to 6.8%.

The model proposed in this paper significantly outperforms the best-performing models of the ILSVRC-2012 and ILSVRC-2013 competitions.
Training & Testing
- Batch size 256, momentum 0.9, weight decay (L2 penalty multiplier set to 5·10⁻⁴).
- Dropout regularisation for the first two fully connected layers (dropout ratio set to 0.5).
- The learning rate is initially set to 0.01 and then decreased by a factor of 10, three times in total; training runs for 74 epochs.
- Weights are initialised using the procedure of Glorot & Bengio (2010), without pre-training.
- Fixed-size 224×224 input images are obtained by randomly cropping rescaled training images.
- During training, the crops undergo random horizontal flipping and random RGB colour shift; at test time, the test set is augmented by horizontal flipping of the images.
- Using a large number of crops can improve accuracy, as it results in a finer sampling of the input image compared to the fully convolutional net. A minimal training-setup sketch follows this list.
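Putting the hyper-parameters above together, a minimal PyTorch training setup might look as follows; `model` is assumed to be the VGG-16 sketch from the Model section, and the discrete scale set only approximates the paper's continuous sampling of S (the paper's RGB colour shift is omitted here):

```python
import torch
import torchvision.transforms as T

# Hyper-parameters from the paper: SGD with momentum 0.9, weight
# decay 5e-4, initial learning rate 0.01.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
# Divide the learning rate by 10 when validation accuracy plateaus,
# mimicking the paper's schedule (three decreases over 74 epochs).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.1)

# Scale jittering: resize the shorter side to a (here discretised)
# random S in [256, 512], then take a random 224x224 crop and a
# random horizontal flip.
train_transform = T.Compose([
    T.RandomChoice([T.Resize(s) for s in range(256, 513, 32)]),
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
```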
Highlights of the article
- The depth of the network is increased while the topology remains simple and homogeneous.
- Replacing large kernels with stacks of small kernels, as well as the use of 1×1 convolutional layers, had both been tried in work preceding this paper.
- Experiments show that the effect of the local response normalisation layer is not significant.
- Using multiple scales at test time improves performance.
Author's outlook
- The depth of the network is important for visual representations.