Detailed explanation of NIN network
2022-07-02 08:30:00 【Red carp and green donkey】
1 Model Introduction
Network In Network (NIN) was proposed by Min Lin et al. in 2014 and achieved the best results of its time on the CIFAR-10 and CIFAR-100 classification tasks. Its network is built from three stacked multilayer-perceptron convolution blocks. The NiN paper, "Network In Network", was published at ICLR 2014. NIN re-examines the design of convolution kernels in convolutional neural networks from a new perspective: it replaces the linear mapping of plain convolution with a small subnetwork. This form of network structure inspired the design of more complex convolutional architectures; the Inception module in GoogLeNet derives from this idea.
2 MLPConv
The authors argue that the linear filter (convolution kernel) used in a traditional CNN is a generalized linear model (GLM) over the local receptive field. Using such a CNN for feature extraction therefore implicitly assumes that the features are linearly separable, yet real problems are often hard to separate linearly. A CNN builds higher-level feature representations by stacking convolution filters; the authors' idea is that, instead of simply adding more convolution layers, one can redesign the convolution layer itself to use a more effective nonlinear function approximator, improving the layer's abstraction ability so that the network extracts better features within each receptive field.
In NiN, a micro network is used inside the convolution layer to improve its abstraction ability; a multilayer perceptron (MLP) serves as this micro network. Because this MLP subnetwork sits inside the convolutional network, the model was named "network in network". The figure below compares an ordinary linear convolution layer with a multilayer-perceptron convolution layer (MLPConv layer).
Both the linear convolution layer and the MLPConv layer map a local receptive field to an output feature vector. The MLPConv kernel uses an MLP and, as in a traditional CNN, the MLP shares its parameters across all local receptive fields; sliding the MLP kernel over the input produces the output feature map. NIN is obtained by stacking multiple MLPConv layers. The complete NiN network structure is shown in the figure below.
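Since an MLP applied at every spatial position with shared weights is equivalent to a stack of $1\times1$ convolutions, an MLPConv block can be sketched with plain NumPy. This is an illustrative sketch, not the paper's implementation: the first layer is simplified to a pointwise mapping and all shapes are made up for the example.

```python
import numpy as np

def conv1x1(x, w, b):
    """Pointwise (1x1) convolution with ReLU: applies the same fully
    connected layer at every spatial position.
    x: (H, W, C_in), w: (C_in, C_out), b: (C_out,)."""
    return np.maximum(x @ w + b, 0.0)

def mlpconv(x, w0, b0, w1, b1, w2, b2):
    """One MLPConv block: a first mapping (simplified here to 1x1 for
    brevity) followed by two 1x1 layers acting as a shared MLP."""
    h = conv1x1(x, w0, b0)
    h = conv1x1(h, w1, b1)
    return conv1x1(h, w2, b2)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))          # toy 8x8 RGB input
w0, b0 = rng.standard_normal((3, 16)), np.zeros(16)
w1, b1 = rng.standard_normal((16, 16)), np.zeros(16)
w2, b2 = rng.standard_normal((16, 16)), np.zeros(16)
print(mlpconv(x, w0, b0, w1, b1, w2, b2).shape)  # (8, 8, 16)
```

The key point the sketch shows is weight sharing: the same `w`/`b` matrices are applied at every pixel, exactly as a convolution shares its kernel across positions.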
The first convolution kernel has shape $3\times3\times3\times16$, so the convolution output on one patch is a $1\times1\times16$ feature map (a 16-dimensional vector). This is followed by an MLP layer whose output is still 16-dimensional, so the MLP layer is equivalent to a $1\times1$ convolution layer. The model's parameter configuration is given in the table below.
| Layer | Input size | Kernel size | Output size | Parameter count |
|---|---|---|---|---|
| Local fully connected layer $L_{11}$ | $32\times32\times3$ | $(3\times3)\times16/1$ | $30\times30\times16$ | $(3\times3\times3+1)\times16$ |
| Fully connected layer $L_{12}$ | $30\times30\times16$ | $16\times16$ | $30\times30\times16$ | $(16+1)\times16$ |
| Local fully connected layer $L_{21}$ | $30\times30\times16$ | $(3\times3)\times64/1$ | $28\times28\times64$ | $(3\times3\times16+1)\times64$ |
| Fully connected layer $L_{22}$ | $28\times28\times64$ | $64\times64$ | $28\times28\times64$ | $(64+1)\times64$ |
| Local fully connected layer $L_{31}$ | $28\times28\times64$ | $(3\times3)\times100/1$ | $26\times26\times100$ | $(3\times3\times64+1)\times100$ |
| Fully connected layer $L_{32}$ | $26\times26\times100$ | $100\times100$ | $26\times26\times100$ | $(100+1)\times100$ |
| Global average pooling $GAP$ | $26\times26\times100$ | $26\times26\times100/1$ | $1\times1\times100$ | $0$ |
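The parameter-count column above can be checked with a few lines of arithmetic. The rule is: a $k\times k$ convolution over $C_{in}$ channels with $C_{out}$ filters has $(k\cdot k\cdot C_{in}+1)\cdot C_{out}$ parameters (the $+1$ is the bias), a shared-MLP ($1\times1$) layer has $(C_{in}+1)\cdot C_{out}$, and GAP has none:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a k x k conv layer with bias."""
    return (k * k * c_in + 1) * c_out

def mlp_params(c_in, c_out):
    """Parameters of a shared-MLP (1x1 conv) layer with bias."""
    return (c_in + 1) * c_out

layers = [
    ("L11", conv_params(3, 3, 16)),     # (3*3*3+1)*16   = 448
    ("L12", mlp_params(16, 16)),        # (16+1)*16      = 272
    ("L21", conv_params(3, 16, 64)),    # (3*3*16+1)*64  = 9280
    ("L22", mlp_params(64, 64)),        # (64+1)*64      = 4160
    ("L31", conv_params(3, 64, 100)),   # (3*3*64+1)*100 = 57700
    ("L32", mlp_params(100, 100)),      # (100+1)*100    = 10100
    ("GAP", 0),                         # no trainable parameters
]
total = sum(p for _, p in layers)
print(total)  # 81960
```

The total (81,960 parameters) is not stated in the table; it follows from summing the rows.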
In NIN, the three MLPConv layers are not followed by a fully connected (FC) layer. Instead, global average pooling (GAP) is applied to the feature maps output by the last MLPConv layer. The global average pooling layer is described in detail below.
3 Global Average Pooling
A conventional CNN first extracts features with stacked convolution layers and then feeds them into fully connected (FC) layers for classification. This structure follows LeNet-5: the convolution layers act as the feature extractor and the fully connected layers act as the classifier. However, FC layers have many parameters and overfit easily, which hurts the model's generalization; Dropout is typically needed to improve it.
NIN proposes GAP to replace the traditional FC layer. The main idea is to make each class correspond to one output feature map of the last MLPConv layer: each feature map is averaged to a single value, and the resulting pooled vector is passed through softmax to obtain class probabilities. Advantages of GAP:
- It strengthens the correspondence between feature maps and categories, which fits the convolutional structure well; each feature map can be interpreted as a category confidence map.
- The GAP layer has no parameters to optimize, so it cannot overfit.
- GAP summarizes spatial information and is therefore more robust to spatial transformations of the input.
GAP can be viewed as a structural regularizer that explicitly forces the feature maps to be confidence maps of the concepts (categories).
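GAP itself is a one-line operation: average each channel's feature map to a single scalar, then apply softmax. A minimal NumPy sketch, using the $26\times26\times100$ shape from the table above (the random feature maps are illustrative only):

```python
import numpy as np

def global_average_pool(feature_maps):
    """GAP: average each channel's spatial map to one scalar.
    feature_maps: (H, W, C) -> (C,)"""
    return feature_maps.mean(axis=(0, 1))

def softmax(z):
    """Numerically stable softmax over a vector of class scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
maps = rng.standard_normal((26, 26, 100))   # last MLPConv output, 100 classes
probs = softmax(global_average_pool(maps))  # one probability per class
print(probs.shape)  # (100,)
```

Note there are no weights anywhere in this step, which is exactly why GAP cannot overfit.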
4 Model Characteristics
- A multilayer-perceptron structure replaces the plain convolution filter. This not only curbs the parameter growth caused by large numbers of convolution kernels, but also improves the model's ability to abstract features by introducing nonlinear mappings.
- Global average pooling replaces the final fully connected layer. It sharply reduces the parameter count (GAP has no trainable parameters), and because pooling uses the information of the entire feature map it is more robust to spatial transformations; the final outputs can be used directly as confidences for the corresponding categories.