[reading papers] visual convolution zfnet
2022-06-13 02:24:00 【Shameful child】
Visualizing and Understanding Convolutional Networks
The main contribution is the visualization of AlexNet, together with analysis built on it; it is very helpful for understanding convolutional neural networks.
We know that images can be classified, but not why. This paper on ZFNet explains which parts of an image, and what semantics, make classification possible.
In areas such as medical imaging and autonomous driving, the models we design need interpretability.
- For example: which parts of the image a tumor determination is based on.
- What features does each layer of the network extract? Which patterns is each convolution kernel and feature map interested in? What effect does occluding part of the image have on the result? Is there correlation between different parts of the image?
ZFNet won the 2013 image classification contest.
Main work
- Intuitively understand and analyze the features a CNN learns (which image patterns the middle-layer features correspond to)
- Find ways to improve the model (observe the middle-layer features and analyze how the model can be improved)
- Analyze the CNN's occlusion sensitivity (the influence of covering a given region on the classification result)
AlexNet won the 2012 image classification contest, mainly thanks to:
- Large-scale labeled datasets
- GPU hardware computing support
- Model regularization methods such as Dropout
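As a reminder of what Dropout does, here is a minimal sketch in plain Python (not AlexNet's implementation; the "inverted dropout" scaling by 1/(1-p) is a common convention assumed here):

```python
import random

def dropout(values, p=0.5, training=True, seed=None):
    """Inverted dropout: during training, zero each unit with probability p
    and scale the survivors by 1/(1-p) so the expected sum is unchanged."""
    if not training:
        return list(values)          # no-op at inference time
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else v / (1 - p) for v in values]

activations = [0.5, 1.2, 0.3, 2.0]
print(dropout(activations, p=0.5, seed=0))   # some units zeroed, survivors doubled
print(dropout(activations, training=False))  # inference: unchanged
```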
Research methods
- Local sensitivity analysis
- Local correlation analysis
- deconvnet
- This paper provides a non-parametric method for studying image invariance, which can show, for each middle-layer feature map, which patterns that feature responds to.
Other research methods
- Gradient ascent: find the input image that maximizes a feature map's activation. Requires careful initialization.
- Computing the Hessian matrix numerically to show the invariances of the optimal response. However, the high-level features of neural networks are hard to capture with a simple second-order approximation.
Network structure
ZFNet's network architecture is a modification of AlexNet; compared with AlexNet, the differences are small:
- In the 1st convolution layer, the kernel size is reduced from 11 to 7 and the stride from 4 to 2 (this doubles the feature map size)
- To keep the subsequent feature maps the same size, the stride of the 2nd convolution layer is changed from 1 to 2
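The effect of these kernel/stride changes on feature-map size follows from the standard conv output-size formula, out = ⌊(in − k + 2p)/s⌋ + 1. A minimal sketch (the 224×224 input and zero padding are illustrative assumptions, not taken from the paper):

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a conv layer: floor((size - kernel + 2*padding) / stride) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# AlexNet conv1: 11x11 kernel, stride 4 (assuming a 224x224 input, no padding)
alexnet_conv1 = conv_out(224, kernel=11, stride=4)   # 54
# ZFNet conv1: 7x7 kernel, stride 2 -> roughly double the spatial size
zfnet_conv1 = conv_out(224, kernel=7, stride=2)      # 109
print(alexnet_conv1, zfnet_conv1)
```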

The trainable parameters in the network are the weights of the convolution kernels and the weights of the fully connected layers.
Multi-class cross-entropy loss function
In binary classification, the loss function is the cross-entropy loss. For a sample (x, y), x is the input and y the corresponding label; in binary classification the label takes values in {0, 1}. Suppose the true label of a sample is y_t and the predicted probability that y_t = 1 is y_p. Then the loss for that sample is:

$$L = -\left[ y_t \log(y_p) + (1 - y_t)\log(1 - y_p) \right]$$
In multi-class classification, the loss function is also the cross-entropy loss. For a sample (x, y), y is the true label and the prediction is a distribution over all labels. Suppose there are K label values, the predicted probability that the i-th sample belongs to the k-th label is p_{i,k}, and there are N samples in total. Then the loss over the whole dataset is:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{i,k}\log(p_{i,k})$$

where y_{i,k} is 1 if sample i's true label is k, and 0 otherwise.
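The two formulas above can be sketched in plain Python (a toy illustration with made-up probabilities, not a training implementation):

```python
import math

def binary_cross_entropy(y_true, y_prob):
    """Per-sample loss: -[y*log(p) + (1-y)*log(1-p)]."""
    return -(y_true * math.log(y_prob) + (1 - y_true) * math.log(1 - y_prob))

def multiclass_cross_entropy(labels, probs):
    """Dataset loss: mean over N samples of -log(probability assigned to the true class)."""
    n = len(labels)
    return -sum(math.log(probs[i][labels[i]]) for i in range(n)) / n

print(binary_cross_entropy(1, 0.9))   # confident and correct -> small loss
print(binary_cross_entropy(1, 0.1))   # confident and wrong   -> large loss
print(multiclass_cross_entropy([0, 2], [[0.7, 0.2, 0.1],
                                        [0.1, 0.1, 0.8]]))
```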
Visualization methods
The main approach is to project the Top-k activations back onto the original image with a deconvnet, in order to determine which part of the image an activation value mainly responds to.
- A convolutional neural network maps the original pixel space to feature space layer by layer. The value at each position of a deep feature map indicates how strongly a pattern matches; but because it lives in feature space, the human eye cannot directly observe the corresponding pattern. To make it observable and understandable, it must be mapped back to pixel space.
This requires that every layer have a corresponding reverse operation:
- For a MaxPooling layer, a switch variable records the index of the maximum value during the forward pass, and Unpooling is approximated from these switches.
- For a Relu layer, Relu itself is used directly. For a conv layer, deconv is used, i.e., the transpose of the original convolution kernel serves as the kernel.
The reverse reconstruction process is shown in the figure below:
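The switch-based unpooling can be sketched in a few lines of plain Python (a toy 2×2 pooling on nested lists, not the paper's implementation):

```python
def maxpool_with_switches(x, size=2):
    """2x2 max pooling that also records the (row, col) of each maximum (the 'switches')."""
    h, w = len(x), len(x[0])
    pooled, switches = [], []
    for i in range(0, h, size):
        prow, srow = [], []
        for j in range(0, w, size):
            window = [(x[r][c], (r, c)) for r in range(i, i + size)
                                        for c in range(j, j + size)]
            val, pos = max(window)
            prow.append(val)
            srow.append(pos)
        pooled.append(prow)
        switches.append(srow)
    return pooled, switches

def unpool(pooled, switches, h, w):
    """Approximate inverse: place each pooled value back at its recorded
    position; every other position is set to 0."""
    out = [[0] * w for _ in range(h)]
    for i, row in enumerate(pooled):
        for j, val in enumerate(row):
            r, c = switches[i][j]
            out[r][c] = val
    return out

x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 3, 2],
     [2, 0, 1, 4]]
pooled, switches = maxpool_with_switches(x)
print(pooled)                          # [[4, 5], [2, 4]]
print(unpool(pooled, switches, 4, 4))  # maxima restored in place, zeros elsewhere
```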
Feature analysis
The author visualized the Top-9 activations of every layer of AlexNet, as shown in the figure below:

- layer 2 responds to edges, corners, and colors
- layer 3 has more invariance and captures some texture features
- layer 4 shows significant class-specific variation, such as dog faces and bird legs
- layer 5 begins to focus on whole objects
- Understanding of neural networks: the lower layers focus on identifying low-level features, and the higher layers form higher-level features by combining and abstracting the lower-level ones.
Using this visualization method, the author found problems with the original AlexNet structure (for example, the first layer misses mid-frequency information, and the overly large stride in the second layer causes aliasing artifacts), modified the structure, and found in a comparison that the modified model's top-5 performance is higher than the original network's. The author also analyzed occlusion sensitivity, correspondence, and the feature evolution during training (6 features, at epochs 1, 2, 5, 10, 20, 30, 40, 64).

- As training iterates, the feature maps change; the 8 columns within each layer correspond to feature maps at different epochs.
- For each feature map of a layer, the visualization shown is produced by the sample in the whole training set that activates it most strongly.
- Low-level feature maps converge quickly, while high-level feature maps only begin to change in later epochs.
- A sudden jump means that the image that maximally activates that filter has changed.
The features show translation invariance, but not rotation invariance.
Occlusion experiment


To test whether there is consistent correspondence between a specified local part across different images, the author argues that a deep model may implicitly learn such relations. The images of five different dogs are partially occluded, and the sum of Hamming distances between the features of the original and occluded images is analyzed; the lower the value, the greater the consistency. Experiments show that occluding the left eye, right eye, or nose across the different dog images yields a smaller Hamming distance than random occlusion, demonstrating a certain correspondence.
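The sliding-occlusion idea behind these experiments can be sketched as follows (a toy on nested lists; `score_fn` stands in for a real model's class score, which is an assumption of this sketch):

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Slide a patch of constant 'fill' value over the image and record the
    classifier score for each occluded copy; positions where the score drops
    most mark the regions the classifier depends on."""
    h, w = len(image), len(image[0])
    heat = []
    for i in range(0, h - patch + 1):
        row = []
        for j in range(0, w - patch + 1):
            occluded = [r[:] for r in image]          # copy, then gray out a patch
            for r in range(i, i + patch):
                for c in range(j, j + patch):
                    occluded[r][c] = fill
            row.append(score_fn(occluded))
        heat.append(row)
    return heat

# Stand-in "classifier": total brightness (a placeholder for a model's class score)
score = lambda img: sum(sum(r) for r in img)
image = [[1.0] * 4 for _ in range(4)]
print(occlusion_map(image, score))   # every 2x2 occlusion removes 4.0 -> all scores 12.0
```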
Deconvolution for visualization:
Starting from a standard supervised CNN model, the author attaches a deconvnet layer to each convolution layer. The deconvnet layer can be regarded as the reverse process of the convolution layer: it likewise contains convolution kernels and pooling functions (or their inverses). Its role is to map the output features back to the input signal.
This process , It mainly includes three operations :
- unpooling
- rectification
- deconvolution
Experiments
The author carried out experiments adjusting the network's size. Removing the last two fully connected layers, which contain most of the network's parameters, degrades performance only slightly; removing the two middle convolution layers also degrades performance little; however, removing both the fully connected layers and the convolution layers makes performance drop dramatically. The author concludes: model depth is important for model performance, and there is a minimum depth below which performance drops sharply.
Link: https://www.jianshu.com/p/0718963bf3b5
Unpooling
- Pooling is an irreversible process, but during pooling we can record the coordinate position of the maximum activation value.
- During unpooling, only the position that held the maximum activation receives the value; all other positions are set to 0. This is of course only an approximation, because during pooling the values other than the maximum were not 0.
Rectification
- This process uses the "inverse" of the relu function. In the forward pass, relu guarantees non-negativity, and this constraint still holds in the reverse pass; so the reverse operation is no different from the forward one: both simply apply relu.
Deconvolution
- The transpose of the convolution kernel from the forward pass is convolved with the rectified output features.
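The transpose relationship can be illustrated with a 1-D toy: the forward pass is a "valid" cross-correlation, and the deconv step applies the transpose of that linear map, spreading each output value back over the input positions it came from (a sketch of the idea, not the paper's implementation):

```python
def conv1d(x, w):
    """'Valid' 1-D cross-correlation, as in a CNN forward pass."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def deconv1d(y, w, n):
    """Transpose of conv1d's linear map: each output value spreads its
    kernel weights back over the n input positions it touched."""
    out = [0.0] * n
    for i, v in enumerate(y):
        for j, wj in enumerate(w):
            out[i + j] += v * wj
    return out

x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, -1.0]           # a simple difference filter
y = conv1d(x, w)          # [-1.0, -1.0, -1.0]
print(deconv1d(y, w, len(x)))   # projected back into input space
```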
Summary
- It reveals that these features are far from random, uninterpretable patterns.
- It shows many intuitively appealing properties, such as compositionality and increasing invariance and class discrimination as we ascend the layers.
- It also shows how these visualizations can be used to debug problems with the model and obtain better results.
- A series of occlusion experiments proves that although the model is trained for classification, it is highly sensitive to local structure in the image and does not merely rely on broad scene context. Ablation studies show that a minimum depth of the network, rather than any single section, is critical to the model's performance.
- Ablation study: ablation research usually refers to deleting certain parts ("functions") of a model or algorithm and observing how that affects performance.