[reading papers] visual convolution zfnet
2022-06-13 02:24:00 【Shameful child】
Visualizing and Understanding Convolutional Networks
The main contribution is the visualization of AlexNet, together with analysis built on it; it is very helpful for understanding convolutional neural networks.
We know that images can be classified, but not why. This paper on ZFNet explains which parts of an image, and what semantics, make classification possible.
In areas such as medical imaging and autonomous driving, the models we design need interpretability.
- For example: which parts of the image a tumor determination is based on.
- What features does each layer of the network extract? Which patterns is each convolution kernel and feature map interested in? What effect does occluding part of the image have on the result? Is there correlation between different parts of the image?
ZFNet won the 2013 image classification contest.
Main work
- Intuitively understand and analyze the features a CNN learns (which image patterns the middle-layer features correspond to)
- Find ways to improve the model (observe the middle-layer features and analyze how the model can be improved)
- Analyze the CNN's occlusion sensitivity (the influence of covering a given region on the classification result)
AlexNet won the 2012 image classification contest, mainly thanks to:
- Large-scale labeled datasets
- GPU hardware computing support
- Model regularization methods such as Dropout
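As a reminder of what Dropout does, here is a minimal sketch in plain Python (not AlexNet's implementation; the "inverted dropout" scaling by 1/(1-p) is a common convention assumed here):

```python
import random

def dropout(values, p=0.5, training=True, seed=None):
    """Inverted dropout: during training, zero each unit with probability p
    and scale the survivors by 1/(1-p) so the expected sum is unchanged."""
    if not training:
        return list(values)          # no-op at inference time
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else v / (1 - p) for v in values]

activations = [0.5, 1.2, 0.3, 2.0]
print(dropout(activations, p=0.5, seed=0))   # some units zeroed, survivors doubled
print(dropout(activations, training=False))  # inference: unchanged
```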
Research methods
- Local sensitivity analysis
- Local correlation analysis
- deconvnet
- This paper provides a non-parametric method for studying image invariance, which can show, for each middle-layer feature map, which patterns that feature responds to.
Other research methods
- Gradient ascent: find the input image that maximizes a feature map's activation. Requires careful initialization.
- Computing the Hessian matrix numerically to show the invariances of the optimal response. However, the high-level features of neural networks are hard to capture with a simple second-order approximation.
Network structure
ZFNet's network architecture is a modification of AlexNet; compared with AlexNet, the differences are small:
- In the 1st convolution layer, the kernel size is reduced from 11 to 7 and the stride from 4 to 2 (this doubles the feature map size)
- To keep the subsequent feature maps the same size, the stride of the 2nd convolution layer is changed from 1 to 2
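The effect of these kernel/stride changes on feature-map size follows from the standard conv output-size formula, out = ⌊(in − k + 2p)/s⌋ + 1. A minimal sketch (the 224×224 input and zero padding are illustrative assumptions, not taken from the paper):

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a conv layer: floor((size - kernel + 2*padding) / stride) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# AlexNet conv1: 11x11 kernel, stride 4 (assuming a 224x224 input, no padding)
alexnet_conv1 = conv_out(224, kernel=11, stride=4)   # 54
# ZFNet conv1: 7x7 kernel, stride 2 -> roughly double the spatial size
zfnet_conv1 = conv_out(224, kernel=7, stride=2)      # 109
print(alexnet_conv1, zfnet_conv1)
```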

The trainable parameters in the network are the weights of the convolution kernels and the weights of the fully connected layers.
Multi-class cross-entropy loss function
In binary classification, the loss function is the cross-entropy loss. For a sample (x, y), x is the input and y the corresponding label; in binary classification the label takes values in {0, 1}. Suppose the true label of a sample is y_t and the predicted probability that y_t = 1 is y_p. Then the loss for that sample is:

$$L = -\left[ y_t \log(y_p) + (1 - y_t)\log(1 - y_p) \right]$$
In multi-class classification, the loss function is also the cross-entropy loss. For a sample (x, y), y is the true label and the prediction is a distribution over all labels. Suppose there are K label values, the predicted probability that the i-th sample belongs to the k-th label is p_{i,k}, and there are N samples in total. Then the loss over the whole dataset is:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{i,k}\log(p_{i,k})$$

where y_{i,k} is 1 if sample i's true label is k, and 0 otherwise.
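The two formulas above can be sketched in plain Python (a toy illustration with made-up probabilities, not a training implementation):

```python
import math

def binary_cross_entropy(y_true, y_prob):
    """Per-sample loss: -[y*log(p) + (1-y)*log(1-p)]."""
    return -(y_true * math.log(y_prob) + (1 - y_true) * math.log(1 - y_prob))

def multiclass_cross_entropy(labels, probs):
    """Dataset loss: mean over N samples of -log(probability assigned to the true class)."""
    n = len(labels)
    return -sum(math.log(probs[i][labels[i]]) for i in range(n)) / n

print(binary_cross_entropy(1, 0.9))   # confident and correct -> small loss
print(binary_cross_entropy(1, 0.1))   # confident and wrong   -> large loss
print(multiclass_cross_entropy([0, 2], [[0.7, 0.2, 0.1],
                                        [0.1, 0.1, 0.8]]))
```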
Visualization methods
The main approach is to project the Top-k activations back onto the original image with a deconvnet, in order to determine which part of the image an activation value mainly responds to.
- A convolutional neural network maps the original pixel space to feature space layer by layer. The value at each position of a deep feature map indicates how strongly a pattern matches; but because it lives in feature space, the human eye cannot directly observe the corresponding pattern. To make it observable and understandable, it must be mapped back to pixel space.
This requires that every layer have a corresponding reverse operation:
- For a MaxPooling layer, a switch variable records the index of the maximum value during the forward pass, and Unpooling is approximated from these switches.
- For a Relu layer, Relu itself is used directly. For a conv layer, deconv is used, i.e., the transpose of the original convolution kernel serves as the kernel.
The reverse reconstruction process is shown in the figure below:
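The switch-based unpooling can be sketched in a few lines of plain Python (a toy 2×2 pooling on nested lists, not the paper's implementation):

```python
def maxpool_with_switches(x, size=2):
    """2x2 max pooling that also records the (row, col) of each maximum (the 'switches')."""
    h, w = len(x), len(x[0])
    pooled, switches = [], []
    for i in range(0, h, size):
        prow, srow = [], []
        for j in range(0, w, size):
            window = [(x[r][c], (r, c)) for r in range(i, i + size)
                                        for c in range(j, j + size)]
            val, pos = max(window)
            prow.append(val)
            srow.append(pos)
        pooled.append(prow)
        switches.append(srow)
    return pooled, switches

def unpool(pooled, switches, h, w):
    """Approximate inverse: place each pooled value back at its recorded
    position; every other position is set to 0."""
    out = [[0] * w for _ in range(h)]
    for i, row in enumerate(pooled):
        for j, val in enumerate(row):
            r, c = switches[i][j]
            out[r][c] = val
    return out

x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 3, 2],
     [2, 0, 1, 4]]
pooled, switches = maxpool_with_switches(x)
print(pooled)                          # [[4, 5], [2, 4]]
print(unpool(pooled, switches, 4, 4))  # maxima restored in place, zeros elsewhere
```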
Feature analysis
The author visualized the Top-9 activations of every layer of AlexNet, as shown in the figure below:

- layer 2 responds to edges, corners, and colors
- layer 3 has more invariance and captures some texture features
- layer 4 shows significant class-specific variation, such as dog faces and bird legs
- layer 5 begins to focus on whole objects
- Understanding of neural networks: the lower layers focus on identifying low-level features, and the higher layers form higher-level features by combining and abstracting the lower-level ones.
Using this visualization method, the author found problems with the original AlexNet structure (for example, the first layer misses mid-frequency information, and the overly large stride in the second layer causes aliasing artifacts), modified the structure, and found in a comparison that the modified model's top-5 performance is higher than the original network's. The author also analyzed occlusion sensitivity, correspondence, and the feature evolution during training (6 features, at epochs 1, 2, 5, 10, 20, 30, 40, 64).

- As training iterates, the feature maps change; the 8 columns within each layer correspond to feature maps at different epochs.
- For each feature map of a layer, the visualization shown is produced by the sample in the whole training set that activates it most strongly.
- Low-level feature maps converge quickly, while high-level feature maps only begin to change in later epochs.
- A sudden jump means that the image that maximally activates that filter has changed.
The features show translation invariance, but not rotation invariance.
Occlusion experiment


To test whether there is consistent correspondence between a specified local part across different images, the author argues that a deep model may implicitly learn such relations. The images of five different dogs are partially occluded, and the sum of Hamming distances between the features of the original and occluded images is analyzed; the lower the value, the greater the consistency. Experiments show that occluding the left eye, right eye, or nose across the different dog images yields a smaller Hamming distance than random occlusion, demonstrating a certain correspondence.
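The sliding-occlusion idea behind these experiments can be sketched as follows (a toy on nested lists; `score_fn` stands in for a real model's class score, which is an assumption of this sketch):

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Slide a patch of constant 'fill' value over the image and record the
    classifier score for each occluded copy; positions where the score drops
    most mark the regions the classifier depends on."""
    h, w = len(image), len(image[0])
    heat = []
    for i in range(0, h - patch + 1):
        row = []
        for j in range(0, w - patch + 1):
            occluded = [r[:] for r in image]          # copy, then gray out a patch
            for r in range(i, i + patch):
                for c in range(j, j + patch):
                    occluded[r][c] = fill
            row.append(score_fn(occluded))
        heat.append(row)
    return heat

# Stand-in "classifier": total brightness (a placeholder for a model's class score)
score = lambda img: sum(sum(r) for r in img)
image = [[1.0] * 4 for _ in range(4)]
print(occlusion_map(image, score))   # every 2x2 occlusion removes 4.0 -> all scores 12.0
```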
Deconvolution for visualization:
Starting from a standard supervised CNN model, the author attaches a deconvnet layer to each convolution layer. The deconvnet layer can be regarded as the reverse process of the convolution layer: it likewise contains convolution kernels and pooling functions (or their inverses). Its role is to map the output features back to the input signal.
This process , It mainly includes three operations :
- unpooling
- rectification
- deconvolution
Experiments
The author carried out experiments adjusting the network's size. Removing the last two fully connected layers, which contain most of the network's parameters, degrades performance only slightly; removing the two middle convolution layers also degrades performance little; however, removing both the fully connected layers and the convolution layers makes performance drop dramatically. The author concludes: model depth is important for model performance, and there is a minimum depth below which performance drops sharply.
Link: https://www.jianshu.com/p/0718963bf3b5
Unpooling
- Pooling is an irreversible process, but during pooling we can record the coordinate position of the maximum activation value.
- During unpooling, only the position that held the maximum activation receives the value; all other positions are set to 0. This is of course only an approximation, because during pooling the values other than the maximum were not 0.
Rectification
- This process uses the "inverse" of the relu function. In the forward pass, relu guarantees non-negativity, and this constraint still holds in the reverse pass; so the reverse operation is no different from the forward one: both simply apply relu.
Deconvolution
- The transpose of the convolution kernel from the forward pass is convolved with the rectified output features.
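The transpose relationship can be illustrated with a 1-D toy: the forward pass is a "valid" cross-correlation, and the deconv step applies the transpose of that linear map, spreading each output value back over the input positions it came from (a sketch of the idea, not the paper's implementation):

```python
def conv1d(x, w):
    """'Valid' 1-D cross-correlation, as in a CNN forward pass."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def deconv1d(y, w, n):
    """Transpose of conv1d's linear map: each output value spreads its
    kernel weights back over the n input positions it touched."""
    out = [0.0] * n
    for i, v in enumerate(y):
        for j, wj in enumerate(w):
            out[i + j] += v * wj
    return out

x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, -1.0]           # a simple difference filter
y = conv1d(x, w)          # [-1.0, -1.0, -1.0]
print(deconv1d(y, w, len(x)))   # projected back into input space
```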
Summary
- It reveals that these features are far from random, uninterpretable patterns.
- It shows many intuitively appealing properties, such as compositionality and increasing invariance and class discrimination as we ascend the layers.
- It also shows how these visualizations can be used to debug problems with the model and obtain better results.
- A series of occlusion experiments proves that although the model is trained for classification, it is highly sensitive to local structure in the image and does not merely rely on broad scene context. Ablation studies show that a minimum depth of the network, rather than any single section, is critical to the model's performance.
- Ablation study: ablation research usually refers to deleting certain parts ("functions") of a model or algorithm and observing how that affects performance.