
CVPR 2022 | Revisiting Pooling: Your Receptive Field Is Not Optimal

2022-06-29 13:09:00 [CV Technical Guide (official account)]

Preface: This paper presents a simple and effective Dynamically Optimized Pooling operation, called DynOPool. DynOPool optimizes the scale factors of feature maps end to end by learning the optimal size and shape of the receptive field in each layer.
Any type of resizing module in a deep neural network can be replaced by a DynOPool operation at minimal cost. DynOPool also controls model complexity through an additional loss term that constrains computational cost.

Welcome to follow the official account CV Technical Guide. It focuses on summaries of computer vision techniques, tracking of the latest techniques, interpretations of classic papers, and CV job postings.

Paper: https://arxiv.org/abs/2205.15254

Code: not released

Background

Deep neural networks have achieved unprecedented success in computer vision, natural language processing, robotics, bioinformatics, and other applications. Even so, designing the optimal network architecture remains a challenging problem. The size and shape of the receptive field determine how a network aggregates local information, and they have a significant impact on the model's overall performance. Many components of a neural network affect the configuration of the receptive field, such as the kernel size and stride of convolution and pooling operations. However, these components are still set by hyperparameters, so the receptive fields of existing models end up with suboptimal shapes and sizes.
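The sketch below makes the dependence on hyperparameters concrete. It applies the standard receptive-field recurrence, a textbook calculation rather than something taken from the paper:

- Each layer with kernel size k grows the receptive field by (k - 1) times the current cumulative stride.
- Each stride s multiplies that cumulative stride.

Once kernel sizes and strides are chosen, every layer's receptive field is therefore fixed before any data is seen. The recurrence applies to each spatial axis separately, so square kernels and strides always yield a square receptive field. The layer list in the usage note is a hypothetical VGG-style stack.

```python
def receptive_field(layers):
    """Return (receptive field, cumulative stride) after a stack of layers.

    Each layer is a (kernel_size, stride) pair along one spatial axis.
    Padding shifts the field but does not change its size, so it is ignored.
    """
    rf, jump = 1, 1  # start from a single input pixel; adjacent outputs are 1 pixel apart
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra kernel tap reaches `jump` input pixels further
        jump *= stride             # striding spreads the taps of later layers further apart
    return rf, jump
```

Consider two VGG-style blocks, each made of two 3×3 convolutions followed by 2×2 max pooling. `receptive_field([(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)])` returns `(16, 4)` for them, whatever the content of the images.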

The paper first shows that conventional receptive fields with a fixed size and shape are suboptimal. It then uses toy experiments with VGG-16 on CIFAR-100 to show how DynOPool addresses this problem.

Problems with conventional receptive fields of fixed size and shape:

1. Asymmetrically distributed information

The shape of the optimal receptive field varies with the spatial information asymmetry inherent in a dataset. In most cases, this inherent asymmetry cannot be measured. In addition, the input resizing commonly used in preprocessing sometimes introduces information asymmetry. In hand-designed networks, the aspect ratio of images is often changed to meet the model's input specification. However, the receptive fields in such networks are not designed to account for this operation.

To verify this, the authors experiment on CIFAR-stretch-V, shown in Figure 1(a). Here DynOPool dynamically optimizes the shapes of the feature maps. The resulting model outperforms the hand-designed model because it extracts more valuable information along the horizontal direction.

Figure 1: Toy experiments on three synthetic datasets derived from CIFAR-100:

(a) random crops of vertically stretched images; (b) downscaled images tiled in a 4×4 grid; (c) enlarged images.
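The caption describes the three synthetic datasets only briefly. The sketch below shows one plausible way to build such variants from a single image stored as a 2D list of pixel values. Several details are assumptions for illustration, not the authors' preprocessing:

- the stretch and enlargement factors;
- filling the 4×4 grid with copies of the same image;
- nearest-neighbour resizing.

The function names are also hypothetical.

```python
import random

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of pixel values."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]

def stretch_v(img, factor=2, rng=random):
    """(a) Stretch vertically by an assumed `factor`, then randomly crop back to H x W."""
    h, w = len(img), len(img[0])
    tall = resize_nearest(img, h * factor, w)
    top = rng.randrange(h * factor - h + 1)  # random vertical offset of the crop
    return tall[top:top + h]

def tile(img, grid=4):
    """(b) Shrink by `grid` and tile copies of the small image in a grid x grid layout."""
    h, w = len(img), len(img[0])
    small = resize_nearest(img, h // grid, w // grid)
    # Each output row repeats one row of the small image `grid` times horizontally.
    return [small[i % len(small)] * grid for i in range(len(small) * grid)]

def enlarge(img, factor=2):
    """(c) Upscale by an assumed `factor`; no new detail is added, so local patterns become coarse."""
    h, w = len(img), len(img[0])
    return resize_nearest(img, h * factor, w * factor)
```

With a 32×32 CIFAR image and `grid=4`, `tile` produces a 32×32 image made of sixteen 8×8 copies. Under these assumptions, the fine detail in each tile is dense, while `enlarge` spreads the same content over many more pixels.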

2. Densely or sparsely distributed information

Locality is an integral part of designing the optimal model. CNNs learn complex representations of images by aggregating local information in a cascaded manner. How important local information is depends largely on the properties of each image.

For example, when an image is blurred, most meaningful fine patterns, such as the texture of an object, are erased. In this case, it is better to enlarge the receptive field in the early layers and focus on global information. Conversely, if an image carries much of its class-specific information in local details such as texture, capturing that local information becomes more important.

To test this hypothesis, the authors construct two variants of the CIFAR-100 dataset, CIFAR-tile and CIFAR-large, shown in Figure 1(b) and (c). On both, the authors' model outperforms the hand-designed models by a large margin.

Contributions

To alleviate the suboptimality of hand-designed architectures and operations, the authors propose Dynamically Optimized Pooling (DynOPool). DynOPool is a learnable resizing module that can replace standard resizing operations. It learns from the dataset the optimal scale factor, and hence the receptive field, for each operation. In this way it resizes the intermediate feature maps in the network to an appropriate size and shape.

The main contributions of the paper:

1. It addresses the limitation that existing scaling operators in deep neural networks rely on predetermined hyperparameters. It also points out the importance of finding the optimal spatial resolution and receptive field for intermediate feature maps.

2. It proposes DynOPool, a learnable resizing module that finds the optimal scale factor and receptive field for intermediate feature maps. DynOPool uses the learned scale factor to identify the optimal resolution and receptive field of a layer. It then propagates this information to subsequent layers, achieving scale optimization across the whole network.

3. It shows that, on image classification and semantic segmentation, models with DynOPool outperform the baselines across multiple datasets and network architectures. These models also achieve a favorable trade-off between accuracy and computational cost.

Method

1. Dynamically Optimized Pooling (DynOPool)

Figure 2: The resizing module in DynOPool

The module learns the scale factor r between a pair of input and output feature maps. Learning r in turn optimizes the query points q and yields the optimal resolution of the intermediate feature map. DynOPool adaptively controls the size and shape of the receptive fields in deeper layers without affecting other operators.
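The core idea can be sketched in plain Python for a single-channel map: resize a feature map by a continuous scale factor and read values at fractional query points. This is a simplified illustration under three assumptions:

- the output size along each axis is floor(r · size);
- the query points are the centres of the output cells mapped back into input coordinates;
- each query point takes a single bilinear sample.

The module in Figure 2 may place and aggregate sampling points differently. A trainable version would express the same sampling with a differentiable operator, such as PyTorch's `torch.nn.functional.grid_sample`, so that gradients reach r through q. All function names here are hypothetical.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D grid at fractional (y, x), clamping to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def query_points(size_in, r):
    """Assumed layout: centres of the output cells mapped back to input coordinates."""
    size_out = max(1, int(size_in * r))  # floor(r * size_in), at least one output cell
    return [(i + 0.5) / r - 0.5 for i in range(size_out)]

def dyno_resize(feat, r_h, r_w):
    """Resize `feat` with independent vertical/horizontal scale factors via bilinear sampling."""
    qy = query_points(len(feat), r_h)
    qx = query_points(len(feat[0]), r_w)
    return [[bilinear(feat, y, x) for x in qx] for y in qy]
```

For a 4×4 map, `dyno_resize(feat, 0.5, 1.0)` halves only the height and returns a 2×4 map. Separate vertical and horizontal scale factors thus let such a module change the shape of the feature map, not just its size.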

Figure 3: The overall optimization process of DynOPool

The gradient with respect to the scale factor r is unstable. Gradient explosions can make the resolution change drastically during training, so r is reparameterized to stabilize optimization.
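A simple calculation shows where such instability can come from. Assume, hypothetically, that query point i sits at the centre of output cell i mapped back into input coordinates: q_i = (i + 0.5)/r - 0.5. This layout may differ from the paper's. Then dq_i/dr = -(i + 0.5)/r². The sensitivity grows with the output index and with 1/r². Query points near the end of a large feature map, and any point when r is small, therefore receive large gradients.

The sketch checks this derivative against a finite difference. It does not reproduce the paper's reparameterization.

```python
def query_position(i, r):
    """Assumed layout: centre of output cell i mapped into input coordinates."""
    return (i + 0.5) / r - 0.5

def dq_dr(i, r):
    """Analytic derivative of query_position with respect to the scale factor r."""
    return -(i + 0.5) / (r * r)

def finite_diff(i, r, eps=1e-6):
    """Central-difference estimate of dq/dr, used to check dq_dr."""
    return (query_position(i, r + eps) - query_position(i, r - eps)) / (2 * eps)

# Sensitivity of the last query point on a 32-pixel axis for decreasing scale factors.
for r in (1.0, 0.5, 0.25):
    last = int(32 * r) - 1
    print(f"r={r}: last index {last}, dq/dr = {dq_dr(last, r):.1f}")
```

The loop prints derivatives of -31.5, -62.0 and -120.0. Halving r roughly doubles the magnitude for the last query point, since (32r - 0.5)/r² ≈ 32/r. Through the chain rule this factor multiplies the spatial gradient of the interpolation. At r = 0.25, for example, a change of only 0.05 in r moves the last query point by about 6 pixels.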

2. Model complexity constraint

To maximize accuracy, DynOPool sometimes learns large scale factors, which increase the resolution of the intermediate feature maps. To constrain the computational cost and reduce model size, the authors therefore introduce an additional loss term, L_GMACs. It is a simple weighted sum of the layer-wise GMAC counts at each training iteration t.
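Here is a minimal sketch of such a complexity term. It assumes a toy network in which each stage first resizes the running feature map by its own DynOPool scale factors (r_h, r_w), then applies a stride-1, same-padded k × k convolution. The per-stage weights, the stage layout, and the function names are hypothetical. The sketch only shows why larger scale factors raise the cost of every later layer. Sizes are kept continuous rather than floored, so the cost varies smoothly with r.

```python
def layer_gmacs(h, w, c_in, c_out, k):
    """GMACs of a stride-1, same-padded k x k convolution producing an h x w x c_out map."""
    return h * w * c_in * c_out * k * k / 1e9

def gmacs_loss(input_hw, stages, weights):
    """Weighted sum of per-layer GMACs for one training iteration (hypothetical layout).

    input_hw: (H, W) of the network input.
    stages:   list of (r_h, r_w, c_in, c_out, k); each stage resizes the running
              feature map by (r_h, r_w) and then applies one convolution.
    weights:  one non-negative weight per stage (assumed, not the paper's values).
    """
    h, w = input_hw
    total = 0.0
    for (r_h, r_w, c_in, c_out, k), lam in zip(stages, weights):
        h, w = h * r_h, w * r_w  # the scale factors compound through later stages
        total += lam * layer_gmacs(h, w, c_in, c_out, k)
    return total
```

Take a 32×32 input with `stages = [(1, 1, 3, 64, 3), (0.5, 0.5, 64, 128, 3)]` and unit weights. The term is then about 0.0206 GMACs, and halving r_w of the second stage halves that stage's contribution. During training, such a term would be added to the task loss, so optimization trades accuracy against cost.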

Experiments

Table 1: Accuracy (%) and GMACs of hand-designed models and of models using DynOPool

Figure 4: Visualization of VGG-16 models trained with the hand-designed architecture, with Shape Adaptor, and with DynOPool

Table 2: Comparison of DynOPool and Shape Adaptor on CIFAR-100

Table 3: Performance of EfficientNet-B0 + DynOPool on ImageNet

Table 4: Semantic segmentation results of HRNet-W48 on PASCAL VOC

Conclusion

The authors propose a simple and effective Dynamically Optimized Pooling operation (DynOPool). It optimizes the scale factors of feature maps end to end by learning the ideal size and shape of the receptive field in each layer. By adjusting the size and shape of intermediate feature maps, it extracts local details effectively and thereby improves the model's overall performance.

DynOPool also constrains computational cost through an additional loss term, which controls model complexity. Experiments show that models with DynOPool outperform the baseline networks on image classification and semantic segmentation across multiple datasets.
