CVPR 2022 | Pooling revisited: your receptive field is suboptimal
2022-06-29 13:09:00 【CV technical guide (official account)】
Preface: This paper presents a simple and effective dynamically optimized pooling operation, called DynOPool, which optimizes the scale factors of feature maps end to end by learning the optimal size and shape of the receptive field in each layer.
Any resizing module in a deep neural network can be replaced by DynOPool at minimal cost. In addition, DynOPool controls model complexity by introducing an additional loss term that constrains the computational cost.
Follow the official account CV Technical Guide, which focuses on computer vision technology summaries, tracking of the latest techniques, interpretations of classic papers, and CV recruitment information.

Paper: https://arxiv.org/abs/2205.15254
Code: not released
Background
Although deep neural networks have achieved unprecedented success in computer vision, natural language processing, robotics, bioinformatics, and other applications, designing an optimal network architecture remains a challenging problem. The size and shape of the receptive field determine how the network aggregates local information and have a significant impact on overall model performance. Many components of a neural network, such as the kernel size and stride of convolution and pooling operations, affect the receptive field configuration. However, these components still rely on hand-set hyperparameters, so the receptive fields of existing models can end up with suboptimal shapes and sizes.
This article explains why conventional receptive fields with a fixed size and shape are suboptimal, and reviews how DynOPool addresses the problem through toy experiments with VGG-16 on CIFAR-100.
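The dependence of the receptive field on kernel size and stride can be made concrete with the standard recurrence for stacked layers. The sketch below is illustrative only: the function name and the layer list are assumptions, not taken from the paper, and it tracks a single spatial axis.

```python
def receptive_field(layers):
    """Return (receptive field size, cumulative stride) along one axis.

    `layers` is a sequence of (kernel_size, stride) pairs applied in order.
    Hypothetical helper for illustration; not part of DynOPool.
    """
    rf, jump = 1, 1  # a single input pixel sees itself; neighbours are 1 apart
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each extra tap widens the field by `jump`
        jump *= stride                  # striding spreads later taps further apart
    return rf, jump


# Assumed VGG-style block: two 3x3 convolutions, then 2x2 max pooling with stride 2.
block = [(3, 1), (3, 1), (2, 2)]
print(receptive_field(block))      # one block
print(receptive_field(block * 2))  # two blocks stacked
```

Because every kernel size and stride here is a fixed integer chosen by hand, the resulting receptive field is fixed as well, which is the constraint the article goes on to discuss.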
Problems with conventional receptive fields of fixed size and shape:
1. Asymmetrically distributed information
The optimal receptive field shape varies with the inherent spatial asymmetry of information in a dataset, and in most cases that asymmetry cannot be measured. In addition, the input resizing commonly used in preprocessing sometimes introduces information asymmetry. In hand-designed networks, the aspect ratio of an image is often changed to meet the model's input specification, yet the receptive fields in such networks are not designed to handle this operation.
To verify the proposed method, the authors run experiments on CIFAR-stretch-V, shown in Figure 1(a). Compared with the hand-designed model, the feature maps whose shapes are dynamically optimized by DynOPool improve performance by extracting more valuable information in the horizontal direction.

Figure 1. Three synthetic datasets derived from CIFAR-100 and used in the toy experiments:
(a) random crops of vertically stretched images; (b) downscaled images tiled in a 4×4 grid; (c) upscaled images.
2. Densely or sparsely distributed information
Locality is integral to designing an optimal model. CNNs learn complex representations of images by aggregating local information in a cascaded manner. How important local information is depends largely on the properties of each image. For example, when an image is blurred, most meaningful micro-patterns, such as object textures, are erased. In that case, it is better to enlarge the receptive field in the early layers and focus on global information. Conversely, if an image carries a large amount of class-specific information in local details such as texture, identifying local information becomes more important.
To test this hypothesis, the authors construct two variants of the CIFAR-100 dataset, CIFAR-tile and CIFAR-large, shown in Figure 1(b) and (c). The authors' model outperforms the hand-designed model by a large margin.
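As a rough illustration of how such synthetic variants might be built, the sketch below produces a tiled image and a randomly cropped, vertically stretched image from a small grayscale grid. Everything specific here is an assumption made for the example: the helper names, average pooling for downscaling, nearest-neighbour row repetition for stretching, and the factors 4 and 2. The source does not state how the authors generated their datasets.

```python
import random


def downscale(img, f):
    """Average-pool a 2D list by an integer factor f (assumed downscaling method)."""
    h, w = len(img), len(img[0])
    return [
        [sum(img[i * f + a][j * f + b] for a in range(f) for b in range(f)) / (f * f)
         for j in range(w // f)]
        for i in range(h // f)
    ]


def tile(img, n):
    """Repeat a 2D list n times along both axes, copying rows to avoid aliasing."""
    return [list(row) * n for _ in range(n) for row in img]


def stretch_vertical(img, f):
    """Repeat every row f times (nearest-neighbour vertical stretch)."""
    return [list(row) for row in img for _ in range(f)]


def random_crop_rows(img, height, rng):
    """Keep `height` consecutive rows starting at a random offset."""
    top = rng.randrange(len(img) - height + 1)
    return img[top:top + height]


size = 8  # small stand-in for a 32x32 CIFAR image
image = [[float(i * size + j) for j in range(size)] for i in range(size)]

tiled = tile(downscale(image, 4), 4)  # dense variant: same size, 4x4 copies
stretched = random_crop_rows(stretch_vertical(image, 2), size, random.Random(0))

print(len(tiled), len(tiled[0]))
print(len(stretched), len(stretched[0]))
```

Both outputs keep the original spatial size, so a network with a fixed input shape sees either much denser content (tiling) or content that has been stretched along one axis.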
Contributions
To mitigate the suboptimality of hand-crafted architectures and operations, the authors propose the dynamically optimized pooling operation (DynOPool), a learnable resizing module that can replace standard resizing operations. The module learns from the dataset the best receptive-field scale factor for an operation, so that the intermediate feature maps in the network are resized to an appropriate size and shape.
The main contributions of the paper:
1. It addresses a limitation of existing scaling operators in deep neural networks, namely their dependence on predetermined hyperparameters, and highlights the importance of finding the best spatial resolution and receptive field for intermediate feature maps.
2. It proposes DynOPool, a learnable resizing module that finds the best scale factor and receptive field for intermediate feature maps. DynOPool uses the learned scale factor to identify the best resolution and receptive field of a layer and propagates this information to subsequent layers, achieving scale optimization across the whole network.
3. It shows that models using DynOPool outperform the baseline algorithms on image classification and semantic segmentation across multiple datasets and network architectures, with a favorable trade-off between accuracy and computational cost.
Method
1. Dynamically optimized pooling (DynOPool)

Figure 2. The resizing module in DynOPool
By optimizing the scale factor r between a pair of input and output feature maps, the module optimizes the query points q and obtains the best resolution for the intermediate feature map. DynOPool adaptively controls the size and shape of the receptive fields in deeper layers without affecting other operators.
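To give a feel for resizing with a non-integer, per-axis scale factor, here is a minimal forward-pass sketch that resizes a 2D feature map by bilinear interpolation at continuous query positions. It is a generic bilinear resize, not the authors' module: the function names, the rounding of the output size, the centre-aligned mapping of query points, and the absence of any gradient with respect to r are all assumptions of this example.

```python
import math


def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate a 2D list at continuous coordinates (y, x)."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp queries to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def resize(fmap, r_h, r_w):
    """Resize a 2D list by scale factors (r_h, r_w); output size is rounded."""
    h, w = len(fmap), len(fmap[0])
    out_h = max(1, round(h * r_h))
    out_w = max(1, round(w * r_w))
    out = []
    for i in range(out_h):
        # Map the centre of output cell i back to input coordinates.
        y = (i + 0.5) / out_h * h - 0.5
        out.append([
            bilinear_sample(fmap, y, (j + 0.5) / out_w * w - 0.5)
            for j in range(out_w)
        ])
    return out


fmap = [[i * 4 + j for j in range(4)] for i in range(4)]
halved_height = resize(fmap, 0.5, 1.0)  # shrink vertically only
print(len(halved_height), len(halved_height[0]))
```

Using separate factors for height and width is what allows the output to change shape as well as size, for example keeping horizontal resolution while halving vertical resolution.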

Figure 3. The overall optimization process of DynOPool
Because the gradient with respect to the scale factor r is unstable and can explode, causing the resolution to change sharply during training, the authors reparameterize r as follows:
![]()
2. Model complexity constraint
To maximize accuracy, DynOPool sometimes learns large scale factors, which increase the resolution of the intermediate feature maps. To constrain the computational cost and reduce model size, an additional loss term L_GMACs is introduced. It is given by a simple weighted sum of the per-layer GMACs counts at each training iteration t, as shown below:

Experiments
Table 1. Accuracy (%) and GMACs of hand-designed models compared with models using DynOPool

Figure 4. Visualization of VGG-16 models trained with the hand-designed configuration, Shape Adaptor, and DynOPool.

Table 2. Comparison of DynOPool and Shape Adaptor on the CIFAR-100 dataset

Table 3. Performance of EfficientNet-B0 + DynOPool on the ImageNet dataset

Table 4. Semantic segmentation results of HRNet-W48 on PascalVOC

Conclusion
The authors propose a simple and effective dynamically optimized pooling operation (DynOPool). By learning the ideal size and shape of the receptive field in each layer, it optimizes the scale factors of feature maps end to end, resizes intermediate feature maps to a suitable size and shape, and extracts local details effectively, which improves overall model performance.
DynOPool also limits the computational cost through an additional loss term, which keeps model complexity under control. Experiments show that, across multiple datasets, models using DynOPool outperform the baseline networks on image classification and semantic segmentation.