CVPR 2022 | Pooling revisited: your receptive field is suboptimal
2022-06-29 13:09:00 【CV technical guide (official account)】
Preface: This paper presents a simple and effective dynamically optimized pooling operation, called DynOPool, which optimizes the scale factors of the feature maps end to end by learning the desirable size and shape of the receptive field in each layer.
Any kind of resizing module in a deep neural network can be replaced by DynOPool at minimal cost. In addition, DynOPool controls model complexity by introducing an additional loss term that constrains the computational cost.
Follow the official account CV Technical Guide, which focuses on computer vision technical summaries, tracking of the latest techniques, interpretations of classic papers, and CV job postings.

Paper: https://arxiv.org/abs/2205.15254
Code: not yet released
Background
Although deep neural networks have achieved unprecedented success in computer vision, natural language processing, robotics, bioinformatics, and other applications, designing an optimal network architecture remains a challenging problem. The size and shape of the receptive field determine how the network aggregates local information, and they have a significant impact on the overall performance of the model. Many components of a neural network, such as the kernel size and stride of convolution and pooling operations, affect the configuration of the receptive field. However, these components still rely on hyperparameters, so the receptive fields of existing models end up with suboptimal shapes and sizes.
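To make the link between kernel size, stride, and receptive field concrete, here is a minimal sketch using the standard recurrence for a plain stack of convolution and pooling layers along one axis. The function name `receptive_field` and the VGG-like layer list are illustrative assumptions, not taken from the paper.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (receptive_field, jump) along one axis.

    `layers` is a sequence of (kernel_size, stride) pairs, applied in order.
    `jump` is the distance, in input pixels, between adjacent output positions.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) jumps
        jump *= stride             # striding spreads output positions further apart
    return rf, jump


# Hypothetical example: two VGG-style blocks (3x3 conv, 3x3 conv, 2x2 pool with stride 2).
vgg_like = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]
print(*receptive_field(vgg_like))  # prints: 16 4
```

Every number in this calculation comes from a hand-picked hyperparameter, which is the dependence the article refers to.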
This article explains why conventional receptive fields of fixed size and shape are suboptimal, and reviews how DynOPool addresses the problem, using toy experiments with VGG-16 on CIFAR-100.
Problems with conventional receptive fields of fixed size and shape:
1. Asymmetrically distributed information
The optimal receptive field shape varies with the inherent spatial asymmetry of information in the dataset, and in most cases this asymmetry is not measurable. In addition, the input resizing commonly used in preprocessing can itself introduce asymmetry: in hand-designed networks, the image aspect ratio is often altered to meet the model's input specification. However, the receptive fields of such networks are not designed to handle this operation.
To verify the proposed method, the authors ran experiments on CIFAR-stretch-V, shown in Figure 1(a). Compared with the hand-designed model, DynOPool, which dynamically optimizes the shape of the feature maps, improves performance by extracting more valuable information in the horizontal direction.

Figure 1: Toy experiments on three synthetic datasets derived from CIFAR-100:
(a) random crops of vertically stretched images; (b) downsized images tiled in a 4×4 grid; (c) zoomed (rescaled) images.
2. Densely or sparsely distributed information
Locality is an integral part of designing an optimal model. CNNs learn complex image representations by aggregating local information in a cascaded manner. How important local information is depends largely on the properties of each image. For example, when an image is blurred, most meaningful fine patterns, such as the texture of an object, are erased. In that case it is better to enlarge the receptive field in the early layers and focus on global information. Conversely, if an image carries a large amount of class-specific information in local details such as texture, identifying local information becomes more important.
To test this hypothesis, the authors constructed two variants of the CIFAR-100 dataset, CIFAR-tile and CIFAR-large, shown in Figure 1(b) and (c). Their model outperforms the hand-designed model by a large margin.
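The three synthetic variants described above can be approximated with a few lines of plain Python. This is only a rough sketch: the stretch factor, the nearest-neighbour resizing, and the reading of CIFAR-large as a simple upscaling are assumptions made for illustration, and the helper names are hypothetical rather than the authors' code.

```python
import random
from typing import List

Image = List[List[int]]  # single-channel image as rows of pixel values


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize (assumed here; the paper's interpolation may differ)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def stretch_vertical(img: Image, factor: int = 2, rng=random) -> Image:
    """CIFAR-stretch-V style: stretch vertically, then randomly crop to the original height."""
    h, w = len(img), len(img[0])
    tall = resize_nearest(img, h * factor, w)
    top = rng.randrange(h * factor - h + 1)  # random vertical offset of the crop
    return tall[top:top + h]


def tile(img: Image, grid: int = 4) -> Image:
    """CIFAR-tile style: downsize the image and repeat it in a grid x grid layout."""
    h, w = len(img), len(img[0])
    small = resize_nearest(img, h // grid, w // grid)
    row_block = [row * grid for row in small]            # repeat horizontally
    return [list(row) for _ in range(grid) for row in row_block]  # repeat vertically


def enlarge(img: Image, factor: int = 2) -> Image:
    """CIFAR-large style, assumed to be a plain upscaling of the image."""
    return resize_nearest(img, len(img) * factor, len(img[0]) * factor)
```

In the tiled variant the discriminative content lives at a much smaller spatial scale, while in the enlarged variant it lives at a larger one, so a single fixed receptive field configuration cannot suit both.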
Contributions
To alleviate the suboptimality of hand-designed architectures and operations, the authors propose dynamically optimized pooling (DynOPool), a learnable resizing module that can replace standard resizing operations. The module learns, from the dataset, the best scale factor for the receptive field of each operation, so that the intermediate feature maps in the network are resized to an appropriate size and shape.
The main contributions of the paper:
1. It addresses the limitation that existing scaling operators in deep neural networks depend on predetermined hyperparameters, and points out the importance of finding the optimal spatial resolution and receptive field for the intermediate feature maps.
2. It proposes DynOPool, a learnable resizing module that finds the optimal scale factor and receptive field of the intermediate feature maps. DynOPool uses the learned scale factor to identify the best resolution and receptive field of a layer and propagates this information to subsequent layers, achieving scale optimization across the whole network.
3. It shows that, on image classification and semantic segmentation tasks, models using DynOPool outperform the baselines across multiple datasets and network architectures, with a desirable trade-off between accuracy and computational cost.
Method
1. Dynamically optimized pooling (DynOPool)

Figure 2: The resizing module in DynOPool
The module learns the scale factor r between a pair of input and output feature maps; r in turn determines the positions of the query points q and yields the optimal resolution of the intermediate feature map. DynOPool adaptively controls the size and shape of the receptive field in deeper layers without affecting other operators.
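The idea of resizing through query points can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: the four query points per output cell, the offset of a quarter cell, the max aggregation, and the name `dynopool_like_resize` are all assumptions, and the scale factors are passed in as fixed numbers rather than learned.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # single-channel feature map


def bilinear(x: FeatureMap, py: float, px: float) -> float:
    """Bilinearly interpolate x at continuous position (py, px), clamped to the border."""
    h, w = len(x), len(x[0])
    py = min(max(py, 0.0), h - 1.0)
    px = min(max(px, 0.0), w - 1.0)
    y0, x0 = int(math.floor(py)), int(math.floor(px))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = py - y0, px - x0
    top = x[y0][x0] * (1 - wx) + x[y0][x1] * wx
    bottom = x[y1][x0] * (1 - wx) + x[y1][x1] * wx
    return top * (1 - wy) + bottom * wy


def dynopool_like_resize(x: FeatureMap, r_h: float, r_w: float) -> FeatureMap:
    """Resize x by scale factors (r_h, r_w) using four query points per output cell."""
    h_in, w_in = len(x), len(x[0])
    h_out = max(1, round(h_in * r_h))
    w_out = max(1, round(w_in * r_w))
    dy, dx = 0.25 / r_h, 0.25 / r_w  # assumed offset: a quarter of one output cell
    out = []
    for i in range(h_out):
        cy = (i + 0.5) / r_h - 0.5   # cell centre in input pixel coordinates
        row = []
        for j in range(w_out):
            cx = (j + 0.5) / r_w - 0.5
            samples = [bilinear(x, cy + sy * dy, cx + sx * dx)
                       for sy in (-1, 1) for sx in (-1, 1)]
            row.append(max(samples))  # aggregate the query points by max
        out.append(row)
    return out


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(dynopool_like_resize(x, 0.5, 0.5))  # [[6.0, 8.0], [14.0, 16.0]]
```

With r = 0.5 in both directions the query points fall exactly on pixel centres, so the sketch reduces to ordinary 2×2 max pooling with stride 2. Different values for r_h and r_w give an anisotropic output, which is how the shape of the receptive field can change. In a real model r would be a trainable parameter inside an autodiff framework.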

Figure 3: The overall optimization process of DynOPool
The gradient with respect to the scale factor r is unstable and can explode, causing the resolution to change sharply during training. To avoid this, r is reparameterized; the defining equation was an image in the original post and is not preserved here, so refer to the paper for its exact form.
2. Model complexity constraints
To maximize accuracy, DynOPool sometimes learns large scale factors, which increase the resolution of the intermediate feature maps. To constrain the computational cost and reduce the model size, an additional loss term L_GMACs is introduced, defined as a simple weighted sum of the per-layer GMACs counts at each training iteration t.
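As a rough illustration of such a complexity penalty, the sketch below counts the multiply-accumulate operations of standard convolution layers and adds their weighted sum to a task loss. The exact form of the loss in the paper is not reproduced here; the uniform default weights, the coefficient `lam`, and the names `ConvLayer`, `gmacs_penalty`, and `total_loss` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ConvLayer:
    c_in: int    # input channels
    c_out: int   # output channels
    kernel: int  # square kernel size
    h_out: int   # output feature map height
    w_out: int   # output feature map width


def conv_gmacs(layer: ConvLayer) -> float:
    """GMACs of a standard (non-grouped, bias-free) convolution."""
    macs = layer.h_out * layer.w_out * layer.c_in * layer.c_out * layer.kernel ** 2
    return macs / 1e9


def gmacs_penalty(layers: Sequence[ConvLayer],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of per-layer GMACs (uniform weights assumed by default)."""
    if weights is None:
        weights = [1.0] * len(layers)
    return sum(w * conv_gmacs(layer) for w, layer in zip(weights, layers))


def total_loss(task_loss: float, layers: Sequence[ConvLayer], lam: float = 0.1) -> float:
    """Task loss plus the complexity penalty; lam is a hypothetical trade-off coefficient."""
    return task_loss + lam * gmacs_penalty(layers)


# Doubling the feature map resolution in each direction quadruples the cost of a layer.
small = ConvLayer(c_in=64, c_out=64, kernel=3, h_out=16, w_out=16)
large = ConvLayer(c_in=64, c_out=64, kernel=3, h_out=32, w_out=32)
print(conv_gmacs(large) / conv_gmacs(small))
```

Because the feature map size of each layer depends on the learned scale factors, a penalty of this kind pushes back against scale factors that inflate the resolution without a matching gain in accuracy.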

Experiments
Table 1: Accuracy (%) and GMACs of hand-designed models compared with models using DynOPool

Figure 4: Visualization of VGG-16 models: hand-designed, trained with Shape Adaptor, and trained with DynOPool.

Table 2: Comparison of DynOPool and Shape Adaptor on the CIFAR-100 dataset

Table 3: Performance of EfficientNet-B0 + DynOPool on the ImageNet dataset

Table 4: Semantic segmentation results of HRNet-W48 on PascalVOC

Conclusion
The authors propose a simple and effective dynamically optimized pooling operation (DynOPool). It optimizes the scale factors of the feature maps end to end by learning the ideal size and shape of the receptive field in each layer, adjusting the size and shape of the intermediate feature maps so that local details are extracted effectively and the overall performance of the model improves.
DynOPool also limits the computational cost by introducing an additional loss term, which keeps model complexity under control. Experiments show that, on multiple datasets, models using DynOPool outperform the baseline networks on image classification and semantic segmentation.
