CVPR 2022 | Pooling Revisited: Your Receptive Field Is Suboptimal
2022-06-29 13:09:00 【CV technical guide (official account)】
Preface: This paper presents a simple and effective dynamically optimized pooling operation, called DynOPool, which optimizes the scale factors of feature maps end to end by learning the optimal size and shape of the receptive field in each layer.
Any resizing module in a deep neural network can be replaced by a DynOPool operation at minimal cost. In addition, DynOPool controls model complexity by introducing an additional loss term that constrains the computational cost.

Paper: https://arxiv.org/abs/2205.15254
Code: not yet released
Background
Although deep neural networks have achieved unprecedented success in computer vision, natural language processing, robotics, bioinformatics, and other applications, designing the optimal network structure remains a challenging problem. The size and shape of the receptive field determine how the network aggregates local information and have a significant impact on the overall performance of the model. Many components of a neural network, such as the kernel size and stride of convolution and pooling operations, affect the configuration of the receptive field. However, these components still rely on hand-set hyperparameters, so the receptive fields of existing models can end up with suboptimal shapes and sizes.
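To make the link between kernel size, stride, and receptive field concrete, here is a minimal sketch using the standard recurrence for a plain stack of layers along one axis. The function name `receptive_field` and the VGG-like layer list are illustrative and do not come from the paper; padding and dilation are ignored.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (receptive_field, jump) along one axis.

    Each layer is given as (kernel_size, stride). `jump` is the distance,
    in input pixels, between two neighbouring outputs of the stack.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # a k-wide kernel adds (k - 1) steps of the current jump
        jump *= stride             # striding spreads neighbouring outputs further apart
    return rf, jump


# Hypothetical VGG-like prefix: two 3x3 convs, a 2x2 stride-2 pool, repeated twice.
vgg_like = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]
print(receptive_field(vgg_like))  # (16, 4)
```

Because the recurrence is applied per axis, using different strides for height and width gives a non-square receptive field, which is the degree of freedom that fixed, hand-set strides leave unused.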
The paper argues that a conventional receptive field with fixed size and shape is suboptimal, and examines how DynOPool addresses the problem through toy experiments with VGG-16 on CIFAR-100.
Problems with conventional receptive fields of fixed size and shape:
1. Asymmetrically distributed information
The shape of the optimal receptive field changes with the inherent spatial asymmetry of information in the dataset. In most cases this inherent asymmetry cannot be measured. In addition, the input resizing commonly used in preprocessing sometimes introduces information asymmetry. For hand-designed networks, the aspect ratio of an image is often changed to meet the input specification of the model. However, the receptive fields in such networks are not designed to account for this resizing.
To verify the proposed method, the authors run experiments on CIFAR-stretch-V, shown in Figure 1(a). Compared with the hand-designed model, the feature maps whose shapes are dynamically optimized by DynOPool improve performance by extracting more valuable information in the horizontal direction.

Figure 1. Toy experiments on three synthetic datasets derived from CIFAR-100:
(a) random crops of vertically stretched images, (b) downscaled images tiled in a 4×4 grid, (c) enlarged images.
2. Densely or sparsely distributed information
Locality is an integral part of designing an optimal model. CNNs learn complex image representations by aggregating local information in a cascaded manner. The importance of local information depends largely on the properties of each image. For example, when an image is blurred, most meaningful micro-patterns, such as object textures, are erased. In that case, it is better to expand the receptive field in the early layers and focus on global information. Conversely, if an image carries a large amount of class-specific information in local details such as texture, identifying local information becomes more important.
To test this hypothesis, the authors construct two variants of the CIFAR-100 dataset, CIFAR-tile and CIFAR-large, shown in Figure 1(b) and (c). The authors' model outperforms the hand-designed model by a large margin.
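As a rough illustration of how a CIFAR-tile style input could be built, the sketch below shrinks a single-channel image and repeats it in a grid. The helper names, the nearest-neighbour downscaling, and the choice to keep the original resolution are assumptions for illustration; the article does not describe the authors' exact preprocessing.

```python
from typing import List

Image = List[List[int]]  # one channel, stored as rows of pixel values


def downscale_nearest(img: Image, factor: int) -> Image:
    """Shrink by an integer factor by keeping every `factor`-th pixel."""
    return [row[::factor] for row in img[::factor]]


def tile(img: Image, grid: int) -> Image:
    """Repeat `img` grid x grid times (horizontally and vertically)."""
    return [row * grid for _ in range(grid) for row in img]


def make_tiled(img: Image, grid: int = 4) -> Image:
    """Downscale by `grid`, then tile, so the output keeps the input size."""
    return tile(downscale_nearest(img, grid), grid)


# 8x8 toy image whose pixel value encodes its position.
img = [[r * 8 + c for c in range(8)] for r in range(8)]
tiled = make_tiled(img, grid=4)
print(tiled[0])  # [0, 4, 0, 4, 0, 4, 0, 4]
```

In such an image each object occupies only a small patch, so the useful information is dense and local, which is the situation where a smaller receptive field in early layers is expected to help.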
Contributions
To alleviate the suboptimality of hand-crafted architectures and operations, the authors propose dynamically optimized pooling (DynOPool), a learnable resizing module that can replace standard resizing operations. The module finds the best scale factor of the receptive field for the operations learned on the dataset, so that the intermediate feature maps in the network are resized to an appropriate size and shape.
The main contributions of the paper:
1. It addresses the limitation that existing scaling operators in deep neural networks depend on predetermined hyperparameters, and points out the importance of finding the best spatial resolution and receptive field for the intermediate feature maps.
2. It proposes DynOPool, a learnable resizing module that finds the best scale factor and receptive field for the intermediate feature maps. DynOPool uses the learned scale factor to identify the best resolution and receptive field of a layer and propagates this information to subsequent layers, achieving scale optimization across the whole network.
3. It shows that, on image classification and semantic segmentation tasks, models using DynOPool outperform the baselines across multiple datasets and network architectures, with a favorable trade-off between accuracy and computational cost.
Method
1. Dynamically optimized pooling (DynOPool)

Figure 2. The resizing module in DynOPool
The module optimizes the scale factor r between a pair of input and output feature maps, which in turn determines the positions of the query points q and yields the best resolution for the intermediate feature map. DynOPool adaptively controls the size and shape of the receptive fields in deeper layers without affecting the other operators.
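The forward geometry of a scale-factor-driven resize can be sketched as follows. This is a simplified stand-in, not the authors' implementation: the scale factors `r_h` and `r_w` are plain floats rather than learned parameters, each output cell uses a single query point at its centre, and values are read with bilinear interpolation. All function names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # one channel, rows of activations


def bilinear(fm: FeatureMap, y: float, x: float) -> float:
    """Sample `fm` at continuous coordinates (y, x), clamped to the border."""
    h, w = len(fm), len(fm[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fm[y0][x0] * (1 - dx) + fm[y0][x1] * dx
    bottom = fm[y1][x0] * (1 - dx) + fm[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def resize(fm: FeatureMap, r_h: float, r_w: float) -> FeatureMap:
    """Resize `fm` by independent scale factors for height and width."""
    h, w = len(fm), len(fm[0])
    out_h = max(1, round(h * r_h))  # output size follows from the scale factor
    out_w = max(1, round(w * r_w))
    out = []
    for i in range(out_h):
        qy = (i + 0.5) * h / out_h - 0.5  # query point: centre of output cell i
        row = []
        for j in range(out_w):
            qx = (j + 0.5) * w / out_w - 0.5
            row.append(bilinear(fm, qy, qx))
        out.append(row)
    return out


# 4x4 map whose value equals the column index.
fm = [[float(c) for c in range(4)] for _ in range(4)]
print(resize(fm, 0.5, 0.5))  # [[0.5, 2.5], [0.5, 2.5]]
print(resize(fm, 0.5, 1.0))  # halves the height only, keeps the width
```

Passing different values for `r_h` and `r_w` changes the aspect ratio of the feature map, which is how a learned scale factor can reshape the receptive field of every layer that follows.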

Figure 3. The overall optimization process of DynOPool
The gradient with respect to the scale factor r is unstable and can explode, which causes the resolution to change drastically during training. The authors therefore reparameterize r using α as follows:
![]()
2. Model complexity constraint
To maximize model accuracy, DynOPool sometimes learns large scale factors, which increase the resolution of the intermediate feature maps. To constrain the computational cost and reduce the model size, an additional loss term L_GMACs is introduced. It is given by a simple weighted sum of the per-layer GMACs counts at each training iteration t.
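A weighted sum of per-layer GMACs can be sketched as below for convolution layers. The MAC count of a convolution is standard; the function names, the layer tuple layout, and the uniform weights are assumptions for illustration, since the article does not state how the authors weight the layers or combine the term with the task loss.

```python
from typing import List, Tuple

# (h_out, w_out, c_in, c_out, kernel_size) of a convolution layer
ConvShape = Tuple[int, int, int, int, int]


def conv_gmacs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> float:
    """Multiply-accumulate count of a k x k convolution, in units of 1e9."""
    return h_out * w_out * c_in * c_out * k * k / 1e9


def gmacs_penalty(layers: List[ConvShape], weights: List[float]) -> float:
    """Weighted sum of per-layer GMACs (hypothetical form of the extra loss term)."""
    return sum(w * conv_gmacs(*layer) for w, layer in zip(weights, layers))


# Same layer evaluated at two feature-map resolutions.
full = (32, 32, 64, 64, 3)
half = (16, 16, 64, 64, 3)
print(conv_gmacs(*full))                        # about 0.0377 GMACs
print(gmacs_penalty([full, half], [1.0, 1.0]))  # about 0.0472 GMACs
```

Since the count grows with `h_out * w_out`, doubling a layer's resolution in both directions quadruples its contribution, so a penalty of this kind pushes the learned scale factors back toward smaller feature maps.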

Experiments
Table 1. Accuracy (%) and GMACs of hand-designed models compared with models using DynOPool

Figure 4. Visualization of VGG-16 models trained with the hand-designed architecture, Shape Adaptor, and DynOPool.

Table 2. Comparison of DynOPool and Shape Adaptor on the CIFAR-100 dataset

Table 3. Performance of EfficientNet-B0 + DynOPool on the ImageNet dataset

Table 4. Semantic segmentation results of HRNet-W48 on PASCAL VOC

Conclusion
The authors propose a simple and effective dynamically optimized pooling operation (DynOPool). It optimizes the scale factors of feature maps end to end by learning the ideal size and shape of the receptive field in each layer, resizes the intermediate feature maps accordingly, and extracts local details effectively, which improves the overall performance of the model.
DynOPool also limits the computational cost by introducing an additional loss term, thereby controlling model complexity. Experiments show that, on multiple datasets, models with DynOPool outperform the baseline networks in image classification and semantic segmentation.
