Paper reproduction: AC-FPN: Attention-guided Context Feature Pyramid Network for Object Detection
2022-06-29 12:29:00 【RooKiChen】
The paper reproduced here has open-source code, but it depends on the Detectron library, while most papers now use mmdetection (to install mmdetection, see my previous post: Ubuntu install mmdetection). If you find configuring Detectron too much trouble, you can refer to my reproduction code.
AC-FPN original paper: https://arxiv.org/pdf/2005.11475.pdf
AC-FPN official code: https://github.com/Caojunxu/AC-FPN
CxAM and CnAM are not implemented in the official source code!!! The reason is that the authors ran many experiments and found that the CEM module on its own already works well.
Some implementation details in the paper are not clear, so I reproduced them according to my own understanding. If you have a different approach, feel free to discuss it in the comments.
1. AC-FPN overall structure

AC-FPN addresses the conflict between receptive field and feature map resolution in high-resolution images. Intuitively, high-resolution images need a larger receptive field, but a large receptive field detects small objects poorly and tends to misjudge them as background. To resolve this conflict, AC-FPN proposes two modules: the Context Extraction Module (CEM) and the Attention-guided Module (AM). AM contains two sub-modules: the Context Attention Module (CxAM) and the Content Attention Module (CnAM). The innovation is to extract features with dilated (atrous) convolutions at different rates, while CxAM and CnAM in the AM module follow an idea similar to self-attention.
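To make the receptive-field argument concrete, here is a small sketch of how quickly a chain of stride-1 dilated convolutions enlarges the receptive field. The rates (3, 6, 12, 18, 24) and the 3x3 kernel are assumptions chosen for illustration and should be checked against the paper or the code you use; the function name is hypothetical.

```python
def stacked_receptive_fields(rates, kernel_size=3):
    """Receptive field (pixels per side) after each layer in a chain of
    stride-1 dilated convolutions."""
    rf = 1
    fields = []
    for rate in rates:
        # A dilated k x k kernel spans (k - 1) * rate + 1 pixels, so each
        # stride-1 layer widens the receptive field by (k - 1) * rate.
        rf += (kernel_size - 1) * rate
        fields.append(rf)
    return fields


# Assumed rates; adjust to the configuration you actually train with.
print(stacked_receptive_fields((3, 6, 12, 18, 24)))  # [7, 19, 43, 79, 127]
```

The growth is additive in the rates, which is why a handful of dilated layers can cover a large context without reducing the feature map resolution.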
2. CEM: Context Extraction Module
In the original paper, the authors say that CEM applies dilated convolutions with five different rates to the feature map F5 and then connects them densely (see the original paper for the dense connections). The figure also shows arrows linking each feature map to the others. A deformable convolution is then applied to each feature map, and finally the feature maps are concatenated and passed through a 1x1 convolution to adjust the number of channels. However, the source code contains no deformable convolution, and the paper mentions it only in passing, so I did not add deformable convolution in my implementation. The figure also shows an upsampling branch below the dilated convolution structure, which is not implemented in the source code either. My guess is that it compresses the feature map into a 1x1xC feature vector whose role is similar to spatial attention; perhaps it did not help in the authors' experiments, so the source code dropped this structure.
Personally, I think CEM works so well because a GroupNorm operation is added after each dilated convolution, which the original paper does not mention. Unlike BatchNorm, GroupNorm does not need a large batch_size (when training on the COCO dataset, batch_size is usually 2).
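The structure described above can be sketched in PyTorch roughly as follows. This is a minimal sketch under stated assumptions, not the official implementation: the class names `DilatedBlock` and `CEMSketch`, the rates (3, 6, 12, 18, 24), the channel widths, the 32 GroupNorm groups, and the choice to concatenate only the dilated outputs before the 1x1 convolution are all assumptions. Deformable convolution is left out, matching the decision described in this post, and GroupNorm follows each dilated convolution.

```python
import torch
from torch import nn


class DilatedBlock(nn.Module):
    """3x3 dilated convolution followed by GroupNorm and ReLU."""

    def __init__(self, in_ch, out_ch, rate, groups=32):
        super().__init__()
        # padding == dilation keeps the spatial size for a 3x3 kernel.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3,
                              padding=rate, dilation=rate, bias=False)
        self.norm = nn.GroupNorm(groups, out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.norm(self.conv(x)))


class CEMSketch(nn.Module):
    """Densely connected dilated convolutions over F5 (hypothetical sketch)."""

    def __init__(self, in_ch=2048, mid_ch=256, out_ch=256,
                 rates=(3, 6, 12, 18, 24)):  # assumed rates
        super().__init__()
        self.blocks = nn.ModuleList()
        ch = in_ch
        for rate in rates:
            self.blocks.append(DilatedBlock(ch, mid_ch, rate))
            ch += mid_ch  # dense link: later blocks also see this output
        # 1x1 convolution that adjusts the number of channels.
        self.fuse = nn.Conv2d(mid_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, f5):
        feats = [f5]
        for block in self.blocks:
            # Each block takes F5 plus all previous block outputs.
            feats.append(block(torch.cat(feats, dim=1)))
        return self.fuse(torch.cat(feats[1:], dim=1))


if __name__ == "__main__":
    cem = CEMSketch(in_ch=64, mid_ch=32, out_ch=32)
    print(cem(torch.randn(2, 64, 16, 16)).shape)
```

The output keeps the spatial size of F5, so it can be dropped into the top level of an FPN neck without further resizing.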
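The batch-size point can be checked with a small NumPy sketch. It assumes plain normalization without the learnable scale and shift, and the helper names are hypothetical. GroupNorm computes its statistics per sample, so the result for one image does not change with the rest of the batch, whereas BatchNorm statistics are shared across the batch.

```python
import numpy as np


def group_norm(x, num_groups, eps=1e-5):
    """Normalize each sample over groups of channels; x has shape (N, C, H, W)."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, -1)
    mean = g.mean(axis=2, keepdims=True)
    var = g.var(axis=2, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)


def batch_norm(x, eps=1e-5):
    """Normalize each channel over the whole batch (training-mode statistics)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 3, 3))

# Same first image, normalized alone and as part of a batch of 4.
gn_same = np.allclose(group_norm(x[:1], 2), group_norm(x, 2)[:1])
bn_same = np.allclose(batch_norm(x[:1]), batch_norm(x)[:1])
print("GroupNorm independent of batch:", gn_same)
print("BatchNorm independent of batch:", bn_same)
```

With only 2 images per GPU, batch statistics are noisy, which is the usual argument for GroupNorm in detection training.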
3. AM: Attention-guided Module
3.1 CxAM

This module is an ordinary self-attention, except that an average pooling operation is applied after the feature map R. F is the output feature of CEM; it contains multi-scale receptive field information and is fed into CxAM. Based on this information, CxAM adaptively attends to the relationships between related sub-regions. As a result, the output features of CxAM have clear semantics and capture the contextual dependencies within surrounding objects.
3.2 CnAM

CnAM has the same structure as CxAM. The original paper explains that because CEM uses deformable convolution, the geometric properties of the given image have been destroyed, which causes position shifts. The authors therefore designed a new attention module, the Content Attention Module (CnAM), to maintain accurate location information for each object.
The difference is that CnAM also takes the F5 feature map as input to compensate for the damaged location information.
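Both attention modules can be sketched with one NumPy function. This is one reading of the description, not the authors' code: the weight matrices stand in for 1x1 convolutions, the sigmoid before average pooling and the choice of pooling axis are assumptions, and the name `attention_sketch` is hypothetical. Passing F as `qk_source` corresponds to CxAM; passing F5 corresponds to CnAM.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_sketch(f, qk_source, w_q, w_k, w_v):
    """f: (C, H, W) features to refine. qk_source: (C_s, H, W) features used
    to build the relation map R. Weight matrices stand in for 1x1 convs."""
    c, h, w = f.shape
    n = h * w
    q = w_q @ qk_source.reshape(-1, n)  # (C', N) queries
    k = w_k @ qk_source.reshape(-1, n)  # (C', N) keys
    v = w_v @ f.reshape(c, n)           # (C_out, N) values
    r = q.T @ k                         # (N, N) relation map R
    # Assumed: sigmoid, then average pooling over the first axis of R,
    # which leaves one attention weight per spatial position.
    attn = sigmoid(r).mean(axis=0)      # (N,)
    out = v * attn                      # reweight every channel of V
    return out.reshape(-1, h, w), attn.reshape(h, w)


rng = np.random.default_rng(0)
f = rng.normal(size=(8, 4, 5))    # stand-in for the CEM output F
f5 = rng.normal(size=(16, 4, 5))  # stand-in for the backbone feature F5
w_v = 0.1 * rng.normal(size=(8, 8))

# CxAM: queries and keys come from F itself.
cxam_out, cxam_attn = attention_sketch(
    f, f, 0.1 * rng.normal(size=(2, 8)), 0.1 * rng.normal(size=(2, 8)), w_v)
# CnAM: queries and keys come from F5, values still come from F.
cnam_out, cnam_attn = attention_sketch(
    f, f5, 0.1 * rng.normal(size=(2, 16)), 0.1 * rng.normal(size=(2, 16)), w_v)
print(cxam_out.shape, cxam_attn.shape, cnam_out.shape, cnam_attn.shape)
```

The only difference between the two calls is where the relation map comes from, which matches the statement that the two modules share a structure.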
4. Training strategy
Here is my personal training strategy: the backbone is ResNet50, trained on the COCO dataset for 12 epochs on 8 GPUs with 40G of memory each, with 2 images per GPU. The initial learning rate is 0.02 and is multiplied by 0.1 at epoch 8 and epoch 11. The results are not bad.
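The schedule above can be written out as a small function. This is a sketch that assumes 0-indexed epochs with each drop taking effect from the milestone epoch onward; warmup, if any, is not modeled, and the function name is hypothetical.

```python
def lr_at_epoch(epoch, base_lr=0.02, milestones=(8, 11), gamma=0.1):
    """Learning rate in effect during a 0-indexed epoch under step decay."""
    # Count how many milestones have already been reached.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


gpus, images_per_gpu = 8, 2
print("total batch size:", gpus * images_per_gpu)
for epoch in range(12):
    print(epoch, f"{lr_at_epoch(epoch):.4f}")
```

With 8 GPUs and 2 images each, the total batch size is 16. If you train with fewer GPUs, a common practice is to scale the learning rate in proportion to the total batch size.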
5. Reproduction code
The code is available on GitHub: https://github.com/RooKichenn/AC-FPN