FcaNet: Frequency Channel Attention Networks

PyTorch implementation of the paper "FcaNet: Frequency Channel Attention Networks".

Simplest usage

Models pretrained on ImageNet can be loaded directly through torch.hub, without any configuration or installation:

import torch

model = torch.hub.load('cfzd/FcaNet', 'fca34', pretrained=True)
model = torch.hub.load('cfzd/FcaNet', 'fca50', pretrained=True)
model = torch.hub.load('cfzd/FcaNet', 'fca101', pretrained=True)
model = torch.hub.load('cfzd/FcaNet', 'fca152', pretrained=True)

Install

Please see INSTALL.md

Models

Classification models on ImageNet

Because the models were trained in FP16 and the provided checkpoints are FP32, the evaluation results differ slightly from the reported results (at most -0.06%/+0.05%).

| Model | Reported | Evaluation Results | Link |
|---|---|---|---|
| FcaNet34 | 75.07 | 75.02 | GoogleDrive/BaiduDrive(code:m7v8) |
| FcaNet50 | 78.52 | 78.57 | GoogleDrive/BaiduDrive(code:mgkk) |
| FcaNet101 | 79.64 | 79.63 | GoogleDrive/BaiduDrive(code:8t0j) |
| FcaNet152 | 80.08 | 80.02 | GoogleDrive/BaiduDrive(code:5yeq) |
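
The stated deviation range can be checked directly against the four rows above. The snippet below is only an arithmetic sanity check; the dictionary and variable names are illustrative.

```python
# Reported vs. re-evaluated ImageNet accuracy (%), copied from the table above.
results = {
    "FcaNet34": (75.07, 75.02),
    "FcaNet50": (78.52, 78.57),
    "FcaNet101": (79.64, 79.63),
    "FcaNet152": (80.08, 80.02),
}

# Difference = evaluated - reported, rounded to two decimals to hide float noise.
deltas = {name: round(evaluated - reported, 2)
          for name, (reported, evaluated) in results.items()}

largest_drop = min(deltas.values())   # most negative difference
largest_gain = max(deltas.values())   # most positive difference

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
print(f"range: {largest_drop:+.2f} / {largest_gain:+.2f}")
```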

Detection and instance segmentation models on COCO

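| Model | Backbone | AP | AP50 | AP75 | Link |
|---|---|---|---|---|---|
| Faster RCNN | FcaNet50 | 39.0 | 61.1 | 42.3 | GoogleDrive/BaiduDrive(code:q15c) |
| Faster RCNN | FcaNet101 | 41.2 | 63.3 | 44.6 | GoogleDrive/BaiduDrive(code:pgnx) |
| Mask RCNN | Fca50 det | 40.3 | 62.0 | 44.1 | GoogleDrive/BaiduDrive(code:d9rn) |
| Mask RCNN | Fca50 seg | 36.2 | 58.6 | 38.1 | GoogleDrive/BaiduDrive(code:d9rn) |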

Training

Please see launch_training_classification.sh and launch_training_detection.sh for training on ImageNet and COCO, respectively.

Testing

Please see launch_eval_classification.sh and launch_eval_detection.sh for testing on ImageNet and COCO, respectively.

FAQ

Since the paper was uploaded to arXiv, many peers have asked us: the proposed DCT basis can be viewed as a simple tensor, so why not learn that tensor directly? Why use DCT instead of a learnable tensor? Couldn't a learnable tensor outperform DCT?

Our answer, backed by the experiments below: the fixed DCT basis performs better than the learnable alternatives, counter-intuitive as that may be.

| Method | ImageNet Top-1 Acc | Link |
|---|---|---|
| Learnable tensor, random initialization | 77.914 | GoogleDrive/BaiduDrive(code:p2hl) |
| Learnable tensor, DCT initialization | 78.352 | GoogleDrive/BaiduDrive(code:txje) |
| Fixed tensor, random initialization | 77.742 | GoogleDrive/BaiduDrive(code:g5t9) |
| Fixed tensor, DCT initialization (Ours) | 78.574 | GoogleDrive/BaiduDrive(code:mgkk) |

To verify these results, select the corresponding type of tensor at L73-L83 of model/layer.py, uncomment it, and train the whole network.
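
To make the "DCT basis as a tensor" idea concrete, here is a minimal pure-Python sketch of a single fixed 2D DCT filter. It assumes the standard orthonormal DCT-II basis; the function names `dct_basis_1d`, `dct_filter_2d`, and `apply_filter` are illustrative and are not taken from model/layer.py, whose exact normalization may differ.

```python
import math

def dct_basis_1d(pos, freq, length):
    """Value of the orthonormal DCT-II basis function `freq` at position `pos`."""
    value = math.cos(math.pi * freq * (pos + 0.5) / length) / math.sqrt(length)
    # The zero-frequency (DC) term uses a smaller scale so the basis stays orthonormal.
    return value if freq == 0 else value * math.sqrt(2)

def dct_filter_2d(height, width, freq_h, freq_w):
    """Fixed height x width weight map for one 2D frequency component."""
    return [[dct_basis_1d(i, freq_h, height) * dct_basis_1d(j, freq_w, width)
             for j in range(width)]
            for i in range(height)]

def apply_filter(feature_map, weights):
    """Weighted sum of one channel's feature map: a single DCT coefficient."""
    return sum(f * w
               for f_row, w_row in zip(feature_map, weights)
               for f, w in zip(f_row, w_row))

# Frequency (0, 0) gives a constant filter, so its coefficient is
# proportional to global average pooling (GAP).
dc = dct_filter_2d(7, 7, 0, 0)
ones = [[1.0] * 7 for _ in range(7)]
print(apply_filter(ones, dc))  # sum of 49 weights, each about 1/7
```

In the terms of the table above, the "fixed tensor, DCT initialization" variant keeps such weights constant during training, whereas the learnable variants would update them by gradient descent.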

TODO

  • Object detection models
  • Instance segmentation models
  • Fix the incorrect results of detection models
  • Make switching between configs easier
Comments
  • About the performance on cifar10 or cifar100.

    Thanks for your work!!

    Have you tried using FcaNet to train classification tasks on CIFAR-10 or CIFAR-100? If so, what frequency component settings did you use?

    opened by NNNNAI 14
  • How should self.dct_h and self.dct_w be set?

    The MultiSpectralAttentionLayer class contains the following part:

    if h != self.dct_h or w != self.dct_w:
        x_pooled = torch.nn.functional.adaptive_avg_pool2d(x, (self.dct_h, self.dct_w))
        # If you have concerns about one-line-change, don't worry. :)
        # In the ImageNet models, this line will never be triggered.
        # This is for compatibility in instance segmentation and object detection.

    If my task is object detection, how should I set self.dct_h and self.dct_w?

    opened by XFR1998 6
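
The quoted branch pools any feature map whose size differs from (dct_h, dct_w) down to that size before the DCT weights are applied. The pure-Python sketch below illustrates that fallback on a single channel. `adaptive_avg_pool2d_py` and `pool_if_needed` are hypothetical stand-ins written for illustration (the quoted code calls torch.nn.functional.adaptive_avg_pool2d), and the 7x7 target is only an example value, not a recommendation from the authors.

```python
def adaptive_avg_pool2d_py(x, out_h, out_w):
    """Average-pool a 2D list to out_h x out_w using adaptive bin boundaries."""
    in_h, in_w = len(x), len(x[0])
    pooled = []
    for i in range(out_h):
        r0 = (i * in_h) // out_h                 # floor(i * in_h / out_h)
        r1 = -((-(i + 1) * in_h) // out_h)       # ceil((i + 1) * in_h / out_h)
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = -((-(j + 1) * in_w) // out_w)
            window = [x[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled

def pool_if_needed(x, dct_h, dct_w):
    """Mirror the quoted check: pool only when the spatial size differs."""
    h, w = len(x), len(x[0])
    if h != dct_h or w != dct_w:
        return adaptive_avg_pool2d_py(x, dct_h, dct_w)
    return x

# A 20x20 feature map is reduced to 7x7; a 7x7 map would pass through unchanged.
feature = [[float(r * 20 + c) for c in range(20)] for r in range(20)]
pooled = pool_if_needed(feature, 7, 7)
print(len(pooled), len(pooled[0]))  # 7 7
```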
  • 2d dct FLOPs computing method

    Hi, I noticed that in your paper you computed the FLOPs of the FcaNet models.

    How did you compute the FLOPs of the 2D DCT? Could you provide your formula or code?

    Thanks!

    opened by TianhaoFu 5
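
One way to estimate this cost, offered as a rough sketch rather than the authors' actual accounting: if the 2D DCT for one frequency component per channel is implemented as an elementwise multiplication with a fixed H x W weight map followed by a spatial sum, each channel costs H*W multiplications and H*W - 1 additions. The function name `dct_layer_ops` and the example shape are assumptions made for illustration.

```python
def dct_layer_ops(channels, height, width):
    """Count multiplications and additions for a per-channel weighted spatial sum."""
    mults = channels * height * width            # one multiply per element
    adds = channels * (height * width - 1)       # summing H*W products per channel
    return {"mults": mults, "adds": adds, "total": mults + adds}

# Example: 256 channels on a 14x14 feature map (illustrative shape).
ops = dct_layer_ops(256, 14, 14)
print(ops)
```

By the same count, global average pooling needs the H*W - 1 additions plus one division per channel, so under this assumption the extra cost over GAP is roughly the C*H*W multiplications.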
  • What's the difference between FcaBottleneck and FcaBasicBlock?

    In your code, the expansion of FcaBottleneck is 4 and that of FcaBasicBlock is 1, and FcaBottleneck has one more convolution layer than FcaBasicBlock. How should I choose which module to use?

    opened by meiguoofa 3
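  • About channel grouping

    Hello, I am a deep learning beginner. Adding two FCA modules improved the mIoU of my original model by 2.3, which is a great result. However, I have some other thoughts about channel grouping.

    If the grouped channels represent different information and each group then uses a different frequency component, this seems likely to cause more information loss. DCT can be viewed as a weighted sum, and the paper shows that, apart from GAP, which treats every pixel of a channel equally, all other components pay more attention to one or a few spatial regions. This is clearly biased, and it also seems to explain why GAP performs best in the single-frequency-component experiments. In that case, might grouping the channels cause even more information loss?

    After thinking it over carefully, I believe FCA works mainly because of channel redundancy and a kind of "complementarity" formed by the DCT weighting. Because channels are redundant, some groups may carry similar information, and the weights of these groups are "complementary": for example, one weight matrix focuses more on the left half and another on the right half. The module seems to work better when it learns this kind of "sparse" relationship. One could say that FCA makes fuller use of redundant channels than SE.

    I have considered two experiments to test this. First, progressively reduce the number of input channels and compare FCA with the original model or with SE. Once the channels are reduced far enough, the information is no longer so redundant, a large amount of information should be lost, and the accuracy should fall below that of the original model. Second, for frequency component selection, choose "symmetric" and "complementary" weight matrices instead of selecting by the performance of single frequency components, and remove the "chaotic" weight matrices, since the single-frequency-component experiments show that such chaotic weights do not work as well as simple block-like ones.

    In addition, one could use large groups for large channel counts and small groups for small channel counts, to check whether this gives better performance.

    I may not have fully expressed what I mean. If there are any errors, please point them out!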

    opened by Asthestarsfalll 2
  • Some problems encountered when running your model

    Hello, I really like your idea, so I tried to run your classification model. After downloading the ImageNet2012 dataset, I tried to launch your model and ran into the following problem. Could you tell me whether some of my settings are wrong?

    The error message is as follows:

    Traceback (most recent call last):
      File "main.py", line 643, in <module>
        main()
      File "main.py", line 389, in main
        avg_train_time = train(train_loader, model, criterion, optimizer, epoch, logger, scheduler)
      File "main.py", line 471, in train
        prec1, prec5 = accuracy(output.data, target, topk=(1, 5))
      File "main.py", line 631, in accuracy
        correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
    RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

    opened by LihuiNb 2
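
The error message itself points to the fix: `correct[:k]` is a slice of a tensor derived from a transposed prediction matrix and may not be contiguous, so `.view(-1)` can fail where `.reshape(-1)` succeeds. The sketch below is a typical top-k accuracy helper written that way. It is modeled on the call shown in the traceback and is not copied from the repository's main.py.

```python
import torch

def accuracy(output, target, topk=(1,)):
    """Return the top-k accuracy (in percent) for each k in topk."""
    maxk = max(topk)
    batch_size = target.size(0)

    # Indices of the maxk highest-scoring classes, shape (batch, maxk).
    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    pred = pred.t()  # shape (maxk, batch); the transpose is not contiguous
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        # reshape() copies when needed, so it also works on non-contiguous slices.
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res

# Tiny example: two samples, three classes.
logits = torch.tensor([[0.1, 0.9, 0.0], [0.8, 0.05, 0.15]])
labels = torch.tensor([1, 2])
top1, top2 = accuracy(logits, labels, topk=(1, 2))
print(top1.item(), top2.item())
```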
  • selecting frequency components

    Hi, how did you select the frequency components shown in Figure 6? I want to select 1, 3, 6, or 10 frequencies in zigzag order, as in zigzag DCT.

    I would also like to know the meaning of the numbers in layer.py.

    num_freq = int(method[3:])
    if 'top' in method:
        all_top_indices_x = [0,0,6,0,0,1,1,4,5,1,3,0,0,0,3,2,4,6,3,5,5,2,6,5,5,3,3,4,2,2,6,1]
        all_top_indices_y = [0,1,0,5,2,0,2,0,0,6,0,4,6,3,5,2,6,3,3,3,5,1,1,2,4,2,1,1,3,0,5,3]
        mapper_x = all_top_indices_x[:num_freq]
        mapper_y = all_top_indices_y[:num_freq]
    elif 'low' in method:
        all_low_indices_x = [0,0,1,1,0,2,2,1,2,0,3,4,0,1,3,0,1,2,3,4,5,0,1,2,3,4,5,6,1,2,3,4]
        all_low_indices_y = [0,1,0,1,2,0,1,2,2,3,0,0,4,3,1,5,4,3,2,1,0,6,5,4,3,2,1,0,6,5,4,3]
        mapper_x = all_low_indices_x[:num_freq]
        mapper_y = all_low_indices_y[:num_freq]
    elif 'bot' in method:
        all_bot_indices_x = [6,1,3,3,2,4,1,2,4,4,5,1,4,6,2,5,6,1,6,2,2,4,3,3,5,5,6,2,5,5,3,6]
        all_bot_indices_y = [6,4,4,6,6,3,1,4,4,5,6,5,2,2,5,1,4,3,5,0,3,1,1,2,4,2,1,1,5,3,3,3]
        mapper_x = all_bot_indices_x[:num_freq]
        mapper_y = all_bot_indices_y[:num_freq]
    else:
        raise NotImplementedError
    return mapper_x, mapper_y
    
    opened by InukKang 1
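
In the quoted code, each frequency component is a pair (mapper_x[i], mapper_y[i]), and a method string such as 'top16' keeps the first 16 pairs of the chosen list. For a zigzag-style choice of 1, 3, 6, or 10 components, one option is to generate the index lists yourself. The helper `zigzag_indices` below is hypothetical (it is not part of layer.py) and assumes a 7x7 frequency grid, matching the 0-6 range of the indices above.

```python
def zigzag_indices(num_freq, grid=7):
    """Return (mapper_x, mapper_y) for the first num_freq positions in zigzag order."""
    order = []
    for d in range(2 * grid - 1):                 # d = x + y indexes an anti-diagonal
        lo, hi = max(0, d - grid + 1), min(d, grid - 1)
        xs = range(lo, hi + 1)
        if d % 2 == 0:                            # alternate direction on each diagonal
            xs = reversed(xs)
        order.extend((x, d - x) for x in xs)
    if not 1 <= num_freq <= len(order):
        raise ValueError("num_freq must be between 1 and grid * grid")
    chosen = order[:num_freq]
    mapper_x = [x for x, _ in chosen]
    mapper_y = [y for _, y in chosen]
    return mapper_x, mapper_y

# 1, 3, 6 and 10 components cover complete anti-diagonals of the grid.
for n in (1, 3, 6, 10):
    print(n, zigzag_indices(n))
```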
  • Inconsistent argument order

    In layer.py, class MultiSpectralAttentionLayer(torch.nn.Module) contains:

    self.dct_layer = MultiSpectralDCTLayer(dct_h, dct_w, mapper_x, mapper_y, channel)

    Here dct_h comes first and dct_w second, that is, h before w. But class MultiSpectralDCTLayer(nn.Module) defines:

    def __init__(self, width, height, mapper_x, mapper_y, channel):

    Here width comes first and height second, that is, w before h. Is there a reason for this? I'm confused.

    opened by desertfex 1
  • dct_h and dct_w

    How can I set dct_h and dct_w if I want to add an FCA layer to another model? The feature maps at the layers where I want to insert the FCA layer are 160x160, 80x80, 40x40, and 20x20.

    Please advise.

    opened by myasser63 5
  • How is 'bot' selected in the code, and what does it mean?

    elif 'bot' in method:
        all_bot_indices_x = [6,1,3,3,2,4,1,2,4,4,5,1,4,6,2,5,6,1,6,2,2,4,3,3,5,5,6,2,5,5,3,6]
        all_bot_indices_y = [6,4,4,6,6,3,1,4,4,5,6,5,2,2,5,1,4,3,5,0,3,1,1,2,4,2,1,1,5,3,3,3]
    
    opened by Liutingjin 1