Many Class Activation Map methods implemented in Pytorch for CNNs and Vision Transformers. Including Grad-CAM, Grad-CAM++, Score-CAM, Ablation-CAM and XGrad-CAM

Overview

Class Activation Map methods implemented in Pytorch

pip install grad-cam

Tested on many Common CNN Networks and Vision Transformers.

Includes smoothing methods to make the CAMs look nice.

Full support for batches of images in all methods.

visualization

Method What it does
GradCAM Weight the 2D activations by the average gradient
GradCAM++ Like GradCAM but uses second order gradients
XGradCAM Like GradCAM but scale the gradients by the normalized activations
AblationCAM Zero out activations and measure how the output drops (this repository includes a fast batched implementation)
ScoreCAM Perbutate the image by the scaled activations and measure how the output drops
EigenCAM Takes the first principle component of the 2D Activations (no class discrimination, but seems to give great results)
EigenGradCAM Like EigenCAM but with class discrimination: First principle component of Activations*Grad. Looks like GradCAM, but cleaner
LayerCAM Spatially weight the activations by positive gradients. Works better especially in lower layers

What makes the network think the image label is 'pug, pug-dog' and 'tabby, tabby cat':

Dog Cat

Combining Grad-CAM with Guided Backpropagation for the 'pug, pug-dog' class:

Combined

More Visual Examples

Resnet50:

Category Image GradCAM AblationCAM ScoreCAM
Dog
Cat

Vision Transfomer (Deit Tiny):

Category Image GradCAM AblationCAM ScoreCAM
Dog
Cat

Swin Transfomer (Tiny window:7 patch:4 input-size:224):

Category Image GradCAM AblationCAM ScoreCAM
Dog
Cat

It seems that GradCAM++ is almost the same as GradCAM, in most networks except VGG where the advantage is larger.

Network Image GradCAM GradCAM++ Score-CAM Ablation-CAM Eigen-CAM
VGG16
Resnet50

Chosing the Target Layer

You need to choose the target layer to compute CAM for. Some common choices are:

  • Resnet18 and 50: model.layer4[-1]
  • VGG and densenet161: model.features[-1]
  • mnasnet1_0: model.layers[-1]
  • ViT: model.blocks[-1].norm1
  • SwinT: model.layers[-1].blocks[-1].norm1

Using from code as a library

from pytorch_grad_cam import GradCAM, ScoreCAM, GradCAMPlusPlus, AblationCAM, XGradCAM, EigenCAM
from pytorch_grad_cam.utils.image import show_cam_on_image
from torchvision.models import resnet50

model = resnet50(pretrained=True)
target_layer = model.layer4[-1]
input_tensor = # Create an input tensor image for your model..
# Note: input_tensor can be a batch tensor with several images!

# Construct the CAM object once, and then re-use it on many images:
cam = GradCAM(model=model, target_layer=target_layer, use_cuda=args.use_cuda)

# If target_category is None, the highest scoring category
# will be used for every image in the batch.
# target_category can also be an integer, or a list of different integers
# for every image in the batch.
target_category = 281

# You can also pass aug_smooth=True and eigen_smooth=True, to apply smoothing.
grayscale_cam = cam(input_tensor=input_tensor, target_category=target_category)

# In this example grayscale_cam has only one image in the batch:
grayscale_cam = grayscale_cam[0, :]
visualization = show_cam_on_image(rgb_img, grayscale_cam)

Smoothing to get nice looking CAMs

To reduce noise in the CAMs, and make it fit better on the objects, two smoothing methods are supported:

  • aug_smooth=True

    Test time augmentation: increases the run time by x6.

    Applies a combination of horizontal flips, and mutiplying the image by [1.0, 1.1, 0.9].

    This has the effect of better centering the CAM around the objects.

  • eigen_smooth=True

    First principle component of activations*weights

    This has the effect of removing a lot of noise.

AblationCAM aug smooth eigen smooth aug+eigen smooth

Running the example script:

Usage: python cam.py --image-path <path_to_image> --method <method>

To use with CUDA: python cam.py --image-path <path_to_image> --use-cuda


You can choose between:

GradCAM , ScoreCAM, GradCAMPlusPlus, AblationCAM, XGradCAM , LayerCAM and EigenCAM.

Some methods like ScoreCAM and AblationCAM require a large number of forward passes, and have a batched implementation.

You can control the batch size with cam.batch_size =


How does it work with Vision Transformers

See usage_examples/vit_example.py

In ViT the output of the layers are typically BATCH x 197 x 192. In the dimension with 197, the first element represents the class token, and the rest represent the 14x14 patches in the image. We can treat the last 196 elements as a 14x14 spatial image, with 192 channels.

To reshape the activations and gradients to 2D spatial images, we can pass the CAM constructor a reshape_transform function.

This can also be a starting point for other architectures that will come in the future.

GradCAM(model=model, target_layer=target_layer, reshape_transform=reshape_transform)

def reshape_transform(tensor, height=14, width=14):
    result = tensor[:, 1 :  , :].reshape(tensor.size(0),
        height, width, tensor.size(2))

    # Bring the channels to the first dimension,
    # like in CNNs.
    result = result.transpose(2, 3).transpose(1, 2)
    return result

Which target_layer should we chose for Vision Transformers?

Since the final classification is done on the class token computed in the last attention block, the output will not be affected by the 14x14 channels in the last layer. The gradient of the output with respect to them, will be 0!

We should chose any layer before the final attention block, for example:

target_layer = model.blocks[-1].norm1

How does it work with Swin Transformers

See usage_examples/swinT_example.py

In Swin transformer base the output of the layers are typically BATCH x 49 x 1024. We can treat the last 49 elements as a 7x7 spatial image, with 1024 channels.

To reshape the activations and gradients to 2D spatial images, we can pass the CAM constructor a reshape_transform function.

This can also be a starting point for other architectures that will come in the future.

GradCAM(model=model, target_layer=target_layer, reshape_transform=reshape_transform)

def reshape_transform(tensor, height=7, width=7):
    result = tensor.reshape(tensor.size(0),
        height, width, tensor.size(2))

    # Bring the channels to the first dimension,
    # like in CNNs.
    result = result.transpose(2, 3).transpose(1, 2)
    return result

Which target_layer should we chose for Swin Transformers?

Since the swin transformer is different from ViT, it does not contains cls_token as present in ViT, therefore we will use all the 7x7 images we get from the last block of the last layer.

We should chose any layer before the final attention block, for example:

target_layer = model.layers[-1].blocks[-1].norm1

Citation

If you use this for research, please cite. Here is an example BibTeX entry:

@misc{jacobgilpytorchcam,
  title={PyTorch library for CAM methods},
  author={Jacob Gildenblat and contributors},
  year={2021},
  publisher={GitHub},
  howpublished={\url{https://github.com/jacobgil/pytorch-grad-cam}},
}

References

https://arxiv.org/abs/1610.02391
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra

https://arxiv.org/abs/1710.11063
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks Aditya Chattopadhyay, Anirban Sarkar, Prantik Howlader, Vineeth N Balasubramanian

https://arxiv.org/abs/1910.01279
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks Haofan Wang, Zifan Wang, Mengnan Du, Fan Yang, Zijian Zhang, Sirui Ding, Piotr Mardziel, Xia Hu

https://ieeexplore.ieee.org/abstract/document/9093360/
Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization. Saurabh Desai and Harish G Ramaswamy. In WACV, pages 972–980, 2020

https://arxiv.org/abs/2008.02312
Axiom-based Grad-CAM: Towards Accurate Visualization and Explanation of CNNs Ruigang Fu, Qingyong Hu, Xiaohu Dong, Yulan Guo, Yinghui Gao, Biao Li

https://arxiv.org/abs/2008.00299
Eigen-CAM: Class Activation Map using Principal Components Mohammed Bany Muhammad, Mohammed Yeasin

http://mftp.mmcheng.net/Papers/21TIP_LayerCAM.pdf
LayerCAM: Exploring Hierarchical Class Activation Maps for Localization Peng-Tao Jiang; Chang-Bin Zhang; Qibin Hou; Ming-Ming Cheng; Yunchao Wei

Owner
Jacob Gildenblat
Machine learning / Computer Vision.
Jacob Gildenblat
Official PyTorch Implementation of "Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous Graphs". NeurIPS 2020.

Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous Graphs This repository is the implementation of SELAR. Dasol Hwang* , Jinyoung Pa

MLV Lab (Machine Learning and Vision Lab at Korea University) 48 Nov 09, 2022
DNA sequence classification by Deep Neural Network

DNA sequence classification by Deep Neural Network: Project Overview worked on the DNA sequence classification problem where the input is the DNA sequ

Mohammed Jawwadul Islam Fida 0 Aug 02, 2022
[CVPR'21] DeepSurfels: Learning Online Appearance Fusion

DeepSurfels: Learning Online Appearance Fusion Paper | Video | Project Page This is the official implementation of the CVPR 2021 submission DeepSurfel

Online Reconstruction 52 Nov 14, 2022
neural image generation

pixray Pixray is an image generation system. It combines previous ideas including: Perception Engines which uses image augmentation and iteratively op

dribnet 398 Dec 17, 2022
Anonymize BLM Protest Images

Anonymize BLM Protest Images This repository automates @BLMPrivacyBot, a Twitter bot that shows the anonymized images to help keep protesters safe. Us

Stanford Machine Learning Group 40 Oct 13, 2022
RLMeta is a light-weight flexible framework for Distributed Reinforcement Learning Research.

RLMeta rlmeta - a flexible lightweight research framework for Distributed Reinforcement Learning based on PyTorch and moolib Installation To build fro

Meta Research 281 Dec 22, 2022
A unet implementation for Image semantic segmentation

Unet-pytorch a unet implementation for Image semantic segmentation 参考网上的Unet做分割的代码,做了一个针对kaggle地盐识别的,请去以下地址获取数据集: https://www.kaggle.com/c/tgs-salt-id

Rabbit 3 Jun 29, 2022
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning, CVPR 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning By Zhenda Xie*, Yutong Lin*, Zheng Zhang, Yue Ca

Zhenda Xie 293 Dec 20, 2022
Tracking Pipeline helps you to solve the tracking problem more easily

Tracking_Pipeline Tracking_Pipeline helps you to solve the tracking problem more easily I integrate detection algorithms like: Yolov5, Yolov4, YoloX,

VNOpenAI 32 Dec 21, 2022
Denoising Diffusion Implicit Models

Denoising Diffusion Implicit Models (DDIM) Jiaming Song, Chenlin Meng and Stefano Ermon, Stanford Implements sampling from an implicit model that is t

465 Jan 05, 2023
The code for our paper "AutoSF: Searching Scoring Functions for Knowledge Graph Embedding"

AutoSF The code for our paper "AutoSF: Searching Scoring Functions for Knowledge Graph Embedding" and this paper has been accepted by ICDE2020. News:

AutoML Research 64 Dec 17, 2022
AAI supports interdisciplinary research to help better understand human, animal, and artificial cognition.

AnimalAI 3 AAI supports interdisciplinary research to help better understand human, animal, and artificial cognition. It aims to support AI research t

Matthew Crosby 58 Dec 12, 2022
Out of Distribution Detection on Natural Adversarial Examples

OOD-on-NAE Research project on out of distribution detection for the Computer Vision course by Prof. Rob Fergus (CSCI-GA 2271) Paper out on arXiv - ht

Anugya 1 Jun 08, 2022
for a paper about leveraging discourse markers for training new models

TSLM-DISCOURSE-MARKERS Scope This repository contains: (1) Code to extract discourse markers from wikipedia (TSA). (1) Code to extract significant dis

International Business Machines 6 Nov 02, 2022
Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation Introduction This is a PyTorch

XMed-Lab 30 Sep 23, 2022
Code for "Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance" at NeurIPS 2021

Finding Regions of Heterogeneity in Decision-Making via Expected Conditional Covariance Justin Lim, Christina X Ji, Michael Oberst, Saul Blecker, Leor

Sontag Lab 3 Feb 03, 2022
A list of awesome PyTorch scholarship articles, guides, blogs, courses and other resources.

Awesome PyTorch Scholarship Resources A collection of awesome PyTorch and Python learning resources. Contributions are always welcome! Course Informat

Arnas Gečas 302 Dec 03, 2022
ATAC: Adversarially Trained Actor Critic

ATAC: Adversarially Trained Actor Critic Adversarially Trained Actor Critic for Offline Reinforcement Learning by Ching-An Cheng*, Tengyang Xie*, Nan

Microsoft 41 Dec 08, 2022
基于DouZero定制AI实战欢乐斗地主

DouZero_For_Happy_DouDiZhu: 将DouZero用于欢乐斗地主实战 本项目基于DouZero 环境配置请移步项目DouZero 模型默认为WP,更换模型请修改start.py中的模型路径 运行main.py即可 SL (baselines/sl/): 基于人类数据进行深度学习

1.5k Jan 08, 2023