Implementation of Axial attention - attending to multi-dimensional data efficiently

Overview

Axial Attention

PyPI version

Implementation of Axial attention in Pytorch. A simple but powerful technique to attend to multi-dimensional data efficiently. It has worked wonders for me and many other researchers.

Simply add some positional encoding to your data and pass it into this handy class, specifying which dimension is considered the embedding, and how many axial dimensions to rotate through. All the permutating, reshaping, will be taken care of for you.

This paper was actually rejected on the basis of being too simple. And yet, it has since been used successfully in a number of applications, among those weather prediction, all-attention image segmentation. Just goes to show.

Install

$ pip install axial_attention

Usage

Image

import torch
from axial_attention import AxialAttention

img = torch.randn(1, 3, 256, 256)

attn = AxialAttention(
    dim = 3,               # embedding dimension
    dim_index = 1,         # where is the embedding dimension
    dim_heads = 32,        # dimension of each head. defaults to dim // heads if not supplied
    heads = 1,             # number of heads for multi-head attention
    num_dimensions = 2,    # number of axial dimensions (images is 2, video is 3, or more)
    sum_axial_out = True   # whether to sum the contributions of attention on each axis, or to run the input through them sequentially. defaults to true
)

attn(img) # (1, 3, 256, 256)

Channel-last image latents

import torch
from axial_attention import AxialAttention

img = torch.randn(1, 20, 20, 512)

attn = AxialAttention(
    dim = 512,           # embedding dimension
    dim_index = -1,      # where is the embedding dimension
    heads = 8,           # number of heads for multi-head attention
    num_dimensions = 2,  # number of axial dimensions (images is 2, video is 3, or more)
)

attn(img) # (1, 20, 20 ,512)

Video

import torch
from axial_attention import AxialAttention

video = torch.randn(1, 5, 128, 256, 256)

attn = AxialAttention(
    dim = 128,           # embedding dimension
    dim_index = 2,       # where is the embedding dimension
    heads = 8,           # number of heads for multi-head attention
    num_dimensions = 3,  # number of axial dimensions (images is 2, video is 3, or more)
)

attn(video) # (1, 5, 128, 256, 256)

Image Transformer, with reversible network

import torch
from torch import nn
from axial_attention import AxialImageTransformer

conv1x1 = nn.Conv2d(3, 128, 1)

transformer = AxialImageTransformer(
    dim = 128,
    depth = 12,
    reversible = True
)

img = torch.randn(1, 3, 512, 512)

transformer(conv1x1(img)) # (1, 3, 512, 512)

With axial positional embedding

import torch
from axial_attention import AxialAttention, AxialPositionalEmbedding

img = torch.randn(1, 512, 20, 20)

attn = AxialAttention(
    dim = 512,
    heads = 8,
    dim_index = 1
)

pos_emb = AxialPositionalEmbedding(
    dim = 512,
    shape = (20, 20)
)

img = pos_emb(img)  # (1, 512, 20, 20)  - now positionally embedded
img = attn(img)     # (1, 512, 20, 20)

Citation

@misc{ho2019axial,
    title  = {Axial Attention in Multidimensional Transformers},
    author = {Jonathan Ho and Nal Kalchbrenner and Dirk Weissenborn and Tim Salimans},
    year   = {2019},
    archivePrefix = {arXiv}
}
@misc{wang2020axialdeeplab,
    title   = {Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation},
    author  = {Huiyu Wang and Yukun Zhu and Bradley Green and Hartwig Adam and Alan Yuille and Liang-Chieh Chen},
    year    = {2020},
    eprint  = {2003.07853},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
@inproceedings{huang2019ccnet,
    title   = {Ccnet: Criss-cross attention for semantic segmentation},
    author  = {Huang, Zilong and Wang, Xinggang and Huang, Lichao and Huang, Chang and Wei, Yunchao and Liu, Wenyu},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
    pages   = {603--612},
    year    = {2019}
}
Comments
  • Reimplementation of image modeling results in AXIAL ATTENTION IN MULTIDIMENSIONAL TRANSFORMERS.

    Reimplementation of image modeling results in AXIAL ATTENTION IN MULTIDIMENSIONAL TRANSFORMERS.

    Hi, this is a nice paper. How can I use your shared code to reimplement the image modeling task on ImageNet 32x32?

    Thanks. Looking forward to your reply.

    opened by liujiaheng 3
  • AxialPositionalEmbedding

    AxialPositionalEmbedding

    Would you be able to provide an example of how to add the positional encoding with the AxialPositionalEmbedding class or explain what the emb_dim, emb_dim_index, and dimensions arguments are specifically? Thanks for the repo!

    opened by dansola 2
  • Problem of ParameterList with nn.DataParallel

    Problem of ParameterList with nn.DataParallel

    https://github.com/lucidrains/axial-attention/blob/a1a483c0f4a3922eef8f9a857dc1a802523bd437/axial_attention/axial_attention.py#L100

    This line would lead to the following issue: "UserWarning: nn.ParameterList is being used with DataParallel but this is not supported. This list will appear empty for the models replicated on each GPU except the original one."

    It is a known issue here

    The simple solution should be to store the Parameters directly on the Module.

    class AxialPositionalEmbedding(nn.Module):
        def __init__(self, dim, shape, emb_dim_index = 1):
            super().__init__()
            parameters = []
            total_dimensions = len(shape) + 2
            ax_dim_indexes = [i for i in range(1, total_dimensions) if i != emb_dim_index]
            
            for i, (axial_dim, axial_dim_index) in enumerate(zip(shape, ax_dim_indexes)):
                shape = [1] * total_dimensions
                shape[emb_dim_index] = dim
                shape[axial_dim_index] = axial_dim
                parameter = nn.Parameter(torch.randn(*shape))
                setattr(self, f'param_{i}', parameter)
                setattr(self, f'param_num', i+1)
    
        def forward(self, x):
            for i in range(self.param_num):
                x = x + getattr(self, f'param_{i}')
            return x
    
    opened by resuly 1
  • Positional embeddings for different image sizes

    Positional embeddings for different image sizes

    Hi, once again thanks for your great work! Since I want to use the axial attention with positional embedding for unknown image sizes (But I know the max size), I was wondering if you think that changing https://github.com/lucidrains/axial-attention/blob/master/axial_attention/axial_attention.py#L104 to

    for cnt, param in enumerate(self.params):
        x = x + param[([slice(None)] * (cnt + 2) + [slice(x.shape[cnt + 2])])]
    

    does the right thing. I can now do this

    v = AxialImageTransformer(64, depth = 1, axial_pos_emb_shape = (64,64), dim_index = 1)       
    t1 = torch.randn(2, 64, 17, 16)
    t2 = torch.randn(2, 64, 13, 18)
    t3 = torch.randn(2, 64, 64, 64)
    print(v(t1).shape)
    print(v(t2).shape)
    print(v(t3).shape)
    Output:
    torch.Size([2, 64, 17, 16])
    torch.Size([2, 64, 13, 18])
    torch.Size([2, 64, 64, 64])
    

    I think that makes it easier to integrate it in fully convolutional nets for multi scale training.

    opened by PhilippMarquardt 1
  • User Warning: Mixed memory format inputs detected

    User Warning: Mixed memory format inputs detected

    At site-packages/axial_attention/axial_attention.py:176: UserWarning: Mixed memory format inputs detected while calling the operator. The operator will output contiguous tensor even if some of the inputs are in channels_last format. ( Triggered internally at /opt/conda/conda-bld/pytorch_1595629427286/work/aten/src/ATen/native/TensorIterator.cpp:918.) return sum(map(lambda axial_attn: axial_attn(x), self.axial_attentions))

    I am using latest axial_attention (v0.4) and Pytorch 1.6.0

    Code:

    import torch
    from axial_attention import AxialAttention
    
    img = torch.randn(1, 24, 64, 64)
    
    attn = AxialAttention(
        dim = 24,               # embedding dimension
        dim_index = 1,         # where is the embedding dimension
        dim_heads = 32,        # dimension of each head. defaults to dim // heads if not supplied
        heads = 8,             # number of heads for multi-head attention
        num_dimensions = 2,    # number of axial dimensions (images is 2, video is 3, or more)
        sum_axial_out = True   # whether to sum the contributions of attention on each axis, or to run the input through them sequentially. defaults to true
    )
    
    out= attn(img) 
    
    

    Will it affect trainings and inference?

    opened by lokeshkvn 1
  • Examples for image sequence/video

    Examples for image sequence/video

    Hello, Do you have examples of integrating this on image sequences? I am trying to get rid of ConvLSTM's for encoding sequence of images and AxialAttention may be a good starting point. Do you have an exmaple/notebook that I could look to integrate this on my type of data? Thank you for this amazing work. Thomas

    opened by tcapelle 1
  • Ask a question

    Ask a question

    I'm interested to your excellent work,but I'm new to pytorch,can I ask a question where is the start position in the code that i will understand whole project from it ?Thx for your reply

    opened by meiguoofa 0
  • Hi, I have a problem

    Hi, I have a problem

    import torch from axial_attention import AxialAttention

    img = torch.randn(1, 3, 256, 256)

    attn = AxialAttention( dim = 3, # embedding dimension dim_index = 1, # where is the embedding dimension dim_heads = 32, # dimension of each head. defaults to dim // heads if not supplied heads = 1, # number of heads for multi-head attention num_dimensions = 2, # number of axial dimensions (images is 2, video is 3, or more) sum_axial_out = True # whether to sum the contributions of attention on each axis, or to run the input through them sequentially. defaults to true )

    attn(img) # (1, 3, 256, 256)

    Thanks for your great project, I want to ask if my image is one channel image will influence the num_dimensions value?

    opened by meiguoofa 0
  • Extracting attention maps

    Extracting attention maps

    Hi there,

    Excellent project!

    I'm using axial-attention with video (1, 5, 128, 256, 256) and sum_axial_out=True, and I wish to visualise the attention maps.

    Essentially, given my video, and two frame indices frame_a_idx and frame_b_idx, I need to extract the attention map over frame_b to a chosen pixel (x, y) in frame_a (after the axial sum).

    My understanding is that I should be able to reshape the dots (after softmax) according to the permutations in calculate_permutations, then sum these permuted dots together to form a final attention score tensor of an accessible shape, thus ready for visualisation.

    I am slightly stuck due to the numerous axial permutations and shape mismatches. What I am doing is as follows:

    In SelfAttention.forward():

    dots_reshaped = dots.reshape(b, h, t, t)
    return out, dots_reshaped
    

    In PermuteToFrom.forward():

     # attention
    axial, dots = self.fn(axial, **kwargs)
    
    # restore to original shape and permutation
    axial = axial.reshape(*shape)
    axial = axial.permute(*self.inv_permutation).contiguous()
    dots = dots.reshape(*shape[:3], *dots.shape[1:])
    

    However, I am unsure of how to un-permute the dots appropriately such that all resulting โ€œaxesโ€ (of different sizes) can be summed. If you have suggestions or code for doing so, it would be very much appreciated, thanks!

    opened by vibrant-galaxy 3
Releases(0.6.1)
Owner
Phil Wang
Working with Attention. It's all we need
Phil Wang
Codes for paper "KNAS: Green Neural Architecture Search"

KNAS Codes for paper "KNAS: Green Neural Architecture Search" KNAS is a green (energy-efficient) Neural Architecture Search (NAS) approach. It contain

90 Dec 22, 2022
OBBDetection is a oriented object detection library, which is based on MMdetection.

OBBDetection news: We are now updating OBBDetection to new vision based on MMdetection v2.10, which has more advanced models and more efficient featur

jbwang1997 401 Jan 02, 2023
Repository containing the PhD Thesis "Formal Verification of Deep Reinforcement Learning Agents"

Getting Started This repository contains the code used for the following publications: Probabilistic Guarantees for Safe Deep Reinforcement Learning (

Edoardo Bacci 5 Aug 31, 2022
RP-GAN: Stable GAN Training with Random Projections

RP-GAN: Stable GAN Training with Random Projections This repository contains a reference implementation of the algorithm described in the paper: Behna

Ayan Chakrabarti 20 Sep 18, 2021
Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems

AequeVox Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems README under development. Python Packages Required

Sai Sathiesh 2 Aug 28, 2022
Python library for tracking human heads with FLAME (a 3D morphable head model)

Video Head Tracker 3D tracking library for human heads based on FLAME (a 3D morphable head model). The tracking algorithm is inspired by face2face. It

61 Dec 25, 2022
Objax Apache-2Objax (๐Ÿฅ‰19 ยท โญ 580) - Objax is a machine learning framework that provides an Object.. Apache-2 jax

Objax Tutorials | Install | Documentation | Philosophy This is not an officially supported Google product. Objax is an open source machine learning fr

Google 729 Jan 02, 2023
Simple and Distributed Machine Learning

Synapse Machine Learning SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. Sy

Microsoft 3.9k Dec 30, 2022
Implementation of Invariant Point Attention, used for coordinate refinement in the structure module of Alphafold2, as a standalone Pytorch module

Invariant Point Attention - Pytorch Implementation of Invariant Point Attention as a standalone module, which was used in the structure module of Alph

Phil Wang 113 Jan 05, 2023
Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Implementation of temporal pooling methods studied in [ICIP'20] A Comparative Evaluation Of Temporal Pooling Methods For Blind Video Quality Assessment

Zhengzhong Tu 5 Sep 16, 2022
Code for our CVPR2021 paper coordinate attention

Coordinate Attention for Efficient Mobile Network Design (preprint) This repository is a PyTorch implementation of our coordinate attention (will appe

Qibin (Andrew) Hou 726 Jan 05, 2023
DEEPAGร‰: Answering Questions in Portuguese about the Brazilian Environment

DEEPAGร‰: Answering Questions in Portuguese about the Brazilian Environment This repository is related to the paper DEEPAGร‰: Answering Questions in Por

0 Dec 10, 2021
Unofficial pytorch implementation of the paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution"

DFSA Unofficial pytorch implementation of the ICCV 2021 paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution" (p

2 Nov 15, 2021
This is an example of object detection on Micro bacterium tuberculosis using Mask-RCNN

Mask-RCNN on Mycobacterium tuberculosis This is an example of object detection on Mycobacterium Tuberculosis using Mask RCNN. Implement of Mask R-CNN

Jun-En Ding 1 Sep 16, 2021
A curated list of programmatic weak supervision papers and resources

A curated list of programmatic weak supervision papers and resources

Jieyu Zhang 118 Jan 02, 2023
๊ณต๊ณต์žฅ์†Œ์—์„œ ๋ˆˆ๋งŒ ๋Œ๋ฆฌ๋ฉด CCTV๊ฐ€ ๋ณด์ธ๋‹ค๋Š” ๋ง์ด ๊ณผ์–ธ์ด ์•„๋‹ ์ •๋„๋กœ CCTV๊ฐ€ ์šฐ๋ฆฌ ์ƒํ™œ์— ๊นŠ์ˆ™์ด ์ž๋ฆฌ ์žก์•˜์Šต๋‹ˆ๋‹ค.

ObsCare_Main ์†Œ๊ฐœ ๊ณต๊ณต์žฅ์†Œ์—์„œ ๋ˆˆ๋งŒ ๋Œ๋ฆฌ๋ฉด CCTV๊ฐ€ ๋ณด์ธ๋‹ค๋Š” ๋ง์ด ๊ณผ์–ธ์ด ์•„๋‹ ์ •๋„๋กœ CCTV๊ฐ€ ์šฐ๋ฆฌ ์ƒํ™œ์— ๊นŠ์ˆ™์ด ์ž๋ฆฌ ์žก์•˜์Šต๋‹ˆ๋‹ค. CCTV์˜ ๋Œ€์ˆ˜๊ฐ€ ๊ธ‰๊ฒฉํžˆ ๋Š˜์–ด๋‚˜๋ฉด์„œ ๊ด€๋ฆฌ์™€ ํšจ์œจ์„ฑ ๋ฌธ์ œ์™€ ๋”๋ถˆ์–ด, ๊ณณ๊ณณ์— ์„ค์น˜๋œ CCTV๋ฅผ ๊ฐœ๋ณ„ ๊ด€์ œํ•˜๋Š” ๊ฒƒ์œผ๋กœ๋Š” ์‘๊ธ‰ ์ƒ

5 Jul 07, 2022
[ICLR'21] FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

FedBN: Federated Learning on Non-IID Features via Local Batch Normalization This is the PyTorch implemention of our paper FedBN: Federated Learning on

<a href=[email protected]"> 156 Dec 15, 2022
library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

NLopt is a library for nonlinear local and global optimization, for functions with and without gradient information. It is designed as a simple, unifi

Steven G. Johnson 1.4k Dec 25, 2022
Official Implementation for the "An Empirical Investigation of 3D Anomaly Detection and Segmentation" paper.

An Empirical Investigation of 3D Anomaly Detection and Segmentation Project | Paper Official PyTorch Implementation for the "An Empirical Investigatio

Eliahu Horwitz 55 Dec 14, 2022
Python based Advanced AI Assistant

Knick is a virtual artificial intelligence project, fully developed in python. The objective of this project is to develop a virtual assistant that can handle our minor, intermediate as well as heavy

19 Nov 15, 2022