Implementation of the HMAX model of vision in PyTorch

Overview

PyTorch implementation of HMAX

PyTorch implementation of the HMAX model that closely follows that of the MATLAB implementation of The Laboratory for Computational Cognitive Neuroscience:

http://maxlab.neuro.georgetown.edu/hmax.html

The S and C units of the HMAX model can almost be mapped directly onto TorchVision's Conv2d and MaxPool2d layers, where channels are used to store the filters for different orientations. However, HMAX also implements multiple scales, which doesn't map nicely onto the existing TorchVision functionality. Therefore, each scale has its own Conv2d layer, which are executed in parallel.

Here is a schematic overview of the network architecture:

layers consisting of units with increasing scale
S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1 S1
 \ /   \ /   \ /   \ /   \ /   \ /   \ /   \ /
  C1    C1    C1    C1    C1    C1    C1    C1
   \     \     \    |     /     /     /     /
           ALL-TO-ALL CONNECTIVITY
   /     /     /    |     \     \     \     \
  S2    S2    S2    S2    S2    S2    S2    S2
   |     |     |     |     |     |     |     |
  C2    C2    C2    C2    C2    C2    C2    C2

Installation

This script depends on the NumPy, SciPy, PyTorch and TorchVision packages.

Clone the repository somewhere and run the example.py script:

git clone https://github.com/wmvanvliet/pytorch_hmax
python example.py

Usage

See the example.py script on how to run the model on 10 example images.

You might also like...
Pytorch implementation of
Pytorch implementation of "Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet"

Token Labeling: Training an 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet (arxiv) This is a Pytorch implementation of our te

This repository contains a pytorch implementation of
This repository contains a pytorch implementation of "StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision".

StereoPIFu: Depth Aware Clothed Human Digitization via Stereo Vision | Project Page | Paper | This repository contains a pytorch implementation of "St

PyTorch implementation of
PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021)

mlp-mixer-pytorch PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021) Usage import torch from mlp_mixer

Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.
Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.

Less is More: Pay Less Attention in Vision Transformers Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers. By

A PyTorch Implementation of ViT (Vision Transformer)
A PyTorch Implementation of ViT (Vision Transformer)

ViT - Vision Transformer This is an implementation of ViT - Vision Transformer by Google Research Team through the paper "An Image is Worth 16x16 Word

Pytorch implementation of the DeepDream computer vision algorithm
Pytorch implementation of the DeepDream computer vision algorithm

deep-dream-in-pytorch Pytorch (https://github.com/pytorch/pytorch) implementation of the deep dream (https://en.wikipedia.org/wiki/DeepDream) computer

A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.
A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers.

ViTGAN: Training GANs with Vision Transformers A PyTorch implementation of ViTGAN based on paper ViTGAN: Training GANs with Vision Transformers. Refer

Unofficial PyTorch implementation of MobileViT based on paper
Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

MobileViT RegNet Unofficial PyTorch implementation of MobileViT based on paper MOBILEVIT: LIGHT-WEIGHT, GENERAL-PURPOSE, AND MOBILE-FRIENDLY VISION TR

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners This repository is built upon BEiT, thanks very much! Now, we on

Comments
  • Provide direct (not nested) path to stimuli

    Provide direct (not nested) path to stimuli

    Hi,

    great repo and effort. I really admire your courage to write HMAX in python. I have a question about loading data in, namely about this part of the code: https://github.com/wmvanvliet/pytorch_hmax/blob/master/example.py#L18

    I know that by default, the ImageFolder expects to have nested folders (as stated in docs or mentioned in this issue) but it's quite clumsy in this case. Eg even if you look at your example, having subfolders for each photo just doesn't look good. Would you have a way how to go around this? Any suggestion on how to provide only a path to all images and not this nested path? I was reading some discussions but haven't figured out how to implement it.


    One more question (I didn't want to open an extra issue for that), shouldn't in https://github.com/wmvanvliet/pytorch_hmax/blob/master/example.py#L28 be batch_size=len(images)) instead of batch_size=10 (written symbolically)?

    Thanks.

    opened by jankaWIS 5
  • Input of non-square images fails

    Input of non-square images fails

    Hi again, I was playing a bit around and discovered that it fails for non-square dimensional images, i.e. where height != width. Maybe I was looking wrong or missed something, but I haven't found it mentioned anywhere and the docs kind of suggests that it can be any height and any width. The same goes for the description of the layers (e.g. s1). In the other issue, you mentioned that

    One thing you may want to add to this transformer pipeline is a transforms.Resize followed by a transforms.CenterCrop to ensure all images end up having the same height and width

    but didn't mention why. Why is it not possible for non-square images? Is there any workaround if one doesn't want to crop? Maybe to pad like in this post*?

    To demonstrate the issue:

    import os
    import torch
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms
    import pickle
    
    import hmax
    
    path_hmax = './'
    # Initialize the model with the universal patch set
    print('Constructing model')
    model = hmax.HMAX(os.path.join(path_hmax,'universal_patch_set.mat'))
    
    # A folder with example images
    example_images = datasets.ImageFolder(
        os.path.join(path_hmax,'example_images'),
        transform=transforms.Compose([
            transforms.Resize((400, 500)),
            transforms.CenterCrop((400, 500)),
            transforms.Grayscale(),
            transforms.ToTensor(),
            transforms.Lambda(lambda x: x * 255),
        ])
    )
    
    # A dataloader that will run through all example images in one batch
    dataloader = DataLoader(example_images, batch_size=10)
    
    # Determine whether there is a compatible GPU available
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    
    # Run the model on the example images
    print('Running model on', device)
    model = model.to(device)
    for X, y in dataloader:
        s1, c1, s2, c2 = model.get_all_layers(X.to(device))
    
    print('[done]')
    

    will give an error in the forward function:

    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    [<ipython-input-7-a6bab15d9571>](https://localhost:8080/#) in <module>()
         33 model = model.to(device)
         34 for X, y in dataloader:
    ---> 35     s1, c1, s2, c2 = model.get_all_layers(X.to(device))
         36 
         37 # print('Saving output of all layers to: output.pkl')
    
    4 frames
    [/gdrive/MyDrive/Colab Notebooks/data_HMAX/pytorch_hmax/hmax.py](https://localhost:8080/#) in forward(self, c1_outputs)
        285             conv_output = conv_output.view(
        286                 -1, self.num_orientations, self.num_patches, conv_output_size,
    --> 287                 conv_output_size)
        288 
        289             # Pool over orientations
    
    RuntimeError: shape '[-1, 4, 400, 126, 126]' is invalid for input of size 203616000
    

    *Code for that:

    import torchvision.transforms.functional as F
    
    class SquarePad:
        def __call__(self, image):
            max_wh = max(image.size)
            p_left, p_top = [(max_wh - s) // 2 for s in image.size]
            p_right, p_bottom = [max_wh - (s+pad) for s, pad in zip(image.size, [p_left, p_top])]
            padding = (p_left, p_top, p_right, p_bottom)
            return F.pad(image, padding, 0, 'constant')
    
    target_image_size = (224, 224)  # as an example
    # now use it as the replacement of transforms.Pad class
    transform=transforms.Compose([
        SquarePad(),
        transforms.Resize(target_image_size),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    
    opened by jankaWIS 1
Releases(v0.2)
  • v0.2(Jul 7, 2022)

    For this version, I've modified the HMAX code a bit to exactly match that of the original MATLAB code of Maximilian Riesenhuber. This is a bit slower and consumes a bit more memory, as the code needs to work around some subtle differences between the MATLAB and PyTorch functions. Perhaps in the future, we could add an "optimized" model that is allowed to deviate from the reference implementation for increased efficiency, but for now I feel it is more important to follow the reference implementation to the letter.

    Major change: default C2 activation function is now 'euclidean' instead of 'gaussian'.

    Source code(tar.gz)
    Source code(zip)
  • v0.1(Jul 7, 2022)

Owner
Marijn van Vliet
Research Software Engineer.
Marijn van Vliet
Code & Data for the Paper "Time Masking for Temporal Language Models", WSDM 2022

Time Masking for Temporal Language Models This repository provides a reference implementation of the paper: Time Masking for Temporal Language Models

Guy Rosin 12 Jan 06, 2023
Python library for computer vision labeling tasks. The core functionality is to translate bounding box annotations between different formats-for example, from coco to yolo.

PyLabel pip install pylabel PyLabel is a Python package to help you prepare image datasets for computer vision models including PyTorch and YOLOv5. I

PyLabel Project 176 Jan 01, 2023
Train a deep learning net with OpenStreetMap features and satellite imagery.

DeepOSM Classify roads and features in satellite imagery, by training neural networks with OpenStreetMap (OSM) data. DeepOSM can: Download a chunk of

TrailBehind, Inc. 1.3k Nov 24, 2022
This is our ARTS test set, an enriched test set to probe Aspect Robustness of ABSA.

This is the repository for our 2020 paper "Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis". Data We provide

35 Nov 16, 2022
Repository for "Space-Time Correspondence as a Contrastive Random Walk" (NeurIPS 2020)

Space-Time Correspondence as a Contrastive Random Walk This is the repository for Space-Time Correspondence as a Contrastive Random Walk, published at

A. Jabri 239 Dec 27, 2022
git《USD-Seg:Learning Universal Shape Dictionary for Realtime Instance Segmentation》(2020) GitHub: [fig2]

USD-Seg This project is an implement of paper USD-Seg:Learning Universal Shape Dictionary for Realtime Instance Segmentation, based on FCOS detector f

Ruolin Ye 80 Nov 28, 2022
FcaNet: Frequency Channel Attention Networks

FcaNet: Frequency Channel Attention Networks PyTorch implementation of the paper "FcaNet: Frequency Channel Attention Networks". Simplest usage Models

327 Dec 27, 2022
QuanTaichi evaluation suite

QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021) Yuanming Hu, Jiafeng Liu, Xuanda Yang, Mingkuan Xu, Ye Kuang, Weiwei Xu, Qiang Dai, W

Taichi Developers 120 Jan 04, 2023
New AidForBlind - Various Libraries used like OpenCV and other mentioned in Requirements.txt

AidForBlind Recommended PyCharm IDE Various Libraries used like OpenCV and other

Aalhad Chandewar 1 Jan 13, 2022
PyTorch code for the paper "Curriculum Graph Co-Teaching for Multi-target Domain Adaptation" (CVPR2021)

PyTorch code for the paper "Curriculum Graph Co-Teaching for Multi-target Domain Adaptation" (CVPR2021) This repo presents PyTorch implementation of M

Evgeny 79 Dec 19, 2022
This repository contains the needed resources to build the HIRID-ICU-Benchmark dataset

HiRID-ICU-Benchmark This repository contains the needed resources to build the HIRID-ICU-Benchmark dataset for which the manuscript can be found here.

Biomedical Informatics at ETH Zurich 30 Dec 16, 2022
Quick program made to generate alpha and delta tables for Hidden Markov Models

HMM_Calc Functions for generating Alpha and Delta tables from a Hidden Markov Model. Parameters: a: Matrix of transition probabilities. a[i][j] = a_{i

Adem Odza 1 Dec 04, 2021
Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

LibraNet This repository includes the official implementation of LibraNet for crowd counting, presented in our paper: Weighing Counts: Sequential Crow

Hao Lu 18 Nov 05, 2022
Plover-tapey-tape: an alternative to Plover’s built-in paper tape

plover-tapey-tape plover-tapey-tape is an alternative to Plover’s built-in paper

7 May 29, 2022
U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

Dennis Bappert 104 Nov 25, 2022
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

AugMix Introduction We propose AugMix, a data processing technique that mixes augmented images and enforces consistent embeddings of the augmented ima

Google Research 876 Dec 17, 2022
Learning to See by Looking at Noise

Learning to See by Looking at Noise This is the official implementation of Learning to See by Looking at Noise. In this work, we investigate a suite o

Manel Baradad Jurjo 82 Dec 24, 2022
AI pipelines for Nvidia Jetson Platform

Jetson Multicamera Pipelines Easy-to-use realtime CV/AI pipelines for Nvidia Jetson Platform. This project: Builds a typical multi-camera pipeline, i.

NVIDIA AI IOT 96 Dec 23, 2022
Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296

Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions This repo contains the dataset and code for the paper Benchmarking Ro

Jiachen Sun 168 Dec 29, 2022
Implementation of the paper "Fine-Tuning Transformers: Vocabulary Transfer"

Transformer-vocabulary-transfer Implementation of the paper "Fine-Tuning Transfo

LEYA 13 Nov 30, 2022