PyTorch implementation of TailCalibX : Feature Generation for Long-tail Classification

Last update: Jan 02, 2023

Overview

TailCalibX : Feature Generation for Long-tail Classification

by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi

🐣 Easy Usage (Recommended way to use our method)
- 💻 Installation
- 👨‍💻 Example Code
🧪 Advanced Usage
🏋️‍♂️ Trained weights
🪀 Results on a Toy Dataset
🌴 Directory Tree
📃 Citation
👁 Contributing
❤ About me
✨ Extras
📝 License

🐣 Easy Usage (Recommended way to use our method)

⚠ Caution: TailCalibX is simply TailCalib applied repeatedly: a fresh set of features is generated once per epoch and used to train the classifier. To mimic this, perform the following three steps, in order, at every epoch:

1. Collect all the features from your dataloader.
2. Use the tailcalib package to balance the features by generating samples.
3. Train the classifier.

Then repeat for the next epoch.
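
The per-epoch cycle described above can be sketched as follows. This is an illustration under stated assumptions, not the training code from this repo: run_epochs, collect_features, train_classifier and balance_by_duplication are hypothetical names, and the duplication-based balancer is only a stand-in (it is not TailCalib) so that the sketch runs without the tailcalib package installed.

```python
import numpy as np


def balance_by_duplication(X, y, rng):
    """Stand-in balancer (NOT TailCalib): oversample every class to the
    size of the largest class by duplicating existing feature rows."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    feats, labels = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = rng.choice(np.flatnonzero(y == c), size=target - n, replace=True)
            feats.append(X[idx])
            labels.append(y[idx])
    return np.concatenate(feats), np.concatenate(labels)


def run_epochs(collect_features, balance, train_classifier, num_epochs):
    """Repeat the collect -> balance -> train cycle once per epoch."""
    history = []
    for _ in range(num_epochs):
        X, y = collect_features()                        # 1. collect features
        X_bal, y_bal = balance(X, y)                     # 2. regenerate a balanced set
        history.append(train_classifier(X_bal, y_bal))   # 3. train the classifier
    return history


# Toy long-tailed features: 40 / 15 / 5 samples for classes 0 / 1 / 2.
rng = np.random.default_rng(0)
X = rng.random((60, 8))
y = np.repeat([0, 1, 2], [40, 15, 5])


def collect():
    # Placeholder for a pass over the dataloader with the feature extractor.
    return X, y


def train(X_bal, y_bal):
    # Placeholder for one epoch of classifier training; returns class counts.
    return np.unique(y_bal, return_counts=True)[1]


history = run_epochs(
    collect, lambda X, y: balance_by_duplication(X, y, rng), train, num_epochs=3
)
print(history[0])
```

To use tailcalib instead of the stand-in, pass a callable that wraps the generate call from the Example Code section, for instance lambda X, y: a.generate(X=X, y=y)[:2], assuming the first two return values are the balanced features and labels as in that example.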

💻 Installation

Use the package manager pip to install tailcalib.

pip install tailcalib

👨‍💻 Example Code

See the instructions here for more detailed information about the Python package.

# Import
from tailcalib import tailcalib

# Initialize
a = tailcalib(base_engine="numpy")   # Options: "numpy", "pytorch"

# Imbalanced random fake data
import numpy as np
X = np.random.rand(200,100)
y = np.random.randint(0,10, (200,))

# Balancing the data using "tailcalib"
feat, lab, gen = a.generate(X=X, y=y)

# Output comparison
print(f"Before: {np.unique(y, return_counts=True)}")
print(f"After: {np.unique(lab, return_counts=True)}")

🧪 Advanced Usage

✔ Things to do before you run the code from this repo

Change the data_root for your dataset in main.py.
If you are using wandb logging (Weights & Biases), make sure to change the wandb.init in main.py accordingly.

📀 How to use?

For just the methods proposed in this paper :
- For CIFAR100-LT: run_TailCalibX_CIFAR100-LT.sh
- For mini-ImageNet-LT : run_TailCalibX_mini-imagenet-LT.sh
For all the results shown in the paper :
- For CIFAR100-LT: run_all_CIFAR100-LT.sh
- For mini-ImageNet-LT : run_all_mini-ImageNet-LT.sh

📚 How to create the mini-ImageNet-LT dataset?

Check Notebooks/Create_mini-ImageNet-LT.ipynb for the script that generates the mini-ImageNet-LT dataset with varying imbalance ratios and train-test-val splits.

⚙ Arguments

--seed : Random seed, fixed for reproducibility.
- Default : 1
--gpu : Select the GPUs to be used.
- Default : "0,1,2,3"
--experiment : Experiment number (Check 'libs/utils/experiments_maker.py').
- Default : 0.1
--dataset : Dataset number.
- Choices : 0 - CIFAR100, 1 - mini-imagenet
- Default : 0
--imbalance : Select Imbalance factor.
- Choices : 0: 1, 1: 100, 2: 50, 3: 10
- Default : 1
--type_of_val : Choose which dataset split to use.
- Choices: "vt": val_from_test, "vtr": val_from_train, "vit": val_is_test
- Default : "vit"
--cv1 to --cv9 : Custom variable to use in experiments - purpose changes according to the experiment.
- Default : "1"
--train : Run training sequence
- Default : False
--generate : Run generation sequence
- Default : False
--retraining : Run retraining sequence
- Default : False
--resume : Resume from 'latest_model_checkpoint.pth' and, if applicable, from the corresponding wandb run.
- Default : False
--save_features : Collect feature representations.
- Default : False
--save_features_phase : Dataset split of representations to collect.
- Choices : "train", "val", "test"
- Default : "train"
--config : Path to a YAML file with an appropriate config. If provided, it overrides the 'experiment_maker'.
- Default : None
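
The options listed above map naturally onto an argparse parser. The following is a minimal sketch, not the actual main.py from this repo: the argument types, the use of store_true for the boolean switches, and the names build_parser, DATASETS and IMBALANCE_FACTORS are assumptions made for illustration. Only the flag names, choices and defaults are taken from the list above.

```python
import argparse

# Index -> meaning, as documented for --dataset and --imbalance.
DATASETS = {0: "CIFAR100", 1: "mini-imagenet"}
IMBALANCE_FACTORS = {0: 1, 1: 100, 2: 50, 3: 10}


def build_parser():
    """Build a parser mirroring the documented flags (types are assumed)."""
    p = argparse.ArgumentParser(description="Sketch of the documented options")
    p.add_argument("--seed", type=int, default=1)
    p.add_argument("--gpu", type=str, default="0,1,2,3")
    p.add_argument("--experiment", type=float, default=0.1)
    p.add_argument("--dataset", type=int, default=0, choices=sorted(DATASETS))
    p.add_argument("--imbalance", type=int, default=1,
                   choices=sorted(IMBALANCE_FACTORS))
    p.add_argument("--type_of_val", type=str, default="vit",
                   choices=["vt", "vtr", "vit"])
    # --cv1 ... --cv9: free-form custom variables, default "1".
    for i in range(1, 10):
        p.add_argument(f"--cv{i}", type=str, default="1")
    # Boolean switches that default to False.
    for flag in ("train", "generate", "retraining", "resume", "save_features"):
        p.add_argument(f"--{flag}", action="store_true")
    p.add_argument("--save_features_phase", type=str, default="train",
                   choices=["train", "val", "test"])
    p.add_argument("--config", type=str, default=None)
    return p


# Example: train on mini-imagenet with imbalance factor 100.
args = build_parser().parse_args(["--dataset", "1", "--imbalance", "1", "--train"])
print(DATASETS[args.dataset], IMBALANCE_FACTORS[args.imbalance], args.train)
```

Note that --imbalance takes an index rather than the factor itself, so the default value 1 corresponds to an imbalance factor of 100.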

🏋️‍♂️ Trained weights

Experiment	CIFAR100-LT (ResNet32, seed 1, Imb 100)	mini-ImageNet-LT (ResNeXt50)
TailCalib	Git-LFS	Git-LFS
TailCalibX	Git-LFS	Git-LFS
CBD + TailCalibX	Git-LFS	Git-LFS

🪀 Results on a Toy Dataset

The higher the Imb ratio, the more imbalanced the dataset is. Imb ratio = maximum_sample_count / minimum_sample_count.
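
As a small worked example of the definition above, the Imb ratio can be computed from a list of labels. The function name imbalance_ratio and the toy label counts are assumptions chosen for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return maximum_sample_count / minimum_sample_count over the classes."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy labels: 500, 50 and 5 samples for classes 0, 1 and 2.
labels = [0] * 500 + [1] * 50 + [2] * 5
ratio = imbalance_ratio(labels)
print(ratio)  # 500 / 5 = 100.0
```

A perfectly balanced dataset gives a ratio of 1.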

See Notebooks/toy_example.ipynb to play with the toy example from which the plot (readme_assets/toy_example_output.svg) was generated.

🌴 Directory Tree

TailCalibX
├── libs
│   ├── core
│   │   ├── ce.py
│   │   ├── core_base.py
│   │   ├── ecbd.py
│   │   ├── modals.py
│   │   ├── TailCalib.py
│   │   └── TailCalibX.py
│   ├── data
│   │   ├── dataloader.py
│   │   ├── ImbalanceCIFAR.py
│   │   └── mini-imagenet
│   │       ├── 0.01_test.txt
│   │       ├── 0.01_train.txt
│   │       └── 0.01_val.txt
│   ├── loss
│   │   ├── CosineDistill.py
│   │   └── SoftmaxLoss.py
│   ├── models
│   │   ├── CosineDotProductClassifier.py
│   │   ├── DotProductClassifier.py
│   │   ├── ecbd_converter.py
│   │   ├── ResNet32Feature.py
│   │   ├── ResNext50Feature.py
│   │   └── ResNextFeature.py
│   ├── samplers
│   │   └── ClassAwareSampler.py
│   └── utils
│       ├── Default_config.yaml
│       ├── experiments_maker.py
│       ├── globals.py
│       ├── logger.py
│       └── utils.py
├── LICENSE
├── main.py
├── Notebooks
│   ├── Create_mini-ImageNet-LT.ipynb
│   └── toy_example.ipynb
├── readme_assets
│   ├── method.svg
│   └── toy_example_output.svg
├── README.md
├── run_all_CIFAR100-LT.sh
├── run_all_mini-ImageNet-LT.sh
├── run_TailCalibX_CIFAR100-LT.sh
└── run_TailCalibX_mini-imagenet-LT.sh

The tailcalib_pip directory is omitted from the tree above, as it only holds the tailcalib pip package.

📃 Citation

@inproceedings{rahul2021tailcalibX,
    title   = {{Feature Generation for Long-tail Classification}},
    author  = {Rahul Vigneswaran and Marc T. Law and Vineeth N. Balasubramanian and Makarand Tapaswi},
    booktitle = {ICVGIP},
    year = {2021}
}

👁 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

❤ About me

Rahul Vigneswaran

✨ Extras

🐝 Long-tail buzz : If you are interested in deep learning research that involves long-tailed / imbalanced datasets, take a look at Long-tail buzz to learn about recent trending papers in this field.

📝 License

MIT
