High-quality implementations of standard and SOTA methods on a variety of tasks.

Overview

Uncertainty Baselines

Tests

The goal of Uncertainty Baselines is to provide a template for researchers to build on. The baselines can be a starting point for any new ideas, applications, and/or for communicating with other uncertainty and robustness researchers. This is done in three ways:

  1. Provide high-quality implementations of standard and state-of-the-art methods on standard tasks.
  2. Have minimal dependencies on other files in the codebase. Baselines should be easily forkable without relying on other baselines and generic modules.
  3. Prescribe best practices for uncertainty and robustness benchmarking.

Motivation. There are many uncertainty and robustness implementations across GitHub. However, they are typically one-off experiments for a specific paper (many papers don't even have code). There are no clear examples that uncertainty researchers can build on to quickly prototype their work. Everyone must implement their own baseline. In fact, even on standard tasks, every project differs slightly in their experiment setup, whether it be architectures, hyperparameters, or data preprocessing. This makes it difficult to compare properly against baselines.

Installation

To install the latest development version, run

pip install "git+https://github.com/google/uncertainty-baselines.git#egg=uncertainty_baselines"

There is not yet a stable version (nor an official release of this library). All APIs are subject to change. Installing uncertainty_baselines does not automatically install any backend. For TensorFlow, you will need to install TensorFlow ( tensorflow or tf-nightly), TensorFlow Addons (tensorflow- addons or tfa-nightly), and TensorBoard (tensorboard or tb-nightly). See the extra dependencies one can install in setup.py.

Usage

Baselines

The baselines/ directory includes all the baselines, organized by their training dataset. For example, baselines/cifar/determinstic.py is a Wide ResNet 28-10 obtaining 96.0% test accuracy on CIFAR-10.

Launching with TPUs. You often need TPUs to reproduce baselines. There are three options:

  1. Colab. Colab offers free TPUs. This is the most convenient and budget-friendly option. You can experiment with a baseline by copying its script and running it from scratch. This works well for simple experimentation. However, be careful relying on Colab long-term: TPU access isn't guaranteed, and Colab can only go so far for managing multiple long experiments.

  2. Google Cloud. This is the most flexible option. First, you'll need to create a virtual machine instance (details here).

    Here's an example to launch the BatchEnsemble baseline on CIFAR-10. We assume a few environment variables which are set up with the cloud TPU (details here).

    export BUCKET=gs://bucket-name
    export TPU_NAME=ub-cifar-batchensemble
    export DATA_DIR=$BUCKET/tensorflow_datasets
    export OUTPUT_DIR=$BUCKET/model
    
    python baselines/cifar/batchensemble.py \
        --tpu=$TPU_NAME \
        --data_dir=$DATA_DIR \
        --output_dir=$OUTPUT_DIR

    Note the TPU's accelerator type must align with the number of cores for the baseline (num_cores flag). In this example, BatchEnsemble uses a default of num_cores=8. So the TPU must be set up with accelerator_type=v3-8.

  3. Change the flags. For example, go from 8 TPU cores to 8 GPUs, or reduce the number of cores to train the baseline.

    python baselines/cifar/batchensemble.py \
        --data_dir=/tmp/tensorflow_datasets \
        --output_dir=/tmp/model \
        --use_gpu=True \
        --num_cores=8

    Results may be similar, but ultimately all bets are off. GPU vs TPU may not make much of a difference in practice, especially if you use the same numerical precision. However, changing the number of cores matters a lot. The total batch size during each training step is often determined by num_cores, so be careful!

Datasets

The ub.datasets module consists of datasets following the TensorFlow Datasets API. They add minimal logic such as default data preprocessing. Note: in ipython/colab notebook, one may need to activate tf earger execution mode tf.compat.v1.enable_eager_execution().

import uncertainty_baselines as ub

# Load CIFAR-10, holding out 10% for validation.
dataset_builder = ub.datasets.Cifar10Dataset(split='train',
                                             validation_percent=0.1)
train_dataset = dataset_builder.load(batch_size=FLAGS.batch_size)
for batch in train_dataset:
  # Apply code over batches of the data.

You can also use get to instantiate datasets from strings (e.g., commandline flags).

dataset_builder = ub.datasets.get(dataset_name, split=split, **dataset_kwargs)

To use the datasets in Jax and PyTorch:

for batch in tfds.as_numpy(ds):
  train_step(batch)

Note that tfds.as_numpy calls tensor.numpy(). This invokes an unnecessary copy compared to tensor._numpy().

for batch in iter(ds):
  train_step(jax.tree_map(lambda y: y._numpy(), batch))

Models

The ub.models module consists of models following the tf.keras.Model API.

import uncertainty_baselines as ub

model = ub.models.wide_resnet(input_shape=(32, 32, 3),
                              depth=28,
                              width_multiplier=10,
                              num_classes=10,
                              l2=1e-4)

You can also use get to instantiate models from strings (e.g., commandline flags).

model = ub.models.get(model_name, batch_size=FLAGS.batch_size)

Metrics

We define metrics used across datasets below. All results are reported by roughly 3 significant digits and averaged over 10 runs.

  1. # Parameters. Number of parameters in the model to make predictions after training.

  2. Test Accuracy. Accuracy over the test set. For a dataset of N input-output pairs (xn, yn) where the label yn takes on 1 of K values, the accuracy is

    1/N \sum_{n=1}^N 1[ \argmax{ p(yn | xn) } = yn ],

    where 1 is the indicator function that is 1 when the model's predicted class is equal to the label and 0 otherwise.

  3. Test Cal. Error. Expected calibration error (ECE) over the test set (Naeini et al., 2015). ECE discretizes the probability interval [0, 1] under equally spaced bins and assigns each predicted probability to the bin that encompasses it. The calibration error is the difference between the fraction of predictions in the bin that are correct (accuracy) and the mean of the probabilities in the bin (confidence). The expected calibration error averages across bins.

    For a dataset of N input-output pairs (xn, yn) where the label yn takes on 1 of K values, ECE computes a weighted average

    \sum_{b=1}^B n_b / N | acc(b) - conf(b) |,

    where B is the number of bins, n_b is the number of predictions in bin b, and acc(b) and conf(b) is the accuracy and confidence of bin b respectively.

  4. Test NLL. Negative log-likelihood over the test set (measured in nats). For a dataset of N input-output pairs (xn, yn), the negative log-likelihood is

    -1/N \sum_{n=1}^N \log p(yn | xn).

    It is equivalent up to a constant to the KL divergence from the true data distribution to the model, therefore capturing the overall goodness of fit to the true distribution (Murphy, 2012). It can also be intepreted as the amount of bits (nats) to explain the data (Grunwald, 2004).

  5. Train/Test Runtime. Training runtime is the total wall-clock time to train the model, including any intermediate test set evaluations. Test Runtime refers to the time it takes to run a forward pass on the GPU/TPU, i.e., the duration for which the device is not idle. Note that Test Runtime does not include time on the coordinator: this is more precise in comparing baselines because including the coordinator adds overhead in GPU/TPU scheduling and data fetching---producing high variance results.

Viewing metrics. Uncertainty Baselines writes TensorFlow summaries to the model_dir which can be consumed by TensorBoard. This includes the TensorBoard hyperparameters plugin, which can be used to analyze hyperparamter tuning sweeps.

If you wish to upload to the PUBLICLY READABLE tensorboard.dev, use:

tensorboard dev upload --logdir MODEL_DIR --plugins "scalars,graphs,hparams" --name "My experiment" --description "My experiment details"

References

If you'd like to cite Uncertainty Baselines, use the following BibTeX entry.

Z. Nado, N. Band, M. Collier, J. Djolonga, M. Dusenberry, S. Farquhar, A. Filos, M. Havasi, R. Jenatton, G. Jerfel, J. Liu, Z. Mariet, J. Nixon, S. Padhy, J. Ren, T. Rudner, Y. Wen, F. Wenzel, K. Murphy, D. Sculley, B. Lakshminarayanan, J. Snoek, Y. Gal, and D. Tran. Uncertainty Baselines: Benchmarks for uncertainty & robustness in deep learning, arXiv preprint arXiv:2106.04015, 2021.

@article{nado2021uncertainty,
  author = {Zachary Nado and Neil Band and Mark Collier and Josip Djolonga and Michael Dusenberry and Sebastian Farquhar and Angelos Filos and Marton Havasi and Rodolphe Jenatton and Ghassen Jerfel and Jeremiah Liu and Zelda Mariet and Jeremy Nixon and Shreyas Padhy and Jie Ren and Tim Rudner and Yeming Wen and Florian Wenzel and Kevin Murphy and D. Sculley and Balaji Lakshminarayanan and Jasper Snoek and Yarin Gal and Dustin Tran},
  title = {{Uncertainty Baselines}:  Benchmarks for Uncertainty \& Robustness in Deep Learning},
  journal = {arXiv preprint arXiv:2106.04015},
  year = {2021},
}

Papers using Uncertainty Baselines

The following papers have used code from Uncertainty Baselines:

  1. A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection
  2. BatchEnsemble: An Alternative Approach to Efficient Ensembles and Lifelong Learning
  3. DEUP: Direct Epistemic Uncertainty Prediction
  4. Distilling Ensembles Improves Uncertainty Estimates
  5. Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors
  6. Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit
  7. Hyperparameter Ensembles for Robustness and Uncertainty Quantification
  8. Measuring Calibration in Deep Learning
  9. Measuring and Improving Model-Moderator Collaboration using Uncertainty Estimation
  10. Neural networks with late-phase weights
  11. On the Practicality of Deterministic Epistemic Uncertainty
  12. Prediction-Time Batch Normalization for Robustness under Covariate Shift
  13. Refining the variational posterior through iterative optimization
  14. Revisiting One-vs-All Classifiers for Predictive Uncertainty and Out-of-Distribution Detection in Neural Networks
  15. Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
  16. Training independent subnetworks for robust prediction

Contributing

Adding a Baseline

  1. Write a script that loads the fixed training dataset and model. Typically, this is forked from other baselines.
  2. After tuning, set the default flag values to the best hyperparameters.
  3. Add the baseline's performance to the table of results in the corresponding README.md.

Adding a Dataset

  1. Add the bibtex reference to references.md.
  2. Add the dataset definition to the datasets/ dir. Every file should have a subclass of datasets.base.BaseDataset, which at a minimum requires implementing a constructor, a tfds.core.DatasetBuilder, and _create_process_example_fn.
  3. Add a test that at a minimum constructs the dataset and checks the shapes of elements.
  4. Add the dataset to datasets/datasets.py for easy access.
  5. Add the dataset class to datasets/__init__.py.

For an example of adding a dataset, see this pull request.

Adding a Model

  1. Add the bibtex reference to references.md.

  2. Add the model definition to the models/ dir. Every file should have a create_model function with the following signature:

    def create_model(
        batch_size: int,
        ...
        **unused_kwargs: Dict[str, Any])
        -> tf.keras.models.Model:
  3. Add a test that at a minimum constructs the model and does a forward pass.

  4. Add the model to models/models.py for easy access.

  5. Add the create_model function to models/__init__.py.

Owner
Google
Google ❤️ Open Source
Google
Multi-query Video Retreival

Multi-query Video Retreival

Princeton Visual AI Lab 17 Nov 22, 2022
The Habitat-Matterport 3D Research Dataset - the largest-ever dataset of 3D indoor spaces.

Habitat-Matterport 3D Dataset (HM3D) The Habitat-Matterport 3D Research Dataset is the largest-ever dataset of 3D indoor spaces. It consists of 1,000

Meta Research 62 Dec 27, 2022
Official code repository for the publication "Latent Equilibrium: A unified learning theory for arbitrarily fast computation with arbitrarily slow neurons"

Latent Equilibrium: A unified learning theory for arbitrarily fast computation with arbitrarily slow neurons This repository contains the code to repr

Computational Neuroscience, University of Bern 3 Aug 04, 2022
QMagFace: Simple and Accurate Quality-Aware Face Recognition

Quality-Aware Face Recognition 26.11.2021 start readme QMagFace: Simple and Accurate Quality-Aware Face Recognition Research Paper Implementation - To

Philipp Terhörst 59 Jan 04, 2023
Code for "Human Pose Regression with Residual Log-likelihood Estimation", ICCV 2021 Oral

Human Pose Regression with Residual Log-likelihood Estimation [Paper] [arXiv] [Project Page] Human Pose Regression with Residual Log-likelihood Estima

JeffLi 347 Dec 24, 2022
[CVPR21] LightTrack: Finding Lightweight Neural Network for Object Tracking via One-Shot Architecture Search

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search The official implementation of the paper LightTra

Multimedia Research 290 Dec 24, 2022
FasterAI: A library to make smaller and faster models with FastAI.

Fasterai fasterai is a library created to make neural network smaller and faster. It essentially relies on common compression techniques for networks

Nathan Hubens 193 Jan 01, 2023
hipCaffe: the HIP port of Caffe

Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Cent

ROCm Software Platform 126 Dec 05, 2022
MERLOT: Multimodal Neural Script Knowledge Models

merlot MERLOT: Multimodal Neural Script Knowledge Models MERLOT is a model for learning what we are calling "neural script knowledge" -- representatio

Rowan Zellers 190 Dec 22, 2022
ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction. NeurIPS 2021.

Gengshan Yang 59 Nov 25, 2022
Code of paper: "DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks"

DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks Abstract: Adversarial training has been proven to

倪仕文 (Shiwen Ni) 58 Nov 10, 2022
A user-friendly research and development tool built to standardize RL competency assessment for custom agents and environments.

Built with ❤️ by Sam Showalter Contents Overview Installation Dependencies Usage Scripts Standard Execution Environment Development Environment Benchm

SRI-AIC 1 Nov 18, 2021
Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

alias-free-gan-pytorch Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) This implementation

Kim Seonghyeon 502 Jan 03, 2023
Official PyTorch repo for JoJoGAN: One Shot Face Stylization

JoJoGAN: One Shot Face Stylization This is the PyTorch implementation of JoJoGAN: One Shot Face Stylization. Abstract: While there have been recent ad

1.3k Dec 29, 2022
Streamlit app demonstrating an image browser for the Udacity self-driving-car dataset with realtime object detection using YOLO.

Streamlit Demo: The Udacity Self-driving Car Image Browser This project demonstrates the Udacity self-driving-car dataset and YOLO object detection in

Streamlit 992 Jan 04, 2023
A toy compiler that can convert Python scripts to pickle bytecode 🥒

Pickora 🐰 A small compiler that can convert Python scripts to pickle bytecode. Requirements Python 3.8+ No third-party modules are required. Usage us

ꌗᖘ꒒ꀤ꓄꒒ꀤꈤꍟ 68 Jan 04, 2023
A simple AI that will give you si ple task and this is made with python

Crystal-AI A simple AI that will give you si ple task and this is made with python Prerequsites: Python3.6.2 pyttsx3 pip install pyttsx3 pyaudio pip i

CrystalAnd 1 Dec 25, 2021
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)

Autoregressive Image Generation using Residual Quantization (CVPR 2022) The official implementation of "Autoregressive Image Generation using Residual

Kakao Brain 529 Dec 30, 2022
QA-GNN: Question Answering using Language Models and Knowledge Graphs

QA-GNN: Question Answering using Language Models and Knowledge Graphs This repo provides the source code & data of our paper: QA-GNN: Reasoning with L

Michihiro Yasunaga 434 Jan 04, 2023
StyleTransfer - Open source style transfer project, based on VGG19

StyleTransfer - Open source style transfer project, based on VGG19

Patrick martins de lima 9 Dec 13, 2021