Weakly Supervised End-to-End Learning (NeurIPS 2021)

Last update: Jan 06, 2023

Overview

WeaSEL: Weakly Supervised End-to-end Learning

This is a PyTorch-Lightning-based framework, based on our End-to-End Weak Supervision paper (NeurIPS 2021), that allows you to train your favorite neural network for weakly supervised classification¹:

using only multiple labeling functions (LFs)², i.e. without any labeled training data;
in an end-to-end manner, i.e. you directly train and evaluate your neural net (the end-model from here on), with no need to train a separate label model as in Snorkel and similar frameworks;
with better test set performance and greater robustness against correlated or inaccurate LFs than prior methods like Snorkel.

¹ This includes learning from crowdsourced labels or annotations!
² LFs are labeling heuristics that output noisy labels for (subsets of) the training data (e.g. crowdworkers or keyword detectors).
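
To make the notion of an LF concrete, here is a minimal sketch of three keyword-detector LFs applied to a few toy sentences. It is plain Python and does not use the WeaSEL API. The function names, the toy texts, and the convention of marking abstentions with -1 are all assumptions made for illustration.

```python
# Illustrative sketch only: these LFs and the ABSTAIN convention are
# assumptions for this example, not part of the WeaSEL API.
ABSTAIN = -1  # assumed marker for "this LF does not vote on this example"
NEGATIVE, POSITIVE = 0, 1


def lf_contains_great(text):
    """Keyword detector: vote POSITIVE if 'great' occurs, otherwise abstain."""
    return POSITIVE if "great" in text.lower() else ABSTAIN


def lf_contains_terrible(text):
    """Keyword detector: vote NEGATIVE if 'terrible' occurs, otherwise abstain."""
    return NEGATIVE if "terrible" in text.lower() else ABSTAIN


def lf_exclamation(text):
    """Noisy heuristic: treat an exclamation mark as a hint of a positive text."""
    return POSITIVE if "!" in text else ABSTAIN


LFS = [lf_contains_great, lf_contains_terrible, lf_exclamation]


def apply_lfs(texts, lfs=LFS):
    """Return the label matrix as a list of rows: one row per text, one column per LF."""
    return [[lf(text) for lf in lfs] for text in texts]


texts = [
    "A great movie!",
    "Terrible plot, terrible acting.",
    "It was fine.",
]
label_matrix = apply_lfs(texts)
for text, votes in zip(texts, label_matrix):
    print(votes, text)
```

Each row holds the noisy votes for one example, and the last row shows that an LF set may cover only a subset of the data. A label matrix of this kind, together with the input features, is what weakly supervised training starts from instead of ground-truth labels.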

Credits

The following template was extremely useful as a source of inspiration and for getting started with the PL+Hydra implementation: ashleve/lightning-hydra-template
Weasel image credits go to Rohan Chang for this Unsplash-licensed image

Getting Started

This library assumes familiarity with (multi-source) weak supervision. If that's not the case, you may want to first learn the basics from, e.g., these overview slides from Stanford or this Snorkel tutorial.

That being said, have a look at our examples and the notebooks therein, which show you how to use Weasel for your own dataset, LF set, or end-model. E.g.:

A high-level starter tutorial, with little code, many explanations, and Snorkel included as a baseline (so that if you are familiar with Snorkel you can see the similarities and differences to Weasel).
A walkthrough of the whole WeaSEL pipeline, with all details, necessary steps, and definitions for a new dataset and custom end-model. This notebook will probably teach you the most about WeaSEL and how to apply it to your own problem.
A realistic ML experiment script with everything that is part of an ML pipeline, including logging to Weights & Biases, arbitrary callbacks, and eventually retrieving your fully trained end-model.

Reproducibility

Please have a look at the research code branch, which operates on pure PyTorch.

Installation

1. New environment (recommended, but optional)

conda create --name weasel python=3.7  # or other python version >=3.7
conda activate weasel

2a: From source

python -m pip install "git+https://github.com/autonlab/weasel#egg=weasel[all]"  # quoted so shells like zsh do not expand the brackets

2b: From source, editable install

git clone https://github.com/autonlab/weasel.git
cd weasel
pip install -e ".[all]"  # quoted so shells like zsh do not expand the brackets

Minimal dependencies

Minimal dependencies, in particular not using Hydra, can be installed with

python -m pip install git+https://github.com/autonlab/weasel

The required environment can be created with conda env create -f env_gpu_minimal.yml.

If you choose this variant, you won't be able to run some of the examples. You may want to have a look at this notebook, which walks you through how to use Weasel without Hydra as the config manager.
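
If you are unsure which variant ended up in your environment, a small check like the following can help. It is only a sketch using the Python standard library. The import names `weasel` and `hydra` are assumptions about how the packages are exposed, and `is_installed` and `describe_install` are hypothetical helpers, not part of Weasel.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def describe_install(has_weasel, has_hydra):
    """Summarize which installation variant the two flags correspond to."""
    if not has_weasel:
        return "weasel not found"
    if has_hydra:
        return "weasel found, Hydra available"
    return "weasel found, minimal install without Hydra"


# "weasel" and "hydra" are assumed import names for the two packages.
print(describe_install(is_installed("weasel"), is_installed("hydra")))
```

If the check reports a minimal install, stick to the examples that do not rely on Hydra.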

Note: Weasel is under active development, so some uncovered edge cases might exist. Any feedback is very welcome!

Apply WeaSEL to your own problem

Configuration with Hydra

Optional: This template config will help you get started with your own application. An analogous config is used in this tutorial script, which you may want to check out.

Pre-defined or custom downstream models & Baselines

Please have a look at the detailed instructions in this Readme.

Using your own dataset and/or labeling heuristics

Please have a look at the detailed instructions in this Readme.

Citation

@article{cachay2021endtoend,
  author={R{\"u}hling Cachay, Salva and Boecking, Benedikt and Dubrawski, Artur},
  journal={Advances in Neural Information Processing Systems}, 
  title={End-to-End Weak Supervision},
  year={2021}
}

Owner

Auton Lab, Carnegie Mellon University
