Second-Order Neural ODE Optimizer, NeurIPS 2021 spotlight

Related tags

Deep Learningsnopt
Overview

Second-order Neural ODE Optimizer
(NeurIPS 2021 Spotlight) [arXiv]

✔️ faster convergence in wall-clock time | ✔️ O(1) memory cost |
✔️ better test-time performance | ✔️ architecture co-optimization

This repo provides PyTorch code of Second-order Neural ODE Optimizer (SNOpt), a second-order optimizer for training Neural ODEs that retains O(1) memory cost with superior convergence and test-time performance.

SNOpt result

Installation

This code is developed with Python3. PyTorch >=1.7 (we recommend 1.8.1) and torchdiffeq >= 0.2.0 are required.

  1. Install the dependencies with Anaconda and activate the environment snopt with
    conda env create --file requirements.yaml python=3
    conda activate snopt
  2. [Optional] This repo provides a modification (with 15 lines!) of torchdiffeq that allows SNOpt to collect 2nd-order information during adjoint-based training. If you wish to run torchdiffeq on other commit, simply copy-and-paste the folder to this directory then apply the provided snopt_integration.patch.
    cp -r <path_to_your_torchdiffeq_folder> .
    git apply snopt_integration.patch

Run the code

We provide example code for 8 datasets across image classification (main_img_clf.py), time-series prediction (main_time_series.py), and continuous normalizing flow (main_cnf.py). The command lines to generate similar results shown in our paper are detailed in scripts folder. Datasets will be automatically downloaded to data folder at the first call, and all results will be saved to result folder.

bash scripts/run_img_clf.sh     <dataset> # dataset can be {mnist, svhn, cifar10}
bash scripts/run_time_series.sh <dataset> # dataset can be {char-traj, art-wr, spo-ad}
bash scripts/run_cnf.sh         <dataset> # dataset can be {miniboone, gas}

For architecture (specifically integration time) co-optimization, run

bash scripts/run_img_clf.sh cifar10-t1-optimize

Integration with your workflow

snopt can be integrated flawlessly with existing training work flow. Below we provide a handy checklist and pseudo-code to help your integration. For more complex examples, please refer to main_*.py in this repo.

  • Import torchdiffeq that is patched with snopt integration; otherwise simply use torchdiffeq in this repo.
  • Inherit snopt.ODEFuncBase as your vector field; implement the forward pass in F rather than forward.
  • Create Neural ODE with ode layer(s) using snopt.ODEBlock; implement properties odes and ode_mods.
  • Initialize snopt.SNOpt as preconditioner; call train_itr_setup() and step() before standard optim.zero_grad() and optim.step() (see the code below).
  • That's it 🤓 ! Enjoy your second-order training 🚂 🚅 !
import torch
from torchdiffeq import odeint_adjoint as odesolve
from snopt import SNOpt, ODEFuncBase, ODEBlock
from easydict import EasyDict as dict

class ODEFunc(ODEFuncBase):
    def __init__(self, opt):
        super(ODEFunc, self).__init__(opt)
        self.linear = torch.nn.Linear(input_dim, input_dim)

    def F(self, t, z):
        return self.linear(z)

class NeuralODE(torch.nn.Module):
    def __init__(self, ode):
        super(NeuralODE, self).__init__()
        self.ode = ode

    def forward(self, z):
        return self.ode(z)

    @property
    def odes(self): # in case we have multiple odes, collect them in a list
        return [self.ode]

    @property
    def ode_mods(self): # modules of all ode(s)
        return [mod for mod in self.ode.odefunc.modules()]

# Create Neural ODE
opt = dict(
    optimizer='SNOpt',tol=1e-3,ode_solver='dopri5',use_adaptive_t1=False,snopt_step_size=0.01)
odefunc = ODEFunc(opt)
integration_time = torch.tensor([0.0, 1.0]).float()
ode = ODEBlock(opt, odefunc, odesolve, integration_time)
net = NeuralODE(ode)

# Create SNOpt optimizer
precond = SNOpt(net, eps=0.05, update_freq=100)
optim = torch.optim.SGD(net.parameters(), lr=0.001)

# Training loop
for (x,y) in training_loader:
    precond.train_itr_setup() # <--- additional step for precond
    optim.zero_grad()

    loss = loss_function(net(x), y)
    loss.backward()

    # Run SNOpt optimizer
    precond.step()            # <--- additional step for precond
    optim.step()

What the library actually contains

This snopt library implements the following objects for efficient 2nd-order adjoint-based training of Neural ODEs.

  • ODEFuncBase: Defines the vector field (inherits torch.nn.Module) of Neural ODE.
  • CNFFuncBase: Serves the same purposes as ODEFuncBase except for CNF applications.
  • ODEBlock: A Neural-ODE module (torch.nn.Module) that solves the initial value problem (given the vector field, integration time, and a ODE solver) and handles integration time co-optimization with feedback policy.
  • SNOpt: Our primary 2nd-order optimizer (torch.optim.Optimizer), implemented as a "preconditioner" (see example code above). It takes the following arguments.
    • net is the Neural ODE. Note that the entire network (rather than net.parameters()) is required.
    • eps is the the regularization that stabilizes preconditioning. We recommend the value in [0.05, 0.1].
    • update_freq is the frequency to refresh the 2nd-order information. We recommend the value 100~200.
    • alpha decides the running averages of eigenvalues. We recommend fixing the value to 0.75.
    • full_precond decides whether we wish to precondition layers aside from those in Neural ODEs.
  • SNOptAdjointCollector: A helper to collect information from torchdiffeq to construct 2nd-order matrices.
  • IntegrationTimeOptimizer: Our 2nd-order method that co-optimizes the integration time (i.e., t1). This is done by calling t1_train_itr_setup(train_it) and update_t1() together with optim.zero_grad() and optim.step() (see trainer.py).

The options are passed in as opt and contains the following fields (see options.py for full descriptions.)

  • optimizer is the training method. Use "SNOpt" to enable our method.
  • ode_solver specifies the ODE solver (default is "dopri5") with the absolute/relative tolerance tol.
  • For CNF applications, use divergence_type to specify how divergence should be computed.
  • snopt_step_size determines the step sizes SNOpt will sample along the integration to compute 2nd-order matrices. We recommend the value 0.01 for integration time [0,1], which yield around 100 sampled points.
  • For integration time (t1) co-optimization, enable the flag use_adaptive_t1 and setup the following options.
    • adaptive_t1 specifies t1 optimization method. Choices are "baseline" and "feedback"(ours).
    • t1_lr is the learning rate. We recommend the value in [0.05, 0.1].
    • t1_reg is the coefficient of the quadratic penalty imposed on t1. The performance is quite sensitive to this value. We recommend the value in [1e-4, 1e-3].
    • t1_update_freq is the frequency to update t1. We recommend the value 50~100.

Remarks & Citation

The current library only supports adjoint-based training, yet it can be extended to normal odeint method (stay tuned!). The pre-processing of tabular and uea datasets are adopted from ffjord and NeuralCDE, and the eigenvalue-regularized preconditioning is adopted from EKFAC-pytorch.

If you find this library useful, please cite ⬇️ . Contact me ([email protected]) if you have any questions!

@inproceedings{liu2021second,
  title={Second-order Neural ODE Optimizer},
  author={Liu, Guan-Horng and Chen, Tianrong and Theodorou, Evangelos A},
  booktitle={Advances in Neural Information Processing Systems},
  year={2021},
}
Owner
Guan-Horng Liu
CMU RI → Uber ATG → GaTech ML
Guan-Horng Liu
Using CNN to mimic the driver based on training data from Torcs

Behavioural-Cloning-in-autonomous-driving Using CNN to mimic the driver based on training data from Torcs. Approach First, the data was collected from

Sudharshan 2 Jan 05, 2022
Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation

Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation Paper Multi-Target Adversarial Frameworks for Domain Adaptation in

Valeo.ai 20 Jun 21, 2022
TorchGRL is the source code for our paper Graph Convolution-Based Deep Reinforcement Learning for Multi-Agent Decision-Making in Mixed Traffic Environments for IV 2022.

TorchGRL TorchGRL is the source code for our paper Graph Convolution-Based Deep Reinforcement Learning for Multi-Agent Decision-Making in Mixed Traffi

XXQQ 42 Dec 09, 2022
Dynamic Realtime Animation Control

Our project is targeted at making an application that dynamically detects the user’s expressions and gestures and projects it onto an animation software which then renders a 2D/3D animation realtime

Harsh Avinash 10 Aug 01, 2022
Safe Model-Based Reinforcement Learning using Robust Control Barrier Functions

README Repository containing the code for the paper "Safe Model-Based Reinforcement Learning using Robust Control Barrier Functions". Specifically, an

Yousef Emam 13 Nov 24, 2022
Code and real data for the paper "Counterfactual Temporal Point Processes", available at arXiv.

counterfactual-tpp This is a repository containing code and real data for the paper Counterfactual Temporal Point Processes. Pre-requisites This code

Networks Learning 11 Dec 09, 2022
A command line simple note taking app

Why yet another note taking program? note was designed with a very specific target in mind: me, and my 2354 scraps of paper. It runs from the command

64 Nov 20, 2022
Toolchain to build Yoshi's Island from source code

Project-Y Toolchain to build Yoshi's Island (J) V1.0 from source code, by MrL314 Last updated: September 17, 2021 Setup To begin, download this toolch

MrL314 19 Apr 18, 2022
Heterogeneous Deep Graph Infomax

Heterogeneous-Deep-Graph-Infomax Parameter Setting: HDGI-A: Node-level dimension: 16 Attention head: 4 Semantic-level attention vector: 8 learning rat

52 Oct 31, 2022
GUI for TOAD-GAN, a PCG-ML algorithm for Token-based Super Mario Bros. Levels.

If you are using this code in your own project, please cite our paper: @inproceedings{awiszus2020toadgan, title={TOAD-GAN: Coherent Style Level Gene

Maren A. 13 Dec 14, 2022
Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs.

Lunar Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs. About Lunar can be modified to work

Zeyad Mansour 276 Jan 07, 2023
百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline

项目说明: 百度2021年语言与智能技术竞赛机器阅读理解Pytorch版baseline 比赛链接:https://aistudio.baidu.com/aistudio/competition/detail/66?isFromLuge=true 官方的baseline版本是基于paddlepadd

周俊贤 54 Nov 23, 2022
Offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation

Shunted Transformer This is the offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation by Sucheng Ren, Daquan Zhou, Shengf

156 Dec 27, 2022
Attentive Implicit Representation Networks (AIR-Nets)

Attentive Implicit Representation Networks (AIR-Nets) Preprint | Supplementary | Accepted at the International Conference on 3D Vision (3DV) teaser.mo

29 Dec 07, 2022
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan NOTE: This documentation describes a BETA release of PyStan 3. PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is

Stan 229 Dec 29, 2022
Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

This is a playground for pytorch beginners, which contains predefined models on popular dataset. Currently we support mnist, svhn cifar10, cifar100 st

Aaron Chen 2.4k Dec 28, 2022
CompilerGym is a library of easy to use and performant reinforcement learning environments for compiler tasks

CompilerGym is a library of easy to use and performant reinforcement learning environments for compiler tasks

Facebook Research 721 Jan 03, 2023
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.

Unsupervised Phone and Word Segmentation using Vector-Quantized Neural Networks Overview Unsupervised phone and word segmentation on speech data is pe

Herman Kamper 13 Dec 11, 2022
Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"

Medical-Transformer Pytorch Code for the paper "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation" About this repo: This repo

Jeya Maria Jose 615 Dec 25, 2022
Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation

Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation

NVIDIA Research Projects 4.8k Jan 09, 2023