Differentiable SDE solvers with GPU support and efficient sensitivity analysis.

Overview

torchsde is a Python package providing a PyTorch implementation of differentiable SDE solvers.

This library provides stochastic differential equation (SDE) solvers with GPU support and efficient backpropagation.


Installation

pip install git+https://github.com/google-research/torchsde.git

Requirements: Python >=3.6 and PyTorch >=1.6.0, <1.8.0.

Documentation

Available here.

Examples

Quick example

import torch
import torchsde

batch_size, state_size, brownian_size = 32, 3, 2
t_size = 20

class SDE(torch.nn.Module):
    noise_type = 'general'
    sde_type = 'ito'

    def __init__(self):
        super().__init__()
        self.mu = torch.nn.Linear(state_size, 
                                  state_size)
        self.sigma = torch.nn.Linear(state_size, 
                                     state_size * brownian_size)

    # Drift
    def f(self, t, y):
        return self.mu(y)  # shape (batch_size, state_size)

    # Diffusion
    def g(self, t, y):
        return self.sigma(y).view(batch_size, 
                                  state_size, 
                                  brownian_size)

sde = SDE()
y0 = torch.full((batch_size, state_size), 0.1)
ts = torch.linspace(0, 1, t_size)
# Initial state y0, the SDE is solved over the interval [ts[0], ts[-1]].
# ys will have shape (t_size, batch_size, state_size)
ys = torchsde.sdeint(sde, y0, ts)
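
The solve is differentiable, so the example above can be trained end-to-end. Below is a minimal sketch continuing that example; the loss and optimizer are illustrative only, and torchsde.sdeint_adjoint is the constant-memory alternative where the SDE type and noise type support it.

optimizer = torch.optim.Adam(sde.parameters(), lr=1e-3)

ys = torchsde.sdeint(sde, y0, ts, method='euler')  # (t_size, batch_size, state_size)
loss = ys[-1].pow(2).mean()  # illustrative terminal-state loss
optimizer.zero_grad()
loss.backward()              # gradients reach the parameters of mu and sigma
optimizer.step()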

Notebook

examples/demo.ipynb gives a short guide on how to solve SDEs, including subtle points such as fixing the randomness in the solver and the choice of noise types.

Latent SDE

examples/latent_sde.py learns a latent stochastic differential equation, as in Section 5 of [1]. The example fits an SDE to data, whilst regularizing it to be like an Ornstein-Uhlenbeck prior process. The model can be loosely viewed as a variational autoencoder with its prior and approximate posterior being SDEs. This example can be run via

python -m examples.latent_sde --train-dir <TRAIN_DIR>

The program outputs figures to the path specified by <TRAIN_DIR>. Training should stabilize after 500 iterations with the default hyperparameters.
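
The heart of the example is the logqp mechanism: the SDE module defines the approximate posterior drift f, the prior drift h, and a shared diffusion g, and the solver returns both the sample paths and the pathwise KL term that enters the ELBO. The sketch below shows only that mechanism, with arbitrary network sizes and a constant diffusion; the actual example is considerably more involved (encoder, decoder, KL annealing, and so on).

import torch
import torchsde

class LatentSDE(torch.nn.Module):
    noise_type = 'diagonal'
    sde_type = 'ito'

    def __init__(self, latent_size=4):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_size, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, latent_size),
        )
        self.theta = torch.nn.Parameter(torch.ones(latent_size))

    def f(self, t, y):  # approximate posterior drift (learned)
        return self.net(y)

    def h(self, t, y):  # prior drift: Ornstein-Uhlenbeck pull towards zero
        return -self.theta * y

    def g(self, t, y):  # diffusion shared by prior and posterior
        return 0.5 * torch.ones_like(y)

sde = LatentSDE()
y0 = torch.zeros(16, 4)
ts = torch.linspace(0, 1, 10)
# With logqp=True the solver also returns the pathwise KL between posterior and prior.
ys, log_ratio = torchsde.sdeint(sde, y0, ts, logqp=True, method='euler')
kl = log_ratio.sum(dim=0).mean()  # regularizer added to the reconstruction loss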

Neural SDEs as GANs

examples/sde_gan.py learns an SDE as a GAN, as in [2]. The example trains an SDE as the generator of a GAN, whilst using a neural CDE [3] as the discriminator. This example can be run via

python -m examples.sde_gan

Citation

If you found this codebase useful in your research, please consider citing either or both of:

@article{li2020scalable,
  title={Scalable gradients for stochastic differential equations},
  author={Li, Xuechen and Wong, Ting-Kam Leonard and Chen, Ricky T. Q. and Duvenaud, David},
  journal={International Conference on Artificial Intelligence and Statistics},
  year={2020}
}
@article{kidger2020neuralsde,
  title={Neural {SDE}s {M}ade {E}asy: {SDE}s are {I}nfinite-{D}imensional {GAN}s},
  author={Kidger, Patrick and Foster, James and Li, Xuechen and Oberhauser, Harald and Lyons, Terry},
  journal={arXiv:2102.03657},
  year={2021}
}

References

[1] Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, David Duvenaud. "Scalable Gradients for Stochastic Differential Equations". International Conference on Artificial Intelligence and Statistics, 2020. [arXiv]

[2] Patrick Kidger, James Foster, Xuechen Li, Harald Oberhauser, Terry Lyons. "Neural SDEs as Infinite-Dimensional GANs". Machine Learning and the Physical Sciences, NeurIPS 2020. [arXiv]

[3] Patrick Kidger, James Morrill, James Foster, Terry Lyons. "Neural Controlled Differential Equations for Irregular Time Series". Neural Information Processing Systems, 2020. [arXiv]


This is a research project, not an official Google product.

Comments
  • Add support for Stratonovich adjoint

    Add support for Stratonovich adjoint

    Opening this up is mainly to let you know this is in progress.

    The remaining stuff:

    • [x] Write new adjoint selection function
    • [x] Write new check contract.
    • [x] Fix broken test for sdeint.
    • [x] Fix broken tests for adjoint.
    • [x] Fix broken tests for adjoint logqp.
    • [x] Make BrownianPath support space-time, davie, foster Levy area.
    • [ ] ~~Add gdg_jvp support.~~ (Do this in another PR)
    • [x] Example for gradient compute with adjoint for Strat SDE. (this shows adjoint is working for Stratonovich SDEs)

    Update: It's all done. I would appreciate comments on getting the code into better shape; although I tried to think carefully through most of the code, some parts were done hastily. Now all tests pass. @patrick-kidger

    This PR is really long, I'd prefer not to block/be blocked by others' work. So I'm leaving gdg_jvp for adjoint in another PR.
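
    For context, the gradient-compute example in the checklist boils down to something like the following sketch (diagonal noise and the midpoint Stratonovich solver chosen here purely for illustration):

    import torch
    import torchsde

    class StratSDE(torch.nn.Module):
        noise_type = 'diagonal'
        sde_type = 'stratonovich'

        def __init__(self):
            super().__init__()
            self.mu = torch.nn.Linear(3, 3)
            self.sigma = torch.nn.Linear(3, 3)

        def f(self, t, y):
            return self.mu(y)

        def g(self, t, y):
            return self.sigma(y)

    sde = StratSDE()
    y0 = torch.full((8, 3), 0.1)
    ts = torch.linspace(0, 1, 5)
    ys = torchsde.sdeint_adjoint(sde, y0, ts, method='midpoint')
    ys[-1].sum().backward()  # gradients flow back through the Stratonovich adjoint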

    A couple of random things that playing around with the new code has made me think about:

    • Type check for scalar: We might want to think more carefully about the interface being supported; it seems that scalar noise matches better with diagonal noise in terms of functionality, but its current interface matches up with general/additive
    • Type check slowing down sdeint and sdeint_adjoint for small problems. I noticed this after running on some small examples with the new code.
    • Unifying some of the code in brownian.
    • From now on, whenever we plan to change something, we should always run the test suite and make sure everything there passes. If anything gets broken while modifying the codebase, we should fix it according to the test (or modify the test; less preferable).

    Update: Additional caveats:

    • BrownianPath is currently not optimized; I could do these optimizations in another PR. The main problems are:
      • search over t is global, whereas it could be made local
      • cholesky and matinv slow things down and might not be the most numerically stable
    opened by lxuechen 24
  • Milstein (Strat), Milstein grad-free (Ito + Strat)

    Milstein (Strat), Milstein grad-free (Ito + Strat)

    @patrick-kidger Hi! I'm sharing a WIP draft with three methods: Milstein for Stratonovich, and grad-free Milstein for Ito and Stratonovich, done as described. The grad-free variants live inside the original methods, while Milstein for Stratonovich is a separate one (as agreed, because of code duplication, though the whole code differs only in dt; should I unify it into one method or leave that duplication?). Grad-free mode can be used by passing {'grad_free': True} in options; False, or leaving the option out, results in the default sde.gdg_prod usage.
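
    For reference, usage would look roughly like the sketch below (a toy diagonal-noise SDE; assuming the option lands as described above):

    import torch
    import torchsde

    class ToySDE(torch.nn.Module):
        noise_type = 'diagonal'
        sde_type = 'ito'

        def f(self, t, y):
            return -y

        def g(self, t, y):
            return 0.2 * torch.ones_like(y)

    sde, y0, ts = ToySDE(), torch.ones(4, 3), torch.linspace(0, 1, 10)
    # 'grad_free': True selects the derivative-free Milstein variant;
    # False or omitting the option keeps the default sde.gdg_prod-based update.
    ys = torchsde.sdeint(sde, y0, ts, method='milstein', options={'grad_free': True})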

    Issues/questions:

    1. I think the method sde.g_prod shouldn't both compute the diffusion g and perform the multiplication with I_k: in grad-free Milstein I need to compute g, g * I_k and g * v, so I end up with g being calculated three times instead of once. What do you think about extending the sde API with a def prod(self, g, v) method that applies the correct seq_mul according to the noise type? (That will introduce a bit of code duplication, I guess.)
    2. I will finish writing the diagnostics when the fix for the gdg_prod bug is merged. But stratonovich_diagonal.py with grad-free Milstein works fine. Here's one plot and rate: I think that using the grad-free variant changes the order to 0.5, right? (I can fix it in the code.)
    3. sqrt_dt = torch.sqrt(dt) if isinstance(dt, torch.Tensor) else math.sqrt(dt) appears in a few places now; how about introducing misc.sqrt?
    opened by mtsokol 19
  • Brownian unification

    Brownian unification

    As per #60. (Also obsoletes #54, a great many things in #15, and a bit of #22)

    Merged BrownianPath/Tree/Interval. BrownianPath and Tree are wrappers for BrownianInterval with particular arguments. In particular this means all of them are now on the same interface, providing the same functionality, and of course this is much easier to support. Plus, no C++ or blist cluttering things up.
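
    For illustration, the shared interface now looks roughly like this (sizes arbitrary):

    import torchsde

    bm = torchsde.BrownianInterval(t0=0.0, t1=1.0, size=(32, 2))
    increment = bm(0.2, 0.3)  # W(0.3) - W(0.2), shape (32, 2)
    # BrownianPath and BrownianTree construct a BrownianInterval under the hood,
    # and any of the three can be passed to the solvers via the bm argument,
    # e.g. torchsde.sdeint(sde, y0, ts, bm=bm).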

    This also includes several improvements to BrownianInterval:

    • Use of trampolining and tail recursion
    • Fewer SeedSequences are now spawned
    • Those SeedSequences that are spawned now use a more efficient spawn_key, which should resolve the previous issue that tuple additions were taking up an appreciable fraction of the run time. The spawn keys are now a 2-tuple of (depth in tree, index at depth level), rather than being long tuples that get longer each time we split.

    There are some small downsides:

    • We're no longer as efficient for point-based queries. The efficiency drop should be relatively small: the increase in cost is logarithmic in (length of the interval / average step size), and moreover this doesn't affect us in torchsde anyway as we only ever made interval-based queries.
    • We're no longer running at C++ speeds. Practically speaking the difference doesn't seem to be huge: on the sdeint_adjoint benchmark, Old+Path+CPP=3.2s vs New+Path+Py=4s. Old+Tree+CPP=5s vs New+Tree+Py=8s. It's also interesting to see some of the Python speedups: Old+Tree+Py=25s vs New+Tree+Py=8s. Old+Interval+Py=13s vs New+Tree+Py=7.8s. (Incidentally this is Tree with tol=1e-5 rather than the default 1e-6.) If we do want to return to C++ then I'd suggest using the current BrownianInterval code as a model.

    Overall I definitely think the upsides outweigh the downsides.

    opened by patrick-kidger 14
  • Heun's method

    Heun's method

    @patrick-kidger Hi!

    As suggested, this is a PR against the dev version with all the latest comments from the previous PR applied. A few questions:

    1. I pushed diagnostic/stratonovich_diagonal.py and modified problems.py to verify whether it's correctly implemented. Do we want it merged, or should I remove it from the PR? (Right now it's rather a draft file.)
    2. SDEIto and SDEStratonovich introduce only a param for sde_type. As the ForwardSDEIto API suited my Stratonovich case, I changed it to ForwardSDE, which inherits from BaseSDE with an sde_type param. (It's just a draft; changed only there.)

    So will the forward Stratonovich SDE API differ from ForwardSDEIto, or can we unify them that way?

    opened by mtsokol 13
  • SDE GAN example

    SDE GAN example

    Added an SDE-GAN example. Only a draft as I'm not satisfied that it's working yet / that it's as fast as it could be.

    I've added a couple other things in here too. For one, I've added a CHANGELOG.txt file. (Anything I've left out?) I've also tweaked the citation request with the new paper, to reflect the huge amount of work we've put into this repo for that paper.

    cla: yes 
    opened by patrick-kidger 11
  • PoC Heun's method

    PoC Heun's method

    Hi!

    I'm a master's student, I've recently started learning about SDEs, and I came across your project.

    For the purpose of learning, I decided to try contributing. The known issues state that Stratonovich methods are missing, so I thought I'd try that: here's a draft of the Euler counterpart, Heun's method.

    If you have a bit of time for a code review and guidance, and it won't be too difficult for me, I would like to try contributing.

    It's just a draft with an example run in the demo notebook, so no tests yet. What do you think about it?

    Do you already have a concrete plan for Stratonovich module? Or do you suggest other good-first-issue to start?

    Thank you for any help.

    good first issue 
    opened by mtsokol 11
  • Latent experiment

    Latent experiment

    Hi!

    Following the instructions in https://github.com/google-research/torchsde/pull/38#issuecomment-686559686, here's my idea for it: I looked at what exactly the flow is in the previous version available on master and hopefully recreated it.

    Here's the result after 300 iterations (still not similar to iteration 300 in the previous version):

    WDYT?

    By the way, running it locally was still slow and was burning up my laptop, so I ended up running it on a GCP instance (e2-highcpu-8), as I found they have student packs, and got one iteration per 4 seconds.

    opened by mtsokol 10
  • Add log-ODE scheme and simplify typing.

    Add log-ODE scheme and simplify typing.

    Main things here:

    • Add log-ODE scheme obtained with Lie-Trotter splitting and midpoint as described in your draft
    • Unify all the diagnostics files
    • Minor tweaks for problems and utility functions in diagnostics
    • Put all type info in types.py so that we don't have to import two modules elsewhere

    Here's the strong order rate plot:

    Off the top of my head, I'm not entirely sure what strong order to expect. We could also do some W_2 error analysis; I've done this in the past with pot for my sampling paper, but it's rather slow to run, given that these are cubic-time algorithms IIRC. On another note, Sinkhorn would produce extra error.
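
    For concreteness, the W_2 check mentioned above might look something like the sketch below (using the POT package; the function name is mine, and the exact OT solver is the cubic-time bottleneck referred to):

    import numpy as np
    import ot  # the POT package

    def w2_squared(xs, ys):
        # xs, ys: (n, d) and (m, d) arrays of samples from the two laws being compared.
        a = np.full(len(xs), 1.0 / len(xs))
        b = np.full(len(ys), 1.0 / len(ys))
        M = ot.dist(xs, ys)      # squared Euclidean cost matrix by default
        return ot.emd2(a, b, M)  # exact OT; ot.sinkhorn2 is faster but adds the extra error noted above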

    Thinking out loud, possible sources of error just for this set of diagnostics are

    • true solution approximation by midpoint
    • not enough samples to approximate the mse
    opened by lxuechen 10
  • Added double adjoint

    Added double adjoint

    Just creating a draft PR to let you know I'm working on double-backward through the adjoint.

    In informal tests it seems to work as expected. The only thing left to do is write some formal tests. I'll put them in test_adjoint.py and switch that file over to pytest while I'm at it.

    opened by patrick-kidger 9
  • Extend CI workflow to include PyPI dist publishing

    Extend CI workflow to include PyPI dist publishing

    Minor PR, title says it all.

    Publishing is triggered only by release events. To work, the addition requires setup of a PyPI password in repo secrets.

    Addresses this.

    cla: yes 
    opened by Zymrael 8
  • Added reversible Heun solver

    Added reversible Heun solver

    Code

    • Added the reversible Heun solver. This is the highlight of this PR. Most of the minor changes are due to getting this working, one way or another.
      • As this solver uses extra state beyond the solution y, this means that solvers can now carry along some extra state. In particular this state is initialised outside of _SdeintAdjointMethod so that the gradients through it are correct. This is why the code for adjoints has been adjusted in the way that it has, to make that possible.
    • Bugfixes:
      • the adjoint pass was using a single-dimensional flat tensor as its state. Given that the rest of the code is written to assume a batch dimension is present, it's a little surprising that this worked. Regardless it now adds a dummy batch dimension, just to be sure.
      • fixed some type hints that were wrong.

    Documentation

    • Documented the logqp option -- my assumption is that we don't need to deprecate this, now that we have a relatively efficient and maintainable way of handling it.
    • Added advice on which solver to use when.

    Examples

    • Updated SDE-GAN example using the improvements introduced in the new paper. Now runs in about 3 hours rather than 8.

    Tests

    • Added a new test_against_sdeint that compares gradients for sdeint_adjoint against sdeint.

    Note that there are still a few TODOs against the arXiv reference for the new paper. I'm opening this as a PR so you can review it now if you get time, but there's no time pressure. I'll fill in the reference once the paper is on arXiv, which will be on Monday.
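
    For reference, invoking the new solver looks roughly like the sketch below (a toy SDE; the method names are those introduced in this PR):

    import torch
    import torchsde

    class SDE(torch.nn.Module):
        noise_type = 'diagonal'
        sde_type = 'stratonovich'  # reversible Heun is a Stratonovich solver

        def __init__(self):
            super().__init__()
            self.mu = torch.nn.Linear(3, 3)
            self.sigma = torch.nn.Linear(3, 3)

        def f(self, t, y):
            return self.mu(y)

        def g(self, t, y):
            return self.sigma(y)

    sde = SDE()
    y0 = torch.full((16, 3), 0.1)
    ts = torch.linspace(0, 1, 8)
    # Matching forward and adjoint methods give algebraically reversible backpropagation.
    ys = torchsde.sdeint_adjoint(sde, y0, ts,
                                 method='reversible_heun',
                                 adjoint_method='adjoint_reversible_heun')
    ys[-1].sum().backward()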

    cla: yes 
    opened by patrick-kidger 8
  • Suggest to loosen the dependency on boltons

    Suggest to loosen the dependency on boltons

    Dear developers,

    Your project torchsde requires "boltons==20.2.1" as a dependency. After analyzing the source code, we found that the following versions of boltons are also suitable without affecting your project, i.e., boltons 19.0.0, 19.0.1, 19.1.0, 19.2.0, 19.3.0, 20.0.0, 20.1.0, 20.2.0. Therefore, we suggest loosening the dependency on boltons from "boltons==20.2.1" to "boltons>=19.0.0,<=20.2.1" to avoid possible conflicts when importing more packages, or for downstream projects that use torchsde.

    May I open a pull request to loosen the dependency on boltons further?

    By the way, could you please tell us whether such dependency analysis might be helpful for making dependency maintenance easier during your development?



    Details:

    Your project (commit id: 53038a3efcd77f6c9f3cfd0310700a59be5d5d2d) directly uses 1 API from the package boltons.

    boltons.cacheutils.LRU.__init__
    

    Beginning from this call, 5 functions are then indirectly called, including 4 of boltons's internal APIs and 1 outside API, as follows:

    [/google-research/torchsde]
    +--boltons.cacheutils.LRU.__init__
    |      +--boltons.cacheutils.LRI.__init__
    |      |      +--boltons.cacheutils.RLock.__init__
    |      |      +--threading.RLock
    |      |      +--boltons.cacheutils.LRI._init_ll
    |      |      +--boltons.cacheutils.LRI.update
    

    Since none of these functions changed between any of the boltons versions [19.0.0, 19.0.1, 19.1.0, 19.2.0, 19.3.0, 20.0.0, 20.1.0, 20.2.0] and 20.2.1, we believe it is safe to loosen the corresponding dependency.

    opened by Agnes-U 0
  • How to do double stochastic integrals?

    How to do double stochastic integrals?

    Is it possible to use or modify this code to do double stochastic integrals in a reasonable way, in order to support higher order SDE solvers? I am using your BrownianTree implementation with my own custom solvers right now and would like to try a higher order solver.

    Thank you, Katherine Crowson

    opened by crowsonkb 1
  • Solver for "general"-type noise missing...

    Solver for "general"-type noise missing...

    I was trying to use general noise with either sdeint or sdeint_adjoint. But I always get the error message below:

    SDE has noise type general but solver only supports noise types ('additive', 'diagonal', 'scalar')

    Would you mind implementing the corresponding solver?

    Another question: what are the meanings of g_prod, gdg_prod and g_prod_and_gdg_prod?

    Thanks

    opened by hanmingcr 0
  • Query about the reproducibility of the Motion Capture dataset in "Scalable Gradients..." (Li et al., 2020)

    Query about the reproducibility of the Motion Capture dataset in "Scalable Gradients..." (Li et al., 2020)

    I am trying to reproduce the results of the CMU Motion Capture dataset. I use references from examples/latent_sde_lorenz.py, the paper, and the preprocessed dataset linked by ODE2VAE's repo.

    My current runs' results have large discrepancies from the results in the paper so I want to check if there are any training details I'm missing. (I am not very familiar with Bayesian modeling so I try to follow the hyperparameters in the paper as closely as possible.)

    Here are the main issues:

    • The validation and the test MSE for Latent SDE are in the range of 20-30, while Latent SDE's Test MSE in the paper is 4.03 +- 0.20.
    • The log-likelihood term in the ELBO for Latent SDE is in the magnitude of 10^6 to 10^9 depending on the choice of hyperparameters, while the code for ODE2VAE has the log-likelihood for ODE2VAE in the magnitude of 10^4.

    Here are the main training details I think I am most likely to wrongly interpret from the paper:

    • When calculating the log-likelihood, I am following your example and taking the sum over all timesteps and the data dimensions, then taking the mean over the number of samples. Please let me know if this is correct for the CMU mocap dataset.
    • In training, the prediction and the log-likelihood should only be calculated for the last 297 predictions (since the first 3 observations are used to encode the context).
    • The solver used is not mentioned. I have tried euler_heun, milstein, and even reversible_heun.
    • The initial learning rate for Adam is 0.01 in the paper, but some runs with an initial lr of 0.001 seem to be more stable. I'm curious if you have any comments about this.
    • The dt for the solver is said to be 1/5 of the minimum time difference between two observations. All observations are regular, so we can choose the minimum time difference to be a particular value a (e.g., 1), and then dt would be 0.2 * a. I want to know if my interpretation here is correct. The paper didn't mention this value a or the start/end time. It would be nice if you remembered this.

    Tagging @lxuechen since you know the most about the exp details. Thank you for your time!

    opened by nghiahhnguyen 8
  • Computational time for Brownian Interval

    Computational time for Brownian Interval

    I observed something strange about the computation time for the Brownian interval:

    sde.cnt = 0
    %timeit -n 2 -r 7 torchsde.sdeint(sde, y0, th.tensor([0.0, 1.0]), dt=0.01, method="euler")
    print(sde.cnt)
    # 1.87 s ± 60.6 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
    # 1428
    
    sde.cnt = 0
    %timeit -n 2 -r 7 torchsde.sdeint(sde, y0, th.tensor([0.0, 5.0]), dt=0.05, method="euler")
    print(sde.cnt)
    # 57.3 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
    # 1414
    
    sde.cnt = 0
    %timeit -n 2 -r 7 torchsde.sdeint(sde, y0, th.tensor([0.0, 10.0]), dt=0.1, method="euler")
    print(sde.cnt)
    # 57.2 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 2 loops each)
    # 1414
    

    where the sde is very similar to the one defined in the Quick example in the README. In the above three examples, I change ts and dt. I think they should have roughly the same computation time, but it turns out that the time taken by that line is very different. According to the paper, the worst case should be roughly O(log(T/dt)) if I understand correctly. Why is the first case so slow?

    opened by qsh-zh 2
  • Added publishing workflow

    Added publishing workflow

    Prompted by (and I expect superseding) #100, I've put together a publish workflow. It should automatically upload a build to PyPI whenever we do a release.

    Untested for obvious reasons.

    At the moment it uses a PyPI username + password for authentication. We can change that to using an API token if you prefer.

    cla: yes 
    opened by patrick-kidger 2
Releases(v0.2.4)
  • v0.2.4(Jan 5, 2021)

    Efficiency improvements:

    • add f_and_g and f_and_g_prod functions so that drift and diffusion can be computed together
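
    A sketch of what this enables (a hypothetical network; the point is that shared work between drift and diffusion happens once per step):

    import torch

    class SharedSDE(torch.nn.Module):
        noise_type = 'diagonal'
        sde_type = 'ito'

        def __init__(self, size=3):
            super().__init__()
            self.shared = torch.nn.Linear(size, 64)
            self.drift_head = torch.nn.Linear(64, size)
            self.diffusion_head = torch.nn.Linear(64, size)

        def f_and_g(self, t, y):
            h = torch.tanh(self.shared(y))  # computed once, used by both heads
            return self.drift_head(h), self.diffusion_head(h)

        # f and g remain available for code paths that request them separately.
        def f(self, t, y):
            return self.f_and_g(t, y)[0]

        def g(self, t, y):
            return self.f_and_g(t, y)[1]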

    Bug fixes:

    • calling solver.integrate twice internally
    • adaptive error estimation not recognizing singleton tensors
  • v0.2.1(Oct 22, 2020)

    New features include

    • BrownianInterval: A Brownian motion data structure that has constant memory storage and exact queries relying on LRU caches
    • Basic solvers for Stratonovich SDEs
    • Full adjoint support for Ito and Stratonovich SDEs for all noise types declared in the codebase
    • Various Python performance enhancements
  • v0.1.1(Jul 28, 2020)

    • Add new Brownian motion classes with faster query speed based on PyTorch C++ API.
      • The new Brownian motion classes have the same API as existing ones, so they serve as direct replacements.
      • Importing these classes is as simple as from torchsde.brownian_lib import BrownianPath, BrownianTree.
      • The old Brownian motion classes written in pure Python are not yet deprecated, and likely won't be deprecated in the near future.
    • Add type hints for functions of the public API.
  • v0.1.0(Jul 8, 2020)

Owner
Google Research