torch-optimizer -- collection of optimizers for Pytorch

Overview

torch-optimizer

GitHub Actions status for master branch Documentation Status

torch-optimizer -- collection of optimizers for PyTorch compatible with optim module.

Simple example

import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(model.parameters(), lr=0.001)
optimizer.step()

Installation

Installation process is simple, just:

$ pip install torch_optimizer

Documentation

https://pytorch-optimizer.rtfd.io

Supported Optimizers

A2GradExp https://arxiv.org/abs/1810.00553
A2GradInc https://arxiv.org/abs/1810.00553
A2GradUni https://arxiv.org/abs/1810.00553
AccSGD https://arxiv.org/abs/1803.05591
AdaBelief https://arxiv.org/abs/2010.07468
AdaBound https://arxiv.org/abs/1902.09843
AdaMod https://arxiv.org/abs/1910.12249
Adafactor https://arxiv.org/abs/1804.04235
Adahessian https://arxiv.org/abs/2006.00719
AdamP https://arxiv.org/abs/2006.08217
AggMo https://arxiv.org/abs/1804.00325
Apollo https://arxiv.org/abs/2009.13586
DiffGrad https://arxiv.org/abs/1909.11015
Lamb https://arxiv.org/abs/1904.00962
Lookahead https://arxiv.org/abs/1907.08610
NovoGrad https://arxiv.org/abs/1905.11286
PID https://www4.comp.polyu.edu.hk/~cslzhang/paper/CVPR18_PID.pdf
QHAdam https://arxiv.org/abs/1810.06801
QHM https://arxiv.org/abs/1810.06801
RAdam https://arxiv.org/abs/1908.03265
Ranger https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
RangerQH https://arxiv.org/abs/1810.06801
RangerVA https://arxiv.org/abs/1908.00700v2
SGDP https://arxiv.org/abs/2006.08217
SGDW https://arxiv.org/abs/1608.03983
SWATS https://arxiv.org/abs/1712.07628
Shampoo https://arxiv.org/abs/1802.09568
Yogi https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization

Visualizations

Visualizations help us to see how different algorithms deals with simple situations like: saddle points, local minima, valleys etc, and may provide interesting insights into inner workings of algorithm. Rosenbrock and Rastrigin benchmark functions was selected, because:

  • Rosenbrock (also known as banana function), is non-convex function that has one global minima (1.0. 1.0). The global minimum is inside a long, narrow, parabolic shaped flat valley. To find the valley is trivial. To converge to the global minima, however, is difficult. Optimization algorithms might pay a lot of attention to one coordinate, and have problems to follow valley which is relatively flat.
  • Rastrigin function is a non-convex and has one global minima in (0.0, 0.0). Finding the minimum of this function is a fairly difficult problem due to its large search space and its large number of local minima.

    https://upload.wikimedia.org/wikipedia/commons/8/8b/Rastrigin_function.png

Each optimizer performs 501 optimization steps. Learning rate is best one found by hyper parameter search algorithm, rest of tuning parameters are default. It is very easy to extend script and tune other optimizer parameters.

python examples/viz_optimizers.py

Warning

Do not pick optimizer based on visualizations, optimization approaches have unique properties and may be tailored for different purposes or may require explicit learning rate schedule etc. Best way to find out, is to try one on your particular problem and see if it improves scores.

If you do not know which optimizer to use start with built in SGD/Adam, once training logic is ready and baseline scores are established, swap optimizer and see if there is any improvement.

A2GradExp

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_A2GradExp.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_A2GradExp.png
import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradExp(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
    rho=0.5,
)
optimizer.step()

Paper: Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018) [https://arxiv.org/abs/1810.00553]

Reference Code: https://github.com/severilov/A2Grad_optimizer

A2GradInc

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_A2GradInc.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_A2GradInc.png
import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradInc(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
)
optimizer.step()

Paper: Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018) [https://arxiv.org/abs/1810.00553]

Reference Code: https://github.com/severilov/A2Grad_optimizer

A2GradUni

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_A2GradUni.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_A2GradUni.png
import torch_optimizer as optim

# model = ...
optimizer = optim.A2GradUni(
    model.parameters(),
    kappa=1000.0,
    beta=10.0,
    lips=10.0,
)
optimizer.step()

Paper: Optimal Adaptive and Accelerated Stochastic Gradient Descent (2018) [https://arxiv.org/abs/1810.00553]

Reference Code: https://github.com/severilov/A2Grad_optimizer

AccSGD

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_AccSGD.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AccSGD.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AccSGD(
    model.parameters(),
    lr=1e-3,
    kappa=1000.0,
    xi=10.0,
    small_const=0.7,
    weight_decay=0
)
optimizer.step()

Paper: On the insufficiency of existing momentum schemes for Stochastic Optimization (2019) [https://arxiv.org/abs/1803.05591]

Reference Code: https://github.com/rahulkidambi/AccSGD

AdaBelief

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_AdaBelief.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AdaBelief.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaBelief(
    m.parameters(),
    lr= 1e-3,
    betas=(0.9, 0.999),
    eps=1e-3,
    weight_decay=0,
    amsgrad=False,
    weight_decouple=False,
    fixed_decay=False,
    rectify=False,
)
optimizer.step()

Paper: AdaBelief Optimizer, adapting stepsizes by the belief in observed gradients (2020) [https://arxiv.org/abs/2010.07468]

Reference Code: https://github.com/juntang-zhuang/Adabelief-Optimizer

AdaBound

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_AdaBound.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AdaBound.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaBound(
    m.parameters(),
    lr= 1e-3,
    betas= (0.9, 0.999),
    final_lr = 0.1,
    gamma=1e-3,
    eps= 1e-8,
    weight_decay=0,
    amsbound=False,
)
optimizer.step()

Paper: Adaptive Gradient Methods with Dynamic Bound of Learning Rate (2019) [https://arxiv.org/abs/1902.09843]

Reference Code: https://github.com/Luolc/AdaBound

AdaMod

AdaMod method restricts the adaptive learning rates with adaptive and momental upper bounds. The dynamic learning rate bounds are based on the exponential moving averages of the adaptive learning rates themselves, which smooth out unexpected large learning rates and stabilize the training of deep neural networks.

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_AdaMod.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AdaMod.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AdaMod(
    m.parameters(),
    lr= 1e-3,
    betas=(0.9, 0.999),
    beta3=0.999,
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: An Adaptive and Momental Bound Method for Stochastic Learning. (2019) [https://arxiv.org/abs/1910.12249]

Reference Code: https://github.com/lancopku/AdaMod

Adafactor

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Adafactor.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Adafactor.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Adafactor(
    m.parameters(),
    lr= 1e-3,
    eps2= (1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    scale_parameter=True,
    relative_step=True,
    warmup_init=False,
)
optimizer.step()

Paper: Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. (2018) [https://arxiv.org/abs/1804.04235]

Reference Code: https://github.com/pytorch/fairseq/blob/master/fairseq/optim/adafactor.py

Adahessian

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Adahessian.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Adahessian.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Adahessian(
    m.parameters(),
    lr= 1.0,
    betas= (0.9, 0.999)
    eps= 1e-4,
    weight_decay=0.0,
    hessian_power=1.0,
)
      loss_fn(m(input), target).backward(create_graph = True) # create_graph=True is necessary for Hessian calculation
optimizer.step()

Paper: ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning (2020) [https://arxiv.org/abs/2006.00719]

Reference Code: https://github.com/amirgholami/adahessian

AdamP

AdamP propose a simple and effective solution: at each iteration of Adam optimizer applied on scale-invariant weights (e.g., Conv weights preceding a BN layer), AdamP remove the radial component (i.e., parallel to the weight vector) from the update vector. Intuitively, this operation prevents the unnecessary update along the radial direction that only increases the weight norm without contributing to the loss minimization.

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_AdamP.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AdamP.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AdamP(
    m.parameters(),
    lr= 1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
    delta = 0.1,
    wd_ratio = 0.1
)
optimizer.step()

Paper: Slowing Down the Weight Norm Increase in Momentum-based Optimizers. (2020) [https://arxiv.org/abs/2006.08217]

Reference Code: https://github.com/clovaai/AdamP

AggMo

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_AggMo.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_AggMo.png
import torch_optimizer as optim

# model = ...
optimizer = optim.AggMo(
    m.parameters(),
    lr= 1e-3,
    betas=(0.0, 0.9, 0.99),
    weight_decay=0,
)
optimizer.step()

Paper: Aggregated Momentum: Stability Through Passive Damping. (2019) [https://arxiv.org/abs/1804.00325]

Reference Code: https://github.com/AtheMathmo/AggMo

Apollo

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Apollo.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Apollo.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Apollo(
    m.parameters(),
    lr= 1e-2,
    beta=0.9,
    eps=1e-4,
    warmup=0,
    init_lr=0.01,
    weight_decay=0,
)
optimizer.step()

Paper: Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization. (2020) [https://arxiv.org/abs/2009.13586]

Reference Code: https://github.com/XuezheMax/apollo

DiffGrad

Optimizer based on the difference between the present and the immediate past gradient, the step size is adjusted for each parameter in such a way that it should have a larger step size for faster gradient changing parameters and a lower step size for lower gradient changing parameters.

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_DiffGrad.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_DiffGrad.png
import torch_optimizer as optim

# model = ...
optimizer = optim.DiffGrad(
    m.parameters(),
    lr= 1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: diffGrad: An Optimization Method for Convolutional Neural Networks. (2019) [https://arxiv.org/abs/1909.11015]

Reference Code: https://github.com/shivram1987/diffGrad

Lamb

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Lamb.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Lamb.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Lamb(
    m.parameters(),
    lr= 1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (2019) [https://arxiv.org/abs/1904.00962]

Reference Code: https://github.com/cybertronai/pytorch-lamb

Lookahead

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_LookaheadYogi.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_LookaheadYogi.png
import torch_optimizer as optim

# model = ...
# base optimizer, any other optimizer can be used like Adam or DiffGrad
yogi = optim.Yogi(
    m.parameters(),
    lr= 1e-2,
    betas=(0.9, 0.999),
    eps=1e-3,
    initial_accumulator=1e-6,
    weight_decay=0,
)

optimizer = optim.Lookahead(yogi, k=5, alpha=0.5)
optimizer.step()

Paper: Lookahead Optimizer: k steps forward, 1 step back (2019) [https://arxiv.org/abs/1907.08610]

Reference Code: https://github.com/alphadl/lookahead.pytorch

NovoGrad

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_NovoGrad.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_NovoGrad.png
import torch_optimizer as optim

# model = ...
optimizer = optim.NovoGrad(
    m.parameters(),
    lr= 1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
    grad_averaging=False,
    amsgrad=False,
)
optimizer.step()

Paper: Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks (2019) [https://arxiv.org/abs/1905.11286]

Reference Code: https://github.com/NVIDIA/DeepLearningExamples/

PID

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_PID.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_PID.png
import torch_optimizer as optim

# model = ...
optimizer = optim.PID(
    m.parameters(),
    lr=1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    integral=5.0,
    derivative=10.0,
)
optimizer.step()

Paper: A PID Controller Approach for Stochastic Optimization of Deep Networks (2018) [http://www4.comp.polyu.edu.hk/~cslzhang/paper/CVPR18_PID.pdf]

Reference Code: https://github.com/tensorboy/PIDOptimizer

QHAdam

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_QHAdam.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_QHAdam.png
import torch_optimizer as optim

# model = ...
optimizer = optim.QHAdam(
    m.parameters(),
    lr= 1e-3,
    betas=(0.9, 0.999),
    nus=(1.0, 1.0),
    weight_decay=0,
    decouple_weight_decay=False,
    eps=1e-8,
)
optimizer.step()

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2019) [https://arxiv.org/abs/1810.06801]

Reference Code: https://github.com/facebookresearch/qhoptim

QHM

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_QHM.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_QHM.png
import torch_optimizer as optim

# model = ...
optimizer = optim.QHM(
    m.parameters(),
    lr=1e-3,
    momentum=0,
    nu=0.7,
    weight_decay=1e-2,
    weight_decay_type='grad',
)
optimizer.step()

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2019) [https://arxiv.org/abs/1810.06801]

Reference Code: https://github.com/facebookresearch/qhoptim

RAdam

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_RAdam.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_RAdam.png
import torch_optimizer as optim

# model = ...
optimizer = optim.RAdam(
    m.parameters(),
    lr= 1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,
)
optimizer.step()

Paper: On the Variance of the Adaptive Learning Rate and Beyond (2019) [https://arxiv.org/abs/1908.03265]

Reference Code: https://github.com/LiyuanLucasLiu/RAdam

Ranger

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Ranger.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Ranger.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Ranger(
    m.parameters(),
    lr=1e-3,
    alpha=0.5,
    k=6,
    N_sma_threshhold=5,
    betas=(.95, 0.999),
    eps=1e-5,
    weight_decay=0
)
optimizer.step()

Paper: New Deep Learning Optimizer, Ranger: Synergistic combination of RAdam + LookAhead for the best of both (2019) [https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d]

Reference Code: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

RangerQH

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_RangerQH.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_RangerQH.png
import torch_optimizer as optim

# model = ...
optimizer = optim.RangerQH(
    m.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    nus=(.7, 1.0),
    weight_decay=0.0,
    k=6,
    alpha=.5,
    decouple_weight_decay=False,
    eps=1e-8,
)
optimizer.step()

Paper: Quasi-hyperbolic momentum and Adam for deep learning (2018) [https://arxiv.org/abs/1810.06801]

Reference Code: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

RangerVA

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_RangerVA.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_RangerVA.png
import torch_optimizer as optim

# model = ...
optimizer = optim.RangerVA(
    m.parameters(),
    lr=1e-3,
    alpha=0.5,
    k=6,
    n_sma_threshhold=5,
    betas=(.95, 0.999),
    eps=1e-5,
    weight_decay=0,
    amsgrad=True,
    transformer='softplus',
    smooth=50,
    grad_transformer='square'
)
optimizer.step()

Paper: Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM (2019) [https://arxiv.org/abs/1908.00700v2]

Reference Code: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

SGDP

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_SGDP.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_SGDP.png
import torch_optimizer as optim

# model = ...
optimizer = optim.SGDP(
    m.parameters(),
    lr= 1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
    delta = 0.1,
    wd_ratio = 0.1
)
optimizer.step()

Paper: Slowing Down the Weight Norm Increase in Momentum-based Optimizers. (2020) [https://arxiv.org/abs/2006.08217]

Reference Code: https://github.com/clovaai/AdamP

SGDW

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_SGDW.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_SGDW.png
import torch_optimizer as optim

# model = ...
optimizer = optim.SGDW(
    m.parameters(),
    lr= 1e-3,
    momentum=0,
    dampening=0,
    weight_decay=1e-2,
    nesterov=False,
)
optimizer.step()

Paper: SGDR: Stochastic Gradient Descent with Warm Restarts (2017) [https://arxiv.org/abs/1608.03983]

Reference Code: https://github.com/pytorch/pytorch/pull/22466

SWATS

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_SWATS.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_SWATS.png
import torch_optimizer as optim

# model = ...
optimizer = optim.SWATS(
    model.parameters(),
    lr=1e-1,
    betas=(0.9, 0.999),
    eps=1e-3,
    weight_decay= 0.0,
    amsgrad=False,
    nesterov=False,
)
optimizer.step()

Paper: Improving Generalization Performance by Switching from Adam to SGD (2017) [https://arxiv.org/abs/1712.07628]

Reference Code: https://github.com/Mrpatekful/swats

Shampoo

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Shampoo.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Shampoo.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Shampoo(
    m.parameters(),
    lr=1e-1,
    momentum=0.0,
    weight_decay=0.0,
    epsilon=1e-4,
    update_freq=1,
)
optimizer.step()

Paper: Shampoo: Preconditioned Stochastic Tensor Optimization (2018) [https://arxiv.org/abs/1802.09568]

Reference Code: https://github.com/moskomule/shampoo.pytorch

Yogi

Yogi is optimization algorithm based on ADAM with more fine grained effective learning rate control, and has similar theoretical guarantees on convergence as ADAM.

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Yogi.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Yogi.png
import torch_optimizer as optim

# model = ...
optimizer = optim.Yogi(
    m.parameters(),
    lr= 1e-2,
    betas=(0.9, 0.999),
    eps=1e-3,
    initial_accumulator=1e-6,
    weight_decay=0,
)
optimizer.step()

Paper: Adaptive Methods for Nonconvex Optimization (2018) [https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization]

Reference Code: https://github.com/4rtemi5/Yogi-Optimizer_Keras

Adam (PyTorch built-in)

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_Adam.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_Adam.png

SGD (PyTorch built-in)

https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rastrigin_SGD.png https://raw.githubusercontent.com/jettify/pytorch-optimizer/master/docs/rosenbrock_SGD.png
Comments
  • Add Ranger optimizer

    Add Ranger optimizer

    Hello

    paper: https://arxiv.org/abs/1908.00700v2 implementation: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer

    Also: The clamp parameter for the weight_norm (=10) is hardcoded in LAMB, can you add a new parameter to custom it ? weight_norm = p.data.pow(2).sum().sqrt().clamp(0, 10) You can use torch.norm(.) to compute the norm.

    Thank you.

    opened by tkon3 8
  • Using GPU to train the model

    Using GPU to train the model

    Hello, I'm really appreciate your work. But now I wonder how to use GPU to train the model. There are always mistakes when I use the CUDA device. Thanks a lot. device = torch.device('cuda' if use_cuda else 'cpu')

    opened by penny9287 7
  • Add adahessian

    Add adahessian

    Hello,

    I´m a big fan of this project. Recently a new optimized has been proposed, that promises SOTA for many tasks.

    https://github.com/amirgholami/adahessian

    The possibility of using it here is very good!

    opened by bratao 6
  • Lamb optimizer warning in pytorch 1.6

    Lamb optimizer warning in pytorch 1.6

    Hi I'm getting this deprecated warning in pytorch 1.6 for Lamb:

    
      | 2020-06-25T01:58:41.682+01:00 | add_(Tensor other, *, Number alpha) (Triggered internally at /opt/conda/conda-bld/pytorch_1592982553767/work/torch/csrc/utils/python_arg_parser.cpp:766.)
    -- | -- | --
      | 2020-06-25T01:58:41.682+01:00 | exp_avg.mul_(beta1).add_(1 - beta1, grad)
      | 2020-06-25T01:58:41.682+01:00 | 2020-06-25T00:58:41 - WARNING - /opt/conda/envs/py36/lib/python3.6/site-packages/torch_optimizer/lamb.py:120: UserWarning: This overload of add_ is deprecated:
      | 2020-06-25T01:58:41.682+01:00 | add_(Number alpha, Tensor other)
      | 2020-06-25T01:58:41.682+01:00 | Consider using one of the following signatures instead:
      | 2020-06-25T01:58:41.682+01:00 | add_(Tensor other, *, Number alpha) (Triggered internally at /opt/conda/conda-bld/pytorch_1592982553767/work/torch/csrc/utils/python_arg_parser.cpp:766.)
      | 2020-06-25T01:58:41.682+01:00 | exp_avg.mul_(beta1).add_(1 - beta1, grad)
    
    
    
    opened by Laksh1997 6
  • Unfair comparison in Visualizations

    Unfair comparison in Visualizations

    Hi,

    Thanks a lot for this great repo. For the comparison in the Visualizations example, I found that for each config, you run 100 updates. I am concerned that 100 is too small so that it would favor optimizers that have fast convergence in the first few updates.

    For other optimizers that the convergence is relatively slow at beginning, it would select large lr. This could lead to unstable convergence for these optimizers.

    Moreover, for hyper-parameter search, the objective is the distance between the last step point and the minimum. I think the function value of the last step point may be a better objective.

    At last, some optimizers implicitly implement learning rate decay (such as AdaBound and RAdam), but some not. But in your comparison, no explicit learning rate schedule is used.

    opened by XuezheMax 5
  • 'Yogi' object has no attribute 'Yogi'

    'Yogi' object has no attribute 'Yogi'

    Hi, if calling yogi from pytorch-optimizer has some bug ( runtime error, Yogi object has no attribute Yogi). so at this moment, i am calling yogi.py directly.

    #import torch_optimizer as optim      # in second iteration (for statement), it getting error. 
    from yogi import Yogi                        # import from yogi.py file (include types.py definition)
    
    for fold, (train_idx, val_idx) in enumerate(...):
        model = Net(...)
        # optim = optim.Yogi(model, parameters(), lr=1e-2, betas=(0.9, 0.999), eps=1e-3, initial_accumulator=1e-6, weight_decay=0)
        optim = Yogi(model, parameters(), lr=1e-2, betas=(0.9, 0.999), eps=1e-3, initial_accumulator=1e-6, weight_decay=0)
        ...
    
    opened by sailfish009 5
  • Add torch_optimizer.get() method

    Add torch_optimizer.get() method

    The get method

    How would you feel about a function like this?

    It makes it very easy to change between optimizer to test some of them, for example

    import argparse
    import torch_optimizer as optim
    
    parser = argparse.ArgumentParser()
    parser.add_argument('--optimizer', default='AccSGD')
    parser.add_argument('--lr', type=float, default=1e-3)
    
    if __name__ == '__main__':
        args = parser.parse_args()
        opt_class = optim.get(args.optimizer)
        optimizer = opt_class(model.parameters(), lr=args.lr, **kwargs)
    

    I can be improved and restricted as well. If it would be me, I'd make aliases, and probably search globals only for things in __all__ and their aliases, and make the search case-insensitive.

    Tell me if you'd be interested.

    Drop-in replacement for torch.optim

    I would also import all optimizers from torch.optim directly, so that Adam could also be imported from here. If both these things are adopted, it would be easier than ever to compare between Adam, Radam, SGD and AccSGD for exampe. As simple as

    python train.py --optimizer adam
    python train.py --optimizer radam
    python train.py --optimizer sgd
    python train.py --optimizer accsgd
    

    With only one import (torch_optimizer) and no if statements.

    opened by mpariente 5
  • Add Rangers

    Add Rangers

    As discussed in #64, just added the three Rangers as a dependency.

    For now, the params can't be tested because the error messages are not the same. Also, I could not make the beale test work for Ranger.

    This is just a draft, feel free to push to my fork if you want to change the PR.

    Regarding the docstrings and types, I wonder if it wouldn't be easier to sublass the optimizers here so that we can use the object in types.py. Or should I copy them there?

    opened by mpariente 5
  • YOGI Initialization

    YOGI Initialization

    exp_avg_sq Initialization

    "Thus, for YOGI, we propose to initialize the vt based on gradient square evaluated at the initial point averaged over a (reasonably large) mini-batch."

    The initial exp_avg_sq should be initialized to the gradient square.

    exp_avg Initialization

    image

    The YOGI optimizer exp_avg should be initialized to zero instead of initial_accumulator based on m0 above.

    opened by PetrochukM 5
  • Add baseline visualizations

    Add baseline visualizations

    I love the illustrations, but I find the absence of any kind of baselines a shame. It'd be nice to see how Adam or SGD do on the example functions and compare them with some of the more fancy optimizers.

    Would this be possible?

    I can probably run the required experiments myself, if there are no problems.

    opened by slckl 5
  • Add batch invariant optimizer AdaScale?

    Add batch invariant optimizer AdaScale?

    AdaScale was introduced in this paper: https://openreview.net/forum?id=rygxdA4YPS

    The paper showed that the proposed optimizer is able to get the same results across five tasks regardless of the batch size (testing from 32 to 32.8k). Here's the graphic: image

    It'd be nice to have a batch invariant optimizer included.

    opened by PetrochukM 5
  • Bump sphinx from 4.2.0 to 6.1.2

    Bump sphinx from 4.2.0 to 6.1.2

    Bumps sphinx from 4.2.0 to 6.1.2.

    Release notes

    Sourced from sphinx's releases.

    v6.1.2

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v6.1.1

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v6.1.0

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v6.0.1

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v6.0.0

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v6.0.0b2

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v6.0.0b1

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v5.3.0

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v5.2.3

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v5.2.2

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v5.2.1

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v5.2.0

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v5.1.1

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v5.1.0

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v5.0.2

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v5.0.1

    Changelog: https://www.sphinx-doc.org/en/master/changes.html

    v5.0.0

    No release notes provided.

    ... (truncated)

    Changelog

    Sourced from sphinx's changelog.

    Release 6.1.2 (released Jan 07, 2023)

    Bugs fixed

    • #11101: LaTeX: div.topic_padding key of sphinxsetup documented at 5.1.0 was implemented with name topic_padding

    • #11099: LaTeX: shadowrule key of sphinxsetup causes PDF build to crash since Sphinx 5.1.0

    • #11096: LaTeX: shadowsize key of sphinxsetup causes PDF build to crash since Sphinx 5.1.0

    • #11095: LaTeX: shadow of :dudir:topic and contents_ boxes not in page margin since Sphinx 5.1.0

      .. _contents: https://docutils.sourceforge.io/docs/ref/rst/directives.html#table-of-contents

    • #11100: Fix copying images when running under parallel mode.

    Release 6.1.1 (released Jan 05, 2023)

    Bugs fixed

    • #11091: Fix util.nodes.apply_source_workaround for literal_block nodes with no source information in the node or the node's parents.

    Release 6.1.0 (released Jan 05, 2023)

    Dependencies

    Incompatible changes

    • #10979: gettext: Removed support for pluralisation in get_translation. This was unused and complicated other changes to sphinx.locale.

    Deprecated

    • sphinx.util functions:

      • Renamed sphinx.util.typing.stringify() to sphinx.util.typing.stringify_annotation()

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.21.3 to 1.24.1

    Bump numpy from 1.21.3 to 1.24.1

    Bumps numpy from 1.21.3 to 1.24.1.

    Release notes

    Sourced from numpy's releases.

    v1.24.1

    NumPy 1.24.1 Release Notes

    NumPy 1.24.1 is a maintenance release that fixes bugs and regressions discovered after the 1.24.0 release. The Python versions supported by this release are 3.8-3.11.

    Contributors

    A total of 12 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

    • Andrew Nelson
    • Ben Greiner +
    • Charles Harris
    • Clément Robert
    • Matteo Raso
    • Matti Picus
    • Melissa Weber Mendonça
    • Miles Cranmer
    • Ralf Gommers
    • Rohit Goswami
    • Sayed Adel
    • Sebastian Berg

    Pull requests merged

    A total of 18 pull requests were merged for this release.

    • #22820: BLD: add workaround in setup.py for newer setuptools
    • #22830: BLD: CIRRUS_TAG redux
    • #22831: DOC: fix a couple typos in 1.23 notes
    • #22832: BUG: Fix refcounting errors found using pytest-leaks
    • #22834: BUG, SIMD: Fix invalid value encountered in several ufuncs
    • #22837: TST: ignore more np.distutils.log imports
    • #22839: BUG: Do not use getdata() in np.ma.masked_invalid
    • #22847: BUG: Ensure correct behavior for rows ending in delimiter in...
    • #22848: BUG, SIMD: Fix the bitmask of the boolean comparison
    • #22857: BLD: Help raspian arm + clang 13 about __builtin_mul_overflow
    • #22858: API: Ensure a full mask is returned for masked_invalid
    • #22866: BUG: Polynomials now copy properly (#22669)
    • #22867: BUG, SIMD: Fix memory overlap in ufunc comparison loops
    • #22868: BUG: Fortify string casts against floating point warnings
    • #22875: TST: Ignore nan-warnings in randomized out tests
    • #22883: MAINT: restore npymath implementations needed for freebsd
    • #22884: BUG: Fix integer overflow in in1d for mixed integer dtypes #22877
    • #22887: BUG: Use whole file for encoding checks with charset_normalizer.

    Checksums

    ... (truncated)

    Commits
    • a28f4f2 Merge pull request #22888 from charris/prepare-1.24.1-release
    • f8fea39 REL: Prepare for the NumPY 1.24.1 release.
    • 6f491e0 Merge pull request #22887 from charris/backport-22872
    • 48f5fe4 BUG: Use whole file for encoding checks with charset_normalizer [f2py] (#22...
    • 0f3484a Merge pull request #22883 from charris/backport-22882
    • 002c60d Merge pull request #22884 from charris/backport-22878
    • 38ef9ce BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 (#22878)
    • bb00c68 MAINT: restore npymath implementations needed for freebsd
    • 64e09c3 Merge pull request #22875 from charris/backport-22869
    • dc7bac6 TST: Ignore nan-warnings in randomized out tests
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
  • Bump wheel from 0.37.0 to 0.38.4

    Bump wheel from 0.37.0 to 0.38.4

    Bumps wheel from 0.37.0 to 0.38.4.

    Changelog

    Sourced from wheel's changelog.

    Release Notes

    UNRELEASED

    • Updated vendored packaging to 22.0

    0.38.4 (2022-11-09)

    • Fixed PKG-INFO conversion in bdist_wheel mangling UTF-8 header values in METADATA (PR by Anderson Bravalheri)

    0.38.3 (2022-11-08)

    • Fixed install failure when used with --no-binary, reported on Ubuntu 20.04, by removing setup_requires from setup.cfg

    0.38.2 (2022-11-05)

    • Fixed regression introduced in v0.38.1 which broke parsing of wheel file names with multiple platform tags

    0.38.1 (2022-11-04)

    • Removed install dependency on setuptools
    • The future-proof fix in 0.36.0 for converting PyPy's SOABI into a abi tag was faulty. Fixed so that future changes in the SOABI will not change the tag.

    0.38.0 (2022-10-21)

    • Dropped support for Python < 3.7
    • Updated vendored packaging to 21.3
    • Replaced all uses of distutils with setuptools
    • The handling of license_files (including glob patterns and default values) is now delegated to setuptools>=57.0.0 (#466). The package dependencies were updated to reflect this change.
    • Fixed potential DoS attack via the WHEEL_INFO_RE regular expression
    • Fixed ValueError: ZIP does not support timestamps before 1980 when using SOURCE_DATE_EPOCH=0 or when on-disk timestamps are earlier than 1980-01-01. Such timestamps are now changed to the minimum value before packaging.

    0.37.1 (2021-12-22)

    • Fixed wheel pack duplicating the WHEEL contents when the build number has changed (#415)
    • Fixed parsing of file names containing commas in RECORD (PR by Hood Chatham)

    0.37.0 (2021-08-09)

    • Added official Python 3.10 support
    • Updated vendored packaging library to v20.9

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
  • Bump isort from 5.9.3 to 5.11.4

    Bump isort from 5.9.3 to 5.11.4

    Bumps isort from 5.9.3 to 5.11.4.

    Release notes

    Sourced from isort's releases.

    5.11.4

    Changes

    :package: Dependencies

    5.11.3

    Changes

    :beetle: Fixes

    :construction_worker: Continuous Integration

    v5.11.3

    Changes

    :beetle: Fixes

    :construction_worker: Continuous Integration

    5.11.2

    Changes

    5.11.1

    Changes December 12 2022

    ... (truncated)

    Changelog

    Sourced from isort's changelog.

    5.11.4 December 21 2022

    5.11.3 December 16 2022

    5.11.2 December 12 2022

    5.11.1 December 12 2022

    5.11.0 December 12 2022

    ... (truncated)

    Commits
    • 98390f5 Merge pull request #2059 from PyCQA/version/5.11.4
    • df69a05 Bump version 5.11.4
    • f9add58 Merge pull request #2058 from PyCQA/deps/poetry-1.3.1
    • 36caa91 Bump Poetry 1.3.1
    • 3c2e2d0 Merge pull request #1978 from mgorny/toml-test
    • 45d6abd Remove obsolete toml import from the test suite
    • 3020e0b Merge pull request #2057 from mgorny/poetry-install
    • a6fdbfd Stop installing documentation files to top-level site-packages
    • ff306f8 Fix tag template to match old standard
    • 227c4ae Merge pull request #2052 from hugovk/main
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
  • Bump black from 21.9b0 to 23.1a1

    Bump black from 21.9b0 to 23.1a1

    Bumps black from 21.9b0 to 23.1a1.

    Release notes

    Sourced from black's releases.

    23.1a1

    This release provides a preview of Black's 2023 stable style. Black's default formatting style includes the following changes:

    • Enforce empty lines before classes and functions with sticky leading comments (#3302) (22.12.0)
    • Reformat empty and whitespace-only files as either an empty file (if no newline is present) or as a single newline character (if a newline is present) (#3348) (22.12.0)
    • Implicitly concatenated strings used as function args are now wrapped inside parentheses (#3307) (22.12.0)
    • Correctly handle trailing commas that are inside a line's leading non-nested parens (#3370) (22.12.0)
    • --skip-string-normalization / -S now prevents docstring prefixes from being normalized as expected (#3168) (since 22.8.0)
    • When using --skip-magic-trailing-comma or -C, trailing commas are stripped from subscript expressions with more than 1 element (#3209) (22.8.0)
    • Implicitly concatenated strings inside a list, set, or tuple are now wrapped inside parentheses (#3162) (22.8.0)
    • Fix a string merging/split issue when a comment is present in the middle of implicitly concatenated strings on its own line (#3227) (22.8.0)
    • Docstring quotes are no longer moved if it would violate the line length limit (#3044, #3430) (22.6.0)
    • Parentheses around return annotations are now managed (#2990) (22.6.0)
    • Remove unnecessary parentheses around awaited objects (#2991) (22.6.0)
    • Remove unnecessary parentheses in with statements (#2926) (22.6.0)
    • Remove trailing newlines after code block open (#3035) (22.6.0)
    • Code cell separators #%% are now standardised to # %% (#2919) (22.3.0)
    • Remove unnecessary parentheses from except statements (#2939) (22.3.0)
    • Remove unnecessary parentheses from tuple unpacking in for loops (#2945) (22.3.0)
    • Avoid magic-trailing-comma in single-element subscripts (#2942) (22.3.0)

    Please try it out and give feedback here: psf/black#3407

    A stable 23.1.0 release will follow in January 2023.

    22.12.0

    Preview style

    • Enforce empty lines before classes and functions with sticky leading comments (#3302)
    • Reformat empty and whitespace-only files as either an empty file (if no newline is present) or as a single newline character (if a newline is present) (#3348)
    • Implicitly concatenated strings used as function args are now wrapped inside parentheses (#3307)
    • Correctly handle trailing commas that are inside a line's leading non-nested parens (#3370)

    Configuration

    ... (truncated)

    Changelog

    Sourced from black's changelog.

    Change Log

    Unreleased

    Highlights

    Stable style

    • Fix a crash when a colon line is marked between # fmt: off and # fmt: on (#3439)

    Preview style

    • Fix a crash in preview style with assert + parenthesized string (#3415)
    • Fix crashes in preview style with walrus operators used in function return annotations and except clauses (#3423)
    • Do not put the closing quotes in a docstring on a separate line, even if the line is too long (#3430)
    • Long values in dict literals are now wrapped in parentheses; correspondingly unnecessary parentheses around short values in dict literals are now removed; long string lambda values are now wrapped in parentheses (#3440)

    Configuration

    Packaging

    • Upgrade mypyc from 0.971 to 0.991 so mypycified Black can be built on armv7 (#3380)
    • Drop specific support for the tomli requirement on 3.11 alpha releases, working around a bug that would cause the requirement not to be installed on any non-final Python releases (#3448)

    Parser

    Performance

    Output

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
  • Bump torch from 1.10.0 to 1.13.1

    Bump torch from 1.10.0 to 1.13.1

    Bumps torch from 1.10.0 to 1.13.1.

    Release notes

    Sourced from torch's releases.

    PyTorch 1.13.1 Release, small bug fix release

    This release is meant to fix the following issues (regressions / silent correctness):

    • RuntimeError by torch.nn.modules.activation.MultiheadAttention with bias=False and batch_first=True #88669
    • Installation via pip on Amazon Linux 2, regression #88869
    • Installation using poetry on Mac M1, failure #88049
    • Missing masked tensor documentation #89734
    • torch.jit.annotations.parse_type_line is not safe (command injection) #88868
    • Use the Python frame safely in _pythonCallstack #88993
    • Double-backward with full_backward_hook causes RuntimeError #88312
    • Fix logical error in get_default_qat_qconfig #88876
    • Fix cuda/cpu check on NoneType and unit test #88854 and #88970
    • Onnx ATen Fallback for BUILD_CAFFE2=0 for ONNX-only ops #88504
    • Onnx operator_export_type on the new registry #87735
    • torchrun AttributeError caused by file_based_local_timer on Windows #85427

    The release tracker should contain all relevant pull requests related to this release as well as links to related issues

    PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available

    Pytorch 1.13 Release Notes

    • Highlights
    • Backwards Incompatible Changes
    • New Features
    • Improvements
    • Performance
    • Documentation
    • Developers

    Highlights

    We are excited to announce the release of PyTorch 1.13! This includes stable versions of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed migration of CUDA 11.6 and 11.7. Beta includes improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, being included in-tree with the PyTorch release. This release is composed of over 3,749 commits and 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.

    Summary:

    • The BetterTransformer feature set supports fastpath execution for common Transformer models during Inference out-of-the-box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models and Nested Tensors is now enabled by default.

    • Timely deprecating older CUDA versions allows us to proceed with introducing the latest CUDA version as they are introduced by Nvidia®, and hence allows support for C++17 in PyTorch and new NVIDIA Open GPU Kernel Modules.

    • Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, a user will be able to import functorch and use functorch without needing to install another package.

    • PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.

    Stable Beta Prototype
    Better TransformerCUDA 10.2 and 11.3 CI/CD Deprecation Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIsExtend NNC to support channels last and bf16Functorch now in PyTorch Core LibraryBeta Support for M1 devices Arm® Compute Library backend support for AWS Graviton CUDA Sanitizer

    You can check the blogpost that shows the new features here.

    Backwards Incompatible changes

    ... (truncated)

    Changelog

    Sourced from torch's changelog.

    Releasing PyTorch

    General Overview

    Releasing a new version of PyTorch generally entails 3 major steps:

    1. Cutting a release branch preparations
    2. Cutting a release branch and making release branch specific changes
    3. Drafting RCs (Release Candidates), and merging cherry picks
    4. Promoting RCs to stable and performing release day tasks

    Cutting a release branch preparations

    Following Requirements needs to be met prior to final RC Cut:

    • Resolve all outstanding issues in the milestones(for example 1.11.0)before first RC cut is completed. After RC cut is completed following script should be executed from builder repo in order to validate the presence of the fixes in the release branch :

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
Releases(v0.3.0)
  • v0.3.0(Oct 31, 2021)

  • v0.2.0(Oct 26, 2021)

    Changes

    • Drop RAdam optimizer since it is included in pytorch.
    • Do not include tests as installable package.
    • Preserver memory layout where possible.
    • Add MADGRAD optimizer.
    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Jan 1, 2021)

    Changes

    • Initial release.
    • Added support for A2GradExp, A2GradInc, A2GradUni, AccSGD, AdaBelief, AdaBound, AdaMod, Adafactor, Adahessian, AdamP, AggMo, Apollo, DiffGrad, Lamb, Lookahead, NovoGrad, PID, QHAdam, QHM, RAdam, Ranger, RangerQH, RangerVA, SGDP, SGDW, SWATS, Shampoo, Yogi.
    Source code(tar.gz)
    Source code(zip)
  • v0.0.1a17(Nov 27, 2020)

    Changes

    4357dcf Better deploy workflow 9dfd2fc Add deploy workflow (#230) 31f60c9 Move to GitHub actions (#228) 7276b69 More test coverage for params validation. Add weight decay validaiotn for SWATS (#225) 2e013a0 Better test of lr value validation. ccc920d Bump sphinx from 3.3.0 to 3.3.1 788d1e8 Bump matplotlib from 3.3.2 to 3.3.3 3adf578 Add optimizer selection warning (#222) 5b2b59e Bump numpy from 1.19.3 to 1.19.4 9123897 Bump sphinx from 3.2.1 to 3.3.0 cc8acc8 Add apollo optimizer to the readme (#218) 1dcec1d Add apollo optimizer test 03e6dad Add apollo optimizer implemenation 15cc6da (add-apollo-optimizer) Merge branch 'master' of github.com:jettify/pytorch-optimizer ce7b51c Bump mypy from 0.782 to 0.790 3da1672 Bump numpy from 1.19.2 to 1.19.3 (#216) 6a1f550 Bump pytest from 6.1.1 to 6.1.2 3327ebb Bump torchvision from 0.7.0 to 0.8.1 40a7ba5 Bump torch from 1.6.0 to 1.7.0 a61c8ad Correct links to the A2Grad paper (#211) 8ce32cf bump version

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1a16(Oct 20, 2020)

    Changes

    efeea8f Add adabelief to the readme. (#210) 9c72aa0 Add adabelief optimizer (#209) 0d94e4e Update CONTRIBUTING.rst (#207) 3a4abcd Bump sphinx-autodoc-typehints from 1.11.0 to 1.11.1 f08f793 Update readme with a2grad optimizers (#204) 9003a68 Add A2GradInc and A2GradExp optimizers. (#203) d221899 Bump hyperopt from 0.2.4 to 0.2.5 35f14f6 Bump ipdb from 0.13.3 to 0.13.4 1005b6b Bump flake8 from 3.8.3 to 3.8.4 9ad5102 Bump pytest from 6.1.0 to 6.1.1 bd71c05 Add a2grad optimizer (#199) ba60ddb Bump pytest from 6.0.2 to 6.1.0 (#197) 1e142d4 Bump matplotlib from 3.3.1 to 3.3.2 02f0d90 Bump pytest from 6.0.1 to 6.0.2 1ba47a8 Bump numpy from 1.19.1 to 1.19.2 be69ae7 Add adafactor test case. (#192) db002c6 Merge pull request #191 from matech96/patch-1 7edf138 yogi doc fix a2905c5 Merge pull request #190 from jettify/update-dock-with-new-optimizers d31dbc9 (origin/update-dock-with-new-optimizers, update-dock-with-new-optimizers) Update docs with new optimizers. 2308555 Merge pull request #189 from jettify/add-adafactor-optimizer ae0118b (origin/add-adafactor-optimizer, add-adafactor-optimizer) Add adafactor to the readme. f7237ee Add adafactor tests dcaa63f Add adafactor implementation b6fdd4e Merge pull request #186 from jettify/fix-warning-in-sgdp-and-adamp e1c2d2d (origin/fix-warning-in-sgdp-and-adamp, fix-warning-in-sgdp-and-adamp) Fix warnign in adamp 7eaa2f3 Fix warnings in sgdp 75d625a Fix warnings in adabound (#184) 30ef850 Bump black from 19.10b0 to 20.8b1 a149682 Fix warnings in qhadam (#182) 8787b6b Add swats to the readme (#181) c7bb0fd Add SWATS optimizer (#178) 5835261 Bump pytest-cov from 2.10.0 to 2.10.1 519473d Bump sphinx from 3.2.0 to 3.2.1 1b705d0 Fix missed warnings in case weight_decay specified (#177) 17c754e Fix warnings in shampoo (#176) 13db710 Fix warnings in accsgd (#175) 51ffdc6 Bump matplotlib from 3.3.0 to 3.3.1 913543a Fix warnings in yogi optimizer. (#173) ce54a95 Fix warnings in adamod (#172) a144f4b bump version

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1a15(Aug 11, 2020)

    Changes

    279d420 Bump sphinx from 3.1.2 to 3.2.0 e345295 Fix warnings in sgdw (#168) 077c72d Fix warnings in qhm (#167) b6d87b2 Fix warning in radam (#166) 03ee2a4 Fix warning in novograd optimizer. (#165) 19026f5 Resolve warnings in pid optimizer (#164) 92ac42e Fix warnings in lamb optimizer (#163) 0df1686 Fix pytorch 1.6.0 compatibility (#162) 6a74cee Reformat code and sort imports (#160) 33a8bbe Add aggmo to the readme (#159) b4cc233 Bump pytest from 6.0.0 to 6.0.1 a48e955 More robust setup.py and bump numpy version (#157) a2749e2 Bump torchvision from 0.6.1 to 0.7.0 b20d9b3 Bump pytest from 5.4.3 to 6.0.0 a545011 Add aggmo optimizer (#153) 1552465 Bump matplotlib from 3.2.2 to 3.3.0 72d4e35 Bump dev version

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1a14(Jul 13, 2020)

    e90a185 Bump version e21b422 Bump sphinx from 3.1.1 to 3.1.2 1b70e4e Fix numpy version (#148) 9a9d233 Add SGDP optimizer (#145) 8452433 Bump numpy from 1.18.5 to 1.19.0 40c0723 Bump ipython from 7.15.0 to 7.16.1 a891371 Bump ipdb from 0.13.2 to 0.13.3 8f5c382 Add AdamP optimizer (#133) cae9bde Bump mypy from 0.781 to 0.782 17da984 Bump sphinx-autodoc-typehints from 1.10.3 to 1.11.0 387581d Bump mypy from 0.780 to 0.781 a291360 Bump torchvision from 0.6.0 to 0.6.1 fd5badc Bump torch from 1.5.0 to 1.5.1 144e72e Bump matplotlib from 3.2.1 to 3.2.2

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1a13(Jun 17, 2020)

    Changes

    3c0dd77 Add documentation reference to the README. 4e5548e Bump pytest-cov from 2.9.0 to 2.10.0 861aa98 Bump sphinx from 3.1.0 to 3.1.1 219ae8a Bump flake8 from 3.8.2 to 3.8.3 81107ab Bump sphinx from 3.0.4 to 3.1.0 e3b059c Add lookagead optimizer to the README. (#128) 8ab662a Prepare lookahead optimizer for release. 29bd821 Bump ipython from 7.14.0 to 7.15.0 5f7e4e8 Bump mypy from 0.770 to 0.780 982a052 Bump numpy from 1.18.4 to 1.18.5 1e41844 Bump pytest from 5.4.2 to 5.4.3 96f7257 Add doc string and reformat code. da9e0f3 Fix linter. 8fda937 Add project URLs ba33395 Expose shampoo properly 6d194c1 Bump sphinx from 3.0.3 to 3.0.4 b3adae7 Refresh docs (#122) bfa8358 Bump flake8 from 3.8.1 to 3.8.2 2fd60a8 Bump pytest-cov from 2.8.1 to 2.9.0 542dd46 Add read the docs badge c77f808 Make tests less flaky (#119) 18742bf Bump flake8-quotes from 3.0.0 to 3.2.0 267b8c5 Add python3.8 to setup.py d14a85c Bump flake8 from 3.7.9 to 3.8.1 2d9da19 Update diffgrad.py (#118) ed84071 Bump pytest from 5.4.1 to 5.4.2 690ad22 Add Shampoo optimizer to the readme. (#114) 590b288 Bump ipython from 7.13.0 to 7.14.0 096b021 Bump numpy from 1.18.3 to 1.18.4 7875299 Add shampoo optimizer. (#110) fff339d Bump sphinx from 3.0.2 to 3.0.3 86664f5 Try py3.8 (#109) 87d9ccf Bump dev version

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1a12(Apr 26, 2020)

    Changes

    63393ae Bump hyperopt from 0.2.3 to 0.2.4 bfc46c4 Bump torchvision from 0.5.0 to 0.6.0 660d831 Bump torch from 1.4.0 to 1.5.0 f21c85a Bump numpy from 1.18.2 to 1.18.3 e8d4866 Bump sphinx from 3.0.1 to 3.0.2 2cfbf20 RAdam fix for issue #96. (#103) df65965 Bump sphinx from 3.0.0 to 3.0.1 19408f5 Return type hint fixed for torch_optimizer.get (#101) 243509b Bump sphinx from 2.4.4 to 3.0.0 0320faa Raise exception if optimizer not found. Fix few mypy types. (#98) 05b2da5 Bump dev version

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1a11(Apr 5, 2020)

    Changes

    e2425ec Rework get optimizer function (#97) 4376be1 Add torch_optimizer.get() method (#95) fd4f0a9 Add ranger optimizer to the readme (#94) f0e1f7c Add Rangers (#93) 40b832c Add keywords to setup.py af6d6e1 Add reference code links in doc strings 1bd385d Bump flake8-quotes from 2.1.1 to 3.0.0 4997fa3 Documentation update and README fixes. a7e330e Minor doc string cleanup and fix constant for QHM cb7bdb9 Add QHAdam to the readme (#91) 5155f0b Update docs with new optimizers. (#90) 3ef2b4f Bump matplotlib from 3.2.0 to 3.2.1 7aaaafa Better test coverage for QHAdam optimizer (#88) 7ab6b72 Add QHAdam optimizer (#87) ad59d3f Bump pytest from 5.4.0 to 5.4.1 a5e0fdc Bump version

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1a10(Mar 15, 2020)

    Changes

    71e9bb4 Add Python3.5 support (#85) 071a73e apply black 3447833 Bump pytest from 5.3.5 to 5.4.0 b2e587a Bump mypy from 0.761 to 0.770 64e8e48 Add QHM optimizer to the readme. (#82) 86c2587 Better test coverage for QHM (#81) a3609b4 Fix readme formatting 4b7d230 added Adam & SGD images to README and viz script (#80) 697fdb1 Add comments about yogi optimizer initialization #77 (#79) 95b0f91 Add deepsourde badge. fd57585 Minor style changes. 5bcc51f Bump version and minor tweaks in setup.py 7003d6c Less error prone parameter validation. (#78) abb84d3 Add QHM basic implementation. (#73) ea1d413 Bump sphinx from 2.4.3 to 2.4.4 6a2e8f8 Tweak linter configuration and address one issue in setup.py (#74) d6a52ae Bump matplotlib from 3.1.3 to 3.2.0 fa961b7 Add PID optimizer to the list of supported in README. (#70) c84d1ea Add .deepsource.toml 76484ff Bump ipdb from 0.13.1 to 0.13.2

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1a9(Mar 4, 2020)

    9b80b11 Better code coverage for PID optimizer (#68) a3f2fdc Change default values for yogi optimizer (#62) 564584e Add grad_clip to the example and re-tune p. for all methods (#67) 3391056 Add clamp_value (float) and debias (bool) parameters to LAMB optimizer (#65) 92d6818 Add PID optimizer (#66)

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1a8(Mar 2, 2020)

  • v0.0.1a7(Feb 27, 2020)

  • v0.0.1a6(Feb 22, 2020)

  • v0.0.1a5(Feb 15, 2020)

  • v0.0.1a4(Feb 11, 2020)

  • v0.0.1a3(Feb 9, 2020)

  • v0.0.1a2(Feb 3, 2020)

  • v0.0.1a1(Jan 22, 2020)

Owner
Nikolay Novik
Nikolay Novik
A pure Python implementation of Compact Bilinear Pooling and Count Sketch for PyTorch.

Compact Bilinear Pooling for PyTorch. This repository has a pure Python implementation of Compact Bilinear Pooling and Count Sketch for PyTorch. This

Grégoire Payen de La Garanderie 234 Dec 07, 2022
Riemannian Adaptive Optimization Methods with pytorch optim

geoopt Manifold aware pytorch.optim. Unofficial implementation for “Riemannian Adaptive Optimization Methods” ICLR2019 and more. Installation Make sur

642 Jan 03, 2023
OptNet: Differentiable Optimization as a Layer in Neural Networks

OptNet: Differentiable Optimization as a Layer in Neural Networks This repository is by Brandon Amos and J. Zico Kolter and contains the PyTorch sourc

CMU Locus Lab 428 Dec 24, 2022
higher is a pytorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual training steps.

higher is a library providing support for higher-order optimization, e.g. through unrolled first-order optimization loops, of "meta" aspects of these

Facebook Research 1.5k Jan 03, 2023
Code for paper "Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking"

model_based_energy_constrained_compression Code for paper "Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and

Haichuan Yang 16 Jun 15, 2022
This is an differentiable pytorch implementation of SIFT patch descriptor.

This is an differentiable pytorch implementation of SIFT patch descriptor. It is very slow for describing one patch, but quite fast for batch. It can

Dmytro Mishkin 150 Dec 24, 2022
PyGCL: Graph Contrastive Learning Library for PyTorch

PyGCL is an open-source library for graph contrastive learning (GCL), which features modularized GCL components from published papers, standardized evaluation, and experiment management.

GCL: Graph Contrastive Learning Library for PyTorch 592 Jan 07, 2023
pip install antialiased-cnns to improve stability and accuracy

Antialiased CNNs [Project Page] [Paper] [Talk] Making Convolutional Networks Shift-Invariant Again Richard Zhang. In ICML, 2019. Quick & easy start Ru

Adobe, Inc. 1.6k Dec 28, 2022
PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations

PyTorch Sparse This package consists of a small extension library of optimized sparse matrix operations with autograd support. This package currently

Matthias Fey 757 Jan 04, 2023
PyTorch Extension Library of Optimized Scatter Operations

PyTorch Scatter Documentation This package consists of a small extension library of highly optimized sparse update (scatter and segment) operations fo

Matthias Fey 1.2k Jan 07, 2023
Implements pytorch code for the Accelerated SGD algorithm.

AccSGD This is the code associated with Accelerated SGD algorithm used in the paper On the insufficiency of existing momentum schemes for Stochastic O

205 Jan 02, 2023
Pretrained EfficientNet, EfficientNet-Lite, MixNet, MobileNetV3 / V2, MNASNet A1 and B1, FBNet, Single-Path NAS

(Generic) EfficientNets for PyTorch A 'generic' implementation of EfficientNet, MixNet, MobileNetV3, etc. that covers most of the compute/parameter ef

Ross Wightman 1.5k Jan 01, 2023
An optimizer that trains as fast as Adam and as good as SGD.

AdaBound An optimizer that trains as fast as Adam and as good as SGD, for developing state-of-the-art deep learning models on a wide variety of popula

LoLo 2.9k Dec 27, 2022
On the Variance of the Adaptive Learning Rate and Beyond

RAdam On the Variance of the Adaptive Learning Rate and Beyond We are in an early-release beta. Expect some adventures and rough edges. Table of Conte

Liyuan Liu 2.5k Dec 27, 2022
Code snippets created for the PyTorch discussion board

PyTorch misc Collection of code snippets I've written for the PyTorch discussion board. All scripts were testes using the PyTorch 1.0 preview and torc

461 Dec 26, 2022
A simplified framework and utilities for PyTorch

Here is Poutyne. Poutyne is a simplified framework for PyTorch and handles much of the boilerplating code needed to train neural networks. Use Poutyne

GRAAL/GRAIL 534 Dec 17, 2022
PyTorch Implementation of [1611.06440] Pruning Convolutional Neural Networks for Resource Efficient Inference

PyTorch implementation of [1611.06440 Pruning Convolutional Neural Networks for Resource Efficient Inference] This demonstrates pruning a VGG16 based

Jacob Gildenblat 836 Dec 26, 2022
A PyTorch implementation of Learning to learn by gradient descent by gradient descent

Intro PyTorch implementation of Learning to learn by gradient descent by gradient descent. Run python main.py TODO Initial implementation Toy data LST

Ilya Kostrikov 300 Dec 11, 2022
A code copied from google-research which named motion-imitation was rewrited with PyTorch

motor-system Introduction A code copied from google-research which named motion-imitation was rewrited with PyTorch. More details can get from this pr

NewEra 6 Jan 08, 2022
270 Dec 24, 2022