Riemannian Adaptive Optimization Methods with PyTorch optim

Overview

geoopt


Manifold-aware pytorch.optim.

Unofficial implementation of “Riemannian Adaptive Optimization Methods” (ICLR 2019) and more.

Installation

Make sure you have pytorch>=1.9.0 installed.

There are two ways to install geoopt:

  1. GitHub (preferred so far, due to active development):
pip install git+https://github.com/geoopt/geoopt.git
  2. PyPI (this might be significantly behind the master branch):
pip install geoopt

The preferred way to install geoopt will change once the project reaches a stable stage. For now, PyPI lags behind master as we actively develop and implement new features.

PyTorch Support

Geoopt officially supports the two latest stable versions of pytorch upstream (1.9.0 so far) or the latest major release. We also test against the nightly build (TODO: there were complications with GitHub workflows; help is welcome), but we cannot guarantee 100% compatibility. Older pytorch versions may work, but use them at your own risk and do not forget to run the tests.

What is done so far

Work is in progress, but the library is already usable. Note that the API might change in future releases.

Tensors

  • geoopt.ManifoldTensor – the same as torch.Tensor, with an additional manifold keyword argument.
  • geoopt.ManifoldParameter – the same as above, but correctly subclassed so that it is recognized by torch.nn.Module.parameters.

Both containers have special methods for working with them as points on a manifold (see the sketch after this list):

  • .proj_() – in-place projection onto the manifold.
  • .proju(u) – project vector u onto the tangent space at this point. Vectors passed to all methods below must be projected this way first.
  • .egrad2rgrad(u) – convert the Euclidean gradient u into a Riemannian gradient.
  • .inner(u, v=None) – inner product of two tangent vectors at this point. The passed vectors are not projected; they are assumed to be tangent already.
  • .retr(u) – retraction map following vector u.
  • .expmap(u) – exponential map following vector u (if expmap is not available in closed form, the best approximation is used).
  • .transp(v, u) – transport vector v along direction u.
  • .retr_transp(v, u) – retract self along u and transport vector v (and possibly more vectors) along the way (the returned values are plain tensors).
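
A minimal sketch of these methods in action on the sphere, following the signatures listed above:

import torch
import geoopt

sphere = geoopt.Sphere()
# wrap a raw tensor and project it onto the unit sphere in place
x = geoopt.ManifoldTensor(torch.randn(10), manifold=sphere)
x.proj_()

u = x.proju(torch.randn(10))  # project an ambient vector onto the tangent space at x
ip = x.inner(u, u)            # Riemannian inner product of tangent vectors at x
y = x.retr(u)                 # move along u with the retraction; y stays on the sphere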

Manifolds

  • geoopt.Euclidean – unconstrained manifold in R^n with the Euclidean metric
  • geoopt.Stiefel – Stiefel manifold of matrices A in R^{n x p} : A^T A = I, n >= p
  • geoopt.Sphere – sphere manifold ||x|| = 1
  • geoopt.BirkhoffPolytope – manifold of doubly stochastic matrices
  • geoopt.Stereographic – constant curvature stereographic projection model
  • geoopt.SphereProjection – sphere stereographic projection model
  • geoopt.PoincareBall – Poincaré ball model
  • geoopt.Lorentz – hyperboloid model
  • geoopt.ProductManifold – product manifold constructor
  • geoopt.Scaled – scaled version of a manifold; similar to Learning Mixed-Curvature Representations in Product Spaces when combined with ProductManifold
  • geoopt.SymmetricPositiveDefinite – SPD matrix manifold
  • geoopt.UpperHalf – Siegel upper half-space manifold; supports Riemannian and Finsler metrics, as in Symmetric Spaces for Graph Embeddings: A Finsler-Riemannian Approach
  • geoopt.BoundedDomain – Siegel bounded domain manifold; supports Riemannian and Finsler metrics

All manifolds implement the methods needed to manipulate points on the manifold and their tangent vectors for general-purpose use. See the documentation for more.
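
Manifolds also compose. A hedged sketch of a mixed-curvature product space (the component dimensions and the learnable scale are illustrative, not prescribed):

import geoopt

ball = geoopt.PoincareBall()  # negative curvature component
sphere = geoopt.Sphere()      # positive curvature component
# points live in R^5: the first 2 coordinates on the ball, the last 3 on the sphere
product = geoopt.ProductManifold((ball, 2), (sphere, 3))
# a sphere with a learnable radius, per the Scaled entry above
scaled_sphere = geoopt.Scaled(sphere, learnable=True)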

Optimizers

  • geoopt.optim.RiemannianSGD – a subclass of torch.optim.SGD with the same API
  • geoopt.optim.RiemannianAdam – a subclass of torch.optim.Adam with the same API (usage sketched below)
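
A hedged usage sketch (the embedding shape, initialization scale, and learning rate are illustrative):

import torch
import geoopt

# a parameter constrained to the Poincaré ball
emb = geoopt.ManifoldParameter(
    torch.randn(100, 2) * 1e-2, manifold=geoopt.PoincareBall()
)
opt = geoopt.optim.RiemannianAdam([emb], lr=1e-2)

loss = emb.pow(2).sum()  # any differentiable loss works here
loss.backward()
opt.step()               # the Riemannian update keeps emb on the ball
opt.zero_grad()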

Samplers

  • geoopt.samplers.RSGLD – Riemannian Stochastic Gradient Langevin Dynamics
  • geoopt.samplers.RHMC – Riemannian Hamiltonian Monte-Carlo
  • geoopt.samplers.SGRHMC – Stochastic Gradient Riemannian Hamiltonian Monte-Carlo
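
A heavily hedged sketch of driving one of these samplers; the epsilon argument and the closure convention (return the log-probability) are assumptions to verify against the docs:

import torch
import geoopt

sphere = geoopt.Sphere()
x = geoopt.ManifoldParameter(torch.randn(3), manifold=sphere)
with torch.no_grad():
    x.proj_()

sampler = geoopt.samplers.RSGLD([x], epsilon=1e-3)

def logp():
    # unnormalized log-density concentrated near the north pole
    return x[2] * 5.0

samples = []
for _ in range(100):
    sampler.step(logp)
    samples.append(x.detach().clone())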

Citing Geoopt

If you find this project useful in your research, please cite it with the following BibTeX entry:

@misc{geoopt2020kochurov,
    title={Geoopt: Riemannian Optimization in PyTorch},
    author={Max Kochurov and Rasul Karimov and Serge Kozlukov},
    year={2020},
    eprint={2005.02819},
    archivePrefix={arXiv},
    primaryClass={cs.CG}
}
Comments
  • Line search

    I made a Riemannian line search optimizer with strong Wolfe conditions.

    It's not yet perfect: I think it makes some redundant calls to the closure during stepping, and when it's close to a local minimum it suffers from numerical errors and can sometimes take strange steps.
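
    For reference, these are the strong Wolfe conditions in the Riemannian setting (R_x is the retraction, \mathcal{T} the vector transport, and \langle \cdot, \cdot \rangle_x the Riemannian inner product; stated here for orientation, not taken from the PR):

    f(R_{x_k}(\alpha p_k)) \le f(x_k) + c_1 \alpha \langle \operatorname{grad} f(x_k), p_k \rangle_{x_k}
    \left| \langle \operatorname{grad} f(R_{x_k}(\alpha p_k)), \mathcal{T}_{\alpha p_k}(p_k) \rangle \right| \le c_2 \left| \langle \operatorname{grad} f(x_k), p_k \rangle_{x_k} \right|, \qquad 0 < c_1 < c_2 < 1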

    opened by RikVoorhaar 26
  • Poincaré ball model

    Hurray, we are ready to start.

    Interesting reading I've done so far:

    • Hyperbolic Networks
    • Hyperbolic Entailment Cones for Learning Hierarchical Embeddings
    • Poincaré GloVe: Hyperbolic Word Embeddings

    Some implementation takeaways (mostly from here; a sketch follows this list):

    • Project the results of all operations into the ball of radius 1 − eps, where eps = 10^{-5}
    • Numerical errors also appear when hyperbolic vectors get close to 0; perturb with eps = 10^{-15}
    • Clip the input of tanh to [-15, 15] and the input of tanh^{-1} to [-1 + eps, 1 - eps]
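
    A hedged sketch of those three tricks in plain pytorch (the constants come from the list above; the function names are illustrative, not geoopt's API):

    import torch

    BALL_EPS = 1e-5    # stay inside the ball of radius 1 - eps
    MIN_EPS = 1e-15    # perturbation scale near 0 / away from +-1

    def project(x):
        # trick 1: pull results of all operations back into the open ball
        norm = x.norm(dim=-1, keepdim=True).clamp_min(MIN_EPS)
        maxnorm = 1 - BALL_EPS
        return torch.where(norm > maxnorm, x / norm * maxnorm, x)

    def safe_tanh(x):
        # trick 3: clip the input of tanh to [-15, 15]
        return torch.tanh(x.clamp(-15, 15))

    def artanh(x):
        # trick 3: clip the input of tanh^{-1} away from +-1
        return torch.atanh(x.clamp(-1 + MIN_EPS, 1 - MIN_EPS))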

    CC @leuchine!

    opened by ferrine 23
  • StereographicProductManifold to use gyrovector space functions in product manifolds

    Hi,

    I subclassed ProductManifold and created a StereographicProductManifold, where I added the functions dist2plane, expmap0, and mobius_add by calling the respective functions of the underlying Stereographic manifolds, and I added some test functions for this. Additionally, I added wrapped_normal to Stereographic as an alternative random function, and I added scipy to the requirements in setup.py, as mentioned in #161.

    opened by gatoniel 20
  • $c$ counter-intuitively stands for **negative** curvature

    Currently the c parameter in PoincareBall denotes negative curvature, though it is most natural to read c as the curvature itself. That is unconventional and misleading, and is likely to cause inconsistencies later on.
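
    A one-line illustration of the convention in question (hedged; this follows the current geoopt parameterization):

    import geoopt

    ball = geoopt.PoincareBall(c=1.0)  # c = 1 corresponds to sectional curvature K = -1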

    opened by SomeoneSerge 17
  • How to properly use ManifoldParameter

    I currently use the following piece of code:

    if self.init_id:
        # identity initialization, reshaped to a 4D weight
        init = torch.eye(input_dims[0]).view((input_dims[0], input_dims[0], 1, 1))
    else:
        # random initialization -- not on the Stiefel manifold!
        init = torch.randn(input_dims[0], input_dims[0])  # + input_dims[0]*torch.eye(input_dims[0])
    self.orth_w = geoopt.ManifoldParameter(init, manifold=stiefel_man)
    

    but if init is random, it won't be on the Stiefel manifold! So I tried self.orth_w.proj_(), but it tells me:

    RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

    I can guard the .proj_() with torch.no_grad(), but I don't know whether this is the right approach.

    I just want to do a straightforward matrix multiplication on the forward pass, and I want geoopt to keep the parameter on the Stiefel manifold during training.
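
    For reference, the guard mentioned above would look like this; whether it is the sanctioned approach is exactly the question:

    with torch.no_grad():
        self.orth_w.proj_()  # in-place projection, hidden from autograd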

    opened by LeanderK 12
  • Hyperboloid

    There are some numerical issues, and I'm starting to think there is no easy solution for them at this point. Maybe something like this paper should be investigated more.

    • [x] Add curvature
    • [x] Add tests
    • [x] Add an example with hgnn/attention [example with basic usage]
    • [x] Add docs

    An example with optimization should be added in a separate PR

    opened by rrkarim 11
  • Not working properly with CUDA

    I am working with a model similar to the example below:

    import torch.nn as nn
    import geoopt as gt

    class Model(nn.Module):
        def __init__(self, word2vec):
            super(Model, self).__init__()
            # embedding table stored as a point on the Poincaré ball
            self.word_lut = gt.ManifoldParameter(word2vec, manifold=gt.PoincareBall())
    

    When I move the model to CUDA, and run it there is a problem during the optimization (See full traceback below [1]): RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'other'

    The reason for the error is that when I move the model to CUDA, the tensor in the ManifoldParameter is moved, but its manifold is not; therefore the tensor attribute self.c of the manifold is still allocated on the CPU.

    I solved it by doing this:

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    for p in model.parameters():
        if isinstance(p, (ManifoldParameter, ManifoldTensor)):
            p.manifold.to(device)  # move the manifold's own tensors as well
    

    I don't know whether issue #49 was an attempt to solve this and I didn't move the model to CUDA in the proper way, or whether this is simply not fully documented. If an explanation of how to do this properly exists and I couldn't find it, my apologies for creating this issue.

    [1] Full traceback

    Traceback (most recent call last):
      File "./train.py", line 122, in <module>
        main()
      File "./train.py", line 113, in main
        coach.train()
      File "/home/lopezfo/projects/hyfi/hyfi/Coach.py", line 43, in train
        train_loss = self.train_epoch(epoch)
      File "/home/lopezfo/projects/hyfi/hyfi/Coach.py", line 99, in train_epoch
        self.optim.step()
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/optim/radam.py", line 145, in step
        state["exp_avg_sq"],
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/optim/tracing.py", line 34, in partial
        step(manifold, *args, **kwargs)
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/optim/radam.py", line 191, in perform_step
        point, -step_size * direction, exp_avg
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/manifolds/poincare/__init__.py", line 110, in retr_transp
        y = self.retr(x, u)
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/manifolds/poincare/__init__.py", line 69, in retr
        return math.project(approx, c=self.c)
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/manifolds/poincare/math.py", line 78, in project
        return _project(x, c, dim, eps)
      File "/home/lopezfo/anaconda3/envs/hyfi/lib/python3.6/site-packages/geoopt/manifolds/poincare/math.py", line 86, in _project
        cond = norm > maxnorm
    RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'other'
    
    opened by fedelopez77 11
  • implementation of SPD manifolds

    I'm trying to implement the SPD manifold, which is widely used in many papers.

    The current commits implement all abstract methods of base.Manifold, while more method implementations are on the way. Some tests have been done locally, and construction of the testing module is also underway.

    Progress

    • [x] move symmetric matrix operations to batch_linalg
    • [x] keepdims functionality for inner and _stein_metric.
    • [x] mention in documentation about SPD manifolds.
    • [x] implementation for random and origin
    • [x] manifold test module.
      • [x] shape case for a symmetric positive-definite matrix.
      • [x] test basic operation of SPD manifolds.
      • [x] simple optimization problem on SPD manifolds.
    • [ ] mention the PR in the CHANGELOG.

    Reference Implementation

    Some paper using SPD Manifolds

    • Computationally Tractable Riemannian Manifolds for Graph Embeddings [arxiv]
    • A Riemannian Network for SPD Matrix Learning [arxiv]

    All suggestions are welcome. I hope this pull request can be merged once all the work is done.

    enhancement 
    opened by tao-harald 10
  • geoopt.optim.RiemannianSGD does not work with Distributed Data Parallel

    First of all, thank you for this library!

    Description of the bug

    When training with Distributed Data Parallel (DDP), gradients are not correctly synchronized across devices when using RiemannianSGD (or RiemannianAdam). Replacing it with a standard torch.optim.SGD works well. Note that with DDP the gradient is synchronized during .backward() (see this link).

    To Reproduce

    Simple code training on ImageNet:

    import os
    
    import geoopt
    import torch
    import torch.distributed
    import torch.multiprocessing as mp
    import torchvision
    import torchvision.models as models
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils import data
    from torchvision import transforms
    
    
    def process_ddp(master_port, local_rank, world_size):
    
        os.environ['MASTER_ADDR'] = 'localhost'
        os.environ['MASTER_PORT'] = str(master_port)
        torch.cuda.set_device(local_rank)
        device = torch.device("cuda", local_rank)
        torch.distributed.init_process_group("nccl", rank=local_rank, world_size=world_size, init_method='env://')
        assert world_size == torch.distributed.get_world_size()
    
        return device
    
    
    def main(local_rank, world_size):
    
        path_dataset = '/path/to/ImageNet'  # Any other dataset should result in a similar behavior
        master_port = 9999
    
        device = process_ddp(master_port, local_rank, world_size)
    
        model = models.resnet18()
        model = model.to(device)
    
        # optimizer = geoopt.optim.RiemannianSGD(model.parameters(), lr=0.1, stabilize=10)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    
        # Data parallelization
        model = DDP(model, device_ids=[local_rank], output_device=local_rank)
    
        # Prepare dataset
        transform = transforms.Compose([
            transforms.CenterCrop(size=256),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])
        dataset = torchvision.datasets.ImageNet(split='train', root=path_dataset, transform=transform)
        sampler = torch.utils.data.distributed.DistributedSampler(dataset, shuffle=True)
        data_loader = torch.utils.data.DataLoader(dataset, batch_size=4, sampler=sampler, shuffle=False, num_workers=8)
    
        # Train part of the first epoch
        model.train()
    
        for idx, (images, labels) in enumerate(data_loader):
            if idx >= 10:
                break
            images = images.to(device)
            labels = labels.to(device)
    
            with torch.set_grad_enabled(True):
                features = model(images)
                loss = torch.nn.functional.cross_entropy(features, labels)
    
            loss.backward()
            print(f'grad iteration {idx} on gpu {device}: {model.module.conv1.weight.grad.mean()}', flush=True)
            optimizer.step()
            optimizer.zero_grad()
            print(f'weight iteration {idx} on gpu {device}: {model.module.conv1.weight.mean()}', flush=True)
    
        # cleanup
        torch.distributed.destroy_process_group()
    
    
    if __name__ == '__main__':
        world_size_main = torch.cuda.device_count()
        mp.spawn(main,
                 args=(world_size_main,),
                 nprocs=world_size_main,
                 join=True)
    

    In order to run, use: CUDA_VISIBLE_DEVICES=0,1 python run.py

    Expected behavior

    The expected behavior is the one observed when the line optimizer = torch.optim.SGD(model.parameters(), lr=0.1) is uncommented and the line optimizer = geoopt.optim.RiemannianSGD(model.parameters(), lr=0.1, stabilize=10) is commented out. In that case, the output is:

    grad iteration 0 on gpu cuda:1: 0.011769304051995277
    grad iteration 0 on gpu cuda:0: 0.011769304051995277
    weight iteration 0 on gpu cuda:1: -0.001249525579623878
    weight iteration 0 on gpu cuda:0: -0.001249525579623878
    grad iteration 1 on gpu cuda:0: -0.015764284878969193
    grad iteration 1 on gpu cuda:1: -0.015764284878969193
    weight iteration 1 on gpu cuda:1: 0.0003269027511123568
    weight iteration 1 on gpu cuda:0: 0.0003269027511123568
    grad iteration 2 on gpu cuda:1: -0.006310341879725456
    grad iteration 2 on gpu cuda:0: -0.006310341879725456
    weight iteration 2 on gpu cuda:1: 0.000957937038037926
    weight iteration 2 on gpu cuda:0: 0.000957937038037926
    grad iteration 3 on gpu cuda:0: 0.0021547293290495872
    grad iteration 3 on gpu cuda:1: 0.0021547293290495872
    weight iteration 3 on gpu cuda:1: 0.000742464151699096
    weight iteration 3 on gpu cuda:0: 0.000742464151699096
    grad iteration 4 on gpu cuda:1: -0.002606849418953061
    grad iteration 4 on gpu cuda:0: -0.002606849418953061
    weight iteration 4 on gpu cuda:1: 0.001003148965537548
    weight iteration 4 on gpu cuda:0: 0.001003148965537548
    grad iteration 5 on gpu cuda:1: 0.00043087091762572527
    grad iteration 5 on gpu cuda:0: 0.00043087091762572527
    weight iteration 5 on gpu cuda:1: 0.0009600619086995721
    weight iteration 5 on gpu cuda:0: 0.0009600619086995721
    grad iteration 6 on gpu cuda:0: 0.00014396056940313429
    grad iteration 6 on gpu cuda:1: 0.00014396056940313429
    weight iteration 6 on gpu cuda:1: 0.0009456658735871315
    weight iteration 6 on gpu cuda:0: 0.0009456658735871315
    grad iteration 7 on gpu cuda:1: -0.002603260101750493
    grad iteration 7 on gpu cuda:0: -0.002603260101750493
    weight iteration 7 on gpu cuda:1: 0.001205991953611374
    weight iteration 7 on gpu cuda:0: 0.001205991953611374
    grad iteration 8 on gpu cuda:0: 0.000458348571555689
    grad iteration 8 on gpu cuda:1: 0.000458348571555689
    weight iteration 8 on gpu cuda:0: 0.0011601571459323168
    weight iteration 8 on gpu cuda:1: 0.0011601571459323168
    grad iteration 9 on gpu cuda:1: -0.0004215179360471666
    grad iteration 9 on gpu cuda:0: -0.0004215179360471666
    weight iteration 9 on gpu cuda:1: 0.0012023089220747352
    weight iteration 9 on gpu cuda:0: 0.0012023089220747352
    

    The gradients in the two GPUs are correctly synchronized. However, when using RiemannianSGD, the output is:

    grad iteration 0 on gpu cuda:1: 0.0035285688936710358
    grad iteration 0 on gpu cuda:0: 0.0035285688936710358
    weight iteration 0 on gpu cuda:0: -7.928906597953755e-06
    weight iteration 0 on gpu cuda:1: -7.928906597953755e-06
    grad iteration 1 on gpu cuda:0: -0.04444637894630432
    grad iteration 1 on gpu cuda:1: 0.002550020581111312
    weight iteration 1 on gpu cuda:0: 0.00018905977776739746
    weight iteration 1 on gpu cuda:1: -2.1470928913913667e-05
    grad iteration 2 on gpu cuda:0: 0.009863540530204773
    grad iteration 2 on gpu cuda:1: 0.0026304360944777727
    weight iteration 2 on gpu cuda:0: 0.00013691956701222807
    weight iteration 2 on gpu cuda:1: -3.7374120438471437e-05
    grad iteration 3 on gpu cuda:0: -0.0161017756909132
    grad iteration 3 on gpu cuda:1: -0.0023103044368326664
    weight iteration 3 on gpu cuda:0: 0.0002405263076070696
    weight iteration 3 on gpu cuda:1: -2.4383840354857966e-05
    grad iteration 4 on gpu cuda:0: 0.010763526894152164
    grad iteration 4 on gpu cuda:1: -0.017034146934747696
    weight iteration 4 on gpu cuda:0: 0.00017218326684087515
    weight iteration 4 on gpu cuda:1: 7.665574958082289e-05
    grad iteration 5 on gpu cuda:0: 0.008465449325740337
    weight iteration 5 on gpu cuda:0: 0.00012061965389875695
    grad iteration 5 on gpu cuda:1: 0.0011690922547131777
    weight iteration 5 on gpu cuda:1: 6.942617619642988e-05
    grad iteration 6 on gpu cuda:0: 0.0013559082290157676
    weight iteration 6 on gpu cuda:0: 0.00011242596519878134
    grad iteration 6 on gpu cuda:1: 0.0008932517375797033
    weight iteration 6 on gpu cuda:1: 6.43157254671678e-05
    grad iteration 7 on gpu cuda:0: 0.02651313878595829
    weight iteration 7 on gpu cuda:0: -1.233588955074083e-05
    grad iteration 7 on gpu cuda:1: -0.007853103801608086
    weight iteration 7 on gpu cuda:1: 0.00010782096069306135
    grad iteration 8 on gpu cuda:0: 0.009321866557002068
    weight iteration 8 on gpu cuda:0: -7.130965968826786e-05
    grad iteration 8 on gpu cuda:1: -0.0039948648773133755
    grad iteration 9 on gpu cuda:0: -0.0119229881092906
    weight iteration 8 on gpu cuda:1: 0.00013168319128453732
    weight iteration 9 on gpu cuda:0: -1.202533780997328e-06
    grad iteration 9 on gpu cuda:1: 0.002446404891088605
    weight iteration 9 on gpu cuda:1: 0.00011651107342913747
    

    There is some problem with the gradient synchronization, which causes the weights on the two devices to diverge.
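
    A heavily hedged fallback sketch (not from the issue; it reuses the names from the script above): average the gradients by hand before the Riemannian step, bypassing DDP's hook-based synchronization entirely:

    import torch.distributed as dist

    loss.backward()
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            # average this parameter's gradient across all processes
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
    optimizer.zero_grad()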

    Library version information:

    • python -c 'import torch;print("torch:", torch.version.__version__, end=" ");print("cuda:", torch.version.cuda)' reports: torch: 1.8.1 cuda: 11.1

    • The way you installed geoopt: pip

    • OS: Ubuntu 18.04.5 LTS

    EDIT: I simplified the code a little by removing mixed precision.

    bug 
    opened by surisdi 9
  • Regression to points in the Poincaré Disk Model

    Hi,

    I'm building a neural network in pytorch that has to learn a regression from vectors in Euclidean space to vectors in the Poincaré disk model, so I think RiemannianSGD could be a good choice as the optimizer.

    I'm trying to use the library, but I have some questions:

    • When and how do I have to cast tensors? Right now I transform the target tensors with Y = ManifoldTensor(Y, manifold=geoopt.manifolds.PoincareBall()). I think the prediction also has to be a ManifoldTensor, but if I transform the prediction outside the model, the ManifoldTensor does not keep the grad value

    • Do I have to build each layer of the network with ManifoldTensor, or can torch.nn.Linear be used?

    • Do I have to tell the RiemannianSGD optimizer which manifold to use? From the documentation, I gather that I don't

    Anyway, thanks for your work on this project ;-)

    opened by NooneBug 8
  • Error on import geoopt

    Hi,

    I recently updated the pytorch library in my environment, so I installed geoopt again via: pip install git+https://github.com/geoopt/geoopt.git

    When I try to import the library, this is the error reported to me:

    
      import geoopt
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/__init__.py", line 1, in <module>
        from . import manifolds
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/manifolds/__init__.py", line 3, in <module>
        from .stiefel import Stiefel, EuclideanStiefel, CanonicalStiefel, EuclideanStiefelExact
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/manifolds/stiefel.py", line 4, in <module>
        from .. import linalg
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/linalg/__init__.py", line 1, in <module>
        from .batch_linalg import svd, qr, sym, extract_diag, matrix_rank, expm, block_matrix
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/linalg/batch_linalg.py", line 3, in <module>
        from . import _expm
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/geoopt/linalg/_expm.py", line 8, in <module>
        @torch.jit.script
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/torch/jit/__init__.py", line 364, in script
        graph = _script_graph(fn, _frames_up=_frames_up + 1)
      File "/home/vmanuel/.conda/envs/MTNCI/lib/python3.6/site-packages/torch/jit/__init__.py", line 360, in _script_graph
        return _jit_script_compile(ast, rcb)
    RuntimeError: 
    builtin cannot be used as a value:
            33522128640.0,
            1323241920.0,
            40840800.0,
            960960.0,
            16380.0,
            182.0,
            1.0,
        )
    
        ident = torch.eye(A.shape[1], dtype=A.dtype, device=A.device)
                          ~~~~~~~ <--- HERE
        A2 = torch.matmul(A, A)
        A4 = torch.matmul(A2, A2)
        A6 = torch.matmul(A4, A2)
        U = torch.matmul(
            A,
            torch.matmul(A6, b13 * A6 + b11 * A4 + b9 * A2)
            + b7 * A6
            + b5 * A4
            + b3 * A2
    

    Can you help me?

    opened by NooneBug 7
  • `_dist2plane` triggers codegen Warning

    Describe the bug / To reproduce: the following warning appears when I use Distance2PoincareHyperplanes as the classifier for torchvision.models.efficientnet_v2_s.

    ~/miniconda3/envs/a5000/lib/python3.10/site-packages/geoopt/manifolds/stereographic/math.py:1562: UserWarning: operator() profile_node %301 : int = prim::profile_ivalue(%299)
     does not have profile information (Triggered internally at  /opt/conda/conda-bld/pytorch_1656352645774/work/torch/csrc/jit/codegen/cuda/graph_fuser.cpp:104.)
      return _dist2plane(
    ~/miniconda3/envs/a5000/lib/python3.10/site-packages/geoopt/manifolds/stereographic/math.py:1562: UserWarning: FALLBACK path has been taken inside: compileCudaFusionGroup. This is an indication that codegen Failed for some reason.
    To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
    To report the issue, try enable logging via setting the envvariable ` export PYTORCH_JIT_LOG_LEVEL=manager.cpp`
     (Triggered internally at  /opt/conda/conda-bld/pytorch_1656352645774/work/torch/csrc/jit/codegen/cuda/manager.cpp:237.)
      return _dist2plane(
    

    Expected behavior: the warning appears during model training.

    Please complete the following information:

    • torch: 1.12.0, cuda: 11.3, geoopt 0.4.1
    • The way you installed geoopt: pip
    • OS: Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-109-generic x86_64)

    Additional context: none.

    bug 
    opened by jin-zhe 2
  • Manifold projection fails

    Describe the bug: with certain matrices, projx under the Lorentz manifold fails _check_point_on_manifold.

    To reproduce:

    • I provided an example problem tensor for this case
    • Call projx on the tensor with a k of 5.0
    • Check if the result is on the manifold
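
    A hedged reproduction sketch following these steps (the actual problem tensor is attached to the issue; a random stand-in is used here):

    import torch
    import geoopt

    man = geoopt.Lorentz(k=5.0)
    x = torch.randn(10, 4, dtype=torch.float64)  # stand-in for the attached tensor
    y = man.projx(x)
    ok, reason = man.check_point_on_manifold(y, explain=True)
    print(ok, reason)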

    Expected behavior: with float64 and a reasonable error tolerance, the check on the manifold should return True.


    bug 
    opened by inboxedshoe 4
  • Add mobius methods for Lorentz model

    Thanks for your useful library. However, I found that the Lorentz model does not implement any Möbius methods, such as mobius_add() or mobius_matvec(). I hope you will add these methods to the library soon.

    enhancement 
    opened by IceIce1ce 1
  • Extended Lorentz

    following #142

    • [ ] mobius_add

    • [ ] mobius_matvec

    • [ ] mobius_scalar_mul

    • [ ] mobius_pointwise_mul

    • [ ] mobius_fn_apply

    • [ ] test new functions

    • [ ] re-review Rasul's implementation

    help wanted wip 
    opened by ferrine 1
  • Add mobius_add() and mobius_matvec() methods for Lorentz manifold

    Hi,

    Thanks for the useful tool you've developed; I really appreciate it. When using geoopt, I found that the mobius_add() and mobius_matvec() methods for the Lorentz manifold are missing; there's an implementation at https://www.github.com/HazyResearch/hgcn. Could those methods be added to the package soon? Thanks a lot!

    enhancement 
    opened by martinwhl 1
Releases (v.0.5.1)
  • v.0.5.1(Nov 28, 2022)

    What's Changed

    • Update testing.yml by @ferrine in https://github.com/geoopt/geoopt/pull/198

    Full Changelog: https://github.com/geoopt/geoopt/compare/v0.5.0...v.0.5.1

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Jun 29, 2022)

    What's Changed

    • fix typos by @ferrine in https://github.com/geoopt/geoopt/pull/190
    • StereographicProductManifold to use gyrovector space functions in product manifolds by @gatoniel in https://github.com/geoopt/geoopt/pull/163
    • Seminar by @ferrine in https://github.com/geoopt/geoopt/pull/192

    New Contributors

    • @gatoniel made their first contribution in https://github.com/geoopt/geoopt/pull/163

    Full Changelog: https://github.com/geoopt/geoopt/compare/v0.4.1...v0.5.0

    Source code(tar.gz)
    Source code(zip)
  • v0.4.1(Mar 15, 2022)

    What's Changed

    • add tests for pytorch 1.10.0 by @ferrine in https://github.com/geoopt/geoopt/pull/186
    • add a test, fix deepcopy and copy by @ferrine in https://github.com/geoopt/geoopt/pull/189

    Full Changelog: https://github.com/geoopt/geoopt/compare/v0.4.0...v0.4.1

    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Sep 2, 2021)

    geoopt (0.4.0)

    New Features

    • new Symmetric Positive Definite manifold (#153)
    • new Siegel manifolds: Upper half model and Bounded domain model, with support for Riemannian and Finsler metrics (#179)

    Maintenance

    • create pull request templates (#154)
    • update tests for pytorch 1.9.0

    Bug Fixes

    • fix step increments in optimizers (#165)
    Source code(tar.gz)
    Source code(zip)
  • v0.4.0rc1(Jul 1, 2021)

    geoopt (0.4.0rc1)

    New Features

    • new Symmetric Positive Definite manifold (#153)
    • new Siegel manifolds: Upper half model and Bounded domain model, with support for Riemannian and Finsler metrics (#179)

    Maintenance

    • create pull request templates (#154)
    • update tests for pytorch 1.9.0

    Bug Fixes

    • fix step increments in optimizers (#165)
    Source code(tar.gz)
    Source code(zip)
  • v0.3.1(Oct 29, 2020)

  • v0.3.0(Oct 7, 2020)

    New Features

    • Riemannian Line Search (#140)
    • Per group stabilization (#149)

    Maintenance

    • Fix API warnings (mentioned in #148)
    • support torch >= 1.4.0
    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Jun 12, 2020)

    geoopt v0.2.0

    New Features

    • BirkhoffPolytope (#125)
    • Lorentz Manifold (#121)
    • kappa-Stereographic model (#126)
    • Sparse optimizers (#130)

    Maintenance

    • Tests for pytorch>=1.4, cpuonly (#133)
    Source code(tar.gz)
    Source code(zip)
  • v0.1.2(Nov 30, 2019)

    Bug Fixes

    • Fix scaling issues with random methods
    • Fix poincare methods cosub and norm that were not working properly
    • Fix Sphere distance for small values
    Source code(tar.gz)
    Source code(zip)