PyTorch wrapper for Taichi data-oriented class

Related tags

Deep Learningstannum
Overview

Stannum

PyTorch wrapper for Taichi data-oriented class

PRs are welcomed, please see TODOs.

Usage

from stannum import Tin
import torch

data_oriented = TiClass()  # some Taichi data-oriented class 
device = torch.device("cpu")
tin_layer = Tin(data_oriented, device=device)
    .register_kernel(data_oriented.forward_kernel)
    .register_input_field(data_oriented.input_field, True)
    .register_output_field(data_oriented.output_field, True)
    .register_weight_field(data_oriented.weight_field, True, name="field name")
    .finish() # finish() is required to finish construction
tin_layer.set_kernel_args(1.0)
output = tin_layer(input_tensor)

For input and output:

  • We can register multiple input_field, output_field, weight_field.
  • At least one input_field and one output_field should be registered.
  • The order of input tensors must match the registration order of input_fields.
  • The output order will align with the registration order of output_fields.

Installation & Dependencies

Install stannum with pip by

python -m pip install stannum

Make sure you have the following installed:

  • PyTorch
  • Taichi

TODOs

Documentation

  • Code documentation
  • Documentation for users
  • Nicer error messages

Engineering

  • Set up CI pipeline

Features

  • PyTorch-related:
    • PyTorch checkpoint and save model
    • Proxy torch.nn.parameter.Parameter for weight fields for optimizers
  • Python related:
    • @property for a data-oriented class as an alternative way to register
  • Taichi related:
    • Wait for Taichi to have native PyTorch tensor view to optimize performance
    • Automatic Batching - waiting for upstream Taichi improvement
      • workaround for now: do static manual batching, that is to extend fields with one more dimension for batching
  • Self:
    • Allow registering multiple kernels in a call chain fashion
      • workaround for now: combine kernels into a mega kernel using @ti.complex_kernel

Misc

  • A nice logo
Comments
  • Compatible changes for v1.1.0 rc

    Compatible changes for v1.1.0 rc

    We're in the process of getting v1.1.0 release rc wheel and noticed this PR is required for stannum to work v1.1.0.

    v1.1.0 tracking: https://github.com/taichi-dev/taichi/milestone/5

    opened by ailzhang 3
  • Get rid of eager mode

    Get rid of eager mode

    When the problems in https://github.com/taichi-dev/taichi/pull/4356 get fully resolved, we can safely get rid of the eager mode introduced in v0.5.0 without penalty on performance, reducing overhead.

    Taichi-related wait_for_upstream 
    opened by ifsheldon 1
  • Get rid of clearing fields

    Get rid of clearing fields

    Once https://github.com/taichi-dev/taichi/issues/4334 and this https://github.com/taichi-dev/taichi/issues/4016 get resolved, we can get rid of auto_clear introduced in v0.4.4 and clearing in Tube to avoid unnecessary overhead.

    Taichi-related wait_for_upstream 
    opened by ifsheldon 1
  • Flexible tensor shape support

    Flexible tensor shape support

    Now stannum only supports tensors with fixed shapes which are defined by shapes of registered fields. However, Taichi kernels are more flexible than that.

    For example, this simple kernel can handle 3 arrays of the same arbitrary length

    @ti.kernel
    def array_add(array0: ti.template(), array1: ti.template(), output_array: ti.template()):
        for i in range(array0.shape[0]):
            output_array[i] = array0[i] + array1[i]  
    

    But, we cannot do that with stannum now.

    Now I don't have a clear idea about how to implement this, but discussions and PRs are always welcomed.

    enhancement Taichi-related welcome_contribution 
    opened by ifsheldon 1
  • [bug fix] fix pip build no content

    [bug fix] fix pip build no content

    Previously there is missing a level in the src hierarchy, causing the source code not packaged into the whl build artifact, resulting in the package can be installed but can not be imported.

    This PR fixes this problem by restoring the correct code layout according to Python Packaging Tutorial. It creates a new src folder and move the stannum folder into it, and updates the folder name in setup.py.

    opened by jerrylususu 0
  • Dynamic output tensor shape

    Dynamic output tensor shape

    Hi ! I'm writing a convolution-like operator using Stannum. It can be used throughout a neural network, meaning each layer may have a different input/output shape. When trying to register the output tensor, it leads to this error: AssertionError: Dim = -1 is not allowed when registering output tensors but only registering input tensors

    Does it means I have to template and recompile the kernel for each layer of the neural network ?

    For reference, here is the whole kernel/tube construction:

    @ti.kernel
    def op_taichi(gamma: ti.template(), mu: ti.template(), c: ti.template(), input: ti.template(), weight_shape_1: int, weight_shape_2: int, weight_shape_3:int):
        ti.block_local(c, mu, gamma)
        for bi in range(input.shape[0]):
            for c0 in range(input.shape[1]):
                for i0 in range(input.shape[2]):
                    for j0 in range(input.shape[3]):
                        for i0p in range(input.shape[5]):
                            for j0p in range(input.shape[6]):
                                v = 0.
                                for ci in ti.static(range(weight_shape_1)):
                                    for ii in ti.static(range(weight_shape_2)):
                                        for ji in ti.static(range(weight_shape_3)):
                                            v += (mu[bi, ci, i0+ii, j0+ji] * mu[bi, ci, i0p+ii, j0p+ji] + gamma[bi, ci, i0+ii, j0+ji, ci, i0p+ii, j0p+ji])
                                input[bi, c0, i0, j0, c0, i0p, j0p] += c[c0] * v
        return input
    
    
    def conv2duf_taichi(input, gamma, mu, c, weight_shape):
        if c.dim() == 0:
            c = c.repeat(input.shape[1])
        global TUBE
        if TUBE is None:
            device = input.device # TODO dim alignment with -2, ...
            b = input.shape[0]
            tube = Tube(device) \
                .register_input_tensor((-1,)*7, input.dtype, "gamma", True) \
                .register_input_tensor((-1,)*4, input.dtype, "mu", True) \
                .register_input_tensor((-1,), input.dtype, "c", True) \
                .register_output_tensor((-1,)*7, input.dtype, "input", True) \
                .register_kernel(op_taichi, ["gamma", "mu", "c", "input"]) \
                .finish()
            TUBE = tube
        return TUBE(gamma, mu, c, input, weight_shape[1], weight_shape[2], weight_shape[3])
    
    opened by sebastienwood 5
  • How best to use Vector or Matrix fields?

    How best to use Vector or Matrix fields?

    Is this something worth adding? Happy to give it a go.

    I see this is kind of supported for complex types. Is it preferable to just convert scalar fields to vector fields (via indexing) in the kernel? I don't see any easy way of converting a n,m,3 field to a n,m vector3 field but I might be missing something?

    opened by oliver-batchelor 1
  • Memory and Performance issue of Taichi

    Memory and Performance issue of Taichi

    With the current Taichi (v0.9.1 - 1.2.1), calling Tube N times will result in N^2 time complexity because when creating a field Taichi need to inject kernel information into a field which leads to memory movement which is O(M) where M is the number of existing fields. The complexity is simply 1+2+3+....+N = O(N^2). This is not our fault. And Taichi developers are fixing it, although it takes quite some time.

    In the forward-only computation, this can be resolved by eagerly destroying fields and SNodeTree, which is included in stannum=0.6.2.

    Taichi-related wait_for_upstream 
    opened by ifsheldon 4
  • Automatic batching

    Automatic batching

    Now stannum (and generally Taichi) cannot do automatic batch as done in PyTorch.

    For example, the below can only handle 3 arrays, but if we have a batch of arrays, we will have to loop over the batch dimension or change the code to support batches of a fixed size. This issue is somewhat related to issue #5. The ultimate goal should be supporting automatic batching with tensors of valid flexible shapes.

    @ti.kernel
    def array_add(self):
        for i in self.array0:
            self.output_array[i] = self.array0[i] + self.array1[i]  
    

    For the first step, dynamic looping (i.e. calling the kernel over and over again) is acceptable and is a good first issue.

    PRs and discussions are always welcomed.

    enhancement good first issue Taichi-related wait_for_upstream welcome_contribution 
    opened by ifsheldon 3
Releases(v0.8.0)
  • v0.8.0(Dec 28, 2022)

    Since last release:

    • A bug has been fix. The bug appears when after forward computation, if we update kernel extra args using set_kernel_extra_args (once or multiple times), backward computation is messed up due to inconsistent kernel inputs during forward and backward passes.
    • The APIs of Tin and EmptyTin have been changed: the constructors need auto_clear_grad specified, which is a reminder to users that gradients of fields must be taken care of carefully so to not have incorrect gradients after multiple runs of Tin or EmptyTin layers.
    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Sep 20, 2022)

    Nothing big has change in the code base of stannum, but since Taichi developers have delivered a long waited performance improvement, I want to urge everyone using stannum to update Taichi in use to 1.1.3. And some kind warning and documentation are added to help stannum users to understand this important upstream update.

    Source code(tar.gz)
    Source code(zip)
  • v0.6.4(Aug 10, 2022)

  • v0.6.2(Mar 21, 2022)

    Introduced a configuration in Tube enable_backward. When enable_backward is False, Tube will eagerly recycle Taichi memory by destroying SNodeTree right after forward calculation. This should improve performance of forward-only calculations and should mitigate the memory problem of Taichi in forward-only mode.

    Source code(tar.gz)
    Source code(zip)
  • v0.6.1(Mar 8, 2022)

    • #7 is fixed because of upstream Taichi has fixed uninitialized memory problem in 0.9.1
    • Intermediate fields are now required to be batched if any input tensors are batched
    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Feb 23, 2022)

    Persistent mode and Eager mode of Tube

    Before v0.5.0, the Taichi fields created in Tube is persistent and their lifetime is like: PyTorch upstream tensors -> Tube -> create fields -> forward pass -> copy values to downstream tensors -> compute graph of Autograd completes -> optional backward pass -> compute graph destroyed -> destroy fields

    They're so-called persistent fields as they persist when the compute graph is being constructed.

    Now in v0.5.0, we introduce an eager mode of Tube. With persistent_fields=False when instancing a Tube, eager mode is turned on, in which the lifetime of fields is like: PyTorch upstream tensors -> Tube -> fields -> copied values to downstream tensors -> destroy fields -> compute graph of Autograd completes -> optional backward pass -> compute graph destroyed

    Zooming in the optional backward pass, since we've destroyed fields that store values in the forward pass, we need to re-allocate new fields when calculating gradients, then the backward pass is like: Downstream gradients -> Tube -> create fields and load values -> load downstream gradients to fields -> backward pass -> copy gradients to tensors -> Destroy fields -> Upstream PyTorch gradient calculation

    This introduces some overhead but may be faster on "old" Taichi (any Taichi that does not merge https://github.com/taichi-dev/taichi/pull/4356). For details, please see this PR. At the time we release v0.5.0, stable Taichi does not merge this PR.

    Compatibility issue fixes

    At the time we release v0.5.0, Taichi has been being under refactoring heavily, so we introduced many small fixes to deal with incompatibilities caused by such refactoring. If you find compatibility issues, feel free to submit issues and make PRs.

    Source code(tar.gz)
    Source code(zip)
  • v0.4.4(Feb 21, 2022)

    Fix many problems due to Taichi changes and bugs:

    • API import problems due to Taichi API changes
    • Memory uninit problem due to this https://github.com/taichi-dev/taichi/issues/4334 and this https://github.com/taichi-dev/taichi/issues/4016
    Source code(tar.gz)
    Source code(zip)
  • v0.4.0(Jan 14, 2022)

    Tube

    Tube is more flexible than Tin and slower in that it helps you create necessary fields and do automatic batching.

    Registrations

    All you need to do is to register:

    • Input/intermediate/output tensor shapes instead of fields
    • At least one kernel that takes the following as arguments
      • Taichi fields: correspond to tensors (may or may not require gradients)
      • (Optional) Extra arguments: will NOT receive gradients

    Acceptable dimensions of tensors to be registered:

    • None: means the flexible batch dimension, must be the first dimension e.g. (None, 2, 3, 4)
    • Positive integers: fixed dimensions with the indicated dimensionality
    • Negative integers:
      • -1: means any number [1, +inf), only usable in the registration of input tensors.
      • Negative integers < -1: indices of some dimensions that must be of the same dimensionality
        • Restriction: negative indices must be "declared" in the registration of input tensors first, then used in the registration of intermediate and output tensors.
        • Example 1: tensor a and b of shapes a: (2, -2, 3) and b: (-2, 5, 6) mean the dimensions of -2 must match.
        • Example 2: tensor a and b of shapes a: (-1, 2, 3) and b: (-1, 5, 6) mean no restrictions on the first dimensions.

    Registration order: Input tensors/intermediate fields/output tensors must be registered first, and then kernel.

    @ti.kernel
    def ti_add(arr_a: ti.template(), arr_b: ti.template(), output_arr: ti.template()):
        for i in arr_a:
            output_arr[i] = arr_a[i] + arr_b[i]
    
    ti.init(ti.cpu)
    cpu = torch.device("cpu")
    a = torch.ones(10)
    b = torch.ones(10)
    tube = Tube(cpu) \
        .register_input_tensor((10,), torch.float32, "arr_a", False) \
        .register_input_tensor((10,), torch.float32, "arr_b", False) \
        .register_output_tensor((10,), torch.float32, "output_arr", False) \
        .register_kernel(ti_add, ["arr_a", "arr_b", "output_arr"]) \
        .finish()
    out = tube(a, b)
    

    When registering a kernel, a list of field/tensor names is required, for example, the above ["arr_a", "arr_b", "output_arr"]. This list should correspond to the fields in the arguments of a kernel (e.g. above ti_add()).

    The order of input tensors should match the input fields of a kernel.

    Automatic batching

    Automatic batching is done simply by running kernels batch times. The batch number is determined by the leading dimension of tensors of registered shape (None, ...).

    It's required that if any input tensors or intermediate fields are batched (which means they have registered the first dimension to be None), all output tensors must be registered as batched.

    Examples

    Simple one without negative indices or batch dimension:

    @ti.kernel
    def ti_add(arr_a: ti.template(), arr_b: ti.template(), output_arr: ti.template()):
        for i in arr_a:
            output_arr[i] = arr_a[i] + arr_b[i]
    
    ti.init(ti.cpu)
    cpu = torch.device("cpu")
    a = torch.ones(10)
    b = torch.ones(10)
    tube = Tube(cpu) \
        .register_input_tensor((10,), torch.float32, "arr_a", False) \
        .register_input_tensor((10,), torch.float32, "arr_b", False) \
        .register_output_tensor((10,), torch.float32, "output_arr", False) \
        .register_kernel(ti_add, ["arr_a", "arr_b", "output_arr"]) \
        .finish()
    out = tube(a, b)
    

    With negative dimension index:

    ti.init(ti.cpu)
    cpu = torch.device("cpu")
    tube = Tube(cpu) \
        .register_input_tensor((-2,), torch.float32, "arr_a", False) \
        .register_input_tensor((-2,), torch.float32, "arr_b", False) \
        .register_output_tensor((-2,), torch.float32, "output_arr", False) \
        .register_kernel(ti_add, ["arr_a", "arr_b", "output_arr"]) \
        .finish()
    dim = 10
    a = torch.ones(dim)
    b = torch.ones(dim)
    out = tube(a, b)
    assert torch.allclose(out, torch.full((dim,), 2.))
    dim = 100
    a = torch.ones(dim)
    b = torch.ones(dim)
    out = tube(a, b)
    assert torch.allclose(out, torch.full((dim,), 2.))
    

    With batch dimension:

    @ti.kernel
    def int_add(a: ti.template(), b: ti.template(), out: ti.template()):
        out[None] = a[None] + b[None]
    
    ti.init(ti.cpu)
    b = torch.tensor(1., requires_grad=True)
    batched_a = torch.ones(10, requires_grad=True)
    tube = Tube() \
        .register_input_tensor((None,), torch.float32, "a") \
        .register_input_tensor((), torch.float32, "b") \
        .register_output_tensor((None,), torch.float32, "out", True) \
        .register_kernel(int_add, ["a", "b", "out"]) \
        .finish()
    out = tube(batched_a, b)
    loss = out.sum()
    loss.backward()
    assert torch.allclose(torch.ones_like(batched_a) + 1, out)
    assert b.grad == 10.
    assert torch.allclose(torch.ones_like(batched_a), batched_a.grad)
    

    For more invalid use examples, please see tests in tests/test_tube.

    Advanced field construction with FieldManager

    There is a way to tweak how fields are constructed in order to gain performance improvement in kernel calculations.

    By supplying a customized FieldManager when registering a field, you can construct a field however you want.

    Please refer to the code FieldManger in src/stannum/auxiliary.py for more information.

    If you don't know why constructing fields differently can improve performance, don't use this feature.

    If you don't know how to construct fields differently, please refer to Taichi field documentation.

    Source code(tar.gz)
    Source code(zip)
    stannum-0.4.0-py3-none-any.whl(15.81 KB)
  • v0.3.2(Jan 1, 2022)

  • v0.3.1(Dec 30, 2021)

    Fix a bug.

    Details: When some of input fields or internal fields do not need gradient (i.e. needs_grad==False), incorrect numbers of backward gradients will be passed to PyTorch Autograd, crashing back propagation.

    Source code(tar.gz)
    Source code(zip)
  • v0.3(Dec 30, 2021)

    New feature:

    • Add complex tensor support: need to specify that a field expects a complex tensor as data source
      tin_layer = Tin(data_oriented_vector_field, device) \
            .register_kernel(data_oriented_vector_field.forward_kernel, 1.0) \
            .register_input_field(data_oriented_vector_field.input_field, complex_dtype=True) \
            .register_output_field(data_oriented_vector_field.output_field, complex_dtype=True) \
            .register_internal_field(data_oriented_vector_field.multiplier) \
            .finish()
      

    Engineering:

    • Refactored code a bit
    • Add type hints to enhance code readability
    Source code(tar.gz)
    Source code(zip)
    stannum-0.3.0-py3-none-any.whl(7.00 KB)
    stannum-0.3.0.tar.gz(7.52 KB)
  • v0.2(Aug 1, 2021)

    Now you can register multiple kernels. These kernels will be called sequentially with the same order of registration. Please be noted that all fields needed to store intermediate results must be register.

    API changes:

    • Tin.register_weight_field() -> Tin.register_internal_field()
    Source code(tar.gz)
    Source code(zip)
  • v0.1.3(Jul 14, 2021)

  • v0.1.2(Jul 13, 2021)

    Now you don't need to specify needs_grad when registering a field via .register_*_field(), as long as you use Taichi > 0.7.26. If you use a legacy version of Taichi, you must still specify needs_grad yourself, though.

    Source code(tar.gz)
    Source code(zip)
  • v0.1.1(Jul 11, 2021)

  • v0.1(Jul 9, 2021)

Pytorch implementation of DeepMind's differentiable neural computer paper.

DNC pytorch This is a Pytorch implementation of DeepMind's Differentiable Neural Computer (DNC) architecture introduced in their recent Nature paper:

Yuanpu Xie 91 Nov 21, 2022
Automatic differentiation with weighted finite-state transducers.

GTN: Automatic Differentiation with WFSTs Quickstart | Installation | Documentation What is GTN? GTN is a framework for automatic differentiation with

100 Dec 29, 2022
A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising (CVPR 2020 Oral & TPAMI 2021)

ELD The implementation of CVPR 2020 (Oral) paper "A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising" and its journal (TPAMI) v

Kaixuan Wei 359 Jan 01, 2023
Vertex AI: Serverless framework for MLOPs (ESP / ENG)

Vertex AI: Serverless framework for MLOPs (ESP / ENG) Español Qué es esto? Este repo contiene un pipeline end to end diseñado usando el SDK de Kubeflo

Hernán Escudero 2 Apr 28, 2022
Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness

Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness Code for Paper "Imbalanced Gradients: A Subtle Cause of Overestimated Adv

Hanxun Huang 11 Nov 30, 2022
I tried to apply the CAM algorithm to YOLOv4 and it worked.

YOLOV4:You Only Look Once目标检测模型在pytorch当中的实现 2021年2月7日更新: 加入letterbox_image的选项,关闭letterbox_image后网络的map得到大幅度提升。 目录 性能情况 Performance 实现的内容 Achievement

55 Dec 05, 2022
PyTorch implementation of "Image-to-Image Translation Using Conditional Adversarial Networks".

pix2pix-pytorch PyTorch implementation of Image-to-Image Translation Using Conditional Adversarial Networks. Based on pix2pix by Phillip Isola et al.

mrzhu 383 Dec 17, 2022
PyTorch trainer and model for Sequence Classification

PyTorch-trainer-and-model-for-Sequence-Classification After cloning the repository, modify your training data so that the training data is a .csv file

NhanTieu 2 Dec 09, 2022
Drone detection using YOLOv5

This drone detection system uses YOLOv5 which is a family of object detection architectures and we have trained the model on Drone Dataset. Overview I

Tushar Sarkar 27 Dec 20, 2022
Official Pytorch implementation of "Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes", CVPR 2022

Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes / 3DCrowdNet News 💪 3DCrowdNet achieves the state-of-the-art accuracy on 3D

Hongsuk Choi 113 Dec 21, 2022
这是一个利用facenet和retinaface实现人脸识别的库,可以进行在线的人脸识别。

Facenet+Retinaface:人脸识别模型在Keras当中的实现 目录 注意事项 Attention 所需环境 Environment 文件下载 Download 预测步骤 How2predict 参考资料 Reference 注意事项 该库中包含了两个网络,分别是retinaface和fa

Bubbliiiing 31 Nov 15, 2022
Code release to accompany paper "Geometry-Aware Gradient Algorithms for Neural Architecture Search."

Geometry-Aware Gradient Algorithms for Neural Architecture Search This repository contains the code required to run the experiments for the DARTS sear

18 May 27, 2022
本步态识别系统主要基于GaitSet模型进行实现

本步态识别系统主要基于GaitSet模型进行实现。在尝试部署本系统之前,建立理解GaitSet模型的网络结构、训练和推理方法。 系统的实现效果如视频所示: 演示视频 由于模型较大,部分模型文件存储在百度云盘。 链接提取码:33mb 具体部署过程 1.下载代码 2.安装requirements.txt

16 Oct 22, 2022
Unoffical reMarkable AddOn for Firefox.

reMarkable for Firefox (Download) This repo converts the offical reMarkable Chrome Extension into a Firefox AddOn published here under the name "Unoff

Jelle Schutter 45 Nov 28, 2022
Tf alloc - Simplication of GPU allocation for Tensorflow2

tf_alloc Simpliying GPU allocation for Tensorflow Developer: korkite (Junseo Ko)

Junseo Ko 3 Feb 10, 2022
PN-Net a neural field-based framework for depth estimation from single-view RGB images.

PN-Net We present a neural field-based framework for depth estimation from single-view RGB images. Rather than representing a 2D depth map as a single

1 Oct 02, 2021
Code release for paper: The Boombox: Visual Reconstruction from Acoustic Vibrations

The Boombox: Visual Reconstruction from Acoustic Vibrations Boyuan Chen, Mia Chiquier, Hod Lipson, Carl Vondrick Columbia University Project Website |

Boyuan Chen 12 Nov 30, 2022
The code for paper "Learning Implicit Fields for Generative Shape Modeling".

implicit-decoder The tensorflow code for paper "Learning Implicit Fields for Generative Shape Modeling", Zhiqin Chen, Hao (Richard) Zhang. Project pag

Zhiqin Chen 353 Dec 30, 2022
MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets)

MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets) Using mixup data augmentation as reguliraztion and tuning the hyper par

Bhanu 2 Jan 16, 2022
A Python package for time series augmentation

tsaug tsaug is a Python package for time series augmentation. It offers a set of augmentation methods for time series, as well as a simple API to conn

Arundo Analytics 278 Jan 01, 2023