Functional tensors for probabilistic programming

Funsor

Funsor is a tensor-like library for functions and distributions.

See Functional tensors for probabilistic programming for a system description.

Installing

Funsor supports Python 3.6+.

Install using pip:

pip install funsor

Install from source:

git clone git@github.com:pyro-ppl/funsor.git
cd funsor
git checkout master
pip install .

Using funsor

Funsor can be used through a number of interfaces:

  • Funsors can be used directly for probabilistic computations, using PyTorch optimizers in a standard training loop. Start with these examples: discrete_hmm, eeg_slds, kalman_filter, pcfg, sensor, slds, and vae.
  • Funsors can be used to implement custom inference algorithms within Pyro, using custom elbo implementations in standard pyro.infer.SVI training. See these examples: mixed_hmm and bart forecasting.
  • funsor.pyro provides a number of Pyro-compatible (and PyTorch-compatible) distribution classes that use funsors under the hood, as well as utilities to convert between funsors and distributions.
  • funsor.minipyro provides a limited alternate backend for the Pyro probabilistic programming language, and can perform some ELBO computations exactly.

Design

See design doc.

The goal of this library is to generalize Pyro's delayed inference algorithms from discrete to continuous variables, and to create machinery to enable partially delayed sampling compatible with universality. To achieve this goal this library makes three orthogonal design choices:

  1. Open terms are objects. Funsors generalize the tensor interface to also cover arbitrary functions of multiple variables ("inputs"), where variables may be integers, real numbers, or real tensors. Function evaluation / substitution is the basic operation, generalizing tensor indexing. This allows probability distributions to be first-class funsors that make use of existing tensor machinery; for example, we can generalize tensor contraction to computing analytic integrals in conjugate probabilistic models.

  2. Support for nonstandard interpretations. Funsors support user-defined interpretations, including eager, lazy, mixed eager+lazy, memoized (like opt_einsum's sharing), and approximate interpretations such as Monte Carlo approximation of integration operations (e.g. .sum() over a funsor dimension).

  3. Named dimensions. To avoid the difficulties of broadcasting and advanced indexing in positionally-indexed tensor libraries, all funsor dimensions are named. Indexing uses the .__call__() method and can be interpreted as substitution (with well-understood semantics). Funsors are viewed as algebraic expressions with one algebraic free variable per dimension. Each dimension is either covariant (an output) or contravariant (an input). These three ideas are illustrated in the sketch below.
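
A minimal sketch of these three ideas using the public funsor API (a hedged illustration, assuming the torch backend is installed; eager interpretation is the default):

from collections import OrderedDict

import torch

import funsor
import funsor.ops as ops
from funsor import Bint, Real
from funsor.interpreter import reinterpret
from funsor.tensor import Tensor
from funsor.terms import Variable, lazy

funsor.set_backend("torch")

# (1) Open terms: arithmetic on a free variable builds an algebraic expression;
#     substitution via __call__ evaluates it, generalizing tensor indexing.
x = Variable("x", Real)
f = x * x + 1.0
print(f(x=2.0))  # Number(5.0)

# (3) Named dimensions: inputs are named rather than positional.
t = Tensor(torch.randn(3, 4), OrderedDict(i=Bint[3], j=Bint[4]))
print(t(i=0))                  # substitution generalizes indexing; "j" stays free
print(t.reduce(ops.add, "i"))  # sum out the named dimension "i"

# (2) Nonstandard interpretation: build the same reduction lazily, then
#     reinterpret it eagerly later.
with lazy:
    g = t.reduce(ops.add, "i")
print(reinterpret(g))  # now evaluates eagerly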

Using funsor we can easily implement Pyro-style delayed sampling, roughly:

trace_log_prob = 0.
lazy = True  # whether to delay sampling of latent variables

def pyro_sample(name, dist, obs=None):
    global trace_log_prob
    assert isinstance(dist, Funsor)
    if obs is not None:
        value = obs
    elif lazy:
        # delayed sampling (like Pyro's parallel enumeration)
        value = funsor.Variable(name, dist.support)
    else:
        value = dist.sample('value')[0]['value']

    # accumulate the log_prob in the trace
    trace_log_prob += dist(value)

    return value

# ...later during inference...
loss = -trace_log_prob.reduce(logaddexp)  # collapses delayed variables

See funsor/minipyro.py for a complete implementation.

Related projects

Citation

If you use Funsor, please consider citing:

@article{obermeyer2019functional,
  author = {Obermeyer, Fritz and Bingham, Eli and Jankowiak, Martin and
            Phan, Du and Chen, Jonathan P},
  title = {{Functional Tensors for Probabilistic Programming}},
  journal = {arXiv preprint arXiv:1910.10775},
  year = {2019}
}
Comments
  • Profiling GaussianHMM with NumPyro backend

    I profiled funsor.pyro.hmm.GaussianHMM against pyro.distributions.GaussianHMM and have some observations:

    • funsor is slow compared to pyro, especially for small hidden/obs dims. With batch_dim, time_dim, obs_dim, hidden_dim = 5, 6, 3, 2, pyro takes 5ms while funsor takes 35ms to evaluate log_prob.
    • The overhead in funsor seems to be constant. I increased obs_dim, hidden_dim to 30, 20 and verified this.
    • torch.cat takes a large amount of time in funsor (e.g. with batch_dim, time_dim = 50, 60, this op takes 7ms of the total 70ms). Similarly, torch.pad takes a portion of the time in pyro (but less than torch.cat does in funsor). I think the reason is that we replace pad with new_zeros + cat in funsor (see the sketch after the timings below).
    • Time for numerical calculation (except cat, pad) seems to be similar between the two versions. It seems that at some points funsor switches dimensions; I guess this is because funsor stores dimensions in a set. For example, I looked at the args of triangular solve and got (this is not important, I believe):
    funsor torch.Size([5, 3, 2, 4]) torch.Size([5, 3, 2, 2])
    funsor torch.Size([5, 3, 2, 1]) torch.Size([5, 3, 2, 2])
    funsor torch.Size([5, 1, 2, 4]) torch.Size([5, 1, 2, 2])
    funsor torch.Size([5, 1, 2, 1]) torch.Size([5, 1, 2, 2])
    funsor torch.Size([1, 5, 2, 4]) torch.Size([1, 5, 2, 2])
    funsor torch.Size([1, 5, 2, 1]) torch.Size([1, 5, 2, 2])
    funsor torch.Size([5, 2, 2]) torch.Size([5, 2, 2])
    funsor torch.Size([5, 2, 1]) torch.Size([5, 2, 2])
    funsor torch.Size([5, 2, 1]) torch.Size([5, 2, 2])
    CPU times: user 287 ms, sys: 30.9 ms, total: 318 ms
    Wall time: 28.3 ms
    pyro torch.Size([5, 3, 2, 4]) torch.Size([5, 3, 2, 2])
    pyro torch.Size([5, 3, 2, 1]) torch.Size([5, 3, 2, 2])
    pyro torch.Size([5, 1, 2, 4]) torch.Size([5, 1, 2, 2])
    pyro torch.Size([5, 1, 2, 1]) torch.Size([5, 1, 2, 2])
    pyro torch.Size([5, 1, 2, 4]) torch.Size([5, 1, 2, 2])
    pyro torch.Size([5, 1, 2, 1]) torch.Size([5, 1, 2, 2])
    pyro torch.Size([5, 2, 2]) torch.Size([5, 2, 2])
    pyro torch.Size([5, 2, 1]) torch.Size([5, 2, 2])
    pyro torch.Size([5, 2, 1]) torch.Size([5, 2, 2])
    CPU times: user 70.4 ms, sys: 1.17 ms, total: 71.6 ms
    Wall time: 5.84 ms
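
    For reference, here is the pad-vs-cat equivalence mentioned above, as a plain PyTorch sketch:

    import torch
    import torch.nn.functional as F

    x = torch.randn(5, 3)
    padded = F.pad(x, (0, 1))                           # append one zero column
    catted = torch.cat([x, x.new_zeros(5, 1)], dim=-1)  # same result via new_zeros + cat
    assert torch.equal(padded, catted)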
    
    Profiling code
    # Run in IPython/Jupyter; the %time lines below are IPython magics.
    import pyro.distributions as dist
    import torch
    from pyro.distributions.util import broadcast_shape

    from funsor.pyro.hmm import GaussianHMM
    from funsor.testing import random_mvn
    
    batch_dim, time_dim, obs_dim, hidden_dim = 5, 6, 3, 2
    
    init_shape = (batch_dim,)
    trans_mat_shape = trans_mvn_shape = obs_mat_shape = obs_mvn_shape = (batch_dim, time_dim)
    init_dist = random_mvn(init_shape, hidden_dim)
    trans_mat = torch.randn(trans_mat_shape + (hidden_dim, hidden_dim))
    trans_dist = random_mvn(trans_mvn_shape, hidden_dim)
    obs_mat = torch.randn(obs_mat_shape + (hidden_dim, obs_dim))
    obs_dist = random_mvn(obs_mvn_shape, obs_dim)
    
    actual_dist = GaussianHMM(init_dist, trans_mat, trans_dist, obs_mat, obs_dist)
    expected_dist = dist.GaussianHMM(init_dist, trans_mat, trans_dist, obs_mat, obs_dist)
    assert actual_dist.batch_shape == expected_dist.batch_shape
    assert actual_dist.event_shape == expected_dist.event_shape
    
    shape = broadcast_shape(init_shape + (1,),
                            trans_mat_shape, trans_mvn_shape,
                            obs_mat_shape, obs_mvn_shape)
    data = obs_dist.expand(shape).sample()
    assert data.shape == actual_dist.shape()
    
    %time actual_log_prob = actual_dist.log_prob(data)
    %time expected_log_prob = expected_dist.log_prob(data)
    

    With the above observations, I think that using jax.jit will resolve this "constant" overhead, so I would like to go ahead and implement GaussianHMM in NumPyro. This requires modifications of the current funsor.pyro to make it work with NumPyro. All changes in this PR are just for profiling, not meant to be merged. Hopefully, jax.jit will work well with the lazy evaluation mechanisms in funsor.

    Tasks

    • [x] todo
    discussion 
    opened by fehiepsi 14
  • Support global backend

    Addresses #207

    Following up on the discussion in #317, this PR adds global backend support to funsor.

    To make reviewing easier, I list the important changes in this PR here:

    • Separate out the jax and numpy implementations. I also take this chance to reorder the ops according to their names (to make it easier to compare backends or to find a specific implementation).
    • Use numpy for the default implementation of numeric ops. funsor.torch or funsor.jax is lazily imported, depending on the environment variable FUNSOR_BACKEND or a call to funsor.set_backend(...) (see the sketch after this list).
    • We can change the backend using the set_backend utility. There is one problem: once we have changed the backend to JAX, we can't revert to the numpy backend.
    • Port the Pyro Tensordot implementation to funsor. The Pyro implementation has some x.dim() statements, which are not available for ndarray.
    • Although we will use the numpy backend by default, I still use PyTorch names (except for amax, amin) for these ops (as on the current master branch). I guess Pyro devs would prefer PyTorch names to NumPy names for numeric ops.
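
    For illustration, the two selection mechanisms above can be used like this (a short sketch, assuming torch is installed):

    import os
    os.environ["FUNSOR_BACKEND"] = "torch"  # option 1: set before importing funsor

    import funsor
    funsor.set_backend("torch")  # option 2: explicit; lazily imports funsor.torch
    # "numpy" is the default and "jax" is also supported, but per the caveat
    # above, after switching to jax we can't revert to numpy.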

    Some small changes:

    • replace torch._C._get_tracing_state with funsor.util.get_tracing_state
    • add funsor.util.is_nn_module
    • add ops.is_numeric_array to check whether a variable is a numeric array. We can rename it to ops.is_tensor to match torch.is_tensor if preferred.
    • add ops.detach to detach gradients. This is a no-op for the numpy backend.
    • fix an issue where ops.log gives warnings at the value 0.
    • support both a tuple and unpacked ints for the testing utilities randn, zeros, ... (because many places in the test files use the pattern torch.zeros(1, 2, 3) instead of torch.zeros((1, 2, 3))).
    • dispatch for recursion_reinterpret.register(torch.Tensor) is implemented in the torch.py file instead of interpreter.py (similarly for children, to_funsor, allclose)
    • others are cleanup and reordering changes

    Tests pass with all backends

    • [x] test_affine
    • [x] test_alpha_conversion
    • [x] test_cnf
    • [x] test_delta
    • [x] test_gaussian
    • [x] test_import
    • [x] test_integrate
    • [x] test_joint
    • [x] test_sum_product
    • [x] test_tensor
    • [x] test_terms

    Tests that only work with torch/pyro

    • test/pyro/
    • test/test_minipyro.py
    • test/test_adjoint.py: requires einsum.adjoint.require_backward
    • test/test_distributions.py: refactoring
    • test/test_einsum.py
    • test/test_memoize.py: requires some einsum stuff
    • test/test_optimizer.py: requires distributions and einsum

    TODO:

    • [x] separate numpy and jax backend.
    • [x] revise DICE logic
    awaiting review 
    opened by fehiepsi 12
  • Add a single normal form for sum-product contractions

    Resolves #156

    As outlined in #156, I'm using this as a staging branch for the refactoring PRs associated with the proposed changes there:

    • [x] #157 Original PR (diff)
    • [x] #173 Refactor Affine into Contraction (https://github.com/pyro-ppl/funsor/pull/157/commits/e20978978283998341f5ed75a755cebc76fa2bab)
    • [x] #165 Refactor optimizer to use Contraction and normalize (https://github.com/pyro-ppl/funsor/pull/157/commits/a4b2ec29b10f6649d86cf9fee24d56e574b279c8)
    • [x] #169 Refactor Joint into multivariate Delta and Contraction (https://github.com/pyro-ppl/funsor/pull/157/commits/d2c79b0c8cdab99b370ca160de53dc8d62469dd9)
    • [x] Separate unfolding from normalization (https://github.com/pyro-ppl/funsor/pull/157/commits/ffdd38176567242fde59ab1f070488ab98cce72e)
    • [x] Troubleshoot performance issues (mostly caused by repeated unfolding)
    • [x] Use #212 for pattern matching
    • [x] Remove obsolete patterns (diff)
    • [x] Update MultiDelta to keep separate log_densitys for each point (https://github.com/pyro-ppl/funsor/pull/157/commits/30cbcb1c13806991cdeb70e5618799b60cf844b9)
    • [x] Fix regressions in adjoint and einsum
    • [x] Add missing patterns for minipyro
    • [x] Update bound inference tests for Number

    Original description: This PR takes a first step towards the goals in #156 by adding two things:

    1. A Contraction term representing finitary sum-product operations that takes advantage of multipledispatch.Variadic for pattern matching. This is also necessary for #72.
    2. A normalize interpretation that uses associative and distributive properties to put linear expressions into something like DNF, grouping operations into single Contraction terms where possible.

    As a demonstration, I have included patterns and tests illustrating how Joint is equivalent to interleaving eager and normalize.

    Tested:

    • I copied smoke tests and test cases verbatim from test_joint.py, test_gaussian.py, test_affine.py and test_einsum.py
    awaiting review refactor 
    opened by eb8680 10
  • DiCE factors from importance sampling?

    In Tensor.unscaled_sample, a DiCE factor is added to the sample. In Gaussian.unscaled_sample, this isn't necessary because of the parameterization of the delta function.

    So far, including DiCE factors with the samples doesn't sit quite right conceptually. Could it make sense to instead approach this from importance sampling? The default interpretation of adjoint(p) @ f for continuous p might be to use importance sampling with some default proposal q. To maintain differentiability, the importance weights would be p / q.detach(). If samples can be drawn from p, then p can be used as the proposal, and the importance weights would be p / p.detach(), which in this context are the DiCE factors.
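
    For intuition, here is a tiny PyTorch sketch (not funsor code) of the p / p.detach() weight:

    import torch

    log_p = torch.tensor([-1.2, -0.7], requires_grad=True)  # stand-in log-densities
    dice = (log_p - log_p.detach()).exp()  # value is exactly 1; gradients still flow through log_p
    f = torch.tensor([2.0, 3.0])           # some downstream estimand
    surrogate = (dice * f).sum()           # same value as f.sum(), but d(surrogate)/d(log_p) = f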

    Am I missing some benefits of the DiCE formalism?

    discussion 
    opened by mattwescott 10
  • Generalize partial sum product to Markov models

    This attempts to generalize partial_sum_product to support elimination of Markov variables, following discussion with @eb8680 and @fritzo at pyro-ppl/pyro#2687.

    TODO

    • [x] modified_partial_sum_product that eliminates first-order Markov variables using a parallel-scan algorithm.
    • [x] Add more tests
    • [ ] Rename modified_partial_sum_product, or replace partial_sum_product with it?

    Tests

    • [x] _expected_modified_partial_sum_product unrolls one markov dim at a time
    • [x] test_modified_partial_sum_product_0
    • [x] test_modified_partial_sum_product_1
    • [x] test_modified_partial_sum_product_2
    • [x] test_modified_partial_sum_product_3
    • [x] test_modified_partial_sum_product_4
    • [x] test_modified_partial_sum_product_5
    • [x] test_modified_partial_sum_product_6
    • [x] test_modified_partial_sum_product_7: xfail x and y markov vars with different ordinals
    • [x] test_modified_partial_sum_product_8
    • [x] test_modified_partial_sum_product_9
    • [x] test_modified_partial_sum_product_10
    • [x] test_modified_partial_sum_product_11
    • [x] test_modified_partial_sum_product_12: xfail w_curr in time plate but lower ordinal than y and x markov vars
    • [x] test_modified_partial_sum_product_13
    • [x] test_modified_partial_sum_product_14
    • [x] test_modified_partial_sum_product_15: xfail y depends on two time dims time and tones
    • [x] test_modified_partial_sum_product_16: x_prev->y_curr and y_prev->x_curr

    Note

    • Markov sites cannot be inside local plates (see tests 2 and 10)

    Update

    • Added unrolling of markov vars in local plates to handle tests 2 and 10

    Useful links

    • pyro-ppl/funsor#167
    • pyro-ppl/funsor#204
    • pyro-ppl/funsor#293
    • pyro-ppl/numpyro#686
    • pyro-ppl/funsor#399
    enhancement WIP 
    opened by ordabayevy 9
  • Make funsor.distributions backend agnostic

    Blocked by https://github.com/pyro-ppl/pyro-api/pull/19

    This is similar to #326 but doesn't include the funsor.numpyro stuff. I have resolved the has_rsample issue and made the whole test_distributions suite backend agnostic.

    awaiting review 
    opened by fehiepsi 9
  • refactor Gaussian to use Tensor api instead of torch.Tensor

    This PR follows Approach 2 in #207 to make Gaussian backend-agnostic. My plan is to

    • check whether info_vec is a torch.Tensor, then set self.backend = TorchTensor.
    • make backend-specific ops such as triangular_solve and cholesky static methods of the corresponding TorchTensor type. For example,
    x.triangular_solve(y)
    

    will be

    self.backend.triangular_solve(x, y)
    

    It seems that there is a lot of work to do, but I hope that future refactoring will be easier after this.

    Tasks

    • ~[] subclass Tensor to TorchTensor~ this is not necessary for now
    • [x] collect and implement all required ~static methods~ ops for TorchTensor
    • [x] use the above ~static methods~ ops in Gaussian
    • [x] do the same for other classes/functions in gaussian.py
    • [x] make sure all tests pass
    • [x] make align_tensors and materialize backend agnostic
    • [x] make numpy backend pass for gaussian smoke test

    Future PRs

    • deal with torch._C._get_tracing_state() statements
    • make sure that numpy backend will work
    awaiting review 
    opened by fehiepsi 9
  • `NotImplementedError` when using `apply_optimizer`

    The term below comes up in test_pyroapi_funsor:

    from funsor.cnf import Contraction
    from funsor.tensor import Tensor
    import torch
    import funsor.ops as ops
    from funsor import Bint, Real
    from funsor.terms import Unary, Binary, Variable, Number, lazy, to_data
    from funsor.constant import Constant
    from funsor.delta import Delta
    import funsor
    funsor.set_backend("torch")
    
    with lazy:
        term = Contraction(ops.add, ops.mul,
         frozenset({Variable('plate_inner__BOUND_78', Bint[2]), Variable('x__BOUND_77', Real), Variable('plate_outer__BOUND_79', Bint[3])}),  # noqa
         (Unary(ops.exp,
           Contraction(ops.null, ops.add,
            frozenset(),
            (Delta(
              (('x__BOUND_77',
                (Tensor(
                  torch.tensor([-1.5227684840130555, 0.38168390241818895, -1.0276085758313882], dtype=torch.float64),  # noqa
                  (('plate_outer__BOUND_79',
                    Bint[3],),),
                  'real'),
                 Number(0.0),),),)),
             Constant(
              (('plate_inner__BOUND_78',
                Bint[2],),),
              Tensor(
               torch.tensor([0.0, 0.0, 0.0], dtype=torch.float64),
               (('plate_outer__BOUND_79',
                 Bint[3],),),
               'real')),))),
          Unary(ops.all,
           Binary(ops.eq,
            Tensor(
             torch.tensor([-1.5227684840130555, 0.38168390241818895, -1.0276085758313882], dtype=torch.float64),  # noqa
             (('plate_outer__BOUND_79',
               Bint[3],),),
             'real'),
            Tensor(
             torch.tensor([-1.5227684840130555, 0.38168390241818895, -1.0276085758313882], dtype=torch.float64),  # noqa
             (('plate_outer__BOUND_79',
               Bint[3],),),
             'real'))),
          Tensor(
            torch.tensor([[-3.304130052277938, -0.9255234395261538, -1.5122103473560844], [-3.217490519312117, -1.2663745664889694, -1.2900109994655682]], dtype=torch.float64),  # noqa
           (('plate_inner__BOUND_78',
             Bint[2],),
            ('plate_outer__BOUND_79',
             Bint[3],),),
           'real'),))
    
    x = to_data(funsor.optimizer.apply_optimizer(term))
    

    Error message:

    Traceback (most recent call last):
      File "hehe.py", line 54, in <module>
        x = to_data(funsor.optimizer.apply_optimizer(term))
      File "/home/ordabayev/repos/funsor/funsor/optimizer.py", line 169, in apply_optimizer
        return interpreter.reinterpret(expr)
      File "/home/ordabayev/repos/funsor/funsor/interpreter.py", line 255, in reinterpret
        return recursion_reinterpret(x)
      File "/home/ordabayev/repos/funsor/funsor/interpreter.py", line 236, in recursion_reinterpret
        return _STACK[-1].interpret(type(x), *map(recursion_reinterpret, children(x)))
      File "/home/ordabayev/repos/funsor/funsor/interpretations.py", line 196,
    in interpret
        result = s.interpret(cls, *args)
      File "/home/ordabayev/repos/funsor/funsor/interpretations.py", line 155,
    in interpret
        return self.dispatch(cls, *args)(*args)
      File "/home/ordabayev/repos/funsor/funsor/optimizer.py", line 87, in optimize_contraction_variadic
        return optimize.interpret(Contraction, r, b, v, tuple(ts))
      File "/home/ordabayev/repos/funsor/funsor/interpretations.py", line 196,
    in interpret
        result = s.interpret(cls, *args)
      File "/home/ordabayev/repos/funsor/funsor/interpretations.py", line 155,
    in interpret
        return self.dispatch(cls, *args)(*args)
      File "/home/ordabayev/repos/funsor/funsor/optimizer.py", line 145, in optimize_contract_finitary_funsor
        path_end = Contraction(
      File "/home/ordabayev/repos/funsor/funsor/terms.py", line 210, in __call__
        return interpret(cls, *args)
      File "/home/ordabayev/repos/funsor/funsor/interpretations.py", line 196,
    in interpret
        result = s.interpret(cls, *args)
      File "/home/ordabayev/repos/funsor/funsor/interpretations.py", line 155,
    in interpret
        return self.dispatch(cls, *args)(*args)
      File "/home/ordabayev/repos/funsor/funsor/cnf.py", line 323, in eager_contraction_tensor
        raise NotImplementedError("TODO")
    NotImplementedError: TODO
    
    enhancement 
    opened by ordabayevy 8
  • Fix nan gradients in TraceEnum_ELBO when evaluated eagerly

    Fixes nan gradients in TraceEnum_ELBO when evaluated eagerly (mentioned in https://github.com/pyro-ppl/funsor/issues/493).

    The problem seems to be that reflect.interpret in line 62 leads to substitutions that have adjoint as a base_interpretation, which in turn leads to some extra expressions being added to the tape. Here I fix it by changing interpretation to _old_interpretation.

    This also solves test_adjoint.py::test_sequential_sum_product_adjoint: xfail_param(MarkovProduct, reason="mysteriously doubles adjoint values?") (blocked by #545 ).

    awaiting review 
    opened by ordabayevy 7
  • Avoid creating distributions when inferring value domain

    Addresses #412 pair coded with @eb8680

    This refactors logic in Distribution._infer_value_domain() to avoid creating a temporary dummy distribution if backend distributions implement a .infer_shapes() method.

    awaiting review refactor 
    opened by fritzo 7
  • Switch to sqrt(precision) representation in Gaussian

    Resolves #567 Adapts @fehiepsi's https://github.com/pyro-ppl/pyro/pull/2019

    This switches the internal Gaussian representation to a numerically stable and space-efficient parametrization:

    - Gaussian(info_vec, precision, inputs)
    + Gaussian(white_vec, prec_sqrt, inputs)
    

    In the new parametrization, Gaussians represent the log-density function

    Gaussian(white_vec, prec_sqrt, OrderedDict(x=Reals[n]))
      = -1/2 || x @ prec_sqrt - white_vec ||^2
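
    Expanding the square recovers the information form: precision = prec_sqrt @ prec_sqrt.T and info_vec = prec_sqrt @ white_vec, up to a constant in x. A quick numeric check (a sketch, not funsor internals):

    import torch

    dim, rank = 4, 4
    prec_sqrt = torch.randn(dim, rank)
    white_vec = torch.randn(rank)
    x = torch.randn(dim)

    new = -0.5 * (x @ prec_sqrt - white_vec).pow(2).sum()
    precision = prec_sqrt @ prec_sqrt.T   # recovered precision matrix
    info_vec = prec_sqrt @ white_vec      # recovered information vector
    old = -0.5 * x @ precision @ x + x @ info_vec - 0.5 * white_vec.pow(2).sum()
    assert torch.allclose(new, old)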
    

    These two parameters are shaped to efficiently represent low-rank data:

    assert white_vec.shape == batch_shape + (rank,)
    assert prec_sqrt.shape == batch_shape + (dim, rank)
    

    reducing space complexity from O(dim(dim+1)) to O(rank(dim+1)). In my real-world example rank=1, dim=2369, and batch_shape=(1343,), so the space reduction is 30GB → 13MB.
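
    A back-of-the-envelope check of that reduction (float32, 4 bytes per element, using the numbers from the example above):

    batch, dim, rank = 1343, 2369, 1
    old_bytes = batch * (dim * dim + dim) * 4    # precision + info_vec ≈ 30 GB
    new_bytes = batch * (dim * rank + rank) * 4  # prec_sqrt + white_vec ≈ 13 MB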

    Computation is cheap in this representation: addition amounts to concatenation, and plate reduction amounts to transpose and reshape. Some ops are supported only on full-rank Gaussians, and I've added checks based on the new property .is_full_rank. This partial support is OK because the Gaussian funsors that arise in Bayesian models are all full rank, thanks to priors (notwithstanding numerical loss of rank).

    Because the Gaussian funsor is internal, the interface change should not cause breakage of most user code, since most user code uses to_funsor() and to_data() with backend-specific distributions. One broken piece of user code is Pyro's AutoGaussianFunsor which will need an update (and which will be sped up).

    As suggested by @eb8680, I've added some optional kwarg parametrizations and properties to support conversion to other Gaussian representations, e.g. g = Gaussian(mean=..., covariance=..., inputs=...) and g._mean, g._covariance. This allows more Gaussian math to live in gaussian.py.

    Tested

    • [x] added new tests
    • [x] pass existing funsor tests (linear algebra)
    • [x] pass existing funsor tests (patterns)
    • [x] pass NumPyro tests (conjugacy)
    • [x] pass Pyro tests on a branch (https://github.com/pyro-ppl/pyro/pull/2943)
    • [x] check computational cost on the pyro-cov model (results: under 12GB memory)
    awaiting review refactor 
    opened by fritzo 6
  • Feature Request: Add Support for TransformedDistribution with non-trivial event dimensions

    https://github.com/pyro-ppl/funsor/blob/1d07af18c21894dd56e2f4f877c7845430c3b729/funsor/distribution.py#L531

    Opening a feature request as per this comment by @fehiepsi

    Also note that the same issue happens when using TruncatedNormal distribution, see this comment.

    enhancement 
    opened by pankajb64 0
  • FR rules for (Gaussian + UniformPolygon).reduce(logaddexp)

    It would be great to have reduction rules for uniform likelihoods wrt Gaussian latent variables:

    • [ ] (MyContinuousDist + Uniform).reduce(logaddexp) implemented via .cdf()
    • [ ] (Gaussian + UniformPolygon).reduce(logaddexp) implemented via quadrature (MC or QMC)

    This could be used for Pyro and NumPyro probabilistic programs with constraint-like likelihoods, something like

    def model(data):
        z = pyro.sample("z", Normal(...))
        # Observe a constraint like lb < z < ub.
        pyro.sample("x", Uniform(...), obs=z)
    

    Here we can be lazy and compute the marginal likelihood via Normal(...).cdf([lb, ub]).
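
    Concretely, for the scalar case the lazy computation could reduce to something like this sketch (plain torch.distributions, with hypothetical bounds lb, ub):

    import torch
    from torch.distributions import Normal

    prior = Normal(0.0, 1.0)
    lb, ub = torch.tensor(-1.0), torch.tensor(2.0)  # hypothetical constraint bounds
    log_marginal = (prior.cdf(ub) - prior.cdf(lb)).log()  # log P(lb < z < ub)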

    enhancement 
    opened by fritzo 0
  • Delta integrate pattern

    (Separating this from #593 PR)

    This proposes the following logic for the Delta Integrate pattern:

    • If reduced_var is in integrand.inputs then apply substitution to delta and integrand:
    delta = Delta("x",  point, log_density)
    integrand = Variable("x")
    Integrate(delta, integrand, reduced_vars="x")
      => delta(x=point).exp() * integrand(x=point)
      => log_density.exp() * point
    

    where log_density can be a DiCE factor or an importance weight in general.

    • If reduced_var is not in integrand.inputs then just reduce delta:
    delta = Delta("x",  point, log_density)
    integrand = Number(3.0)
    Integrate(delta, integrand, reduced_vars="x")
      => delta.reduce(logaddexp, "x").exp() * integrand
      => 1 * 3.0
      => 3.0
    

    where delta is normalized.

    awaiting review 
    opened by ordabayevy 0
  • Provenance funsor

    Provenance tracking implementation in funsor. Here the provenance is the set of (name, point) tuples of RV samples that appear in the history of computation of the tracked term. Substitution results in multiplication by the density:

    >>> logp_x = Provenance(logp_value, {("x", point)})
    >>> logp_x(x=point)
    logp_value
    

    The Provenance funsor is intended to be used in https://github.com/pyro-ppl/pyro/pull/2893 as a wrapper for ProvenanceTensor.

    enhancement awaiting review 
    opened by ordabayevy 1
  • FR Thompson sampling example

    I came across this cute example of nested semiring dynamic programming in the context of Bayesian optimization. Thompson sampling first performs Bayesian regression, fitting a posterior p(θ|Xs,ys) over parameters θ given a fully-supervised dataset of (X,y) pairs, then repeatedly samples parameters θ and optimizes expected reward E[y|X,θ] over X. That is, at each step

    X_optimal <- arg max_X int_y int_θ y p(y|θ,X)  p(θ) prod_i p(yi|θ,Xi)
    # new choice                       --reward-- prior ---likelihood---
    

    For example, in Pyroed, p(θ) prod_i p(yi|θ,Xi) is approximated variationally, and E[y|θ,X] is just a tensor contraction (although there it is optimized via annealed Gibbs sampling rather than via dynamic programming; as an aside, we could add an annealed Gibbs sampling interpretation to Funsor for sum-product contractions).

    What would be a good example here, where we might leverage Funsor's dynamic programming to implement both the integral and argmax operators?

    examples 
    opened by fritzo 0
Releases (latest: 0.4.4)
  • 0.4.4(Dec 28, 2022)

  • 0.4.3(Mar 21, 2022)

    What's Changed

    • Update black by @fritzo in https://github.com/pyro-ppl/funsor/pull/583
    • Extend partial_sum_product() to cases tractable for Gaussians by @fritzo in https://github.com/pyro-ppl/funsor/pull/584
    • update classifiers by @anukaal in https://github.com/pyro-ppl/funsor/pull/586
    • Fix provenance tensor to work with PyTorch 1.11 by @fritzo in https://github.com/pyro-ppl/funsor/pull/588
    • Update to PyTorch 1.11.0 by @fritzo in https://github.com/pyro-ppl/funsor/pull/589
    • Bump to version 0.4.3 by @fritzo in https://github.com/pyro-ppl/funsor/pull/590

    New Contributors

    • @anukaal made their first contribution in https://github.com/pyro-ppl/funsor/pull/586

    Full Changelog: https://github.com/pyro-ppl/funsor/compare/0.4.2...0.4.3

  • 0.4.2(Dec 13, 2021)

    Enhancements

    • #543 Added ProvenanceTensor for torch backend
    • #543 Added Constant funsor
    • #549 Added funsor.recipes.forward_filter_backward_rsample()
    • #553 Added a Precondition interpretation for Gaussians, plus funsor.recipes.forward_filter_backward_precondition()
    • #572 Support Funsors in pprint.pprint() and pdb

    Breaking changes

    • #568 Switch to sqrt(precision) representation for Gaussian
  • 0.4.1(Jul 2, 2021)

  • 0.4.0(Jan 24, 2021)

    • Added new sum product algorithms (contributed by @ordabayevy)
    • Added many new conjugacy patterns
    • Automated op registration and made adding new ops easier
    • #419 Support ExtendedDistribution
    • #432 Support IndependentConstraint
    • #397 Allow .reduce(op, vars_not_in_inputs)
    • #430 Add Tuple Funsor
    • #409 Change sarkka_bilmes_product interface
  • 0.3.0(Oct 15, 2020)

    • #380 Drop support for Python 3.5 and PyTorch <=1.5
    • #352 New syntax for funsor.domain.Domains: bint(n) -> Bint[n], reals(*shape) -> Reals[shape]
    • #369 New StatefulInterpretation pattern to simplify context-dependent interpretations
  • 0.2.0(Jul 27, 2020)

    • Supports multiple backends: numpy (default), torch, and jax. To change the backend, use funsor.set_backend("torch") or funsor.set_backend("jax").
  • 0.1.2(Oct 24, 2019)

  • 0.1.1(Oct 24, 2019)

  • 0.1.0(Oct 23, 2019)
