Fast, flexible and easy to use probabilistic modelling in Python.

Overview

build Documentation Status Binder

Please consider citing the JMLR-MLOSS Manuscript if you've used pomegranate in your academic work!

pomegranate is a package for building probabilistic models in Python that is implemented in Cython for speed. A primary focus of pomegranate is to merge the easy-to-use API of scikit-learn with the modularity of probabilistic modeling to allow users to specify complicated models without needing to worry about implementation details. The models implemented here are built from the ground up with big data processing in mind and so natively support features like multi-threaded parallelism and out-of-core processing. Click on the binder badge above to interactively play with the tutorials!

Installation

pomegranate is pip-installable using pip install pomegranate and conda-installable using conda install pomegranate. If neither work, more detailed installation instructions can be found here.

If you get an error involving pomegranate/base.c, try installing with pip install --no-cache-dir pomegranate.

If you get an error involving pomegranate/distributions/NeuralNetworkWrapper.c: No such file or directory, try installing Cython first and then re-installing.

Models

The discrete Bayesian networks also support novel work on structure learning in the presence of constraints through a constraint graph. These constraints can dramatically speed up structure learning through the use of loose general prior knowledge, and can frequently make the exact learning task take only polynomial time instead of exponential time. See the PeerJ manuscript for the theory and the pomegranate tutorial for the practical usage!

To support the above algorithms, it has efficient implementations of the following:

  • Kmeans/Kmeans++/Kmeans||
  • Factor Graphs

Features

Please take a look at the tutorials folder, which includes several tutorials on how to effectively use pomegranate!

See the website for extensive documentation, API references, and FAQs about each of the models and supported features.

No good project is done alone, and so I'd like to thank all the previous contributors to YAHMM, and all the current contributors to pomegranate, including the graduate students who share my office I annoy on a regular basis by bouncing ideas off of.

Dependencies

pomegranate requires:

- Cython (only if building from source)
- NumPy
- SciPy
- NetworkX
- joblib

To run the tests, you also must have nose installed.

Contributing

If you would like to contribute a feature then fork the master branch (fork the release if you are fixing a bug). Be sure to run the tests before changing any code. You'll need to have nosetests installed. The following command will run all the tests:

python setup.py test

Let us know what you want to do just in case we're already working on an implementation of something similar. This way we can avoid any needless duplication of effort. Also, please don't forget to add tests for any new functions.

Comments
  • Fitting mixture of Gaussians throws error

    Fitting mixture of Gaussians throws error

    Since upgrading to pomegranate 0.9.0, code that runs in version 0.7.7 now throws an error.

    Fitting a mixture of MultivariateGaussianDistributions with either fit() or from_samples() e.g.

    pom_model = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, n_components=10, X=data, verbose=True, stop_threshold=0.01)
    

    results in the following:

    Traceback (most recent call last):
      File "pomegranate/distributions.pyx", line 2254, in pomegranate.distributions.MultivariateGaussianDistribution.from_summaries (pomegranate/distributions.c:40879)
      File "[path]/venv/lib/python3.5/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky
        check_finite=check_finite)
      File "[path]/venv/lib/python3.5/site-packages/scipy/linalg/decomp_cholesky.py", line 40, in _cholesky
        "definite" % info)
    numpy.linalg.linalg.LinAlgError: 1-th leading minor of the array is not positive definite
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "[path]/venv/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-9-2d0481bd9d41>", line 2, in <module>
        verbose=True, stop_threshold=0.01)
      File "pomegranate/gmm.pyx", line 547, in pomegranate.gmm.GeneralMixtureModel.from_samples (pomegranate/gmm.c:9465)
      File "pomegranate/distributions.pyx", line 2284, in pomegranate.distributions.MultivariateGaussianDistribution.from_samples (pomegranate/distributions.c:41830)
      File "pomegranate/distributions.pyx", line 205, in pomegranate.distributions.Distribution.fit (pomegranate/distributions.c:5382)
      File "pomegranate/distributions.pyx", line 2259, in pomegranate.distributions.MultivariateGaussianDistribution.from_summaries (pomegranate/distributions.c:41197)
    TypeError: Cannot cast ufunc subtract output from dtype('complex128') to dtype('float64') with casting rule 'same_kind'
    

    I vaguely suspect this might have something to do with the following entry from the 0.9.0 changelog:

    • Fixed an issue with multivariate Gaussian distributions where the covariance matrix is no longer invertible with enough missing data by subtracting the smallest eigenvalue from the diagonal

    It might also be worth mentioning that I installed pomegranate and its dependencies in my venv using PyCharm's built-in package-manager.

    Package versions are:

    pomegranate==0.9.0
    numpy==1.14.2
    scipy==1.0.1
    networkx==1.9.1
    joblib==0.11
    

    I tried out different package version combinations and concluded that the above code runs normally if pomegranate==0.7.7 and the other packages remain as listed above.

    I look forward to your input!

    opened by jojanzing 35
  • ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

    The input for labels needs to be more clear as to what it's looking for. Currently I have

         modelWave = HiddenMarkovModel('Gestures').from_samples(NormalDistribution, 3, training, labels=[sw0, sw1, sw2, sw3], algorithm='labeled')
    Y = modelWave.predict(test, algorithm='viterbi')
    

    where sw0, sw1, ... are states. I get the error shown in the title. What exactly is the label argument supposed to hold?

    opened by sali45 30
  • Hmm.method: Probabilty or Likelihood?

    Hmm.method: Probabilty or Likelihood?

    Hey guys,

    It is not my first issue post and I know it is time consuming to answer all these posts, so I just want to thank you for your help in advance! I’ll really appreciate it and hope its solutions help some of you too!!

    According to the docs the “hmm.log_probability()”-function returns the likelihood: 𝑃(𝐷|𝑀). But in some Issue-cases these two values are equivalently used #381,#336 and #233. Can you help me check if my understanding is correct about the results and their definition space?

    1. The.log_probability uses hmm.forward() algorithm. This means that the results of the hmm.forward() algorithm are likelihoods, likewise hmm.backward() algorithm?

    2. Since theForward-Backward Algorithm is based on the forward- and backward-algorithm it’s returned emission values are likelihoods as well? This is what I understand from the wikipedia following the link in the Code of the hmm.pyx file.

    3. Is the description of the result of the forward-backward-function, which says it returns

    “The normalized probabilities of each state generating each emission.”

    (hmm.pyx: line 1772). Is this a prediction?

    1. Since many methods are using the _log_probability function of the used distribution (like the _forward/_backward or the _predict_log_proba), is it true that this function is calculating the prediction? (e.g. in the NormalDistribution.pyx file: _log_probability is computing the prediction of the Normal-density-function given the Parameters)

    2. hmm.Predict_proba() is giving the probability; according to the docs. It is the exp(predict_log_proba(sequence)). The explanations are saying:

    “The normalized probabilities of each state generating each emission.”

    ( hmm.pyx line 2170), thus a prediction?

    1. Having a look at the code of predict_log_proba(sequence), one sees that it is using _log_proba using _predict_log_proba using _log_probability. I’m not quite sure how the values of_log_probabilty are used, but these are used in the forward-backward algorithm again. Thus a likelihood? (according question 2)

    2. Same with hmm.predict, which uses the method maximum_a_posteriori which again uses .forward-backward() (hmm.pyx line 2278). So, the sequence of states which are returned are based on likelihood?

    3. Nethertheless, having a look at bayes.pyx the .predict_proba is descriped as:

    “posterior P(M|D)…The normalized probability P(M|D) for each sample. This is the probability that the sample was generated from each component.”

    Thus the .predict_proba Is giving the probability for GMMs and HMMs?

    A quick yes/no to the questions 1)-8) would already be enough. Thanks a lot in advanced!!

    1. The function hmm.predict() uses the function maximum_a_posteriori, which leads to conclusion 7. It probably seems petty, but if I want to compare the probability of a state giving a sample, it is quite important. To be precise, is the value from MultivariateGaussianDistribution[0].log_probabiltity(Sequence) with hmm.forward_backward(Sequence)[1][:,0] comparable if the Transistion matrix is equally filled?

    2. For the comparison it is additionally important to know which methoad for normalizing the probability was used for the results of the forward-backward algorithm. (Hmm.pyd: line 1752)

    3. One more thing, which was mentioned in the #381 and #336 and connected to question 10: is there a way to standardize the output into a probability space from 0-1?

    opened by B4marc 24
  • Fitting Bayesian Networks in parallel

    Fitting Bayesian Networks in parallel

    Bayesian networks cannot be fit using the parallel functionality provided. Is it viable to add parallelization to the summarization process done in the "summarize" function? It would be beneficial, while this functionality is not implemented, to inform the user with appropriate error messages whenever parallelization functions are used to fit Bayesian networks.

    opened by UndeadKernel 23
  • Fitting a continuous multivariate HMM from labeled data

    Fitting a continuous multivariate HMM from labeled data

    Hi,

    I'm interested in fitting a Multivariate Gamma HMM from labeled data. The observations are from one long time-series, so right now I'm treating it as a single sequence.

    I have a dataset with observations that are 6-dimensional from independent Gamma distributions.

    X: (N,6) numpy array of dtype float
        N is the number of samples and 6 is the number of features.
    

    The dataset also includes labels for the 4 hidden states.

    labels: (N,) numpy array of dtype int 
        Labels from the set [0,1,2,3].
    

    The documentation and tutorials are great for fitting an event HMM to unlabeled data and for a multivariate GMM, but I'm having a surprisingly difficult time with a continuous multivariate HMM.

    It seems that the from_samples() method is the best to use

    model = HiddenMarkovModel()
    model.from_sample(distribution=GammaDistribution, X=X, labels=labels, algorithm='labeled')
    

    but I get TypeError: from_samples() takes at least 3 position arguments (1 given).

    Everything is so nicely set up, I'm sure there's an easy solution I'm missing. Any suggetions? Hopefully this will help someone else looking to fit a continuous multivariate HMM from labeled data.

    Thanks

    opened by sdk29 20
  • incompatibilities with the upcoming networkx 2.0

    incompatibilities with the upcoming networkx 2.0

    networkx 2.0 switches graph.nodes() to return a generator rather than a list (see https://networkx.readthedocs.io/en/latest/reference/release_2.0.html#api-changes), so constructs such as len(self.graph.nodes()) (https://github.com/jmschrei/pomegranate/blob/master/pomegranate/hmm.pyx#L693) will fail with

    TypeError: object of type 'generator' has no len()
    
    opened by anntzer 20
  • sample weights for GMM

    sample weights for GMM

    Does GeneralMixtureModel.fit() support a weight on the training samples? It returns no error, but the same result (independently on the weights provided). And no error is returned, even when weights has wrong shapes:

    #code snipet
    
    import numpy as np
    from pomegranate import GeneralMixtureModel
    from pomegranate import MultivariateGaussianDistribution
    
    X = np.array([[1, 2],[2, 2],[3, 4]])
    w = np.zeros((356,))
    
    gmm = GeneralMixtureModel(MultivariateGaussianDistribution, n_components=1)
    gmm.fit(X, weights=w)
    preds = gmm.predict(X)
    print preds
    
    

    return no errors.

    opened by ngoix 20
  • Add shortcuts for missing data

    Add shortcuts for missing data

    The following benchmarking program shows that on sklearn's "digits" dataset, which is their dataset most suited for Bayesian analysis, adding these shortcuts reduces the total execution time by about 10% when there are missing values and does not increase the execution time when there are no missing values.

    Interestingly, I could not reproduce the missing values slowdown on this dataset that I see with our own dataset or a randomly generated dataset. Nonetheless, the benchmarks show that there is a significant benefit to adding shortcuts.

    Script:

    import numpy as np
    from neurtu import delayed, Benchmark
    from pomegranate import BayesianNetwork
    from random import random
    from sklearn.datasets import load_digits
    
    X = load_digits(return_X_y=True)[0][:,:10]
    train = delayed(BayesianNetwork)().from_samples(X)
    
    print('Without missing values:')
    print(Benchmark(wall_time=True, cpu_time=True, repeat=3)(train))
    print()
    
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            if random() <= 0.1:
                X[i][j] = np.nan
    
    print('With missing values:')
    print(Benchmark(wall_time=True, cpu_time=True, repeat=3)(train))
    print()
    

    Without NaN shortcuts:

    Without missing values:
          wall_time  cpu_time
    mean  11.357033  7.907993
    max   11.491907  7.944401
    std    0.117067  0.031715
    
    With missing values:
          wall_time  cpu_time
    mean  10.816061  7.586988
    max   10.831504  7.742489
    std    0.013388  0.148912
    

    With NaN shortcuts:

    Without missing values:
          wall_time  cpu_time
    mean  11.586937  7.937418
    max   12.074349  8.004921
    std    0.422112  0.067759
    
    With missing values:
          wall_time  cpu_time
    mean   9.708813  6.849663
    max    9.726768  6.869913
    std    0.020418  0.025905
    

    Please let me know if there's anything else you need.

    opened by alexhenrie 19
  • Learning algorithms ?

    Learning algorithms ?

    @jmschrei, excellent work with the API - its the most intuitive amongst the several pgm libraries I have used. Are there plans of implementing learning algorithms in pomegranate ?

    enhancement v1.0.0 
    opened by ronojoy 19
  • Hidden Markov model with Gaussian emissions not training properly

    Hidden Markov model with Gaussian emissions not training properly

    I am trying to fit a time series of a chaotic system with a 3 state hidden markov model with gaussian emissions. I format the data as a numpy array called input_data, and run the code below:

    `

    In[1]: input_data.shape
    Out[1]: (1, 200001, 6)
    In[2]: model=HiddenMarkovModel()
    In[3]: gaussmodel=model.from_samples(MultivariateGaussianDistribution, n_components=3, X=input_data)
    

    `

    Unfortunately the returned states of the model are not accurate, all components of the gaussian means are nearly identical and strictly descending absolute value:

    `

    In[4]: gaussmodel.states[1].distribution.parameters[0]
    Out[4]: 
    [-0.09880035589313062,
     -0.0979069486147086,
     -0.09692864738743179,
     -0.09590506750933378,
     -0.09484837637195158,
     -0.09376959974296883
    
    In[5]: gaussmodel.states[0].distribution.parameters[0]
    Out[5]: 
    [0.25390719830239683,
     0.2529250438880774,
     0.25183304658880706,
     0.25062083419641734,
     0.24927706774681227,
     0.24778986004254214]`
    

    The data set has no such symmetry to it, and in fact these mean values are in some cases totally outside the range of values present in the data, so I do not think it is a problem with the kmeans initialisation of the states.

    opened by joshdorrington 18
  • How to make mixture of HMM

    How to make mixture of HMM

    I want to train mixture of HMM... My data is as follows: [ [11281100, 11009101, 10527100, 10527100, 10103101, 10103101], [11281100,10642100,10395100,11130100,10396101,10740100,10828101,10828101,10828101,10828101, 11009101,10828101,10896100,11006101,10232100] ] Note : Data is categorical. Code I have to train mixture of HMM and for that I have to train each HMM individually.

    hmm1 = HiddenMorkoveModel.from_sample(DiscreteDistribution, n_components = 3, X = X) hmm2 = HiddenMorkoveModel.from_sample(DiscreteDistribution, n_components = 3, X = X) hmm3 = HiddenMorkoveModel.from_sample(DiscreteDistribution, n_components = 3, X = X) hmm4 = HiddenMorkoveModel.from_sample(DiscreteDistribution, n_components = 3, X = X)

    then GMM step: model = GeneralMixtureModel([hmm1, hmm2, hmm3, hmm4]) model.fit(X)

    Here X is new observation sequence.

    but the above step "fit" says: nan i.e. [1] Improvement: nan Time (s): 0.001003 Total Improvement: nan Total Time (s): 0.0050 nan

    And when I tried to: model.predict_proba(np.array([11281100,10642100,10395100,11130100,10396101,10740100,10828101])) Output: array([[ nan, nan, nan, nan], [ nan, nan, nan, nan], [ nan, nan, nan, nan], [ nan, nan, nan, nan], [ nan, nan, nan, nan], [ nan, nan, nan, nan], [ nan, nan, nan, nan]])

    Can you please tell me what is the issue ?

    Thanks

    opened by amitkumarx86 18
  • [BUG] Pomegranate can not be pip installed on Python 3.11

    [BUG] Pomegranate can not be pip installed on Python 3.11

    Description When I try to pip install pomegranate on Python 3.11, it starts building wheels and eventually terminates with the following error:

          gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.11 -I/tmp/pip-build-env-9ivz0_ud/overlay/lib/python3.11/site-packages/numpy/core/include -c pomegranate/MarkovChain.c -o build/temp.linux-x86_64-cpython-311/pomegranate/MarkovChain.o
          pomegranate/MarkovChain.c:208:12: fatal error: longintrepr.h: No such file or directory
            208 |   #include "longintrepr.h"
                |            ^~~~~~~~~~~~~~~
          compilation terminated.
          error: command '/usr/bin/gcc' failed with exit code 1
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for pomegranate
    Failed to build pomegranate
    ERROR: Could not build wheels for pomegranate, which is required to install pyproject.toml-based projects
    

    This error might be related to Cython, looking at this Cython pull request.

    To Reproduce

    docker run python:3.11 pip3 install pomegranate
    
    opened by hylkedonker 0
  • refactored _log_probability function to avoid gil in multithreading

    refactored _log_probability function to avoid gil in multithreading

    I noticed that when using training HiddenMarkovModel with emissions composed by ExponentialDistribution wrapped in IndependentComponentsDistribution, the training was orders of magnitude slower than when using ExponentialDistribution directly. I was able to replicate the same behaviour also with NormalDistribution and so I narrowed it down to IndependentComponentsDistribution.

    The slower speed is noticable only when using multithreading in fit (n_jobs > 1). I was able to narrow down the cause to the function _log_probability of IndependentComponentsDistribution. The following version of the function leads to slow behavior:

    cdef void _log_probability(self, double* X, double* log_probability, int n) nogil:
            cdef int i, j
            cdef double logp
            
            memset(log_probability, 0, n*sizeof(double))
            
            for i in range(n):
    	        for j in range(self.d):
                              if self.cython == 1:
    	                          (<Model> self.distributions_ptr[j])._log_probability(X+i*self.d+j, &logp, 1)
                              else:
    	                          with gil:
    		                          python_log_probability(self.distributions[j], X+i*self.d+j, &logp, 1)
    
                              log_probability[i] += logp * self.weights_ptr[j]
    

    But the following is fast

    cdef void _log_probability(self, double* X, double* log_probability, int n) nogil:
            cdef int i, j
            cdef double logp
            
            memset(log_probability, 0, n*sizeof(double))
            
            for i in range(n):
    	        for j in range(self.d):
                              if True or self.cython == 1: # notice the True here
    	                          (<Model> self.distributions_ptr[j])._log_probability(X+i*self.d+j, &logp, 1)
                              else:
    	                          with gil:
    		                          python_log_probability(self.distributions[j], X+i*self.d+j, &logp, 1)
    
                              log_probability[i] += logp * self.weights_ptr[j]
    

    This happens even though self.cython is always 1, and so the else block is never executed. Moreover, this is also slow:

    cdef void _log_probability(self, double* X, double* log_probability, int n) nogil:
            cdef int i, j
            cdef double logp
            
            memset(log_probability, 0, n*sizeof(double))
            
            for i in range(n):
    	        for j in range(self.d):
                              if  self.cython == 1 or True: # notice the True here with reverse order
    	                          (<Model> self.distributions_ptr[j])._log_probability(X+i*self.d+j, &logp, 1)
                              else:
    	                          with gil:
    		                          python_log_probability(self.distributions[j], X+i*self.d+j, &logp, 1)
    
                              log_probability[i] += logp * self.weights_ptr[j]
    

    I think that branch in the for loop somehow makes it harder for cython to compile efficient code. Here a complete reproducible example that elicits the slow behaviour:

    from pomegranate.distributions import ExponentialDistribution, IndependentComponentsDistribution
    from pomegranate.base import State
    from pomegranate.io import SequenceGenerator
    from pomegranate.hmm import HiddenMarkovModel
    import numpy as np
    
    def get_model(A_d, B_d):
        model = HiddenMarkovModel()
       
        A_s = State(A_d, name="A")
        B_s = State(B_d, name="B")
    
        model.add_states([A_s, B_s])
    
        model.add_transition(
            a=model.start,
            b=A_s,
            probability=0.5,
        )
        model.add_transition(
            a=model.start,
            b=B_s,
            probability=0.5,
        )
        model.add_transition(
            a=A_s,
            b=A_s,
            probability=0.99,
        )
        model.add_transition(
            a=B_s,
            b=B_s,
            probability=0.99,
        )
        model.add_transition(
            a=A_s,
            b=B_s,
            probability=0.01,
        )
        model.add_transition(
            a=B_s,
            b=A_s,
            probability=0.01,
        )
    
        model.bake(verbose=True)
        
        return model
    
    model = get_model(
        IndependentComponentsDistribution(
            [ExponentialDistribution(1), ExponentialDistribution(1), ExponentialDistribution(1)]
        ),
        IndependentComponentsDistribution(
            [ExponentialDistribution(100), ExponentialDistribution(100), ExponentialDistribution(100)]
        )
    )
    X = np.zeros((1000, 3))
    X[:500] = 1
    X[500:] = 1/100
    X_set = [X]*1000
    model.fit(X_set, min_iterations=10, max_iterations=10, verbose=True, n_jobs=-1)
    
    model = get_model(
        ExponentialDistribution(1),
        ExponentialDistribution(100)
    )
    X = np.zeros((1000))
    X[:500] = 1
    X[500:] = 1/100
    X_set = [X]*1000
    model.fit(X_set, min_iterations=10, max_iterations=10, verbose=True, n_jobs=-1)
    

    The first fit call is slow with the original pomegranate code:

    [1] Improvement: 8114.942703745794	Time (s): 6.926
    [2] Improvement: 0.0011873976327478886	Time (s): 8.598
    [3] Improvement: 1.3969838619232178e-09	Time (s): 8.284
    [4] Improvement: -6.426125764846802e-08	Time (s): 8.033
    [5] Improvement: 6.472691893577576e-08	Time (s): 8.972
    [6] Improvement: -6.472691893577576e-08	Time (s): 8.011
    [7] Improvement: 6.472691893577576e-08	Time (s): 7.855
    [8] Improvement: -6.472691893577576e-08	Time (s): 8.366
    [9] Improvement: 6.472691893577576e-08	Time (s): 8.269
    [10] Improvement: -6.472691893577576e-08	Time (s): 8.243
    Total Training Improvement: 8114.943891080562
    Total Training Time (s): 88.2151
    

    The second fit call without IndependentComponentsDistribution is fast:

    [1] Improvement: 8111.520942713367	Time (s): 0.1579
    [2] Improvement: 1.6850003311410546	Time (s): 0.1475
    [3] Improvement: 0.00018647708930075169	Time (s): 0.1554
    [4] Improvement: 1.3271346688270569e-08	Time (s): 0.1509
    [5] Improvement: -1.234002411365509e-08	Time (s): 0.1648
    [6] Improvement: 2.3283064365386963e-10	Time (s): 0.1637
    [7] Improvement: 0.0	Time (s): 0.1708
    [8] Improvement: 0.0	Time (s): 0.1509
    [9] Improvement: 0.0	Time (s): 0.1576
    [10] Improvement: 0.0	Time (s): 0.1464
    Total Training Improvement: 8113.206129522761
    Total Training Time (s): 1.7545
    

    With the modifications in this commit, the slow fit call becomes fast:

    [1] Improvement: 8114.942703745794	Time (s): 0.1773
    [2] Improvement: 0.0011873976327478886	Time (s): 0.1736
    [3] Improvement: 1.3969838619232178e-09	Time (s): 0.1783
    [4] Improvement: -6.426125764846802e-08	Time (s): 0.1667
    [5] Improvement: 6.472691893577576e-08	Time (s): 0.1846
    [6] Improvement: -6.472691893577576e-08	Time (s): 0.1884
    [7] Improvement: 6.472691893577576e-08	Time (s): 0.1826
    [8] Improvement: -6.472691893577576e-08	Time (s): 0.1638
    [9] Improvement: 6.472691893577576e-08	Time (s): 0.1868
    [10] Improvement: -6.472691893577576e-08	Time (s): 0.1695
    Total Training Improvement: 8114.943891080562
    Total Training Time (s): 1.9820
    
    opened by saulpierotti 0
  • File

    File "pomegranate/hmm.pyx", line 1047, in pomegranate.hmm.HiddenMarkovModel.bake UnboundLocalError: local variable 'dist' referenced before assignment

    Describe the bug Trying to do a supervised training with pomegranate.hmm.HiddenMarkovModel. Expecting to train the model and test on observations. However this is the err even though not using dist:

    Pomegranate Version: 0.14.4:

        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-8-391a7a4c3790>", line 2, in <module>
        model.bake()
      File "pomegranate/hmm.pyx", line 1047, in pomegranate.hmm.HiddenMarkovModel.bake
    UnboundLocalError: local variable 'dist' referenced before assignment
    
    

    To Reproduce

        model = HiddenMarkovModel()
        model.bake() # error here
        model.fit([3.2, 6.7, 10.55, 19.55], labels=[1, 2, 3, 4], algorithm='labeled') 
        all_pred = model.predict([2.33, 1.22, 1.4, 10.6])
    

    Thank you so much for your thoughts on this.

    opened by smalik89 3
  • Wrong Markov Chain initial state distribution

    Wrong Markov Chain initial state distribution

    Hello everybody!

    Can anyone please explain me the initial state distribution resulting from .from_samples()?

    Thank you!

    import numpy as np
    from pomegranate import *
    
    np.random.seed(137)
    seq_data = np.random.randint(0, 10, (1, 1000))
    
    markov_chain = MarkovChain.from_samples(seq_data)
    
    print(markov_chain.distributions[0])
    
    n = seq_data.shape[0] * seq_data.shape[1]
    
    for i in range(10):
        print(i, ':', len(np.where(seq_data == i)[0]) / n)
    

    image

    opened by angelo5d0 0
  • Model.predict() gives only minus ones

    Model.predict() gives only minus ones

    I am new to pomegranate, and I want to build a HMM with two hidden states. I have labels, and a sequence of series of observations, most of which come from a multinomial distribution (k = 4, n = 60). (There is also one that comes from a wrapped normal distribution—see issue #1002—but it can be forgotten for now.) Any calls to predict() result in just a list of minus ones, and sometimes the warning "Sequence is impossible":

    import numpy as np
    from pomegranate import HiddenMarkovModel, DirichletDistribution
    
    # label = np.array([0, 0, 0, ..., 1, 1, 1])
    # data = np.array([[60, 0, 0, 0], [58, 2, 0, 0], ...])
    dists = [DirichletDistribution.from_samples(data[np.where(label == 0)]),
             DirichletDistribution.from_samples(data[np.where(label == 1)])]
    trans_mat = np.array([[0.999, 0.001],
                          [0.002, 0.998]])
    starts = np.array([0.5, 0.5])
    model = HiddenMarkovModel.from_matrix(trans_mat, dists, starts)
    model.predict(data)
    # [-1, -1, -1, ...]
    

    What am I missing here? (If I create the model with HiddenMarkovModel.from_samples() instead, predict(), fit(), etc. result in a segfault, but I didn't file a bug report yet, since I think I am just doing something wrong.

    opened by mhavu 6
Releases(0.4.0)
  • 0.4.0(Mar 30, 2016)

  • 0.0.2(Jan 4, 2015)

    I am releasing the code for HMMs, FSMs, and discrete Bayesian networks for public debugging. The development for HMMs and FSMs is probably finished, but any bugs or feedback are still welcome! Predominately looking for feedback on the Bayesian network implementation and usage.

    Source code(tar.gz)
    Source code(zip)
Owner
Jacob Schreiber
I am a post-doc at Stanford University (previously University of Washington), studying large scale machine learning and computational biology
Jacob Schreiber
This program will stylize your photos with fast neural style transfer.

Neural Style Transfer (NST) Using TensorFlow Demo TensorFlow TensorFlow is an end-to-end open source platform for machine learning. It has a comprehen

Ismail Boularbah 1 Aug 08, 2022
Fight Recognition from Still Images in the Wild @ WACVW2022, Real-world Surveillance Workshop

Fight Detection from Still Images in the Wild Detecting fights from still images is an important task required to limit the distribution of social med

Şeymanur Aktı 10 Nov 09, 2022
Code repository for "Free View Synthesis", ECCV 2020.

Free View Synthesis Code repository for "Free View Synthesis", ECCV 2020. Setup Install the following Python packages in your Python environment - num

Intelligent Systems Lab Org 253 Dec 07, 2022
Paddle-Skeleton-Based-Action-Recognition - DecoupleGCN-DropGraph, ASGCN, AGCN, STGCN

Paddle-Skeleton-Action-Recognition DecoupleGCN-DropGraph, ASGCN, AGCN, STGCN. Yo

Chenxu Peng 3 Nov 02, 2022
Stroke-predictions-ml-model - Machine learning model to predict individuals chances of having a stroke

stroke-predictions-ml-model machine learning model to predict individuals chance

Alex Volchek 1 Jan 03, 2022
FactSeg: Foreground Activation Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery (TGRS)

FactSeg: Foreground Activation Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery by Ailong Ma, Junjue Wang*, Yanfei Zhon

Kingdrone 43 Jan 05, 2023
A blender add-on that automatically re-aligns wrong axis objects.

Auto Align A blender add-on that automatically re-aligns wrong axis objects. Usage There are three options available in the 3D Viewport Sidebar It

29 Nov 25, 2022
[ICML 2021] Towards Understanding and Mitigating Social Biases in Language Models

Towards Understanding and Mitigating Social Biases in Language Models This repo contains code and data for evaluating and mitigating bias from generat

Paul Liang 42 Jan 03, 2023
LSTM-VAE Implementation and Relevant Evaluations

LSTM-VAE Implementation and Relevant Evaluations Before using any file in this repository, please create two directories under the root directory name

Lan Zhang 5 Oct 08, 2022
EfficientDet (Scalable and Efficient Object Detection) implementation in Keras and Tensorflow

EfficientDet This is an implementation of EfficientDet for object detection on Keras and Tensorflow. The project is based on the official implementati

1.3k Dec 19, 2022
Causal Imitative Model for Autonomous Driving

Causal Imitative Model for Autonomous Driving Mohammad Reza Samsami, Mohammadhossein Bahari, Saber Salehkaleybar, Alexandre Alahi. arXiv 2021. [Projec

VITA lab at EPFL 8 Oct 04, 2022
Bayesian optimisation library developped by Huawei Noah's Ark Library

Bayesian Optimisation Research This directory contains official implementations for Bayesian optimisation works developped by Huawei R&D, Noah's Ark L

HUAWEI Noah's Ark Lab 395 Dec 30, 2022
Weakly- and Semi-Supervised Panoptic Segmentation (ECCV18)

Weakly- and Semi-Supervised Panoptic Segmentation by Qizhu Li*, Anurag Arnab*, Philip H.S. Torr This repository demonstrates the weakly supervised gro

Qizhu Li 159 Dec 20, 2022
Implementation of Kaneko et al.'s MaskCycleGAN-VC model for non-parallel voice conversion.

MaskCycleGAN-VC Unofficial PyTorch implementation of Kaneko et al.'s MaskCycleGAN-VC (2021) for non-parallel voice conversion. MaskCycleGAN-VC is the

86 Dec 25, 2022
DTCN IJCAI - Sequential prediction learning framework and algorithm

DTCN This is the implementation of our paper "Sequential Prediction of Social Me

Bobby 2 Jan 24, 2022
[CVPR 2022] TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing (CVPR 2022) This repository provides the official PyTorch impleme

Billy XU 128 Jan 03, 2023
Convolutional Neural Network for 3D meshes in PyTorch

MeshCNN in PyTorch SIGGRAPH 2019 [Paper] [Project Page] MeshCNN is a general-purpose deep neural network for 3D triangular meshes, which can be used f

Rana Hanocka 1.4k Jan 04, 2023
Implementation of Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

advantage-weighted-regression Implementation of Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, by Peng et al. (

Omar D. Domingues 1 Dec 02, 2021
This is a classifier which basically predicts whether there is a gun law in a state or not, depending on various things like murder rates etc.

Gun-Laws-Classifier This is a classifier which basically predicts whether there is a gun law in a state or not, depending on various things like murde

Awais Saleem 1 Jan 20, 2022
A production-ready, scalable Indexer for the Jina neural search framework, based on HNSW and PSQL

🌟 HNSW + PostgreSQL Indexer HNSWPostgreSQLIndexer Jina is a production-ready, scalable Indexer for the Jina neural search framework. It combines the

Jina AI 25 Oct 14, 2022