scikit-survival is a Python module for survival analysis built on top of scikit-learn.

Overview

scikit-survival

scikit-survival is a Python module for survival analysis built on top of scikit-learn. It lets you perform survival analysis while using scikit-learn's tooling, e.g., for pre-processing or cross-validation.

About Survival Analysis

The objective in survival analysis (also referred to as time-to-event or reliability analysis) is to establish a connection between covariates and the time of an event. What distinguishes survival analysis from traditional machine learning is that parts of the training data can only be partially observed – they are censored.

For instance, in a clinical study, patients are often monitored for a particular time period, and events occurring during this period are recorded. If a patient experiences an event, the exact time of the event can be recorded – the patient’s record is uncensored. In contrast, right-censored records refer to patients who remained event-free during the study period, so it is unknown whether an event occurred after the study ended. Consequently, survival analysis demands models that take this characteristic of such datasets into account.
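To make the role of censoring concrete, here is a minimal pure-Python sketch of the Kaplan-Meier estimator. It is an illustration only, not scikit-survival's implementation; the function name and the five example patients are made up. Censored subjects contribute to the number at risk until their censoring time but never count as events.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- observed time per subject (event or censoring time)
    events -- True if the event was observed, False if right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every subject leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Only observed events reduce the survival estimate.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored subjects both shrink the risk set.
        at_risk -= leaving[t]
    return curve


# Five hypothetical patients; the second and fifth are right-censored.
times = [2, 3, 4, 5, 8]
events = [True, False, True, True, False]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

Dropping the censored patients instead would shrink the risk set too early and bias the estimate, which is why censoring-aware methods are needed.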

Requirements

  • Python 3.7 or later
  • ecos
  • joblib
  • numexpr
  • numpy 1.16 or later
  • osqp
  • pandas 0.25 or later
  • scikit-learn 0.24
  • scipy 1.0 or later
  • C/C++ compiler
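
One way to see which of the listed Python packages are present in the current environment is to query the installed distribution metadata. The sketch below uses only the standard library (`importlib.metadata`, which requires Python 3.8 or later); the helper name `installed_versions` is hypothetical, and it does not check the minimum versions or the C/C++ compiler.

```python
from importlib import metadata

# Distribution names taken from the requirements list above.
REQUIRED = ["ecos", "joblib", "numexpr", "numpy", "osqp",
            "pandas", "scikit-learn", "scipy"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:>12}: {version or 'not installed'}")
```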

Installation

The easiest way to install scikit-survival is to use Anaconda by running:

conda install -c sebp scikit-survival

Alternatively, you can install scikit-survival from source following this guide.

Examples

The user guide provides in-depth information on the key concepts of scikit-survival, an overview of available survival models, and hands-on examples in the form of Jupyter notebooks.

Help and Support

Documentation

Bug reports

  • If you encountered a problem, please submit a bug report.

Questions

  • If you have a question on how to use scikit-survival, please use GitHub Discussions.
  • For general theoretical or methodological questions on survival analysis, please use Cross Validated.

Contributing

New contributors are always welcome. Please have a look at the contributing guidelines on how to get started and to make sure your code complies with our guidelines.

References

Please cite the following paper if you are using scikit-survival.

S. Pölsterl, "scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn," Journal of Machine Learning Research, vol. 21, no. 212, pp. 1–6, 2020.
@article{sksurv,
  author  = {Sebastian P{\"o}lsterl},
  title   = {scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {212},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/20-729.html}
}
Comments
  • CoxPH SurvivalAnalysis and Singular Matrix Error

    I'm going through the tutorial using the veterans lung cancer study and I am using the same code for my own dataset for Cox regression. My problem is calculating the days to graft failure after a transplant; the dataset has about 900 features after encoding and other preprocessing steps, and 130K rows. I prepared the data for Cox regression (data_x is a dataframe and data_y is a numpy array of status and survival_in_days) and took a sample of it to run. However, when I run the Cox regression, I get the error LinAlgError: Matrix is Singular. I manipulated my data in different ways, but I could not understand what the problem is or how to solve it.
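
A common cause of a singular matrix in Cox regression is linearly dependent feature columns. A typical example is a categorical variable that is one-hot encoded with all levels kept, because the Cox partial likelihood does not change when a constant is added to the linear predictor. Whether this applies to the dataset in this report is not known. The pure-Python sketch below (hypothetical helper names, suitable only for small matrices) counts redundant directions by computing the rank of the feature matrix with a column of ones prepended.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination."""
    m = [list(map(float, r)) for r in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(m)),
                    key=lambda r: abs(m[r][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def rank_deficiency(rows, tol=1e-9):
    """Number of redundant directions in [1 | X]; 0 means none found."""
    augmented = [[1.0] + list(r) for r in rows]
    return len(augmented[0]) - matrix_rank(augmented, tol)


# Made-up data: age plus a three-level category with every level encoded.
full_encoding = [[69, 1, 0, 0], [64, 0, 1, 0], [38, 0, 0, 1],
                 [63, 1, 0, 0], [50, 0, 1, 0]]
# Same data with the first dummy column dropped.
reduced_encoding = [row[:1] + row[2:] for row in full_encoding]

print(rank_deficiency(full_encoding))     # 1: the dummies sum to a constant
print(rank_deficiency(reduced_encoding))  # 0
```

With hundreds of encoded features, constant columns and duplicated columns are further candidates worth checking before fitting.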

    awaiting response 
    opened by sarahysh12 22
  • Explain how to interpret output of .predict() in API doc

    (I also posted this as a question on Stack Overflow: https://stackoverflow.com/q/47274356/1870832 )

    I'm confused how to interpret the output of .predict from a fitted CoxnetSurvivalAnalysis model in scikit-survival. I've read through the notebook Intro to Survival Analysis in scikit-survival and the API reference, but can't find an explanation. Below is a minimal example of what leads to my confusion:

    import pandas as pd
    from sksurv.datasets import load_veterans_lung_cancer
    from sksurv.linear_model import CoxnetSurvivalAnalysis
    
    # load data
    data_X, data_y = load_veterans_lung_cancer()
    
    # one-hot-encode categorical columns in X
    categorical_cols = ['Celltype', 'Prior_therapy', 'Treatment']
    
    X = data_X.copy()
    for c in categorical_cols:
        dummy_matrix = pd.get_dummies(X[c], prefix=c, drop_first=False)
        X = pd.concat([X, dummy_matrix], axis=1).drop(c, axis=1)
    
    # display final X to fit Cox Elastic Net model on
    del data_X
    print(X.head(3))
    
    

    so here's the X going into the model:

       Age_in_years  Celltype  Karnofsky_score  Months_from_Diagnosis  \
    0          69.0  squamous             60.0                    7.0   
    1          64.0  squamous             70.0                    5.0   
    2          38.0  squamous             60.0                    3.0   
    
      Prior_therapy Treatment  
    0            no  standard  
    1           yes  standard  
    2            no  standard  
    
    

    ...moving on to fitting model and generating predictions:

    # Fit Model
    coxnet = CoxnetSurvivalAnalysis()
    coxnet.fit(X, data_y)
    
    # What are these predictions?    
    preds = coxnet.predict(X)
    
    

    preds has same number of records as X, but their values are wayyy different than the values in data_y, even when predicted on the same data they were fit on.

    print(preds.mean()) 
    print(data_y['Survival_in_days'].mean())
    

    output:

    -0.044114643249153422
    121.62773722627738
    
    

    So what exactly are preds? Clearly .predict means something pretty different here than in scikit-learn, but I can't figure out what. The API Reference says it returns "The predicted decision function," but what does that mean? And how do I get to the predicted estimate in months yhat for a given X? I'm new to survival analysis so I'm obviously missing something.
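
For a Cox model, the prediction is a risk score on an arbitrary scale (the linear predictor, i.e. a log hazard ratio relative to a baseline) rather than a time: a higher score means a higher risk and hence a shorter expected survival. Such scores are usually judged by how well they rank subjects, for example with Harrell's concordance index. The sketch below is a simplified O(n²) illustration with made-up data; it ignores tied times and is not the library's implementation.

```python
def concordance_index(events, times, risk_scores):
    """Harrell's c-index for right-censored data (simplified sketch).

    A pair (i, j) is comparable if the subject with the shorter time
    had an observed event. It is concordant if that subject also
    received the higher risk score; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the "earlier" one
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up example: risk scores that decrease as survival time increases.
events = [True, True, False, True]
times = [5, 10, 15, 20]
scores = [2.0, 0.5, 0.1, -1.0]
print(concordance_index(events, times, scores))                # 1.0
print(concordance_index(events, times, [-s for s in scores]))  # 0.0
```

This is why the mean of the predictions need not resemble the mean survival time: only the ordering of the scores carries meaning.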

    opened by MaxPowerWasTaken 21
  • During install: error: command '/usr/bin/clang' failed with exit code 1

    Python version: Python 3.10.3

    OS: OSX 12.4 (Proc: M1 chip)

    When trying to pip install (tried versions 0.17 and 0.18):

          222 warnings and 4 errors generated.
          error: command '/usr/bin/clang' failed with exit code 1
          [end of output]
    

    The errors seem to be:

          In file included from sksurv/linear_model/_coxnet.cpp:801:
          In file included from sksurv/linear_model/src/coxnet_wrapper.h:21:
          sksurv/linear_model/src/coxnet/coxnet.h:139:23: error: expected unqualified-id
                      if (!std::isfinite(exp_xw[k])) {
                                ^
    

          In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
          In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
          sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:753:12: error: reference to unresolved using declaration
              return isnan EIGEN_NOT_A_MACRO (x);
                     ^
    

          In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
          In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
          sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:738:12: error: reference to unresolved using declaration
              return isinf EIGEN_NOT_A_MACRO (x);
                     ^
    

          In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
          In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
          sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:723:12: error: reference to unresolved using declaration
              return isfinite EIGEN_NOT_A_MACRO (x);
                     ^
    

    Happy to provide more details if needed

    opened by tpilewicz 13
  • 0.12.0: from sksurv.ensemble import RandomSurvivalForest fails

    Upon upgrading to 0.12.0

    >>> from sksurv.ensemble import RandomSurvivalForest
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/ensemble/__init__.py", line 2, in <module>
        from .forest import RandomSurvivalForest  # noqa: F401
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/ensemble/forest.py", line 14, in <module>
        from ..tree import SurvivalTree
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/tree/__init__.py", line 1, in <module>
        from .tree import SurvivalTree  # noqa: F401
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/tree/tree.py", line 14, in <module>
        from ._criterion import LogrankCriterion
      File "_splitter.pxd", line 34, in init sksurv.tree._criterion
    ValueError: sklearn.tree._splitter.Splitter size changed, may indicate binary incompatibility. Expected 368 from C header, got 360 from PyObject
    >>>
    
    opened by gregchu 13
  • Fix a variety of build problems.

    Checklist

    • [x] py.test passes
    • [x] documentation renders correctly

    What does this implement/fix? Explain your changes

    In LLVM, this project was not compiling properly. With these changes, the project seems to compile fine.

    opened by llpamies 10
  • viz of ensemble models

    Hi!

    would you have any advice on how to visualize decision path / decision trees from the ensemble survival model methods (either RF or Gradient Boosting)?

    opened by ad05bzag 10
  • Different results of CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis

    The documentation of CoxPHSurvivalAnalysis says:

    Cox proportional hazards model.

    And the documentation of CoxnetSurvivalAnalysis says:

    Cox's proportional hazard's model with elastic net penalty.

    So I assume the two classes implement the same model, and should return the same results when set with the same model parameters and given the same data. However, I see different results. Why? Also, what are the differences between them?

    Code:

    from sksurv.linear_model import CoxPHSurvivalAnalysis, CoxnetSurvivalAnalysis
    from sksurv.datasets import load_veterans_lung_cancer
    from sksurv.preprocessing import OneHotEncoder
    
    X_, y = load_veterans_lung_cancer()
    X = OneHotEncoder().fit_transform(X_)
    
    # try to match the model parameters wherever possible
    f = CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000)
    g = CoxnetSurvivalAnalysis(alphas=[0.5], alpha_min_ratio=1, n_alphas=1, 
                               l1_ratio=1e-16, tol=1e-09, normalize=False)
    
    print(f)
    print(g)
    
    f.fit(X, y)
    g.fit(X, y)
    
    print(f.coef_)
    print(g.coef_[:,0])
    

    Output:

    CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000, tol=1e-09, verbose=0)
    CoxnetSurvivalAnalysis(alpha_min_ratio=0.0001, alphas=[0.5], copy_X=True,
                l1_ratio=1e-16, max_iter=100000, n_alphas=1, normalize=False,
                penalty_factor=None, tol=1e-09, verbose=False)
    [-8.34518623e-03 -7.21105070e-01 -2.80434400e-01 -1.11234345e+00
     -3.26083027e-02 -1.93213436e-04  6.22726190e-02  2.90289950e-01]
    [-0.00346722 -0.05117406  0.06044394 -0.16433136 -0.03300373  0.0003172
     -0.00881617  0.06956854]
    

    What I've gathered:

    • CoxPHSurvivalAnalysis is sksurv's own implementation of Cox Proportional Hazard model, and supports ridge (L2) regularization.
    • CoxnetSurvivalAnalysis is a wrapper around C++ extension code used by R's glmnet package, and supports elastic net (L1 and L2) regularization.
    • In the test files, CoxPHSurvivalAnalysis is tested with the Rossi dataset, while CoxnetSurvivalAnalysis is tested with the Breast Cancer dataset.
    • The two classes have different constructor signatures and methods (eg, only CoxPHSurvivalAnalysis has predict_survival_function).

    Would it be a nice feature to have consolidated constructor signatures and methods for the two classes, and to have them tested on the same dataset for validation or comparison?

    Thanks.

    opened by leihuang 10
  • Add `apply` and `decision_path` to `SurvivalTree`

    Checklist

    • [x] closes #290
    • [x] py.test passes
    • [x] tests are included
    • [x] code is well formatted
    • [x] documentation renders correctly

    What does this implement/fix? Explain your changes

    Add apply and decision_path to SurvivalForest to also enable the same methods for RandomSurvivalForest and ExtraSurvivalTrees.

    opened by Vincent-Maladiere 8
  • RandomSurvivalForest - predict_survival_function

    Describe the bug

    I am trying to predict the survival function for my data using RandomSurvivalForest. Although the method works, it doesn't return the time points for the steps of the survival function. Each array containing a survival function has a length equal to or lower than the number of unique times in our "y", hence we can't deduce which point in time each step belongs to.

    The following snippet shows the mismatch between the number of unique times and the length of a predicted survival function:

    import numpy as np
    from sksurv.datasets import load_whas500
    from sksurv.ensemble import RandomSurvivalForest

    X, y = load_whas500()
    times = sorted(np.unique(y["lenfol"]))
    n_times = len(times)
    # n_times =  395
    
    estimator = RandomSurvivalForest().fit(X, y)
    surv_funcs = estimator.predict_survival_function(X.iloc[:5])
    
    surv_funcs[0]
    # array([0.9975    , 0.9975    , 0.9975    , 0.9975    , 0.9975    ,
    #       0.9975    , 0.9975    , 0.995     , 0.98883333, 0.98883333,...
    
    len(surv_funcs[0])
    # 162
    
    

    Additionally, if you follow the example given in the documentation of RandomSurvivalForest, you will get an error, since each result of predict_survival_function is a 1D array, unlike the result of the same method in CoxnetSurvivalAnalysis or CoxPHSurvivalAnalysis. This is the error you get:

    import matplotlib.pyplot as plt
    from sksurv.datasets import load_whas500
    from sksurv.ensemble import RandomSurvivalForest

    X, y = load_whas500()
    estimator = RandomSurvivalForest().fit(X, y)
    surv_funcs = estimator.predict_survival_function(X.iloc[:5])
    for fn in surv_funcs:
        plt.step(fn.x, fn(fn.x), where="post")
    
    plt.ylim(0, 1)
    plt.show()
    
    AttributeError: 'numpy.ndarray' object has no attribute 'x'
    
    opened by felipe0216 8
  • Error when using PIP to install scikit-survival 0.13 that uses PEP 517

    Describe the bug

    A clear and concise description of what the bug is.

    Code Sample to Reproduce the Bug

    # Insert your code here that produces the bug.
    # This example should be as succinct as possible and self-contained,
    # i.e., not rely on external data.
    # We are going to copy-paste your code and we expect to get the same result as you.
    # It should run in a fresh python session, and so include all relevant imports.
    

    Expected Results A clear and concise description of what you expected to happen.

    Actual Results Please paste or specifically describe the actual output or traceback.

    Versions Please execute the following snippet and paste the output below.

    import sklearn; sklearn.show_versions()
    import sksurv; print("sksurv:", sksurv.__version__)
    import cvxopt; print("cvxopt:", cvxopt.__version__)
    import cvxpy; print("cvxpy:", cvxpy.__version__)
    import numexpr; print("numexpr:", numexpr.__version__)
    import osqp; print("osqp:", osqp.OSQP().version())
    
    opened by SurajitTest 8
  • Loss Function "ipcwls" in GradientBoostingSurvivalAnalysis leads to error

    Hi

    I was trying to train a time-to-failure model using machine sensor data. I chose the loss function 'ipcwls', which, as per the docs, weights the observations by their censoring weights. Although I'm not aware of the theory behind it, it seemed like a reasonable choice. However, fit() fails with the error message "input contains nan infinity or a value too large for dtype float64".

    FYI, all of my X variables are scaled and take continuous values within a ±50 range. Quite a few have small values close to zero (5-6 decimal places). Is the choice of loss function leading to a division by zero? I need some clarity on this and on when this loss function should not be used.

    Thanks, Soham

    opened by Soham2112 8
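
As background for the question above, inverse probability of censoring weighting gives each observed event the weight 1 / G(t), where G is a Kaplan-Meier estimate of the probability of not yet being censored at time t, and gives censored records weight 0. The sketch below is an illustration under stated assumptions, not scikit-survival's code: the function name is hypothetical, and at tied times events are assumed to leave the risk set before censored records. It shows how G can reach zero at an event time, in which case the weight is undefined. Whether this is what happens in the report above is not known.

```python
def ipc_weights_sketch(events, times):
    """Inverse probability of censoring weights (illustrative only)."""
    at_risk = len(times)
    g = 1.0   # running estimate of P(not censored up to t)
    g_at = {}
    for t in sorted(set(times)):
        n_events = sum(1 for e, u in zip(events, times) if u == t and e)
        n_censored = sum(1 for e, u in zip(events, times) if u == t and not e)
        # Assumed tie convention: events at t leave the risk set first.
        remaining = at_risk - n_events
        if n_censored:
            g *= 1.0 - n_censored / remaining
        g_at[t] = g
        at_risk -= n_events + n_censored
    weights = []
    for e, t in zip(events, times):
        if not e:
            weights.append(0.0)
        elif g_at[t] > 0.0:
            weights.append(1.0 / g_at[t])
        else:
            raise ValueError(f"censoring survival estimate is 0 at t={t}")
    return weights


# Made-up data: one censored record at t=2 inflates the later events.
print(ipc_weights_sketch([True, False, True, True], [1, 2, 3, 4]))

# An event tied with a censored record at the last time drives G to 0.
try:
    ipc_weights_sketch([True, True, False], [1, 2, 2])
except ValueError as exc:
    print("undefined weight:", exc)
```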
  • n_iter_no_change in GradientBoostingSurvivalAnalysis

    Describe the bug

    The documentation for the parameter "n_estimators_" of GradientBoostingSurvivalAnalysis says "The number of estimators as selected by early stopping (if n_iter_no_change is specified)." However, GradientBoostingSurvivalAnalysis does not accept n_iter_no_change as an argument.

    Code Sample to Reproduce the Bug

    from sksurv.ensemble import GradientBoostingSurvivalAnalysis
    GradientBoostingSurvivalAnalysis(n_iter_no_change = 10)
    

    Actual Results

    TypeError: GradientBoostingSurvivalAnalysis.__init__() got an unexpected keyword argument 'n_iter_no_change'
    

    Versions

    System:
        python: 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]
       machine: Linux-5.15.0-56-generic-x86_64-with-glibc2.35

    Python dependencies:
          sklearn: 1.2.0
              pip: 22.3.1
       setuptools: 65.5.0
            numpy: 1.23.4
            scipy: None
           Cython: 0.29.32
           pandas: 1.5.1
       matplotlib: 3.6.2
           joblib: 1.2.0
    threadpoolctl: 3.1.0

    opened by TristanFauvel 0
  • Added conditional property to expose time scale predictions

    Checklist

    • [X] closes #324
    • [X] py.test passes
    • [ ] tests are included
    • [X] code is well formatted
    • [X] documentation renders correctly

    Added a decorator for properties that are only available if a check returns true. The decorator provided by scikit-learn sadly only works for functions.
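
The idea of a property that exists only when a check passes can be sketched with a small descriptor. This is a hypothetical illustration, not the code from this pull request: `conditional_property`, `Estimator`, and `Wrapper` are made-up names. Raising `AttributeError` from `__get__` makes `hasattr` and `getattr` with a default behave as if the attribute were absent.

```python
class conditional_property:
    """Property that raises AttributeError unless `check(obj)` is true."""

    def __init__(self, check):
        self.check = check
        self.fget = None
        self.name = None

    def __call__(self, fget):
        # Used as @conditional_property(check): receives the getter.
        self.fget = fget
        return self

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        if not self.check(obj):
            raise AttributeError(
                f"{type(obj).__name__!r} object has no attribute {self.name!r}"
            )
        return self.fget(obj)


class Estimator:
    _predict_risk_score = False  # predictions are on the time scale


class Wrapper:
    """Stand-in for a pipeline that forwards a flag of its final step."""

    def __init__(self, final_estimator):
        self.final_estimator = final_estimator

    @conditional_property(
        lambda self: hasattr(self.final_estimator, "_predict_risk_score")
    )
    def _predict_risk_score(self):
        return self.final_estimator._predict_risk_score


print(Wrapper(Estimator())._predict_risk_score)               # False
print(hasattr(Wrapper(object()), "_predict_risk_score"))      # False
```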

    @sebp I am not sure what to test exactly, maybe a test which tests whether pipelines correctly patch the property and functions through? I also think this should not show up in the documentation, as it is internal?

    opened by Finesim97 5
  • SciKit-Learn Pipeline not patched with "_predict_risk_score"

    Describe the bug

    In my own evaluation code I check for '_predict_risk_score' to see whether models return their predictions on the time scale or the risk scale, but this doesn't work when the estimator is wrapped in a pipeline.

    # Insert your code here that produces the bug.
    from sklearn.pipeline import Pipeline
    from sksurv.linear_model.aft import IPCRidge
    from sksurv.datasets import load_veterans_lung_cancer
    from sksurv.preprocessing import OneHotEncoder
    from sksurv.base import SurvivalAnalysisMixin
    
    
    data_x, data_y = load_veterans_lung_cancer()
    
    
    data_x_prep = OneHotEncoder().fit_transform(data_x)
    model_direct = IPCRidge().fit(data_x_prep, data_y)
    
    
    pipe = Pipeline([('encode', OneHotEncoder()),
                     ('model', IPCRidge())])
    pipe.fit(data_x, data_y)
    
    
    # Are equal
    print(model_direct.predict(data_x_prep.head()))
    print(pipe.predict(data_x.head()))
    
    
    # Steal super method
    # This does not match, because ...
    print(SurvivalAnalysisMixin.score(model_direct, data_x_prep, data_y))
    print(SurvivalAnalysisMixin.score(pipe, data_x, data_y))
    
    
    # ... the property is not patched through
    # if this returns true, the scores are treated as being on the time scale
    print(not getattr(model_direct, "_predict_risk_score", True))
    print(not getattr(pipe, "_predict_risk_score", True))
    
    
    # The second one should also be true!
    

    Expected Results A Pipeline object should also have the corresponding property set, as this might break evaluation codes.

    Actual Results The property is not available. It should be possible to just add it to the __init__.py, but I am not sure how well it works together with the @property decorator. I am currently finishing my master's thesis, but I should be able to work out a PR on the 5th of December while testing the behaviour.

    Versions (Not running the newest version cough)

    System:
        python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03)  [GCC 9.4.0]
    executable: /home/jovyan/master-thesis/env/bin/python
       machine: Linux-5.10.0-15-amd64-x86_64-with-glibc2.35
    
    Python dependencies:
          sklearn: 1.1.2
              pip: 22.2.2
       setuptools: 65.4.0
            numpy: 1.23.3
            scipy: 1.9.1
           Cython: None
           pandas: 1.5.0
       matplotlib: 3.6.0
           joblib: 1.2.0
    threadpoolctl: 3.1.0
    
    Built with OpenMP: True
    
    threadpoolctl info:
           user_api: openmp
       internal_api: openmp
             prefix: libgomp
           filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
            version: None
        num_threads: 48
    
           user_api: blas
       internal_api: openblas
             prefix: libopenblas
           filepath: /home/jovyan/master-thesis/env/lib/libopenblasp-r0.3.21.so
            version: 0.3.21
    threading_layer: pthreads
       architecture: Zen
        num_threads: 48
    
           user_api: blas
       internal_api: openblas
             prefix: libopenblas
           filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-9f9f5dbc.3.18.so
            version: 0.3.18
    threading_layer: pthreads
       architecture: Zen
        num_threads: 48
    sksurv: 0.18.0
    
    enhancement 
    opened by Finesim97 2
  • Bug in nonparametric.py when calling IPCRidge

    Describe the bug

    Running IPCRidge hangs with the following message

    assert (Ghat > 0).all()

    and nothing after. I found that setting the option 'reverse = False' in the call to kaplan_meier_estimator inside the function ipc_weights in the file nonparametric.py (shown further below) makes the error go away. Error message:

    AssertionError                            Traceback (most recent call last)
    Input In [74], in <cell line: 5>()
          2 set_config(display="text")  # displays text representation of estimators
          4 estimator = IPCRidge(alpha = 0.5,fit_intercept=True)
    ----> 5 estimator.fit(data_x,data_y)
    
    File /opt/homebrew/Caskroom/miniforge/base/envs/teaching_env/lib/python3.10/site-packages/sksurv/linear_model/aft.py:90, in IPCRidge.fit(self, X, y)
         72 """Build an accelerated failure time model.
         73 
         74 Parameters
       (...)
         86 self
         87 """
         88 event, time = check_array_survival(X, y)
    ---> 90 weights = ipc_weights(event, time)
         91 super().fit(X, numpy.log(time), sample_weight=weights)
         93 return self
    
    File /opt/homebrew/Caskroom/miniforge/base/envs/teaching_env/lib/python3.10/site-packages/sksurv/nonparametric.py:323, in ipc_weights(event, time)
        320 idx = numpy.searchsorted(unique_time, time[event])
        321 Ghat = p[idx]
    --> 323 assert (Ghat > 0).all()
        325 weights = numpy.zeros(time.shape[0])
        326 weights[event] = 1.0 / Ghat
    
    AssertionError: 
    

    Code Sample to Reproduce the Bug

    used code:

    estimator = IPCRidge(alpha = 0.5,fit_intercept=True)
    estimator.fit(data_x,data_y)
    
    

    Here is what I changed in nonparametric.py: in the line unique_time, p = kaplan_meier_estimator(event, time, reverse=False), I changed True to False.

    def ipc_weights(event, time):
        """Compute inverse probability of censoring weights
    
        Parameters
        ----------
        event : array, shape = (n_samples,)
            Boolean event indicator.
    
        time : array, shape = (n_samples,)
            Time when a subject experienced an event or was censored.
    
        Returns
        -------
        weights : array, shape = (n_samples,)
            inverse probability of censoring weights
    
        See also
        --------
        CensoringDistributionEstimator
            An estimator interface for estimating inverse probability
            of censoring weights for unseen time points.
        """
        if event.all():
            return np.ones(time.shape[0])
    
        unique_time, p = kaplan_meier_estimator(event, time, reverse=False)
    
        idx = np.searchsorted(unique_time, time[event])
        Ghat = p[idx]
    
        assert (Ghat > 0).all()
    
        weights = np.zeros(time.shape[0])
        weights[event] = 1.0 / Ghat
    
        return weights
    

    Machine and packages versions used:

    Last updated: 2022-11-08T08:59:04.111247-05:00
    
    Python implementation: CPython
    Python version       : 3.10.5
    IPython version      : 8.4.0
    
    Compiler    : Clang 13.0.1 
    OS          : Darwin
    Release     : 21.6.0
    Machine     : arm64
    Processor   : arm
    CPU cores   : 10
    Architecture: 64bit
    
    matplotlib: 3.5.2
    numpy     : 1.22.4
    pandas    : 1.4.4
    json      : 2.0.9
    
    bug 
    opened by fbarfi 4
  • Suggestions for StepFunction

    I have 2 minor suggestions for StepFunction that I would like to see:

    1. Different argument name for 'x' in init and call. In addition, current API reference is missing.
    2. Sort the arrays inside the function.
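
Both suggestions can be sketched in a few lines of pure Python. The class below is hypothetical and is not scikit-survival's `StepFunction`: it uses different names for the constructor argument (`domain`) and the call argument (`t`), and it sorts its inputs once at construction.

```python
from bisect import bisect_right


class SortedStepFunction:
    """Right-continuous step function; hypothetical, for illustration."""

    def __init__(self, domain, values):
        if len(domain) != len(values):
            raise ValueError("domain and values must have the same length")
        pairs = sorted(zip(domain, values))  # sort once, at construction
        self.domain = [d for d, _ in pairs]
        self.values = [v for _, v in pairs]

    def __call__(self, t):
        """Value at t: the value attached to the largest domain point <= t."""
        idx = bisect_right(self.domain, t) - 1
        if idx < 0:
            raise ValueError(f"t={t} lies before the first domain point")
        return self.values[idx]


# Unsorted input is accepted; evaluation uses the sorted order.
surv = SortedStepFunction([5, 1, 3], [0.4, 0.9, 0.7])
print(surv(1), surv(2), surv(3), surv(10))
```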

    Thanks.

    awaiting response 
    opened by drproduck 1
  • KM_variance_estimator

    Checklist

    • [x] py.test passes
    • [x] tests are included
    • [x] code is well formatted
    • [ ] documentation renders correctly

    What does this implement/fix? Explain your changes

    Hi @sebp, I added Greenwood's estimate of the KM variance to nonparametric.py (this is a prerequisite for implementing some goodness-of-fit tests). NB: I ran tox -e py310-docs but for some reason the new function does not appear in the API doc. Best,
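
For reference, Greenwood's formula estimates the variance of the Kaplan-Meier estimate as S(t)² times the sum, over event times t_i ≤ t, of d_i / (n_i (n_i − d_i)), where d_i is the number of events and n_i the number at risk at t_i. The following pure-Python sketch illustrates the formula with made-up data; it is not the code from this pull request, and the function name is hypothetical.

```python
from collections import Counter


def kaplan_meier_greenwood(times, events):
    """Return [(t, S(t), Var[S(t)])] at each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)
    at_risk = len(times)
    survival = 1.0
    cumulative = 0.0  # running sum of d / (n * (n - d))
    rows = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            if at_risk > d:
                cumulative += d / (at_risk * (at_risk - d))
                variance = survival ** 2 * cumulative
            else:
                # Everyone at risk had the event: S(t) = 0 and the
                # Greenwood term is undefined.
                variance = float("nan")
            rows.append((t, survival, variance))
        at_risk -= leaving[t]
    return rows


# Five hypothetical subjects; the second and fifth are right-censored.
for t, s, v in kaplan_meier_greenwood([2, 3, 4, 5, 8],
                                      [True, False, True, True, False]):
    print(f"t={t}  S={s:.3f}  Var={v:.4f}")
```

Without censoring, the estimate reduces to the binomial variance S(t)(1 − S(t)) / n, which is a convenient sanity check.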

    opened by TristanFauvel 3
Releases(v0.19.0.post1)
Owner
Sebastian Pölsterl