Extensible, parallel implementations of t-SNE

Overview

openTSNE

openTSNE is a modular Python implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE) [1], a popular dimensionality-reduction algorithm for visualizing high-dimensional data sets. openTSNE incorporates the latest improvements to the t-SNE algorithm, including the ability to add new data points to existing embeddings [2], massive speed improvements [3] [4] that enable t-SNE to scale to millions of data points, and various tricks to improve global alignment of the resulting visualizations [5].

A visualization of 44,808 single-cell transcriptomes obtained from the mouse retina [6], embedded using the multiscale kernel trick to better preserve the global alignment of the clusters.

Installation

openTSNE requires Python 3.6 or higher in order to run.

Conda

openTSNE can be easily installed from conda-forge with

conda install --channel conda-forge opentsne

PyPI

openTSNE is also available through pip and can be installed with

pip install opentsne

Installing from source

If you wish to install openTSNE from source, please run

python setup.py install

in the root directory to install the appropriate dependencies and compile the necessary binary files.

Please note that openTSNE requires a C/C++ compiler to be available on the system. Additionally, numpy must be pre-installed in the active environment.

In order for openTSNE to utilize multiple threads, the C/C++ compiler must support OpenMP. In practice, almost all compilers support it, with the exception of older versions of clang on OSX systems.

To squeeze the most out of openTSNE, you may also consider installing FFTW3 prior to installation. FFTW3 implements the Fast Fourier Transform, which is heavily used in openTSNE. If FFTW3 is not available, openTSNE will use numpy’s implementation of the FFT, which is slightly slower than FFTW. The difference is only noticeable with large data sets containing millions of data points.

A hello world example

Getting started with openTSNE is very simple. First, we'll load up some data using scikit-learn

from sklearn import datasets

iris = datasets.load_iris()
x, y = iris["data"], iris["target"]

then, we'll import and run

from openTSNE import TSNE

embedding = TSNE().fit(x)

Citation

If you make use of openTSNE for your work we would appreciate it if you would cite the paper

@article {Poli{\v c}ar731877,
    author = {Poli{\v c}ar, Pavlin G. and Stra{\v z}ar, Martin and Zupan, Bla{\v z}},
    title = {openTSNE: a modular Python library for t-SNE dimensionality reduction and embedding},
    year = {2019},
    doi = {10.1101/731877},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2019/08/13/731877},
    eprint = {https://www.biorxiv.org/content/early/2019/08/13/731877.full.pdf},
    journal = {bioRxiv}
}

openTSNE implements two efficient algorithms for t-SNE. Please consider citing the original authors of the algorithm that you use. If you use FIt-SNE (default), then the citation is [4] below, but if you use Barnes-Hut the citation is [3].

References

[1] Van Der Maaten, Laurens, and Hinton, Geoffrey. “Visualizing data using t-SNE.” Journal of Machine Learning Research 9.Nov (2008): 2579-2605.
[2] Poličar, Pavlin G., Martin Stražar, and Blaž Zupan. “Embedding to Reference t-SNE Space Addresses Batch Effects in Single-Cell Classification.” BioRxiv (2019): 671404.
[3] Van Der Maaten, Laurens. “Accelerating t-SNE using tree-based algorithms.” Journal of Machine Learning Research 15.1 (2014): 3221-3245.
[4] Linderman, George C., et al. “Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data.” Nature Methods 16.3 (2019): 243.
[5] Kobak, Dmitry, and Berens, Philipp. “The art of using t-SNE for single-cell transcriptomics.” Nature Communications 10, 5416 (2019).
[6] Macosko, Evan Z., et al. “Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets.” Cell 161.5 (2015): 1202-1214.
Comments
  • A bunch of comments and questions

    Hi Pavlin! Great work. I did not know about Orange but I am working with scRNA-seq data myself (cf. your Zeisel2018 example) and I am using Python, so it's interesting to see developments in that direction.

    I have a couple of scattered comments/questions that I will just dump here. This isn't a real "issue".

    1. You say that BH is much faster than FFT for smaller datasets. That's interesting; I did not notice this. What kind of numbers are you talking about here? I was under the impression that with n<10k both methods are so fast (I guess all 1000 iterations under 1 min?) that the exact time does not really matter...

    2. Any specific reason to use "Python/Numba implementation of nearest neighbor descent" for approximate nearest neighbours? There are some popular libraries, e.g. annoy. Is your implementation much faster than that? Because otherwise it could be easier to use a well-known established library... I think Leland McInnes is using something similar (Numba implementation of nearest neighbor descent) in his UMAP; did you follow him here?

    3. I did not look at the actual code, but from the description on the main page it sounds like you don't have a vanilla t-SNE implementation in here. Is that true? I think it would be nice to have vanilla t-SNE in here too. For datasets with n=1k-2k it's pretty fast and I guess many people would prefer to use vanilla t-SNE if possible.

    4. I noticed you writing this in one of the closed issues:

      we allow new data to be added into the existing embedding by direct optimization. To my knowledge, no other library does this. It's sometimes difficult to get nice embeddings like this, but it may have potential.

      That's interesting. How exactly are you doing this? You fix the existing embedding, compute all the affinities for the extended dataset (original data + new data) and then optimize the cost by allowing only the positions of the new points to change? Something like that?

    5. George sped up his code quite a bit by adding multithreading to the F_attr computations. He is now implementing multithreading for the repulsive forces too. See https://github.com/KlugerLab/FIt-SNE/pull/32, and the discussion there. This might be interesting for you too. Or are you already using multithreading during gradient descent?

    6. I am guessing that your Zeisel2018 plot is colored using the same 16 "megaclusters" that Zeisel et al. use in Figure 1B (https://www.cell.com/cms/attachment/f1754f20-890c-42f5-aa27-bbb243127883/gr1_lrg.jpg). If so, it would be great if you used the same colors as in their figure; this would ease the comparison. Of course you are not trying to make comparisons here, but this is something that would be interesting to me personally :)

    opened by dkobak 37
  • Runtime and RAM usage compared to FIt-SNE

    I understand that openTSNE is expected to be slower than FIt-SNE, but I'd like to understand how much slower it is in typical situations. As I reported earlier, when I run it on 70000x50 PCA-reduced MNIST data with default parameters and n_jobs=-1, I get ~60 seconds with FIt-SNE and ~120 seconds with openTSNE. Every 50 iterations take around 2s vs around 4s.

    I did not check for this specific case, but I suspect that FFT takes only a small fraction of this time, and the computational bottleneck is formed by the attractive forces. Can one profile openTSNE and see how much time is taken by different steps, such as repulsive/attractive computations?

    Apart from that, and possibly even more worryingly, I replicated the data 6x and added some noise, to get a 420000x50 data matrix. It takes FIt-SNE around 1Gb of RAM to allocate the space for the kNN matrix, so it works just fine on my laptop. However, openTSNE rapidly took >7Gb of RAM and crashed the kernel (I have 16 Gb but around half was taken by other processes). This happened in the first seconds, so I assume it happens during the kNN search. Does pynndescent eat up so much memory in this case?

    discussion 
    opened by dkobak 25
  • Why does transform() have exaggeration=2 by default?

    The parameters of the transform function are

    def transform(self, X, perplexity=5, initialization="median", k=25,
    learning_rate=100, n_iter=100, exaggeration=2, momentum=0, max_grad_norm=0.05):
    

    so it has exaggeration=2 by default. Why? This looks unintuitive to me: exaggeration is a slightly "weird" trick that can arguably be very useful for huge data sets, but I would expect the out-of-sample embedding to work just fine without it. Am I missing something?

    I am also curious why momentum is set to 0 (unlike in normal tSNE optimization), but here I don't have any intuition for what it should be.

    Another question is: will this function work with n_iter=0 if one just wants to get an embedding using medians of k nearest neighbours? That would be handy. Or is there another way to get this? Perhaps from prepare_partial?
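As a rough illustration of what a median-of-neighbours placement could look like (a sketch only, not openTSNE's code; `median_position` and its input format are hypothetical), each new point is put at the coordinate-wise median of the embedding positions of its k nearest reference points:

```python
from statistics import median


def median_position(neighbor_positions):
    """Coordinate-wise median of the embedding positions of a point's neighbours.

    neighbor_positions: list of (x, y) tuples, one per nearest reference point.
    """
    if not neighbor_positions:
        raise ValueError("at least one neighbour position is required")
    xs, ys = zip(*neighbor_positions)
    return (median(xs), median(ys))


# Positions of the k=3 nearest reference points in the existing embedding.
print(median_position([(0.0, 1.0), (2.0, 5.0), (10.0, 3.0)]))
```

With zero optimisation iterations, the embedding of the new points would simply be these positions.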

    And lastly, when transform() is applied to points from a very different data set (imagine positioning Smart-seq2 cells onto a 10x Chromium reference), I prefer to use correlation distances because I suspect Euclidean distances might be completely off (even when the original tSNE was done using Euclidean distances). I think openTSNE currently does not support this, right? Did you have any problems with that? One could perhaps allow transform() to take a metric argument (is correlation among the supported metrics, btw?). The downside is that if this metric is different from the metric used to prepare the embedding, then the nearest neighbours object will have to be recomputed, so it will suddenly become much slower. Let me know if I should post it as a separate issue.
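To make the motivation for correlation distance concrete, here is a small standard-library sketch (the helper names are illustrative, not part of openTSNE). Correlation distance is one minus the Pearson correlation, so it ignores the overall scale and offset of a profile, while Euclidean distance does not:

```python
import math


def correlation_distance(a, b):
    """Return 1 - Pearson correlation of two equal-length vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm_a = math.sqrt(sum(x * x for x in da))
    norm_b = math.sqrt(sum(y * y for y in db))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("correlation is undefined for constant vectors")
    return 1.0 - sum(x * y for x, y in zip(da, db)) / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Plain Euclidean distance, for comparison."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


profile = [1.0, 2.0, 3.0, 4.0]
# Same shape, but measured on a different scale with a different baseline.
rescaled = [10.0 * v + 5.0 for v in profile]

print(correlation_distance(profile, rescaled))  # close to 0
print(euclidean_distance(profile, rescaled))    # large
```

Two cells with the same expression pattern but platform-specific scaling would therefore still be treated as close neighbours under correlation distance.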

    question 
    opened by dkobak 25
  • Implement auto early exaggeration

    Implements #218.

    First, early_exaggeration="auto" is now set to max(12, exaggeration).

    Second, the learning rate. We have various functions that currently take learning_rate="auto" and set it to max(200, N/12). I did not change this, because those functions usually do not know what the early exaggeration was. So I kept it as is. I only changed the behaviour of the base class: there learning_rate="auto" is now set to max(200, N/early_exaggeration).

    This works as intended:

    X = np.random.randn(10000,10)
    
    TSNE(verbose=True).fit(X)
    # Prints 
    # TSNE(early_exaggeration=12, verbose=True)
    # Uses lr=833.33
    
    TSNE(verbose=True, exaggeration=5).fit(X)
    # Prints
    # TSNE(early_exaggeration=12, exaggeration=5, verbose=True)
    # Uses lr=833.33
    
    TSNE(verbose=True, exaggeration=20).fit(X)
    # Prints
    # TSNE(early_exaggeration=20, exaggeration=20, verbose=True)
    # Uses lr=500.00
    

    (Note that the learning rate is currently not printed by the repr(self) because it's kept as "auto" at construction time and only set later. That's also how we had it before.)
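The two "auto" rules for the base class can be written out as a few lines of plain Python. This is only a restatement of the rules described above, not the library's code; `resolve_auto_params` is a hypothetical helper name:

```python
def resolve_auto_params(n_samples, exaggeration=None):
    """Apply the "auto" rules: early_exaggeration = max(12, exaggeration),
    learning_rate = max(200, N / early_exaggeration)."""
    early_exaggeration = 12 if exaggeration is None else max(12, exaggeration)
    learning_rate = max(200, n_samples / early_exaggeration)
    return early_exaggeration, learning_rate


for exaggeration in (None, 5, 20):
    ee, lr = resolve_auto_params(10000, exaggeration)
    print(f"exaggeration={exaggeration}: early_exaggeration={ee}, lr={lr:.2f}")
```

For N=10000 this gives the same learning rates as the three runs above: 833.33, 833.33 and 500.00.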

    opened by dkobak 24
  • Add spectral initialization using diffusion maps

    Description of changes

    Fixes #110.

    I ended up implementing diffusion maps only because computationally, computing the leading eigenvectors is much faster than the smallest eigenvectors, and of the various spectral methods, diffusion maps are the only ones that require this. I checked what UMAP does - it uses the symmetric normalized laplacian for initialization - but they manually set the number of lanczos iteration limit, which I don't understand. This seemed like the better option.

    @dkobak Do you want to take a look at this? I implemented this using scipy.sparse.linalg.svds because it turns out to be faster than scipy.sparse.linalg.eigsh and it seemed to produce strange results when I increased the error tolerance, while svds results seemed reasonable.

    Includes
    • [X] Code changes
    • [ ] Tests
    • [ ] Documentation
    opened by pavlin-policar 22
  • `pynndescent` has recently changed

    Expected behaviour

    Return the embedding

    Actual behaviour

    Return the embedding with one warning : .../miniconda3/lib/python3.7/site-packages/openTSNE/nearest_neighbors.py:181: UserWarning: pynndescent has recently changed which distance metrics are supported, and openTSNE.nearest_neighbors has not been updated. Please notify the developers of this change. "pynndescent has recently changed which distance metrics are supported, "

    Steps to reproduce the behavior

    Hello World steps

    opened by VallinP 18
  • Added Annoy support

    Added Annoy support as per #101. Annoy is used by default if it supports the given metric and if the input data is not scipy.sparse (otherwise Pynndescent is used).

    This needs installed Annoy (I installed this https://anaconda.org/conda-forge/python-annoy), but I wasn't sure where to add this dependency.
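The selection rule described above amounts to a two-condition check. The sketch below is illustrative only: `choose_knn_backend` is a hypothetical name, and the metric set lists the metric names Annoy itself accepts, not necessarily the names openTSNE maps onto them.

```python
# Metric names accepted by Annoy itself. How openTSNE's own metric names map
# onto these is an assumption left out of this sketch.
ANNOY_METRICS = {"angular", "euclidean", "manhattan", "hamming", "dot"}


def choose_knn_backend(metric, is_sparse):
    """Pick Annoy when it supports the metric and the input is dense."""
    if metric in ANNOY_METRICS and not is_sparse:
        return "annoy"
    return "pynndescent"


print(choose_knn_backend("euclidean", is_sparse=False))    # annoy
print(choose_knn_backend("euclidean", is_sparse=True))     # pynndescent
print(choose_knn_backend("correlation", is_sparse=False))  # pynndescent
```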

    opened by dkobak 16
  • FFT parameters and runtime for very expanded embeddings

    I have been doing some experiments on convergence and running t-SNE for many more iterations than I normally do. And I again noticed something that I used to see every now and then: the runtime jumps wildly between "epochs" of 50 iterations. This only happens when the embedding is very expanded and so FFT gets really slow. Look:

    Iteration   50, KL divergence 4.8674, 50 iterations in 1.8320 sec
    Iteration  100, KL divergence 4.3461, 50 iterations in 1.8760 sec
    Iteration  150, KL divergence 4.0797, 50 iterations in 2.6252 sec
    Iteration  200, KL divergence 3.9082, 50 iterations in 4.5062 sec
    Iteration  250, KL divergence 3.7864, 50 iterations in 5.4258 sec
    Iteration  300, KL divergence 3.6957, 50 iterations in 7.2500 sec
    Iteration  350, KL divergence 3.6259, 50 iterations in 9.0705 sec
    Iteration  400, KL divergence 3.5711, 50 iterations in 10.1077 sec
    Iteration  450, KL divergence 3.5271, 50 iterations in 12.2412 sec
    Iteration  500, KL divergence 3.4909, 50 iterations in 13.6440 sec
    Iteration  550, KL divergence 3.4604, 50 iterations in 14.6127 sec
    Iteration  600, KL divergence 3.4356, 50 iterations in 17.2364 sec
    Iteration  650, KL divergence 3.4143, 50 iterations in 17.6973 sec
    Iteration  700, KL divergence 3.3986, 50 iterations in 27.9720 sec
    Iteration  750, KL divergence 3.3914, 50 iterations in 34.0480 sec
    Iteration  800, KL divergence 3.3863, 50 iterations in 34.4572 sec
    Iteration  850, KL divergence 3.3820, 50 iterations in 36.9247 sec
    Iteration  900, KL divergence 3.3779, 50 iterations in 47.0994 sec
    Iteration  950, KL divergence 3.3737, 50 iterations in 40.8424 sec
    Iteration 1000, KL divergence 3.3696, 50 iterations in 62.1549 sec
    Iteration 1050, KL divergence 3.3653, 50 iterations in 30.6310 sec
    Iteration 1100, KL divergence 3.3613, 50 iterations in 44.9781 sec
    Iteration 1150, KL divergence 3.3571, 50 iterations in 36.9257 sec
    Iteration 1200, KL divergence 3.3531, 50 iterations in 66.3830 sec
    Iteration 1250, KL divergence 3.3493, 50 iterations in 37.7215 sec
    Iteration 1300, KL divergence 3.3457, 50 iterations in 33.7942 sec
    Iteration 1350, KL divergence 3.3421, 50 iterations in 33.7507 sec
    Iteration 1400, KL divergence 3.3387, 50 iterations in 59.2065 sec
    Iteration 1450, KL divergence 3.3354, 50 iterations in 36.3713 sec
    Iteration 1500, KL divergence 3.3323, 50 iterations in 39.1894 sec
    Iteration 1550, KL divergence 3.3293, 50 iterations in 67.3239 sec
    Iteration 1600, KL divergence 3.3265, 50 iterations in 33.9837 sec
    Iteration 1650, KL divergence 3.3238, 50 iterations in 63.5015 sec
    

    For the record, this is on full MNIST with uniform k=15 affinity, n_jobs=-1. Note that after it gets to 30 seconds / 50 iterations, it starts fluctuating between 30 and 60. This does not make sense.

    I suspect it may be related to how interpolation params are chosen depending on the grid size. Can it be that those heuristics may need improvement?

    Incidentally, can it be that the interpolation params can be relaxed once the embedding becomes very large (e.g. span larger than [-100,100]) so that optimisation runs faster without -- perhaps! -- compromising the approximation too much?
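As a loose illustration of why the choice of grid size can make runtimes jumpy (this is not openTSNE's actual heuristic; `next_fft_friendly_size` and the set of primes are assumptions), FFT routines are typically fastest when the transform length factors into small primes. A grid-size rule might therefore round the number of boxes up to the nearest such length:

```python
def is_smooth(n, primes=(2, 3, 5, 7)):
    """True if n has no prime factors other than the given small primes."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def next_fft_friendly_size(n_boxes):
    """Smallest size >= n_boxes whose prime factors are all in {2, 3, 5, 7}."""
    if n_boxes < 1:
        raise ValueError("n_boxes must be positive")
    size = n_boxes
    while not is_smooth(size):
        size += 1
    return size


for requested in (64, 97, 101):
    print(requested, "->", next_fft_friendly_size(requested))
```

As the embedding expands, the requested number of boxes grows, and small changes in the span can tip the grid from one size to another, which is one possible source of the fluctuation reported above.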

    CCing to @linqiaozhi.

    opened by dkobak 15
  • Pynndescent build/query

    We discussed this before, but I've been playing around with some sparse data now and wanted to report some runtimes.

    When using pynndescent, openTSNE runs build() with n_neighbors=15 and then query() with n_neighbors=3*perplexity. At the same time, Leland said that that's not efficient and the recommended way to use pynndescent is to run build() with the desired number of neighbors and then simply take its constructed kNN graph without querying. You said that you ran some benchmarks and found your way to be faster. Here are runtimes I got on X that is sparse of size (100000, 9630).

    nn = NNDescent(X, metric='cosine', n_neighbors=15)      # Wall time: 39 s
    nn.query(X, k=15)                                         # Wall time: 1min 57s
    nn.query(X, k=90)                                         # Wall time: 3min 21s
    nn90 = NNDescent(X, metric='cosine', n_neighbors=90)  # Wall time: 7min 45s
    nn90.query(X, k=90)                                     # Wall time: 57min 53s
    

    For k=90 it is indeed faster to build with k=15 and then query with k=90, so I can confirm your observation.

    My only suggestion would be to modify the NNDescent class so that if the desired k is less than some threshold then build is done with k+1 and then the constructed tree is returned without query. We can simply use 15 as the threshold. I did this locally and can PR.
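The suggested rule can be sketched as a small decision function. This is a hypothetical helper, not the actual patch; whether the threshold itself counts as "small" is an assumption here, and the extra neighbour in the build-only case is presumably there because each point shows up as its own nearest neighbour in the built graph.

```python
def plan_knn_search(k, threshold=15):
    """Decide how to obtain k neighbours with an NN-descent style index.

    Returns the n_neighbors to build with and the k to query with
    (None means: use the built graph directly, no query step).
    """
    if k < 1:
        raise ValueError("k must be positive")
    if k <= threshold:
        # Small k: build with one extra neighbour and skip the query.
        return {"build_n_neighbors": k + 1, "query_k": None}
    # Large k: build a cheap index, then query for the full k.
    return {"build_n_neighbors": threshold, "query_k": k}


print(plan_knn_search(10))  # build only
print(plan_knn_search(90))  # build with 15, then query with 90
```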

    opened by dkobak 15
  • Cannot pass random_state to PerplexityBasedNN when using Annoy

    Hi Pavlin,

    this is quite a minuscule bug, but I noticed that PerplexityBasedNN fails when you pass it a numpy RandomState instance, because it passes that object directly to AnnoyIndex(...).set_seed(seed). Since the documentation says that it accepts both an integer and a numpy random state, I guess this is a (tiny) bug.

    Expected behaviour

    It sets a seed for the internal random state of annoy.

    Actual behaviour

    It crashes with a TypeError:

      File "/home/jnb/dev/openTSNE/openTSNE/nearest_neighbors.py", line 276, in build
        self.index.set_seed(self.random_state)
    TypeError: an integer is required (got type numpy.random.mtrand.RandomState)
    
    Steps to reproduce the behavior

    import numpy as np
    from openTSNE import PerplexityBasedNN
    
    random_state = np.random.default_rng(333)
    data = random_state.uniform(size=(10000,10))
    PerplexityBasedNN(data, random_state=random_state)
    

    Fix

    In nearest_neighbors.py, line 275 can be changed from self.index.set_seed(self.random_state) to

            if isinstance(self.random_state, int):
                self.index.set_seed(self.random_state)
            else: # has to be a numpy RandomState
                self.index.set_seed(self.random_state.randint(-(2 ** 31), 2 ** 31))
    

    Let me know if it should come as a pull request or if you'll just incorporate it like this. Cheers
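A slightly more defensive variant of the same idea is sketched below, using duck typing so that it needs no numpy import. It is an illustration rather than the fix that was merged: `to_annoy_seed` is a hypothetical helper, it also accepts the newer numpy Generator objects (which expose `integers` instead of `randint`), and it draws a non-negative seed instead of the signed range used above.

```python
import numbers


def to_annoy_seed(random_state):
    """Turn an int or a numpy-style random state into a plain int seed."""
    if isinstance(random_state, numbers.Integral):
        return int(random_state)
    if hasattr(random_state, "integers"):
        # numpy.random.Generator, e.g. from np.random.default_rng(...)
        return int(random_state.integers(0, 2**31 - 1))
    if hasattr(random_state, "randint"):
        # legacy numpy.random.RandomState
        return int(random_state.randint(0, 2**31 - 1))
    raise TypeError(
        f"expected an int or a numpy random state, got {type(random_state).__name__}"
    )


print(to_annoy_seed(333))  # an int is passed through unchanged
```

The result could then be handed to `set_seed` in place of `self.random_state`.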

    opened by jnboehm 14
  • Workaround for -1 in pynndescent index

    Fixes #130 .

    Changes:

    • Query() is only used for k>15.
    • n_jobs fixed to 1 for sparse inputs to avoid a pynndescent bug
    • find all points where index contains -1 values, and let them randomly attract each other.
    opened by dkobak 14
  • Switching spectral initialization to sklearn.manifold.SpectralEmbedding

    A student in our lab is currently looking into spectral initialization, and she found out that openTSNE.init.spectral(tol=0) in some cases does not agree with sklearn.manifold.SpectralEmbedding(affinity='precomputed'). In some cases it does agree perfectly or near-perfectly, but we have an example where the result is very different, and SpectralEmbedding gives what seems like a more meaningful result.

    I looked at the math, and it seems that they should conceptually be computing the same thing (SpectralEmbedding finds eigenvectors of L_sym, whereas init.spectral finds generalized eigenvectors of W and D, but that should be equivalent, as per https://jlmelville.github.io/smallvis/spectral.html @jlmelville). We don't know what the difference is due to. It may be numerical.

    However, conceptually, it seems sensible if openTSNE would simply outsource the computation to sklearn.

    A related issue is that init.spectral is not reproducible and gives different results with each run. Apparently the way we initialize v0 still leaves ARPACK with some randomness. Sklearn gets around this by initializing v0 differently. I guess openTSNE should do the same -- but of course if we end up simply calling SpectralEmbedding then it won't matter.

    opened by dkobak 10
  • Online Documentation Not Rendering Python Code

    Expected behaviour

    Screen Shot 2022-12-14 at 8 14 34 PM

    Actual behaviour

    Screen Shot 2022-12-14 at 8 14 16 PM

    Steps to reproduce the behavior

    Go to: https://opentsne.readthedocs.io/en/latest/examples/02_advanced_usage/02_advanced_usage.html

    opened by MattScicluna 0
  • Missed reference

    I think you missed our paper in your citations.

    Zhirong Yang, Jaakko Peltonen and Samuel Kaski. Scalable Optimization of Neighbor Embedding for Visualization. In ICML 2013.

    opened by rozyangno 2
  • Memory collapses with precomputed block matrix

    Expected behaviour

    When I run tSNE on a symmetric 200x200 block matrix such as this one (distancematrix0), I expect TSNE to return 4 distinct clusters (actually 4 points only). Sklearn yields this.

    Actual behaviour

    Using openTSNE the terminal crashes with full memory (50% of the time). If it survives the clusters are visible, however the result is not as satisfying.

    Steps to reproduce the behavior

    matrix = ...  # the symmetric 200x200 block matrix described above
    tsne = TSNE(metric='precomputed', initialization='spectral', negative_gradient_method='bh')
    embedding = tsne.fit(matrix)

    NOTE: I am using the direct installation from GitHub this morning.

    bug 
    opened by fsvbach 17
Releases(v0.6.2)
  • v0.6.2(Mar 18, 2022)

    Changes

    • By default, we now use the MultiscaleMixture affinity model, enabling us to pass in a list of perplexities instead of a single perplexity value. This is fully backwards compatible.
    • Previously, perplexity values would be changed according to the dataset. E.g. if we passed in perplexity=100 with N=150, TSNE.perplexity would be equal to 50. Instead, we now keep this value as is and add an effective_perplexity_ attribute (following the convention from scikit-learn), which holds the corrected perplexity values.
    • Fix bug where interpolation grid was being prepared even when using BH optimization during transform.
    • Enable calling .transform with precomputed distances. In this case, the data matrix will be assumed to be a distance matrix.

    Build changes

    • Build with oldest-supported-numpy
    • Build linux wheels on manylinux2014 instead of manylinux2010, following numpy's example
    • Build MacOS wheels on macOS-10.15 instead of macos-10.14 Azure VM
    • Fix potential problem with clang-13, which actually does optimization with infinities using the -ffast-math flag
  • v0.6.0(Apr 25, 2021)

    Changes:

    • Remove affinities from TSNE construction, allow custom affinities and initialization in .fit method. This improves the API when dealing with non-tabular data. This is not backwards compatible.
    • Add metric="precomputed". This includes the addition of openTSNE.nearest_neighbors.PrecomputedDistanceMatrix and openTSNE.nearest_neighbors.PrecomputedNeighbors.
    • Add knn_index parameter to openTSNE.affinity classes.
    • Add (less-than-ideal) workaround for pickling Annoy objects.
    • Extend the range of recommended FFTW boxes up to 1000.
    • Remove deprecated openTSNE.nearest_neighbors.BallTree.
    • Remove deprecated openTSNE.callbacks.ErrorLogger.
    • Remove deprecated TSNE.neighbors_method property.
    • Add and set as default negative_gradient_method="auto".
  • v0.5.0(Dec 24, 2020)

  • v0.4.0(May 4, 2020)

    Major changes:

    • Remove numba dependency, switch over to using Annoy nearest neighbor search. Pynndescent is now optional and can be used if installed manually.
    • Massively speed-up transform by keeping reference interpolation grid fixed. Limit new points to circle centered around reference embedding.
    • Implement variable degrees of freedom.

    Minor changes:

    • Add spectral initialization using diffusion maps.
    • Replace cumbersome ErrorLogger callback with the verbose flag.
    • Change the default number of iterations to 750.
    • Add learning_rate="auto" option.
    • Remove the min_grad_norm parameter.

    Bugfixes:

    • Fix case where KL divergence was sometimes reported as NaN.
  • v0.2.0(Sep 11, 2018)

    In order to make usage as simple as possible and remove any external dependencies on FFTW (which needed to be installed locally before), this update replaces FFTW with numpy's FFT.

Owner
Pavlin Poličar
PhD student working on applying machine learning methods to biomedical and scRNA-seq data.