CLASSIX is a fast and explainable clustering algorithm based on sorting

Overview

CLASSIX

Fast and explainable clustering based on sorting

CLASSIX is a fast and explainable clustering algorithm based on sorting. Here are a few highlights:

  • Ability to cluster low and high-dimensional data of arbitrary shape efficiently.
  • Ability to detect and deal with outliers in the data.
  • Ability to provide textual explanations for the generated clusters.
  • Full reproducibility of all tests in the accompanying paper.
  • Support of Cython compilation.

CLASSIX is a contrived acronym of CLustering by Aggregation with Sorting-based Indexing and the letter X for explainability. CLASSIX clustering consists of two phases, namely a greedy aggregation phase of the sorted data into groups of nearby data points, followed by a merging phase of groups into clusters. The algorithm is controlled by two parameters, namely the distance parameter radius for the group aggregation and a minPts parameter controlling the minimal cluster size.
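The two phases described above can be illustrated with a small, simplified sketch. This is not the CLASSIX implementation: the function names `aggregate_sorted` and `merge_groups` are hypothetical, the data is not rescaled, sorting uses the first principal component, and the merging rule (join groups whose starting points lie within `scale * radius`, with `scale=1.5`) is an assumption chosen for illustration. The minPts parameter is omitted.

```python
import numpy as np

def aggregate_sorted(X, radius):
    """Greedy aggregation sketch: group labels and starting-point indices."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Sort the points by their score along the first principal direction.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]
    order = np.argsort(scores)
    labels = np.full(len(X), -1)
    starts = []
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # already belongs to a group
        group = len(starts)
        labels[i] = group
        starts.append(i)  # point i becomes the starting point of a new group
        for j in order[pos + 1:]:
            # A score gap is a lower bound on the Euclidean distance,
            # so the search can stop as soon as the gap exceeds the radius.
            if scores[j] - scores[i] > radius:
                break
            if labels[j] == -1 and np.linalg.norm(X[j] - X[i]) <= radius:
                labels[j] = group
    return labels, np.array(starts)

def merge_groups(X, labels, starts, radius, scale=1.5):
    """Merging sketch: union groups whose starting points are close (assumed rule)."""
    X = np.asarray(X, dtype=float)
    parent = list(range(len(starts)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in range(len(starts)):
        for b in range(a + 1, len(starts)):
            if np.linalg.norm(X[starts[a]] - X[starts[b]]) <= scale * radius:
                parent[find(a)] = find(b)
    roots = np.array([find(g) for g in range(len(starts))])
    # Relabel the roots as consecutive cluster ids and map them back to points.
    _, cluster_of_group = np.unique(roots, return_inverse=True)
    return cluster_of_group[labels]
```

The early exit in the inner loop is where sorting pays off: each starting point is compared only with points whose sort score is within `radius` of its own.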

Detailed documentation (still in progress), including tutorials, is available at Dev.

Install

CLASSIX has the following dependencies for its clustering functionality:

  • cython>=0.29.4
  • numpy>=1.20.0
  • scipy>1.6.0
  • requests

and requires the following packages for data visualization:

  • matplotlib
  • pandas

To install the current release via pip, use:

pip install ClassixClustering

To install this package with conda run:

conda install -c nla.stefan.xinye classix

To check the installation, use either of the following commands (the second is for conda users):

python -m pip show ClassixClustering
conda list classix
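
The same check can be made from within Python. The sketch below uses only the standard-library `importlib.metadata` module; the helper name `installed_version` is hypothetical, and the distribution name is assumed to match the pip package name given above.

```python
from importlib import metadata

def installed_version(dist_name="ClassixClustering"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

print(installed_version())
```

A return value of `None` means that pip does not know the distribution in the current environment.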

Download this repository via:

$ git clone https://github.com/nla-group/classix.git

Quick start

from sklearn import datasets
from classix import CLASSIX

# Generate synthetic data
X, y = datasets.make_blobs(n_samples=5000, centers=2, n_features=2, random_state=1)

# Employ CLASSIX clustering
clx = CLASSIX(sorting='pca', radius=0.5, verbose=0)
clx.fit(X)

The cluster labels are stored in clx.labels_. To visualize the clustering:

import matplotlib.pyplot as plt

# Scatter plot of the data, colored by cluster label
plt.figure(figsize=(10,10))
plt.rcParams['axes.facecolor'] = 'white'
plt.scatter(X[:,0], X[:,1], c=clx.labels_)
plt.show()

The explain method

CLASSIX provides an API for visualizing clusters and for explaining the assignment of data points to their clusters. To get an overview of the data points, the location of starting points, and their associated groups, simply type:

clx.explain(plot=True)

The starting points are marked as small red boxes. The method also returns a textual summary as follows:

A clustering of 5000 data points with 2 features has been performed. 
The radius parameter was set to 0.50 and MinPts was set to 0. 
As the provided data has been scaled by a factor of 1/6.01,
data points within a radius of R=0.50*6.01=3.01 were aggregated into groups. 
In total 7903 comparisons were required (1.58 comparisons per data point). 
This resulted in 14 groups, each uniquely associated with a starting point. 
These 14 groups were subsequently merged into 2 clusters. 
A list of all starting points is shown below.
----------------------------------------
 Group  NrPts  Cluster  Coordinates 
   0     398      0     -1.19 -1.09 
   1    1073      0     -0.65 -1.15 
   2     553      0     -1.17 -0.56 
  ---      lines omitted        ---
  11       6      1       0.42 1.35 
  12       5      1       1.24 0.59 
  13       2      1        1.0 1.08 
----------------------------------------
In order to explain the clustering of individual data points, 
use .explain(ind1) or .explain(ind1, ind2) with indices of the data points. 

In the above table, Group denotes the group label, NrPts denotes the number of data points in the group, Cluster is the cluster label assigned to the group, and the final column shows the Coordinates of the starting point. In order to explain the cluster assignment of a particular data point, we provide its index to the explain method:

clx.explain(0, plot=True)

Output:
The data point 0 is in group 2, which has been merged into cluster #0.

We can also query why two data points ended up in the same cluster, or not:

clx.explain(0, 2000, plot=True)

Output:
The data point 0 is in group 2, which has been merged into cluster 0.
The data point 2000 is in group 10, which has been merged into cluster 1.
There is no path of overlapping groups between these clusters.
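
The last line of this output refers to a path of overlapping groups. One way to picture such a query is a breadth-first search on a graph whose nodes are groups and whose edges join overlapping groups. The following is a minimal sketch under that assumption; the adjacency dictionary and the function name `find_group_path` are hypothetical and are not part of the CLASSIX API, and the group numbers are made up for illustration.

```python
from collections import deque

def find_group_path(adjacency, src, dst):
    """Breadth-first search for a shortest chain of overlapping groups.

    adjacency maps each group id to the ids of the groups it overlaps with.
    Returns the list of group ids from src to dst, or None if no path exists.
    """
    previous = {src: None}
    queue = deque([src])
    while queue:
        group = queue.popleft()
        if group == dst:
            # Walk the predecessor links back to src to recover the path.
            path = []
            while group is not None:
                path.append(group)
                group = previous[group]
            return path[::-1]
        for neighbor in adjacency.get(group, ()):
            if neighbor not in previous:
                previous[neighbor] = group
                queue.append(neighbor)
    return None

# Hypothetical overlap graph: groups 0-1-2 form a chain, group 3 is isolated.
overlaps = {0: [1], 1: [0, 2], 2: [1], 3: []}
print(find_group_path(overlaps, 0, 2))
print(find_group_path(overlaps, 0, 3))
```

In this picture, two data points share a cluster exactly when such a path exists between their groups.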

Reproducible experiment

All empirical data in the paper can be reproduced by running the code in the "exp" folder. Before running, ensure that the dependency packages scikit-learn and hdbscan are installed, and compile the Quickshift++ code (obtained from Quickshift++: Provably Good Initializations for Sample-Based Mean Shift). Once everything is configured, run the commands below.

cd exp
python3 run exp_main.py

All results will be stored in "exp/results". Please let us know if you have any questions.

Citation

@techreport{CG22b,
  title   = {Fast and explainable clustering based on sorting},
  author  = {Chen, Xinye and G\"{u}ttel, Stefan},
  year    = {2022},
  number  = {arXiv:2202.01456},
  pages   = {25},
  institution = {The University of Manchester},
  address = {UK},
  type    = {arXiv EPrint},
  url     = {https://arxiv.org/abs/2202.01456}
}

License

This project is licensed under the terms of the MIT license.

Comments
  • ModuleNotFoundError: No module named 'numpy'

    I am trying to install classix 0.7.4 from a requirements file into a venv, but unfortunately I am getting an error about numpy not found, which kind of makes no sense to me, because it is installed in one of the steps above in version 1.22.4. I was hoping someone here could help me figure out what is going wrong.

    System: Debian 11 (bullseye)
    Python version: 3.9.2
    CLASSIX version: 0.7.4
    Cython: 0.29.32

    Steps to reproduce:

    $ python3 -m venv venv
    $ source venv/bin/activate
    $ python3 -m pip install -r requirements.txt 
    Collecting tqdm~=4.64.0
      Using cached tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
    Collecting numpy~=1.22.4
      Using cached numpy-1.22.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
    Collecting pandas~=1.4.2
      Using cached pandas-1.4.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.7 MB)
    Collecting seaborn~=0.11.2
      Using cached seaborn-0.11.2-py3-none-any.whl (292 kB)
    Collecting matplotlib~=3.5.2
      Using cached matplotlib-3.5.3-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
    Collecting tensorflow~=2.9.1
      Using cached tensorflow-2.9.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.8 MB)
    Collecting scikit-learn~=1.1.1
      Using cached scikit_learn-1.1.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (30.8 MB)
    Collecting nfstream~=6.5.1
      Using cached nfstream-6.5.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
    Collecting tabulate~=0.8.9
      Using cached tabulate-0.8.10-py3-none-any.whl (29 kB)
    Collecting missingno~=0.5.1
      Using cached missingno-0.5.1-py3-none-any.whl (8.7 kB)
    Collecting scipy~=1.8.1
      Using cached scipy-1.8.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.2 MB)
    Collecting cython~=0.29.32
      Using cached Cython-0.29.32-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (2.0 MB)
    Collecting scapy~=2.4.5
      Using cached scapy-2.4.5.tar.gz (1.1 MB)
    Collecting zstandard~=0.18.0
      Using cached zstandard-0.18.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB)
    Collecting protobuf~=3.19.4
      Using cached protobuf-3.19.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
    Collecting pyclustering~=0.10.1.2
      Using cached pyclustering-0.10.1.2.tar.gz (2.6 MB)
    Collecting classixclustering~=0.7.4
      Using cached classixclustering-0.7.4.tar.gz (629 kB)
        ERROR: Command errored out with exit status 1:
         command: /home/XXX/venv/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-a5h4ms0x/classixclustering_5529ebbd0bef4c0489673327bfb2a134/setup.py'"'"'; __file__='"'"'/tmp/pip-install-a5h4ms0x/classixclustering_5529ebbd0bef4c0489673327bfb2a134/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-ldj6pnm6
             cwd: /tmp/pip-install-a5h4ms0x/classixclustering_5529ebbd0bef4c0489673327bfb2a134/
        Complete output (5 lines):
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-install-a5h4ms0x/classixclustering_5529ebbd0bef4c0489673327bfb2a134/setup.py", line 1, in <module>
            import numpy
        ModuleNotFoundError: No module named 'numpy'
        ----------------------------------------
    WARNING: Discarding https://files.pythonhosted.org/packages/4d/0f/5a17e5d8045195d1a112b143a8143fff86e558d7cbeacad886d1b93be6db/classixclustering-0.7.4.tar.gz#sha256=d0f72deccb40ca9eb14905bb1a0f41787a824446eebac5a67a7ae59ec4c65342 (from https://pypi.org/simple/classixclustering/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
    ERROR: Could not find a version that satisfies the requirement classixclustering~=0.7.4
    ERROR: No matching distribution found for classixclustering~=0.7.4
    

    The content of the requirements.txt looks as follows:

    tqdm~=4.64.0
    numpy~=1.22.4
    pandas~=1.4.2
    seaborn~=0.11.2
    matplotlib~=3.5.2
    tensorflow~=2.9.1
    scikit-learn~=1.1.1
    nfstream~=6.5.1
    tabulate~=0.8.9
    missingno~=0.5.1
    scipy~=1.8.1
    cython~=0.29.32
    scapy~=2.4.5
    zstandard~=0.18.0
    protobuf~=3.19.4
    pyclustering~=0.10.1.2
    classixclustering~=0.7.4
    umap-learn~=0.5.3
    

    Any help resolving this issue would be appreciated, thanks.

    opened by Schwaggot 11
  • Cython fail on windows 11

    When importing anything from classix e.g.

    from classix import CLASSIX
    

    I get the following output

    Cython fail.
    Cython fail.
    

    I'm running Windows 11 and see this on both a pip installed version of classix and from doing python setup.py install from the GitHub repo.

    However, everything else seems to be working fine. Execution has fallen back to the pure Python versions and it seems to be working pretty quickly:

    Using the standard Windows Python install: https://www.python.org/downloads/windows/ with the whole stack pip installed the fit below on 2,000,000 10 dimensional points takes 11.2 seconds on my machine.

    from sklearn import datasets
    from classix import CLASSIX
    import numpy as np
    import matplotlib.pyplot as plt 
    
    X, y = datasets.make_blobs(n_samples=2000000, centers=4, n_features=10, random_state=1) 
    
    clx = CLASSIX(sorting='pca', verbose=0)
    clx.fit(X)
    

    I'm wondering what speed differences I'd see with the Cython version? Also, can I expect the results to be identical?

    opened by mikecroucher 4
  • IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) Error

    Using this example:

    from sklearn import datasets
    import numpy as np
    from classix import CLASSIX
    
    X, y = datasets.make_blobs(n_samples=5000, centers=2, n_features=2, cluster_std=1, random_state=1)
    clx = CLASSIX(sorting='pca', radius=0.15, group_merging='density', verbose=1, minPts=13, post_alloc=False)
    clx.fit(X)
    

    I am getting the following error:

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-569-bb587af68fc1> in <module>
          5 X, y = datasets.make_blobs(n_samples=5000, centers=2, n_features=2, cluster_std=1, random_state=1)
          6 clx = CLASSIX(sorting='pca', radius=0.15, group_merging='density', verbose=1, minPts=13, post_alloc=False)
    ----> 7 clx.fit(X)
          8 
          9 X
    
    ~/miniconda3/envs/ltf-analysis/lib/python3.8/site-packages/classix/clustering.py in fit(self, data)
        506             self.labels_ = copy.deepcopy(self.groups_)
        507         else:
    --> 508             self.labels_ = self.clustering(
        509                 data=self.data,
        510                 agg_labels=self.groups_,
    
    ~/miniconda3/envs/ltf-analysis/lib/python3.8/site-packages/classix/clustering.py in clustering(self, data, agg_labels, splist, sorting, radius, method, minPts)
        724             # self.merge_groups = merge_pairs(self.connected_pairs_)
        725 
    --> 726         self.merge_groups, self.connected_pairs_ = self.fast_agglomerate(data, splist, radius, method, scale=self.scale)
        727         maxid = max(labels) + 1
        728 
    
    ~/miniconda3/envs/ltf-analysis/lib/python3.8/site-packages/classix/merging.py in fast_agglomerate(data, splist, radius, method, scale)
        115             # den1 = splist[int(i), 2] / volume # density(splist[int(i), 2], volume = volume)
        116             for j in select_stps.astype(int):
    --> 117                 sp2 = data[splist[j, 0]] # splist[int(j), 3:]
        118 
        119                 c2 = np.linalg.norm(data-sp2, ord=2, axis=-1) <= radius
    
    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
    

    python==3.8.12 classixclustering==0.6.5 numpy==1.22.0 scipy==1.7.3

    opened by joshdunnlime 3
  • Error will be reported when the feature dimension is greater than 3

    Hi, Xinye! Thank you for your excellent work! I tried to apply your method to my own multi-dimensional dataset. I found that 'n_features' > 3 leads to an error (this also happens on the synthetic dataset):


    NameError                                 Traceback (most recent call last)
    ~\AppData\Local\Temp\ipykernel_8868\670907302.py in
          7 # Call CLASSIX
          8 clx = CLASSIX(radius=0.5, verbose=0)
    ----> 9 clx.fit(X)

    E:\Anaconda3\envs\DL-main\lib\site-packages\classix\clustering.py in fit(self, data)
        488
        489 # aggregation
    --> 490 self.groups_, self.splist_, self.dist_nr = self.aggregate(data=self.data, sorting=self.sorting, tol=self.radius)
        491 self.splist_ = np.array(self.splist_)
        492

    E:\Anaconda3\envs\DL-main\lib\site-packages\classix\aggregation.py in aggregate(data, sorting, tol)
         82 sort_vals = [email protected](-1)
         83 else:
    ---> 84 U1, s1, _ = svds(data, k=1, return_singular_vectors="u")
         85 sort_vals = U1[:,0]*s1[0]
         86

    NameError: name 'svds' is not defined

    from sklearn import datasets
    from classix import CLASSIX

    # Generate synthetic data
    X, y = datasets.make_blobs(n_samples=5000, centers=2, n_features=25, random_state=1)

    # Call CLASSIX
    clx = CLASSIX(radius=0.5, verbose=0)
    clx.fit(X)


    Maybe I am using your method in the wrong way. Could you tell me how to get CLASSIX to work on multi-dimensional data? Any advice would be appreciated! Best wishes!

    opened by MotorZ 2
  • group_merging=='none'

    It would be helpful and intuitive to return the group labels as obtained by the aggregation phase when

    group_merging.lower()=='none' or group_merging is None

    In this case the cluster labels are just the group labels returned by aggregation.

    enhancement 
    opened by guettel 1
  • Fail instalment

    Some users cannot install CLASSIX with pip install classix and have reported installation issues. The reason is that classix is also the name of another software package. Use pip install ClassixClustering instead.

    opened by chenxinye 1
  • Allow indexing with data frame labels

    Assume I have a data frame df like

    Anna   0.3   -0.1   0.5
    Bert   0.0   -0.2   0.7
    Carl  -0.8   -0.1   0.2
    

    where the first column is the index. It would be nice to be able to do

    clx = CLASSIX(radius=0.5)
    clx.fit(df)
    clx.explain('Anna', 'Bert')
    

    and get an output similar to

    The data point 'Anna' is in group 0, which has been merged into cluster 0.
    The data point 'Bert' is in group 1, which has been merged into cluster 1.
    There is no path of overlapping groups between these clusters.
    

    The table of groups could contain an additional column for the label:

    -------------------------------------------------
     Group  Label   NrPts  Cluster  Coordinates  
       0    'Anna'  1      0        0.3   -0.1   0.5
       1    'Bert'  2      1        0.0   -0.2   0.7
    -------------------------------------------------
    

    The plot function could also use the labels instead of numerical point indices, but there should probably be an option to revert to numerical indices for the starting points if the plot gets too cluttered.
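
    Where label-based indexing is not available, one possible workaround is to translate index labels into integer row positions before calling explain. The sketch below shows only that translation with pandas; the helper name `label_to_position` is hypothetical, the index labels are assumed to be unique, and the final call to explain is left as a comment because it presumes a fitted CLASSIX model.

```python
import pandas as pd

# The example data frame from above, with names as the index.
df = pd.DataFrame(
    [[0.3, -0.1, 0.5], [0.0, -0.2, 0.7], [-0.8, -0.1, 0.2]],
    index=["Anna", "Bert", "Carl"],
)

def label_to_position(frame, label):
    """Return the integer row position of a unique index label."""
    return int(frame.index.get_loc(label))

i = label_to_position(df, "Anna")
j = label_to_position(df, "Bert")
print(i, j)

# With a model fitted on df.to_numpy(), the positions could then be passed on,
# e.g. clx.explain(i, j) (assumes integer indices as in the README examples).
```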

    enhancement 
    opened by guettel 1
  • Error using Scipy 1.8.0

    This code works fine with numpy 1.22.2 and Scipy 1.7.3

    from sklearn import datasets
    from classix import CLASSIX
    
    # Generate synthetic data
    X, y = datasets.make_blobs(n_samples=2000000, centers=4, n_features=10, random_state=1) #data_med
    
    # Employ CLASSIX clustering
    clx = CLASSIX(sorting='pca', radius=0.5, verbose=1)
    clx.fit(X)
    

    but fails with Scipy 1.8.0:

    CLASSIX(sorting='pca', radius=0.5, minPts=0, group_merging='distance')
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    Input In [1], in <cell line: 9>()
          7 # Employ CLASSIX clustering
          8 clx = CLASSIX(sorting='pca', radius=0.5, verbose=1)
    ----> 9 clx.fit(X)
    
    File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\classix\clustering.py:460, in CLASSIX.fit(self, data)
        457     self.data = (data - self._mu) / self._scl
        459 # aggregation
    --> 460 self.agg_labels_, self.splist_, self.dist_nr = aggregate(data=self.data, sorting=self.sorting, tol=self.radius) 
        461 self.splist_ = np.array(self.splist_)
        463 self.clean_index_ = np.full(self.data.shape[0], True) # claim clean data indices
    
    File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\classix\aggregation_cm.pyx:44, in classix.aggregation_cm.aggregate()
    
    File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\classix\aggregation_cm.pyx:100, in classix.aggregation_cm.aggregate()
    
    File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\sparse\linalg\_eigen\_svds.py:269, in svds(A, k, ncv, tol, which, v0, maxiter, return_singular_vectors, solver, random_state, options)
        134 """
        135 Partial singular value decomposition of a sparse matrix.
        136 
       (...)
        265 
        266 """
        267 rs_was_None = random_state is None  # avoid changing v0 for arpack/lobpcg
    --> 269 args = _iv(A, k, ncv, tol, which, v0, maxiter, return_singular_vectors,
        270            solver, random_state)
        271 (A, k, ncv, tol, which, v0, maxiter,
        272  return_singular_vectors, solver, random_state) = args
        274 largest = (which == 'LM')
    
    File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\sparse\linalg\_eigen\_svds.py:63, in _iv(A, k, ncv, tol, which, v0, maxiter, return_singular, solver, random_state)
         60     raise ValueError(f"solver must be one of {solvers}.")
         62 # input validation/standardization for `A`
    ---> 63 A = aslinearoperator(A)  # this takes care of some input validation
         64 if not (np.issubdtype(A.dtype, np.complexfloating)
         65         or np.issubdtype(A.dtype, np.floating)):
         66     message = "`A` must be of floating or complex floating data type."
    
    File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\sparse\linalg\_interface.py:826, in aslinearoperator(A)
        822     return LinearOperator(A.shape, A.matvec, rmatvec=rmatvec,
        823                           rmatmat=rmatmat, dtype=dtype)
        825 else:
    --> 826     raise TypeError('type not understood')
    
    TypeError: type not understood
    
    opened by mikecroucher 1
  • Question:Possibility of adding a temporal component to the clustering

    I wanted to know whether it is possible to add a temporal component to how each node is clustered, so that nodes with close timestamps are clustered together and this takes precedence over the spatial component.

    opened by Emmanuel-Mekonnen 1
Releases(v0.7.7)
  • v0.7.7(Mar 3, 2022)

    CLASSIX v0.7.7 release! :tada: :tada: :tada:

    Update

    • Add the parameter verbose to the method cython_is_available() to show whether the implementation is using Cython or memoryview.
    • Fix the installation issues in the setup file.

    We sincerely appreciate the valuable feedback and contributions from @Schwaggot.

    We also thank all previous participants and contributors, including @joshdunnlime and @mikecroucher.
