PyClustering is a Python, C++ data mining library.

Last update: Jan 05, 2023

Overview

PyClustering

pyclustering is a Python, C++ data mining library (clustering algorithms, oscillatory networks, neural networks). The library provides Python and C++ implementations (C++ pyclustering library) of each algorithm or model. The C++ pyclustering library is a part of pyclustering and is supported on Linux, Windows and MacOS operating systems.

Version: 0.11.dev

License: The 3-Clause BSD License

E-Mail: [email protected]

Documentation: https://pyclustering.github.io/docs/0.10.1/html/

Homepage: https://pyclustering.github.io/

PyClustering Wiki: https://github.com/annoviko/pyclustering/wiki

Dependencies

Required packages: scipy, matplotlib, numpy, Pillow

Python version: >=3.6 (32-bit, 64-bit)

C++ version: >= 14 (32-bit, 64-bit)

Performance

Each algorithm is implemented in Python and C/C++. The C/C++ implementation is used when your platform is supported; otherwise the library falls back to the Python implementation. The implementation can be chosen with the ccore flag (it defaults to True, which means that C/C++ is used), for example:

# As by default - C/C++ part of the library is used
xmeans_instance_1 = xmeans(data_points, start_centers, 20, ccore=True)

# The same - C/C++ part of the library is used by default
xmeans_instance_2 = xmeans(data_points, start_centers, 20)

# Switch off core - Python is used
xmeans_instance_3 = xmeans(data_points, start_centers, 20, ccore=False)

Installation

Installation using pip3 tool:

$ pip3 install pyclustering

Manual installation from official repository using Makefile:

# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .

# compile CCORE library (core of the pyclustering library).
$ cd ccore/
$ make ccore_64bit      # build for 64-bit OS

# $ make ccore_32bit    # build for 32-bit OS

# return to parent folder of the pyclustering library
$ cd ../

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Manual installation using CMake:

# get sources of the pyclustering library, for example, from repository
$ mkdir pyclustering
$ cd pyclustering/
$ git clone https://github.com/annoviko/pyclustering.git .

# generate build files in a separate build folder.
$ mkdir build
$ cd build/
$ cmake ..

# build pyclustering-shared target depending on what was generated (Makefile or MSVC solution)
# if Makefile has been generated then
$ make pyclustering-shared

# return to parent folder of the pyclustering library
$ cd ../

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Manual installation using Microsoft Visual Studio solution:

Clone repository from: https://github.com/annoviko/pyclustering.git
Open folder pyclustering/ccore
Open Visual Studio project ccore.sln
Select solution platform: x86 or x64
Build pyclustering-shared project.
Add pyclustering folder to python path or install it using setup.py

# install pyclustering library
$ python3 setup.py install

# optionally - test the library
$ python3 setup.py test

Proposals, Questions, Bugs

For questions, proposals or bugs related to pyclustering, please contact [email protected] or create an issue at https://github.com/annoviko/pyclustering/issues.

Cite the Library

If you use the pyclustering library in a scientific paper, please cite it:

Novikov, A., 2019. PyClustering: Data Mining Library. Journal of Open Source Software, 4(36), p.1230. Available at: http://dx.doi.org/10.21105/joss.01230.

BibTeX entry:

@article{Novikov2019,
    doi         = {10.21105/joss.01230},
    url         = {https://doi.org/10.21105/joss.01230},
    year        = 2019,
    month       = {apr},
    publisher   = {The Open Journal},
    volume      = {4},
    number      = {36},
    pages       = {1230},
    author      = {Andrei Novikov},
    title       = {{PyClustering}: Data Mining Library},
    journal     = {Journal of Open Source Software}
}

Brief Overview of the Library Content

Clustering algorithms and methods (module pyclustering.cluster):

Algorithm	Python	C++
Agglomerative	✓	✓
BANG	✓
BIRCH	✓
BSAS	✓	✓
CLARANS	✓
CLIQUE	✓	✓
CURE	✓	✓
DBSCAN	✓	✓
Elbow	✓	✓
EMA	✓
Fuzzy C-Means	✓	✓
GA (Genetic Algorithm)	✓	✓
G-Means	✓	✓
HSyncNet	✓	✓
K-Means	✓	✓
K-Means++	✓	✓
K-Medians	✓	✓
K-Medoids	✓	✓
MBSAS	✓	✓
OPTICS	✓	✓
ROCK	✓	✓
Silhouette	✓	✓
SOM-SC	✓	✓
SyncNet	✓	✓
Sync-SOM	✓
TTSAS	✓	✓
X-Means	✓	✓

Oscillatory networks and neural networks (module pyclustering.nnet):

Model	Python	C++
CNN (Chaotic Neural Network)	✓
fSync (Oscillatory network based on Landau-Stuart equation and Kuramoto model)	✓
HHN (Oscillatory network based on Hodgkin-Huxley model)	✓	✓
Hysteresis Oscillatory Network	✓
LEGION (Local Excitatory Global Inhibitory Oscillatory Network)	✓	✓
PCNN (Pulse-Coupled Neural Network)	✓	✓
SOM (Self-Organized Map)	✓	✓
Sync (Oscillatory network based on Kuramoto model)	✓	✓
SyncPR (Oscillatory network for pattern recognition)	✓	✓
SyncSegm (Oscillatory network for image segmentation)	✓	✓

Graph Coloring Algorithms (module pyclustering.gcolor):

Algorithm	Python	C++
DSatur	✓
Hysteresis	✓
GColorSync	✓

Containers (module pyclustering.container):

Algorithm	Python	C++
KD Tree	✓	✓
CF Tree	✓

Examples in the Library

The library contains examples for each algorithm and oscillatory network model:

Clustering examples: pyclustering/cluster/examples

Graph coloring examples: pyclustering/gcolor/examples

Oscillatory network examples: pyclustering/nnet/examples

Code Examples

Data clustering by CURE algorithm

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.cure import cure
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import FCPS_SAMPLES

# Input data in the following format: [ [0.1, 0.5], [0.3, 0.1], ... ].
input_data = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

# Allocate three clusters.
cure_instance = cure(input_data, 3)
cure_instance.process()
clusters = cure_instance.get_clusters()

# Visualize allocated clusters.
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, input_data)
visualizer.show()

Data clustering by K-Means algorithm

from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# Load list of points for cluster analysis.
sample = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)

# Prepare initial centers using K-Means++ method.
initial_centers = kmeans_plusplus_initializer(sample, 2).initialize()

# Create instance of K-Means algorithm with prepared centers.
kmeans_instance = kmeans(sample, initial_centers)

# Run cluster analysis and obtain results.
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
final_centers = kmeans_instance.get_centers()

# Visualize obtained results
kmeans_visualizer.show_clusters(sample, clusters, final_centers)
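
Some downstream tools expect one label per point rather than lists of point indices. The following is a minimal sketch of that conversion, assuming get_clusters() returns a list of clusters where each cluster is a list of point indices into the input sample. The helper name clusters_to_labels and the label -1 for unassigned points are illustrative choices, not part of pyclustering:

```python
def clusters_to_labels(clusters, amount_points, unassigned=-1):
    """Convert index-list clusters to a flat list with one label per point."""
    labels = [unassigned] * amount_points
    for cluster_label, cluster in enumerate(clusters):
        for point_index in cluster:
            labels[point_index] = cluster_label
    return labels


# Hypothetical result of get_clusters() for a sample with six points.
example_clusters = [[0, 2, 4], [1, 5]]
example_labels = clusters_to_labels(example_clusters, 6)
print(example_labels)  # [0, 1, 0, -1, 0, 1]
```

The release notes further down mention pyclustering.cluster.encoder for conversions between index lists and labeling, which is likely the better choice inside the library itself.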

Data clustering by OPTICS algorithm

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.optics import optics, ordering_analyser, ordering_visualizer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

# Read sample for clustering from some file
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN)

# Run cluster analysis where connectivity radius is bigger than real
radius = 2.0
neighbors = 3
amount_of_clusters = 3
optics_instance = optics(sample, radius, neighbors, amount_of_clusters)

# Performs cluster analysis
optics_instance.process()

# Obtain results of clustering
clusters = optics_instance.get_clusters()
noise = optics_instance.get_noise()
ordering = optics_instance.get_ordering()

# Visualize ordering diagram
analyser = ordering_analyser(ordering)
ordering_visualizer.show_ordering_diagram(analyser, amount_of_clusters)

# Visualize clustering results
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.show()

Simulation of oscillatory network PCNN

from pyclustering.nnet.pcnn import pcnn_network, pcnn_visualizer

# Create Pulse-Coupled neural network with 10 oscillators.
net = pcnn_network(10)

# Perform simulation during 50 steps using binary external stimulus.
dynamic = net.simulate(50, [1, 1, 1, 0, 0, 0, 0, 1, 1, 1])

# Allocate synchronous ensembles from the output dynamic.
ensembles = dynamic.allocate_sync_ensembles()

# Show output dynamic.
pcnn_visualizer.show_output_dynamic(dynamic, ensembles)

Simulation of chaotic neural network CNN

from pyclustering.cluster import cluster_visualizer
from pyclustering.samples.definitions import SIMPLE_SAMPLES
from pyclustering.utils import read_sample
from pyclustering.nnet.cnn import cnn_network, cnn_visualizer

# Load stimulus from file.
stimulus = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)

# Create chaotic neural network, amount of neurons should be equal to amount of stimulus.
network_instance = cnn_network(len(stimulus))

# Perform simulation during 100 steps.
steps = 100
output_dynamic = network_instance.simulate(steps, stimulus)

# Display output dynamic of the network.
cnn_visualizer.show_output_dynamic(output_dynamic)

# Display dynamic matrix and observation matrix to show clustering phenomenon.
cnn_visualizer.show_dynamic_matrix(output_dynamic)
cnn_visualizer.show_observation_matrix(output_dynamic)

# Visualize clustering results.
clusters = output_dynamic.allocate_sync_ensembles(10)
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, stimulus)
visualizer.show()

Illustrations

Cluster allocation on FCPS dataset collection by DBSCAN:

Cluster allocation by OPTICS using cluster-ordering diagram:

Partial synchronization (clustering) in Sync oscillatory network:

Cluster visualization by SOM (Self-Organized Feature Map)

Comments

Performance Issue - OPTICS

I am running the OPTICS algorithm on 50k data points; since the data is text, it has around 5k features. The time taken to run the program seems huge. I tried using ccore, but it doesn't seem to improve things. Is there any way I could improve performance?
Investigation Optimization

opened by swetha0613 19
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Thank you for your library, it is very useful for me and the data mining community. I wanted to run birch algorithm but I had this error from the cftree.py: if (merged_entry.get_diameter() > self.__threshold): ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

Also, when I want to use the parameter diameter when I instantiate the birch algorithm, I get this error: birch_instance = birch(x,3,diameter=0.1) TypeError: __init__() got an unexpected keyword argument 'diameter'.

One last question, would it be possible to leave the parameter number_clusters optional to let the user use other clustering algorithms in the last step of birch instead of the hierarchical method?
Bug Question

opened by nabilEM 13

How to use pyclustering kmedoids using gower distance matrix?

Hi,

Not sure if this has already been asked but I have a dataframe consisting of categorical and numerical data. I want to cluster this data to extract features. I use the following code from https://sourceforge.net/projects/gower-distance-4python/files/ to calculate the gower distance.

My code is as follows:

import pyclustering

from sklearn.metrics.pairwise import pairwise_distances
import numpy as np
from pyclustering.cluster.kmedoids import kmedoids
from pyclustering.utils import read_sample
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.cluster.elbow import elbow
from pyclustering.cluster.kmeans import kmeans
from pyclustering.cluster.encoder import cluster_encoder, type_encoding

D = gower_distances(filtOrdersGower_subset)
initial_medoids = kmeans_plusplus_initializer(D, 4).initialize(return_index=True)
kmedoids_instance = kmedoids(D, initial_medoids, data_type='distance_matrix')

kmedoids_instance.process()
clusters = kmedoids_instance.get_clusters()

How do I plot these clusters and find out which features in my data are most important? I am new to pyclustering. @annoviko

Question

opened by zahs123 13

k-medoids with custom distance
I am new to pyclustering. Rummaging through the source code I didn't see how I could insert custom distance (either by passing a callable that computes pairwise distance or a precomputed distance matrix). Could you help? Thanks.

To be more specific, the following is the sort of thing I'm talking about:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def my_dist(u, v):
    # exemplifying using a weird distance metric.
    return (u + v).sum()

data = np.array([[1,2,3,4], [5,6,7,8]])
clust = linkage(data, method='average', metric=my_dist)
prediction = fcluster(clust, 2, criterion='maxclust')
Question Proposal
opened by suwangcompling 13
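
One way to approach this, sketched under assumptions: compute the pairwise distance matrix with the custom callable and hand the matrix to the algorithm instead of the raw points. The helper name build_distance_matrix is hypothetical, and the sketch assumes the callable is symmetric and that the distance from a point to itself is zero:

```python
def build_distance_matrix(points, distance):
    """Return a full symmetric matrix of pairwise distances as a list of lists."""
    size = len(points)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            # Compute each pair once and mirror it (assumes a symmetric callable).
            value = float(distance(points[i], points[j]))
            matrix[i][j] = value
            matrix[j][i] = value
    return matrix


def manhattan(u, v):
    # Example of a custom metric: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(u, v))


data = [[1, 2, 3, 4], [5, 6, 7, 8], [1, 2, 3, 5]]
distance_matrix = build_distance_matrix(data, manhattan)
print(distance_matrix)  # [[0.0, 16.0, 1.0], [16.0, 0.0, 15.0], [1.0, 15.0, 0.0]]
```

The resulting matrix could then be passed in the call form used in the Gower distance question above: kmedoids(distance_matrix, initial_medoids, data_type='distance_matrix').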

Anyway to lose the matplotlib dependency or make it optional?

I'm getting the following:

Traceback (most recent call last):
  File "/Users/alex/dev/something/extractor/ml.py", line 8, in <module>
    from pyclustering.cluster.kmeans import kmeans
  File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/pyclustering/cluster/__init__.py", line 26, in <module>
    import matplotlib.pyplot as plt;
  File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/matplotlib/pyplot.py", line 115, in <module>
    _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
  File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/matplotlib/backends/__init__.py", line 62, in pylab_setup
    [backend_name], 0)
  File "/Users/alex/dev/venv/content-extractor-3pp1FAW1/lib/python3.6/site-packages/matplotlib/backends/backend_macosx.py", line 17, in <module>
    from matplotlib.backends import _macosx

RuntimeError: Python is not installed as a framework. The Mac OS X
backend will not be able to function correctly if Python is not
installed as a framework. See the Python documentation for more
information on installing Python as a framework on Mac OS X. Please
either reinstall Python as a framework, or try one of the other
backends. If you are using (Ana)Conda please install python.app and
replace the use of 'python' with 'pythonw'. See 'Working with
Matplotlib on OSX' in the Matplotlib FAQ for more information.

Don't need plotting, just the clusters :-/ Perhaps move the

import matplotlib.pyplot as plt;
import matplotlib.gridspec as gridspec;

inside the show() function?

Investigation Optimization

opened by awhillas 10
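
The suggestion above amounts to deferring the import until plotting is actually requested. The following is a minimal sketch of that pattern with a hypothetical lazy_visualizer class (it is not the library's cluster_visualizer). It assumes two-dimensional points and only touches matplotlib inside show():

```python
class lazy_visualizer:
    """Collects clusters and imports matplotlib only when show() is called."""

    def __init__(self):
        self._clusters = []

    def append_clusters(self, clusters, data):
        # Store the coordinates of every cluster; no plotting backend is needed yet.
        for cluster in clusters:
            self._clusters.append([data[index] for index in cluster])

    def show(self):
        # Deferred import: a missing or misconfigured backend fails only here.
        import matplotlib.pyplot as plt

        for points in self._clusters:
            plt.scatter([p[0] for p in points], [p[1] for p in points])
        plt.show()


visualizer = lazy_visualizer()
visualizer.append_clusters([[0, 1], [2]], [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1]])
```

With this layout, code that only needs the clusters never triggers the backend error shown in the traceback.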

Missing labels_ and predict function for K-Means

Great work, but K-Means is missing the labels_ attribute and the predict function that sklearn provides: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans
Enhancement Proposal

opened by liufsd 9
[ccore]ccore.so can not find

When I run "python xmeans_examples.py", it runs correctly at first, but after a few seconds I get this problem. Could you help me? Thanks so much!
Question

opened by paulinsider 9
G-Means: Setting maximum number of clusters like for X-Means

Hi,

I wanted to ask if it is possible to add a k_max parameter to the call to gmeans, similar to xmeans, which supports this parameter. The reason is that gmeans returns a really large number of clusters for some datasets (sometimes it even equals the size of the dataset, which is the worst case). I do not know the reason behind this, but it would be nice if I could limit the number of clusters as I can for xmeans.
Enhancement Proposal

opened by tschechlovdev 8
[pyclustering.cluster.kmedians] exception: access violation reading 0x(Memory Address)

Hi,

When using kmedians, I get an error related to an access violation when reading a memory address. This happens if I use ccore=True. If I use ccore=False, kmedians_obj.process() returns no medians or clusters. My guess is that it is related to the number of clusters (and maybe tolerance), although I am not sure. It happens when the ratio of the number of clusters to points is below 1/10 (I was using between 380 and 450 values for 45 clusters when I got it). However, it might be interesting to try to capture the error so it is more informative.

Thanks a lot for your work!
Bug Question

opened by jordiarjona 8
[pyclustering.cluster.rock] Use ROCK for clustering data set.

Hi,

I am trying to use the Robust Clustering Algorithm for Categorical Attributes (ROCK) on a data set containing categorical attributes, but I am getting an error that data cannot be str. How can I use this method with a categorical data set?

Thanks, Naser
Question

opened by NaserMonsefi 8
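
The error suggests that the implementation expects numeric points. One common workaround, shown here as an assumption rather than as the library's recommended approach, is to one-hot encode each categorical attribute before clustering. The helper one_hot_encode is hypothetical:

```python
def one_hot_encode(rows):
    """Encode rows of categorical values as lists of 0.0/1.0 floats."""
    amount_columns = len(rows[0])
    # Sorted distinct values per column give a stable position for each category.
    categories = [sorted({row[col] for row in rows}) for col in range(amount_columns)]

    encoded = []
    for row in rows:
        vector = []
        for col, value in enumerate(row):
            vector.extend(1.0 if value == category else 0.0 for category in categories[col])
        encoded.append(vector)
    return encoded


records = [["red", "small"], ["blue", "large"], ["red", "large"]]
numeric_records = one_hot_encode(records)
print(numeric_records)
# [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
```

With this encoding, the squared Euclidean distance between two rows equals twice the number of attributes on which they differ, so a distance threshold has a direct interpretation.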
kmedoids returns empty cluster lists for version 0.10.1

Hi,

Previously, code working on one server with version 0.9.3.1 worked as expected. However, the same code run on a different server with version 0.10.1 returned some empty clusters for the same dataset and initial medoids.

initial_medoids = [0, 1, 2, 3]
kmedoids_instance = kmedoids(df2, initial_medoids, metric=metric)
kmedoids_instance.process()
clusters = kmedoids_instance.get_clusters()
medoids = kmedoids_instance.get_medoids()
print(clusters)

The above would return indices for clusters 0 and 1 but empty lists for clusters 2 and 3, despite there not being any missing values in my data df2. I would expect, at the very least, the medoids themselves to be in clusters 2 and 3.

Thank you, this is a great package, I really appreciate it.

Lauren
Bug

opened by laurenleesc 7
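
A quick way to confirm this symptom on a clustering result, sketched with a hypothetical helper. It assumes clusters is a list of index lists, as printed in the report above:

```python
def find_cluster_problems(clusters, amount_points):
    """Return (indices of empty clusters, sorted indices of points in no cluster)."""
    empty = [index for index, cluster in enumerate(clusters) if len(cluster) == 0]
    assigned = {point for cluster in clusters for point in cluster}
    missing = sorted(set(range(amount_points)) - assigned)
    return empty, missing


# Hypothetical output resembling the report: clusters 2 and 3 are empty.
reported = [[0, 4, 5], [1, 2, 3], [], []]
print(find_cluster_problems(reported, 6))  # ([2, 3], [])
```

The 0.10.1.2 release notes further down list a fix for empty K-Medoids clusters.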
Reference for the "Elbow length" method?

The documentation of the elbow package suggests this is based on the reference Thorndike 1953: https://github.com/annoviko/pyclustering/blob/bf4f51a472622292627ec8c294eb205585e50f52/pyclustering/cluster/elbow.py#L4 https://github.com/annoviko/pyclustering/blob/bf4f51a472622292627ec8c294eb205585e50f52/docs/citation.bib#L552-L556 Yet, I cannot find the "Elbow length" equation used in this reference, in fact he appears very skeptical that such elbows can be reliably identified (for a good reason...). Is there another reference for this particular method?

opened by kno10 0
xmeans does not agree to paper?

The last term, p * 0.5 * log(N), should be in the sum only once IMHO. It is in the top BIC equation (j is the model index, not the cluster index), not in the l(Dn) equation where n is the cluster index) in https://web.cs.dal.ca/~shepherd/courses/csci6403/clustering/xmeans.pdf No guarantees that everything else is fine.

I also rename sigma_sqrt to sigma_sq because it is supposed to be sigma square, not square root.

Note that if sigma_multiplier = float('-inf'), the result will always be infinity, won't it?

opened by kno10 0
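
For reference, these are the two equations the comment distinguishes, written from the linked X-Means paper as best recalled, so the notation should be treated as an assumption and checked against the paper. R is the number of points (N in the comment), R_n the number of points in cluster n, M the dimension, K the number of clusters, and p_j the number of free parameters of model M_j (p in the comment):

```latex
\mathrm{BIC}(M_j) = \hat{l}_j(D) - \frac{p_j}{2}\,\log R

\hat{l}(D_n) = -\frac{R_n}{2}\log(2\pi) - \frac{R_n M}{2}\log\hat{\sigma}^2 - \frac{R_n - K}{2} + R_n \log R_n - R_n \log R
```

The penalty term (p_j / 2) log R appears only in the first equation, which is consistent with the point that it should be subtracted once per model rather than once per cluster.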

Build failed: 'numeric_limits' is not a member of 'std'

platform: Arch Linux
gcc version 12.1.1 20220730 (GCC)

When building the package, gcc throws an error:

In file included from src/cluster/bsas.cpp:10:
./include/pyclustering/cluster/bsas.hpp:92:44: error: 'numeric_limits' is not a member of 'std'
   92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
      |                                            ^~~~~~~~~~~~~~
./include/pyclustering/cluster/bsas.hpp:92:59: error: expected primary-expression before 'double'
   92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
      |                                                           ^~~~~~
make[1]: *** [ccore.mk:154: obj/ccore/64-bit/cluster/bsas.o] Error 1
make[1]: *** Waiting for unfinished jobs....
In file included from ./include/pyclustering/cluster/mbsas.hpp:12,
                 from src/cluster/mbsas.cpp:10:
./include/pyclustering/cluster/bsas.hpp:92:44: error: 'numeric_limits' is not a member of 'std'
   92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
      |                                            ^~~~~~~~~~~~~~
./include/pyclustering/cluster/bsas.hpp:92:59: error: expected primary-expression before 'double'
   92 |         double        m_distance    = std::numeric_limits<double>::max();   /**< Distance between the cluster and a specific point. */
      |                                                           ^~~~~~
make[1]: *** [ccore.mk:154: obj/ccore/64-bit/cluster/mbsas.o] Error 1
src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_average_link()':
src/cluster/agglomerative.cpp:89:44: error: 'numeric_limits' is not a member of 'std'
   89 |     double minimum_average_distance = std::numeric_limits<double>::max();
      |                                            ^~~~~~~~~~~~~~
src/cluster/agglomerative.cpp:89:59: error: expected primary-expression before 'double'
   89 |     double minimum_average_distance = std::numeric_limits<double>::max();
      |                                                           ^~~~~~
src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_centroid_link()':
src/cluster/agglomerative.cpp:123:44: error: 'numeric_limits' is not a member of 'std'
  123 |     double minimum_average_distance = std::numeric_limits<double>::max();
      |                                            ^~~~~~~~~~~~~~
src/cluster/agglomerative.cpp:123:59: error: expected primary-expression before 'double'
  123 |     double minimum_average_distance = std::numeric_limits<double>::max();
      |                                                           ^~~~~~
src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_complete_link()':
src/cluster/agglomerative.cpp:149:45: error: 'numeric_limits' is not a member of 'std'
  149 |     double minimum_complete_distance = std::numeric_limits<double>::max();
      |                                             ^~~~~~~~~~~~~~
src/cluster/agglomerative.cpp:149:60: error: expected primary-expression before 'double'
  149 |     double minimum_complete_distance = std::numeric_limits<double>::max();
      |                                                            ^~~~~~
src/cluster/agglomerative.cpp: In member function 'void pyclustering::clst::agglomerative::merge_by_signle_link()':
src/cluster/agglomerative.cpp:184:43: error: 'numeric_limits' is not a member of 'std'
  184 |     double minimum_single_distance = std::numeric_limits<double>::max();
      |                                           ^~~~~~~~~~~~~~
src/cluster/agglomerative.cpp:184:58: error: expected primary-expression before 'double'
  184 |     double minimum_single_distance = std::numeric_limits<double>::max();
      |                                                          ^~~~~~
src/cluster/agglomerative.cpp:193:54: error: 'numeric_limits' is not a member of 'std'
  193 |             double candidate_minimum_distance = std::numeric_limits<double>::max();
      |                                                      ^~~~~~~~~~~~~~
src/cluster/agglomerative.cpp:193:69: error: expected primary-expression before 'double'
  193 |             double candidate_minimum_distance = std::numeric_limits<double>::max();
      |                                                                     ^~~~~~
make[1]: *** [ccore.mk:154: obj/ccore/64-bit/cluster/agglomerative.o] Error 1
make[1]: Leaving directory '/tmp/makepkg/python-pyclustering-git/src/pyclustering/ccore'
make: *** [makefile:53: ccore_64bit] Error 2

opened by Catty2014 0

predict error for kmeans

from pyclustering.cluster.kmeans import kmeans, kmeans_visualizer
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.samples.definitions import FCPS_SAMPLES
from pyclustering.utils import read_sample

samples = read_sample(FCPS_SAMPLES.SAMPLE_TWO_DIAMONDS)
initial_centers = kmeans_plusplus_initializer(samples, 2).initialize()
kmeans_instance = kmeans(samples, initial_centers)
kmeans_instance.process()
clusters = kmeans_instance.get_clusters()
final_centers = kmeans_instance.get_centers()

kmeans_instance.predict(samples)

and I get this:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_20827/3994711565.py in
----> 1 kmeans_instance.predict(samples)

~/envs/envs/spark_seg/lib/python3.7/site-packages/pyclustering/cluster/kmeans.py in predict(self, points)
    441         for index_point in range(len(nppoints)):
    442             if self.__metric.get_type() != type_metric.USER_DEFINED:
--> 443                 differences[index_point] = self.__metric(nppoints[index_point], self.__centers)
    444             else:
    445                 differences[index_point] = [self.__metric(nppoints[index_point], center) for center in self.__centers]

~/envs/envs/spark_seg/lib/python3.7/site-packages/pyclustering/utils/metric.py in __call__(self, point1, point2)
    130
    131         """
--> 132         return self.__calculator(point1, point2)
    133
    134

~/envs/envs/spark_seg/lib/python3.7/site-packages/pyclustering/utils/metric.py in euclidean_distance_square_numpy(object1, object2)
    368
    369     """
--> 370     if len(object1.shape) > 1 or len(object2.shape) > 1:
    371         return numpy.sum(numpy.square(object1 - object2), axis=1).T
    372     else:

AttributeError: 'list' object has no attribute 'shape'

opened by BeHappyForMe 0
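
The traceback suggests that a plain list (possibly the stored centers) reaches code that expects a NumPy array with a .shape attribute. Independently of the library, nearest-center prediction can be sketched in plain Python. The helper predict_nearest_center is hypothetical and uses squared Euclidean distance:

```python
def predict_nearest_center(points, centers):
    """Return, for each point, the index of the closest center."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = []
    for point in points:
        distances = [squared_distance(point, center) for center in centers]
        # On ties, list.index returns the first (lowest) center index.
        labels.append(distances.index(min(distances)))
    return labels


centers = [[0.0, 0.0], [10.0, 10.0]]
new_points = [[1.0, -1.0], [9.0, 12.0], [4.0, 4.0]]
print(predict_nearest_center(new_points, centers))  # [0, 1, 0]
```

The centers argument could be the list returned by get_centers() in the snippet above.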
(Minor issue) Typo in repository description (pyclustring)

There is an "e" missing in the word "pyclustring" in the repository description. It should say "PyClustering" instead of "pyclustring".

opened by 99991 0

Releases(0.10.1.2)

0.10.1.2(Nov 25, 2020)
pyclustering 0.10.1.2 library is a collection of clustering algorithms, oscillatory networks, etc.

CORRECTED MAJOR BUGS:

Corrected bug with empty clusters for K-Medoids (C++ pyclustering::clst::kmedoids). See: https://github.com/annoviko/pyclustering/issues/659

0.10.1.1(Nov 24, 2020)
pyclustering 0.10.1.1 library is a collection of clustering algorithms, oscillatory networks, etc.

CORRECTED MAJOR BUGS:

Corrected bug with incorrect cluster allocation for K-Medoids (C++ pyclustering::clst::kmedoids). See: https://github.com/annoviko/pyclustering/issues/659

0.10.1(Nov 19, 2020)
pyclustering 0.10.1 library is a collection of clustering algorithms, oscillatory networks, etc.

GENERAL CHANGES:

The library is distributed under the BSD-3-Clause license. See: https://github.com/annoviko/pyclustering/issues/517

C++ pyclustering can be built using CMake. See: https://github.com/annoviko/pyclustering/issues/603

Supported dumping and loading for DBSCAN algorithm via pickle (Python: pyclustering.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/650

Package installer resolves all required dependencies automatically. See: https://github.com/annoviko/pyclustering/issues/647

Introduced human-readable error for genetic clustering algorithm in case of non-normalized data (Python: pyclustering.cluster.ga). See: https://github.com/annoviko/pyclustering/issues/597

Optimized the Windows implementation of parallel_for and parallel_for_each by using pyclustering::parallel instead of PPL, which affects all algorithms that use these functions (C++: pyclustering::parallel). See: https://github.com/annoviko/pyclustering/issues/642

Optimized parallel_for algorithm for short cycles that affects all algorithms which use parallel_for (C++: pyclustering::parallel). See: https://github.com/annoviko/pyclustering/issues/642

Introduced kstep parameter for elbow algorithm to use custom K search steps (Python: pyclustering.cluster.elbow, C++: pyclustering::clst::elbow). See: https://github.com/annoviko/pyclustering/issues/489

Introduced p_step parameter for parallel_for function (C++: pyclustering::parallel). See: https://github.com/annoviko/pyclustering/issues/640

Optimized python implementation of K-Medoids algorithm (Python: pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/526

C++ pyclustering CLIQUE interface returns human-readable errors (Python: pyclustering.cluster.clique). See: https://github.com/annoviko/pyclustering/issues/635 See: https://github.com/annoviko/pyclustering/issues/634

Introduced metric parameter for X-Means algorithm to use custom metric for clustering (Python: pyclustering.cluster.xmeans; C++ pyclustering::clst::xmeans). See: https://github.com/annoviko/pyclustering/issues/619

Introduced alpha and beta probabilistic bounds for MNDL splitting criteria for X-Means algorithm (Python: pyclustering.cluster.xmeans; C++: pyclustering::clst::xmeans). See: https://github.com/annoviko/pyclustering/issues/624

CORRECTED MAJOR BUGS:

Corrected bug with a command python3 -m pyclustering.tests that was using the current folder to find tests to run (Python: pyclustering). See: https://github.com/annoviko/pyclustering/issues/648

Corrected bug with Elbow algorithm where kmax is not used to calculate K (Python: pyclustering.cluster.elbow; C++: pyclustering::clst::elbow). See: https://github.com/annoviko/pyclustering/issues/639

Corrected implementation of K-Medoids (PAM) algorithm so that it is aligned with the original algorithm (Python: pyclustering.cluster.kmedoids; C++: pyclustering::clst::kmedoids). See: https://github.com/annoviko/pyclustering/issues/503

Corrected literature references for the K-Medoids (PAM) implementation (Python: pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/pull/572

Corrected bug when K-Medoids updates input parameter initial_medoids that were provided to the algorithm (Python: pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/630

Corrected bug with Euclidean distance when numpy is used (Python: pyclustering.utils.metric). See: https://github.com/annoviko/pyclustering/issues/625

Corrected bug with Minkowski distance when numpy is used (Python: pyclustering.utils.metric). See: https://github.com/annoviko/pyclustering/issues/626

Corrected bug with Gower distance when numpy calculation is used and data shape is bigger than 1 (Python: pyclustering.utils.metric). See: https://github.com/annoviko/pyclustering/issues/627

Corrected MNDL splitting criteria for X-Means algorithm (Python: pyclustering.cluster.xmeans; C++: pyclustering::clst::xmeans). See: https://github.com/annoviko/pyclustering/issues/623

0.10.0.1(Aug 17, 2020)
pyclustering 0.10.0.1 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

GENERAL CHANGES:

Metadata of the library is updated. See: no reference

Supported command test for setup.py script (Python: pyclustering). See: https://github.com/annoviko/pyclustering/issues/607

Introduced parameter random_seed for algorithms/models to control the seed of the random functionality: kmeans++, random_center_initializer, ga, gmeans, xmeans, som, somsc, elbow, silhouette_ksearch (Python: pyclustering.cluster; C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/578

Introduced parameter k_max to G-Means algorithm to use it as an optional stop condition for the algorithm (Python: pyclustering.cluster.gmeans; C++: pyclustering::clst::gmeans). See: https://github.com/annoviko/pyclustering/issues/602

Implemented method save() for cluster_visualizer and cluster_visualizer_multidim to save visualization to file (Python: pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/601

Optimization of CURE algorithm using balanced KD-tree (Python: pyclustering.cluster.cure; C++: pyclustering::clst::cure). See: https://github.com/annoviko/pyclustering/issues/589

Optimization of OPTICS algorithm using balanced KD-tree (Python: pyclustering.cluster.optics; C++: pyclustering::clst::optics). See: https://github.com/annoviko/pyclustering/issues/588

Optimization of DBSCAN algorithm using balanced KD-tree (Python: pyclustering.cluster.dbscan; C++: pyclustering::clst::dbscan). See: https://github.com/annoviko/pyclustering/issues/587

Implemented new optimized balanced KD-tree kdtree_balanced (Python: pyclustering.container.kdtree; C++: pyclustering::container::kdtree_balanced). See: https://github.com/annoviko/pyclustering/issues/379

Implemented KD-tree graphical visualizer kdtree_visualizer for KD-trees with 2-dimensional data (Python: pyclustering.container.kdtree). See: https://github.com/annoviko/pyclustering/issues/586

Updated interface of each clustering algorithm in C/C++ pyclustering: cluster_data is substituted by concrete classes (C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/577
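
Several entries above rely on a balanced KD-tree. As a rough illustration of the idea only (this is not the code of kdtree_balanced), a KD-tree stays balanced when every node is built from the median point along an axis that cycles with depth. The names `Node`, `build_balanced_kdtree` and `height` are hypothetical.

```python
from collections import namedtuple

# A tree node: the stored point, the splitting axis, and two subtrees.
Node = namedtuple("Node", ["point", "axis", "left", "right"])


def build_balanced_kdtree(points, depth=0):
    """Build a KD-tree by recursively splitting at the median point."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    median = len(ordered) // 2             # the median keeps both halves even
    return Node(
        point=ordered[median],
        axis=axis,
        left=build_balanced_kdtree(ordered[:median], depth + 1),
        right=build_balanced_kdtree(ordered[median + 1:], depth + 1),
    )


def height(node):
    """Number of levels in the tree (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))
```

Because each split halves the remaining points, the height grows logarithmically with the number of points, which is what makes neighbor searches in DBSCAN, OPTICS and CURE cheaper.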

CORRECTED MAJOR BUGS:

Bug with wrong data type for scores in Silhouette K-search algorithm in case of using C++ (Python: pyclustering.cluster.silhouette). See: https://github.com/annoviko/pyclustering/issues/606

Bug with a random distribution in the random center initializer (Python: pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/573

Bug with incorrect conversion of Index List and Object List to Labeling when clusters do not contain one or more points from the input data (Python: pyclustering.cluster.encoder). See: https://github.com/annoviko/pyclustering/issues/596

Bug with an exception in case of using user-defined metric for K-Means algorithm (Python: pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/pull/600

Memory leak in the interface between Python and C++ pyclustering library in case of CURE algorithm usage (C++ pyclustering). See: https://github.com/annoviko/pyclustering/issues/581
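
The encoder fix above concerns clusterings that leave some input points unassigned (for example, noise). The sketch below shows one way such a conversion from an index list to a labeling can be made robust. It is an illustration, not the library's encoder: the function name and the choice of -1 as the marker for unassigned points are assumptions.

```python
def index_list_to_labeling(clusters, amount_points, unassigned=-1):
    """Convert clusters given as lists of point indexes into one label per point.

    Points that belong to no cluster keep the `unassigned` marker
    (an assumption made for this sketch).
    """
    labels = [unassigned] * amount_points
    for label, cluster in enumerate(clusters):
        for index in cluster:
            if not 0 <= index < amount_points:
                raise IndexError("point index %d is out of range" % index)
            labels[index] = label
    return labels
```

Sizing the label list from the number of input points, rather than from the number of clustered points, is what keeps the labels aligned with the data.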

0.10.0(Aug 17, 2020)
pyclustering 0.10.0 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

GENERAL CHANGES:

Supported command test for setup.py script (Python: pyclustering). See: https://github.com/annoviko/pyclustering/issues/607

Introduced parameter random_seed for algorithms/models to control the seed of the random functionality: kmeans++, random_center_initializer, ga, gmeans, xmeans, som, somsc, elbow, silhouette_ksearch (Python: pyclustering.cluster; C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/578

Introduced parameter k_max to G-Means algorithm to use it as an optional stop condition for the algorithm (Python: pyclustering.cluster.gmeans; C++: pyclustering::clst::gmeans). See: https://github.com/annoviko/pyclustering/issues/602

Implemented method save() for cluster_visualizer and cluster_visualizer_multidim to save visualization to file (Python: pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/601

Optimization of CURE algorithm using balanced KD-tree (Python: pyclustering.cluster.cure; C++: pyclustering::clst::cure). See: https://github.com/annoviko/pyclustering/issues/589

Optimization of OPTICS algorithm using balanced KD-tree (Python: pyclustering.cluster.optics; C++: pyclustering::clst::optics). See: https://github.com/annoviko/pyclustering/issues/588

Optimization of DBSCAN algorithm using balanced KD-tree (Python: pyclustering.cluster.dbscan; C++: pyclustering::clst::dbscan). See: https://github.com/annoviko/pyclustering/issues/587

Implemented new optimized balanced KD-tree kdtree_balanced (Python: pyclustering.container.kdtree; C++: pyclustering::container::kdtree_balanced). See: https://github.com/annoviko/pyclustering/issues/379

Implemented KD-tree graphical visualizer kdtree_visualizer for KD-trees with 2-dimensional data (Python: pyclustering.container.kdtree). See: https://github.com/annoviko/pyclustering/issues/586

Updated interface of each clustering algorithm in C/C++ pyclustering: cluster_data is substituted by concrete classes (C++: pyclustering::clst). See: https://github.com/annoviko/pyclustering/issues/577
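
The random_seed parameter listed above makes randomized steps repeatable. The following self-contained sketch shows the effect on a K-Means++ style initializer. It is written from the general description of K-Means++ and is not the library's code; the function name and signature are hypothetical.

```python
import random


def kmeans_plus_plus(points, amount_centers, random_seed=None):
    """Pick initial centers; the same seed always yields the same centers."""
    rng = random.Random(random_seed)       # private generator, seeded once
    centers = [rng.choice(points)]
    while len(centers) < amount_centers:
        # Weight of a point: squared distance to its closest chosen center.
        weights = [
            min(sum((x - c) ** 2 for x, c in zip(point, center))
                for center in centers)
            for point in points
        ]
        if sum(weights) == 0:
            raise ValueError("not enough distinct points for the requested centers")
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

Calling the function twice with the same seed returns identical centers, which is useful for reproducible experiments and tests.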

CORRECTED MAJOR BUGS:

Bug with wrong data type for scores in Silhouette K-search algorithm in case of using C++ (Python: pyclustering.cluster.silhouette). See: https://github.com/annoviko/pyclustering/issues/606

Bug with a random distribution in the random center initializer (Python: pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/573

Bug with incorrect conversion of Index List and Object List to Labeling when clusters do not contain one or more points from the input data (Python: pyclustering.cluster.encoder). See: https://github.com/annoviko/pyclustering/issues/596

Bug with an exception in case of using user-defined metric for K-Means algorithm (Python: pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/pull/600

Memory leak in the interface between Python and C++ pyclustering library in case of CURE algorithm usage (C++ pyclustering). See: https://github.com/annoviko/pyclustering/issues/581

0.9.3.1(Dec 24, 2019)
pyclustering 0.9.3.1 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

CORRECTED MAJOR BUGS:

Hotfix for the CF-tree - call method with incorrect amount of arguments. See: https://github.com/annoviko/pyclustering/issues/570

0.9.3(Dec 23, 2019)
pyclustering 0.9.3 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

GENERAL CHANGES:

Introduced get_cf_clusters and get_cf_entries methods for BIRCH algorithm to get CF-entry encoding information (pyclustering.cluster.birch). See: https://github.com/annoviko/pyclustering/issues/569

Introduced predict method for SOMSC algorithm to find closest clusters for specified points (pyclustering.cluster.somsc). See: https://github.com/annoviko/pyclustering/issues/546

Parallel optimization of C++ pyclustering compilation process. See: https://github.com/annoviko/pyclustering/issues/553

Include folder for easy integration to other C++ projects. See: https://github.com/annoviko/pyclustering/issues/554

Introduced new targets to build static libraries on Windows platform. See: https://github.com/annoviko/pyclustering/issues/555

Introduced new targets to build static libraries on Linux/MacOS platforms. See: https://github.com/annoviko/pyclustering/issues/556

CORRECTED MAJOR BUGS:

Bug with incorrect finding of closest CF-entry (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/564

Bug with incorrect BIRCH clustering due to incorrect leaf analysis (pyclustering.cluster.birch). See: https://github.com/annoviko/pyclustering/issues/563

Bug with incorrect search procedure of farthest nodes in CF-tree (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/551

Bug with crash during clustering with the same points in case of BIRCH (pyclustering.cluster.birch). See: https://github.com/annoviko/pyclustering/issues/561
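
The BIRCH and CF-tree entries above revolve around CF-entries (clustering features). As background, a clustering feature summarizes a group of points by their count, linear sum and square sum, and two features merge by simple addition. The class below is a minimal sketch of that idea with hypothetical names; it does not reproduce the library's cftree classes.

```python
class ClusteringFeature:
    """Summary of a group of points: count, linear sum, sum of squared norms."""

    def __init__(self, n, linear_sum, square_sum):
        self.n = n
        self.linear_sum = linear_sum
        self.square_sum = square_sum

    @classmethod
    def from_point(cls, point):
        """Clustering feature of a single point."""
        return cls(1, list(point), sum(x * x for x in point))

    def __add__(self, other):
        """Merging two groups only requires adding their summaries."""
        merged_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        return ClusteringFeature(self.n + other.n, merged_sum,
                                 self.square_sum + other.square_sum)

    def centroid(self):
        """Mean of the summarized points."""
        return [value / self.n for value in self.linear_sum]
```

For example, `ClusteringFeature.from_point([1.0, 2.0]) + ClusteringFeature.from_point([3.0, 4.0])` summarizes both points without storing either of them.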

0.9.2(Oct 10, 2019)
pyclustering 0.9.2 library is a collection of clustering algorithms and methods, oscillatory networks, etc.

GENERAL CHANGES:

Introduced checking of input arguments for clustering algorithm to provide human-readable errors (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/548

Implemented functionality to perform Anderson-Darling test for Gaussian distribution (ccore.stats). See: https://github.com/annoviko/pyclustering/issues/550

Implemented new clustering algorithm G-Means (pyclustering.cluster.gmeans, ccore.clst.gmeans). See: https://github.com/annoviko/pyclustering/issues/506

Introduced parameter repeat to improve parameters in X-Means algorithm (pyclustering.cluster.xmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/525

Introduced new distance metric: Gower (pyclustering.utils.metric, ccore.utils.metric). See: https://github.com/annoviko/pyclustering/issues/544

Introduced sampling algorithms reservoir_r and reservoir_x (pyclustering.utils.sampling). See: https://github.com/annoviko/pyclustering/issues/542

Introduced parameter data_type to Silhouette method to use distance matrix (pyclustering.cluster.silhouette, ccore.clst.silhouette). See: https://github.com/annoviko/pyclustering/issues/543

Optimization of HHN (Hodgkin-Huxley Neural Network) by parallel processing (ccore.nnet.hhn). See: https://github.com/annoviko/pyclustering/issues/541

Introduced get_total_wce method for xmeans algorithm to find WCE (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/508
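
Of the changes above, the reservoir sampling algorithms are easy to illustrate in a few lines. Below is a sketch of the classic Algorithm R, which draws a uniform sample of fixed size from a stream in a single pass. It is written from the textbook description rather than taken from pyclustering.utils.sampling, and the signature (including `random_seed`) is an assumption.

```python
import random


def reservoir_r(stream, sample_size, random_seed=None):
    """Uniformly sample `sample_size` items from an iterable in one pass."""
    rng = random.Random(random_seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < sample_size:
            # Fill the reservoir with the first items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability sample_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = item
    return reservoir
```

If the stream holds fewer items than requested, all of them are returned.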

CORRECTED MAJOR BUGS:

Incorrect center initialization in K-Means++ when candidates are not farthest (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/549

0.9.1(Sep 4, 2019)
pyclustering 0.9.1 library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

GENERAL CHANGES:

Introduced predict method for X-Means algorithm to find closest clusters for particular points (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/540

Optimization of OPTICS algorithm by reducing complexity (ccore.clst.optics). See: https://github.com/annoviko/pyclustering/issues/521

Optimization of K-Medians algorithm by parallel processing (ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/529

Introduced predict method for K-Medoids algorithm to find closest clusters for particular points (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/527

Introduced predict method for K-Means algorithm to find closest clusters for particular points (pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/issues/515

Parallel optimization of Elbow method (ccore.clst.elbow). See: https://github.com/annoviko/pyclustering/issues/511
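
The predict methods introduced above find the closest cluster for new points. Conceptually, for center-based algorithms this means assigning each point to its nearest center, as in the sketch below. The standalone function and its signature are illustrative assumptions; in the library, predict is a method of the algorithm objects.

```python
def predict(points, centers):
    """Return, for every point, the index of the nearest center."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))
        for point in points
    ]
```

Squared Euclidean distance is enough here because it preserves the ordering of distances.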

0.9.0(Apr 18, 2019)
pyclustering 0.9.0 library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

GENERAL CHANGES:

CCORE (pyclustering core) is supported for MacOS. See: https://github.com/annoviko/pyclustering/issues/486

Introduced parallel Fuzzy C-Means algorithm (pyclustering.cluster.fcm, ccore.clst.fcm). See: https://github.com/annoviko/pyclustering/issues/386

Introduced new 'itermax' parameter for K-Means, K-Medians, K-Medoids algorithms to control maximum amount of iterations (pyclustering.cluster, ccore.clst). See: https://github.com/annoviko/pyclustering/issues/496

Implemented Silhouette and Silhouette K-Search algorithm for CCORE (ccore.clst.silhouette, ccore.clst.silhouette_ksearch). See: https://github.com/annoviko/pyclustering/issues/490

Implemented CLIQUE algorithm (pyclustering.cluster.clique, ccore.clst.clique). See: https://github.com/annoviko/pyclustering/issues/381

Introduced new distance metrics: Canberra and Chi Square (pyclustering.utils.metric, ccore.utils.metric). See: https://github.com/annoviko/pyclustering/issues/482

Optimization of CURE algorithm (C++ implementation) by using heap (multiset) instead of list to store clusters in queue (ccore.clst.cure). See: https://github.com/annoviko/pyclustering/issues/479
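
For reference, the Canberra and Chi Square metrics mentioned above can be sketched in pure Python as follows. Several variants of the Chi Square distance exist (some include a factor of 1/2); the form used here, and the convention of skipping terms with a zero denominator, are assumptions and may differ in detail from the library's functions.

```python
def canberra_distance(a, b):
    """Sum of |x - y| / (|x| + |y|), skipping terms where both values are zero."""
    total = 0.0
    for x, y in zip(a, b):
        denominator = abs(x) + abs(y)
        if denominator != 0:
            total += abs(x - y) / denominator
    return total


def chi_square_distance(a, b):
    """Sum of (x - y)^2 / (|x| + |y|), skipping terms where both values are zero."""
    total = 0.0
    for x, y in zip(a, b):
        denominator = abs(x) + abs(y)
        if denominator != 0:
            total += (x - y) ** 2 / denominator
    return total
```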

CORRECTED MAJOR BUGS:

Bug with crossover mask generation for genetic clustering algorithm (pyclustering.cluster.ga). See: https://github.com/annoviko/pyclustering/pull/474

Bug with hanging of K-Medians algorithm for some cases when algorithm is initialized by wrong amount of centers (ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/498

Bug with incorrect center initialization, when the same point can be placed to result more than once (pyclustering.cluster.center_initializer, ccore.clst.kmeans_plus_plus). See: https://github.com/annoviko/pyclustering/issues/497

Bug with incorrect clustering in case of CURE python implementation when clusters are allocated incorrectly (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/483

Bug with incorrect distance calculation for kmeans++ in case of index representation for centers (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/485

0.8.2-joss(Apr 11, 2019)

pyclustering 0.8.2-joss library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

It is a special release for JOSS (The Journal of Open Source Software). This version contains only cosmetic changes related to documentation and project description that have been introduced after JOSS review.
0.8.2(Nov 19, 2018)
pyclustering 0.8.2 library is a collection of clustering algorithms and methods, oscillatory networks, neural networks, etc.

GENERAL CHANGES:

Implemented Silhouette method and Silhouette KSearcher to find out proper amount of clusters (pyclustering.cluster.silhouette). See: https://github.com/annoviko/pyclustering/issues/416

Introduced new 'return_index' parameter for kmeans_plus_plus and random_center_initializer algorithms (method 'initialize') to initialize initial medoids (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/421

Display warning instead of throwing error if matplotlib or Pillow cannot be imported (MAC OS X problems). See: https://github.com/annoviko/pyclustering/issues/455

Implemented Random Center Initializer for CCORE (ccore.clst.random_center_initializer). See: no reference.

Implemented Elbow method to find out proper amount of clusters in dataset (pyclustering.cluster.elbow, ccore.clst.elbow). See: https://github.com/annoviko/pyclustering/issues/416

Introduced new method 'get_optics_objects' for OPTICS algorithm to obtain detailed information about ordering (pyclustering.cluster.optics, ccore.clst.optics). See: https://github.com/annoviko/pyclustering/issues/464

Added new clustering answers for SAMPLE SIMPLE data collections (pyclustering.samples). See: https://github.com/annoviko/pyclustering/issues/459

Implemented multidimensional cluster visualizer (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/450

Parallel optimization of K-Medoids algorithm (ccore.clst.kmedoids). See: https://github.com/annoviko/pyclustering/issues/447

Parallel optimization of K-Means and X-Means (that uses K-Means) algorithms (ccore.clst.kmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/451

Introduced new threshold parameter 'amount of block points' to BANG algorithm to allocate outliers more precisely (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/446

Optimization of conveying results from C++ to Python for K-Medians and K-Medoids (pyclustering.cluster.kmedoids, pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/445

Implemented cluster generator (pyclustering.cluster.generator). See: https://github.com/annoviko/pyclustering/issues/444

Implemented BANG animator to render animation of clustering process (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/442

Optimization of CURE algorithm by using Euclidean Square distance (pyclustering.cluster.cure, ccore.clst.cure). See: https://github.com/annoviko/pyclustering/issues/439

Supported numpy.ndarray points in KD-tree (pyclustering.container.kdtree). See: https://github.com/annoviko/pyclustering/issues/438
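
The Silhouette method listed above scores how well each point fits its cluster, which helps to choose the number of clusters. The sketch below computes per-point silhouette scores from the standard definition using Euclidean distance. It is a simplified illustration with a hypothetical function name, not the library's implementation.

```python
import math


def silhouette_scores(points, clusters):
    """Per-point silhouette scores; `clusters` holds lists of point indexes."""
    clusters = [cluster for cluster in clusters if cluster]
    if len(clusters) < 2:
        raise ValueError("at least two non-empty clusters are required")

    def distance(i, j):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(points[i], points[j])))

    scores = [0.0] * len(points)
    for own_id, own in enumerate(clusters):
        for i in own:
            if len(own) == 1:
                continue  # a singleton cluster gets score 0 by convention
            # a: mean distance to the other points of the same cluster.
            a = sum(distance(i, j) for j in own if j != i) / (len(own) - 1)
            # b: smallest mean distance to the points of any other cluster.
            b = min(
                sum(distance(i, j) for j in other) / len(other)
                for other_id, other in enumerate(clusters)
                if other_id != own_id
            )
            denominator = max(a, b)
            scores[i] = (b - a) / denominator if denominator > 0 else 0.0
    return scores
```

Scores close to 1 indicate compact, well-separated clusters; averaging them over candidate cluster counts is the idea behind a K-search.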

CORRECTED MAJOR BUGS:

Bug with clustering failure in case of non-numpy user defined metric for K-Means algorithm (pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/issues/471

Bug with animation of correlation matrix in case of new versions of matplotlib (pyclustering.nnet.sync). See: no reference.

Bug with SOM and pickle when it was not possible to store and load network using pickle (pyclustering.nnet.som). See: https://github.com/annoviko/pyclustering/issues/456

Bug with DBSCAN when points are marked as a noise (pyclustering.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/462

Bug with randomly enabled connection weights in case of SyncNet based algorithms using CCORE interface (pyclustering.nnet.syncnet). See: https://github.com/annoviko/pyclustering/issues/452

Bug with calculation weighted connection for Sync based clustering algorithms in C++ implementation (ccore.nnet.syncnet). See: no reference

Bug with failure in case of numpy.ndarray data type in python part of CURE algorithm (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/438

Bug with BANG algorithm with empty dimensions - when data contains column with the same values (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/449

0.8.1(May 29, 2018)
pyclustering 0.8.1 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

GENERAL CHANGES:

Implemented feature to use specific metric for distance calculation in K-Means algorithm (pyclustering.cluster.kmeans, ccore.clst.kmeans). See: https://github.com/annoviko/pyclustering/issues/434

Implemented BANG-clustering algorithm with result visualizer (pyclustering.cluster.bang). See: https://github.com/annoviko/pyclustering/issues/424

Implemented feature to use specific metric for distance calculation in K-Medians algorithm (pyclustering.cluster.kmedians, ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/429

Supported new type of input data for K-Medoids - distance matrix (pyclustering.cluster.kmedoids, ccore.clst.kmedoids). See: https://github.com/annoviko/pyclustering/issues/418

Implemented TTSAS algorithm (pyclustering.cluster.ttsas, ccore.clst.ttsas). See: https://github.com/annoviko/pyclustering/issues/398

Implemented MBSAS algorithm (pyclustering.cluster.mbsas, ccore.clst.mbsas). See: https://github.com/annoviko/pyclustering/issues/398

Implemented BSAS algorithm (pyclustering.cluster.bsas, ccore.clst.bsas). See: https://github.com/annoviko/pyclustering/issues/398

Implemented feature to use specific metric for distance calculation in K-Medoids algorithm (pyclustering.cluster.kmedoids, ccore.clst.kmedoids). See: https://github.com/annoviko/pyclustering/issues/417

Implemented distance metric collection (pyclustering.utils.metric, ccore.utils.metric). See: no reference.

Supported new type of input data for OPTICS - distance matrix (pyclustering.cluster.optics, ccore.clst.optics). See: https://github.com/annoviko/pyclustering/issues/412

Supported new type of input data for DBSCAN - distance matrix (pyclustering.cluster.dbscan, ccore.clst.dbscan). See: no reference.

Implemented K-Means observer and visualizer to visualize and animate clustering results (pyclustering.cluster.kmeans, ccore.clst.kmeans). See: no reference.
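
Among the algorithms added above, BSAS (Basic Sequential Algorithmic Scheme) is compact enough to sketch. Each point either joins the nearest existing cluster or, if it is farther than a threshold and the cluster limit is not reached, starts a new one. The code follows the textbook scheme with Euclidean distance and mean representatives; names and signature are hypothetical and the library's version may differ.

```python
import math


def bsas(points, max_clusters, threshold):
    """Cluster points in one pass; returns (clusters of indexes, representatives)."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    clusters, representatives = [], []
    for index, point in enumerate(points):
        nearest, nearest_distance = None, float("inf")
        for cluster_id, representative in enumerate(representatives):
            current = distance(point, representative)
            if current < nearest_distance:
                nearest, nearest_distance = cluster_id, current

        if nearest is None or (nearest_distance > threshold
                               and len(clusters) < max_clusters):
            # Too far from every cluster (or no cluster yet): open a new one.
            clusters.append([index])
            representatives.append(list(point))
        else:
            # Join the nearest cluster and update its mean incrementally.
            clusters[nearest].append(index)
            size = len(clusters[nearest])
            representatives[nearest] = [
                (old * (size - 1) + new) / size
                for old, new in zip(representatives[nearest], point)
            ]
    return clusters, representatives
```

The result depends on the order in which points are presented, which is a known property of sequential schemes.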

CORRECTED MAJOR BUGS:

Bug with out of range in K-Medians (pyclustering.cluster.kmedians, ccore.clst.kmedians). See: https://github.com/annoviko/pyclustering/issues/428

Bug with fast linking in PCNN (python implementation only) that wasn't used despite the corresponding option (pyclustering.nnet.pcnn). See: https://github.com/annoviko/pyclustering/issues/419

0.8.0(Feb 23, 2018)
pyclustering 0.8.0 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

GENERAL CHANGES:

Optimization K-Means++ algorithm using numpy (pyclustering.cluster.center_initializer). See: no reference.

Implemented K-Means++ initializer for CCORE (ccore.clst.kmeans_plus_plus). See: https://github.com/annoviko/pyclustering/issues/382

Optimization of X-Means clustering process by using KMeans++ for initial centers of split regions (pyclustering.cluster.xmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/382

Implemented parallel Sync-family algorithms for C/C++ implementation (CCORE) only (ccore.sync). See: https://github.com/annoviko/pyclustering/issues/170

C/C++ implementation is used by default to increase performance. See: https://github.com/annoviko/pyclustering/issues/393

Ignore 'ccore' flag to use C/C++ if platform is not supported (pyclustering.core). See: https://github.com/annoviko/pyclustering/issues/393

Optimization of python implementation of the K-Means algorithm using numpy (pyclustering.cluster.kmeans). See: https://github.com/annoviko/pyclustering/issues/403

Implemented dynamic visualizer for oscillatory networks (pyclustering.nnet.dynamic_visualizer). See: no reference.

Implemented C/C++ Hodgkin-Huxley oscillatory network for image segmentation in CCORE to increase performance (ccore.hhn, pyclustering.nnet.hhn). See: https://github.com/annoviko/pyclustering/issues/217

Performance optimization for CCORE on linux platform. See: no reference.

32-bit platform of CCORE is supported for Linux OS. See: https://github.com/annoviko/pyclustering/issues/253

32-bit platform of CCORE is supported for Windows OS. See: https://github.com/annoviko/pyclustering/issues/253

Implemented method 'get_probabilities()' for obtaining membership probabilities in EM-algorithm (pyclustering.cluster.ema). See: https://github.com/annoviko/pyclustering/issues/387

Python implementation of CURE algorithm method 'get_clusters()' returns list of indexes (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/384

Implemented parallel processing for X-Means algorithm (ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/372

Implemented thread pool for parallel processing (ccore.parallel). See: https://github.com/annoviko/pyclustering/issues/383

Optimization of OPTICS algorithm using KD-tree for searching nearest neighbors (pyclustering.cluster.optics, ccore.optics). See: https://github.com/annoviko/pyclustering/issues/370

Optimization of DBSCAN algorithm using KD-tree for searching nearest neighbors (pyclustering.cluster.dbscan, ccore.dbscan). See: https://github.com/annoviko/pyclustering/issues/369

CORRECTED MAJOR BUGS:

Incorrect type of medoid's index in K-Medoids algorithm in case of Python 2.x (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/415

Hanging of method 'find_node' in KD-tree if it does not contain node with specified point and payload (pyclustering.container.kdtree). See: no reference.

Incorrect clustering by CURE algorithm in some cases when data have a lot of identical points (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/414

Segmentation fault in CURE algorithm in some cases when data have a lot of identical points (ccore.clst.cure). See: no reference.

Incorrect segmentation by Python version of syncsegm - oscillatory network based on sync for image segmentation (pyclustering.nnet.syncsegm). See: https://github.com/annoviko/pyclustering/issues/409

Zero value of sigma under logarithm function in Python version of pyclustering X-Means algorithm (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/407

Amplitude threshold is ignored during synchronous ensembles allocation for amplitude output dynamic 'allocate_sync_ensembles' - affects HHN, LEGION (pyclustering.utils). See: no reference.

Wrong indexes can be returned during synchronous ensembles allocation for amplitude output dynamic 'allocate_sync_ensembles' - affects HHN, LEGION (pyclustering.utils). See: no reference.

Amount of allocated clusters can differ from amount of centers in X-Means algorithm (ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/389

Amount of allocated clusters can be bigger than kmax in X-Means algorithm (pyclustering.cluster.xmeans, ccore.clst.xmeans). See: https://github.com/annoviko/pyclustering/issues/388

Corrected bug with returned nullptr in method 'kdtree_searcher::find_nearest_node()' (ccore.container.kdtree). See: no reference.

0.7.2(Oct 23, 2017)
pyclustering 0.7.2 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

GENERAL CHANGES (pyclustering):

Correction for setup failure with PKG-INFO.rst.

0.7.1(Oct 19, 2017)
pyclustering 0.7.1 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

GENERAL CHANGES (pyclustering):

Metadata of the package is updated.

0.7.0(Oct 16, 2017)
pyclustering 0.7.0 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

GENERAL CHANGES (pyclustering):

Implemented Expectation-Maximization clustering algorithm for Gaussian Mixture Model and clustering visualizer for this particular algorithm (pyclustering.cluster.ema). See: https://github.com/annoviko/pyclustering/issues/16

Implemented Genetic Clustering Algorithm (GCA) and clustering visualizer for this particular algorithm (pyclustering.cluster.ga). See: https://github.com/annoviko/pyclustering/issues/360

Implemented feature to obtain and visualize evolution of order parameter and local order parameter for Sync network and Sync-based algorithms (pyclustering.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/355

Implemented K-Means++ method for initialization of initial centers for algorithms like K-Means or X-Means (pyclustering.cluster.center_initializer). See: https://github.com/annoviko/pyclustering/issues/354

Implemented fSync oscillatory network that is based on Landau-Stuart equation and Kuramoto model (pyclustering.nnet.fsync). See: https://github.com/annoviko/pyclustering/issues/168

Optimization of the pyclustering client of the core library 'CCORE' (pyclustering.core). See: https://github.com/annoviko/pyclustering/issues/289 See: https://github.com/annoviko/pyclustering/issues/351

Implemented feature to show network structure of Sync family oscillatory networks when 'ccore' is used. See: https://github.com/annoviko/pyclustering/issues/344

Implemented feature to colorize OPTICS ordering diagram when amount of clusters is specified. See: no reference.

Improved clustering results when the MNDL splitting criterion is used for small datasets. See: https://github.com/annoviko/pyclustering/issues/328

Feature to display connectivity radius on cluster-ordering diagram by ordering_visualizer (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/314

Feature to use CCORE implementation of OPTICS algorithm to take advantage of its performance (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/120

Implemented feature to show animation of the pattern recognition process performed by the SyncPR oscillatory network: method 'animate_pattern_recognition()' of class 'syncpr_visualizer' (pyclustering.nnet.syncpr). See: https://www.youtube.com/watch?v=Ro7KbApL4MQ See: https://www.youtube.com/watch?v=iIusOsGehoY

Implemented feature to obtain nodes of specified level of CF-tree. Method 'get_level_nodes()' of class 'cftree' (pyclustering.container.cftree). See: no reference.

Implemented feature to allocate/display/animate phase matrix: 'allocate_phase_matrix()', 'show_phase_matrix()', 'animate_phase_matrix()' (pyclustering.nnet.sync). See: no reference.

Implemented chaotic neural network where clustering phenomenon can be observed: 'cnn_network', 'cnn_dynamic', 'cnn_visualizer' (pyclustering.nnet.cnn). See: https://github.com/annoviko/pyclustering/issues/301

Implemented feature to analyse ordering diagram using the amount of clusters that should be allocated as an input parameter to calculate the correct connectivity radius for clustering (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/307

Implemented feature to omit usage of initial centers - X-Means starts processing from random initial center (pyclustering.cluster.xmeans). See: no reference.

Implemented feature for cluster visualizer: cluster attributes (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/295

Implemented SOM-SC algorithm (SOM Simple Clustering) (pyclustering.cluster.somsc). See: https://github.com/annoviko/pyclustering/issues/321
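
The order parameter mentioned above for Sync networks measures how synchronized a set of phase oscillators is. The sketch below computes the standard Kuramoto order parameter, the magnitude of the mean of exp(i * phase) over all oscillators. The exact formula used inside pyclustering.nnet.sync may differ; the function name is hypothetical.

```python
import cmath


def order_parameter(phases):
    """Kuramoto order parameter: 1.0 for full synchrony, near 0 for incoherence."""
    if not phases:
        raise ValueError("at least one phase is required")
    # Average the unit vectors on the complex plane and take the magnitude.
    mean_vector = sum(cmath.exp(1j * phase) for phase in phases) / len(phases)
    return abs(mean_vector)
```

Tracking this value over simulation steps gives the kind of evolution curve that the entry above refers to.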

GENERAL CHANGES (ccore):

Implemented feature to obtain and visualize evolution of order parameter and local order parameter for Sync network and Sync-based algorithms (ccore.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/355

Cygwin x64 platform is supported (ccore). See: https://github.com/annoviko/pyclustering/issues/353

Optimization of CCORE library interface (ccore.interface). See: https://github.com/annoviko/pyclustering/issues/289

Implemented MNDL splitting criterion for X-Means algorithm (ccore.cluster_analysis.xmeans). See: https://github.com/annoviko/pyclustering/issues/159

Implemented OPTICS algorithm and interface for client that returns all clustering results (ccore.cluster_analysis.optics). See: https://github.com/annoviko/pyclustering/issues/120

Implemented packing of connectivity matrix of Sync family oscillatory networks (ccore.interface.sync_interface). See: https://github.com/annoviko/pyclustering/issues/344

CORRECTED MAJOR BUGS:

Bug with segmentation fault during 'free()' on some linux operating systems. See: no reference.

Bug with sending the first element to cluster in OPTICS even if it is noise element. See: no reference.

Bug with amount of allocated clusters by K-Medoids algorithm in Python implementation and CCORE (pyclustering.cluster.kmedoids, ccore.cluster.medoids). See: https://github.com/annoviko/pyclustering/issues/366 See: https://github.com/annoviko/pyclustering/issues/367

Bug with getting neighbors and getting information about connections in Sync-based network and algorithms when CCORE is used. See: no reference.

Bug with calculation of number of oscillations for output dynamics. See: no reference.

Memory leak in LEGION in case of CCORE usage - API function 'legion_destroy()' was not called (pyclustering.nnet.legion). See: no reference.

Bug with crash of antmeans algorithm for python version 3.6.0:414df79263a11, Dec 23 2016 [MSC v.1900 64 bit (AMD64)] (pyclustering.cluster.antmeans). See: https://github.com/annoviko/pyclustering/issues/350

Memory leak in destructor of 'pyclustering_package' - exchange mechanism between ccore and pyclustering (ccore.interface.pyclustering_package). See: https://github.com/annoviko/pyclustering/issues/347

Bug with losing the initial state of hSync output dynamic in case of CCORE usage (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/346

Bug with hSync output dynamic that was displayed with discontinuous parts as a set of rectangles (pyclustering.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/345

Bug with visualization of CNN network in case of 3D data (pyclustering.nnet.cnn). See: https://github.com/annoviko/pyclustering/issues/338

Bug with CCORE wrapper crashing after returning value from CCORE (pyclustering.core). See: https://github.com/annoviko/pyclustering/issues/337

Bug with calculation BIC splitting criterion for X-Means algorithm (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/326

Bug with calculation MNDL splitting criterion for X-Means algorithm (pyclustering.cluster.xmeans). See: https://github.com/annoviko/pyclustering/issues/328

Bug with loss of CF-nodes in CF-tree during inserting that leads to an unbalanced CF-tree (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/304

Bug with time stamps for each iteration in hsyncnet algorithm (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/306

Bug with memory occupation by CCORE DBSCAN implementation due to adjacency matrix usage (ccore.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/309

Bug with CURE: always finds max two representative points (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/310

Bug with infinite loop in case of incorrect number of clusters 'ordering_analyser' (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/317

Bug with incorrect connectivity radius for allocation specified amount of clusters 'ordering_analyser' (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/316

Bug where clusters are allocated in the homogeneous ordering 'ordering_analyser' (pyclustering.cluster.optics). See: https://github.com/annoviko/pyclustering/issues/315
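
The DBSCAN entry above attributes the memory problem to adjacency matrix usage. A precomputed matrix for n points needs n*n entries, while computing neighbors on demand needs only one label per point. The following is a minimal pure-Python sketch of the on-demand approach, not the CCORE implementation; the names `region_query` and `dbscan_on_demand` are hypothetical, and counting a point as its own neighbor is an assumption made for this sketch.

```python
import math

NOISE = -1


def euclidean(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def region_query(data, index, eps):
    """Indices of points within eps of data[index], computed on demand.

    Assumption of this sketch: the point itself is included.
    """
    return [j for j, point in enumerate(data)
            if euclidean(data[index], point) <= eps]


def dbscan_on_demand(data, eps, min_neighbors):
    """Label each point with a cluster id, or NOISE, without an n*n matrix."""
    labels = [None] * len(data)          # None means "not visited yet"
    cluster_id = 0
    for i in range(len(data)):
        if labels[i] is not None:
            continue
        neighbors = region_query(data, i, eps)
        if len(neighbors) < min_neighbors:
            labels[i] = NOISE            # may later become a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id   # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(data, j, eps)
            if len(j_neighbors) >= min_neighbors:
                queue.extend(j_neighbors)  # j is a core point: expand
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan_on_demand(points, eps=1.5, min_neighbors=3))
```

The trade-off is that each region query is recomputed instead of looked up, so memory stays linear in the number of points at the cost of repeated distance calculations.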

0.6.6(Oct 7, 2016)
The pyclustering 0.6.6 library is a collection of clustering algorithms, oscillatory networks, neural networks, etc.

GENERAL CHANGES (pyclustering):

Implemented phase oscillatory network syncpr (pyclustering.nnet.syncpr). See: https://github.com/annoviko/pyclustering/issues/208

Feature for pyclustering.nnet.syncpr that allows the ccore library to be used for solving. See: https://github.com/annoviko/pyclustering/issues/232

Optimized simulation algorithm for sync oscillatory network (pyclustering.nnet.sync) when collecting results is not requested. See: https://github.com/annoviko/pyclustering/issues/233

Images of the English alphabet, 100x100. See: https://github.com/annoviko/pyclustering/commit/aa28f1a8a363fbeb5f074d22ec1e8258a1dd0579

Implemented feature to use rectangular network structures in oscillatory networks. See: https://github.com/annoviko/pyclustering/issues/259

Implemented CLARANS algorithm (pyclustering.cluster.clarans). See: https://github.com/annoviko/pyclustering/issues/52

Implemented feature to analyse and visualize results of hysteresis oscillatory network (pyclustering.nnet.hysteresis). See: https://github.com/annoviko/pyclustering/issues/75

Implemented feature to analyse and visualize results of graph coloring algorithm based on hysteresis oscillatory network (pyclustering.gcolor.hysteresis). See: https://github.com/annoviko/pyclustering/issues/75

Implemented ant colony based algorithm for TSP problem (pyclustering.tsp.antcolony). See: https://github.com/annoviko/pyclustering/pull/277

Implemented feature to use CCORE K-Medians algorithm using argument 'ccore' to ensure high performance (pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/231

Implemented feature to place several plots on each row using parameter 'maximum number of rows' for cluster visualizer (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/274

Implemented feature to specify initial number of neighbors to calculate initial connectivity radius and increase percent of number of neighbors (or radius if total number of objects is exceeded) on each step (pyclustering.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/284

Implemented double-layer oscillatory network based on modified Kuramoto model for image segmentation (pyclustering.nnet.syncsegm). See: no reference

Added new examples and demos. See: no reference

Implemented feature to use CCORE K-Medoids algorithm using argument 'ccore' to ensure high performance (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/230

Implemented feature for CURE algorithm that provides additional information about clustering results: representative points and mean point of each cluster (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/292

Implemented feature to animate analysed output dynamic of Sync family oscillatory networks (sync_visualizer, syncnet_visualizer): correlation matrix, phase coordinates, cluster allocation (pyclustering.nnet.sync, pyclustering.cluster.syncnet). See: https://www.youtube.com/watch?v=5S5mFYVihso See: https://www.youtube.com/watch?v=Vd-ww9PcZvI See: https://www.youtube.com/watch?v=QYPqWoyNHO8 See: https://www.youtube.com/watch?v=RA0MiC2WlbY

Improved SYNC-SOM algorithm: clustering accuracy and calculation are improved in line with the proof of concept, where connections between oscillators in the second layer (represented by the self-organized feature map) should be created using the classical radius as in SyncNet, but indirectly: if objects that correspond to two different neurons can be connected, then the neurons should also be connected with each other (pyclustering.cluster.syncsom). See: https://github.com/annoviko/pyclustering/issues/297
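
The indirect connection rule from the SYNC-SOM entry above can be sketched in a few lines. This is an illustrative reading of the changelog text, not the library's code: the function name `connect_neurons` and the `captured_objects` layout are hypothetical, and "objects can be connected" is assumed to mean that at least one pair of objects, one from each neuron, lies within the connectivity radius.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def connect_neurons(captured_objects, radius):
    """Return the set of neuron index pairs (i, j), i < j, that get connected.

    captured_objects[i] is the list of data points captured by neuron i.
    Two neurons are connected when some object of one neuron is within
    `radius` of some object of the other (assumption of this sketch).
    """
    connections = set()
    for i in range(len(captured_objects)):
        for j in range(i + 1, len(captured_objects)):
            if any(euclidean(p, q) <= radius
                   for p in captured_objects[i]
                   for q in captured_objects[j]):
                connections.add((i, j))
    return connections


# Neurons 0 and 1 capture nearby objects; neuron 2 captures a distant one.
captured = [[(0.0, 0.0), (1.0, 0.0)], [(1.5, 0.0), (3.0, 0.0)], [(10.0, 10.0)]]
print(connect_neurons(captured, radius=1.0))
```

Under this reading the radius is applied to the input objects rather than to the neuron weights, which is what makes the neuron-to-neuron connection indirect.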

GENERAL CHANGES (ccore):

Implemented phase oscillatory network for pattern recognition syncpr (ccore.cluster.syncpr). See: https://github.com/annoviko/pyclustering/issues/232

Implemented agglomerative algorithm for cluster analysis (ccore.cluster.agglomerative). See: https://github.com/annoviko/pyclustering/issues/212

Implemented feature to use rectangular network structures in oscillatory networks. See: https://github.com/annoviko/pyclustering/issues/259

Implemented ant colony based algorithm for TSP problem (ccore.tsp.antcolony). See: https://github.com/annoviko/pyclustering/pull/277

Implemented K-Medians algorithm for cluster analysis (ccore.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/231

Implemented feature to specify initial number of neighbors to calculate initial connectivity radius and increase percent of number of neighbors (or radius if total number of objects is exceeded) on each step (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/284

Implemented K-Medoids algorithm for cluster analysis (ccore.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/230

Implemented feature for CURE algorithm that provides additional information about clustering results: representative points and mean point of each cluster (ccore.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/293

Implemented a new collection of classes for constructing oscillatory and neural networks. See: https://github.com/annoviko/pyclustering/issues/264

Memory usage optimization for ROCK algorithm. See: no reference

CORRECTED MAJOR BUGS:

Bug with callback methods in ccore library in syncnet (ccore.cluster.syncnet) and hsyncnet (ccore.cluster.hsyncnet) that may lead to loss of accuracy.

Bug with division by zero in kmeans algorithm (ccore.kmeans, pyclustering.cluster.kmeans) when cluster after center updating is not able to capture object. See: https://github.com/annoviko/pyclustering/issues/238

Bug with stack overflow in KD tree in case of big data (pyclustering.container.kdtree, ccore.container.kdtree). See: https://github.com/annoviko/pyclustering/pull/239 See: https://github.com/annoviko/pyclustering/issues/255 See: https://github.com/annoviko/pyclustering/issues/254

Bug with incorrect clustering in case of the same elements in cure algorithm (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/pull/239

Bug with execution fail in case of wrong number of initial medians and in case of the same objects with several initial medians (pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/256

Bug with calculation of synchronous ensembles near zero: oscillators with phases 2*pi and 0 are considered different (pyclustering.nnet.sync, ccore.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/263

Bug with cluster allocation in kmedoids algorithm in case of the same objects with several initial medoids (pyclustering.cluster.kmedoids). See: https://github.com/annoviko/pyclustering/issues/269

Bug with visualization of clusters in 3D (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/273

Bug with obtaining nearest entry for absorbing during inserting node (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/282

Bug with SOM method show_network() when CCORE is used (pyclustering.nnet.som). See: https://github.com/annoviko/pyclustering/issues/283

Bug with cluster allocation in case of switched off dynamic collecting (pyclustering.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/285

Bug with execution fail during clustering data with rough values of initial medians (pyclustering.cluster.kmedians). See: https://github.com/annoviko/pyclustering/issues/286

Bug with memory leakage on interface between CCORE and pyclustering (ccore). See: no reference

Bug with allocation of correlation matrix when CCORE is used (pyclustering.nnet.sync). See: https://github.com/annoviko/pyclustering/issues/288

Bug with memory leakage in CURE algorithm - deallocation of representative points (ccore.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/294

Bug with cluster visualization in case of 1D input data (pyclustering.cluster). See: https://github.com/annoviko/pyclustering/issues/296

Bug with loss of CF-nodes in CF-tree during inserting that leads to an unbalanced CF-tree (pyclustering.container.cftree). See: https://github.com/annoviko/pyclustering/issues/304

Bug with time stamps for each iteration in hsyncnet algorithm (ccore.cluster.hsyncnet). See: https://github.com/annoviko/pyclustering/issues/306

Bug with memory occupation by CCORE DBSCAN implementation due to adjacency matrix usage (ccore.cluster.dbscan). See: https://github.com/annoviko/pyclustering/issues/309

Bug with CURE: always finds max two representative points (pyclustering.cluster.cure). See: https://github.com/annoviko/pyclustering/issues/310
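
The synchronous-ensemble entry above (phases 2*pi and 0 treated as different) is a wraparound problem: phases live on a circle, so a plain absolute difference overstates the distance near zero. The following is a minimal sketch of a wraparound-aware comparison, not the library's implementation; the names `phase_distance` and `allocate_sync_ensembles`, the default tolerance, and the greedy grouping against the first member of each ensemble are assumptions made for illustration.

```python
import math

TWO_PI = 2.0 * math.pi


def phase_distance(a, b):
    """Smallest absolute difference between two phases on the circle."""
    diff = abs(a - b) % TWO_PI
    return min(diff, TWO_PI - diff)


def allocate_sync_ensembles(phases, tolerance=0.01):
    """Group oscillator indices whose phases agree within tolerance.

    Each oscillator joins the first ensemble whose first member is close
    enough on the circle, otherwise it starts a new ensemble.
    """
    ensembles = []
    for index, phase in enumerate(phases):
        for ensemble in ensembles:
            if phase_distance(phase, phases[ensemble[0]]) <= tolerance:
                ensemble.append(index)
                break
        else:
            ensembles.append([index])
    return ensembles


# Oscillators 0 and 1 sit on either side of the 0 / 2*pi boundary.
phases = [0.001, TWO_PI - 0.001, 3.14, 3.141]
print(allocate_sync_ensembles(phases))  # [[0, 1], [2, 3]]
```

With a plain `abs(a - b)` comparison, the first two oscillators would differ by almost 2*pi and land in separate ensembles.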




Owner

Andrei Novikov
