An intuitive library to add plotting functionality to scikit-learn objects.

Overview

Welcome to Scikit-plot

Single line functions for detailed visualizations

The quickest and easiest way to go from analysis...

[Image: roc_curves]

...to this.

Scikit-plot is the result of an unartistic data scientist's dreadful realization that visualization is one of the most crucial components in the data science process, not just a mere afterthought.

Gaining insights is simply a lot easier when you're looking at a colored heatmap of a confusion matrix complete with class labels rather than a single-line dump of numbers enclosed in brackets. Besides, if you ever need to present your results to someone (virtually any time anybody hires you to do data science), you show them visualizations, not a bunch of numbers in Excel.

That said, there are a number of visualizations that frequently pop up in machine learning. Scikit-plot is a humble attempt to provide aesthetically-challenged programmers (such as myself) the opportunity to generate quick and beautiful graphs and plots with as little boilerplate as possible.

Okay then, prove it. Show us an example.

Say we use Naive Bayes in multi-class classification and decide we want to visualize the results of a common classification metric, the Area under the Receiver Operating Characteristic curve. Since the ROC is only valid in binary classification, we want to show the respective ROC of each class if it were the positive class. As an added bonus, let's show the micro-averaged and macro-averaged curve in the plot as well.

Let's use scikit-plot with the sample digits dataset from scikit-learn.

# The usual train-test split mumbo-jumbo
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
nb = GaussianNB()
nb.fit(X_train, y_train)
predicted_probas = nb.predict_proba(X_test)

# The magic happens here
import matplotlib.pyplot as plt
import scikitplot as skplt
skplt.metrics.plot_roc(y_test, predicted_probas)
plt.show()

[Image: roc_curves]

Pretty.

And... That's it. Captured in that small example is the entire philosophy of Scikit-plot: single line functions for detailed visualization. You simply browse the plots available in the documentation, and call the function with the necessary arguments. Scikit-plot tries to stay out of your way as much as possible. No unnecessary bells and whistles. And when you do need the bells and whistles, each function offers a myriad of parameters for customizing various elements in your plots.

Finally, compare this with the non-Scikit-plot way of plotting the multi-class ROC curve. Which one would you rather do?
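For a sense of what the manual route involves, here is a rough sketch of the per-class (one-vs-rest) ROC computation using only NumPy. It is not the code the original README pointed to. `roc_points` and `one_vs_rest_roc` are hypothetical helpers, and the sketch assumes that column i of the probability array matches the i-th label in sorted order and that every class has both positive and negative samples. Micro- and macro-averaging and the Matplotlib calls are left out.

```python
import numpy as np


def roc_points(y_binary, scores):
    """Return (fpr, tpr) for one binary problem, sweeping thresholds high to low."""
    order = np.argsort(-scores, kind="mergesort")
    y_sorted = y_binary[order]
    scores_sorted = scores[order]
    # Keep only the last index of each run of tied scores, so tied
    # samples cross the threshold together.
    distinct = np.r_[np.where(np.diff(scores_sorted))[0], y_sorted.size - 1]
    tps = np.cumsum(y_sorted)[distinct]        # true positives per threshold
    fps = np.cumsum(1 - y_sorted)[distinct]    # false positives per threshold
    # Assumes at least one positive and one negative sample.
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr


def one_vs_rest_roc(y_true, probas):
    """Map each class label to (fpr, tpr, auc), treating it as the positive class."""
    classes = np.unique(y_true)  # sorted; column i is assumed to match classes[i]
    curves = {}
    for i, label in enumerate(classes):
        y_binary = (y_true == label).astype(int)
        fpr, tpr = roc_points(y_binary, probas[:, i])
        # Trapezoidal area under the curve.
        auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
        curves[label] = (fpr, tpr, auc)
    return curves


# Tiny made-up example: three classes, six samples.
y_true = np.array([0, 0, 1, 1, 2, 2])
probas = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
])
curves = one_vs_rest_roc(y_true, probas)
```

Each `(fpr, tpr)` pair would still need its own plot call, legend entry, and axis labels before it looks like the figure above.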

Maximum flexibility. Compatibility with non-scikit-learn objects.

Although Scikit-plot is loosely based around the scikit-learn interface, you don't actually need Scikit-learn objects to use the available functions. As long as you provide the functions what they're asking for, they'll happily draw the plots for you.

Here's a quick example to generate the precision-recall curves of a Keras classifier on a sample dataset.

# Import what's needed for the Functions API
import matplotlib.pyplot as plt
import scikitplot as skplt

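# This is a Keras classifier (keras_clf is assumed to be built and compiled elsewhere).
# We'll generate probabilities on the test set.
# Note: Keras 2 renamed `nb_epoch` to `epochs`.
keras_clf.fit(X_train, y_train, batch_size=64, epochs=10, verbose=2)
# Recent Keras versions no longer provide predict_proba; use predict() there instead.
probas = keras_clf.predict_proba(X_test, batch_size=64)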

# Now plot.
skplt.metrics.plot_precision_recall(y_test, probas)
plt.show()

[Image: p_r_curves]

You can see clearly here that skplt.metrics.plot_precision_recall needs only the ground truth y-values and the predicted probabilities to generate the plot. (As of v0.3.5 it replaces the deprecated plot_precision_recall_curve.) This lets you use anything you want as the classifier, from Keras NNs to NLTK Naive Bayes to that groundbreaking classifier algorithm you just wrote.

The possibilities are endless.
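One practical detail when feeding in output from a non-scikit-learn model: judging from the traceback reported in the Comments section, where the plotting function indexes `probas[:, i]`, the probabilities are expected as a 2-D array with one column per class. The sketch below shows one way to coerce a 1-D vector of positive-class probabilities into that shape. `as_proba_matrix` is a hypothetical helper, not part of Scikit-plot, and it assumes the 1-D input holds the probability of the positive class in a binary problem.

```python
import numpy as np


def as_proba_matrix(scores):
    """Coerce classifier output into an (n_samples, n_classes) array.

    Hypothetical helper, not part of Scikit-plot.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim == 1:
        # Assumed to be P(positive class) for a binary problem:
        # column 0 is the negative class, column 1 the positive class.
        return np.column_stack([1.0 - scores, scores])
    if scores.ndim != 2:
        raise ValueError("expected a 1-D or 2-D array of scores")
    return scores


# A binary model that only returns the positive-class probability.
p_positive = [0.2, 0.9, 0.5]
probas_2d = as_proba_matrix(p_positive)
```

The column order here (negative first, positive second) matches the sorted label order for labels such as 0/1 or -1/1.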

Installation

Installation is simple! First, make sure you have the dependencies Scikit-learn and Matplotlib installed.

Then just run:

pip install scikit-plot

Or if you want the latest development version, clone this repo and run

python setup.py install

at the root folder.

If using conda, you can install Scikit-plot by running:

conda install -c conda-forge scikit-plot

Documentation and Examples

Explore the full features of Scikit-plot.

You can find detailed documentation here.

Examples are found in the examples folder of this repo.

Contributing to Scikit-plot

Reporting a bug? Suggesting a feature? Want to add your own plot to the library? Visit our contributor guidelines.

Citing Scikit-plot

Are you using Scikit-plot in an academic paper? You should be! Reviewers love eye candy.

If so, please consider citing Scikit-plot using its DOI.

APA

Reiichiro Nakano. (2018). reiinakano/scikit-plot: 0.3.7 [Data set]. Zenodo. http://doi.org/10.5281/zenodo.293191

IEEE

[1] Reiichiro Nakano, “reiinakano/scikit-plot: 0.3.7”. Zenodo, 19-Feb-2017.

ACM

[1] Reiichiro Nakano 2018. reiinakano/scikit-plot: 0.3.7. Zenodo.

Happy plotting!

Comments
  • Improve handling of unbalanced confusion matrices

    Here I have made a few changes that make it easier to plot confusion matrices where the true and predicted sets of labels are not the same. This is a case that can occur when doing something like applying "new" categories to a dataset with an older set of categories.

    The changes included are the following:

    Fix an issue with nan values showing up when unbalanced confusion matrices are normalized: rows with only zero entries would sum to zero, causing a division by zero when normalizing each cell.

    Add options to limit the labels displayed on the true and predicted axes, as with unbalanced confusion matrices some of the labels can be only in the set of true labels or only in the set of predicted labels.

    You can see the effect of the new options here:

    import numpy as np
    import matplotlib.pyplot as plt
    import scikitplot as sciplt
    
    y_true = np.array(["A", "A", "B", "B", "B", "C", "D"])
    y_pred = np.array(["A", "A", "Ba", "Bb", "Ba", "C", "D"])
    
    print(y_true.shape)
    print(y_pred.shape)
    
    true_labels = np.unique(y_true)
    pred_labels = np.unique(y_pred)
    
    labels = np.sort(np.unique(np.concatenate([true_labels, pred_labels])))
    
    true_label_indexes = np.where(np.isin(labels, true_labels))
    pred_label_indexes = np.where(np.isin(labels, pred_labels))
    
    sciplt.plotters.plot_confusion_matrix(
        y_true,
        y_pred,
        hide_zeros=True,
        normalize=True,
        true_label_indexes=true_label_indexes,
        pred_label_indexes=pred_label_indexes,
        labels=labels,
    )
    plt.show()
    

    [Image: figure_1]

    opened by ExcaliburZero 12
  • 0.2.3 to 0.2.6 update failed

    I've just tried to upgrade the package, but it gave the following error:

    Collecting scikit-plot
      Using cached scikit-plot-0.2.6.tar.gz
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-build-7wut1485/scikit-plot/setup.py", line 9, in <module>
            import scikitplot
          File "/tmp/pip-build-7wut1485/scikit-plot/scikitplot/__init__.py", line 5, in <module>
            from scikitplot.classifiers import classifier_factory
          File "/tmp/pip-build-7wut1485/scikit-plot/scikitplot/classifiers.py", line 5, in <module>
            import matplotlib.pyplot as plt
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/pyplot.py", line 115, in <module>
            _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/backends/__init__.py", line 32, in pylab_setup
            globals(),locals(),[backend_name],0)
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/backends/backend_tkagg.py", line 6, in <module>
            from six.moves import tkinter as Tk
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 92, in __get__
            result = self._resolve()
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 115, in _resolve
            return _import_module(self.mod)
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 82, in _import_module
            __import__(name)
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/lib/python3.6/tkinter/__init__.py", line 36, in <module>
            import _tkinter # If this fails your Python may not be configured for Tk
        ModuleNotFoundError: No module named '_tkinter'
        
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-7wut1485/scikit-plot/
    

    Unfortunately, I don't know how to debug this problem. If you need some info, please don't hesitate to ask!

    opened by paulochf 10
  • Adding a parameter to plot_confusion_matrix() to hide overlaid counts

    Hi @reiinakano,

    Thank you for this great repo! I am using plot_confusion_matrix() but my counts are quite large so the overlaid counts end up overlapping each other and result in a cluttered plot. I was wondering if I could submit a pull request to update this function to add a hide_counts parameter to give the option to not plot the counts? I've already forked and created a branch with the changes. Thank you!

    opened by echan5 7
  • Plot ONLY one class

    Hello, I have a precision-recall curve that I plot as follows:

    skplt.metrics.plot_precision_recall_curve(y_test, y_probas, curves=['each_class'])
    

    I have two classes in the data (one positive and one negative class, with labels 1 and -1 respectively). Question: how can I plot ONLY the positive class?

    Thank you

    enhancement help wanted 
    opened by foo123 7
  • Plot precision-recall curve for support vector machine classifier

    Hello I want to plot a precision-recall curve for SVC (support vector machine classifier), but the scikit-learn svm classifier does not implement a predict_proba method. How can I do that in scikit-plot (as far as I can see in the documentation it accepts prediction probabilities to plot the curve)?

    Note that the scikit-learn documentation page has an example of precision-recall curve for SVC

    Thank you, Nikos

    opened by foo123 6
  • Add Jupyter notebook examples

    It would be nice to have Jupyter notebooks in the "examples" folder showing the different plots as used in a Jupyter notebook. It could contain the same exact code as the examples in the .py files, but adjusted for size (Jupyter notebook plots tend to come out much smaller).

    easy 
    opened by reiinakano 6
  • Update to plot_confusion_matrix (figsize argument and to work if Seaborn is used)

    Using the confusion matrix in a jupyter notebook returns a plot that is quite small. If Seaborn is also used, some values in the plot are hard to read (white text on white lines).

    I have added a figsize-argument to the plot_confusion_matrix and changed the way that values are displayed (now with neutral background box). For larger plots all text-elements scale with the figsize-argument.

    opened by frankherfert 6
  • Adding argument to allow the user to specify which roc_curve are plotted

    You tagged issue #14 as help wanted, so I thought I'd pitch in. Feel free to edit if it doesn't match your style.

    I added a little bit of code to allow the user to pass a list to the roc curve plotting functions to allow them to suppress/show each of the three types of curves: class-specific curves, micro averages, and macro averages.

    opened by doug-friedman 5
  • Error installing No module named sklearn.metrics

    Hi there, I am getting an error installing it

    pip install scikit-plot
    Collecting scikit-plot
      Downloading scikit-plot-0.2.1.tar.gz
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\setup.py", line 9, in <module>
            import scikitplot
          File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\__init__.py", line 5, in <module>
            from scikitplot.classifiers import classifier_factory
          File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\classifiers.py", line 7, in <module>
            from scikitplot import plotters
          File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\plotters.py", line 9, in <module>
            from sklearn.metrics import confusion_matrix
        ImportError: No module named sklearn.metrics
    
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\
    
    opened by ArthurZ 5
  • Throws error "IndexError: too many indices for array" when trying to plot roc for binary classification

    For binary classification, when I pass NumPy arrays containing the test labels and test probabilities, it throws the following error:

    
    y_true = np.array(ytest)
    y_probas = np.array(p_test)
    skplt.metrics.plot_roc_curve(y_true,y_probas)
    plt.show()
    
    IndexError                                Traceback (most recent call last)
    <ipython-input-49-1b02f082006a> in <module>()
    ----> 1 skplt.metrics.plot_roc_curve(y_true,y_probas)
          2 plt.show()
    
    
    /Users/tarun/anaconda/envs/gl-env/lib/python2.7/site-packages/scikitplot/metrics.pyc in plot_roc_curve(y_true, y_probas, title, curves, ax, figsize, cmap, title_fontsize, text_fontsize)
        247     roc_auc = dict()
        248     for i in range(len(classes)):
    --> 249         fpr[i], tpr[i], _ = roc_curve(y_true, probas[:, i],
        250                                       pos_label=classes[i])
        251         roc_auc[i] = auc(fpr[i], tpr[i])
    
    IndexError: too many indices for array
    
    opened by TarunTater 4
  • Class mismatch in skplt.plot_confusion_matrix when test has fewer classes than training

    Hello, I have an issue when trying to plot a confusion matrix with fewer classes in my test set than in training. The class with 12,000+ occurrences in my sample should be labelled 'O'. Is it possible to get around this, or to include the label set manually as an input?

    [Image] It's not a big issue, but it would be nice if we could fix it. Thanks for your help.

    opened by ArmandGiraud 4
  • Regarding the scikit-plot.metrics.plot_roc function

    In your code I noticed that if we pass the classes by their actual names instead of (0, 1, 2, ...), for example as (c, b, a), then np.unique(y_true) sorts the classes alphabetically, which changes their positions relative to the order the model was trained on:

    classes = np.unique(y_true)

    fpr_dict[i], tpr_dict[i], _ = roc_curve(y_true, probas[:, i], pos_label=classes[i])

    Hence if you could add a parameter of class_labels in the function

    def plot_roc_multi(y_true, y_probas,class_labels, title='ROC Curves', plot_micro=True, plot_macro=True, classes_to_plot=None, ax=None, figsize=None, cmap='nipy_spectral', title_fontsize="large", text_fontsize="medium"):

    where class_labels is in the form of an array [a,b,c] it would be much easier I think

    opened by Akshay1-6180 1
  • add class prediction error plot

    This is just a heads-up for programmers that this is a dead library and its functionality may break at any time. There has not been a single update since 2018, and most probably there will be no bug fixes or feature implementations.

    Either implement your own plotting functions or look at other active modules such as yellowbrick.

    scikit-plot is dead and it might break any time, if you use this in production.

    opened by bhishanpdl 0
  • Adding class_names option to gain and lift plots

    I'd like to be able to set the names of the classes used in the legend in the plot_lift_curve() and plot_cumulative_gain() functions.

    I've made a pull request (https://github.com/reiinakano/scikit-plot/pull/109) that adds an optional arg class_names to each of these functions to accomplish this.

    This differs from issue https://github.com/reiinakano/scikit-plot/issues/78 in that I am trying to change the labels in the plot's legend, not omit one of the classes from the plot.

    opened by MichaelFishmanOD 0
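The column-ordering concern raised in the plot_roc comment above can be illustrated with NumPy alone. The sketch below assumes a model whose probability columns follow its own label order (here `["c", "b", "a"]`) and reorders them to the sorted order that `np.unique(y_true)` produces. `reorder_proba_columns` is a hypothetical helper, not a Scikit-plot function. For scikit-learn estimators this step should not be needed, since their `classes_` attribute is already sorted.

```python
import numpy as np


def reorder_proba_columns(probas, model_classes):
    """Reorder probability columns from the model's label order to sorted order.

    Hypothetical helper, not part of Scikit-plot.
    """
    model_classes = np.asarray(model_classes)
    order = np.argsort(model_classes)  # column positions that sort the labels
    return np.asarray(probas)[:, order], model_classes[order]


# Made-up probabilities whose columns follow the order ["c", "b", "a"].
probas = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
model_classes = ["c", "b", "a"]

sorted_probas, sorted_classes = reorder_proba_columns(probas, model_classes)

# np.unique returns labels in sorted order, which is what the columns now follow.
y_true = np.array(["c", "a"])
unique_labels = np.unique(np.concatenate([y_true, ["b"]]))
```

After reordering, column i of `sorted_probas` lines up with the i-th sorted label, so passing it alongside `y_true` should avoid the mismatch described in that comment.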
Releases(v0.3.7)
  • v0.3.7(Aug 19, 2018)

  • v0.3.5(May 12, 2018)

    New features:

    • plot_precision_recall_curve and plot_roc_curve have been deprecated for plot_precision_recall and plot_roc, respectively. The major difference is the deletion of the curves parameter and the use of plot_macro, plot_micro, and classes_to_plot to choose which curves should be plotted. Thanks to @lugq1990 for this change.
  • v0.3.4(Feb 5, 2018)

  • v0.3.3(Oct 26, 2017)

  • v0.3.2(Oct 25, 2017)

    New Features

    • Gain Chart and Lift Chart added to scikitplot.metrics module #71
    • Updated Jupyter notebook examples for v0.3.x by @ljvmiranda921 #69

    Bugfix

    • Changed deprecated spectral colormap to nipy_spectral by @emredjan #66
  • v0.3.1(Sep 17, 2017)

  • v0.3.0(Sep 13, 2017)

    New features:

    • plot_learning_curve has new parameter scoring to allow custom scoring functions. By @jengelman
    • New plotting function plot_calibration_curves

    Deprecations

    • The Factory API has been deprecated and will be removed in v0.4.0
    • scikitplot.plotters has been deprecated and the functions in the Functions API have been distributed to various new modules. See documentation for more details.
  • v0.2.8(Sep 8, 2017)

    Features

    • New option hide_zeros for plot_confusion_matrix by @ExcaliburZero. #39
    • New option to plot only certain labels in plot_confusion_matrix by @ExcaliburZero. #41
    • New options to set colormaps for plot_pca_2d_projection, plot_silhouette, plot_precision_recall_curve, plot_roc_curve, and plot_confusion_matrix. #50

    Bugfix:

    • Fixed bug with nan values in confusion matrices by @ExcaliburZero (#42)
  • v0.2.7(Jul 9, 2017)

  • v0.2.6(May 17, 2017)

  • v0.2.5(Apr 30, 2017)

  • v0.2.4(Apr 25, 2017)

  • v0.2.3(Mar 19, 2017)

    New features:

    • plot_precision_recall_curve and plot_ks_statistic now have a new curves argument that allows the user to choose which curves should be plotted. Thanks to @doug-friedman for this PR.
    • Jupyter notebook examples are now available thanks to @lstmemery
  • v0.2.2(Feb 26, 2017)

    New features:

    • plot_pca_2d_projection function
    • plot_pca_component_variance function
    • plots now have a figsize, title_fontsize, and text_fontsize feature to allow user to customize the size of the plot. This is particularly crucial for Jupyter notebook users where the default settings come out too small. Thanks to @frankherfert for this idea.
  • v0.2.1(Feb 19, 2017)

  • v0.2.0(Feb 18, 2017)

  • v0.1.0(Feb 17, 2017)
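The nan fix listed under v0.2.8 concerns row-normalizing a confusion matrix in which some true labels never occur, so their rows sum to zero. The sketch below shows one way to guard that division with NumPy. It is an illustration only, not the library's actual implementation, and `normalize_rows` is a hypothetical name.

```python
import numpy as np


def normalize_rows(cm):
    """Row-normalize a confusion matrix, leaving all-zero rows as zeros.

    Hypothetical helper, not Scikit-plot's actual fix.
    """
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    out = np.zeros_like(cm)
    # Divide only where the row sum is non-zero; other cells keep the 0.0 default.
    np.divide(cm, row_sums, out=out, where=row_sums != 0)
    return out


# The second true label never occurs, so its row sums to zero.
cm = np.array([[2, 0],
               [0, 0]])
normalized = normalize_rows(cm)
```

A plain `cm / cm.sum(axis=1, keepdims=True)` would produce nan in the second row here.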

Owner
Reiichiro Nakano
I like working on awesome things with awesome people!