machine learning with logical rules in Python

Last update: Dec 31, 2022

Overview

skope-rules

Skope-rules is a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license.

Skope-rules aims at learning logical, interpretable rules for "scoping" a target class, i.e. detecting with high precision instances of this class.

Skope-rules is a trade-off between the interpretability of a Decision Tree and the modeling power of a Random Forest.

See the AUTHORS.rst file for a list of contributors.

Installation

You can install the latest release with pip:

pip install skope-rules

Quick Start

SkopeRules can be used to describe classes with logical rules:

from sklearn.datasets import load_iris
from skrules import SkopeRules

dataset = load_iris()
feature_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
clf = SkopeRules(max_depth_duplication=2,
                 n_estimators=30,
                 precision_min=0.3,
                 recall_min=0.1,
                 feature_names=feature_names)

for idx, species in enumerate(dataset.target_names):
    X, y = dataset.data, dataset.target
    clf.fit(X, y == idx)
    rules = clf.rules_[0:3]
    print("Rules for iris", species)
    for rule in rules:
        print(rule)
    print()
    print(20*'=')
    print()

SkopeRules can also be used as a predictor if you use the "score_top_rules" method:

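# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example needs an older scikit-learn release.
from sklearn.datasets import load_boston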
from sklearn.metrics import precision_recall_curve
from matplotlib import pyplot as plt
from skrules import SkopeRules

dataset = load_boston()
clf = SkopeRules(max_depth_duplication=None,
                 n_estimators=30,
                 precision_min=0.2,
                 recall_min=0.01,
                 feature_names=dataset.feature_names)

X, y = dataset.data, dataset.target > 25
X_train, y_train = X[:len(y)//2], y[:len(y)//2]
X_test, y_test = X[len(y)//2:], y[len(y)//2:]
clf.fit(X_train, y_train)
y_score = clf.score_top_rules(X_test) # Get a risk score for each test example
precision, recall, _ = precision_recall_curve(y_test, y_score)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall curve')
plt.show()

For more examples and use cases please check our documentation. You can also check the demonstration notebooks.

Links with existing literature

The main advantage of decision rules is that they offer interpretable models. The problem of generating such rules has been widely considered in machine learning, see e.g. RuleFit [1], Slipper [2], LRI [3], MLRules [4].

A decision rule is a logical expression of the form "IF conditions THEN response". In a binary classification setting, if an instance satisfies the conditions of the rule, then it is assigned to one of the two classes. If the instance does not satisfy the conditions, it remains unassigned.
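The "IF conditions THEN response" form can be sketched in a few lines of plain Python. This is only an illustration: the DecisionRule class, its apply method, and the threshold used are hypothetical and are not part of the skope-rules API.

```python
class DecisionRule:
    """IF all conditions hold THEN assign `response`; otherwise abstain."""

    def __init__(self, conditions, response):
        self.conditions = list(conditions)  # predicates on one instance
        self.response = response            # class assigned when the rule fires

    def apply(self, instance):
        if all(condition(instance) for condition in self.conditions):
            return self.response
        return None  # the instance remains unassigned


# Hypothetical rule: IF petal_length <= 2.6 THEN class 1.
rule = DecisionRule(
    conditions=[lambda x: x["petal_length"] <= 2.6],
    response=1,
)

covered = rule.apply({"petal_length": 1.4, "petal_width": 0.2})
uncovered = rule.apply({"petal_length": 4.7, "petal_width": 1.4})
```

Returning None rather than the opposite class mirrors the point above: a rule that does not fire says nothing about the instance.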

In [2, 3, 4], rule induction is done by considering each single decision rule as a base classifier in an ensemble, which is built by greedily minimizing some loss function.
In [1], rules are extracted from an ensemble of trees; a weighted combination of these rules is then built by solving an L1-regularized optimization problem over the weights as described in [5].

In this package, we use the second approach. Rules are extracted from a tree ensemble, which allows us to take advantage of existing fast algorithms (such as bagged decision trees or gradient boosting) to produce the ensemble. Rules that are duplicated or too similar are then removed, based on a similarity threshold on their supports. The main goal of this package is to provide rules that satisfy precision and recall conditions. It still implements a score (decision_function) method, but one that does not solve the L1-regularized optimization problem as in [1]. Instead, weights are simply proportional to the out-of-bag (OOB) precision of each rule.
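The filtering, de-duplication and weighting steps can be sketched in plain Python as follows. This is a simplified illustration under several assumptions, not the package's implementation: rules are given as named predicates instead of being extracted from trees, similarity between supports is measured with the Jaccard index, max_similarity is a hypothetical parameter, and precision is computed on the training data rather than on out-of-bag samples.

```python
def jaccard(a, b):
    """Similarity of two supports (sets of covered sample indices)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def evaluate(rule, X, y):
    """Return (support, precision, recall) of a rule on labelled data."""
    support = frozenset(i for i, x in enumerate(X) if rule(x))
    true_pos = sum(1 for i in support if y[i])
    positives = sum(1 for label in y if label)
    precision = true_pos / len(support) if support else 0.0
    recall = true_pos / positives if positives else 0.0
    return support, precision, recall


def select_rules(rules, X, y, precision_min, recall_min, max_similarity=0.9):
    """Filter by precision/recall, then drop rules with near-identical support."""
    scored = []
    for name, rule in rules.items():
        support, precision, recall = evaluate(rule, X, y)
        if precision >= precision_min and recall >= recall_min:
            scored.append((name, rule, support, precision))
    scored.sort(key=lambda item: item[3], reverse=True)  # most precise first

    kept = []
    for name, rule, support, precision in scored:
        if all(jaccard(support, s) <= max_similarity for _, _, s, _ in kept):
            kept.append((name, rule, support, precision))
    return kept


def score(x, kept):
    """Sum the precision of every kept rule that fires on x."""
    return sum(precision for _, rule, _, precision in kept if rule(x))


# Toy one-feature dataset: the first three samples are the target class.
X = [1.0, 1.5, 2.0, 4.0, 5.0, 6.0]
y = [True, True, True, False, False, False]
rules = {
    "x <= 2.5": lambda x: x <= 2.5,
    "x <= 2.6": lambda x: x <= 2.6,  # same support as the rule above
    "x <= 4.5": lambda x: x <= 4.5,
    "x > 3.0": lambda x: x > 3.0,    # zero precision on the target class
}
kept = select_rules(rules, X, y, precision_min=0.5, recall_min=0.1)
kept_names = [name for name, _, _, _ in kept]
```

On this toy data, "x <= 2.6" is dropped because its support matches that of "x <= 2.5", and "x > 3.0" is dropped for failing the precision threshold.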

This package also offers convenient methods to compute predictions with the k most precise rules (see the score_top_rules() and predict_top_rules() methods).
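The idea of predicting with the k most precise rules can be illustrated with a small standalone sketch. The rules, their precision values and the predict_with_top_rules helper below are all hypothetical; the sketch assumes that an instance is flagged as soon as at least one of the k most precise rules fires, and it does not reproduce the package's own code.

```python
# Hypothetical rules as (precision, predicate) pairs.
rules = [
    (0.95, lambda x: x["petal_length"] <= 2.6),
    (0.60, lambda x: x["sepal_width"] > 3.0),
    (0.80, lambda x: x["petal_width"] <= 0.8),
]


def predict_with_top_rules(x, rules, k):
    """Return 1 if any of the k most precise rules fires on x, else 0."""
    top_rules = sorted(rules, key=lambda rule: rule[0], reverse=True)[:k]
    return int(any(predicate(x) for _, predicate in top_rules))


sample = {"petal_length": 4.5, "petal_width": 1.5, "sepal_width": 3.2}
with_two_rules = predict_with_top_rules(sample, rules, k=2)
with_three_rules = predict_with_top_rules(sample, rules, k=3)
```

Raising k trades precision for recall: here the sample is only flagged once the least precise rule is allowed to vote.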

[1] Friedman and Popescu, Predictive learning via rule ensembles, Technical Report, 2005.

[2] Cohen and Singer, A simple, fast, and effective rule learner, National Conference on Artificial Intelligence, 1999.

[3] Weiss and Indurkhya, Lightweight rule induction, ICML, 2000.

[4] Dembczyński, Kotłowski and Słowiński, Maximum Likelihood Rule Ensembles, ICML, 2008.

[5] Friedman and Popescu, Gradient directed regularization, Technical Report, 2004.

Dependencies

skope-rules requires:

Python (>= 2.7 or >= 3.3)
NumPy (>= 1.10.4)
SciPy (>= 0.17.0)
Pandas (>= 0.18.1)
Scikit-Learn (>= 0.17.1)

Running the examples additionally requires Matplotlib >= 1.1.1.

Documentation

You can access the full project documentation at http://skope-rules.readthedocs.io

You can also check the notebooks/ folder, which contains some usage examples.

Comments

TerminatedWorkerError

I keep running into a TerminatedWorkerError when running clf.fit with skope rules. I seem to have ample memory so I'm unsure what's going on. Any potential ideas?

Traceback (most recent call last):
  File "experiment.py", line 171, in <module>
    result = process(topic)
  File "experiment.py", line 95, in process
    clf.fit(features, training_data_labels)
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/skrules/skope_rules.py", line 312, in fit
    clf.fit(X, y)
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/ensemble/bagging.py", line 244, in fit
    return self._fit(X, y, self.max_samples, sample_weight=sample_weight)
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/ensemble/bagging.py", line 378, in _fit
    for i in range(n_jobs))
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py", line 930, in __call__
    self.retrieve()
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/ubuntu/.local/share/virtualenvs/taxonomy-analysis2-BU9HWu51/lib/python3.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
sklearn.externals.joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGKILL(-9)}

opened by AlJohri 37

Remove unnecessary checks for numpy/scipy in setup.py.

These deps are listed in install_requires, so will get installed by pip.

Addresses issue #19.

Interesting project, thanks for your dev efforts folks :-)

opened by timstaley 7

Skope Rules should accept any kind of feature name

SkopeRules uses the pandas.eval method for evaluating semantic rules. This leads to an error when feature names contain characters that are meaningful to the parser (e.g. (, ), =, -). For example:

from sklearn.datasets import load_iris
from skrules import SkopeRules
dataset = load_iris()

X, y, features_names = dataset.data, dataset.target, dataset.feature_names
y = (y == 0)  # Predicting the first species vs. all others
clf = SkopeRules(max_depth_duplication=2,
                 n_estimators=30,
                 precision_min=0.3,
                 recall_min=0.1,
                 feature_names=features_names)
clf.fit(X, y)

will lead to the following error:

Traceback (most recent call last):
  File "main.py", line 20, in <module>
    clf.fit(X, y)
  File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 350, in fit
    for r in set(rules_from_tree)]
  File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 350, in <listcomp>
    for r in set(rules_from_tree)]
  File "/usr/local/lib/python3.6/site-packages/skrules/skope_rules.py", line 600, in _eval_rule_perf
    detected_index = list(X.query(rule).index)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2297, in query
    res = self.eval(expr, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2366, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 290, in eval
    truediv=truediv)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 732, in __init__
    self.terms = self.parse()
  File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 749, in parse
    return self._visitor.visit(self.expr)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 310, in visit
    node = ast.fix_missing_locations(ast.parse(clean))
  File "/usr/local/Cellar/python3/3.6.4/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    petal length (cm )<=2.5999999046325684

Skope Rules should accept any kind of feature name. This means we have to transform feature names for computation and transform them back at the end.
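One way to do this round trip is sketched below, assuming rules are plain strings. The to_safe_names and restore_names helpers and the feature_<i> placeholder scheme are hypothetical and only illustrate the idea; they are not taken from the skope-rules code base.

```python
import re


def to_safe_names(feature_names):
    """Map arbitrary feature names to identifier-safe placeholders."""
    return ["feature_{}".format(i) for i in range(len(feature_names))]


def restore_names(rule, feature_names):
    """Replace each placeholder in a rule string with the original name."""
    return re.sub(
        r"\bfeature_(\d+)\b",
        lambda match: feature_names[int(match.group(1))],
        rule,
    )


feature_names = ["sepal length (cm)", "petal length (cm)"]
safe_names = to_safe_names(feature_names)

# Rules are built and evaluated with the safe names only...
internal_rule = "{} <= 2.6 and {} > 4.0".format(safe_names[1], safe_names[0])
# ...and translated back just before being shown to the user.
readable_rule = restore_names(internal_rule, feature_names)
```

Because the placeholders are valid Python identifiers, the internal rule string no longer trips the expression parser, whatever characters the original names contain.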

opened by floriangardin 3

Fix import error in modern Python

The collections.Iterable alias was removed in Python 3.10 and the typing.Iterable alias is marked as deprecated; fall back to an explicit import from collections.abc.
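A minimal sketch of such a fallback import is shown below. It assumes the goal is to keep a single code path working on both legacy and current interpreters; the exact form used in the pull request may differ.

```python
try:
    # Preferred location, available since Python 3.3.
    from collections.abc import Iterable
except ImportError:
    # Legacy interpreters (Python 2) only expose the alias here.
    from collections import Iterable

list_is_iterable = isinstance([1, 2, 3], Iterable)
int_is_iterable = isinstance(3, Iterable)
```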

opened by patrick-nicholson-czi 1

ImportError: cannot import name 'Iterable' from 'collections'

Python 3.10
skope-rules==1.0.1

Error

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Input In [1], in <module>
     15 from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
     16 from interpret.glassbox import ExplainableBoostingClassifier
---> 17 from skrules import SkopeRules

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/skrules/__init__.py:1, in <module>
----> 1 from .skope_rules import SkopeRules
      2 from .rule import Rule, replace_feature_name
      4 __all__ = ['SkopeRules', 'Rule']

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/skrules/skope_rules.py:2, in <module>
      1 import numpy as np
----> 2 from collections import Counter, Iterable
      3 import pandas
      4 import numbers

ImportError: cannot import name 'Iterable' from 'collections' (/t/pyenv/versions/3.10.2/lib/python3.10/collections/__init__.py)

opened by elcolie 1

Fix/update tests

Description

This PR contains some fixes to get tests working and build passing, primarily around updating tests and imports to handle deprecation of various sklearn testing functions and other imports.

Instead of pinning a new sklearn version, I have tried to maintain compatibility with a bunch of try-except blocks, but would be happy to hear thoughts on this approach. It may also be worthwhile to add different sklearn versions to the Travis CI build.

opened by AndrewTanQB 1

issue in mask indexing

Hi, thank you for sharing this great package.

However, I think I may have found a mistake in the mask indexing.

mask = ~samples

samples is a NumPy array, and when you apply ~ to it, you get -(value + 1) for each element.

For example:

samples = np.array([1, 2, 3, 4])
~samples  # array([-2, -3, -4, -5])

please check this issue.

Thanks!

opened by stat17-hb 1

Release new version to pypi.org?

There are a number of useful commits on the master branch, e.g. https://github.com/scikit-learn-contrib/skope-rules/pull/24.

It's been more than 1.5 years since the last release. Would it be possible for you to upload a new package to pypi.org?

opened by ecederstrand 1

Any variable name can be used in "feature_names"

Now any variable name can be used in the "feature_names" list parameter of Skope Rules. I decoupled the feature names from the internal queries logic.

opened by floriangardin 1

conda-forge package

It would be nice to add a skope-rules package to conda-forge https://conda-forge.org/ (in addition to pypi)

P.S. You can use grayskull https://github.com/conda-incubator/grayskull to generate a boilerplate for the conda recipe.

opened by candalfigomoro 2

cannot import name 'ScopeRules' from 'skrules'

Hi!

The package import spell, which is clearly described in the package readme, does not work

Six imported. What should I do to make the package work?

opened by avraam-inside 1

Questions about how to use and interpret rules?

Can SkopeRules be used for multiclass classification or only binary classification?

How do I interpret the outputted decision rules? Do the top-k rules in the example notebook correspond to the rules that best classify the test data, ordered in descending order by precision? If I want to classify new test data, do I consider the top-1 rule, the majority vote from the top-k rules, or some other approach?

If I want to understand the underlying method and how rules are computed, is Predictive Learning via Rule Ensembles by Friedman and Popescu the closest work?

opened by preethiseshadri518 0

Not compatible with sklearn v1?

Minimal example:

>>> import sklearn
>>> sklearn.__version__
1.0.1
>>> import skrules
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-3-195b491d5645> in <module>
----> 1 import skrules

~/.virtualenvs/risk-modeling/lib/python3.9/site-packages/skrules/__init__.py in <module>
----> 1 from .skope_rules import SkopeRules
      2 from .rule import Rule, replace_feature_name
      3 
      4 __all__ = ['SkopeRules', 'Rule']

~/.virtualenvs/risk-modeling/lib/python3.9/site-packages/skrules/skope_rules.py in <module>
     10 from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
     11 from sklearn.ensemble import BaggingClassifier, BaggingRegressor
---> 12 from sklearn.externals import six
     13 from sklearn.tree import _tree
     14 

ImportError: cannot import name 'six' from 'sklearn.externals' (/home/mwintner/.virtualenvs/risk-modeling/lib/python3.9/site-packages/sklearn/externals/__init__.py)

According to some stackoverflow sources like this one, six is not in sklearn.externals beyond sklearn v0.23.

opened by mwintner-fn 1

The oob score

I think the OOB score computed in the fit function is wrong.

The authors get the OOB sample indices with "mask = ~samples", and then apply X[mask, :] to get the OOB samples. When I tested this, I found many elements in common between samples and X[mask, :], and the training samples and masked samples have the same length. For example, with 100 samples in total, if 80 samples are used to train the model, there should be 100 - 80 = 20 OOB samples (ignoring replacement).

I also looked at how random forest samples its OOB indices, and found the following code:

random_instance = check_random_state(random_state)
sample_indices = random_instance.randint(0, samples, max_samples)  # get the indices of training samples
sample_counts = np.bincount(sample_indices, minlength=len(samples))
unsampled_mask = sample_counts == 0
indices_range = np.arange(len(samples))
unsampled_indices = indices_range[unsampled_mask]  # get the indices of oob samples

unsampled_indices then holds the true OOB sample indices.
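The difference between the two approaches can be reproduced without NumPy. The sketch below is a pure-Python analogue under stated assumptions: bootstrap_indices and oob_indices are hypothetical helpers, and the draw uses the standard library's random module rather than scikit-learn's random state handling.

```python
import random


def bootstrap_indices(n_samples, n_draws, seed=0):
    """Draw training indices with replacement."""
    rng = random.Random(seed)
    return [rng.randrange(n_samples) for _ in range(n_draws)]


def oob_indices(sampled, n_samples):
    """Indices that were never drawn, i.e. the out-of-bag samples."""
    drawn = set(sampled)
    return [i for i in range(n_samples) if i not in drawn]


n_samples = 100
sampled = bootstrap_indices(n_samples, n_draws=80)
oob = oob_indices(sampled, n_samples)

# Bitwise NOT on integer indices does not complement the selection:
# it maps each index v to -(v + 1), so the result has the same length
# as `sampled` and still addresses training rows (counted from the end).
negated = [~v for v in sampled]
```

The complement is only obtained from ~ when samples is a boolean mask; with integer indices, a set difference or a count-based mask as in the quoted code is needed.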

opened by wjj5881005 0

Releases(v1.0.1)

v1.0.1(Dec 11, 2020)

Source code(tar.gz)
Source code(zip)

Owner

scikit-learn compatible projects

GitHub Repository http://skope-rules.readthedocs.io

Extra blocks for scikit-learn pipelines.

scikit-lego We love scikit learn but very often we find ourselves writing custom transformers, metrics and models. The goal of this project is to atte

941 Dec 30, 2022

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

imbalanced-learn imbalanced-learn is a python package offering a number of re-sampling techniques commonly used in datasets showing strong between-cla

6.2k Jan 01, 2023

Multivariate imputation and matrix completion algorithms implemented in Python

A variety of matrix completion and imputation algorithms implemented in Python 3.6. To install: pip install fancyimpute Do not use conda. We don't sup

1.1k Dec 18, 2022

scikit-learn inspired API for CRFsuite

sklearn-crfsuite sklearn-crfsuite is a thin CRFsuite (python-crfsuite) wrapper which provides interface simlar to scikit-learn. sklearn_crfsuite.CRF i

418 Jan 09, 2023

A library of sklearn compatible categorical variable encoders

Categorical Encoding Methods A set of scikit-learn-style transformers for encoding categorical variables into numeric by means of different techniques

2.1k Jan 02, 2023

Large-scale linear classification, regression and ranking in Python

lightning lightning is a library for large-scale linear classification, regression and ranking in Python. Highlights: follows the scikit-learn API con

1.6k Dec 31, 2022

A scikit-learn based module for multi-label et. al. classification

scikit-multilearn scikit-multilearn is a Python module capable of performing multi-label learning tasks. It is built on-top of various scientific Pyth

803 Jan 05, 2023

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks. Sebastian Raschka 2014-2021 Links Doc

4.2k Dec 28, 2022

machine learning with logical rules in Python

skope-rules Skope-rules is a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license. Skope-rules a

504 Dec 31, 2022

Data Analysis Baseline Library

dabl The data analysis baseline library. "Mr Sanchez, are you a data scientist?" "I dabl, Mr president." Find more information on the website. State o

122 Dec 27, 2022

Fast solver for L1-type problems: Lasso, sparse Logisitic regression, Group Lasso, weighted Lasso, Multitask Lasso, etc.

celer Fast algorithm to solve Lasso-like problems with dual extrapolation. Currently, the package handles the following problems: Lasso weighted Lasso

168 Dec 13, 2022

Scikit-learn compatible estimation of general graphical models

skggm : Gaussian graphical models using the scikit-learn API In the last decade, learning networks that encode conditional independence relationships

213 Jan 02, 2023

(AAAI' 20) A Python Toolbox for Machine Learning Model Combination

combo: A Python Toolbox for Machine Learning Model Combination Deployment & Documentation & Stats Build Status & Coverage & Maintainability & License

606 Dec 21, 2022

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

scikit-opt Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm,A

3.7k Jan 01, 2023
