[ICML 2021] A fast algorithm for fitting robust decision trees.

Overview

GROOT: Growing Robust Trees

Growing Robust Trees (GROOT) is an algorithm that fits binary classification decision trees such that they are robust against user-specified adversarial examples. The algorithm closely resembles algorithms used for fitting normal decision trees (i.e. CART) but changes the splitting criterion and the way samples propagate when creating a split.

This repository contains the module groot that implements GROOT as a Scikit-learn compatible classifier, an adversary for model evaluation and easy functions to import datasets. For documentation see https://groot.cyber-analytics.nl

Simple example

To train and evaluate GROOT on a toy dataset against an attacker that can move samples by 0.5 in each direction one can use the following code:

from groot.adversary import DecisionTreeAdversary
from groot.model import GrootTreeClassifier

from sklearn.datasets import make_moons

X, y = make_moons(noise=0.3, random_state=0)
X_test, y_test = make_moons(noise=0.3, random_state=1)

attack_model = [0.5, 0.5]
is_numerical = [True, True]
tree = GrootTreeClassifier(attack_model=attack_model, is_numerical=is_numerical, random_state=0)

tree.fit(X, y)
accuracy = tree.score(X_test, y_test)
adversarial_accuracy = DecisionTreeAdversary(tree, "groot").adversarial_accuracy(X_test, y_test)

print("Accuracy:", accuracy)
print("Adversarial Accuracy:", adversarial_accuracy)

Installation

groot can be installed from PyPi: pip install groot-trees

To use Kantchelian's MILP attack it is required that you have GUROBI installed along with their python package: python -m pip install -i https://pypi.gurobi.com gurobipy

Specific dependency versions

To reproduce our experiments with exact package versions you can clone the repository and run: pip install -r requirements.txt

We recommend using virtual environments.

Reproducing 'Efficient Training of Robust Decision Trees Against Adversarial Examples' (article)

To reproduce the results from the paper we provide generate_k_fold_results.py, a script that takes the trained models (from JSON format) and generates tables and figures. The resulting figures generate under /out/.

To not only generate the results but to also retrain all models we include the scripts train_kfold_models.py and fit_chen_xgboost.py. The first script runs the algorithms in parallel for each dataset then outputs to /out/trees/ and /out/forests/. Warning: the script can take a long time to run (about a day given 16 cores). The second script train specifically the Chen et al. boosting ensembles. /out/results.zip contains all results from when we ran the scripts.

To experiment on image datasets we have a script image_experiments.py that fits and output the results. In this script, one can change the dataset variable to 'mnist' or 'fmnist' to switch between the two.

The scripts summarize_datasets.py and visualize_threat_models.py output some figures we used in the text.

Implementation details

The TREANT implementation (groot.treant.py) is copied almost completely from the authors of TREANT at https://github.com/gtolomei/treant with small modifications to better interface with the experiments. The heuristic by Chen et al. runs in the GROOT code, only with a different score function. This score function can be enabled by setting chen_heuristic=True on a GrootTreeClassifier before calling .fit(X, y). The provably robust boosting implementation comes almost completely from their code at https://github.com/max-andr/provably-robust-boosting and we use a small wrapper around their code (groot.provably_robust_boosting.wrapper.py) to use it. When we recorded the runtimes we turned off all parallel options in the @jit annotations from the code. The implementation of Chen et al. boosting can be found in their own repo https://github.com/chenhongge/RobustTrees, from whic we need to compile and copy the binary xgboost to the current directory. The script fit_chen_xgboost.py then calls this binary and uses the command line interface to fit all models.

Important note on TREANT

To encode L-infinity norms correctly we had to modify TREANT to NOT apply rules recursively. This means we added a single break statement in the treant.Attacker.__compute_attack() method. If you are planning on using TREANT with recursive attacker rules then you should remove this statement or use TREANT's unmodified code at https://github.com/gtolomei/treant .

Contact

For any questions or comments please create an issue or contact me directly.

Comments
  • Reproducing results from the article, issue with runtimes.csv

    Reproducing results from the article, issue with runtimes.csv

    Hello! I am trying to reproduce results from the article, and I can't figure out certain problem. First I am trying to run train_kfold_models, but the code always ouputs an error: "ImportError: cannot import name 'GrootTree' from 'groot.model'". Is there something wrong with the .py file I am trying to run, or is this problem something that doesn't occur to you and everyone else (-->something wrong on computer or files or environment)?

    Onni Mansikkamäki

    opened by OnniMansikkamaki 3
  • is_numerical argument GrootTreeClassifier

    is_numerical argument GrootTreeClassifier

    Running the example code on the make moons data in the README I get:

    Traceback (most recent call last):
      File "/home/.../groot_test.py", line 11, in <module>
        tree = GrootTreeClassifier(attack_model=attack_model, is_numerical=is_numerical, random_state=0)
    TypeError: __init__() got an unexpected keyword argument 'is_numerical'
    

    Leaving out the argument and having this line instead: tree = GrootTreeClassifier(attack_model=attack_model, random_state=0) results in this error:

    Traceback (most recent call last):
      File "/home/.../groot_test.py", line 15, in <module>
        adversarial_accuracy = DecisionTreeAdversary(tree, "groot").adversarial_accuracy(X_test, y_test)
      File "/home/.../venv/lib/python3.9/site-packages/groot/adversary.py", line 259, in __init__
        self.is_numeric = self.decision_tree.is_numerical
    AttributeError: 'GrootTreeClassifier' object has no attribute 'is_numerical'
    

    I'm guessing the code got an update, but the readme didn't. Or I made a stupid mistake, also very possible.

    opened by laudv 2
  • Reproducing result from paper

    Reproducing result from paper

    Hello! I am trying to reproduce the results from the paper. I am struggling to find, where these files: generate_k_fold_results.py, train_kfold_models.py, fit_chen_xgboost.py, image_experiments.py, summarize_datasets.py and visualize_threat_models.py are provided?

    Onni Mansikkamäki

    opened by OnniMansikkamaki 0
  • Regression decision trees and random forests

    Regression decision trees and random forests

    This PR adds GROOT decision trees and random forests that use the adversarial sum of absolute errors to make splits. It also adds new tests, speeds them up and updates the documentation.

    opened by daniel-vos 0
  • Add regression, tests and refactor into base class

    Add regression, tests and refactor into base class

    This PR adds a regression GROOT tree based on the adversarial sum of absolute errors, more tests and refactors GROOT trees into a base class (BaseGrootTree) with subclasses GrootTreeClassifier and GrootTreeRegressor extending it.

    opened by daniel-vos 0
Releases(v0.0.1)
Owner
Cyber Analytics Lab
@ Delft University of Technology
Cyber Analytics Lab
Implementation of Memory-Efficient Neural Networks with Multi-Level Generation, ICCV 2021

Memory-Efficient Multi-Level In-Situ Generation (MLG) By Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray T. Chen and David Z. Pan

Jiaqi Gu 2 Jan 04, 2022
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)

LV-BERT Introduction In this repo, we introduce LV-BERT by exploiting layer variety for BERT. For detailed description and experimental results, pleas

Weihao Yu 14 Aug 24, 2022
A script that trains a model to recognize handwritten digits using the MNIST data set.

handwritten-digits-recognition A script that trains a model to recognize handwritten digits using the MNIST data set. Then it loads external files and

Hamza Sayih 1 Oct 30, 2021
SimBERT升级版(SimBERTv2)!

RoFormer-Sim RoFormer-Sim,又称SimBERTv2,是我们之前发布的SimBERT模型的升级版。 介绍 https://kexue.fm/archives/8454 训练 tensorflow 1.14 + keras 2.3.1 + bert4keras 0.10.6 下载

318 Dec 31, 2022
GPU-accelerated PyTorch implementation of Zero-shot User Intent Detection via Capsule Neural Networks

GPU-accelerated PyTorch implementation of Zero-shot User Intent Detection via Capsule Neural Networks This repository implements a capsule model Inten

Joel Huang 15 Dec 24, 2022
[ICCV'21] Pri3D: Can 3D Priors Help 2D Representation Learning?

Pri3D: Can 3D Priors Help 2D Representation Learning? [ICCV 2021] Pri3D leverages 3D priors for downstream 2D image understanding tasks: during pre-tr

Ji Hou 124 Jan 06, 2023
[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

Sparse Structure Learning via Graph Neural Networks for inductive document classification Make graph dataset create co-occurrence graph for datasets.

16 Dec 22, 2022
《K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters》(2020)

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters This repository is the implementation of the paper "K-Adapter: Infusing Knowledge

Microsoft 118 Dec 13, 2022
Offical implementation for "Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation".

Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation (NeurIPS 2021) by Qiming Hu, Xiaojie Guo. Dependencies P

Qiming Hu 31 Dec 20, 2022
Official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Recognition" in AAAI2022.

AimCLR This is an official PyTorch implementation of "Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-supervised Action Reco

Gty 44 Dec 17, 2022
[PNAS2021] The neural architecture of language: Integrative modeling converges on predictive processing

The neural architecture of language: Integrative modeling converges on predictive processing Code accompanying the paper The neural architecture of la

Martin Schrimpf 36 Dec 01, 2022
Active window border replacement for window managers.

xborder Active window border replacement for window managers. Usage git clone https://github.com/deter0/xborder cd xborder chmod +x xborders ./xborder

deter 250 Dec 30, 2022
Contrastive unpaired image-to-image translation, faster and lighter training than cyclegan (ECCV 2020, in PyTorch)

Contrastive Unpaired Translation (CUT) video (1m) | video (10m) | website | paper We provide our PyTorch implementation of unpaired image-to-image tra

1.7k Dec 27, 2022
CCPD: a diverse and well-annotated dataset for license plate detection and recognition

CCPD (Chinese City Parking Dataset, ECCV) UPdate on 10/03/2019. CCPD Dataset is now updated. We are confident that images in subsets of CCPD is much m

detectRecog 1.8k Dec 30, 2022
Fit Fast, Explain Fast

FastExplain Fit Fast, Explain Fast Installing pip install fast-explain About FastExplain FastExplain provides an out-of-the-box tool for analysts to

8 Dec 15, 2022
Detecting drunk people through thermal images using Deep Learning (CNN)

Drunk Detection CNN Detecting drunk people through thermal images using Deep Learning (CNN) Dataset We used thermal images provided by Electronics Lab

Giacomo Ferretti 3 Oct 27, 2022
ADB-IP-ROTATION - Use your mobile phone to gain a temporary IP address using ADB and data tethering

ADB IP ROTATE This an Python script based on Android Debug Bridge (adb) shell sc

Dor Bismuth 2 Jul 12, 2022
[CVPR'22] Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast

wseg Overview The Pytorch implementation of Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast. [arXiv] Though image-level weakly

Ye Du 96 Dec 30, 2022
Arbitrary Distribution Modeling with Censorship in Real Time 59 2 60 3 Bidding Advertising for KDD'21

Arbitrary_Distribution_Modeling This repo implements the Neighborhood Likelihood Loss (NLL) and Arbitrary Distribution Modeling (ADM, with Interacting

7 Jan 03, 2023
[CVPR 2021] Anycost GANs for Interactive Image Synthesis and Editing

Anycost GAN video | paper | website Anycost GANs for Interactive Image Synthesis and Editing Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zh

MIT HAN Lab 726 Dec 28, 2022