(Py)TOD: Tensor-based Outlier Detection, A General GPU-Accelerated Framework

Overview

Background: Outlier detection (OD) is a key data mining task for identifying abnormal objects among general samples, with numerous high-stakes applications including fraud detection and intrusion detection.

To scale OD to large, high-dimensional datasets, we propose TOD, a novel system that abstracts OD algorithms into basic tensor operations for efficient GPU acceleration.

See the corresponding paper. The code is being cleaned up and released; please watch and star the repository!

One reason to use it:

On average, TOD is 11 times faster than PyOD!

If you need another reason: it can handle much larger datasets, running OD on more than a million samples within an hour!


TOD features:

  • Unified APIs, detailed documentation, and examples for easy use (under construction)
  • Support for more than 10 different OD algorithms, with more being added
  • Multi-GPU acceleration
  • Advanced techniques such as provable quantization
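The feature list mentions multi-GPU acceleration, and the text above claims million-sample runs. One generic way to keep memory bounded and to split such work is to score query rows in independent chunks. The sketch below is our own CPU illustration with NumPy, assuming a kNN-distance score; the function name knn_scores_chunked and its chunk_size parameter are hypothetical, and this is not TOD's actual batching or device scheduling.

```python
import numpy as np

def knn_scores_chunked(X_query, X_train, k=1, chunk_size=1024):
    """Distance from each query row to its k-th nearest training row (k >= 1).

    Rows of X_query are processed in independent chunks, so the largest
    intermediate array has shape (chunk_size, n_train, n_features) instead
    of (n_query, n_train, n_features).
    """
    scores = []
    for start in range(0, len(X_query), chunk_size):
        chunk = X_query[start:start + chunk_size]
        # Squared Euclidean distances for this chunk only: (chunk, n_train).
        sq = ((chunk[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
        # np.partition places the (k-1)-th smallest value at index k-1 per row.
        kth = np.partition(sq, k - 1, axis=1)[:, k - 1]
        scores.append(np.sqrt(kth))
    return np.concatenate(scores)

# Usage: each chunk is independent, so in a multi-device setting the loop
# body could in principle run on different GPUs (not shown here).
rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 8))
query = rng.normal(size=(1000, 8))
scores = knn_scores_chunked(query, train, k=5, chunk_size=256)
print(scores.shape)  # → (1000,)
```

Because chunks never interact, the chunk size only trades memory for the number of loop iterations; the scores are identical for any chunk_size.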

Programming Model Interface

Complex OD algorithms can be abstracted into common tensor operators.

https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction.png
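As a rough illustration of this abstraction (our own sketch, not TOD's code), the kNN detector's score, the distance from each sample to its k-th nearest neighbor, reduces to broadcasting, element-wise arithmetic, a reduction, and a sort. NumPy stands in for GPU tensors here; in PyTorch the same steps would map onto operators such as torch.cdist and torch.topk. The function name knn_scores is hypothetical.

```python
import numpy as np

def knn_scores(X: np.ndarray, k: int = 2) -> np.ndarray:
    """kNN outlier score: distance from each sample to its k-th nearest neighbor.

    Built only from generic tensor operators: broadcasted subtraction,
    element-wise square/sum/sqrt, and a sort along one axis.
    """
    # Pairwise Euclidean distances, shape (n, n), via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; column 0 holds the zero self-distance, so column k
    # is the distance to the k-th nearest other sample.
    return np.sort(dist, axis=1)[:, k]

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [75, 80]], dtype=float)
scores = knn_scores(X, k=2)
print(scores.argmax())  # → 4, the far-away point [75, 80]
```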

For instance, ABOD and COPOD can be assembled from the basic tensor operators.

https://raw.githubusercontent.com/yzhao062/pytod/master/figs/abstraction_example.png
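A simplified sketch of the COPOD side of this example, assuming NumPy as the tensor library: COPOD scores each sample by how far into the left and right tails of each feature's empirical CDF it falls, uses each feature's skewness to pick a tail, and keeps the most extreme of the summed tail scores. The helper names ecdf and copod_scores are hypothetical, and the code follows the published COPOD idea rather than TOD's exact implementation.

```python
import numpy as np

def ecdf(X: np.ndarray) -> np.ndarray:
    """Column-wise empirical CDF evaluated at every sample, shape (n, d)."""
    # less_eq[i, k, j] is True when X[k, j] <= X[i, j]; averaging over k
    # gives the fraction of samples at or below X[i, j] in column j.
    less_eq = X[None, :, :] <= X[:, None, :]
    return less_eq.mean(axis=1)

def copod_scores(X: np.ndarray) -> np.ndarray:
    """Simplified COPOD-style outlier scores built from tensor operators."""
    left = -np.log(ecdf(X))    # large when a value is unusually small
    right = -np.log(ecdf(-X))  # large when a value is unusually large
    # Sign of each column's skewness, via the third central moment.
    third_moment = ((X - X.mean(axis=0)) ** 3).mean(axis=0)
    skew_corrected = np.where(third_moment < 0, left, right)
    # Sum over dimensions, then keep the most extreme of the three views.
    per_view = np.stack([left.sum(axis=1), right.sum(axis=1),
                         skew_corrected.sum(axis=1)])
    return per_view.max(axis=0)

X = np.array([[1, 2], [2, 3], [3, 4], [2, 2], [50, 60]], dtype=float)
print(copod_scores(X).argmax())  # → 4, the point [50, 60]
```

The broadcast comparison in ecdf keeps everything to pure tensor operators but needs O(n^2 * d) memory, so it only suits small inputs; a sort-based ECDF would scale better.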

End-to-end Performance Comparison with PyOD

Overall, TOD is much faster than PyOD, taking on average 11 times less run time.

https://raw.githubusercontent.com/yzhao062/pytod/master/figs/run_time.png

Code is being released. Watch and star for the latest news!

Comments
  • Error while installing package

    I installed PyTorch 1.10 from the official site, and it shows up in my virtual environment. When I run pip install pytod, the installer cannot find PyTorch because it looks for a package named "pytorch" rather than "torch".

    ERROR: Could not find a version that satisfies the requirement pytorch>=1.7 (from pytod) (from versions: 0.1.2, 1.0.2)
    ERROR: No matching distribution found for pytorch>=1.7
    
    opened by nuriakiin 1
  • decision_function() returns None

    Thanks for the package. When I call decision_function() on test data with LOF (or KNN), it returns None instead of scores. Is there a fix for this? The following code reproduces the issue (on GPU):

    from pytod.models.lof import LOF
    import torch
    import numpy as np

    x = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [75, 80]], dtype=np.float32)
    x = torch.from_numpy(x)

    y = np.array([[6, 5], [1, 2], [3, 4], [5, 1], [11, 12]], dtype=np.float32)
    y = torch.from_numpy(y)

    lof = LOF(n_neighbors=2, device='cuda:0')
    lof.fit(x)

    print(lof.decision_function(y))

    opened by sugatc 0
  • Support for novelty detection and changing distance metric with local outlier factor

    The current implementation of LOF doesn't allow changing the distance metric (to 'cosine', for example) or setting novelty=True, which prevents it from being used for novelty detection tasks. It would be great if support could be added for these.

    opened by sugatc 2
  • can't fit model in colab

    When I try to fit any model on a Colab GPU instance, I get the following error. My dataset has 2 columns and 1 million rows:


    AttributeError                            Traceback (most recent call last)
    in ()
          4 clf_name = 'KNN'
          5 clf = LOF()
    ----> 6 clf.fit(X)

    3 frames
    /usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __getattr__(self, name)
       5485         ):
       5486             return self[name]
    -> 5487         return object.__getattribute__(self, name)
       5488
       5489     def __setattr__(self, name: str, value) -> None:

    AttributeError: 'DataFrame' object has no attribute 'to'

    opened by yairVanti 0
  • clean up reproducibility scripts

    clean up reproducibility scripts

    We are cleaning up these scripts so they are easy to run; in the meantime, the primary results can be reproduced with compare_real_data.py (https://github.com/yzhao062/pytod/tree/main/reproducibility)

    enhancement 
    opened by yzhao062 0
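The LOF issues above ask for scores on unseen data (novelty detection) and for a cosine metric. The following is a minimal NumPy sketch of what such a decision_function could compute, following the standard LOF definitions (k-distance, reachability distance, local reachability density). NoveltyLOF and pairwise are hypothetical names; this is not pytod's LOF class and does not patch the reported None return. It assumes no all-zero rows when metric="cosine".

```python
import numpy as np

def pairwise(A, B, metric="euclidean"):
    """Distance matrix between rows of A and rows of B, shape (len(A), len(B))."""
    if metric == "euclidean":
        return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
    if metric == "cosine":
        A_n = A / np.linalg.norm(A, axis=1, keepdims=True)
        B_n = B / np.linalg.norm(B, axis=1, keepdims=True)
        # Clip tiny negative values caused by floating-point round-off.
        return np.clip(1.0 - A_n @ B_n.T, 0.0, 2.0)
    raise ValueError(f"unsupported metric: {metric}")

class NoveltyLOF:
    """Hypothetical LOF that scores new data against the fitted samples."""

    def __init__(self, n_neighbors=2, metric="euclidean"):
        self.k = n_neighbors
        self.metric = metric

    def fit(self, X):
        self.X_ = X
        d = pairwise(X, X, self.metric)
        np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
        idx = np.argsort(d, axis=1)[:, : self.k]
        d_knn = np.take_along_axis(d, idx, axis=1)
        self.k_dist_ = d_knn[:, -1]  # k-distance of each training sample
        # Reachability distance: max(d(p, o), k-distance(o)).
        reach = np.maximum(d_knn, self.k_dist_[idx])
        self.lrd_ = 1.0 / (reach.mean(axis=1) + 1e-10)
        return self

    def decision_function(self, Y):
        """LOF of each row of Y w.r.t. the fitted data; higher = more outlying."""
        d = pairwise(Y, self.X_, self.metric)
        idx = np.argsort(d, axis=1)[:, : self.k]
        d_knn = np.take_along_axis(d, idx, axis=1)
        reach = np.maximum(d_knn, self.k_dist_[idx])
        lrd_y = 1.0 / (reach.mean(axis=1) + 1e-10)
        # Ratio of the neighbors' average density to the query's density.
        return self.lrd_[idx].mean(axis=1) / lrd_y

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0.5, 0.5], [10, 10]], dtype=float)
print(NoveltyLOF(n_neighbors=2).fit(X).decision_function(Y))  # approximately [1.0, 13.09]
```

Relatedly, the Colab traceback shows `.to` being looked up on a pandas DataFrame. Since `.to(device)` is a PyTorch tensor method, converting the data first, e.g. `torch.from_numpy(df.to_numpy(dtype=np.float32))`, is the usual remedy; this reading of the cause is based only on the traceback.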
Releases (v0.0.2)
  • v0.0.2 (Jun 19, 2022)

    v<0.0.1>, <04/12/2021> -- Add LOF.
    v<0.0.1>, <04/23/2021> -- Add ABOD.
    v<0.0.2>, <06/19/2021> -- Add PCA and HBOS.
    v<0.0.2>, <06/19/2021> -- Turn on test suites.

    Now we have updated both the paper and the repo to cover more algorithms.

Owner
Yue Zhao
Ph.D. Student @ CMU. Outlier Detection Systems | ML Systems (MLSys) | Anomaly/Outlier Detection | AutoML. Twitter@ yzhao062