A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

Overview

OutliersSlidingWindows

Dataset generation

The original datasets, namely Higgs and Cover, are provided (compressed) in the data folder. Alternatively, they can be downloaded and preprocessed as follows:

wget https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz
gunzip -c HIGGS.csv.gz | cut -d ',' -f 23-29 > higgs.dat

wget https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz
gunzip covtype.data.gz

The script datasets.sh decompresses the zipped original datasets and generates the artificial datasets used in the paper. In particular, the program InjectOutliers takes a dataset and injects artificial outliers into it. It takes the following arguments:

  • in, the path to the input dataset
  • out, the path to the output file
  • p, the probability with which to inject an outlier after every point
  • r, the scaling factor for the norm of the outlier points
  • d, the dimension of the points
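
The injection step can be pictured with the following minimal sketch. It is not the repository's InjectOutliers code: the class and method names are hypothetical, and it assumes that an outlier is a point with a uniformly random direction whose norm is r times the largest norm found in the input, which is only one possible reading of "scaling factor".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not the repository's InjectOutliers class.
class InjectOutliersSketch {

    // Euclidean norm of a point.
    static double norm(double[] x) {
        double sq = 0.0;
        for (double v : x) sq += v * v;
        return Math.sqrt(sq);
    }

    // Point with a uniformly random direction (normalized Gaussian vector)
    // and the requested norm.
    static double[] randomPointWithNorm(int d, double targetNorm, Random rng) {
        double[] x = new double[d];
        for (int i = 0; i < d; i++) x[i] = rng.nextGaussian();
        double scale = targetNorm / norm(x);
        for (int i = 0; i < d; i++) x[i] *= scale;
        return x;
    }

    // Copies the input stream and, after every point, appends an outlier
    // with probability p.
    // Assumption: outlier norm = r * (largest norm among the input points).
    static List<double[]> inject(List<double[]> points, double p, double r,
                                 int d, Random rng) {
        double maxNorm = 0.0;
        for (double[] x : points) maxNorm = Math.max(maxNorm, norm(x));
        List<double[]> out = new ArrayList<>();
        for (double[] x : points) {
            out.add(x);
            if (rng.nextDouble() < p) {
                out.add(randomPointWithNorm(d, r * maxNorm, rng));
            }
        }
        return out;
    }
}
```

With p = 0 the stream is returned unchanged, and with p = 1 every original point is followed by exactly one outlier.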

The program GenerateArtificial automatically generates a dataset of points in a unit ball, with outliers on the surface of a ball of radius r. It takes the following arguments:

  • out, the path to the output file
  • p, the probability with which to inject an outlier
  • r, the radius of the outer ball
  • d, the dimension of the points
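
As an illustration of this kind of generator, here is a minimal sketch. It is not the repository's GenerateArtificial code: the names are hypothetical, and it assumes that inliers are drawn uniformly from the unit ball and that outlier directions are uniform on the sphere, neither of which is stated above.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, not the repository's GenerateArtificial class.
class GenerateArtificialSketch {

    // Euclidean norm of a point.
    static double norm(double[] x) {
        double sq = 0.0;
        for (double v : x) sq += v * v;
        return Math.sqrt(sq);
    }

    // Uniformly random unit vector, obtained by normalizing a Gaussian vector.
    static double[] randomDirection(int d, Random rng) {
        double[] x = new double[d];
        for (int i = 0; i < d; i++) x[i] = rng.nextGaussian();
        double n = norm(x);
        for (int i = 0; i < d; i++) x[i] /= n;
        return x;
    }

    // With probability p returns an outlier on the sphere of radius r;
    // otherwise returns an inlier in the unit ball.
    // Assumption: inliers are uniform in the ball, so their radius is U^(1/d).
    static double[] samplePoint(double p, double r, int d, Random rng) {
        double[] x = randomDirection(d, rng);
        double radius = (rng.nextDouble() < p)
                ? r
                : Math.pow(rng.nextDouble(), 1.0 / d);
        for (int i = 0; i < d; i++) x[i] *= radius;
        return x;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int i = 0; i < 5; i++) {
            System.out.println(Arrays.toString(samplePoint(0.1, 10.0, 3, rng)));
        }
    }
}
```

Scaling a random direction by U^(1/d), with U uniform in [0, 1), gives a uniform distribution over the volume of the ball; scaling by U alone would concentrate points near the center.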

Running the experiments

The script exec.sh runs a representative subset of the experiments presented in the paper.

The program Main runs the experiments comparing our k-center algorithm with the sequential ones. It takes the following arguments:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • k, the number of centers
  • z, the number of outliers
  • N, the window size
  • beta, eps, lambda, parameters of our method
  • minDist, maxDist, parameters of our method
  • samp, the number of candidate centers for sampled-charikar
  • doChar, if set to 1 executes charikar, else it is skipped

It outputs, in the folder out/k-cen/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the four methods, the update times, the query times, the memory usage and the clustering radius.
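
This README does not define the clustering radius. Under the usual k-center with z outliers objective, it is the largest distance from a point to its closest center once the z farthest points have been discarded. The sketch below computes that quantity by brute force for a given set of centers; the class and method names are hypothetical and the code is not taken from the repository.

```java
import java.util.Arrays;

// Hypothetical sketch of the k-center-with-outliers objective.
class RadiusWithOutliersSketch {

    // Euclidean distance between two points of equal dimension.
    static double dist(double[] a, double[] b) {
        double sq = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sq += diff * diff;
        }
        return Math.sqrt(sq);
    }

    // Radius of the clustering induced by the given centers when the
    // z points farthest from their closest center are treated as outliers.
    static double radius(double[][] points, double[][] centers, int z) {
        double[] nearest = new double[points.length];
        for (int i = 0; i < points.length; i++) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] c : centers) best = Math.min(best, dist(points[i], c));
            nearest[i] = best;
        }
        Arrays.sort(nearest);
        // Skip the z largest distances; if z covers every point, the radius is 0.
        int idx = points.length - 1 - z;
        return idx >= 0 ? nearest[idx] : 0.0;
    }
}
```

For the one-dimensional points 0, 1, 2, 100 with a single center at 0, the radius is 100 with z = 0 and 2 with z = 1.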

The program MainLambda runs the experiments on the sensitivity to lambda. It takes the following arguments:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • k, the number of centers
  • z, the number of outliers
  • N, the window size
  • beta, eps, lambda, parameters of our method (lambda unused)
  • minDist, maxDist, parameters of our method
  • doSlow, if set to 1 executes the slowest test, else it is skipped

It outputs, in the folder out/lam/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the four methods, the update times, the query times, the memory usage due to histograms, the total memory usage and the clustering radius.

The program MainEffDiam runs the experiments on the effective diameter algorithms. It takes the following arguments:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • alpha, the fraction of distances to discard
  • eta, a lower bound on the ratio between the effective diameter and the diameter
  • N, the window size
  • beta, eps, lambda, parameters of our method
  • minDist, maxDist, parameters of our method
  • doSeq, if set to 1 executes the sequential method, else it is skipped

It outputs, in the folder out/diam/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the two methods, the update times, the query times, the memory usage and the effective diameter estimate.
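
For intuition, the quantity being estimated can be computed exactly by brute force on a small window. The sketch below is an assumption-laden illustration, not the repository's sequential method: it takes the effective diameter to be the largest pairwise distance that remains after discarding the alpha fraction of largest distances, counted over unordered pairs of distinct points, and the paper's exact definition may differ. All names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical brute-force sketch; quadratic in the window size.
class EffectiveDiameterSketch {

    // Euclidean distance between two points of equal dimension.
    static double dist(double[] a, double[] b) {
        double sq = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sq += diff * diff;
        }
        return Math.sqrt(sq);
    }

    // Largest pairwise distance left after discarding the floor(alpha * m)
    // largest of the m = n(n-1)/2 distances between distinct points.
    static double effectiveDiameter(double[][] pts, double alpha) {
        int n = pts.length;
        int m = n * (n - 1) / 2;
        if (m == 0) return 0.0;
        double[] dists = new double[m];
        int t = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                dists[t++] = dist(pts[i], pts[j]);
            }
        }
        Arrays.sort(dists);
        int discard = (int) Math.floor(alpha * m);
        int idx = Math.max(0, m - 1 - discard);
        return dists[idx];
    }
}
```

With alpha = 0 nothing is discarded and the result is the ordinary diameter.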

Owner: PaoloPellizzoni