This repository contains the DendroMap implementation for scalable and interactive exploration of image datasets in machine learning.

Overview

DendroMap

DendroMap is an interactive tool to explore large-scale image datasets used for machine learning.

A deep understanding of your data can be vital to train or debug your model effectively. However, due to the lack of structure and little-to-no metadata, it can be difficult to gain any insight into large-scale image datasets.

DendroMap adds structure to the data by hierarchically clustering together similar images. Then, the clusters are displayed in a modified treemap visualization that supports zooming.

Check out the live demo of DendroMap and explore for yourself on a few different datasets. If you're interested in

  • the DendroMap motivations
  • how we created the DendroMap visualization
  • DendroMap's effectiveness: user study on DendroMap compared to t-SNE grid for exploration

be sure to also check out our research paper:

Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps.
Donald Bertucci, Md Montaser Hamid, Yashwanthi Anand, Anita Ruangrotsakun, Delyar Tabatabai, Melissa Perez, and Minsuk Kahng.
arXiv preprint arXiv:2205.06935, 2022.

Use Your Own Data

In the public deployment, we hosted our data in the DendroMap Data repository. You can use your own data by following the instructions and example in the DendroMap Data README.md and you can use our python functions found in the clustering folder in this repo. There, you will find specific examples and instructions for how to generate the clustering files.

After generating those files, you can add another option in the src/dataOptions.js file as an object to specify how to read your data with the correct format. This is also detailed in the DendroMap Data README.md, and is simple as adding an option like this:

{
	dataset: "YOUR DATASET NAME",
	model: "YOUR MODEL NAME",
	cluster_filepath: "CLUSTER_FILEPATH",
	class_cluster_filepath: "CLASS_CLUSTER_FILEPATH**OPTIONAL**",
	image_filepath: "IMAGE_FILEPATH",
}

in the src/dataOptions.js options array. Paths start from the public folder, so put your data in there. For more information, go to the README.md in the clustering folder. Notebooks that computed the data in DendroMap Data are located there.

DendroMap Component

The DendroMap treemap visualization itself (not the whole project) only relies on having d3.js and the accompanying Javascript files in the src/components/dendroMap directory. You can reuse that Svelte component by importing from src/components/dendroMap/DendroMap.svelte.

The Component is used in src/App.svelte for an example on what props it takes. Here is the rundown of a simple example: at the bare minimum you can create the DendroMap component with these props (propName:type).

<DendroMap
	dendrogramData:dendrogramNode // (root node as nested JSON from dendrogram-data repo)
	imageFilepath:string // relative path from public dir
	imageWidth:number
	imageHeight:number
	width:number
	height:number
	numClustersShowing:number // > 1
/>

A more comprehensive list of props is below, but please look in the src/components/dendroMap/DendroMap.svelte file to see more details: there are many defaults arguments.

<DendroMap
	dendrogramData: dendrogramNode // (root node as nested JSON from dendrogram-data repo)
	imageFilepath: string // relative path from public dir
	imageWidth: number
	imageHeight: number
	width: number
	height: number
	numClustersShowing: number // > 1

	// the very long list of optional props that you can use to customize the DendroMap
	// ? is not in the actual name, just indicates optional
	highlightedOpacity?: number // between [0.0, 1.0]
	hiddenOpacity?: number // between [0.0, 1.0]
	transitionSpeed?: number // milliseconds for the animation of zooming
	clusterColorInterpolateCallback?: (normalized: number) => string // by default uses d3.interpolateGreys
	labelColorCallback?: (d: d3.HierarchyNode) => string
	labelSizeCallback?: (d: d3.HierarchyNode) => string
	misclassificationColor?: string
	outlineStrokeWidth?: string
	outerPadding?: number // the outer perimeter space of a rects
	innerPadding?: number // the touching inside space between rects
	topPadding?: number // additional top padding on the top of rects
	labelYSpace?: number // shifts the image grid down to make room for label on top

	currentParentCluster?: d3.HierarchyNode // this argument is used to bind: for svelte, not really a prop
	// breadth is the default and renders nodes left to right breadth first traversal
	// min_merging_distance is the common way to get dendrogram clusters from a dendrogram
	// max_node_count traverses and splits the next largest sized node, resulting in an even rendering
	renderingMethod?: "breadth" | "min_merging_distance" | "max_node_count" | "custom_sort"
	// this is only in effect if the renderingMethod is "custom_sort". Nodes last are popped and rendered first in the sort
	customSort?: (a: dendrogramNode, b: dendrogramNode) => number // see example in code
	imagesToFocus?: number[] // instance index of the ones to highlight
	outlineMisclassified?: boolean
	focusMisclassified?: boolean
	clusterLabelCallback?: (d: d3.HierarchyNode) => string
	imageTitleCallback?: (d: d3.HierarchyNode) => string

	// will fire based on user interaction
	// detail contains <T> {data: T, element: HTMLElement, event}
	on:imageClick?: ({detail}) => void
	on:imageMouseEnter?: ({detail}) => void
	on:imageMouseLeave?: ({detail}) => void
	on:clusterClick?: ({detail}) => void
	on:clusterMouseEnter?: ({detail}) => void
	on:clusterMouseLeave?: ({detail}) => void
/>

Run Locally!

This project uses Svelte. You can run the code on your local machine by using one of the following: development or build.

Development

cd dendromap      # inside the dendromap directory
npm install       # install packages if you haven't
npm run dev       # live-reloading server on port 8080

then navigate to port 8080 for a live-reloading on file change development server.

Build

cd dendromap		# inside the dendromap directory
npm install       	# install packages if you haven't
npm run build       	# build project
npm run start		# run on port 8080

then navigate to port 8080 for the static build server.

Links

Owner
DIV Lab
Data Interaction and Visualization Lab at Oregon State University
DIV Lab
Converts given image (png, jpg, etc) to amogus gif.

Image to Amogus Converter Converts given image (.png, .jpg, etc) to an amogus gif! Usage Place image in the /target/ folder (or anywhere realistically

Hank Magan 1 Nov 24, 2021
Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Single Optical Path

Keyhole Imaging Code & Dataset Code associated with the paper "Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Singl

Stanford Computational Imaging Lab 20 Feb 03, 2022
scAR (single-cell Ambient Remover) is a package for data denoising in single-cell omics.

scAR scAR (single cell Ambient Remover) is a package for denoising multiple single cell omics data. It can be used for multiple tasks, such as, sgRNA

19 Nov 28, 2022
Deep Networks with Recurrent Layer Aggregation

RLA-Net: Recurrent Layer Aggregation Recurrence along Depth: Deep Networks with Recurrent Layer Aggregation This is an implementation of RLA-Net (acce

Joy Fang 21 Aug 16, 2022
MazeRL is an application oriented Deep Reinforcement Learning (RL) framework

MazeRL is an application oriented Deep Reinforcement Learning (RL) framework, addressing real-world decision problems. Our vision is to cover the complete development life cycle of RL applications ra

EnliteAI GmbH 222 Dec 24, 2022
Python Library for learning (Structure and Parameter) and inference (Statistical and Causal) in Bayesian Networks.

pgmpy pgmpy is a python library for working with Probabilistic Graphical Models. Documentation and list of algorithms supported is at our official sit

pgmpy 2.2k Jan 03, 2023
Temporal Segment Networks (TSN) in PyTorch

TSN-Pytorch We have released MMAction, a full-fledged action understanding toolbox based on PyTorch. It includes implementation for TSN as well as oth

1k Jan 03, 2023
Official Pytorch implementation of Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

Scene Representation Networks This is the official implementation of the NeurIPS submission "Scene Representation Networks: Continuous 3D-Structure-Aw

Vincent Sitzmann 365 Jan 06, 2023
Official code for the CVPR 2021 paper "How Well Do Self-Supervised Models Transfer?"

How Well Do Self-Supervised Models Transfer? This repository hosts the code for the experiments in the CVPR 2021 paper How Well Do Self-Supervised Mod

Linus Ericsson 157 Dec 16, 2022
LSTM-VAE Implementation and Relevant Evaluations

LSTM-VAE Implementation and Relevant Evaluations Before using any file in this repository, please create two directories under the root directory name

Lan Zhang 5 Oct 08, 2022
Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Segmenter: Transformer for Semantic Segmentation Segmenter: Transformer for Semantic Segmentation by Robin Strudel*, Ricardo Garcia*, Ivan Laptev and

594 Jan 06, 2023
This is a simple backtesting framework to help you test your crypto currency trading. It includes a way to download and store historical crypto data and to execute a trading strategy.

You can use this simple crypto backtesting script to ensure your trading strategy is successful Minimal setup required and works well with static TP a

Andrei 154 Sep 12, 2022
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Cross-Speaker-Emotion-Transfer - PyTorch Implementation PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Conditio

Keon Lee 114 Jan 08, 2023
YOLOv5 + ROS2 object detection package

YOLOv5-ROS YOLOv5 + ROS2 object detection package This program changes the input of detect.py (ultralytics/yolov5) to sensor_msgs/Image of ROS2. Requi

Ar-Ray 23 Dec 19, 2022
Official repository for the paper F, B, Alpha Matting

FBA Matting Official repository for the paper F, B, Alpha Matting. This paper and project is under heavy revision for peer reviewed publication, and s

Marco Forte 404 Jan 05, 2023
Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation (ICCV 2021)

Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation Home | PyTorch BigGAN Discovery | TensorFlow ProGAN Regulariza

Yuxiang Wei 54 Dec 30, 2022
Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"

ON-LSTM This repository contains the code used for word-level language model and unsupervised parsing experiments in Ordered Neurons: Integrating Tree

Yikang Shen 572 Nov 21, 2022
ncnn is a high-performance neural network inference framework optimized for the mobile platform

ncnn ncnn is a high-performance neural network inference computing framework optimized for mobile platforms. ncnn is deeply considerate about deployme

Tencent 16.2k Jan 05, 2023
Trainable PyTorch reproduction of AlphaFold 2

OpenFold A faithful PyTorch reproduction of DeepMind's AlphaFold 2. Features OpenFold carefully reproduces (almost) all of the features of the origina

AQ Laboratory 1.7k Dec 29, 2022
This is an official pytorch implementation of Fast Fourier Convolution.

Fast Fourier Convolution (FFC) for Image Classification This is the official code of Fast Fourier Convolution for image classification on ImageNet. Ma

pkumi 199 Jan 03, 2023