Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning

Overview

Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning

This repository is official Tensorflow implementation of paper:

Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning [paper link]

and Tensorflow 2 example code for
   "Custom layers", "Custom training loop", "XLA (JIT)-compiling", "Distributed learing", and "Gradients accumulator".

Paper abstract

Conventional NAS-based pruning algorithms aim to find the sub-network with the best validation performance. However, validation performance does not successfully represent test performance, i.e., potential performance. Also, although fine-tuning the pruned network to restore the performance drop is an inevitable process, few studies have handled this issue. This paper proposes a novel sub-network search and fine-tuning method, i.e., Ensemble Knowledge Guidance (EKG). First, we experimentally prove that the fluctuation of the loss landscape is an effective metric to evaluate the potential performance. In order to search a sub-network with the smoothest loss landscape at a low cost, we propose a pseudo-supernet built by an ensemble sub-network knowledge distillation. Next, we propose a novel fine-tuning that re-uses the information of the search phase. We store the interim sub-networks, that is, the by-products of the search phase, and transfer their knowledge into the pruned network. Note that EKG is easy to be plugged-in and computationally efficient. For example, in the case of ResNet-50, about 45% of FLOPS is removed without any performance drop in only 315 GPU hours.


Conceptual visualization of the goal of the proposed method.

Contribution points and key features

  • As a new tool to measure the potential performance of sub-network in NAS-based pruning, the smoothness of the loss landscape is presented. Also, the experimental evidence that the loss landscape fluctuation has a higher correlation with the test performance than the validation performance is provided.
  • The pseudo-supernet based on an ensemble sub-network knowledge distillation is proposed to find a sub-network of smoother loss landscape without increasing complexity. It helps NAS-based pruning to prune all pre-trained networks, and also allows to find optimal sub-network(s) more accurately.
  • To our knowledge, this paper provides the world-first approach to store the information of the search phase in a memory bank and to reuse it in the fine-tuning phase of the pruned network. The proposed memory bank contributes to greatly improving the performance of the pruned network.

Requirement

  • Tensorflow >= 2.7 (I have tested on 2.7-2.8)
  • Pickle
  • tqdm

How to run

  1. Move to the codebase.
  2. Train and evaluate our model by the below command.
  # ResNet-56 on CIFAR10
  python train_cifar.py --gpu_id 0 --arch ResNet-56 --dataset CIFAR10 --search_target_rate 0.45 --train_path ../test
  python test.py --gpu_id 0 --arch ResNet-56 --dataset CIFAR10 --trained_param ../test/trained_param.pkl

Experimental results


(Left) Potential performance vs. validation loss (right) Potential performance vs. condition number. 50 sub-networks of ResNet-56 trained on CIFAR10 were used for this experiment. accurately.


Visualization of loss landscapes of sub-networks searched by various filter importance scoring algorithms.

Comparison with various pruning techniques for ResNet family trained on ImageNet.


Performance analysis in case of ResNet-50 trained on ImageNet-2012. The left plot is the FLOPs reduction rate-Top-1 accuracy, and the right plot is the GPU hours-Top-1 accuracy.

Reference

@article{lee2022ensemble,
  title        = {Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning},
  author       = {Seunghyun Lee, Byung Cheol Song},
  year         = 2022,
  journal      = {arXiv preprint arXiv:2203.02651}
}

Owner
Seunghyun Lee
Knowledge distillation; Neural network light-weighting; Tensorflow
Seunghyun Lee
An index of recommendation algorithms that are based on Graph Neural Networks.

An index of recommendation algorithms that are based on Graph Neural Networks.

FIB LAB, Tsinghua University 564 Jan 07, 2023
GemNet model in PyTorch, as proposed in "GemNet: Universal Directional Graph Neural Networks for Molecules" (NeurIPS 2021)

GemNet: Universal Directional Graph Neural Networks for Molecules Reference implementation in PyTorch of the geometric message passing neural network

Data Analytics and Machine Learning Group 124 Dec 30, 2022
DockStream: A Docking Wrapper to Enhance De Novo Molecular Design

DockStream Description DockStream is a docking wrapper providing access to a collection of ligand embedders and docking backends. Docking execution an

AstraZeneca - Molecular AI 72 Jan 02, 2023
Extension to fastai for volumetric medical data

FAIMED 3D use fastai to quickly train fully three-dimensional models on radiological data Classification from faimed3d.all import * Load data in vari

Keno 26 Aug 22, 2022
code for Multi-scale Matching Networks for Semantic Correspondence, ICCV

MMNet This repo is the official implementation of ICCV 2021 paper "Multi-scale Matching Networks for Semantic Correspondence.". Pre-requisite conda cr

joey zhao 25 Dec 12, 2022
(JMLR'19) A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)

Python Outlier Detection (PyOD) Deployment & Documentation & Stats Build Status & Coverage & Maintainability & License PyOD is a comprehensive and sca

Yue Zhao 6.6k Jan 03, 2023
Semantic graph parser based on Categorial grammars

Lambekseq "Everyone who failed Greek or Latin hates it." This package is for proving theorems in Categorial grammars (CG) and constructing semantic gr

10 Aug 19, 2022
Object-Centric Learning with Slot Attention

Slot Attention This is a re-implementation of "Object-Centric Learning with Slot Attention" in PyTorch (https://arxiv.org/abs/2006.15055). Requirement

Untitled AI 72 Jan 02, 2023
Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs

Implementation for the paper: Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs, Nurendra Choudhary, Nikhil Rao, Sumeet Ka

Nurendra Choudhary 8 Nov 15, 2022
HHP-Net: A light Heteroscedastic neural network for Head Pose estimation with uncertainty

HHP-Net: A light Heteroscedastic neural network for Head Pose estimation with uncertainty Giorgio Cantarini, Francesca Odone, Nicoletta Noceti, Federi

18 Aug 02, 2022
Inhomogeneous Social Recommendation with Hypergraph Convolutional Networks

Inhomogeneous Social Recommendation with Hypergraph Convolutional Networks This is our Pytorch implementation for the paper: Zirui Zhu, Chen Gao, Xu C

Zirui Zhu 3 Dec 30, 2022
Tutorials, assignments, and competitions for MIT Deep Learning related courses.

MIT Deep Learning This repository is a collection of tutorials for MIT Deep Learning courses. More added as courses progress. Tutorial: Deep Learning

Lex Fridman 9.5k Jan 07, 2023
CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces

CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces This is a repository for the following pape

17 Oct 13, 2022
Official implementation of ACTION-Net: Multipath Excitation for Action Recognition (CVPR'21).

ACTION-Net Official implementation of ACTION-Net: Multipath Excitation for Action Recognition (CVPR'21). Getting Started EgoGesture data folder struct

V-Sense 171 Dec 26, 2022
Virtual hand gesture mouse using a webcam

NonMouse 日本語のREADMEはこちら This is an application that allows you to use your hand itself as a mouse. The program uses a web camera to recognize your han

Yuki Takeyama 55 Jan 01, 2023
Laplacian Score-regularized Concrete Autoencoders

Laplacian Score-regularized Concrete Autoencoders Requirements: torch = 1.9 scikit-learn = 0.24 omegaconf = 2.0.6 scipy = 1.6.0 matplotlib How to

JS 6 Dec 07, 2022
XViT - Space-time Mixing Attention for Video Transformer

XViT - Space-time Mixing Attention for Video Transformer This is the official implementation of the XViT paper: @inproceedings{bulat2021space, title

Adrian Bulat 33 Dec 23, 2022
Tensorflow implementation of Swin Transformer model.

Swin Transformer (Tensorflow) Tensorflow reimplementation of Swin Transformer model. Based on Official Pytorch implementation. Requirements tensorflow

167 Jan 08, 2023
Implementation for Stankevičiūtė et al. "Conformal time-series forecasting", NeurIPS 2021.

Conformal time-series forecasting Implementation for Stankevičiūtė et al. "Conformal time-series forecasting", NeurIPS 2021. If you use our code in yo

Kamilė Stankevičiūtė 36 Nov 21, 2022
Python framework for Stochastic Differential Equations modeling

SDElearn: a Python package for SDE modeling This package implements functionalities for working with Stochastic Differential Equations models (SDEs fo

4 May 10, 2022