quantize aware training package for NCNN on pytorch

Related tags

Deep Learningncnnqat
Overview

ncnnqat

ncnnqat is a quantize aware training package for NCNN on pytorch.

Table of Contents

Installation

  • Supported Platforms: Linux

  • Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.1.

  • Dependencies:

    • python >= 3.5, < 4
    • pytorch >= 1.6
    • numpy >= 1.18.1
    • onnx >= 1.7.0
    • onnx-simplifier >= 0.3.6
  • Install ncnnqat via pypi:

    $ pip install ncnnqat (to do....)

    It is recommended to install from the source code

  • or Install ncnnqat via repo:

    $ git clone https://github.com/ChenShisen/ncnnqat
    $ cd ncnnqat
    $ make install

Usage

  • register_quantization_hook and merge_freeze_bn

    (suggest finetuning from a well-trained model, do it after a few epochs of training otherwise.)

    from ncnnqat import unquant_weight, merge_freeze_bn, register_quantization_hook
    ...
    ...
        for epoch in range(epoch_train):
            model.train()
        if epoch==well_epoch:
            register_quantization_hook(model)
        if epoch>=well_epoch:
            model = merge_freeze_bn(model)  #it will change bn to eval() mode during training
    ...
  • Unquantize weight before update it

    ...
    ... 
        if epoch>=well_epoch:
            model.apply(unquant_weight)  # using original weight while updating
        optimizer.step()
    ...
  • Save weight and save ncnn quantize table after train

    ...
    ...
        onnx_path = "./xxx/model.onnx"
        table_path="./xxx/model.table"
        dummy_input = torch.randn(1, 3, img_size, img_size, device='cuda')
        input_names = [ "input" ]
        output_names = [ "fc" ]
        torch.onnx.export(model, dummy_input, onnx_path, verbose=False, input_names=input_names, output_names=output_names)
        save_table(model,onnx_path=onnx_path,table=table_path)
    
    ...

    if use "model = nn.DataParallel(model)",pytorch unsupport torch.onnx.export,you should save state_dict first and prepare a new model with one gpu,then you will export onnx model.

    ...
    ...
        model_s = new_net() #
        model_s.cuda()
        register_quantization_hook(model_s)
        #model_s = merge_freeze_bn(model_s)
        onnx_path = "./xxx/model.onnx"
        table_path="./xxx/model.table"
        dummy_input = torch.randn(1, 3, img_size, img_size, device='cuda')
        input_names = [ "input" ]
        output_names = [ "fc" ]
        model_s.load_state_dict({k.replace('module.',''):v for k,v in model.state_dict().items()}) #model_s = model     model = nn.DataParallel(model)
              
        torch.onnx.export(model_s, dummy_input, onnx_path, verbose=False, input_names=input_names, output_names=output_names)
        save_table(model_s,onnx_path=onnx_path,table=table_path)
        
    
    ...

Code Examples

Cifar10 quantization aware training example.

python test/test_cifar10.py

SSD300 quantization aware training example.

   ln -s /your_coco_path/coco ./tests/ssd300/data
   python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --nnodes=1 \
    --node_rank=0 \
    ./tests/ssd300/main.py \
    -d ./tests/ssd300/data/coco
    python ./tests/ssd300/main.py --onnx_save  #load model dict, export onnx and ncnn table

Results

  • Cifar10

    result:

    net fp32(onnx) ncnnqat ncnn aciq ncnn kl
    mobilenet_v2 0.91 0.9066 0.9033 0.9066
    resnet18 0.94 0.93333 0.9367 0.937
  • SSD300(resnet18|coco)

    fp32:
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.193
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.344
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.191
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.042
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.195
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.328
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.199
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.293
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.309
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.084
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.326
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.501
    Current AP: 0.19269
    
    ncnnqat:
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.192
     Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.342
     Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.194
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.041
     Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.194
     Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.327
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.197
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.291
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.307
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.082
     Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.325
     Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.497
    Current AP: 0.19202
    

Todo

....

Alignment Attention Fusion framework for Few-Shot Object Detection

AAF framework Framework generalities This repository contains the code of the AAF framework proposed in this paper. The main idea behind this work is

Pierre Le Jeune 20 Dec 16, 2022
A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions

A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions

Ayushman Dash 93 Aug 04, 2022
Steerable discovery of neural audio effects

Steerable discovery of neural audio effects Christian J. Steinmetz and Joshua D. Reiss Abstract Applications of deep learning for audio effects often

Christian J. Steinmetz 182 Dec 29, 2022
Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"

AASIST This repository provides the overall framework for training and evaluating audio anti-spoofing systems proposed in 'AASIST: Audio Anti-Spoofing

Clova AI Research 56 Jan 02, 2023
Uses OpenCV and Python Code to detect a face on the screen

Simple-Face-Detection This code uses OpenCV and Python Code to detect a face on the screen. This serves as an example program. Important prerequisites

Denis Woolley (CreepyD) 1 Feb 12, 2022
DeepFaceLab fork which provides IPython Notebook to use DFL with Google Colab

DFL-Colab — DeepFaceLab fork for Google Colab This project provides you IPython Notebook to use DeepFaceLab with Google Colaboratory. You can create y

779 Jan 05, 2023
Unity Propagation in Bayesian Networks Handling Inconsistency via Unity Smoothing

This repository contains the scripts needed to generate the results from the paper Unity Propagation in Bayesian Networks Handling Inconsistency via U

0 Jan 19, 2022
Deep Networks with Recurrent Layer Aggregation

RLA-Net: Recurrent Layer Aggregation Recurrence along Depth: Deep Networks with Recurrent Layer Aggregation This is an implementation of RLA-Net (acce

Joy Fang 21 Aug 16, 2022
This is the official Pytorch-version code of FlatGCN (Flattened Graph Convolutional Networks for Recommendation).

FlatGCN This is the official Pytorch-version code of FlatGCN (Flattened Graph Convolutional Networks for Recommendation, submitted to ICASSP2022). Req

Dreamer 2 Aug 09, 2022
RipsNet: a general architecture for fast and robust estimation of the persistent homology of point clouds

RipsNet: a general architecture for fast and robust estimation of the persistent homology of point clouds This repository contains the code asscoiated

Felix Hensel 14 Dec 12, 2022
WeakVRD-Captioning - Implementation of paper Improving Image Captioning with Better Use of Caption

WeakVRD-Captioning - Implementation of paper Improving Image Captioning with Better Use of Caption

30 Oct 28, 2022
The source code of the paper "SHGNN: Structure-Aware Heterogeneous Graph Neural Network"

SHGNN: Structure-Aware Heterogeneous Graph Neural Network The source code and dataset of the paper: SHGNN: Structure-Aware Heterogeneous Graph Neural

Wentao Xu 7 Nov 13, 2022
Posterior predictive distributions quantify uncertainties ignored by point estimates.

Posterior predictive distributions quantify uncertainties ignored by point estimates.

DeepMind 177 Dec 06, 2022
TensorFlow implementation for Bayesian Modeling and Uncertainty Quantification for Learning to Optimize: What, Why, and How

Bayesian Modeling and Uncertainty Quantification for Learning to Optimize: What, Why, and How TensorFlow implementation for Bayesian Modeling and Unce

Shen Lab at Texas A&M University 8 Sep 02, 2022
Animate molecular orbital transitions using Psi4 and Blender

Molecular Orbital Transitions (MOT) Animate molecular orbital transitions using Psi4 and Blender Author: Maximilian Paradiz Dominguez, University of A

3 Feb 01, 2022
3D cascade RCNN for object detection on point cloud

3D Cascade RCNN This is the implementation of 3D Cascade RCNN: High Quality Object Detection in Point Clouds. We designed a 3D object detection model

Qi Cai 22 Dec 02, 2022
A Dying Light 2 (DL2) PAKFile Utility for Modders and Mod Makers.

Dying Light 2 PAKFile Utility A Dying Light 2 (DL2) PAKFile Utility for Modders and Mod Makers. This tool aims to make PAKFile (.pak files) modding a

RHQ Online 12 Aug 26, 2022
This repo. is an implementation of ACFFNet, which is accepted for in Image and Vision Computing.

Attention-Guided-Contextual-Feature-Fusion-Network-for-Salient-Object-Detection This repo. is an implementation of ACFFNet, which is accepted for in I

5 Nov 21, 2022
A graph adversarial learning toolbox based on PyTorch and DGL.

GraphWar: Arms Race in Graph Adversarial Learning NOTE: GraphWar is still in the early stages and the API will likely continue to change. 🚀 Installat

Jintang Li 54 Jan 05, 2023
YOLTv5 rapidly detects objects in arbitrarily large aerial or satellite images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks

YOLTv5 rapidly detects objects in arbitrarily large aerial or satellite images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks.

Adam Van Etten 145 Jan 01, 2023