MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

We propose a benchmark to evaluate different quantization algorithms under various settings. MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate a wide range of state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts as a bridge between the algorithm and the hardware. We conduct a comprehensive analysis and report a number of intuitive and counter-intuitive findings.

Highlighted Features


These instructions will help you get MQBench up and running.

  1. Clone MQBench.

  2. (Optionally) Create a Python virtual environment.

  3. Install the MQBench-required packages

    $ pip install -r requirements.txt

    Note: MQBench uses PyTorch 1.8; our quantized models are based on the new torch.fx tracing technique.

  4. MQBench uses PyTorch distributed data-parallel training with the nccl backend (see details here). Please make sure your machine can initialize that distributed learning environment.

How to Reproduce MQBench

We provide the running scripts and configuration file config.yaml of all experiments in MQBench.

To reproduce LSQ on ResNet-18,

  1. enter the directory

    $ cd PATH-TO-PROJECT/qbench_zoo
    $ cd lsq_experiments/resnet18_4bit_academic
  2. run script

    $ sh

    Note that the script contains some commands that may not be found on your machine; the core running command is

    python -u -m prototype.solver.cls_quant_solver --config config.yaml

How to self-implement a quantization algorithm

All our quantization algorithms are implemented in prototype/quantization/

To implement a new algorithm, you need to add your quantizer to this directory.

All quantizers inherit from the QuantizeBase class. Each QuantizeBase has an observer class, which is used to estimate/update the quantization range. The observer design is inspired by the PyTorch 1.8 repo. Initializing a QuantizeBase class also initializes an Observer class.

The parameters of QuantizeBase and Observer include:

  1. quant_min, quant_max, which specify the $N_{min}, N_{max}$ for rounding boundaries.
  2. qscheme, which can be torch.per_tensor_symmetric, torch.per_channel_symmetric, torch.per_tensor_affine, or torch.per_channel_affine. This is often determined by the hardware setup.
  3. ch_axis, which is the dimension used for channel-wise quantization. -1 means per-tensor quantization. Typically, for nn.Conv2d and nn.Linear modules, ch_axis should be 0.
  4. ada_sign, which adaptively chooses the signedness. ada_sign should be enabled for the academic setting only.
  5. pot_scale, which is used to determine the powers-of-two scale parameters.
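
As a rough illustration of how these parameters interact, the sketch below fake-quantizes a list of floats in plain Python. The helper names `quant_range` and `fake_quantize` are hypothetical and not part of MQBench; the formula is the standard uniform affine quantizer, assumed here rather than copied from the repository.

```python
def quant_range(bitwidth, signed):
    """Return (quant_min, quant_max) for a bit-width (hypothetical helper)."""
    if signed:
        return -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    return 0, 2 ** bitwidth - 1


def fake_quantize(values, scale, zero_point, quant_min, quant_max):
    """Quantize then dequantize each value (uniform affine quantization)."""
    out = []
    for x in values:
        q = round(x / scale) + zero_point        # map to the integer grid
        q = max(quant_min, min(quant_max, q))    # clamp to [quant_min, quant_max]
        out.append((q - zero_point) * scale)     # map back to floats
    return out


qmin, qmax = quant_range(4, signed=True)         # 4-bit signed range
print(qmin, qmax)
print(fake_quantize([-3.0, 0.26, 0.9, 5.0], 0.25, 0, qmin, qmax))
```

With a symmetric scheme the zero point is 0, as in the call above; the unsigned range `(0, 2 ** bitwidth - 1)` corresponds to the case where ada_sign detects non-negative inputs.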

Note: each specified quantizer may have its own unique parameters, see example of LSQ below.

Example Implementation of LSQ:

  1. For initialization, we add new parameters for storing the scale, zero_point:

    self.use_grad_scaling = use_grad_scaling
    self.scale = Parameter(torch.tensor([scale]))
    self.zero_point = Parameter(torch.tensor([zero_point]))
  2. The major implementation is the forward function, which should contain several cases:

    1. In case of ada_sign=True, the quantization range should be adjusted.

      if self.ada_sign and X.min() >= 0:
          self.quant_max = self.activation_post_process.quant_max = 2 ** self.bitwidth - 1
          self.quant_min = self.activation_post_process.quant_min = 0
          self.activation_post_process.adjust_sign = True
    2. In case of symmetric quantization, the zero point should be set to 0.
    3. In case of powers-of-two scale, the scale should be quantized by:

      def pot_quantization(tensor: torch.Tensor):
          log2t = torch.log2(tensor)
          log2t = (torch.round(log2t)-log2t).detach() + log2t
          return 2 ** log2t
      scale = pot_quantization(self.scale)
    4. Implement both per-channel and per-tensor quantization.
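
To make the powers-of-two case concrete, here is a minimal plain-Python sketch of the forward value that `pot_quantization` computes. The function name `pot_round` is hypothetical. In the PyTorch version above, the `(round(x) - x).detach() + x` pattern is a straight-through estimator: the forward value is the rounded one, while the gradient passes through as if no rounding happened. This sketch reproduces only the forward value.

```python
import math


def pot_round(scale):
    """Round a positive scale to the nearest power of two in the log2 domain."""
    return 2.0 ** round(math.log2(scale))


for s in (0.03, 0.1, 0.3, 1.5):
    print(s, "->", pot_round(s))
```

Rounding happens on the exponent, so 0.3 maps to 0.25 (exponent -2) rather than 0.5.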

After adding your quantizer...

The next step is to register the quantizer in prototype/quantization/

Import your quantizer, add it to the get_qconfig function, and parse any necessary arguments.

The final step is to override a config.yaml file:

    w_method: lsq
    a_method: lsq
    bit: 4

backend: academic
bnfold: 4

By replacing w_method and a_method, you can run your implementation.

Note: the rest of the config file should not be modified in order to keep a unified training setting.

How to self-implement a hardware configuration

Adding a new hardware setting is much simpler than adding an algorithm. To do this, add another condition to the if-else selection. For example, to add a new hardware target, TFLite Micro:

        elif backend == "tflitemicro":
            backend_params = dict(ada_sign=False, symmetry=True, per_channel=False, pot_scale=True)

    model_qconfig = get_qconfig(**self.qparams, **backend_params)
    model = quantize_fx.prepare_qat_fx(model, {"": model_qconfig}, foldbn_config)
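
The if-else selection can be pictured as a small function that maps a backend name to quantizer settings. The sketch below is illustrative only: `backend_params_for` is a hypothetical name, and the "academic" values are assumptions (the text states only that ada_sign is meant for the academic setting). The "tflitemicro" values are the ones from the snippet above.

```python
def backend_params_for(backend):
    """Map a backend name to quantizer settings (illustrative if-else selection)."""
    if backend == "academic":
        # Assumed values; only ada_sign=True is suggested by the text.
        return dict(ada_sign=True, symmetry=True, per_channel=False, pot_scale=False)
    elif backend == "tflitemicro":
        # Values taken from the TFLite Micro example.
        return dict(ada_sign=False, symmetry=True, per_channel=False, pot_scale=True)
    raise ValueError(f"unknown backend: {backend}")


print(backend_params_for("tflitemicro"))
```

Raising on an unknown name makes a misspelled backend fail early instead of silently falling back to default settings.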

Submitting Your Results to MQBench

You can submit your implementation to MQBench by submitting a merge request to this repo. The implementation of the new algorithm, the running scripts, and the log files are needed for evaluation.


This project is under Apache 2.0 License.

  • Want to save the quantized pth model before deploy, but torch.save fails



    File "/opt/conda/lib/python3.8/site-packages/torch/", line 379, in save
      _save(obj, opened_zipfile, pickle_module, pickle_protocol)
    File "/opt/conda/lib/python3.8/site-packages/torch/", line 484, in _save
      pickler.dump(obj)
    AttributeError: Can't pickle local object 'ObserverBase.__init__.<locals>.PerChannelLoadHook'

    opened by wangshankun 13
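
    The error message in this report is the general Python behaviour for objects whose class is defined inside a function. The sketch below reproduces that behaviour with the standard library only; it does not reproduce the MQBench or PyTorch code path, and the names `make_local_hook` and `can_pickle` are hypothetical.

```python
import pickle


def make_local_hook():
    class Hook:  # defined inside a function, so pickle cannot look it up by name
        pass
    return Hook()


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


print(can_pickle(make_local_hook()))   # instance of a local class
print(can_pickle({"scale": 0.1}))      # plain data
```

    Saving only `model.state_dict()` is a common way to avoid pickling module objects, since a state dict holds tensors rather than the module itself; whether it resolves this particular report is not confirmed here.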
  • Quantizing yolox with the latest mqbench fails when backend=tengine_u8 is selected: AttributeError: 'dict' object has no attribute 'detach'

    Running QAT training for yolox with the UP framework on the latest mqbench fails when backend=tengine_u8 is selected: AttributeError: 'dict' object has no attribute 'detach'


    num_classes: &num_classes 13
      aligned: true
        # async_norm: True
      special_bn_init: true
      task_names: quant_det
        type: quant
      quant_type: qat
      deploy_backend: Tengine_u8
      cali_batch_size: 900
          w_observer: MinMaxObserver
          a_observer: EMAMinMaxObserver
          w_fakequantize: FixedFakeQuantize
          a_fakequantize: FixedFakeQuantize
        leaf_module: [Space2Depth, FrozenBatchNorm2d]
          additional_module_type: [ConvFreezebn2d, ConvFreezebnReLU2d]
      type: yolox_mixup_cv2
        extra_input: true
        input_size: [640, 640]
        mixup_scale: [0.8, 1.6]
        fill_color: 0
      type: mosaic
        extra_input: true
        tar_size: 640
        fill_color: 0
      type: random_perspective_yolox
        degrees: 10.0 # 0.0
        translate: 0.1
        scale: [0.1, 2.0] # 0.5
        shear: 2.0 # 0.0
        perspective: 0.0
        fill_color: 0  # 0
        border: [-320, -320]
      type: augment_hsv
        hgain: 0.015
        sgain: 0.7
        vgain: 0.4
        color_mode: BGR
      type: flip
        flip_p: 0.5
    to_tensor: &to_tensor
      type: custom_to_tensor
    train_resize: &train_resize
      type: keep_ar_resize_max
        max_size: 640
        random_size: [15, 25]
        scale_step: 32
        padding_type: left_top
        padding_val: 0
    test_resize: &test_resize
      type: keep_ar_resize_max
        max_size: 640
        padding_type: left_top
        padding_val: 0
          type: coco
            meta_file: train.json
              type: fs_opencv
                image_dir: &img_root /images/
                color_mode: BGR
            transformer: [*train_resize, *to_tensor]
          type: base
              type: dist
              kwargs: {}
            batch_size: 4
          type: coco
            meta_file: &gt_file val.json
              type: fs_opencv
                image_dir: *img_root
                color_mode: BGR
            transformer: [*test_resize, *to_tensor]
              type: COCO
                gt_file: *gt_file
                iou_types: [bbox]
          type: base
              type: dist
              kwargs: {}
            batch_size: 4
        type: base
          num_workers: 4
          alignment: 32
          worker_init: true
          pad_type: batch_pad
    trainer: # Required.
      max_epoch: &max_epoch 6             # total epochs for the training
      save_freq: 1
      test_freq: 1
      only_save_latest: false
      optimizer:                 # optimizer = SGD(params,lr=0.01,momentum=0.937,weight_decay=0.0005)
        register_type: yolov5
        type: SGD
          lr: 0.0000003125
          momentum: 0.9
          nesterov: true
          weight_decay: 0.0      # weight_decay = 0.0005 * batch_size / 64
      lr_scheduler:              # lr_scheduler = MultStepLR(optimizer, milestones=[9,14],gamma=0.1)
        warmup_epochs: 0        # set to be 0 to disable warmup. When warmup,  target_lr = init_lr * total_batch_size
        warmup_type: linear
        warmup_ratio: 0.001
        type: MultiStepLR
          milestones: [2, 4]     # epochs to decay lr
          gamma: 0.1             # decay rate
      save_dir: checkpoints/yolox_s_ret_a1_comloc_quant_tengine
      results_dir: results_dir/yolox_s_ret_a1_comloc_quant_tengine
      resume_model: /United-Perception/train_config/pretrain/300_65_ckpt_best.pth
      auto_resume: True
      enable: false
      ema_type: exp
        decay: 0.9998
    - name: backbone
      type: yolox_s
        out_layers: [2, 3, 4]
        out_strides: [8, 16, 32]
        normalize: {type: mqbench_freeze_bn}
        act_fn: {type: Silu}
    - name: neck
      prev: backbone
      type: YoloxPAFPN
        depth: 0.33
        out_strides: [8, 16, 32]
        normalize: {type: mqbench_freeze_bn}
        act_fn: {type: Silu}
    - name: roi_head
      prev: neck
      type: YoloXHead
        num_classes: *num_classes
        width: 0.5
        num_point: &dense_points 1
        normalize: {type: mqbench_freeze_bn}
        act_fn: {type: Silu}
    - name: post_process
      prev: roi_head
      type: retina_post_iou
        num_classes: *num_classes
                                      # number of classes including background. for rpn, it's 2; for RetinaNet, it's 81
            type: quality_focal_loss
              gamma: 2.0
            type: sigmoid_cross_entropy
            type: compose_loc_loss
              - type: iou_loss
                  loss_type: giou
                  loss_weight: 1.0
              - type: l1_loss
                  loss_weight: 1.0
            type: hand_craft
              anchor_ratios: [1]    # anchor strides are provided as feature strides by feature extractor
              anchor_scales: [4]   # scale of anchors relative to feature map
            type: atss
              top_n: 9
              use_iou: true
            type: base
              pre_nms_score_thresh: 0.05    # to reduce computation
              pre_nms_top_n: 1000
              post_nms_top_n: 1000
              roi_min_size: 0                 # minimum scale of a valid roi
                type: retina
                  top_n: 100
                    type: naive
                    nms_iou_thresh: 0.65


    [MQBENCH] INFO: Enable observer and Disable quantize for act_fake_quant
    [MQBENCH] INFO: Enable observer and Disable quantize for act_fake_quant
    [MQBENCH] INFO: Enable observer and Disable quantize for act_fake_quant
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/opt/conda/lib/python3.8/", line 87, in _run_code
        exec(code, run_globals)
      File "/data/lsc/United-Perception/up/", line 27, in <module>
      File "/data/lsc/United-Perception/up/", line 21, in main
      File "/data/lsc/United-Perception/up/commands/", line 144, in _main
        launch(main, args.num_gpus_per_machine, args.num_machines, args=args, start_method=args.fork_method)
      File "/data/lsc/United-Perception/up/utils/env/", line 52, in launch
      File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/", line 188, in start_processes
        while not context.join():
      File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/", line 150, in join
        raise ProcessRaisedException(msg, error_index,
    -- Process 3 terminated with the following error:
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/torch/multiprocessing/", line 59, in _wrap
        fn(i, *args)
      File "/data/lsc/United-Perception/up/utils/env/", line 117, in _distributed_worker
      File "/data/lsc/United-Perception/up/commands/", line 134, in main
        runner = RUNNER_REGISTRY.get(runner_cfg['type'])(cfg, **runner_cfg['kwargs'])
      File "/data/lsc/United-Perception/up/tasks/quant/runner/", line 17, in __init__
        super(QuantRunner, self).__init__(config, work_dir, training)
      File "/data/lsc/United-Perception/up/runner/", line 59, in __init__
      File "/data/lsc/United-Perception/up/tasks/quant/runner/", line 34, in build
      File "/data/lsc/United-Perception/up/tasks/quant/runner/", line 182, in calibrate
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/data/lsc/United-Perception/up/tasks/quant/models/", line 76, in forward
        output = submodule(input)
      File "/opt/conda/lib/python3.8/site-packages/torch/fx/", line 308, in wrapped_call
        return cls_call(self, *args, **kwargs)
      File "/opt/conda/lib/python3.8/site-packages/torch/fx/", line 308, in wrapped_call
        return cls_call(self, *args, **kwargs)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "<eval_with_key_2>", line 4, in forward
        input_1_post_act_fake_quantizer = self.input_1_post_act_fake_quantizer(input_1);  input_1 = None
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/data/lsc/United-Perception/MQBench/mqbench/fake_quantize/", line 20, in forward
    AttributeError: 'dict' object has no attribute 'detach'


    opened by RedHandLM 11
  • Hi, will export to QLinear save weights in int8?

    With the tensorrt backend, will QLinear make the ONNX model smaller? I got an error when trying to save to QLinear:

    deploy/", line 138, in optimize_model
        assert node_detect, "Graph is illegel, error occured!"
    AssertionError: Graph is illegel, error occured!
    opened by jinfagang 10
  • How to use MQBench with a model built by mmdet

    When a model is built with mmdet, it looks like: object { module list aaaa, module list bbb }. Tracing it with prepare_by_platform then raises an error like: TypeError: 'xxxobject' object does not support indexing

    opened by 791136190 10
  • How to run PTQ for Faster RCNN or SSD?

    In the QDrop paper, I noticed that the benchmark results include Faster RCNN.

    Could you provide this example?

    In addition, it would be helpful to provide a PTQ example for SSD, another important object detection network.

    opened by wangshankun 9
  • ONNX inference

    Hello. I finished converting the model to onnx-quant, but I cannot run inference with onnx-runtime. Error log: No Op registered for LearnablePerTensorAffine with domain_version of 11

    opened by www516717402 9
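  • Multiple inputs with different scales: quantization affects the results

    The model for this task has two inputs. One is the image, which passes through the backbone to produce image features. The other is a set of bbox coordinates detected by another detection model; the coordinates are normalized to float values between 0 and 1. The coordinates pass through linear layers and convolutional upsampling, and the result is concatenated with the image features. After quantizing with MQBench, the INT8 inference results are very accurate for head1, but head2 shows a clear accuracy loss. The network is defined as follows: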



    opened by zhouyang1989 8
  • DDP multi-gpu training issues with Imagenet example

    I am trying to run multi-GPU QAT training with the Imagenet example code. It fails after the first training iteration update.

    RuntimeError: grad.numel() == bucket_view.numel() INTERNAL ASSERT FAILED at "/pytorch/torch/lib/c10d/reducer.cpp":343, please report a bug to PyTorch.

    The code works fine with multi-GPU training if I comment out the wrapper code that quantizes the original model, i.e., model=prepare_by_platform(model, args.backend). Has anyone encountered the same issue?

    opened by kartikgupta-at-anu 7
  • KeyError for Adaround, Qdrop

    When I use MQBench to quantize the RLFN model with QDrop and AdaRound, some errors occur. Environment: Ubuntu 18.04, CUDA 11.1, MQBench version e2175203c8e62596e66500a720a6cb1d1fc1dacd. RLFN is a super-resolution model from:, the model id is 4.

    error:

    [MQBENCH] INFO: Disable observer and Disable quantize.
    [MQBENCH] INFO: Disable observer and Enable quantize.
    [MQBENCH] INFO: prepare layer reconstruction for fea_conv
    [MQBENCH] INFO: the node list is below!
    [MQBENCH] INFO: [input_1_post_act_fake_quantizer, fea_conv, fea_conv_post_act_fake_quantizer_2]
    Traceback (most recent call last):
      File "", line 158, in main()
      File "", line 137, in main
        model = ptq_reconstruction(model, stacked_tensor, EasyDict(ptq_reconstruction_config))
      File ".../mqbench/", line 636, in ptq_reconstruction
        fp32_module = fp32_modules[qnode2fpnode_dict[layer_node_list[-1]]]
    KeyError: fea_conv_post_act_fake_quantizer_2

    Here is my code tracking and analysis

    (1) model.code

    def forward(self, input):
        input_1 = input
        input_1_post_act_fake_quantizer = self.input_1_post_act_fake_quantizer(input_1); input_1 = None
        fea_conv = self.fea_conv(input_1_post_act_fake_quantizer); input_1_post_act_fake_quantizer = None
        fea_conv_post_act_fake_quantizer_2 = self.fea_conv_post_act_fake_quantizer(fea_conv)
        fea_conv_post_act_fake_quantizer_1 = self.fea_conv_post_act_fake_quantizer(fea_conv)
        fea_conv_post_act_fake_quantizer = self.fea_conv_post_act_fake_quantizer(fea_conv); fea_conv = None
        ...

    (2) Problem: several quant model nodes map to the same node.target (many-to-one), so quant_named_nodes is missing keys. In mqbench/, qnode2fpnode(quant_modules, fp32_modules):

    def qnode2fpnode(quant_modules, fp32_modules):
        quant_named_nodes = { node for node in quant_modules}
        """
        node:fea_conv_post_act_fake_quantizer_2
        node:fea_conv_post_act_fake_quantizer_1
        """
        fp32_named_nodes = { node for node in fp32_modules}
        qnode2fpnode_dict = {quant_named_nodes[key]: fp32_named_nodes[key] for key in quant_named_nodes}
        return qnode2fpnode_dict

    I am not familiar with the trained PTQ process, so I look forward to your suggestions and solutions.

    opened by feixiang7701 7
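  • MQBench results are not bit-exact with SNPE DSP results

    Environment: pytorch: 1.8.1; MQBench: branch main, e2175203; SNPE: snpe-

    Problem: I ran a simple test with a model that has only two convolution layers and compared the MQBench quantized result with the SNPE DSP result. The two are not bit-exact. Is this expected, or did I do something wrong?


    • MQBench quantization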
    def seed_torchv2(seed: int = 42) -> None:
        os.environ['PYTHONHASHSEED'] = str(seed)
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.avg_pool = nn.AdaptiveAvgPool2d(1)
            self.conv = nn.Conv2d(3, 128,1,1, bias=True)
            self.conv2 = nn.Conv2d(128, 20,1,1,bias=True)
            self.relu = nn.ReLU()
            self.flat = nn.Flatten(1)
        def forward(self, x): # (1,3,20,20)
            x = self.avg_pool(x)
            x = self.conv(x)
            x = self.conv2(x)
            x = self.flat(x)
            return x
    SIZE = 20
    backend = BackendType.SNPE
    np.set_printoptions(suppress=True, precision=6)
    def gen_input_data(length=100):
        data = []
        for _ in range(length):
            data.append(np.ones((1,3,SIZE,SIZE), dtype=np.float32) * 0.1 * np.random.randint(0, 10))
        return np.stack(data, axis=0)
    model = Net()          # use vision pre-defined model
    train_data = gen_input_data(100)
    dummy_input = np.zeros((1,3,SIZE,SIZE), dtype=np.float32) + 0.5
    print("pytorch fp32 result")
    # quant
    model = prepare_by_platform(model, backend)
    for i, d in enumerate(train_data):
        _ = model(torch.from_numpy(d).float())
    print("quant sim result")
    input_shape = {"image":[1,3,SIZE,SIZE]}
    convert_deploy(model, backend, input_shape)
    # save dummy input and test it on DSP
    image = dummy_input.copy()
    assert image.shape == (1,3,SIZE,SIZE)
    assert image.dtype == np.float32
    print("#" * 50)
    pytorch fp32 result
    tensor([[-0.347889, -0.289117, -0.083191, -0.222827,  0.124699,  0.235278,
              0.434433, -0.302174, -0.047763,  0.229472, -0.037784,  0.082496,
             -0.150852, -0.170281,  0.130777,  0.146441, -0.494992, -0.182881,
              0.600709, -0.063706]], grad_fn=<ViewBackward>)
    quant sim result
    tensor([[-0.344930, -0.290467, -0.081694, -0.222389,  0.131618,  0.231466,
              0.435701, -0.299544, -0.049924,  0.226927, -0.036308,  0.081694,
             -0.149772, -0.172465,  0.131618,  0.149772, -0.494702, -0.181542,
              0.599088, -0.063540]], grad_fn=<ViewBackward>
    • DLC conversion

        ./snpe-onnx-to-dlc --input_network mqbench_qmodel_deploy_model.onnx --output_path tmp.dlc --quantization_overrides mqbench_qmodel_clip_ranges.json
        ./snpe-dlc-quantize --input_dlc tmp.dlc --input_list tmp_file.txt --output_dlc tmp_quat_mq.dlc --override_params --bias_bitwidth 32

      tmp_file.txt and tmp_file_android.txt each list only one file, tmp.raw, which the Python program above saves as a 3x20x20 float file.

    • SNPE DSP run

        ./snpe-net-run --container /sdcard/tmp_quat_mq.dlc --input_list /sdcard/tmp_file_android.txt --use_dsp

    ##################################################
    74.raw (20,)
    [-0.34493 -0.285929 -0.081694 -0.222389 0.127079 0.236005 0.435701 -0.299544 -0.049924 0.226927 -0.036308 0.081694 -0.149772 -0.172465 0.131618 0.149772 -0.490163 -0.177003 0.599088 -0.068078]

    Comparing the quant sim result with the DSP result, the entries shown in bold italics are where the two differ.
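
    The two output vectors quoted in this issue can be compared entry by entry. The sketch below copies the printed values and lists which positions differ and by how many quantization steps. The step size is not given in the issue; it is inferred here as the smallest observed gap, which is an assumption.

```python
quant_sim = [-0.344930, -0.290467, -0.081694, -0.222389, 0.131618, 0.231466,
             0.435701, -0.299544, -0.049924, 0.226927, -0.036308, 0.081694,
             -0.149772, -0.172465, 0.131618, 0.149772, -0.494702, -0.181542,
             0.599088, -0.063540]
dsp = [-0.34493, -0.285929, -0.081694, -0.222389, 0.127079, 0.236005,
       0.435701, -0.299544, -0.049924, 0.226927, -0.036308, 0.081694,
       -0.149772, -0.172465, 0.131618, 0.149772, -0.490163, -0.177003,
       0.599088, -0.068078]

# Signed difference DSP - quant sim for every output entry.
diffs = [d - q for q, d in zip(quant_sim, dsp)]
mismatched = [i for i, d in enumerate(diffs) if abs(d) > 1e-6]

# Assumed step size: the smallest gap seen among the mismatched entries.
step = min(abs(diffs[i]) for i in mismatched)
steps_off = [round(diffs[i] / step) for i in mismatched]

print("mismatched indices:", mismatched)
print("offset in steps:   ", steps_off)
```

    Every mismatched entry is off by one step of about 0.0045 in either direction. That pattern is consistent with a rounding difference between the two implementations, though the sketch cannot confirm the cause.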

    good first issue Stale 
    opened by changewOw 7
  • Training with PACT, but the value for clipping weights and activations, denoted as `alpha`, does not seem to change.

    The value for clipping weights and activations, denoted as alpha, is initialized to 6.0. In my opinion this value should be updated during training, but I found that it is not. I am training with the imagenet_example and only added the following config to enable PACT.

    if args.quant:
        extra_params = {
            'extra_qconfig_dict': {
                'w_observer': "MinMaxObserver",
                'a_observer': "EMAMinMaxObserver",
                'w_fakequantize': "PACTFakeQuantize",
                'a_fakequantize': "PACTFakeQuantize",
                'a_fakeq_params': {},
                'w_qscheme': {
                    'bit': 8,
                    'symmetry': True,
                    'per_channel': False,
                    'pot_scale': False
                },
                'a_qscheme': {
                    'bit': 8,
                    'symmetry': True,
                    'per_channel': False,
                    'pot_scale': False
                }
            },
            'extra_quantizer_dict': {},
            'preserve_attr': {},
            'concrete_args': {},
            'extra_fuse_dict': {}
        }
        print("==> config with extra params", extra_params)
        model = prepare_by_platform(model, args.backend, extra_params)
    opened by jianyin2016 7
  • Problem when generating the Deploy model with ONNX-QNN



    extra_qconfig_dict = {
        'w_observer': 'ClipStdObserver',
        'a_observer': 'ClipStdObserver',
        'w_fakequantize': 'DSQFakeQuantize',
        'a_fakequantize': 'DSQFakeQuantize',
        'w_qscheme': {
            'bit': 8,
            'symmetry': True,
            'per_channel': False,
            'pot_scale': True
        },
        'a_qscheme': {
            'bit': 8,
            'symmetry': True,
            'per_channel': False,
            'pot_scale': True
        }
    }
    prepare_custom_config_dict = {
        'extra_qconfig_dict': extra_qconfig_dict
    }
    self.model = prepare_by_platform(self.model, BackendType.ONNX_QNN, prepare_custom_config_dict)


      File "", line 411, in train
        convert_deploy(self.model, BackendType.ONNX_QNN, input_shape, model_name = 'model_QNN')
      File "MQBench-0.0.6-py3.9.egg/mqbench/", line 184, in convert_deploy
        convert_function(deploy_model, **kwargs)
      File "MQBench-0.0.6-py3.9.egg/mqbench/", line 138, in deploy_qparams_tvm
      File "MQBench-0.0.6-py3.9.egg/mqbench/deploy/", line 273, in run
      File "MQBench-0.0.6-py3.9.egg/mqbench/deploy/", line 258, in format_qlinear_dtype_pass
        scale, zero_point, qmin, qmax = node.input[1], node.input[2], node.input[3], node.input[4]
    IndexError: list index (3) out of range
    opened by Zhoukai1234 1
  • The QAT top1@acc of mobilenet_v2 a4w4 LSQ cannot reach the 70.6% reported in the paper.

    Hi, thanks for providing this amazing quantization framework! I want to reproduce the top1@acc of mobilenet_v2 a4w4 LSQ under the academic setting. The quantization configuration is as below:

     w_qscheme=QuantizeScheme(symmetry=True, per_channel=True, pot_scale=False, bit=4, symmetric_range=False, p=2.4),
                                     a_qscheme=QuantizeScheme(symmetry=True, per_channel=False, pot_scale=False, bit=4, symmetric_range=False, p=2.4),

    For the training strategy, I set weight decay = 0, lr = 1e-3, and batch_size = 128 per GPU on 8 Nvidia A100 cards. The adjust_learning_rate strategy is left unchanged. However, the highest top1@acc I reproduced on the validation set was only 68.66%, which is far from the 70.6% reported in the paper.

    Which part have I missed?

    opened by LuletterSoul 0
  • TraceError when running PTQ quantization on yolov5s


    Hi everyone,




    from mqbench.prepare_by_platform import prepare_by_platform, BackendType
    backend = BackendType.ONNX_QNN
    model = prepare_by_platform(model, backend)


    torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow


    opened by xiaopengaia 2
  • Question about how to merge conv and bn layers


    Hi everyone,










    opened by xiaopengaia 2
