nn-Meter is a novel and efficient system to accurately predict the inference latency of DNN models on diverse edge devices

Last update: Dec 26, 2022

Overview

Note: This is an alpha (preview) version that is still being refined.

nn-Meter is a novel and efficient system to accurately predict the inference latency of DNN models on diverse edge devices. The key idea is to divide a whole model inference into kernels, i.e., the execution units of fused operators on a device, and to conduct kernel-level prediction. We currently evaluate four popular platforms on a large dataset of 26k models. nn-Meter achieves 99.0% (mobile CPU), 99.1% (mobile Adreno 640 GPU), 99.0% (mobile Adreno 630 GPU), and 83.4% (Intel VPU) prediction accuracy.
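The kernel-level idea can be illustrated with a toy sketch. It assumes that the model latency is obtained by summing the predicted latencies of its kernels; the overview above does not spell out the aggregation, so treat that as an assumption. The function name, the kernel dictionaries, and the toy predictors are all hypothetical, and real kernel predictors are trained regressors rather than the made-up formulas used here.

```python
from typing import Callable, Dict, List

# Hypothetical kernel predictor type: kernel config -> latency in ms.
KernelPredictor = Callable[[dict], float]


def predict_model_latency(kernels: List[dict],
                          predictors: Dict[str, KernelPredictor]) -> float:
    """Sum the predicted latency of every kernel (assumed aggregation rule)."""
    total = 0.0
    for kernel in kernels:
        predict = predictors[kernel["type"]]  # pick the predictor for this kernel type
        total += predict(kernel)
    return total


# Made-up predictors and kernels, for illustration only.
toy_predictors = {
    "conv-bn-relu": lambda k: 1e-4 * k["cin"] * k["cout"],
    "fc": lambda k: 0.5,
}
toy_kernels = [
    {"type": "conv-bn-relu", "cin": 32, "cout": 64},
    {"type": "fc", "cin": 512, "cout": 1000},
]

print(predict_model_latency(toy_kernels, toy_predictors))
```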

The currently supported hardware and inference frameworks are:

Device	Framework	Processor	±10% Accuracy	Hardware name
Pixel4	TFLite v2.1	CortexA76 CPU	99.0%	cortexA76cpu_tflite21
Mi9	TFLite v2.1	Adreno 640 GPU	99.1%	adreno640gpu_tflite21
Pixel3XL	TFLite v2.1	Adreno 630 GPU	99.0%	adreno630gpu_tflite21
Intel Movidius NCS2	OpenVINO2019R2	Myriad VPU	83.4%	myriadvpu_openvino2019r2
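As a rough illustration of the accuracy column, the sketch below assumes that ±10% accuracy means the fraction of models whose predicted latency lies within ±10% of the measured latency. The function name and the sample latencies are made up for this example.

```python
def within_tolerance_accuracy(predicted, measured, tolerance=0.10):
    """Fraction of models whose prediction is within +/- tolerance of the measurement."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1 for p, m in zip(predicted, measured)
        if abs(p - m) <= tolerance * m
    )
    return hits / len(measured)


# Made-up latencies in ms, for illustration only.
predicted = [10.5, 20.0, 34.0, 41.0]
measured = [10.0, 21.0, 30.0, 40.0]

print(within_tolerance_accuracy(predicted, measured))
```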

nn-Meter received the MobiSys 2021 Best Paper Award! For more details, please check out the paper:

nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices

Who should consider using nn-Meter

Those who want to get the DNN inference latency on mobile and edge devices with no deployment efforts on real devices.
Those who want to run hardware-aware NAS with NNI.
Those who want to build latency predictors for their own devices.

Installation

nn-Meter has currently been tested on Linux and Windows. Windows 10 and Ubuntu 16.04 and 20.04, with Python 3.6.10, are tested and supported. Please install Python 3 before installing nn-Meter.

We haven't released this package yet, so a development installation is required. To install the latest version of nn-Meter, install the package from source. First, clone the nn-Meter repository locally:

git clone git@github.com:microsoft/nn-Meter.git
cd nn-Meter

Then run the following pip install in an environment that has Python >= 3.6. The command automatically installs nn-Meter and all necessary dependencies.

pip install .

nn-Meter is a latency predictor for models of type TensorFlow, PyTorch, ONNX, nn-Meter IR graph, and NNI IR graph. To use nn-Meter with a specific model type, you also need to install the corresponding packages. The well-tested versions are listed below:

Testing Model Type	Requirements
Tensorflow	`tensorflow==1.15.0`
Torch	`onnx==1.9.0`, `torch==1.9.0`, `torchvision==0.10.0`
Onnx	`onnx==1.9.0`
nn-Meter IR graph	---
NNI IR graph	`nni==2.4`

Please also check the versions of numpy and scikit_learn. Different versions may change the prediction accuracy of the kernel predictors.

A stable wheel binary package will be released soon.
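A small helper along the following lines can report which installed versions differ from the ones you expect. This is only a sketch: the function names are hypothetical, and the value 0.23.1 for scikit-learn is taken from an unpickling warning quoted in a user report further down this page, so treat it as an example rather than an official pin.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_mismatches(expected):
    """Map module name -> (expected, found) for every module whose version differs."""
    mismatches = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# scikit-learn is imported as "sklearn"; the expected version is an example value.
print(version_mismatches({"sklearn": "0.23.1"}))
```

An empty dictionary means every listed module matches the expected version.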

Usage

For hardware latency prediction, nn-Meter provides two types of interfaces:

the command line tool nn-meter, available after nn-Meter installation
the Python binding provided by the module nn_meter

Here is a summary of supported inputs of the two methods.

Testing Model Type	Command Support	Python Binding
Tensorflow	Checkpoint file dumped by `tf.saved_model()` and ending with `.pb`	Checkpoint file dumped by `tf.saved_model` and ending with `.pb`
Torch	Models in `torchvision.models`	Object of `torch.nn.Module`
Onnx	Checkpoint file dumped by `onnx.save()` and ending with `.onnx`	Checkpoint file dumped by `onnx.save()` or model loaded by `onnx.load()`
nn-Meter IR graph	Json file in the format of nn-Meter IR Graph	`dict` object following the format of nn-Meter IR Graph
NNI IR graph	-	NNI IR graph object

In both methods, users can specify the predictor name and version to target a specific hardware platform (device). Currently, nn-Meter supports prediction on the following four configurations:

Predictor (device_inferenceframework)	Processor Category	Version
cortexA76cpu_tflite21	CPU	1.0
adreno640gpu_tflite21	GPU	1.0
adreno630gpu_tflite21	GPU	1.0
myriadvpu_openvino2019r2	VPU	1.0

Users can list all predefined predictors and versions by running:

# to list all predefined predictors
nn-meter --list-predictors

Predict latency of saved CNN model

After installation, a command named nn-meter is enabled. To predict the latency of a CNN model with a predefined predictor from the command line, users can run the following commands:

# for Tensorflow (*.pb) file
nn-meter --predictor <hardware> [--predictor-version <version>] --tensorflow <pb-file_or_folder> 

# for ONNX (*.onnx) file
nn-meter --predictor <hardware> [--predictor-version <version>] --onnx <onnx-file_or_folder>

# for torch model from torchvision model zoo (str)
nn-meter --predictor <hardware> [--predictor-version <version>] --torchvision <model-name> <model-name>... 

# for nn-Meter IR (*.json) file
nn-meter --predictor <hardware> [--predictor-version <version>] --nn-meter-ir <json-file_or_folder>

The --predictor-version argument is optional. When the predictor version is not specified, nn-meter uses the latest version of the predictor.

nn-Meter supports batch-mode prediction. To predict the latency of multiple models of the same model type at once, collect all models in one folder and pass the folder after the --[model-type] argument.

Note that for PyTorch models, nn-meter only supports existing models in the torchvision model zoo. The value following --torchvision must be one or more strings naming existing torchvision models.
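When scripting many predictions, it can help to assemble the command line programmatically. The sketch below only mirrors the syntax shown above; the helper name and the dictionary keys are hypothetical, and other nn-Meter releases may use a different command layout.

```python
# Flags copied from the command examples above; the keys are illustrative labels.
MODEL_FLAGS = {
    "tensorflow": "--tensorflow",
    "onnx": "--onnx",
    "torchvision": "--torchvision",
    "nn-meter-ir": "--nn-meter-ir",
}


def build_predict_command(hardware, model_kind, targets, version=None):
    """Return the argument list for one nn-meter prediction call."""
    if model_kind not in MODEL_FLAGS:
        raise ValueError(f"unknown model kind: {model_kind!r}")
    if not targets:
        raise ValueError("at least one file, folder, or model name is required")
    if model_kind != "torchvision" and len(targets) != 1:
        raise ValueError("only --torchvision accepts several names")
    command = ["nn-meter", "--predictor", hardware]
    if version is not None:
        command += ["--predictor-version", str(version)]
    command += [MODEL_FLAGS[model_kind], *targets]
    return command


# Batch mode: pass a folder that holds all ONNX models.
print(build_predict_command("cortexA76cpu_tflite21", "onnx", ["models/"], version=1.0))
# Several torchvision model names in one call.
print(build_predict_command("adreno640gpu_tflite21", "torchvision", ["resnet18", "mobilenet_v2"]))
```

The returned list can be passed to subprocess.run once nn-meter is installed.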

Convert to nn-Meter IR Graph

Furthermore, users may want to convert a TensorFlow pb file or an ONNX file to an nn-Meter IR graph. Users can convert a model to an nn-Meter IR graph and save it to a .json file by running:

# for Tensorflow (*.pb) file
nn-meter getir --tensorflow <pb-file> [--output <output-name>]

# for ONNX (*.onnx) file
nn-meter getir --onnx <onnx-file> [--output <output-name>]

If not specified, the output name defaults to /path/to/input/file/__ir.json.

Use nn-Meter in your python code

After installation, users can import nn-Meter in Python code:

from nn_meter import load_latency_predictor

predictor = load_latency_predictor(hardware_name, hardware_predictor_version) # case insensitive in backend

# build your model (e.g., model instance of torch.nn.Module)
model = ... 

lat = predictor.predict(model, model_type) # the resulting latency is in unit of ms

By calling load_latency_predictor, the user selects the target hardware and loads the corresponding predictor. nn-Meter will try to find the right predictor file in ~/.nn_meter/data. If the predictor file doesn't exist, nn-Meter downloads it from the GitHub release.
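To see what is already cached locally, a sketch like the one below can list the kernel predictor files for one hardware name. The sub-folder layout (predictor/<hardware name>/*.pkl under ~/.nn_meter/data) is inferred from a log quoted in a user report further down this page, not from official documentation, and both function names are hypothetical.

```python
from pathlib import Path


def local_predictor_dir(hardware_name, home=None):
    """Directory where the kernel predictors of one hardware are assumed to be cached."""
    base = Path(home) if home is not None else Path.home()
    return base / ".nn_meter" / "data" / "predictor" / hardware_name


def cached_kernel_predictors(hardware_name, home=None):
    """Sorted names of the cached *.pkl kernel predictors, empty if nothing is cached."""
    folder = local_predictor_dir(hardware_name, home)
    if not folder.is_dir():
        return []
    return sorted(p.stem for p in folder.glob("*.pkl"))


print(cached_kernel_predictors("cortexA76cpu_tflite21"))
```

An empty list suggests that the predictor has not been downloaded yet.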

In predictor.predict, the allowed values of the parameter model_type are ["pb", "torch", "onnx", "nnmeter-ir", "nni-ir"], representing the model types TensorFlow, PyTorch, ONNX, nn-Meter IR graph, and NNI IR graph, respectively.

Users can view information about all built-in predictors by calling list_latency_predictors or by viewing the config file nn_meter/configs/predictors.yaml.

Users can get an nn-Meter IR graph by calling model_file_to_graph with a model file name, or model_to_graph with a model object, and specifying the model type. The model types supported by model_file_to_graph are "onnx", "pb", "torch", "nnmeter-ir" and "nni-ir", while those supported by model_to_graph are "onnx", "torch" and "nni-ir".
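A thin wrapper can validate model_type before calling the predictor for several models. The sketch below relies only on the predict(model, model_type) call shown above; predict_all and ConstantPredictor are hypothetical names, and ConstantPredictor is a stand-in so that the example runs without nn-Meter installed.

```python
ALLOWED_MODEL_TYPES = ("pb", "torch", "onnx", "nnmeter-ir", "nni-ir")


def predict_all(predictor, models, model_type):
    """Predict the latency (ms) of every named model with one predictor.

    `predictor` is any object with a predict(model, model_type) method,
    such as the object returned by load_latency_predictor.
    """
    if model_type not in ALLOWED_MODEL_TYPES:
        raise ValueError(
            f"model_type must be one of {ALLOWED_MODEL_TYPES}, got {model_type!r}"
        )
    return {name: predictor.predict(model, model_type) for name, model in models.items()}


class ConstantPredictor:
    """Stand-in predictor that always returns the same latency."""

    def __init__(self, latency_ms):
        self.latency_ms = latency_ms

    def predict(self, model, model_type):
        return self.latency_ms


print(predict_all(ConstantPredictor(12.5), {"a": "a.onnx", "b": "b.onnx"}, "onnx"))
```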

Hardware-aware NAS by nn-Meter and NNI

To enable affordable DNNs on edge and mobile devices, hardware-aware NAS searches for models with both high accuracy and low latency. In particular, the search algorithm only considers models within the target latency constraints during the search process.

Currently we provide an example of end-to-end multi-trial NAS, which is a random search algorithm on the SPOS NAS search space. More examples of hardware-aware NAS and model compression algorithms are coming soon.

To run the multi-trial SPOS demo, NNI should be installed from source by following the NNI Doc:

python setup.py develop

Then run the multi-trial SPOS demo:

python ${NNI_ROOT}/examples/nas/oneshot/spos/multi_trial.py

How the demo works

Refer to NNI Doc for how to perform NAS by NNI.

To support hardware-aware NAS, you first need a Strategy that supports filtering the models by latency. We provide such a filter named LatencyFilter in NNI and initialize a Random strategy with the filter:

simple_strategy = strategy.Random(model_filter=LatencyFilter(threshold=100, predictor=base_predictor))

LatencyFilter predicts the models' latency by using nn-Meter and filters out the models whose latency with the given predictor is larger than the threshold (i.e., 100 in this example). You can also build your own strategies and filters to support more flexible NAS, such as sorting the models according to latency.
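The filtering and sorting behaviour described above can be sketched without NNI. ThresholdLatencyFilter below is a hypothetical stand-in, not NNI's LatencyFilter, and the latency table is made up; it only shows the keep-or-drop decision against a threshold and a latency-based ordering.

```python
class ThresholdLatencyFilter:
    """Hypothetical stand-in that mimics the filtering rule described above."""

    def __init__(self, threshold, predict_latency):
        self.threshold = threshold              # latency budget in ms
        self.predict_latency = predict_latency  # callable: model -> latency in ms

    def __call__(self, model):
        """Return True if the model should be kept (latency not above the threshold)."""
        return self.predict_latency(model) <= self.threshold


def sort_by_latency(models, predict_latency):
    """Return the models ordered from lowest to highest predicted latency."""
    return sorted(models, key=predict_latency)


# Made-up latencies in ms, keyed by model name.
toy_latency = {"net_a": 80.0, "net_b": 130.0, "net_c": 95.0}
keep = ThresholdLatencyFilter(threshold=100, predict_latency=toy_latency.get)

print([name for name in toy_latency if keep(name)])
print(sort_by_latency(toy_latency, toy_latency.get))
```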

Then, pass this strategy to RetiariiExperiment:

exp = RetiariiExperiment(base_model, trainer, strategy=simple_strategy)

exp_config = RetiariiExeConfig('local')
...
exp_config.dummy_input = [1, 3, 32, 32]

exp.run(exp_config, port)

In exp_config, dummy_input is required for tracing shape info.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

License

The entire codebase is under MIT license

The dataset is under Open Use of Data Agreement

Citation

If you find that nn-Meter helps your research, please consider citing it:

@inproceedings{nnmeter,
    author = {Zhang, Li Lyna and Han, Shihao and Wei, Jianyu and Zheng, Ningxin and Cao, Ting and Yang, Yuqing and Liu, Yunxin},
    title = {nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices},
    year = {2021},
    publisher = {ACM},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3458864.3467882},
    doi = {10.1145/3458864.3467882},
    booktitle = {Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services},
    pages = {81–93},
}

@misc{nnmetercode,
    author = {Microsoft Research nn-Meter Team},
    title = {nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices},
    year = {2021},
    url = {https://github.com/microsoft/nn-Meter},
}

Comments

Significant differences between measurements and predictions on Pixel 4

Hi,

I am trying to reproduce the results of nn-Meter by comparing the measurements on Pixel 4 with the results from the pre-trained predictors (i.e., cortexA76cpu_tflite21 and adreno640gpu_tflite21). However, I observed significant differences between the measurements and predictions.

I converted the provided TensorFlow pb models (i.e., pb_models) into .tflite format, and built the binary benchmark_model from TFLite v2.1 source. I benchmarked all the models with the following commands on Pixel 4 (Snapdragon 855 and Adreno 640):

# For CPUs:
/data/local/tmp/benchmark_model --warmup_runs=10 --num_runs=10 --num_threads=1 --graph=${path}

# For GPUs:
/data/local/tmp/benchmark_model --warmup_runs=10 --num_runs=10 --use_gpu=true --graph=${path}

For example, the measurement of resnet18_0 on CPU shows:

$ /data/local/tmp/benchmark_model --warmup_runs=10 --num_runs=10 --num_threads=1 --graph=${path}
STARTING
...
Loaded model /data/local/tmp/output/tflite-pb-tf21/resnet18_0.tflite
resolved reporter
INFO: Initialized TensorFlow Lite runtime.
Initialized session in 0.732ms
[Init Phase] - Memory usage: max resident set size = 3.07422 MB, total malloc-ed size = 14.5485 MB
[Init Phase] - Memory usage: max resident set size = 3.07422 MB, total malloc-ed size = 14.5485 MB
Running benchmark for at least 10 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=10 first=150351 curr=119397 min=119349 max=150351 avg=122502 std=9282

Running benchmark for at least 10 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=10 first=119529 curr=119341 min=119341 max=119529 avg=119410 std=53

[Overall] - Memory usage: max resident set size = 71.4961 MB, total malloc-ed size = 31.3739 MB
Average inference timings in us: Warmup: 122502, Init: 732, no stats: 119410

but the prediction on cortexA76cpu_tflite21 is:

...
(nn-Meter) Get weight shape of fc13.fc/MatMul from ['fc13.fc/weight'], input shape:[512, 1000].
(nn-Meter) Get input shape of fc13.fc/MatMul from Reshape, input shape:[-1, 512].
(nn-Meter) Input shape of fc13.fc/MatMul op is [[-1, 512]].
(nn-Meter) Output shape of fc13.fc/MatMul op is [[-1, 1000]].
(nn-Meter) Predict latency: 216.19714599005837 ms
resnet18_0,216.19714599005837

with an error of 81% (i.e., 216.19 vs. 119.41).

Similarly, for resnet50_0 on GPU, the measurement is:

$ /data/local/tmp/benchmark_model --warmup_runs=10 --num_runs=10 --use_gpu=true --graph=${path}
STARTING!
...
Loaded model /data/local/tmp/output/tflite-pb-tf21/resnet50_0.tflite
resolved reporter
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
ERROR: Next operations are not supported by GPU delegate:
MEAN: Operation is not supported.
First 70 operations will run on the GPU, and the remaining 2 on the CPU.
INFO: Initialized OpenCL-based API.
Applied GPU delegate.
Initialized session in 665.544ms
[Init Phase] - Memory usage: max resident set size = 274.34 MB, total malloc-ed size = 1.32245 MB
Running benchmark for at least 10 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=10 first=51879 curr=58338 min=43539 max=58484 avg=55702.5 std=4507

Running benchmark for at least 10 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=18 first=58433 curr=58263 min=56980 max=59873 avg=58350.9 std=674

[Overall] - Memory usage: max resident set size = 274.34 MB, total malloc-ed size = 1.90115 MB
Average inference timings in us: Warmup: 55702.5, Init: 665544, no stats: 58350.9

and the predictor produces the following:

...
(nn-Meter) Find node fc21.fc/MatMul with its weight op fc21.fc/weight.
(nn-Meter) Get weight shape of fc21.fc/MatMul from ['fc21.fc/weight'], input shape:[2048, 1000].
(nn-Meter) Get input shape of fc21.fc/MatMul from Reshape, input shape:[-1, 2048].
(nn-Meter) Input shape of fc21.fc/MatMul op is [[-1, 2048]].
(nn-Meter) Output shape of fc21.fc/MatMul op is [[-1, 1000]].
(nn-Meter) Predict latency: 91.73126828870865 ms
resnet50_0,91.73126828870865

with an error of 57% (i.e., 91.73 vs. 58.35).
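For reference, the two error figures can be recomputed as follows. The benchmark tool reports averages in microseconds while the predictor reports milliseconds, so the sketch converts units first; the function name is hypothetical and the numbers are copied from the logs above.

```python
def relative_error_percent(predicted_ms, measured_us):
    """Relative error of a prediction (ms) against a benchmark average (microseconds)."""
    measured_ms = measured_us / 1000.0
    return abs(predicted_ms - measured_ms) / measured_ms * 100.0


cpu_error = relative_error_percent(216.19714599005837, 119410)   # resnet18_0 on CPU
gpu_error = relative_error_percent(91.73126828870865, 58350.9)   # resnet50_0 on GPU

print(round(cpu_error), round(gpu_error))  # prints: 81 57
```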

I am wondering whether I set up the same experimental environment as the one for training the predictors. I can provide more information (e.g., the tflite models) if needed and look into the issue further.

Thank you!

opened by 165749 7

Roadmap
nn-Meter is not only a latency predictor but also a critical component in the hardware-aware model design. It empowers existing NAS (neural architecture search) and other efficient model design tasks to be specialized for the target hardware platform.

Multiple aspects will be covered in this and related repos, including:

latency prediction and pre-trained predictors

the IR converter, kernel detection tools

builtin kernel predictors and pre-trained weights

algorithm integration (mainly in NNI), the integration of latency prediction in existing NAS and compression algorithms.

model latency dataset, the collected latencies of thousands of model architectures. Also includes data loaders and an improved GNN predictor.

Release Plan

version 1.0-alpha

Date: 2021 August

Latency prediction

[x] basic framework and utilities for latency prediction (e.g., config management, artifacts downloading, builtin predictors)

[x] basic CI workflow with integrated test

[x] documentation and examples

Algorithm integration

[x] initial multi-trial NAS example

version 1.0-beta

Date: 2021 November

Algorithm integration

[x] SPOS / Proxyless NAS in NNI

[x] ~~SPOS: first integrate nn-meter in the evolution search~~ (move to 2.0)

[x] Proxyless NAS: predict the block latency in the search space, provide the lookup table

Dataset

[x] make model-latency dataset public

[x] reference design of an improved GNN latency predictor

version 2.0

Date: 2021 ~~November~~ December

Algorithm integration

[x] SPOS: first integrate nn-meter in the evolution search

latency predictor building tools

[x] fusion rule detection

[x] adaptive data sampler

opened by mydmdm 5

google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'

When I use 'nn-meter predict --predictor cortexA76cpu_tflite21 --predictor-version 1.0 --tensorflow mobilenetv3small_0.onnx' or 'nn-meter predict --predictor cortexA76cpu_tflite21 --tensorflow mobilenetv3small_0.json' in my command line, this error occurred. Any suggestions?

Traceback (most recent call last):
  File "/opt/conda/bin/nn-meter", line 8, in <module>
    sys.exit(nn_meter_cli())
  File "/opt/conda/lib/python3.7/site-packages/nn_meter/nn_meter_cli.py", line 182, in nn_meter_cli
    args.func(args)
  File "/opt/conda/lib/python3.7/site-packages/nn_meter/nn_meter_cli.py", line 54, in apply_latency_predictor_cli
    latency = predictor.predict(model, model_type) # in unit of ms
  File "/opt/conda/lib/python3.7/site-packages/nn_meter/predictor/nn_meter_predictor.py", line 102, in predict
    graph = model_file_to_graph(model, model_type, input_shape, apply_nni=apply_nni)
  File "/opt/conda/lib/python3.7/site-packages/nn_meter/ir_converter/utils.py", line 41, in model_file_to_graph
    converter = FrozenPbConverter(filename)
  File "/opt/conda/lib/python3.7/site-packages/nn_meter/ir_converter/frozenpb_converter/frozenpb_converter.py", line 15, in __init__
    parser = FrozenPbParser(file_name)
  File "/opt/conda/lib/python3.7/site-packages/nn_meter/ir_converter/frozenpb_converter/frozenpb_parser.py", line 19, in __init__
    graph.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'
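One possible cause, not confirmed in this thread: both quoted commands pass a .onnx or .json file after --tensorflow, while the README pairs --tensorflow with .pb files, --onnx with .onnx files, and --nn-meter-ir with .json files. The sketch below is a hypothetical pre-flight check for that kind of mismatch. It skips paths without a suffix, since those may be folders used for batch mode.

```python
from pathlib import PurePath

# File suffix expected by each flag, taken from the command examples in the README.
EXPECTED_SUFFIX = {"--tensorflow": ".pb", "--onnx": ".onnx", "--nn-meter-ir": ".json"}


def check_model_flag(flag, model_path):
    """Return None if the suffix fits the flag, otherwise a short diagnostic string."""
    expected = EXPECTED_SUFFIX.get(flag)
    if expected is None:
        return f"unknown flag {flag}"
    suffix = PurePath(model_path).suffix.lower()
    if suffix == "" or suffix == expected:
        return None  # no suffix: possibly a folder, nothing to check
    matching = [f for f, s in EXPECTED_SUFFIX.items() if s == suffix]
    hint = f"; try {matching[0]}" if matching else ""
    return f"{flag} expects a {expected} file but got {suffix}{hint}"


print(check_model_flag("--tensorflow", "mobilenetv3small_0.onnx"))
print(check_model_flag("--tensorflow", "mobilenetv3small_0.json"))
```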

opened by howardgriffin 3

"list index out of range" when using torch Conv1d

For debugging purposes I built a simple model based on PyTorch Lightning:

class TCNModel(nni.retiarii.evaluator.pytorch.lightning.LightningModule):
    def __init__(self):
        super().__init__()
        self.output = nn.Conv1d(1, 1, kernel_size=1)

    def forward(self, x):
        x = self.output(x)
        return x

and:

def compute_model_latency_in_ms(model, batch_size, latency_platform):
    predictor = load_latency_predictor(latency_platform)
    latency = predictor.predict(model=model, model_type='torch', input_shape=[batch_size,1,88201])
    return latency

When trying to predict the latency of this with nn-meter, I get the following error:

PS C:\Users\alexa\Desktop\Code\NAS_New_Trial> python -u "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Baseline_Train.py"
Global seed set to 42
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2022-05-12 12:03:43] INFO (root/MainThread) checking local kernel predictors at C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21
[2022-05-12 12:03:43] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\add.pkl
C:\Users\alexa\Python308\lib\site-packages\sklearn\base.py:310: UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 0.23.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  warnings.warn(
C:\Users\alexa\Python308\lib\site-packages\sklearn\base.py:310: UserWarning: Trying to unpickle estimator RandomForestRegressor from version 0.23.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  warnings.warn(
[2022-05-12 12:03:43] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\addrelu.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\avgpool.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\bn.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\bnrelu.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\channelshuffle.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\concat.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\conv-bn-relu.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\dwconv-bn-relu.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\fc.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\global-avgpool.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\hswish.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\maxpool.pkl
[2022-05-12 12:03:44] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\relu.pkl
[2022-05-12 12:03:45] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\se.pkl
[2022-05-12 12:03:45] INFO (root/MainThread) load predictor C:\Users\alexa/.nn_meter/data\predictor\cortexA76cpu_tflite21\split.pkl
[2022-05-12 12:03:45] INFO (root/MainThread) Start latency prediction ...
[2022-05-12 12:03:45] INFO (root/MainThread) Onnx-based Torch Converter is applied for model conversion
Traceback (most recent call last):
  File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Baseline_Train.py", line 53, in <module>
    lat = compute_model_latency_in_ms(model, args.batch_size, latency_platform)
  File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Utils.py", line 37, in compute_model_latency_in_ms
    latency = predictor.predict(model=model, model_type='torch', input_shape=[batch_size,1,88201])
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\nn_meter_predictor.py", line 107, in predict
    self.kd.load_graph(graph)
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\kernel_detector\kernel_detector.py", line 19, in load_graph
    new_graph = convert_nodes(graph)
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\kernel_detector\utils\ir_tools.py", line 42, in convert_nodes
    cin = node["attr"]["input_shape"][0][3]
IndexError: list index out of range

After a lot of fiddling around, I noticed that this happens with a Conv1D but not with Conv2D. Alternatively, I could change cin = node["attr"]["input_shape"][0][3] to cin = node["attr"]["input_shape"][0][2] in nn_meter\kernel_detector\utils\ir_tools.py, but obviously I don't know how this influences the prediction itself and if that leads to weird behaviour with models that utilize Conv2D.
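The indexing failure itself can be reproduced with plain lists, independent of nn-Meter. In the sketch below, channel_last_dim is a hypothetical helper that mimics the failing lookup, and the four-dimensional shape is an illustrative assumption. It also shows that, for a Conv1d input laid out as (batch, channels, length), index 2 selects the sequence length rather than the channel count, which is one reason to be cautious about the index change.

```python
def channel_last_dim(input_shape):
    """Mimic the failing lookup: read index 3 of the first input shape."""
    return input_shape[0][3]


conv2d_shape = [[1, 224, 224, 3]]   # four dims (illustrative): the lookup works
conv1d_shape = [[4, 1, 88201]]      # three dims, as passed in the report with batch size 4

print(channel_last_dim(conv2d_shape))
try:
    channel_last_dim(conv1d_shape)
except IndexError as err:
    print("IndexError:", err)

# Index 2 is the sequence length, index 1 is the channel count.
print(conv1d_shape[0][2], conv1d_shape[0][1])
```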

At this point I'd like to ask you for a) a quick fix I can apply safely (I need nn-meter for my bachelor's thesis ;) ) b) an update for future users.

Also I find it weird that both fixes (using Conv3D and modifying ir_tools.py) lead to a prediction of 0ms latency with this simple model. Is that plausible?

Thank you very much!

EDIT:

Using a more complex model leads to even more errors regarding the Conv1D:

class TCNModel(pl.LightningModule):   
    def __init__(self, 
                 ninputs=1,
                 noutputs=1,
                 kernel_size=13, 
                 dilation_growth=10, 
                 channel_growth=1, 
                 channel_width=32, 
                 stack_size=10,
                 grouped=False,
                 causal=True,
                 lr = 5e-3, 
                 train_loss = "l1+stft", # 'stft' or 'l1+stft' or 'l1'
                 save_dir = "UnknownEffect",
                 num_examples = 5):
        super().__init__()
        self.save_hyperparameters()

        out1_ch = ninputs * channel_width * channel_growth
        out2_ch = out1_ch * channel_growth
        out3_ch = out2_ch * channel_growth
        out4_ch = out3_ch * channel_growth

        dilation1 = 1
        dilation2 = dilation_growth ** (1 % stack_size)
        dilation3 = dilation_growth ** (2 % stack_size)
        dilation4 = dilation_growth ** (3 % stack_size)

        self.block1 = TCNBlock(ninputs, out1_ch, kernel_size, dilation1, causal, grouped)
        self.block2 = TCNBlock(out1_ch, out2_ch, kernel_size, dilation2, causal, grouped)
        self.block3 = TCNBlock(out2_ch, out3_ch, kernel_size, dilation3, causal, grouped)
        self.block4 = TCNBlock(out3_ch, out4_ch, kernel_size, dilation4, causal, grouped)
        self.output = nn.Conv1d(out4_ch, noutputs, kernel_size=1)        

    def forward(self, x):  
        
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.output(x)
        return x

class TCNBlock(nn.Module):
    def __init__(self, 
                in_ch, 
                out_ch, 
                kernel_size=3, 
                dilation=1, 
                grouped=False, 
                causal=True):
        super().__init__()

        self.in_ch = in_ch
        self.out_ch = out_ch
        self.kernel_size = kernel_size
        self.dilation = dilation
        self.grouped = grouped
        self.causal = causal

        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size=kernel_size, dilation=dilation)
        # self.bn = nn.BatchNorm1d(out_ch)
        # self.relu = nn.PReLU(out_ch)
        # self.res = nn.Conv1d(in_ch, out_ch, kernel_size=1, groups=in_ch)

    def forward(self, x):
        x = self.conv1(x)

        return x

leads to:

[2022-05-12 13:14:00] INFO (root/MainThread) Start latency prediction ...
[2022-05-12 13:14:00] INFO (root/MainThread) Onnx-based Torch Converter is applied for model conversion
[2022-05-12 13:14:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#0', 'input_tensors': [[4, 1, 88201]], 'ks': [13], 'strides': [1], 'cin': 88201, 'cout': 88189, 'inbounds': [], 'outbounds': ['conv#1']}
[2022-05-12 13:14:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#1', 'input_tensors': [[4, 32, 88189]], 'ks': [13], 'strides': [1], 'cin': 88189, 'cout': 88069, 'inbounds': ['conv#0'], 'outbounds': ['conv#2']}
[2022-05-12 13:14:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#2', 'input_tensors': [[4, 32, 88069]], 'ks': [13], 'strides': [1], 'cin': 88069, 'cout': 86869, 'inbounds': ['conv#1'], 'outbounds': ['conv#3']}
[2022-05-12 13:14:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#3', 'input_tensors': [[4, 32, 86869]], 'ks': [13], 'strides': [1], 'cin': 86869, 'cout': 74869, 'inbounds': ['conv#2'], 'outbounds': ['conv#4']}
[2022-05-12 13:14:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#4', 'input_tensors': [[4, 32, 74869]], 'ks': [1], 'strides': [1], 'cin': 74869, 'cout': 74869, 'inbounds': ['conv#3'], 'outbounds': []}   
Traceback (most recent call last):
  File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Baseline_Train.py", line 53, in <module>
    lat = compute_model_latency_in_ms(model, args.batch_size, latency_platform)
  File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Utils.py", line 37, in compute_model_latency_in_ms
    latency = predictor.predict(model=model, model_type='torch', input_shape=[batch_size,1,88201])
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\nn_meter_predictor.py", line 109, in predict
    py = nn_predict(self.kernel_predictors, self.kd.kernels) # in unit of ms
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\predict_by_kernel.py", line 53, in nn_predict
    features = get_predict_features(kernel_units)
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\extract_feature.py", line 49, in get_predict_features
    ks = item["ks"][1]
IndexError: list index out of range

and after modifying extract_feature.py line 49 to ks = item["ks"][-1] I get

[2022-05-12 13:17:00] INFO (root/MainThread) Start latency prediction ...
[2022-05-12 13:17:00] INFO (root/MainThread) Onnx-based Torch Converter is applied for model conversion
[2022-05-12 13:17:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#0', 'input_tensors': [[4, 1, 88201]], 'ks': [13], 'strides': [1], 'cin': 88201, 'cout': 88189, 'inbounds': [], 'outbounds': ['conv#1']}
[2022-05-12 13:17:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#1', 'input_tensors': [[4, 32, 88189]], 'ks': [13], 'strides': [1], 'cin': 88189, 'cout': 88069, 'inbounds': ['conv#0'], 'outbounds': ['conv#2']}
[2022-05-12 13:17:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#2', 'input_tensors': [[4, 32, 88069]], 'ks': [13], 'strides': [1], 'cin': 88069, 'cout': 86869, 'inbounds': ['conv#1'], 'outbounds': ['conv#3']}
[2022-05-12 13:17:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#3', 'input_tensors': [[4, 32, 86869]], 'ks': [13], 'strides': [1], 'cin': 86869, 'cout': 74869, 'inbounds': ['conv#2'], 'outbounds': ['conv#4']}
[2022-05-12 13:17:01] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#4', 'input_tensors': [[4, 32, 74869]], 'ks': [1], 'strides': [1], 'cin': 74869, 'cout': 74869, 'inbounds': ['conv#3'], 'outbounds': []}   
Traceback (most recent call last):
  File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Baseline_Train.py", line 53, in <module>
    lat = compute_model_latency_in_ms(model, args.batch_size, latency_platform)
  File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Utils.py", line 37, in compute_model_latency_in_ms
    latency = predictor.predict(model=model, model_type='torch', input_shape=[batch_size,1,88201])
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\nn_meter_predictor.py", line 109, in predict
    py = nn_predict(self.kernel_predictors, self.kd.kernels) # in unit of ms
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\predict_by_kernel.py", line 53, in nn_predict
    features = get_predict_features(kernel_units)
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\extract_feature.py", line 50, in get_predict_features
    s = item["strides"][1] if "strides" in item else 1
IndexError: list index out of range

and after modifying extract_feature.py line 50 to strides = item["strides"][-1] I get

[2022-05-12 13:17:14] INFO (root/MainThread) Start latency prediction ...
[2022-05-12 13:17:14] INFO (root/MainThread) Onnx-based Torch Converter is applied for model conversion
[2022-05-12 13:17:15] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#0', 'input_tensors': [[4, 1, 88201]], 'ks': [13], 'strides': [1], 'cin': 88201, 'cout': 88189, 'inbounds': [], 'outbounds': ['conv#1']}
[2022-05-12 13:17:15] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#1', 'input_tensors': [[4, 32, 88189]], 'ks': [13], 'strides': [1], 'cin': 88189, 'cout': 88069, 'inbounds': ['conv#0'], 'outbounds': ['conv#2']}
[2022-05-12 13:17:15] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#2', 'input_tensors': [[4, 32, 88069]], 'ks': [13], 'strides': [1], 'cin': 88069, 'cout': 86869, 'inbounds': ['conv#1'], 'outbounds': ['conv#3']}
[2022-05-12 13:17:15] INFO (root/MainThread) {'op': 'conv', 'name': 'conv#3', 'input_tensors': [[4, 32, 86869]], 'ks': [13], 'strides': [1], 'cin': 86869, 'cout': 74869, 'inbounds': ['conv#2'], 'outbounds': ['con
  File "c:\Users\alexa\Desktop\Code\NAS_New_Trial\Utils.py", line 37, in compute_model_latency_in_ms
    latency = predictor.predict(model=model, model_type='torch', input_shape=[batch_size,1,88201])
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\nn_meter_predictor.py", line 109, in predict
    py = nn_predict(self.kernel_predictors, self.kd.kernels) # in unit of ms
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\predict_by_kernel.py", line 53, in nn_predict
    features = get_predict_features(kernel_units)
  File "C:\Users\alexa\Python308\lib\site-packages\nn_meter\predictor\prediction\extract_feature.py", line 51, in get_predict_features
    inputh = item["inputh"]
KeyError: 'inputh'

and so on. Is it safe to assume that 1D Convolutions are not supported at this point?
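
The two tracebacks above suggest that the feature extractor expects 2D convolution kernels: it indexes a second stride entry and then reads an "inputh" key, and neither exists in the 1D kernel dictionaries printed in the log. A minimal sketch of a pre-check follows, assuming kernel dictionaries shaped like the logged ones. The function name find_unsupported_conv_kernels is hypothetical and not part of nn-Meter.

```python
from typing import Dict, List


# Hypothetical helper, not part of nn-Meter: flags conv kernels whose
# description lacks the 2D fields that the tracebacks show being accessed.
def find_unsupported_conv_kernels(kernels: List[Dict]) -> List[str]:
    problems = []
    for item in kernels:
        if item.get("op") != "conv":
            continue
        strides = item.get("strides", [])
        # The first failing line reads item["strides"][1], so two entries are needed.
        if "strides" in item and len(strides) < 2:
            problems.append(f"{item['name']}: strides {strides} has fewer than 2 entries")
        # The second failure is a KeyError on item["inputh"].
        if "inputh" not in item:
            problems.append(f"{item['name']}: missing 'inputh'")
    return problems


# Kernel description copied from the log above (a 1D convolution).
conv4 = {
    "op": "conv", "name": "conv#4", "input_tensors": [[4, 32, 74869]],
    "ks": [1], "strides": [1], "cin": 74869, "cout": 74869,
    "inbounds": ["conv#3"], "outbounds": [],
}

for problem in find_unsupported_conv_kernels([conv4]):
    print(problem)
```

Running such a check before prediction would report both missing fields at once, rather than surfacing them one traceback at a time.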

opened by ThePhoenixCoding 2

: 'id' is a python keyword and will cause an unexpected error.

Reproduce: macbook pro python 3.9

1. create tflite workspace
nn-meter create --tflite-workspace <path/to/workspace>
2. connect android device with usb (adb is ready)
3. download the fixed "benchmark_model" and push to device
4. run examples/nn-meter_builder_examples/build_kernel_latency_predictor.ipynb

Nothing happens in the "connect backend" and "parse profile result" stages; the notebook just throws the IndexError below, which is confusing.

(nn-Meter) All 0 models complete. Save all success profiled results to /Users/weixiaobin/Repos/arxiv/nn-Meter/ws-tflite/predictor_build/results/profiled_conv-bn-relu.json.
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Input In [4], in <cell line: 6>()
      3 kernel_type = "conv-bn-relu"
      4 backend = "tflite_cpu"
----> 6 predictor, data = build_predictor_for_kernel(
      7     kernel_type, backend, init_sample_num = 10, finegrained_sample_num = 10, iteration = 5, error_threshold = 0.1
      8 )

File ~/opt/anaconda3/envs/nnm/lib/python3.9/site-packages/nn_meter/builder/nn_meter_builder.py:190, in build_predictor_for_kernel(kernel_type, backend, init_sample_num, finegrained_sample_num, iteration, error_threshold, predict_label)
    187 kernel_data = sample_and_profile_kernel_data(kernel_type, init_sample_num, backend, sampling_mode='prior', mark='prior')
    189 # use current sampled data to build regression model, and locate data with large errors in testset
--> 190 predictor, acc10, error_configs = build_predictor_by_data(kernel_type, kernel_data, backend, error_threshold=error_threshold, mark='prior',
    191                                                           save_path=os.path.join(workspace_path, "results"), predict_label=predict_label)
    192 logging.keyinfo(f'Iteration 0: acc10 {acc10}, error_configs number: {len(error_configs)}')
    194 for i in range(1, iteration):
    195     # finegrained sampling and profiling for large error data

File ~/opt/anaconda3/envs/nnm/lib/python3.9/site-packages/nn_meter/builder/kernel_predictor_builder/predictor_builder/build_predictor.py:37, in build_predictor_by_data(kernel_type, kernel_data, backend, error_threshold, mark, save_path, predict_label)
     35 os.makedirs(os.path.join(save_path, "collection"), exist_ok=True)
     36 os.makedirs(os.path.join(save_path, "predictors"), exist_ok=True)
---> 37 data = get_data_by_profiled_results(kernel_type, feature_parser, kernel_data,
     38                                     save_path=os.path.join(save_path, "collection", f'Data_{kernel_type}_{mark}.csv'),
     39                                     predict_label=predict_label)
     41 # get data for regression
     42 X, Y = data

File ~/opt/anaconda3/envs/nnm/lib/python3.9/site-packages/nn_meter/builder/kernel_predictor_builder/predictor_builder/extract_feature.py:187, in get_data_by_profiled_results(kernel_type, feature_parser, cfgs_path, labs_path, save_path, predict_label)
    185 import pandas as pd
    186 cols = feature_parser.needed_config[:]
--> 187 if len(features[0]) - len(feature_parser.needed_config) > 0: # there are extra features beyond needed config
    188     cols += [f'feature_{i}' for i in range(len(features[0]) - len(feature_parser.needed_config))]
    189 data_df = pd.DataFrame(features, columns=cols)

IndexError: list index out of range
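
The log line "All 0 models complete" suggests that no model was profiled, in which case the feature list is empty and features[0] raises the IndexError shown above. A minimal sketch of a guard that fails with a clearer message follows. build_columns is a hypothetical name, the config names in the example are illustrative, and only the column logic mirrors the lines visible in the traceback.

```python
from typing import List, Sequence


# Hypothetical helper that mirrors the column-building lines in the traceback,
# but reports an empty result set explicitly instead of failing on features[0].
def build_columns(features: Sequence[Sequence[float]], needed_config: List[str]) -> List[str]:
    if not features:
        raise ValueError(
            "No profiled results to build features from; "
            "check that the backend connection and profiling step produced data."
        )
    cols = needed_config[:]
    extra = len(features[0]) - len(needed_config)
    if extra > 0:  # there are extra features beyond the needed config
        cols += [f"feature_{i}" for i in range(extra)]
    return cols


# Illustrative config names only; real names depend on the kernel type.
print(build_columns([[3, 16, 32, 1, 0.5]], ["HW", "CIN", "COUT"]))

try:
    build_columns([], ["HW", "CIN", "COUT"])
except ValueError as err:
    print(err)
```

With a guard like this, an empty profiling run points at the device connection or profiling step instead of at the DataFrame construction.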

opened by xbwee1024 2

I wonder if nn-Meter can be applied to other network structures?

I modified the original neural models in the code, but it returns "ValueError: Unsupported Model Name: modresnet50_0 in torchvision. Supporting list: resnet18, alexnet, vgg16, squeezenet, densenet161, inception_v3, googlenet, shufflenet_v2, mobilenet_v2, Resnext50_32x4d wide_resnet50_2, mnasnet"

opened by nuts-bottles 2
regarding converting onnx model to graph
Hi, I have two questions after reading the code:

It seems we may not need to first convert the ONNX model to a NetworkX graph (in the function "to_networkx"); the same information could be extracted directly from the ONNX model into the result graph in the "OnnxConverter.convert" method.

In the "to_networkx" function, tensors are added as nodes to the graph G (e.g., "G.add_edge(input_name, node.name)"). This seems to serve no purpose, yet the tensor nodes then have to be skipped when calculating the inbounds/outbounds in the "OnnxConverter.convert" method, as follows:

for succ in self.G.successors(node):
    for succ_succ in self.G.successors(succ): ...

Is there something about this code that I have misunderstood? Many thanks!
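
The two-hop traversal in the question follows from storing tensors as graph nodes: operator, then tensor, then operator. A minimal sketch of the alternative the question hints at follows, computing inbounds and outbounds directly from each operator's input and output tensor names. The plain-dictionary operator format and the name link_operators are assumptions for illustration, not the ONNX or nn-Meter data model.

```python
from collections import defaultdict
from typing import Dict, List


def link_operators(ops: List[Dict]) -> Dict[str, Dict[str, List[str]]]:
    """Connect operators through shared tensor names, without tensor nodes."""
    producer = {}                  # tensor name -> operator that writes it
    consumers = defaultdict(list)  # tensor name -> operators that read it
    for op in ops:
        for tensor in op["outputs"]:
            producer[tensor] = op["name"]
        for tensor in op["inputs"]:
            consumers[tensor].append(op["name"])

    graph = {}
    for op in ops:
        graph[op["name"]] = {
            # Graph inputs and weights have no producer and are skipped.
            "inbounds": [producer[t] for t in op["inputs"] if t in producer],
            "outbounds": [c for t in op["outputs"] for c in consumers[t]],
        }
    return graph


# Toy model: conv feeds relu, and both feed add.
ops = [
    {"name": "conv#0", "inputs": ["x", "w0"], "outputs": ["t0"]},
    {"name": "relu#0", "inputs": ["t0"], "outputs": ["t1"]},
    {"name": "add#0", "inputs": ["t0", "t1"], "outputs": ["y"]},
]
for name, links in link_operators(ops).items():
    print(name, links)
```

Whether this is preferable to the NetworkX route depends on what else the converter uses the graph for, which the question leaves open.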
opened by chencuber 2
Bump tensorflow from 1.15.0 to 2.4.0 in /kerneldetection
Bumps tensorflow from 1.15.0 to 2.4.0.

Release notes

Sourced from tensorflow's releases.

TensorFlow 2.4.0

Release 2.4.0

Major Features and Improvements

tf.distribute introduces experimental support for asynchronous training of models via the tf.distribute.experimental.ParameterServerStrategy API. Please see the tutorial to learn more.

MultiWorkerMirroredStrategy is now a stable API and is no longer considered experimental. Some of the major improvements involve handling peer failure and many bug fixes. Please check out the detailed tutorial on Multi-worker training with Keras.

Introduces experimental support for a new module named tf.experimental.numpy which is a NumPy-compatible API for writing TF programs. See the detailed guide to learn more. Additional details below.

Adds Support for TensorFloat-32 on Ampere based GPUs. TensorFloat-32, or TF32 for short, is a math mode for NVIDIA Ampere based GPUs and is enabled by default.

A major refactoring of the internals of the Keras Functional API has been completed, that should improve the reliability, stability, and performance of constructing Functional models.

Keras mixed precision API tf.keras.mixed_precision is no longer experimental and allows the use of 16-bit floating point formats during training, improving performance by up to 3x on GPUs and 60% on TPUs. Please see below for additional details.

TensorFlow Profiler now supports profiling MultiWorkerMirroredStrategy and tracing multiple workers using the sampling mode API.

TFLite Profiler for Android is available. See the detailed guide to learn more.

TensorFlow pip packages are now built with CUDA11 and cuDNN 8.0.2.

Breaking Changes

TF Core:

Certain float32 ops run in lower precision on Ampere based GPUs, including matmuls and convolutions, due to the use of TensorFloat-32. Specifically, inputs to such ops are rounded from 23 bits of precision to 10 bits of precision. This is unlikely to cause issues in practice for deep learning models. In some cases, TensorFloat-32 is also used for complex64 ops. TensorFloat-32 can be disabled by running tf.config.experimental.enable_tensor_float_32_execution(False).

The byte layout for string tensors across the C-API has been updated to match TF Core/C++; i.e., a contiguous array of tensorflow::tstring/TF_TStrings.

C-API functions TF_StringDecode, TF_StringEncode, and TF_StringEncodedSize are no longer relevant and have been removed; see core/platform/ctstring.h for string access/modification in C.

tensorflow.python, tensorflow.core and tensorflow.compiler modules are now hidden. These modules are not part of TensorFlow public API.

tf.raw_ops.Max and tf.raw_ops.Min no longer accept inputs of type tf.complex64 or tf.complex128, because the behavior of these ops is not well defined for complex types.

XLA:CPU and XLA:GPU devices are no longer registered by default. Use TF_XLA_FLAGS=--tf_xla_enable_xla_devices if you really need them, but this flag will eventually be removed in subsequent releases.

tf.keras:

The steps_per_execution argument in model.compile() is no longer experimental; if you were passing experimental_steps_per_execution, rename it to steps_per_execution in your code. This argument controls the number of batches to run during each tf.function call when calling model.fit(). Running multiple batches inside a single tf.function call can greatly improve performance on TPUs or small models with a large Python overhead.

A major refactoring of the internals of the Keras Functional API may affect code that is relying on certain internal details:

Code that uses isinstance(x, tf.Tensor) instead of tf.is_tensor when checking Keras symbolic inputs/outputs should switch to using tf.is_tensor.

Code that is overly dependent on the exact names attached to symbolic tensors (e.g. assumes there will be ":0" at the end of the inputs, treats names as unique identifiers instead of using tensor.ref(), etc.) may break.

Code that uses full path for get_concrete_function to trace Keras symbolic inputs directly should switch to building matching tf.TensorSpecs directly and tracing the TensorSpec objects.

Code that relies on the exact number and names of the op layers that TensorFlow operations were converted into may have changed.

Code that uses tf.map_fn/tf.cond/tf.while_loop/control flow as op layers and happens to work before TF 2.4. These will explicitly be unsupported now. Converting these ops to Functional API op layers was unreliable before TF 2.4, and prone to erroring incomprehensibly or being silently buggy.

Code that directly asserts on a Keras symbolic value in cases where ops like tf.rank used to return a static or symbolic value depending on if the input had a fully static shape or not. Now these ops always return symbolic values.

Code already susceptible to leaking tensors outside of graphs becomes slightly more likely to do so now.

Code that tries directly getting gradients with respect to symbolic Keras inputs/outputs. Use GradientTape on the actual Tensors passed to the already-constructed model instead.

Code that requires very tricky shape manipulation via converted op layers in order to work, where the Keras symbolic shape inference proves insufficient.

Code that tries manually walking a tf.keras.Model layer by layer and assumes layers only ever have one positional argument. This assumption doesn't hold true before TF 2.4 either, but is more likely to cause issues now.

... (truncated)

Commits

582c8d2 Merge pull request #44220 from tensorflow-jenkins/relnotes-2.4.0rc0-18048

c16387f Update RELEASE.md

4cf406c Update RELEASE.md

3f35ef2 Update RELEASE.md

3647e8e Update RELEASE.md

281c7d5 Update RELEASE.md

91ec75f Update RELEASE.md

ed5ad82 Update RELEASE.md

1267bba Update RELEASE.md

13a4067 Update RELEASE.md

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 2
How to generate material/testmodels/mobilenetv3small_0.pb

Hi, I would like to generate a TF2 frozen pb model (such as material/testmodels/mobilenetv3small_0.pb), but my models (generated by TF 2.6.0 and TF 2.7.0) can't be converted to nnm-ir. I found that dataset/generator/generate_model.py is used to generate the Keras h5 models. Would it be possible to release the reference code for generating a TF2 frozen pb model? Also, which TF version was used to generate material/testmodels/mobilenetv3small_0.pb?

Thanks & Regards, X. Zhang

opened by AIxyz 1
Cannot open register_and_connect_backend.ipynb

Hi, I'm trying to create a new predictor for my edge device. The example notebook for register and connect backend (https://github.com/microsoft/nn-Meter/blob/main/examples/nn-meter_builder_examples/register_and_connect_backend.ipynb) is invalid.

The file size is also 0 KB. Please update the file.

Thanks, Ramson Jehu K

opened by Ramsonjehu 1
fusion_rules not match

In the fusion_rules.json file of your open-sourced cortexA76cpu_tflite21 predictor, I found:

"RBC": { "latency": {}, "obey": null },
"CBC": { "latency": {}, "obey": null },
"BF_bn_relu": { "obey": true },
"BF_conv_bn": { "obey": true },
"BF_dwconv_bn": { "obey": true },
"BF_conv_bn_relu": { "obey": true },
"BF_dwconv_bn_relu": { "obey": true }

Do you mean that these kernels are naturally fused and hence do not need to be detected? Why is that? Do I need to add all of them to my own detected_fusion_rule.json?

What's more, in the fusion_rules.json, the fusion rules you seem to have detected are:

se_relu, pooling_reshape, dense_concat, dense_add, dense_relu, concat_dense, conv_pooling, conv_relu, add_relu, relu_dense, relu_relu, dwconv_relu, reshape_convtrans, reshape_conv, reshape_relu, reshape_reshape.

But on this page, you state that the CPU kernels are:

conv-relu, fc, maxpool, global-avgpool, fc-relu, concat, avgpool, conv-bn-relu, bn-relu, conv, SE-relu, conv-bn, dwconv-bn, dwconv-bn-relu, add, hswish, SE, conv-bn-bn-relu, relu, add-relu, channelshuffle, split.

These two lists are very different. Why? What confuses me even more is that the kernels in the default predictorbuild_config.yaml are:

conv-bn-relu, dwconv-bn-relu, maxpool, avgpool, fc, concat, split, channelshuffle, se, global-avgpool, bnrelu, bn, hswish, relu, addrelu, add.

Although you mention on the same page that one conv-bn-relu kernel can represent all conv-related kernels, the rest still differ somewhat from the CPU kernels you list. Does this mean other kernels can be merged into a general kernel as well? If so, what is the rule for that?

I am asking because I want to reproduce your results on an A78 CPU, but the fusion rules I detect are very different from yours, and I don't know how to proceed. Should I implement all the new kernels, or should I merge some of them? My detected_fusion_rule.json results on the A78 CPU are:

add_avgpool, add_concat, add_conv, add_relu, avgpool_add, avgpool_concat, avgpool_relu, avgpool_reshape, concat_convtrans, concat_fc, conv_relu, convtrans_relu, fc_relu, dwconv_add, dwconv_relu, relu_avgpool, relu_concat, relu_conv, relu_convtrans, relu_relu, reshape_concat, reshape_conv, reshape_convtrans, reshape_dwconv, reshape_reshape
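
One way to make the mismatch concrete is to compare the rule names as plain sets. A minimal sketch follows, using only the rule names quoted in this issue. The helper names obeyed_rules and compare are hypothetical, and reading "obey": true as "the backend follows this fusion rule" is an assumption taken from the snippet above rather than from nn-Meter documentation.

```python
import json


def obeyed_rules(rules: dict) -> list:
    """Names of rules whose "obey" field is true (null and false are skipped)."""
    return sorted(name for name, info in rules.items() if info.get("obey") is True)


def compare(reference: set, detected: set) -> dict:
    """Split two rule-name sets into shared and one-sided entries."""
    return {
        "common": sorted(reference & detected),
        "only_reference": sorted(reference - detected),
        "only_detected": sorted(detected - reference),
    }


# Entries quoted from the published cortexA76cpu_tflite21 fusion_rules.json.
snippet = json.loads("""
{"RBC": {"latency": {}, "obey": null}, "CBC": {"latency": {}, "obey": null},
 "BF_bn_relu": {"obey": true}, "BF_conv_bn": {"obey": true},
 "BF_dwconv_bn": {"obey": true}, "BF_conv_bn_relu": {"obey": true},
 "BF_dwconv_bn_relu": {"obey": true}}
""")
print(obeyed_rules(snippet))

# Rule names listed in the issue for the published A76 predictor ...
a76 = set("""se_relu pooling_reshape dense_concat dense_add dense_relu concat_dense
conv_pooling conv_relu add_relu relu_dense relu_relu dwconv_relu reshape_convtrans
reshape_conv reshape_relu reshape_reshape""".split())
# ... and the names detected by the issue author on an A78 CPU.
a78 = set("""add_avgpool add_concat add_conv add_relu avgpool_add avgpool_concat
avgpool_relu avgpool_reshape concat_convtrans concat_fc conv_relu convtrans_relu
fc_relu dwconv_add dwconv_relu relu_avgpool relu_concat relu_conv relu_convtrans
relu_relu reshape_concat reshape_conv reshape_convtrans reshape_dwconv
reshape_reshape""".split())

diff = compare(a76, a78)
print("common:", diff["common"])
print("only in A76 list:", len(diff["only_reference"]))
print("only in A78 list:", len(diff["only_detected"]))
```

Part of the difference may come from naming alone (for example dense versus fc, or pooling versus avgpool), so a name-level comparison like this is only a starting point.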

opened by XYAskWhy 1
Cannot create workspace for customized platform

Why does nn-meter create --customized-workspace "C:\Users\HP\Desktop\nn meter builder1" --backend "1660s" show the following?

Traceback (most recent call last):
  File "c:\programdata\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\programdata\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Anaconda3\Scripts\nn-meter.exe\__main__.py", line 7, in <module>
  File "c:\programdata\anaconda3\lib\site-packages\nn_meter\utils\nn_meter_cli\interface.py", line 266, in nn_meter_cli
    args.func(args)
  File "c:\programdata\anaconda3\lib\site-packages\nn_meter\utils\nn_meter_cli\builder.py", line 76, in create_workspace_cli
    raise ValueError(f"Create workspace failed. Please check the backend registration information.")
ValueError: Create workspace failed. Please check the backend registration information.

What went wrong?

for customized platform

nn-meter create --customized-workspace <path/to/place/workspace/> --backend

opened by Omar10092 0
Bump tensorflow from 2.7.2 to 2.9.3 in /docs/requirements
Bumps tensorflow from 2.7.2 to 2.9.3.

Release notes

Sourced from tensorflow's releases.

TensorFlow 2.9.3

Release 2.9.3

This release introduces several vulnerability fixes:

Fixes an overflow in tf.keras.losses.poisson (CVE-2022-41887)

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

Fixes a heap OOB in FractionalAvgPool and FractionalMaxPool(CVE-2022-41900)

Fixes a CHECK_EQ in SparseMatrixNNZ (CVE-2022-41901)

Fixes an OOB write in grappler (CVE-2022-41902)

Fixes an overflow in ResizeNearestNeighborGrad (CVE-2022-41907)

Fixes a CHECK fail in PyFunc (CVE-2022-41908)

Fixes a segfault in CompositeTensorVariantToComponents (CVE-2022-41909)

Fixes an invalid char to bool conversion in printing a tensor (CVE-2022-41911)

Fixes a heap overflow in QuantizeAndDequantizeV2 (CVE-2022-41910)

Fixes a CHECK failure in SobolSample via missing validation (CVE-2022-35935)

Fixes a CHECK fail in TensorListScatter and TensorListScatterV2 in eager mode (CVE-2022-35935)

TensorFlow 2.9.2

Release 2.9.2

This release introduces several vulnerability fixes:

Fixes a CHECK failure in tf.reshape caused by overflows (CVE-2022-35934)

Fixes a CHECK failure in SobolSample caused by missing validation (CVE-2022-35935)

Fixes an OOB read in Gather_nd op in TF Lite (CVE-2022-35937)

Fixes a CHECK failure in TensorListReserve caused by missing validation (CVE-2022-35960)

Fixes an OOB write in Scatter_nd op in TF Lite (CVE-2022-35939)

Fixes an integer overflow in RaggedRangeOp (CVE-2022-35940)

Fixes a CHECK failure in AvgPoolOp (CVE-2022-35941)

Fixes a CHECK failure in UnbatchGradOp (CVE-2022-35952)

Fixes a segfault in the TFLite converter on per-channel quantized transposed convolutions (CVE-2022-36027)

Fixes a CHECK failure in AvgPool3DGrad (CVE-2022-35959)

Fixes a CHECK failure in FractionalAvgPoolGrad (CVE-2022-35963)

Fixes a segfault in BlockLSTMGradV2 (CVE-2022-35964)

Fixes a segfault in LowerBound and UpperBound (CVE-2022-35965)

... (truncated)

Changelog

Sourced from tensorflow's changelog.

Release 2.8.4

This release introduces several vulnerability fixes:

Fixes a heap OOB failure in ThreadUnsafeUnigramCandidateSampler caused by missing validation (CVE-2022-41880)

Fixes a segfault in ndarray_tensor_bridge (CVE-2022-41884)

Fixes an overflow in FusedResizeAndPadConv2D (CVE-2022-41885)

Fixes an overflow in ImageProjectiveTransformV2 (CVE-2022-41886)

Fixes an FPE in tf.image.generate_bounding_box_proposals on GPU (CVE-2022-41888)

Fixes a segfault in pywrap_tfe_src caused by invalid attributes (CVE-2022-41889)

Fixes a CHECK fail in BCast (CVE-2022-41890)

Fixes a segfault in TensorListConcat (CVE-2022-41891)

Fixes a CHECK_EQ fail in TensorListResize (CVE-2022-41893)

Fixes an overflow in CONV_3D_TRANSPOSE on TFLite (CVE-2022-41894)

Fixes a heap OOB in MirrorPadGrad (CVE-2022-41895)

Fixes a crash in Mfcc (CVE-2022-41896)

Fixes a heap OOB in FractionalMaxPoolGrad (CVE-2022-41897)

Fixes a CHECK fail in SparseFillEmptyRowsGrad (CVE-2022-41898)

Fixes a CHECK fail in SdcaOptimizer (CVE-2022-41899)

... (truncated)

Commits

a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2

258f9a1 Update py_func.cc

cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474

3e75385 Update version numbers to 2.9.3

bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695

3506c90 Update RELEASE.md

8dcb48e Update RELEASE.md

4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...

6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple

5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

Releases (v2.0)

v2.0 (Jun 27, 2022)
Major Updates

Building tools are coming! nn-Meter provides building tools for users to build latency predictors for their own devices (#43, #59, #66)

Provide a unified interface to connect with TFLite and OpenVINO platforms, and support users connecting their own devices.

Support detection of operator fusion rules on the target backend, and let users design new test cases.

Provide tools to build kernel latency predictors for several built-in kernels or user-customized kernels.

Support both TensorFlow and PyTorch implementations of fusion rule test cases and kernels.

Provide examples for using nn-Meter building tools.

Minor Updates & Bug Fixes

Add quick start tutorials for users to get started (#58)

Add support for torch v1.10, tensorflow v2.7, and nni v2.7 (#43)

Fix bugs in torch converter and kernel detector (#47, #49)

Fix bugs in shape parsing of global avgpool and se operator in onnx converter (#60)

Source code (tar.gz)
Source code (zip)
v2.0-data (Mar 7, 2022)

The TFLite benchmark tools for tensorflow==2.1 and tensorflow==2.7, for use with the nn-Meter builder.
Source code (tar.gz)
Source code (zip)
tflite_benchmark_tools_v2.1.zip (3.02 MB)
tflite_benchmark_tools_v2.7.zip (26.58 MB)
v1.1 (Nov 16, 2021)
Major Updates

Add nn-Meter Bench Dataset (#25)

Add GNN dataloader for nn-Meter Bench Dataset (#27)

Support torch v1.9, tensorflow v2.6, nni v2.5 (#36)

Add notebook examples for nn-Meter usage (#26, #29)

Minor Updates & Bug Fixes

Support hardware latency prediction for ProxylessNAS in NNI (https://github.com/microsoft/nni/pull/4206)

Refine shape attributes to sync with NNI (fix issue https://github.com/microsoft/nni/issues/4198, PR #30, #33)

Refactor of nn-Meter Project (#41)

Source code (tar.gz)
Source code (zip)
v1.0 (Sep 1, 2021)
Release 1.0 - 9/1/2021 (initial release)

Initial release of nn-Meter.

Major Features

Support installation via pip and from source

Support latency prediction for a CNN model with a predefined predictor (edge device)

Provide command line interface nn-meter after installation, and python binding module nn_meter

Provide Docs

Known Issues

Synchronization with NNI: a stable NNI-based torch converter relies on NNI>=2.5.

Cannot support torch.jit._overload_method due to issues in torch.

Source code (tar.gz)
Source code (zip)
v1.0-data (Aug 4, 2021)

Source code (tar.gz)
Source code (zip)
adreno630gpu_tflite21.zip (374.63 MB)
adreno640gpu_tflite21.zip (294.53 MB)
cortexA76cpu_tflite21.zip (358.96 MB)
datasets.zip (109.22 MB)
ir_graphs.zip (101.47 KB)
myriadvpu_openvino2019r2.zip (601.49 MB)
onnx_models.zip (935.92 MB)
pb_models.zip (951.21 MB)

Owner

Microsoft
