ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Overview

ONNX Runtime is a cross-platform inference and training machine-learning accelerator.

ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms. Learn more →
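For example, a minimal Python inference sketch (the model path and input shape here are placeholders; any exported ONNX model works the same way):

    import numpy as np
    import onnxruntime as ort

    # Create a session; pass the execution providers you want, in priority order.
    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

    # Look up the model's input name, then run inference on a random tensor.
    input_name = session.get_inputs()[0].name
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
    outputs = session.run(None, {input_name: x})
    print(outputs[0].shape)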

ONNX Runtime training can accelerate the model training time on multi-node NVIDIA GPUs for transformer models with a one-line addition for existing PyTorch training scripts. Learn more →
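The one-line addition looks roughly like this (a sketch using the torch-ort package; the model, batch, and optimizer are placeholders standing in for your existing script):

    import torch
    from torch_ort import ORTModule

    model = build_my_transformer()      # placeholder: your existing PyTorch model
    model = ORTModule(model)            # the one-line change

    # The rest of the training loop stays exactly as it was.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss = model(batch).loss            # placeholder forward pass
    loss.backward()
    optimizer.step()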

Get Started

General Information: onnxruntime.ai

Usage documentation and tutorials: onnxruntime.ai/docs

Companion sample repositories:

• ONNX Runtime Inferencing: microsoft/onnxruntime-inference-examples
• ONNX Runtime Training: microsoft/onnxruntime-training-examples

Build Pipeline Status

System        CPU            GPU            EPs
Windows       Build Status   Build Status   Build Status
Linux         Build Status (multiple CPU, GPU, and EP pipelines)
Mac           Build Status (multiple pipelines)
Android       Build Status
iOS           Build Status
WebAssembly   Build Status

Data/Telemetry

Windows distributions of this project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.
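If you prefer to opt out programmatically, recent Python packages expose a toggle for this (a sketch; confirm that your installed version provides the function):

    import onnxruntime as ort

    # Disable telemetry collection for this process.
    ort.disable_telemetry_events()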

Contributions and Feedback

We welcome contributions! Please see the contribution guidelines.

For feature requests or bug reports, please file a GitHub Issue.

For general discussion or questions, please use GitHub Discussions.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

License

This project is licensed under the MIT License.

Comments
  • Openvino ep 2021.4 v3.3

    Openvino ep 2021.4 v3.3

Changes enabled in the OpenVINO EP for IO buffer optimization; enables the Auto Plugin feature

    Motivation and Context

• Change was required to enable IO buffer optimization
• Change was required to enable the Auto Plugin and fix the Multi and Hetero flows
• Adds an ONNX Runtime API to get the device location for an ORT Value tensor
    opened by sfatimar 79
  • Java API for onnxruntime

    Java API for onnxruntime

Description: This pull request provides a Java 8 API using JNI. It has unit tests ported from the v0.5.0 release of the C# API; I'll work on porting the new tests from the master branch over the next few weeks. I assume there will be some design & naming discussion on this PR, so we can have that while I work on the unit tests.

Currently it builds using a separate Gradle project, which I've tested on Mac & Linux. The build process involves running gradle clean build -x test; gradle build, as the combination of a JNI and Java project isn't properly supported in Gradle 5. I could use some help integrating it into the CMake build system, as I've not used CMake much before. Integrating it into CMake will make it simpler to pass in the appropriate provider compilation flags and to fix the oddities in the build (as CMake has all the information necessary).

    opened by Craigacp 75
  • Support CUDA Graph

    Support CUDA Graph

    Description

This PR adds support for CUDA Graphs. This feature can significantly reduce the CPU overhead of calling CUDA APIs by submitting the entire graph to the GPU with a single call to cudaGraphLaunch.

    Motivation and Context

• Why is this change required? What problem does it solve? This feature helps reduce model latency, especially for online inference, where the CPU overhead described above is a bottleneck. For example, it reduces the 95th-percentile latency of a transformer-based online inference model (with 148 million parameters) from 4.3 ms to 2.1 ms. (See the sketch below.)
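In recent ORT releases this capability is exposed as a CUDA execution provider option; below is a minimal sketch (the enable_cuda_graph option and its fixed-shape/IO-binding requirement are as documented for newer versions, not necessarily this PR's exact API, and model.onnx is a placeholder):

    import onnxruntime as ort

    providers = [
        ("CUDAExecutionProvider", {
            "device_id": 0,
            "enable_cuda_graph": "1",  # capture the kernel sequence as a CUDA graph
        }),
    ]
    session = ort.InferenceSession("model.onnx", providers=providers)

    # Note: CUDA graph capture/replay requires fixed input shapes and IO binding,
    # so inputs and outputs keep stable device addresses across runs.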
    opened by feihugis 72
  • Resolve Optim Params Issues

    Resolve Optim Params Issues

    • Includes a test of Optimizer Parameter Groups for the ONNX BERT Model (3 variations)
    • Resolves the issue of not passing default hyperparameters for parameters not in a group
    • Resolves the issue of sending 'lambda_coef' instead of 'lambda' to the backend
    • Resolves the issue of sending lr to the backend as a hyperparameter
    opened by rayankrish 68
  • Upgrade GIST memory compression nodes, kernels, optimizer rule, and cli

    Upgrade GIST memory compression nodes, kernels, optimizer rule, and cli

Description: Extend Gist memory compression to support additional compression formats, the new priority-based execution order, and other upgrades:

    • New Feature: GistPack1 compression. It compresses from float32/bool to 1 bit. It is used for lossless compression for dropout and relu nodes.
    • New Feature: GistPack8 compression. It compresses from 32 bits/16 bits to 8 bits. It is used for lossy compression for any operator.
• New Feature: GistPackMsfp15 compression. It compresses 8 (or tile size) values, each 32 bits wide, to 8 (or tile size) values, each 7 bits wide (sign and mantissa), plus a single shared 8-bit exponent. It is used for lossy compression for any operator.
    • New Feature: GistPack16 compression. It compresses from 32 bits to 16 bits. It is used for lossy compression for any operator.
• We also upgraded the Gist rule to support different operators: the rule is now generic, driven by a pattern map whose keys are target operators and whose values are destination operators (e.g. PATTERN_MAP[Softmax] = {"SoftmaxGrad"}). The rule is operator-agnostic, which makes Gist easy to extend to new operators in the future.
    • New test for Priority execution order for nested compression.
    • Gist upgrade to support priority execution order to trigger encoder (compression) and decoder (decompression) accordingly.
    • Gist CLI: --use_gist, --op <which operator is being targeted, e.g. Softmax is op 1> --gist_compr <GistPack1|GistPack8|GistPack16|GistPackMsfp15>

    Motivation and Context

• Why is this change required? What problem does it solve? It fixes and improves the Gist optimizer rule by changing Gist operators to handle one input and one output, without needing an early encoder input or a late decoder output. It also adds new compression formats (Pack1, Pack8).
    training 
    opened by fninaparavecino 61
  • Multi-stream executor

    Multi-stream executor

Description: This PR includes the following work:

1. Provide stream and related synchronization abstractions in onnxruntime.
2. Enhance onnxruntime's execution planner / executor / memory arena to support executing multiple streams in parallel.
3. Deprecate the parallel executor for CPU.
4. Deprecate the Fence mechanism.
5. Update the CUDA / TensorRT EPs to support the stream mechanism, allowing different requests to run on different CUDA streams.

    Motivation and Context

• Why is this change required? Currently, the execution plan is a linear list of primitives, and ORT executes them step by step: for any given graph, ORT serializes it into one fixed execution order. This sequential design simplifies most scenarios, but it has the following limitations:
1. It is difficult to enable inter-node parallelization; we have a half-baked parallel executor, but it is very difficult to make it work with GPUs.
2. The Fence mechanism works for the single GPU stream + CPU thread case, but when extended to multiple streams, it is difficult to manage the cross-stream synchronization on the GPU.
3. Our CUDA EP relies on the BFCArena to make memory management work with asynchronous GPU kernels, but the current BFCArena is not aware of streams, so it doesn't behave correctly when running with multiple streams.

This PR enhances the existing execution plan and executor to support multi-stream execution, using a unified algorithm to manage both the single-stream and multi-stream scenarios. It mainly focuses on the infrastructure: given a valid stream assignment, onnxruntime can execute it correctly. Generating a good stream assignment for a given model will come in a future PR. (A sketch of issuing concurrent requests from Python follows below.)
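From Python, the user-visible effect is that concurrent Run calls on a single session can be serviced in parallel; a rough sketch (the model path and input shape are placeholders, and whether each request really lands on its own CUDA stream depends on the stream assignment):

    import threading
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
    input_name = session.get_inputs()[0].name

    def worker():
        x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
        session.run(None, {input_name: x})  # InferenceSession.run is thread-safe

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()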

    opened by souptc 60
  • Amdmigraphx fix build error

    Amdmigraphx fix build error

Description: Fixes a build error related to EP API changes.

    Motivation and Context

1. The ORT EP interface changed to use a shared library, and the EP APIs changed; AMD MIGraphX needs corresponding changes to keep working as an EP.
2. Added a few operators that AMD MIGraphX implemented recently.
• Why is this change required? What problem does it solve? See the explanation above.

• If it fixes an open issue, please link to the issue here. No

    opened by scxiao 60
  • Python MacOS arm64 release binaries

    Python MacOS arm64 release binaries

    Describe the bug

    ONNX Runtime does not install using pip on M1.

    System information

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 11.2.1
    • ONNX Runtime installed from (source or binary): pip
    • Python version: 3.9.1

    To Reproduce

    ~: uname -v
    Darwin Kernel Version 20.3.0: Thu Jan 21 00:06:51 PST 2021; root:xnu-7195.81.3~1/RELEASE_ARM64_T8101
    ~: which python3
    /opt/homebrew/bin/python3
    ~: which pip
    /opt/homebrew/bin/pip
    ~: python3 --version
    Python 3.9.1
    ~: pip install onnxruntime
    ERROR: Could not find a version that satisfies the requirement onnxruntime
    ERROR: No matching distribution found for onnxruntime
    
    feature request 
    opened by lutzroeder 59
  • Bump numpy from 1.21.0 to 1.22.0 in /tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.11.0_rocm4.3.1

    Bump numpy from 1.21.0 to 1.22.0 in /tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/requirements_torch1.11.0_rocm4.3.1

    Bumps numpy from 1.21.0 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
• A preliminary version of the proposed Array API is provided. This is a step in creating a standard collection of functions that can be used across applications such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)
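A quick illustration of the new behavior (NumPy >= 1.22):

    import numpy as np

    try:
        np.dtype("Uint32")      # removed numeric-style alias
    except TypeError as err:
        print("TypeError:", err)

    print(np.dtype("uint32"))   # the canonical lowercase spelling still works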

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    api 
    opened by dependabot[bot] 55
  • [Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader

    [Java] Adds support for DNNL, OpenVINO, TensorRT shared providers and refactors the CUDA shared provider loader

    Description:

    Refactors the native library loading in Java to allow CUDA to be loaded on demand, fixing #7044. Then expands the shared provider library loading to DNNL, OpenVINO, TensorRT, fixing #6553.

    Added a flag to the native library loading to allow users to supply a directory which contains all the native libraries, fixing #8003. This is also the only way to make the shared library providers load from a different place than the jar, as the individual library path specification conflicts with the way that the ONNX Runtime native code loads the shared library providers.

    I also slightly refactored the Java cmake bits, and added the --console=plain flag to the gradle executions to stop gradle writing over cmake's output.

    Motivation and Context

    • Why is this change required? What problem does it solve? Re-enables DNNL, OpenVINO and TensorRT in Java by allowing them to be packaged in the jar and dynamically loaded in the same way CUDA is.
    • If it fixes an open issue, please link to the issue here. Fixes #6553. Fixes #7044. Fixes #8003.
    opened by Craigacp 54
  • Jetson Xavier - building from source

    Jetson Xavier - building from source

1. I tried the solution proposed here:

    ../build.sh --config Release --update --build --build_wheel --use_tensorrt --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --tensorrt_home /usr/lib/aarch64-linux-gnu
    2020-02-14 14:34:50,960 Build [INFO] - Build started
    2020-02-14 14:34:50,960 Build [DEBUG] - Running subprocess in '/code/onnxruntime' ['git', 'submodule', 'sync', '--recursive']
    Synchronizing submodule url for 'cmake/external/DNNLibrary'
    [~70 similar "Synchronizing submodule url" lines for the remaining submodules omitted]
    2020-02-14 14:34:52,305 Build [DEBUG] - Running subprocess in '/code/onnxruntime' ['git', 'submodule', 'update', '--init', '--recursive']
    2020-02-14 14:34:54,502 Build [INFO] - Generating CMake build tree
    2020-02-14 14:34:54,504 Build [DEBUG] - Running subprocess in '/code/onnxruntime/build/Linux/Release' ['/usr/local/bin/cmake', '/code/onnxruntime/cmake', '-Donnxruntime_RUN_ONNX_TESTS=OFF', '-Donnxruntime_GENERATE_TEST_REPORTS=ON', '-Donnxruntime_DEV_MODE=OFF', '-DPYTHON_EXECUTABLE=/usr/bin/python3', '-Donnxruntime_USE_CUDA=ON', '-Donnxruntime_USE_NSYNC=OFF', '-Donnxruntime_CUDNN_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_USE_AUTOML=OFF', '-Donnxruntime_CUDA_HOME=/usr/local/cuda', '-Donnxruntime_USE_JEMALLOC=OFF', '-Donnxruntime_USE_MIMALLOC=OFF', '-Donnxruntime_ENABLE_PYTHON=ON', '-Donnxruntime_BUILD_CSHARP=OFF', '-Donnxruntime_BUILD_SHARED_LIB=OFF', '-Donnxruntime_USE_EIGEN_FOR_BLAS=ON', '-Donnxruntime_USE_OPENBLAS=OFF', '-Donnxruntime_USE_MKLDNN=OFF', '-Donnxruntime_USE_MKLML=OFF', '-Donnxruntime_USE_GEMMLOWP=OFF', '-Donnxruntime_USE_NGRAPH=OFF', '-Donnxruntime_USE_OPENVINO=OFF', '-Donnxruntime_USE_OPENVINO_BINARY=OFF', '-Donnxruntime_USE_OPENVINO_SOURCE=OFF', '-Donnxruntime_USE_OPENVINO_MYRIAD=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_GPU_FP16=OFF', '-Donnxruntime_USE_OPENVINO_CPU_FP32=OFF', '-Donnxruntime_USE_OPENVINO_VAD_M=OFF', '-Donnxruntime_USE_OPENVINO_VAD_F=OFF', '-Donnxruntime_USE_NNAPI=OFF', '-Donnxruntime_USE_OPENMP=ON', '-Donnxruntime_USE_TVM=OFF', '-Donnxruntime_USE_LLVM=OFF', '-Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF', '-Donnxruntime_USE_BRAINSLICE=OFF', '-Donnxruntime_USE_NUPHAR=OFF', '-Donnxruntime_USE_EIGEN_THREADPOOL=OFF', '-Donnxruntime_USE_TENSORRT=ON', '-Donnxruntime_TENSORRT_HOME=/usr/lib/aarch64-linux-gnu', '-Donnxruntime_CROSS_COMPILING=OFF', '-Donnxruntime_BUILD_SERVER=OFF', '-Donnxruntime_BUILD_x86=OFF', '-Donnxruntime_USE_FULL_PROTOBUF=ON', '-Donnxruntime_DISABLE_CONTRIB_OPS=OFF', '-Donnxruntime_MSVC_STATIC_RUNTIME=OFF', '-Donnxruntime_ENABLE_LANGUAGE_INTEROP_OPS=OFF', '-Donnxruntime_USE_DML=OFF', '-DCUDA_CUDA_LIBRARY=/usr/local/cuda/lib64/stubs', '-Donnxruntime_PYBIND_EXPORT_OPSCHEMA=OFF', '-DCMAKE_BUILD_TYPE=Release']
    Use gtest from submodule
    -- Found PythonInterp: /usr/bin/python3 (found version "3.6.9")
    -- Found PythonInterp: /usr/bin/python3 (found suitable version "3.6.9", minimum required is "3.5")
    Use protobuf from submodule
    -- The CUDA compiler identification is NVIDIA 10.0.326
    -- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc
    -- Check for working CUDA compiler: /usr/local/cuda-10.0/bin/nvcc - broken
    CMake Error at /usr/local/share/cmake-3.17/Modules/CMakeTestCUDACompiler.cmake:46 (message):
      The CUDA compiler

      "/usr/local/cuda-10.0/bin/nvcc"

    is not able to compile a simple test program.

    It fails with the following output:

    Change Dir: /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/usr/bin/make cmTC_bb43d/fast && /usr/bin/make -f CMakeFiles/cmTC_bb43d.dir/build.make CMakeFiles/cmTC_bb43d.dir/build
    make[1]: Entering directory '/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp'
    Building CUDA object CMakeFiles/cmTC_bb43d.dir/main.cu.o
    /usr/local/cuda-10.0/bin/nvcc    -cudart shared  -Xcompiler=-fPIE   -x cu -c /code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_bb43d.dir/main.cu.o
    Linking CUDA executable cmTC_bb43d
    /usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_bb43d.dir/link.txt --verbose=1
    /usr/bin/g++   CMakeFiles/cmTC_bb43d.dir/main.cu.o -o cmTC_bb43d  -lcudadevrt -lcudart_static  -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib/stubs" -L"/usr/local/cuda-10.0/targets/aarch64-linux/lib" -lcudadevrt -lcudart
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::initializeDriverEntrypoints()':
    :(.text+0x23488): undefined reference to `dlsym'
    :(.text+0x234b0): undefined reference to `dlsym'
    :(.text+0x234d4): undefined reference to `dlsym'
    :(.text+0x234f8): undefined reference to `dlsym'
    :(.text+0x2351c): undefined reference to `dlsym'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o)::(.text+0x23540): more undefined references to `dlsym' follow
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::loadDriverInternal()':
    :(.text+0x288cc): undefined reference to `dlopen'
    :(.text+0x28904): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::__loadDriverInternalUtil()':
    :(.text+0x289e0): undefined reference to `dlopen'
    :(.text+0x28a14): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::globalState::initializeDriverInternal()':
    :(.text+0x2b664): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInit()':
    :(.text+0x5c7bc): undefined reference to `dlerror'
    :(.text+0x5c7c8): undefined reference to `dlopen'
    :(.text+0x5c7dc): undefined reference to `dlsym'
    :(.text+0x5c7e4): undefined reference to `dlerror'
    :(.text+0x5c7f4): undefined reference to `dlclose'
    :(.text+0x5c838): undefined reference to `dlerror'
    :(.text+0x5c844): undefined reference to `dlopen'
    :(.text+0x5c858): undefined reference to `dlsym'
    :(.text+0x5c860): undefined reference to `dlerror'
    :(.text+0x5c870): undefined reference to `dlclose'
    :(.text+0x5c8b4): undefined reference to `dlerror'
    :(.text+0x5c8c0): undefined reference to `dlopen'
    :(.text+0x5c8d4): undefined reference to `dlsym'
    :(.text+0x5c8dc): undefined reference to `dlerror'
    :(.text+0x5c8ec): undefined reference to `dlclose'
    :(.text+0x5c930): undefined reference to `dlerror'
    :(.text+0x5c93c): undefined reference to `dlopen'
    :(.text+0x5c950): undefined reference to `dlsym'
    :(.text+0x5c958): undefined reference to `dlerror'
    :(.text+0x5c968): undefined reference to `dlclose'
    :(.text+0x5c9a0): undefined reference to `dlerror'
    :(.text+0x5c9ac): undefined reference to `dlopen'
    :(.text+0x5c9c0): undefined reference to `dlsym'
    :(.text+0x5c9c8): undefined reference to `dlerror'
    :(.text+0x5c9d8): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreCreate(sem_t*, int)':
    :(.text+0x5d910): undefined reference to `sem_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreDestroy(sem_t*)':
    :(.text+0x5d92c): undefined reference to `sem_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreWait(sem_t*, unsigned int)':
    :(.text+0x5da10): undefined reference to `sem_timedwait'
    :(.text+0x5da48): undefined reference to `sem_wait'
    :(.text+0x5da60): undefined reference to `sem_trywait'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSemaphoreSignal(sem_t*)':
    :(.text+0x5dab0): undefined reference to `sem_post'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosVirtualReserveInRangeBug1778973WARInit()':
    :(.text+0x5f448): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5f464): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5f474): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5f484): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5f4a4): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosPosixInit()':
    :(.text+0x5f4f0): undefined reference to `dlerror'
    :(.text+0x5f4fc): undefined reference to `dlopen'
    :(.text+0x5f510): undefined reference to `dlsym'
    :(.text+0x5f518): undefined reference to `dlerror'
    :(.text+0x5f528): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosVirtualReserveInRange(unsigned long, void*, void*, unsigned long)':
    :(.text+0x5f768): undefined reference to `pthread_once'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosLoadLibrary(char const*)':
    :(.text+0x5fc8c): undefined reference to `dlerror'
    :(.text+0x5fca0): undefined reference to `dlopen'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosLoadLibraryUnsafe(char const*)':
    :(.text+0x5fcb4): undefined reference to `dlerror'
    :(.text+0x5fcc8): undefined reference to `dlopen'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosFreeLibrary(void*)':
    :(.text+0x5fcd4): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosGetProcAddress(void*, char const*)':
    :(.text+0x5fce8): undefined reference to `dlsym'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsAlloc(void (*)(void*))':
    :(.text+0x5fdec): undefined reference to `pthread_key_create'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsFree(unsigned int)':
    :(.text+0x5fe10): undefined reference to `pthread_key_delete'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsGetValue(unsigned int)':
    :(.text+0x5fe18): undefined reference to `pthread_getspecific'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTlsSetValue(unsigned int, void*)':
    :(.text+0x5fe28): undefined reference to `pthread_setspecific'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSectionWithSharedFlag(pthread_mutex_t*, int)':
    :(.text+0x5fef4): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5ff14): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5ff24): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5ff34): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5ff50): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSection(pthread_mutex_t*)':
    :(.text+0x5ff70): undefined reference to `pthread_mutexattr_init'
    :(.text+0x5ff8c): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x5ff9c): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x5ffac): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x5ffc8): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitializeCriticalSectionShared(pthread_mutex_t*)':
    :(.text+0x5ffe8): undefined reference to `pthread_mutexattr_init'
    :(.text+0x60004): undefined reference to `pthread_mutexattr_settype'
    :(.text+0x60014): undefined reference to `pthread_mutexattr_setpshared'
    :(.text+0x60024): undefined reference to `pthread_mutexattr_setprotocol'
    :(.text+0x60040): undefined reference to `pthread_mutexattr_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryEnterCriticalSection(pthread_mutex_t*)':
    :(.text+0x60058): undefined reference to `pthread_mutex_trylock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitRWLockEx(void**, void*, unsigned long)':
    :(.text+0x600b4): undefined reference to `pthread_rwlockattr_init'
    :(.text+0x600c4): undefined reference to `pthread_rwlockattr_setpshared'
    :(.text+0x600d4): undefined reference to `pthread_rwlock_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosInitRWLock(void**)':
    :(.text+0x60114): undefined reference to `pthread_rwlockattr_init'
    :(.text+0x60144): undefined reference to `pthread_rwlockattr_setpshared'
    :(.text+0x60154): undefined reference to `pthread_rwlock_init'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosAcquireReaderLock(void**)':
    :(.text+0x60164): undefined reference to `pthread_rwlock_rdlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosAcquireWriterLock(void**)':
    :(.text+0x6016c): undefined reference to `pthread_rwlock_wrlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryAcquireReaderLock(void**)':
    :(.text+0x6017c): undefined reference to `pthread_rwlock_tryrdlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosTryAcquireWriterLock(void**)':
    :(.text+0x601a4): undefined reference to `pthread_rwlock_trywrlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosReleaseReaderLock(void**)':
    :(.text+0x601c4): undefined reference to `pthread_rwlock_unlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosReleaseWriterLock(void**)':
    :(.text+0x601cc): undefined reference to `pthread_rwlock_unlock'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosDestroyRWLockEx(void**)':
    :(.text+0x601d4): undefined reference to `pthread_rwlock_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosDestroyRWLock(void**)':
    :(.text+0x601ec): undefined reference to `pthread_rwlock_destroy'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosOnce(int*, void (*)())':
    :(.text+0x60210): undefined reference to `pthread_once'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreateWithSharedFlag(pthread_cond_t*, int)':
    :(.text+0x60250): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreate(pthread_cond_t*)':
    :(.text+0x602b0): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosCondCreateShared(pthread_cond_t*)':
    :(.text+0x60310): undefined reference to `pthread_condattr_setpshared'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadCreateWithName(cudart::CUOSthread_st**, int (*)(void*), void*, char const*)':
    :(.text+0x60564): undefined reference to `pthread_create'
    :(.text+0x60578): undefined reference to `pthread_setname_np'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadCreate(cudart::CUOSthread_st**, int (*)(void*), void*)':
    :(.text+0x60640): undefined reference to `pthread_create'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadJoin(cudart::CUOSthread_st*, int*)':
    :(.text+0x606a8): undefined reference to `pthread_join'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosThreadDetach(cudart::CUOSthread_st*)':
    :(.text+0x60708): undefined reference to `pthread_detach'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosHasThreadExited(cudart::CUOSthread_st*)':
    :(.text+0x60758): undefined reference to `pthread_kill'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmCreateNamedEx(void*, char const*, unsigned long, cudart::cuosShmInfoEx_st**)':
    :(.text+0x60ee0): undefined reference to `shm_unlink'
    :(.text+0x60ef8): undefined reference to `shm_open'
    :(.text+0x60f98): undefined reference to `shm_unlink'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmOpenNamedEx(void*, char const*, unsigned long, cudart::cuosShmInfoEx_st**)':
    :(.text+0x61124): undefined reference to `shm_open'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosShmCloseEx(cudart::cuosShmInfoEx_st*, unsigned int, unsigned int)':
    :(.text+0x61370): undefined reference to `shm_unlink'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `cudart::cuosSetThreadName(cudart::CUOSthread_st*, char const*)':
    :(.text+0x62294): undefined reference to `pthread_setname_np'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(int, sockaddr*, unsigned int*, int)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFiiP8sockaddrPjiEED2Ev[_ZN15CUOSdlsymLoaderIPFiiP8sockaddrPjiEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(int*, int)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFiPiiEED2Ev[_ZN15CUOSdlsymLoaderIPFiPiiEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(unsigned long, unsigned long, unsigned long const*)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFimmPKmEED2Ev[_ZN15CUOSdlsymLoaderIPFimmPKmEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)(unsigned long, unsigned long, unsigned long*)>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFimmPmEED2Ev[_ZN15CUOSdlsymLoaderIPFimmPmEED5Ev]+0x18): undefined reference to `dlclose'
    /usr/local/cuda-10.0/targets/aarch64-linux/lib/libcudart_static.a(libcudart_static.a.o): In function `CUOSdlsymLoader<int (*)()>::~CUOSdlsymLoader()':
    :(.text._ZN15CUOSdlsymLoaderIPFivEED2Ev[_ZN15CUOSdlsymLoaderIPFivEED5Ev]+0x18): undefined reference to `dlclose'
    collect2: error: ld returned 1 exit status
    CMakeFiles/cmTC_bb43d.dir/build.make:103: recipe for target 'cmTC_bb43d' failed
    make[1]: *** [cmTC_bb43d] Error 1
    make[1]: Leaving directory '/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeTmp'
    Makefile:138: recipe for target 'cmTC_bb43d/fast' failed
    make: *** [cmTC_bb43d/fast] Error 2
    

    CMake will not be able to correctly generate this project.
    Call Stack (most recent call first):
      CMakeLists.txt:715 (enable_language)

    -- Configuring incomplete, errors occurred!
    See also "/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeOutput.log".
    See also "/code/onnxruntime/build/Linux/Release/CMakeFiles/CMakeError.log".

    Traceback (most recent call last):
      File "/code/onnxruntime/tools/ci_build/build.py", line 1043, in <module>
        sys.exit(main())
      File "/code/onnxruntime/tools/ci_build/build.py", line 972, in main
        args, cmake_extra_args)
      File "/code/onnxruntime/tools/ci_build/build.py", line 422, in generate_build_tree
        run_subprocess(cmake_args + ["-DCMAKE_BUILD_TYPE={}".format(config)], cwd=config_build_dir)
      File "/code/onnxruntime/tools/ci_build/build.py", line 196, in run_subprocess
        return subprocess.run(args, cwd=cwd, check=True, stdout=stdout, stderr=stderr, env=my_env, shell=shell)
      File "/usr/lib/python3.6/subprocess.py", line 438, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['/usr/local/bin/cmake', '/code/onnxruntime/cmake', <same CMake arguments as listed above>]' returned non-zero exit status 1.

    opened by AndreV84 52
  • make WITHCACHE as an option in MacOS workflow

    make WITHCACHE as an option in MacOS workflow

    Description

1. Set the WithCache default value to false in the macOS CI workflow too.
2. Add today's date to the cache key, to keep the cache size from growing indefinitely.

With the cache enabled, the pipeline duration dropped from 70-odd minutes to around 10 minutes.

    opened by mszhanyi 0
  • please reopen the issue

    please reopen the issue

    Describe the issue

Could you please reopen this issue? We hit the same problem with opset_version=16. Issue: https://github.com/microsoft/onnxruntime/issues/2756#issue-543199292.

    Urgency

    No response

    Target platform

    Windows

    Build script

    .

    Error / output

    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running BatchNormalization node. Name:'BatchNormalization_123' Status Message: D:\a_work\1\s\onnxruntime\core\framework\op_kernel.cc:81 onnxruntime::OpKernelContext::OutputMLValue status.IsOK() was false. Shape mismatch attempting to re-use buffer. {1,3,256,192} != {1,6,256,192}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.

    Visual Studio Version

    No response

    GCC / Compiler Version

    No response

    build platform:windows 
    opened by shu0o0yX 0
  • CUDNN error executing cudnnConvolutionForward

    CUDNN error executing cudnnConvolutionForward

    Describe the issue

Hi, I'm running the same ONNX model on many different machines in Azure (all of the same type, same configuration, Docker, etc.), and on some of them I get the following error on the first batch that is executed:

    <class 'onnxruntime.capi.onnxruntime_pybind11_state.Fail'>
    
    [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'efficientnetb4/stem_conv/Conv2D' Status Message: CUDNN error executing cudnnConvolutionForward(s_.handle, &alpha, s_.x_tensor, s_.x_data, s_.w_desc, s_.w_data, s_.conv_desc, s_.algo, workspace.get(), s_.workspace_bytes, &beta, s_.y_tensor, s_.y_data)
    

It happens only on some of the machines, and only on the first batch.

    To reproduce

    onnxruntime-gpu==1.10.0

    import onnxruntime

    ONNX_PROVIDERS = [
        ('CUDAExecutionProvider', {
            'device_id': 0,
            'cudnn_conv_algo_search': 'DEFAULT',
        }),
    ]
    ONNX_SESSION_OPTIONS = onnxruntime.SessionOptions()
    ONNX_SESSION_OPTIONS.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL

    # Create a CUDA-backed session and run one inference.
    feature_extractor = onnxruntime.InferenceSession(str(fe_net_weights),
                                                     sess_options=ONNX_SESSION_OPTIONS,
                                                     providers=ONNX_PROVIDERS)

    feature_extractor.run([output_layer], {"input": input})
    

    Urgency

    No response

    Platform

    Linux

    OS Version

    Ubuntu 20.04

    ONNX Runtime Installation

    Released Package

    ONNX Runtime Version or Commit ID

    onnxruntime-gpu==1.10.0

    ONNX Runtime API

    Python

    Architecture

    X64

    Execution Provider

    CUDA

    Execution Provider Library Version

    cuda 11.3.0, cudnn8

    ep:CUDA 
    opened by kfirgoldwsc 0
  • How to save inference onnx model?

    How to save inference onnx model?

    Describe the issue

Now I can build my own training session from a torch net, but when I save the ONNX model after training, BatchNormalization is in training mode and cannot be fused into Conv. What should I do to save an inference model? Current format: 1; expected format: 0. (A sketch of the usual export-time fix follows below.)
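For reference, the usual way to get a foldable BatchNormalization when exporting from PyTorch is to switch the module to eval mode before export; a sketch with a placeholder network and input shape:

    import torch

    model = build_my_net()                  # placeholder: the trained network
    model.eval()                            # BatchNormalization switches to inference mode

    dummy = torch.randn(1, 3, 224, 224)     # placeholder input shape
    torch.onnx.export(
        model, dummy, "inference_model.onnx",
        training=torch.onnx.TrainingMode.EVAL,  # export the inference-mode graph
        do_constant_folding=True,               # lets BN parameters fold into Conv
    )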

    To reproduce

(screenshot attached)

    Urgency

    No response

    ONNX Runtime Installation

    Built from Source

    ONNX Runtime Version or Commit ID

    1.8.1

    PyTorch Version

    3.7

    Execution Provider

    CUDA

    Execution Provider Library Version

    No response

    training ep:CUDA 
    opened by ArtyZe 0
  • [MIGraphX] update the MIGraphX version used in ORT to rocm-5.4.0

    [MIGraphX] update the MIGraphX version used in ORT to rocm-5.4.0

    Description

    Update the MIGraphX version used in ORT to rocm-5.4.0

    Motivation and Context

The previous branch, migraphx_for_ort, has stopped being updated and has diverged too far from the latest MIGraphX release branch. More discussion here: https://github.com/microsoft/onnxruntime/issues/14126#issuecomment-1373201049

    opened by PeixuanZuo 0
  • Update HistogramCalibrater.collect_data method to reduce memory consumption

    Update HistogramCalibrater.collect_data method to reduce memory consumption

    Description

    Updated HistogramCalibrater.collect_data method.

Inference results are no longer appended to the self.intermediate_outputs list. Instead, the self.collector.collect method is called inside a while loop.

    Motivation and Context

When CalibrationMethod.Entropy or CalibrationMethod.Percentile is specified, the HistogramCalibrater class is used.

In the HistogramCalibrater.collect_data method, all the intermediate outputs used to be read in before histograms were collected with the HistogramCollector class. This two-pass scheme consumes a lot of memory when a network has many intermediate output nodes and the CalibrationDataReader provides a lot of data. (A sketch of the streaming pattern follows below.)

Please note that quantized models aren't identical after this change; I don't expect it to cause harmful results, though.
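For illustration, a minimal sketch of the streaming pattern described above (session, data_reader, collector, and output_names are hypothetical stand-ins for the real objects in onnxruntime.quantization):

    # Before: two-pass and memory-hungry — every intermediate output stays alive
    # in self.intermediate_outputs until histogram collection starts.
    #
    # After: fold each batch into the histograms immediately, then drop it.
    while True:
        inputs = data_reader.get_next()   # CalibrationDataReader returns None when exhausted
        if inputs is None:
            break
        outputs = session.run(None, inputs)
        collector.collect(dict(zip(output_names, outputs)))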

    opened by beru 0