CUDA Python Low-level Bindings

Overview

CUDA-Python

Building

Requirements

Dependencies of the CUDA-Python bindings and some versions that are known to work are as follows:

  • Driver: Linux (450.80.02 or later) Windows(456.38 or later)
  • CUDA Toolkit 11.0 to 11.4 - e.g. 11.4.48
  • Cython - e.g. 0.29.21
  • Versioneer - e.g. 0.20

Compilation

To compile the extension in-place, run:

python setup.py build_ext --inplace

To compile for debugging the extension modules with gdb, pass the --debug argument to setup.py.

Develop installation

You can use

python setup.py develop

to use the module in-place in your current Python environment (e.g. for testing of porting other libraries to use the binding).

Build the Docs

conda env create -f docs/environment-docs.yml
conda activate cuda-python-docs

Then compile and install cuda-python following the steps above.

cd docs
make html
open build/html/index.html

Build the Docs

conda env create -f docs_src/environment-docs.yml
conda activate cuda-python-docs

Then compile and install cuda-python following the steps above.

cd docs_src
make html
open build/html/index.html

Publish the Docs

git checkout gh-pages
cd docs_src
make html
cp -a build/html/. ../docs/

Testing

Requirements

Dependencies of the test execution and some versions that are known to work are as follows:

  • numpy-1.19.5
  • numba-0.53.1
  • matplotlib-3.3.4
  • scipy-1.6.3
  • pytest-benchmark-3.4.1

Unit-tests

You can run the included tests with:

pytest

Samples

You can run the included tests with:

pytest examples

Benchmark

You can run benchmark only tests with:

pytest --benchmark-only

Examples

The included examples are:

  • examples/extra/jit_program.py: Demonstrates the use of the API to compile and launch a kernel on the device. Includes device memory allocation / deallocation, transfers between host and device, creation and usage of streams, and context management.
  • examples/extra/numba_emm_plugin.py: Implements a Numba External Memory Management plugin, showing that this CUDA Python Driver API can coexist with other wrappers of the driver API.
Comments
  • Fails to build on AmazonLinux

    Fails to build on AmazonLinux

    Hi,

    I checked out the package and tried to build it on AmazonLinux but it fails to compile. Please see the build output below. I also tried all other commands there were mentioned in installation guide, but all failed with the same issue.

    Cuda : 11.2 GCC: 9.3

    $ python setup.py build
    Compiling cuda/_cuda/ccuda.pyx because it changed.
    Compiling cuda/_cuda/cnvrtc.pyx because it changed.
    [1/2] Cythonizing cuda/_cuda/ccuda.pyx
    [2/2] Cythonizing cuda/_cuda/cnvrtc.pyx
    Compiling cuda/_lib/utils.pyx because it changed.
    [1/1] Cythonizing cuda/_lib/utils.pyx
    Compiling cuda/_lib/ccudart/ccudart.pyx because it changed.
    Compiling cuda/_lib/ccudart/utils.pyx because it changed.
    [1/2] Cythonizing cuda/_lib/ccudart/ccudart.pyx
    [2/2] Cythonizing cuda/_lib/ccudart/utils.pyx
    Compiling cuda/ccuda.pyx because it changed.
    Compiling cuda/ccudart.pyx because it changed.
    Compiling cuda/cnvrtc.pyx because it changed.
    Compiling cuda/cuda.pyx because it changed.
    Compiling cuda/cudart.pyx because it changed.
    Compiling cuda/nvrtc.pyx because it changed.
    [1/6] Cythonizing cuda/ccuda.pyx
    [2/6] Cythonizing cuda/ccudart.pyx
    [3/6] Cythonizing cuda/cnvrtc.pyx
    [4/6] Cythonizing cuda/cuda.pyx
    [5/6] Cythonizing cuda/cudart.pyx
    [6/6] Cythonizing cuda/nvrtc.pyx
    Compiling cuda/tests/test_ccuda.pyx because it changed.
    Compiling cuda/tests/test_ccudart.pyx because it changed.
    Compiling cuda/tests/test_interoperability_cython.pyx because it changed.
    [1/3] Cythonizing cuda/tests/test_ccuda.pyx
    [2/3] Cythonizing cuda/tests/test_ccudart.pyx
    [3/3] Cythonizing cuda/tests/test_interoperability_cython.pyx
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/cuda
    copying cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/_version.py -> build/lib.linux-x86_64-3.8/cuda
    creating build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_cuda
    creating build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib
    creating build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/__init__.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/kernels.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/perf_test_utils.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_cupy.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_launch_latency.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_numba.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    copying cuda/benchmarks/test_pointer_attributes.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
    creating build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/__init__.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cuda.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cudart.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_cython.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_kernelParams.py -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_nvrtc.py -> build/lib.linux-x86_64-3.8/cuda/tests
    creating build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/__init__.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cuda.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/cudart.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/nvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
    copying cuda/_cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.h -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/loader.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
    copying cuda/_lib/dlfcn.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.h -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/param_packer.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/_lib/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
    copying cuda/tests/test_ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability_cython.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/tests/test_interoperability_cython.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
    copying cuda/_lib/ccudart/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    copying cuda/_lib/ccudart/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
    UPDATING build/lib.linux-x86_64-3.8/cuda/_version.py
    set build/lib.linux-x86_64-3.8/cuda/_version.py to '11.7.1'
    running build_ext
    building 'cuda._cuda.ccuda' extension
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/cuda
    creating build/temp.linux-x86_64-3.8/cuda/_cuda
    /home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -fPIC -I./cuda -I./cuda/_cuda -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include -I/usr/local/cuda-11.2/include -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include/python3.8 -c cuda/_cuda/ccuda.cpp -o build/temp.linux-x86_64-3.8/cuda/_cuda/ccuda.o -std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3
    cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
    cuda/_cuda/ccuda.cpp: In function 'int __pyx_f_4cuda_5_cuda_5ccuda_cuPythonInit()':
    cuda/_cuda/ccuda.cpp:4202:138: error: 'CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM' was not declared in this scope
     4202 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0x1B58, CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 836, __pyx_L4_error)
          |                                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:4924:137: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
     4924 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0xFA0, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 917, __pyx_L4_error)
          |                                                                                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:5637:152: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
     5637 |       __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuGetErrorString"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuGetErrorString), 0x1770, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 997, __pyx_L4_error)
          |                                                                                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:15609:73: error: 'CUflushGPUDirectRDMAWritesTarget' was not declared in this scope
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:15609:122: error: 'CUflushGPUDirectRDMAWritesScope' was not declared in this scope
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:15609:167: warning: expression list treated as compound expression in initializer [-fpermissive]
    15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
          |                                                                                                                                                                       ^
    cuda/_cuda/ccuda.cpp:16977:94: error: 'CUexecAffinityType' has not been declared
    16977 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int *__pyx_v_pi, CUexecAffinityType __pyx_v_typename, CUdevice __pyx_v_dev) {
          |                                                                                              ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int*, int, CUdevice)':
    cuda/_cuda/ccuda.cpp:17082:30: error: expected primary-expression before '(' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                              ^
    cuda/_cuda/ccuda.cpp:17082:32: error: expected primary-expression before ')' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                ^
    cuda/_cuda/ccuda.cpp:17082:34: error: expected primary-expression before 'int'
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                  ^~~
    cuda/_cuda/ccuda.cpp:17082:41: error: 'CUexecAffinityType' was not declared in this scope
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                         ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:17082:69: error: expected primary-expression before ')' token
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                                                                     ^
    cuda/_cuda/ccuda.cpp:17082:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport'
    17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
          |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                       )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:17319:86: error: 'CUexecAffinityParam' has not been declared
    17319 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUcontext *__pyx_v_pctx, CUexecAffinityParam *__pyx_v_paramsArray, int __pyx_v_numParams, unsigned int __pyx_v_flags, CUdevice __pyx_v_dev) {
          |                                                                                      ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUctx_st**, int*, int, unsigned int, CUdevice)':
    cuda/_cuda/ccuda.cpp:17424:30: error: expected primary-expression before '(' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                              ^
    cuda/_cuda/ccuda.cpp:17424:32: error: expected primary-expression before ')' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                ^
    cuda/_cuda/ccuda.cpp:17424:44: error: expected primary-expression before '*' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                            ^
    cuda/_cuda/ccuda.cpp:17424:45: error: expected primary-expression before ',' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                             ^
    cuda/_cuda/ccuda.cpp:17424:47: error: 'CUexecAffinityParam' was not declared in this scope
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                               ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:17424:68: error: expected primary-expression before ',' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                    ^
    cuda/_cuda/ccuda.cpp:17424:70: error: expected primary-expression before 'int'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                      ^~~
    cuda/_cuda/ccuda.cpp:17424:75: error: expected primary-expression before 'unsigned'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                           ^~~~~~~~
    cuda/_cuda/ccuda.cpp:17424:97: error: expected primary-expression before ')' token
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                                                                                                 ^
    cuda/_cuda/ccuda.cpp:17424:99: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3'
    17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
          |                   ~                                                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                                   )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:20397:67: error: 'CUexecAffinityParam' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                   ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:88: error: '__pyx_v_pExecAffinity' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                        ^~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:111: error: 'CUexecAffinityType' was not declared in this scope
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                                               ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:20397:146: warning: expression list treated as compound expression in initializer [-fpermissive]
    20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
          |                                                                                                                                                  ^
    cuda/_cuda/ccuda.cpp:33564:75: error: 'CUDA_ARRAY_MEMORY_REQUIREMENTS' was not declared in this scope
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:107: error: '__pyx_v_memoryRequirements' was not declared in this scope
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:143: error: expected primary-expression before '__pyx_v_array'
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
          |                                                                                                                                               ^~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:33564:167: error: expected primary-expression before '__pyx_v_device'
    33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
    ---
    truncated due to git issue limit
    ---
    cuda/_cuda/ccuda.cpp:58806:44: error: 'CUgraphMem_attribute' was not declared in this scope
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                                            ^~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:58806:66: error: expected primary-expression before 'void'
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                                                                  ^~~~
    cuda/_cuda/ccuda.cpp:58806:74: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute'
    58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
          |                   ~                                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                          )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:64515:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                 ^~~~~~~~~~~~
          |                                                                 CUsurfObject
    cuda/_cuda/ccuda.cpp:64515:79: error: '__pyx_v_object_out' was not declared in this scope
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                               ^~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:99: error: expected primary-expression before 'void'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                   ^~~~
    cuda/_cuda/ccuda.cpp:64515:127: error: expected primary-expression before '__pyx_v_destroy'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                               ^~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:144: error: expected primary-expression before 'unsigned'
    64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64515:182: error: expected primary-expression before 'unsigned'
    64515 | atic CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                                                    ^~~~~~~~
    
    cuda/_cuda/ccuda.cpp:64515:208: warning: expression list treated as compound expression in initializer [-fpermissive]
    64515 | a_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
          |                                                                                                                                                                                    ^
    
    cuda/_cuda/ccuda.cpp:64686:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                 ^~~~~~~~~~~~
          |                                                                 CUsurfObject
    cuda/_cuda/ccuda.cpp:64686:94: error: expected primary-expression before 'unsigned'
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                              ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64686:120: warning: expression list treated as compound expression in initializer [-fpermissive]
    64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                                                        ^
    cuda/_cuda/ccuda.cpp:64857:66: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                  ^~~~~~~~~~~~
          |                                                                  CUsurfObject
    cuda/_cuda/ccuda.cpp:64857:95: error: expected primary-expression before 'unsigned'
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                               ^~~~~~~~
    cuda/_cuda/ccuda.cpp:64857:121: warning: expression list treated as compound expression in initializer [-fpermissive]
    64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                                                         ^
    cuda/_cuda/ccuda.cpp:65028:93: error: 'CUuserObject' has not been declared
    65028 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count, unsigned int __pyx_v_flags) {
          |                                                                                             ^~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph, int, unsigned int, unsigned int)':
    cuda/_cuda/ccuda.cpp:65133:30: error: expected primary-expression before '(' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                              ^
    cuda/_cuda/ccuda.cpp:65133:32: error: expected primary-expression before ')' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                ^
    cuda/_cuda/ccuda.cpp:65133:41: error: expected primary-expression before ',' token
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                         ^
    cuda/_cuda/ccuda.cpp:65133:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                           ^~~~~~~~~~~~
          |                                           CUsurfObject
    cuda/_cuda/ccuda.cpp:65133:57: error: expected primary-expression before 'unsigned'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                                         ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65133:71: error: expected primary-expression before 'unsigned'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                                                                       ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65133:85: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject'
    65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
          |                   ~                                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                     )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:65199:94: error: 'CUuserObject' has not been declared
    65199 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
          |                                                                                              ^~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph, int, unsigned int)':
    cuda/_cuda/ccuda.cpp:65304:30: error: expected primary-expression before '(' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                              ^
    cuda/_cuda/ccuda.cpp:65304:32: error: expected primary-expression before ')' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                ^
    cuda/_cuda/ccuda.cpp:65304:41: error: expected primary-expression before ',' token
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                         ^
    cuda/_cuda/ccuda.cpp:65304:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                           ^~~~~~~~~~~~
          |                                           CUsurfObject
    cuda/_cuda/ccuda.cpp:65304:57: error: expected primary-expression before 'unsigned'
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                                                         ^~~~~~~~
    cuda/_cuda/ccuda.cpp:65304:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject'
    65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
          |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                       )
    cuda/_cuda/ccuda.cpp: At global scope:
    cuda/_cuda/ccuda.cpp:74604:69: error: 'CUmoduleLoadingMode' was not declared in this scope
    74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
          |                                                                     ^~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp:74604:90: error: '__pyx_v_mode' was not declared in this scope; did you mean '__pyx_k_name'?
    74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
          |                                                                                          ^~~~~~~~~~~~
          |                                                                                          __pyx_k_name
    cuda/_cuda/ccuda.cpp:74775:145: error: 'CUmemRangeHandleType' has not been declared
    74775 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void *__pyx_v_handle, CUdeviceptr __pyx_v_dptr, size_t __pyx_v_size, CUmemRangeHandleType __pyx_v_handleType, unsigned PY_LONG_LONG __pyx_v_flags) {
          |                                                                                                                                                 ^~~~~~~~~~~~~~~~~~~~
    cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void*, CUdeviceptr, size_t, int, long long unsigned int)':
    cuda/_cuda/ccuda.cpp:74880:30: error: expected primary-expression before '(' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                              ^
    cuda/_cuda/ccuda.cpp:74880:32: error: expected primary-expression before ')' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                ^
    cuda/_cuda/ccuda.cpp:74880:34: error: expected primary-expression before 'void'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                  ^~~~
    cuda/_cuda/ccuda.cpp:74880:53: error: expected primary-expression before ',' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                     ^
    cuda/_cuda/ccuda.cpp:74880:61: error: expected primary-expression before ',' token
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                             ^
    cuda/_cuda/ccuda.cpp:74880:63: error: 'CUmemRangeHandleType' was not declared in this scope; did you mean 'CUmemHandleType'?
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                               ^~~~~~~~~~~~~~~~~~~~
          |                                                               CUmemHandleType
    cuda/_cuda/ccuda.cpp:74880:85: error: expected primary-expression before 'unsigned'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                                                                                     ^~~~~~~~
    cuda/_cuda/ccuda.cpp:74880:108: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange'
    74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
          |                   ~                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          |                                                                                                            )
    error: command '/home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc' failed with exit status 1
    
    opened by pranavladkat 9
  • nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

    nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

    Dear developers,

    I found out that calling the NVRTC for compilation is changing the preferred encoding for the current Python instance.

    For more details and to reproduce the issue, please refer to this StackOverflow question.

    Do you have an idea on why this happens, and how it is possible to revert the preferred encoding to its original setting?

    Thank you in advance

    opened by redsnic 5
  • Failed to dlopen libcuda.so in WSL environment

    Failed to dlopen libcuda.so in WSL environment

    from cuda import cuda
    cuda.cuInit(0)
    
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    Input In [2], in <module>
    ----> 1 cuda.cuInit(0)
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/cuda.pyx:8876, in cuda.cuda.cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/ccuda.pyx:17, in cuda.ccuda.cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:3553, in cuda._cuda.ccuda._cuInit()
    
    File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:424, in cuda._cuda.ccuda.cuPythonInit()
    
    RuntimeError: Failed to dlopen libcuda.so
    

    This is because in a WSL environment libcuda.so lives in /usr/lib/wsl/lib which is not in the default search path of dlopen. For libraries that link against libcuda this isn't a problem because there's a file at /etc/ld.so.conf.d/ld.wsl.conf which instructs the linker as to where it can find the libraries, but unfortunately dlopen doesn't use this.

    As a workaround, adding /usr/lib/wsl/lib to the LD_LIBRARY_PATH environment variable resolves the problem.

    opened by kkraus14 4
  • No module named 'examples'

    No module named 'examples'

    I change directories to try to run some examples.

    (cython) [email protected]:~/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction$ python clock_nvrtc_test.py
    Traceback (most recent call last):
      File "/home/nyck33/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction/clock_nvrtc_test.py", line 10, in <module>
        from examples.common import common
    ModuleNotFoundError: No module named 'examples'
    

    What am I doing wrong? I am looking at pypi package called absolufy-imports to try to get this going.

    opened by nyck33 3
  • Adopting a set of

    Adopting a set of "supported" python versions

    Right now the project doesn't have any set of explicitly supported python versions. NEP 29 provides an example of how this can be done:

    All minor versions of Python released 42 months prior to the project, and at minimum the two latest minor versions.

    Minimum Python ... version support should be adjusted upward on [a] major and minor release, but never on a patch release.

    This language also allows forecasting of python versions and forecasting (of some degree) of the resources required to maintain the project due to PEP 602 which normalizes the release schedule of python versions.

    There are at least two areas this practically impacts:

    • Support for version specific issues. Having a specified set of support versions allows some version specific issues to be termed in or out of scope, and be prioritized appropriately.
    • Binary distributions are currently made available on pypi and the nvidia channel of conda-forge, this bounds for which versions of python the binaries are targeted.
    opened by m3vaz 3
  • cudart.cudaSetDevice allocates memory on GPU other than target

    cudart.cudaSetDevice allocates memory on GPU other than target

    cuda-python 11.6.1 cuda toolkit 11.2 Ubuntu Linux

    If you run something like the following on a multi-GPU machine

    device_num = 5
    err, = cuda.cuInit(0)
    err, device = cuda.cuDeviceGet(device_num)
    err, cuda_context = cuda.cuCtxCreate(0, device)
    err, = cudart.cudaSetDevice(device)
    

    The call to cudart.cudaSetDevice will properly set your device to '5', but it will also allocate ~305 MB of memory on device 0 (or whichever is the 0th device in the device list provided by CUDA_VISIBLE_DEVICES). I think this issue (possibly in the C-CUDA runtime underneath?) may possibly be the root of many downstream issues in libraries like Tensorflow and Pytorch who have similar issues where a user selects a device but still gets tons of allocations on other devices. This 305 MB may not sound like a lot, but I'm running a program on an Nvidia-DGX with 16 GPUs and I have 64 worker processes, causing 64*305 = 19GB of unusable space to be allocated on GPU 0, which crashes the program. I cannot simply set CUDA_VISIBLE_DEVICES to correct this problem because the workers are communicating via shared GPU memory (via cuIPCMemHandle) with their parent process, and the parent process needs access to all GPUs. Additionally, the worker processes are performing data augmentation on one GPU, while writing output to another GPU with a different device ID.

    I am trying to investigate a workaround to not call 'cudart.cudaSetDevice' at all, but when it is not called I cannot properly use the pointer given by cuda.cuMemAlloc to create a PyTorch tensor. When I call cudart.cudaSetDevice, I am able to use the pointer properly.

    opened by QuiteAFoxtrot 3
  • Use python exceptions instead of `err, ... =`

    Use python exceptions instead of `err, ... =`

    Congratulations on the GA release! 🥳

    I've been looking forward to the cuda bindings for a while, and was just looking through the docs.

    The overview notes an implementation of ASSERT_DRV, which already contains the caveat:

    In a future release, this may automatically raise exceptions using a Python object model.

    I'm not sure if that means that the errors are going to be subclasses of something like a CUDAError, or if that is to be interpreted some other way, but in any case, I was quite surprised about this choice of exception API

    Why not make the functions raise err by default? Right now, IIUC, every invocation would need to accept an extra err-return (and handle it with something like ASSERT_DRV). This seems like a really onerous task to achieve the default behaviour of "fail in case of something unexpected" (and actively choosing where to introduce try... except: handling to continue even if things fail).

    It seems like a bad trade-off for me (high verbosity, and easy to forget adding an ASSERT_DRV), but maybe I'm overlooking something?

    The reasons I'm raising this right now, is that this would be a pretty fundamental API change, and if there's any chance at all (assuming it's not already "zero" after GA), it would be ASAP.

    opened by h-vetinari 3
  • ERROR  Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)

    ERROR Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)

    venv "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\Python.exe" Python 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)] Commit hash: Installing torch and torchvision Traceback (most recent call last): File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 227, in prepare_enviroment() File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 150, in prepare_enviroment run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch") File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 33, in run raise RuntimeError(message) RuntimeError: Couldn't install torch. Command: "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\python.exe" -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113 Error code: 1 stdout: Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113

    stderr: ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none) ERROR: No matching distribution found for torch==1.12.1+cu113

    Press any key to continue . . .

    opened by GreatHK 2
  • First base of 'CUkernelNodeAttrValue_v1' is not an extension type

    First base of 'CUkernelNodeAttrValue_v1' is not an extension type

    I'm trying to compile cuda-python in a fairly minimal conda environment (nothing installed but the requirements), with cuda-11.6 installed, and seeing several instances of the following sort of error:

          Error compiling Cython file:
          ------------------------------------------------------------
          ...
                  Get memory address of class instance
    
              """
              pass
    
          cdef class CUkernelNodeAttrValue_v1(CUlaunchAttributeValue_union):
                                             ^
          ------------------------------------------------------------
    
          cuda/cuda.pxd:2637:36: First base of 'CUkernelNodeAttrValue_v1' is not an extension type
    

    I do have other cuda versions installed alongside 11.6 but judging from the output of Parsing headers in "/usr/local/cuda-11.6/include" it seems like it's probably finding the right version? Any advice on how to get past this, or debug it? Thanks!

    opened by bertmaher 2
  • Remove duplicate code in vectorAddMMAP example

    Remove duplicate code in vectorAddMMAP example

    The code to determine granularity is duplicated, immediately after perforing that check, the same code exists. The second entry is being eliminated by this change.

    opened by pentschev 2
  • _ZSt28__throw_bad_array_new_lengthv

    _ZSt28__throw_bad_array_new_lengthv

    ~/cuda-python$ pip install -e .
    Obtaining file:///home/vinuj/cuda-python
    Requirement already satisfied: cython in /home/vinuj/anaconda3/lib/python3.9/site-packages (from cuda-python==11.7.1) (0.29.28)
    Installing collected packages: cuda-python
      Attempting uninstall: cuda-python
        Found existing installation: cuda-python 11.7.1
        Uninstalling cuda-python-11.7.1:
          Successfully uninstalled cuda-python-11.7.1
      Running setup.py develop for cuda-python
    
        from cuda import cuda, cudart
    ImportError: /home/vinuj/cuda-python/cuda/cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv
    
    
    opened by vinutah 2
  • Dropping Python 3.7

    Dropping Python 3.7

    We're considering dropping support for Python 3.7 for the next release. Per NEP 29, Python 3.7 drop schedule was almost a year ago and many associated libraries have already dropped it.

    Let us know if there's concerns in having Python 3.7 dropped next release. Thanks!

    opened by vzhurba01 0
  • No module named 'cuda._lib'; 'cuda' is not a package

    No module named 'cuda._lib'; 'cuda' is not a package

    After following the steps on cuda-python to install cuda-python with conda instruction, I try to

    from cuda import cuda, nvrtc
    

    as in the example in the pycharm python console, but it raises an error:

    Traceback (most recent call last):
      File "D:\Anaconda\envs\hierot\lib\code.py", line 90, in runcode
        exec(code, self.locals)
      File "<input>", line 1, in <module>
      File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
        module = self._system_import(name, *args, **kwargs)
      File "cuda\cuda.pyx", line 1, in init cuda.cuda
        # Copyright 2021-2022 NVIDIA Corporation.  All rights reserved.
      File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
        module = self._system_import(name, *args, **kwargs)
    ModuleNotFoundError: No module named 'cuda._lib'; 'cuda' is not a package
    

    But the code above can be successfully run in the terminal

    (hierot) D:\Projects\SimPlatform>python
    Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from cuda import cuda, nvrtc
    >>>
    

    Please help me with the problem, thanks in advance. Further information provided on request.

    I searched with

    ModuleNotFoundError: No module named 'xxx'
    

    Solutions suggest configure correct python interpreter, but I believe my interpreter is already properly configured.

    And search with

    No module named 'xxx'; 'yyy' is not a package
    

    Some says the cause is the name cuda is shadowed by the package name cuda, I think it might be the problem. Please check this.

    opened by HIEROT 0
  • Windows: ModuleNotFoundError: No module named 'win32api'

    Windows: ModuleNotFoundError: No module named 'win32api'

    Installing on Windows:

    python -m pip install cuda-python

    Then from python:

    from cuda import cuda

    Fails with

        File "cuda\cuda.pyx", line 1, in init cuda.cuda
    
        File "cuda\ccuda.pyx", line 1, in init cuda.ccuda
    
        File "cuda\_cuda\ccuda.pyx", line 8, in init cuda._cuda.ccuda
    
      ModuleNotFoundError: No module named 'win32api'
    

    I can fix this by installing pypiwin32 manually. But I think it should be listed in requirements.txt if platform_system is Windows.

    Thanks

    opened by ilyasher 0
  • more inference time in cuda env compared to cpu (occured only for a layer)

    more inference time in cuda env compared to cpu (occured only for a layer)

    Dear sir/madam: When I inference on a deep learning model (slowfast model), I'm facing a problem that my python program seems to take more inference time in cuda env compared to cpu. It's not the whole model but one specific layer takes more time on cuda env than cpu. I'm so confused that hope someone can help me with it. Here is the details. the specific layer is "slowway-conv1" layer as showned in the pic below representing the model structure of slowfast. image And my confusing result is as follows. the first for cuda and the second for cpu. image image In cuda env, I found the processing time of "conv1" (0.97s) accounts for a great proportion of the processing time of the whole model (1.04s), while in cpu env, the processing time of "conv1" (0.07s) only accounts for a very small proportion of the processing time of the whole model (4.43s). And I reckon that the proportion in cpu env is reasonable considering the calculation budget. Is my method of time measurement mistaken? I used the following code to measure time cost. image image If it's my fault that causing the confusing result, please kindly point out, or please give me some ideas to help me solve this problem. Thank you very much! Yours, Koala

    opened by koalaaaaaaaaa 0
  • cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime

    cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime

    The current implementation of cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime for its version. This results in incorrect runtime versions if the runtime version is different from the version of cuda-python.

    https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/ccudart.pyx#L79-L82

    https://github.com/NVIDIA/cuda-python/blob/746b773c91e1ede708fe9a584b8cdb1c0f32b51d/cuda/_lib/ccudart/utils.pyx#L37

    Additional context

    A workaround used in https://github.com/rapidsai/rmm/pull/946 is to use numba's API for this instead:

    import numba.cuda
    
    def cudaRuntimeGetVersion():
        major, minor = numba.cuda.runtime.get_version()
        return major * 1000 + minor * 10
    
    opened by bdice 6
Releases(v12.0.0)
1st Place Solution to ECCV-TAO-2020: Detect and Represent Any Object for Tracking

Instead, two models for appearance modeling are included, together with the open-source BAGS model and the full set of code for inference. With this code, you can achieve around 79 Oct 08, 2022

Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model

Equipped customers with insights about their EVs Hourly energy consumption and helped predict future charging behavior using LSTM model. Designed sample dashboard with insights and recommendation for

Yash 2 Apr 07, 2022
LibMTL: A PyTorch Library for Multi-Task Learning

LibMTL LibMTL is an open-source library built on PyTorch for Multi-Task Learning (MTL). See the latest documentation for detailed introductions and AP

765 Jan 06, 2023
No-reference Image Quality Assessment(NIQA) Algorithms (BRISQUE, NIQE, PIQE, RankIQA, MetaIQA)

No-Reference Image Quality Assessment Algorithms No-reference Image Quality Assessment(NIQA) is a task of evaluating an image without a reference imag

Dae-Young Song 26 Jan 04, 2023
competitions-v2

Codabench (formerly Codalab Competitions v2) Installation $ cp .env_sample .env $ docker-compose up -d $ docker-compose exec django ./manage.py migrat

CodaLab 21 Dec 02, 2022
Dilated Convolution for Semantic Image Segmentation

Multi-Scale Context Aggregation by Dilated Convolutions Introduction Properties of dilated convolution are discussed in our ICLR 2016 conference paper

Fisher Yu 764 Dec 26, 2022
PyTorch experiments with the Zalando fashion-mnist dataset

zalando-pytorch PyTorch experiments with the Zalando fashion-mnist dataset Project Organization ├── LICENSE ├── Makefile - Makefile with co

Federico Baldassarre 31 Sep 25, 2021
The Ludii general game system, developed as part of the ERC-funded Digital Ludeme Project.

The Ludii General Game System Ludii is a general game system being developed as part of the ERC-funded Digital Ludeme Project (DLP). This repository h

Digital Ludeme Project 50 Jan 04, 2023
Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation

Implicit Internal Video Inpainting Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation paper | project

202 Dec 30, 2022
[NeurIPS 2021] Galerkin Transformer: a linear attention without softmax

[NeurIPS 2021] Galerkin Transformer: linear attention without softmax Summary A non-numerical analyst oriented explanation on Toward Data Science abou

Shuhao Cao 159 Dec 20, 2022
The repo contains the code to train and evaluate a system which extracts relations and explanations from dialogue.

The repo contains the code to train and evaluate a system which extracts relations and explanations from dialogue. How do I cite D-REX? For now, cite

Alon Albalak 6 Mar 31, 2022
PN-Net a neural field-based framework for depth estimation from single-view RGB images.

PN-Net We present a neural field-based framework for depth estimation from single-view RGB images. Rather than representing a 2D depth map as a single

1 Oct 02, 2021
Neural Ensemble Search for Performant and Calibrated Predictions

Neural Ensemble Search Introduction This repo contains the code accompanying the paper: Neural Ensemble Search for Performant and Calibrated Predictio

AutoML-Freiburg-Hannover 26 Dec 12, 2022
Unofficial pytorch implementation of 'Image Inpainting for Irregular Holes Using Partial Convolutions'

pytorch-inpainting-with-partial-conv Official implementation is released by the authors. Note that this is an ongoing re-implementation and I cannot f

Naoto Inoue 525 Jan 01, 2023
Deep Learning segmentation suite designed for 2D microscopy image segmentation

Deep Learning segmentation suite dessigned for 2D microscopy image segmentation This repository provides researchers with a code to try different enco

7 Nov 03, 2022
Deep GPs built on top of TensorFlow/Keras and GPflow

GPflux Documentation | Tutorials | API reference | Slack What does GPflux do? GPflux is a toolbox dedicated to Deep Gaussian processes (DGP), the hier

Secondmind Labs 107 Nov 02, 2022
Predicting path with preference based on user demonstration using Maximum Entropy Deep Inverse Reinforcement Learning in a continuous environment

Preference-Planning-Deep-IRL Introduction Check my portfolio post Dependencies Gym stable-baselines3 PyTorch Usage Take Demonstration python3 record.

Tianyu Li 9 Oct 26, 2022
Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide.

SARS-CoV-2 processing requests Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide. Prerequisites This autom

useGalaxy.eu 17 Aug 13, 2022
PyTorch implementaton of our CVPR 2021 paper "Bridging the Visual Gap: Wide-Range Image Blending"

Bridging the Visual Gap: Wide-Range Image Blending PyTorch implementaton of our CVPR 2021 paper "Bridging the Visual Gap: Wide-Range Image Blending".

Chia-Ni Lu 69 Dec 20, 2022
A real-time motion capture system that estimates poses and global translations using only 6 inertial measurement units

TransPose Code for our SIGGRAPH 2021 paper "TransPose: Real-time 3D Human Translation and Pose Estimation with Six Inertial Sensors". This repository

Xinyu Yi 261 Dec 31, 2022