NVIDIA container runtime

Overview

nvidia-container-runtime


A modified version of runc adding a custom pre-start hook to all containers.
If the environment variable NVIDIA_VISIBLE_DEVICES is set in the OCI spec, the hook configures GPU access for the container by leveraging nvidia-container-cli from the libnvidia-container project.

Usage example

# Set up a rootfs based on Ubuntu 16.04
cd "$(mktemp -d)" && mkdir rootfs
curl -sS http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04-core-amd64.tar.gz | tar --exclude 'dev/*' -C rootfs -xz

# Create an OCI runtime spec
nvidia-container-runtime spec
sed -i 's;"sh";"nvidia-smi";' config.json
sed -i 's;\("TERM=xterm"\);\1, "NVIDIA_VISIBLE_DEVICES=0";' config.json

# Run the container
sudo nvidia-container-runtime run nvidia_smi
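The sed edits above exit successfully even when their pattern does not match, so a changed spec template would silently leave the container running sh without a GPU. The following minimal sketch wraps the same two edits and then checks the result. It assumes GNU sed and the runc-style config.json layout used above. patch_spec is a hypothetical helper name and is not part of nvidia-container-runtime.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with nvidia-container-runtime): apply the two
# sed edits from the usage example and fail loudly if either one did not match.
# Assumes GNU sed (-i without a suffix argument).
patch_spec() {
    spec="${1:-config.json}"   # spec written by "nvidia-container-runtime spec"
    devices="${2:-0}"          # value for NVIDIA_VISIBLE_DEVICES

    # Replace the default "sh" command with nvidia-smi.
    sed -i 's;"sh";"nvidia-smi";' "$spec"
    # Append NVIDIA_VISIBLE_DEVICES right after the TERM=xterm entry.
    sed -i "s;\(\"TERM=xterm\"\);\1, \"NVIDIA_VISIBLE_DEVICES=${devices}\";" "$spec"

    # sed exits 0 even when nothing was replaced, so check the result explicitly.
    grep -q '"nvidia-smi"' "$spec" || { echo "process.args was not patched" >&2; return 1; }
    grep -q "\"NVIDIA_VISIBLE_DEVICES=${devices}\"" "$spec" || { echo "process.env was not patched" >&2; return 1; }
}
```

With this helper, patch_spec config.json 0 can replace the two sed lines before you run sudo nvidia-container-runtime run nvidia_smi.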

Installation

Ubuntu distributions

  1. Install the repository for your distribution by following the instructions here.
  2. Install the nvidia-container-runtime package:
sudo apt-get install nvidia-container-runtime
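The repository lists are published per distribution identifier, such as ubuntu20.04 in the URL quoted in the "Issue on repository for Ubuntu 20.04" report in the Comments section. The following sketch derives that identifier from /etc/os-release and builds the list URL. It assumes the URL layout from that report and an apt-based system. distribution_id and repo_list_url are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers: derive the distribution identifier (e.g. "ubuntu20.04")
# from an os-release file and build the repository list URL. The URL layout is
# assumed from the one quoted in the Ubuntu 20.04 repository report.
distribution_id() {
    # Source the file in a subshell so ID/VERSION_ID do not leak into the caller.
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

repo_list_url() {
    dist=$(distribution_id "$1") || return 1
    printf 'https://nvidia.github.io/nvidia-container-runtime/%s/nvidia-container-runtime.list\n' "$dist"
}
```

For example, curl -s -L "$(repo_list_url)" | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list would install the list. You still need the signing-key step from the repository instructions, which this sketch does not cover.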

CentOS distributions

  1. Install the repository for your distribution by following the instructions here.
  2. Install the nvidia-container-runtime package:
sudo yum install nvidia-container-runtime

Docker Engine setup

Skip this section if you installed the nvidia-docker2 package; it already registers the runtime.

To register the nvidia runtime, use the method below that is best suited to your environment.
You might need to merge the new argument with your existing configuration.

Systemd drop-in file

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker

Daemon configuration file

sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd

You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:

"default-runtime": "nvidia"

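default-runtime is a top-level key and sits next to runtimes in the same JSON object. The following sketch writes the combined file to a path you choose. You can then review it and merge it by hand with any existing settings before it replaces /etc/docker/daemon.json. write_daemon_json is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write a daemon.json that both registers the nvidia
# runtime and makes it the default. The target path is a parameter so the
# file can be reviewed (and merged with existing settings) before installing it.
write_daemon_json() {
    target="${1:-/etc/docker/daemon.json}"
    cat > "$target" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}
```

For example, run write_daemon_json /tmp/daemon.json, inspect the file, copy it into place with sudo, and then reload or restart dockerd.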
Command line

sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime [...]
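Whichever method you use, docker info should then list nvidia among the available runtimes. The following sketch checks for this by reading docker info output on stdin. It assumes the " Runtimes: ..." line layout printed by recent Docker releases, and older releases may differ. has_nvidia_runtime is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: succeed if the "Runtimes:" line of `docker info` output
# (read on stdin) contains a runtime named exactly "nvidia".
has_nvidia_runtime() {
    grep -Eq '^[[:space:]]*Runtimes:(.*[[:space:]])?nvidia([[:space:]]|$)'
}
```

Usage: docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered"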

Environment variables (OCI spec)

Each environment variable maps to a command-line argument for nvidia-container-cli from libnvidia-container.
These variables are already set in our official CUDA images.
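The hook's command line in the "runtime hook using newer cli" report in the Comments section shows this mapping in practice. NVIDIA_VISIBLE_DEVICES=all becomes --device=all, each driver capability becomes its own flag (--compute, --utility, ...), and the CUDA requirement becomes --require=cuda>=8.0. The following sketch reproduces only that translation and is illustrative rather than the hook's actual code. It assumes the capability list is already explicit. cli_flags is a hypothetical name.

```shell
#!/bin/sh
# Illustrative sketch, not the hook's real code: build nvidia-container-cli
# "configure" flags from the OCI-spec environment, mirroring the command line
# logged in the "runtime hook using newer cli" report. Assumes the capability
# list is already explicit (no "all", not empty).
cli_flags() {
    devices="$1"        # NVIDIA_VISIBLE_DEVICES, e.g. "all" or "0,1"
    capabilities="$2"   # NVIDIA_DRIVER_CAPABILITIES, e.g. "compute,utility"
    require="$3"        # NVIDIA_REQUIRE_CUDA, e.g. "cuda>=8.0" (may be empty)

    flags="--device=$devices"
    # One flag per capability: "compute,utility" -> "--compute --utility".
    for cap in $(printf '%s' "$capabilities" | tr ',' ' '); do
        flags="$flags --$cap"
    done
    if [ -n "$require" ]; then
        flags="$flags --require=$require"
    fi
    printf '%s\n' "$flags"
}
```

This sketch does not produce flags such as --load-kmods, --ldconfig and --pid from that log, because they do not come from these variables.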

NVIDIA_VISIBLE_DEVICES

This variable controls which GPUs will be made accessible inside the container.

Possible values

  • 0,1,2, GPU-fef8089b …: a comma-separated list of GPU UUID(s) or index(es).
  • all: all GPUs will be accessible; this is the default value in our container images.
  • none: no GPU will be accessible, but driver capabilities will be enabled.
  • void or empty or unset: nvidia-container-runtime will have the same behavior as runc.
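A small case statement can make the four cases above explicit, for example in a wrapper script that logs what a container will receive. This is an illustrative sketch. visible_devices_mode and the labels it prints are hypothetical, and the runtime itself prints nothing like this. Call it as visible_devices_mode "${NVIDIA_VISIBLE_DEVICES-}" so that unset and empty values are handled alike.

```shell
#!/bin/sh
# Illustrative only: classify an NVIDIA_VISIBLE_DEVICES value according to the
# list of possible values. The labels printed here are hypothetical.
visible_devices_mode() {
    case "${1-}" in
        ""|void) echo "runc" ;;      # unset, empty or void: same behavior as runc
        none)    echo "no-gpus" ;;   # no GPU, but driver capabilities enabled
        all)     echo "all-gpus" ;;  # every GPU
        *)       echo "list" ;;      # comma-separated indexes and/or UUIDs
    esac
}
```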

Note: When running on a MIG-capable device, the following values are also available:

  • 0:0,0:1,1:0, MIG-GPU-fef8089b/0/1 …: a comma-separated list of MIG Device UUID(s) or index(es).

The MIG device indices have the form <GPU Device Index>:<MIG Device Index>, as seen in this example output:

$ nvidia-smi -L
GPU 0: Graphics Device (UUID: GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5)
  MIG Device 0: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/0)
  MIG Device 1: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/1/1)
  MIG Device 2: (UUID: MIG-GPU-b8ea3855-276c-c9cb-b366-c6fa655957c5/11/0)
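You can derive the <GPU Device Index>:<MIG Device Index> values from that listing. The following awk sketch assumes the line format of the example output above, and other driver versions may print MIG lines differently. mig_indices is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: convert `nvidia-smi -L` output (read on stdin) into the
# <GPU Device Index>:<MIG Device Index> form accepted by NVIDIA_VISIBLE_DEVICES.
# The patterns follow the example output; adjust them if your driver differs.
mig_indices() {
    awk '
        /^GPU [0-9]+:/ {                 # e.g. "GPU 0: Graphics Device (UUID: ...)"
            gpu = $2; sub(/:$/, "", gpu)
        }
        /^[ \t]*MIG Device [0-9]+:/ {    # e.g. "  MIG Device 1: (UUID: ...)"
            mig = $3; sub(/:$/, "", mig)
            print gpu ":" mig
        }
    '
}
```

For the example output above, nvidia-smi -L | mig_indices | paste -s -d , - prints 0:0,0:1,0:2. You can pass that value as NVIDIA_VISIBLE_DEVICES.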

NVIDIA_MIG_CONFIG_DEVICES

This variable controls which of the visible GPUs can have their MIG configuration managed from within the container. This includes enabling and disabling MIG mode, creating and destroying GPU Instances and Compute Instances, etc.

Possible values

  • all: Allow all MIG-capable GPUs in the visible device list to have their MIG configurations managed.

Note:

  • This feature is only available on MIG-capable devices (e.g. the A100).
  • To use this feature, the container must be started with CAP_SYS_ADMIN privileges.
  • When not running as root, the container user must have read access to the /proc/driver/nvidia/capabilities/mig/config file on the host.

NVIDIA_MIG_MONITOR_DEVICES

This variable controls which of the visible GPUs can have aggregate information about all of their MIG devices monitored from within the container. This includes inspecting the aggregate memory usage, listing the aggregate running processes, etc.

Possible values

  • all: Allow all MIG-capable GPUs in the visible device list to have their MIG devices monitored.

Note:

  • This feature is only available on MIG-capable devices (e.g. the A100).
  • To use this feature, the container must be started with CAP_SYS_ADMIN privileges.
  • When not running as root, the container user must have read access to the /proc/driver/nvidia/capabilities/mig/monitor file on the host.
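NVIDIA_MIG_CONFIG_DEVICES and NVIDIA_MIG_MONITOR_DEVICES share the same prerequisites. The following preflight sketch covers the file-access requirement and is meant to be run on the host as the UID the container will use. mig_cap_readable is a hypothetical name. Its optional directory argument exists only so it can be tested against a scratch directory. It does not check CAP_SYS_ADMIN, which still has to be granted when the container starts (with Docker, for example, via --cap-add=SYS_ADMIN).

```shell
#!/bin/sh
# Hypothetical preflight check for NVIDIA_MIG_CONFIG_DEVICES (kind "config") and
# NVIDIA_MIG_MONITOR_DEVICES (kind "monitor"). Run it on the host as the UID the
# container will use; it only tests read access, not CAP_SYS_ADMIN.
mig_cap_readable() {
    kind="$1"
    base="${2:-/proc/driver/nvidia/capabilities/mig}"   # override for testing
    case "$kind" in
        config|monitor) ;;
        *) echo "unknown MIG capability: $kind" >&2; return 2 ;;
    esac
    [ -r "$base/$kind" ]
}
```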

NVIDIA_DRIVER_CAPABILITIES

This variable controls which driver libraries/binaries will be mounted inside the container.

Possible values

  • compute,video,graphics,utility …: a comma-separated list of driver features the container needs.
  • all: enable all available driver capabilities.
  • empty or unset: use the default driver capabilities: utility,compute.

Supported driver capabilities

  • compute: required for CUDA and OpenCL applications.
  • compat32: required for running 32-bit applications.
  • graphics: required for running OpenGL and Vulkan applications.
  • utility: required for using nvidia-smi and NVML.
  • video: required for using the Video Codec SDK.
  • display: required for leveraging X11 display.
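The following sketch shows how the resolution rules in the two lists above fit together. SUPPORTED_CAPS and resolve_caps are hypothetical names. When expanding all, the sketch uses the flag order from the hook log in the "runtime hook using newer cli" report. Newer releases may support capabilities not listed here.

```shell
#!/bin/sh
# Sketch of how an NVIDIA_DRIVER_CAPABILITIES value resolves, following the
# lists above: empty or unset -> utility,compute; "all" -> every supported
# capability; otherwise every entry must be a supported name.
SUPPORTED_CAPS="compute compat32 graphics utility video display"

resolve_caps() {
    case "${1-}" in
        "")  echo "utility,compute"; return 0 ;;
        all) echo "$SUPPORTED_CAPS" | tr ' ' ','; return 0 ;;
    esac
    for cap in $(printf '%s' "$1" | tr ',' ' '); do
        case " $SUPPORTED_CAPS " in
            *" $cap "*) ;;                                  # known capability
            *) echo "unsupported capability: $cap" >&2; return 1 ;;
        esac
    done
    echo "$1"
}
```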

NVIDIA_REQUIRE_*

A logical expression to define constraints on the configurations supported by the container.

Supported constraints

  • cuda: constraint on the CUDA driver version.
  • driver: constraint on the driver version.
  • arch: constraint on the compute architectures of the selected GPUs.
  • brand: constraint on the brand of the selected GPUs (e.g. GeForce, Tesla, GRID).

Expressions

Multiple constraints can be expressed in a single environment variable: space-separated constraints are ORed, comma-separated constraints are ANDed.
Multiple environment variables of the form NVIDIA_REQUIRE_* are ANDed together.
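The following simplified evaluator illustrates these combination rules. It is a hypothetical sketch, not the runtime's parser. It handles only numeric constraints on cuda, driver and arch, with the operators >=, <=, =, > and <. You pass the host facts in by hand, for example cuda=11.2 driver=460.39, the values printed by nvidia-container-cli info in the Kubernetes report in the Comments section.

```shell
#!/bin/sh
# Simplified, hypothetical evaluator for one NVIDIA_REQUIRE_* value. It only
# illustrates the combination rules: space-separated constraints are ORed,
# comma-separated constraints are ANDed. Brand and other details are omitted.

opchars='[<>=]'   # characters that can start an operator

# Compare dotted versions numerically; prints -1, 0 or 1.
version_cmp() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        if (m > n) n = m
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) { print -1; exit }
            if (x[i] + 0 > y[i] + 0) { print 1; exit }
        }
        print 0
    }'
}

# Look up a value such as "cuda" in a "name=value name=value" list.
fact() {
    for kv in $2; do
        case "$kv" in "$1="*) echo "${kv#*=}"; return 0 ;; esac
    done
    return 1
}

# Check a single constraint such as "cuda>=9.0" against the facts in $2.
check_one() {
    name=${1%%$opchars*}                     # "cuda"
    rest=${1#"$name"}                        # ">=9.0"
    want=${rest#$opchars}; want=${want#=}    # "9.0"
    op=${rest%"$want"}                       # ">="
    case "$name" in cuda|driver|arch) ;; *) return 1 ;; esac
    have=$(fact "$name" "$2") || return 1
    r=$(version_cmp "$have" "$want")
    case "$op" in
        '>=') [ "$r" -ge 0 ] ;;
        '<=') [ "$r" -le 0 ] ;;
        '=')  [ "$r" -eq 0 ] ;;
        '>')  [ "$r" -gt 0 ] ;;
        '<')  [ "$r" -lt 0 ] ;;
        *)    return 1 ;;
    esac
}

# Succeed if any space-separated group has all its comma-separated constraints met.
require_ok() {
    if [ -z "$1" ]; then return 0; fi        # no constraint at all
    for group in $1; do
        ok=1
        for c in $(printf '%s' "$group" | tr ',' ' '); do
            check_one "$c" "$2" || ok=0
        done
        if [ "$ok" -eq 1 ]; then return 0; fi
    done
    return 1
}
```

For instance, require_ok "cuda>=11.4 driver>=450" "cuda=11.2 driver=460.39" succeeds because the second, space-separated alternative holds. With the same facts, require_ok "cuda>=9.0,driver>=470" fails because both comma-separated constraints must hold.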

NVIDIA_DISABLE_REQUIRE

A single switch that disables all constraints of the form NVIDIA_REQUIRE_*.

NVIDIA_REQUIRE_CUDA

The version of the CUDA toolkit used by the container. It is an instance of the generic NVIDIA_REQUIRE_* case and it is set by official CUDA images. If the version of the NVIDIA driver is insufficient to run this version of CUDA, the container will not be started.

Possible values

  • cuda>=7.5, cuda>=8.0, cuda>=9.0 …: any valid CUDA version in the form major.minor.
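To check ahead of time whether a host satisfies a given cuda>=major.minor requirement, you can read the driver's CUDA version from nvidia-container-cli info, which prints a "CUDA version:" line (see the Kubernetes report in the Comments section). The following sketch assumes that output layout. driver_cuda_version and cuda_requirement_met are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: extract the "CUDA version:" value from
# `nvidia-container-cli info` output read on stdin.
driver_cuda_version() {
    awk -F: '/^CUDA version:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Succeeds if the driver's CUDA version ($1) satisfies a requirement of the
# form "cuda>=major.minor" ($2); returns 2 for any other form.
cuda_requirement_met() {
    prefix='cuda>='
    want=${2#"$prefix"}
    [ "$want" != "$2" ] || return 2
    awk -v have="$1" -v want="$want" 'BEGIN {
        split(have, h, "."); split(want, w, ".")
        # exit status 0 means "satisfied"
        exit !(h[1] + 0 > w[1] + 0 || (h[1] + 0 == w[1] + 0 && h[2] + 0 >= w[2] + 0))
    }'
}
```

For example: cuda_requirement_met "$(nvidia-container-cli info | driver_cuda_version)" "cuda>=11.0"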

CUDA_VERSION

Similar to NVIDIA_REQUIRE_CUDA, for legacy CUDA images.
In addition, if NVIDIA_REQUIRE_CUDA is not set, NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES will default to all.
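The legacy fallback can be written out as a small decision. The following sketch assumes that unset and empty values are treated alike. legacy_defaults is a hypothetical name, and its arguments are passed explicitly (empty string for unset) to keep it testable. When the fallback does not apply, the values pass through unchanged and the usual defaults apply.

```shell
#!/bin/sh
# Sketch of the legacy-image rule: when NVIDIA_REQUIRE_CUDA is unset but
# CUDA_VERSION is set, NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES
# default to "all". Empty arguments stand for unset variables.
legacy_defaults() {
    require_cuda="$1"; cuda_version="$2"; visible="$3"; caps="$4"
    if [ -z "$require_cuda" ] && [ -n "$cuda_version" ]; then
        visible="${visible:-all}"
        caps="${caps:-all}"
    fi
    printf 'NVIDIA_VISIBLE_DEVICES=%s NVIDIA_DRIVER_CAPABILITIES=%s\n' "$visible" "$caps"
}
```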

Issues and Contributing

Check out the Contributing document!

Comments
  • Incompatible with docker.io update on ubuntu: "error adding seccomp filter rule for syscall clone3: permission denied: unknown"

    Ubuntu is in the process of updating the docker.io version in 20.04. We've discovered that the nvidia-container-runtime provided for Ubuntu 20.04 is incompatible with this update. This bug was reported in Ubuntu here.

    $ sudo nvidia-docker run \
    >          --shm-size=1g \
    >          --ulimit memlock=-1 \
    >          --ulimit stack=67108864 \
    >          --rm nvcr.io/nvidia/tensorflow:21.07-tf1-py3 -- \
    >          mpiexec \
    >          --bind-to socket \
    >          --allow-run-as-root \
    >          -np 8 \
    >          python -u /workspace/nvidia-examples/cnn/resnet.py \
    >          --layers=50 \
    >          --precision=fp16 \
    >          --batch_size=256 \
    >          --num_iter=300 \
    >          --iter_unit=batch \
    >          --display_every=300
    docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown.
    
    opened by dannf 61
  • ERROR: /usr/bin/nvidia-container-runtime: unmarshal OCI spec file: (Centos7)

    This is a Bug Report/advice post

    OS: CentOS 7; Docker: 1.13.1; nvidia-container-runtime: newest; NVIDIA-SMI: 440.64.00; Driver Version: 440.64.00; CUDA Version: 10.2; Card: Quadro P2000

    Issue: command: docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

    ERROR: /usr/bin/nvidia-container-runtime: unmarshal OCI spec file: json: cannot unmarshal array into Go struct field Process.capabilities of type specs.LinuxCapabilities
    /usr/bin/docker-current: Error response from daemon: containerd: container not started.
    

    What I am trying to do: test that Docker sees the NVIDIA card correctly with docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

    nvidia-container-runtime has been added to the service file.

    Logs

    2020/06/01 12:11:37 Running /usr/bin/nvidia-container-runtime
    2020/06/01 12:11:37 Using bundle file: /var/run/docker/libcontainerd/7443682205db285fc37af7e2c80bc55a30fa6023e5d8c221af6b6741a3875d70/config.json
    2020/06/01 12:11:37 ERROR: unmarshal OCI spec file: json: cannot unmarshal array into Go struct field Process.capabilities of type specs.LinuxCapabilities
    2020/06/01 12:11:37 Running /usr/bin/nvidia-container-runtime
    2020/06/01 12:11:37 Command is not "create", executing runc doing nothing
    2020/06/01 12:11:37 Looking for "docker-runc" binary
    2020/06/01 12:11:37 "docker-runc" binary not found
    2020/06/01 12:11:37 Looking for "runc" binary
    2020/06/01 12:11:37 ERROR: find runc path: exec: "runc": executable file not found in $PATH
    

    docker version

    Client:
     Version:         1.13.1
     API version:     1.26
     Package version: docker-1.13.1-103.git7f2769b.el7.centos.x86_64
     Go version:      go1.10.3
     Git commit:      7f2769b/1.13.1
     Built:           Sun Sep 15 14:06:47 2019
     OS/Arch:         linux/amd64
    
    Server:
     Version:         1.13.1
     API version:     1.26 (minimum version 1.12)
     Package version: docker-1.13.1-103.git7f2769b.el7.centos.x86_64
     Go version:      go1.10.3
     Git commit:      7f2769b/1.13.1
     Built:           Sun Sep 15 14:06:47 2019
     OS/Arch:         linux/amd64
     Experimental:    false
    
    

    nvidia versions:
    nvidia-xconfig-branch-440-440.64.00-1.el7.x86_64
    nvidia-container-runtime-3.2.0-1.x86_64
    libnvidia-container-tools-1.1.1-1.x86_64
    nvidia-driver-branch-440-cuda-libs-440.64.00-1.el7.x86_64
    nvidia-libXNVCtrl-devel-440.64.00-1.el7.x86_64
    nvidia-container-toolkit-1.1.1-2.x86_64
    yum-plugin-nvidia-0.5-1.el7.noarch
    nvidia-driver-branch-440-NvFBCOpenGL-440.64.00-1.el7.x86_64
    nvidia-modprobe-branch-440-440.64.00-1.el7.x86_64
    nvidia-driver-branch-440-cuda-440.64.00-1.el7.x86_64
    nvidia-settings-440.64.00-1.el7.x86_64
    nvidia-driver-branch-440-devel-440.64.00-1.el7.x86_64
    kmod-nvidia-latest-dkms-440.64.00-1.el7.x86_64
    nvidia-persistenced-branch-440-440.64.00-1.el7.x86_64
    libnvidia-container1-1.1.1-1.x86_64
    nvidia-driver-branch-440-libs-440.64.00-1.el7.x86_64
    nvidia-libXNVCtrl-440.64.00-1.el7.x86_64
    nvidia-driver-branch-440-NVML-440.64.00-1.el7.x86_64
    nvidia-driver-branch-440-440.64.00-1.el7.x86_64

    I've tried adding /usr/libexec/docker/docker-runc-current to my user PATH, with no luck.

    Other notes: should these have paths?

    $ whereis docker-runc
    docker-runc:
    $ whereis runc
    runc:

    opened by smogsy 24
  • Pod can not access gpu after few days since pod started, only restart pod the gpu can work again.

    A few days after a pod starts, it can no longer access the GPU, and nvidia-smi fails with 'NVML initialize failed: Unknow'. Restarting the pod fixes everything, but a few days later the problem comes back.

    Has anyone seen this problem before?

    My k8s environment:

    K8s with containerd: Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:51:19Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

    nvidia-container-runtime:

    nvidia-container-cli info
    NVRM version:   460.39
    CUDA version:   11.2
    
    Device Index:   0
    Device Minor:   0
    Model:          GeForce RTX 3090
    Brand:          GeForce
    GPU UUID:       GPU-a1b8699b-3405-6df4-6bcf-5f83706d661e
    Bus Location:   00000000:01:00.0
    Architecture:   8.6
    
    Device Index:   1
    Device Minor:   1
    Model:          GeForce RTX 3090
    Brand:          GeForce
    GPU UUID:       GPU-efaa212b-fdcc-9193-790d-39a5bc75fdb9
    Bus Location:   00000000:21:00.0
    Architecture:   8.6
    
    Device Index:   2
    Device Minor:   2
    Model:          GeForce RTX 3090
    Brand:          GeForce
    GPU UUID:       GPU-d0445686-6904-d70c-9196-0424f4f0ebe4
    Bus Location:   00000000:81:00.0
    Architecture:   8.6
    

    k8s-nvidia-plugin: nvidia/k8s-device-plugin:v0.8.0

    opened by panyang1217 14
  • How can I install nvidia-docker2, nvidia-container-runtime  in other linux distributions?

    Hi,

    I have been trying to install nvidia-docker2 on Debian 8 (Jessie) without success. I am able to install Docker 17.12.0~ce-0~debian. As I don't see any mention of "other supported distributions", I wonder whether this installation is possible.

    Best regards,

    opened by romarcg 14
  • SELinux Module for NVIDIA containers

    When we run NVIDIA containers on an SELinux-enabled distribution, we need a separate SELinux module to run the container confined. Without an SELinux module, we have to run the container privileged, as this is the only way to allow specific SELinux contexts to interact (read, write, chattr, ...) with the files mounted into the container.

    A container running privileged gets the spc_t label, which is allowed to read, write, and chattr files of the base types. The base types (device_t, bin_t, proc_t, ...) are introduced by the bind mounts of the hook. A bind mount cannot have two different SELinux contexts, as SELinux operates at the inode level.

    I have created the following SELinux nvidia-container.te that works with podman/cri-o/docker.

    A prerequisite for the SELinux module to work correctly is that the mounted files have the correct labels. Therefore, I have added an additional line to the oci-nvidia-hook that runs:

    nvidia-container-cli -k list | restorecon -v -f -
    

    With this, every time a container is started, the files to be mounted have the correct SELinux label and the SELinux module works.

    Now I can run NVIDIA containers without --privileged, with --cap-drop=ALL and --security-opt=no-new-privileges.

    podman run  --security-opt=no-new-privileges --cap-drop=ALL --security-opt label=type:nvidia_container_t \
                --rm -it docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
    docker run  --security-opt=no-new-privileges --cap-drop=ALL --security-opt label=type:nvidia_container_t \
                --rm -it docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
    
    podman run  --user 1000:1000 --security-opt=no-new-privileges --cap-drop=ALL --security-opt label=type:nvidia_container_t \
                --rm -it docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
    docker run  --user 1000:1000 --security-opt=no-new-privileges --cap-drop=ALL --security-opt label=type:nvidia_container_t \
                --rm -it docker.io/mirrorgooglecontainers/cuda-vector-add:v0.1
    
    opened by zvonkok 13
  • Trying to get nvidia-container-runtime on other distribution to work

    I'm trying to get nvidia-container-runtime running on Slackware, but when I start a container with the needed startup commands and runtime, I can see nvidia-smi in /usr/bin/ from inside the container, yet its file size is zero. All library files also show a file size of zero from inside the container.

    Can somebody help?

    What I've done so far:

    1. install the latest drivers from the runfile
    2. compiled the libraries for the container runtime
    3. created the /etc/docker/daemon.json file
    4. compiled the nvidia-container-toolkit
    5. compiled runc and renamed it to nvidia-container-runtime
    6. made a symlink from the toolkit to nvidia-container-runtime-hook
    7. compiled seccomp and made sure that it is enabled in the kernel

    I'm a little lost right now and don't know what I might have done wrong.

    opened by ich777 12
  • multiarch support

    This should work for architectures with nvidia cuda repos, an architecture in the manifest list used as the FROM line, and downloadable go binaries.

    Signed-off-by: Christy Norman

    opened by clnperez 12
  • Issues installing nvidia-container-runtime

    I'm trying to install the last dependency for nvidia-docker2, which is nvidia-container-runtime. I followed the steps for an Ubuntu install, and after running "sudo apt-get install nvidia-container-runtime" this is the error I get:

    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    Package nvidia-container-runtime is not available, but is referred to by another package.
    This may mean that the package is missing, has been obsoleted, or is only available from another source

    E: Package 'nvidia-container-runtime' has no installation candidate

    Has anyone encountered this and figured out what went wrong?

    opened by Knowlue 11
  • Plugin requirements

    Does this plugin exist because of a limitation of runC or of containerd settings? Is it possible to configure the prestart hooks at the containerd daemon settings layer instead of replacing runC with this plugin?

    opened by cmluciano 11
  • Issue on repository for Ubuntu 20.04

    curl https://nvidia.github.io/nvidia-container-runtime/ubuntu20.04/nvidia-container-runtime.list

    returns:

    deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
    #deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
    deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
    #deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
    

    These entries point to ubuntu18.04 instead of ubuntu20.04.

    opened by myset 10
  • runtime hook using newer cli

    Using a kubernetes example:

    $ docker build -f nvidia-cuda-vector_Dockerfile -t cuda-vector-add:v0.1 .

    Sending build context to Docker daemon  2.046GB
    Step 1/5 : FROM nvidia/cuda-ppc64le:8.0-devel-ubuntu16.04
     ---> 62553fb74993
    Step 2/5 : RUN apt-get update && apt-get install -y --no-install-recommends         cuda-samples-$CUDA_PKG_VERSION &&     rm -rf /var/lib/apt/lists/*
     ---> Running in 1f1ddbb19617
    OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/local/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --compat32 --graphics --utility --video --display --require=cuda>=8.0 --pid=121739 /var/lib/docker/aufs/mnt/8e537edc1ae0f2e5d7190854e90d35eee0d6d5251eb79d21a396811011333e05]\\\\nnvidia-container-cli configure: unrecognized option '--display'\\\\nTry `nvidia-container-cli configure --help' or `nvidia-container-cli configure\\\\n--usage' for more information.\\\\n\\\"\"": unknown
    

    But if I bump up my FROM from 8.0 to 9.2, I don't get that error and my container builds. I see that the --display option was added to the configure subcommand in late Feb, so I'm thinking this is just a mismatch that expects the cli version to be newer?

    I found someone else running 8.0 on x86 has hit the same issue: https://stackoverflow.com/questions/49938024/errors-on-install-driverless-ai

    $ nvidia-container-cli --version
    version: 1.0.0
    build date: 2018-02-07T18:40+00:00
    build revision: c4cef33eca7ec335007b4f00d95c76a92676b993
    build compiler: gcc-5 5.4.0 20160609
    build platform: ppc64le
    

    I'm being a little lazy and not figuring this out myself, but I'm sure you know pretty quickly what caused this so I don't feel too guilty about my laziness. :D

    opened by clnperez 10
  • How to disable the behavier: mount compat lib (libcuda.so)  to /usr/lib/x86_64-linux-gnu/libcuda.so.520.61.05

    Env

    Machine Environment

    4 * NVIDIA_10 + 470.129.06 Driver (cuda 11.4)

    Dockerfile (cuda 11.8 + cuda-11-8-compat)

    FROM ubuntu:18.04
    
    RUN apt update -y && \
       apt install -y debian-keyring \
                    debian-archive-keyring && \
        apt update -y && \
        apt install -y apt-transport-https \
                       ca-certificates \
                       software-properties-common \
                       libssl-dev \
                       libffi-dev \
                       dirmngr \
                       gnupg1-curl \
                       automake \
                       libtool \
                       libcurl3-dev \
                       libfreetype6-dev \
                       libhdf5-serial-dev \
                       libibverbs-dev \
                       librdmacm-dev \
                       libzmq3-dev \
                       pkg-config \
                       rsync \
                       unzip \
                       zip \
                       zlib1g-dev \
                       g++ \
                       gcc \
                       git \
                       make \
                       libarchive13 \
                       cmake-data \
                       cmake \
                       build-essential \
                       libgmp-dev \
                       bc \
                       jq \
                       curl \
                       iproute2 \
                       htop && \
        ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && \
        echo $TZ > /etc/timezone
        
    # Prepare the NVIDIA repositories
    RUN echo "add nvidia repos" && \
        echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" >>  /etc/apt/sources.list.d/cuda.list && \
        echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" >>  /etc/apt/sources.list.d/nvidia-ml.list && \
        apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub && \
        apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub && \
        apt update -y
    
    
    # Install CUDA
    RUN apt-get update && \
        apt-get install -yq --no-install-recommends \
            build-essential \
            cuda-cudart-11-8=11.8.89-1 \
            cuda-command-line-tools-11-8 \
            cuda-cudart-dev-11-8 \
            cuda-compat-11-8 \
            htop && ln -s cuda-11.8 /usr/local/cuda
    

    Runtime1

    killall /usr/bin/nvidia-container-runtime-hook
    dpkg --remove nvidia-container-toolkit nvidia-container-runtime libnvidia-container-tools libnvidia-container1:amd64 nvidia-container-runtime-hook nvidia-container-toolkit-base
    dpkg --purge nvidia-container-toolkit nvidia-container-runtime libnvidia-container-tools libnvidia-container1:amd64 nvidia-container-runtime-hook nvidia-container-toolkit-base
    
    killall /usr/bin/nvidia-container-runtime-hook
    rm -rf /usr/bin/nvidia-container-toolkit
    rm -rf /usr/bin/nvidia-container-runtime-hook
    
    version=1.12.0~rc.2-1
    apt update && apt install -y nvidia-container-toolkit=$version libnvidia-container-tools=$version nvidia-container-toolkit-base=$version libnvidia-container1:amd64=$version --allow-downgrades
    

    Runtime2

    killall /usr/bin/nvidia-container-runtime-hook
    dpkg --remove nvidia-container-toolkit nvidia-container-runtime libnvidia-container-tools libnvidia-container1:amd64 nvidia-container-runtime-hook nvidia-container-toolkit-base
    dpkg --purge nvidia-container-toolkit nvidia-container-runtime libnvidia-container-tools libnvidia-container1:amd64 nvidia-container-runtime-hook nvidia-container-toolkit-base
    
    killall /usr/bin/nvidia-container-runtime-hook
    rm -rf /usr/bin/nvidia-container-toolkit
    rm -rf /usr/bin/nvidia-container-runtime-hook
    
    version=1.0.0
    echo version $version
    apt update -y && apt install -y libnvidia-container-tools=$version-1 libnvidia-container1:amd64=$version-1  nvidia-container-runtime-hook=1.4.0-1 --allow-downgrades
    

    problem

    I found that under Runtime1 (Runtime2 does not have this behavior), one of nvidia-container-toolkit, nvidia-container-runtime, nvidia-container-runtime-hook, or nvidia-container-toolkit-base mounts

    /usr/local/cuda/compat/libcuda.so 
    ->
    /usr/lib/x86_64-linux-gnu/libcuda.so.520.61.05
    

    As a result, after I create a container I am forced to use the compat libraries: the library mount is read-only, and I can't override it with LD_LIBRARY_PATH.

    opened by qisikai 0
  • Could nvidia-container-runtime work without ld.so.cache

    Would it be possible for nvidia-container-runtime to work without the use of ld.so.cache or ldconfig? I got this to work in buildroot but it requires enabling BR2_PACKAGE_GLIBC_UTILS to get ldconfig and generating the cache file. It is common that embedded Linux systems don't include these because libraries are kept in a single flat directory (e.g. /usr/lib).

    opened by kochjr 2
  • Can i get the docker image: nvidia/base/centos:7

    I'm trying to fix the default cap issue in v3.0.0 because upgrading failed, so I turned to rebuilding the v3.0.0 hook. However, the Docker image nvidia/base/centos:7 is missing. Where can I get the Docker image nvidia/base/centos:7?

    opened by MiiW 2
  • change different number gpu device in same container

    I use docker run --gpus to give each container one GPU. But sometimes a container could use more GPUs while other containers are idle. Can I change the number of GPU devices assigned to an existing container?

    opened by GuoYingLong 0
  • CUDA libraries not being mounted into container

    I am using a Jetson Nano, which has just been flashed with the latest SD card image which contains L4T r32.7.2. I have installed the latest Docker Engine. I configured the libnvidia-container apt repository following these instructions which state:

    As of NVIDIA Container Toolkit 1.7.0 (nvidia-docker2 >= 2.8.0) support for Jetson platforms is included for Ubuntu 18.04, Ubuntu 20.04, and Ubuntu 22.04 distributions. This means that the installation instructions provided for these distributions are expected to work on Jetson devices.

    Then I installed the nvidia-docker2 package, rebooted, and finally ran:

    sudo docker run --rm -it --runtime nvidia nvcr.io/nvidia/l4t-base:r32.7.1 cat /usr/local/cuda-10.2/version.txt
    

    I expect that this would output CUDA Version 10.2.300 because the file exists on the host and as I understand it, the instruction in /etc/nvidia-container-runtime/host-files-for-container.d/cuda.csv (dir, /usr/local/cuda-10.2) tells it to mount this into the container.

    Instead, it tells me that the file is missing from the container. And many of the CUDA libraries are missing as well (libcublas.so in particular).

    Info about packages installed
    cuda-command-line-tools-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
    cuda-compiler-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
    cuda-cudart-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-cudart-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-cuobjdump-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-cupti-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-cupti-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-documentation-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-driver-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-gdb-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-libraries-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
    cuda-libraries-dev-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
    cuda-memcheck-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-nvcc-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-nvdisasm-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-nvgraph-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-nvgraph-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-nvml-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-nvprof-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-nvprune-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-nvrtc-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-nvrtc-dev-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-nvtx-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-repo-l4t-10-2-local/now 10.2.460-1 arm64 [installed,local]
    cuda-samples-10-2/unknown,stable,now 10.2.300-1 arm64 [installed,automatic]
    cuda-toolkit-10-2/unknown,stable,now 10.2.460-1 arm64 [installed]
    cuda-tools-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
    cuda-visual-tools-10-2/unknown,stable,now 10.2.460-1 arm64 [installed,automatic]
    libnvidia-container-tools/bionic,now 1.10.0-1 arm64 [installed]
    libnvidia-container0/bionic,now 0.11.0+jetpack arm64 [installed]
    libnvidia-container1/bionic,now 1.10.0-1 arm64 [installed]
    nvidia-container-csv-cuda/stable,now 10.2.460-1 arm64 [installed]
    nvidia-container-csv-cudnn/stable,now 8.2.1.32-1+cuda10.2 arm64 [installed]
    nvidia-container-csv-tensorrt/stable,now 8.2.1.8-1+cuda10.2 arm64 [installed]
    nvidia-container-csv-visionworks/stable,now 1.6.0.501 arm64 [installed]
    nvidia-container-runtime/bionic,now 3.10.0-1 all [installed]
    nvidia-container-toolkit/bionic,now 1.10.0-1 arm64 [installed]
    nvidia-docker2/bionic,now 2.11.0-1 all [installed]
    nvidia-l4t-3d-core/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-apt-source/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-bootloader/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-camera/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-configs/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-core/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-cuda/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-firmware/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-gputools/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-graphics-demos/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-gstreamer/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-init/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-initrd/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-jetson-io/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-jetson-multimedia-api/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-kernel/stable,now 4.9.253-tegra-32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-kernel-dtbs/stable,now 4.9.253-tegra-32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-kernel-headers/stable,now 4.9.253-tegra-32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-libvulkan/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-multimedia/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-multimedia-utils/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-oem-config/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-tools/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-wayland/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-weston/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-x11/stable,now 32.7.2-20220420143418 arm64 [installed]
    nvidia-l4t-xusb-firmware/stable,now 32.7.2-20220420143418 arm64 [installed]
    
    /etc/apt/sources.list.d/cuda-l4t-10-2-local.list
    ----------------
    deb file:///var/cuda-repo-l4t-10-2-local /
    
    
    /etc/apt/sources.list.d/docker.list
    ----------------
    deb [arch=arm64 signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu   bionic stable
    
    
    /etc/apt/sources.list.d/nvidia-container-toolkit.list
    ----------------
    deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
    #deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
    
    
    /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
    ----------------
    # SPDX-FileCopyrightText: Copyright (c) 2019-2021 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    # SPDX-License-Identifier: LicenseRef-NvidiaProprietary
    #
    # NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
    # property and proprietary rights in and to this material, related
    # documentation and any modifications thereto. Any use, reproduction,
    # disclosure or distribution of this material and related documentation
    # without an express license agreement from NVIDIA CORPORATION or
    # its affiliates is strictly prohibited.
    
    deb https://repo.download.nvidia.com/jetson/common r32.7 main
    deb https://repo.download.nvidia.com/jetson/t210 r32.7 main
    
    
    /etc/apt/sources.list.d/visionworks-repo.list
    ----------------
    deb-src file:///var/visionworks-repo /
    deb file:///var/visionworks-repo /
    
    
    /etc/apt/sources.list.d/visionworks-sfm-repo.list
    ----------------
    deb-src file:///var/visionworks-sfm-repo /
    deb file:///var/visionworks-sfm-repo /
    
    
    /etc/apt/sources.list.d/visionworks-tracking-repo.list
    ----------------
    deb-src file:///var/visionworks-tracking-repo /
    deb file:///var/visionworks-tracking-repo /
    
    Client: Docker Engine - Community
     Version:           20.10.17
     API version:       1.41
     Go version:        go1.17.11
     Git commit:        100c701
     Built:             Mon Jun  6 23:02:19 2022
     OS/Arch:           linux/arm64
     Context:           default
     Experimental:      true
    
    Server: Docker Engine - Community
     Engine:
      Version:          20.10.17
      API version:      1.41 (minimum version 1.12)
      Go version:       go1.17.11
      Git commit:       a89b842
      Built:            Mon Jun  6 23:00:46 2022
      OS/Arch:          linux/arm64
      Experimental:     false
     containerd:
      Version:          1.6.7
      GitCommit:        0197261a30bf81f1ee8e6a4dd2dea0ef95d67ccb
     nvidia:
      Version:          1.1.3
      GitCommit:        v1.1.3-0-g6724737
     docker-init:
      Version:          0.19.0
      GitCommit:        de40ad0
    
Opened by rgov
Latest release: v3.11.0
Owner: NVIDIA Corporation
A library for graph deep learning research

Documentation | Paper [JMLR] | Tutorials | Benchmarks | Examples DIG: Dive into Graphs is a turnkey library for graph deep learning research. Why DIG?

DIVE Lab, Texas A&M University 1.3k Jan 01, 2023
PyTorch implementation of PP-LCNet

PP-LCNet-Pytorch Pre-Trained Models Google Drive p018 Accuracy Models Top1 Top5 PPLCNet_x0_25 0.5186 0.7565 PPLCNet_x0_35 0.5809 0.8083 PPLCNet_x0_5 0

24 Dec 12, 2022
T2F: text to face generation using Deep Learning

⭐ [NEW] ⭐ T2F - 2.0 Teaser (coming soon ...) Please note that all the faces in the above samples are generated ones. The T2F 2.0 will be using MSG-GAN

Animesh Karnewar 533 Dec 22, 2022
Classify the Fashion-MNIST dataset with PyTorch

Fashion-Mnist Classifier with PyTorch Inference 1- clone this repository: git clone https://github.com/Jhamed7/Fashion-Mnist-Classifier.git 2- Instal

1 Jan 14, 2022
Official implementation of the paper 'Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution' in CVPR 2022

LDL Paper | Supplementary Material Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution Jie Liang*, Hu

150 Dec 26, 2022
Code accompanying the paper "Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning" (NeurIPS 2021 Spotlight)

Implicit Constraint Q-Learning This is a pytorch implementation of ICQ on Datasets for Deep Data-Driven Reinforcement Learning (D4RL) and ICQ-MA on SM

42 Dec 23, 2022
🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

🐤 Nix-TTS An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji

Rendi Chevi 156 Jan 09, 2023
Wind Speed Prediction using LSTMs in PyTorch

Implementation of Deep-Forecast using PyTorch Deep Forecast: Deep Learning-based Spatio-Temporal Forecasting Adapted from original implementation Setu

Onur Kaplan 151 Dec 14, 2022
A PyTorch implementation of Detectron. Both training from scratch and inference directly from pretrained Detectron weights are available.

Use this instead: https://github.com/facebookresearch/maskrcnn-benchmark A Pytorch Implementation of Detectron Example output of e2e_mask_rcnn-R-101-F

Roy 2.8k Dec 29, 2022
Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.

Lbl2Vec Lbl2Vec is an algorithm for unsupervised document classification and unsupervised document retrieval. It automatically generates jointly embed

sebis - TUM - Germany 61 Dec 20, 2022
MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

MicRank: Learning to Rank Microphones for Distant Speech Recognition Application Scenario Many applications nowadays envision the presence of multiple

Samuele Cornell 20 Nov 10, 2022
This repo contains implementation of different architectures for emotion recognition in conversations.

Emotion Recognition in Conversations Updates 🔥 🔥 🔥 Date Announcements 03/08/2021 🎆 🎆 We have released a new dataset M2H2: A Multimodal Multiparty

Deep Cognition and Language Research (DeCLaRe) Lab 1k Dec 30, 2022
Chunkmogrify: Real image inversion via Segments

Chunkmogrify: Real image inversion via Segments Teaser video with live editing sessions can be found here This code demonstrates the ideas discussed i

David Futschik 112 Jan 04, 2023
A general Python framework for single object tracking in LiDAR point clouds, based on PyTorch Lightning.

Open3DSOT A general Python framework for single object tracking in LiDAR point clouds, based on PyTorch Lightning. The official code release of BAT an

Kangel Zenn 172 Dec 23, 2022
The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".

Code for "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval" (ACL 2021, Long) This is the repository for baseline m

Akari Asai 25 Oct 30, 2022
Overview of the architecture and implementation of TEDS-Net, as described in MICCAI 2021: "TEDS-Net: Enforcing Diffeomorphisms in Spatial Transformers to Guarantee Topology Preservation in Segmentations"

TEDS-Net Overview of architecture and implementation of TEDS-Net, as described in MICCAI 2021: "TEDS-Net: Enforcing Diffeomorphisms in Spatial Transfo

Madeleine K Wyburd 14 Jan 04, 2023
Drone detection using YOLOv5

This drone detection system uses YOLOv5 which is a family of object detection architectures and we have trained the model on Drone Dataset. Overview I

Tushar Sarkar 27 Dec 20, 2022
An Inverse Kinematics library aiming performance and modularity

IKPy Demo Live demos of what IKPy can do (click on the image below to see the video): Also, a presentation of IKPy: Presentation. Features With IKPy,

Pierre Manceron 481 Jan 02, 2023
Turi Create simplifies the development of custom machine learning models.

Quick Links: Installation | Documentation | WWDC 2019 | WWDC 2018 Turi Create Check out our talks at WWDC 2019 and at WWDC 2018! Turi Create simplifie

Apple 10.9k Jan 01, 2023
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion (CVPR 2021)

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion (CVPR 2021) An efficient PyTorch library for Point Cloud Completion.

Microsoft 119 Jan 02, 2023