Python 3 Bindings for NVML library. Get NVIDIA GPU status inside your program.


py3nvml

Documentation also available at readthedocs.

Python 3-compatible bindings to the NVIDIA Management Library, which can be used to query the state of the GPUs on your system. This was ported from the NVIDIA-provided Python bindings nvidia-ml-py, which only supported Python 2. I forked from version 7.352.0. The old library was itself a wrapper around the NVIDIA Management Library.

In addition to these NVIDIA functions for querying the state of the GPU, I have written a couple of functions/tools to help in using GPUs (particularly on a shared GPU server). These are:

  • A function to 'restrict' the available GPUs by setting the CUDA_VISIBLE_DEVICES environment variable.
  • A script for displaying a differently formatted nvidia-smi.

See the Utils section below for more info.

Updates in Version 0.2.3

To try to keep py3nvml somewhat up to date with the constantly evolving NVIDIA drivers, I have done some work on the py3nvml.py3nvml module. In particular, I have added all the constants that were present in the NVIDIA source as of driver version 418.43 but missing from py3nvml. In addition, I have wrapped these constants in Enums so it is easier to see which constants go together. Finally, for all the functions in py3nvml.py3nvml I have copied in the C docstring. While this results in some strange-looking docstrings that are slightly incorrect for Python, they should give good guidance on the scope of each function, something which was ill-defined before.

Finally, I will remove the py3nvml.nvidia_smi module in a future version, as I believe it was only ever meant as an example of how to use the NVML functions to query the GPUs, and it is now quite out of date. To get the same functionality, you can call nvidia-smi -q -x from Python with subprocess.
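
For example, a minimal sketch of doing that with the standard library (assuming nvidia-smi is on your PATH):

import subprocess

# Capture the full XML report that nvidia-smi produces
xml_report = subprocess.check_output(['nvidia-smi', '-q', '-x'],
                                     universal_newlines=True)
print(xml_report[:300])  # print the start of the report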

Requires

Python 3.5+.

Installation

From PyPi:

$ pip install py3nvml

From GitHub:

$ pip install -e git+https://github.com/fbcotter/py3nvml#egg=py3nvml

Or, download and pip install:

$ git clone https://github.com/fbcotter/py3nvml
$ cd py3nvml
$ pip install .

Utils

(Added by me - not ported from NVIDIA library)

grab_gpus

You can call the grab_gpus(num_gpus, gpu_select, gpu_fraction=.95) function to check the available GPUs and set the CUDA_VISIBLE_DEVICES environment variable as needed. It determines whether a GPU is available by checking that its fraction of free memory is at or above the gpu_fraction value. The default of .95 allows a small amount of memory to be in use before the GPU is deemed 'used'.

I have found this useful on a shared GPU server when using TensorFlow, which is very greedy: a call to tf.Session() grabs all available GPUs.

E.g.

import py3nvml
import tensorflow as tf
py3nvml.grab_gpus(3)
sess = tf.Session() # now we only grab 3 gpus!

Or the following will grab 2 GPUs from the first 4 (and leave any higher GPUs untouched):

py3nvml.grab_gpus(num_gpus=2, gpu_select=[0,1,2,3])
sess = tf.Session()

This will look for 2 available GPUs in the range 0 to 3. The gpu_select option is not necessary; it only serves to restrict the search space for grab_gpus.

You can adjust the memory threshold for determining whether a GPU is free/used with the gpu_fraction parameter (the default is 0.95):

# Will allocate a GPU if less than 20% of its memory is being used
py3nvml.grab_gpus(num_gpus=2, gpu_fraction=0.8)
sess = tf.Session()

This function has no return codes but may raise some warnings/exceptions:

  • If the method could not connect to any NVIDIA gpus, it will raise a RuntimeWarning.
  • If it could connect to the GPUs, but there were none available, it will raise a ValueError.
  • If it could connect to the GPUs but not enough were available (i.e. fewer were free than requested), it will take everything it can and raise a RuntimeWarning.
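
A minimal sketch of handling these cases (assuming the RuntimeWarning may be issued through the warnings module or raised directly):

import warnings
import py3nvml

try:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')   # make sure warnings are not silenced
        py3nvml.grab_gpus(num_gpus=2)     # sets CUDA_VISIBLE_DEVICES on success
    for w in caught:                      # e.g. fewer GPUs grabbed than requested
        print('grab_gpus warning:', w.message)
except (ValueError, RuntimeWarning) as err:
    print('Could not grab any GPUs:', err)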

get_free_gpus

This tool queries the GPU status. Unlike the default for grab_gpus, which checks the memory usage of a GPU, this function checks whether a process is running on a GPU. For a system with N GPUs, it returns a list of N booleans, where the nth value is True if no process was found running on GPU n. An example use is:

import py3nvml
free_gpus = py3nvml.get_free_gpus()
if True not in free_gpus:
    print('No free gpus found')
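
Building on that return value, a small sketch of picking the first free GPU index (an illustration, not part of the library API):

import py3nvml

free_gpus = py3nvml.get_free_gpus()   # one boolean per GPU
if any(free_gpus):
    gpu_idx = free_gpus.index(True)   # first GPU with no processes on it
    print('Using GPU', gpu_idx)
else:
    print('No free gpus found')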

get_num_procs

This function is called by get_free_gpus. It simply returns a list of integers with the number of processes running on each GPU. E.g. if you had 1 process running on GPU 5 in an 8-GPU system, you would expect to get the following:

import py3nvml
num_procs = py3nvml.get_num_procs()
print(num_procs)
>>> [0, 0, 0, 0, 0, 1, 0, 0]

py3smi

I found the default nvidia-smi output was missing some useful info, so I made use of the py3nvml/nvidia_smi.py module to query the devices and get info on the GPUs, and then defined my own printout. I have included this as a script in scripts/py3smi. The print code is horribly messy but the query code is very simple and should be understandable.

Running pip install will now put this script in your Python environment's bin directory, and you'll be able to run it from the command line. Here is a comparison of the two outputs:

https://i.imgur.com/TvdfkFE.png

https://i.imgur.com/UPSHr8k.png

For py3smi, you can specify an update period so it will refresh the feed every few seconds. I.e., similar to watch -n5 nvidia-smi, you can run py3smi -l 5.

You can also get the full output (very similar to nvidia-smi) by running py3smi -f (this shows a slightly modified process info pane below).
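
For example (assuming pip has put py3smi on your PATH):

$ py3smi          # reformatted one-shot summary of the GPUs
$ py3smi -l 5     # refresh every 5 seconds, like watch -n5 nvidia-smi
$ py3smi -f       # full output, including the modified process info pane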

Regular Usage

Visit the NVML reference for a list of the available functions and their documentation. The py3smi script is also a bit hacky, but it shows examples of querying the GPUs for info.

(below here is everything ported from pynvml)

from py3nvml.py3nvml import *
nvmlInit()
print("Driver Version: {}".format(nvmlSystemGetDriverVersion()))
# e.g. will print:
#   Driver Version: 352.00
deviceCount = nvmlDeviceGetCount()
for i in range(deviceCount):
    handle = nvmlDeviceGetHandleByIndex(i)
    print("Device {}: {}".format(i, nvmlDeviceGetName(handle)))
# e.g. will print:
#  Device 0 : Tesla K40c
#  Device 1 : Tesla K40c

nvmlShutdown()

Additionally, see py3nvml.nvidia_smi.py. This does the equivalent of the nvidia-smi command:

nvidia-smi -q -x

With

import py3nvml.nvidia_smi as smi
print(smi.XmlDeviceQuery())
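
The returned string is an XML document, so it can be parsed with the standard library. A small sketch (the gpu, product_name and fb_memory_usage/used tag names are assumptions based on typical nvidia-smi -q -x output):

import xml.etree.ElementTree as ET
import py3nvml.nvidia_smi as smi

root = ET.fromstring(smi.XmlDeviceQuery())
for gpu in root.findall('gpu'):
    name = gpu.findtext('product_name')             # e.g. 'Tesla K40c'
    mem_used = gpu.findtext('fb_memory_usage/used')
    print(name, mem_used)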

Differences from NVML

The py3nvml library consists of Python methods which wrap several NVML functions, implemented in a C shared library. Each function's use is the same, with the following exceptions:

  1. Instead of returning error codes, failing calls raise Python exceptions, so they should be wrapped with exception handlers:
try:
    nvmlDeviceGetCount()
except NVMLError as error:
    print(error)
  2. C function output parameters are returned from the corresponding Python function as tuples, rather than requiring pointers. E.g. the C function:
nvmlReturn_t nvmlDeviceGetEccMode(nvmlDevice_t device,
                                  nvmlEnableState_t *current,
                                  nvmlEnableState_t *pending);

Becomes

nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)
(current, pending) = nvmlDeviceGetEccMode(handle)
  3. C structs are converted into Python classes. E.g. the C struct:
nvmlReturn_t DECLDIR nvmlDeviceGetMemoryInfo(nvmlDevice_t device,
                                             nvmlMemory_t *memory);
typedef struct nvmlMemory_st {
    unsigned long long total;
    unsigned long long free;
    unsigned long long used;
} nvmlMemory_t;

Becomes:

info = nvmlDeviceGetMemoryInfo(handle)
print("Total memory: {}MiB".format(info.total >> 20))
# will print:
#   Total memory: 5375MiB
print("Free memory: {}MiB".format(info.free >> 20))
# will print:
#   Free memory: 5319MiB
print("Used memory: {}MiB".format(info.used >> 20))
# will print:
#   Used memory: 55MiB
  4. Python handles string buffer creation. E.g. the C function:
nvmlReturn_t nvmlSystemGetDriverVersion(char* version,
                                        unsigned int length);

Can be called like so:

version = nvmlSystemGetDriverVersion()
nvmlShutdown()

  5. All meaningful NVML constants and enums are exposed in Python. E.g. the constant NVML_TEMPERATURE_GPU is available under py3nvml.NVML_TEMPERATURE_GPU.

The NVML_VALUE_NOT_AVAILABLE constant is not used. Instead None is mapped to the field.
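
A quick sketch using one of these constants (the None check is a hedge for fields that would otherwise be NVML_VALUE_NOT_AVAILABLE):

from py3nvml.py3nvml import *

nvmlInit()
handle = nvmlDeviceGetHandleByIndex(0)
# NVML_TEMPERATURE_GPU selects the on-die GPU temperature sensor
temp = nvmlDeviceGetTemperature(handle, NVML_TEMPERATURE_GPU)
if temp is not None:
    print("GPU temperature: {}C".format(temp))
nvmlShutdown()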

Release Notes (for pynvml)

Version 2.285.0

  • Added new functions for NVML 2.285. See NVML documentation for more information.
  • Ported to support Python 3.0 and Python 2.0 syntax.
  • Added nvidia_smi.py tool as a sample app.

Version 3.295.0

  • Added new functions for NVML 3.295. See NVML documentation for more information.
  • Updated nvidia_smi.py tool - Includes additional error handling

Version 4.304.0

  • Added new functions for NVML 4.304. See NVML documentation for more information.
  • Updated nvidia_smi.py tool

Version 4.304.3

  • Fixing nvmlUnitGetDeviceCount bug

Version 5.319.0

  • Added new functions for NVML 5.319. See NVML documentation for more information.

Version 6.340.0

  • Added new functions for NVML 6.340. See NVML documentation for more information.

Version 7.346.0

  • Added new functions for NVML 7.346. See NVML documentation for more information.

Version 7.352.0

  • Added new functions for NVML 7.352. See NVML documentation for more information.

COPYRIGHT

Copyright (c) 2011-2015, NVIDIA Corporation. All rights reserved.

LICENSE

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the NVIDIA Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Comments
  • gpu_temp_max_gpu_threshold missing

    I just found out that the GPU Max Operating Temp, exported with the XML tag gpu_temp_max_gpu_threshold, is missing from py3nvml.

    Do you have any plan to add it?

    Also, another missing tag is the cuda_version.

    opened by leinardi 13
  • fix invalid escape sequences

    Python 3.7 gives a deprecation notice on N\A because it sees it as an escape sequence. I think it was intended as N/A, so this is a proposal to fix the warning.

    opened by thekyz 3
  • nvmlDeviceGetMemoryInfo has weird result

    hi, when I use nvmlDeviceGetMemoryInfo to get the gpu memory used info, it will return a false number e.g. 16MB while watch -n 0.01 nvidia-smi would show ~11G used. Here is example code below:

    py3nvml.nvmlInit()
    device_count = py3nvml.nvmlDeviceGetCount()
    assert gpuid < device_count
    handle = py3nvml.nvmlDeviceGetHandleByIndex(gpuid)
    mem_info = py3nvml.nvmlDeviceGetBAR1MemoryInfo(handle)
    if mem_info != 'N/A':
        print(mem_info)
        used = mem_info.bar1Used >> 20
        total = mem_info.bar1Total >> 20
    else:
        used = 0
        total = 0
    
    opened by knsong 2
  • video_clock tag missing

    Another tag missing is the video_clock:

    		<clocks>
    			<graphics_clock>300 MHz</graphics_clock>
    			<sm_clock>300 MHz</sm_clock>
    			<mem_clock>405 MHz</mem_clock>
    			<video_clock>540 MHz</video_clock>
    		</clocks>
    		<max_clocks>
    			<graphics_clock>2175 MHz</graphics_clock>
    			<sm_clock>2175 MHz</sm_clock>
    			<mem_clock>7000 MHz</mem_clock>
    			<video_clock>1950 MHz</video_clock>
    		</max_clocks>
    
    opened by leinardi 2
  • module 'string' has no attribute 'join'

    Hi, When I tried to do e.g. print(nvmlDeviceGetName(h)) I got the following error.

    File "/home/**/build/nvidia-ml-py/src/py3nvml/py3nvml/py3nvml.py", line 399, in __str__
        return self.__class__.__name__ + "(" + string.join(result, ", ") + ")"
    AttributeError: module 'string' has no attribute 'join'
    

    I'm sorry but I'm blind to Python and I have no idea. Could you investigate it?

    opened by ikfj 2
  • Add constraint on minimal graphics card memory size

    The new parameter allows discarding graphics cards that don't have enough RAM capacity. The parameter value should be in MiB.

    Updates the documentation to reflect the new value in arguments.

    opened by m5imunovic 1
  • Fix for more than 8 GPUs

    grab_gpus failed for me on a system with more than 8 GPUs. My fix is to use the same length (nvmlDeviceGetCount) for gpu_check and gpu_select, which allows all the GPUs to be grabbed.

    opened by hallbjorn 1
  • could not get the GPU MEM usage percent

    we could see an almost 100% for GPU MEM usage, but tested the code, and could only get '0'.

    My test code:

    from py3nvml.py3nvml import *
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    result = nvmlDeviceGetUtilizationRates(handle)
    print(result.memory)  # 0

    opened by spenly 1
  • [bug] Error class in nvidia_smi.py

    Great code. Could you please check the handleError call in nvidia_smi.py, line 184? I suggest replacing val = handleError(NVML_ERROR_NOT_SUPPORTED) with val = handleError(NVMLError(NVML_ERROR_NOT_SUPPORTED)), because it gives me an error when comparing the value attribute in the error handler.

    BR/thupalo

    opened by thupalo 1
  • Not defined?

    Hi, so the module was working before but for some reason it keeps giving me undefined for these commands:

    nvmlDeviceGetFanSpeed(nvmlDeviceGetHandleByIndex(0))
    nvmlDeviceGetTemperature(nvmlDeviceGetHandleByIndex(0), 0)

    I've got an NVIDIA GTX 1080 on Windows 10 with Python 3.6.00. All I was trying to do was read my temperature and fan speeds.

    opened by sometimescool22 1
  • Question when using py3nvml

    I tried to run the code in the Usage part, but it reports "ModuleNotFoundError: No module named 'py3nvml.pynvml'". How can I fix this?

    opened by ghost 1
  • the type of nvmlDeviceGetPciInfo(handle).busId is "bytes" not "str"

    When I call nvmlDeviceGetPciInfo(handle) function the "busId" is "bytes" not "str"

     handle = nvmlDeviceGetHandleByIndex(i)
     devId = nvmlDeviceGetPciInfo(handle).busId
    

    The whole nvmlPciInfo_t is:

    nvmlPciInfo_t(busId: b'0000:00:0A.0', domain: 0x0000, bus: 0x00, device: 0x0A, pciDeviceId: 0x1EB810DE, pciSubSystemId: 0x12A210DE, reserved0: 0, reserved1: 0, reserved2: 0, reserved3: 0)
    

    This line converts c_info to "str", but "busId" is "bytes" https://github.com/fbcotter/py3nvml/blob/master/py3nvml/py3nvml.py#L2646

    Is it expected output?

    My test environment is python3.6

    opened by WangKunLoveReading 0
Releases (0.2.4)
  • 0.2.4 (Oct 11, 2019)

    Minor update.

    • Fixed some alignment issues with long PIDs in py3smi.
    • Added ability to call py3nvml.grab_gpus with num_gpus=-1. This will grab all available GPUs. Previously you could do this by setting num_gpus to a large number, but that would throw a warning if it couldn't grab that many GPUs.
  • 0.2.3 (Mar 4, 2019)

    See the 'Updates in Version 0.2.3' section above.

  • 0.2.1 (Jun 27, 2018)

    Updated the script to use multiple small queries rather than one big XML query. It can now also handle a GPU falling off the bus, still displaying info for the remaining GPUs.

  • 0.2.0 (May 17, 2018)
