DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning.

Overview

DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.

When used standalone, the DirectML API is a low-level DirectX 12 library suitable for high-performance, low-latency applications such as frameworks, games, and other real-time applications. Its seamless interoperability with Direct3D 12, low overhead, and conformance across hardware make DirectML ideal for accelerating machine learning when high performance is desired and reliable, predictable results across hardware are critical.

More information about DirectML can be found in Introduction to DirectML.

Visit the DirectX Landing Page for more resources for DirectX developers.

Getting Started with DirectML

DirectML is distributed as a system component of Windows 10 and is available in Windows 10, version 1903 (10.0; Build 18362), and newer.

Starting with DirectML version 1.4.0, DirectML is also available as a standalone redistributable package (see Microsoft.AI.DirectML), which is useful for applications that wish to use a fixed version of DirectML, or when running on older versions of Windows 10.

Hardware requirements

DirectML requires a DirectX 12 capable device. Almost all commercially-available graphics cards released in the last several years support DirectX 12. Examples of compatible hardware include:

  • AMD GCN 1st Gen (Radeon HD 7000 series) and above
  • Intel Haswell (4th-gen Core) HD Integrated Graphics and above
  • NVIDIA Kepler (GTX 600 series) and above
  • Qualcomm Adreno 600 and above

For application developers

DirectML exposes a native C++ DirectX 12 API. The header and library (DirectML.h/DirectML.lib) are available as part of the redistributable NuGet package, and are also included in the Windows 10 SDK version 10.0.18362 or newer.

For users, data scientists, and researchers

DirectML is built in as a backend to several frameworks, such as Windows ML, ONNX Runtime, and TensorFlow.

See the following sections for more information:

DirectML Samples

DirectML C++ sample code is available under Samples.

  • HelloDirectML: A minimal "hello world" application that executes a single DirectML operator.
  • DirectMLSuperResolution: A sample that uses DirectML to execute a basic super-resolution model to upscale video from 540p to 1080p in real time.
  • yolov4: YOLOv4 is an object detection model capable of recognizing up to 80 different classes of objects in an image. This sample contains a complete end-to-end implementation of the model using DirectML, and is able to run in real time on a user-provided video stream.

DirectML Python sample code is available under Python/samples. The samples require PyDirectML, an open source Python projection library for DirectML, which can be built and installed to a Python executing environment from Python/src. Refer to the Python/README.md file for more details.

Windows ML on DirectML

Windows ML (WinML) is a high-performance, reliable API for deploying hardware-accelerated ML inferences on Windows devices. DirectML provides the GPU backend for Windows ML.

DirectML acceleration can be enabled in Windows ML using the LearningModelDevice with any one of the DirectX DeviceKinds.

For more information, see Get Started with Windows ML.

ONNX Runtime on DirectML

ONNX Runtime is a cross-platform inferencing and training accelerator compatible with many popular ML/DNN frameworks, including PyTorch, TensorFlow/Keras, scikit-learn, and more.

DirectML is available as an optional execution provider for ONNX Runtime that provides hardware acceleration when running on Windows 10.

For more information about getting started, see Using the DirectML execution provider.
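As a rough illustration of what selecting the execution provider can look like from Python, the sketch below prefers DirectML and falls back to the CPU. It assumes the `onnxruntime-directml` package is installed for GPU use; `choose_providers` and `create_session` are hypothetical helper names, and any model path passed in is the caller's own.

```python
# Sketch: prefer the DirectML execution provider, fall back to CPU.
# Helper names are illustrative; they are not part of ONNX Runtime.

PREFERRED = ["DmlExecutionProvider", "CPUExecutionProvider"]


def choose_providers(available):
    """Return the preferred providers present in `available`, in priority order."""
    chosen = [name for name in PREFERRED if name in available]
    # Keep a safe default if neither name was reported.
    return chosen or ["CPUExecutionProvider"]


def create_session(model_path):
    """Create an InferenceSession for `model_path` (assumes onnxruntime is installed)."""
    import onnxruntime as ort  # lazy import so choose_providers works without it

    options = ort.SessionOptions()
    # The DirectML provider's documentation asks for memory-pattern
    # optimization to be off and for sequential execution.
    options.enable_mem_pattern = False
    options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL

    providers = choose_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, sess_options=options, providers=providers)
```

Listing the CPU provider after DirectML lets the same script run unchanged on machines without a DirectX 12 device.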

TensorFlow with DirectML

TensorFlow is a popular open source platform for machine learning and is a leading framework for training of machine learning models.

DirectML acceleration for TensorFlow 1.15 is currently available for Public Preview. TensorFlow on DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware.

TensorFlow on DirectML is supported on both the latest versions of Windows 10 and the Windows Subsystem for Linux, and is available for download as a PyPI package. For more information about getting started, see GPU accelerated ML training (docs.microsoft.com).
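A failed `pip install tensorflow-directml` is often an interpreter-version mismatch, so a quick check before installing may help. The sketch below assumes the TensorFlow 1.15 preview wheels target CPython 3.5 through 3.7; that range is an assumption, not something this README states, and `is_supported_python` is a hypothetical helper.

```python
import sys

# Assumption (not stated in this README): the preview wheels cover
# CPython 3.5, 3.6, and 3.7 only.
SUPPORTED_VERSIONS = {(3, 5), (3, 6), (3, 7)}


def is_supported_python(version_info=sys.version_info):
    """Return True if (major, minor) is in the assumed supported set."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if is_supported_python():
        print("Python %d.%d: pip install tensorflow-directml should find a wheel" % (major, minor))
    else:
        print("Python %d.%d is outside the assumed 3.5-3.7 range; "
              "consider a separate environment" % (major, minor))
```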

PyTorch with DirectML

DirectML acceleration for PyTorch 1.8.0 is currently available for Public Preview. PyTorch with DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware.

PyTorch on DirectML is supported on both the latest versions of Windows 10 and the Windows Subsystem for Linux, and is available for download as a PyPI package. For more information about getting started, see GPU accelerated ML training (docs.microsoft.com).
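Device selection differs between releases, so the following is only a sketch. It assumes the newer `torch-directml` plugin (imported as `torch_directml`) rather than the PyTorch 1.8 preview build, which addressed devices with strings such as `'dml'`. `validate_index` and `pick_device` are hypothetical helpers that check the adapter index up front and fall back to the CPU.

```python
def validate_index(index, count):
    """Return `index` if it names one of `count` adapters, None if there are none."""
    if count <= 0:
        return None
    if not 0 <= index < count:
        raise ValueError("adapter index %d out of range (found %d)" % (index, count))
    return index


def pick_device(index=0):
    """Return a DirectML device if the plugin and an adapter exist, else "cpu"."""
    try:
        import torch_directml  # assumption: the torch-directml plugin is installed
    except ImportError:
        return "cpu"

    checked = validate_index(index, torch_directml.device_count())
    if checked is None:
        return "cpu"
    return torch_directml.device(checked)
```

The result can be passed to `tensor.to(...)`. An out-of-range index raises a clear error before any tensor is moved.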

Feedback

We look forward to hearing from you!

External Links

Documentation

DirectML programming guide
DirectML API reference

More information

Introducing DirectML (Game Developers Conference '19)
Accelerating GPU Inferencing with DirectML and DirectX 12 (SIGGRAPH '18)
Windows AI: hardware-accelerated ML on Windows devices (Microsoft Build '20)
Gaming with Windows ML (DirectX Developer Blog)
DirectML at GDC 2019 (DirectX Developer Blog)
DirectX Linux (DirectX Developer Blog)

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Comments
  • DirectML is x2.8 slower than CUDA

    I tested training the same deepfake model on the same hardware using tensorflow-cuda and tensorflow-directml. (my project https://github.com/iperov/DeepFaceLab)

    DirectML: avg iter time 626ms

    CUDA: avg iter time 222ms

    DirectML is x2.8 slower :-(

    I think that's what I was talking about here https://github.com/microsoft/DirectML/issues/104

    So what is the point of using DirectML if every millisecond of training acceleration is important in today's world?

    x2.8 slower is serious performance degradation. I reached the same speed in my weekend OpenCL NN library in pure python (https://github.com/iperov/litenn)

    But you guys are from Microsoft. Don't you think there is no point in further development of DirectML until you reach the level of CUDA performance?

    opened by iperov 36
  • Could not load dynamic library 'libcuda.so.1'

    Followed the instructions here

    ~ » cat /proc/version
    Linux version 4.4.0-20150-Microsoft ([email protected]) (gcc version 5.4.0 (GCC) ) #1000-Microsoft Thu Jun 12 17:34:00 PST 2020
    

    I'm running build 20150, but am getting this error:

    Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)
    [GCC 7.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow.compat.v1 as tf
    >>>
    >>> tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True))
    >>>
    >>> print(tf.add([1.0, 2.0], [3.0, 4.0]))
    2020-06-17 16:36:05.469811: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
    2020-06-17 16:36:05.469926: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
    2020-06-17 16:36:05.470029: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (MAKERPC): /proc/driver/nvidia/version does not exist
    2020-06-17 16:36:05.470532: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
    2020-06-17 16:36:05.483133: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3400000000 Hz
    2020-06-17 16:36:05.487879: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fffe52ac420 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-06-17 16:36:05.488038: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    tf.Tensor([4. 6.], shape=(2,), dtype=float32)
    
    opened by jflam 23
  • [installation] Could not find a version that satisfies the requirement tensorflow-directml (from versions: none)

    Hi,

    After following the steps described in https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-tensorflow-wsl till pip install tensorflow-directml,

    the error appeared as

    ERROR: Could not find a version that satisfies the requirement tensorflow-directml (from versions: none)
    ERROR: No matching distribution found for tensorflow-directml

    BTW, I am using python 3.8

    and I ran python list tensorflow*, which output:

    Package    Version
    certifi    2020.6.20
    pip        20.1.1
    setuptools 49.2.0.post20200714
    wheel      0.34.2

    opened by shuwang1 19
  • How to get available devices and set a specific device in Pytorch-DML?

    Hi, For accessing available devices in Pytorch we'd normally do :

        print(f'available devices: {torch.cuda.device_count()}')
        print(f'current device: { torch.cuda.current_device()}')
    

    However, I noticed this fails (AssertionError: Torch not compiled with CUDA enabled).
    I thought the transition would be minimal and stuff like this would work out of the box, especially after noting that we can't write:

        print(f'available devices: {torch.dml.device_count()}')
        print(f'current device: { torch.dml.current_device()}')
    

    as it fails with the error :

    AttributeError: module 'torch.dml' has no attribute 'device_count'
    

    Apart from this, trying to specify a device using the form "dml:number" fails if the number is greater than 0. For example, this fails for "dml:1":

    import torch 
    import time
    def bench(device ='cpu'):
        print(f'running on {device}:')
        a = torch.randn(size=(2000,2000)).to(device=device)
        b = torch.randn(size=(2000,2000)).to(device=device)
       
        start = time.time()
        c = a+b
        end = time.time()
        
        # print(f'available devices: {torch.dml.device_count()}')
        # print(f'current device: { torch.dml.current_device()}')
        print(f'--took {end-start:.2f} seconds')
    
    bench('cpu')
    bench('dml')
    bench('dml:0')
    bench('dml:1')    
    

    it outputs :

    running on cpu:
    --took 0.00 seconds
    running on dml:
    --took 0.01 seconds
    running on dml:0:
    --took 0.00 seconds
    running on dml:1:
    

    and that's it; it doesn't execute when it comes to "dml:1".

    Also, trying to run:

    import torch 
    import time
    def bench(device ='cpu'):
        print(f'running on {device}:')
        a = torch.randn(size=(2000,2000)).to(device=device)
        b = torch.randn_like(a).to(device=device)
        
        start = time.time()
        c = a+b
        end = time.time()
        
        # print(f'available devices: {torch.dml.device_count()}')
        # print(f'current device: { torch.dml.current_device()}')
        print(f'--took {end-start:.2f} seconds')
    
    bench('cpu')
    bench('dml')
    bench('dml:0')
    bench('dml:1')    
    

    Fails with the following error :

    running on cpu:
    --took 0.00 seconds
    running on dml:
    Traceback (most recent call last):
      File "g:\tests.py", line 1246, in <module>
        bench('dml')
      File "g:\tests.py", line 1235, in bench
        b = torch.randn_like(a).to(device=device)
    RuntimeError: Could not run 'aten::normal_' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom 
    build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::normal_' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
    
    CPU: registered at D:\a\_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
    BackendSelect: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
    Named: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
    AutogradOther: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCPU: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCUDA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradXLA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradNestedTensor: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse1: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse2: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse3: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    Tracer: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
    Autocast: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
    Batched: registered at D:\a\_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
    VmapMode: registered at D:\a\_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:37 [kernel]
    
    
    pytorch-directml 
    opened by Coderx7 11
  • Conv2D-Fail: internal compiler error, abnormal program termination

    I ran across DirectML a few hours ago and am currently playing around with it on a Surface Pro 6 with an Intel HD Graphics 620. To set it all up, I followed this article to the letter: https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-tensorflow-windows

    For testing purposes, I used a slightly modified version of my small go-to script:

    import tensorflow.compat.v1 as tf 
    
    tf.enable_eager_execution(tf.ConfigProto(log_device_placement=False)) 
    
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    
    
    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
    
    train_images = train_images.reshape(60000, 28, 28, 1)
    train_images = train_images / 255.0
    
    test_images = test_images.reshape(10000, 28, 28, 1)
    test_images = test_images / 255.0
    
    #model = tf.keras.Sequential([
    #    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    #    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    #    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    #])
    
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, (3,3), activation=tf.nn.relu, input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(2,2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation=tf.nn.relu),
        tf.keras.layers.Dense(10, activation=tf.nn.softmax)
    ])
    
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    
    model.fit(train_images, train_labels, epochs=5)
    
    test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
    
    print('Test accuracy:', test_acc)
    

    The version of the model without convolutions runs absolutely fine. But as soon as I add the Conv2D layer, nothing works anymore.

    The entire output I get is:

    2021-04-23 21:23:05.241248: I tensorflow/stream_executor/platform/default/dso_loader.cc:99] Successfully opened dynamic library C:\Users\cyphus309\.conda\envs\directml\lib\site-packages\tensorflow_core\python/directml.b6e3bc69b89cfca5486e178bb9d51724d0c4a94a.dll
    2021-04-23 21:23:05.298554: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:249] DirectML device enumeration: found 1 compatible adapters.
    2021-04-23 21:23:05.299189: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
    2021-04-23 21:23:05.331743: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:185] DirectML: creating device on adapter 0 (Intel(R) HD Graphics 620)
    2021-04-23 21:23:05.363568: I tensorflow/stream_executor/platform/default/dso_loader.cc:99] Successfully opened dynamic library Kernel32.dll
    Train on 60000 samples
    Epoch 1/5
    
    internal compiler error, abnormal program termination
    
    

    Any ideas?

    bug 
    opened by kampfhamster309 11
  • Tensorflow directml crashes my python session

    Hi,

    I've recently purchased a 6900 xt GPU which I would like to use with tensorflow. I followed the installation guide on https://docs.microsoft.com/en-us/windows/win32/direct3d12/gpu-tensorflow-windows which worked but the issue I have now is that whenever I try to use tensorflow it closes my python environment.

    I've attached an image to show what I mean. I can import tensorflow fine and it shows me that I have version 1.15.5 available. The problem is when I want to check if my GPU is available I get two messages and then it crashes me out of my python environment.

    Does anybody know how to solve this issue and what is going on?

    Thank you in advance!

    (attached image: amd_tf_problem)

    bug 
    opened by bwintertkb 9
  • C++ DirectML.dll causes crash in debug x64 mode when using NuGet package Microsoft.AI.MachineLearning 1.5.2

    Hello,

    I'm experiencing a runtime crash with the C++ DirectML API in Debug x64 mode after upgrading my NuGet package Microsoft.AI.MachineLearning from version 1.4.0 to 1.5.2. There is no error in Release x64 mode.

    The reason why I'm using this package is because the included DirectML.dll improves DirectML performance greatly. There seems to be an issue when creating a DirectMLOperator. The operator type is DML_OPERATOR_JOIN.

    Can you please help me identify the issue? Also how can I find the latest DirectML.dll file without downloading the package?

    (attached image: DirectML dll error)

    opened by momower1 9
  • Performance will be improved by setting input strides=output strides for Clip in DirectMLX

    I am investigating the performance of MobileNet V2 from TFLite models with "nhwc" layout and MobileNet V2 from ONNX models with "nchw" layout, implemented with the DirectML and DirectMLX APIs.

    I find that the nhwc MobileNetV2 model has lots of Clip operators after Conv2d, and the Clip costs a lot of inference time. I guess that the Clip does a memory copy and hasn't been optimized in the compilation stage.

    I have a workaround for this problem: set Clip's input strides to be the same as its output strides by changing this line to TensorDesc outputTensor = inputTensor in DirectMLX.h. The Clip is then optimized as if it were fused into Conv2d, and the inference time drops significantly, to the same level as nchw MobileNetV2.

    When building the nhwc MobileNetV2 model, we need to append an Identity after each Conv2d to transpose the output tensor from the default nchw to nhwc, then transpose this output tensor from nhwc back to nchw as the next Conv2d's input tensor. In my opinion, the Identity and Reinterpret can be optimized by DML in this model, like: Conv0->Identity(nchw->nhwc)->Reinterpret strides(nhwc->nchw)->Conv1, just like transpose sinking in the OpenVINO backend.

    I guess that the Identity and Reinterpret sinking may be blocked when there is a Clip, like: Conv0->Identity(nchw->nhwc)->Clip->Reinterpret strides(nhwc->nchw)->Conv1. I verified that if I remove the Identity and run Conv0->Reinterpret strides(nchw->nhwc)->Clip(input strides = output strides)->Reinterpret strides(nhwc->nchw)->Conv1, the inference time is much lower than before.

    So in conclusion, I suggest setting Clip's input strides to be the same as its output strides by changing this line to TensorDesc outputTensor = inputTensor in DirectMLX.h.

    opened by mingmingtasd 8
  • TensorFlow & DirectML & ROCm performance and roadmap

    The current DirectML library for GPU is more than 2x slower than the TensorFlow CPU library. When will the DirectML team improve the performance of the library? Could you share a roadmap for DirectML? Will the DirectML team cooperate with the ROCm team (https://github.com/RadeonOpenCompute/ROCm), Intel, and NVIDIA to improve performance?

    opened by YuriyTigiev 8
  • pytorch-directml simple command error

    Just trying a simple command with pytorch-directml 1.8.0a0.dev220224 and getting an error:

    >>> torch.tensor([1], dtype=torch.float32, device='dml')
    
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\tensor.py", line 193, in __repr__
        return torch._tensor_str._str(self)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 383, in _str
        return _str_intern(self)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 358, in _str_intern
        tensor_str = _tensor_str(self, indent)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 242, in _tensor_str
        formatter = _Formatter(get_summarized_data(self) if summarize else self)
      File "D:\DevelopPPP\projects\DeepFakeBox\_internal\python\lib\site-packages\torch\_tensor_str.py", line 90, in __init__
        nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
    RuntimeError: Could not run 'aten::masked_select' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::masked_select' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
    
    CPU: registered at D:\a\_work\1\s\pytorch-directml\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
    BackendSelect: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
    Named: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
    AutogradOther: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCPU: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradCUDA: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradXLA: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradNestedTensor: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse1: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse2: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    AutogradPrivateUse3: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
    Tracer: registered at D:\a\_work\1\s\pytorch-directml\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
    Autocast: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
    Batched: registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
    VmapMode: fallthrough registered at D:\a\_work\1\s\pytorch-directml\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
    

    cpu is fine

    >>> torch.tensor([1], dtype=torch.float32, device='cpu')
    tensor([1.])
    
    pytorch-directml 
    opened by iperov 7
  • Is there any low power mode for DirectML

    Hi, I now have a model that is fast enough (120 fps) and will run at 20 fps; what I need is to use as little GPU power as possible. But I find that the GPU frequency jumps to 1150 MHz too often. By comparison, in Tencent Meeting (https://voovmeeting.com/download-center.html?from=1001), when I enable human segmentation on an 8xxx laptop, the GPU frequency holds below 400 MHz while GPU load is over 75%, which is a strange frequency policy.
    So I guess DirectX 12 or DX11 may have some low power mode? Or is there some other way, for example adding a wait in each op (for example, the convolution op)?

    opened by liyuming1978 7
  • pytorch-directml produce "[W dml_heap_allocator.cc:97] DML allocator out of memory!"

    I was trying to run the simple code below:

    import torch
    import torch_directml
    dml = torch_directml.device()

    print(f"dml={dml}")

    tensor1 = torch.tensor([1])
    print(tensor1)
    tensor1=tensor1.to(dml)

    When running tensor1.to(dml), I got the following error:

    [W dml_heap_allocator.cc:97] DML allocator out of memory!
    Traceback (most recent call last):
      File "/home/fnz/workspace/direct-ml/main.py", line 9, in <module>
        tensor1=tensor1.to(dml)
    RuntimeError: Unknown error -2147024882

    It seems that my pytorch-directml doesn't work at all.

    Below are my packages in conda:

    (direct_ml) [email protected]:~/workspace/direct-ml$ conda list | grep torch
    torch             1.13.1              pypi_0    pypi
    torch-directml    0.1.13.dev221216    pypi_0    pypi

    BTW, my environment is WSL2 on top of Windows 11 Pro.

    The tensorflow directml seems working well.

    any idea ?

    thanks

    Feng

    opened by virtual-feng 1
  • torch-directml : torch.div with trunc rounding on int64 fails with RuntimeError

    Hi, because 'aten::fmod.Tensor_out' is not implemented, I tried to implement it myself. I encountered a new error when using the rounding mode trunc with an int64 tensor.

    Code:

    import torch
    import torch_directml
    dml = torch_directml.device()
    
    a = torch.tensor([1,2,3]).to(dml) #
    b = 2
    a = a - torch.div(a, b, rounding_mode="trunc") * b
    
    opened by Theucalyptus 0
  • Very low validation and testing accuracy on CNN

    Hello everyone. I am facing an issue, and here is what I am trying to do. I have a traffic and road sign dataset that contains 43 classes, and I am trying to classify the images using the pre-trained resnet34 model. I run the model on my AMD RX6600 GPU through PyTorch DirectML.

    Up to this point everything works: training speed is fast enough, GPU utilization is near 100%, and training loss decreases every epoch. But when I check the model against validation data after one training phase, validation loss increases and validation accuracy is too low, even though training looks fine.

    When I run the same code on my friend's PC, which has an NVIDIA GPU, everything is fine: validation loss decreases and converges, and I get an accuracy of 98%. I cannot figure out what the problem is. I also tuned the hyperparameters but had no luck.

    One strange thing is that this problem arises when I use a CNN-based model. I ran the pre-trained NLP model BERT on my AMD GPU and there was no issue: validation loss decreases and converges. Can anyone help me with this issue? I am giving the code below. Thanks in advance.

    (attached image: Screenshot 2023-01-03 221733)

    opened by AtiqurRahmanAni 0
  • Spacy seems outdated + problems running attention...

    Disclaimer: NOT a coder. Generally curious individual with just enough copy-paste and google skills. I may not know what I'm talking about.

    Just playing around with the repo. The install failed because of spacy version in requirements.txt for me. Using python 3.10 on Ubuntu 22.10. Changing Spacy to 3.4.4 (which I had cached, so I just did pip install spacy - to see whichever worked)

    It installed, but gave further warnings like

    ⚠ As of spaCy v3.0, shortcuts like 'en' are deprecated. Please use the full pipeline package name 'en_core_web_sm' instead.
    Collecting en-core-web-sm==3.4.1...

    and

    ⚠ As of spaCy v3.0, shortcuts like 'de' are deprecated. Please use the full pipeline package name 'de_core_news_sm' instead.
    Collecting de-core-news-sm==3.4.0

    opened by Vidyut 0
  • Operator 'aten::amax.out' is not currently supported on the DML backend.

    C:\ProgramData\Anaconda3\envs\torchdml\lib\site-packages\torch\optim\adamax.py:231: UserWarning: The operator 'aten::amax.out' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:16.) torch.amax(norm_buf, 0, keepdim=False, out=exp_inf)
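    For context, the call in the warning takes an element-wise maximum across the first dimension of a stacked buffer. The pure-Python sketch below illustrates that computation under the assumption that `norm_buf` stacks the decayed running norm and the gradient magnitude, as in the usual Adamax update rule. The function names and default values are illustrative, and this is not the PyTorch implementation.

```python
def amax_dim0(rows):
    """Element-wise maximum across rows (the reduction amax performs over dim 0)."""
    return [max(column) for column in zip(*rows)]


def adamax_exp_inf_update(exp_inf, grad, beta2=0.999, eps=1e-8):
    """Sketch of the Adamax infinity-norm update: max(beta2 * exp_inf, |grad| + eps).

    Assumption: this mirrors what the stacked buffer in the warning holds.
    """
    decayed = [beta2 * value for value in exp_inf]
    grad_magnitude = [abs(g) + eps for g in grad]
    return amax_dim0([decayed, grad_magnitude])


if __name__ == "__main__":
    print(adamax_exp_inf_update([1.0, 0.0], [0.5, -2.0], beta2=0.5, eps=0.0))
```

    Because the warning says the operator falls back to the CPU, the result should still be computed; the cost is the performance implication the warning mentions.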

    opened by rmskmr05 0
Releases(tensorflow-directml-1.15.3.dev200626)
Owner
Microsoft
Open source projects and samples from Microsoft
Python ML pipeline that showcases mltrace functionality.

mltrace tutorial Date: October 2021 This tutorial builds a training and testing pipeline for a toy ML prediction problem: to predict whether a passeng

Log Labs 28 Nov 09, 2022
ml4h is a toolkit for machine learning on clinical data of all kinds including genetics, labs, imaging, clinical notes, and more

Broad Institute 65 Dec 20, 2022
Machine Learning for Time-Series with Python. Published by Packt

Machine-Learning-for-Time-Series-with-Python Become proficient in deriving insights from time-series data and analyzing a model’s performance Links Am

Packt 124 Dec 28, 2022
🔬 A curated list of awesome machine learning strategies & tools in financial market.

GeorgeZou 1.6k Dec 30, 2022
Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

Amazon Web Services - Labs 3.3k Jan 03, 2023
A linear regression model for house price prediction

Linear_Regression_Model A linear regression model for house price prediction. This code is using these packages, so please make sure you have install

ShawnWang 1 Nov 29, 2021
Neural Machine Translation (NMT) tutorial with OpenNMT-py

Neural Machine Translation (NMT) tutorial with OpenNMT-py. Data preprocessing, model training, evaluation, and deployment.

Yasmin Moslem 29 Jan 09, 2023
Open MLOps - A Production-focused Open-Source Machine Learning Framework

Open MLOps - A Production-focused Open-Source Machine Learning Framework Open MLOps is a set of open-source tools carefully chosen to ease user experi

Data Revenue 590 Dec 28, 2022
Implementations of Machine Learning models, Regularizers, Optimizers and different Cost functions.

Linear Models Implementations of LinearRegression, LassoRegression and RidgeRegression with appropriate Regularizers and Optimizers. Linear Regression

Keivan Ipchi Hagh 1 Nov 22, 2021
Python Research Framework

EleutherAI 106 Dec 13, 2022
scikit-learn: machine learning in Python

scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under the 3-Clause BSD license. The project was started

neurodata 3 Dec 16, 2022
Built on python (Mathematical straight fit line coordinates error predictor machine learning foundational model)

Sum-Square_Error-Business-Analytical-Tool- Built on python (Mathematical straight fit line coordinates error predictor machine learning foundational m

om Podey 1 Dec 03, 2021
Conducted ANOVA and Logistic regression analysis using matplot library to visualize the result.

Intro-to-Data-Science Conducted ANOVA and Logistic regression analysis. Project ANOVA The main aim of this project is to perform One-Way ANOVA analysi

Chris Yuan 1 Feb 06, 2022
Automatic extraction of relevant features from time series:

tsfresh This repository contains the TSFRESH python package. The abbreviation stands for "Time Series Feature extraction based on scalable hypothesis

Blue Yonder GmbH 7k Jan 06, 2023
Programming assignments and quizzes from all courses within the Machine Learning Engineering for Production (MLOps) specialization offered by deeplearning.ai

Machine Learning Engineering for Production (MLOps) Specialization on Coursera (offered by deeplearning.ai) Programming assignments from all courses i

Aman Chadha 173 Jan 05, 2023
Diabetes Prediction with Logistic Regression

Diabetes Prediction with Logistic Regression Exploratory Data Analysis Data Preprocessing Model & Prediction Model Evaluation Model Validation: Holdou

AZİZE SULTAN PALALI 2 Oct 23, 2021
Production Grade Machine Learning Service

This project is made to help you scale from a basic Machine Learning project for research purposes to a production grade Machine Learning web service

Abdullah Zaiter 10 Apr 04, 2022
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.

pmdarima Pmdarima (originally pyramid-arima, for the anagram of 'py' + 'arima') is a statistical library designed to fill the void in Python's time se

alkaline-ml 1.3k Dec 22, 2022
Iris-Heroku - Putting a Machine Learning Model into Production with Flask and Heroku

Putting a machine learning model into production with Flask and Heroku L

Jesùs Guillen 1 Jun 03, 2022
Machine Learning Course with Python:

A Machine Learning Course with Python Table of Contents Download Free Deep Learning Resource Guide Slack Group Introduction Motivation Machine Learnin

Instill AI 6.9k Jan 03, 2023