Learning NVIDIA cuDNN
2022-07-26 00:02:00 【[the mountains are green and the flowers are burning]】
Preface
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. It provides highly optimized implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. With cuDNN, they can focus on training neural networks and developing software applications instead of spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe2, Chainer, Keras, MATLAB, MxNet, PaddlePaddle, PyTorch, and TensorFlow. For NVIDIA-optimized deep learning framework containers that have cuDNN integrated into the frameworks, visit NVIDIA GPU Cloud (NGC) to learn more and get started.
The performance comparison uses 21.02 NGC containers and sets the throughput of a single DGX-1V server with cuDNN 7.6.5 against that of a DGX-A100 with cuDNN 8.1.1. The results reflect end-to-end performance to convergence.
Downloading and installing cuDNN
Go to the official website (cuDNN Download | NVIDIA Developer), then select the version you need and download and install it.
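After installing, it can be useful to confirm which version actually landed on the machine. The following is a minimal sketch, not an official tool: it assumes a cuDNN 8 style install where the version macros `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` live in `cudnn_version.h`, and it assumes the header sits at `/usr/local/cuda/include/cudnn_version.h`, which depends on how cuDNN was installed. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed header location for a typical Linux install; adjust for your system.
DEFAULT_HEADER = Path("/usr/local/cuda/include/cudnn_version.h")

# Matches lines such as "#define CUDNN_MAJOR 8".
_MACRO = re.compile(
    r"^\s*#\s*define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)",
    re.MULTILINE,
)


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from header text, or None if incomplete."""
    found = {name: int(value) for name, value in _MACRO.findall(header_text)}
    try:
        return (found["MAJOR"], found["MINOR"], found["PATCHLEVEL"])
    except KeyError:
        return None


def installed_cudnn_version(header_path=DEFAULT_HEADER):
    """Read the header from disk; return None when the file does not exist."""
    path = Path(header_path)
    if not path.is_file():
        return None
    return parse_cudnn_version(path.read_text(errors="ignore"))
```

Calling `installed_cudnn_version()` returns a tuple of three integers when the header is found at the assumed path, and `None` otherwise, in which case the path should be pointed at wherever the cuDNN headers were actually installed.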

What's new in cuDNN 8.3
cuDNN 8.3 is optimized for the A100 GPU and can deliver up to 5 times the out-of-the-box performance of the V100 GPU. It also includes APIs for conversational AI and computer vision. The library has been redesigned for ease of use and application integration, while giving developers more flexibility.
Highlights of cuDNN 8.3 include:
- Optimized acceleration for transformer-based models
- Runtime fusion, which quickly compiles kernels using new operators, heuristics, and fusions
- A download package that is 30% smaller
cuDNN 8.3 now ships as six smaller libraries, so it can be integrated into applications at a finer granularity. Developers can download cuDNN directly or pull it from the framework containers on NGC.
See the latest cuDNN release notes for a detailed list of new features and enhancements.
Key features of cuDNN
- Tensor Core acceleration for the commonly used convolutions, including 2D convolution, 3D convolution, grouped convolution, depthwise separable convolution, and dilated convolution, with NHWC and NCHW inputs and outputs
- Kernels optimized for many computer vision and speech models, including ResNet, ResNext, EfficientNet, EfficientDet, SSD, MaskRCNN, Unet, VNet, BERT, GPT-2, Tacotron2, and WaveGlow
- Support for the FP32, FP16, BF16, and TF32 floating-point formats and the INT8 and UINT8 integer formats
- Arbitrary dimension ordering, striding, and sub-regions for 4D tensors, which makes it easy to integrate into any neural network implementation
- Acceleration of fused operations on a variety of CNN architectures
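Two items in the list above, the NCHW and NHWC layouts and dilated convolution, come down to simple index arithmetic. The sketch below is illustrative Python, not the cuDNN API, and the function names are hypothetical. It computes the per-dimension strides of a fully packed 4D tensor in each layout, always reported in n, c, h, w order (the order in which `cudnnSetTensor4dDescriptorEx` takes its stride arguments), along with the usual output-size formula for a convolution with padding, stride, and dilation.

```python
def packed_strides(n, c, h, w, layout="NCHW"):
    """Element strides (n, c, h, w order) of a fully packed 4D tensor."""
    if layout == "NCHW":
        # Width is the fastest-varying dimension, then height, then channels.
        return (c * h * w, h * w, w, 1)
    if layout == "NHWC":
        # Channels are the fastest-varying dimension, then width, then height.
        return (h * w * c, 1, w * c, c)
    raise ValueError(f"unsupported layout: {layout}")


def linear_offset(index, strides):
    """Offset in elements of the entry at index = (n, c, h, w)."""
    return sum(i * s for i, s in zip(index, strides))


def conv_output_size(input_size, filter_size, pad=0, stride=1, dilation=1):
    """Output length along one spatial dimension of a convolution."""
    # A dilated filter covers (filter_size - 1) * dilation + 1 input positions.
    effective_filter = (filter_size - 1) * dilation + 1
    return (input_size + 2 * pad - effective_filter) // stride + 1


nchw = packed_strides(2, 3, 4, 5, "NCHW")  # (60, 20, 5, 1)
nhwc = packed_strides(2, 3, 4, 5, "NHWC")  # (60, 1, 15, 3)
```

The same logical element lands at a different memory offset in each layout, which is why a library that accepts arbitrary strides can work with a framework's tensors without copying them first. For the convolution helper, a 3-tap filter with `dilation=2` spans 5 input positions, so `pad=2` is needed to keep the output the same size as the input.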
cuDNN is supported on Windows and Linux with the Ampere, Turing, Volta, Pascal, Maxwell, and Kepler GPU architectures, on both data center and mobile GPUs.
Frameworks accelerated by cuDNN

cuDNN resources
- NVIDIA Deep Learning SDK documentation
- Webinar: a deep dive into cuDNN 8
- Blog post on programming Tensor Cores in cuDNN
- Related libraries and software:
  - NCCL: fast communication between GPUs
  - cuBLAS: GPU-accelerated BLAS routines
  - DALI: fast data preprocessing for AI
  - NVIDIA GPU Cloud: containers
- Find other cuDNN developers on the NVIDIA Developer Forums