NVIDIA cuDNN Learning
2022-07-26 00:02:00 · the mountains are green and the flowers are burning
Preface
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. It provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
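To make "forward convolution" concrete, here is a minimal CPU reference sketch in C++ (not cuDNN code) of what a forward-convolution call computes on an NCHW tensor with zero padding, stride, and dilation. It assumes the cross-correlation convention (the filter is not flipped), which corresponds to cuDNN's `CUDNN_CROSS_CORRELATION` mode. The helper names `convOutDim` and `convForwardNCHW` are hypothetical. cuDNN produces the same result with tuned GPU algorithms; a naive reference like this is mainly useful for checking outputs in tests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Output size along one spatial dimension; the same formula the cuDNN docs give
// for cudnnGetConvolution2dForwardOutputDim (integer division rounds down).
int convOutDim(int in, int filter, int pad, int stride, int dilation) {
    assert(stride > 0 && dilation > 0);
    const int effectiveFilter = (filter - 1) * dilation + 1;
    return 1 + (in + 2 * pad - effectiveFilter) / stride;
}

// Naive CPU reference for a forward 2D convolution on NCHW tensors.
// The filter is not flipped (cross-correlation), as in CUDNN_CROSS_CORRELATION mode.
// x: [N][C][H][W], filt: [K][C][R][S]; returns y: [N][K][P][Q].
std::vector<float> convForwardNCHW(const std::vector<float>& x, int N, int C, int H, int W,
                                   const std::vector<float>& filt, int K, int R, int S,
                                   int pad, int stride, int dilation) {
    const int P = convOutDim(H, R, pad, stride, dilation);
    const int Q = convOutDim(W, S, pad, stride, dilation);
    std::vector<float> y(static_cast<std::size_t>(N) * K * P * Q, 0.0f);
    for (int n = 0; n < N; ++n)
        for (int k = 0; k < K; ++k)
            for (int p = 0; p < P; ++p)
                for (int q = 0; q < Q; ++q) {
                    float acc = 0.0f;
                    for (int c = 0; c < C; ++c)
                        for (int r = 0; r < R; ++r)
                            for (int s = 0; s < S; ++s) {
                                const int ih = p * stride - pad + r * dilation;
                                const int iw = q * stride - pad + s * dilation;
                                // Out-of-bounds taps read implicit zero padding.
                                if (ih < 0 || ih >= H || iw < 0 || iw >= W) continue;
                                acc += x[((n * C + c) * H + ih) * W + iw] *
                                       filt[((k * C + c) * R + r) * S + s];
                            }
                    y[((n * K + k) * P + p) * Q + q] = acc;
                }
    return y;
}
```

For example, a 3x3 input holding 1 to 9 convolved with a 2x2 all-ones filter (no padding, stride 1) gives the 2x2 output {12, 16, 24, 28}, and `convOutDim(224, 7, 3, 2, 1)` gives 112, the usual output size of a ResNet stem convolution.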
Deep learning researchers and framework developers worldwide rely on cuDNN for high-performance GPU acceleration. With cuDNN, they can focus on training neural networks and developing software applications instead of spending time on low-level GPU performance tuning. cuDNN accelerates widely used deep learning frameworks, including Caffe2, Chainer, Keras, MATLAB, MxNet, PaddlePaddle, PyTorch, and TensorFlow. To get NVIDIA-optimized deep learning framework containers with cuDNN already integrated, visit NVIDIA GPU Cloud (NGC) to learn more and get started.
NVIDIA's comparison, run in 21.02 NGC containers, measures the throughput of a single DGX-1V server with cuDNN 7.6.5 against a DGX-A100 with cuDNN 8.1.1, as end-to-end performance when training to convergence.
cuDNN download and installation
Go to the official download page ("cuDNN Download | NVIDIA Developer"), select the version you need, then download and install it.

What's new in cuDNN 8.3
cuDNN 8.3 is optimized for A100 GPUs, delivering up to 5x the out-of-the-box performance of V100 GPUs, and includes APIs for conversational AI and computer vision. It has been redesigned for ease of use and application integration, while giving developers more flexibility.
Highlights of cuDNN 8.3 include:
- Optimized acceleration for transformer-based models
- Runtime fusion, which quickly compiles kernels with new operators, heuristics, and fusions
- A 30% smaller download package
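Runtime fusion and fused operations pay off by avoiding extra trips through memory. As a conceptual CPU illustration only (not how cuDNN's fusion engine generates GPU kernels), the sketch below contrasts a bias add followed by a separate ReLU pass with a single fused pass over a convolution output laid out as [K][spatial]. The function names are hypothetical. For a fixed fused pattern of this kind, cuDNN also offers `cudnnConvolutionBiasActivationForward`, which combines convolution, bias, and activation in one call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// y is laid out as [K][spatial] (one image, NCHW with H*W flattened); bias has K entries.

// Unfused: two passes over y, so every element is read and written twice.
void biasThenReluUnfused(std::vector<float>& y, const std::vector<float>& bias,
                         int K, int spatial) {
    assert(y.size() == static_cast<std::size_t>(K) * spatial);
    assert(bias.size() == static_cast<std::size_t>(K));
    for (int k = 0; k < K; ++k)
        for (int i = 0; i < spatial; ++i)
            y[k * spatial + i] += bias[k];
    for (float& v : y)
        v = std::max(v, 0.0f);
}

// Fused: one pass; each element is read once, updated, and written once.
void biasReluFused(std::vector<float>& y, const std::vector<float>& bias,
                   int K, int spatial) {
    assert(y.size() == static_cast<std::size_t>(K) * spatial);
    assert(bias.size() == static_cast<std::size_t>(K));
    for (int k = 0; k < K; ++k)
        for (int i = 0; i < spatial; ++i) {
            float& v = y[k * spatial + i];
            v = std::max(v + bias[k], 0.0f);
        }
}
```

Both functions produce identical results; the fused version simply halves the number of passes over `y`, which is where the savings come from on memory-bound GPU workloads.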
cuDNN 8.3 is now available as six smaller libraries, allowing more granular integration into applications. Developers can download cuDNN directly or extract it from the framework containers on NGC.
Read the latest cuDNN release notes for a detailed list of new features and enhancements.
Key features of cuDNN
- Tensor Core acceleration for all popular convolutions, including 2D, 3D, grouped, depthwise separable, and dilated convolutions with NHWC and NCHW inputs and outputs
- Optimized kernels for many computer vision and speech models, including ResNet, ResNext, EfficientNet, EfficientDet, SSD, MaskRCNN, Unet, VNet, BERT, GPT-2, Tacotron2, and WaveGlow
- Support for FP32, FP16, BF16, and TF32 floating-point formats and INT8 and UINT8 integer formats
- Arbitrary dimension ordering, striding, and sub-regions for 4D tensors, so cuDNN integrates easily into any neural network implementation
- Accelerated fused operations on any CNN architecture
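The NHWC/NCHW support and the "arbitrary dimension ordering and striding" feature both come down to how a 4D index maps to a memory offset. A minimal C++ sketch, assuming fully packed tensors: the helpers below (hypothetical names) compute element strides in the {n, c, h, w} order that `cudnnSetTensor4dDescriptorEx` takes as its `nStride`, `cStride`, `hStride`, `wStride` arguments.

```cpp
#include <cassert>
#include <cstddef>

// Element strides for a 4D tensor, ordered {n, c, h, w} to match the
// nStride, cStride, hStride, wStride arguments of cudnnSetTensor4dDescriptorEx.
struct Strides4d {
    std::ptrdiff_t n, c, h, w;
};

// Packed NCHW: w varies fastest, then h, then c, then n.
Strides4d packedNCHW(int C, int H, int W) {
    assert(C > 0 && H > 0 && W > 0);
    return {static_cast<std::ptrdiff_t>(C) * H * W, static_cast<std::ptrdiff_t>(H) * W, W, 1};
}

// Packed NHWC: c varies fastest, then w, then h, then n.
Strides4d packedNHWC(int C, int H, int W) {
    assert(C > 0 && H > 0 && W > 0);
    return {static_cast<std::ptrdiff_t>(H) * W * C, 1, static_cast<std::ptrdiff_t>(W) * C, C};
}

// Linear element offset of (n, c, h, w) for any stride set, packed or not.
std::ptrdiff_t offsetOf(const Strides4d& s, int n, int c, int h, int w) {
    return n * s.n + c * s.c + h * s.h + w * s.w;
}
```

For C=3, H=4, W=5, the NCHW strides are {60, 20, 5, 1} and the NHWC strides are {60, 1, 15, 3}; the batch size does not affect either. A sub-region (for example, a spatial crop) keeps the parent's strides, shrinks the extents, and starts at `offsetOf(parent, n0, c0, h0, w0)` from the parent's base pointer.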
cuDNN is supported on Windows and Linux for the Ampere, Turing, Volta, Pascal, Maxwell, and Kepler GPU architectures, in both data center and mobile GPUs.
Frameworks accelerated by cuDNN

cuDNN resources
- NVIDIA Deep Learning SDK documentation
- Webinar: a deeper look at cuDNN 8
- Blog: programming Tensor Cores in cuDNN
- Related libraries and software:
  - NCCL: fast communication between GPUs
  - cuBLAS: GPU-accelerated BLAS routines
  - DALI: fast AI data preprocessing
  - NVIDIA GPU Cloud (NGC): containers
- Connect with other cuDNN developers on the NVIDIA Developer Forums