Literature Reading (245): Roller
2022-07-28 13:10:00 【tiaozhanzhe1900】
- Title: Roller: Fast and Efficient Tensor Compilation for Deep Learning
- Year: 2022
- Conference: OSDI
- Institution: Microsoft
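The main motivation of this paper is that the search-based exploration performed by current DNN compilers takes a long time, especially on hardware platforms other than NVIDIA, such as AMD GPUs and Graphcore IPUs. The paper therefore takes a different approach and generates kernels by construction rather than by search. The basic concepts are:
- rTile: the lowest level of abstraction. It is a data tile, but one whose shape matches the basic sizes of the hardware's compute and memory-access units.
- rProgram: an rTile-based program consisting of load, store, and compute steps, sized to saturate a single SM on a GPU.
- kernel: constructed from the rProgram.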

- rTile is a new tile abstraction that encapsulates tensor shapes that align with the key features of the underlying accelerator, thus achieving efficient execution by limiting the shape choices.
- rProgram: adopts a recursive rTile-based construction algorithm to gradually increase the size of the rTile shape to construct an rProgram that saturates a single execution unit of the accelerator (e.g., an SM, a streaming multi-processor in a NVIDIA GPU)
- kernel: performs the scale-out process, which simply replicates the resulting rProgram to other parallel execution units
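
The scale-up and scale-out steps described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the algorithm from the paper: the `Tile` class, the `grow_tile` and `scale_out` functions, the footprint formula, and the reuse score are all hypothetical names and simplifications introduced here for a matrix multiplication, and the capacity is counted in elements rather than bytes. The paper's own construction relies on accelerator-specific features that this example does not model.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Tile:
    """Hypothetical matmul tile (m, n, k) for C[m, n] += A[m, k] @ B[k, n]."""
    m: int
    n: int
    k: int

    def footprint(self) -> int:
        # Elements resident in fast memory: A tile + B tile + C tile.
        return self.m * self.k + self.k * self.n + self.m * self.n

    def reuse(self) -> float:
        # Multiply-accumulates per resident element (higher is better).
        return (self.m * self.n * self.k) / self.footprint()


def grow_tile(base: Tile, limit: Tile, capacity: int) -> Tile:
    """Scale-up: greedily enlarge an aligned tile until nothing larger fits.

    Every step adds one alignment unit (the base size) to a single axis,
    so all dimensions stay multiples of the base tile's dimensions.
    """
    tile = base
    while True:
        candidates = [
            Tile(tile.m + base.m, tile.n, tile.k),
            Tile(tile.m, tile.n + base.n, tile.k),
            Tile(tile.m, tile.n, tile.k + base.k),
        ]
        feasible = [
            c for c in candidates
            if c.footprint() <= capacity
            and c.m <= limit.m and c.n <= limit.n and c.k <= limit.k
        ]
        if not feasible:
            return tile
        # Assumed heuristic: pick the step with the best data reuse.
        tile = max(feasible, key=Tile.reuse)


def scale_out(problem: Tile, tile: Tile, num_units: int) -> list[int]:
    """Scale-out: count the output tiles assigned to each execution unit."""
    total = ceil(problem.m / tile.m) * ceil(problem.n / tile.n)
    return [total // num_units + (1 if u < total % num_units else 0)
            for u in range(num_units)]


if __name__ == "__main__":
    base = Tile(16, 16, 16)            # assumed alignment unit
    problem = Tile(1024, 1024, 1024)   # full matmul size
    tile = grow_tile(base, problem, capacity=8192)  # capacity in elements
    print(tile, scale_out(problem, tile, num_units=80))
```

The design point the sketch tries to convey is that the candidate set at each step is tiny (one aligned increment per axis), so the tile is built in a handful of steps instead of being found by searching a large schedule space.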
