Research on Multi-Model Architectures for ADS Computing Chips
2022-07-01 16:00:00 【Advanced engineering intelligent vehicle】
Author: Nathan J, chief architect of the Furui Microelectronics UK R&D center, based in Cambridge, England. He previously spent more than ten years at ARM headquarters working on high-performance CPU architecture and AI architecture research.
Over the past decade, deep neural networks (DNNs) have been widely deployed in mobile phones, AR/VR, IoT, and autonomous driving. Complex use cases have given rise to multi-DNN applications. A VR application, for example, contains many subtasks: avoiding collisions with nearby obstacles through object detection, predicting input by tracking hands or gestures, and performing foveated rendering through eye tracking. Each of these subtasks can be handled by a different DNN model. Autonomous vehicles likewise use a set of DNN algorithms to implement perception, with each DNN completing a specific task. The network layers and operators of these models differ widely, however, and even a single DNN model may use heterogeneous operators and types.
In addition, mainstream deep learning frameworks such as Torch, TensorFlow, and Caffe still process inference tasks sequentially, with one process per model. As a result, current NPU architectures still focus on accelerating and optimizing a single DNN task, which falls far short of the performance requirements of multi-DNN applications. New NPU computing architectures are urgently needed at the hardware level to accelerate and optimize multi-model workloads. Reconfigurable NPUs can adapt to the diversity of neural network layers, but they need additional hardware resources (for example, switching units, interconnects, and control modules), and reconfiguring for each network layer also costs extra power.
Developing an NPU that supports multi-model workloads faces many challenges: the diversity of DNN workloads increases the complexity of NPU design; dependencies between multiple DNNs make scheduling across them difficult; and balancing reconfigurability against customization becomes more challenging. This kind of NPU design also introduces additional performance criteria: the latency caused by data sharing between multiple DNN models, and how effectively resources are allocated among them.
Current design research falls roughly into three directions: parallel execution of multiple DNN models, redesigning the NPU architecture to support the diversity of DNN models efficiently, and optimizing scheduling strategies.
DNN parallelism and scheduling strategies:
Parallelism strategies include time-division multiplexing and spatial co-location. Scheduling algorithms can be classified along three axes: static versus dynamic scheduling, temporal versus spatial scheduling, and software-based versus hardware-based scheduling.
Time-division multiplexing is an upgraded version of the traditional priority preemption strategy. It allows inter-DNN pipelining in order to improve the utilization of system resources (PEs, memory, and so on). This strategy focuses on optimizing the scheduling algorithm, and its advantage is that it requires few changes to the NPU hardware.
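To make the idea concrete, here is a minimal Python sketch of time-division multiplexing on a single shared NPU. It is an illustration under stated assumptions, not a scheduler from the referenced papers: the `Job` class, the `time_multiplex` function, the rule that preemption happens only at layer boundaries, and the latency numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int        # larger value = more urgent (assumption)
    arrival: int         # time at which the inference request arrives
    layers: list         # per-layer latency, in arbitrary time units
    next_layer: int = 0  # index of the next layer to execute


def time_multiplex(jobs):
    """Interleave layers of several DNNs on one NPU.

    At every layer boundary the highest-priority job that has already
    arrived runs its next layer, so an urgent model can preempt a
    long-running one without waiting for it to finish.
    """
    t = 0
    timeline = []  # entries: (start, end, job name, layer index)
    while True:
        unfinished = [j for j in jobs if j.next_layer < len(j.layers)]
        if not unfinished:
            return timeline
        ready = [j for j in unfinished if j.arrival <= t]
        if not ready:
            # The NPU idles until the next request arrives.
            t = min(j.arrival for j in unfinished)
            continue
        job = max(ready, key=lambda j: j.priority)
        duration = job.layers[job.next_layer]
        timeline.append((t, t + duration, job.name, job.next_layer))
        t += duration
        job.next_layer += 1


# Hypothetical workload: a low-priority detector and an urgent tracker.
jobs = [
    Job("detect", priority=1, arrival=0, layers=[2, 2, 2]),
    Job("track", priority=2, arrival=3, layers=[1, 1]),
]
timeline = time_multiplex(jobs)
for entry in timeline:
    print(entry)
```

In this example the tracker arrives while the detector's second layer is running; it takes over the NPU at the next layer boundary, and the detector's last layer runs only after the tracker has finished. Only the scheduling logic changes, which matches the point that this strategy needs few hardware changes.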
Spatial co-location focuses on executing multiple DNN models in parallel, so that different DNN models occupy different parts of the NPU's hardware resources. It requires the designer to anticipate, at the NPU design stage, the characteristics and priorities of each DNN network, and to assign predefined parts of the NPU hardware to specific DNNs. The assignment can be made dynamically while the DNNs are running, or statically. Static allocation depends on the hardware scheduler and needs less software intervention. The advantage of spatial co-location is that it improves system performance more, but it requires relatively large hardware changes.
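A static spatial partition can be sketched as follows. This is only an illustration: the function name `partition_pes`, the idea of weighting each model by a relative compute demand, and the numbers are assumptions, and a real design would also have to partition memory and interconnect bandwidth.

```python
def partition_pes(total_pes, demand):
    """Split the PE array among models in proportion to their demand.

    Uses integer largest-remainder rounding so that every PE is
    assigned and the shares always add up to total_pes.
    """
    total = sum(demand.values())
    share = {m: total_pes * d // total for m, d in demand.items()}
    remainder = {m: total_pes * d % total for m, d in demand.items()}
    leftover = total_pes - sum(share.values())
    # Hand the remaining PEs to the models with the largest remainders.
    for m in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        share[m] += 1
    return share


# Hypothetical relative compute demand of three co-located models.
demand = {"detect": 4, "track": 2, "gaze": 1}
shares = partition_pes(64, demand)
print(shares)
```

Each model then runs on its own slice of the array at the same time as the others, instead of taking turns on the whole array.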
Dynamic scheduling and static scheduling: the choice between the two depends on the specific goals of the user's use case.
Dynamic scheduling is more flexible: resources are reallocated according to the actual needs of the running DNN tasks. It mainly relies on time-division multiplexing or on dynamically composable engines (which require adding a dynamic scheduler to the hardware). Most algorithms choose a preemptive strategy or AI-MT's early-eviction algorithm.
A customized static scheduling strategy can better improve NPU performance. Under this strategy, specific hardware modules are customized at the NPU design stage to handle specific neural network layers or specific operations. It delivers high performance, but requires relatively large hardware changes.
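The flexibility difference between the two approaches can be shown with a toy simulation. Everything here is a simplifying assumption for illustration: each PE completes one unit of work per epoch, `static_alloc` uses a fixed half-and-half split decided in advance, and `dynamic_alloc` re-divides the PEs every epoch in proportion to the current backlog. It does not model preemption cost or the performance gain of specialized hardware.

```python
def run(epochs, total_pes, allocate):
    """Simulate per-epoch PE allocation and return the final backlog."""
    backlog = {}
    for arrivals in epochs:
        for model, work in arrivals.items():
            backlog[model] = backlog.get(model, 0) + work
        alloc = allocate(backlog, total_pes)
        for model in backlog:
            backlog[model] = max(0, backlog[model] - alloc.get(model, 0))
    return backlog


def static_alloc(backlog, total_pes):
    # Fixed split chosen at design time (assumption: half each).
    return {"detect": total_pes // 2, "track": total_pes // 2}


def dynamic_alloc(backlog, total_pes):
    # Re-divide the PEs in proportion to the work currently queued.
    # Floor division may leave a few PEs idle in an epoch.
    total = sum(backlog.values())
    if total == 0:
        return {}
    return {m: total_pes * b // total for m, b in backlog.items()}


# Hypothetical bursty workload: first only detection, then only tracking.
epochs = [{"detect": 8, "track": 0}, {"detect": 0, "track": 8}]
static_left = run(epochs, 8, static_alloc)
dynamic_left = run(epochs, 8, dynamic_alloc)
print("static:", static_left)
print("dynamic:", dynamic_left)
```

With this bursty workload the fixed split leaves half of the PEs idle in the first epoch and ends with unfinished tracking work, while the dynamic policy clears both queues.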
Heterogeneous NPU architecture:
A heterogeneous NPU combines dynamic reconfiguration with a customized static scheduling strategy: multiple sub-accelerators are designed into the NPU, each targeting a specific neural network layer or specific network operations. The scheduler can then run the network layers of multiple DNN models on the appropriate sub-accelerators, and can also schedule layers from different DNN models to run concurrently on multiple sub-accelerators. This avoids the additional hardware cost of a reconfigurable architecture while improving flexibility in handling different network layers.
Research and design of heterogeneous NPU architectures can be considered mainly from three aspects:
1) How to design multiple kinds of sub-accelerators according to the characteristics of different network layers;
2) How to distribute resources among the different sub-accelerators;
3) How to schedule each network layer, subject to memory limits, onto the appropriate sub-accelerator.
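The third question can be illustrated with a small greedy scheduler. This is a sketch under assumptions, not the algorithm from the referenced papers: the earliest-finish-time heuristic, the `schedule` function, the sub-accelerator names, and all latency and memory figures are hypothetical.

```python
def schedule(models, accels):
    """Greedily map layers to sub-accelerators.

    models: {model: [(layer_type, mem_kb), ...]} in execution order
    accels: {accel: {"mem_kb": int, "latency": {layer_type: time}}}
    Returns a list of (model, layer_index, accel, start, end).
    """
    accel_free = {a: 0 for a in accels}   # when each sub-accelerator is idle
    model_ready = {m: 0 for m in models}  # when each model's previous layer ends
    next_idx = {m: 0 for m in models}
    remaining = sum(len(layers) for layers in models.values())
    plan = []
    while remaining:
        best = None
        for m, layers in models.items():
            i = next_idx[m]
            if i == len(layers):
                continue
            kind, mem = layers[i]
            for a, spec in accels.items():
                # Skip sub-accelerators that cannot hold or run this layer.
                if mem > spec["mem_kb"] or kind not in spec["latency"]:
                    continue
                start = max(accel_free[a], model_ready[m])
                end = start + spec["latency"][kind]
                if best is None or end < best[4]:
                    best = (m, i, a, start, end)
        if best is None:
            raise ValueError("a layer fits on no sub-accelerator")
        m, i, a, start, end = best
        accel_free[a] = end
        model_ready[m] = end
        next_idx[m] += 1
        remaining -= 1
        plan.append(best)
    return plan


# Hypothetical sub-accelerators, each fast on one layer type.
accels = {
    "conv_engine": {"mem_kb": 512, "latency": {"conv": 2, "fc": 6}},
    "fc_engine": {"mem_kb": 256, "latency": {"conv": 5, "fc": 1}},
}
models = {
    "detect": [("conv", 400), ("fc", 100)],
    "track": [("conv", 200), ("fc", 100)],
}
plan = schedule(models, accels)
for step in plan:
    print(step)
```

In the resulting plan the tracker's convolution layer runs on `conv_engine` while the detector's fully connected layer runs on `fc_engine`, so layers from different models overlap in time. The detector's first layer is never placed on `fc_engine` because it exceeds that engine's memory limit.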
References:
[1] Stylianos I. Venieris et al., "Multi-DNN Accelerators for Next-Generation AI Systems"
https://arxiv.org/pdf/2205.09376.pdf
[2] Hyoukjun K., Liangzhen L., et al., "Heterogeneous Dataflow Accelerators for Multi-DNN Workloads"
https://arxiv.org/abs/1909.07437