Research on Multi-Model Architectures for ADS Compute Chips
2022-07-01 16:00:00 【Advanced engineering intelligent vehicle】
Author: Nathan J, Chief Architect of Furui Microelectronics' UK R&D Center, based in Cambridge, England. I spent more than ten years at ARM headquarters working on high-performance CPU architecture and AI architecture research.
Over the past decade, deep neural networks (DNNs) have been widely deployed in mobile phones, AR/VR, IoT, and autonomous driving. Complex use cases have given rise to applications built from many DNN models. A VR application, for example, contains many subtasks: object detection to avoid collisions with nearby obstacles, hand and gesture tracking to predict user input, and eye tracking to enable foveated rendering. Each of these subtasks can be handled by a different DNN model. Likewise, an autonomous vehicle uses a series of DNN algorithms to implement its perception functions, with each DNN performing a specific task. However, the layers and operators of different DNN models differ greatly, and even a single DNN model may use heterogeneous operators and layer types.
Moreover, mainstream deep learning frameworks such as PyTorch, TensorFlow, and Caffe still process inference tasks sequentially, with one process per model. As a result, current NPU architectures remain focused on accelerating and optimizing a single DNN task, which falls far short of the performance requirements of multi-DNN applications. A new underlying NPU compute architecture that accelerates and optimizes multi-model workloads is urgently needed. Reconfigurable NPUs can adapt to the diversity of neural network layers, but they need additional hardware resources (for example switching units, interconnect, and control modules), and reconfiguring the hardware between layers adds power consumption.

Developing an NPU that supports multi-model workloads faces several challenges: the diversity of DNN workloads increases NPU design complexity; dependencies between multiple DNNs make scheduling among them difficult; and balancing reconfigurability against customization becomes harder. Such an NPU design also introduces additional performance criteria: the latency caused by data sharing between multiple DNN models, and how to allocate resources effectively among those models.
Current design research falls roughly into three directions: parallel execution of multiple DNN models, redesigning the NPU architecture to efficiently support diverse DNN models, and optimizing scheduling strategies.

DNN parallelism and scheduling strategies:
Parallel strategies such as time-division multiplexing and spatial co-location can be used. Scheduling algorithms can be classified along three axes: static vs. dynamic scheduling, temporal vs. spatial scheduling, and software-based vs. hardware-based scheduling.
Time-division multiplexing: an upgraded version of the traditional priority-based preemption strategy that allows inter-DNN pipelining, improving the utilization of system resources (PEs, memory, and so on). It concentrates on optimizing the scheduling algorithm, and its advantage is that it requires few changes to the NPU hardware.
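To make the utilization argument concrete, the following Python sketch models a single NPU as a two-stage pipeline (a memory/DMA unit followed by a PE array) and compares running two DNNs back to back with interleaving their layers round-robin. The cycle counts, the `Layer` class, and the rule that a layer may only begin loading after the previous layer of the same DNN has finished computing are simplifying assumptions for illustration, not details from the article.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class Layer:
    dnn: str      # which model this layer belongs to
    load: int     # hypothetical cycles on the memory/DMA unit
    compute: int  # hypothetical cycles on the PE array


def simulate(order):
    """Run layers in the given order on a two-stage pipeline (memory, then PEs).

    Assumption: a layer may start loading only after the previous layer of the
    same DNN has finished computing, so one DNN cannot overlap with itself.
    Returns (makespan, finish time per DNN).
    """
    mem_free = 0  # time the memory unit becomes idle
    pe_free = 0   # time the PE array becomes idle
    last_done = {}
    for layer in order:
        load_start = max(mem_free, last_done.get(layer.dnn, 0))
        mem_free = load_start + layer.load
        compute_start = max(mem_free, pe_free)
        pe_free = compute_start + layer.compute
        last_done[layer.dnn] = pe_free
    return pe_free, last_done


def round_robin(*dnns):
    """Interleave layers of several DNNs: A1, B1, A2, B2, ..."""
    return [layer for group in zip_longest(*dnns) for layer in group if layer is not None]


dnn_a = [Layer("A", 2, 3), Layer("A", 2, 3)]
dnn_b = [Layer("B", 2, 3), Layer("B", 2, 3)]

sequential, _ = simulate(dnn_a + dnn_b)
interleaved, _ = simulate(round_robin(dnn_a, dnn_b))
print(sequential, interleaved)  # 18 14
```

In this toy model the interleaved order finishes in 14 cycles instead of 18, because the memory unit loads one DNN's layer while the PE array computes the other's. Only the issue order changes, not the simulated hardware, which is consistent with the point that time-division multiplexing needs few hardware modifications.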
Spatial co-location: focuses on executing multiple DNN models in parallel, with different DNN models occupying different portions of the NPU's hardware resources. It requires anticipating, at the NPU design stage, each DNN's network characteristics and priority, and assigning predefined portions of the NPU's hardware units to specific DNNs. This allocation can be made dynamically while the selected DNNs are running, or statically. Static allocation relies on the hardware scheduler, with little software intervention. Spatial co-location can deliver larger system-level performance gains, but requires relatively large hardware changes.
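A minimal sketch of the allocation step in spatial co-location: a fixed pool of PEs is split among concurrently running DNNs in proportion to a per-DNN weight (for example, priority multiplied by expected workload), using largest-remainder rounding so the integer shares add up exactly. The PE count, the weighting scheme, and the DNN names are hypothetical; the article does not specify an allocation formula.

```python
def partition_pes(total_pes, weights):
    """Split total_pes among DNNs proportionally to their weights.

    weights: dict mapping DNN name -> positive weight (hypothetical, e.g.
    priority * expected MACs). Uses largest-remainder rounding so that the
    integer shares always sum to total_pes.
    """
    total_w = sum(weights.values())
    exact = {name: total_pes * w / total_w for name, w in weights.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    leftover = total_pes - sum(alloc.values())
    # Give the remaining PEs to the DNNs with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


print(partition_pes(64, {"detect": 3, "track": 2, "eye": 1}))
# {'detect': 32, 'track': 21, 'eye': 11}
```

Calling this once at design time corresponds to static allocation; calling it again whenever the set of active DNNs changes corresponds to dynamic allocation. A real partitioner would also have to respect the hardware's allocation granularity (for example whole PE clusters) and guarantee each DNN a minimum share.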
Dynamic vs. static scheduling: the choice between the two depends on the specific goals of the user's use case.
Dynamic scheduling is more flexible: it reallocates resources according to the actual needs of the running DNN tasks. It relies mainly on time-division multiplexing, or on a dynamically composable engine (which requires adding a dynamic scheduler to the hardware). Most algorithms adopt a preemptive strategy or the early-eviction algorithm of AI-MT.
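The preemptive flavor of dynamic scheduling can be sketched as a priority scheduler that makes decisions at layer boundaries: whenever a layer finishes, the highest-priority DNN that is ready runs next. This is a generic illustration with assumed priorities, arrival times, and per-layer cycle counts; it is not AI-MT's early-eviction algorithm, and a real NPU may preempt at a finer or coarser granularity than whole layers.

```python
import heapq


def preemptive_schedule(jobs):
    """Priority scheduling with preemption at layer boundaries.

    jobs: list of (name, priority, arrival_time, [layer_cycles, ...]);
    a lower priority value means more urgent. A running layer is never
    interrupted, but a newly arrived urgent DNN can take over at the next
    layer boundary. Returns a dict name -> completion time.
    """
    pending = sorted(jobs, key=lambda job: job[2])  # by arrival time
    ready = []  # heap of (priority, arrival, name, remaining_layers)
    finish = {}
    t, i = 0, 0
    while i < len(pending) or ready:
        # Admit every DNN that has arrived by the current time.
        while i < len(pending) and pending[i][2] <= t:
            name, prio, arrival, layers = pending[i]
            heapq.heappush(ready, (prio, arrival, name, list(layers)))
            i += 1
        if not ready:
            t = pending[i][2]  # NPU idles until the next arrival
            continue
        prio, arrival, name, layers = heapq.heappop(ready)
        t += layers.pop(0)  # run exactly one layer to completion
        if layers:
            heapq.heappush(ready, (prio, arrival, name, layers))
        else:
            finish[name] = t
    return finish


jobs = [("lane_keep", 2, 0, [4, 4, 4]), ("emergency_brake", 0, 5, [2, 2])]
print(preemptive_schedule(jobs))  # {'emergency_brake': 12, 'lane_keep': 16}
```

The hypothetical emergency-braking model waits from cycle 5 to cycle 8 because the lane-keeping layer already in flight is not interrupted; it then runs to completion before lane keeping resumes. Under plain first-come-first-served order it would finish at cycle 16 instead of 12.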
A customized static scheduling strategy can improve NPU performance further. Here, specific hardware modules are customized at the NPU design stage to handle specific neural network layers or operations. This strategy offers high performance, but also requires relatively large hardware changes.
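A customized static scheduler fixes the mapping from operator type to hardware module when the NPU is designed, so the runtime only replays a precomputed plan. The sketch below illustrates that idea and its cost in flexibility: an operator that was not anticipated at design time cannot be placed. The module names and the operator table are hypothetical.

```python
# Hypothetical design-time table: operator type -> dedicated hardware module.
STATIC_DISPATCH = {
    "conv2d": "conv_engine",
    "depthwise_conv2d": "dw_engine",
    "matmul": "gemm_engine",
    "relu": "vector_unit",
    "softmax": "vector_unit",
}


def build_static_plan(ops):
    """Map a model's operators (in execution order) onto fixed modules.

    Returns (plan, unsupported): plan lists (index, op, module) for operators
    the hardware was customized for; unsupported lists (index, op) that would
    need some fallback path outside the customized modules.
    """
    plan, unsupported = [], []
    for idx, op in enumerate(ops):
        module = STATIC_DISPATCH.get(op)
        if module is None:
            unsupported.append((idx, op))
        else:
            plan.append((idx, op, module))
    return plan, unsupported


plan, unsupported = build_static_plan(["conv2d", "relu", "layernorm", "matmul"])
print(unsupported)  # [(2, 'layernorm')]
```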

Heterogeneous NPU architecture:
A heterogeneous design combines dynamic reconfiguration with customized static scheduling: the NPU contains multiple sub-accelerators, each targeting a specific type of neural network layer or operation. The scheduler can then dispatch the layers of multiple DNN models to the sub-accelerators that suit them, and can also run layers from different DNN models concurrently on multiple sub-accelerators. This avoids the extra hardware cost of a reconfigurable architecture while improving flexibility across different layer types.
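The scheduling idea can be sketched as a greedy list scheduler: layers of each DNN run in order, but at every step the scheduler picks, across all DNNs, the (layer, sub-accelerator) pair that would finish earliest, so layers from different models can occupy different sub-accelerators at the same time. The latency table, sub-accelerator names, and layer types are hypothetical cost-model inputs, and earliest-finish-time greedy is just one simple heuristic, not a method described in the article.

```python
def schedule_hetero(dnns, latency, accels):
    """Greedy earliest-finish-time mapping of multi-DNN layers to sub-accelerators.

    dnns: dict DNN name -> list of layer types, executed in order.
    latency: dict (layer_type, accel) -> cycles; a missing key means the
             sub-accelerator cannot run that layer type.
    Returns a trace of (dnn, layer_type, accel, start, end).
    """
    accel_free = {a: 0 for a in accels}  # when each sub-accelerator is idle
    dnn_ready = {d: 0 for d in dnns}     # when each DNN's previous layer ends
    next_idx = {d: 0 for d in dnns}      # next layer to schedule per DNN
    trace = []
    for _ in range(sum(len(layers) for layers in dnns.values())):
        best = None
        for d, layers in dnns.items():
            if next_idx[d] == len(layers):
                continue  # this DNN is fully scheduled
            lt = layers[next_idx[d]]
            for a in accels:
                if (lt, a) not in latency:
                    continue
                start = max(accel_free[a], dnn_ready[d])
                end = start + latency[(lt, a)]
                if best is None or end < best[0]:
                    best = (end, start, d, a, lt)
        if best is None:
            raise ValueError("a layer type has no capable sub-accelerator")
        end, start, d, a, lt = best
        accel_free[a] = dnn_ready[d] = end
        next_idx[d] += 1
        trace.append((d, lt, a, start, end))
    return trace


latency = {("conv", "conv_acc"): 4, ("conv", "fc_acc"): 10,
           ("fc", "fc_acc"): 2, ("fc", "conv_acc"): 6}
dnns = {"detect": ["conv", "conv", "fc"], "eye": ["fc", "conv"]}
for row in schedule_hetero(dnns, latency, ["conv_acc", "fc_acc"]):
    print(row)
```

In this toy run the eye-tracking model's fully connected layer and the detector's first convolution both start at cycle 0 on different sub-accelerators, and the whole workload finishes at cycle 12.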
Research on heterogeneous NPU architectures can focus on three main questions:
1) How to design multiple sub-accelerators according to the characteristics of different network layers;
2) How to distribute resources among the sub-accelerators;
3) How to schedule each network layer onto a suitable sub-accelerator while meeting memory constraints.
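Question 3 can be illustrated with a simple per-layer placement rule: among the sub-accelerators that support a layer type and whose on-chip buffer can hold the layer's working set, pick the one with the lowest latency; layers that fit nowhere are flagged so they can be tiled or split. Buffer sizes, footprints, latencies, and names are hypothetical, and a real scheduler would combine this placement check with load balancing across sub-accelerators.

```python
def place_with_memory_limit(layers, accels):
    """Place each layer on the fastest sub-accelerator whose buffer can hold it.

    layers: list of (layer_name, layer_type, footprint_kb).
    accels: dict accel -> {"buffer_kb": int, "latency": {layer_type: cycles}}.
    Returns (placement, needs_tiling).
    """
    placement, needs_tiling = {}, []
    for name, layer_type, footprint_kb in layers:
        candidates = [
            (spec["latency"][layer_type], accel)
            for accel, spec in accels.items()
            if layer_type in spec["latency"] and footprint_kb <= spec["buffer_kb"]
        ]
        if candidates:
            placement[name] = min(candidates)[1]  # lowest latency wins
        else:
            needs_tiling.append(name)  # too large (or unsupported) everywhere
    return placement, needs_tiling


accels = {
    "big_acc": {"buffer_kb": 512, "latency": {"conv": 6, "fc": 5}},
    "fast_acc": {"buffer_kb": 128, "latency": {"conv": 3}},
}
layers = [("det.conv1", "conv", 96), ("det.conv5", "conv", 300),
          ("det.fc", "fc", 200), ("seg.conv9", "conv", 900)]
placement, needs_tiling = place_with_memory_limit(layers, accels)
print(placement)     # {'det.conv1': 'fast_acc', 'det.conv5': 'big_acc', 'det.fc': 'big_acc'}
print(needs_tiling)  # ['seg.conv9']
```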