Research on Multi-Model Architectures for ADS Compute Chips
2022-07-01 16:00:00 【Advanced engineering intelligent vehicle】
Author: Nathan J, chief architect at the UK R&D center of Furui Microelectronics, based in Cambridge, England. He previously spent more than ten years at ARM headquarters working on high-performance CPU architecture and AI architecture research.
Over the past decade, deep neural networks (DNNs) have come into wide use in areas such as mobile phones, AR/VR, IoT, and autonomous driving. Complex use cases have given rise to applications that run many DNN models at once. A VR application, for example, contains many subtasks: avoiding collisions with nearby obstacles through object detection, predicting input through hand or gesture tracking, and performing foveated rendering through eye tracking. Each of these subtasks can be handled by a different DNN model. An autonomous vehicle likewise uses a series of DNN algorithms to implement perception, with each DNN responsible for one specific task. The layers and operators of different DNN models vary widely, and even a single DNN model may use heterogeneous operators and layer types.
In addition, mainstream deep learning frameworks such as Torch, TensorFlow, and Caffe still handle inference tasks sequentially, with one process per model. As a result, current NPU architectures still focus on accelerating and optimizing a single DNN task. This falls far short of the performance requirements of multi-DNN applications, and a new underlying NPU computing architecture is urgently needed to accelerate and optimize multi-model workloads. A reconfigurable NPU can adapt to the diversity of neural network layers, but it needs additional hardware resources to do so (for example switching units, interconnect, and control modules), and reconfiguring for each network layer consumes extra power.

Developing an NPU that supports multi-model workloads faces many challenges: the diversity of DNN workloads increases the complexity of NPU design; dependencies among multiple DNNs make scheduling across DNNs difficult; and balancing reconfigurability against customization becomes harder. This kind of NPU design also introduces additional performance criteria: the latency caused by data sharing among multiple DNN models, and how effectively resources are allocated among them.
Current design research can be roughly divided into the following directions: parallel execution of multiple DNN models, redesigning the NPU architecture to support the diversity of DNN models efficiently, and optimizing the scheduling strategy.

DNN parallelism and scheduling strategies:
Parallelism can be achieved through strategies such as time-division multiplexing and spatial co-location. Scheduling algorithms can be roughly classified along three axes: static versus dynamic scheduling, temporal versus spatial scheduling, and software-based versus hardware-based scheduling.
Time-division multiplexing is an upgraded version of the traditional priority-preemption strategy. It allows pipelined execution across DNNs (inter-DNN pipelining) to raise the utilization of system resources such as PEs and memory. This strategy focuses on optimizing the scheduling algorithm, and its advantage is that it requires few changes to the NPU hardware.
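As a rough illustration of time-division multiplexing, the sketch below interleaves the layers of several DNNs on one shared NPU in round-robin order. It is a minimal model under stated assumptions, not any particular scheduler from the literature: the function name `time_multiplex`, the per-layer latency lists, and the round-robin policy are all hypothetical choices made for this example.

```python
from collections import deque


def time_multiplex(models):
    """Interleave layers of several DNNs on one shared NPU, round-robin.

    `models` maps a model name to a list of per-layer latencies in ms
    (a hypothetical cost model). Returns a timeline of
    (model, layer_index, start_ms, end_ms) tuples.
    """
    # Each queue entry is (model name, index of the next layer to run).
    # Models without layers are skipped.
    queue = deque((name, 0) for name, layers in models.items() if layers)
    clock = 0.0
    timeline = []
    while queue:
        name, idx = queue.popleft()
        latency = models[name][idx]
        timeline.append((name, idx, clock, clock + latency))
        clock += latency
        # Re-queue the model at the back if it still has layers left,
        # so the other models get the NPU before its next layer runs.
        if idx + 1 < len(models[name]):
            queue.append((name, idx + 1))
    return timeline
```

Calling `time_multiplex({"det": [2.0, 1.0], "track": [1.0]})` runs the first detection layer, then the tracking layer, then the second detection layer. Only the software schedule changes; the modeled hardware is a single undivided NPU, which matches the point that this strategy needs few hardware changes.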
Spatial co-location focuses on executing multiple DNN models in parallel: different DNN models occupy different parts of the NPU's hardware resources. It requires that the characteristics and priorities of each DNN be anticipated at the NPU design stage, so that predefined portions of the NPU hardware units can be assigned to specific DNNs. The assignment can be made dynamically while the DNNs are running, or statically. Static allocation relies on a hardware scheduler and needs little software intervention. The advantage of spatial co-location is that it improves system performance more effectively, but it requires relatively large hardware changes.
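A static spatial partition can be sketched as a proportional split of the processing elements (PEs) among the co-located DNNs. The example below is only an illustration under assumptions: `partition_pes`, the `demands` weights (which could encode compute load, priority, or both), and the largest-remainder rounding rule are hypothetical and do not describe a specific NPU.

```python
def partition_pes(total_pes, demands):
    """Split `total_pes` processing elements among DNNs in proportion to
    `demands`, a dict of model name -> non-negative weight.

    Uses largest-remainder rounding so the shares always sum to total_pes.
    """
    total = sum(demands.values())
    if total <= 0:
        raise ValueError("demands must sum to a positive value")
    # Ideal (fractional) share of PEs for each model.
    exact = {name: total_pes * d / total for name, d in demands.items()}
    # Start from the rounded-down share.
    shares = {name: int(x) for name, x in exact.items()}
    leftover = total_pes - sum(shares.values())
    # Give the remaining PEs to the models with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for name in by_fraction[:leftover]:
        shares[name] += 1
    return shares
```

For instance, `partition_pes(16, {"detect": 3, "track": 1})` assigns 12 PEs to detection and 4 to tracking. A dynamic variant would call the same function again whenever the set of running DNNs or their weights change.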
Dynamic and static scheduling: the choice between dynamic and static scheduling depends on the specific goals of the use case.
Dynamic scheduling is more flexible: resources are reallocated according to the actual needs of the DNN tasks at run time. It relies mainly on time-division multiplexing, or on a dynamically composable engine (which requires adding a dynamic scheduler to the hardware). Most algorithms adopt a preemptive strategy or the early eviction algorithm of AI-MT.
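The preemptive flavor of dynamic scheduling can be sketched as a priority queue in which a newly arrived, more urgent DNN takes over the NPU at the next layer boundary. This is a simplified illustration and not the AI-MT algorithm: the task format, the function name `preemptive_schedule`, and the choice of layer boundaries as preemption points are assumptions made for this example. Task names are assumed to be unique and every task is assumed to have at least one layer.

```python
import heapq


def preemptive_schedule(tasks):
    """Layer-granularity priority preemption on a single NPU.

    `tasks` is a list of dicts with keys "name", "arrival" (ms),
    "priority" (lower value = more urgent) and "layers" (list of
    per-layer latencies in ms). Returns (name, layer_index, start, end).
    """
    pending = sorted(tasks, key=lambda t: t["arrival"])
    ready = []  # heap of (priority, arrival, name, next_layer, layers)
    clock, i, timeline = 0.0, 0, []
    while i < len(pending) or ready:
        # Admit every task that has arrived by the current time.
        while i < len(pending) and pending[i]["arrival"] <= clock:
            t = pending[i]
            heapq.heappush(ready, (t["priority"], t["arrival"], t["name"], 0, t["layers"]))
            i += 1
        if not ready:
            # NPU is idle: jump ahead to the next arrival.
            clock = pending[i]["arrival"]
            continue
        # Run one layer of the most urgent ready task.
        prio, arrival, name, idx, layers = heapq.heappop(ready)
        end = clock + layers[idx]
        timeline.append((name, idx, clock, end))
        clock = end
        # Put the task back if it still has layers; a more urgent task
        # that arrived meanwhile will be picked first (preemption).
        if idx + 1 < len(layers):
            heapq.heappush(ready, (prio, arrival, name, idx + 1, layers))
    return timeline
```

With a low-priority three-layer task arriving at time 0 and a high-priority two-layer task arriving at time 1, the first task runs one layer, yields to the urgent task for both of its layers, and then resumes.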
A customized static scheduling strategy can improve NPU performance further. Here, specific hardware modules are customized at the NPU design stage to handle particular neural network layers or particular operations. This strategy delivers high performance, but it requires relatively large hardware changes.

Heterogeneous NPU architecture:
A heterogeneous NPU combines dynamic reconfiguration with a customized static scheduling strategy. Multiple sub-accelerators are designed into the NPU, each targeting a specific type of neural network layer or a specific network operation. The scheduler can then map the layers of multiple DNN models onto the appropriate sub-accelerators, and can also schedule layers from different DNN models to run concurrently on multiple sub-accelerators. This avoids the extra hardware cost of a reconfigurable architecture while still improving flexibility in handling different network layers.
Research and design of a heterogeneous NPU architecture can be approached from three main angles:
1) How to design multiple kinds of sub-accelerators according to the characteristics of different network layers;
2) How to distribute resources among the different sub-accelerators;
3) How to schedule each network layer onto an appropriate sub-accelerator whose memory limit it satisfies.
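The third question above can be illustrated with a greedy sketch that places each layer on the sub-accelerator that finishes it earliest, among those whose memory can hold it. This is an assumption-laden toy, not the scheduler of any published heterogeneous accelerator: `SubAccelerator`, `schedule_layers`, the per-layer-type latency table, and the greedy earliest-finish rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubAccelerator:
    name: str
    memory_kb: int
    latency_ms: dict  # layer type -> estimated latency (hypothetical cost model)
    busy_until: float = 0.0


def schedule_layers(layers, accelerators):
    """Greedily map layers to sub-accelerators.

    `layers` is a list of (model, layer_type, memory_kb) tuples, listed in
    execution order within each model. Layers of one model run in order;
    layers of different models may overlap on different sub-accelerators.
    Returns (model, layer_type, accelerator, start_ms, end_ms) tuples.
    """
    model_ready = {}  # model -> time at which its previous layer finished
    plan = []
    for model, layer_type, mem_kb in layers:
        best = None
        for acc in accelerators:
            # Skip sub-accelerators that cannot hold or cannot run this layer.
            if mem_kb > acc.memory_kb or layer_type not in acc.latency_ms:
                continue
            start = max(acc.busy_until, model_ready.get(model, 0.0))
            finish = start + acc.latency_ms[layer_type]
            if best is None or finish < best[0]:
                best = (finish, start, acc)
        if best is None:
            raise ValueError(f"no sub-accelerator fits layer {layer_type} of {model}")
        finish, start, acc = best
        acc.busy_until = finish
        model_ready[model] = finish
        plan.append((model, layer_type, acc.name, start, finish))
    return plan
```

In this model, a convolution layer too large for a small fully connected engine is forced onto the convolution engine, while a fully connected layer from another model can run at the same time on the other sub-accelerator. A real design would also have to account for data movement between sub-accelerators, which this sketch ignores.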
References:
[1] Stylianos I. Venieris et al., "Multi-DNN Accelerators for Next-Generation AI Systems"
https://arxiv.org/pdf/2205.09376.pdf
[2] Hyoukjun K., Liangzhen L., et al., "Heterogeneous Dataflow Accelerators for Multi-DNN Workloads"
https://arxiv.org/abs/1909.07437