This "ahead-of-its-time" technology design from 15 years ago lets the CPU shine in AI inference
2022-07-07 04:27:00 【Intel edge computing community】

Data, computing power, and algorithms: the troika that drives AI.
(That much is common knowledge.)
Today, data and algorithms are developing faster than computing power, yet computing power will be the key factor driving both the industrialization of AI and the adoption of AI across industries.
(Also well known?)
The conventional view holds that GPUs are better suited to accelerating AI. In fact, that is not the whole story.
(Hm? Tell us more.)
According to IDC's "2020-2021 China AI Computing Power Development Assessment Report", inference workloads in AI applications across industries will keep growing over the next few years. IDC even predicts that the server market share for inference workloads will exceed that for training in the near future.
Deep learning, the representative of the third wave of artificial intelligence, is divided into two stages: training and inference.
Compared with training, which requires large amounts of computing power and data, inference needs relatively little data, but it has to respond as quickly as possible and use energy efficiently.

Putting AI into production is mostly about inference: quickly giving the right answer from a small amount of real-world data. For inference at scale, the CPU platform has major advantages: the learning threshold for users is low, and deployment is fast while risk stays low.
Hardware innovation and instruction set upgrades:
tapping the CPU's AI acceleration potential
But where do you start if you want to tap the CPU's potential for accelerating deep learning?
Accelerating AI inference workloads in data center, enterprise, and intelligent edge environments, such as image recognition, object detection, and image segmentation, takes strong computing support.
So the more fundamental question is: where does a CPU's computing power come from?
In fact, a CPU's ability to process data, that is, its computing power, depends on the continuous introduction and improvement of specific acceleration instruction sets and arithmetic units.
As data volumes explode and data types multiply, CPU instruction sets have kept upgrading and evolving so that this data can be processed efficiently, and above all processed in parallel.
Early general-purpose processors generally worked with SISD (single instruction, single data) instructions: within each core, one instruction operates on one piece of data at a time.

"Single" and "one": these words doomed such instructions to poor efficiency once computing demand began to climb. Take the array operations common in image processing, games, and AI computing: under SISD, an array multiplication has to be broken down into 3 separate multiply instructions, even though the multiplications are all the same operation.
So new instructions came into being. Processors began to adopt SIMD (single instruction, multiple data) instructions to improve efficiency; with these, a single instruction operates on multiple pieces of data at once.
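To make the contrast concrete, here is a minimal Python sketch that models (rather than measures) how many multiply instructions an element-wise array multiplication would need under each scheme. The function names and the 4-lane width are illustrative assumptions, not properties of any real instruction set.

```python
def sisd_multiply(a, b):
    """Model SISD: one multiply instruction per pair of elements."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x * y)
        instructions += 1
    return out, instructions


def simd_multiply(a, b, lanes=4):
    """Model SIMD: one instruction multiplies up to `lanes` pairs at once."""
    out, instructions = [], 0
    for i in range(0, len(a), lanes):
        # Each chunk stands in for one vector register's worth of data.
        out.extend(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [0.5] * 8

scalar_result, scalar_count = sisd_multiply(a, b)
vector_result, vector_count = simd_multiply(a, b, lanes=4)

print(scalar_count, vector_count)  # 8 2
```

Both functions return the same products; only the modeled instruction count differs, which is the whole point of SIMD.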
In 1996, Intel introduced the MMX instruction set, its first with support for SIMD instructions, and paired it with 64-bit registers.

It is like opening a wider dedicated lane (the registers) alongside the main road for busy computing traffic. On top of that, Intel added the FMA (fused multiply-add) instruction set, which lets the processor complete the two basic operations of multiplication and addition in one go, doubling efficiency again.
Stretch out the timeline and you can see that from 1999 to 2007 Intel kept upgrading and optimizing its SIMD instruction sets, and then came AVX, which Intel announced in 2008.

With the Sandy Bridge microarchitecture, Intel introduced the new Advanced Vector Extensions (AVX) instruction set, which not only widened vector computing to 256 bits but also added new data-processing enhancements such as data rearrangement.
The Intel Xeon Scalable processor family integrates the AVX-512 instruction set: registers have grown from the original 64 bits to 512 bits, backed by two 512-bit FMA units.
This means an application can perform 32 double-precision or 64 single-precision floating-point operations per clock cycle, or operate on eight 64-bit or sixteen 32-bit integers at once, a major leap in computing power.
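One way to arrive at the figures above is a back-of-the-envelope calculation, sketched here in Python. It assumes that each FMA counts as two floating-point operations per lane (one multiply plus one add) and that both 512-bit FMA units are busy every cycle; the constant and function names are illustrative, and the result is a theoretical per-core peak rather than a measurement.

```python
VECTOR_BITS = 512        # AVX-512 register width, from the text
FMA_UNITS_PER_CORE = 2   # two 512-bit FMA units, from the text
OPS_PER_FMA = 2          # assumption: one multiply plus one add per lane


def lanes(element_bits, vector_bits=VECTOR_BITS):
    """Number of elements of a given width that fit in one vector register."""
    return vector_bits // element_bits


def peak_flops_per_cycle(element_bits, fma_units=FMA_UNITS_PER_CORE):
    """Theoretical floating-point operations per cycle for one core."""
    return lanes(element_bits) * OPS_PER_FMA * fma_units


print(lanes(64), lanes(32))                                # 8 16
print(peak_flops_per_cycle(64), peak_flops_per_cycle(32))  # 32 64
```

The lane counts also explain the integer figures: a 512-bit register holds eight 64-bit or sixteen 32-bit values.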
AVX-512 comes to the Intel Xeon platform,
integrated acceleration for AI deep learning
So much for innovation at the hardware level. Next comes the "real thing": putting it to work in applications.
In 2017, the first-generation Intel Xeon Scalable platform (built around the first-generation Intel Xeon Scalable processors) debuted with Intel AVX-512 technology on board.
AVX-512 uses ultra-wide 512-bit vector operations to boost performance on demanding workloads.
Compared with previous-generation platforms that lacked this technology, the new platform can process more data per clock cycle.

AVX-512 is already a powerful weapon, and integrating the FMA (fused multiply-add) instruction set is a double buff: FMA performs a floating-point multiply-add in a single step with only one rounding, which improves both the speed and the accuracy of floating-point operations.
Adding FMA units further raises the concurrency of vector computing. In the first-generation Xeon Scalable processors, each core in the Platinum series and part of the Gold series has 2 FMA units, while each core in the rest of the Gold series and in the Silver and Bronze series has 1 FMA unit.
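The "round only once" property mentioned above can be illustrated without any special hardware. The Python sketch below emulates a fused multiply-add by computing the exact result with `fractions.Fraction` and rounding to a double a single time, then compares it with the ordinary multiply-then-add, which rounds twice. This is an emulation for illustration only, not how the hardware instruction is implemented, and the input values are chosen purely to expose the difference.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """Multiply and round to a double, then add and round again."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """Emulate FMA: compute a*b + c exactly, then round to a double once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

two_roundings = unfused_multiply_add(a, b, c)
one_rounding = fused_multiply_add(a, b, c)

print(two_roundings)  # 0.0: the product was rounded to 1.0 before the add
print(one_rounding)   # a tiny nonzero value, exactly -(2 ** -60)
```

The exact product is 1 - 2^-60, which is not representable as a double and rounds to 1.0, so the unfused version loses the answer entirely; the single-rounding version keeps it.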

Beyond instruction set innovation, the Intel Xeon series has also built in AI inference acceleration in several other ways.
In 2019, Intel launched the second-generation Intel Xeon Scalable platform, its official move toward integrated AI inference acceleration.
At the time, Intel Deep Learning Boost (DL Boost) technology mainly accelerated INT8 inference on the CPU. With it, the inference performance of second-generation Xeon Scalable processors rose by up to 30 times, making this Intel's first mainstream data-center CPU with integrated AI acceleration.
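The article does not describe how INT8 inference works internally. As a rough illustration of why reduced precision can be acceptable for inference, the Python sketch below applies generic symmetric 8-bit quantization to a dot product and compares it with the floating-point result. The quantization scheme, the function names, and the sample values are all assumptions made for illustration; this is not Intel DL Boost's implementation.

```python
def quantize(values, num_bits=8):
    """Symmetric quantization: map floats to signed integers plus one scale."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    # Round to the nearest integer and clamp into the signed 8-bit range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int8_dot(weights, activations):
    """Dot product using integer multiply-accumulate, rescaled at the end."""
    qw, scale_w = quantize(weights)
    qa, scale_a = quantize(activations)
    acc = sum(w * a for w, a in zip(qw, qa))       # wide integer accumulator
    return acc * scale_w * scale_a


weights = [0.12, -0.53, 0.98, 0.33]
activations = [1.5, -0.25, 0.75, 2.0]

exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)

print(exact, approx)
```

With these sample values the quantized result lands within about one percent of the floating-point one, while every multiply operates on 8-bit integers, so four times as many values fit in a vector register as with 32-bit floats.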

Useful in many places: a technology innovation from more than a decade ago shines across diverse computing needs
At the application level, the AVX-512 instruction set is widely used, including in scientific simulation, financial analysis, AI and deep learning, 3D modeling and analysis, image and audio/video processing, encryption, and data compression.
For example, in video encoding, decoding, transcoding, and similar processing, applications have to perform large-scale repetitive floating-point calculations, which is exactly where the AVX-512 instruction set plays to its strengths.
In a video cloud service scenario, the newer Intel Xeon Platinum 8180 processor, which integrates AVX-512, improves substantially on the older Intel Xeon E5-2699 v4 processor in both single-task latency and full-load throughput.

As the comparison figure shows, the new processor delivers up to a 2x improvement in single-task latency, and at full load its transcoding throughput improves by up to 1.4x to 1.5x.
Looking back, the vector-extension design behind AVX-512, which dates back more than a decade, was somewhat "ahead of its time". One wonders whether the engineers back then had the epiphany that "future generations will look at the present just as we look at the past": that AI would usher in its third wave, that the video age would arrive, and that high-performance computing would gradually enter public view.
With all these complex computing demands, AVX-512 no longer stands for heavy operating pressure. Instead, it accelerates all kinds of applications, and it fits today's situation well, where AI inference loads keep growing and urgently need acceleration, especially AI applications that do not demand very high numerical precision.
Seen this way, technological innovation still runs ahead of demand, and time will verify everything.