This "ahead of its time" technology design from 15 years ago lets CPUs shine in AI inference
2022-07-07 04:27:00 【Intel edge computing community】
Data, computing power, and algorithms: the troika that drives AI.
That much is common knowledge.
Today, data and algorithms are developing faster than computing power, yet computing power will be the key factor driving both the industrialization of AI and the adoption of AI across industries.
That is well known too, isn't it?
The conventional view is that GPUs are better suited to accelerating artificial intelligence. In fact, that is not necessarily so.
Hm? Let's take a closer look.
According to IDC's "2020-2021 China AI Computing Power Development Assessment Report", inference workloads in AI applications across industries will keep growing over the next few years. IDC even predicts that the server market share for inference workloads will exceed that for training in the near future.
Deep learning, the representative technology of the third wave of artificial intelligence, is divided into two stages: training and inference.
Compared with training, which requires a great deal of computing power and data, inference needs relatively little data, but it must respond as quickly as possible and be energy efficient.
Putting AI into production is mostly about inference: giving the right answer quickly from a small amount of real-world data. For large-scale inference, CPU platforms have notable advantages: a low learning threshold for users, fast deployment, and low risk.
Hardware innovation and instruction set upgrades: tapping the CPU's AI acceleration potential
So where do you start if you want to tap the CPU's potential for deep learning acceleration?
Accelerating AI inference workloads in data center, enterprise, and intelligent edge computing environments, such as image recognition, object detection, and image segmentation, requires strong computing support.
The more fundamental question is then: where does a CPU's computing power come from?
In fact, a CPU's ability to process data, that is, its computing power, depends on the continual introduction and improvement of specific acceleration instruction sets and arithmetic units.
As data volumes explode and data types multiply, CPU instruction sets have kept upgrading and evolving in order to process this data efficiently, and especially to process it in parallel.
Early general-purpose processors generally worked with SISD (single instruction, single data) instructions: in each core, one instruction operates on one piece of data at a time.
"Single" and "one": these words doomed such instructions to poor efficiency as computing demand rose. Array operations are common in image processing, games, and AI computation, and under SISD an array multiplication has to be broken down into three separate operation instructions, even though the multiplications are all the same.
New instructions emerged in response. Processors began to introduce SIMD (single instruction, multiple data) instructions, which let one instruction operate on multiple pieces of data at a time.
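To make the contrast concrete, here is a minimal sketch that emulates both models in plain Python. It is an illustration only: Python does not issue real SIMD instructions, and the function names `scalar_multiply` and `simd_multiply`, as well as the lane width of 4, are assumptions made for this example. It simply counts how many "instructions" each model would need to multiply two three-element arrays.

```python
# Illustrative emulation only: counts the "instructions" each model would issue.
# The function names and the 4-lane width are assumptions, not a real ISA.

def scalar_multiply(a, b):
    """SISD style: one multiply instruction per pair of elements."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x * y)
        instructions += 1
    return out, instructions


def simd_multiply(a, b, lanes=4):
    """SIMD style: one instruction multiplies up to `lanes` pairs at once."""
    out, instructions = [], 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        out.extend(x * y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return out, instructions


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
scalar_result, scalar_count = scalar_multiply(a, b)
simd_result, simd_count = simd_multiply(a, b)
print("SISD:", scalar_result, "instructions:", scalar_count)
print("SIMD:", simd_result, "instructions:", simd_count)
```

Both paths produce the same products; only the instruction count differs, which is the point of widening the data path.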
In 1996, Intel launched the MMX instruction set, its first to support SIMD instructions, and equipped it with dedicated 64-bit registers.
In other words, alongside the main road, a wider dedicated lane (the registers) was opened for busy computing traffic. Beyond that, Intel also added the FMA (fused multiply-add) instruction set, which lets the processor complete the two basic operations of multiplication and addition in one go, doubling efficiency again.
Stretch out the timeline and you can see that Intel kept upgrading and optimizing its SIMD instruction sets from 1999 to 2007, and by 2007 AVX was born.
Intel introduced the new Advanced Vector Extensions (AVX) instruction set in its Sandy Bridge microarchitecture. It not only widened vector computation to 256 bits but also added new data-processing enhancements such as data rearrangement.
The Intel Xeon Scalable processor family integrates the AVX-512 instruction set, which widens the registers from the original 64 bits to 512 bits and provides two 512-bit FMA units.
This means an application can perform 32 double-precision or 64 single-precision floating-point operations at the same time, or operate on eight 64-bit or sixteen 32-bit integers, a substantial increase in computing power.
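One way to see where these figures come from is the short calculation below. It is a sketch under stated assumptions: each FMA is counted as two floating-point operations (one multiply, one add) per lane, both FMA units are busy in the same cycle, and the helper names `lanes` and `flops_per_cycle` are made up for this example.

```python
# Back-of-the-envelope arithmetic for a 512-bit vector unit.
# Assumptions: 2 FMA units per core, and one FMA = 2 floating-point operations.

REGISTER_BITS = 512
FMA_UNITS = 2        # two 512-bit FMA units, as described above
FLOPS_PER_FMA = 2    # a fused multiply-add counts as one multiply plus one add


def lanes(element_bits):
    """Number of elements of the given width that fit in one register."""
    return REGISTER_BITS // element_bits


def flops_per_cycle(element_bits):
    """Peak floating-point operations per cycle under the assumptions above."""
    return lanes(element_bits) * FMA_UNITS * FLOPS_PER_FMA


double_lanes = lanes(64)             # 64-bit elements per register
single_lanes = lanes(32)             # 32-bit elements per register
double_flops = flops_per_cycle(64)   # double precision
single_flops = flops_per_cycle(32)   # single precision
print(double_lanes, single_lanes, double_flops, single_flops)
```

Under these assumptions a register holds eight 64-bit or sixteen 32-bit elements, and the peak works out to 32 double-precision or 64 single-precision operations per cycle, matching the numbers quoted above.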
AVX-512 comes to the Intel Xeon platform: built-in acceleration for AI deep learning
So much for the course of innovation at the hardware level. Next comes the "real thing": application in practice.
In 2017, the first-generation Intel Xeon Scalable platform (built around the first-generation Intel Xeon Scalable processors) debuted, with Intel AVX-512 technology on board.
AVX-512 improves the performance of demanding workloads through its ultra-wide 512-bit vector operations.
Compared with previous-generation platforms that did not integrate the technology, the new platform can process more data per clock cycle.
AVX-512 is already a powerful weapon, and the integration of the FMA (fused multiply-add) instruction set is a double buff: FMA performs a floating-point multiply and add in a single step with only one rounding, which improves both the speed and the accuracy of floating-point operations.
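The accuracy benefit of rounding only once can be illustrated with a small emulation. This is a sketch, not how the hardware works: the fused result is imitated by computing the exact value with Python's `fractions.Fraction` and rounding it to a float once, and the function names are hypothetical. The inputs are chosen so that the intermediate product cannot be represented exactly in double precision.

```python
from fractions import Fraction

# Emulation only: real FMA is done in hardware. Function names are hypothetical.


def unfused_multiply_add(a, b, c):
    """Multiply, round to double, then add and round again (two roundings)."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """Compute a*b + c exactly, then round to double once (one rounding)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the unfused path loses the small term before the addition happens.
unfused = unfused_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)
print("unfused:", unfused)
print("fused:  ", fused)
```

The unfused path returns zero because the tiny term is rounded away after the multiplication, while the single-rounding path keeps it.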
Adding FMA units further increases the parallelism of vector computation. In the first-generation Xeon Scalable processors, each core in the Platinum series and some of the Gold series has two FMA units, while each core in the rest of the Gold series and in the Silver and Bronze series has one.
Beyond instruction set innovation, the Intel Xeon series has also built in AI inference acceleration in several other ways.
In 2019, Intel launched the second-generation Intel Xeon Scalable platform, which marked Intel's formal move toward integrated AI inference acceleration.
At that time, Intel Deep Learning Boost (DL Boost) technology mainly accelerated INT8 inference on the CPU. With it, the inference performance of the second-generation Xeon improved by up to 30 times, making it Intel's first mainstream data-center CPU with integrated AI acceleration.
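For readers unfamiliar with INT8 inference, the sketch below shows the general idea in plain Python: floating-point values are mapped to 8-bit integer codes, the multiply-accumulate runs on integers, and the result is scaled back. It is a generic illustration that assumes simple symmetric per-tensor quantization; it does not reproduce Intel's implementation or the DL Boost instructions, and the names `quantize` and `int8_dot` are hypothetical.

```python
# Generic INT8 quantization sketch. Assumes symmetric per-tensor scaling;
# this is not Intel's implementation, and the function names are hypothetical.


def quantize(values, scale):
    """Map floats to integer codes clamped to the int8 range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int8_dot(weights, activations):
    """Approximate a float dot product with integer multiply-accumulate."""
    # Fall back to a scale of 1.0 if a vector is all zeros.
    w_scale = max(abs(w) for w in weights) / 127 or 1.0
    a_scale = max(abs(a) for a in activations) / 127 or 1.0
    qw = quantize(weights, w_scale)
    qa = quantize(activations, a_scale)
    # Products of two int8 codes are summed in a wider integer accumulator.
    acc = sum(w * a for w, a in zip(qw, qa))
    # Scale the integer result back to the floating-point domain.
    return acc * w_scale * a_scale


weights = [0.5, -1.0, 0.25, 0.75]
activations = [2.0, 1.0, -4.0, 0.5]
exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print("float32-style result:", exact)
print("int8 approximation:  ", approx)
```

The integer result is close to, but not identical with, the floating-point one, which is why this approach suits workloads that can tolerate a small loss of precision.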
Useful in many places: a technology innovation from more than ten years ago shines across diverse computing needs
At the application level, the AVX-512 instruction set is widely used, in areas including scientific simulation, financial analysis, AI and deep learning, 3D modeling and analysis, image and audio/video processing, encryption, and data compression.
For example, video encoding, decoding, and transcoding require applications to perform large amounts of repetitive floating-point computation, which plays to the strengths of the AVX-512 instruction set.
In a video cloud service scenario, the newer Intel Xeon Platinum 8180 processor, which integrates AVX-512, shows a large improvement over the older Intel Xeon E5-2699 v4 processor in both single-task latency and full-load throughput.
As the figure shows, the new processor delivers up to a 2x performance improvement in single-task latency, and in full-load throughput, transcoding performance rises by up to 1.4 to 1.5 times.
Looking back, the appearance of AVX-512 a dozen years ago was somewhat "ahead of its time". One wonders whether the engineers of the day had an epiphany along the lines of "later generations will look on the present as we now look on the past", foreseeing that AI would enter its third wave, that the video era would rise, and that high-performance computing would gradually come into public view.
All these complex computing demands mean that AVX-512 no longer stands for heavy operating pressure. Instead, it accelerates all kinds of applications. It is well suited to the present moment, when AI inference loads keep growing and urgently need acceleration, and especially to AI applications whose precision requirements are not so high.
Seen this way, technological innovation still runs ahead of demand, and time will verify everything.