A design that looked "ahead of its time" 15 years ago lets the CPU shine in AI inference
2022-07-07 04:27:00 【Intel edge computing community】
Data, computing power, and algorithms: the troika that drives AI.
That much is common knowledge.
Today, data and algorithms are advancing faster than computing power, yet computing power will be the key factor in both the industrialization of AI and the adoption of AI across industries.
Is that well known too?
The conventional view holds that GPUs are better suited to accelerating AI. In fact, that is not necessarily so.
Hm? Let's take a closer look.
According to IDC's "2020-2021 China AI Computing Power Development Assessment Report", inference workloads in AI applications across industries will keep growing over the next few years. IDC even predicts that the server market share for inference workloads will surpass that for training in the near future.
Deep learning, the hallmark of the third wave of artificial intelligence, has two stages: training and inference.
Compared with training, which requires large amounts of computing power and data, inference needs relatively little data but must respond as quickly as possible and with good energy efficiency.
Putting AI into production is mostly about inference: giving the right answer quickly from a small amount of real-world data. For large-scale inference, the CPU platform has notable advantages: the learning curve for users is low, and deployment is fast and low-risk.
Hardware innovation and instruction set upgrades: tapping the CPU's AI acceleration potential
But if we want to tap the CPU's potential for accelerating deep learning, where do we start?
Accelerating AI inference workloads in data centers, enterprises, and intelligent edge environments, such as image recognition, object detection, and image segmentation, requires strong computing support.
So the more fundamental question is: where does a CPU's computing power come from?
In fact, a CPU's ability to process data, that is, its computing power, depends on the continual introduction and improvement of specific acceleration instruction sets and arithmetic units.
As data volumes explode and data types multiply, CPU instruction sets keep being upgraded and evolved to process this data efficiently, and above all to process it in parallel.
Early general-purpose processors generally worked with SISD (single instruction, single data) instructions: within each core, one instruction operates on one piece of data at a time.
The words "single" and "one" doom such instructions to inefficiency as computing demand climbs. Array operations are common in image processing, games, and AI computation, and under SISD an array multiplication has to be broken down into three separate instructions, even though the multiplications are all the same operation.
New instructions emerged in response: processors began to adopt SIMD (single instruction, multiple data) instructions, which let one instruction operate on multiple pieces of data at once.
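The difference between the two models can be sketched in a few lines of Python. This is only a conceptual model, not real vector code: the function names and the `lanes` parameter are illustrative assumptions, and each loop iteration stands in for one issued instruction.

```python
def sisd_multiply(a, b):
    """Model SISD: one multiply instruction per pair of elements."""
    products, instructions = [], 0
    for x, y in zip(a, b):
        products.append(x * y)
        instructions += 1
    return products, instructions


def simd_multiply(a, b, lanes):
    """Model SIMD: one multiply instruction per group of `lanes` element pairs."""
    products, instructions = [], 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        products.extend(x * y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return products, instructions


a, b = [1, 2, 3], [4, 5, 6]
print(sisd_multiply(a, b))           # ([4, 10, 18], 3)
print(simd_multiply(a, b, lanes=4))  # ([4, 10, 18], 1)
```

Both functions return the same products; only the modeled instruction count differs, which is the efficiency gain SIMD is after.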
In 1996, Intel launched the MMX instruction set, its first to support SIMD, and equipped it with special 64-bit registers.
That amounted to opening a wider dedicated lane (the registers) alongside the main road for heavy computing traffic. Beyond that, Intel also added the FMA (fused multiply-add) instruction set, which lets the processor complete a multiplication and an addition in a single operation, doubling efficiency again.
Zooming out on the timeline, Intel kept upgrading and optimizing its SIMD instruction sets from 1999 to 2007, work that culminated in the birth of AVX.
Intel introduced the new Advanced Vector Extensions (AVX) instruction set in its Sandy Bridge microarchitecture. It not only widened vector computation to 256 bits but also added new data-processing enhancements such as data rearrangement.
The Intel Xeon Scalable processor family integrates the AVX-512 instruction set: registers have grown from the original 64 bits to 512 bits, backed by up to two 512-bit FMA units per core.
This means an application can execute 32 double-precision or 64 single-precision floating-point operations per clock cycle, or operate on eight 64-bit or sixteen 32-bit integers in one 512-bit vector, a major gain in computing power.
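The figures above follow from simple arithmetic, sketched here in Python. The helper names are hypothetical, and the sketch assumes each FMA lane counts as two floating-point operations (one multiply and one add) per cycle.

```python
def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one vector register."""
    return register_bits // element_bits


def flops_per_cycle(register_bits, element_bits, fma_units):
    """Peak floating-point operations per cycle for one core.

    Each FMA lane counts as two operations: one multiply and one add.
    """
    return lanes(register_bits, element_bits) * fma_units * 2


print(lanes(512, 64), lanes(512, 32))  # 8 16
print(flops_per_cycle(512, 64, 2))     # 32
print(flops_per_cycle(512, 32, 2))     # 64
```

With a single FMA unit per core the same formula gives half of these peak values.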
AVX-512 comes to the Intel Xeon platform: integrated acceleration for AI deep learning
With the hardware-level innovations covered, the next step is real-world application.
In 2017, the first-generation Intel Xeon Scalable platform (built around the first-generation Intel Xeon Scalable processors) debuted with Intel AVX-512 technology on board.
With its ultra-wide 512-bit vector operations, AVX-512 boosts the performance of demanding workloads.
Compared with previous-generation platforms that lacked this technology, the new platform can process more data per clock cycle.
AVX-512 is already a powerful weapon, and the integration of the FMA (fused multiply-add) instruction set is a double buff: FMA performs a floating-point multiply and add in a single step with only one rounding, which improves both the speed and the accuracy of floating-point operations.
Adding FMA units further raises the concurrency of vector computation. In the first-generation Xeon Scalable processors, each core in the Platinum series and in some Gold series parts has two FMA units, while each core in the remaining Gold parts and in the Silver and Bronze series has one.
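Why a single rounding improves accuracy can be illustrated with a small Python model. This is a sketch, not hardware FMA: `fused_multiply_add` is a hypothetical helper that emulates the fused behavior with exact rational arithmetic and one final rounding, and the input values are chosen purely to expose the difference.

```python
from fractions import Fraction


def multiply_then_add(a, b, c):
    """Unfused: the product is rounded to a double, then the sum is rounded again."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """Model of FMA: compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# The exact product is 1 - 2**-60, which a double cannot represent.
print(multiply_then_add(a, b, c))   # 0.0 (the small term is lost when the product is rounded)
print(fused_multiply_add(a, b, c))  # -2**-60, about -8.67e-19
```

In the unfused version the intermediate product rounds to exactly 1.0, so the final result collapses to zero; the fused model keeps the small term because rounding happens only once, at the end.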
Beyond instruction set innovation, the Intel Xeon series has also integrated AI inference acceleration in other ways.
In 2019, Intel launched the second-generation Intel Xeon Scalable platform, the company's formal move into integrated AI inference acceleration.
At that time, Intel Deep Learning Boost (DL Boost) technology mainly accelerated INT8 inference on the CPU. With it, the second-generation Xeon improved inference performance by up to 30 times, making it Intel's first mainstream data-center CPU with integrated AI acceleration.
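To give a sense of what INT8 inference involves, here is a minimal Python sketch of a quantized dot product. It assumes a generic symmetric quantization scheme chosen for illustration; it is not the DL Boost implementation, and all function names are hypothetical.

```python
def choose_scale(values):
    """Pick a scale so the largest magnitude maps to 127 (1.0 if all values are zero)."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak else 1.0


def quantize(values, scale):
    """Map floats to signed 8-bit integers using a symmetric scale."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(a, b):
    """Approximate the dot product of a and b using 8-bit integer operands."""
    scale_a, scale_b = choose_scale(a), choose_scale(b)
    a_q, b_q = quantize(a, scale_a), quantize(b, scale_b)
    # Integer multiply-accumulate; hardware would keep this sum in a wider accumulator.
    acc = sum(x * y for x, y in zip(a_q, b_q))
    # Convert the integer result back to the original floating-point range.
    return acc * scale_a * scale_b


weights = [0.5, -1.0, 0.25]
inputs = [1.0, 0.5, -0.5]
exact = sum(w * x for w, x in zip(weights, inputs))
print(exact, int8_dot(weights, inputs))  # the two values should be close, not identical
```

The small gap between the two printed values is the price of 8-bit precision, which many inference workloads can tolerate in exchange for packing more elements into each vector instruction.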
Useful in many places: a technical innovation from more than a decade ago shines across diverse computing needs
At the application level, the AVX-512 instruction set is widely used in scientific simulation, financial analytics, AI deep learning, 3D modeling and analysis, image and audio/video processing, encryption, data compression, and more.
For example, video encoding, decoding, and transcoding require applications to perform large volumes of repetitive floating-point calculations, which is where AVX-512 excels.
In a video cloud service scenario, the newer Intel Xeon Platinum 8180 processor with AVX-512 substantially outperforms the older Intel Xeon E5-2699 v4 processor in both single-task latency and full-load throughput.
As shown in the figure above, the new processor delivers up to a 2x improvement in single-task latency, and transcoding performance at full-load throughput improves by up to 1.4x to 1.5x.
Looking back, the arrival of AVX-512 more than a decade ago was somewhat "ahead of its time". One wonders whether its engineers had an epiphany along the lines of "those who come after will look at the present as we now look at the past", foreseeing that AI would enter its third wave, that the age of video would arrive, and that high-performance computing would gradually come into public view.
With today's varied and complex computing demands, AVX-512 no longer stands for heavy operating load; instead, it accelerates all kinds of applications. It is well suited to the present moment, when AI inference loads keep growing and urgently need acceleration, especially for AI applications that do not demand high numerical precision.
Seen this way, technical innovation still runs ahead of demand, and time will prove everything.
For more information about Intel Advanced Vector Extensions 512, click the "Read the original" link.