This "ahead of its time" technology designed 15 years ago makes the CPU shine in AI inference
2022-07-07 04:27:00 [Intel Edge Computing Community]
Data, computing power, and algorithms: the troika that drives AI.
(This is common knowledge.)
Today, data and algorithms are developing faster than computing power, yet computing power will be the key factor in driving both the industrialization of AI and the adoption of AI across industries.
(Also well known?)
The conventional view is that GPUs are better suited to accelerating artificial intelligence. In fact, that is not so.
(Oh? Let's discuss.)
According to IDC's "2020-2021 China AI Computing Power Development Assessment Report", inference workloads in AI applications across industries will keep growing over the next few years. IDC even predicts that the server market share for inference workloads will exceed that for training in the near future.
Deep learning, the representative technology of the third wave of artificial intelligence, is divided into two stages: training and inference.
Compared with training, which requires large amounts of computing power and data, inference needs relatively little data but must respond as quickly as possible and with optimized energy efficiency.
Putting AI into production is mostly about inference: giving the right answer quickly from a small amount of real-world data. For inference at scale, the CPU platform has clear advantages: the learning threshold for users is low, and deployment is fast and low-risk.
Hardware innovation and instruction set upgrades:
Tapping the CPU's AI acceleration potential
But if we want to tap the CPU's potential for deep learning acceleration, where do we start?
Accelerating AI inference workloads such as image recognition, object detection, and image segmentation in data center, enterprise, and intelligent edge computing environments requires strong computing power.
So the more fundamental question is: where does a CPU's computing power come from?
In fact, a CPU's ability to process data, that is, its computing power, depends on the continual introduction and improvement of specific acceleration instruction sets and arithmetic units.
As data volumes explode and data types multiply, CPU instruction sets keep being upgraded and evolved so that this data can be processed efficiently, and in particular processed efficiently in parallel.
Early general-purpose processors generally worked with SISD (single instruction, single data) instructions: in each core, one instruction operates on one piece of data at a time.
"Single" and "one": these words meant that such instructions could not be very efficient as computing demand rose. Consider the array operations common in image processing, games, and AI computation: under SISD, an array multiplication has to be broken down into 3 operation instructions, even though these multiplications are in fact the same operation.
New instructions were created in response. Processors began to introduce SIMD (single instruction, multiple data) instructions, which let one instruction operate on multiple pieces of data at once.
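The difference between the two models can be sketched with a toy instruction counter. This is only an illustration in plain Python, not real vector code: the function names and the choice of 8 lanes are assumptions made for the example, and each loop iteration stands in for one issued instruction.

```python
def scalar_multiply(a, b):
    """SISD model: one multiply 'instruction' per pair of elements."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x * y)
        instructions += 1
    return out, instructions


def simd_multiply(a, b, lanes):
    """SIMD model: one 'instruction' multiplies up to `lanes` pairs at once."""
    out, instructions = [], 0
    for i in range(0, len(a), lanes):
        # All pairs in this slice are handled by a single modeled instruction.
        out.extend(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


a = [float(i) for i in range(16)]
b = [2.0] * 16
scalar_out, scalar_count = scalar_multiply(a, b)
simd_out, simd_count = simd_multiply(a, b, lanes=8)  # 8 lanes is an assumed width
print(scalar_count, simd_count)
```

Both functions return the same products; only the number of modeled instructions differs, which is the efficiency gain the SIMD approach is after.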
In 1996, Intel launched the MMX instruction set, its first with support for SIMD instructions, and equipped it with dedicated 64-bit registers.
This amounted to opening a wider dedicated lane (the registers) beside the main road for heavy computing demands. Beyond that, Intel also added the FMA (fused multiply-add) instruction set, which lets the processor complete the two basic operations of multiplication and addition in a single step, doubling efficiency again.
Over a longer timeline, Intel kept upgrading and optimizing its SIMD instruction sets from 1999 to 2007, and in 2007 AVX was born.
Intel introduced the new Advanced Vector Extensions (AVX) instruction set in its Sandy Bridge microarchitecture. It not only extended vector computing to 256 bits but also added new data-processing enhancements such as data rearrangement.
The Intel Xeon Scalable processor family integrates the AVX-512 instruction set: the registers have grown from the original 64 bits to 512 bits, with up to two 512-bit FMA units.
This means an application can perform 32 double-precision or 64 single-precision floating-point operations at once, or operate on eight 64-bit or sixteen 32-bit integers, a substantial gain in computing power.
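One way to see where the figures 32 and 64 come from is the short calculation below. It is a sketch under two stated assumptions: each fused multiply-add is counted as two floating-point operations, and both 512-bit FMA units are busy at the same time. The helper names are hypothetical.

```python
def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one vector register."""
    return register_bits // element_bits


def flops_at_once(register_bits, element_bits, fma_units):
    """Floating-point operations in flight, counting each FMA as 2 operations."""
    ops_per_fma = 2  # assumption: one multiply plus one add per lane
    return lanes(register_bits, element_bits) * ops_per_fma * fma_units


double_lanes = lanes(512, 64)             # 64-bit elements in a 512-bit register
single_lanes = lanes(512, 32)             # 32-bit elements in a 512-bit register
double_flops = flops_at_once(512, 64, 2)  # double precision, two FMA units
single_flops = flops_at_once(512, 32, 2)  # single precision, two FMA units
print(double_lanes, single_lanes, double_flops, single_flops)
```

The same lane counts (eight and sixteen) account for the integer figures quoted in the text.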
AVX-512 comes to the Intel Xeon platform:
Integrated acceleration for AI deep learning
That is the story of innovation at the hardware level. Next comes real-world application.
In 2017, the first-generation Intel Xeon Scalable platform (built around the first-generation Intel Xeon Scalable processors) debuted with Intel AVX-512 technology.
With its ultra-wide 512-bit vector operations, AVX-512 improves performance on demanding workloads.
Compared with previous-generation platforms that lacked this technology, the new platform can process more data per clock cycle.
AVX-512 is already a powerful tool, and the integration of the FMA (fused multiply-add) instruction set is a double buff: FMA performs a floating-point multiply and add in one step with only a single rounding, which improves both the speed and the accuracy of floating-point operations.
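The effect of rounding once rather than twice can be shown with a small emulation. This sketch does not call any hardware FMA instruction: it uses exact rational arithmetic from Python's standard library to imitate a single final rounding, and the function names and input values are chosen only for illustration.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """Two roundings: the product is rounded to a float, then the sum is rounded."""
    product = a * b
    return product + c


def fused_multiply_add(a, b, c):
    """One rounding: compute a*b + c exactly, then round to a float once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the separate multiply-then-add path loses the small term entirely.
unfused = unfused_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)
print(unfused, fused)
```

With these inputs the two-step version returns zero, while the single-rounding version keeps the small non-zero result.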
Adding FMA units further increases the parallelism of vector computing. In the first-generation Xeon Scalable processors, each core of the Platinum series and of some of the Gold series has 2 FMA units, while each core of the remaining Gold series and of the Silver and Bronze series has 1 FMA unit.
Beyond instruction set innovation, the Intel Xeon series has also integrated AI inference acceleration in several other ways.
In 2019, Intel launched the second-generation Intel Xeon Scalable platform, which marked Intel's formal move toward integrated AI inference acceleration.
At that time, Intel Deep Learning Boost (DL Boost) technology focused mainly on accelerating INT8 inference on the CPU. With it, the inference performance of second-generation Xeon improved by up to 30 times, making it Intel's first mainstream data center CPU with integrated AI acceleration.
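To give a feel for what INT8 inference involves, here is a minimal sketch of a quantized dot product. It assumes symmetric linear quantization with one scale per vector, which is a common scheme but is not stated in the article; it runs in plain Python and does not use DL Boost instructions. The function names and sample values are hypothetical.

```python
def quantize(values, bits=8):
    """Symmetric linear quantization to signed integers; returns (ints, scale)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(weights, activations):
    """Dot product on INT8 values with an integer accumulator, then rescaled."""
    qw, weight_scale = quantize(weights)
    qa, activation_scale = quantize(activations)
    accumulator = sum(w * a for w, a in zip(qw, qa))  # pure integer arithmetic
    return accumulator * weight_scale * activation_scale


weights = [0.12, -0.50, 0.33, 0.90]
activations = [1.0, 0.25, -0.75, 0.6]
exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(exact, approx)
```

The integer result differs slightly from the floating-point one, which is why this approach suits applications that can tolerate reduced numerical precision.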
Widely applicable: technology innovation from more than a decade ago shines across diverse computing needs
At the application level, the AVX-512 instruction set is widely used, in areas including scientific simulation, financial analysis, AI and deep learning, 3D modeling and analysis, image and audio/video processing, and encryption and data compression.
For example, in video encoding, decoding, and transcoding, applications must perform large volumes of repetitive floating-point calculations, and this is where the AVX-512 instruction set plays to its strengths.
In a video cloud service scenario, the newer Intel Xeon Platinum 8180 processor, which integrates the AVX-512 instruction set, improves substantially on the older Intel Xeon E5-2699 v4 processor in both single-task latency and full-load throughput.
As shown in the figure above, the new processor delivers up to a 2x improvement in single-task latency, and in full-load throughput, transcoding performance improves by up to 1.4x to 1.5x.
Looking back, AVX-512, which emerged more than a decade ago, was somewhat "ahead of its time". One wonders whether the engineers of the day had the epiphany that "future generations will view the present just as we view the past", and foresaw that AI would enter its third wave, that the video era would rise, and that high-performance computing would gradually enter public view.
With today's varied and complex computing demands, AVX-512 no longer stands only for heavy operating pressure. Instead, it accelerates all kinds of applications. It is well suited to the present situation, in which AI inference loads keep growing and urgently need acceleration, and especially to AI applications that do not require very high computational precision.
In this sense, technological innovation still runs ahead of demand, and time will verify everything.
For more information about Intel Advanced Vector Extensions 512, click the "Read the original" link.