This "ahead-of-its-time" technology design from 15 years ago lets the CPU shine in AI inference

2022-07-07 04:27:00 Intel edge computing community

Data, computing power, and algorithms: the troika that drives AI.

(That much is common knowledge.)

Today, data and algorithms are developing faster than computing power, yet computing power will be the key factor in building the AI industry and in bringing AI to other industries.

(Also well known, right?)

The conventional view is that the GPU is better suited to accelerating artificial intelligence. In fact, that is not the whole story.

(Hm? Go on.)

According to IDC's "2020-2021 China AI Computing Power Development Assessment Report", inference workloads will keep growing in AI applications across industries over the next few years. IDC even predicts that the server market share for inference workloads will exceed that for training in the near future.

Deep learning, the representative of the third wave of artificial intelligence, is divided into two stages: training and inference.

Compared with training, which requires a great deal of computing power and data, inference needs relatively little data, but it has to respond as quickly as possible and use energy efficiently.

Putting AI into production is mostly about inference: giving the right answer quickly from a small amount of real-world data. For large-scale inference the CPU platform has clear advantages: the learning curve for users is low, and deployment is fast while risk stays low.

Hardware innovation and instruction set upgrades:

tapping the CPU's potential for AI acceleration

But if you want to tap the CPU's potential for deep learning acceleration, where do you start?

Accelerating AI inference workloads in data center, enterprise, and intelligent edge computing environments, such as image recognition, object detection, and image segmentation, requires strong computing support.

So the more fundamental question is: where does a CPU's computing power come from?

In fact, a CPU's ability to process data, that is, its computing power, depends on the continual introduction and improvement of specific acceleration instruction sets and arithmetic units.

As data volumes explode and data types multiply, CPU instruction sets keep upgrading and evolving so that this data can be processed efficiently, and above all processed in parallel.

Early general-purpose processors generally worked with SISD (single instruction, single data) instructions: in each core, one instruction operates on one piece of data at a time.

"Single" and "one": those words meant such instructions were never going to be efficient once computing demand began to climb. Array operations are common in image processing, games, and AI computation. Under SISD, an array multiplication has to be broken down into 3 separate instructions, even though the multiplications are all the same operation.

So new instructions came into being. Processors began to adopt SIMD (single instruction, multiple data) instructions to improve efficiency. An instruction of this kind operates on multiple pieces of data at once.
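The difference between the two models can be sketched in a few lines of Python. This is only a toy model that counts how many instructions would be issued. It does not use real SIMD hardware, and the 4-lane width and function names are assumptions chosen for illustration.

```python
def scalar_multiply(a, b):
    """SISD-style: one multiply instruction per pair of elements."""
    out, issued = [], 0
    for x, y in zip(a, b):
        out.append(x * y)
        issued += 1  # one instruction handles one pair
    return out, issued


def simd_multiply(a, b, lanes=4):
    """SIMD-style: one instruction multiplies up to `lanes` pairs at once.

    `lanes` is a hypothetical vector width, not a property of any real CPU.
    """
    out, issued = [], 0
    for i in range(0, len(a), lanes):
        chunk_a, chunk_b = a[i:i + lanes], b[i:i + lanes]
        out.extend(x * y for x, y in zip(chunk_a, chunk_b))
        issued += 1  # one instruction handles the whole chunk
    return out, issued


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [0.5] * 8
scalar_result, scalar_count = scalar_multiply(a, b)
simd_result, simd_count = simd_multiply(a, b, lanes=4)
print(scalar_count, simd_count)
```

Both functions produce the same products; only the number of issued instructions differs, which is the point of SIMD.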

In 1996 Intel launched the MMX instruction set, its first with support for SIMD instructions, and gave it dedicated 64-bit registers.

That is like opening a wider dedicated lane (the registers) beside the main road to carry busy computing traffic. Beyond that, Intel also added the FMA (fused multiply-add) instruction set, which lets the processor complete the two basic operations of multiplication and addition in one go, doubling efficiency again.

Stretch out the timeline and you can see that from 1999 to 2007 Intel repeatedly upgraded and optimized its SIMD instruction sets, and that this work led to the birth of AVX.

Intel introduced the new Advanced Vector Extensions (AVX) instruction set in its Sandy Bridge microarchitecture. AVX not only extended vector computing to 256 bits but also added new data-processing features such as data rearrangement.

The Intel Xeon Scalable processor family integrates the AVX-512 instruction set. The registers have grown from the original 64 bits to 512 bits, and each core has up to two 512-bit FMA units.

This means an application can perform 32 double-precision or 64 single-precision floating-point operations at once, or operate on eight 64-bit or sixteen 32-bit integers. Computing power has improved greatly.
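These figures follow from simple arithmetic, sketched below under two assumptions: each FMA counts as two floating-point operations (one multiply and one add), and both 512-bit FMA units are busy at the same time. The function names are illustrative only.

```python
def lanes(vector_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits


def peak_flops_at_once(vector_bits, element_bits, fma_units):
    """Floating-point operations completed together.

    Assumes every lane of every FMA unit does one multiply and one add.
    """
    return lanes(vector_bits, element_bits) * 2 * fma_units


double_precision = peak_flops_at_once(512, 64, fma_units=2)  # 8 lanes * 2 ops * 2 units
single_precision = peak_flops_at_once(512, 32, fma_units=2)  # 16 lanes * 2 ops * 2 units
int64_lanes = lanes(512, 64)
int32_lanes = lanes(512, 32)
print(double_precision, single_precision, int64_lanes, int32_lanes)
```

With a single FMA unit per core, the same formula gives half the floating-point figures.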

AVX-512 comes to the Intel Xeon platform,

with integrated acceleration for AI deep learning

So much for the innovation at the hardware level. The next step is putting it to work in real applications.

In 2017 the first-generation Intel Xeon Scalable platform (built around the first-generation Intel Xeon Scalable processor) arrived, and with it Intel AVX-512 technology.

AVX-512 uses ultra-wide 512-bit vector operations to improve performance on demanding workloads.

Compared with previous-generation platforms that lacked this technology, the new platform can process more data per clock cycle.

AVX-512 is already a powerful weapon, and integrating the FMA (fused multiply-add) instruction set is a double buff: FMA performs a floating-point multiply and add in one step and rounds only once, which improves both the speed and the accuracy of floating-point arithmetic.
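The accuracy benefit of rounding only once can be shown with a small sketch. It does not call a hardware FMA instruction. Instead it emulates the fused result by computing a*b + c exactly with Python's `fractions` module and rounding to a double a single time. The input values are chosen by hand so that the difference is visible.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """Two roundings: the product is rounded, then the sum is rounded."""
    product = a * b
    return product + c


def emulated_fma(a, b, c):
    """One rounding: exact rational arithmetic, rounded once at the end.

    This emulates what a fused multiply-add returns; it is not a hardware call.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = unfused_multiply_add(a, b, c)  # the product rounds to 1.0, so the tiny term is lost
fused = emulated_fma(a, b, c)            # keeps the exact value -2**-60
print(unfused, fused)
```

The exact product is 1 - 2^-60, which is too close to 1 to survive rounding to a double, so the unfused path returns zero while the fused path keeps the small remainder.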

Adding FMA units further increases the concurrency of vector computing. In the first-generation Xeon Scalable processors, each core in the Platinum series and in part of the Gold series has 2 FMA units, while each core in the rest of the Gold series and in the Silver and Bronze series has 1 FMA unit.

Beyond instruction set innovation, the Intel Xeon series has also integrated AI inference acceleration in several other ways.

In 2019 Intel launched the second-generation Intel Xeon Scalable platform, which marked Intel's formal move to integrated AI inference acceleration.

Intel Deep Learning Boost (DL Boost) technology focused at the time on accelerating INT8 inference on the CPU. With it, the inference performance of second-generation Xeon improved by up to 30 times, making it Intel's first mainstream data-center CPU with integrated AI acceleration.
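To make "INT8 inference" concrete, here is a minimal sketch of the idea: floating-point values are mapped to 8-bit integers, the dot product is accumulated in integer arithmetic, and the result is scaled back. The symmetric quantization scheme, the function names, and the sample values are all assumptions for illustration. This is not how DL Boost itself is programmed.

```python
def quantize(values):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    largest = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    scale = largest / 127.0
    return [round(v / scale) for v in values], scale


def int8_dot(x, w):
    """Dot product computed on quantized integers, then rescaled to float."""
    qx, scale_x = quantize(x)
    qw, scale_w = quantize(w)
    accumulator = sum(a * b for a, b in zip(qx, qw))  # integer multiply-accumulate
    return accumulator * scale_x * scale_w


activations = [0.12, -0.50, 0.33, 0.90]  # hypothetical layer inputs
weights = [0.40, 0.25, -0.70, 0.10]      # hypothetical layer weights

exact = sum(a * b for a, b in zip(activations, weights))
approx = int8_dot(activations, weights)
print(exact, approx)
```

The integer result is close to the floating-point one but not identical, which is why this approach suits applications that can tolerate reduced numerical precision.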

Useful in many places: a technology innovation from more than a decade ago shines across diverse computing needs

At the application level, the AVX-512 instruction set is widely used in scientific simulation, financial analysis, AI deep learning, 3D modeling and analysis, image and audio/video processing, encryption, data compression, and more.

For example, in video encoding, decoding, and transcoding, applications have to perform large volumes of repetitive floating-point calculations, and this is where the AVX-512 instruction set plays to its strengths.

In video cloud service scenarios, the newer Intel Xeon Platinum 8180 processor, which integrates the AVX-512 instruction set, greatly outperforms the older Intel Xeon E5-2699 v4 processor in both single-task latency and full-load throughput.

As shown in the figure above, the new processor delivers up to a 2x improvement in single-task latency, and in full-load throughput transcoding performance rises by up to 1.4-1.5x.

Looking back, the instruction set design that led to AVX-512, which emerged more than a decade ago, was somewhat "ahead of its time". One wonders whether the engineers of the day had the epiphany that "later generations will look on the present just as we look on the past", and foresaw that AI would usher in its third wave, that the video age would arrive, and that high-performance computing would gradually enter public view.

With so many complex computing demands, AVX-512 no longer just stands for heavy operating pressure. Instead it accelerates all kinds of applications. It is well suited to today's ever-heavier AI inference workloads that urgently need acceleration, especially AI applications that do not demand very high numerical precision.

Seen this way, technological innovation still runs ahead of demand, and time will prove everything.

Copyright notice
This article was created by [Intel edge computing community]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207062145535831.html