Before we learn about high-performance computing, let's take a look at its history
2022-06-10 23:22:00 【Huawei cloud developer Alliance】
Abstract: After 2005, the performance of single-core scalar processors essentially peaked, and further substantial gains (more than 10%) became hard to achieve.
This article is shared from the Huawei Cloud community post 《High-Performance Computing (1): Reading History to Understand the Rise and Fall》, author: I'm a big watermelon.
The rise of parallelism
Before 2005, most processors were single-core, and some had begun to support vectorization (for example, the MMX (MultiMedia eXtensions) and SSE (Streaming SIMD Extensions) instruction sets on x86 processors). Processor manufacturers improved computing performance by raising the clock frequency of single-core scalar processors and by increasing instruction-level parallelism (that is, improving the performance of the instruction pipeline).
Before 2005, the performance of single-core scalar processors roughly doubled every 18 months. This trend is commonly referred to as Moore's law (which strictly speaking describes transistor counts), as shown in the figure below. The period in which single-core scalar performance kept pace with Moore's law is called the "free lunch" period for software performance, because single-core scalar code sped up at the rate Moore's law describes without any changes to the code.

After 2005, the performance of single-core scalar processors essentially peaked, and further substantial gains (more than 10%) became hard to achieve. Why?
Single-core scalar processors kept pace with Moore's law mainly in the following ways:
- Increasing the processor's clock frequency: the clock frequency indicates how many basic operations the processor can perform in one second. Power consumption limits further frequency increases: from the standpoint of physics, as process technology advances, the processor's maximum power consumption (mainly leakage power) keeps growing (processor power consumption is approximately proportional to the cube of the processor frequency). THINK: cooling, superconducting computers, quantum computers.
- Improving instruction-level parallelism: a single-core scalar processor contains many different components, each executing different instruction operations. Exploiting instruction-level parallelism well requires code optimizers, compiler authors, and processor designers to work together. Processor designs typically added more hardware pipeline stages, and this approach has now reached its limit. Hardware designers also improve performance by widening hardware registers, for example from the original 32 bits to 128 bits. THINK: increase the number of registers? Why are 32-bit and 64-bit the mainstream? https://www.zhihu.com/question/21641577
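The cubic relationship between power and frequency mentioned in the list above can be made concrete with a few lines of arithmetic. The sketch below is only an illustration under that stated approximation (power proportional to frequency cubed, summed over cores); the function name `relative_power` and the sample configurations are hypothetical and do not describe any real chip.

```python
def relative_power(freq_ratio: float, cores: int = 1) -> float:
    """Power relative to one baseline core, assuming P is proportional to f**3 per core.

    This is the rough approximation quoted in the text, not a measured model.
    """
    if freq_ratio <= 0 or cores < 1:
        raise ValueError("freq_ratio must be positive and cores must be >= 1")
    return cores * freq_ratio ** 3


if __name__ == "__main__":
    # Hypothetical configurations: (label, frequency relative to baseline, core count)
    configs = [
        ("baseline: 1 core at 1.0x frequency", 1.0, 1),
        ("1 core at 2.0x frequency", 2.0, 1),
        ("2 cores at 1.0x frequency", 1.0, 2),
        ("2 cores at 0.8x frequency", 0.8, 2),
    ]
    for label, freq, cores in configs:
        print(f"{label}: {relative_power(freq, cores):.3f}x power")
```

Under this approximation, doubling the frequency of one core costs about eight times the power, while adding a second core at the same frequency costs only about twice the power. That gap is one way to read the industry's move toward multiple cores.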
Therefore, because heat dissipation prevents further increases in processor frequency, hardware manufacturers switched to placing multiple processor cores on a single chip, which is called multi-core. To deliver still higher performance, they also widened registers and instructions so that multiple data elements are processed at once, which is called vectorization.
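To illustrate what processing multiple data elements at once means, here is a toy emulation in plain Python. It does not use real SIMD hardware; it only counts how many add "instructions" would be issued if each one handled a single element versus a group of `LANES` elements. The lane count of 4 is an assumption (a 128-bit register holding four 32-bit values), and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

LANES = 4  # assumption: one 128-bit register holds four 32-bit values


def scalar_add(a: Sequence[float], b: Sequence[float]) -> Tuple[List[float], int]:
    """Add element by element; each element costs one emulated instruction."""
    out: List[float] = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def vector_add(a: Sequence[float], b: Sequence[float],
               lanes: int = LANES) -> Tuple[List[float], int]:
    """Add in groups of `lanes` elements; each group costs one emulated instruction."""
    out: List[float] = []
    instructions = 0
    for i in range(0, min(len(a), len(b)), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


if __name__ == "__main__":
    a = [float(i) for i in range(10)]
    b = [float(i) for i in range(10)]
    print("scalar instructions:", scalar_add(a, b)[1])
    print("vector instructions:", vector_add(a, b)[1])
```

Both functions return the same sums; only the emulated instruction count differs. The last group may be partially filled, which real vector code usually handles with masking or a scalar remainder loop.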
Difficulties of multi-core and vectorization:
- According to Amdahl's law, the proportion of serial code in a program limits the best speedup that parallelization can achieve.
- Multi-core vector processors (especially x86) use large caches to hold repeatedly accessed data and so reduce data-access latency. These caches, however, do not add to the hardware's raw computing power.
- Some code cannot be parallelized across cores or vectorized.
- Fully exploiting vectorization and multiple cores may require maintaining multiple versions of the code, which raises maintenance costs.
- Some strict application scenarios cannot tolerate the deviations in results that vectorization and multithreading introduce.
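The first difficulty in the list, Amdahl's law, has a simple closed form: with a serial fraction s and N parallel workers, the speedup is 1 / (s + (1 - s) / N). The sketch below evaluates that formula; the function name `amdahl_speedup` and the 10% serial fraction are illustrative assumptions, not figures from the article.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / (s + (1 - s) / N)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


if __name__ == "__main__":
    serial = 0.10  # assumption: 10% of the run time cannot be parallelized
    for n in (1, 2, 8, 64, 1024):
        print(f"{n:5d} workers: {amdahl_speedup(serial, n):.2f}x speedup")
    # As the worker count grows, the speedup approaches 1 / serial.
    print(f"upper bound: {1.0 / serial:.2f}x")
```

With a 10% serial fraction the speedup can never reach 10x, however many cores are added, which is why reducing the serial portion often matters more than adding hardware.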
The rise of heterogeneous parallelism
The era of heterogeneous parallelism began in 2007, when NVIDIA introduced CUDA (Compute Unified Device Architecture). Heterogeneous parallelism comprises two sub-concepts:
- Heterogeneous. Heterogeneous means that heterogeneous parallel computing has to deal with multiple computing platforms of different architectures at the same time, such as the current mainstream heterogeneous platforms x86+GPU and x86+FPGA, and ARM/Power+GPU, which is currently under development.
- Parallel. Parallel means that heterogeneous parallel computing relies mainly on parallel programming. Whether x86 processors, ARM processors, GPUs, or DSPs, all of the processors involved are multi-core vector processors.
Today CPUs have fallen behind Moore's law, while GPUs are still developing rapidly. Both CPUs and GPUs are chips with computing power. The CPU is good not only at instruction (control) operations but also at all kinds of numerical operations, whereas the GPU is a chip designed specifically for graphics tasks and is good only at the numerical computation of graphics-type functions. In terms of hardware design, a CPU consists of a few cores optimized for sequential serial processing. A GPU, by contrast, consists of thousands of smaller, more efficient cores designed to handle many tasks at the same time.

In summary, the GPU is designed for numerical computation on matrix-like graphics functions. It builds a large number of numerical computing threads out of many replicated arithmetic units, and it excels at highly parallel numerical computation over large amounts of data with no logical dependencies. The CPU is designed around both "parallel instruction execution" and "parallel data operation", and it excels at program tasks that involve complex instruction scheduling, loops, branches, and logical decisions.
