Study notes on server SMP, NUMA, and MPP architectures

2022-07-06 23:37:00 Galloping tortoise

1. Background

Commercial processors long focused on single-core designs, and single-core performance has been pushed close to its limit. Simply raising the clock speed of a single-core chip generates too much heat without a matching gain in performance, yet the demand for CPU performance grows faster than CPUs themselves develop.

CPU frequency can be raised by deepening the pipeline, but the added buffers and poorly controlled leakage current cause a substantial increase in power consumption, and the result may perform no better than an earlier, lower-frequency CPU. As CPU power consumption rises, heat dissipation becomes an even more serious problem, to the point where air cooling can no longer handle it.

This led to a new technology: the multi-core processor. The first multi-core CPU prototype, Hydra, appeared as early as 1996. In 2001 IBM launched the first commercial multi-core processor, the POWER4, and in 2005 Intel and AMD began using multi-core processors on a large scale.

Multi-core processors are becoming more and more common and are now widely used in servers, desktops, netbooks, tablets, mobile phones, medical devices, defense, aerospace, and other fields.

2. The development of multi-core processors

2.1 Classification by architecture

  • Homogeneous multi-core architecture: all processors in the system have the same architecture.
  • Heterogeneous multi-core architecture: the processors in the system have different architectures.

A homogeneous multi-core architecture is relatively simple in both hardware and software design and is highly general-purpose.

Heterogeneous multi-core processors include TI's DaVinci platform DM6000 series (ARM9 + DSP), Xilinx's Zynq-7000 series (dual-core Cortex-A9 + FPGA), and the Cell processor (one 64-bit PowerPC core + eight 32-bit coprocessors).

Homogeneous multi-core processors include the Exynos 4412, the Freescale i.MX6 Dual and Quad series, TI's OMAP4460, and Intel's Core Duo and Core 2 Duo.

2.2 Classification by operating mode

From the software point of view, multi-core processors commonly run in one of two modes:

   AMP (asymmetric multiprocessing)
   SMP (symmetric multiprocessing)

2.2.1 SMP mode

SMP (Symmetric Multi-Processing) mode: the SMP operating system architecture is a variant of multi-core processor technology in which a single operating system instance controls all processors and all processors share memory. Unlike AMP mode, where each CPU runs its own operating system instance, in SMP mode all CPUs have equal status, jointly run one operating system instance, and share system memory and peripheral resources. Compared with AMP mode, an SMP operating system offers shared memory, a better performance-to-power ratio, and easy load balancing, so it can make full use of the hardware advantages of a multi-core processor.
Figure 2-3 shows an SMP system in which the operating system coordinates the work of the two processor cores, and both cores share a single operating system instance in main memory. Although an application uses the same addresses on every core, the MMU maps them to different locations in main memory, so the code and data spaces of the two applications are isolated from each other.
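The MMU isolation described above can be illustrated with a toy model. This is only a sketch, not how a real MMU or kernel is implemented: the page size, the `PageTable` class, and the frame numbers are hypothetical and chosen purely to show two applications using the same virtual address while landing at different physical addresses.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


class PageTable:
    """Toy per-application page table: virtual page number -> physical frame number."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)

    def translate(self, virtual_addr):
        """Translate a virtual address into a physical address."""
        page, offset = divmod(virtual_addr, PAGE_SIZE)
        if page not in self.mapping:
            raise KeyError(f"page fault: virtual page {page} is not mapped")
        return self.mapping[page] * PAGE_SIZE + offset


# Both applications use virtual page 0x10, but the (hypothetical) operating
# system has backed that page with a different physical frame for each one.
app_a = PageTable({0x10: 0x200})
app_b = PageTable({0x10: 0x300})

virtual_addr = 0x10 * PAGE_SIZE + 0x24
phys_a = app_a.translate(virtual_addr)
phys_b = app_b.translate(virtual_addr)
print(hex(virtual_addr), "->", hex(phys_a), "and", hex(phys_b))
```

Because each application has its own table, neither can name a physical location that belongs to the other, which is the isolation property the paragraph describes.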

2.2.2 AMP mode

AMP (Asymmetric Multi-Processing) mode: in an AMP-mode RTOS, every CPU runs its own operating system instance (the instances are not necessarily identical). Each operating system has its own dedicated memory, and the instances communicate with one another through limited access to shared memory. The AMP structure requires the user to take part in allocating system resources. RTOSes of this type are rarely used; only Wind River's VxWorks provides an AMP-mode configuration.

Figure 2-4 shows a typical AMP system structure. Each CPU runs its own operating system instance, and each operating system has its own exclusive resources (at the very least, its own CPU). Other resources may be shared by the two systems or dedicated to one of them. The user decides how resources are allocated, so the allocation is visible to the user. Among commercial real-time operating systems, only Wind River's VxWorks supports AMP mode, and the mode is currently used relatively little.
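The AMP arrangement above (private memory per operating system instance, plus a small shared region for communication) can be mimicked in a few lines. This is a simulation under stated assumptions, not VxWorks code: `SharedMailbox`, `OsInstance`, and the instance names `rtos` and `linux` are hypothetical, and the "shared memory" is just a bounded Python list.

```python
class SharedMailbox:
    """Toy stand-in for the limited shared-memory region used for communication."""

    def __init__(self, slots):
        self.slots = slots
        self.messages = []

    def post(self, sender, payload):
        if len(self.messages) >= self.slots:
            return False  # mailbox full: the sender has to retry later
        self.messages.append((sender, payload))
        return True

    def fetch(self):
        return self.messages.pop(0) if self.messages else None


class OsInstance:
    """Toy operating system instance bound to one CPU, with private memory."""

    def __init__(self, name, cpu, mailbox):
        self.name = name
        self.cpu = cpu
        self.private_memory = {}  # never read or written by the other instance
        self.mailbox = mailbox

    def send(self, payload):
        return self.mailbox.post(self.name, payload)

    def receive(self):
        message = self.mailbox.fetch()
        if message is not None:
            sender, payload = message
            self.private_memory[f"from_{sender}"] = payload
        return message


mailbox = SharedMailbox(slots=2)
rtos = OsInstance("rtos", cpu=0, mailbox=mailbox)
linux = OsInstance("linux", cpu=1, mailbox=mailbox)

rtos.private_memory["sensor"] = 42
rtos.send({"sensor": rtos.private_memory["sensor"]})
received = linux.receive()
print(received)
```

The only path between the two instances is the mailbox; which CPU and which memory each instance owns is fixed up front by whoever configures the system, which mirrors the user-visible resource allocation mentioned in the text.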

2.2.3 Summary of SMP and AMP features

Features of SMP: there is only one operating system instance, running across multiple CPUs; every CPU has the same structure; memory and other resources are shared. The defining feature of this kind of system is that all resources are shared.

Features of AMP: there are multiple CPUs, which may differ in architecture; each CPU core runs a separate operating system or a separate instance of the same operating system; each CPU has its own independent resources. The defining feature of this structure is that resources are not shared.

3. The development of server architecture

3.1 Classification by architecture

From the perspective of system architecture, current commercial servers fall roughly into three categories:

  • Symmetric multiprocessor architecture (SMP: Symmetric Multi-Processor)
  • Non-uniform memory access architecture (NUMA: Non-Uniform Memory Access)
  • Massively parallel processing architecture (MPP: Massively Parallel Processing)

There are two models of shared-memory multiprocessors:

  • The uniform memory access (Uniform Memory Access, UMA) model
  • The non-uniform memory access (Non-Uniform Memory Access, NUMA) model

3.1.1 SMP(Symmetric Multi-Processor)

In a symmetric multiprocessor architecture, multiple CPUs work symmetrically, with no master-slave or subordinate relationship among them. All CPUs share the same physical memory, and every CPU takes the same time to access any memory address, so SMP is also called a uniform memory access (UMA) architecture. Ways to scale an SMP server include adding memory, using faster CPUs, adding CPUs, expanding I/O (the number of slots and buses), and adding more peripherals (usually disk storage).

The main feature of an SMP server is sharing: all resources in the system (CPUs, memory, I/O, and so on) are shared. This same feature causes the main problem of SMP servers, namely very limited scalability.

In an SMP server, every shared component can become a bottleneck when the server is scaled up, and memory is the most constrained. Because every CPU must reach the same memory through the same memory bus, memory access conflicts increase rapidly as CPUs are added. CPU resources end up wasted, and the effective performance of each CPU drops sharply. Experiments show that CPU utilization in an SMP server is best with 2 to 4 CPUs.
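The bus bottleneck can be made concrete with a deliberately crude saturation model. It rests on an assumption that is not in the original text: every CPU needs the single shared memory bus for a fixed fraction of its time (`bus_fraction`, set to an arbitrary 0.25 here). Under that assumption the bus is fully occupied once `num_cpus * bus_fraction` reaches 1, and adding CPUs beyond that point gains nothing.

```python
def smp_speedup(num_cpus, bus_fraction):
    """Upper bound on speedup when every CPU spends `bus_fraction` of its
    time on one shared memory bus (a crude saturation model, not a measurement)."""
    if not 0 < bus_fraction <= 1:
        raise ValueError("bus_fraction must be in (0, 1]")
    return min(float(num_cpus), 1.0 / bus_fraction)


def cpu_efficiency(num_cpus, bus_fraction):
    """Fraction of each CPU's capacity that still does useful work."""
    return smp_speedup(num_cpus, bus_fraction) / num_cpus


# bus_fraction=0.25 is an illustrative value, not a figure from the article.
for n in (1, 2, 4, 8, 16):
    print(n, smp_speedup(n, 0.25), round(cpu_efficiency(n, 0.25), 2))
```

Real systems degrade more gradually because of caches and queueing effects, so the sharp knee in this model should be read only as a picture of why utilization falls once the shared bus is saturated.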

3.1.2 NUMA(Non-Uniform Memory Access)

Because of the scalability limits of SMP, people began to look for ways to scale effectively and build large systems, and NUMA is one result of that effort. With NUMA technology, dozens of CPUs (even more than a hundred) can be combined in a single server.

  • In the NUMA multiprocessor model shown in the figure, access time varies with the location of the memory word. The shared memory is physically distributed across the local memories of all processors. Together, all the local memories form a global address space that every processor can access. A processor accesses its own local memory quickly, but accessing remote memory that belongs to another processor is slower because of the extra delay through the interconnect network.
  • The basic feature of a NUMA server is that it has multiple CPU modules. Each CPU module consists of several CPUs (for example, four) and has its own local memory, I/O slots, and so on.

Because the nodes are connected and exchange information through an interconnect module (called a crossbar switch), every CPU can access the memory of the whole system (this is an important difference between NUMA and MPP systems). Obviously, accessing local memory is much faster than accessing remote memory (the memory of other nodes in the system), which is where the name non-uniform memory access comes from.
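The cost of remote accesses can be estimated with a simple weighted average. The latencies below (100 ns local, 300 ns remote) are assumed values for illustration only, and `average_latency_ns` is a hypothetical helper, not part of any library.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Expected memory latency when `local_fraction` of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Assumed latencies, chosen only to show the trend.
LOCAL_NS = 100.0
REMOTE_NS = 300.0

for fraction in (1.0, 0.9, 0.5, 0.0):
    latency = average_latency_ns(fraction, LOCAL_NS, REMOTE_NS)
    print(f"{fraction:.0%} local accesses -> {latency:.0f} ns on average")
```

The more of its data a CPU finds in its own node, the closer the average stays to the local latency, which is why NUMA-aware software tries to keep threads and their memory on the same node.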

Because of this characteristic, applications should be developed so as to minimize the exchange of information between different CPU modules if the system is to perform well. NUMA technology largely solves the scaling problem of SMP systems, and a single physical server can support hundreds of CPUs. Typical NUMA servers include HP's Superdome, Sun's 15K, and IBM's p690.

NUMA technology also has drawbacks. Because the latency of remote memory access is far higher than that of local memory, system performance does not increase linearly as CPUs are added. For example, when HP released the Superdome server, it published performance values relative to HP's other UNIX servers. The 64-way Superdome (NUMA architecture) had a relative performance value of 20, while the 8-way N4000 (shared SMP architecture) had a value of 6.3. In other words, 8 times as many CPUs bought only about 3 times the performance.
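The arithmetic behind that comparison can be checked directly. The four input numbers come from the paragraph above; the variable names and the derived "scaling efficiency" ratio are introduced here for illustration.

```python
# Figures quoted in the text: CPU count and relative performance value.
superdome_cpus, superdome_perf = 64, 20.0
n4000_cpus, n4000_perf = 8, 6.3

cpu_ratio = superdome_cpus / n4000_cpus        # how many times more CPUs
perf_ratio = superdome_perf / n4000_perf       # how many times more performance
scaling_efficiency = perf_ratio / cpu_ratio    # 1.0 would mean linear scaling

print(f"CPU ratio: {cpu_ratio:.0f}x")
print(f"Performance ratio: {perf_ratio:.2f}x")
print(f"Scaling efficiency: {scaling_efficiency:.0%}")
print(f"Per-CPU performance: {superdome_perf / superdome_cpus:.4f} "
      f"vs {n4000_perf / n4000_cpus:.4f}")
```

The performance ratio comes out at roughly 3.2, so each Superdome CPU contributes well under half of what each N4000 CPU does in this comparison.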

3.1.3 MPP (Massively Parallel Processing)

Unlike NUMA, MPP offers another way to scale a system: multiple SMP servers are connected through a node interconnect network and work together on the same task, so that from the user's point of view they form a single server system. Its basic feature is that many SMP servers (each called a node) are linked by the node interconnect network and each node accesses only its own local resources (memory, storage, and so on). This is a completely shared-nothing architecture, so it scales best of the three. In theory its expansion is unlimited, and current technology can interconnect 512 nodes with thousands of CPUs. There is no industry standard for the node interconnect network; NCR's BYNET and IBM's SP Switch, for example, use different internal mechanisms. The interconnect is used only inside the MPP server, however, and is transparent to users.

In an MPP system, each SMP node can also run its own operating system, database, and so on. Unlike NUMA, there is no remote memory access problem: the CPUs in one node cannot access the memory of another node. Information is exchanged between nodes over the node interconnect network, a process usually called data redistribution.
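Data redistribution can be sketched as re-partitioning rows by a different key. The following is a single-process simulation under assumptions that are not in the original text: nodes are plain Python lists, ownership is decided by a CRC32 hash, and the `orders` rows are made-up sample data.

```python
import zlib


def node_for(key, num_nodes):
    """Pick the owning node for a key with a stable hash (CRC32 here)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, column, num_nodes):
    """Place each row on the node that owns its value in `column`."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row[column], num_nodes)].append(row)
    return nodes


def redistribute(nodes, column):
    """Re-send every row over the (simulated) interconnect so that rows
    with equal values in `column` end up on the same node."""
    all_rows = [row for node in nodes for row in node]
    return distribute(all_rows, column, len(nodes))


orders = [
    {"order_id": 1, "customer": "ann", "amount": 10},
    {"order_id": 2, "customer": "bob", "amount": 25},
    {"order_id": 3, "customer": "ann", "amount": 5},
    {"order_id": 4, "customer": "cat", "amount": 40},
]

by_order = distribute(orders, "order_id", num_nodes=3)
by_customer = redistribute(by_order, "customer")

# After redistribution each node can total its own customers locally.
local_totals = []
for node in by_customer:
    totals = {}
    for row in node:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    local_totals.append(totals)
print(local_totals)
```

Once rows are grouped by customer, no node needs another node's memory to compute a per-customer total; the only cross-node traffic is the redistribution step itself.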

An MPP server does, however, need a complex mechanism to schedule and balance the load and the parallel processing across nodes. Servers based on MPP technology often use system-level software (such as a database) to hide this complexity. For example, NCR's Teradata is relational database software built on MPP technology. When developing applications on this database, developers see the same database system regardless of how many nodes the back-end server has, and do not need to think about how to schedule the load on individual nodes.
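The idea of hiding the nodes behind one database interface can be sketched as a scatter-gather facade. This is not Teradata's API: `Node`, `ParallelDatabase`, and `total` are hypothetical names, and threads in one process stand in for separate shared-nothing servers.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Toy shared-nothing node that only ever reads its own rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def partial_sum(self, column):
        return sum(row[column] for row in self.rows)


class ParallelDatabase:
    """Hypothetical facade: callers see one database, not the nodes behind it."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def total(self, column):
        # Scatter the request to every node in parallel, then gather and merge.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(
                pool.map(lambda node: node.partial_sum(column), self.nodes)
            )
        return sum(partials)


db = ParallelDatabase([
    Node([{"amount": 10}, {"amount": 5}]),
    Node([{"amount": 25}]),
    Node([{"amount": 40}]),
])
grand_total = db.total("amount")
print(grand_total)
```

The caller writes `db.total("amount")` whether there are three nodes or three hundred; the fan-out and merge live inside the facade, which is the role the text assigns to system-level software.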

3.2 Comparing the strengths and weaknesses of the architectures

3.2.1 Performance differences between NUMA, MPP, and SMP

NUMA's node interconnect is implemented inside a single physical server. When a CPU needs to access remote memory it has to wait, which is why the performance of a NUMA server cannot scale linearly as CPUs are added. MPP's node interconnect is implemented outside the individual SMP servers, through I/O. Each node accesses only its local memory and storage, and the exchange of information between nodes proceeds in parallel with the nodes' own processing. As a result, MPP performance scales almost linearly as nodes are added. In SMP, all CPUs share the same resources, so linear scaling cannot be achieved at all.

3.2.2 Scalability differences between NUMA, MPP, and SMP

In theory NUMA can scale without limit. The technology is now fairly mature and supports scaling to hundreds of CPUs, as in HP's Superdome.
In theory MPP can also scale without limit. The technology is now fairly mature and supports 512 nodes with thousands of CPUs.
SMP scales poorly. At present CPU utilization is best with 2 to 4 CPUs, although IBM's BOOK technology can extend this to 8 CPUs.
An MPP system is built from multiple SMP servers connected through a node interconnect network, working together on the same task.

3.2.3 Application differences between MPP, SMP, and NUMA

  • Advantages of MPP

    An MPP system does not share resources, so it has more resources available than an SMP system. Once the workload reaches a certain scale, MPP is more efficient than SMP. Because an MPP system has to pass information between processing units, it makes full use of its resources and reaches high efficiency when communication time is short. In other words, if operations are largely independent of one another and the processing units need little communication, an MPP system is the better choice. This is why MPP systems show their strength in decision support and data mining.

  • Advantages of SMP

    Because an MPP system has to pass information between processing units, its efficiency is somewhat lower than that of SMP. When communication takes up a large share of the time, an MPP system cannot make full use of its resource advantage. In the OLTP programs in use today, users access a central database, and an SMP structure handles this much more efficiently than an MPP structure.

  • Advantages of the NUMA architecture

    A NUMA architecture can integrate many CPUs in one physical server, giving the system high transaction processing capacity. Because the latency of remote memory access is much higher than that of local memory access, data exchange between different CPU modules should be kept to a minimum. NUMA is therefore better suited to OLTP transaction processing environments. In a data warehouse environment, large volumes of complex data processing inevitably cause a great deal of data exchange, which sharply reduces CPU utilization.

4. Summary

Traditional multi-core computing uses the SMP (Symmetric Multi-Processor) model: multiple processors are connected to a centralized memory and I/O bus. All processors access the same physical memory, so SMP systems are sometimes called uniform memory access (UMA) systems. Consistency here means that at any moment the processors hold or share only one value for each piece of data in memory. The obvious drawback of SMP is limited scalability: once the memory and I/O interfaces are saturated, adding processors yields no further performance. Its counterpart is the AMP architecture, in which there is a master-slave relationship between cores. For example, one core controls the work of another, which can be understood as the control plane and data plane of a multi-core system.

NUMA is a form of distributed memory access: processors can access different memory addresses at the same time, which greatly improves parallelism. In NUMA, the processors are divided into multiple nodes, and each node is assigned its own local memory. The processors in every node can access all of the system's physical memory, but accessing memory in the local node takes much less time than accessing memory in a remote node.
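On a Linux machine, the division into nodes can be inspected through sysfs. The sketch below assumes a Linux system that exposes `/sys/devices/system/node/node<N>/cpulist`; the helper names `parse_cpu_list` and `numa_nodes` are made up for this example, and on other platforms the function simply returns an empty mapping.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a Linux cpulist string such as "0-3,8-11" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node id -> CPU ids; empty if the sysfs directory is absent."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        node_id = int(os.path.basename(path)[len("node"):])
        try:
            with open(os.path.join(path, "cpulist")) as handle:
                nodes[node_id] = parse_cpu_list(handle.read())
        except OSError:
            continue  # unreadable entry: skip it rather than fail
    return nodes


print(parse_cpu_list("0-3,8-11"))
print(numa_nodes())
```

A machine with a single memory controller typically reports just one node, while a multi-socket server reports one node per socket (or more), each with its own set of CPUs.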

The main advantage of NUMA is scalability. The NUMA architecture was designed to get past the scalability limits of the SMP architecture. With SMP, all memory accesses go over the same shared memory bus. This works well when the number of CPUs is relatively small, but not with dozens or even hundreds of CPUs, because those CPUs compete with one another for access to the shared memory bus. NUMA alleviates these bottlenecks by limiting the number of CPUs on any one memory bus and relying on a high-speed interconnect to connect the nodes.

References:
https://scitechconnect.elsevier.com/asymmetric-multi-processing-amp-vs-symmetric-multi-processing-smp/
