What is a cluster? Why use a cluster architecture?
2022-06-09 16:58
I. What is a cluster?
In short, a cluster is a group of (several) independent computers connected by a high-speed communication network to form one larger computer service system. Each cluster node (each computer in the cluster) is an independent server running its own services. These servers can communicate with each other, cooperate to provide users with applications, system resources, and data, and are managed as a single system. When a user sends a request to a clustered system, the cluster gives the impression of being a single independent server, while the request is in fact handled by a group of clustered servers.
For instance:
Open the Google or Baidu home page. It looks so simple that you may think you could build a similar page in a few minutes, but behind that page are thousands of servers working together as a cluster.
To describe a cluster in one sentence: it is a group of servers cooperating to do the same job. These machines may need unified, coordinated management, and they can sit in a single machine room or be spread across machine rooms in many regions of the country and the world.
II. Why use a cluster?
(1) High performance
Some computing-intensive applications of national importance (such as weather forecasting and nuclear test simulation) demand very strong computing power. With the technology available today, even a large machine has limited computing power and would struggle to complete such a task alone, because the computation could take days, years, or even longer. For this kind of complex computing workload, computer cluster technology is used so that dozens, hundreds, or even thousands of computers share the calculation.

Suppose you have an LNMP environment that only needs to serve 10 concurrent requests at a time. In that case a single server will certainly be faster than a multi-server cluster. A cluster only shows its advantages once the concurrency or the total request volume exceeds what a single server can handle.
(2) Cost-effectiveness
A typical cluster architecture needs only a few, or a few dozen, server hosts, which is much cheaper than a dedicated supercomputer that easily costs millions of yuan. For the same performance requirement, a cluster is more cost-effective than a large machine with the same computing power.
In their early days, the databases and other core systems of Taobao and Alipay ran on minicomputer servers costing millions of yuan. Later, because of the high cost of use and maintenance, and because the cost of expanding the equipment multiplied geometrically, these machines became a bottleneck for expansion and were also hard to staff and maintain. They were eventually replaced with PC server clusters. For example, the database system was migrated from Oracle on minicomputers to the open source MySQL database on PC servers. This not only cut costs but also made expansion and maintenance easier.
(3) Scalability
When service load and pressure grow, the cluster can simply be extended to meet the demand without degrading service quality.
Normally, to raise the performance of a single machine you have to add CPUs and memory, and when no more can be added you have to buy a more powerful server. The number of components a server can take is always limited. With cluster technology, you only need to add a new server to the existing cluster. From the point of view of the clients accessing it, the service is almost unchanged in continuity and performance: the system is upgraded without users noticing, handles more traffic, and is extended with little effort. The number of nodes in a cluster can grow to thousands or even tens of thousands, so it scales far beyond a single supercomputer.
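As a rough illustration of scaling out, the sketch below estimates how many identical nodes a cluster would need for a given peak concurrency. The function name `nodes_needed`, the per-node capacity of 1000 concurrent requests, and the headroom factor are hypothetical values chosen for illustration, not measurements from any real system.

```python
import math


def nodes_needed(peak_concurrency: int, per_node_capacity: int, headroom: float = 0.0) -> int:
    """Return how many identical nodes are required to serve the peak load.

    headroom is the fraction of each node's capacity kept in reserve
    (0.2 means each node is planned to run at no more than 80%).
    """
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable = per_node_capacity * (1.0 - headroom)
    # At least one node is always needed, even for a tiny load.
    return max(1, math.ceil(peak_concurrency / usable))


# Hypothetical assumption: each node handles 1000 concurrent requests.
for load in (10, 1000, 2500, 10000):
    print(load, "concurrent requests ->", nodes_needed(load, 1000), "node(s)")
```

With these assumed numbers, a load of 10 concurrent requests still fits on one node, which matches the earlier point that a cluster only pays off once a single server's capacity is exceeded.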
(4) High availability
A single computer system always faces hardware failure: CPU, memory, motherboard, power supply, hard disk, and so on. If just one component breaks, the machine may go down and stop providing service. In a clustered system, individual pieces of hardware and software will still fail, but the service of the system as a whole can remain available 24 hours a day, 7 days a week (7*24).
Cluster architecture lets the system keep working despite several hardware failures, which minimizes downtime. While improving reliability, the cluster also greatly reduces the business loss caused by failures. Today almost 100% of Internet sites require 7*24 service.
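A small worked example may help show why redundancy raises availability. It assumes that node failures are independent (a simplification that real clusters only approximate) and that the service stays up as long as at least one node is up. Under those assumptions, n nodes that are each available a fraction a of the time give a combined availability of 1 - (1 - a)^n. The function name and the 99% figure are illustrative assumptions.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Availability of a cluster that works while at least one node is up.

    Assumes independent node failures, which is an idealization.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The cluster is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


HOURS_PER_YEAR = 24 * 365

# Hypothetical assumption: each node on its own is available 99% of the time.
for n in (1, 2, 3):
    availability = cluster_availability(0.99, n)
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    print(f"{n} node(s): availability {availability:.6f}, "
          f"about {downtime_hours:.2f} hours of downtime per year")
```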
(5) Transparency
A loosely coupled cluster made up of several independent computers presents itself as one virtual server. When a user or client program accesses the cluster, it is like accessing a single high-performance, highly available server. Individual servers joining or leaving the cluster do not interrupt the service of the whole system, and this too is transparent to users.
(6) Manageability
The whole system may be physically large, but it is easy to manage, much like managing a single-image system. Ideally, the software and hardware modules are plug and play.
(7) Programmability
On a cluster system, it is easy to develop and modify all kinds of applications.
III. Common classifications of clusters
1. Common classifications of clusters
By function and structure, computer cluster architectures can be divided into the following categories:
Load balancing cluster
High availability cluster
High performance computing cluster
Grid computing cluster
Tips:
Load balancing clusters and high availability clusters are the cluster architectures most commonly used in the Internet industry, and they are the focus of our study.
2. Common classifications of clusters
(1) Load balancing cluster
Load balancing clusters give enterprises a more practical and more cost-effective architecture. A load balancing cluster spreads the load of incoming client requests evenly across the computers in the cluster. That load usually includes both application processing load and network traffic load. Such a system suits the pattern of one set of applications serving a large number of users: every node takes a share of the request load, and requests can be allocated dynamically among nodes to keep the load balanced.
At run time, a load balancing cluster typically distributes client requests through one or more front-end load balancers to a group of back-end servers, giving the whole system high performance and high availability. High availability clusters and load balancing clusters generally use similar technology, and a cluster may have both properties at once.
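The distribution step can be sketched with a simple round-robin balancer that skips back ends marked as down. This is only a toy model of the idea, not how LVS, Nginx, or Haproxy are implemented; the class name `RoundRobinBalancer` and the back-end names `web1` to `web3` are hypothetical.

```python
from typing import List


class RoundRobinBalancer:
    """Hands out back ends in rotation, skipping ones marked unhealthy."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._healthy = set(backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend: str) -> None:
        self._healthy.discard(backend)

    def mark_up(self, backend: str) -> None:
        if backend in self._backends:
            self._healthy.add(backend)

    def pick(self) -> str:
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
first_round = [balancer.pick() for _ in range(3)]
balancer.mark_down("web2")  # simulate a failed node
second_round = [balancer.pick() for _ in range(4)]
print(first_round)
print(second_round)
```

After `web2` is marked down, requests keep flowing to the remaining nodes, which is the sense in which load balancing and high availability overlap.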
The role of a load balancing cluster is:
Typical open source software for load balancing clusters includes LVS, Nginx, and Haproxy, as shown in the figure below:

Tips:
Switching over takes several seconds and differs from service to service. Database services clearly take longer to switch over than Web services.
(2) High availability clusters
This generally means that when any node in the cluster fails, all tasks on that node are automatically transferred to other healthy nodes, and the process does not affect the operation of the cluster as a whole.
When a node in the cluster fails, the running cluster service reacts quickly and reassigns that node's services to other working systems in the cluster. Given that hardware and software can fail, the main purpose of a high availability cluster is to keep the cluster's overall service available as much as possible. If the primary node fails, the backup node takes its place for the duration. The backup node is usually a mirror of the primary node. When it takes over, it can fully assume the primary's role (including its IP address and other resources) and provide service, so the environment looks the same to users and their access is not affected.
A high availability cluster keeps the server system running and responding as quickly as possible. Such clusters often run redundant nodes and services on multiple machines that monitor one another. If a node fails, its replacement takes over its duties within a few seconds or less. So for users, in theory, business is unaffected when any single machine in the cluster goes down.
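The takeover described above can be sketched as a primary/backup pair that exchanges heartbeats, with the backup claiming the shared IP address when the primary falls silent. This is a simplified model under stated assumptions, not the actual mechanism of Keepalived or Heartbeat: the class names, node names, the address `10.0.0.100`, and the 3-second timeout are all hypothetical, and failback to the primary is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # time in seconds when the last heartbeat arrived


class FailoverPair:
    """Tracks which node of a primary/backup pair currently holds the VIP."""

    def __init__(self, primary: Node, backup: Node, vip: str, timeout: float) -> None:
        self.primary = primary
        self.backup = backup
        self.vip = vip
        self.timeout = timeout
        self.vip_owner = primary.name  # the primary serves traffic at start

    def heartbeat(self, node_name: str, now: float) -> None:
        """Record that a heartbeat from node_name was received at time now."""
        for node in (self.primary, self.backup):
            if node.name == node_name:
                node.last_heartbeat = now

    def _alive(self, node: Node, now: float) -> bool:
        return now - node.last_heartbeat <= self.timeout

    def check(self, now: float) -> str:
        """Move the VIP to the backup if the primary has been silent too long."""
        if (self.vip_owner == self.primary.name
                and not self._alive(self.primary, now)
                and self._alive(self.backup, now)):
            self.vip_owner = self.backup.name
        return self.vip_owner


pair = FailoverPair(Node("lb-master", 0.0), Node("lb-backup", 0.0),
                    vip="10.0.0.100", timeout=3.0)
pair.heartbeat("lb-master", 1.0)
pair.heartbeat("lb-backup", 1.0)
owner_before = pair.check(now=2.0)

# The primary stops sending heartbeats; only the backup reports in.
pair.heartbeat("lb-backup", 5.0)
owner_after = pair.check(now=6.0)
print(owner_before, "->", owner_after)
```

Time is passed in explicitly rather than read from the system clock so that the behavior is easy to follow and reproduce.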
The role of a high availability cluster is:
When one machine goes down, another machine takes over its IP resources and service resources and continues to provide service.
It is often used for applications where load balancing is hard to achieve, such as between pairs of load balancers, primary databases, or primary storage nodes.
Open source software commonly used for high availability clusters includes Keepalived and Heartbeat. The architecture is shown in the figure below:

(3) High performance computing cluster
A high performance computing cluster is also called a parallel computing cluster. It usually involves parallel applications developed specifically for clusters to solve complex scientific problems (weather forecasting, oil exploration, nuclear reaction simulation, and so on). Such a cluster behaves like a supercomputer that is internally made up of tens of thousands of independent servers communicating over a common messaging layer to run applications in parallel. In production, a job is sliced into pieces like a cake, the pieces are distributed to cluster nodes for computation, the results are returned, and the nodes then move on to the next task, over and over.
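The slice, distribute, and collect cycle can be sketched on a single machine, with worker threads standing in for cluster nodes. A real high performance computing cluster would use a message-passing layer across machines. The function names and the sum-of-squares job are hypothetical stand-ins for a real scientific workload.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Cut data into at most `parts` contiguous slices of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, extra = divmod(len(data), parts)
    slices = []
    start = 0
    for i in range(parts):
        # The first `extra` slices get one additional element.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            slices.append(data[start:end])
        start = end
    return slices


def node_work(chunk: Sequence[int]) -> int:
    """The job one node does: here, a sum of squares over its slice."""
    return sum(x * x for x in chunk)


def run_on_cluster(data: Sequence[int], nodes: int) -> int:
    """Distribute the slices to workers, then combine the partial results."""
    chunks = split(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(node_work, chunks))
    return sum(partials)


numbers = list(range(1, 101))
total = run_on_cluster(numbers, nodes=4)
print(total)
```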
(4) Grid computing cluster
Because it is seldom used, it is skipped here.
Tips:
Internet websites most commonly use load balancing clusters and high availability clusters.
IV. Introduction to and selection of common cluster software and hardware
1. Cluster software and hardware products commonly used in enterprises
Open source cluster software commonly used by Internet companies includes Nginx, LVS, Haproxy, Keepalived, and Heartbeat.
Commercial cluster hardware commonly used by Internet companies includes F5, Netscaler, Radware, and A10. It works in much the same way as Haproxy.
Taobao, Ganji.com, Sina, and other companies have used Netscaler load balancing products. The Netscaler hardware is shown in the figure below:

The F5 hardware is shown in the figure below:

2. How to choose cluster software and hardware products
When the business is critical, the in-house technical team is weak, and the company is willing to pay for products and better service, hardware load balancers such as F5, Netscaler, and Radware are a good fit. Companies in this position are mostly large traditional non-Internet enterprises, such as banks, securities and other financial firms, BMW, and Mercedes-Benz.
Portal sites mostly use software and hardware products together to spread the risk of relying on a single product. Examples are Taobao, Tencent, and Sina. Companies that have raised funding also tend to buy hardware products, such as Ganji.com.
Small and medium-sized Internet companies make little or no profit in their early stage, so they prefer open source and free solutions and hire dedicated operations staff to maintain them. 51CTO is an example.
By comparison, commercial load balancers cost more but perform well and are more stable. Their drawback is that they cannot be customized through further development. Open source load balancing software demands more of the operations staff. If the team's operations and development skills are strong, open source load balancing software is a good choice.
The Internet industry currently tends to use open source load balancing software.
3. How to choose open source cluster software
For small and medium-sized Internet companies whose sites have modest concurrency and total traffic, Nginx is the first choice for load balancing: it is simple to configure, easy to use, secure and stable, has an active community, and is used by more and more people. A similar load balancing product is Haproxy (it supports L4 and L7 load balancing and is equally good, but its community is less active than that of Nginx).
To make Nginx load balancing highly available, Keepalived is the first choice because it is simple to install and configure, easy to use, secure, and stable. Another high availability tool that serves a similar purpose is Heartbeat (it is complicated to use and not recommended for beginners).
For a large Internet company, LVS+Keepalived can do layer 4 forwarding at the front end (generally active/standby or active/active; to scale further, use DNS or put OSPF in front), with Nginx or Haproxy doing layer 7 forwarding behind it (this tier can scale to hundreds of machines), followed by the application servers. For load balancing and high availability of databases and storage, LVS+Heartbeat is suggested: LVS supports TCP forwarding and its DR mode is very efficient, while Heartbeat can work with drbd, so that it supports not only VIP switchover but also block-device-level data synchronization (drbd) and the management of resource services.
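The two-tier layout for large sites can be sketched as two routing decisions in sequence. The layer 4 tier sees only the client address and port, while the layer 7 tier can inspect the URL. This is a toy model of the division of labor, not the scheduling logic of LVS, Nginx, or Haproxy. The proxy names, pool names, URL prefixes, and the CRC32-based selection are all assumptions made for illustration.

```python
import zlib
from typing import Dict, List, Tuple

L7_PROXIES: List[str] = ["nginx-1", "nginx-2"]  # hypothetical layer 7 proxies
APP_POOLS: Dict[str, List[str]] = {              # hypothetical URL prefixes and pools
    "/static/": ["static-1", "static-2"],
    "/api/": ["api-1", "api-2", "api-3"],
    "/": ["web-1", "web-2"],
}


def stable_index(key: str, size: int) -> int:
    """Map a key to a slot in a way that is the same on every run."""
    return zlib.crc32(key.encode("utf-8")) % size


def layer4_pick(client_ip: str, client_port: int) -> str:
    """Layer 4 decides from addresses and ports only, never the HTTP request."""
    return L7_PROXIES[stable_index(f"{client_ip}:{client_port}", len(L7_PROXIES))]


def layer7_pick(path: str, client_ip: str) -> str:
    """Layer 7 can read the URL, so it routes by the longest matching prefix."""
    prefix = max((p for p in APP_POOLS if path.startswith(p)), key=len, default="/")
    pool = APP_POOLS[prefix]
    return pool[stable_index(client_ip, len(pool))]


def route(client_ip: str, client_port: int, path: str) -> Tuple[str, str]:
    """Return the (layer 7 proxy, application server) chosen for one request."""
    return layer4_pick(client_ip, client_port), layer7_pick(path, client_ip)


for request_path in ("/api/users", "/static/logo.png", "/index.html"):
    print(request_path, "->", route("203.0.113.7", 51234, request_path))
```

Splitting the decision this way keeps the front tier cheap, since it never parses HTTP, and leaves content-aware routing to the tier that can be scaled out to many machines.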
Original site copyright notice
This article was created by [InfoQ]. Please include the original link when reprinting. Thank you.
https://yzsam.com/2022/160/202206091832175551.html