Practice of Dynamic Load Balancing Based on Open-Source TARS
2022-06-24 12:20:00 【2020labs assistant】
1. Background
In vivo's microservice practice, some Internet-facing businesses chose the TARS microservice framework after weighing a number of factors.
The official description is: TARS is a microservice framework that supports multiple languages, has built-in service governance capabilities, and works well with DevOps. On top of the open-source version we did a lot of work to adapt it to our internal systems, for example integrating it with the CI/CD build-and-release system and the single sign-on system, but that is not the focus here. This article focuses on the dynamic load balancing algorithm we implemented in addition to the existing load balancing algorithms.
2. What is load balancing
Wikipedia defines it as follows: load balancing is a computer technique for distributing load across multiple computers (a computer cluster), network connections, CPUs, disk drives, or other resources, in order to optimize resource usage, maximize throughput, minimize response time, and avoid overloading any single resource. Using multiple server components with load balancing instead of a single component improves reliability through redundancy. Load balancing is usually provided by dedicated software and hardware. Its main function is to distribute a large number of jobs sensibly across multiple processing units, and it is used to address high concurrency and high availability in Internet architectures.
This is easy to understand: in essence, load balancing is a way to distribute traffic when a distributed service has to handle a large number of concurrent requests.
3. Load balancing algorithms supported by TARS
TARS supports three load balancing algorithms: round-robin, weighted round-robin, and consistent hashing. The entry point is the selectAdapterProxy function in the TarsCpp code; interested readers can study it in more detail.
3.1 Round-robin load balancing
Round-robin load balancing is very simple to implement: the IPs of all available service instances form a call list. Incoming requests are assigned to the machines in the list one by one, in order, and once the last node in the list has been used, the cycle starts again from the first node. This spreads traffic out, balances the load across machines as far as possible, and improves machine utilization. The algorithm covers most distributed scenarios, and it is TARS's default load balancing algorithm.
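A minimal C++ sketch of plain round-robin selection follows. It is not the TarsCpp selectAdapterProxy code; the `RoundRobin` class and its `next()` method are hypothetical names used for illustration. An atomic counter is assumed so that concurrent callers can share one selector.

```cpp
#include <atomic>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical round-robin selector over a fixed list of node addresses.
class RoundRobin {
public:
    explicit RoundRobin(std::vector<std::string> nodes) : nodes_(std::move(nodes)) {
        if (nodes_.empty()) throw std::invalid_argument("no available nodes");
    }

    // Returns the next node in order, wrapping back to the first after the last.
    const std::string& next() {
        std::size_t i = counter_.fetch_add(1, std::memory_order_relaxed);
        return nodes_[i % nodes_.size()];
    }

private:
    std::vector<std::string> nodes_;
    std::atomic<std::size_t> counter_{0};
};
```

For nodes `{"a", "b", "c"}`, successive calls to `next()` return a, b, c, a, and so on.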
But what if the nodes differ in processing capacity? Even though traffic is distributed evenly, the weaker nodes can still become overloaded. This motivates the next algorithm.
3.2 Weighted round-robin load balancing
As the name suggests, each node is assigned a fixed weight that represents its share of the traffic. For example, with 5 nodes configured with weights 4, 1, 1, 1, 3, 100 requests are distributed as 40, 10, 10, 10, 30. Client requests are thus allocated according to the configured weights. One detail needs attention: weighted round-robin must be smooth. That is, for 10 requests, the first 4 should not all go to the first node.
There are many smooth weighted round-robin algorithms in the industry; interested readers can look them up.
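One well-known smooth scheme is the smooth weighted round-robin popularized by Nginx: on every pick, each node's current value grows by its weight, the node with the largest current value wins, and the winner's current value is reduced by the total weight. The sketch below assumes that scheme; TARS's own weighted implementation may differ, and `SmoothWeightedRoundRobin` is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical smooth weighted round-robin (Nginx-style) selector.
class SmoothWeightedRoundRobin {
public:
    explicit SmoothWeightedRoundRobin(std::vector<int> weights)
        : weights_(std::move(weights)), current_(weights_.size(), 0) {
        if (weights_.empty()) throw std::invalid_argument("no available nodes");
        for (int w : weights_) {
            if (w <= 0) throw std::invalid_argument("weights must be positive");
            total_ += w;
        }
    }

    // Returns the index of the selected node.
    std::size_t next() {
        std::size_t best = 0;
        for (std::size_t i = 0; i < weights_.size(); ++i) {
            current_[i] += weights_[i];                  // every node gains its weight
            if (current_[i] > current_[best]) best = i;  // ties go to the lower index
        }
        current_[best] -= total_;                        // the winner pays back the total
        return best;
    }

private:
    std::vector<int> weights_;
    std::vector<int> current_;
    int total_ = 0;
};
```

With the weights 4, 1, 1, 1, 3 from the example above, the first ten picks are nodes 0, 4, 1, 0, 2, 4, 0, 3, 4, 0: node 0 gets four of the ten requests, but interleaved with the other nodes rather than in one burst, and every cycle of ten requests repeats the same 4:1:1:1:3 split.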
3.3 Consistent hashing
In many business scenarios that involve caching, there is a requirement beyond even traffic distribution: requests from the same client should land on the same node whenever possible.
Consider a scenario: a business has 10 million users, and each user has a user ID and a set of user information. User IDs map one-to-one to user information; this mapping is stored in a DB, and all other modules need to query it to obtain certain user fields. Under high concurrency, sending every request directly to the DB would overwhelm it, so a cache is the natural solution. Does every node need to store the full set of user information? It could, but that is not the best approach: what if the user base grows from 10 million to 100 million? As the number of users grows, this approach struggles and soon hits a bottleneck or fails to meet demand. Consistent hashing solves this problem: it ensures that, for the same input, requests land on the same node as far as possible.
Why "as far as possible"? Because nodes may fail and go offline, and new nodes may be added during capacity expansion; consistent hashing minimizes cache rebuilding under such changes. TARS provides two hash algorithms: one computes the MD5 value of the key and then applies XOR operations over address offsets, and the other is ketama hash.
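The ring below sketches consistent hashing with virtual nodes. To stay self-contained it deliberately swaps in the 32-bit FNV-1a hash; TARS itself uses the MD5-based hash or ketama described above. `HashRing`, `addNode`, `removeNode`, and `nodeFor` are hypothetical names, and 100 virtual nodes per real node is an assumed default.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical consistent-hash ring: ring point (hash) -> real node address.
class HashRing {
public:
    explicit HashRing(int virtualNodes = 100) : virtualNodes_(virtualNodes) {}

    // Places virtualNodes_ points for this node on the ring.
    void addNode(const std::string& node) {
        for (int i = 0; i < virtualNodes_; ++i) {
            ring_[fnv1a(node + "#" + std::to_string(i))] = node;
        }
    }

    // Removes only the points that still belong to this node.
    void removeNode(const std::string& node) {
        for (int i = 0; i < virtualNodes_; ++i) {
            auto it = ring_.find(fnv1a(node + "#" + std::to_string(i)));
            if (it != ring_.end() && it->second == node) ring_.erase(it);
        }
    }

    // Returns the owner of the first ring point at or after the key's hash.
    const std::string& nodeFor(const std::string& key) const {
        if (ring_.empty()) throw std::runtime_error("ring is empty");
        auto it = ring_.lower_bound(fnv1a(key));
        if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
        return it->second;
    }

private:
    // 32-bit FNV-1a, used here only as a simple deterministic stand-in hash.
    static std::uint32_t fnv1a(const std::string& s) {
        std::uint32_t h = 2166136261u;
        for (unsigned char c : s) {
            h ^= c;
            h *= 16777619u;
        }
        return h;
    }

    int virtualNodes_;
    std::map<std::uint32_t, std::string> ring_;
};
```

Removing a node only remaps the keys that previously landed on that node's points; every other key keeps its node, which is what limits cache rebuilding when the cluster changes.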
4. Why is dynamic load balancing needed?
Most of our services currently run on virtual machines, so mixed deployment (several services deployed on one node) is common. With mixed deployment, if one service has a bug that consumes a lot of CPU or memory, the services deployed alongside it are affected.
With the three algorithms above, there is a problem: the affected machines keep receiving traffic according to the configured rules. Some may ask: with weighted round-robin, can't we just configure a low weight for the problematic node so that it receives less traffic? We can, but this is often not timely enough. What if the problem happens in the middle of the night? Moreover, the weight has to be restored manually once the fault is cleared, which increases operations cost. We therefore need a dynamic load balancing algorithm that adjusts traffic distribution automatically, to preserve service quality as far as possible in such abnormal situations.
From this it is easy to see that the core of dynamic load balancing is simply to adjust each node's weight dynamically according to its load. This is also common practice in the industry: periodically collect server status information and dynamically compute the current weight of each server.
5. Dynamic load balancing strategy
We likewise compute dynamic weights for the available nodes from various load factors, return those weights, and reuse TARS's static weighted node selection algorithm. The load factors we chose are: the interface's 5-minute average latency, 5-minute timeout rate, and 5-minute exception rate, plus CPU load, memory usage, and NIC load. The set of load factors can be extended dynamically.
The overall functional diagram is as follows:
5.1 Overall interaction sequence diagram
During RPC calls, EndpointManager periodically fetches the set of available nodes, which carry weight information. When the service initiates a call, it selects a node according to the load balancing algorithm specified by the business;
RegistryServer periodically obtains information such as timeout rate and average latency from the DB/monitoring system, and obtains machine load information such as CPU and memory from other platforms (e.g. CMDB). All computation threads run asynchronously and cache their results locally;
EndpointManager executes the selection strategy based on the weights it obtains. The following figure shows how changes in node weight affect the distribution of request traffic:
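The asynchronous compute-and-cache pattern described above can be sketched as a background thread that periodically recomputes a weight table and publishes it under a mutex, while callers only read a snapshot. This is a hypothetical illustration, not the RegistryServer code: `WeightCache`, `WeightTable`, and the `compute` callback (which in practice would gather timeout rate, latency, and machine load) are assumptions.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical table: node address -> computed dynamic weight.
using WeightTable = std::map<std::string, int>;

// Recomputes weights on a background thread and caches the latest table.
class WeightCache {
public:
    WeightCache(std::function<WeightTable()> compute, std::chrono::milliseconds interval)
        : compute_(std::move(compute)), interval_(interval), worker_([this] { run(); }) {}

    ~WeightCache() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();  // wake the worker early instead of waiting out the interval
        worker_.join();
    }

    WeightCache(const WeightCache&) = delete;
    WeightCache& operator=(const WeightCache&) = delete;

    // Callers read a copy of the latest table without waiting for a recomputation.
    WeightTable snapshot() const {
        std::lock_guard<std::mutex> lock(mu_);
        return table_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mu_);
        while (!stop_) {
            lock.unlock();
            WeightTable fresh = compute_();  // slow data collection happens outside the lock
            lock.lock();
            table_ = std::move(fresh);
            cv_.wait_for(lock, interval_, [this] { return stop_; });
        }
    }

    std::function<WeightTable()> compute_;
    std::chrono::milliseconds interval_;
    mutable std::mutex mu_;
    std::condition_variable cv_;
    bool stop_ = false;
    WeightTable table_;
    std::thread worker_;  // declared last so it starts only after the members above exist
};
```

With the 60-second refresh described in 5.2, usage would look like `WeightCache cache(computeFromMonitoring, std::chrono::seconds(60));`, where `computeFromMonitoring` is a hypothetical function returning a `WeightTable`.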
5.2 Node update and load balancing strategy
Performance data for every node is refreshed every 60 seconds by a timer thread;
The weight values and value ranges of all nodes are computed and cached in memory;
After obtaining the node weights, the caller runs the existing static weighted load balancing algorithm to select a node;
Fallback strategy: if all nodes have the same weight or are all abnormal, round-robin is used by default;
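The fallback rule can be expressed as a small decision function. This sketch assumes, hypothetically, that a non-positive weight marks an abnormal node; `Strategy` and `chooseStrategy` are illustrative names rather than TARS APIs.

```cpp
#include <algorithm>
#include <vector>

enum class Strategy { RoundRobin, StaticWeight };

// Decides how the caller should pick a node from the weights it fetched.
// Assumption: weight <= 0 means the node's weight is abnormal/unavailable.
Strategy chooseStrategy(const std::vector<int>& weights) {
    if (weights.empty()) return Strategy::RoundRobin;
    bool allAbnormal = std::all_of(weights.begin(), weights.end(),
                                   [](int w) { return w <= 0; });
    bool allEqual = std::all_of(weights.begin(), weights.end(),
                                [&weights](int w) { return w == weights.front(); });
    // Equal weights carry no information, and abnormal ones cannot be trusted.
    return (allAbnormal || allEqual) ? Strategy::RoundRobin : Strategy::StaticWeight;
}
```

When all weights are equal, weighted selection degenerates to round-robin anyway, so falling back costs nothing; the check mainly protects against feeding meaningless weights into the static weighted algorithm.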
5.3 How the load is calculated
Load calculation: each load factor has a weight value and an importance level (expressed as a percentage), which can be tuned according to how important the factor is. The total is obtained by multiplying each factor's weight by its percentage and summing the results. For example, if the latency weight is 10 and the timeout-rate weight is 20, with importance levels of 40% and 60% respectively, the total is 10 * 0.4 + 20 * 0.6 = 16. Each load factor is calculated as follows (we currently use only two load factors, average latency and timeout rate, because they are the easiest to obtain from the existing TARS system):
1. Average latency weight: weights are allocated in inverse proportion to each machine's share of the total latency: weight = initial weight * (total latency - this machine's average latency) / total latency (the drawback is that traffic is not allocated in exact proportion to latency);
2. Timeout rate weight: timeout rate weight = initial weight - timeout rate * initial weight * 90%. The factor is 90% rather than 100% because even a 100% timeout rate may be caused by excessive traffic, so a small amount of probing traffic is kept;
The corresponding code is as follows:
```cpp
// Computes a dynamic weight for every node of every servant in loadCache.
// LoadCache, DEFAULT_WEIGHT, MIN_WEIGHT, TIMEOUT_WEIGHT_FACTOR, getProportion() etc. are defined elsewhere in RegistryServer.
void LoadBalanceThread::calculateWeight(LoadCache &loadCache)
{
    for (auto &loadPair : loadCache)
    {
        const auto ITEM_SIZE(static_cast<int>(loadPair.second.vtBalanceItem.size()));
        if (ITEM_SIZE == 0)
        {
            continue; // no nodes: skip to avoid dividing by zero below
        }

        ostringstream log;
        int aveTime(loadPair.second.aveTimeSum / ITEM_SIZE);
        log << "aveTime: " << aveTime << "|"
            << "vtBalanceItem size: " << ITEM_SIZE << "|";
        TLOGDEBUG("loadPair.second.aveTimeSum: " << loadPair.second.aveTimeSum << endl);

        for (auto &loadInfo : loadPair.second.vtBalanceItem)
        {
            // Latency weight, inversely proportional to this node's share of the total latency:
            // weight = initial weight * (total latency - node's average latency) / total latency,
            // additionally scaled here by the number of nodes.
            int aveTimeWeight(loadPair.second.aveTimeSum
                ? (DEFAULT_WEIGHT * ITEM_SIZE * (loadPair.second.aveTimeSum - loadInfo.aveTime) / loadPair.second.aveTimeSum)
                : 0);
            aveTimeWeight = aveTimeWeight <= 0 ? MIN_WEIGHT : aveTimeWeight;

            // Timeout-rate weight = initial weight - timeout rate * initial weight * 90%;
            // 90% keeps a little probing traffic even at a 100% timeout rate.
            // A node with timeouts but no successes drops straight to MIN_WEIGHT.
            int timeoutRateWeight(loadInfo.succCount
                ? (DEFAULT_WEIGHT - static_cast<int>(loadInfo.timeoutCount * TIMEOUT_WEIGHT_FACTOR
                                                     / (loadInfo.succCount + loadInfo.timeoutCount)))
                : (loadInfo.timeoutCount ? MIN_WEIGHT : DEFAULT_WEIGHT));

            // Multiply each factor's weight by its proportion (percentage) and sum.
            loadInfo.weight = aveTimeWeight * getProportion(TIME_CONSUMING_WEIGHT_PROPORTION) / WEIGHT_PERCENT_UNIT
                            + timeoutRateWeight * getProportion(TIMEOUT_WEIGHT_PROPORTION) / WEIGHT_PERCENT_UNIT;

            log << "aveTimeWeight: " << aveTimeWeight << ", "
                << "timeoutRateWeight: " << timeoutRateWeight << ", "
                << "loadInfo.weight: " << loadInfo.weight << "; ";
        }
        TLOGDEBUG(log.str() << "|" << endl);
    }
}
```

The related code is implemented in RegistryServer; the code files are shown below:
The core implementation is the LoadBalanceThread class. Corrections are welcome.
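To make the formulas in 5.3 concrete, the standalone sketch below implements the two load factors and the percentage-weighted sum exactly as the text states them. The constants `kInitialWeight = 100` and `kMinWeight = 1` are assumptions, since the article does not give the values of `DEFAULT_WEIGHT` or `MIN_WEIGHT`. Because it follows the text rather than the RegistryServer code, it differs in two respects: the latency weight is not scaled by the number of nodes, and a node with only timeouts gets 10% of the initial weight rather than the minimum weight.

```cpp
#include <cstddef>
#include <vector>

// Assumed values; the real DEFAULT_WEIGHT / MIN_WEIGHT are not given in the article.
constexpr int kInitialWeight = 100;
constexpr int kMinWeight = 1;

// Latency factor: initial weight * (total latency - node latency) / total latency.
int latencyWeight(int nodeAvgMs, int totalAvgMs) {
    if (totalAvgMs <= 0) return kInitialWeight;  // no latency data yet
    int w = kInitialWeight * (totalAvgMs - nodeAvgMs) / totalAvgMs;
    return w <= 0 ? kMinWeight : w;
}

// Timeout factor: initial weight - timeout rate * initial weight * 90%.
int timeoutWeight(int succCount, int timeoutCount) {
    int total = succCount + timeoutCount;
    if (total == 0) return kInitialWeight;  // no calls yet
    // 90% written as 9/10; integer division truncates toward zero.
    int penalty = kInitialWeight * 9 * timeoutCount / (10 * total);
    return kInitialWeight - penalty;
}

// Weighted sum: each factor weight times its importance percentage (percentages sum to 100).
int combine(const std::vector<int>& weights, const std::vector<int>& percents) {
    int sum = 0;
    for (std::size_t i = 0; i < weights.size() && i < percents.size(); ++i) {
        sum += weights[i] * percents[i] / 100;
    }
    return sum;
}
```

For example, with three nodes averaging 10, 20, and 30 ms (total 60 ms), the latency weights are 83, 66, and 50; a node with 80 successes and 20 timeouts gets a timeout weight of 82, and a node whose calls all time out still keeps 10; `combine({10, 20}, {40, 60})` reproduces the total of 16 from the example in 5.3.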
5.4 Usage
- Configure the -w -v parameters in the Servant management page to enable dynamic load balancing; if they are not configured, it stays disabled.
As shown in the figure below:
- Note: dynamic load balancing only takes effect when it is enabled on all nodes. Otherwise, when the RPC framework finds that different nodes use different load balancing algorithms, it forces all nodes to use round-robin.
6. Scenarios for dynamic load balancing
If your services run in Docker containers, you may not need dynamic load balancing. You can rely on Docker's scheduling capabilities to scale services automatically, or deploy with fine-grained containers so that each service has its own container and services cannot affect one another. If services are deployed together and a service's quality can be affected by other services (for example, one service can max out the CPU), we recommend enabling this feature.
7. Next steps
The current implementation considers only two factors, average latency and timeout rate. These reflect service capacity to some extent, but not completely. We will therefore consider adding metrics such as CPU usage that better reflect node load, as well as strategies that let the caller adjust weights based on return codes.
Finally, you are welcome to discuss this with us and contribute to open-source TARS together.
Author: Yang Minshan, vivo Internet Server Team