Improving resource utilization on openEuler 01: Introduction
2022-07-07 19:52:00 【openEuler】
Background
According to a report released by Canalys [1], global spending on cloud infrastructure services grew 34% year on year in the first quarter of 2022, reaching US$55.9 billion. However, several studies have shown that the average CPU utilization of user clusters in data centers worldwide is currently below 20%, which means a huge amount of resources is wasted. Improving data center resource utilization is therefore an important problem that urgently needs to be solved [2].
Causes
The main reason for low resource utilization is a mismatch between tasks and resource allocation. This mismatch takes many forms, for example:
Scheduling systems are siloed per cluster: Different jobs use different scheduling systems, so jobs cannot move across a wider pool of clusters and the idle resources of other clusters cannot be used effectively.
Lack of diversity in task types: Jobs within a cluster are highly homogeneous and concentrate on the same subset of resources, so that subset is heavily utilized while the remaining resources sit idle.
Lack of priority-based tiered management: Either there are no low-priority jobs to fill idle resources, or there are low-priority jobs but the cluster lacks tiered control, which leads to over-allocation of resources.
Uniform resource types within a cluster: The resource specifications inside a cluster are uniform, so the cluster cannot flexibly scale individual resources to match the dynamic demands of the overall business, which leads to over-allocation of some resources.
Overall, the root cause is the lack of diversity in tasks and resources within a cluster, combined with the scheduler's weak ability to manage diverse tasks and resources.
Solutions
Deploy different types of jobs together to improve resource utilization in both space and time.
Resource overselling (space-division overselling): The idle resources of online services are oversold to offline jobs, improving overall resource utilization.
Peak staggering (time-division overselling): The idle periods of online services are filled with offline jobs, reducing the time resources sit idle.
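The two kinds of overselling can be illustrated with a small calculation. The following is only a sketch under stated assumptions: the `NodeSample` type, the fixed safety margin, and the idle threshold are all hypothetical choices made for illustration, not part of the openEuler solution.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeSample:
    """One observation of a node (hypothetical type for this sketch)."""
    hour: int            # hour of day, 0-23
    online_used: float   # CPU cores used by online services


def spatial_oversell(capacity: float, samples: List[NodeSample],
                     safety_margin: float = 0.1) -> float:
    """Cores that could be lent to offline jobs at any time of day.

    Assumption: the observed online peak plus a fixed margin
    (a fraction of capacity) stays reserved for online services.
    """
    peak = max(s.online_used for s in samples)
    reserve = peak + safety_margin * capacity
    return max(0.0, capacity - reserve)


def idle_hours(capacity: float, samples: List[NodeSample],
               idle_threshold: float = 0.3) -> List[int]:
    """Hours whose peak online usage stays below idle_threshold * capacity.

    These are candidate windows for time-division overselling.
    """
    peak_by_hour: Dict[int, float] = {}
    for s in samples:
        peak_by_hour[s.hour] = max(peak_by_hour.get(s.hour, 0.0), s.online_used)
    limit = idle_threshold * capacity
    return sorted(h for h, used in peak_by_hour.items() if used < limit)
```

For a 32-core node whose online usage peaks at 24 cores, `spatial_oversell` leaves roughly 4.8 cores for offline jobs around the clock, while `idle_hours` picks out the quiet hours in which much more of the node could be handed over.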
Technical challenges
Whether overselling is space-division or time-division, resources can run short when workloads peak at the same time, and this damages the quality of service (QoS) of some services. The key technical challenge is how to improve resource utilization while ensuring that service QoS is not harmed.
In addition, the diversity and complexity of cloud services make it even harder to guarantee service quality:
On the one hand, by how observable the load characteristics are, applications can be divided into white-box, black-box, and gray-box applications. White-box applications are observable by the system, which can obtain their QoS metrics in real time. Black-box applications are not observable; the system does not even know what their QoS metrics are. Applications whose observability lies between the two are called gray-box applications. Accurately quantifying the service quality of black-box applications and locating interference sources is the technical challenge in generalizing this capability, and it is also an active research topic in the industry.
On the other hand, by business complexity, loads can be divided into lightweight applications (such as microservices and function computing), traditional applications (such as monolithic applications), and very large applications (such as HPC/AI). Technical problems such as full-stack collaborative awareness must be overcome to build a general, unified system.
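To make the tension between utilization and QoS concrete, here is a minimal feedback-control sketch for the white-box case, where the online service's latency is observable. It uses an additive-increase, multiplicative-decrease (AIMD) rule chosen purely for illustration; the function name, parameters, and thresholds are assumptions and do not describe the mechanism used in openEuler.

```python
def adjust_offline_quota(quota: float, latency_ms: float, target_ms: float,
                         min_quota: float = 0.0, max_quota: float = 16.0,
                         step_up: float = 0.5, backoff: float = 0.5) -> float:
    """Return the next CPU quota (in cores) for offline jobs.

    Hypothetical AIMD rule: if the online service misses its latency
    target, cut the offline quota multiplicatively; otherwise grow it
    additively to reclaim idle resources.
    """
    if latency_ms > target_ms:
        quota = quota * backoff      # QoS violated: back off quickly
    else:
        quota = quota + step_up      # QoS healthy: probe for more headroom
    return min(max_quota, max(min_quota, quota))
```

Calling the function once per sampling period, with the latest measured latency, raises the offline quota slowly while the online service is healthy and halves it as soon as the target is missed. For black-box applications no such latency signal is available, which is exactly why QoS quantification is the harder problem.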
Solution overview
Following the cause analysis above, deploying and scheduling diverse services, loads, and resources together can significantly improve the flexibility of resource allocation and thereby improve resource utilization. But it also brings greater technical challenges: the more services and loads are managed and the more resource types are involved, the more complex the dependencies become and the more complex the system's multi-objective optimization requirements are. On this basis, we divide the work into the following development stages:
L0: Independent deployment: Each cluster has its own technology stack and its own resource pool. Cluster utilization is low (<20%).
L1: Shared deployment: A unified technology stack enlarges the cluster, a single type of service shares the deployed resources, and utilization is improved through dynamic elasticity. Cluster resource utilization is still relatively low (<30%).
Related technologies: technology stack unification, containerization, elastic scaling
L2: Hybrid deployment: A unified technology stack enlarges the cluster, multiple types of services share the deployed resources, and utilization is improved through overselling and isolation technology. Cluster resource utilization is high (>40%).
Related technologies: resource overselling, tiered resource isolation, feedback control
L3: Generalized hybrid deployment: The service types covered by hybrid deployment are generalized, supporting shared-resource deployment of thousands of black-box services on the public cloud. The service quality of critical services is guaranteed through quantitative QoS awareness.
Related technologies: QoS quantification and localization, precise control, QoS-aware scheduling
L4: Converged deployment: Building on generalized load types, diverse loads such as containers, virtual machines, and lightweight runtimes are converged and combined with complex scenarios such as HPC/AI and heterogeneous resource awareness, to comprehensively improve the overall utilization of all kinds of resources.
Related technologies: heterogeneous-resource-aware scheduling, unified scheduling
Of these stages, L1 and L2 focus mainly on improving cluster CPU utilization, while L3 and L4 generalize the techniques for improving resource utilization.
The industry's L2-level work on internal services has already significantly improved the overall utilization of clusters and even whole data centers, but generalization to the public cloud is still at an early stage and is not yet commercially available.
Following the trend toward generalized and converged deployment, we have built a solution for continuously improving resource utilization, as shown in the figure below:
To achieve the best deployment results, control and optimization are needed at several levels of task execution:
Cluster management: At the scheduling level, services that interfere strongly with each other's performance are deployed separately, and unnecessary interference is reduced by optimizing task combinations.
Single-node management: Resource contention is detected in real time on each node, and its impact on critical services is eliminated.
Resource isolation layer: Tasks are graded by priority and controlled accordingly, guaranteeing the resource needs of high-priority tasks.
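The priority grading in the resource isolation layer can be pictured as a strict-priority allocation. The sketch below is a simplified illustration, assuming a single resource (CPU cores) and a hypothetical request format; real isolation is enforced in the kernel and is considerably more involved.

```python
from typing import Dict, List, Tuple

# (task name, priority, cores requested); a lower number means higher priority.
Request = Tuple[str, int, float]


def allocate_by_priority(capacity: float,
                         requests: List[Request]) -> Dict[str, float]:
    """Grant cores in strict priority order.

    High-priority tasks are satisfied first; lower-priority tasks only
    receive whatever capacity is left over.
    """
    remaining = capacity
    grants: Dict[str, float] = {}
    for name, _priority, wanted in sorted(requests, key=lambda r: r[1]):
        granted = min(wanted, remaining)
        grants[name] = granted
        remaining -= granted
    return grants
```

With 16 cores and requests from a priority-0 web service (8 cores), a priority-1 cache (4 cores), and a priority-2 batch job (10 cores), the batch job is the only one that is squeezed, receiving the 4 cores that remain.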
Huawei has implemented an L2-level solution based on this framework. The relevant features have been verified inside Huawei and are being launched one after another. Important technical breakthroughs have been made at every level:
Cluster management:
Predictive scheduling: Supports features such as predictive scheduling based on the physical resource utilization of nodes [3], load-balancing scheduling, and resource-preemption scheduling.
Feature modeling: A general application profiling and modeling component has been designed and implemented. It automates interference injection, metric collection, and model output.
Single-node management:
QoS quantification: Detects service QoS in real time based on a quantitative model and controls interference sources in real time.
Topology orchestration: Dynamically arranges service affinity according to the hardware topology, improving overall performance without changing resource quotas.
Power control: Higher resource utilization raises the risk of excessive whole-machine power consumption, so power changes must be monitored in real time and suppressed in a targeted way.
L3/MB control: Current hardware provides L3 cache and memory bandwidth isolation, but dynamic software control is still needed to balance interference control against resource utilization.
Resource isolation layer:
Hierarchical preemption: Provides hierarchical preemption for resources that are queued by priority, such as CPU, memory, and IO/network. Capabilities such as absolute CPU suppression (avoiding priority inversion) and network preemption performance (<100 ms) are industry-leading.
Elastic scheduling: Supports elastic scheduling capabilities such as tidal affinity and CPU Burst.
We will also open up the fine-grained features above in openEuler. You are welcome to try them and discuss them in the community.
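As a rough picture of predictive scheduling based on node utilization, the sketch below predicts each node's utilization from recent samples and places a task on the node with the lowest predicted load. The exponentially weighted moving average, the `max_util` cap, and all names are assumptions made for illustration; this is not the algorithm used by Volcano or by the solution described here.

```python
from typing import Dict, List, Optional


def ewma(history: List[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average; newer samples weigh more."""
    estimate = history[0]
    for sample in history[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def pick_node(util_history: Dict[str, List[float]], task_demand: float,
              max_util: float = 0.8) -> Optional[str]:
    """Choose the node with the lowest predicted utilization after placement.

    Utilization values are fractions in [0, 1]. Nodes whose predicted
    utilization would exceed max_util are skipped; None means no node fits.
    """
    best_node: Optional[str] = None
    best_util = 0.0
    for node, history in sorted(util_history.items()):
        predicted = ewma(history) + task_demand
        if predicted > max_util:
            continue
        if best_node is None or predicted < best_util:
            best_node, best_util = node, predicted
    return best_node
```

Scheduling on measured physical utilization rather than on declared requests is what lets the scheduler find the idle capacity that overselling relies on, at the cost of depending on the quality of the prediction.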
Future plans
We have verified and deployed the hybrid deployment solution in some internal scenarios and have reached the L2 stage. In the short term, we still need to break through QoS guarantee technology for black-box services and enter the L3 stage; only at L3 can more users benefit. In the long term, beyond the container scenario, more load types and resource types need better utilization, which requires further technical breakthroughs in cluster scheduling, the OS, and other layers.
This article briefly introduces our thinking on technical solutions for improving cloud resource utilization. Follow-up articles will cover the isolation, feedback control, and aware-scheduling technologies involved in detail. Stay tuned!
References
[1] Global cloud services spend hits US$55.9 billion in Q1 2022
[2] Wang Kangjin, Jia Tong, Li Ying. Survey of research on job scheduling and resource management technology for online-offline co-location. Journal of Software, 2020, 31(10): 3100-3119
[3] Volcano: a management platform for online-offline job co-location that provides intelligent resource management and job scheduling
Join us
The resource utilization technologies mentioned in this article are developed jointly by the Cloud Native SIG, High Performance Network SIG, Kernel SIG, OpenStack SIG, and Virt SIG, and their source code will gradually be open-sourced in the openEuler community. If you are interested in these technologies, you are welcome to follow along and join. You can add the community assistant on WeChat to join the WeChat group of the corresponding SIG.
This article is from the WeChat official account openEuler (openEulercommunity).