Ways to Improve openEuler Resource Utilization 01: Introduction
2022-07-07 19:52:00 [openEuler]
Problem background
According to a report released by Canalys [1], global spending on cloud infrastructure services grew 34% year on year in the first quarter of 2022, reaching US$55.9 billion. However, several studies have shown that the average CPU utilization of user clusters in data centers worldwide is currently below 20%, which is a huge waste of resources. Improving data center resource utilization is therefore an urgent problem [2].
Causes of the problem
The main reason for low resource utilization is a mismatch between tasks and resource allocation. This mismatch takes many forms, for example:
Scheduling systems are isolated per cluster: different jobs use different scheduling systems, so jobs cannot flow across a wider pool of clusters and idle resources in other clusters cannot be used effectively.
Lack of diversity in task types: jobs within a cluster are highly homogeneous and concentrate on the same resources, so those resources are heavily used while the rest sit idle.
Lack of priority-based tiered management: either there are no low-priority jobs to fill idle resources, or low-priority jobs exist but the cluster has no tiered control capability, which leads to over-allocation of resources.
Single resource type in the cluster: resources inside the cluster have uniform specifications and cannot scale flexibly with the business's dynamic demand for different resource types, which leads to over-allocation of some resources.
Overall, the root cause is a lack of task and resource diversity within clusters, combined with the scheduler's weak ability to manage diverse tasks and resources.
Solutions
Co-locate different types of jobs to raise resource utilization in both the spatial and the temporal dimension:
Resource overselling (spatial overselling): resources reserved by online services but left idle are oversold to offline jobs, improving overall resource utilization.
Peak staggering (temporal overselling): the idle periods of online services are filled with offline jobs, reducing resource idling.
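To make the two forms of overselling concrete, here is a minimal Python sketch under simple assumptions: a single node, online CPU usage sampled per time slot, and a fixed safety headroom. The function names, the headroom parameter, and the numbers are illustrative only and are not an openEuler interface. Spatial overselling lends out the part of an online allocation that stays unused even at peak; temporal overselling lends out whatever is idle in each individual slot.

```python
from typing import List


def spatial_oversell(allocated: float, usage: List[float], headroom: float) -> float:
    """Cores of an online service's allocation that offline jobs may borrow
    for the whole window: the share left unused even at the service's peak."""
    return max(0.0, allocated - max(usage) - headroom)


def temporal_oversell(capacity: float, usage: List[float], headroom: float) -> List[float]:
    """Cores offline jobs may borrow in each time slot, e.g. more at night
    when online traffic is low (peak staggering)."""
    return [max(0.0, capacity - u - headroom) for u in usage]


# Hypothetical node: 16 cores, online usage sampled over four time slots.
online_usage = [10, 12, 4, 3]
print(spatial_oversell(16, online_usage, headroom=1))   # 3
print(temporal_oversell(16, online_usage, headroom=1))  # [5, 3, 11, 12]
```

The gap between a constant 3 cores and up to 12 cores in the troughs shows why filling idle periods pays off, and also why both forms are exposed when online and offline peaks coincide.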
Technical challenges
Whether overselling is spatial or temporal, workloads can reach their peaks at the same time, and there are not enough resources to cover the shared peak; this degrades the quality of service (QoS) of some businesses. Improving resource utilization while keeping business QoS intact is the key technical challenge.
In addition, the diversity and complexity of cloud workloads make service quality even harder to guarantee:
On one hand, by how observable their load characteristics are, applications can be divided into white-box, black-box, and gray-box applications. White-box applications can be observed by the system, which obtains their QoS metrics in real time; black-box applications cannot be observed, and the system does not even know what their QoS metrics are; applications whose observability lies between the two are called gray-box applications. Accurately quantifying the service quality of black-box workloads and locating interference sources is the technical challenge in generalizing this capability, and it is also an active research topic in the industry.
On the other hand, by business complexity, workloads can be divided into lightweight applications (such as microservices and function computing), traditional applications (such as monolithic applications), and super applications (such as HPC/AI). Technical problems such as full-stack collaborative awareness must be overcome to build a universal, unified system.
Solution overview
Based on the analysis above, co-deploying and co-scheduling diverse services/workloads and resources can significantly increase the flexibility of resource allocation and thereby improve resource utilization efficiency. It also brings greater technical challenges: the more services/workloads are managed and the more resource types are involved, the more complex the dependencies and the system's multi-objective optimization requirements become. On this basis, we divide the work into the following development stages:
L0: Independent deployment: each cluster has its own technology stack and resource pool; cluster utilization is low (<20%).
L1: Shared deployment: a unified technology stack expands cluster scale; workloads of a single type share resources, and utilization is improved through dynamic elasticity; cluster resource utilization remains low (<30%).
Related technologies: technology stack unification, containerization, elastic scaling
L2: Mixed deployment: a unified technology stack expands cluster scale; workloads of various types share resources, and utilization is improved through overselling and isolation technologies; cluster resource utilization is high (>40%).
Related technologies: resource overselling, hierarchical resource isolation, feedback control
L3: Generic hybrid deployment: the workload types in hybrid deployment are generalized, so that thousands of black-box workloads can share resources on the public cloud, and QoS quantification and awareness ensure the service quality of key workloads.
Related technologies: QoS quantification/localization, precise control, QoS-aware scheduling
L4: Converged deployment: building on generalized workload types, containers, virtual machines, lightweight runtimes, and other diverse workloads are converged and combined with complex scenarios such as HPC/AI and heterogeneous resource awareness, comprehensively improving the utilization of all resource types.
Related technologies: heterogeneous resource-aware scheduling, unified scheduling
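The elastic scaling listed for L1 can be illustrated with the replica rule documented for the Kubernetes Horizontal Pod Autoscaler, used here only as a familiar example and not as the mechanism the authors use: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change when the ratio is within a tolerance (0.1 by default in Kubernetes). The sketch assumes utilization is given in percent; the min/max bounds are hypothetical parameters.

```python
import math


def desired_replicas(current_replicas: int, current_util: float, target_util: float,
                     min_replicas: int = 1, max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """HPA-style scaling: grow or shrink the replica count in proportion to
    how far observed utilization is from the target."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: avoid flapping
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_util=90, target_util=60))  # 6
print(desired_replicas(4, current_util=30, target_util=60))  # 2
print(desired_replicas(4, current_util=63, target_util=60))  # 4
```

Scaling a single workload type this way keeps replicas near a target utilization, but it cannot reclaim the headroom that each service still reserves, which is what the L2 overselling and isolation techniques address.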
Among these, L1 and L2 focus mainly on raising cluster CPU utilization, while L3 and L4 generalize the utilization-improvement techniques.
Industry explorations at the L2 level on internal workloads have already significantly improved the overall utilization of clusters and even entire data centers, but generalization to the public cloud is still at an early stage and not yet commercially available.
Following the future trend toward generalized and converged deployment, we have built a sustainable solution for improving resource utilization, as shown in the figure below:
To achieve the best deployment results, control and optimization are needed at multiple levels of task execution:
Cluster management: at the scheduling level, workloads that strongly interfere with each other are deployed separately, and unnecessary interference is reduced by optimizing task combinations.
Single-node management: resource contention is detected in real time on each node to eliminate its impact on key jobs.
Resource isolation layer: tasks are graded by priority and controlled accordingly, guaranteeing the resource needs of high-priority tasks.
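As a rough illustration of separating strongly interfering workloads at the cluster level, the Python sketch below places jobs greedily on the node where they add the least pairwise interference. The interference scores are hypothetical inputs (in practice they might come from profiling, such as the application portrait modeling mentioned later), and greedy placement is a simple stand-in chosen for brevity, not the scheduler the authors built.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def place_jobs(jobs: List[str], nodes: List[str], slots_per_node: int,
               interference: Dict[Pair, float]) -> Dict[str, List[str]]:
    """Greedy interference-aware placement: put each job on the node where the
    summed interference with jobs already there is lowest."""
    def score(a: str, b: str) -> float:
        # Scores are symmetric; unknown pairs are assumed not to interfere.
        return interference.get((a, b), interference.get((b, a), 0.0))

    placement: Dict[str, List[str]] = {n: [] for n in nodes}
    for job in jobs:
        free = [n for n in nodes if len(placement[n]) < slots_per_node]
        if not free:
            raise ValueError(f"no free slot for {job}")
        # Least added interference first, then the emptier node.
        best = min(free, key=lambda n: (sum(score(job, o) for o in placement[n]),
                                        len(placement[n])))
        placement[best].append(job)
    return placement


scores = {("batch-a", "batch-b"): 0.9, ("web", "cache"): 0.5,
          ("web", "batch-a"): 0.2, ("web", "batch-b"): 0.3,
          ("cache", "batch-a"): 0.1, ("cache", "batch-b"): 0.2}
print(place_jobs(["web", "cache", "batch-a", "batch-b"], ["n1", "n2"], 2, scores))
# {'n1': ['web', 'batch-b'], 'n2': ['cache', 'batch-a']}
```

The two batch jobs, which interfere most with each other, end up on different nodes. A production scheduler would treat this as a combinatorial optimization rather than a single greedy pass.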
Based on the framework above, Huawei has implemented an L2-level solution; the relevant features have been verified within Huawei and are being rolled out one after another. Important breakthroughs have been made at every technical level:
Cluster management:
Predictive scheduling: supports scheduling based on nodes' physical resource utilization [3], load-balancing scheduling, resource-preemption scheduling, and other features.
Feature modeling: a general-purpose application portrait modeling component has been designed and implemented; it automatically performs interference injection, metric collection, and model output.
Single-node management:
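A minimal sketch of scheduling on predicted physical utilization rather than on requested quotas: it forecasts each node's CPU utilization from recent samples with an exponentially weighted moving average (an assumption; the source does not say which predictor is used), skips nodes whose forecast plus the job's expected usage would exceed a threshold, and picks the least loaded remaining node. The threshold, smoothing factor, and sample data are hypothetical.

```python
from typing import Dict, List, Optional


def ewma(samples: List[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average; recent samples weigh more."""
    pred = samples[0]
    for s in samples[1:]:
        pred = alpha * s + (1 - alpha) * pred
    return pred


def pick_node(history: Dict[str, List[float]], request: float,
              threshold: float = 0.8, alpha: float = 0.5) -> Optional[str]:
    """history: node -> recent CPU utilization samples (0..1).
    request: expected extra utilization the new job adds to a node.
    Returns the node with the lowest predicted load under the threshold,
    or None if every node would be overloaded."""
    best: Optional[str] = None
    best_load = float("inf")
    for node, samples in history.items():
        predicted = ewma(samples, alpha) + request
        if predicted <= threshold and predicted < best_load:
            best, best_load = node, predicted
    return best


history = {"n1": [0.2, 0.6, 0.7], "n2": [0.5, 0.4, 0.3], "n3": [0.9, 0.9, 0.9]}
print(pick_node(history, request=0.2))  # n2
print(pick_node(history, request=0.5))  # None
```

Node n1 is rising and n2 is falling, so the forecast prefers n2 even though both are currently below the threshold; a real scheduler would combine such a score with the load-balancing and preemption policies mentioned above.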
QoS quantification: detects workload QoS in real time based on a quantitative model and controls interference sources in real time.
Topology layout: arranges workloads with dynamic affinity according to the hardware topology, improving overall performance without changing resource quotas.
Power control: higher resource utilization increases the risk of excessive whole-machine power consumption, so power changes must be monitored in real time and targeted power suppression applied.
L3/MB control: current hardware provides isolation of L3 cache and memory bandwidth, but software still needs to control it dynamically to balance interference control against resource utilization.
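The detect-and-control loop for interference sources can be sketched as a feedback controller. The Python below assumes a white-box online service whose QoS is an observed latency compared with a target, and treats the CPU quota granted to co-located offline jobs as the control knob. It uses additive-increase/multiplicative-decrease (AIMD), a standard scheme substituted here for illustration; how the quota is enforced (for example through cgroups) is left out, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OfflineQuotaController:
    """AIMD feedback loop on the CPU quota (in cores) left to offline jobs."""
    target_latency_ms: float    # QoS target of the online service
    min_quota: float            # floor so offline jobs are never fully starved
    max_quota: float            # ceiling set by the node's oversold capacity
    step: float = 1.0           # additive increase per healthy interval
    backoff: float = 0.5        # multiplicative decrease on a QoS violation
    quota: float = 0.0          # current offline quota

    def update(self, observed_latency_ms: float) -> float:
        if observed_latency_ms > self.target_latency_ms:
            # QoS violated: suppress the interference source quickly.
            self.quota = max(self.min_quota, self.quota * self.backoff)
        else:
            # QoS met: hand idle capacity back to offline jobs gradually.
            self.quota = min(self.max_quota, self.quota + self.step)
        return self.quota


ctrl = OfflineQuotaController(target_latency_ms=20, min_quota=1, max_quota=8, quota=4)
trace: List[float] = [ctrl.update(lat) for lat in [10, 12, 25, 15, 30]]
print(trace)  # [5.0, 6.0, 3.0, 4.0, 2.0]
```

Backing off fast and recovering slowly favors the online service's QoS over offline throughput, which matches the priority order described for mixed deployment.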
Resource isolation layer:
Hierarchical preemption: provides priority-based hierarchical preemption for resources such as CPU, memory, and I/O/network, with industry-leading capabilities such as absolute CPU suppression of low-priority tasks (avoiding priority inversion) and network preemption within 100 ms.
Flexible scheduling: supports elastic scheduling capabilities such as tidal affinity and CPU Burst.
We will also open these fine-grained features in openEuler. You are welcome to use them and to discuss them in the community.
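CPU Burst lets a group of tasks briefly exceed its per-period CPU quota by spending quota it left unused in earlier periods, which reduces throttling of short spikes without raising the average limit. Upstream Linux offers this for CFS bandwidth control (cgroup v2 `cpu.max.burst`). Assuming openEuler's feature behaves similarly, the Python below is a simplified single-queue model of that accounting, not the kernel implementation: each period refills the budget by the quota, and unused budget carries over up to quota + burst.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BurstBudget:
    """Simplified CFS-bandwidth-with-burst model (times in ms per period)."""
    quota: float        # CPU time granted every period
    burst: float = 0.0  # unused quota that may be saved for later periods
    runtime: float = field(init=False)

    def __post_init__(self) -> None:
        self.runtime = self.quota  # start with one period's worth of budget

    def run_period(self, demand: float) -> Tuple[float, float]:
        """Return (CPU time granted, CPU time throttled) for one period."""
        used = min(demand, self.runtime)
        self.runtime -= used
        # Refill for the next period; savings are capped at quota + burst.
        self.runtime = min(self.runtime + self.quota, self.quota + self.burst)
        return used, demand - used


def throttled_total(quota: float, burst: float, demands: List[float]) -> float:
    budget = BurstBudget(quota, burst)
    return sum(budget.run_period(d)[1] for d in demands)


demands = [10, 10, 120, 50]  # a short spike in the third period
print(throttled_total(50, 0, demands))   # 70 (no burst)
print(throttled_total(50, 50, demands))  # 20 (with burst)
```

With the same average quota, the quiet first two periods bank enough budget to absorb most of the spike, which is why burst scheduling suits latency-sensitive services whose load arrives in bursts.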
Future plans
We have verified and deployed the hybrid deployment solution in some internal scenarios, reaching the L2 stage. In the short term, we still need to break through QoS-guarantee technologies for black-box workloads to enter the L3 stage; only at L3 can more users benefit. In the long term, beyond container scenarios, more workload types and resource types need better utilization, which calls for further technical breakthroughs in cluster scheduling, the OS, and other layers.
This article has briefly introduced our thinking on technical solutions for improving resource utilization in the cloud. Follow-up articles will introduce the isolation, feedback control, and QoS-aware scheduling technologies involved in detail. Stay tuned!
References
[1] Canalys. Global cloud services spend hits US$55.9 billion in Q1 2022.
[2] Wang Kangjin, Jia Tong, Li Ying. A survey of job scheduling and resource management technologies for online/offline co-location. Journal of Software, 2020, 31(10): 3100-3119.
[3] Volcano: a co-location management platform for online and offline jobs, providing intelligent resource management and job scheduling.
Join us
The resource utilization technologies described in this article are jointly developed by the Cloud Native SIG, High Performance Network SIG, Kernel SIG, OpenStack SIG, and Virt SIG, and their source code will be gradually open-sourced in the openEuler community. If you are interested in these technologies, you are welcome to follow and join. You can add the community assistant on WeChat to join the corresponding SIG WeChat group.
This article was originally published on the openEuler WeChat official account (openEulercommunity).