Ways to improve openEuler resource utilization 01: Introduction
2022-07-07 19:52:00 【openEuler】
Background
According to a report released by Canalys [1], global spending on cloud infrastructure services grew 34% year on year in the first quarter of 2022, reaching US$55.9 billion. However, several studies have shown that the average CPU utilization of user clusters in data centers worldwide is currently below 20%, which represents a huge waste of resources. Improving data center resource utilization is therefore an important problem that urgently needs to be solved [2].
Causes
The main reason for low resource utilization is a mismatch between tasks and resource allocation. This mismatch takes many forms, for example:
Scheduling systems are isolated per cluster: different jobs use different scheduling systems, so jobs cannot move across a wider pool of clusters and the idle resources of other clusters cannot be used effectively.
Lack of diversity in task types: jobs within a cluster are highly homogeneous and concentrate on a subset of resources, so that subset is heavily utilized while the remaining resources sit idle.
Lack of priority-based tiered management: either there are no low-priority jobs to fill idle resources, or there are low-priority jobs but the cluster has no tiered control capability, which leads to over-allocation of resources.
Single resource type within the cluster: the resources inside a cluster share one overall specification and cannot be scaled flexibly to match the business's changing demand for each resource type, which leads to over-allocation of some resources.
Overall, the problem stems from a lack of diversity in tasks and resources within the cluster, together with the scheduler's weak ability to manage diverse tasks and resources.
Solution approach
Deploy different types of jobs together to improve resource utilization in both space and time.
Resource overcommitment (space-division overcommitment): the idle resources of online services are overcommitted to offline jobs, improving overall resource utilization.
Off-peak use (time-division overcommitment): the idle periods of online services are filled with offline jobs, reducing resource idling.
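The two forms of overselling (overcommitment) described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OnlineService` record, the `safety_margin` and `threshold` parameters, and the sample numbers are hypothetical and do not come from the openEuler implementation.

```python
from dataclasses import dataclass


@dataclass
class OnlineService:
    """Hypothetical record for one online service (illustrative only)."""
    name: str
    requested_cores: float   # cores reserved for the service
    peak_used_cores: float   # highest usage observed in the sampling window


def reclaimable_cores(services, safety_margin=0.1):
    """Space-division view: cores that could be lent to offline jobs.

    For each service, keep its observed peak plus a safety margin (a fraction
    of the request) and treat the rest of the request as reclaimable.
    """
    total = 0.0
    for s in services:
        reserve = s.peak_used_cores + safety_margin * s.requested_cores
        total += max(0.0, s.requested_cores - reserve)
    return total


def idle_hours(hourly_utilization, threshold=0.2):
    """Time-division view: hours in which online utilization is below threshold."""
    return [hour for hour, u in enumerate(hourly_utilization) if u < threshold]


if __name__ == "__main__":
    services = [OnlineService("web", 16, 4), OnlineService("db", 8, 7.5)]
    print(reclaimable_cores(services))
    print(idle_hours([0.1, 0.5, 0.15, 0.2]))
```

The first function answers "how much can be lent out right now", the second "when can it be lent out". A real system would have to derive both from continuously collected metrics rather than fixed numbers.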
Technical challenges
With both space-division and time-division overcommitment, resources can run short when workloads peak at the same time, which degrades the quality of service (QoS) of some services. Improving resource utilization while keeping service QoS intact is the key technical challenge.
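The shortage that appears when peaks coincide can be made concrete with a small sketch. The function names, the usage series and the `headroom` parameter below are assumptions made for illustration; they are not part of any openEuler interface.

```python
def co_peak_intervals(online_usage, offline_usage, capacity):
    """Return indices of sampling intervals where combined demand exceeds capacity."""
    if len(online_usage) != len(offline_usage):
        raise ValueError("usage series must have the same length")
    return [
        i
        for i, (on, off) in enumerate(zip(online_usage, offline_usage))
        if on + off > capacity
    ]


def offline_budget(online_usage, capacity, headroom=0.0):
    """Cores offline jobs may use in each interval without exceeding capacity.

    `headroom` is a hypothetical buffer kept free for sudden online bursts.
    """
    return [max(0.0, capacity - headroom - on) for on in online_usage]


if __name__ == "__main__":
    online = [10, 30, 50]    # cores used by online services per interval
    offline = [20, 40, 20]   # cores demanded by offline jobs per interval
    print(co_peak_intervals(online, offline, capacity=64))
    print(offline_budget(online, capacity=64, headroom=4))
```

Intervals returned by `co_peak_intervals` are exactly the moments at which some workload must be throttled or preempted, which is why the remainder of the article focuses on isolation and QoS-aware control.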
In addition, the diversity and complexity of cloud services make it even harder to guarantee quality of service:
On the one hand, in terms of how observable the workload's characteristics are, applications can be divided into white-box, black-box and gray-box applications. White-box applications are visible to the system, which can obtain their QoS metrics in real time; black-box services are opaque to the system, which does not even know what the application's QoS metric is; applications whose observability lies between the two are called gray-box applications. Accurately quantifying the service quality of black-box services and locating interference sources is the technical challenge in generalizing this capability, and it is also an active research topic in the industry.
On the other hand, in terms of workload complexity, applications can be divided into lightweight applications (such as microservices and function compute), traditional applications (such as monolithic applications) and very large applications (such as HPC/AI). Technical problems such as full-stack collaborative awareness must be overcome to build a general, unified system.
Solution overview
Following the cause analysis above, deploying and scheduling diverse services/workloads and resources together can significantly improve the flexibility of resource allocation and thereby raise resource utilization. But this also brings greater technical challenges: the more services/workloads and resource types are managed, the more complex the dependencies become and the more complex the system's multi-objective optimization requirements. On this basis, we divide the evolution into the following stages:
L0: Independent deployment: each cluster has its own technology stack and its own resource pool; cluster utilization is low (<20%).
L1: Shared deployment: a unified technology stack enlarges the cluster; services of a single type share resources; utilization is improved through dynamic elasticity; cluster resource utilization is still fairly low (<30%).
Related technologies: technology stack unification, containerization, elastic scaling
L2: "Hybrid deployment" (colocation): a unified technology stack enlarges the cluster; services of multiple types share resources; utilization is improved through overcommitment and isolation technology; cluster resource utilization is relatively high (>40%).
Related technologies: resource overcommitment, tiered resource isolation, feedback control
L3: "Generalized hybrid deployment": the service types covered by hybrid deployment are generalized, supporting shared-resource deployment of thousands of black-box services on the public cloud; awareness based on QoS quantification guarantees the service quality of key services.
Related technologies: QoS quantification/localization, precise control, QoS-aware scheduling
L4: "Converged deployment": building on workload type generalization, containers, virtual machines, lightweight runtimes and other diverse workloads are converged and combined with complex scenarios such as HPC/AI and heterogeneous resource awareness, raising the overall utilization of all resource types.
Related technologies: heterogeneous-resource-aware scheduling, unified scheduling
Of these, L1~L2 focus mainly on raising cluster CPU utilization, while L3~L4 generalize the utilization-improvement techniques.
The industry's current L2-level exploration on internal services has significantly raised the overall utilization of clusters and even entire data centers, but generalization to the public cloud is still at an early stage and is not yet commercially available.
Anticipating the trend toward generalized and converged deployment, we have built a solution for continuously improving resource utilization, as shown in the figure below:
To achieve the best deployment result, control and optimization are needed at several levels of task execution:
"Cluster management": at the scheduling level, services that interfere strongly with each other's performance are deployed apart, and unnecessary interference is reduced by optimizing task combinations.
"Single-node management": at the node level, resource contention is detected in real time and its impact on key services is eliminated.
"Resource isolation layer": tasks are graded by priority and controlled accordingly, guaranteeing the resource needs of high-priority tasks.
Huawei has already implemented an L2-level solution based on this framework; the relevant features have been validated inside Huawei and are being rolled out one after another. Important technical breakthroughs have been made at every level:
"Cluster management":
Predictive scheduling: supports predictive scheduling based on the physical resource utilization of nodes [3], load-balancing scheduling, resource-preemption scheduling and other features.
Feature modeling: a general application-profiling and modeling component has been designed and implemented; it automates interference injection, metric collection and model output.
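As a rough illustration of scheduling on predicted physical utilization rather than on requested resources, the sketch below smooths recent utilization samples and places a job on the least-loaded node that still fits. The exponentially weighted moving average, the `limit` threshold and all names are assumptions chosen for the example; the article does not describe the actual prediction model or scheduler plugin.

```python
def predict_utilization(samples, alpha=0.5):
    """Exponentially weighted moving average of utilization samples (0..1).

    A stand-in predictor chosen for illustration; not the article's model.
    """
    if not samples:
        raise ValueError("need at least one sample")
    estimate = samples[0]
    for value in samples[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def pick_node(node_samples, job_demand, limit=0.8):
    """Choose the node with the lowest predicted utilization that fits the job.

    node_samples maps node name -> list of utilization samples (fraction of
    node capacity). job_demand is the job's expected utilization as a fraction
    of node capacity. Returns None when every node would exceed `limit`.
    """
    best_name, best_pred = None, None
    for name, samples in sorted(node_samples.items()):
        pred = predict_utilization(samples)
        if pred + job_demand > limit:
            continue  # placing the job here would overload the node
        if best_pred is None or pred < best_pred:
            best_name, best_pred = name, pred
    return best_name


if __name__ == "__main__":
    nodes = {"a": [0.2, 0.4], "b": [0.6, 0.8], "c": [0.1, 0.1]}
    print(pick_node(nodes, job_demand=0.2))
```

Scoring on measured utilization lets the scheduler place work on nodes whose requests are fully booked but whose real load is low, which is the situation overcommitment is meant to exploit.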
"Single-node management":
QoS quantification: detects service QoS in real time based on a quantitative model and controls interference sources in real time.
Topology orchestration: arranges service affinity dynamically according to the hardware topology, improving overall performance with resource quotas unchanged.
Power control: higher resource utilization raises the risk of excessive whole-machine power consumption, so power changes must be monitored in real time and suppressed in a targeted way.
L3/MB control: current hardware provides L3 cache and memory bandwidth isolation, but dynamic software control is still needed to balance interference control against resource utilization.
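The node-level loop of detecting QoS and then controlling the interference source is a feedback controller. The sketch below uses an additive-increase, multiplicative-decrease rule on the CPU quota of offline jobs; this rule, the latency metric and every parameter value are assumptions for illustration, since the article does not specify the control law that is actually used.

```python
def next_offline_quota(current_quota, latency_ms, target_ms,
                       min_quota=0.0, max_quota=32.0,
                       step_up=1.0, backoff=0.5):
    """One step of a feedback loop over the CPU quota (in cores) of offline jobs.

    If the online service misses its latency target, cut the offline quota
    multiplicatively; otherwise grow it additively to reclaim idle capacity.
    """
    if latency_ms > target_ms:
        quota = current_quota * backoff
    else:
        quota = current_quota + step_up
    return min(max_quota, max(min_quota, quota))


def run_controller(latencies_ms, target_ms, initial_quota=8.0):
    """Apply the controller to a sequence of latency samples; return quota history."""
    history = []
    quota = initial_quota
    for latency in latencies_ms:
        quota = next_offline_quota(quota, latency, target_ms)
        history.append(quota)
    return history


if __name__ == "__main__":
    # The third sample violates the 10 ms target, so the quota is halved once.
    print(run_controller([5, 5, 20, 5], target_ms=10))
```

On Linux the resulting quota could be applied through a mechanism such as the cgroup v2 `cpu.max` file; that step is omitted here. The same loop shape applies to other resources named above, such as L3 cache ways or memory bandwidth, with a different actuator.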
"Resource isolation layer":
Tiered preemption: provides tiered, priority-based preemption for resources such as CPU, memory and IO/network; its absolute CPU suppression (which avoids priority inversion) and network preemption latency (<100 ms) are industry-leading.
Elastic scheduling: supports elastic scheduling capabilities such as tidal affinity and CPU Burst.
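The idea behind priority-based preemption can be shown with a toy allocator that always satisfies higher-priority demand first. It is a simplified model under stated assumptions: the task tuples and the strict-priority rule are hypothetical, and the real mechanism lives in the kernel scheduler and resource controllers rather than in user-space code like this.

```python
def allocate_by_priority(capacity, tasks):
    """Allocate CPU cores strictly by priority.

    tasks is a list of (name, priority, demand) tuples; a larger priority
    value means more important. Higher-priority demand is satisfied first,
    so a lower-priority task never holds cores that a higher-priority task
    is asking for.
    """
    allocation = {}
    remaining = capacity
    for name, _priority, demand in sorted(tasks, key=lambda t: -t[1]):
        granted = min(demand, remaining)
        allocation[name] = granted
        remaining -= granted
    return allocation


if __name__ == "__main__":
    quiet = [("batch", 0, 6), ("web", 2, 4), ("cache", 1, 3)]
    busy = [("batch", 0, 6), ("web", 2, 7), ("cache", 1, 3)]
    print(allocate_by_priority(8, quiet))
    print(allocate_by_priority(8, busy))
```

Re-running the allocation when the online demand rises shows the preemption effect: the offline `batch` task loses its cores as soon as `web` needs them.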
We will also open up the fine-grained features above in openEuler. Please try them out and join the discussion in the community.
Future plans
So far we have validated and deployed the hybrid deployment solution in some internal scenarios, reaching the L2 stage. In the short term, we still need to break through the technology for guaranteeing the QoS of black-box services in order to enter the L3 stage; only at L3 can more users benefit. In the long term, beyond container scenarios, more workload types and resource types need better utilization, which requires further technical breakthroughs at the cluster scheduling, OS and other levels.
This article has briefly introduced our thinking on technical solutions for improving resource utilization in the cloud. Follow-up articles will cover the isolation, feedback control and aware scheduling technologies in detail. Stay tuned!
References
[1] Global cloud services spend hits US$55.9 billion in Q1 2022
[2] Wang Kangjin, Jia Tong, Li Ying. A survey of job scheduling and resource management technologies for online-offline colocation. Journal of Software, 2020, 31(10): 3100-3119
[3] Volcano: an online-offline job colocation management platform for intelligent resource management and job scheduling
Join us
The resource utilization technologies mentioned in this article are developed jointly by the Cloud Native SIG, High Performance Network SIG, Kernel SIG, OpenStack SIG and Virt SIG, and the source code will be open-sourced step by step in the openEuler community. If you are interested in these technologies, you are welcome to follow and join. You can add the assistant on WeChat to join the corresponding SIG WeChat group.
This article is from the WeChat official account openEuler (openEulercommunity).