Cluster job scheduling systems for HPC scenarios: LSF/SGE/Slurm/PBS
2022-07-07 10:34:00 【Entering】
There are currently four mainstream scheduler families on the market: LSF, SGE, Slurm, and PBS.
Industries tend to prefer different schedulers, partly out of habit and partly because each scheduler supports their applications to a different degree. For example, universities and supercomputing centers often use Slurm, semiconductor companies most commonly use LSF and SGE, and industrial manufacturing tends to use PBS.
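At the user level, the most visible difference between the four families is the job submission command. The sketch below builds a submission command line for each family. The commands (`bsub`, `qsub`, `sbatch`) and their job-name options are the standard ones, but the helper name `build_submit_command` and the script name are hypothetical, and a real site would normally add queue and resource options on top.

```python
from typing import List


def build_submit_command(scheduler: str, job_name: str, script: str) -> List[str]:
    """Return an argv list that submits `script` under the given scheduler family.

    Illustrative only: queue, resource, and accounting options are omitted.
    """
    key = scheduler.lower()
    if key == "lsf":
        # LSF submits with bsub; -J sets the job name.
        return ["bsub", "-J", job_name, script]
    if key in ("sge", "pbs"):
        # SGE and PBS both submit with qsub; -N sets the job name.
        return ["qsub", "-N", job_name, script]
    if key == "slurm":
        # Slurm submits batch scripts with sbatch.
        return ["sbatch", f"--job-name={job_name}", script]
    raise ValueError(f"unknown scheduler family: {scheduler!r}")


if __name__ == "__main__":
    for family in ("LSF", "SGE", "Slurm", "PBS"):
        print(family, " ".join(build_submit_command(family, "demo", "run.sh")))
```

SGE and PBS share the `qsub` command name, but their resource request syntax differs, so scripts are generally not portable between them without changes.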
The LSF family
Spectrum LSF, Platform LSF, OpenLava
There are three main schedulers based on LSF (Load Sharing Facility): Spectrum LSF, Platform LSF, and OpenLava.
LSF originally grew out of the Utopia system developed at the University of Toronto.
In 2007, Platform Computing open-sourced Platform Lava, a simplified version based on an earlier release of LSF.
That open-source project ended in 2011 and was succeeded by OpenLava.
In 2011, Platform employee David Bigagli created OpenLava 1.0 from code derived from Platform Lava. In 2014, some Platform employees founded Teraproc to provide development and commercial support for OpenLava. In 2016, IBM sued Teraproc over the LSF copyright. IBM won the case in 2018, and OpenLava was barred from further use.

After the Platform Lava open-source project was discontinued in 2011, IBM acquired Platform Computing in January 2012. Spectrum LSF is the commercial version IBM released after the acquisition. It is currently at version 10.1.0, supports both Linux and Windows, scales to more than 6,000 nodes, and has commercial support available in China.
Platform LSF is the earlier LSF product line and, like Spectrum LSF, belongs to IBM. Its current version is 9.1.3. It appears to have stopped receiving feature updates and to be in maintenance mode.


Of the three schedulers, only Spectrum LSF supports cluster auto-scaling (Auto-Scale). It can also burst to the cloud through the LSF resource connector. Supported cloud vendors include AWS, Azure, and Google Cloud.
The SGE family
UGE, SGE
Schedulers based on SGE (Sun Grid Engine) include UGE (Univa Grid Engine) and SGE (Son of Grid Engine).
Grid Engine was released as commercial software in 1993, successively under the names CODINE (Computing in Distributed Networked Environments) and GRD (Global Resource Director). In 1999 it was brought to market by Genias Software and then acquired by Gridware. After Sun acquired it in 2000, it was officially renamed Sun Grid Engine, and an open-source version was released in 2001.
After Oracle's acquisition in 2010, it was renamed Oracle Grid Engine and became closed source, with no source code provided. Users were no longer allowed to modify the original open-source project repository.
In response, the Grid Engine community started the open-source Son of Grid Engine (SGE) project. Its last release was 8.1.9 in 2016. Because of copyright risks, it has not been maintained or updated for a long time.

In 2013, Univa acquired Oracle Grid Engine and became the sole commercial provider with **UGE (Univa Grid Engine)**. The latest UGE version is 8.6.15. It supports both Linux and Windows. No information was found about commercial support in China.
In September 2020, Altair acquired Univa.

Users can move workloads to the cloud with Univa's Navops Launch product, which supports both UGE and Slurm clusters. Navops Launch supports cloud vendors such as AWS, Azure, and Google Cloud, and provides cloud cost monitoring and cluster auto-scaling (Auto-Scale).
Slurm: the only purely open-source family of the four
Slurm stands for Simple Linux Utility for Resource Management. Early development was carried out by Lawrence Livermore National Laboratory, SchedMD, Linux NetworX, Hewlett-Packard, and Groupe Bull, and was inspired by the closed-source Quadrics RMS.
The latest Slurm version is 20.02. It is currently maintained jointly by the community and SchedMD, remains free and open source, and has commercial support from SchedMD. It supports only Linux and scales to more than 120,000 nodes.
Slurm offers high fault tolerance, support for heterogeneous resources, and high scalability, and can accept more than 1,000 job submissions per second. Because it is an open, highly configurable framework with more than 100 plugins, it is widely applicable.

About 60% of the world's TOP500 supercomputing centers and very large clusters (including China's Tianhe-2) use Slurm as their scheduling system. Our own TOP500 effort also used Slurm to schedule resources in the cloud.
We support automatic cluster scaling and cloud cost monitoring on Slurm, across cloud vendors including AWS, Alibaba Cloud, Azure, Tencent Cloud, Huawei Cloud, and Google Cloud.
fastone's Auto-Scale feature automatically monitors the number of jobs users submit and their resource requirements, and starts the required compute resources dynamically on demand, which lowers cost while improving efficiency.
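The scaling decision described above can be sketched roughly as follows. This is not fastone's actual algorithm, which the article does not describe. It is a minimal illustration that assumes every node has the same core count and that demand can be summed across jobs. `PendingJob`, `nodes_to_start`, and all parameters are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PendingJob:
    job_id: str
    cores: int  # cores requested by the job


def nodes_to_start(
    pending: List[PendingJob],
    cores_per_node: int,
    idle_nodes: int,
    max_new_nodes: int,
) -> int:
    """Return how many extra nodes to start for the queued demand.

    Simplification: total demand is divided evenly over identical nodes,
    ignoring per-job placement constraints.
    """
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    demanded_cores = sum(job.cores for job in pending)
    needed_nodes = math.ceil(demanded_cores / cores_per_node)
    shortfall = needed_nodes - idle_nodes
    # Never return a negative count, and respect the budget cap.
    return max(0, min(shortfall, max_new_nodes))


if __name__ == "__main__":
    queue = [PendingJob("a", 16), PendingJob("b", 40)]
    print(nodes_to_start(queue, cores_per_node=32, idle_nodes=1, max_new_nodes=10))
```

The `max_new_nodes` cap stands in for a cost limit. A real autoscaler would also shut nodes down after they have been idle for a while, which is where most of the cost saving comes from.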
The PBS family
OpenPBS, PBS Pro, Moab/TORQUE
Schedulers based on PBS (Portable Batch System) include OpenPBS, PBS Pro, and Moab/TORQUE.
PBS was originally a job scheduling system that MRJ Technology Solutions began developing for NASA in June 1991. MRJ was acquired by Veridian in the late 1990s. In 2003, Altair acquired the PBS technology and intellectual property from Veridian.
PBS Pro is the commercial version that Altair offers as part of PBS Works. It provides a graphical interface and scales to more than 50,000 nodes.

In 2016, Altair made an open-source-licensed version available based on PBS Pro. Together with the original open-source version that MRJ released in 1998, this is roughly what OpenPBS is today. Compared with the Pro version it has many more restrictions, but both support Linux and Windows.

**Moab/TORQUE together provide the functionality of a complete scheduler, and both now belong to the same company, Adaptive Computing.** In the mid-1990s, David Jackson of MHPCC developed Maui. He later founded Adaptive Computing.
Moab is a branch of OpenPBS maintained by Adaptive Computing (formerly Cluster Resources, the company that developed the Maui Cluster Scheduler), released in 2003. The project was originally free and open source. It later became the commercial software Moab and is no longer free.
TORQUE (Terascale Open-source Resource and QUEue Manager) was likewise free and open-source software in its early days, but it has not been open source since June 2018.
Both support only Linux, provide a graphical interface, and scale to a few thousand nodes.

For cloud services, PBS Pro can burst from on-premises to the cloud and auto-scale clusters through the Altair Control product. Supported cloud vendors include AWS, Azure, and Google Cloud.
Moab/TORQUE can extend on-premises clusters to the cloud through the NODUS Cloud OS product, which supports TORQUE or Slurm clusters and auto-scaling. Supported cloud vendors include AWS, Azure, Google Cloud, and Huawei Cloud, and cloud cost monitoring is provided through the Account Manager product.