Cluster task scheduling systems for HPC scenarios: LSF/SGE/Slurm/PBS
2022-07-07 10:34:00 【Entering】
There are currently four mainstream scheduler families on the market: LSF, SGE, Slurm, and PBS.
Because usage habits differ across industries and schedulers differ in how well they support particular applications, preferences vary. Universities and supercomputing centers often use Slurm, semiconductor companies most commonly use LSF and SGE, and industrial manufacturing tends to use PBS.
The LSF family
Spectrum LSF, Platform LSF, OpenLava
There are three main schedulers based on LSF (Load Sharing Facility): Spectrum LSF, Platform LSF, and OpenLava.
LSF grew out of the Utopia system developed at the University of Toronto.
In 2007, Platform Computing open-sourced Platform Lava, a simplified version based on an earlier LSF release.
That open source project ended in 2011 and was taken over by OpenLava.
In 2011, Platform employee David Bigagli created OpenLava 1.0 from code derived from Platform Lava. In 2014, some Platform employees founded Teraproc to provide development and commercial support for OpenLava. In 2016 IBM sued Teraproc over LSF copyright. IBM won the case in 2018, and OpenLava was discontinued.
After the Platform Lava open source project was suspended in 2011, IBM acquired Platform Computing in January 2012. Spectrum LSF is the commercial version IBM released after the acquisition. It is currently at version 10.1.0, supports both Linux and Windows, scales beyond 6,000 nodes, and has commercial support available in China.
Platform LSF is the earlier version of LSF and, like Spectrum LSF, belongs to IBM. Its current version is 9.1.3. It appears to have stopped receiving feature updates and to be in maintenance mode.
Of the three schedulers, only Spectrum LSF supports cluster auto-scaling (Auto-Scale). It can also burst to the cloud through the LSF resource connector. Supported cloud vendors include AWS, Azure, and Google Cloud.
The SGE family
UGE, SGE
Schedulers based on SGE (Sun Grid Engine) include UGE (Univa Grid Engine) and SGE (Son of Grid Engine).
In 1993, Grid Engine was released as commercial software, first under the names CODINE (Computing in Distributed Networked Environments) and GRD (Global Resource Director). In 1999 it was first brought to market by Genias Software and was then acquired by Gridware. After Sun acquired it in 2000, it was officially renamed Sun Grid Engine, and an open source version was released in 2001.
After Oracle's acquisition in 2010, it was renamed Oracle Grid Engine and became closed source, with no source code provided. Users were no longer allowed to modify the original open source project repository.
In response, the Grid Engine community started the open source Son of Grid Engine (SGE) project. Its last release was 8.1.9 in 2016. Because of copyright risks, SGE has not been maintained or updated for a long time.
In 2013 Univa acquired Oracle Grid Engine and became the only commercial provider, with **UGE (Univa Grid Engine)**. The latest UGE version is 8.6.15. It supports both Linux and Windows. No information is available about commercial support in China.
In September 2020, Altair acquired Univa.
Users can move workloads to the cloud with Univa's Navops Launch product, which supports both UGE and Slurm clusters. Navops Launch supports cloud vendors such as AWS, Azure, and Google Cloud, and provides cloud cost monitoring and cluster auto-scaling (Auto-Scale).
Slurm: the only purely open source family of the four
Slurm stands for Simple Linux Utility for Resource Management. It was initially developed by Lawrence Livermore National Laboratory, SchedMD, Linux NetworX, Hewlett-Packard, and Groupe Bull, and was inspired by the closed source Quadrics RMS.
The latest Slurm version is 20.02. It is jointly maintained by the community and SchedMD, remains free and open source, and has commercial support from SchedMD. It supports only Linux and scales beyond 120,000 nodes.
Slurm offers high fault tolerance, support for heterogeneous resources, and high scalability. It can accept more than 1,000 job submissions per second. Because it is an open, highly configurable framework with more than 100 plugins, it is widely applicable.
About 60% of the world's TOP500 supercomputing centers and very large clusters (including China's Tianhe-2) use Slurm as their scheduling system. Our own TOP500 run also used Slurm to schedule resources on the cloud.
We support cluster auto-scaling and cloud cost monitoring on Slurm, across cloud vendors including AWS, Alibaba Cloud, Azure, Tencent Cloud, Huawei Cloud, and Google Cloud.
fastone's Auto-Scale feature automatically monitors the number of jobs users submit and their resource demand, and dynamically starts the required compute resources on demand, reducing cost while improving efficiency.
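The scale-up decision described above can be sketched roughly as follows. This is a minimal illustration and not fastone's actual implementation. The `ClusterState` fields and the `nodes_to_add` function are hypothetical names introduced here, and the sketch assumes that demand is measured in CPU cores and that all cloud nodes are the same size.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of the queue and the cluster (all names are assumptions)."""
    pending_cores: int   # cores requested by jobs still waiting in the queue
    idle_cores: int      # cores currently free on nodes that are already running
    running_nodes: int   # nodes currently powered on
    cores_per_node: int  # size of each cloud node (assumed uniform)
    max_nodes: int       # upper bound on cluster size, e.g. a budget limit


def nodes_to_add(state: ClusterState) -> int:
    """Return how many extra nodes to start so that queued jobs can run."""
    # Demand that cannot be served by cores that are already idle.
    shortfall = state.pending_cores - state.idle_cores
    if shortfall <= 0:
        return 0
    # Round up: a partially used node still has to be started.
    wanted = math.ceil(shortfall / state.cores_per_node)
    # Never grow beyond the configured cluster limit.
    headroom = max(state.max_nodes - state.running_nodes, 0)
    return min(wanted, headroom)


if __name__ == "__main__":
    state = ClusterState(pending_cores=100, idle_cores=20,
                         running_nodes=2, cores_per_node=16, max_nodes=10)
    print(nodes_to_add(state))
```

A real auto-scaler would run a check like this periodically against the scheduler's queue, and would also shut down nodes that stay idle, which is where the cost saving comes from.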
The PBS family
OpenPBS, PBS Pro, Moab/TORQUE
Schedulers based on PBS (Portable Batch System) include OpenPBS, PBS Pro, and Moab/TORQUE.
PBS was originally a job scheduling system that MRJ Technology Solutions began developing for NASA in June 1991. MRJ was acquired by Veridian in the late 1990s. In 2003, Altair acquired Veridian and obtained the PBS technology and intellectual property.
PBS Pro is the commercial version that Altair offers as part of PBS Works. It supports a graphical interface and scales beyond 50,000 nodes.
In 2016 Altair released an open source licensed version based on PBS Pro. Together with the original open source version that MRJ released in 1998, it is roughly what is now OpenPBS. Compared with the Pro version it has many more restrictions, but both support Linux and Windows.
**Moab/TORQUE together provide the functionality of a complete scheduler, and both now belong to the same company, Adaptive Computing.** In the mid-1990s, David Jackson of MHPCC developed Maui, and he later founded Adaptive Computing.
Moab comes from Adaptive Computing (formerly Cluster Resources, the company that developed the Maui Cluster Scheduler and maintains a branch of OpenPBS) and was released in 2003. The project was originally free and open source. It later became the commercial product Moab and is no longer free.
TORQUE (Terascale Open-source Resource and QUEue Manager) was also free open source software in its early days, but since June 2018 it is no longer open source.
Both support only Linux, provide a graphical interface, and scale to thousands of nodes.
For cloud services, PBS Pro can burst from on-premises to multiple clouds and auto-scale clusters (Auto-Scale) through the Altair Control product. Supported cloud vendors include AWS, Azure, and Google Cloud.
Moab/TORQUE can extend on-premises clusters to the cloud through the NODUS Cloud OS product, which supports TORQUE or Slurm clusters and auto-scaling. Supported cloud vendors include AWS, Azure, Google Cloud, and Huawei Cloud, and cloud cost monitoring is provided by the Account Manager product.