NHSE ReviewTM: Comments
· Archive
· Search
Packages                            Vendor                      Version
--------                            ------                      -------
Codine - Computing in Distributed   GENIAS GmbH, Germany        3.3.2
  Network Environment
Connect:Queue                       Sterling Corp., USA         Oct 95
Load Balancer                       Unison Software, USA        Jun 95
LoadLeveler                         IBM Corp., USA              1.2.1
LSF - Load Sharing Facility         Platform Computing, Canada  2.1
NQE - Network Queuing Environment   Craysoft Corp., USA         2.0
Task Broker                         Hewlett-Packard Corp.       1.2
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Commercial    Fully supported commercial package by Genias GmbH
Cost                             Yes           There are academic and commercial rates.
User Support                     Yes           Telephone/Email - seem very responsive.
Heterogeneous                    Yes
Platforms                        Most Major    Convex, Cray, DEC, HP, IBM, SGI and Sun
Operating Systems                Most Major    ConvexOS, UniCOS, OSF/1, Ultrix, HP-UX, AIX, Irix, SunOS and Solaris
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes
Parallel Support                 Yes           PVM and Express
Queue Types                      Yes
Dispatching policy               Yes           Configurable.
Impact on Cluster Users          Yes           Can be set up to minimise impact
Impact on Cluster w/s            Yes           If process Migration and checkpointing used
Load Balancing                   Yes           No details.
Check Pointing                   Yes           Optional - Relink code with Codine libraries
Process Migration                Yes           Optional
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes           BSD Signals supported
Resource Administration          Yes           Yes, via a master scheduler
Job Runtime Limits               Yes
Forked Child Management          No            No migration or checkpointing of these processes.
Job Scheduling Priority          Yes
Process Management               Yes           Runtime limits and exclusive CPU usage
Job Scheduling Control           Yes
GUI/Command-line                 Yes
Ease of Use                      Unknown
User Allocation of Job           Yes
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes           Dynamic
Dynamic Resource Pool            Yes
Single Point of Failure          Yes           Being worked upon.
Fault Tolerance                  Some          Master scheduler will reschedule a job
Security Issues                  Yes           Normal Unix - but seems to have been thought out.

Contact: Johannes Grawe
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Commercial    Sterling Software - Dallas office
Cost                             ?
User Support                     Yes           By Sterling
Heterogeneous                    Yes
Platforms                        Most Major    Data General, Digital, IBM, HP, SGI, Amdahl, Meiko, and Sun
Operating Systems                Most Major    DG/UX, OSF/1, Ultrix, Aix/ESA, HP-UX, Irix, UTS, SunOS/Solaris
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              No            Planned for the future but will redirect interactive i/o
Parallel Support                 Yes           PVM and Linda
Queue Types                      Yes           Multiple queue types
Dispatching policy               Yes           CPU/Memory/Queue based
Impact on w/s Owner              Yes           Owner has no control over w/s resources
Impact on Cluster w/s            Yes           CPU/Memory/Swap-space
Load Balancing                   Yes           CPU/Memory/Queue based
Check Pointing                   No            Planned for the future
Process Migration                No            Planned for the future
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    No
Resource Administration          Yes
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes           Need to configure whole machine
Job Scheduling Control           Yes
GUI/Command-line                 Both          GUI for user and administrator
Ease of Use                      Unknown
User Allocation of Job           Limited       User can only specify which queue
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes           Administrator can modify queues and their policies on the "fly"
Dynamic Resource Pool            Yes           Probably yes
Single Point of Failure          Minimised     Multiple master schedulers are run.
Fault Tolerance                  Yes           Scheduler will attempt to restart jobs after a failure
Security Issues                  Yes           Uses normal Unix security

[Editor's Note: CONNECT:Queue is no longer being actively promoted by Sterling Commerce (SC). Although SC is committed to supporting its existing CONNECT:Queue customers, it is attempting to upgrade them to, and to sell new prospects, a much more robust job scheduling and workload balancing package called JP1. JP1 is manufactured by Hitachi, and SC is presently the sole distributor of this product in the U.S.]
Contact: Lisa Middleton
Manager, Marketing Communications
Sterling Commerce
5215 North O'Conner Blvd, Suite 1500
Irving, TX 75039, USA
Tel: +1 (800) 700-5599 (sales)
Tel: +1 (800) 292-0104 (tech support)
Email: Lisa_Middleton@csg.stercomm.com
Systems Supported:
Data General - DG/UX
DEC - OSF/1 & Ultrix
IBM - Aix/ESA
HP - HP-UX
SGI - Irix
Amdahl - UTS
Sun - Solaris & SunOS
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Commercial    Unison Software Solution
Cost                             POA
User Support                     Yes           Telephone/Email
Heterogeneous                    Yes
Platforms                        Most Major    Sun, HP, IBM, DEC, SGI
Operating Systems                Most Major    Ultrix, HP-UX, Aix, Irix, SunOS, Solaris
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes
Parallel Support                 No
Queue Types                      Multiple
Dispatching policy               Multiple      Numerous factors can be taken into account
Impact on w/s Owner              Configurable  Users have rights over their w/s
Impact on Cluster w/s            Yes           CPU/Memory/Swap
Load Balancing                   Yes           Tries to distribute load evenly.
Check Pointing                   Yes
Process Migration                Yes
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes
Resource Administration          Yes           Numerous factors can be taken into account
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes           Exclusive CPU access available
Job Scheduling Control           Yes
GUI/Command-line                 Both
Ease of Use                      Unknown
User Allocation of Job           Yes           Submit to one of many queues
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes
Dynamic Resource Pool            Yes           Master scheduler checks resource table regularly
Single Point of Failure          Yes           New release will overcome this with multiple master schedulers
Fault Tolerance                  Partially     If machine crashes, scheduler will try to re-run job.
Security Issues                  Yes           Unix security

Contact:
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Commercial    IBM Kingston v 1.2.1
Cost                             POA
User Support                     Yes           IBM type - depends on level purchased
Heterogeneous                    Yes
Platforms                        Most Major    SP, RS/6000, Sun SPARC, SGI workstation, HP 9000
Operating Systems                Most Major    AIX 4.1.3, SunOS 4.1.2, Solaris 2.3, IRIX 5.2, HP-UX 9.01
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes           Only on SPx, planned for clusters in future release
Parallel Support                 Yes           Only on SPx, planned for clusters in future release - PVMe
Queue Types                      Multiple
Dispatching policy               Configurable
Impact on w/s Owner              Configurable  User determines when their machine is available
Impact on Cluster w/s            Yes           CPU/RAM/Swap
Load Balancing                   Yes           Based on fair and optimal use of resources
Check Pointing                   Yes           Need to relink with LL libraries
Process Migration                Yes           Need to relink with LL libraries
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes
Resource Administration          Yes           Administrator can configure this
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes           Can have exclusive CPU usage
Job Scheduling Control           Yes
GUI/Command-line                 Both          NQS compatible
Ease of Use                      Unknown
User Allocation of Job           Yes
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes
Dynamic Resource Pool            Yes
Single Point of Failure          Limited       Master schedulers - but does have checkpointing
Fault Tolerance                  Limited       Schedulers will try to re-run jobs.
Security Issues                  Yes           Normal Unix security

Contact: Kathy Lange
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Commercial    Platform Computing Corp.
Cost                             POA
User Support                     Yes
Heterogeneous                    Yes
Platforms                        Most Major    Convex, Digital, HP, IBM, SGI and Sun
Operating Systems                Most Major    ConvexOS, Ultrix, HP-UX, Aix, Irix, SCO, SunOS and Solaris
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes
Parallel Support                 Yes           PVM, P4, TCGMSG, DSM and Linda
Queue Types                      Multiple
Dispatching policy               Configurable  Satisfy job resource and system requirements
Impact on w/s Owner              Yes           LSF will migrate jobs on and off w/s
Impact on Cluster w/s            Yes           CPU/RAM/Swap
Load Balancing                   Yes           Tries to distribute work load evenly across cluster
Check Pointing                   No            Planned in future releases
Process Migration                Yes
Job Monitoring and Rescheduling  Yes           Migration of jobs
Suspension/Resumption of Jobs    Yes
Resource Administration          Yes
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes           Exclusive CPU usage possible
Job Scheduling Control           Yes
GUI/Command-line                 Both
Ease of Use                      Unknown
User Allocation of Job           Yes           Limited
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes
Dynamic Resource Pool            Yes?          Not sure but seems likely.
Single Point of Failure          No            Master scheduler re-elected by slave schedulers
Fault Tolerance                  Yes           Jobs restarted
Security Issues                  Yes           Normal Unix, trusted hosts and Kerberos support

Contact: Harrison Cheung
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Commercial    CraySoft Corp.
Cost                             Yes           $2875 per 10-user network - 27th Sep 95
User Support                     Yes           Telephone/Email, etc.
Heterogeneous                    Yes
Platforms                        Most Major    SPARC, IBM RS/6000, SGI, Digital Alpha, HP and Cray Research.
Operating Systems                Most Major    Solaris, SunOS, Aix, Irix, Dec-OSF/1, HP-UX & Unicos (latest)
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes
Parallel Support                 Yes           PVM
Queue Types                      Multiple      Batch and pipe queues
Dispatching policy               Yes           Configurable by administrators
Impact on w/s Owner              Configurable  w/s owner can configure impact
Impact on Cluster w/s            Yes           CPU/RAM/Swap
Load Balancing                   Yes
Check Pointing                   No            Possible on Cray if NQS available
Process Migration                No            Possible on Cray if NQS available
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes
Resource Administration          Yes
Job Runtime Limits               Yes
Forked Child Management          No            Only under UniCOS
Job Scheduling Priority          Yes
Process Management               Yes           Workstation can be configured to allow exclusive CPU
Job Scheduling Control           Yes
GUI/Command-line                 Both          WWW interface
Ease of Use                      Unknown
User Allocation of Job           Yes           To one of a number of queues
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes
Dynamic Resource Pool            Yes
Single Point of Failure          No            Client forwards jobs to master scheduler, multiple schedulers
Fault Tolerance                  Some          Jobs not sent to queues not responding
Security Issues                  High          Has US DoD B1 security rating, plus Cray Multi-level security

Contact: Dan Ferber
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Commercial
Cost                             Yes           Doc & Media = $315, 10 w/s License ~ $5k
User Support                     Yes           Telephone/Email - depends on how much you pay.
Heterogeneous                    No            Version supplied by HP only works on HP platforms.
Platforms                        One           HP - 3rd Party versions available.
Operating Systems                One           HP-UX
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Limited       X supported for I/O, stdin/stderr need special w/s configuration
Parallel Support                 No
Queue Types                      Many          Configurable Queues
Dispatching policy               Yes
Impact on w/s Owner              Yes           Can be achieved, but need to "edit" local configuration file.
Impact on Cluster w/s            Yes           CPU/RAM/Disk
Load Balancing                   Yes           Scheme where each w/s "bids" for job
Check Pointing                   No
Process Migration                No
Job Monitoring and Rescheduling  Yes & No      No automatic rescheduling of jobs
Suspension/Resumption of Jobs    Yes           Supports all BSD signals
Resource Administration          Yes           Local and global configuration files.
Job Runtime Limits               No            None built in
Forked Child Management          No            See above...
Job Scheduling Priority          Yes
Process Management               Yes           w/s can be configured to provide exclusive or shared access to jobs
Job Scheduling Control           Yes
GUI/Command-line                 Both          Motif based GUI
Ease of Use                      Unknown
User Allocation of Job           Yes
User Job Status Query            Yes
Job Statistics                   Yes           Admin and user statistics
Runtime Configuration            Yes           Need to "edit" configuration file on each w/s - static
Dynamic Resource Pool            Yes           Need to "edit" local configuration file to add/withdraw w/s.
Single Point of Failure          No            Schedulers distributed - jobs are not guaranteed to complete.
Fault Tolerance                  Yes           Jobs are not guaranteed to complete.
Security Issues                  Yes           Normal Unix

Contact: Terry Graff
Packages                            Vendor                              Version
--------                            ------                              -------
Batch                               UCSF, USA                           4.0
CCS - Computing Center Software     Paderborn, Germany                  ?
Condor                              University of Wisconsin, USA        5.5.3
DJM - Distributed Job Manager       Minnesota Supercomputing Center     ??
DQS 3.x                             Florida State University, USA       3.1
EASY                                Argonne National Lab, USA           1.0
far                                 University of Liverpool, UK         1.0
MDQS                                ARL, USA                            ??
Generic NQS                         University of Sheffield, UK         3.4
Portable Batch System               NASA Ames & LLNL, USA               1.1
PRM - Prospero Resource Manager     University of Southern California   1.0
QBATCH                              Vita Services Ltd., USA             ??
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Research      Unclear, IPR seems to belong to UCSF
Cost                             PD            GNU-type license
User Support                     Yes           Email author
Heterogeneous                    Yes
Platforms                        Many          IBM RS/6000, Sparc, MIPS, Alpha
Operating Systems                Many          AIX 3.x, IRIX 3.x, 4.x, & 5.x, SunOS 4.x and Ultrix 4.1
Additional Hardware/Software     No            NFS
Batch Jobs                       Yes
Interactive Support              Yes
Parallel Support                 No
Queue Types                      Yes           Multiple Queues
Dispatching policy               Configurable
Impact on w/s Owner              Controllable
Impact on Cluster w/s            Some          Uses w/s CPU, memory and swap space
Load Balancing                   Some          Load balanced queues
Check Pointing                   No
Process Migration                No
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes           Start/suspend Jobs
Resource Administration          Some
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Some          Unix "nice"
Job Scheduling Control           Yes
GUI/Command-line                 Yes           Based on Forms library, Job Status and removal.
Ease of Use                      Unknown
User Allocation of Job           Some          Control over Jobs on own workstation
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Some
Dynamic Resource Pool            Unknown
Single Point of Failure          Unknown       Multiple queues, but probably single administrative daemon
Fault Tolerance                  No            Not mentioned in documentation
Security Issues                  Some          Seems to be normal Unix

Contact: Scott R Presnell
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Research      Paderborn Center for Parallel Computing
Cost                             Unknown       Enquire
User Support                     Yes           Telephone/Email - Paderborn Center for Parallel Computing (Germany)
Heterogeneous                    No            Based around Parsytec Systems
Platforms                        Parsytec      Sparc, Parsytec GC
Operating Systems                Parix         SunOS, Parix
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes
Parallel Support                 Yes
Queue Types                      Yes
Dispatching policy               Yes           Static configuration, dynamically allocated by user.
Impact on w/s Owner              N/A
Impact on Cluster w/s            Yes           Resources, once allocated to a user may not be released (2)
Load Balancing                   No
Check Pointing                   No
Process Migration                No
Job Monitoring and Rescheduling  No            Dependent upon the user
Suspension/Resumption of Jobs    Yes
Resource Administration          Yes
Job Runtime Limits               Yes
Forked Child Management          N/A
Job Scheduling Priority          Yes           Administrator
Process Management               No
Job Scheduling Control           Yes           Administrator configures resources
GUI/Command-line                 Command-line
Ease of Use                      Moderate
User Allocation of Job           Yes
User Job Status Query            Yes
Job Statistics                   Some          Not extensive
Runtime Configuration            Yes           Dynamically configurable
Dynamic Resource Pool            Yes           Resources dynamically configured.
Single Point of Failure          Yes           Port and queues managers
Fault Tolerance                  No            Jobs need to be re-submitted
Security Issues                  Yes           Normal Unix security

Contact: Friedhelm Ramme
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Academic      University of Wisconsin
Cost                             PD
User Support                     Some          Telephone/Email/Mailing List
Heterogeneous                    Yes
Platforms                        Most Major    DEC, HP, IBM, Sequent, SGI and Sun
Operating Systems                Most Major    OSF/1, Ultrix, HP-UX, AIX, Dynix, Irix, SunOS and Solaris
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              ?
Parallel Support                 No            PVM support planned
Queue Types                      Yes
Dispatching policy
Impact on w/s Owner              Yes           Controllable
Impact on Cluster w/s            Yes           Checkpoints to local w/s
Load Balancing
Check Pointing                   Yes           Code must be relinked with Condor Libraries
Process Migration                Yes           Code must be relinked with Condor Libraries
Job Monitoring and Rescheduling
Suspension/Resumption of Jobs
Resource Administration          Yes
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes
Job Scheduling Control           Yes
GUI/Command-line                 Command-line
Ease of Use                      Unknown
User Allocation of Job           Yes
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes
Dynamic Resource Pool            Yes           But only when machines fall idle are they added to the Condor pool
Single Point of Failure          Yes           At the master scheduler level
Fault Tolerance                  Yes           Restart job at last checkpoint.
Security Issues                  Yes           Tries to maintain normal Unix security

Contact: Miron Livny
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Research      Minnesota Supercomputing Center
Cost                             PD
User Support                     Limited
Heterogeneous                    No
Platforms                        Limited       Front-end - Suns and SGIs, back-end CM-2 and CM-5
Operating Systems                Limited
Additional Hardware/Software     CM-X
Batch Jobs                       Yes
Interactive Support              Yes           I/O redirected if running interactive as batch
Parallel Support                 Yes
Queue Types                      Multiple      Queues jobs to run on CM partitions
Dispatching policy               Yes           A number of factors, including job size, partition loading, time in queue, etc.
Impact on w/s Owner              No
Impact on Cluster w/s            No
Load Balancing                   Yes
Check Pointing                   No
Process Migration                No
Job Monitoring and Rescheduling  Yes           Restart job
Suspension/Resumption of Jobs    Yes           Restart job
Resource Administration          Yes           Partition manager
Job Runtime Limits               Yes           To groups and users per partition
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes
Job Scheduling Control           Yes
GUI/Command-line                 Command-line  NQS type interface.
Ease of Use                      Unknown
User Allocation of Job           Limited       Direct job to a queue
User Job Status Query            Yes           Command-line
Job Statistics                   Yes           Limited
Runtime Configuration            Yes           Partitions and queues changed on the "fly"
Dynamic Resource Pool            Yes           Partitions can be recovered after crash
Single Point of Failure          Yes           Queue manager runs on w/s, but not dependent on partitions being up!
Fault Tolerance                  Yes           Will restart job - as long as queue manager remains active
Security Issues                  Yes           Normal Unix security.

Contact: Liz Stadther
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Research      Supercomputer Computation Research Institute (SCRI) at FSU
Cost                             PD
User Support                     Yes           Telephone/Email support from SCRI + Mailing List
Heterogeneous                    Yes
Platforms                        Most Major    Digital, Intel, HP, SGI, IBM, and Sun
Operating Systems                Most Major    OSF/1, Ultrix, Linux, HP-UX, Irix, Aix, SunOS and Solaris
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes           Via BSD sockets
Parallel Support                 Yes           PVM, P4 and P5
Queue Types                      Multiple      Queue complexes defined by administrator
Dispatching policy               Yes           By queue and weighted queue
Impact on w/s Owner              Minimal       Can be configured not to affect owner at all
Impact on Cluster w/s            Yes           Especially if checkpointing is used.
Load Balancing                   Yes           Configurable - CPU, memory, job size, etc.
Check Pointing                   Yes           Must be relinked with DQS library
Process Migration                No
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes           Configurable
Resource Administration          Yes
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes           Will allow exclusive CPU access
Job Scheduling Control           Yes
GUI/Command-line                 Both
Ease of Use                      Unknown
User Allocation of Job           Yes           Based on queues used
User Job Status Query            Yes           GUI
Job Statistics                   Yes
Runtime Configuration            Yes           By root or designated DQS manager via GUI
Dynamic Resource Pool            Yes           On the "fly" resource pool
Single Point of Failure          Minimal       Multiple instances of Master scheduler
Fault Tolerance                  Yes           Tries to complete job after crash
Security Issues                  Yes           Normal Unix + AFS/Kerberos

Contact: Tom Green
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Research      Argonne National Laboratory
Cost                             PD
User Support                     Limited       Email support by author, when time permits, + mailer group
Heterogeneous                    Limited       Written in Perl, should be easy to "port"
Platforms                        Limited       IBM (SP1 & SP2) and DEC Alpha
Operating Systems                Limited       Aix and OSF/1
Additional Hardware/Software     No            Will work with AFS
Batch Jobs                       Yes
Interactive Support              Yes           I/O from batch jobs is delivered to user at end of job - actual interactive login
Parallel Support                 Yes           MPL, PVM, P4 and MPI - will basically support any interface
Queue Types                      Single        Configurable
Dispatching policy               Yes           Configurable
Impact on w/s Owner              Yes           Only normal Unix (nice)
Impact on Cluster w/s            Yes           CPU/Memory/Swap
Load Balancing                   No
Check Pointing                   No
Process Migration                No
Job Monitoring and Rescheduling  Yes/No        Jobs that fail are not rescheduled
Suspension/Resumption of Jobs    No
Resource Administration          Yes           By administrator - configurable
Job Runtime Limits               Yes           Runtime limits are put on a user's access to the resources allocated
Forked Child Management          No            Not applicable as user has exclusive access to node.
Job Scheduling Priority          Yes
Process Management               No            Each user has exclusive access to the nodes allocated to them
Job Scheduling Control           Yes
GUI/Command-line                 Command-line  GUI planned
Ease of Use                      Unknown
User Allocation of Job           Limited       Direct job to a queue
User Job Status Query            Yes           Command-line
Job Statistics                   Yes           Limited - extensions planned
Runtime Configuration            Yes           Queue needs to be turned off, reconfigured and turned back on.
Dynamic Resource Pool            Yes
Single Point of Failure          Yes           Scheduler needs to write to filesystem
Fault Tolerance                  No            Jobs need to be rescheduled after a failure.
Security Issues                  Yes           Normal Unix security.

[Editor's Note: the next article update will include the new EASYLL, a combination of EASY and LoadLeveler.]
Contact: David Lifka
Cornell Theory Center
Frank H.T. Rhodes Hall
Hoy Road, Cornell University
Ithaca, NY 14853-3801, USA
Tel: +1 (607) 254-8621
Fax:
Email: lifka@tc.cornell.edu
Systems Supported:
IBM SP1 and SP2
DEC Alpha Farm
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Research      University of Liverpool, UK
Cost                             PD
User Support                     Yes           Telephone/Email - UK JISC funded support
Heterogeneous                    Some          Only implemented on Suns at the moment
Platforms                        Several       Sun - SGI and HP (beta)
Operating Systems                Several       SunOS and Solaris - Irix and HP-UX (beta)
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes           BSD Sockets
Parallel Support                 Yes           NAS-HPF & PVM
Queue Types                      No
Dispatching policy               Yes           Automatic or manual
Impact on w/s Owner              Configurable  Console user has priority - non "owner" jobs killed by default
Impact on Cluster w/s            Yes           CPU/RAM/diskspace
Load Balancing                   Limited       Based on w/s loads - can be manually overridden.
Check Pointing                   No            Deliberately omitted
Process Migration                No            Deliberately omitted
Job Monitoring and Rescheduling  No            User initiated (manual)
Suspension/Resumption of Jobs    No            User initiated (manual)
Resource Administration          Some          Static database
Job Runtime Limits               No            Normal Unix can be invoked manually
Forked Child Management          No
Job Scheduling Priority          No
Process Management               No
Job Scheduling Control           No
GUI/Command-line                 Command-line
Ease of Use                      Unknown
User Allocation of Job           Yes
User Job Status Query            Yes           User initiated (manual)
Job Statistics                   No            Administrator can add Unix statistics, but not standard
Runtime Configuration            Limited       Editing far database files
Dynamic Resource Pool            Limited       Requires editing master database
Single Point of Failure          Yes           Master daemon
Fault Tolerance                  No            When Master daemon dies, far needs reboot.
Security Issues                  Yes           Normal Unix

Contact: J Steve Morgan
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Research      University of Sheffield
Cost                             PD            GNU License
User Support                     Yes           Supported by the University of Sheffield (JISC-NTI) until July '96
Heterogeneous                    Yes
Platforms                        Most Major    IBM, Fujitsu, HP, SGI, Intel, NCR, Sun, DEC & Cray
Operating Systems                Most Major    AIX, UXP/M, HP-UX, IRIX, Linux, Solaris, SunOS, Ultrix, OSF/1 & UNICOS
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes           NQS needs to be configured to send stdin/err to file.
Parallel Support                 No            Not yet.
Queue Types                      Yes           One on each server, minimal Unix configuration.
Dispatching policy               Static        Each queue knows about its own load and performance
Impact on w/s Owner              Yes           Queues can be day/night sensitive, owner can "nice" jobs
Impact on Cluster w/s            Yes           CPU/RAM/Diskspace
Load Balancing                   Static        Master scheduling (option), only knows perf. and load at each queue
Check Pointing                   No
Process Migration                No
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes           Supports normal Unix signals
Resource Administration          Yes           Normal Unix
Job Runtime Limits               Yes           Normal Unix
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes           Manage the number of jobs run (one or many)
Job Scheduling Control           Yes
GUI/Command-line                 Command-line  WWW based interface planned.
Ease of Use                      Unknown
User Allocation of Job           Yes
User Job Status Query            Yes
Job Statistics                   Yes           Amount depends on the platform running job
Runtime Configuration            Yes           Dynamic
Dynamic Resource Pool            Yes           Dynamic
Single Point of Failure          Yes/No        Yes if master scheduler, No if just configured with "peer" queues.
Fault Tolerance                  Yes           Queue will try to run and complete job after crash
Security Issues                  Yes           Normal Unix

Contact: Stuart Herbert
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Research      Ballistics Research Laboratory
Cost                             PD
User Support                     Some
Heterogeneous                    Yes
Platforms                        Some          Sun Sparc +??
Operating Systems                Some          SunOS 4.1.x, BSD 4.2, SYS III & V
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes           Redirected I/O and sockets ?
Parallel Support                 No
Queue Types                      Multiple
Dispatching policy               Unknown
Impact on w/s Owner              Unknown
Impact on Cluster w/s            Normal
Load Balancing                   Probably      Master scheduler probably does this task
Check Pointing                   No
Process Migration                No
Job Monitoring and Rescheduling  Unknown/Yes   On failure job will be rescheduled
Suspension/Resumption of Jobs    Yes           BSD signals
Resource Administration          Yes           Via multiple queues
Job Runtime Limits               Probably      Not known if enforced
Forked Child Management          No
Job Scheduling Priority          Probably
Process Management               Unknown
Job Scheduling Control           Probably
GUI/Command-line                 Command-line
Ease of Use                      Unknown
User Allocation of Job           Yes
User Job Status Query            Probably
Job Statistics                   Probably
Runtime Configuration            Unknown
Dynamic Resource Pool            Unknown
Single Point of Failure          Yes
Fault Tolerance                  Yes           On reboot, system will attempt to re-run job.
Security Issues                  Yes           Normal Unix

Contact: Mike Muuss
Functionality                    Supported           Comments
-------------                    ---------           --------
Commercial/Research              Research            NASA Ames & LLNL
Cost                             PD
User Support                     Yes                 Telephone/Email/Mailing List - NASA Ames
Heterogeneous                    Yes
Platforms                        Multiple Platforms  Sun, SGI, IBM, Intel, Thinking Machines, Cray
Operating Systems                Multiple OS/s       SunOS, Solaris, Aix, Intel-OSF/1, CMOST, Unicos
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes                 stdin/stderr via Xterm
Parallel Support                 Yes                 On Parallel machines - interfaces
Queue Types                      Multiple            Definable by administrator
Dispatching policy               Configurable
Impact on w/s Owner              Yes                 Configurable - mouse/keyboard/resources available
Impact on Cluster w/s            Yes                 CPU/RAM/Swap
Load Balancing                   Yes
Check Pointing                   No                  Vendor specific - reliant on Posix 1003.1a
Process Migration                No
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes
Resource Administration          Yes
Job Runtime Limits               Yes
Forked Child Management          No
Job Scheduling Priority          Yes
Process Management               Yes                 Configure for exclusive CPU usage
Job Scheduling Control           Yes
GUI/Command-line                 Command-line        Tcl/Tk planned for next release
Ease of Use                      Unknown
User Allocation of Job           Yes                 To specific queue
User Job Status Query            Yes
Job Statistics                   Yes
Runtime Configuration            Yes
Dynamic Resource Pool            Yes
Single Point of Failure          Unsure              Unsure about master scheduler...
Fault Tolerance                  Minimised           Restart jobs after failure - not sure about crashes
Security Issues                  Yes                 Normal Unix trusted clients and Kerberos type authorisation

Contact: Dave Tweten
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Research      Information Sciences Institute, University of S. California
Cost                             PD            Free to non-commercial sites (License Agreement)
User Support                     Yes           Telephone/Email/WWW
Heterogeneous                    Yes, but      Only supports two platforms
Platforms                        Sun & HP      Sun and HP
Operating Systems                Yes           SunOS and HP-UX
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes           I/O redirected to user's terminal
Parallel Support                 Yes           CMMD, PVM, and Express - MPI planned
Queue Types                      Yes
Dispatching policy               No
Impact on w/s Owner              No            Configurable + migrate processes off w/s
Impact on Cluster w/s            Yes           Diskspace and Network
Load Balancing                   No
Check Pointing                   Yes           Taken from Condor
Process Migration                Yes           Taken from Condor
Job Monitoring and Rescheduling  Yes/No        No automatic rescheduling.
Suspension/Resumption of Jobs    Some          Processes can be suspended individually.
Resource Administration          Yes           Through "system manager" software
Job Runtime Limits               No            None imposed
Forked Child Management          No
Job Scheduling Priority          No
Process Management               Yes/No        Yes, but when w/s owner returns job will be suspended and migrated
Job Scheduling Control           No
GUI/Command-line                 Command-line
Ease of Use                      Unknown
User Allocation of Job           Yes           Using job configuration file
User Job Status Query            Yes
Job Statistics                   Some
Runtime Configuration            Yes
Dynamic Resource Pool            Yes
Single Point of Failure          Yes/No        Failed resources will be acquired by other system managers
Fault Tolerance                  No
Security Issues                  Yes           Normal Unix, plans for Kerberos type authentication

Contact: Santosh Roa
Functionality                    Supported     Comments
-------------                    ---------     --------
Commercial/Research              Research      Alan Saunders
Cost                             PD
User Support                     None
Heterogeneous                    Yes           Limited multi-platform support
Platforms                        Several       Sun, DEC and IBM
Operating Systems                Several       SunOS, Ultrix and Aix
Additional Hardware/Software     No
Batch Jobs                       Yes
Interactive Support              Yes           Send to queue monitor, but can be reconfigured.
Parallel Support                 No
Queue Types                      Multiple      One or more queues per server
Dispatching policy               Static        Configured when queue is started
Impact on w/s Owner              Yes           Queue can be configured with low-priority
Impact on Cluster w/s            Yes           CPU/Memory/Diskspace
Load Balancing                   Static        Seems necessary to send jobs to specific queues
Check Pointing                   No
Process Migration                No
Job Monitoring and Rescheduling  Yes
Suspension/Resumption of Jobs    Yes           Normal BSD signals supported
Resource Administration          Static        Configured at startup time.
Job Runtime Limits               Yes           Unix limits supported
Forked Child Management          No
Job Scheduling Priority          Yes           Prioritise queues
Process Management               Probably
Job Scheduling Control           Yes
GUI/Command-line                 Command-line
Ease of Use                      Unknown
User Allocation of Job           Yes
User Job Status Query            Yes
Job Statistics                   Yes           Amount unknown, but probably normal Unix accounting.
Runtime Configuration            Static
Dynamic Resource Pool            Static
Single Point of Failure          No            Multiple queues - each knows nothing about the others
Fault Tolerance                  Yes           If a machine crashes, jobs in queue are re-run
Security Issues                  Yes           Normal Unix

Contact: Alan Saunders
The remaining five systems (Codine, Connect:Queue, LoadLeveler, LSF and NQE) are functionally very similar. All five support parallel jobs but, at present, LoadLeveler (3) only supports them on IBM SP2 systems rather than on workstation clusters, which eliminates it from our shortlist. The other packages all support PVM, but none mentions future support for MPI or HPF.
If checkpointing is deemed to be important, then Connect:Queue, LSF and NQE are eliminated from the shortlist. The authors of this report take the view that checkpointing is a useful additional feature rather than a highly desirable one.
If security is a major issue, then only LSF and NQE have security features over and above normal Unix features. Both packages support Kerberos-type user authentication, but NQE additionally has a US DoD security rating.
Of the remaining packages, Connect:Queue, LSF and NQE all use multiple master schedulers to minimise the problems associated with single points of failure, and a similar feature is planned for the next release of Codine. All the packages claim to maximise their resilience and fault tolerance.
Conclusion
Four packages (3) remain on our shortlist (Codine, Connect:Queue, LSF and NQE) out of the original seven. It is difficult to reduce the list further without installing each package and testing its functionality, robustness and stability in detail. However, an additional consideration is the cost of a site license and software maintenance support for each package.
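The elimination steps above can be sketched as a filter over the feature tables. This is only an illustration: the flags are transcribed by hand from the commercial-package tables in this review, the field order and the `shortlist` helper are hypothetical, and "parallel jobs on clusters" is set to false for Load Balancer and Task Broker (tables list no parallel support) and for LoadLeveler (parallel support only on SPx at the time of writing).

```python
# Feature flags transcribed by hand from the commercial-package tables.
# Tuple layout (an assumption made for this sketch):
#   (parallel jobs on clusters, checkpointing, security beyond normal Unix)
COMMERCIAL = {
    "Codine":        (True,  True,  False),
    "Connect:Queue": (True,  False, False),
    "Load Balancer": (False, True,  False),
    "LoadLeveler":   (False, True,  False),  # parallel jobs only on SPx
    "LSF":           (True,  False, True),
    "NQE":           (True,  False, True),
    "Task Broker":   (False, False, False),
}

PARALLEL, CHECKPOINT, SECURITY = 0, 1, 2


def shortlist(packages, *required):
    """Return package names, in table order, that have every required feature."""
    return [name for name, flags in packages.items()
            if all(flags[i] for i in required)]


base = shortlist(COMMERCIAL, PARALLEL)
with_checkpointing = shortlist(COMMERCIAL, PARALLEL, CHECKPOINT)
with_extra_security = shortlist(COMMERCIAL, PARALLEL, SECURITY)

print(base)
print(with_checkpointing)
print(with_extra_security)
```

Requiring only parallel support on clusters reproduces the four-package shortlist; adding checkpointing or extra security as a hard requirement narrows it in the way the preceding paragraphs describe.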
Five packages (Batch, Condor, GNQS, MDQS and Qbatch) do not support parallel jobs, which the authors deem to be highly desirable. Support for parallel jobs is planned in both Condor and GNQS.
Three packages (EASY, far and PRM) have limited functionality under the Job Scheduling and Allocation Policy section (see section 2.4). EASY, at present, has no concept of load balancing, although it is planned for a future release. far is able to accomplish some load balancing, but it appears to be rather crude (based on rup) and its usefulness on a large, diverse cluster would be limited. PRM has no dispatching or load balancing capability, even though it is capable of migrating processes. All three packages are incapable of suspending, resuming or rescheduling jobs without intervention at the process level by a systems administrator.
Conclusion
Two packages (DQS and PBS) remain on the selection list out of the original twelve. It is difficult to reduce the list further without actually installing each package and testing its functionality, robustness and stability in practice. Key factors in choosing between these two remaining packages would be the maturity of the software, how widespread its usage is, and how much user support could be expected from the authors or supporting site.
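As a rough cross-check, the research-package elimination can be replayed with a reason recorded for each package that drops out. The values are transcribed by hand from the tables in this review and the function name is hypothetical. One criterion is an assumption: the text here does not say why CCS and DJM leave the list, but the tables mark both as not heterogeneous, so the sketch treats that as the reason.

```python
# (parallel support, heterogeneous, limited scheduling/allocation per the text)
RESEARCH = {
    "Batch":  ("No",  "Yes",     False),
    "CCS":    ("Yes", "No",      False),
    "Condor": ("No",  "Yes",     False),
    "DJM":    ("Yes", "No",      False),
    "DQS":    ("Yes", "Yes",     False),
    "EASY":   ("Yes", "Limited", True),
    "far":    ("Yes", "Some",    True),
    "GNQS":   ("No",  "Yes",     False),
    "MDQS":   ("No",  "Yes",     False),
    "PBS":    ("Yes", "Yes",     False),
    "PRM":    ("Yes", "Yes",     True),
    "Qbatch": ("No",  "Yes",     False),
}


def reason_for_elimination(parallel, heterogeneous, limited_scheduling):
    """Return why a package is dropped, or None if it stays on the list."""
    if parallel != "Yes":
        return "no parallel job support"
    if limited_scheduling:
        return "limited job scheduling and allocation functionality"
    if heterogeneous == "No":
        # Assumed criterion: not stated in this part of the review.
        return "tied to one platform family (assumed)"
    return None


reasons = {name: reason_for_elimination(*row) for name, row in RESEARCH.items()}
remaining = [name for name, reason in reasons.items() if reason is None]

for name, reason in reasons.items():
    print(f"{name:8s} {reason or 'shortlisted'}")
```

Under these assumptions the loop leaves DQS and PBS, matching the conclusion; changing the assumed third criterion would change which of CCS and DJM survive, so it should be checked against the full report.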
If sequential jobs, rather than parallel ones, are the overriding concern then Condor and GNQS should be the packages of choice. GNQS in particular is a mature, robust and widely used package. Support for it can be found at a number of sites, including CERN and the University of Sheffield.
(3)
Authors' Footnote - LoadLeveler - 13 June 1996 - Mark Baker mab@npac.syr.edu
Since this article was written there have been several revisions of
LoadLeveler from IBM.
LoadLeveler 1.2.x supports parallel jobs running on heterogeneous
distributed systems. It should be noted that this functionality
means that, if this review were rewritten today, LoadLeveler would be one of
the commercial packages shortlisted in this section.