Packages                             Vendor                        Version
--------                             ------                        -------
Codine - Computing in Distributed    GENIAS GmbH, Germany          3.3.2
  Network Environment
Connect:Queue                        Sterling Corp., USA           Oct 95
Load Balancer                        Unison Software, USA          Jun 95
Load Leveler                         IBM Corp., USA                1.2.1
LSF - Load Sharing Facility          Platform Computing, Canada    2.1
NQE - Network Queuing Environment    Craysoft Corp., USA           2.0
Task Broker                          Hewlett-Packard Corp.         1.2
Functionality Supported Comments
------------- --------- --------
Commercial/Research Commercial Fully supported commercial package by
Genias GmbH
Cost Yes There are academic and commercial rates.
User Support Yes Telephone/Email - seem very responsive.
Heterogeneous Yes
Platforms Most Major Convex, Cray, DEC, HP, IBM, SGI and Sun
Operating Systems Most Major ConvexOS, UniCOS, OSF/1, Ultrix, HP-UX, AIX,
Irix, SunOS and Solaris
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes
Parallel Support Yes PVM and Express
Queue Types Yes
Dispatching policy Yes Configurable.
Impact on Cluster Users Yes Can be set up to minimise impact
Impact on Cluster w/s Yes If process Migration and checkpointing used
Load Balancing Yes No details.
Check Pointing Yes Optional - Relink code with Codine libraries
Process Migration Yes Optional
Job Monitoring and Yes
Rescheduling
Suspension/Resumption Yes BSD Signals supported (see sketch below)
of Jobs
Resource Administration Yes Via a master scheduler
Job Runtime Limits Yes
Forked Child Management No No migration or checkpointing of these
processes.
Job Scheduling Yes
Priority
Process Management Yes Runtime limits and exclusive CPU usage
Job Scheduling Control Yes
GUI/Command-line Yes
Ease of Use Unknown
User Allocation of Job Yes
User Job Status Query Yes
Job Statistics Yes
Runtime Configuration Yes Dynamic
Dynamic Resource Pool Yes
Single Point of Yes Being worked upon.
Failure
Fault Tolerance Some Master scheduler will reschedule a job
Security Issues Yes Normal Unix - but seems to have been thought
out.
Contact: Johannes Grawe
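Several of the packages reviewed here report, as in the Codine table above, that job suspension and resumption is implemented with ordinary BSD signals. The Python sketch below illustrates the underlying mechanism a scheduler daemon might use; the job launch and process-group handling are hypothetical illustrations of the idea, not code taken from Codine or any other package.

    import os
    import signal
    import subprocess

    def suspend_job(pgid):
        """Suspend every process in the job's process group (BSD SIGSTOP)."""
        os.killpg(pgid, signal.SIGSTOP)

    def resume_job(pgid):
        """Resume a previously suspended job (BSD SIGCONT)."""
        os.killpg(pgid, signal.SIGCONT)

    if __name__ == "__main__":
        # Hypothetical example: start a job in its own session/process group
        # so the whole job, including any children it forks, can be stopped
        # and continued as a unit.
        job = subprocess.Popen(["sleep", "60"], start_new_session=True)
        suspend_job(job.pid)   # the new session's group id equals the child's pid
        resume_job(job.pid)
        job.wait()

Signalling the whole process group rather than a single pid is what lets a scheduler stop a job and everything it has forked in one operation.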
Functionality Supported Comments
------------- --------- --------
Commercial/Research Commercial Sterling Software - Dallas office
Cost ?
User Support Yes By Sterling
Heterogeneous Yes
Platforms Most Major Data General, Digital, IBM, HP, SGI, Amdahl,
Meiko, and Sun
Operating Systems Most Major DG/UX, OSF/1 & Ultrix, AIX/ESA, HP-UX,
Irix, UTS, SunOS/Solaris
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support No Planned for the future but will redirect
interactive i/o
Parallel Support Yes PVM and Linda
Queue Types Yes Multiple queue types
Dispatching policy Yes CPU/Memory/Queue based
Impact on w/s Owner Yes Owner has no control over w/s resources
Impact on Cluster w/s Yes CPU/Memory/Swap-space
Load Balancing Yes CPU/Memory/Queue based
Check Pointing No Planned for the future
Process Migration No Planned for the future
Job Monitoring and Yes
Rescheduling
Suspension/Resumption No
of Jobs
Resource Yes
Administration
Job Runtime Limits Yes
Forked Child No
Management
Job Scheduling Yes
Priority
Process Management Yes Need to configure whole machine
Job Scheduling Control Yes
GUI/Command-line Both GUI for user and administrator
Ease of Use Unknown
User Allocation of Job Limited User can only specify which queue
User Job Status Query Yes
Job Statistics Yes
Runtime Configuration Yes Administrator can modify queues and their
policies on the "fly"
Dynamic Resource Pool Yes Probably yes
Single Point of Minimised Multiple master schedulers are run.
Failure
Fault Tolerance Yes Scheduler will attempt to restart jobs after
a failure
Security Issues Yes Uses normal Unix security
[Editor's Note:
CONNECT:Queue is no longer being actively promoted by Sterling Commerce (SC). Although SC is committed to supporting its existing CONNECT:Queue customers, SC is attempting to upgrade them, as well as sell to new prospects, a much more robust job scheduling and workload balancing software called JP1. JP1 is manufactured by Hitachi and SC is presently the sole distributor of this product in the U.S.]
Contact: Lisa Middleton
Manager, Marketing Communications
Sterling Commerce
5215 North O'Conner Blvd, Suite 1500
Irving, TX 75039, USA
Tel: +1 (800) 700-5599 (sales)
Tel: +1 (800) 292-0104 (tech support)
Email: Lisa_Middleton@csg.stercomm.com
Systems Supported:
Data General - DG/UX
DEC - OSF/1 & Ultrix
IBM - Aix/ESA
HP - HP-UX
SGI - Irix
Amdahl - UTS
Sun - Solaris & SunOS
Functionality Supported Comments
------------- --------- --------
Commercial/Research Commercial Unison Software Solution
Cost POA
User Support Yes Telephone/Email
Heterogeneous Yes
Platforms Most Major Sun, HP, IBM, DEC, SGI
Operating Systems Most Major Ultrix, HP-UX, Aix, Irix, SunOS, Solaris
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes
Parallel Support No
Queue Types Multiple
Dispatching policy Multiple Numerous factors can be taken into account
Impact on w/s Owner Configurable Users have rights over their w/s
Impact on Cluster w/s Yes CPU/Memory/Swap
Load Balancing Yes Tries to distribute load evenly.
Check Pointing Yes
Process Migration Yes
Job Monitoring and Yes
Rescheduling
Suspension/Resumption Yes
of Jobs
Resource Administration Yes Numerous factors can be taken into account
Job Runtime Limits Yes
Forked Child No
Management
Job Scheduling Priority Yes
Process Management Yes Exclusive CPU access available
Job Scheduling Control Yes
GUI/Command-line Both
Ease of Use Unknown
User Allocation of Job Yes Submit to one of many queues
User Job Status Query Yes
Job Statistics Yes
Runtime Configuration Yes
Dynamic Resource Pool Yes Master scheduler checks the resource table
regularly
Single Point of Failure Yes New release will overcome this with multiple
master schedulers
Fault Tolerance Partially If a machine crashes, the scheduler will try
to re-run the job.
Security Issues Yes Unix security
Contact:
Functionality Supported Comments
------------- --------- --------
Commercial/Research Commercial IBM Kingston v 1.2.1
Cost POA
User Support Yes IBM type - depends on level purchased
Heterogeneous Yes
Platforms Most Major SP, RS/6000, Sun SPARC, SGI workstation, HP
9000
Operating Systems Most Major AIX 4.1.3, SunOS 4.1.2, Solaris 2.3, IRIX 5.2,
HP-UX 9.01
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes Only on SPx, planned for clusters in future
release
Parallel Support Yes Only on SPx, planned for clusters in future
release - PVMe
Queue Types Multiple
Dispatching policy Configurable
Impact on w/s Owner Configurable User determines when their machine is
available
Impact on Cluster w/s Yes CPU/RAM/Swap
Load Balancing Yes based on fair and optimal use of resources
Check Pointing Yes Need to relink with LL libraries
Process Migration Yes Need to relink with LL libraries
Job Monitoring and Yes
Rescheduling
Suspension/Resumption Yes
of Jobs
Resource Yes Administrator can configure this
Administration
Job Runtime Limits Yes
Forked Child No
Management
Job Scheduling Yes
Priority
Process Management Yes Can have exclusive CPU usage
Job Scheduling Control Yes
GUI/Command-line Both NQS compatible
Ease of Use Unknown
User Allocation of Job Yes
User Job Status Query Yes
Job Statistics Yes
Runtime Configuration Yes
Dynamic Resource Pool Yes
Single Point of Limited Master schedulers - but does have
Failure checkpointing
Fault Tolerance Limited Schedulers will try and re-run jobs.
Security Issues Yes Normal Unix Security
Contact: Kathy Lange
Functionality Supported Comments
------------- --------- --------
Commercial/Research Commercial Platform Computing Corp.
Cost POA
User Support Yes
Heterogeneous Yes
Platforms Most Major Convex, Digital, HP, IBM, SGI and Sun
Operating Systems Most Major ConvexOS, Ultrix, HP-UX, Aix, Irix, SCO,
SunOS and Solaris
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes
Parallel Support Yes PVM, P4, TCGMSG, DSM and Linda
Queue Types Multiple
Dispatching policy Configurable Satisfy job resource and system requirements
Impact on w/s Owner Yes LSF will migrate jobs on and off w/s
Impact on Cluster w/s Yes CPU/RAM/Swap
Load Balancing Yes Tries to distribute the workload evenly across
the cluster
Check Pointing No Planned in future releases
Process Migration Yes
Job Monitoring and Yes Migration of jobs
Rescheduling
Suspension/Resumption Yes
of Jobs
Resource Yes
Administration
Job Runtime Limits Yes
Forked Child Management No
Job Scheduling Priority Yes
Process Management Yes Exclusive CPU usage possible
Job Scheduling Control Yes
GUI/Command-line Both
Ease of Use Unknown
User Allocation of Job Yes Limited
User Job Status Query Yes
Job Statistics Yes
Runtime Configuration Yes
Dynamic Resource Pool Yes? Not sure but seems likely.
Single Point of Failure No Master scheduler re-elected by slave
schedulers
Fault Tolerance Yes jobs restarted
Security Issues Yes Normal Unix, trusted hosts and Kerberos
support
Contact: Harrison Cheung
Functionality Supported Comments
------------- --------- --------
Commercial/Research Commercial CraySoft Corp.
Cost Yes $2875 per 10-user network - 27th Sep 95
User Support Yes Telephone/Email, etc.
Heterogeneous Yes
Platforms Most Major SPARC, IBM RS/6000, SGI, Digital Alpha, HP and
Cray Research.
Operating Systems Most Major Solaris, SunOS, Aix, Irix, Dec-OSF/1, HP-UX &
Unicos (latest)
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes
Parallel Support Yes PVM
Queue Types Multiple Batch and pipe queues
Dispatching policy Yes Configurable by administrators
Impact on w/s Owner Configurable w/s owner can configure impact
Impact on Cluster w/s Yes CPU/RAM/Swap
Load Balancing Yes
Check Pointing No Possible on Cray if NQS available
Process Migration No Possible on Cray if NQS available
Job Monitoring and Yes
Rescheduling
Suspension/Resumption Yes
of Jobs
Resource Yes
Administration
Job Runtime Limits Yes
Forked Child Management No Only under UniCOS
Job Scheduling Priority Yes
Process Management Yes Workstation can be configured to allow
exclusive CPU
Job Scheduling Control Yes
GUI/Command-line Both WWW interface
Ease of Use Unknown
User Allocation of Job Yes To one of a number of queues
User Job Status Query Yes
Job Statistics Yes
Runtime Configuration Yes
Dynamic Resource Pool Yes
Single Point of No Client forwards jobs to master scheduler,
Failure multiple schedulers
Fault Tolerance Some Jobs not sent to queues not responding
Security Issues High Has US DoD B1 security rating, plus Cray
Multi-level security
Contact: Dan Ferber
Functionality Supported Comments
------------- --------- --------
Commercial/Research Commercial
Cost Yes Doc & Media = $315, 10 w/s License ~ $5k
User Support Yes Telephone/Email - depends on how much you pay.
Heterogeneous No Version supplied by HP only works on HP
platforms.
Platforms One HP - 3rd Party versions available.
Operating Systems One HP-UX
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Limited X supported for I/O, stdin/stderr need special
w/s configuration
Parallel Support No
Queue Types Many Configurable Queues
Dispatching policy Yes
Impact on w/s Owner Yes Can be achieved, but need to "edit" local
configuration file.
Impact on Cluster w/s Yes CPU/RAM/Disk
Load Balancing Yes Scheme where each w/s "bids" for jobs (see sketch below)
Check Pointing No
Process Migration No
Job Monitoring and Yes & No No automatic rescheduling of jobs
Rescheduling
Suspension/Resumption Yes Supports all BSD signals
of Jobs
Resource Yes Local and global configuration files.
Administration
Job Runtime Limits No Not built in
Forked Child No See above...
Management
Job Scheduling Yes
Priority
Process Management Yes w/s can be configured to provide exclusive or
shared access to jobs
Job Scheduling Control Yes
GUI/Command-line Both Motif based GUI
Ease of Use Unknown
User Allocation of Job Yes
User Job Status Query Yes
Job Statistics Yes Admin and user statistics
Runtime Configuration Yes Need to "edit" configuration file on each w/s
- static
Dynamic Resource Pool Yes Need to "edit" local configuration file to
add/withdraw w/s.
Single Point of No Schedulers distributed - jobs are not
Failure guaranteed to complete.
Fault Tolerance Yes Jobs are not guaranteed to complete.
Security Issues Yes Normal Unix
Contact: Terry Graff
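Task Broker's load balancing is described in the table above as a scheme in which each workstation "bids" for a job. The Python sketch below is only a hypothetical illustration of that idea; the host snapshot, the bid formula and the job description are invented for the example and are not Task Broker's actual protocol or API.

    # Hypothetical snapshot of cluster state; a real broker would query each host.
    HOSTS = {
        "ws1": {"load": 0.20, "free_ram_mb": 96, "has_fortran": True},
        "ws2": {"load": 1.50, "free_ram_mb": 48, "has_fortran": True},
        "ws3": {"load": 0.05, "free_ram_mb": 32, "has_fortran": False},
    }

    def bid(host, job):
        """Return a bid for the job: higher means more willing/able to run it.
        A host that cannot satisfy a hard requirement bids zero."""
        if job["needs_fortran"] and not host["has_fortran"]:
            return 0.0
        if host["free_ram_mb"] < job["ram_mb"]:
            return 0.0
        # Lightly loaded hosts with spare memory bid higher.
        return host["free_ram_mb"] / (1.0 + host["load"])

    def place(job):
        """Dispatch the job to the highest bidder, or nowhere if nobody can run it."""
        bids = {name: bid(info, job) for name, info in HOSTS.items()}
        winner = max(bids, key=bids.get)
        return winner if bids[winner] > 0 else None

    if __name__ == "__main__":
        job = {"ram_mb": 64, "needs_fortran": True}
        print("job dispatched to:", place(job))   # ws1 under this snapshot

The attraction of a bidding scheme is that each workstation can apply its own local policy, for example the owner's configuration file, when deciding how strongly to bid.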
Packages                           Vendor                                Version
--------                           ------                                -------
Batch                              UCSF, USA                             4.0
CCS - Computing Center Software    Paderborn, Germany                    ?
Condor                             University of Wisconsin-Madison, USA  5.5.3
DJM - Distributed Job Manager      Minnesota Supercomputing Center       ??
DQS 3.x                            Florida State University, USA         3.1
EASY                               Argonne National Lab, USA             1.0
far                                University of Liverpool, UK           1.0
MDQS                               ARL, USA                              ??
Generic NQS                        University of Sheffield, UK           3.4
Portable Batch System              NASA Ames & LLNL, USA                 1.1
PRM - Prospero Resource Manager    University of Southern California     1.0
QBATCH                             Vita Services Ltd., USA               ??
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research Unclear, IPR seems to belong to UCSF
Cost PD GNU-type license
User Support Yes Email the author
Heterogeneous Yes
Platforms Many IBM RS/6000, Sparc, MIPS, Alpha
Operating Systems Many AIX 3.x, IRIX 3.x, 4.x, & 5.x, SunOS 4.x and
Ultrix 4.1
Additional No NFS
Hardware/Software
Batch Jobs Yes
Interactive Support Yes
Parallel Support No
Queue Types Yes Multiple Queues
Dispatching policy Configurable
Impact on w/s Owner Controllable
Impact on Cluster w/s Some Uses w/s CPU, memory and swap space
Load Balancing Some Load balanced queues
Check Pointing No
Process Migration No
Job Monitoring and Yes
Rescheduling
Suspension/Resumption Yes Start/suspend Jobs
of Jobs
Resource Some
Administration
Job Runtime Limits Yes
Forked Child No
Management
Job Scheduling Yes
Priority
Process Management Some Unix "nice"
Job Scheduling Control Yes
GUI/Command-line Yes Based on Forms library, Job Status and
removal.
Ease of Use Unknown
User Allocation of Job Some Control over Jobs on own workstation
User Job Status Query Yes
Job Statistics Yes
Runtime Configuration Some
Dynamic Resource Pool Unknown
Single Point of Unknown Multiple queues, but probably single
Failure administrative daemon
Fault Tolerance No Not mentioned in documentation
Security Issues Some Seems to be normal Unix
Contact: Scott R Presnell
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research Paderborn Center for Parallel Computing
Cost Unknown Enquire
User Support Yes Telephone/Email -- Paderborn Center for Parallel
Computing (Germany)
Heterogeneous No Based around Parsytec Systems
Platforms Parsytec Sparc, Parsytec GC
Operating Systems Parix SunOS, Parix
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes
Parallel Support Yes
Queue Types Yes
Dispatching policy Yes Static configuration, dynamically allocated
by user.
Impact on w/s Owner N/A
Impact on Cluster w/s Yes Resources, once allocated to a user may not be
released (2)
Load Balancing No
Check Pointing No
Process Migration No
Job Monitoring and No Dependent upon the user
Rescheduling
Suspension/Resumption Yes
of Jobs
Resource Administration Yes
Job Runtime Limits Yes
Forked Child N/A
Management
Job Scheduling Yes Administrator
Priority
Process Management No
Job Scheduling Control Yes Administrator configures resources
GUI/Command-line Command-line
Ease of Use Moderate
User Allocation of Job Yes
User Job Status Query Yes
Job Statistics Some Not extensive
Runtime Configuration Yes Dynamically configurable
Dynamic Resource Pool Yes Resources dynamically configured.
Single Point of Failure Yes Port and queue managers
Fault Tolerance No Jobs need to be re-submitted
Security Issues Yes Normal Unix security
Contact: Friedhelm Ramme
Functionality Supported Comments
------------- --------- --------
Commercial/Research Academic University of Wisconsin-Madison
Cost PD
User Support Some Telephone/Email/Mailing List
Heterogeneous Yes
Platforms Most Major DEC, HP, IBM, Sequent, SGI and Sun
Operating Systems Most Major OSF/1, Ultrix, HP-UX, AIX, Dynix, Irix, SunOS
and Solaris
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support ?
Parallel Support No PVM support planned
Queue Types Yes
Dispatching policy
Impact on w/s Owner Yes Controllable
Impact on Cluster w/s Yes Checkpoints to local w/s
Load Balancing
Check Pointing Yes Code must be relinked with Condor Libraries
Process Migration Yes Code must be relinked with Condor Libraries
Job Monitoring and
Rescheduling
Suspension/Resumption
of Jobs
Resource Yes
Administration
Job Runtime Limits Yes
Forked Child No
Management
Job Scheduling Yes
Priority
Process Management Yes
Job Scheduling Control Yes
GUI/Command-line Command-line
Ease of Use unknown
User Allocation of Job Yes
User Job Status Query Yes
Job Statistics Yes
Runtime Configuration Yes
Dynamic Resource Pool Yes But only when machines fall idle are they
added to the Condor pool (see sketch below)
Single Point of Yes At the master scheduler level
Failure
Fault Tolerance Yes Restart job at last checkpoint.
Security Issues Yes Tries to maintain normal Unix security
Contact: Miron Livny
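As the table above notes, Condor only adds a workstation to its resource pool when the machine falls idle, and withdraws it when the owner returns. The sketch below is a conceptual illustration of that policy, not Condor's actual daemon logic; the idle and load thresholds, and the use of a tty access time to measure keyboard idleness, are assumptions made for the example.

    import os
    import time

    IDLE_SECONDS = 15 * 60   # assumed threshold: 15 minutes without interactive use
    MAX_LOAD = 0.3           # assumed threshold on the 1-minute load average

    def keyboard_idle_seconds(tty_device="/dev/console"):
        """Approximate interactive idleness from the tty's last access time."""
        return time.time() - os.stat(tty_device).st_atime

    def workstation_is_available():
        """A hypothetical 'join the pool' test in the spirit of idle-cycle
        harvesting: no recent interactive use and a low load average."""
        one_minute_load = os.getloadavg()[0]
        return keyboard_idle_seconds() > IDLE_SECONDS and one_minute_load < MAX_LOAD

    if __name__ == "__main__":
        state = "in the pool" if workstation_is_available() else "owner active"
        print("this workstation is:", state)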
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research Minnesota Supercomputing Center
Cost PD
User Support Limited
Heterogeneous No
Platforms Limited Front-end Suns and SGIs, back-end CM-2 and
CM-5
Operating Systems Limited
Additional CM-X
Hardware/Software
Batch Jobs Yes
Interactive Support Yes I/O redirected if running interactive as
batch
Parallel Support Yes
Queue Types Multiple Queues Jobs to run on CM partitions
Dispatching policy Yes No. of factors, including job size, partition
loading, time in queue, etc.
Impact on w/s Owner No
Impact on Cluster w/s No
Load Balancing Yes
Check Pointing No
Process Migration No
Job Monitoring and Yes Restart job
Rescheduling
Suspension/Resumption Yes Restart job
of Jobs
Resource Yes Partition manager
Administration
Job Runtime Limits Yes To groups and users per partition
Forked Child No
Management
Job Scheduling Priority Yes
Process Management Yes
Job Scheduling Control Yes
GUI/Command-line Command-line NQS type interface.
Ease of Use Unknown
User Allocation of Job Limited Direct job to a queue
User Job Status Query Yes Command-line
Job Statistics Yes Limited
Runtime Configuration Yes Partitions and queues changed on the "fly"
Dynamic Resource Pool Yes Partitions can be recovered after a crash
Single Point of Failure Yes Queue manager runs on w/s, but not dependent
on partitions being up!
Fault Tolerance Yes Will restart job - as long as queue manager
remains active
Security Issues Yes Normal Unix security.
Contact: Liz Stadther
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research Supercomputer Computation Research Institute
(SCRI) at FSU
Cost PD
User Support Yes Telephone/Email support from SCRI + Mailing
List
Heterogeneous Yes
Platforms Most Major Digital, Intel, HP, SGI, IBM, and Sun
Operating Systems Most Major OSF/Ultrix, Linux, HP-UX, Irix, Aix, SunOS
and Solaris
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes Via BSD sockets
Parallel Support Yes PVM, P4 and P5
Queue Types Multiple Queue complexes defined by administrator
Dispatching policy Yes By queue and weighted queue
Impact on w/s Owner Minimal Can be configured not to affect owner at all
Impact on Cluster w/s Yes Especially if checkpointing is used.
Load Balancing Yes Configurable - CPU, memory, job size, etc.
Check Pointing Yes Must be relinked with DQS library
Process Migration No
Job Monitoring and Yes
Rescheduling
Suspension/Resumption Yes Configurable
of Jobs
Resource Administration Yes
Job Runtime Limits Yes
Forked Child No
Management
Job Scheduling Priority Yes
Process Management Yes Will allow exclusive CPU access
Job Scheduling Control Yes
GUI/Command-line Both
Ease of Use Unknown
User Allocation of Job Yes Based on queues used
User Job Status Query Yes GUI
Job Statistics Yes
Runtime Configuration Yes By root or designated DQS manager via GUI
Dynamic Resource Pool Yes On the "fly" resource pool
Single Point of Minimal Multiple instances of Master scheduler
Failure
Fault Tolerance Yes Tries to complete job after crash
Security Issues Yes Normal Unix + AFS/Kerberos
Contact: Tom Green
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research Argonne National Laboratory
Cost PD
User Support Limited Email support by the author, when time permits,
plus a mailing list
Heterogeneous Limited Written in Perl, should be easy to "port"
Platforms Limited IBM (SP1 & SP2) and DEC Alpha
Operating Systems Limited Aix and OSF/1
Additional No Will work with AFS
Hardware/Software
Batch Jobs Yes
Interactive Support Yes I/O from batch jobs is delivered to user at
end of job - not an actual interactive login
Parallel Support Yes MPL, PVM, P4 and MPI - will basically support
any interface
Queue Types Single Configurable
Dispatching policy Yes Configurable
Impact on w/s Owner Yes Only normal Unix (nice)
Impact on Cluster w/s Yes CPU/Memory/Swap
Load Balancing No
Check Pointing No
Process Migration No
Job Monitoring and Yes/No Jobs that fail are not rescheduled
Rescheduling
Suspension/Resumption No
of Jobs
Resource Yes By administrator - configurable
Administration
Job Runtime Limits Yes Runtime limits are put on a user's access to
the resources allocated
Forked Child No Not applicable as user has exclusive access
Management to node.
Job Scheduling Yes
Priority
Process Management No Each user has exclusive access to the nodes
allocated to them
Job Scheduling Control Yes
GUI/Command-line Command-line GUI planned
Ease of Use Unknown
User Allocation of Job Limited Direct job to a queue
User Job Status Query Yes Command-line
Job Statistics Yes Limited - extensions planned
Runtime Configuration Yes Queue needs to be turned off, reconfigured
and turned back on.
Dynamic Resource Pool Yes
Single Point of Yes Scheduler needs to write to the filesystem
Failure
Fault Tolerance No Jobs need to be rescheduled after a failure.
Security Issues Yes Normal Unix security.
[Editor's Note: the next article update will include the new EASYLL, a combination of EASY and LoadLeveler.]
Contact: David Lifka
Cornell Theory Center
Frank H.T. Rhodes Hall
Hoy Road, Cornell University
Ithaca, NY 14853-3801, USA
Tel: +1 (607) 254-8621
Fax:
Email: lifka@tc.cornell.edu
Systems Supported:
IBM SP1 and SP2
DEC Alpha Farm
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research University of Liverpool, UK
Cost PD
User Support Yes Telephone/Email - UK JISC funded support
Heterogeneous Some Only implemented on Suns at the moment
Platforms Several Sun - SGI and HP (beta)
Operating Systems Several SunOS and Solaris (Irix and HP-UX beta)
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes BSD Sockets
Parallel Support Yes NAS-HPF & PVM
Queue Types No
Dispatching policy Yes Automatic or manual
Impact on w/s Owner Configurable Console user has priority - non "owner" jobs
killed by default
Impact on Cluster w/s Yes CPU/RAM/diskspace
Load Balancing Limited Based on w/s loads - can be manually
overridden.
Check Pointing No Deliberately omitted
Process Migration No Deliberately omitted
Job Monitoring and No User initiated (manual)
Rescheduling
Suspension/Resumption No User initiated (manual)
of Jobs
Resource Administration Some Static database
Job Runtime Limits No Normal Unix can be invoked manually
Forked Child Management No
Job Scheduling Priority No
Process Management No
Job Scheduling Control No
GUI/Command-line Command-line
Ease of Use Unknown
User Allocation of Job Yes
User Job Status Query Yes User initiated (manual)
Job Statistics No Administrator can add Unix statistics, but
not standard
Runtime Configuration Limited Editing far database files
Dynamic Resource Pool Limited Requires editing the master database
Single Point of Failure Yes Master daemon
Fault Tolerance No When the master daemon dies, far needs to be restarted.
Security Issues Yes Normal Unix
Contact: J Steve Morgan
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research University of Sheffield
Cost PD GNU License
User Support Yes Supported by the University of Sheffield
(JISC-NTI) until July '96
Heterogeneous Yes
Platforms Most Major IBM, Fujitsu, HP, SGI, Intel, NCR, Sun, DEC &
Cray
Operating Systems Most Major AIX, UXP/M, HP-UX, IRIX, Linux, Solaris,
SunOS, Ultrix, OSF/1 & UNICOS
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes NQS needs to be configured to send stdin/err
to file.
Parallel Support No Not yet.
Queue Types Yes One on each server, minimal Unix
configuration.
Dispatching policy Static Each queue knows about its own load and
performance
Impact on w/s Owner Yes Queues can be day/night sensitive, owner can
"nice" jobs
Impact on Cluster w/s Yes CPU/RAM/Diskspace
Load Balancing Static Master scheduling (option), only knows perf.
and load at each queue
Check Pointing No
Process Migration No
Job Monitoring and Yes
Rescheduling
Suspension/Resumption Yes Supports normal Unix signals
of Jobs
Resource Administration Yes Normal Unix
Job Runtime Limits Yes Normal Unix
Forked Child Management No
Job Scheduling Priority Yes
Process Management Yes Manage the number of jobs run (one or many)
Job Scheduling Control Yes
GUI/Command-line Command-line WWW based interface planned.
Ease of Use Unknown
User Allocation of Job Yes
User Job Status Query Yes
Job Statistics Yes Amount depends on the platform running job
Runtime Configuration Yes Dynamic
Dynamic Resource Pool Yes Dynamic
Single Point of Failure Yes/No Yes if master scheduler, No if just configured
with "peer" queues.
Fault Tolerance Yes Queue will try to run and complete the job
after a crash
Security Issues Yes Normal Unix
Contact: Stuart Herbert
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research Ballistics Research Laboratory
Cost PD
User Support Some
Heterogeneous Yes
Platforms Some Sun Sparc +??
Operating Systems Some SunOS 4.1.x, BSD 4.2, SYS III & V
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes Redirected I/O and sockets ?
Parallel Support No
Queue Types Multiple
Dispatching policy Unknown
Impact on w/s Owner Unknown
Impact on Cluster w/s Normal
Load Balancing Probably Master scheduler probably does this task
Check Pointing No
Process Migration No
Job Monitoring and Unknown/Yes On failure job will be rescheduled
Rescheduling
Suspension/Resumption Yes BSD signals
of Jobs
Resource Administration Yes Via multiple queues
Job Runtime Limits Probably Not known if enforced
Forked Child Management No
Job Scheduling Priority Probably
Process Management Unknown
Job Scheduling Control Probably
GUI/Command-line Command-line
Ease of Use Unknown
User Allocation of Job Yes
User Job Status Query Probably
Job Statistics Probably
Runtime Configuration Unknown
Dynamic Resource Pool Unknown
Single Point of Yes
Failure
Fault Tolerance Yes On reboot, system will attempt to re-run job.
Security Issues Yes Normal Unix
Contact: Mike Muuss
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research NASA Ames & LLNL
Cost PD
User Support Yes Telephone/Email/Mailing List - NASA Ames
Heterogeneous Yes
Platforms Multiple Sun, SGI, IBM, Intel, Thinking Machines and
Cray
Operating Systems Multiple SunOS, Solaris, Aix, Intel-OSF/1, CMOST and
Unicos
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes stdin/stderr via Xterm
Parallel Support Yes On Parallel machines - interfaces
Queue Types Multiple Definable by administrator
Dispatching policy Configurable
Impact on w/s Owner Yes Configurable - mouse/keyboard/resources
available
Impact on Cluster w/s Yes CPU/RAM/Swap
Load Balancing Yes
Check Pointing No Vendor specific - reliant on Posix 1003.1a
Process Migration No
Job Monitoring and Yes
Rescheduling
Suspension/Resumption Yes
of Jobs
Resource Yes
Administration
Job Runtime Limits Yes
Forked Child Management No
Job Scheduling Yes
Priority
Process Management Yes Configure for exclusive CPU usage
Job Scheduling Control Yes
GUI/Command-line Command-line Tcl/Tk planned for next release
Ease of Use Unknown
User Allocation of Job Yes To specific queue
User Job Status Query Yes
Job Statistics Yes
Runtime Configuration Yes
Dynamic Resource Pool Yes
Single Point of Unsure Not sure about master scheduler...
Failure
Fault Tolerance Minimised Restart jobs after failure - not sure about
crashes
Security Issues Yes Normal Unix, trusted clients and Kerberos-type
authorisation
Contact: Dave Tweten
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research Information Sciences Institute, University of
S. California
Cost PD Free to non-commercial sites (License
Agreement)
User Support Yes Telephone/Email/WWW
Heterogeneous Yes but ! Only supports two platforms
Platforms Sun & HP Sun and HP
Operating Systems Yes SunOS and HP-UX
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes I/O redirected to user's terminal
Parallel Support Yes CMMD, PVM, and Express - MPI planned
Queue Types Yes
Dispatching policy No
Impact on w/s Owner No Configurable + migrate processes off w/s
Impact on Cluster w/s Yes Diskspace and Network
Load Balancing No
Check Pointing Yes Taken from Condor
Process Migration Yes Taken from Condor
Job Monitoring and Yes/No No automatic rescheduling.
Rescheduling
Suspension/Resumption Some Processes can be suspended individually.
of Jobs
Resource Administration Yes Through "system manager" software
Job Runtime Limits No None imposed
Forked Child Management No
Job Scheduling No
Priority
Process Management Yes/No Yes, but when w/s owner returns job will be
suspended and migrated
Job Scheduling Control No
GUI/Command-line command-line
Ease of Use Unknown
User Allocation of Job Yes Using job configuration file
User Job Status Query Yes
Job Statistics Some
Runtime Configuration Yes
Dynamic Resource Pool Yes
Single Point of Yes/No Failed resources will be acquired by other
Failure system managers
Fault Tolerance No
Security Issues Yes Normal Unix, plans for Kerberos-type
authentication
Contact: Santosh Roa
Functionality Supported Comments
------------- --------- --------
Commercial/Research Research Alan Saunders
Cost PD
User Support None
Heterogeneous Yes Limited multi-platform support
Platforms Several Sun, DEC and IBM
Operating Systems Several SunOS, Ultrix and Aix
Additional No
Hardware/Software
Batch Jobs Yes
Interactive Support Yes Sent to queue monitor, but can be
reconfigured.
Parallel Support No
Queue Types Multiple One or more queues per server
Dispatching policy Static Configured when queue is started
Impact on w/s Owner Yes Queue can be configured with low-priority
Impact on Cluster w/s Yes CPU/Memory/Diskspace
Load Balancing Static Seems necessary to send jobs to specific queues
Check Pointing No
Process Migration No
Job Monitoring and Yes
Rescheduling
Suspension/Resumption Yes Normal BSD signals supported
of Jobs
Resource Administration Static Configured at startup time.
Job Runtime Limits Yes Unix limits supported
Forked Child No
Management
Job Scheduling Yes Prioritise queues
Priority
Process Management Probably
Job Scheduling Control Yes
GUI/Command-line command-line
Ease of Use unknown
User Allocation of Job Yes
User Job Status Query Yes
Job Statistics Yes Amount unknown, but probably normal Unix
accounting.
Runtime Configuration Static
Dynamic Resource Pool Static
Single Point of No Multiple queues - each knows nothing about
Failure the others
Fault Tolerance Yes If a machine crashes, jobs in the queue are re-run
Security Issues Yes Normal Unix
Contact: Alan Saunders
The remaining five systems (Codine, Connect:Queue, LoadLeveler, LSF and NQE) are functionally very similar. All five packages support parallel jobs, but at present LoadLeveler(3) only supports parallel jobs on IBM SP2 systems rather than workstation clusters, which eliminates it from our shortlist. The other packages all support PVM, but none mentions future support for MPI or HPF.
If checkpointing is deemed to be important then Connect:Queue, LSF and NQE are eliminated from the shortlist. The authors of this report take the view that checkpointing is a useful additional feature rather than a highly desirable one.
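For readers unfamiliar with the feature, the sketch below shows in the simplest terms what checkpointing buys: a job periodically saves enough state to resume after a crash or a migration to another workstation. Packages such as Codine, LoadLeveler, DQS and Condor achieve this transparently by relinking the application with their own libraries; the explicit, application-level version here is only a hypothetical illustration of the idea, with an invented checkpoint file name and workload.

    import os
    import pickle

    CHECKPOINT = "job.ckpt"   # hypothetical checkpoint file name

    def run(total_steps=1_000_000, save_every=10_000):
        # Resume from the last checkpoint if one exists, e.g. after a crash
        # or after the job was migrated to another workstation.
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT, "rb") as f:
                step, result = pickle.load(f)
        else:
            step, result = 0, 0.0

        while step < total_steps:
            result += step * 1e-6          # stand-in for the real computation
            step += 1
            if step % save_every == 0:
                with open(CHECKPOINT + ".tmp", "wb") as f:
                    pickle.dump((step, result), f)
                os.rename(CHECKPOINT + ".tmp", CHECKPOINT)   # atomic replace

        return result

    if __name__ == "__main__":
        print("final result:", run())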
If security is a major issue then only LSF and NQE have security features over and above normal Unix features. Both packages support Kerberos-type user authentication, but NQE additionally has a US DoD security rating.
Of the remaining packages, Connect:Queue, LSF and NQE all use multiple master schedulers to minimise problems associated with Single Points of Failure - and a similar feature is planned for the next release of Codine. All the packages claim to maximise their resilience and fault tolerance.
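The table entries above (LSF's master scheduler being "re-elected by slave schedulers", NQE's multiple schedulers) give a flavour of how running more than one master removes the single point of failure. The sketch below is a deliberately naive illustration of such a fail-over rule; the host list, port number and liveness probe are assumptions for the example and do not describe any of these packages' actual protocols.

    import socket

    SCHEDULER_HOSTS = ["sched1", "sched2", "sched3"]   # hypothetical ranked list
    SCHEDULER_PORT = 6790                              # hypothetical daemon port

    def is_alive(host, timeout_s=2.0):
        """Crude liveness probe: can we open a TCP connection to the scheduler
        daemon's port? Real packages use their own heartbeat protocols."""
        try:
            with socket.create_connection((host, SCHEDULER_PORT), timeout_s):
                return True
        except OSError:
            return False

    def elect_master():
        """Every surviving scheduler applies the same rule, so they all agree:
        the highest-ranked host that is still alive becomes (or stays) master."""
        for host in SCHEDULER_HOSTS:
            if is_alive(host):
                return host
        return None   # no scheduler reachable: jobs must wait or queue locally

    if __name__ == "__main__":
        print("current master scheduler:", elect_master())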
Conclusion
Four packages (3) remain on our shortlist (Codine, Connect:Queue, LSF and NQE) out of the original seven. It is difficult to reduce the list further without installing each package and testing its functionality, robustness and stability in detail. However, an additional consideration is the cost of a site license and software maintenance support for each package.
Five packages (Batch, Condor, GNQS, MDQS and Qbatch) do not support parallel jobs, a feature these authors deem to be highly desirable. Support for parallel jobs is planned in both Condor and GNQS.
Three packages, EASY, far and PRM, have limited functionality under the Job Scheduling and Allocation Policy section (see section 2.4). EASY, at present, has no concept of load balancing; however, it is planned for a future release. far is able to accomplish some load balancing, but it appears to be rather crude (based on rup) and its usefulness on a large, diverse cluster would be limited. PRM has no dispatching or load balancing capability, even though it is capable of migrating processes. All three packages are incapable of suspending, resuming or rescheduling jobs without intervention at the process level by a systems administrator.
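To make the point about far concrete: rup reports each host's uptime and load averages, so a dispatcher built on it essentially just picks the host whose recent load is lowest, ignoring memory, job mix and relative machine speed. The Python sketch below parses rup-style output; the output format assumed by the regular expression varies between systems and is an assumption of the example, not part of far.

    import re
    import subprocess

    def host_loads(rup_output):
        """Parse rup-style lines such as
            ws1  up  4 days,  2:15,  load average: 0.15, 0.10, 0.05
        and return a {host: one_minute_load} mapping."""
        loads = {}
        for line in rup_output.splitlines():
            match = re.match(r"\s*(\S+)\s+up\s+.*load average:\s*([\d.]+)", line)
            if match:
                loads[match.group(1)] = float(match.group(2))
        return loads

    def least_loaded_host():
        """Pick the host with the lowest one-minute load average right now."""
        out = subprocess.run(["rup"], capture_output=True, text=True).stdout
        loads = host_loads(out)
        return min(loads, key=loads.get) if loads else None

    if __name__ == "__main__":
        print("dispatch next job to:", least_loaded_host())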
Conclusion
Two packages (DQS and PBS) remain on the selection list out of the original twelve. It is difficult to reduce the list further without actually installing and testing the functionality, robustness and stability of each package. A key factor in choosing between these two remaining packages would be the maturity of the software, how widespread its usage is, and how much user support could be expected from the authors or supporting site.
If sequential jobs, rather than parallel ones, are the overriding concern then Condor and GNQS should be the packages of choice. GNQS in particular is a mature, robust and widely used package. Support for it can be found at a number of sites, including CERN and the University of Sheffield.
(3)
Author's Footnote - LoadLeveler - 13 June 1996 - Mark Baker mab@npac.syr.edu
Since this article was written there have been several revisions of
LoadLeveler from IBM.
LoadLeveler 1.2.x supports parallel jobs running on heterogeneous
distributed systems. It should be noted that, with this functionality,
LoadLeveler would be one of the commercial packages shortlisted in this
section if the review were rewritten today.