NHSE Review™ 1996 Volume, First Issue

Cluster Management Software -- People Talk Back (Comments)


Index of Comments, by Subject

  1. Cluster Management Software: LSF: vendor
  2. Cluster Management Software: MDQS: vendor
  3. Cluster Management Software: CCS: vendor

Comments, by index number

  1. Cluster Management Software: LSF
    26-Jul-96
    
       In 4.3: A Discussion About the CMS Packages Reviewed
    
    "If checkpointing is deemed to be important then Connect:Queue, LSF
    and NQE are eliminated", is not true. Checkpointing has been supported
    by LSF since LSF 2.0, both at kernel and at user level. In the
    upcoming LSF 3.0, we are even supporting check-pointing for parallel
    jobs! (MPI based).
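
       For illustration, a checkpointable job can be submitted and later
       checkpointed or restarted with the usual LSF commands (a sketch only;
       the checkpoint directory, period and job ID below are made-up values):

            bsub -k "/tmp/ckptdir 30" myjob    # checkpoint every 30 minutes
            bchkpnt -k 1234                    # checkpoint job 1234, then kill it
            brestart /tmp/ckptdir 1234         # restart it from the saved state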
    
       Please also correct section 4.1.5: 
    
    >Check Pointing          No             Planned in future releases
    
       We support checkpointing, and will support it even for parallel jobs.
    
    >User Allocation of Job  Yes            Limited
    
       I don't understand "limited". Users can say 
    
            -R "type=hppa && swap>20 && !fileserver"
    
       No other system provides up to 32 load indices for users to specify.
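
       For instance, to submit a job restricted to such a host, one would
       write something like the following (the job name is only illustrative):

            bsub -R "type=hppa && swap>20 && !fileserver" my_batch_job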
    
     Thanks
    
    Jingwen Wang
    
  2. Cluster Management Software: MDQS
    26-Jul-96
    
    It was most gratifying to see that the 1982 paper on MDQS that Doug
    Kingston and I wrote was given such substantial coverage in your
    "Cluster Management Software" review.
    
    The two main MDQS papers have both been converted to HTML and are
    available on the Web at:
    
    http://ftp.arl.mil/~mike/papers/82usenix-mdqs/
    
    There are still thousands of computers here at the Lab running MDQS.
    Supported versions include:
    
    As for lineage, MDQS was the inspiration for NQS, which was developed
    for NASA Ames by Sterling Software and was the main queueing system
    on Cray computers for many years.
    
    While MDQS does offer over-the-network queueing of jobs, it does not
    provide the level of visibility and control that large computer
    centers require these days.  However, the simplicity and portability
    of the software have given it a long and productive life.
    
    Best of luck with your new electronic journal.  From the quality of
    the first issue it promises to be a success!
    
            Sincerely,
            Michael John Muuss
            Senior Computer Scientist
            The U. S. Army Research Laboratory
    
    http://ftp.arl.mil/~mike/
    
  3. Cluster Management Software: CCS
    01-Aug-96
    
    Could you please update the following information concerning CCS (v2.0)?
    
    1) Please update the CCS short description in chapter 3.3.2 Computing Center Software:

    The Computing Center Software (CCS) is metacomputer management software [1000]. It provides a homogeneous interface to a network of MPP systems.

    CCS is a distributed software package [16] for the management of parallel high-performance computing systems. It establishes a seamless environment with transparent access to a pool of parallel machines. CCS is responsible for user authorization and accounting, for serving a mixture of interactive and batch jobs, and for optimized request scheduling [17 & 1001]. It provides an automatic reservation system and supports unstable WAN connections.

    CCS is implemented as multi-agent software operating in the Unix front-end area of the HPC systems to be managed. It was designed to be as independent as possible of both the managed machines and their operating systems.

    Main Features of CCS include:

    2) Please add to the references:

    1000: Friedhelm Ramme, Building a Virtual Machine-Room - a Focal Point in Metacomputing, Future Generation Computer Systems (FGCS), Elsevier Science B.V., Aug. 1995, Special Issue on HPCN, Vol. 11, pp. 477-489.

    1001: Jörn Gehring, Friedhelm Ramme, Architecture-Independent Request-Scheduling with Tight Waiting-Time Estimations, Proc. IEEE IPPS Workshop on Job Scheduling Strategies for Parallel Processing, Hawaii, April 1996, pp. 41-54; also to be published by Springer-Verlag in Lecture Notes in Computer Science.

    3) Please update chapter 4.2.2:

    Platforms               Sun, Parsytec   Sparc, Parsytec GCel, GCPP, SC320, PowerXPlorer
    
    Operating Systems       Many            SunOS, Solaris, Parix
    
    Parallel Support        Yes             PVM, MPI, Parmacs, Inmos Toolsets
    
    Job Scheduling          Yes             Several Algorithms (FCFS*, FFDH, FFIH, IVS)
    
    User Allocation of Job  Yes             Automatic reservation system
    
    Backend: Parsytec -GCel, -GCPP, PowerXPlorer
    
    Thank you in advance
    
       Friedhelm Ramme