26-Jul-96 In section 4.3, "A Discussion About the CMS Packages Reviewed", the statement "If checkpointing is deemed to be important then Connect:Queue, LSF and NQE are eliminated" is not true. Checkpointing has been supported by LSF since LSF 2.0, at both kernel and user level. In the upcoming LSF 3.0, we will even support checkpointing for parallel (MPI-based) jobs.
Please also correct section 4.1.5:
>Check Pointing No Planned in future releases
We support checkpointing, and will support it even for parallel jobs.
>User Allocation of Job Yes Limited
I don't understand "limited". Users can say -R "type=hppa && swap>20 && !fileserver". No other system provides up to 32 load indices for the user to specify.
Thanks,
Jingwen Wang
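The resource requirement string quoted in the letter above (type=hppa && swap>20 && !fileserver) can be read as a boolean filter over per-host attributes. The following is only a minimal sketch of that idea in Python, not LSF's implementation: the host table, the function names and the tiny grammar (terms joined by `&&`, each term a bare name, `!name`, `name=value`, `name>number` or `name<number`) are all assumptions made for illustration.

```python
import re

# Hypothetical host table; names and values are invented for illustration.
HOSTS = {
    "hostA": {"type": "hppa", "swap": 64, "fileserver": False},
    "hostB": {"type": "hppa", "swap": 12, "fileserver": False},
    "hostC": {"type": "hppa", "swap": 128, "fileserver": True},
    "hostD": {"type": "sparc", "swap": 256, "fileserver": False},
}

# One term: optional "!", a name, and an optional comparison.
TERM = re.compile(r"^(!?)\s*(\w+)\s*(?:(=|>|<)\s*(\w+))?$")


def term_matches(term, host):
    """Evaluate a single term such as 'swap>20' against one host's attributes."""
    m = TERM.match(term.strip())
    if m is None:
        raise ValueError(f"unsupported term: {term!r}")
    negated, name, op, value = m.groups()
    actual = host.get(name)
    if op is None:                      # bare name: treat as a boolean resource
        result = bool(actual)
    elif op == "=":                     # string equality, e.g. type=hppa
        result = str(actual) == value
    elif actual is None:                # unknown index never satisfies < or >
        result = False
    elif op == ">":
        result = actual > float(value)
    else:                               # op == "<"
        result = actual < float(value)
    return (not result) if negated else result


def select_hosts(requirement, hosts):
    """Return the sorted names of hosts that satisfy every '&&'-joined term."""
    terms = requirement.split("&&")
    return sorted(
        name for name, attrs in hosts.items()
        if all(term_matches(t, attrs) for t in terms)
    )


if __name__ == "__main__":
    print(select_hosts("type=hppa && swap>20 && !fileserver", HOSTS))
```

With the invented table above, only hostA passes all three terms: hostB has too little swap, hostC is flagged as a file server, and hostD is of a different type.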
26-Jul-96 It was most gratifying to see that the 1982 paper on MDQS that Doug Kingston and I wrote was given such substantial coverage in your "Cluster Management Software" review. The two main MDQS papers have both been converted to HTML and are available on the Web at: http://ftp.arl.mil/~mike/papers/82usenix-mdqs/ There are still thousands of computers here at the Lab running MDQS. Support versions include:
As for lineage, MDQS was the inspiration for NQS, developed for NASA Ames by Sterling Software, and the main queueing system on Cray computers for many years. While MDQS does offer over-the-network queueing of jobs, it does not provide the level of visibility and control that large computer centers require these days. However, the simplicity and portability of the software have given it a long and productive life.
Best of luck with your new electronic journal. From the quality of the first issue it promises to be a success!
Sincerely,
Michael John Muuss
Senior Computer Scientist
The U. S. Army Research Laboratory
http://ftp.arl.mil/~mike/
01-Aug-96 Could you please update the following information concerning CCS (v2.0)?
1) Please update the CCS short description in chapter 3.3.2 Computing Center Software:
The Computing Center Software (CCS) is a Metacomputer Management Software [1000]. It provides a homogeneous interface to a network of MPP systems.
CCS is a distributed software package [16] for the management of parallel high performance computing systems. It establishes a seamless environment with transparent access to a pool of parallel machines. CCS is responsible for user authorization and accounting, for serving a mixture of interactive and batch jobs, and for optimized request scheduling [17 & 1001]. It provides an automatic reservation system and supports unstable WAN connections.
CCS is implemented as multi-agent software operating in the Unix front-end area of the HPC systems to be managed. It was designed to be as independent as possible of both the machines managed and their operating systems.
Main Features of CCS include:
2) Please add to the references:
1000:
Friedhelm Ramme,
Building a Virtual Machine-Room - a Focal Point in Metacomputing,
Future Generation Computer Systems (FGCS), Elsevier Science B.V., Aug. 1995,
Special Issue on HPCN, Vol 11, pp. 477-489
1001:
Jörn Gehring, Friedhelm Ramme,
Architecture-Independent Request-Scheduling with Tight Waiting-Time Estimations,
Proc. IEEE IPPS Workshop on Job Scheduling Strategies for Parallel Processing,
April 1996, Hawaii, pp. 41-54;
also to be published by Springer-Verlag in Lecture Notes in Computer Science
3) Please update chapter 4.2.2
Platforms                 Sun, Parsytec Sparc, Parsytec GCel, GCPP, SC320, PowerXPlorer
Operating Systems         Many    SunOS, Solaris, Parix
Parallel Support          Yes     PVM, MPI, Parmacs, Inmos Toolsets
Job Scheduling            Yes     Several Algorithms (FCFS*, FFDH, FFIH, IVS)
User Allocation of Job    Yes     Automatic reservation system
Backend: Parsytec -GCel, -GCPP, PowerXPlorer
Thank you in advance
Friedhelm Ramme