NHSE Review™ 1997 Volume First Issue

A Survey of MPI Implementations



Sun

Sun's MPI implementation is quite recent. Version 1 was a repackaged MPICH; version 2 is in beta release as of this writing and should be generally available soon.

Version 2 is also derived from MPICH, but it has been integrated with a new Sun HPC environment and optimized for SCI (Scalable Coherent Interface), although it can run over any network using TCP. The Sun HPC environment is layered software that includes parallel job management: users can launch (tmrun), examine (tmps), or kill (tmkill) parallel jobs. It offers considerable flexibility in specifying where jobs are started, how standard input and output are handled (roughly the same functionality IBM MPI provides, plus a little more), and so on.

Because this implementation is at such an early stage, its strengths and weaknesses are difficult to assess. Two potentially exciting developments stand out, however. The first is the Prism debugger. Originally developed by Thinking Machines Corporation, Prism won nearly universal praise for its excellent user interface and for its built-in performance and visualization features. In its new incarnation, Prism can debug HPF applications as well as MPI applications. The second is the MPI-2 I/O library. This reviewer was not able to test this MPI-2 functionality or the parallel file system associated with it, but much of the work appears to have been done. Providing this functionality puts Sun, which has historically lagged other vendors in MPI support, in a leading position in MPI development.

On the other hand, the robustness of the HPC package has not yet been demonstrated. A quick test of the beta software revealed bugs: for instance, Prism sometimes displayed breakpoints incorrectly, and the software lost track of which parallel jobs were running. The process management software does not appear capable of good process placement, and there is no adequate batch queuing system. Nor is there any special scheduling of parallel jobs, such as gang scheduling or a mechanism for dedicating processors to a single application. Load Sharing Facility (LSF), which ships with the Sun HPC software, is not integrated with Sun's process management tools and in any case has not yet proven effective at managing parallel jobs. Finally, because Sun MPI is derived from an earlier version of MPICH, it is not compliant with the standard with respect to MPI_Cancel and Fortran constants, as described above.

Copyright © 1996

