NHSE Review 1997 Volume First Issue: A Survey of MPI Implementations -- Introduction and History

NHSE Review^TM 1997 Volume First Issue

A Survey of MPI Implementations

The `ch_p4` device

This is the ``network of workstations'' implementation of MPICH. P4 (Portable Programs for Parallel Processors) is an older message passing library that was used to implement the MPICH ADI[9]. The ``ch'' in ``ch_p4'' stands for ``channel.'' The ADI is in fact implemented in terms of a simpler ``channel'' interface, and the channel interface is implemented in terms of P4. The layering is not strict.

The ch_p4 device is characterized by the following.

P4 runs on Sun/SunOS, Sun/Solaris, Solaris86, Cray, HP, Dec 5000, Dec Alpha, Next, IBM RS6000, Linux86, FreeBSD, IBM3090, SGI (5, 6), and others.
The device uses process-to-process sockets between processes on different hosts and, when configured with the ``-comm=shared'' flag, shared memory between processes on the same host.
The user lists the programs to run, and the machines on which to start them, in a P4 ``procgroup'' file. P4 starts remote processes using rsh (or, optionally, a ``secure server'' that provides faster startup). I/O and signal propagation are handled by rsh. P4 processes start a ``listener'' subprocess that helps to establish process-to-process connections if there aren't enough TCP connections to fully connect the MPI application.
An interesting feature of the ch_p4 device is that the user starts up a single process, and that process starts the other MPI processes inside MPI_Init. To do this, ch_p4 relies on the argc and argv arguments to MPI_Init.
The ch_p4 device handles heterogeneous MPI applications -- applications with processes running on more than one architecture. Data representation conversion, if needed, is automatically performed.
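
As an illustration of the procgroup file mentioned above, the following C sketch formats one into a buffer. The layout it emits (a first line ``local n'' giving the number of additional processes on the starting host, followed by one ``host count program'' line per remote machine) is an assumption based on common descriptions of P4 and should be checked against the P4 documentation. The pg_entry type and the format_procgroup function are hypothetical names, not part of P4 or MPICH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One remote entry of a procgroup description (hypothetical type). */
struct pg_entry {
    const char *host;    /* machine on which to start processes */
    int nprocs;          /* number of processes to start there */
    const char *program; /* full path to the executable on that machine */
};

/*
 * Format procgroup text into buf, which holds cap bytes.
 * Assumed layout: "local <n>", then one "<host> <nprocs> <program>"
 * line per remote entry.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if buf is too small.
 */
int format_procgroup(char *buf, size_t cap, int local_extra,
                     const struct pg_entry *entries, size_t n)
{
    assert(buf != NULL);

    int len = snprintf(buf, cap, "local %d\n", local_extra);
    if (len < 0 || (size_t)len >= cap)
        return -1;

    size_t used = (size_t)len;
    for (size_t i = 0; i < n; i++) {
        len = snprintf(buf + used, cap - used, "%s %d %s\n",
                       entries[i].host, entries[i].nprocs,
                       entries[i].program);
        if (len < 0 || (size_t)len >= cap - used)
            return -1;
        used += (size_t)len;
    }
    return (int)used;
}
```

Called with no extra local processes and a single entry for a made-up host nodeA running one copy of /home/user/a.out, the buffer would hold the two lines ``local 0'' and ``nodeA 1 /home/user/a.out''.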

While the ch_p4 device provides a way to run MPICH on networks of workstations, it is not very friendly to users, for the following reasons.

The procgroup file is difficult to work with, and its documentation is not easy to find. Fortunately, the complexity is often hidden behind local utilities or an ``mpirun'' command.
There is no concept of a ``virtual machine.'' Unlike PVM, the network of workstations used by an application is defined by where the application is running, not by an infrastructure that exists before and persists afterwards. Consequently, there are no ``ps'' or ``kill'' equivalents that understand parallel jobs, and no automatic way to examine the state of remote nodes or perform load balancing. The lack of such infrastructure also contributes to the signal propagation and I/O problems described below. In some cases, the lack of machine state is a bonus, particularly when MPI programs are started automatically by a batch system.
Because signal propagation is managed through rsh, it is very easy to end up with ``orphaned'' processes that don't realize the rest of an application has gone away. These orphaned processes often interfere with the running of subsequent parallel jobs and are difficult to find.
Because standard I/O relies on rsh, output from remote nodes is often heavily buffered, and doesn't appear on the screen until well after it is written. This can make debugging with printf very difficult.
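
One part of the output delay described above can come from the program's own C library, which typically buffers stdout fully when it is not attached to a terminal. The sketch below is a common workaround, not something provided by MPICH or P4: a hypothetical debug_printf helper that flushes stdout after every message. It cannot remove any buffering that happens elsewhere along the rsh path.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * printf-style helper (hypothetical name) that flushes stdout after
 * each message, so the text leaves the process's stdio buffer at once.
 * Returns the number of characters printed, as vprintf does.
 */
int debug_printf(const char *fmt, ...)
{
    assert(fmt != NULL);

    va_list ap;
    va_start(ap, fmt);
    int n = vprintf(fmt, ap);
    va_end(ap);

    fflush(stdout); /* push the message out of the local buffer now */
    return n;
}
```

A call such as debug_printf("rank %d\n", rank) can then replace printf in debugging code. Alternatively, calling setvbuf(stdout, NULL, _IONBF, 0) before any output is written makes every ordinary printf unbuffered, at some cost in performance.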
