
second.c in detail -- data reductions

 

We have already discussed MPI_Bcast in section 3.4.3, so the functioning of the program up to the MPI_Bcast call should be clear. The computation itself is straightforward. Every process finds the length of a subinterval, initialises sum to 0, and adds up the values of f(x) which correspond to its own subintervals. The only slightly confusing point in this computation may be that instead of working on one simply connected subset of the domain, each process jumps ahead by pool_size subintervals at every step, skipping over the subintervals that belong to other processes. Thus the subintervals a given process works on are interleaved with the subintervals belonging to the other processes.
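A minimal sketch of such an interleaved loop, assuming the interval [0,1] is cut into num_intervals subintervals and that each process keeps its own rank in a variable my_rank (both names are assumptions; only pool_size and mypi appear in the text above), might look like this:

h   = 1.0 / (double) num_intervals;     /* width of one subinterval        */
sum = 0.0;
/* start at this process's own subinterval, then jump ahead by
   pool_size subintervals at every step                                    */
for (i = my_rank + 1; i <= num_intervals; i += pool_size) {
   x = h * ((double) i - 0.5);          /* midpoint of the subinterval     */
   sum += 4.0 / (1.0 + x * x);          /* f(x) = 4/(1+x*x)                */
}
mypi = h * sum;                         /* this process's area under f(x)  */

With f(x) = 4/(1+x*x) the total area over [0,1] is pi, which is why the partial result is called mypi.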

At the end of this computation each process puts its own area under the curve of f(x) in mypi. All processes then call

MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, host_rank,
           MPI_COMM_WORLD);

This operation is discussed in ``Using MPI...'' on page 25, and in ``MPI: A Message-Passing Interface Standard'' on page 111, section 4.9.

All processes which issue this operation pack their data into mypi and send it to the process whose rank is given by the 6th argument: host_rank in our case. The type of the data placed in mypi is given by the 4th argument, and the number of data items in mypi by the 3rd argument.

The process whose rank is host_rank receives the data from all processes (including itself) and performs a reduction operation on that data. The reduction operation is specified by the 5th argument. Here it is MPI_SUM, i.e., a summation. The result is placed in the buffer given by the 2nd argument, pi. Only processes which belong to the communicator MPI_COMM_WORLD, specified by the last argument to MPI_Reduce, participate in the transaction.
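Note that only the receiving process ends up with a meaningful value in pi; on all other processes the contents of pi are undefined after the call. A typical follow-up, sketched here under the assumption that each process keeps its own rank in a variable my_rank, is to print the result on that process only:

if (my_rank == host_rank) {
   printf("pi is approximately %.16f\n", pi);
}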

MPI does not actually specify who is to perform the final summation. On some supercomputers there may be special circuitry within the network itself for doing such things. For example, on the Connection Machine CM5, integer reductions are performed by the network, whereas floating point reductions are performed by the destination node. CM5 network reductions are much faster than node reductions. But on the farms it is almost certain that the reduction will be performed by the destination node.

There is a large number of predefined reduction operations: MPI_SUM used above, MPI_PROD, MPI_MIN, MPI_MAX, and so on. These are discussed on page 113 of ``MPI: A Message-Passing Interface Standard'', section 4.9.2. The programmer can also define her own reduction operations using the function MPI_Op_create, discussed in section 4.9.4, page 118.
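As an illustration only (nothing like this appears in second.c), a user-defined operation could be created and used as follows; the argument list of abs_max is the one MPI prescribes for user reduction functions:

/* element-wise maximum of absolute values */
void abs_max(void *invec, void *inoutvec, int *len, MPI_Datatype *type)
{
   double *in = (double *) invec, *inout = (double *) inoutvec;
   int i;
   for (i = 0; i < *len; i++) {
      double a = (in[i] < 0.0) ? -in[i] : in[i];
      double b = (inout[i] < 0.0) ? -inout[i] : inout[i];
      inout[i] = (a > b) ? a : b;
   }
}

MPI_Op op;

MPI_Op_create(abs_max, 1, &op);    /* 1 means the operation commutes    */
MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, op, host_rank, MPI_COMM_WORLD);
MPI_Op_free(&op);                  /* release the operation handle      */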






Zdzislaw Meglicki
Tue Feb 28 15:07:51 EST 1995