Compiling and running ninth.c under LAM

To compile the program, type:

gustav@jupiter:~/mpi/advanced 235 $ hcc -o ninth ninth.c -lmpi -lm
gustav@jupiter:~/mpi/advanced 236 $

To run the program, type:

gustav@jupiter:~/mpi/advanced 226 $ mpirun -w N `pwd`/ninth -- `uname -n`
Greetings to the master (jupiter, 5) from (cisr, 0)
Greetings to the master (jupiter, 5) from (mercury, 1)
Greetings to the master (jupiter, 5) from (venus, 2)
Greetings to the master (jupiter, 5) from (earth, 3)
Greetings to the master (jupiter, 5) from (mars, 4)
Greetings to the master (jupiter, 5) from (saturn, 6)
Greetings to the master (jupiter, 5) from (uranus, 7)
Greetings to the master (jupiter, 5) from (bacchus, 8)
Greetings to the master (jupiter, 5) from (ceres, 9)
Greetings to the master (jupiter, 5) from (diana, 10)
Greetings to the master (jupiter, 5) from (minerva, 11)
Greetings to the master (jupiter, 5) from (vesta, 12)
76 particles per processor
total number of particles = 988
offsets: 0 76 152 228 304 380 456 532 608 684 760 836 912 
Communicating =  2.80414 seconds
particles[offsets[i]].x:  0.17083  0.04163  0.91243  0.78323  0.65404  0.52484
0.39564  0.26644  0.13725  0.00805  0.87885  0.74965  0.62046 
done my job in  4.27236 seconds, waiting for slow processes...
evaluated 975156 3D interactions in 10.20670 seconds
gustav@jupiter:~/mpi/advanced 227 $

Observe the following interesting point. The master process writes on standard output that it did its own job in 4.27236 seconds. There is a call to MPI_Barrier at the end of the program, and that barrier is crossed only when all processes have finished their computations. Thus the master process has to wait nearly 6 seconds for the slower processes before the program can finally exit.
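
The pattern at work here can be sketched as follows. This is not the actual code of ninth.c, merely a minimal illustration (with made-up variable names) of how each process can time its own work with MPI_Wtime and then time again after MPI_Barrier to see how long it sat waiting for the slower processes:

/*
 * A minimal sketch of the timing pattern discussed above -- not the
 * actual ninth.c.  Each process measures the time spent on its own
 * work, then measures again after MPI_Barrier to see how long it
 * waited for the other processes.
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
   int rank;
   double t0, t_work, t_barrier;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   t0 = MPI_Wtime();
   /* ... this process's share of the computation goes here ... */
   t_work = MPI_Wtime();

   MPI_Barrier(MPI_COMM_WORLD);   /* crossed only when everyone is done */
   t_barrier = MPI_Wtime();

   printf("rank %d: worked %.5f s, waited %.5f s at the barrier\n",
          rank, t_work - t0, t_barrier - t_work);

   MPI_Finalize();
   return 0;
}

Such a sketch would be compiled with hcc and run with mpirun exactly as shown above for ninth.c.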

If, during that time, we issue the LAM command state N in another window, we see the following:

gustav@jupiter:~/sph/src/sph11 226 $ state N
NODE      INDEX  PID      KPRI     KSTATE               PROGRAM
n0        [7]    14821    0        BR (MPI_Barrier)     ninth
n1        [7]    22474    0        BR (MPI_Barrier)     ninth
n2        [7]    2065     0        BR (MPI_Barrier)     ninth
n3        [7]    11730    0        BR (MPI_Barrier)     ninth
n4        [7]    8125     0        BR (MPI_Barrier)     ninth
n5 (o)    [8]    10341    0        BR (MPI_Barrier)     ninth
n6        [7]    1002     0        BR (MPI_Barrier)     ninth
n7        [7]    14609    0        BR (MPI_Barrier)     ninth
n8        [7]    7965     0        R  (-25, 1)          ninth
n9        [7]    2860     0        R  (-25, 1)          ninth
n10       [7]    7571     0        R  (-25, 1)          ninth
n11       [7]    7285     0        BR (MPI_Barrier)     ninth
n12       [7]    1777     0        BR (MPI_Barrier)     ninth
gustav@jupiter:~/sph/src/sph11 227 $

The three slow nodes are numbers 8, 9, and 10: bacchus, ceres, and diana. They are all SPARCstation 1 machines running at 20 MHz, with Weitek 3170-based FPUs that also run at 20 MHz. The other machines are Sun ELCs, whose clock rate is 33 MHz and whose Weitek 8601-based FPUs also run at 33 MHz, one Sun 670 MP with a 40 MHz CPU and a Cypress CY7C602-based FPU running at 40 MHz, and one Sun LX with a 50 MHz TI TMS390S10-based SuperSPARC.

There is therefore a large performance gap between the nodes, a factor of 50/20 = 2.5 in clock rate alone, and nodes 8, 9, and 10 are indeed the slowest of the lot. And it shows.



Zdzislaw Meglicki
Tue Feb 28 15:07:51 EST 1995