To compile the program, type:
gustav@jupiter:~/mpi/advanced 235 $ hcc -o ninth ninth.c -lmpi -lm
gustav@jupiter:~/mpi/advanced 236 $
To run the program, type:
gustav@jupiter:~/mpi/advanced 226 $ mpirun -w N `pwd`/ninth -- `uname -n`
Greetings to the master (jupiter, 5) from (cisr, 0)
Greetings to the master (jupiter, 5) from (mercury, 1)
Greetings to the master (jupiter, 5) from (venus, 2)
Greetings to the master (jupiter, 5) from (earth, 3)
Greetings to the master (jupiter, 5) from (mars, 4)
Greetings to the master (jupiter, 5) from (saturn, 6)
Greetings to the master (jupiter, 5) from (uranus, 7)
Greetings to the master (jupiter, 5) from (bacchus, 8)
Greetings to the master (jupiter, 5) from (ceres, 9)
Greetings to the master (jupiter, 5) from (diana, 10)
Greetings to the master (jupiter, 5) from (minerva, 11)
Greetings to the master (jupiter, 5) from (vesta, 12)
76 particles per processor
total number of particles = 988
offsets: 0 76 152 228 304 380 456 532 608 684 760 836 912
Communicating = 2.80414 seconds
particles[offsets[i]].x: 0.17083 0.04163 0.91243 0.78323 0.65404 0.52484 0.39564 0.26644 0.13725 0.00805 0.87885 0.74965 0.62046
done my job in 4.27236 seconds, waiting for slow processes...
evaluated 975156 3D interactions in 10.20670 seconds
gustav@jupiter:~/mpi/advanced 227 $
Observe the following interesting point. The master process writes
on standard output that it did its own job in 4.27236 seconds.
There is a call to MPI_Barrier
at the end of the program,
and that barrier is crossed only when all processes
have finished their computations. Thus the master process
has to wait nearly 6 seconds for the slower processes before the
program can finally exit.
If during that time we issue the LAM command state N
in another window, we see the following:
gustav@jupiter:~/sph/src/sph11 226 $ state N
NODE      INDEX   PID     KPRI   KSTATE             PROGRAM
n0        [7]     14821   0      BR (MPI_Barrier)   ninth
n1        [7]     22474   0      BR (MPI_Barrier)   ninth
n2        [7]     2065    0      BR (MPI_Barrier)   ninth
n3        [7]     11730   0      BR (MPI_Barrier)   ninth
n4        [7]     8125    0      BR (MPI_Barrier)   ninth
n5 (o)    [8]     10341   0      BR (MPI_Barrier)   ninth
n6        [7]     1002    0      BR (MPI_Barrier)   ninth
n7        [7]     14609   0      BR (MPI_Barrier)   ninth
n8        [7]     7965    0      R (-25, 1)         ninth
n9        [7]     2860    0      R (-25, 1)         ninth
n10       [7]     7571    0      R (-25, 1)         ninth
n11       [7]     7285    0      BR (MPI_Barrier)   ninth
n12       [7]     1777    0      BR (MPI_Barrier)   ninth
gustav@jupiter:~/sph/src/sph11 227 $

Nodes 8, 9, and 10 are still running (KSTATE R), whereas every other node is already blocked inside MPI_Barrier. The three slow nodes are bacchus, ceres, and diana. They are all SPARCstation 1 machines running at 20 MHz with Weitek 3170-based FPUs, which also run at 20 MHz. The other machines are Sun ELCs, whose clock rate is 33 MHz and which have Weitek 8601-based FPUs also running at 33 MHz; one Sun 670 MP with a 40 MHz CPU and a Cypress CY7C602-based FPU running at 40 MHz; and one Sun LX with a 50 MHz TI TMS390S10-based microSPARC.
There is therefore a large performance gap between the nodes (a clock-rate ratio of up to 50/20 = 2.5), with nodes 8, 9, and 10 indeed being the slowest of the lot. And it shows.