%T Configurable Collective Communication in LAM\-MPI
%A John Markus Bjørndalen, Otto J. Anshus, Tore Aarsen, Brian Vinter
%E James S. Pascoe, Roger J. Loader, Vaidy S. Sunderam
%B Communicating Process Architectures 2002
%X In another paper, we observed that PastSet (our experimental
tuple space system) was 1.83 times faster on global
reductions than LAM\-MPI. Our hypothesis was that this was
due to better resource usage in the PATHS framework (an
extension to PastSet that supports orchestration and
configuration), which maps communication and operations in a
way that better matches the computing resources and cluster
topology. This paper reports on an experiment to verify this
hypothesis and is part of ongoing work to add some of the
configurability of PastSet and PATHS to MPI. We show that by
adding run\-time configurable collective communication, we
can reduce latencies without recompiling the application
source code. On the same cluster where we observed the faster
PastSet, Allreduce with our configuration mechanism is 1.79
times faster than the original LAM\-MPI Allreduce. We also
experiment with the configuration mechanism on three different
cluster platforms with 2\-, 4\-, and 8\-way nodes. On the
cluster of 8\-way nodes, Allreduce improves by a factor
of 1.98.