%T Configurable Collective Communication in LAM-MPI
%A John Markus Bjørndalen, Otto J. Anshus, Tore Aarsen, Brian Vinter
%E James S. Pascoe, Roger J. Loader, Vaidy S. Sunderam
%B Communicating Process Architectures 2002
%X In another paper, we observed that PastSet (our experimental tuple space system) was 1.83 times faster on global reductions than LAM-MPI. Our hypothesis was that this was due to the better resource usage of the PATHS framework (an extension to PastSet that supports orchestration and configuration), whose mapping of communication and operations better matched the computing resources and cluster topology. This paper reports on an experiment to verify this hypothesis and represents ongoing work to add some of the configurability of PastSet and PATHS to MPI. We show that by adding run-time configurable collective communication, we can reduce latencies without recompiling the application source code. On the same cluster where we observed the faster PastSet, Allreduce with our configuration mechanism is 1.79 times faster than the original LAM-MPI Allreduce. We also experiment with the configuration mechanism on three different cluster platforms with 2-, 4-, and 8-way nodes. For the cluster of 8-way nodes, we show an improvement by a factor of 1.98 for Allreduce.