Communicating Process Architectures 2009
Communicating Process
Architectures 2009 will start on the evening of Sunday 1st November and run through
to lunchtime on 4th November at the Technische Universiteit Eindhoven
in Eindhoven, the Netherlands. The conference will run in its normal style
(a single stream of refereed papers during the day and fringe events
in the evenings). This year, however, we shall be part of
Formal Methods Week 2009
where registration for CPA also gives access to the sessions of other
conferences, tutorials and workshops running simultaneously.
Follow their sidebar link to "Schedule" for the outline timetable.
Welcome!
WoTUG is a forum set up to support those applying the CSP
model of parallel processing. You will find on this site articles and information
that can help you design and build concurrent software and hardware systems that
really work, day in, day out, without any need to spend man-years of debugging
effort.
- Information on CSP, the mathematical basis of our work
- Our papers, the distilled results of our work
- The KRoC retargetable occam compiler
- A page about the group
The abstract below is from a paper in our database:
Configurable Collective Communication in LAM-MPI
By John Markus Bjørndalen, Otto J. Anshus, Tore Aarsen, Brian Vinter
In another paper, we observed that PastSet (our experimental tuple space system) was 1.83 times faster on global reductions than LAM-MPI. Our hypothesis was that this was due to the better resource usage of the PATHS framework (an extension to PastSet that supports orchestration and configuration) due to a mapping of the communication and operations which matched the computing resources and cluster topology better. This paper reports on an experiment to verify this and represents on-going work to add some of the same configurability of PastSet and PATHS to MPI. We show that by adding run-time configurable collective communication, we can reduce the latencies without recompiling the application source code. For the same cluster where we experienced the faster PastSet, we show that Allreduce with our configuration mechanism is 1.79 times faster than the original LAM-MPI Allreduce. We also experiment with the configuration mechanism on 3 different cluster platforms with 2-, 4-, and 8-way nodes. For the cluster of 8-way nodes, we show an improvement by a factor of 1.98 for Allreduce.