11th September 1995
Lecture room G22 (also known as the Pearson Lecture Theatre)
Pearson Building
University College London
Gower Street
London WC1E 6BT
Efficiency levels for ``real'' HPC applications are reported (e.g. by the NAS parallel benchmarks) as ranging from around 20-30% (for some 16-node systems) down to 10-20% (for 1024-node massively parallel super-computers). Are low efficiencies the result of bad engineering at the application level (which can be remedied by education) or bad engineering at the architecture level (which can be remedied by <what>)? Maybe these efficiency levels are acceptable to users ... after all, 20% of 16 nodes (rated at 160 Mflops per node) is still around 500 Mflops and 10% of 1024 nodes is 16 Gflops? But they may be disappointing to those who thought they were going to be able to turn round jobs at over 100 Gflops! Are there other ways of obtaining the current levels of performance that are more cost-effective?
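As a rough illustration of the arithmetic behind those figures (the node count, per-node rating and efficiency levels below are simply the ones quoted above, not measurements of any particular machine), a minimal sketch in C:

    #include <stdio.h>

    /* Sustained-performance arithmetic for the illustrative figures
     * quoted above; these are not benchmark results. */
    int main(void)
    {
        double peak_per_node = 160e6;                 /* 160 Mflops/node */
        double small = 16   * peak_per_node * 0.20;   /* 16 nodes, 20%   */
        double large = 1024 * peak_per_node * 0.10;   /* 1024 nodes, 10% */

        printf("16 nodes   @ 20%%: %4.0f Mflops sustained\n", small / 1e6);
        printf("1024 nodes @ 10%%: %4.1f Gflops sustained\n", large / 1e9);
        printf("1024 nodes peak  : %4.1f Gflops\n",
               1024 * peak_per_node / 1e9);
        return 0;
    }

This prints roughly 512 Mflops, 16.4 Gflops and a 163.8 Gflops peak, which is the gap the questions above are asking about.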
A further cause of concern is the dwindling number of suppliers of HPC technology that are still in the market ...
This workshop will focus on the technical and educational problems that underlie this growing crisis. Political matters will not be considered ... unless they can be shown to have a direct bearing.
09:30  Registration
09:50  Introduction to the Day
10:00  High performance compute + interconnect is not enough
       (Professor David May, University of Bristol)
10:40  Experiences with the Cray T3D, PowerGC, ...
       (Chris Jones, British Aerospace, Warton)
11:05  More experiences with the Cray T3D, ... (ABSTRACT)
       (Ian Turton, Centre for Computational Geography, University of Leeds)
11:30  Coffee
11:40  Experiences with the Meiko CS2, ... (ABSTRACT)
       (Chris Booth, Parallel Processing Section, DRA Malvern)
12:15  Problems of Parallelisation - why the pain? (ABSTRACT)
       (Dr. Steve Johnson, University of Greenwich)
13:00  Working Lunch (provided) [Separate discussion groups]
14:20  Language Problems and High Performance Computing (ABSTRACT)
       (Nick Maclaren, University of Cambridge Computer Laboratory)
14:50  Parallel software and parallel hardware - bridging the gap (ABSTRACT)
       (Professor Peter Welch, University of Kent)
15:30  Work sessions and Tea [Separate discussion groups]
16:30  Plenary discussion session
16:55  Summary
17:00  Close
Judith Broom
Computing Laboratory
The University
Canterbury
Kent
CT2 7NF
England
tel: +44 1227 827695
fax: +44 1227 762811
email: J.Broom@ukc.ac.uk
The latest information can be found at:
<URL:/parallel/groups/selhpc/crisis/> and
<URL:ftp://ftp.cs.ukc.ac.uk/pub/parallel/groups/selhpc/crisis/>
where full details of this workshop (e.g. names of speakers and final timetable) will be updated.
All types of participant are welcome - see above. Position statements are also welcome (though not compulsory) from all attending this workshop. They will be reproduced for everyone who attends and will help us define the scope of each discussion group.
It seems to be proving difficult to build efficient high-performance computer systems simply by taking very fast processors and joining them together with very high bandwidth interconnect. Apart from the need to keep the computational and communication power in balance, it may also be essential to reduce communication start-up costs (in line with increasing bandwidth) and to reduce process context-switch times (in line with increasing computational power). Failure in either regard forces coarse-grained parallelism, which may result in insufficient parallel slackness to allow efficient use of individual processing nodes, potentially serious cache-coherency problems for super-computing applications, and unnecessarily large worst-case latency guarantees for real-time applications.
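A minimal sketch of the start-up-cost point, assuming an illustrative 50 microsecond start-up and 100 Mbyte/s link bandwidth (these figures are assumptions, not measurements of any machine discussed at the workshop):

    #include <stdio.h>

    /* Simple linear model of point-to-point message cost:
     *   time = start_up + bytes / bandwidth
     * The parameters below are illustrative assumptions only. */
    int main(void)
    {
        double start_up  = 50e-6;   /* assumed 50 us start-up cost       */
        double bandwidth = 100e6;   /* assumed 100 Mbyte/s link bandwidth */
        long   bytes;

        for (bytes = 8; bytes <= 1 << 20; bytes *= 8) {
            double time      = start_up + bytes / bandwidth;
            double effective = bytes / time;        /* achieved bytes/s  */
            printf("%8ld bytes: %7.1f us, %5.1f%% of peak bandwidth\n",
                   bytes, time * 1e6, 100.0 * effective / bandwidth);
        }
        return 0;
    }

Under this model, short messages achieve only a few percent of the link's peak bandwidth: unless start-up costs fall in line with rising bandwidth, only long messages use the interconnect effectively, and applications are pushed towards coarse grain.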
The emerging standards, HPF and MPI, raise two problems: depressed levels of efficiency (which may be a temporary reflection of early implementations) and a low-level, hardware-oriented programming model (HPF expects the world to be an array and processing architectures to be a 2-D grid; MPI allows a free-wheeling view of message-passing that is non-deterministic by default). Neither standard allows the application developer to design and implement systems in terms dictated by the application; bridging the gap between the application and these hardware-oriented tools remains a serious problem.
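As a small illustration of that default non-determinism (a sketch only, not code from any of the talks): when the root process receives with MPI_ANY_SOURCE, the order in which its peers' messages arrive can change from run to run.

    #include <stdio.h>
    #include <mpi.h>

    /* Rank 0 receives one message from every other rank via MPI_ANY_SOURCE;
     * the arrival order is not determined by the program text.
     * Run with at least two processes, e.g. "mpirun -np 4 a.out". */
    int main(int argc, char *argv[])
    {
        int rank, size, i;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            for (i = 1; i < size; i++) {
                int value;
                MPI_Status status;
                MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, &status);
                printf("received %d from rank %d\n",
                       value, status.MPI_SOURCE);
            }
        } else {
            int value = rank * 100;
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

The programmer who wants deterministic behaviour must impose it explicitly (by fixing sources and tags), which is one instance of the gap between the application and the tool.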
New pretenders, based upon solid mathematical theory and analysis, are knocking on the door - such as Bulk Synchronous Parallelism (BSP). Old pretenders, also based upon solid mathematical theory and analysis and with a decade of industrial application, lie largely unused and under-developed for large-scale HPC - such as occam. Might either of these offer some pointers to the future?
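For those unfamiliar with it, the analysis BSP offers can be summarised by its standard cost model (a general textbook formulation, not tied to any talk on the programme): the cost of one superstep is

    max_i(w_i)  +  g * h  +  l

where w_i is the local computation performed by processor i, h is the largest number of words any single processor sends or receives in that superstep, g is the machine's per-word communication cost, and l is its barrier-synchronisation cost. Summing this over supersteps gives an architecture-parameterised prediction of run time.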
Please come along and make this workshop work.