Crisis in HPC: Personal Article in the Flagship bulletin

by Chris Wadsworth and Roger Evans, RAL

Crisis in High Performance Computing (UCL, 11 September 1995)

Any meeting with such a provocative title was sure to attract a good audience, and the meeting room at UCL was packed with a mixture of computer scientists and HPC users for this meeting, organised by Prof. Peter Welch of the University of Kent under the auspices of SEL-HPC (the London and South East consortium for education and training in HPC).

In his opening remarks Prof. Welch addressed what he saw as the key questions for the day: given the apparently poor fraction of peak performance achieved on the latest parallel computers, is there a crisis? Does efficiency matter, given that machines are steadily becoming faster and cheaper? Are people disappointed? Is the problem in hardware, in software, or in our models of parallelism? Can we do better a) now and b) on the next generation of machines?

The first talk, by Prof. David May, the architect of the Transputer and recently moved to the University of Bristol, was sure to introduce many controversial points, and we were not disappointed. David described the mood of optimism around 1990, when massively parallel computers were expected to find a breadth of application, be cheap, and produce new results. Many computer science issues were unresolved at that time, and by 1995 we observe that much of the computing need is satisfied by single-processor machines and small symmetric multi-processor (SMP) computers. Provocatively, David said that he expected this to continue for some time yet.

In an invited talk presented in 1989 he had commented on the need for standard languages and libraries to popularise parallel computing; the same need exists today! The industry had pursued the simple metrics of Mflop/s and Mbyte/s communications bandwidth while ignoring the issues of latency and context-switching time, which had been balanced in the original Transputer design. Almost all current parallel computers are based on commodity microprocessors without hardware assistance for communications or context switching. With the resulting imbalance it is not possible to context switch during communications delays, and efficiency is severely compromised.
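As a rough illustration of the point (not an example shown at the meeting), the sketch below uses MPI non-blocking calls to overlap a halo exchange with independent computation; the ring of neighbours, buffer sizes and arithmetic are all hypothetical. If nothing useful can be scheduled during the communication delay, the time spent waiting is pure idle time and the achieved fraction of peak drops accordingly.

/* Illustrative sketch: hiding communication latency by overlapping it with
 * useful work, using MPI non-blocking calls.  If the processor cannot switch
 * to other work while the messages are in flight, the wait is idle time. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

static double halo_in[N], halo_out[N], interior[N];

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* hypothetical ring of neighbours */
    int left  = (rank + size - 1) % size;

    MPI_Request reqs[2];

    /* Start the halo exchange, then do interior work while it is in flight. */
    MPI_Irecv(halo_in,  N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(halo_out, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    for (int i = 0; i < N; i++)             /* computation that does not need the halo */
        interior[i] = interior[i] * 0.5 + 1.0;

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* only now block on the messages */

    if (rank == 0)
        printf("exchange complete on %d ranks\n", size);

    MPI_Finalize();
    return 0;
}

Whether such an overlap actually pays off depends on the hardware and runtime being able to progress messages while the processor computes, which is precisely the kind of support David argued the commodity microprocessors lack.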

It is not obvious that the volume microprocessor market, with its emphasis on cheap memory parts, will allow these deficiencies to be rectified, but the major new growth area of multi-media servers requires parallel systems with well-bounded communications delays and may well be the opportunity for balanced systems to be designed again. The large scientific massively parallel computer has limited market potential; the desktop machine is where volume and profits can be achieved, and today this is where the small SMP machines have been very successful.

In relating the experiences of end users, Chris Jones of BAE Warton described a finite difference electromagnetic simulation used to model lightning strike on military aeroplanes. The code achieved high efficiencies on a Cray Y-MP and on a Parsytec (transputer based) parallel computer but gave only 12% efficiency on a Cray T3D.

Ian Turton from Leeds is a computational geographer trying to popularise HPC in the geography community. He had been seduced by the promise of a 38 Gflop/s Cray T3D, and his disappointment at achieving only 7-8 Gflop/s (roughly 20% of peak) was exceeded by his surprise at being told that this was the best anyone had achieved on that machine. Ian felt that the uncertainty over the future of HPC, and more particularly concerns about the Cray-specific optimisations needed to achieve good performance, meant that he could not honestly recommend it to other geographers.

Chris Booth from DRA Malvern described detailed event-driven simulations of Army logistic support. Originally motivated by the problem of moving 9000 vehicles per Brigade to Germany in the event of war, simulation at this scale requires the power of parallel computing. Experience with parallelising C++ code had not been encouraging, giving very poor load balance. Lower latency and faster context-switching times would help, as would a global address space.

Other speakers addressed the issues of parallelisation tools and of language limitations. None of the current languages appear ideal for expressing parallelism in a way that can lead to efficient code. Has the time come to abandon the old baggage and start from scratch again?

In a second presentation Peter Welch described the different approaches to parallelism of the computer scientist and the application user: the former thinks parallelism is fun, while the latter just wants performance and should have the parallelism hidden from him. Today's parallel computers were described as immature, with too much machine-specific knowledge needed for good performance. The lessons of the Transputer had been forgotten: latency and context-switch times had not kept pace with floating-point performance and communications bandwidth. A machine designed from scratch with the correct balance would be easier to program and closer to being a general-purpose machine.

There followed parallel subgroup sessions and a closing plenary discussion. The general view was that there was disappointment with MPP performance, machines having been oversold not only by the vendors but also by local enthusiasts. Despite this, there exists a range of problems that can only be solved on the large MPP machines. If there was a crisis, it was that many MPP vendors had gone out of business, more might find the market unprofitable, and the remaining suppliers might exploit a monopoly situation in which prices would rise again, to the detriment of end users.

Roger Evans (r.g.evans@rl.ac.uk)
Chris Wadsworth (cpw@inf.rl.ac.uk)