Crisis in HPC: Personal Comment - Duncan Campbell, University of York

The workshop prompted me to pose certain questions, foremost amongst which was: what HPC are we referring to? Is one referring to the large MPPs which are so expensive that only governments can buy them, with researchers then buying some access to them? Or is one referring to the medium-sized MPPs which corporations buy for massively parallel DB access or for their scientific computations?

The former answer suggests that HPC is limited to a small collection of special-purpose computers, with a reasonably large user base.

The latter answer suggests that HPC is more broadly based, and is set to continue growing, particularly in the DB area.

A further direction for HPC was suggested by one of the speakers, namely MPP servers for multi-media and similar applications.

Another question posed was: given the seemingly low efficiency rates for HPC machines, how do those rates compare with uniprocessors? If the rates are broadly similar, then HPC users are probably being quite unrealistic in their efficiency demands.
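
For clarity (my own framing, not a definition given at the workshop), the figure usually quoted in both cases is the fraction of peak performance actually sustained. For a parallel machine this is commonly written as

    E_p = \frac{T_1}{p \cdot T_p}

where T_1 is the best uniprocessor time and T_p the time on p processors; for a uniprocessor the analogous figure is sustained Mflop/s over peak Mflop/s. The question is whether a well-tuned uniprocessor code really fares that much better by this measure.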

Several good points were made about solving this HPC crisis in terms of the machines' efficiencies. It was acknowledged that communication is a bottleneck. One needs to design hardware with integrated communication and fast context switching. New models of parallelism are required so that one can work within the communication actually available. Also, we need to write software according to a good model of parallelism.
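
As a rough illustration of the communication point (my own sketch, not code presented at the workshop), MPI at least lets one overlap communication with independent computation by posting non-blocking operations instead of stalling on blocking ones. The buffer size N and the compute_on() routine below are placeholders for real work:

#include <mpi.h>

#define N 1024

/* Stand-in for real, independent computation. */
static void compute_on(double *data, int n)
{
    int i;
    for (i = 0; i < n; i++)
        data[i] = data[i] * 2.0 + 1.0;
}

int main(int argc, char **argv)
{
    int rank, size, i, left, right;
    double send[N], recv[N], local[N];
    MPI_Request reqs[2];
    MPI_Status stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < N; i++) {
        send[i] = (double) rank;
        local[i] = 0.0;
    }

    right = (rank + 1) % size;
    left  = (rank - 1 + size) % size;

    /* Post the neighbour exchange first ... */
    MPI_Irecv(recv, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(send, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... and do independent work while the messages are in flight. */
    compute_on(local, N);

    /* Only wait, and only touch the received data, when we must. */
    MPI_Waitall(2, reqs, stats);
    compute_on(recv, N);

    MPI_Finalize();
    return 0;
}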

The irony is, these points have been made again and again for several years now, but little appears to have been done to fully address them.

Personally, I believe that HPC is being tackled from the wrong direction. Computing really took off with the PC revolution. For HPC to take off in the same way, it looks as though it will need the impetus of a Parallel PC (PPC) revolution.

Such an approach would tackle the high-demand, large-volume market. One could start with small (4-8 processor) PPCs, using non-standard microprocessors with integrated communication (T9?). These would need to support existing software (preferably with no modification); initially this could run as multi-processing rather than parallel processing, but that requires a suitable parallel OS (Taos?). The PPC should also be integrated with distributed computing (PVM, MPI, Condor?), and this requires a suitable model (modified CLUMPS, BSP?).
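
As an illustration of what a suitable model buys you (my addition, not something proposed at the workshop), BSP lets one cost each superstep explicitly as

    T_{superstep} = w + h \cdot g + l

where w is the maximum local computation in the superstep, h the maximum number of words sent or received by any processor, g the machine's communication throughput parameter, and l its barrier synchronisation cost. Software written to such a model can then be costed against the communication actually available, whether on a PPC or on a larger machine.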

Once the PPC is established, its new general-purpose software base could be ported to the developing HPC market, scaling up from what had been learned through the PPC experience.

Again, HPC machines are big, expensive and powerful, whereas clusters of workstations have very slow communication between nodes. If one instead had a cluster of PPCs, more parallel computing power (and communication) would be available at the node level, so there would be far less dependence upon inter-node communication. Also, a PPC user would be less likely to object to someone else's job running on a (dedicated?) group of the PPC's processors than to it time-slicing with his own work.

Meanwhile, much was said about the problems of executing scientific code on HPC platforms, but hardly anything about the massively parallel DB market, which is growing relatively quickly within the HPC arena.

Duncan

Dr D. K. G. Campbell  "Like a reverse charge call from God." (P. Clarke)
 Advanced Computer Architecture Group, Department of Computer Science,
           University of York, Heslington, York  YO1 5DD, UK
 Tel: +44 (0)1904 432763  Email: campbell@cs.york.ac.uk
 Fax: +44 (0)1904 432767  WWW: http://www.cs.york.ac.uk/~campbell/