Crisis in HPC: Personal Comment - Jeremy Martin, Oxford University Computer Services

Thank you very much for arranging the meeting in London yesterday. I enjoyed it a lot. The overwhelming impression I came away with is that the occam/transputer way of doing things is vastly superior to anything else currently on offer. (I couldn't have said this with any confidence before, because I didn't know enough about other approaches to HPC.) On a system of T9000 transputers it should be easy to achieve close to 100% efficiency for the right problems. On a Cray T3D you would be lucky to get 20%. It doesn't matter that the individual processors are more powerful: in practice you never get to use that extra power.

It was suggested that there is a risk that there may not be any parallel supercomputers around in the future, because of their lack of commercial success. Personally I doubt this. There are plenty of specialised problems which cannot be solved efficiently on networks of workstations because they involve too much communication; Chris Jones's lightning-strike simulation was a good example. So there will always have to be some parallel supercomputers.

I thought that your design for the next generation of transputers, with virtual shared memory, was very elegant. The idea of being able to plug in extra processor units or memory units as you please, without having to touch the program (assuming it has sufficient parallel slackness), is most appealing. Another important property of virtual shared memory is that it does away with Nick Maclaren's sole objection to transputers, namely that they don't cope well with problems involving shared memory.
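To illustrate what I mean by parallel slackness (this is just my own sketch, nothing taken from your design, and the process names are made up): an occam program written as many more processes than there are processors leaves the scheduler free to overlap communication with computation, so extra processor units can be plugged in without the program changing.

  PROC slack.demo ()
    [100]CHAN OF INT c :
    PAR
      PAR i = 0 FOR 100      -- 100 worker processes, however many processors there are
        c[i] ! (i * i)       -- each worker computes and sends its result
      SEQ i = 0 FOR 100      -- a collector gathers the results in order
        INT x :
        c[i] ? x
  :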

I talked to one person about this afterwards who felt that, in order to implement your design, you would need to slow the processors down artificially to get the right computation/communication power ratio. Is this true, and if so, is there any way around it? (Perhaps it is all hypothetical in the light of what was said about academics' lack of influence in the silicon industry.)

Another issue that I think is very important is correctness. The occam and transputer model is great because of its mathematical semantics. The hardware's virtual channel routing can be proved semantically equivalent to physical links according to the failures model, so we have a system which is very efficient and also provably correct. Can the same be said for the Cray message-routing architecture? Perhaps. Can we be sure that a Fortran program which has been "parallelised" has been transformed in a semantics-preserving manner? Who knows?
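To make the point concrete (again just a sketch of my own, with an invented process name): as I understand it, the meaning of an occam fragment like the one below is fixed by the failures semantics of its channels, and it stays the same whether each channel is mapped onto a physical link or onto a virtual channel multiplexed over one by the router.

  PROC pipe (CHAN OF INT in, out)
    CHAN OF INT c :            -- internal channel between the two stages
    PAR
      WHILE TRUE               -- first stage: double each value
        INT x :
        SEQ
          in ? x
          c ! (2 * x)
      WHILE TRUE               -- second stage: add one and pass it on
        INT y :
        SEQ
          c ? y
          out ! (y + 1)
  :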

It is interesting how the facilities offered by Visual Basic have made BASIC into a popular language again. I liked what you said about programming in parallel because it is a natural thing to do, rather than purely for efficiency. Suppose we had visual-turbo-occam: a language for (serial) PCs which let you do fancy things with windows and networking, draw graphs and play games. Something like that might have some influence on the future of parallel computing. I get the impression that some people were put off occam because, for all its cleverness, it lacked user-friendly tools.

Jeremy Martin (jeremy.martin@oucs.ox.ac.uk)