Crisis in HPC Discussion - Preston Briggs, Tera Computer

From: preston@tera.com (Preston Briggs)
Newsgroups: comp.parallel
Subject: Re: Crisis in HPC Workshop - Conclusions (Summary)
Date: 23 Oct 1995 14:48:37 GMT

P.H.Welch@ukc.ac.uk writes:
   o (If there are any "yes" answers to the above) where do the problems lie?

     Hardware architecture for MPPs.  Real blame here.  Quoting from
     David May's opening presentation: "Almost all the current parallel
     computers are based on commodity microprocessors without hardware
     assistance for communications or context switching.  With the
     resulting imbalance it is not possible to context switch during
     communications delays and efficiency is severely compromised".

mccalpin@UDel.Edu (John D McCalpin) writes:

>I certainly agree that this is a problem, but it seems to me that the
>trouble is often worse than this.  In my field (ocean modeling), we
>typically find that the single-node performance is already in the 5%
>of peak range, so even perfect scalability gives disappointing
>results.  The problem is that the computational nodes of MPP machines
>are too much like workstations and not enough like supercomputers.
>(With the exception of IBM SP-2 wide nodes.)
>
>I am definitely *not* recommending add-on vector processors for
>commodity cpus, but I am saying that fundamental changes in
>architecture are needed to enable the machines to better tolerate
>latency even to their own local memories.

Sure. But faster context switching, a la the Tera or some other multithreaded machine, lets each node do something useful while waiting on memory. Communicating to another node or "communicating" to memory -- it's all the same thing. Latency is a growing problem and the way around it is to context switch instead of waiting. Assuming, of course, you can context switch quickly -- May's point.
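
To put rough numbers on the argument, here is a minimal sketch of a textbook-style utilization model for a multithreaded node. It is an illustration under stated assumptions, not a description of the Tera or any real machine: every thread is assumed to compute for `run` cycles, then wait `latency` cycles on memory or a remote node, and a context switch is assumed to cost `switch` cycles. The function name, parameters, and cycle counts are all hypothetical.

```python
def utilization(threads, run, latency, switch):
    """Fraction of cycles spent on useful work in a simplified model.

    Assumptions (illustrative only): every thread computes for `run`
    cycles, then stalls for `latency` cycles; switching to another
    thread costs `switch` cycles; threads are otherwise identical.
    """
    if threads < 1 or run <= 0 or latency < 0 or switch < 0:
        raise ValueError("need threads >= 1, run > 0, latency >= 0, switch >= 0")
    # Too few threads: the node still idles for part of each stall.
    linear = threads * run / (run + switch + latency)
    # Enough threads: stalls are fully hidden and only the switch cost remains.
    saturated = run / (run + switch)
    return min(linear, saturated)


if __name__ == "__main__":
    # Hypothetical figures: 10 cycles of work per 90-cycle stall.
    for switch in (0, 10, 100):
        for threads in (1, 4, 16):
            u = utilization(threads, run=10, latency=90, switch=switch)
            print(f"switch={switch:3d} threads={threads:2d} utilization={u:.2f}")
```

In this model extra threads help only while the switch cost stays small next to the work done between stalls: once the node saturates, utilization is capped at run / (run + switch), so a slow context switch throws away most of the benefit no matter how many threads are available.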

Preston Briggs