From: geoffb@bristol.st.com
Date: Thu, 15 Jun 1995 10:13:17 +0100
To: P.H.Welch@ukc.ac.uk
Subject: Re: "Crisis" workshop


Peter:

...  (private stuff omitted)

One of the issues which jumps to mind is the erosion of the super-
computing user base through performance improvements at the high
end of the uni-processor market.  How many of 1985's super-computer
users are now adequately served by high-end workstations and PCs?

Another issue is the real requirement for *general purpose*
super-computing.  The only major applications I know of are commercial
data bases.  What are the typical efficiencies of those applications?
Perhaps the success of this application area could be the key to making
your workshop more positive ("Q: look how good these data base servers
are; how can we push this into the general purpose arena?  A: Write all
our scientific applications in SQL (:-)").  Of course, you might want
to argue that aero-dynamic and other sorts of CAD modelling are major
applications...

My own conclusion on programming paradigms was that for massively
parallel programs, you really have to eliminate contention because
contention leads to sequentialisation.  The only paradigm which seems
to support that is BSP.  Paradigms like PVM and the predecessor to MPI
are a complete disaster from this point of view.  Occam3's shared
channels are too.  In fact, the only thing which helps is combining
operations -- if you can find ways of combining contending data
accesses without sequentialising them then you will succeed.  From
here, I very quickly leapt to the conclusion that there was a whole
pile of fundamental mathematical theory missing about combining
operators (e.g., what an adequate set of generators is and how to compose
them).  It is not until this theory has been discovered and the
programming paradigms which it supports have been invented that
massively parallel computing will be viable.  (I don't think BSP goes
anything like far enough.)  Remember that sequential computing is based
on the fact that all boolean functions can be reduced to combinations
of 0, 1, !, & and |.  We need to find the equivalent for combining
parallel operations!
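
A minimal sketch of the combining idea above, assuming the contending
accesses can be expressed as an associative operator (the function names
are hypothetical, not from any library).  Funnelling n values through one
location takes n-1 dependent steps; combining them pairwise takes about
log2(n) rounds, and the pairs within a round are independent of each other.

```python
from operator import add


def combine_sequential(values, op):
    """Contended version: every update goes through one accumulator,
    so the n-1 applications of op form a single dependent chain."""
    it = iter(values)
    acc = next(it)
    steps = 0
    for v in it:
        acc = op(acc, v)
        steps += 1
    return acc, steps


def combine_tree(values, op):
    """Combining version: values are merged pairwise in rounds.

    The pairs inside one round do not depend on each other, so on a
    parallel machine each round could run in one step.  op must be
    associative for the result to match the sequential fold.
    Returns (result, number_of_rounds).
    """
    level = list(values)
    if not level:
        raise ValueError("need at least one value")
    rounds = 0
    while len(level) > 1:
        merged = [op(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # odd element carries over unchanged
        level = merged
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    data = range(1, 1025)
    print(combine_sequential(data, add))  # (sum, dependent steps)
    print(combine_tree(data, add))        # (sum, parallel rounds)
```

This only illustrates why combining avoids sequentialisation for one
simple operator; it says nothing about what a complete set of combining
generators would look like, which is the open question raised above.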

I expect you're even more depressed now.

Cheers

Geoff

________________________________________________________________________________


Date: Tue, 29 Aug 1995 14:19:46 +0100
From: Mike.Giles@comlab.oxford.ac.uk
To: P.H.Welch@ukc.ac.uk
Subject: Re: HPC/Efficiency workshop


Dear Peter,

...  (private stuff omitted)

My opinions and prejudices on the matters you'll be discussing at the 
conference are as follows:

I don't think there's a real crisis.  I think there have been problems
with both hardware and software, but the biggest problem has perhaps
been unrealistic expectations of what parallel HPC would deliver.

On the hardware side, the move to parallel computing has happened
alongside a move from vector to RISC processors, and from expensive
SRAM main memories to cheaper DRAM main memories plus caches.  Saying
`alongside' is not really correct because it's the price/performance
benefits of RISC processors with memory hierarchies which have driven
the new generation of parallel systems.  The reason for making the
distinction is that much of the poor `efficiency' which people are
seeing, and which they're upset about, is due to the move to
RISC processors with caches rather than to parallelism per se.
For example the T3D's poor performance relative to its nominal peak
is due to the lack of a secondary cache and CRAY's lack of experience
with developing an optimising compiler for a RISC processor; the 
parallel aspects of the T3D's hardware/software are pretty good.
Some of the problems are also due to taking codes which were very
carefully written for the CRAY vector machines and then running them
without change on RISC-based systems.  The considerations involved
in `fine-tuning' a program for RISC systems with memory hierarchies 
are entirely different from those for vector systems.  In many cases,
re-programming (e.g. changing the way data is stored and accessed) can
result in an improvement of a factor of 2 or more in execution speed.
In practice the compilers haven't been able to do this job for us; we've
had to re-program manually.  By the time the compilers get to this point
there won't be many legacy vector codes left.

Still on the hardware side, I think there was too much hype on the
`massively' parallel aspect.  In practice, I think most parallel 
computing in both industry and academia will involve 4-32 processors.
Fewer than 4 processors isn't worth the hassle.  More than 32 can't
be afforded by many organisations.  Some people argue that this might
change in the future, but I disagree.  Processor development is 
driven by the workstation market, so high-end single processors will
be designed for workstations costing say 10-20k.  For good economic
reasons parallel systems will be built out of high-end processors
so a system with 32 processors will cost on the order of 500k.  Not
many organisations can afford more than that, only the huge national 
centres and then a fair bit of money has to go into national networking
to support it.  Looking to the future I think the above analysis remains
valid if one takes the `processor' to be itself increasingly parallel
due to multiple pipes or even up to 4 tightly-coupled processors with 
shared cache/memory; at this level the compiler will handle/hide the
parallelism so we can consider it as a single functional unit.

For the few truly massively parallel systems Amdahl's law remains a
concern in many applications, and will definitely affect the efficiency
which can be achieved. It's fashionable now to yawn when someone 
mentions Amdahl's law, but it's still there, sometimes in more subtle 
forms.  A code could be written to avoid any significant scalar 
computation, and to get great parallel efficiency on the main guts of
the application, but poor parallel load-balancing on a secondary aspect
such as boundary conditions can really hurt the parallel speedup.
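
A small worked sketch of this "subtle" form of Amdahl's law.  The model
and the numbers are illustrative assumptions, not measurements: the main
computation is taken to scale perfectly over p processors, while a
secondary phase (say, boundary conditions) is spread over only `busy` of
them.

```python
def amdahl_speedup(serial_fraction, p):
    """Classic Amdahl's law: a fraction of the work runs on one processor."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def speedup_with_imbalance(secondary_fraction, p, busy):
    """Speedup when the secondary phase is shared by only `busy` processors.

    secondary_fraction is the share of single-processor run time spent in
    the secondary phase; the remainder is assumed perfectly parallel.
    busy == 1 reduces to classic Amdahl; busy == p gives speedup p.
    """
    main_time = (1.0 - secondary_fraction) / p
    secondary_time = secondary_fraction / busy
    return 1.0 / (main_time + secondary_time)


if __name__ == "__main__":
    p = 256
    for busy in (1, 8, 64, 256):
        s = speedup_with_imbalance(0.05, p, busy)
        print(f"busy={busy:4d}  speedup={s:7.1f}  efficiency={s / p:5.1%}")
```

Under these assumptions a phase that accounts for only 5% of the
single-processor run time, if it keeps just 8 of 256 processors busy,
holds the overall speedup to roughly 100.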

On the software front, things are steadily improving.  There is good
portability now with PVM and MPI, although getting efficiency as
well is still a problem.  More importantly, application developers 
are all now in the habit of defining for their application a
communication harness which has the key capabilities they need so 
that if they do need to port onto a new underlying communication
library it is relatively painless.  In the old days there was embedded
PVM code everywhere and porting that to MPI or PARMACS or something
else was painful.
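
A minimal sketch of what such a communication harness might look like.
All names here (CommHarness, InProcessHarness, send_halo, receive_halo)
are hypothetical, and the backend is an in-memory stand-in rather than a
real PVM or MPI binding; the point is only that application code touches
the small interface and nothing else.

```python
from abc import ABC, abstractmethod
from collections import defaultdict, deque


class CommHarness(ABC):
    """Application-defined interface: only the capabilities the code needs."""

    @abstractmethod
    def send(self, dest, tag, data):
        """Send data to process dest under the given tag."""

    @abstractmethod
    def recv(self, source, tag):
        """Return the next message from process source with the given tag."""


class InProcessHarness(CommHarness):
    """Stand-in backend using queues in one Python process.

    A PVM or MPI backend would implement the same two methods on top of
    that library's own calls; nothing else in the application would change.
    """

    def __init__(self, rank, mailboxes):
        self.rank = rank
        self._mailboxes = mailboxes  # shared: (source, dest, tag) -> deque

    def send(self, dest, tag, data):
        self._mailboxes[(self.rank, dest, tag)].append(data)

    def recv(self, source, tag):
        return self._mailboxes[(source, self.rank, tag)].popleft()


# Application code is written against CommHarness only.
def send_halo(comm, neighbour, boundary_values):
    comm.send(neighbour, "halo", boundary_values)


def receive_halo(comm, neighbour):
    return comm.recv(neighbour, "halo")


if __name__ == "__main__":
    mailboxes = defaultdict(deque)
    rank0 = InProcessHarness(0, mailboxes)
    rank1 = InProcessHarness(1, mailboxes)
    send_halo(rank0, 1, [1.0, 2.0])
    print(receive_halo(rank1, 0))
```

Porting to a new library then means writing one more subclass, rather
than hunting down library calls embedded throughout the application.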

I personally think we need more research on higher-level libraries
which take care of the parallelisation for the application programmer.
That's what we have been working on for unstructured grids.  My
jaundiced view (said with a smile so I do hope it doesn't cause offense
to anyone reading this) is that computer scientists aren't close enough
to the application developers to understand what they need, and 
application developers aren't interested in making life easier for 
other application developers!

We are using BSP as the underlying communication library for our own 
library development.  From my user's perspective, the reason for using
BSP is its efficiency rather than its simplicity, but the reason for
its efficiency is precisely its simplicity.  The internal guts of the
BSP library are so small that with relatively little effort they can be
very efficiently implemented on any system using its own lowest-level
highest-performing communications primitives.  The BSP superstep 
structure will, in the future, offer performance benefits in terms of
overlapping communication and computation.  Right now very little
hardware supports this (only the new adapter cards on the SP2?) and
so it's not actually used by the current BSP library (as far as I know).
The BSP library is also a good fit to the current trends in hardware 
towards various flavours of virtual shared memory.
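
To illustrate the superstep structure mentioned above, here is a toy,
sequential simulation.  It is not the BSP library's API; bsp_run and
ring_sum_step are hypothetical names.  Each superstep is local
computation plus sends, followed by a barrier, and a message sent in one
superstep is visible only in the next.

```python
def bsp_run(num_procs, state, superstep_fn, num_supersteps):
    """Run num_supersteps BSP supersteps over num_procs simulated processes.

    superstep_fn(pid, local_state, inbox) must return
    (new_local_state, outgoing), where outgoing is a list of
    (dest_pid, message) pairs.  state is updated in place and returned.
    """
    inboxes = [[] for _ in range(num_procs)]
    for _ in range(num_supersteps):
        next_inboxes = [[] for _ in range(num_procs)]
        for pid in range(num_procs):
            state[pid], outgoing = superstep_fn(pid, state[pid], inboxes[pid])
            for dest, message in outgoing:
                next_inboxes[dest].append(message)
        # Barrier: messages are delivered only once every process is done.
        inboxes = next_inboxes
    return state


def ring_sum_step(num_procs):
    """Build a superstep function that circulates values round a ring.

    Local state is (running_total, token).  Each superstep a process adds
    the token received from its left neighbour to its total and forwards
    that token to its right neighbour.
    """
    def step(pid, local, inbox):
        total, token = local
        if inbox:
            token = inbox[0]
            total += token
        return (total, token), [((pid + 1) % num_procs, token)]
    return step


if __name__ == "__main__":
    values = [1, 2, 3, 4]
    start = [(v, v) for v in values]
    # After exactly len(values) supersteps every process holds the sum.
    final = bsp_run(len(values), start, ring_sum_step(len(values)), len(values))
    print([total for total, _ in final])
```

Because all communication is deferred to the barrier, an implementation
is free to overlap the message transfers with the computation inside a
superstep wherever the hardware allows it.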

This has been a bit of a rambling email.  I'd be happy to have a 
follow-up discussion on any aspect of it.  If I can make it to the 
workshop I'd be happy to present this viewpoint in the open discussion,
especially if it's felt that my opinion is outside the mainstream!

I'll be away for the next week so email discussion will have to wait
till I get back.

Mike Giles

-----------------------------------------------------------	
Dr. Michael Giles
Oxford University Computing Laboratory
Wolfson Building
Parks Road
Oxford OX1 3QD.

Tel: +44-01865-273862
FAX: +44-01865-273839

email: giles@comlab.oxford.ac.uk
--------------------------------------------------------------