From: geoffb@bristol.st.com
Date: Thu, 15 Jun 1995 10:13:17 +0100
To: P.H.Welch@ukc.ac.uk
Subject: Re: "Crisis" workshop

Peter:

... (private stuff omitted)

One of the issues which jumps to mind is the erosion of the super-computing user base through performance improvements at the high end of the uni-processor market. How many of 1985's super-computer users are now adequately served by high-end workstations and PC's?

Another issue is the real requirement for *general purpose* super-computing. The only major applications I know of are commercial data bases. What are the typical efficiencies of those applications? Perhaps the success of this application area could be the key to making your workshop more positive ("Q: look how good these data base servers are; how can we push this into the general purpose arena? A: Write all our scientific applications in SQL (:-)"). Of course, you might want to argue that aero-dynamic and other sorts of CAD modelling are major applications...

My own conclusion on programming paradigms was that for massively parallel programs, you really have to eliminate contention because contention leads to sequentialisation. The only paradigm which seems to support that is BSP. Paradigms like PVM and the predecessor to MPI are a complete disaster from this point of view. Occam3's shared channels are too. In fact, the only thing which helps is combining operations -- if you can find ways of combining contending data accesses without sequentialising them then you will succeed.

From here, I very quickly leapt to the conclusion that there was a whole pile of fundamental mathematical theory missing about combining operators (eg, what an adequate set of generators is and how to compose them). It is not until this theory has been discovered and the programming paradigms which it supports have been invented that massively parallel computing will be viable. (I don't think BSP goes anything like far enough.)
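The email gives no code, but the idea of combining contending accesses can be illustrated. The following is a minimal sketch under assumptions that are not from the original message: N processors each want to apply one update to the same shared location, and the update operator `op` is assumed to be associative (addition, max and string concatenation all qualify). Applied one at a time, as through a contended lock or shared channel, the updates serialise into N-1 steps; combined pairwise in a tree, in the style of a combining network, they need only about log2(N) rounds. The function names are hypothetical.

```python
from operator import add
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def serial_updates(values: List[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Apply contending updates one after another (fully sequentialised).

    Returns the combined value and the number of sequential steps taken.
    Assumes `values` is non-empty.
    """
    acc = values[0]
    steps = 0
    for v in values[1:]:
        acc = op(acc, v)
        steps += 1
    return acc, steps


def combined_updates(values: List[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Combine updates pairwise, round by round, as a tree.

    Within a round every pair is independent, so on a parallel machine the
    pairs could be combined simultaneously. Only adjacent elements are paired,
    so associativity of `op` is enough for the result to match the serial one.
    Returns the combined value and the number of rounds. Assumes non-empty input.
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd element passes through to the next round
        level = nxt
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    updates = [1] * 1024  # 1024 processors each adding 1 to a shared counter
    print(serial_updates(updates, add))    # (1024, 1023)
    print(combined_updates(updates, add))  # (1024, 10)
```

Both versions compute the same value; only the depth of the dependency chain differs. Which operators can be combined this way, and how to compose them, is exactly the kind of theory the paragraph above says is missing.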
Remember that sequential computing is based on the fact that all boolean functions can be reduced to combinations of 0, 1, !, & and |. We need to find the equivalent for combining parallel operations!

I expect you're even more depressed now.

Cheers

Geoff

________________________________________________________________________________

Date: Tue, 29 Aug 1995 14:19:46 +0100
From: Mike.Giles@comlab.oxford.ac.uk
To: P.H.Welch@ukc.ac.uk
Subject: Re: HPC/Eficiency workshop

Dear Peter,

... (private stuff omitted)

My opinions and prejudices on the matters you'll be discussing at the conference are as follows:

I don't think there's a real crisis. I think there have been problems with both hardware and software, but the biggest problem has perhaps been unrealistic expectations of what parallel HPC would deliver.

On the hardware side, the move to parallel computing has happened alongside a move from vector to RISC processors, and from expensive SRAM main memories to cheaper DRAM main memories plus caches. Saying `alongside' is not really correct, because it's the price/performance benefits of RISC processors with memory hierarchies which have driven the new generation of parallel systems. The reason for making the distinction is that much of the lack of `efficiency' which people are seeing, and which they're upset about, is due to the move to RISC processors with caches rather than to parallelism per se. For example, the T3D's poor performance relative to its nominal peak is due to the lack of a secondary cache and CRAY's lack of experience with developing an optimising compiler for a RISC processor; the parallel aspects of the T3D's hardware/software are pretty good.

Some of the problems are also due to taking codes which were very carefully written for the CRAY vector machines and then running them without change on RISC-based systems. The considerations involved in `fine-tuning' a program for RISC systems with memory hierarchies are entirely different to those for vector systems.
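The email doesn't spell out what fine-tuning for a memory hierarchy involves; loop blocking (tiling) is one standard example, named here as an assumption rather than as the technique the author used. A vector machine wants long unit-stride inner loops, whereas a cache-based RISC processor does better when the same arithmetic is reordered into small tiles that stay resident in cache. The sketch below shows the restructuring for matrix multiplication. It is written in Python only to show the loop structure: the interpreter overhead hides any cache effect, so the speedup would appear only in compiled Fortran or C. The block size `bs=32` is an arbitrary assumed value; the right choice depends on the cache size.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Plain i-k-j triple loop: every row of b is streamed through in full."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_blocked(a: Matrix, b: Matrix, bs: int = 32) -> Matrix:
    """Same arithmetic, reordered into bs x bs tiles.

    In a compiled language the tiles of a, b and c touched by the inner three
    loops can stay in cache, so each element fetched from main memory is reused
    many times before it is evicted.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, m, bs):
            for jj in range(0, p, bs):
                # Multiply one tile of a by one tile of b, accumulating into c.
                for i in range(ii, min(ii + bs, n)):
                    ci, ai = c[i], a[i]
                    for k in range(kk, min(kk + bs, m)):
                        aik, bk = ai[k], b[k]
                        for j in range(jj, min(jj + bs, p)):
                            ci[j] += aik * bk[j]
    return c
```

For each element of `c` the blocked version still adds the `k` terms in ascending order, so it returns exactly the same result as the naive loop, including when the matrix sizes are not multiples of `bs`.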
In many cases, re-programming (e.g. changing the way data is stored and accessed) can result in a factor of 2 or more improvement in execution speed. In practice the compilers haven't been able to do this job for us; we've had to re-program manually. By the time the compilers get to this point there won't be many legacy vector codes left.

Still on the hardware side, I think there was too much hype on the `massively' parallel aspect. In practice, I think most parallel computing in both industry and academia will involve 4-32 processors. Fewer than 4 processors isn't worth the hassle. More than 32 can't be afforded by many organisations.

Some people argue that this might change in the future, but I disagree. Processor development is driven by the workstation market, so high-end single processors will be designed for workstations costing, say, 10-20k. For good economic reasons parallel systems will be built out of high-end processors, so a system with 32 processors will cost on the order of 500k. Not many organisations can afford more than that, only the huge national centres, and then a fair bit of money has to go into national networking to support them.

Looking to the future, I think the above analysis remains valid if one takes the `processor' to be itself increasingly parallel, due to multiple pipes or even up to 4 tightly-coupled processors with shared cache/memory; at this level the compiler will handle/hide the parallelism, so we can consider it as a single functional unit.

For the few truly massively parallel systems, Amdahl's law remains a concern in many applications, and will definitely affect the efficiency which can be achieved. It's fashionable now to yawn when someone mentions Amdahl's law, but it's still there, sometimes in more subtle forms.
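For reference, Amdahl's law in its usual textbook form: if a fraction f of the single-processor run time parallelises perfectly and the remaining 1 - f stays serial, the speedup on p processors is S(p) = 1 / ((1 - f) + f/p), which can never exceed 1/(1 - f) however many processors are added. A minimal Python sketch (the function names are hypothetical):

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup on `processors` when only `parallel_fraction`
    of the single-processor run time can be spread across them."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def efficiency(parallel_fraction: float, processors: int) -> float:
    """Parallel efficiency: achieved speedup divided by processor count."""
    return amdahl_speedup(parallel_fraction, processors) / processors
```

With f = 0.99, for example, 32 processors give a speedup of about 24 (roughly 76% efficiency), while 1024 processors give only about 91 (under 9% efficiency). A poorly load-balanced secondary phase acts much like extra serial fraction, which is one of the subtler ways the law shows up.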
A code could be written to avoid any significant scalar computation, and to get great parallel efficiency on the main guts of the application, but poor parallel load-balancing on a secondary aspect such as boundary conditions can really hurt the parallel speedup.

On the software front, things are steadily improving. There is good portability now with PVM and MPI, although getting efficiency as well is still a problem. More importantly, application developers are all now in the habit of defining for their application a communication harness which has the key capabilities they need, so that if they do need to port onto a new underlying communication library it is relatively painless. In the old days there was embedded PVM code everywhere, and porting that to MPI or PARMACS or something else was painful.

I personally think we need more research on higher-level libraries which take care of the parallelisation for the application programmer. That's what we have been working on for unstructured grids. My jaundiced view (said with a smile, so I do hope it doesn't cause offence to anyone reading this) is that computer scientists aren't close enough to the application developers to understand what they need, and application developers aren't interested in making life easier for other application developers!

We are using BSP as the underlying communication library for our own library development. From my user's perspective, the reason for using BSP is its efficiency rather than its simplicity, but the reason for its efficiency is precisely its simplicity. The internal guts of the BSP library are so small that with relatively little effort they can be very efficiently implemented on any system using its own lowest-level, highest-performing communications primitives. The BSP superstep structure will, in the future, offer performance benefits in terms of overlapping communication and computation. Right now very little hardware supports this (only the new adapter cards on the SP2?), and so it's not actually used by the current BSP library (as far as I know). The BSP library is also a good fit to the current trends in hardware towards various flavours of virtual shared memory.

This has been a bit of a rambling email. I'd be happy to have a follow-up discussion on any aspect of it. If I can make it to the workshop I'd be happy to present this viewpoint in the open discussion, especially if it's felt that my opinion is outside the mainstream! I'll be away for the next week, so email discussion will have to wait till I get back.

Mike Giles

-----------------------------------------------------------
Dr. Michael Giles
Oxford University Computing Laboratory
Wolfson Building
Parks Road
Oxford OX1 3QD.

Tel: +44-01865-273862
FAX: +44-01865-273839
email: giles@comlab.oxford.ac.uk
--------------------------------------------------------------