Overview of Recent Supercomputers
For many years the taxonomy of Flynn [5]
has proven useful for the classification of high-performance
computers. This classification is based on the way instruction and
data streams are arranged and comprises four main architectural classes. We
will first briefly sketch these classes and afterwards fill in
some details when each of the classes is described separately.
- SISD machines: These are the conventional
systems that contain one CPU and hence can accommodate one instruction
stream that is executed serially. Nowadays many large mainframes may
have more than one CPU, but each of these executes an instruction stream
that is unrelated to the others. Therefore, such systems should still be
regarded as (multiple) SISD machines acting on different data spaces.
Examples of SISD machines are most workstations, such as those of DEC,
Hewlett-Packard, and Sun Microsystems. The definition of SISD machines
is given here for completeness' sake. We will not discuss this type of
machine in this report.
- SIMD machines: Such systems often have a large
number of processing units, ranging from 1,024 to 16,384, all of which
may execute the same instruction on different data in lock-step. So,
a single instruction manipulates many data items in parallel. Examples
of SIMD machines in this class are the CPP DAP Gamma and the MasPar MP-2.
- Another subclass of the SIMD systems are the vector processors.
Vector processors act on arrays of similar data rather than on single
data items, using specially structured CPUs. When data can be manipulated
by these vector units, results can be delivered at a rate of one,
two, and -- in special cases -- three per clock cycle (a clock cycle
being defined as the basic internal unit of time for the system). So,
vector processors operate on their data in an almost parallel way, but only
when executing in vector mode. In that case they are several times faster
than when executing in conventional scalar mode. For practical purposes
vector processors are therefore mostly regarded as SIMD machines. Examples
of such systems are the Convex C410 and the Hitachi S3600.
- MISD machines: Theoretically in these types
of machines multiple instructions should act on a single stream of
data. As yet no practical machine in this class has been constructed,
nor are such systems easy to conceive. We will disregard them in the
following discussions.
- MIMD machines: These machines execute several
instruction streams in parallel on different data. The difference from the
multi-processor SISD machines mentioned above is that here the
instructions and data are related because they represent different parts
of the same task to be executed. So, MIMD systems may run many sub-tasks
in parallel in order to shorten the time-to-solution for the main task
to be executed. There is a large variety of MIMD systems, and especially
in this class the Flynn taxonomy proves not fully adequate for the
classification of systems. Systems that behave very differently, such as
a four-processor Cray Y-MP T94 and a thousand-processor nCUBE 3, both
fall in this class. In the following we will make another important
distinction between classes of systems and treat them accordingly.
- Shared memory systems: Shared memory systems have
multiple CPUs all of which share the same address space. This means that
the knowledge of where data is stored is of no concern to the user as
there is only one memory accessed by all CPUs on an equal basis. Shared
memory systems can be either SIMD or MIMD. Single-CPU vector processors
can be regarded as an example of the former, while the multi-CPU models
of these machines are examples of the latter. We will sometimes use the
abbreviations SM-SIMD and SM-MIMD for the two subclasses.
- Distributed memory systems: In this case each CPU
has its own associated memory. The CPUs are connected by some network
and may exchange data between their respective memories when required. In
contrast to shared memory machines the user must be aware of the location
of the data in the local memories and will have to move or distribute
these data explicitly when needed. Again, distributed memory systems
may be either SIMD or MIMD. The first class of SIMD systems mentioned
above operates in lock-step, with a distributed memory associated
with each processor. For the distributed memory MIMD systems a further
subdivision is possible: those in which the processors are connected
in a fixed topology and those in which the topology is flexible and
may vary from task to task. For the distributed memory systems we
will sometimes use DM-SIMD and DM-MIMD to indicate the two subclasses.
Although the difference between shared- and distributed memory
machines seems clear cut, this is not always entirely the case from the
user's point of view. For instance, the late Kendall Square Research
systems employed the idea of ``virtual shared memory'' on a hardware
level. Virtual shared memory can also be simulated at the programming
level: the first draft proposal for High Performance Fortran (HPF),
published in November 1992 [6] and fixed by May 1993, distributes the
data over the available processors by means of compiler directives.
A system on which HPF is implemented will therefore act as a shared
memory machine to the user. Other vendors of Massively Parallel
Processing systems
(the buzz-word MPP systems is fashionable here), like Convex and Cray,
also support proprietary virtual shared-memory programming models which
means that these physically distributed memory systems, by virtue of the
programming model, logically will behave as shared memory systems. In
addition, packages like TreadMarks [1]
provide a virtual shared memory environment for networks of workstations.
Another trend that has come up in the last few years is distributed
processing. This takes the DM-MIMD concept one step further:
instead of many integrated processors in one or several boxes,
workstations, mainframes, etc., are connected by Ethernet, FDDI, or
otherwise and set to work concurrently on tasks in the same program.
Conceptually, this is not different from DM-MIMD computing, but
the communication between processors is often orders of magnitude
slower. Many packages to realise distributed computing are available,
both commercial and non-commercial. Examples are Parasoft's
Express (commercial), PVM (Parallel Virtual Machine, non-commercial),
and MPI (Message Passing Interface [14], also non-commercial).
PVM and MPI have been adopted by, for instance, Convex,
Cray, IBM, and Intel for the transition stage between distributed computing
and MPP on clusters of their favorite processors, and they are
available on a large number of distributed memory MIMD systems and even on
shared memory MIMD systems for compatibility reasons. In addition there
is a tendency to cluster shared memory systems, for instance by HIPPI
channels, to obtain systems with a very high computational power.
Silicon Graphics, for example, already provides such arrays of systems;
the Intel Paragon with MP (Multi Processor) nodes and the NEC SX-4
also have this structure. The Convex Exemplar SPP-1200 can be seen as
a more integrated example (although its software environment is much
more complete and allows shared-memory addressing).
Copyright © 1996 Aad J. van der Steen and Jack J. Dongarra
Copyright © 1996 NHSE ReviewTM All Rights Reserved.