NHSE Review™ 1996 Volume, First Issue

Overview of Recent Supercomputers



3.3.7 The Tera MTA.

Machine type: Distributed-memory multi-processor
Models: MTA
Operating system: Unix BSD4.4 + proprietary micro kernel
Compilers: Fortran 77 (Fortran 90 extensions), HPF, C, C++
Vendor's information: Web page http://www.tera.com/

System parameters:

Model: MTA-xC
Clock cycle:
Theor. peak performance:
  Per proc. (64-bit): 1 Gflop/s
  Maximal (64-bit): 256 Gflop/s
Main memory: <=16
Memory bandwidth:
  CPU-to-memory: >8 GB/s
No. of processors: 16-256

Remarks:

Although the memory in the MTA is physically distributed, the system is emphatically presented as a shared-memory machine (with non-uniform access time). The latency incurred in memory references is hidden by multi-threading, i.e., many concurrent program threads (instruction streams) are usually active at any time. When, for instance, a load instruction cannot be satisfied because of memory latency, the thread requesting the operation is stalled and another thread that has an operation ready is switched into execution. Switching between program threads takes only 1 cycle. As there may be up to 128 instruction streams and each stream can issue 8 memory references without waiting for preceding ones, a latency of 128 × 8 = 1024 cycles can be tolerated. References that are stalled are retried from a retry pool.
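The latency-hiding argument can be illustrated with a small toy model in Python. This is only a sketch under simplifying assumptions that are not part of the machine description: every instruction is treated as a memory reference, streams are served in round-robin order, and the names `tolerable_latency` and `simulate_utilisation` are hypothetical.

```python
from collections import deque


def tolerable_latency(streams: int, refs_in_flight: int) -> int:
    """Latency (in cycles) that can be covered when every stream keeps
    `refs_in_flight` memory references outstanding."""
    return streams * refs_in_flight


def simulate_utilisation(streams: int, refs_in_flight: int,
                         latency: int, cycles: int) -> float:
    """Fraction of cycles in which some stream could issue a reference.

    Toy assumption: each instruction is a memory reference that completes
    `latency` cycles after issue, and at most one issues per cycle.
    """
    # Completion times of the references each stream still has in flight.
    outstanding = [deque() for _ in range(streams)]
    next_stream = 0
    issued = 0
    for cycle in range(cycles):
        # Round-robin search for a stream that is allowed to issue.
        for offset in range(streams):
            s = (next_stream + offset) % streams
            pending = outstanding[s]
            # Retire references whose data has arrived by now.
            while pending and pending[0] <= cycle:
                pending.popleft()
            if len(pending) < refs_in_flight:
                pending.append(cycle + latency)
                issued += 1
                next_stream = (s + 1) % streams
                break
    return issued / cycles
```

With the figures from the text, `tolerable_latency(128, 8)` gives 1024. In this toy model the processor stays fully busy at a latency of 1024 cycles, and utilisation drops to roughly one half when the latency is doubled to 2048 cycles.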

The connection network connects a 3-D cube of p processors with sides of p^(1/3), in which alternately the x- or y-axes are connected. Therefore, all nodes connect to four out of six neighbours. Furthermore, there is an I/O port at every node. Each network port is capable of sending and receiving a 64-bit word per cycle, which amounts to a bandwidth of 22.6 GB/s per port. In case of detected failures, ports in the network can be bypassed without interrupting operation of the system.
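One possible reading of this link pattern is sketched below in Python. The text does not spell out the exact alternation rule or whether the cube wraps around, so both are assumptions here: the sketch assumes wrap-around links, z-links at every node, and x-links on even z-planes versus y-links on odd z-planes. The function names `cube_side` and `neighbours` are hypothetical.

```python
def cube_side(p: int) -> int:
    """Side length of a cube holding exactly p processors."""
    side = round(p ** (1.0 / 3.0))
    if side ** 3 != p:
        raise ValueError("p must be a perfect cube")
    return side


def neighbours(node, side):
    """Four neighbours of `node` = (x, y, z) in a cube of the given side.

    Assumptions (not from the text): links wrap around, every node has
    both z-links, and the x- or y-links alternate with the parity of z.
    """
    x, y, z = node
    links = [(x, y, (z + 1) % side), (x, y, (z - 1) % side)]
    if z % 2 == 0:
        # Even plane: connected along the x-axis.
        links += [((x + 1) % side, y, z), ((x - 1) % side, y, z)]
    else:
        # Odd plane: connected along the y-axis.
        links += [(x, (y + 1) % side, z), (x, (y - 1) % side, z)]
    return links
```

For p = 64 the side is 4, and under these assumptions every node has four distinct neighbours out of the six a full 3-D grid would offer.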

Although the MTA should be able to run ``dusty-deck'' Fortran programs, because parallelism is exploited automatically as soon as an opportunity for multi-threading is detected, it may be (and often is) worthwhile to control the parallelism in the program explicitly and to take advantage of known data locality. The MTA provides handles for this in the form of library routines, including synchronisation, barrier, and reduction operations on defined groups of threads. Controlled and uncontrolled parallelism approaches may be freely mixed. HPF will also be supported for SPMD-style programming.
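The style of explicitly controlled parallelism mentioned here, a barrier followed by a reduction over a defined group of threads, can be illustrated with Python's standard `threading` module. This is a stand-in only: it does not use or imitate the MTA library routines, whose names and interfaces are not given in the text, and `group_sum` is a hypothetical helper.

```python
import threading


def group_sum(values, n_threads=4):
    """Sum `values` with a fixed group of threads: each thread computes a
    partial sum, all threads meet at a barrier, then one thread reduces."""
    barrier = threading.Barrier(n_threads)
    partial = [0] * n_threads
    result = {"total": 0}

    def worker(rank):
        # Each thread handles a strided slice of the data.
        partial[rank] = sum(values[rank::n_threads])
        # Barrier: no thread continues until all partial sums are written.
        barrier.wait()
        # Reduction: a single designated thread combines the partial sums.
        if rank == 0:
            result["total"] = sum(partial)

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]
```

Calling `group_sum(list(range(100)))` returns 4950, the same value as a sequential sum.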

Measured Performances: The MTA will be benchmarkable from the beginning of 1996; therefore, no performance figures are available yet.

Copyright © 1996 Aad J. van der Steen and Jack J. Dongarra




Copyright © 1996 NHSE ReviewTM All Rights Reserved.
Lowell W Lutz (lwlutz@rice.edu) NHSE ReviewTM WWWeb Editor