| Machine type | RISC-based distributed-memory multi-processor |
|---|---|
| Models | AP1000 |
| Operating system | Cell OS (transparent to the user) and SunOS (Sun's Unix variant) on the front-end system |
| Connection structure | T-net (2-D torus), B-net (common bus + hierarchical ring), S-net (tree) (see remarks) |
| Compilers | Fortran 77 and C with extensions |
System parameters:

| Model | AP1000 |
|---|---|
| Clock cycle | 40 ns |
| Theor. peak performance | |
| Per proc. (64-bit) | 12.5 Mflop/s |
| Maximal (64-bit) | 12.8 Gflop/s |
| Main memory | <= 16 GB |
| Memory/node | 16 MB |
| Communication bandwidth | |
| B-net | 50 MB/s |
| T-net | 25 MB/s |
| No. of processors | 8-1024 |
Remarks:
The AP1000 is built from computing cells, each of which contains a 25 MHz SPARC processor (IU) and an additional floating-point processor (FPU). The processor cells are complemented by routing and message controllers, a B-net interface (see below), cell memory, and a 128 KB cache. The peak performance of the FPU is estimated to be 12.5 Mflop/s, which brings the aggregate peak rate to 12.8 Gflop/s for a full 1024-cell system. The system is front-ended by a Sun 4 machine.
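The aggregate peak rate quoted above follows directly from the per-cell figure; a minimal arithmetic check (figures taken from the text, not vendor data):

```python
# Aggregate peak rate of a full AP1000: 1024 cells at 12.5 Mflop/s each.
per_cell_mflops = 12.5      # peak of one FPU, Mflop/s (from the text)
cells = 1024                # full configuration
aggregate_gflops = per_cell_mflops * cells / 1000.0
print(aggregate_gflops)     # 12.8 Gflop/s
```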
Fujitsu has attempted to diminish the communication problems inherent in DM-MIMD machines by implementing separate networks for broadcasting and collecting data (the B-net), for synchronisation (the S-net), and for communication on the processor grid (the T-net). As broadcasting or multicasting (i.e., broadcasting to a selected subset) of data often constitutes a bottleneck in the execution of a computational task, the B-net has twice the bandwidth of the interprocessor T-net (50 vs. 25 MB/s). Because the gathering and scattering of data over the processors is generally less structured, a combination of a common bus and a hierarchical ring structure is used. The B-net interface has FIFO buffers and scatter/gather controllers that allow data to be sent and received independently of the other active components in the cell. The message controller seeks to minimise the overhead of data-transfer setup and relieves the IU of doing the message passing proper.
For the T-net, which connects the cells in a 2-D grid, the transfer speed is half that of the B-net, but as data movement on the grid will often be more regular, it is expected to give good throughput, especially as a new conflict-free wormhole routing scheme has been implemented by allocating routed messages to alternating buffer pairs in the intermediate cells. Experiments have shown relatively low message overhead for this system [9].
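On a 2-D torus such as the T-net, every cell has exactly four neighbours, with links wrapping around at the grid edges. A small sketch of this addressing scheme (the function name and the 32 x 32 layout of a 1024-cell machine are illustrative assumptions, not documented hardware details):

```python
# Neighbour addressing on a 2-D torus: each cell (x, y) has four
# neighbours; the modulo arithmetic implements the wrap-around links.
def torus_neighbours(x, y, nx, ny):
    return [((x - 1) % nx, y), ((x + 1) % nx, y),
            (x, (y - 1) % ny), (x, (y + 1) % ny)]

# On a hypothetical 32 x 32 grid (1024 cells), the corner cell (0, 0)
# wraps to (31, 0) on the left and (0, 31) above:
print(torus_neighbours(0, 0, 32, 32))
```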
There is a tree-structured S-net for barrier synchronisation of processes, again with quite low overhead (a maximum of 5.2 µs for a full configuration).
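A tree topology is what keeps this barrier overhead low: a tree over n cells has only on the order of log(n) levels, so a synchronisation signal traverses very few links even on the full machine. A minimal sketch, assuming a binary fan-in (the actual S-net fan-in is not stated in the text):

```python
import math

# Levels in a binary reduction tree over a given number of cells; the
# barrier signal crosses this many levels up and down.
def tree_levels(cells):
    return math.ceil(math.log2(cells))

print(tree_levels(1024))  # 10 levels for the full 1024-cell configuration
```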
Recently, an entry-level model of the AP1000, the AP1000C, has been offered. The AP1000C starts at a configuration of 8 processor cells instead of the original 64. The housing of this model has also been made more compact, saving a factor of 3 in space.
Measured Performances: In [8] the performance for the solution of a full linear system on a 256-cell machine is given. A system of order 100 performed at about 40 Mflop/s, an order 300 system attained 180 Mflop/s, while an order 1000 system reached more than 300 Mflop/s. In [2] a speed of 2.3 Gflop/s was obtained for a dense system of order 25,600 on 512 cells.
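Such Mflop/s figures for dense solvers are conventionally derived from the standard 2/3 n³ operation count for LU factorisation divided by the run time. A sketch of that conversion (the run time below is an illustrative assumption, not a measurement from the text):

```python
# Mflop/s rate of a dense linear solve of order n, using the standard
# 2/3 * n^3 floating-point operation count for LU factorisation.
def lu_mflops(n, seconds):
    flops = (2.0 / 3.0) * n**3
    return flops / seconds / 1e6

# An order 1000 system solved in a hypothetical 2.2 s comes out at
# roughly 300 Mflop/s, in the range reported above:
print(round(lu_mflops(1000, 2.2)))
```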
Copyright © 1996 Aad J. van der Steen and Jack J. Dongarra