The toolkit contains global arrays (GA), memory allocator (MA), TCGMSG, and TCGMSG-MPI packages bundled together.
Global Arrays is a portable Non-Uniform Memory Access (NUMA) shared-memory programming environment for distributed- and shared-memory computers.
TCGMSG is a simple and efficient message passing library.
TCGMSG-MPI is a TCGMSG library implementation on top of MPI and in some cases architecture-specific resources.
MA is a dynamic memory allocator for Fortran (and also C) programs.
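For orientation, here is a minimal sketch of a program built on these packages, written in C against the bindings found in the current Global Arrays distribution; the array name, the sizes, and the C_DBL type constant are illustrative assumptions, and older releases of the toolkit may expose somewhat different interfaces.

    #include <mpi.h>
    #include "ga.h"          /* Global Arrays */
    #include "macdecls.h"    /* MA memory allocator, used internally by GA */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);           /* message-passing layer (TCGMSG-MPI build) */
        GA_Initialize();                  /* start the GA runtime */
        MA_init(C_DBL, 1000000, 1000000); /* MA stack and heap sizes, in doubles */

        int dims[2]  = {1000, 1000};
        int chunk[2] = {-1, -1};          /* -1: let GA choose the data distribution */
        int g_a = NGA_Create(C_DBL, 2, dims, "matrix", chunk);

        GA_Zero(g_a);                     /* collective: zero the distributed array */
        GA_Print_distribution(g_a);       /* show which process owns which block */

        GA_Destroy(g_a);
        GA_Terminate();
        MPI_Finalize();
        return 0;
    }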
Abstract: Portability, efficiency, and ease of coding are all important considerations in choosing the programming model for a scalable parallel application. The message-passing programming model is widely used because of its portability, yet some applications are too complex to code in it while also trying to maintain a balanced computation load and avoid redundant computations. The shared-memory programming model simplifies coding, but it is not portable and often provides little control over interprocessor data transfer costs. This paper describes a new approach, called Global Arrays (GA), that combines the better features of both other models, leading to both simple coding and efficient execution. The key concept of GA is that it provides a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes. We have implemented GA libraries on a variety of computer systems, including the Intel DELTA and Paragon and the IBM SP-1 (all message-passers), the Kendall Square KSR-2 (a nonuniform-access shared-memory machine), and networks of Unix workstations. We discuss the design and implementation of these libraries, report their performance, illustrate the use of GA in the context of computational chemistry applications, and describe the use of a GA performance visualization tool.
Authors: Jaroslaw Nieplocha, Robert J. Harrison, and Richard J. Littlefield. Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, MSIN: K1-87, Richland, WA 99352, USA.
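The asynchronous block access this abstract describes might look as follows in the C bindings of the current GA distribution; the patch bounds, buffer size, and handle g_a are illustrative assumptions, a sketch rather than code from the paper.

    #include "ga.h"

    /* Fetch a block, update it locally, and accumulate it back. Only the
       calling process participates; the owners of the data do not cooperate. */
    void scale_block(int g_a) {              /* g_a: handle from NGA_Create */
        static double buf[100][100];         /* local copy of a 100x100 block */
        int lo[2] = {0, 0}, hi[2] = {99, 99};
        int ld[1] = {100};                   /* leading dimension of buf */
        double alpha = 1.0;

        NGA_Get(g_a, lo, hi, buf, ld);       /* one-sided fetch of the block */
        for (int i = 0; i < 100; i++)        /* purely local computation */
            for (int j = 0; j < 100; j++)
                buf[i][j] *= 2.0;
        NGA_Acc(g_a, lo, hi, buf, ld, &alpha); /* atomic accumulate: patch += alpha*buf */
    }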
Abstract: In out-of-core computations, disk storage is treated as another level in the memory hierarchy, below cache, local memory, and (in a parallel computer) remote memories. However, the tools used to manage this storage are typically quite different from those used to manage remote memory. This disparity complicates implementation of out-of-core algorithms and hinders portability. We describe a programming model that addresses this problem. This model allows parallel programs to use essentially the same mechanisms to manage the movement of data between all levels of the memory hierarchy. We take as our starting point the Global Arrays shared-memory model and library, which support a variety of operations on distributed arrays, including transfer between local and remote memories. We show how this model can be extended to support explicit transfer between global memory and secondary storage, and we define a Disk Resident Arrays library that supports such transfers. We illustrate the utility of the resulting model with two applications, an out-of-core matrix multiplication and a large computational chemistry program. We also describe implementation techniques on several parallel computers and present experimental results demonstrating that the model can be implemented very efficiently on parallel computers.
Authors: Jaroslaw Nieplocha (j_nieplocha@pnl.gov), Pacific Northwest National Laboratory, Richland, WA 99352, USA and Ian Foster (itf@mcs.anl.gov), Argonne National Laboratory, Argonne, IL 60439, USA.
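A hedged sketch of the global-memory-to-disk transfers the abstract describes, using the DRA C bindings shipped with recent GA distributions (dra.h); the exact names, signatures, and constants here are assumptions and may differ from the library version the paper presents.

    #include "ga.h"
    #include "dra.h"

    /* Write a global array to secondary storage as a disk resident array. */
    void checkpoint(int g_a) {
        int d_a, request;
        int dims[2]    = {1000, 1000};
        int reqdims[2] = {1000, 1000};   /* typical request size, guides file layout */

        DRA_Init(4, 1e8, 1e9, 1e7);      /* limits: arrays, array size, disk, memory */
        NDRA_Create(C_DBL, 2, dims, "matrix", "matrix.dra",
                    DRA_RW, reqdims, &d_a);

        NDRA_Write(g_a, d_a, &request);  /* asynchronous global -> disk transfer */
        DRA_Wait(request);               /* block until the data is on disk */

        DRA_Close(d_a);
        DRA_Terminate();
    }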
Abstract: The performance of the Global Array shared-memory non-uniform memory-access programming model is explored on the I-WAY wide-area-network distributed supercomputer environment. The Global Array model is extended by introducing a concept of mirrored arrays. Latencies and bandwidths for remote memory access are studied, and the performance of a large application from computational chemistry is evaluated using both fully distributed and also mirrored arrays. Excellent performance can be obtained with mirrored arrays when sufficient memory is available.
Authors: J. Nieplocha and R. J. Harrison. Pacific Northwest National Laboratory, P.O. Box 999, Richland, WA 99352, USA.
Abstract: All scalable parallel computers feature a memory hierarchy, in which some locations are "closer" to a particular processor than others. The hardware in a particular system may support a shared memory or message passing programming model, but these factors affect only the relative costs of local and remote accesses, not the system's fundamental Non-Uniform Memory Access (NUMA) characteristics. Yet while the efficient management of memory hierarchies is fundamental to high performance in scientific computing, existing parallel languages and tools provide only limited support for this management task. Recognizing this deficiency, we propose abstractions and programming tools that can facilitate the explicit management of memory hierarchies by the programmer, and hence the efficient programming of scalable parallel computers. The abstractions comprise local arrays, global (distributed) arrays, and disk resident arrays located on secondary storage. The tools comprise the Global Arrays library, which supports the transfer of data between local and global arrays, and the Disk Resident Arrays (DRA) library, for transferring data between global and disk resident arrays. We describe the shared memory NUMA model implemented in the tools, discuss extensions for wide area computing environments, and review major applications of the tools, which currently total over one million lines of code.
Authors: Jaroslaw Nieplocha (j_nieplocha@pnl.gov), Pacific Northwest National Laboratory, Richland, WA 99352, USA; Robert J. Harrison (rj_harrison@pnl.gov), Pacific Northwest National Laboratory, Richland, WA 99352, USA; and Ian Foster (itf@mcs.anl.gov), Argonne National Laboratory, Argonne, IL 60439, USA.
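To make the layering concrete, the sketch below moves one n x n patch down the hierarchy, from a local array to a global array (GA) and then to a disk resident array (DRA); it reuses the hedged C calls from the sketches above, and the handles and sizes are again illustrative.

    #include "ga.h"
    #include "dra.h"

    /* local buffer -> global (distributed) array -> disk resident array */
    void push_down(double *local, int n, int g_a, int d_a) {
        int lo[2] = {0, 0}, hi[2] = {n - 1, n - 1};
        int ld[1] = {n};
        int request;

        NGA_Put(g_a, lo, hi, local, ld); /* local memory -> global array */
        GA_Sync();                       /* make the update globally visible */

        NDRA_Write(g_a, d_a, &request);  /* global array -> secondary storage */
        DRA_Wait(request);               /* same wait mechanism at every level */
    }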
Abstract: Portability, efficiency, and ease of coding are all important considerations in choosing the programming model for a scalable parallel application. The message-passing programming model is widely used because of its portability, yet some applications are too complex to code in it while also trying to maintain a balanced computation load and avoid redundant computations. The shared-memory programming model simplifies coding, but it is not portable and often provides little control over interprocessor data transfer costs. This paper describes an approach, called Global Arrays (GA), that combines the better features of both other models, leading to both simple coding and efficient execution. The key concept of GA is that it provides a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes. We have implemented GA libraries on a variety of computer systems, including the Intel DELTA and Paragon and the IBM SP-1 and SP-2 (all message-passers), the Kendall Square KSR-1/2 and Convex SPP-1200 (nonuniform-access shared-memory machines), the Cray T3D (a globally-addressable distributed-memory computer), and networks of Unix workstations. We discuss the design and implementation of these libraries, report their performance, illustrate the use of GA in the context of computational chemistry applications, and describe the use of a GA performance visualization tool.
Authors: Jaroslaw Nieplocha (j_nieplocha@pnl.gov), Pacific Northwest National Laboratory, Richland, WA 99352, USA; Robert J. Harrison (rj_harrison@pnl.gov), Pacific Northwest National Laboratory, Richland, WA 99352, USA; and Richard J. Littlefield, Pacific Northwest National Laboratory, USA.