--------------------------------------------------------------------------
        Applied Parallel Research FORGE 90 DMP/SMP DataSheet
--------------------------------------------------------------------------

              FORGE Interactive Parallelization Tools
        For Distributed and Shared Memory Multiprocessor Systems
              and Clusters of Networked Workstations

      An interactive Fortran parallelization environment from APR
--------------------------------------------------------------------------

* Baseline FORGE Browser
------------------------

APR's interactive parallelizers for distributed and shared memory systems
are built upon the industry's leading interprocedural Fortran program
browser, FORGE Baseline.  This is the only tool powerful enough to
analyze large, complex Fortran application programs for parallelization
on both shared and distributed memory systems.

The Baseline Browser utilizes an innovative database capable of analyzing
even the most convoluted "dusty deck" program.  FORGE's database viewing
tools provide facilities for fast reference tracing of variables and
constants, consistency checking of COMMON blocks and subprogram calls,
and exposing variable aliasing through COMMON and calls, as well as
displaying COMMON block usage, data flow through calls, and data
dependencies between routines and basic or arbitrary code blocks.

Baseline FORGE's unique interprocedural database provides the complete,
global view of a program that you need before you start optimizing.
Additional facilities for program maintenance and tidy reformatting, and
an advanced instrumentation module and runtime library for gathering
serial execution performance statistics, are also included.

* The Distributed Memory Parallelizer (DMP)
-------------------------------------------

Add the Distributed Memory Parallelizer onto Baseline FORGE to
interactively spread loops and distribute data arrays for MIMD
architectures.  The parallelized program is fully scalable, with calls to
APR's parallel runtime library, which interfaces with any of the popular
communication packages (PVM, Express, Linda) or native message-passing
systems.  With FORGE's SPMD (Single Program, Multiple Data)
parallelization strategy, the same program runs on each processor while
selected DO loops are rewritten to automatically distribute their
iterations across the processors.

Your first step in parallelizing a Fortran program is identifying the
critical data arrays, proposing an array decomposition scheme, and then
restructuring the program to decompose these arrays over the processors.
FORGE DMP's Data Decomposition facility offers you an interactive way to
specify decompositions and select arrays for partitioning while viewing
the implications of these decisions.  The Data Decomposer implements
either BLOCK or CYCLIC distributions along any single dimension, with
either FULL or SHRUNK memory allocation.  With full allocation, an array
is allocated its original size on each processor.  With shrunk
allocation, each processor is allocated only enough memory for an array
to hold the elements that it owns.  (The first sketch below makes this
arithmetic concrete.)

The next step is identifying which loops to parallelize.  DMP's Loop
Spreader allows interactive or automatic selection of loops.  Under
automatic selection, DMP uses actual runtime execution statistics to
determine which loops are best parallelized for high granularity and low
communication cost.  (The second sketch below shows what a spread loop
amounts to.)
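To make the decomposition arithmetic concrete, here is a minimal sketch
of which elements each processor owns under BLOCK and CYCLIC
distributions of a small 1-D array, and the resulting SHRUNK local
allocation size.  This is illustrative Fortran only; it shows the
arithmetic, not APR's actual runtime library interface.

      PROGRAM DECOMP
C     Illustrative only: for each of NPROC processors, list the global
C     indices it owns under BLOCK and CYCLIC distributions of A(1:N),
C     and the SHRUNK local size (vs. N words under FULL allocation).
      INTEGER N, NPROC, P, BSIZE, LO, HI, NLOC, I
      PARAMETER (N = 10, NPROC = 4)

      BSIZE = (N + NPROC - 1) / NPROC
      DO 10 P = 0, NPROC - 1
C        BLOCK: processor P owns one contiguous chunk of BSIZE elements
         LO = P * BSIZE + 1
         HI = MIN(N, (P + 1) * BSIZE)
C        MAX guards the case of an empty trailing chunk
         NLOC = MAX(0, HI - LO + 1)
         WRITE (*,*) 'BLOCK  proc', P, ': owns', LO, '..', HI,
     &               '  shrunk size =', NLOC
   10 CONTINUE

      DO 20 P = 0, NPROC - 1
C        CYCLIC: processor P owns elements P+1, P+1+NPROC, P+1+2*NPROC, ...
         WRITE (*,*) 'CYCLIC proc', P, ': owns',
     &               (I, I = P + 1, N, NPROC)
   20 CONTINUE
      END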
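And a sketch of what a spread loop amounts to under a BLOCK distribution
with FULL allocation.  In DMP's actual output the node number and loop
bounds come from calls into APR's parallel runtime library; here a
serial loop over a hypothetical MYNODE variable simulates the NPROC
copies of the SPMD program, so the fragment runs anywhere.

      PROGRAM SPREAD
C     Sketch of SPMD loop spreading.  Originally:  DO 10 I = 1, N.
C     After spreading, every node runs the same program but executes
C     only the iterations it owns.  MYNODE stands in for the node
C     number normally supplied by the parallel runtime.
      INTEGER N, NPROC
      PARAMETER (N = 8, NPROC = 2)
      REAL A(N), B(N)
      INTEGER MYNODE, BSIZE, LO, HI, I

      DO 5 I = 1, N
         B(I) = REAL(I)
    5 CONTINUE

      BSIZE = (N + NPROC - 1) / NPROC
      DO 20 MYNODE = 0, NPROC - 1
C        Rewritten loop bounds: node MYNODE touches only its own chunk.
C        Under FULL allocation A is still dimensioned N on every node.
         LO = MYNODE * BSIZE + 1
         HI = MIN(N, LO + BSIZE - 1)
         DO 10 I = LO, HI
            A(I) = 2.0 * B(I)
   10    CONTINUE
   20 CONTINUE

      WRITE (*,*) 'A =', (A(I), I = 1, N)
      END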
DMP checks for parallelization inhibitors, rewrites dimension
declarations and array subscripts on distributed arrays to reflect the
partitioning, modifies DO loop control counters to operate dynamically
depending on the loop distribution scheme, and ensures that all
restructurings are consistent through subroutine calls.  Data
communication calls to APR's parallel runtime library are inserted
automatically around and within distributed loops to move the data as it
is needed.  DMP's interactive displays allow fine-tuning of the
communications.  The resulting parallel program is dynamically scalable
at runtime.

DMP can also be used to interactively view the parallelizations developed
by APR's batch parallelizing pre-compilers dpf and xhpf.

* Parallel Performance Profiler and Simulator
---------------------------------------------

Programs parallelized by FORGE DMP can utilize APR's Parallel Profiler to
gather runtime performance statistics of CPU utilization and
communication costs.  The Performance Simulator can be used to predict
performance on various MPP systems or configurations.

Performance instrumentation options generate parallelized programs with
calls to APR's runtime timing library to accumulate data on each node for
loop and subprogram execution times, communication costs, and program
wait times.  The post-processor polytime is provided to analyze the
results over all nodes and produce a composite report of a program's true
performance on the parallel system.

By linking with APR's runtime simulation library, the performance of a
parallelized program running on a single node can be extrapolated to
report CPU and communication performance on a variety of scalable MPP
systems.

* The Shared Memory Parallelizer (SMP)
--------------------------------------

Another add-on to Baseline FORGE is the Shared Memory Parallelizer.
Unlike parallelizing compilers, which often fail on the most important DO
loops in a program, SMP's interprocedural analysis can handle loops that
call subroutines.  SMP's strategy is to parallelize for high granularity
by analyzing outermost loops first.  It analyzes array and scalar
dependencies across subprogram boundaries by tracing references through
the database up and down the call tree.  The result is a parallelized
source code with compiler-specific directives inserted for scoping
variables and for identifying Critical and Ordered regions of code.

DO loops are selected for parallelization interactively.  Using execution
performance timings as a guide, FORGE SMP will suggest the most
significant loop as a starting point, working through the code from the
most CPU-intensive loops down to a threshold below which parallelization
does not produce a performance gain.

SMP's interprocedural analysis makes it possible to scope variables
passed through subprogram calls and COMMON.  In a parallel region of
code, SMP analyzes all variable references within a loop, including those
enclosed in routines called from the loop.  Proceeding down the call
chain, SMP identifies variables as PRIVATE or SHARED, and GLOBAL or
LOCAL, displaying them interactively and allowing you to modify its
decisions.  SMP also identifies Critical and Ordered regions in the code
that will give rise to synchronization calls; on some systems these
regions cannot be parallelized.  These, too, are displayed interactively.

Following successful analysis of a loop nest, FORGE SMP inserts
directives specific to the target system and compiler on which the
program is to be run, as in the example below.
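As an illustration, the fragment below shows the kind of scoped directive
SMP produces for one loop.  The C$DOACROSS dialect is SGI-style Power
Fortran syntax, used here only as one example target; the SHARE/LOCAL
lists correspond to SMP's SHARED/PRIVATE scoping analysis, and other
targets use other directive forms.  Compilers that do not recognize the
directive simply treat it as a comment.

      PROGRAM SMPDIR
C     Illustration of directive output in the style FORGE SMP produces
C     for one target (SGI-style C$DOACROSS shown).
      INTEGER N, I
      PARAMETER (N = 100)
      REAL A(N), B(N), T

      DO 5 I = 1, N
         B(I) = REAL(I)
    5 CONTINUE

C     A and B are referenced at distinct indices per iteration, so they
C     are scoped SHARED; the index I and the temporary T must be LOCAL
C     (private) to each iteration.
C$DOACROSS SHARE(A, B, N), LOCAL(I, T)
      DO 10 I = 2, N - 1
         T = B(I-1) + B(I+1)
         A(I) = 0.5 * T
   10 CONTINUE

      WRITE (*,*) 'A(2) =', A(2), '  A(N-1) =', A(N-1)
      END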
SMP knows about a number of shared memory systems and generates the
directives appropriate to each.  Your program's parallel analysis is
saved and can be recalled later to generate a parallel program for
another target system.

------------------
Other APR Products
------------------

forgex   FORGE Explorer, a Motif GUI global Fortran program browser

APR offers three MAGIC Parallelizing Batch Pre-Compilers:

  dpf    for distributed memory systems
  spf    for shared memory systems
  xhpf   for HPF directives and Fortran 90 array syntax on distributed
         memory systems

---------------------
Platforms and Targets
---------------------

APR's products are available to run on various systems including HP,
SUN, IBM RS/6000, DEC Alpha, and Cray.

Parallelizations and runtime support are available for: workstation
clusters, IBM SP1 and POWER/4, Intel Paragon, nCUBE, Meiko, Cray T3D,
and CM-5.

----------------
More Information
----------------

For further information on these tools and our parallelization
techniques training workshops, contact us at:

  Applied Parallel Research, Inc.
  550 Main Street, Suite I
  Placerville, CA 95667
  Phone: 916/621-1600
  Fax:   916/621-0593
  email: forge@netcom.com

Copyright (c) 1993 Applied Parallel Research, Inc.  11/93