-------------------------------------------------------------------------
     Applied Parallel Research FORGE Magic Pre-Compilers DataSheet
-------------------------------------------------------------------------

        Parallelize Fortran Automatically with FORGE MAGIC
        Pre-Compilers from Applied Parallel Research, Inc.

Automatic parallelization of Fortran programs is finally possible for a
range of real applications.  APR announces the development of its MAGIC
series of batch pre-compilers for both distributed and shared memory
parallel multi-processor systems.

APR, a leader in providing tools for Fortran optimization, vectorization,
and parallelization, now offers its FORGE* premier parallelization
technology enhanced with a unique automatic capability.  We call it,
simply, MAGIC.

-------------------------
Bootstrap Parallelization
-------------------------
MAGIC uses various schemes to arrive at an initial parallelization
strategy for your program.  With FORGE's fully interprocedural analysis,
it can identify the most significant loops in the program and develop a
parallelization based on those loops and the arrays they reference.  Or,
given program execution timing information, the parallelizer can focus
precisely on the loops that must be parallelized to significantly affect
performance.

----------------------
Parallelization Report
----------------------
Not all loops in a program are parallelizable as written.  An essential
aspect of APR's MAGIC pre-compilers is the detailed diagnostic assistance
they provide.  A parallelization report indicates the loop distribution
and data array partitioning strategies that were successfully applied, as
well as syndromes in the program that inhibited parallelization.  From
this report, a user can restructure the program to remove inhibitors, or
suggest a different parallelization strategy.

-----------------------
Seeding With Directives
-----------------------
Directives can be used to propose an initial parallelization strategy to
the MAGIC pre-compiler.  Directives might indicate how a few key data
arrays should be partitioned, or which DO loops are the most significant.
These act as a seed for MAGIC's parallelization, from which a full
strategy of loop distributions and data array partitioning is developed.

Working from the user's seed directives, MAGIC finds all arrays used in
combination with the partitioned arrays and decomposes them in the same
way.  It then proceeds to distribute as many DO loops referencing these
arrays along the partitioned dimension as possible.  It may then
automatically partition other arrays and distribute the loops they
reference, in a cascading process that works its way through the entire
program's call tree until a viable parallelization has been developed.
A minimal sketch of such a seed appears below, under Example: Seeding
With Directives.

----------------------------------
Serial Runtime Performance Timings
----------------------------------
APR's FORGE pre-compilers can produce instrumented versions of programs
that run on the target host system and generate detailed execution
timings.  APR's runtime library measures performance down to the DO loop
level.  It reports relative percentages to identify the most significant
routines and loops, and it contrasts loop and routine timings, including
and excluding time spent in called routines and enclosed loops.  Serial
timing reports can be imported back into the parallelizers to drive the
MAGIC process.
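--------------------------------
Example: Seeding With Directives
--------------------------------
The fragment below is a minimal sketch of the seeding idea described
under Seeding With Directives.  A single key array is marked with a data
partitioning directive.  The directive form shown is the published HPF
DISTRIBUTE syntax, used here purely as an illustration; APR's own
directive syntax is documented with the products and is not reproduced
in this datasheet.  Given such a seed, the scheme described above would
decompose arrays used together with A (here, B) in the same way and
distribute the I loop along the partitioned dimension.

C     Seed: block-partition the rows of the key array A.  The HPF-style
C     directive is shown for illustration only; APR syntax may differ.
C     B carries no directive; a tool following the seeding scheme
C     would decompose it to match A, since the two arrays are used
C     together in the loop nest below.
      PROGRAM SEED
      INTEGER N
      PARAMETER (N = 1024)
      REAL A(N,N), B(N,N)
CHPF$ DISTRIBUTE A(BLOCK,*)
      INTEGER I, J
      DO 5 J = 1, N
         DO 5 I = 1, N
            A(I,J) = REAL(I + J)
    5 CONTINUE
C     Each processor can sweep its own block of rows (the I loop) with
C     no interprocessor communication, because the stencil reaches only
C     along the unpartitioned column dimension (J).
      DO 10 I = 1, N
         DO 20 J = 2, N-1
            B(I,J) = 0.5 * (A(I,J-1) + A(I,J+1))
   20    CONTINUE
   10 CONTINUE
      PRINT *, 'B(1,2) =', B(1,2)
      END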
-----------------------------
Parallel Performance Analysis
-----------------------------
To refine a program's parallelization strategy for distributed memory
systems, we need to know how well or poorly the program performs in
parallel.  In particular, we need to know where the bottlenecks for
interprocessor communication are, and we need to find the causes of
losses due to poor load balancing of processors and excessive overhead.

APR's pre-compilers for distributed memory can instrument the
parallelized programs they generate.  When run on the target
multiprocessor system, they produce a timing report that profiles the
program's parallel performance and identifies data communication as well
as routine and loop timings.

With parallel performance timings in hand, you can fine-tune the
parallelization strategy by restructuring the code or inserting
directives to alter the data partitioning or loop distribution decisions.

--------------------
The Parallel Program
--------------------
APR's distributed memory MAGIC parallelizing pre-compiler generates
Fortran 77 SPMD (Single Program Multiple Data) code that is immediately
compilable on many systems.  Runtime data partitioning and communication,
loop distribution, and synchronization are performed by inserted calls to
APR's parallel support library, which in turn interfaces with the
standard message passing libraries, including PVM, Express*, Linda*,
IBM EUI, Intel NX, etc.  A schematic sketch of this SPMD loop form
appears below, under Example: SPMD Loop Distribution.

On shared memory systems, the generated code is parallelized using
directives specific to the target system and compiler, and the MAGIC
pre-compiler's parallelization includes cache management strategies
(array padding and alignment) that will result in data restructuring.

----------
Directives
----------
APR's distributed memory parallelization directives apply to both data
array partitioning and DO loop distribution, and are more flexible in
their use than the published HPF (High Performance Fortran) directives.

MAGIC will optionally generate a Fortran 77 program with the
parallelization expressed in APR directives rather than runtime library
calls.  This gives the user a way to refine the parallelization strategy
by changing the generated directives and inserting new ones, and then
feeding the code back into the pre-compiler as input.

--------------
MAGIC Products
--------------
APR offers three MAGIC Pre-Compilers:

   dpf    for distributed memory systems
   spf    for shared memory systems
   xhpf   for HPF directives and Fortran 90 array syntax on distributed
          memory systems

------------------
Other APR Products
------------------
   forge90   Interactive parallelizers for distributed & shared memory
             systems
   forgex    FORGE Explorer, the Motif GUI global Fortran program browser

---------------------
Platforms and Targets
---------------------
APR's products are available to run on various systems including IBM
RS/6000, DEC Alpha, HP, SUN, and Cray.  Parallelizations and runtime
support are available for: workstation clusters, IBM SP1 and POWER/4,
Intel Paragon, nCUBE, Meiko, Cray T3D, CM-5.
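-------------------------------
Example: SPMD Loop Distribution
-------------------------------
To make the SPMD loop distribution described under The Parallel Program
concrete, the sketch below shows one common way a global DO loop can be
reduced to a per-processor block of iterations.  It is a schematic
illustration only, not the code APR's pre-compilers actually emit: the
node number ME and node count NP are assumed to be supplied by the
underlying message passing layer, and the real generated code also
inserts the support library calls that handle data partitioning,
communication, and synchronization.

C     Schematic SPMD form of a distributed DO loop.  Every processor
C     runs the same subroutine but sweeps only its own block of the
C     index range 1..N.  ME (0..NP-1) and NP are assumed to come from
C     the message passing layer; this is an illustration, not APR's
C     generated code.
      SUBROUTINE SWEEP(A, B, N, ME, NP)
      INTEGER N, ME, NP
      REAL A(N), B(N)
      INTEGER BLK, LO, HI, I
C     Block size and this processor's local piece of the index range.
      BLK = (N + NP - 1) / NP
      LO  = ME*BLK + 1
      HI  = MIN(N, LO + BLK - 1)
      DO 10 I = LO, HI
         B(I) = 2.0*A(I)
   10 CONTINUE
      RETURN
      END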
-----------------
Other Information
-----------------
For further information on these tools and on our parallelization
techniques training workshops, contact us at:

-----------------------------------------------------------------------
Applied Parallel Research, Inc.
550 Main Street, Suite I
Placerville, CA 95667
Phone: 916/621-1600
Fax:   916/621-0593
email: forge@netcom.com
-----------------------------------------------------------------------

Copyright (c) 1993 Applied Parallel Research, Inc.                11/93