-----------------------------------------------------------------------
Applied Parallel Research       FORGE Magic/DM                DataSheet
-----------------------------------------------------------------------

Automatic Fortran Parallelization For Distributed Memory Systems
With FORGE Magic/DM from Applied Parallel Research, Inc.

Automatic parallelization of Fortran programs is finally here!

APR announces the development of its FORGE MAGIC batch pre-compiler,
dpf, for distributed memory, multi-processor systems. APR, a leader in
providing tools for Fortran optimization, vectorization, and
parallelization, now offers its FORGE* premier parallelization
technology enhanced with a unique automatic capability. We call it,
simply, MAGIC.

-------------------------
Bootstrap Parallelization
-------------------------

MAGIC/DM uses various schemes to arrive at an initial parallelization
strategy for your program. With FORGE's fully interprocedural analysis,
it can identify the most significant loops in the program and develop a
parallelization based on those loops and the arrays they reference. Or,
given real execution timing information, the parallelizer can focus
precisely on just those loops that must be parallelized to
significantly affect performance.

Preliminary tests show MAGIC/DM's automatic parallelizations achieving
80% of the performance obtained from hand parallelization of certain
applications.

----------------------
Parallelization Report
----------------------

Not all loops in a program are parallelizable as written. An essential
aspect of APR's MAGIC/DM pre-compiler is its detailed diagnostic report
showing the loop distribution and data array partitioning strategies
that were successfully applied, as well as syndromes in the program
that inhibited parallelization. From this report, a user can
restructure the program to remove inhibitors, or suggest a different
parallelization strategy.
In this way, MAGIC/DM makes a perfect learning tool for investigating
aspects of data parallelism and program optimization. MAGIC/DM's
parallelization is also viewable and modifiable from APR's interactive
FORGE DM Parallelizer.

-----------------------
Seeding With Directives
-----------------------

Directives can be used to propose an initial parallelization strategy
to the MAGIC/DM pre-compiler. Directives might indicate how a few key
data arrays should be partitioned, or which DO loops are the most
significant. These act as a seed for the parallelization, from which a
full strategy of loop distributions and data array partitioning is
developed.

Working from the user's seed directives, MAGIC/DM finds all arrays used
in combination with the partitioned arrays and decomposes them in the
same way. It then proceeds to distribute as many DO loops referencing
these arrays along the partitioned dimension as possible. Then it may
automatically partition other arrays and distribute the loops they
reference in a cascading process that works its way through the entire
program's call tree until a viable parallelization has been developed.

----------------------------------
Serial Runtime Performance Timings
----------------------------------

APR's FORGE pre-compilers can produce instrumented versions of the
original program that run on the target host system and generate
detailed sequential execution timings. APR's runtime library measures
performance down to the DO loop level. It reports relative percentages
to identify the most significant routines and loops, and it contrasts
loop and routine timings, including and excluding time spent in called
routines and enclosed loops. Serial timing reports can be imported back
into dpf to drive the automatic parallelization.
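
The difference between timings that include and exclude enclosed loops
and called routines can be sketched in a few lines of Python. This is
only an illustration of the idea, not APR's report format or runtime
library: the Region class, the function names, and the sample timings
are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """A routine or DO loop with its inclusive time in seconds.

    Hypothetical structure for illustration; not APR's data model.
    """
    name: str
    inclusive: float
    children: List["Region"] = field(default_factory=list)


def exclusive(region):
    """Time spent in the region itself, excluding enclosed regions."""
    return region.inclusive - sum(c.inclusive for c in region.children)


def report(root):
    """Return (name, inclusive %, exclusive %) rows relative to the
    root's total time, largest exclusive share first."""
    rows = []
    stack = [root]
    while stack:
        r = stack.pop()
        rows.append((r.name,
                     100 * r.inclusive / root.inclusive,
                     100 * exclusive(r) / root.inclusive))
        stack.extend(r.children)
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up timings: main calls solve (which encloses one DO loop)
    # and init.
    program = Region("main", 10.0, [
        Region("solve", 8.0, [Region("solve/DO 10", 6.0)]),
        Region("init", 1.0),
    ])
    for name, incl, excl in report(program):
        print(f"{name:12s} incl {incl:5.1f}%  excl {excl:5.1f}%")
```

Sorting by the exclusive share puts the loop that actually consumes the
time ahead of the routines that merely contain it, which is the kind of
ranking an automatic parallelizer would want as input.
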
-----------------------------
Parallel Performance Analysis
-----------------------------

To refine a program's parallelization strategy for distributed memory
systems, we need to know how well or poorly the program performs in
parallel. In particular, we need to know where the interprocessor
communication bottlenecks are, and what causes losses from poor load
balancing of processors and excessive overhead.

dpf can also instrument the parallelized programs it generates. When
run on the target multiprocessor, the instrumented program produces a
timing report that profiles the program's parallel performance and
identifies data communication timings as well as routine and loop
timings.

With parallel performance timings in hand, you can fine tune the
parallelization strategy by restructuring the code or inserting
directives to alter the data partitioning or loop distribution
decisions.

--------------------
The Parallel Program
--------------------

MAGIC/DM generates Fortran 77 SPMD (Single Program Multiple Data) code
that is immediately compilable on many systems. Loops are distributed
using the owner sets heuristic rule to minimize data motion. Runtime
data partitioning and communication, loop distribution, and
synchronization are performed by inserted calls to APR's parallel
support library, which in turn interfaces to standard message passing
libraries, including PVM, Express*, Linda*, IBM EUI, Intel NX, etc.

dpf will also output the original Fortran program with the
parallelization expressed in APR directives rather than runtime library
calls. This gives the user a way to refine the parallelization strategy
by changing the generated directives and inserting new ones, and then
feeding the code back into the pre-compiler as input.

----------
Directives
----------

APR's distributed memory parallelization directives apply to both data
array partitioning and DO loop distribution, and are more flexible than
the published HPF (High Performance Fortran) directives.
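
As a rough illustration of data array partitioning and owner-based loop
distribution, the Python sketch below assigns array elements to
processors in a contiguous block layout and in a round-robin (cyclic)
layout, then gives each loop iteration to the processor that owns the
element it updates. This is the textbook owner-computes rule; treating
it as equivalent to APR's owner sets rule is an assumption. The
function names are hypothetical and are not part of dpf or APR's
runtime library, and indices are 0-based here, unlike Fortran's
default.

```python
def block_owner(i, n, p):
    """Owner of index i when n elements are split into p contiguous
    blocks of ceil(n / p) elements (the last block may be short)."""
    size = -(-n // p)  # ceiling division
    return i // size


def cyclic_owner(i, p):
    """Owner of index i when elements are dealt out round-robin."""
    return i % p


def local_iterations(n, p, me, owner):
    """Iterations of a loop over 0..n-1 that processor `me` executes
    when each iteration runs where its array element lives."""
    return [i for i in range(n) if owner(i) == me]


if __name__ == "__main__":
    n, p = 10, 4
    for me in range(p):
        blk = local_iterations(n, p, me, lambda i: block_owner(i, n, p))
        cyc = local_iterations(n, p, me, lambda i: cyclic_owner(i, p))
        print(f"processor {me}: block {blk}  cyclic {cyc}")
```

A block layout keeps neighboring elements on the same processor, which
tends to suit stencil-like loops, while a cyclic layout spreads work
more evenly when the cost per iteration varies across the index range.
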
The PARTITION directive permits specification of either BLOCK or CYCLIC
distributions with either FULL or SHRUNK array allocations on nodes. A
DO PAR directive allows explicit parallelization of loops, including
specification of the controlling owner sets array and dimension.
Further fine tuning is possible using the IGNORE directive to override
assumed inhibitors and to eliminate unnecessary data communication and
synchronization.

-------------------------------
Automatic Parallelization Modes
-------------------------------

MAGIC/DM offers flexibility in its automatic mode. The dpf command line
-Auto option controls parallelization behavior. At the highest level,
parallelization is fully automatic, and dpf works from the most
significant loops down to some specifiable threshold. In other modes,
dpf favors user directives for array partitioning and/or loop
distribution to seed the parallelization. At the lowest level,
automatic parallelization is disabled and dpf only interprets the
user's explicit directives.

--------------
MAGIC Products
--------------

APR offers three MAGIC Pre-Compilers:

    dpf     for distributed memory systems
    spf     for shared memory systems
    xhpf    for HPF directives and Fortran 90 array syntax on
            distributed memory systems

------------------
Other APR Products
------------------

    forge90   Interactive parallelizers for distributed & shared
              memory systems
    forgex    FORGE Explorer Motif GUI global Fortran program browser

---------------------
Platforms and Targets
---------------------

APR's products are available to run on various systems including HP,
SUN, IBM RS/6000, DEC Alpha, and Cray.

Parallelizations and runtime support are available for: workstation
clusters, IBM SP1 and POWER/4, Intel Paragon, nCUBE, Meiko, Cray T3D,
TMC CM-5.

-----------------
Other Information
-----------------

For further information on these tools and our parallelization
techniques training workshops, contact us at:

    Applied Parallel Research, Inc.
    550 Main Street, Suite I
    Placerville, CA 95667
    Phone: 916/621-1600
    Fax:   916/621-0593
    email: forge@netcom.com

Copyright (c) 1993 Applied Parallel Research, Inc.                11/93