-----------------------------------------------------------------------
Applied Parallel Research       FORGE Magic/DM                DataSheet
-----------------------------------------------------------------------

Automatic Fortran Parallelization For Distributed Memory Systems With
FORGE Magic/DM from Applied Parallel Research, Inc.

Automatic parallelization of Fortran programs is finally here!  APR
announces the development of its FORGE MAGIC batch pre-compiler, dpf, for
distributed memory, multi-processor systems.

APR, a leader in providing tools for Fortran optimization, vectorization,
and parallelization, now offers its premier FORGE* parallelization
technology enhanced with a unique automatic capability. We call it,
simply, MAGIC.

-------------------------
Bootstrap Parallelization
-------------------------
MAGIC/DM uses several schemes to arrive at an initial parallelization
strategy for your program. Using FORGE's fully interprocedural analysis,
it can identify the most significant loops in the program and develop a
parallelization based on those loops and the arrays they reference. Or,
given real execution timing information, the parallelizer can focus on
just those loops that must be parallelized to significantly affect
performance. Preliminary tests show MAGIC/DM's automatic parallelizations
achieving 80% of the performance obtained from hand parallelization of
certain applications.
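The payoff from concentrating on the most significant loops can be
estimated with Amdahl's law, a standard bound that is not specific to
MAGIC/DM. The Python sketch below makes three assumptions: the
parallelized loops account for a given fraction of the serial run time,
they speed up perfectly on p nodes, and communication is ignored. The
function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup (Amdahl's law) when `parallel_fraction` of the
    serial run time is spread perfectly over `processors` nodes and the rest
    stays serial. Communication and load imbalance are ignored."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / processors)


if __name__ == "__main__":
    # Parallelizing loops that cover only half the run time caps the gain
    # below 2x, no matter how many nodes are used.
    for fraction in (0.50, 0.90, 0.99):
        bound = amdahl_speedup(fraction, 32)
        print(f"{fraction:.0%} of run time parallelized on 32 nodes: "
              f"at most {bound:.1f}x")
```

This is why timing information matters. A loop that looks large in the
source may contribute little to the parallel fraction.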

----------------------
Parallelization Report
----------------------
Not all loops in a program are parallelizable as written. An essential
aspect of APR's MAGIC/DM pre-compiler is its detailed diagnostic report.
The report shows the loop distribution and data array partitioning
strategies that were successfully applied, as well as the constructs in
the program that inhibited parallelization. From this report, a user can
restructure the program to remove inhibitors, or use directives to
suggest a different parallelization strategy. This also makes MAGIC/DM an
effective learning tool for investigating aspects of data parallelism and
program optimization. MAGIC/DM's parallelization can also be viewed and
modified in APR's interactive FORGE DM Parallelizer.
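A common inhibitor that such a report would flag is a loop-carried
dependence, as in the recurrence A(I) = A(I-1) + B(I). The Python sketch
below is a simple textbook dependence test, not APR's analysis. It
assumes every subscript has the form I + constant. The names Ref and
loop_carried_dependences are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """An array reference ARRAY(I + offset) inside a loop DO I = 1, N."""
    array: str
    offset: int
    is_write: bool


def loop_carried_dependences(refs, trip_count):
    """Return (write, other, distance) for every pair of references to the
    same array that can touch the same element in *different* iterations.

    Iteration i1 of `write` and iteration i2 of `other` hit the same element
    when i1 + write.offset == i2 + other.offset, i.e. when
    i2 - i1 == write.offset - other.offset (the dependence distance).
    A nonzero distance smaller than the trip count crosses iterations.
    """
    found = []
    for write in refs:
        if not write.is_write:
            continue
        for other in refs:
            if other is write or other.array != write.array:
                continue
            distance = write.offset - other.offset
            if distance != 0 and abs(distance) < trip_count:
                found.append((write, other, distance))
    return found


if __name__ == "__main__":
    # DO I = 2, N:  A(I) = A(I-1) + B(I)   -> flagged, distance 1
    recurrence = [Ref("A", 0, True), Ref("A", -1, False), Ref("B", 0, False)]
    # DO I = 1, N:  A(I) = B(I) * C(I)     -> nothing flagged
    independent = [Ref("A", 0, True), Ref("B", 0, False), Ref("C", 0, False)]
    print(loop_carried_dependences(recurrence, 99))
    print(loop_carried_dependences(independent, 100))
```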

-----------------------
Seeding With Directives
-----------------------
Directives can be used to propose an initial parallelization strategy to
the MAGIC/DM pre-compiler. Directives might indicate how a few key data
arrays should be partitioned, or which DO loops are the most significant.
These act as a seed from which a full strategy of loop distributions and
data array partitionings is developed. Working from the user's seed
directives, MAGIC/DM finds all arrays used in combination with the
partitioned arrays and decomposes them in the same way. It then
distributes as many of the DO loops that reference these arrays along the
partitioned dimension as possible. It may then automatically partition
other arrays and distribute the loops that reference them. This cascading
process works its way through the program's entire call tree until a
viable parallelization has been developed.
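The cascading process can be pictured as a worklist propagation over the
loops and the arrays they reference. The Python sketch below is a
deliberately simplified model. It reduces each loop to the set of arrays
it references and assumes every array is partitioned along a single
dimension. It also omits the legality checks a real pre-compiler performs,
such as dependences and conflicting partitionings across calls. All names
and loop labels are hypothetical.

```python
from collections import deque


def cascade_partitioning(loops, seed_arrays):
    """Propagate a partitioning outward from seed arrays.

    loops: mapping of loop label -> set of array names the loop references.
    seed_arrays: arrays the user partitioned with directives.
    A loop touching a partitioned array is marked distributed, and every
    other array it references is partitioned the same way, which may in
    turn pull in further loops. Returns (partitioned, distributed) once no
    new arrays are added.
    """
    partitioned = set(seed_arrays)
    distributed = set()
    pending = deque(seed_arrays)
    while pending:
        array = pending.popleft()
        for label, arrays in loops.items():
            if label in distributed or array not in arrays:
                continue
            distributed.add(label)
            for other in sorted(arrays):
                if other not in partitioned:
                    partitioned.add(other)
                    pending.append(other)
    return partitioned, distributed


if __name__ == "__main__":
    program_loops = {
        "SUB1 DO 10": {"A", "B"},
        "SUB1 DO 20": {"B", "C"},
        "SUB2 DO 30": {"D"},
    }
    arrays, loops = cascade_partitioning(program_loops, ["A"])
    print(sorted(arrays), sorted(loops))
```

Seeding only A reaches B through the first loop and C through the second.
D is never touched because no loop links it to a partitioned array.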

----------------------------------
Serial Runtime Performance Timings 
----------------------------------
APR's FORGE pre-compilers can produce instrumented versions of the
original program that run on the target host system and generate detailed
sequential execution timings. APR's runtime library measures performance
down to the DO loop level. It reports relative percentages to identify the
most significant routines and loops. It also gives each loop and routine
timing both including and excluding the time spent in called routines and
enclosed loops. Serial timing reports can be imported back into dpf to
drive the automatic parallelization.
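A small call-tree model shows the difference between timings that include
and exclude called routines and enclosed loops. This Python sketch assumes
a hypothetical profile format rather than APR's actual report layout. Each
region records only its own (exclusive) time. Inclusive time is computed
recursively, and both are expressed as percentages of total run time.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A routine or DO loop in a hypothetical serial timing profile."""
    name: str
    self_seconds: float  # time excluding called routines and enclosed loops
    children: list = field(default_factory=list)


def inclusive_seconds(region):
    """Time including everything called or enclosed by `region`."""
    return region.self_seconds + sum(inclusive_seconds(c) for c in region.children)


def timing_rows(root):
    """Rows of (depth, name, inclusive s, exclusive s, incl %, excl %),
    with percentages relative to the whole program's run time."""
    total = inclusive_seconds(root)
    rows = []

    def walk(region, depth):
        incl = inclusive_seconds(region)
        rows.append((depth, region.name, incl, region.self_seconds,
                     100.0 * incl / total, 100.0 * region.self_seconds / total))
        for child in region.children:
            walk(child, depth + 1)

    walk(root, 0)
    return rows


if __name__ == "__main__":
    program = Region("MAIN", 1.0, [
        Region("SOLVE", 2.0, [Region("SOLVE DO 100", 6.0)]),
        Region("OUTPUT", 1.0),
    ])
    for depth, name, incl, excl, pincl, pexcl in timing_rows(program):
        print(f"{'  ' * depth}{name:<14} {incl:6.1f}s {excl:6.1f}s "
              f"{pincl:5.1f}% {pexcl:5.1f}%")
```

Here SOLVE accounts for 80% of run time inclusively but only 20% on its
own. The inclusive view points at the routine, and the exclusive view
points at the loop inside it.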

-----------------------------
Parallel Performance Analysis
-----------------------------
To refine a program's parallelization strategy for distributed memory
systems, we need to know how well or poorly the program performs in
parallel. In particular, we need to know where the interprocessor
communication bottlenecks are, and what causes losses due to poor load
balancing and excessive overhead. dpf can also instrument the parallelized
programs it generates. When run on the target multiprocessor, these
programs produce a timing report that profiles parallel performance,
covering data communication as well as routine and loop timings. With
parallel performance timings in hand, you can fine-tune the
parallelization strategy by restructuring the code or by inserting
directives that alter the data partitioning or loop distribution
decisions.
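The load-balance and communication questions above reduce to a few ratios
over per-node timings. The Python sketch below assumes a simple input of
per-node compute and communication seconds. It is a generic way to
summarize such a profile, not the format or metrics of dpf's actual
report. The function name is hypothetical.

```python
def parallel_summary(compute_seconds, comm_seconds):
    """Summarize per-node timings from an instrumented parallel run.

    compute_seconds[k] and comm_seconds[k] are node k's computation and
    communication times (a hypothetical input format). Returns:
      elapsed       - time of the slowest node, which bounds the whole run
      imbalance     - max / mean compute time (1.0 means perfectly balanced)
      comm_fraction - share of all node-seconds spent communicating
    """
    if not compute_seconds or len(compute_seconds) != len(comm_seconds):
        raise ValueError("need one compute and one comm time per node")
    mean_compute = sum(compute_seconds) / len(compute_seconds)
    if mean_compute == 0:
        raise ValueError("no compute time recorded")
    per_node = [c + m for c, m in zip(compute_seconds, comm_seconds)]
    return {
        "elapsed": max(per_node),
        "imbalance": max(compute_seconds) / mean_compute,
        "comm_fraction": sum(comm_seconds) / sum(per_node),
    }


if __name__ == "__main__":
    # The last node got twice the work of the others, so imbalance is 1.6.
    print(parallel_summary([4.0, 4.0, 4.0, 8.0], [1.0, 1.0, 1.0, 1.0]))
```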

--------------------
The Parallel Program
--------------------
MAGIC/DM generates Fortran 77 SPMD (Single Program Multiple Data) code
that is immediately compilable on many systems. Loops are distributed
using the owner sets heuristic rule to minimize data motion. Runtime data
partitioning and communication, loop distribution, and synchronization
are performed by inserted calls to APR's parallel support library. This
library in turn interfaces to standard message passing libraries,
including PVM, Express*, Linda*, IBM EUI, Intel NX, etc. dpf can also
output the original Fortran program with the parallelization expressed as
APR directives rather than runtime library calls. This lets the user
refine the parallelization strategy by changing the generated directives
or inserting new ones, and then feeding the code back into the
pre-compiler as input.
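One common reading of an owner-based loop distribution is the
owner-computes rule: each node executes the iterations that assign to
array elements it owns. The Python sketch below assumes a one-dimensional
BLOCK distribution and simulates the nodes with an ordinary loop. Real
generated code would run one copy per node and call the support library
for any off-node data. This is an illustration, not APR's owner sets
heuristic, and block_bounds and spmd_vector_add are hypothetical names.

```python
def block_bounds(n, nprocs, rank):
    """Inclusive 1-based range of global indices 1..n owned by node `rank`
    (0-based) under a BLOCK distribution with block size ceil(n / nprocs).
    The range is empty when lo > hi."""
    block = -(-n // nprocs)  # ceiling division
    lo = rank * block + 1
    hi = min(n, (rank + 1) * block)
    return lo, hi


def spmd_vector_add(b, c, nprocs):
    """Simulate the SPMD form of  DO I = 1, N: A(I) = B(I) + C(I).
    Each pass of the outer loop stands in for one node running the same
    program on its own block. No communication is needed because A, B and
    C are assumed to be partitioned identically."""
    n = len(b)
    a = [0.0] * n
    for rank in range(nprocs):
        lo, hi = block_bounds(n, nprocs, rank)
        for i in range(lo, hi + 1):  # Fortran-style 1-based index
            a[i - 1] = b[i - 1] + c[i - 1]
    return a


if __name__ == "__main__":
    for rank in range(4):
        print(rank, block_bounds(10, 4, rank))
```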

----------
Directives 
----------
APR's distributed memory parallelization directives apply to both data
array partitioning and DO loop distribution, and are more flexible than
the published HPF (High Performance Fortran) directives. The PARTITION
directive permits specification of either BLOCK or CYCLIC distributions,
with either FULL or SHRUNK array allocation on the nodes. A DO PAR
directive allows explicit parallelization of loops, including
specification of the controlling owner sets array and dimension. Further
fine-tuning is possible with the IGNORE directive, which overrides assumed
inhibitors and eliminates unnecessary data communication and
synchronization.
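The BLOCK and CYCLIC distributions differ in which node owns each
element. The allocation choice affects where that element lives in the
node's memory. The Python sketch below shows the usual one-dimensional
index arithmetic. It assumes that FULL means every node allocates the
whole array and keeps global indices, while SHRUNK means each node
allocates only the elements it owns. That reading of APR's terms is an
assumption, and the function names are hypothetical.

```python
def owner(i, n, nprocs, dist):
    """0-based node that owns 1-based global element i of an n-element array."""
    if dist == "BLOCK":
        block = -(-n // nprocs)  # ceiling division
        return (i - 1) // block
    if dist == "CYCLIC":
        return (i - 1) % nprocs
    raise ValueError(f"unknown distribution {dist!r}")


def local_index(i, n, nprocs, dist, alloc):
    """1-based position of global element i in its owner's storage.
    FULL (assumed): the node allocates all n elements, so i is unchanged.
    SHRUNK (assumed): the node stores only its own elements, packed."""
    if alloc == "FULL":
        return i
    if alloc != "SHRUNK":
        raise ValueError(f"unknown allocation {alloc!r}")
    if dist == "BLOCK":
        block = -(-n // nprocs)
        return (i - 1) % block + 1
    if dist == "CYCLIC":
        return (i - 1) // nprocs + 1
    raise ValueError(f"unknown distribution {dist!r}")


if __name__ == "__main__":
    n, nprocs = 10, 4
    for dist in ("BLOCK", "CYCLIC"):
        layout = [owner(i, n, nprocs, dist) for i in range(1, n + 1)]
        print(f"{dist:<6} owners of elements 1..{n}: {layout}")
```

BLOCK keeps neighboring elements on the same node, which suits
nearest-neighbor stencils. CYCLIC deals elements out round-robin, which
can balance loops whose work varies with the index.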

-------------------------------
Automatic Parallelization Modes 
-------------------------------
MAGIC/DM offers flexibility in its automatic mode. The dpf command line
option -Auto controls parallelization behavior. At the highest level,
parallelization is fully automatic. In that mode dpf parallelizes loops
in order of significance, from the most significant loop down to a
user-specifiable threshold. In other modes, dpf relies on user directives
for array partitioning and/or loop distribution to seed the
parallelization. At the lowest level, automatic parallelization is
disabled and dpf only interprets the user's explicit directives.
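Working from the most significant loop down to a threshold amounts to
ranking loops by their share of serial run time and keeping those above a
cutoff. The Python sketch below assumes the percentages come from a
serial timing report and refer to non-overlapping loops, since nested
loops would otherwise be counted twice. The function name and input
format are hypothetical. They do not describe how the threshold for the
-Auto option is actually specified.

```python
def loops_to_parallelize(loop_percent, threshold_percent):
    """Return loop labels in decreasing order of their share of serial run
    time, keeping only loops at or above threshold_percent."""
    ranked = sorted(loop_percent.items(), key=lambda item: item[1], reverse=True)
    return [label for label, pct in ranked if pct >= threshold_percent]


if __name__ == "__main__":
    profile = {"DO 10": 55.0, "DO 20": 30.0, "DO 30": 4.0, "DO 40": 1.0}
    chosen = loops_to_parallelize(profile, 5.0)
    covered = sum(profile[label] for label in chosen)
    print(chosen, f"cover {covered:.0f}% of serial time")
```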

--------------
MAGIC Products 
--------------
APR offers three MAGIC Pre-Compilers:

 dpf  for distributed memory systems

 spf  for shared memory systems

 xhpf for HPF directives and Fortran 90 array syntax on 
                             distributed memory systems


------------------
Other APR Products
------------------
 forge90   Interactive parallelizers for distributed & 
                                 shared memory systems

 forgex    FORGE Explorer Motif GUI global Fortran program browser


---------------------
Platforms and Targets 
---------------------
APR's products run on various systems, including HP, Sun, IBM RS/6000,
DEC Alpha, and Cray. Parallelizations and runtime support are available
for workstation clusters, IBM SP1 and POWER/4, Intel Paragon, nCUBE,
Meiko, Cray T3D, and TMC CM-5.


-----------------
Other Information 
-----------------
For further information on these tools and our parallelization techniques
training workshops, contact us at:

  Applied Parallel Research, Inc.
  550 Main Street, Suite I
  Placerville, CA 95667
  Phone: 916/621-1600   Fax: 916/621-0593
  email: forge@netcom.com

Copyright (c) 1993 Applied Parallel Research, Inc.                 11/93