-------------------------------------------------------------------------
     Applied Parallel Research FORGE Magic Pre-Compilers DataSheet
-------------------------------------------------------------------------

        Parallelize Fortran Automatically with FORGE MAGIC
        Pre-Compilers from Applied Parallel Research, Inc.

Automatic parallelization of Fortran programs is finally possible for a
range of real applications.  APR announces the development of its MAGIC
series of batch pre-compilers for both distributed and shared memory
parallel multi-processor systems.

APR, a leader in providing tools for Fortran optimization, vectorization,
and parallelization, now offers its FORGE* premier parallelization
technology enhanced with a unique automatic capability.  We call it,
simply, MAGIC.

-------------------------
Bootstrap Parallelization
-------------------------
MAGIC uses various schemes to arrive at an initial parallelization
strategy for your program.  With FORGE's fully interprocedural analysis,
it can identify the most significant loops in the program and develop a
parallelization based on those loops and the arrays they reference.  Or,
given program execution timing information, the parallelizer can focus
precisely on the loops that must be parallelized to significantly affect
performance.

----------------------
Parallelization Report
----------------------
Not all loops in a program are parallelizable as written.  An essential
aspect of APR's MAGIC pre-compilers is the detailed diagnostic assistance
they provide.  A parallelization report indicates the loop distribution
and data array partitioning strategies that were successfully applied, as
well as syndromes in the program that inhibited parallelization.  From
this report, a user can restructure the program to remove inhibitors, or
suggest a different parallelization strategy.

-----------------------
Seeding With Directives
-----------------------
Directives can be used to propose an initial parallelization strategy to
the MAGIC pre-compiler.  Directives might indicate how a few key data
arrays should be partitioned, or which DO loops are the most significant.
These act as a seed for MAGIC's parallelization, from which a full
strategy of loop distributions and data array partitioning is developed.

Working from the user's seed directives, MAGIC finds all arrays used in
combination with the partitioned arrays and decomposes them in the same
way.  It then proceeds to distribute as many DO loops referencing these
arrays along the partitioned dimension as possible.  It may then
automatically partition other arrays and distribute the loops they
reference, in a cascading process that works its way through the entire
program's call tree until a viable parallelization has been developed.
A minimal sketch of such a seed appears below, under Example: Seeding
With Directives.

----------------------------------
Serial Runtime Performance Timings
----------------------------------
APR's FORGE pre-compilers can produce instrumented versions of programs
that run on the target host system and generate detailed execution
timings.  APR's runtime library measures performance down to the DO loop
level.  It reports relative percentages to identify the most significant
routines and loops, and it contrasts loop and routine timings, including
and excluding time spent in called routines and enclosed loops.  Serial
timing reports can be imported back into the parallelizers to drive the
MAGIC process.
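--------------------------------
Example: Seeding With Directives
--------------------------------
The fragment below is a minimal sketch of the seeding idea described
under Seeding With Directives.  A single key array is marked with a data
partitioning directive.  The directive form shown is the published HPF
DISTRIBUTE syntax, used here purely as an illustration; APR's own
directive syntax is documented with the products and is not reproduced
in this datasheet.  Given such a seed, the scheme described above would
decompose arrays used together with A (here, B) in the same way and
distribute the I loop along the partitioned dimension.

C     Seed: block-partition the rows of the key array A.  The HPF-style
C     directive is shown for illustration only; APR syntax may differ.
C     B carries no directive; a tool following the seeding scheme
C     would decompose it to match A, since the two arrays are used
C     together in the loop nest below.
      PROGRAM SEED
      INTEGER N
      PARAMETER (N = 1024)
      REAL A(N,N), B(N,N)
CHPF$ DISTRIBUTE A(BLOCK,*)
      INTEGER I, J
      DO 5 J = 1, N
         DO 5 I = 1, N
            A(I,J) = REAL(I + J)
    5 CONTINUE
C     Each processor can sweep its own block of rows (the I loop) with
C     no interprocessor communication, because the stencil reaches only
C     along the unpartitioned column dimension (J).
      DO 10 I = 1, N
         DO 20 J = 2, N-1
            B(I,J) = 0.5 * (A(I,J-1) + A(I,J+1))
   20    CONTINUE
   10 CONTINUE
      PRINT *, 'B(1,2) =', B(1,2)
      END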
-----------------------------
Parallel Performance Analysis
-----------------------------
To refine a program's parallelization strategy for distributed memory
systems, we need to know how well or poorly the program performs in
parallel.  In particular, we need to know where the bottlenecks for
interprocessor communication are, and we need to find the causes of
losses due to poor load balancing of processors and excessive overhead.

APR's pre-compilers for distributed memory can instrument the
parallelized programs they generate.  When run on the target
multiprocessor system, they produce a timing report that profiles the
program's parallel performance and identifies data communication as well
as routine and loop timings.

With parallel performance timings in hand, you can fine-tune the
parallelization strategy by restructuring the code or inserting
directives to alter the data partitioning or loop distribution decisions.

--------------------
The Parallel Program
--------------------
APR's distributed memory MAGIC parallelizing pre-compiler generates
Fortran 77 SPMD (Single Program Multiple Data) code that is immediately
compilable on many systems.  Runtime data partitioning and communication,
loop distribution, and synchronization are performed by inserted calls to
APR's parallel support library, which in turn interfaces with the
standard message passing libraries, including PVM, Express*, Linda*,
IBM EUI, Intel NX, etc.  A schematic sketch of this SPMD loop form
appears below, under Example: SPMD Loop Distribution.

On shared memory systems, the generated code is parallelized using
directives specific to the target system and compiler, and the MAGIC
pre-compiler's parallelization includes cache management strategies
(array padding and alignment) that will result in data restructuring.

----------
Directives
----------
APR's distributed memory parallelization directives apply to both data
array partitioning and DO loop distribution, and are more flexible in
their use than the published HPF (High Performance Fortran) directives.

MAGIC will optionally generate a Fortran 77 program with the
parallelization expressed in APR directives rather than runtime library
calls.  This gives the user a way to refine the parallelization strategy
by changing the generated directives and inserting new ones, and then
feeding the code back into the pre-compiler as input.

--------------
MAGIC Products
--------------
APR offers three MAGIC Pre-Compilers:

   dpf    for distributed memory systems
   spf    for shared memory systems
   xhpf   for HPF directives and Fortran 90 array syntax on distributed
          memory systems

------------------
Other APR Products
------------------
   forge90   Interactive parallelizers for distributed & shared memory
             systems
   forgex    FORGE Explorer, the Motif GUI global Fortran program browser

---------------------
Platforms and Targets
---------------------
APR's products are available to run on various systems including IBM
RS/6000, DEC Alpha, HP, SUN, and Cray.  Parallelizations and runtime
support are available for: workstation clusters, IBM SP1 and POWER/4,
Intel Paragon, nCUBE, Meiko, Cray T3D, CM-5.
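-------------------------------
Example: SPMD Loop Distribution
-------------------------------
To make the SPMD loop distribution described under The Parallel Program
concrete, the sketch below shows one common way a global DO loop can be
reduced to a per-processor block of iterations.  It is a schematic
illustration only, not the code APR's pre-compilers actually emit: the
node number ME and node count NP are assumed to be supplied by the
underlying message passing layer, and the real generated code also
inserts the support library calls that handle data partitioning,
communication, and synchronization.

C     Schematic SPMD form of a distributed DO loop.  Every processor
C     runs the same subroutine but sweeps only its own block of the
C     index range 1..N.  ME (0..NP-1) and NP are assumed to come from
C     the message passing layer; this is an illustration, not APR's
C     generated code.
      SUBROUTINE SWEEP(A, B, N, ME, NP)
      INTEGER N, ME, NP
      REAL A(N), B(N)
      INTEGER BLK, LO, HI, I
C     Block size and this processor's local piece of the index range.
      BLK = (N + NP - 1) / NP
      LO  = ME*BLK + 1
      HI  = MIN(N, LO + BLK - 1)
      DO 10 I = LO, HI
         B(I) = 2.0*A(I)
   10 CONTINUE
      RETURN
      END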
-----------------
Other Information
-----------------
For further information on these tools and on our parallelization
techniques training workshops, contact us at:

-----------------------------------------------------------------------
Applied Parallel Research, Inc.
550 Main Street, Suite I
Placerville, CA 95667
Phone: 916/621-1600
Fax:   916/621-0593
email: forge@netcom.com
-----------------------------------------------------------------------

Copyright (c) 1993 Applied Parallel Research, Inc.                11/93