NHSE ReviewTM: Comments · Archive · Search
HPF compilers are, in general, fairly "young" compared with compilers for other standardized languages such as Fortran or C. As a result, there is not yet a widely accepted body of common practice that serves as a standard yardstick for comparing HPF compilers. This chapter first makes the obvious comparison: which defined HPF language features each reviewed compiler supports. It then tabulates each compiler's handling of FORTRAN 77 and Fortran 90 features, particularly with respect to parallelism and parallelization. Each compiler also has associated performance analysis tools and some extensions beyond Standard Fortran or HPF; some of these are tabulated. Finally, some subjective perceptions of the relative strengths of each compiler are given, derived from application codes that have been implemented in or converted to HPF at the CTC, and from inspection of and interaction with CTC users' codes.
Later chapters show more details of the generated code from each compiler for a few small kernels of HPF code. These are discussed to give a better feel for the compilation behavior of each compiler, as perceptible to a user of the compiler.
A complication in forming evaluation criteria for these compilers is the current distinction between HPF and Subset HPF. Feature comparisons based on the full and Subset languages are further confused because a number of detailed additions and subtractions were made to the contents of the Subset during the interpretations and corrections that led from "HPF 1.0" of May 1993 to the current "HPF 1.1" of November 1994. (Note that the HPF Forum has reconsidered the status of the "Subset HPF" definition: the soon-to-be-adopted proposals for "HPF 2.0" do not use that form of two-level designation.)
It would be desirable in a comparative review such as this to present carefully measured data (for instance, compilation time, size of the generated executable files, and run-time execution time) for a well-studied collection of benchmark kernels and whole applications. This has not yet been done for this paper.
None of these three compilers is completely compliant, or effective, for the full facilities of Fortran 90 combined with the full HPF language. Two compilers (xHPF and XL HPF) are designated as "Extended Subset HPF" compilers by their vendors, and each has significant extensions into facilities of the full language, and other departures from the Subset language. It is thus better to describe all these compilers by listing all the HPF features they will or will not process rather than to attempt to categorize them as "full" or "Subset".
The following table indicates some major aspects of HPF and briefly notes presence or absence of the feature in each of the compilers, or indicates restricted, related, or augmented facilities corresponding to the HPF feature.
Feature | xHPF [2] | pghpf [3], [4], [5] | XL HPF [6], [7] |
---|---|---|---|
TEMPLATE, DISTRIBUTE, ALIGN | yes | yes | yes |
BLOCK or CYCLIC in N dimensions | yes (with default assumptions and keyword extensions) | yes | yes (default preference to N=2 if no PROCESSORS used with DISTRIBUTION) |
BLOCK(k), CYCLIC(k) | no | yes | partial: CYCLIC(k) is generally treated as CYCLIC(1) |
PROCESSORS | no (but checked for syntax) | yes | yes |
DYNAMIC, REDISTRIBUTE, REALIGN | no (but proprietary use of DYNAMIC) | yes | no |
SEQUENCE, NOSEQUENCE | yes | yes | yes |
FORALL | statement only | statement and construct | statement and construct |
INDEPENDENT | yes (omits NEW clause, but is extended in semantics; see also many CAPR$ directives to control parallelization) | yes (extended with ON HOME and REDUCTION clauses) | yes |
Default Data Mapping for Arrays not named in Directives | Replicated (but note derivation of mappings via " ... -Auto ... " parallelization and generated mappings) | Replicated | Replicated |
HPF Procedure Arguments and Data Mapping facilities | Prescriptive mapping available | Prescriptive and Transcriptive (inherited) mappings available | Prescriptive and Descriptive mappings available |
HPF Procedure Dummy Argument Data Mapping for Arrays not named in Directives | Procedure dummy argument automatically inherits mapping of actual argument | Local replication re-mapping of dummy arguments not explicitly mapped | Local replication re-mapping of dummy arguments not explicitly mapped |
HPF intrinsics and HPF_LIBRARY routines | no | yes | only NUMBER_OF_PROCESSORS, PROCESSORS_SHAPE, HPF_ALIGNMENT, HPF_DISTRIBUTION, HPF_TEMPLATE (some with omissions from defined interfaces) |
EXTRINSIC | (a) omitting routine from translation generates EXTRINSIC-like call; (b) proprietary directive available in caller | EXTRINSIC(HPF_LOCAL), EXTRINSIC(F77_LOCAL) | EXTRINSIC(HPF), EXTRINSIC(HPF_LOCAL), EXTRINSIC(HPF_SERIAL) |
HPF_LOCAL_LIBRARY routines | no (but proprietary support available for equivalents) | yes (except for LOCAL_TO_GLOBAL) | yes (but with omissions from defined interfaces) |
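To make the directive names in the feature table concrete, the following is a minimal sketch and is not taken from any of the codes reviewed here. The program name, array names, and problem size are illustrative assumptions. It restricts itself to features that the table lists as accepted by all three compilers: TEMPLATE, DISTRIBUTE with BLOCK in two dimensions, ALIGN, and the statement form of FORALL. PROCESSORS, BLOCK(k), and CYCLIC(k) are deliberately left out because the table shows uneven support for them.

```fortran
program mapping_sketch
  implicit none
  integer, parameter :: n = 64          ! illustrative problem size (assumption)
  real :: a(n, n), b(n, n)
  integer :: i, j

  ! HPF directives are structured comments, so a compiler without HPF
  ! support still accepts this file as ordinary Fortran.
  ! No PROCESSORS directive is given: the processor arrangement is left
  ! to each compiler's default.
!HPF$ TEMPLATE t(n, n)
!HPF$ DISTRIBUTE t(BLOCK, BLOCK)
!HPF$ ALIGN a(i, j) WITH t(i, j)
!HPF$ ALIGN b(i, j) WITH t(i, j)

  b = 1.0
  a = 0.0

  ! FORALL in its statement form (the construct form is avoided here
  ! because the table lists it as unsupported by one compiler).
  forall (i = 2:n-1, j = 2:n-1) &
       a(i, j) = 0.25 * (b(i-1, j) + b(i+1, j) + b(i, j-1) + b(i, j+1))

  print *, 'interior sum =', sum(a(2:n-1, 2:n-1))
end program mapping_sketch
```

Because both arrays are aligned with the same template, the four-point stencil needs communication only for elements on block boundaries. How well that communication is generated is one of the points on which the compilers differ.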
Beyond the details of which HPF features are handled, each compiler has its own manner of handling FORTRAN 77 and Fortran 90 features that may or may not contribute to the parallel execution of a program. The following table briefly notes some of these capabilities.
Feature | xHPF | pghpf | XL HPF |
---|---|---|---|
Generation of Parallel Code for Run-time Determined Number of Processors | yes | yes | yes |
Automatic Parallelization of DO Loops | yes | yes | yes |
Analysis and Parallelization over the entire Program Tree (Inter-procedural Analysis) | yes | no | no |
Automatic Derivation of Data Mappings (for User-declared Arrays) | yes | no | no |
Fortran 90 Feature Coverage | Closely aligned to only that Fortran 90 required for Subset HPF: significant omissions from F90 are free-form syntax, MODULEs, derived types, pointers, data type parameters (KIND) | Full Fortran 90, but there are limitations on recursion, character arrays, visibility control and use of some objects defined in modules, and the parallelization of programs using arrays of derived type or using pointers | Full Fortran 90, but codes that use pointers, ENTRY statements, sub-objects as internal files, and many intrinsics that have a DIM argument if the argument is non-constant cannot be compiled as HPF codes (however they may be compiled as HPF_LOCAL or HPF_SERIAL codes) |
Fortran I/O Handling | Performed by "Processor 0" | Performed by "Processor 0" in both HPF and HPF_LOCAL code; uses PGI-supplied data conversion support | Performed by "Processor 0" in HPF, by all processors in HPF_LOCAL |
Our experience at CTC with HPF indicates that performance tuning (including discovering for which parts of a code the compiler has actually generated effective parallelism) is more laborious than correctness debugging, regardless of the HPF product. Thus, even the nature of a compiler report indicating parallel execution (or not) is an important "tool". The following table indicates some of the facilities, tools, and extensions that bear on those issues (and a few more minor ones).
Tool or Extension | xHPF | pghpf | XL HPF |
---|---|---|---|
Parallel Compilation Report | Three levels available in listing file: summary report, annotated call and loop tree, annotated HPF source listing; also summary available to standard output | Summary report of parallel execution compilation (indicated as "FORALL") available to standard output; also commented-insertions indicating "FORALL" compilation available in preserved intermediate Fortran file | Two levels available in listing file: messages related to original source line numbers indicating absence of parallel execution, pseudo-Fortran source with the same messages inserted |
Instrumented Run Time for Performance Analysis; Display Tool | "-otpf" inserts time-recording calls at each procedure entry/exit and at head and tail of each loop: execution generates time information; postprocessor polytime generates annotated call-tree after run with times attributed to user computation, latency-plus-bandwidth-attributed communication time, communication wait time, and RTP overhead; an Xwindow GUI (FORGExplorer/ DMP) is available to sort report rows per any column and to display the relevant source code lines | "-Mprof" inserts time-recording for either routine entry/exits or for each source-line; execution generates time information; postprocessor and Xwindow GUI visualizer pgprof displays information on time taken by each procedure executed or each line executed, numbers or sizes of messages sent or received at each line, and allows various sortings of the reports; "-Mstats" in compilation command and "-stat" in the execution command line reports CPU time, memory statistics, message-passing statistics at the end of the run. | "-pg" links prof/gprof-instrumented runtime (asynchronous sampled timing); postprocessor and Xwindow GUI xprofiler permits examination of CPU utilization by each procedure; also the parallel environment and message-passing support has trace-file collection that can be visualized (using IBM PE VT tool) with a coordinated source-file browser in post-execution animation |
Fortran Language Extensions | Limited Connection Machine Fortran (cmf) handling: LAYOUT as DISTRIBUTE, non-F90 argument order in CSHIFT/ EOSHIFT/ RESHAPE, ARRAY attribute as DIMENSION, DATA attribute keyword, spelling of some intrinsic names, RANK intrinsic | STRUCTURE/END STRUCTURE data type declaration, RECORD data instance declaration (uses STRUCTURE type), UNION and MAP keywords for STRUCTURE declarations; "Cray Pointer" syntax; nnn'O and nnn'X constants; limited Connection Machine Fortran (cmf) handling: non-F90 argument order in CSHIFT/ EOSHIFT/ RESHAPE, ARRAY attribute as DIMENSION, spelling of some intrinsic names, square brackets syntax in array constructor | XL HPF has extensive extensions beyond Fortran 90 that range from the lexical (e.g., $ allowed in identifiers) to an "Integer POINTER type" plus an associated "LOC(...)" intrinsic generating addresses plus an "integer pointer assignment" statement: the details are not highlighted in the XL HPF Language Reference but can be ascertained from the current IBM XL Fortran Language Reference manual [8] |
Vendor-Specific Directives Extensions | Additional keywords in DISTRIBUTE: FULLBLOCK, SHRUNKBLOCK, FULLCYCLIC, SHRUNKCYCLIC; CHPF$ EXTRINSIC SAFE ...; "CAPR$"-form directives for more detailed control of automatic parallelization, communication scheduling, and decomposition: DO PAR [ON ...], DO NOPAR, IGNORE ALL INHIBITORS, IGNORE ... COM [ON ...], PARTITION ..., PARTITION_NOMOVE ... | Additional keyword clauses for INDEPENDENT (anticipates HPF 2): ON HOME ..., REDUCTION ... | @PROCESS allows "command-line" compiler options to have effect within source files; EJECT; ! ... SOURCEFORM ... |
Integration with Other Tools | Interactive parallelization with FORGExplorer/ DMP and xHPF can share identical program-analysis databases (see Performance Analysis report handling, above); all IBM SP PE tools work with generated FORTRAN 77 source (parallel debugger, trace analysis/visualization) | Run time launch of per-node debugger (e.g., dbx); all IBM SP PE tools work with generated FORTRAN 77 source (parallel debugger, trace analysis/visualization) | IBM SP PE tools work with HPF source: parallel debuggers see source statements but not yet distributed data, trace visualization can animate the HPF source during playback |
The experience of the author and colleagues at the CTC with the three compilers has led to a set of perceptions as to their relative strengths and weaknesses. This is clearly a subjective measure, and is colored strongly by the collection of codes that have been processed to date with these compilers. The following table summarizes these perceptions. It has been adapted from a Case Study presentation prepared at CTC and used extensively in training -- in particular, see the commentary in the Part 8 Discussion Layer, section 4.
Feature | xHPF | pghpf | XL HPF |
---|---|---|---|
Style of Fortran Code Best Handled | FORTRAN 77 DO Loops with scalar array references and modest use of reductions and with conditionals only in support of reductions; deep program trees with parallelizable loop at any higher level | Fortran 90 data-parallel constructs; parallelism visible in single compilation units; extensive use of both HPF_LIBRARY intrinsics and Fortran 90 array elemental, manipulation, and reduction intrinsics | Fortran 90 data-parallel constructs; parallelism visible in single compilation units; extensive use of Fortran 90 array elemental, manipulation, and reduction intrinsics |
Ability to Generate Parallel Code from DO Loops | Excellent where dependence analysis (performed on the entire program tree) can determine absence of loop-carried dependences | Excellent where an anchor is visible for data-parallel execution as indicated by a distributed array as the left-hand-side of an assignment | Excellent where an anchor is visible for data-parallel execution as indicated by a distributed array as the left-hand-side of an assignment |
Ability to Recognize Privatizable Scalars and Reductions in Loops | Excellent, including cases with conditionals that implement reductions such as MAXVAL | Excellent, particularly with "auto-parallelization" command-line option and indicated distributed arrays in loops; also has extension of REDUCTION clause (anticipates HPF 2.0) for INDEPENDENT directive to handle "+", "*", ".or.", ".and.", ".neqv.", iand, ior, ieor, min, and max scalar reductions in loops | Excellent |
Ability to Process Connection Machine Fortran (CMF) | Some automatic conversion and handling of LAYOUT directives and non-standard keywords and intrinsic spellings and argument order; severe limitation due to missing HPF_LIBRARY routines | Only CSHIFT, EOSHIFT, and RESHAPE with non-standard argument order, the ARRAY keyword, and non-standard array constructor syntax are handled; LAYOUT directives are not handled; "CMF-inspired" HPF_LIBRARY routines are available | No automatic handling of CMF; note absence of "CMF-inspired" HPF_LIBRARY computational routines |
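The two loop patterns that the last table refers to, a privatizable scalar and a reduction written with a conditional, can be sketched as follows. This is an illustrative example and not one of the CTC application codes; the program name, variable names, and sizes are assumptions. The NEW clause follows the HPF 1.1 form of the INDEPENDENT directive. No REDUCTION clause is shown, since that clause is a vendor extension and its exact syntax is not given in this chapter.

```fortran
program loop_sketch
  implicit none
  integer, parameter :: n = 1000        ! illustrative size (assumption)
  real :: x(n), y(n)
  real :: tmp, biggest
  integer :: i

!HPF$ DISTRIBUTE x(BLOCK)
!HPF$ ALIGN y(i) WITH x(i)

  ! A distributed array on the left-hand side gives the compiler an
  ! "anchor" for assigning iterations to processors.
  do i = 1, n
     x(i) = real(i)
  end do

  ! tmp is written before it is read in every iteration, so each
  ! iteration can use a private copy; NEW states that explicitly.
  ! A compiler that omits NEW has to discover this by analysis.
!HPF$ INDEPENDENT, NEW(tmp)
  do i = 1, n
     tmp = 2.0 * x(i)
     y(i) = tmp + 1.0
  end do

  ! A reduction written with a conditional: the scalar form of MAXVAL(y).
  ! The loop carries a dependence through biggest, so it is not marked
  ! INDEPENDENT; the compiler must recognize the reduction idiom itself.
  biggest = y(1)
  do i = 2, n
     if (y(i) > biggest) biggest = y(i)
  end do

  print *, 'loop maximum =', biggest, ' MAXVAL =', maxval(y)
end program loop_sketch
```

Comparing each compiler's parallelization report for the second and third loops is a quick way to see whether the scalar was privatized and whether the reduction was recognized.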