PTLIB Review of MPP-Apprentice
See the PTLIB Review of Parallel Debuggers and
Performance Analyzers, which includes this evaluation of Apprentice,
for the review criteria and methodology as well as a comparison
with similar tools.
Performance
- Acceptable monitoring overhead
- Yes. The overhead varies but should be roughly a factor of 2
(the program takes about twice as long to run with MPP Apprentice
instrumentation enabled on all routines).
- Intrusion compensation
- Apprentice subtracts the time the program spends in the
instrumentation code. However, it does not compensate for the
changes in instruction and data cache use caused by the additional
instrumentation code.
- Acceptable response time
- Yes. Outside peak network traffic hours, MPP Apprentice responds
quite well. Response can be slow during peak hours on regular work
days, but this is largely due to the load on the system being used
rather than any fault of the tool.
- Memory/disk requirements
- The Apprentice executable appears to take only about 5 MB of
disk space. The additional xhelp information tool for Apprentice
takes about 4 MB.
- Scalable data collection
- Yes. Apprentice can be enabled at compile time for selected
subroutines only, which limits the amount of data collected.
- Scalable data presentation
- The Apprentice presentation consolidates information in the
same manner regardless of the number of processes.
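To make the overhead and intrusion-compensation criteria above concrete, here is a minimal Python sketch of the general idea. It is not Apprentice's actual algorithm, which this review does not describe. It assumes, hypothetically, that each instrumented region records its measured time and the number of instrumentation probes it executed, and that every probe has a fixed, pre-measured cost. The names `RegionSample`, `compensate`, and `overhead_factor` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class RegionSample:
    """Hypothetical per-region record from an instrumented run."""
    name: str
    measured_s: float  # wall-clock seconds, instrumentation time included
    probes: int        # instrumentation probes executed inside the region


def compensate(sample: RegionSample, probe_cost_s: float) -> float:
    """Estimate the region's time with the direct probe cost subtracted.

    Only the probes' own execution time is removed; any instruction- or
    data-cache disturbance caused by the extra code is NOT modeled.
    """
    corrected = sample.measured_s - sample.probes * probe_cost_s
    return max(corrected, 0.0)  # clamp so a bad probe-cost estimate cannot go negative


def overhead_factor(instrumented_s: float, uninstrumented_s: float) -> float:
    """Slowdown of the instrumented run; 2.0 means it took twice as long."""
    if uninstrumented_s <= 0:
        raise ValueError("uninstrumented time must be positive")
    return instrumented_s / uninstrumented_s


if __name__ == "__main__":
    region = RegionSample("solver", measured_s=2.0, probes=100_000)
    # Assumed (hypothetical) probe cost of 5 microseconds per probe.
    print(round(compensate(region, 5e-6), 6))
    print(overhead_factor(20.0, 10.0))
```

With these assumed numbers, a region measured at 2.0 s that executed 100,000 probes is reported as about 1.5 s, and a run that takes 20 s instrumented versus 10 s uninstrumented has the factor-of-2 overhead mentioned above. The cache effects noted under intrusion compensation are exactly what a simple subtraction like this cannot recover.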
Versatility
- Languages/programming models/
communication libraries supported
- C, C++, and Fortran / MIMD, SPMD / PVM, MPI, CRAFT
A release planned for spring 1997 will support HPF_CRAFT on the T3E.
- Runs on currently popular platforms
- Yes. Apprentice runs on the CRAY T3D and T3E.
- Platform dependencies isolated
- Apprentice runs exclusively on the CRAY T3D / T3E.
- Support for heterogeneous environment
- No.
- Interacts with current or soon-to-be
standards (e.g., PVM, MPI, HPF)
- Yes, with PVM and MPI, but not with HPF.
- Uses SDDF (Self-Defining Data Format)
- No.
- Change/customize/add new views easily
- No.
Ease of Use
- Documentation
- Very straightforward and complete. Documentation is available on
the T3D through the man page (man apprentice), the Help button
(which calls the xhelp utility), the information panel at the
bottom of the MPP Apprentice window, and the Context Sensitive
Help window.
- Ease of installation
- Because I was unable to install a copy of Apprentice myself, I
cannot say whether installation is trivial or not.
- Command-line interface
- No.
- Window-based interface
- Apprentice has a user-friendly X Window interface which allows
the user to select from a series of monitoring views.
- GUI common look-and-feel (OSF/Motif Style Guide)
- Yes. The windows behave as the user would expect.
- Privilege-free installation
- Not applicable (Apprentice is part of the
standard software installed on the Cray T3D).
- Reports information at source code level
- Yes. Timing information can be obtained at the source code level.
The source code itself can be viewed, modified, and recompiled from
within Apprentice through the xbrowse tool.
- Automated instrumentation
- Yes. The user doesn't need to manually add anything to his/her
source code.
- Compile without special linking
- No. The program must be linked with -lapp to pull in the
Apprentice library, and Apprentice must be enabled at compile time
with -Ta (Timing, all) in Fortran or -h apprentice in C.
Maturity
- Runs without crashing the monitored program
- Yes. Apprentice does not crash the monitored program: because it
works post-mortem on a generated data file, the program has already
finished executing by the time Apprentice is used.
- Reports and recovers from error conditions
- No.
- Support
- Because Apprentice is a commercial product that should already be
installed on whichever T3D or T3E the user works on, obtaining
assistance should simply mean contacting the designated help
contact for that T3D or T3E.
Capabilities
- Support for multiple threads per node
- No.
- Presents different levels of abstraction,
from global to individual threads,
procedures, or data structures
- Yes. Performance can be analyzed from the main program down to
individual subroutines, and even down to individual function calls.
- Single point of control for parallel debugging
- No. Apprentice is not a parallel debugger but a parallel
monitoring tool. It does provide a single point of control for
parallel monitoring.
- Attach/detach to/from running program
- No.
- Breakpoints and data watchpoints
- No.
- Program state examination
- No.
- Program state modification
- No.
- Event tracing mechanism
- No.
- Selective instrumentation
- Yes. With the -Ta (Timing, all) compile flag you can enable
Apprentice selectively for individual files at compile time
(use -h apprentice for C).
- Cache and memory reference tracking/display
- No. However, Apprentice does report the overall time spent loading
the instruction and data caches. It also reports the time spent on
shared-to-private data coercion, as well as the numbers of
PE-private loads and stores, local shared loads and stores, and
remote loads and stores. These numbers can be obtained for the whole
program, for a subroutine, or down to a basic block.
- Remote data access pattern analysis
- No.
- Message tracing/display
- Apprentice does not trace messages, but it shows the time spent in
a particular message-passing call made from a particular location,
along with some indication of how that time is distributed across
the processors (mean, minimum, and maximum).
- Input/output characterization
- Yes. Apprentice gives the overall time spent doing input/output
operations in each subroutine.
- Real-time monitoring
- No.
- Post-mortem analysis
- Yes. Apprentice monitors a program and lets the user view several
different performance aspects by analyzing a generated data file.
- Profiling at level of subprocedures and
coarse blocks
- Yes.
- Utilization display
(communications/idle/IO/computation)
- No. However, Apprentice does report time spent waiting, doing I/O,
and doing computation (broken down by type: floating-point adds,
floating-point multiplies, integer adds, etc.). It also gives some
idea of load imbalance across the processors.
- Performance prediction
- Yes. Apprentice describes the time lost to instruction cache and
data cache activity and to single instruction issue, and gives an
estimate of the possible performance improvement.
- Comparisons between different runs
- No.
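The per-processor mean, minimum, and maximum times that Apprentice reports for message-passing calls and routines can be turned into a simple load-imbalance figure. The Python sketch below is a hypothetical illustration, not Apprentice's own computation. It assumes a list of times for one routine, one entry per PE, and uses the common max/mean ratio as the imbalance metric. The function name `imbalance_summary` is illustrative only.

```python
from statistics import mean


def imbalance_summary(per_pe_times: list[float]) -> dict[str, float]:
    """Summarize how one routine's time is spread across PEs.

    per_pe_times holds one entry (in seconds) per processing element.
    """
    if not per_pe_times:
        raise ValueError("need at least one PE time")
    avg = mean(per_pe_times)
    lo, hi = min(per_pe_times), max(per_pe_times)
    return {
        "mean": avg,
        "min": lo,
        "max": hi,
        # max/mean: 1.0 is perfect balance; 2.0 means the slowest PE took
        # twice the average, so the other PEs spent time waiting for it.
        "imbalance": hi / avg if avg > 0 else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical times for one routine on 4 PEs; the last PE is overloaded.
    print(imbalance_summary([1.0, 1.0, 1.0, 3.0]))
```

For the four assumed PE times 1.0, 1.0, 1.0, and 3.0 s, the mean is 1.5 s and the imbalance ratio is 2.0. A ratio well above 1.0 points at the kind of load imbalance that Apprentice's per-processor statistics help expose.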
Other
- Commercial/research
- Commercial.
- Cost
- Apprentice is included as part of the
Cray CF90 and C++ Programming Environments for MPP systems.
Software Obtained
- Location
- Used on the Cray T3D at PSC (telnet mario.psc.edu)
- Date
- Late November, 1996
- Version
- MPP Apprentice Version 1.1.1.5, Oct 1995
Summary
The MPP Apprentice is a performance analysis tool that collects
both compile-time and run-time information to help the user find
performance inefficiencies in his/her program. Apprentice is
enabled at compile time and can be applied selectively to
individual subroutines. The clean and simple graphical interface
gives the user numerical and bar-chart information on the time
spent in different routines and on how that time was spent
(calculations, I/O, waits, etc.). Useful observations are also
given on how performance could be improved. Because Apprentice has
a graphical window interface, response time can sometimes be slow,
especially during office hours on regular work days. Otherwise,
response time is quite decent, and Apprentice can prove to be a
valuable performance analyzer.
Ratings (Worse 1 ... 5 Better)
- Performance: x
- Versatility: x
- Ease of Use: x
- Maturity: x
- Capabilities: x
- OVERALL: x
Reviewed by Christian Halloy, Nov. 30, 1996
Review updated by Shirley Browne, May 2, 1997
ptlib_maintainers@nhse.org