PTLIB Review of MPP-Apprentice

For the review criteria and methodology, and for a comparison with similar tools, see the PTLIB Review of Parallel Debuggers and Performance Analyzers, which includes this evaluation of Apprentice.

Performance

Acceptable monitoring overhead
Yes. The amount of overhead varies, but should be around a factor of 2 (that is, the program takes about twice as long to run with the MPP Apprentice instrumentation enabled on all routines).

Intrusion compensation
Apprentice subtracts out the time spent in the instrumentation code. However, it does not compensate for the changes in instruction and data cache behavior caused by the additional instrumentation code.
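
The subtraction described above amounts to simple bookkeeping. The following Python sketch only illustrates the idea; it is not Apprentice's implementation, and the function name, the per-probe cost, and the sample figures are all hypothetical.

```python
def compensated_time(measured_seconds, probe_calls, seconds_per_probe):
    """Subtract time spent inside instrumentation probes from a measured time.

    All parameters are hypothetical illustrations, not Apprentice quantities.
    Cache effects caused by the extra code are NOT removed by this subtraction.
    """
    overhead = probe_calls * seconds_per_probe
    if overhead > measured_seconds:
        raise ValueError("estimated probe overhead exceeds measured time")
    return measured_seconds - overhead


# Made-up example: a routine measured at 2.0 s whose probes fired
# 1,000,000 times at an assumed cost of 1 microsecond each.
estimate = compensated_time(2.0, 1_000_000, 1e-6)
```

The remaining error is the part the review says is not compensated: cache disturbance from the added code is still included in the adjusted figure.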

Acceptable response time
Yes. Outside of peak network traffic hours, the response of MPP-Apprentice is quite good. Response can be slow during peak hours on regular work days, but this is largely due to the load on the system being used rather than any fault of the tool.

Memory/disk requirements
The executable for Apprentice appears to take only about 5 MB of disk space. The additional xhelp information tool for Apprentice takes 4 MB.

Scalable data collection
Yes. Apprentice can be enabled for only certain subroutines during compile time, thus modifying the amount of data collected.

Scalable data presentation
The Apprentice presentation consolidates information in the same manner regardless of the number of processes.

Versatility

Languages/programming models/communication libraries supported
C, C++, and Fortran / MIMD, SPMD / PVM, MPI, CRAFT
A release planned for spring 1997 will support HPF_CRAFT on the T3E.

Runs on currently popular platforms
Yes. Apprentice runs on the CRAY T3D and T3E.

Platform dependencies isolated
Apprentice runs exclusively on the CRAY T3D / T3E.

Support for heterogeneous environment
No.

Interacts with current or soon-to-be standards (e.g., PVM, MPI, HPF)
Yes, with PVM and MPI, but not with HPF.

Uses SDDF (Standard Data Display Format)
No.

Change/customize/add new views easily
No.

Ease of Use

Documentation
Very straightforward and complete. On the T3D, the documentation is available in four ways: the man page (man apprentice), the help button (which calls the xhelp utility), the information panel at the bottom of the MPP Apprentice window, and the Context Sensitive Help window.

Ease of installation
Because I was unable to install a copy of Apprentice myself, I cannot say whether installation is trivial or not.

Command-line interface
No.

Window-based interface
Apprentice has a user-friendly X Window interface which allows the user to select from a series of monitoring views.

GUI common look-and-feel (OSF/Motif Style Guide)
Yes. The windows behave as the user would expect.

Privilege-free installation
Not applicable (Apprentice is part of the standard software installed on the Cray T3D).

Reports information at source code level
Yes. Information on timing at the source code level can be obtained. The source code itself can be viewed, modified and recompiled from Apprentice, through an xbrowse tool.

Automated instrumentation
Yes. The user doesn't need to manually add anything to his/her source code.

Compile without special linking
No. Apprentice requires linking with -lapp (the Apprentice library), and it must be enabled at compile time with the option -Ta (Timing all, in Fortran) or -h apprentice (in C).

Maturity

Runs without crashing the monitored program
Yes. Apprentice does NOT crash the monitored program. Because Apprentice works post-mortem with a generated file, the program has already executed.

Reports and recovers from error conditions
No.

Support
Apprentice is a commercial product and should already be installed on whichever T3D or T3E the user chooses, so obtaining assistance should simply be a matter of contacting the designated help contact for that machine.

Capabilities

Support for multiple threads per node
No.

Presents different levels of abstraction, from global to individual threads, procedures, or data structures
Yes. It is possible to analyze performance from the main program, down to individual subroutines, and even down to individual function calls.

Single point of control for parallel debugging
No. Apprentice is not a parallel debugger; it is a parallel monitoring tool. It does provide a single point of control for parallel monitoring.

Attach/detach to/from running program
No.

Breakpoints and data watchpoints
No.

Program state examination
No.

Program state modification
No.

Event tracing mechanism
No. However, with the -Ta (Timing, all) compile flag, Apprentice can be selectively enabled for individual files at compile time (use -happrentice for C).

Cache and memory reference tracking/display
No. However, Apprentice does report the overall time spent loading the instruction and data caches. It also reports the amount of time spent on shared-to-private data coercion, and the number of PE private loads and stores, local shared loads and stores, and remote loads and stores. These numbers can be obtained for the whole program, for a subroutine, or down to a basic block.
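
Reporting the same counters at basic-block, subroutine, and whole-program level can be pictured as a roll-up. The Python sketch below is only an illustration under assumed data: the counter names, subroutine names, and values are hypothetical, and it does not reflect Apprentice's file format or internals.

```python
from collections import Counter

# Hypothetical per-basic-block counters keyed by (subroutine, block id).
block_counts = {
    ("solve", 1): Counter(private_loads=100, local_shared_loads=20, remote_loads=5),
    ("solve", 2): Counter(private_loads=50, remote_loads=10),
    ("init", 1): Counter(private_loads=30, local_shared_loads=5),
}


def roll_up(block_counts):
    """Sum basic-block counters into per-subroutine and whole-program totals."""
    per_subroutine = {}
    program = Counter()
    for (subroutine, _block), counts in block_counts.items():
        # Counter.update adds counts rather than replacing them.
        per_subroutine.setdefault(subroutine, Counter()).update(counts)
        program.update(counts)
    return per_subroutine, program


per_subroutine, program = roll_up(block_counts)
```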

Remote data access pattern analysis
No.

Message tracing/display
Apprentice does not trace messages, but it shows the amount of time spent in a particular message-passing call made from a particular location, along with something of the distribution of that time across the processors (mean, minimum, and maximum).

Input/output characterization
Yes. Apprentice gives the overall time spent in doing input/output operations for each subroutine.

Real-time monitoring
No.

Post-mortem analysis
Yes. Apprentice monitors a program and allows the user to view several different performance aspects by analyzing a generated trace file.

Profiling at level of subprocedures and coarse blocks
Yes.

Utilization display (communications/idle/IO/computation)
No. However, Apprentice does give information on time spent waiting, doing I/O, or doing computations (specifying the types of computations: float adds, float multiplies, integer adds, etc.). You can also get some idea of load imbalance across the processors.
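
One way to read load imbalance out of per-processor timings is to compare the slowest processor with the average. The Python sketch below illustrates this under stated assumptions: the max/mean ratio is a commonly used metric chosen for illustration (the source does not say how Apprentice quantifies imbalance), and the function name and timings are hypothetical.

```python
from statistics import mean


def summarize_pe_times(times_by_pe):
    """Return mean, min, max and a simple imbalance ratio for per-PE times."""
    if not times_by_pe:
        raise ValueError("need at least one PE time")
    avg = mean(times_by_pe)
    slowest = max(times_by_pe)
    return {
        "mean": avg,
        "min": min(times_by_pe),
        "max": slowest,
        # 1.0 means perfectly balanced; larger values mean the slowest PE
        # spends proportionally more time than the average PE.
        "imbalance": slowest / avg if avg > 0 else float("nan"),
    }


# Made-up wait times (seconds) on four PEs.
summary = summarize_pe_times([1.0, 1.0, 1.0, 3.0])
```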

Performance prediction
Yes. Descriptions are given of time losses due to instruction cache and data cache activity, as well as due to single instruction issue; an estimate of the possible performance improvement is also given.
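
An estimate of this kind can be understood as an upper bound: if the reported losses were eliminated entirely, how much faster could the program run? The Python sketch below shows that arithmetic only. It is not Apprentice's estimation method, and the function name, loss categories as dictionary keys, and figures are hypothetical.

```python
def potential_speedup(total_seconds, recoverable_seconds):
    """Upper-bound speedup if the recoverable time were eliminated entirely."""
    if not 0 <= recoverable_seconds < total_seconds:
        raise ValueError("recoverable time must be in [0, total)")
    return total_seconds / (total_seconds - recoverable_seconds)


# Made-up figures: a 10 s run with losses attributed to three causes.
losses = {"instruction_cache": 0.5, "data_cache": 1.5, "single_issue": 0.5}
bound = potential_speedup(10.0, sum(losses.values()))
```

The result is a bound rather than a prediction, since in practice only part of each loss can usually be recovered.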

Comparisons between different runs
No.

Other

Commercial/research
Commercial.

Cost
Apprentice is included as part of Cray CF90 and C++ Programming Environments for MPP systems.

Software Obtained

Location
Used on the Cray T3D at PSC (telnet mario.psc.edu)

Date
Late November, 1996

Version
MPP Apprentice Version 1.1.1.5, Oct 1995

Summary

The MPP Apprentice is a performance analysis tool that collects both compile-time and run-time information to help the user find performance inefficiencies in his/her program. Apprentice is enabled at compile time and can be selectively applied to individual subroutines. The clean and simple graphical interface gives the user numerical and bar-chart information on the time spent in different routines and on how that time was spent (calculations, I/O, waits, etc.). Useful observations are also given on how performance could be improved. Since Apprentice has a graphical/window interface, the response time can sometimes be slow, especially on regular work days during office hours. Otherwise, response time is very decent, and Apprentice can prove to be a valuable performance analysis tool.

Ratings (Worse 1 ... 5 Better)

Reviewed by Christian Halloy, Nov. 30, 1996
Review updated by Shirley Browne, May 2, 1997
ptlib_maintainers@nhse.org