NHSE Review™ 1997 Volume First Issue

Establishing Standards for HPC System Software and Tools



Chapter 2 -- Why Software Standards Are Important for HPC

Perhaps more than any other user community, HPC programmers have been faced with a recurring paradox:

The fact that software is available to address a particular need
does not mean the need has been satisfied.

Rudimentary debuggers and timing tools were among the first software to be delivered on parallel platforms. Over the last ten years, developers from both research and industrial organizations have devoted considerable effort to the problem of tool support for parallel debugging and performance tuning [20, 24, 4].

One result is that, with few exceptions, the software suites provided on recent parallel and clustered computer systems have included at least an interactive debugger, a tool measuring program timing characteristics, and a tool measuring one or more other types of program performance (e.g., message-passing, instruction counts, memory use, I/O). Additional performance analysis tools are widely available as shareware. Evidence shows, however, that HPC application developers simply aren't using the current generation of debugging and tuning tools [19].

There are a number of reasons for this. At the Second Pasadena Workshop on System Software and Tools for Parallel Computing Systems, a working group was convened to address the issue of "Usability of HPC System Software and Tools". The group included users from research institutions and from third-party applications developers, as well as software tool developers from the HPC industry.

2.1 The User's Perspective

The users were asked to identify reasons why they don't use current software tools. Seven reasons dominated:

In fact, users cannot be certain that they will get a "payoff" if they learn a new tool or some other type of system software. Therefore, they avoid using tools unless there is no other way to accomplish a necessary task. What's needed is some sort of guarantee that tool skills learned for one HPC system will be applicable on other systems as well.

2.2 The Developer's Perspective

In turn, tool developers were asked why they don't appear to respond to user criticisms of system software and tools. They responded:

In other words, vendors do not perceive any real advantage in conforming to user requests for standardizing system software and tools. Inducements will be needed before they can agree to implement tools that are consistent with other vendors' products.

2.3 The Need for Standards

It is important to appreciate that the nature of HPC applications has changed over the last decade. As competition for HPC resources has increased, applications have come to revolve around the concept of "portability." This is a somewhat over-generalized term that actually encompasses three requirements. First, parallel programmers are concerned with the need to migrate existing codes successfully to new and better systems as they become available. The reality of today's parallel computing marketplace is that hardware and systems software change almost constantly. By the time an application is ready for production-level use, the platform for which it was developed will have been superseded. Alternatively, the best performance may be achievable only if the application can make use of multiple platforms (e.g., data filtering on a distributed-memory system, followed by intensive computation on an SMP, followed by visualization on a specialized workstation), in which case individual portions of the application may be migrated to different targets.

The skyrocketing popularity of network-based (heterogeneous or clustered) parallelism imposes another requirement for programming support. In some situations, such environments offer a mechanism for consuming so-called "wasted cycles" when machines are idle or under-utilized. At other sites, network-based systems provide alternate environments for executing parallel applications when the primary target machine is unavailable due to competing demands. Programmers are now demanding the ability to transport codes (i.e., port repeatedly without sacrificing performance) across a spectrum of computers or system configurations.

The third requirement is support for distributed development; that is, coding, compiling, and even debugging applications on systems other than the target parallel machine. This is crucial for the future of parallelism, since it decreases the competition for costly HPC resources by off-loading non-HPC tasks. Using portable languages and tools, users can develop and test applications on a serial workstation or a smaller, less expensive HPC system, later moving them to the final target platform.

The clear implication is that HPC tools need to be machine-independent, or at least available on multiple platforms. Like it or not, most HPC programmers will end up working on a number of very different machine platforms over the course of time. The investment in learning to use a tool (or other piece of system software) probably will not be warranted unless the tool is supported on more than one platform and behaves in a consistent way across platforms. Such cross-platform and cross-vendor consistency can only be achieved through formal or informal standardization.

Copyright © 1996




Copyright © 1997 Cherri M. Pancake