NHSE Review™, 1997 Volume, First Issue

Establishing Standards for HPC System Software and Tools



Chapter 10 -- Moving HPC Standards Forward

Almost two years have passed since the task force's draft document was produced. In this chapter, an attempt is made to assess the outcomes of the group's efforts as the draft was circulated within the broader HPC community. A second section outlines recent progress in the area of HPC software standards. The concluding paragraphs suggest ways in which readers of this article might become more proactive in supporting standards efforts.

10.1 Outcomes of the Task Force

The standard guidelines were released in draft form at the Supercomputing '95 conference, in November of 1995. Shortly thereafter, they were submitted to the National Coordinating Office for HPCC, for review by the software subcommittee. After several months of formal review by the agencies participating on the committee, adoption of the document took place in June of 1996. By that time, the document had been circulated relatively widely across the participating federal agencies and major supercomputing centers.

While the task force leadership attempted to avoid some of the pitfalls experienced by previous standards efforts, we "discovered" others. Most significant was our failure to institute any type of tracking mechanism that would document how the guidelines were put into practice. As a result, what is reported here is based on anecdotal information forwarded to us informally and, consequently, in hit-or-miss fashion.

A number of major procurement efforts from DOE laboratories, NASA sites, and academic computing centers are reported to have been drafted after extensive study of the guidelines document. Several are said to have used the requirements verbatim, and others showed significant influence in terms of wording and scope.

In addition, virtually all of the vendor organizations that participated in the task force have reported using the guidelines in their own software planning processes. There is evidence - in the form of new product announcements and planned releases - that many of the elements cited by users as key requirements will be appearing soon, or have appeared in recent months.

As might be expected, there have been criticisms of the task force documents as well. Agencies creating procurements were particularly critical of the somewhat superficial treatment of security and authentication. Vendor criticisms focused on the treatment of parallel I/O and operating system services. Both groups indicated that the requirements related to parallel debuggers were not well specified.

Overall, however, the feedback from all groups indicated that it was extremely useful to have a "strawman" document, regardless of its omissions or shortcomings. There has been sustained encouragement - from agencies, vendors, and users alike - to follow up the initial effort with a second round to improve and extend the guidelines (see below).

10.2 New Efforts in HPC Standards

Since the time when the task force met, several new efforts to define HPC software standards have taken place, addressing problems that were discussed in earlier chapters.

In the summer of 1997, the Message Passing Interface Forum completed its eagerly awaited definition of MPI-2. This significantly expands the capabilities of MPI, adding one-sided communications (remote memory operations), dynamic spawning of processes, language interoperability (for Fortran, C, and C++), and parallel I/O [9].

Since both PVM and MPI continue to be widely used, a new effort has been established to interface the two environments [3]. Called PVMPI, this effort attempts to overcome the obstacles to message-passing on heterogeneous computing systems, including the interface between dissimilar runtime environments for a single message-passing library, as well as the differences between PVM and MPI. The group also plans to address problems related to fault-tolerance, as heterogeneous systems are susceptible to a variety of failures that should not be allowed to impede program completion.

The High Performance Debugging Forum (HPDF) is another new effort, established in March of 1997. HPDF is a collaboration involving parallel tool researchers, commercial debugger developers, and representatives of HPC user organizations. Sponsored by the Parallel Tools Consortium, its goal is to define a variety of standards relevant to debugging tools on HPC systems, including user interfaces, interfaces between debuggers and compilers, etc.

HPDF intends to release its first standard in December, 1997. HPD Version 1.0 defines a command-based (i.e., non-graphical) interface for parallel debuggers. The stated goals are to:

  1. Capture the best-practice knowledge and experience of parallel debugger implementors across the industry.
  2. Establish a well-defined, testable, and minimal core set of features that parallel debuggers must provide.
  3. Ensure that parallel debugger implementors provide this set of features in a consistent way.
  4. Limit the core set in size so that initial commercial implementations can be available within a year of the standard's release.

The standard is defined for the Fortran, C, and C++ languages, and some effort has been made to keep it compatible with other declarative languages as well. While much of the functionality is equally applicable to serial debuggers, attention has focused on those issues that arise when the program being debugged includes multiple threads, multiple processes, or multiple processes containing multiple threads. The array of vendor organizations actively supporting HPDF is impressive.

OpenMP is a somewhat looser alliance of hardware and software vendors that is focusing on standard compiler directives for shared-memory parallelism. At the instigation of DOE's ASCI program, this effort is led by Digital, IBM, Intel, and Kuck & Associates. OpenMP announced its first draft in October of 1997. The structure of the directives shows a strong influence of the Parallel Computing Forum / ANSI X3H5 efforts (which foundered on the window-of-opportunity issue).

Although to date the only language supported is Fortran, the group plans to define C and C++ bindings as well. The API provides a mechanism for specifying shared-memory parallelism that will span computers from multiple vendors. Both UNIX and NT operating systems are targeted.
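To give a sense of what directive-based shared-memory parallelism looks like in practice, here is a minimal sketch in C. It is not taken from the Fortran draft described above; it assumes the pragma form later adopted by the OpenMP C/C++ binding, and the function name dot_product is a hypothetical helper chosen purely for illustration.

```c
/* Hypothetical helper (not from the OpenMP draft): dot product of two
   vectors of length n. The directive asks the compiler to divide the
   loop iterations among threads and to combine each thread's partial
   sum into `sum` when the loop ends. A compiler without OpenMP support
   ignores the pragma and runs the loop serially. */
double dot_product(const double *x, const double *y, int n)
{
    double sum = 0.0;
    int i;

#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The source code is the same whether or not the directive is honored, which is the portability argument behind a directive standard: the parallelism is expressed as an annotation rather than as vendor-specific library calls. With GCC, for instance, the directive takes effect only when the -fopenmp flag is given.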

The Parallel Tools Consortium recently announced formation of a new working group to define a standard API for acquiring information on an application's memory utilization. If successful, this effort will fill a significant gap in current tool support on HPC platforms.

Finally, four agencies represented in the National Coordinating Office for Computing, Information, and Communication (renamed from HPCC in the summer of 1997) have recently partnered to support a follow-up to the task force whose efforts are described in this article. DOE, NSF, DARPA, and NASA will serve as joint sponsors of a new national task force in 1998. The group will review the previous guidelines and how they were applied (or not applied) in successful HPC procurements. This information will be used as a starting point for defining a new set of guidelines that reflects the availability of new standards, evolving software technology, and changes in procurement practices. Individuals interested in participating in the effort should contact the author at pancake@cs.orst.edu.

10.3 What You Can Do to Foster Standardization of HPC Software

The current lack of standardization in HPC software is due to the interaction of several factors, as outlined in the introductory chapters of this article. All of them must be addressed if HPC machines are to be truly usable.

First, clear standards, defining consistent and implementable functionality, must be developed for those areas of HPC software where they are currently lacking. This requires the participation of individuals who are willing to dedicate significant amounts of time bringing divergent, and often conflicting, interests to some mutually acceptable position. Organizations must be willing to share the burden - and expense - of supporting standards participation.

Second, the HPC user community must bring more pressure to bear on HPC vendors. The simple existence of a standard does not mean that it will be available widely or consistently. It is up to the users of a standard to demand that it be implemented promptly and fully.

Third, HPC machine procurements must specify clearly what kinds of software support are needed to provide usability. This is significantly harder than defining metrics for measuring system performance, but will ultimately have more impact on how effectively the system is used.

Finally, purchasing organizations must contrive to award or deny procurements on the basis of software availability as well as hardware performance. Even carefully crafted software requirements are frequently ignored when the final decision is made. Basing purchasing decisions strictly on hardware perpetuates the lack of usable software support.

These goals can only be accomplished if more individuals take on more active, and more vocal, roles to express their concerns about the lack of usability and consistency on current HPC machines. Historical precedents demonstrate that standardization can only be successful when representation includes multiple sectors of the affected community. For HPC, this means that researchers and users, as well as vendors, must become involved.

Quite simply, it is not in the business interests of a vendor company to invest in interoperability or standardization unless the customer community insists on it. The HPC community has been lax in showing that kind of support for software uniformity. Organizations and individuals must find the time to complain loudly when consistency is violated, and to participate in community-wide efforts to improve the situation. It is in all of our interests to establish - and strengthen - standards for HPC system software and tools.

Copyright © 1996




Copyright © 1997 Cherri M. Pancake