NHSE Review™ 1997 Volume, First Issue

Establishing Standards for HPC System Software and Tools



Chapter 4 -- Tool Support for Application Development

HPC applications, in general, are noted for their size and complexity. Many occupy hundreds of thousands of lines of source code, and some extend to millions of lines. Applications typically consist of hundreds to thousands of files. Sometimes the source code for an "application" actually represents several intertwined applications, perhaps written in different programming languages or intended to execute on different types of HPC platforms. Simply keeping track of the dependencies among components can be a monumental task. Moreover, such applications were not written overnight, nor by a single individual. The team that develops and maintains an HPC application is often diverse and may be geographically dispersed. All these characteristics have important implications for system software and tool support.

The group discussed a number of aspects of HPC application development and the software requirements associated with each. This chapter covers the issues related to creating an HPC application in the first place: basic shells and utilities, compilers and translators, program analysis, and system software for building applications. For convenience, the issues associated with run-time behavior have been split out into separate chapters. The next chapter, Tool Support for Debugging, describes what users requested for debugging, while requirements for assessing and improving application performance appear in the chapter Tool Support for Performance Tuning.

A pervasive theme in all the discussions was the need for vendors to recognize that HPC applications are no longer developed directly on the target platform. This is due both to increasing competition for HPC computing resources and to the availability of desktop systems capable of providing general programming support. More and more activities that are ancillary to the HPC computations themselves have been off-loaded from the HPC machine to desktops or intermediary compute servers. Such activities include program browsing, editing, compiling, and building, as well as the post-processing, analysis, and viewing of program results or performance information.

Consequently, any HPC environment is inherently distributed and multi-platform in nature. The implications are particularly important for program development activities. Users need to be able to carry out such activities at the desktop, which means that system software and tools must:

  1. be compatible with at least the most common desktop UNIX environments
  2. interact with code that is intended to execute on another platform (the HPC machine)
  3. have transparent access to the HPC machine's file system

Where possible, debugging and performance tuning tools should also function as parts of a seamless, multi-platform environment.

Much discussion centered on the need for HPC vendors to realize that users really are forced to use multiple platforms in developing and executing their applications, and that their demands for consistency are a direct consequence of this situation. A vendor's decision to omit "de facto standard" software elements (e.g., csh), or to provide support in a non-standard way (e.g., proprietary versions of make), is more serious than vendors seem to realize. It is not just a matter of how difficult it might be to discover why an application that worked on a previous platform - or with the previous version of the operating system - no longer functions properly. Nor is it the time required to find an appropriate solution that rankles. The problem is more general: users are reluctant to modify their applications in ways they know are incompatible with other computing environments.

The users, in general, were concerned that their development efforts not be too platform-specific. Given the historical evidence that HPC platforms are short-lived compared with the applications that run on them, this concern is understandable. Any successful HPC application is certain to execute, over its lifetime, on several different machines or operating system versions. In many cases, the competition for HPC cycles forces users to alternate among platforms when running a particular application. Clearly, it is essential for these users to be able to develop and maintain a single version of the code, not one per target platform.

In the area of programming models, too, users felt that vendors failed to understand the realities of HPC applications. Currently, many compilers and software tools force users to conform to a model they find highly artificial: SPMD (single program, multiple data). This is problematic from several perspectives. Because the executable code for all processing elements (PEs) must be loaded onto each PE, an unnecessarily large amount of memory is occupied. Source code structure must be artificially contrived, using repetitive which-PE-is-this tests, rather than following more logical separations such as master/worker. Even worse, insistence on SPMD makes it difficult or impossible to link multiple applications together to form meta-applications. Users pointed out that the only perceptible benefit of the SPMD model is that it makes life easier for tool and compiler developers. This is not a sound basis for imposing such a restriction. All system software and tools should support a "MIMD-style", or MPMD (multiple program, multiple data), model for HPC applications.
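The structural difference between the two models can be sketched in Python, with threads standing in for PEs and a shared queue standing in for message passing. The launcher `run` and the functions `spmd`, `master`, and `worker` are hypothetical names chosen for illustration; they are not part of any real HPC runtime. In the SPMD version every PE carries both the master and the worker code and selects a branch by testing its rank; in the MPMD version each PE is handed only the program for its own role.

```python
import queue
import threading


def run(programs, data):
    """Hypothetical launcher: PE i executes programs[i]; returns the master's result."""
    nprocs = len(programs)
    inbox = queue.Queue()  # messages addressed to PE 0
    result = {}
    threads = [
        threading.Thread(target=prog, args=(rank, nprocs, data, inbox, result))
        for rank, prog in enumerate(programs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


# SPMD: one program for every PE; the role is chosen by a which-PE-is-this test.
def spmd(rank, nprocs, data, inbox, result):
    if rank == 0:
        result["total"] = sum(inbox.get() for _ in range(nprocs - 1))
    else:
        inbox.put(sum(data[rank - 1 :: nprocs - 1]))


# MPMD: separate master and worker programs, with no rank tests to select roles.
def master(rank, nprocs, data, inbox, result):
    result["total"] = sum(inbox.get() for _ in range(nprocs - 1))


def worker(rank, nprocs, data, inbox, result):
    # Each worker sums every (nprocs - 1)-th element, starting at its own offset.
    inbox.put(sum(data[rank - 1 :: nprocs - 1]))


data = list(range(100))
print(run([spmd] * 4, data))               # one program replicated on 4 PEs
print(run([master] + [worker] * 3, data))  # a master program plus 3 worker programs
```

Both calls print 4950. The computation is identical; only the way roles are assigned to PEs differs.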

Similarly, it is no longer realistic to assume that an HPC application will be restricted to a single source language. It is increasingly common to construct applications from components written at different times, some in Fortran and some in C or C++. Computational kernels or other key routines may well be coded in assembly language in order to achieve maximal performance. It is essential that vendors recognize this pattern and support it through mechanisms that allow a module written in one language to invoke one in a second language.
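The essential requirement for such a mechanism is that the caller declare the callee's interface so that arguments and results are converted correctly. Because a Fortran compiler cannot be assumed here, the sketch below makes the point with a swapped-in pair of languages: Python's standard `ctypes` module calling `cos` from the C math library. It assumes a Unix-like system; the fallback library name `libm.so.6` is a glibc-specific assumption. Calling Fortran from C involves the same kind of interface declaration, plus compiler-specific conventions such as how routine names are decorated.

```python
import ctypes
import ctypes.util

# Locate the C math library; "libm.so.6" is an assumed fallback for glibc systems.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C prototype double cos(double). Without it, ctypes would reject
# a Python float argument and would interpret the return value as a C int.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Call the C library's cos() from Python."""
    return libm.cos(x)


print(c_cos(0.0))  # 1.0
```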

Another recurring topic was the need for interoperability and integration among the various tools and utilities needed for developing HPC applications. This issue is addressed in the introductory section of Tool Support for Debugging.

4.1 Basic Shells and Utilities

HPC vendors typically devote little attention to the kinds of basic support represented by UNIX shells and utilities. Users in the group were quick to point out that the lack of such support can be a significant hindrance, especially considering how little effort basic implementations require. For example, shell scripts are the most common way of launching HPC applications, filtering or preprocessing input data, aggregating and reorganizing output data, and providing user-defined checkpoints. The absence of such support, or its inconsistency with other UNIX environments, forces users to devote an inordinate amount of time to basic systems programming. Given the simplicity and stability of UNIX utilities, there is no reason for omitting them or providing them in non-standard form.

Users were insistent that POSIX sh and System V csh are absolute essentials for any HPC platform. Other shells, such as tcsh or ksh, would require little additional development effort and should probably be included as well, but they are not strictly needed. Perl was also named as a highly desirable, but not essential, scripting language.

While the full UNIX utility suite is not necessary, the most common elements must always be supported: grep, egrep, sed, and diff. HPC vendors were reminded of the ubiquity of these utilities on other UNIX platforms and hence of programmers' dependence on them.

The question of source code editors was also addressed. While it would be very nice to have "smart" editors that understand programming language syntax or can help in managing file dependencies, this was not felt to be a high priority. Instead, the users agreed that the vi editor was essential. It must be fully supported in both full-screen and line-oriented modes, since capabilities vary widely according to whether access is local or dialup. Interestingly, the reason for choosing vi over emacs, a much more flexible editor, was that only vi performs consistently across current UNIX platforms.

Links to the Guidelines document:

Shells and utilities in the Baseline Development Environment
All requirements related to shells and utilities

4.2 Compilers, Assemblers, and Translators

The discussions of programming languages and compiler support also centered on the need for adherence to standards, since this is the only way to facilitate either application portability or multi-platform development, both key priorities for HPC users. The most widespread requirements for language support remain Fortran77 and C. While the ANSI standard is the obvious choice for C, it was noted that ANSI Fortran77 does not accurately reflect the de facto standard for Fortran, which must include the MIL standard (MIL-STD-1753) extensions as well. The widespread practice of adding vendor-specific extensions was criticized. For compilers already supporting such extensions, users suggested a compilation option that would identify and flag all non-standard constructs.
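The kind of option users asked for can be illustrated with a toy checker. The Python sketch below scans fixed-form Fortran 77 source for a few well-known vendor extensions: type*length declarations such as REAL*8, DOUBLE COMPLEX, NAMELIST, and ! comments. The pattern list is an illustrative assumption and far from exhaustive, and a real compiler option would work from the full language grammar rather than from regular expressions. In keeping with the users' view of the de facto standard, MIL-STD-1753 constructs such as IMPLICIT NONE, DO WHILE, and END DO are deliberately not flagged.

```python
import re

# A few common vendor extensions to ANSI Fortran 77 (illustrative only).
# CHARACTER*n is standard Fortran 77 and is therefore not matched.
EXTENSIONS = [
    (re.compile(r"^\s*(REAL|INTEGER|LOGICAL|COMPLEX)\s*\*\s*\d+", re.I),
     "type*length declaration"),
    (re.compile(r"^\s*DOUBLE\s+COMPLEX\b", re.I), "DOUBLE COMPLEX type"),
    (re.compile(r"^\s*NAMELIST\b", re.I), "NAMELIST statement"),
    # Naive: this would also match a '!' inside a character constant.
    (re.compile(r"!"), "inline ! comment"),
]


def flag_nonstandard(source):
    """Return (line number, description) pairs for suspected extensions."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line[:1] in ("C", "c", "*"):
            continue  # fixed-form comment line: column 1 holds C or *
        for pattern, description in EXTENSIONS:
            if pattern.search(line):
                findings.append((lineno, description))
    return findings


program = """\
      PROGRAM DEMO
      IMPLICIT NONE
      REAL*8 X
      X = 0.0
      DO WHILE (X .LT. 1.0)
         X = X + 0.5 ! step
      END DO
      END
"""
for lineno, description in flag_nonstandard(program):
    print(lineno, description)
```

Only lines 3 and 6 are reported; the MIL-STD-1753 statements on lines 2, 5, and 7 pass unflagged.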

The discussions of Fortran90 and C++ were heated, with some partisan supporters strongly in favor of including these languages in the baseline specification. The counter-arguments included low current usage (partly due to the lack of fully compliant Fortran90 compilers) and the slowness of the ANSI committees in arriving at standard specifications. The group eventually agreed that the current situation precluded requiring either language on all HPC platforms. Rather, they were designated as additional, desirable elements that should be included on some, but not all, procurements.

The parallel languages proved even more controversial, in part because the current lack of consistency across vendor platforms means that different user installations have become dependent on different language dialects. It was agreed that some form of parallel Fortran language or compiler directives was important for any shared-memory machine. Since the requirement applies only to certain machines, however, it was not included as part of the baseline environment but relegated to the first-priority desirables. HPF (High Performance Fortran) shared a similar fate, this time partly due to skepticism that HPF would prove useful for a broad enough range of user applications. In the end, "subset HPF" was listed as a first-level desirable, while the full HPF standard appeared at the third level of priority.

The situation for parallel C and C++ was less clear-cut. While many users were interested in some form of parallel C language or compiler pragmas, at least for shared-memory systems, the lack of current implementations was enough of a drawback to make it a second-level desirable. Parallel C++ was deemed so experimental that it would not be realistic to request vendor support at this time.

The performance-oriented focus of HPC applications means that some programmers are willing to forgo convenience in order to achieve better efficiency. Therefore, the availability of an assembler was also considered important for some proportion of sites. At those sites, two other features would also be desirable. First, one or more compiler options capable of generating pseudo-assembly-language listings would let the user see what the compiler was doing, as a point of departure for hand-tuning. Second, the ability to embed sections of assembly code in a higher-level program (e.g., through compiler directives or pragmas) would simplify that process.

Links to the Guidelines document:

Compilers in the Baseline Development Environment
All requirements related to sequential compilers
All requirements related to parallelizing compilers and translators
All requirements related to additional language support

4.3 Program Analysis

The group criticized the general lack of support for basic program analysis on HPC platforms. In some cases, there are not even tools (such as lint or flint) for detecting mismatches in the number and type of procedure arguments. There is even less support for basic interprocedural analysis, such as COMMON block analysis or use/def analysis.

Given that compiler technology has been capable of such analysis for over two decades, users find it hard to understand why HPC compilers must be so rudimentary and unhelpful.
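The simplest of these checks, detecting argument mismatches between separately compiled routines, needs little more than a table of procedure interfaces. The Python sketch below assumes that the definitions and call sites have already been extracted from the source (the parsing step is omitted), and the procedure names SCALE, MAIN, SOLVE, and INIT are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    caller: str       # routine containing the call
    callee: str       # routine being called
    arg_types: tuple  # types of the actual arguments, in order


def check_calls(definitions, calls):
    """Report calls whose argument count or types disagree with the callee.

    definitions maps each procedure name to the tuple of its dummy-argument
    types; both inputs are assumed to be extracted from the source already.
    """
    problems = []
    for call in calls:
        expected = definitions.get(call.callee)
        if expected is None:
            problems.append(f"{call.caller}: call to undefined {call.callee}")
        elif len(call.arg_types) != len(expected):
            problems.append(
                f"{call.caller}: {call.callee} expects {len(expected)} "
                f"arguments, got {len(call.arg_types)}"
            )
        else:
            for i, (got, want) in enumerate(zip(call.arg_types, expected), start=1):
                if got != want:
                    problems.append(
                        f"{call.caller}: {call.callee} argument {i} is {got}, "
                        f"expected {want}"
                    )
    return problems


definitions = {"SCALE": ("INTEGER", "REAL", "REAL")}
calls = [
    Call("MAIN", "SCALE", ("INTEGER", "REAL", "REAL")),
    Call("SOLVE", "SCALE", ("INTEGER", "DOUBLE PRECISION", "REAL")),
    Call("INIT", "SCALE", ("INTEGER", "REAL")),
]
for problem in check_calls(definitions, calls):
    print(problem)
```

The correct call from MAIN passes silently; the type mismatch in SOLVE and the missing argument in INIT are both reported, which is exactly the class of error a separately compiled Fortran program would otherwise reveal only at run time.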

Links to the Guidelines document:

Requirements related to program analysis

4.4 Support for Building Applications

Given the size and complexity of current HPC applications, users are highly dependent on the platform's utilities for building applications from individual object files. For this reason, the group's most vocal criticisms concerned make. It was pointed out that although this is one of UNIX's simplest utilities, it is also the most notorious for platform-specific idiosyncrasies. Each vendor chooses slight variants in syntax and semantics, making it impossible to develop multi-platform makefiles without resorting to complicated templates or scripts. Moreover, the inability of some versions to handle cascaded includes means that an application may even have to be restructured simply to satisfy yet another version of make.

The group agreed that all vendors should provide a fully supported implementation of this key utility, and that its interface should conform to the de facto standard established by GNU make. This would not only support the multi-platform development patterns already described, but also make it possible to maintain a single version of the application capable of executing on multiple HPC machines.
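At its core, every make variant applies the same rule: a target is rebuilt if it does not exist, if any prerequisite is newer than it, or if any prerequisite is itself being rebuilt, and targets are processed in dependency order. The sketch below expresses that rule with Python's `graphlib.TopologicalSorter` (Python 3.9 or later); it is an illustration of the rule, not of GNU make's implementation. It also groups the targets to be rebuilt into batches whose members do not depend on one another, which is the opportunity a parallel make exploits. The file names and integer timestamps are hypothetical.

```python
from graphlib import TopologicalSorter


def rebuild_batches(deps, mtimes):
    """Return out-of-date targets grouped into mutually independent batches.

    deps maps each target to its prerequisites; files with no entry in deps
    are treated as sources. mtimes maps existing files to modification times;
    a target missing from mtimes does not exist yet.
    """
    sorter = TopologicalSorter(deps)  # values are each node's predecessors
    sorter.prepare()
    stale = set()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all prerequisites already handled
        batch = []
        for node in ready:
            if node not in deps:
                continue  # a source file: never rebuilt
            if node not in mtimes or any(
                p in stale or mtimes.get(p, 0) > mtimes[node] for p in deps[node]
            ):
                stale.add(node)
                batch.append(node)
        sorter.done(*ready)
        if batch:
            batches.append(batch)
    return batches


deps = {
    "app": ["main.o", "solver.o"],
    "main.o": ["main.f"],
    "solver.o": ["solver.f"],
}
# solver.f was edited after solver.o was last compiled.
mtimes = {"main.f": 1, "solver.f": 5, "main.o": 2, "solver.o": 3, "app": 4}
print(rebuild_batches(deps, mtimes))  # [['solver.o'], ['app']]
```

Only solver.o and the final executable are rebuilt. On a clean build (only the sources exist), main.o and solver.o land in the same batch and could be compiled concurrently.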

Other user requirements were viewed as important, but not necessarily widespread enough to warrant inclusion in the baseline environment. One was the capability of performing make in parallel. Another was the ability to include ANSI C preprocessor directives (such as #define) in any of the source languages supported on the machine. It was noted that some Fortran compilers do not even provide full support for include at this time, much less conditional compilation. Yet to achieve the best performance, it is often desirable to use preprocessing to generate different executables for different sizes of problems. While compiler support would be the best solution, users were willing to invoke a separate preprocessor, as long as it was consistent across platforms and compatible with Fortran as well as C.
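A separate preprocessor of the kind users described needs only a small, language-neutral core. The Python sketch below is a hypothetical illustration, not a replacement for cpp: it handles #define, #ifdef, #ifndef, #else, and #endif, substitutes macro names in the remaining lines, and accepts predefined macros in the role of command-line -D options, so the same Fortran source can yield either a small-problem or a large-problem build. It ignores function-like macros, does not re-expand macro values, and gives no special treatment to strings or comments.

```python
import re

# Identifiers are the only tokens this sketch will substitute.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def preprocess(text, defines=None):
    """Expand a small cpp subset: #define, #ifdef, #ifndef, #else, #endif."""
    macros = dict(defines or {})
    active = [True]  # stack: is the current conditional region being emitted?
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            parts = stripped[1:].split(None, 2)
            directive = parts[0] if parts else ""
            if directive in ("ifdef", "ifndef"):
                is_defined = parts[1] in macros
                wanted = is_defined if directive == "ifdef" else not is_defined
                active.append(active[-1] and wanted)
            elif directive == "else":
                # Flip the branch, but only if the enclosing region is active.
                active[-1] = active[-2] and not active[-1]
            elif directive == "endif":
                active.pop()
            elif directive == "define" and active[-1]:
                macros[parts[1]] = parts[2] if len(parts) > 2 else ""
            continue  # directives never appear in the output
        if active[-1]:
            out.append(_IDENT.sub(lambda m: macros.get(m.group(0), m.group(0)), line))
    return "\n".join(out)


source = """#ifdef BIG
#define N 4096
#else
#define N 64
#endif
      REAL A(N, N)"""

print(preprocess(source))                # "      REAL A(64, 64)"
print(preprocess(source, {"BIG": "1"}))  # "      REAL A(4096, 4096)"
```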

Other requirements involved linker/loader support. Although linkage editing features - such as the ability to replace specific objects in an executable without completely rebuilding it - have been available for over 30 years, HPC platforms often do not support them. This is particularly unfortunate given the likely size of an HPC executable and the number of its component objects. Similarly, support for deferring the linking of object libraries until load-time is ubiquitous on serial workstations, but lacking on many HPC machines. This means that not only are the executables much larger in size, but whenever a library is updated or patched, all applications using it must be re-built. A further enhancement to linking capabilities would be to defer linking until an object is actually needed at run-time. This is another feature commonly found on workstations, where some libraries are so large (e.g., OSF/Motif) that it doesn't make sense to load routines that might never be used.
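Deferring a library until it is actually needed can be sketched with Python's `ctypes`, which loads shared libraries at run time through the platform's dynamic loader. The hypothetical `LazyLibrary` wrapper below postpones even that step until the first symbol is looked up, so a program that never calls into the library never loads it. The library name and the glibc fallback `libm.so.6` are assumptions about the host system.

```python
import ctypes
import ctypes.util


class LazyLibrary:
    """Hypothetical wrapper that loads a shared library on first symbol lookup."""

    def __init__(self, name, fallback):
        self._name = name
        self._fallback = fallback
        self._lib = None

    @property
    def loaded(self):
        return self._lib is not None

    def __getattr__(self, symbol):
        # Only called for attributes not found normally, i.e. library symbols.
        if self._lib is None:
            path = ctypes.util.find_library(self._name) or self._fallback
            self._lib = ctypes.CDLL(path)  # the actual run-time load happens here
        return getattr(self._lib, symbol)


libm = LazyLibrary("m", "libm.so.6")
print(libm.loaded)  # False: nothing has been loaded yet
sqrt = libm.sqrt    # the first lookup triggers the load
sqrt.argtypes = [ctypes.c_double]
sqrt.restype = ctypes.c_double
print(libm.loaded, sqrt(2.0))
```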

Source code version control (tools such as SCCS, RCS, or USM) was given slightly lower priority, probably because many HPC users are unfamiliar with such tools.

Users also pointed out that it should be possible to carry out the entire application-building process - preprocessing, compiling, linking - from a workstation. While it was recognized that there might be licensing issues, the availability of such support could have significant effects in relieving the congestion on large HPC machines.

Links to the Guidelines document:

Application building in the Baseline Development Environment
All requirements related to application building

Copyright © 1996




Copyright © 1997 Cherri M. Pancake