Newest entries are first. Older changes are archived separately.
29th August 1996
- /parallel/environments/pvm3/povray/
- Updated PVM POV-Ray to V2.9
- /parallel/environments/pvm3/povray/pvmpov29.tar.gz.txt
- PVM POV-Ray V2.9 Patch description
by Andreas Dilger <adilger@enel.ucalgary.ca>,
http://www-mddsp.enel.ucalgary.ca/People/adilger/, Micronet
Research Group, Dept of Electrical & Computer Engineering, University
of Calgary, Canada
Patches to add parallel processing to POV-Ray. Requires PVM 3.3.
- /parallel/environments/pvm3/povray/pvmpov29.tar.gz
- PVM POV-Ray V2.9 Patch source
by Andreas Dilger <adilger@enel.ucalgary.ca>,
http://www-mddsp.enel.ucalgary.ca/People/adilger/, Micronet
Research Group, Dept of Electrical & Computer Engineering, University
of Calgary, Canada
Based on original work by Brad Kline of Cray Research Inc.
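For readers new to PVM, the following sketch shows the master/worker
task-farming pattern that a PVM 3.3 application such as this patch
builds on. It is illustrative only: the slave executable name "worker"
and the message tags are invented for the example, and none of this is
PVMPOV source.

    /* Minimal PVM 3.3 master in C: spawn slaves, farm out work units,
     * collect results.  Illustrative only -- not PVMPOV code.  Assumes
     * a slave executable "worker" is installed where PVM can find it. */
    #include <stdio.h>
    #include "pvm3.h"

    #define TAG_WORK   1
    #define TAG_RESULT 2

    int main(void)
    {
        int tids[4], i, n, unit = 0, result;

        pvm_mytid();                          /* enroll this process in PVM */
        n = pvm_spawn("worker", NULL, PvmTaskDefault, "", 4, tids);

        for (i = 0; i < n; i++) {             /* hand each slave a work unit */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&unit, 1, 1);
            pvm_send(tids[i], TAG_WORK);
            unit++;
        }
        for (i = 0; i < n; i++) {             /* collect results in any order */
            pvm_recv(-1, TAG_RESULT);
            pvm_upkint(&result, 1, 1);
            printf("got result %d\n", result);
        }
        pvm_exit();
        return 0;
    }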
- /parallel/libraries/communication/scotch/
- Updated SCOTCH Static Mapping Package to V3.1
The SCOTCH software package is produced by the SCOTCH project, whose
goal is to study static mapping by means of graph theory, using a
"divide and conquer" approach.
The SCOTCH software package for static mapping embodies all the
algorithms and graph bipartitioning heuristics developed within the
SCOTCH project.
See also http://www.labri.u-bordeaux.fr/~pelegrin/scotch/
- /parallel/libraries/communication/scotch/scotch_3.1A.tar.gz
- The SCOTCH v3.1 Academic distribution
by Francois Pellegrini <pelegrin@labri.u-bordeaux.fr>
Contains binaries for Sun Solaris 2 and SunOS 4; MIPS SGI IRIX 5 and
6; Linux; and PowerPC IBM AIX 4. Also contains sources, graphs and the
V3.1 User Guide (below).
- /parallel/libraries/communication/scotch/scotch_user3.1.ps.gz
- The SCOTCH v3.1 User Guide
by Francois Pellegrini <pelegrin@labri.u-bordeaux.fr>
28th August 1996
- /parallel/languages/impala/
- Impala - IMplicitly PArallel LAnguage Application Suite
- /parallel/languages/impala/Announcement
- Announcement of Impala (IMplicitly PArallel LAnguages) application
suite
by Andy Shaw <shaw@hammer.lcs.mit.edu>
Impala is an application suite for Implicitly Parallel Languages.
See also http://www.csg.lcs.mit.edu/impala/
- /parallel/languages/impala/impala-v0.00.tar.gz
- Impala application suite V0.00
- /parallel/languages/impala/impala-v0.00/papers/boon-report.ps.gz
- How I Spent Summer of 1993 at MCRC
by Boon S. Ang
What Boon did while interning at Motorola Cambridge Research Center,
summer 1993. October 8, 1993
- /parallel/languages/impala/impala-v0.00/papers/eigensolver-dong-sorr.ps.gz
- Analysis of Non-Strict Functional Implementations of the
Dongarra-Sorensen Eigensolver
by S. Sur and A. P. W. Bohm <bohm@CS.ColoState.Edu>; Tel: +1 (303)
491-7595; FAX: +1 (303) 491-6639. Department of Computer Science,
Colorado State University, Ft. Collins, CO 80523, USA.
ABSTRACT:
We study the producer-consumer parallelism of Eigensolvers composed
of a tridiagonalization function, a tridiagonal solver, and a matrix
multiplication, written in the non-strict functional programming
language Id. We verify the claim that non-strict functional languages
allow the natural exploitation of this type of parallelism, in the
framework of realistic numerical codes. We compare the standard
top-down Dongarra-Sorensen solver with a new, bottom-up version. We
show that this bottom-up implementation is much more space efficient
than the top-down version. Also, we compare both versions of the
Dongarra-Sorensen solver with the more traditional QL algorithm, and
verify that the Dongarra-Sorensen solver is much more efficient, even
when run in a serial mode. We show that in a non-strict functional
execution model, the Dongarra-Sorensen algorithm can run completely in
parallel with the Householder function. Moreover, this can be achieved
without any change in the code components. We also indicate how the
critical path of the complete Eigensolver can be improved.
- /parallel/languages/impala/impala-v0.00/papers/eigensolver-jacobi.ps.gz
- A Functional Implementation of the Jacobi Eigen-Solver
by A. P. W. Bohm, Department of Computer Science, Colorado State
University, Ft. Collins, CO 80523, USA and R.E. Hiromoto, Computer
Research Group, Los Alamos National Laboratory.
June 15, 1994
ABSTRACT:
In this paper, we describe the systematic development of two
implementations of the Jacobi eigen-solver and give their performance
results for the MIT/Motorola Monsoon dataflow machine. Our study is
carried out using MINT, the MIT Monsoon simulator. The design of these
implementations follows from the mathematics of the Jacobi method, and
not from a translation of an existing sequential code. The functional
semantics with respect to array updates, which causes excessive array
copying, has led us to a new implementation of a parallel
"group-rotations" algorithm first described by Sameh. Our version of
this algorithm requires O(n^3) operations, whereas Sameh's original
version requires O(n^4) operations. The implementations are programmed
in the language Id, and although Id has non-functional features, we
have restricted the development of our eigen-solvers to the functional
sub-set of the language.
- /parallel/languages/impala/impala-v0.00/papers/nas-ft-jfp.ps.gz
- On the Effectiveness of Functional Language Features: NAS benchmark
FT
by J. Hammes; S. Sur and W. Bohm. Department of Computer Science,
Colorado State University, Ft. Collins, CO 80523, USA.
In Journal of Functional Programming 1(1): 1-000, January 1993,
Cambridge University Press.
ABSTRACT:
In this paper we investigate the effectiveness of functional language
features when writing scientific codes. Our programs are written in
the purely functional subset of Id and executed on a one node Motorola
Monsoon machine, and in Haskell and executed on a Sparc 2. In the
application we study (the NAS FT benchmark, a three-dimensional heat
equation solver) it is necessary to target and select one-dimensional
sub-arrays in three-dimensional arrays. Furthermore, it is important to
be able to share computation in array definitions. We compare
first-order and higher-order implementations of this benchmark. The
higher-order version uses functions to select one-dimensional
sub-arrays, or slices, from a three-dimensional object, whereas the
first-order version creates copies to achieve the same result. We
compare various
representations of a three-dimensional object, and study the effect of
strictness in Haskell. We also study the performance of our codes when
employing recursive and iterative implementations of the
one-dimensional FFT, which forms the kernel of this benchmark. It
turns out that these languages still have quite inefficient
implementations, with respect to both space and time. For the largest
problem we could run (32^3), Haskell is fifteen times slower than
Fortran and uses three times more space than is absolutely necessary,
whereas Id on Monsoon uses nine times more cycles than Fortran on the
MIPS R3000, and uses five times more space than is absolutely
necessary. This code, and others like it, should inspire compiler
writers to improve the performance of functional language
implementations.
- /parallel/languages/impala/impala-v0.00/papers/nas-ft-pact.ps.gz
- Functional, I-Structure, and M-Structure Implementations of NAS
Benchmark FT
by S. Sur and W. Bohm. Department of Computer Science, Colorado State
University, Ft. Collins, CO 80523, USA.
ABSTRACT:
We implement the NAS parallel benchmark FT, which numerically solves
a three dimensional partial differential equation using forward and
inverse FFTs, in the dataflow language Id and run it on a one node
Monsoon machine. Id is a layered language with a purely functional
kernel, a deterministic layer with I-structures, and a
non-deterministic layer with M-structures. We compare the performance
of versions of our code written in these three layers of Id. We
measure instruction counts and critical path length using the Monsoon
Interpreter Mint. We measure the space requirements of our codes by
determining the largest possible problem size fitting on a one node
Monsoon machine. The purely functional code provides the highest
average parallelism, but this parallelism turns out to be superfluous.
The I-structure code executes the minimal number of instructions and,
as its critical path length is similar to that of the functional code,
runs the fastest. The M-structure code allows the largest problem
sizes to be run, at the cost of about a 20% increase in instruction
count and a 75% to 100% increase in critical path length, compared to
the I-structure code.
- /parallel/languages/impala/impala-v0.00/papers/nas-integer-sort.ps.gz
- NAS parallel benchmark integer sort (IS) performance on MINT
by S. Sur and W. Bohm. Department of Computer Science, Colorado State
University, Ft. Collins, CO 80523, USA.
April 7, 1993
ABSTRACT:
We implemented several sorting routines in Id and compared their
relative performances in terms of number of instructions (S1), length
of the critical path (S∞) and average parallelism. The sorting
routines considered here are of the types (1) Exchange sort, (2)
Insertion sort, (3) Merge sort and (4) Sorting networks. We
implemented them using I-structures (e.g. merge sort) or M-structures
(e.g. bubble sort), whichever proved to be more efficient. We then
optimized the routines for efficiency, minimizing the number of
barriers and eliminating redundant copying to the best of our
abilities, and then compared their performances. We compared our
results with the expected theoretical performance and obtained
satisfactory results.
- /parallel/performance/benchmarks/NAS-parallel/
- Updated the NASA NAS Parallel Benchmarks to V2.1.
The home page for the benchmarks is
http://www.nas.nasa.gov/NAS/NPB/ which contains the NPB
V1 and V2 reports and source code.
- /parallel/performance/benchmarks/NAS-parallel/NPB2.1.tar.gz
- NAS Parallel Benchmarks Source Code V2.1
New LU and BT in V2.1.
- /parallel/environments/charm/
- Added UIUC CHARM / CHARM++ 4.5 and 4.6 with papers and manuals
- /parallel/environments/charm/Announcement
- Announcement of CHARM / CHARM++ 4.5
by Sanjeev Krishnan <sanjeev@cs.uiuc.edu>
The 4.5 distribution contains: Converse: a runtime framework that
supports interoperability; CHARM: a parallel object-based extension of
C; CHARM++: a C++-based parallel language that supports concurrent
objects with inheritance; Projections: an expert performance analysis
tool; SummaryTool: a simple performance analysis tool; and Dagger: a
notation for easy expression of message-driven programs.
- /parallel/environments/charm/charm.manual/
- CHARM Manual
- /parallel/environments/charm/charmpp.manual/
- CHARM++ Manual
- /parallel/environments/charm/distrib.4.6/
- CHARM 4.6 distribution
- /parallel/environments/charm/distrib.4.6/doc/
- CHARM 4.6 documentation
- /parallel/environments/charm/distrib.4.6/net-hp.tar.gz
- CHARM / CHARM++ binaries for Networks of HP HPPA-RISC machines
- /parallel/environments/charm/distrib.4.6/net-rs6k.tar.gz
- CHARM / CHARM++ binaries for Networks of IBM RS6000 machines
- /parallel/environments/charm/distrib.4.6/sim-hp.tar.gz
- Converse Simulator binaries for HP HPPA-RISC
- /parallel/environments/charm/distrib.4.6/sim-sol.tar.gz
- Converse Simulator binaries for Sun Solaris 2.x
- /parallel/environments/charm/distrib.4.6/net-sol.tar.gz
- CHARM / CHARM++ binaries for Networks of Sun Solaris 2.x machines
- /parallel/environments/charm/distrib.4.6/uth-hp.tar.gz
- CHARM / CHARM++ binaries for uniprocessor HP HPPA-RISC
- /parallel/environments/charm/distrib.4.6/uth-rs6k.tar.gz
- CHARM / CHARM++ binaries for uniprocessor IBM RS6000
- /parallel/environments/charm/distrib.4.6/uth-sol.tar.gz
- CHARM / CHARM++ binaries for uniprocessor Sun Solaris 2.x
- /parallel/environments/charm/distrib.4.6/sp1.tar.gz
- CHARM / CHARM++ binaries for IBM SP1
- /parallel/environments/charm/distrib.4.6/t3d.tar.gz
- CHARM / CHARM++ binaries for Cray T3D
- /parallel/environments/charm/distrib.4.5/
- CHARM 4.5 distribution
- /parallel/environments/charm/distrib.4.5/announcement.txt
- Announcement of CHARM / CHARM++ 4.5
- /parallel/environments/charm/distrib.4.5/README
- Overview of files
- /parallel/environments/charm/distrib.4.5/Charm.license
- CHARM License
Basically OK for educational, research or non-profit use. For-profit
organisations are limited to internal or evaluation use, or need a
commercial license (see commercialUse.txt).
- /parallel/environments/charm/distrib.4.5/commercialUse.txt
- Commercial use contact details
- /parallel/environments/charm/distrib.4.5/charm.manual.ps.gz
- CHARM 4.5 Manual
- /parallel/environments/charm/distrib.4.5/charm.tutorial.ps.gz
- CHARM Tutorial
- /parallel/environments/charm/distrib.4.5/charm++.manual.ps.gz
- CHARM++ 4.5 Manual
- /parallel/environments/charm/distrib.4.5/install.manual.ps.gz
- CHARM Installation Manual
- /parallel/environments/charm/distrib.4.5/converse.manual.ps.gz
- Converse Parallel Programming Environment Manual
- /parallel/environments/charm/distrib.4.5/dagger.tutorial.ps.gz
- Dagger Language Tutorial
- /parallel/environments/charm/distrib.4.5/dagtool.manual.ps.gz
- Dagger Tool (DagTool) Tutorial
- /parallel/environments/charm/distrib.4.5/projections.manual.ps.gz
- User Manual for Projections performance analysis tool V2.0
- /parallel/environments/charm/distrib.4.5/summary.manual.ps.gz
- User Manual for SummaryTool V1.0
- /parallel/environments/charm/distrib.4.5/net-hp.tar.Z
- CHARM / CHARM++ binaries for Networks of HP HPPA-RISC machines
- /parallel/environments/charm/distrib.4.5/net-rs6k.tar.Z
- CHARM / CHARM++ binaries for Networks of IBM RS6000 machines
- /parallel/environments/charm/distrib.4.5/net-sol.tar.Z
- CHARM / CHARM++ binaries for Networks of Sun Solaris 2.x machines
- /parallel/environments/charm/distrib.4.5/net-sun.tar.Z
- CHARM / CHARM++ binaries for Networks of Sun SunOS 4.x machines
- /parallel/environments/charm/distrib.4.5/sim-hp.tar.Z
- Converse Simulator binaries for HP HPPA-RISC
- /parallel/environments/charm/distrib.4.5/sim-rs6k.tar.Z
- Converse Simulator binaries for IBM RS6000
- /parallel/environments/charm/distrib.4.5/sim-sol.tar.Z
- Converse Simulator binaries for Sun Solaris 2.x
- /parallel/environments/charm/distrib.4.5/sim-sun.tar.Z
- Converse Simulator binaries for Sun SunOS 4
- /parallel/environments/charm/distrib.4.5/uth-hp.tar.Z
- CHARM / CHARM++ binaries for uniprocessor HP HPPA-RISC
- /parallel/environments/charm/distrib.4.5/uth-rs6k.tar.Z
- CHARM / CHARM++ binaries for uniprocessor IBM RS6000
- /parallel/environments/charm/distrib.4.5/uth-sun.tar.Z
- CHARM / CHARM++ binaries for uniprocessor Sun SunOS 4
- /parallel/environments/charm/distrib.4.5/uth-sol.tar.Z
- CHARM / CHARM++ binaries for uniprocessor Sun Solaris 2.x
- /parallel/environments/charm/distrib.4.5/cm5.tar.Z
- CHARM / CHARM++ binaries for CM5
- /parallel/environments/charm/distrib.4.5/paragon-sunmos.tar.Z
- CHARM / CHARM++ binaries for Intel Paragon SUNMOS
- /parallel/environments/charm/distrib.4.5/paragon-osf.tar.Z
- CHARM / CHARM++ binaries for Intel Paragon OSF
- /parallel/environments/charm/distrib.4.5/sp1.tar.Z
- CHARM / CHARM++ binaries for IBM SP1
- /parallel/environments/charm/distrib.4.5/tools/
- CHARM Tools
- /parallel/environments/charm/papers.html
- UIUC PPL / CHARM papers list (HTML)
- /parallel/environments/charm/reportlist.ps.gz
- UIUC PPL / CHARM Papers Overview
- /parallel/environments/charm/papers/
- UIUC PPL / CHARM Papers
- /parallel/environments/charm/papers/ABSTRACTS
- Abstracts of some of the papers below
- /parallel/environments/charm/papers/SanjeevThesis.ps.gz
- Automating Runtime Optimizations for Parallel Object-Oriented
Programming
by Sanjeev Krishnan
Ph.D. Thesis, Department of Computer Science, University of Illinois
at Urbana-Champaign, June 1996.
ABSTRACT:
Software development for parallel computers has been recognized as
one of the bottlenecks preventing their widespread use. In this thesis
we examine two complementary approaches for addressing the challenges
of high performance and enhanced programmability in parallel programs:
automated optimizations and object-orientation. We have developed the
parallel object-oriented language Charm++ (an extension of C++), which
enables the benefits of object-orientation to be applied to the
problems of parallel programming. In order to improve parallel program
performance without extra effort, we explore the use of automated
optimizations. In particular, we have developed techniques for
automating run-time optimizations for parallel object-oriented
languages. These techniques have been embodied in the Paradise
post-mortem analysis tool which automates several run-time
optimizations without programmer intervention. Paradise builds a
program representation from traces, analyzes characteristics, chooses
and parameterizes optimizations, and generates hints to the Charm++
run-time libraries. The optimizations researched are for static and
dynamic object placement, scheduling, granularity control and
communication reduction. We also evaluate Charm++, Paradise and
several run-time optimization techniques using real applications,
including an N-body simulation program, a program from the NAS
benchmark suite, and several other programs.
- /parallel/environments/charm/papers/ParallelObjectArrays_POOMA96.ps.gz
- A Parallel Array Abstraction for Data-Driven Objects
by Sanjeev Krishnan <sanjeev@cs.uiuc.edu> and Laxmikant V. Kale
<kale@cs.uiuc.edu>.
ABSTRACT:
We describe design and implementation of an abstraction for parallel
arrays of data-driven objects. The arrays may be multi-dimensional,
and the number of elements in an array is independent of the number of
processors. The elements are mapped to processors by a
user-controllable mapping function. The mapping may be changed during
the parallel computation, which facilitates, for example, load
balancing and communication optimization. Asynchronous method
invocation is supported, with multicast, broadcast, and dimension-wide
broadcast. The abstraction is illustrated using examples in fluid
dynamics and molecular simulations.
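The "user-controllable mapping function" is simply a rule from element
index to processor. A small illustration in C follows; the function
name and the block-cyclic rule are our own invention, not the paper's
interface.

    #include <stdio.h>

    /* Hypothetical user-supplied mapping: element index -> processor.
     * Here a block-cyclic rule; the abstraction lets the user swap in
     * any such function, and change it during the computation. */
    static int map_element(int index, int block, int nproc)
    {
        return (index / block) % nproc;
    }

    int main(void)
    {
        int i;
        for (i = 0; i < 8; i++)   /* 8 elements, blocks of 2, 2 processors */
            printf("element %d -> processor %d\n", i, map_element(i, 2, 2));
        return 0;
    }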
- /parallel/environments/charm/papers/charmpp.ps.gz
- CHARM++ (Chapter 7)
by Laxmikant V. Kale and Sanjeev Krishnan.
ABSTRACT:
CHARM++ is a parallel object-oriented language based on C++. It was
developed over the past few years at the Parallel Programming
Laboratory, University of Illinois, to enable the application of
object orientation to the problems of parallel programming. Its
innovative features include message-driven execution for latency
tolerance and modularity, dynamic creation and load balancing of
concurrent objects, branched objects which are groups of objects with
one representative on every processor, and multiple specific
information-sharing abstractions. This chapter describes its design
philosophy, essential features, syntax, implementation, and
applications.
- /parallel/environments/charm/papers/Converse_IPPS96.ps.gz
- Converse: an Interoperable Framework for Parallel Programming
by Laxmikant V. Kale; Milind Bhandarkar; Narain Jagathesan; Sanjeev
Krishnan and Joshua M. Yelon.
In Proceedings of the International Parallel Processing Symposium,
Honolulu, Hawaii, April 1996.
ABSTRACT:
Many different parallel languages and paradigms have been developed,
each with its own advantages. To benefit from all of them, it should
be possible to link together modules written in different parallel
languages in a single application. Since the paradigms sometimes
differ in fundamental ways, this is difficult to accomplish. This
paper describes a framework, Converse, that supports such
multi-lingual interoperability. The framework is meant to be
inclusive, and has been verified to support the SPMD programming
style, message-driven programming, parallel object-oriented
programming, and thread-based paradigms. The framework aims at
extracting the essential aspects of the runtime support into a set of
core components, so that language-specific code does not have to pay
overhead for features that it does not need.
- /parallel/environments/charm/papers/RuntimeOpts_ICS96.ps.gz
- Automating Parallel Runtime Optimizations Using Post-Mortem
Analysis
by Sanjeev Krishnan and L. V. Kale.
To appear in Proceedings of the 10th ACM International Conference on
Supercomputing, Philadelphia, May 1996.
ABSTRACT:
Attaining good performance for parallel programs frequently requires
substantial expertise and effort, which can be reduced by automated
optimizations. In this paper we concentrate on run-time optimizations
and techniques to automate them without programmer intervention, using
post-mortem analysis of parallel program execution. We classify the
characteristics of parallel programs with respect to object placement
(mapping), scheduling and communication, then describe techniques to
discover these characteristics by post-mortem analysis, present
heuristics to choose appropriate optimizations based on these
characteristics, and describe techniques to generate concise hints to
runtime optimization libraries. Our ideas have been developed in the
framework of the Paradise post-mortem analysis tool for the
parallel object-oriented language Charm++. We also present results for
optimizing simple parallel programs running on the Thinking Machines
CM-5.
- /parallel/environments/charm/papers/ChareKernel_ICPP91.ps.gz
- Supporting Machine Independent Parallel Programming on Diverse
Architectures.
by Wayne Fenton; Balkrishan Ramkumar; Vikram Saletore; Amitabh B.
Sinha and Laxmikant V. Kale.
International Conference on Parallel Processing, August, 1991.
ABSTRACT:
The Chare kernel is a run time support system that permits users to
write machine independent parallel programs on MIMD multiprocessors
without losing efficiency. It supports an explicitly parallel language
which helps control the complexity of parallel program design by
imposing a separation of concerns between the user program and the
system. The programmer is responsible for the dynamic creation of
processes and exchanging messages between processes. The kernel
assumes responsibility for when and where to execute the processes,
dynamic load balancing, and other "low-level" features. The language
also provides machine-independent abstractions for information sharing
which are implemented differently on different types of machines.
The language has been implemented on both shared and nonshared memory
machines including Sequent Balance and Symmetry, Encore Multimax,
Alliant FX/8, Intel iPSC/2, iPSC/860 and NCUBE/2, and is being ported
to NUMA (Non Uniform Memory Access) machines like the BBN TC2000. It
is also being ported to a network of Sun workstations. We discuss the
salient features of the implementation of the kernel on the three
different types of architectures.
- /parallel/environments/charm/papers/Projections_ICPP94.ps.gz
- Projections: a preliminary performance tool for Charm.
by Laxmikant V. Kale and Amitabh B. Sinha.
Parallel Systems Fair, International Symposium on Parallel
Processing, Newport Beach, April 1993.
ABSTRACT:
The advent and acceptance of massively parallel machines has made it
increasingly important to have tools to analyze the performance of
programs running on these machines. Current-day performance tools
suffer from two drawbacks: they are not scalable and they lose
specific information about the user program in their attempt at
generality. In this paper, we present Projections, a scalable
performance tool for Charm that can provide program-specific
information to help the users better understand the behavior of their
programs.
- /parallel/environments/charm/papers/PrioritizedLoadBalancing_IPPS93.ps.gz
- A Load Balancing Strategy For Prioritized Execution of Tasks.
by Amitabh B. Sinha and Laxmikant V. Kale.
International Symposium on Parallel Processing, Newport Beach, April
1993.
ABSTRACT:
Load balancing is a critical factor in achieving optimal performance
in parallel applications where tasks are created in a dynamic fashion.
In many computations, such as state space search problems, tasks have
priorities, and solutions to the computation may be achieved more
efficiently if these priorities are adhered to in the parallel
execution of the tasks. For such tasks, a load balancing scheme that
only seeks to balance load, without balancing high priority tasks over
the entire system, might result in the concentration of high priority
tasks (even in a balanced-load environment) on a few processors,
thereby leading to low priority work being done. In such situations a
load balancing scheme is desired which would balance both load and
high priority tasks over the system. In this paper, we describe the
development of a more efficient prioritized load balancing strategy.
- /parallel/environments/charm/papers/ParallelSort_ICPP93.ps.gz
- A Comparison Based Parallel Sorting Algorithm.
by L.V. Kale and Sanjeev Krishnan.
International Conference on Parallel Processing, August 1993.
ABSTRACT:
We present a fast comparison based parallel sorting algorithm that
can handle arbitrary key types. Data movement is the major portion of
sorting time for most algorithms in the literature. Our algorithm is
parameterized so that it can be tuned to control data movement time,
especially for large data sets. Parallel histograms are used to
partition the key set exactly. The algorithm is architecture
independent, and has been implemented in the CHARM portable parallel
programming system, allowing it to be efficiently run on virtually any
MIMD computer. Performance results for sorting different data sets are
presented.
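The key idea, exact partitioning via histograms, can be sketched in a
few lines of C. This is only the serial skeleton of the idea (in the
parallel algorithm the bucket counts are combined across processors
and over-full buckets are refined); it is not the CHARM implementation.

    #include <stdio.h>

    #define NBUCKETS 256

    /* Count keys per equal-width bucket over the range [lo, hi]. */
    static void histogram(const int *key, int n, int lo, int hi, int *count)
    {
        int i, width = (hi - lo) / NBUCKETS + 1;
        for (i = 0; i < NBUCKETS; i++) count[i] = 0;
        for (i = 0; i < n; i++) count[(key[i] - lo) / width]++;
    }

    /* Pick nproc-1 splitters so each part holds about n/nproc keys. */
    static void splitters(const int *count, int n, int nproc,
                          int lo, int width, int *split)
    {
        int b, s = 0, sum = 0, target = n / nproc;
        for (b = 0; b < NBUCKETS && s < nproc - 1; b++) {
            sum += count[b];
            if (sum >= (s + 1) * target)
                split[s++] = lo + (b + 1) * width;  /* bucket's upper edge */
        }
    }

    int main(void)
    {
        int key[8] = {5, 250, 17, 99, 180, 42, 7, 201};
        int count[NBUCKETS], split[3];
        histogram(key, 8, 0, 255, count);
        splitters(count, 8, 4, 0, (255 - 0) / NBUCKETS + 1, split);
        printf("splitters: %d %d %d\n", split[0], split[1], split[2]);
        return 0;
    }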
- /parallel/environments/charm/papers/Charm++_OOPSLA93.ps.gz
- CHARM++ : A Portable Concurrent Object Oriented System Based On
C++.
by L.V. Kale and Sanjeev Krishnan.
Proceedings of the Conference on Object Oriented Programming Systems,
Languages and Applications, Sept-Oct 1993. ACM SIGPLAN Notices, Vol. 28,
No. 10, pp. 91-108. (Also: Technical Report UIUCDCS-R-93-1796, March
1993, University of Illinois, Urbana, IL.) [Internal Report #93-2,
March 93]
ABSTRACT:
We describe Charm++, an object oriented portable parallel programming
language based on C++. Its design philosophy, implementation, sample
applications and their performance on various parallel machines are
described. Charm++ is an explicitly parallel language consisting of
C++ with a few extensions. It provides a clear separation between
sequential and parallel objects. The execution model of Charm++ is
message driven, thus helping one write programs that are
latency-tolerant. The language supports multiple inheritance, dynamic
binding, overloading, strong typing, and reuse for parallel objects.
Charm++ provides specific modes for sharing information between
parallel objects. Extensive dynamic load balancing strategies are
provided. It is based on the Charm parallel programming system, and
its runtime system implementation reuses most of the runtime system
for Charm.
- /parallel/environments/charm/papers/Checkpoint.ps.gz
- Efficient, Language-Based Checkpointing for Massively Parallel
Programs
by Sanjeev Krishnan and Laxmikant V. Kale.
Submitted for publication. [Internal Report #94-2]
ABSTRACT:
Checkpointing and restart is an approach to ensuring forward progress
of a program in spite of system failures or planned interruptions. We
investigate issues in checkpointing and restart of programs running on
massively parallel computers. We identify a new set of issues that
have to be considered for the MPP platform, and based on these we have
designed an approach rooted in the language and run-time system. Hence
our checkpointing facility can be used on virtually any parallel
machine in a portable manner, irrespective of whether the operating
system supports checkpointing. We present methods to make
checkpointing and restart space- and time-efficient, including
object-specific functions that save the state of an object. We present
techniques to automatically generate checkpointing code for parallel
objects, without programmer intervention. We also present mechanisms
to allow the programmer to easily incorporate application specific
knowledge selectively to make the checkpointing more efficient. The
techniques developed here have been implemented in the Charm++
parallel object-oriented programming language and run-time system.
Performance results are presented for the checkpointing overhead of
programs running on parallel machines.
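The "object-specific functions that save the state of an object"
amount to pack/unpack routines over exactly the live state. A plain-C
sketch of the idea follows (Charm++ generates analogous code
automatically; this is only the concept, not its implementation, and
the type and function names are ours).

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {          /* a parallel object's local state */
        int step;
        int n;
        double *data;         /* heap state must be saved explicitly */
    } Particles;

    /* Save exactly the live state: scalars first, then the sized block. */
    static int checkpoint(const Particles *p, FILE *f)
    {
        if (fwrite(&p->step, sizeof p->step, 1, f) != 1) return -1;
        if (fwrite(&p->n, sizeof p->n, 1, f) != 1) return -1;
        return fwrite(p->data, sizeof *p->data, p->n, f) == (size_t)p->n
               ? 0 : -1;
    }

    /* Restart mirrors it: read scalars, reallocate, read the block. */
    static int restart(Particles *p, FILE *f)
    {
        if (fread(&p->step, sizeof p->step, 1, f) != 1) return -1;
        if (fread(&p->n, sizeof p->n, 1, f) != 1) return -1;
        p->data = malloc(p->n * sizeof *p->data);
        if (!p->data) return -1;
        return fread(p->data, sizeof *p->data, p->n, f) == (size_t)p->n
               ? 0 : -1;
    }

    int main(void)
    {
        double d[3] = {1.0, 2.0, 3.0};
        Particles a = {42, 3, d}, b;
        FILE *f = tmpfile();
        if (!f) return 1;
        checkpoint(&a, f);
        rewind(f);
        restart(&b, f);
        printf("restored step %d, n %d\n", b.step, b.n);
        free(b.data);
        fclose(f);
        return 0;
    }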
- /parallel/environments/charm/papers/FMA_ICPP95.ps.gz
- A Parallel Adaptive Fast Multipole algorithm for N-body problems
by Sanjeev Krishnan and Laxmikant V. Kale.
In Proceedings of the International Conference on Parallel
Processsing, August 1995.
ABSTRACT:
We describe the design and implementation of a parallel adaptive fast
multipole algorithm (AFMA) for N-body problems. Our AFMA algorithm can
organize particles in cells of arbitrary shape. This simplifies its
parallelization, so that good locality and load balance are both
easily achieved. We describe a tighter well-separatedness criterion,
and improved techniques for constructing the AFMA tree. We describe
how to avoid redundant computation of pair-wise interactions while
maintaining load balance, using a fast edge-partitioning algorithm.
The AFMA algorithm is designed in an object oriented, message-driven
manner, allowing latency tolerance by overlapping computation and
communication easily. It also incorporates several optimizations for
message prioritization and communication reduction. Preliminary
performance results of our implementation using the Charm++ parallel
programming system are presented.
- /parallel/environments/charm/papers/agents.ps.gz
- Agents: an Undistorted Representation of Problem Structure.
To appear in proceedings of the conference on Languages and Compilers
for Parallel Computing, 1995.
ABSTRACT:
Agents is an actors-based language, and like every other such
language, it has a computation graph formed by the actors and the
lines of communication between them. Agents is unusual, though, in
that the structure of the computation graph is explicitly declared.
However, unlike most languages in which the structure of the
computation is declared (e.g. visual dataflow languages), Agents does
not require the computation graph to be gridlike, finite, or flat:
rather, it can be a tree, a table, a graph, or indeed any arbitrarily
complex, irregular, or even infinite pattern. By providing explicit
constructs for declaring the structure of the computation, two
advantages are gained: one, it becomes much easier for the programmer
to express the computation, and two, the compiler and runtime are able
to perform optimizations that would not be possible in a normal
actors-based language where the structure of computation is hidden or
unknown.
- /parallel/environments/charm/papers/pvmug95.ps.gz
- Interoperability and multithreading for PVM using the CONVERSE
Interoperable Framework
by L. V. Kale; Abner Zangvil and Narain K. Jagathesan.
Presentation for PVMUG'95
- /parallel/environments/charm/papers/hpccPosition.ps.gz
- Application Oriented and Computer Science Centered HPCC Research
by Laxmikant V. Kale <kale@cs.uiuc.edu>
May 12 1994
ABSTRACT:
At this time, there is a perception of a backlash against the HPCC
program, and even the idea of massively parallel computing itself. In
preparation for defining an agenda for HPCC, this paper first analyzes
the reasons for this backlash. Although beset with unrealistic
expectations, parallel processing will be a beneficial technology with
a broad impact, beyond applications in science. However, this will
require significant advances and work in computer science in addition
to parallel hardware and end-applications which are emphasized
currently. The paper presents a possible agenda that could lead to a
successful HPCC program in the future.
- /parallel/environments/charm/papers/Performance_ICPP94.ps.gz
- A framework for intelligent performance feedback
by Amitabh B. Sinha and Laxmikant V. Kale.
ABSTRACT:
The significant gap between peak and realized performance of parallel
machines motivates the need for performance analysis. Contemporary
tools provide only generic measurement, rather than program-specific
information and analysis. An object-oriented and message-driven
language, such as Charm, presents opportunities for both
program-specific feedback and automatic performance analysis. We
present a framework in which specific and intelligent feedback can be
given to the user about their parallel program. The framework will use
information about the parallel program generated at compile-time and
at run-time to analyze its performance using general expertise and
specific algorithms in performance analysis.
- /parallel/environments/charm/papers/InformationSharing_SC93.ps.gz
- Information Sharing Mechanisms in Parallel Programs
by L.V. Kale and Amitabh Sinha.
International Parallel Processing Symposium, Cancun, Mexico, April
26-29, 1994. [Internal Report #93-4, March 1993]
ABSTRACT:
Most parallel programming models provide a single generic mode in
which processes can exchange information with each other. However,
empirical observation of parallel programs suggests that processes
share data in a few distinct and specific modes. We argue that such
modes should be identified and explicitly supported in parallel
languages and their associated models. The paper describes a set of
information sharing abstractions that have been identified and
incorporated in the parallel programming language Charm. It can be
seen that using these abstractions leads to improved clarity and
expressiveness of user programs. In addition, the specificity provided
by these abstractions can be exploited at compile-time and at run-time
to provide the user with highly refined performance feedback and
intelligent debugging tools.
- /parallel/environments/charm/papers/Symbolic_LNCS93.ps.gz
- Prioritization in Parallel Symbolic Computing
by L.V. Kale; B. Ramkumar; V. Saletore and A.B. Sinha.
Lecture Notes in Computer Science, Vol. 748, pp. 12-41, 1993.
[Internal Report #93-6]
- /parallel/environments/charm/papers/CharmOverview.ps.gz
- Parallel Programming with CHARM: An Overview
by L.V. Kale
July 1993. [Internal Report #93-8]
- /parallel/environments/charm/papers/Quiescence.ps.gz
- A Dynamic and Adaptive Quiescence Detection Algorithm
by A. Sinha; L.V. Kale and B. Ramkumar.
September 1993. [Internal Report #93-11]
- /parallel/environments/charm/papers/Projections_IPPS93.ps.gz
- Projections: A Preliminary Performance Tool for Charm
by L.V. Kale and A.B. Sinha.
Parallel Systems Fair, International Parallel Processing Symposium,
Newport Beach, CA., pp. 108-114, April 1993. [Internal Report #92-3]
- /parallel/environments/charm/papers/TPDS_PartI.ps.gz
- The Charm Parallel Programming Language and System: Part I -
Description of Language Features
by L. V. Kale; B. Ramkumar; A. B. Sinha and A. Gursoy.
ABSTRACT:
We describe a parallel programming system for developing machine
independent programs for all MIMD machines. Many useful approaches to
this problem are seen to require a common base of support, which can
be encapsulated in a language that abstracts over resource management
decisions and machine specific details. This language can be used for
implementing other high level approaches as well as for efficient
application programming. The requirements for such a language are
defined, and the language supported by the Charm system is described,
and illustrated with examples. Charm is one of the first languages to
support message driven execution, and embodies unique abstractions
such as branch office chares and specifically shared variables. In
Part II of this paper, we talk about the runtime support system for
Charm. The system thus provides ease of programming on MIMD platforms
without sacrificing performance.
- /parallel/environments/charm/papers/TPDS_PartII.ps.gz
- The Charm Parallel Programming Language and System: Part II - The
Runtime System
by B. Ramkumar; A. B. Sinha; V. A. Saletore and L. V. Kale.
ABSTRACT:
Charm is a parallel programming system that permits users to write
portable parallel programs on MIMD multiprocessors without losing
efficiency. It supports an explicitly parallel language which helps
control the complexity of parallel program design by imposing a
separation of concerns between the user program and the system. It
also provides target machine independent abstractions for information
sharing which are implemented differently on different types of
processors. In part I of this paper [16], we described the language
support provided by Charm and the rationale behind its design. Charm
has been implemented on a variety of parallel machines including
shared memory machines like the Encore Multimax and the Sequent
Symmetry, message passing architectures like the Intel iPSC/2, Intel
i860 and the NCUBE 2, and a network of Unix workstations. The Chare
kernel is the run-time system that supports the portable execution of
Charm on several MIMD architectures. We discuss the implementation and
performance of the Chare kernel on three architectures: shared memory,
message passing, and a network of workstations. Index terms:
Message-driven execution, MIMD machines, Parallel programming,
Portable parallel software, Task granularity.
- /parallel/environments/charm/papers/DP_TR_92_10.ps.gz
- Dynamic Adaptive Scheduling in an Implementation of a Data Parallel
Language
by Edward A. Kornkven and Laxmikant V. Kale <kale@cs.uiuc.edu>.
ABSTRACT:
In the execution of a parallel program, it is desirable for all
processors dedicated to the program to be kept fully utilized.
However, a program that employs a lot of message-passing might spend a
considerable amount of time waiting for messages to arrive. In order
to mitigate this efficiency loss, instead of blocking execution for
every message, we would rather overlap that communication time with
other computation. This paper presents an approach to accomplishing
this overlap in a systematic manner when compiling a data parallel
language targeted for MIMD computers.
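The overlap being automated here looks, written by hand, like the
familiar split-phase pattern: post the receive, compute on data that
does not depend on it, and block only at the point of use. A generic C
rendering follows; MPI is used purely for concreteness and is not the
paper's runtime.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nproc, i;
        double interior[1000], halo, sum = 0.0;
        MPI_Request req;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);
        for (i = 0; i < 1000; i++) interior[i] = i;

        /* Post the receive early ... */
        MPI_Irecv(&halo, 1, MPI_DOUBLE, (rank + 1) % nproc, 0,
                  MPI_COMM_WORLD, &req);
        MPI_Send(&interior[0], 1, MPI_DOUBLE, (rank + nproc - 1) % nproc,
                 0, MPI_COMM_WORLD);

        /* ... overlap: compute on local data while the message travels ... */
        for (i = 0; i < 1000; i++) sum += interior[i];

        /* ... and block only where the remote value is finally needed. */
        MPI_Wait(&req, &st);
        sum += halo;

        printf("rank %d: sum = %f\n", rank, sum);
        MPI_Finalize();
        return 0;
    }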
- /parallel/libraries/memory/global-array/iway.ps.Z
- Shared Memory NUMA Programming on I-WAY
by J. Nieplocha and R. J. Harrison. Pacific Northwest National
Laboratory, P.O. Box 999, Richland WA 99352, USA.
ABSTRACT:
The performance of the Global Array shared-memory non-uniform
memory-access programming model is explored on the I-WAY,
wide-area-network distributed supercomputer environment. The Global
Array model is extended by introducing a concept of mirrored arrays.
Latencies and bandwidths for remote memory access are studied, and the
performance of a large application from computational chemistry is
evaluated using both fully distributed and also mirrored arrays.
Excellent performance can be obtained.
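For flavour, here is what the Global Arrays model looks like from C.
The calls below follow the library's later C bindings (the 1996
interface differed), so treat this as a sketch of the model rather
than of this release.

    #include <mpi.h>
    #include "ga.h"
    #include "macdecls.h"

    int main(int argc, char **argv)
    {
        int dims[2] = {100, 100}, lo[2] = {0, 0}, hi[2] = {9, 9}, ld = 10;
        double patch[100];
        int i, g_a;

        MPI_Init(&argc, &argv);
        GA_Initialize();
        MA_init(C_DBL, 1000000, 1000000);      /* memory allocator GA uses */

        /* One logically shared 100x100 array, physically distributed. */
        g_a = NGA_Create(C_DBL, 2, dims, "A", NULL);

        for (i = 0; i < 100; i++) patch[i] = (double)i;
        if (GA_Nodeid() == 0)                  /* one-sided: no matching recv */
            NGA_Put(g_a, lo, hi, patch, &ld);
        GA_Sync();
        NGA_Get(g_a, lo, hi, patch, &ld);      /* any process reads remote data */

        GA_Destroy(g_a);
        GA_Terminate();
        MPI_Finalize();
        return 0;
    }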
- /parallel/libraries/memory/global-array/siam.ps.Z
- The Global Array Programming Model for High Performance Scientific
Computing
by J. Nieplocha; R.J. Harrison and R.J. Littlefield. Pacific
Northwest Laboratory.
SIAM News, August/September 1995
- /parallel/libraries/memory/global-array/frontiers.ps.Z
- Disk Resident Arrays: An Array-Oriented I/O Library for Out-of-Core
Computations
by Jarek Nieplocha <j_nieplocha@pnl.gov> and Ian Foster; Pacific
Northwest National Laboratory, Richland, WA 99352, USA
To appear in Proc. Frontiers'96 of Massively Parallel Computing Symp.
ABSTRACT:
In out-of-core computations, disk storage is treated as another level
in the memory hierarchy, below cache, local memory, and (in a parallel
computer) remote memories. However, the tools used to manage this
storage are typically quite different from those used to manage the
other levels of the hierarchy. This disparity complicates
implementation of out-of-core algorithms and hinders portability. We
describe a programming model that addresses this problem. This model
allows parallel programs to use essentially the same mechanisms to
manage the movement of data between all levels of the hierarchy. We
take as our starting point the Global Arrays shared-memory model and
library, which support a variety
of operations on distributed arrays, including transfer between local
and remote memories. We show how this model can be extended to support
explicit transfer between global memory and secondary storage, and we
define a Disk Resident Arrays library that supports such transfers.
We illustrate the utility of the resulting model with two
applications, an out-of-core matrix multiplication and a large
computational chemistry program. We also describe implementation
techniques on several parallel computers and present experimental
results showing that the model can be implemented very efficiently on
parallel computers.
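Stripped of the library, the out-of-core pattern is explicit staging
of array blocks between disk and memory. A generic plain-C
illustration follows; it is deliberately not the Disk Resident Arrays
API.

    #include <stdio.h>
    #include <stdlib.h>

    #define N     4096        /* disk-resident matrix is N x N doubles */
    #define BLOCK 512         /* in-core tile is BLOCK x BLOCK */

    int main(void)
    {
        double *buf = malloc(BLOCK * BLOCK * sizeof *buf);
        FILE *f = fopen("matrix.dat", "rb");
        long row0, i;
        double sum = 0.0;

        if (!f || !buf) return 1;
        for (row0 = 0; row0 < N; row0 += BLOCK) {
            /* stage rows [row0, row0+BLOCK) of the first BLOCK columns */
            for (i = 0; i < BLOCK; i++) {
                fseek(f, (row0 + i) * N * (long)sizeof(double), SEEK_SET);
                fread(buf + i * BLOCK, sizeof(double), BLOCK, f);
            }
            for (i = 0; i < BLOCK * BLOCK; i++)   /* compute on the tile */
                sum += buf[i];
        }
        printf("sum = %f\n", sum);
        fclose(f);
        free(buf);
        return 0;
    }

What the DRA library adds over this is the array-oriented view: whole
sections of distributed (global) arrays moved to and from
disk-resident arrays in single operations, instead of per-process
seeks and reads.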
27th August 1996
- /parallel/languages/c/parallel-c++/classes/toops/
- Updated TOOPS to V1.2.1. TOOPS (Tool for Object Oriented
Protocol Simulation) is a C++ class library for process-oriented
simulation, primarily of communication protocols.
TOOPS contains classes for processors, processes, channels, sockets
and messages.
See also ftp://ftp.ldv.e-technik.tu-muenchen.de/dist/INDEX.html
- /parallel/languages/c/parallel-c++/classes/toops/toops1.21.tar.gz
- /parallel/languages/c/parallel-c++/classes/toops/toops121.zip
- TOOPS Version 1.21
TOOPS currently runs under HP-UX 9.0x and 10.10 (HP C++ 3.40 and gcc
2.5.8 or 2.7.2), IRIX 5.3 (SGI CC 4.0), Linux (gcc), and DOS and
Windows 3.1 (Borland C++ 3.1 and MS Visual C++ 1.51). There are still
problems under Borland C++ 4.x and SunOS 4.1.3.
- /parallel/environments/pvm3/tkpvm/
- Updated TkPVM for Tcl 7.5p1 and Tk 4.1p1 (dash and plus patches).
Added plug-in for Solaris 2.5.
- /parallel/standards/mpi/anl/
- MPICH 1.0.13 release; updated software, documents and user
guide (below).
- /parallel/standards/mpi/anl/mpich-1.0.13.tar.gz
- MPI Chameleon implementation version 1.0.13
- /parallel/standards/mpi/anl/mpicharticle.ps.gz
- A High-Performance, Portable Implementation of the MPI Message
Passing Interface Standard
by William Gropp, Mathematics and Computer Science Division, Argonne
National Laboratory, USA; Ewing Lusk, Mathematics and Computer Science
Division, Argonne National Laboratory, USA; Nathan Doss, Department of
Computer Science & NSF Engineering Research Center for CFS,
Mississippi State University, USA and Anthony Skjellum, Department of
Computer Science & NSF Engineering Research Center for CFS,
Mississippi State University, USA.
ABSTRACT:
MPI (Message Passing Interface) is a specification for a standard
library for message passing that was defined by the MPI Forum, a
broadly based group of parallel computer vendors, library writers, and
applications specialists. Multiple implementations of MPI have been
developed. In this paper, we describe MPICH, unique among existing
implementations in its design goal of combining portability with high
performance. We document its portability and performance and describe
the architecture by which these features are simultaneously achieved.
We also discuss the set of tools that accompany the free distribution
of MPICH, which constitute the beginnings of a portable parallel
programming environment. A project of this scope inevitably imparts
lessons about parallel computing, the specification being followed,
the current hardware and software environment for parallel computing,
and project management; we describe those we have learned. Finally, we
discuss future developments for MPICH, including those necessary to
accommodate extensions to the MPI Standard now being contemplated by
the MPI Forum.
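Performance claims like these are conventionally checked with small
message-passing microbenchmarks. Below is a minimal MPI ping-pong
timer in C, runnable under MPICH with two or more processes; it is our
sketch, not part of the MPICH distribution.

    #include <mpi.h>
    #include <stdio.h>

    #define REPS 1000

    int main(int argc, char **argv)
    {
        int rank, i;
        char buf[1024];
        double t0, t1;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {              /* bounce a 1 KB message */
                MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("mean round trip: %g us\n", (t1 - t0) / REPS * 1e6);
        MPI_Finalize();
        return 0;
    }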
- /parallel/environments/lam/distribution/lam60-patch.tar.gz
- Updated Patches 01-17 for LAM 6.0
- /parallel/environments/pvm3/emory-vss/scipvm.ps.Z
- SCI-PVM: Parallel Distributed Computing on SCI Workstation Clusters
by Ivan Zoraja <zoraja@split.fesb.hr>, Department of Electronics and
Computer Science, University of Split, 21000 Split, Croatia; Hermann
Hellwagner <hellwagn@informatik.tu-muenchen.de>, Institut für
Informatik, Technische Universität München, D-80290 München, Germany
and Vaidy Sunderam <vss@mathcs.emory.edu>, Department of Math and
Computer Science, Emory University, Atlanta, GA 30322, USA.
ABSTRACT:
Workstation and PC clusters interconnected by SCI (Scalable Coherent
Interface) are very promising technologies for high performance
cluster computing. Using commercial SBus to SCI interface cards and
early system software and drivers, a two-workstation cluster has been
constructed for initial testing and evaluation. The PVM system has
been adapted to operate on this cluster using raw device access to the
SCI interconnect, and preliminary communications performance tests
have been carried out. Our preliminary results indicate that
communications throughput in the range of 3.5 MBytes/s, and latencies
of 620 µs can be achieved on SCI clusters. These figures are
significantly better (by a factor of 3 to 4) than those attainable on
typical Ethernet LANs. Moreover, our experiments were conducted with
first generation SCI hardware, beta device drivers, and relatively
slow workstations. We expect that in the very near future, SCI
networks will be capable of delivering several tens of MBytes/s
bandwidth and a few tens of microseconds latencies, and will
significantly enhance the viability of cluster computing.
- /parallel/environments/chimp/vispad/report-95.ps.Z
- EPCC-SS95-12 Application Engineering Tools for MPI and PUL
by Patricio R. Domingues
ABSTRACT:
VISPAD is a post-mortem visualisation tool based on the concept of
program execution phases. It consists of several graphic displays,
each of which presents a different aspect of the parallel program
under consideration. Execution related information is collected at
run-time in trace files by using calls to an instrumentation library.
The processing of the trace files by VISPAD results in a graphical
playback of all recorded run-time events. This report describes the
enhancements and changes performed in VISPAD during this year's
project.
- /parallel/environments/chimp/vispad/report-94.ps.Z
- Application Engineering Tools for MPI and PUL
by Kesavan Shanmugam and Konstantinos Tourlas.
EPCC-SS94-01, September 1994
ABSTRACT:
This report describes the adaptation of VISPAD, a visualisation tool
for performance analysis and debugging, from the CHIMP message passing
system to the recently established MPI standard. VISPAD is a
post-mortem visualisation tool based on the concept of program
execution phases. It consists of a number of displays, each of which
presents a different aspect of the parallel program under
consideration. Execution related information is collected at run-time
in trace files by using calls to an instrumentation library. The
processing of the trace files by VISPAD results in a graphical
playback of all the recorded run-time events. The process of adapting
VISPAD to MPI included a restructuring of the instrumentation library,
the implementation of an instrumented version of the MPI interface,
changing the format of the trace files, the adaptation of existing
displays, and the introduction of two new displays. The latter serve
the purpose of visualising the rich set of communication operations
supported by MPI.
- /parallel/environments/chimp/vispad/report-93.ps.Z
- Application Engineering Tools for CHIMP and PUL
by N. Tomov and K-J. Wierenga.
September 1993
ABSTRACT:
This project is concerned with the implementation of a visualisation
tool for performance analysis and debugging - VISPAD. The tool's
interface is based on Anna Hondroudakis' thesis work [1] on
visualisation tools for parallel applications. VISPAD processes
information produced by a run of a parallel application. Information
about the application run is recorded in trace files by instrumented
versions of the CHIMP and PUL libraries and by instrumentation library
calls added to the application. VISPAD can then be used to provide
postmortem visualisation by replaying the application run from the
information in the trace files. Visualisation is provided by a number
of graphical displays, which show different aspects of the performance
of the parallel application. In this way, it is hoped, the programmer
will be assisted in the debugging and optimisation of her/his program.
In the nine weeks of the project, three of VISPAD's displays were
implemented. The Navigation Display provides a rich system of temporal
abstractions (phases) to present a concise view of the application run
to the user, allowing her/him to easily locate particular areas of
interest. The Membership Matrix Display shows how the different
processes in the parallel application join various SAP groups and the
way group memberships change over time. The CHIMP Level Animation
Display reconstructs CHIMP communications between processes.
- /parallel/environments/paragraph/distribution/
- Added more details to ParaGraph area.
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/changes10
- Changes in gcc-2.7.2-t800.10
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/gcc-2.7.2-t800.10.dif.gz
- gcc-2.7.2 for t800 (source diff) V10
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/patch10.gz
- Patch from V9 to V10
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/changes9
- Changes in gcc-2.7.2-t800.9
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/gcc-2.7.2-t800.9.dif.gz
- gcc-2.7.2 for t800 (source diff) V9
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/patch9.gz
- Patch from V8 to V9
13th August 1996
- /parallel/environments/mpi/nec-mpi-tests
- NEC simple MPI test programs
by Hiroyuki ARAKI <araki@csl.cl.nec.co.jp>
A collection of simple MPI test programs that NEC have been using to
test their MPI implementation.
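A typical program in such a suite exercises one small corner of the
standard and checks the answer. For example (our guess at the style,
not one of the NEC programs), a token ring that verifies point-to-point
delivery:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nproc, token;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        if (nproc < 2) { MPI_Finalize(); return 0; }

        if (rank == 0) {                   /* start the token, await return */
            token = 1;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, nproc - 1, 0, MPI_COMM_WORLD, &st);
            printf("%s: token = %d (expected %d)\n",
                   token == nproc ? "PASS" : "FAIL", token, nproc);
        } else {                           /* increment and pass it on */
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &st);
            token++;
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % nproc, 0,
                     MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }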
- /parallel/libraries/numerical/linear-algebra/scalapack
- ScaLAPACK Version 1.2
by Jack Dongarra <dongarra@dasher.cs.utk.edu>
The ScaLAPACK project is made up of 4 components: dense matrix
software (ScaLAPACK), large sparse eigenvalue software (PARPACK),
sparse direct systems software (CAPSS) and preconditioners for large
sparse iterative solvers (PARPRE).
This version includes routines for the solution of linear systems of
equations, symmetric positive definite banded linear systems of
equations, condition estimation and iterative refinement, for LU and
Cholesky factorization, matrix inversion, full-rank linear least
squares problems, orthogonal and generalized orthogonal
factorizations, orthogonal transformation routines, reductions to
upper Hessenberg, bidiagonal and tridiagonal form, reduction of a
symmetric-definite generalized eigenproblem to standard form, the
symmetric, generalized symmetric and the nonsymmetric eigenproblem.
Get ScaLAPACK from http://www.netlib.org/scalapack/index.html
- /parallel/environments/lam/distribution/xled11.tar.gz
- /parallel/environments/lam/distribution/xled11.readme
- XLED 1.1
by Nick Nevin <nevin@alex.osc.edu>
XLED is an X/Motif based LED server that emulates good old hardware
LEDs. It is implemented on top of the LAM cluster computing
environment. It provides a low-cost alternative for sexy blinking-LED
demos (popular with supervisors and managers!) and quick-n-dirty
debugging.