Newest entries are first. Older changes are archived separately.
29th August 1996
- /parallel/environments/pvm3/povray/
- Updated PVM POV-Ray to V2.9
- /parallel/environments/pvm3/povray/pvmpov29.tar.gz.txt
- PVM POV-Ray V2.9 Patch description
by Andreas Dilger <adilger@enel.ucalgary.ca>,
http://www-mddsp.enel.ucalgary.ca/People/adilger/, Micronet
Research Group, Dept of Electrical & Computer Engineering, University
of Calgary, Canada
Patches to add parallel processing to POV-Ray. Requires PVM 3.3.
- /parallel/environments/pvm3/povray/pvmpov29.tar.gz
- PVM POV-Ray V2.9 Patch source
by Andreas Dilger <adilger@enel.ucalgary.ca>,
http://www-mddsp.enel.ucalgary.ca/People/adilger/, Micronet
Research Group, Dept of Electrical & Computer Engineering, University
of Calgary, Canada
Based on original work by Brad Kline of Cray Research Inc.
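For readers new to PVM, the following sketch shows the master/worker
task-farming pattern that a PVM 3.3 application such as this patch
builds on. It is illustrative only: the slave executable name "worker"
and the message tags are invented for the example, and none of this is
PVMPOV source.

    /* Minimal PVM 3.3 master in C: spawn slaves, farm out work units,
     * collect results.  Illustrative only -- not PVMPOV code.  Assumes
     * a slave executable "worker" is installed where PVM can find it. */
    #include <stdio.h>
    #include "pvm3.h"

    #define TAG_WORK   1
    #define TAG_RESULT 2

    int main(void)
    {
        int tids[4], i, n, unit = 0, result;

        pvm_mytid();                          /* enroll this process in PVM */
        n = pvm_spawn("worker", NULL, PvmTaskDefault, "", 4, tids);

        for (i = 0; i < n; i++) {             /* hand each slave a work unit */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&unit, 1, 1);
            pvm_send(tids[i], TAG_WORK);
            unit++;
        }
        for (i = 0; i < n; i++) {             /* collect results in any order */
            pvm_recv(-1, TAG_RESULT);
            pvm_upkint(&result, 1, 1);
            printf("got result %d\n", result);
        }
        pvm_exit();
        return 0;
    }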
- /parallel/libraries/communication/scotch/
- Updated SCOTCH Static Mapping Package to V3.1
The SCOTCH software package is produced by the SCOTCH project, whose
goal is to study static mapping by means of graph theory, using a
"divide and conquer" approach.
The SCOTCH software package for static mapping embodies all the
algorithms and graph bipartitioning heuristics developed within the
SCOTCH project.
See also http://www.labri.u-bordeaux.fr/~pelegrin/scotch/
- /parallel/libraries/communication/scotch/scotch_3.1A.tar.gz
- The SCOTCH v3.1 Academic distribution
by Francois Pellegrini <pelegrin@labri.u-bordeaux.fr>
Contains binaries for Sun Solaris 2 and SunOS 4; MIPS SGI IRIX 5 and
6; Linux; and PowerPC IBM AIX 4. Also contains sources, graphs and the
V3.1 User Guide (below).
- /parallel/libraries/communication/scotch/scotch_user3.1.ps.gz
- The SCOTCH v3.1 User Guide
by Francois Pellegrini <pelegrin@labri.u-bordeaux.fr>
28th August 1996
- /parallel/languages/impala/
- Impala - IMplicitly PArallel LAnguage Application Suite
- /parallel/languages/impala/Announcement
- Announcement of Impala (IMplicitly PArallel LAnguages) application
suite
by Andy Shaw <shaw@hammer.lcs.mit.edu>
Impala is an application suite for Implicitly Parallel Languages.
See also http://www.csg.lcs.mit.edu/impala/
- /parallel/languages/impala/impala-v0.00.tar.gz
- Impala application suite V0.00
- /parallel/languages/impala/impala-v0.00/papers/boon-report.ps.gz
- How I Spent Summer of 1993 at MCRC
by Boon S. Ang
What Boon did while interning at Motorola Cambridge Research Center,
summer 1993. October 8, 1993
- /parallel/languages/impala/impala-v0.00/papers/eigensolver-dong-sorr.ps.gz
- Analysis of Non-Strict Functional Implementations of the
Dongarra-Sorensen Eigensolver
by S. Sur and A. P. W. Bohm <bohm@CS.ColoState.Edu>; Tel: +1 (303)
491-7595; FAX: +1 (303) 491-6639. Department of Computer Science,
Colorado State University, Ft. Collins, CO 80523, USA.
ABSTRACT:
We study the producer-consumer parallelism of Eigensolvers composed
of a tridiagonalization function, a tridiagonal solver, and a matrix
multiplication, written in the non-strict functional programming
language Id. We verify the claim that non-strict functional languages
allow the natural exploitation of this type of parallelism, in the
framework of realistic numerical codes. We compare the standard
top-down Dongarra-Sorensen solver with a new, bottom-up version. We
show that this bottom-up implementation is much more space efficient
than the top-down version. Also, we compare both versions of the
Dongarra-Sorensen solver with the more traditional QL algorithm, and
verify that the Dongarra-Sorensen solver is much more efficient, even
when run in a serial mode. We show that in a non-strict functional
execution model, the Dongarra-Sorensen algorithm can run completely in
parallel with the Householder function. Moreover, this can be achieved
without any change in the code components. We also indicate how the
critical path of the complete Eigensolver can be improved.
- /parallel/languages/impala/impala-v0.00/papers/eigensolver-jacobi.ps.gz
- A Functional Implementation of the Jacobi Eigen-Solver
by A. P. W. Bohm, Department of Computer Science, Colorado State
University, Ft. Collins, CO 80523, USA and R.E. Hiromoto, Computer
Research Group, Los Alamos National Laboratory.
June 15, 1994
ABSTRACT:
In this paper, we describe the systematic development of two
implementations of the Jacobi eigen-solver and give their performance
results for the MIT/Motorola Monsoon dataflow machine. Our study is
carried out using MINT, the MIT Monsoon simulator. The design of these
implementations follows from the mathematics of the Jacobi method, and
not from a translation of an existing sequential code. The functional
semantics with respect to array updates, which causes excessive array
copying, has led us to a new implementation of a parallel
"group-rotations" algorithm first described by Sameh. Our version of
this algorithm requires O(n^3) operations, whereas Sameh's original
version requires O(n^4) operations. The implementations are programmed
in the language Id, and although Id has non-functional features, we
have restricted the development of our eigen-solvers to the functional
sub-set of the language.
- /parallel/languages/impala/impala-v0.00/papers/nas-ft-jfp.ps.gz
- On the Effectiveness of Functional Language Features: NAS benchmark
FT
by J. Hammes; S. Sur and W. Bohm. Department of Computer Science,
Colorado State University, Ft. Collins, CO 80523, USA.
In Journal of Functional Programming 1(1): 1-000, January 1993,
Cambridge University Press.
ABSTRACT:
In this paper we investigate the effectiveness of functional language
features when writing scientific codes. Our programs are written in
the purely functional subset of Id and executed on a one node Motorola
Monsoon machine, and in Haskell and executed on a Sparc 2. In the
application we study (the NAS FT benchmark, a three-dimensional heat
equation solver) it is necessary to target and select one-dimensional
sub-arrays in three-dimensional arrays. Furthermore, it is important to
be able to share computation in array definitions. We compare
first-order and higher-order implementations of this benchmark. The
higher-order version uses functions to select one-dimensional
sub-arrays, or slices, from a three-dimensional object, whereas the
first-order version creates copies to achieve the same result. We
compare various
representations of a three-dimensional object, and study the effect of
strictness in Haskell. We also study the performance of our codes when
employing recursive and iterative implementations of the
one-dimensional FFT, which forms the kernel of this benchmark. It
turns out that these languages still have quite inefficient
implementations, with respect to both space and time. For the largest
problem we could run (32^3), Haskell is fifteen times slower than
Fortran and uses three times more space than is absolutely necessary,
whereas Id on Monsoon uses nine times more cycles than Fortran on the
MIPS R3000, and uses five times more space than is absolutely
necessary. This code, and others like it, should inspire compiler
writers to improve the performance of functional language
implementations.
- /parallel/languages/impala/impala-v0.00/papers/nas-ft-pact.ps.gz
- Functional, I-Structure, and M-Structure Implementations of NAS
Benchmark FT
by S. Sur and W. Bohm. Department of Computer Science, Colorado State
University, Ft. Collins, CO 80523, USA.
ABSTRACT:
We implement the NAS parallel benchmark FT, which numerically solves
a three dimensional partial differential equation using forward and
inverse FFTs, in the dataflow language Id and run it on a one node
Monsoon machine. Id is a layered language with a purely functional
kernel, a deterministic layer with I-structures, and a
non-deterministic layer with M-structures. We compare the performance
of versions of our code written in these three layers of Id. We
measure instruction counts and critical path length using the Monsoon
Interpreter Mint. We measure the space requirements of our codes by
determining the largest possible problem size fitting on a one node
Monsoon machine. The purely functional code provides the highest
average parallelism, but this parallelism turns out to be superfluous.
The I-structure code executes the minimal number of instructions and,
as its critical path length is similar to that of the functional code,
runs the fastest. The M-structure code allows the largest problem
sizes to be run, at the cost of about a 20% increase in instruction
count and a 75% to 100% increase in critical path length, compared to
the I-structure code.
- /parallel/languages/impala/impala-v0.00/papers/nas-integer-sort.ps.gz
- NAS parallel benchmark integer sort (IS) performance on MINT
by S. Sur and W. Bohm. Department of Computer Science, Colorado State
University, Ft. Collins, CO 80523, USA.
April 7, 1993
ABSTRACT:
We implemented several sorting routines in Id and compared their
relative performances in terms of number of instructions (S1), length
of the critical path (S∞) and average parallelism. The sorting
routines considered here are of the types (1) Exchange sort, (2)
Insertion sort, (3) Merge sort and (4) Sorting networks. We
implemented them using I-structures (e.g. merge sort) or M-structures
(e.g. bubble sort), whichever proved to be more efficient. We then
optimized the routines for efficiency, minimizing the number of
barriers and eliminating redundant copying to the best of our
abilities, and then compared their performances. We compared our
results with the expected theoretical performance and obtained
satisfactory results.
- /parallel/performance/benchmarks/NAS-parallel/
- Updated the NASA NAS Parallel Benchmarks to V2.1.
The home page for the benchmarks is
http://www.nas.nasa.gov/NAS/NPB/ which contains the NPB
V1 and V2 reports and source code.
- /parallel/performance/benchmarks/NAS-parallel/NPB2.1.tar.gz
- NAS Parallel Benchmarks Source Code V2.1
New LU and BT in V2.1.
- /parallel/environments/charm/
- Added UIUC CHARM / CHARM++ 4.5 and 4.6 with papers and manuals
- /parallel/environments/charm/Announcement
- Announcement of CHARM / CHARM++ 4.5
by Sanjeev Krishnan <sanjeev@cs.uiuc.edu>
The 4.5 distribution contains: Converse: a runtime framework that
supports interoperability; CHARM: a parallel object-based extension of
C; CHARM++: a C++-based parallel language that supports concurrent
objects with inheritance; Projections: an expert performance analysis
tool; SummaryTool: a simple performance analysis tool; and Dagger: a
notation for easy expression of message-driven programs.
- /parallel/environments/charm/charm.manual/
- CHARM Manual
- /parallel/environments/charm/charmpp.manual/
- CHARM++ Manual
- /parallel/environments/charm/distrib.4.6/
- CHARM 4.6 distribution
- /parallel/environments/charm/distrib.4.6/doc/
- CHARM 4.6 documentation
- /parallel/environments/charm/distrib.4.6/net-hp.tar.gz
- CHARM / CHARM++ binaries for Networks of HP HPPA-RISC machines
- /parallel/environments/charm/distrib.4.6/net-rs6k.tar.gz
- CHARM / CHARM++ binaries for Networks of IBM RS6000 machines
- /parallel/environments/charm/distrib.4.6/sim-hp.tar.gz
- Converse Simulator binaries for HP HPPA-RISC
- /parallel/environments/charm/distrib.4.6/sim-sol.tar.gz
- Converse Simulator binaries for Sun Solaris 2.x
- /parallel/environments/charm/distrib.4.6/net-sol.tar.gz
- CHARM / CHARM++ binaries for Networks of Sun Solaris 2.x machines
- /parallel/environments/charm/distrib.4.6/uth-hp.tar.gz
- CHARM / CHARM++ binaries for uniprocessor HP HPPA-RISC
- /parallel/environments/charm/distrib.4.6/uth-rs6k.tar.gz
- CHARM / CHARM++ binaries for uniprocessor IBM RS6000
- /parallel/environments/charm/distrib.4.6/uth-sol.tar.gz
- CHARM / CHARM++ binaries for uniprocessor Sun Solaris 2.x
- /parallel/environments/charm/distrib.4.6/sp1.tar.gz
- CHARM / CHARM++ binaries for IBM SP1
- /parallel/environments/charm/distrib.4.6/t3d.tar.gz
- CHARM / CHARM++ binaries for Cray T3D
- /parallel/environments/charm/distrib.4.5/
- CHARM 4.5 distribution
- /parallel/environments/charm/distrib.4.5/announcement.txt
- Announcement of CHARM / CHARM++ 4.5
- /parallel/environments/charm/distrib.4.5/README
- Overview of files
- /parallel/environments/charm/distrib.4.5/Charm.license
- CHARM License
Basically OK for educational, research or non-profit use. For-profit
organisations are limited to internal or evaluation use, or need a
commercial license (see commercialUse.txt).
- /parallel/environments/charm/distrib.4.5/commercialUse.txt
- Commercial use contact details
- /parallel/environments/charm/distrib.4.5/charm.manual.ps.gz
- CHARM 4.5 Manual
- /parallel/environments/charm/distrib.4.5/charm.tutorial.ps.gz
- CHARM Tutorial
- /parallel/environments/charm/distrib.4.5/charm++.manual.ps.gz
- CHARM++ 4.5 Manual
- /parallel/environments/charm/distrib.4.5/install.manual.ps.gz
- CHARM Installation Manual
- /parallel/environments/charm/distrib.4.5/converse.manual.ps.gz
- Converse Parallel Programming Environment Manual
- /parallel/environments/charm/distrib.4.5/dagger.tutorial.ps.gz
- Dagger Language Tutorial
- /parallel/environments/charm/distrib.4.5/dagtool.manual.ps.gz
- Dagger Tool (DagTool) Tutorial
- /parallel/environments/charm/distrib.4.5/projections.manual.ps.gz
- User Manual for Projections performance analysis tool V2.0
- /parallel/environments/charm/distrib.4.5/summary.manual.ps.gz
- User Manual for SummaryTool V1.0
- /parallel/environments/charm/distrib.4.5/net-hp.tar.Z
- CHARM / CHARM++ binaries for Networks of HP HPPA-RISC machines
- /parallel/environments/charm/distrib.4.5/net-rs6k.tar.Z
- CHARM / CHARM++ binaries for Networks of IBM RS6000 machines
- /parallel/environments/charm/distrib.4.5/net-sol.tar.Z
- CHARM / CHARM++ binaries for Networks of Sun Solaris 2.x machines
- /parallel/environments/charm/distrib.4.5/net-sun.tar.Z
- CHARM / CHARM++ binaries for Networks of Sun SunOS 4.x machines
- /parallel/environments/charm/distrib.4.5/sim-hp.tar.Z
- Converse Simulator binaries for HP HPPA-RISC
- /parallel/environments/charm/distrib.4.5/sim-rs6k.tar.Z
- Converse Simulator binaries for IBM RS6000
- /parallel/environments/charm/distrib.4.5/sim-sol.tar.Z
- Converse Simulator binaries for Sun Solaris 2.x
- /parallel/environments/charm/distrib.4.5/sim-sun.tar.Z
- Converse Simulator binaries for Sun SunOS 4
- /parallel/environments/charm/distrib.4.5/uth-hp.tar.Z
- CHARM / CHARM++ binaries for uniprocessor HP HPPA-RISC
- /parallel/environments/charm/distrib.4.5/uth-rs6k.tar.Z
- CHARM / CHARM++ binaries for uniprocessor IBM RS6000
- /parallel/environments/charm/distrib.4.5/uth-sun.tar.Z
- CHARM / CHARM++ binaries for uniprocessor Sun SunOS 4
- /parallel/environments/charm/distrib.4.5/uth-sol.tar.Z
- CHARM / CHARM++ binaries for uniprocessor Sun Solaris 2.x
- /parallel/environments/charm/distrib.4.5/cm5.tar.Z
- CHARM / CHARM++ binaries for CM5
- /parallel/environments/charm/distrib.4.5/paragon-sunmos.tar.Z
- CHARM / CHARM++ binaries for Intel Paragon SUNMOS
- /parallel/environments/charm/distrib.4.5/paragon-osf.tar.Z
- CHARM / CHARM++ binaries for Intel Paragon OSF
- /parallel/environments/charm/distrib.4.5/sp1.tar.Z
- CHARM / CHARM++ binaries for IBM SP1
- /parallel/environments/charm/distrib.4.5/tools/
- CHARM Tools
- /parallel/environments/charm/papers.html
- UIUC PPL / CHARM papers list (HTML)
- /parallel/environments/charm/reportlist.ps.gz
- UIUC PPL / CHARM Papers Overview
- /parallel/environments/charm/papers/
- UIUC PPL / CHARM Papers
- /parallel/environments/charm/papers/ABSTRACTS
- Abstracts of some of the papers below
- /parallel/environments/charm/papers/SanjeevThesis.ps.gz
- Automating Runtime Optimizations for Parallel Object-Oriented
Programming
by Sanjeev Krishnan
Ph.D. Thesis, Department of Computer Science, University of Illinois
at Urbana-Champaign, June 1996.
ABSTRACT:
Software development for parallel computers has been recognized as
one of the bottlenecks preventing their widespread use. In this thesis
we examine two complementary approaches for addressing the challenges
of high performance and enhanced programmability in parallel programs:
automated optimizations and object-orientation. We have developed the
parallel object-oriented language Charm++ (an extension of C++), which
enables the benefits of object-orientation to be applied to the
problems of parallel programming. In order to improve parallel program
performance without extra effort, we explore the use of automated
optimizations. In particular, we have developed techniques for
automating run-time optimizations for parallel object-oriented
languages. These techniques have been embodied in the Paradise
post-mortem analysis tool which automates several run-time
optimizations without programmer intervention. Paradise builds a
program representation from traces, analyzes characteristics, chooses
and parameterizes optimizations, and generates hints to the Charm++
run-time libraries. The optimizations researched are for static and
dynamic object placement, scheduling, granularity control and
communication reduction. We also evaluate Charm++, Paradise and
several run-time optimization techniques using real applications,
including an N-body simulation program, a program from the NAS
benchmark suite, and several other programs.
- /parallel/environments/charm/papers/ParallelObjectArrays_POOMA96.ps.gz
- A Parallel Array Abstraction for Data-Driven Objects
by Sanjeev Krishnan <sanjeev@cs.uiuc.edu> and Laxmikant V. Kale
<kale@cs.uiuc.edu>.
ABSTRACT:
We describe design and implementation of an abstraction for parallel
arrays of data-driven objects. The arrays may be multi-dimensional,
and the number of elements in an array is independent of the number of
processors. The elements are mapped to processors by a
user-controllable mapping function. The mapping may be changed during
the parallel computation, which facilitates, for example, load
balancing and communication optimization. Asynchronous method
invocation is supported, with multicast, broadcast, and dimension-wide
broadcast. The abstraction is illustrated using examples in fluid
dynamics and molecular simulations.
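The "user-controllable mapping function" is simply a rule from element
index to processor. A small illustration in C follows; the function
name and the block-cyclic rule are our own invention, not the paper's
interface.

    #include <stdio.h>

    /* Hypothetical user-supplied mapping: element index -> processor.
     * Here a block-cyclic rule; the abstraction lets the user swap in
     * any such function, and change it during the computation. */
    static int map_element(int index, int block, int nproc)
    {
        return (index / block) % nproc;
    }

    int main(void)
    {
        int i;
        for (i = 0; i < 8; i++)   /* 8 elements, blocks of 2, 2 processors */
            printf("element %d -> processor %d\n", i, map_element(i, 2, 2));
        return 0;
    }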
- /parallel/environments/charm/papers/charmpp.ps.gz
- CHARM++ (Chapter 7)
by Laxmikant V. Kale and Sanjeev Krishnan.
ABSTRACT:
CHARM++ is a parallel object-oriented language based on C++. It was
developed over the past few years at the Parallel Programming
Laboratory, University of Illinois, to enable the application of
object orientation to the problems of parallel programming. Its
innovative features include message-driven execution for latency
tolerance and modularity, dynamic creation and load balancing of
concurrent objects, branched objects which are groups of objects with
one representative on every processor, and multiple specific
information-sharing abstractions. This chapter describes its design
philosophy, essential features, syntax, implementation, and
applications.
- /parallel/environments/charm/papers/Converse_IPPS96.ps.gz
- Converse: an Interoperable Framework for Parallel Programming
by Laxmikant V. Kale; Milind Bhandarkar; Narain Jagathesan; Sanjeev
Krishnan and Joshua M. Yelon.
In Proceedings of the International Parallel Processing Symposium,
Honolulu, Hawaii, April 1996.
ABSTRACT:
Many different parallel languages and paradigms have been developed,
each with its own advantages. To benefit from all of them, it should
be possible to link together modules written in different parallel
languages in a single application. Since the paradigms sometimes
differ in fundamental ways, this is difficult to accomplish. This
paper describes a framework, Converse, that supports such
multi-lingual interoperability. The framework is meant to be
inclusive, and has been verified to support the SPMD programming
style, message-driven programming, parallel object-oriented
programming, and thread-based paradigms. The framework aims at
extracting the essential aspects of the runtime support into a set of
core components, so that language-specific code does not have to pay
overhead for features that it does not need.
- /parallel/environments/charm/papers/RuntimeOpts_ICS96.ps.gz
- Automating Parallel Runtime Optimizations Using Post-Mortem
Analysis
by Sanjeev Krishnan and L. V. Kale.
To appear in Proceedings of the 10th ACM International Conference on
Supercomputing, Philadelphia, May 1996.
ABSTRACT:
Attaining good performance for parallel programs frequently requires
substantial expertise and effort, which can be reduced by automated
optimizations. In this paper we concentrate on run-time optimizations
and techniques to automate them without programmer intervention, using
post-mortem analysis of parallel program execution. We classify the
characteristics of parallel programs with respect to object placement
(mapping), scheduling and communication, then describe techniques to
discover these characteristics by post-mortem analysis, present
heuristics to choose appropriate optimizations based on these
characteristics, and describe techniques to generate concise hints to
runtime optimization libraries. Our ideas have been developed in the
framework of the Paradise post-mortem analysis tool for the
parallel object-oriented language Charm++. We also present results for
optimizing simple parallel programs running on the Thinking Machines
CM-5.
- /parallel/environments/charm/papers/ChareKernel_ICPP91.ps.gz
- Supporting Machine Independent Parallel Programming on Diverse
Architectures.
by Wayne Fenton; Balkrishan Ramkumar; Vikram Saletore; Amitabh B.
Sinha and Laxmikant V. Kale.
International Conference on Parallel Processing, August, 1991.
ABSTRACT:
The Chare kernel is a run time support system that permits users to
write machine independent parallel programs on MIMD multiprocessors
without losing efficiency. It supports an explicitly parallel language
which helps control the complexity of parallel program design by
imposing a separation of concerns between the user program and the
system. The programmer is responsible for the dynamic creation of
processes and exchanging messages between processes. The kernel
assumes responsibility for when and where to execute the processes,
dynamic load balancing, and other "low-level" features. The language
also provides machine-independent abstractions for information sharing
which are implemented differently on different types of machines.
The language has been implemented on both shared and nonshared memory
machines including Sequent Balance and Symmetry, Encore Multimax,
Alliant FX/8, Intel iPSC/2, iPSC/860 and NCUBE/2, and is being ported
to NUMA (Non Uniform Memory Access) machines like the BBN TC2000. It
is also being ported to a network of Sun workstations. We discuss the
salient features of the implementation of the kernel on the three
different types of architectures.
- /parallel/environments/charm/papers/Projections_ICPP94.ps.gz
- Projections: a preliminary performance tool for Charm.
by Laxmikant V. Kale and Amitabh B. Sinha.
Parallel Systems Fair, International Symposium on Parallel
Processing, Newport Beach, April 1993.
ABSTRACT:
The advent and acceptance of massively parallel machines has made it
increasingly important to have tools to analyze the performance of
programs running on these machines. Current-day performance tools
suffer from two drawbacks: they are not scalable and they lose
specific information about the user program in their attempt at
generality. In this paper, we present Projections, a scalable
performance tool for Charm that can provide program-specific
information to help the users better understand the behavior of their
programs.
- /parallel/environments/charm/papers/PrioritizedLoadBalancing_IPPS93.ps.gz
- A Load Balancing Strategy For Prioritized Execution of Tasks.
by Amitabh B. Sinha and Laxmikant V. Kale.
International Symposium on Parallel Processing, Newport Beach, April
1993.
ABSTRACT:
Load balancing is a critical factor in achieving optimal performance
in parallel applications where tasks are created in a dynamic fashion.
In many computations, such as state space search problems, tasks have
priorities, and solutions to the computation may be achieved more
efficiently if these priorities are adhered to in the parallel
execution of the tasks. For such tasks, a load balancing scheme that
only seeks to balance load, without balancing high priority tasks over
the entire system, might result in the concentration of high priority
tasks (even in a balanced-load environment) on a few processors,
thereby leading to low priority work being done. In such situations a
load balancing scheme is desired which would balance both load and
high priority tasks over the system. In this paper, we describe the
development of a more efficient prioritized load balancing strategy.
- /parallel/environments/charm/papers/ParallelSort_ICPP93.ps.gz
- A Comparison Based Parallel Sorting Algorithm.
by L.V. Kale and Sanjeev Krishnan.
International Conference on Parallel Processing, August 1993.
ABSTRACT:
We present a fast comparison based parallel sorting algorithm that
can handle arbitrary key types. Data movement is the major portion of
sorting time for most algorithms in the literature. Our algorithm is
parameterized so that it can be tuned to control data movement time,
especially for large data sets. Parallel histograms are used to
partition the key set exactly. The algorithm is architecture
independent, and has been implemented in the CHARM portable parallel
programming system, allowing it to be efficiently run on virtually any
MIMD computer. Performance results for sorting different data sets are
presented.
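The key idea, exact partitioning via histograms, can be sketched in a
few lines of C. This is only the serial skeleton of the idea (in the
parallel algorithm the bucket counts are combined across processors
and over-full buckets are refined); it is not the CHARM implementation.

    #include <stdio.h>

    #define NBUCKETS 256

    /* Count keys per equal-width bucket over the range [lo, hi]. */
    static void histogram(const int *key, int n, int lo, int hi, int *count)
    {
        int i, width = (hi - lo) / NBUCKETS + 1;
        for (i = 0; i < NBUCKETS; i++) count[i] = 0;
        for (i = 0; i < n; i++) count[(key[i] - lo) / width]++;
    }

    /* Pick nproc-1 splitters so each part holds about n/nproc keys. */
    static void splitters(const int *count, int n, int nproc,
                          int lo, int width, int *split)
    {
        int b, s = 0, sum = 0, target = n / nproc;
        for (b = 0; b < NBUCKETS && s < nproc - 1; b++) {
            sum += count[b];
            if (sum >= (s + 1) * target)
                split[s++] = lo + (b + 1) * width;  /* bucket's upper edge */
        }
    }

    int main(void)
    {
        int key[8] = {5, 250, 17, 99, 180, 42, 7, 201};
        int count[NBUCKETS], split[3];
        histogram(key, 8, 0, 255, count);
        splitters(count, 8, 4, 0, (255 - 0) / NBUCKETS + 1, split);
        printf("splitters: %d %d %d\n", split[0], split[1], split[2]);
        return 0;
    }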
- /parallel/environments/charm/papers/Charm++_OOPSLA93.ps.gz
- CHARM++ : A Portable Concurrent Object Oriented System Based On
C++.
by L.V. Kale and Sanjeev Krishnan.
Proceedings of the Conference on Object Oriented Programming Systems,
Languages and Applications, Sept-Oct 1993. ACM SIGPLAN Notices, Vol. 28,
No. 10, pp. 91-108. (Also: Technical Report UIUCDCS-R-93-1796, March
1993, University of Illinois, Urbana, IL.) [Internal Report #93-2,
March 93]
ABSTRACT:
We describe Charm++, an object oriented portable parallel programming
language based on C++. Its design philosophy, implementation, sample
applications and their performance on various parallel machines are
described. Charm++ is an explicitly parallel language consisting of
C++ with a few extensions. It provides a clear separation between
sequential and parallel objects. The execution model of Charm++ is
message driven, thus helping one write programs that are
latency-tolerant. The language supports multiple inheritance, dynamic
binding, overloading, strong typing, and reuse for parallel objects.
Charm++ provides specific modes for sharing information between
parallel objects. Extensive dynamic load balancing strategies are
provided. It is based on the Charm parallel programming system, and
its runtime system implementation reuses most of the runtime system
for Charm.
- /parallel/environments/charm/papers/Checkpoint.ps.gz
- Efficient, Language-Based Checkpointing for Massively Parallel
Programs
by Sanjeev Krishnan and Laxmikant V. Kale.
Submitted for publication. [Internal Report #94-2]
ABSTRACT:
Checkpointing and restart is an approach to ensuring forward progress
of a program in spite of system failures or planned interruptions. We
investigate issues in checkpointing and restart of programs running on
massively parallel computers. We identify a new set of issues that
have to be considered for the MPP platform, and based on these we have
designed an approach rooted in the language and run-time system. Hence
our checkpointing facility can be used on virtually any parallel
machine in a portable manner, irrespective of whether the operating
system supports checkpointing. We present methods to make
checkpointing and restart space- and time-efficient, including
object-specific functions that save the state of an object. We present
techniques to automatically generate checkpointing code for parallel
objects, without programmer intervention. We also present mechanisms
to allow the programmer to easily incorporate application specific
knowledge selectively to make the checkpointing more efficient. The
techniques developed here have been implemented in the Charm++
parallel object-oriented programming language and run-time system.
Performance results are presented for the checkpointing overhead of
programs running on parallel machines.
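The "object-specific functions that save the state of an object"
amount to pack/unpack routines over exactly the live state. A plain-C
sketch of the idea follows (Charm++ generates analogous code
automatically; this is only the concept, not its implementation, and
the type and function names are ours).

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {          /* a parallel object's local state */
        int step;
        int n;
        double *data;         /* heap state must be saved explicitly */
    } Particles;

    /* Save exactly the live state: scalars first, then the sized block. */
    static int checkpoint(const Particles *p, FILE *f)
    {
        if (fwrite(&p->step, sizeof p->step, 1, f) != 1) return -1;
        if (fwrite(&p->n, sizeof p->n, 1, f) != 1) return -1;
        return fwrite(p->data, sizeof *p->data, p->n, f) == (size_t)p->n
               ? 0 : -1;
    }

    /* Restart mirrors it: read scalars, reallocate, read the block. */
    static int restart(Particles *p, FILE *f)
    {
        if (fread(&p->step, sizeof p->step, 1, f) != 1) return -1;
        if (fread(&p->n, sizeof p->n, 1, f) != 1) return -1;
        p->data = malloc(p->n * sizeof *p->data);
        if (!p->data) return -1;
        return fread(p->data, sizeof *p->data, p->n, f) == (size_t)p->n
               ? 0 : -1;
    }

    int main(void)
    {
        double d[3] = {1.0, 2.0, 3.0};
        Particles a = {42, 3, d}, b;
        FILE *f = tmpfile();
        if (!f) return 1;
        checkpoint(&a, f);
        rewind(f);
        restart(&b, f);
        printf("restored step %d, n %d\n", b.step, b.n);
        free(b.data);
        fclose(f);
        return 0;
    }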
- /parallel/environments/charm/papers/FMA_ICPP95.ps.gz
- A Parallel Adaptive Fast Multipole algorithm for N-body problems
by Sanjeev Krishnan and Laxmikant V. Kale.
In Proceedings of the International Conference on Parallel
Processsing, August 1995.
ABSTRACT:
We describe the design and implementation of a parallel adaptive fast
multipole algorithm (AFMA) for N-body problems. Our AFMA algorithm can
organize particles in cells of arbitrary shape. This simplifies its
parallelization, so that good locality and load balance are both
easily achieved. We describe a tighter well-separatedness criterion,
and improved techniques for constructing the AFMA tree. We describe
how to avoid redundant computation of pair-wise interactions while
maintaining load balance, using a fast edge-partitioning algorithm.
The AFMA algorithm is designed in an object oriented, message-driven
manner, allowing latency tolerance by overlapping computation and
communication easily. It also incorporates several optimizations for
message prioritization and communication reduction. Preliminary
performance results of our implementation using the Charm++ parallel
programming system are presented.
- /parallel/environments/charm/papers/agents.ps.gz
- Agents: an Undistorted Representation of Problem Structure.
To appear in proceedings of the conference on Languages and Compilers
for Parallel Computing, 1995.
ABSTRACT:
Agents is an actors-based language, and like every other such
language, it has a computation graph formed by the actors and the
lines of communication between them. Agents is unusual, though, in
that the structure of the computation graph is explicitly declared.
However, unlike most languages in which the structure of the
computation is declared (e.g. visual dataflow languages), Agents does
not require the computation graph to be gridlike, finite, or flat:
rather, it can be a tree, a table, a graph, or indeed any arbitrarily
complex, irregular, or even infinite pattern. By providing explicit
constructs for declaring the structure of the computation, two
advantages are gained: one, it becomes much easier for the programmer
to express the computation, and two, the compiler and runtime are able
to perform optimizations that would not be possible in a normal
actors-based language where the structure of computation is hidden or
unknown.
- /parallel/environments/charm/papers/pvmug95.ps.gz
- Interoperability and multithreading for PVM using the CONVERSE
Interoperable Framework
by L. V. Kale; Abner Zangvil and Narain K. Jagathesan.
Presentation for PVMUG'95
- /parallel/environments/charm/papers/hpccPosition.ps.gz
- Application Oriented and Computer Science Centered HPCC Research
by Laxmikant V. Kale <kale@cs.uiuc.edu>
May 12 1994
ABSTRACT:
At this time, there is a perception of a backlash against the HPCC
program, and even the idea of massively parallel computing itself. In
preparation for defining an agenda for HPCC, this paper first analyzes
the reasons for this backlash. Although beset with unrealistic
expectations, parallel processing will be a beneficial technology with
a broad impact, beyond applications in science. However, this will
require significant advances and work in computer science in addition
to parallel hardware and end-applications which are emphasized
currently. The paper presents a possible agenda that could lead to a
successful HPCC program in the future.
- /parallel/environments/charm/papers/Performance_ICPP94.ps.gz
- A framework for intelligent performance feedback
by Amitabh B. Sinha and Laxmikant V. Kale.
ABSTRACT:
The significant gap between peak and realized performance of parallel
machines motivates the need for performance analysis. Contemporary
tools provide only generic measurement, rather than program-specific
information and analysis. An object-oriented and message-driven
language, such as Charm, presents opportunities for both
program-specific feedback and automatic performance analysis. We
present a framework in which specific and intelligent feedback can be
given to the user about their parallel program. The framework will use
information about the parallel program generated at compile-time and
at run-time to analyze its performance using general expertise and
specific algorithms in performance analysis.
- /parallel/environments/charm/papers/InformationSharing_SC93.ps.gz
- Information Sharing Mechanisms in Parallel Programs
by L.V. Kale and Amitabh Sinha.
International Parallel Processing Symposium, Cancun, Mexico, April
26-29, 1994. [Internal Report #93-4, March 1993]
ABSTRACT:
Most parallel programming models provide a single generic mode in
which processes can exchange information with each other. However,
empirical observation of parallel programs suggests that processes
share data in a few distinct and specific modes. We argue that such
modes should be identified and explicitly supported in parallel
languages and their associated models. The paper describes a set of
information sharing abstractions that have been identified and
incorporated in the parallel programming language Charm. It can be
seen that using these abstractions leads to improved clarity and
expressiveness of user programs. In addition, the specificity provided
by these abstractions can be exploited at compile-time and at run-time
to provide the user with highly refined performance feedback and
intelligent debugging tools.
- /parallel/environments/charm/papers/Symbolic_LNCS93.ps.gz
- Prioritization in Parallel Symbolic Computing
by L.V. Kale; B. Ramkumar; V. Saletore and A.B. Sinha.
Lecture Notes in Computer Science, Vol. 748, pp. 12-41, 1993.
[Internal Report #93-6]
- /parallel/environments/charm/papers/CharmOverview.ps.gz
- Parallel Programming with CHARM: An Overview
by L.V. Kale
July 1993. [Internal Report #93-8]
- /parallel/environments/charm/papers/Quiescence.ps.gz
- A Dynamic and Adaptive Quiescence Detection Algorithm
by A. Sinha; L.V. Kale and B. Ramkumar.
September 1993. [Internal Report #93-11]
- /parallel/environments/charm/papers/Projections_IPPS93.ps.gz
- Projections: A Preliminary Performance Tool for Charm
by L.V. Kale and A.B. Sinha.
Parallel Systems Fair, International Parallel Processing Symposium,
Newport Beach, CA., pp. 108-114, April 1993. [Internal Report #92-3]
- /parallel/environments/charm/papers/TPDS_PartI.ps.gz
- The Charm Parallel Programming Language and System: Part I -
Description of Language Features
by L. V. Kale; B. Ramkumar; A. B. Sinha and A. Gursoy.
ABSTRACT:
We describe a parallel programming system for developing machine
independent programs for all MIMD machines. Many useful approaches to
this problem are seen to require a common base of support, which can
be encapsulated in a language that abstracts over resource management
decisions and machine specific details. This language can be used for
implementing other high level approaches as well as for efficient
application programming. The requirements for such a language are
defined, and the language supported by the Charm system is described,
and illustrated with examples. Charm is one of the first languages to
support message driven execution, and embodies unique abstractions
such as branch office chares and specifically shared variables. In
Part II of this paper, we talk about the runtime support system for
Charm. The system thus provides ease of programming on MIMD platforms
without sacrificing performance.
- /parallel/environments/charm/papers/TPDS_PartII.ps.gz
- The Charm Parallel Programming Language and System: Part II - The
Runtime System
by B. Ramkumar; A. B. Sinha; V. A. Saletore and L. V. Kale.
ABSTRACT:
Charm is a parallel programming system that permits users to write
portable parallel programs on MIMD multiprocessors without losing
efficiency. It supports an explicitly parallel language which helps
control the complexity of parallel program design by imposing a
separation of concerns between the user program and the system. It
also provides target machine independent abstractions for information
sharing which are implemented differently on different types of
processors. In part I of this paper [16], we described the language
support provided by Charm and the rationale behind its design. Charm
has been implemented on a variety of parallel machines including
shared memory machines like the Encore Multimax and the Sequent
Symmetry, message passing architectures like the Intel iPSC/2, Intel
i860 and the NCUBE 2, and a network of Unix workstations. The Chare
kernel is the run-time system that supports the portable execution of
Charm on several MIMD architectures. We discuss the implementation and
performance of the Chare kernel on three architectures: shared memory,
message passing, and a network of workstations. Index terms:
Message-driven execution, MIMD machines, Parallel programming,
Portable parallel software, Task granularity.
- /parallel/environments/charm/papers/DP_TR_92_10.ps.gz
- Dynamic Adaptive Scheduling in an Implementation of a Data Parallel
Language
by Edward A. Kornkven and Laxmikant V. Kale <kale@cs.uiuc.edu>.
ABSTRACT:
In the execution of a parallel program, it is desirable for all
processors dedicated to the program to be kept fully utilized.
However, a program that employs a lot of message-passing might spend a
considerable amount of time waiting for messages to arrive. In order
to mitigate this efficiency loss, instead of blocking execution for
every message, we would rather overlap that communication time with
other computation. This paper presents an approach to accomplishing
this overlap in a systematic manner when compiling a data parallel
language targeted for MIMD computers.
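The overlap being automated here looks, written by hand, like the
familiar split-phase pattern: post the receive, compute on data that
does not depend on it, and block only at the point of use. A generic C
rendering follows; MPI is used purely for concreteness and is not the
paper's runtime.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nproc, i;
        double interior[1000], halo, sum = 0.0;
        MPI_Request req;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);
        for (i = 0; i < 1000; i++) interior[i] = i;

        /* Post the receive early ... */
        MPI_Irecv(&halo, 1, MPI_DOUBLE, (rank + 1) % nproc, 0,
                  MPI_COMM_WORLD, &req);
        MPI_Send(&interior[0], 1, MPI_DOUBLE, (rank + nproc - 1) % nproc,
                 0, MPI_COMM_WORLD);

        /* ... overlap: compute on local data while the message travels ... */
        for (i = 0; i < 1000; i++) sum += interior[i];

        /* ... and block only where the remote value is finally needed. */
        MPI_Wait(&req, &st);
        sum += halo;

        printf("rank %d: sum = %f\n", rank, sum);
        MPI_Finalize();
        return 0;
    }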
- /parallel/libraries/memory/global-array/iway.ps.Z
- Shared Memory NUMA Programming on I-WAY
by J. Nieplocha and R. J. Harrison. Pacific Northwest National
Laboratory, P.O. Box 999, Richland WA 99352, USA.
ABSTRACT:
The performance of the Global Array shared-memory non-uniform
memory-access programming model is explored on the I-WAY,
wide-area-network distributed supercomputer environment. The Global
Array model is extended by introducing a concept of mirrored arrays.
Latencies and bandwidths for remote memory access are studied, and the
performance of a large application from computational chemistry is
evaluated using both fully distributed and also mirrored arrays.
Excellent performance can be obtained.
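For flavour, here is what the Global Arrays model looks like from C.
The calls below follow the library's later C bindings (the 1996
interface differed), so treat this as a sketch of the model rather
than of this release.

    #include <mpi.h>
    #include "ga.h"
    #include "macdecls.h"

    int main(int argc, char **argv)
    {
        int dims[2] = {100, 100}, lo[2] = {0, 0}, hi[2] = {9, 9}, ld = 10;
        double patch[100];
        int i, g_a;

        MPI_Init(&argc, &argv);
        GA_Initialize();
        MA_init(C_DBL, 1000000, 1000000);      /* memory allocator GA uses */

        /* One logically shared 100x100 array, physically distributed. */
        g_a = NGA_Create(C_DBL, 2, dims, "A", NULL);

        for (i = 0; i < 100; i++) patch[i] = (double)i;
        if (GA_Nodeid() == 0)                  /* one-sided: no matching recv */
            NGA_Put(g_a, lo, hi, patch, &ld);
        GA_Sync();
        NGA_Get(g_a, lo, hi, patch, &ld);      /* any process reads remote data */

        GA_Destroy(g_a);
        GA_Terminate();
        MPI_Finalize();
        return 0;
    }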
- /parallel/libraries/memory/global-array/siam.ps.Z
- The Global Array Programming Model for High Performance Scientific
Computing
by J. Nieplocha; R.J. Harrison and R.J. Littlefield. Pacific
Northwest Laboratory.
SIAM News, August/September 1995
- /parallel/libraries/memory/global-array/frontiers.ps.Z
- Disk Resident Arrays: An Array-Oriented I/O Library for Out-of-Core
Computations
by Jarek Nieplocha <j_nieplocha@pnl.gov> and Ian Foster; Pacific
Northwest National Laboratory, Richland, WA 99352, USA
To appear in Proc. Frontiers'96 of Massively Parallel Computing Symp.
ABSTRACT:
In out-of-core computations, disk storage is treated as another level
in the memory hierarchy, below cache, local memory, and (in a parallel
computer) remote memories. However, the tools used to manage this
storage are typically quite different from those used to manage the
other levels of the hierarchy. This disparity complicates
implementation of out-of-core algorithms and hinders portability. We
describe a programming model that addresses this problem. This model
allows parallel programs to use essentially the same mechanisms to
manage the movement of data between all levels of the hierarchy. We
take as our starting point the Global Arrays shared-memory model and
library, which support a variety
of operations on distributed arrays, including transfer between local
and remote memories. We show how this model can be extended to support
explicit transfer between global memory and secondary storage, and we
define a Disk Resident Arrays library that supports such transfers.
We illustrate the utility of the resulting model with two
applications, an out-of-core matrix multiplication and a large
computational chemistry program. We also describe implementation
techniques on several parallel computers and present experimental
results showing that the model can be implemented very efficiently on
parallel computers.
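Stripped of the library, the out-of-core pattern is explicit staging
of array blocks between disk and memory. A generic plain-C
illustration follows; it is deliberately not the Disk Resident Arrays
API.

    #include <stdio.h>
    #include <stdlib.h>

    #define N     4096        /* disk-resident matrix is N x N doubles */
    #define BLOCK 512         /* in-core tile is BLOCK x BLOCK */

    int main(void)
    {
        double *buf = malloc(BLOCK * BLOCK * sizeof *buf);
        FILE *f = fopen("matrix.dat", "rb");
        long row0, i;
        double sum = 0.0;

        if (!f || !buf) return 1;
        for (row0 = 0; row0 < N; row0 += BLOCK) {
            /* stage rows [row0, row0+BLOCK) of the first BLOCK columns */
            for (i = 0; i < BLOCK; i++) {
                fseek(f, (row0 + i) * N * (long)sizeof(double), SEEK_SET);
                fread(buf + i * BLOCK, sizeof(double), BLOCK, f);
            }
            for (i = 0; i < BLOCK * BLOCK; i++)   /* compute on the tile */
                sum += buf[i];
        }
        printf("sum = %f\n", sum);
        fclose(f);
        free(buf);
        return 0;
    }

What the DRA library adds over this is the array-oriented view: whole
sections of distributed (global) arrays moved to and from
disk-resident arrays in single operations, instead of per-process
seeks and reads.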
27th August 1996
- /parallel/languages/c/parallel-c++/classes/toops/
- Updated TOOPS to V1.2.1. TOOPS (Tool for Object Oriented
Protocol Simulation) is a C++ class library for process-oriented
simulation, primarily of communication protocols.
TOOPS contains classes for processors, processes, channels, sockets
and messages.
See also ftp://ftp.ldv.e-technik.tu-muenchen.de/dist/INDEX.html
- /parallel/languages/c/parallel-c++/classes/toops/toops1.21.tar.gz
- /parallel/languages/c/parallel-c++/classes/toops/toops121.zip
- TOOPS Version 1.21
TOOPS currently runs under HP-UX 9.0x and 10.10 (HP C++ 3.40 and gcc
2.5.8 or 2.7.2), IRIX 5.3 (SGI CC 4.0), Linux (gcc), and DOS and
Windows 3.1 (Borland C++ 3.1 and MS Visual C++ 1.51). There are still
problems under Borland C++ 4.x and SunOS 4.1.3.
- /parallel/environments/pvm3/tkpvm/
- Updated TkPVM for Tcl 7.5p1 and Tk 4.1p1 (dash and plus patches).
Added plug-in for Solaris 2.5.
- /parallel/standards/mpi/anl/
- MPICH 1.0.13 release; updated software, documents and user
guide (below).
- /parallel/standards/mpi/anl/mpich-1.0.13.tar.gz
- MPI Chameleon implementation version 1.0.13
- /parallel/standards/mpi/anl/mpicharticle.ps.gz
- A High-Performance, Portable Implementation of the MPI Message
Passing Interface Standard
by William Gropp, Mathematics and Computer Science Division, Argonne
National Laboratory, USA; Ewing Lusk, Mathematics and Computer Science
Division, Argonne National Laboratory, USA; Nathan Doss, Department of
Computer Science & NSF Engineering Research Center for CFS,
Mississippi State University, USA and Anthony Skjellum, Department of
Computer Science & NSF Engineering Research Center for CFS,
Mississippi State University, USA.
ABSTRACT:
MPI (Message Passing Interface) is a specification for a standard
library for message passing that was defined by the MPI Forum, a
broadly based group of parallel computer vendors, library writers, and
applications specialists. Multiple implementations of MPI have been
developed. In this paper, we describe MPICH, unique among existing
implementations in its design goal of combining portability with high
performance. We document its portability and performance and describe
the architecture by which these features are simultaneously achieved.
We also discuss the set of tools that accompany the free distribution
of MPICH, which constitute the beginnings of a portable parallel
programming environment. A project of this scope inevitably imparts
lessons about parallel computing, the specification being followed,
the current hardware and software environment for parallel computing,
and project management; we describe those we have learned. Finally, we
discuss future developments for MPICH, including those necessary to
accommodate extensions to the MPI Standard now being contemplated by
the MPI Forum.
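Performance claims like these are conventionally checked with small
message-passing microbenchmarks. Below is a minimal MPI ping-pong
timer in C, runnable under MPICH with two or more processes; it is our
sketch, not part of the MPICH distribution.

    #include <mpi.h>
    #include <stdio.h>

    #define REPS 1000

    int main(int argc, char **argv)
    {
        int rank, i;
        char buf[1024];
        double t0, t1;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {              /* bounce a 1 KB message */
                MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("mean round trip: %g us\n", (t1 - t0) / REPS * 1e6);
        MPI_Finalize();
        return 0;
    }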
- /parallel/environments/lam/distribution/lam60-patch.tar.gz
- Updated Patches 01-17 for LAM 6.0
- /parallel/environments/pvm3/emory-vss/scipvm.ps.Z
- SCI-PVM: Parallel Distributed Computing on SCI Workstation Clusters
by Ivan Zoraja <zoraja@split.fesb.hr>, Department of Electronics and
Computer Science, University of Split, 21000 Split, Croatia; Hermann
Hellwagner <hellwagn@informatik.tu-muenchen.de>, Institut für
Informatik, Technische Universität München, D-80290 München, Germany
and Vaidy Sunderam <vss@mathcs.emory.edu>, Department of Math and
Computer Science, Emory University, Atlanta, GA 30322, USA.
ABSTRACT:
Workstation and PC clusters interconnected by SCI (Scalable Coherent
Interface) are very promising technologies for high performance
cluster computing. Using commercial SBus to SCI interface cards and
early system software and drivers, a two-workstation cluster has been
constructed for initial testing and evaluation. The PVM system has
been adapted to operate on this cluster using raw device access to the
SCI interconnect, and preliminary communications performance tests
have been carried out. Our preliminary results indicate that
communications throughput in the range of 3.5 MBytes/s, and latencies
of 620 µs can be achieved on SCI clusters. These figures are
significantly better (by a factor of 3 to 4) than those attainable on
typical Ethernet LANs. Moreover, our experiments were conducted with
first generation SCI hardware, beta device drivers, and relatively
slow workstations. We expect that in the very near future, SCI
networks will be capable of delivering several tens of MBytes/s
bandwidth and a few tens of microseconds latencies, and will
significantly enhance the viability of cluster computing.
- /parallel/environments/chimp/vispad/report-95.ps.Z
- EPCC-SS95-12 Application Engineering Tools for MPI and PUL
by Patricio R. Domingues
ABSTRACT:
VISPAD is a post-mortem visualisation tool based on the concept of
program execution phases. It consists of several graphic displays,
each of which presents a different aspect of the parallel program
under consideration. Execution related information is collected at
run-time in trace files by using calls to an instrumentation library.
The processing of the trace files by VISPAD results in a graphical
playback of all recorded run-time events. This report describes the
enhancements and changes performed in VISPAD during this year's
project.
- /parallel/environments/chimp/vispad/report-94.ps.Z
- Application Engineering Tools for MPI and PUL
by Kesavan Shanmugam and Konstantinos Tourlas.
EPCC-SS94-01, September 1994
ABSTRACT:
This report describes the adaptation of VISPAD, a visualisation tool
for performance analysis and debugging, from the CHIMP message passing
system to the recently established MPI standard. VISPAD is a
post-mortem visualisation tool based on the concept of program
execution phases. It consists of a number of displays, each of which
presents a different aspect of the parallel program under
consideration. Execution related information is collected at run-time
in trace files by using calls to an instrumentation library. The
processing of the trace files by VISPAD results in a graphical
playback of all the recorded run-time events. The process of adapting
VISPAD to MPI included a restructuring of the instrumentation library,
the implementation of an instrumented version of the MPI interface,
changing the format of the trace files, the adaptation of existing
displays, and the introduction of two new displays. The latter serve
the purpose of visualising the rich set of communication operations
supported by MPI.
- /parallel/environments/chimp/vispad/report-93.ps.Z
- Application Engineering Tools for CHIMP and PUL
by N. Tomov and K-J. Wierenga.
September 1993
ABSTRACT:
This project is concerned with the implementation of a visualisation
tool for performance analysis and debugging - VISPAD. The tool's
interface is based on Anna Hondroudakis' thesis work [1] on
visualisation tools for parallel applications. VISPAD processes
information produced by a run of a parallel application. Information
about the application run is recorded in trace files by instrumented
versions of the CHIMP and PUL libraries and by instrumentation library
calls added to the application. VISPAD can then be used to provide
postmortem visualisation by replaying the application run from the
information in the trace files. Visualisation is provided by a number
of graphical displays, which show different aspects of the performance
of the parallel application. In this way, it is hoped, the programmer
will be assisted in the debugging and optimisation of her/his program.
In the nine weeks of the project, three of VISPAD's displays were
implemented. The Navigation Display provides a rich system of temporal
abstractions (phases) to present a concise view of the application run
to the user, allowing her/him to easily locate particular areas of
interest. The Membership Matrix Display shows how the different
processes in the parallel application join various SAP groups and the
way group memberships change over time. The CHIMP Level Animation
Display reconstructs CHIMP communications between processes.
- /parallel/environments/paragraph/distribution/
- Added more details to ParaGraph area.
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/changes10
- Changes in gcc-2.7.2-t800.10
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/gcc-2.7.2-t800.10.dif.gz
- gcc-2.7.2 for t800 (source diff) V10
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/patch10.gz
- Patch from V9 to V10
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/changes9
- Changes in gcc-2.7.2-t800.9
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/gcc-2.7.2-t800.9.dif.gz
- gcc-2.7.2 for t800 (source diff) V9
- /parallel/transputer/software/compilers/gcc/pereslavl/gcc-2.7.2/patch9.gz
- Patch from V8 to V9
13th August 1996
- /parallel/environments/mpi/nec-mpi-tests
- NEC simple MPI test programs
by Hiroyuki ARAKI <araki@csl.cl.nec.co.jp>
A collection of simple MPI test programs that NEC have been using to
test their MPI implementation.
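A typical program in such a suite exercises one small corner of the
standard and checks the answer. For example (our guess at the style,
not one of the NEC programs), a token ring that verifies point-to-point
delivery:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nproc, token;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        if (nproc < 2) { MPI_Finalize(); return 0; }

        if (rank == 0) {                   /* start the token, await return */
            token = 1;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, nproc - 1, 0, MPI_COMM_WORLD, &st);
            printf("%s: token = %d (expected %d)\n",
                   token == nproc ? "PASS" : "FAIL", token, nproc);
        } else {                           /* increment and pass it on */
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &st);
            token++;
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % nproc, 0,
                     MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }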
- /parallel/libraries/numerical/linear-algebra/scalapack
- ScaLAPACK Version 1.2
by Jack Dongarra <dongarra@dasher.cs.utk.edu>
The ScaLAPACK project is made up of 4 components: dense matrix
software (ScaLAPACK), large sparse eigenvalue software (PARPACK),
sparse direct systems software (CAPSS) and preconditioners for large
sparse iterative solvers (PARPRE).
This version includes routines for the solution of linear systems of
equations, symmetric positive definite banded linear systems of
equations, condition estimation and iterative refinement, for LU and
Cholesky factorization, matrix inversion, full-rank linear least
squares problems, orthogonal and generalized orthogonal
factorizations, orthogonal transformation routines, reductions to
upper Hessenberg, bidiagonal and tridiagonal form, reduction of a
symmetric-definite generalized eigenproblem to standard form, the
symmetric, generalized symmetric and the nonsymmetric eigenproblem.
Get ScaLAPACK from http://www.netlib.org/scalapack/index.html
- /parallel/environments/lam/distribution/xled11.tar.gz
- /parallel/environments/lam/distribution/xled11.readme
- XLED 1.1
by Nick Nevin <nevin@alex.osc.edu>
XLED is an X/Motif based LED server that emulates good old hardware
LEDs. It is implemented on top of the LAM cluster computing
environment. It provides a low-cost alternative for sexy blinking-LED
demos (popular with supervisors and managers!) and quick-n-dirty
debugging.