Internet Parallel Computing Archive

IPCA : Languages : Impala : Impala-v0.00 : Papers

.nams-info/
.released
Last modified 11:00:00, 16 May 1996 - binary
How I Spent Summer of 1993 at MCRC
Last modified 18:01:00, 18 May 1996 - 44.4K - gzipped PostScript
What Boon did while interning at Motorola Cambridge Research Center, summer 1993. October 8, 1993.
Author: Boon S. Ang
Analysis of Non-Strict Functional Implementations of the Dongarra-Sorensen Eigensolver
Last modified 18:01:00, 18 May 1996 - 59.5K - gzipped PostScript
Abstract: We study the producer-consumer parallelism of Eigensolvers composed of a tridiagonalization function, a tridiagonal solver, and a matrix multiplication, written in the non-strict functional programming language Id. We verify the claim that non-strict functional languages allow the natural exploitation of this type of parallelism, in the framework of realistic numerical codes. We compare the standard top-down Dongarra-Sorensen solver with a new, bottom-up version. We show that this bottom-up implementation is much more space efficient than the top-down version. Also, we compare both versions of the Dongarra-Sorensen solver with the more traditional QL algorithm, and verify that the Dongarra-Sorensen solver is much more efficient, even when run in a serial mode. We show that in a non-strict functional execution model, the Dongarra-Sorensen algorithm can run completely in parallel with the Householder function. Moreover, this can be achieved without any change in the code components. We also indicate how the critical path of the complete Eigensolver can be improved.
Authors: S. Sur and A. P. W. Bohm (bohm@CS.ColoState.Edu); Tel: +1 (303) 491-7595; FAX: +1 (303) 491-6639. Department of Computer Science, Colorado State University, Ft. Collins, CO 80523, USA.
A Functional Implementation of the Jacobi Eigen-Solver
Last modified 18:01:00, 18 May 1996 - 48.8K - gzipped PostScript
June 15, 1994.
Abstract: In this paper, we describe the systematic development of two implementations of the Jacobi eigen-solver and give their performance results for the MIT/Motorola Monsoon dataflow machine. Our study is carried out using MINT, the MIT Monsoon simulator. The design of these implementations follows from the mathematics of the Jacobi method, and not from a translation of an existing sequential code. The functional semantics with respect to array updates, which cause excessive array copying, has lead us to a new implementation of a parallel "group-rotations" algorithm first described by Sameh. Our version of this algorithm requires O(n^3) operations, whereas Sameh's original version requires O(n^4) operations. The implementations are programmed in the language Id, and although Id has non-functional features, we have restricted the development of our eigen-solvers to the functional sub-set of the language.
Authors: A. P. W. Bohm, Department of Computer Science, Colorado State University, Ft. Collins, CO 80523, USA and R.E. Hiromoto, Computer Research Group, Los Alamos National Laboratory.
On the Effectiveness of Functional Language Features: NAS benchmark FT
Last modified 18:01:00, 18 May 1996 - 99.1K - gzipped PostScript
In Journal of Functional Programming 1 (1): 1{000, January 1993, Cambridge University Press.
Abstract: In this paper we investigate the effectiveness of functional language features when writing scientific codes. Our programs are written in the purely functional subset of Id and executed on a one node Motorola Monsoon machine, and in Haskell and executed on a Sparc 2. In the application we study { the NAS FT benchmark, a three-dimensional heat equation solver { it is necessary to target and select one-dimensional sub-arrays in threedimensional arrays. Furthermore, it is important to be able to share computation in array definitions. We compare first order and higher order implementations of this benchmark. The higher order version uses functions to select one-dimensional sub-arrays, or slices, from a threedimensional object, whereas the first order version creates copies to achieve the same result. We compare various representations of a three-dimensional object, and study the effect of strictness in Haskell. We also study the performance of our codes when employing recursive and iterative implementations of the one-dimensional FFT, which forms the kernel of this benchmark. It turns out that these languages still have quite inefficient implementations, with respect to both space and time. For the largest problem we could run (32 3 ), Haskell is fifteen times slower than Fortran and uses three times more space than is absolutely necessary, whereas Id on Monsoon uses nine times more cycles than Fortran on the MIPS R3000, and uses five times more space than is absolutely necessary. This code, and others like it, should inspire compiler writers to improve the performance of functional language implementations.
Authors: J. Hammes; S. Sur and W. Bohm. Department of Computer Science, Colorado State University, Ft. Collins, CO 80523, USA.
Functional, I-Structure, and M-Structure Implementations of NAS Benchmark FT
Last modified 18:01:00, 18 May 1996 - 48.6K - gzipped PostScript
Abstract: We implement the NAS parallel benchmark FT, which numerically solves a three dimensional partial differential equation using forward and inverse FFTs, in the dataflow language Id and run it on a one node monsoon machine. Id is a layered language with a purely functional kernel, a deterministic layer with I-structures, and a non-deterministic layer with M-structures. We compare the performance of versions of our code written in these three layers of Id. We measure instruction counts and critical path length using the Monsoon Interpreter Mint. We measure the space requirements of our codes by determining the largest possible problem size fitting on a one node monsoon machine. The purely functional code provides the highest average parallelism, but this parallelism turns out to be superfluous. The I-structure code executes the minimal number of instructions and as it has a similar critical path length as the functional code, runs the fastest. The M-structure code allows the largest problem sizes to be run at the cost of about 20% increase in instruction count, and 75% to 100% increase in critical path length, compared to the I-structure code.
Authors: S. Sur and W. Bohm. Department of Computer Science, Colorado State University, Ft. Collins, CO 80523, USA.
NAS parallel benchmark integer sort (IS) performance on MINT
Last modified 18:01:00, 18 May 1996 - 67.6K - gzipped PostScript
April 7, 1993.
Abstract: We implemented several sorting routines in Id and compared their relative performances in terms of number of instructions (S1), length of the critical path (S1) and average parallelism. The sorting routines considered here are of the types (1) Exchange sort (2) Insertion sort (3) Merge sort and (4) Sorting Networks. We implemented them using I-structures (e.g. merge sort) or M-structures (e.g bubble sort), whichever was proved to be more efficient. We then optimized the routines with respect to efficiency, minimized the number of barriers, eliminated redundant copying etc. to the best of our abilities and then compared their performances. We have compared our results with expected theoretical performance and obtained satisfactory results.
Authors: S. Sur and W. Bohm. Department of Computer Science, Colorado State University, Ft. Collins, CO 80523, USA.