[ News | IPCA | Mirrors | Add | Search | Mail | Help | WoTUG ]
Abstract: We study the producer-consumer parallelism of Eigensolvers composed of a tridiagonalization function, a tridiagonal solver, and a matrix multiplication, written in the non-strict functional programming language Id. We verify the claim that non-strict functional languages allow the natural exploitation of this type of parallelism, in the framework of realistic numerical codes. We compare the standard top-down Dongarra-Sorensen solver with a new, bottom-up version. We show that this bottom-up implementation is much more space efficient than the top-down version. Also, we compare both versions of the Dongarra-Sorensen solver with the more traditional QL algorithm, and verify that the Dongarra-Sorensen solver is much more efficient, even when run in a serial mode. We show that in a non-strict functional execution model, the Dongarra-Sorensen algorithm can run completely in parallel with the Householder function. Moreover, this can be achieved without any change in the code components. We also indicate how the critical path of the complete Eigensolver can be improved.
Authors: S. Sur and A. P. W. Bohm (bohm@CS.ColoState.Edu); Tel: +1 (303) 491-7595; FAX: +1 (303) 491-6639. Department of Computer Science, Colorado State University, Ft. Collins, CO 80523, USA.
Abstract: In this paper, we describe the systematic development of two implementations of the Jacobi eigen-solver and give their performance results for the MIT/Motorola Monsoon dataflow machine. Our study is carried out using MINT, the MIT Monsoon simulator. The design of these implementations follows from the mathematics of the Jacobi method, and not from a translation of an existing sequential code. The functional semantics with respect to array updates, which cause excessive array copying, has lead us to a new implementation of a parallel "group-rotations" algorithm first described by Sameh. Our version of this algorithm requires O(n^3) operations, whereas Sameh's original version requires O(n^4) operations. The implementations are programmed in the language Id, and although Id has non-functional features, we have restricted the development of our eigen-solvers to the functional sub-set of the language.
Authors: A. P. W. Bohm, Department of Computer Science, Colorado State University, Ft. Collins, CO 80523, USA and R.E. Hiromoto, Computer Research Group, Los Alamos National Laboratory.
Abstract: In this paper we investigate the effectiveness of functional language features when writing scientific codes. Our programs are written in the purely functional subset of Id and executed on a one node Motorola Monsoon machine, and in Haskell and executed on a Sparc 2. In the application we study { the NAS FT benchmark, a three-dimensional heat equation solver { it is necessary to target and select one-dimensional sub-arrays in threedimensional arrays. Furthermore, it is important to be able to share computation in array definitions. We compare first order and higher order implementations of this benchmark. The higher order version uses functions to select one-dimensional sub-arrays, or slices, from a threedimensional object, whereas the first order version creates copies to achieve the same result. We compare various representations of a three-dimensional object, and study the effect of strictness in Haskell. We also study the performance of our codes when employing recursive and iterative implementations of the one-dimensional FFT, which forms the kernel of this benchmark. It turns out that these languages still have quite inefficient implementations, with respect to both space and time. For the largest problem we could run (32 3 ), Haskell is fifteen times slower than Fortran and uses three times more space than is absolutely necessary, whereas Id on Monsoon uses nine times more cycles than Fortran on the MIPS R3000, and uses five times more space than is absolutely necessary. This code, and others like it, should inspire compiler writers to improve the performance of functional language implementations.
Authors: J. Hammes; S. Sur and W. Bohm. Department of Computer Science, Colorado State University, Ft. Collins, CO 80523, USA.
Abstract: We implement the NAS parallel benchmark FT, which numerically solves a three dimensional partial differential equation using forward and inverse FFTs, in the dataflow language Id and run it on a one node monsoon machine. Id is a layered language with a purely functional kernel, a deterministic layer with I-structures, and a non-deterministic layer with M-structures. We compare the performance of versions of our code written in these three layers of Id. We measure instruction counts and critical path length using the Monsoon Interpreter Mint. We measure the space requirements of our codes by determining the largest possible problem size fitting on a one node monsoon machine. The purely functional code provides the highest average parallelism, but this parallelism turns out to be superfluous. The I-structure code executes the minimal number of instructions and as it has a similar critical path length as the functional code, runs the fastest. The M-structure code allows the largest problem sizes to be run at the cost of about 20% increase in instruction count, and 75% to 100% increase in critical path length, compared to the I-structure code.
Authors: S. Sur and W. Bohm. Department of Computer Science, Colorado State University, Ft. Collins, CO 80523, USA.
Abstract: We implemented several sorting routines in Id and compared their relative performances in terms of number of instructions (S1), length of the critical path (S1) and average parallelism. The sorting routines considered here are of the types (1) Exchange sort (2) Insertion sort (3) Merge sort and (4) Sorting Networks. We implemented them using I-structures (e.g. merge sort) or M-structures (e.g bubble sort), whichever was proved to be more efficient. We then optimized the routines with respect to efficiency, minimized the number of barriers, eliminated redundant copying etc. to the best of our abilities and then compared their performances. We have compared our results with expected theoretical performance and obtained satisfactory results.
Authors: S. Sur and W. Bohm. Department of Computer Science, Colorado State University, Ft. Collins, CO 80523, USA.