HPC-Netlib Recommended Reading
General
Books
- Gallivan, Heath, Ng, Ortega, Peyton, Plemmons, Romine, Sameh, and Voigt.
Parallel Algorithms for Matrix Computations. SIAM, 1990.
- Dongarra, Duff, Sorensen, and Van der Vorst. Solving Linear Systems
on Vector and Shared Memory Computers. SIAM, 1991.
Papers
- Demmel, Heath, and Van der Vorst.
Parallel Numerical Linear Algebra.
Acta Numerica, 1993.
- Jack J. Dongarra and David W. Walker.
Software Libraries for Linear Algebra Computation on High-Performance
Computers.
ORNL TM-12404, August 1993.
- This paper discusses the design of linear algebra libraries for high
performance computers. Particular emphasis is placed on the development
of scalable algorithms for MIMD distributed memory concurrent
computers. A brief description of the EISPACK, LINPACK, and LAPACK
libraries is given, followed by an outline of ScaLAPACK, which is a
distributed memory version of LAPACK currently under development. The
importance of block-partitioned algorithms
in reducing the frequency of data movement between different levels
of hierarchical memory is stressed. The use of such algorithms
helps reduce the message startup costs on distributed memory concurrent
computers. Other key ideas in our approach are the use of distributed
versions of the Level 3 Basic Linear Algebra Subprograms (BLAS) as
computational building blocks, and the use of Basic
Linear Algebra Communication Subprograms (BLACS) as communication
building blocks. Together the distributed BLAS and the BLACS can be
used to construct higher-level algorithms, and hide many details of
the parallelism from the application developer.
The block-cyclic data distribution is described, and adopted as a good
way of distributing block-partitioned matrices. Block-partitioned
versions of the Cholesky and LU factorizations are presented, and
optimization issues associated with the implementation of the LU
factorization algorithm on distributed memory concurrent computers
are discussed, together with its performance on the Intel Delta system.
Finally, approaches to the design of library interfaces are reviewed.
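- To make the block-cyclic distribution concrete, here is a minimal
Python sketch of the mapping it defines. The function name and
parameters are illustrative only, not part of ScaLAPACK's actual
interface; a row-major process grid and square nb-by-nb blocks are
assumed.

    # Sketch of the 2D block-cyclic mapping described above.
    # Hypothetical helper, not ScaLAPACK's API.
    def owner_rank(i, j, nb, prow, pcol):
        """Rank of the process owning matrix element (i, j) when the
        matrix is split into nb-by-nb blocks dealt out cyclically
        over a prow-by-pcol process grid (row-major ranks)."""
        return ((i // nb) % prow) * pcol + ((j // nb) % pcol)

    # Print the owner of every element of an 8x8 matrix with 2x2
    # blocks on a 2x2 process grid; the cyclic wrap-around is visible.
    n, nb, prow, pcol = 8, 2, 2, 2
    for i in range(n):
        print(" ".join(str(owner_rank(i, j, nb, prow, pcol))
                       for j in range(n)))

Each process thus holds a scattered but regular subset of the blocks,
which is what keeps the workload balanced as a factorization proceeds.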
Sparse linear systems
Books
Direct methods
- I.S. Duff, A.M. Erisman, and J.K. Reid.
Direct Methods for Sparse Matrices.
Clarendon Press, Oxford, 1986.
Iterative methods
- Yousef Saad.
Iterative Methods for Sparse Linear Systems.
PWS Publishing Company, 1996. ISBN 0-534-94776-X
- Barrett, Berry, Chan, Demmel, Donato, Dongarra, Eijkhout, Pozo, Romine,
and Van der Vorst.
Templates for the Solution of Linear Systems: Building Blocks for
Iterative Methods, 2nd edition. SIAM, 1994.
Papers
Direct methods
Iterative methods
- Willi Schönauer and Rüdiger Weiss.
An engineering approach to generalized conjugate gradient methods
and beyond.
Applied Numerical Mathematics 19:3 (Dec 1995) 175-206.
- The purpose of this paper is to give nonexperts a simple introduction
to generalized conjugate gradient methods.
The methods surveyed include ORTHORES-based methods, residual- and
error-minimizing methods (such as GMRES), biconjugate gradient-based
methods, and methods based on the normal equations.
Practical aspects such
as normalization and smoothing are discussed. For a model of the
Navier-Stokes equations that varies between strong and weak ellipticity,
the methods are compared and their typical properties are demonstrated.
(A minimal sketch of the classical conjugate gradient iteration appears
after this list.)
- Rüdiger Weiss.
A theoretical overview of Krylov subspace methods.
Applied Numerical Mathematics 19:3 (Dec 1995) 207-233.
- This paper surveys Krylov subspace methods for the solution of linear
systems with a focus on commonly used and recently developed methods.
Convergence results are derived, compiled, and analyzed within a
general theoretical framework.
- Gerard L.G. Sleijpen and Henk A. van der Vorst.
An overview of approaches for the stable computation of hybrid BiCG methods.
Applied Numerical Mathematics 19:3 (Dec 1995) 235-254.
- BiCG can be adapted so that operations with A^T are avoided,
and hybrid methods with computational cost per iteration comparable
to BiCG's can be constructed to improve convergence behavior.
Examples are CGS, Bi-CGSTAB, and BiCGstab(l).
In many applications, the speed of convergence of these methods depends
strongly on the incorporated BiCG process. The accuracy of the
iteration coefficients of BiCG depends on the particular choice
of the hybrid method. This paper discusses the accuracy of these
coefficients and how it affects the speed of convergence, and shows
that hybrid methods with better accuracy properties exist.
The paper also discusses look-ahead strategies for determining
appropriate values of l in BiCGstab(l). (SciPy's implementations of
these methods are exercised in the example after this list.)
- Anshul Gupta, Vipin Kumar, and Ahmed Sameh.
Performance and Scalability of Preconditioned Conjugate Gradient Methods
on Parallel Computers.
IEEE Transactions on Parallel and Distributed Systems, 1995.
Earlier version: University of Minnesota Tech Report TR 92-64, Nov. 1992
(ftp://ftp.cs.umn.edu/users/kumar/conjugate-gradient.ps).
- R.R.P. van Nooyen, C. Vuik, and P. Wesseling.
Some parallelizable preconditioners for GMRES.
Technische Universiteit Delft, Tech Report 96-50.
- Randall Bramley and Vladimir Menkov.
Low Rank Off-Diagonal Block Preconditioners for Solving Sparse Linear
Systems on Parallel Computers.
Department of Computer Science, Indiana University - Bloomington.
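- As a companion to the Schönauer-Weiss introduction above, the
following is a minimal sketch of the classical (unpreconditioned)
conjugate gradient method in Python with NumPy. It shows the basic
iteration that the generalized methods surveyed there build on; it is
not taken from the paper itself.

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-10, maxiter=1000):
        """Textbook CG for a symmetric positive definite matrix A."""
        x = np.zeros_like(b)
        r = b - A @ x              # residual
        p = r.copy()               # search direction
        rs = r @ r
        for _ in range(maxiter):
            Ap = A @ p
            alpha = rs / (p @ Ap)  # step length
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p   # next A-conjugate direction
            rs = rs_new
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b))    # approx [0.0909, 0.6364]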
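- The hybrid BiCG methods discussed in the Sleijpen-van der Vorst
entry, CGS and Bi-CGSTAB in particular, have standard implementations
in SciPy; the sketch below runs them on a small nonsymmetric system.
The tridiagonal test matrix is an arbitrary choice for illustration.

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import bicg, cgs, bicgstab

    # Nonsymmetric tridiagonal test matrix.
    n = 200
    A = diags([-1.2, 2.0, -0.8], offsets=[-1, 0, 1],
              shape=(n, n), format="csr")
    b = np.ones(n)

    for solver in (bicg, cgs, bicgstab):
        x, info = solver(A, b)     # info == 0 means convergence
        resid = np.linalg.norm(b - A @ x)
        print(f"{solver.__name__:9s} info={info} residual={resid:.2e}")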
Mesh partitioning
Papers
- C. Walshaw, M. Cross, and M. Everett.
Parallel Partitioning of Unstructured Meshes.
Proc. Parallel CFD '96.
- A method is outlined for optimising graph partitions which arise in mapping
unstructured mesh calculations to parallel computers. The method employs
a combination of iterative techniques to both evenly balance the workload
and minimise the number and volume of interprocessor communications.
The method is designed to work efficiently in parallel as well as sequentially.
When combined with a fast direct partitioning technique (such as the
Greedy algorithm) to give an initial partition, the resulting two-stage
process proves to be a powerful and flexible solution to the static
graph-partitioning problem. The algorithms can also be used for
dynamic load-balancing, and a clustering technique can additionally be
employed to speed up the whole process. Experiments indicate that the
resulting parallel code can provide high-quality partitions, independent
of the initial partition, within a few seconds. (A toy greedy initial
partitioner is sketched after this list.)
- C. Walshaw and M. Berzins.
Dynamic load-balancing for PDE solvers on adaptive unstructured meshes.
Concurrency: Practice and Experience 7(1):17-28, 1995.
- Modern PDE solvers written for time-dependent problems
increasingly employ adaptive unstructured meshes
in order to both increase efficiency and control the numerical
error. If a distributed memory parallel computer is to be used, there
arises the significant problem of dividing the domain equally
amongst the processors whilst minimising the inter-subdomain
dependencies. A number of graph-based algorithms have recently
been proposed for steady-state calculations.
This paper considers an extension to such methods which renders
them more suitable for time-dependent problems in which the mesh
may be changed frequently.
- Scott R. Kohn and Scott B. Baden.
A Parallel Software Infrastructure for Structured Adaptive Mesh Methods.
Proc. Supercomputing '95, San Diego CA, December 1995.
- Structured adaptive mesh algorithms dynamically allocate computational
resources to accurately resolve interesting portions of a numerical calculation.
Such methods are difficult to implement and parallelize because they rely on
dynamic, irregular data structures. We have developed an efficient, portable,
parallel software infrastructure for adaptive mesh methods; our software
provides computational scientists with high-level facilities that hide
low-level details of parallelism and resource management. We have applied
our software infrastructure to the solution of adaptive eigenvalue problems
arising in materials design. We describe our software infrastructure and
analyze its performance. We also present computational results indicating
that the uniformity restrictions imposed by a data-parallel Fortran
implementation of a structured adaptive mesh application would
significantly impact performance. (A toy flag-and-cluster sketch
follows this list.)
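- For the Walshaw-Cross-Everett entry, here is a toy BFS-based "greedy"
initial partitioner in Python. It is a generic sketch of the idea of
growing subdomains from seed vertices, not the authors' algorithm, and
all names are illustrative.

    from collections import deque

    def greedy_partition(adj, nparts):
        """Grow each subdomain by breadth-first search until it
        reaches its target size: a sketch of a fast greedy direct
        partitioner."""
        n = len(adj)
        target = -(-n // nparts)       # ceil(n / nparts)
        part = [-1] * n
        for p in range(nparts):
            seed = next((v for v in range(n) if part[v] == -1), None)
            if seed is None:
                break                  # everything already assigned
            queue, size = deque([seed]), 0
            while queue and size < target:
                v = queue.popleft()
                if part[v] != -1:
                    continue           # already claimed
                part[v] = p
                size += 1
                queue.extend(u for u in adj[v] if part[u] == -1)
        # Vertices stranded in small disconnected components, if any.
        return [q if q != -1 else nparts - 1 for q in part]

    def grid_adj(rows, cols):
        """Adjacency lists for a rows-by-cols grid graph."""
        adj = [[] for _ in range(rows * cols)]
        for r in range(rows):
            for c in range(cols):
                v = r * cols + c
                if c + 1 < cols:
                    adj[v].append(v + 1); adj[v + 1].append(v)
                if r + 1 < rows:
                    adj[v].append(v + cols); adj[v + cols].append(v)
        return adj

    # Partition a 4x4 grid graph into 2 subdomains.
    print(greedy_partition(grid_adj(4, 4), 2))

An iterative optimisation pass of the kind the paper describes would
then migrate boundary vertices between subdomains to reduce the cut
while keeping the loads balanced.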
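- And for the Kohn-Baden entry, a toy sketch of the flag-and-cluster
step at the heart of structured adaptive mesh methods: cells with a
large solution gradient are flagged, and a rectangular patch is fitted
around them. The gradient indicator, threshold, and single bounding-box
clustering are arbitrary simplifications; production infrastructures
use smarter clustering (e.g. Berger-Rigoutsos).

    import numpy as np

    def flag_and_cluster(u, threshold):
        """Flag cells whose gradient magnitude exceeds threshold and
        return the index ranges of one rectangular refinement patch."""
        gx, gy = np.gradient(u)
        flags = np.hypot(gx, gy) > threshold
        if not flags.any():
            return None                # nothing to refine
        rows = np.flatnonzero(flags.any(axis=1))
        cols = np.flatnonzero(flags.any(axis=0))
        return (rows[0], rows[-1] + 1), (cols[0], cols[-1] + 1)

    # A sharp bump near one corner of a 32x32 grid: only that corner
    # should be covered by the refinement patch.
    x, y = np.meshgrid(np.linspace(0, 1, 32), np.linspace(0, 1, 32))
    u = np.exp(-200 * ((x - 0.8) ** 2 + (y - 0.8) ** 2))
    print(flag_and_cluster(u, threshold=0.05))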
Domain decomposition
Books
- Smith, Bjorstad, and Gropp.
Domain Decomposition: Parallel Multilevel Methods for Elliptic
Partial Differential Equations.
Cambridge University Press, 1996. ISBN 0-521-49589-X
Optimization
Books
- Jorge J. Moré and Stephen J. Wright.
Optimization Software Guide. SIAM Publications, 1993.
- Pascal Van Hentenryck.
The OPL Optimization Programming Language. MIT Press, 1999.
Papers
- Robert B. Schnabel.
A view of the limitations, opportunities, and challenges in parallel
nonlinear optimization.
Parallel Computing 21(6), June 1995, 875-905.
- This paper discusses how consideration of parallelism is affecting
and is likely to affect the field of nonlinear optimization. It
presents a set of examples that illustrate limitations, opportunities,
and challenges. The examples include parallel methods for unconstrained
optimization problems with a small to moderate number of variables,
parallel methods for large block bordered systems of nonlinear equations,
and parallel methods for small-scale and large-scale global optimization
problems.
- Brett M. Averick and Jorge J. Moré.
Evaluation of Large-Scale Optimization Problems on Vector and Parallel
Architectures.
- This research uses limited memory variable metric algorithms to
illustrate performance issues for large-scale optimization problems.
The results indicate that the cost of evaluating functions and gradients
dominates the overall solution time.
For a vector architecture,
vectorization of function and gradient evaluation using partitioning
techniques which are applicable to any partially separable function
improves computing times by at least a factor of 10.
For distributed memory architectures,
the authors show that it is possible to develop algorithms for
function and gradient evaluation that are scalable in the sense
that as the number of variables increases and the number of processors
increases, the computing time remains reasonably constant.
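- To illustrate the element-by-element evaluation that the
Averick-Moré entry exploits, here is a small Python sketch of a
partially separable function: a sum of element functions, each
touching only a few variables, so values and gradients are assembled
from independent element contributions that can be vectorized or
spread across processors. The chained Rosenbrock function is a
standard test problem used here for illustration, not necessarily one
of the authors' examples.

    import numpy as np

    def chained_rosenbrock(x):
        """f(x) = sum_i 100*(x[i+1] - x[i]^2)^2 + (1 - x[i])^2,
        evaluated one element function at a time."""
        n = len(x)
        f, g = 0.0, np.zeros(n)
        for i in range(n - 1):                  # independent elements
            t1 = x[i + 1] - x[i] ** 2
            t2 = 1.0 - x[i]
            f += 100.0 * t1 ** 2 + t2 ** 2      # element value
            g[i] += -400.0 * x[i] * t1 - 2.0 * t2   # scatter gradient
            g[i + 1] += 200.0 * t1
        return f, g

    f, g = chained_rosenbrock(np.full(10, 0.5))
    print(f, np.linalg.norm(g))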
hpc-netlib@netlib.org