Abstract:
This manual describes the use of PETSc 2.0 for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc 2.0 uses the MPI standard for all message-passing communication.
PETSc includes an expanding suite of parallel linear and nonlinear equation solvers and unconstrained minimization modules that may be used in application codes written in Fortran, C, and C++. PETSc provides many of the mechanisms needed within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users.
PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background or experience programming in C, Pascal, or C++, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates make the efficient implementation of many application codes much simpler than ``rolling them'' yourself. For many simple (or even relatively complicated) tasks a package such as Matlab is often the best tool; PETSc is not intended for the classes of problems for which effective Matlab code can be written.
Since PETSc is still under development, small changes in usage and calling sequences of routines may occur. PETSc is supported; see the readme.html in the PETSc distribution directory or the web site http://www.mcs.anl.gov/petsc for information on contacting support.
Getting Information on PETSc:
On-line:
Manual pages on all routines including example usage
docs/manualpages/manualpages.html in the distribution or
http://www.mcs.anl.gov/petsc/docs/manualpages/manualpages.html
Troubleshooting
docs/troubleshooting.html in the distribution or
http://www.mcs.anl.gov/petsc/docs/troubleshooting.html
In this manual:
Basic introduction, Section Creating and Assembling Vectors
Assembling vectors and matrices, Chapter Matrices
Linear solvers, Chapter SLES: Linear Equations Solvers
List of all routines and PETSc data types
Function index
Subject index
Acknowledgments:
We thank Victor Eijkhout, David Keyes, and Matthew Knepley for their valuable comments on the source code, functionality, and documentation for PETSc 2.0. In addition, we thank all PETSc users for their suggestions, bug reports, and encouragement.
Some of the source code and utilities in PETSc (or software used by PETSc) have been written by
PETSc uses routines from
PETSc interfaces to the following external software:
This manual is intended for use with PETSc 2.0.21.
The Portable, Extensible Toolkit for Scientific Computation (PETSc) has successfully demonstrated that the use of modern programming paradigms can ease the development of large-scale scientific application codes in Fortran, C, and C++. Begun several years ago, the software has evolved into a powerful set of tools for the numerical solution of partial differential equations and related problems on high-performance computers.
PETSc consists of a variety of components (similar to classes in C++), which are discussed in detail in Parts II and III of the users manual. Each component manipulates a particular family of objects (for instance, vectors) and the operations one would like to perform on the objects. The objects and operations in PETSc are derived from our long experiences with scientific computation. Some of the PETSc modules deal with index sets, vectors, matrices, distributed arrays, Krylov subspace methods, preconditioners, and nonlinear solvers.
The components enable easy customization and extension of both algorithms and implementations. This approach promotes code reuse and flexibility, and separates the issues of parallelism from the choice of algorithms. The PETSc infrastructure creates a foundation for building large-scale applications.
It is useful to consider the interrelationships among different
pieces of PETSc 2.0. Figure 1
is a diagram of some
of the components of PETSc; Figure 2
presents
several of the individual components in more detail.
These figures illustrate the library's hierarchical organization,
which enables users to employ the level of abstraction that is most
appropriate for a particular problem.
The manual is divided into three parts:
Part I describes the basic procedure for using the PETSc library and presents simple examples of solving linear systems with PETSc, conveying the typical style used throughout the library. Part II explains in detail the use of the various PETSc components, such as vectors, matrices, index sets, linear and nonlinear solvers, and graphics. Part III describes a variety of useful information, including profiling, the options database, viewers, error handling, makefiles, and some details of PETSc design.
The PETSc 2.0 Users Manual documents all of PETSc 2.0; thus, it can be rather intimidating for new users. We recommend that one initially read the entire document before proceeding with serious use of PETSc, but bear in mind that PETSc can be used efficiently before one understands all of the material presented here.
Manual pages for all PETSc functions can be
accessed on line at
${PETSC_DIR}/docs/manualpages/manualpages.html
The manual pages provide hyperlinked indices (organized by both concepts and routine names) to the tutorial examples and enable easy movement among related topics.
Within the PETSc distribution, the directory ${PETSC_DIR}/docs contains all documentation, including this manual and the manual pages in PostScript and HTML formats.
Emacs users may find the etags option to be extremely useful for exploring the PETSc source code. Details of this feature are provided in Section Emacs Users . Similarly, vi users may find the ctags option to be useful. Details of this feature are provided in Section VI Users .
The file manual.ps contains the PostScript form of the PETSc 2.0 Users Manual in its entirety, while intro.ps includes only the introductory segment, Part I. The file Installation contains detailed instructions for installing PETSc. The complete PETSc distribution, users manual, manual pages, and additional information are also available via the PETSc home page at http://www.mcs.anl.gov/petsc. The PETSc home page also contains details regarding installation, new features and changes in recent versions of PETSc, machines that we currently support, a troubleshooting guide, and a FAQ list for frequently asked questions.
Note to Fortran Programmers: In most of the manual, the examples and calling sequences are given for the C/C++ family of programming languages. We follow this convention because we highly recommend that PETSc applications be coded in C or C++. However, pure Fortran77 programmers can use most of the functionality of PETSc from Fortran, with only minor differences in the user interface. Chapter PETSc Fortran Users provides a discussion of the differences between using PETSc from Fortran and C, as well as several complete Fortran77 examples. This chapter also introduces some routines that support direct use of Fortran90 pointers.
Before using PETSc, the user must first set the environment variable
PETSC_DIR, indicating the full path of the PETSc home
directory. For example, under the UNIX C shell a command of the form
setenv PETSC_DIR $HOME/petsc
can be placed in the user's .cshrc file. In addition, the user must set the environment variable PETSC_ARCH to specify the architecture (e.g., rs6000, sun4, solaris) on which PETSc is being used. The utility ${PETSC_DIR}/bin/petscarch can be used for this purpose. For example,
setenv PETSC_ARCH `$PETSC_DIR/bin/petscarch`
can be placed in a .cshrc file. Thus, even if several machines of different types share the same filesystem, PETSC_ARCH will be set correctly when logging into any of them.
All PETSc programs use the MPI (Message Passing Interface) standard
for message-passing communication [(ref MPI-final)]. Thus, to execute
PETSc programs, users must know the procedure for beginning MPI jobs
on their selected computer system(s). For instance, when using the
MPICH implementation of MPI [(ref mpich-web-page)] and many others, the following
command initiates a program that uses eight processors:
mpirun -np 8 petsc_program_name petsc_options
All PETSc 2.0-compliant programs support the use of the -h or -help option as well as the -v or -version option.
Certain options are supported by all PETSc programs. A few particularly useful ones are -help (or -h), which prints usage information; -version (or -v), which prints the PETSc version number; -log_summary, which prints a summary of performance statistics at the end of a run; and -trdump, which prints information about memory that was allocated but never freed. A complete list can be obtained by running any PETSc program with the option -help.
Most PETSc programs begin with a call to
ierr = PetscInitialize(int *argc,char ***argv,char *file_name,char *help_message);which initializes PETSc and MPI. The arguments argc and argv are the command line arguments delivered in all C and C++ programs. The argument file_name optionally indicates an alternative name for the PETSc options file, .petscrc, which resides by default in the user's home directory. Section Runtime Options provides details regarding this file and the PETSc options database, which can be used for runtime customization. The final argument, help_message, is an optional character string that will be printed if the program is run with the -help option. In Fortran the initialization command has the form
call PetscInitialize(character file_name,integer ierr)
PetscInitialize() automatically calls MPI_Init() if MPI has not been previously initialized. In certain circumstances in which MPI needs to be initialized directly (or is initialized by some other library), the user should first call MPI_Init() (or have the other library do it), and then call PetscInitialize(). By default, PetscInitialize() sets the PETSc ``world'' communicator, given by PETSC_COMM_WORLD, to MPI_COMM_WORLD.
For those not familiar with MPI, a communicator is a way of indicating a collection of processors that will be involved together in a calculation or communication. Communicators have the variable type MPI_Comm. In most cases users can employ the communicator PETSC_COMM_WORLD to indicate all processes in a given run and PETSC_COMM_SELF to indicate a single process. MPI provides routines for generating new communicators consisting of subsets of processors, though most users rarely need to use these. The book Using MPI, by Gropp, Lusk, and Skjellum [(ref using-mpi)], provides an excellent introduction to the concepts in MPI. Note that PETSc users need not program much message passing directly with MPI, but they must be familiar with the basic concepts of message passing and distributed memory computing.
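As a minimal illustration of these communicators (a sketch only, not part of the PETSc distribution), the following program queries the rank and size of PETSC_COMM_WORLD with the standard MPI calls MPI_Comm_rank() and MPI_Comm_size() and prints one line per process; the PETSc routines it uses (PetscInitialize(), PetscPrintf(), PetscFinalize()) are all discussed in this chapter.

#include "petsc.h"

int main(int argc,char **args)
{
  int rank, size;

  PetscInitialize(&argc,&args,(char *)0,(char *)0);

  /* PETSC_COMM_WORLD can be passed to ordinary MPI routines */
  MPI_Comm_rank(PETSC_COMM_WORLD,&rank);
  MPI_Comm_size(PETSC_COMM_WORLD,&size);

  /* With PETSC_COMM_SELF, PetscPrintf() prints once per process */
  PetscPrintf(PETSC_COMM_SELF,"Process %d of %d\n",rank,size);

  PetscFinalize();
  return 0;
}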
Users who wish to employ PETSc routines on only a subset
of processors within a larger parallel job, or who wish to use a
``master'' process to coordinate the work of ``slave'' PETSc
processes, should specify an alternative communicator for
PETSC_COMM_WORLD by calling
ierr = PetscSetCommWorld(MPI_Comm comm);before calling PetscInitialize(), but, obviously, after calling MPI_Init(). PetscSetCommWorld() can be called at most once per process. Most users will never need to use the routine PetscSetCommWorld().
As illustrated by the PetscInitialize() statements above, PETSc 2.0 routines return an integer indicating whether an error has occurred during the call. The error code is set to be nonzero if an error has been detected; otherwise, it is zero. For the C/C++ interface, the error variable is the routine's return value, while for the Fortran version, each PETSc routine has as its final argument an integer error variable. Error tracebacks are discussed in the following section.
All PETSc programs should call PetscFinalize()
as their final (or nearly final) statement, as given below in the C/C++
and Fortran formats, respectively:
ierr = PetscFinalize();
call PetscFinalize(ierr)
This routine handles options to be processed at the conclusion of the program and calls MPI_Finalize() if PetscInitialize() began MPI. If MPI was initiated externally from PETSc (by either the user or another software package), the user is responsible for calling MPI_Finalize().
To help the user start using PETSc immediately, we begin with a simple uniprocessor example in Figure 3 that solves the one-dimensional Laplacian problem with finite differences. This sequential code, which can be found in ${PETSC_DIR}/src/sles/examples/tutorials/ex1.c, illustrates the solution of a linear system with SLES, the PETSc interface to preconditioners, Krylov subspace methods, and direct linear solvers. Following the code we highlight a few of the most important parts of this example.
#ifdef PETSC_RCS_HEADER static char vcid[] = "$Id: ex1.c,v 1.70 1998/04/28 03:49:10 curfman Exp $"; #endif /* Program usage: mpirun ex1 [-help] [all PETSc options] */ static char help[] = "Solves a tridiagonal linear system with SLES.\n\n"; /*T Concepts: SLES^Solving a system of linear equations (basic uniprocessor example); Routines: SLESCreate(); SLESSetOperators(); SLESSetFromOptions(); Routines: SLESSolve(); SLESView(); SLESGetKSP(); SLESGetPC(); Routines: KSPSetTolerances(); PCSetType(); Processors: 1 T*/ /* Include "sles.h" so that we can use SLES solvers. Note that this file automatically includes: petsc.h - base PETSc routines vec.h - vectors sys.h - system routines mat.h - matrices is.h - index sets ksp.h - Krylov subspace methods viewer.h - viewers pc.h - preconditioners */ #include "sles.h" int main(int argc,char **args) { Vec x, b, u; /* approx solution, RHS, exact solution */ Mat A; /* linear system matrix */ SLES sles; /* linear solver context */ PC pc; /* preconditioner context */ KSP ksp; /* Krylov subspace method context */ double norm; /* norm of solution error */ int ierr, i, n = 10, col[3], its, flg, size; Scalar neg_one = -1.0, one = 1.0, value[3]; PetscInitialize(&argc,&args,(char *)0,help); MPI_Comm_size(PETSC_COMM_WORLD,&size); if (size != 1) SETERRA(1,0,"This is a uniprocessor example only!"); ierr = OptionsGetInt(PETSC_NULL,"-n",&n,&flg); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Compute the matrix and right-hand-side vector that define the linear system, Ax = b. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Create matrix. When using MatCreate(), the matrix format can be specified at runtime. */ ierr = MatCreate(PETSC_COMM_WORLD,n,n,&A); CHKERRA(ierr); /* Assemble matrix */ value[0] = -1.0; value[1] = 2.0; value[2] = -1.0; for (i=1; i<n-1; i++ ) { col[0] = i-1; col[1] = i; col[2] = i+1; ierr = MatSetValues(A,1,&i,3,col,value,INSERT_VALUES); CHKERRA(ierr); } i = n - 1; col[0] = n - 2; col[1] = n - 1; ierr = MatSetValues(A,1,&i,2,col,value,INSERT_VALUES); CHKERRA(ierr); i = 0; col[0] = 0; col[1] = 1; value[0] = 2.0; value[1] = -1.0; ierr = MatSetValues(A,1,&i,2,col,value,INSERT_VALUES); CHKERRA(ierr); ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRA(ierr); ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRA(ierr); /* Create vectors. Note that we form 1 vector from scratch and then duplicate as needed. */ ierr = VecCreate(PETSC_COMM_WORLD,PETSC_DECIDE,n,&x); CHKERRA(ierr); ierr = VecDuplicate(x,&b); CHKERRA(ierr); ierr = VecDuplicate(x,&u); CHKERRA(ierr); /* Set exact solution; then compute right-hand-side vector. */ ierr = VecSet(&one,u); CHKERRA(ierr); ierr = MatMult(A,u,b); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Create the linear solver and set various options - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Create linear solver context */ ierr = SLESCreate(PETSC_COMM_WORLD,&sles); CHKERRA(ierr); /* Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix. */ ierr = SLESSetOperators(sles,A,A,DIFFERENT_NONZERO_PATTERN); CHKERRA(ierr); /* Set linear solver defaults for this problem (optional). - By extracting the KSP and PC contexts from the SLES context, we can then directly call any KSP and PC routines to set various options. 
- The following four statements are optional; all of these parameters could alternatively be specified at runtime via SLESSetFromOptions(); */ ierr = SLESGetKSP(sles,&ksp); CHKERRA(ierr); ierr = SLESGetPC(sles,&pc); CHKERRA(ierr); ierr = PCSetType(pc,PCJACOBI); CHKERRA(ierr); ierr = KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT, PETSC_DEFAULT); CHKERRA(ierr); /* Set runtime options, e.g., -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol> These options will override those specified above as long as SLESSetFromOptions() is called _after_ any other customization routines. */ ierr = SLESSetFromOptions(sles); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Solve the linear system - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Solve linear system */ ierr = SLESSolve(sles,b,x,&its); CHKERRA(ierr); /* View solver info; we could instead use the option -sles_view to print this info to the screen at the conclusion of SLESSolve(). */ ierr = SLESView(sles,VIEWER_STDOUT_WORLD); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Check solution and clean up - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Check the error */ ierr = VecAXPY(&neg_one,u,x); CHKERRA(ierr); ierr = VecNorm(x,NORM_2,&norm); CHKERRA(ierr); if (norm > 1.e-12) PetscPrintf(PETSC_COMM_WORLD,"Norm of error %g, Iterations %d\n",norm,its); else PetscPrintf(PETSC_COMM_WORLD,"Norm of error < 1.e-12, Iterations %d\n",its); /* Free work space. All PETSc objects should be destroyed when they are no longer needed. */ ierr = VecDestroy(x); CHKERRA(ierr); ierr = VecDestroy(u); CHKERRA(ierr); ierr = VecDestroy(b); CHKERRA(ierr); ierr = MatDestroy(A); CHKERRA(ierr); ierr = SLESDestroy(sles); CHKERRA(ierr); /* Always call PetscFinalize() before exiting a program. This routine - finalizes the PETSc libraries as well as MPI - provides summary and diagnostic information if certain runtime options are chosen (e.g., -log_summary). */ PetscFinalize(); return 0; }
The C/C++ include files for PETSc should be used via statements such as
#include "sles.h"where sles.h is the include file for the SLES component. Each PETSc program must specify an include file that corresponds to the highest level PETSc objects needed within the program; all of the required lower level include files are automatically included within the higher level files. For example, sles.h includes mat.h (matrices), vector.h (vectors), and petsc.h (base PETSc file). The PETSc include files are located in the directory ${}PETSC_DIR/include. See Section Include Files for a discussion of PETSc include files in Fortran programs.
As shown in Figure 3 , the user can input control data at run time using the options database. In this example the command OptionsGetInt(PETSC_NULL,"-n",&n,&flg); checks whether the user has provided a command line option to set the value of n, the problem dimension. If so, the variable n is set accordingly; otherwise, n remains unchanged. A complete description of the options database may be found in Section Runtime Options .
One creates a new parallel or
sequential vector, x, of global dimension M with the
command
ierr = VecCreate(MPI_Comm comm,int m,int M,Vec *x);
where comm denotes the MPI communicator. Additional vectors of the same type can be formed with
ierr = VecDuplicate(Vec old,Vec *new);
The commands
ierr = VecSet(Scalar *value,Vec x); ierr = VecSetValues(Vec x,int n,int *indices,Scalar *values,INSERT_VALUES);
respectively set all the components of a vector to a particular scalar value and assign a different value to each component. More detailed information about PETSc vectors, including their basic operations, scattering/gathering, index sets, and distributed arrays, is discussed in Chapter Vectors and Distributing Parallel Data .
Note the use of the PETSc variable type Scalar in this example. The Scalar is simply defined to be double in C/C++ (or correspondingly double precision in Fortran) for versions of PETSc that have not been compiled for use with complex numbers. The Scalar data type enables identical code to be used when the PETSc libraries have been compiled for use with complex numbers. Section Complex Numbers discusses the use of complex numbers in PETSc programs.
Usage of PETSc matrices and vectors is similar.
The user can create a new parallel or sequential matrix, A, which
has M global rows and N global columns, with the routine
ierr = MatCreate(MPI_Comm comm,int M,int N,Mat *A);where the matrix format can be specified at runtime. Values can then be set with the command
ierr = MatSetValues(Mat A,int m,int *im,int n,int *in,Scalar *values,INSERT_VALUES);After all elements have been inserted into the matrix, it must be processed with the pair of commands
ierr = MatAssemblyBegin(Mat A,MAT_FINAL_ASSEMBLY); ierr = MatAssemblyEnd(Mat A,MAT_FINAL_ASSEMBLY);
Chapter Matrices discusses various matrix formats as well as the details of some basic matrix manipulation routines.
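The fragment below gathers these calls into a small self-contained program that assembles a parallel diagonal matrix. It is only an illustrative sketch (the matrix values and dimensions are arbitrary); every routine it uses also appears in Figures 3 and 5.

#include "mat.h"

int main(int argc,char **args)
{
  Mat    A;
  int    ierr, i, n = 5, rstart, rend;
  Scalar v = 2.0;

  PetscInitialize(&argc,&args,(char *)0,(char *)0);

  /* Create an n-by-n matrix; the format can still be chosen at runtime */
  ierr = MatCreate(PETSC_COMM_WORLD,n,n,&A); CHKERRA(ierr);

  /* Each process inserts only the diagonal entries of its own rows */
  ierr = MatGetOwnershipRange(A,&rstart,&rend); CHKERRA(ierr);
  for (i=rstart; i<rend; i++) {
    ierr = MatSetValues(A,1,&i,1,&i,&v,INSERT_VALUES); CHKERRA(ierr);
  }

  /* Communicate any off-process values and build the final data structure */
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRA(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRA(ierr);

  ierr = MatDestroy(A); CHKERRA(ierr);
  PetscFinalize();
  return 0;
}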
After creating the matrix and vectors that define a linear system,
Ax = b, the user can then use SLES to solve the system
with the following sequence of commands:
ierr = SLESCreate(MPI_Comm comm,SLES *sles); ierr = SLESSetOperators(SLES sles,Mat A,Mat PrecA,MatStructure flag); ierr = SLESSetFromOptions(SLES sles); ierr = SLESSolve(SLES sles,Vec b,Vec x,int *its); ierr = SLESDestroy(SLES sles);The user first creates the SLES context and sets the operators associated with the system (linear system matrix and optionally different preconditioning matrix). The user then sets various options for customized solution, solves the linear system, and finally destroys the SLES context. We emphasize the command SLESSetFromOptions(), which enables the user to customize the linear solution method at runtime by using the options database, which is discussed in Section Runtime Options . Through this database, the user not only can select an iterative method and preconditioner, but also can prescribe the convergence tolerance, set various monitoring routines, etc. (see, e.g., Figure 7 ).
Chapter SLES: Linear Equations Solvers describes in detail the SLES package, including the PC and KSP components for preconditioners and Krylov subspace methods.
All PETSc 2.0 routines return an integer indicating whether an error has occurred during the call. The PETSc macro CHKERRQ(ierr) checks the value of ierr and calls the PETSc 2.0 error handler upon error detection. CHKERRQ(ierr) should be used in all subroutines to enable a complete error traceback. A variant of this macro, CHKERRA(ierr), should be used in the main program to enable correct termination of all processes when an error is encountered. In Figure 4 we indicate a traceback generated by error detection within a sample PETSc program. The error occurred on line 1673 of the file ${PETSC_DIR}/src/mat/impls/aij/seq/aij.c and was caused by trying to allocate too large an array in memory. The routine was called in the program ex3.c on line 71. See Section Error Checking for details regarding error checking when using the PETSc Fortran interface.
eagle>mpirun ex3 -m 10000
[0]PETSC ERROR: MatCreateSeqAIJ() line 1673 in src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: Out of memory. This could be due to allocating
[0]PETSC ERROR: too large an object or bleeding by not properly
[0]PETSC ERROR: destroying unneeded objects.
[0]PETSC ERROR: Try running with -trdump for more information.
[0]PETSC ERROR: MatCreate() line 99 in src/mat/utils/gcreate.c
[0]PETSC ERROR: main() line 71 in src/sles/examples/tutorials/ex3.c
[0] MPI Abort by user Aborting program !
[0] Aborting program!
p0_28969: p4_error: : 1
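The following sketch shows the intended division of labor between the two macros: CHKERRQ() in a user-level subroutine so that errors propagate up the call chain, and CHKERRA() in main(). The routine name MakeWorkVector and its contents are hypothetical, chosen only for illustration.

#include "vec.h"

/* A user-level routine: CHKERRQ() makes errors propagate up the call chain */
int MakeWorkVector(MPI_Comm comm,int n,Vec *v)
{
  int    ierr;
  Scalar zero = 0.0;

  ierr = VecCreate(comm,PETSC_DECIDE,n,v); CHKERRQ(ierr);
  ierr = VecSet(&zero,*v); CHKERRQ(ierr);
  return 0;
}

int main(int argc,char **args)
{
  Vec x;
  int ierr;

  PetscInitialize(&argc,&args,(char *)0,(char *)0);

  /* In main(), CHKERRA() terminates all processes cleanly upon error */
  ierr = MakeWorkVector(PETSC_COMM_WORLD,100,&x); CHKERRA(ierr);

  ierr = VecDestroy(x); CHKERRA(ierr);
  PetscFinalize();
  return 0;
}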
Since PETSc uses the message-passing model for parallel programming and employs MPI for all interprocessor communication, the user is free to employ MPI routines as needed throughout an application code. However, by default the user is shielded from many of the details of message passing within PETSc, since these are hidden within parallel objects, such as vectors, matrices, and solvers. In addition, PETSc provides tools such as generalized vector scatters/gathers and distributed arrays to assist in the management of parallel data.
Recall that the user must specify a communicator upon creation of any
PETSc object (such as a vector, matrix, or solver) to indicate the
processors over which the object is to be distributed. For example,
as mentioned above, some commands for matrix, vector, and linear solver
creation are:
ierr = MatCreate(MPI_Comm comm,int M,int N,Mat *A); ierr = VecCreate(MPI_Comm comm,int m,int M,Vec *x); ierr = SLESCreate(MPI_Comm comm,SLES *sles);The creation routines are collective over all processors in the communicator; thus, all processors in the communicator must call the creation routine. In addition, if a sequence of collective routines is being used, they must be called in the same order on each processor.
The next example, given in Figure 5 , illustrates the solution of a linear system in parallel. This code, corresponding to ${PETSC_DIR}/src/sles/examples/tutorials/ex2.c, handles the two-dimensional Laplacian discretized with finite differences, where the linear system is again solved with SLES. The code performs the same tasks as the sequential version within Figure 3 . Note that the user interface for initiating the program, creating vectors and matrices, and solving the linear system is exactly the same for the uniprocessor and multiprocessor examples. The primary difference between the examples in Figures 3 and 5 is that each processor forms only its local part of the matrix and vectors in the parallel case.
#ifdef PETSC_RCS_HEADER static char vcid[] = "$Id: ex2.c,v 1.74 1998/04/28 04:00:19 curfman Exp $"; #endif /* Program usage: mpirun -np <procs> ex2 [-help] [all PETSc options] */ static char help[] = "Solves a linear system in parallel with SLES.\n\ Input parameters include:\n\ -random_exact_sol : use a random exact solution vector\n\ -view_exact_sol : write exact solution vector to stdout\n\ -m <mesh_x> : number of mesh points in x-direction\n\ -n <mesh_n> : number of mesh points in y-direction\n\n"; /*T Concepts: SLES^Solving a system of linear equations (basic parallel example); Concepts: SLES^Laplacian, 2d Concepts: Laplacian, 2d Routines: SLESCreate(); SLESSetOperators(); SLESSetFromOptions(); Routines: SLESSolve(); SLESGetKSP(); SLESGetPC(); Routines: KSPSetTolerances(); PCSetType(); Routines: PetscRandomCreate(); PetscRandomDestroy(); VecSetRandom(); Processors: n T*/ /* Include "sles.h" so that we can use SLES solvers. Note that this file automatically includes: petsc.h - base PETSc routines vec.h - vectors sys.h - system routines mat.h - matrices is.h - index sets ksp.h - Krylov subspace methods viewer.h - viewers pc.h - preconditioners */ #include "sles.h" int main(int argc,char **args) { Vec x, b, u; /* approx solution, RHS, exact solution */ Mat A; /* linear system matrix */ SLES sles; /* linear solver context */ PetscRandom rctx; /* random number generator context */ double norm; /* norm of solution error */ int i, j, I, J, Istart, Iend, ierr, m = 8, n = 7, its, flg; Scalar v, one = 1.0, neg_one = -1.0; KSP ksp; PetscInitialize(&argc,&args,(char *)0,help); ierr = OptionsGetInt(PETSC_NULL,"-m",&m,&flg); CHKERRA(ierr); ierr = OptionsGetInt(PETSC_NULL,"-n",&n,&flg); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Compute the matrix and right-hand-side vector that define the linear system, Ax = b. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Create parallel matrix, specifying only its global dimensions. When using MatCreate(), the matrix format can be specified at runtime. Also, the parallel partitioning of the matrix is determined by PETSc at runtime. */ ierr = MatCreate(PETSC_COMM_WORLD,m*n,m*n,&A); CHKERRA(ierr); /* Currently, all PETSc parallel matrix formats are partitioned by contiguous chunks of rows across the processors. Determine which rows of the matrix are locally owned. */ ierr = MatGetOwnershipRange(A,&Istart,&Iend); CHKERRA(ierr); /* Set matrix elements for the 2-D, five-point stencil in parallel. - Each processor needs to insert only elements that it owns locally (but any non-local elements will be sent to the appropriate processor during matrix assembly). - Always specify global rows and columns of matrix entries. */ for ( I=Istart; I<Iend; I++ ) { v = -1.0; i = I/n; j = I - i*n; if ( i>0 ) {J = I - n; MatSetValues(A,1,&I,1,&J,&v,INSERT_VALUES); CHKERRA(ierr);} if ( i<m-1 ) {J = I + n; MatSetValues(A,1,&I,1,&J,&v,INSERT_VALUES); CHKERRA(ierr);} if ( j>0 ) {J = I - 1; MatSetValues(A,1,&I,1,&J,&v,INSERT_VALUES); CHKERRA(ierr);} if ( j<n-1 ) {J = I + 1; MatSetValues(A,1,&I,1,&J,&v,INSERT_VALUES); CHKERRA(ierr);} v = 4.0; MatSetValues(A,1,&I,1,&I,&v,INSERT_VALUES); } /* Assemble matrix, using the 2-step process: MatAssemblyBegin(), MatAssemblyEnd() Computations can be done while messages are in transition by placing code between these two statements. */ ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); CHKERRA(ierr); ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY); CHKERRA(ierr); /* Create parallel vectors. 
- We form 1 vector from scratch and then duplicate as needed. - When using VecCreate() in this example, we specify only the vector's global dimension; the parallel partitioning is determined at runtime. - When solving a linear system, the vectors and matrices MUST be partitioned accordingly. PETSc automatically generates appropriately partitioned matrices and vectors when MatCreate() and VecCreate() are used with the same communicator. - The user can alternatively specify the local vector and matrix dimensions when more sophisticated partitioning is needed (replacing the PETSC_DECIDE argument in the VecCreate() statement below). */ ierr = VecCreate(PETSC_COMM_WORLD,PETSC_DECIDE,m*n,&u); CHKERRA(ierr); ierr = VecDuplicate(u,&b); CHKERRA(ierr); ierr = VecDuplicate(b,&x); CHKERRA(ierr); /* Set exact solution; then compute right-hand-side vector. By default we use an exact solution of a vector with all elements of 1.0; Alternatively, using the runtime option -random_sol forms a solution vector with random components. */ ierr = OptionsHasName(PETSC_NULL,"-random_exact_sol",&flg); CHKERRA(ierr); if (flg) { ierr = PetscRandomCreate(PETSC_COMM_WORLD,RANDOM_DEFAULT,&rctx); CHKERRA(ierr); ierr = VecSetRandom(rctx,u); CHKERRA(ierr); ierr = PetscRandomDestroy(rctx); CHKERRA(ierr); } else { ierr = VecSet(&one,u); CHKERRA(ierr); } ierr = MatMult(A,u,b); CHKERRA(ierr); /* View the exact solution vector if desired */ ierr = OptionsHasName(PETSC_NULL,"-view_exact_sol",&flg); CHKERRA(ierr); if (flg) {ierr = VecView(u,VIEWER_STDOUT_WORLD); CHKERRA(ierr);} /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Create the linear solver and set various options - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Create linear solver context */ ierr = SLESCreate(PETSC_COMM_WORLD,&sles); CHKERRA(ierr); /* Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix. */ ierr = SLESSetOperators(sles,A,A,DIFFERENT_NONZERO_PATTERN); CHKERRA(ierr); /* Set linear solver defaults for this problem (optional). - By extracting the KSP and PC contexts from the SLES context, we can then directly call any KSP and PC routines to set various options. - The following two statements are optional; all of these parameters could alternatively be specified at runtime via SLESSetFromOptions(). All of these defaults can be overridden at runtime, as indicated below. */ ierr = SLESGetKSP(sles,&ksp); CHKERRA(ierr); ierr = KSPSetTolerances(ksp,1.e-2/((m+1)*(n+1)),1.e-50,PETSC_DEFAULT, PETSC_DEFAULT); CHKERRA(ierr); /* Set runtime options, e.g., -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol> These options will override those specified above as long as SLESSetFromOptions() is called _after_ any other customization routines. */ ierr = SLESSetFromOptions(sles); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Solve the linear system - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = SLESSolve(sles,b,x,&its); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Check solution and clean up - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Check the error */ ierr = VecAXPY(&neg_one,u,x); CHKERRA(ierr); ierr = VecNorm(x,NORM_2,&norm); CHKERRA(ierr); /* Scale the norm */ /* norm *= sqrt(1.0/((m+1)*(n+1))); */ /* Print convergence information. PetscPrintf() produces a single print statement from all processes that share a communicator. 
An alternative is PetscFPrintf(), which prints to a file. */ if (norm > 1.e-12) PetscPrintf(PETSC_COMM_WORLD,"Norm of error %g iterations %d\n",norm,its); else PetscPrintf(PETSC_COMM_WORLD,"Norm of error < 1.e-12 Iterations %d\n",its); /* Free work space. All PETSc objects should be destroyed when they are no longer needed. */ ierr = SLESDestroy(sles); CHKERRA(ierr); ierr = VecDestroy(u); CHKERRA(ierr); ierr = VecDestroy(x); CHKERRA(ierr); ierr = VecDestroy(b); CHKERRA(ierr); ierr = MatDestroy(A); CHKERRA(ierr); /* Always call PetscFinalize() before exiting a program. This routine - finalizes the PETSc libraries as well as MPI - provides summary and diagnostic information if certain runtime options are chosen (e.g., -log_summary). */ PetscFinalize(); return 0; }
Figure 6 illustrates compiling and running a PETSc program using MPICH. Note that different sites may have slightly different library and compiler names. See Chapter Makefiles for a discussion about compiling PETSc programs. Users who are experiencing difficulties linking PETSc programs should refer to the troubleshooting guide via the PETSc WWW home page http://www.mcs.anl.gov/petsc or given in the file ${PETSC_DIR}/docs/troubleshooting.html.
eagle> make BOPT=g ex2
gcc -DPETSC_ARCH_sun4 -pipe -c -I../../../ -I../../..//include -I/usr/local/mpi/include -I../../..//src -g -DUSE_PETSC_DEBUG -DPETSC_MALLOC -DUSE_PETSC_LOG ex1.c
gcc -g -DUSE_PETSC_DEBUG -DPETSC_MALLOC -DUSE_PETSC_LOG -o ex1 ex1.o /home/bsmith/petsc/lib/libg/sun4/libpetscsles.a -L/home/bsmith/petsc/lib/libg/sun4 -lpetscstencil -lpetscgrid -lpetscsles -lpetscmat -lpetscvec -lpetscsys -lpetscdraw /usr/local/lapack/lib/lapack.a /usr/local/lapack/lib/blas.a /usr/lang/SC1.0.1/libF77.a -lm /usr/lang/SC1.0.1/libm.a -lX11 /usr/local/mpi/lib/sun4/ch_p4/libmpi.a /usr/lib/debug/malloc.o /usr/lib/debug/mallocmap.o /usr/lang/SC1.0.1/libF77.a -lm /usr/lang/SC1.0.1/libm.a -lm
rm -f ex1.o
eagle> mpirun ex2
Norm of error 3.6618e-05 iterations 7
eagle>
eagle> mpirun -np 2 ex2
Norm of error 5.34462e-05 iterations 9
eagle> mpirun ex1 -n 1000 -pc_type ilu -ksp_type gmres -ksp_rtol 1.e-7 -log_summary
-------------------------------- PETSc Performance Summary: --------------------------------------
ex1 on a sun4 named merlin.mcs.anl.gov with 1 processor, by curfman Wed Aug 7 17:24:27 1996
                     Max          Min          Avg          Total
Time (sec):          1.150e-01    1.0          1.150e-01
Objects:             1.900e+01    1.0          1.900e+01
Flops:               3.998e+04    1.0          3.998e+04    3.998e+04
Flops/sec:           3.475e+05    1.0          3.475e+05
MPI Messages:        0.000e+00    0.0          0.000e+00    0.000e+00
MPI Messages:        0.000e+00    0.0          0.000e+00    0.000e+00 (lengths)
MPI Reductions:      0.000e+00    0.0
--------------------------------------------------------------------------------------------------
Phase               Count   Time (sec)        Flops/sec         -- Global --
                            Max       Ratio   Max       Ratio   Mess     Avg len  Reduct  %T %F %M %L %R
--------------------------------------------------------------------------------------------------
MatMult                 2   2.553e-03  1.0    3.9e+06    1.0    0.0e+00  0.0e+00  0.0e+00   2 25  0  0  0
MatAssemblyBegin        1   2.193e-05  1.0    0.0e+00    0.0    0.0e+00  0.0e+00  0.0e+00   0  0  0  0  0
MatAssemblyEnd          1   5.004e-03  1.0    0.0e+00    0.0    0.0e+00  0.0e+00  0.0e+00   4  0  0  0  0
MatGetReordering        1   3.004e-03  1.0    0.0e+00    0.0    0.0e+00  0.0e+00  0.0e+00   3  0  0  0  0
MatILUFctrSymbol        1   5.719e-03  1.0    0.0e+00    0.0    0.0e+00  0.0e+00  0.0e+00   5  0  0  0  0
MatLUFactorNumer        1   1.092e-02  1.0    2.7e+05    1.0    0.0e+00  0.0e+00  0.0e+00   9  7  0  0  0
MatSolve                2   4.193e-03  1.0    2.4e+06    1.0    0.0e+00  0.0e+00  0.0e+00   4 25  0  0  0
MatSetValues         1000   2.461e-02  1.0    0.0e+00    0.0    0.0e+00  0.0e+00  0.0e+00  21  0  0  0  0
VecDot                  1   2.060e-04  1.0    9.7e+06    1.0    0.0e+00  0.0e+00  0.0e+00   0  5  0  0  0
VecNorm                 3   5.870e-04  1.0    1.0e+07    1.0    0.0e+00  0.0e+00  0.0e+00   1 15  0  0  0
VecScale                1   1.640e-04  1.0    6.1e+06    1.0    0.0e+00  0.0e+00  0.0e+00   0  3  0  0  0
VecCopy                 1   3.101e-04  1.0    0.0e+00    0.0    0.0e+00  0.0e+00  0.0e+00   0  0  0  0  0
VecSet                  3   5.029e-04  1.0    0.0e+00    0.0    0.0e+00  0.0e+00  0.0e+00   0  0  0  0  0
VecAXPY                 3   8.690e-04  1.0    6.9e+06    1.0    0.0e+00  0.0e+00  0.0e+00   1 15  0  0  0
VecMAXPY                1   2.550e-04  1.0    7.8e+06    1.0    0.0e+00  0.0e+00  0.0e+00   0  5  0  0  0
SLESSolve               1   1.288e-02  1.0    2.2e+06    1.0    0.0e+00  0.0e+00  0.0e+00  11 70  0  0  0
SLESSetUp               1   2.669e-02  1.0    1.1e+05    1.0    0.0e+00  0.0e+00  0.0e+00  23  7  0  0  0
KSPGMRESOrthog          1   1.151e-03  1.0    3.5e+06    1.0    0.0e+00  0.0e+00  0.0e+00   1 10  0  0  0
PCSetUp                 1   2.024e-02  1.0    1.5e+05    1.0    0.0e+00  0.0e+00  0.0e+00  18  7  0  0  0
PCApply                 2   4.474e-03  1.0    2.2e+06    1.0    0.0e+00  0.0e+00  0.0e+00   4 25  0  0  0
--------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type        Creations   Destructions   Memory   Descendants' Mem.
Index set                  3              3    12420                   0
Vector                     8              8    65728                   0
Matrix                     2              2   184924                4140
Krylov Solver              1              1    16892               41080
Preconditioner             1              1        0               64872
SLES                       1              1        0              122844
The examples throughout the library demonstrate the software usage
and can serve as templates for developing
custom applications. We suggest that new PETSc
users examine programs in the directories
${PETSC_DIR}/src/<component>/examples/tutorials,
where <component> denotes any of the PETSc components (listed in the following section), such as snes or sles. Currently some components have examples only in ${PETSC_DIR}/src/<component>/examples/tests; more tutorial examples will be forthcoming. The HTML version of the manual pages located at
${PETSC_DIR}/docs/manualpages/manualpages.html or http://www.mcs.anl.gov/petsc/docs/manualpages/manualpages.html
provides indices (organized by both routine names and concepts) to the tutorial examples.
To write a new application program using PETSc, we suggest the following procedure:
1. Install and test PETSc according to the instructions in the file Installation.
2. Copy one of the many PETSc examples in the component directory that corresponds to the class of problem of interest (e.g., for linear solvers, ${PETSC_DIR}/src/sles/examples/tutorials).
3. Copy the corresponding makefile within the example directory; compile and run the example program.
4. Use the example program as a starting point for developing a custom application.
We conclude this introduction with an overview of the organization of the PETSc software. As shown in Figure 8 , the root directory of PETSc contains the following directories:
The vector (denoted by Vec) is one of the simplest PETSc objects. Vectors are used to store discrete PDE solutions, right-hand sides for linear systems, etc. This chapter is organized as follows:
PETSc currently provides two basic vector types: sequential and parallel
(MPI based). To create a sequential vector with m components,
one can
use the command
ierr = VecCreateSeq(PETSC_COMM_SELF,int m,Vec *x);To create a parallel vector one can either specify the number of components that will be stored on each processor or let PETSc decide. The command
ierr = VecCreateMPI(MPI_Comm comm,int m,int M,Vec *x);creates a vector that is distributed over all processors in the communicator, comm, where m indicates the number of components to store on the local processor, and M is the total number of vector components. Either the local or global dimension, but not both, can be set to PETSC_DECIDE to indicate that PETSc should determine it. More generally, one can use the routine
ierr = VecCreate(MPI_Comm comm,int m,int M,Vec *v);
which automatically generates the appropriate vector type (sequential or parallel) over all processors in comm; as above, either the local dimension m or the global dimension M (but not both) may be set to PETSC_DECIDE. The option -vec_mpi can be used in conjunction with VecCreate() to specify the use of MPI vectors for the uniprocessor case.
We emphasize that all processors in comm must call the vector creation routines, since these routines are collective over all processors in the communicator. If you are not familiar with MPI communicators, see the discussion in Section Writing PETSc Programs . In addition, if a sequence of VecCreateXXX() routines is used, they must be called in the same order on each processor in the communicator.
One can assign a single value to all components of a vector with the
command
ierr = VecSet(Scalar *value,Vec x);Assigning values to individual components of the vector is more complicated, in order to make it possible to write efficient parallel code. Assigning a set of components is a two-step process: one first calls
ierr = VecSetValues(Vec x,int n,int *indices,Scalar *values,INSERT_VALUES);
any number of times on any or all of the processors. The argument n gives the number of components being set in this insertion. The integer array indices contains the global component indices, and values is the array of values to be inserted. Any processor can set any components of the vector; PETSc ensures that they are automatically stored in the correct location. Once all of the values have been inserted with VecSetValues(), one must call
ierr = VecAssemblyBegin(Vec x);followed by
ierr = VecAssemblyEnd(Vec x);to perform any needed message passing of nonlocal components. In order to allow the overlap of communication and calculation, the user's code can perform any series of other actions between these two calls while the messages are in transition.
Example usage of VecSetValues() may be found in src/vec/examples/tutorials/ex2.c or ex2f.F.
Often, rather than inserting elements in a vector, one may wish to
add values. This process
is also done with the command
ierr = VecSetValues(Vec x,int n,int *indices, Scalar *values,ADD_VALUES);Again one must call the assembly routines VecAssemblyBegin() and VecAssemblyEnd() after all of the values have been added. Note that addition and insertion calls to VecSetValues() cannot be mixed. Instead, one must add and insert vector elements in phases, with intervening calls to the assembly routines. This phased assembly procedure overcomes the nondeterministic behavior that would occur if two different processors generated values for the same location, with one processor adding while the other is inserting its value. (In this case the addition and insertion actions could be performed in either order, thus resulting in different values at the particular location. Since PETSc does not allow the simultaneous use of INSERT_VALUES and ADD_VALUES this nondeterministic behavior will not occur in PETSc.)
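The fragment below sketches both phases in a single self-contained program: an insertion phase followed, after an intervening assembly, by an addition phase. It is only an illustrative sketch (the values are arbitrary); every routine used appears elsewhere in this chapter.

#include "vec.h"

int main(int argc,char **args)
{
  Vec    x;
  int    ierr, i, rstart, rend, n = 20;
  Scalar v;

  PetscInitialize(&argc,&args,(char *)0,(char *)0);
  ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,n,&x); CHKERRA(ierr);
  ierr = VecGetOwnershipRange(x,&rstart,&rend); CHKERRA(ierr);

  /* Insertion phase: here each process sets its own components, although
     any process could set any component and PETSc would route the value */
  for (i=rstart; i<rend; i++) {
    v = (Scalar) i;
    ierr = VecSetValues(x,1,&i,&v,INSERT_VALUES); CHKERRA(ierr);
  }
  ierr = VecAssemblyBegin(x); CHKERRA(ierr);
  ierr = VecAssemblyEnd(x); CHKERRA(ierr);

  /* Addition phase: contributions to the same location are summed; note the
     intervening assembly above, since insertions and additions cannot be mixed */
  for (i=rstart; i<rend; i++) {
    v = 1.0;
    ierr = VecSetValues(x,1,&i,&v,ADD_VALUES); CHKERRA(ierr);
  }
  ierr = VecAssemblyBegin(x); CHKERRA(ierr);
  ierr = VecAssemblyEnd(x); CHKERRA(ierr);

  ierr = VecDestroy(x); CHKERRA(ierr);
  PetscFinalize();
  return 0;
}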
There is no routine called VecGetValues(), since we provide an alternative method for extracting some components of a vector using the vector scatter routines. See Section Scatters and Gathers for details; see also below for VecGetArray().
One can examine a vector with the command
ierr = VecView(Vec x,Viewer v);To print the vector to the screen, one can use the viewer VIEWER_STDOUT_WORLD, which ensures that parallel vectors are printed correctly to stdout. The viewer VIEWER_STDOUT_SELF can be employed if the user does not care in what order the individual processors print their segments of the vector. To display the vector in an X-window, one can use the default X-windows viewer VIEWER_DRAWX_WORLD, or one can create a viewer with the routine ViewerDrawOpenX(). A variety of viewers are discussed further in Section Viewers: Looking at PETSc Objects .
To create a new vector of the same format as an existing vector, one uses
the command
ierr = VecDuplicate(Vec old,Vec *new);To create several new vectors of the same format as an existing vector, one uses the command
ierr = VecDuplicateVecs(Vec old,int n,Vec **new);This routine creates an array of pointers to vectors. The two routines are very useful because they allow one to write library code that does not depend on the particular format of the vectors being used. Instead, the subroutines can automatically correctly create work vectors based on the specified existing vector. As discussed in Section Duplicating Multiple Vectors , the Fortran interface for VecDuplicateVecs() differs slightly.
When a vector is no longer needed, it should be destroyed with the
command
ierr = VecDestroy(Vec x);To destroy an array of vectors, one should use the command
ierr = VecDestroyVecs(Vec *vecs,int n);Note that the Fortran interface for VecDestroyVecs() differs slightly, as described in Section Duplicating Multiple Vectors .
It is also possible to create vectors that use an array provided by the user,
rather than having PETSc internally allocate the array space.
Such vectors can be created with the routines
ierr = VecCreateSeqWithArray(PETSC_COMM_SELF,int m,Scalar *array,Vec *x);and
ierr = VecCreateMPIWithArray(MPI_Comm comm,int m,int M,Scalar *array,Vec *x);
Note that here one must provide the value m; it cannot be PETSC_DECIDE, and the user is responsible for providing enough space in the array, namely m*sizeof(Scalar) bytes.
Function Name                                   Operation
VecAXPY(Scalar *a,Vec x,Vec y);                 y = y + a*x
VecAYPX(Scalar *a,Vec x,Vec y);                 y = x + a*y
VecWAXPY(Scalar *a,Vec x,Vec y,Vec w);          w = a*x + y
VecAXPBY(Scalar *a,Scalar *b,Vec x,Vec y);      y = a*x + b*y
VecScale(Scalar *a,Vec x);                      x = a*x
VecDot(Vec x,Vec y,Scalar *r);                  r = x'*y
VecTDot(Vec x,Vec y,Scalar *r);                 r = x'*y
VecNorm(Vec x,NormType type,double *r);         r = ||x||_type
VecSum(Vec x,Scalar *r);                        r = sum_i x_i
VecCopy(Vec x,Vec y);                           y = x
VecSwap(Vec x,Vec y);                           y = x while x = y
VecPointwiseMult(Vec x,Vec y,Vec w);            w_i = x_i*y_i
VecPointwiseDivide(Vec x,Vec y,Vec w);          w_i = x_i/y_i
VecMDot(int n,Vec x,Vec *y,Scalar *r);          r[i] = x'*y[i]
VecMTDot(int n,Vec x,Vec *y,Scalar *r);         r[i] = x'*y[i]
VecMAXPY(int n,Scalar *a,Vec x,Vec *y);         y[i] = a_i*x + y[i]
VecMax(Vec x,int *idx,double *r);               r = max_i x_i
VecMin(Vec x,int *idx,double *r);               r = min_i x_i
VecAbs(Vec x);                                  x_i = |x_i|
VecReciprocal(Vec x);                           x_i = 1/x_i
VecShift(Scalar *s,Vec x);                      x_i = s + x_i
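As a brief illustration of the calling conventions in this table (in particular, the scalar arguments passed by address), the sketch below computes y = y - x, a dot product, and a 2-norm. It is not part of the distribution and assumes a real-arithmetic (non-complex) build so that Scalar values may be printed with %g.

#include "vec.h"

int main(int argc,char **args)
{
  Vec    x, y;
  int    ierr, n = 50;
  Scalar one = 1.0, two = 2.0, neg_one = -1.0, dot;
  double norm;

  PetscInitialize(&argc,&args,(char *)0,(char *)0);
  ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,n,&x); CHKERRA(ierr);
  ierr = VecDuplicate(x,&y); CHKERRA(ierr);
  ierr = VecSet(&one,x); CHKERRA(ierr);
  ierr = VecSet(&two,y); CHKERRA(ierr);

  ierr = VecAXPY(&neg_one,x,y); CHKERRA(ierr);      /* y = y - x      */
  ierr = VecDot(x,y,&dot); CHKERRA(ierr);           /* dot = x'*y     */
  ierr = VecNorm(y,NORM_2,&norm); CHKERRA(ierr);    /* norm = ||y||_2 */
  PetscPrintf(PETSC_COMM_WORLD,"dot %g, norm %g\n",dot,norm);

  ierr = VecDestroy(x); CHKERRA(ierr);
  ierr = VecDestroy(y); CHKERRA(ierr);
  PetscFinalize();
  return 0;
}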
For parallel vectors that are distributed across the processors by ranges,
it is possible to determine
a processor's local range with the routine
ierr = VecGetOwnershipRange(Vec vec,int *low,int *high);The argument low indicates the first component owned by the local processor, while high specifies one more than the last owned by the local processor. This command is useful, for instance, in assembling parallel vectors.
On occasion, the user needs to access the actual elements of the vector.
The routine VecGetArray()
returns a pointer to the elements local to the processor:
ierr = VecGetArray(Vec v,Scalar **array);When access to the array is no longer needed, the user should call
ierr = VecRestoreArray(Vec v, Scalar **array);Minor differences exist in the Fortran interface for VecGetArray() and VecRestoreArray(), as discussed in Section Array Arguments . It is important to note that VecGetArray() and VecRestoreArray() do not copy the vector elements; they merely give users direct access to the vector elements. Thus, these routines require essentially no time to call and can be used efficiently.
The number of elements stored locally can be accessed with
ierr = VecGetLocalSize(Vec v,int *size);The global vector length can be determined by
ierr = VecGetSize(Vec v,int *size);
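The following sketch combines VecGetLocalSize(), VecGetArray(), and VecRestoreArray() to sum the locally owned entries of a vector; it is illustrative only, uses no routines beyond those introduced above, and assumes a real-arithmetic build so that the Scalar sum may be printed with %g.

#include "vec.h"

int main(int argc,char **args)
{
  Vec    v;
  int    ierr, i, nlocal, n = 30;
  Scalar *array, one = 1.0, sum = 0.0;

  PetscInitialize(&argc,&args,(char *)0,(char *)0);
  ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,n,&v); CHKERRA(ierr);
  ierr = VecSet(&one,v); CHKERRA(ierr);

  /* Direct access to the locally owned entries; no copy is made */
  ierr = VecGetLocalSize(v,&nlocal); CHKERRA(ierr);
  ierr = VecGetArray(v,&array); CHKERRA(ierr);
  for (i=0; i<nlocal; i++) sum += array[i];
  ierr = VecRestoreArray(v,&array); CHKERRA(ierr);

  /* Each process reports the sum of its own entries */
  PetscPrintf(PETSC_COMM_SELF,"local sum %g\n",sum);

  ierr = VecDestroy(v); CHKERRA(ierr);
  PetscFinalize();
  return 0;
}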
When writing parallel PDE codes there is extra complexity caused by having multiple ways of indexing (numbering) and ordering objects such as vertices and degrees of freedom. For example, a grid generator or partitioner may renumber the nodes, requiring adjustment of the other data structures that refer to these objects; see Figure 10 . In addition, the local numbering (on a single processor) of objects may be different from the global (cross-processor) numbering. PETSc provides a variety of tools that help to manage the mapping among the various numbering systems. The two most basic are the AO (application ordering), which enables mapping between different global (cross-processor) numbering schemes, and the ISLocalToGlobalMapping, which allows mapping between local (on-processor) and global (cross-processor) numbering.
In many applications it is desirable to work with one or more ``orderings'' (or numberings) of degrees of freedom, cells, nodes, etc. Doing so in a parallel environment is complicated by the fact that each processor cannot keep complete lists of the mappings between different orderings. In addition, the orderings used in the PETSc linear algebra routines (often contiguous ranges) may not correspond to the ``natural'' orderings for the application.
PETSc provides certain utility routines that allow one to deal cleanly
and efficiently with the various orderings. To define a new application ordering
(called an AO in PETSc), one can call the routine
ierr = AOCreateBasic(MPI_Comm comm,int n,int *apordering,int *petscordering,AO *ao);The arrays apordering and petscordering, respectively, contain a list of integers in the application ordering and their corresponding mapped values in the PETSc ordering. Each processor can provide whatever subset of the ordering it chooses, but multiple processors should never contribute duplicate values. The argument n indicates the number of local contributed values.
For example, consider a vector of length five, where node 0 in the application ordering corresponds to node 3 in the PETSc ordering. In addition, nodes 1, 2, 3, and 4 of the application ordering correspond, respectively, to nodes 2, 1, 4, and 0 of the PETSc ordering. We can write this correspondence as
application ordering:  0, 1, 2, 3, 4
PETSc ordering:        3, 2, 1, 4, 0
The user can create the PETSc-AO mappings in a number of ways. For example,
if using two processors, one could call
int app1[] = {0,3}, petsc1[] = {3,4};
ierr = AOCreateBasic(PETSC_COMM_WORLD,2,app1,petsc1,&ao);
on the first processor and
int app2[] = {1,2,4}, petsc2[] = {2,1,0};
ierr = AOCreateBasic(PETSC_COMM_WORLD,3,app2,petsc2,&ao);
on the other processor.
Once the application ordering has been created, it can be used
with either of the commands
ierr = AOPetscToApplication(AO ao,int n,int *indices); ierr = AOApplicationToPetsc(AO ao,int n,int *indices);Upon input, the n-dimensional array indices specifies the indices to be mapped, while upon output, indices contains the mapped values. Since we, in general, employ a parallel database for the AO mappings, it is crucial that all processors that called AOCreateBasic() also call these routines; these routines cannot be called by just a subset of processors in the MPI communicator that was used in the call to AOCreateBasic().
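The sketch below gathers the above calls into a complete single-process program that builds the five-node ordering discussed earlier and then maps a list of application indices into the PETSc ordering. It is illustrative only and assumes that the AO routines are declared in the include file ao.h.

#include "ao.h"

int main(int argc,char **args)
{
  AO  ao;
  int ierr;
  int app[]     = {0,1,2,3,4};
  int petsc[]   = {3,2,1,4,0};
  int indices[] = {0,3,4};   /* indices given in the application ordering */

  PetscInitialize(&argc,&args,(char *)0,(char *)0);

  /* On a single process, this process contributes the entire ordering */
  ierr = AOCreateBasic(PETSC_COMM_WORLD,5,app,petsc,&ao); CHKERRA(ierr);

  /* indices[] is overwritten with the PETSc-ordering values 3, 4, 0 */
  ierr = AOApplicationToPetsc(ao,3,indices); CHKERRA(ierr);

  ierr = AODestroy(ao); CHKERRA(ierr);
  PetscFinalize();
  return 0;
}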
An alternative routine to create the application ordering, AO, is
ierr = AOCreateBasicIS(IS apordering,IS petscordering,AO *ao);where index sets are used instead of integer arrays.
The
mapping routines
ierr = AOPetscToApplicationIS(AO ao,IS indices); ierr = AOApplicationToPetscIS(AO ao,IS indices);
will map index sets (IS objects) between orderings. Both the AOXxxToYyy() and AOXxxToYyyIS() routines can be used regardless of whether the AO was created with AOCreateBasic() or AOCreateBasicIS().
The AO context should be destroyed with AODestroy(AO ao) and viewed with AOView(AO ao,Viewer viewer).
Although we refer to the two orderings as ``PETSc'' and ``application'' orderings, the user is free to use them both for application orderings and to maintain relationships among a variety of orderings by employing several AO contexts.
The AOXxxToYyy() routines allow negative entries in the input integer array. These entries are not mapped; they simply remain unchanged. This functionality enables, for example, mapping neighbor lists that use negative numbers to indicate nonexistent neighbors due to boundary conditions, etc.
In many applications one works with a global representation of a vector
(usually on a vector obtained with VecCreateMPI())
and a local representation of the same vector that includes ghost points
required for local computation.
PETSc provides routines to help map indices from a local numbering scheme to
the PETSc global numbering scheme. This is done via the following routines
ierr = ISLocalToGlobalMappingCreate(int N,int* globalnum,ISLocalToGlobalMapping* ctx); ierr = ISLocalToGlobalMappingApply(ISLocalToGlobalMapping ctx,int n,int *in,int *out); ierr = ISLocalToGlobalMappingApplyIS(ISLocalToGlobalMapping ctx,IS isin,IS* isout); ierr = ISLocalToGlobalMappingDestroy(ISLocalToGlobalMapping ctx);Here N denotes the number of local indices, globalnum contains the global number of each local number, and ISLocalToGlobalMapping is the resulting PETSc object that contains the information needed to apply the mapping with either ISLocalToGlobalMappingApply() or ISLocalToGlobalMappingApplyIS().
Note that the ISLocalToGlobalMapping routines serve a different purpose than the AO routines. In the former case they provide a mapping from a local numbering scheme (including ghost points) to a global numbering scheme, while in the latter they provide a mapping between two global numbering schemes. In fact, many applications may use both AO and ISLocalToGlobalMapping routines. The AO routines are first used to map from an application global ordering (that has no relationship to parallel processing etc.) to the PETSc ordering scheme (where each processor has a contiguous set of indices in the numbering). Then in order to perform function or Jacobian evaluations locally on each processor, one works with a local numbering scheme that includes ghost points. The mapping from this local numbering scheme back to the global PETSc numbering can be handled with the ISLocalToGlobalMapping routines.
If one is given a list of indices in a global numbering, the routine
ierr = ISGlobalToLocalMappingApply(ISLocalToGlobalMapping ctx, ISGlobalToLocalMappingType type,int nin, int *idxin,int *nout,int *idxout);
will provide a new list of indices in the local numbering. Again, negative values in idxin are left unmapped. But, in addition, if type is set to IS_GTOLM_MASK, then nout is set to nin and all global values in idxin that are not represented in the local to global mapping are replaced by -1. When type is set to IS_GTOLM_DROP, the values in idxin that are not represented locally in the mapping are not included in idxout, so that potentially nout is smaller than nin. One must pass in an array long enough to hold all the indices. One can call ISGlobalToLocalMappingApply() with idxout equal to PETSC_NULL to determine the required length (returned in nout) and then allocate the required space and call ISGlobalToLocalMappingApply() a second time to set the values.
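The two-call pattern just described is sketched below for the IS_GTOLM_DROP case. The example is illustrative only; it uses the C library malloc() rather than any PETSc allocation routine, and it assumes that the ISLocalToGlobalMapping routines are declared in the include file is.h.

#include <stdlib.h>
#include "is.h"

int main(int argc,char **args)
{
  ISLocalToGlobalMapping ctx;
  int ierr, nout, *idxout;
  int globalnum[] = {6,7,8,9};   /* global number of each local index 0..3 */
  int idxin[]     = {7,9,12};    /* global indices; 12 is not present locally */

  PetscInitialize(&argc,&args,(char *)0,(char *)0);
  ierr = ISLocalToGlobalMappingCreate(4,globalnum,&ctx); CHKERRA(ierr);

  /* First call with idxout = PETSC_NULL only to obtain the required length */
  ierr = ISGlobalToLocalMappingApply(ctx,IS_GTOLM_DROP,3,idxin,&nout,PETSC_NULL); CHKERRA(ierr);
  idxout = (int *) malloc(nout*sizeof(int));

  /* Second call fills idxout with the local numbers of 7 and 9, namely 1 and 3 */
  ierr = ISGlobalToLocalMappingApply(ctx,IS_GTOLM_DROP,3,idxin,&nout,idxout); CHKERRA(ierr);

  free(idxout);
  ierr = ISLocalToGlobalMappingDestroy(ctx); CHKERRA(ierr);
  PetscFinalize();
  return 0;
}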
Often it is convenient to set elements into a vector using the local node
numbering rather than the global node numbering (e.g., each processor may
maintain its own sublist of vertices and elements and number them locally).
To set values into a vector with the local numbering, one must first call
ierr = VecSetLocalToGlobalMapping(Vec v,ISLocalToGlobalMapping ctx);and then call
ierr = VecSetValuesLocal(Vec x,int n,int *indices,Scalar *values,INSERT_VALUES);
Now the indices use the local numbering rather than the global.
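The following sketch shows the two calls together for a vector whose ``local'' numbering is simply 0 through nlocal-1 for the locally owned entries (a real application would also include ghost points in the mapping); it is illustrative only and uses routines introduced earlier in this chapter.

#include "vec.h"

int main(int argc,char **args)
{
  Vec                    x;
  ISLocalToGlobalMapping ltog;
  int                    ierr, i, rstart, rend, nlocal, n = 20;
  int                    gindices[20];
  Scalar                 v;

  PetscInitialize(&argc,&args,(char *)0,(char *)0);
  ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,n,&x); CHKERRA(ierr);
  ierr = VecGetOwnershipRange(x,&rstart,&rend); CHKERRA(ierr);
  nlocal = rend - rstart;

  /* Local index i corresponds to global index rstart + i on this process */
  for (i=0; i<nlocal; i++) gindices[i] = rstart + i;
  ierr = ISLocalToGlobalMappingCreate(nlocal,gindices,&ltog); CHKERRA(ierr);
  ierr = VecSetLocalToGlobalMapping(x,ltog); CHKERRA(ierr);

  /* Values are now set using the local numbering */
  for (i=0; i<nlocal; i++) {
    v = (Scalar) (rstart + i);
    ierr = VecSetValuesLocal(x,1,&i,&v,INSERT_VALUES); CHKERRA(ierr);
  }
  ierr = VecAssemblyBegin(x); CHKERRA(ierr);
  ierr = VecAssemblyEnd(x); CHKERRA(ierr);

  ierr = VecDestroy(x); CHKERRA(ierr);
  ierr = ISLocalToGlobalMappingDestroy(ltog); CHKERRA(ierr);
  PetscFinalize();
  return 0;
}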
Distributed arrays (DAs), which are used in conjunction with PETSc vectors, are intended for use with regular rectangular grids when communication of nonlocal data is needed before certain local computations can occur. PETSc distributed arrays are designed only for the case in which data can be thought of as being stored in a standard multidimensional array; thus, DAs are not intended for parallelizing unstructured grid problems, etc. DAs are intended for communicating vector (field) information; they are not for storing matrices. See Section [(ref sec:bdiag)] for sparse matrix data structures intended specifically for problems defined on rectangular grids, but note that the general purpose sparse matrix formats discussed in the next chapter are also appropriate (in fact, on RISC-based processors one often sees no performance gain by using grid-specific sparse matrix storage schemes).
For example, a typical situation one encounters in solving PDEs in parallel is that, to evaluate a local function, f(x), each processor requires its local portion of the vector x as well as its ghost points (the bordering portions of the vector that are owned by neighboring processors). Figure 9 illustrates the ghost points for the seventh processor of a two-dimensional, regular parallel grid. Each box represents a processor; the ghost points for the seventh processor's local part of a parallel array are shown in gray.
The PETSc DA object manages the parallel communication required while working with data stored in regular arrays. The actual data is stored in appropriately sized vector objects; the DA object contains only the parallel data layout information and communication information.
One creates a distributed array communication data structure
in two dimensions with the command
ierr = DACreate2d(MPI_Comm comm,DAPeriodicType wrap,DAStencilType st,int M, int N,int m,int n,int dof,int s,int *lx,int *ly,DA *da);The arguments M and N indicate the global numbers of grid points in each direction, while m and n denote the processor partition in each direction; m*n must equal the number of processors in the MPI communicator, comm. Instead of specifying the processor layout, one may use PETSC_DECIDE for m and n so that PETSc will determine the partition using MPI. The type of periodicity of the array is specified by wrap, which can be DA_NONPERIODIC (no periodicity), DA_XYPERIODIC (periodic in both x- and y-directions), DA_XPERIODIC, or DA_YPERIODIC. The argument dof indicates the number of degrees of freedom at each array point, and s is the stencil width (i.e., the width of the ghost point region). The optional arrays lx and ly may contain the number of nodes along the x and y axes for each cell; that is, lx has dimension m and ly has dimension n. Alternatively, PETSC_NULL may be passed in for both.
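For example, the following call (a sketch with arbitrary grid sizes) creates a 100 by 100 nonperiodic array with one degree of freedom per grid point, a box stencil of width one (the stencil types are discussed below), and a processor partition chosen by PETSc:

DA  da;
int ierr;

ierr = DACreate2d(PETSC_COMM_WORLD,DA_NONPERIODIC,DA_STENCIL_BOX,
                  100,100,PETSC_DECIDE,PETSC_DECIDE,1,1,
                  PETSC_NULL,PETSC_NULL,&da);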
Two types of distributed array communication data structures can be created, as specified by st. Star-type stencils that radiate outward only in the coordinate directions are indicated by DA_STENCIL_STAR, while box-type stencils are specified by DA_STENCIL_BOX. For example, for the two-dimensional case, DA_STENCIL_STAR with width 1 corresponds to the standard 5-point stencil, while DA_STENCIL_BOX with width 1 denotes the standard 9-point stencil. In both instances the ghost points are identical, the only difference being that with star-type stencils certain ghost points are ignored, potentially decreasing substantially the number of messages sent. Note that the DA_STENCIL_STAR stencils can save interprocessor communication in two and three dimensions.
These DA stencils have nothing directly to do with any finite difference stencils one might choose to use for a discretization; they only ensure that the correct values are in place for application of a user-defined finite difference stencil (or any other discretization technique).
The commands for creating distributed array communication data structures
in one and three dimensions are analogous:
ierr = DACreate1d(MPI_Comm comm,DAPeriodicType wrap,int M,int w,int s,int *lc,DA *inra); ierr = DACreate3d(MPI_Comm comm,DAPeriodicType wrap,DAStencilType stencil_type, int M,int N,int P,int m,int n,int p,int w,int s,int *lx, int *ly,int *lz,DA *inra);DA_ZPERIODIC, DA_XZPERIODIC, DA_YZPERIODIC, and DA_XYZPERIODIC are additional options in three dimensions for DAPeriodicType. The routines to create distributed arrays are collective, so that all processors in the communicator comm must call DACreateXXX().
Each DA object gives you access to two vectors: a distributed global vector
and a local vector that includes room for the appropriate ghost points. These
vectors can be accessed
with the routines
ierr = DACreateGlobalVector(DA da,Vec *g); ierr = DACreateLocalVector(DA da,Vec *l);These two vectors will generally serve as the building blocks for local and global PDE solutions, etc. Note that calling DACreateGlobalVector() or DACreateLocalVector() does not create a new vector object, but rather extracts the one existing vector of its type from the distributed array. Thus, if additional vectors are needed in a code, they can be obtained by duplicating l or g via VecDuplicate() or VecDuplicateVecs().
We emphasize that a distributed array really consists of two parts: (1) the array of data within the vector being distributed and (2) the information needed to communicate the ghost point information between processes. In most cases, several different arrays could share the same communication information. The design of the distributed array object makes this easy. Each DA operation may operate on a vector of the appropriate size. The two obvious options are the local and global vectors created by the DA creation routines (e.g., DACreate2d()) and accessible through DACreateLocalVector() and DACreateGlobalVector(), but, as mentioned above, vectors created via VecDuplicate() on these may also be used. This is why there are vector arguments to DALocalToGlobal(), etc. (see below).
PETSc currently provides no container for multiple arrays sharing the same distributed array communication; note, however, that the dof parameter handles many cases of interest.
At certain stages of many applications, there is a need to work
on a local portion of the vector, including the ghost points.
This may be done by scattering a global vector into its
local parts by using the two-stage commands
ierr = DAGlobalToLocalBegin(DA da,Vec g,InsertMode iora,Vec l); ierr = DAGlobalToLocalEnd(DA da,Vec g,InsertMode iora,Vec l);which allow the overlap of communication and computation. Since the global and local vectors, given by g and l, respectively, must be compatible with the distributed array, da, they should be generated by DACreateGlobalVector() and DACreateLocalVector() (or be duplicates of such a vector obtained via VecDuplicate()). The InsertMode can be either ADD_VALUES or INSERT_VALUES.
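A typical usage pattern is sketched below; it assumes the distributed array da was created as above and that the global vector is filled with data before the scatter.

Vec    g, l;
Scalar *larray;
int    ierr;

ierr = DACreateGlobalVector(da,&g);
ierr = DACreateLocalVector(da,&l);
/* ... fill g (for example with VecSetValues() or a previous computation) ... */
ierr = DAGlobalToLocalBegin(da,g,INSERT_VALUES,l);
/* unrelated local work may be performed here while the messages are in transit */
ierr = DAGlobalToLocalEnd(da,g,INSERT_VALUES,l);
ierr = VecGetArray(l,&larray);
/* ... local computation using larray, which includes the ghost values ... */
ierr = VecRestoreArray(l,&larray);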
One can scatter the local patches into the distributed vector
with the command
ierr = DALocalToGlobal(DA da,Vec l,InsertMode mode,Vec g);Note that this function is not subdivided into beginning and ending phases, since it is purely local.
A third type of distributed array scatter is from a local
vector (including ghost points that contain irrelevant values) to
a local vector with correct ghost point values.
This scatter may be done by
commands
ierr = DALocalToLocalBegin(DA da,Vec l1,InsertMode iora,Vec l2); ierr = DALocalToLocalEnd(DA da,Vec l1,InsertMode iora,Vec l2);Since both local vectors, l1 and l2, must be compatible with the distributed array, da, they should be generated by DACreateLocalVector() (or be duplicates of such vectors obtained via VecDuplicate()). The InsertMode can be either ADD_VALUES or INSERT_VALUES.
It is possible to directly access the vector scatter contexts (see below)
used in the local-to-global ( ltog), global-to-local
( gtol), and local-to-local ( ltol)
scatters with the command
ierr = DAGetScatter(DA da,VecScatter *ltog,VecScatter *gtol,VecScatter *ltol);Most users should not need to use these contexts.
The global indices of the lower left corner of the local portion of the array
as well as the local array size can be obtained with the commands
ierr = DAGetCorners(DA da,int *x,int *y,int *z,int *m,int *n,int *p); ierr = DAGetGhostCorners(DA da,int *x,int *y,int *z,int *m,int *n,int *p);The first version excludes any ghost points, while the second version includes them. The routine DAGetGhostCorners() deals with the fact that subarrays along boundaries of the problem domain have ghost points only on their interior edges, but not on their boundary edges.
When either type of stencil is used, DA_STENCIL_STAR or DA_STENCIL_BOX, the local vectors (with the ghost points) represent rectangular arrays, including the extra corner elements in the DA_STENCIL_STAR case. This configuration provides simple access to the elements by employing two- (or three-) dimensional indexing. The only difference between the two cases is that when DA_STENCIL_STAR is used, the extra corner components are not scattered between the processors and thus contain undefined values that should not be used.
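The sketch below (illustrative only) shows how the corner routines can be combined with two-dimensional indexing; it assumes a two-dimensional DA with one degree of freedom per point and a local (ghosted) vector l obtained from DACreateLocalVector(), whose entries are stored with the x index varying fastest.

int    xs,ys,zs,xm,ym,zm;        /* start and width of the owned portion        */
int    gxs,gys,gzs,gxm,gym,gzm;  /* start and width including the ghost points  */
int    i,j,ierr;
Scalar *a;

ierr = DAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);
ierr = DAGetGhostCorners(da,&gxs,&gys,&gzs,&gxm,&gym,&gzm);
ierr = VecGetArray(l,&a);
for (j=ys; j<ys+ym; j++) {
  for (i=xs; i<xs+xm; i++) {
    a[(j-gys)*gxm + (i-gxs)] = 0.0;   /* replace with the actual stencil computation */
  }
}
ierr = VecRestoreArray(l,&a);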
To assemble global stiffness matrices, one needs either (1) to determine the global node number of each local node, including the ghost nodes; this number may be obtained by using the command
ierr = DAGetGlobalIndices(DA da,int *n,int **idx);The output argument n contains the number of local nodes, including ghost nodes, while idx contains a list of the global indices that correspond to the local nodes. Note that the Fortran interface differs slightly; see Section Array Arguments for details.
or (2) to set up the vectors and matrices so that their entries may be added using the local numbering. This is done by first calling
ierr = DAGetISLocalToGlobalMapping(DA da,ISLocalToGlobalMapping *map);followed by
ierr = VecSetLocalToGlobalMapping(Vec x,ISLocalToGlobalMapping map); ierr = MatSetLocalToGlobalMapping(Mat A,ISLocalToGlobalMapping map);Now entries may be added to the vector and matrix using the local numbering and VecSetValuesLocal() and MatSetValuesLocal().
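A small sketch of this approach follows; the matrix A is assumed to have been created already, and the calling sequence of MatSetValuesLocal() is assumed to mirror that of MatSetValues().

ISLocalToGlobalMapping ltog;
Mat                    A;            /* created earlier, e.g. with MatCreate()  */
int                    row = 0, col[2] = {0,1};   /* local (ghosted) indices    */
Scalar                 v[2] = {2.0,-1.0};
int                    ierr;

ierr = DAGetISLocalToGlobalMapping(da,&ltog);
ierr = MatSetLocalToGlobalMapping(A,ltog);
ierr = MatSetValuesLocal(A,1,&row,2,col,v,INSERT_VALUES);
/* followed, as usual, by MatAssemblyBegin()/MatAssemblyEnd() */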
Since the global ordering that PETSc uses to manage its parallel vectors
(and matrices) does not usually correspond to the ``natural'' ordering
of a two- or three-dimensional array, the DA structure provides
an application ordering AO (see Section Application Orderings
) that maps
between the natural ordering on a rectangular grid and the ordering PETSc
uses to parallelize. This ordering context can be obtained with the command
ierr = DAGetAO(DA da,AO *ao);In Figure 10 we indicate the orderings for a two-dimensional distributed array, divided among four processors.
To facilitate general vector scatters and gathers used, for example, in updating ghost points for problems defined on unstructured grids, PETSc employs the concept of an index set. An index set, which is a generalization of a set of integer indices, is used to define scatters, gathers, and similar operations on vectors and matrices.
The following command creates an index set based on a list
of integers:
ierr = ISCreateGeneral(MPI_Comm comm,int n,int *indices, IS *is);This routine essentially copies the n indices passed to it by the integer array indices. Thus, the user should be sure to free the integer array indices when it is no longer needed, perhaps directly after the call to ISCreateGeneral(). The communicator, comm, should consist of all processors that will be using the IS.
Another standard index set is defined by a starting point ( first) and a
stride ( step), and can be created with the command
ierr = ISCreateStride(MPI_Comm comm,int n,int first,int step,IS *is);Index sets can be destroyed with the command
ierr = ISDestroy(IS is);On rare occasions the user may have to access information directly from an index set. Several commands assist in this process:
ierr = ISGetSize(IS is,int *size); ierr = ISStrideGetInfo(IS is,int *first,int *stride); ierr = ISGetIndices(IS is,int **indices);The function ISGetIndices() returns a pointer to a list of the indices in the index set. For certain index sets, this may be a temporary array of indices created specifically for a given routine. Thus, once the user finishes using the array of indices, the routine
ierr = ISRestoreIndices(IS is, int **indices);should be called to ensure that the system can free the space it may have used to generate the list of indices.
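For example, the following sketch creates a stride index set containing the ten even indices 0,2,...,18, examines its entries, and then destroys it:

IS  is;
int n, *idx, i, ierr;

ierr = ISCreateStride(PETSC_COMM_SELF,10,0,2,&is);
ierr = ISGetSize(is,&n);
ierr = ISGetIndices(is,&idx);
for (i=0; i<n; i++) {
  /* idx[i] is the ith index in the set (here 2*i) */
}
ierr = ISRestoreIndices(is,&idx);
ierr = ISDestroy(is);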
A blocked version of the index sets can be created with the command
ierr = ISCreateBlock(MPI_Comm comm,int bs,int n,int *indices, IS *is);This version is used for defining operations in which each element of the index set refers to a block of bs vector entries. Related routines analogous to those described above exist as well, including ISBlockGetIndices(), ISBlockGetSize(), ISBlockGetBlockSize(), and ISBlock(). See the man pages for details.
PETSc vectors have full support for general scatters and gathers. One can select any subset of the components of a vector to insert or add to any subset of the components of another vector. We refer to these operations as generalized scatters, though they are actually a combination of scatters and gathers.
To copy selected components from one vector
to another, one uses the following set of commands:
ierr = VecScatterCreate(Vec x,IS ix,Vec y,IS iy,VecScatter *ctx); ierr = VecScatterBegin(Vec x,Vec y,INSERT_VALUES,SCATTER_FORWARD,VecScatter ctx); ierr = VecScatterEnd(Vec x,Vec y,INSERT_VALUES,SCATTER_FORWARD,VecScatter ctx); ierr = VecScatterDestroy(VecScatter ctx);Here ix denotes the index set of the first vector, while iy indicates the index set of the destination vector. The vectors can be parallel or sequential. The only requirements are that the number of entries in the index set of the first vector, ix, equal the number in the destination index set, iy, and that the vectors be long enough to contain all the indices referred to in the index sets. The argument INSERT_VALUES specifies that the vector elements will be inserted into the specified locations of the destination vector, overwriting any existing values. To add the components, rather than insert them, the user should select the option ADD_VALUES instead of INSERT_VALUES.
To perform a conventional gather operation, the user simply makes the destination index set, iy, be a stride index set with a stride of one. Similarly, a conventional scatter can be done with an initial (sending) index set consisting of a stride. For parallel vectors, all processors that own the vector must call the scatter routines. When scattering from a parallel vector to sequential vectors, each processor has its own sequential vector that receives values from locations as indicated in its own index set. Similarly, in scattering from sequential vectors to a parallel vector, each processor has its own sequential vector that makes contributions to the parallel vector.
Caution: When INSERT_VALUES is used, if two different processors contribute different values to the same component in a parallel vector, either value may end up being inserted. When ADD_VALUES is used, the correct sum is added to the correct location.
In some cases one may wish to ``undo'' a scatter, that is perform the
scatter backwards switching the roles of the sender and receiver. This is
done by using
ierr = VecScatterBegin(Vec y,Vec x,INSERT_VALUES,SCATTER_REVERSE,VecScatter ctx); ierr = VecScatterEnd(Vec y,Vec x,INSERT_VALUES,SCATTER_REVERSE,VecScatter ctx);Note that the roles of the first two arguments to these routines must be swapped whenever the SCATTER_REVERSE option is used.
Once a VecScatter object has been created it may be used with any vectors that have the appropriate parallel data layout. That is, one can call VecScatterBegin() and VecScatterEnd() with different vectors than used in the call to VecScatterCreate() so long as they have the same parallel layout (number of elements on each processor are the same). Usually, these ``different'' vectors would have been obtained via calls to VecDuplicate() from the original vectors used in the call to VecScatterCreate().
There is no PETSc routine that is the opposite of VecSetValues() , that is, VecGetValues(). Instead, the user should create a new vector where the components are to be stored and perform the appropriate vector scatter. For example, if one desires to obtain the values of the 100th and 200th entries of a parallel vector, p, one could use a code such as that within Figure 11 . In this example, the values of the 100th and 200th components are placed in the array values. In this example each processor now has the 100th and 200th component, but obviously each processor could gather any elements it needed, or none by creating an index set with no entries.
Vec        p, x;              /* initial vector, destination vector */
VecScatter scatter;           /* scatter context */
IS         from, to;          /* index sets that define the scatter */
Scalar     *values;
int        idx_from[] = {100,200}, idx_to[] = {0,1};

VecCreateSeq(PETSC_COMM_SELF,2,&x);
ISCreateGeneral(PETSC_COMM_SELF,2,idx_from,&from);
ISCreateGeneral(PETSC_COMM_SELF,2,idx_to,&to);
VecScatterCreate(p,from,x,to,&scatter);
VecScatterBegin(p,x,INSERT_VALUES,SCATTER_FORWARD,scatter);
VecScatterEnd(p,x,INSERT_VALUES,SCATTER_FORWARD,scatter);
VecGetArray(x,&values);
ISDestroy(from);
ISDestroy(to);
VecScatterDestroy(scatter);
The scatters provide a very general method for managing the communication of
required ghost values for unstructured grid computations. One scatters
the global vector into a local ``ghosted'' work vector, performs the computation
on the local work vectors, and then scatters back into the global solution
vector. In the simplest case this may be written as
Function: (Input Vec globalin, Output Vec globalout)

ierr = VecScatterBegin(Vec globalin,Vec localin,InsertMode INSERT_VALUES,ScatterMode SCATTER_FORWARD,VecScatter scatter);
ierr = VecScatterEnd(Vec globalin,Vec localin,InsertMode INSERT_VALUES,ScatterMode SCATTER_FORWARD,VecScatter scatter);
/* For example, do local calculations from localin to localout */
ierr = VecScatterBegin(Vec localout,Vec globalout,InsertMode ADD_VALUES,ScatterMode SCATTER_REVERSE,VecScatter scatter);
ierr = VecScatterEnd(Vec localout,Vec globalout,InsertMode ADD_VALUES,ScatterMode SCATTER_REVERSE,VecScatter scatter);
We recommend that application developers skip this section on a first reading. It contains information about more advanced use of PETSc vectors to improve efficiency slightly. Once an application code is fully debugged and optimized these techniques can be tried to slightly decrease memory use and improve computation speed.
There are two minor drawbacks to the basic approach described above: the extra memory required for the local work vector, which duplicates the owned portion of the global vector, and the extra time required to copy the local values from the global vector into the work vector. Both can be avoided by creating vectors with preallocated ghost-point space, using
ierr = VecCreateGhost(MPI_Comm comm,int n,int N,int nghost,int *ghosts,Vec *vv)or
ierr = VecCreateGhostWithArray(MPI_Comm comm,int n,int N,int nghost,int *ghosts, Scalar *array,Vec *vv)Here n is the number of local vector entries, N is the number of global entries (or PETSC_NULL) and nghost is the number of ghost entries. The array ghosts is of size nghost and contains the global vector location for each local ghost location. Using VecDuplicate() or VecDuplicateVecs() on a ghosted vector will generate additional ghosted vectors.
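As an illustration (with arbitrary sizes), the sketch below gives each processor six owned entries plus two ghost entries taken from its neighboring processors; periodic neighbors are used purely so that the fragment is valid on every processor.

Vec vv;
int ghosts[2], rank, size, ierr;

MPI_Comm_rank(PETSC_COMM_WORLD,&rank);
MPI_Comm_size(PETSC_COMM_WORLD,&size);
/* global locations of the two ghost entries: the entry just before and just
   after this processor's owned range (wrapping around at the ends) */
ghosts[0] = (6*rank + 6*size - 1) % (6*size);
ghosts[1] = (6*(rank+1)) % (6*size);
ierr = VecCreateGhost(PETSC_COMM_WORLD,6,6*size,2,ghosts,&vv);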
In many ways a ghosted vector behaves just like any other MPI vector created
by VecCreateMPI(), the difference is that the ghosted vector has an additional
``local'' representation that allows one to access the ghost locations. This is done
through the call to
ierr = VecGhostGetLocalRepresentation(Vec g,Vec *l);The vector l is a sequential representation of the parallel vector g that shares the same array space (and hence numerical values) but allows one to access the ``ghost'' values past the end of the array. Note that one accesses the entries in l using the local numbering of elements and ghosts, while they are accessed in g using the global numbering.
A common usage of a ghosted vector is given by
ierr = VecGhostUpdateBegin(Vec globalin,InsertMode INSERT_VALUES,ScatterMode SCATTER_FORWARD); ierr = VecGhostUpdateEnd(Vec globalin,InsertMode INSERT_VALUES,ScatterMode SCATTER_FORWARD); ierr = VecGhostGetLocalRepresentation(Vec globalin,Vec *localin); ierr = VecGhostGetLocalRepresentation(Vec globalout,Vec *localout); /* Do local calculations from localin to localout */ ierr = VecGhostRestoreLocalRepresentation(Vec globalin,Vec *localin); ierr = VecGhostRestoreLocalRepresentation(Vec globalout,Vec *localout); ierr = VecGhostUpdateBegin(Vec globalout,InsertMode ADD_VALUES,ScatterMode SCATTER_REVERSE); ierr = VecGhostUpdateEnd(Vec globalout,InsertMode ADD_VALUES,ScatterMode SCATTER_REVERSE);The routines VecGhostUpdateBegin/End() are equivalent to the routines VecScatterBegin/End() above except that since they are scattering into the ghost locations, they do not need to copy the local vector values, which are already in place. In addition, the user does not have to allocate the local work vector, since the ghosted vector already has allocated slots to contain the ghost values.
The input arguments INSERT_VALUES and SCATTER_FORWARD cause the ghost values to be correctly updated from the appropriate processor. The arguments ADD_VALUES and SCATTER_REVERSE update the ``local'' portions of the vector from all the other processors' ghost values. This would be appropriate, for example, when performing a finite element assembly of a load vector.
Section Partitioning discusses the important topic of partitioning an unstructured grid.
PETSc 2.0 provides a variety of matrix implementations because no single matrix format is appropriate for all problems. Currently we support dense storage and compressed sparse row storage (both sequential and parallel versions), as well as several specialized formats. Additional formats can be added.
This chapter describes the basics of using PETSc matrices in general (regardless of the particular format chosen) and discusses tips for efficient use of the several simple uniprocessor and parallel matrix types. Details regarding the ever-expanding suite of PETSc matrices are given in Section . The use of PETSc matrices involves the following actions: create a particular type of matrix, insert values into it, process the matrix, use the matrix for various computations, and finally destroy the matrix. The application code does not need to know or care about the particular storage formats of the matrices.
The simplest routine for forming a PETSc matrix, A, is
ierr = MatCreate(MPI_Comm comm,int M,int N,Mat *A)This routine generates a sequential matrix when running on one processor and a parallel matrix for two or more processors; the particular matrix format is set by the user via options database commands. The user specifies only the global matrix dimensions, given by M and N, while PETSc determines the appropriate local dimensions and completely controls memory allocation. This routine facilitates switching among various matrix types, for example, to determine the format that is most efficient for a certain application. By default, MatCreate() employs the sparse AIJ format, which is discussed in detail in Section Sparse Matrices . See the manual pages for further information about available matrix formats.
To insert or add entries to a matrix, one can call a variant of
MatSetValues, either
ierr = MatSetValues(Mat A,int m,int *im,int n,int *in,Scalar *values,INSERT_VALUES);or
ierr = MatSetValues(Mat A,int m,int *im,int n,int *in,Scalar *values,ADD_VALUES);This routine inserts or adds a logically dense subblock of dimension m*n into the matrix. The integer indices im and in, respectively, indicate the global row and column numbers to be inserted. MatSetValues() uses the standard C convention, where the row and column matrix indices begin with zero regardless of the storage format employed. The array values is logically two-dimensional, containing the values that are to be inserted. By default the values are given in row major order, which is the opposite of the Fortran convention. To allow the insertion of values in column major order, one can call the command
ierr = MatSetOption(Mat A,MAT_COLUMN_ORIENTED);Warning: Several of the sparse implementations do not currently support the column-oriented option!
This notation mirrors that of Matlab. For example, to insert one matrix into another when using Matlab, one uses the command A(im,in) = B, where im and in contain the indices for the rows and columns. This action is identical to the calls above to MatSetValues().
When using the block compressed sparse row matrix format ( MATSEQBAIJ or MATMPIBAIJ), one can insert elements more efficiently using the block variant, MatSetValuesBlocked().
The function MatSetOption() accepts several other inputs; see
the manual page for details. We
discuss two of these options, which are related to the efficiency of the
assembly process. To indicate to PETSc that the row ( im) or
column ( in) indices set with MatSetValues() are sorted,
one uses the command
ierr = MatSetOption(Mat A,MAT_ROWS_SORTED);or
ierr = MatSetOption(Mat A,MAT_COLUMNS_SORTED);Note that these flags indicate the format of the data passed in with MatSetValues(); they do not have anything to do with how the sparse matrix data is stored internally in PETSc.
After the matrix elements have been inserted or added into the matrix,
it must be processed before it can be used. The routines for matrix
processing are
ierr = MatAssemblyBegin(Mat A,MAT_FINAL_ASSEMBLY); ierr = MatAssemblyEnd(Mat A,MAT_FINAL_ASSEMBLY);By placing other code between these two calls, the user can perform computations while messages are in transition. Calls to MatSetValues() with the INSERT_VALUES and ADD_VALUES options cannot be mixed without intervening calls to the assembly routines. For such intermediate assembly calls the second routine argument typically should be MAT_FLUSH_ASSEMBLY, which omits some of the work of the full assembly process. MAT_FINAL_ASSEMBLY is required only in the last matrix assembly before a matrix is used.
Even though one may insert values into PETSc matrices without regard
to which processor eventually stores them, for efficiency
reasons we usually recommend generating most entries on the
processor where they are destined to be stored. To help the
application programmer with this task for matrices that are
distributed across the processors by ranges, the routine
ierr = MatGetOwnershipRange(Mat A,int *first_row,int *last_row);informs the user that all rows from first_row to last_row-1 will be stored on the local processor.
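For instance, the following sketch assembles the locally owned rows of a 100 by 100 tridiagonal matrix; it is an illustration only, not one of the distributed PETSc examples.

Mat    A;
int    first_row, last_row, i, col[3], ierr;
Scalar v[3];

ierr = MatCreate(PETSC_COMM_WORLD,100,100,&A);
ierr = MatGetOwnershipRange(A,&first_row,&last_row);
for (i=first_row; i<last_row; i++) {
  col[0] = i-1;  col[1] = i;   col[2] = i+1;
  v[0]   = -1.0; v[1]  = 2.0;  v[2]   = -1.0;
  if (i == 0)       {ierr = MatSetValues(A,1,&i,2,&col[1],&v[1],INSERT_VALUES);}
  else if (i == 99) {ierr = MatSetValues(A,1,&i,2,col,v,INSERT_VALUES);}
  else              {ierr = MatSetValues(A,1,&i,3,col,v,INSERT_VALUES);}
}
ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);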
In the sparse matrix implementations, once the assembly routines have been called, the matrices are compressed and can be used for matrix-vector multiplication, etc. Inserting new values into the matrix at this point will be expensive, since it requires copies and possible memory allocation. Thus, whenever possible one should completely set the values in the matrices before calling the final assembly routines.
If one wishes to repeatedly assemble matrices that retain the same
nonzero pattern (such as within a nonlinear or time-dependent
problem), the option
ierr = MatSetOption(Mat mat,MAT_NO_NEW_NONZERO_LOCATIONS);should be specified after the first matrix has been fully assembled. This option ensures that certain data structures and communication information will be reused (instead of regenerated) during successive steps, thereby increasing efficiency. See ${PETSC_DIR}/src/sles/examples/tutorials/ex5.c for a simple example of solving two linear systems that use the same matrix data structure.
The default matrix representation within PETSc is the general sparse AIJ format (also called the Yale sparse matrix format or compressed sparse row format, CSR). This section discusses tips for efficiently using this matrix format for large-scale applications. Additional formats (such as block compressed row and block diagonal storage, which are generally much more efficient for problems with multiple degrees of freedom per node) are further discussed in Section . Beginning users need not concern themselves initially with such details and may wish to proceed directly to Section Basic Matrix Operations . However, when an application code progresses to the point of tuning for efficiency and/or generating timing results, it is crucial to read this information.
In the PETSc AIJ matrix formats, we store the nonzero elements by rows, along with an array of corresponding column numbers and an array of pointers to the beginning of each row. Note that the diagonal matrix entries are stored with the rest of the nonzeros (not separately).
To create a sequential AIJ sparse matrix, A,
with m rows and n columns,
one uses the command
ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,int m,int n,int nz,int *nnz,Mat *A);where nz or nnz can be used to preallocate matrix memory, as discussed below. The user can set nz=0 and nnz=PETSC_NULL for PETSc to control all matrix memory allocation.
The sequential and parallel AIJ matrix storage formats by default employ i-nodes (identical nodes) when possible. We search for consecutive rows with the same nonzero structure, thereby reusing matrix information for increased efficiency. Related options database keys are -mat_aij_no_inode (do not use inodes) and -mat_aij_inode_limit <limit> (set inode limit (max limit=5)). Note that problems with a single degree of freedom per grid node will automatically not use I-nodes.
By default the internal data representation for the AIJ formats employs zero-based indexing. For compatibility with standard Fortran storage, thus enabling use of external Fortran software packages such as SPARSKIT, the option -mat_aij_oneindex enables one-based indexing, where the stored row and column indices begin at one, not zero. All user calls to PETSc routines, regardless of this option, use zero-based indexing.
The dynamic process of allocating new memory and copying from the old storage to the new is intrinsically very expensive. Thus, to obtain good performance when assembling an AIJ matrix, it is crucial to preallocate the memory needed for the sparse matrix. The user has two choices for preallocating matrix memory via MatCreateSeqAIJ().
One can use the scalar nz to specify the expected number of nonzeros for each row. This is generally fine if the number of nonzeros per row is roughly the same throughout the matrix (or as a quick and easy first step for preallocation). If one underestimates the actual number of nonzeros in a given row, then during the assembly process PETSc will automatically allocate additional needed space. However, this extra memory allocation can slow the computation.
If different rows have very different numbers of nonzeros, one should attempt to indicate (nearly) the exact number of elements intended for the various rows with the optional array nnz of length m, where m is the number of rows, for example
int nnz[m]; nnz[0] = <nonzeros in row 0> nnz[1] = <nonzeros in row 1> .... nnz[m-1] = <nonzeros in row m-1>In this case, the assembly process will require no additional memory allocations if the nnz estimates are correct. If, however, the nnz estimates are incorrect, PETSc will automatically obtain the additional needed space, at a slight loss of efficiency.
Using the array nnz to preallocate memory is especially important for efficient matrix assembly if the number of nonzeros varies considerably among the rows. One can generally set nnz either by knowing in advance the problem structure (e.g., the stencil for finite difference problems on a structured grid) or by precomputing the information by using a segment of code similar to that for the regular matrix assembly. The overhead of determining the nnz array will be quite small compared with the overhead of the inherently expensive mallocs and moves of data that are needed for dynamic allocation during matrix assembly.
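For instance, for the standard 5-point stencil on an mx by my structured grid (sizes chosen arbitrarily here), the nnz array could be computed as sketched below before the matrix is created:

Mat A;
int mx = 10, my = 10;            /* hypothetical grid dimensions          */
int nnz[100];                    /* one entry per row, i.e. mx*my entries */
int i, j, row, ierr;

for (j=0; j<my; j++) {
  for (i=0; i<mx; i++) {
    row      = j*mx + i;
    nnz[row] = 5;                        /* interior point: 5-point stencil   */
    if (i == 0 || i == mx-1) nnz[row]--; /* one neighbor missing at an x edge */
    if (j == 0 || j == my-1) nnz[row]--; /* one neighbor missing at a y edge  */
  }
}
ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,mx*my,mx*my,0,nnz,&A);
/* a second loop over the grid then inserts the entries with MatSetValues() */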
Thus, when assembling a sparse matrix with very different
numbers of nonzeros in various rows, one could proceed
as follows for finite difference methods:
- Allocate integer array nnz.
- Loop over grid, counting the expected number of nonzeros for the row(s)
associated with the various grid points.
- Create the sparse matrix via MatCreateSeqAIJ() or alternative.
- Loop over the grid, generating matrix entries and inserting
in matrix via MatSetValues().
For (vertex-based) finite element type calculations, an analogous procedure is as follows:
- Allocate integer array nnz.
- Loop over vertices, computing the number of neighbor vertices, which determines the
number of nonzeros for the corresponding matrix row(s).
- Create the sparse matrix via MatCreateSeqAIJ() or alternative.
- Loop over elements, generating matrix entries and inserting
in matrix via MatSetValues().
The -log_info option causes the routines
MatAssemblyBegin() and MatAssemblyEnd() to print
information about the success of the preallocation. Consider the
following example for the MATSEQAIJ matrix format:
MatAssemblyEnd_SeqAIJ:Matrix size 100 X 100; storage space: 2000 unneeded, 1000 used MatAssemblyEnd_SeqAIJ:Number of mallocs during MatSetValues is 0The first line indicates that the user preallocated 3000 spaces but only 1000 were used. The second line indicates that the user preallocated enough space so that PETSc did not have to internally allocate additional space (an expensive operation). In the next example the user did not preallocate sufficient space, as indicated by the fact that the number of mallocs is very large (bad for efficiency):
MatAssemblyEnd_SeqAIJ:Matrix size 1000 X 1000; storage space: 47 unneeded, 100000 used MatAssemblyEnd_SeqAIJ:Number of mallocs during MatSetValues is 40000Although at first glance such procedures for determining the matrix structure in advance may seem unusual, they are actually very efficient because they alleviate the need for dynamic construction of the matrix data structure, which can be very expensive.
Parallel sparse matrices with the AIJ
format can be created with the command
ierr = MatCreateMPIAIJ(MPI_Comm comm,int m,int n,int M,int N,int d_nz, int *d_nnz, int o_nz,int *o_nnz,Mat *A);A is the newly created matrix, while the arguments m, n, M, and N, indicate the number of local rows and columns and the number of global rows and columns, respectively. Either the local or global parameters can be replaced with PETSC_DECIDE, so that PETSc will determine them. The matrix is stored with a fixed number of rows on each processor, given by m, or determined by PETSc if m is PETSC_DECIDE.
If PETSC_DECIDE is not used for the arguments m and n, then the user must ensure that they are chosen to be compatible with the vectors. To do this, one first considers the matrix-vector product y = A x. The m that is used in the matrix creation routine MatCreateMPIAIJ() must match the local size used in the vector creation routine VecCreateMPI() for y. Likewise, the n used must match that used as the local size in VecCreateMPI() for x.
The user must set d_nz=0, o_nz=0, d_nnz=PETSC_NULL, and o_nnz=PETSC_NULL for PETSc to control dynamic allocation of matrix memory space. Analogous to nz and nnz for the routine MatCreateSeqAIJ(), these arguments optionally specify nonzero information for the diagonal ( d_nz and d_nnz) and off-diagonal ( o_nz and o_nnz) parts of the matrix. For a square global matrix, we define each processor's diagonal portion to be its local rows and the corresponding columns (a square submatrix); each processor's off-diagonal portion encompasses the remainder of the local matrix (a rectangular submatrix). The rank in the MPI communicator determines the absolute ordering of the blocks. That is, the process with rank 0 in the communicator given to MatCreateMPIAIJ contains the top rows of the matrix; the ith process in that communicator contains the ith block of the matrix.
As discussed above, preallocation of memory is critical for achieving good performance during matrix assembly, as this reduces the number of allocations and copies required. We present an example for three processors to indicate how this may be done for the MATMPIAIJ matrix format. Consider the 8 by 8 matrix, which is partitioned by default with three rows on the first processor, three on the second and two on the third.
\left( \begin{array}{ccc|ccc|cc}
 1 &  2 &  0 &  0 &  3 &  0 &  0 &  4 \\
 0 &  5 &  6 &  7 &  0 &  0 &  8 &  0 \\
 9 &  0 & 10 & 11 &  0 &  0 & 12 &  0 \\
13 &  0 & 14 & 15 & 16 & 17 &  0 &  0 \\
 0 & 18 &  0 & 19 & 20 & 21 &  0 &  0 \\
 0 &  0 &  0 & 22 & 23 &  0 & 24 &  0 \\
25 & 26 & 27 &  0 &  0 & 28 & 29 &  0 \\
30 &  0 &  0 & 31 & 32 & 33 &  0 & 34
\end{array} \right)
The ``diagonal'' submatrix, d, on the first processor is given by
\left( \begin{array}{ccc}
1 & 2 &  0 \\
0 & 5 &  6 \\
9 & 0 & 10
\end{array} \right),
while the ``off-diagonal'' submatrix, o, is given by
\left( \begin{array}{ccccc}
 0 & 3 & 0 &  0 & 4 \\
 7 & 0 & 0 &  8 & 0 \\
11 & 0 & 0 & 12 & 0
\end{array} \right).
For the first processor one could set d_nz to 2 (since each row has 2 nonzeros) or, alternatively, set d_nnz to {2,2,2}. The o_nz could be set to 2 since each row of the o matrix has 2 nonzeros, or o_nnz could be set to {2,2,2}.
For the second processor the d submatrix is given by
\left( \begin{array}{ccc}
15 & 16 & 17 \\
19 & 20 & 21 \\
22 & 23 &  0
\end{array} \right).
Thus, one could set d_nz to 3, since the maximum number of nonzeros in each row is 3, or alternatively one could set d_nnz to {3,3,2}, thereby indicating that the first two rows will have 3 nonzeros while the third has 2. The corresponding o submatrix for the second processor is
\left( \begin{array}{ccccc}
13 &  0 & 14 &  0 & 0 \\
 0 & 18 &  0 &  0 & 0 \\
 0 &  0 &  0 & 24 & 0
\end{array} \right)
so that one could set o_nz to 2 or o_nnz to {2,1,1}.
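Putting the pieces together, the first processor in this example could create its part of the matrix as sketched below; the second and third processors would make the same call with their own values of m, d_nnz, and o_nnz. This is an illustration only, not a complete program.

Mat A;
int d_nnz[3] = {2,2,2}, o_nnz[3] = {2,2,2};   /* values for the first processor */
int ierr;

ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD,3,PETSC_DECIDE,8,8,
                       0,d_nnz,0,o_nnz,&A);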
Note that the user never directly works with the d and o submatrices, except when preallocating storage space as indicated above. Also, the user need not preallocate exactly the correct amount of space; as long as a sufficiently close estimate is given, the high efficiency for matrix assembly will remain.
As described above, the option -log_info
will print information about the success of preallocation during
matrix assembly. For the MATMPIAIJ format, PETSc will also list
the number of elements owned by each processor that were generated
on a different processor. For example, the statements
[0]MatAssemblyBegin_MPIAIJ:Number of off processor values 10 [1]MatAssemblyBegin_MPIAIJ:Number of off processor values 7 [2]MatAssemblyBegin_MPIAIJ:Number of off processor values 5indicate that very few values have been generated on different processors. On the other hand, the statements
[0]MatAssemblyBegin_MPIAIJ:Number of off processor values 100000 [1]MatAssemblyBegin_MPIAIJ:Number of off processor values 77777indicate that many values have been generated on the ``wrong'' processors. This situation can be very inefficient, since the transfer of values to the ``correct'' processor is generally expensive. By using the command MatGetOwnershipRange() in application codes, the user should be able to generate most entries on the owning processor.
Note: It is fine to generate some entries on the ``wrong'' processor. Often this can lead to cleaner, simpler, less buggy codes. One should never make code overly complicated in order to generate all values locally. Rather, one should organize the code in such a way that most values are generated locally.
PETSc provides both sequential and parallel dense matrix formats,
where each processor stores its entries in a column-major array in the
usual Fortran style. To create a sequential, dense PETSc matrix,
A of dimensions m by n, the user should
call
ierr = MatCreateSeqDense(PETSC_COMM_SELF,int m,int n,Scalar *data,Mat *A);The variable data enables the user to optionally provide the location of the data for matrix storage (intended for Fortran users who wish to allocate their own storage space). Most users should merely set data to PETSC_NULL for PETSc to control matrix memory allocation. To create a parallel, dense matrix, A, the user should call
ierr = MatCreateMPIDense(MPI_Comm comm,int m,int n,int M,int N,Scalar *data,Mat *A)The arguments m, n, M, and N, indicate the number of local rows and columns and the number of global rows and columns, respectively. Either the local or global parameters can be replaced with PETSC_DECIDE, so that PETSc will determine them. The matrix is stored with a fixed number of rows on each processor, given by m, or determined by PETSc if m is PETSC_DECIDE.
PETSc does not currently provide parallel dense direct solvers. Our focus is on sparse iterative solvers.
Table 2 summarizes basic PETSc matrix operations. We briefly discuss a few of these routines in more detail below.
The parallel matrix can multiply a vector with n
local entries, returning a vector with m local entries. That is,
to form the product
ierr = MatMult(Mat A,Vec x,Vec y);the vectors x and y should be generated with
ierr = VecCreateMPI(MPI_Comm comm,n,N,&x); ierr = VecCreateMPI(MPI_Comm comm,m,M,&y);By default, if the user lets PETSc decide the number of components to be stored locally (by passing in PETSC_DECIDE as the second argument to VecCreateMPI() or using VecCreate()), vectors and matrices of the same dimension are automatically compatible for parallel matrix-vector operations.
Along with the matrix-vector multiplication routine, there is
a version for the transpose of the matrix,
ierr = MatMultTrans(Mat A,Vec x,Vec y);There are also versions that add the result to another vector:
ierr = MatMultAdd(Mat A,Vec x,Vec y,Vec w); ierr = MatMultTransAdd(Mat A,Vec x,Vec y,Vec w);These routines, respectively, produce w = A*x + y and w = A^T*x + y. In C it is legal for the vectors y and w to be identical. In Fortran, this situation is forbidden by the language standard, but we allow it anyway.
One can print a matrix (sequential or parallel) to the screen with the
command
ierr = MatView(Mat mat,VIEWER_STDOUT_WORLD);Other viewers can be used as well. For instance, one can draw the nonzero structure of the matrix into the default X-window with the command
ierr = MatView(Mat mat,VIEWER_DRAWX_WORLD);One may also use
ierr = MatView(Mat mat,Viewer viewer);where viewer was obtained with ViewerDrawOpenX(). Additional viewers and options are given in the MatView() man page and Section Viewers: Looking at PETSc Objects .
Function Name                                  Operation
MatAXPY(Scalar *a,Mat X,Mat Y);                Y = Y + a*X
MatMult(Mat A,Vec x,Vec y);                    y = A*x
MatMultAdd(Mat A,Vec x,Vec y,Vec z);           z = y + A*x
MatMultTrans(Mat A,Vec x,Vec y);               y = A^T*x
MatMultTransAdd(Mat A,Vec x,Vec y,Vec z);      z = y + A^T*x
MatNorm(Mat A,NormType type,double *r);        r = ||A||_type
MatDiagonalScale(Mat A,Vec l,Vec r);           A = diag(l)*A*diag(r)
MatScale(Scalar *a,Mat A);                     A = a*A
MatConvert(Mat A,MatType type,Mat *B);         B = A
MatCopy(Mat A,Mat B);                          B = A
MatGetDiagonal(Mat A,Vec x);                   x = diag(A)
MatTranspose(Mat A,Mat *B);                    B = A^T
MatZeroEntries(Mat A);                         A = 0
Some people like to use matrix-free methods, which do not require
explicit storage of the matrix, for the numerical solution of partial
differential equations. To support matrix-free methods in PETSc, one
can use the following command to create a Mat structure without
ever actually generating the matrix:
ierr = MatCreateShell(MPI_Comm comm,int m,int n,int M,int N,void *ctx,Mat *mat);Here M and N are the global matrix dimensions (rows and columns), m and n are the local matrix dimensions, and ctx is a pointer to data needed by any user-defined shell matrix operations; the manual page has additional details about these parameters. Most matrix-free algorithms require only the application of the linear operator to a vector. To provide this action, the user must write a routine with the calling sequence
ierr = UserMult(Mat mat,Vec x,Vec y);and then associate it with the matrix, mat, by using the command
ierr = MatShellSetOperation(Mat mat,MatOperation MATOP_MULT, int (*UserMult)(Mat,Vec,Vec));Here MATOP_MULT is the name of the operation for matrix-vector multiplication. Within each user-defined routine (such as UserMult()), the user should call MatShellGetContext() to obtain the user-defined context, ctx, that was set by MatCreateShell(). This shell matrix can be used with the iterative linear equation solvers discussed in the following chapters.
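The sketch below shows the complete pattern for a trivial operator that merely scales the input vector; the context structure UserCtx and the value of alpha are hypothetical, and the exact calling sequences of VecCopy(), VecScale(), and MatShellGetContext() should be checked against their manual pages.

typedef struct {           /* hypothetical user-defined context */
  Scalar alpha;
} UserCtx;

int UserMult(Mat mat,Vec x,Vec y)
{
  UserCtx *ctx;
  int     ierr;
  ierr = MatShellGetContext(mat,(void **)&ctx);
  ierr = VecCopy(x,y);                 /* y = x         */
  ierr = VecScale(&ctx->alpha,y);      /* y = alpha * y */
  return 0;
}

/* ... later, when creating the operator (sequential case for simplicity) ... */
UserCtx ctx;
Mat     mat;
int     ierr;

ctx.alpha = 2.0;
ierr = MatCreateShell(PETSC_COMM_SELF,100,100,100,100,(void *)&ctx,&mat);
ierr = MatShellSetOperation(mat,MATOP_MULT,UserMult);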
The routine MatShellSetOperation() can be used to set any other matrix operations as well. The file ${PETSC_DIR}/include/mat.h provides a complete list of matrix operations, which have the form MATOP_<OPERATION>, where <OPERATION> is the name (in all capital letters) of the user interface routine (for example, MatMult() -> MATOP_MULT). All user-provided functions have the same calling sequence as the usual matrix interface routines, since the user-defined functions are intended to be accessed through the same interface, e.g., MatMult(Mat,Vec,Vec) -> UserMult(Mat,Vec,Vec).
Note that MatShellSetOperation() can also be used as a ``backdoor'' means of introducing user-defined changes in matrix operations for other storage formats (for example, to override the default LU factorization routine supplied within PETSc for the MATSEQAIJ format). However, we urge anyone who introduces such changes to use caution, since it would be very easy to accidentally create a bug in the new routine that could affect other routines as well.
In many iterative calculations (for instance, in a nonlinear equations
solver), it is important for efficiency purposes to reuse the nonzero
structure of a matrix, rather than determining it anew every time
the matrix is generated. To retain a given matrix but reinitialize
its contents, one can employ
ierr = MatZeroEntries(Mat A);This routine will zero the matrix entries in the data structure but keep all the data that indicates where the nonzeros are located. In this way a new matrix assembly will be much less expensive, since no memory allocations or copies will be needed. Of course, one can also explicitly set selected matrix elements to zero by calling MatSetValues().
In the numerical solution of elliptic partial differential equations, it can be cumbersome to deal with Dirichlet boundary conditions. In particular, one would like to assemble the matrix without regard to boundary conditions and then at the end apply the Dirichlet boundary conditions. In numerical analysis classes this process is usually presented as moving the known boundary conditions to the right-hand side and then solving a smaller linear system for the interior unknowns. Unfortunately, implementing this requires extracting a large submatrix from the original matrix and creating its corresponding data structures. This process can be expensive in terms of both time and memory.
One simple way to deal with this difficulty is to replace those rows in the
matrix associated with known boundary conditions, by rows of the
identity matrix (or some scaling of it). This action can be done with
the command
ierr = MatZeroRows(Mat A,IS rows,Scalar *diag_value);For sparse matrices this removes the data structures for certain rows of the matrix. If the pointer diag_value is PETSC_NULL, it even removes the diagonal entry. If the pointer is not null, the value it points to is inserted into the diagonal entry of each eliminated row.
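As a small illustration (assuming for simplicity a sequential matrix A and hypothetical boundary rows 0 and 99), one might write:

IS     rows;
int    bdy[2] = {0,99}, ierr;      /* hypothetical boundary row numbers */
Scalar one = 1.0;

ierr = ISCreateGeneral(PETSC_COMM_SELF,2,bdy,&rows);
ierr = MatZeroRows(A,rows,&one);   /* keep a unit diagonal in the zeroed rows */
ierr = ISDestroy(rows);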
Another matrix routine of interest is
ierr = MatConvert(Mat mat,MatType newtype,Mat *M)which converts the matrix mat to a new matrix, M, that has either the same or a different format. Set newtype to MATSAME to copy the matrix, keeping the same matrix format. See ${PETSC_DIR}/include/mat.h for other available matrix types; standard ones are MATSEQDENSE, MATSEQAIJ, MATMPIAIJ, MATMPIROWBS, MATSEQBDIAG, MATMPIBDIAG, MATSEQBAIJ, and MATMPIBAIJ.
In certain applications it may be necessary for application codes to directly access elements of a matrix. This may be done by using the command
ierr = MatGetRow(Mat A,int row, int *ncols,int **cols,Scalar **vals);The argument ncols returns the number of nonzeros in that row, while cols and vals return the column indices (with indices starting at zero) and values in the row. If only the column indices are needed (and not the corresponding matrix elements), one can use PETSC_NULL for the vals argument. Similarly, one can use PETSC_NULL for the cols argument. The user can only examine the values extracted with MatGetRow(); the values cannot be altered. To change the matrix entries, one must use MatSetValues().
Once the user has finished using a row, he or she must call
ierr = MatRestoreRow(Mat A,int row,int *ncols,int **cols,Scalar **vals);to free any space that was allocated during the call to MatGetRow().
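For example, the sketch below counts the nonzeros in the locally owned rows of a matrix A:

int    first_row, last_row, row, ncols, *cols, nonzeros = 0, ierr;
Scalar *vals;

ierr = MatGetOwnershipRange(A,&first_row,&last_row);
for (row=first_row; row<last_row; row++) {
  ierr      = MatGetRow(A,row,&ncols,&cols,&vals);
  nonzeros += ncols;
  ierr      = MatRestoreRow(A,row,&ncols,&cols,&vals);
}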
For almost all unstructured grid computations, the way portions of the grid are distributed across the processors' workloads and memory can have a very large impact on performance. In most PDE calculations the grid partitioning and distribution across the processors can (and should) be done in a ``pre-processing'' step before the numerical computations. However, this does not mean it need be done in a separate, sequential program; rather, it should be done before one sets up the parallel grid data structures in the actual program. PETSc provides an interface to the ParMETIS package (developed by George Karypis; see the docs/installation.html file for directions on installing PETSc to use ParMETIS) to allow the partitioning to be done in parallel. PETSc does not currently provide direct support for dynamic repartitioning, load balancing by migrating matrix entries between processors, etc. For problems that require mesh refinement, PETSc uses the ``rebuild the data structure'' approach, as opposed to the ``maintain dynamic data structures that support the insertion/deletion of additional vector and matrix rows and column entries'' approach.
Partitioning in PETSc is organized around the Partitioning object.
One first creates a parallel matrix that contains the connectivity information about the
grid (or other graph-type object) that is to be partitioned. This is done with the
command
ierr = MatCreateMPIAdj(MPI_Comm comm,int mlocal,int n,int *ia,int *ja, Mat *Adj);The argument mlocal indicates the number of rows of the graph being provided by the given processor, while n is the total number of columns, which equals the sum of all the mlocal values. The arguments ia and ja are the row pointers and column indices for the given rows, in the usual parallel compressed sparse row format, with indices starting at 0, not 1.
For example, one can construct the ia and ja arrays for a triangular grid either by (1) partitioning by element (triangle) or by (2) partitioning by vertex. Once the adjacency matrix has been created, the partitioning itself is performed with the sequence of commands
ierr = PartitioningCreate(MPI_Comm comm,Partitioning *part); ierr = PartitioningSetAdjacency(Partitioning part,Mat Adj); ierr = PartitioningSetFromOptions(Partitioning part); ierr = PartitioningApply(Partitioning part,IS *is); ierr = PartitioningDestroy(Partitioning part); ierr = MatDestroy(Mat Adj); ierr = ISPartitioningToNumbering(IS is,IS *isg);The resulting isg contains for each local node the new global number of that node. The resulting is contains the new processor number that each local node has been assigned to.
Now that a new numbering of the nodes has been determined one must
renumber all the nodes and migrate the grid information to the correct processor.
The command
ierr = AOCreateBasicIS(isg,PETSC_NULL,&ao);which generates an AO object (see Section Application Orderings ) that can be used in conjunction with is and isg to move the relevant grid information to the correct processor and renumber the nodes, etc.
PETSc does not currently provide tools that completely manage the migration and node renumbering, since it will be dependent on the particular data structure you use to store the grid information and the type of grid information that you need for your application. We do plan to include more support for this in the future, but designing the appropriate user interface and providing a scalable implementation that can be used for a wide variety of different grids requires a great deal of time. Thus we demonstrate how this may be managed for the model grid depicted above using (1) element based partitioning and (2) a vertex based partitioning.
SLES is the heart of PETSc, because it provides uniform and efficient access to all of the package's linear system solvers, including parallel and sequential, direct and iterative. SLES is intended for solving nonsingular systems of the form A x = b, where A denotes the matrix representation of a linear operator, b is the right-hand-side vector, and x is the solution vector. SLES uses the same calling sequence for both direct and iterative solution of a linear system. In addition, particular solution techniques and their associated options can be selected at runtime.
The combination of a Krylov subspace method and a preconditioner is at the center of most modern numerical codes for the iterative solution of linear systems. See, for example, [(ref fgn)] for an overview of the theory of such methods. SLES creates a simplified interface to the lower-level KSP and PC modules within the PETSc package. The KSP component, discussed in Section Krylov Methods , provides many popular Krylov subspace iterative methods; the PC module, described in Section Preconditioners , includes a variety of preconditioners. Although both KSP and PC can be used directly, users should employ the interface of SLES.
To solve a linear system with SLES, one must first create a solver context
with the command
ierr = SLESCreate(MPI_Comm comm,SLES *sles);Here comm is the MPI communicator, and sles is the newly formed solver context. Before actually solving a linear system with SLES, the user must call the following routine to set the matrices associated with the linear system:
ierr = SLESSetOperators(SLES sles,Mat Amat,Mat Pmat,MatStructure flag);The argument Amat, representing the matrix that defines the linear system, is a symbolic place holder for any kind of matrix. In particular, SLES does support matrix-free methods. The routine MatCreateShell() in Section Matrix-Free Matrices provides further information regarding matrix-free methods. Typically the preconditioning matrix, Pmat, is the same as the matrix that defines the linear system, Amat; however, occasionally these matrices differ (for instance, when preconditioning a matrix obtained from a high order method with that from a low order method). The argument flag can be used to eliminate unnecessary work when repeatedly solving linear systems of the same size with the same preconditioning method; when solving just one linear system, this flag is ignored. The user can set flag as follows:
Much of the power of SLES can be accessed through the single routine
ierr = SLESSetFromOptions(SLES sles);This routine accepts the options -h and -help as well as any of the KSP and PC options discussed below. To solve a linear system, one merely executes the command
ierr = SLESSolve(SLES sles,Vec b,Vec x,int *its);where b and x respectively denote the right-hand-side and solution vectors. On return, the parameter its contains either the iteration number at which convergence was successfully reached, or the negative of the iteration at which divergence or breakdown was detected. Section Convergence Tests gives more details regarding convergence testing. Note that multiple linear solves can be performed by the same SLES context. Once the SLES context is no longer needed, it should be destroyed with the command
ierr = SLESDestroy(SLES sles);The above procedure is sufficient for general use of the SLES package. One additional step is required for users who wish to customize certain preconditioners (e.g., see Section Block Jacobi ) or to log certain performance data using the PETSc profiling facilities (as discussed in Chapter Profiling ). In this case, the user can optionally explicitly call
ierr = SLESSetUp(SLES sles,Vec b,Vec x);before calling SLESSolve() to perform any setup required for the linear solvers. The explicit call of this routine enables the separate monitoring of any computations performed during the set up phase, such as incomplete factorization for the ILU preconditioner.
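A typical calling sequence is sketched below; it assumes that the matrix A and the vectors b and x have already been created and assembled, and it uses DIFFERENT_NONZERO_PATTERN as the MatStructure flag since only a single system is solved.

SLES sles;
int  its, ierr;

ierr = SLESCreate(PETSC_COMM_WORLD,&sles);
ierr = SLESSetOperators(sles,A,A,DIFFERENT_NONZERO_PATTERN);
ierr = SLESSetFromOptions(sles);   /* pick up -ksp_type, -pc_type, etc. at runtime */
ierr = SLESSolve(sles,b,x,&its);
ierr = SLESDestroy(sles);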
The default solver within SLES is restarted GMRES, preconditioned for
the uniprocessor case with ILU(0), and for the multiprocessor case
with the block Jacobi method (with one block per processor, each of
which is solved with ILU(0)). A variety of other solvers
and options are also available.
To allow application programmers to set any of the preconditioner or
Krylov subspace options directly within the code, we provide routines
that extract the PC and KSP contexts,
ierr = SLESGetPC(SLES sles,PC *pc); ierr = SLESGetKSP(SLES sles,KSP *ksp);The application programmer can then directly call any of the PC or KSP routines to modify the corresponding default options.
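For example, the sketch below selects the conjugate gradient method with point Jacobi preconditioning and tightens the relative convergence tolerance; PCSetType() and KSPSetTolerances() are assumed to behave as described in their manual pages.

PC  pc;
KSP ksp;
int ierr;

ierr = SLESGetPC(sles,&pc);
ierr = SLESGetKSP(sles,&ksp);
ierr = PCSetType(pc,PCJACOBI);     /* point Jacobi preconditioner */
ierr = KSPSetType(ksp,KSPCG);      /* conjugate gradient method   */
ierr = KSPSetTolerances(ksp,1.e-7,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);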
To solve a linear system with a direct solver (currently supported only for sequential matrices) one may use the options -pc_type lu -ksp_type preonly (see below).
By default, if a direct solver is used, the factorization is not done in-place. This approach prevents the unexpected surprise of having a corrupted matrix after a linear solve. The routine PCLUSetUseInPlace(), discussed below, causes factorization to be done in-place.
When solving multiple linear systems of the same size with the same method, several options are available. To solve successive linear systems having the same preconditioner matrix (i.e., the same data structure with exactly the same matrix elements) but different right-hand-side vectors, the user should simply call SLESSolve() multiple times. The preconditioner setup operations (e.g., factorization for ILU) will be done during the first call to SLESSolve() only; such operations will not be repeated for successive solves.
To solve successive linear systems that have different preconditioner matrices (i.e., the matrix elements and/or the matrix data structure change), the user must call SLESSetOperators() and SLESSolve() for each solve. See Section Using SLES for a description of various flags for SLESSetOperators() that can save work for such cases.
The Krylov subspace methods accept a number of options, many of which
are discussed below. First, to set the Krylov subspace method that is to
be used, one calls the command
ierr = KSPSetType(KSP ksp,KSPType method);The type can be one of KSPRICHARDSON, KSPCHEBYCHEV, KSPCG, KSPGMRES, KSPTCQMR, KSPBCGS, KSPCGS, KSPTFQMR, KSPCR, KSPLSQR, or KSPPREONLY. The KSP method can also be set with the options database command -ksp_type, followed by one of the options richardson, chebychev, cg, gmres, tcqmr, bcgs, cgs, tfqmr, cr, lsqr, or preonly. There are method-specific options for the Richardson, Chebychev, and GMRES methods.
ierr = KSPRichardsonSetScale(KSP ksp,double damping_factor); ierr = KSPChebychevSetEigenvalues(KSP ksp,double emax,double emin); ierr = KSPGMRESSetRestart(KSP ksp,int max_steps);The default parameter values are damping_factor=1.0, emax=0.01, emin=100.0, and max_steps=30. The GMRES restart and Richardson damping factor can also be set with the options -ksp_gmres_restart <n> and -ksp_richardson_scale <factor>.
The default technique for orthogonalization of the Hessenberg
matrix in GMRES is the modified Gram-Schmidt method, which
employs many VecDot() operations and can thus be slow in parallel.
A fast approach is to use the
unmodified (classical) Gram-Schmidt method, which can be set
with
ierr = KSPGMRESSetOrthogonalization(KSP ksp, KSPGMRESUnmodifiedGramSchmidtOrthogonalization);or the options database command -ksp_gmres_unmodifiedgramschmidt. Note that this algorithm is numerically unstable, but it may deliver significantly better performance. One can also use unmodified Gram-Schmidt with iterative refinement by setting the orthogonalization routine to KSPGMRESIROrthog() or by using the command line option -ksp_gmres_irorthog.
By default, KSP assumes an initial guess of zero by zeroing the initial
value for the solution vector that is given. To use a nonzero
initial guess, the user must call
ierr = KSPSetInitialGuessNonzero(KSP ksp);For the conjugate gradient method with complex numbers, there are two slightly different algorithms depending on whether the matrix is Hermitian symmetric or truly symmetric (the default is to assume that it is Hermitian symmetric). To indicate that it is symmetric, one uses the command
ierr = KSPCGSetType(KSP ksp,KSPCGType KSP_CG_SYMMETRIC);Note that this option is not valid for all matrices.
Since the rate of convergence of Krylov projection methods for a particular linear system is strongly dependent on its spectrum, preconditioning is typically used to alter the spectrum and hence accelerate the convergence rate of iterative techniques. Preconditioning can be applied to the system (1) by
(M_L^{-1} A M_R^{-1}) (M_R x) = M_L^{-1} b, (2)
where M_L and M_R indicate preconditioning matrices. If M_L = I in (2), right preconditioning results, and the residual of (1),
r \equiv b - A x = b - A M_R^{-1} M_R x,
is preserved. In contrast, the residual is altered for left (M_R = I) and symmetric preconditioning, as given by
r_L \equiv M_L^{-1} b - M_L^{-1} A x = M_L^{-1} r.
By default, all KSP implementations use left preconditioning.
Right preconditioning can be activated for some methods by
using the options database command -ksp_right_pc or
calling the routine
ierr = KSPSetPreconditionerSide(KSP ksp,PCSide PC_RIGHT);Attempting to use right preconditioning for a method that does not currently support it results in an error message of the form
KSPSetUp_Richardson:No right preconditioning for KSPRICHARDSON
We summarize the defaults for the residuals used in KSP convergence monitoring within Table Preconditioning within KSP . Details regarding specific convergence tests and monitoring routines are presented in the following sections. The preconditioned residual is used by default for convergence testing of all left-preconditioned KSP methods except for the conjugate gradient, Richardson, and Chebychev methods. For these three cases the true residual is used by default, but the preconditioned residual can be employed instead with the options database command -ksp_preres or by calling the routine
ierr = KSPSetUsePreconditionedResidual(KSP ksp);
Method                                      KSPType         Options Database Name   Default Convergence Monitor
Richardson                                  KSPRICHARDSON   richardson              true
Chebychev                                   KSPCHEBYCHEV    chebychev               true
Conjugate Gradient                          KSPCG           cg                      true
Generalized Minimal Residual                KSPGMRES        gmres                   precond
BiCGSTAB                                    KSPBCGS         bcgs                    precond
Conjugate Gradient Squared                  KSPCGS          cgs                     precond
Transpose-Free Quasi-Minimal Residual (1)   KSPTFQMR        tfqmr                   precond
Transpose-Free Quasi-Minimal Residual (2)   KSPTCQMR        tcqmr                   precond
Conjugate Residual                          KSPCR           cr                      precond
Least Squares Method                        KSPLSQR         lsqr                    precond
Shell for no KSP method                     KSPPREONLY      preonly                 precond
(true denotes the true residual norm; precond denotes the preconditioned residual norm)
The default convergence test, KSPDefaultConverged(), is based on the l2-norm of the residual. Convergence (or divergence) is decided by three quantities: the relative decrease of the residual norm, rtol, the absolute size of the residual norm, atol, and the relative increase in the residual, dtol. Convergence is detected at iteration k if
\| r_k \|_2 < \max ( rtol \cdot \| r_0 \|_2, atol ),
where r_k = b - A x_k. Divergence is detected if
\| r_k \|_2 > dtol \cdot \| r_0 \|_2.
These parameters, as well as the maximum number of allowable iterations,
can be set with the routine
ierr = KSPSetTolerances(KSP ksp,double rtol,double atol,double dtol,int maxits);The user can retain the default value of any of these parameters by specifying PETSC_DEFAULT as the corresponding tolerance; the defaults are rtol=10^{-5}, atol=10^{-50}, dtol=10^{5}, and maxits=10^{5}. These parameters can also be set from the options database with the commands -ksp_rtol <rtol>, -ksp_atol <atol>, -ksp_divtol <dtol>, and -ksp_max_it <its>.
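For instance, to tighten only the relative tolerance and raise the iteration limit while keeping the other defaults, one might write (a minimal sketch, assuming ksp has been extracted with SLESGetKSP()):
ierr = KSPSetTolerances(ksp,1.e-8,PETSC_DEFAULT,PETSC_DEFAULT,200);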
In addition to providing an interface to a simple convergence test,
KSP allows the application programmer the flexibility to provide
customized convergence-testing routines.
The user can specify a customized
routine with the command
ierr = KSPSetConvergenceTest(KSP ksp,int (*test)(KSP ksp,int it,double rnorm,void *ctx), void *ctx);The final routine argument, ctx, is an optional context for private data for the user-defined convergence routine, test. Other test routine arguments are the iteration number, it, and the residual's l2 norm, rnorm. The routine for detecting convergence, test, should return the integer 1 for convergence, 0 for no convergence, and minus 1 (-1) on error or failure to converge.
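A minimal sketch of such a test follows; the routine and context names are illustrative, not part of PETSc. It declares convergence once the residual norm falls below a user-chosen absolute value:
typedef struct {
  double mytol;     /* user-chosen absolute residual tolerance */
} MyTestCtx;

int MyConvergenceTest(KSP ksp,int it,double rnorm,void *ctx)
{
  MyTestCtx *user = (MyTestCtx *) ctx;
  if (rnorm < user->mytol) return 1;    /* converged */
  if (it > 500)            return -1;   /* failed to converge */
  return 0;                             /* continue iterating */
}
It would then be registered in the application code with
MyTestCtx testctx;
testctx.mytol = 1.e-9;
ierr = KSPSetConvergenceTest(ksp,MyConvergenceTest,(void *)&testctx);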
By default, the Krylov solvers run silently without displaying information
about the iterations. The user can indicate that the norms of the residuals
should be displayed by using
-ksp_monitor within the options database.
To display the residual norms in a graphical window (running under X Windows),
one should use -ksp_xmonitor [x,y,w,h], where either all or none of
the options must be specified.
Application programmers can also provide their own routines to perform
the monitoring by using the command
ierr = KSPSetMonitor(KSP ksp,int (*mon)(KSP ksp,int it,double rnorm,void *ctx), void *ctx);The final routine argument, ctx, is an optional context for private data for the user-defined monitoring routine, mon. Other mon routine arguments are the iteration number ( it) and the residual's l2 norm ( rnorm). A helpful routine within user-defined monitors is PetscObjectGetComm((PetscObject)ksp,MPI_Comm *comm), which returns in comm the MPI communicator for the KSP context. See Chapter for more discussion of the use of MPI communicators within PETSc.
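A minimal sketch of such a monitor (the routine name MyMonitor is illustrative) that prints the iteration number and residual norm on the first processor of the KSP communicator:
int MyMonitor(KSP ksp,int it,double rnorm,void *ctx)
{
  MPI_Comm comm;
  PetscObjectGetComm((PetscObject)ksp,&comm);    /* communicator of the KSP context */
  PetscPrintf(comm,"iteration %d residual norm %g\n",it,rnorm);
  return 0;
}
It would be registered with
ierr = KSPSetMonitor(ksp,MyMonitor,PETSC_NULL);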
Several monitoring routines are supplied with PETSc,
including
ierr = KSPDefaultMonitor(KSP,int,double, void *); ierr = KSPSingularValueMonitor(KSP,int,double, void *); ierr = KSPTrueMonitor(KSP,int,double, void *);The default monitor simply prints an estimate of the l2-norm of the residual at each iteration. The routine KSPSingularValueMonitor() is appropriate only for use with the conjugate gradient method or GMRES, since it prints estimates of the extreme singular values of the preconditioned operator at each iteration. Since KSPTrueMonitor() prints the true residual at each iteration by actually computing the residual using the formula r = b - Ax, the routine is slow and should be used only for testing or convergence studies, not for timing. These monitors may be accessed with the command line options -ksp_monitor, -ksp_singmonitor, and -ksp_truemonitor.
To employ the default graphical monitor, one should use the
commands
DrawLG lg; ierr = KSPLGMonitorCreate(char *display,char *title,int x,int y,int w,int h,DrawLG *lg); ierr = KSPSetMonitor(KSP ksp,KSPLGMonitor,(void *)lg);When no longer needed, the line graph should be destroyed with the command
ierr = KSPLGMonitorDestroy(DrawLG lg);The user can change aspects of the graphs with the DrawLG*() and DrawAxis*() routines. One can also access this functionality from the options database with the command -ksp_xmonitor [x,y,w,h], where x, y, w, h are the optional location and size of the window.
One can cancel all hardwired monitoring routines for KSP at runtime with -ksp_cancelmonitors.
As the Krylov method converges and the residual norm becomes small, say 10^{-10}, many of the final digits printed with the -ksp_monitor option are meaningless. Worse, they differ between machines because of different round-off rules used by, say, the IBM RS6000 and the Sun SPARC. This makes testing between different machines difficult. The option -ksp_smonitor causes PETSc to print fewer digits of the residual norm as it gets smaller; thus on most machines it will print the same numbers, making cross-machine testing easier.
Since the convergence of Krylov subspace methods depends strongly on
the spectrum (eigenvalues) of the preconditioned operator, PETSc has specific
routines for eigenvalue approximation via the Arnoldi or Lanczos iteration.
First, before the linear solve one must call
ierr = KSPSetComputeEigenvalues(KSP ksp);Then after the SLES solve one calls
ierr = KSPComputeEigenvalues(KSP ksp, int n,double *realpart,double *complexpart);Here, n is the size of the two arrays and the eigenvalues are inserted into those two arrays. There is an additional routine
ierr = KSPComputeEigenvaluesExplicitly(KSP ksp, int n,double *realpart, double *complexpart);that is useful only for very small problems. It explicitly computes the full representation of the preconditioned operator and calls LAPACK to compute its eigenvalues. It should be used only for matrices of size up to a few hundred. The DrawSP*() routines are very useful for drawing scatter plots of the eigenvalues.
The eigenvalues may also be computed and displayed graphically with the options database commands -ksp_plot_eigenvalues and -ksp_plot_eigenvalues_explicitly, or they can be dumped to the screen in ASCII text via -ksp_compute_eigenvalues and -ksp_compute_eigenvalues_explicitly.
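A minimal sketch of this sequence follows; the vectors b and x, the integer its, and the bound of 30 on the number of eigenvalue estimates are assumptions, and ksp is assumed to have been extracted with SLESGetKSP():
double realpart[30],complexpart[30];
ierr = KSPSetComputeEigenvalues(ksp);                      /* must be called before the solve */
ierr = SLESSolve(sles,b,x,&its);
ierr = KSPComputeEigenvalues(ksp,30,realpart,complexpart); /* fills the arrays with eigenvalue estimates */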
To obtain the solution vector and right hand side from a KSP
context, one uses
ierr = KSPGetSolution(KSP ksp,Vec *x); ierr = KSPGetRhs(KSP ksp,Vec *rhs);During the iterative process the solution may not yet have been calculated or it may be stored in a different location. To access the approximate solution during the iterative process, one uses the command
ierr = KSPBuildSolution(KSP ksp,Vec w,Vec *v);where the solution is returned in v. The user can optionally provide a vector in w as the location to store the vector; however, if w is PETSC_NULL, space allocated by PETSc in the KSP context is used. One should not destroy this vector. For certain KSP methods (e.g., GMRES), the construction of the solution is expensive, while for many others it does not even require a vector copy.
Access to the residual is done in a similar way with the
command
ierr = KSPBuildResidual(KSP ksp,Vec t,Vec w,Vec *v);Again, for GMRES and certain other methods this is an expensive operation.
As discussed in Section Preconditioning within KSP
, the Krylov space methods are
typically used in conjunction with a preconditioner.
To employ a particular preconditioning method, the user can either select
it from the options database using input of the form
-pc_type <methodname> or set the method with the
command
ierr = PCSetType(PC pc,PCType method);In Table 3 we summarize the basic preconditioning methods supported in PETSc. The PCSHELL preconditioner uses a specific, application-provided preconditioner. The direct preconditioner, PCLU, is, in fact, a direct solver for the linear system that uses LU factorization. PCLU is included as a preconditioner so that PETSc has a consistent interface between direct and iterative linear solvers.
Method                                  PCType         Options Database Name
Jacobi                                  PCJACOBI       jacobi
Block Jacobi                            PCBJACOBI      bjacobi
Block Gauss-Seidel (sequential only)    PCBGS          bgs
SOR (and SSOR)                          PCSOR          sor
SOR with Eisenstat trick                PCEISENSTAT    eisenstat
Incomplete Cholesky                     PCICC          icc
Incomplete LU                           PCILU          ilu
Additive Schwarz                        PCASM          asm
Linear solver                           PCSLES         sles
Combination of preconditioners          PCCOMPOSITE    composite
LU                                      PCLU           lu
No preconditioning                      PCNONE         none
Shell for user-defined PC               PCSHELL        shell
Some of the options for ILU preconditioner are
ierr = PCILUSetLevels(PC pc,int levels); ierr = PCILUSetReuseReordering(PC pc,PetscTruth flag); ierr = PCILUSetUseDropTolerance(PC pc,double dt,int dtcount); ierr = PCILUSetReuseFill(PC pc,PetscTruth flag); ierr = PCILUSetUseInPlace(PC pc);
When repeatedly solving linear systems with the same SLES
context, one can reuse some information computed
during the first linear solve.
In particular, PCILUSetReuseReordering() causes the reordering (for example, set with
-mat_order order) computed in the first factorization to be reused
for later factorizations.
The routine PCILUSetReuseFill() causes the fill computed during the first drop-tolerance factorization to be reused in later factorizations. PCILUSetUseInPlace() is often used with PCASM or PCBJACOBI when zero fill is used; because it reuses the matrix space to store the incomplete factorization, it saves memory and copying time. Note that in-place factorization is not appropriate with any ordering besides natural and cannot be used with the drop-tolerance factorization. These options may be set in the options database with
-pc_ilu_levels <levels> -pc_ilu_reuse_reordering -pc_ilu_use_drop_tolerance <dt>,<dtcount> -pc_ilu_reuse_fill -pc_ilu_in_place -pc_ilu_nonzeros_along_diagonal
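For example, a brief sketch that selects ILU(2) and reuses the reordering across repeated factorizations (assuming pc has been extracted with SLESGetPC()):
ierr = PCSetType(pc,PCILU);
ierr = PCILUSetLevels(pc,2);                    /* two levels of fill */
ierr = PCILUSetReuseReordering(pc,PETSC_TRUE);  /* keep the first reordering for later factorizations */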
See Section Sparse Matrix Factorization for information on preallocation of memory for anticipated fill during factorization. By alleviating the considerable overhead for dynamic memory allocation, such tuning can significantly enhance performance.
PETSc supports incomplete factorization preconditioners for several matrix types for the uniprocessor case. In addition, for the parallel case we provide an interface to the ILU and ICC preconditioners of BlockSolve95 [(ref bs-user-ref)]. PETSc enables users to employ the preconditioners within BlockSolve95 by using the BlockSolve95 matrix format MATMPIROWBS and invoking either the PCILU or PCICC method within the linear solvers. Since PETSc automatically handles matrix assembly, preconditioner setup, profiling, etc., users who employ BlockSolve95 through the PETSc interface need not concern themselves with many details provided within the BlockSolve95 users manual. Consult the file docs/installation.html for details on installing PETSc to allow the use of BlockSolve95.
One can create a matrix that is compatible with BlockSolve95 by using
MatCreate() with the option -mat_mpirowbs, or by directly
calling
ierr = MatCreateMPIRowbs(MPI_Comm comm,int m,int M,int nz,int *nnz,void *proci,Mat *A)A is the newly created matrix, while the arguments m and M indicate the number of local and global rows, respectively. Either the local or global parameter can be replaced with PETSC_DECIDE, so that PETSc will determine it. The matrix is stored with a fixed number of rows on each processor, given by m, or determined by PETSc if m is PETSC_DECIDE. The arguments nz and nnz can be used to preallocate storage space, as discussed in Section Creating and Assembling Matrices for increasing the efficiency of matrix assembly; one sets nz=0 and nnz=PETSC_NULL for PETSc to control all matrix memory allocation. The argument proci is an optional BlockSolve95 BSprocinfo context; most users should set this parameter to PETSC_NULL, so that PETSc will create and initialize this context.
If the matrix is symmetric, one may call
ierr = MatSetOption(Mat mat,MAT_SYMMETRIC);to improve efficiency, but in this case one cannot use the ILU preconditioner, only ICC.
Internally, PETSc inserts zero elements into matrices of the MATMPIROWBS format if necessary, so that nonsymmetric matrices are considered to be symmetric in terms of their sparsity structure; this format is required for use of the parallel communication routines within BlockSolve95. In particular, if the matrix element A[i,j] exists, then PETSc will internally allocate a 0 value for the element A[j,i] during MatAssemblyEnd() if the user has not already set a value for the matrix element A[j,i] .
When manipulating a preconditioning matrix, A, BlockSolve95 internally works with a scaled and permuted matrix, \hat{A} = P D^{-1/2} A D^{-1/2}, where D is the diagonal of A, and P is a permutation matrix determined by a graph coloring for efficient parallel computation. Thus, when solving a linear system, Ax=b, using ILU/ICC preconditioning and the matrix format MATMPIROWBS for both the linear system matrix and the preconditioning matrix, one actually solves the scaled and permuted system \hat{A} \hat{x} = \hat{b}, where \hat{x} = P D^{1/2} x and \hat{b} = P D^{-1/2} b. PETSc handles the internal scaling and permutation of x and b, so the user does not deal with these conversions, but instead always works with the original linear system. In this case, by default the scaled residual norm is monitored; one must use the option -ksp_bsmonitor to print both the scaled and unscaled residual norms. Note: If one is using ILU/ICC via BlockSolve95 and the MATMPIROWBS matrix format for the preconditioner matrix, but using a different format for a different linear system matrix, then this scaling and permuting is done only internally during the application of the preconditioner; -ksp_bsmonitor should not be used in this case.
PETSc does not provide a parallel SOR; it can be used only on sequential matrices or as the subblock preconditioner when using block Jacobi or ASM preconditioning (see below).
The options for SOR
preconditioning are
ierr = PCSORSetOmega(PC pc,double omega); ierr = PCSORSetIterations(PC pc,int its); ierr = PCSORSetSymmetric(PC pc,MatSORType type);The first of these commands sets the relaxation factor for successive over (under) relaxation. The second command sets the number of inner iterations of SOR, given by its, to use between steps of the Krylov space method. The third command sets the kind of SOR sweep, where the argument type can be one of SOR_FORWARD_SWEEP, SOR_BACKWARD_SWEEP or SOR_SYMMETRIC_SWEEP, the default being SOR_FORWARD_SWEEP. Setting the type to be SOR_SYMMETRIC_SWEEP produces the SSOR method. In addition, each processor can locally and independently perform the specified variant of SOR with the types SOR_LOCAL_FORWARD_SWEEP, SOR_LOCAL_BACKWARD_SWEEP, and SOR_LOCAL_SYMMETRIC_SWEEP. These variants can also be set with the options -pc_sor_omega <omega>, -pc_sor_its <its>, -pc_sor_backward, -pc_sor_symmetric, -pc_sor_local_forward, -pc_sor_local_backward, and -pc_sor_local_symmetric.
The Eisenstat trick [(ref eisenstat81)] for SSOR preconditioning can be employed with the method PCEISENSTAT ( -pc_type eisenstat). By using both left and right preconditioning of the linear system, this variant of SSOR requires about half of the floating-point operations of conventional SSOR. The option -pc_eisenstat_diagonal_scaling (or the routine PCEisenstatUseDiagonalScaling()) activates diagonal scaling in conjunction with the Eisenstat SSOR method, while the option -pc_eisenstat_omega <omega> (or the routine PCEisenstatSetOmega(PC pc,double omega)) sets the SSOR relaxation coefficient, omega, as discussed above.
The LU preconditioner provides several options. The first, given by
the
command
ierr = PCLUSetUseInPlace(PC pc);causes the factorization to be performed in-place and hence destroys the original matrix. The options database variant of this command is -pc_lu_in_place. Another direct preconditioner option is selecting the ordering of equations with the command
-mat_order <ordering>The possible orderings include natural, nd (nested dissection), 1wd (one-way dissection), rcm (reverse Cuthill-McKee), and qmd (quotient minimum degree).
The sparse LU factorization provided in PETSc does not perform pivoting for numerical stability (since it is designed to preserve nonzero structure); thus occasionally an LU factorization will fail with a zero pivot when, in fact, the matrix is non-singular. The option -pc_lu_nonzeros_along_diagonal <tol> will often help eliminate the zero pivot by preprocessing the column ordering to remove small values from the diagonal. Here, tol is an optional tolerance used to decide whether a value is nonzero; by default it is 1.e-10.
In addition, Section Sparse Matrix Factorization provides information on preallocation of memory for anticipated fill during factorization. Such tuning can significantly enhance performance, since it eliminates the considerable overhead for dynamic memory allocation.
The block Jacobi and overlapping additive Schwarz methods in PETSc are
supported in parallel; however, only the uniprocessor
version of the block Gauss-Seidel method is currently in place.
By default, the PETSc implementations of these methods employ ILU(0) factorization on each individual block (that is, the default solver on each subblock is PCType=PCILU, KSPType=KSPPREONLY); the user can set alternative linear solvers via the options
-sub_ksp_type and -sub_pc_type. In fact, all of the KSP
and PC options can be applied to the subproblems by inserting the prefix
-sub_ at the beginning of the option name.
These options database commands set the particular options for all
of the blocks within the global problem. In addition, the routines
ierr = PCBJacobiGetSubSLES(PC pc,int *n_local,int *first_local,SLES **subsles); ierr = PCBGSGetSubSLES(PC pc,int *n_local,int *first_local,SLES **subsles); ierr = PCASMGetSubSLES(PC pc,int *n_local,int *first_local,SLES **subsles);extract the SLES context for each local block. The argument n_local is the number of blocks on the calling processor, and first_local indicates the global number of the first block on the processor. The blocks are numbered successively by processors from zero through gb-1 , where gb is the number of global blocks. The array of SLES contexts for the local blocks is given by subsles. This mechanism enables the user to set different solvers for the various blocks. To set the appropriate data structures, the user must explicitly call SLESSetUp() before calling PCBJacobiGetSubSLES(), PCBGSGetSubSLES(), or PCASMGetSubSLES(). For further details, see the example ${}PETSC_DIR/src/sles/examples/tutorials/ex7.c.
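As a sketch, the following sets a direct LU solve on every local block of a block Jacobi preconditioner (assuming sles, its PC pc of type PCBJACOBI, and the vectors b and x already exist):
SLES *subsles;
PC   subpc;
KSP  subksp;
int  i,n_local,first_local;

ierr = SLESSetUp(sles,b,x);       /* must precede the extraction of the blocks */
ierr = PCBJacobiGetSubSLES(pc,&n_local,&first_local,&subsles);
for (i=0; i<n_local; i++) {
  ierr = SLESGetPC(subsles[i],&subpc);
  ierr = SLESGetKSP(subsles[i],&subksp);
  ierr = PCSetType(subpc,PCLU);          /* direct solve on each local block */
  ierr = KSPSetType(subksp,KSPPREONLY);  /* apply the block solver exactly once */
}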
The block Jacobi, block Gauss-Seidel, and additive Schwarz
preconditioners allow the user
to set the number of blocks into which the problem is divided. The
options database commands to set this value are -pc_bjacobi_blocks n
and -pc_bgs_blocks n, and, within a program, the corresponding routines
are
ierr = PCBJacobiSetTotalBlocks(PC pc,int blocks,int *size); ierr = PCBGSSetTotalBlocks(PC pc,int blocks,int *size); ierr = PCASMSetTotalSubdomains(PC pc,int n,IS *is); ierr = PCASMSetType(PC pc,PCASMType type);The optional argument size, is an array indicating the size of each block. Currently, for certain parallel matrix formats, only a single block per processor is supported. However, the MATMPIAIJ and MATMPIBAIJ formats support the use of general blocks as long as no blocks are shared among processors. The is argument contains the index sets that define the subdomains.
PCASMType is one of PC_ASM_BASIC, PC_ASM_INTERPOLATE, PC_ASM_RESTRICT, PC_ASM_NONE and may also be set with the options database -pc_asm_type [basic,interpolate,restrict,none]. The type PC_ASM_BASIC (or -pc_asm_type basic) corresponds to the standard additive Schwarz method that uses the full restriction and interpolation operators. The type PC_ASM_RESTRICT (or -pc_asm_type restrict) uses a full restriction operator, but during the interpolation process ignores the off-processor values. Similarly, PC_ASM_INTERPOLATE (or -pc_asm_type interpolate) uses a limited restriction process in conjunction with a full interpolation, while PC_ASM_NONE (or -pc_asm_type none) ignores off-processor values for both restriction and interpolation. The ASM types with limited restriction or interpolation were suggested by Xiao-Chuan Cai. PC_ASM_RESTRICT is the PETSc default, as it saves substantial communication and for many problems has the added benefit of requiring fewer iterations for convergence than the standard additive Schwarz method.
The user can also set the number of blocks and sizes on a per-processor
basis with the commands
ierr = PCBJacobiSetLocalBlocks(PC pc,int blocks,int *size); ierr = PCBGSSetLocalBlocks(PC pc,int blocks,int *size); ierr = PCASMSetLocalSubdomains(PC pc,int N,IS *is);For the ASM preconditioner one can use the following command to set the overlap to compute in constructing the subdomains.
ierr = PCASMSetOverlap(PC pc,int overlap);The overlap defaults to 1, so if one desires that no additional overlap be computed beyond what may have been set with a call to PCASMSetTotalSubdomains() or PCASMSetLocalSubdomains(), then overlap must be set to be 0. In particular, if one does not explicitly set the subdomains in an application code, then all overlap would be computed internally by PETSc, and using an overlap of 0 would result in an ASM variant that is equivalent to the block Jacobi preconditioner. Note that one can define initial index sets is with any overlap via PCASMSetTotalSubdomains() or PCASMSetLocalSubdomains(); the routine PCASMSetOverlap() merely allows PETSc to extend that overlap further if desired.
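For example, a minimal sketch selecting the restricted additive Schwarz variant with PETSc-generated subdomains and an overlap of two (assuming pc has been extracted from the SLES context):
ierr = PCSetType(pc,PCASM);
ierr = PCASMSetOverlap(pc,2);             /* extend each PETSc-generated subdomain by two */
ierr = PCASMSetType(pc,PC_ASM_RESTRICT);  /* the default restricted variant, stated explicitly */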
The shell preconditioner simply uses an application-provided routine to
implement the preconditioner. To set this routine, one uses the
command
ierr = PCShellSetApply(PC pc,int (*apply)(void *ctx,Vec,Vec),void *ctx);The final argument ctx is a pointer to the application-provided data structure needed by the preconditioner routine. The three routine arguments of apply() are this context, the input vector, and the output vector, respectively.
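A minimal sketch of a shell preconditioner that applies diagonal (Jacobi-like) scaling follows; the names MyShellCtx and MyShellApply are illustrative, and the sketch assumes the PETSc 2.0 calling sequence VecPointwiseMult(x,y,w) with the result placed in the last argument:
typedef struct {
  Vec diag;   /* holds the reciprocal of the matrix diagonal */
} MyShellCtx;

int MyShellApply(void *ctx,Vec xin,Vec xout)
{
  MyShellCtx *shell = (MyShellCtx *) ctx;
  int ierr;
  /* xout = diag .* xin; assumes the old calling sequence with the result in the last argument */
  ierr = VecPointwiseMult(xin,shell->diag,xout); CHKERRQ(ierr);
  return 0;
}
It would be registered in the application code with
MyShellCtx shell;
/* ... create shell.diag and fill it with the reciprocal of the diagonal of A ... */
ierr = PCSetType(pc,PCSHELL);
ierr = PCShellSetApply(pc,MyShellApply,(void *)&shell);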
The PC type PCCOMPOSITE allows one to form new preconditioners by combining already defined preconditioners and solvers. Combining preconditioners usually requires some experimentation to find a combination of preconditioners that works better than any single method. It is a tricky business and is not recommended until your application code is complete and running and you are trying to improve performance. In many cases using a single preconditioner is better than a combination; an exception is the multigrid/multilevel preconditioners (solvers), which are always combinations of some sort; see Section Multigrid Preconditioners .
Let B1 and B2 represent the application of two
preconditioners of type type1 and type2. The preconditioner
B = B1 + B2 can be obtained with
ierr = PCSetType(pc,PCCOMPOSITE); ierr = PCCompositeAddPC(pc,type1); ierr = PCCompositeAddPC(pc,type2);Any number of preconditioners may be added in this way.
This way of combining preconditioners is called additive, since
the actions of the preconditioners are added together. This is the
default behavior. An alternative can be set with the option
ierr = PCCompositeSetType(PC pc,PCCompositeType PC_COMPOSITE_MULTIPLICATIVE);In this form the residual is updated after the application of each preconditioner, and the next preconditioner is applied to the new residual. For example, with two composed preconditioners B1 and B2, y = B x is obtained from
y = B1 x
w1 = x - A y
y = y + B2 w1
Loosely, this corresponds to a Gauss-Seidel iteration, while the additive form corresponds to a Jacobi-like iteration.
Under most circumstances the multiplicative form requires one-half the number of iterations of the additive form, but the multiplicative form does require the application of A inside the preconditioner.
In the multiplicative version, the calculation of the residual inside the
preconditioner can be done in two ways: using the original linear system matrix
or using the matrix used to build the preconditioners B1, B2, etc.
By default it uses the ``preconditioner matrix''; to use the true matrix, use the option
ierr = PCCompositeSetUseTrue(PC pc);The individual preconditioners can be accessed (in order to set options) via
ierr = PCCompositeGetPC(PC pc,int count,PC *subpc);For example, to set the first subpreconditioner to use ILU(1):
PC subpc; ierr = PCCompositeGetPC(pc,0,&subpc); ierr = PCILUSetFill(subpc,1);
These various options can also be set via the options database. For example, -pc_type composite -pc_composite_pcs jacobi,ilu causes the composite preconditioner to be used with two preconditioners: Jacobi and ILU. The option -pc_composite_type multiplicative initiates the multiplicative version of the algorithm, while -pc_composite_type additive initiates the additive version. Use of the true matrix in the residual calculation is obtained with the option -pc_composite_true. One sets options for the subpreconditioners with the extra prefix -sub_N_, where N is the number of the subpreconditioner. For example, -sub_0_pc_ilu_fill 0.
PETSc also allows a preconditioner to be a complete linear solver. This is
achieved with the PCSLES type.
ierr = PCSetType(PC pc,PCSLES); ierr = PCSLESGetSLES(pc,&sles); /* set any SLES/KSP/PC options */From the command line one can use 5 iterations of Bi-CG-stab with ILU(0) preconditioning as the preconditioner with -pc_type sles -sub_pc_type ilu -sub_ksp_max_it 5 -sub_ksp_type bcgs.
By default the inner SLES preconditioner uses the outer ``preconditioner matrix'' as the matrix to be solved in the linear system; to use the true matrix, use the option
ierr = PCSLESSetUseTrue(PC pc);or, at the command line, the option -pc_sles_true.
Naturally one can use a SLES preconditioner inside a composite preconditioner. For example, -pc_type composite -pc_composite_pcs ilu,sles -sub_1_pc_type jacobi -sub_1_ksp_max_it 10 uses two preconditioners: ILU(0) and 10 iterations of GMRES with Jacobi preconditioning, though it is not clear whether one would ever wish to do such a thing.
A large suite of routines is available for using multigrid as a preconditioner. In the PC framework the user is required to provide the coarse grid solver, smoothers, restriction, and interpolation, as well as the code to calculate residuals. The PC component allows all of that to be wrapped up into a PETSc compliant preconditioner. We fully support both matrix-free and matrix-based multigrid solvers.
A multigrid preconditioner is created with the four commands
ierr = SLESCreate(MPI_Comm comm,SLES *sles); ierr = SLESGetPC(SLES sles,PC *pc); ierr = PCSetType(PC pc,PCMG); ierr = MGSetLevels(pc,int levels);A large number of parameters affect the multigrid behavior. The command
ierr = MGSetType(PC pc,MGType mode);indicates which form of multigrid to apply [(ref 1sbg)].
For standard V or W-cycle multigrids, one sets the
mode to be MGMULTIPLICATIVE; for the
additive form (which in certain cases reduces to the BPX method, or additive
multilevel Schwarz, or multilevel diagonal scaling), one uses
MGADDITIVE as the mode. For a variant
of full multigrid, one can
use MGFULL, and for the Kaskade
algorithm MGKASKADE.
For the multiplicative and full multigrid options, one can use a
W-cycle by calling
ierr = MGSetCycles(PC pc,int cycles);with a value of MG_W_CYCLE for cycles. The commands above can also be set from the options database. The option names are -pc_mg_method [multiplicative, additive, full, kaskade], and -pc_mg_cycles cycles.
The user can control the amount of pre- and postsmoothing
by using
either the options
-pc_mg_smoothup m and -pc_mg_smoothdown n or
the routines
ierr = MGSetNumberSmoothUp(PC pc,int m); ierr = MGSetNumberSmoothDown(PC pc,int n);Note that if the command MGSetSmoother() (discussed below) has been employed, the same amounts of pre- and postsmoothing will be used.
The remainder of the multigrid routines, which determine
the solvers and interpolation/restriction operators that are used,
are mandatory.
To set the coarse grid solver, one must
call
ierr = MGGetCoarseSolve(PC pc,SLES *sles);and set the appropriate options in sles. Similarly, the smoothers are set by calling
ierr = MGGetSmoother(PC pc,int level,SLES *sles);and setting the various options in sles. To use a different pre- and postsmoother, one should call MGGetSmootherUp() and MGGetSmootherDown() instead and set the options in the two resulting SLES contexts. The interpolation and restriction operators for each level are set with MGSetInterpolation(PC pc,int level,Mat mat) and MGSetRestriction(PC pc,int level,Mat mat); if the user desires these operations to be matrix free (see Section Matrix-Free Matrices ), he or she should make sure that these operations are defined. Note that this system is arranged so that if the interpolation is the transpose of the restriction, the same mat argument can be passed to both MGSetRestriction() and MGSetInterpolation().
On each level except the coarsest, one must also set the routine to
compute the residual. The following command suffices:
MGSetResidual(PC pc,int level,int (*residual)(Mat,Vec,Vec,Vec),Mat mat);The residual() function can be set to be MGDefaultResidual() if one's operator is stored in a Mat format. In certain circumstances, where it is much cheaper to calculate the residual directly, rather than through the usual formula b - Ax, the user may wish to provide an alternative.
Finally, the user must provide three work vectors for each level
(except on the finest, where only the residual work vector is required).
The work vectors are set with the
commands
ierr = MGSetRhs(PC pc,int level,Vec b); ierr = MGSetX(PC pc,int level,Vec x); ierr = MGSetR(PC pc,int level,Vec r);The user is responsible for freeing these vectors once the iteration is complete.
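The following sketch assembles these pieces for a two-level method (level 0 is the coarsest); the matrix names Afine and Interp and the work vectors bcoarse, xcoarse, rcoarse, and rfine are assumptions, and the application is assumed to have created them:
SLES sles,coarse,smooth;
PC   pc;

ierr = SLESCreate(PETSC_COMM_WORLD,&sles);
ierr = SLESGetPC(sles,&pc);
ierr = PCSetType(pc,PCMG);
ierr = MGSetLevels(pc,2);
ierr = MGSetType(pc,MGMULTIPLICATIVE);             /* standard V-cycle */

ierr = MGGetCoarseSolve(pc,&coarse);               /* set coarse grid solver options on "coarse" */
ierr = MGGetSmoother(pc,1,&smooth);                /* set smoother options on "smooth" */

ierr = MGSetInterpolation(pc,1,Interp);
ierr = MGSetRestriction(pc,1,Interp);              /* interpolation is the transpose of restriction */
ierr = MGSetResidual(pc,1,MGDefaultResidual,Afine);

ierr = MGSetRhs(pc,0,bcoarse);                     /* three work vectors on the coarse level */
ierr = MGSetX(pc,0,xcoarse);
ierr = MGSetR(pc,0,rcoarse);
ierr = MGSetR(pc,1,rfine);                         /* only the residual work vector on the finest level */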
The solution of large-scale nonlinear problems pervades many facets of computational science and demands robust and flexible solution strategies. The SNES component of PETSc provides a powerful suite of data-structure-neutral numerical routines for such problems. Built on top of the linear solvers and data structures discussed in preceding chapters, SNES enables the user to easily customize the nonlinear solvers according to the application at hand. Also, the SNES interface is identical for the uniprocessor and parallel cases; the only difference in the parallel version is that each processor typically forms only its local contribution to various matrices and vectors.
SNES includes methods for solving systems of nonlinear equations of the form F(x) = 0, where F: R^n \rightarrow R^n. SNES also contains solvers for unconstrained minimization problems of the form min { f(x) }, where f: R^n \rightarrow R. Newton-like methods provide the core of the package, including both line search and trust region techniques, which are discussed further in Section The Nonlinear Solvers . Following the PETSc design philosophy, the interfaces to the various solvers are all virtually identical. In addition, the SNES software is completely flexible, so that the user can at runtime change any facet of the solution process.
The general form of the n-dimensional Newton's method for solving (3) is
x_{k+1} = x_k - [F'(x_k)]^{-1} F(x_k),   k = 0, 1, ...,   (5)
where x_0 is an initial approximation to the solution and F'(x_k) is nonsingular. In practice, the Newton iteration (5) is implemented by the following two steps:
1. (Approximately) solve F'(x_k) \Delta x_k = - F(x_k).
2. Update x_{k+1} = x_k + \Delta x_k.
Similarly, the general form of Newton's method for solving (4) is
x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k),   k = 0, 1, ...,   (7)
where x_0 \in R^n is an initial approximation to the solution, and \nabla^2 f(x_k) is positive definite. The iteration (7) is usually implemented by
1. (Approximately) solve \nabla^2 f(x_k) \Delta x_k = - \nabla f(x_k).
2. Update x_{k+1} = x_k + \Delta x_k.
In the simplest usage of the nonlinear solvers, the user must merely provide a C, C++, or Fortran routine to evaluate the nonlinear function of Equation (3 ) or (4 ). The corresponding Jacobian matrix (or gradient and Hessian matrix) can be approximated with finite differences. For codes that are typically more efficient and accurate, the user can provide a routine to compute the Jacobian (or gradient and Hessian); details regarding these application-provided routines are discussed below. To provide an overview of the use of the nonlinear solvers, we first introduce a complete and simple example in Figure 13 , corresponding to ${}PETSC_DIR/src/snes/examples/tutorials/ex1.c. Note that the procedures for solving systems of nonlinear equations and unconstrained minimization problems are quite similar. We present the details unique to each class of problems in Sections Solving Systems of Nonlinear Equations and Solving Unconstrained Minimization Problems .
#ifdef PETSC_RCS_HEADER static char vcid[] = "$Id: ex1.c,v 1.9 1997/10/19 03:30:04 bsmith Exp $"; #endif static char help[] = "Uses Newton's method to solve a two-variable system.\n\n"; /*T Concepts: SNES^Solving a system of nonlinear equations (basic uniprocessor example); Routines: SNESCreate(); SNESSetFunction(); SNESSetJacobian(); SNESGetSLES(); Routines: SNESSolve(); SNESSetFromOptions(); Routines: SLESGetPC(); SLESGetKSP(); KSPSetTolerances(); PCSetType(); Processors: 1 T*/ /* Include "snes.h" so that we can use SNES solvers. Note that this file automatically includes: petsc.h - base PETSc routines vec.h - vectors sys.h - system routines mat.h - matrices is.h - index sets ksp.h - Krylov subspace methods viewer.h - viewers pc.h - preconditioners sles.h - linear solvers */ #include "snes.h" /* User-defined routines */ int FormJacobian(SNES,Vec,Mat*,Mat*,MatStructure*,void*); int FormFunction(SNES,Vec,Vec,void*); int main( int argc, char **argv ) { SNES snes; /* nonlinear solver context */ SLES sles; /* linear solver context */ PC pc; /* preconditioner context */ KSP ksp; /* Krylov subspace method context */ Vec x, r; /* solution, residual vectors */ Mat J; /* Jacobian matrix */ int ierr, its, size; Scalar pfive = .5; PetscInitialize( &argc, &argv,(char *)0,help ); MPI_Comm_size(PETSC_COMM_WORLD,&size); if (size != 1) SETERRA(1,0,"This is a uniprocessor example only!"); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Create nonlinear solver context - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = SNESCreate(PETSC_COMM_WORLD,SNES_NONLINEAR_EQUATIONS,&snes); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Create matrix and vector data structures; set corresponding routines - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Create vectors for solution and nonlinear function */ ierr = VecCreateSeq(PETSC_COMM_SELF,2,&x); CHKERRA(ierr); ierr = VecDuplicate(x,&r); CHKERRA(ierr); /* Create Jacobian matrix data structure */ ierr = MatCreate(PETSC_COMM_SELF,2,2,&J); CHKERRA(ierr); /* Set function evaluation routine and vector. */ ierr = SNESSetFunction(snes,r,FormFunction,PETSC_NULL); CHKERRA(ierr); /* Set Jacobian matrix data structure and Jacobian evaluation routine */ ierr = SNESSetJacobian(snes,J,J,FormJacobian,PETSC_NULL); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Customize nonlinear solver; set runtime options - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Set linear solver defaults for this problem. By extracting the SLES, KSP, and PC contexts from the SNES context, we can then directly call any SLES, KSP, and PC routines to set various options. */ ierr = SNESGetSLES(snes,&sles); CHKERRA(ierr); ierr = SLESGetKSP(sles,&ksp); CHKERRA(ierr); ierr = SLESGetPC(sles,&pc); CHKERRA(ierr); ierr = PCSetType(pc,PCNONE); CHKERRA(ierr); ierr = KSPSetTolerances(ksp,1.e-4,PETSC_DEFAULT,PETSC_DEFAULT,20); CHKERRA(ierr); /* Set SNES/SLES/KSP/PC runtime options, e.g., -snes_view -snes_monitor -ksp_type <ksp> -pc_type <pc> These options will override those specified above as long as SNESSetFromOptions() is called _after_ any other customization routines. 
*/ ierr = SNESSetFromOptions(snes); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Evaluate initial guess; then solve nonlinear system - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Note: The user should initialize the vector, x, with the initial guess for the nonlinear solver prior to calling SNESSolve(). In particular, to employ an initial guess of zero, the user should explicitly set this vector to zero by calling VecSet(). */ ierr = VecSet(&pfive,x); CHKERRA(ierr); ierr = SNESSolve(snes,x,&its); CHKERRA(ierr); PetscPrintf(PETSC_COMM_SELF,"number of Newton iterations = %d\n\n", its); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Free work space. All PETSc objects should be destroyed when they are no longer needed. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = VecDestroy(x); CHKERRA(ierr); ierr = VecDestroy(r); CHKERRA(ierr); ierr = MatDestroy(J); CHKERRA(ierr); ierr = SNESDestroy(snes); CHKERRA(ierr); PetscFinalize(); return 0; } /* ------------------------------------------------------------------- */ /* FormFunction - Evaluates nonlinear function, F(x). Input Parameters: . snes - the SNES context . x - input vector . dummy - optional user-defined context (not used here) Output Parameter: . f - function vector */ int FormFunction(SNES snes,Vec x,Vec f,void *dummy) { int ierr; Scalar *xx, *ff; /* Get pointers to vector data. - For default PETSc vectors, VecGetArray() returns a pointer to the data array. Otherwise, the routine is implementation dependent. - You MUST call VecRestoreArray() when you no longer need access to the array. */ ierr = VecGetArray(x,&xx); CHKERRQ(ierr); ierr = VecGetArray(f,&ff); CHKERRQ(ierr); /* Compute function */ ff[0] = xx[0]*xx[0] + xx[0]*xx[1] - 3.0; ff[1] = xx[0]*xx[1] + xx[1]*xx[1] - 6.0; /* Restore vectors */ ierr = VecRestoreArray(x,&xx); CHKERRQ(ierr); ierr = VecRestoreArray(f,&ff); CHKERRQ(ierr); return 0; } /* ------------------------------------------------------------------- */ /* FormJacobian - Evaluates Jacobian matrix. Input Parameters: . snes - the SNES context . x - input vector . dummy - optional user-defined context (not used here) Output Parameters: . jac - Jacobian matrix . B - optionally different preconditioning matrix . flag - flag indicating matrix structure */ int FormJacobian(SNES snes,Vec x,Mat *jac,Mat *B,MatStructure *flag,void *dummy) { Scalar *xx, A[4]; int ierr, idx[2] = {0,1}; /* Get pointer to vector data */ ierr = VecGetArray(x,&xx); CHKERRQ(ierr); /* Compute Jacobian entries and insert into matrix. - Since this is such a small problem, we set all entries for the matrix at once. */ A[0] = 2.0*xx[0] + xx[1]; A[1] = xx[0]; A[2] = xx[1]; A[3] = xx[0] + 2.0*xx[1]; ierr = MatSetValues(*jac,2,idx,2,idx,A,INSERT_VALUES); CHKERRQ(ierr); *flag = SAME_NONZERO_PATTERN; /* Restore vector */ ierr = VecRestoreArray(x,&xx); CHKERRQ(ierr); /* Assemble matrix */ ierr = MatAssemblyBegin(*jac,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); ierr = MatAssemblyEnd(*jac,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); return 0; }
ierr = SNESCreate(MPI_Comm comm,SNES_NONLINEAR_EQUATIONS,SNES *snes); ierr = SNESCreate(MPI_Comm comm,SNES_UNCONSTRAINED_MINIMIZATION,SNES *snes);When solving a system of nonlinear equations, the user must then set routines for evaluating the function of equation (3 ) and its associated Jacobian matrix. Likewise, when solving an unconstrained minimization problem, the user must indicate routines for computing the function of Equation (4 ), as well as the corresponding gradient and Hessian. Such details are discussed in Sections Solving Systems of Nonlinear Equations and Solving Unconstrained Minimization Problems .
To choose a nonlinear solution method, the user can either
call
ierr = SNESSetType(SNES snes,SNESType method);or use the option -snes_type <method>, where details regarding the available methods are presented in Section The Nonlinear Solvers . The application code can take complete control of the linear and nonlinear techniques used in the Newton-like method by calling
ierr = SNESSetFromOptions(snes);This routine provides an interface to the PETSc options database, so that at runtime the user can select a particular nonlinear solver, set various parameters and customized routines (e.g., specialized line search variants), prescribe the convergence tolerance, and set monitoring routines. With this routine the user can also control all linear solver options in the SLES, KSP, and PC modules, as discussed in Chapter SLES: Linear Equations Solvers .
After having set these routines and options, the user
solves the problem by calling
ierr = SNESSolve(SNES snes,Vec x,int *iters);where iters is the number of nonlinear iterations required for convergence and x indicates the solution vector. The user should initialize this vector to the initial guess for the nonlinear solver prior to calling SNESSolve(). In particular, to employ an initial guess of zero, the user should explicitly set this vector to zero by calling VecSet(). Finally, after solving the nonlinear system (or several systems), the user should destroy the SNES context with
ierr = SNESDestroy(SNES snes);
When solving a system of nonlinear equations, the user must provide
a vector, f, for storing the function of
Equation (3
), as well as a routine that evaluates this
function at the vector x. This information should be set with
the command
ierr = SNESSetFunction(SNES snes,Vec f, int (*FormFunction)(SNES snes,Vec x,Vec f,void *ctx),void *ctx);The argument ctx is an optional user-defined context, which can store any private, application-specific data required by the function evaluation routine; PETSC_NULL should be used if such information is not needed. In C and C++, a user-defined context is merely a structure in which various objects can be stashed; in Fortran a user context can be an integer array that contains both parameters and pointers to PETSc objects. ${}PETSC_DIR/src/snes/examples/tutorials/ex5.c and ${}PETSC_DIR/src/snes/examples/ex5f.F give examples of user-defined application contexts in C and Fortran, respectively.
The user must also specify a routine to form some approximation of the
Jacobian matrix, A, at the current iterate, x,
as is typically done with
ierr = SNESSetJacobian(SNES snes,Mat A,Mat B,int (*FormJacobian)(SNES snes,Vec x, Mat *A,Mat *B,MatStructure *flag,void *ctx),void *ctx);The arguments of the routine FormJacobian() are the current iterate, x; the Jacobian matrix, A; the preconditioner matrix, B (which is usually the same as A); a flag indicating information about the preconditioner matrix structure; and an optional user-defined Jacobian context, ctx, for application-specific data. The options for flag are identical to those for the flag of SLESSetOperators(), discussed in Section Using SLES . Note that the SNES solvers are all data-structure neutral, so the full range of PETSc matrix formats (including ``matrix-free'' methods) can be used. Chapter Matrices discusses information regarding available matrix formats and options, while Section Matrix-Free Methods focuses on matrix-free methods in SNES. We briefly touch on a few details of matrix usage that are particularly important for efficient use of the nonlinear solvers.
During successive calls to FormJacobian(), the user can either insert new matrix contexts or reuse old ones, depending on the application requirements. For many sparse matrix formats, reusing the old space (and merely changing the matrix elements) is more efficient; however, if the matrix structure completely changes, creating an entirely new matrix context may be preferable. Upon subsequent calls to the FormJacobian() routine, the user may wish to reinitialize the matrix entries to zero by calling MatZeroEntries(). See Section Other Matrix Operations for details on the reuse of the matrix context.
If the preconditioning matrix retains identical nonzero structure during successive nonlinear iterations, setting the parameter, flag, in the FormJacobian() routine to be SAME_NONZERO_PATTERN and reusing the matrix context can save considerable overhead. For example, when one is using a parallel preconditioner such as incomplete factorization in solving the linearized Newton systems for such problems, matrix colorings and communication patterns can be determined a single time and then reused repeatedly throughout the solution process. In addition, if using different matrices for the actual Jacobian and the preconditioner, the user can hold the preconditioner matrix fixed for multiple iterations by setting flag to SAME_PRECONDITIONER. See the discussion of SLESSetOperators() in Section Using SLES for details.
The directory ${}PETSC_DIR/src/snes/examples/tutorials provides a variety of examples.
As previously discussed, use of SNES for solving systems of
nonlinear equations and unconstrained minimization problems is quite
similar. When solving minimization problems, the user typically
provides routines for evaluating the function, gradient, and Hessian
corresponding to Equation (4
). The routine to evaluate
the scalar minimization function, f(x), should be set with
ierr = SNESSetMinimizationFunction(SNES snes, int (*FormMinFunction)(SNES snes,Vec x,double *f,void *ctx),void *ctx);The gradient vector, g(x), and gradient evaluation routine should be set with
ierr = SNESSetGradient(SNES snes,Vec g, int (*FormGradient)(SNES snes,Vec x,Vec g,void *ctx),void *ctx);In these routines, the argument ctx specifies an optional context for application-specific data, as described in Section Solving Systems of Nonlinear Equations .
The user must also set a routine to form some approximation of the Hessian
matrix, A, as is typically done with
ierr = SNESSetHessian(SNES snes,Mat A,Mat B,int (*FormHessian)(SNES snes,Vec x, Mat *A,Mat *B,MatStructure *flag,void *ctx),void *ctx);The arguments of the routine FormHessian() are the current iterate, x; the Hessian matrix, A; the preconditioner matrix, B (which is usually the same as A); a flag indicating information about the preconditioner structure; and an optional user-defined Hessian context, ctx. Reuse of matrix and preconditioner data during successive iterations of the nonlinear solvers is often critical for achieving good performance. This topic is discussed in detail for the case of solving systems of nonlinear equations in Section Solving Systems of Nonlinear Equations ; the options are identical for solving unconstrained minimization problems, and thus are not repeated here.
The directory ${}PETSC_DIR/src/snes/examples/tests/umin provides examples of solving unconstrained minimization problems.
As summarized in Table 13 , SNES includes several Newton-like nonlinear solvers based on line search techniques and trust region methods. The methods for solving systems of nonlinear equations and unconstrained minimization problems employ the prefixes SNES_EQ and SNES_UM, respectively.
Each solver may have associated with it a set of options, which can be set with routines and options database commands provided for this purpose. A complete list can be found by consulting the manual pages or by running a program with the -help option; we discuss just a few in the sections below.
Method          SNES Type      Options Name   Default Convergence Test
Line search     SNES_EQ_LS     ls             SNESConvergedEQLS()
Trust region    SNES_EQ_TR     tr             SNESConvergedEQTR()
Test Jacobian   SNES_EQ_TEST   test           -
Line search     SNES_UM_LS     umls           SNESConvergedUMLS()
Trust region    SNES_UM_TR     umtr           SNESConvergedUMTR()
The method SNES_EQ_NLS ( -snes_type ls) provides a line
search Newton method for solving systems of nonlinear equations. By
default, this technique employs cubic backtracking [(ref dennis:83)].
An alternative line search routine can be set with the command
ierr = SNESSetLineSearch(SNES snes, int (*ls)(SNES,Vec,Vec,Vec,Vec,double,double*,double*));Other line search methods provided by PETSc are SNESNoLineSearch() and SNESQuadraticLineSearch(), which can be set with the option -snes_eq_ls [basic,quadratic,cubic]. The line search routines involve several parameters, which are set to defaults that are reasonable for many applications. The user can override the defaults by using the options -snes_eq_ls_alpha <alpha>, -snes_eq_ls_maxstep <max>, and -snes_eq_ls_steptol <tol>.
The method SNES_UM_NLS ( -snes_type umls) provides a line search Newton method for solving unconstrained minimization problems. The default line search algorithm is taken from More and Thuente [(ref more:92)]. Again, the user can set a variety of parameters to control the line search; one should run a SNES program with the option -help for details. Users may write their own customized line search codes by modeling them after one of the defaults provided by PETSc.
The most basic trust region method in SNES for solving systems of nonlinear equations, SNES_EQ_NTR (-snes_type tr), is taken from the MINPACK project [(ref more84)]. Several parameters can be set to control the variation of the trust region size during the solution process. In particular, the user can control the initial trust region radius, computed by
\Delta = \Delta_0 \| F_0 \|_2,
by setting \Delta_0 via the option -snes_eq_tr_delta0 <delta0>.
The default trust region method for unconstrained minimization, SNES_UM_NTR ( -snes_type umtr), is based on the work of Steihaug [(ref steihaug:83)]. This method uses the preconditioned conjugate gradient method via the KSP solver KSPQCG to determine the approximate minimizer of the resulting quadratic at each nonlinear iteration. This formulation requires the use of a symmetric preconditioner, where the currently available options are Jacobi, incomplete Cholesky, and the null preconditioners, which can be set with the options -pc_type jacobi, -pc_type icc, and -pc_type none, respectively.
This section discusses options and routines that apply to all SNES solvers and problem classes. In particular, we focus on convergence tests, monitoring routines, and tools for checking derivative computations.
Convergence of the nonlinear solvers can be detected in a variety of ways; the user can even specify a customized test, as discussed below. The default convergence routines for the various nonlinear solvers within SNES are listed in Table 13 ; see the corresponding manual pages for detailed descriptions. Each of these convergence tests involves several parameters, which are set by default to values that should be reasonable for a wide range of problems. The user can customize the parameters to the problem at hand by using some of the following routines and options.
One method of convergence testing is
to declare convergence when the norm of the change in the solution
between successive iterations is less than some tolerance, stol.
Convergence can also be determined based on the norm of the function
(or gradient for a minimization problem).
Such a test can use either the absolute size of the
norm, atol, or its relative decrease, rtol, from an initial
guess. The following routine sets these parameters, which are used
in many of the default SNES convergence tests:
ierr = SNESSetTolerances(SNES snes,double rtol,double atol,double stol, int its,int fcts);This routine also sets the maximum numbers of allowable nonlinear iterations, its, and function evaluations, fcts. The corresponding options database commands for setting these parameters are -snes_atol <atol>, -snes_rtol <rtol>, -snes_stol <stol>, -snes_max_it <its>, and -snes_max_funcs <fcts>. A related routine is SNESGetTolerances().
Convergence tests for trust region methods often use an additional parameter that indicates the minimum allowable trust region radius.
The user can set this parameter with the option -snes_trtol <trtol>
or with the routine
ierr = SNESSetTrustRegionTolerance(SNES snes,double trtol);An additional parameter is sometimes used for unconstrained minimization problems, namely the minimum function tolerance, ftol, which can be set with the option -snes_fmin <ftol> or with the routine
ierr = SNESSetMinimizationFunctionTolerance(SNES snes,double ftol);Users can set their own customized convergence tests in SNES by using the command
ierr = SNESSetConvergenceTest(SNES snes,int (*test)(SNES snes,double xnorm, double gnorm,double f,void *cctx),void *cctx);The final argument of the convergence test routine, cctx, denotes an optional user-defined context for private data. When solving systems of nonlinear equations, the arguments xnorm, gnorm, and f are the current iterate norm, current step norm, and function norm, respectively. Likewise, when solving unconstrained minimization problems, the arguments xnorm, gnorm, and f are the current iterate norm, current gradient norm, and the function value.
By default the SNES solvers run silently without displaying information
about the iterations. The user can initiate monitoring with the
command
ierr = SNESSetMonitor(SNES snes,int (*mon)(SNES,int its,double norm,void* mctx), void *mctx);The routine, mon, indicates a user-defined monitoring routine, where its and mctx respectively denote the iteration number and an optional user-defined context for private data for the monitor routine. The argument norm is the function norm (or gradient norm for unconstrained minimization problems).
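A minimal sketch of such a monitor (the routine name MySNESMonitor is illustrative) that prints the iteration number and the norm at each step:
int MySNESMonitor(SNES snes,int its,double norm,void *mctx)
{
  PetscPrintf(PETSC_COMM_WORLD,"SNES iteration %d: norm %g\n",its,norm);
  return 0;
}
It would be registered with
ierr = SNESSetMonitor(snes,MySNESMonitor,PETSC_NULL);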
The routine set by SNESSetMonitor() is called once after every successful step computation within the nonlinear solver. Hence, the user can employ this routine for any application-specific computations that should be done after the solution update. The option -snes_monitor activates the default SNES monitor routine, SNESDefaultMonitor(), while -snes_xmonitor draws a simple line graph of the residual norm's convergence.
One can cancel all hardwired monitoring routines for SNES at runtime with -snes_cancelmonitors.
As the Newton method converges so that the residual norm is small, say 10^{-10}, many of the final digits printed with the -snes_monitor option are meaningless. Worse, they differ from machine to machine because of different round-off rules used by, say, the IBM RS6000 and the Sun SPARC. This makes testing between different machines difficult. The option -snes_smonitor causes PETSc to print fewer digits of the residual norm as it gets smaller; thus on most machines it will always print the same numbers, making cross-machine testing easier.
The routines
ierr = SNESGetSolution(SNES snes,Vec *x); ierr = SNESGetFunction(SNES snes,Vec *r);return the solution vector and function vector from a SNES context. These routines are useful, for instance, if the convergence test requires some property of the solution or function other than those passed with routine arguments.
Since hand-coding routines for Jacobian and Hessian matrix evaluation can be error prone, SNES provides easy-to-use support for checking these matrices against finite difference versions. In the simplest form of comparison, users can employ the option -snes_type test to compare the matrices at several points. Although not exhaustive, this test will generally catch obvious problems. One can compare the elements of the two matrices by using the option -snes_test_display , which causes the two matrices to be printed to the screen.
Another means for verifying the correctness of a code for Jacobian or Hessian computation is running the problem with either the finite difference or matrix-free variant, -snes_fd or -snes_mf (see Section Finite Difference Jacobian Approximations or Section Matrix-Free Methods). If a problem converges well with these matrix approximations but not with a user-provided routine, the problem probably lies with the hand-coded matrix.
Since exact solution of the linear Newton systems within (5 ) and (7 ) at each iteration can be costly, modifications are often introduced that significantly reduce these expenses and yet retain the rapid convergence of Newton's method. Inexact or truncated Newton techniques approximately solve the linear systems using an iterative scheme. In comparison with using direct methods for solving the Newton systems, iterative methods have the virtue of requiring little space for matrix storage and potentially saving significant computational work. Within the class of inexact Newton methods, of particular interest are Newton-Krylov methods, where the subsidiary iterative technique for solving the Newton system is chosen from the class of Krylov subspace projection methods. Note that at runtime the user can set any of the linear solver options discussed in Chapter SLES: Linear Equations Solvers , such as -ksp_type <ksp_method> and -pc_type <pc_method>, to set the Krylov subspace and preconditioner methods.
Two levels of iterations occur for the inexact techniques, where during each global or outer Newton iteration a sequence of subsidiary inner iterations of a linear solver is performed. Appropriate control of the accuracy to which the subsidiary iterative method solves the Newton system at each global iteration is critical, since these inner iterations determine the asymptotic convergence rate for inexact Newton techniques. While the Newton systems must be solved well enough to retain fast local convergence of the Newton iterates, use of excessive inner iterations, particularly when || x_k - x^* || is large, is neither necessary nor economical. Thus, the number of required inner iterations typically increases as the Newton process progresses, so that the truncated iterates approach the true Newton iterates.
A sequence of nonnegative numbers {η_k} can be used to indicate the variable convergence criterion. In this case, when solving a system of nonlinear equations, the update step of the Newton process remains unchanged, and direct solution of the linear system is replaced by iteration on the system until the residuals
r_k^{(i)} = F'(x_k) Δx_k + F(x_k)
satisfy
|| r_k^{(i)} || / || F(x_k) || ≤ η_k ≤ η < 1.
Here x_0 is an initial approximation of the solution, and || · || denotes an arbitrary norm in R^n.
By default a constant relative convergence tolerance is used for solving the subsidiary linear systems within the Newton-like methods of SNES. When solving a system of nonlinear equations, one can instead employ the techniques of Eisenstat and Walker [(ref ew94)] to compute η_k at each step of the nonlinear solver by using the option -snes_ksp_ew_conv. In addition, by adding one's own KSP convergence test (see Section Convergence Tests ), one can easily create one's own, problem-dependent, inner convergence tests.
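For instance, rather than relying on the Eisenstat-Walker option, one may fix the relative tolerance of the inner linear solves by extracting the SLES and KSP contexts from the SNES context; a hedged sketch follows, in which the tolerance values and iteration limit are purely illustrative.
SLES sles;
KSP  ksp;
ierr = SNESGetSLES(snes,&sles); CHKERRA(ierr);
ierr = SLESGetKSP(sles,&ksp); CHKERRA(ierr);
/* Illustrative values: relative tolerance 1.e-3, absolute tolerance 1.e-50,
   divergence tolerance 1.e5, and at most 100 inner iterations per Newton step. */
ierr = KSPSetTolerances(ksp,1.e-3,1.e-50,1.e5,100); CHKERRA(ierr);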
SNES fully supports matrix-free methods. The matrices specified in the Jacobian and Hessian evaluation routine need not be conventional matrices; instead, they can point to the data required to implement a particular matrix-free method. The matrix-free variant is allowed only when the linear systems are solved by an iterative method in combination with no preconditioning ( PCNONE or -pc_type none), a user-provided preconditioner matrix, or a user-provided preconditioner shell ( PCSHELL, discussed in Section Preconditioners ); that is, obviously matrix-free methods cannot be used if a direct solver is to be employed.
The user can create a matrix-free context for use within SNES with
the routine
ierr = SNESDefaultMatrixFreeMatCreate(SNES snes,Vec x, Mat *mat);This routine creates the data structures needed for the matrix-vector products that arise within Krylov space iterative methods [(ref brownsaad:90)] by employing the matrix type MATSHELL, discussed in Section Matrix-Free Matrices . The default SNES matrix-free approximations can also be invoked with the command -snes_mf. Or, one can retain the user-provided Jacobian preconditioner, but replace the user-provided Jacobian matrix with the default matrix free variant with the option -snes_mf_operator.
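For example, a hedged sketch of this usage pattern, mirroring the style of the example in Figure 14, is shown below; the preconditioning matrix P, the routine FormPreconditioner() that assembles it, and the context user are assumed to have been set up elsewhere by the application.
Mat J;
/* Create the default matrix-free representation of the Jacobian, based on
   differencing the function along the direction supplied by the Krylov method. */
ierr = SNESDefaultMatrixFreeMatCreate(snes,x,&J); CHKERRA(ierr);
/* Use the matrix-free Jacobian J together with a conventional matrix P
   (assembled by the hypothetical routine FormPreconditioner) that is used
   only to build the preconditioner. */
ierr = SNESSetJacobian(snes,J,P,FormPreconditioner,(void*)&user); CHKERRA(ierr);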
The user can set two parameters to control the Jacobian-vector
product approximation with the command
ierr = SNESSetMatrixFreeParameters(SNES snes,double rerror,double umin);The parameter rerror should be set to the square root of the relative error in the function evaluations, e_rel; the default is 10^{-8}, which assumes that the functions are evaluated to full double precision accuracy. The second parameter, umin (or u_min), is a bit more involved; its default is 10^{-8}. The Jacobian-vector product is approximated via the formula
F'(u) a ≈ [ F(u + h*a) - F(u) ] / h
where h is computed via
h = e_rel * u^T a / ||a||_2^2                            if |u^T a| > u_min * ||a||_1
  = e_rel * u_min * sign(u^T a) * ||a||_1 / ||a||_2^2    otherwise.
This approach is taken from Brown and Saad [(ref brownsaad:90)].
These parameters can also be set from the options database with
-snes_mf_err <err> -snes_mf_umin <umin>Note that setting these parameters appropriately is crucial for achieving fast convergence with matrix-free Newton-Krylov methods.
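For instance, if the nonlinear function is known to be accurate to only about six significant digits, one might set the differencing parameters as in the following sketch (values illustrative only):
/* sqrt(1.e-6) = 1.e-3 is used for the relative-error parameter; umin is
   left at a value near its default. */
ierr = SNESSetMatrixFreeParameters(snes,1.e-3,1.e-8); CHKERRA(ierr);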
We include an example in Figure 14 that explicitly uses a matrix-free approach. Note that by using the option -snes_mf one can easily convert any SNES code to use a matrix-free Newton-Krylov method without a preconditioner. As shown in this example, SNESSetFromOptions() must be called after SNESSetJacobian() to enable runtime switching between the user-specified Jacobian and the default SNES matrix-free form.
Table Matrix-Free Methods summarizes the various matrix situations that SNES supports. In particular, different linear system matrices and preconditioning matrices are allowed, as well as both matrix-free and application-provided preconditioners. All combinations are possible, as demonstrated by the example, ${}PETSC_DIR/src/snes/examples/ex5.c, in Figure 14 .
Matrix Use           Conventional Matrix Formats              Matrix-Free Versions
Jacobian             Create matrix with MatCreate().*         Create matrix with MatCreateShell().
(or Hessian)         Assemble matrix with user-defined        Use MatShellSetOperation() to set
Matrix               routine.†                                various matrix actions, or use
                                                              SNESDefaultMatrixFreeMatCreate().
Preconditioning      Create matrix with MatCreate().*         Use SNESGetSLES() and SLESGetPC()
Matrix               Assemble matrix with user-defined        to access the PC, then use
                     routine.†                                PCSetType(pc,PCSHELL); followed by
                                                              PCSetApply().

* Use either the generic MatCreate() or a format-specific variant such as MatCreateMPIAIJ().
† Set the user-defined matrix formation routine with SNESSetJacobian() or SNESSetHessian().
Jacobian and Hessian Matrix Options
#ifdef PETSC_RCS_HEADER static char vcid[] = "$Id: ex6.c,v 1.49 1997/11/28 16:22:00 bsmith Exp $"; #endif static char help[] = "Uses Newton-like methods to solve u`` + u^{2} = f. Different\n\ matrices are used for the Jacobian and the preconditioner. The code also\n\ demonstrates the use of matrix-free Newton-Krylov methods in conjunction\n\ with a user-provided preconditioner. Input arguments are:\n\ -snes_mf : Use matrix-free Newton methods\n\ -user_precond : Employ a user-defined preconditioner. Used only with\n\ matrix-free methods in this example.\n\n"; /*T Concepts: SNES^Using different matrices for the Jacobian and preconditioner; Concepts: SNES^Using matrix-free methods and a user-provided preconditioner; Routines: SNESCreate(); SNESSetFunction(); SNESSetJacobian(); Routines: SNESSolve(); SNESSetFromOptions(); SNESGetSLES(); Routines: SLESGetPC(); PCSetType(); PCShellSetApply(); PCSetType(); Processors: 1 T*/ /* Include "snes.h" so that we can use SNES solvers. Note that this file automatically includes: petsc.h - base PETSc routines vec.h - vectors sys.h - system routines mat.h - matrices is.h - index sets ksp.h - Krylov subspace methods viewer.h - viewers pc.h - preconditioners sles.h - linear solvers */ #include "snes.h" #include <math.h> /* User-defined routines */ int FormJacobian(SNES,Vec,Mat*,Mat*,MatStructure*,void*); int FormFunction(SNES,Vec,Vec,void*); int MatrixFreePreconditioner(void*,Vec,Vec); int main( int argc, char **argv ) { SNES snes; /* SNES context */ SLES sles; /* SLES context */ PC pc; /* PC context */ Vec x, r, F; /* vectors */ Mat J, JPrec; /* Jacobian, preconditioner matrices */ int ierr, its, n = 5, i, size, flg; double h, xp = 0.0; Scalar v, pfive = .5; PetscInitialize( &argc, &argv,(char *)0,help ); MPI_Comm_size(PETSC_COMM_WORLD,&size); if (size != 1) SETERRA(1,0,"This is a uniprocessor example only!"); ierr = OptionsGetInt(PETSC_NULL,"-n",&n,&flg); CHKERRA(ierr); h = 1.0/(n-1); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Create nonlinear solver context - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = SNESCreate(PETSC_COMM_WORLD,SNES_NONLINEAR_EQUATIONS,&snes); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Create vector data structures; set function evaluation routine - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = VecCreate(PETSC_COMM_SELF,PETSC_DECIDE,n,&x); CHKERRA(ierr); ierr = VecDuplicate(x,&r); CHKERRA(ierr); ierr = VecDuplicate(x,&F); CHKERRA(ierr); ierr = SNESSetFunction(snes,r,FormFunction,(void*)F); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Create matrix data structures; set Jacobian evaluation routine - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,n,n,3,PETSC_NULL,&J); CHKERRA(ierr); ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,n,n,1,PETSC_NULL,&JPrec); CHKERRA(ierr); /* Note that in this case we create separate matrices for the Jacobian and preconditioner matrix. 
Both of these are computed in the routine FormJacobian() */ ierr = SNESSetJacobian(snes,J,JPrec,FormJacobian,0); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Customize nonlinear solver; set runtime options - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ /* Set preconditioner for matrix-free method */ ierr = OptionsHasName(PETSC_NULL,"-snes_mf",&flg); CHKERRA(ierr); if (flg) { ierr = SNESGetSLES(snes,&sles); CHKERRA(ierr); ierr = SLESGetPC(sles,&pc); CHKERRA(ierr); ierr = OptionsHasName(PETSC_NULL,"-user_precond",&flg); CHKERRA(ierr); if (flg) { /* user-defined precond */ ierr = PCSetType(pc,PCSHELL); CHKERRA(ierr); ierr = PCShellSetApply(pc,MatrixFreePreconditioner,PETSC_NULL);CHKERRA(ierr); } else {ierr = PCSetType(pc,PCNONE); CHKERRA(ierr);} } ierr = SNESSetFromOptions(snes); CHKERRA(ierr); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Initialize application: Store right-hand-side of PDE and exact solution - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ xp = 0.0; for ( i=0; i<n; i++ ) { v = 6.0*xp + pow(xp+1.e-12,6.0); /* +1.e-12 is to prevent 0^6 */ ierr = VecSetValues(F,1,&i,&v,INSERT_VALUES); CHKERRA(ierr); xp += h; } /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Evaluate initial guess; then solve nonlinear system - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = VecSet(&pfive,x); CHKERRA(ierr); ierr = SNESSolve(snes,x,&its); CHKERRA(ierr); PetscPrintf(PETSC_COMM_SELF,"number of Newton iterations = %d\n\n", its ); /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Free work space. All PETSc objects should be destroyed when they are no longer needed. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */ ierr = VecDestroy(x); CHKERRA(ierr); ierr = VecDestroy(r); CHKERRA(ierr); ierr = VecDestroy(F); CHKERRA(ierr); ierr = MatDestroy(J); CHKERRA(ierr); ierr = MatDestroy(JPrec); CHKERRA(ierr); ierr = SNESDestroy(snes); CHKERRA(ierr); PetscFinalize(); return 0; } /* ------------------------------------------------------------------- */ /* FormInitialGuess - Forms initial approximation. Input Parameters: user - user-defined application context X - vector Output Parameter: X - vector */ int FormFunction(SNES snes,Vec x,Vec f,void *dummy) { Scalar *xx, *ff,*FF,d; int i, ierr, n; ierr = VecGetArray(x,&xx); CHKERRQ(ierr); ierr = VecGetArray(f,&ff); CHKERRQ(ierr); ierr = VecGetArray((Vec)dummy,&FF); CHKERRQ(ierr); ierr = VecGetSize(x,&n); CHKERRQ(ierr); d = (double) (n - 1); d = d*d; ff[0] = xx[0]; for ( i=1; i<n-1; i++ ) { ff[i] = d*(xx[i-1] - 2.0*xx[i] + xx[i+1]) + xx[i]*xx[i] - FF[i]; } ff[n-1] = xx[n-1] - 1.0; ierr = VecRestoreArray(x,&xx); CHKERRQ(ierr); ierr = VecRestoreArray(f,&ff); CHKERRQ(ierr); ierr = VecRestoreArray((Vec)dummy,&FF); CHKERRQ(ierr); return 0; } /* ------------------------------------------------------------------- */ /* FormJacobian - This routine demonstrates the use of different matrices for the Jacobian and preconditioner Input Parameters: . snes - the SNES context . x - input vector . ptr - optional user-defined context, as set by SNESSetJacobian() Output Parameters: . A - Jacobian matrix . B - different preconditioning matrix . 
flag - flag indicating matrix structure */ int FormJacobian(SNES snes,Vec x,Mat *jac,Mat *prejac,MatStructure *flag, void *dummy) { Scalar *xx, A[3], d; int i, n, j[3], ierr; ierr = VecGetArray(x,&xx); CHKERRQ(ierr); ierr = VecGetSize(x,&n); CHKERRQ(ierr); d = (double)(n - 1); d = d*d; /* Form Jacobian. Also form a different preconditioning matrix that has only the diagonal elements. */ i = 0; A[0] = 1.0; ierr = MatSetValues(*jac,1,&i,1,&i,&A[0],INSERT_VALUES); CHKERRQ(ierr); ierr = MatSetValues(*prejac,1,&i,1,&i,&A[0],INSERT_VALUES); CHKERRQ(ierr); for ( i=1; i<n-1; i++ ) { j[0] = i - 1; j[1] = i; j[2] = i + 1; A[0] = d; A[1] = -2.0*d + 2.0*xx[i]; A[2] = d; ierr = MatSetValues(*jac,1,&i,3,j,A,INSERT_VALUES); CHKERRQ(ierr); ierr = MatSetValues(*prejac,1,&i,1,&i,&A[1],INSERT_VALUES); CHKERRQ(ierr); } i = n-1; A[0] = 1.0; ierr = MatSetValues(*jac,1,&i,1,&i,&A[0],INSERT_VALUES); CHKERRQ(ierr); ierr = MatSetValues(*prejac,1,&i,1,&i,&A[0],INSERT_VALUES); CHKERRQ(ierr); ierr = MatAssemblyBegin(*jac,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); ierr = MatAssemblyBegin(*prejac,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); ierr = MatAssemblyEnd(*jac,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); ierr = MatAssemblyEnd(*prejac,MAT_FINAL_ASSEMBLY); CHKERRQ(ierr); ierr = VecRestoreArray(x,&xx); CHKERRQ(ierr); *flag = SAME_NONZERO_PATTERN; return 0; } /* ------------------------------------------------------------------- */ /* MatrixFreePreconditioner - This routine demonstrates the use of a user-provided preconditioner. This code implements just the null preconditioner, which of course is not recommended for general use. Input Parameters: . ctx - optional user-defined context, as set by PCShellSetApply() . x - input vector Output Parameter: . y - preconditioned vector */ int MatrixFreePreconditioner(void *ctx,Vec x,Vec y) { int ierr; ierr = VecCopy(x,y); CHKERRQ(ierr); return 0; }
PETSc provides some tools to help approximate the Jacobian matrices efficiently via finite differences. These tools are intended for use in certain situations where one is unable to compute Jacobian matrices analytically, and matrix-free methods do not work well without a preconditioner, due to very poor conditioning. The approximation requires several steps:
ISColoring    iscoloring;
MatFDColoring fdcoloring;
MatStructure  str;

/* This initializes the nonzero structure of the Jacobian. This is artificial
   because clearly if we had a routine to compute the Jacobian we wouldn't
   need to use finite differences. */
FormJacobian(snes,x,&J,&J,&str,&user);

/* Color the matrix, i.e., determine groups of columns that share no common
   rows. These columns in the Jacobian can all be computed simultaneously. */
MatGetColoring(J,COLORING_SL,&iscoloring);

/* Create the data structure that SNESDefaultComputeJacobianWithColoring()
   uses to compute the actual Jacobians via finite differences. */
MatFDColoringCreate(J,iscoloring,&fdcoloring);
ISColoringDestroy(iscoloring);
MatFDColoringSetFromOptions(fdcoloring);

/* Tell SNES to use the routine SNESDefaultComputeJacobianWithColoring()
   to compute Jacobians. */
SNESSetJacobian(snes,J,J,SNESDefaultComputeJacobianWithColoring,fdcoloring);
Of course, we are cheating a bit. If we do not have an analytic formula for computing the Jacobian, then how do we know what its nonzero structure is so that it may be colored? Determining the structure is problem dependent, but fortunately, for most grid-based problems (the class of problems for which PETSc is designed) if one knows the stencil used for the nonlinear function one can usually fairly easily obtain an estimate of the location of nonzeros in the matrix.
One need not necessarily use the routine MatGetColoring() to determine a coloring. For example, if a grid can be colored directly (without using the associated matrix), then that coloring can be provided to MatFDColoringCreate(). Note that the user must always preset the nonzero structure in the matrix regardless of which coloring routine is used.
For sequential matrices PETSc provides three matrix coloring routines from the
MINPACK package [(ref more84)]: smallest-last ( sl), largest-first ( lf),
and incidence-degree ( id). These colorings, as well as the ``natural'' coloring
for which each column has its own unique color, may be accessed with the command line options
-mat_coloring [sl,id,lf,natural]Alternatively, one can set a coloring type of COLORING_SL, COLORING_ID, COLORING_LF, or COLORING_NATURAL when calling MatGetColoring().
As for the matrix-free computation of Jacobians (see Section
Matrix-Free Methods
), two parameters affect the accuracy of the
finite difference Jacobian approximation. These are set with the command
ierr = MatFDColoringSetParameters(MatFDColoring fdcoloring,double rerror,double umin);The parameter rerror is the square root of the relative error in the function evaluations, e_rel; the default is 10^{-8}, which assumes that the functions are evaluated to full double-precision accuracy. The second parameter, umin, is a bit more involved; its default is 10^{-8}. Column i of the Jacobian matrix (denoted by F_{:i}) is approximated by the formula
F'_{:i} ≈ [ F(u + h*dx_i) - F(u) ] / h
where h is computed via
h = e_rel * u_i                  if |u_i| > u_min
h = e_rel * u_min * sign(u_i)    otherwise.
These parameters may be set from the options database with
-mat_fd_coloring_err <err> -mat_fd_coloring_umin <umin>
Note that the MatGetColoring() routine currently works only for sequential matrices. Extensions may be forthcoming. However, if one can compute the coloring iscoloring some other way, the routine MatFDColoringCreate() does work in parallel. An example of this for 2D distributed arrays, using the utility routine DAGetColoring(), is given below.
ierr = DAGetColoring(da,&iscoloring,&J); ierr = MatFDColoringCreate(J,iscoloring,&fdcoloring); ierr = MatFDColoringSetFromOptions(fdcoloring); ierr = ISColoringDestroy(iscoloring);Note that the routine MatFDColoringCreate() currently is only supported for the AIJ matrix format.
The TS component provides a framework for the scalable solution of ODEs arising from the discretization of time-dependent PDEs, and of steady-state problems using pseudo-timestepping.
Time-Dependent Problems: Consider the ODE
u_t = F(u,t),
where u is a finite-dimensional vector, usually obtained from discretizing a PDE with finite differences, finite elements, etc. For example, discretizing the heat equation
u_t = u_xx
with centered finite differences results in
(u_i)_t = ( u_{i+1} - 2 u_i + u_{i-1} ) / h^2.
The TS component provides code to solve these equations (currently using the forward or backward Euler method) as well as an interface to other sophisticated ODE solvers, in a clean and easy manner, where the user need only provide code for the evaluation of F(u,t) and (optionally) its associated Jacobian matrix.
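As an illustration, a sequential, hedged sketch of a user routine evaluating F(u,t) for the centered-difference heat equation above is given below; the routine name, the grid spacing convention, and the handling of the two boundary values are assumptions made only for this example. The calling sequence matches that expected by TSSetRHSFunction(), described later in this chapter.
int HeatRHSFunction(TS ts,double t,Vec u,Vec F,void *ctx)
{
  Scalar *uu, *ff;
  int    i, n, ierr;
  double h;

  ierr = VecGetSize(u,&n); CHKERRQ(ierr);
  h    = 1.0/(n-1);                       /* assumed uniform grid on [0,1] */
  ierr = VecGetArray(u,&uu); CHKERRQ(ierr);
  ierr = VecGetArray(F,&ff); CHKERRQ(ierr);
  /* Interior points: centered second difference (u_{i+1} - 2u_i + u_{i-1})/h^2 */
  for ( i=1; i<n-1; i++ ) ff[i] = (uu[i+1] - 2.0*uu[i] + uu[i-1])/(h*h);
  /* Hold the two boundary values fixed (an arbitrary choice for this sketch) */
  ff[0] = 0.0; ff[n-1] = 0.0;
  ierr = VecRestoreArray(u,&uu); CHKERRQ(ierr);
  ierr = VecRestoreArray(F,&ff); CHKERRQ(ierr);
  return 0;
}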
Steady-State Problems: In addition, TS provides a general code for performing pseudo timestepping with a variable timestep at each physical node point. For example, instead of directly attacking the steady-state problem
F(u) = 0,
we can use pseudo-transient continuation by solving
u_t = F(u).
By using time differencing with the backward Euler method, we obtain
( u^{n+1} - u^n ) / dt^n = F(u^{n+1}).
More generally we can consider a diagonal matrix Dt^n that has a pseudo-timestep for each node point to obtain the series of nonlinear equations
( Dt^n )^{-1} ( u^{n+1} - u^n ) = F(u^{n+1}).
For this problem the user must provide F(u) and the diagonal matrix Dt^n (or, optionally, a scalar timestep if the timestep is position independent), as well as (optionally) the Jacobian of F(u).
The user first creates a TS object with the command
ierr = TSCreate(MPI_Comm comm,TSProblemType problemtype,TS *ts);The TSProblemType is one of TS_LINEAR or TS_NONLINEAR, to indicate whether F(u,t) is given by a matrix A, or A(t), or a function F(u,t).
One can set the solution method with the routine
ierr = TSSetType(TS ts,TSType type);Currently supported types are TS_EULER, TS_BEULER, and TS_PSEUDO; alternatively, they may be selected with the command line option -ts_type euler, beuler, or pseudo.
Set the initial time and timestep with the command
ierr = TSSetInitialTimeStep(TS ts,double time,double dt);One can change the timestep with the command
ierr = TSSetTimeStep(TS ts,double dt);One can determine the current timestep with the routine
ierr = TSGetTimeStep(TS ts,double* dt);Here, ``current'' refers to the timestep being used to attempt to advance the solution from u^n to u^{n+1}.
One sets the total number of timesteps to run or the total time to run
(whichever comes first) with the command
ierr = TSSetDuration(TS ts,int maxsteps,double maxtime);One sets up the timestep context with
ierr = TSSetUp(TS ts);destroys it with
ierr = TSDestroy(TS ts);and views it with
ierr = TSView(TS ts,Viewer viewer);
To set up TS for solving an ODE, one must set the following:
ierr = TSSetSolution(TS ts, Vec initialsolution);The vector initialsolution should contain the ``initial conditions'' for the ODE.
ierr = TSSetRHSMatrix(TS ts,Mat A, Mat B,int (*f)(TS,double,Mat*,Mat*, MatStructure*,void*),void *fP);The matrix B (although usually the same as A) allows one to provide a different matrix to be used in the construction of the preconditioner. The function f is used to form the matrices A and B at each timestep if the matrices are time dependent. If the matrix does not depend on time, the user should pass in PETSC_NULL for f. The variable fP allows users to pass in an application context that is passed to the f() function whenever it is called, as the final argument. The user must provide the matrices A and B; if they have the right-hand side only as a linear function, they must construct a MatShell matrix. Note that this is the same interface as that for SNESSetJacobian().
For nonlinear problems (or linear problems solved using explicit timestepping methods) the user passes the function with the routine
ierr = TSSetRHSFunction(TS ts,int (*f)(TS,double,Vec,Vec,void*),void *fP);The arguments to the function f() are the timestep context, the current time, the input for the function, the output for the function, and the (optional) user-provided context variable fP.
ierr = TSSetRHSJacobian(TS ts,Mat A, Mat B,int (*f)(TS,double,Vec,Mat*,Mat*, MatStructure*,void*),void *fP);The arguments for the function f() are the timestep context, the current time, the location where the Jacobian is to be computed, the Jacobian matrix, an alternative approximate Jacobian matrix used as a preconditioner, and the optional user-provided context, passed in as fP. The user must provide the Jacobian as a matrix; thus, if a matrix-free approach is used, the user must create a MatShell matrix. Again, note the similarity to SNESSetJacobian().
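Putting these calls together, a hedged sketch of a typical setup sequence for a nonlinear, time-dependent problem follows. The names RHSFunction, RHSJacobian, the matrix A, the vector u, the context ctx, and all numerical values are assumptions for this example; the call that actually advances the solution in time is omitted (see the TS manual pages).
TS ts;
ierr = TSCreate(PETSC_COMM_WORLD,TS_NONLINEAR,&ts); CHKERRA(ierr);
ierr = TSSetType(ts,TS_BEULER); CHKERRA(ierr);            /* backward Euler */
ierr = TSSetInitialTimeStep(ts,0.0,0.01); CHKERRA(ierr);  /* t0 = 0, dt = 0.01 */
ierr = TSSetDuration(ts,1000,1.0); CHKERRA(ierr);         /* at most 1000 steps or t = 1 */
ierr = TSSetSolution(ts,u); CHKERRA(ierr);                /* u holds the initial conditions */
ierr = TSSetRHSFunction(ts,RHSFunction,(void*)&ctx); CHKERRA(ierr);
ierr = TSSetRHSJacobian(ts,A,A,RHSJacobian,(void*)&ctx); CHKERRA(ierr);
ierr = TSSetUp(ts); CHKERRA(ierr);
/* ... advance the solution in time (see the TS manual pages) ... */
ierr = TSDestroy(ts); CHKERRA(ierr);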
PVODE is a parallel ODE solver developed by Hindmarsh et al. at LLNL. The TS component provides an interface to use PVODE directly from PETSc. (To install PETSc to use PVODE, see the installation guide, docs/installation.html.)
To use the PVODE integrators, call
ierr = TSSetType(TS ts,TSType TS_PVODE);or use the command line option -ts_type pvode.
PVODE comes with two main integrator families, Adams and BDF (backward
differentiation formula). One can select these with
ierr = TSPvodeSetType(TS ts,TSPvodeType [PVODE_ADAMS,PVODE_BDF]);or the command line option -ts_pvode_type <adams,bdf>. BDF is the default.
PVODE does not use the SNES component of PETSc for its nonlinear
solvers, so one cannot change the nonlinear solver options via
SNES. Rather, PVODE uses the preconditioners within the PC component
of PETSc, which can be accessed via
ierr = TSPvodeGetPC(TS ts,PC *pc);The user can then directly set preconditioner options; alternatively, the usual runtime options can be employed via -pc_xxx.
Finally, one can set the PVODE tolerances via
ierr = TSPvodeSetTolerance(TS ts,double abs,double rel);where abs denotes the absolute tolerance and rel the relative tolerance.
Other PETSc-PVode options include
ierr = TSPVodeSetGramSchmidtType(TS ts,TSPVodeGramSchmidtType type);where type is either PVODE_MODIFIED_GS or PVODE_UNMODIFIED_GS. This may be set via the options database with -ts_pvode_gramschmidt_type <modified,unmodified>.
The routine
ierr = TSPVodeSetGMRESRestart(TS ts,int restart);sets the number of vectors in the Krylov subspace used by GMRES. This may be set in the options database with -ts_pvode_gmres_restart restart.
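A hedged sketch of selecting and tuning PVODE from within the application code follows; the tolerance values, restart size, and choice of preconditioner are illustrative only.
PC pc;
ierr = TSSetType(ts,TS_PVODE); CHKERRA(ierr);
ierr = TSPvodeSetType(ts,PVODE_BDF); CHKERRA(ierr);           /* BDF is also the default */
ierr = TSPvodeSetTolerance(ts,1.e-6,1.e-6); CHKERRA(ierr);    /* absolute, relative */
ierr = TSPvodeGetPC(ts,&pc); CHKERRA(ierr);
ierr = PCSetType(pc,PCNONE); CHKERRA(ierr);                   /* or any other PC type */
ierr = TSPVodeSetGMRESRestart(ts,30); CHKERRA(ierr);          /* illustrative restart size */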
For solving steady-state problems with pseudo-timestepping one proceeds as follows.
ierr = TSSetRHSFunction(TS ts,int (*f)(TS,double,Vec,Vec,void*),void *fP);The arguments to the function f() are the timestep context, the current time, the input for the function, the output for the function and the (optional) user-provided context variable fP.
ierr = TSSetRHSJacobian(TS ts,Mat A, Mat B,int (*f)(TS,double,Vec,Mat*,Mat*, MatStructure*,void*),void *fP);The arguments for the function f() are the timestep context, the current time, the location where the Jacobian is to be computed, the Jacobian matrix, an alternative approximate Jacobian matrix used as a preconditioner, and the optional user-provided context, passed in as fP. The user must provide the Jacobian as a matrix; thus, if using a matrix-free approach, one must create a MatShell matrix.
ierr = TSPseudoSetTimeStep(TS ts,int(*dt)(TS,double*,void*),void* dtctx);The function dt is a user-provided function that computes the next pseudo-timestep. As a default one can use TSPseudoDefaultTimeStep(TS,double*,void*) for dt. This routine updates the pseudo-timestep with one of two strategies: the default
dt^n = dt_increment * dt^{n-1} * || F(u^{n-1}) || / || F(u^n) ||
or, the alternative,
dt^n = dt_increment * dt^0 * || F(u^0) || / || F(u^n) ||
which can be set with the call
ierr = TSPseudoIncrementDtFromInitialDt(TS ts);or the option -ts_pseudo_increment_dt_from_initial_dt. The value dt_increment is by default 1.1, but can be reset with the call
ierr = TSPseudoSetTimeStepIncrement(TS ts,double inc);or the option -ts_pseudo_increment <inc>.
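For example, a hedged sketch of the pseudo-timestepping setup is shown below; the routine names FormFunction and FormJacobian, the matrix A, the vector u, the context ctx, and the increment value are assumptions for this example.
TS ts;
ierr = TSCreate(PETSC_COMM_WORLD,TS_NONLINEAR,&ts); CHKERRA(ierr);
ierr = TSSetType(ts,TS_PSEUDO); CHKERRA(ierr);
ierr = TSSetSolution(ts,u); CHKERRA(ierr);                    /* initial guess */
ierr = TSSetRHSFunction(ts,FormFunction,(void*)&ctx); CHKERRA(ierr);
ierr = TSSetRHSJacobian(ts,A,A,FormJacobian,(void*)&ctx); CHKERRA(ierr);
/* Use the default strategy for computing the pseudo-timestep at each step. */
ierr = TSPseudoSetTimeStep(ts,TSPseudoDefaultTimeStep,PETSC_NULL); CHKERRA(ierr);
ierr = TSPseudoSetTimeStepIncrement(ts,1.2); CHKERRA(ierr);   /* illustrative value */
ierr = TSSetUp(ts); CHKERRA(ierr);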
PETSc graphics components are not intended to compete with high-quality graphics packages. Instead, they are intended to be easy to use interactively with PETSc programs. We urge users to generate their publication-quality graphics using a professional graphics package. If a user wants to hook certain packages into PETSc, he or she should send a message to petsc-maint@mcs.anl.gov, and we will see whether it is reasonable to try to provide direct interfaces.
For drawing predefined PETSc objects such as matrices and vectors, one must
first create a viewer using the
command
ierr = ViewerDrawOpenX(MPI_Comm comm,char *display,char *title,int x,int y,int w, int h,Viewer *viewer);This viewer may be passed to any of the XXXView() routines. To draw into the viewer, one must obtain the Draw object with the command
ierr = ViewerDrawGetDraw(Viewer viewer,Draw *draw);Then one can call any of the DrawXXX commands on the draw object. If one obtains the draw object in this manner, one does not call the DrawOpenX() command discussed below.
Predefined viewers, VIEWER_DRAWX_WORLD and VIEWER_DRAWX_SELF, may be used at any time. Their initial use will cause the appropriate window to be created.
By default, PETSc drawing tools employ a private colormap, which remedies the problem of poor color choices for contour plots due to an external program's mangling of the colormap (e.g., Netscape tends to do this). Unfortunately, this causes flashing of colors as the mouse is moved between the PETSc windows and other windows. Alternatively, a shared colormap can be used via the option -draw_x_shared_colormap.
One can open a window that is not associated with a viewer directly
under the X11 Window System with the
command
ierr = DrawOpenX(MPI_Comm comm,char *display,char *title,int x,int y,int w, int h,Draw *win);All drawing routines are done relative to the window's coordinate system and viewport. By default the drawing coordinates are from (0,0) to (1,1), where (0,0) indicates the lower left corner of the window. The application program can change the window coordinates with the command
ierr = DrawSetCoordinates(Draw win,double xl,double yl,double xr,double yr);By default, graphics will be drawn in the entire window. To restrict the drawing to a portion of the window, one may use the command
ierr = DrawSetViewPort(Draw win,double xl,double yl,double xr,double yr);These arguments, which indicate the fraction of the window in which the drawing should be done, must satisfy 0 ≤ xl ≤ xr ≤ 1 and 0 ≤ yl ≤ yr ≤ 1.
To draw a line, one uses
the command
ierr = DrawLine(Draw win,double xl,double yl,double xr,double yr,int cl);The argument cl indicates the color (which is an integer between 0 and 255) of the line. A list of predefined colors may be found in include/draw.h and includes DRAW_BLACK, DRAW_RED, DRAW_BLUE etc.
To ensure that all graphics actually have been displayed, one should use
the
command
ierr = DrawFlush(Draw win);When displaying by using double buffering, which is set with the command
ierr = DrawSetDoubleBuffer(Draw win);all processors must call
ierr = DrawSynchronizedFlush(Draw win);in order to swap the buffers. From the options database one may use -draw_pause n, which causes the PETSc application to pause n seconds at each DrawPause(). A time of -1 indicates that the application should pause until receiving mouse input from the user.
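For example, a minimal sketch of opening a window, drawing a single line across the default coordinate system, and flushing the graphics (the window geometry, title, and color are arbitrary choices) is
Draw win;
ierr = DrawOpenX(PETSC_COMM_SELF,0,"Example",0,0,300,300,&win); CHKERRA(ierr);
/* Draw a diagonal red line from the lower left to the upper right corner. */
ierr = DrawLine(win,0.0,0.0,1.0,1.0,DRAW_RED); CHKERRA(ierr);
ierr = DrawFlush(win); CHKERRA(ierr);
/* ... when finished with the window ... */
ierr = DrawDestroy(win); CHKERRA(ierr);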
Text can be drawn with either of the two
commands
ierr = DrawString(Draw win,double x,double y,int color,char *text); ierr = DrawStringVertical(Draw win,double x,double y,int color,char *text);The user can set the text font size or determine it with the commands
ierr = DrawStringSetSize(Draw win,double width,double height); ierr = DrawStringGetSize(Draw win,double *width,double *height);
PETSc includes a set of routines for manipulating simple two-dimensional
graphs. These routines, which begin with DrawAxisDraw(), are usually
not used directly by the application programmer. Instead, the programmer
employs the line graph routines to draw simple line graphs.
As shown in the program, within Figure 15
, line graphs
are created with the command
ierr = DrawLGCreate(Draw win,int ncurves,DrawLG *ctx);The argument ncurves indicates how many curves are to be drawn. Points can be added to each of the curves with the command
ierr = DrawLGAddPoint(DrawLG ctx,double *x,double *y);The arguments x and y are arrays containing the next point value for each curve. Several points for each curve may be added with
ierr = DrawLGAddPoints(DrawLG ctx,int n,double **x,double **y);The line graph is drawn (or redrawn) with the command
ierr = DrawLGDraw(DrawLG ctx);A line graph that is no longer needed can be destroyed with the command
ierr = DrawLGDestroy(DrawLG ctx);To plot new curves, one can reset a line graph with the command
ierr = DrawLGReset(DrawLG ctx);The line graph automatically determines the range of values to display on the two axes. The user can change these defaults with the command
ierr = DrawLGSetLimits(DrawLG ctx,double xmin,double xmax,double ymin,double ymax);It is also possible to change the display of the axes and to label them. This procedure is done by first obtaining the axes context with the command
ierr = DrawLGGetAxis(DrawLG ctx,DrawAxis *axis);One can set the axes' colors and labels, respectively, by using the commands
ierr = DrawAxisSetColors(DrawAxis axis,int axis_lines,int ticks,int text); ierr = DrawAxisSetLabels(DrawAxis axis,char *top,char *x,char *y);
#ifdef PETSC_RCS_HEADER static char vcid[] = "$Id: ex3.c,v 1.28 1997/10/10 04:04:50 bsmith Exp $"; #endif static char help[] = "Plots a simple line graph\n"; #include "petsc.h" #include <math.h> int main(int argc,char **argv) { Draw draw; DrawLG lg; DrawAxis axis; int n = 20, i, ierr, x = 0, y = 0, width = 300, height = 300,flg; char *xlabel, *ylabel, *toplabel; double xd, yd; xlabel = "X-axis Label";toplabel = "Top Label";ylabel = "Y-axis Label"; PetscInitialize(&argc,&argv,(char*)0,help); OptionsGetInt(PETSC_NULL,"-width",&width,&flg); OptionsGetInt(PETSC_NULL,"-height",&height,&flg); OptionsGetInt(PETSC_NULL,"-n",&n,&flg); OptionsHasName(PETSC_NULL,"-nolabels",&flg); if (flg) { xlabel = (char *)0; toplabel = (char *)0; } ierr = DrawOpenX(PETSC_COMM_SELF,0,"Title",x,y,width,height,&draw);CHKERRA(ierr); ierr = DrawLGCreate(draw,1,&lg); CHKERRA(ierr); ierr = DrawLGGetAxis(lg,&axis); CHKERRA(ierr); ierr = DrawAxisSetColors(axis,DRAW_BLACK,DRAW_RED,DRAW_BLUE); CHKERRA(ierr); ierr = DrawAxisSetLabels(axis,toplabel,xlabel,ylabel); CHKERRA(ierr); for ( i=0; i<n ; i++ ) { xd = (double)( i - 5 ); yd = xd*xd; ierr = DrawLGAddPoint(lg,&xd,&yd); CHKERRA(ierr); } ierr = DrawLGIndicateDataPoints(lg); CHKERRA(ierr); ierr = DrawLGDraw(lg); CHKERRA(ierr); ierr = DrawFlush(draw); CHKERRA(ierr); PetscSleep(2); ierr = DrawLGDestroy(lg); CHKERRA(ierr); ierr = DrawDestroy(draw); CHKERRA(ierr); PetscFinalize(); return 0; }
For both the linear and nonlinear solvers, default routines allow one to graphically monitor convergence of the iterative method. These are accessed via the command line with -ksp_xmonitor and -snes_xmonitor. See also the Convergence Monitoring sections of the linear and nonlinear solver chapters.
The two functions used are KSPLGMonitor() and KSPLGMonitorCreate() . These can easily be modified to serve specialized needs.
PETSc contains some code to generate output in Postscript and VRML (Virtual Reality Modeling Language). This code is currently undergoing revision, but is available for the adventurous. Stay tuned for future developments.
To disable all x-window-based graphics, edit the file ${}PETSC_DIR/bmake/${}PETSC_ARCH/base and remove the flag -DHAVE_X11 from the CONF variable definition. Then (re)compile the PETSc libraries.
Most of the functionality of PETSc can be obtained by people who program purely in Fortran 77 or Fortran 90. Note, however, that we recommend the use of C and/or C++ because these languages contain several extremely powerful concepts that the Fortran77/90 family does not. The PETSc Fortran interface works with both F77 and F90 compilers.
Since Fortran77 does not provide type checking of routine input/output parameters, we find that many errors encountered within PETSc Fortran programs result from accidentally using incorrect calling sequences. Such mistakes are immediately detected during compilation when using C/C++. Thus, using a mixture of C/C++ and Fortran often works well for programmers who wish to employ Fortran for the core numerical routines within their applications. In particular, one can effectively write PETSc driver routines in C/C++, thereby preserving flexibility within the program, and still use Fortran when desired for underlying numerical computations.
Only a few differences exist between the C and Fortran PETSc interfaces, all of which are due to differences in Fortran syntax. All Fortran routines have the same names as the corresponding C versions, and PETSc command line options are fully supported. The routine arguments follow the usual Fortran conventions; the user need not worry about passing pointers or values. The calling sequences for the Fortran version are in most cases identical to the C version, except for the error checking variable discussed in Section Error Checking and a few routines listed in Section Routines with Different Fortran Interfaces . Note that use of the PETSc Fortran interface requires first compiling the interface library, which is discussed in Section Compiling and Linking Fortran Programs .
PETSc Fortran users have two choices for including the PETSc header files.
Recommended Approach:
In the first approach,
the Fortran include files for PETSc are located in the directory
${}PETSC_DIR/include/finclude and should be used via statements
such as the following:
#include "include/finclude/includefile.h"Since one must be very careful to include each file no more than once in a Fortran routine, application programmers must manually include each file needed for the various PETSc components within their program. This approach differs from the PETSc C/C++ interface, where the user need only include the highest level file, for example, snes.h, which then automatically includes all of the required lower level files. As shown in the examples of Section Sample Fortran77 Programs , in Fortran one must explicitly list each of the include files. If using this approach one must employ the Fortran file suffix .F rather than .f. This convention enables use of the CPP preprocessor, which allows the use of the #include statements that define PETSc objects and variables. (Familarity with the CPP preprocessor is not needed for writing PETSc Fortran code; one can simply begin by copying a PETSc Fortran example and its corresponding makefile.)
Alternative Approach:
If working with .f files is absolutely essential (perhaps as
part of a heritage code), the conventional Fortran style include
statement can be employed. The weakness of this approach is that either the
complete path of the include file must be hardwired with a statement such as
include '/home/username/petsc/include/finclude/includefile.h'or a link must be established in the directory containing the Fortran source file to the file
ln -s /home/username/petsc/include/finclude/includefile.h includefile.hSome Fortran compilers will accept a -I<directory> option, but depending on the Fortran compiler, they may use the -I list only for the #include style of include. In addition, the user must declare all PETSc objects as integer rather than by their name. For example, declarations within Fortran .F files have the form
SLES solver Mat A, B Vec x, y integer iwhile the analogous statements within .f files are
integer solver integer A, B integer x, y integer i
In the Fortran version, each PETSc routine has as its final argument
an integer error variable, in contrast to the C convention of
providing the error variable as the routine's return value. The error
code is set to be nonzero if an error has been detected; otherwise, it
is zero. For example, the Fortran and C variants of SLESSolve() are
given, respectively, below, where ierr denotes the error variable:
call SLESSolve(SLES sles,Vec b,Vec x,int its,int ierr) ierr = SLESSolve(SLES sles,Vec b,Vec x,int *its);Fortran programmers using the .F file suffix, as discussed in Section Include Files , can check these error codes with CHKERRA(ierr), which terminates all processes when an error is encountered. Likewise, one can set error codes within Fortran programs by using SETERRA(ierr,p,' '), which again terminates all processes upon detection of an error. Note that complete error tracebacks with CHKERRQ() and SETERRQ(), as described in Section Simple PETSc Examples for C routines, are not directly supported for Fortran routines; however, Fortran programmers can easily use the error codes in writing their own tracebacks. For example, one could use code such as the following:
call SLESSolve(sles,x,y,ierr) if ( ierr .ne. 0 ) then print*, 'Error in routine ...' return endifNote that users of the Fortran .f suffix cannot employ the macros SETERRA() and CHKERRA().
The most common reason for crashing PETSc Fortran code is forgetting the final ierr argument.
Since Fortran does not allow arrays to be returned in routine
arguments, all PETSc routines that return arrays, such as
VecGetArray(), MatGetArray(),
ISGetIndices(), and DAGetGlobalIndices()
are defined slightly differently in Fortran than in C.
Instead of returning the array itself, these routines
accept as input a user-specified array of dimension one and return an
integer index to the actual array used for data storage within PETSc.
The Fortran interface for several routines is as follows:
double precision xx_v(1), aa_v(1) integer ss_v(1), dd_v(1), dd_i, ss_i, xx_i, aa_i, ierr, nloc Vec x Mat A IS s DA dTo access array elements directly, both the user-specified array and the integer index must then be used together. For example, the following Fortran program fragment illustrates directly setting the values of a vector array instead of using VecSetValues(). Note the (optional) use of the preprocessor #define statement to enable array manipulations in the conventional Fortran manner.call VecGetArray(x,xx_v,xx_i,ierr) call MatGetArray(A,aa_v,aa_i,ierr) call ISGetIndices(s,ss_v,ss_i,ierr) call DAGetGlobalIndices(d,nloc,dd_v,dd_i,ierr)
#define xx_a(ib) xx_v(xx_i + (ib))Figure 17 contains an example of using VecGetArray() within a Fortran routine.double precision xx_v(1) integer xx_i, i, ierr, n Vec x call VecGetArray(x,xx_v,xx_i,ierr) call VecGetLocalSize(x,n,ierr) do 10, i=1,n xx_a(i) = 3*i + 1 10 continue call VecRestoreArray(x,xx_v,xx_i,ierr)
Since in this case the array is accessed directly from Fortran, indexing begins with 1, not 0 (unless the array is declared as xx_v(0:1)). This is different from the use of VecSetValues(), where indexing always starts with 0.
Note: If using VecGetArray(), MatGetArray(), ISGetIndices(), or DAGetGlobalIndices() from Fortran, the user must not compile the Fortran code with options to check for ``array entries out of bounds'' (e.g., the -C compiler option on the IBM RS/6000), since such checking conflicts with the out-of-declared-bounds indexing that these routines rely on.
Since the use of both Fortran and C routines is sometimes needed in application codes, we provide two PETSc commands to facilitate passing PETSc objects (such as Mat and SLES) between the two languages. These routines must be called within any C/C++ routines that pass/receive PETSc objects to/from Fortran routines to ensure that the objects are properly handled, since Fortran treats PETSc objects simply as integers.
Different machines have different methods of naming Fortran routines called from C (or C routines called from Fortran). Most Fortran compilers change all the capital letters in Fortran routine names to lowercase. On some machines, the Fortran compiler appends an underscore to the end of each Fortran routine name; for example, the Fortran routine Dabsc() would be called from C with dabsc_(). Other machines change all the letters in Fortran routine names to capitals.
PETSc provides two macros (defined in C/C++) to help write
portable code that mixes C/C++ and Fortran. They are
HAVE_FORTRAN_UNDERSCORE and HAVE_FORTRAN_CAPS
,
which are defined in the file ${}PETSC_DIR/bmake/${}PETSC_ARCH/base.site.
The macros are used, for example, as follows:
#if defined(HAVE_FORTRAN_CAPS) #define dabsc_ DABSC #elif !defined(HAVE_FORTRAN_UNDERSCORE) #define dabsc_ dabsc #endif ..... dabsc_(&n,x,y); /* call the Fortran function */Another useful routine for mixed language programming with PETSc is PetscInitializeFortran(), which should be used if one is using a C main program that calls Fortran routines that in turn call PETSc routines. In this case, PetscInitializeFortran() should be called from C after the call to PetscInitialize() to initialize some of the default viewers, communicators, etc. for use in the Fortran. PetscInitializeFortran() is not needed if a user's main program is written in Fortran; in this case, just calling PetscInitialize() in the main program is sufficient.
In several PETSc C functions, one has the option of passing a 0 (null)
argument (for example, the fifth argument of MatCreateSeqAIJ()).
From Fortran, users must pass PETSC_NULL_XXX to indicate a
null argument (where XXX is INTEGER, DOUBLE, CHARACTER,
or SCALAR depending on the type of argument required);
passing 0 from
Fortran will crash
the code. Note
that the C convention of passing PETSC_NULL (or 0) cannot
be used. For example, when no options prefix is desired in the
routine OptionsGetInt(), one must use the following command in
Fortran:
call OptionsGetInt(PETSC_NULL_CHARACTER,'-name',N,flg,ierr)This Fortran requirement is inconsistent with C, where the user can employ PETSC_NULL for all null arguments.
The Fortran interface to VecDuplicateVecs() differs slightly
from the C/C++ variant because Fortran does not allow arrays to be
returned in routine arguments. To create n vectors of the same
format as an existing vector, the user must declare a vector array,
v_new of size n. Then, after VecDuplicateVecs() has
been called, v_new will contain (pointers to) the new PETSc
vector objects. When finished with the vectors, the user should
destroy them by calling VecDestroyVecs().
For example, the following code fragment
duplicates v_old to form two new vectors, v_new(1) and v_new(2).
      Vec          v_old, v_new(2)
      integer      ierr
      Scalar       alpha

      call VecDuplicateVecs(v_old,2,v_new,ierr)
      alpha = 4.3
      call VecSet(alpha,v_new(1),ierr)
      alpha = 6.0
      call VecSet(alpha,v_new(2),ierr)
      call VecDestroyVecs(v_new,2,ierr)
All matrices and vectors in PETSc use zero-based indexing, regardless of whether C or Fortran is being used. The interface routines, such as MatSetValues() and VecSetValues(), always use zero indexing. See Section Basic Matrix Operations for further details.
When a routine is set from within a Fortran program by a routine such as KSPSetConvergenceTest(), that routine is assumed to be a Fortran routine. Likewise, when a routine is set from within a C program, that routine is assumed to be written in C.
Figure 22
shows a sample makefile that can be used for
PETSc programs. In this makefile, one can compile and run a debugging version
of the Fortran program ex3.F with the actions make BOPT=g ex3 and
make runex3, respectively. The compilation command is restated below:
ex3: ex3.o -${FLINKER} -o ex3 ex3.o ${PETSC_FORTRAN_LIB} ${PETSC_LIB} ${RM} ex3.oNote that the PETSc Fortran interface library, given by ${}PETSC_FORTRAN_LIB, must precede the base PETSc libraries, given by ${}PETSC_LIB, on the link line.
The following Fortran routines differ slightly from their C counterparts; see the manual pages and previous discussion in this chapter for details:
PETSc includes limited support for direct use of Fortran90 pointers. Current routines include:
#include "include/finclude/vec.h90"Analogous include files for other components are da.h90, mat.h90, and is.h90; the conventional Fortran style include files (as discussed in Section Include Files ) are supported as well.
Unfortunately, these routines currently work only on certain machines with certain compilers. They currently work with the SGI, Cray T3E, IBM, and NAG Fortran 90 compilers.
Sample programs that illustrate the PETSc interface for Fortran are given in Figures 16 - 19 , corresponding to ${}PETSC_DIR/src/vec/examples/tests/ex19.F, ${}PETSC_DIR/src/vec/examples/tutorials/ex4f.F, ${}PETSC_DIR/src/draw/examples/tests/ex5.F, and ${}PETSC_DIR/src/snes/examples/ex1f.F, respectively. We also refer Fortran programmers to the C examples listed throughout the manual, since PETSc usage within the two languages differs only slightly.
! ! "$Id: ex19.F,v 1.31 1998/04/15 18:00:32 balay Exp $"; ! #include "include/finclude/petsc.h" #include "include/finclude/vec.h" ! ! This example demonstrates basic use of the PETSc Fortran interface ! to vectors. ! integer n, ierr,flg Scalar one, two, three, dot double precision norm,rdot Vec x,y,w n = 20 one = 1.0 two = 2.0 three = 3.0 call PetscInitialize(PETSC_NULL_CHARACTER,ierr) call OptionsGetInt(PETSC_NULL_CHARACTER,'-n',n,flg,ierr) ! Create a vector, then duplicate it call VecCreate(PETSC_COMM_WORLD,PETSC_DECIDE,n,x,ierr) call VecDuplicate(x,y,ierr) call VecDuplicate(x,w,ierr) call VecSet(one,x,ierr) call VecSet(two,y,ierr) call VecDot(x,y,dot,ierr) rdot = PetscReal(dot) write(6,100) rdot 100 format('Result of inner product ',f10.4) call VecScale(two,x,ierr) call VecNorm(x,NORM_2,norm,ierr) write(6,110) norm 110 format('Result of scaling ',f10.4) call VecCopy(x,w,ierr) call VecNorm(w,NORM_2,norm,ierr) write(6,120) norm 120 format('Result of copy ',f10.4) call VecAXPY(three,x,y,ierr) call VecNorm(y,NORM_2,norm,ierr) write(6,130) norm 130 format('Result of axpy ',f10.4) call VecDestroy(x,ierr) call VecDestroy(y,ierr) call VecDestroy(w,ierr) call PetscFinalize(ierr) end
! ! "$Id: ex4f.F,v 1.21 1998/04/15 18:00:13 balay Exp $"; ! ! Description: Illustrates the use of VecSetValues() to set ! multiple values at once; demonstrates VecGetArray(). ! !/*T ! Concepts: Vectors^Assembling vectors; Using vector arrays; ! Routines: VecCreateSeq(); VecDuplicate(); VecSetValues(); VecView(); ! Routines: VecCopy(); VecView(); VecGetArray(); VecRestoreArray(); ! Routines: VecAssemblyBegin(); VecAssemblyEnd(); VecDestroy(); ! Processors: 1 !T*/ ! ----------------------------------------------------------------------- program ex4f implicit none ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Include files ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! ! The following include statements are required for Fortran programs ! that use PETSc vectors: ! petsc.h - base PETSc routines ! vec.h - vectors #include "include/finclude/petsc.h" #include "include/finclude/vec.h" ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Macro definitions ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! ! Macros to make clearer the process of setting values in vectors and ! getting values from vectors. ! ! - The element xx_a(ib) is element ib+1 in the vector x ! - Here we add 1 to the base array index to facilitate the use of ! conventional Fortran 1-based array indexing. ! #define xx_a(ib) xx_v(xx_i + (ib)) #define yy_a(ib) yy_v(yy_i + (ib)) ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Beginning of program ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Scalar xwork(6) Scalar xx_v(1), yy_v(1) integer i, n, ierr, loc(6) PetscOffset xx_i, yy_i Vec x, y call PetscInitialize(PETSC_NULL_CHARACTER,ierr) n = 6 ! Create initial vector and duplicate it call VecCreateSeq(PETSC_COMM_SELF,n,x,ierr) call VecDuplicate(x,y,ierr) ! Fill work arrays with vector entries and locations. Note that ! the vector indices are 0-based in PETSc (for both Fortran and ! C vectors) do 10 i=1,n loc(i) = i-1 xwork(i) = 10.0*i 10 continue ! Set vector values. Note that we set multiple entries at once. ! Of course, usually one would create a work array that is the ! natural size for a particular problem (not one that is as long ! as the full vector). call VecSetValues(x,6,loc,xwork,INSERT_VALUES,ierr) ! Assemble vector call VecAssemblyBegin(x,ierr) call VecAssemblyEnd(x,ierr) ! View vector write(6,20) 20 format('initial vector:') call VecView(x,VIEWER_STDOUT_SELF,ierr) call VecCopy(x,y,ierr) ! Get a pointer to vector data. ! - For default PETSc vectors, VecGetArray() returns a pointer to ! the data array. Otherwise, the routine is implementation dependent. ! - You MUST call VecRestoreArray() when you no longer need access to ! the array. ! - Note that the Fortran interface to VecGetArray() differs from the ! C version. See the users manual for details. call VecGetArray(x,xx_v,xx_i,ierr) call VecGetArray(y,yy_v,yy_i,ierr) ! Modify vector data do 30 i=1,n xx_a(i) = 100.0*i yy_a(i) = 1000.0*i 30 continue ! Restore vectors call VecRestoreArray(x,xx_v,xx_i,ierr) call VecRestoreArray(y,yy_v,yy_i,ierr) ! View vectors write(6,40) 40 format('new vector 1:') call VecView(x,VIEWER_STDOUT_SELF,ierr) write(6,50) 50 format('new vector 2:') call VecView(y,VIEWER_STDOUT_SELF,ierr) ! Free work space. All PETSc objects should be destroyed when they ! are no longer needed. call VecDestroy(x,ierr) call VecDestroy(y,ierr) call PetscFinalize(ierr) end
! ! "$Id: ex5.F,v 1.21 1998/04/15 18:03:01 balay Exp $"; ! #include "include/finclude/petsc.h" #include "include/finclude/draw.h" ! ! This example demonstrates basic use of the Fortran interface for ! Draw routines. ! Draw draw DrawLG lg DrawAxis axis integer n,i, ierr, x, y, width, height,flg Scalar xd,yd n = 20 x = 0 y = 0 width = 300 height = 300 call PetscInitialize(PETSC_NULL_CHARACTER,ierr) call OptionsGetInt(PETSC_NULL_CHARACTER,'-width',width,flg,ierr) call OptionsGetInt(PETSC_NULL_CHARACTER,'-height',height,flg,ierr) call OptionsGetInt(PETSC_NULL_CHARACTER,'-n',n,flg,ierr) call DrawOpenX(PETSC_COMM_SELF,PETSC_NULL_CHARACTER, & & PETSC_NULL_CHARACTER,x,y,width,height,draw,ierr) call DrawLGCreate(draw,1,lg,ierr) call DrawLGGetAxis(lg,axis,ierr) call DrawAxisSetColors(axis,DRAW_BLACK,DRAW_RED,DRAW_BLUE,ierr) call DrawAxisSetLabels(axis,'toplabel','xlabel','ylabel',ierr) do 10, i=0,n-1 xd = i - 5.0 yd = xd*xd call DrawLGAddPoint(lg,xd,yd,ierr) 10 continue call DrawLGIndicateDataPoints(lg,ierr) call DrawLGDraw(lg,ierr) call DrawFlush(draw,ierr) call PetscSleep(10,ierr) call DrawLGDestroy(lg,ierr) call DrawDestroy(draw,ierr) call PetscFinalize(ierr) end
! ! "$Id: ex1f.F,v 1.23 1998/04/23 02:11:26 balay Exp $"; ! !/*T ! Concepts: SNES^Solving a system of nonlinear equations (basic uniprocessor example) ! Routines: SNESCreate(); SNESSetFunction(); SNESSetJacobian(); ! Routines: SNESSolve(); SNESSetFromOptions(); SNESGetSLES(); ! Routines: SLESGetPC(); SLESGetKSP(); KSPSetTolerances(); PCSetType(); ! Processors: 1 !T*/ ! ! Description: Uses the Newton method to solve a two-variable system. ! ! ----------------------------------------------------------------------- program main implicit none ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Include files ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! ! The following include statements are generally used in SNES Fortran ! programs: ! petsc.h - base PETSc routines ! vec.h - vectors ! mat.h - matrices ! ksp.h - Krylov subspace methods ! pc.h - preconditioners ! sles.h - SLES interface ! snes.h - SNES interface ! Other include statements may be needed if using additional PETSc ! routines in a Fortran program, e.g., ! viewer.h - viewers ! is.h - index sets ! #include "include/finclude/petsc.h" #include "include/finclude/vec.h" #include "include/finclude/mat.h" #include "include/finclude/ksp.h" #include "include/finclude/pc.h" #include "include/finclude/sles.h" #include "include/finclude/snes.h" ! ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Variable declarations ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! ! Variables: ! snes - nonlinear solver ! sles - linear solver ! pc - preconditioner context ! ksp - Krylov subspace method context ! x, r - solution, residual vectors ! J - Jacobian matrix ! its - iterations for convergence ! SNES snes SLES sles PC pc KSP ksp Vec x, r Mat J integer ierr, its, size, rank Scalar pfive double precision tol ! Note: Any user-defined Fortran routines (such as FormJacobian) ! MUST be declared as external. external FormFunction, FormJacobian ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Macro definitions ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! ! Macros to make clearer the process of setting values in vectors and ! getting values from vectors. These vectors are used in the routines ! FormFunction() and FormJacobian(). ! - The element lx_a(ib) is element ib in the vector x ! #define lx_a(ib) lx_v(lx_i + (ib)) #define lf_a(ib) lf_v(lf_i + (ib)) ! ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Beginning of program ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - call PetscInitialize(PETSC_NULL_CHARACTER,ierr) call MPI_Comm_size(PETSC_COMM_WORLD,size,ierr) call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr) if (size .ne. 1) then if (rank .eq. 0) then write(6,*) 'This is a uniprocessor example only!' endif SETERRA(1,0,' ') endif ! - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - - - - - - - ! Create nonlinear solver context ! - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - - - - - - - call SNESCreate(PETSC_COMM_WORLD,SNES_NONLINEAR_EQUATIONS, & & snes,ierr) ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Create matrix and vector data structures; set corresponding routines ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Create vectors for solution and nonlinear function call VecCreateSeq(PETSC_COMM_SELF,2,x,ierr) call VecDuplicate(x,r,ierr) ! 
Create Jacobian matrix data structure call MatCreate(PETSC_COMM_SELF,2,2,J,ierr) ! Set function evaluation routine and vector call SNESSetFunction(snes,r,FormFunction,PETSC_NULL_OBJECT,ierr) ! Set Jacobian matrix data structure and Jacobian evaluation routine call SNESSetJacobian(snes,J,J,FormJacobian,PETSC_NULL_OBJECT, & & ierr) ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Customize nonlinear solver; set runtime options ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Set linear solver defaults for this problem. By extracting the ! SLES, KSP, and PC contexts from the SNES context, we can then ! directly call any SLES, KSP, and PC routines to set various options. call SNESGetSLES(snes,sles,ierr) call SLESGetKSP(sles,ksp,ierr) call SLESGetPC(sles,pc,ierr) call PCSetType(pc,PCNONE,ierr) tol = 1.e-4 call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_DOUBLE_PRECISION, & & PETSC_DEFAULT_DOUBLE_PRECISION,20,ierr) ! Set SNES/SLES/KSP/PC runtime options, e.g., ! -snes_view -snes_monitor -ksp_type <ksp> -pc_type <pc> ! These options will override those specified above as long as ! SNESSetFromOptions() is called _after_ any other customization ! routines. call SNESSetFromOptions(snes,ierr) ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Evaluate initial guess; then solve nonlinear system ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Note: The user should initialize the vector, x, with the initial guess ! for the nonlinear solver prior to calling SNESSolve(). In particular, ! to employ an initial guess of zero, the user should explicitly set ! this vector to zero by calling VecSet(). pfive = 0.5 call VecSet(pfive,x,ierr) call SNESSolve(snes,x,its,ierr) if (rank .eq. 0) then write(6,100) its endif 100 format('Number of Newton iterations = ',i5) ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ! Free work space. All PETSc objects should be destroyed when they ! are no longer needed. ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - call VecDestroy(x,ierr) call VecDestroy(r,ierr) call MatDestroy(J,ierr) call SNESDestroy(snes,ierr) call PetscFinalize(ierr) end ! --------------------------------------------------------------------- ! ! FormFunction - Evaluates nonlinear function, F(x). ! ! Input Parameters: ! snes - the SNES context ! x - input vector ! dummy - optional user-defined context (not used here) ! ! Output Parameter: ! f - function vector ! subroutine FormFunction(snes,x,f,dummy) implicit none #include "include/finclude/petsc.h" #include "include/finclude/vec.h" #include "include/finclude/snes.h" SNES snes Vec x, f integer ierr, dummy(*) ! Declarations for use with local arrays Scalar lx_v(1), lf_v(1) PetscOffset lx_i, lf_i ! Get pointers to vector data. ! - For default PETSc vectors, VecGetArray() returns a pointer to ! the data array. Otherwise, the routine is implementation dependent. ! - You MUST call VecRestoreArray() when you no longer need access to ! the array. ! - Note that the Fortran interface to VecGetArray() differs from the ! C version. See the Fortran chapter of the users manual for details. call VecGetArray(x,lx_v,lx_i,ierr) call VecGetArray(f,lf_v,lf_i,ierr) ! Compute function lf_a(1) = lx_a(1)*lx_a(1) & & + lx_a(1)*lx_a(2) - 3.0 lf_a(2) = lx_a(1)*lx_a(2) & & + lx_a(2)*lx_a(2) - 6.0 ! Restore vectors call VecRestoreArray(x,lx_v,lx_i,ierr) call VecRestoreArray(f,lf_v,lf_i,ierr) return end ! 
--------------------------------------------------------------------- ! ! FormJacobian - Evaluates Jacobian matrix. ! ! Input Parameters: ! snes - the SNES context ! x - input vector ! dummy - optional user-defined context (not used here) ! ! Output Parameters: ! A - Jacobian matrix ! B - optionally different preconditioning matrix ! flag - flag indicating matrix structure ! subroutine FormJacobian(snes,X,jac,B,flag,dummy) implicit none #include "include/finclude/petsc.h" #include "include/finclude/vec.h" #include "include/finclude/mat.h" #include "include/finclude/pc.h" #include "include/finclude/snes.h" SNES snes Vec X Mat jac, B MatStructure flag Scalar A(4) integer ierr, idx(2), dummy(*) ! Declarations for use with local arrays Scalar lx_v(1) PetscOffset lx_i ! Get pointer to vector data call VecGetArray(x,lx_v,lx_i,ierr) ! Compute Jacobian entries and insert into matrix. ! - Since this is such a small problem, we set all entries for ! the matrix at once. ! - Note that MatSetValues() uses 0-based row and column numbers ! in Fortran as well as in C (as set here in the array idx). idx(1) = 0 idx(2) = 1 A(1) = 2.0*lx_a(1) + lx_a(2) A(2) = lx_a(1) A(3) = lx_a(2) A(4) = lx_a(1) + 2.0*lx_a(2) call MatSetValues(jac,2,idx,2,idx,A,INSERT_VALUES,ierr) flag = SAME_NONZERO_PATTERN ! Restore vector call VecRestoreArray(x,lx_v,lx_i,ierr) ! Assemble matrix call MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY,ierr) call MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY,ierr) return end
PETSc includes a consistent, lightweight scheme to allow the profiling of application programs. The PETSc routines automatically log performance data if certain options are specified at runtime. The user can also log information about application codes for a complete picture of performance. In addition, as described in Section Interpreting -log_info Output: Informative Messages , PETSc provides a mechanism for printing informative messages about computations. Section Basic Profiling Information introduces the various profiling options in PETSc, while the remainder of the chapter focuses on details such as monitoring application codes and tips for accurate profiling. See Section for implementation details.
If an application code and the PETSc libraries have been compiled with the -DPETSC_LOG flag (which is the default for all versions), then various kinds of profiling of code between calls to PetscInitialize() and PetscFinalize() can be activated at runtime. Note that the flag -DPETSC_LOG can be specified for an installation of PETSc in the file ${}PETSC_DIR/bmake/${}PETSC_ARCH/base.${}BOPT, as discussed in Section PETSc Flags . The profiling options include the following:
As shown in Figure 7 (in Part I), the option -log_summary activates printing of profile data to standard output at the conclusion of a program. Profiling data can also be printed at any time within a program by calling PLogPrintSummary().
We print performance data for each routine, organized by PETSc components, followed by any user-defined events (discussed in Section Profiling Application Codes ). For each routine, the output data include the maximum time and floating point operation (flop) rate over all processors. Information about parallel performance is also included, as discussed in the following section.
For the purpose of PETSc floating point operation counting, we define one flop as one operation of any of the following types: multiplication, division, addition, or subtraction. For example, one VecAXPY() operation, which computes y = alpha x + y for vectors of length N, requires 2N flops (consisting of N additions and N multiplications). Bear in mind that flop rates present only a limited view of performance, since memory loads and stores are the real performance barrier.
For simplicity, the remainder of this discussion focuses on interpreting profile data for the SLES component, which provides the linear solvers at the heart of the PETSc package. Recall the hierarchical organization of the PETSc library, as shown in Figure 1 . Each SLES solver is composed of a PC (preconditioner) and KSP (Krylov subspace) component, which are in turn built on top of the Mat (matrix) and Vec (vector) modules. Thus, operations in the SLES module are composed of lower-level operations in these components. Note also that the nonlinear solvers component, SNES, is built on top of the SLES module, and the timestepping component, TS, is in turn built on top of SNES.
We briefly discuss interpretation of the sample output in Figure 7 , which was generated by solving a linear system on one processor using restarted GMRES and ILU preconditioning. The linear solvers in SLES consist of two basic phases, SLESSetUp() and SLESSolve(), each of which consists of a variety of actions, depending on the particular solution technique. For the case of using the PCILU preconditioner and KSPGMRES Krylov subspace method, the breakdown of PETSc routines is listed below. As indicated by the levels of indentation, the operations in SLESSetUp() include all of the operations within PCSetUp(), which in turn include MatILUFactor(), and so on.
- SLESSetUp - Set up linear solver
  - PCSetUp - Set up preconditioner
    - MatILUFactor - Factor preconditioning matrix
      - MatILUFactorSymbolic - Symbolic factorization phase
      - MatLUFactorNumeric - Numeric factorization phase
- SLESSolve - Solve linear system
  - PCApply - Apply preconditioner
    - MatSolve - Forward/backward triangular solves
  - KSPGMRESOrthog - Orthogonalization in GMRES
    - VecDot or VecMDot - Inner products
  - MatMult - Matrix-vector product
    - MatMultAdd - Matrix-vector product + vector addition
  - VecScale, VecNorm, VecAXPY, VecCopy, ...
The summaries printed via -log_summary reflect this routine hierarchy. For example, the performance summaries for a particular high-level routine such as SLESSolve include all of the operations accumulated in the lower-level components that make up the routine. Using the GUI utility PETScView (described in Chapter PETSc GUI Utilities ) for a small example problem can help to provide the user with an understanding of the operations within an application code, thus making this hierarchy more apparent for a particular application.
Admittedly, we do not currently present the output with -log_summary so that the hierarchy of PETSc operations is completely clear, primarily because we have not determined a clean and uniform way to do so throughout the library. Improvements may follow. However, for a particular problem, the user should generally have an idea of the basic operations that are required for its implementation (e.g., which operations are performed when using GMRES and ILU, as described above), so that interpreting the -log_summary data should be relatively straightforward.
We next discuss performance summaries for parallel programs, as shown within Figures 20 and 21 , which present the combined output generated by the -log_summary option. The program that generated this data is ${}PETSC_DIR/src/sles/examples/ex21.c. The code loads a matrix and right-hand-side vector from a binary file and then solves the resulting linear system; the program then repeats this process for a second linear system. This particular case was run on four processors of an IBM SP, using restarted GMRES and the block Jacobi preconditioner, where each block was solved with ILU.
Figure 20 presents an overall performance summary, including times, floating-point operations, computational rates, and message-passing activity (such as the number and size of messages sent and collective operations). Summaries for various user-defined stages of monitoring (as discussed in Section Profiling Multiple Sections of Code ) are also given. Information about the various phases of computation then follows (as shown separately here in Figure 21 ). Finally, a summary of memory usage and object creation and destruction is presented.
Total Mflop/sec = 10^-6 * (sum of flops over all processors) / (max time over all processors)
Note: Total computational rates < 1 Mflop/sec are listed as 0 in this column of the phase summary table. Additional statistics for each phase include the total number of messages sent, the average message length, and the number of global reductions.
The final data presented are the percentages of the various statistics (time ( %T), flops/sec ( %F), messages( %M), average message length ( %L), and reductions ( %R)) for each event relative to the total computation and to any user-defined stages (discussed in Section Profiling Multiple Sections of Code ). These statistics can aid in optimizing performance, since they indicate the sections of code that could benefit from various kinds of tuning. Chapter Hints for Performance Tuning gives suggestions about achieving good performance with PETSc codes.
The PETSc utility ${}PETSC_DIR/bin/petscview [logfile] can be used to examine the profile data generated by -log and -log_all. Chapter PETSc GUI Utilities provides details regarding this Tk/Tcl tool, which provides a high-level view of the interrelationships among various code modules. Also, Section Restricting Event Logging gives information on restricting event logging.
It is also possible to use the Upshot (or Nupshot) package
[(ref upshot)] to visualize PETSc events.
This package comes with the MPE software, which is part of the MPICH
[(ref mpich-web-page)] implementation of MPI.
The option
   -log_mpe [logfile]
creates a logfile of events appropriate for viewing with Upshot. The user can either use the default logging file, mpe.log, or specify an optional name via logfile.
To use this logging option, the user may employ any implementation of MPI (not necessarily MPICH), but must build and link the MPE component of MPICH. MPE logging also requires compiling the PETSc library with the -DHAVE_MPE flag, which is not activated by default; the user turns it on by specifying -DHAVE_MPE in the PCONF variable within ${}PETSC_DIR/bmake/${}PETSC_ARCH/base.site and (re)compiling all of PETSc.
By default, not all PETSc events are logged with MPE. For example,
since MatSetValues() may be called thousands of times in a program,
by default its calls are not logged with MPE. To activate MPE logging of
a particular event, one should use the command
   PLogEventMPEActivate(int event);
To deactivate logging of an event for MPE, one should use
   PLogEventMPEDeactivate(int event);
The event may be either a predefined PETSc event (as listed in the file ${}PETSC_DIR/include/petsclog.h) or one obtained with PLogEventRegister() (as described in Section Profiling Application Codes ). These routines may be called as many times as desired in an application program, so that one could restrict MPE event logging only to certain code segments.
To see what events are logged by default, the user can view the source code; see the files src/plot/src/plogmpe.c and include/petsclog.h. A simple program and GUI interface to see the events that are predefined and their definition is being developed.
The user can also log MPI events. To do this, simply consider the PETSc application as any MPI application, and follow the MPI implementation's instructions for logging MPI calls. For example, when using MPICH, this merely requires adding -llmpi to the library list before -lmpi.
PETSc automatically logs object creation, times, and floating-point
counts for the library routines. Users can easily supplement
this information by monitoring their application codes as well.
The basic steps involved in logging a
user-defined portion of code, called an event, are shown in the
code fragment below:
#include "petsclog.h" int USER_EVENT; PLogEventRegister(&USER_EVENT,"User event name","Color:"); PLogEventBegin(USER_EVENT,0,0,0,0); /* application code segment to monitor */ PLogFlops(number of flops for this code segment); PLogEventEnd(USER_EVENT,0,0,0,0);One must register the event by calling PLogEventRegister(), which assigns a unique integer to identify the event for profiling purposes:
ierr = PLogEventRegister(int *e,char *string,char *color);Here string is a user-defined event name, and color is an optional user-defined event color (for use with Upshot/Nupshot logging); one should see the manual page for details. The argument returned in e should then be passed to the PLogEventBegin() and PLogEventEnd() routines.
Events are logged by using the pair
   PLogEventBegin(int event,PetscObject o1,PetscObject o2,PetscObject o3,PetscObject o4);
   PLogEventEnd(int event,PetscObject o1,PetscObject o2,PetscObject o3,PetscObject o4);
The four objects are the PETSc objects that are most closely associated with the event. For instance, in a matrix-vector product they would be the matrix and the two vectors. These objects can be omitted by specifying 0 for o1 - o4. The code between these two routine calls will be automatically timed and logged as part of the specified event.
The user can log the number of floating-point operations
for this segment of code by calling
   PLogFlops(number of flops for this code segment);
between the calls to PLogEventBegin() and PLogEventEnd(). This value will automatically be added to the global flop counter for the entire program.
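Putting these pieces together, a small sketch in C of a user-monitored kernel might look like the following. The routine names RegisterUserEvent() and UserKernel(), the event name, and the color are purely illustrative; the flop count simply reflects the n multiplications and n additions performed by the loop.
   #include "petsc.h"
   #include "petsclog.h"

   static int USER_EVENT;                        /* hypothetical user-defined event */

   /* Call once, after PetscInitialize(), to register the event */
   int RegisterUserEvent(void)
   {
     int ierr;
     ierr = PLogEventRegister(&USER_EVENT,"UserKernel","Red:"); CHKERRQ(ierr);
     return 0;
   }

   /* y = a*x + y, timed and logged as USER_EVENT */
   int UserKernel(int n,double a,double *x,double *y)
   {
     int i;
     PLogEventBegin(USER_EVENT,0,0,0,0);         /* no associated PETSc objects */
     for (i=0; i<n; i++) y[i] = a*x[i] + y[i];   /* n multiplications + n additions */
     PLogFlops(2*n);                             /* credit 2n flops to this event */
     PLogEventEnd(USER_EVENT,0,0,0,0);
     return 0;
   }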
By default, the profiling produces a single set of statistics for all
code between the PetscInitialize() and PetscFinalize()
calls within a program. One can independently monitor up to ten
stages of code by switching among the various stages with the commands
   PLogStagePush(int stage);
   PLogStagePop();
where stage is an integer (0-9); see the manual pages for details. The command
   PLogStageRegister(int stage,char *name)
allows one to associate a name with a stage; these names are printed whenever summaries are generated with -log_summary or PLogPrintSummary(). The following code fragment uses three profiling stages within a program.
   PetscInitialize(int *argc,char ***args,0,0);
   /* [stage 0 of code here] */
   PLogStageRegister(0,"Stage 0 of Code");
   for (i=0; i<ntimes; i++) {
      PLogStagePush(1);
      PLogStageRegister(1,"Stage 1 of Code");
      /* [stage 1 of code here] */
      PLogStagePop();
      PLogStagePush(2);
      PLogStageRegister(2,"Stage 2 of Code");
      /* [stage 2 of code here] */
      PLogStagePop();
   }
   PetscFinalize();
Figures 20 and 21 show output generated by -log_summary for a program that employs several profiling stages. In particular, this program is subdivided into six stages: loading a matrix and right-hand-side vector from a binary file, setting up the preconditioner, and solving the linear system; this sequence is then repeated for a second linear system. For simplicity, Figure 21 contains output only for stages 4 and 5 (linear solve of the second system), which comprise the part of this computation of most interest to us in terms of performance monitoring. This code organization (solving a small linear system followed by a larger system) enables generation of more accurate profiling statistics for the second system by overcoming the often considerable overhead of paging, as discussed in Section Accurate Profiling: Overcoming the Overhead of Paging .
By default, all PETSc operations are logged.
To enable or disable the PETSc logging of individual events, one uses the commands
   PLogEventActivate(int event);
   PLogEventDeactivate(int event);
The event may be either a predefined PETSc event (as listed in the file ${}PETSC_DIR/include/petsclog.h) or one obtained with PLogEventRegister() (as described in Section Profiling Application Codes ).
PETSc also provides routines that deactivate (or activate)
logging for entire components of the library. Currently, the
components that support such logging (de)activation are Mat (matrices),
Vec (vectors), SLES (linear solvers, including KSP
and PC components), and SNES (nonlinear solvers):
   PLogEventDeactivateClass(MAT_COOKIE);
   PLogEventDeactivateClass(SLES_COOKIE);  /* includes PC and KSP */
   PLogEventDeactivateClass(VEC_COOKIE);
   PLogEventDeactivateClass(SNES_COOKIE);
and
   PLogEventActivateClass(MAT_COOKIE);
   PLogEventActivateClass(SLES_COOKIE);  /* includes PC and KSP */
   PLogEventActivateClass(VEC_COOKIE);
   PLogEventActivateClass(SNES_COOKIE);
Recall that the option -log_all produces extensive profile data, which can be a challenge for PETScView to handle due to the memory limitations of Tcl/Tk. Thus, one should generally use -log_all when running programs with a relatively small number of events or when disabling some of the events that occur many times in a code (e.g., VecSetValues(), MatSetValues()).
Section Using -log_mpe with Upshot/Nupshot gives information on the restriction of events in MPE logging.
Users can activate the printing of verbose information about algorithms, data structures, etc. to the screen by using the option -log_info or by calling PLogInfoAllow(PETSC_TRUE). Such logging, which is used throughout the PETSc libraries, can aid the user in understanding algorithms and tuning program performance. For example, as discussed in Section Sparse Matrices , -log_info activates the printing of information about memory allocation during matrix assembly.
Application programmers can employ this logging as well, by
using the routine
   PLogInfo(void* obj,char *message,...)
where obj is the PETSc object associated most closely with the logging statement, message. For example, in the line search Newton methods, we use a statement such as
   PLogInfo(snes,"Cubically determined step, lambda %g\n",lambda);
One can selectively turn off informative messages about any of the basic PETSc objects (e.g., Mat, SNES) with the command
   PLogInfoDeactivateClass(int object_cookie)
where object_cookie is one of MAT_COOKIE, SNES_COOKIE, etc. Messages can be reactivated with the command
   PLogInfoActivateClass(int object_cookie)
Such deactivation can be useful when one wishes to view information about higher level PETSc components (e.g., TS and SNES) without seeing all lower level data as well (e.g., Mat). One can deactivate events at runtime for matrix and linear solver components via -log_info [no_mat, no_sles].
PETSc application programmers can access the wall clock time directly
with the command
   PLogDouble time;
   ierr = PetscGetTime(&time); CHKERRA(ierr);
In addition, as discussed in Section Profiling Application Codes , PETSc can automatically profile user-defined segments of code.
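For example, one might bracket a section of code with two calls to PetscGetTime(); the following is a small sketch in which ComputePhase() is a hypothetical user routine to be timed.
   PLogDouble t1,t2;
   int        ierr;

   ierr = PetscGetTime(&t1); CHKERRQ(ierr);
   ierr = ComputePhase();    CHKERRQ(ierr);   /* hypothetical user routine to be timed */
   ierr = PetscGetTime(&t2); CHKERRQ(ierr);
   PetscPrintf(PETSC_COMM_WORLD,"Time for compute phase: %g seconds\n",t2-t1);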
All output from PETSc programs (including informative messages, profiling information, and convergence data) can be saved to a file by using the command line option -log_history [filename]. If no file name is specified, the output is stored in the file $HOME/.petschistory. Note that this option only saves output printed with the PetscPrintf() and PetscFPrintf() commands, not the standard printf() and fprintf() statements.
One factor that often plays a significant role in profiling a code is paging by the operating system. Generally, when running a program, only a few pages required to start it are loaded into memory rather than the entire executable. When the execution proceeds to code segments that are not in memory, a page fault occurs, prompting the required pages to be loaded from the disk (a very slow process). This activity distorts the results significantly. (The paging effects are noticeable in the log files generated by -log_mpe, which is described in Section Using -log_mpe with Upshot/Nupshot .)
To eliminate the effects of paging when profiling the performance of a program, we have found that an effective procedure is to run the exact same code on a small dummy problem before running it on the actual problem of interest. We thus ensure that all code required by a solver is loaded into memory during solution of the small problem. When the code proceeds to the actual (larger) problem of interest, all required pages have already been loaded into main memory, so that the performance numbers are not distorted.
When this procedure is used in conjunction with the user-defined stages of profiling described in Section Profiling Multiple Sections of Code , we can focus easily on the problem of interest. For example, we used this technique in the program ${}PETSC_DIR/src/sles/examples/tutorials/ex10.c to generate the timings within Figures 20 and 21 . In this case, the profiled code of interest (solving the linear system for the larger problem) occurs within event stages 4 and 5. Section Interpreting -log_summary Output: Parallel Performance provides details about interpreting such profiling data.
This chapter presents some tips on achieving good performance within PETSc 2.0 codes. We urge users to read these hints before evaluating the performance of PETSc application codes.
Code compiled with the BOPT=O option generally runs two to three times faster than that compiled with BOPT=g, so we recommend using one of the optimized versions of code ( BOPT=O, BOPT=O_c++, or BOPT=O_complex) when evaluating performance.
The user can specify alternative compiler options instead of the defaults set in the PETSc distribution. One can set the compiler options for a particular architecture ( PETSC_ARCH) and BOPT by editing the file ${}PETSC_DIR/bmake/${}PETSC_ARCH/base.${}BOPT. Section Customized Makefiles gives details.
Users should not spend time optimizing a code until after having determined where it spends the bulk of its time on realistically sized problems. As discussed in detail in Chapter Profiling , the PETSc routines automatically log performance data if certain runtime options are specified. We briefly highlight usage of these features below.
Performing operations on chunks of data rather than a single element at a time can significantly enhance performance.
Since the process of dynamic memory allocation for sparse matrices is inherently very expensive, accurate preallocation of memory is crucial for efficient sparse matrix assembly. One should use the matrix creation routines for particular data structures, such as MatCreateSeqAIJ() and MatCreateMPIAIJ() for compressed, sparse row formats, instead of the generic MatCreate() routine. For problems with multiple degrees of freedom per node, the block, compressed, sparse row formats, created by MatCreateSeqBAIJ() and MatCreateMPIBAIJ(), can significantly enhance performance. Section Sparse Matrices includes extensive details and examples regarding preallocation.
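As an illustration, a sequential tridiagonal matrix could be created with its storage preallocated in one of the two following ways. This is only a sketch; the exact calling sequence of MatCreateSeqAIJ() should be checked against its manual page, and in practice one would use only one of the two forms.
   Mat A;
   int ierr,i,n = 100,*nnz;

   /* Constant preallocation: at most 3 nonzeros in every row */
   ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,n,n,3,PETSC_NULL,&A); CHKERRQ(ierr);

   /* Or, exact preallocation: specify the number of nonzeros row by row */
   nnz = (int*) PetscMalloc(n*sizeof(int)); CHKPTRQ(nnz);
   for (i=0; i<n; i++) nnz[i] = (i == 0 || i == n-1) ? 2 : 3;
   ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,n,n,0,nnz,&A); CHKERRQ(ierr);
   PetscFree(nnz);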
When symbolically factoring an AIJ matrix, PETSc has to guess
how much fill there will be. Careful use of the parameter ' f' (fill
estimate) when calling MatLUFactorSymbolic() or MatILUFactorSymbolic()
can reduce greatly the number of mallocs and copies required, and thus
greatly improve the performance of the factorization. One way to
determine a good value for f is to run a program with the option -log_info.
The symbolic factorization phase will then print information such as
   Info:MatILUFactorSymbolic_AIJ:Realloc 12 Fill ratio:given 1 needed 2.16423
This indicates that the user should have used a fill estimate factor of about 2.17 (instead of 1) to prevent the 12 required mallocs and copies. The command line option
   -pc_ilu_fill 2.17
will cause PETSc to preallocate the correct amount of space for incomplete (ILU) factorization. The corresponding option for direct (LU) factorization is -pc_lu_fill <fill_amount>.
Users should employ a reasonable number of PetscMalloc() calls in their codes. Hundreds or thousands of memory allocations may be appropriate; however, if tens of thousands are being used, then reducing the number of PetscMalloc() calls may be warranted. For example, reusing space or allocating large chunks and dividing it into pieces can produce a significant savings in allocation overhead. Section Data Structure Reuse gives details.
Data structures should be reused whenever possible. For example, if a code often creates new matrices or vectors, there often may be a way to reuse some of them. Very significant performance improvements can be achieved by reusing matrix data structures with the same nonzero pattern. If a code creates thousands of matrix or vector objects, performance will be degraded. For example, when solving a nonlinear problem or timestepping, reusing the matrices and their nonzero structure for many steps when appropriate can make the code run significantly faster.
A simple technique for saving work vectors, matrices, etc. is employing a user-defined context. In C and C++ such a context is merely a structure in which various objects can be stashed; in Fortran a user context can be an integer array that contains both parameters and pointers to PETSc objects. See ${}PETSC_DIR/snes/examples/tutorials/ex5.c and ${}PETSC_DIR/snes/examples/tutorials/ex5f.F for examples of user-defined application contexts in C and Fortran, respectively.
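A minimal sketch of such a context in C might look like the following; the structure fields, the name AppCtx, and the values are purely illustrative, and the fragment assumes that snes, r, and FormFunction() are defined as in the SNES example earlier in this manual.
   typedef struct {
      Vec    localwork;    /* work vector reused across function evaluations */
      double h;            /* mesh spacing */
      int    n;            /* number of grid points */
   } AppCtx;               /* hypothetical application context */

   AppCtx user;
   int    ierr;

   user.n = 64;
   user.h = 1.0/(user.n - 1);
   ierr = VecCreateSeq(PETSC_COMM_SELF,user.n,&user.localwork); CHKERRQ(ierr);

   /* The context is handed back to FormFunction() as its final argument */
   ierr = SNESSetFunction(snes,r,FormFunction,(void*)&user); CHKERRQ(ierr);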
PETSc users should run a variety of tests. For example, there are a large number of options for the linear and nonlinear equation solvers in PETSc, and different choices can make a very big difference in convergence rates and execution times. PETSc employs defaults that are generally reasonable for a wide range of problems, but clearly these defaults cannot be best for all cases. Users should experiment with many combinations to determine what is best for a given problem and customize the solvers accordingly.
As discussed in Chapter SLES: Linear Equations Solvers , the default linear solver is restarted GMRES, preconditioned for the uniprocessor case with ILU(0) and for the multiprocessor case with the block Jacobi method (with one block per processor, each of which is solved with ILU(0)). A variety of alternatives can be selected at runtime with the options
   -ksp_type <ksp_name> -pc_type <pc_name>
One can also specify a variety of runtime customizations for the solvers, as discussed throughout the manual.
In particular, note that the default restart parameter for GMRES is 30, which may be too small for some large-scale problems. One can alter this parameter with the option -ksp_gmres_restart <restart> or by calling KSPGMRESSetRestart(). Section Krylov Methods gives information on setting alternative GMRES orthogonalization routines, which may provide much better parallel performance.
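For example, the same choices could be hard-wired in the code. The following is a brief sketch; the restart value of 60 is illustrative only.
   ierr = KSPSetType(ksp,KSPGMRES); CHKERRQ(ierr);
   ierr = KSPGMRESSetRestart(ksp,60); CHKERRQ(ierr);   /* larger restart than the default of 30 */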
PETSc provides a number of tools to aid in detection of problems with memory allocation, including leaks and use of uninitialized space. We briefly describe these below.
The performance of a code can be affected by a variety of factors, including the cache behavior, other users on the machine, etc. Below we briefly describe some common problems and possibilities for overcoming them.
Allowing the user to modify parameters and options easily at runtime
is very desirable for many applications. PETSc 2.0 provides a simple
mechanism to enable such customization. To print a list of
available options for a given program, simply specify the option
-help (or -h) at runtime, e.g.,
   mpirun ex1 -help
Note that all runtime options correspond to particular PETSc routines that can be explicitly called from within a program to set compile-time defaults. For many applications it is natural to use a combination of compile-time and runtime choices. For example, when solving a linear system, one could explicitly specify use of the Krylov subspace technique BiCGStab by calling
   ierr = KSPSetType(ksp,KSPBCGS);
One could then override this choice at runtime with the option
   -ksp_type tfqmr
to select the Transpose-Free QMR algorithm. (See Chapter SLES: Linear Equations Solvers for details.)
The remainder of this section discusses details of runtime options.
Each PETSc process maintains a database of option names and values (stored as text strings). This database is generated with the command PetscInitialize(), which is listed below in its C/C++ and Fortran variants, respectively:
   ierr = PetscInitialize(int *argc,char ***args,char *file_name,char *help_message);
   call PetscInitialize(character file_name,integer ierr)
The arguments argc and args (in the C/C++ version only) are the usual command line arguments, while the file_name is a name of a file that can contain additional options. By default this file is called .petscrc in the user's home directory. The user can also specify options via the environmental variable PETSC_OPTIONS. The options are processed in the following order:
The file format for specifying options is
   -optionname possible_value -anotheroptionname possible_value ...
All of the option names must begin with a dash (-) and have no intervening spaces. Note that the option values cannot have intervening spaces either, and tab characters cannot be used between the option names and values. The user can employ any naming convention. For uniformity throughout PETSc, we employ the format -package_option (for instance, -ksp_type and -mat_view_info).
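For example, a .petscrc file with the following (illustrative) contents would select GMRES with a restart of 60 and block Jacobi preconditioning, and would print a profiling summary for every run:
   # sample .petscrc file
   -ksp_type gmres
   -ksp_gmres_restart 60
   -pc_type bjacobi
   -log_summary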
Users can specify an alias for any option name (to avoid typing the
sometimes lengthy default name) by adding an alias to the
.petscrc file in the format
   alias -newname -oldname
For example,
   alias -kspt -ksp_type
   alias -sd -start_in_debugger
Comments can be placed in the .petscrc file by using one of the following symbols in the first column of a line: #, %, or !.
Any subroutine in a PETSc program can add entries to the database with the
command
   ierr = OptionsSetValue(char *name,char *value);
though this is rarely done. To locate options in the database, one should use the commands
   ierr = OptionsHasName(char *pre,char *name,int *flg);
   ierr = OptionsGetInt(char *pre,char *name,int *value,int *flg);
   ierr = OptionsGetDouble(char *pre,char *name,double *value,int *flg);
   ierr = OptionsGetString(char *pre,char *name,char *value,int maxlen,int *flg);
   ierr = OptionsGetStringArray(char *pre,char *name,char **values,int *maxlen,int *flg);
   ierr = OptionsGetIntArray(char *pre,char *name,int *value,int *nmax,int *flg);
   ierr = OptionsGetDoubleArray(char *pre,char *name,double *value,int *nmax,int *flg);
All of these routines set flg=1 if the corresponding option was found, flg=0 if it was not found. The optional argument pre indicates that the true name of the option is the given name (with the dash ``-'' removed) prepended by the prefix pre. Usually pre should be set to PETSC_NULL (or PETSC_NULL_CHARACTER for Fortran); its purpose is to allow someone to rename all the options in a package without knowing the names of the individual options. For example, when using block Jacobi preconditioning, the KSP and PC methods used on the individual blocks can be controlled via the options -sub_ksp_type and -sub_pc_type.
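A short sketch of querying the database from C follows; the option names -n and -tol are hypothetical and would be supplied by the user at runtime.
   int    n = 10,flg,ierr;
   double tol = 1.e-8;

   ierr = OptionsGetInt(PETSC_NULL,"-n",&n,&flg); CHKERRQ(ierr);
   ierr = OptionsGetDouble(PETSC_NULL,"-tol",&tol,&flg); CHKERRQ(ierr);
   if (!flg) {
      /* -tol was not given on the command line or in .petscrc; the default is retained */
   }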
One useful means of keeping track of user-specified runtime options is use of -optionstable, which prints to stdout during PetscFinalize() a table of all runtime options that the user has specified. A related option is -optionsleft, which prints the options table and indicates any options that have not been requested upon a call to PetscFinalize(). This feature is useful to check whether an option has been activated for a particular PETSc object (such as a solver or matrix format), or whether an option name may have been accidentally misspelled.
PETSc employs a consistent scheme for examining, printing, and
saving objects through commands of the form
   ierr = XXXView(XXX obj,Viewer viewer);
Here obj is any PETSc object of type XXX, where XXX is Mat, Vec, SNES, etc. There are several predefined viewers. The format of viewer output can be set with
   ierr = ViewerSetFormat(Viewer viewer,int format,char *name);
Possible formats include VIEWER_FORMAT_ASCII_DEFAULT, VIEWER_FORMAT_ASCII_MATLAB, and VIEWER_FORMAT_ASCII_IMPL. The implementation-specific format, VIEWER_FORMAT_ASCII_IMPL, displays the object in the most natural way for a particular implementation. For example, when viewing a block diagonal matrix that has been created with MatCreateSeqBDiag(), VIEWER_FORMAT_ASCII_IMPL prints by diagonals, while VIEWER_FORMAT_ASCII_DEFAULT uses the conventional row-oriented format.
The routines
   ierr = ViewerPushFormat(Viewer viewer,int format,char *name);
   ierr = ViewerPopFormat(Viewer viewer);
allow one to temporarily change the format of a viewer.
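For example, to print a matrix to the screen in Matlab-readable form, one might write something like the following sketch. It assumes that VIEWER_STDOUT_WORLD is one of the predefined viewers and that the name argument supplies the label used for the object in the Matlab output; both assumptions should be checked against the manual pages.
   ierr = ViewerPushFormat(VIEWER_STDOUT_WORLD,VIEWER_FORMAT_ASCII_MATLAB,"A"); CHKERRQ(ierr);
   ierr = MatView(A,VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
   ierr = ViewerPopFormat(VIEWER_STDOUT_WORLD); CHKERRQ(ierr);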
As discussed above, one can output PETSc objects in binary format by
first opening a binary viewer with ViewerFileOpenBinary() and
then using MatView(), VecView(), etc. The corresponding
routines for input of a binary object have the form XXXLoad(). In
particular, matrix and vector binary input is handled by the
following routines:
   ierr = MatLoad(Viewer viewer,MatType outtype,Mat *newmat);
   ierr = VecLoad(Viewer viewer,Vec *newvec);
These routines generate parallel matrices and vectors if the viewer's communicator has more than one processor. The particular matrix and vector formats are determined from the options database; see the manual pages for details.
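A sketch of reading a matrix and vector back from a binary file follows; the file name is illustrative, and the open flag BINARY_RDONLY as well as the exact calling sequence of ViewerFileOpenBinary() are assumptions to be checked against the manual pages.
   Viewer viewer;
   Mat    A;
   Vec    b;
   int    ierr;

   ierr = ViewerFileOpenBinary(PETSC_COMM_WORLD,"system.dat",BINARY_RDONLY,&viewer); CHKERRQ(ierr);
   ierr = MatLoad(viewer,MATMPIAIJ,&A); CHKERRQ(ierr);   /* parallel AIJ format */
   ierr = VecLoad(viewer,&b); CHKERRQ(ierr);
   ierr = ViewerDestroy(viewer); CHKERRQ(ierr);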
One can provide additional information about matrix data for matrices
stored on disk by providing an optional file matrixfilename.info,
where matrixfilename is the name of the file containing the matrix.
The format of the optional file is the same as the .petscrc file
and can (currently) contain the following:
   -matload_block_size <bs>
   -matload_bdiag_diags <s1,s2,s3,...>
The block size indicates the size of blocks to use if the matrix is read into a block oriented data structure (for example, MATSEQBDIAG or MATMPIBAIJ). The diagonal information s1,s2,s3,... indicates which (block) diagonals in the matrix have nonzero values. Section gives details.
PETSc programs may be debugged using one of the two options below.
By default the GNU debugger gdb is used when -start_in_debugger or -on_error_attach_debugger is specified. To employ either xxgdb or the common UNIX debugger dbx, one uses command line options as indicated above. On HP-UX machines the debugger xdb should be used instead of dbx; on RS/6000 machines the xldb debugger is supported as well. By default, the debugger will be started in a new xterm (to enable running separate debuggers on each process), unless the option noxterm is used. In order to handle the MPI startup phase, the debugger command ``cont'' should be used to continue execution of the program within the debugger. Rerunning the program through the debugger requires terminating the first job and restarting the processor(s); the usual ``run'' option in the debugger will not correctly handle the MPI startup and should not be used. Not all debuggers work on all machines, so the user may have to experiment to find one that works correctly.
Errors are handled through the routine PetscError().
This routine
checks a stack of error handlers and calls the one on the top.
If the stack is empty, it selects PetscTraceBackErrorHandler(),
which tries to print a traceback.
A new error handler can be put on the stack with
   ierr = PetscPushErrorHandler(int (*HandlerFunction)(int line,char *dir,char *file,char *message,int number,void*),void *HandlerContext)
The arguments to HandlerFunction() are the line number where the error occurred, the file in which the error was detected, the corresponding directory, the error message, the error integer, and the HandlerContext. The routine
   ierr = PetscPopErrorHandler()
removes the last error handler and discards it.
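A sketch of a user-defined handler that simply prints the error and passes the error number back up follows; the handler name, its installation routine, and the message format are illustrative only, and the argument order follows the prototype above.
   #include <stdio.h>
   #include "petsc.h"

   /* Hypothetical handler: print the error and return the error number */
   int MyErrorHandler(int line,char *dir,char *file,char *message,int number,void *ctx)
   {
      fprintf(stderr,"[my handler] error %d in %s%s at line %d: %s\n",
              number,dir,file,line,message);
      return number;
   }

   /* Install the handler; PETSC_NULL is used since no handler context is needed */
   int InstallMyHandler(void)
   {
      int ierr;
      ierr = PetscPushErrorHandler(MyErrorHandler,PETSC_NULL); CHKERRQ(ierr);
      return 0;
   }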
PETSc provides two additional error handlers besides
PetscTraceBackErrorHandler():
   PetscAbortErrorHandler()
   PetscAttachErrorHandler()
PetscAbortErrorHandler() calls abort on encountering an error, while PetscAttachErrorHandler() attaches a debugger to the running process if an error is detected. At runtime, these error handlers can be set with the options -on_error_abort or -on_error_attach_debugger [noxterm, dbx, xxgdb, xldb] [-display DISPLAY].
All PETSc calls can be traced (useful for determining where a program is
hanging without running in the debugger) with the option
   -log_trace [filename]
where filename is optional. By default the traces are printed to the screen. This can also be set with the command PLogTraceBegin(FILE*).
It is also possible to trap signals by using the
command
   ierr = PetscPushSignalHandler(int (*Handler)(int,void *),void *ctx);
The default handler PetscDefaultSignalHandler() calls PetscError() and then terminates. In general, a signal in PETSc indicates a catastrophic failure. Any error handler that the user provides should attempt only to clean up before exiting. By default all PETSc programs use the default signal handler, although the user can turn this off at runtime with the option -no_signal_handler .
There is a separate signal handler for floating-point exceptions.
The option -fp_trap turns on the floating-point trap at runtime,
and the routine
   ierr = PetscSetFPTrap(int flag);
can be used in-line. A flag of PETSC_FP_TRAP_ON indicates that floating-point exceptions should be trapped, while a value of PETSC_FP_TRAP_OFF (the default) indicates that they should be ignored. Note that on certain machines, in particular the IBM RS/6000, trapping is very expensive.
A small set of macros is used to make the error handling lightweight.
These macros are used throughout the PETSc libraries and can be employed
by the application
programmer as well. When an error is first detected,
one should set it by calling
   SETERRQ(int flag,int pflag,char *message);
The user should check the return codes for all PETSc routines (and possibly user-defined routines as well) with
   ierr = PetscRoutine(...); CHKERRQ(int ierr);
Likewise, all memory allocations should be checked with
   ptr = (double *) PetscMalloc(n*sizeof(double)); CHKPTRQ(void *ptr);
If this procedure is followed throughout all of the user's libraries and codes, any error will by default generate a clean traceback of the location of the error. In any main program, however, the variants SETERRA(), CHKERRA(), and CHKPTRA() should be used instead to cause all processes of the program to abort when an error has been detected. Use of the abort variants of the error checking commands is critical in the main program, since they ensure that MPI_Abort() is called before the process ends; otherwise, other MPI processes that did not generate errors may remain unterminated.
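A brief sketch of this style of error checking in a user routine follows. MyRoutine() and its arguments are hypothetical, and the sketch assumes the PETSc 2.0 calling sequence VecSet(&alpha,x).
   #include "petsc.h"

   int MyRoutine(Vec x,int n)
   {
      int    ierr;
      Scalar one = 1.0;
      double *work;

      ierr = VecSet(&one,x); CHKERRQ(ierr);                   /* check every PETSc return code */
      work = (double*) PetscMalloc(n*sizeof(double)); CHKPTRQ(work);
      /* ... computations using work ... */
      PetscFree(work);
      return 0;
   }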
Note that the macro __FUNC__ is used to keep track of
routine names during error tracebacks. Users need not worry about this
macro in their application codes; however, users can take advantage of this feature
if desired by setting this macro before each user-defined routine
that may call SETERRQ(), SETERRA(), CHKERRQ(),
or CHKERRA(). A simple example of usage is given below.
   #undef __FUNC__
   #define __FUNC__ "MyRoutine1"
   int MyRoutine1() {
      /* code here */
      return 0;
   }
When developing large codes, one is often in the position of having a correctly (or at least believed to be correctly) running code; making a change to the code then changes the results for some unknown reason. Often even determining the precise point at which the old and new codes diverge is a major pain. In other cases, a code generates different results when run on different numbers of processors, although in exact arithmetic the same answer is expected. (Of course, this assumes that exactly the same solver and parameters are used in the two cases.)
PETSc provides some support for determining exactly where in the code
the computations lead to different results. First, compile both programs
with different names. Next, start running
both programs as a single MPI job. This procedure is dependent on the particular
MPI implementation being used.
For example, when using MPICH on workstations,
procgroup files can be used to specify the processors on which the job is
to be run. Thus, to run two programs, old and new,
each on two processors, one should create the procgroup file with the
following contents:
   local 0
   workstation1 1 /home/bsmith/old
   workstation2 1 /home/bsmith/new
   workstation3 1 /home/bsmith/new
(Of course, workstation1, etc. can be the same machine.) Then, one can execute the command
   mpirun -p4pg <procgroup_filename> old -compare <tolerance> [your_program_options]
Note that the same runtime options must be used for the two programs. The first time an inner product or norm detects an inconsistency larger than <tolerance>, PETSc will generate an error. The usual runtime options -start_in_debugger and -on_error_attach_debugger may be used. The user can also place the commands
   PetscCompareDouble()
   PetscCompareScalar()
   PetscCompareInt()
in portions of the application code to check for consistency between the two versions.
PETSc supports the use of complex numbers in application programs written in C, C++, and Fortran. To do so, we employ C++ versions of the PETSc libraries in which the basic ``scalar'' datatype, given in PETSc codes by Scalar, is defined as complex (or complex<double> for machines using templated complex class libraries). To work with complex numbers, the user should compile the PETSc libraries (including the Fortran interface library) and the application code with BOPT=[g_complex,O_complex,Opg_complex] for debugging, optimized, and profiling versions, respectively. The file ${}PETSC_DIR/docs/installation.html provides detailed instructions for installing PETSc.
We recommend using optimized Fortran kernels for some key numerical routines with complex numbers (such as matrix-vector products, vector norms, etc.) instead of the default C++ routines. See the ``Complex Numbers'' section of the file ${}PETSC_DIR/docs/installation.html for details on building these kernels. This implementation exploits the maturity of Fortran compilers while retaining the identical user interface. For example, on rs6000 machines, the base single-node performance when using the Fortran kernels is 4-5 times faster than the default C++ code.
Recall that each variant of the PETSc libraries is stored in a different directory, given by ${}PETSC_DIR/lib/lib${}BOPT/${}PETSC_ARCH, according to the architecture and BOPT optimization variable. Thus, the libraries for complex numbers are maintained separately from those for real numbers. When using any of the complex numbers versions of PETSc, all vector and matrix elements are treated as complex, even if their imaginary components are zero. Of course, one can elect to use only the real parts of the complex numbers when using the complex versions of the PETSc libraries; however, when working only with real numbers in a code, one should use a version of PETSc for real numbers for best efficiency.
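For instance, a complex scalar can be formed and used exactly like a real one. The following is a small sketch, assuming the imaginary unit is available as PETSC_i in the complex versions of the libraries and that x is an existing Vec; it also assumes the PETSc 2.0 calling sequence VecSet(&alpha,x).
   Scalar alpha;
   int    ierr;

   alpha = 1.0 + 2.0*PETSC_i;                 /* complex value; imaginary part is 2 */
   ierr  = VecSet(&alpha,x); CHKERRQ(ierr);   /* set all entries of x to 1 + 2i */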
The program ${}PETSC_DIR/src/sles/examples/tutorials/ex11.c solves a linear system with a complex coefficient matrix. Its Fortran counterpart is ${}PETSC_DIR/src/sles/examples/tutorials/ex11f.F.
If users develop application codes on UNIX machines using Emacs (which we
highly recommend), the etags feature can be used to search PETSc
files quickly and efficiently. To use this feature, one should
first check if the file,
${}PETSC_DIR/TAGS exists. If this file is
not present, it should be generated by
running make etags from the PETSc home directory.
Once the file exists, from
Emacs the user should issue
the command
   M-x visit-tags-table
where `` M'' denotes the Emacs Meta key, and enter the name of the TAGS file. Then the command `` M-.'' will cause Emacs to find the file and line number where a desired PETSc function is defined. Any string in any of the PETSc files can be found with the command `` M-x tags-search''. To find repeated occurrences, one can simply use `` M-,'' to find the next occurrence.
If users develop application codes on UNIX machines using VI, the ctags feature can be used to browse PETSc files quickly and efficiently. To use this feature, one should first check if the file ${}PETSC_DIR/vitags exists. If this file is not present, it should be generated by running make vitags from the PETSc home directory. Once the file exists, from VI the user should issue the command `` :set tags=${}PETSC_DIR/vitags'' or add to the `` ~/.exrc'' file the line `` set tags=${}PETSC_DIR/vitags''. Then the command `` :tag FunctionName'' will cause VI to find the file and line number where a desired PETSc function is defined.
When used in a message-passing environment, all communication within PETSc is done through MPI, the message-passing interface standard [(ref MPI-final)]. Any file that includes petsc.h (or any other PETSc include file), can freely use any MPI routine.
This chapter describes the design of the PETSc makefiles, which are the key to managing our code portability across a wide variety of UNIX systems.
To make a program named ex1, one may use the command
   make BOPT=[g,O,Opg] PETSC_ARCH=arch ex1
which will compile a debugging, optimized, or profiling version of the example and automatically link the appropriate libraries. The architecture, arch, is one of sun4, solaris, rs6000, IRIX, hpux, freebsd, etc. Note that when using command line options with make (as illustrated above), one must not place spaces on either side of the ``='' signs. The variables BOPT and PETSC_ARCH can also be set as environmental variables. Although PETSc is written in C, it can be compiled with a C++ compiler. For many C++ users this may be the preferred route. To compile with the C++ compiler, one should use the option BOPT=g_c++, BOPT=O_c++, or BOPT=Opg_c++. The options BOPT=g_complex, BOPT=O_complex, and BOPT=Opg_complex will create versions that use complex double-precision numbers.
The directory ${}PETSC_DIR/bmake contains virtually all makefile commands and customizations to enable portability across different architectures. Most makefile commands for maintaining the PETSc system are defined in the file ${}PETSC_DIR/bmake/common. These commands, which process all appropriate files within the directory of execution, include
For example, if the command
   make BOPT=g ACTION=lib tree
were executed from the directory ${}PETSC_DIR/src/ksp, the debugging library for all Krylov subspace solvers would be built.
The directory ${}PETSC_DIR/bmake contains a subdirectory for each architecture that contains machine-specific information, enabling the portability of our makefile system. For instance, for Sun SPARCstations running OS 4.1.3, the directory is called sun4. Each architecture directory contains several base makefiles:
We discovered that for no apparent reason, under freeBSD the include syntax is different from that of all other makefiles. Thus, under freeBSD gnumake must be used.
PETSc has several flags that determine how the source code will be compiled. The default flags for particular versions are specified by the variable PETSCFLAGS within the base files of ${}PETSC_DIR/bmake/${}PETSC_ARCH, discussed in Section Customized Makefiles . The flags include
Maintaining portable PETSc makefiles is very simple. In Figures 22 , 23 , and 24 we present three sample makefiles.
The first is a ``minimum'' makefile for maintaining
a single program that uses the PETSc libraries.
The most important line in this makefile is the line starting with include:
   include ${PETSC_DIR}/bmake/${PETSC_ARCH}/base
This line includes other makefiles that provide the needed definitions and rules for the particular base PETSc installation (specified by ${}PETSC_DIR) and architecture (specified by ${}PETSC_ARCH). (See Running PETSc Programs for information on setting these environmental variables.) As listed in the sample makefile, the appropriate include file is automatically completely specified; the user should not alter this statement within the makefile.
ALL: ex2

CFLAGS  = ${CPPFLAGS}
FFLAGS  =
DIRS    =
include ${PETSC_DIR}/bmake/${PETSC_ARCH}/base
ex2: ex2.o chkopts
	${CLINKER} -o ex2 ex2.o ${PETSC_LIB}
	${RM} ex2.o
The second sample makefile, given in Figure 23 , controls the generation of several example programs.
CFLAGS         = ${CPPFLAGS}
RUNEXAMPLES_1  = runex1 runex2
RUNEXAMPLES_2  = runex4
RUNEXAMPLES_3  = runex3
EXAMPLESC      = ex1.c ex2.c ex4.c
EXAMPLESF      = ex3.F
EXAMPLES_1     = ex1 ex2
EXAMPLES_2     = ex4
EXAMPLES_3     = ex3

ex1: ex1.o
	-${CLINKER} -o ex1 ex1.o ${PETSC_LIB}
	${RM} ex1.o
ex2: ex2.o
	-${CLINKER} -o ex2 ex2.o ${PETSC_LIB}
	${RM} ex2.o
ex3: ex3.o
	-${FLINKER} -o ex3 ex3.o ${PETSC_FORTRAN_LIB} ${PETSC_LIB}
	${RM} ex3.o
ex4: ex4.o
	-${CLINKER} -o ex4 ex4.o ${PETSC_LIB}
	${RM} ex4.o
runex1:
	-@${MPIRUN} ex1
runex2:
	-@${MPIRUN} -np 2 ex2 -mat_seqdense -optionsleft
runex3:
	-@${MPIRUN} ex3 -v -log_summary
runex4:
	-@${MPIRUN} -np 4 ex4 -trdump
include ${PETSC_DIR}/bmake/${PETSC_ARCH}/base
ALL: lib

CFLAGS   =
SOURCEC  = sp1wd.c spinver.c spnd.c spqmd.c sprcm.c
SOURCEF  = degree.f fnroot.f genqmd.f qmdqt.f rcm.f fn1wd.f gen1wd.f \
           genrcm.f qmdrch.f rootls.f fndsep.f gennd.f qmdmrg.f qmdupd.f
SOURCEH  =
OBJSC    = sp1wd.o spinver.o spnd.o spqmd.o sprcm.o
OBJSF    = degree.o fnroot.o genqmd.o qmdqt.o rcm.o fn1wd.o gen1wd.o \
           genrcm.o qmdrch.o rootls.o fndsep.o gennd.o qmdmrg.o qmdupd.o
LIBBASE  = libpetscmat
MANSEC   = 2
include ${PETSC_DIR}/bmake/${PETSC_ARCH}/base
The variable MANSEC indicates that any manual pages generated from this source should be included in the second section.
This approach to portable makefiles has some minor limitations, including the following:
PETSc includes two GUI utilities, PETScView and PETScOpts, that facilitate library use and interpretation of computational results. We acknowledge the contributions of Matt Hille (Washington State University), who focused on the design and documentation of these tools while a participant in Argonne's Summer Student Research Participation Program, 1995.
As discussed in Chapter Profiling , PETSc incorporates uniform event logging throughout the library in the form of statistics regarding object creation and destruction, floating-point operations, execution time, and memory usage. The utility PETScView provides an abstract interpretation of the profile data for a high-level view of the interrelationships among various code modules. PETScView assists in debugging, analysis, and performance enhancement and is especially useful for complex simulations that employ a combination of numerical methods and modeling techniques.
An additional GUI tool is PETScOpts, which provides a simple interface to the full range of PETSc options database commands. As discussed in Section Runtime Options , these options enable the user to set particular solvers, data structures, profiling options, etc. at runtime, thereby facilitating the customization and comparison of various algorithms and storage schemes.
PETScView and PETScOpts use the Tcl and the Tk Toolkit [(ref tcl-tk-web-page)]. Therefore, in order to use the PETSc utility programs, the Tcl and Tk packages must be installed on the user's local system. See the following WWW site for information about Tcl/Tk: http://www.sunlabs.com/research/tcl/.
In order for PETScView and PETScOpts to work properly, a
slight modification of the source code is required. Using any text
editor, the user should load the file ${}PETSC_DIR/bin/petscview (or
${}PETSC_DIR/bin/petscopts) and change the
first line in the source code to point to the proper location of where
wish can be found. For example, if wish is located at
/usr/bin/wish, the first line for each program should be changed
to
   #! /usr/bin/wish -f
After this modification, PETScView and PETScOpts are ready to run.
Since Tcl/Tk are constantly changing (even faster than PETSc :-)), it is difficult to keep PETScView and PETScOpts compatible with the latest release of Tcl/Tk, while still working with earlier releases. Thus, the user may have to modify the PETScView and PETScOpts scripts slightly to get them working with a particular version of Tcl/Tk.
Whenever a PETSc program is executed with the -log_all or -log option, a log file is produced that can then be interpreted by PETScView. PETScView generates a dynamic tree-shaped hierarchy whose nodes contain icons that uniquely identify PETSc objects (such as linear solvers, matrices, and distributed arrays). The objects are color coded to denote the various states of activity, so that PETScView illustrates the changing relationships among objects during program execution. A sample PETScView object tree is shown in Figure 25 for a parallel linear solver.
A number of built-in commands enable the application programmer to navigate easily through the simulation. In addition, PETScView can display performance statistics for particular objects and their children, thus enabling users to focus performance tuning efforts. This section provides introductory information regarding PETScView. Additional details are available via the ``help'' feature of the utility.
Due to fundamental limitations of Tcl/Tk, PETScView can only be run with relatively small log files (at most a couple thousand events). If the user generates a very large log file, Tcl/Tk will hang and often swamp the machine.
To begin PETScView, the user should type petscview from the UNIX shell prompt.
To load a PETSc log file, one runs PETScView, giving the log
file name as a command line argument:
   petscview Log.0
This command invokes PETScView and automatically loads and interprets the profiling data contained in the file Log.0. PETScView can also be run without a log file given as a command line argument. In this case, the user must load the log file from within PETScView. To do this, the user selects the ``Open File'' command from the file menu. PETScView will automatically present the user with another window from which the file can be selected.
PETScView supports several additional command line arguments, as listed in Table 5 . Note that the command line options override the default values within the user's .petscviewrc file, which is discussed further in the following section.
   Argument                     Purpose
   -deffile filename            The location of the definitions file
   -time                        Show the time
   -notime                      Do not show the time
   -stepsize N                  Set the stepsize to N
   -delay N                     Set the delay to N milliseconds
   -printerdest DESTINATION     Set the destination (File or Printer)
   -printcommand COMMAND        Set the postscript print command
   -printer PRINTER             Specify the printer
   -printorientation 1 or 0     1 = landscape, 0 = portrait
   -printcolor                  Specify color mode (color, gray, mono)
PETSc configuration files contain information that determines the graphical representation of PETSc objects within PETScView. Whenever PETScView is invoked, a configuration file is automatically read. The location of this file is specified in .petscviewrc, which is stored in the user's home directory. By default, .petscviewrc points to the configuration file (${PETSC_DIR}/bin/petscview.cfg); however, this can be changed to point to a configuration file created by the application programmer. Section Advanced Features gives more information about changing the .petscviewrc file.
Even though PETScView loads a definitions file whenever it is initially run, a file of new definitions can be loaded from within PETScView at any time. This command is found in the file menu. Loading a new definitions file will automatically update all PETSc objects.
The ``print'' command of the file menu displays a dialog box from which the user can change the printing options. The default values are loaded from .petscviewrc when PETScView is first run. When the proper options are set for printing, one clicks on the ``print'' button or presses ``return''. If the user is printing to a file, another dialog box will appear from which the user may specify the output filename.
Note: Currently, PETScView prints trees of size less than 1024x768 (measured in pixels). If the scrollbars are needed to view any parts of a tree, it is very unlikely that the whole tree will be printed. This situation presents no problem, however, when the tree is printed to a file.
To exit PETScView, one selects the ``Exit'' option of the file menu. When this action is confirmed, PETScView will terminate and return the user to the calling shell.
Once a file has been loaded by PETScView, the user can navigate through the simulation by using PETScView's play bar. The play bar is located at the bottom of the window and contains buttons whose appearance and function resemble those of the buttons on a tape player. From left to right, these buttons have the following functions:
All of the above functions can also be accessed through the ``player'' menu. Selecting the player menu lists the commands in addition to their accelerator keys. The player menu also contains two additional commands that allow the user to jump to an arbitrary step in the simulation or to an arbitrary time during the simulation. When one of these commands is invoked, the user is prompted for the proper target jump value.
The View menu contains several additional commands that can be useful in the profiling of a PETSc program. These commands enable the user to view the raw profiling data as well as various statistics about program performance.
The ``options'' menu allows the user to change certain options that are set by default whenever PETScView is initially invoked. (The default values are defined in .petscviewrc.) These options include the step size when stepping through the simulation, the delay between events when playing through the simulation, and the colors chosen to denote the internal states of the PETSc objects.
The simulation's step size can be changed at any time during the simulation by selecting the ``Step size'' command of the ``Options'' menu. Whenever the user clicks on this selection, a dialog box appears prompting the user for the desired step size. Valid step sizes range from 1 to the total number of events in the simulation.
To change the delay, one selects the ``delay'' command of the options menu. A cascading menu presents the user with a few built-in delays, which include none, real-time, and second delay. (The real-time delay option causes delays between events in the simulation to be proportionate to the delays in the actual execution of the PETSc program.) To specify a user-defined delay, the user clicks on the ``after delay'' selection and is then presented with a dialog box in which the desired delay can be specified in milliseconds. Only values ranging from 1 to 2000 are valid.
The final selection on the ``Options'' menu presents the user with a cascading menu with entries that allow the user to change PETScView's object color-coding scheme. Selecting any one of the entries presents the user with another window from which the color may be chosen. (The colors are taken from /usr/lib/X11/rgb.txt.)
PETScView allows the user to define/redefine the graphical representation of PETSc objects. To do so, the user creates a configuration file, a Tcl script file, which includes specific definitions for group shapes, group labels, object icons, object labels, and action strings. Whenever PETScView is invoked, a configuration file is read from the location specified in the .petscviewrc file. By default, it is set to point to the configuration file included with the PETSc package (${PETSC_DIR}/bin/petscview.cfg). Allowing the application programmer to create a customized configuration file enables PETScView to interpret the profiling data even when new PETSc objects have been created.
Group                                  Object cookie
Viewers                                0
Index Sets                             1
Vectors                                2
Vector Scattering                      3
Matrices                               4
Draw (simple graphics)                 5
Line Graphs                            6
Krylov Subspace Solvers                7
Preconditioners                        8
Simplified Linear Equations Solvers    9
Grids                                  10
Stencils                               11
Simplified Nonlinear Solvers           12
Distributed Arrays                     13
Matrix Scattering                      14
The shape and description of each object group are defined with the following syntax:
set GroupShape(OBJECT_COOKIE) SHAPE
set GroupDesc(OBJECT_COOKIE) "GROUP DESCRIPTION"
where OBJECT_COOKIE is the integer used to identify the group of objects and SHAPE is one of the predefined shapes used by PETScView. These shapes include
Square  Wide_Rectangle  Down_Triangle  Thin_RectangleV  Tall_Oval  Octagon  Thin_RectangleH  Wide_Oval  Circle  Rectangle  Up_Triangle
GROUP DESCRIPTION is a line of text enclosed by quotes to describe the object group. For example, vectors have the following group definition:
set GroupShape(2) Thin_RectangleV
set GroupDesc(2) "Vectors"
Within each group, the object is identified with the use of a second integer (OBJECT_TYPE), which specifies the type of object within the object group. PETScView requires both a name and an icon to be defined for every object. These definitions have the following syntax:
set Icon(OBJECT_COOKIE,OBJECT_TYPE) "-text TEXT"
or
set Icon(OBJECT_COOKIE,OBJECT_TYPE) "-bitmap @BITMAP_LOC"
set Name(OBJECT_COOKIE,OBJECT_TYPE) "OBJECT DESCRIPTION"
where TEXT is a short description of the object (used if a bitmap is inappropriate) and BITMAP_LOC is the location of the bitmap graphic. The OBJECT DESCRIPTION is a line of text used to describe the object. As an example, we give the following definitions:
set Icon(2,0) "-bitmap @$env(PETSC_DIR)/bitmaps/vector.bit" set Name(2,0) "Sequential Vector" set Icon(2,1) "-bitmap @$env(PETSC_DIR)/bitmaps/vectorp.bit" set Name(2,1) "Parallel Vector"Notice the syntax that Tcl requires for the location of the bitmap. The bitmap location must be preceded by a @ in order for PETScView the work properly. $env(${}PETSC_DIR) is used to access the value of the environmental variable PETSC_DIR. To use the value of any other environmental variables in specifying a file location, one must use in the expression the following syntax:
$env(ENVIRONMENTAL_VARIABLE)
To create one's own bitmap picture to represent an object, the user creates the bitmap using a program such as bitmap. Once this is done, PETScView must know the location of the bitmap; the user must specify the precise location in the file system where the bitmap graphic can be found. For example, suppose that one creates a new bitmap to symbolize a parallel vector. Since the bitmap is located in the user's home directory, the following definition will not create an error:
set Icon(2,1) "-bitmap $env(HOME)/vectorp.bit"More examples on defining additional PETSc objects and informat about how PETSc defines the object types, are given in ${}PETSC_DIR/bin/petscview.cfg.
When certain actions occur during the execution of a PETSc program, these actions are also recorded in the profiling data. Once again, PETSc uses an integer to specify the type of action that is being performed. PETScView interprets the actions using the Action() string definitions, which are also located in the configuration file. An action definition has the following syntax:
set Action(ACTION_ID) "ACTION"
where ACTION_ID is an integer that encodes the action and ACTION is a descriptive string. Currently, PETScView uses the action definitions as defined in petscview.cfg.
PETScOpts is a PETSc utility program that enables the application programmer to modify his or her personal .petscrc file. As described in Section Runtime Options, the .petscrc file contains a list of options that will be passed to a PETSc program whenever it is executed. This file has the following format:
-optionname possible_value
-anotheroptionname possible_value
Even though this file can be manually modified by the application programmer with any text editor, PETScOpts greatly simplifies this task.
The command petscopts will invoke PETScOpts from the UNIX shell prompt. Any entries contained in the .petscrc file of the user's home directory will automatically be interpreted by PETScOpts. Once inside PETScOpts, a number of entry boxes, check buttons, radio buttons, and other widgets allow the application programmer to specify the options that should be saved in the .petscrc file for future use.
PETScOpts can also write the PETSc command line options to a file other than the default .petscrc file. To do so, run PETScOpts with the file name as a command line argument:
petscopts file_name
From within PETScOpts, a different file can be loaded at any time by selecting the ``Open file'' option of the file menu.
Even though many of PETSc's command line options are self-explanatory, a single descriptive line of text is displayed at the bottom of the window whenever the pointer is positioned over any check button, radio button, or entry that specifies an option.
The user can exit PETScOpts at any time by selecting the exit button from the file menu. If a .petscrc file was loaded when PETScOpts was initiated, the user is asked whether the current or original settings (or neither) should be saved in .petscrc.
This chapter introduces additional features of the PETSc matrices and solvers. Since most PETSc users should not need to use these features, we recommend skipping this chapter during an initial reading.
One can extract a (parallel) submatrix from a given (parallel) matrix using
ierr = MatGetSubMatrix(Mat A,IS rows,IS cols,int csize,MatGetSubMatrixCall call,Mat *B);
This extracts the rows and columns of the matrix A into B. If call is MAT_INITIAL_MATRIX, it will create the matrix B. If call is MAT_REUSE_MATRIX, it will reuse the B created with a previous call. The argument csize is ignored for sequential matrices; for parallel matrices it determines the ``local columns'' if the matrix format supports this concept. Often one can use the default by passing in PETSC_DECIDE. To create a matrix B that may be multiplied with a vector x, one can use
ierr = VecGetLocalSize(x,&csize);
ierr = MatGetSubMatrix(Mat A,IS rows,IS cols,int csize,MatGetSubMatrixCall call,Mat *B);
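As an illustration, the following hedged sketch extracts the first halves of the rows and columns of an assembled matrix A into B. It assumes ISCreateStride(), MatGetSize(), and ISDestroy() with their usual PETSc 2.0 calling sequences, and a vector x with which B is to be multiplied; it is a sketch of the idea, not a prescribed recipe.

Mat A, B;          /* A is assumed already created and assembled */
Vec x;             /* x is assumed compatible with products B*x  */
IS  rows, cols;
int ierr, m, n, csize;

ierr = MatGetSize(A,&m,&n); CHKERRQ(ierr);
ierr = ISCreateStride(PETSC_COMM_WORLD,m/2,0,1,&rows); CHKERRQ(ierr);
ierr = ISCreateStride(PETSC_COMM_WORLD,n/2,0,1,&cols); CHKERRQ(ierr);

/* make the local column layout of B match the local size of x */
ierr = VecGetLocalSize(x,&csize); CHKERRQ(ierr);
ierr = MatGetSubMatrix(A,rows,cols,csize,MAT_INITIAL_MATRIX,&B); CHKERRQ(ierr);

/* later, after the values (but not the structure) of A change, reuse B */
ierr = MatGetSubMatrix(A,rows,cols,csize,MAT_REUSE_MATRIX,&B); CHKERRQ(ierr);

ierr = ISDestroy(rows); CHKERRQ(ierr);
ierr = ISDestroy(cols); CHKERRQ(ierr);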
Normally, PETSc users will access the matrix solvers through the SLES interface, as discussed in Chapter SLES: Linear Equations Solvers , but the underlying factorization and triangular solve routines are also directly accessible to the user.
The LU and Cholesky
matrix factorizations are split into
two or three stages depending on the user's needs. The first stage is
to calculate an ordering for the matrix. The ordering generally is
done to reduce fill in a sparse factorization; it does not make much
sense for a dense matrix.
ierr = MatGetReordering(Mat matrix,MatReorderingType type,IS* rowperm,IS* colperm);
The currently available alternatives for the ordering type are
Users can add their own reordering routines
by providing a function with the calling sequence
int reorder(Mat A,MatReorderingType type,IS* rowperm,IS* colperm);
Here A is the matrix for which we wish to generate a new ordering, type may be ignored, and rowperm and colperm are the row and column permutations generated by the reordering routine. The user registers the reordering routine with the command
ierr = MatReorderingRegister(MatReorderingType inname,MatReorderingType *name,char *sname,int (*reorder)(Mat,MatReorderingType,IS*,IS*));
The input argument sname is a string of the user's choice, inname is either an ordering defined in mat.h or ORDER_NEW to indicate that one is introducing a new ordering, while the output argument *name is the registration number returned to the user. See the code in src/mat/impls/order/sorder.c and other files in that directory for examples of how the reordering routines may be written.
Once the reordering routine has been registered, it can be selected for use at runtime with the command line option -mat_order sname. When calling MatGetReordering() directly, the user should provide the registered name as the second input argument.
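To make the mechanism concrete, the following hedged sketch registers a do-nothing ordering that returns identity permutations. The use of ISCreateStride() and MatGetSize(), and the choice of PETSC_COMM_SELF for the permutation index sets, are assumptions of this sketch rather than requirements stated above.

/* A trivial user-defined ordering: identity row and column permutations. */
int MyOrder(Mat A,MatReorderingType type,IS *rowperm,IS *colperm)
{
  int ierr, m, n;

  ierr = MatGetSize(A,&m,&n); CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF,m,0,1,rowperm); CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF,n,0,1,colperm); CHKERRQ(ierr);
  return 0;
}

/* Registration; afterwards -mat_order myorder selects this ordering, or the
   returned value myname can be passed directly to MatGetReordering(). */
MatReorderingType myname;
ierr = MatReorderingRegister(ORDER_NEW,&myname,"myorder",MyOrder); CHKERRQ(ierr);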
The following routines perform complete, in-place symbolic and numerical factorizations for symmetric and nonsymmetric matrices, respectively:
ierr = MatCholeskyFactor(Mat matrix,IS permutation,double pf);
ierr = MatLUFactor(Mat matrix,IS rowpermutation,IS columnpermutation,double pf);
The argument pf >= 1 is the predicted fill expected in the factored matrix, expressed as a ratio of the original fill. For example, pf=2.0 would indicate that one expects the factored matrix to have twice as many nonzeros as the original.
For sparse matrices it is very unlikely that the factorization is actually done in-place. More likely, new space is allocated for the factored matrix and the old space deallocated, but to the user it appears in-place because the factored matrix replaces the unfactored matrix.
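A hedged sketch of the in-place route for a symmetric matrix A (assumed assembled) follows; ORDER_RCM is assumed here to be one of the available ordering types, and the predicted fill is simply guessed at 2.0.

IS  rowperm, colperm;
int ierr;

/* obtain a (symmetric) fill-reducing ordering; ORDER_RCM is an assumed type name */
ierr = MatGetReordering(A,ORDER_RCM,&rowperm,&colperm); CHKERRQ(ierr);

/* in-place Cholesky factorization with a predicted fill ratio of 2.0 */
ierr = MatCholeskyFactor(A,rowperm,2.0); CHKERRQ(ierr);
/* A now behaves as the factored matrix */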
The two factorization stages can also be performed separately by using the out-of-place mode:
ierr = MatCholeskyFactorSymbolic(Mat matrix,IS perm,double pf,Mat *result);
ierr = MatLUFactorSymbolic(Mat matrix,IS rowperm,IS colperm,double pf,Mat *result);
ierr = MatCholeskyFactorNumeric(Mat matrix,Mat *result);
ierr = MatLUFactorNumeric(Mat matrix,Mat *result);
In this case, the contents of the matrix result are undefined between the symbolic and numeric factorization stages. It is possible to reuse the symbolic factorization: for the second and succeeding factorizations, one simply calls the numerical factorization with a new input matrix and the same factored result matrix. It is essential that the new input matrix have exactly the same nonzero structure as the original factored matrix. (The numerical factorization merely overwrites the numerical values in the factored matrix and does not disturb the symbolic portion, thus enabling reuse of the symbolic phase.) In general, calling XXXFactorSymbolic with a dense matrix will do nothing except allocate the new matrix; the XXXFactorNumeric routines will do all of the work.
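For example, the following hedged sketch (with ORDER_RCM again assumed to be an available ordering type) performs the symbolic factorization once and reuses it for repeated numerical factorizations of matrices with identical nonzero structure.

Mat fact;                    /* will hold the factored matrix */
IS  rowperm, colperm;
int ierr;

ierr = MatGetReordering(A,ORDER_RCM,&rowperm,&colperm); CHKERRQ(ierr);
ierr = MatLUFactorSymbolic(A,rowperm,colperm,2.0,&fact); CHKERRQ(ierr);
ierr = MatLUFactorNumeric(A,&fact); CHKERRQ(ierr);

/* ... A receives new numerical values, same nonzero structure ... */

ierr = MatLUFactorNumeric(A,&fact); CHKERRQ(ierr);  /* symbolic phase reused */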
Why provide the plain XXXfactor routines when one could simply call the two-stage routines? The answer is that if one desires in-place factorization of a sparse matrix, the intermediate stage between the symbolic and numeric phases cannot be stored in a result matrix, and it does not make sense to store the intermediate values inside the original matrix that is being transformed. We originally made the combined factor routines do either in-place or out-of-place factorization, but then decided that this approach was not needed and could easily lead to confusion.
We do not currently support sparse matrix factorization with pivoting
for numerical stability. This is because trying to both reduce fill
and do pivoting can become quite complicated. Instead, we provide a
poor stepchild substitute. After one has obtained a reordering with MatGetReordering(Mat A,MatReorderingType type,IS *row,IS *col), one may call
ierr = MatReorderForNonzeroDiagonal(Mat A,double tol,IS row,IS col);
which will try to reorder the columns to ensure that no values along the diagonal are smaller than tol in absolute value. If small values are detected and corrected for, a nonsymmetric permutation of the rows and columns will result. This is not guaranteed to work, but it may help if one was simply unlucky in the original ordering. When using the SLES solver interface, the options -pc_ilu_nonzeros_along_diagonal <tol> and -pc_lu_nonzeros_along_diagonal <tol> may be used. Here, tol is an optional tolerance to decide whether a value is nonzero; by default it is 1.e-10.
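A hedged sketch of this usage follows; ORDER_RCM is again an assumed ordering type, and the tolerance 1.e-10 simply matches the default mentioned above.

IS  row, col;
int ierr;

ierr = MatGetReordering(A,ORDER_RCM,&row,&col); CHKERRQ(ierr);
/* try to permute so that no diagonal entry is smaller than 1.e-10 in magnitude */
ierr = MatReorderForNonzeroDiagonal(A,1.e-10,row,col); CHKERRQ(ierr);
/* row and col (possibly now a nonsymmetric permutation) are then passed
   to the factorization routines */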
Once a matrix has been factored, it is natural to solve linear systems.
The following four routines enable this process:
ierr = MatSolve(Mat A,Vec x,Vec y);
ierr = MatSolveTrans(Mat A,Vec x,Vec y);
ierr = MatSolveAdd(Mat A,Vec x,Vec y,Vec w);
ierr = MatSolveTransAdd(Mat A,Vec x,Vec y,Vec w);
The matrix A in these routines must have been obtained from a factorization routine; otherwise, an error will be generated. In general, the user should use the SLES solvers introduced in the next chapter rather than using these factorization and solve routines directly.
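Putting the pieces together, a hedged sketch that factors A in place and then solves A y = b (b and y are assumed to be appropriately sized, already-created vectors; ORDER_RCM is again an assumed ordering type):

Vec b, y;                 /* right-hand side and solution, assumed created */
IS  rowperm, colperm;
int ierr;

ierr = MatGetReordering(A,ORDER_RCM,&rowperm,&colperm); CHKERRQ(ierr);
ierr = MatLUFactor(A,rowperm,colperm,2.0); CHKERRQ(ierr);

/* forward/back substitution: solves A y = b using the factored A */
ierr = MatSolve(A,b,y); CHKERRQ(ierr);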
Again, virtually all users should use KSP through the SLES interface and, thus, will not need to know the details that follow.
It is possible to generate a Krylov subspace context with the
command
ierr = KSPCreate(MPI_Comm comm,KSP *ksp);
Before using the Krylov context, one must set the matrix-vector multiplication routine and the preconditioner with the commands
ierr = PCSetOperators(PC pc,Mat mat,Mat pmat,MatStructure flag);
ierr = KSPSetPC(KSP ksp,PC pc);
In addition, the KSP solver must be initialized with
ierr = KSPSetUp(KSP ksp);
Solving a linear system is done with the command
ierr = KSPSolve(KSP ksp,int *its);
Finally, the KSP context should be destroyed with
ierr = KSPDestroy(KSP ksp);
It may seem strange to put the matrix in the preconditioner rather than directly in the KSP; this decision was the result of much agonizing. The reason is that for SSOR with Eisenstat's trick, and certain other preconditioners, the preconditioner has to change the matrix-vector multiply. This procedure could not be done cleanly if the matrix were stashed in the KSP context, where PC cannot access it.
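Collecting the calls above, a hedged sketch of driving KSP directly is given below. It assumes an assembled matrix A and compatible vectors b and x; the calls KSPSetType(), KSPSetRhs(), and KSPSetSolution(), as well as the KSPGMRES and PCJACOBI types and the DIFFERENT_NONZERO_PATTERN flag, are assumptions about this release's interface rather than part of the discussion above.

KSP ksp;
PC  pc;
int ierr, its;

ierr = PCCreate(PETSC_COMM_WORLD,&pc); CHKERRQ(ierr);
ierr = PCSetType(pc,PCJACOBI); CHKERRQ(ierr);                    /* assumed PC type   */
ierr = PCSetOperators(pc,A,A,DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr);
ierr = PCSetVector(pc,x); CHKERRQ(ierr);

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp); CHKERRQ(ierr);
ierr = KSPSetType(ksp,KSPGMRES); CHKERRQ(ierr);                  /* assumed KSP type  */
ierr = KSPSetPC(ksp,pc); CHKERRQ(ierr);
ierr = KSPSetRhs(ksp,b); CHKERRQ(ierr);                          /* assumed interface */
ierr = KSPSetSolution(ksp,x); CHKERRQ(ierr);                     /* assumed interface */

ierr = KSPSetUp(ksp); CHKERRQ(ierr);
ierr = KSPSolve(ksp,&its); CHKERRQ(ierr);

ierr = KSPDestroy(ksp); CHKERRQ(ierr);
ierr = PCDestroy(pc); CHKERRQ(ierr);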
Any preconditioner can supply not only the routine that applies the preconditioner, but also a routine that essentially performs a complete Richardson step. The main motivation for this is SOR. Using SOR in the Richardson framework, that is,
u^{n+1} = u^n + B(f - A u^n),
is much more expensive than simply updating the values directly. With this addition it is reasonable to state that all of our iterative methods are obtained by combining a preconditioner from the PC component with a Krylov method from the KSP component. This strategy makes things much simpler conceptually, so (we hope) clean code will result. Note: This idea was already implicit in older versions of SLES, but, for instance, simply doing Gauss-Seidel with Richardson in the old SLES was much more expensive than it had to be. With PETSc 2.0 this should not be a problem.
Most users will obtain their preconditioner contexts from the SLES
context with the command SLESGetPC(). It is possible to create,
manipulate, and destroy PC contexts directly, although this capability
should rarely be needed. To create a PC context, one uses the command
ierr = PCCreate(MPI_Comm comm,PC *pc);
The routine
ierr = PCSetType(PC pc,PCType method);
sets the preconditioner method to be used. The two routines
ierr = PCSetOperators(PC pc,Mat mat,Mat pmat,MatStructure flag);
ierr = PCSetVector(PC pc,Vec vec);
set the matrices and type of vector that are to be used with the preconditioner. The vec argument is needed by the PC routines to determine the format of the vectors. The routine
ierr = PCGetOperators(PC pc,Mat *mat,Mat *pmat,MatStructure *flag);
returns the values set with PCSetOperators().
The preconditioners in PETSc can be used in several ways. The two
most basic routines simply apply the preconditioner or its transpose
and are given, respectively, by
ierr = PCApply(PC pc,Vec x,Vec y);
ierr = PCApplyTrans(PC pc,Vec x,Vec y);
In particular, for a preconditioner matrix B that has been set via PCSetOperators(pc,A,B,flag), the routine PCApply(pc,x,y) computes y = B^{-1} x by solving the linear system By = x with the specified preconditioner method.
Additional preconditioner routines are
ierr = PCApplyBAorAB(PC pc,int right,Vec x,Vec y,Vec work,int its);
ierr = PCApplyBAorABTrans(PC pc,int right,Vec x,Vec y,Vec work,int its);
ierr = PCApplyRichardson(PC pc,Vec x,Vec y,Vec work,int its);
The first two routines apply the action of the matrix followed by the preconditioner, or the preconditioner followed by the matrix, depending on whether the integer right is zero or one. The final routine applies its iterations of Richardson's method. The last three routines are provided to improve efficiency for certain Krylov subspace methods.
A PC context that is no longer needed can be destroyed with the
command
ierr = PCDestroy(PC pc);
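For completeness, a hedged sketch of a stand-alone PC used only to apply a preconditioner built from an assembled matrix A to a vector x (the PCJACOBI type, the DIFFERENT_NONZERO_PATTERN flag, and the explicit PCSetUp() call are assumptions of this sketch):

PC  pc;
int ierr;

ierr = PCCreate(PETSC_COMM_WORLD,&pc); CHKERRQ(ierr);
ierr = PCSetType(pc,PCJACOBI); CHKERRQ(ierr);
ierr = PCSetOperators(pc,A,A,DIFFERENT_NONZERO_PATTERN); CHKERRQ(ierr);
ierr = PCSetVector(pc,x); CHKERRQ(ierr);
ierr = PCSetUp(pc); CHKERRQ(ierr);      /* assumed to be needed before the first apply */

/* y = B^{-1} x, where B is the preconditioner constructed from A */
ierr = PCApply(pc,x,y); CHKERRQ(ierr);

ierr = PCDestroy(pc); CHKERRQ(ierr);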