Newsgroups: comp.parallel
From: "Patrick F. McGehearty" <patrick@rsn.hp.COM>
Subject: Re: LINPACK Benchmark
Organization: Hewlett Packard, Richardson, Tx USA
Date: 13 Apr 1998 16:49:59 GMT
Message-ID: <6gtfnn$3su$1@encore.ece.cmu.edu>

In article <6f738q$lod$1@encore.ece.cmu.edu>,  <chetan@cdac.ernet.in> wrote:
>The question is
>How does one arrive at this number ?
>I have gone through the LINPACK site at netlib and have (I
>think) gone through almost all the related documents. So,
>I am NOT looking for references to netlib or TOP500 report.
>Is it the done thing to multiply the number of processors in the
>machine by the MFLOPS to obtain the performance for the
>machine ?
>where
>MFLOPS = OPS/time
>and OPS = (2/3)N**3 + 2N**2	 for an NxN matrix
>time is the time taken to finish the job (wall-clock or
>otherwise)

The only number that is obtained by multiplying number of processors
and MFLOPS is the R_peak.  Another definition for R_peak is "that rate
which the salesman guarantees you will not exceed". :-)

R_max might be as high as 96% of R_peak for some vector machines or as
low as 45% of R_peak for low bandwidth boxes, just to pick a couple of
cases that I found by quick inspection of Dongarra's charts.  You
obtain R_max by implementing and executing the best Linear Equation
Solver for your chosen architecture.  Then you vary the problem size
to find the point where you stop getting improved performance by
having a larger problem.  Getting a high R_max at a small problem
size (N_max) is considered a good thing.  The all-Fortran source
provided for the Linpack 100x100 benchmark is pretty much irrelevant
for finding the R_max of current architectures.  It can provide
reference results to ensure you get correct answers with your tuned
code.  For serious competitors, it is not unusual for an expert in the
architecture and in linear equations to spend several weeks tuning
their existing
algorithms to show a new machine in the best possible light.  For
radical architecture changes, several people might work on the issue
for multiple months.  All this tuning effort is not just benchmark
chasing, as the improvements usually show up in the production math
libraries for solving linear equations.

Because serious architecture-specific tuning can often make a
factor-of-two performance difference between a good implementation of
a linear equation solver and a great one, be careful about
believing claims about the relative strength of an architecture unless
you know something about the competence of the tuning effort that was
put into achieving the results you are looking at.

- Patrick McGehearty
(an occasional Linpack tuner for both vector and RISC parallel boxes)
patrick@rsn.hp.com

--
Articles to parallel@ctc.com (Administrative: bigrigg@cs.cmu.edu)
Archive: http://www.hensa.ac.uk/parallel/internet/usenet/comp.parallel