FILEREGIONIOSTATS(1)    

NAME
       FileRegionIOstats  -  Produce  report from I/O File Region
       Summary trace records

SYNOPSIS
       FileRegionIOstats tracefile

DESCRIPTION
       FileRegionIOstats generates a report  of  application  I/O
       activity summarized by file region from Open, Global Open,
       and File Region Summary trace records in  the  input  SDDF
       file.   The  necessary trace event records are produced by
       the I/O extension to the Pablo trace library when the File
       Region Summary option is enabled.

       While FileRegionIOstats runs, it periodically writes the
       number of input trace packets (records) processed so far
       to standard error.  The report itself is written to
       standard output and is 132 characters wide.  Several
       paragraphs of text describing the report contents follow
       the actual I/O summary information.  That descriptive text
       is reproduced, in slightly modified form, in the section
       "THE REPORT" below.
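
       As a minimal sketch of how the two output streams might be
       separated, the Python helper below runs the program with
       the single tracefile argument shown in the SYNOPSIS, saves
       standard output (the report) to a file, and leaves standard
       error (the progress counts) attached to the caller's
       terminal.  The helper name, its parameters, and the file
       names are hypothetical and not part of the Pablo
       distribution.  The program's exit status conventions are
       not described in this page, so the return code is passed
       back uninterpreted.

```python
import subprocess


def run_file_region_report(tracefile, report_path, program="FileRegionIOstats"):
    """Run the report program on one trace file and save its report.

    Hypothetical helper: only the 'program tracefile' invocation comes
    from the SYNOPSIS; everything else is an assumption of this sketch.
    """
    with open(report_path, "w") as report:
        result = subprocess.run(
            [program, tracefile],
            stdout=report,   # the 132-character-wide report
            check=False,     # stderr is inherited, so progress stays visible
        )
    return result.returncode
```

       A call such as run_file_region_report("app.syncFiles",
       "report.txt") would leave the report in report.txt; both
       file names are placeholders.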

       If the input file does not include any Open, Global  Open,
       or  File  Region Summary trace records, the report will be
       generated without error, but will contain  only  headings.
       If the input file includes Open and/or Global Open records
       but no File Region Summary records,  the  report  will  be
       generated  but will contain only information on when files
       were opened.  For the report to work  as  intended,  there
       must  be  File  Region  Summary records in the input trace
       file.

BEFORE RUNNING THE PROGRAM
       The application to be studied must  be  instrumented  with
       the  I/O extension to the Pablo trace library and call the
       function enableFileRegionSummaries() to generate the trace
       records needed as input for this program.  The
       instrumented application is then run, producing one or
       more trace files; the number of trace files depends on the
       number of processors used to run the application.

       If there are multiple trace files, they should be merged
       with  the  MergePabloTraces(1) program to produce a single
       trace file for the execution  including  information  from
       all  processors.   The  program SyncIOfileIDs(1) should be
       run on the single trace file to synchronize IDs for  files
       that  were  opened more than once by the application.  The
       file generated by SyncIOfileIDs ending in ".syncFiles" can
       be used as input to FileRegionIOstats.

       FileRegionIOstats can also produce a report from an SDDF
       input file that contains File Region Summary records but
       has not been merged and synchronized.  In that case the
       generated report must be interpreted accordingly: an
       unmerged trace file covers only the processor(s) whose
       records it contains, and in an unsynchronized file a file
       that was opened more than once may appear under more than
       one file ID.

       If  the input trace file uses the same fileID for multiple
       file names, an error message is generated and the  program
       exits without finishing the report.
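
       The check described above can be pictured with the short
       Python sketch below.  It only illustrates the rule and is
       not the program's actual code; the function name and the
       representation of Open records as (file ID, file name)
       pairs are assumptions made for the example.

```python
def check_file_ids(open_records):
    """Map each file ID to its file name, rejecting conflicting reuse.

    open_records: iterable of (file_id, file_name) pairs. This shape is
    a hypothetical stand-in for Open / Global Open trace records.
    """
    names = {}
    for file_id, name in open_records:
        known = names.setdefault(file_id, name)
        if known != name:
            # Same ID, different name: the report cannot be trusted.
            raise ValueError(
                f"fileID {file_id} used for both {known!r} and {name!r}"
            )
    return names


# Reopening the same file under the same ID is accepted.
print(check_file_ids([(1, "input.dat"), (2, "out.dat"), (1, "input.dat")]))

# Reusing an ID for a different file name is rejected.
try:
    check_file_ids([(1, "input.dat"), (1, "other.dat")])
except ValueError as err:
    print("error:", err)
```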

THE REPORT
       The report generated by FileRegionIOstats provides a
       spatial summary of the I/O activity in individual files
       accessed by the processor(s) executing a traced program.
       Each file is divided spatially into parts, or "file
       regions", whose size is set by the programmer in the
       initial call to enableFileRegionSummaries().  The region
       size may be adjusted by subsequent calls to
       setFileRegionSize().  A summary of I/O activity for a file
       region on a single processor is generated whenever an I/O
       request moves the file position indicator from one region
       of the file to another.  In addition, summaries are
       produced when files are closed and when the programmer
       calls outputFileRegionSummaries().

       To illustrate, suppose a file has three regions: A, B, and
       C.  An application opens the file, accesses bytes in
       region A (perhaps with multiple I/O requests), accesses
       bytes in region B, accesses bytes in region C, accesses
       bytes in region B again, and closes the file.  The report
       would contain, in order, a line for the Open followed by
       summary lines for Region A, Region B, Region C, and Region
       B.  Note that the summary lines for the two phases of
       activity in Region B are not combined.  A summary is
       produced each time I/O activity moves to a new file region
       (or the file is closed); activity is not totaled per
       region over the entire application lifetime.
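
       The A, B, C, B illustration can be modeled with the small
       Python sketch below.  It is a simplified model for one file
       on one processor, not the Pablo library's implementation.
       The function name, the 1024-byte region size, and the byte
       offsets are hypothetical, and each I/O request is assumed
       to touch a single region.  Regions 0, 1, and 2 stand for A,
       B, and C.

```python
def summarize_by_region(offsets, region_size):
    """Return (cause, region, access_count) tuples for one open file.

    offsets: byte offsets of successive I/O requests, in order.
    Simplified model: a summary is flushed when the file position
    moves to a different region, and once more when the file closes.
    """
    events = [("Open", None, 0)]
    current = None   # region holding the file position; None before any I/O
    count = 0        # accesses accumulated since the last summary
    for offset in offsets:
        region = offset // region_size
        if current is not None and region != current:
            # Position moved to another region: emit and reset.
            events.append(("Region", current, count))
            count = 0
        current = region
        count += 1
    # Closing the file flushes whatever has accumulated.
    events.append(("Close", current, count))
    return events


REGION_SIZE = 1024                      # hypothetical region size in bytes
ACCESSES = [0, 512, 1500, 2100, 1024]   # A, A, B, C, then B again
EVENTS = summarize_by_region(ACCESSES, REGION_SIZE)
for cause, region, count in EVENTS:
    print(cause, region, count)
```

       In this model the second visit to region 1 (B) is reported
       on its own line, triggered by the close, rather than being
       folded into the earlier region 1 summary.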

       Details on the I/O summary  information  included  in  the
       report follow:

       A line, "Bytes in File Region", gives the number of bytes
       in a file region as specified by the programmer in the
       application.  The actual bytes included in summary records
       are affected by opens, closes, and calls to
       outputFileRegionSummaries(), in addition to the specified
       file region size.

       Report lines with the Event Cause "Open" correspond to
       file opens, including global opens for applications
       running on the Intel Paragon.  All other report lines
       print summary statistics for a given file (indicated by
       File ID) on a given processor (indicated by PE Nmbr) since
       the last report line for that file/processor combination.
       The Event Cause column reports what triggered the summary:
       "Close" indicates the file was closed; "Force" indicates
       outputFileRegionSummaries() was called; "Region" indicates
       an I/O request moved the file position indicator into a
       different part (region) of the file.

       The Count, Bytes, and Time columns show the event count,
       bytes involved, and seconds taken for read, seek, and
       write requests.  Reads, seeks, and writes are considered
       I/O access events.  Asynchronous reads and writes (on
       Intel Paragon systems) are included in the read and write
       columns; the reported seconds correspond to the duration
       of the asynchronous call, not to the time required to
       complete the requested data transfer.

       First and Last Byte columns  report  the  first  and  last
       bytes  accessed since the file was opened or the last sum-
       mary was generated.  These will be -1 if no  I/O  accesses
       occurred.

       The  Timestamp column gives (in seconds) the time the open
       or summary occurred relative to when tracing  was  enabled
       for the application.

KNOWN PROBLEMS
       The report does not gracefully handle different file
       region sizes on different processors of the same
       application run.  The individual summary lines of the
       report will be correct, but the "Bytes in File Region"
       lines will not indicate which processor(s) the reported
       byte counts apply to.

       For files accessed in a global mode, the "Seek Bytes",
       "First Byte", and "Last Byte" values will often be
       incorrect.  In particular, on Intel Paragon systems, these
       three values should not be trusted for files accessed with
       an iomode of M_LOG, M_SYNC, M_RECORD, or M_GLOBAL.

       The position of the file pointer triggers output of the
       File Region Summary event records that appear in the
       report as lines with the "Region" cause.  For files
       accessed in a global mode, those summary lines may
       therefore not correspond accurately to the file pointer
       moving to a new file region.  With the exception of the
       three fields listed above, the values given will be
       correct over the course of the I/O to the file, even
       though individual summaries may not be generated at the
       points where the actual file pointer moves to a new
       region.

       The I/O extension to the Pablo instrumentation library
       attempts to minimize the overhead incurred in gathering
       file pointer information and does not track file pointer
       positioning correctly when the activity on one processor
       affects the file pointer position on another processor.
       An attempt will be made to address this problem in the
       next release.

SEE ALSO
       AdjustTime(1), IOstats(1), IOstatsTable(1),
       IOtotalsByPE(1), LifetimeIOstats(1), MergePabloTraces(1),
       SyncIOfileIDs(1), TimeWindowIOstats(1)

       Ruth A. Aydt, A User's Guide to Pablo I/O Instrumentation
       Ruth A. Aydt, The Pablo Self-Defining Data Format

COPYRIGHT
       Copyright 1994-1996, The University of Illinois  Board  of
       Trustees.

AUTHOR
       Ruth A. Aydt, University of Illinois

Pablo Environment          Oct 15, 1996