FILEREGIONIOSTATS(1)
NAME
FileRegionIOstats - Produce report from I/O File Region
Summary trace records
SYNOPSIS
FileRegionIOstats tracefile
DESCRIPTION
FileRegionIOstats generates a report of application I/O
activity summarized by file region from Open, Global Open,
and File Region Summary trace records in the input SDDF
file. The necessary trace event records are produced by
the I/O extension to the Pablo trace library when the File
Region Summary option is enabled.
While FileRegionIOstats is running, it periodically writes
the number of input trace packets (records) processed so
far to standard error. The report is written to standard
output and is 132 characters wide. Several paragraphs of
text describing the report contents follow the actual I/O
summary information. That descriptive text is included in
slightly modified form in the section "THE REPORT" below.
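Because the report and the progress counts go to different
streams, they can be captured separately. The following is
a minimal Python sketch, not part of the Pablo distribution.
It assumes FileRegionIOstats is on the PATH; the helper
names and the file names are hypothetical, and the exit
status is simply passed back because this page does not
document it.

```python
import subprocess


def build_command(tracefile):
    # Per the SYNOPSIS, the trace file is the only argument.
    return ["FileRegionIOstats", str(tracefile)]


def run_report(tracefile, report_path):
    """Save the report to report_path; return (exit status, progress text)."""
    with open(report_path, "w") as report:
        result = subprocess.run(
            build_command(tracefile),
            stdout=report,           # the 132-column report
            stderr=subprocess.PIPE,  # periodic packet counts
            text=True,
        )
    return result.returncode, result.stderr
```

A call such as run_report("app.syncFiles", "regions.txt")
would leave the report in regions.txt and return the
progress messages for logging.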
If the input file does not include any Open, Global Open,
or File Region Summary trace records, the report will be
generated without error, but will contain only headings.
If the input file includes Open and/or Global Open records
but no File Region Summary records, the report will be
generated but will contain only information on when files
were opened. For the report to work as intended, there
must be File Region Summary records in the input trace
file.
BEFORE RUNNING THE PROGRAM
The application to be studied must be instrumented with
the I/O extension to the Pablo trace library and must call
the function enableFileRegionSummaries() to generate the
trace records needed as input for this program. Running
the instrumented application generates one or more trace
files; the number of trace files depends on the number of
processors used to run the application.
If there are multiple trace files, they should be merged
with the MergePabloTraces(1) program to produce a single
trace file for the execution including information from
all processors. The program SyncIOfileIDs(1) should be
run on the single trace file to synchronize IDs for files
that were opened more than once by the application. The
file generated by SyncIOfileIDs ending in ".syncFiles" can
be used as input to FileRegionIOstats.
FileRegionIOstats can also produce a report summarizing
I/O activity by file region from an SDDF input file that
contains File Region Summary records but has not been
merged and synchronized. In that case the report must be
interpreted with care: it covers only the processor(s)
whose records are in the input file, and a file that was
opened more than once may appear under more than one file
ID.
If the input trace file uses the same fileID for multiple
file names, an error message is generated and the program
exits without finishing the report.
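The condition that triggers this error can be illustrated
with a short Python sketch. It is not the program's actual
check: the input is assumed to be a list of hypothetical
(file_id, file_name) pairs already extracted from the Open
records, not the SDDF records themselves.

```python
def find_conflicting_file_ids(open_records):
    """Return {file_id: sorted names} for IDs bound to more than one name."""
    names_by_id = {}
    for file_id, file_name in open_records:
        names_by_id.setdefault(file_id, set()).add(file_name)
    return {
        file_id: sorted(names)
        for file_id, names in names_by_id.items()
        if len(names) > 1
    }
```

An empty result means every fileID maps to a single file
name, which is the state the input is expected to be in.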
THE REPORT
The report generated by FileRegionIOstats provides a
spatial summary of the I/O activity in individual files
accessed by processor(s) executing a traced program.
Each file is divided spatially into parts or "file
regions" whose size is set by the programmer in the
initial call to enableFileRegionSummaries(). The region
size may be adjusted by subsequent calls to
setFileRegionSize(). A summary of I/O activity for a file
region on a single processor is generated whenever an I/O
request moves the file position indicator from one part of
the file to another. In addition, summaries are produced
when files are closed and when the programmer calls
outputFileRegionSummaries().
To illustrate, suppose a file has three regions: A, B, and
C. An application opens the file, accesses bytes in
region A (perhaps with multiple I/O requests), accesses
bytes in region B, accesses bytes in region C, accesses
bytes in region B again, and closes the file. The report
would have summary lines for the Open, Region A, Region B,
Region C, and Region B. Note that the summary lines for
the two phases of activity in Region B are not combined.
A summary is produced whenever I/O activity moves to a new
file region (or when the file is closed); it is not
accumulated per region over the entire application
lifetime.
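The sequence of summary lines in this illustration can be
reproduced with a small Python model. This is a sketch of
the behavior described above, not the trace library's code.
It assumes regions are fixed-size and start at byte 0 (so
the region index is offset // region_size), tracks only one
file on one processor, and counts each request against the
region it lands in; all names are hypothetical.

```python
def summarize_by_region(offsets, region_size):
    """Return (cause, region_index, access_count) lines for one file/processor."""
    lines = [("Open", None, 0)]
    current = None  # region of the most recent access
    count = 0       # accesses since the last summary line
    for offset in offsets:
        region = offset // region_size
        if current is not None and region != current:
            # Activity moved to another region: emit a summary.
            lines.append(("Region", current, count))
            count = 0
        current = region
        count += 1
    # Closing the file emits the final summary.
    lines.append(("Close", current, count))
    return lines
```

With a region size of 100 and offsets 0, 40, 120, 250, 130,
the model yields five lines: the Open, region 0 (two
accesses), region 1, region 2, and region 1 again, with the
last one caused by the close.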
Details on the I/O summary information included in the
report follow:
A line, "Bytes in File Region", gives the number of bytes
in a file region as specified by the programmer in the
application. The bytes actually included in a summary
record are affected by opens, closes, and calls to
outputFileRegionSummaries() as well as by the specified
file region size.
Report lines with the Event Cause "Open" correspond to
file opens, including global opens for applications
running on the Intel Paragon. All other report lines print
summary statistics for a given file (indicated by File ID)
on a given processor (indicated by PE Nmbr) since the last
report line for that file/processor combination. An Event
Cause column reports what triggered the summary: "Close"
indicates the file was closed; "Force" indicates
outputFileRegionSummaries() was called; "Region" indicates
an I/O request moved the file position indicator into a
different part (region) of the file.
Count, Bytes, and Time columns show the event count, bytes
involved, and seconds taken for read, seek, and write
requests. Reads, seeks, and writes are considered I/O
access events. Asynchronous reads and writes (on the
Intel Paragon systems) are included in the read and write
columns, with the reported seconds corresponding to the
duration of the asynchronous call, not to the time
required for the completion of the requested data
transfer.
First and Last Byte columns report the first and last
bytes accessed since the file was opened or the last
summary was generated. These will be -1 if no I/O accesses
occurred.
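The -1 convention can be sketched in Python as follows.
This is an illustration only: it assumes "first" and "last"
mean the lowest and highest byte offsets touched during the
summary interval, which this page does not state outright,
and the (offset, nbytes) input format is hypothetical.

```python
def first_last_bytes(accesses):
    """accesses: iterable of (offset, nbytes). Returns (first, last) offsets."""
    first = last = -1  # sentinel: no I/O accesses in this interval
    for offset, nbytes in accesses:
        if nbytes <= 0:
            continue  # a zero-length request touches no bytes
        low, high = offset, offset + nbytes - 1
        first = low if first == -1 else min(first, low)
        last = max(last, high)
    return first, last
```

An interval with no accesses returns (-1, -1), matching the
values shown in the report for such lines.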
The Timestamp column gives (in seconds) the time the open
or summary occurred relative to when tracing was enabled
for the application.
KNOWN PROBLEMS
The report does not gracefully handle different file
region sizes on different processors of the same
application run. The individual summary lines of the
report will be correct, but the "Bytes in File Region"
lines will not indicate which processor(s) the reported
byte count applies to.
For files accessed in a global mode, the "Seek Bytes",
"First Byte", and "Last Byte" values will often be
incorrect. In particular, on Intel Paragon systems, these
three values should not be trusted for files accessed with
an iomode of M_LOG, M_SYNC, M_RECORD, or M_GLOBAL. Because
the position of the file pointer triggers output of the
File Region Summary event records that appear in the
report as lines with the "Region" cause, those summary
lines may not accurately correspond to the file pointer
moving to a new file region for files accessed in a global
mode.
The values given, with the exception of the three fields
listed above, will be correct over the course of the I/O
to the file, even though the individual summaries may not
be generated when the actual file pointer moves to a new
region in the file as intended. The I/O extension to the
Pablo instrumentation library attempts to minimize the
overhead incurred in gathering file pointer information
and does not track file pointer positioning correctly when
the activity on one processor affects the file pointer
position on another processor. An attempt will be made to
address this problem in the next release.
SEE ALSO
AdjustTime(1), IOstats(1), IOstatsTable(1),
IOtotalsByPE(1), LifetimeIOstats(1), MergePabloTraces(1),
SyncIOfileIDs(1), TimeWindowIOstats(1)
Ruth A. Aydt, A User's Guide to Pablo I/O Instrumentation
Ruth A. Aydt, The Pablo Self-Defining Data Format
COPYRIGHT
Copyright 1994-1996, The University of Illinois Board of
Trustees.
AUTHOR
Ruth A. Aydt, University of Illinois
Pablo Environment Oct 15, 1996