FILEREGIONIOSTATS(1)

NAME
       FileRegionIOstats - Produce report from I/O File Region Summary
       trace records

SYNOPSIS
       FileRegionIOstats tracefile

DESCRIPTION
       FileRegionIOstats generates a report of application I/O activity
       summarized by file region from Open, Global Open, and File Region
       Summary trace records in the input SDDF file.  The necessary
       trace event records are produced by the I/O extension to the
       Pablo trace library when the File Region Summary option is
       enabled.

       As FileRegionIOstats is running, it periodically displays the
       number of input trace packets (records) processed to standard
       error.

       The report output is directed to standard output and is 132
       characters wide.  Several paragraphs of text describing the
       report contents are included after the actual I/O summary
       information.  The descriptive text is included in a slightly
       modified form in the section "THE REPORT" below.

       If the input file does not include any Open, Global Open, or
       File Region Summary trace records, the report will be generated
       without error, but will contain only headings.  If the input
       file includes Open and/or Global Open records but no File Region
       Summary records, the report will be generated but will contain
       only information on when files were opened.  For the report to
       work as intended, there must be File Region Summary records in
       the input trace file.

BEFORE RUNNING THE PROGRAM
       The application to be studied must be instrumented with the I/O
       extension to the Pablo trace library and call the function
       enableFileRegionSummaries() to generate the trace records needed
       as input for this program.  The instrumented application is run
       and one or more trace files are generated; the number of trace
       files depends on the number of processors used to run the
       application.  If there are multiple trace files, they should be
       merged with the MergePabloTraces(1) program to produce a single
       trace file for the execution including information from all
       processors.

       The program SyncIOfileIDs(1) should be run on the single trace
       file to synchronize IDs for files that were opened more than
       once by the application.  The file generated by SyncIOfileIDs
       ending in ".syncFiles" can be used as input to
       FileRegionIOstats.

       It is possible to run FileRegionIOstats and produce a report
       summarizing I/O activity by File Region on an SDDF input file
       that contains File Region Summary records but which has not been
       merged and synchronized.  If the application's trace files are
       not merged and synchronized as outlined above, the generated
       report must be interpreted accordingly.  If the input trace file
       uses the same fileID for multiple file names, an error message
       is generated and the program exits without finishing the report.

THE REPORT
       The report generated by FileRegionIOstats provides a spatial
       summary of the I/O activity in individual files accessed by
       processor(s) executing a traced program.

       Each file is divided spatially into parts or "file regions"
       whose size is set by the programmer in the initial call to
       enableFileRegionSummaries().  The region size may be adjusted by
       subsequent calls to setFileRegionSize().  A summary of I/O
       activity for a file region on a single processor is generated
       whenever an I/O request moves the file position indicator from
       one part of the file to another.  In addition, summaries are
       produced when files are closed and when the programmer calls
       outputFileRegionSummaries().

       To illustrate: Say a file has 3 regions - A, B, and C.  Say an
       application opens the file, accesses bytes in region A (perhaps
       with multiple I/O requests), accesses bytes in region B,
       accesses bytes in region C, accesses bytes in region B, and
       closes the file.  The report would have summary lines for the
       Open, Region A, Region B, Region C, and Region B.  Note the
       summary lines for the two phases of activity in Region B will
       not be combined.
       A summary is produced whenever I/O activity moves to a new file
       region (or when the file is closed); summaries are not
       accumulated for the individual regions over the entire
       application lifetime.

       Details on the I/O summary information included in the report
       follow:

       A line, "Bytes in File Region", gives the number of bytes in a
       file region as specified by the programmer in the application.
       The actual number of bytes covered by a summary record is
       affected by opens, closes, and calls to
       outputFileRegionSummaries() in addition to the specified file
       region size.

       Report lines with the Event Cause "Open" correspond to file
       opens, including global opens for applications running on the
       Intel Paragon.

       All other report lines print summary statistics for a given file
       (indicated by File ID) on a given processor (indicated by PE
       Nmbr) since the last report line for that file/processor
       combination.

       The Event Cause column reports what triggered the summary:
       "Close" indicates the file was closed; "Force" indicates
       outputFileRegionSummaries() was called; "Region" indicates an
       I/O request moved the file position indicator into a different
       part (region) of the file.

       Count, Bytes, and Time columns show the event count, bytes
       involved, and seconds taken for read, seek, and write requests.
       Reads, seeks, and writes are considered I/O access events.
       Asynchronous reads and writes (on Intel Paragon systems) are
       included in the read and write columns, with the reported
       seconds corresponding to the duration of the asynchronous call,
       not to the time required for the completion of the requested
       data transfer.

       First and Last Byte columns report the first and last bytes
       accessed since the file was opened or the last summary was
       generated.  These will be -1 if no I/O accesses occurred.

       The Timestamp column gives (in seconds) the time the open or
       summary occurred relative to when tracing was enabled for the
       application.
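
       The way summary lines are triggered can be modeled with a small
       Python sketch.  This is not code from the Pablo library or from
       FileRegionIOstats; the names SummaryLine and summarize, the
       100-byte region size, and the byte offsets are hypothetical.
       The sketch assumes each access lies entirely within one region,
       treats First and Last Byte as the lowest and highest offsets
       touched, and does not model the "Force" cause or the Count,
       Bytes, and Time breakdown by request type.

```python
from dataclasses import dataclass


@dataclass
class SummaryLine:
    cause: str       # "Open", "Region", or "Close" ("Force" is not modeled)
    region: int      # index of the region summarized, -1 if none
    count: int       # number of accesses folded into this line
    first_byte: int  # lowest byte offset touched, -1 if no accesses
    last_byte: int   # highest byte offset touched, -1 if no accesses


def summarize(accesses, region_size):
    """Model the summary lines for one file on one processor.

    accesses is an ordered list of (offset, nbytes) pairs. Each access is
    assumed to fall entirely inside one region (a simplification).
    """
    lines = [SummaryLine("Open", -1, 0, -1, -1)]
    current = None               # region whose activity is being accumulated
    count, first, last = 0, -1, -1

    for offset, nbytes in accesses:
        region = offset // region_size
        if current is not None and region != current:
            # Activity moved to a different region: flush the pending summary.
            lines.append(SummaryLine("Region", current, count, first, last))
            count, first, last = 0, -1, -1
        current = region
        first = offset if first == -1 else min(first, offset)
        last = max(last, offset + nbytes - 1)
        count += 1

    # Closing the file flushes whatever is still pending.
    closing_region = -1 if current is None else current
    lines.append(SummaryLine("Close", closing_region, count, first, last))
    return lines


# Regions A, B, C are indices 0, 1, 2 with a hypothetical 100-byte region size.
demo = summarize([(0, 10), (50, 10), (120, 10), (250, 10), (130, 10)], 100)
for line in demo:
    print(line.cause, line.region, line.count, line.first_byte, line.last_byte)
```

       With the access pattern from the illustration (A, A, B, C, B)
       the model yields an Open line, three "Region" lines for A, B,
       and C, and a "Close" line for the second visit to B, so the two
       phases of activity in region B stay separate.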

KNOWN PROBLEMS
       The report does not gracefully handle different file region
       sizes on different processors of the same application run.  The
       individual summary lines of the report will be correct, but the
       "Bytes in File Region" lines will not indicate the processor(s)
       to which the reported byte counts apply.

       For files accessed in a global mode, the "Seek Bytes", "First
       Byte", and "Last Byte" values will often be incorrect.  In
       particular, on Intel Paragon systems, these three values should
       not be trusted for files accessed with an iomode of M_LOG,
       M_SYNC, M_RECORD, or M_GLOBAL.  Since the position of the file
       pointer triggers output of File Region Summary event records,
       which appear in the report as lines with the "Region" cause,
       those summary lines may not accurately correspond to the file
       pointer moving to a new file region for files accessed in the
       global mode.  The values given, with the exception of the three
       fields listed above, will be correct over the course of the I/O
       to the file even though the individual summaries may not be
       generated when the actual file pointer moves to new regions in
       the file as intended.  The I/O extension to the Pablo
       instrumentation library attempts to minimize the overhead
       incurred in gathering file pointer information and does not
       track file pointer positioning correctly when the activity on
       one processor affects the file pointer position on another
       processor.  An attempt will be made to address this problem in
       the next release.

SEE ALSO
       AdjustTime(1), IOstats(1), IOstatsTable(1), IOtotalsByPE(1),
       LifetimeIOstats(1), MergePabloTraces(1), SyncIOfileIDs(1),
       TimeWindowIOstats(1)

       Ruth A. Aydt, A User's Guide to Pablo I/O Instrumentation

       Ruth A. Aydt, The Pablo Self-Defining Data Format

COPYRIGHT
       Copyright 1994-1996, The University of Illinois Board of
       Trustees.

AUTHOR
       Ruth A. Aydt, University of Illinois

Pablo Environment                 Oct 15, 1996