IOSTATS(1)

NAME
     IOstats - Produce a report from I/O detail trace records about I/O
     operations by request type

SYNOPSIS
     IOstats [-processor number] [-fileID id] [-paramFile params]
             [-browserFile browseOut] tracefile

DESCRIPTION
     IOstats generates a report of application I/O activity, summarized by
     I/O request type, from the detailed I/O trace records in the input
     SDDF trace file.  The necessary trace event records are produced by
     the I/O extension to the Pablo trace library by default whenever the
     I/O extension has been initialized and individual I/O calls have been
     instrumented.

     While IOstats is running, it periodically writes the number of input
     trace packets (records) processed to standard error.  The report
     itself is written to standard output.  Several paragraphs of text
     describing the report contents are included after the actual I/O
     activity information.  An expanded version of that descriptive text
     appears in the section "THE REPORT" below.

     If the input file does not include any I/O detail trace records, the
     report is generated without error but contains all zeros.  For the
     report to be meaningful, there must be I/O detail trace records in the
     input SDDF file.  If the programmer elects not to collect I/O detail
     trace records for the entire application execution, the report must be
     interpreted with that in mind.

OPTIONS
     -processor number
          Only I/O activity occurring on the specified processor is
          included in the report.  A comment containing the processor
          number is written to standard error when IOstats begins
          execution.  The processor number is also included in the report
          itself after the heading "Reported Processor:".  The default
          behavior is to include I/O activity from all processors.

     -fileID id
          Only I/O activity associated with the specified file, as
          identified by id, is included in the report.  A comment
          containing the file ID is written to standard error when IOstats
          begins execution.  The file ID and file name are also included in
          the report itself after the "Reported ..." headings.  If the
          application opened a file more than once, run SyncIOfileIDs(1) on
          the input trace file before using this option (see "BEFORE
          RUNNING THE PROGRAM" below) to avoid a file ID to file name
          mismatch problem.  The default behavior is to include I/O
          activity for all files in the report.

     -paramFile params
          A parameter file, params, may be used to override the default
          configurations for the minimum, maximum, and bin sizes used in
          the report histograms.  A comment containing the name of the
          parameter file is written to standard error when IOstats begins
          execution.  This option allows the user to tune the histogram
          parameters to suit the characteristics of their input trace data.
          See the section "USING A PARAMETER FILE" below for details on the
          format and contents of the parameter file.

     -browserFile browserOut
          When specified, this option causes IOstats to produce an SDDF
          file, browserOut, containing a subset of the reported information
          in a format that can be loaded and viewed with the Pablo Browser
          program.  See the section "THE BROWSER DATA" below for details on
          the information written to the SDDF file.  Note that the Pablo
          Browser program is currently not distributed outside the Pablo
          research group due to limited portability across platforms.

BEFORE RUNNING THE PROGRAM
     The application to be studied must be instrumented with the I/O
     extension to the Pablo trace library and should NOT call the function
     disableIOdetail().  The instrumented application is run and one or
     more trace files are generated; the number of trace files depends on
     the number of processors used to run the application.
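     The following minimal Python sketch is purely illustrative: it mimics
     the way the -processor and -fileID switches described under OPTIONS
     narrow the set of trace records that contribute to the report.  The
     record field names ("processor", "file_id") and the select_records
     helper are hypothetical; they do not reflect the actual SDDF record
     layout or the IOstats source code.

          # Conceptual sketch only; field names are hypothetical.
          def select_records(records, processor=None, file_id=None):
              # Keep only the I/O detail records that match the requested
              # processor and/or file ID; everything else is left out of
              # the report, mirroring the effect of -processor and -fileID.
              for rec in records:
                  if processor is not None and rec["processor"] != processor:
                      continue
                  if file_id is not None and rec["file_id"] != file_id:
                      continue
                  yield rec

          # For example, keep only processor 3 (trace_records is hypothetical):
          # reported = list(select_records(trace_records, processor=3))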
     If there are multiple trace files, they should be merged with the
     MergePabloTraces(1) program to produce a single trace file for the
     execution including information from all processors.  It is possible
     to run IOstats on unmerged trace files, but the output will only
     contain information for the single processor reported in the input
     trace file.

     If you intend to use the -fileID option to report only accesses to
     individual files, the program SyncIOfileIDs(1) should be run on the
     input trace file to synchronize IDs for files that were opened more
     than once by the application.  The file generated by SyncIOfileIDs
     ending in ".syncFiles" can be used as input to IOstats.  The file
     ending in ".syncFiles.map" can be used to identify the file ID of the
     particular file whose I/O you are interested in viewing.

     If I/O detail trace records were not produced for all I/O operations
     in the application, either because some calls were not instrumented or
     because detail tracing was disabled for part of the execution, the
     report will include only those operations for which detail records
     were produced and should be interpreted with that in mind.

THE REPORT
     The report summarizes information from detailed I/O event trace
     records.  Every I/O operation for the processor(s) and file(s) in the
     input tracefile that is not filtered out by a command line -fileID or
     -processor switch is treated separately; there is no attempt to
     correlate operations that take place concurrently on individual nodes
     of a multiprocessor system.  Said another way, the processor and file
     ID information in the trace event records is used for filtering
     purposes, but is not considered beyond that as the statistics for this
     report are generated.

     The First and Last I/O Operation timestamps are based on I/O requests
     from every processor included in the report.  These values give a
     general feel for the wall-clock time over which the reported I/O takes
     place.

     The report is divided into sections by operation type: Open, Close,
     Read, Seek, Write, Flush, and Miscellaneous I/O.  In addition, for
     traces gathered on Intel Paragon systems, there are sections for the
     operations Global Open, Asynchronous Read, Asynchronous Write, IO Wait
     and IO Done, IO Mode, and Lsize.  The last section of the report gives
     totals for All I/O Operations reported individually in the earlier
     sections.

     The information reported for the individual operation types in their
     respective sections is very similar.  An operation count and total
     time are given in each section, along with the mean, standard
     deviation, variance, minimum, maximum, and a histogram for the
     operation durations (in seconds) and for the time between operations
     of the same type (also in seconds).  For Read and Write operations,
     additional information is provided: the bytes transferred, the bytes
     transferred over total time, and statistical summaries of Bytes
     Transferred and Bytes/Second based on individual requests.  The Seek
     operation section contains a statistical summary of bytes traversed.
     When asynchronous operations are used, the bytes transferred are
     reported in the Asynchronous Read and Asynchronous Write sections of
     the output.  The following paragraphs explain further the meaning of
     the reported values.

     The operation count reports the number of times an operation occurred
     on any processor included in the report.  For example, say the source
     program issued a single read request which is executed on 16 nodes of
     a parallel machine.
     The report will show a count of 16 read operations when run on a file
     with trace data collected from all 16 nodes.  That is, each I/O
     request on each processor is treated separately even though a single
     call in the program source caused the "multiple" requests.

     Following this line of reasoning, sections for operations which
     involve byte transfer (reads, writes, asynchronous reads, asynchronous
     writes) report the total number of bytes involved on any processor
     included in the report.  Say our previous read operation requested 4
     bytes of data; it executed on 16 nodes, so ( 4 * 16 ) = 64 bytes were
     transferred by that read request.

     When operation durations are reported, each operation on each
     processor is again treated separately, even though they may overlap in
     wall-clock time.  Again using our read example, let's say all
     processors start reading at time n seconds and finish at time n+2
     seconds.  The total time for the read operations would be
     ( 16 * (n+2 - n) ) = 32 seconds.  The mean read duration would be 2
     seconds, and there would be no deviation from the mean.  If our
     program did very little except this read, it is quite possible that
     the total program execution takes less than 32 seconds of wall-clock
     time.

     The "Bytes Xferred/Total Time" for this example is ( 64 / 32 ) = 2
     bytes/second.  Note that this is not the machine throughput, which
     would be ( 64 / 2 ) = 32 bytes/second.  (This throughput value is not
     included in the report.)

     The report also gives the bytes/second based on individual read
     requests.  For our read example, every request was for 4 bytes and
     took 2 seconds, so the mean is 2 bytes/second and there is no
     deviation from that.

     Let's change our example a bit and say each read still requested 4
     bytes of data, but 8 nodes finished in 2 seconds and 8 finished in 4
     seconds.  The total time would be ( 8*2 + 8*4 ) = 48 seconds and the
     "Bytes Xferred / Total Time" is ( 64 / 48 ) = 1.333 bytes/second.
     Here machine throughput might be calculated as ( 64 / 4 ) = 16
     bytes/second.  For the modified example, the bytes/second based on
     individual read requests would be ( 4 / 2 ) = 2 bytes/second for 8
     nodes and ( 4 / 4 ) = 1 byte/second for 8 nodes, yielding a mean of
     1.5, a standard deviation of .516, and a variance of .267.

     Now let's change the example further and say that the reads do not
     begin at the same time on each node, but instead there is a 3-second
     delay between the start of each read.  The cryptic ASCII diagram below
     attempts to show this scenario; nodes are labeled 0 through f and
     appear under the timeline at the positions where they would be
     performing their read operations.

                            1         2         3         4
          Time    0123456789012345678901234567890123456789012345678
          Nodes   00 11 22 33 44 55 66 77 8888  aaaa  cccc  eeee
                                             9999  bbbb  dddd  ffff

     Here we see that 48 seconds elapse on the wall clock while the reads
     take place.  But no reads are taking place during eight of those
     seconds (2, 5, 8, ...), and the machine throughput is open to
     interpretation.  Do we say the throughput is ( 64 / 48 ) = 1.333
     bytes/second, or should the idle seconds be subtracted out, yielding
     ( 64 / (48 - 8) ) = 1.6 bytes/second, or should we somehow partition
     the requests where multiple nodes are doing reads separately?  We
     really don't know why the read requests are offset by 3 seconds -
     should that offset be "charged" to the I/O system?  Difficulties such
     as these, combined with the inability to reconstruct from a trace file
     the program source that generated read operations across multiple
     nodes, highlight why no machine throughput numbers appear in the
     report.
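     As a quick check of the arithmetic in the modified example above
     (eight 4-byte reads finishing in 2 seconds and eight in 4 seconds),
     the short Python sketch below reproduces the quoted figures.  It is
     illustrative only and is not IOstats source code; using the sample
     (n-1) standard deviation and variance is an assumption, one that
     matches the quoted values of .516 and .267.

          from statistics import mean, stdev, variance

          durations = [2.0] * 8 + [4.0] * 8   # seconds per read, one entry per node
          bytes_per_read = 4

          total_bytes = bytes_per_read * len(durations)    # 64 bytes
          total_time = sum(durations)                      # 48 seconds
          print(total_bytes / total_time)                  # 1.333...  "Bytes Xferred / Total Time"

          # Bytes/Second based on individual read requests
          rates = [bytes_per_read / d for d in durations]  # eight 2.0's and eight 1.0's
          print(mean(rates))                               # 1.5
          print(stdev(rates))                              # 0.516...
          print(variance(rates))                           # 0.267...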
     Another statistic given for each reported I/O operation is the time
     between XXX operations (where XXX is some I/O operation), in seconds.
     When multiple processors are included in the report, this value
     represents the time between any XXX operation on any node included in
     the report.  In the first and second examples above, where all reads
     began at the same time on all nodes, the mean would be 0.  In the
     third example, where each read began 3 seconds after the read on
     another node, the mean would be 3 seconds.  None of the examples show
     any deviation from the mean.

     Finally, for systems which include Asynch Read and Write information,
     the durations reported are for the asynchronous read or asynchronous
     write calls and do not include the actual time required to transfer
     the requested bytes.

USING A PARAMETER FILE
     The IOstats program has default configuration parameters controlling
     the minimums, maximums, and bin sizes used in the histograms included
     in the report.  The user may override these defaults for some or all
     of the histograms by using the -paramFile command line option.

     The parameter file is an ASCII SDDF file whose records provide
     histogram configuration information that overrides the minimum,
     maximum, and bin size defaults compiled into the IOstats executable.
     All configuration records have the name "initialize IO histogram", and
     the value of the first field, "statistic", controls which histogram is
     configured.  A sample parameter file follows:

          SDDFA
          #1:
          // "IOstats" "Histogram initialization"
          "initialize IO histogram" {
              // "Recognized Statistics"          "Reported as"
              // "time between IO"                "Time between I/O Operations"
              // "open durations"                 "Open Operation Durations"
              // "time between opens"             "Time between Open Operations"
              // "close durations"                "Close Operation Durations"
              // "time between closes"            "Time between Close Operations"
              // "read durations"                 "Read Operation Durations"
              // "time between reads"             "Time between Read Operations"
              // "read sizes"                     "Read Operation Bytes Transferred"
              // "read throughput"                "Bytes/Second based on individual Reads"
              // "seek durations"                 "Seek Operation Durations"
              // "time between seeks"             "Time between Seek Operations"
              // "seek sizes"                     "Seek Operation Bytes Traversed"
              // "write durations"                "Write Operation Durations"
              // "time between writes"            "Time between Write Operations"
              // "write sizes"                    "Write Operation Bytes Transferred"
              // "write throughput"               "Bytes/Second based on individual Writes"
              // "flush durations"                "Flush Operation Durations"
              // "time between flushes"           "Time between Flush Operations"
              // "misc i/o durations"             "Misc I/O Operation Durations"
              // "time between misc i/o"          "Time between Misc I/O Operations"
              // "global open durations"          "Global Open Operation Durations"
              // "time between global opens"      "Time between Global Open Operations"
              // "asynch read durations"          "Asynch Read Operation Durations"
              // "time between asynch reads"      "Time between Asynch Read Operations"
              // "asynch read sizes"              "Asynch Read Operation Bytes Transf... "
              // "asynch write durations"         "Asynch Write Operation Durations"
              // "time between asynch writes"     "Time between Asynch Write Operations"
              // "asynch write sizes"             "Asynch Write Operation Bytes Trans... "
              // "iowait durations"               "IO Wait and IO Done Operation Dura... "
              // "time between iowaits"           "Time between IO Wait and IO Done O... "
" // "iomode durations" "IO Mode Operation Durations" // "time between iomodes" "Time between IO Mode Operations" // "lsize durations" "Lsize Operation Durations" // "time between lsizes" "Time between Lsize Operations" char "statistic"[]; // "Minimum" "Minimum bin value" double "minimum"; // "Maximum" "Maximum bin value" double "maximum"; // "Bin Size" "Size of bin" double "bin size"; };; "initialize IO histogram" { [30] { "read sizes" }, 0, 1024, 256 };; "initialize IO histogram" { [30] { "write sizes" }, 1024, 10240, 1024 };; This parameter file contains two data records which over- ride the default values for two of the histograms produced by the IOstats program. First, a minimum value of 0, a maximum value of 1024, and a bin size of 256 will be used in the histogram "Read Operation Bytes Transferred:". Second, a minimum value of 1024, a maximum value of 10240, and a bin size of 1024 will be used in the histogram "Write Operation Bytes Transferred:". The bin size should divide evenly into (maximum - mini- mum), and ( (maximum - minimum) / bin size ) yields the number of "bounded" bins appearing in the histogram. In addition to the "bounded" bins, there are underflow and overflow bins to report values that fall outside the mini- mum/maximum range selected. The comments for the field "statistic" in the descriptor for the "initialize IO histogram" record contain the strings that should be used in the data records to over- ride all possible histogram default configurations. THE BROWSER DATA The browser SDDF file contains histogram data for each type of I/O operation ( IO_OP_NAME = Open, Close, Read, Seek, Write, Flush, Misc I/O, Global Open, Asynch Read, Asynch Write, IO Wait and IO Done, IO Mode, Lsize) and summary information across all operation types. o IO_OP_NAME Operation Durations (seconds) o Time between IO_OP_NAME Operations (seconds) o Time between I/O Operations (seconds) o Read Operation Bytes Transferred: o Seek Operation Bytes Traversed: o Write Operation Bytes Transferred: o Bytes/Second based on individual Read Requests o Bytes/Second based on individual Write Requests o Count of I/O Operations o Total Time of I/O Operations o Mean Duration of I/O Operations KNOWN PROBLEMS For files accessed in a global mode, the "Seek Operation Bytes Traversed:" will often be incorrect. In particular, on Intel Paragon systems, the reported seek bytes should not be trusted for files with an iomode of M_LOG, M_SYNC, M_RECORD, or M_GLOBAL. The I/O extension to the Pablo instrumentation library attempts to minimize the overhead incurred in gathering file pointer information, and does not track file pointer positioning correctly when the activity on one processor affects the file pointer posi- tion on another processor. An attempt will be made to address this problem in the next release. FILES/Templates/IOstats.parameters A sample parameter file to use with the -paramFile option. SEE ALSO FileRegionIOstats(1), IOstatsTable(1), IOtotalsByPE(1), MergePabloTraces(1), LifetimeIOstats(1), SyncIOfileIDs(1), TimeWindowIOstats(1) Ruth A. Aydt, A User's Guide to Pablo I/O Instrumentation Ruth A. Aydt, The Pablo Self-Defining Data Format COPYRIGHT Copyright 1994-1996, The University of Illinois Board of Trustees. AUTHOR Ruth A. Aydt, University of Illinois Pablo Environment Oct 16, 1996