A common source of performance bugs is disk I/O. In this post, I'm going to give an overview of the features in our tool that help developers understand the sources of I/O stalls, including determining their latency, the files involved, and how to relate the behavior back to application source code.
The first screenshot, shown above, illustrates how I/O stalls are depicted in our application. In the Thread Blocking view, I/O delays have a dedicated category (dark purple in VSTS Beta 1). Using the Execution Breakdown tab, you can identify the threads that spend a significant fraction of their lifetime blocked on I/O. You can then focus your attention on the relevant thread(s) and eliminate the others from the view by selecting their thread channels on the left and clicking the "Hide selected threads" icon in the toolbar. In the screenshot, I've hidden all but the main thread (id = 5968), and you can observe from the breakdown graph that this thread spends most of its time executing, apart from some synchronization and I/O delays. In the timeline, you can see that the I/O stalls occur at the tail end of execution.
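To make that pattern concrete, here is a small Python sketch (illustrative only; the profiled application in the trace is native code calling WriteFile) of a thread that computes for most of its lifetime and then blocks on a synchronous write at the tail end, which is exactly the shape you'd see in the breakdown graph and timeline:

```python
import os
import time

def tail_write_workload(path, size_mb=16):
    # Execution phase: CPU-bound work. A thread-blocking view would
    # attribute this time to the "Execution" category.
    total = sum(i * i for i in range(500_000))

    # I/O phase at the tail end: one large synchronous write. The thread
    # blocks until the OS completes the request, which a profiler would
    # show as an I/O stall.
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(b"\0" * (size_mb * 1024 * 1024))
        f.flush()
        os.fsync(f.fileno())  # force the data to disk, not just the file cache
    io_seconds = time.perf_counter() - start
    return total, io_seconds
```

Profiled under the tool, you would expect this thread's channel to be dominated by execution, with I/O blocking segments clustered at the end.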
The screenshot above shows what happens when you zoom in on that I/O-heavy phase and click on one of the I/O blocking segments. Notice that the "Selection" tab now shows details about this blocking event; specifically (were it not for a bug in Beta 1), it would show something like the following text:
Category = I/O
API = WriteFile
Delay = 70.3215 ms
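You can approximate what the Selection tab reports by timing an individual write call yourself. Here is a hypothetical Python sketch (the field names merely mirror the tab's text; none of this is the tool's actual API):

```python
import time

def timed_write(f, data, api_name="WriteFile"):
    # Time one synchronous write and package the result in the same
    # shape as the Selection tab's report (Category / API / Delay).
    start = time.perf_counter()
    f.write(data)
    f.flush()
    delay_ms = (time.perf_counter() - start) * 1000.0
    return {"Category": "I/O", "API": api_name, "Delay_ms": round(delay_ms, 4)}
```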
Now, what we'd like to know is which file was being written to at this time. To find out, move the thread channel near the Disk Write channels and click on the write segment (if available) that most closely corresponds to the I/O blocking event. I've zoomed in on the blocking event I selected above, and you can see what happens when I click on the Disk 0 Write channel segment right above it. The Selection tab now shows the number of physical disk writes occurring at that moment, along with a list of the filenames involved. In this case, the file is "MatMult2.out".

At this point, you are most likely interested in the aggregate delay incurred writing to this file and where in your application these delays manifest themselves. There are two tools for that. First, when you click on the "Disk I/O Stats" tab, you will get a summary of the files being read or written in the current view, including the number of accesses, the access type, and the number of bytes transferred, in addition to the I/O latency. Second, if you select the "Other Blocking Reasons" tab, you will get a summary of blocking delays, collated by call stack, so you can identify the number of blocking events attributable to the specific WriteFile call in the current view. Make sure to click the "UpdateStats" button in the toolbar so that the reports reflect the current view and thread selection. (This reporting capability will be improved in the future to allow filtering on specific blocking categories.) From this report, you can also right-click on a call stack frame (e.g., WriteFile) to view either the function body or the call sites of that function. The next screenshot shows how, in this case, the source file containing the call to WriteFile opens with the cursor in the general vicinity of the call.
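The per-file aggregation that the Disk I/O Stats tab performs can be sketched as follows (a toy Python aggregator, not the tool's implementation): accumulate the write count, byte count, and cumulative latency per filename, then sort by aggregate latency to see which files, like "MatMult2.out" here, are worth optimizing first.

```python
from collections import defaultdict

class DiskIoStats:
    """Toy per-file I/O aggregator: write count, bytes, cumulative latency."""

    def __init__(self):
        self._stats = defaultdict(
            lambda: {"writes": 0, "bytes": 0, "latency_ms": 0.0})

    def record_write(self, filename, nbytes, latency_ms):
        # One call per observed write event (as a tracing tool would collect).
        s = self._stats[filename]
        s["writes"] += 1
        s["bytes"] += nbytes
        s["latency_ms"] += latency_ms

    def summary(self):
        # One row per file, worst aggregate latency first.
        return sorted(self._stats.items(),
                      key=lambda kv: kv[1]["latency_ms"], reverse=True)
```

Grouping the same events by call stack instead of by filename gives you the "Other Blocking Reasons" style of report, which points back at the specific WriteFile call site.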
A couple of quick notes are in order about I/O and interactions with the operating system in our tool:
Now go play with our tool!