Welcome to MSDN Blogs Sign in | Join | Help

SQLIOSim - Outstanding I/O Warnings

I have encountered some questions pertaining to the SQLIOSim outstanding IO warnings and I am posting some details.

 

The Display monitor thread is responsible for updating the display (progress outputs) and checking for long IOs  (CheckForLongIO/ReportLongIo).   Each time it wakes (from 1 to 10 second range depending on state) it fires the check.

 

Each logical CPU (scheduler) in the simulator has an IO list.   The CheckForLongIO loops over each logical CPU and checks the pending IO list of each logical scheduler.

 

When the IO is posted (async) we grab the current time and add the IO to the logical scheduler specific list.  When the check is performed each IO in the list is checked with HasOverlappedIoCompleted.   This is a WIN32 macro that checks the status stored in the OVERLAPPED structure of the IO to see if the system has reported the IO as complete.  When FALSE is returned and the duration of the IO is > 15 seconds the condition is flagged as a ‘stalled’ IO as the OS and sub-system have not completed the specific IO.

 

The number of these outstanding IOs is accumulated and reported after all logical CPUs have been checked.

 

The following output shows 2 logical CPUs being used by the simulator.

 

03/10/07

15:35:58

3264

System

CPUCount = 2, Affinity = 0, IOAffinity = 0, MaxMemory = 630 MB, StopOnError = Yes, TestCycles = 1, TestCycleDuration = 300 s, CacheHitRatio = 100.00%

 

At 15:36:46 you have 5 IO requests that exceeded 15 seconds.  One second later another 72 requests exceeded the 15 second mark.  (The same IO is not reported again, once flagged as long it won’t be counted on secondary passes.)

 

What this does not show is how many IOs have completed.  SQLIOSim could have completed 1000s of IOs during this 15 second window but some of the requests are being stalled. 

 

03/10/07

15:36:45

3764

Display Monitor

Error: 0x00000000
Error Text:
Description: 5 IO requests are outstanding for more than 15 sec.

03/10/07

15:36:46

3764

Display Monitor

Error: 0x00000000
Error Text:
Description: 72 IO requests are outstanding for more than 15 sec.

 

The files are open with NO buffering so they should not involve system cache components.  You should review filter drivers and the like.  Likely something like elevator seek or something of this nature is at play when you see the outstanding IO warnings.

 

 

03/10/07

15:35:58

3264

System

NoBuffering = Yes, WriteThrough = Yes, UseScatterGather = Yes, ForceReadAhead = No, MaxOutstandingIO = 0, TargetIODuration = 100 ms

LOGICALCONFIG::PrintConfigurationToLog

e:\yukon\sosbranch\sql\ntdbms\storeng\util\sqliosim\logicalconfig.cpp

228

 

03/10/07

15:35:58

3264

System

AllowIOBursts = Yes, DeleteFilesAtStartup = Yes, DeleteFilesAtShutdown = No, StampFiles = No

     

 

The following error text is reported for each, individual file no more than every 30 seconds, indicating the number of total IOs identified as taking longer than 15 seconds.

 

The following indicates that 826 IO requests for the file sqliosim.mdx exceeded 15 seconds to complete in the last 30 second processing window.   The last IO that is currently STATUS_PENDING (256) is at the specified offset on the specified handle.

 

 

03/10/07

15:37:16

3764

Display Monitor

Error: 0x00000000
Error Text:
Description: 826 long IO requests are outstanding for file h:\sqliosim.mdx. Handle 0x2bc. Offset of latest long IO 0x3cf80000, duration 16000 ms

 

Support normally has to use the kernel debugger and !irpfind to locate the IRPs associated with the threads in SQLIOSim and see their IO stack location.

 

The way the output is sequenced can be puzzling until you understand the sequence.   You will see the series of CheckForLongIO (# IO requests are outstanding for more than 15 sec.) and then the ReportLongIO message (# long IO requests are outstanding for file …).  

 

The CheckForLongIO calls ReportLongIO but ReportLongIO has the 30 second reporting throttle.  When the 30 seconds is exceeded the message is logged.  This means that the next CheckForLongIO message (shown below with 887 IO requests) is included in the count of the ReportLongIO as the CheckForLongIO adds up all stalled IOs before reporting the message.

 

 

03/10/07

15:37:26

3764

Display Monitor

Error: 0x00000000
Error Text:
Description: 29 IO requests are outstanding for more than 15 sec.

CLogicalCPUs::CheckForLongIO

e:\yukon\sosbranch\sql\ntdbms\storeng\util\sqliosim\logicalcpu.cpp

1230

 

03/10/07

15:37:27

3764

Display Monitor

Error: 0x00000000
Error Text:
Description: 3 IO requests are outstanding for more than 15 sec.

CLogicalCPUs::CheckForLongIO

e:\yukon\sosbranch\sql\ntdbms\storeng\util\sqliosim\logicalcpu.cpp

1230

 

03/10/07

15:37:59

3764

Display Monitor

Error: 0x00000000
Error Text:
Description: 1138 long IO requests are outstanding for file h:\sqliosim.mdx. Handle 0x2bc. Offset of latest long IO 0x85640000, duration 15375 ms

CLogicalFile::ReportLongIO

e:\yukon\sosbranch\sql\ntdbms\storeng\util\sqliosim\fileio.cpp

811

 

03/10/07

15:37:59

3764

Display Monitor

Error: 0x00000000
Error Text:
Description: 887 IO requests are outstanding for more than 15 sec.

     

 

JAN 2008 UPDATE - SQLIOSim Stalled I/O Warnings During Checkpoint Phase of Test

Allow bursts is used to disable I/O target throttling.   This is in direct response to a stress scenario that has uncovered drivers that can blue screen when under large I/O depths and memory pressure.   

 

DEFAULT: TRUE   (Current decision was that SQLIOSim was designed to be a data consistency checker not performance test so it should allow burst activities.)

 

The current design is as follows.

 

·         During initial file build and checkpoint, when TRUE, I/O throttling is disabled.

·         During the actual test phase I/O throttling is always enabled, regardless of the allow burst setting.

·         During checkpoint phase of the test, when TRUE, I/O throttling is disabled.

·         Before the audit phase I/O throttling is enabled.

 

The design is to change the target I/O time for the entire set of logical files.   The normal behavior is to use the configuration setting for the I/O target response time.   Basically, we use it to determine how many OVERLAPPED structures are allowed to be used.  When we are not meeting the target response we throttle back the number of I/O requests that can be used.  When we are meeting the response rate we add I/O requests so we can post more I/O.

 

When Allow Bursts disables throttling is sets the target to MAX_DWORD.   When throttling is enabled again it returns to the configured target.

 

This allows the checkpoint phase to do what the older SQLIOStress did and post 1000s of I/Os in rapid succession to push the I/O depth.  This can result in the I/O warnings.  What you are looking for is that machine to cleanly recover from the situation and continue testing.

 

The one side effect this behavior may have that affects common testing scenarios, during a cycle is that when disabled the number of I/O request structures could grow large.   When throttling is re-enabled it may take a bit of time of the normal activities to come back into compliance with the target.  

 

If I was able to get a depth of 1000 during the (allow bursts checkpoint activity) these request structures are present when I start the standard cycle activities.   The target has returned to the configured value but nothing is preventing the initial I/O activity from getting a request.   It won’t be until the first set of I/Os are completed during a cycle that the number of request structures is trimmed.   So at the beginning of a cycle you might get more I/O depth than the end of the cycle.

 

I think the way the code it written you won’t see this very often.   The normal cycle does a checkpoint and then an audit.  The audit is limited by the configured I/O target so it should bring the number of requests back inline before the cycle starts all other user simulations.

 

Bob Dorr
SQL Server Senior Escalation Engineer

Published Tuesday, April 10, 2007 4:04 PM by psssql
Filed under: ,

Comments

# Understanding SQLIOSIM Output

Thursday, June 28, 2007 12:36 PM by Kevin Kline

One of the things that’s great about my job is the fact that I get to meet people who are a lot smarter

# re: SQLIOSim - Outstanding I/O Warnings

Thursday, July 19, 2007 3:40 AM by abhigupta22

I had run the test for HP MSA drive wiht one local drive. But for MSA drvie it is printing following errors:-

19/07 12:43:56 11688 CreateFileStream Error: 0x80070032

Error Text: The request is not supported.

Description: Unable to get disk cache info for Z:\

# re: SQLIOSim - Outstanding I/O Warnings

Monday, October 08, 2007 10:10 AM by psssql

The utility attempts to check the caching state.   When it can't it logs an error.  This should be a warning and will be updated in future versions.

[RDORR Jan 24, 2008] This is correct and is already updated in the SQL 2008 CTP builds.

Anonymous comments are disabled
 
Page view tracker