I ran into an issue yesterday that brought up questions as to how the checkpoint's max outstanding, I/O target is determined. There is different behavior across the various versions of SQL Server and it would take an entire white paper to fully explain the checkpoint process. I am going to try to boil it down to the most relevant information.
I previously outlined some of the checkpoint behavior in the following white papers.
Checkpoint sweeps the hashed data buffers. When it finds a dirty buffer it calls the routine (WriteMultiple). WriteMultiple does hash lookups for adjacent, dirty pages. The outstanding page count is incremented when a page write request is made. When the write completes the value is decrement.
The I/Os are written asynchronously. Checkpoint limits the number of I/Os it can post by comparing the number of outstanding requests with the target. When the target is reached checkpoint waits. Once an I/O completes Checkpoint can post another I/O.
Some high level pseudo-code helps.
bufferPos = 0;
while(bufferPos < maxBuffers)
if(buf[bufferPos] is dirty)
pagesInMotion = WriteMultiple(buf[bufferPos]);
while (totalCheckpointPagesOutstanding >= maxOutstandingPages)
Check success of WriteOperation
Max Outstanding I/O Target
|Version ||Max Outstanding I/Os for Checkpoint |
|SQL Server 2000 SP3 ||100 |
|SQL Server 2000 SP4 ||64 * Logical Schedulers |
|SQL Server 2005 ||*Formula |
As I stated there are other parts to FlushCache. For example SQL Server 2000 SP3 and SP4 attempt to maintain a rate that meets the configured recovery interval. This may include additional sleeps.
Generally speaking the loop will continue without additional waits. This means that the I/O load during SQL Server 2000 checkpoints is held steady. As soon as one of the I/Os complete Checkpoint can issue a new request.
See: KB 906121: Checkpoint resumes the behavior that it exhibited before you installed SQL Server 2000 SP3 when you enable trace flag 828 on a computer that is running SQL Server 2000 SP4
The reason I specifically point this out is that I have seen systems where some of the I/Os become stalled. Lets say the first 50 I/Os are stalled and the last 50 complete. Checkpoint puts out 50 more I/O requests that can again complete. This can make the performance counters for average disk sec/transfer and disk queue length more challenging to interpret.
*Formula (SQL 2005)
Checkpoint (FlushCache) was significantly upgraded for SQL Server 2005. It can honor a target (see manual checkpoint) and various other parameters can affect Checkpoint. I am going to describe one of these behaviors.
The starting point for max outstanding I/O level is loosely calculated as:
- Try to checkpoint entire SQL Server memory footprint in 300 seconds assuming 20ms I/O target. Divide total SQL Server commit buffers to determine the number buffer rate.
- Divide outcome of by the number of logical schedulers
- If outcome of is less than 20 default to 20
- Multiply outcome by 10
Note: Portions of this calculation are used to determine max outstanding lazy writes.
Each time SQL Server 2005 checkpoint executes the calculated target establishes the max outstanding I/O level. Depending on the type of checkpoint this value is adjusted as the I/Os complete.
SQL Server 2005 checkpoint adds duration tracking to the I/O requests. As each I/O completes the duration is used to adjust a running I/O average. Here is some pseudo-code showing a very high level running average calculation. A running average is used to accommodate the 'immediate' state of the I/O path so Checkpoint can react quickly to the current state and not the historical states.
Check success of WriteOperation
duration = Current Time - I/O Start Time
runningAvg = (duration + (runningAvg * 7)) / 8;
Using this running average, Checkpoint adjusts the number of allowed max outstanding I/Os with a goal to keep I/O response time below 20ms.
SQL Server Senior Escalation Engineer