NTDebugging Puzzler 0x00000007: Interlocked functions

NTDebugging Puzzler 0x00000007: Interlocked functions

  • Comments 19


Today, we will have some fun with interlocked functions.


The following section of code is reentrant.  A “well meaning” developer used interlocked functions to avoid serializing on a global table lock.


Initial smoke testing shows that the code works fine.  Sometimes things are not as they appear when doing initial code review.  After several hours of heavy stress testing, the developer finds the machine has bugchecked.  Analysis of the dump showed that the caller of this function had steamrolled through nonpaged pool writing stacks on top of everyone’s pool memory.


The goal of today’s puzzler is to find the bug and describe how it could be fixed with minimal timing impact.


Here are a few details before you begin.


1.       The variable gIndex is an unsigned long global.

2.       The gLogArray memory was allocated from nonpaged pool and the size of this allocation is correct.




00:   PLOG_ENTRY GetNextLogEntry ()


01:         ULONG IterationCount = MAX_RECORDS;


02:         PLOG_ENTRY pEntry;


03:         do


04:               if (InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS) == MAX_RECORDS)


05:                     pEntry = &gLogArray[0];


06:               else


07:                     pEntry = &gLogArray[InterlockedIncrement(&gIndex)];


08:               --IterationCount;


09:         } while(InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0 && (IterationCount > 0));


0a:         return (IterationCount > 0) ? pEntry : NULL;





Happy hunting,


Dennis Middleton “The NTFS Doctor”


  [Update: our answer. Posted 6/10/2008]


Thanks for all the great posts!  I saw many interesting answers, and a few unique solutions that I hadn’t considered.


Bug Description


A slight time gap exists between the InterlockedCompareExchange and the InterlockedIncrement.  For this reason, several threads could pass the check for MAX_RECORDS prior to doing the increment.


Assume that N is the number of threads that pass the check for MAX_RECORDS while gIndex is at a particular value.

If N or more threads are able to pass the check while gIndex is equal to MAX_RECORDS-(N-1), then gIndex would be incremented beyond MAX_RECORDS.


For example, let’s assume that 3 threads passed the check while gIndex was at MAX_RECORDS-2.  Then after the three increments occur, gIndex would be equal to MAX_RECORDS+1.  From that point, invalid pointers would be passed out to the caller.


Possible Solutions


There are several ways to solve this problem.  Some are more efficient than others.  I would avoid doing checks for MAX_RECORDS-1, MAX_RECORDS, or MAX_RECORDS+1 (interlocked or not) since there could potentially be more than two threads involved in the race condition.  Such a solution would only reduce the likelihood of an overflow.


There were a few posts suggesting a lock or critical section for the section of code between the compare and increment.  That would be one solution, but it would also do away with the benefits of using interlocked functions.


In keeping with the philosophy of keeping the code fast and simple, here’s a solution that gives a very good result with minimal impact.


1.       I removed the if () {} else {} and simply allowed gIndex to increment unchecked.  With this change, gIndex can approach its max unsigned long value and possibly wrap - we need to keep the array index in check.

2.       The modulus operator (added to line 4 below) will divide the incremented gIndex by MAX_RECORDS and use the remainder as the array index.  When dividing, the resultant remainder is always smaller than the divisor (MAX_RECORDS).  For this reason, the array index is guaranteed to be smaller than MAX_RECORDS.  As even multiples of MAX_RECORDS are reached, the array index resets back to zero mathemagically and no interlocked compare is even necessary.



00:   PLOG_ENTRY GetNextLogEntry ()


01:         ULONG IterationCount = MAX_RECORDS;


02:         PLOG_ENTRY pEntry;


03:         do



04:               pEntry = &gLogArray[InterlockedIncrement(&gIndex) % MAX_RECORDS];


05:               --IterationCount;


06:         } while(InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0 && (IterationCount > 0));


07:         return (IterationCount > 0) ? pEntry : NULL;




With the fix in place, the code is smaller, faster, easier to read, and most of all - the bug is gone.  When developing code, always try to think small.             


Best Regards,

NTFS Doctor




Leave a Comment
  • Please add 8 and 6 and type the answer here:
  • Post
  • Seems to be a simple race condition between the InterlockedCompareExchange at line 0x04 and the InterlockedIncrement at line 0x07.  At MAX_RECORDS-1, two threads could both pass the InterlockedCompareExchange and then both call InterlockedIncrement pushing gIndex up to MAX_RECORDS+1 and killing memory from then on.

  • Scenario: gIndex == MAX_RECORDS-1. Thread 1 passes through line 4. Context switch. Thread 2 passes through line 4. Thread 2 passes through line 7. gIndex == MAX_RECORDS. Context switch. Thread 1 passes through line 7. gIndex == MAX_RECORDS + 1. Threads will now start scribbling past the end.

    The obvious fix would be to use a second loop to set gIndex to the correct value:

    ULONG index;

    do {

      index = gIndex;

    } while (InterlockedCompareExchange(&gIndex, (index + 1) % MAX_RECORDS, index) != index);

    pEntry = &gLogArray[index];

    This is probably not what you have in mind when you want a "minimal timing impact" fix, though. So I'll leave it to people with more experience in writing error-prone code like this.

  • Issue: gIndex might pass MAX_RECORDS.

    Lets assume gIndex = (MAX_RECORDS-1).

    its possible for two (or more) threads to execute InterlockedIncrement(&gIndex) on line 07.

    Possible fix:




    } while( (pEntry >= &gLogArray[MAX_RECORDS] || InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0) && (IterationCount > 0));

  • Oops :-( my "fix" got things worse.

    I forgot to zero out gIndex.

  • The problem is between line 4-7. gIndex boundary checking and incrementing are not protected.

    Illustrated with two threads:

    T1: line 4, gIndex=MAX_RECORD-1

    T2: line 4, gIndex=MAX_RECORD-1

    T1: line 6,7, gIndex=MAX_RECORD

    T2: line 6,7, gIndex=MAX_RECORD+1

    I would fix it to:

    pEntry = &gLogArray[InterlockedIncrement(&gIndex)];

    if (InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS+1) == MAX_RECORDS+1)

       pEntry = &gLogArray[0];

  • My second try...

    Keep old_value from line 4 call to InterlockedCompareExchange.

    Replace line 7 InterlockedIncrement with:

    if (InterlockedCompareExchange(&gIndex,old_value+1, old_value) == old_value)

       pEntry = &gLogArray[old_value+1];


       goto line_04; //some other thread messed up...lets try again

  • i think this is related to index point out the allocated gLogArray boundary, cause when gIndex==(MAX_RECORDS-1) and execute to Line07, it will increaase to MAX_RECORDS and continue to return the &gIndex[MAX_RECORDS] which is unallocated;

    setting Line04 to check with (MAX_RECORDS) should solve the problem

  • Suppose gIndex = MAX_RECORD -1.

    Two consumers come:

    -Both compare exchange, no go since no wraparound.

    -Both increment. Boom, one of them is given an out of boundary memory location.

    To watch for this race, you must not unconditionally increment the counter. Instead,

    do the following:

    localVar = compareExchange(...).

    if (localVar != MAX_RECORDS){

     localVar1 = compareExchange(&gIndex,localVar,localVar + 1);

     If (localVar1 != localVar + 1){





    against the return value of the original compareExchange and try to put

  • Hi,

     when gIndex is MAX_RECORDS-1, between line 4-7 another processes could increment gIndex. Then gIndex could cross MAX_RECORDS value, and at line 9 InterlockedCompareExchange could rewrite memory IterationCount times. Line 4 alone is ineffective for guarding array range.

     How to fix? FastMutex for lines 4-7?

    That's idea.

    Slavo Tomascik

  • On the while condition, if the first condion:

    InterlockedCompareExchange(&pEntry->Active, 1, 0) results on true (meaning that pEnty active is 1 and owned by the thread), we have a final condition on the return:

    return (IterationCount > 0) ? pEntry : NULL;

    but if iterationCount is zero (and pEntry was owned by the thread) the function will return NULL, but the log entry will be owned by the thread and lost.

    To fix this we can swap the while condition:

    while((IterationCount > 0)&& (InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0));

    With the Swap, the thread that reaches 0 on iteration count could miss a log entry, but at least will not leak it.

  • The IterationCount starts from MAX_RECORDS and continues up until is 0; that means that there are at most MAX_RECORDS loops. This leads to the conclusion that the array size is MAX_RECORDS and not MAX_RECORDS+1.

    When gIndex is MAX_RECORDS-1, the  pEntry pointer takes the address of gLogArray[MAX_RECORDS] which means that it passes the memory boundary.

    The fix is to compare the gIndex with MAX_RECORDS-1 as in:

    if (InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS-1) == MAX_RECORDS-1)

  • Two bugs:

     1- it uses gLogArray[MAX_RECORDS]

     2- it returns NULL even if it founds a valid log array record.

    it should be like below:

    PLOG_ENTRY GetNextLogEntry ()


         ASSERT (MAX_RECORDS > 0);

         ULONG IterationCount = MAX_RECORDS;

         PLOG_ENTRY pEntry;      



               if (InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS - 1) == MAX_RECORDS - 1)

                     pEntry = &gLogArray[0];


                     pEntry = &gLogArray[InterlockedIncrement(&gIndex)];

         } while(InterlockedCompareExchange(&pEntry->Active, 1, 0) != 0 && (--IterationCount > 0));

         return (IterationCount > 0) ? pEntry : NULL;


  • Just a quick stab at it. But it seems to be a variation of the double check problem. You check something and then update it outside the locked path.

    Above you check if you have reached the end of the circular buffer. Let's say 3 cores do that simultanously when gIndex is MAX_RECORDS-1, then those 3 cores will increment gIndex passed MAX_RECORDS and then next time around the loop will go haywire on you update whatever memory it can find when setting the Active flag for IterationCount location passed this circular buffer.

    So two problems, double check and ill controlled invariant (>MAX_RECORDS is a safer bet, still completely wrong, but..)

  • Didn't see that we were supposed to give a slight modification that could solve the problem.

    One way might be to do the increment before the check, a bit ugly (all 3 cores will contend for index 0) but safe (you are guarding the buffer usage, not the actual index), at least I think it is safe. I usually don't play with these kind of things as it is very easy to get it wrong. Just grab the spinlock and create a critical path and be done with it (unless this is some really crucial kernel code path)

    Also I just realized I was talking about cores and not threads. Are the interlocked barriers? I think so, so it should be safe.

  • #1 Not multi-thread safe, This line can be paused before execution, queing it up

    thread one pauses right before pEntry = &gLogArray[InterlockedIncrement(&gIndex)]; thread two executes InterlockedCompareExchange(&gIndex, 0, MAX_RECORDS) == MAX_RECORDS)

    #2 IterationCount's usage isn't thread safe --

    #2 is really bad though, because it would typically assemble int a

    mov eax,[value]

    sub eax,1

    mov [value],eax

    instead of

    sub DWORD PTR [value],1 which means that thread switching is three times more likely to screw with the value, need to use interlock to decrement interationcount (among other things)

Page 1 of 2 (19 items) 12