Spot the Defect!

Spot the Defect!

  • Comments 17

At Microsoft we have an internal email list called "Spot the Defect" -- people mail around buggy code they've discovered and we compete to see who can find the most problems with it.  It's fun, and you learn a lot about what other people consider bugs -- everything from security holes to lying comments!

 

I love playing Spot the Defect. Here is the code for the WScript.Sleep method with the comments removed and some serious bugs added.  You'll note that this code has all the required features I mentioned in my previous post.  We go to sleep in one-second (or less) intervals, and tell the operating system to wake us up if COM posts a message to the message queue, because there might be an event handler to dispatch.  We also check to see if the host recorded a script error (either due to an event handler or due to the script timeout firing) so that we can abort the sleep.  This way we never keep the script alive more than a second after it was shut down due to error.

 

What bug did I add?

 

HRESULT CWScript::Sleep(long Time)

{

    const DWORD TimerGranularity = 1000;

    if (Time < 0)

        return E_INVALIDARG;

    DWORD StartTickCount = ::GetTickCount();

    DWORD EndTickCount = StartTickCount + Time;

    DWORD CurTickCount = StartTickCount;

    while(CurTickCount < EndTickCount)

    {

        MSG msg;

        DWORD CurWaitTime = (DWORD)(EndTickCount - CurTickCount);

        if (CurWaitTime > TimerGranularity)

            CurWaitTime = TimerGranularity;

        ::MsgWaitForMultipleObjects(0, NULL, TRUE, CurWaitTime, QS_ALLINPUT | QS_ALLPOSTMESSAGE);

        if (0 != ::PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))

            ::DispatchMessage(&msg);

        if (m_pHost->FErrorPending())

            return S_OK;

        CurTickCount = ::GetTickCount();

    }

    return S_OK;

}

 

  • I see two things that look odd: 1. Is it legal to call MsgWaitForMultipleObjects with zero handles? 2. After MsgWaitForMultipleObjects returns, you should pump all waiting messages, not just one.
  • 1) Yes, that's legal. 2) Good catch -- that bug actually exists in the sources right now, I didn't add it! I'll send that one to the sustaining engineering team. However, there is a more serious bug that I've introduced deliberately.
  • 1) DWORD EndTickCount = StartTickCount + Time; This feels like it's overflow-prone. 2) If MsgWaitForMultipleObjects does indeed accept 0, NULL for the handles arguments, how does it handle bWaitAll == TRUE for such a case? I don't have a compiler/debugger handy :)
  • 1) You're on the right track. What specifically can go wrong here? 2) I must confess that I don't know. Try it and see!
  • *grin* Eric, can I get a T-shirt for finding that bug ;) The overflow bug would appear in a situation such as 1. The machine has been on for a long time and GetTickCount() is nearing 0xFFFFFFFF (which happens every ~49.7 days) 2. The caller passes in a Time value which, when added to the current tick count, overflows a DWORD 3. CurTickCount will then be greater than EndTickCount 4. The while loop exits immediately, so the code never sleeps
  • Worse, though, would be if both CurTickCount and EndTickCount are just before 2^32. By the time you set CurTickCount to ::GetTickCount again, the current tick count may have overflowed and ended up a very small number, which would cause the function to sleep almost 50 days!
  • Sorry, no T-Shirt, but good effort. That's one possible overflow bug. There is a FAR worse one though. Your overflow bug causes it to never wait. There's also one that causes it to wait forever.
  • The worst case possibility for the above scenario is that EndTickCount could be calculated as 0xFFFFFFFF while CurTickCount could be set to 0 at the bottom of the loop. The conditional is still true, so the while loop executes again, but EndTickCount - CurTickCount is now 0xFFFFFFFF, a.k.a. INFINITE, so MsgWaitForMultipleObjects might wait forever.
  • Ed and I apparently posted at the same time. You got it Ed. Actually, it will sleep AT LEAST 50 days, but is likely to sleep longer. The "window" in which curTickCount > endTickCount can be very small and if you miss it, another one doesn't come around for 50 days. In the actual code we keep track of everything in doubles, not DWORDS, detect clock overflows, and adjust the current tick count variable accordingly.
  • Or if EndTickCount happened to be 0xFFFFFFFF, that would wait (almost) forever. The loop would end only if CurTickCount were, by chance, re-set when GetTickCount was 0xFFFFFFFF.
  • Bradley, the adjustment down to the one second granularity prevents INFINITE from ever being passed to WFMO, but in practice this will cause an infinite wait as the odds of hitting CurTickCount = 0xffffffff exactly at the bottom of the loop are around 1/1000, and you get one chance every 50 days. So we can expect that in this (unlikely but possible) scenario, we'd wait on average 25000 days!
  • We just use the subtraction trick to avoid overflows, i.e., change (CurTickCount < EndTickCount) to (CurTickCount - StartTickCount < Time), as documented in MSDN, but I suppose if you want to sleep for 50+ days, you'd need to detect clock overflows.
  • Yes, of course. Somehow I missed the use of CurWaitTime; that'll teach me to speed-read the code in order to get 1ST P05T!!1!!11. :-)
  • 2Kim Gräsman: Quoting the MSDN, April 2003, MsgWaitForMultipleObjects: === bWaitAll [in] If this parameter is TRUE, the function returns when the states of all objects in the pHandles array have been set to signaled and an input event has been received. If this parameter is FALSE, the function returns when the state of any one of the objects is set to signaled or an input event has been received. In this case, the return value indicates the object whose state caused the function to return. === Since the argument is TRUE, the first rule applies. Since the set of said objects is empty, every predicate about its elements is true, including "X has been set to signaled". So one might expect that the function should return immediately. However, that’s wrong, because the rule also says "and an input event has been received". Thus, although the first subcondition "(forall h in pHandles) (h is set to signaled)" is always true, the second "an input event has been received" is false until a message comes. Therefore, this part of code is correct. Another question might be: Can you replace the bWaitAll value with FALSE, preserving the semantics? The answer is yes. The first subcondition would change to "(exists X in pHandles) (X is set to signaled)". Since pHandles is an empty set, the subcondition is always false. However, with bWaitAll the boolean operation between these subconditions changes to OR, and the function will still return when an input event is received. Not depending on the value of bWaitAll, the function will also return if the timeout elapses.
  • Centaur, Yes, it makes sense, if you regard the 0, NULL pair as an empty set. Admittedly, that would probably be the only way to see it if 0, NULL is indeed valid input, and this is not mentioned in the MSDN article. Your other question is what threw me, as well. If the handle set is empty, the second parameter shouldn't really matter, which isn't documented either. So, looking for a bug, I thought I'd raise it :) Thanks for the input.
Page 1 of 2 (17 items) 12