• The Old New Thing

    Bonus chatter about that virus that is responsible for the top six Explorer crashes


    Last year, I wrote about a virus that is responsible for the top six Explorer crashes, by a wide margin.

    I learned later how the authors of this XYZ virus operate, and it happens to answer a question posted by commenter SteveL as to why these virus writers are so incompetent that their virus crashes so much.

    First, the virus authors infect your computer and crash your system every so often on purpose. Meanwhile, they also set up a legitimate-looking Web site which sells anti-virus software that claims to remove this virus. You send them your money, they send you the software.

    The kicker is that the removal software doesn't work. Your computer is still infected with the XYZ virus. But they don't care. They already got your money.


    The format rectangle is recalculated whenever the window resizes, and that's a good thing


    Reader asmguru62 asks why a custom format rectangle of a multi-line edit control is lost when the edit control is resized. That's right, the format rectangle is recalculated whenever the window resizes, and that's a good thing.

    Imagine if it weren't recalculated. You create a multi-line edit control. Like many programs, you might create it at a temporary size while you try to figure out what its real size should be. You might even resize the control dynamically as the container is resized. (After all, everybody wants resizable dialogs.) What you definitely don't want is the format rectangle of the edit control to be locked to the size the window was originally created at.

    "Well, yeah," asmguru62 says, "sure that may be desirable if the application is using the default margins, but if the margins have been customized, then shouldn't those margins be preserved?"

    First, who says that your call to EM_SETRECT was meant to establish fixed-size margins? All the edit control knows is that you said "Put the text inside this rectangle." It has no idea how you computed that rectangle. Did you subtract a fixed number of pixels? Did you use a percentage? Did you set the rectangle to force the text into a rectangle whose width and height form the golden mean?

    Second, if you want to set the margins, then set the margins. The EM_SETMARGINS message lets you specify the size of the left and right margins of a multi-line edit control. The edit control will take your margins into account when it recalculates its format rectangle after a resize.

    In other words, EM_SETRECT is the low-level function that lets you manipulate the edit control's internal formatting rectangle, the same rectangle that the edit control itself manipulates in response to the things that edit controls naturally respond to. There is no fancy inference engine here that says, "Let me attempt to reverse-engineer how this rectangle is related to the client rectangle so I can carry forward this computation when I recalculate the format rectangle." Think of it as reaching in and directly whacking one of the edit control's private member variables.

    If you want something higher-level, then use EM_SETMARGINS.
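
    To make the distinction concrete, here is a toy model in portable C. It assumes nothing about how the real edit control is implemented; the Rect and EditModel types and the edit_* functions are hypothetical. The model remembers margins and rederives the format rectangle from the client rectangle on every resize, while the EM_SETRECT analogue simply overwrites the derived value, which the next resize then throws away.

```c
#include <assert.h>

/* Hypothetical stand-in for the Win32 RECT structure. */
typedef struct { int left, top, right, bottom; } Rect;

/* Toy model of an edit control's formatting state (not the real implementation). */
typedef struct {
    Rect client;        /* current client rectangle */
    int  left_margin;   /* stored margins, analogous to EM_SETMARGINS */
    int  right_margin;
    Rect format;        /* the rectangle the text is laid out in */
} EditModel;

/* Derive the format rectangle from the client rectangle and the stored margins. */
static void edit_recalc_format(EditModel *e)
{
    e->format = e->client;
    e->format.left  += e->left_margin;
    e->format.right -= e->right_margin;
}

/* Analogous to a resize: the format rectangle is always recomputed. */
static void edit_resize(EditModel *e, int width, int height)
{
    Rect r = { 0, 0, width, height };
    e->client = r;
    edit_recalc_format(e);
}

/* Analogous to EM_SETMARGINS: the margins are remembered, so they survive resizes. */
static void edit_set_margins(EditModel *e, int left, int right)
{
    e->left_margin = left;
    e->right_margin = right;
    edit_recalc_format(e);
}

/* Analogous to EM_SETRECT: overwrite the format rectangle directly. Nothing
   records how the caller computed it, so the next resize discards it. */
static void edit_set_rect(EditModel *e, Rect r)
{
    e->format = r;
}
```

    In this model, calling edit_set_rect and then resizing puts the format rectangle back at the full client width, whereas calling edit_set_margins and then resizing keeps the margins applied. The real control has more inputs than this sketch models.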


    I didn't realize that it was International Group B Strep Awareness Month


    I guess they're not doing a particularly good job of creating awareness, because it wasn't until I consulted the 2006 National Health Observances calendar that I learned that July is International Group B Strep Awareness Month.

    For some reason, July and August are pretty light on the health observances calendar. Maybe because people are on summer vacation.

    Who decides whether a particular health observance merits a day, a week, or a month? You'd think they'd save the months for the really big issues, seeing as there are only twelve of them. And for some reason, ultraviolet radiation gets two months. May is Ultraviolet Awareness Month, and July is UV Safety Month. That's one sixth of the year dedicated to ultraviolet radiation.

    (I'm not belittling the health causes themselves, just the way they get assigned days on the calendar.)


    When a program is launched via DDE, and there are multiple servers, which one is used?


    Although you are more than welcome to stop using DDE (please oh please stop using it), there are still many applications that use it (*cough*), and as a result, customers still have questions about it.

    One customer wanted to know why, if multiple copies of a DDE-based application are running, Windows 7 will send the command to the earliest-launched copy of the program, whereas Windows XP will send the command to the most-recently-used copy. "Our employees were used to forcing the document to open in a specific window by switching to that window, thereby making it most-recently-used, then switching back to Explorer and double-clicking the document, and expecting it to open in that window. And that usually (but not always) worked. In Windows 7, it rarely works. Is there an explanation for this change in behavior other than 'internal process and window handling stuff'?"

    It's internal process and window handling stuff.

    If multiple DDE servers are available to handle a command, the order in which they are tried is unspecified.

    It so happens that Windows XP uses Send­Message­Timeout with the HWND_BROADCAST window target, and it so happens that Send­Message­Timeout on Windows XP sends the messages in top-to-bottom order by z-order. If all windows are responding to messages, then it means that the window closest to the top of the z-order will get the first chance to respond. If there are unresponsive windows, then things get more complicated, and as the customer noted, the results become somewhat unpredictable. (Also, if there are unresponsive windows, your machine froze for 30 seconds.)

    In Windows Vista an optimization was added: Instead of just diving in and broadcasting DDE requests to everybody in search of a server, Explorer remembers the last window that responded and goes straight to that window first, on the theory that if it knew how to handle the Open command last time, it most likely will know how to handle it this time. And it can do this even if other applications are not responding to messages.

    If the optimization fails, then Explorer goes back to the slow broadcast method. But most of the time, the optimization works, and the document gets launched much faster.
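
    The lookup order described above can be sketched as a toy model in portable C. The Server type, the find_server function, and the representation of windows as an array sorted by z-order are all hypothetical; the real Explorer code is not public, and the order in which multiple servers are tried remains unspecified. Looking the remembered window up by id stands in for holding on to its window handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a window that might act as a DDE server. */
typedef struct {
    int  id;            /* stands in for the window handle (ids are >= 0) */
    bool handles_open;  /* would it accept the Open command? */
    bool responsive;    /* is it processing messages? */
} Server;

/* 'servers' is ordered top to bottom by z-order. '*last_id' remembers the
   window that answered last time (-1 if none). Returns the id of the window
   that gets the command, or -1 if nobody takes it. */
static int find_server(const Server *servers, size_t count, int *last_id)
{
    /* Fast path: go straight to the window that answered last time,
       wherever it now sits in the z-order. */
    for (size_t i = 0; i < count; i++) {
        if (servers[i].id == *last_id &&
            servers[i].responsive && servers[i].handles_open) {
            return servers[i].id;
        }
    }

    /* Slow path: broadcast, asking windows in z-order and skipping
       the ones that are not responding. */
    for (size_t i = 0; i < count; i++) {
        if (servers[i].responsive && servers[i].handles_open) {
            *last_id = servers[i].id;
            return servers[i].id;
        }
    }
    return -1;
}
```

    In this model, once one copy of the program has answered, it keeps receiving the command even after the user activates a different copy, which matches what the customer observed on Windows 7. Only when the remembered window stops cooperating does the model fall back to the z-order broadcast.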


    Visual Studio 2005 gives you acquire and release semantics for free on volatile memory access


    If you are using Visual Studio 2005 or later, then you don't need the weird Interlocked­Read­Acquire function because Visual Studio 2005 and later automatically impose acquire semantics on reads from volatile locations. It also imposes release semantics on writes to volatile locations. In other words, you can replace the old Interlocked­Read­Acquire function with the following:

    #if _MSC_VER >= 1400
    LONG InterlockedReadAcquire(__in volatile LONG *pl)
    {
        return *pl; // Acquire imposed by volatility
    }
    #endif
    

    This is a good thing because it expresses your intentions more clearly to the compiler. The old method that overloaded Interlocked­Compare­Exchange­Acquire forced the compiler to perform the actual compare-and-exchange even though we really didn't care about the operation; we just wanted the side effect of the Acquire semantics. On some architectures, this forces the cache line dirty even if the comparison fails.
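
    If you would rather not depend on compiler-specific treatment of volatile, the same intent (a plain load with acquire semantics, no compare-and-exchange) can be expressed with C11 atomics. This is a swapped-in portable technique, not the Visual Studio mechanism described above; read_acquire and write_release are hypothetical names, and the sketch assumes a compiler that provides &lt;stdatomic.h&gt;.

```c
#include <assert.h>
#include <stdatomic.h>

/* Portable analogue of InterlockedReadAcquire: a load with acquire ordering.
   It performs no read-modify-write operation. */
static long read_acquire(_Atomic long *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}

/* The matching store with release ordering, for the writer side. */
static void write_release(_Atomic long *p, long value)
{
    atomic_store_explicit(p, value, memory_order_release);
}
```

    The usual pairing is that the writer fills in its data and then calls write_release on a flag; a reader that observes the flag through read_acquire is then guaranteed to see that data.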


    The PSN_SETACTIVE notification is sent each time your wizard page is activated


    A customer had received a number of crashes via Windows Error Reporting and believed that they had found a bug in the tree view common control.

    In our UI, we have a tree view with checkboxes. The tree view displays a fixed item at the top, followed by a variable number of dynamic items. When the user clicks Next, we look at the tree view to determine what the user selected. The code goes like this (pseudo):

    htiRoot = GetTreeRoot();
    
    // First process the fixed item
    htiFixed = GetChild(htiRoot);
    if (IsTreeViewItemChecked(htiFixed)) {
        ... add the fixed item ...
    }
    
    // Now process the dynamic items
    hti = GetNextSibling(htiFixed);
    while (hti != NULL) {
      if (IsTreeViewItemChecked(hti)) {
        ... add the dynamic item ...
      }
      hti = GetNextSibling(hti);
    }
    

    In the crashes we receive, other variables in the program indicate that there should be only one dynamic item, but our loop iterates multiple times. Furthermore, the first time through the loop, hti is not the handle to the first dynamic item but is in fact the handle to the fixed item. This naturally results in a crash when we try to treat the fixed item as if it were a dynamic item.

    Another thing we noticed is that at the time of the crash, all three variables htiRoot, htiFixed, and hti have the same value.

    Our attempts to reproduce the problem in-house have been unsuccessful. From our analysis, we believe that the tree view APIs used to obtain handles to children and sibling nodes are misbehaving.

    The customer included the crash bucket number, so we were able to connect to the same crash dumps that the customer was looking at.

    The first thing to dismiss was the remark that all three of the local variables had the same value. This is to be expected since they have non-overlapping lifetimes, and the compiler decided to alias them all to each other to save memory.

    ...
            lea     eax,[ebp+8]         ; htiRoot
            push    eax
            push    1                   ; some flag
            push    ebx                 ; some parameter
            call    00965fb9            ; GetTreeRoot
            mov     [ebp-2Ch],eax
            test    eax, eax
            jl      00971a49            ; failed - exit
    
            mov     edi, [_imp__SendMessageW]
            push    4                   ; TVGN_CHILD
            push    110Ah               ; TVM_GETNEXTITEM
            push    dword ptr [ebx+10h] ; window handle
            call    edi                 ; SendMessage
            mov     [ebp+8],eax         ; htiFixed
    
        ... eliding if (IsTreeViewItemChecked(...)) ...
            jmp     00971a1c            ; enter loop
    
    00971931:
        ... eliding if (IsTreeViewItemChecked(...)) ...
    
    00971a1c:
            push    dword ptr [ebp+8]   ; hti
            push    1                   ; TVGN_NEXT
            push    110Ah               ; TVM_GETNEXTITEM
            push    dword ptr [ebx+10h] ; window handle
            call    edi                 ; SendMessage
            mov     [ebp+8],eax         ; update hti
            test    eax, eax            ; hti == NULL?
            jne     00971931            ; N: continue loop
    

    I've removed code not directly relevant to the discussion. The point to see here is that the compiler combined all three variables into one physical memory location at [ebp+8] since there is no point in the program where more than one of the values is needed at a time. In other words, the compiler rewrote your code like this:

    hti = GetTreeRoot();
    
    hti = GetChild(hti);
    if (IsTreeViewItemChecked(hti)) {
        ... add the fixed item ...
    }
    
    while ((hti = GetNextSibling(hti)) != NULL) {
      if (IsTreeViewItemChecked(hti)) {
        ... add the dynamic item ...
      }
    }
    

    Not only did the compiler merge all your hti variables into one, it realized that once it did that, the two calls to Get­Next­Sibling could be folded together as well.

    Okay, one mystery solved. What about the others?

    From studying the crash dump, the shell team determined that the reason the first dynamic item appears to be the fixed item is that the tree view actually has two fixed items:

    003d06d8 Root
    + 003d0a38 "Configuration settings"
    + 003d0888 "Configuration settings"
    + 003d07b0 "Saved game from May 27, 2009 at 2:42 PM (playing as Thor)"
    + 003d0600 "Saved game from May 27, 2009 at 2:42 PM (playing as Thor)"
    

    "Configuration settings" is the fixed item, and the saved games are the dynamic items. (This isn't the actual scenario from the customer, but it gets the point across.) The customer was wrong to use the definite article when referring to the handle to the fixed item, since there are two fixed items here. In a sense, the customer's understanding that there is only one fixed item clouded their ability to debug the problem: When they saw another fixed item, they assumed not that they received another item that was fixed, but rather that they were getting the same fixed item twice.

    Seeing that the tree view was being populated twice directed the next step of the investigation: Why?

    The code that populates the tree view is called from the wizard page's PSN_SET­ACTIVE notification, and that one piece of information was the last piece of the puzzle.

    The PSN_SET­ACTIVE notification is sent each time the wizard or property sheet page is selected as the current page. If the page is activated twice, then you will get two PSN_SET­ACTIVE notifications. The solution was to populate the tree view only the first time the page was activated.
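
    The fix can be sketched without the Win32 plumbing as a toy model in portable C. The WizardPage type, the populate_tree helper, and the on_set_active handler are hypothetical stand-ins for the customer's code; in a real wizard the handler would run in response to PSN_SETACTIVE in the page's dialog procedure, and the flag would live in per-page data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state standing in for a wizard page and its tree view. */
typedef struct {
    bool populated;      /* has the tree view been filled yet? */
    int  fixed_items;    /* number of "Configuration settings" nodes */
    int  dynamic_items;  /* number of saved-game nodes */
} WizardPage;

/* Stand-in for the code that inserts the fixed item and the dynamic items. */
static void populate_tree(WizardPage *page, int saved_games)
{
    page->fixed_items += 1;
    page->dynamic_items += saved_games;
}

/* Runs every time the page becomes the current page, which can happen
   more than once, so the population step is guarded by a flag. */
static void on_set_active(WizardPage *page, int saved_games)
{
    if (!page->populated) {
        populate_tree(page, saved_games);
        page->populated = true;
    }
}
```

    Without the populated flag, a second activation would add a second fixed item and a second set of dynamic items, producing exactly the duplicated tree shown in the crash dump.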

    Exercise: What was missing from the customer's testing that prevented them from reproducing the problem in their labs?


    Can CoCreateGuid ever return GUID_NULL?


    A customer asked whether the Co­Create­Guid function can ever return GUID_NULL. Their code uses GUID_NULL for special purposes, and it would be bad if that was ever returned as the GUID for an object. "Can we assume that Co­Create­Guid never returns GUID_NULL? Or should we test the return value against GUID_NULL, and if it is equal, then call Co­Create­Guid and try again?"

    Some people started running Co­Create­Guid a bunch of times and observing that it was spitting out type 4 GUIDs, which will always have a 4 in the version field. Then other people started wondering whether the use of Algorithm 4 was contractual (it isn't). Then still other people went back to read the RFCs which cover UUIDs to see whether those documents provided any guidance.

    And then I had to step in and stop the madness.

    It is very easy to show that any UUID generator which generates GUID_NULL has failed to meet the requirement that the generated UUID be unique in space and time: If it's equal to GUID_NULL, then it isn't unique!

    The uniqueness requirement is that the generated GUID be different from any other valid GUID. And if it generated GUID_NULL, then it wouldn't be different from GUID_NULL! (And GUID_NULL is a valid GUID, specifically identified in RFC4122 section 4.1.7.)
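
    For reference, here is the data that people were staring at, sketched in portable C. The Guid struct mirrors the field layout of the Windows GUID structure; guid_version and guid_is_null are hypothetical helpers. The version number lives in the top four bits of Data3, and the nil UUID from RFC 4122 has all 128 bits set to zero. Since the choice of algorithm 4 is not contractual, the version check is an observation about current behavior, not a guarantee; the uniqueness requirement is what actually rules out GUID_NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the Windows GUID structure. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} Guid;

/* The nil UUID (RFC 4122 section 4.1.7): all 128 bits are zero. */
static const Guid guid_null = { 0, 0, 0, { 0, 0, 0, 0, 0, 0, 0, 0 } };

/* The version number occupies the top four bits of Data3. */
static unsigned guid_version(const Guid *g)
{
    return (unsigned)(g->Data3 >> 12);
}

/* Field-by-field comparison against the nil UUID. */
static bool guid_is_null(const Guid *g)
{
    return g->Data1 == 0 && g->Data2 == 0 && g->Data3 == 0 &&
           memcmp(g->Data4, guid_null.Data4, sizeof g->Data4) == 0;
}
```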

    If you're so worried about Co­Create­Guid generating a duplicate GUID_NULL, why aren't you worried about Co­Create­Guid generating a duplicate IID_IUnknown or GUID_DEV­CLASS_1394 or any of the other GUIDs that have already been generated in the past?

    In other words, no valid implementation of Co­Create­Guid can generate GUID_NULL because the specification for the function says that it is not allowed to generate any GUID that has been seen before.

    One of my colleagues cheekily remarked, "And even if it did generate GUID_NULL for some reason, uniqueness would require that it do so only once! (So you should try to force this bug to occur in test, and then you can be confident that it will never occur in production.)"


    SubtractRect doesn't always give you the exact difference


    The Subtract­Rect function takes a source rectangle and subtracts out the portion which intersects a second rectangle, returning the result in a third rectangle. But wait a second, the result of subtracting one rectangle from another need not be another rectangle. It might be an L-shape, or it might be a rectangle with a rectangular hole. How does this map back to a rectangle?

    The documentation for Subtract­Rect says that the function performs the subtraction only when the two rectangles "intersect completely in either the x- or y-direction." But I prefer to think of it as the alternate formulation offered in the documentation: "In other words, the resulting rectangle is the bounding box of the geometric difference."

    I was reminded of this subject when I saw some code that tried to do rectangle manipulation like this:

    // Clip rcA to be completely inside rcB.
    RECT rcSub;
    // rcSub = the part of rcA that sticks out beyond rcB
    if (SubtractRect(&rcSub, &rcA, &rcB)) {
        // Remove that part from rcA
        SubtractRect(&rcA, &rcA, &rcSub);
    }
    

    If the rectangle rcA extends beyond rcB in more than one direction, then the geometric difference will not be rectangular, and the result of Subtract­Rect will be expanded to the bounding box of the difference, which means that it will return rcA again. And then the second call will subtract it all out, leaving the rectangle empty.

    Oops.

    What they really wanted was

    // Clip rcA to be completely inside rcB.
    IntersectRect(&rcA, &rcA, &rcB);
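
    To watch the failure happen without a Windows machine, here is a toy model of the two functions in portable C. subtract_rect and intersect_rect are hypothetical re-implementations written from the documented behavior (the result is the bounding box of the geometric difference), not Microsoft's code, and the Rect type stands in for RECT.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the Win32 RECT structure (right and bottom are exclusive). */
typedef struct { int left, top, right, bottom; } Rect;

static bool rect_is_empty(const Rect *r)
{
    return r->left >= r->right || r->top >= r->bottom;
}

static bool rect_equal(const Rect *a, const Rect *b)
{
    return a->left == b->left && a->top == b->top &&
           a->right == b->right && a->bottom == b->bottom;
}

/* Model of IntersectRect: the overlap of a and b, or an empty rectangle. */
static bool intersect_rect(Rect *dst, const Rect *a, const Rect *b)
{
    Rect r;
    r.left   = a->left   > b->left   ? a->left   : b->left;
    r.top    = a->top    > b->top    ? a->top    : b->top;
    r.right  = a->right  < b->right  ? a->right  : b->right;
    r.bottom = a->bottom < b->bottom ? a->bottom : b->bottom;
    if (rect_is_empty(&r)) {
        Rect empty = { 0, 0, 0, 0 };
        *dst = empty;
        return false;
    }
    *dst = r;
    return true;
}

/* Model of SubtractRect: the bounding box of (a minus b). The result shrinks
   only when b spans all of a in one direction and covers one whole edge of a;
   in every other overlapping case the bounding box is a itself. */
static bool subtract_rect(Rect *dst, const Rect *a, const Rect *b)
{
    Rect result = *a;
    Rect overlap;
    if (!rect_is_empty(a) && intersect_rect(&overlap, a, b)) {
        if (rect_equal(&overlap, a)) {          /* nothing left over */
            Rect empty = { 0, 0, 0, 0 };
            *dst = empty;
            return false;
        }
        if (overlap.top == a->top && overlap.bottom == a->bottom) {
            if (overlap.left == a->left)        result.left = overlap.right;
            else if (overlap.right == a->right) result.right = overlap.left;
        } else if (overlap.left == a->left && overlap.right == a->right) {
            if (overlap.top == a->top)            result.top = overlap.bottom;
            else if (overlap.bottom == a->bottom) result.bottom = overlap.top;
        }
    }
    *dst = result;
    return !rect_is_empty(&result);
}
```

    With rcA = {0, 0, 100, 100} and rcB = {10, 10, 90, 90}, the original two-step clipping code leaves rcA empty, while a single intersect_rect call produces {10, 10, 90, 90}. If rcA sticks out on only one side (say rcB = {0, 0, 100, 50}), the two-step code happens to give the right answer, which is how a bug like this can survive casual testing.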
    

    Why is the debugger telling me I crashed because my DLL was unloaded, when I see it loaded right here happily executing code?


    A customer was puzzled by what appeared to be contradictory information coming from the debugger.

    We have Windows Error Reporting failures that tell us that we are executing code in our DLL which has been unloaded. Here's a sample stack:

    Child-SP          RetAddr           Call Site
    00000037`7995e8b0 00007ffb`fe64b08e ntdll!RtlDispatchException+0x197
    00000037`7995ef80 000007f6`e5d5390c ntdll!KiUserExceptionDispatch+0x2e
    00000037`7995f5b8 00007ffb`fc977640 <Unloaded_contoso.dll>+0x3390c
    00000037`7995f5c0 00007ffb`fc978296 RPCRT4!NDRSRundownContextHandle+0x18
    00000037`7995f610 00007ffb`fc9780ed RPCRT4!DestroyContextHandlesForGuard+0xea
    00000037`7995f650 00007ffb`fc9b5ff4 RPCRT4!ASSOCIATION_HANDLE::~ASSOCIATION_HANDLE+0x39
    00000037`7995f680 00007ffb`fc9b5f7c RPCRT4!LRPC_SASSOCIATION::`scalar deleting destructor'+0x14
    00000037`7995f6b0 00007ffb`fc978b25 RPCRT4!LRPC_SCALL_BROKEN_FLOW::FreeObject+0x14
    00000037`7995f6e0 00007ffb`fc982e44 RPCRT4!LRPC_SASSOCIATION::MessageReceivedWithClosePending+0x6d
    00000037`7995f730 00007ffb`fc9825be RPCRT4!LRPC_ADDRESS::ProcessIO+0x794
    00000037`7995f870 00007ffb`fe5ead64 RPCRT4!LrpcIoComplete+0xae
    00000037`7995f910 00007ffb`fe5e928a ntdll!TppAlpcpExecuteCallback+0x204
    00000037`7995f980 00007ffb`fc350ce5 ntdll!TppWorkerThread+0x70a
    00000037`7995fd00 00007ffb`fe60f009 KERNEL32!BaseThreadInitThunk+0xd
    00000037`7995fd30 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
    

    But if we ask the debugger what modules are loaded, our DLL is right there, loaded as happy as can be:

    0:000> lm
    start             end                 module name
    ...
    000007f6`e6000000 000007f6`e6050000   contoso    (deferred)
    ...
    

    In fact, we can view other threads in the process, and they are happily running code in our DLL. What's going on here?

    All the information you need to solve this problem is given right there in the problem report. You just have to put the pieces together.

    Let's take a closer look at that <Unloaded_contoso.dll>+0x3390c entry. The address that the symbol refers to is the return address from the previous frame: 000007f6`e5d5390c. Subtract 0x3390c from that, and you get 000007f6`e5d20000, which is the base address of the unloaded module.

    On the other hand, the lm command says that the currently-loaded copy of contoso.dll is loaded at 000007f6`e6000000. This is a different address.
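
    The arithmetic can be checked with a few lines of C. The recover_base and in_module helpers are hypothetical; the constants are the addresses from the stack trace and the lm output, with the debugger's backtick separator dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base address of the module that a stack entry like <Unloaded_x.dll>+offset
   refers to: the code address minus the offset. */
static uint64_t recover_base(uint64_t code_address, uint64_t offset)
{
    return code_address - offset;
}

/* Does the address fall inside the half-open range [start, end) that lm reports? */
static bool in_module(uint64_t address, uint64_t start, uint64_t end)
{
    return address >= start && address < end;
}
```

    Here recover_base(0x000007f6e5d5390c, 0x3390c) yields 0x000007f6e5d20000, and the crashing address lies below the start of the currently loaded copy at 0x000007f6e6000000, so it cannot belong to that copy.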

    What happened here is that contoso.dll was loaded into memory at 000007f6`e5d20000, and then it ran for a while. The DLL was then unloaded from memory, and later loaded back into memory. When it came back, it was loaded at a different address, 000007f6`e6000000. For some reason (improper cleanup when unloading the first copy, most likely), there was still a function pointer pointing into the old unloaded copy, and when NDRS­Rundown­Context­Handle tried to call through that function pointer, it called into an unloaded DLL, and the process crashed.

    When faced with something that seems impossible, you need to look more closely for clues that suggest how your implicit assumptions may be incorrect. In this case, the assumption was that there was only one copy of contoso.dll.


    Today, we use a GPS to locate Baby Jesus


    When Baby Jesus disappears from a Nativity scene, he might be wearing a tracking device:

    For two consecutive years, thieves made off with the baby Jesus figurine in Wellington, a town of 60,000 in Palm Beach County, Fla. The ceramic original, donated by a local merchant, was made in Italy and worth about $1,800...

    Last year, officials took a GPS unit normally used to track the application of mosquito spray and implanted it in the latest replacement figurine. After that one disappeared, sheriff's deputies quickly tracked it down.

    It's sad that the world has come to this, but nice to know that technology is helping to make the world a slightly better place.
