October, 2008

  • The Old New Thing

    If you're going to reformat source code, please don't do anything else at the same time


    I spend a good amount of my time doing source code archaeology, and one thing that really muddles the historical record is people who start with a small source code change which turns into large-scale source code reformatting.

    I don't care how you format your source code. It's your source code. And your team might decide to change styles at some point. For example, your original style guide may have been designed for the classic version of the C language, and you want to switch to a style guide designed for C++ and its new // single-line comments. Your new style guide may choose to use spaces instead of tabs for indentation, or it may dictate that opening braces go on the next line rather than hanging out at the end of the previous line, or you may have a new convention for names of member variables. Maybe your XML style guidelines changed from using

    <element attribute1="value1" attribute2="value2" />

    Whatever your changes are, go nuts. All I ask is that you restrict them to "layout-only" check-ins. In other words, if you want to do some source code reformatting and change some code, please split it up into two check-ins, one that does the reformatting and the other that changes the code.

    Otherwise, I end up staring at a diff of 1500 changed lines of source code, 1498 of which are just reformatting, and 2 of which actually changed something. Finding those two lines is not fun.

    And the next time somebody is asked to do some code archaeology, say to determine exactly when a particular change in behavior occurred, or to give an assessment of how risky it would be to port a change, the person who is invited to undertake the investigation may not be me. It may very well be you.

  • The Old New Thing

    Why can't you thunk between 32-bit and 64-bit Windows?


    It was possible to use generic thunks in 16-bit code to allow it to call into 32-bit code. Why can't we do the same thing to allow 32-bit code to call 64-bit code?

    It's the address space.

    Both 16-bit and 32-bit Windows lived in a 32-bit linear address space. The terms 16 and 32 refer to the size of the offset relative to the selector.

    Okay, I suspect most people haven't had to deal with selectors (and that's probably a good thing). In 16-bit Windows, addresses were specified in the form of a selector (often mistakenly called a "segment") and an offset. For example, a typical address might be 0x0123:0x4567. This means "The byte at offset 0x4567 relative to the selector 0x0123." Each selector had a corresponding entry in one of the descriptor tables which describes things like what type of selector it is (can it be used to read data? write data? execute code?), but what's important here is that it also contained a base address and a limit. For example, the entry for selector 0x0123 might say "0x0123 is a read-only data selector which begins at linear address 0x00524200 and has a limit of 0x7FFF." This means that the address 0x0123:n refers to the byte whose linear address is 0x00524200 + n, provided that n ≤ 0x7FFF.

    With the introduction of the 80386, the maximum limit for a selector was raised from 0xFFFF to 0xFFFFFFFF. (Accessing the bytes past 0xFFFF required a 32-bit offset, of course.) Now, if you were clever, you could say "Well, let me create a selector and set its base to 0x00000000 and its limit to 0xFFFFFFFF. With this selector, I can access the entire 32-bit linear address space. There's no need to chop it up into 64KB chunks like I had to back in the 16-bit days. And then I can just declare that all addresses will be in this form and nobody would have to bother specifying which selector to use since it is implied."

    And if you said this, then you invented the Win32 addressing scheme. It's not that there are no selectors; it's just that there is effectively only one selector, so there's no need to say it all the time.
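
    The selector arithmetic above can be sketched in a few lines of portable C. This is only an illustration, not how the processor or Windows implements it: the Descriptor structure and the to_linear function are hypothetical names, and a real descriptor-table entry carries more fields than the ones shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor. Real descriptor-table entries
   carry more information (selector type, privilege level, and so on). */
typedef struct {
    uint32_t base;      /* linear address where the selector begins */
    uint32_t limit;     /* highest valid offset, inclusive */
    bool     writable;  /* false for a read-only data selector */
} Descriptor;

/* Translate selector:offset to a linear address.
   Returns false if the offset exceeds the selector's limit. */
bool to_linear(const Descriptor *d, uint32_t offset, uint32_t *linear)
{
    if (offset > d->limit) {
        return false;   /* real hardware would raise a fault here */
    }
    *linear = d->base + offset;
    return true;
}

/* The example from the text: selector 0x0123 is a read-only data
   selector at linear address 0x00524200 with a limit of 0x7FFF. */
const Descriptor sel_0123 = { 0x00524200u, 0x7FFFu, false };

/* The "clever" flat selector: base 0 and limit 0xFFFFFFFF, so the
   offset is the linear address and the selector can go unmentioned. */
const Descriptor flat = { 0x00000000u, 0xFFFFFFFFu, true };
```

    With these definitions, 0x0123:0x4567 translates to linear address 0x00528767, an offset of 0x8000 is rejected because it exceeds the limit, and through the flat selector every offset translates to itself.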

    Now let's look at the consequences of this for thunking.

    First, notice that a full-sized 16-bit pointer and a 32-bit flat pointer are the same size. The value 0x0123:0x4567 requires 32 bits, and wow, so too does a 32-bit pointer. This means that data structures containing pointers do not change size between their 16-bit and 32-bit counterparts. A very handy coincidence.

    Next, notice that the 16-bit address space is still fully capable of referring to every byte in the 32-bit address space, since they are both windows into the same underlying linear address space. It's just that the 16-bit address space can only see the underlying linear address space in windows of 64KB, whereas the 32-bit address space can see it all at once. This means that any memory that 32-bit code can access 16-bit code can also access. It's just more cumbersome from the 16-bit side since you have to build a temporary address window.

    Neither of these two observations holds true for 32-bit to 64-bit thunking. The size of the pointer has changed, which means that converting a 32-bit structure to a 64-bit structure and vice versa changes the size of the structure. And the 64-bit address space is four billion times larger than the 32-bit address space. If there is some memory in the 64-bit address space at offset 0x000006fb`01234567, 32-bit code will be unable to access it. It's not like you can build a temporary address window, because 32-bit flat code doesn't know about these temporary address windows; they abandoned selectors, remember?
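
    Both failures can be made concrete with a small sketch. The Node32 and Node64 structures and the fits_in_32_bits function are hypothetical, and fixed-width integers stand in for the pointer fields so that the sizes do not depend on the compiler you happen to build with.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical list node as 16-bit or 32-bit code would lay it out.
   A 16:16 selector:offset pair and a 32-bit flat pointer both occupy
   32 bits, so the layout is the same on both sides of that thunk. */
typedef struct {
    uint32_t next;   /* pointer-sized field: 4 bytes */
    uint32_t value;
} Node32;

/* The same node as 64-bit code would lay it out. */
typedef struct {
    uint64_t next;   /* pointer-sized field: 8 bytes */
    uint32_t value;
} Node64;

/* Can a 64-bit address be expressed as a 32-bit flat pointer? */
bool fits_in_32_bits(uint64_t address)
{
    return address <= UINT32_MAX;
}
```

    Here sizeof(Node64) is larger than sizeof(Node32), so a thunk would have to repack every structure that contains a pointer, and fits_in_32_bits(0x000006fb01234567) is false, so there is no 32-bit value the thunk could hand back for that address.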

    It's one thing when two people have two different words to describe the same thing. But if one party doesn't even have the capability of talking about that thing, translating between the two will be quite difficult indeed.

    P.S., like most things I state as "fact", this is just informed speculation.

  • The Old New Thing

    Psychic debugging: Why your thread is spending all its time processing meaningless thread timers


    I was looking at one of those "my program is consuming 100% of the CPU and I don't know why" bugs, and upon closer investigation, the proximate reason the program was consuming 100% CPU was that one of the threads was being bombarded with WM_TIMER messages where the MSG.hWnd is NULL. The program was dispatching them as fast as it could, but the messages just kept on coming. Curiously, the LPARAM for these messages was zero.

    This should be enough information for you to figure out what is going on.

    First, you should refresh your memory as to what a null window handle in a WM_TIMER message means: These are thread timers, timers which are associated not with a window but with a thread. You create a thread timer by calling the SetTimer function and passing NULL as the window handle. Thread timer messages arrive in the message queue, and the DispatchMessage function calls the timer procedure specified by the message LPARAM. If the LPARAM of a thread timer message is zero, then dispatching the message consists merely of throwing it away. (If there were a window handle, then the message would be delivered to the window procedure, but there isn't one, so there's nothing else that can be done.)

    The program was spending all its time retrieving WM_TIMER messages from its queue and throwing them away. The real question is how all these thread timers ended up on the thread when they don't do anything. Who would create a timer that didn't do anything? And who would create dozens of them?

    One of the more common patterns for creating a window timer is to write SetTimer(hwnd, idTimer, dwTimeout, NULL). This creates a window timer whose identifier is idTimer. Since the timer procedure is NULL, the WM_TIMER message is dispatched to the window procedure, which in turn will have a case WM_TIMER statement followed by a switch (wParam) to handle the timer message.

    But what if hwnd is NULL, say because you forgot to check the return value of a function like CreateWindow? Well, then you just created a thread timer by mistake. And if you make this mistake several times in a row, you've just created several thread timers. Now you might think that the code that created the thread timer by mistake will also destroy the thread timer by mistake when it finally gets around to calling KillTimer(hwnd, idTimer) and passes NULL for the hwnd. But it doesn't.

    One reason is that in many cases, it's the timer that turns itself off. In other words, the KillTimer happens inside the WM_TIMER message handler. But if the WM_TIMER message isn't associated with that window, then that window procedure never gets a chance to turn off the timer.

    Another reason is more insidious. Recall that the idTimer parameter to the SetTimer function is ignored when you create a thread timer. Since you can't predict what other thread timers may exist, you can't know which timer identifiers are in use and which are free. Instead, the SetTimer function creates a unique thread timer identifier and returns it, and it is that timer identifier you must use when destroying the thread timer. Of course, the code that accidentally created the thread timer thought it was creating a window timer (which uses the timer identifier you specify), so it didn't bother saving the return value. Result: Thread timer is created and becomes orphaned.
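
    To see why the cleanup misses, here is a toy model of the bookkeeping in portable C. It is not the Win32 implementation and it does not call SetTimer or KillTimer: model_set_timer, model_kill_timer, and model_active_timers are hypothetical names, the starting identifier is arbitrary, and the model assumes only what is described above (a requested identifier is honored for a window timer and replaced with a generated one for a thread timer).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TIMERS 16

typedef struct {
    int       in_use;
    void     *hwnd;   /* NULL means a thread timer */
    uintptr_t id;
} TimerSlot;

static TimerSlot timers[MAX_TIMERS];
static uintptr_t next_thread_timer_id = 0x7000;  /* arbitrary start */

/* Model of creating a timer. Returns nonzero on success. For a
   thread timer (hwnd == NULL) the requested id is ignored and the
   return value is the generated id the caller must remember. */
uintptr_t model_set_timer(void *hwnd, uintptr_t requested_id)
{
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (!timers[i].in_use) {
            timers[i].in_use = 1;
            timers[i].hwnd = hwnd;
            if (hwnd != NULL) {
                timers[i].id = requested_id;
                return 1;                 /* window timer: any nonzero */
            }
            timers[i].id = next_thread_timer_id++;
            return timers[i].id;          /* thread timer: generated id */
        }
    }
    return 0;                             /* table full */
}

/* Model of destroying a timer. Returns nonzero if one was found. */
int model_kill_timer(void *hwnd, uintptr_t id)
{
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (timers[i].in_use && timers[i].hwnd == hwnd && timers[i].id == id) {
            timers[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}

/* Number of timers still alive in the model. */
int model_active_timers(void)
{
    int count = 0;
    for (int i = 0; i < MAX_TIMERS; i++) {
        count += timers[i].in_use;
    }
    return count;
}
```

    In this model, creating a timer with a null window and identifier 42 hands back a different identifier. A later attempt to kill timer 42 on the null window finds nothing, and the timer stays alive until somebody kills it using the value that the caller threw away.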

    The machine I was asked to look at was running a stress scenario, so it was entirely likely that a low memory condition caused a function like CreateWindow to fail, and the program most likely neglected to check the return value. I never did hear back to find out if that indeed was the source of the problem, but seeing as they didn't come back for more help, I suspect I put them on the right track.

  • The Old New Thing

    Eventually, nothing is special any more


    Commenter ulric suggested that two functions for obtaining the "current" window should exist, one for normal everyday use and one for "special use" when you want to interact with windows outside your process.

    I'd be more at ease however if the default behaviour of the API was to return HWND for the current process only, and the apps that really need HWND from other potentially other processes would have to be forced to use another API that is specifically just for that.

    This is an excellent example of suggesting something that Windows already does. The special function has become so non-special, you don't even realize any more that it's special.

    Originally, in 16-bit Windows, the function for getting the "current" window was GetActiveWindow. This obtained the active window across the entire system. One of the major changes in Win32 is the asynchronous input model, wherein windows from different input queues receive separate input. That way, one program that has stopped responding to input doesn't clog up input for other unrelated windows. Win32 changed the meaning of GetActiveWindow to mean the active window from the current input queue.

    In 16-bit Windows, there was only one input queue, the global one. In 32-bit Windows, each thread (or group of input-attached threads) gets its own input queue.

    As a result of this finer granularity, when a program was ported from 16-bit Windows to 32-bit Windows, it didn't "see" windows from other programs when it called functions like GetFocus or GetActiveWindow. As every Win32 programmer should know, these states are local to your input queue.

    Okay, let's look at what we've got now. GetFocus and GetActiveWindow give you the status of your input queue. In other words, in a single-threaded program (which, if you're coming from 16-bit Windows, is the only type of program there is), calling GetActiveWindow gives you the active window from your program. It doesn't return the active window from another program.¹ Things are exactly as ulric suggested!

    Now let's look at the second half of the suggestion. If a program really needs to get a window from potentially other processes, it would have to use some other function that is specifically just for that. And indeed, that's why the GetForegroundWindow function was added. The GetForegroundWindow function is the special function specifically designed for obtaining windows from other processes.

    Therefore, we did exactly what ulric recommended, and it still turned into a mess. Why?

    Because once you create something special, it doesn't remain special for long.

    It may take a while, but eventually people find that the regular function "doesn't work" (for various definitions of "work"), and they ask around for help. "When I call GetActiveWindow, I'm not getting the global active window; I'm just getting the local one. How do I get the global one?" Actually, they probably don't even formulate the question that clearly. It's probably more like "I want to get the active window, but GetActiveWindow doesn't work."

    And then somebody responds with "Yeah, GetActiveWindow doesn't work. I've found that GetForegroundWindow works a lot better."

    The response is then "Wow, that works great! Thanks!"

    Eventually, the word on the street is "GetActiveWindow doesn't work. Use GetForegroundWindow instead." Soon, people are using it for everything: waxing their car, calming a colicky baby, or improving their sexual attractiveness.

    What used to be a function to be used "only in those rare occasions when you really need it" has become "the go-to function that gets the job done."

    In fact, the unfashionableness of the active window has reached the point that people have given up on calling it the active window at all! Instead, they call it the foreground window from the current process. It's like calling a land line a "wired cell phone".

    Requiring a new flag to get the special behavior doesn't change things at all. It's the same story, just with different names for the characters. "GetFocalWindow² doesn't work unless you pass the GFW_CROSSPROCESS flag." Soon, everybody will be passing the GFW_CROSSPROCESS flag not because they understand what it does but just because "That's what I was told to do" and "It doesn't work if I don't pass it."


    ¹Assuming you haven't run around attaching your thread to some other program's input queue. This is a pretty safe assumption since the AttachThreadInput function didn't exist in 16-bit Windows either.

    ²GetFocalWindow is an imaginary function created for the purpose of the example.

    [Raymond is currently away; this message was pre-recorded.]

  • The Old New Thing

    To climb the corporate ladder you'll need some rope, but rope has many purposes


    When KC told me about a trick she learned to get an area expert to respond to her email, I cautioned her that the trick might backfire:

    A friend of mine (let's call him Bob) happens also to work in the technology industry, and the manager for the part of the project he worked on was, to put it nicely, "in the wrong line of work." No matter how many times Bob would explain how the system worked on the whiteboard, his manager never really understood it. And the misunderstandings weren't just of the "oh I missed a little detail" variety; rather, they tended to elicit a "What planet are you from?" sort of reaction. Bob spent many impromptu meetings patiently trying to clear up various degrees of confusion or explaining why some clever new idea won't work because it violates the laws of physics as we currently understand them.

    Add to the tenuous grasp on technical concepts the uncanny ability to take all the credit when making presentations to senior management. You know the type of people I'm talking about: They're the sort who manage to skate through school by using their good looks, charming personality, and/or social status to get other people to do their homework for them.

    One day, the manager sent out a plan to a large group of people, including some senior managers, and included in the email one of those "Huh?" questions, something akin to "... and it'll connect to the server wirelessly through the parallel port, and—hey Bob, inkjet printers can run off the parallel port, right?"

    Bob decided that he'd had enough. He replied to the mail thread with a simple, "Yes, inkjet printers can run off the parallel port."

    Somebody else in the group gave Bob a phone call. "Um, Bob, what do inkjet printers have to do with anything?"

    Bob answered, "Just making sure there's enough rope."

    Bob's colleague replied, "Gotcha."

    The manager's tenure ended a few months later.

  • The Old New Thing

    Why does the Disk Management snap-in report my volume as Healthy when the drive is dying?


    Windows Vista displays a big scary dialog when the hard drive's on-board circuitry reports that the hardware is starting to fail. Yet if you go to the Disk Management snap-in, it reports that the drive is Healthy. What's up with that?

    The Disk Management snap-in is interested in the logical structure of the drive. Is the partition table consistent? Is there enough information in the volume to allow the operating system to mount it? It doesn't know about the drive's physical condition. In other words, "As far as the Disk Management snap-in is concerned, the drive is healthy."

    Similarly, your car's on-board GPS may tell you that you are on track for a 6pm arrival at your destination, unaware that you have an oil leak that is going to force you to the side of the road sooner or later. All the GPS cares about is that the car is travelling along the correct road.

    [Raymond is currently away; this message was pre-recorded.]

  • The Old New Thing

    Acquire and release sound like bass fishing terms, but they also apply to memory models


    Many of the normal interlocked operations come with variants called InterlockedXxxAcquire and InterlockedXxxRelease. What do the terms Acquire and Release mean here?

    They have to do with the memory model and how aggressively the CPU can reorder operations around it.

    An operation with acquire semantics is one which does not permit subsequent memory operations to be advanced before it. Conversely, an operation with release semantics is one which does not permit preceding memory operations to be delayed past it. (This is pretty much the same thing that MSDN says on the subject of Acquire and Release Semantics.)

    Consider the following code fragment:

    int adjustment = CalculateAdjustment();
    while (InterlockedCompareExchangeAcquire(&lock, 1, 0) != 0)
      { /* spin lock */ }
    for (Node *node = ListHead; node; node = node->Next)
      node->value += adjustment;
    InterlockedExchangeRelease(&lock, 0);

    Applying Acquire semantics to the first operation ensures that the operations on the linked list are performed only after the lock variable has been updated. This is obviously desired here, since the purpose of updating the lock variable is to ensure that no other threads are updating the list while we're walking it. Only after we have successfully set the lock to 1 is it safe to read from ListHead. On the other hand, the Acquire operation imposes no constraints upon when the store to the adjustment variable can be completed to memory. (Of course, there may very well be other constraints on the adjustment variable, but the Acquire does not add any new constraints.)

    Conversely, Release semantics for an interlocked operation prevent pending memory operations from being delayed past the operation. In our example, this means that the stores to node->value must all complete before the interlocked variable's value changes back to zero. This is also desired, because the purpose of the lock is to control access to the linked list. If we had completed the stores after the lock was released, then somebody else could have snuck in, taken the lock, and, say, deleted an entry from the linked list. And then when our pending writes completed, they would end up writing to memory that has been freed. Oops.

    The easy way to remember the difference between Acquire and Release is that Acquire is typically used when you are acquiring a resource (in this case, taking a lock), whereas Release is typically used when you are releasing the resource.
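
    For readers without the Interlocked functions at hand, roughly the same pattern can be written with the C11 atomics from <stdatomic.h>. This is a portable analogue under that assumption, not the Win32 code from the fragment above. The names list_lock, list_head, and adjust_all are made up for the example.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct Node {
    struct Node *next;
    int value;
} Node;

atomic_int list_lock = 0;    /* 0 = free, 1 = held */
Node *list_head = NULL;

void adjust_all(int adjustment)
{
    int expected = 0;

    /* Acquire: the list accesses below cannot be moved ahead of the
       successful 0 -> 1 transition of the lock. */
    while (!atomic_compare_exchange_weak_explicit(
               &list_lock, &expected, 1,
               memory_order_acquire, memory_order_relaxed)) {
        expected = 0;   /* a failed exchange overwrites expected */
    }

    for (Node *node = list_head; node != NULL; node = node->next) {
        node->value += adjustment;
    }

    /* Release: the stores to node->value above cannot be delayed
       past the store that frees the lock. */
    atomic_store_explicit(&list_lock, 0, memory_order_release);
}
```

    The acquire ordering sits on the operation that takes the lock and the release ordering on the operation that gives it back, which is the same pairing as in the Interlocked version.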

    As the MSDN article on acquire and release semantics already notes, the plain versions of the interlocked functions impose both acquire and release semantics.

    Bonus reading: Kang Su discusses how VC2005 converts volatile memory accesses into acquires and releases.

    [Raymond is currently away; this message was pre-recorded.]

  • The Old New Thing

    The dangers of setting your double-click speed too short


    After I noted how the window manager uses the double-click time as a basis for determining how good your reflexes are, people got all excited about reducing the double-click speed to make Windows feel peppier. But be careful not to go overboard.

    Back in the Windows 95 days, we got a bug from a beta tester that went roughly like this:

    Title: Double-clicks stop working after using mouse control panel
    Reproducibility: Consistent, hardware-independent
    Severity: Major loss of functionality

    1. Open the mouse control panel.
    2. Go to the Double-click speed slider.
    3. Drag the slider all the way to the right (fastest).
    4. Click OK.

    Result: Mouse double-clicks no longer recognized.

    We had to explain to the beta tester that, no, everything is actually working as intended. But if you set the double-click slider to the fastest setting, you had better be good at double-clicking really fast. You have clearly set the double-click speed faster than you are physically capable of double-clicking. Maybe you can ask your twelve-year-old nephew to do your double-clicking for you.

    That's why there is the test icon next to the slider. Before clicking OK, make sure you can still double-click the test icon. If you can't, then you picked a setting that's too fast for your reflexes and you should consider a slower setting.

    Pre-emptive Yuhong Bao comment: In Windows 95, the test icon was a jack-in-the-box.

    [Raymond is currently away; this message was pre-recorded.]

  • The Old New Thing

    Why does killing Winlogon take down the entire system?


    Commenter Demeli asks, "Why does Winlogon take down the entire system when you attach a debugger to it? (drwtsn32 -p <pid of Winlogon>)"

    This question already has a mistake in it. Running drwtsn32 on a process isn't attaching a debugger to it. Attaching a debugger would be something like ntsd -p <pid of Winlogon>, and this does work, assuming you have the necessary privileges, of course. (Indeed, this is how the Windows team debugs problems with Winlogon.) In other words, the literal answer to the question is "No, Winlogon does not take down the entire system when you attach a debugger to it."

    What drwtsn32 does is take a crash dump of the process and then kill the target process. And it is killing Winlogon that crashes the system.

    Winlogon is what is known as a "critical system process", the death of which forces a system restart. And you can probably guess why the system considers Winlogon critical to its functioning. For example, Winlogon is responsible for handling the secure attention sequence, also known as Ctrl+Alt+Del. If Winlogon were to die, then the secure attention sequence would stop working. That's kind of bad.

  • The Old New Thing

    Why do maximized windows lose their title bar translucency?


    If you have translucent title bars enabled,¹ you may have noticed that the translucency goes away when you maximize a window. Why is that?

    This is a performance optimization.

    Opaque title bars are more efficient than translucent ones, and when you maximize a window, you're saying,² "I want to focus entirely on this window and no other windows really matter to me right now." In that case, the desktop window manager doesn't bother with translucency because you're not paying any attention to it anyway.

    This may seem like a very minor change, but the difference is noticeable on benchmarks, and, like it or not, magazine writers like to use benchmarks as an "objective" way of determining how good a product is. The reviewers choose the game, and we are forced to play it.


    ¹The desktop window composition feature that provides the translucent title bar probably has some official name, but I can never remember what it is and I'm too lazy to go find out.

    ²This may not be literally what you're saying, but it's how the window manager interprets your action.

    [Raymond is currently away; this message was pre-recorded.]
