• The Old New Thing

    Book review: Advanced Windows Debugging (Mario Hewardt and Daniel Pravat)

    • 19 Comments

    Every so often, somebody sends me a book, and most of the time I glance through it and say, "Eh."

    But not this time.

    Advanced Windows Debugging will make you the envy of your friends (if your friends are computer nerds). Even the section with the "Oh come on every moron knows this already" title Basic Debugger Tasks has stuff that I didn't know. Fortunately, you don't have to slog through the stuff you already do know in order to find it, because the nifty new debugger commands are set off in the snippets of debugger conversation. (And by debugger conversation, I mean output of a debugger based on the Windows debug engine, debuggers like ntsd, kd and windbg.)

    Once you get past the "basics", you still have loads more ahead of you. The book covers debugging scenarios like a corrupted heap, a deadlock, or 100% CPU usage, as well as debugging tasks, like following the trail of an LPC request from the client to the server, peeking at the token count of a semaphore, and reconstructing a partially-corrupted stack—and illustrates each investigation with both discussion and annotated debugger output. All the things that seasoned developers take for granted (because they have become instinctual after years of experience) are spelled out for you. Learn more from the book's web site, unsurprisingly named advancedwindowsdebugging.com.

    I'm keeping this book on my shelf. You can borrow it, but I'm going to insist that you return it when you're done.

  • The Old New Thing

    How to insert a large number of items into a treeview efficiently

    • 18 Comments

    Just a quick tip today.

    If you need to insert a large number of items into a treeview, like tens of thousands, then it's much more efficient to insert them "backwards". (I'm ignoring for now the usability question of having a treeview that large in the first place.) In other words, instead of

    for (i = 0; i < array.Count(); i++) {
     TVINSERTSTRUCT tvis;
     tvis.hParent = hParentNode;
     tvis.hInsertAfter = TVI_LAST;
     tvis.item.mask = TVIF_TEXT;
     tvis.item.pszText = array[i].Text();
     TreeView_InsertItem(hwndTreeView, &tvis);
    }
    
    do it this way:
    for (i = array.Count() - 1; i >= 0; i--) {
     TVINSERTSTRUCT tvis;
     tvis.hParent = hParentNode;
     tvis.hInsertAfter = TVI_FIRST;
     tvis.item.mask = TVIF_TEXT;
     tvis.item.pszText = array[i].Text();
     TreeView_InsertItem(hwndTreeView, &tvis);
    }
    

    Why is backwards-insertion faster?

    It has to do with the annoying hInsert­After parameter. To validate that the hInsert­After parameter is valid, the treeview needs to verify that the hInsert­After is a valid child of the hParent, and this is done by walking the parent's children looking for a match. The sooner you find the match, the faster the validation completes. (And if you pass TVI_LAST, then the treeview needs to walk to the end of the child list.)

    You'd think that you could verify the parent/child relationship by just doing a Tree­View_Get­Parent(hInsert­After), but that turns out not to be strict enough, because hInsert­After might itself not be a valid parameter. If hInsert­After is a bogus value, then you may crash when you try to read its Parent property. That's if you're lucky. If you're not lucky, then the random memory that hInsert­After points to might look just enough like a valid HTREEITEM that you end up inserting the new node after a completely bogus node, and now the treeview has become corrupted.

    Sure, you got the same problem if you passed a garbage HTREEITEM to Tree­View_Get­Parent, but in that case, it's just garbage in garbage out. Nothing gets corrupted; the application just gets a garbage result. But in the case of Tree­View_Insert­Item, the treeview is going to update its internal data structures based on the garbage you passed in, and that means that the treeview winds up corrupted.

    To ensure that the value passed in is valid, the treeview checks it against the list of valid values for hInsert­After. And therefore, you get better performance if the valid value you pass is the one that it checks first.

    (As it happens, a lot of programs pass garbage for hInsert­After, so this defensive validation step is absolutely necessary.)

    You might say that the treeview could have a one-off optimization for the special TVI_LAST value by remembering the last child so it can be located quickly. The question is whether the complexity of adding that optimization is worth the cost: Any tree rearranging function would have to update the cached value, and if you miss a spot, then the treeview ends up corrupted. That's a pretty high-risk optimization you're suggesting there.

    And think about it this way: You're worrying about a tree where a single node has tens of thousands of children, something which (I am no longer ignoring) is a horrible user interface. That's like a list box with tens of thousands of items, or a dialog box with tens of thousands of checkboxes. You'll probably want to consider a better way of presenting the information than in a tree view that goes on for hundreds of screens. The treeview isn't optimized for this case because you don't optimize for the case where somebody is mis-using your system.

  • The Old New Thing

    If you need anything other than natural alignment, you have to ask for it

    • 27 Comments

    If you need variables to be aligned a particular way, you need to ask for it.

    Let's say I have the following code:

    void fn() 
    { 
     int a; 
     char b; 
     long c; 
     char d[10];
    } 
    

    What would the alignment of the starting addresses of a, b, c, and d be?

    What would the alignment be if the memory were allocated on heap?

    If this alignment varies for different data types within the same translation unit, is there a way to force uniform alignment for all types?

    If you need a particular alignment, you have to ask for it. By default, all you can count on is that variables are aligned according to their natural requirements.

    First, of course, there is no guarantee that local variables even reside on the stack. The optimizer may very well decide that particular local variables can reside in registers, in which case they have no alignment at all!

    There are a few ways to force a particular alignment. The one that fits the C language standard is to use a union:

    union char_with_int_alignment {
     char ch;
     int Alignment;
    } u;
    

    Given this union, you can say u.ch to obtain a character whose alignment is suitable for an integer.

    The Visual C++ compiler supports a declaration specifier to override the default alignment of a variable.

    typedef struct __declspec(align(16)) _M128 {
        unsigned __int64 Low;
        __int64 High;
    } M128, *PM128;
    

    This structure consists of two eight-byte members. Without the __declspec(align(#)) directive, the alignment of this structure would be 8-byte, since that is the alignment of the members with the most restrictive alignment. (Both unsigned __int64 and __int64 are naturally 8-byte-aligned.) But with the directive, the alignment is expanded to 16 bytes, which is more restrictive than what the structure normally would be. This particular structure is declared with more restrictive alignment because it is intended to be used to hold 128-bit values that will be used by the 128-bit XMM registers.

    A third way to force alignment with the Visual C++ compiler is to use the #pragma pack(#) directive. (There is also a "push" variation of this pragma which remembers the previous ambient alignment, which can be restored by a "pop" directive. And the /Zp# directive allows you to specify this pragma from the compiler command line.) This directive specifies that members can be placed at alignments suitable for #-byte objects rather than their natural alignment requirements, if the natural alignment is more restrictive. For example, if you set the pack alignment to 2, then all objects that are bigger than two bytes will be aligned as if they were two-byte objects. This can cause 32-bit values and 64-bit values to become mis-aligned; it is assumed that you know what you're doing and can compensate accordingly.

    For example, consider this structure whose natural alignment has been altered:

    #pragma pack(1)
    struct misaligned_members {
     WORD w;
     DWORD dw;
     BYTE b;
    };
    

    Given this structure, you cannot pass the address of the dw member to a function that expects a pointer to a DWORD, since the ground rules for programming specify that all pointers must be aligned unless unaligned pointers are explicitly permitted.

    void ExpectsAlignedPointer(DWORD *pdw);
    void UnalignedPointerOkay(UNALIGNED DWORD *pdw);
    
    misaligned_members s;
    ExpectsAlignedPointer(&s.dw); // wrong
    UnalignedPointerOkay(&s.dw);  // okay
    

    What about the member w? Is it aligned or not? Well, it depends.

    If you allocate a single structure on the heap, then the w member is aligned, since heap allocations are always aligned in a manner suitable for any fundamental data type. (I vaguely recall some possible weirdness with 10-byte floating point values, but that's not relevant to the topic at hand.)

    misaligned_members *p = (misaligned_members *)
        HeapAlloc(hheap, 0, sizeof(misaligned_members));
    

    Given this code fragment, the member p->w is aligned since the entire structure is suitably aligned, and therefore so too is w. If you allocated an array, however, things are different.

    misaligned_members *p = (misaligned_members *)
        HeapAlloc(hheap, 0, 2*sizeof(misaligned_members));
    

    In this code fragment, p[1].w is not aligned because the entire misaligned_members structure is 2+4+1=7 bytes in size since the packing is set to 1. Therefore, the second structure begins at an unaligned offset relative to the start of the array.

    One final issue is the expectations for alignment when using header files provided by an outside component. If you are writing a header file that will be consumed by others, and you require special alignment, you need to say so explicitly in your header file, because you don't control the code that will be including your header file. Furthermore, if your header file changes any compiler settings, you need to restore them before your header file is complete. If you don't follow this rule, then you create the situation where a program stops working if it changes the order in which it includes seemingly-unrelated header files.

    // this code works
    #include <foo.h>
    #include <bar.h>
    
    // this code doesn't
    #include <bar.h>
    #include <foo.h>
    

    The problem was that bar.h changed the default structure alignment and failed to restore it at the end of the file. As a result, in the second case, the structure alignment for the foo.h header file got "infected" and no longer matched the structure alignment used by the foo library.

    You can imagine an analogous scenario where deleting a header file can cause a program to stop working.

    Therefore, if you're writing a header file that will be used by others, and you require nonstandard alignment for your structures, you should use this pattern to change the default alignment:

    #include <pshpack1.h> // change alignment to 1
    ... stuff that assumes byte packing ...
    #include <poppack.h>  // return to original alignment
    

    In this way, you "leave things the way you found them" and avoid the mysterious infection scenarios described above.

  • The Old New Thing

    The magical healing properties of safe mode - bonus content

    • 35 Comments

    Okay, so you already read The healing properties of safe mode in TechNet Magazine. Here's the bonus content that was cut for space.

    First, the original title was "The Magical Healing Powers of Safe Mode," but it got trimmed for space reasons. (I'm a little disappointed in the German translation of the first sentence. The parenthesized phrase begs for one of those famous, incomprehensibly long adjectival constructions seen only in German. Instead, the translator gave up and simply omitted the phrase. On the other hand, the German version uses the original title, so maybe it's not so bad after all.)

    Useless Windows history: The feature now known as safe mode went through many other names before the final name was settled upon.

    • Fail-safe boot
    • FailSafe boot
    • Fail-safe mode
    • Safe mode
  • The Old New Thing

    Which windows appear in the Alt+Tab list?

    • 16 Comments

    Commenter Phil Quirk wants to know what the rules are for determining which windows appear in the Alt+Tab list. It's actually pretty simple although hardly anything you'd be able to guess on your own. Note: The details of this algorithm are an implementation detail. It can change at any time, so don't rely on it. In fact, it already changed with Flip and Flip3D; I'm just talking about the Classic Alt+Tab window here.

    For each visible window, walk up its owner chain until you find the root owner. Then walk back down the visible last active popup chain until you find a visible window. If you're back to where you started, then put the window in the Alt+Tab list. In pseudo-code:

    BOOL IsAltTabWindow(HWND hwnd)
    {
     // Start at the root owner
     HWND hwndWalk = GetAncestor(hwnd, GA_ROOTOWNER);
    
     // See if we are the last active visible popup
     HWND hwndTry;
     while ((hwndTry = GetLastActivePopup(hwndWalk)) != hwndWalk) {
      if (IsWindowVisible(hwndTry)) break;
      hwndWalk = hwndTry;
     }
     return hwndWalk == hwnd;
    }
    

    The purpose of this algorithm is to assign the most meaningful representative window from each cluster of windows related by ownership. (Notice that the algorithm doesn't care whether the owned window is modal or non-modal.)

    At least that's the simple rule if you're not playing crazy window style games. The WS_EX_TOOLWINDOW and WS_EX_APPWINDOW extended styles were created so people can play games and put their window in the Alt+Tab list or take it out even if the simple rule would normally have decided otherwise. This is one of those "Okay, if you think you're smarter than Windows, here's your chance to prove it" options. Personally, I would avoid them since it makes your window behave differently from the rest of the windows in the system.

    A window with the WS_EX_TOOLWINDOW extended style is treated as if it weren't visible, even if it is. A window with the WS_EX_APPWINDOW extended style is treated as if it has no owner, even if it does.

    Once you start adding these extended styles, you enter the world of "I'm trying to work around the rules" and the result is typically even worse confusion than what you had without them.

    I'm not sure what the original commenter is getting at. The window hierarchy described in the suggestion (which doesn't make it so much a suggestion as it is a request for me to debug their problem) says that window C is modal on both windows A and B, which doesn't make sense to me, since a window has only one owner.

    The algorithm for choosing the Alt+Tab representative from each cluster of windows may not be the best, but it's what we have. I wouldn't be surprised if the details are tweaked from time to time. No, wait, let me rephrase that. I know that the details are tweaked from time to time. The spirit of the operation is preserved (to show the windows the user can switch to, using the most "natural" candidate for each cluster of windows), but the specific details may be fine-tuned as the concept of "naturalness" is refined.

  • The Old New Thing

    Polling by sleeping versus polling by waiting with a timeout

    • 34 Comments

    Commenter Francois Boucher asks whether it's better to write a background worker thread that polls with Sleep() and a flag, or one that polls by waiting for an event with a timeout:

    // method A
    while (fKeepGoing) {
     .. a little work ..
     Sleep(50);
    }
    
    // method B
    do {
     .. a little work ..
    } while (WaitForSingleObject(hEvent, 50) == WAIT_TIMEOUT);
    

    "Which scenario is better? The first one uses only 1 handle for the thread. The second one will use 2. But is the first scenario wasting more thread time? Is it worth using the event (kernel object)?"

    Yeah, whatever.

    I don't quite understand why you want to pause every so often. Why not just do the work at low priority? When there are more important things to do, your background thread will stop doing whatever it's doing. When there is an available CPU, then your background thread will do its thing as fast as it can (until something with higher priority arrives).

    The only thing I can think of is that by adding the pauses, your program won't look like it's consuming 100% CPU while the background processing is going on. Which is true, but then again, that's not much consolation. "Wow, with these changes, my spell check takes only 10% of the CPU. But then again, it also takes 10 times as long." Is that an improvement? You made your customer wait ten times as long for the document to be spell checked. That's ten times less battery life for your laptop.

    And generally speaking, polling should be avoided because it carries negative consequences for system performance. So we're basically asking, "Which is better, hammering with a screwdriver or hammering with pliers?"

    But let's say for the sake of argument that this "back off periodically" polling loop is actually the right design, and the only argument is which of the above two methods is "better" in terms of the criteria listed above (handles, thread time).

    It still doesn't matter.

    Method A has one fewer handle, but one more flag. So the total number of things you have to keep track of is the same either way.

    "Oh, but I save the overhead of an additional handle."

    Dude, you're already burning a thread. A single handle to an event is noise compared to the cost of a thread.

    "But I save the cost of validating the handle and following the reference to the underlying kernel object."

    Dude, you're about to go to sleep for 50 milliseconds. Saving a few thousand clocks is noise compared to 50 milliseconds.

    The flag method does have some other problems, none of which are deal-breaking, but are things to bear in mind.

    First, that flag that you're checking. There's no synchronization on that variable, so the background thread may run a smidge longer than necessary because the change hasn't yet propagated to the CPU running the loop. Similarly, the sleep loop does take a little longer to notice that it should stop working. If the fKeepGoing flag is set to FALSE during the sleep, the sleep will still run to completion before the loop finds out.

    In the grand scheme of things, however, the extra 50 to 100 milliseconds are probably not a big deal. The background thread is a little slow to shut down, that's all. The user will probably not even notice that the CPU meter was higher than normal for an additional tenth of a second. After all, the typical human reaction time is 100 milliseconds anyway.

    I'm assuming that the code that signals the background thread doesn't sit around waiting for the background thread to clean up. If it does, then this 100ms delay may start to combine with other delays to turn into something the user may start to notice. A tenth of a second here, a tenth of a second there, soon you may find yourself accumulating a half second's delay, and that's a delay the human brain can perceive.

  • The Old New Thing

    You just have to accept that the file system can change

    • 31 Comments

    A customer who is writing some sort of code library wants to know how they should implement a function that determines whether a file exists. The usual way of doing this is by calling GetFileAttributes, but what they've found is that sometimes GetFileAttributes will report that a file exists, but when they get around to accessing the file, they get the error ERROR_DELETE_PENDING.

    The lesser question is what ERROR_DELETE_PENDING means. It means that somebody opened the file with FILE_SHARE_DELETE sharing, meaning that they don't mind if somebody deletes the file while they have it open. If the file is indeed deleted, then it goes into "delete pending" mode, at which point the file deletion physically occurs when the last handle is closed. But while it's in the "delete pending" state, you can't do much with it. The file is in limbo.

    You just have to be prepared for this sort of thing to happen. In a pre-emptively multi-tasking operating system, the file system can change at any time. If you want to prevent something from changing in the file system, you have to open a handle that denies whatever operation you want to prevent from happening. (For example, you can prevent a file from being deleted by opening it and not specifying FILE_SHARE_DELETE in your sharing mode.)

    The customer wanted to know how their "Does the file exist?" library function should behave. Should it try to open the file to see if it is in delete-pending state? If so, what should the function return? Should it say that the file exists? That it doesn't exist? Should they have their function return one of three values (Exists, Doesn't Exist, and Is In Funky Delete State) instead of a boolean?

    The answer is that any work you do to try to protect users from this weird state is not going to solve the problem because the file system can change at any time. If a program calls "Does the file exist?" and the file does exist, you will return true, and then during the execution of your return statement, your thread gets pre-empted and somebody else comes in and puts the file into the delete-pending state. Now what? Your library didn't protect the program from anything. It can still get the delete-pending error.

    Trying to do something to avoid the delete-pending state doesn't accomplish anything since the file can get into that state after you returned to the caller saying "It's all clear." In one of my messages, I wrote that it's like fixing a race condition by writing

    // check several times to try to avoid race condition where
    // g_fReady is set before g_Value is set
    if (g_fReady && g_fReady && g_fReady && g_fReady && g_fReady &&
        g_fReady && g_fReady && g_fReady && g_fReady && g_fReady &&
        g_fReady && g_fReady && g_fReady) { return g_Value; }
    

    The compiler folks saw this message and got a good chuckle out of it. One of them facetiously suggested that they add code to the compiler to detect this coding style and not optimize it away.

  • The Old New Thing

    If you say that you don't care about something, you shouldn't be upset that it contains garbage

    • 32 Comments

    There are many situations where you pass a structure to a function, and the function fills in the structure with information you request. In some cases, the function always fills in the entire structure (example: GlobalMemoryStatus). In other cases, you tell the function which bits of information you care about, to save the function the effort of computing something you weren't interested in anyway (example: TreeView_GetItem).

    In the latter case, if you say that you aren't interested in certain parts of the structure, and then you change your mind and start paying attention to them, don't be surprised if you find that there's nothing interesting there. After all, you said you didn't care.

    For example, if you call TreeView_GetItem and set the mask to TVIF_IMAGE | TVIF_PARAM, this means that you want the function to set the iImage and lParam members of the TVITEM structure and that you don't care about the rest. After the call returns, the values of those two members are defined, since you said that's what you wanted, and the remainder of the output fields are undefined. They might contain useful information, they might contain garbage, you're not supposed to care since you said that you didn't.

    Why might fields you said you didn't care about still contain information (correct or incorrect)? It might be that the value is so easy to compute that checking whether the value should be set takes more work than actually setting it! In such a case, the function might choose to set the value even if you didn't say that you needed it.

    On the other hand, the value might be an artifact of a translation layer: You pass a structure saying, "I'm interested in two out of the four members." The function in turn calls a lower lever function with a different structure, saying, "I'm interested in two out of the five members of this different structure." After the call returns, the middle-man function converts the lower-level structure to the higher-level structure. Sure, it may also "convert" stuff that was never asked for, but you said you weren't interested, so they just get garbage. In other words, the function you're calling might be defined like this:

    // The pinfo parameter points to this structure
    struct FOOINFO {
     DWORD dwInUse;
     DWORD dwAvailable;
     DWORD dwRequested;
     DWORD dwDenied;
    };
    
    // The dwMask parameter can be a combination of these values
    #define FOOINFO_INUSE     0x0001
    #define FOOINFO_AVAILABLE 0x0002
    #define FOOINFO_REQUESTED 0x0004
    #define FOOINFO_DENIED    0x0008
    
    BOOL GetFooInfo(FOOINFO *pinfo, DWORD dwMask);
    

    Now, the GetFooInfo function might just be a middle man that talks to another component to do the real work.

    // lowlevel.h
    
    struct LOWLEVELSTATS {
     DWORD dwUnitSize;
     DWORD dwUnitsInUse;
     DWORD dwUnitsAvailable;
     DWORD dwUnitsRequested;
     DWORD dwUnitsGranted;
     DWORD dwTotalRequests;
    };
    
    // The dwMask parameter can be a combination of these values
    #define LLSTATS_UNITSIZE  0x0001
    #define LLSTATS_INUSE     0x0002
    #define LLSTATS_AVAILABLE 0x0004
    #define LLSTATS_REQUESTED 0x0008
    #define LLSTATS_GRANTED   0x0020
    #define LLSTATS_REQUESTS  0x0040
    
    BOOL GetLowLevelStatistics(LOWLEVELSTATS *pstats, DWORD dwMask);
    

    The resulting GetFooInfo function merely translates the call from the application into a call to the GetLowLevelStatistics function:

    BOOL GetFooInfo(FOOINFO *pinfo, DWORD dwMask)
    {
     LOWLEVELSTATS stats;
     DWORD dwLowLevelMask = LLSTATS_UNITSIZE;
    
     if (dwMask & FOOINFO_INUSE)
      dwLowLevelMask |= LLSTATS_INUSE;
     if (dwMask & FOOINFO_AVAILABLE)
      dwLowLevelMask |= LLSTATS_AVAILABLE;
     if (dwMask & FOOINFO_REQUESTED)
      dwLowLevelMask |= LLSTATS_REQUESTED;
     if (dwMask & FOOINFO_DENIED)
      dwLowLevelMask |= LLSTATS_REQUESTED | LLSTATS_GRANTED;
    
     if (!GetLowLevelStatistics(&stats, dwLowLevelMask))
      return FALSE;
    
     // Convert the LOWLEVELSTATS into a FOOINFO
     pinfo->dwInUse = stats.dwUnitSize * stats.dwUnitsInUse;
     pinfo->dwAvailable = stats.dwUnitSize * stats.dwUnitsAvailable;
     pinfo->dwRequested = stats.dwUnitSize * stats.dwUnitsRequested;
     pinfo->dwDenied = stats.dwUnitSize *
                       (stats.dwUnitsRequested - stats.dwUnitsGranted);
    
     return TRUE;
    }
    

    Notice that if you ask for just FOOINFO_DENIED, you still get the dwRequested as a side effect, since computing the number of requests that were denied entails obtaining the total number of requests. On the other hand, you also get garbage for dwInUse since the call to GetLowLevelStatistics didn't ask for LLSTATS_INUSE, but the code that converts the LOWLEVELSTATS to a FOOINFO doesn't know that and converts the uninitialized garbage. But since you said that you didn't care about the dwInUse member, you shouldn't be upset that it contains garbage.

    You now know enough to answer this person's question.

    (Note of course that I'm assuming we are not returning uninitialized garbage across a security boundary.)

  • The Old New Thing

    What determines which programs show up on the front page of the Windows XP Start menu?

    • 57 Comments

    The principle is that programs you've run most often recently are the ones that show up on the front page of the Start menu. At least, that's what we started with, but it turns out that some fine-tuning was needed in order to get the experience to be more "natural".

    The basic rule is that each time you launch a program, it "earns a point", and the longer you don't launch a program, the more points it loses. The Start menu then shows the programs that have the most points. That's about all I'm going to say about the mechanics of point-winning for a variety of reasons.

    • The details of the method by which the programs on the Start menu are selected are still patent-pending.†
    • The details are also extraordinarily boring. I get drowsy every time I have to explain them.
    • Exposing the details would just encourage people to "game the system" in order to improve the placement of their programs on the Start menu.
    • Formally documenting the details would implicitly grant permission for people to rely on those details. This would prevent the Start menu designers from making adjustments to the rules (for example, to combat people who "game the system") or to scrap the process entirely and replace it with a whole new method.
    • It's not this part of the Start menu selection algorithm that puzzles people.

    After the basic rule is applied, the fine-tuning and detail-following kick in. Those are the parts that are puzzling to most people. The next several entries will go into many of the subtleties and fine-tuning behind the Start menu's list of frequently-used programs.

    Now, you may wonder about all these subtleties and whether they're really necessary, but it is these little fine-tuning steps that make the final result more useful and feel natural.

    Please hold off your questions until the (two-week!) series is complete, because I suspect a later entry will answer them. (This series is an expansion upon the TechNet column on the same topic. If you've read the TechNet article, then a lot of this series will be review.)

    Pre-emptive snarky comment

    †"Software patents suck!" It's irrelevant what your or my opinion of software patents is. So long as they are legal, they will exist, and you and I will just have to deal with it. If you want a change, write to your congressman. Support candidates whose position on software patents is compatible with yours. Complaining to me accomplishes nothing.‡ It's sad that I have to write this, but any time somebody writes the word "patent" the comments degenerate into flamology about patents. I have a few future entries about patents; the response to this article will determine whether they stay on the schedule or quietly vanish like the stories about Bob.

    ‡Well, it does accomplish something: It gives me another reason to stop blogging.*

    *Along with the people who keep bugging me about using daggers instead of asterisks. Hint: Asterisks already mean something in computer programming.

  • The Old New Thing

    Performance consequences of polling

    • 52 Comments

    Polling kills.

    A program should not poll as a matter of course. Doing so can have serious consequences on system performance. It's like checking your watch every minute to see if it's 3 o'clock yet instead of just setting an alarm.

    First of all, polling means that a small amount of CPU time gets eaten up at each poll even though there is nothing to do. Even if you tune your polling loop so its CPU usage is only, say, a measly one tenth of one percent, once this program is placed on a Terminal Server with 800 simultaneous connections, your 0.1% CPU has magnified into 80% CPU.

    Next, the fact that a small snippet of code runs at regular intervals means that it (and all the code that leads up to it) cannot be pruned from the system's working set. They remain present just to say "Nope, nothing to do." If your polling code touches any instance data (and it almost certainly will), that's a minimum of one page's worth of memory per instance. On an x86-class machine, that's 4K times the number of copies of the program running. On that 800-user Terminal Server machine, you've just chewed up 3MB of memory, all of which is being kept hot just in case some rare event occurs.

    Finally, polling has deleterious effects even for people who aren't running humongous Terminal Server machines with hundreds of users. A single laptop will suffer from polling, because it prevents the CPU from going to more power-efficient sleep states, resulting in a hotter laptop and shorter battery life.

    Of course, Windows itself is hardly blame-free in this respect, but the performance team remains on the lookout for rogue polling in Windows and "politely reminds" teams they find engaging in polling that they should "strongly consider" other means of accomplishing what they're after.
