January, 2006

  • The Old New Thing

    Performance consequences of polling


    Polling kills.

    A program should not poll as a matter of course. Doing so can have serious consequences on system performance. It's like checking your watch every minute to see if it's 3 o'clock yet instead of just setting an alarm.

    First of all, polling means that a small amount of CPU time gets eaten up at each poll even though there is nothing to do. Even if you tune your polling loop so its CPU usage is only, say, a measly one tenth of one percent, once this program is placed on a Terminal Server with 800 simultaneous connections, your 0.1% CPU has magnified into 80% CPU.

    Next, the fact that a small snippet of code runs at regular intervals means that it (and all the code that leads up to it) cannot be pruned from the system's working set. They remain present just to say "Nope, nothing to do." If your polling code touches any instance data (and it almost certainly will), that's a minimum of one page's worth of memory per instance. On an x86-class machine, that's 4K times the number of copies of the program running. On that 800-user Terminal Server machine, you've just chewed up over 3MB of memory, all of which is being kept hot just in case some rare event occurs.

    Finally, polling has deleterious effects even for people who aren't running humongous Terminal Server machines with hundreds of users. A single laptop will suffer from polling, because it prevents the CPU from going to more power-efficient sleep states, resulting in a hotter laptop and shorter battery life.

    Of course, Windows itself is hardly blame-free in this respect, but the performance team remains on the lookout for rogue polling in Windows and "politely reminds" teams they find engaging in polling that they should "strongly consider" other means of accomplishing what they're after.

  • The Old New Thing

    There are two types of rebates, and you need to be on the alert


    Many commenters to my earlier entry on sales in France had questions about rebates. Slate explained the whole rebate thing back in 2003. The short version: There are two types of rebates, manufacturer rebates and retailer rebates. Manufacturer rebates exist because they want the retail price to go down, but they are afraid that if they just lowered the wholesale price, retailers would not pass the savings on to the consumer. A manufacturer's rebate ensures that all the benefit of the price drop goes to the consumer and not to any middle-men. Retailer rebates, on the other hand, are carefully crafted schemes designed to trick the consumer into buying the product and then failing to meet all the requirements for redeeming the rebate coupon. Read the Slate article for details.

  • The Old New Thing

    If your callback fails, it's your responsibility to set the error code


    There are many cases where a callback function is allowed to halt an operation. For example, you might decide to return FALSE to the WM_NCCREATE message to prevent the window from being created, or you might decide to return FALSE to one of the many enumeration callback functions such as the EnumWindowsProc callback. When you do this, the enclosing operation will return failure back to its caller: the CreateWindow function returns NULL; the EnumWindows function returns FALSE.

    Of course, when this happens, the enclosing operation doesn't know why the callback failed; all it knows is that it failed. Consequently, it can't set a meaningful value to be retrieved by the GetLastError function.

    If you want something meaningful to be returned by the GetLastError function when your callback halts the operation, it's the callback's responsibility to set that value by calling the SetLastError function.

    This is something that is so obvious I didn't think it needed to be said; it falls into the "because computers aren't psychic (yet)" category of explanation. But apparently it wasn't obvious enough, so now I'm saying it.

  • The Old New Thing

    The vtable does not always go at the start of the object


    Although the diagrams I presented in my discussion of The layout of a COM object place the vtable at the beginning of the underlying C++ object, there is no actual requirement that it be located there. It is perfectly legal for the vtable to be in the middle or even at the end of the object, as long as the functions in the vtable know how to convert the address of the vtable pointer to the address of the underlying object. Indeed, in the second diagram in that article, you can see that the "q" pointer points into the middle of the object.

    Here's an example that puts the vtable at the end of the object:

    class Data {
    public:
     Data() : m_cRef(1) { }
     virtual ~Data() { }
     LONG m_cRef;
    };

    class VtableAtEnd : public Data, public IUnknown {
    public:
     STDMETHODIMP QueryInterface(REFIID riid, void **ppvOut)
     {
      if (riid == IID_IUnknown) {
       *ppvOut = static_cast<IUnknown*>(this);
       AddRef();
       return S_OK;
      }
      *ppvOut = NULL;
      return E_NOINTERFACE;
     }

     STDMETHODIMP_(ULONG) AddRef()
     {
      return InterlockedIncrement(&m_cRef);
     }

     STDMETHODIMP_(ULONG) Release()
     {
      LONG cRef = InterlockedDecrement(&m_cRef);
      if (!cRef) delete this;
      return cRef;
     }
    };
    The layout of this object may very well be as follows:

    [Diagram: the interface pointer p points at the IUnknown vtable pointer, which sits at the end of the object after the Data members; the vtable entries are QueryInterface, AddRef, and Release.]

    Observe that in this particular object layout, the vtable resides at the end of the object rather than at the beginning. This is perfectly legitimate behavior. Although it is the most common object layout to put the vtable at the beginning, COM imposes no requirement that it be done that way. If you want to put your vtable at the end and use negative offsets to access your object's members, then more power to you.

  • The Old New Thing

    How air conditioning revolutionized competitive bicycling


    I'm not really interested in sports. Teams, standings, scores, who got traded to what team, none of that is interesting to me. What I am interested in, however, is "meta-sports": The business of sports, the technology of sports, the evolution of techniques, changes in the rules, that sort of thing. That's one of the reasons I'm a fan of the radio program Only a Game. (The other, more important, reason can be summed up in two words: Charlie Pierce.)

    All that is a rather lengthy lead-in to Transition Game, Nick Schulz's look at the world behind sports. He covers what it is about sports that I like, with none of the stuff I don't like. (I've linked to him before, but I like him so much I'm going to do it again.) You too can learn how air conditioning revolutionized competitive bicycling. Or you can learn about the use of robots as camel jockeys in Qatar. Here's a picture. It's like an episode of Futurama come to life.

  • The Old New Thing

    The cost of trying too hard: String searching


    There are many algorithms for fast string searching, but the running time of a string search is inherently O(n), where n is the length of the string being searched: If m is the length of the string being searched for (which I will call the "target string"), then any algorithm that accesses fewer than n/m elements of the string being searched will have a gap of m unaccessed elements, which is enough room to hide the target string.

    More advanced string searching algorithms can take advantage of characteristics of the target string, but in the general case, where the target string is of moderate size and is not pathological, all that the fancy search algorithms give you over the naive search algorithm is a somewhat smaller multiplicative constant.

    In the overwhelming majority of cases, then, a naive search algorithm is adequate, as long as you're searching for normal strings and not edge cases like "Find aaaaaaaaaaaaaaab in the string aaaaaaaaaaaaaabaaaaaaaaaaaaaaab". If you have a self-similar target string, the running time of a naive search is O(mn), where m is the length of the target string. The effort in the advanced searching algorithms goes towards diminishing the effect of m, but they pay for it by requiring preliminary analysis of the target string. If your searches are for "relatively short", "normal" target strings, then the benefit of this analysis doesn't merit the cost.

    That's why nearly all library functions that do string searching use the naive algorithm. The naive algorithm is the correct algorithm over 99% of the time.

  • The Old New Thing

    From Doom to Gloom: The story of a video game


    NPR's Morning Edition developed a series on the subject of flops, and one of their segments was devoted to the rise and fall of John Romero. You can read more about the phenomenon known as Daikatana in a huge series on Gamespot. Set aside at least an hour if you choose to read it. You can also read the Dallas Observer story that opened the floodgates.

  • The Old New Thing

    The cost of trying too hard: Splay trees


    Often, it doesn't pay off to be too clever. Back in the 1980's, I'm told, the file system group was working out what in-memory data structures to use to represent the contents of a directory so that looking up a file by name was fast. One of the experiments they tried was the splay tree. Splay trees were developed in 1985 by Sleator and Tarjan as a form of self-rebalancing tree that provides O(log n) amortized cost for locating an item in the tree, where n is the number of items in the tree. (Amortized costing means roughly that the cost of M operations is O(M log n). The cost of an individual operation is O(log n) on average, but an individual operation can be very expensive as long as it's made up for by previous operations that came in "under budget".)

    If you're familiar with splay trees you may already see what's about to happen.

    A very common operation in a directory is enumerating and opening every file in it, say, because you're performing a content search through all the files in the directory or because you're building a preview window. Unfortunately, when you sequentially access all the elements in a splay tree in order, this leaves the tree totally unbalanced. If you enumerate all the files in the directory and open each one, the result is a linear linked list sorted in reverse order. Locating the first file in the directory becomes an O(n) operation.

    From a purely algorithmic analysis point of view, the O(n) behavior of that file open operation is not a point of concern. After all, in order to get to this point, you had to perform n operations to begin with, so that very expensive operation was already "paid for" by the large number of earlier operations. However, in practice, people don't like it when the cost of an operation varies so widely from use to use. If you arrive at a client's office five minutes early every day for a month and then show up 90 minutes late one day, the explanation "Well, I was early so often that, according to amortized costing, I'm actually still ahead of schedule" will probably not impress your client.

    The moral of the story: Sometimes trying too hard doesn't work.

    (Postscript: Yes, there have been recent research results that soften the worst-case single-operation whammy of splay trees, but these results weren't available in the 1980's. Also, remember that consistency in access time is important.)

  • The Old New Thing

    ReadProcessMemory is not a preferred IPC mechanism


    Occasionally I see someone trying to use the ReadProcessMemory function as an inter-process communication mechanism. This is ill-advised for several reasons.

    First, you cannot use ReadProcessMemory across security contexts, at least not without doing some extra work. If somebody uses "runas" to run your program under a different identity, your two processes will not be able to use ReadProcessMemory to transfer data back and forth.

    You could go to the extra work of getting ReadProcessMemory to work by adjusting the privileges on your process to grant PROCESS_VM_READ permission to the owner of the process you are communicating with, but this throws the doors wide open. Any process running with that identity can read the data you wanted to share, not just the process you are communicating with. If you are communicating with a process of lower privilege, you've just exposed your data to lower-privilege processes other than the one you are interested in.

    What's more, once you grant PROCESS_VM_READ permission, you grant it to your entire process. Not only can that process read the data you're trying to share, it can read anything else that is mapped into your address space. It can read all your global variables, it can read your heap, it can read variables out of your stack. It can even corrupt your stack!

    What? Granting read access can corrupt your stack?

    If a thread grows its stack into the stack guard page, the guard page exception is normally handled by the system, which extends the stack and creates a new guard page. But when the guard page exception happens inside a private "catch all exceptions" handler, such as the one the IsBadReadPtr function uses, it is swallowed privately and never reaches the system's stack-extension logic. As a result, the stack is not grown, and a new stack guard page is not created. When the stack later grows to and then past the point of the prematurely-consumed guard page, what would normally have been a stack guard exception is now an access violation, resulting in the death of the thread and, with it, likely the process.

    You might think you could catch the stack access violation and try to shut down the thread cleanly, but that is not possible for multiple reasons. First, structured exception handling executes on the stack of the thread that encountered the exception. If that thread has a corrupted stack, it becomes impossible to dispatch that exception since the stack that the exception filters want to run on is no longer viable.

    Even if you could somehow run these exception filters on some sort of "emergency stack", you still can't fix the problem. At the point of the exception, the thread could be in the middle of anything. Maybe it was inside the heap manager with the heap lock held and with heap data structures in a state of flux. In order for the process to stay alive, the heap data structures need to be made consistent and the heap lock released. But you don't know how to do that.

    There are plenty of other inter-process communication mechanisms available to you. One of them is anonymous shared memory, which I discussed a few years ago. Anonymous shared memory still has the problem that any process running under the same token as the one you are communicating with can read the shared memory block, but at least the scope of the exposure is limited to the data you explicitly wanted to share.

    (In a sense, you can't do any better than that. The process you are communicating with can do anything it wants with the data once it gets it from you. Even if you somehow arranged so that only the destination process can access the memory, there's nothing stopping that destination process from copying it somewhere outside your shared memory block, at which point your data can be read from the destination process by anybody running with the same token anyway.)

  • The Old New Thing

    At least there's a funny side to spam


    Poorly-drawn cartoons inspired by actual spam subject lines!

    It's pretty much what the title says. Don't forget to read the fan mail.

    Sometimes it's even funny.
