June, 2009

  • The Old New Thing

    Why does Explorer use the term KB instead of KiB?


    Although the International Electrotechnical Commission established the term kibibyte for 1024 bytes, with the abbreviation KiB, Windows Explorer continues to use the abbreviation KB. Why doesn't Explorer get with the program?

    Because nobody else is on the program either.

    If you look around you, you'll find that nobody (to within experimental error) uses the terms kibibyte and KiB. When you buy computer memory, the amount is specified in megabytes and gigabytes, not mebibytes and gibibytes. The storage capacity printed on your blank CD is indicated in megabytes. Every document on the Internet (to within experimental error) which talks about memory and storage uses the terms kilobyte/KB, megabyte/MB, gigabyte/GB, etc. You have to go out of your way to find people who use the terms kibibyte/KiB, mebibyte/MiB, gibibyte/GiB, etc.

    In other words, the entire computing industry has ignored the guidance of the IEC.

    Explorer is just following existing practice. Everybody (to within experimental error) refers to 1024 bytes as a kilobyte, not a kibibyte. If Explorer were to switch to the term kibibyte, it would merely be showing users information in a form they cannot understand, and for what purpose? So you can feel superior because you know what that term means and other people don't.
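    The existing practice, for what it's worth, is easy to state: divide by 1024 but keep the familiar "KB" label. Here's a minimal portable sketch of that convention; the function name is mine, not Explorer's (the actual shell helper for this job is StrFormatKBSize), and the round-up-to-at-least-1-KB behavior matches what you see in Explorer's details view.

```c
#include <stdio.h>
#include <string.h>

/* Format a byte count the way the industry (and Explorer) does:
   divide by 1024 but keep the "KB" label. Rounds up, so even a
   1-byte file reports "1 KB". Illustrative only; the real shell
   helper is StrFormatKBSize. */
void format_kb(unsigned long long bytes, char *buf, size_t bufsize)
{
    unsigned long long kb = (bytes + 1023) / 1024; /* round up */
    snprintf(buf, bufsize, "%llu KB", kb);
}
```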

    For an explanation of other storage units, you can consult this helpful chart from xkcd.

  • The Old New Thing

    Why does MS-DOS use 8.3 filenames instead of, say, 11.2 or 16.16?


    When I discussed years ago why operating system files tend to follow the old 8.3 file name convention, I neglected to mention why the old MS-DOS filename convention was 8.3 and not, say, 11.2 or 16.16.

    It's a holdover from CP/M.

    As I noted when I discussed the old MS-DOS wildcard matching rules, MS-DOS worked hard at being compatible with CP/M. And CP/M used 8.3 filenames.

    Why did CP/M use 8.3 filenames? I don't know. There's nothing obvious in the CP/M directory format that explains why those two reserved bytes couldn't have been used to extend the file name to 10.3. But maybe they figured that eight was a convenient number.
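    For reference, here is my transcription of the classic CP/M 2.2 directory entry as a C struct (field names follow the usual CP/M documentation conventions; this is not code from any particular source). The 8+3 name fields and the spare bytes in question are easy to spot:

```c
#include <stdint.h>

/* CP/M 2.2 directory entry: 32 bytes. Every field is byte-sized,
   so there is no padding on any sane compiler. */
struct cpm_dirent {
    uint8_t status;   /* user number, or 0xE5 for a deleted entry */
    uint8_t name[8];  /* file name, space-padded   -> the "8"  */
    uint8_t ext[3];   /* file type, space-padded   -> the ".3" */
    uint8_t ex;       /* extent number (low bits) */
    uint8_t s1;       /* reserved */
    uint8_t s2;       /* reserved early on; later used for extent high bits */
    uint8_t rc;       /* record count in this extent */
    uint8_t al[16];   /* allocation map: blocks used by this extent */
};
```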

  • The Old New Thing

    Why does the CreateProcess function modify its input command line?


    One of the nasty gotchas of the CreateProcess function is that the lpCommandLine parameter must be a mutable pointer. If you pass a pointer to memory that cannot be written to (for example, a pointer to a page that is marked PAGE_READONLY), then you might crash. Commenter Ritchie wonders why this parameter is so weird.

    Basically, somebody back in the 1980's wanted to avoid allocating memory. (Another way of interpreting this is that somebody tried to be a bit too clever.)

    The CreateProcess function temporarily modifies the string you pass as the lpCommandLine in its attempt to figure out where the program name ends and the command line arguments begin. Now, it could have made a copy of the string and made its temporary modifications to the copy, but hey, if you modify the input string directly, then you save yourself an expensive memory allocation operation. Back in the old days, people fretted over memory allocations, so this class of micro-optimization was done as a matter of course. Of course, nowadays, it seems rather antiquated.
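    The actual CreateProcess source isn't public, but the trick being described can be sketched in portable C: drop a terminating NUL into the caller's buffer to split off the program name, read it, then put the original character back. If the buffer lives in read-only memory, the first store is what crashes. (The function name here is mine, purely for illustration.)

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the temporarily-modify-the-input trick (not the real
   CreateProcess code). Extracts the first token of "program args"
   in place: writes a NUL over the first space, copies out the
   program name, then restores the caller's string exactly.
   Requires cmdline to point to writable memory. */
void first_token_in_place(char *cmdline, char *out, size_t outsize)
{
    char *space = strchr(cmdline, ' ');
    if (space)
        *space = '\0';   /* the temporary modification: this store
                            crashes if cmdline is read-only */
    snprintf(out, outsize, "%s", cmdline);
    if (space)
        *space = ' ';    /* put the string back the way we found it */
}
```

    Passing a string literal (which the compiler may place in read-only storage) to such a function is exactly the mistake that makes CreateProcessW crash; a writable array like `char cmd[] = "notepad.exe file.txt";` is safe.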

    Now, there may also be good technical reasons (as opposed to merely performance considerations) for avoiding allocating memory on the heap. When a program crashes, the just-in-time debugger is launched with the CreateProcess function, and you don't want to allocate memory on the heap if the reason the program crashed is that the heap is corrupted. Otherwise, you can get yourself into a recursive crash loop: While trying to launch the debugger, you crash, which means you try to launch the debugger to debug the new crash, which again crashes, and so on. The original authors of the CreateProcess function were careful to avoid allocating memory off the heap, so that in case the function is being asked to launch the debugger, it won't get waylaid by a corrupted heap.

    Whether these concerns are still valid today I am not sure, but it was those concerns that influenced the original design and therefore the interface.

    Why is it that only the Unicode version is affected? Well, because the ANSI version of the function just converts its strings to Unicode and then calls the Unicode version of the function. Consequently, the ANSI version of the function happens to implement the workaround as a side effect of its original goal: The string passed to the Unicode version of the function is a temporary string!

    Exercise: Why is it okay for the ANSI version of the CreateProcess to allocate a temporary string from the heap when the Unicode function cannot?

  • The Old New Thing

    A concrete illustration of practical running time vs big-O notation


    One of the five things every Win32 programmer needs to know is that memory latency can throw your big-O computations out the window. Back in 2003, I ran into a concrete example of this.

    Somebody started with the algorithm presented in Fast Algorithms for Sorting and Searching Strings by Jon L. Bentley and Robert Sedgewick, implemented it in C#, and compared the performance against a HashTable, TernaryTree and SortedList. Surprisingly, the hash table won on insertion and retrieval of tens of thousands of randomly-generated strings. Why?

    Remember, big-O notation hides the constants, and those constants can get pretty big. What's more, the impact of those constants is critical for normal-sized workloads. The big-O notation allows you to compare algorithms when the data sets become extremely large, but you have to keep the constants in mind to see when the balance tips.

    The central point of my presentation at the PDC was that complexity analysis typically ignores memory bandwidth effects and assumes that all memory accesses perform equally. This is rarely true in practice. As we saw, leaving L2 is a big hit on performance, and accessing the disk is an even greater hit.

    The tree doesn't rebalance, so inserting strings in alphabetical order will result in a bad search tree. (They do call this out in their paper.) To locate a k-character string, Bentley-Sedgewick traverses at least k pointers, usually more. ("How much more" depends on how many prefixes are shared. More shared prefixes = more pointers traversed.) It also requires k(4p) bytes of memory to store that string, where p is the size of a pointer. Remember those pesky constants. High constant factor overhead starts to kill you when you have large datasets.
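    To make the k(4p) figure concrete, here is a sketch of the node from the paper: one split character plus three child pointers. On typical ABIs the char is padded out to pointer alignment, so each node (one per stored character) occupies four pointers' worth of memory.

```c
#include <stddef.h>

/* Ternary-search-tree node, as in Bentley & Sedgewick's paper:
   one split character plus three child pointers. Alignment padding
   rounds the char up to pointer size, so the node occupies 4p bytes
   where p = sizeof(void *). Storing a k-character string takes at
   least k nodes: the k(4p) bytes mentioned above. */
typedef struct tnode {
    char splitchar;
    struct tnode *lokid, *eqkid, *hikid;
} Tnode;
```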

    More on those constants: Complexity analysis assumes that an add instruction executes in the same amount of time as a memory access. This is not true in practice, but the difference is a bounded constant factor, so it can be ignored for big-O purposes. Note, however, that that constant often exceeds one million if you take a page fault. One million is a big constant.

    Going back to memory bandwidth effects: At each node, you access one character and one pointer. So you use only 6 bytes out of a 64-byte cache line. You're wasting 90% of your bus bandwidth and you will certainly fall out of L2.

    Bentley-Sedgewick says that this is beaten out by not traversing the entire string being searched for in the case of a miss. I.e., their algorithm is tuned for misses. If you expect most of your probes to be misses, this can be a win. (The entire string is traversed on a hit, of course, so there is no gain for hits.)

    Note also that this "cost" for traversing the string on a miss is overstated due to memory bandwidth effects. The characters in a string are contiguous, so traversing the string costs you only L/64 cache lines, where L is the length of the string, and one potential page fault, assuming your string is less than 4KB. Traversing the tree structure costs you at least L cache lines and probably more depending on your branching factor, as well as L potential page faults.

    Let's look at the testing scenario again. The testing was only on hits, so the improved performance on misses was overlooked entirely. What's more, the algorithm takes advantage of strings with common prefixes, but the testing scenario used randomly-generated strings, which generates a data set opposite from the one the algorithm was designed for, since randomly-generated strings are spread out across the problem space instead of being clustered with common prefixes.

    Those are some general remarks; here are some performance notes specific to the CLR.

    I don't know whether it does or not, but I would not be surprised if System.String.GetHashCode caches the hash value in the string, which would mean that the cost of computing the hash is shared by everybody who uses it in a hashing operation. (Therefore, if you count the cost incurred only by the algorithm, hashing is free.)

    Note also that Bentley-Sedgewick's insert() function stores the object back into the tree in the recursive case. Most of the time, the value being stored is the same as the value that was already there. This dirties the cache line for no reason (forcing memory bandwidth to be wasted on a flush) and—particularly painful for the CLR—you hit the write barrier and end up dirtying a whole boatload of cards. A very small change avoids this problem: Change

        p->eqkid = insert(p->eqkid, ++s);

    to

        Tptr newkid = insert(p->eqkid, ++s);
        if (p->eqkid != newkid) p->eqkid = newkid;
    (and similarly in the other branches). "This code is short but subtle, and worth careful study." How very true.

    Note also that if you use their "sleazy hack" of coercing a string to a Tptr, you had to change the type of eqkid from Tptr to object. This introduces a CLR type-check into the inner loop. Congratulations, you just tubed the inner loop performance.

    Now go to the summary at the end of the article.

    1. "Ternary trees do not incur extra overhead for insertion or successful searches." I'm not sure what "extra" means here, but hash tables have the same behavior.
    2. "Ternary trees are usually substantially faster than hashing for unsuccessful searches." Notice that they are optimized for misses.
    3. "Ternary trees gracefully grow and shrink; hash tables need to be rebuilt after large size changes." True, but the CLR hashtable does this so you don't have to. Somebody wrote it for you.
    4. "Ternary trees support advanced searches such as partial-match and near-neighbor search." These operations weren't tested.
    5. "Ternary trees support many other operations, such as traversal to report items in sorted order." These operations weren't tested either.

    Notice that the article doesn't claim that ternary trees are faster than hashing for successful searches. So if that's all you're testing, you're testing the wrong thing. One of the big benefits of ternary trees is the new operations available (4 and 5), but if you're not going to perform those operations, then you're paying for something you don't use.

    There are several morals of the story.

    1. Constants are important.
    2. Memory bandwidth is important.
    3. Performance optimizations for unmanaged code do not necessarily translate to managed code.
    4. What are you really testing?

    Mind you, Bentley and Sedgewick are not morons. They know all this.

    [Typo fixed 11am, thanks Nathan_works and Jachym Kouba.]

  • The Old New Thing

    Spam trackback attack returns; it's not a matter of whether but how much


    Like microsoft.com, the question isn't whether the blogs.msdn.com site is under attack but rather how bad the attack is right now.

    There are a number of regular culprits, like codedstyle.com, anith.com, and simplynetdev.com, but those sites tend to focus on the most recent few articles. A new category of trackback spammer is here: the "I'm going to scrape your entire site and create a trackback for every article" trackback spammer.

    Site                                      From                To                  Count  Rate (pings/hr)
    paidsurveyshub.info                       5/28/2009 04:27 PM  5/28/2009 04:41 PM     27  111
    www.newillinoismesotheliomalawyers.co.cc  5/28/2009 05:18 PM  5/28/2009 05:18 PM      1
    codedstyle.com                            5/29/2009 07:25 AM  5/29/2009 07:26 AM      5  240
    asp-net-hosting.simplynetdev.com          5/29/2009 07:34 AM  5/29/2009 07:36 AM      2  30
    www.anith.com                             5/29/2009 08:24 AM  5/29/2009 08:24 AM      1
    paidsurveyshub.info                       5/29/2009 09:07 AM  5/29/2009 10:26 AM     73  55
    microsoft-sharepoint.simplynetdev.com     5/29/2009 10:39 AM  5/29/2009 10:39 AM      2
    paidsurveyshub.info                       5/29/2009 12:09 PM  5/30/2009 04:33 AM    584  36
    outdoorceilingfansite.info                5/30/2009 11:49 PM  5/31/2009 12:09 AM    206  615
    woodtvstand.info                          5/31/2009 02:47 PM  5/31/2009 06:01 PM    507  157
    patiochairsite.info                       5/31/2009 08:47 PM  5/31/2009 09:18 PM     24  45
    hammockstandsite.info                     5/31/2009 10:20 PM  5/31/2009 10:44 PM     28  68
    indoorgrillsrecipes.info                  6/01/2009 12:46 AM  6/01/2009 01:00 AM     20  81
    portablegreenhousesite.info               6/01/2009 03:05 AM  6/01/2009 04:35 AM    102  67
    uniformstores.info                        6/01/2009 05:41 AM  6/01/2009 07:00 AM     68  51
    asp-net-hosting.simplynetdev.com          6/01/2009 07:13 AM  6/01/2009 07:13 AM      1
    codedstyle.com                            6/01/2009 07:40 AM  6/01/2009 07:40 AM      2
    woodtvstand.info                          6/01/2009 10:27 AM  6/01/2009 11:48 AM    397  294
    patiochairsite.info                       6/01/2009 11:46 AM  6/01/2009 12:02 PM     10  38
    hammockstandsite.info                     6/01/2009 12:07 PM  6/01/2009 12:15 PM     21  158
    indoorgrillsrecipes.info                  6/01/2009 12:17 PM  6/01/2009 12:36 PM     51  161
    portablegreenhousesite.info               6/01/2009 12:36 PM  6/01/2009 01:03 PM     67  149
    uniformstores.info                        6/01/2009 01:04 PM  6/01/2009 01:38 PM     80  141
    paidsurveyshub.info                       6/01/2009 10:47 PM  6/02/2009 01:20 AM     16  6

    I'm pretty sure this will continue for at least the next week. I think I'm going to have to write a script that auto-deletes all these bogus trackbacks.

  • The Old New Thing

    What does the "Zw" prefix mean?


    If you spend time in kernel mode, you're accustomed to seeing functions with two-letter (or occasionally, three-letter) prefixes that indicate which component they belong to.

    Ex   Executive                   ExAllocatePool
    Hal  Hardware abstraction layer  HalGetBusData
    Io   I/O manager                 IoAllocateIrp
    Ke   Kernel                      KeBugCheck
    Ks   Kernel streaming            KsAcquireControl
    Mm   Memory manager              MmGetPhysicalAddress
    Ob   Object manager              ObReferenceObjectByHandle
    Po   Power management            PoSetSystemState
    Se   Security                    SeAccessCheck
    Tdi  Transport driver interface  TdiProviderReady
    Zw   ????                        ZwCancelTimer

    What does the "Zw" mean?

    Answer: Nothing.

    The people who chose the letters wanted to pick something that was unlikely to collide with anything. Perhaps they had a prior bad experience with having chosen a prefix, only to find that somebody ahead of them claimed it already?

  • The Old New Thing

    If you want to consume all the virtual address space, well, then go ahead and consume it, you don't need my help


    Commenter Matthew Chaboud asks if there's an easy way to consume all the virtual address space below 4GB, short of, well, actually allocating it. "It seems like there should be a cleaner way to do this."

    If you want to consume all the virtual address space, then call VirtualAlloc until you turn blue. Programs shouldn't care what address they get back from a memory allocation function; they should handle values below 2GB and above 2GB with equal facility.

    It's not like there's a ConsumeAllAvailableVirtualAddressSpaceAndExhaustTheHeap function. (Is there an AllocateAllRemainingDiskSpaceAndFillExistingFilesWithZeroes function?) What would be the point of such a function? Once you call it, you have run out of memory!

    If Mr. Chaboud is talking about keeping programs away from the bottom 4GB of virtual address space on a 64-bit machine, then a much easier way to do this is to set the AllocationPreference configuration setting to specify that memory should be allocated from high addresses first. (But I don't think that's the scenario that prompted the original question, because on 64-bit Windows, the default heap is above the 4GB boundary, so there would be no need to exhaust the heap in order to consume the memory at virtual addresses below 4GB.)
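    For reference, the setting in question is (as far as I know) the AllocationPreference value under the Memory Management key; setting the MEM_TOP_DOWN bit makes allocations come from high addresses first, so low addresses are handed out last. A sketch of the .reg fragment, assuming the documented value:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"AllocationPreference"=dword:00100000
```

    This is a machine-wide setting intended for testing 64-bit readiness, not something to deploy casually.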

    Correction: Pavel Lebedinsky points out that the default heap is below 4GB on 64-bit machines. It used to be above the 4GB boundary on earlier versions of 64-bit Windows, but I guess they changed it.

  • The Old New Thing

    Cool guys don't look at explosions


    Thanks to NPR's Monkey See blog for finding this:

  • The Old New Thing

    Sure, I can get spurious WM_MOUSEMOVE messages, but why do they keep streaming in?


    I wrote some time ago that the window manager generates spurious WM_MOUSEMOVE messages in order to let programs know that the mouse has entered the window even if the reason was a window hierarchy reshuffle rather than a physical motion of the pointing device. But some people have noticed that that explanation fails to account for all the WM_MOUSEMOVE messages that are being delivered. In particular, the reasoning fails to explain why a stream of WM_MOUSEMOVE messages is being generated. So where is this infinite stream of WM_MOUSEMOVE messages coming from, even when the window hierarchy is stable?

    They're most likely coming from some third-party so-called enhancement software.

    The Windows performance and mobility teams keep a close eye on this sort of continuous phenomenon. The performance folks are interested because this continuous stream of messages sucks away resources that could be used for something more productive. It's not just the cost in CPU. The memory in the message handling code path can't be paged out since it's being hit all the time. The context switches force CPU caches to be flushed. Algorithms which had been tuned to reside entirely within the L2 cache now find themselves going out to slower main memory. Meanwhile, the mobility team is concerned because all this continuous activity runs down your battery, prevents the CPU from going into a low-power state, and prevents your screen saver from kicking in.

    If you find a continuous stream of WM_MOUSEMOVE messages, then there is some continuous activity going on that is causing it. It might be some software that is polling the mouse in order to provide "extra value". For example, they might check the mouse cursor position and try to guess what it is positioned over, but they never realize that "You know, if nothing has changed, the answer is probably the same as it was the last time you checked." Or, as the author of the linked posting above eventually figured out, it might be a buggy wireless mouse which not only is sucking down your CPU and preventing your screen saver from running, but is also draining your wireless mouse battery!
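    The "if nothing has changed, the answer is the same" idea can be sketched in portable C (a Win32 version would read the position with GetCursorPos; the names here are mine): cache the last observed position and skip the expensive work when it hasn't moved.

```c
#include <stdbool.h>

/* Sketch of polite polling: remember the last observed position and
   redo the expensive work (hit-testing, window lookup, etc.) only
   when the position actually changed. */
typedef struct { int x, y; } Point;

typedef struct {
    Point last;
    bool  have_last;
    int   recompute_count;  /* how many times we did real work */
} MouseWatcher;

/* Returns true if the expensive work was (re)done on this poll. */
bool poll_mouse(MouseWatcher *w, Point current)
{
    if (w->have_last && current.x == w->last.x && current.y == w->last.y)
        return false;       /* nothing changed; reuse the old answer */
    w->last = current;
    w->have_last = true;
    w->recompute_count++;   /* ...expensive recomputation goes here... */
    return true;
}
```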

  • The Old New Thing

    The butter and the money for the butter


    In a discussion a few years ago, I saw the phrase, "Now you have the butter and the money." This was new to me, and a little Web searching (guided in part by a guess at the author's nationality) revealed it to be a French proverb, the full version of which is On ne peut pas avoir le beurre et l'argent du beurre: "You can't have the butter and the money for the butter." It's a really nice phrase, and maybe someday I'll be able to use it.

    Bonus butter idiom: Reading the blog of a German colleague, I ran across the phrase alles wieder in Butter ("everything back in butter"), which from context appeared to mean something like "everything's all right again." Some more Web searching suggests that I was basically right, and that the idiom comes from the Middle Ages: To prevent glassware transported over the Alps from breaking in transit, a clever businessman discovered that he could set the glasses in a cask, then pour hot butter over them. As the butter cooled, it held the glasses in place, thereby preventing them from rattling against each other and cracking during transport. Everything was back in butter.
