June 2009

  • The Old New Thing

    Why does Explorer use the term KB instead of KiB?

    • 74 Comments

    Although the International Electrotechnical Commission established the term kibibyte for 1024 bytes, with the abbreviation KiB, Windows Explorer continues to use the abbreviation KB. Why doesn't Explorer get with the program?

    Because nobody else is on the program either.

    If you look around you, you'll find that nobody (to within experimental error) uses the terms kibibyte and KiB. When you buy computer memory, the amount is specified in megabytes and gigabytes, not mebibytes and gibibytes. The storage capacity printed on your blank CD is indicated in megabytes. Every document on the Internet (to within experimental error) which talks about memory and storage uses the terms kilobyte/KB, megabyte/MB, gigabyte/GB, etc. You have to go out of your way to find people who use the terms kibibyte/KiB, mebibyte/MiB, gibibyte/GiB, etc.

    In other words, the entire computing industry has ignored the guidance of the IEC.

    Explorer is just following existing practice. Everybody (to within experimental error) refers to 1024 bytes as a kilobyte, not a kibibyte. If Explorer were to switch to the term kibibyte, it would merely be showing users information in a form they cannot understand, and for what purpose? So you can feel superior because you know what that term means and other people don't.
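
    As a concrete illustration of the existing practice, here's a minimal sketch (not Explorer's actual code; the round-up behavior is my assumption) of what "KB" means in conventional usage: divide by 1024, but keep the label KB rather than the IEC's KiB.

        /* Hypothetical Explorer-style size formatting: 1 KB = 1024 bytes,
           rounded up. format_size_kb(2048, buf, sizeof buf) gives "2 KB". */
        #include <stdio.h>

        static void format_size_kb(unsigned long long bytes,
                                   char *buf, size_t cch)
        {
            unsigned long long kb = (bytes + 1023) / 1024; /* round up */
            snprintf(buf, cch, "%llu KB", kb);
        }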

    For an explanation of other storage units, you can consult this helpful chart from xkcd.

  • The Old New Thing

    Why does MS-DOS use 8.3 filenames instead of, say, 11.2 or 16.16?

    • 45 Comments

    When I discussed years ago why operating system files tend to follow the old 8.3 file name convention, I neglected to mention why the old MS-DOS filename convention was 8.3 and not, say, 11.2 or 16.16.

    It's a holdover from CP/M.

    As I noted when I discussed the old MS-DOS wildcard matching rules, MS-DOS worked hard at being compatible with CP/M. And CP/M used 8.3 filenames.

    Why did CP/M use 8.3 filenames? I don't know. There's nothing obvious in the CP/M directory format that explains why those two reserved bytes couldn't have been used to extend the file name to 10.3. But maybe they figured that eight was a convenient number.

  • The Old New Thing

    Why does the CreateProcess function modify its input command line?

    • 27 Comments

    One of the nasty gotchas of the CreateProcess function is that the lpCommandLine parameter must be a mutable pointer. If you pass a pointer to memory that cannot be written to (for example, a pointer to a page that is marked PAGE_READONLY), then you might crash. Commenter Ritchie wonders why this parameter is so weird.

    Basically, somebody back in the 1980's wanted to avoid allocating memory. (Another way of interpreting this is that somebody tried to be a bit too clever.)

    The CreateProcess function temporarily modifies the string you pass as lpCommandLine in its attempt to figure out where the program name ends and the command line arguments begin. Now, it could have made a copy of the string and made its temporary modifications to the copy, but hey, if you modify the input string directly, then you save yourself an expensive memory allocation. Back in the old days, people worried about avoiding memory allocations, so this class of micro-optimization was performed as a matter of course. Of course, nowadays, it seems rather antiquated.
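
    Here's a minimal sketch of the gotcha and the usual workaround, assuming the Unicode entry point (the program being launched is just an example): pass a writable buffer, not a pointer to read-only memory.

        #include <windows.h>

        void LaunchNotepad(void)
        {
            /* Passing L"notepad.exe" directly may crash: the literal can
               live in read-only memory, and CreateProcessW temporarily
               writes to the buffer. A local array is a writable copy. */
            wchar_t cmd[] = L"notepad.exe";

            STARTUPINFOW si = { sizeof(si) };
            PROCESS_INFORMATION pi;
            if (CreateProcessW(NULL, cmd, NULL, NULL, FALSE, 0,
                               NULL, NULL, &si, &pi)) {
                CloseHandle(pi.hThread);
                CloseHandle(pi.hProcess);
            }
        }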

    Now, there may also be good technical reasons (as opposed to merely performance considerations) for avoiding allocating memory on the heap. When a program crashes, the just-in-time debugger is launched with the CreateProcess function, and you don't want to allocate memory on the heap if the reason the program crashed is that the heap is corrupted. Otherwise, you can get yourself into a recursive crash loop: While trying to launch the debugger, you crash, which means you try to launch the debugger to debug the new crash, which again crashes, and so on. The original authors of the CreateProcess function were careful to avoid allocating memory off the heap, so that in case the function is being asked to launch the debugger, it won't get waylaid by a corrupted heap.

    Whether these concerns are still valid today I am not sure, but it was those concerns that influenced the original design and therefore the interface.

    Why is it that only the Unicode version is affected? Well, because the ANSI version of the function just converts its strings to Unicode and then calls the Unicode version of the function. Consequently, the ANSI version of the function happens to implement the workaround as a side effect of its original goal: The string passed to the Unicode version of the function is a temporary string!
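
    Here's a hedged sketch of that thunk pattern (not the actual Windows implementation; the function name is made up): the ANSI side converts the command line into a freshly allocated Unicode buffer, so the Unicode function always receives writable temporary memory.

        #include <windows.h>

        BOOL LaunchFromAnsiCommandLine(LPCSTR cmdA)
        {
            /* Convert the ANSI command line into a temporary Unicode copy. */
            int cch = MultiByteToWideChar(CP_ACP, 0, cmdA, -1, NULL, 0);
            if (cch == 0) return FALSE;
            LPWSTR cmdW = (LPWSTR)HeapAlloc(GetProcessHeap(), 0,
                                            cch * sizeof(WCHAR));
            if (cmdW == NULL) return FALSE;
            MultiByteToWideChar(CP_ACP, 0, cmdA, -1, cmdW, cch);

            STARTUPINFOW si = { sizeof(si) };
            PROCESS_INFORMATION pi;
            BOOL ok = CreateProcessW(NULL, cmdW, NULL, NULL, FALSE, 0,
                                     NULL, NULL, &si, &pi);
            HeapFree(GetProcessHeap(), 0, cmdW);
            if (ok) {
                CloseHandle(pi.hThread);
                CloseHandle(pi.hProcess);
            }
            return ok;
        }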

    Exercise: Why is it okay for the ANSI version of the CreateProcess to allocate a temporary string from the heap when the Unicode function cannot?

  • The Old New Thing

    A concrete illustration of practical running time vs big-O notation

    • 29 Comments

    One of the five things every Win32 programmer needs to know is that memory latency can throw your big-O computations out the window. Back in 2003, I ran into a concrete example of this.

    Somebody started with the algorithm presented in Fast Algorithms for Sorting and Searching Strings by Jon L. Bentley and Robert Sedgewick, implemented it in C#, and compared the performance against a HashTable, TernaryTree and SortedList. Surprisingly, the hash table won on insertion and retrieval of tens of thousands of randomly-generated strings. Why?

    Remember, big-O notation hides the constants, and those constants can get pretty big. What's more, the impact of those constants is critical for normal-sized workloads. The big-O notation allows you to compare algorithms when the data sets become extremely large, but you have to keep the constants in mind to see when the balance tips.

    The central point of my presentation at the PDC was that complexity analysis typically ignores memory bandwidth effects and assumes that all memory accesses perform equally. This is rarely true in practice. As we saw, leaving L2 is a big hit on performance, and accessing the disk is an even greater hit.

    The tree doesn't rebalance, so inserting strings in alphabetical order will result in a bad search tree. (They do call this out in their paper.) To locate a k-character string, Bentley-Sedgewick traverses at least k pointers, usually more. ("How much more" depends on how many prefixes are shared. More shared prefixes = more pointers traversed.) It also requires k(4p) bytes of memory to store that string, where p is the size of a pointer. Remember those pesky constants. High constant factor overhead starts to kill you when you have large datasets.

    More on those constants: Complexity analysis assumes that an add instruction executes in the same amount of time as a memory access. This is not true in practice, but the difference is a bounded constant factor, so it can be ignored for big-O purposes. Note, however, that that constant often exceeds one million if you take a page fault. One million is a big constant.

    Going back to memory bandwidth effects: At each node, you access one character and one pointer (a two-byte Unicode character plus a four-byte pointer, on a 32-bit CLR). So you use only 6 bytes out of a 64-byte cache line. You're wasting 90% of your bus bandwidth and you will certainly fall out of L2.

    Bentley-Sedgewick says that this is beaten out by not traversing the entire string being searched for in the case of a miss. I.e., their algorithm is tuned for misses. If you expect most of your probes to be misses, this can be a win. (The entire string is traversed on a hit, of course, so there is no gain for hits.)

    Note also that this "cost" for traversing the string on a miss is overstated due to memory bandwidth effects. The characters in a string are contiguous, so traversing the string costs you only L/64 cache lines, where L is the length of the string, and one potential page fault, assuming your string is less than 4KB. Traversing the tree structure costs you at least L cache lines and probably more depending on your branching factor, as well as L potential page faults.

    Let's look at the testing scenario again. The testing was only on hits, so the improved performance on misses was overlooked entirely. What's more, the algorithm takes advantage of strings with common prefixes, but the testing scenario used randomly-generated strings, a data set opposite to the one the algorithm was designed for: randomly-generated strings are spread out across the problem space instead of being clustered around common prefixes.

    Those are some general remarks; here are some performance notes specific to the CLR.

    I don't know whether it does or not, but I would not be surprised if System.String.GetHashCode caches the hash value in the string, which would mean that the cost of computing the hash is shared by everybody who uses it in a hashing operation. (Therefore, if you count the cost incurred only by the algorithm, hashing is free.)

    Note also that Bentley-Sedgewick's insert() function stores the object back into the tree in the recursive case. Most of the time, the value being stored is the same as the value that was already there. This dirties the cache line for no reason (forcing memory bandwidth to be wasted on a flush) and—particularly painful for the CLR—you hit the write barrier and end up dirtying a whole boatload of cards. A very small change avoids this problem: Change

        p->eqkid = insert(p->eqkid, ++s); 
    

    to

        Tptr newkid = insert(p->eqkid, ++s);
        if (p->eqkid != newkid) p->eqkid = newkid;
    

    (and similarly in the other branches). "This code is short but subtle, and worth careful study." How very true.
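
    For context, here's a sketch of the paper's insert() with that conditional store applied in all three branches (the node layout follows the paper; error handling omitted):

        #include <stdlib.h>

        typedef struct tnode *Tptr;
        typedef struct tnode {
            char splitchar;
            Tptr lokid, eqkid, hikid;
        } Tnode;

        Tptr insert(Tptr p, char *s)
        {
            if (p == 0) {                       /* grow a new node */
                p = (Tptr)malloc(sizeof(Tnode));
                p->splitchar = *s;
                p->lokid = p->eqkid = p->hikid = 0;
            }
            if (*s < p->splitchar) {
                Tptr newkid = insert(p->lokid, s);
                if (p->lokid != newkid) p->lokid = newkid;
            } else if (*s == p->splitchar) {
                if (*s != 0) {
                    Tptr newkid = insert(p->eqkid, ++s);
                    if (p->eqkid != newkid) p->eqkid = newkid;
                }
            } else {
                Tptr newkid = insert(p->hikid, s);
                if (p->hikid != newkid) p->hikid = newkid;
            }
            return p;   /* unchanged on the common path, so the caller's
                           conditional store also avoids a dirty line */
        }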

    Note also that if you use their "sleazy hack" of coercing a string to a Tptr, you would have had to change the type of eqkid from Tptr to object. This introduces a CLR type-check into the inner loop. Congratulations, you just tubed the inner loop performance.

    Now go to the summary at the end of the article.

    1. "Ternary trees do not incur extra overhead for insertion or successful searches." I'm not sure what "extra" means here, but hash tables have the same behavior.
    2. "Ternary trees are usually substantially faster than hashing for unsuccessful searches." Notice that they are optimized for misses.
    3. "Ternary trees gracefully grow and shrink; hash tables need to be rebuilt after large size changes." True, but the CLR hashtable does this so you don't have to. Somebody wrote it for you.
    4. "Ternary trees support advanced searches such as partial-match and near-neighbor search." These operations weren't tested.
    5. "Ternary trees support many other operations, such as traversal to report items in sorted order." These operations weren't tested either.

    Notice that the article doesn't claim that ternary trees are faster than hashing for successful searches. So if that's all you're testing, you're testing the wrong thing. One of the big benefits of ternary trees is the new operations available (4 and 5), but if you're not going to perform those operations, then you ended up paying for something you don't use.

    There are several morals of the story.

    1. Constants are important.
    2. Memory bandwidth is important.
    3. Performance optimizations for unmanaged code do not necessarily translate to managed code.
    4. What are you really testing?

    Mind you, Bentley and Sedgewick are not morons. They know all this.

    [Typo fixed 11am, thanks Nathan_works and Jachym Kouba.]

  • The Old New Thing

    What does the "Zw" prefix mean?

    • 26 Comments

    If you spend time in kernel mode, you're accustomed to seeing functions with two-letter (or occasionally, three-letter) prefixes that indicate which component they belong to.

    Prefix  Component                    Example
    Ex      Executive                    ExAllocatePool
    Hal     Hardware abstraction layer   HalGetBusData
    Io      I/O manager                  IoAllocateIrp
    Ke      Kernel                       KeBugCheck
    Ks      Kernel streaming             KsAcquireControl
    Mm      Memory manager               MmGetPhysicalAddress
    Ob      Object manager               ObReferenceObjectByHandle
    Po      Power management             PoSetSystemState
    Se      Security                     SeAccessCheck
    Tdi     Transport driver interface   TdiProviderReady
    Zw      ????                         ZwCancelTimer

    What does the "Zw" mean?

    Answer: Nothing.

    The people who chose the letters wanted to pick something that was unlikely to collide with anything. Perhaps they had a prior bad experience with having chosen a prefix, only to find that somebody ahead of them claimed it already?

  • The Old New Thing

    If you want to consume all the virtual address space, well, then go ahead and consume it, you don't need my help

    • 9 Comments

    Commenter Matthew Chaboud asks if there's an easy way to consume all the virtual address space below 4GB, short of, well, actually allocating it. "It seems like there should be a cleaner way to do this."

    If you want to consume all the virtual address space, then call VirtualAlloc until you turn blue. Programs shouldn't care what address they get back from a memory allocation function; they should handle values below 2GB and above 2GB with equal facility.
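
    In code, that advice is nothing more than a loop (a minimal sketch; the chunk sizes are arbitrary choices of mine):

        #include <windows.h>

        /* Reserve (not commit) address space in progressively smaller
           chunks until even the smallest reservation fails. */
        void ConsumeVirtualAddressSpace(void)
        {
            SIZE_T size = 0x10000000;   /* start with 256MB chunks */
            while (size >= 0x10000) {   /* 64KB allocation granularity */
                if (VirtualAlloc(NULL, size, MEM_RESERVE,
                                 PAGE_NOACCESS) == NULL) {
                    size /= 2;          /* none that big left; go smaller */
                }
            }
        }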

    It's not like there's a ConsumeAllAvailableVirtualAddressSpaceAndExhaustTheHeap function. (Is there an AllocateAllRemainingDiskSpaceAndFillExistingFilesWithZeroes function?) What would be the point of such a function? Once you call it, you have run out of memory!

    If Mr. Chaboud is talking about keeping programs away from the bottom 4GB of virtual address space on a 64-bit machine, then a much easier way to do this is to set the AllocationPreference configuration setting to specify that memory should be allocated from high addresses first. (But I don't think that's the scenario that prompted the original question, because on 64-bit Windows, the default heap is above the 4GB boundary, so there would be no need to exhaust the heap in order to consume the memory at virtual addresses below 4GB.)
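
    For reference, I believe this is the setting in question (the exact key path is my recollection; verify against the documentation before relying on it):

        HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management
            AllocationPreference = REG_DWORD 0x100000   (allocate top-down)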

    Correction: Pavel Lebedinsky points out that the default heap is below 4GB on 64-bit machines. It used to be above the 4GB boundary on earlier versions of 64-bit Windows, but I guess they changed it.

  • The Old New Thing

    Cool guys don't look at explosions

    • 10 Comments

    Thanks to NPR's Monkey See blog for finding this video.

  • The Old New Thing

    Oh great, and my phone even has a CPU meter

    • 33 Comments

    These fancy-dancy IP phones never cease to make me wonder what our world has come to. I remember when a telephone was a bucket of carbon granules and a momentary switch, and the rotary dial was just a mechanism for going on and off hook at a precise frequency. (Indeed, sometimes for fun, I'll pulse-dial a phone by tapping on the hook.)

    The other day, somebody sent out an email message:

    I'm amusing myself watching the "CPU Load" graph on the phone. Then again, I'm easily amused.

    Naturally, the CPU meter is useless, because you can only switch to it when you're not using the phone. Once you pick up the handset, the CPU meter dismisses itself so you can see information about the call you're on.

    Oh wait, you can go through the menus to switch back to the CPU meter after it auto-dismisses itself. Woo-hoo, now I can tell people, "Sorry, can you talk slower? My phone's CPU is maxing out."

    This what-barely-qualifies-as-amusement didn't last long. A year later, the units were replaced with a different model phone that didn't have a CPU meter. Progress.

  • The Old New Thing

    The butter and the money for the butter

    • 39 Comments

    In a discussion a few years ago, I saw the phrase, "Now you have the butter and the money." This was new to me, and a little Web searching (guided in part by a guess at the author's nationality) revealed it to be a French proverb, the full version of which is On ne peut pas avoir le beurre et l'argent du beurre: "You can't have the butter and the money for the butter." It's a really nice phrase, and maybe someday I'll be able to use it.

    Bonus butter idiom: Reading the blog of a German colleague, I ran across the phrase alles wieder in Butter ("everything back in butter"), which from context appeared to mean something like "everything's all right again." Some more Web searching suggests that I was basically right, and that the idiom comes from the Middle Ages: To prevent glassware transported over the Alps from breaking in transit, a clever businessman discovered that he could set the glasses in a cask, then pour hot butter over them. As the butter cooled, it held the glasses in place, thereby preventing them from rattling against each other and cracking during transport. Everything was back in butter.

  • The Old New Thing

    I'm sorry, Brian George, but we got cut off and I couldn't call you back

    • 24 Comments

    Yesterday, at 4 o'clock in the afternoon, I receive a telephone call at work. There is that characteristic pause after connecting which tells me that I have probably just been called by a telemarketer.

    "Hello?"

    Hello, I'm Brian George from Liquid Capital Management. Our company founder, Brian Kim, has been working closely with Microsoft employees to help them with financial matters. Are you familiar with hedge funds?

    I hear another voice in the background. This is almost certainly a boiler room operation.

    "There appears to be another voice on the line, do you hear it? Oh wait, it's gone. Could you repeat the name of your company?"

    It's Liquid Capital. Are you the Chief Financial Officer?

    "Liquid Capital Corporation?"

    No, Liquid Capital Management. Are you familiar with hedge funds?

    "And your company's founder's name, is that spelled with a Y or an I?"

    With an I. Are you familiar with hedge funds?

    "I'm sorry, if you could just answer a few questions so I can log this call properly. You are Brian George from Liquid Capital Management. I assume your name is spelled with an I as well?"

    Yes.

    "Excellent. And where is your company based?"

    We're in New York City.

    "Okay, and what is your phone number, in case we get disconnected?"

    646-237-4400. Are you familiar with hedge funds?

    "Okay, thank you for your patience. Now, what was it you would like to talk about?"

    (Silence.)

    Apparently, Mr. George became impatient with my call logging procedure and hung up. I try calling him back, but an automated voice tells me, "We're sorry. The party you have dialed is not accepting calls at this time."

    He was calling after business hours in New York, so maybe it was some sort of emergency. Gosh, I hope he wasn't calling about anything important.
