April, 2006

  • The Old New Thing

    German adjectives really aren't that hard; they just look that way

    • 19 Comments

    I may have scared a bunch of people with that chart of German adjective endings, but as several commenters noted, native speakers don't refer to the charts; they just say what comes naturally. (Well, except for Leo Petr, who claims that native Russian speakers actually study these charts in grade school.) Commenter Helga Waage noted that one quickly sees patterns in the charts that make them much easier to digest. And that's true. But I taught myself the German adjective endings a completely different way. If you're a student of German, you might find this helpful. If you're not, then you probably just want to skip the rest of this entry.

    As a side note, you have to make sure you put the columns in the right order. In many textbooks, the columns are ordered as "masculine, feminine, neuter, plural", but this fails to highlight the strong similarity between the masculine and neuter genders. From a grammatical standpoint, German neuter nouns are "90% masculine, 10% feminine"; therefore, it's more natural to put the neuter column between the masculine and feminine columns. I therefore prefer the order "masculine, neuter, feminine, plural", which as it so happens appears to be the order that Germans themselves use.

    I'm going to do away with the terms "strong", "weak", and "mixed". Instead, I'm going to reduce it to the question "How much work does the adjective have to do?" which breaks down into two inflections. In my mind, I don't have terms for these two inflections, but for the purpose of this discussion I'll call them "hardworking" and "lazy".

    We start with the lazy inflection, which is used when the definite article or a word that has the same ending as the definite article is present. The lazy inflection is simple: In the singular of the nominative and accusative cases (the "easy cases"), the ending is "-e". In the plural and in the genitive and dative cases (the "hard cases"), the ending is "-en".

          M     N     F     P
    Nom   -e    -e    -e    -en
    Acc   -en*  -e    -e    -en
    Dat   -en   -en   -en   -en
    Gen   -en   -en   -en   -en

    (* the one exception to the general rule)

    There is only one exception to this general rule, which I highlighted in the table above. But even that exception is natural, because the masculine gender is the only one whose articles change between the nominative and the accusative, from "der" to "den" and "ein" to "einen", so you're already used to sticking an extra "-en" in the masculine accusative singular.

    (By the way, I call the nominative and accusative the "easy" cases since most textbooks teach them within the first few weeks, which means that you've quickly become familiar with them and treat them as old friends. On the other hand, the dative and genitive are not usually introduced until second year, thereby making them "hard" due to their relative unfamiliarity.)

    The hardworking inflection is even easier than the lazy inflection. You use the hardworking inflection when there is no word that has the same ending as the definite article. In this case, the adjective must step up and take the ending itself. (I've included the definite article in the chart for reference.)

          M         N         F         P
    Nom   der -er   das -es   die -e    die -e
    Acc   den -en   das -es   die -e    die -e
    Dat   dem -em   dem -em   der -er   den -en
    Gen                       der -er   der -er

    Hey, wait, I left two boxes blank. What's going on here?

    Well, because in those two cases, even if there is nothing else to carry the ending of the definite article, the noun itself gets modified by adding "-s". For example, the genitive of the neuter noun "Wasser" (water) is "Wassers" (of water). The word that carries the ending of the definite article is the noun itself! That's why I leave the boxes blank: The scenario never occurs in German.

    It is those empty boxes, however, that always trip me up. When it comes time to decide what ending to put on the adjective, and I'm in one of those two boxes, the word with the ending of the definite article hasn't appeared yet so I think I'm in the "hardworking" case. And then when I get around to saying the "-s" at the end of "Wassers", I realize, "Oh, crap, there's that indicator. I should have used the lazy form." But it's too late, I already said the adjective with the wrong ending. I could go back and fix it, but that would interrupt the flow of the conversation, so I usually decide to let it slide and take the hit of sounding stupid. (Or, more precisely, sounding more stupid.) If you listen carefully, you may notice me pause for a fraction of a second just as I reach the "-s" and the realization dawns on me that I messed up again.

    If you compare my charts to the official charts with strong, weak and mixed inflections, you'll see that my "lazy" inflection matches the weak inflection exactly, and my "hardworking" inflection matches the "strong" inflection except for those empty boxes. (Because, under my rules, those empty boxes are lazy.) The mixed inflection matches the "lazy" inflection except in three places, which I count as "hardworking" because the indefinite article "ein" does not take an ending in exactly those three places.

    Anyway, so there's how I remember my German adjective endings. Mind you, I don't work through the details of these rules each time I have to decide on an ending. I just have to make the simple note of whether the definite article ending has already appeared (or in the case I always forget: will soon appear). If not, then I put it on the adjective.
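
    For readers who think better in code, here is a minimal sketch of the rule as I described it. The function name, the case and gender abbreviations, and the boolean parameter are all made up for illustration; the caller is assumed to know whether a word carrying the definite article's ending is already present.

```python
# Sketch of the "lazy"/"hardworking" rule. All names here are illustrative.

# Lazy inflection: "-e" in the singular easy cases (minus the masculine
# accusative exception), "-en" everywhere else.
LAZY_E_BOXES = {("nom", "m"), ("nom", "n"), ("nom", "f"),
                ("acc", "n"), ("acc", "f")}

# Hardworking inflection: the adjective takes the definite article's ending.
# ("gen", "m") and ("gen", "n") are deliberately missing: those are the blank
# boxes, where the noun's own "-s" carries the ending.
HARDWORKING = {
    ("nom", "m"): "er", ("nom", "n"): "es", ("nom", "f"): "e",  ("nom", "p"): "e",
    ("acc", "m"): "en", ("acc", "n"): "es", ("acc", "f"): "e",  ("acc", "p"): "e",
    ("dat", "m"): "em", ("dat", "n"): "em", ("dat", "f"): "er", ("dat", "p"): "en",
    ("gen", "f"): "er", ("gen", "p"): "er",
}


def adjective_ending(case: str, gender: str, article_ending_present: bool) -> str:
    """Return the adjective ending (without the hyphen).

    case is one of "nom", "acc", "dat", "gen"; gender is "m", "n", "f",
    or "p" for plural.
    """
    box = (case, gender)
    if article_ending_present or box not in HARDWORKING:
        # Lazy: somebody else (the article, or the noun's "-s") does the work.
        return "e" if box in LAZY_E_BOXES else "en"
    return HARDWORKING[box]


print(adjective_ending("nom", "m", True))   # as in "der gute Mann"
print(adjective_ending("nom", "m", False))  # as in "ein guter Mann"
print(adjective_ending("gen", "n", False))  # as in "kalten Wassers"
```

    Note that "ein" in the masculine nominative counts as not present, because it carries no ending there.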

    Computing over a high-latency network means you have to bulk up

    • 65 Comments

    One of the big complaints about Explorer we've received from corporations is how often it accesses the network. If the computer you're accessing is in the next room, then accessing it a large number of times isn't too much of a problem since you get the response back rather quickly. But if the computer you're talking to is halfway around the world, then even if you can communicate at the theoretical maximum possible speed (namely, the speed of light), it'll take 66 milliseconds for your request to reach the other computer and another 66 milliseconds for the reply to come back. In practice, the signal takes longer than that to make its round trip. A latency of a half second is not unusual for global networks. A latency of one to two seconds is typical for satellite networks.
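
    If you want to check the speed-of-light figure yourself, here is a back-of-the-envelope sketch. The constants are approximations I picked for illustration (equatorial circumference, speed of light in a vacuum); real signals travel slower and take longer routes.

```python
# Back-of-the-envelope check of the speed-of-light latency figure.
# Both constants are approximations chosen for illustration.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
EARTH_CIRCUMFERENCE_KM = 40_075.0  # approximate equatorial circumference


def one_way_latency_ms(distance_km: float) -> float:
    """Time in milliseconds for light to cover distance_km in a vacuum."""
    return distance_km / SPEED_OF_LIGHT_KM_PER_S * 1000.0


halfway_km = EARTH_CIRCUMFERENCE_KM / 2       # "halfway around the world"
one_way_ms = one_way_latency_ms(halfway_km)   # request reaches the server
round_trip_ms = 2 * one_way_ms                # ...and the reply comes back

print(f"one way: {one_way_ms:.1f} ms, round trip: {round_trip_ms:.1f} ms")
```

    With these constants the one-way figure comes out a little under 67 milliseconds, which is the theoretical floor, not what you will see in practice.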

    Note that latency and bandwidth are independent metrics. Bandwidth is how fast you can shovel data, measured in data per unit time (e.g. bits per second); latency is how long it takes the data to reach its destination, measured in time (e.g. milliseconds). Even though these global networks have very high bandwidth, the high latency is what kills you.

    (If you're a physicist, you're going to see the units "data per unit time" and "time" and instinctively want to multiply them together to see what the resulting "data" unit means. Bandwidth times latency is known as the "pipe". When doing data transfer, you want your transfer window to be the size of your pipe.)
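
    Here is the multiplication written out as a small sketch. The function name and the sample numbers (a 100 megabit-per-second link with a half-second latency) are assumptions for illustration, not measurements of any real network.

```python
def pipe_size_bytes(bandwidth_bits_per_s: float, latency_s: float) -> float:
    """Bandwidth times latency, converted from bits to bytes."""
    return bandwidth_bits_per_s * latency_s / 8


# Hypothetical link: 100 Mbit/s of bandwidth, 500 ms of latency.
example_pipe = pipe_size_bytes(100_000_000, 0.5)
print(f"pipe size: {example_pipe:,.0f} bytes")
```

    A transfer window much smaller than this leaves the sender idle while it waits for acknowledgements, which is why high bandwidth alone does not save you.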

    High latency means that you should try to issue as few I/O requests as possible, although it's okay for each of those requests to be rather large if your bandwidth is also high. Significant work went into reducing the number of I/O requests issued by Explorer during common operations such as enumerating the contents of a folder.

    Enumerating the contents of a folder in Explorer is more than just getting the file names. The file system shell folder needs other file metadata such as the last-modification time and the file size in order to build up its SHITEMID, which is the unit of item identification in the shell namespace. One of the other pieces of information that the shell needs is the file's index, a 64-bit value that is different for each file on a volume. Now, this information is not returned by the "slow" FindNextFile function. As a result, the shell would have to perform three round-trip operations to retrieve this extra information:

    • CreateFile(),
    • GetFileInformationByHandle() (which returns the file index in the BY_HANDLE_FILE_INFORMATION structure), and finally
    • CloseHandle().

    If you assume a 500ms network latency, then these three additional operations add a second and a half for each file in the directory. If a directory has even just forty files, that's a whole minute spent just obtaining the file indices. (As we saw last time, the FindNextFile function does its own internal batching to avoid this problem when doing traditional file enumeration.)

    And that's where this "fast mode" came from. The "fast mode" query is another type of bulk query to the server which returns all the normal FindNextFile information as well as the file indices. As a result, the file index information is piggybacked on top of the existing FindNextFile-like query. That's what makes it fast. In "fast mode", enumerating 200 files from a directory would take just a few seconds (two "bulk queries" that return the FindNextFile information and the file indices at one go, plus some overhead for establishing and closing the connection). In "slow mode", getting the normal FindNextFile information takes a few seconds, but getting the file indices would add another 1.5 seconds for each file, for an additional 1.5 × 200 = 300 seconds, or five minutes.
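
    The arithmetic is simple enough to sketch. The 500ms latency and the three extra round trips per file come from the discussion above; the batch size of 100 files per bulk query is my assumption, chosen so that 200 files take two bulk queries. Connection setup overhead is ignored.

```python
import math

LATENCY_S = 0.5                      # assumed network latency per round trip
EXTRA_ROUND_TRIPS_PER_FILE = 3       # CreateFile, GetFileInformationByHandle, CloseHandle
ASSUMED_FILES_PER_BULK_QUERY = 100   # assumption for illustration only


def slow_mode_extra_seconds(file_count: int) -> float:
    """Extra time spent fetching file indices one file at a time."""
    return file_count * EXTRA_ROUND_TRIPS_PER_FILE * LATENCY_S


def bulk_query_seconds(file_count: int) -> float:
    """Time spent on bulk queries alone, ignoring connection overhead."""
    queries = math.ceil(file_count / ASSUMED_FILES_PER_BULK_QUERY)
    return queries * LATENCY_S


for count in (40, 200):
    print(count, slow_mode_extra_seconds(count), bulk_query_seconds(count))
```

    The per-file cost grows linearly with the size of the directory, while the bulk cost grows only with the number of batches.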

    I think most people would agree that reducing the time it takes to obtain the SHITEMIDs for all the files in a directory from five minutes to a few seconds is a big improvement. That's why the shell is so anxious to use this new "fast mode" query.

    If your program is going to be run by multinational corporations, you have to take high-latency networks into account. And this means bulking up.

    Sidebar: Some people have accused me of intentionally being misleading with the characterization of this bug. Any misleading on my part was unintentional. I didn't have all the facts when I wrote up that first article, and even now I still don't have all the facts. For example, FindNextFile using bulk queries? I didn't learn that until Tuesday night when I was investigating an earlier comment—time I should have been spending planning Wednesday night's dinner, mind you. (Yes, I'm a slacker and don't plan my meals out a week at a time like organized people do.)

    Note that the exercise is still valuable as a thought experiment. Suppose that FindNextFile didn't use bulk queries and that the problem really did manifest itself only after the 101st round-trip query. How would you fix it?

    I should also point out that the bug in question is not my bug. I just saw it in the bug database and thought it would be an interesting springboard for discussion. By now, I'm kind of sick of it and will probably not bother checking back to see how things have settled out.

    Where did the name for Microsoft Access come from?

    • 15 Comments

    We've seen how the names for some Microsoft products had to be changed due to a name conflict. I'm told that the people who had to come up with the name for the database product avoided this pitfall in a clever way: Instead of trying to avoid a name that was already taken, they intentionally chose one that was already taken, by Microsoft itself.

    They discovered that Microsoft had a long-forgotten terminal emulator product called Microsoft Access. "Access" sounded like an appropriate name for a database product, so they blew the dust off it and gave the name a new life.

    Locale-sensitive number grouping

    • 67 Comments

    Most westerners are familiar with the fact that the way numbers are formatted differs between the United States and much of Europe.

    Culture         Format
    United States   1,234,567.89
    France          1 234 567,89
    Germany         1.234.567,89
    Switzerland     1'234'567.89

    What people don't realize is that the grouping is not always in threes. In India, the least significant group consists of three digits, but subsequent groups are in pairs.

    India 12,34,567.89

    I've also seen reports that the first group consists of five digits, followed by pairs:

    India 12,34567.89

    Meanwhile, Chinese and Japanese traditionally group in fours.

    China, Japan 123 4567.89

    What does this mean for you? Don't assume that numbers group in threes, and of course you can't assume that the grouping separator is the comma and the decimal character is the period. Just use the GetNumberFormat function and let NLS do the work for you.
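
    To see why "groups of three" can't be hard-coded, here is a small sketch of variable-size grouping. This is an illustration of the idea only, not how NLS implements it; the function name and the list-of-sizes representation are my own invention. Real programs should still call GetNumberFormat.

```python
def group_digits(integer_digits: str, group_sizes: list[int], separator: str) -> str:
    """Group a string of digits from the least significant end.

    group_sizes lists the size of each group starting from the right;
    the last size repeats for all remaining groups. Sizes must be positive.
    """
    groups = []
    remaining = integer_digits
    index = 0
    while remaining:
        size = group_sizes[min(index, len(group_sizes) - 1)]
        groups.append(remaining[-size:])   # peel off the rightmost group
        remaining = remaining[:-size]
        index += 1
    return separator.join(reversed(groups))


print(group_digits("1234567", [3], ","))     # threes
print(group_digits("1234567", [3, 2], ","))  # three, then pairs
print(group_digits("1234567", [5, 2], ","))  # five, then pairs
print(group_digits("1234567", [4], " "))     # fours
```

    The same digits come out four different ways depending on nothing but the group sizes and the separator, which is exactly the information a locale carries.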

    Next time, a little more about that NUMBERFMT structure.

    Adding a new flag to enable behavior that previously was on by default

    • 73 Comments

    One of the suggestions for addressing the network compatibility problem was to give up on fast mode and have a new "fast mode 2". (Equivalently, add a flag to the server capabilities that means "I support fast mode, and I'm not buggy.") This is another example of changing the rules after the game is over, by adding a flag to work around driver bugs.

    Consider a hypothetical program that uses fast mode on Windows XP. It runs against a Windows Server 2003 server and everybody is happy. Suppose you make a change to Windows Vista so that it requires that servers set a new "fast mode 2" flag in order to support fast mode. When the customer upgrades their client from Windows XP to Windows Vista, they would find that their hypothetical program ran much slower. Whose fault is it? Not the hypothetical program that was using fast mode on Windows XP; that program is using fast mode correctly. Not the Windows Server 2003 machine; that server supports fast mode correctly. Is it Windows Vista, then, that is at fault?

    "Hey, don't blame me," you answer. "It's that guy over there. That guy you've never heard of. He made me do it. Blame him!"

    To describe this sort of behavior I like to steal a phrase from Albert Einstein: "Spooky action at a distance". (Einstein used it to describe what in modern physics is known as quantum entanglement.) In this particular situation, we have a conversation between two participants (the client software and the server software) mediated by a third (Windows) which collapses due to the mere existence of a fourth party not involved in the conversation! It's as if your CD player suddenly lost the ability to play any of your music CDs because some company you've never heard of halfway around the world pressed a bunch of bad CDs for a few months earlier this year.

    Some people suggested, "Why not have a flag that says 'I support fast mode'?" Indeed that flag already exists; that's why Windows Vista was trying to use fast mode in the first place. The problem wasn't that the server didn't support fast mode. The problem was that the server had a bug in its fast mode implementation.

    "Okay, then add a new flag that says 'My fast mode isn't buggy.'" Consider also how this course of action would look after a few revisions of the specification:

    In response to the QUERY_CAPABILITIES request, the server shall return a 32-bit value consisting of zero or more of the following bits:

    0x00000001  This server supports fast mode
    0x00000002  This server supports fast mode and doesn't have the bug where enumerating a directory with more than 128 files fails on the 129th query
    0x00000004  This server supports fast mode and doesn't have the bug where the long file name is reported incorrectly in the response packet
    0x00000008  This server supports fast mode and doesn't have the bug where directories whose names consist entirely of digits are misreported as files
    0x00000010  This server supports fast mode and doesn't have the bug where the enumeration resets if a file is created in the directory while the enumeration is in progress
    0x00000020  This server supports fast mode and doesn't have the bug where FindNext returns failure even though there are still files to be enumerated
    ...

    If a new capabilities flag were created for every single server bug that was discovered, the capabilities mask would quickly fill up with all these random bits for bugs that were fixed ages ago. And each time a bug was found in any one server, all servers would have to be updated to add the new capabilities bit that says, "I'm not that buggy server you found on April 8th 2006," even the servers sitting in a locked closet whose operating systems are burned into EPROMs. And if you're the author of a new server, which capabilities bits do you set? Do you claim that you don't have the bug where FindNext returns failure even though the enumeration hasn't completed? What if, six months after you ship, somebody finds a bug in your server of exactly that sort? I guess this means that the next revision of the protocol will have to have a new flag:

    0x00000040  This server supports fast mode and doesn't have the bug where it claims that it doesn't have the "FindNext returns failure even though there are still files to be enumerated" bug, even though it actually does have the bug, but in a more subtle manner

    Or maybe you're convinced that you don't have any bugs in your "fast mode" implementation. Do you report 0xFFFFFFFF to say "I have no bugs at all, not even the ones people might discover later in other implementations"? What happens when the 33rd "fast mode" bug is found? Do we have to have a QUERY_CAPABILITIES2 function? If a capabilities bit is created for every single bug that ever existed in a networking protocol implementation, you'd have a few thousand capability bits, all of which mean "I don't have that bug where..."
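
    To make the bookkeeping concrete, here is a sketch of handing out one bit per flag in a 32-bit mask. The helper name is hypothetical, and this models only the hypothetical specification above, not any real protocol.

```python
CAPABILITY_BITS = 32
FULL_MASK = (1 << CAPABILITY_BITS) - 1   # 0xFFFFFFFF, every flag claimed


def flag_for_index(index: int):
    """Return the flag value for the zero-based index, or None if the
    32-bit capabilities value has no room left for it."""
    flag = 1 << index
    return flag if flag <= FULL_MASK else None


# The first flags in the hypothetical table, then the last one that fits,
# then the one that does not.
for index in (0, 1, 5, 6, 31, 32):
    flag = flag_for_index(index)
    print(index, "no room" if flag is None else f"0x{flag:08X}")
```

    Once all 32 values have been handed out, the only way forward is a second query, and every client and server has to learn about it.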

    Now, I'm not saying that this course of action is out of the question. Sometimes you have to do it, but you also have to realize that the cost for making this type of change is very high, and the benefit had better be worth it.

    What's the deal with the house in front of Microsoft's RedWest campus?

    • 23 Comments

    Here is my understanding. It may be incomplete or even flat-out wrong.

    The house belongs to a couple who were unwilling to sell their property when Microsoft's real estate people were buying up the land on which to build the RedWest campus. (I'm told it was originally a chicken farm.) Eventually, a deal was struck: The couple would sell the property to Microsoft but retain the right to live there until the end of their natural lives. Furthermore, Microsoft would assume responsibility for maintaining the lawn and landscaping.

    When Microsoft needed to build an underground parking garage beneath their property, the house was put on a truck and carried across the street, where it rested for the duration of the construction. Afterward, it was returned to its original location. I imagine the couple was put up in a very nice hotel for the duration of the construction. (Heck, maybe they got a nice kitchen remodel out of the deal, who knows?)

    And while I'm spreading rumors about the Microsoft RedWest campus, here's another one: If you pay a visit to the campus, you will find a nature trail that leads through the wetlands that adjoin the campus. I was told that the wetlands preservation area was part of the environmental impact mitigation plan that was necessary to obtain approval for the construction. The students at the nearby school will occasionally take field trips there.

    (I'm going to cover lighter issues for a while just to take a break from the network interoperability topic that has raged for over a week now.)

    Why is the Microsoft Protection Service called "msmpsvc"?

    • 40 Comments

    (This is the first in a series of short posts on where Microsoft products got their names.)

    The original name for the malware protection service was "mpsvc", the "Microsoft Protection Service", but it was discovered later that that filename was already used by malware! As a result, the name of the service had to be changed by sticking an "ms" in front, making it "msmpsvc.exe".

    Therefore, technically, its name is the "Microsoft Microsoft Protection Service". (This is, of course, not to be confused with "mpssvc.exe", which is, I guess, the "Microsoft Protection Service Service".)

    Fortunately, the Marketing folks can attempt to recover by deciding that "msmpsvc" stands for "Microsoft Malware Protection Service". But you and I will know what it really stands for.

    What does CS_SAVEBITS do?

    • 35 Comments

    If you specify the CS_SAVEBITS class style, then the window manager will try to save the bits covered by the window. But the real question is why, because that is your guide to using this power only for good, not for evil.

    When a window whose class specifies the CS_SAVEBITS class style is displayed, the window manager takes a snapshot of the pixels on the screen where the window will be displayed. First, it asks the video card to store the pixels in available off-screen video memory (fast). If no video memory is available, then the pixels will be stored in system memory (slower). If the saved pixels have not been discarded in the meantime (see below), then when the window is hidden, the saved pixels are copied back to the screen and validated; in other words, the pixels are marked as "good" and no WM_PAINT message is generated.

    What invalidates the saved pixels? Anything that would cause those pixels to be out of sync with what should be on the screen once the popup window is removed. Here are some examples:

    • If the popup window moves, then the saved pixels are discarded, since putting those pixels back on the screen would put them in the wrong place.
    • If an underlying window invalidates itself (most commonly via InvalidateRect), then the saved pixels are also discarded, because the underlying window has indicated that it wants to change its pixels.
    • If any windows beneath the popup change size or position or z-order, then the saved pixels are of no use.
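    • If any windows are created or destroyed beneath the popup, then the saved pixels are discarded, since what belongs on the screen there has changed.
    • If somebody calls GetDC for a window beneath the popup and starts drawing, then the saved pixels are discarded, since they no longer match what is on the screen.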

    You get the idea. If copying the saved pixels back to the screen would result in an inconsistent display, then the saved pixels are discarded.

    So how do you use this power for good and not for evil?

    One consideration is that the region should cover a relatively small portion of the screen, because the larger the saved bitmap, the less likely it will fit into available off-screen video memory, which means the more likely it will have to travel across the bus in a video-to-system-memory blit, the dreaded "vid-sys blt" that game developers are well familiar with. In the grand scheme of vid/sys blts, "vid-vid" is the fastest (since the video card is very good at shuffling memory around within itself), "sys-sys" is next best (since the motherboard can shuffle memory around within itself, though it'll cost you CPU cache space), "sys-vid" is in third place, and "vid-sys" is the worst: Programs write to video memory much more often than they read from it. As a result, the bandwidth between the video card and system memory is optimized for writing to video, not reading from it.

    But the primary concern for deciding when to use the CS_SAVEBITS window class style is not making the window manager go to all the trouble of saving the pixels, only to have to throw them away. A window that is a good candidate for the CS_SAVEBITS style is therefore one that does not move, covers a relatively small portion of the screen, and is visible for only a short time. That the window shouldn't move is obvious: If the window moves, then the saved pixels are useless. The other two rules of thumb try to minimize the opportunity for another window to do something that invalidates the saved pixels. By keeping the window small in area and putting it on the screen for only a short time, you keep the "target" small both spatially and temporally.

    Consequently, the best candidates for CS_SAVEBITS are menus, tooltips, and small dialogs, since they aren't too big, they don't typically move around, and they go away pretty quickly.

    (Some people appear to be under the mistaken impression that CS_SAVEBITS saves the bits of the window itself. I don't know where people get this impression from since even a modicum of experimentation easily demonstrates it to be false. The Windows drawing model follows the principle of Don't save anything you can recalculate.)

    Doing the best we can until time travel has been perfected

    • 64 Comments

    Mistakes were made.

    Mistakes such as having Windows NT put Notepad in a different location from Windows 3.1. (Though I'm sure they had their reasons.) Mistakes such as having a TCS_VERTICAL when there is already a CCS_VERT style. Mistakes such as having listview state images be one-biased, whereas treeview state images are zero-biased.

    But what's done is done. The mistakes are out there. You can't go back and fix them—at least not until time travel has been perfected—or you'll break code that was relying on the mistakes. (And believe me, there's a lot of code that relies on mistakes.) You'll just have to do the best you can with the situation as it is.

    Often, when I discuss a compatibility problem, people will respond with "That's your own damn fault. If you had done XYZ, then you wouldn't have gotten into this mess." Maybe that's true, maybe it isn't, but that doesn't make any progress towards solving the problem and therefore isn't very constructive. I sure hope these people never become lifeguards.

    "Help me, I'm drowning!"

    "Are you wearing a life preserver?"

    "No."

    "Well, if you had worn a life preserver, then you wouldn't be drowning. It's your own damn fault."

    When faced with a problem, you first need to understand the problem, then you set about exploring solutions to the problem. Looking for someone to blame doesn't solve the problem. I'm not saying that one should never assign blame, just that doing so doesn't actually solve anybody's problem. (If you want to blame somebody, do it at the bug post-mortem. Then you can study the conditions that led to the mistake, assign blame, if you're looking for a scapegoat, and take steps to prevent a future mistake of the same sort from occurring. As a lifeguard, you first rescue the drowning person, and then you lecture them for not wearing a life preserver.)

    A new scripting language doesn't solve everything

    • 96 Comments

    Yes, there are plenty of scripting languages that are much better than boring old batch. Batch files were definitely a huge improvement over SUBMIT back in 1981, but they've been showing their age for quite some time. The advanced age of boring old batch, on the other hand, means that you have millions of batch files out there that you had better not break if you know what's good for you. (Sure, in retrospect, you might decide to call the batch language a design mistake, but remember that it had to run in 64KB of memory on a 4.77MHz machine while still remaining compatible in spirit with CP/M.)

    Shipping a new command shell doesn't solve everything either. For one thing, you have to decide if you are going to support classic batch files or not. Maybe you decide that you won't and prefer to force people to rewrite all their batch files into your new language. Good luck on that.

    On the other hand, if you decide that you will support batch files after all, then presumably your new command shell will not execute old batch files natively, but rather will defer to CMD.EXE. And there's your problem: You see, batch files have the ability to modify environment variables and have the changes persist beyond the end of the batch file. Try it:

    C> copy con marco.cmd
    @set MARCO=polo
    ^Z
            1 file(s) copied.
    
    C> echo %MARCO%
    %MARCO%
    
    C> marco
    
    C> echo %MARCO%
    polo
    

    If your new command shell defers to CMD.EXE, these environment changes won't propagate back to your command shell since the batch file modifies the environment variables of CMD.EXE, not your shell. Many organizations have a system of batch files that rely on the ability to pass parameters between scripts by stashing them into environment variables. The DDK's own razzle does this, for example, in order to establish a consistent build environment and pass information to build.exe about what kind of build you're making. And I bet you have a batch file or two that sets your PROMPT or PATH environment variable or changes your current directory.
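
    The underlying behavior is not specific to CMD.EXE: a child process gets its own copy of the environment, and changes to that copy die with the child. Here is a minimal sketch that uses a child Python interpreter as a stand-in for CMD.EXE running the batch file; the stand-in and the helper name are assumptions for illustration.

```python
import os
import subprocess
import sys


def run_child_that_sets_marco() -> str:
    """Run a child process that sets MARCO in its own environment and
    reports back what it sees."""
    child_code = (
        "import os; "
        "os.environ['MARCO'] = 'polo'; "
        "print(os.environ['MARCO'])"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


os.environ.pop("MARCO", None)            # start with MARCO unset in the parent
child_saw = run_child_that_sets_marco()  # the child sets and reads MARCO
parent_sees = os.environ.get("MARCO")    # the parent's copy is untouched

print("child saw:", child_saw)
print("parent sees:", parent_sees)
```

    A replacement shell that hands batch files off to a child CMD.EXE is in the parent's position here, so it would need some other channel to learn what the batch file changed.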

    So good luck with your replacement command shell. I hope you figure out how to run batch files.
