• The Old New Thing

    It's amazing how many business meetings there are in Munich in late September

    • 20 Comments

    During my emergency vacation, we stopped at a German supermarket, and my friend loaded up on all sorts of odd and fascinating products. This is something he does every time he travels abroad. At the register, my friend did the work of unloading the cart onto the conveyor belt while I went ahead to bag and to deal with any questions from the cashier, since I was the only German-speaking person in our little group. The woman behind my friend looked at what he was buying and made some remark that implied that he did not make the most price-efficient choices.

    My friend replied, "Oh, we're from the United States, and I'm just buying things that we don't have in the States."

    The woman's demeanor changed. She was no longer upset that my friend failed to purchase the sale items. Instead, she said with concern, "But these are not typical German items."

    My friend explained, "I just like buying different things."

    I was out of earshot for this conversation. Otherwise, I would have quipped, "Keine Sorgen. Letztes Mal haben wir typische deutsche Sachen gekauft." ("Don't worry. Last time, we bought typical German things.")

    Bonus chatter: As we loaded the groceries into the car, a gentleman noticed that we were speaking English and pegged us for foreigners. He asked us, "What brings you to Munich?"

    My friend explained, "I'm here for a business meeting."

    The gentleman replied with a twinkle in his eye, "It's amazing how many business meetings there are in Munich in late September."

    By popular demand: A few people wanted to know what sort of odds and ends my friend bought. To be honest, I don't remember because, after all, it was seven years ago. But it included novelty candies (for example, in the United States, candy cigarettes are difficult if not impossible to find), food packaged in squeezable tubes, and cans of fruit cocktail. I don't know why he got the fruit cocktail.

  • The Old New Thing

    What's up with the strange treatment of quotation marks and backslashes by CommandLineToArgvW

    • 28 Comments

    The way the CommandLineToArgvW function treats quotation marks and backslashes has raised eyebrows at times. Let's look at the problem space, and then see what algorithm would work.

    Here are some sample command lines and what you presumably want them to be parsed as:

    Command line                        Result
    program.exe "hello there.txt"       program.exe
                                        hello there.txt
    program.exe "C:\Hello there.txt"    program.exe
                                        C:\Hello there.txt

    In the first example, we want quotation marks to protect spaces.

    In the second example, we want to be able to enclose a path in quotation marks to protect the spaces. Backslashes inside the path have no special meaning; they are copied as any other normal character.

    So far, the rule is simple: Inside quotation marks, just copy until you see the matching quotation marks. Now here's another wrinkle:

    Command line                        Result
    program.exe "hello\"there"          program.exe
                                        hello"there

    In the third example, we want to embed a quotation mark inside a quoted string by protecting it with a backslash.

    Okay, to handle this case, we say that a backslash which precedes a quotation mark protects the quotation mark. The backslash itself should disappear; its job is to protect the quotation mark and not to be part of the string itself. (If we kept the backslash, then it would not be possible to put a quotation mark into the command line parameter without a preceding backslash.)

    But what if you wanted a backslash at the end of the string? Then you protect the backslash with a backslash, leaving the quotation mark unprotected.

    Command line                        Result
    program.exe "hello\\"               program.exe
                                        hello\

    Okay, so what did we come up with?

    We want a backslash before a quotation mark to protect the quotation mark, and we want a backslash before a backslash to protect the backslash (so you can end a string with a backslash). Otherwise, we want the backslash to be given no special treatment.

    The CommandLineToArgvW function therefore works like this:

    • A string of backslashes not followed by a quotation mark has no special meaning.
    • An even number of backslashes followed by a quotation mark is treated as pairs of protected backslashes, followed by a word terminator.
    • An odd number of backslashes followed by a quotation mark is treated as pairs of protected backslashes, followed by a protected quotation mark.

    The backslash rule is confusing, but it's necessary to permit the very important second example, where you can just put quotation marks around a path without having to go in and double all the internal path separators.
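
    Here's a minimal sketch (for illustration only; it is not the actual Windows implementation, and it skips the separate special-case rules for parsing the program name at the start of the command line) of how the backslash rules above could be coded:

    #include <string>
    #include <vector>

    // Sketch only: splits the argument portion of a command line according
    // to the backslash rules described above. The real CommandLineToArgvW
    // also has special rules for the program name, which this sketch skips.
    std::vector<std::wstring> SplitArguments(const wchar_t* p)
    {
        std::vector<std::wstring> args;
        while (*p) {
            // Skip whitespace between arguments.
            while (*p == L' ' || *p == L'\t') p++;
            if (!*p) break;

            std::wstring arg;
            bool inQuotes = false;
            while (*p && (inQuotes || (*p != L' ' && *p != L'\t'))) {
                if (*p == L'\\') {
                    // Count the run of backslashes.
                    size_t slashes = 0;
                    while (*p == L'\\') { slashes++; p++; }
                    if (*p == L'"') {
                        // Backslashes followed by a quotation mark:
                        // each pair protects one backslash...
                        arg.append(slashes / 2, L'\\');
                        if (slashes % 2) {
                            arg += L'"'; // ...an odd leftover protects the quote
                            p++;
                        }
                        // An even count leaves the quotation mark unprotected;
                        // it is handled as a quote on the next pass.
                    } else {
                        // Backslashes not followed by a quote are literal.
                        arg.append(slashes, L'\\');
                    }
                } else if (*p == L'"') {
                    inQuotes = !inQuotes; // unprotected quote toggles quote mode
                    p++;
                } else {
                    arg += *p++;
                }
            }
            args.push_back(arg);
        }
        return args;
    }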

    Personally, I would have chosen a different backslash rule:

    Warning - these are not the actual backslash rules. These are Raymond's hypothetical "If I ran the world" backslash rules.

    • A backslash followed by another backslash produces a backslash.
    • A backslash followed by a quotation mark produces a quotation mark.
    • A backslash followed by anything else is just a backslash followed by that other character.

    I prefer these rules because they can be implemented by a state machine. On the other hand, it makes quoting regular expressions a total nightmare. It also breaks "\\server\share\path with spaces", which is pretty much a deal-breaker. Hm, perhaps a better set of rules would be

    Warning - these are not the actual backslash rules. These are Raymond's second attempt at hypothetical "If I ran the world" backslash rules.

    • Backslashes have no special meaning at all.
    • If you are outside quotation marks, then a " takes you inside quotation marks but generates no output.
    • If you are inside quotation marks, then a sequence of 2N quotation marks represents N quotation marks in the output.
    • If you are inside quotation marks, then a sequence of 2N+1 quotation marks represents N quotation marks in the output, and then you exit quotation marks.

    This can also be implemented by a state machine, and quoting an existing string is very simple: Stick a quotation mark in front, a quotation mark at the end, and double all the internal quotation marks.
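
    A sketch of that quoting step under the hypothetical rules (again, this is not what Windows actually does):

    #include <string>

    // Quoting under the second set of hypothetical rules (not what Windows
    // actually does): wrap the string in quotation marks and double every
    // internal quotation mark.
    std::wstring QuoteArgumentHypothetical(const std::wstring& arg)
    {
        std::wstring quoted = L"\"";
        for (wchar_t ch : arg) {
            if (ch == L'"') quoted += L"\"\""; // double internal quotation marks
            else quoted += ch;
        }
        quoted += L'"';
        return quoted;
    }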

    But what's done is done, and the first set of backslash rules is what CommandLineToArgvW implements. And since the behavior has been shipped and documented, it can't change.

    If you don't like these parsing rules, then feel free to write your own parser that follows whatever rules you like.

    Bonus chatter: Quotation marks are even more screwed up.

  • The Old New Thing

    How is the CommandLineToArgvW function intended to be used?

    • 18 Comments

    The CommandLineToArgvW function does some basic command line parsing. A customer reported that it was producing strange results when you passed an empty string as the first parameter:

    LPWSTR* argv = CommandLineToArgvW(L"", &argc);
    

    Well, okay, yeah, but huh?

    The first parameter to CommandLineToArgvW is supposed to be the value returned by GetCommandLineW. That's the command line, and that's what CommandLineToArgvW was designed to parse. If you pass something else, then CommandLineToArgvW will try to cope, but it's not really doing what it was designed for.

    It turns out that the customer was mistakenly passing the lpCmdLine parameter that was passed to the wWinMain function:

    int WINAPI wWinMain(
        HINSTANCE hInstance,
        HINSTANCE hPrevInstance,
        LPWSTR lpCmdLine,
        int nCmdShow)
    {
        int argc;
        LPWSTR* argv = CommandLineToArgvW(lpCmdLine, &argc);
        ...
    }
    

    That command line is not in the format that CommandLineToArgvW expects. The CommandLineToArgvW function wants the full, unexpurgated command line as returned by the GetCommandLineW function, and it breaks it up on the assumption that the first word on the command line is the program name. If you hand it an empty string, the CommandLineToArgvW function says, "Whoa, whoever generated this command line totally screwed up. I'll try to muddle through as best I can."
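
    For reference, here's a minimal sketch of the intended usage pattern: hand CommandLineToArgvW the string returned by GetCommandLineW and free the result with a single call to LocalFree.

    #include <windows.h>
    #include <shellapi.h> // CommandLineToArgvW; link with shell32.lib

    int WINAPI wWinMain(
        HINSTANCE hInstance,
        HINSTANCE hPrevInstance,
        LPWSTR lpCmdLine,
        int nCmdShow)
    {
        int argc;
        // Hand the function the full command line, program name and all.
        LPWSTR* argv = CommandLineToArgvW(GetCommandLineW(), &argc);
        if (!argv) return 1; // parsing failed

        // ... use argv[0] (the program name), argv[1] ... argv[argc - 1] ...

        LocalFree(argv); // the entire array is one LocalAlloc'd block
        return 0;
    }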

    Next time, we'll look at the strange status of quotation marks and backslashes in CommandLineToArgvW.

  • The Old New Thing

    Follow-up: The impact of overwhelmingly talented competitors on the rest of the field

    • 14 Comments

    A while back, I wrote on the impact of hardworking employees on their less diligent colleagues. Slate uncovered a study that demonstrated the reverse effect: How Tiger Woods makes everyone else on the course play worse.

    The magic ingredient is the incentive structure. If you have an incentive structure which rewards the best-performing person, and there is somebody who pretty much blows the rest of the field out of the water, then the incentive structure effectively slips down one notch. Everybody is now fighting for second place (since they've written off first place to Tiger Woods), and since the second place prize is far, far below the first place prize, people don't have as much incentive to play well as they did when Tiger wasn't in the mix.

    The effect weakens the further down the ladder you go, for although the difference between first and second place is huge, the difference between 314th place and 315th place is pretty negligible.

  • The Old New Thing

    How do I create a UNC to an IPv6 address?

    • 18 Comments

    Windows UNC notation permits you to use a raw IPv4 address in dotted notation as a server name: For example, net view \\127.0.0.1 will show you the shared resources on the computer whose IP address is 127.0.0.1. But what about IPv6 addresses? IPv6 notation contains colons, which tend to mess up file name parsing since a colon is not a valid character in a path component.

    Enter the ipv6-literal.net domain.

    Take your IPv6 address, replace the colons with dashes, replace percent signs with the letter "s", and append .ipv6-literal.net. This magic host resolves back to the original IPv6 address, but it avoids characters which give parsers the heebie-jeebies.
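
    For example (using a made-up address), fe80::1234:5678%3 becomes fe80--1234-5678s3.ipv6-literal.net, which you can then drop into a UNC such as \\fe80--1234-5678s3.ipv6-literal.net\share. A sketch of the character substitution:

    #include <string>

    // Sketch: convert a raw IPv6 address (possibly with a %zone suffix) into
    // the corresponding ipv6-literal.net host name.
    std::wstring ToIpv6LiteralHost(const std::wstring& address)
    {
        std::wstring host;
        for (wchar_t ch : address) {
            if (ch == L':') host += L'-';      // colons upset path parsing
            else if (ch == L'%') host += L's'; // zone ID separator
            else host += ch;
        }
        host += L".ipv6-literal.net";
        return host;
    }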

    Note that this magic host is resolved internally by Windows and never hits the network. It's sort of a magic escape sequence.

  • The Old New Thing

    Microspeak: Sats

    • 27 Comments

    I introduced this Microspeak last year as part of a general entry about management-speak, but I'm giving it its own entry because it deserves some attention on its own.

    I just want to have creative control over how my audience can interact with me without resorting to complex hacking in a way that is easy to explain but ups our blogging audiences sats to a new level that may also stimulate a developer ecosytem that breeds quality innovation...

    Ignore the other management-speak; we're looking at the weird four-letter word sats.

    Sats is short for satisfaction metrics. This falls under the overall obsession with measurement at Microsoft. For many categories of employees (most notably the approximately 1000 employees eligible for the so-called Shared Performance Stock Awards program), compensation is heavily influenced by customer satisfaction metrics, known collectively as CSAT.

    Satisfaction metrics are so important that they have their own derived jargon.

    Jargon  Meaning           Description
    VSAT    Very satisfied    Percentage of customers who report that they are very satisfied.
    DSAT    Dissatisfied      Percentage of customers who report that they are somewhat dissatisfied or very dissatisfied.
    NSAT    Net satisfaction  NSAT = VSAT − DSAT

    All of these jargon terms are pronounced by saying the first letter, followed by the word sat, so for example, NSAT is pronounced N-sat.

    You can see some of these metrics in use in a blog post from the Director of Operations at Microsoft.com. Notice how he uses the terms VSAT and DSAT without bothering to explain what they mean. The meanings are so obvious to him that it doesn't even occur to him that others might not know what they mean. (By comparison, Kent Sharkey includes a definition when he uses the term.)

    And if you haven't gotten enough of this jargon yet, there's an entire training session online on the subject of the Customer Satisfaction Index. If you're impatient, click ahead to section 9.

  • The Old New Thing

    Ha ha, the speaker gift is a speaker, get it?

    • 11 Comments

    As a thank-you for presenting at TechReady11, the conference organizers gave me (and presumably the other speakers) a portable speaker with the Windows logo printed on it.

    The speaker underneath the logo is the X-Mini II Capsule Speaker, and I have to agree with Steve Clayton that they pack a lot of sound in a compact size. Great for taking on trips, or even picnics.

    It's been a long time since I last recommended a Christmas gift for geeks, so maybe I'll make up for it by giving two suggestions this year.

    The second suggestion is a response to a comment from that old article: My bicycle lock is just a laptop combination lock that I repurposed as a bicycle lock. It's a pretty clumsy design for a laptop lock, since it comes in two parts, one of which is easy to lose, but if you just "accidentally" lose the clip part, what's left is a simple cable combination lock that easily tucks into a side pocket of my bicycle trunk bag. Yes, that lock isn't going to stop a dedicated thief for very long, but fortunately, the Microsoft parking garage is not crawling with dedicated thieves because, as a rule, Microsoft tries not to hire dishonest people.

    So, um, that's a suggestion for a bicycle lock for somebody who lives in a low-crime area. Hm, maybe that's not a very good suggestion after all.

  • The Old New Thing

    Why doesn't Win32 give you the option of ignoring failures in DLL import resolution?

    • 14 Comments

    Yuhong Bao asked, via the Suggestion Box, "Why not implement delay-loading by having a flag in the import entry specifying that Windows should mimic the Windows 3.1 behavior for resolving that import?"

    Okay, first we have to clear up the false assumptions in the question.

    The question assumes that Windows 3.1 had delay-loading functionality in the first place (functionality that Yuhong Bao would like added to Win32). Actually, Windows 3.1 did not have any delay-load functionality. If your module imported from another DLL in its import table, the target DLL was loaded when your module was loaded. There was no delay. The target DLL loaded at the same time your module did.

    So there is no Windows 3.1 delay-load behavior to mimic in the first place.

    Okay, maybe the question really was, "Instead of failing to load the module, why not just let the module load, but set the imported function pointers to a stub function that raises an error if you try to call it, just like Windows 3.1 did?"

    Because it turns out that the Windows 3.1 behavior resulted in data loss and mystery crashes. The Win32 design solved this problem by making failed imports fatal up front (a design principle known as fail fast), so you knew ahead of time that your program was not going to work rather than letting you run along and then watch it stop working at the worst possible time, and probably in a situation where the root cause is much harder to identify. (Mind you, it may stop working at the worst possible time for reasons the loader could not predict, but at least it stopped what it could.)

    In other words, this was a situation the Win32 people thought about and made an explicit design decision that this is a situation they would actively not support.

    Okay, but when Visual Studio was looking at how to add delay-load functionality, why didn't they implement it by changing the Win32 loader so that failed imports could be optionally marked as non-fatal?

    Well, um, because the Visual Studio team doesn't work on Windows?

    There's this feature you want to add. You can either add it to the linker so that all programs can take advantage of the feature on all versions of Windows, or you can add it to the operating system kernel, so that it works only on newer versions of Windows. If the feature had been added to the loader rather than the linker, application vendors would say, "Stupid Microsoft. I can't take advantage of this new feature because a large percentage of my customer base is still running the older operating system. Why couldn't they have added this feature to the linker, so it would work on all operating systems?" (You hear this complaint a lot. Any time a new version of Windows adds a feature, everybody demands that it be ported downlevel.)

    Another way of looking at this is realizing that you're adding a feature to the operating system which applications can already do for themselves. Suppose you say, "Okay, when you call a function whose import could not be resolved, we will display a fatal application error." The response is going to be "But I don't want my application to display a fatal application error. I want you to call this error handler function instead, and the error handler will decide what to do about the error." Great, now you have to design an extensibility mechanism. And what if two DLLs each try to install different failed-imported-function handlers?
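
    And indeed, an application that wants a non-fatal import for a particular function can already build it out of LoadLibrary and GetProcAddress, with whatever error handling it prefers. A minimal sketch (the DLL name and function name here are made up for illustration):

    #include <windows.h>

    // Hypothetical optional feature: the DLL name "fancy.dll" and the
    // function "FancyFeature" are made up for illustration.
    typedef BOOL (WINAPI *FANCYFEATUREPROC)(int level);

    void UseFancyFeatureIfAvailable()
    {
        HMODULE module = LoadLibraryW(L"fancy.dll");
        if (!module) {
            // DLL missing: the application decides what to do
            // (fall back, warn the user, disable a menu item, ...).
            return;
        }

        auto feature = reinterpret_cast<FANCYFEATUREPROC>(
            GetProcAddress(module, "FancyFeature"));
        if (feature) {
            feature(42); // the import resolved, so call it
        } else {
            // Function missing from this version of the DLL: again,
            // the application chooses its own error handling.
        }

        FreeLibrary(module);
    }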

    When you start at minus 100 points, saying, "Oh, this is not essential functionality. Applications can simulate it on their own just as easily, and with greater flexibility" does nothing to get you out of the hole. If anything, it digs you deeper into it.

  • The Old New Thing

    Hey there token, long time no see! (Did you do something with your hair?)

    • 10 Comments

    Consider a system where you have a lot of secured objects, and suppose further that checking whether a user has access to an object is a slow operation. This is not as rare as you might think: Even though a single access check against a security descriptor with a small number of ACEs might be fast, you can have objects with complicated security descriptors or (more likely) users who belong to hundreds or thousands of security groups. Since checking whether a security descriptor grants access to a token is potentially¹ O(nm) in the number of ACEs in the security descriptor and the number of groups the user belongs to (since each ACE needs to be checked to see if it matches each group), even a check against a small security descriptor can multiply out to a slow operation when the user belongs to thousands of groups.

    Suppose your profiling shows that you spend a lot of time checking tokens against security descriptors. How can you create a cache of access/no-access results so you can short-circuit the expensive security check when a user requests access to an object? (And obviously, you can't have any false positives or false negatives. Security is at stake here!)

    First, let's look at things that don't solve the problem: One option is to query the SID from the token and cache the access/no-access result with the SID. This option is flawed because between the two checks, the user's group membership may have changed. For example, suppose object X is accessible to members of Group G. Bob starts out as a member of Group G, asks you for access, and you grant it and cache the fact that Bob has access to object X. Later that day, Bob's membership in Group G is revoked, and when Bob logs on the next day, his token won't include Group G. If you had merely cached Bob's SID, you would have seen the entry in the cache and said, "Welcome back, Bob. Have fun with object X!" Bob then rubs his hands together and mutters Excellent! and starts making unauthorized changes to object X.

    Now, Bob's membership in Group G might have been revoked at Bob's request. Reducing one's privileges is a common safety measure. For example, Bob might remove his membership in the Administrators group so he won't accidentally delete an important file. Low Rights Internet Explorer intentionally removes a slew of privileges from its token so that the scope of damage of an attack from a malicious site is limited.

    Okay, so how can we recognize that the Bob that comes back has different group membership from the Bob that visited us the first time? You can do this with the help of the TOKEN_STATISTICS structure. This structure contains a number of locally-unique values which can be used to recognize and correlate tokens. A locally-unique value is a value that is unique on the local machine until the operating system is shut down or restarted. You request the statistics for a token by calling the GetTokenInformation function, passing TokenStatistics as the information class.

    The AuthenticationId is known in some places as the LogonId because it is assigned to the logon session that the access token represents. There can be many tokens representing a single logon session, so that won't work for our purposes.

    The TokenId is a little closer. It is a locally-unique value assigned to a token when it is created. This value remains attached to that token until it is destroyed. This is closer, but still not perfect, because Bob can enable or disable privileges, and that doesn't change the token, but it sure changes the result of a security check!

    The ModifiedId is a value which is updated each time a token is modified. Therefore, when you want to cache that This particular token has access to this security descriptor, you should use the ModifiedId as the key. (Remember, locally-unique values are good only until the system shuts down or restarts, so don't cache them across reboots!)
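
    Here's a minimal sketch of pulling the ModifiedId out of a token (error handling abbreviated, and shown for the current process token just to keep the example self-contained):

    #include <windows.h>

    // Sketch: retrieve the ModifiedId of the current process token.
    // Error handling abbreviated.
    BOOL GetTokenModifiedId(LUID* modifiedId)
    {
        HANDLE token;
        if (!OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &token))
            return FALSE;

        TOKEN_STATISTICS stats;
        DWORD cb;
        BOOL ok = GetTokenInformation(token, TokenStatistics,
                                      &stats, sizeof(stats), &cb);
        if (ok) *modifiedId = stats.ModifiedId;

        CloseHandle(token);
        return ok;
    }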

    Now, a cache with a bad policy is another name for a memory leak, so be careful how much and how long you cache the results of previous security checks. You don't want somebody who goes into a loop alternately calling AdjustTokenPrivileges and your function to cause your cache to consume all the memory in the system. (Each call to AdjustTokenPrivileges updates the ModifiedId, which causes your code to create a new cache entry.)

    Now, you might decide to use as your lookup key the ModifiedId and some unique identifier associated with the object. This means that if Bob accesses 500 objects, you have 500 cache entries saying Bob has access to object 1 through Bob has access to object 500. (And you have to remember to purge all cached results for an object if the object's security descriptor changes.)

    It turns out you can do better.

    Even though you may have millions of objects, you probably don't have millions of security descriptors. For example, consider your hard drive: Most of the files on that hard drive use one of just a handful of security descriptors. In particular, it's nearly always the case that all files in a directory share the same security descriptor, because they start out with the security descriptor inherited from the directory, and most people don't bother customizing it. Even if your hard drive is on a server with hundreds of users connecting and creating files, you will probably only have a few thousand unique security descriptors.

    A better cache key would be the ModifiedId of the token being checked and the self-relative security descriptor that the token was checked against. If Bob accesses 500 objects, there will probably be only around five unique security descriptors. That's only five cache entries for Bob. It also saves you the trouble of remembering to purge the cache when an object's security descriptor changes, since a new security descriptor changes one of the lookup keys, so it gets a new cache entry. Since security descriptors tend to be shared among many objects, you get two bonus benefits: First, the old security descriptor is probably still being used by some other object, so you may as well leave it in the cache and let it age out naturally. And second, there's a good chance the new security descriptor is already in your cache because it's probably already being used by some other object.
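
    One way to picture that cache key (just a sketch; it compares the raw bytes of the self-relative security descriptors and says nothing about cache eviction policy):

    #include <windows.h>
    #include <map>
    #include <vector>

    // Sketch of a cache keyed by (ModifiedId, self-relative security
    // descriptor bytes). The descriptor is stored by value, so a changed
    // descriptor simply produces a different key.
    struct AccessCacheKey
    {
        LUID modifiedId;
        std::vector<BYTE> selfRelativeSd; // raw bytes of the descriptor

        bool operator<(const AccessCacheKey& other) const
        {
            if (modifiedId.HighPart != other.modifiedId.HighPart)
                return modifiedId.HighPart < other.modifiedId.HighPart;
            if (modifiedId.LowPart != other.modifiedId.LowPart)
                return modifiedId.LowPart < other.modifiedId.LowPart;
            return selfRelativeSd < other.selfRelativeSd;
        }
    };

    // true = access granted, false = access denied
    std::map<AccessCacheKey, bool> g_accessCache;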

    ¹I use the word potentially because Windows Vista introduced an optimization which preprocesses the token to reduce the complexity of the access check operation. In practice, the access check is linear in the number of ACEs in the security descriptor.

    Bonus chatter: Note that even though Bob can remove his membership in a group, the system still knows that he's just pretending. This is important, because the security descriptor might contain a Deny ACE for people on Project Nosebleed. Even if Bob removes the Nosebleed group membership from his token in an attempt to get around the Deny ACE, the operating system won't be fooled: "Nice try, Bob. I know it's still you."

    Sponsorship message: I'd like to thank my pals over in the security team for reviewing this article and making suggestions and corrections. This article is sponsored by the AuthzAccessCheck function, which supports caching the results of an access check.

  • The Old New Thing

    Flushing your performance down the drain, that is

    • 30 Comments

    Some time ago, Larry Osterman discussed the severe performance consequences of flushing the registry, which is a specific case of the more general performance catch: Flushing anything will cost you dearly.

    A while back, I discussed the high cost of the "commit" function, and all the flush-type operations turn into a commit at the end of the day. FlushViewOfFile, [see correction below] FlushFileBuffers, and RegFlushKey all wait until the data has been confirmed written to the disk. If you perform one of these explicit flush operations, you aren't letting the disk cache do its job. These types of operations are necessary only if you're trying to maintain transactional integrity. If you're just flushing the data because "Well, I'm finished so I want to make sure it gets written out," then you're just wasting your (and the user's) time. The data will get written out, don't worry. Only if there is a power failure in the next two seconds will the data fail to get written out, but that's hardly a new problem for your program. If the power went out in the middle of the call to FlushFileBuffers (say, after it wrote out the data containing the new index but before it wrote out the data the index points to), you would've gotten partially-written data anyway. If you're not doing transactional work, then your call to FlushFileBuffers didn't actually fix anything. You still have a window during which inconsistency exists on the disk.

    Conclusion: View any call to FlushViewOfFile, [see correction below] FlushFileBuffers, and RegFlushKey with great suspicion. They will kill your program's performance, and even in the cases in which you actually would want to call it, there are better ways of doing it nowadays.
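
    If you do have a genuine commit point, confine the flush to that one place. A sketch of the idea (the function and parameter names are made up):

    #include <windows.h>

    // Sketch (names made up): ordinary writes just go through the cache;
    // FlushFileBuffers is reserved for a genuine transactional commit point.
    BOOL SaveRecord(HANDLE file, const void* data, DWORD size, BOOL isCommitPoint)
    {
        DWORD written;
        if (!WriteFile(file, data, size, &written, nullptr))
            return FALSE;

        if (isCommitPoint) {
            // Expensive: waits until the data is confirmed on the disk.
            return FlushFileBuffers(file);
        }
        // Otherwise let the disk cache write it out in due course.
        return TRUE;
    }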

    More remarks on that old TechNet article: The text for the Enable advanced performance check box has been changed in Windows 7 to something that more accurately describes what it does: Turn off Windows write-cache buffer flushing on the device. There's even explanatory text that explains the conditions under which it would be appropriate to enable that setting:

    To prevent data loss, do not select this check box unless the device has a separate power supply that allows the device to flush its buffer in case of power failure.

    Hard drives nowadays are more than just platters of magnetic media. There's also RAM on the hard drive circuit board, and this RAM is used by the hard drive firmware as yet another buffer. If the drive is told, "Write this data to the hard drive at this location," the drive copies the data into its private RAM buffer and immediately returns a successful completion code to the operating system. The drive then goes about seeking the head, looking for the sector, and physically writing out the data.

    When your program issues a write command to the file system (assuming that file system buffering is enabled), the write goes into the operating system disk cache, and periodically, the data from the operating system disk cache is flushed to the hard drive. As we saw above, the hard drive lies to the operating system and says "Yeah, I wrote it," even though it hasn't really done it yet. The data the operating system requested to be written is just sitting in a RAM buffer on the hard drive, that in turn gets flushed out to the physical medium by the hard drive firmware.

    If you call one of the FlushBlahBlah functions, Windows flushes out its disk cache buffers to the hard drive, as you would expect. But as we saw above, this only pushes the data into the RAM buffer on the hard drive. Windows understands this and follows up with another command to the hard drive, "Hey, I know you're one of those sneaky hard drives with an internal RAM buffer. Yes, I'm talking to you; don't act all innocent like. So do me a favor, and flush out your internal RAM buffers too, and let me know when that's done." This extra "I know what you did last summer" step ensures that the data really is on physical storage, and the FlushBlahBlah call waits until the "Okay, I finished flushing my internal RAM buffer" signal from the hard drive before returning control to your program.

    This extra "flush out your internal RAM buffer too" command is the right thing to do, but it can safely be skipped under very special circumstances: Consider a hard drive with a power supply separate from the computer which can keep the drive running long enough to flush out its internal RAM, even in the event of a sudden total loss of external power. For example, it might be an external drive with a separate power supply that is hooked up to a UPS. If you have this very special type of set-up, then Windows doesn't need to issue the "please flush out your internal RAM buffers too" command, because you have a guarantee that the data will make it to the disk no matter what happens in the future. Even if a transformer box explodes, cutting off all power to your building, that hard drive has enough residual power to get the data from the internal RAM buffer onto the physical medium. Only if your hard drive has that type of set-up is it safe to turn on the Turn off Windows write-cache buffer flushing on the device check box.

    (Note that a laptop computer battery does not count as a guarantee that the hard drive will have enough residual power to flush its RAM buffer to physical media. You might accidentally eject the battery out of your laptop, or you might let your battery run down completely. In these cases, the hard drive will not have a chance to finish flushing its internal RAM buffer.)

    Of course, if the integrity of your disks is not important then go ahead and turn the setting on even though you don't have a battery backup. One case where this may be applicable is if you have a dedicated hard drive you don't care about losing if the power goes out. Many developers on the Windows team devote an entire hard drive to holding the files generated by a build of the operating system. Before starting a build, they reformat the drive. If the power goes out during a build, they'll just reformat the drive and kick off another build. In this case, go ahead and check the box that says Enable advanced performance. But if you care about the files on the drive, you shouldn't check the box unless you have that backup power supply.
