September, 2010

  • The Old New Thing

    Microspeak: Sats

    • 27 Comments

    I introduced this Microspeak last year as part of a general entry about management-speak, but I'm giving it its own entry because it deserves some attention on its own.

    I just want to have creative control over how my audience can interact with me without resorting to complex hacking in a way that is easy to explain but ups our blogging audiences sats to a new level that may also stimulate a developer ecosytem that breeds quality innovation...

    Ignore the other management-speak; we're looking at the weird four-letter word sats.

    Sats is short for satisfaction metrics. This falls under the general obsession with measurement at Microsoft. For many categories of employees (most notably the approximately 1000 employees eligible for the so-called Shared Performance Stock Awards program), compensation is heavily influenced by customer satisfaction metrics, known collectively as CSAT.

    Satisfaction metrics are so important that they have their own derived jargon.

    Jargon   Meaning            Description
    VSAT     Very satisfied     Percentage of customers who report that they are very satisfied.
    DSAT     Dissatisfied       Percentage of customers who report that they are somewhat dissatisfied or very dissatisfied.
    NSAT     Net satisfaction   NSAT = VSAT − DSAT

    All of these jargon terms are pronounced by saying the first letter, followed by the word sat, so for example, NSAT is pronounced N-sat.

    You can see some of these metrics in use in a blog post from the Director of Operations at Microsoft.com. Notice how he uses the terms VSAT and DSAT without bothering to explain what they mean. The meanings are so obvious to him that it doesn't even occur to him that others might not know what they mean. (By comparison, Kent Sharkey includes a definition when he uses the term.)

    And if you haven't gotten enough of this jargon yet, there's an entire training session online on the subject of the Customer Satisfaction Index. If you're impatient, click ahead to section 9.

  • The Old New Thing

    Ha ha, the speaker gift is a speaker, get it?

    • 11 Comments

    As a thank-you for presenting at TechReady11, the conference organizers gave me (and presumably the other speakers) a portable speaker with the Windows logo printed on it.

    The speaker underneath the logo is the X-Mini II Capsule Speaker, and I have to agree with Steve Clayton that they pack a lot of sound in a compact size. Great for taking on trips, or even picnics.

    It's been a long time since I last recommended a Christmas gift for geeks, so maybe I'll make up for it by giving two suggestions this year.

    The second suggestion is a response to a comment from that old article: My bicycle lock is just a laptop combination lock that I repurposed as a bicycle lock. It's a pretty clumsy design for a laptop lock, since it comes in two parts, one of which is easy to lose, but if you just "accidentally" lose the clip part, what's left is a simple cable combination lock that easily tucks into a side pocket of my bicycle trunk bag. Yes, that lock isn't going to stop a dedicated thief for very long, but fortunately, the Microsoft parking garage is not crawling with dedicated thieves because, as a rule, Microsoft tries not to hire dishonest people.

    So, um, that's a suggestion for a bicycle lock for somebody who lives in a low-crime area. Hm, maybe that's not a very good suggestion after all.

  • The Old New Thing

    Why doesn't Win32 give you the option of ignoring failures in DLL import resolution?

    • 14 Comments

    Yuhong Bao asked, via the Suggestion Box, "Why not implement delay-loading by having a flag in the import entry specifying that Windows should mimic the Windows 3.1 behavior for resolving that import?"

    Okay, first we have to clear up the false assumptions in the question.

    The question assumes that Windows 3.1 had delay-loading functionality in the first place (functionality that Yuhong Bao would like added to Win32). Actually, Windows 3.1 did not have any delay-load functionality. If your module imported from another DLL in its import table, the target DLL was loaded when your module was loaded. There was no delay. The target DLL loaded at the same time your module did.

    So there is no Windows 3.1 delay-load behavior to mimic in the first place.

    Okay, maybe the question really was, "Instead of failing to load the module, why not just let the module load, but set the imported function pointers to a stub function that raises an error if you try to call it, just like Windows 3.1 did?"

    Because it turns out that the Windows 3.1 behavior resulted in data loss and mystery crashes. The Win32 design solved this problem by making failed imports fatal up front (a design principle known as fail fast), so you knew ahead of time that your program was not going to work rather than letting you run along and then watch it stop working at the worst possible time, and probably in a situation where the root cause is much harder to identify. (Mind you, it may stop working at the worst possible time for reasons the loader could not predict, but at least it stopped what it could.)

    In other words, this was a situation the Win32 people thought about and made an explicit design decision that this is a situation they would actively not support.

    Okay, but when Visual Studio was looking at how to add delay-load functionality, why didn't they implement it by changing the Win32 loader so that failed imports could be optionally marked as non-fatal?

    Well, um, because the Visual Studio team doesn't work on Windows?

    There's this feature you want to add. You can either add it to the linker so that all programs can take advantage of the feature on all versions of Windows, or you can add it to the operating system kernel, so that it works only on newer versions of Windows. If the feature had been added to the loader rather than the linker, application vendors would say, "Stupid Microsoft. I can't take advantage of this new feature because a large percentage of my customer base is still running the older operating system. Why couldn't they have added this feature to the linker, so it would work on all operating systems?" (You hear this complaint a lot. Any time a new version of Windows adds a feature, everybody demands that it be ported downlevel.)

    Another way of looking at this is realizing that you're adding a feature to the operating system which applications can already do for themselves. Suppose you say, "Okay, when you call a function whose import could not be resolved, we will display a fatal application error." The response is going to be "But I don't want my application to display a fatal application error. I want you to call this error handler function instead, and the error handler will decide what to do about the error." Great, now you have to design an extensibility mechanism. And what if two DLLs each try to install different failed-imported-function handlers?
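
    To make that last point concrete, here is a minimal sketch (not anything Windows or Visual Studio ships) of how an application can simulate delay-loading on its own, complete with its own failure policy. The DLL name fancy.dll and the function FancyFunction are made up for illustration; LoadLibraryW and GetProcAddress are the real Win32 calls doing the work.

        #include <windows.h>
        #include <stdio.h>

        typedef int (WINAPI *FANCYPROC)(int);       // hypothetical imported function

        static int WINAPI FancyFunctionFallback(int value)
        {
            // Application-chosen policy for an unresolved import: log and degrade
            // gracefully instead of crashing at the worst possible time.
            fprintf(stderr, "FancyFunction unavailable; using fallback.\n");
            return value;
        }

        static FANCYPROC GetFancyFunction(void)
        {
            static FANCYPROC s_proc;                // not thread-safe; just a sketch
            if (!s_proc) {
                HMODULE module = LoadLibraryW(L"fancy.dll");    // resolved on first use
                FANCYPROC proc = module ?
                    (FANCYPROC)GetProcAddress(module, "FancyFunction") : NULL;
                s_proc = proc ? proc : FancyFunctionFallback;
            }
            return s_proc;
        }

        int main(void)
        {
            printf("%d\n", GetFancyFunction()(42)); // the DLL loads here, not at startup
            return 0;
        }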

    When you start at minus 100 points, saying, "Oh, this is not essential functionality. Applications can simulate it on their own just as easily, and with greater flexibility" does nothing to get you out of the hole. If anything, it digs you deeper into it.

  • The Old New Thing

    Hey there token, long time no see! (Did you do something with your hair?)

    • 10 Comments

    Consider a system where you have a lot of secured objects, and suppose further that checking whether a user has access to an object is a slow operation. This is not as rare as you might think: Even though a single access check against a security descriptor with a small number of ACEs might be fast, you can have objects with complicated security descriptors or (more likely) users who belong to hundreds or thousands of security groups. Since checking whether a security descriptor grants access to a token is potentially¹ O(nm) in the number of ACEs in the security descriptor and the number of groups the user belongs to (since each ACE needs to be checked to see if it matches each group), even a check against a small security descriptor can multiply out to a slow operation when the user belongs to thousands of groups.

    Suppose your profiling shows that you spend a lot of time checking tokens against security descriptors. How can you create a cache of access/no-access results so you can short-circuit the expensive security check when a user requests access to an object? (And obviously, you can't have any false positives or false negatives. Security is at stake here!)

    First, let's look at things that don't solve the problem: One option is to query the SID from the token and cache the access/no-access result with the SID. This option is flawed because between the two checks, the user's group membership may have changed. For example, suppose object X is accessible to members of Group G. Bob starts out as a member of Group G, asks you for access, and you grant it and cache the fact that Bob has access to object X. Later that day, Bob's membership in Group G is revoked, and when Bob logs on the next day, his token won't include Group G. If you had merely cached Bob's SID, you would have seen the entry in the cache and said, "Welcome back, Bob. Have fun with object X!" Bob then rubs his hands together and mutters Excellent! and starts making unauthorized changes to object X.

    Now, Bob's membership in Group G might have been revoked at Bob's request. Reducing one's privileges is a common safety measure. For example, Bob might remove his membership in the Administrators group so he won't accidentally delete an important file. Low Rights Internet Explorer intentionally removes a slew of privileges from its token so that the scope of damage of an attack from a malicious site is limited.

    Okay, so how can we recognize that the Bob that comes back has different group membership from the Bob that visited us the first time? You can do this with the help of the TOKEN_STATISTICS structure. This structure contains a number of locally-unique values which can be used to recognize and correlate tokens. A locally-unique value is a value that is unique on the local machine until the operating system is shut down or restarted. You request the statistics for a token by calling the GetTokenInformation function, passing TokenStatistics as the information class.
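
    Here is a minimal sketch of pulling those statistics out of the current process token; GetTokenInformation, TokenStatistics, and the TOKEN_STATISTICS fields are the real API, and everything else is just scaffolding for illustration. The individual fields are discussed below.

        #include <windows.h>
        #include <stdio.h>

        int main(void)
        {
            HANDLE token;
            if (!OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &token)) {
                return 1;
            }

            TOKEN_STATISTICS stats;
            DWORD cb;
            if (GetTokenInformation(token, TokenStatistics, &stats, sizeof(stats), &cb)) {
                // Each of these is a LUID: unique on this machine until the next reboot.
                printf("TokenId:          %08lx:%08lx\n",
                       stats.TokenId.HighPart, stats.TokenId.LowPart);
                printf("AuthenticationId: %08lx:%08lx\n",
                       stats.AuthenticationId.HighPart, stats.AuthenticationId.LowPart);
                printf("ModifiedId:       %08lx:%08lx\n",
                       stats.ModifiedId.HighPart, stats.ModifiedId.LowPart);
            }
            CloseHandle(token);
            return 0;
        }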

    The AuthenticationId is known in some places as the LogonId because it is assigned to the logon session that the access token represents. There can be many tokens representing a single logon session, so that won't work for our purposes.

    The TokenId is a little closer. It is a locally-unique value assigned to a token when it is created, and it remains attached to that token until the token is destroyed. This is closer, but still not perfect, because Bob can enable or disable privileges on the token, which doesn't change the TokenId, but it sure changes the result of a security check!

    The ModifiedId is a value which is updated each time a token is modified. Therefore, when you want to cache that This particular token has access to this security descriptor, you should use the ModifiedId as the key. (Remember, locally-unique values are good only until the system shuts down or restarts, so don't cache them across reboots!)

    Now, a cache with a bad policy is another name for a memory leak, so be careful how much and how long you cache the results of previous security checks. You don't want somebody who goes into a loop alternately calling AdjustTokenPrivileges and your function to cause your cache to consume all the memory in the system. (Each call to AdjustTokenPrivileges updates the ModifiedId, which causes your code to create a new cache entry.)

    Now, you might decide to use as your lookup key the ModifiedId and some unique identifier associated with the object. This means that if Bob accesses 500 objects, you have 500 cache entries saying Bob has access to object 1 through Bob has access to object 500. (And you have to remember to purge all cached results for an object if the object's security descriptor changes.)

    It turns out you can do better.

    Even though you may have millions of objects, you probably don't have millions of security descriptors. For example, consider your hard drive: Most of the files on that hard drive use one of just a handful of security descriptors. In particular, it's nearly always the case that all files in a directory share the same security descriptor, because they start out with the security descriptor inherited from the directory, and most people don't bother customizing it. Even if your hard drive is on a server with hundreds of users connecting and creating files, you will probably only have a few thousand unique security descriptors.

    A better cache key would be the ModifiedId of the token being checked and the self-relative security descriptor that the token was checked against. If Bob accesses 500 objects, there will probably be only around five unique security descriptors. That's only five cache entries for Bob. It also saves you the trouble of remembering to purge the cache when an object's security descriptor changes, since a new security descriptor changes one of the lookup keys, so it simply gets a new cache entry. Since security descriptors tend to be shared among many objects, you get two bonus benefits: First, the old security descriptor is probably still being used by some other object, so you may as well leave it in the cache and let it age out naturally. And second, there's a good chance the new security descriptor is already in your cache because it's probably already being used by some other object.
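
    As a concrete illustration of the bookkeeping involved, here is one hypothetical shape such a cache entry might take. None of these names are a real Windows API, and byte-for-byte comparison of the self-relative descriptor is just one conservative way to decide that two checks are "the same check."

        #include <windows.h>
        #include <string.h>

        // Hypothetical cache entry: keyed by the token's ModifiedId plus the bytes
        // of the self-relative security descriptor it was checked against.
        typedef struct CACHE_ENTRY {
            LUID                 ModifiedId;     // from TOKEN_STATISTICS
            PSECURITY_DESCRIPTOR Sd;             // copy of the self-relative descriptor
            DWORD                SdLength;       // GetSecurityDescriptorLength(Sd)
            ACCESS_MASK          DesiredAccess;  // what the caller asked for
            BOOL                 AccessGranted;  // the cached yes/no answer
        } CACHE_ENTRY;

        static BOOL EntryMatches(const CACHE_ENTRY *e, const LUID *modifiedId,
                                 PSECURITY_DESCRIPTOR sd, DWORD sdLength,
                                 ACCESS_MASK desiredAccess)
        {
            return e->ModifiedId.LowPart  == modifiedId->LowPart  &&
                   e->ModifiedId.HighPart == modifiedId->HighPart &&
                   e->DesiredAccess == desiredAccess &&
                   e->SdLength == sdLength &&
                   memcmp(e->Sd, sd, sdLength) == 0;
        }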

    ¹I use the word potentially because Windows Vista introduced an optimization which preprocesses the token to reduce the complexity of the access check operation. In practice, the access check is linear in the number of ACEs in the security descriptor.

    Bonus chatter: Note that even though Bob can remove his membership in a group, the system still knows that he's just pretending. This is important, because the security descriptor might contain a Deny ACE for people on Project Nosebleed. Even if Bob removes the Nosebleed group membership from his token in an attempt to get around the Deny ACE, the operating system won't be fooled: "Nice try, Bob. I know it's still you."

    Sponsorship message: I'd like to thank my pals over in the security team for reviewing this article and making suggestions and corrections. This article is sponsored by the AuthzAccessCheck function, which supports caching the results of an access check.

  • The Old New Thing

    Flushing your performance down the drain, that is

    • 30 Comments

    Some time ago, Larry Osterman discussed the severe performance consequences of flushing the registry, which is a specific case of the more general performance catch: Flushing anything will cost you dearly.

    A while back, I discussed the high cost of the "commit" function, and all the flush-type operations turn into a commit at the end of the day. FlushViewOfFile, [see correction below] FlushFileBuffers, RegFlushKey, they all wait until the data has been confirmed written to the disk. If you perform one of these explicit flush operations, you aren't letting the disk cache do its job. These types of operations are necessary only if you're trying to maintain transactional integrity. If you're just flushing the data because "Well, I'm finished so I want to make sure it gets written out," then you're just wasting your (and the user's) time. The data will get written out, don't worry. Only if there is a power failure in the next two seconds will the data fail to get written out, but that's hardly a new problem for your program. If the power went out in the middle of the call to FlushFileBuffers (say, after it wrote out the data containing the new index but before it wrote out the data the index points to), you would've gotten partially-written data anyway. If you're not doing transactional work, then your call to FlushFileBuffers didn't actually fix anything. You still have a window during which inconsistency exists on the disk.

    Conclusion: View any call to FlushViewOfFile, [see correction below] FlushFileBuffers, and RegFlushKey with great suspicion. They will kill your program's performance, and even in the cases in which you actually would want to call it, there are better ways of doing it nowadays.
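
    If you genuinely do need durability for specific writes, one alternative (this is my illustration, not necessarily the "better way" alluded to above) is to ask for write-through behavior when you open the file instead of flushing after the fact. CreateFileW and FILE_FLAG_WRITE_THROUGH are the real API; the file name is made up.

        #include <windows.h>

        int main(void)
        {
            // FILE_FLAG_WRITE_THROUGH asks that writes go through the cache to the
            // device, instead of calling FlushFileBuffers after every write.
            HANDLE file = CreateFileW(L"journal.dat", GENERIC_WRITE, 0, NULL,
                                      CREATE_ALWAYS,
                                      FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH,
                                      NULL);
            if (file == INVALID_HANDLE_VALUE) {
                return 1;
            }

            const char data[] = "transactionally important record";
            DWORD written;
            WriteFile(file, data, sizeof(data), &written, NULL);

            CloseHandle(file);
            return 0;
        }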

    More remarks on that old TechNet article: The text for the Enable advanced performance check box has been changed in Windows 7 to something that more accurately describes what it does: Turn off Windows write-cache buffer flushing on the device. There's even explanatory text that explains the conditions under which it would be appropriate to enable that setting:

    To prevent data loss, do not select this check box unless the device has a separate power supply that allows the device to flush its buffer in case of power failure.

    Hard drives nowadays are more than just platters of magnetic media. There's also RAM on the hard drive circuit board, and this RAM is used by the hard drive firmware as yet another buffer. If the drive is told, "Write this data to the hard drive at this location," the drive copies the data into its private RAM buffer and immediately returns a successful completion code to the operating system. The drive then goes about seeking the head, looking for the sector, and physically writing out the data.

    When your program issues a write command to the file system (assuming that file system buffering is enabled), the write goes into the operating system disk cache, and periodically, the data from the operating system disk cache is flushed to the hard drive. As we saw above, the hard drive lies to the operating system and says "Yeah, I wrote it," even though it hasn't really done it yet. The data the operating system requested to be written is just sitting in a RAM buffer on the hard drive, that in turn gets flushed out to the physical medium by the hard drive firmware.

    If you call one of the FlushBlahBlah functions, Windows flushes out its disk cache buffers to the hard drive, as you would expect. But as we saw above, this only pushes the data into the RAM buffer on the hard drive. Windows understands this and follows up with another command to the hard drive, "Hey, I know you're one of those sneaky hard drives with an internal RAM buffer. Yes, I'm talking to you; don't act all innocent like. So do me a favor, and flush out your internal RAM buffers too, and let me know when that's done." This extra "I know what you did last summer" step ensures that the data really is on physical storage, and the FlushBlahBlah call waits until the "Okay, I finished flushing my internal RAM buffer" signal from the hard drive before returning control to your program.

    This extra "flush out your internal RAM buffer too" command is the right thing to do, but it can safely be skipped under very special circumstances: Consider a hard drive with a power supply separate from the computer which can keep the drive running long enough to flush out its internal RAM, even in the event of a sudden total loss of external power. For example, it might be an external drive with a separate power supply that is hooked up to a UPS. If you have this very special type of set-up, then Windows doesn't need to issue the "please flush out your internal RAM buffers too" command, because you have a guarantee that the data will make it to the disk no matter what happens in the future. Even if a transformer box explodes, cutting off all power to your building, that hard drive has enough residual power to get the data from the internal RAM buffer onto the physical medium. Only if your hard drive has that type of set-up is it safe to turn on the Turn off Windows write-cache buffer flushing on the device check box.

    (Note that a laptop computer battery does not count as a guarantee that the hard drive will have enough residual power to flush its RAM buffer to physical media. You might accidentally eject the battery out of your laptop, or you might let your battery run down completely. In these cases, the hard drive will not have a chance to finish flushing its internal RAM buffer.)

    Of course, if the integrity of your disks is not important then go ahead and turn the setting on even though you don't have a battery backup. One case where this may be applicable is if you have a dedicated hard drive you don't care about losing if the power goes out. Many developers on the Windows team devote an entire hard drive to holding the files generated by a build of the operating system. Before starting a build, they reformat the drive. If the power goes out during a build, they'll just reformat the drive and kick off another build. In this case, go ahead and check the box that says Enable advanced performance. But if you care about the files on the drive, you shouldn't check the box unless you have that backup power supply.

  • The Old New Thing

    The contractually obligatory beeper, and the customers who demand them

    • 33 Comments

    One of the fun parts of meeting with other developers, either at conferences or on my self-funded book tour, is exchanging war stories. Here's one of the stories I've collected, from somebody describing a former company. As is customary, I've removed identifying information.

    One day, the engineering team was informed that it was being issued a beeper and that a member of the team had to be on call at all times. This new requirement was the handiwork of the sales team, who had landed a big contract with a customer that insisted on a support contract including the ability to talk to a member of the engineering team at any time, day or night, to be used when they encountered an absolutely critical problem that the support team could not resolve to their satisfaction.

    The engineering team grumbled but knew this was something they had to accept because the customer placed a huge order and the company needed the money. To secure the engineering staff's agreement, management contributed $10 to a pool each week, and whoever was saddled with the beeper when it went off got the pool money as a consolation prize.

    The engineering team drew up a schedule, and responsibility for the beeper rotated among the team members.

    A week went by. The beeper didn't go off.

    Another week. Still quiet.

    Months passed. Still nothing.

    It was over a year before the customer asked to talk to a member of the engineering team. The engineer who received the windfall understood that the prize was a team effort, and as I recall, he spent it by buying beer and snacks for everyone.

    The engineering team figured that the customer had the engineer-on-call clause on their list of bullet-point must-have features, even though they didn't really plan on using it much, if at all.

  • The Old New Thing

    How do I customize the order of items in the All Programs section of the Start menu?

    • 31 Comments

    The items in the All Programs section of the Start menu are grouped into two sections, although there are no visible divider lines between them.

    1. Non-folders, sorted alphabetically.
    2. Folders, sorted alphabetically.

    We saw earlier that the Fast Items lost their special status in Windows Vista and are now sorted with the regular items. Another change from Windows XP is the order of the remaining two groups: Windows XP put folders above non-folders, because that was the sort order imposed by the IShellFolder::CompareIDs method, which puts folders above files in regular Explorer windows. Windows Vista deviates from that standard sort order because the guidance for application developers regarding Start menu shortcuts is to place program shortcuts directly in the Programs folder, with other supporting material in subfolders. Given that guidance, it is the program shortcuts in the Start menu that are the more important items, so they go at the top.

    If you don't like the alphabetical ordering, then you can go to the Start menu Properties, select Customize, and then scroll down to the bottom of the options tree and uncheck Sort All Programs by name. If you do this, then you can manually rearrange the items in the All Programs menu via drag/drop to put them in whatever order you like.

    Pre-emptive snarky comment: "Changing the Start menu from a cascading menu to a tree navigation model was the stupidest idea since unsliced bread." Yes, I know you all hate it. Old news. Consider this a tip on how to cope with adversity.

  • The Old New Thing

    Was there really an Opera billboard outside Microsoft main campus?

    • 17 Comments

    In an interview with the Seattle Times, Rod Hamlin of Opera Software claimed,

    We put a big red billboard out by Microsoft last year that said, "Want to be a real Internet explorer? www.opera.com." We got some interesting feedback on that. All of the AT&T executives could see it and all the Microsoft guys driving back home past Marymoor Park.

    Okay, so where was this billboard? He says it was near Marymoor Park, and that it could be seen from AT&T executive offices, which makes sense so far because AT&T Wireless has offices in the Redmond Town Center business and shopping center, which lies right across the highway from the park.

    But then things fall apart. First of all, there is no billboard stand anywhere along the stretch of highway that goes between Marymoor Park and Redmond Town Center.

    Second, if you go to the regulations governing highway advertising in the State of Washington [easier-to-read PDF version], section 47.42.040 describes the types of signs allowed, and the alleged Opera billboard does not appear to be any of the permissible types. (The closest match would be 47.42.040(4), if Opera had offices within twelve miles of Redmond Town Center.)

    Third, you'd think there'd be plenty of pictures of an advertising campaign this cheeky. But I haven't been able to find any online. POIDH.

    Here's an actual cheeky prank (and the response). The Internet Explorer team has since learned their lesson, and now they send congratulatory cake.

    Stop the presses: A colleague of mine says that he saw the sign. But it wasn't a billboard. Actually, it wasn't even a sign. It was a sponsorship banner hung on the fence of one of the sports fields at Marymoor Park, the sort of sign that more traditionally might read Bob's Auto Repair proudly supports youth sports. Go Mustangs! I asked him why he didn't take a picture. "I guess we've all become pretty jaded. Either that, or everybody figured somebody else would take a picture (so then nobody did)."

    After our conversation, he went and took a picture.

  • The Old New Thing

    What happens to a named object when all handles to it are closed?

    • 14 Comments

    A customer had a question about named kernel objects:

    I understand that handles in a process if leaked will be destroyed by the kernel when the process exits. My question would be around named objects.

    Would named objects hold their value indefinitely? If I run a small utility app to increment a named counting semaphore, the count of that named semaphore could be lost when that app exits?

    I would expect it to always hold its current value so that transactions across processes and across time could be held even if no process is holding on to it.

    When the last handle to a named kernel object (such as a named semaphore or a named shared memory block) is closed, the object itself is destroyed. Doesn't matter whether you explicitly closed the handle by calling CloseHandle or the kernel closed the handle for you when it cleaned up the mess you left behind. The object manager doesn't say, "Well, if the application explicitly called CloseHandle, then I'll also delete the named object, but if the application leaked the handle, then I'll leave the named object around."

    First of all, that would kind of belie the whole concept of clean-up. Cleaning up means destroying the resources the application neglected to destroy.

    Second, this would create a bizarre situation where the way to access a new feature is to intentionally do something wrong. (Namely, to leak a handle to a named object.)

    Okay, so maybe the expectation was that named objects persisted after all handles to them are closed, even if the handle is closed via the normal CloseHandle mechanism. But then how would you delete a named object? There is no DeleteNamedEvent function, after all. You could write a process that created 2 billion named objects and then leaked them. Boom, now you can't clean up by killing the process; you have to restart the computer.

    Kernel objects all follow the same lifetime rules, whether they are named or anonymous: The object is destroyed when the last reference to it is removed (when the handle is closed, noting also that running threads and processes keep a reference to the corresponding kernel object).
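
    A minimal sketch of that rule in action, using a named semaphore (the name Local\DemoCounter is made up for illustration): once the last handle is closed, the name no longer refers to anything, so a subsequent open fails.

        #include <windows.h>
        #include <stdio.h>

        int main(void)
        {
            HANDLE sem = CreateSemaphoreW(NULL, 1, 10, L"Local\\DemoCounter");
            printf("created: %p\n", (void*)sem);

            CloseHandle(sem);   // last handle closed -> the named object is destroyed

            HANDLE again = OpenSemaphoreW(SEMAPHORE_MODIFY_STATE | SYNCHRONIZE,
                                          FALSE, L"Local\\DemoCounter");
            // Fails (returns NULL, ERROR_FILE_NOT_FOUND) because the object is gone.
            printf("reopen: %p (error %lu)\n", (void*)again, GetLastError());
            return 0;
        }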

    If you want something that survives after all its handles are closed, then use something with a persistence model, like a file.

  • The Old New Thing

    It rather involved being on the other side of this airtight hatchway: If you grant users full control over critical files, then it's not the fault of the system for letting users modify them

    • 55 Comments

    Today's dubious security vulnerability is another example of If you reconfigure your computer to be insecure, don't be surprised that there's a security vulnerability.

    This example comes from an actual security vulnerability report submitted to Microsoft:

    I have found a critical security vulnerability that allows arbitrary elevation to administrator from unprivileged accounts.

    1. Grant Full Control of the Windows directory (and all its contents and subdirectories) to Everyone.
    2. Log on as an unprivileged user and perform these actions...

    I can just stop there because your brain has already stopped processing input because of all the alarm bells ringing after you read that first step. That first step gives away the farm. If you grant control to the entire contents of the Windows directory to non-administrators, then don't be surprised that they can run around and do bad things!

    "If I remove all the locks from my doors, then bad guys can steal my stuff."

    Yeah, so don't do that. This is not a security vulnerability in the door.

    Bonus chatter: There are many variations on this dubious security vulnerability. Actual vulnerability reports submitted to Microsoft include
    • "First, grant world-write permission to this registry key..."
    • "First, reconfigure Internet Explorer to allow scripting of ActiveX controls not marked safe for scripting..."
    • "On a compromised machine, you can..."

    That last one is impressive for its directness. "Starting on the other side of this airtight hatchway..."
