• The Old New Thing

    People lie on surveys and focus groups, often unwittingly

    Philip Su's discussion of early versions of Microsoft Money triggered a topic that had been sitting in the back of my mind for a while: That people lie on surveys and focus groups, often unwittingly. I can think of three types of lies offhand. (I'm not counting malicious lying; that is, intentional lying for the purpose of undermining the results of the survey or focus group.)

    First, people lie about the reasons why they do things.

    The majority of consumers who buy computers claim that personal finance management is one of the top three reasons they are purchasing a PC. They've been claiming this for more than a decade. But only somewhere around 2% of consumers end up using a personal finance manager.

    This is one of those unconscious lies. People claim that they want a computer to do their personal finances, to organize their recipes, to mail-merge their Christmas card labels.

    They are lying.

    Those are the things people wish they would use their computer for. That's before the reality hits them of how much work it is to track every expenditure, transcribe every recipe, type in every address. In reality, they end up using the computer to play video games, surf the web, and email jokes to each other.

    Just because people say they would do something doesn't mean they will. That leads to the second class of focus group lying: The polite lie, also known as "say what the sponsor wants to hear".

    The following story is true, but the names have been changed.

    A company conducted focus groups for their Product X, which had as its main competitor Product Q. They asked people who were using Product Q, "Why do you use Product Q instead of Product X?" The respondents gave their reasons: "Because Product Q has feature F," "Because Product Q performs G faster," "Because Product Q lets me do activity H." They added, "If Product X did all that and was cheaper, we'd switch to it."

    Armed with this valuable insight, the company expended time, effort, and money in adding feature F to Product X, making Product X do G faster, and adding the ability to do activity H. They lowered the price and sat back and waited for the customers to beat a path to their door.

    But the customers didn't come.

    Why not?

    Because the customers were lying. In reality, they had no intention of switching from Product Q to Product X at all. They grew up with Product Q, they were used to the way Product Q worked, they simply liked Product Q. Product Q had what in the hot bubble-days was called "mindshare", but what in older days was called "brand loyalty" or just "inertia".

    When asked to justify why they preferred Product Q, the people in the focus group couldn't say, "I don't know; I just like it." That would be perceived as an "unhelpful" answer, and besides it would be subconsciously admitting that they were being manipulated by Product Q's marketing! Instead, they made up reasons to justify their preference to themselves and consequently to the sponsor of the focus group.

    Result: Company wastes tremendous effort on the wrong thing.

    (Closely related to this is the phenomenon of saying—and even believing—"I'd pay ten bucks for that!" Yet when the opportunity arises to buy it for $10, you decline. I do this myself.)

    The third example of lying that occurred to me is the one where you don't even realize that you are contradicting yourself. My favorite example of this was a poll on the subject of congestion charging on highways in the United States. The idea behind congestion charging is to create a toll road and vary the cost of driving on the road depending on how heavy traffic is. Respondents were asked two questions:

    1. "If congestion charging were implemented in your area, do you think it would reduce traffic congestion?"
    2. "If congestion charging were implemented in your area, would you be less likely to drive during peak traffic hours?"

    Surprisingly, most people answered "No" to the first question and "Yes" to the second. But if you stop and think about it, if people avoid driving during peak traffic hours, then congestion would be reduced because there would be fewer cars on the road. An answer of "Yes" to the second question logically implies an answer of "Yes" to the first question.

    (One may be able to explain this by arguing, "Well, sure, congestion charging would be effective for influencing my driving behavior, but I don't see how it would affect enough other people to make it worthwhile. I'm special." Sort of like how most people rate themselves as above-average drivers.)

    What I believe happened was that people reacted by saying to themselves, "I am opposed to congestion charging," and concluding, "Therefore, I must do what I can to prevent it from happening." Proclaiming on surveys that it would never work is one way of accomplishing this.

    When I shared my brilliant theories with some of my colleagues, one of them, a program manager on the Office team, added his own observation (which I have edited slightly):

    A variation of two of the above observations that often shows up in the usability lab:

    A user has spent an hour battling with the software. At some point the user's expectation of how the software should behave (the "user model") diverged from the actual behavior. Consequently, the user can't predict what will happen next and is therefore having a horrible time making any progress on the task. (Usually, this is the fault of the software design unintentionally misleading the user—which is why we test things.) After many painful attempts, the user finally succeeds, gets hints, or is flat-out told how the feature works. Often, the user stares mutely at the monitor for five seconds, then says: "I suppose that makes sense."

    It's an odd combination of people wanting to give a helpful answer with people wanting to feel special. In this case, the user wants to say something nice about the software that any outside observer could clearly tell was broken. Additionally, the users (subconsciously) don't want to admit that they were wrong and don't understand the software.

    Usability participants also have a tendency to say "I'm being stupid" when those of us on the other side of the one-way glass are screaming "No you're not, the software is broken!" That's an interesting contrast—in some cases, pleading ignorance is a defense. In other cases, pleading mastery is. At the end of the day, you must ignore what the user said and base any conclusions on what they did.

    I'm sure there are other ways people subconsciously lie on surveys and focus groups, but those are the ones that came to mind.

    [Insignificant typos fixed, October 13.]

  • The Old New Thing

    Defrauding the WHQL driver certification process

    In a comment to one of my earlier entries, someone mentioned a driver that bluescreened under normal conditions, but once you enabled the Driver Verifier (to try to catch the driver doing whatever bad thing it was doing), the problem went away. Another commenter bemoaned that WHQL certification didn't seem to improve the quality of the drivers.

    Video card vendors will do anything to outdo the competition. Everybody knows that they cheat benchmarks, for example. I remember one driver that ran the DirectX "3D Tunnel" demonstration program extremely fast, demonstrating how totally awesome the vendor's video card was. Except that if you renamed TUNNEL.EXE to FUNNEL.EXE, it ran slow again.

    There was another one that checked if you were printing a specific string used by a popular benchmark program. If so, then it only drew the string a quarter of the time and merely returned without doing anything the other three quarters of the time. Bingo! Their benchmark numbers just quadrupled.
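
    (For the curious, that kind of cheat has roughly the following shape. Everything here is invented for illustration; the string, the SURFACE type, and the helper function are not from any real driver.)

    #include <string.h>

    // SURFACE and ReallyDrawText stand in for whatever the driver's real
    // drawing path is; they are not actual DDI names.
    struct SURFACE;
    void ReallyDrawText(SURFACE* surface, const char* text);

    static unsigned g_benchmarkCalls = 0;

    void Driver_DrawText(SURFACE* surface, const char* text)
    {
        // Recognize the string the popular benchmark is known to draw.
        if (strncmp(text, "Frames per second", 17) == 0) {
            // Draw only every fourth call and quietly do nothing the
            // other three times; the reported numbers quadruple.
            if (++g_benchmarkCalls % 4 != 0) {
                return;
            }
        }
        ReallyDrawText(surface, text);
    }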

    Anyway, similar shenanigans are not unheard of when submitting a driver to WHQL for certification. Some unscrupulous drivers will detect that they are being run by WHQL and disable various features so they pass certification. Of course, they also run dog slow in the WHQL lab, but that's okay, because WHQL is interested in whether the driver contains any bugs, not whether the driver has the fastest triangle fill rate in the industry.

    The most common cheat I've seen is a driver that checks for a secret "Enable Dubious Optimizations" switch in the registry or some other place external to the driver itself. The vendor puts the driver in an installer that does not turn the switch on and submits the package to WHQL. When WHQL runs the driver through all its tests, the driver is running in "safe but slow" mode and passes certification with flying colors.

    The vendor then takes that driver (now with the WHQL stamp of approval) and puts it inside an installer that enables the secret "Enable Dubious Optimizations" switch. Now the driver sees the switch enabled and performs all sorts of dubious optimizations, none of which were tested by WHQL.
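
    (The general shape of such a switch, sketched in user-mode terms for brevity; a real kernel-mode driver would read its parameters through the kernel registry APIs, and the key path and value name here are hypothetical.)

    #include <windows.h>

    // Returns true if the secret switch is present and nonzero. The key
    // path and value name are invented for illustration; the point is
    // that the switch lives outside the driver binary that WHQL tested.
    bool DubiousOptimizationsEnabled()
    {
        DWORD value = 0;
        DWORD cb = sizeof(value);
        if (RegGetValueW(HKEY_LOCAL_MACHINE,
                         L"SOFTWARE\\Contoso\\Display",   // hypothetical key
                         L"EnableDubiousOptimizations",   // hypothetical value
                         RRF_RT_REG_DWORD, nullptr, &value, &cb) == ERROR_SUCCESS) {
            return value != 0;
        }
        return false; // switch absent: run in "safe but slow" mode
    }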

  • The Old New Thing

    IsBadXxxPtr should really be called CrashProgramRandomly

    Often I'll see code that tries to "protect" against invalid pointer parameters. This is usually done by calling a function like IsBadWritePtr. But this is a bad idea. IsBadWritePtr should really be called CrashProgramRandomly.

    The documentation for the IsBadXxxPtr functions presents the technical reasons why, but I'm going to dig a little deeper. For one thing, if the "bad pointer" points into a guard page, then probing the memory will raise a guard page exception. The IsBadXxxPtr function will catch the exception and return "not a valid pointer". But guard page exceptions are raised only once. You just blew your one chance. When the code that is managing the guard page accesses the memory for what it thinks is the first time (but is really the second), it won't get the guard page exception but will instead get a normal access violation.
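
    (Here's a minimal user-mode sketch of that "one chance" behavior, using structured exception handling; the page size and the messages are just for illustration.)

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        // Commit one page and mark it as a guard page.
        BYTE* p = (BYTE*)VirtualAlloc(nullptr, 4096, MEM_COMMIT,
                                      PAGE_READWRITE | PAGE_GUARD);
        if (!p) return 1;

        for (int attempt = 1; attempt <= 2; attempt++) {
            __try {
                *p = 1; // touch the page
                printf("Attempt %d: write succeeded\n", attempt);
            } __except (GetExceptionCode() == STATUS_GUARD_PAGE_VIOLATION
                            ? EXCEPTION_EXECUTE_HANDLER
                            : EXCEPTION_CONTINUE_SEARCH) {
                // First touch only: the guard status is now gone for good.
                printf("Attempt %d: guard page exception\n", attempt);
            }
        }
        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }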

    Alternatively, it's possible that your function was called by some code that intentionally passed a pointer to a guard page (or a PAGE_NOACCESS page) and was expecting to receive that guard page exception or access violation exception so that it could dynamically generate the data that should go onto that page. (Simulation of large address spaces via pointer-swizzling is one scenario where this can happen.) Swallowing the exception in IsBadXxxPtr means that the caller's exception handler doesn't get a chance to run, which means that your code rejected a pointer that would actually have been okay, if only you had let the exception handler do its thing.

    "Yeah, but my code doesn't use guard pages or play games with PAGE_NOACCESS pages, so I don't care." Well, for one thing, just because your code doesn't use these features pages doesn't mean that no other code in your process uses them. One of the DLLs that you link to might use guard pages, and your use of IsBadXxxPtr to test a pointer into a guard page will break that other DLL.

    And second, your program does use guard pages; you just don't realize it. The dynamic growth of the stack is performed via guard pages: Just past the last valid page on the stack is a guard page. When the stack grows into the guard page, a guard page exception is raised, which the default exception handler handles by committing a new stack page and setting the next page to be a guard page.

    (I suspect this design was chosen in order to avoid having to commit the entire memory necessary for all thread stacks. Since the default thread stack size is a megabyte, this would have meant that a program with ten threads would commit ten megabytes of memory, even though each thread probably uses only 24KB of that commitment. When you have a small pagefile or are running without a pagefile entirely, you don't want to waste 97% of your commit limit on unused stack memory.)
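
    (You can see the stack's guard page for yourself. This sketch, offered purely as an illustration, walks the current thread's stack reservation with VirtualQuery and reports which committed region carries the PAGE_GUARD bit; exact sizes will vary with your compiler settings.)

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        int local = 0; // something that lives on the current stack
        MEMORY_BASIC_INFORMATION mbi;
        VirtualQuery(&local, &mbi, sizeof(mbi));
        void* stackBase = mbi.AllocationBase; // bottom of the stack reservation

        // Walk the entire reservation from the bottom up and report each
        // region's state; one committed region carries PAGE_GUARD.
        BYTE* p = (BYTE*)stackBase;
        while (VirtualQuery(p, &mbi, sizeof(mbi)) &&
               mbi.AllocationBase == stackBase) {
            printf("%p %9zu bytes %s%s\n",
                   mbi.BaseAddress, mbi.RegionSize,
                   mbi.State == MEM_COMMIT ? "committed" : "reserved ",
                   (mbi.State == MEM_COMMIT && (mbi.Protect & PAGE_GUARD))
                       ? " <-- guard page" : "");
            p = (BYTE*)mbi.BaseAddress + mbi.RegionSize;
        }
        return 0;
    }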

    "But what should I do, then, if somebody passes me a bad pointer?"

    You should crash.

    No, really.

    In the Win32 programming model, exceptions are truly exceptional. As a general rule, you shouldn't try to catch them. And even if you decide you want to catch them, you need to be very careful that you catch exactly what you want and no more.

    Trying to intercept the invalid pointer and return an error code creates nondeterministic behavior. Where do invalid pointers come from? Typically they are caused by programming errors. Using memory after freeing it, using uninitialized memory, that sort of thing. Consequently, an invalid pointer might actually point to valid memory, if, for example, the heap page that used to contain the memory has not been decommitted, or if the uninitialized memory contains a value that, when reinterpreted as a pointer, just happens to be a pointer to memory that is valid right now. On the other hand, it might point to truly invalid memory. If you use IsBadWritePtr to "validate" your pointers before writing to them, then in the case where it happens to point to memory that is valid, you end up corrupting memory (since the pointer is "valid" and you therefore decide to write to it). And in the case where it happens to point to an invalid address, you return an error code. In both cases, the program keeps on running, and then that memory corruption manifests itself as an "impossible" crash two hours later.

    In other words, IsBadWritePtr is really CorruptMemoryIfPossible. It tries to corrupt memory, but if doing so raises an exception, it merely fails the operation.
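
    (To make that concrete, here's a sketch of the anti-pattern next to the recommended version. The WIDGET structure and the function names are invented for illustration; IsBadWritePtr itself is the real API.)

    #include <windows.h>

    struct WIDGET {
        DWORD cchName;
        WCHAR szName[64];
    };

    // The anti-pattern: "validate" the pointer, then write through it.
    BOOL SetWidgetNameLength(WIDGET* widget, DWORD cch)
    {
        // BAD: if 'widget' is a dangling pointer that happens to point at
        // memory that is still committed, IsBadWritePtr says "looks fine"
        // and the next line silently corrupts whatever lives there now.
        // If it points at truly invalid memory, you return an error and
        // the program limps onward. Either way, the real bug is hidden.
        if (IsBadWritePtr(widget, sizeof(*widget))) {
            return FALSE;
        }
        widget->cchName = cch;
        return TRUE;
    }

    // The recommended version: just write. If the caller passed garbage,
    // the access violation crashes at the point of the bug, which is
    // where you want the crash dump to point.
    void SetWidgetNameLengthAndCrashIfBogus(WIDGET* widget, DWORD cch)
    {
        widget->cchName = cch;
    }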

    Many teams at Microsoft have rediscovered that IsBadXxxPtr causes bugs rather than fixes them. It's not fun getting a bucketful of crash dumps and finding that they are all of the "impossible" sort. You hunt through your code in search of this impossible bug. Maybe you find somebody who was using IsBadXxxPtr or equivalently an exception handler that swallows access violation exceptions and converts them to error codes. You remove the IsBadXxxPtr in order to let the exception escape unhandled and crash the program. Then you run the scenario again. And wow, look, the program crashes in that function, and when you debug it, you find the code that was, say, using a pointer after freeing it. That bug has been there for years, and it was manifesting itself as an "impossible" bug because the function was trying to be helpful by "validating" its pointers, when in fact what it was doing was taking a straightforward problem and turning it into an "impossible" bug.

    There is a subtlety to this advice that you should just crash when given invalid input, which I'll take up next time.

  • The Old New Thing

    Even more about C# anonymous methods, from the source

    If you want to know still more about C# anonymous methods, you can check out the web site of Grant Richins, who has an entire category devoted to anonymous methods, and he should know, since he actually implemented them.

    Now that CLR week is over, I'm curious what you all thought of it. Would you like to see another CLR week at some point? Should I stick to Win32? (Or doesn't it matter because I'm an arrogant Microsoft apologist either way?)

  • The Old New Thing

    Not my finest hour: Driving a manual transmission

    Sara Ford messed up her leg last month; you can see her on ScooterCam-1 at TechEd 2007. The best part of the name ScooterCam-1 isn't the ScooterCam; it's the 1. That just gives it that extra little kitschy space technology feel that turns a good name into a great name.

    Anyway, she asked around if somebody with a car with an automatic transmission would be willing to swap with her manual transmission car until her leg got better. To me, driving a stick shift is like changing a flat tire: It's something that everybody should know how to do, but it's not something that anybody is expected to enjoy doing. I offered my car and (long boring part of story deleted) we met at her place and exchanged keys. I got into her car, backed out of the parking space, and... stalled right in front of her.

    Great job there. I really know how to instill confidence.

  • The Old New Thing

    Will dragging a file result in a move or a copy?

    Some people are confused by the seemingly random behavior when you drag a file. Do you get a move or a copy?

    And you're right to be confused because it's not obvious until you learn the secret. Mind you, this secret hasn't changed since 1989, but an old secret is still a secret just the same. (Worse: An old secret is a compatibility constraint.)

    • If Ctrl+Shift are held down, then the operation creates a shortcut.
    • If Shift is held down, then the operation is a move.
    • If Ctrl is held down, then the operation is a copy.
    • If no modifiers are held down and the source and destination are on the same drive, then the operation is a move.
    • If no modifiers are held down and the source and destination are on different drives, then the operation is a copy.

    This is one of the few places where the fact that there are things called "drives" makes itself known to the end user in a significant way.
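
    As a sketch (not the shell's actual code), here is how a drop target might turn those rules into a default drop effect. The MK_* key-state flags and DROPEFFECT_* values are the real Win32 constants; deciding "same drive" with PathGetDriveNumber is a simplification that ignores UNC paths and virtual folders.

    #include <windows.h>
    #include <shlwapi.h> // PathGetDriveNumberW (link with shlwapi.lib)
    #include <oleidl.h>  // DROPEFFECT_*

    DWORD DefaultDropEffect(DWORD grfKeyState,
                            PCWSTR sourcePath, PCWSTR destPath)
    {
        bool ctrl  = (grfKeyState & MK_CONTROL) != 0;
        bool shift = (grfKeyState & MK_SHIFT) != 0;

        if (ctrl && shift) return DROPEFFECT_LINK; // create a shortcut
        if (shift)         return DROPEFFECT_MOVE;
        if (ctrl)          return DROPEFFECT_COPY;

        // No modifiers: same drive means move, different drives mean copy.
        int srcDrive = PathGetDriveNumberW(sourcePath);
        int dstDrive = PathGetDriveNumberW(destPath);
        return (srcDrive >= 0 && srcDrive == dstDrive) ? DROPEFFECT_MOVE
                                                       : DROPEFFECT_COPY;
    }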

  • The Old New Thing

    On the almost-feature of floppy insertion detection in Windows 95

    Gosh, that floppy insertion article generated a lot of comments.

    First, to clarify the table: What the magic series of commands returned depended on which style of floppy drive you had.

    • Style A drive: the commands returned 1 if a floppy was present, 0 if not.
    • Style B drive: the commands returned 0 if a floppy was present, 1 if not.

    The answer was consistent within a floppy style, but you first had to know what style you had.

    The downside of waiting until the user uses a floppy for the first time is that you have the "it sometimes works and sometimes doesn't" problem. Dad buys a new computer and a copy of the Happy Fun Ball game for his son. Dad turns on the computer, and then follows the instructions that come with the Happy Fun Ball package: "Just insert the floppy and follow the instructions on the screen." Dad inserts the floppy and... nothing happens, because this is the first time Dad has used the floppy, and he was expecting autodetection to work.

    Dad says, "Stupid program doesn't work."

    Dad complains to his co-workers. "My son loves this game Happy Fun Ball when he visits his cousin's house, so I bought a computer and a copy of Happy Fun Ball, and it doesn't work!"

    Dad tries again that evening and this time it works, because in the meantime, he inserted a floppy to do something else (say, create an emergency boot disk). Bizarre. This just reinforces Dad's impression that computers are unpredictable and he will never understand how to use them.

    One could say that a feature that mysteriously turns itself on and off is worse than a feature that simply doesn't work. At least when it doesn't work, it predictably doesn't work. Human beings value predictability.

    You can't perform the test "the first time the drive is installed" because there is no way to tell that a drive has been installed. (Classic floppy drives are not Plug-and-Play.) Even worse, you can't tell that the user has replaced the Style A floppy drive with a Style B floppy drive. The user will see that floppy insertion detection stopped working and return the drive to the store. "This drive is broken. Floppy insertion detection doesn't work."

    It is also not the case that the ambiguity in the specification indicated a flaw in the specification. The C++ language specification, for example, leaves a lot of behaviors at the discretion of the implementation. This allows implementations to choose a behavior that works best for them. The no-spin-up floppy presence detection algorithm relied on several behaviors which were covered by the specification, and one that was not. It was not part of the original charter for the floppy specification committee to support spinless presence detection; it's just something that my colleague discovered over a decade after the specification was written.

    But the main reason for not bothering is that the benefit was minuscule compared to the cost. Nobody wants floppy drives to spin up as soon as a disk is inserted. That just makes them think they've been attacked by a computer virus. It'd all just be a lot of work for a feature nobody wants. And then you'd all be posting, "I can't believe Microsoft wasted all this effort on floppy insertion detection when they should have fixed <insert favorite bug here>."

  • The Old New Thing

    How many failure reports does a bug have to get before Windows will fix it?

    When a program crashes or hangs, you are given the opportunity to send an error report to Microsoft. This information is collected by the Windows Error Reporting team for analysis. Occasionally, somebody will ask, "How many failures need to be recorded before the bug is fixed? A thousand? Ten thousand? Ten million?"

    Each team at Microsoft takes very seriously the failures that come in via Windows Error Reporting. Since failures are not uniformly distributed, and since engineering resources are not infinite, you have to dedicate your limited resources to where they will have the greatest effect. Each failure that is collected and reported is basically a vote for that failure. The more times a failure is reported, the higher up the chart it moves, and the more likely it will make the cut. In practice, 50% of the crashes are caused by 1% of the bugs. Obviously you want to focus on that top 1%.

    It's like asking how many votes are necessary to win American Idol. There is no magic number that guarantees victory; it's all relative. As my colleague Paul Betts describes it, "It's like a popularity contest, but of failure."

    Depending on the component, it may take only a few hundred reports to make a failure reach the upper part of the failure charts, or it may take hundreds of thousands. And if the failure has been tracked to a third-party plug-in (which is not always obvious in the crash itself and may require analysis to ferret out), then the information is passed along to the plug-in vendor.

    What about failures in third-party programs? Sure, send those in, too. The votes are still tallied, and if the company has signed up to access Windows Error Reporting data, they can see which failures in their programs are causing the most problems. Signing up for Windows Error Reporting is a requirement for the Windows 7 software logo program. I've also heard but cannot confirm that part of the deal is that if your failure rate reaches some threshold, your logo certification is at risk of revocation.

    If the vendor signed up for Windows Error Reporting but is a slacker and isn't looking at their failures, there's a chance that Microsoft will look at them. Towards the end of the Windows 7 project, the compatibility team looked at the failure data coming from beta releases of Windows 7 to identify the highest-ranked failures in third-party programs. The list was then filtered to companies that participated in Windows Error Reporting, and to failures that the vendor had not spent any time investigating. I was one of a group of people asked to study those crashes and basically debug the vendor's program for them.

    My TechReady talk basically consisted of going through a few of these investigations. I may clean some of them up for public consumption and post them here, but it's a lot of work because I'd have to write a brand new program that exhibits the bug, force the bug to trigger (often involving race conditions), then debug the resulting crash dump.

    I know there is a subculture of people who turn off error reporting, probably out of some sense of paranoia, but these people are ultimately just throwing their vote away. Since they aren't reporting the failures, their feedback doesn't make it back to the failure databases, and their vote for "fix this bug" never gets counted.

    By the way, the method for disabling Windows Error Reporting given in that linked article is probably not the best. You should use the settings in the Windows Error Reporting node in the Group Policy Editor.

    I should've seen this coming: Over time, I've discovered that there are some hot-button topics that derail any article that mentions them. Examples include UAC, DRM, and digital certificate authorities. As a result, I do my best to avoid any mention of them. I didn't mention digital certificate authorities today, but I did link to an article that mentioned them, and now the subject has overrun the comments like kudzu. I don't need to deal with this nonsense, so I'm just going to kill my promised future article that was related to Windows Error Reporting. (I was also planning on converting my talk on debugging application compatibility issues into future articles, but since that's also related to Windows Error Reporting, I'm going to abandon that too.)

  • The Old New Thing

    When people ask for security holes as features: Silent install of uncertified drivers

    Probably the single greatest source of bluescreen crashes in Windows XP is buggy device drivers. Since drivers run in kernel mode, there is no higher authority checking what they're doing. If some user-mode code runs amok and corrupts memory, it's just corrupting its own memory. The process eventually crashes, but the system stays up. On the other hand, if a driver runs amok and corrupts memory, it's corrupting your system and eventually your machine dies.

    In acknowledgement of the importance of having high-quality drivers, Windows XP warns you when an uncertified driver is being installed. Which leads to today's topic, a question from a device driver author.

    When I try to install any driver, I get a User Consent Dialog box which tells the user that this is an unsigned driver. Is it possible to author a driver installation package that bypasses this user consent dialog box?

    The whole purpose of that dialog is to prevent the situation you desire from happening! [typo fixed 5pm] If you don't want the warning dialog, submit your driver for certification. (For testing purposes, you can sign your drivers with the test root certificate and install the test root certificate before running your setup program. Of course, installing the test root certificate also causes the desktop to read "For test purposes only" as a reminder that your machine is now allowing test-signed drivers to be installed.)

    Driver writers, of course, find the certification process cumbersome and will do whatever they can to avoid it. Because, of course, if you submit your driver for certification, it might fail! This has led to varying degrees of shenanigans to trick the WHQL team into certifying a driver different from the one you intend to use. My favorite stunt was related to me by a colleague who was installing a video card driver whose setup program displayed a dialog that read, roughly, "After clicking OK, do not touch your keyboard or mouse while we prepare your system." After you click OK, the setup program proceeds to move the mouse programmatically all over the screen, opening the Display control panel, clicking on the Advanced button, clicking through various other configuration dialogs, a flurry of activity for what seems like half a minute. When faced with a setup program that does this, your natural reaction is to scream, "Aaaiiiiigh!"

  • The Old New Thing

    The decoy visual style

    During the development of Windows XP, the visual design team was very cloak-and-dagger about what the final visual look was going to be. They had done a lot of research and put a lot of work into their designs and wanted to make sure that they made a big splash at the E3 conference when Luna was unveiled. Nobody outside the visual styles team, not even me, knew what Luna was going to look like.

    On the other hand, the programmers who were setting up the infrastructure for visual styles needed to have something to test their code against. And something had to go out in the betas.

    The visual styles team came up with two styles. In secret, they worked on Luna. In public, they worked on a "decoy" visual style called "Mallard". (For non-English speakers: A mallard is a type of duck commonly used as the model for decoys.) The ruse was so successful that people were busy copying the decoy and porting it to their own systems. (So much for copyright protection.)
