May, 2008

  • The Old New Thing

    Data breakpoints are based on the linear address, not the physical address

    When you ask the debugger to set a read or write breakpoint, the breakpoint fires only if the memory is read from or written to through the address you specify. If the memory is mapped at another address and modified through that other address, your breakpoint won't see it.

    For example, if you have multiple views on the same data, then modifications to that data via alternate addresses will not trigger the breakpoint.
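
    To make "multiple views on the same data" concrete, here is a minimal Python sketch. It only illustrates the aliasing and does not involve a debugger. It maps the same temporary file twice with the standard mmap module. The two views land at different linear addresses, yet a write through one is visible through the other. A data breakpoint armed on view A's address would therefore not fire for the write made through view B. The function name is hypothetical, and the sketch assumes CPython, where ctypes can report the address of a writable mmap buffer.

```python
import ctypes
import mmap
import tempfile


def write_through_other_view(payload: bytes = b"hello") -> tuple[int, int, bytes]:
    """Map one file twice, modify it via view B, then read it back via view A.

    Returns (linear address of view A, linear address of view B, bytes read via A).
    """
    size = mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * size)  # give the file one page of backing store
        f.flush()
        view_a = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_WRITE)
        view_b = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_WRITE)
        try:
            # Ask ctypes for the virtual (linear) address of byte 0 of each view.
            first_a = ctypes.c_char.from_buffer(view_a)
            first_b = ctypes.c_char.from_buffer(view_b)
            addr_a, addr_b = ctypes.addressof(first_a), ctypes.addressof(first_b)
            # Drop the ctypes objects so the views have no exported buffers left.
            del first_a, first_b

            # Modify the data through view B only...
            view_b[: len(payload)] = payload
            # ...and observe the change through view A, at a different address.
            seen = view_a[: len(payload)]
        finally:
            view_a.close()
            view_b.close()
    return addr_a, addr_b, seen


if __name__ == "__main__":
    addr_a, addr_b, seen = write_through_other_view()
    print(hex(addr_a), hex(addr_b), seen)
```

    The addresses differ from run to run, but they are always distinct from each other, while the bytes read through view A match what was written through view B.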

    The hardware breakpoint status is part of the processor context, which is maintained on a per-thread basis. Each thread maintains its own virtualized hardware breakpoint status. You don't notice this in practice because debuggers are kind enough to replicate the breakpoint state across all threads in a process so that the breakpoint fires regardless of which thread triggers it. But that replication typically doesn't extend beyond the process you're debugging; the debugger doesn't bother replicating your breakpoints into other processes! This means that if you set a write breakpoint on a block of shared memory, and the write occurs in some other process, your breakpoint won't fire since it's not your process that wrote to it.

    When you call into kernel mode, there is another context switch, this time between user mode and kernel mode, and the kernel mode context of course doesn't have your data breakpoint. Which is a good thing, because if that data breakpoint fired in kernel mode, how is your user-mode debugger expected to make any sense of it? The breakpoint fired when executing code that user mode doesn't have permission to access, and it may have fired while the kernel mode code owned an important critical section or spinlock, a lock the debugger itself may very well need. Imagine if the memory were accessed by the keyboard driver. Oops, now your keyboard processing has been suspended. Even worse, what if the memory were accessed by a hardware interrupt handler? Hardware interrupt handlers can't even access paged memory, much less allow user-mode code to run.

    This "program being debugged takes a lock that the debugger itself needs" issue isn't usually a problem when a user-mode debugger debugs a user-mode process, because the locks held by a user-mode process typically affect only that process. If a process takes a critical section, sure, that may deadlock the process, but the debugger is not part of the process, so it doesn't care.

    Of course, the "debugger is its own world" principle falls apart if the debugger is foolish enough to require a lock that the program being debugged also uses. Debugger authors therefore must be careful to avoid these sorts of cross-process dependencies. (News flash: Writing a debugger is hard.) You can still run into trouble if the program being debugged has done something with global consequences, like creating a fullscreen topmost window (thereby covering the debugger) or installing a global keyboard hook (thereby interfering with typing). If you've tried debugging a system service, you may have run into this sort of cross-process deadlock. For example, if you debug the service that is responsible for the networking client, and the debugger tries to access the network (for example, to load symbols), you've created a deadlock: the debugger needs to access the network, which it can't do because the networking service is stopped in the debugger.

    Hardware debugging breakpoints are a very convenient tool for chasing down bugs, but you have to understand their limitations.

    Additional reading: Data breakpoint oddities.

  • The Old New Thing

    Another interesting detail from the analysis of Windows Error Reporting data for Explorer

    I was in a meeting last year where I learned an interesting tidbit of information. One of the people at the meeting was looking at the error reports submitted against Explorer, and the breakdown went something like this. For the purpose of discussion, the number of reports has been normalized into "units", the precise meaning of which is left unspecified, but is meaningful for comparison purposes.†

    Rank  Cause          Units
    1     XYZ.v2 Virus   6 million
    2     XYZ.v3 Virus   5.5 million
    3     XYZ.v1 Virus   5 million
    4     XYZ.v1 Virus   4.5 million
    5     XYZ.v2 Virus   4.5 million
    6     XYZ.v2 Virus   4 million
    7     Bug 271828     50,000

    The XYZ virus (not its real name) and its variants together are responsible for the top six categories of Explorer crashes, and by an enormous margin. Seventh place, an actual bug, comes in at only 1/80th the rate of number six; if you group all the XYZ virus failures together, then the combined virus failures outnumber the most popular Explorer bug by a factor of nearly 600.
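
    A quick check of the ratios in the paragraph above, using the unit counts from the table. The units are unspecified, so only the ratios carry meaning; the variable names are purely illustrative.

```python
# Unit counts from the table: ranks 1-6 are XYZ virus buckets, rank 7 is the top real bug.
virus_units = [6_000_000, 5_500_000, 5_000_000, 4_500_000, 4_500_000, 4_000_000]
top_bug_units = 50_000

# Rank 6 versus rank 7: 4,000,000 / 50,000 = 80, hence "1/80th the rate".
sixth_vs_bug = virus_units[-1] / top_bug_units

# All six virus buckets combined versus the top bug.
combined_virus_units = sum(virus_units)                 # 29,500,000
combined_vs_bug = combined_virus_units / top_bug_units  # 590, i.e. "nearly 600"

print(f"rank 6 vs. bug: {sixth_vs_bug:.0f}x")                # rank 6 vs. bug: 80x
print(f"all virus buckets vs. bug: {combined_vs_bug:.0f}x")  # all virus buckets vs. bug: 590x
```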

    I remember reading a report that half of Explorer crashes are directly attributable to malware. Seeing the top Explorer crash swamped by a single virus really drives that point home.‡

    Footnotes

    †I don't know what these units mean either.

    ‡The anti-malware team is very interested in this data, because when a new category of Windows crashes suddenly spikes in popularity, there's a decent chance that a new virus is on the loose.

  • The Old New Thing

    Apparently I've been promoted by mistake all these years

    A few years ago, this posting on the secret to getting promoted caught my eye.

    After several years at Microsoft, I had an epiphany today. I came to the realization that the best way, rather the only way -- to get promoted is to demonstrate the ability to hire and retain top talent. Other factors do go a long way but without the ability to hire, you are going to reach your glass ceiling sooner than you think.

    I have done absolutely nothing to demonstrate this ability, and if I managed to hire or retain anybody, it was purely by accident and for that I apologize.

    As far as I'm aware, I've never done anything listed in the article as "the only way" to get promoted. I must have been promoted by mistake. (And I suspect Mini-Microsoft might have a thing or two to say about the claim that the only way to get a promotion at Microsoft is to hire, hire, hire.)

    Yes, I know that the phrase "the only way" was written for rhetorical effect and is not to be taken literally. I'm making a joke, people!

    And as it so happens, I also took the text out of context. The target audience for the remarks was senior management, not dorky programmer types like me. So, strike two.

    Aftermath: After reading this entry in my "upcoming entries" blog queue, which is accessible only to Microsoft employees, a fellow employee emailed me to say that this Web site was a factor in deciding to come to Microsoft, so my claim that I have done absolutely nothing is incorrect. I wrote back, "Well, maybe so, but it's also the case that I haven't been promoted since you were hired."

    The response: "I guess I must not be top talent."

  • The Old New Thing

    Why always "Windows XP" and "Windows Vista" and not just "XP" and "Vista"?

    When the Internet Explorer folks announced that they were going to call the next version of Internet Explorer "Internet Explorer 7 for Windows XP" and "Internet Explorer 7 in Windows Vista", many people responded to the awkward names by suggesting that they be shortened to "Internet Explorer 7 XP" and "Internet Explorer 7 Vista". Why the longer names?

    Lawyers.

    Microsoft's own trademark guidelines† specify that the product names are Windows XP and Windows Vista and not just XP and Vista. The trademark is on the entire phrase, not just the last word. Furthermore, the trademark guidelines specify that products may not append just XP or Vista to their names; they have to say X for Windows XP or X for Windows Vista.

    In an earlier era, you had to be careful to say Windows NT and not just NT for the same reason. You see, the name NT is a registered trademark of Northern Telecom, and part of the agreement with "the other NT" is that the Windows product would always be used with the word Windows in front.

    If you took a close look at the Windows 2000 box, you may have seen the phrase "Built on NT Technology." I don't know how hard it was to do, but I suspect a good amount of negotiations with Northern Telecom took place to allow Microsoft to use that alternate formulation without the word Windows in front. Indeed, if you looked really closely at the box, you'd have found a trademark acknowledgement for Northern Telecom deep in the fine print.

    Lawyers by training are very cautious people. After all, a new lawsuit against Microsoft gets filed approximately once every thirty seconds.¶ They're probably also responsible for all your Office# shortcuts on the Start menu being named Microsoft Office This 2007 and Microsoft Office That 2007 instead of This 2007 and That 2007, or even (shocking!) just This and That. It's a daring move, and lawyers don't like to be daring. Nobody ever got sued for playing it safe.††

    Nitpicker's corner (guest appearance)

    *Just burning off a footnote marker because I don't like asterisks.

    †I myself violate some of these guidelines because I try to write like a human being and not a robot. Only robots say Windows-based programs.‡

    ‡That statement is not literally true. Here's a reformulation of that statement for the benefit of robots:§ "People who say Windows-based programs sound like robots."

    §That statement is also not literally true. Here's a reformulation of that statement for the benefit of people who take a robotic approach to reading: "Here's a reformulation of that statement for the benefit of people who take a robotic approach to reading:"

    ‖Burning off another footnote marker because I don't like parallel lines either.

    ¶An exaggeration, not a statement of fact.

    #s/Office/Microsoft® Office™ System/**

    **I have not researched whether that's the correct way of writing it.

    ††Okay, maybe somebody somewhere has gotten sued for playing it safe. It was just a catchy sentence, not a statement of fact.

  • The Old New Thing

    Why every advertising agency needs to have a review panel of twelve-year-old boys

    The Office of Government Commerce needed a new logo, and they hired a design firm to develop one for £14,000. The conversations between the design company and their client may have gone something like this:

    Designer: Okay, so here are some ideas we came up with.

    Client: I don't like any of them. The lines are too thick. I want something lighter, more friendly and less bureaucratic.

    Second meeting.

    Designer: Here are some variations that use thinner lines.

    Client: Nope, these are all still ugly. Give me something with more circles. Less angular.

    Third meeting.

    Designer: We came up with some variations on your circle idea.

    Client: No, we can't use these. They have colour in them. A coloured logo would make our letterhead much more expensive.

    And so on. Finally, the design team comes up with something the client approves of. Now it's time to order the mousepads and unveil the logo.

    Oops.

    A friend of mine remarked, "This is why every advertising agency needs to have a review panel of twelve-year-old boys."

  • The Old New Thing

    The Phantom Bug: Why doesn't MessageBox work from my WM_NCDESTROY handler?

    Adrian McCarthy ran into a problem where MessageBox didn't work when called from a WM_NCDESTROY handler. You already know how to solve this; you just have to connect the dots. See if you can do it on your own before the answer is revealed.

  • The Old New Thing

    Why are some GDI functions named ExtXxx instead of XxxEx?

    By convention, an enhanced version of a function Xxx is called XxxEx, but many GDI functions don't follow this convention, most notably ExtTextOut, which should have been named TextOutEx under the XxxEx convention. Why don't the GDI functions follow that convention?

    Because they were named before the XxxEx convention was established.

    Nothing nefarious, just an artifact of history.

  • The Old New Thing

    If users can shut down the machine, it's not a security hole if they can shut down the machine

    One great way to come up with a dubious security vulnerability is to take something completely innocuous, wrap it inside layer upon layer of obfuscation, and then proclaim that the obfuscation is the vulnerability. Here's an example based on an actual dubious vulnerability report:

    Title: Native NT application can shut down computer

    Description: I have written this native NT application which bypasses the Win32 layer and talks directly to the low-level native NT functions. By calling various native NT functions, I can cause a dialog box to appear which includes a Shut Down button that shuts down the computer if the user clicks on it.

    Well, sure, you can go through all that to shut down the computer. Or you can save yourself all the hassle and just call ExitWindowsEx. You see, that dialog box you found includes a "Shut Down" button only if the user that ran it has permission to shut down the computer in the first place.

    It is not a security vulnerability that users with permission to shut down the computer can shut down the computer.

    This is another example of people getting excited that they were able to do something unusual. But just because you can do something unusual doesn't mean that you've found a security vulnerability.

  • The Old New Thing

    The Big Red Switch really was big and red

    In this article on compatibility between the .NET Framework versions 1.1 and 2.0, there is a passing mention of a setting nicknamed the "Big Red Switch".

    The power switch on the original IBM PC really was big and red. Well, orange-red. Here's a picture of the power switch on an IBM PC-AT. Decide for yourself what color it is.

    In college, the hallway that led to the basement lab where most of the computer science students did their work had a big red switch, a pushbutton, labeled "Emergency power shutoff". Nobody was sure whether it actually was hooked up to anything or was simply a joke, but nobody wanted to take the chance of finding out, either. I remember one evening some people were goofing off with a basketball, and somebody missed a pass, and the ball hit the wall dangerously close to the big red switch. They (and everybody in the lab) immediately realized what had almost happened, and the group sheepishly took their ball outside.

    Some time later, the legend of the big red switch was ultimately resolved. The morning after a particularly bad rainstorm, one of my colleagues came into the computing center and explained that he had gone down to the computer lab the previous evening and found it flooded, with water still coming in. Realizing that electricity plus water equals danger, he went over to the big red switch and pushed it. And true to its label, it shut off the power to the lab and the central file server. We were all in awe that he got to push the big red button for a legitimate reason. An opportunity like that comes only once in a lifetime.

  • The Old New Thing

    Psychic debugging: Why does ExitProcess(1) produce an exit code of zero?

    Here's a question that came from a customer. By now, you should already have the necessary psychic powers to answer it.

    Our program calls ExitProcess(1) to indicate that it exited unsuccessfully. The process that launched our program waits for the program to exit and then calls GetExitCodeProcess to retrieve the exit code. The function succeeds, but the exit code is zero! How can this be?

    Hint: Read about how processes terminate on Windows XP.
