January, 2010

  • The Old New Thing

    If you are trying to understand an error, you may want to look up the error code to see what it means instead of just shrugging


    A customer had a debug trace log and needed some help interpreting it. The trace log was generated by an operating system component, but the details aren't important to the story.

    I've attached the log file. I think the following may be part of the problem.

    [07/17/2005:18:31:19] Creating process D:\Foo\bar\blaz.exe
    [07/17/2005:18:31:19] CreateProcess failed with error 2

    Any ideas?

    Bob Smith
    Senior Test Engineer
    Tailspin Toys

    What struck me is that Bob made a point of being a Senior Test Engineer, perhaps thinking that his awesome title would make us take him more seriously.

    But apparently a Senior Test Engineer doesn't know what error 2 is. There are some error codes that you end up committing to memory because you run into them over and over. Error 32 is ERROR_SHARING_VIOLATION, error 3 is ERROR_PATH_NOT_FOUND, and in this case, error 2 is ERROR_FILE_NOT_FOUND.

    And even if Bob didn't have error 2 memorized, he should have known to look it up.

    Error 2 is ERROR_FILE_NOT_FOUND. Does the file D:\Foo\bar\blaz.exe exist?

    No, it doesn't.


    Bob seems to have shut off his brain and decided to treat troubleshooting not as a collaborative effort but rather as a game of Twenty Questions in which the person with the problem volunteers as little information as possible in order to make things more challenging. I had to give Bob a nudge.

    Can you think of a reason why the system would be looking at D:\Foo\bar\blaz.exe? Where did you expect it to be looking for blaz.exe?

    This managed to wake Bob out of his stupor, and the investigation continued. (And no, I don't remember what the final resolution was. I didn't realize I would have to remember the fine details of this support incident three years later.)

  • The Old New Thing

    What idiot would hard-code the path to Notepad?


    There seemed to be a great deal of disbelief that anybody would hard-code the path to Notepad.

    Here's one example and here's another.

    There's a large class of problems that go like this:

    I'm running Program X, and when I tell it to view the error log, I get this error message: CreateProcess of "C:\Windows\Notepad.exe errorlog.txt" failed: error 2: The system cannot find the file specified. What is wrong and how do I fix it?

    Obviously, the file C:\Windows\Notepad.exe is missing. But how can that be? Well, Windows Server 2008 bit the bullet and removed one of the copies of Notepad. Once you learn this, troubleshooting the above problem becomes a simple exercise in psychic debugging.

    My psychic powers tell me that you're running Windows Server 2008. The Notepad program no longer exists in the Windows directory; it's now in the system directory. Find the setting for your program that lets you change the program used for viewing error logs and tell it to use C:\Windows\System32\Notepad.exe.

    Of course, this tip works only if the program permits you to change the program used for viewing error logs. If they hard-code the path, then you'll have to find some other workaround. (For example, you might try using the CorrectFilePaths shim.)

  • The Old New Thing

    The wrong way to determine the size of a buffer


    A colleague of mine showed me some code from a back-end program on a web server. Fortunately, the company that wrote this is out of business. Or at least I hope they're out of business!

    size = 16384;
    while (size && IsBadReadPtr(buffer, size)) {
        size--;  /* loop body reconstructed; the entry is truncated here */
    }
  • The Old New Thing

    The goggles, they do nothing!: Gemulator advertisement from 1992


    Darek Mihocka, proprietor of emulators.com, and whom I linked to a few years ago, released the source code to Atari ST emulator Gemulator 9.0, and in celebration, he also posted his 1992 promotional video to YouTube: Part 1, Part 2, Part 3.

    Warning: It's a really bad video. The music, the hair, the cheesy video effects, the bad acting, oh did I mention the hair? But it's also a trip in the wayback machine.

    Pre-emptive snarky comment: "You idiot, you got the quote wrong. It's 'My eyes! The goggles do nothing!'"

  • The Old New Thing

    But that's not all: The story of that cheesy Steve Ballmer Windows video


    While it's true that the cheesy Steve Ballmer Windows video had bad music, bad hair, and bad acting, it's also true that all that cheese was intentional.

    That video was produced for and shown at the Company Meeting, back when a mainstay of the Company Meeting was spoofs of popular television advertisements—what today would be called "virally popular"—with Bill Gates and other senior executives taking the starring roles. The "Crazy Steve" video was a spoof of late-night television advertisements, the most direct influence being the popular-at-the-time Crazy Eddie commercials.

    So enjoy the "Crazy Steve" video, but don't fool yourself into thinking this was a real commercial.

    Bonus commercial chatter: I don't know the story behind the commercial produced by crack-smoking monkeys. It was shot in one of the Microsoft old-campus buildings, but I don't recognize any of the actors. This leaves open the horrific possibility that the advertisement was for real!

    Extra bonus chatter: The original Windows XP commercial, featuring Madonna's Ray of Light, had to be abandoned less than two months before launch thanks to the events of September 11, 2001: A commercial featuring people flying was deemed to be in bad taste so soon after the event. I don't know how they did it, but the marketing department managed to put together a new ad campaign in less than two months. (This also explains why some online ads for Windows XP employed the song Ray of Light, even though the song had nothing to do with the new Windows XP ad campaign: They were leftovers which could be salvaged because they didn't depict flying.)

    Too bad, because I liked the original campaign.

    Double secret bonus chatter: Could this be proto-Kylie?

    Update: an article which includes another story about the filming of the spoof commercial.

  • The Old New Thing

    During process termination, the gates are now electrified


    It turns out that my quick overview of how processes exit on Windows XP was already out of date when I wrote it. Mind you, the information is still accurate for Windows XP (as far as I know), but the rules changed in Windows Vista.

    What about critical sections? There is no "Uh-oh" return value for critical sections; EnterCriticalSection doesn't have a return value. Instead, the kernel just says "Open season on critical sections!" I get the mental image of all the gates in a parking garage just opening up and letting anybody in and out.

    In Windows Vista, the gates don't go up. Instead they become electrified!

    If during DLL_PROCESS_DETACH at process termination on Windows Vista you call EnterCriticalSection on a critical section that has been orphaned, the kernel no longer responds by just letting you through. Instead, it says, "Oh dear, things are in unrecoverably bad shape. Best to just terminate the process now." If you try to enter an orphaned critical section during process shutdown, the kernel simply calls TerminateProcess on the current process!

    It's sort of like the movie Speed: If the thread encounters a critical section that causes it to drop below 50 miles per hour, it blows up.

    Fortunately, this error doesn't change the underlying analysis of How my lack of understanding of how processes exit on Windows XP forced a security patch to be recalled.

    But it also illustrates how the details of process shutdown are open to changes in the implementation at any time, so you shouldn't rely on them. Remember the classical model for how processes exit: You cleanly shut down all your worker threads, and then call ExitProcess. If you don't follow that model (and given the current programming landscape, you pretty much have no choice but to abandon that model, what with DLLs creating worker threads behind your back), it's even more important that you follow the general guidance of not doing anything scary in your DllMain function.

  • The Old New Thing

    Historically, Windows didn't tend to provide functions for things you can already do yourself


    Back in the old days, programmers were assumed to be smart and hardworking. Windows didn't provide functions for things that programs could already do on their own. Windows worried about providing functionality for things that programs couldn't do. That was the traditional separation of responsibilities in operating systems of that era. If you wanted somebody to help you with stuff you could in principle do yourself, you could use a runtime library or a programming framework.

    You know how to open files, read them, and write to them; therefore, you could write your own file copy function. You know how to walk a linked list; the operating system didn't provide a linked list management library. There are apparently some people who think that it's the job of an operating system to alleviate the need to implement these things yourself; actually that's the job of a programming framework or tools library. Windows doesn't come with a finite element analysis library either.

    You can muse all you want about how things would have been better if Windows had had an installer library built-in from the start or even blame Windows for having been released without one, but then again, the core unix operating system doesn't have an application installer library either. The unix kernel has functions for manipulating the file system and requesting memory from the operating system. Standards for installing applications didn't arrive until decades later. And even though such standards exist today (as they do in Windows), there's no law of physics preventing a vendor from writing their own installation program that doesn't adhere to those standards and which can do anything they want to the system during install. After all, at the end of the day, installing an application's files is just calling creat and write with the right arguments.

    Commenter Archangel remarks, "At least if the ACL route had been taken, the installers would have had to be fixed - and fixed they would have been, when the vendors realised they didn't run on XP."

    These arguments remind me of the infamous "Step 3: Profit" business plan of the Underpants Gnomes.

    • Step 1: Require every Windows application to adhere to new rules or they won't run on the next version of Windows.
    • ...
    • Step 3: Windows is a successful operating system without applications which cause trouble when they break those rules.

    It's that step 2 that's the killer. Because the unwritten step 2 is "All applications stop working until the vendors fix them."

    Who's going to fix the bill-printing system that a twelve-year-old kid wrote over a decade ago, but which you still use to run your business? (I'm not making this up.) What about that shareware program you downloaded three years ago? And it's not just software where the authors are no longer available. The authors may simply not have the resources to go back and update every single program that they released over the past twenty years. There are organizations with thousands of install scripts which are used to deploy their line-of-business applications. Even if they could fix ten scripts a day, it'd take them three years before they could even start thinking about upgrading to the next version of Windows. (And what about those 16-bit applications? Will they have to be rewritten as 32-bit applications? How long will that take? Is there even anybody still around who understands 16-bit Windows enough to be able to undertake the port?)

  • The Old New Thing

    Why aren't compatibility workarounds disabled when a debugger is attached?


    Ken Hagan wonders why compatibility workarounds aren't simply disabled when a debugger is attached.

    As I noted earlier, many compatibility workarounds are actually quicker than the code that detects whether the workaround would be needed.

    BOOL IsZoomed(HWND hwnd)
    {
        return GetWindowLong(hwnd, GWL_STYLE) & WS_MAXIMIZED;
    }

    Now suppose you find a compatibility problem with some applications that expect the IsZoomed function to return exactly TRUE or FALSE. You then change the function to something like this:

    BOOL IsZoomed(HWND hwnd)
    {
        return (GetWindowLong(hwnd, GWL_STYLE) & WS_MAXIMIZED) != 0;
    }

    Now, we add code to enable the compatibility workaround only if the application is on the list of known applications which need this workaround:

    BOOL IsZoomed(HWND hwnd)
    {
        if (GetWindowLong(hwnd, GWL_STYLE) & WS_MAXIMIZED) {
            if (IsApplicationCompatibilityWorkaroundRequired(ISZOOMED_TRUEFALSE)) {
                return TRUE;
            } else {
                return WS_MAXIMIZED;
            }
        } else {
            return FALSE;
        }
    }
    What was a simple flag test now includes a check to see whether an application compatibility workaround is required. These checks are not cheap, because the compatibility infrastructure needs to look up the currently running application in the compatibility database, check that the version of the application that is running is the one the compatibility workaround is needed for (which could involve reading the file version resource or looking for other identifying clues), and then return either the compatible answer (TRUE) or the answer that resulted from the original simple one-line function.

    So not only is the function slower (having to do a compatibility check), it also looks really stupid.

    Oh wait, now we also have to stick in a debugger check:

    BOOL IsZoomed(HWND hwnd)
    {
        if (GetWindowLong(hwnd, GWL_STYLE) & WS_MAXIMIZED) {
            if (!IsDebuggerPresent() &&
                IsApplicationCompatibilityWorkaroundRequired(ISZOOMED_TRUEFALSE)) {
                return TRUE;
            } else {
                return WS_MAXIMIZED;
            }
        } else {
            return FALSE;
        }
    }
    And then people complain that Windows is slow and bloated: A simple one-line function ballooned into ten lines.

    Another reason why these compatibility workarounds are left intact when a debugger is running is that changing program behavior based on whether a debugger is attached would prevent application vendors from debugging one problem because all sorts of new problems would suddenly get injected.

    Suppose you support Program X, and you get a report of a security vulnerability in your program. You run the program under the debugger, and when you run the alleged exploit code, you find that the program doesn't behave the same as it does when the debugger is not attached. Some compatibility workaround that was active when your program is run normally is being suppressed, and the change in behavior changes your program enough that the alleged security exploit doesn't behave quite the same.

    When run outside the debugger, the program crashes, but when run under the debugger, the program displays a strange error message but manages to keep from crashing. Congratulations, you introduced a Heisenbug.

    And then you say, "There's something wrong with the debugger. It must be a bug in Windows."

    Pre-emptive Yuhong Bao comment: The heap manager switches to an alternate algorithm if it detects a debugger, and the CloseHandle function raises an exception if running under the debugger.

  • The Old New Thing

    How about not granting debug privileges to users?


    Commenter Yuhong Bao suggests, "How about not granting debug privileges on the user? This will make bypassing the protection impossible."

    This is such a great idea that Windows has worked that way for years.

    Normal non-administrative users do not have debug privilege. They can only debug processes that they already have PROCESS_ALL_ACCESS to. In other words, non-administrative users can only pwn processes that they already pwn. No protection is being bypassed since you had full access in the first place.

    The SeDebugPrivilege allows you to debug any process, even those to which you do not have full access. This is clearly dangerous, which is why it's not granted to non-administrative users by default.

    Yuhong Bao also suggests, "How about separating the dangerous activities from the non-dangerous activities, or better, only allowing approved programs to do the dangerous activities?" (Where dangerous activities are defined as things that modify the program behavior.) I'm assuming this is discussing limiting the capabilities of SeDebugPrivilege, since in the absence of SeDebugPrivilege, the scope of your abilities is limited to things you already had the ability to do anyway; debugging didn't add anything new.

    But even if you limited SeDebugPrivilege to nondestructive actions, you can still lose the farm. This imaginary SeLimitedDebugPrivilege would still let you read a target process's memory, which means you can do things like steal passwords and snoop on the activities of other users.

    The last suggestion is to "only allow approved programs to do the dangerous activities." Again, I'm assuming this is discussing limiting the capabilities of SeDebugPrivilege, because without SeDebugPrivilege there is no new danger. But even in that limited context, what is an "approved program"? Approved by whom?

    Must the program be digitally signed by Microsoft? I suspect people who write debuggers which compete with, say, Microsoft Visual Studio, would be upset if they had to submit their debugger to Microsoft for approval. And what are the requirements for receiving this approval? Does the debugger have to pass some battery of tests like WHQL? There are already plenty of readers of this Web site who reject WHQL as useless. Would this "debugger certification" also be useless?

    Or maybe approval consists of merely being digitally signed at all? There are plenty of readers of this Web site who object to the high cost of obtaining a digital certificate (US$399; I don't think the $99 discounted version works for code signing.) And there are also plenty of readers who consider code signing to be payola and designed to maximize profit rather than effectiveness.

    Or do you mean that the program needs to be listed in some new registry key called something like Approved Debuggers? Then what's to stop a rogue program from just auto-approving itself by writing to the Approved Debuggers registry key on its own?

    But then again, all this is a pointless discussion once you realize that SeDebugPrivilege is granted by default only to administrators. And since administrators already pwn the machine, there's no protection that SeDebugPrivilege bypasses: You already bypassed it when you became an administrator.

  • The Old New Thing

    Can you get rotating an array to run faster than O(n²)?


    Some follow-up remarks to my old posting on rotating a two-dimensional array:

    Some people noticed that the article I linked to purporting to rotate the array actually transposes it. I was wondering how many people would pick up on that.

    I was surprised that people confused rotating an array (or matrix) with creating a rotation matrix. They are unrelated operations; the only thing they have in common is the letters r-o-t-a-t-i. A matrix is a representation of a linear transformation, and a rotation matrix is a linear transformation which rotates vectors. In other words, applying the rotation matrix to a vector produces a new vector which is a rotated version of the original vector. The linear transformation is a function of one parameter: It takes a vector and produces a new vector. A rotation matrix is a matrix which rotates other things, whereas rotating an array is something you do to the array. The array is the thing being rotated, not the thing doing the rotating. It didn't even occur to me that people would confuse the two. It's the difference between a phone dial and dialing a phone.

    Showing that you cannot rotate an array via matrix multiplication is straightforward. Suppose there were a matrix R which rotated an array (laid out in the form of a matrix) clockwise. The result of rotating the identity matrix would be a matrix with 1's along the diagonal from upper right to lower left; let's call that matrix J. Then we have RI = J, and therefore R = J. Now apply R to both sides: RRI = RJ = I and therefore R² = I. But clearly rotating clockwise twice is not the identity for n ≥ 2. (Rotating clockwise twice is turning upside-down.)

    A more mechanical way to see this is to take the equation R = J and show that J does not perform the desired operation; just try it on the matrix with 1 in the upper left entry and 0's everywhere else.

    And since it's one of those geeky math pastimes to see how many different proofs you can come up with for a single result, the third way to show that rotation cannot be effected by matrix multiplication is to observe that the transformation is not linear. (That's the magical algebra-theoretical way of showing it, which is either so obvious you can tell just by looking at it or so obscure it defies comprehension.) [The transformation viewed as a transformation on matrices rather than a transformation on column vectors is indeed linear, but the matrix for that would be an n² × n² matrix, and the operation wouldn't be matrix multiplication, so that doesn't help us here.]

    The last question raised by this exercise was whether you could do better than O(n²). Computer science students spend so much time trying to push the complexity of an algorithm down that they neglect to learn how to tell that you can't go any lower. In this case, you obviously can't do better than O(n²) because every single one of the n² entries in the array needs to move (except of course the center element if n is odd). If you did less than O(n²) of work, then for sufficiently large n, you will end up not moving some array elements, which would be a failure to complete the required operation.

    Bonus chatter: Mind you, you can do better than O(n²) if you change the rules of the problem. For example, if you allow pretending to move the elements, say by overloading the [] operator, then you can perform the rotation in O(1) time by just writing a wrapper:

    struct IArray
    {
        virtual int& Element(int x, int y) = 0;
        virtual ~IArray() { }
    };

    class RotatedArray : public IArray {
    public:
        RotatedArray(IArray *p) : m_p(p) { }
        ~RotatedArray() { delete m_p; }
        int& Element(int x, int y) {
            return m_p->Element(y, x);
        }
    private:
        IArray *m_p;
    };

    void RotateInPlace(IArray *& p, int N)
    {
        p = new RotatedArray(p);
    }

    This pseudo-rotates the elements by changing the accessor. Cute but doesn't actually address the original problem, which said that you were passed an array, not an interface that simulates an array.
