October, 2010

  • The Old New Thing

    Wildly popular computer game? The Windows product team has you covered

    • 30 Comments

    In Windows 95, the most heavily-tested computer game was DOOM. Not because the testers spent a lot of time developing test plans and test harnesses and automated run-throughs. Nope, it was because it was by far the most popular game the Windows 95 team members played to unwind.

    It was a huge breakthrough when DOOM finally ran inside an MS-DOS box and didn't require booting into MS-DOS mode any more. Now you could fire up DOOM without having to exit all your other programs first.

    I've learned that in Windows Vista, the most heavily tested game was World of Warcraft. Most members of the DirectX application compatibility team are WoW players, as are a large number of people in the Windows division overall.

    So if you have a wildly popular computer game for the PC, you can be pretty sure that the Windows team will be all over it. "For quality control purposes, I assure you."

    Related story: How to make sure your network card works with Windows.

  • The Old New Thing

    Why are the keyboard scan codes for digits off by one?

    • 17 Comments

    In "Off by one what, exactly?", my colleague Michael Kaplan wrote:

    And this decision long ago that caused the scan codes to not line up for these digits when they could have...

    The word that struck me there was "decision".

    Because it wasn't a "decision" to make the scan codes almost-but-not-quite line up with digits. It was just a coincidence.

    If you look at the scan code table from Michael's article, you can see stretches of consecutive scan codes, broken up by weird places where the consecutive pattern is violated. The weirdness makes more sense when you look at the original IBM PC XT keyboard:

    01    02    03    04    05    06    07    08    09    0A    0B    0C    0D    0E
    Esc   1     2     3     4     5     6     7     8     9     0     -     =     Backspace

    0F    10    11    12    13    14    15    16    17    18    19    1A    1B    1C
    Tab   Q     W     E     R     T     Y     U     I     O     P     [     ]     Enter

    1D    1E    1F    20    21    22    23    24    25    26    27    28    29
    Ctrl  A     S     D     F     G     H     J     K     L     ;     '     `

    2A    2B    2C    2D    2E    2F    30    31    32    33    34    35    36    37
    Shift \     Z     X     C     V     B     N     M     ,     .     /     Shift *

    38    39    3A
    Alt   Space Caps

    With this presentation, it becomes clearer how scan codes were assigned: They simply started at 01 and continued through the keyboard in English reading order. (Scan code 00 is an error code indicating keyboard buffer overflow.) The scan codes are off by one from the digits merely because there was one key to the left of the digits. If there had been two keys to the left of the digits, they would have been off by two.
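
    The assignment rule above can be sketched in a few lines of C. This is only an illustration of the counting argument: the function name digit_scan_code and the keys_before parameter are made up for this example, and it covers only the top-row digit keys.

```c
#include <assert.h>

/* Scan code of a top-row digit key when codes are handed out in
   reading order starting at 01.
   keys_before is the number of keys to the left of the '1' key
   (1 on the XT keyboard, namely Esc). Hypothetical helper. */
unsigned char digit_scan_code(char digit, unsigned keys_before)
{
    assert(digit >= '0' && digit <= '9');

    /* Position within the digit run: 1 2 3 4 5 6 7 8 9 0 */
    unsigned index = (digit == '0') ? 9u : (unsigned)(digit - '1');

    /* Codes start at 01, then skip the keys to the left. */
    return (unsigned char)(1u + keys_before + index);
}
```

    With keys_before set to 1, the digit 1 lands on scan code 02 and the digit 0 lands on 0B, matching the table. With keys_before set to 2, everything shifts by one more.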

    Of course, if the original keyboard designers had started counting from the lower left corner, like all right-thinking mathematically-inclined people, then this sort-of-coincidence would never have happened. The scan codes for the digits would have been 2E through 37, and nobody would have thought anything of it.

    It's a testament to the human brain's desire to find patterns and determine a reason for them that what is really just a coincidence gets interpreted as some sort of conspiracy.

  • The Old New Thing

    Secret passages on Microsoft main campus

    • 33 Comments

    They aren't really "secret passages" but they are definitely underutilized, and sometimes they provide a useful shortcut.

    At the northwest corner of Building 50, there are two doors. One leads to a stairwell that takes you to the second floor. That's the one everybody uses. The other door is a service entrance that takes you to the cafeteria. If your office is on the second or third floor in the northwest corner, it's faster to use the service hallway to get to the cafeteria than it is to walk to the core of the building and take the main stairs.

    There is a service tunnel that runs from the first floor of Building 86 (entrance next to the first floor cafeteria elevator) through the loading dock to the Central Garage, where you can continue to Building 85 or 84. This is not really any faster than the regular route, but it does have the advantage of being underground and mostly indoors, which is a major benefit when it is cold or raining.

    What is your favorite secret passage at your workplace?

  • The Old New Thing

    Why is the origin at the upper left corner?

    • 32 Comments

    Via the Suggestion Box, Dirk Declercq asks why the default client-area coordinate origin is at the upper left corner instead of the lower left corner. (I assume he also intends for the proposed client-area coordinate system to have the y-coordinate increasing as you move towards the top of the screen.)

    Well, putting the client area origin at the lower left would have resulted in the client coordinate space not being a simple translation of the screen coordinate space. After all, the screen origin is at the upper left, too. Windows was originally developed on left-to-right systems, where the relationship between client coordinates and screen coordinates was a simple translation. Having the y-coordinate increase as you move down the screen but move up the client would have just been one of those things you did to be annoying.

    Okay, so why not place the screen origin at the lower left, too?

    Actually, OS/2 does this, and DIBs do it as well. And then everybody wonders why their images are upside-down.
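
    The upside-down images usually come from indexing a bottom-up bitmap as if row 0 were the top. A minimal sketch of the correct row lookup, assuming a bottom-up layout where the first row in memory is the bottom of the picture (the function name and parameters are invented for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of picture row y (0 = top of the picture) inside a
   bottom-up bitmap whose rows are `stride` bytes apart.
   Hypothetical helper for illustration. */
size_t bottom_up_row_offset(size_t y, size_t height, size_t stride)
{
    assert(y < height);
    return (height - 1 - y) * stride;
}
```

    Forgetting the height - 1 - y flip and using y * stride instead reads the rows in the wrong order, which is exactly the upside-down picture.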

    Turns out that the people who designed early personal computers didn't care much for mathematical theory. The raster gun of a television set starts at the upper left corner, continues to the right, and when it reaches the right-hand edge of the screen, it jumps back to the left edge of the screen to render the second scan line. Why did television sets scan from the top down instead of from the bottom up? Beats me. You'll have to ask the person who invented the television (who, depending on whom you ask, is Russian or American or German or Scottish or some other nationality entirely), or more specifically, whoever invented the scanning model of image rendering, why they started from the top rather than from the bottom.

    Anyway, given that the video hardware worked from top to bottom, it was only natural that the memory for the video hardware work the same way. (The Apple II famously uses a peculiar memory layout in order to save a chip.)

    Who knows, maybe if the design of early computers had been Chinese, we would be wondering why the origin was in the upper right corner with the pixels in column-major order.

    Bonus chatter: Even mathematicians can't get their story straight. Matrices are typically written with the origin element at the upper left. Which reminds me of a story from the old Windows 95 days. The GDI folks received a defect report from the user interface team, who backed up their report with a complicated mathematical explanation. The GDI team accepted the change request with the remark, "We ain't much fer book lernin."

  • The Old New Thing

    Why does each drive have its own current directory?

    • 39 Comments

    Commenter Dean Earley asks, "Why is there a 'current directory' AND an current drive? Why not merge them?"

    Pithy answer: Originally, each drive had its own current directory, but now they don't, but it looks like they do.

    Okay, let's unwrap that sentence. You actually know enough to answer the question yourself; you just have to put the pieces together.

    Set the wayback machine to DOS 1.0. Each volume was represented by a drive letter. There were no subdirectories. This behavior was carried forward from CP/M.

    Programs from the DOS 1.0 era didn't understand subdirectories; they referred to files by just drive letter and file name, for example, B:PROGRAM.LST. Let's fire up the assembler (compilers were for rich people) and assemble a program whose source code is on the A drive, but sending the output to the B drive.

    A>asm foo                          (the ".asm" extension on "foo" is implied)
    Assembler version blah blah blah
    Source File: FOO.ASM
    Listing file [FOO.LST]: NUL        (throw away the listing file)
    Object file [FOO.OBJ]: B:          (send the object file to drive B)

    Since we gave only a drive letter in response to the Object file prompt, the assembler defaults to a file name of FOO.OBJ, resulting in the object file being generated as B:FOO.OBJ.

    Okay, now let's introduce subdirectories into DOS 2.0. Suppose you want to assemble A:\SRC\FOO.ASM and put the result into B:\OBJ\FOO.OBJ. Here's how you do it:

    A> B:
    B> CD \OBJ
    B> A:
    A> CD \SRC
    A> asm foo
    Assembler version blah blah blah
    Source File: FOO.ASM
    Listing file [FOO.LST]: NUL
    Object file [FOO.OBJ]: B:
    

    The assembler reads from A:FOO.ASM and writes to B:FOO.OBJ, but since the current directory is tracked on a per-drive basis, the results are A:\SRC\FOO.ASM and B:\OBJ\FOO.OBJ as desired. If the current directory were not tracked on a per-drive basis, then there would be no way to tell the assembler to put its output into a subdirectory. DOS 1.0 programs would have been effectively limited to operating on files in the root directory, which means that nobody would have put files in subdirectories (because their programs couldn't access them).
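
    To make the mechanism concrete, here is a rough C model of per-drive current directories. It is a simulation, not how DOS implemented it: the drive_dirs table and resolve_drive_relative function are names invented for this sketch, and only upper-case drive letters are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-drive table: cwd[0] is drive A, cwd[1] is drive B, ... */
struct drive_dirs {
    const char *cwd[26];   /* e.g. "\\SRC"; NULL or "\\" means the root */
};

/* Expand a drive-relative name such as "B:FOO.OBJ" using the table.
   Names that already have a backslash after the colon are copied as-is.
   Returns 0 on success, -1 if the name is malformed or does not fit. */
int resolve_drive_relative(const struct drive_dirs *d, const char *path,
                           char *out, size_t out_size)
{
    assert(d != NULL && path != NULL && out != NULL);

    if (path[0] < 'A' || path[0] > 'Z' || path[1] != ':') {
        return -1;
    }

    const char *rest = path + 2;
    int n;
    if (rest[0] == '\\') {
        /* Already fully qualified. */
        n = snprintf(out, out_size, "%s", path);
    } else {
        const char *dir = d->cwd[path[0] - 'A'];
        if (dir == NULL || strcmp(dir, "\\") == 0) {
            dir = "";   /* root: avoid a doubled backslash */
        }
        n = snprintf(out, out_size, "%c:%s\\%s", path[0], dir, rest);
    }
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

    With the table set up as in the transcript (A at \SRC, B at \OBJ), the DOS 1.0-style name B:FOO.OBJ resolves to B:\OBJ\FOO.OBJ even though the program asking for it has never heard of subdirectories.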

    From a DOS 1.0 standpoint, changing the current directory on a drive performs the logical equivalent of changing media. "Oh look, a completely different set of files!"

    Short attention span.

    Remembering the current directory for each drive has been preserved ever since, at least for batch files, although there isn't actually such a concept as a per-drive current directory in Win32. In Win32, all you have is a current directory. The appearance that each drive has its own current directory is a fake-out by cmd.exe, which uses strange environment variables to create the illusion to batch files that each drive has its own current directory.

    Dean continues, "Why not merge them? I have to set both the dir and drive if i want a specific working dir."

    The answer to the second question is, "They already are merged. It's cmd.exe that tries to pretend that they aren't." And if you want to set the directory and the drive from the command prompt or a batch file, just use the /D option to the CHDIR command:

    D:\> CD /D C:\Program Files\Windows NT
    C:\Program Files\Windows NT> _
    

    (Notice that the CHDIR command lets you omit quotation marks around paths which contain spaces: Since the command takes only one path argument, the lack of quotation marks does not introduce ambiguity.)

  • The Old New Thing

    The evolution of the ICO file format, part 4: PNG images

    • 25 Comments

    We finish our tour of the evolution of the ICO file format with the introduction of PNG-compressed images in Windows Vista.

    The natural way of introducing PNG support for icon images would be to allow the biCompression field of the BITMAPINFOHEADER to take the value BI_PNG, in which case the image would be represented not by a DIB but by a PNG. After all, that's why we have a biCompression field: For forward compatibility with future encoding systems. Wipe the dust off your hands and declare victory.

    Unfortunately, it wasn't that simple. If you actually try using ICO files in this format, you'll find that a number of popular icon-authoring tools crash when asked to load a PNG-compressed icon file for editing.

    The problem seemed to be that the new BI_PNG compression type was encountered at a point in the parsing code where the program was not prepared to handle such a failure (or the failure was never detected). The solution was to change the file format so that PNG-compressed images fail these programs' parsers at an earlier, safer step. (This is sort of the opposite of penetration testing, which keeps tweaking data to make the failure occur at a deeper, more dangerous step.)

    Paradoxically, the way to be more compatible is to be less compatible.

    The format of a PNG-compressed image consists simply of a PNG image, starting with the PNG file signature. The image must be in 32bpp ARGB format (known to GDI+ as PixelFormat32bppARGB). There is no BITMAPINFOHEADER prefix, and no monochrome mask is present.
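
    Since a PNG-compressed image starts with the PNG file signature rather than a BITMAPINFOHEADER, a reader can tell the two apart by looking at the first eight bytes. A small sketch of that check (the function name is invented here; this is one plausible way to do it, not necessarily what Windows does internally):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 8-byte PNG file signature. */
static const unsigned char png_signature[8] =
    { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };

/* Returns 1 if the image data begins with the PNG signature,
   0 otherwise (in which case a reader would fall back to treating
   the data as a classic BITMAPINFOHEADER-based image). */
int icon_image_is_png(const unsigned char *data, size_t size)
{
    assert(data != NULL || size == 0);
    return size >= sizeof png_signature &&
           memcmp(data, png_signature, sizeof png_signature) == 0;
}
```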

    Since we had to break compatibility with the traditional format for ICO images, we may as well solve the problem we saw last time of people who specify an incorrect mask. With PNG-compressed images, you do not provide the mask at all; the mask is derived from the alpha channel on the fly. One fewer thing for people to get wrong.

  • The Old New Thing

    The evolution of the ICO file format, part 1: Monochrome beginnings

    • 36 Comments

    This week is devoted to the evolution of the ICO file format. Note that the icon resource format is different from the ICO file format; I'll save that topic for another day.

    The ICO file begins with a fixed header:

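    typedef struct ICONDIR {
        WORD                idReserved;   // must be zero
        WORD                idType;       // must be 1
        WORD                idCount;      // number of images in the file
        IconDirectoryEntry  idEntries[];  // idCount entries, described below
    } ICONDIR;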
    

    idReserved must be zero, and idType must be 1. The idCount describes how many images are included in this ICO file. An ICO file is really a collection of images; the theory is that each image is an alternate representation of the same underlying concept, but at different sizes and color depths. There is nothing to prevent you, in principle, from creating an ICO file where the 16×16 image looks nothing like the 32×32 image, but your users will probably be confused.

    After the idCount is an array of IconDirectoryEntry structures whose length is given by idCount.

    struct IconDirectoryEntry {
        BYTE  bWidth;
        BYTE  bHeight;
        BYTE  bColorCount;
        BYTE  bReserved;
        WORD  wPlanes;
        WORD  wBitCount;
        DWORD dwBytesInRes;
        DWORD dwImageOffset;
    };
    

    The bWidth and bHeight are the dimensions of the image. Originally, the supported range was 1 through 255, but starting in Windows 95 (and Windows NT 4), the value 0 is accepted as representing a width or height of 256.

    The wBitCount and wPlanes describe the color depth of the image; for monochrome icons, these values are both 1. The bReserved must be zero. The dwImageOffset and dwBytesInRes describe the location (relative to the start of the ICO file) and size in bytes of the actual image data.
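
    Putting the header and directory entry together, here is a rough sketch of reading them from a file that has been loaded into memory. It assumes the on-disk fields are little-endian and tightly packed (6 bytes of header, 16 bytes per entry); the icon_entry_info structure and both function names are made up for this example, and a real reader would also validate dwImageOffset and dwBytesInRes against the file size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t read_u16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical decoded form of one directory entry. */
struct icon_entry_info {
    unsigned width, height;        /* 1..256 after decoding */
    unsigned color_count;          /* bColorCount as stored */
    unsigned planes, bit_count;
    uint32_t bytes_in_res, image_offset;
};

/* Returns idCount, or -1 if the fixed header is not valid. */
int ico_image_count(const unsigned char *file, size_t size)
{
    assert(file != NULL);
    if (size < 6) return -1;
    if (read_u16le(file) != 0) return -1;       /* idReserved must be zero */
    if (read_u16le(file + 2) != 1) return -1;   /* idType must be 1 */
    return read_u16le(file + 4);                /* idCount */
}

/* Decodes entry `index`. Returns 0 on success, -1 on failure. */
int ico_read_entry(const unsigned char *file, size_t size, int index,
                   struct icon_entry_info *out)
{
    assert(out != NULL);
    int count = ico_image_count(file, size);
    if (count < 0 || index < 0 || index >= count) return -1;

    size_t pos = 6 + (size_t)index * 16;
    if (size < pos + 16) return -1;

    const unsigned char *e = file + pos;
    out->width        = e[0] ? e[0] : 256;   /* 0 means 256 */
    out->height       = e[1] ? e[1] : 256;   /* 0 means 256 */
    out->color_count  = e[2];
    /* e[3] is bReserved */
    out->planes       = read_u16le(e + 4);
    out->bit_count    = read_u16le(e + 6);
    out->bytes_in_res = read_u32le(e + 8);
    out->image_offset = read_u32le(e + 12);
    return 0;
}
```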

    And then there's bColorCount. Poor bColorCount. It's supposed to be equal to the number of colors in the image; in other words,

    bColorCount = 1 << (wBitCount * wPlanes)

    If wBitCount * wPlanes is greater than or equal to 8, then bColorCount is zero.
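
    In code, the rule for bColorCount comes out to something like the following. The function name is invented for this sketch; the logic is just the formula and the zero rule stated above.

```c
#include <assert.h>

/* Value that bColorCount is supposed to hold for a given
   wBitCount and wPlanes, per the rule described in the text. */
unsigned char expected_color_count(unsigned bit_count, unsigned planes)
{
    unsigned bits = bit_count * planes;
    assert(bits > 0);

    if (bits >= 8) {
        return 0;   /* 256 or more colors do not fit in a BYTE */
    }
    return (unsigned char)(1u << bits);
}
```

    A tool that writes ICO files could call this when filling in each directory entry instead of leaving the field at zero.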

    In practice, a lot of people get lazy about filling in the bColorCount and set it to zero, even for 4-color or 16-color icons. Starting in Windows XP, Windows autodetects this common error, but its autocorrection is slightly buggy in the case of planar bitmaps. Fortunately, almost nobody uses planar bitmaps any more, but still, it would be in your best interest not to rely on the autocorrection performed by Windows and just set your bColorCount correctly in the first place. An incorrect bColorCount means that when Windows tries to find the best image for your icon, it may choose a suboptimal one because it based its decision on incorrect color depth information.

    Although it probably isn't true, I will pretend that monochrome icons existed before color icons, because it makes the storytelling easier.

    A monochrome icon is described by two bitmaps, called AND (or mask) and XOR (or image, or when we get to color icons, color). Drawing an icon takes place in two steps: First, the mask is ANDed with the screen, then the image is XORed. In other words,

    pixel = (screen AND mask) XOR image

    By choosing appropriate values for mask and image, you can cover all the possible monochrome BLT operations.

    mask  image  result                              operation
    0     0      (screen AND 0) XOR 0 = 0            blackness
    0     1      (screen AND 0) XOR 1 = 1            whiteness
    1     0      (screen AND 1) XOR 0 = screen       nop
    1     1      (screen AND 1) XOR 1 = NOT screen   invert
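
    The same formula written as C, first for a single monochrome pixel and then for a run of packed bytes (eight pixels per byte). The function names are made up for this sketch; the arithmetic is exactly pixel = (screen AND mask) XOR image.

```c
#include <assert.h>
#include <stddef.h>

/* One monochrome pixel: each argument is 0 (black) or 1 (white). */
unsigned draw_icon_pixel(unsigned screen, unsigned mask, unsigned image)
{
    return ((screen & mask) ^ image) & 1u;
}

/* The same operation applied in place to n bytes of packed pixels. */
void draw_icon_bits(unsigned char *screen, const unsigned char *mask,
                    const unsigned char *image, size_t n)
{
    assert(screen != NULL && mask != NULL && image != NULL);
    for (size_t i = 0; i < n; i++) {
        screen[i] = (unsigned char)((screen[i] & mask[i]) ^ image[i]);
    }
}
```

    Running draw_icon_pixel over the four mask/image combinations reproduces the table: 0, 1, the unchanged screen pixel, and the inverted screen pixel.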

    Conceptually, the mask specifies which pixels from the image should be copied to the destination: A black pixel in the mask means that the corresponding pixel in the image is copied.

    The mask and image bitmaps are physically stored as one single double-height DIB. The image bitmap comes first, followed by the mask. (But since DIBs are stored bottom-up, if you actually look at the bitmap, the mask is in the top half of the bitmap and the image is in the bottom half.)

    In terms of file format, each icon image is stored in the form of a BITMAPINFO (which itself takes the form of a BITMAPINFOHEADER followed by a color table), followed by the image pixels, followed by the mask pixels. The biCompression must be BI_RGB. Since this is a double-height bitmap, the biWidth is the width of the image, but the biHeight is double the image height. For example, a 16×16 icon would specify a width of 16 but a height of 16 × 2 = 32.
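
    As a worked example of the double-height layout, the sketch below computes the biHeight and the byte sizes of the two halves for a monochrome icon. It relies on the usual DIB rule that each row is padded to a multiple of 4 bytes; the structure and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per row of a DIB: rows are padded to a multiple of 4 bytes. */
size_t dib_stride(unsigned width, unsigned bits_per_pixel)
{
    return (((size_t)width * bits_per_pixel + 31) / 32) * 4;
}

/* Hypothetical summary of a classic monochrome icon image. */
struct mono_icon_layout {
    long   bi_height;     /* value stored in biHeight */
    size_t image_bytes;   /* XOR (image) pixels, stored first */
    size_t mask_bytes;    /* AND (mask) pixels, stored second */
};

struct mono_icon_layout describe_mono_icon(unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);

    struct mono_icon_layout layout;
    size_t stride = dib_stride(width, 1);   /* 1 bit per pixel */

    layout.bi_height   = 2L * (long)height; /* image plus mask */
    layout.image_bytes = stride * height;
    layout.mask_bytes  = stride * height;
    return layout;
}
```

    For a 16×16 icon, each row is 2 bytes of pixels padded to 4, so the image and mask are 64 bytes each and biHeight is 32.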

    That's pretty much it for classic monochrome icons. Next time we'll look at color icons.

    Still, given what you know now, the following story will make sense.

    A customer contacted the shell team to report that despite all their best efforts, they could not get Windows to use the image they wanted from their .ICO file. Windows for some reason always chose a low-color icon instead of using the high-color icon. For example, even though the .ICO file had a 32bpp image available, Windows always chose to use the 16-color (4bpp) image, even when running on a 32bpp display.

    A closer inspection of the offending .ICO file revealed that the bColorCount in the IconDirectoryEntry for all the images was set to 1, regardless of the actual color depth of the image. The table of contents for the .ICO file said "Yeah, all I've got are monochrome images. I've got three 48×48 monochrome images, three 32×32 monochrome images, and three 16×16 monochrome images." Given this information, Windows figured, "Well, given those choices, I guess that means I'll use the monochrome one." It chose one of the images (at pseudo-random), and went to the bitmap data and found, "Oh, hey, how about that, it's actually a 16-color image. Okay, well, I guess I can load that."

    In summary, the .ICO file was improperly authored. Patching each IconDirectoryEntry in a hex editor made the icon work as intended. The customer thanked us for our investigation and said that they would take the issue up with their graphic design team.

  • The Old New Thing

    How do I programmatically invoke Aero Peek on a window?

    • 27 Comments

    A customer wanted to know if there was a way for their application to invoke the Aero Peek feature so that their window appeared and all the other windows on the system turned transparent.

    No, there is no such programmatic interface exposed. Aero Peek is a feature for the user to invoke, not a feature for applications to invoke so they can draw attention to themselves.

    Yes, I realize you wrote a program so awesome that all other programs pale in comparison, and that part of your mission is to make all the other programs literally pale in comparison to your program.

    Sorry.

    Maybe you can meet up with that other program that is the most awesome program in the history of the universe and share your sorrows over a beer.

  • The Old New Thing

    Non-psychic debugging: Why you're leaking timers

    • 22 Comments

    I was not involved in this debugging puzzle, but I was informed of its conclusions, and I think it illustrates both the process of debugging as well as uncovering a common type of defect. I've written it up in the style of a post-mortem.

    A user reported that if they press and hold the F2 key for about a minute, our program eventually stops working. According to Task Manager, our User object count has reached the 10,000 object limit, and closer inspection revealed that we had created over 9000 timer objects.

    We ran the debugger and set breakpoints on SetTimer and KillTimer to print to the debugger each timer ID as it was created and destroyed. Visual inspection of the output revealed that all but one of the IDs being created was matched with an appropriate destruction. We re-ran the scenario with a conditional breakpoint on SetTimer set to fire when that bad ID was set. It didn't take long for that breakpoint to fire, and we discovered that we were setting the timer against a NULL window handle.

    A different developer on the team arrived at the same conclusion by a different route. Instead of watching timers being created and destroyed, the developer dumped each timer message before it was dispatched and observed that most of the entries were associated with NULL window handles.

    Two independent analyses came to the same conclusion: We were creating a bunch of thread timers and not destroying them.

    A closer inspection of the code revealed that thread timers were not intended in the first place. Each time the user presses F2, the code calls SetTimer and passes a window handle it believes to be non-NULL. The timer is destroyed in the window procedure's WM_TIMER handler, but since the timer was registered against the wrong window handle, the WM_TIMER is never received by the intended target's window procedure, and the timer is never destroyed.

    The window handle is NULL due to a defect in the code which handles the F2 keypress: The handle that the code wanted to use for the timer had not yet been set. (It was set by a later step of F2 processing.) The timer was being set by a helper function which is called both before and after the code that sets the handle, but it obviously was written on the assumption that it would only be called after.

    To reduce the likelihood of this type of defect being introduced in the future, we're going to introduce a wrapper function around SetTimer which asserts that the window handle is non-NULL before calling SetTimer. (In the rare case that we actually want a thread timer, we'll have a second wrapper function called SetThreadTimer.)

    I haven't seen the wrapper function, but I suspect it goes something like this:

    inline UINT_PTR SetWindowTimer(
        __in HWND hWnd, // NB - not optional
        __in UINT_PTR nIDEvent,
        __in UINT uElapse,
        __in_opt TIMERPROC lpTimerFunc)
    {
        assert(hWnd != NULL);
        return SetTimer(hWnd, nIDEvent, uElapse, lpTimerFunc);
    }
    
    inline UINT_PTR SetThreadTimer(
        __in UINT uElapse,
        __in_opt TIMERPROC lpTimerFunc)
    {
        return SetTimer(NULL, 0, uElapse, lpTimerFunc);
    }
    
    __declspec(deprecated)
    WINUSERAPI
    UINT_PTR
    WINAPI
    SetTimer(
        __in_opt HWND hWnd,
        __in UINT_PTR nIDEvent,
        __in UINT uElapse,
        __in_opt TIMERPROC lpTimerFunc);
    

    There are a few interesting things here.

    First, observe that the annotation for the first parameter to SetWindowTimer is __in rather than __in_opt. This indicates that the parameter cannot be NULL. Code analysis tools can use this information to attempt to identify potential defects.

    Second, observe that the SetThreadTimer wrapper function omits the first two parameters. For thread timers, the hWnd passed to SetTimer is always NULL and the nIDEvent is ignored.

    Third, after the two wrapper functions, we redeclare SetTimer, but mark it as deprecated so the compiler will complain if somebody tries to call the original function instead of one of the two wrappers. (The __declspec(deprecated) extended attribute is a nonstandard Microsoft extension.)

    Exercise: Why did I use __declspec(deprecated) instead of #pragma deprecated(SetTimer)?

  • The Old New Thing

    Hacking Barney the dinosaur for fun (no profit)

    • 21 Comments

    Many years ago, Microsoft produced a collection of interactive toys called ActiMates, and one of the features was that television programs could broadcast an encoded signal which would enable the toy to interact with the program. The idea would be that the Barney doll would do something that was coordinated with what was happening on Barney & Friends.

    When this came out, a bunch of us wondered what it would take to hack into the device and get Barney to say and do, um, very un-Barneyish things. One of us managed to get a schematic for the device, but since none of us was an electrical engineer, that pretty much dead-ended the project.

    Over ten years later, I learned that we weren't the only people to get that idea. I met someone who told me that he managed to get his hands on the internal devkit for the ActiMates series and control a Barney doll from his PC. Not satisfied with being limited to the built-in Barney phrases, he was able to "take additional creative steps with the devkit" to stream his own replacement audio to the device (although he was never able to get the sound quality of his streamed audio to sound as good as the built-in phrases). As a result, he could make Barney say whatever he wanted, and if he really felt like it, he could wake up all the Barney toys in his apartment complex at midnight and give orders to his robot army of purple dinosaurs.

    The catch was that his robot army most likely would have consisted of just one robot.

    Bonus reading: SWEETPEA: Software Tools for Programmable Embodied Agents [pdf], Michael Kaminsky, Paul Dourish, W. Keith Edwards, Anthony LaMarca, Michael Salisbury and Ian Smith, CHI'99.
