History

  • The Old New Thing

    The ways people mess up IUnknown::QueryInterface

    When you're dealing with application compatibility, you discover all sorts of things that worked only by accident. Today, I'll talk about some of the "creative" ways people mess up the IUnknown::QueryInterface method.

    Now, you'd think, "This interface is so critical to COM, how could anybody possibly mess it up?"

    Forgetting to respond to IUnknown.

    Sometimes you get so excited about responding to all these great interfaces that you forget to respond to IUnknown itself. We have found objects where

    IShellFolder *psf = some object;
    IUnknown *punk;
    psf->QueryInterface(IID_IUnknown, (void**)&punk);
    
    fails with E_NOINTERFACE!

    Forgetting to respond to your own interface.

    There are some methods which return an object with a specific interface. And if you query that object for its own interface, its sole reason for existing, it says "Huh?"

    IShellFolder *psf = some object;
    IEnumIDList *peidl, *peidl2;
    psf->EnumObjects(..., &peidl);
    peidl->QueryInterface(IID_IEnumIDList, (void**)&peidl2);
    

    There are some objects which return E_NOINTERFACE to the QueryInterface call, even though you're asking the object for itself! "Sorry, I don't exist," it seems they're trying to say.

    Forgetting to respond to base interfaces.

    When you implement a derived interface, you implicitly implement the base interfaces, so don't forget to respond to them, too.

    IShellView *psv = some object;
    IOleWindow *pow;
    psv->QueryInterface(IID_IOleWindow, (void**)&pow);
    
    IShellView derives from IOleWindow, yet some objects forget, and the QueryInterface fails with E_NOINTERFACE.

    Requiring a secret knock.

    In principle, the following two code fragments are equivalent:

    IShellFolder *psf;
    IUnknown *punk;
    CoCreateInstance(CLSID_xyz, ..., IID_IShellFolder, (void**)&psf);
    psf->QueryInterface(IID_IUnknown, (void**)&punk);
    
    CoCreateInstance(CLSID_xyz, ..., IID_IUnknown, (void**)&punk);
    punk->QueryInterface(IID_IShellFolder, (void**)&psf);
    

    In reality, some implementations mess up and fail the second call to CoCreateInstance (the one that asks for IID_IUnknown). The only way to create the object successfully is to create it with the IShellFolder interface.

    Forgetting to say "no" properly.

    One of the rules for saying "no" is that you have to set the output pointer to NULL before returning. Some people forget to do that.

    IMumble *pmbl;
    punk->QueryInterface(IID_IMumble, (void**)&pmbl);
    

    If the QueryInterface succeeds, then pmbl must be non-NULL on return. If it fails, then pmbl must be NULL on return.
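
Here is a minimal sketch of a QueryInterface that follows all of the rules above: it answers for IUnknown, for its base interface, and for its own interface, and on failure it clears the output pointer. It deliberately avoids the real Windows headers so it compiles anywhere; `IUnknownLike`, `IOleWindowLike`, `IShellViewLike`, `CView`, and the integer interface IDs are hypothetical stand-ins, not the SDK's `REFIID`/`IsEqualIID` machinery.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the Windows SDK types so the sketch compiles anywhere.
using HRESULT = std::int32_t;
using IID = std::uint32_t;
constexpr HRESULT kS_OK = 0;
constexpr HRESULT kE_NOINTERFACE = -2147467262;  // 0x80004002 as a signed 32-bit value
constexpr HRESULT kE_POINTER = -2147467261;      // 0x80004003 as a signed 32-bit value
constexpr IID kIID_IUnknown = 1;
constexpr IID kIID_IOleWindow = 2;
constexpr IID kIID_IShellView = 3;

struct IUnknownLike {
    virtual HRESULT QueryInterface(IID riid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~IUnknownLike() = default;
};
struct IOleWindowLike : IUnknownLike {};    // base interface
struct IShellViewLike : IOleWindowLike {};  // derived interface

class CView : public IShellViewLike {
public:
    HRESULT QueryInterface(IID riid, void** ppv) override {
        if (ppv == nullptr) return kE_POINTER;
        // Answer for IUnknown, for every base interface, and for the
        // object's own interface; cast to the exact interface requested.
        if (riid == kIID_IUnknown) {
            *ppv = static_cast<IUnknownLike*>(this);
        } else if (riid == kIID_IOleWindow) {
            *ppv = static_cast<IOleWindowLike*>(this);
        } else if (riid == kIID_IShellView) {
            *ppv = static_cast<IShellViewLike*>(this);
        } else {
            *ppv = nullptr;  // saying "no" properly: clear the output pointer
            return kE_NOINTERFACE;
        }
        AddRef();  // a successful QueryInterface hands out a new reference
        return kS_OK;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }
private:
    unsigned long refs_ = 1;  // the creator holds the first reference
};
```

The explicit cast per interface matters once an object uses multiple inheritance, where different interfaces live at different addresses within the same object.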

    The shell has to be compatible with all these buggy objects because if it weren't, customers would get upset and the press would have a field day. Some of the offenders are big-name programs. If they broke, people would report, "Don't upgrade to Windows XYZ, it's not compatible with <big-name program>." Conspiracy-minded folks would shout, "Microsoft intentionally broke <big-name program>! Proof of unfair business tactics!"

    [Raymond is currently on vacation; this message was pre-recorded.]

  • The Old New Thing

    Some files come up strange in Notepad

    David Cumps discovered that certain text files come up strange in Notepad.

    The reason is that Notepad has to edit files in a variety of encodings, and when its back is against the wall, sometimes it's forced to guess.

    Here's the file "Hello" in various encodings:

    48 65 6C 6C 6F

    This is the traditional ANSI encoding.

    48 00 65 00 6C 00 6C 00 6F 00

    This is the Unicode (little-endian) encoding with no BOM.

    FF FE 48 00 65 00 6C 00 6C 00 6F 00

    This is the Unicode (little-endian) encoding with BOM. The BOM (FF FE) serves two purposes: First, it tags the file as a Unicode document, and second, the order in which the two bytes appear indicates that the file is little-endian.

    00 48 00 65 00 6C 00 6C 00 6F

    This is the Unicode (big-endian) encoding with no BOM. Notepad does not support this encoding.

    FE FF 00 48 00 65 00 6C 00 6C 00 6F

    This is the Unicode (big-endian) encoding with BOM. Notice that this BOM is in the opposite order from the little-endian BOM.

    EF BB BF 48 65 6C 6C 6F

    This is UTF-8 encoding. The first three bytes are the UTF-8 encoding of the BOM.

    2B 2F 76 38 2D 48 65 6C 6C 6F

    This is UTF-7 encoding. The first five bytes are the UTF-7 encoding of the BOM. Notepad doesn't support this encoding.

    Notice that the UTF-7 BOM encoding is just the ASCII string "+/v8-", which is difficult to distinguish from just a regular file that happens to begin with those five characters (as odd as they may be).
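
The prefixes in the table above can be checked mechanically. This is a sketch, not Notepad's actual code: `SniffSignature` and `Signature` are hypothetical names, only the exact five-byte UTF-7 form from the table is recognized, and the ANSI-versus-BOM-less-Unicode guess (the IsTextUnicode step) is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Signatures from the table above. NoSignature means the caller has to guess
// between ANSI and BOM-less little-endian Unicode; this sketch does not guess.
enum class Signature { Utf16LE, Utf16BE, Utf8, Utf7, NoSignature };

Signature SniffSignature(const unsigned char* p, std::size_t n) {
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) return Signature::Utf16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) return Signature::Utf16BE;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) return Signature::Utf8;
    // "+/v8-" is plain ASCII, so a match here may just be an odd ANSI file.
    if (n >= 5 && std::memcmp(p, "+/v8-", 5) == 0) return Signature::Utf7;
    return Signature::NoSignature;
}
```

The comment on the UTF-7 branch is the point made above: a signature made of printable ASCII can never be conclusive.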

    The encodings that do not have special prefixes and which are still supported by Notepad are the traditional ANSI encoding (i.e., "plain ASCII") and the Unicode (little-endian) encoding with no BOM. When faced with a file that lacks a special prefix, Notepad is forced to guess which of those two encodings the file actually uses. The function that does this work is IsTextUnicode, which studies a chunk of bytes and does some statistical analysis to come up with a guess.

    And as the documentation notes, "Absolute certainty is not guaranteed." Short strings are most likely to be misdetected.

    [Raymond is currently on vacation; this message was pre-recorded.]

  • The Old New Thing

    Why does the Resource Compiler complain about strings longer than 255 characters?

    As we learned in a previous entry, string resources group strings into bundles of 16, each Unicode string in the bundle prefixed by a 16-bit length. Why does the Resource Compiler complain about strings longer than 255 characters?

    This is another leftover from 16-bit Windows.

    Back in the Win16 days, string resources were also grouped into bundles of 16, but the strings were in ANSI, not Unicode, and the prefix was only an 8-bit value.

    And 255 is the largest length you can encode in an 8-bit value.

    If your 32-bit DLL contains strings longer than 255 characters, then 16-bit programs would be unable to read those strings.
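
A small sketch of the arithmetic behind the 255 limit. `LocateString` uses the usual Win32 string-table mapping (ID n lives in bundle n / 16 + 1, slot n % 16); `EncodeWin16Entry` is a hypothetical, simplified encoder for a length-byte-plus-ANSI entry, not the Resource Compiler's actual code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Each string ID lives in bundle (id / 16) + 1, at position id % 16 within it.
struct BundleSlot { unsigned bundle; unsigned index; };

BundleSlot LocateString(unsigned id) { return {id / 16 + 1, id % 16}; }

// Hypothetical Win16-style entry: one length byte followed by the ANSI text.
std::optional<std::vector<unsigned char>> EncodeWin16Entry(const std::string& s) {
    if (s.size() > 255) return std::nullopt;  // does not fit in an 8-bit length prefix
    std::vector<unsigned char> out;
    out.push_back(static_cast<unsigned char>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
    return out;
}
```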

    This is largely irrelevant nowadays, but the warning remained in the Resource Compiler for quite some time.

    It appears to be gone now. Good riddance.

  • The Old New Thing

    Why is the line terminator CR+LF?

    This protocol dates back to the days of teletypewriters. CR stands for "carriage return" - the CR control character returned the print head ("carriage") to column 0 without advancing the paper. LF stands for "linefeed" - the LF control character advanced the paper one line without moving the print head. So if you wanted to return the print head to column zero (ready to print the next line) and advance the paper (so it prints on fresh paper), you needed both CR and LF.

    If you go to the various internet protocol documents, such as RFC 0821 (SMTP), RFC 1939 (POP), RFC 2060 (IMAP), or RFC 2616 (HTTP), you'll see that they all specify CR+LF as the line termination sequence. So the real question is not "Why do CP/M, MS-DOS, and Win32 use CR+LF as the line terminator?" but rather "Why did other people choose to differ from these standards documents and use some other line terminator?"

    Unix adopted plain LF as the line termination sequence. If you look at the stty options, you'll see that the onlcr option specifies whether a LF should be changed into CR+LF. If you get this setting wrong, you get stairstep text, where

    each
        line
            begins
    
    where the previous line left off. So even unix, when left in raw mode, requires CR+LF to terminate lines. The implicit CR before LF is a unix invention, probably as an economy, since it saves one byte per line.
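
To see where the stairstep comes from, here is a toy teletype that applies the two control characters exactly as described above. `Teletype` is a hypothetical helper for illustration; it models only CR, LF, and printable characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy teletype: CR moves the print head to column 0; LF advances the paper one
// line without moving the head. Returns the printed page, one string per line.
std::vector<std::string> Teletype(const std::string& input) {
    std::vector<std::string> page(1);
    std::size_t row = 0, col = 0;
    for (char ch : input) {
        if (ch == '\r') {
            col = 0;
        } else if (ch == '\n') {
            ++row;
            page.resize(row + 1);  // fresh paper; the head stays where it was
        } else {
            std::string& line = page[row];
            if (line.size() <= col) line.resize(col + 1, ' ');
            line[col++] = ch;
        }
    }
    return page;
}
```

Feeding it "each\nline\nbegins" produces the stairstep, while "each\r\nline" prints both words at column 0.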

    The unix ancestry of the C language carried this convention into the C language standard, which requires only "\n" (which encodes LF) to terminate lines, putting the burden on the runtime libraries to convert raw file data into logical lines.
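
A simplified model of that runtime burden on a CR+LF platform: on input, each CR+LF pair becomes a single "\n"; on output, each "\n" is written as CR+LF. Real text-mode runtimes have additional rules (for example, around a lone CR or Ctrl+Z), so treat these hypothetical helpers as a sketch of the idea rather than any library's exact behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Input direction: collapse each CR+LF pair to '\n'; a lone CR or LF passes through.
std::string CrLfToNewline(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') continue;  // drop the CR
        out.push_back(raw[i]);
    }
    return out;
}

// Output direction: write each '\n' as CR+LF.
std::string NewlineToCrLf(const std::string& text) {
    std::string out;
    for (char ch : text) {
        if (ch == '\n') out.push_back('\r');
        out.push_back(ch);
    }
    return out;
}
```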

    The C language also introduced the term "newline" to express the concept of "generic line terminator". I'm told that the ASCII committee changed the name of character 0x0A to "newline" around 1996, so the confusion level has been raised even higher.

    Here's another discussion of the subject, from a unix perspective.

  • The Old New Thing

    On a server, paging = death

    Chris Brumme's latest treatise contained the sentence "Servers must not page". That's because on a server, paging = death.

    I had occasion to meet somebody from another division who told me this little story: They had a server that went into thrashing death every 10 hours, like clockwork, and had to be rebooted. To mask the problem, the server was converted to a cluster, so what really happened was that the machines in the cluster took turns being rebooted. The clients never noticed anything, but the server administrators were really frustrated. ("Hey Clancy, looks like number 2 needs to be rebooted. She's sucking mud.") [Link repaired, 8am.]

    The reason for the server's death? Paging.

    There was a four-bytes-per-request memory leak in one of the programs running on the server. Eventually, all the leakage filled available RAM and the server was forced to page. Paging means slower response, but of course the requests for service kept coming in at the normal rate. So the longer you take to turn a request around, the more requests pile up, and then it takes even longer to turn around the new requests, so even more pile up, and so on. The problem snowballed until the machine just plain keeled over.
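
A toy model of the two halves of this story, with hypothetical numbers throughout. `SimulateBacklog` shows that once the service rate drops below the arrival rate, the backlog grows every tick and never recovers; it ignores the feedback where a growing backlog makes paging even worse, so the real snowball was faster. `LeakedBytes` is the leak arithmetic, where the request rate is an assumption the story does not give.

```cpp
#include <cassert>
#include <vector>

// Each tick, `arrivals` requests come in and the server finishes at most
// `capacity` of them. Returns the backlog after each tick.
std::vector<long> SimulateBacklog(long arrivals, long capacity, int ticks) {
    std::vector<long> backlog;
    long pending = 0;
    for (int t = 0; t < ticks; ++t) {
        pending += arrivals;
        pending -= (pending < capacity) ? pending : capacity;  // serve what we can
        backlog.push_back(pending);
    }
    return backlog;
}

// Bytes lost to a fixed-size per-request leak.
long long LeakedBytes(long long bytesPerRequest, long long requestsPerSecond, long long seconds) {
    return bytesPerRequest * requestsPerSecond * seconds;
}
```

At an assumed 1,000 requests per second, a 4-byte leak adds up to 144,000,000 bytes over the 10 hours in the story.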

    After much searching, the leak was identified and plugged. Now the servers chug along without a hitch.

    (And since the reason for the cluster was to cover for the constant crashes, I suspect they reduced the size of the cluster and saved a lot of money.)

  • The Old New Thing

    More on the AMD64 calling convention

    Josh Williams picks up the 64-bit ball with an even deeper discussion of the AMD64 (aka x64) calling convention and things that go wrong when you misdeclare your function prototypes.

  • The Old New Thing

    Why do text files end in Ctrl+Z?

    Actually, text files don't need to end in Ctrl+Z, but the convention persists in certain circles. (Though, fortunately, those circles are awfully small nowadays.)

    This story requires us to go back to CP/M, the operating system that MS-DOS envisioned itself as a successor to. (Since the 8086 envisioned itself as the successor to the 8080, it was natural that the operating system for the 8086 would view itself as the successor to the primary operating system on the 8080.)

    In CP/M, files were stored in "sectors" of 128 bytes each. If your file was 64 bytes long, it was stored in a full sector. The kicker was that the operating system tracked the size of the file as the number of sectors. So if your file was not an exact multiple of 128 bytes in size, you needed some way to specify where the "real" end-of-file was.

    That's where Ctrl+Z came in.

    By convention, the unused bytes at the end of the last sector were padded with Ctrl+Z characters. According to this convention, if you had a program that read from a file, it should stop when it reads a Ctrl+Z, since that meant that it was now reading the padding.
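
The convention is easy to express in code. This sketch uses hypothetical helper names: `PadToSectors` fills the last 128-byte sector with Ctrl+Z (0x1A), and `ReadUntilCtrlZ` is a reader that treats everything from the first Ctrl+Z onward as padding. One consequence is visible immediately: a text file that legitimately contains a Ctrl+Z gets truncated there.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t kSectorSize = 128;
constexpr char kCtrlZ = 0x1A;

// Pad the data to a whole number of sectors with Ctrl+Z characters.
std::string PadToSectors(std::string data) {
    std::size_t rem = data.size() % kSectorSize;
    if (rem != 0) data.append(kSectorSize - rem, kCtrlZ);
    return data;
}

// Stop at the first Ctrl+Z; if there is none, the whole buffer is real data.
std::string ReadUntilCtrlZ(const std::string& stored) {
    return stored.substr(0, stored.find(kCtrlZ));
}
```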

    To retain compatibility with CP/M, MS-DOS carried forward the Ctrl+Z convention. That way, when you transferred your files from your old CP/M machine to your new PC, they wouldn't have garbage at the end.

    Ctrl+Z hasn't been needed for years; MS-DOS records file sizes in bytes rather than sectors. But the convention lingers in the "COPY" command, for example.
  • The Old New Thing

    Why are dialog boxes initially created hidden?

    You may not have noticed it until you looked closely, but dialog boxes are actually created hidden initially, even if you specify WS_VISIBLE in the template. The reason for this is historical.

    Rewind to the old days (we're talking Windows 1.0): graphics cards are slow, CPUs are slow, and memory is slow. You can pick a menu option that displays a dialog and wait a second or two for the dialog to get loaded off the floppy disk. (Hard drives are for the rich kids.) And then you have to wait for the dialog box to paint.

    To save valuable seconds, dialog boxes are created initially hidden and all typeahead is processed while the dialog stays hidden. Only after the typeahead is finished is the dialog box finally shown. And if you typed far enough ahead and hit Enter, you might even have been able to finish the entire dialog box without it ever being shown! Now that's efficiency.

    Of course, nowadays, programs are stored on hard drives and you can't (normally) out-type a hard drive, so this optimization is largely wasted, but the behavior remains for compatibility reasons.

    Actually this behavior still serves a useful purpose: If the dialog were initially created visible, then the user would be able to see all the controls being created into it, and watch as WM_INITDIALOG ran (changing default values, hiding and showing controls, moving controls around...) This is both ugly and distracting. ("How come the box comes up checked, then suddenly unchecks itself before I can click on it?")

  • The Old New Thing

    Why do operations on "byte" result in "int"?

    (The following discussion applies equally to C/C++/C#, so I'll use C#, since I talk about it so rarely.)

    People complain that the following code elicits an error:

    byte b = 32;
    byte c = ~b;
    // error CS0029: Cannot implicitly convert type 'int' to 'byte'
    

    "The result of an operation on 'byte' should be another 'byte', not an 'int'," they claim.

    Be careful what you ask for. You might not like it.

    Suppose we lived in a fantasy world where operations on 'byte' resulted in 'byte'.

    byte b = 32;
    byte c = 240;
    int i = b + c; // what is i?
    

    In this fantasy world, the value of i would be 16! Why? Because the two operands to the + operator are both bytes, so the sum "b+c" is computed as a byte, which results in 16 due to integer overflow. (And, as I noted earlier, integer overflow is the new security attack vector.)

    Similarly,

    int j = -b;
    
    would result in j having the value 224 and not -32, for the same reason.
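
Since the discussion applies equally to C and C++, here is the same arithmetic in C++, where `std::uint8_t` operands are promoted to `int` just as C# promotes `byte`. The helper names are made up for illustration; the "AsByte" versions simulate the fantasy world by forcing each result back into 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// Real behavior: operands are promoted to int before the arithmetic happens.
int SumPromoted(std::uint8_t b, std::uint8_t c) { return b + c; }
int NegatePromoted(std::uint8_t b) { return -b; }
int ComplementPromoted(std::uint8_t b) { return ~b; }

// Fantasy world: the result is squeezed back into 8 bits (wraps modulo 256).
std::uint8_t SumAsByte(std::uint8_t b, std::uint8_t c) {
    return static_cast<std::uint8_t>(b + c);
}
std::uint8_t NegateAsByte(std::uint8_t b) { return static_cast<std::uint8_t>(-b); }
```

With b = 32 and c = 240, the promoted sum is 272 while the 8-bit sum is 16; negating 32 gives -32 versus 224; and `~b` is the int -33, which is why assigning it to a `byte` is rejected.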

    Is that really what you want?

    Consider the following more subtle scenario:

    struct Results {
     public byte Wins;
     public byte Games;
    };
    
    bool WinningAverage(Results captain, Results cocaptain)
    {
     return (captain.Wins + cocaptain.Wins) >=
            (captain.Games + cocaptain.Games) / 2;
    }
    

    In our imaginary world, this code would return incorrect results once the total number of games played exceeded 255. To fix it, you would have to insert annoying int casts.

     return ((int)captain.Wins + cocaptain.Wins) >=
            ((int)captain.Games + cocaptain.Games) / 2;
    
    So no matter how you slice it, you're going to have to insert annoying casts. You may as well have the language err on the side of safety (forcing you to insert the casts where you know that overflow is not an issue) rather than on the side of silence (where you may not notice the missing casts until your Payroll department asks you why their books don't add up at the end of the month).
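
The Results scenario can be checked directly. Below is a C++ sketch (using `std::uint8_t` in place of C#'s `byte`): `WinningAverage` is the function from the text under real promotion rules, and `WinningAverageByteWorld` is a hypothetical simulation of the imaginary world that truncates each sum to 8 bits. With 400 total games, the two disagree in both directions.

```cpp
#include <cassert>
#include <cstdint>

struct Results {
    std::uint8_t Wins;
    std::uint8_t Games;
};

// Real promotion rules: both sums are computed as int, so nothing overflows here.
bool WinningAverage(Results captain, Results cocaptain) {
    return (captain.Wins + cocaptain.Wins) >=
           (captain.Games + cocaptain.Games) / 2;
}

// Simulated "byte world": each sum is truncated to 8 bits before comparing.
bool WinningAverageByteWorld(Results captain, Results cocaptain) {
    std::uint8_t wins = static_cast<std::uint8_t>(captain.Wins + cocaptain.Wins);
    std::uint8_t games = static_cast<std::uint8_t>(captain.Games + cocaptain.Games);
    return wins >= games / 2;
}
```

For example, 120 wins out of 400 games is not a winning average, but the byte world sees 400 as 144 and says it is; 300 wins out of 400 is, but the byte world sees 44 wins and says it is not.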
  • The Old New Thing

    Defrauding the WHQL driver certification process

    In a comment to one of my earlier entries, someone mentioned a driver that bluescreened under normal conditions, but once you enabled the Driver Verifier (to try to catch the driver doing whatever bad thing it was doing), the problem went away. Another commenter bemoaned that WHQL certification didn't seem to improve the quality of the drivers.

    Video drivers will do anything to outdo their competition. Everybody knows that they cheat benchmarks, for example. I remember one driver that ran the DirectX "3D Tunnel" demonstration program extremely fast, demonstrating how totally awesome their video card was. Except that if you renamed TUNNEL.EXE to FUNNEL.EXE, it ran slow again.

    There was another one that checked if you were printing a specific string used by a popular benchmark program. If so, then it only drew the string a quarter of the time and merely returned without doing anything the other three quarters of the time. Bingo! Their benchmark numbers just quadrupled.

    Anyway, similar shenanigans are not unheard of when submitting a driver to WHQL for certification. Some unscrupulous drivers will detect that they are being run by WHQL and disable various features so they pass certification. Of course, they also run dog slow in the WHQL lab, but that's okay, because WHQL is interested in whether the driver contains any bugs, not whether the driver has the fastest triangle fill rate in the industry.

    The most common cheat I've seen is drivers which check for a secret "Enable Dubious Optimizations" switch in the registry or some other place external to the driver itself. They take the driver and put it in an installer which does not turn the switch on and submit it to WHQL. When WHQL runs the driver through all its tests, the driver is running in "safe but slow" mode and passes certification with flying colors.

    The vendor then takes that driver (now with the WHQL stamp of approval) and puts it inside an installer that enables the secret "Enable Dubious Optimizations" switch. Now the driver sees the switch enabled and performs all sorts of dubious optimizations, none of which were tested by WHQL.
