July, 2008

  • The Old New Thing

    How did the invalid floating point operand exception get raised when I disabled it?

    Last time, we learned about the dangers of uninitialized floating point variables but left off with a puzzle: Why wasn't this caught during internal testing?

    I dropped a hint when I described how SNaNs work: You have to ask the processor to raise an exception when it encounters a signaling NaN, and the program disabled that exception. Why was an exception being raised when it had been disabled?

    The clue to the cause was that the customer who was encountering the crash reported that it tended to happen after they printed a report. It turns out that the customer's printer driver was re-enabling the invalid operand exception in its DLL_PROCESS_ATTACH handler. Once the exception was enabled, the SNaN, which had previously been masked, was now live, and it crashed the program.

    I've also seen DLLs change the floating point rounding state in their DLL_PROCESS_ATTACH handler. This behavior can be traced back to old versions of the C runtime library that reset the floating point state as part of their DLL_PROCESS_ATTACH; the runtime was fixed as long ago as 2002 (possibly even earlier; I don't know for sure). Obviously that printer driver was even older. Good luck convincing the vendor to fix a bug in a driver for a printer they most likely don't even manufacture any more. If anything, they'll probably just treat it as incentive for you to buy a new printer.

    When you load external code into your process, you implicitly trust that the code won't screw you up. This is just another example of how a DLL can inadvertently screw you up.

    Sidebar

    One might argue that the LoadLibrary function should save the floating point state before loading a library and restore it afterwards. This is an easy suggestion to make in retrospect. Writing software would be so much easier if people would just extend the courtesy of coming up with a comprehensive list of "bugs applications will have that you should protect against" before you design the platform. That way, when a new class of application bugs is found, and they say "You should've protected against this!", you can point to the list and say, "Nuh, uh, you didn't put it on the list. You had your chance."

    As a mental exercise for yourself: Come up with a list of "all the bugs that the LoadLibrary function should protect against" and how the LoadLibrary function would go about doing it.

    Don't be helpless: You can put things together, it doesn't have to be a single command

    Humans are distinguished among all animal species by their advanced development of and heavy reliance on tools. Don't betray your ancestors. Use those tools you have.

    For example, during the debugging of a thread pool problem, it looked like somebody did a PostThreadMessage to a thread pool thread and left the message unprocessed after the thread pool function returned. Who could it have been? Well, one idea was to see if there were any DLLs in the system which called both QueueUserWorkItem and PostThreadMessage.

    I did a little legwork and contributed the following analysis to the mail thread:

    Of all the DLLs loaded into the process, the following call PostThreadMessage:

    SHLWAPI.dll 77D72436 221 PostThreadMessageA
    SHELL32.dll 77D78596 222 PostThreadMessageW
    ole32.dll 77D78596 222 PostThreadMessageW
    ... (list trimmed; you get the idea) ...

    Of those DLLs, these also call QueueUserWorkItem:

    shlwapi.dll
    shell32.dll
    ... (list trimmed; you get the idea) ...

    Astounded, somebody wanted to know how I came up with that list.

    Nothing magic. You have the tools, you have a brain, so connect the dots.

    The lm debugger command lists all the DLLs loaded into the process. Copy the output from the debugger window and paste it into a text file. Now write a little script that takes each line of the text file and does a link /dump /imports on the corresponding DLL. I happen to prefer Perl for this sort of thing, but you can use a boring batch file if you like.

    for /f %i in (dlls.txt) do ^
    @echo %i & link /dump /imports %i | findstr PostThreadMessage
    

    Scrape the results off the screen, prune out the misses, and there you have it.

    "I tried that, but the result wasn't in the same format as what you posted."

    Well, yeah. There's no law that says that I can't manually reformat the data before presenting it in an email message. Since there were only a dozen hits, it's not worth writing a script to do that type of data munging. Typing "backspace, home, up-arrow" twelve times is a lot faster than writing a script to take the output of the above batch file and turn it into the output I used in the email message.

    Another boring batch file filters the list to those DLLs that also call QueueUserWorkItem. Writing it (or a script in your favorite language) is left as an exercise.
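    At its heart, that second filter is just a set intersection. A minimal sketch in C, assuming the DLL names from each findstr pass have already been collected into arrays; intersect_names and same_name are hypothetical helpers, not part of any tool mentioned above. The comparison ignores case because the two lists above disagree on it: SHLWAPI.dll in one, shlwapi.dll in the other.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive ASCII comparison of two DLL names.
   Returns nonzero if they match. */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Store in out[] every name from first[] that also appears in second[],
   preserving the order of first[]. Returns the number of names stored.
   out[] must have room for at least nfirst entries. */
size_t intersect_names(const char *const first[], size_t nfirst,
                       const char *const second[], size_t nsecond,
                       const char *out[])
{
    size_t count = 0;
    for (size_t i = 0; i < nfirst; i++) {
        for (size_t j = 0; j < nsecond; j++) {
            if (same_name(first[i], second[j])) {
                out[count++] = first[i];
                break;
            }
        }
    }
    return count;
}
```

    The nested loop is quadratic, which doesn't matter for a few dozen DLLs.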

    No rocket science here. Just taking a bunch of tools and putting them together to solve a problem. That's what your brain is for, after all.

    Uninitialized floating point variables can be deadly

    A colleague of mine related to me this story about uninitialized floating point variables. He had a function that went something like this, simplified for expository purposes. The infoType parameter specified which piece of information you're requesting, and depending on what you're asking for, one or the other of the output parameters may not contain a meaningful result.

    BOOL GetInfo(int infoType, int *intResult, double *dblResult)
    {
     int intValue;
     double dblValue;
    
     switch (infoType) {
     case NUMBER_OF_GLOBS:
      intValue = ...;
      break;
    
     case AVERAGE_GLOB_SIZE:
      dblValue = ...;
      break;
     ...
     }
     *intResult = intValue;
     *dblResult = dblValue;
     ...
    }
    

    After the product shipped, they started getting crash reports. This was in the days before Windows Error Reporting, so all they had to work from was the faulting address, which implicated the line *dblResult = dblValue.

    My colleague initially suspected that dblResult was an invalid pointer, but a search of the entire code base ruled out that possibility.

    The problem was the use of an uninitialized floating point variable. Unlike integers, not all bit patterns are valid for use as floating point values. There is a category of values known as signaling NaNs, or SNaN for short, which are special "not a number" values. If you ask the processor to, it will keep an eye out for these signaling NaNs and raise an "invalid operand" exception when one is encountered. (This, after all, is the whole reason why it's called a signaling NaN.)
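    To make "not all bit patterns are valid" concrete: an IEEE 754 double is a NaN when its 11 exponent bits are all ones and its 52 fraction bits are not all zero, and on x86 a NaN is signaling when the top fraction bit is clear. The sketch below classifies a raw bit pattern; is_signaling_nan and double_from_bits are hypothetical helpers. It moves bytes with memcpy and passes doubles by pointer rather than by value, because loading the value into a floating point register is exactly the kind of operation that can trip the invalid operand exception.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");

/* IEEE 754 binary64: 1 sign bit, 11 exponent bits, 52 fraction bits. */
#define EXPONENT_MASK 0x7FF0000000000000ULL
#define FRACTION_MASK 0x000FFFFFFFFFFFFFULL
#define QUIET_BIT     0x0008000000000000ULL  /* top fraction bit (x86 convention) */

/* Build a double from a raw bit pattern, such as leftover stack garbage,
   without running it through any floating point instruction. */
void double_from_bits(uint64_t bits, double *out)
{
    memcpy(out, &bits, sizeof *out);
}

/* Inspect the bits only; no floating point arithmetic is performed. */
int is_signaling_nan(const double *d)
{
    uint64_t u;
    memcpy(&u, d, sizeof u);
    return (u & EXPONENT_MASK) == EXPONENT_MASK  /* exponent all ones */
        && (u & FRACTION_MASK) != 0              /* NaN, not infinity */
        && (u & QUIET_BIT) == 0;                 /* quiet bit clear */
}
```

    For example, the pattern 0x7FF0000000000001 is a signaling NaN, while 0x7FF8000000000000 is a quiet one.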

    If you are sufficiently unlucky, the leftover values in the memory assigned to dblValue will happen to have a bit pattern corresponding to an SNaN. Then, when the processor tries to copy it to dblResult, the exception is raised.
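    One straightforward fix is to make sure neither local ever holds leftover stack contents. Here's a minimal sketch of the function from the story with the elisions filled in by hypothetical constants and values, and with bool from <stdbool.h> standing in for BOOL so that it compiles outside Windows.

```c
#include <stdbool.h>

/* Hypothetical info types standing in for the real ones. */
enum { NUMBER_OF_GLOBS, AVERAGE_GLOB_SIZE };

bool GetInfo(int infoType, int *intResult, double *dblResult)
{
    /* Initialize both locals so that whichever one the switch doesn't
       set still holds a well-defined value instead of stack garbage. */
    int intValue = 0;
    double dblValue = 0.0;

    switch (infoType) {
    case NUMBER_OF_GLOBS:
        intValue = 42;       /* hypothetical value */
        break;

    case AVERAGE_GLOB_SIZE:
        dblValue = 3.5;      /* hypothetical value */
        break;

    default:
        return false;        /* unknown request: write nothing */
    }
    *intResult = intValue;
    *dblResult = dblValue;   /* never copies an uninitialized double */
    return true;
}
```

    Another option would be to write only the output that the requested infoType actually defines.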

    There's another puzzle lurking behind this one: Why wasn't this problem caught in internal testing? We'll learn about that next time.

    When I double-click an Excel spreadsheet, Excel opens but the document doesn't

    Sometime last year, we got a report from a customer that whenever he double-clicks an Excel spreadsheet, Excel starts up, but the document isn't loaded. Instead, he gets an error message saying that the document could not be found. He has to go to the Open dialog and open the spreadsheet manually. This report was routed to the shell team, since it appeared to be an Explorer problem.

    We studied the file type registration for Excel documents; those were fine. We suggested some breakpoints, but everything looked good there, too. Even ddespy reported that the "Open" command was being sent to Excel correctly. So far as the shell was concerned, it was sending the command that Excel registered as the "Please send me this command when you want to open a document," and yet when Explorer sent that very command, Excel responded with "Huh? What are you talking about?"

    This indicated that an investigation of Excel itself was in order, and an hour later, the problem was identified. Under Excel Options, Advanced, General, there is an option called "Ignore other applications that use Dynamic Data Exchange", and the customer had enabled this option. The consequences of enabling this option are described in Knowledge Base article Q211494: Since Excel was configured to ignore DDE requests, it ignored the DDE request that came from Explorer.

    A misconfigured copy of Excel resulted in an error message that by all indications looked like an Explorer bug. That's pretty much par for the course in Explorer-ville.

    Microspeak: Well, actually management-speak

    I hate management-speak.

    Here's an example from an internal Web site.

    The purpose of this Web site is two-fold.

    1. Create a reference source (best practices) where individuals can learn how to plan/facilitate and leverage their X activities most effectively.
    2. Establish a library of X material teams can utilize.

    Wow, let's look at the first stated purpose. It goes on for so long and uses blatant management-speak such as "facilitate" and "leverage" that by the time it's over, I forget how the sentence started. Going back and reading it again, it appears that the first item is identical to the second! It's just that the first item says it in a more confusing way.

    The second item shows evidence of management wordsmithing as well. "Utilize" instead of "use". An action verb like "establish" rather than a state verb like "to be". And those changes actually render the sentence incorrect. The purpose of the site isn't to "establish" a library; the purpose of the site is to be that library. Establishing the library is what you did when you created the site in the first place! That purpose has already been completed.

    I think the people who built this Web site just copied their annual review goals into the Web site text, forgetting that the review goal describes what you are supposed to do, not what the thing you created is supposed to do.

    Here's how I would have written it:

    This Web site is a library of X materials.

    What does each country claim for its own?

    One of the things that fascinates me is how each country's view of history is clouded by its own chauvinism. I was reminded of this when researchers were able to reconstruct the original recording from a phonautograph which predated Edison's phonograph, thereby adding another claim to the mix of who invented sound recording.

    I think the most contentious invention is human flight. It seems that every country on the planet has a claim to being the pioneer in this field. I'm particularly amused that both France and Brazil claim Alberto Santos-Dumont as their own. Success has many fathers; failure is an orphan.

    When I visited Portugal, I asked one of the professors, "What is it that students in Portugal are taught is Portugal's greatest contribution to humanity?"

    The professor had to stop and think for a while before formulating an answer.

    "Portugal has not fared very well of late economically. Our best years were long ago. I would say that our greatest contribution was our accomplishments during the Age of Discoveries."

    My question to you, dear reader, is to tell us what students in your country are taught are your country's greatest achievements, or alternatively, what students believe them to be. These beliefs need not be based in fact. I'm more interested in what it is that people want you to believe whether or not it's actually true.

    For starters, here's my list of what students are taught (or end up believing) are the great accomplishments of the United States:

    • Democracy (even though it existed for millennia prior, and some might argue whether what we have today still counts as one)
    • Powered flight (The Wright Brothers)
    • The telephone (Alexander Graham Bell)
    • The light bulb, phonograph, and motion pictures (Thomas Edison)
    • The camera (George Eastman)
    • The elevator (Elisha Otis)
    • The automobile (Henry Ford)

    Many of these are contested, and two of them are flat-out wrong: Elisha Otis did not invent the elevator, but he made them popular in the United States thanks to safety improvements. Similarly, Henry Ford did not invent the automobile but he made them popular and affordable in the United States by using an assembly line.

    Windows Vista changed the Alt+Tab order slightly

    For decades, the Alt+Tab order was the same as the Z-order, but in Windows Vista that is no longer true if you use the enhanced Alt+Tab feature known as Flip, which is on by default on most systems. There are three types of interactive task switching in Windows Vista:

    • Classic Alt+Tab: This is the same one that's been around since Windows 95. It shows a grid of icons.
    • Flip (new for Windows Vista): This shows a grid of thumbnails.
    • Flip3D (also new for Windows Vista): This shows a stack of windows in 3D.

    Classic Alt+Tab continues to show the icons in Z-order, but the developer who wrote Flip told me that Flip changes things up a bit based on feedback from the design team. The first several icons are still shown in Z-order, but if you have a lot of windows open, the rest of them are shown in alphabetical order to make it easier to pick the one you want from the list.
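    The ordering as described can be sketched in a few lines. This illustrates the description, not Flip's actual code: window_entry, flip_order, and the keep_in_z_order cutoff are hypothetical (the post only says "the first several"), and strcmp stands in for whatever locale-aware comparison a real shell would use.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical window record. */
struct window_entry {
    const char *title;
    int z_order;            /* 0 = topmost */
};

static int by_z_order(const void *a, const void *b)
{
    const struct window_entry *x = a, *y = b;
    return (x->z_order > y->z_order) - (x->z_order < y->z_order);
}

static int by_title(const void *a, const void *b)
{
    const struct window_entry *x = a, *y = b;
    return strcmp(x->title, y->title);
}

/* First keep_in_z_order entries in Z-order, the rest alphabetically. */
void flip_order(struct window_entry *w, size_t n, size_t keep_in_z_order)
{
    qsort(w, n, sizeof *w, by_z_order);
    if (n > keep_in_z_order)
        qsort(w + keep_in_z_order, n - keep_in_z_order, sizeof *w, by_title);
}
```

    With five windows and a cutoff of two, the two topmost windows stay at the front and the remaining three follow in title order.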

    I think it's a good sign that nobody seems to have noticed. A lot of user interface work tries to be invisible.

    Why does the "Install Font" dialog look so old-school?

    8 wonders why the "Install Font" dialog looks so old-school. (And Kevin Provance demonstrates poor reading skills by not only ignoring the paragraph that explains why the suggestion box is closed, but also asking a question that's a dup of one already in the suggestion box!)

    Because it's a really old dialog.

    That dialog has been around for probably two decades now. It works just fine, and since it's not really a high-traffic dialog, updating it takes lower priority than things that get used more often. Development and testing resources aren't infinite, after all. I'm sure that if you look harder, you can find other old dialog boxes. (My pet ugly old dialog is the Character Map program. It's hideous, and I say that even though my boss's boss wrote it.)

    Besides, people don't really add fonts that way much any more. When you install font packs, such as the Consolas Font Pack for Microsoft Visual Studio 2005, they just install the fonts as part of their setup process. It's all taken care of for you.

    How can SIGINT be safely delivered on the main thread?

    Commenter AnotherMatt wonders why Win32 delivers console notifications to console programs on a different thread. Why doesn't it deliver them on the main thread?

    Actually, my question is the reverse. Why does Unix deliver them on the main thread? It makes it nearly impossible to do anything of consequence inside the signal handler. The main thread might be inside the heap manager (holding the heap critical section) when the signal is raised. If the signal handler tried to access the heap, it would deadlock with itself if you're lucky, or just corrupt the heap if you aren't.

    For example, consider this signal handler:

    void catch_int(int sig_num)
    {
        /* re-set the signal handler again to catch_int, for next time */
        signal(SIGINT, catch_int);
        /* and print the message */
        printf("Don't do that");
        fflush(stdout);
    }
    

    What happens if the signal is raised while the main program is executing its own fflush, say after it had already flushed half the buffer? If two threads called fflush, the second caller would wait for the first to complete. But here, it's all coming from within the same thread; the second caller can't wait for the first caller to return, since the first caller can't run until the second caller returns!

    (Note also that this signal handler potentially modifies errno, which can lead to "impossible" bugs in the main program.)
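    The conventional way out on the Unix side is to have the handler do as little as possible: set a flag, and let the main program notice it at a point where it knows it isn't in the middle of stdio or the heap. A minimal sketch in standard C; install_sigint_handler and check_interrupted are hypothetical names. The C standard allows a handler to write to a volatile sig_atomic_t, and this handler calls no library functions, so it can't deadlock on a stdio lock or clobber errno.

```c
#include <signal.h>

/* The only state the handler touches. */
static volatile sig_atomic_t interrupted = 0;

static void catch_int(int sig_num)
{
    (void)sig_num;
    interrupted = 1;    /* no printf, no fflush, no heap, no errno */
}

/* Install the handler. Returns nonzero on success. */
int install_sigint_handler(void)
{
    return signal(SIGINT, catch_int) != SIG_ERR;
}

/* Call from the main loop at a known-safe point. Returns nonzero
   (and clears the flag) if SIGINT arrived since the last check. */
int check_interrupted(void)
{
    if (interrupted) {
        interrupted = 0;
        return 1;
    }
    return 0;
}
```

    On POSIX systems, sigaction is generally preferred over signal, because whether signal resets the handler after delivery varies between implementations; that variation is why the original handler re-installs itself.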

    Win32 doesn't believe in interrupting user-mode code with other user-mode code asynchronously because it makes it impossible to reason about the state of the process. Delivering the console notification on a second thread means that if the second thread tries to access the heap while the first thread is inside the heap manager, the second thread will dutifully wait for the heap to stabilize before it goes ahead and starts mucking with it.

    Windows could not properly load the XYZ keyboard layout

    In my rôle as the family technical support department, I get to poke around without really knowing what I'm doing and hope to stumble across a solution. Sometimes I succeed; sometimes I fail.

    Today, I'm documenting one of my successes in the hope that it might come in handy for you, the technical support department for your family. (If not, then I guess today is not your day.)

    The boot drive on the laptop belonging to one of my relatives became corrupted, and her brother-in-law had the honor of extracting the drive, sticking it into a working computer, doing the chkdsk magic, reinstalling the software that got corrupted, and otherwise getting the machine back on its feet. (It's a good thing I wasn't the one to do it because all of the programs are in Chinese, and I can't read Chinese beyond a few dozen characters.) Anyway, the machine returned to life, mostly. The bizarro proprietary hardware (that a certain manufacturer insists on using in order to make their machines special) still doesn't have drivers, but otherwise the machine was happy.

    There was just one problem remaining, and it fell upon me to fix it: She couldn't type Chinese characters any more. Normally, this is done by selecting an appropriate IME, but no matter what we picked, it was as if we were always using the US-English keyboard.

    One clue was that if you deleted the IME and then re-added it, you got the error message Windows could not properly load the XYZ keyboard layout.

    Here is how I fixed it. (This was a Windows XP machine.) Maybe it will help you, maybe not.

    First, go to the Regional and Language Options control panel and set everything back to English (US):

    • On the Advanced tab, under "Select a language to match the language version of the non-Unicode programs you want to use", select "English (United States)".
    • On the Languages tab, under "Text services and input languages", click the Details button. Change your default input language to "English (United States) - US" and remove all the non-English keyboard layouts.

    Restart to make sure that nobody is using those old services.

    After the restart, go back to the Regional and Language Options control panel, go to the Languages tab, and uncheck "Install files for East Asian languages." That is the whole point of this exercise. All the other steps were just removing enough obstacles so we could do that.

    Restart to make sure nobody is using any of the East Asian fonts.

    After the restart, add the East Asian fonts back, and when you're asked whether you should use the files already on the machine, say "No." That way, they will be re-copied from the CD.

    (This step was trickier for me, because one of the hardware devices that didn't work was the DVD drive! I thought I was stuck, but then I realized that the wireless network antenna was still functional, so I went to another computer in the house, put the Windows XP CD in the drive, and shared out the CD-ROM drive. Then I went back to the first computer and told it to install the files from the second computer.)

    Once everything gets reinstalled (including the corrupted keyboard layout files), you can go back and add the Chinese IME back, and reset all those other settings back to Chinese.

    Neither I nor the owner of the laptop is very good at the other's native language (though she is far better at English than I am at Chinese), so fixing her computer is the best way I have of showing her my appreciation.
