November, 2009

  • The Old New Thing

    Why can you create a PIF file that points to something that isn't an MS-DOS program?

    • 22 Comments

    James MAstros asked why it's possible to create a PIF file that refers to a program that isn't an MS-DOS program. (That's only part of the question; I addressed other parts last year.)

    Well, for one thing, there was indeed code to prevent you from setting PIF properties for something that isn't an MS-DOS program, so the precaution was already there. But it didn't stop anybody who was really determined to try. All you had to do was create an MS-DOS program, then create a PIF file for it, and then overwrite the MS-DOS program file with something else. Since time travel has not been invented, the PIF creator code can't retroactively go back in and say, "Well, if I knew you were going to pull this trick, I wouldn't have let you create it in the first place!" You can't even enforce it at the time the PIF file is launched, because somebody could replace the file during the split second between checking the file type and actually using it.
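    The gap between checking the file type and using the file can be modeled in a few lines. This is a minimal sketch, not how Windows actually launches PIF files: FakeDisk, Kind, and CheckedLaunch are hypothetical names made up for illustration, and the interloper callback stands in for whatever another process does in that split second.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical model: a "disk" that maps a file name to the kind of
// program currently stored under that name.
enum class Kind { MsDos, Windows };
using FakeDisk = std::map<std::string, Kind>;

struct LaunchResult {
    bool checkPassed;  // what the type check saw
    Kind launched;     // what was actually on disk at launch time
};

// Check the file type, then use the file. The interloper runs in between,
// the way another process could on a real system.
LaunchResult CheckedLaunch(FakeDisk& disk, const std::string& path,
                           const std::function<void(FakeDisk&)>& interloper) {
    LaunchResult result{};
    result.checkPassed = (disk.at(path) == Kind::MsDos);  // time of check
    interloper(disk);                                     // someone else runs in the gap
    result.launched = disk.at(path);                      // time of use
    return result;
}
```

    If the interloper overwrites the file with a Windows program, the check still reports success, yet the thing that gets launched is not an MS-DOS program. No amount of checking closes that window.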

    Of course, the real question is "Why, if you create a PIF file that describes something that isn't an MS-DOS program, does it still work?" It still works because the PIF file did exactly what it was supposed to do. It created an MS-DOS virtual machine with the specified parameters and then ran the program in it.

    Now, back in the days when PIF files were invented, if you tried to run something that wasn't an MS-DOS program inside an MS-DOS virtual machine, all that happened was that the MS-DOS stub ran, the same thing that happened if you tried to run the program from MS-DOS. The program was treated not as a Windows program but as an MS-DOS program.

    Windows 95 changed that. If you tried to run a Windows application from the MS-DOS command prompt, it would run the Windows application instead of telling you, "This program cannot be run in DOS mode." This change was made for two reasons.

    First, the existing behavior seemed pretty stupid. You're running Windows, and if you open a command prompt and try to run a Windows program, you're told, "You need Windows to run this program." It's like one of those bizarro-land government red tape nightmares, where you go to the courthouse to file some papers in person, and the clerk at the desk says, "I'm sorry, you have to file this document in person at the courthouse."

    Second, it was necessary to allow 32-bit console programs to run when launched from an MS-DOS command window. Since Windows 95 used the MS-DOS prompt as its command line interface (as opposed to Windows NT which used a 32-bit command prompt), it was kind of important that you be able to run 32-bit console programs from a virtual machine. Without it, the whole idea of a console program became kind of weak. "Yeah, we have console programs, but you can't launch them from a console."

    What happens, then, if you create a PIF file that points to a 32-bit program? Well, the operating system goes to all the effort to create a virtual machine to the specifications you indicated in the PIF file. You want a particular amount of extended memory? Okay, we'll set that up. You want a custom icon? Sure, no problem. You want it to disable DPMI memory? You got it. Once that's all set up, the virtual MS-DOS driver says, "Okay, and set the initial CS:IP for the virtual machine to the MS-DOS EXEC call to run the program the PIF file specified." The EXEC call executes, and the interop code kicks in and launches the 32-bit Windows program.

    From the virtual machine's point of view, nothing is actually wrong; you merely did something in a really roundabout way. It's like booking a meeting room, specifying that you would like a slide projector, that the chairs and tables be set in a particular arrangement, that everybody be provided with water and a notebook, and then putting a sign on the door that says, "This meeting has moved to Location Z." You go to all this effort to get the conference room to be set up exactly the way you want it, and then you end up not using it. The conference center doesn't care (as long as it still gets paid).

    PIF files are like shortcuts, but with the added effort of creating an MS-DOS virtual machine. And just like with shortcuts, bad guys can choose a dangerous target and make your day miserable.

  • The Old New Thing

    Leave it to the Taiwanese to think of wrapping a donut inside another donut

    • 19 Comments

    The food known in Mandarin Chinese as 油條 (yóutiáo), but which in Taiwanese goes by the name 油炸粿, is basically a fried stick of dough, similar to a cruller, but puffier rather than cakey. The traditional way of eating it is to wrap it inside a 燒餅 (a sesame-coated flatbread), and dip the entire combination into a bowl of hot soy milk. I prefer salty soy milk, but some people prefer sweet. (Those people who prefer the sweet version are clearly wrong.)

    Obviously, the donut sandwich was invented before the low-carb diet craze.

    Sidebar: Salty soy milk (鹹豆漿) is one of those nostalgia breakfasts for me, or more accurately, one of those manufactured nostalgia breakfasts, because I didn't actually eat it that much as a child.

    Sidebar 2: For authentic Chinese food in Seattle, my choice is Chiang's Gourmet. They have an extensive menu of standard Chinese breakfast foods. The service is surly, but that somehow just adds to the experience.

  • The Old New Thing

    Trying to avoid double-destruction and inadvertently triggering it

    • 34 Comments

    We saw some time ago the importance of artificially bumping an object's reference count during destruction to avoid double-destruction. However, one person's attempt to avoid this problem ended up triggering it.

    ULONG MyObject::Release()
    {
     LONG cRef = InterlockedDecrement(&m_cRef);
     if (cRef > 0) return cRef;
     m_cRef = MAXLONG; // avoid double-destruction
     delete this;
     return 0;
    }
    

    The explanation for the line m_cRef = MAXLONG was that it was done to avoid the double-destruction problem if the object receives a temporary AddRef/Release during destruction.

    While it's true that you should set the reference count to an artificial non-zero value, choosing MAXLONG has its own problem: integer overflow.

    Suppose that during the object's destruction, the reference count is temporarily incremented twice and decremented twice.

    Action                                m_cRef
    Just before call to Release()         1
    InterlockedDecrement                  0
    m_cRef = MAXLONG                      2147483647
    destructor does temporary AddRef()    −2147483648 (integer overflow)
    destructor does temporary AddRef()    −2147483647
    destructor does temporary Release()   −2147483648
    since m_cRef < 0, we re-destruct

    Sure, choosing a huge DESTRUCTOR_REFCOUNT means that you have absolutely no chance of decrementing the reference count back to zero prematurely. However, if you choose a value too high, you introduce the risk of incrementing the reference count so high that it overflows.

    That's why the most typical values for DESTRUCTOR_REFCOUNT are 1, 42, and 1000. The value 1 is really all you need to avoid double-destruction. Some people choose 42 because it's cute, and other people choose 1000 because it's higher than any "normal" refcount, so it makes it easier to spot during debugging. But even then, the "high" value of 1000 still leaves room for over two billion AddRef()s before overflowing the reference count.

    On the other hand, if you choose a value like MAXLONG or MAXDWORD, then you're taking something that previously never happened (reference count integer overflow) and turning it into a near certainty.
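    To see the difference between a small and a huge artificial reference count, here is a portable simulation. It is a sketch under stated assumptions rather than real COM code: Counted and DestructionsWith are hypothetical names, nothing is actually deleted, and the count is kept in an unsigned 32-bit value so that the wraparound is well defined in standard C++ while still behaving like a 32-bit LONG that overflows.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a COM-style object. The count is stored as an
// unsigned 32-bit value so that wraparound is well defined, then
// reinterpreted as signed to mimic a 32-bit LONG overflowing.
class Counted {
public:
    explicit Counted(std::int32_t destructorRefCount)
        : destructorRefCount_(destructorRefCount) {}

    std::int32_t AddRef() { return Adjust(1u); }

    std::int32_t Release() {
        std::int32_t count = Adjust(0xFFFFFFFFu);  // adding 2^32 - 1 subtracts 1
        if (count > 0) return count;
        // Set the artificial reference count before running cleanup.
        std::memcpy(&bits_, &destructorRefCount_, sizeof bits_);
        Destroy();
        return 0;
    }

    int destroyCount() const { return destroyCount_; }

private:
    std::int32_t Adjust(std::uint32_t delta) {
        bits_ += delta;  // unsigned arithmetic wraps modulo 2^32
        std::int32_t asSigned;
        std::memcpy(&asSigned, &bits_, sizeof asSigned);
        return asSigned;
    }

    void Destroy() {
        if (++destroyCount_ > 1) return;  // double-destruction detected; stop here
        // Cleanup that temporarily references the object twice.
        AddRef();
        AddRef();
        Release();
        Release();
    }

    std::uint32_t bits_ = 1;  // starts with one outstanding reference
    std::int32_t destructorRefCount_;
    int destroyCount_ = 0;
};

// Releases the last reference and reports how many times cleanup ran.
int DestructionsWith(std::int32_t destructorRefCount) {
    Counted object(destructorRefCount);
    object.Release();
    return object.destroyCount();
}
```

    In this model, DestructionsWith(1) and DestructionsWith(1000) run the cleanup once, whereas DestructionsWith(2147483647) runs it twice: the first temporary AddRef wraps the count negative, and the first temporary Release then sees a value that is not greater than zero.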

  • The Old New Thing

    I reorganized your kitchen for you, sweetie

    • 29 Comments

    I suspect most people are familiar with the "It may be a mess, but it's my mess and I know where everything is" phenomenon. That doesn't necessarily mean that items are in the best location, but at least you know which suboptimal location you chose.

    :: Wendy :: told me a story some time ago about something that happened while her parents were visiting. When she returned from work, her mother said, "Oh, Wendy, darling, I reorganized your kitchen for you. You had everything in the wrong place."

    Wendy's mother was trying to be helpful, but of course it was a net loss for poor Wendy, who couldn't find anything in her kitchen for weeks. Yes, there was the whole "Oh great, where did my mother put my food processor?" problem, but even after she found it, the "improved" location was far worse than its original location. In fact, in many cases, it was in the exact opposite location from where it should be.

    You see, Wendy is left-handed, and her mother is right-handed.

  • The Old New Thing

    Little-known command line utility: clip

    • 55 Comments

    Windows Vista includes a tiny command line utility called clip. All it does is paste its stdin onto the clipboard.

    dir | clip
    echo hey | clip
    

    For the opposite direction, I use a little perl script:

    use Win32::Clipboard;
    print Win32::Clipboard::GetText();
    
  • The Old New Thing

    Stories of anticipating dead computers: Windows Home Server

    • 65 Comments

    Like most geeks, I have a bit of history with dead computers. In the past, I used the "wait until it breaks, and then panic" model, but recently I've begun being a bit more anticipatory, like replacing an old laptop before it actually expires.

    Anticipating another future dead computer, I bought an external USB hard drive for backing up important files, but upon reading the description on the box, I started to have second thoughts. It came with its own backup software that reportedly installed automatically when you plugged in the drive (!). I didn't want that; I just wanted a boring USB hard drive.

    One of my friends (who used to work with USB devices) cautioned me: "Those things are evil. Some of them enumerate as a keyboard and 'type in' a device driver so they can own your machine even if you have autorun disabled." Wow, that's a level of craziness I previously had not been aware of.

    Upon further discussion, I was convinced to return the external hard drive unopened and instead get a copy of Windows Home Server. I went for the Acer Aspire EasyStore H340 instead of trying to build my own reduced-footprint low-power quiet-fan computer. And amazingly, the EasyStore comes with only two pieces of shovelware, the excellent LightsOut add-in, which I kept, and some annoying trialware, which was easily uninstalled.

    I felt kind of weird getting a Home Server since I have only one home computer of consequence, so I'd basically have a one-computer network. (I do have that laptop, but I'm careful not to keep anything on it that isn't already backed up somewhere else.) And because the Home Server would easily be the most powerful computer in the house, even though all it does is sit there doing nothing most of the time. But the convenience is hard to beat. It just sits there quietly and does its job of backing up the other computer every night. (And seeing as I had the machine anyway, I also have it back up my laptop, even though there's nothing really important on it. Most nights, the laptop backup takes only five minutes. And just because I can, I even back up the old laptop that doesn't even do anything any more aside from surf the Internet!)

    Of course, the first thing you do with a new gadget is tinker with it, and I installed Whiist and created a photo album. It was so easy to do, I feel like I'm losing my geek cred. I mean, this sort of thing is supposed to involve hours of staring at the screen, scouring the Internet for information, and groveling through hundreds of settings trying to get things working. If anybody can get a home server up and running with automatic nightly backups and an online photo album by just clicking on some fluffy GUI buttons, then what will I have to feel superior about?

    I'm kidding. My hat's off to the legendary Charlie Kindel and the Windows Home Server team. They hit this one out of the park. It's an awesome product.

    Now that backing up is so painless, it has set a new baseline behavior: Now, I feel kind of uneasy making large-scale changes to files on my home computer unless I have a complete backup. (Backups are the reason I bought the server. All the other features, like the photo album, are just gravy.)

    And yes, every few weeks, I restore a randomly-selected file from backup just to make sure the backups are working.

    FTC disclaimer: Although Windows Home Server is a product of Microsoft Corporation (my employer), no compensation was tied to this review. (I didn't even get an employee discount.) I'm just a happy customer.

  • The Old New Thing

    How do I create a toolbar that sits in the taskbar?

    • 12 Comments

    Commenter Nick asks, "How would you go about creating a special toolbar to sit on the taskbar like the Windows Media Player 10 minimised toolbar?"

    You would look at the DeskBand API SDK Sample in the Windows Platform SDK.

    The magic word is DeskBand. This MSDN page has an overview.

    Bonus chatter: I've seen some online speculation as to whether a DeskBand counts as a shell extension, because of the guidance against writing shell extensions in managed code. As with all guidance, you need to understand the rationale behind the guidance so you can apply the guidance intelligently instead of merely following it blindly off a cliff. Summarizing the rationale: Since only one version of the CLR can exist in a process, any shell extension which runs inside the host process which uses the CLR may inject a version of the CLR that conflicts with the version of the CLR the host process (or some other component in the host process) wants to use. Now that you understand the reason, you also can answer the question, "Is a DeskBand a shell extension (for the purpose of this guidance)?" Yes, because DeskBands (like all other COM objects registered as in-process servers) run inside the host process.

    As another example of how understanding the rationale behind guidance lets you know when the guidance no longer applies: In the time since the original guidance was developed, the CLR team came up with a way to run multiple versions of the CLR inside a single process (for specific values of "multiple"). Therefore, if you use one of those "I won't conflict with other versions of the CLR inside the same process" versions, then you can see that the rationale behind the guidance no longer applies.

  • The Old New Thing

    Signs that the symbols in your stack trace are wrong

    • 19 Comments

    One of the things programmers send to each other when they are trying to collaborate on a debugging problem is stack traces. Usually something along the lines of "My program does X, then Y, then Z, and then it crashes. Here is a stack trace. Can you tell me what's wrong?"

    It helps if you at least glance at the stack trace before you send it, because there are often signs that the stack trace you're about to send is completely useless because the symbols are wrong. Here's an example:

    We are testing our program and it gradually grinds to a halt. When we connect a debugger, we find that all of our threads, no matter what they are doing, eventually wind up hung in kernel32!EnumResourceLanguagesA. Can someone explain why that function is hanging, and why it seems all roads lead to it?

       0  Id: 12a4.1468 Suspend: 1 Teb: 000006fb`fffdc000 Unfrozen
    kernel32!EnumResourceLanguagesA+0xbea00
    kernel32!EnumResourceLanguagesA+0x2b480
    bogosoft!CObjMarker::RequestBlockForFetch+0xf0
    ...
    
       1  Id: 12a4.1370 Suspend: 1 Teb: 000006fb`fffda000 Unfrozen
    kernel32!EnumResourceLanguagesA+0xbea00
    kernel32!EnumResourceLanguagesA+0x2b480
    bsnetlib!CSubsystem::CancelMain+0x90
    
       2  Id: 12a4.1230 Suspend: 1 Teb: 000006fb`fffd8000 Unfrozen
    NETAPI32!I_NetGetDCList+0x117e0
    kernel32!EnumResourceLanguagesA+0x393a0
    ntdll!LdrResFindResource+0x58b20
    ...
    
       3  Id: 12a4.cc0 Suspend: 1 Teb: 000006fb`fffd6000 Unfrozen
    kernel32!EnumResourceLanguagesA+0xa80
    bsnetlib!BSFAsyncWait+0x190
    ...
    
      4  Id: 12a4.1208 Suspend: 1 Teb: 000006fb`fffd4000 Unfrozen
    kernel32!EnumResourceLanguagesA+0xbea00
    kernel32!EnumResourceLanguagesA+0x2b480
    bogosoft!TObjList<DistObj>::Get+0xb0
    
      5  Id: 12a4.1538 Suspend: 1 Teb: 000006fb`fffae000 Unfrozen
    kernel32!EnumResourceLanguagesA+0xbf3d0
    kernel32!EnumResourceLanguagesA+0x2c800
    bsnetlib!Tcp::ReadSync+0x340
    ...
    
       6  Id: 12a4.16e0 Suspend: 1 Teb: 000006fb`fffac000 Unfrozen
    ntdll!LdrResFindResource+0x61808
    ntdll!LdrResFindResource+0x1822a0
    kernel32!EnumResourceLanguagesA+0x393a0
    ntdll!LdrResFindResource+0x58b20 
    ...
    

    This stack trace looks suspicious for a variety of reasons.

    First of all, look at that offset EnumResourceLanguagesA+0xbea00. It's unlikely that the EnumResourceLanguagesA function (or any other function) is over 750KB in size, as this offset suggests.

    Second, it's unlikely that the EnumResourceLanguagesA function (or any other function, aside from obvious cases like tree walking) is recursive. And it's certainly unlikely that a huge function will also be recursive.

    Third, it seems unlikely that the EnumResourceLanguagesA function would call NETAPI32!I_NetGetDCList. What does enumerating resource languages have to do with getting a DC list?

    Fourth, look at those functions that are allegedly callers of EnumResourceLanguagesA: bogosoft!CObjMarker::RequestBlockForFetch, bsnetlib!CSubsystem::CancelMain, bsnetlib!Tcp::ReadSync. Why would any of these functions want to enumerate resource languages?

    These symbols are obviously wrong. The huge offsets are present because the debugger has access only to exported functions, and it's merely showing you the name of the nearest symbol, even though it has nothing to do with the actual function. It's just using the nearest signpost it can come up with. It's like if somebody gave you directions to the movie theater like this: "Go to city hall downtown and then go north for 35 miles." This doesn't mean that the movie theater is in the downtown district or that the downtown district is 35 miles long. It's just that the person who's giving you directions can't come up with a better landmark than city hall.
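    The nearest-signpost behavior is easy to model. The following is a rough sketch, not how any particular debugger is implemented: ExportTable, ResolveNearest, Format, and OffsetLooksSuspicious are hypothetical names, the addresses are made up, and the 64KB cutoff is an arbitrary assumption rather than a rule.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Export table: start address -> exported name. Addresses are made up.
using ExportTable = std::map<std::uint64_t, std::string>;

struct Resolved {
    std::string name;      // nearest export at or below the address
    std::uint64_t offset;  // distance from that export
    bool found;            // false if no export precedes the address
};

// With only exports available, the best you can do is find the closest
// export that starts at or before the address and report the distance.
Resolved ResolveNearest(const ExportTable& exports, std::uint64_t address) {
    auto it = exports.upper_bound(address);  // first export strictly above
    if (it == exports.begin()) return {"", 0, false};
    --it;
    return {it->second, address - it->first, true};
}

// Formats the result in the familiar name+0xoffset style.
std::string Format(const Resolved& r) {
    if (!r.found) return "<no symbol>";
    std::ostringstream out;
    out << r.name << "+0x" << std::hex << r.offset;
    return out.str();
}

// Hypothetical sanity check: an offset beyond the limit suggests the
// name is only a distant landmark, not the function you are really in.
bool OffsetLooksSuspicious(const Resolved& r, std::uint64_t limit = 0x10000) {
    return r.found && r.offset > limit;
}
```

    Feed it an address that lies 0xbea00 bytes past the start of EnumResourceLanguagesA, and it cheerfully reports EnumResourceLanguagesA+0xbea00, because that export is the last landmark it knows about.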

    This is just another case of the principle that you have to know what's right before you can see what's wrong. If you have no experience with good stack traces, you don't know how to recognize a bad one.

    Oh, and even though the functions in question are in kernel32, you can still get symbols for that DLL with the help of the Microsoft Symbol Server.

  • The Old New Thing

    The day the coffee machine exploded

    • 19 Comments

    Some time ago, Microsoft began installing Starbucks coffee makers in the kitchens, and caffeine addicts waited anxiously for the machines to reach their building. Or at least that's what happened on the main Redmond campus. But what about the satellite offices?

    I'm told that each satellite office qualified for an iCup machine when the number of employees at the office reached some magic value. One of my colleagues who works at the office in New York City told me that they eagerly awaited the arrival of the machine when they learned that they reached that threshold. The long-anticipated day arrived: The coffee machine was installed in the kitchen.

    And it exploded.

    Okay, it didn't really explode. But the receptacle for holding the spent grounds overflowed and burst, spilling its guts out onto the kitchen floor. If you didn't know what happened, you'd have thought it had exploded.

    The reason it exploded was that, although the New York office is rather small, it does have a very high number of visitors. As you can imagine, clients pay visits to the New York offices for meetings, presentations, all that stuff that clients visit offices for; but the underlying algorithm for determining how many coffee machines each office receives doesn't take into account how many visitors each location receives.

    Oh, and happy Guy Fawkes Day. Try not to blow up any coffee machines.

  • The Old New Thing

    In the product end game, every change carries significant risk

    • 22 Comments

    One of the things I mentioned in my talk the other week comparing school with Microsoft is that in school, as the deadline approaches, the work becomes increasingly frantic. On the other hand, in commercial software, as the deadline approaches, the rate of change slows down, because the risk of regression outweighs the benefit of the fix.

    A colleague of mine offered up this example from Windows 3.1: To fix a bug in GDI, the developers made a very simple fix. It consisted of setting a global flag when a condition was detected and checking the flag in another place in the code and executing a few lines of code if it was set. The change was just a handful of lines, it was very tightly scoped, and it did not affect the behavior of GDI if the flag was not set. They tested the code, it fixed the problem, everything looked good. What could possibly go wrong?

    A few days after the fix went in, the GDI team started seeing weird crashes that made no sense in code completely unrelated to the places where they made the change. What is going on?

    After some investigation, they discovered a memory corruption bug. In 16-bit Windows, the local heap came directly after the global variables, and local heap memory was managed in the form of local handles. A common error when working with the local heap was using a local handle as a pointer rather than passing it to the LocalLock function to convert the handle to a pointer. The developers found a place where the code forgot to perform this conversion before using a local handle. (In Windows 3.1, most of GDI was written in assembly language, so you didn't have a compiler to do type checking and complain that you're using a handle as a pointer.) Using the handle as a pointer resulted in a global variable being corrupted.

    Investigation of the code history revealed that this bug had existed in the code since the day it was first written. Why hadn't anybody encountered this bug before?

    The handle that was being used incorrectly was allocated at boot time, so its value was consistent from run to run. The corruption took the form of writing a zero into memory at the wrong location, and it so happened that the variable that was accidentally being set to zero was not used often, and at the time the corruption occurred, it happened to have the value zero already.

    Adding a new global variable shifted the other global variables around in memory, and now the accidental write of zero hit an important variable whose value was usually not zero.
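    Here is a toy model of why the bug stayed hidden for so long. It is only an illustration under simplifying assumptions, not the actual GDI data layout: the globals are imagined as 16-bit values packed back to back, the stray write always lands at the same byte offset, and the names Global, Damage, and StrayZeroWrite are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy data segment: 16-bit globals laid out back to back from offset 0.
struct Global {
    std::string name;
    std::uint16_t value;
};

struct Damage {
    std::string victim;  // global that sits at the stray offset
    bool visible;        // true if its value actually changed
};

// Models a stray 16-bit write of zero at a fixed byte offset.
// The offset is assumed to be word-aligned and inside the segment.
Damage StrayZeroWrite(std::vector<Global>& segment, std::size_t byteOffset) {
    Global& hit = segment.at(byteOffset / sizeof(std::uint16_t));
    Damage damage{hit.name, hit.value != 0};
    hit.value = 0;
    return damage;
}
```

    With a layout of an important nonzero variable followed by a rarely used variable that is already zero, a stray write at offset 2 hits the rarely used one and changes nothing. Put one new global in front, and the same write at offset 2 now zeroes the important variable.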

    In the product end game, every change carries significant risk. It's often a more prudent decision to live with the bug you understand than to fix it and risk exposing an even worse bug whose existence may not come to light until after you ship.
