January, 2009

  • The Old New Thing

    The problem with The Month Where Everyone Focuses on Improving Documentation is that most people are terrible technical writers

    • 38 Comments

    Why not have a month where everybody focuses on improving documentation like that month a few years ago where everybody focused on security?

    Well, part of it is that most people suck at technical writing. The technical part, maybe, but the writing almost definitely not. Writing is hard (as I've learned firsthand), and technical writing is a special genre of writing that requires a comparatively rare skill set, combining technical background with strong writing skills. Because it doesn't matter how much technical information you know if you are unable to convey this information to anyone else clearly.

    Also, there are lots of tools available for identifying potential security problems. PREfast, for example, can alert you to potential buffer overruns, and other scripts can hunt down uses of deprecated functions and similar potential security problems. On the other hand, how do you write a script that locates bad documentation?

    Pre-emptive snarky comment: "Just write a script that jumps to a random MSDN page!"

    What's more, technical people tend to write documentation for other technical people, on the assumption that the reader is already familiar with the material. Even knowing how to recognize that something "obvious" may not be obvious to everyone is a skill that takes time to develop. And as I have learned over and over again, it's a skill that I myself do not possess. Consider, for example, my posting that merely restates what I thought was obvious from the documentation.

    There's another problem with taking everybody on the team off their normal tasks and focusing them on documentation: How do you explain the one-month delay in the product? Even the one-month delay for the so-called security push had its skeptics. Imagine if you told everybody that the product was late because we pulled the development team off of fixing bugs and told them to write documentation.

  • The Old New Thing

    Games to play at your Battlestar Galactica watching party

    • 11 Comments

    It is common among my circle of friends to have Battlestar Galactica-watching parties, and one way of making it a party is by playing games.

    Things we've done (or plan on doing):

    • Battlestar Galactica tarot card readings.
    • Battlestar Galactica Jeopardy.
    • Battlestar Galactica Family Feud (based on results of actual online polls, like "Which Battlestar Galactica character would look good in a dress?")
    • Battlestar Galactica Apples to Apples.
    • Battlestar Galactica Password.
    • Battlestar Galactica cocktails.
  • The Old New Thing

    A process shutdown puzzle

    • 29 Comments

    In honor of National Puzzle Day, I leave you today with a puzzle based on an actual customer problem.

    Part One: The customer explains the problem.

    We have this DLL, and during its startup, it creates a thread with the following thread procedure:

    DWORD CALLBACK ThreadFunction(void *)
    {
      HANDLE HandleArray[2];
      HandleArray[0] = SetUpStuff();
      if (HandleArray[0]) {
        HandleArray[1] = ShutdownEvent;
        while (WaitForMultipleObjects(2, HandleArray,
                                 FALSE, INFINITE) == WAIT_OBJECT_0) {
          ProcessStuff();
        }
        CleanUpStuff(HandleArray[0]);
      }
      SetEvent(ThreadCompleteEvent);
      FreeLibraryAndExitThread(ThisLibrary, 0);
    }
    

    During process shutdown, the following function is called as part of DLL_PROCESS_DETACH handling:

    void StopWorkerThread()
    {
      // tell the thread to stop
      SetEvent(ShutdownEvent);
    
      // wait for it to stop
      WaitForSingleObject(ThreadCompleteEvent, INFINITE);
    
      // Clean up
      CloseHandle(ShutdownEvent);
      ShutdownEvent = NULL;
    
      CloseHandle(ThreadCompleteEvent);
      ThreadCompleteEvent = NULL;
    }
    

    The above function is hanging at the call to WaitForSingleObject. If we break in, we see that the thread that is supposed to be running the ThreadFunction is gone. I verified that the thread was successfully created, but by the time we get around to waiting for it, it's already gone.

    I checked, and nobody sets the ThreadCompleteEvent except the StopWorkerThread function. I stepped through SetUpStuff, and it succeeded. However, a breakpoint on CleanUpStuff was never hit. No exceptions were thrown either.

    I am completely stumped as to how this thread disappeared.

    You already know enough to explain how the thread disappeared.

    Part Two: After providing your explanation, the customer came up with this solution.

    Thank you for your explanation. We've made the following changes to fix the problem. Again, thank you for your help.

    DWORD CALLBACK ThreadFunction(void *)
    {
      HANDLE HandleArray[2];
      HandleArray[0] = SetUpStuff();
      if (HandleArray[0]) {
        HandleArray[1] = ShutdownEvent;
        while (WaitForMultipleObjects(2, HandleArray,
                                 FALSE, INFINITE) == WAIT_OBJECT_0) {
          ProcessStuff();
        }
        CleanUpStuff(HandleArray[0]);
      }
      // SetEvent(ThreadCompleteEvent);
      FreeLibraryAndExitThread(ThisLibrary, 0);
    }
    
    void StopWorkerThread()
    {
      // tell the thread to stop
      SetEvent(ShutdownEvent);
    
      // wait for the thread
      WaitForSingleObject(ThreadHandle, INFINITE);
    
      // Clean up
      CloseHandle(ShutdownEvent);
      ShutdownEvent = NULL;
    }
    

    Criticize this proposed solution.

    Part Three: Even though the proposed solution is flawed, explain why it doesn't cause a problem in practice. (I.e., explain why the customer is always lucky.)

  • The Old New Thing

    The great thing about being popular is that everybody wants to see you go down

    • 20 Comments

    The servers that run this Web site are under heavy load, even when things are operating normally. And on top of that, they have to fend off a lot of attacks. There are the usual spam pingbots, but usually when the site starts to get all bogged down, it's because there is an active attack on the site at the network level. And it doesn't matter what software is running the site. It's not like the bad guys are going to say, "Oh, this site is using PHP. I guess we'll leave them alone."

    For example, the problems earlier this week were caused by two IP addresses saturating all the connections to the server. Last October's slowdown was caused by the server being overwhelmed by 100,000 simultaneous connections (suspected to be a denial of service attack but no proof). The slowdown from last August was caused by a distributed attack from a botnet attempting to perform various SQL injection attacks. (They failed, but they kept trying.) The outage from last July was caused by a computer owned by a different customer of the hosting service that had been hacked, and which was launching its own network attack that took out connectivity for all other computers on the same network subnet. (In other words, blogs.msdn.com just happened to be in the wrong place at the wrong time.)

    Those are all the outages for the past six months that I still have records of. (I'm not saying there were no other outages; those are just the ones that the people who run the servers considered significant enough that they sent out an explanation for the outage.) And it's not clear how switching to a different blog engine would have prevented any of them.

  • The Old New Thing

    When you have only 16KB of memory, you can't afford to waste any of it on fluffy stuff

    • 16 Comments

    The original IBM PC came with 16KB of memory.

    That's not a lot of space in which to squeeze an operating system, especially since you had to leave enough memory for the user to actually get work done. A product of its time, the MS-DOS kernel is written entirely in assembly language, pretty much standard procedure for programs of the era. It also means that the code takes all sorts of crazy shortcuts to shave a few bytes here, a few bytes there, in order to squeeze into as little memory as possible. For example, one very common trick was to jump into the middle of an instruction, knowing that the second half of the instruction, when reinterpreted as the start of an instruction, performs the operation you wanted.

    Anyway, this subject arose in response to my discussion of why a corrupted program sometimes results in a "Program too big to fit in memory" error, which prompted commenter 8 to wonder why the kernel didn't simply reject .COM files bigger than 64KB.

    Well, yeah, and that's what it did: By complaining that it was too big to fit into memory. There's no point adding a redundant test. (It appears that some people like to call these redundant tests basic sanity checking, but I consider sanity checking to be protecting against unreasonable values before they cause trouble. But in this case, they don't cause trouble—the error is detected and reported even without the so-called sanity check.)

    Consider:

    bool SomeFunction(...)
    {
     ...
     if (x == 3) return false;
     if (x < 10) return false;
     ...
    }
    

    The first test is redundant, because if x is three, then even without the test, the function will still fail because x is also less than ten.

    And when you're trying to squeeze your kernel into as few bytes as possible, you're certainly not going to waste your time coding up a redundant test.
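
    To make the point concrete, here is a minimal sketch that compares the function with and without the redundant test. The names WithRedundantTest, WithoutRedundantTest, and SameResults are made up for illustration; only the two comparisons come from the example above.

```cpp
// Hypothetical stand-ins for the two versions of SomeFunction.
constexpr bool WithRedundantTest(int x) {
  if (x == 3) return false;   // redundant: 3 is already less than 10
  if (x < 10) return false;
  return true;
}

constexpr bool WithoutRedundantTest(int x) {
  if (x < 10) return false;   // this test alone also rejects x == 3
  return true;
}

// Returns true if the two versions agree on every x in [lo, hi].
constexpr bool SameResults(int lo, int hi) {
  for (int x = lo; x <= hi; ++x) {
    if (WithRedundantTest(x) != WithoutRedundantTest(x)) return false;
  }
  return true;
}

// The extra comparison costs bytes but never changes the answer.
static_assert(SameResults(-100, 100), "both versions agree");
static_assert(!WithoutRedundantTest(3), "3 is rejected either way");
```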

  • The Old New Thing

    There's camping, and then there's luxury camping, and then there's ridiculous luxury camping

    • 6 Comments

    Back in 2002, I read an article about luxury camping in the Wall Street Journal, and it struck me as kind of missing the point of camping.

    For campers too busy to shop for marshmallows, one place stocks a s'mores kit -- skewers included -- in its gourmet general store. Another provides blow dryers, putting an end to "river hair."

    When Karen Schaupeter and her husband arrived at El Capitan Canyon in Santa Barbara, Calif., they were chauffeured to their campsite in a golf cart. Dinner was tamales with mango salsa prepared by the staff, in front of a roaring bonfire -- also prepared by the staff. In the morning, Ms. Schaupeter ordered a latte at El Capitan's store. "I thought I was being snooty," says the Oakland photo stylist. "But people were coming in saying, 'a double no-foam mocha, please.' "

    A tent site at the Chattooga River Resort, for example, is just $18 a night. Beverages are extra (1955 Chateau Latour: $1,100), as are rented DVDs for your laptop. Tack on a "room-service" steak dinner for four and it'll run you $75. Plus, because you are camping, you'll still have to cook the steaks. In Northern California, a barebones pitch-your-own-tent site is $30 at Costanoa, but if you want a maid to fluff the down comforter, you've got to spring for at least a canvas cabin at $130 a night.

    Now all this sounded pretty extravagant at the time. I mean, $350 a night for camping? Room service? But at least you have to cook the steak yourself; that's something.

    Apparently, in the years since the article was written, things have gotten even worse: Now you don't even have to cook your own steak.

    "We don't pitch tents. We don't cook outdoors. We don't share a bathroom. It's just not going to happen. This is a kid who has never flown anything but first class or stayed anywhere other than a Four Seasons."

    The Bondicks, who live near Boston and have a personal chef, shelled out $595 a night, plus an additional $110 per person per day for food.

    It's a hefty price to sleep in a tent, but the perks include a camp butler to build the fire, a maid to crank up the heated down comforter and a cook to whip up bison rib-eye for dinner and French toast topped with huckleberries for breakfast.

    (Put that job title on your résumé: Camp Butler.)

    The end of the article describes some of the "luxury nature" events you can sign up for, like being driven to the start of a scenic section of a hike and picked up at the other side, so you don't have to "hoof it past the same view twice."

    What a horrific ordeal it must be to experience nature twice.

  • The Old New Thing

    Why can't I see all of the 4GB of RAM in my machine?, redux

    • 15 Comments

    Phil Taylor gives another few reasons why a machine with 4GB of RAM doesn't show up as such.

    (Here's my earlier posting on this subject, for reference.)

    These articles about possible reasons for memory not showing up are not intended to be comprehensive. It is entirely possible that the problem you are experiencing is not one described here.

  • The Old New Thing

    I think I can read the bassoonist's music from here

    • 8 Comments

    An insane 1.4-gigapixel image of Obama's inaugural address.

    All it needs is a guy in the audience dressed like Waldo.

  • The Old New Thing

    But then we ran into problems when we started posting 10,000 messages per second

    • 4 Comments

    Once upon a time, a long, long time ago, there was a research team inside Microsoft that was working on alternate models for handling input. I don't know what eventually came of that project, and I don't even remember the details of the meeting, but I do remember the punch line, so I'm just going to make up the rest.

    The research project broke up the duties of their system into a few components. The two that are important to the story are a driver component which received information from various hardware devices and transmitted that information via the PostMessage function to another component whose job it was to study those input messages and route them to the appropriate application. (In 16-bit Windows, the PostMessage function was specifically written so it could be called from device drivers during hardware interrupts.) Each time the driver received information from a hardware device, it posted a message to its helper program.

    Everything seemed to go reasonably smoothly. The device driver received a hardware event, it posted a message to the helper program, and the helper program retrieved the message and processed it. But once they cranked up the hardware devices to produce information at a higher rate (and therefore produced input with much finer resolution), the events started coming in faster and faster, and their design started to collapse under the pressure.

    The research team asked to meet with the user interface team to help work out their problems under load. They outlined their design and explained that it worked well at low data rates, "but then we ran into problems when we started posting 10,000 messages per second."

    At that point, the heads of all the user interface people just sat there and boggled for a few seconds.

    "That's like saying your Toyota Camry has stability problems once you get over 500 miles per hour."

    If you're going to be pumping huge quantities of data through the message queue, creating a separate message for each one is crazy. Think about it: Suppose you're posting 10,000 messages per second. The thread whose job it is to process the messages gets pre-empted and doesn't run for 50 milliseconds. That's 500 messages behind schedule already. Now suppose it takes a dozen page faults (which take 8ms each, say). Now you're another 1000 messages behind. Windows NT sets an arbitrary limit of 10,000 unprocessed messages in a message queue so that a runaway program won't drain the desktop heap and roach everything. A few hiccups in your process will quickly send you over that limit.
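
    The arithmetic can be sketched in a few lines. The function names here are made up for illustration; the message rate, the 50ms pre-emption, the 8ms page fault cost, and the 10,000-message limit are the figures from the paragraph above. (A dozen 8ms page faults works out to 960 messages, which the text rounds to 1000.)

```cpp
// Messages that pile up while the consumer is stalled for stall_ms.
constexpr long Backlog(long messages_per_second, long stall_ms) {
  return messages_per_second * stall_ms / 1000;
}

// Total stall time (in ms) it takes to accumulate queue_limit messages.
constexpr long StallToReachLimit(long messages_per_second, long queue_limit) {
  return queue_limit * 1000 / messages_per_second;
}

static_assert(Backlog(10000, 50) == 500, "one 50ms pre-emption");
static_assert(Backlog(10000, 12 * 8) == 960, "a dozen 8ms page faults");
static_assert(StallToReachLimit(10000, 10000) == 1000,
              "one second of accumulated stalls hits the limit");
```

    At this rate, the consumer has only one second of accumulated slack before the queue limit is reached, and the two hiccups above have already used up about 15% of it.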

    For this usage pattern, you want to switch from one event per message to a signal on the transition (or edge triggering).

    When the first event occurs, post a single message to the helper program saying "there is work to do" and set a flag saying "helper window has been notified." Meanwhile, stash the information you would have included in the message into a privately-managed queue. If an event arrives when the "helper window has been notified" flag is set, then don't post a message; just append the work item to the queue. When the helper window receives the "there is work to do" message, it calls back into the driver to say, "Okay, give me some work to do." After it does the work, it calls into the driver to say, "Okay, what else do you want me to do?" (Alternatively, you can have the helper window grab the entire work list at once.) When the helper window asks for work to do and comes back empty-handed, then clear the "helper window has been notified" flag so the next time an event occurs, a new message will be posted to kick-start the helper window.
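
    Here is a minimal sketch of that pattern in portable C++ rather than actual driver code. Everything in it is an assumption made for illustration: the WorkQueue class, its Produce and TakeAll methods, and the WakeupsForBurst helper are invented names, work items are plain integers, and the post_wakeup callback merely stands in for the single PostMessage call. It uses the "grab the entire work list at once" variation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical illustration of edge-triggered notification.
class WorkQueue {
public:
  // post_wakeup stands in for posting the "there is work to do" message.
  explicit WorkQueue(std::function<void()> post_wakeup)
      : post_wakeup_(std::move(post_wakeup)) {}

  // Producer side: called once per incoming event.
  void Produce(int item) {
    bool need_post = false;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      items_.push_back(item);      // stash the work in the private queue
      if (!notified_) {            // post only on the idle-to-busy transition
        notified_ = true;
        need_post = true;
      }
    }
    if (need_post) post_wakeup_(); // call outside the lock
  }

  // Consumer side: grab the entire work list at once.
  // Coming back empty-handed clears the "notified" flag.
  std::vector<int> TakeAll() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<int> batch;
    batch.swap(items_);
    if (batch.empty()) notified_ = false;
    return batch;
  }

private:
  std::mutex mutex_;
  std::vector<int> items_;
  bool notified_ = false;
  std::function<void()> post_wakeup_;
};

// Demo: counts how many wakeups a burst of events generates
// while the consumer is not running.
int WakeupsForBurst(int events) {
  int posts = 0;
  WorkQueue queue([&posts] { ++posts; });
  for (int i = 0; i < events; ++i) queue.Produce(i);
  assert(posts <= 1);  // never more than one wakeup outstanding
  return posts;
}
```

    The consumer keeps calling TakeAll until it gets an empty list. An event that arrives after the flag is cleared posts a fresh wakeup, and an event that arrives while the consumer is still busy gets picked up by the next TakeAll, so no work is stranded and the message queue never holds more than one message for this purpose.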

    Commenter Hayden proposed a number of other mechanisms. The "send a list of work items rather than just one" technique works well if you know when the list of work items is complete and therefore is ready to send. The second technique is the one I described here; it works well if the producer doesn't really know when each chunk of incoming work is finished, or if the work that comes in is continuous. The third mechanism merely avoids the message queue altogether and uses a semaphore instead.

    The point is not to try to drive your Camry at 500 miles per hour. Find a way to get your work done while keeping the Camry well within its design parameters, or look for a different vehicle.

  • The Old New Thing

    Why can't you apply ACLs to registry values?

    • 22 Comments

    Someone wondered why you can't apply ACLs to individual registry values, only to the containing keys.

    You already know enough to answer this question; you just have to put the pieces together.

    In order for a kernel object to be ACL-able, you need to be able to create a handle to it, since it is the act of creating the handle that performs the access check.

    Creating a handle to the value means that we would need a function like RegOpenValue and corresponding RegQueryValueData and RegSetValueData functions which take not a registry key handle but a registry value handle.

    And then you've basically come full circle. You've reinvented the 16-bit registry, where data was stored only in the tips of the trees. Just change value to subkey and you're back where you started.

    What would be the point of adding an additional layer that just re-expresses what you had before, just in a more complicated way?

    Commenter bcthanks wondered why we didn't abandon values and just store everything in subkeys, like the 16-bit registry did. Well, if you want to do that, then more power to you. Though it would make it difficult for you to store anything other than REG_SZ data in the registry. If you wrote a REG_BINARY blob to the default value of a subkey, what should be returned if somebody called RegQueryValue, which always returns a string?
