April, 2013

  • The Old New Thing

    Dangerous setting is dangerous: This is why you shouldn't turn off write cache buffer flushing

    • 71 Comments

    Okay, one more time about the Write-caching policy setting.

    This dialog box takes various forms depending on what version of Windows you are using.

    Windows XP:

      Enable write caching on the disk
    This setting enables write caching in Windows to improve disk performance, but a power outage or equipment failure might result in data loss or corruption.

    Windows Server 2003:

      Enable write caching on the disk
    Recommended only for disks with a backup power supply. This setting further improves disk performance, but it also increases the risk of data loss if the disk loses power.

    Windows Vista:

      Enable advanced performance
    Recommended only for disks with a backup power supply. This setting further improves disk performance, but it also increases the risk of data loss if the disk loses power.

    Windows 7 and 8:

      Turn off Windows write-cache buffer flushing on the device
    To prevent data loss, do not select this check box unless the device has a separate power supply that allows the device to flush its buffer in case of power failure.

    Notice that the warning text gets more and more scary each time it is updated. It starts out just by saying, "If you lose power, you might have data loss or corruption." Then it adds a recommendation, "Recommended only for disks with a backup power supply." And then it comes with a flat-out directive: "Do not select this check box unless the device has a separate power supply."

    The scary warning is there for a reason: If you check the box when your hardware does not satisfy the criteria, you risk data corruption.

    But it seems that even with the sternest warning available, people will still go in and check the box even though their device does not satisfy the criteria, and the dialog box says right there do not select this check box.

    And then they complain, "I checked this box, and my hard drive was corrupted! You need to investigate the issue and release a fix for it."

    Dangerous setting is dangerous.

    At this point, I think the only valid "fix" for this feature would be to remove it entirely. This is why we can't have dangerous things.

  • The Old New Thing

    How can I figure out which user modified a file?

    • 20 Comments

    The Get­File­Time function will tell you when a file was last modified, but it won't tell you who did it. Neither will Find­First­File, Get­File­Attributes, Read­Directory­ChangesW, or File­System­Watcher.

    None of these file system functions will tell you which user modified a file, because the file system doesn't keep track of which user modified a file. But there is somebody who does keep track: The security event log.

    To generate an event into the security event log when a file is modified, you first need to enable auditing on the system. In the Local Security Policy administrative tool, go to Local Policies, and then double-click Audit Policy. (These steps haven't changed since Windows 2000; the only thing is that the Administrative Tools folder moves around a bit.) Under Audit Object Access, say that you want an audit raised when access is successfully granted by checking Success (An audited security access attempt that succeeds).

    Once auditing is enabled, you can then mark the files that you want to track modifications to. On the Security tab of each file you are interested in, go to the Auditing page, and select Add to add the user you want to audit. If you want to audit all accesses, then you can choose Everyone; if you are only interested in auditing a specific user or users in specific groups, you can enter the user or group.

    After specifying whose access you want to monitor, you can select what actions should generate security events. In this case, you want to check the Successful box next to Create files / write data. This means "Generate a security event when the user requests and obtains permission to create a file (if this object is a directory) or write data (if this object is a file)."

    If you want to monitor an entire directory, you can set the audit on the directory itself and specify that the audit should apply to objects within the directory as well.

    After you've set up your audits, you can view the results in Event Viewer.

    This technique of using auditing to track who is generating modifications also works for registry keys: Under the Edit menu, select Permissions.

    Exercise: You're trying to debug a problem where a file gets deleted mysteriously, and you're not sure which program is doing it. How can you use this technique to log an event when that specific file gets deleted?

  • The Old New Thing

    Some trivia about the //build/ 2011 conference

    • 12 Comments

    Registration for //build/ 2013 opens tomorrow. I have no idea what's in store this year, but I figure I'd whet your appetite by sharing some additional useless information about //build/ 2011.

    The internal code name for the prototype tablets handed out at //build/ 2011 was Nike. I think we did a good job of keeping the code name from public view, but one person messed up and accidentally let it slip to Mary-Jo Foley when they said that the contact email for people having tax problems related to the device is nikedistⓐmicrosoft.com.

    The advance crew spent an entire week preparing those devices. One of the first steps was unloading the devices from the pallets. This was done in a disassembly line: The boxes were opened, the devices were fished out, then removed from the protective sleeve. At the end of this phase, you had one neat stack of boxes and one neat stack of devices.

    The advance crew also configured the hall so they would be ready to start once Redmond sent down the final bits of the Developer Preview build. The hall was divided into sections, and each section consisted of eight long tables. Four of the tables were arranged in a square, and the other four tables were placed outside the square, one parallel to each side, forming four lanes.

    Along the inner tables, there were docking stations, each with power, wired access to a private network, and a USB thumb drive. Along the outer tables, there were desk organizers like this one, ready to hold several devices in a vertical position, and next to the organizer was a power strip with power cables at the ready.

    In this phase of the preparation, the person working the station would take a device, pop it into a docking station, and power it on with the magic sequence to boot from USB. The USB stick copied itself to a RAM drive, then ran scripts to reformat the hard drive and copy all the setup files from the private network onto the hard drive, then it installed the build onto the machine, installed Visual Studio, installed the sample applications, flashed the firmware, and otherwise prepared the machine for unboxing. (Not necessarily in that order; I didn't write the scripts, so I don't know what they did exactly. But I figure these were the basic steps.) Once the setup files were copied from the private network, the rest of the installation could proceed autonomously. It didn't need any further access to the USB stick or the network. Everything it needed was on the RAM drive or the hard drive.

    The scripts changed the screen color based on what step of the process it was in, so that the person working the station could glance over all the devices to see which ones needed attention. Once all the files were copied from the network, the devices were unplugged from the docking station and moved to the vertical desk organizer. There, it got hooked up with a power cable and left to finish the installation. Moving the device to the second table freed up the docking station to accept another device.

    Assuming everything went well, the screen turned green to indicate that installation was complete, and the device was unplugged, powered down, and placed in the stack of devices that were ready for quality control.

    The devices that passed quality control then needed to be boxed up so they could be handed out to the conference attendees. Another assembly line formed: The devices were placed back in the protective sleeves, nestled snugly in their boxes, and the boxes closed back up.

    Now, I'm describing this all as if everything ran perfectly smoothly. Of course there were problems which arose, some minor and some serious, and the process got tweaked as the days progressed in order to make things more efficient or to address a problem that was discovered.

    For example, the devices were labeled preview devices, but shortly before the conference was set to begin, the manufacturer registered their objection to the term, since preview implies that the device will actually turn into a retail product. They insisted that the devices be called prototype devices. This meant that mere days before the conference opened, a rush print job of 5000 stickers had to be shipped down to the convention center in order to cover the word preview with the word prototype. A new step was added to the assembly line: place sticker over offending word.

    Another example of problem-solving on the fly: The SIM chip for the wireless data plan was preinstalled in the device. The chip came on a punch-out card, and the manufacturer decided to leave the card shell in the box. Okay, I guess, except that the card shell had the SIM card's account number printed on it. Since the reassembly process didn't match up the devices with the original boxes, you had all these devices with unmatched card shells. In theory, somebody might call the service provider and give the account number on the shell rather than the number on the SIM card. To fix this, a new step was added to the assembly line: Remove the card shells. All the previously-assembled boxes had to be unpacked so the shells could be removed. (At some point, somebody discovered that you could extract the shells without removing the foam padding if you held the box at just the right angle and shook it, so that saved a few seconds.)

    Now about the devices themselves: They were a very limited run of custom hardware, and they were not cheap. I think the manufacturing cost was in the high $2000s per unit, and that doesn't count all the sunk costs. I found it amusing when people wrote, "What do you mean a free tablet? Obviously they baked that into the cost of the conference registration, so you paid for it anyway." Conference registration was $2,095 (or $1,595 if you registered early), which came nowhere near covering the cost of the device.

    Some people whined that Microsoft should have made these devices available to the general public for purchase. First of all, these are developer prototypes, not consumer-quality devices. They are suitable for developing Windows 8 software but aren't ready for prime time. (For one thing, they run hot. More on that later.) Second of all, there aren't any to sell. We gave them all away! It's not like there's a factory sitting there waiting for orders. It was a one-shot production run. When they ran out, they ran out.¹

    Third, these devices, by virtue of being prototypes, had a high infant mortality rate. I don't know exactly, but I'm guessing that maybe a quarter of them ended up not being viable. One of the things that the advance crew had to do was burn in the devices to try to catch the dead devices. I remember the team being very worried that the hardware helpdesk at the conference would be overwhelmed by machines that slipped through the on-site testing. Luckily, that didn't happen. (Perhaps they were too successful, because everybody ended up assuming that pumping out these puppies was a piece of cake!)

    Doing a little back-of-the-envelope calculation, let's say that the machines cost around $2,750 to produce, and that a quarter of them failed burn-in. Add on top of that a 25% buffer for administrative overhead, and you're looking at a cost-per-device of over $4,500. I doubt there would be many people interested in buying one at that price.

    Especially since you could buy something very similar for around $1100 to $1400. It won't have the hardware customizations, but it'll be close.

    The hardware glitches that occurred during the keynote never appeared during rehearsals in Redmond. But when rehearsing in Anaheim, the hardware started flaking out like crazy and eventually self-destructing. (And like I said, those devices weren't cheap!) One of my colleagues got a call from Los Angeles: "When you come down here, bring as many extra Nikes as you can. We're burning through them like mad!" My colleague ended up pissing off everybody in the airport security line behind her when she got to the X-ray machine and unloaded nine devices onto the conveyor belt. "Great, I just put tens of thousands of dollars worth of top-secret hardware on an airport X-ray machine. I hope nothing happens to them."

    Why did the devices start failing during rehearsals in Anaheim, when they ran just fine in Redmond? Because in Anaheim, the devices were being run at full brightness all the time (so they show up better on camera), and they were driving giant video displays, and they were sitting under hot stage lights for hours on end. On top of that, I'm told that the HDMI protocol is bi-directional, so it's possible that the giant video displays at the convention center were feeding data back into the devices in a way that they couldn't handle. Put all that together, and you can see why the devices would start overheating.

    What made it worse was that in order to cram all the extra doodads and sensors into the device, the intestines had to be rearranged, and the touch processor chip ended up being placed directly over the HDMI processor chip. That meant that when the HDMI chip overheated, it caused the touch processor to overheat, too. If you watched the keynote carefully, you'd have seen that shortly before the machine on stage blew up, the touch sensor flipped out and generated phantom touches all over the screen. That was the clue that the machine was about to die from overheating and it would be in the presenter's best interest to switch to another machine quickly. (The problem, of course, is that the presenter is looking out into the audience giving the talk, not staring at the device's screen the whole time. As a result, this helpful early warning signal typically goes unnoticed by the very person who can do the most about it.)

    The day before the conference officially began, Jensen Harris did a preview presentation to the media. One of the glitches that hit during his presentation was that the system started hallucinating an invisible hand that kept swiping the Word Hunt sample game back onto the screen. Jensen quipped, "This is our new auto-Word Hunt feature. We want to make sure you always have Word Hunt when you need it. We've moved beyond touch. Now you don't even need to touch your PC to get access to Word Hunt."

    Jensen's phenomenal calm in the face of adversity also manifested itself during his keynote presentation. You in the audience never noticed it, but at one point, one of the demo applications hit a bug and hung. Jensen spotted the problem before it became obvious and smoothly transitioned to another device and continued. What's more, while he was talking, he went back to the first device and surreptitiously called up Task Manager, killed the hung application, and prepared the device for the next demo. All this without skipping a beat.

    We are all in awe of Jensen.

    When he stopped by the booth, Jensen said to me, "I don't know how you can stand it, Raymond. Now I can't walk down the hallway without a dozen people coming up to me and wanting to say something or shake my hand or get my autograph!" (One of the rare times we are both in the same room.)

    Welcome to nerd celebrity, Jensen. You just have to smile and be polite.

    Bonus chatter: What happened to the devices that failed quality control? A good number of them were rejected for cosmetic reasons (scuff marks, mostly). As a thank-you gift to the advance crew for all their hard work, everybody was given their choice of a scuffed-up device to take home. The remaining devices that were rejected for purely cosmetic reasons were taken back to Redmond and distributed to the product team to be used for internal testing purposes.

    ¹ My group had one of these scuffed-up devices that we used for internal testing. Somebody dropped it, and a huge spiderweb crack covered the left third of the screen, so you had to squint to see what was on the screen through the cracks. We couldn't order a replacement because there was nowhere to order replacements from. We just had to continue testing with a device that had a badly cracked screen.

  • The Old New Thing

    Dark corners of C/C++: The typedef keyword doesn't need to be the first word on the line

    • 29 Comments

    Here are some strange but legal declarations in C/C++:

    int typedef a;
    short unsigned typedef b;
    

    By convention, the typedef keyword comes at the beginning of the line, but this is not actually required by the language. The above declarations are equivalent to

    typedef int a;
    typedef short unsigned b;
    

    The C language (but not C++) also permits you to say typedef without actually defining a type!

    typedef enum { c }; // legal in C, not C++
    

    In the above case, the typedef is ignored, and it's the same as just declaring the enum the plain boring way.

    enum { c };
    

    Other weird things you can do with typedef in C:

    typedef;
    typedef int;
    typedef int short;
    

    None of the above statements do anything, but they are technically legal in pre-C89 versions of the C language. They are just alternate manifestations of the quirk in the grammar that permits you to say typedef without actually defining a type. (In C89, this loophole was closed: Clause 6.7 Constraint 2 requires that "A declaration shall declare at least a declarator, a tag, or the members of an enumeration.")

    That last example of typedef int short; is particularly misleading, since at first glance it sounds like it's redefining the short data type. But then you realize that int short and short int are equivalent, and this is just an empty declaration of the short int data type. It doesn't actually widen your shorts. If you need to widen your shorts, go see a tailor.¹

    Note that just because it's legal doesn't mean it's recommended. You should probably stick to using typedef the way most people use it, unless you're looking to enter the IOCCC.

    ¹ The primary purpose of this article was to tell that one stupid joke. And it's not even my joke!

  • The Old New Thing

    Technically not lying, but not exactly admitting fault either

    • 11 Comments

    I observed a spill suspiciously close to a three-year-old's play table. I asked, "How did the floor get wet?"

    She replied, "Water."

    It's not lying, but it's definitely not telling the whole story. She'll probably grow up to become a lawyer.

  • The Old New Thing

    If you don't know what you're going to do with the answer to a question, then there's not much point in making others work hard to answer it

    • 32 Comments

    A customer asked the following question:

    We've found that on Windows XP, when we call the XYZ function with the Awesome flag, the function fails for no apparent reason. However, it works correctly on Windows 7. Do you have any ideas about this?

    So far, the customer has described what they have observed, but they haven't actually asked a question. It's just an observation, and an observation is not a question. (I'm rejecting "Do you have any ideas about this?" as a question because it is too vague to be a meaningful question.)

    Please be more specific about your question. Do you want to obtain Windows 7-style behavior on Windows XP? Do you want to obtain Windows XP-style behavior on Windows 7? Do you merely want to understand why the two behave differently?

    The customer replied,

    Why do they behave differently? Was it a new design for Windows 7? If so, how do the two implementations differ?

    I fired up a handy copy of Windows XP in a virtual machine and started stepping through the code, and then I stopped and realized I was about to do a few hours' worth of investigation for no clear benefit. So I stopped and responded to their question with my own question.

    Why do you want to know the reason for the change in behavior? How will the answer affect what you do next? Consider the following three answers:

    1. "The behavior was redesigned in Windows 7."
    2. "The Windows XP behavior was a bug that was fixed in Windows 7."
    3. "The behavior change was a side-effect of a Windows Update hotfix."

    What will you do differently if the answer is (1) rather than (2) or (3)?

    The customer never responded. That saved me a few hours of my life.

    If you don't know what you're going to do with the answer to a question, then there's not much point in others working hard to answer it. You're just creating work for others for no reason.

  • The Old New Thing

    Using opportunistic locks to get out of the way if somebody wants the file

    • 24 Comments

    Opportunistic locks allow you to be notified when somebody else tries to access a file you have open. This is useful when you want to use a file only for as long as nobody else wants it.

    For example, you might be a search indexer that wants to extract information from a file, but if somebody opens the file for writing, you don't want them to get a sharing violation. Instead, you want to stop indexing the file and let the other person get their write access.

    Or you might be a file viewer application like ildasm, and you want to let the user update the file (in ildasm's case, rebuild the assembly) even though you're viewing it. (Otherwise, they will get an error from the compiler saying "Cannot open file for output.")

    Or you might be Explorer, and you want to abandon generating the preview for a file if somebody tries to delete it.

    (Rats I fell into the trap of trying to motivate a Little Program.)

    Okay, enough motivation. Here's the program:

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>
    
    OVERLAPPED g_o;
    
    REQUEST_OPLOCK_INPUT_BUFFER g_inputBuffer = {
      REQUEST_OPLOCK_CURRENT_VERSION,
      sizeof(g_inputBuffer),
      OPLOCK_LEVEL_CACHE_READ | OPLOCK_LEVEL_CACHE_HANDLE,
      REQUEST_OPLOCK_INPUT_FLAG_REQUEST,
    };
    
    REQUEST_OPLOCK_OUTPUT_BUFFER g_outputBuffer = {
      REQUEST_OPLOCK_CURRENT_VERSION,
      sizeof(g_outputBuffer),
    };
    
    int __cdecl wmain(int argc, wchar_t **argv)
    {
      g_o.hEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    
      HANDLE hFile = CreateFileW(argv[1], GENERIC_READ,
        FILE_SHARE_READ, nullptr, OPEN_EXISTING,
        FILE_FLAG_OVERLAPPED, nullptr);
      if (hFile == INVALID_HANDLE_VALUE) {
        return 0;
      }
    
      DeviceIoControl(hFile, FSCTL_REQUEST_OPLOCK,
          &g_inputBuffer, sizeof(g_inputBuffer),
          &g_outputBuffer, sizeof(g_outputBuffer),
          nullptr, &g_o);
      if (GetLastError() != ERROR_IO_PENDING) {
        // oplock failed
        return 0;
      }
    
      DWORD dwBytes;
      if (!GetOverlappedResult(hFile, &g_o, &dwBytes, TRUE)) {
        // oplock failed
        return 0;
      }
    
      printf("Cleaning up because somebody wants the file...\n");
      Sleep(1000); // pretend this takes some time
    
      printf("Closing file handle\n");
      CloseHandle(hFile);
    
      CloseHandle(g_o.hEvent);
    
      return 0;
    }
    

    Run this program with the name of an existing file on the command line, say scratch x.txt. The program will wait.

    In another command window, run the command type x.txt. The program keeps waiting.

    Next, run the command echo hello > x.txt. Now things get interesting.

    When the command prompt opens x.txt for writing, the Device­Io­Control call completes. At this point we print the Cleaning up... message.

    To simulate the program taking a little while to clean up, we sleep for one second. Observe that the command prompt has not yet returned. Instead of immediately failing the request to open for writing with a sharing violation, the kernel puts the open request on hold to give our program time to clean up and close our handle.

    Finally, our simulated clean-up is complete, and we close the handle. At this point, the kernel allows the command processor to proceed and open the file for writing so it can write hello into it.

    That's the basics of opportunistic locks, but your program will almost certainly not be structured this way. You will probably not wait synchronously on the overlapped I/O but rather have the completion queued up to a completion function, an I/O completion port, or have a thread pool task listen on the event handle. When you do that, remember that you need to keep the OVERLAPPED structure as well as the REQUEST_OPLOCK_INPUT_BUFFER and REQUEST_OPLOCK_OUTPUT_BUFFER structures valid until the I/O completes.

    (You may find the Cancel­Io function handy to try to accelerate the clean-up of the file handle and any other actions that are dependent upon it.)

    You can read more about opportunistic locks on MSDN. Note that there are limitations on explicitly-managed opportunistic locks; for example, they don't work across the network.

  • The Old New Thing

    The phenomenon of houses with nobody living inside, for perhaps-unexpected reasons

    • 11 Comments

    In London, some of the most expensive real estate is in neighborhoods where relatively few people actually live. According to one company's estimate, 37% of the residences have been purchased by people who merely use them as vacation homes, visiting only for a week or two per year and leaving the building empty the remainder of the year. In other words, the people who can afford to live there choose not to.

    This same phenomenon is reported in other cities. For example, only 10% of the condos in the Plaza Hotel are occupied full-time.

    Another example of a house with nobody living inside is the case where the house is a façade for an industrial building, most commonly an electrical substation or a subway ventilation shaft.

    I find both categories fascinating.

  • The Old New Thing

    The problem with adding more examples and suggestions to the documentation is that eventually people will stop reading the documentation

    • 27 Comments

    I am a member of a peer-to-peer discussion group on an internal tool for programmers which we'll call Program Q. Every so often, somebody will get tripped up by smart quotes or en-dashes or ellipses, and they will get an error like

    C:\> q select table –s “awesome table”
    Usage: q select table [-n] [-s] table
    Error: Must specify exactly one table.
    

    After it is pointed out that they are a victim of Word's auto-conversion of straight quotes to slanted quotes, there will often be a suggestion, "You should treat en-dashes as plain dashes, smart quotes as straight quotes, and fancy-ellipses as three periods."

    The people who support Program Q are members of this mailing list, and they explain that unfortunately for Program Q, those characters have been munged by internal processing to the point that when they reach the command line parser, they have been transformed into characters like ô and ö, so the parser doesn't even know that it's dealing with an en-dash or smart-quote or fancy-ellipsis.

    Plus, this is a programming tool. Programmers presumably prefer consistent and strict behavior rather than auto-correcting guess-what-I-really-meant behavior. One of the former members of the Program Q support team recalled,

    It might be possible to detect potential unintended goofiness and raise an error, but that creates the possibility of false positives, which in turn creates its own set of support issues that are more difficult to troubleshoot and resolve. Sometimes it's better to just let a failure fail at the point of failure rather than trying to be clever.

    There was a team that had a script that started up the Program Q server, and if there was a problem starting the server, it restored the databases from a backup. Automated failure recovery, what could possibly go wrong? Well, what happened is that the script decided to auto-restore from a week-old backup and thereby wiped out a week's worth of work. And it turns out that the failure in question was not caused by database corruption in the first place. Oops.

    "Well, if you're not going to do auto-correction, at least you should add this explanation to the documentation."

    The people who support Program Q used to take these suggestions to heart, and when somebody said, "You should mention this in the documentation," they would more often than not go ahead and add it to the documentation.

    But that merely created a new phenomenon:

    I can't get Program Q to create a table. I tried q create -template awesome_template awesome_table, but I keep getting the error "Template 'awesome_template' does not exist in the default namespace. Check that the template exists in the specified location. See 'q help create -template' for more information." What am I doing wrong?

    Um, did you check that the template exists in the specified location?

    "No, I haven't. Should I?"

    (Facepalm.)

    After some troubleshooting, the people on the discussion group determined that the problem was that the template was created in a non-default namespace, so you had to use a full namespace qualifier to specify the template. (I'm totally making this up, I hope you realize. The actual Program Q doesn't have a template-create command. I'm just using this as a fake example for the purpose of storytelling.)

    After this all gets straightened out, somebody will mention, "This is explained in the documentation for template creation. Did you read it?"

    "I didn't read the documentation because it was too long."

    If you follow one person's suggestion to add more discussion to the documentation, you end up creating problems for all the people who give up on the documentation because it's too long, regardless of how well-organized it is. In other words, sometimes adding documentation makes things worse. The challenge is to strike a decent balance.

    Pre-emptive snarky comment: "TL;DR."

  • The Old New Thing

    Dreaming about games based on Unicode

    • 20 Comments

    I dreamed that two of my colleagues were playing a game based on pantomiming Unicode code points. One of them got LOW QUOTATION MARK, and the other got a variety of ARROW POINTING NORTHEAST, ARROW POINTING EAST, ARROW POINTING SOUTHWEST.

    I wonder how you would pantomime ZERO WIDTH NON-JOINER.
