July, 2011

  • The Old New Thing

    The historical struggle over control of the Portuguese language

    • 42 Comments

    Portugal has been going through a rough patch. Its international stature has diminished over the years, its economy has always struggled to remain competitive, the government had to accept a bailout to avoid defaulting on its debt, and on top of it all, it is losing control of its own language.

    In Portugal, the latest round of Portuguese spelling reform takes effect over a six-year transition period, leaving the Portuguese dismayed that the spelling of their language is being driven by Brazil, a former colony. I sympathize with the plight of the Portuguese, although I also understand the value of consistent spelling. (The rules for the English language are established not by any central authority but rather are determined by convention.)

    I wonder if the U.K. feels the same way about its former colony.

    Bonus chatter: The Microsoft Language Portal Blog reports that Microsoft intends to phase in the spelling reform over a four-year period for Brazil-localized products. A quick glance at the Microsoft style guide for Portuguese (Portugal) says that the spelling reform has yet to take effect among the Portugal-localized version of Microsoft products.

  • The Old New Thing

    What is that horrible grinding noise coming from my floppy disk drive?

    • 41 Comments

    Wait, what's a floppy disk drive?

    For those youngsters out there, floppy disks are where we stored data before the invention of the USB earring. A single floppy disk could hold up to two seconds of CD-quality audio. This may not sound like a lot, but it was in fact pretty darned awesome, because CDs hadn't been invented yet either.

    Anyway, if you had a dodgy floppy disk (say, because you decided to fold it in half), you often heard a clattering sound from the floppy disk drive as it tried to salvage what data it could from the disk. What is that sound?

    That sound is recalibration.

    The floppy disk driver software kept getting errors back from the drive saying "I can't find any good data." The driver figures, "Hm, maybe the problem is that the drive head is not positioned where I think it is." You see, floppy drives do not report the actual head position; you have to infer it by taking the number of "move towards the center" commands you have issued and subtracting the number of "move towards the edge" commands. The actual location of the drive head could differ from your calculations due to an error in the driver, or it could just be that small physical errors have accumulated over time, resulting in a significant difference between the theoretical and actual positions. (In the same way that if you tell somebody to step forward ten steps, then backward ten steps, they probably won't end up exactly where they started.)

    To get the logical and physical positions back in sync, the driver does what it can to get the drive head to a known location. It tells the hardware, "Move the drive head one step toward the edge of the disk. Okay, take another step. One more time. Actually, 80 more times." Eventually, the drive head reaches the physical maximum position, and each time the driver tells the hardware to move the head one more step outward, it just bumps against the physical boundary of the drive hardware and makes a click sound. If you issue at least as many "one more step outward" commands as there are steps from the innermost point of the disk to the edge, then the theory is that at the end of the operation, the head is in fact at track zero. At that point, you can set your internal "where is the drive head?" variable to zero and restart the original operation, this time with greater confidence that the drive head is where you think it is.

    The amount of clattering depends on where the drive head was when the operation began. If the drive head were around track 40, then the first 40 requests to move one step closer to the edge would do exactly that, and the next 43 requests would make a clicking noise. On the other hand, if the drive head were closer to track zero already, then nearly all of the requests result in the drive head bumping against the physical boundary of the drive hardware, and you get a longer, noisier clicking or grinding sound.
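    The recalibration dance can be modeled in a few lines. This is only a sketch: the Drive and Driver types are made up for illustration, an 80-track drive is assumed, and the step count of 83 is taken from the numbers in the story rather than from any real driver.

```cpp
#include <cassert>

// Hypothetical model of the hardware: it knows where the head really is,
// but it never reports that position back to the driver.
struct Drive {
    static constexpr int kMaxTrack = 79;  // assumption: tracks 0..79
    int actualTrack;                      // hidden from the driver
    int bumps = 0;                        // steps that hit the end stop (the clicks)

    explicit Drive(int start) : actualTrack(start) {
        assert(start >= 0 && start <= kMaxTrack);
    }

    // One step toward the edge of the disk (track zero).
    void StepOut() {
        if (actualTrack > 0) {
            --actualTrack;
        } else {
            ++bumps;  // already against the physical boundary
        }
    }
};

// Hypothetical model of the driver: it tracks the head by dead reckoning.
struct Driver {
    int assumedTrack = 0;

    // Step outward more times than there are tracks, then reset the
    // bookkeeping. Wherever the head started, it ends up at track zero.
    void Recalibrate(Drive& drive) {
        const int steps = Drive::kMaxTrack + 4;  // 83 steps, as in the story
        for (int i = 0; i < steps; ++i) {
            drive.StepOut();
        }
        assumedTrack = 0;
    }
};
```

    In this model, a head that starts at track 40 takes 40 real steps and then clicks 43 times, whereas a head that starts at track 2 clicks 81 times.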

    You can hear the recalibration at the start of this performance.

    Bonus floppy drive music.

    Bonus reading: Tim Paterson, author of DOS, discusses all those floppy disk formats.

  • The Old New Thing

    At least it'll be easy to write up the security violation report

    • 37 Comments

    Many years ago, Microsoft instituted a new security policy at the main campus: all employees must visibly wear their identification badge, even when working in their office. As is customary with nearly all new security policies, it was met with resistance.

    One of my colleagues was working late, and his concentration was interrupted by a member of the corporate security staff at his door.

    Sir, can I see your cardkey?

    My colleague was not in a good mood (I guess it was a nasty bug), so he curtly replied, "No. I'm busy."

    Sir, you have to show me your cardkey. It's part of the new security policy.

    "I told you, I'm busy."

    Sir, if you don't show me your cardkey, I will have to write you up.

    "Go ahead, if it'll get you out of my office."

    All right, then. What's your name?

    Without even looking up from his screen, my colleague replied impatiently, "It's printed on the door."

    The policy was rescinded a few weeks later.

  • The Old New Thing

    Hey, let's report errors only when nothing is at stake!

    • 37 Comments

    Only an idiot would have parameter validation, and only an idiot would not have it. In an attempt to resolve this paradox, commenter Gabe suggested, "When running for your QA department, it should crash immediately; when running for your customer, it should silently keep going." A similar opinion was expressed by commenter Koro and some others.

    This replaces one paradox with another. Under the new regime, your program reports errors only when nothing is at stake. "Report problems when running in test mode, and ignore problems when running on live data." Isn't this backwards? Shouldn't we be more sensitive to problems with live data than problems with test data? Who cares if test data gets corrupted? That's why it's test data. But live data—we should get really concerned when there's a problem with live data. Allowing execution to continue means that you're attempting to reason about a total breakdown of normal functioning.

    Now, if your program is mission-critical, you probably have some recovery code that attempts to reset your data structures to a "last known good" state or which attempts to salvage what information it can, like how those space probes have a safe mode. And that's great. But silently ignoring the condition means that your program is going to skip happily along, unaware that what it's doing is probably taking a bad situation and subtly making it slightly worse. Eventually, things will get so bad that something catastrophic happens, and when you go to debug the catastrophic failure, you'll have no idea how it got that way.
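    Here is one way to read that advice, as a sketch rather than a prescription. The Ledger type is hypothetical, and throwing an exception is merely my choice of reporting mechanism. The point is that the check behaves the same in every build, so an invalid request is reported before any live data is touched.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical ledger, used only for illustration.
struct Ledger {
    std::vector<long> balances;

    // The validation runs in test and production builds alike.
    // An invalid request is reported and the data is left untouched.
    void Withdraw(std::size_t account, long amount) {
        if (account >= balances.size() || amount < 0 ||
            amount > balances[account]) {
            throw std::invalid_argument("Withdraw: invalid request");
        }
        balances[account] -= amount;
        // Internal invariant, as opposed to caller-supplied input.
        assert(balances[account] >= 0);
    }
};
```

    The caller can catch the exception and run whatever recovery code it has. What it cannot do is carry on without noticing that something went wrong.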

  • The Old New Thing

    How do I find the original name of a hard link?

    • 35 Comments

    A customer asked, "Given a hardlink name, is it possible to get the original file name used to create it in the first place?"

    Recall that hard links create an alternate name for a file. Once that alternate name is created, there is no way to tell which is the original name and which is the new name. The new file does not have a "link back to the original"; they are both links to the underlying file content. This is an old topic, so I won't go into further detail. Though this question does illustrate that many people continue to misunderstand what hard links are.
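    If you want to convince yourself of this, here is a small portable sketch. It uses the C++17 std::filesystem library rather than the Win32 functions, and the function name and paths are made up for the demonstration. Both names must live on the same volume.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Creates a file under one name, adds a second name via a hard link, then
// deletes the first name. Returns true if the file is still reachable
// through the second name.
bool SurvivesRemovalOfFirstName(const fs::path& first, const fs::path& second)
{
    { std::ofstream(first) << "payload"; }   // create the file
    fs::create_hard_link(first, second);      // add an alternate name

    // Both names refer to the same file, and nothing marks either
    // one as "the original".
    assert(fs::equivalent(first, second));
    assert(fs::hard_link_count(first) == 2);
    assert(fs::hard_link_count(second) == 2);

    fs::remove(first);                        // remove the name we created first
    return fs::exists(second) && fs::hard_link_count(second) == 1;
}
```

    Deleting the name you happened to create first does not orphan the other one. The file simply has one fewer name.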

    Anyway, once you figure out what the customer is actually asking, you can give a meaningful answer: "Given the path to a file, how can I get all the names by which the file can be accessed?" The answer is Find­First­File­NameW.

    Note that the names returned by the Find­First­File­NameW family of functions are relative to the volume mount point. To convert it to a full path, you need to append it to the mount point. Something like this:

    #define UNICODE       // the code below uses wide-character strings
    #include <windows.h>
    #include <shlwapi.h>  // PathAppend (link with shlwapi.lib)
    #include <stdio.h>    // _putws
    #include <strsafe.h>  // StringCchCopy

    typedef void (*ENUMERATEDNAMEPROC)(__in PCWSTR);
    
    // Combine the volume root with a volume-relative name
    // and pass the resulting full path to the callback.
    void ProcessOneName(
        __in PCWSTR pszVolumeRoot,
        __in PCWSTR pszLink,
        __in ENUMERATEDNAMEPROC pfnCallback)
    {
     wchar_t szFile[MAX_PATH];
     if (SUCCEEDED(StringCchCopy(szFile, ARRAYSIZE(szFile), pszVolumeRoot)) &&
         PathAppend(szFile, pszLink)) {
      pfnCallback(szFile);
     }
    }
    
    // Call the callback once for each name by which the file can be accessed.
    void EnumerateAllNames(
        __in PCWSTR pszFileName,
        __in ENUMERATEDNAMEPROC pfnCallback)
    {
     // Supporting paths longer than MAX_PATH left as an exercise
     wchar_t szVolumeRoot[MAX_PATH];
     if (GetVolumePathName(pszFileName, szVolumeRoot, ARRAYSIZE(szVolumeRoot))) {
      wchar_t szLink[MAX_PATH];
      DWORD cchLink = ARRAYSIZE(szLink);
      HANDLE hFind = FindFirstFileNameW(pszFileName, 0, &cchLink, szLink);
      if (hFind != INVALID_HANDLE_VALUE) {
       ProcessOneName(szVolumeRoot, szLink, pfnCallback);
       // cchLink is an in/out parameter, so reset it before each call
       while (cchLink = ARRAYSIZE(szLink),
              FindNextFileNameW(hFind, &cchLink, szLink)) {
        ProcessOneName(szVolumeRoot, szLink, pfnCallback);
       }
       FindClose(hFind);
      }
     }
    }
    
    // for demonstration purposes, we just print the name
    void PrintEachFoundName(__in PCWSTR pszFile)
    {
     _putws(pszFile);
    }
    
    int __cdecl wmain(int argc, wchar_t **argv)
    {
     for (int i = 1; i < argc; i++) {
      EnumerateAllNames(argv[i], PrintEachFoundName);
     }
     return 0;
    }
    

    Update: Minor errors corrected, as noted by acq and Adrian.

  • The Old New Thing

    The danger of making the chk build stricter is that nobody will run it

    • 31 Comments

    Our old pal Norman Diamond suggested that Windows should go beyond merely detecting dubious behavior on debug builds and should kill the application when it misbehaves.

    The thing is, if you make an operating system so strict that the slightest misstep results in lightning bolts from the sky, then nobody would run it.

    Back in the days of 16-bit Windows, as today, there were two builds, the so-called retail build, which had assertions disabled, and the so-called debug build, which had assertions enabled and broke into the debugger if an application did something suspicious. (This is similar to today's terms checked and free.)

    Now, the Windows development team is big on self-hosting. After all, if you are writing the operating system, you should be running it, too. What's more, it was common to self-host the debug version of the operating system, since that's the one with the extra checks and assertions that help you flush out the bugs.

    As it happens, the defect tracking system we used back in the day triggered a lot of these assertions. As I recall, refreshing a query resulted in about 50 parameter validation errors caught and reported by Windows. This made using the defect tracking system very cumbersome because you had to babysit the debugger and hit "i" (for ignore) 50 times each time you refreshed a query.

    (As I noted in my talk at Reflections|Projections 2009, the great thing about defect tracking systems is that you will hate every single one you use. Sure, the new defect tracking system may have some new features and be easier to use and run faster, but all that does is delay the point at which you begin hating it.)

    If Windows had taken the stance that the slightest error resulted in the death of the application, then it would have been impossible for a member of the Windows development team to run the defect tracking system program itself, because once it hit the first of those 50 parameter validation error reports, the program would have been killed, and the defect tracking system would have been rendered useless.

    Remember, don't change program semantics in the debug build. That just creates Heisenbugs.

    I remember that at one point the Windows team asked the people who supported the defect tracking system, "Hey, your program has a lot of problems that are being reported by the Windows debug build. Can you take a look at it?"

    The response from the defect tracking system support team was somewhat ironic: "Sorry, we don't support running the defect tracking system on a debug build of Windows. We found that the debug version of Windows breaks into the debugger too much."

  • The Old New Thing

    Windows has supported multiple UI languages for over a decade, but nobody knew it

    • 30 Comments

    In the early days of Windows, there was a separate version of Windows for each language, and once you decided to install, say, the French version of Windows, you were locked into using French. You couldn't change your mind and, say, switch to German. The reason for this is that there were bits and pieces of language-dependent information stored all over the system.

    One obvious place is in file names. For example, a shortcut to the calculator program was kept at %USERPROFILE%\Start Menu\Programs\Accessories\Calculator.lnk on US-English systems, but %USERPROFILE%\Startmenü\Programme\Zubehör\Rechner.lnk on German systems. The name of the physical file system directory or file was displayed to the user as the name of the menu item. This means that if you started with an English system and simply replaced all the user interface resources with the corresponding German ones, you would still see a folder named Accessories on your Start menu, containing a shortcut named Calculator, even though they should now be displayed as Zubehör and Rechner.

    The registry was another place where language-dependent strings were stored. For example, file type descriptions were stored in plain text, which meant that if you installed an English system, then HKEY_CLASSES_ROOT\txtfile had the value Text Document, and that's the value shown to the user under the Typ column even though the user had switched the user interface resources to German.

    For Windows 2000, an effort was made to move all language-dependent content into resources so that it could be changed dynamically. If you need to store a language-dependent string anywhere, you can't store the string in plain text, because that would not survive a change in language. You have to store an indirect string and convert the indirect string to a real string at runtime, so that it maps through the user's current user interface language. It was quite an effort identifying all the places that needed to be changed to conform to the new rules while still ensuring that the new rules were backward compatible with old code that followed the old rules.

    For example, you couldn't just say "To register a language-aware file type friendly name, write an indirect string to HKEY_CLASSES_ROOT\progid. For example, set HKEY_CLASSES_ROOT\txtfile to REG_SZ:@C:\Windows\system32\notepad.exe,-469." If you did that, then applications which retrieved file type friendly names by reading directly from HKEY_CLASSES_ROOT\progid (instead of using functions like SHGet­File­Info) would end up showing this to the user:

    Name Type Modified
    House pictures @C:\Windows\system32\zipfldr.dll,-10195 11/16/1998 4:09 PM
    notes @C:\Windows\system32\notepad.exe,-469 11/23/1998 1:52 PM
    Proposal @"C:\Program Files\Windows NT\Accessories\WORDPAD.EXE",-190 10/31/1998 10:32 AM

    instead of

    Name Type Modified
    House pictures Compressed Folder 11/16/1998 4:09 PM
    notes Text Document 11/23/1998 1:52 PM
    Proposal Rich Text Document 10/31/1998 10:32 AM
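    To make the format of those indirect strings concrete, here is a small portable sketch that splits one into its module path and resource ID. It is not how the shell does it (on Windows, you would call a function like SHLoad­Indirect­String, which also loads the resource for you). The IndirectString type and the ParseIndirectString function are hypothetical, and the sketch handles only the "@path,-id" form shown above.

```cpp
#include <cstddef>
#include <optional>
#include <string>

struct IndirectString {
    std::string modulePath;  // file that holds the string resource
    int resourceId;          // ID of the string within that file
};

// Splits "@path,-id" (path optionally quoted) into its two parts.
// Returns std::nullopt for anything else, such as an old-style plain-text
// value that should be shown to the user as-is.
std::optional<IndirectString> ParseIndirectString(const std::string& value)
{
    if (value.size() < 2 || value[0] != '@') return std::nullopt;

    const std::size_t comma = value.rfind(",-");
    if (comma == std::string::npos || comma + 2 >= value.size()) {
        return std::nullopt;
    }

    std::string path = value.substr(1, comma - 1);
    if (path.size() >= 2 && path.front() == '"' && path.back() == '"') {
        path = path.substr(1, path.size() - 2);  // strip surrounding quotes
    }
    if (path.empty()) return std::nullopt;

    // Decimal digits only; overflow handling is omitted from this sketch.
    int id = 0;
    for (std::size_t i = comma + 2; i < value.size(); ++i) {
        if (value[i] < '0' || value[i] > '9') return std::nullopt;
        id = id * 10 + (value[i] - '0');
    }
    return IndirectString{path, id};
}
```

    A program that understands the new rules would try the indirect form first and fall back to displaying the raw text, which is what lets old plain-text registrations keep working.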

    Designing and implementing all this was a major undertaking (that's what happens when you have to retrofit something as opposed to designing it in from the beginning), and to keep the test matrix from growing quadratically in the number of supported languages, a decision was made early on to support dynamic language changes only if the starting language is English. So yes, you could have both English and Dutch resources installed, but you have to start with English and add Dutch and not the other way around.

    Mind you, the implementation in Windows 2000 was not perfect. There were still places where English strings appeared even after you switched the user interface language to Dutch or German, but things got better at each new version of Windows. Unfortunately, pretty much nobody knew about this feature, since it was marketed to large multinational corporations and not to your random everyday users who simply want to change the user interface to a language they are more comfortable with.

    For Windows 2000 and Windows XP, you still had two ways of installing Windows with a German user interface: You could either install the English version and then add the German language pack (the fancy Windows 2000 multilingual way), or you could install the fully-localized German version of Windows, just as you always did. In Windows Vista, fully-localized versions of Windows were dropped. From Windows Vista onwards, all versions of Windows consist of a base language-neutral version with a language pack installed on top.

    While it's true that access to the feature has improved in more recent versions of Windows, the feature has existed for over a decade. But of course, that doesn't stop people from claiming that it's a "new" feature. Don't let the facts get in the way of a good story.

  • The Old New Thing

    The tradition of giving cute names to unborn babies

    • 30 Comments

    Many of my friends gave names to their unborn babies. Most of them were based on various objects that were the size of the adorable little parasite¹ at the time they discovered that they were pregnant:

    • The Peanut
    • Gumdrop
    • Jellybean
    • Blueberry
    • Mr. Bean

    There were a few outliers, though.

    That last one takes a bit of explaining. Having grown tired of people asking her what she was planning on naming the baby, my friend made up an absurd name and used it with a straight face. "We're thinking of naming her Aubergine, if it's a girl." People would respond with a polite but confused "Oh, that's an interesting name."

    Then, still deadpan, she would add, "If it's a boy, then we're leaning toward Mad-Dog."

    That usually tipped people off that she was just messing with them.

    ¹ A peek behind the curtain: I couldn't decide whether to write fetus or embryo, and I knew that if I picked one, then people would say that I should've picked the other, so I decided to avoid the issue entirely by writing "adorable little parasite". This is what nitpickers have turned me into.

  • The Old New Thing

    How is it possible to run Wordpad by just typing its name even though it isn't on the PATH?

    • 29 Comments

    In a comment completely unrelated to the topic, Chris Capel asks how Wordpad manages to run when you type its name into the Run dialog even though the command prompt can't find it. In other words, the Run dialog manages to find Wordpad even though it's not on the PATH.

    Chris was unable to find anywhere I discussed this issue earlier, but it's there, just with Internet Explorer as the application instead of Wordpad.

    It's through the magic of App Paths.

    App Paths was introduced in Windows 95 to address the path pollution problem. Prior to the introduction of App Paths, typing the name of a program without a fully-qualified path resulted in a search along the path, and if it wasn't found, then that was the end of that. File not found. As a result, it became common practice for programs, as part of their installation, to edit the user's AUTOEXEC.BAT and add the application's installation directory to the path.

    This had a few problems.

    First of all, editing AUTOEXEC.BAT is decidedly nontrivial since batch files can have control flow logic like IF and CALL and GOTO. Finding the right SET PATH=... or PATH ... command is an exercise in code coverage analysis, especially since MS-DOS 6 added multi-config support to CONFIG.SYS, so the value of the CONFIG environment variable is determined at runtime. If you wanted to avoid hanging your setup program, you would have to solve the Halting Problem. (You can't just stick a PATH ... at the beginning because it might get wiped out by a later PATH command, and you can't just stick it at the end, because control might never reach the last line of the batch file.)

    And of course, very few uninstall programs would take the time to undo the edits the installer performed, and even if they tried, there's no guarantee that the undo would be successful, since the user (or another installer!) may have edited the AUTOEXEC.BAT file in the meantime.

    Even if you postulate the existence of the AUTOEXEC.BAT editing fairy who magically edits your AUTOEXEC.BAT for you, you still run into the PATH length limit. The maximum length of a command line was 128 characters in MS-DOS, and if each program added itself to the PATH, it wouldn't be long before the PATH reached its maximum length.

    Pre-emptive Yuhong Bao irrelevant detail that has no effect on the story: Windows 95 increased the maximum command line length, but the program being launched needed to know where to look for the "long command line". And that didn't help existing installers which were written against the old 128-character limit. Give them an AUTOEXEC.BAT with a line longer than 128 characters and you had a good chance that you'd hit a buffer overflow bug.

    On top of the difficulty of adding more directories to the PATH, there was the recognition that this was another case of using a global setting to solve a local problem. It seemed wasteful to add a directory to the path just so you could find one file. Each additional directory on the path slowed down path searching operations, even the ones unrelated to locating that one program.

    Enter App Paths. The idea here is that instead of adding your application directory to the path, you just create an entry under the App Paths key saying, "If somebody is looking to execute contoso.exe, I put it over here." Instead of adding an entire directory to the path, you just add a single file, and it's used only for application execution purposes, so it doesn't slow down other path search operations like loading DLLs.

    (Note that the old documentation on App Paths has been superseded by the new documentation linked above.)

    Now that there was a place to store information associated with a particular application, you may as well use it for other stuff as well. A secondary source of path pollution came from applications which added not only the application directory to the path, but also a helper directory where the application kept its DLLs. To address this, an additional Path value specified which directories your application wanted to be added to the path before it was executed. Over time, additional attributes were added to the App Paths key, such as the UseUrl value we saw some time ago.

    When you type the name of a program into the Run dialog (with no path), the Shell­Execute function checks if the name corresponds to an application registered under App Paths. If so, then it uses the registration information to launch the application. Hooray, applications can be run by just typing their name without requiring them to modify the global path.

    Note that this extra lookup is performed only by the Shell­Execute family of functions, so if you use Create­Process or Search­Path, you'll still get ERROR_FILE_NOT_FOUND.
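    The difference between the two kinds of lookup can be sketched with a toy model. Everything here is hypothetical: the System structure stands in for the file system, the PATH, and the App Paths registrations, and the order in which the two sources are consulted is an assumption made for the sketch, not a statement about what the shell actually does. Case-insensitive matching is also left out.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for the pieces of the system involved in finding a program.
struct System {
    std::set<std::string> files;                  // full paths that exist
    std::vector<std::string> pathDirs;            // directories on the PATH
    std::map<std::string, std::string> appPaths;  // exe name -> registered full path
};

// Models a plain path search: only directories on the PATH are considered.
std::optional<std::string> SearchPathOnly(const System& sys,
                                          const std::string& name)
{
    for (const auto& dir : sys.pathDirs) {
        std::string candidate = dir + "\\" + name;
        if (sys.files.count(candidate)) return candidate;
    }
    return std::nullopt;
}

// Models a lookup that also consults the per-application registrations.
// The PATH-first order is an assumption of this sketch.
std::optional<std::string> ResolveWithAppPaths(const System& sys,
                                               const std::string& name)
{
    if (auto found = SearchPathOnly(sys, name)) return found;
    auto it = sys.appPaths.find(name);
    if (it != sys.appPaths.end()) return it->second;
    return std::nullopt;
}
```

    With contoso.exe registered in appPaths but its directory absent from pathDirs, SearchPathOnly comes up empty while ResolveWithAppPaths returns the registered path. No global setting had to change to make that happen.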

    Now, the intent was that the registered full path to the application is the same as the registered short name, just with a full path in front. For example, wordpad.exe registers the full path of %ProgramFiles%\Windows NT\Accessories\WORDPAD.EXE. But there's no check that the two file names match. The Pbrush folks took advantage of this by registering an application path entry for pbrush.exe with a full path of %SystemRoot%\System32\mspaint.exe: That way, when somebody types pbrush into the Run dialog, they get redirected to mspaint.exe.

    Sneaky.

  • The Old New Thing

    No, we're not going to play Stairway to Heaven, and please tell everybody else in your area code to stop calling me

    • 26 Comments

    Some time ago, I told the story of how one employee's phone received calls intended for a local radio station's contest line due to people dialing seven digits instead of ten and defaulting to the wrong area code.

    Upon reading that story, a colleague of mine pointed out that one of the conference rooms in his building has a similar problem. The direct line for the conference room is identical to the request line for a local radio station, save for the area code. People who work in the building know never to answer the phone in that conference room.

    (Although apparently there have been a couple of pranks involving the call-forwarding function on the conference room telephone.)
