May, 2006

  • The Old New Thing

    On languages and spelling


    When I brought up the topic of spelling bees earlier this year, it triggered several comments on how various languages deal with the issue of spelling. Here are some thoughts on the topics that were brought up:

    German spelling is only partly phonetic. Given the spelling of a word, one can, after applying a rather large set of rules, determine its pronunciation with very high accuracy. On the other hand, given the pronunciation of a word, the spelling is not obvious. For example, do you write "Feber" or "Vehber" or possibly "Phäber"? "Ist" or "isst"? "Quelle" or "Kwälle"? The fact that Germany is undergoing controversial spelling reform proves that German spelling is not entirely predictable. After all, if spelling were completely phonetic, there would be no need for reform!

    And all those pronunciation rules. Sometimes a "d" is pronounced like "t"; sometimes a "t" is pronounced like "z"; sometimes a "g" is pronounced like "ch"; sometimes "st" is pronounced like "scht". One would think that a truly "phonetically-spelled" language would have a one-to-one correspondence between sounds and letters. (I'm led to believe that many Eastern European languages are phonetic in this way.) Furthermore, given a word's spelling, it's not always obvious where the stress lies. For example, you just have to know that the accent in Krawatte goes on the second syllable. The spelling gives you no help.

    Swedish is like German in this respect: Given the spelling of a word, you can (again, after the application of a rather large set of rules) determine its pronunciation with a high degree of confidence. But going in the other direction can be a nightmare. The tricky "sj" sound goes by many spellings: "sj", "stj", "skj", "sk", "ch", and sometimes even "g" (in French-derived words). In some regional accents, the pronunciation of a leading "s" also varies with the ending of the previous word. (Though I suspect most Swedes don't even hear the difference themselves.)

    At least in English, we're honest about the fact that our spelling is complicated. English spelling only starts to become intuitive once you've learned French, German, Middle English, Greek, Latin, and a handful of other languages, learned British history (so you know who conquered whom when and ransacked their language for new words), and learned how the precursor-languages to modern English were pronounced at the time the words were imported.

    That last point is a problem common to many languages. The spelling of a word tends to change much more slowly than its pronunciation. English retains the original spelling long after the pronunciation has moved on. Many Chinese characters are puzzling until you realize that the word was pronounced differently a few thousand years ago. (Yes, there is a phonetic component to Chinese characters, believe it or not.) Resistance to spelling reform in Germany is just another manifestation of spelling inertia.

    One thing I found interesting was the kind of competition each language uses to promote correct spelling and/or grammar. In the United States, spelling competitions (known as "spelling bees") are the most common. Each student is given a word to spell from memory. Spell it correctly and you survive to the next round; spell it incorrectly and you are eliminated.

    It is my understanding that in Taiwan, the analogous competition is the "dictionary look-up". I'm hazy on the details, but I think the way it works is that a character is shown to the class, and the students race to look it up in the dictionary. Since dictionaries are typically arranged phonetically, a student who already knows how the character is pronounced has an advantage over a student who has to count strokes and perform radical decomposition in order to look it up.

    I was not previously aware of dictation competitions, but they appear to be particularly popular in Poland. Because contestants must write out whole passages, dictation places greater emphasis on the complexity of Polish grammar. A former colleague of mine who grew up in Poland told me that when she goes back to visit relatives, it takes her a while to "regain her tongue" and stop making grammatical errors. You know you've got a complicated language when even a native speaker has to get back up to speed.

    When people mimic the display rather than the actual data


    I recall investigating a bug caused by a registry key being set when it shouldn't have been. But when you looked at the key in Regedit, it said "(value not set)". Why were we going down the "value is set" branch? A little spelunking with the debugger revealed the reason: Whoever set up that registry key wrote the literal string "(value not set)" to the registry! Thus, the value was set, to the string "(value not set)"!

    We were flabbergasted. The only explanation we could come up with was that whoever created the registry key didn't understand that the "(value not set)" was shown by Regedit for a key with no value. Instead, they figured "Oh, I need it to look like that, so I'll set the value to '(value not set)'. Now it looks right in Regedit."

    Along similar lines, I've been told of a system which appeared to have two (Default) values. As you can probably guess by now, what really happened is that somebody created a value whose name was "(Default)".

    The moral of the story is not to confuse what something is with what something looks like.
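
    As a sketch of the distinction, here is a small hypothetical model in portable C. It is not the Win32 registry API; DefaultValue, display_text, and value_is_set are names invented for illustration. A value is either absent or present with some string data, and display_text imitates how a viewer like Regedit renders an unset value. Two different states can produce identical display text, so program logic should test presence, never the displayed string.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a key's default value. This is NOT the Win32
   registry API, just enough state to show the difference. */
typedef struct {
    bool present;      /* was any data actually written? */
    const char *data;  /* the string data, meaningful only if present */
} DefaultValue;

/* Imitates what a viewer such as Regedit shows on screen. */
const char *display_text(const DefaultValue *v)
{
    return v->present ? v->data : "(value not set)";
}

/* What program logic should test: presence, not appearance. */
bool value_is_set(const DefaultValue *v)
{
    return v->present;
}
```

    With this model, { false, NULL } and { true, "(value not set)" } both display as "(value not set)", but value_is_set returns true only for the second, which is the branch the buggy machine kept taking.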

    Automatic messages when you're not in the office - the infamous OOF


    "OOF" is a word you hear a lot at Microsoft. KC Lemson gave the etymology a while back (though my recollection is that it stood for "Out of Office Feature", not that my memory is good for much nowadays). Incidentally, KC is profiled on the Microsoft Careers site, though she goes under the top-secret code name "KC" there.

    Most people set their "vacation" message to something pretty straightforward: a brief message, a return date, and a flowchart of who can be contacted in the meantime. Here's what one might look like. (For the sake of illustration, I made up a "Teapot project" as well as some imaginary team members and a team mailing list. I did not make up "Kansas", however. Believe it or not, that's a real state!)

    In Kansas until March 3, checking email sporadically.

    Teapot shading: Fred Smith
    Teapot rotation: Bob Wilson
    Teapot general: tpteam
    Emergency: 425.555.9595

    The OOF is an opportunity for small-form-factor humor. When he left on holiday at the end of December, Marc Miller's OOF message introduced the "flowchart" section with the heading "These people are probably also OOF". Jensen Harris's OOF earlier this year read

    Out of office, Thursday March 31. Back on Friday.
    If you are injured, dial 911.

    (But don't call 911 for a non-emergency like this lady. On the other hand, KC called 911 because she couldn't get out of bed.)

    As for me, I try to keep my OOF under twenty words. Part of the trick is getting rid of the "flowchart". I remember one time I simply wrote "Returning dd-mmm-yy. You'll just have to cope until then."

    The "flowchart" section of the OOF is one of those places where beginners go overboard, listing a half dozen topics and the corresponding backup. It's a sort of ego trip, where you can quietly show off, "Wow, look at all the things I do. How would you ever survive without me?" As with email signatures and the amassing of physical objects, the more seasoned you become, the more you value the ability to keep it short and simple.

    Solutions that don't actually solve anything


    If changing a setting requires administrator privileges in the first place, then any behavior that results cannot be considered a security hole, because in order to alter the setting, attackers must already have gained administrative privileges on the machine, at which point you've already lost the game. If attackers have administrative privileges, they're not going to waste their time fiddling with some setting and leveraging it to gain even more privileges on the system. They're already the administrator; why go to more work to get what they already have?

    One reaction to this is to try to "secure" the feature by asking, "Well, can we make it harder to change that setting?" For example, in response to the Image File Execution Options key, Norman Diamond suggested "only allowing the launching of known debuggers." But this solution doesn't actually solve anything. What would a "known debugger" be?

    • "The operating system contains a hard-coded list of known debuggers. On that list are ntsd.exe, cdb.exe, and maybe windbg.exe." Personally, I would be okay with that, but that's because I do all my debugging in assembly language anyway. Most developers would want to use devenv.exe or bds.exe or even gdb.exe. If somebody comes up with a new debugger, they would have to petition Microsoft to add it to the hard-coded list of "known debuggers" and then wait for the next service pack for it to get broad distribution. And before the ink was even dry on the electrons, I'm sure somebody somewhere would already have filed an anti-competitive-behavior lawsuit. ("Microsoft is unlawfully raising the barrier to entry to competing debugging products!")
    • "Okay, then the program just needs to be digitally signed in order to be considered a 'known debugger'." Some people would balk at the $500/year cost of a code signing certificate. And should the operating system ask the user whether or not they trust the signing authority before running the debugger? (What if the debugger is being invoked on a service or a remote computer? There is nowhere to display the UI!) Actually, these were all trick questions. It doesn't matter whether the operating system prompts or not, because the attackers would just mark their signing certificate as a "trusted" certificate. And in fact the $500/year wouldn't stop the attackers, since they would just create their own certificate and install it as a "trusted root". Congratulations, the only people who have to pay the $500/year are the honest ones. The bad guys just slip past with their self-signed trusted-root certificate.
    • "Okay, forget the digital signature thing, just have a registry key that lists all the 'known debuggers'. If you're on the list, then you can be used in Image File Execution Options." Well, in that case, the attackers would just update the registry key directly and set themselves as a "known debugger". That "known debuggers" registry key didn't slow them down one second.
    • "Okay, then not a registry key, but some other setting that's hard to find." Oh, now you're advocating security through obscurity?

    Besides, it doesn't matter how much you do to make the Image File Execution Options key resistant to unwanted tampering. If the attacker has administrative privileges on your machine, they won't bother with Image File Execution Options anyway. They'll just install a rootkit and celebrate the addition of another machine to their robot army.

    Such is the futility of trying to stop someone who has already obtained administrative privileges. You're just closing the barn door after the horse has bolted.

    Why doesn't Ethan Hunt have to wear identification?


    Whenever there was a scene in Mission: Impossible III that took place at the agency offices, I was repeatedly bothered by the fact that all the people in the building are wearing their identification badges clipped to their jackets or shirts. Except Ethan Hunt. He gets to walk through the halls like a cologne advertisement.

    Why doesn't he have to wear identification? His boss has to wear identification. His boss's boss has to wear identification. But Ethan Hunt gets to just wander around in black looking cool without any unsightly identification tag that would ruin the look of the whole outfit.

    I was also somewhat put off, as was Bob Mondello, by the producers' decision to identify cities on-screen as "Berlin, Germany", "Rome, Italy", and "Shanghai, China". Do they think we're so stupid that we don't know where Berlin is?

    (And keep an eye out for the American-style fire alarm during the chase through Shanghai just as Ethan Hunt turns a corner. At first glance I thought it said "REEB", but upon further reflection I believe the last two letters are more likely to be "FD"—"fire department". I don't know what the first two letters stand for, or even if I remembered them correctly.)

    Subtle ways your innocent program can be Internet-facing


    Last time, we left off with a promise to discuss ways your program can be Internet-facing without your even realizing it. Probably the most common place for this is the command line. Thanks to CIFS, files can be shared across the Internet and accessed via UNC notation. This means that anybody can set up a CIFS server and create files like \\\some\file.ext, and they will look to the world like files on a file server somewhere (because that is, in fact, what they are). When you double-click one, you're launching the document.

    And that's where the command line attack comes from. Suppose your program is a handler for a file association. Say, your program is litware.exe and it is the registered handler for .LIT files. The attacker just has to create a file called \\\some\path\target.lit and induce the user to double-click it. Once that's done, your program will be run with the command line you registered, which will probably be

    "C:\Program Files\Litsoft\litware.exe" \\\some\path\target.lit

    Notice that the attacker controls the path. This means that if you have a bug in your command line parser, the attacker can exploit it.

    Code injection via the command line is an elevation of privilege.

    Note that this extends beyond merely extra-long file names. If you registered your verb incorrectly by forgetting to put quotation marks around the file name insertion %1, the attacker can hatch a file with an odd name like \\\strange -uninstall path.lit. The resulting command line is therefore

    "C:\Program Files\Litsoft\litware.exe" \\\strange -uninstall path.lit

    Your parser then breaks the command line up into words and interprets this command line as having three parts:

    • The file \\\strange
    • The command line switch -uninstall
    • The file path.lit.

    The program then tries to load the file \\\strange and fails, possibly displaying an error message, then it uninstalls itself, and then tries (and fails) to load the file path.lit. End result: The user gets two strange error messages and the program is uninstalled.
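
    To see why the quotation marks matter, here is a deliberately simplified tokenizer in C. It is a hypothetical stand-in for a program's own parser: it splits on spaces and tabs and lets a pair of double quotes group characters into one word, but it ignores the backslash rules that the real Windows command line parsers apply, so treat it as an illustration rather than a drop-in parser. split_command_line and WORD_MAX are names made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define WORD_MAX 128   /* hypothetical per-word buffer size, terminator included */

/* Splits cmd into words. Whitespace separates words; a pair of double
   quotes groups characters (spaces included) into one word, and the quotes
   themselves are not copied. Backslash escaping is NOT handled. Returns the
   number of words found; at most max_words are stored, each truncated to
   fit in WORD_MAX bytes. */
size_t split_command_line(const char *cmd, char words[][WORD_MAX], size_t max_words)
{
    size_t count = 0;
    const char *p = cmd;

    for (;;) {
        /* Skip whitespace between words. */
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0')
            break;

        bool in_quotes = false;
        size_t len = 0;
        /* Consume one word: stop at unquoted whitespace or end of string. */
        while (*p != '\0' && (in_quotes || (*p != ' ' && *p != '\t'))) {
            if (*p == '"') {
                in_quotes = !in_quotes;   /* toggles grouping; not copied */
            } else if (count < max_words && len + 1 < WORD_MAX) {
                words[count][len++] = *p;
            }
            p++;
        }
        if (count < max_words)
            words[count][len] = '\0';
        count++;
    }
    return count;
}
```

    Fed the unquoted command line above, this sketch produces four words: the program path, \\\strange, -uninstall, and path.lit. If the verb is registered with "%1" in quotation marks instead, the same attacker-chosen name comes back as a single word, \\\strange -uninstall path.lit, which the program then treats as a (nonexistent) file name rather than as a switch.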

    Of course, the attacker also controls the contents of the file, so any vulnerabilities in your file parser can be exploited as well.

    Code injection via file contents is an elevation of privilege.

    If you write a shell extension, your extension will run if the user activates it on the remote file. For example, if you have a context menu extension, it will be instantiated and initialized with the remote file as the data object. Many context menu extensions contain buffer overflow bugs in the way they mishandle the names of the files that the user right-clicked on. (Notice that I said "names"—plural. The user might multi-select files and right-click on them.) For example, a certain shareware file archival program responds to the GCS_HELPTEXT request by taking the names of all the files and combining them into the message "Add the files A, B, C, D, and E to the archive." Unfortunately, when the names A, B, C, D, and E are very long, an exploitable buffer overrun occurs.

    Code injection triggered by file name length is an elevation of privilege.
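
    The general cure for this class of bug is to bound every write by the size of the destination buffer. Here is a minimal sketch in portable C that builds the same kind of "Add the files A, B, and C to the archive." message without ever writing past the end of the buffer; build_help_text and append_bounded are hypothetical names. In an actual context menu handler, the destination buffer and its capacity (the cchMax parameter) are supplied by the caller of IContextMenu::GetCommandString, and the handler must respect that capacity no matter how long the file names are.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Appends src to the NUL-terminated string in buf (capacity size bytes)
   without writing past the end. Returns false if src had to be cut short. */
static bool append_bounded(char *buf, size_t size, const char *src)
{
    size_t used = strlen(buf);
    size_t room = size - used;            /* always >= 1: buf is terminated */
    int n = snprintf(buf + used, room, "%s", src);
    return n >= 0 && (size_t)n < room;
}

/* Hypothetical helper: builds "Add the files A, B, and C to the archive."
   into buf. Returns true if the whole message fit; on false the message was
   truncated, but buf is still NUL-terminated and nothing was overrun. */
bool build_help_text(char *buf, size_t size,
                     const char *const names[], size_t count)
{
    if (size == 0)
        return false;                     /* no room even for the terminator */
    buf[0] = '\0';

    bool ok = append_bounded(buf, size, "Add the files ");
    for (size_t i = 0; i < count; i++) {
        if (i > 0) {
            /* ", " between items; "and" before the last one. */
            const char *sep = (i + 1 < count) ? ", "
                            : (count == 2)    ? " and "
                                              : ", and ";
            if (!append_bounded(buf, size, sep))
                ok = false;
        }
        if (!append_bounded(buf, size, names[i]))
            ok = false;
    }
    if (!append_bounded(buf, size, " to the archive."))
        ok = false;
    return ok;
}
```

    Truncated help text is merely cosmetic; an overrun buffer is exploitable. A caller that would rather not display a chopped-off message can check the return value and fall back to a shorter string.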

    Just because your program doesn't contact the Internet explicitly doesn't mean it's safe from Internet-based attacks.

    Seattle boating season opens but never closes


    This past weekend was Opening Day of the Seattle boating season. This tends to create traffic chaos in the Montlake neighborhood, which leads to confusing newspaper headlines like Opening Day closure. I remember many years ago asking a boat-owning colleague, "So, when does boating season close?"

    "Oh, it doesn't close."

    "Then why do they have an Opening Day for something that hasn't closed?"

    "It gives the slacker fair-weather boaters a target date to get their boats back in condition. It really should be called something like Bring Out Your Boats Day."

    It rather involved being on the other side of this airtight hatchway


    Not every code injection bug is a security hole.

    Yes, a code injection bug is a serious one indeed. But it doesn't become a security hole until it actually allows someone to do something they normally wouldn't be able to.

    For example, suppose there's a bug where if you type a really long file name into a particular edit control and click "Save", the program overflows a buffer. With enough work, you might be able to turn this into a code injection bug by entering a carefully-crafted file name. But that's not a security hole yet. All you've found so far is a serious bug. (Yes, it's odd to play down a serious bug, but I'm doing so only because I'm comparing it to a security hole.)

    Look at what you were able to do: You were able to get a program to execute code of your choosing. Big deal. You can already do that without having to go through all this effort. If you wanted to execute code of your own choosing, then you can just put it in a program and run it!

    The hard way

    1. Write the code that you want to inject and compile it to native machine code.
    2. Analyze the failure, develop a special string whose binary representation results in the overwriting of a return address, choosing the value so that it points back into the stack.
    3. Write an encoder that takes the code you wrote in step 1 and converts it into a string with no embedded zeros. (Because an embedded zero will be treated as a string terminator.)
    4. Write a decoder that itself contains no embedded zeros.
    5. Append the encoded result from step 3 to the decoder you wrote in step 4 and combine it with the binary representation you developed in step 2.
    6. Type the resulting string into the program.
    7. Watch your code run.

    The easy way

    1. Write the code that you want to inject. (You can use any language; it doesn't have to compile to native code.)
    2. Run it.

    It's like saying that somebody's home windows are insecure because a burglar could get into the house by merely unlocking and opening the windows from the inside. (But if the burglar has to get inside in order to unlock the windows...)

    Code injection doesn't become a security hole until you have elevation of privilege; in other words, until attackers gain the ability to do something they normally couldn't. If the attack vector requires setting a registry key, then the attacker must already have obtained the ability to run enough code to set a registry key, in which case they can just forget about "unlocking the window from the inside" and replace the code that sets the registry key with the full-on exploit. The alleged attack vector is a red herring. The burglar is already inside the house.

    Or suppose you found a technique to cause an application to log sensitive information, triggered by a setting that only administrators can enable. Therefore, in order to "exploit" this hole, you need to gain administrator privileges, in which case why stop at logging? Since you have administrator privileges, you can just replace the application with a hacked version that does whatever you want.

    Of course, code injection can indeed be a security hole if it permits elevation of privilege. For example, if you can inject code into a program running at a different security level, then you have the opportunity to elevate. This is why extreme care must be taken when writing Unix setuid-root programs and Windows services: These programs run with elevated privileges, and therefore any code injection bug becomes a fatal security hole.

    A common starting point from which to evaluate elevation of privilege is the Internet hacker. If some hacker on the Internet can inject code onto your computer, then they have successfully elevated their privileges, because that hacker didn't have the ability to execute arbitrary code on your machine prior to the exploit. Next time, we'll look at some perhaps-unexpected places your program can become vulnerable to an Internet attack, even if you think your program isn't network-facing.

    What can I do with the HINSTANCE returned by the ShellExecute function?


    As we saw earlier, in 16-bit Windows, the HINSTANCE identified a program. The Win32 kernel is a complete redesign from the 16-bit kernel, introducing such concepts as "kernel objects" and "security descriptors". In particular, 16-bit Windows didn't have "process IDs"; the instance handle served that purpose. That is why the WinExec and ShellExecute functions returned an HINSTANCE. But in the 32-bit world, an HINSTANCE does not uniquely identify a running program, since it is merely the base address of the executable. Since each program runs in its own address space, that value is hardly unique across the entire system.

    So what can you do with the HINSTANCE returned by the ShellExecute function? You can check whether it is greater than 32, which indicates that the call succeeded. If the value is 32 or less, then it is an error code. The precise value of the HINSTANCE in the greater-than-32 case is meaningless.
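
    In C, the check looks something like the following sketch. It is written against a plain intptr_t so that it compiles anywhere; in Windows code you would apply it to (INT_PTR)ShellExecute(...). The name shell_execute_succeeded is made up for illustration. On success it does nothing else with the value, because the specific number carries no meaning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interprets the value returned by ShellExecute after converting it to an
   integer. Greater than 32 means success, and the exact number is
   meaningless; 32 or less is an error code (0 means the system ran out of
   memory or resources). */
bool shell_execute_succeeded(intptr_t result)
{
    return result > 32;
}
```

    Note that a value of exactly 32 is itself an error (SE_ERR_DLLNOTFOUND), which is why the comparison is "greater than 32" rather than "at least 32".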

    Why am I bothering to tell you things that are already covered in MSDN? Because people still have trouble putting two and two together. I keep seeing people who take the HINSTANCE returned by the ShellExecute function and hunt through all the windows in the system looking for a window with a matching GWLP_HINSTANCE (or GWL_HINSTANCE if you're still living in the unenlightened non-64-bit-compatible world). This doesn't work for the two reasons I described above. First, the precise value of the HINSTANCE you get back is meaningless, and even if it were meaningful, it wouldn't do you any good since the HINSTANCE is not unique. (In fact, the HINSTANCE for a process is nearly always 0x00400000, since that is the default address most linkers assign to program executables.)

    The most common reason people want to pull this sort of trick in the first place is that they want to do something with the program that was just launched: typically, to wait for it to exit, indicating that the user has closed the document. Unfortunately, this plan comes with its own pitfalls.

    First, as we noted, the HINSTANCE that you get from the ShellExecute function is useless. You have to use the ShellExecuteEx function and set the SEE_MASK_NOCLOSEPROCESS flag in the SHELLEXECUTEINFO structure, at which point a handle to the process is returned in the hProcess member. But that still doesn't work.

    A document can be executed with no new process being created. The most common case (but hardly the only one) is when the registered handler for the document type requests a DDE conversation. In that case, an existing instance of the program accepts responsibility for the document. Waiting for the process to exit is not the same as waiting for the user to close the document, because closing the document doesn't exit the process.

    Just because the user closes the document doesn't mean that the process exits. Most programs will let you open a new document from the "File" menu. Once that new document is opened, the user can close the old one. (Single-document programs implicitly close the old document when the new one is opened.) What's more, closing all open windows associated with the document need not result in the program exiting. Some programs run in the background even after you've closed all their windows, either to provide some sort of continuing service, or simply because they anticipate that the user will run the program again soon, so they delay the final exit for a few minutes to see whether they will be needed.

    Just because the process exits doesn't mean that the document is closed. Some programs detect a previous instance and hand off the document to that instance. Other programs are stubs that launch another process to do the real work. In either case, the newly-created process exits quickly, but the document is still open, since the responsibility for the document has been handed off to another process.

    There is no uniform way to detect that a document has been closed. Each program handles it differently. If you're lucky, the program exposes properties that allow you to monitor the status of an open document. As we saw earlier, Internet Explorer exposes properties of its open windows through the ShellWindows object. I understand that Microsoft Office also exposes a rather elaborate set of automation interfaces for its component programs.

    On the bogusness of reporting the winning word in a spelling bee


    Whenever the United States media report on a spelling bee (typically, the Scripps National Spelling Bee, the best-known spelling bee in the country), they always report on the "winning word". But the winning word is a bogus metric because the winning word in real life tends to be comparatively easy. It's the penultimate word that is the hard one.

    In nearly all spelling bees, when the field narrows to just two contestants, if one contestant misses a word, the other contestant must spell that word plus a bonus word to win. Sort of like volleyball. The bonus word is not necessarily a hard word; in fact, just by the principle of regression to the mean, it is likely to be a comparatively easy word. The hard word is the one that knocked out the second-place finisher. Look at it this way: Nobody misspelled the winning word, so how hard can it be?

    Consider this hypothetical spelling bee:

    Judge: The word is "chiaroscuro".
    Player A: c-h-i-a-r-u-s-c-u-r-o.
    Judge: I'm sorry, that's incorrect. Player B?
    Player B: c-h-i-a-r-o-s-c-u-r-o.
    Judge: Correct. And your next word is "dog".
    Player B: d-o-g.
    Judge: Congratulations, Player B, you're the winner.

    [9am: How embarrassing. I misspelled "chiaroscuro".]

    The newspapers all report that "The winning word was 'dog'," and people reading the newspaper say, "Pshaw, I don't know why people get all worked up about this spelling bee thing. Even I can spell 'dog'."

    For example, in 2005, the "winning word" was "appoggiatura", a word any musician can spell in their sleep. The penultimate word was the somewhat more challenging "roscian".

    This year's Scripps National Spelling Bee will be held on May 31 and June 1, 2006.
