When I brought up the topic of
spelling bees earlier this year,
it triggered several comments on how various languages deal with
the issue of spelling.
Here are some thoughts on the topics that were brought up:
German spelling is
only partly phonetic.
Given the spelling of a word, one can, after applying a rather large
set of rules, determine its pronunciation with very high accuracy.
On the other hand,
given the pronunciation of a word, the spelling is not obvious.
For example, do you write "Feber" or "Vehber"
or possibly "Phäber"?
"Ist" or "isst"?
"Quelle" or "Kwälle"?
The fact that Germany is undergoing controversial spelling reform
proves that German spelling is not entirely predictable.
After all, if spelling were completely phonetic, there would be no
need for reform!
And all those pronunciation rules.
Sometimes a "d" is pronounced like "t";
sometimes a "t" is pronounced like "z";
sometimes a "g" is pronounced like "ch";
sometimes "st" is pronounced like "scht".
One would think that a truly "phonetically-spelled" language would have
a one-to-one correspondence between sounds and letters.
(I'm led to believe that many Eastern European languages are phonetic
in this way.)
Furthermore, given a word's spelling, it's not always obvious where the
accent goes.
For example, you just have to know that the accent in Krawatte
goes on the second syllable.
The spelling gives you no help.
Swedish is like German in this respect:
Given the spelling of a word, you can (again, after the application of
a rather large set of rules) determine its pronunciation with
a high degree of confidence.
But going in the other direction can be a nightmare.
The tricky "sj" sound goes by many spellings:
"sj", "stj", "sk", "ch", and sometimes even "g"
(in French-derived words).
Depending on the regional accent,
the pronunciation of a leading "s" can vary depending on the ending
of the previous word.
(Though I suspect most Swedes don't even hear the difference themselves.)
At least in English, we're honest about the fact that our spelling
English spelling only starts to become intuitive once you've learned
French, German, Middle English, Greek, Latin, and a handful of other
languages, learned British history (so you know who conquered whom when
and ransacked their language for new words), and learned how
the precursor-languages to modern English were pronounced at the
time the words were imported.
That last point is a problem common to many languages.
The spelling of a word tends to change
much more slowly than its pronunciation.
English retains the original spelling long after
the pronunciation has moved on.
Many Chinese characters are puzzling until you realize that the
word was pronounced differently a few thousand years ago.
(Yes, there is a phonetic component to Chinese characters, believe it
or not.)
Resistance to spelling reform in Germany is just another manifestation
of spelling inertia.
One thing I thought was interesting was the types of competitions
different languages use to promote correct spelling and/or grammar.
In the United States, spelling competitions (known as "spelling bees")
are the most common way of accomplishing this.
Students are each given a word to spell,
which must be done from memory.
Spell it correctly and you survive to the next round;
spell it incorrectly and you are eliminated.
It is my understanding that in Taiwan,
the analogous competition is the "dictionary look-up".
I'm hazy on the details, but I think the way it works is that
a character is shown to the class, and the students race to look
it up in the dictionary.
Since dictionaries are typically arranged phonetically,
a student who already knows how the character is pronounced
has an advantage over a student who has to count strokes
and perform radical decomposition in order to look it up.
I was not previously aware of
but they appear to be
particularly popular in Poland.
This allows greater emphasis to be placed on the complexity
of Polish grammar.
A former colleague of mine who grew up in Poland told me that
when she goes back to visit relatives,
it takes her a while to "regain her tongue" and stop making
You know you've got a complicated language when even a native
speaker has to get back up to speed.
I recall a bug that we were investigating that was being caused
by a registry key being set when it shouldn't have been.
But when you looked at the key in Regedit, it said
"(value not set)".
Why were we going down the "value is set" branch?
A little spelunking with the debugger revealed the reason:
Whoever set up that registry key wrote the literal
string "(value not set)" to the registry!
Thus, the value was set,
to the string "(value not set)"!
We were flabbergasted.
The only explanation we could come up with was that
whoever created the registry key didn't understand that
the "(value not set)" was shown by Regedit for a key with no
default value set.
Instead, they figured "Oh, I need it to look like that,
so I'll set the value to '(value not set)'. Now it looks
right in Regedit."
Along similar lines, I've been told of a system which appeared to have
two (Default) values.
As you can probably guess by now,
what really happened is that somebody created a value whose name
was literally "(Default)".
The moral of the story is not to confuse what something is
with what something looks like.
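To make the distinction concrete, here is a minimal sketch in Python.
It is a toy model, not the real registry API: a key is just a dictionary,
and the unnamed default value is modeled by the empty string as its name.
The function names are made up for illustration.

```python
# Toy model of a registry key: a dict mapping value names to data.
# Assumption: the unnamed default value is stored under the empty string.
PLACEHOLDER = "(value not set)"

def displayed_default(key):
    """What a Regedit-style viewer would show for the default value."""
    return key.get("", PLACEHOLDER)

def default_is_set(key):
    """Whether the default value actually exists, whatever it contains."""
    return "" in key

empty_key = {}                          # nobody ever set the default value
confused_key = {"": "(value not set)"}  # somebody stored the placeholder text

# Both keys look identical in the viewer...
same_appearance = displayed_default(empty_key) == displayed_default(confused_key)
# ...but only one of them sends the program down the "value is set" branch.
branches = (default_is_set(empty_key), default_is_set(confused_key))
```

The two keys are indistinguishable by what is displayed, but a check for
whether the value exists tells them apart.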
"OOF" is a word you hear a lot at Microsoft.
etymology a while back
(though my recollection is that it stood for
"Out of Office Feature", not that my memory is good for much nowadays).
Incidentally, KC is
profiled on the Microsoft Careers site,
though she goes under the top-secret code name "KC" there.
Most people set their "vacation" message to something pretty straightforward.
A brief message, a return date, and a flowchart of who can be contacted
in the meantime. Here's what one might look like.
(For the sake of illustration,
I made up a
as well as some imaginary members and a team mailing list.
I did not make up "Kansas", however.
Believe it or not, that's
a real state!)
In Kansas until March 3, checking email sporadically.
Teapot shading: Fred Smith
Teapot rotation: Bob Wilson
Teapot general: tpteam
The OOF is an opportunity for small-form-factor humor.
When he left on holiday at the end of December,
Marc Miller's OOF message introduced the "flowchart" section with
the heading "These people are probably also OOF".
Jensen Harris's OOF earlier this year read
Out of office, Thursday March 31. Back on Friday.
If you are injured, dial 911.
(But don't call 911 for a non-emergency like
On the other hand, KC called 911 because
she couldn't get out of bed.)
As for me, I try to keep my OOF under twenty words.
Part of the trick is getting rid of the "flowchart".
I remember one time I simply wrote
"Returning dd-mmm-yy. You'll just have to cope until then."
The "flowchart" section of the OOF is one of those places
where beginners go overboard, listing a half dozen topics and
the corresponding backup.
It's a sort of ego trip, where you can quietly show off,
"Wow, look at all the things I do.
How would you ever survive without me?"
As with email signatures and
the amassing of physical objects,
the more seasoned you become,
the more you value the ability to keep it short and simple.
If changing a setting requires administrator privileges in the first place,
then any behavior that results cannot be considered a security hole
because in order to alter the setting, attackers must already have gained
administrative privileges on the machine, at which point you've already
lost the game.
If attackers have administrative privileges,
they're not going to waste their time fiddling with some setting and
leveraging it to gain even more privileges on the system.
They're already the administrator;
why go to more work to get what they already have?
One reaction to this is to try to "secure" the feature by asking,
"Well, can we make it harder to change that setting?"
For example, in response to the Image File Execution Options key,
Norman Diamond suggested "only allowing the launching of known debuggers."
But this solution doesn't actually solve anything.
What would a "known debugger" be?
Besides, it doesn't matter how much you do to make the Image File
Execution Options key resistant to unwanted tampering.
If the attacker has administrative privileges on your machine,
they won't bother with Image File Execution Options anyway.
They'll just install a rootkit and celebrate the addition of another
machine to their robot army.
Such is the futility of trying to stop someone who has already
obtained administrative privileges.
You're just closing the barn door after the horse has bolted.
Whenever there was a scene in
Mission: Impossible III
that took place at the agency offices,
I was repeatedly bothered by the fact that all the people
in the building are wearing their identification badges
clipped to their jackets or shirts.
Except Ethan Hunt.
He gets to walk through the halls like a cologne advertisement.
Why doesn't he have to wear identification?
His boss has to wear identification.
His boss's boss has to wear identification.
But Ethan Hunt gets to just wander around in black looking cool
without any unsightly identification tag that would ruin the
look of the whole outfit.
I was also somewhat off-put,
as was Bob Mondello,
that the producers thought it necessary to identify cities as
"Berlin, Germany", "Rome, Italy", and "Shanghai, China".
Do they think we're so stupid that we don't know where Berlin is?
(And keep an eye out for the American-style fire alarm during the
chase through Shanghai just as Ethan Hunt turns a corner.
At first glance I thought it said "REEB", but upon further reflection
I believe the last two letters are more likely to be
I don't know what the first two letters stand for, or even if I
remembered them correctly.)
Last time, we left off with a promise to discuss ways your program
can be Internet-facing without your even realizing it,
and probably the most common place for this is the command line.
Files can be shared across the Internet and accessed via UNC notation.
This means that anybody can set up a CIFS server and create files like
and they will look to the world like a file on a file server somewhere
(because that is, in fact, what it is).
When you double-click it, you're launching the document.
And that's where the command line attack comes from.
Suppose your program is a handler for a file association.
Say, your program is litware.exe and it is the
registered handler for .LIT files.
The attacker just has to create a file called
and induce the user into double-clicking it.
Once that's done, your program will be run with the command
line you registered, which will probably be
Notice that the attacker controls the path.
This means that if you have a bug in your command line parser,
the attacker can exploit it.
Code injection via the command line is an elevation of privilege.
Note that this extends beyond merely extra-long file names.
If you registered your verb incorrectly by forgetting to put
quotation marks around the file name insertion %1,
the attacker can hatch a file with an odd name like
\\server.example.com\strange -uninstall path.lit.
The resulting command line is therefore
\\server.example.com\strange -uninstall path.lit
Your parser then breaks the command line up into words
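and interprets this command line as having three parts:
the file name \\server.example.com\strange, the switch -uninstall,
and the file name path.lit.
The program then tries to load the file \\server.example.com\strange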
and fails, possibly displaying an error message, then it uninstalls itself,
and then tries (and fails) to load the file path.lit.
The user gets two strange error messages and the program is uninstalled.
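Here is a rough sketch of why the quotation marks matter.
The two registration strings (litware.exe %1 and litware.exe "%1") are
assumptions for the sake of illustration, and the tokenizer is a deliberately
naive one that splits on spaces and honors double quotes;
it is not the parsing that Windows or any particular runtime performs.

```python
def split_command_line(cmd):
    """Naive tokenizer: split on spaces, treating "..." as a single word."""
    words, current, in_quotes = [], "", False
    for ch in cmd:
        if ch == '"':
            in_quotes = not in_quotes      # quotes group words, and are dropped
        elif ch == " " and not in_quotes:
            if current:
                words.append(current)
                current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words

# The attacker-chosen file name from the example above.
path = r"\\server.example.com\strange -uninstall path.lit"

# Hypothetical verb registrations, without and with quotes around %1.
unquoted = "litware.exe %1".replace("%1", path)
quoted = 'litware.exe "%1"'.replace("%1", path)

unquoted_words = split_command_line(unquoted)  # file name falls apart into three words
quoted_words = split_command_line(quoted)      # file name stays in one piece
```

With the unquoted registration, the parser sees a stray -uninstall word that
the attacker smuggled in through the file name.
With the quoted one, the whole path arrives as a single argument.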
Of course, the attacker also controls the contents of the file,
so any vulnerabilities in your file parser can be exploited as well.
Code injection via file contents is an elevation of privilege.
If you write a shell extension, your extension will run
if the user activates it on the remote file.
For example, if you have a context menu extension, it will
be instantiated and initialized with the remote file as the
Many context menu extensions contain buffer overflow bugs
in the way they handle the names of the files that the user
selected.
(Notice that I said "names"—plural.
The user might multi-select files and right-click on them.)
For example, a certain shareware file archival program responds to
the GCS_HELPTEXT request by taking the names of
all the files and combining them into the message
"Add the files A, B, C, D, and E to the archive."
Unfortunately, when the names A, B, C,
D, and E are very long, an exploitable buffer overrun occurs.
Code injection triggered by file name length is an elevation of privilege.
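The defensive idea is to build the message against a known limit rather than
trusting the file names to be short.
Python strings cannot overrun a buffer, so the sketch below only illustrates
the logic; build_help_text and max_chars are made-up names, with max_chars
standing in for the size of the buffer the caller provided.

```python
def build_help_text(names, max_chars):
    """Build the "Add the files ..." message, never exceeding max_chars."""
    if max_chars < 3:
        raise ValueError("max_chars must leave room for the ellipsis")
    if len(names) <= 2:
        listing = " and ".join(names)
    else:
        listing = ", ".join(names[:-1]) + ", and " + names[-1]
    message = "Add the files " + listing + " to the archive."
    if len(message) > max_chars:
        # Truncate instead of writing past the end of the caller's buffer.
        message = message[: max_chars - 3] + "..."
    return message

short = build_help_text(["A", "B", "C", "D", "E"], 100)
clipped = build_help_text(["x" * 500] * 5, 64)   # hostile, very long names
```

However long the attacker makes the names, the result never exceeds the limit.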
Just because your program doesn't contact the Internet explicitly
doesn't mean it's safe from Internet-based attacks.
This past weekend was
Opening Day of the Seattle boating season.
This tends to create traffic chaos in the Montlake neighborhood,
which leads to confusing newspaper headlines like
Opening Day closure.
I remember many years ago asking a boat-owning colleague,
"So, when does boating season close?"
"It doesn't close."
"Then why do they have an Opening Day for something that hasn't closed?"
"It gives the slacker fair-weather boaters a target date to get their
boats back in condition.
It really should be called something like Bring Out Your Boats Day."
Not every code injection bug is a security hole.
Yes, a code injection bug is a serious one indeed.
But it doesn't become a security hole until it actually
allows someone to do something they normally wouldn't be able to.
For example, suppose there's a bug where if you type
a really long file name into a particular edit control
and click "Save",
the program overflows a buffer.
With enough work, you might be able to turn this into a code
injection bug, by entering a carefully-crafted file name.
But that's not a security hole yet.
All you've found so far is a serious bug.
(Yes, it's odd that I'm underplaying a serious bug, but only because
I'm comparing it to a security hole.)
Look at what you were able to do:
You were able to get a program to execute code of your choosing.
You can already do that without having to go through all this effort.
If you wanted to execute code of your own choosing,
then you can just put it in a program and run it!
It's like saying that somebody's home windows are insecure because
a burglar could get into the house by merely unlocking and opening
the windows from the inside.
(But if the burglar has to get inside in order to unlock the windows...)
Code injection doesn't become a security hole until you have elevation
of privilege; in other words, until attackers gain the ability to do something
they normally wouldn't be able to do.
If the attack vector requires setting a registry key, then
the attacker must already have obtained the ability to run enough code
to set a registry key, in which case they can just forget about
"unlocking the window from the inside" and just replace the code
that sets the registry key with the full-on exploit.
The alleged attack vector is a red herring.
The burglar is already inside the house.
Or suppose you found a technique to cause an application to
log sensitive information, triggered by a setting that only administrators
can change.
Therefore, in order to "exploit" this hole, you need to gain
administrator privileges, in which case why stop at logging?
Since you have administrator privileges, you can just replace the
application with a hacked version that does whatever you want.
Of course, code injection can indeed be a security hole if it
permits elevation of privilege.
For example, if you can inject code into a program running at
a different security level, then you have the opportunity to elevate.
This is why extreme care must be taken when writing
Unix setuid-root programs and Windows services:
These programs run with elevated privileges and therefore any
code injection bug becomes a fatal security hole.
A common starting point from which to evaluate elevation of privilege is
the Internet hacker.
If some hacker on the Internet can inject code onto your computer,
then they have successfully elevated their privileges, because
that hacker didn't have the ability to execute arbitrary code on your
machine prior to the exploit.
Next time, we'll look at some perhaps-unexpected places your program
can become vulnerable to an Internet attack, even if you think
your program isn't network-facing.
As we saw earlier,
in 16-bit Windows, the HINSTANCE identified a program.
The Win32 kernel is a complete redesign from the 16-bit kernel,
introducing such concepts as "kernel objects" and "security
In particular, 16-bit Windows didn't have "process IDs"; the
instance handle served that purpose.
That is why the WinExec and ShellExecute
functions returned an HINSTANCE.
But in the 32-bit world, an HINSTANCE does not uniquely
identify a running program since it is merely the base address of the
executable.
Since each program runs in its own address space, that value is
hardly unique across the entire system.
So what can you do with the
HINSTANCE returned by the ShellExecute function?
You can check if it is greater than 32, indicating that the call was
successful.
If the value is 32 or less, then it is an error code.
The precise value of the HINSTANCE in the
greater-than-32 case is meaningless.
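In other words, the only sensible thing to do with the return value is
to compare it against 32.
The sketch below models that check in Python on a plain integer; it does not
call ShellExecute itself, and the helper name is made up.
The error names are a partial list taken from the SE_ERR_ constants in
shellapi.h.

```python
# Partial list of ShellExecute error values (SE_ERR_* in shellapi.h).
SHELLEXECUTE_ERRORS = {
    2: "SE_ERR_FNF (file not found)",
    3: "SE_ERR_PNF (path not found)",
    5: "SE_ERR_ACCESSDENIED",
    8: "SE_ERR_OOM",
    31: "SE_ERR_NOASSOC",
}

def interpret_shellexecute_result(value):
    """Interpret the HINSTANCE returned by ShellExecute, cast to an integer."""
    if value > 32:
        # Success. The exact number carries no further information.
        return "success"
    return SHELLEXECUTE_ERRORS.get(value, f"error code {value}")

results = [interpret_shellexecute_result(v) for v in (0x00400000, 42, 31, 2)]
```

Note that 0x00400000 and 42 are treated identically: both are merely
"greater than 32".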
Why am I bothering to tell you things that are already covered in MSDN?
Because people still have trouble putting two and two together.
I keep seeing people who take the HINSTANCE
returned by the ShellExecute function
and hunt through all the windows in the system looking
for a window with a matching GWLP_HINSTANCE
(or GWL_HINSTANCE if you're still living in the
unenlightened non-64-bit-compatible world).
This doesn't work for the two reasons I described above.
First, the precise value of the
HINSTANCE you get back is meaningless,
and even if it were meaningful, it wouldn't do you any good
since the HINSTANCE is not unique.
(In fact, the HINSTANCE for a process is nearly
always 0x00400000, since that is the default address most linkers
assign to program executables.)
The most common reason people want to pull this sort of trick in the
first place is that they want to do something with the program that
was just launched, typically to wait for it to exit, indicating
that the user has closed the document.
Unfortunately, this plan comes with its own pitfalls.
First, as we noted, the HINSTANCE that you get
from the ShellExecute function is useless.
You have to use the ShellExecuteEx function
and set the SEE_MASK_NOCLOSEPROCESS flag
in the SHELLEXECUTEINFO structure,
at which point a handle to the process is returned in the
hProcess member.
But that still doesn't work.
A document can be executed with no new process being created.
The most common case (but hardly the only such) in which
you will encounter this is if the
registered handler for
the document type requested a DDE conversation.
In that case, an existing instance of the program has
accepted responsibility for the document.
Waiting for the process to exit is not the same as waiting
for the user to close the document, because closing the
document doesn't exit the process.
Just because the user closes the document doesn't mean that
the process exits.
Most programs will let you open a new document from the
Once that new document is opened, the user can close the old one.
(Single-document programs implicitly close the old document
when the new one is opened.)
What's more, closing all open windows associated with the
document need not result in the program exiting.
Some programs run in the background even after you've closed
all their windows, either to provide some sort of continuing
service, or simply because they anticipate that the
user will run the program again soon so they delay the final
exit for a few minutes to see if they will be needed.
Just because the process exits doesn't mean that the
document is closed.
Some programs detect a previous instance and hand off the
document to that instance.
Other programs are stubs that launch another process to do
the real work.
In either case, the newly-created process exits quickly,
but the document is still open, since the responsibility
for the document has been handed off to another process.
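The stub case is easy to reproduce.
The following sketch uses Python's subprocess module rather than
ShellExecuteEx, and everything in it (the stub, the worker, the "go" and
"done" marker files) is invented for illustration.
The stub launches a worker and exits immediately; the worker keeps the
"document" open until it is told to close it.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import time

WORKER = """
import os, sys, time
go, done = sys.argv[1], sys.argv[2]
while not os.path.exists(go):     # the "document" stays open until told to close
    time.sleep(0.05)
open(done, "w").close()           # record that the document has been closed
"""

STUB = """
import subprocess, sys
# Hand the document off to another process, then exit right away.
subprocess.Popen([sys.executable, "-c", sys.argv[1], sys.argv[2], sys.argv[3]])
"""

def launch_and_wait_for_stub():
    tmp = tempfile.mkdtemp()
    go, done = os.path.join(tmp, "go"), os.path.join(tmp, "done")
    try:
        # Wait for the process we launched (the stub) to exit.
        subprocess.run([sys.executable, "-c", STUB, WORKER, go, done], check=True)
        # The stub is gone, but the worker still has the document open.
        open_after_stub_exit = not os.path.exists(done)
        # Now "close the document" and give the worker a moment to notice.
        open(go, "w").close()
        deadline = time.time() + 10
        while not os.path.exists(done) and time.time() < deadline:
            time.sleep(0.05)
        return open_after_stub_exit, os.path.exists(done)
    finally:
        shutil.rmtree(tmp, ignore_errors=True)
```

Waiting on the launched process returns as soon as the stub exits, even though
the document is not closed until much later.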
There is no uniform way to detect that a document has been closed.
Each program handles it differently.
If you're lucky, the program exposes properties that allow you
to monitor the status of an open document.
As we saw earlier,
Internet Explorer exposes properties of its open windows through the
I understand that Microsoft Office also exposes a rather elaborate
set of automation interfaces for its component programs.
Whenever the United States media report on a spelling bee
(typically the Scripps National Spelling Bee,
the best-known spelling bee in the country),
they always report on the winning word.
But the winning word is a bogus metric because the winning word
in real life tends to be comparatively easy.
It's the penultimate word that is the hard one.
In nearly all spelling bees, when the field narrows to just two
contestants, if one contestant misses a word, the other contestant
must spell that word plus a bonus word to win.
Sort of like volleyball.
The bonus word is not necessarily a hard word; in fact, just by
the principle of regression to the mean, it is likely to
be a comparatively easy word.
The hard word is the one that knocked out the second-place winner.
Look at it this way:
Nobody misspelled the winning word, so how hard can it be?
Consider this hypothetical spelling bee:
Judge: The word is "chiaroscuro".
Player A: c-h-i-a-r-u-s-c-u-r-o.
Judge: I'm sorry, that's incorrect. Player B?
Player B: c-h-i-a-r-o-s-c-u-r-o.
Judge: Correct. And your next word is "dog".
Player B: d-o-g.
Judge: Congratulations, Player B, you're the winner.
[9am: How embarrassing. I misspelled "chiaroscuro".]
The newspapers all report that "The winning word was 'dog',"
and people reading the newspaper say,
"Pshaw, I don't know why people get all worked up about this spelling
Even I can spell 'dog'."
For example, in 2005, the "winning word" was
"appoggiatura", a word any musician can spell in their sleep.
The penultimate word was the somewhat more challenging "roscian".
This year's Scripps National Spelling Bee will be held on
May 31 and June 1, 2006.