I think the Turkish folks have it right.

After all, say that we had all of the following characters in English:

  1. I   U+0049   LATIN CAPITAL LETTER I
  2. i   U+0069   LATIN SMALL LETTER I
  3. İ   U+0130   LATIN CAPITAL LETTER I WITH DOT ABOVE
  4. ı   U+0131   LATIN SMALL LETTER DOTLESS I

Wouldn't we do the case mapping to put the dotted and dotless variants together (so that both #1/#4 and #3/#2 would be case pairs)? Be honest, doesn't that make more sense?
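
To make that pairing concrete, here is a minimal Python sketch of the two case pairs. The tables and the function names `turkic_upper` and `turkic_lower` are made up for illustration and do not come from any library, and the sketch ignores the combining dot above (U+0307) that real Turkic casing rules also have to handle.

```python
# Hypothetical tables for illustration; not taken from any library.
TURKIC_UPPER = {"i": "\u0130", "\u0131": "I"}   # i -> İ, ı -> I
TURKIC_LOWER = {"\u0130": "i", "I": "\u0131"}   # İ -> i, I -> ı


def turkic_upper(text: str) -> str:
    """Uppercase, keeping the dotted and dotless letters in separate pairs."""
    return "".join(TURKIC_UPPER.get(ch, ch.upper()) for ch in text)


def turkic_lower(text: str) -> str:
    """Lowercase, keeping the dotted and dotless letters in separate pairs."""
    return "".join(TURKIC_LOWER.get(ch, ch.lower()) for ch in text)


if __name__ == "__main__":
    # Show where each of the four letters goes under this pairing.
    for ch in "Ii\u0130\u0131":
        print(ch, "upper:", turkic_upper(ch), "lower:", turkic_lower(ch))
```

With this pairing every letter survives a round trip through the other case. Python's built-in `str.upper()`, which is not locale-sensitive, sends both "i" and "ı" to "I", so the distinction is lost on the way back.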

We even have a good reason, if you think about it. I mean, it's not like the "I" in "him" sounds like the one in "nice", and neither of them sounds like the one in "niece", and none of them sounds like the one with no sound in "friend". So with all of those different sounds, English would be a lot simpler if we had an extra pair of letters to work with. I have talked to a lot of native speakers of other languages about languages (occupational hazard), and many suggest that one of the hard things about learning English is the multiple sounds for the same letter. We could actually move towards simplifying things by adding the complication of a few variations on letters....

Ah well, that probably won't happen. But hopefully you can see the basis that languages might have for wanting an "Å" or an "Ö" or a "Č" or an "İ" in their midst. And then, like I pointed out at the beginning of this post, if all of the variants of "I" did exist, it would be crazy to case them in any other way....

Of course, as you may have imagined, this plan does not exactly co-exist well with case-insensitive registries or filesystems (like FAT and NTFS). Suddenly that idea that seems more sensible looks like an awful security risk (I do not even have to imagine; I have built versions of Windows on my own development machine that would not boot because they were unable to find the "HKLM\SOFTWARE\MICROSOFT\Windows" registry key, and I have heard tales of the ones that were unable to find WIN.ini). And I have witnessed code reviews that had scores of developers scan through thousands of files in the .NET Framework to (among other things) make sure "Turkic" casing was not used when looking at the filesystem or the registry. It's amazing how difficult and expensive it can be to make a product behave intuitively....

See how I slipped the proper design into that last paragraph? If you said "yes" then I feel very clever, otherwise I don't. :-)

The right design is to use CultureInfo.CurrentCulture in your .NET code any time you want to get the (possibly different) casing behavior seen in Turkish and Azeri, like in strings that your end users would see. At the same time, you would use CultureInfo.InvariantCulture for those cases where you want the invariant, unchanging behavior, like names in the filesystem or the registry. And in unmanaged code you want LCMapString, with the LCMAP_UPPERCASE/LCMAP_LOWERCASE transformations, to use or not use the LCMAP_LINGUISTIC_CASING flag, depending on the same conditions.
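
Here is a rough Python sketch of that split. It is an analogy, not the .NET or Win32 API: Python has no culture-sensitive casing, so the Turkic rule is hand-rolled, and the names `upper_for_display` and `upper_for_lookup` are hypothetical. The first plays the role of culture-sensitive casing for text the user reads, the second the role of invariant casing for registry keys and file names.

```python
# Hypothetical helpers for illustration; this is not the .NET or Win32 API.
TURKIC_UPPER = {"i": "\u0130", "\u0131": "I"}   # i -> İ, ı -> I


def upper_for_display(text: str, turkic: bool) -> str:
    """Casing for strings the end user will see; follows Turkic rules if asked."""
    if turkic:
        return "".join(TURKIC_UPPER.get(ch, ch.upper()) for ch in text)
    return text.upper()


def upper_for_lookup(text: str) -> str:
    """Casing for identifiers; str.upper() is the same on every machine."""
    return text.upper()


if __name__ == "__main__":
    stored_key = r"HKLM\SOFTWARE\MICROSOFT\WINDOWS"
    wanted_key = r"HKLM\SOFTWARE\MICROSOFT\Windows"

    # Invariant casing finds the key no matter what the user's language is.
    print(upper_for_lookup(wanted_key) == stored_key)

    # Turkic casing turns the "i" in "Windows" into U+0130, so the match fails.
    print(upper_for_display(wanted_key, turkic=True) == stored_key)
```

The second comparison is the kind of mismatch that keeps a registry key from being found: the uppercased name contains "İ" where the stored name has "I".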

It's easy to remember it and do it, if you learn it in the first place. :-)