Sorting it all Out Michael Kaplan's random stuff of dubious value Be sure to read the disclaimer here first!
J. Daniel Smith asked about ToLower() (and ToUpper()) and some trouble he was having with them:
The comment about Turkish in the docs with regards to "i" doesn't carry a lot of weight with fellow programmers and we only care about 8 languages: English, FIGS and CJK. One example that occurs to me is the word "Straße" in German. When upper-cased it should become "STRASSE" (no ß), but I can't seem to get code to do that. Also, being a noun, you can't lower-case this word as nouns always start with a capital in German; "straße" is wrong (unless there is a verb "strassen").
Windows and the .NET Framework mainly support simple, reversible casing -- that is, single code point casing in which ToUpper() and ToLower() are inverse operations that can "undo" each other. As such, you cannot use either method to convert between "ß" and "SS".
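To make the distinction concrete, here is a small sketch in Python rather than .NET, chosen only because Python's built-in str.upper() applies the full Unicode case mapping, so it shows the behavior J. Daniel was expecting. The simple_upper helper is hypothetical and only approximates single code point casing (it leaves alone any character whose uppercase form would grow beyond one code point); it is not what Windows or the .NET Framework actually run.

```python
def simple_upper(text: str) -> str:
    """Approximate single-code-point uppercasing (hypothetical helper).

    A character is uppercased only when the result is still exactly one
    code point; otherwise it is kept unchanged, as "ß" is here.
    """
    out = []
    for ch in text:
        upper = ch.upper()
        out.append(upper if len(upper) == 1 else ch)
    return "".join(out)


word = "Straße"

# One code point in, one code point out: the sharp s survives untouched.
print(simple_upper(word))

# Full case mapping expands the sharp s into two letters.
print(word.upper())

# The expansion cannot be undone: lowercasing gives "strasse", not "straße".
print(word.upper().lower() == word.lower())
```

The last line prints False, which is the whole problem in miniature: once "ß" has become "SS", nothing in the string says which "ss" used to be a sharp s.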
Comparison, on the other hand, will handle this case. If you compare "ß" to "SS" with CompareString and the NORM_IGNORECASE flag in Windows or the CompareInfo.Compare method and the CompareOptions.IgnoreCase flag in the .NET Framework, the two strings will be considered equal. Because in truth, they are equal -- just a case pair apart....
This happens on all locales, not just in German -- because the "ß" (U+00df, a.k.a. LATIN SMALL LETTER SHARP S) is considered to be a simple case difference away from "SS" in the default table. Give it a try!
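If you do not have a Windows box handy, the same idea (comparison succeeding where casing cannot) can be sketched with Unicode case folding in Python. To be clear about the assumptions: this is str.casefold(), not CompareString or CompareInfo.Compare, and equal_ignoring_case is a hypothetical helper name. It is only an analogy for the case-insensitive comparison described above.

```python
def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after Unicode case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


# Case folding maps the sharp s to "ss", so these compare as equal.
print(equal_ignoring_case("ß", "SS"))
print(equal_ignoring_case("Straße", "STRASSE"))

# Plain lowercasing is not enough: "ß" stays "ß" while "SS" becomes "ss".
print("ß".lower() == "SS".lower())
```

The first two comparisons print True and the last prints False, which mirrors the split between comparison and casing.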
J. Daniel went on further to ask some additional questions:
In German, there is always an alternate spelling for words with umlauts: "für" is the same as "fuer". However, the converse is not always true; not every "ue" can be replaced with "ü". Similarly for "ß", it can always be replaced with "ss" (and must when UPPER-CASING as there is no such thing as an upper-case "ß"). But not every "ss" can be replaced with "ß". First, I can't seem to get ToUpper() to turn "ß" into "SS". Second, how do I correctly deal with "für"=="fuer"?
Ok, I think I took care of explaining the deal with the Sharp S. But let me add that this is not a conditional operation -- Windows is neither drawing on huge German dictionaries to avoid treating them with this sort of equivalency nor using machine reading techniques and schoolboy knowledge of German to read the text....
For the second point, you will want to look at what is known as the German Phonebook Sort -- LCID of 0x00010407. It will have all of the following equivalences in collation:
Ä == AE
ä == ae
Ö == OE
ö == oe
Ü == UE
ü == ue
You can just think of collation as the technology that will travel to where casing fears to go.... :-)
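For a feel of what those equivalences mean in practice, here is a toy sketch in Python. It is emphatically not how Windows builds sort keys for the German Phonebook Sort; phonebook_key and PHONEBOOK_EXPANSIONS are hypothetical names, and the table contains only the six pairs listed above. The case folding step at the end is my own addition so that the comparison also ignores case.

```python
# The six expansions listed above, as a translation table (illustrative only).
PHONEBOOK_EXPANSIONS = str.maketrans({
    "Ä": "AE", "ä": "ae",
    "Ö": "OE", "ö": "oe",
    "Ü": "UE", "ü": "ue",
})


def phonebook_key(text: str) -> str:
    """Build a toy comparison key: expand umlauts, then fold case."""
    return text.translate(PHONEBOOK_EXPANSIONS).casefold()


print(phonebook_key("für") == phonebook_key("fuer"))
print(phonebook_key("Ärger") == phonebook_key("AERGER"))

# Unconditional, just as described: every "ue" matches "ü", word or not.
print(phonebook_key("Duell") == phonebook_key("Düll"))
```

All three comparisons print True. The last one is the interesting one: no dictionary is consulted, so "Duell" matches "Düll" even though no German speaker would swap them.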
This post is sponsored by "Ä" (U+00c4, a.k.a. LATIN CAPITAL LETTER A WITH DIAERESIS)
(Apologies once again for the Dogma/Carlin allusion in the title) I'll start by posting the moral of