(Apologies to those who are offended by the South Park movie scene that inspired the title of this post!)

About a month ago, Daniel J. Smith asked me something that prompted me to say "Dere are qvestions? In zat case..."

Then last week, Martin Müller asked in the microsoft.public.dotnet.internationalization newsgroup:

Recently I've stumbled across the fact that the CompareInfo for my default culture de-DE as well as for InvariantCulture considers "ss" and German "ß" (szlig) equivalent, which is not correct!

For example, calling "lassen".IndexOf("ß") yields 2 instead of -1.

CultureInfo.InvariantCulture.CompareInfo.Compare("lassen", "laßen") returns 0, which is wrong, too.

Using CompareInfo.IndexOf() without special CompareOptions gives the same incorrect results. When I use CompareOptions.Ordinal, however, IndexOf correctly returns -1 and Compare returns inequality. But CompareOptions.Ordinal cannot be combined with any other flag, so a case-insensitive comparison isn't possible this way.

This bug occurs with IndexOf and Compare of both String and CompareInfo.

Any comment on this or info when this will be fixed?

Well, I have a comment, but things are working as designed so nothing is going to be "fixed". I will explain....

In the German language, the Sharp S ("ß" or U+00df) is a lowercase letter, and it capitalizes to the letters "SS". Now Microsoft's casing tables only support simple Unicode casing, which does not include rules such as this one that would change the length of the string. So a "ß".ToUpper() call will not return "SS".

(For more info on those casing rules, see SpecialCasing.txt and CaseFolding.txt in the Unicode Character Database.)
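To make the casing point concrete, here is a minimal C# sketch (it is not part of the original discussion). It assumes a compiler that supports top-level statements (C# 9 or later), and it uses ToUpperInvariant() rather than the culture-sensitive ToUpper() so that the result does not depend on the current culture. The variable names are made up for illustration.

```csharp
using System;

// Simple (one-to-one) case mapping never changes the length of a string,
// so the Sharp S cannot turn into the two letters "SS".
string sharpS = "\u00DF";                    // "ß", LATIN SMALL LETTER SHARP S
string upper = sharpS.ToUpperInvariant();

Console.WriteLine(upper.Length);             // 1: the string did not grow
Console.WriteLine(upper == "SS");            // False (== on strings is ordinal)
Console.WriteLine("ss".ToUpperInvariant());  // SS
```

So the equality "ss".ToUpper() == "ß".ToUpper() only holds under the full Unicode casing rules, not under the simple casing that the framework applies.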

But in any case, collation can be a bit more flexible. Since the Sharp S is very much a German letter and not one widely used outside of German, its German behavior is included in the default table used by all locales (German can stay in the default table, and its rules apply to every locale that does not conflict with them).

But obviously in most locales, "ss" is what uppercases to "SS". Even in German, "ss" would uppercase to "SS".

So it is only logical to assume that, in such a case, if

"ss".ToUpper() == "ß".ToUpper() == "SS"

then

"ss" == "ß"

at least for the technical purpose of making it possible to treat these other cases properly. This is why, on almost all locales (including the invariant locale), "ß" looks so much like "ss".
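For anyone who wants to see the difference between the linguistic and the ordinal view on their own machine, here is a small C# sketch (again assuming top-level statements, C# 9 or later; the variable names are illustrative). The linguistic results depend on the collation data the runtime uses, so no output is claimed for them here: the question above reports 0 and 2, and other platforms or newer runtimes may well answer differently. The ordinal results do not depend on any locale.

```csharp
using System;
using System.Globalization;

CompareInfo invariant = CultureInfo.InvariantCulture.CompareInfo;

// Linguistic (collation-based) operations: results depend on the
// sorting tables of the platform, so they are printed without comment.
Console.WriteLine(invariant.Compare("lassen", "la\u00DFen", CompareOptions.None));
Console.WriteLine("lassen".IndexOf("\u00DF", StringComparison.InvariantCulture));

// Ordinal operations compare raw UTF-16 code units, so "ß" never matches "ss".
bool ordinalEqual = string.CompareOrdinal("lassen", "la\u00DFen") == 0;
int ordinalIndex = "lassen".IndexOf("\u00DF", StringComparison.Ordinal);

Console.WriteLine(ordinalEqual);  // False
Console.WriteLine(ordinalIndex);  // -1
```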

 

This post is brought to you by "ß" (U+00df, a.k.a. LATIN SMALL LETTER SHARP S)
And really, who else would it be? :-)