There's no "I" in IDN, part 5: Stephen Colbert's job is not in any jeopardy

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!

I suspect some of my readers are either fans or at least regular watchers of The Colbert Report.

Perhaps just my smarter readers.

Or maybe just the ones with basic cable....

Today's blog ends up being about a combination "tip of the hat/wag of the finger" question.

In honor of Stephen Colbert....

The question goes something like this:

SUBJECT: String.Compare for double byte characters in .Net

I have following two string characters whose comparisons in SQL are equal, however I couldn’t figure out any comparisons in .net (culture/ordinal/case insensitive) that would return me equality. Any ideas?

Goal is to not change SQL settings, but to find insensitive compare in .net.

String.Compare(
    "0336753496aaa@ae2.dion.ne.jp",
    "0336753496aaa@ae2.dion.ne.jp",
    CultureInfo.InvariantCulture,
    CompareOptions.IgnoreCase)

OR (Tried all combinations)

String.Compare(
    "0336753496aaa@ae2.dion.ne.jp",
    "0336753496aaa@ae2.dion.ne.jp",
    StringComparison.OrdinalIgnoreCase)

Now I'll consolidate the many different tips and wags:

First of all, a wag of the finger since the question referred to "double byte characters" despite every string involved using Unicode, in a language (C#) that uses Unicode.

Perhaps somewhat forgivable since the example was clearly referencing Japan, so perhaps the questioner was thinking about Japanese at the time. And therefore "double byte" was just old school thinking about CJK. Kind of like how they never migrated all those people off the FAREAST domain, even as everything else started referencing east Asia. Even though domain account migrations are so much easier these days after those thousands of migrations in Windows kind of forced ITG to get better at it....

Second of all, a tip of the hat to the genuine attempt to try to do comparisons that fold out distinctions in an attempt to get parity between SQL Server and the .NET Framework.

Third of all, a wag of the finger for ignoring the most important distinction in this case -- the implicit Width Insensitive nature of all _C*_A* collations in SQL Server, which could have been simulated by adding CompareOptions.IgnoreWidth to the first call, had the collation names not masked the fundamental nature of the "hidden width" that makes me wonder if someone in SQL Server isn't worried about their weight too much....
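
For anyone who wants to see the width point in action, here is a minimal sketch. It assumes the second string differed from the first only by fullwidth digits; that is my assumption for illustration, since the strings as quoted above show no visible difference. WidenDigits is a hypothetical helper, not part of the .NET Framework, and the class name is made up too.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class WidthInsensitiveCompareSketch
{
    // Hypothetical helper: maps ASCII digits '0'..'9' to their fullwidth
    // forms U+FF10..U+FF19 by adding the fixed offset 0xFEE0.
    public static string WidenDigits(string s) =>
        new string(s.Select(c => c >= '0' && c <= '9' ? (char)(c + 0xFEE0) : c).ToArray());

    public static void Main()
    {
        string narrow = "0336753496aaa@ae2.dion.ne.jp";
        // Assumed input for illustration only, not taken from the original question.
        string wide = WidenDigits(narrow);

        // The first call from the question: case is folded, width is not.
        int caseOnly = String.Compare(
            narrow, wide,
            CultureInfo.InvariantCulture,
            CompareOptions.IgnoreCase);

        // The same call with width folded as well.
        int caseAndWidth = String.Compare(
            narrow, wide,
            CultureInfo.InvariantCulture,
            CompareOptions.IgnoreCase | CompareOptions.IgnoreWidth);

        Console.WriteLine("IgnoreCase only:          " + caseOnly);
        Console.WriteLine("IgnoreCase | IgnoreWidth: " + caseAndWidth);
    }
}
```

If the assumption holds, the second comparison should come back as 0 while the first should not. Note that none of the StringComparison values fold width, so the second form in the question has no equivalent fix; the CompareOptions overload is the one to use.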

Fourth of all, a wag of the finger for taking a question obviously covering E-mail Address Internationalization (EAI) but doing it without even asking the question in a way or to a distribution list that suggested they were thinking about EAI.

With a bonus fifth of all wag of the finger to SQL Server, since it is hiding so much of the problem here that people come out of SQL Server wondering how to make other products act like it, rather than coming out asking the real questions....

Okay, seems like a lot more wags than tips on this one. And that's even ignoring the extra wags I decided to leave for another day.

I've decided I can't do "tip of the hat/wag of the finger" very well. I should leave that sort of thing to the professionals. From now on, I will.

I'll talk more about EAI another day, too....

Blog - Comment List
  • Do you think the severity of the first wag of the finger might also be reduced a bit since the strings are coming from SQL Server, which (AFAIK) continues to use UCS-2 rather than UTF-16, so the encoding might legitimately be called "double byte"?

  • Perhaps a tiny bit, though the fact that SQLS flirts with UTF-16 and the fact that .NET isn't SQL blocks that some....

  • I have encountered many people that use "double byte" for the wide variants of the Latin script.

    Similarly, some use "4 byte characters" for characters that would need 4 bytes in GB 18030 even if they don't use surrogates as Unicode.

  • That's why I was finding that one more forgivable, Mihai!
