Sorting it all Out Michael Kaplan's random stuff of dubious value Be sure to read the disclaimer here first!
Unicode has a certain complexity to it that can at times be challenging.
Let's take for example U+1ec5, a.k.a. LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE. Here is what it looks like (how well it renders will depend on your OS and browser support!):
Now obviously that is fully precomposed (in Unicode Normalization Form C). If it is fully decomposed, we get U+0065 U+0302 U+0303, a.k.a. LATIN SMALL LETTER E + COMBINING CIRCUMFLEX ACCENT + COMBINING TILDE. Here is what it looks like (again, how well it renders will depend on your OS and browser support!):
And here is where the problems come in, because between these two extremes lies a third case: U+00ea U+0303, a.k.a. LATIN SMALL LETTER E WITH CIRCUMFLEX + COMBINING TILDE. Here is what it looks like (again, how well it renders will depend on your OS and browser support!):
Now if you convert that third case to NFC you will get the first case, and to NFD you will get the second. How does that happen?
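As a quick check of that claim, here is a minimal sketch using Python's standard unicodedata module (Python is just my choice for illustration; the post itself is about the OS and the .NET Framework). The code_points function is a hypothetical display helper, nothing more.

```python
import unicodedata

def code_points(s: str) -> str:
    """Render a string as space-separated U+XXXX code points (hypothetical helper)."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# The "third case": U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX + U+0303 COMBINING TILDE
partial = "\u00ea\u0303"

print(code_points(unicodedata.normalize("NFC", partial)))  # U+1EC5
print(code_points(unicodedata.normalize("NFD", partial)))  # U+0065 U+0302 U+0303
```

All three spellings normalize to the same NFC string and the same NFD string, which is exactly what makes them canonically equivalent.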
Well, the rules for normalization are that you have to keep on performing the composition or decomposition until you can't anymore.
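The "until you can't anymore" part matters because the decomposition mappings in the Unicode Character Database only go one level deep: U+1ec5 maps to U+00ea U+0303, and U+00ea in turn maps to U+0065 U+0302. Here is a minimal Python sketch of applying those mappings repeatedly; canonical_decompose is a hypothetical helper name, and it deliberately leaves out the canonical reordering of combining marks that real NFD also performs.

```python
import unicodedata

def canonical_decompose(ch: str) -> str:
    """Recursively apply canonical decomposition mappings (hypothetical helper).

    Simplifications: compatibility mappings (tagged like "<compat>") are
    skipped, Hangul syllables (decomposed algorithmically rather than via a
    table entry) are not handled, and combining marks are not reordered.
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return ch  # no canonical mapping left: stop here
    parts = (chr(int(hex_cp, 16)) for hex_cp in mapping.split())
    return "".join(canonical_decompose(p) for p in parts)

# One level at a time: U+1EC5 -> U+00EA U+0303 -> U+0065 U+0302 U+0303
print(unicodedata.decomposition("\u1ec5"))  # 00EA 0303
print(unicodedata.decomposition("\u00ea"))  # 0065 0302
print(canonical_decompose("\u1ec5") == unicodedata.normalize("NFD", "\u1ec5"))  # True
```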
So, here is a way to get at that intermediate case (along with the other two):
Step 1: Convert the string to NFD; we now have: U+0065 U+0302 U+0303
Step 2: U+0065 + U+0302 to NFC == U+00ea; we now also have U+00ea U+0303
Step 3: U+00ea + U+0303 to NFC == U+1ec5; we now also have U+1ec5
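The three steps above can be sketched in Python roughly as follows. The partial_forms function is a hypothetical helper, and it assumes the input is a single base letter followed by combining marks: it decomposes fully, then composes the base with one more mark at a time.

```python
import unicodedata

def partial_forms(s: str) -> list[str]:
    """Quick-and-dirty enumeration of equivalent forms (hypothetical helper).

    Assumes s is one base character plus combining marks. Only prefixes of
    the NFD string are composed, so this does NOT find every equivalent form.
    """
    nfd = unicodedata.normalize("NFD", s)      # Step 1: fully decompose
    forms = [nfd]
    for k in range(2, len(nfd) + 1):           # Steps 2, 3, ...: compose a longer prefix
        form = unicodedata.normalize("NFC", nfd[:k]) + nfd[k:]
        if form not in forms:
            forms.append(form)
    return forms

for f in partial_forms("\u1ec5"):
    print(" ".join(f"U+{ord(c):04X}" for c in f))
# U+0065 U+0302 U+0303
# U+00EA U+0303
# U+1EC5
```

One concrete way it falls short: for U+1ec7 (LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW), canonical ordering puts the dot below first in NFD, so the sketch never produces the equivalent U+00ea U+0323.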
Now this is not what I would call a perfect algorithm by any stretch of the imagination. But it is a quick and dirty way to get the information on a bunch of canonically equivalent forms.
But it certainly leaves open the question of whether the operating system and/or the .NET Framework should expose this information at some point....
This post brought to you by "ễ" (U+1ec5, a.k.a. LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE)