Sorting it all Out Michael Kaplan's random stuff of dubious value Be sure to read the disclaimer here first!
So just yesterday, Kelvin Houghton had an excellent question:
Hello All.
I have a strange issue I would like your help on please. In a C# app if I have the line Console.WriteLine("\u3094"); I would expect to see the output of character ゔ But it instead outputs u30F4 which would look like this character ヴ
My question is why does u30F4 get displayed when I told it to display u3094?
Thanks
Kelvin
Like I said, an excellent question!
Globalization ace Garrett McGowan had the answer pretty quickly:
This is because there’s no equivalent character in the Japanese Windows code page (cp932). It’s returning the closest equivalent, the katakana form of the symbol.
A fact that you can confirm by looking at the "best fit" tables from Microsoft that are publicly available and hosted on the Unicode site here, with an excerpt below:
0x3091 0x82ef ;ゑ Hiragana We
0x3092 0x82f0 ;を Hiragana Wo
0x3093 0x82f1 ;ん Hiragana N
0x3094 0x8394 ;ヴ Hiragana Vu (add best-fit 2/1/96)
0x309b 0x814a ;゛ Katakana-Hiragana Voiced Sound Mark
0x309c 0x814b ;゜ Katakana-Hiragana Semi-Voiced Sound Mark
0x309d 0x8154 ;ゝ Hiragana Iteration Mark
0x309e 0x8155 ;ゞ Hiragana Voiced Iteration Mark
0x30a1 0x8340 ;ァ Katakana Small A
0x30a2 0x8341 ;ア Katakana A
As a bonus, everyone can reflect on the power of comments that no one probably realized were there? :-)
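For anyone who wants to see the round trip spelled out, here is a minimal sketch in Python (not the C# of the original question, and not how Windows or .NET actually implements the conversion). It parses a few lines of the excerpt above into a lookup table, pushes U+3094 through it, and decodes the resulting bytes as cp932. The names BESTFIT_EXCERPT, parse_bestfit, and round_trip are made up for illustration.

```python
# A few lines of the best-fit excerpt quoted above:
# Unicode code point, cp932 byte value, then a ';' comment.
BESTFIT_EXCERPT = """\
0x3093 0x82f1 ;ん Hiragana N
0x3094 0x8394 ;ヴ Hiragana Vu (add best-fit 2/1/96)
0x30a1 0x8340 ;ァ Katakana Small A
"""


def parse_bestfit(text):
    """Return {Unicode code point: cp932 bytes} for each 'U B ;comment' line."""
    table = {}
    for line in text.splitlines():
        # Everything after ';' is a comment; the two fields before it are hex values.
        fields = line.split(";", 1)[0].split()
        if len(fields) != 2:
            continue
        code_point = int(fields[0], 16)
        cp932_value = int(fields[1], 16)
        # Every entry in this excerpt is a double-byte cp932 character.
        table[code_point] = cp932_value.to_bytes(2, "big")
    return table


def round_trip(ch, table):
    """Encode ch via the (hypothetical) best-fit lookup, then decode as cp932."""
    return table[ord(ch)].decode("cp932")


table = parse_bestfit(BESTFIT_EXCERPT)
result = round_trip("\u3094", table)
print(f"U+{ord(result):04X}")  # prints U+30F4
```

The hiragana goes in, the bytes 0x83 0x94 come out, and those bytes mean katakana ヴ to anything that later reads them as cp932, which is exactly what Kelvin saw in the console.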
IPE expert Paul Chavez made an interesting point about the missing-ness of this character in shift-JIS:
Makes sense as ヴ is only used for purely phonetic writing of “foreign sounds”. Although Hiragana is phonetic also, it is not used when writing “foreign sounds”.
I know as my family name is spelled チャヴェズ.
And as a closing bit, there is a nice little note in the hiragana entry on everything2.com:
It should be mentioned that there are several obsolete kana that are rarely used today, in both hiragana and katakana. In hiragana:
ゑ - the "we" hiragana
ゐ - the "wi" hiragana
ゔ - the "vu" hiragana
All of these kana are considered obsolete, and exist only for use in transcribing older documents. In cases where the "vu" hiragana is used, the still in use katakana "vu" is placed instead, and when formed into another syllable, a smaller kana vowel is paired with it.
When all of this info had been passed about, Kelvin did have one more question to ask:
Just another question this raises. If you use charmap.exe or wordpad.exe it does display the characters correctly – how are they able to do that? Trying to fully understand as we have ######1 who is trying to localize a file name that uses u3094 and that is what displays incorrectly.
Now that does boil down to the basic Unicode vs. not issue -- and in particular the Console.WriteLine behavior is explained in Sometimes, the shortcuts give better AND faster results. A .NET limitation that can be worked around (when necessary) with WriteConsoleW, and an ISV limitation that can be worked around (when necessary) by converting to Unicode!
1 - Name of the third party independent software vendor removed just for the hell of it, since this is mostly a .NET issue anyway, not an ISV bug.
This post brought to you by ゔ (U+3094, a.k.a. HIRAGANA LETTER VU)