Sorting it all Out Michael Kaplan's random stuff of dubious value Be sure to read the disclaimer here first!
Yesterday in the post For the [locale] explorer in you...., I mentioned that there was a bug. Francois is actually the person who saw it, and he came and asked me about it....
The bug can be seen in the picture of the Uighur (PRC) culture (ug-CN):
Can you see it?
The side effect of the bug is the way that the parentheses are screwed up in the Native Name at the top. But the actual bug is in the purple Text section, where it claims that Bidirectional is False for a culture for which it should clearly be True.
As you can tell by looking in the source for Culture Explorer 2.0, Francois is simply using the TextInfo.IsRightToLeft property both to fill in that purple item and to set the TextBox.RightToLeft property of the controls containing native text with lines like:
NameNative.RightToLeft = ci.TextInfo.IsRightToLeft ? RightToLeft.Yes : RightToLeft.No;
So, there is a bug either in the Windows locale data for Uighur, or in the .NET Framework code that synthesizes a Windows Only culture from the Windows data.
My psychic powers suggested to me that the Windows data was correct: although locale data can have mistakes on occasion, the data for a specific locale is more likely to have been reviewed than a generic process that may not have been tested across every possible culture, since it shipped before Vista was widely available. It could have gone either way; I guess it was just a judgment call.
To prove the acuity of my psychic powers, I suppose I could just ask you to run the How To [NOT] detect that a locale is bidi or even the How To detect that a locale is bidi code, or I could make you look at the binary FONTSIGNATURE for the locale, which in WCHAR values returned by GetLocaleInfo looks like this:
\x2000\x0000\x0000\x8000\x0008\x0000\x0000\x8800\x0000\x0000\x0000\x0000\x0000\x0000\x0000\x0000
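If you would rather not count bits by hand, here is a minimal sketch that turns those WCHAR values into bit numbers. The FontSignatureBits class and its SetBits method are names I made up for illustration, and the sketch assumes that the first eight WCHARs hold the 128 Unicode subset bits, with word i covering bits 16*i through 16*i+15 (low word of each DWORD first).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any Microsoft API.
static class FontSignatureBits
{
    // Lists the set bits in the Unicode subset portion of the signature.
    // Assumption: only the first 8 WCHARs (4 DWORDs) are Unicode subset
    // bits; the remaining WCHARs are code page bits and are ignored here.
    public static List<int> SetBits(ushort[] words)
    {
        var bits = new List<int>();
        int count = Math.Min(words.Length, 8);
        for (int w = 0; w < count; w++)
        {
            for (int b = 0; b < 16; b++)
            {
                if ((words[w] & (1 << b)) != 0)
                {
                    bits.Add(w * 16 + b);
                }
            }
        }
        return bits;
    }

    static void Main()
    {
        // The 16 WCHAR values quoted above.
        ushort[] words = {
            0x2000, 0x0000, 0x0000, 0x8000, 0x0008, 0x0000, 0x0000, 0x8800,
            0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000
        };
        Console.WriteLine(string.Join(", ", SetBits(words)));
        // prints: 13, 63, 67, 123, 127
    }
}
```

Bit 123 is the one that matters for this bug. I make no claim here about what bit 127 means.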
But for those who do not find such a view to be too comfortable and who wanted more than just a rerun of blog posts past, let's take the following managed code instead:
using System;
using System.Globalization;

namespace Testing {
    class LdmlDump {
        [STAThread]
        static void Main(string[] args) {
            CultureInfo ci;
            string stCulture;

            // First figure out the name
            if(args.Length > 0) {
                stCulture = args[0];
            } else {
                stCulture = CultureInfo.CurrentCulture.Name;
            }

            // Create the culture and say what it is
            ci = new CultureInfo(stCulture, false);
            Console.WriteLine("\r\nUsing the following culture: '{0}' ({1})\r\n", ci.DisplayName, ci.Name);

            // Create the replacement and fill it
            CultureAndRegionInfoBuilder carib = new CultureAndRegionInfoBuilder(stCulture, CultureAndRegionModifiers.Replacement);
            carib.LoadDataFromCultureInfo(ci);
            carib.LoadDataFromRegionInfo(new RegionInfo(stCulture));
            carib.Save(stCulture + ".ldml");
        }
    }
}
Stick it in a file called DumpLdml.cs and compile it with the following from CMD:
csc DumpLdml.cs /r:sysglobl.dll
Now you can run it on any culture on the machine. This code may come in handy in future posts, too. :-)
We'll try both ar-SA and ug-CN, with mn-Mong-CN for luck:
E:\Users\michkap>DumpLdml.exe ar-SA

Using the following culture: 'Arabic (Saudi Arabia)' (ar-SA)

E:\Users\michkap>DumpLdml.exe ug-CN

Using the following culture: 'Uighur (PRC)' (ug-CN)

E:\Users\michkap>DumpLdml.exe mn-Mong-CN

Using the following culture: 'Mongolian (Traditional Mongolian, PRC)' (mn-Mong-CN)
Now looking at the LDML for each, one finds some interesting info. Both ar-SA and ug-CN have the following in them for the font signature:
<msLocale:fontSignature>
  <msLocale:unicodeRanges>
    <msLocale:range type="13" />
    <msLocale:range type="63" />
    <msLocale:range type="67" />
    <msLocale:layoutProgress type="horizontalRightToLeft" />
  </msLocale:unicodeRanges>
while mn-Mong-CN has:
<msLocale:fontSignature>
  <msLocale:unicodeRanges>
    <msLocale:range type="81" />
    <msLocale:layoutProgress type="verticalBeforeHorizontal" />
  </msLocale:unicodeRanges>
The layoutProgress is referring to the bits I talked about previously in How To [NOT] detect that a locale is bidi -- the following bits in the Unicode subset bitfields:
You can kind of tell where the language in the LDML comes from, huh? :-)
Anyway, it is clear that ug-CN has these bits set correctly, so the bug has to be in the .NET Framework code that synthesizes the Windows Only culture not using this information. Perhaps understandable given how obscure it is though -- further proof that we need our own LCTYPE containing the information in a more easily digested form? :-)
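To make the "more easily digested form" idea a bit more concrete, here is a rough sketch of reading the layout bits straight out of the font signature words. Everything named here (the LayoutProgress enum, the LayoutBits class, the Decode method) is hypothetical and not an existing .NET or Windows API. It assumes bit 123 means horizontal right to left and bit 124 means vertical before horizontal, matching the LDML above; the meaning and name I give to bit 125 are my own assumption. It also assumes word i covers bits 16*i through 16*i+15.

```csharp
using System;

// Hypothetical names throughout; a sketch, not an existing API.
[Flags]
enum LayoutProgress
{
    None = 0,
    HorizontalRightToLeft = 1,    // bit 123
    VerticalBeforeHorizontal = 2, // bit 124
    VerticalBottomToTop = 4       // bit 125 (name and meaning assumed)
}

static class LayoutBits
{
    // Tests one bit, treating word i as bits 16*i .. 16*i+15.
    static bool IsSet(ushort[] words, int bit)
    {
        return (words[bit / 16] & (1 << (bit % 16))) != 0;
    }

    // Expects at least the first 8 WCHARs of the font signature.
    public static LayoutProgress Decode(ushort[] words)
    {
        LayoutProgress result = LayoutProgress.None;
        if (IsSet(words, 123)) result |= LayoutProgress.HorizontalRightToLeft;
        if (IsSet(words, 124)) result |= LayoutProgress.VerticalBeforeHorizontal;
        if (IsSet(words, 125)) result |= LayoutProgress.VerticalBottomToTop;
        return result;
    }

    static void Main()
    {
        // First 8 WCHARs of the ug-CN signature quoted earlier in the post.
        ushort[] ugCN = { 0x2000, 0x0000, 0x0000, 0x8000, 0x0008, 0x0000, 0x0000, 0x8800 };

        // NOT a real dump: words built by hand with only bit 81 and bit 124
        // set, to mirror what the mn-Mong-CN LDML reports.
        ushort[] mongolianLike = { 0, 0, 0, 0, 0, 0x0002, 0, 0x1000 };

        Console.WriteLine(Decode(ugCN));          // HorizontalRightToLeft
        Console.WriteLine(Decode(mongolianLike)); // VerticalBeforeHorizontal
    }
}
```

A flags enum seemed like the natural shape since the bits can be combined, but that is a design guess on my part rather than anything the platform provides.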
By the way Francois, I verified that this bug has already been reported in the .NET Framework, so no need to log a new bug. Though you could bump the number of occurrences if you wanted to. :-)
This post brought to you by ת (U+05ea, a.k.a. HEBREW LETTER TAV)
I'm wondering what exactly verticalBeforeHorizontal used in mn-Mong-CN means, as the MSDN documentation at http://msdn2.microsoft.com/en-us/ms404373.aspx doesn't say anything about it. Chinese written vertically progresses top-to-bottom in columns running right-to-left, but Mongolian progresses top-to-bottom in columns running left-to-right. Which, if either, of these layouts does verticalBeforeHorizontal imply, and is it possible to distinguish the two vertical layouts?
Look at http://msdn.microsoft.com/library/intl/unicode_63ub.asp for better info -- the text is exactly matching bit 124....
So it looks like it is claiming that the text preferentially flows vertically, in a left to right direction. Which is actually what you just said, right? :-)
I still don't think that the documentation is very clear, but having now reread it for the third time my interpretation is that you can combine bits 123, 124 and 125 in order to specify almost any layout, so that for example if bits 123, 124 and 125 are all set then text should be laid out in vertical columns reading bottom-to-top with columns progressing right-to-left across the page (a possible Ogham layout); and with only bit 124 set then text should be laid out in vertical columns reading top-to-bottom with columns progressing left-to-right across the page (as for Mongolian and Phags-pa). Is that right?
Yes, that is correct.
Now I never claimed that it was exactly intuitive -- only that the text descriptions came directly from the fontsignature bits related to layout.... :-)
OK, thanks. Just one more question. I know that the Unicode range bits are set in the OS/2 table of the font, but what about bits 123-125 -- are these also derived from the UnicodeRange field of the OS/2 table in the font?
This is 2008, and the problem is still there.
The culture ps-AF is suffering from the same problem, too.
I guess it can be classified as an immortal bug now.