
May, 2009

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!

    The Whey doesn't get a locale, either

    As a by the way, this blog does NOT represent anything beyond my own personal thoughts based on the way I think things are going. You could even blame it on my Tegretol dosage, to be perfectly honest (if the pain were not so intense I'd have skipped this med for sure). I am not even on the team that decides these things any more, and I wasn't in charge of the strategy when I was. Anyone who quotes me with prefacing words like "According to Microsoft..." is a complete and utter moron.

    It starts with a nursery rhyme, and a very popular one, from what I am told:

    Little Miss Muffet
    Sat on a tuffet,
    Eating her curds and whey;
    Along came a spider,
    Who sat down beside her
    And frightened Miss Muffet away.

    You might wonder where I am going from here.

    Well, first there is the mail I got (product and personal info removed because it just seemed prudent):

    Dear Michael ..

    We have been contacted by a group of Kurdish people who would like to translate our product to Kurdish. However the problem we are facing is that we cannot find (checked winnt.h) the language ids (e.g. LANG_KURDISH) associated with the Kurdish language and sub language/culture.

    I have been reading the following paper It is Time to Add Kurdish Culture to VS .NET Globalization
    Since you have been referenced and I think to remember that you are involved with globalization at Microsoft I thought you are the right person to contact :-)

    Have the language ids for Kurdish already been defined? If not would it be possible to do so?

    As always thank you for your time and efforts.
    Best Regards,

    Ah, so the title and the nursery rhyme are a bad pun (curds vs. Kurds) skirting a geopolitically sensitive issue!

    The answer is that there is no LANG_KURDISH defined. Although the paper starts off on very solid ground it ignores the geopolitical issues completely, to wit:
    • Kurdistan is not a country but a region owned by Iraq, Iran, Turkey, Azerbaijan, Syria, and Armenia.
    • And only one of those countries recognizes their chunk as an autonomous entity (Iraq).
    • And only one recognizes Kurds as a minority (Iran).
    • And only three (Iraq, Iran, and Armenia) recognize Kurdish officially as a minority language; it is banned in Syria, and I have talked previously about the attempts in Turkey to ban individual letters as a move against Kurdish.

    Plus I have no idea who the authors talked to; although they quoted my book several times, they never contacted me (by the address provided in the book or any other way), and I don't believe they contacted anyone with knowledge/understanding of the issue, or they would have gotten a much more accurate (though, for them, not so happy) answer.

    Remember that nothing gets added to .NET; it is first added to Windows and then later to .NET.

    So assuming that we were going to see (for example, off the top of my head) some or all of the following new locales added:

    • ku-Arab-IR
    • ku-Arab-IQ
    • ku-Cyrl-AZ
    • ku-Latn-AZ
    • ku-Arab-SY
    • ku-Latn-SY
    • ku-Cyrl-AM
    • ku-Latn-AM
    • ku-Latn-TR

    without (in many cases) the support and assistance of the governments who decide to let Microsoft ship software in their countries, and with the unhappiness of several of those countries as a part of the process, it is easy to imagine software banned in some of those places.

    Microsoft can't set policy here, or be used as proof of a policy.

    Thus we have our answer: the locale id values have not been defined, and they cannot be, given the present environment.
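    For context on why a missing constant matters: a classic LCID is just a packed number built from constants that have to be assigned in winnt.h ahead of time. The sketch below mirrors the MAKELANGID and MAKELCID macros in Python, using the values for English (United States). The Python function names are only illustrative. With no LANG_KURDISH value to feed in, there is simply nothing to pack.

```python
# Values as defined in winnt.h for English (United States).
LANG_ENGLISH = 0x09
SUBLANG_ENGLISH_US = 0x01
SORT_DEFAULT = 0x0


def make_langid(primary: int, sublang: int) -> int:
    """Mirror of the MAKELANGID macro: sublanguage sits above 10 bits of primary language."""
    return (sublang << 10) | primary


def make_lcid(langid: int, sort_id: int) -> int:
    """Mirror of the MAKELCID macro: sort ID sits above the 16-bit LANGID."""
    return (sort_id << 16) | langid


en_us = make_lcid(make_langid(LANG_ENGLISH, SUBLANG_ENGLISH_US), SORT_DEFAULT)
print(hex(en_us))  # 0x409
```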

    In the end, whether we are talking about Kurdish, or Kashmiri, or any of a number of such languages, sometimes tech companies cannot take sides. If you do things with custom locales and custom keyboards, then the operating system will support the efforts, but these things can't so easily be put in the box, built in. And thus they can't be so easily put into the .NET Framework.

    On the other hand, this is yet another reason that LCIDs suck, and any software -- Microsoft or any other company -- that depends on LCIDs needs to get off of them. For everyone's sake. Note that it is the software's dependence on LCIDs that led to the question being asked in the first place!
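    As a rough illustration of what getting off LCIDs can look like, here is a minimal sketch that keys everything on the locale name instead of a number. The parser and the `LocaleName` type are hypothetical, and the pattern only covers the simple language[-Script][-REGION] shape used in the list above, not the full tag syntax. The point is that a name like ku-Arab-IQ carries its own meaning and needs no pre-assigned constant.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


# Simplified pattern: language[-Script][-REGION]. Real language tags allow more
# (variants, extensions, private use), which this sketch deliberately ignores.
_PATTERN = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}))?$")


def parse_locale_name(name: str) -> LocaleName:
    """Split a simple locale name into its language, script, and region parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unsupported locale name: {name!r}")
    language, script, region = match.groups()
    return LocaleName(
        language.lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )


print(parse_locale_name("ku-Arab-IQ"))
```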

     

    This post brought to you by ڵ (U+06b5, a.k.a. ARABIC LETTER LAM WITH SMALL V)

    "Size matters, and at 8.25, yours simply isn't big enough," she said....

    I am going straight to hell for the title on this one, I know it....

    The question, asked on a discussion list primarily concerned with WinForms, went something like this:

    I am facing some issue regarding showing right Japanese chars. It get truncated from bottom for few chars. Our Application wide font is Segoe UI 8.25 pt.

    This is one of those questions that leaves me momentarily stunned; I simply don't know what to say.

    An 8pt font is truncating some Japanese characters?

    By Design.

    By De-freaking-sign, in fact.

    Japanese Kanji may not have as many ideographs in it as the bulk of Chinese Han in use, but there are quite a few complex ones in there.

    I mean even just for example ones like

    亀 (U+4e80) that clearly need a certain minimum number of lines that 8pt won't give -- you can go ahead and count the minimum number of lines it needs quite easily! -- or

    齽 (U+9f7d) which shows the problem even better -- squeezed down into 8pt seems ridiculous.

    Hell, some will look pretty awful squeezed down into 9pt!

    If there are areas where Kanji or Han or Hanja need to be displayed, there might need to be a bigger size. Perhaps, if nothing else, a way for users to tweak that application-wide font, or at least the font of the controls that can have that text in them.
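    To put a number on how little room that is, here is a small sketch of the point-to-pixel arithmetic. It assumes the common 96 DPI display setting (the function name is just for illustration): a point is 1/72 of an inch, so 8.25 pt works out to an 11-pixel em, and that box also has to cover whatever space the font reserves above and below the glyph.

```python
def points_to_pixels(points: float, dpi: float = 96.0) -> float:
    """Convert a font size in points (1/72 inch) to device pixels."""
    return points * dpi / 72.0


for size in (8.25, 9.0, 12.0):
    print(f"{size} pt -> {points_to_pixels(size)} px at 96 DPI")
```

    Every horizontal stroke needs at least one pixel row plus a row of gap to stay legible, which is why the dense ideographs run out of rows so quickly at this size.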

    And that is even before getting into the languages of Southeast and South Asia, which make the East Asian requirements seem positively puny. I mean, any size that truncates a bit of some Kanji will make Tibetan look at best like a decorative monitor smudge....

     

    This blog brought to you by every character in Unicode that does not look so good at 8 pt....

    The whole truth about MB_PRECOMPOSED and MB_COMPOSITE

    As a by the way, this blog does NOT represent anything beyond my own personal thoughts. You could even blame it on my Tegretol dosage, to be perfectly honest (if the pain were not so intense I'd have skipped this med for sure). I am not even on the team that owns this code any more, and I didn't own it when I was. Just so you know....

    Recently when Shawn posted Don't use MB_COMPOSITE, MB_PRECOMPOSED or WC_COMPOSITECHECK, there were a few things he didn't mention.

    For example, there is the fact that in builds of Windows 7 prior to the final release, the behavior is designed to use normalization.

    Now of course there are a bunch of cases not in the tables in this Microsoft technology that pre-dates Unicode Normalization by more than half a decade, but that is small change.

    And yes, it means that MB_PRECOMPOSED will be a no-op for most code pages, but for Vietnamese it will destroy that special form "V" described in blogs like Getting intermediate forms and Harder intermediate forms of characters and Frost's The Form Not Taken. Which destroys round-tripping for Vietnamese, at least.
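    The round-tripping problem can be sketched outside of Windows. The example below is an approximation, not the actual MultiByteToWideChar behavior: it uses Python's cp1258 codec and treats NFC as a stand-in for "compose on the way in". Code page 1258 can store the letter a followed by a separate combining acute accent, and once that pair is composed, the bytes that come back out are no longer the bytes that went in.

```python
import unicodedata

# "a" followed by COMBINING ACUTE ACCENT; code page 1258 has a byte for each.
original_bytes = "a\u0301".encode("cp1258")

decoded = original_bytes.decode("cp1258")
composed = unicodedata.normalize("NFC", decoded)  # stand-in for composing on decode
round_tripped = composed.encode("cp1258")

print(len(original_bytes), len(round_tripped))
print(round_tripped == original_bytes)
```

    Sequences whose composed form has no slot in the code page at all fare worse, since they cannot be encoded back in one piece.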

    But this too is chump change. Someone will care but it might have shipped before anyone noticed.

    You know what did get noticed?

    I'll give you a hint: MB_COMPOSITE.

    And another hint: my own blog Stripping diacritics...., which uses Unicode Normalization Form D to do its work, just like MB_COMPOSITE would nominally be expected to convert the text to Form D.

    You give up? Well, look at the fixed version of that code, found in Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others).

    Converting fully formed modern precomposed Hangul syllables to composite Jamo might be a fine idea if Microsoft had not been de facto enjoined nearly a decade ago from providing the fonts that would allow those conjoining Jamo to be composed (of course buffer sizes would be almost tripled but nothing is perfect!).

    Given the current situation, however, this led to disastrous results.
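    The Hangul effect is easy to see with plain Unicode Normalization. This is a sketch using Python's unicodedata module rather than the Windows tables, so take it as an illustration of Form D and not of what any particular build did. Two precomposed syllables become six conjoining Jamo, which is where the "almost tripled" buffer size comes from.

```python
import unicodedata

syllables = "\ud55c\uae00"  # two precomposed modern Hangul syllables
jamo = unicodedata.normalize("NFD", syllables)

print(len(syllables), len(jamo))
print([f"U+{ord(ch):04X}" for ch in jamo])

# Form C puts the syllables back together.
print(unicodedata.normalize("NFC", jamo) == syllables)
```

    Without a font that can compose those Jamo, the decomposed text is what the user ends up looking at.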

    Now with all that said, nearly every word that Shawn said in Don't use MB_COMPOSITE, MB_PRECOMPOSED or WC_COMPOSITECHECK is true. I just felt like the proper context was needed.

    Especially in the context of the one part of his blog I really do disagree with:

    Hopefully I've terrified you and you'll stop using these flags, perhaps using NormalizeString() if you really need similar behavior.

    Since Unicode Normalization will cause many of the very same problems that these flags cause (e.g. poor roundtripping and especially the problems I note above with both Form C and Form D), scaring people away from the flags and suggesting the use of a function that in many contexts will produce results just as scary is probably not the best idea.
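    A tiny example of the round-tripping problem with normalization itself, again sketched with Python's unicodedata module rather than NormalizeString(): OHM SIGN has a singleton canonical mapping to GREEK CAPITAL LETTER OMEGA, so both Form C and Form D replace it, and nothing in the result says which character was there originally.

```python
import unicodedata

ohm_sign = "\u2126"  # OHM SIGN, canonically equivalent to GREEK CAPITAL LETTER OMEGA

for form in ("NFC", "NFD"):
    normalized = unicodedata.normalize(form, ohm_sign)
    print(form, f"U+{ord(normalized):04X}", normalized == ohm_sign)
```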

    It was written at what I have learned is probably not the best point to write a blog: when one is in the process of losing and/or has just lost the argument against a change, since it can color one's argument in a specific direction that under ordinary circumstances one might not choose to do....

     

     

    This post brought to you by ピ (U+30d4, a.k.a. KATAKANA LETTER PI)
