Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
As a by the way, this blog does NOT represent anything beyond my own personal thoughts based on the way I think things are going. You could even blame it on my Tegretol dosage, to be perfectly honest (if the pain were not so intense I'd have skipped this med for sure). I am not even on the team that decides these things any more and I wasn't in charge of the strategy when I was then. Anyone who quotes me with prefacing words like "According to Microsoft..." is a complete and utter moron.
It starts with a nursery rhyme, as it turns out a very popular one from what I am told:
Little Miss Muffet Sat on a tuffet, Eating her curds and whey; Along came a spider, Who sat down beside her And frightened Miss Muffet away.
You might wonder where I am going from here.
Well, first there is the mail I got (product and personal info removed because it just seemed prudent):
Dear Michael ..
We have been contacted by a group of Kurdish people who would like to translate our product to Kurdish. However the problem we are facing is that we cannot find (checked winnt.h) the language ids (e.g. LANG_KURDISH) associated with the Kurdish language and sub language/culture.
I have been reading the following paper: It is Time to Add Kurdish Culture to VS .NET Globalization
Since you have been referenced and I think to remember that you are involved with globalization at Microsoft I thought you are the right person to contact :-)
Have the language ids for Kurdish already been defined? If not would it be possible to do so?
As always thank you for your time and efforts.
Best Regards,
Ah, so the title and the nursery rhyme are a bad pun (curds vs. Kurds) skirting a geopolitically sensitive issue!
Plus I have no idea who the authors talked to; although they quoted my book several times they never contacted me (by the address provided in the book or any other way), and I don't believe they contacted anyone with knowledge/understanding of the issue, or they would have gotten a much more accurate (though for them not so happy) answer.
Remember that nothing gets added to .NET; it is first added to Windows and then later to .NET.
So assuming that we were going to see (for example, off the top of my head) some or all of the following new locales added:
without (in many cases) the support and assistance of the governments who decide to let Microsoft ship software in their countries, and with the unhappiness of several of those countries as a part of the process, it is easy to imagine software banned in some of those places.
Microsoft can't set policy here, or be used as proof of a policy.
Thus we have our answer: the locale id values have not been defined, and they cannot be, given the present environment.
In the end, whether we are talking about Kurdish, or Kashmiri, or any of a number of such languages, sometimes tech companies cannot take sides. If you do things with custom locales and custom keyboards, then the operating system will support the efforts, but these things can't so easily be put in the box, built in. And thus they can't be so easily put into the .NET Framework.
On the other hand, this is yet another reason that LCIDs suck, and any software -- Microsoft or any other company -- that depends on LCIDs needs to get off of them. For everyone's sake. Note that it is the software's dependence on LCIDs that led to the question being asked in the first place!
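To make the "get off of LCIDs" point a bit more concrete, here is a minimal sketch of picking resources by language name rather than by a numeric id. Everything in it is an assumption for illustration: the function names `fallback_chain` and `pick_resources` are hypothetical, the set of available translations is made up, and the fallback rule is plain truncation of the tag, not any particular product's lookup logic.

```python
def fallback_chain(tag):
    """Yield a language tag and its progressively shorter parents.

    For example "ku-Arab-IQ" yields "ku-Arab-IQ", "ku-Arab", "ku".
    """
    parts = tag.split("-")
    while parts:
        yield "-".join(parts)
        parts.pop()


def pick_resources(tag, available, default="en"):
    """Return the best match for `tag` among `available` resource names.

    Matching is case-insensitive; `default` is returned when nothing matches.
    """
    by_lower = {name.lower(): name for name in available}
    for candidate in fallback_chain(tag.lower()):
        if candidate in by_lower:
            return by_lower[candidate]
    return default


# Hypothetical set of translations an application might ship.
available = {"en", "ku", "ja-JP"}

print(list(fallback_chain("ku-Arab-IQ")))
print(pick_resources("ku-Arab-IQ", available))
print(pick_resources("xx-YY", available))
```

Software written this way never has to ask whether a LANG_KURDISH constant exists; it only needs a name that the translators and the application agree on.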
This post brought to you by ڵ (U+06b5, a.k.a. ARABIC LETTER LAM WITH SMALL V)
I am going straight to hell for the title on this one, I know it....
The question, asked on a discussion list primarily concerned with WinForms, went something like this:
I am facing some issue regarding showing right Japanese chars. It get truncated from bottom for few chars. Our Application wide font is Segoe UI 8.25 pt.
This is one of those questions that leaves me momentarily stunned; I simply don't know what to say.
An 8pt font is truncating some Japanese characters?
By Design.
By De-freaking-sign, in fact.
Japanese Kanji may not have as many ideographs as the bulk of Chinese Han in use, but there are quite a few complex ones in there.
I mean even just for example ones like
亀亀亀亀亀亀亀
(U+4e80) that clearly need a certain minimum number of lines that 8pt won't give -- you can go ahead and count the minimum number of lines it needs quite easily! -- or
齽齽齽齽齽齽齽
(U+9f7d) which shows the problem even better -- squeezed down into 8pt seems ridiculous.
Hell, some will look pretty awful squeezed down into 9pt!
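For anyone who wants the arithmetic, here is a rough sketch. It assumes a 96 DPI display, and it assumes that every distinct horizontal stroke needs its own pixel row plus a blank row between neighbours; both are simplifications (real rasterizers use hinting and anti-aliasing, and the em height also has to cover ascent and descent). The helper names and the stroke count of 6 are purely illustrative; count the strokes in your own worst-case character.

```python
def points_to_pixels(points, dpi=96):
    """Convert a font size in points (1/72 inch) to pixels at the given DPI."""
    return points * dpi / 72


def rows_needed(horizontal_strokes):
    """Minimum pixel rows to keep N horizontal strokes distinct.

    One row per stroke plus one blank row between adjacent strokes.
    """
    return 2 * horizontal_strokes - 1


for size in (8, 8.25, 9):
    print(f"{size} pt is about {points_to_pixels(size):.2f} px of em height")

# Illustrative only: a glyph with 6 stacked horizontal strokes.
print("rows needed for 6 strokes:", rows_needed(6))
```

At 8.25 pt the whole em box is 11 pixels tall under these assumptions, so a glyph with half a dozen stacked strokes has no slack at all before anything gets clipped.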
If there are areas where Kanji or Han or Hanja need to be displayed, there might need to be a bigger size. Perhaps, if nothing else, a way for users to tweak that application-wide font, or at least the controls that can have that text in them.
And that is even before getting into the languages of Southeast and South Asia, that make the East Asian requirements seem positively puny. I mean, any size that truncates a bit of some Kanji will make Tibetan look at best like a decorative monitor smudge....
This blog brought to you by every character in Unicode that does not look so good at 8 pt....
As a by the way, this blog does NOT represent anything beyond my own personal thoughts. You could even blame it on my Tegretol dosage, to be perfectly honest (if the pain were not so intense I'd have skipped this med for sure). I am not even on the team that owns this code any more and I didn't own it when I was then. Just so you know....
Recently when Shawn posted Don't use MB_COMPOSITE, MB_PRECOMPOSED or WC_COMPOSITECHECK, there were a few things he didn't mention.
For example, there is the fact that in builds of Windows 7 prior to the final release, the behavior is designed to use normalization.
Now of course there are a bunch of cases not in the tables in this Microsoft technology that pre-dates Unicode Normalization by more than half a decade, but that is small change.
And yes it means that MB_PRECOMPOSED will be a no-op for most code pages but for Vietnamese will destroy that special form "V" described in blogs like Getting intermediate forms and Harder intermediate forms of characters and Frost's The Form Not Taken. Which destroys round-tripping for Vietnamese, at least.
But this too is chump change. Someone will care but it might have shipped before anyone noticed.
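A small sketch of that Vietnamese round-tripping problem, using Python's `cp1258` codec as a stand-in for the Windows conversion (an assumption: the codec is a plain table lookup and says nothing about what the MB_* flags actually do). Code page 1258 stores "ê" as one byte and the acute accent as a separate combining byte, so the decoded text is neither Form C nor Form D.

```python
import unicodedata


def codepoints(text):
    """Render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# 0xEA is "ê" and 0xEC is the combining acute accent in code page 1258.
raw = bytes([0xEA, 0xEC])
decoded = raw.decode("cp1258")

nfc = unicodedata.normalize("NFC", decoded)
nfd = unicodedata.normalize("NFD", decoded)

print("as decoded:", codepoints(decoded))
print("Form C:    ", codepoints(nfc))
print("Form D:    ", codepoints(nfd))

# The intermediate form converts back to the original bytes.
print("decoded round-trips:", decoded.encode("cp1258") == raw)
# Neither normalized form is the same string as the decoded one.
print("matches Form C or D:", decoded in (nfc, nfd))
```

Once the text has been pushed into Form C or Form D, the string that maps straight back to the original two bytes is gone, which is the round-tripping loss in miniature.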
You know what did get noticed?
I'll give you a hint: MB_COMPOSITE.
And another hint: my own blog Stripping diacritics...., which uses Unicode Normalization Form D to do its work, just like MB_COMPOSITE would nominally be expected to convert the text to Form D.
You give up? Well look at the fixed version of that code, found in Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others).
Converting fully formed modern precomposed Hangul syllables to composite Jamo might be a fine idea if Microsoft had not been de facto enjoined nearly a decade ago from providing the fonts that would allow those conjoining Jamo to be composed (of course buffer sizes would be almost tripled but nothing is perfect!).
Without those fonts, however, the result was disastrous.
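Here is a minimal sketch of what Form D does to Hangul, and of one way a diacritic stripper can avoid leaving Jamo behind. The `strip_marks` helper and its `recompose` switch are my own illustration; recomposing to Form C afterwards is an assumption about a reasonable fix, not a transcription of the code in the posts mentioned above.

```python
import unicodedata


def strip_marks(text, recompose=True):
    """Remove non-spacing marks (category Mn) after decomposing to Form D.

    With recompose=True the result is converted back to Form C so that
    Hangul syllables are put back together.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept) if recompose else kept


syllable = "\ud55c"  # HANGUL SYLLABLE HAN
jamo = unicodedata.normalize("NFD", syllable)

print("code points before:", len(syllable))
print("code points after: ", len(jamo))
print("Jamo:", " ".join(f"U+{ord(ch):04X}" for ch in jamo))

text = "caf\u00e9 \ud55c"
print(strip_marks(text) == "cafe \ud55c")
print(len(strip_marks(text, recompose=False)), "vs", len(strip_marks(text)))
```

One precomposed syllable becomes three conjoining Jamo, which is where the "almost tripled" buffer comes from, and which is all the user gets to see if the font cannot put them back together.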
Now with all that said, nearly every word that Shawn said in Don't use MB_COMPOSITE, MB_PRECOMPOSED or WC_COMPOSITECHECK is true. I just felt like the proper context was needed.
Especially in the context of the one part of his blog I really do disagree with:
Hopefully I've terrified you and you'll stop using these flags, perhaps using NormalizeString() if you really need similar behavior.
Since Unicode Normalization will cause many of the very same problems that these flags cause (e.g. poor roundtripping and especially the problems I note above with both Form C and Form D), scaring people away from the flags and suggesting the use of a function that in many contexts will produce results just as scary is probably not the best idea.
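A short sketch of the round-tripping point, using a few characters whose canonical decomposition is a different character entirely. The sample list and the `is_stable` helper are just illustration; the behavior shown is ordinary Unicode Normalization as exposed by Python's `unicodedata` module, not anything specific to NormalizeString().

```python
import unicodedata


def is_stable(text, form):
    """True if normalizing to `form` leaves the text unchanged."""
    return unicodedata.normalize(form, text) == text


samples = {
    "ANGSTROM SIGN": "\u212b",
    "OHM SIGN": "\u2126",
    "KELVIN SIGN": "\u212a",
}

for name, ch in samples.items():
    nfc = unicodedata.normalize("NFC", ch)
    shown = " ".join(f"U+{ord(c):04X}" for c in nfc)
    print(f"{name}: U+{ord(ch):04X} becomes {shown} in Form C;",
          "stable in C:", is_stable(ch, "NFC"),
          "stable in D:", is_stable(ch, "NFD"))
```

Neither form leaves these characters alone, so any text that passes through normalization comes back different from what went in, with or without the old flags.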
It was written at what I have learned is probably not the best point to write a blog: when one is in the process of losing and/or has just lost the argument against a change, since it can color one's argument in a specific direction that under ordinary circumstances one might not choose to do....
This post brought to you by ピ (U+30d4, a.k.a. KATAKANA LETTER PI)