Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
Claudia and I are using two different kinds of smartphones.
I use the HTC Arrive, which is running Windows Phone 7.5.
She uses the HTC Rhyme, which is running Android.
I'm Sprint and she's Verizon, but the carrier differences are minor.
The real difference I notice is in the emoticons!
Because those primitive days of
;-)
and
:-p
are behind us now.
And even a cursory glance at Android versus WP emoticons should make this clear:
It is clear that the semantic content is very different.
And we always have to be aware of the differences between (just to give one example)
if we want to avoid misunderstandings, to say the least!
In a very real way, the two platforms have severe interoperability issues unless we take the time to understand the pragmatic (in the linguistic sense) differences between them when interpreting each other's messages.
I'm just thankful neither of us has an iPhone, and that I'm no longer using the Palm Pre (which has been relegated to backup phone status!).
And what to do once we get into the Emoji?
That will be a topic for another day!
No, the title of this blog is not any sort of riddle!
Almost no Dutchman (or for that matter Dutchwoman!) ever voluntarily uses the "Dutch" keyboard.
You know, this keyboard:
They really don't like it.
Not even a little bit.
Not even at all!
What they do largely prefer is the United States - International keyboard.
This keyboard.
Simple enough, right?
Well, I've been listening to people working in this space for a while.
For about 13 years now, though adjusted for how mind-numbing the complaints can be, it seems more like 113 years.
They complain about how weird it is to have a United States - International keyboard layout attached to Dutch!
Sometimes customers get weird about it after our UI kind of thrust it at them, when it used to be so often hidden from them, as I mentioned in Keyboard UI in setup hoist by its own petard?.
But anyway, people got over it each time.
Some of them still never saw it, but knew that was the layout they liked.
Anyway, if you look across all the places people use Windows, the percentage of locales using it, according to SQM data, is interesting for several reasons:
First of all, it is ironic that all of those locales have the United States - International keyboard specified as one of their LOCALE_SKEYBOARDSTOINSTALL entries, except for the two locales located in that region -- English - United States and Spanish - United States.
Second of all, it is interesting that such a large percentage of the people who include it explicitly are in the US, though one may have to run other queries to eliminate the many machines located in the US that run with other language settings before deciding whether that number is truly interesting or not.
Third of all, of the top ten countries that use this keyboard, only half of them (accounting for 31% of its usage) are regions that are even remotely likely to care about the Euro, at ALTGR+5:
Though of course 31% is certainly enough to make it worthwhile!
Of course, given what the blog I referenced above pointed out, we may never know how many people only learned of this through the OOBE in Vista and later or in Windows 8, and who might never have minded the name before the UI made it so prominent....
Just imagine if they'd listened back in 2006 and fixed that bug! :-)
The question the other day was:
My code calls GetNumberFormat(“0”) and this returns “.00” on a zh-cn system and “0.00” on an English system. We take the pre-decimal portion of the string and end up with a null string in the zh-cn locale.
Are there flags to GetNumberFormat to control the output to be “0.00” irrespective of locale? Or should I just handle this in my code?
Ah, here's a tricky one for you.
On the one hand, there is LOCALE_ILZERO, which is clearly documented as
Specifier for leading zeros in decimal fields.
Now obviously one wants to respect locale preferences.
I mentioned this locale data field previously, in Ambiguity of Language in the Platform SDK and Objection, managed code! That zero is leading!.
On the other hand, in this case the decimal behavior is being widely ignored anyway.
And there is a NUMBERFMT.LeadingZero you can pass to GetNumberFormat or GetNumberFormatEx, which can be used to force the function to behave as requested.
Though keep in mind what those functions say about their lpFormat parameter:
Pointer to a NUMBERFMT structure that contains number formatting information, with all members set to appropriate values. If the application does not set this parameter to NULL, the function uses the locale only for formatting information not specified in the structure, for example, the locale string value for the negative sign.
So yes, you can modify lpFormat->LeadingZero to make it 1, sure.
But you should also modify lpFormat->NumDigits to make it 0, under the circumstances.
You can probably ignore lpFormat->NegativeOrder and lpFormat->lpDecimalSep this time.
But if you don't specify several other members based off of GetLocaleInfo/GetLocaleInfoEx calls like lpFormat->lpThousandSep via LOCALE_STHOUSAND or lpFormat->Grouping via parsing of LOCALE_SGROUPING, then there isn't much point to calling a locale-specific function to format, anyway!
So the work here is definitely manageable, but may be more work than was originally hoped for....
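To make the interplay between those members concrete, here is a pure-Python model of how NUMBERFMT shapes the output. It is a sketch under stated assumptions, not a call into GetNumberFormatEx: the NumberFmt class, group_digits, and format_number are hypothetical names, only the "-1.1" style of negative number is modelled, and the grouping rule follows my reading of the documented NUMBERFMT.Grouping convention (0 for 123456789, 3 for 123,456,789, 32 for 12,34,56,789).

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal


@dataclass
class NumberFmt:
    """Hypothetical stand-in for the Win32 NUMBERFMT members used here."""
    num_digits: int    # NumDigits: digits after the decimal separator
    leading_zero: int  # LeadingZero: 1 gives "0.5", 0 gives ".5"
    grouping: int      # Grouping, NUMBERFMT style: 0, 3, 32, ...
    decimal_sep: str   # lpDecimalSep
    thousand_sep: str  # lpThousandSep


def group_digits(int_part: str, grouping: int, sep: str) -> str:
    """Insert separators right-to-left; the last group size repeats unless it is 0."""
    sizes = [int(c) for c in str(grouping)]
    if sizes == [0]:
        return int_part
    groups, rest, i = [], int_part, 0
    while rest:
        size = sizes[min(i, len(sizes) - 1)]
        if size == 0:  # a trailing 0 stops the repetition
            groups.append(rest)
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
        i += 1
    return sep.join(reversed(groups))


def format_number(value: str, fmt: NumberFmt) -> str:
    # Round to NumDigits places (10 ** -num_digits as the quantum).
    d = Decimal(value).quantize(Decimal(1).scaleb(-fmt.num_digits), rounding=ROUND_HALF_UP)
    sign = "-" if d < 0 else ""  # only the "-1.1" negative style is modelled
    int_part, _, frac_part = f"{abs(d):f}".partition(".")
    if int_part == "0" and not fmt.leading_zero:
        int_part = ""  # the ".00" that surprised the caller on zh-CN
    result = group_digits(int_part, fmt.grouping, fmt.thousand_sep)
    if fmt.num_digits > 0:
        result += fmt.decimal_sep + frac_part
    return sign + result
```

With leading_zero=1 and num_digits=0 the model yields "0" no matter what the locale's LOCALE_ILZERO says; the separators and grouping are still yours to fill in from GetLocaleInfoEx if you want the result to stay locale-aware.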
The WM_UNICHAR message has had an interesting history.
Over time it has come up in this Blog occasionally, in blogs like Will the real Unicode character message please stand up?, which tended to be more generous in suggesting that apps on Windows should just take it if they want to. But by 2008, questions like Why is my WM_UNICHAR handler never called? on Stack Overflow pointed out that the message's documentation had crystallized its intent/purpose pretty clearly:
The WM_UNICHAR message is equivalent to WM_CHAR, but it uses Unicode Transformation Format (UTF)-32, whereas WM_CHAR uses UTF-16. It is designed to send or post Unicode characters to ANSI windows and it can handle Unicode Supplementary Plane characters.
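As a minimal sketch of that UTF-16 versus UTF-32 distinction (pure Python, no Win32 calls; utf16_code_units and combine_surrogates are hypothetical names): a supplementary-plane character reaches a Unicode window as two WM_CHAR messages, one per surrogate, whereas a single WM_UNICHAR could carry the combined code point.

```python
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """The sequence of UTF-16 code units, i.e. what successive WM_CHAR wParams carry."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def combine_surrogates(units: List[int]) -> List[int]:
    """Fold UTF-16 code units into UTF-32 code points, the unit a WM_UNICHAR carries."""
    out: List[int] = []
    high = None
    for u in units:
        if 0xD800 <= u <= 0xDBFF:  # high (lead) surrogate
            if high is not None:
                out.append(high)  # previous high surrogate was unpaired; pass it through
            high = u
        elif 0xDC00 <= u <= 0xDFFF and high is not None:  # low (trail) surrogate
            out.append(0x10000 + ((high - 0xD800) << 10) + (u - 0xDC00))
            high = None
        else:
            if high is not None:
                out.append(high)
                high = None
            out.append(u)
    if high is not None:
        out.append(high)
    return out
```

For example, U+20000 comes through as the pair 0xD840 0xDC00 and folds back into the single value 0x20000.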
But the other day, when Marc asked (on behalf of a customer):
When is WM_UNICHAR used in Windows 7? I thought that I might be able to create WM_UNICHAR from an IME by entering a surrogate pair through Unicode input but I only saw WM_CHAR messages. Under what conditions will the WM_UNICHAR message be used?
There are reportedly still some 3rd party IMEs that use it even for Unicode windows, but IMEs from Microsoft have largely moved away from that.
I suspect it is largely a philosophical idea, and comes down to how strongly you feel that a single discrete character should always be returned -- an issue on which some 3rd party IMEs do vote, with their implementations.
But Microsoft IMEs have another good reason to move away from the view of "one character at a time" in such messages -- Unicode Variation Selectors, discussed in blogs like UCS-2 to UTF-16, Part 10: Variation[ Selector] on a theme....
Now not every Microsoft IME is putting out variation selectors in its output, even in Windows 8.
But some of them do, and that's enough to convince the IME team at Microsoft to not bother with the "incomplete" WM_UNICHAR.
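Why "one character at a time" falls short: a variation sequence such as U+845B followed by VARIATION SELECTOR-17 (U+E0100) is two code points even in UTF-32, so one-code-point-per-message delivery still splits what the user typed as a single unit. The sketch below (hypothetical helper names; a simplification that ignores combining marks and other clustering rules) shows the regrouping an app would have to redo itself.

```python
from typing import List


def is_variation_selector(cp: int) -> bool:
    # VS1-VS16 are U+FE00..U+FE0F; VS17-VS256 are U+E0100..U+E01EF.
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def group_with_selectors(text: str) -> List[str]:
    """Attach each variation selector to the code point before it (simplified)."""
    clusters: List[str] = []
    for ch in text:
        if clusters and is_variation_selector(ord(ch)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters
```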
I still believe that if you get the message, whether it is a Unicode window or not, you should accept it.
For the sake of those 3rd party IMEs, if nothing else.
But there are people who don't bother now, since the docs now make it clear that this is "off brand" usage. So perhaps at this point I am tilting at windmills?
Previous blogs in this series:
I might have been the only software developer in the world who is confused about the world of "Desktop" versus "Metro".
Though I'm inclined to doubt it. :-)
Like when I was asked just the other day:
I am looking for an API that converts Unicode to Punycode. I can see that there’s already a .NET API just for this, but it’s not in Metro. What would it take to make it Metro?
My first thought was to point to IdnToAscii.
A function that claims to be available to both Desktop and Metro apps.
But someone was concerned about this answer:
As far as I know, IdnToAscii is only for C++. The question is for C#/VB.
Okay, here is where things get complicated.
I think I have it straight now, though.
Here are the travails to get there -- imagine each one required several emails to clarify (since each one did!):
So really, the reason you cannot get all of this from a glance at the docs is that the docs are a work in progress.
But I can live with that -- as long as people can get to functionality, they are not blocked.
We can get stuff done!
Whether Desktop or Metro.
Whether Native or Managed.
Whether x86 or x64 or WoA (ARM).
I think I know what my first Modern app will be!
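For comparison, the conversion the question asked about is small enough to show. This sketch uses Python's standard idna and punycode codecs rather than IdnToAscii or the .NET API, so treat it as an illustration of the Unicode-to-Punycode mapping (the well-known bücher example), not of any Metro API. Python's codec implements IDNA 2003 (RFC 3490 with Nameprep and RFC 3492 Punycode), so results for edge cases may differ from what IdnToAscii returns; to_ascii_domain and to_unicode_domain are hypothetical names.

```python
def to_ascii_domain(domain: str) -> str:
    """IDNA ToASCII via Python's built-in 'idna' codec (IDNA 2003)."""
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(domain: str) -> str:
    """The reverse direction, ToUnicode, via the same codec."""
    return domain.encode("ascii").decode("idna")


# Raw Punycode (RFC 3492) for a single label, without the "xn--" ACE prefix.
raw_label = "bücher".encode("punycode")
```

Here to_ascii_domain("bücher.example") gives "xn--bcher-kva.example": the "xn--" prefix plus the raw Punycode label "bcher-kva".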
The question came in just the other day:
Hi Michael,
Do you have a list of all the keyboard cultures in Win8 -- or alternately, which API would return me such a list? (Looking at your blog(s), I couldn’t easily determine what this API would be. All the APIs seemed to be about installed keyboards.)
Questions like this always make me nervous, you see.
Because almost invariably, there is something behind such questions.
Now in the end, there is no specific Win32 function to enumerate every keyboard.
Essentially one has to enumerate every subkey of
HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layouts
So, I decided, after saying this, to try to find out the underlying issue. The answer came back readily enough:
There seems to be a change in behaviour (?) when converting the keyboard ID to a culture. For example, zh-HK used to work but is now returning zh-Hant-HK. I wanted a list of these to ensure we do the right thing on our end.
It asks for the currentInputLanguageTag, and then tries to convert it to lcid using LocaleNameToLCID. However, it seems to fail for Tibetan and also Chinese Traditional IME (Hong Kong with Microsoft IME) languages which come shipped with Windows.
Now here is where the problem is obviously tangled up with someone else's implementation.
You see, that registry key's subkeys are called Keyboard Layout IDentifiers (aka KLIDs), some of which have Layout Id values under them. At runtime each installed (loaded) keyboard layout has an HKL value associated with it, the lower 16 bits of which have a relationship with the KLID's lower 16 bits -- both of which sometimes have a relationship with the Language Identifier (aka LANGID) values that make up the lower 16 bits of Locale Identifiers.
Cases where they don't have such a relationship include both situations I directly caused, as described in Getting the language (and more!) of an LCID-less keyboard/MSKLC keyboard layout names in your own language, and situations I inspired/indirectly caused, as described in The evolving Story of Locale Support, part 2 (raising the roof on keyboards).
Anyway, you might have noticed some of the things that were never mentioned:
keyboard IDs.
currentInputLanguageTags.
cultures.
And of course LocaleNameToLCID would never return 0x0c04 (aka zh-HK) for zh-Hant-HK; there is no LCID for that name.
I mean, we have our own problems in Windows mapping names -- e.g. Four cases where I don't like ResolveLocaleName (and you shouldn't either!) -- but this looks like an external issue is contributing to or causing the problem.
So whatever the problem is here, some work to clean up what the callers are doing will be needed before I can comment on whatever problem they are hitting.
To get back to the original question about a list?
We don't have that, as I said.
But if you enumerate the subkeys under that same Keyboard Layouts key, you can extract the lower 16 bits of each, and whenever that value isn't 0x0c00, you can call LCIDToLocaleName (removing duplicates as they come up) to get a fairly robust list of "keyboard cultures".
This leaves just two other groups:
I find the last bit to be slightly unfortunate from a decision-making standpoint (I originally recommended they add a Layout Locale Name value too), but the list will be limited since there are only a few of them.
At some point, I'll probably publish the list here myself, and recommend that the MSDN documentation writers do the same.
That just leaves out the IMEs, but the original question was just about keyboards, so I guess they're on their own for that bit. :-)
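The enumeration recipe above (take the lower 16 bits of each KLID, skip 0x0c00, map the rest through LCIDToLocaleName, drop duplicates) can be sketched as follows. This is an illustration, not production code: on Windows the KLID strings would come from enumerating the subkeys of HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layouts (in Python, winreg.EnumKey can do that), and SAMPLE_NAMES is a hypothetical stand-in for LCIDToLocaleName that covers only a few real LCIDs.

```python
from typing import Callable, Iterable, List, Optional

SKIP_LANGID = 0x0C00  # the value the recipe says to skip


def keyboard_cultures(
    klids: Iterable[str],
    lcid_to_locale_name: Callable[[int], Optional[str]],
) -> List[str]:
    """Map KLID strings such as "00020409" to de-duplicated locale names."""
    names: List[str] = []
    seen = set()
    for klid in klids:
        langid = int(klid, 16) & 0xFFFF  # lower 16 bits of the KLID
        if langid == SKIP_LANGID:
            continue
        # A LANGID with SORT_DEFAULT has the same numeric value as its LCID.
        name = lcid_to_locale_name(langid)
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Hypothetical stand-in for LCIDToLocaleName (illustrative subset only).
SAMPLE_NAMES = {0x0409: "en-US", 0x0413: "nl-NL", 0x0809: "en-GB"}

print(keyboard_cultures(["00000409", "00020409", "00000413", "00000809"], SAMPLE_NAMES.get))
# ['en-US', 'nl-NL', 'en-GB']
```

Note how US (00000409) and United States - International (00020409) collapse into a single en-US entry, which is exactly why the de-duplication step matters.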
I noticed that in some locales the LOCALE_SNATIVEDIGITS values have special digits, but LOCALE_IDIGITSUBSTITUTION isn't set to use them.
What's up with that?
Yeah, we do that sometimes....
Kind of the opposite of the problem where we turn digit substitution on even when the digits are just U+0030 to U+0039.
I mean, in both cases digit substitution isn't ultimately done.
But either way we don't do useful work!
I don't know, really. I guess it is just kind of nice to have the digits that may be used by a locale be queryable....
Even if we don't use them ourselves.
Every once in a while I see an app that substitutes digits for those locales -- it is a great way to find people that do their own digit substitution but skip some of our settings.
Sharp eye noticing the issue, FWIW. :-)
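The situation the reader spotted is easy to express in code. This sketch assumes you have already fetched the two values (via GetLocaleInfoEx with LOCALE_SNATIVEDIGITS and LOCALE_IDIGITSUBSTITUTION, whose documented values are 0 = context, 1 = none, 2 = native); the helper names are hypothetical, and context substitution is deliberately left out because it depends on the surrounding text.

```python
ASCII_DIGITS = "0123456789"
# Documented LOCALE_IDIGITSUBSTITUTION values.
SUBST_CONTEXT, SUBST_NONE, SUBST_NATIVE = 0, 1, 2


def apply_national_substitution(text: str, native_digits: str, mode: int) -> str:
    """Swap ASCII digits for native ones only when the mode says "native"."""
    if mode != SUBST_NATIVE:
        return text  # "none" passes through; "context" is not modelled here
    return text.translate(str.maketrans(ASCII_DIGITS, native_digits))


def native_digits_unused(native_digits: str, mode: int) -> bool:
    """True for the case in the question: real native digits, substitution off."""
    return native_digits != ASCII_DIGITS and mode == SUBST_NONE
```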
Trust is hard.
And once lost, it is hard to get back.
For me, it wasn't about the raisins.
I hated 'em, and I stopped trusting cookies sometimes. But I never blamed the people who handed them to me.
And I was raised Jewish, so the Santa thing didn't hit me.
Or the Easter Bunny, for that matter.
Though I grew up in "Catholic Beachwood", so I witnessed some people who hit those issues....
And I always knew the Great Pumpkin was fake.
But the Tooth Fairy debacle?
That was where the wheels started to come off the wagon, for me.
I recall (after losing a bicuspid that I had helped out myself once it was loose) overhearing my parents talk about who had money for my tooth.
Aha!
But the bigger problem came before that.
And I didn't even know it....
You see, I wanted to be an astronaut.
I was too young to do much to help my resume then, so I did the one thing I knew I could do at that age.
I drank Tang!
I knew astronauts drank it, so I just would make sure to have it be a part of my training.
Easy.
But then, something unfortunate happened.
I had a coldish kind of thing.
Nothing serious, but I was given Dimetapp.
But I had decided I hated Dimetapp.
My mother had a "brilliant" idea.
She saw me snarfing up Tang like it was going out of style.
My astronaut training program had to stay on track, of course.
So she dosed the Tang with Dimetapp.
Well, perhaps you can guess what happened next.
Tang started tasting really nasty.
Not always, but my young detective mind concluded that the problem was too much Tang!
I realized my dreams of being an astronaut could never be accomplished.
I mean, I couldn't even pass the Tang test! How was I supposed to do all of the other astronaut things?
Suddenly I hated Tang.
I mean really hated Tang.
It wasn't until years later that I learned about the "Dimetapp Roofie" issue.
And how they ruined my dreams of being in space.
Just ruined them.
And caused trust issues! :-(
My other dream, to be a member of The Supremes, also didn't work out. As you probably guessed. But that's a story for another day....
People who attended the 26th Internationalization and Unicode conference might remember John McConnell's Day 2 Keynote The Windows Language Roadmap or When Do We Get Rongo-Rongo?.
Or maybe you saw the later version of the talk at the 2004 Global Development and Deployment Conference.
Or maybe you just read my blog When will we support Rongo-Rongo?.
Anyway, this becomes a rather fascinating addition to the story of Rongo-Rongo and Easter Island:
Easter Island heads have bodies!??
If nothing else, that may mean many more text samples to work with.
Hidden in some back or kneecap or maybe even crotch there could be some Rosetta Stone equivalent....
Maybe even there will be enough information for Rongo-Rongo to be encoded in Unicode! :-)
Every once in a while, I am reminded of the fact that I've been blogging for a good long while.
Like the other day, when Andrew added me to an email thread, saying:
+Michael, in case he knows if there are specific reasons why the keyboard name “US” was marked as not localizable with the exception of Arabic and Hebrew.
The United States-International keyboard is named differently in English too, it’s not just an issue for localization:
My first thought: "how the hell would I know?"
I'm not a localization engineer, after all.
But then I thought about it.
And remembered my Inaccurate localization can make you bust out laughing blog from September of 2006.
I then realized that the localization instructions were indeed a step to try to keep a US --> BR type situation from happening again!
A slight over-reaction, to be sure. But their hearts were in the right place.
The IPE realized the best fix for the situation:
Thanks Michael. So there wasn’t really a good reason to put the instruction there, since it was just to ensure that the translators didn’t localize US to their country equivalent. I believe we should remove the instruction and have everyone localize US to the translated equivalent of United States, to be consistent with “United States-International”
Perhaps the notion that my blog has no wider meaning or usage was knocked over a bit by the email. :-(
Ah well, as they say, all's well that ends better!
I suppose being able to recall such issues undermines my claim that I don't know how such things are decided.....
:-)
This blog today is about a character in Unicode.
U+00a0, aka NO-BREAK SPACE, specifically.
I could have made it an Every Character Has a Story blog, almost.
Except it is really going to be about locales on Microsoft platforms, rather than a history and/or story of the character itself.
So I won't talk about the suggestion to Sri Lanka to use it in their Standards, or the role Unicode has it play in lone combining characters, or any of the other interesting stories about it.
Sorry!
To start, there is a regular space, which allows anyone rendering text to treat it opportunistically as a line breaking opportunity.
Like if you have more characters than will fit on a line, the text will break at one of those places -- perhaps on that space!
But if you put a NO-BREAK SPACE there, then it will not be used as a line breaking opportunity -- the text on either side will act as if it is just another letter or something.
I endeavored to explain to my girlfriend what U+00a0 does, and she suggested maybe it was like how she and I were connected. That'll work. :-)
Anyhow, if you look at all of the LOCALE data in Windows, there are ~185 instances of the NO-BREAK SPACE, U+00a0.
The ~185 instances fall into two categories:
Now that second category makes sense.
If one has a month name of كانون الثاني, one may genuinely want to not let it span lines.
And so on.
The first category also makes sense -- one may want to make sure that the number $100 000 000.00 or 45 678.00 doesn't get split up either.
In fact, one may wonder about the ~9 cases that are similar to category #1 that use U+0020 for their LOCALE_STHOUSAND or LOCALE_SMONTHOUSANDSEP, right? :-)
You have to wonder whether some or all of those ~9, and of the other ~214 U+0020 usages that fall into category #2, are mistakes that would also be U+00a0 if someone had had a chance to think about it!
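The difference is easy to see with a toy line breaker. In this sketch group_thousands and break_opportunities are hypothetical helpers, and treating only U+0020 as a break opportunity is a deliberate simplification of the real Unicode line breaking algorithm (UAX #14), which also refuses to break at U+00a0.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def group_thousands(n: int, sep: str) -> str:
    """Group digits by threes (like a Grouping of 3) with the given separator."""
    return f"{n:,}".replace(",", sep)


def break_opportunities(text: str) -> list:
    """Offsets right after each U+0020, the only break points in this toy model."""
    return [i + 1 for i, ch in enumerate(text) if ch == " "]
```

With U+0020 as the separator, "45 678" offers a break after "45 "; with U+00a0 the number offers none.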
And then there are a few other interesting cases:
All of these cases have one thing in common.
According to the docs, they insert a SPACE (LOCALE_ICURRENCY calls it a "separation") in all of these cases, even if the LOCALE_STHOUSAND or LOCALE_SMONTHOUSANDSEP have U+00a0 in them.
Obviously either the docs are wrong or the code creates formatted strings that could be broken before the line ends even if the separators clearly try to avoid this.
I don't know about you, but both ideas fail to sit very well with me, entirely.
How about you?
I'm almost afraid to try. Almost....
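If the docs are right, the result mixes the two kinds of space. This hypothetical helper (not a Win32 function) places the currency symbol according to the documented LOCALE_ICURRENCY values (0 = "$1.1", 1 = "1.1$", 2 = "$ 1.1", 3 = "1.1 $") and shows where a breakable U+0020 sneaks back in, even when the thousands separator is U+00a0. The separation parameter is my own addition, there to show what a U+00a0 separation would look like.

```python
def place_currency(amount: str, symbol: str, icurrency: int, separation: str = " ") -> str:
    """Hypothetical helper: position a currency symbol per LOCALE_ICURRENCY."""
    if icurrency == 0:  # prefix, no separation
        return symbol + amount
    if icurrency == 1:  # suffix, no separation
        return amount + symbol
    if icurrency == 2:  # prefix, 1-character separation
        return symbol + separation + amount
    if icurrency == 3:  # suffix, 1-character separation
        return amount + separation + symbol
    raise ValueError(f"unexpected LOCALE_ICURRENCY value: {icurrency}")


amount = "100\u00a0000,00"  # thousands already joined with NO-BREAK SPACE
breakable = place_currency(amount, "€", 3)                        # "100 000,00 €" with one U+0020
unbreakable = place_currency(amount, "€", 3, separation="\u00a0")
```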
Tuesday May 8th, 2012 was yesterday.
For many, it was just another day.
Some births, some deaths. Some weddings. Maybe even some divorces.
Just like any other day.
But for another group, it was a really big day.
A day that might hold its own, in some small way, against the days of those who came in, those who went out, those who came together, those who went apart.
Because it was yesterday that the Unicode Technical Committee approved
U+20BA, aka TURKISH LIRA SIGN, aka ₺
for inclusion in Unicode 6.2.
In fact, it is the only character being added to Unicode 6.2.
In the words of colleague Peter Constable after the issue was discussed and approved by the UTC:
This version will be published before the next meeting of the ISO committee that maintains ISO 10646. There is no concern that ISO would not approve the character for encoding or want to assign a different code point. Hence this code point can be assumed to be stable as of now and can be used in implementations.
Cool. Pretty exciting, right?
This is the concrete step that puts the character first described here in Not the Lira or the New Lira, but a New Turkish Lira, nevertheless on the road to ending up in Windows, likely in time for the next version (and if the "Rupee Rumba" I first described in Rupee! Rupee! Let down your CHAR! is any indication, then some prior versions will see some support too -- Vista, Server 2008, Windows 7, Server 2008 R2).
Exact plans and schedule TBD for now.
I'll tell more when I know more about the future and such.
But in any case, it was not just another day, by any means!
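For anyone itching to start on an implementation, the encoding arithmetic for U+20BA is quick to check. A small sketch (describe is a hypothetical helper), which needs a Python whose Unicode database is 6.2 or later, as any current release is:

```python
import unicodedata


def describe(ch: str) -> dict:
    """Code point, character name, and UTF-8 / UTF-16BE bytes for one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<not in this Python's Unicode data>"),
        "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        "utf16": " ".join(f"{b:02X}" for b in ch.encode("utf-16-be")),
    }


lira = describe("\u20ba")
```

Being in the BMP, it is a single UTF-16 code unit (20 BA) and three bytes of UTF-8 (E2 82 BA).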
So there is a blog I planned on writing in a week or two.
It was based on stuff people had been asking me after I blogged Can't Touch This! (Though I can TYPE this because I have the hardware, and the keyboard…).
About the touch optimized soft keyboard layouts...
Feedback was basically in three different categories:
Category #1: When is that guest blog coming?
I have several interested folks on the team that did the actual work to create the various touch-optimized layouts who have promised to do something here.
No firm date yet, but sooner or later it will happen, if for no other reason than to correct small mistakes of mine as I play with things.
I'll keep everyone posted. And I'll nag them enough to make sure they don't forget. :-)
Category #2: Can you really change the default for the Cherokee layout?
Well, I can't.
But others can!
And feedback is always a good thing.
Remember that any keyboard you create with MSKLC (once the bug that kept them from working is fixed, which it has been, and you'll all see the fix in the next update, whenever that is) has to be exposed via that same Soft Input Panel. This will work the same way as any keyboard where the optimized layout is turned off....
Category #3: How can I create my own touch-optimized layouts?
This was the most frequently asked question, and it is the reason I moved this blog up to now instead of a few weeks from now.
Anyway, the popularity of the question is unfortunate since it is the one question I don't have a good answer for....
There are no current plans I know of to expose this "touch optimized layout creation" to third parties.
Even suggestions along the lines of an MSKLCesque update to do such things failed to generate the kind of interest I would have hoped for.
I'll keep suggesting, though.
If you've spent time trying out any of the optimized layouts, you too might have an opinion on this.
Feel free to leave it here, I'll make sure it gets to the right place... :-)
The question seemed simple enough:
Where can I get the Windows 7 SP1 MUI Language Pack for zh-TW? I was only able to find one for zh-HK!
The answer is unfortunately not so simple....
I first talked about it in Yes, Ivo^H^H^HVirginia, there is a zh-HK *and* a zh-TW.
Where I pointed out that although there is a zh-TW version of Windows 7, there was no MUI Language Pack for zh-TW.
Although some have found RTM files for a zh-TW Language Pack, it was missing a key feature that the zh-TW version of Windows itself has.
In fact, there are only three real differences between zh-TW and zh-HK:
Now one could argue that there should be more differences in terms of translation, but there aren't.
And since the zh-TW Language Pack was only doing the first two (one of which doesn't really matter and the other of which can be done via a system locale change), it really wasn't doing anything of consequence.
Plus it was confusing for anyone trying to compare the Language Pack versus the full Taiwan version.
So if you prefer the Taiwan style glyphs, you can just pick up the zh-HK Language Pack and a zh-TW system locale...
You might be tired of me blogging about Digit Substitution.
I mean, it has been a rather commonly covered topic, over the years.
The entire issue can often be thought of as a pitched battle between competing forces.
One of the fundamental forces pushing us away from it is the one embodied by the moves of Internet Explorer that I described in Suddenly, in a bit more time than a blink of an eye, "standards support" becomes "less i18n support".
This is the move to be more conformant to standards.
And standards don't capture this notion.
It is also embodied in blogs like "Digit substitution is maybe a tolerable hack for displaying UI, but it’s definitely bad if you’re creating content."
You know, where people who work in the area and have some say over some of the overall direction of the product will go on the record with how problematic and "off the reservation" the feature often is.
Interestingly, when it comes to Arabic, one of the opposing forces that is in favor of Digit Substitution is me.
Because, as I was reminded just yesterday, we have a bunch of customers who don't give a fig for international standards.
And they couldn't care less about whether the support of "Context" type Digit Substitution has implementation limitations in our UI.
They still want it, in pretty much half of the Arabic language locales.
We pointed out many of the limitations and asked whether they would prefer either the "None" aka "never substitute" or the "National" aka "always substitute" setting instead.
Across the board, the answer was:
It should be contextual as it is more convenient if digits come in the same script the context is. So, it should be contextual.
I realized something, in that moment.
Our concerns about showing different kinds of digits on the same screen, about the "limits of the GDI notion of Context"?
Not so important.
This isn't about us!
It's about them, and their content -- their expectations.
Suddenly, as long as developers have that same view, The evolving Story of Locale Support, part 22: Digit Substitution 2.0 is starting to look even more impressive.
And me?
I serve at the pleasure of the customer.... :-)
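Roughly, what those customers are asking for is that digits take on the script of the text around them. Here is a deliberately simplified sketch, not the actual GDI/Uniscribe rule: each ASCII digit follows the nearest preceding strong character as classified by Python's unicodedata.bidirectional (AL, an Arabic letter, selects native digits; L or R selects ASCII), and the start-of-text context is a caller-supplied assumption. The function name contextual_digits is hypothetical.

```python
import unicodedata

ASCII_DIGITS = "0123456789"


def contextual_digits(text: str, native_digits: str, start_native: bool = False) -> str:
    """Shape each ASCII digit by the nearest preceding strong character (simplified)."""
    table = str.maketrans(ASCII_DIGITS, native_digits)
    use_native = start_native  # assumption: the caller decides the initial context
    out = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "AL":  # Arabic letter: native digits from here on
            use_native = True
        elif bidi in ("L", "R"):  # Latin (or Hebrew) letter: ASCII digits
            use_native = False
        out.append(ch.translate(table) if use_native and ch in ASCII_DIGITS else ch)
    return "".join(out)
```

Note how much this glosses over (paragraph direction, numbers at the start of a line, mixed runs); those are precisely the "limits of the GDI notion of Context" mentioned above, and the customers' answer was that they want it anyway.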