Previous blogs in this series:
I might have been the only software developer in the world who is confused about the world of "Desktop" versus "Metro".
Though I'm inclined to doubt it. :-)
Like when I was asked just the other day:
I am looking for an API that converts Unicode to Punycode. I can see that there’s already a .NET API just for this, but it’s not in Metro. What would it take to make it Metro?
My first thought was to point to IdnToAscii.
A function that claims to be available to both Desktop and Metro apps.
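For anyone who has not seen what the conversion itself produces, here is a minimal sketch. It is Python, not the Win32 IdnToAscii function or the .NET API the question mentions, and it assumes Python's built-in "idna" codec (which follows the older IDNA 2003 rules) is close enough for illustration. The helper names to_ascii and to_unicode are made up for this example.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode) form."""
    # The "idna" codec applies nameprep and Punycode to each label.
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII (Punycode) domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii("bücher.example"))            # xn--bcher-kva.example
    print(to_unicode("xn--bcher-kva.example"))   # bücher.example
```

A real Windows app would still call the platform function, since the platform decides which IDNA rules and flags apply.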
But someone was concerned about this answer:
As far as I know, IdnToAscii is only for C++. The question is for C#/VB.
Okay, here is where things get complicated.
I think I have it straight now, though.
Here are the travails to get there -- imagine each one required several emails to clarify (since each one did!):
So really, the reason you cannot get all of this from a glance at the docs is that the docs are a work in progress.
But I can live with that -- as long as people can get to functionality, they are not blocked.
We can get stuff done!
Whether Desktop or Metro.
Whether Native or Managed.
Whether x86 or x64 or WoA (ARM).
I think I know what my first Modern app will be!
No, the title of this blog is not any sort of riddle!
Almost no Dutchman (or for that matter Dutchwoman!) ever voluntarily uses the "Dutch" keyboard.
You know, this keyboard:
They really don't like it.
Not even a little bit.
Not even at all!
What they do largely prefer is the United States - International keyboard.
Simple enough, right?
Well, I've been listening to people working in this space for a while.
For about 13 years now, though adjusting for how mind-numbing the complaints can be, it seems more like 113 years.
They complain about how weird it is to have a United States - International keyboard layout attached to Dutch!
Sometimes customers get weird about it after our UI kind of thrust it at them when it used to be so often hidden from them, as I mentioned in Keyboard UI in setup hoist by its own petard?.
But anyway, people got over it each time.
Some of them still never saw it, but knew that was the layout they liked.
Anyway, if you looked across all places people use Windows, the % of locales using it according to SQM data is interesting, for several reasons:
First of all, it is ironic that all of those locations have the United States - International keyboard specified as one of the LOCALE_SKEYBOARDSTOINSTALL except for the two locales located in that region -- English - United States and Spanish - United States.
Second of all, it is interesting that such a large percentage of the people who include it explicitly are in the US, though one may have to run other queries to eliminate the many machines located in the US that run with other language settings to decide whether that number is truly interesting or not.
Third of all, of those top ten countries that use this keyboard, only half of them (and 31% of them) are regions that are even remotely likely to care about the Euro, at ALTGR+5:
Though of course 31% is certainly enough to make it worthwhile!
Of course, given what the blog I referenced above pointed out, we may never know how many people learned of this through >= Vista OOBE or Windows 8, who may never have minded changing the name before the UI made it so prominent....
Just imagine if they'd listened back in 2006 and fixed that bug! :-)
You might be tired of me blogging about Digit Substitution.
I mean, it has been a rather commonly covered topic, over the years.
The entire issue can often be thought of as a pitched battle between competing forces.
One of the fundamental forces pushing us away from it is the one embodied by the moves of Internet Explorer that I described in Suddenly, in a bit more time than a blink of an eye, "standards support" becomes "less i18n support".
This is the move to be more conformant to standards.
And standards don't capture this notion.
It is also embodied in blogs like "Digit substitution is maybe a tolerable hack for displaying UI, but it’s definitely bad if you’re creating content."
You know, where people who work in the area and have some say over some of the overall direction of the product will go on the record with how problematic and "off the reservation" the feature often is.
Interestingly, when it comes to Arabic, one of the opposing forces that is in favor of Digit Substitution is me.
Because, as I was reminded just yesterday, we have a bunch of customers who don't give a fig for international standards.
And they couldn't care less about whether the support of "Context" type Digit Substitution has implementation limitations in our UI.
They still want it, in pretty much half of the Arabic language locales.
We pointed out many of the limitations and asked whether they would prefer either the "None" aka "never substitute" or "National" aka "always substitute" settings.
Across the board, the answer was:
It should be contextual as it is more convenient if digits come in the same script the context is. So, it should be contextual.
I realized something, in that moment.
Our concerns about showing different kinds of digits on the same screen, on the "limits of the GDI notion of Context"?
Not so important.
This isn't about us!
It's about them, and their content -- their expectations.
Suddenly, as long as developers have that same view, The evolving Story of Locale Support, part 22: Digit Substitution 2.0 is starting to look even more impressive.
I serve at the pleasure of the customer.... :-)
This blog today is about a character in Unicode.
U+00a0, aka NO-BREAK SPACE, specifically.
I could have made it an Every Character Has a Story blog, almost.
Except it is really going to be about locales on Microsoft platforms, rather than a history and/or story of the character itself.
So I won't talk about the suggestion to Sri Lanka to use it in their Standards, or the role Unicode has it play in lone combining characters, or any of the other interesting stories about it.
To start, there is a regular space, which allows anyone rendering text to treat it opportunistically as a line breaking opportunity.
Like if you have more characters in a line than you have line, then it will break at one of those places -- perhaps on that space!
But if you put a NO-BREAK SPACE there, then it will not be used as a line breaking opportunity -- the text on either side will act as if it is just another letter or something.
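To see the difference in action, here is a small sketch using Python's textwrap module, which happens to treat only ASCII whitespace as a place to break. That is an assumption about this one library, not a claim about how GDI, Uniscribe or any Windows control lays out text. The wrap_lines helper and the sample strings are made up for the example.

```python
import textwrap

NBSP = "\u00a0"  # NO-BREAK SPACE


def wrap_lines(text: str, width: int) -> list[str]:
    """Wrap text greedily, breaking only at ordinary (ASCII) whitespace."""
    # break_long_words=False keeps an over-long chunk whole instead of cutting it.
    return textwrap.wrap(text, width=width, break_long_words=False)


plain = "total: 100 000 000.00"                      # U+0020 between digit groups
glued = f"total: 100{NBSP}000{NBSP}000.00"           # U+00a0 between digit groups

if __name__ == "__main__":
    print(wrap_lines(plain, 16))  # the number is split across two lines
    print(wrap_lines(glued, 16))  # the number stays in one piece
```

With a width of 16, the first string breaks in the middle of the number, while the second moves the whole number to the next line.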
I endeavored to explain to my girlfriend what U+00a0 does, and she suggested maybe it was like how she and I were connected. That'll work. :-)
Anyhow, if you look at all of the LOCALE data in Windows, there are ~185 instances of the NO-BREAK SPACE, U+00a0.
The ~185 instances fall into two categories:
Now that second category makes sense.
If one has a month name of كانون الثاني, one may genuinely want to not let it span lines.
And so on.
The first category also makes sense -- one may want to make sure that the number $100 000 000.00 or 45 678.00 doesn't get split up either.
In fact, one may wonder about the ~9 cases that are similar to category #1 that use U+0020 for their LOCALE_STHOUSAND or LOCALE_SMONTHOUSANDSEP, right? :-)
You have to wonder if some or all of those ~9 and of the other ~214 cases that fall into category #2 usages of U+0020 are mistakes that should also be U+00a0, if they had a chance to think about it!
And then there are a few other interesting cases:
All of these cases have one thing in common.
According to docs, they insert a SPACE (LOCALE_ICURRENCY calls it a "separation") in all of these cases, even if the LOCALE_STHOUSAND or LOCALE_SMONTHOUSANDSEP have U+00a0 in them.
Obviously either the docs are wrong or the code creates formatted strings that could be broken before the line ends even if the separators clearly try to avoid this.
I don't know about you, but neither idea sits very well with me.
How about you?
I'm almost afraid to try. Almost....
Trust is hard.
And once lost, it is hard to get back.
For me, it wasn't about the raisins.
I hated 'em, and I stopped trusting cookies sometimes. But I never blamed the people who handed them to me.
And I was raised Jewish, so the Santa thing didn't hit me.
Or the Easter Bunny, for that matter.
Though I grew up in "Catholic Beachwood", so I witnessed some people who hit those issues....
And I always knew the Great Pumpkin was fake.
But the Tooth Fairy debacle?
That was where the wheels started to come off the wagon, for me.
I recall (after losing a bicuspid that I helped out myself once it was loose) overhearing my parents talk about who had money for my tooth.
But the bigger problem came before that.
And I didn't even know it....
You see, I wanted to be an astronaut.
I was too young to do much to help my resume then, so I did the one thing I knew I could do at that age.
I drank Tang!
I knew astronauts drank it, so I just would make sure to have it be a part of my training.
But then, something unfortunate happened.
I had a coldish kind of thing.
Nothing serious, but I was given Dimetapp.
But I had decided I hated Dimetapp.
My mother had a "brilliant" idea.
She saw me snarfing up Tang like it was going out of style.
My astronaut training program had to stay on track, of course.
So she dosed the Tang with Dimetapp.
Well, perhaps you can guess what happened next.
Tang started tasting really nasty.
Not always, but my young detective mind concluded that the problem was too much Tang!
I realized my dreams of being an astronaut could never be accomplished.
I mean, I couldn't even pass the Tang test! How was I supposed to do all of the other astronaut things?
Suddenly I hated Tang.
I mean really hated Tang.
It wasn't until years later that I learned about the "Dimetapp Roofie" issue.
And how they ruined my dreams of being in space.
Just ruined them.
And caused trust issues! :-(
My other dream, to be a member of The Supremes, also didn't work out. As you probably guessed. But that's a story for another day....
Claudia and I are using two different kinds of smartphones.
I use the HTC Arrive, which is running Windows Phone 7.5.
She uses the HTC Rhyme, which is running Android.
I'm Sprint and she's Verizon, but the carrier differences are minor.
The real difference I notice is in the emoticons!
Because those primitive days of
are behind us now.
And even a cursory glance at Android versus WP emoticons should make this clear:
It is clear that semantic content is very different.
And we always have to be aware of the differences between (just to give one example)
if we want to avoid misunderstandings, to say the least!
In a very real way, the two platforms have severe interoperability issues without us taking the time to understand the pragmatic (in the linguistic sense) differences between them when interpreting the meanings between the two.
I'm just thankful neither of us has an iPhone, and that I'm no longer using the Palm Pre (which has been relegated to backup phone status!).
And what to do once we get into the Emoji?
That will be a topic for another day!
Doug Ewell commented in response to The relationship between the 'United States - International' keyboard layout and the Euro....:
And now, on The Unicode List™, we have someone wanting to know the position where INDIAN RUPEE SIGN will be added to the "U.S. English" keyboard. I don't know if he meant specifically Microsoft's, but maybe. The guy is located in Pune. It shows that despite their names, and despite the presence of local keyboards, the "U.S." keyboard layouts are in worldwide use.
The short answer, unsurprisingly enough to some, is:
Perhaps expanding it to the long answer would be prudent at this juncture!
When the rupee was added to Windows, a new en-IN type keyboard was added as well.
It takes kbdus.dll, and adds INDIAN RUPEE SIGN to ALTGR+4.
The United States - International keyboard already has something assigned there:
aka U+00a4, aka CURRENCY SIGN.
And we don't change keyboard layout assignments.
Also, SHIFT + ALTGR + 4 is taken. By
aka U+00a3, aka POUND SIGN.
Note that ALTGR+R is also taken there -- it is
aka U+00ae, aka REGISTERED SIGN.
But by that time, you've left the land of the even theoretically intuitive....
Now we did consider adding it to kbdus.dll, but that idea was rejected -- since adding ALTGR+4 to kbdus.dll would disable the lone "RIGHT ALT" key, causing it to send CTRL+ALT when it was typed in.
Doing that everywhere in the world for the sake of any one country's currency symbol was just deemed a bad idea -- that's where that en-IN keyboard idea came from, the meeting where we realized kbdus.dll was out of bounds....
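The assignments described above can be written down as a tiny table, which makes the conflict easy to see. This is only an illustrative sketch in Python. The dictionary holds just the three United States - International assignments named in this post, not the full layout, and the names US_INTERNATIONAL, describe and free_slots are made up for the example.

```python
import unicodedata

# Only the assignments mentioned in the text; the real layout has many more.
US_INTERNATIONAL = {
    ("ALTGR", "4"): "\u00a4",        # CURRENCY SIGN
    ("SHIFT+ALTGR", "4"): "\u00a3",  # POUND SIGN
    ("ALTGR", "R"): "\u00ae",        # REGISTERED SIGN
}

RUPEE = "\u20b9"  # INDIAN RUPEE SIGN, on ALTGR+4 in the en-IN layout


def describe(ch: str) -> str:
    """Return a 'U+XXXX NAME' label for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def free_slots(layout: dict, candidates: list) -> list:
    """Return the candidate key combinations the layout has not assigned."""
    return [slot for slot in candidates if slot not in layout]


if __name__ == "__main__":
    wanted = [("ALTGR", "4"), ("SHIFT+ALTGR", "4"), ("ALTGR", "R")]
    for slot in wanted:
        print(slot, "is taken by", describe(US_INTERNATIONAL[slot]))
    print("free slots for", describe(RUPEE), "->", free_slots(US_INTERNATIONAL, wanted))
```

All three of the "obvious" candidates come back taken, which is the short answer in table form.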
So there is a blog I planned on writing in a week or two.
It was based on stuff people had been asking me after I blogged Can't Touch This! (Though I can TYPE this because I have the hardware, and the keyboard…).
About the touch optimized soft keyboard layouts...
Feedback was basically in three different categories:
Category #1: When is that guest blog coming?
I have several interested folks on the team that did the actual work to create the various touch-optimized layouts who have promised to do something here.
No firm date yet, but sooner or later it will happen, if for no other reason than to correct small mistakes of mine as I play with things.
I'll keep everyone posted. And I'll nag them enough to make sure they don't forget. :-)
Category #2: Can you really change the default for the Cherokee layout?
Well, I can't.
But others can!
And feedback is always a good thing.
Remember that any keyboard you create with MSKLC (once they fix the bug that made them not work, which has happened and you'll all see it in the next update, whenever that is) has to be exposed via that same Soft Input Panel. This will work the same way as any keyboard where the optimized layout is turned off....
Category #3: How can I create my own touch-optimized layouts?
This was the most frequently asked question, and it is the reason I moved this blog up to now instead of a few weeks from now.
Anyway, the popularity of the question is unfortunate since it is the one question I don't have a good answer for....
There are no current plans I know of to expose this "touch optimized layout creation" to third parties.
Even suggestions along the lines of an MSKLCesque update to do such things failed to generate the kind of interest I would have hoped for.
I'll keep suggesting, though.
If you've spent time trying out any of the optimized layouts, you too might have an opinion on this.
Feel free to leave it here, I'll make sure it gets to the right place... :-)
Nothing technical, though I'm technically pretty disappointed with New Mexico at the moment....
I am in Albuquerque right now.
With my girlfriend.
The trip was kind of last minute, but I figured over the past few years I've taken the iBot all over India.
So how hard could Albuquerque be? :-)
The Hard Rock Hotel and Casino, located at the Pueblo of Isleta, is in Albuquerque.
Population over half a million.
The largest city in New Mexico.
Not so much, as it turns out.
No taxi cab company had wheelchair accessible cabs.
No car rental company had a car or truck or van available.
The volunteer at the information desk was doing his best, but even he was apologizing on behalf of Albuquerque, and New Mexico, every few minutes when yet another call he'd make would fall through.
I won't name the churches that wouldn't take non-parishioners in their shuttles, though they didn't even apologize all that sincerely.
So much for "being good Christians" -- all of them were unavailable to assist.
The volunteer admitted this happened earlier in the day, too.
I asked if he had any details.
It was a wounded veteran who was trying to get to the V.A. Hospital.
Of course I wondered how the situation resolved itself.
The fire department took a break from getting cats out of trees to give the veteran a ride....
Probably a tough sell to get them to get me to a hotel!
Assuming I could find a way to roll out of the airport, the shortest route to the hotel is 10.2 miles.
And me with a mere 50% of my battery available.
The A.D.A. may have its limitations in how it low-balls accessibility. But if the "not compliant" sticker had to be put on the map, this place might win, hands down.
Were it not for the heroic efforts of the assistant manager at the Hard Rock and two guys who lifted the iBot into the back of a shuttle while I transferred, I could have missed everything.
Including my girlfriend!
Pretty awful story about Albuquerque and accessibility, in any case....
I have to consider upcoming trips to North Carolina and Oklahoma, to make sure they have a better story here -- or if not, then how to at least jury-rig a story.
Anyone want to take bets on their respective accessibility stories? :-)
Tuesday May 8th, 2012 was yesterday.
For many, it was just another day.
Some births, some deaths. Some weddings. Maybe even some divorces.
Just like any other day.
But for another group, it was a really big day.
A day that might hold up in some small way to those who came in, those who went out, those who came together, those who went apart.
Because it was yesterday that the Unicode Technical Committee approved
U+20BA, aka TURKISH LIRA SIGN, aka
for inclusion in Unicode 6.2.
In fact, it is the only character being added to Unicode 6.2.
In the words of colleague Peter Constable after the issue was discussed and approved by the UTC:
This version will be published before the next meeting of the ISO committee that maintains ISO 10646. There is no concern that ISO would not approve the character for encoding or want to assign a different code point. Hence this code point can be assumed to stable as of now and can be used in implementations.
Cool. Pretty exciting, right?
This is the concrete step that puts the character first described here in Not the Lira or the New Lira, but a New Turkish Lira, nevertheless on the road to ending up in Windows, likely in time for the next version (and if the "Rupee Rumba" I first described in Rupee! Rupee! Let down your CHAR! is any indication, then some prior versions will see some support too -- Vista, Server 2008, Windows 7, Server 2008 R2).
Exact plans and schedule TBD for now.
I'll tell more when I know more about the future and such.
But in any case, it was not just another day, by any means!
People who attended the 26th Internationalization and Unicode conference might remember John McConnell's Day 2 Keynote The Windows Language Roadmap or When Do We Get Rongo-Rongo?.
Or maybe you saw the later version of the talk at the 2004 Global Development and Deployment Conference.
Or maybe you just read my blog When will we support Rongo-Rongo?.
Anyway, this becomes a rather fascinating addition to the story of Rongo-Rongo and Easter Island:
Easter island heads have bodies!??
If nothing else, that may mean many more text samples to work with.
Hidden in some back or kneecap or maybe even crotch there could be some Rosetta Stone equivalent....
Maybe even there will be enough information for Rongo-Rongo to be encoded in Unicode! :-)
The question the other day was:
My code calls GetNumberFormat(“0”) and this returns “.00” on a zh-cn system and “0.00” on an English system. We take the pre-decimal portion of the string and end up with a null string in the zh-cn locale.
Are there flags to GetNumberFormat to control the output to be “0.00” irrespective of locale? Or should I just handle this in my code?
Ah, here's a tricky one for you.
On the one hand, there is LOCALE_ILZERO, which is clearly documented as
Specifier for leading zeros in decimal fields.
Now obviously one wants to respect locale preferences.
I mentioned this locale data field previously, in Ambiguity of Language in the Platform SDK and Objection, managed code! That zero is leading!.
On the other hand, in this case the decimal behavior is being widely ignored anyway.
And there is a NUMBERFMT.LeadingZero you can pass to GetNumberFormat or GetNumberFormatEx, which can be used to force the function to behave as requested.
Though keep in mind what those functions say about their lpNumberFormat parameter:
Pointer to a NUMBERFMT structure that contains number formatting information, with all members set to appropriate values. If the application does not set this parameter to NULL, the function uses the locale only for formatting information not specified in the structure, for example, the locale string value for the negative sign.
So yes, you can modify lpFormat->LeadingZero to make it 1, sure.
But you should also modify lpFormat->NumDigits to make it 0, under the circumstances.
You can probably ignore lpFormat->NegativeOrder and lpFormat->lpDecimalSep this time.
But if you don't specify several other members based off of GetLocaleInfo/GetLocaleInfoEx calls like lpFormat->lpThousandSep via LOCALE_STHOUSAND or lpFormat->Grouping via parsing of LOCALE_SGROUPING, then there isn't much point to calling a locale-specific function to format, anyway!
So the work here is definitely manageable, but may be more work than was originally hoped for....
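To make the effect of those members concrete, here is a rough Python model of the formatting step. It is not GetNumberFormat: NumberFmt is a hypothetical stand-in for a handful of NUMBERFMT members, the grouping is a single uniform group size, negative ordering is ignored, and the rounding mode is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass
class NumberFmt:
    """Hypothetical stand-in for a few NUMBERFMT members."""
    num_digits: int = 2        # like NumDigits
    leading_zero: bool = True  # like LeadingZero
    grouping: int = 3          # like Grouping, uniform group size only
    decimal_sep: str = "."     # like lpDecimalSep
    thousand_sep: str = ","    # like lpThousandSep


def format_number(value: str, fmt: NumberFmt) -> str:
    """Format a decimal string using the settings in fmt."""
    quantum = Decimal(1).scaleb(-fmt.num_digits)  # e.g. 0.01 for 2 digits
    number = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
    sign = "-" if number < 0 else ""
    whole, _, fraction = format(abs(number), "f").partition(".")
    # Drop the lone zero before the separator only when a fraction follows.
    if whole == "0" and not fmt.leading_zero and fraction:
        whole = ""
    if fmt.grouping > 0:
        groups = []
        while len(whole) > fmt.grouping:
            groups.insert(0, whole[-fmt.grouping:])
            whole = whole[:-fmt.grouping]
        groups.insert(0, whole)
        whole = fmt.thousand_sep.join(groups)
    return sign + whole + (fmt.decimal_sep + fraction if fraction else "")


if __name__ == "__main__":
    print(format_number("0", NumberFmt(leading_zero=False)))  # .00
    print(format_number("0", NumberFmt(leading_zero=True)))   # 0.00
    print(format_number("0", NumberFmt(num_digits=0)))        # 0
```

The point of the sketch is the trade-off in the text: once the caller supplies the structure, every member in it is the caller's responsibility, so the separators and grouping still have to come from the locale somehow.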
The question came in just the other day:
Do you have a list of all the keyboard cultures in Win8—or alternately which API would return me such a list (looking at your blog(s) I couldn’t easily determine what this API would be. All the APIs seemed to be about installed keyboards).
Questions like this always make me nervous, you see.
Because almost invariably, there is something behind such questions.
Now in the end, there is no specific Win32 function to enumerate every keyboard.
Essentially one has to enumerate every subkey to
So, I decided, after saying this, to try to find out the underlying issue. The answer came back readily enough:
There seems to be a change in behaviour (?) of when converting the keyboard ID to a culture. For example, zh-HK used to work but is now returning zh-HanT-HK. I wanted a list of these to ensure we do the right thing on our end.
It asks for the currentInputLanguageTag, and then tries to convert it to lcid using LocaleNameToLCID. However, it seems to fail for Tibetan and also Chinese Traditional IME (Hong Kong with Microsoft IME) languages which come shipped with Windows.
Now here is where the problem is obviously tangled up with someone else's implementation.
You see, that registry key's subkeys are called Keyboard Layout IDentifiers (aka KLIDs), some of which have Layout Id values under them, and at runtime each installed (loaded) keyboard layout has an HKL value associated with it, the lower 16 bits of which has a relationship with the KLID's lower 16 bits -- both of which sometimes have a relationship with the Language Identifier (aka LANGID) values that represent the lower 16 bits of Locale Identifiers.
Times they don't have such a relationship include both situations I directly caused as described in Getting the language (and more!) of an LCID-less keyboard/MSKLC keyboard layout names in your own language and situations I inspired/indirectly caused as described in The evolving Story of Locale Support, part 2 (raising the roof on keyboards).
Anyway, you might have noticed one of the things that was never mentioned:
And of course LCIDToLocaleName would never return zh-Hant-HK for the LCID 0x0c04 (aka zh-HK).
I mean we have our problems in Windows mapping names -- e.g. Four cases where I don't like ResolveLocaleName (and you shouldn't either!) -- but this looks like an external issue is contributing or causing the problem.
So whatever the problem is here, some work to clean up the stuff the callers are doing will be needed before I could comment on whatever problem they are hitting.
To get back to the original question about a list?
We don't have that, as I said.
But if you enumerate those subkeys under
you can then extract the lower 16 bits, and then anytime it doesn't return 0x0c00, you can call LCIDToLocaleName (removing duplicates as they come up) to get a fairly robust list of "keyboard cultures".
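The bit-twiddling part of that recipe is simple enough to sketch. This Python version only does the arithmetic: it takes KLID strings that have already been read from the registry, keeps the lower 16 bits, skips 0x0c00 and removes duplicates. The registry enumeration and the LCIDToLocaleName call are left out, langids_from_klids is a made-up helper name, and the sample list is illustrative input (the "00000C00" entry in particular is hypothetical).

```python
def langids_from_klids(klids, skip=(0x0C00,)):
    """Return the unique low-16-bit LANGIDs for a list of KLID hex strings."""
    seen = []
    for klid in klids:
        langid = int(klid, 16) & 0xFFFF  # lower 16 bits of the KLID
        if langid in skip or langid in seen:
            continue                     # no usable LANGID, or a duplicate
        seen.append(langid)
    return seen


if __name__ == "__main__":
    sample = ["00000409", "00020409", "00000413", "00010409", "00000C00"]
    for langid in langids_from_klids(sample):
        print(f"0x{langid:04x}")  # each of these would go to LCIDToLocaleName
```

For the sample above, three of the entries collapse into 0x0409, one gives 0x0413, and the 0x0c00 entry is dropped.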
This leaves just two other groups:
I find the last bit to be slightly unfortunate from a decision-making standpoint (I originally recommended they add a Layout Locale Name value too), but the list will be limited since there are only a few of them.
At some point, I'll probably publish the list here myself, and recommend that the MSDN documentation writers do the same.
That just leaves out the IMEs, but the original question was just about keyboards, so I guess they're on their own for that bit. :-)
The question brings out one of the less intuitive behaviors of .NET Globalization:
Guys, I know I have already bugged you about this, but this thing keeps puzzling me.
The most harmless .NET code on Earth
Console.WriteLine(DateTime.Now.ToString("H:mm:ss", new CultureInfo("pa-Arab-PK")));
prints
21.28.15
The pattern "H:mm:ss" was obtained from the list of all long time 24-hour patterns that .NET returns and used unmodified, as an opaque string. The MSDN library says .NET takes .TimeSeparator from .ShortTimePattern; so it is a dot and all colons in the long time pattern get resolved to dots.
It feels like a bug. Apparently .NET cannot properly handle NLS data with different time separators within the same locale. The question is: is it a good idea to introduce such data that .NET cannot handle, even if we are told it is what Punjabi people really want?
Yes, and the DateTimeFormatInfo.TimeSeparator Property docs do say this.
But they say more than that. They point out, at length:
If the custom pattern includes the format pattern ":", DateTime.ToString displays the value of TimeSeparator in place of the ":" in the format pattern.
The time separator is derived from the ShortTimePattern property. Your applications are recommended to set the short or long time patterns to the exact values of interest, instead of attempting to have the time separator replaced. For example, to obtain the pattern h-mm-ss, the application should set "h-mm-ss" specifically. This also permits the setting of patterns such as "h'h 'mm'm 'ss's'" (3h 36m 12s) that don't contain a traditional separator between all parts of the format.
Perhaps this very issue is the reason for that advice: calling DateTimeFormatInfo.GetAllDateTimePatterns Method (Char) with the "T" (Long time pattern), for example, can lead to unexpected results that do not go along harmoniously with an established separator, given the fact that there is no way for .NET or Windows or anyone to discern the true intent of the COLON and such: whether it is intended to be a literal that is inserted as is, or a placeholder for the TimeSeparator.
One theoretical "fix" to make sure the intent is never unclear would be to remove the whole notion of de facto (i.e. unmarked) literals and always require everything that is not an insert to be explicitly marked as a literal.
Though the cost (virtually every format string being more complicated, and less intuitive) is pretty high -- perhaps unexpectedly high.
It's probably easier to just confuse a few people, rather than to confuse many of them!
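The substitution the docs describe, and the way quoting sidesteps it, can be modeled in a few lines. This is a Python sketch of the rule, not the .NET implementation: resolve_separators is a hypothetical helper, it only handles single-quoted literals (not backslash escapes), and it leaves specifiers such as H, mm and ss unexpanded so that only the colon handling is visible.

```python
def resolve_separators(pattern: str, time_separator: str) -> str:
    """Replace unquoted ':' with time_separator; keep quoted text literally."""
    out = []
    in_literal = False
    for ch in pattern:
        if ch == "'":
            in_literal = not in_literal  # toggle literal mode, drop the quote
            continue
        if ch == ":" and not in_literal:
            out.append(time_separator)   # ':' acts as a placeholder
        else:
            out.append(ch)               # everything else passes through
    return "".join(out)


if __name__ == "__main__":
    print(resolve_separators("H:mm:ss", "."))      # H.mm.ss
    print(resolve_separators("H':'mm':'ss", "."))  # H:mm:ss
```

The first call shows the behavior from the question, with every colon turned into the locale's separator. The second shows what an explicitly marked literal would look like, and also why nobody wants to write every pattern that way.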
Over in the Suggestion Box, Joshua Boyce asked:
I have been reading your series "Getting all you can out of a keyboard layout", and in part 10a (blogs.msdn.com/.../581107.aspx) you hint at an upcoming 10b part to the series... Any chance that this series could be expanded to complete coverage of the issues you have described?
Well, Joshua has a point.
Now I wrote Getting all you can out of a keyboard layout, Part #10a over six years ago, and the follow-up I suggested there has never happened.
And with six years since then, it would be hard to claim I was busy the whole time. :-)
The truth is that I had a hard time sustaining interest in doing it for the sample, given how the suggested improvement wouldn't have much impact for any existing layout.
This made it a theoretical improvement, one that didn't have much real need....
Of course, there has been a change since then!
Last November, in The evolving Story of Locale Support, part 6: Behind the Cherokee Phonetic layout in Windows 8, I talked about a new Windows 8 keyboard with just the kind of complexity in it that the change might help.
I now have a reason to dust off that sample. And to not take 6 years to do so.