Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
After I wrote Do you know what Հայերեն, தமிழ், اُردو, and Tiếng Việt have in common? yesterday, I had a colleague who read it on her phone ask me why the third letter in Tiếng Việt (Vietnamese) didn't look right even though the eighth letter looked just fine.
There are, for example, three ways you might see that string; here is how each one encodes that third letter:
Tiếng Việt U+0065 U+0302 U+0301
Tiếng Việt U+00ea U+0301
Tiếng Việt U+1ebf
The first one uses Unicode Normalization Form "D" (which Microsoft seldom uses).
On the other hand, the third one uses Unicode Normalization Form "C" (which Microsoft usually uses, except in a few cases -- including Vietnamese).
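To compare the three encodings side by side, here is a minimal Python sketch using the standard unicodedata module. The variable and function names are my own, chosen only for illustration; the code points are the ones listed above.

```python
import unicodedata

# The third letter of "Tiếng" in the three encodings listed above.
form_d = "\u0065\u0302\u0301"  # e + combining circumflex + combining acute
mixed = "\u00ea\u0301"         # precomposed ê + combining acute
form_c = "\u1ebf"              # one fully precomposed letter

def describe(s):
    """Return the code points of s as a 'U+XXXX U+XXXX' string."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

for label, s in (("Form D", form_d), ("mixed", mixed), ("Form C", form_c)):
    print(label, describe(s),
          "| NFC:", describe(unicodedata.normalize("NFC", s)),
          "| NFD:", describe(unicodedata.normalize("NFD", s)))
```

All three strings normalize to the same result in either form; the middle one is simply in neither form to begin with, which is presumably why a phone's font handles it differently.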
That second form is the one whose origins are shrouded in Microsoft's hasty cover-up of the VNI debacle, also known as Code Page 1258.
But that code page is so seldom directly used now, that it cannot rightfully be considered the source.
The real source is that damn keyboard, the one I talked about in blogs like What to do with the Vietnamese keyboard on Windows? and On my "Vietnamese Plus" and "pseudo-Form V" constructs. Which is of course based on the code page....
If we want to escape it, it would require a task inspired by the single most complicated keyboard layout that has ever been in Windows (previously described in The evolving Story of Locale Support, part 6: Behind the Cherokee Phonetic layout in Windows 8).
And then, by using chained dead keys, create a genuine Unicode Normalization Form C version of the Vietnamese keyboard on Windows....
This cannot be done for Windows 8 even if it were approved: it would take way too much research, picking up Vietnamese hardware, and writing so many dead key tables that I might hit some never-before-seen buffer problem.
But I'll put it on my list of things to do at some point. We've really screwed up more than our fair share of Vietnamese text -- in sorting, in keyboard, and in code page....
Hint #1: This is not really any sort of riddle.
Hint #2: No, it is not the fact that Windows supports Հայերեն, தமிழ், اُردو, and Tiếng Việt.
Also known as Armenian, Tamil, Urdu & Vietnamese.
I mean, yes it does.
But that's not what I'm asking about.
Do you give up?
Over on the Microsoft Language Portal Blog, Palle Petersen has asked people to Give feedback on Windows terminology for Armenian, Tamil, Urdu and Vietnamese. :-)
From now until March 21st, you can let Microsoft know about how appealing, easy to understand, and technically correct things are.
Or aren't, of course. :-)
And by things I mean both many terms from Windows 7 and new terms for Windows 8.
If you love one of these languages, then this is a great opportunity to be awesome.
And to make the next version of Windows more awesome, via the well-known transitive property of awesomosity!
Previous blogs in this series:
Today's blog starts with an email that was sent to The Unicode List by Jeroen Ruigrok van der Werven:
IDN and emoji combined brings you the wonderful domain of:
http://💩.la/ (It should be U+1F4A9.)
I don't know whether to laugh or cry at this marvel of technology. :)
Even though it was the weekend, Stephane Bortzmeyer replied quickly enough:
Note that it is a direct violation of RFC 5892. U+1F4A9, being of category So, should be DISALLOWED. The registry was wrong to accept it.
To which Jeroen Ruigrok van der Werven replied:
Oh, this will be fun. So I guess they did not check the codepoint categories in their validation step then? (I honestly have no idea how NICs do this nowadays, it's been ages since I messed with stuff on that level.)
Now this is not such a new issue.
The first time I heard of it was from the blog The World’s First Emoji Domain, which was first put up on July 21st last year.
From that blog:
Now that you’ve had a moment to recover, I’d like to give particular thanks to the country of Laos, who run the last remaining domain registrar I’m aware of that still allows international domain names that use any Unicode character. Our sincere thanks must be given to Thongsing Thammavong, the Prime Minister of Laos, for his valuable assistance in making all of this possible. Update: I’ve just got word that, due to intense political unrest in Laos (untrue), they no longer allow Emoji domains! Yes, .la is no more. Fortunately, the territory of Tokelau (!) has stepped in to meet this intense international need! Emoji .tk domains are now available. (Why are they so hard to register? Due to fears of IDN homograph attacks, most registrars, like .com, now only allow specific language sets to be used for Unicode domain names. The days of registering ☃.net — a previous Cabel effort in this series — are long gone. In fact, back in 2007 ICANN expressly recommended that “symbols and icons [...] such as typographic and pictographic dingbats” should not be allowable code points for domain names. Fortunately, Laos didn’t get the memo.)
...
PS: Thanks to iwantmyname.com for doing emoji domain registration, and domai.nr for valuable assistance!
Note that people started registering Emoji domains right in the comments of that blog.
Now, who is right here -- the other blog author or Stephane?
RFC 5892: The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) is not as direct as Stephane suggests; although the process it describes hints at issues you can only indirectly hit if you run the process, e.g.
The mechanisms described here allow determination of the value of the property for future versions of Unicode (including characters added after Unicode 5.2). Changes in Unicode properties that do not affect the outcome of this process do not affect IDN. For example, a character can have its Unicode General_Category value (see [Unicode52]) change from So to Sm or from Lo to Ll, without affecting the algorithm results.
The rest of the document does not mention So or Sm explicitly at all, and you have to dig into the derived character definitions to get to them -- to even understand why such references to So or Sm exist.
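To see the category side of this concretely, here is a rough Python sketch. It covers only the General_Category test that RFC 5892 section 2.1 calls LetterDigits; the real derivation also involves exceptions, contextual rules, and stability checks, none of which are attempted here. The function names are hypothetical helpers of mine.

```python
import unicodedata

# General_Category values that RFC 5892 section 2.1 groups as "LetterDigits".
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}

def general_category(ch):
    """Return the Unicode General_Category of a single character."""
    return unicodedata.category(ch)

def in_letter_digits(ch):
    """True if the character passes the LetterDigits category test.

    This is only one step of the RFC 5892 derivation, not the whole thing.
    """
    return general_category(ch) in LETTER_DIGITS

for ch in ("\U0001F4A9", "\u2603", "\u221a", "\u00f6"):
    print(f"U+{ord(ch):04X}", general_category(ch), in_letter_digits(ch))
```

The emoji and the snowman come back as So, the square root sign as Sm, and none of them pass the category test, while ö does.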
A lot of the symbols issue here has much more to do with the move from IDNA2003 to IDNA2008; as the new Unicode 6.1 update to UTS #46: Unicode IDNA Compatibility Processing states:
By using this Compatibility Processing, a domain name such as ÖBB.at will be mapped to the valid domain name öbb.at, thus matching user expectation for case behavior in domain names. For transitional use, the Compatibility Processing also allows domain names containing symbols and punctuation that were valid in IDNA2003, such as √.com (which has an associated web page). Such domain names containing symbols will gradually disappear as registries shift to IDNA2008.
Anyway, what we are largely seeing here is registries that were using the older rules.
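For the curious, the ASCII form of such names can be sketched in a few lines of Python with the built-in punycode codec. This is a simplification under stated assumptions: plain lowercasing stands in for the real UTS #46 mapping, no validity checking is done at all, and to_ace is a hypothetical helper name of mine.

```python
def to_ace(domain):
    """Lowercase each label and convert non-ASCII labels to 'xn--' form.

    Assumption: str.lower() is a stand-in for the full UTS #46 mapping,
    and nothing here checks whether a code point is allowed.
    """
    labels = []
    for label in domain.split("."):
        label = label.lower()
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)

print(to_ace("ÖBB.at"))
print(to_ace("\U0001F4A9.la"))
```

Because the sketch performs no validation, it converts the emoji label just as readily as the letters, which is roughly the position a registry is in when it skips the category checks.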
Now IE9 on Windows 7 won't go to the "Get Coffee" domain at all:
Even if you do it in the Punycode version directly:
I won't speculate on the reasons -- beyond an idle guess that it may be following IDNA2008 rules and rejecting the site? Maybe.
And I can't get to either of my Windows 8 machines at the moment.
"Luckily" Firefox had no problem with it....
Hmmmm.
How I feel about its success largely depends on the reason for IE's failure.
If you know what I mean.
I want to be clear on what the moral is here -- prefer IDNA2008 whenever you can.
And please never mix IDN and Emoji, even when someone lets you....
When I wrote Latest about the "post-iBot" world...., I left out some information.
Some of it, I left out because I didn't know it. When I don't know something then as a general rule I don't say nearly as much about it.
In fact, there are still some things I do not know. And those things will continue on the same trend of me not talking about them.
But there were a few things I actually did know that I didn't share.
And there are a few things I have learned since then.
There is a safety mode that the iBot goes into any time something bad happens. When this occurs, you can work in Standard Mode and Four Wheel Mode.
But you cannot do Stair Climbing Mode, and you cannot work in two wheel "Balance" Mode.
Now, in addition to this going on anytime something bad happens, there is one other time -- when the device hits its internal "time bomb" date.
Everyone who does service is under instructions to extend the date, every time they perform service, to the maximum the unit will allow, which for the software is up to 96 months (8 years) from when the service is performed.
Now when it comes to other work, their options are more limited.
The people doing service can do CG (Center of Gravity) calibrations, which would be an important thing to do if there was a huge change in a person's weight, or for people like me, whose iBot is calibrated to assume a heavy laptop attached to the back of the chair, who suddenly want to change that calibration.
Now Johnson and Johnson has taken quite a hard-ass approach when it comes to transferring ownership of iBots -- something that really gets in the way of efforts like the Huey 091 Foundation.
Certainly J&J isn't thrilled to be on the end of a "Company doesn't support our country's veterans" story.
But this is kind of Johnson and Johnson's fault for getting rid of the people who did the other calibrations and changes, such as disabling functionality that the person couldn't use or failed to prove he could use when tested, or forcibly setting the speed lower.
Because they laid off almost all of the people who do the assessments, they would run into FDA problems if anyone received a "transferred" chair without being assessed, and especially if that person needed any of those other non-CG calibrations done.
I have been told but have been unable to verify that they are quietly using the limited personnel they have left to take care of the few situations they can -- they really do want to avoid the bad press and get out of the business as quietly as they can.
At the same time, DEKA (Dean Kamen's company) is also saying very little right now about their future plans, which I and others assume is due to non-disclosure agreements that would keep them from helping or interfering until the iBot's ownership is transferred back to DEKA at the end of 2013.
Geeks like me have spent some time doing the things we can to make sure we can stay up and running even without the help of Johnson & Johnson. Though for us the problem is more along the lines of violating the warranty if we modified the chair in an unauthorized way, which is unlikely to have very much sway on the likes of us once the warranty officially expires anyway.
Until/unless a new plan is announced for support after DEKA takes ownership of the iBot again.
I will say I'm rather disappointed in the way J&J has handled all this -- there are dozens of things they could have done to make the situation easier for the iBot owners. Their whole exit strategy seems more like a carefully orchestrated series of tactical errors. Even now there are steps they could take to help people that they choose not to take.
When big companies have a chance to do good things, it is disappointing (if not surprising) when they don't.
For me, I look forward to DEKA and Dean Kamen getting back in the game....
I just left SAM REMIX, and the Gauguin exhibit.
Disclaimer #1: The only genuine knowledge I have of art is what I gained from a brief period in my early 20's where I dated several (okay, three) women who had art history majors in college; all of the rest just came from going to art exhibits and shows and auctions, and developing my own tastes for what I liked.
Disclaimer #2: I intentionally signed up to go to the galleries at a relatively late time (10:20pm):
I did this primarily because of Diane (one of those art history majors I dated 20 years ago) and Jen Graves (whose controversial You May Be Infected Already review of Gauguin & Polynesia: An Elusive Paradise reassured me two decades too late that Diane's negative feelings about Gauguin were probably not merely the "overreacting art major" that I took them for at the time), and their thoughts about Paul Gauguin and his life.
I wanted to be in a certain state of mind by the time I was going to be viewing this unique collection of Gauguin's work.
If I was going to view the art and have any hope of being able to enjoy the art while ignoring the artist's excesses (former stockbroker who left his wife and five children to bang, impregnate, and give syphilis to 13 year olds that he was), I decided I was going to need to be pretty drunk.
Perhaps readers will feel this was not terribly logical, and that perhaps I let my guilt over dumping Diane for being so intense color my views.
The fact that it didn't work may in fact bolster this opinion.
As I rolled through the exhibit and saw so many pictures less decent than this one:
yet still sharing many of the same issues of less savory and less dressed young girls -- young women who don't make eye contact, a choice that felt to me like him and his guilt.
Or maybe they couldn't look at him, even when they married him (every 13 year old girl's dream).
And maybe he was just painting what he was seeing in his twisted, racist, and disgusting and misogynistic world view.
I couldn't enjoy it.
I was twelve mai tais in by 10:20 (I freely admit many people there were enabling me), but within minutes of experiencing so much of Gauguin I felt completely sober.
It was too late to save them, yes.
But suddenly I felt re-energized to think about fighting human trafficking. To go after the people who are inspired to exploit worlds they find more primitive than their own, one under-aged girl at a time.
When I was through the exhibit, I tried to go back to the party.
The DJ was playing half songs -- and finally, not long after Tequila and Let's Hear It For the Boy and PYT, I realized I was done.
Before 10:20 I had met an attractive young doctor who was finishing her internship and planning for her residency. We spent some time talking and really connected, and we probably would have ended up seeing each other in the future. But afterward I was too distracted to even look for her (perhaps even uncomfortable about seeking out a younger woman after all that, even though she is 28), and now that it is all over I realize I probably won't ever see her again.
I can't help feeling that if I were a better person (who could look past the artist and appreciate the art) or a worse person (who wouldn't feel so uncomfortable about his immersion into the primitive), I would have enjoyed it more....
I remember talking to Mark Zbikowski a couple of times when he was still at Microsoft.
Once was to get his opinion on updating the casing table, which had by that point grown to over 200 letter pairs missing from it that were assigned in Unicode.
And the other time was to report a bug (previously described in Ā was unexpected at this time).
Afterwards, I asked him how he dealt with a problem I was starting to have more and more often which he seemed to be dealing with quite deftly....
The problem with people who consider you to be "the guy" in an area, who will use whatever answer you give as the one true gospel.
His answer is one I have kept with me, one of the greatest lessons I've learned during all of my time at Microsoft.
It is that the person asking the question often has the information they need (or is very close to having it), and the most important thing you can do is help bring them to the answer they already have (or are very close to having).
Bold pronouncements or admonitions, even when they are needed, should be based on limited statements that are basically always true -- and it is just the situational application of those truths that becomes the big content of the mail. Of nearly every mail.
I remember I felt some doubt about whether that would necessarily work for me, as I was still searching for my "always true" truths.
Though I feel much less of that now, just a few years later.
So these days, I pay much more attention to what he was trying to tell me then.
I mean, there will always be questions, always be new situations that people are looking to fit into the grand tapestry that we all think of as
The Truth™
I'm probably as smart as I always was, though I think of myself as being a little bit wiser now....
By the way, if I ever did write a book again, you just got the advance notice of what the title would be, didn't you?
It would be the title of this very blog you are reading, published two hours and ten minutes after even the regular readers expected it....
The other day, in The evolving Story of Locale Support, part 19: In honor of International Mother Language Day..., I pointed to a blog in Steven Sinofsky's Building Windows 8 Blog written by my teammate Ian Hamilton entitled Using the Language You Want.
My blog focused on one issue Ian covered that I thought was really cool.
But I'll be honest.
Ian's blog has a lot of things in it that I think are really freaking cool!
Today, I want to talk about another of those things.
In fact, it is something that I think is cool for two separate reasons. :-)
It is one of the first things he describes:
In some countries, people can purchase PCs with a variety of languages preinstalled. With Windows 8, users will be able install additional display languages beyond those preinstalled languages. This means that the language of the PC no longer needs to be a major consideration when deciding on which model to buy. If the language you want is not preinstalled on the PC you like, you can now install the one you want.
But for some families, allowing the installation of an additional display language might not be enough, as they also need the ability to switch between languages. To illustrate the point, let’s look at the United States (where historically we have been less sensitive to these issues than in most other places around the world). We know from 2009 census data that 80% of Americans speak English at home. The other 20% speak something other than English. Not surprisingly, 35,468,501 (12.41% of the total) speak Spanish at home. Some PCs sold in the US have had English and Spanish preinstalled on them. On those PCs, the user picks one language or the other, and the one not chosen is wiped off the hard drive after first run. Feedback showed that customers loved having a Spanish language PC, but what they really needed was Spanish and English, and the ability to switch between them. A subsequent study by an outside firm confirmed these results. In many cases, parents in the home spoke Spanish, and their children were speaking English. The ability to have a Spanish user account for the parents, and an English one for the kids—or at least the ability to switch a single account’s display language back and forth between English and Spanish—was the way to delight these customers.
New, easier way to get languages
The new Language preferences section in Control Panel is the new one-stop place to find all Windows display languages in Windows 8. In the past, some languages were available through Windows Update, and others were distributed through the Microsoft Download Center.
The reasons for separating the languages into two groups and their separated distribution channels made no sense to our customers. It wasn’t their fault. This classification of languages only made sense to our internal teams. This confusion was a great motivator for re-imagining Language preferences in Control Panel. We will no longer ask customers to understand these nuances. Looking at the end-to-end experience, it made sense to build an entirely new experience around the acquisition of new languages.
Do you see what he did there?
#1 -- the whole distinction between Language Packs and Language Interface Packs that I described in When terminology affects satisfaction is gone as of Windows 8!
#2 -- people who buy the wrong language version of Windows will be able to get the right one without upgrading to the most expensive SKU!
#3 -- people in two language households who need to be able to easily switch back and forth will, like the folks I mentioned in #2, be able to easily do so without having to buy the most expensive SKU!
Now all three of those items are amazing, they truly are.
In fact, right about now you might be questioning my math skills since I said I was going to talk about two cool things.
Well, I actually was thinking of those three things as ONE cool thing.
So let's think of them as 1a, 1b, and 1c! :-)
The other cool thing is that we will be destroying a soul-free industry that created tools to hack Windows in order to get 1a and 1b, along with the huge support hit that the "genuine" architecture's treatment of those language-hacking tools forced on users, through no fault of their own (a situation I indirectly and euphemistically referred to in Intended Implicatures Redux, aka On Unintended Genuinosity Negation).
Those bastards will be forced out of business, and customers will no longer be unfairly taken advantage of by unscrupulous hackers who tried to profit from their confusion!
So customers win, and Microsoft wins, and the only people who lose are basically losers who kind of suck as people, and who deserved to lose anyway!
Now I know most of my readers will consider most of my #2 as being a whole mess of "inside baseball". But let me say how pleased I am that Microsoft is "throwing an elbow" and hitting those losers right in the face (sorry for mixing those sports metaphors!).
Improving language support for all of our customers is just the cherry on top of the sundae! :-)
So let's say you call GetKeyNameText to get the names of the keyboard keys.
As it says in the remarks:
The format of the key-name string depends on the current keyboard layout. The keyboard driver maintains a list of names in the form of character strings for keys with names longer than a single character. The key name is translated according to the layout of the currently installed keyboard, thus the function may give different results for different input locales. The name of a character key is the character itself. The names of dead keys are spelled out in full.
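For anyone who wants to try the call directly, here is a hedged Python sketch using ctypes. It relies on the documented lParam layout for GetKeyNameText (scan code in bits 16-23, extended-key flag in bit 24). The helper names make_lparam and key_name are mine, the 64-character buffer is an arbitrary choice, and the actual call only works on Windows, where it reports names for the calling thread's current layout.

```python
import ctypes
import sys

def make_lparam(scan_code, extended=False):
    """Build the lParam value: scan code in bits 16-23, extended flag in bit 24."""
    return ((scan_code & 0xFF) << 16) | ((1 << 24) if extended else 0)

def key_name(scan_code, extended=False):
    """Ask Windows for a key name; returns None if the key has no name."""
    if sys.platform != "win32":
        raise OSError("GetKeyNameTextW is only available on Windows")
    buf = ctypes.create_unicode_buffer(64)  # arbitrary size for this sketch
    length = ctypes.windll.user32.GetKeyNameTextW(
        ctypes.c_long(make_lparam(scan_code, extended)), buf, len(buf))
    return buf.value if length else None

if sys.platform == "win32":
    # Scan code 1c: the main Enter key, then the extended (numpad) Enter key.
    print(key_name(0x1C), key_name(0x1C, extended=True))
```

The extended flag is what separates the two tables of names that a layout carries: the same scan code can name two different physical keys.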
Ok, so let's load the German keyboard layout (kbdgr.dll, KLID value of 00000407).
Let's skip the function calls and load it up in MSKLC to get the key names all at once:
KEYNAME
01 ESC
0e R\x00DCCK
0f TABULATOR
1c EINGABE
1d STRG
2a UMSCHALT
36 "UMSCHALT RECHTS"
37 " (ZEHNERTASTATUR)"
38 ALT
39 LEER
3a FESTSTELL
3b F1
3c F2
3d F3
3e F4
3f F5
40 F6
41 F7
42 F8
43 F9
44 F10
45 PAUSE
46 ROLLEN-FESTSTELL
47 "7 (ZEHNERTASTATUR)"
48 "8 (ZEHNERTASTATUR)"
49 "9 (ZEHNERTASTATUR)"
4a "- (ZEHNERTASTATUR)"
4b "4 (ZEHNERTASTATUR)"
4c "5 (ZEHNERTASTATUR)"
4d "6 (ZEHNERTASTATUR)"
4e "+ (ZEHNERTASTATUR)"
4f "1 (ZEHNERTASTATUR)"
50 "2 (ZEHNERTASTATUR)"
51 "3 (ZEHNERTASTATUR)"
52 "0 (ZEHNERTASTATUR)"
53 "KOMMA (ZEHNERTASTATUR)"
57 F11
58 F12
KEYNAME_EXT
1c "EINGABE (ZEHNERTASTATUR)"
1d STRG-RECHTS
35 " (ZEHNERTASTATUR)"
37 DRUCK
38 "ALT GR"
45 NUM-FESTSTELL
46 UNTBR
47 POS1
48 NACH-OBEN
49 BILD-NACH-OBEN
4b NACH-LINKS
4d NACH-RECHTS
4f ENDE
50 NACH-UNTEN
51 BILD-NACH-UNTEN
52 EINFG
53 ENTF
54 <00>
56 HILFE
5b "LINKE WINDOWS"
5c "RECHTE WINDOWS"
5d ANWENDUNG
Now let's look at the Windows 8 soft keyboard based on kbdgr.dll:
There are some keys that match, and some that do not.
As it turns out, they are not using GetKeyNameText at all; they have a new, standard, conventional mechanism to put resources in a DLL -- the same "multi-language" style used by the common dialogs, which means that the kbdbr.dll "Portuguese (Brazilian ABNT)" keyboard with KLID 00000416:
KEYNAME
01 Esc
0e Backspace
0f Tab
1c Enter
1d Ctrl
2a Shift
36 "Right Shift"
37 "Num *"
38 Alt
39 Space
3a "Caps Lock"
3b F1
3c F2
3d F3
3e F4
3f F5
40 F6
41 F7
42 F8
43 F9
44 F10
45 Pause
46 "Scroll Lock"
47 "Num 7"
48 "Num 8"
49 "Num 9"
4a "Num -"
4b "Num 4"
4c "Num 5"
4d "Num 6"
4e "Num +"
4f "Num 1"
50 "Num 2"
51 "Num 3"
52 "Num 0"
53 "Num Del"
54 "Sys Req"
57 F11
58 F12
7c F13
7d F14
7e F15
7f F16
80 F17
81 F18
82 F19
83 F20
84 F21
85 F22
86 F23
87 F24

KEYNAME_EXT
1c "Num Enter"
1d "Right Control"
35 "Num /"
37 "Prnt Scrn"
38 "Right Alt"
45 "Num Lock"
46 Break
47 Home
48 Up
49 "Page Up"
4b Left
4d Right
4f End
50 Down
51 "Page Down"
52 Insert
53 Delete
54 <00>
56 Help
5b "Left Windows"
5c "Right Windows"
5d Application
is covered as well as the kbdpo.dll "Portuguese" keyboard with KLID 00000816:
KEYNAME
01 ESC
0e BACKSPACE
0f TAB
1c ENTER
1d CTRL
2a SHIFT
36 "SHIFT DIREITO"
37 "ASTERISCO (TN)"
38 ALT
39 "BARRA DE ESPA\x00C7OS"
3a CAPSLOCK
3b F1
3c F2
3d F3
3e F4
3f F5
40 F6
41 F7
42 F8
43 F9
44 F10
45 PAUSE
46 "SCROLL LOCK"
47 "7 (TN)"
48 "8 (TN)"
49 "9 (TN)"
4a "MENOS (TN)"
4b "4 (TN)"
4c "5 (TN)"
4d "6 (TN)"
4e "MAIS (TN)"
4f "1 (TN)"
50 "2 (TN)"
51 "3 (TN)"
52 "0 (TN)"
53 "PONTO DECIMAL (TN)"
57 F11
58 F12

KEYNAME_EXT
1c "ENTER (TN)"
1d "CTRL DIREITO"
35 "BARRA (TN)"
37 "PRINT SCRN"
38 "ALT DIREITO"
45 "NUM LOCK"
46 BREAK
47 HOME
48 "SETA ACIMA"
49 PGUP
4b "SETA \x00C0 ESQUERDA"
4d "SETA \x00C0 DIREITA"
4f END
50 "SETA ABAIXO"
51 PGDOWN
52 INSERT
53 DELETE
54 <00>
56 HELP
5b "WINDOWS ESQUERDA"
5c "WINDOWS DIREITA"
5d APLICATIVO
You can see a lot of them are even more off than German!
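Comparing tables like these by eye gets old quickly, so here is a small Python sketch that parses key-name entries in the format shown above and reports where two tables disagree. The regular expressions and the helper names parse_key_names and differing_keys are my own assumptions about the format (two lowercase hex digits, then a bare or quoted name, with \xHHHH escapes for non-ASCII letters); this is not an official parser.

```python
import re

# Assumed entry shape: two hex digits, whitespace, then a quoted or bare name.
ENTRY = re.compile(r'\b([0-9a-f]{2})\s+("[^"]*"|\S+)')
# Assumed escape shape: \x followed by four hex digits (a UTF-16 code unit).
ESCAPE = re.compile(r'\\x([0-9A-Fa-f]{4})')

def parse_key_names(text):
    """Return {scan_code: name}, with quotes removed and escapes decoded."""
    names = {}
    for scan, name in ENTRY.findall(text):
        name = ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name.strip('"'))
        names[int(scan, 16)] = name
    return names

def differing_keys(a, b):
    """Return {scan_code: (name_a, name_b)} for keys named differently."""
    return {sc: (a[sc], b[sc]) for sc in sorted(a.keys() & b.keys())
            if a[sc] != b[sc]}

brazilian = parse_key_names(r'0e Backspace 39 Space 3b F1')
portuguese = parse_key_names(r'0e BACKSPACE 39 "BARRA DE ESPA\x00C7OS" 3b F1')
print(differing_keys(brazilian, portuguese))
```

The comparison is deliberately exact, so a difference in letter case counts as a mismatch; whether that is the right call depends on what the names are being synced against.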
Now I have several competing thoughts here:
On the one hand, the mapping between localized SKU languages and keyboards is rather lopsided, since there are just 36 of one and 116 of the other, with many of the latter covering languages outside those 36.
On the other hand, the "localization" of key names with the names stored in the DLL (as described in Flirting with a strange keyboard, will you remember its language in the morning?) is a bloody mess that localization engineers have no good means to reach.
On yet another hand, the new model is one that is easy and comfortable for localizers of those 36 languages, which means Windows finally can fix some of that broken model.
And on yet another hand, the mapping of soft keyboards to the layout DLLs is a problematic match in another way -- not every single layout has a unique soft keyboard layout.
And on a fifth hand, those crazy DLL based names are the only ones with a semi-documented way for people to retrieve the names -- the GetKeyNameText function -- which, although it has problems, like the Catalan bug reported by a customer to the MSDN Product Feedback Center here, is all we give developers.
On the last hand, the names are in many cases out of sync between the two, as the above examples show (this was how I originally found out about the problem!).
Me and my conflicted six hands, what to do?
With the help of localization engineers I now have those soft keyboard key names, and I am going to try to sync up many of the DLLs for those 36 languages some time after the upcoming Windows 8 Consumer Preview, so that for Windows 8 the two sources will return the same names.
Some of the remaining key names may stay broken, as we have no good source for what the names should be in other languages -- there may well be common hardware for some languages that I have no way to even attempt to sync up with.
I am struck by the fact that if I had never dug into this issue, no one else would have noticed until months or years later - or maybe ever. It is one of those bugs that, now that I know about it, I can't ignore.
I almost wish for some triage effort to "won't fix" the issue and relieve me of the responsibility!
Probably worth the effort though, as imperfect as it may be in the end, right? :-)
Previous blogs from this series:
Remember that third rule I mentioned in Part 0?
I have neither inclination nor desire to violate either non-disclosure agreements or marketing news cycles related to Windows 8.
This last rule seems obvious, but I don't want anyone to misunderstand my intent here, or what I want to accomplish. Any time I talk about stuff you haven't heard before, it is only due to the fact that they are doing other things right now, not because I am disclosing anything that you couldn't have found yourself by spelunking through the //Build Developer Preview, or eventually the Beta.
This blog you are reading today will in no way violate this rule!
Now of the many blogs above, for some it was Part 5 that was the most interesting.
Because it listed the locales that were scheduled to be added in Windows 8.
That list:
That list was for locales, not UI languages, as I said.
So it was nominally listing items that were to be added to Table 3 (Locales whose identifiers are not directly associated with any localizations of Windows, even if a related identifier might make for one representing a suitable localization) of The Locales of Windows 7, all divvied up.
As initially reported in the Building Windows 8 Blog a short time ago, in a blog entitled Using the Language You Want.
So I have bad news, worse news, and really good news!
The bad news will get you mad at me for over-promising and under-delivering. The worse news will get you mad at Microsoft for removing features. And the really good news will make you realize that I was just playing with your emotions!
The bad news: Many of those locales will not in fact be added to Table 3 (Locales whose identifiers are not directly associated with any localizations of Windows, even if a related identifier might make for one representing a suitable localization).
The worse news: Even some of the locales that used to be in Table 3 (Locales whose identifiers are not directly associated with any localizations of Windows, even if a related identifier might make for one representing a suitable localization) will be removed in Windows 8.
And the really good news: Every locale discussed earlier will actually be added to Table 2 (The locales representing languages for which Windows creates Language Interface Packs, aka LIPs) and occasionally even Table 1 (The locales representing languages into which Windows localizes) in Windows 8!
The official list of planned Windows 8 User Interface language additions:
I am going to pretend for a moment that is not amazing or incredible, okay?
Oh, screw that. I can't even pretend to be calm and dignified here.
This is amazing and in-freaking-credible!
In future blogs, I will dig in further to many of the exciting, interesting, and cool technical/linguistic/cultural issues that have come up here in the process of all of this very cool and interesting work.
And the awesome new story of Language Pack installation that finally fixes the problems some users have noticed since Windows 2000!
And the new UI paradigm!
Not to mention the coolness of an en-GB User Interface language in Windows 8 that is just begging to tell more!
Plus all of the other cool stuff in the announcement over in the Building Windows 8 blog, Using the Language You Want.
And all of the cool stuff about the Consumer Preview that you can try out yourself once you get your hands on it! :-)
But for now let me close with a personal note to Roy Bonny (ᎧᏂᎦ ᎪᎳᎭ) and Joseph Erb of Cherokee Nation: no more secrets, the President of Windows put the information into his Blog! :-)
Functions like GetShortPathName have been around for a long time.
Too long, if you ask me.
Because there are some things that it incorrectly assumes.
For example, it assumes that any characters that fit within the default system codepage are okay, despite the fact that one can change the setting and reboot, leading to different results.
But perhaps this incorrect assumption is somewhat forgivable since the system seems to behave the same way, and changing the default system locale and code page is not a common operation.
And there are some things that it does the wrong way.
For example, instead of calling WideCharToMultiByte with the WC_NO_BEST_FIT_CHARS flag, it instead calls RtlUnicodeStringToAnsiString which calls RtlUnicodeToMultiByteN, neither of which allows one to opt out of the best fit behavior -- something that befits any file system type function that wants more accurate results!
Now I've ranted about the problems with best fit mappings over the years:
But there is one thing worse than best fit mappings -- and that is file system functions allowing their presence in cases where they shouldn't.
In my opinion, this kind of problem is in some cases a lot less forgivable, since there are some languages that will commonly use letters like đ aka U+0111 aka LATIN SMALL LETTER D WITH STROKE which will best fit map in some code pages to d aka U+0064 aka LATIN SMALL LETTER D.
Functions that make incorrect assumptions like "everything that best fit maps in a code page fits in the code page" are awful since in the most extreme cases there are more than twice as many characters with best fit mappings as there are with correct mappings.
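To make the danger concrete, here is a minimal Python sketch of how a best fit conversion can make two different file names collide. The mapping table is a tiny hand-picked illustration, not the real Windows best fit data for any code page, and the helper names (fits_code_page, best_fit_encode) are hypothetical. Python's own codecs are strict, so they stand in for the behavior you would get by opting out of best fit.

```python
# Illustrative only: a two-entry stand-in for a best fit table.
# This is NOT the actual Windows best fit data for any code page.
ILLUSTRATIVE_BEST_FIT = {
    "\u0111": "d",  # LATIN SMALL LETTER D WITH STROKE -> LATIN SMALL LETTER D
    "\u0110": "D",  # LATIN CAPITAL LETTER D WITH STROKE -> LATIN CAPITAL LETTER D
}


def fits_code_page(name: str, code_page: str = "cp1252") -> bool:
    """Return True only if every character has a real mapping in the code page."""
    try:
        name.encode(code_page)  # strict: Python's codecs never best fit
        return True
    except UnicodeEncodeError:
        return False


def best_fit_encode(name: str, code_page: str = "cp1252") -> bytes:
    """Simulate a lossy conversion that silently applies best fit mappings."""
    folded = "".join(ILLUSTRATIVE_BEST_FIT.get(ch, ch) for ch in name)
    return folded.encode(code_page, errors="replace")


with_stroke = "\u0111en.txt"  # starts with U+0111
plain = "den.txt"             # starts with U+0064

print(fits_code_page(with_stroke))  # the strict check rejects U+0111
print(fits_code_page(plain))
# Two different names, one byte sequence after the lossy conversion:
print(best_fit_encode(with_stroke) == best_fit_encode(plain))
```

A function that treats the lossy result as proof that the name "fits" cannot tell those two files apart, which is exactly the kind of assumption described above.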
In the end, functions that behave this badly should be avoided, for code safety reasons....
In the past, I have tended to be rather hostile toward Emoji.
But there are some Emoji that I simply love, love, love.
Like some of the ones friend/colleague Alexander Sklar mentioned over in Facebook yesterday:
🙈 U+1f648 SEE-NO-EVIL MONKEY
🙉 U+1f649 HEAR-NO-EVIL MONKEY
🙊 U+1f64a SPEAK-NO-EVIL MONKEY
Just in case you aren't reading this blog on your Windows 8 machine, here they are in Wordpad on Windows 8:
It's worth noting that font fallback was in full force - I never specified the font, but Wordpad/RichEdit had no problem finding the font!
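All three monkeys live outside the Basic Multilingual Plane, so in UTF-16 each one takes a surrogate pair. A small Python sketch, using only the standard unicodedata module, that looks up the character names and computes the pairs (the describe helper is a name made up for this example):

```python
import unicodedata


def describe(code_point: int):
    """Return (character name, high surrogate, low surrogate) for a supplementary code point."""
    ch = chr(code_point)
    utf16 = ch.encode("utf-16-be")  # four bytes: two 16-bit code units
    high = int.from_bytes(utf16[:2], "big")
    low = int.from_bytes(utf16[2:], "big")
    return unicodedata.name(ch), high, low


for cp in (0x1F648, 0x1F649, 0x1F64A):
    name, high, low = describe(cp)
    print(f"U+{cp:04x} {name}: {high:04x} {low:04x}")
```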
The origin of these three mystic apes -- originally named Mizaru (the one covering his eyes, who sees no evil), Kikazaru (the one covering his ears, who hears no evil), and Iwazaru (the one covering his mouth, who speaks no evil) -- is described in the Wikipedia article:
The source that popularized this pictorial maxim is a 17th century carving over a door of the famous Tōshō-gū shrine in Nikkō, Japan. The carvings at Toshogu Shrine were carved by Hidari Jingoro, and believed to have incorporated Confucius’s Code of Conduct, using the monkey as a way to depict man’s life cycle. There are a total of 8 panels, and the iconic three wise monkeys picture comes from panel 2. The philosophy, however, probably originally came to Japan with a Tendai-Buddhist legend, from China in the 8th century (Nara Period). It has been suggested that the figures represent the three dogmas of the so-called middle school of the sect.
In Chinese, a similar phrase exists in the Analects of Confucius from 2nd to 4th century B.C.: "Look not at what is contrary to propriety; listen not to what is contrary to propriety; speak not what is contrary to propriety; make no movement which is contrary to propriety" (非禮勿視, 非禮勿聽,非禮勿言, 非禮勿動). It may be that this phrase was shortened and simplified after it was brought into Japan.
It is through the Kōshin rite of folk religion that the most significant examples are presented. The Kōshin belief or practice is a Japanese folk religion with Chinese Taoism origins and ancient Shinto influence. It was founded by Tendai Buddhist monks in the late 10th century. A considerable number of stone monuments can be found all over the eastern part of Japan around Tokyo. During the later part of the Muromachi period, it was customary to display stone pillars depicting the three monkeys during the observance of Kōshin.
Though the teaching had nothing to do with monkeys, the concept of the three monkeys originated from a simple play on words. The saying in Japanese is "mizaru, kikazaru, iwazaru" (見ざる, 聞かざる, 言わざる), literally "don't see, don't hear, don't speak". However, -zaru, an archaic negative verb conjugation, is pronounced the same as zaru, the vocalized form of saru (猿), "monkey", so the saying can also be interpreted as the names of three monkeys.
It is also possible that the three monkeys came from a more central root than a simple play on words. The shrine at Nikko is a Shinto shrine, and the monkey is an extremely important being in the Shinto religion. The monkey is believed to be the messenger of the Hie Shinto shrines, which also have connections with Tendai Buddhism. There are even important festivals that are celebrated during the year of the monkey (occurring every twelve years) and a special festival is celebrated every sixteenth year of the Kōshin.
"The Three Mystic Apes" (Sambiki Saru) were described as "the attendants of Saruta Hito no Mikoto or Kōshin, the God of the Roads". The Kōshin festival was held on the 60th day of the calendar. It has been suggested that during the Kōshin festival, according to old beliefs, one’s bad deeds might be reported to heaven "unless avoidance actions were taken…." It has been theorized that the three Mystic Apes, Not Seeing, Hearing, or Speaking, may have been the "things that one has done wrong in the last 59 days."
According to other accounts, the monkeys caused the Sanshi and Ten-Tei not to see, say or hear the bad deeds of a person. The Sanshi (三尸) are three worms living in everyone's body. The Sanshi keep track of the good deeds and particularly the bad deeds of the person they inhabit. Every 60 days, on the night called Kōshin-Machi (庚申待), if the person sleeps, the Sanshi will leave the body and go to Ten-Tei (天帝), the Heavenly God, to report about the deeds of that person. Ten-Tei will then decide to punish bad people, making them ill, shortening their time alive, and in extreme cases putting an end to their lives. Those believers of Kōshin who have reason to fear will try to stay awake during Kōshin nights. This is the only way to prevent the Sanshi from leaving their body and reporting to Ten-Tei.
An ancient representation of the 'no see, no hear, no say, no do' can be found in four golden figurines in the Zelnik Istvan Southeast Asian Gold Museum. These golden statues date from the 6th to 8th century. The figures look like tribal human people with not very precise body carvings and strong phallic symbols. This set indicates that the philosophy comes from very ancient roots.
There is a fourth monkey, named Shizaru, the "DO-NO-EVIL MONKEY", which has been seen in two different forms:
That fourth monkey is not in many of the legends and stories, and it is not in Unicode either.
If it were up to me, a space would have been reserved at U+1f64c for this fourth monkey, and if it were eventually added I would have been quite content to go with the PG version for the reference glyph.
Alas, no such luck.
Perhaps it was an oblique criticism of Google's "Don't be evil" since there are so many people who question that it is really a guiding principle anymore -- so that's why Unicode chose to not "do no evil.":
But this seems really unlikely to me (I wasn't the one who originally suggested it).
Plus I don't want to violate the U+1f64a principle!
Anyhow, the story, though originally Chinese, has a unique importance in Japanese culture, where these three figures embody the Golden Rule, providing a moral center that is almost the opposite of the "ignoring immorality" that it often means in Western culture.
But the cultural meanings in Japan are something that gives them a place in the Emoji... perhaps the three that redeem the whole set, in the eyes of some!
On the other hand, trivializing them may offend some even more.
But it serves, they serve, as a reminder that every character has a story.... :-)
The other day, over in the Building Windows 8 blog, Jennifer Norberg (senior PM lead on the HID - Human Interaction Platform - team) wrote an amazing blog entitled Enabling accessibility.
No translations just yet, but soon enough localized versions will be posted!
Steven's intro:
Windows 8 is a product we design for an incredibly broad spectrum of people around the world. One of the areas where we have worked to deliver an even greater level of innovation is in ensuring that Windows 8, particularly the new Metro style experience, is accessible to everyone regardless of their physical abilities. In this post we will talk about the engineering work that goes into the features we refer to as “accessibility” – though as you will see, many of these features are broadly applicable and just make the product better for everyone. If you are interested in Microsoft’s overall efforts in accessibility and related topics, please be sure to check out www.microsoft.com/enable. This post is especially important for developers building Metro style apps for inclusion in the Windows Store, as we are asking you to test the accessibility of your application prior to submission. I encourage folks who have never seen these tools in action to learn about them through the video. The upcoming beta will be a great chance for everyone to experience the product.
An important note. With the next public release of code (later this month) we will see a significant improvement in the capabilities described in this post, but we still have work to do between beta and RC especially with regards to working with the latest releases of third party tools. I just want to make sure folks know that this post talks about improvements in the next release as well as functionality that will still be improving as we get to the release candidate. This post was authored by Jennifer Norberg, a senior program manager lead on our HID team.
--Steven
The thing I really like about Enabling accessibility is how it covers the reasons and the types of enabling needed/provided and the history and the improvements and the way that new pieces -- of the Metro UI in particular -- are covered, and how the Store itself will be working to try to get apps written by others to also be accessible.
I'm really looking forward to seeing applications that meet the Baseline accessibility requirements. I mean, in the end Windows is a platform, so many of its features are only going to be as accessible as the apps people install and use. Knowing about the effort to help create more accessible apps in the ecosystem is simply awesome.
And I'm not just saying that because of the things I need to work better with apps (though I admit that doesn't hurt!). I'm just annoyed and offended when people can't use a feature or a control or a product or a platform....
Anyway, check out Jennifer Norberg's Enabling accessibility blog, and if you have been trying out Windows 8 you can try and figure out what's there now and what's coming soon!
Today, I'll highlight a weakness in some of the work to extend locales, one that shows our reach exceeding our grasp.
Now I have talked about digit substitution so much in the past that most regular readers are probably tired of hearing about it....
Frankly, I don't blame them!
So I'm not gonna be all about digit substitution this time.
Since that feature is locale based, and the languages I am going to talk about here have no locales, it isn't relevant anyway....
But beyond that, Tai Le (languages using it include Dehong Dai) has an additional problem -- they use the Myanmar digits!
We worked around the problem of making sure the digits get seen in some cases by adding the Myanmar Digits (U+1040 to U+1049) to both the Microsoft Tai Le font that shipped previously and the new Myanmar Text font:
You'll notice that they look slightly different, reflecting two entirely different traditions, as mentioned in the old proposal N2372:
In addition to these differences, there are also slight size differences between glyphs in the two fonts.
Which were designed by two different typographers.
In two different styles.
To support two different scripts.
Plus it appears they didn't even completely capture the distinctions mentioned above -- I need to find out if that's a bug or not (perhaps the alternate glyphs are in the font but only available using advanced OpenType features, which as I mentioned previously few technologies do).
Just ten little digits:
၀၁၂၃၄၅၆၇၈၉
And of course Myanmar Text also has the Myanmar Shan Digits in it, which are not in the Microsoft Tai Le font update:
Kind of funny how Unicode decided to capture those differences but not the others.
႐႑႒႓႔႕႖႗႘႙
Not funny "ha ha", if you know what I mean.
I guess the Tai Le differences weren't different enough....
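For anyone who wants to poke at the two digit sets, here is a short Python sketch using the standard unicodedata module. The Shan digit range (U+1090 to U+1099) and the Tai Le block range (U+1950 to U+197F) are my reading of the Unicode charts rather than anything stated above, and the helper names are made up for the example. Results depend on the version of the Unicode data that ships with your Python.

```python
import unicodedata

MYANMAR_DIGITS = "".join(chr(cp) for cp in range(0x1040, 0x104A))
SHAN_DIGITS = "".join(chr(cp) for cp in range(0x1090, 0x109A))  # assumed range


def digit_values(text: str):
    """Numeric value of each decimal digit character in text."""
    return [unicodedata.digit(ch) for ch in text]


def block_has_digits(first: int, last: int) -> bool:
    """True if any code point in [first, last] has general category Nd."""
    return any(unicodedata.category(chr(cp)) == "Nd" for cp in range(first, last + 1))


print(unicodedata.name(MYANMAR_DIGITS[0]))
print(unicodedata.name(SHAN_DIGITS[0]))
print(digit_values(MYANMAR_DIGITS))
# Both sets are ordinary decimal digits as far as parsing is concerned:
print(int(MYANMAR_DIGITS[1:4]), int(SHAN_DIGITS[1:4]))
# The Tai Le block (assumed here to be U+1950..U+197F) has no digits of its own:
print(block_has_digits(0x1950, 0x197F))
```

Nothing in the character data distinguishes the Myanmar and Tai Le traditions for U+1040 to U+1049. That difference lives entirely in whichever font ends up being chosen.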
But leaving the Shan digits aside, let's consider the two ways of looking at the standard Myanmar digits (U+1040 to U+1049) in these two fonts.
Uniscribe and the like have several options for how to display these digits:
At the time Tai Le support addition was discussed in Unicode (pre-Unicode 4.0), this very issue and also the different forms were discussed, and almost led to both sets of digits being defined in Unicode in the two different blocks, though the theoretical nature of the first problem (Microsoft wouldn't add support until years later in Windows 7) and the fact that the second problem was widely treated as a minor typographic issue kept one set being used by both scripts.
And to our current troubles....
Now I don't want to imply that either Tai Le or Myanmar are not good sharers or that they both need a time out.
Well, not exactly.... :-)
There are many reasonable language experts and font developers who will consider the last two bullet points above as genuine bugs that break support that used to work in prior versions.
As bad as the third point is, imagine incompletely re-applying the font in the fourth case -- a problem that many people might recognize from longstanding problems with Japanese text partially rendered with Chinese fonts!
Those people aren't wrong; there are just disadvantages to working on the edges of languages, of locales, that we start to support....
And of course in the case of issues in Word and RichEdit, there are disadvantages to not carefully dealing with the well-intended (though in my opinion somewhat flawed) designs of some programs and controls.
It is in its own way somewhat ironic that the default behavior doesn't point in the direction of the script and language used for Chinese minority language support. Oops!
But we can just keep that our little secret, right? :-)
Yesterday, in response to 'Now that its been saved, how do I open it?', regular reader Cheong00 commented:
I think it could be opened in other editors (I think Notepad++ and UltraEdit can do let you specify to look at the file with which encoding).
It'd be cool if there's a way to manually selecting the open file code page too.
Btw, I just realized UTF16LE and UTF16BE aren't selectable code page in Internet Explorer, is there any reason for those codepage be leaved out from selection?
I suppose I can take those one at a time....
Now yes, of course, there are many editors that can handle multiple encodings.
I was sticking to Microsoft ones since the question didn't seem to be shaped as a "what 3rd party tools are out there?" kind of inquiry.
It would indeed be cool if you could select the code page when you open the file. And there are actually a few programs that do this -- like Microsoft Word, and Microsoft Access, to give two examples....
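Why picking the code page matters is easy to show: the same bytes mean different things under different code pages. A minimal Python sketch, where decode_as is a hypothetical helper and the code page names are the ones Python's codecs use:

```python
def decode_as(data: bytes, code_page: str) -> str:
    """Decode the same bytes under an explicitly chosen code page."""
    return data.decode(code_page)


raw = b"\xea"  # a single byte with no encoding information attached

for code_page in ("cp1252", "cp1251"):
    text = decode_as(raw, code_page)
    print(f"{code_page}: U+{ord(text):04x}")
```

Under Windows-1252 that byte is U+00EA (ê), and under Windows-1251 it is U+043A (к). Without a way to say which one was meant, a program can only guess.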
Cheong00 is right on that last point -- you can't see or set either UTF16LE or UTF16BE on the big IE Encoding list:
But I'll let you in on a little secret that very few people know....
If your page is encoded in UTF16-LE or UTF16-BE, you can just load it, and if you right-click and choose Encoding, you'll see that IE can show what it detected just fine:
Of course, it is using the same old naming scheme as we use in Notepad that everyone loves so much.
Well, we're nothing if we aren't kinda consistent!
Now this trick works with I believe all of the detectable encodings you can find at Character Set Recognition.
So even though so many of them aren't on the big list of encodings you can choose....
You can still have your code page, even though you can't pick it too!
Just over seven years ago, on Wednesday, February 9th, 2005, I blogged the blog in this Blog entitled What is the difference between Big Endian and Little Endian Unicode?.
cue gratuitous art of the File Save dialog and the "Big Endian Unicode" option Notepad adds:
About seven years and one day later, on Friday, February 10th, 2012, Diane posed a question in a comment to that blog:
How can I open a file once I have saved it as unicode big endian?
The short answer is to just open it the normal way!
If you are using Notepad or Wordpad or Word or several other programs, they will all detect the BYTE ORDER MARK (BOM) that Notepad inserts at the beginning of the file, and they will open it!
But let's assume there is more to this one....
For example, if Diane is a programmer trying to open the file, it becomes slightly more complicated of course -- one's best choice involves opening it using the .NET UnicodeEncoding class with one of the two constructors that tell it to expect Big-Endian Unicode, and the second best choice involves opening the file in binary mode and swapping the bytes.
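Here is roughly what both choices look like, sketched in Python rather than .NET. It is a minimal illustration that assumes the data starts with a UTF-16 byte order mark, as Notepad's output does. The function names are hypothetical and UTF-32 is not considered.

```python
import codecs


def decode_utf16_with_bom(data: bytes) -> str:
    """Pick the byte order from the BOM, then decode the rest."""
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return data[2:].decode("utf-16-le")
    raise ValueError("no UTF-16 byte order mark found")


def swap_byte_pairs(data: bytes) -> bytes:
    """The 'second best' route: swap each pair of bytes by hand."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(data)
    swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
    return bytes(swapped)


# Simulate what ends up on disk for a "Unicode big endian" save.
saved = codecs.BOM_UTF16_BE + "Tiếng Việt".encode("utf-16-be")

print(decode_utf16_with_bom(saved))
# After swapping, the bytes are little-endian with an FF FE BOM in front,
# which the generic "utf-16" codec detects and strips on its own.
print(swap_byte_pairs(saved).decode("utf-16"))
```

The first route is the one to prefer since it never touches the data. The byte swap is only there to show what "opening the file in binary mode and swapping the bytes" amounts to.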
I suppose I could take the opportunity Diane's question affords me to travel back in time to the pre-Windows 2000 days, and wax nostalgically to be present during the original conversations where they decided to expand that dialog to support three kinds of Unicode, and so on. I could ask Chris Walker, he'd give me the skinny here, maybe tell me the real story about what happened in 98-99 that led to adding both UTF-8 and UTF-16 BE. :-)