Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
Today is my birthday.
I don't particularly care for my birthday, and I'll not be celebrating it or anything....
Something much cooler happened yesterday.
Today's blog is gonna focus on that!
Unicode 6.2 was released!
{cue gratuitous graphic}
From the official Announcing The Unicode Standard, Version 6.2 announcement:
Version 6.2 of the Unicode Standard is now available. This version adds only a single character, the newly adopted Turkish Lira sign; however, the properties and behaviors for many other characters have been adjusted. Emoji and pictographic symbols now have significantly improved line-breaking, word-breaking and grapheme cluster behaviors. The script categorizations for some characters are improved and better documented. The Unicode Collation Algorithm has been greatly enhanced for Version 6.2, with a major overhaul of its documentation. There have also been significant changes to the collation weight tables, including improved handling of tertiary weights for characters with decompositions, and changed weights for some pictographic symbols. The newly encoded Turkish Lira sign, like other currency symbols, is expected to be heavily used in its target environment. The Unicode Consortium accelerated the release of Unicode 6.2, to accommodate the urgent need for this character. For more details of this release, see http://www.unicode.org/versions/Unicode6.2.0/.
Now I have talked about this character before, in blogs like Not the Lira or the New Lira, but a New Turkish Lira, nevertheless on March 14th of this year, and Every character has a story #39: U+20ba, aka TURKISH LIRA SIGN (coming soon to Turkey near you!) less than two months later.
And it is now in Unicode.
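If you want to see whether a given runtime's character database has caught up, you can just ask it for the character's name. Here is a minimal sketch in Python, assuming an interpreter whose bundled Unicode data is 6.2 or later; the describe helper is a name I made up for illustration:

```python
import unicodedata

def describe(ch):
    """Return (code point, character name, general category) for one character."""
    code_point = f"U+{ord(ch):04X}"
    # Fall back to a placeholder if this build's Unicode data predates the character.
    name = unicodedata.name(ch, "<unassigned>")
    return (code_point, name, unicodedata.category(ch))

# The Unicode version this Python build ships with.
print(unicodedata.unidata_version)
print(describe("\u20ba"))
```

On older data the name comes back as the placeholder, which is a cheap way to spot a component that has not been updated yet.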
It will be in Windows Phone 8!
It is also in Windows 8!
You can see it in Regional Options:
You can see it right here in Segoe UI in Character Map:
You'll note it is undefined...
Well, perhaps this one bit of UI not being updated when everything else (collation, properties, locale data, character properties) has been updated is not really such a big deal.
I mean, the same thing happened with Windows 7 and the Rupee after the update:
which was added for Windows 8:
Soon after Windows 8 reaches general availability, they'll do something about the down-level and other product world, too (more on this when I have more to share!).
A lot of this will be coming up in my Show Me the Money! talk at the 36th Internationalization and Unicode conference on October 24th in Santa Clara that I talked about previously, here.
For Unicode version, this update expands my previous The oft-repeated 'What version of Unicode do we/will we support?' question, Redux where I said last February in the blog and in comments:
There is obviously time prior to ship yet after Unicode 6.1 was released that in theory might change that story slightly, though given the complexity of doing more work, I suspect that Windows 8 will largely be centered on 6.0, not 6.1.
Some specifics:
There are a few 6.1 characters that leaked into Windows 8 fonts and keyboards.
But that is a topic for another day.
In another blog!
Sometimes we do stuff that's pretty cool and that people like.
And then without ever explaining why, it goes away.
Like the Microsoft Application Translator (from the old GlobalDev), which you pretty much have to look to the Internet Archive to read about!
Interesting idea.
But understaffed, without funding and sans exec support, it died before it ever got to be cool. Just a few years after it first started.
It never made it past the closed beta.
And the name of MAT was dishonored.
Who would bring back its honor?
I guess I gave it away in the title, huh? :-)
Well, remember when I wrote Got Windows 8? Check out the Multilingual App Toolkit about a cool tool in beta?
Well, it has been just over six months, and now
Multilingual App Toolkit for Visual Studio 2012 has now been officially released!
You can get it right here from the Dev Center. :-)
From that page:
Multilingual App Toolkit for Visual Studio 2012 is an extension that enables translation support through tools and guides by focusing on the following areas:
Integration with Visual Studio IDE enables you to add and manage translation files to a project solution using standard Visual Studio menus and dialogs.
Pseudo language engine gives you ‘in house’ testing of localized apps by identifying translation issues during development such as hardcoded, concatenated, or truncated strings and other visual issues that arise when working with languages. Pseudo translations are stored in the localization industry standard XLIFF file format and can be edited just like any other language translation. This gives you granular control over pseudo translation testing.
Translation file export & import roundtrip provides you with the ability to send and receive resources in XLIFF files to friends, family, or a translator services for review.
XLIFF lightweight editor provides a lightweight localization UI for editing translated strings. Get translation suggestions quickly by using the integrated Microsoft Translator (requires active Internet connection). It also allows you to quickly edit data stored in XLIFF files by adjusting pseudo and/or actual translations.
*Required only for add-in functionality. The Multilingual App Toolkit also has a standalone editor.
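To give a feel for what a pseudo language does, here is a minimal sketch in Python. It is not the toolkit's engine or its algorithm, just the general idea: accent the vowels so hardcoded strings stand out, pad the text so truncation shows up, and bracket it so concatenation is visible. The accent table, the padding character, and the 30% growth figure are all assumptions of mine.

```python
# Hypothetical accent table; a real pseudo language engine would use its own.
ACCENTED = str.maketrans("aeiouAEIOU", "åéîöüÅÉÎÖÜ")

def pseudo_localize(text, growth=0.3):
    """Accent the vowels, pad the string by roughly `growth`, and bracket it."""
    padding = "~" * max(1, round(len(text) * growth))
    return "[" + text.translate(ACCENTED) + padding + "]"

print(pseudo_localize("Save As Image"))
```

Any string that shows up unaccented in the running app never went through the resources, and any string missing its closing bracket got cut off somewhere.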
There are already several videos up showing what it can do for you, like
And what's more: they've got staffing.
And funding.
And resources.
And a plan.
And the approval (and dare I say delight?) from our senior leadership.
And they're just getting started!
I think it's fair to say that the honor of MAT has been restored....
Completely off-topic, and unworthy.
It took me a while to decide.
You know, whether or not I liked the new office.
The window office, I mean.
You know that office I mentioned in Michael's window office -- Take 4! (aka Size Matters!).
First, there is the fact that I hate moving.
I. HATE. MOVING.
Foul tip.
STRIKE 1!!!
Well, to be fair, they did one of those special moves.
The ones they do for gimps in wheelchairs.
Where they'll move all the stuff they usually won't move.
Anyway, where was I?
Oh yeah. Then there is the fact that the new office is smaller.
Half a ceiling tile narrower.
I've almost hit the wall a few times.
Let's call it a foul tip again.
STRIKE 2!!!
Then, the other day, I went to my mail slot.
I never check that thing often enough, you know.
And then, I saw it.
The new office means a new mail slot.
For reasons that surpass understanding, they order the mail slots by office number!
Talk about misuse of collation!
It makes no sense.
I mean, the mail room puts a sticker on everything that they deliver.
The sticker has two things on it.
The NAME.
And the OFFICE NUMBER.
Duh.
How hard would it be to make it alphabetical?
I can't find it.
Of course I don't remember the new number, I realize.
Then, I find it.
Sort of.
It is located on the very top shelf.
Even fully extended in the iBOT, I can't reach it.
My mail slot has a bunch of stuff in it; I can see it from where I am.
And I can't reach it.
Crap.
I look over at the mail slot of my old office.
I can reach it easily.
And now I really miss my old office.
Is it even legal to locate a mail slot in an inaccessible location when you have self identified as being a gimp in a wheelchair?
Seriously.
I guess they couldn't have known, right?
I think they will know quite soon, though! Just a hunch. :-)
Okay, another foul tip.
Still STRIKE 2!!!
At least I'm getting a piece of it.
Looking at the slots, I see no reasonable way to move it, even if I could reach it.
Which I can't do anyway.
Dammit.
Robbie helped me get the mail, and she is going to call them about moving the mail slots.
I have promised to not rip down ten slots I can reach, although as a form of catharsis, it might actually have helped.
But it's official.
Moving sucks....
Something it has in common with my new mail slot.
And my new office.
I quit.
Sigh.
Ok, I won't quit.
It will be better tomorrow, I'm sure.
Right?
But I'll still wish I hadn't moved....
At 7:56pm last night, someone sent me an email.
Now getting email is hardly an unusual thing for me.
In fact, if you omit spam (and don't class email you don't agree with as spam), it would be an unusual minute if someone hadn't contributed to my inbox!
So why on earth would this one particular email be something that I would comment about?
Not because it was about MSKLC; I have a dedicated folder for archiving those, which currently contains 16,956 items in it.
And not because it contains a suggestion for MSKLC. At least ⅓ of those 16,956 items do that.
But in this case, with this mail, something happened.
And, to quote Douglas Adams, "Once something actually happens somewhere in something as wildly complicated as the universe, Kevin knows where it will all end up- where “Kevin” is any random entity that doesn’t know nothin’ about nothin’"
In this case, it was about the File|Save As Image feature:
This is a feature that came out of one of those D&P meetings I first described back in the beginning of 2005 in Accessibility, Internationalization, and Keyboards (#3: MSKLC's UI):
Developing the Microsoft Keyboard Layout Creator was a unique experience. Almost all of the good ideas and concepts came from Cathy Wissink and Simon Earnshaw. At the time they represented the bulk of the NLS Program Management team (now they each have 5-6 people under them in GIFT -- we have grown quite a bit since then!). The three of us would have weekly meetings and talk about what kinds of tools would meet some of the customer requests that had been coming in. It was a little unconventional as projects go, since I did not have a spec to work from. But at these weekly meetings I was able to show them what I had and they were able to provide corrective measures. We started calling the meeting the Dog and Pony Show, shortened to Dog & Pony and eventually to D&P and these were very creative meetings with lots of good ideas in them. I mention the D&P because it helps point to the continuous "beta" and "usability" testing that was happening throughout the development cycle of MSKLC. Although it was pretty unconventional, it proved to be very effective when things line up as they did.
...
Although it is last on the list here, it is probably the most commonly used feature, and it highlighted the pattern of those weekly meetings. I would show the current state of the UI, Simon or Cathy would bring up a feature they felt was crucial, I would push back and explain why it was not feasible, they would grudgingly accept this, but the issues they brought up would keep bothering me until I thought of a way to make it feasible. Then I would show it to them the next week. I am pretty sure that was how the meeting became known as D&P in the first place....
In this case, Simon's spec (largely written after MSKLC was developed -- proof that even when PMs don't write specs they aren't lazy since both Simon and Cathy were very involved in the whole project!) had called out the need to make keyboards easy to document.
I told them we had no time to do it -- people would have to take screenshots of the tool in various states, or we'd risk slipping the release.
We all considered that, and everyone reluctantly agreed.
Then, back in my office, I was annoyed.
I had to try to do something.
And thus the "Save As Image" feature was born. I wrote up a simple screen scrape.
The next time we met, I shared this, and they were pleased we could take some steps to help keyboard authors.
My original prototype also included a "Save all states" functionality, but it was awkward and weird, and eventually we all agreed it was no good...
Now there was an earlier meeting (months earlier) about dead keys.
And how hard it was to see what was in them.
I mean, you had to right click on the key to get the Dead Key Dialog... option:
to launch the actual dialog:
This was really hard to imagine people finding intuitive or useful.
I hung my head and said "I can't work miracles, let alone design them!"
They agreed that this would be too hard to do.
For next week's meeting, I showed them my "tooltip" solution:
My original prototype looked much worse than that -- they spent that meeting figuring out how it should look, and ended up with a pretty good solution, I think!
As a side note, the contrast between MSLU and MSKLC is huge in terms of PM input. For MSLU, I leaned most heavily on Julie (my dev lead at the time), and the project was largely about her dream of a Unicode Layer on Win9x. But MSKLC was about Cathy, and her dream of a tool to make it easier to create keyboard layouts. I could have easily done MSKLC without her, but it would have sucked. So I am quite grateful for her involvement! :-)
Anyway, back to that email.
The DeadKey feature of MSKLC is more important than the time given to it. I am struggling now with how to communicate both to myself and others what the dead-key layout of my new keyboards is. How difficult would it be to add a refinement to MSKLC such that a .jpg of the dead-key layout could be requested also. All the code is there to do the jpgs for a standard layout, a CTL layout, a CTL-SHFT layout, etc. It should not be too difficult to add the dead-key image functionality.
This is indeed a big thing. It would be a great feature to add.
We clearly have all the data, right?
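Just to show how little is conceptually needed, here is a minimal sketch in Python of dumping a dead key's table as text. It is not MSKLC code and does not read a .KLC file. The assumption here is that the dead key behaves like a combining mark, so the table can be derived from Unicode normalization; a real layout can map a dead key to anything the author likes. The function names are made up for illustration.

```python
import unicodedata

def dead_key_table(combining_mark, base_keys):
    """Map each base key to its precomposed result, skipping keys that have none."""
    table = {}
    for base in base_keys:
        composed = unicodedata.normalize("NFC", base + combining_mark)
        if len(composed) == 1:  # only keep pairs that compose to one character
            table[base] = composed
    return table

def render(table):
    """Render the table as one 'base -> result' line per entry."""
    return "\n".join(f"{base} -> {composed}" for base, composed in table.items())

# U+0301 COMBINING ACUTE ACCENT stands in for an acute-accent dead key.
print(render(dead_key_table("\u0301", "aeiouq")))
```

Turning that text into an image alongside the existing per-state images is the part that would need the real UI work.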
And now I am wishing for a D&P with Cathy and Simon to talk about it!
I should talk to some people... :-)
I've talked about vertical text support now and again over the years, for example in blogs like:
There are others, of course. But this is a nice sample that hints at the complexity.
That second-last blog in particular gets into the challenge of trying to imagine what "Vertical Windows" would look like, if such a thing ever existed for Japanese or Mongolian or Phags-pa.
But a comment in that fourth blog by John Cowan showed me that there were some people thinking about the issue, and striving to be more vertical, even if we weren't doing it in Windows:
I looked into vertical Unicode properties a few years back. What needs to happen IMHO is for Unicode to create two new properties: verticality (Top, Bottom, or Neutral) and rotatability (True or False). Verticality specifies whether, when written vertically, the script must appear in a particular direction or not. Generally, CJK characters (hanzi/hangul/kana) are Top, Mongolian ones are also Top, and everything else is Neutral. For example, Latin script often appears vertically on book spines, with English ones always top-to-bottom and German ones normally bottom-to-top: Latin script is readable either way, so it's Neutral.
Contrary to Unicode, Ogham isn't really Bottom, it's also Neutral. An inscription on a stone is read up the side, across the top, and down the bottom if it's long enough, just like a Latin-script inscription on an arch would be. It's just that most inscriptions are short and only have the bottom-to-top portion. In manuscript, it always appears left-to-right. I once asked Michael Everson what would happen if a book title were mixed English/Ogham; would it be written bidirectionally on the spine -- Latin top to bottom, Ogham bottom to top? He didn't think it was a realistic use case.
Rotationality is what you mention above: basically U+4E00 ("one") is rotatable, the rest of CJK is non-rotatable, and everything else is rotatable. Given these two properties, all a user has to do is set the overall direction of text progression (the next line can be left of the current line, right of the current line, or below the current line, known in CSS3 as RL, LR, and TB respectively) and all can be done automatically.
I passed all this on to the UTC informally, but nobody expressed an interest.
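To make John's verticality idea a little more concrete, here is a minimal sketch in Python. No such Unicode property exists, so everything here is hypothetical: the Verticality enum, the function names, and especially the rough block ranges that stand in for real script data. It covers only the verticality half of his proposal and leaves rotatability out.

```python
from enum import Enum

class Verticality(Enum):
    TOP = "Top"
    BOTTOM = "Bottom"
    NEUTRAL = "Neutral"

# Rough block ranges standing in for real script data (an assumption of this sketch).
TOP_RANGES = [
    (0x1800, 0x18AF),  # Mongolian
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
]

def verticality(ch):
    """Return the hypothetical verticality value for one character."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in TOP_RANGES):
        return Verticality.TOP
    return Verticality.NEUTRAL

def required_direction(text):
    """A run is forced top-to-bottom as soon as it contains any Top character."""
    if any(verticality(ch) is Verticality.TOP for ch in text):
        return Verticality.TOP
    return Verticality.NEUTRAL

print(required_direction("Tokyo 東京"))
```

Under this sketch Ogham falls out as Neutral, which matches his argument rather than the directional treatment he is disagreeing with.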
And another comment by Andreas Goretzky helped me remember that Windows was not as far away from that ideal as we conventionally tend to think:
Thank you for your thoughts... the UI question you brought up is quite interesting.
Before Windows 7 there was absolutely no way to type classical mongolian letters in a simple way. There were fonts for XP and Windows 2000, but you needed special software for getting the right characters, and scissors and glue and a copy machine for rotating the text, and the Menksoft kbd driver for composing the characters. For simplifying the cyrillic typing I made a driver with MSKLC based on the latin transliteration, but for classical script there is absolutely no way. At least not until Windows 7 appeared.
Windows 7 has at least an integrated IME for typing the mongolian script, and it has fonts so horizontally typing and processing works in some apps, and a LIP is announced...... most likely in cyrillic.
Word 2007 can rotate the document, but has no line breaking into the right direction. There are only 3 options, and the fourth is missing. That's about phantasy of product managers. The easiest way for typing that stuff is Excel :-) having grouped cells and rotated document I can "linebreak" by myself.
I was about to give the RedOffice a try, it was founded by the Chinese government to support all of the minorities languages in P.R. China with a word processor.
But the UI is in Chinese by default so I have no chance to find the option. In theory it has the vertical typing feature, it should have vertical menus as well, and it accepts the IME composed text from W7, but without deep chinese language knowledge I can't solve the line breaking and writing direction setup.
If there ever appears a Windows version that has vertical menu and Mongolian script I mark that day in my calendar and celebrate that instead of my birthday.
But most people didn't see it that way.
It's a vision thing.
By which I mean it's a Vision thing.
Because if everyone can see it, then it isn't called Vision.
It's called Sight. :-)
But if you look at Windows 8, you can see us starting to be a bit more aspirational.
Just look at my list of keyboards, based on the Language Profile list:
Do you see those Phags-pa script and Mongolian script entries?
I expect us to find more people trying things out in Internet Explorer and Word and Publisher, seeing what works. And what doesn't.
And envisioning what the future can hold for us....
Feeling inspired yet?
Well I am, and it isn't just a good review that did it for me....
But don't take my word for it. Try things out in Windows 8 (with these products or others) and let me know what you find! :-)
Not a technical blog today, there will be something technical tomorrow!
I have a bit of a passive/aggressive reaction to situations where I feel embarrassed or humiliated by something.
It doesn't happen often.
But, for example, I have been to Japan several times in my 20's and 30's.
Enjoyed the country every time, always wished I could stay longer.
People there were enthusiastic and interested and engaging, every time.
My last visit was in 1998.
I was walking with a cane, not that I needed to but I was falling more often (the first multiple sclerosis thing that started to have some impact on me, really). The cane was a self defense mechanism to stop the typical assumption in the USA when a young man falls in the middle of the day (drunk!). With the cane, the assumption is different (gimp!), and I felt more comfortable with the latter assumption than the former one....
In Japan, it's a little different.
Not quite as cut and dried as Tia Carrere put it in Rising Sun:
Do you know the term..."he's a bit burakumin"? It's like uh... untouchable. I was even lower than burakumin... because I was deformed. To the Japanese, deformity is shameful. It means you've done something wrong.
That's Hollywood. Real life is a bit more subtle, especially among visitors and tourists.
But there is a subconscious thing there, something that had me feeling pretty unhappy. A general feeling of being looked at as ...untouchable.
I never went back, even after 14 years.
My passive/aggressive reaction to this - skip the country from then on.
Several offers for consulting jobs were quite easy to regretfully decline....
The engineer in me would like to go back again, now that I have the iBOT. Since interest in technology is fascinating enough to counter many other forces. Though my interest in testing my theory hardly seems worth the time and effort!
A more recent example....
I tend to join groups that take interest in the arts -- Wolfgang for the symphony at Benaroya, the BRAVO! club for the Opera, the Crew for the Seattle Repertory Theater.
Something about the different slant they put on their respective seasons -- something about aiming at younger audiences and trying to foster love of the arts, I just tended to find the seasons more interesting.
I usually don't even bother with discounted season tickets from these organizations -- better to pay full price, like a patron, in my mind. ;-)
In the most extreme case, my Seattle Symphony tickets are all in the Founder's Row at this point -- after a random upgrade, I got hooked on the better view, and sound!
I turned 40 nearly two years ago, but kept doing this. Why mess with what works? :-)
This is hardly just something I do -- I know of several others who violate the age rules, in each case, or all of them.
And then at one of the BRAVO! pre-season events, I was there and hanging out.
I was in the process of signing up, and then I made the mistake of giving my driver's license so they could get my name easily (my handwriting is not so great anymore -- another MS thing).
Now I'm not going to detail the exact way that my plans to join BRAVO! disintegrated. But I will say that I left almost immediately.
And haven't been to the Seattle Opera since.
Now I don't hate Japan.
Or the opera, for that matter.
At all.
And passive/aggressive reactions due to feeling a little bit burakumin (部落民) are hardly the most productive way to go through life.
Especially when you're dating someone who spent five years in Japan and who loves the opera!
I'm working on it....
And I'm fighting the urge to make a pun about the fat lady singing!
I've had flat tires before in my life.
I guess a lot of people have.
I've changed them myself sometimes.
Once when I was dressed up, chaperoning for an event, one of the high schoolers at the event who chose to not really dress up offered to help me, and I let him.
Other times, I've called AAA and they came and changed it.
Now, in a more iBOT-centric world, not much has changed.
I mean, I go a lot of places.
And put a lot of miles on these tires.
Sometimes a tire goes flat.
Once I called Independence Technology and asked them to come out and fix it.
I never did that again after the first time.
Insane instructions caused confusion. Why didn't they say "they are tube tires, just like bicycles"?
$60 for that call.
I've done them myself a few times, though less as my coordination started getting worse -- the whole multiple sclerosis thing.
More recently, and for at least the last 18 months, I've had the maintenance staff at Archstone Redmond Campus replace the tubes any time I get a flat.
They aren't allowed to accept tips ever, but I can load up a Starbucks gift card, give it to the maintenance staff, and ask that whoever did the work get to order first.
I wonder what happens when I move out? Can they still help out a former resident who lived there since 1999?
Maybe I can set up time for when one of them is on break!
Also, Claudia has two sons -- one 17 and one 14. Maybe I can enlist their help! :-)
Now one thing has changed from the old days.
Flat tires used to stop me, until the spare got put on.
But iBOT tires inevitably find themselves flat on a Friday before a long weekend.
Or a few times when I was out of town.
Every time, I'd rotate them so it wasn't the back wheel, so when I went up on two wheels it wouldn't be the one on the bottom!
And then I'd do whatever, and the only time anyone noticed the flat was when I had all four tires on the ground....
Thus far, I've avoided hurling anything heavy at people who mention
Did you know you have a flat tire?
They say this despite the grimace on my face.
As I sit there, tilted unnaturally, I think they are lucky I have nothing to hurl at that moment!
Although (ignoring those awkward moments) I function well, I am a little more tense when I have a flat tire.
Even when no one else realizes it.
I've never felt like a car was a part of me, but the iBOT? It kind of is, really!
So when it is ill, it affects me some, too!
You know?
No flat tires now, so I'm going to bed. This'll post in the morning....
It was about six weeks ago (give or take) that I blogged 'Because even if the new standard is perfect, most people don't like things changed out from under them' about the new Hebrew keyboard in Windows 8.
And about some of the problems that I noticed in it.
Anyway, the Hebrew National Standard (SI1452) can be officially exonerated here -- the big problems were not inherited from the standard itself.
It seems that the .KLC file that purported to represent SI1452 was not the official one that was created and which can be found now if one looks on the project site.
The mistakes and problems I mentioned in that blog were from this file.
There are other, additional problems that others have found that have the same cause -- that incorrect .KLC file.
Interestingly, that file is no longer where it was originally -- apparently they noticed the problems too!
I find myself tempted to argue that we should service this one.
That the rule about never changing a keyboard layout, while perfectly valid in virtually every circumstance, is not valid in this one.
I'm going to talk to some people next week about this.
And if we get a copy of the official file, perhaps this can be the exception that proves the rule.
What do people here think? Do you think that this is a reasonable exception to make?
I thought I'd talk about one of the interesting issues that came out of
The first one was important for background, because if there wasn't even a locale then I wouldn't be talking about this today.
The second one was important for background, since it introduced the two keyboards for that new locale.
And the third one was also important for background, since it pointed out that many Cherokee native speakers were going to be helping to translate Windows 8 to Cherokee!
Now one important part of the localization process is something people were talking about.
Hotkeys!
Those shortcuts, the letters that will appear underlined on menus and dialogs.
They are localized, and only single letters will work.
So what do we do?
I mean the Cherokee Phonetic layout has just six single letters on it (the six vowels):
By usual convention, numbers and punctuation are not used for hotkeys.
Most menus will need more than six hotkeys.
Well, one easy solution is to use the Cherokee Nation layout, which has a whole bunch of single keys on it:
Plenty of letters to pick, and pretty much there will always be choices in the strings you would be using!
Of course, Cherokee folk using the Phonetic layout might be confused if they see underlined letters that they can't seem to use.
One other option is English hotkeys!
I mean, both Cherokee layouts have English letters in them, with no keyboard switching required:
Note that this is the same solution that Chinese, Japanese, and Korean use for hotkeys in cases where IMEs would block the ability to use hotkeys.
To do it they would just have to add the English letter in parentheses at the end of the string.
Of course, some of the Cherokee letters look enough like English letters that people might be confused when hotkeys seem to fail -- so they'd have to avoid some of those English letters!
Or they could just not do hotkeys.
Lots of people have opinions here, especially testers who look at hotkeys to make sure they aren't duplicated and such.
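The English-letter-in-parentheses approach, plus the duplicate check the testers care about, can be sketched in a few lines of Python. This is only an illustration and not how any localization tool actually does it: the assign_hotkeys and duplicate_hotkeys names are made up, the localized labels below are placeholders rather than real Cherokee strings, and the avoid set is a hypothetical stand-in for whichever English letters end up looking too much like Cherokee ones. The one real convention used is the "&" that marks the hotkey letter in a Windows menu string.

```python
def assign_hotkeys(items, avoid=frozenset()):
    """Append a unique '(&X)' English hotkey to each localized label.

    items is a list of (localized_label, english_label) pairs; the hotkey is
    the first letter of the English label that is neither used nor avoided.
    """
    used = set()
    result = []
    for localized, english in items:
        letter = next(
            (c for c in english.upper()
             if c.isascii() and c.isalpha() and c not in used and c not in avoid),
            None,
        )
        if letter is None:
            result.append(localized)  # no free letter left: ship without a hotkey
        else:
            used.add(letter)
            result.append(f"{localized}(&{letter})")
    return result

def duplicate_hotkeys(labels):
    """Return the set of hotkey letters that appear on more than one label."""
    seen, dupes = set(), set()
    for label in labels:
        idx = label.find("&")
        if idx != -1 and idx + 1 < len(label):
            key = label[idx + 1].upper()
            if key in seen:
                dupes.add(key)
            seen.add(key)
    return dupes

menu = assign_hotkeys(
    [("<label 1>", "File"), ("<label 2>", "Find"), ("<label 3>", "Format")],
    avoid={"I"},  # hypothetical lookalike letter to steer clear of
)
print(menu, duplicate_hotkeys(menu))
```

The first function only ever hands out a letter once, so the second one is really there for strings that were edited by hand afterwards.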
So it is an interesting issue to discuss now in the thick of things....
Previous blog in the series: Do you have the [short] time for this? Part #1: The intro.
With part 1, I laid out the history of long time and short time.
And I made it all sound like a story with a happy ending.
Sounds like all is good, right?
Wrong!
There are still several lingering problems.
They all have pretty much the same cause, though.
And all of them would not exist if the original Win32 solution had been used everywhere.
I'm not saying that VB, VBA, .NET, or the people clamoring for a "short time" were wrong, per se.
But I am saying that they share blame for these bugs with the NLS team members who did the work to support "short time"!
First I'll show what these formats look like in en-US, where one of the problems doesn't exist:
You will no doubt notice that the four options for long time match the four for short time (with the TIME_NOSECONDS flag being passed for the long time!).
In other words, they can kind of do okay when they match.
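Here is a minimal sketch in Python of what "matching" means here. The strip_seconds function is only my rough stand-in for what passing TIME_NOSECONDS does to a long time format (drop the seconds and the separator in front of them); it ignores quoted literals and is not the actual NLS logic. The function names and the sample format strings are illustrative assumptions.

```python
import re

def strip_seconds(long_format):
    """Drop the seconds field and any separator directly in front of it."""
    return re.sub(r"[^\sA-Za-z']*s+", "", long_format)

def mismatches(long_formats, short_formats):
    """Return the (long, short) pairs where short is not long-without-seconds."""
    return [
        (long_fmt, short_fmt)
        for long_fmt, short_fmt in zip(long_formats, short_formats)
        if strip_seconds(long_fmt) != short_fmt
    ]

# Illustrative en-US-style format pictures.
long_times = ["h:mm:ss tt", "hh:mm:ss tt", "H:mm:ss", "HH:mm:ss"]
short_times = ["h:mm tt", "hh:mm tt", "H:mm", "HH:mm"]
print(mismatches(long_times, short_times))
```

An empty list means the two sets line up; anything else is a pair that would show different results depending on which route the caller took.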
HOWEVER, there is no rule that enforces a need for them to match!
And HOWEVER #2, no documentation or description suggests that such matching is needed, wanted, desired, or even pined for!
In fact, there are several examples of locales -- more than three and fewer than twenty -- in Windows 8 that have non-matching long time/short time formats.
But since neither the people who originally provided the data nor the people who reviewed the data in Windows 8 were ever even given an insinuation that matching was expected, they can hardly be blamed for not always having done so.
I still don't know that I would want to formally request it, if no one wanted to update the documentation.
People owning the underlying technologies would have to put their money where their mouth is here, if they want to change the rules and ask for what would otherwise be gratuitous busywork.
The other problem would of course be that "fixing" those cases would not keep people from either adding new locales or modifying existing ones to make them be out of sync again.
This means that there would need to be Locale Builder updates (all new locales and locales we reviewed were done via Locale Builder), and we'd probably need some doc updates there too.
I know they aren't updating Locale Builder these days, but they will have to spend some dev cycles for an internal update if they want to change the rules to force data to be better.
Not that I would object to a public Locale Builder update, either. :-)
But no one is going to build in new requirements on top of me or us if they can't be bothered to support it in their own toolset we are relying on!
Ironically, had they never gone this "we need a unique short time" route, this problem wouldn't even exist.
Because if there was only one set of formats that were modified as needed, they would all be in sync, with no special work needed.
And if this blog seems a skosh shriller than usual, it is because of how often this kind of thing has been reported as a bug in the data when the real bug has actual villains who caused the problems!
And there's more -- as I'll talk about in the next part....
So as Windows 8 was being developed, we had a huge review done of locale data.
Quite a few bugs were fixed during that process.
And we learned some very interesting issues, as well!
Like if you look back at a blog I wrote back in 2005 (Number format and currency format are not always the same), I didn't give the actual example I had in mind -- the fact that we have separate LOCALE_SDECIMAL and LOCALE_SMONDECIMALSEP settings -- because that example wasn't in my mind a very good one.
Because we had data that said the decimal separator for Farsi (Persian) was U+002e (aka FULL STOP), while the currency decimal separator for Farsi (Persian) was U+002f (aka SOLIDUS). But it had been over a century since the IRANIAN RIAL had needed to be divided into fractional values (often known as DINARs).
They didn't even make the coins any more, so it seemed like a stretch to speak about such a theoretical issue.
Geeks such as myself are often proud of our arcane knowledge, but often try to avoid calling attention to it when it makes us seem silly!
This didn't stop us from voting in favor of changes suggested by Asmus Freytag to the Unicode Bidi Algorithm (UAX #9) to make it more "Microsoft-like" back in the early 90s, by changing the Bidi class of U+002f (aka SOLIDUS) to make sure numbers on either side of it were not flipped/reversed, since we figured that was what we were doing anyway. Why not be happy if a standard changes in a way that makes us more conformant? :-)
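For the curious, here is a small sketch of how one might peek at the Bidi class of the two separators in question. It uses Python's standard unicodedata module, which reports whatever version of the Unicode Character Database shipped with that Python build, so treat the result as "what this one implementation thinks today" rather than as a statement about any Microsoft component.

```python
import unicodedata

# The two separators discussed above: FULL STOP and SOLIDUS.
separators = ("\u002e", "\u002f")

for ch in separators:
    # bidirectional() returns the Bidi class abbreviation (e.g. "CS", "ES")
    # from the Unicode data bundled with this Python build.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")

# If both report the same class, <NUMBER>/<NUMBER> and <NUMBER>.<NUMBER>
# would be treated alike by a conformant Bidi implementation using this data.
same_class = (
    unicodedata.bidirectional("\u002e") == unicodedata.bidirectional("\u002f")
)
print("same Bidi class:", same_class)
```

The version of the data in use can be checked with unicodedata.unidata_version if the answer ever looks surprising.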
But one of the review feedback items that surprised us was how often we were told that the LOCALE_SDECIMAL value for Persian (Farsi) was also expected to be U+002f (aka SOLIDUS).
Thus the whole original reason for having separate LOCALE_SDECIMAL and LOCALE_SMONDECIMALSEP values was perhaps not as true as we had thought!
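To make the "two separate settings" idea concrete, here is a minimal sketch. The Separators class and format_value helper are hypothetical names invented for illustration; they are not Win32 APIs, and the values are just the ones from the old data described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Separators:
    """Hypothetical stand-in for one locale's pair of settings."""
    decimal: str           # plays the role of LOCALE_SDECIMAL
    currency_decimal: str  # plays the role of LOCALE_SMONDECIMALSEP


def format_value(value: float, sep: str) -> str:
    """Format with two fractional digits, swapping in the given separator."""
    whole, frac = f"{value:.2f}".split(".")
    return f"{whole}{sep}{frac}"


# The old data: FULL STOP for numbers, SOLIDUS for currency.
OLD_DATA = Separators(decimal="\u002e", currency_decimal="\u002f")

number = format_value(12.5, OLD_DATA.decimal)
amount = format_value(12.5, OLD_DATA.currency_decimal)
print(number, amount)
```

With two settings, the same value can legitimately come out looking different depending on whether it is treated as a number or as money, which is the whole reason parsing code has to know which one it is looking at.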
But obviously we couldn't remove a constant that had been around/documented/change-able since Windows 95, so we just changed the one setting -- the LOCALE_SMONDECIMALSEP for Persian (Farsi), and left it at that.
Good thing we didn't fight those Unicode changes almost a decade ago for a theoretical currency formatting/parsing issue, since it seemed to also be a not-so-theoretical number formatting/parsing issue, too!
And then the wheels started to come off the wagon.
Because it looks like Unicode may have also changed things back, too.
And because even though Uniscribe/DWrite/Access/Excel handled the <NUMBER>/<NUMBER> issue "properly" -- in other words the same as <NUMBER>.<NUMBER> -- it turns out that Word/OneNote/Internet Explorer did not!
So the Windows 8 data change has now been reported as the cause of an Office regression.
As has Office's own inconsistent behavior!
The fact that those same apps have not been conformant to wider Microsoft -- or perhaps even Unicode? -- behavior for currency values for almost as long as Win32 has been around is now something that has to be carefully considered as a potential "should we revert the platform" issue, instead of a bug in the apps!
Oops!
We'll need to take another look at this issue (I'll fold in any feedback people give here to that discussion when/if it happens).
And we'll have to never forget that not every part of Microsoft does things the same way!
Today's regular blog is being preempted by something fellow Softie David Monk passed to me via Facebook.
Great guy, and I've worked on stuff in Windows from time to time with his wife (Stephanie).
Both cool people, very professional. :-)
Anyway, he forwarded me this article:
McDonald’s Learns It Shouldn’t Trust Free Web Translators When It Comes To Its Billboards
Yup, that's a lesson we all should just know, right?
It is not the same kind of problem (or as dangerous!) as the problems I blogged about in Inaccurate localization can cause problems.
Perhaps a little more like the problem we had in the Portuguese Vista beta I blogged about in Inaccurate localization can make you bust out laughing.
Though since we never shipped that one and thus were never too embarrassed, perhaps We're back and we're embarrassing ourselves? (aka Making your localizer's life easier, Part 2) is closer.
Though that was just embarrassing for us -- it wasn't really offending people like the way McDonald's felt the need to reach out to the Hmong to apologize for.
Just like this other case is embarrassing for a localization company (and the people that asked them for a translation!): E-mail error ends up on road sign. Though that one is still less likely to cause offense.
We really need to create some metrics to help us decipher the relative offense, importance, entertainment, and such.
Now in the original article there were many comments, some of which tried to suss out the thought process (or lack thereof!) of McDonald's here.
I mean, they weren't even paying translators, so I doubt they were training cashiers to speak Hmong!
We have the same issue in LIPs here -- being careful about providing warm and sensitive localized product that will sometimes have to throw one back into English.
Maybe it is enough to make the gesture.
Unless you poorly auto-translate it, of course!
I also wonder why no one tracked down what free website provided the bad text.
If McDonald's is apologizing, the website should too. For providing text that was not resonating with members of the Hmong community who say it wasn't written the way anyone actually speaks in their language -- apparently the billboards are missing spaces between words and the whole thing is just a garbled mishmash of nonsense that doesn't mean a darn thing.
Though if you look at the article, many Hmong comments were added -- perhaps the bilingual population is significant there. So maybe the gesture will be appreciated, once they get it done properly.... :-)
If you look at Unicode as it is today, it is a hugely complex standard that defines more than 100,000 characters and has defined complex algorithms for using, displaying, sorting, and storing them.
Though there were simpler times, too.
I mean, before the Unicode Collation Algorithm defined in UTS #10, no way to sort the characters was defined.
Every company had their own way to do it themselves -- it's not like Sybase or Oracle or IBM or Microsoft was going to build databases one could query via SQL without defining something useful for the ORDER BY clause to do, after all!
Some of those companies picked up UTS #10 as an option for collations.
And some companies that came along later chose to use it once it was there.
But when one considers the fact that every character needs a position in the DUCET so it can have a place to go in the order of characters, there are two kinds of characters to consider:
The first category is easy -- there was a huge push in the UTS definition to decide the order.
The second category is obviously a bit more complicated -- every set of characters proposed gives some suggestions about the collation of them, and the UTC places them all somewhere based on that feedback, their expertise, and their knowledge of how the UAX and Unicode work.
Bugs are sometimes found; they are fixed in future versions (after they have been identified).
Old versions are left alone; no one wants to break existing behavior, or database indexes....
Now the entire Unicode Standard works that way.
Even the Standard Annexes, like the Unicode Bidirectional Algorithm defined in UAX #9.
Now ordering is needed in the UCA, even when it makes no real sense -- like ordering Emoji or other symbols.
And the UBA has such cases too.
Because if you want to have a formal standard that defines how everything in Unicode behaves in bidirectional contexts, you have to also include that special category of characters.
I refer, of course, to the set of characters that no reasonable human ever expects to use in bidirectional contexts!
Enter these two characters:
These two characters, these two symbols, have the same properties as 0028 and 0029 -- and every other bracket pair that exists in Unicode.
But the Bidi "mirroring" property is not defined for them like it is for every other pair of brackets.
So they never mirror.
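If you want to poke at this yourself, here is a minimal sketch using Python's standard unicodedata module. It only inspects U+0028 and U+0029, since those are the code points named above; the describe helper is a made-up name, and you can feed it any other bracket you are curious about.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, general category, Bidi mirrored flag) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),        # "Ps" = open punctuation, "Pe" = close
        bool(unicodedata.mirrored(ch)),  # True if the Bidi mirroring property is set
    )


# The ordinary parentheses: a bracket pair with the mirroring property set.
paren_info = [describe(ch) for ch in "()"]
for row in paren_info:
    print(row)
```

A pair that reports the bracket categories but False for the last field is exactly the kind of exception being described here.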
Why would they? Their purpose in the standard is to support a legacy character in an East Asian standard, a place where Bidi is, for most of the users and potential users, irrelevant to consider.
Eventually, this issue was discovered, but it was discovered too late.
The attempt to fix it led to problems in actual usage.
No one wanted to break existing usage that way.
And thus, a permanent exception was born.
Every once in a while, someone notices the problem again.
It happened just the other day, in fact.
And the issue was explained yet again. :-)
How bad the problem is depends on how you look at Unicode, and which you think is more important -- the intuitive global behavior of the algorithms, or the realization that when a scenario is not relevant you don't care so much about leaving a case alone....
The other day, Edwin asked me:
What's "the VNI debacle" from your blog?
I assume he was referring to What's the difference between Tiếng Việt, Tiếng Việt, and Tiếng Việt? (other than the obvious, I mean) from earlier this year, where I mentioned:
That second form is the one whose origins are shrouded in Microsoft's hasty cover-up of the VNI debacle, also known as Code Page 1258.
You have to go back in the before time.
The long long ago.
Back when Windows 95 was being built.
The Wikipedia article about VNI talks about it briefly:
VNI vs. Microsoft In the 1990s, Microsoft recognized the potential of VNI's products and incorporated VNI Encoding and VNI Input Method into Windows 95 Vietnamese Edition and MSDN, in use worldwide. Upon Microsoft's unauthorized use of these technologies, VNI took Microsoft to court over the matter. Microsoft settled the case out of court, withdrew the encoding and input method from their entire product line, and developed their own encoding (Windows-1258) and input method. Microsoft's Windows-1258 and the corresponding input method, although virtually unknown, have appeared in every Windows release since Windows 98.
The VNI site says a bit more:
Y-Sa: In 1998, some Vietnamese newspapers wrote that VNI has sued Microsoft for copyright infringement. Would you please provide more information about the “background” of the lawsuit and its progress?
Viet Thanh Ho: Microsoft introduced the Vietnamese Edition of Windows 95 in 1996. In this edition, the name VNI and the VNI accent keyboard layout was included without our permission. We first demanded that Microsoft to remove the name VNI and the related accent keyboard layout from their product. They refused, and we sued them. They subsequently agreed to settle out of court and removed the name VNI and the VNI accent keyboard layout from the Windows Vietnamese Edition.
I know that Phu Do Nguyen was their lawyer.
I know that the code page (1258) is terrible, in part because of the awful, in part because of the terrible pseudo Form V normalization it helps to foist on the Internet.
And I know the Vietnamese keyboard on Windows is terrible because it gives that awful form to everyone who uses it.
I can't help but wish that a better deal had been worked out for the sake of the language and its speakers.
Thus my word: debacle!
The temptation to do a better keyboard via chained dead keys is huge, I'll admit.
But without a clear idea of ideal desired typing and whether the layout should be salvaged, it's hard to plan doing that...
Once upon a time, there were two Win32 functions:
This is how the Windows Shell would decide how to display the time in its clock.
Every locale had a default time format, and a list of potential other time formats they could choose instead.
There was no "short time" format. There was just a "time" format that you could pass a flag to modify if you wanted.
Things worked okay.
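As a rough illustration of the "one time format, modified by a flag" idea, here is a small sketch. The strip_seconds helper is hypothetical and much simpler than anything Windows actually does (it ignores quoted literal text in the pattern, for one thing); it just shows how a seconds-free format can be derived from the one stored format rather than stored separately.

```python
import re


def strip_seconds(pattern: str) -> str:
    """Drop the seconds field, plus a ':' or '.' right before it, from a time pattern.

    Simplifying assumption: the pattern contains no quoted literal text
    with an 's' in it.
    """
    return re.sub(r"[:.]?s+", "", pattern)


# One stored format per locale; the seconds-free variant is computed on demand.
long_time = "h:mm:ss tt"
derived = strip_seconds(long_time)
print(derived)
```

Because the shorter form is computed from the longer one, the two cannot drift apart when someone edits the stored format.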
And then .NET and its DateTimeFormatInfo came.
As .NET was largely meant to replace VB and VBA, it included a LongTimePattern and a ShortTimePattern and actively avoided the notion of "one time format and modification by flags" stuff.
Suddenly everyone started hating that there was no "short time".
And then, starting with .NET 2.0, we had custom cultures.
They became custom locales in Vista and beyond.
And the meaning of TIME_NOSECONDS was retconned to being "use the short time format" now that they had a short time to look at:
So now we are where we are.
The old way of doing it is left to the curiosity of SD logs, and the retcon has been achieved.
But clearly there must be a problem, or I wouldn't be writing a blog or blogs about it.
For the problem, you'll have to wait until tomorrow!