Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
So the question the other day was interesting:
I see that the OOB number field only allows English numerals and doesn’t allow say for example Arabic numerals. Is there a general rule we follow here or is this true only for the OOB number field. The reason I ask is that we have a feature called the ‘storefront’ and they have a text box which takes number of seats as input. Now a user in Saudi Arabia might not know the English numerals. How do we expect the end user experience to be like for those users?
Now if you are a regular reader you may know why I found this interesting.
After all, there is my Suddenly, in a bit more time than a blink of an eye, "standards support" becomes "less i18n support" blog, where I pointed out how IE decided to start moving out of the digit substitution business.
For the ironic point in the question asked above -- since all of the Arabic keyboards we ship depend on digit substitution to show the "Hindi" digits, the answer to the question is easy: the web page probably won't support the notion of any normal user being able to type those other digits, ever.
So technically it isn't a scenario.
I have a colleague who works in this area who mentioned to me the other day:
Digit substitution is maybe a tolerable hack for displaying UI, but it’s definitely bad if you’re creating content.
(And getting worse all the time!)
There you have it. Maybe we're moving away from digit substitution.
It's not like browsers are exactly going out of style, right?
But how come nobody is doing the work to improve support for
to support these non-ASCII digits, if we are truly moving away from digit substitution?
It seems like someone forgot to tag second base here, didn't they?
All I'd like to know is what the future is gonna need to look like. Since people like me may even own a piece of what will contribute to that future, someone needs to be pushing for a plan here -- so that I can do my part to contribute to that future plan...
Perhaps the answer is to do nothing, not give users a good way to type the other types of digits, and then use the lack of documents with the digits as proof that no one is using them.
I'd like to think we're better than that, though....
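For what it is worth, accepting the other digits in a number field is not the hard part once the user has a way to type them. Here is a minimal sketch in C, assuming the input arrives as UTF-8 and assuming the field should take the ARABIC-INDIC DIGITs (U+0660..U+0669) and EXTENDED ARABIC-INDIC DIGITs (U+06F0..U+06F9) alongside the ASCII ones. The name parse_seat_count is made up for the storefront example; it is not anything that ships:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Parse a non-negative count from UTF-8 text, accepting ASCII digits,
 * ARABIC-INDIC DIGITs (U+0660..U+0669) and EXTENDED ARABIC-INDIC DIGITs
 * (U+06F0..U+06F9). Returns 0 on success, -1 on empty input, on any
 * character that is not one of those digits, or on overflow.
 * parse_seat_count is a hypothetical name used only for illustration.
 */
int parse_seat_count(const char *utf8, unsigned long *out)
{
    const unsigned char *p = (const unsigned char *)utf8;
    unsigned long value = 0;
    size_t digits = 0;

    assert(utf8 != NULL && out != NULL);

    while (*p != 0) {
        unsigned d;

        if (*p >= '0' && *p <= '9') {
            d = (unsigned)(*p - '0');
            p += 1;
        } else if (*p == 0xD9 && p[1] >= 0xA0 && p[1] <= 0xA9) {
            d = (unsigned)(p[1] - 0xA0);   /* U+0660..U+0669 */
            p += 2;
        } else if (*p == 0xDB && p[1] >= 0xB0 && p[1] <= 0xB9) {
            d = (unsigned)(p[1] - 0xB0);   /* U+06F0..U+06F9 */
            p += 2;
        } else {
            return -1;                     /* not a digit handled here */
        }

        if (value > (ULONG_MAX - d) / 10)
            return -1;                     /* value * 10 + d would overflow */
        value = value * 10 + d;
        digits++;
    }

    if (digits == 0)
        return -1;
    *out = value;
    return 0;
}
```

Note that this sketch happily accepts a mix of digit sets in one number; whether that should be allowed is a policy choice it does not try to settle.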
The question came in a message to a few of the distribution lists I follow, a few months ago:
Hi,
Our LOC team is hitting an issue where WriteConsole fails with error code 31. The user locale in which this happens is German (the OS Windows Server 2k8). Replacing WriteConsole with WriteFile(stdout) does not fail, but prints garbage. I tried to repro the problem on another machine, by changing the locale to German, but did not hit this issue.
What does this error indicate? There seems to be very little information on the net about Unicode programming with windows consoles.
Regular readers might know what is going on here.
Though the point about the lack of information is valid (ignoring this Blog, of course!).
It turns out the problem was happening with a Japanese system locale on that German machine, which should give the clues for everyone else!
Now when you get down to it, any time the default system OEM code page ends up being
there are going to be some illegal byte combinations.
And as it turns out, the WriteConsoleA function can have trouble if some of the text you send to it is interpreted as if it is in one of these code pages and converted, when in fact it is in some other code page.
If ever there was a time to use WriteConsoleW and to just use Unicode, this would be it - when a mismatch between application language and console code page can leave one with either corrupt text or nothing but an error code.
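To make the "illegal byte combinations" a little more concrete, here is a rough sketch in C that only checks the lead byte/trail byte structure of code page 932 (the Japanese OEM code page). It assumes, purely for illustration, that the German text was encoded in code page 850, where ü is the single byte 0x81, and that the console is treating it as code page 932, where 0x81 is a lead byte. The function names are hypothetical, and the check is structural only: it does not know which lead/trail pairs are actually assigned.

```c
#include <assert.h>
#include <stddef.h>

/* Lead-byte ranges for code page 932: 0x81..0x9F and 0xE0..0xFC. */
static int is_cp932_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Trail-byte ranges for code page 932: 0x40..0x7E and 0x80..0xFC. */
static int is_cp932_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/*
 * Returns the offset of the first lead byte that is not followed by a
 * legal trail byte, or -1 if the buffer is structurally well formed.
 * Every byte that is not a lead byte is treated as a single-byte
 * character, which is a simplifying assumption of this sketch.
 */
long first_illegal_cp932(const unsigned char *bytes, size_t len)
{
    size_t i = 0;

    assert(bytes != NULL || len == 0);

    while (i < len) {
        if (!is_cp932_lead(bytes[i])) {
            i += 1;                 /* single-byte character */
        } else if (i + 1 < len && is_cp932_trail(bytes[i + 1])) {
            i += 2;                 /* well-formed double-byte pair */
        } else {
            return (long)i;         /* lead byte with no legal trail byte */
        }
    }
    return -1;
}
```

Under those assumptions, the code page 850 bytes for "Menü" (4D 65 6E 81) end on a dangling lead byte at offset 3, while "für" (66 81 72) passes the structural check only because the ü and the r get swallowed together as one double-byte character. That is the corrupt text case rather than the error case, and neither happens if the text goes to WriteConsoleW as Unicode.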
This blog is about two issues -- one that affects me much more personally than the other.
Disclaimer: I work for Microsoft, the company behind Bing. And I did once interview with Google (the company that finally paid my travel expenses for the interview!).
I'll start with the one that affects me more, since I have been assured by someone I trust that [at least here in the Blog] it is all about me....
Previously, in Once upon a time, you could earn my loyalty, for up to 24 hours at a time, I pointed out I had mostly moved to Bing finally. And that the one thing I still used Google for was its integration with bus routes.
Not too long after that, Bing picked up the bus route schedule stuff too, and now Google's one place for me was the built-in app on the Palm Pre that worked better than Bing via the web on that ultra-small form factor (Google on the web would have sucked too, but they did have the app).
So anyway, all was well and good.
Almost.
In actuality, all three Bus Schedule Integration solutions (Bing Maps via the web, Google via the web, and Google via the Palm Pre app) sucked in one specific way, a way that the King County Metro Trip Planner didn't suck at all.
With these various integrated solutions, there is no option to request an accessible trip.
This hurts me in two specific ways:
Yet either one includes lots of other options, so omitting these other choices (especially the accessible trip option but all three are useful if one has a mobility handicap!) really, really sucks.
Summary: Both Google and Bing suck in this regard. And they SUCK HUGE.
Okay, now there is the other problem.
And that problem is that if you ignore features 4, 5, and 6 of the King County Metro Trip Planner, both Google and Bing are only as good as the King County Metro Trip Planner.
And no better.
Yet there is the service provided by sites like One Bus Away, which give live tracking info of where the buses actually are (Raymond mentioned it a while back, here).
These Bing/Google apps do nothing other than tell you where the buses ought to be if they follow the schedule. But in a city that spends a lot of its time running 1-20 minutes late (and occasionally 10 minutes early, as I have learned the hard way!), the data of what is rather than what might be in theory is pretty much indispensable.
Now One Bus Away is I believe still largely based on mybus.org, and thus like with the first problem they could work to integrate the data here if they so chose -- either with the source or with One Bus Away.
King County Metro has a Tracker, too. So there are a lot of options.
Integration can be a complicated problem, but at a minimum a special addition of the live data that could let you know if the suggested route is doomed would be nice. Taking it a step further to suggest timely alternatives would be more difficult given limits on what is available, but taking the time to integrate just the data here would be really nice.
There is at least a mitigation here: I can run both one of the integrated apps/sites and One Bus Away (which even has a Bus Tracker app on the Palm Pre!), or in my case all three since I am forced to also use the King County Metro Trip Planner site. But despite the Palm Pre's WebOS being able to multitask, I find the Google app to be pretty flaky about memory usage, which can make running both of them, let alone all three of them, hard.
Summary: Yet again, both Bing and Google suck in their bus integration, in this case Google a bit more than Bing.
So why do both Google and Bing let me down here? I'm not sure, but both of them have a very high suckage factor, especially in their refusal to support the built-in information that their source provides on accessibility.
Maybe they both could do a little something here. If they manage to do so and thereby stop sucking, I'll mention it.
And for now I'll use three sites, since no one site has proven able to do the job.
So, in some recent blogs like When your Clipboard isn't their Clipboard. Or my Clipboard... and Microsoft is better as one big company, I have gone to some great lengths to contrast the roles, some of the strengths, and even some of the weaknesses of different parts of Microsoft, focusing on Windows and DEVDIV especially.
Office had a bit part in the drama here, but I mainly was talking about the stuff it did, that it created.
I never got into motivations that drove them -- it was all about what they did, not why.
Today I'm going to get into that a bit.
I'm going to start with Pi. You know, π.
For most general purposes, you can use 22/7, or 3.14159. You don't need to do 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679 or longer (like writing code to get it to 1,000,000 places for pages like this one).
It is hard to escape the fact that for most uses even 22/7 will do.
And generally speaking that's where Office places itself. In being good enough for people to get the job done. Thus you won't see nearly as much excitement over a need to do every possible theoretical thing within a feature as you will over the need to fit the scenario.
Now Windows is more of a platform, and as much as we like to focus on specific scenarios in the spec, we know that you can't design device driver stacks or window messaging semantics or protocol implementations or file systems on just a few scenarios. They have to work a hell of a lot more broadly than that.
This is how things happen like the feature I described in Oh (Saka to me, Saka to me, Saka to me, Saka to me) Whoa Babe (Just a little bit) A little respect (just a little bit). That feature is woefully incomplete from the standpoint of Windows. Terribly so. Though it is "good enough" for one part of Office to use.
Thus when I originally wrote Address formats are hard, let's go shopping!, I explained why a part of Office felt like they could jump into the whole address format space even if the job wasn't complete, while at the same time Windows wasn't willing to try to pick up a feature like that, since there weren't really the resources or time or staffing to take it on and to do the job right.
But once again the Office feature is good enough -- for Office, and the scenarios they thought would be nice to light up.
Now I don't want to say that Windows is perfect and Office is crap just because they focus so often on being good enough.
I mean, the reason the Windows calendar story is so awful is that no one wants to make incremental improvements -- they want "the right fix". And since we lack resources for that, no fix ever happens. The Windows insistence on handling the full end-to-end job for a feature or an area causes more than one feature to be absent when the product ships. And there is no virtue in that.
Now with all that said, the massive migration of people from Office to Windows that started a few years ago has had some impact here, particularly in the upper levels of the UI. They talk more about scenarios and such and solve problems perhaps less completely but also perhaps more fully for a narrower number of cases. But the deeper you go the less such a philosophy can penetrate....
Anyway, I thought of this when I saw the following issue in my email inbox on Friday:
The Unicode Consortium has posted a new issue for public review and comment. Details are on the following web page:
http://www.unicode.org/review/pri180/
Review period for the new item closes on April 11, 2011.
Please see the page for links to discussion and relevant documents. Briefly, the new issue is:
Issue #180 Proposed addition of address form metadata to Unicode CLDR
The Unicode Consortium is considering the addition to CLDR of address form metadata. This metadata is intended for presenting a form for users to fill in with address data. The format and data is being donated by Google. The consortium is soliciting feedback on these changes. Feedback should be submitted as comments to http://unicode.org/cldr/trac/ticket/3572.
The detailed proposal and background information can be found at this link: http://www.unicode.org/review/pri180/
If you have comments for official consideration, please follow the instructions and the link in the text of the PRI.
My comment?
Well, I wish them well.
A mondo address database will keep Unicode and the CLDR folks busy for years and is a very natural extension to the whole CLDR project.
But I've seen the data they are talking about.
It's fuzzy, and it's incomplete. It's poorly scoped, and the real usages are not well captured by the data and thus for many cases what it provides will never be better than just letting people address their letters their own way.
And maybe that's "good enough" for some people.
To me, it's like a weather prediction algorithm that can accurately forecast the weather three days in advance, and whose only flaw is that it takes three days to run. And thus in the end has results comparable to looking out the window.
Perhaps it's my [non-Office] bias showing, the bias that caused me to be one of the group to turn down the similar offer/request years before, but a mound of data of uncertain value and high expectations is not always a good thing. Even if it's a huge horking mound of data.
YMMV, but to me "good enough" really isn't good enough....
Mark Davis noticed something amiss:
As I recall, ISO said that they were not going to reuse codes for 50 years. But BQ has just been reused, and I don't think it has been 50 years since it had the old meaning.
http://www.iso.org/iso/pressrelease.htm?refid=Ref1383
http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2#Deleted_codes
Mark
Some might remember the crazy time of the "CS which was Czechoslovakia and then Serbia and Montenegro" reassignment debacle.
The idea of "we won't reuse for 50 years" sounded vaguely familiar to me, too.
Though to be honest it sounded way too easy for the rules of an ISO standard!
And indeed it would have been. If you look at the second article about the ISO-3166-1 alpha-2 codes, you will see a framework worthy of an ISO standard!
Basically, there are some "reserved" elements:
Reserved code elements are codes which have become obsolete, or are required in order to enable a particular user application of the standard but do not qualify for inclusion in ISO 3166-1. To avoid transitional application problems and to aid users who require specific additional code elements for the functioning of their coding systems, the ISO 3166/MA, when justified, reserves these codes which it undertakes not to use for other than specified purposes during a limited or indeterminate period of time. The reserved alpha-2 codes can be divided into the following four categories: exceptional reservations, transitional reservations, indeterminate reservations, and codes currently agreed not to use.
So there you go, four categories of reserved:
And of course a mechanism to represent formerly deleted codes which may then be reassigned in ISO-3166-3.
So that the historical info is held onto in a way that they believe is acceptable even though it freaks all of us out a bit.
Like I said, worthy of an ISO standard!
A little of the CS mess is covered by me in blogs like My aren't we looking quite Bosnianesque? and {Insert a pun about the word Serbian here, I can't think of one}. I am sure there are people in Microsoft who wondered what they got themselves into by using RFC 1766/RFC 3066/BCP-47 tags for their locale names. Both for self-caused CHS/CHT vs hant/hans issues and for ISO-caused problems like the CS debacle.
One day we may even miss LCIDs (some people aren't even aware that LCIDs suck, yet!)....
It was a while back when Emkay wrote in the Suggestion Box:
Making a case for a new topic - you are probably aware of this issue, but just in case, I explain it below.
You may want to explore how the "Nastaliq club" with mostly Urdu - but probably other langues too - avoids the use of Text world - (Unicode or any darn code!) for serious work - despite all the inconveniences (not sure if you are aware of it or not). Surely, there is the "textual" web in Urdu, but for any serious work, people exchange image files in Nastaliq. If you want evidence, just go to your favorite search engine link for images (e.g. http://www.bing.com/images OR http://images.google.com) and type "Urdu Poetry" as your search string. Facebook is full of people exchanging images.
This world uses specialized word processors like InPage. An urdu newspaper like the Urdu version of Daily Express from Lahore is published with images - http://www.express.com.pk/- you click on the story image in the epaper and out pops an image with text - in Nastaliq.
(Off topic: I only recently discovered your blog - despite being interested in issues you cover - and I love the blog so far! Its great to hear from people involved!)
Let me say that there is a lot of truth in Emkay's words here.
There are many different calligraphic writing styles for the Arabic script, such as:
and so on. There are many others.
For the Arabic language, Naskh is probably the most common today. But although so many of these various script forms can be used interchangeably to represent the same text, the so-called Perso-Arabic form of the Arabic script is specifically distinguished, and its users most commonly use the old Taliq or the Nastaliq style.
Originally developed in Iran, it is probably much more seriously associated with Urdu, a language many of whose speakers really tend to dislike the appearance of Naskh. For computers, the most common program is InPage (as Emkay mentioned), but InPage is not using Unicode for the content, and thus the most common way to pass the poetry along is indeed to pass image files.
As I imagine you can all imagine, this is a really non-optimal way to store text -- I mean copy/paste and search are pretty much out of the mix entirely. But it has been many years and despite a lot of different attempts we have yet to see any widely used Unicode conformant Nastaliq font....
In theory it is doable, but in practice the attempts so far have all fallen short in approaching the beautiful complexity that is Nastaliq.....
I'll be honest and say that the feelings here are very much outside of my experience. For Latin script I feel none of the same attachments to various calligraphic forms. I don't know what that feels like, to feel that strongly about a font style.
Well, except for Comic Fixed, I mean.... :-)
Nastaliq as a tradition though is fascinating. I eagerly await the day that the company Unicodizes its font being used in InPage or some other company steps up to solve this problem.
Over in the Suggestion Box, Marcus B. asked:
I currently have to delvelop an on-screen-keyboard at work. I wonder how the windows osk knows what keyboard type to display. I mean not the keyboard layout but the location and size of the return key and the surrounding keys. Is there any win API and where is this kind of information stored or is that a hardcoded feature for the windows osk?
Thank you :)
Sometimes the quickest answers are the most disappointing ones. :-(
There is no function one can call to tell you what layout to use for the emulation of the physical keys.
Generally there are just a few basic general types of layouts, and to solve this problem one would keep a list matching the KLID of the keyboard with the layout type. And of course have a general choice to use when an unknown keyboard comes along....
For hardware itself it is kind of tied up in the use of GetKeyboardType, as I mentioned in Had I known that my last release would be *the* last release..., aka hindsight is 2020. Though as I mentioned in that very blog, despite the fact that there is a clear relationship between each keyboard layout DLL and the keyboard type information, this data is not directly exposed.
You can actually see it in the KBDTABLES structure defined in kbd.h, added to the end of the structure in Windows XP:
    ...
    #if (NTDDI_VERSION >= NTDDI_WINXP)
        /*
         * Type and subtype. These are optional.
         */
        DWORD      dwType;     // Keyboard Type
        DWORD      dwSubType;  // Keyboard SubType: may contain OemId
    #endif
    } KBDTABLES, *KBD_LONG_POINTER PKBDTABLES;
Now in theory this means one could write a function to call the KbdLayerDescriptor function exported by the layout DLL (as previously described in I know that header file is around here somewhere) and get the dwType from there. Then, by spending some time in the header file and with that Had I known that my last release would be *the* last release..., aka hindsight is 2020 blog, one could have a simple mapping of intended keyboard types for each layout and the way the keys themselves are laid out.
However, since the tools (kbdtool.exe and kbdutool.exe) that are used to build the keyboard layout DLLs do not expose a direct method you can use to set these values, it is unclear how useful the data will be. The heuristics that are used to set the values are deterministic, but they are not documented....
I would deem the risk of it ever changing to be exceedingly low, since such a change could break all the existing layouts; given this it may be reasonably safe for someone to try to reverse engineer it if they wanted to. Though I had a tough time coming up with a compelling reason to want to. YMMV!
In the end you're better off just storing a table, like I said. But that looked like way too short of a blog, and I knew someone would ask about GetKeyboardType....
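For the table approach, something along these lines would do. It is only a sketch: the OskShape families and the three entries are illustrative assumptions of mine, not data that Windows exposes, and osk_shape_for_klid is a made-up name. The KLID is taken as the string one would get back from GetKeyboardLayoutName.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical physical layout families an on-screen keyboard might draw. */
typedef enum {
    OSK_SHAPE_ANSI,   /* wide single-row Enter key */
    OSK_SHAPE_ISO,    /* tall Enter key, extra key beside the left Shift */
    OSK_SHAPE_JIS     /* Japanese hardware layout with its additional keys */
} OskShape;

typedef struct {
    const char *klid;   /* KLID string such as "00000409" */
    OskShape    shape;  /* which physical layout to emulate */
} OskShapeEntry;

/* Illustrative entries only; a real table would be maintained by hand. */
static const OskShapeEntry kShapeTable[] = {
    { "00000409", OSK_SHAPE_ANSI },  /* US */
    { "00000407", OSK_SHAPE_ISO  },  /* German */
    { "00000411", OSK_SHAPE_JIS  },  /* Japanese */
};

/* Look up the shape for a KLID, with a general choice for unknown ones. */
OskShape osk_shape_for_klid(const char *klid)
{
    size_t i;

    assert(klid != NULL);

    for (i = 0; i < sizeof kShapeTable / sizeof kShapeTable[0]; i++) {
        if (strcmp(kShapeTable[i].klid, klid) == 0)
            return kShapeTable[i].shape;
    }
    return OSK_SHAPE_ANSI;  /* fallback when an unknown keyboard comes along */
}
```

The exact string comparison is enough for these entries, but KLIDs that contain hex letters would probably want a case-insensitive match; the sketch leaves that out.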
And in a bizarre turn of events it was a question that had very little to do with internationalization that caught my eye:
I saw there are two Clipboard in System.Windows and System.Windows.Forms respectively. Can someone help me to understand what’s the difference between these two and what’s the guidelines for when to use each?
Also, why we have two class that has nearly the same functionality, which is very confusing, at least for me.
Thanks,
Too true!
They actually come from two different places (System.Windows.Forms.Clipboard is in WinForms housed in System.Windows.Forms.dll, and System.Windows.Clipboard is in Windows Presentation Foundation (WPF) housed in PresentationCore.dll).
Admittedly they are pretty similar (both are wrapping the same native functionality that has been in Windows for the last fifteen versions, so how different could they really be?).
But they are not entirely compatible since they are different types (in particular, their DataObject and IDataObject definitions make drag-and-drop in shared components that span the two technologies a lot more complicated than it ought to be).
As my friend Paul pointed out in the thread, the WPF version was added for a simple reason -- so that a WPF application wouldn't have to load WinForms just to copy and paste.
Of course this points to a design problem that could have been avoided in the original .Net release, and one that could have been fixed in the version introduced in 3.5 when WPF was added: the clipboard functionality is not properly factored!
Now as you all know, the clipboard works throughout Windows -- and when I say this I mean that a technology completely unlike either of the above two technologies -- the console -- can also use the clipboard. It even has its own managed wrapper (System.Console), though if you want to use the clipboard you have to pull in either WinForms or WPF.
And that is dumb.
Or go native.
But the WinForms/Console split was understood when .Net 1.0 shipped, and the WPF/WinForms/Console splits were understood when .Net 3.5 shipped.
At any point they could have put the code at a lower level, one all could share. And if one or more were already written they could have gutted the written ones and called the lower level shared one.
It would have been easy.
And after all, they've already done it.
Twice!
Though when the suggestions along the above lines are made, they point out all the excuses why not to do the work, such as:
and so on.
I'm not impressed.
Even ignoring the whole Microsoft.Win32 namespace that would be perfect for a Clipboard class!
For #4 and #5 and pretty much all of them, .Net could have made their own brand new thing instead of wrapping the old one if they feel it is so limited. They could drive the change, and be the change! They ship with Windows so anything they add would be around for all of Windows too, in no worse of a way than the Windows "only latest version" provides, and probably better since they can go downlevel.
In other words, DEVDIV is much better than Windows to jump in here and do it right, if they are convinced Windows is doing it wrong....
Oh wait, never mind. As I pointed out in Microsoft is better as one big company, architecting new features like that is not what DEVDIV does. So of course they want the thing they wrap to do the heavy lifting.
It's like they shot their wad when they designed the memory allocator to get rid of ref-counting and a few other things. :-)
The above was me teasing them a bit, so don't read too much into it. Though it's fun when other people prove the points I make without even realizing they are doing so!
So never mind on pointing out they are perhaps the best people to do it and get Windows to eventually pick it up like they do with Office. They don't want to think outside that box. And I can respect that.
To be perfectly honest trying to add new features to the clipboard is hard -- Office tried to do it a few versions ago, and have at this point almost completely removed the feature they added since the number of people with "this is great!" feedback were overwhelmingly outnumbered by the people with "this blows goats!" feedback. Even now pieces of it are still there (it's hard to remove low level functionality once it's surfaced to top level UI), but in Office 14 it is mostly history.
So with Office having struck out here I wouldn't suggest DEVDIV try to hit that one out of the park when they are at bat.
But perhaps fixing what is broken in their own implementation:
would fit the mission that .Net has to do quality wrapping of functionality people use all the time. It would ease the maintenance burden and make it easier for developers to use.
As an aside, there are other features that got stuck in WinForms that should have been in Microsoft.Win32 -- like InputLanguage, for example.
It's like WinForms had a bunch of extra devs who did core work that had to be done, and
were all napping when such classes were stuck in the wrong place, and in the case of InputLanguage with incomplete functionality based on the abilities of what it's wrapping.
But I digress.... :-)
I really believe .Net did some stuff wrong here and really should take steps to fix things. I know I'm not the only one to do clipboard stuff in Win32 p/invoke just to avoid some of the weirdnesses the managed implementations add to the mix. And that can't be what they intended.
They are not the group that you generally expect poor factoring or shoddy design from. At all. They really ought to fix this!
In my mind I blame this all on the fact that Katy and Kim aren't at Microsoft anymore, and Joe isn't in that group anymore. Which is silly since none of them were working on these things anyway so it's not like having them around would change anything. Well, I mean other than the fact that all of them would listen to me rant about it, and be amused.
Though I am having lunch with Kim later today. Maybe I can guilt her into paying for lunch for not speaking up about this bug while she was here? Just kidding!
The moral of the whole story in this blog? I should clearly stick to internationalization, where I have strong opinions.... :-)
Today's late blog posting was to wait for the actual release. The Rapunzelian pun in the title will only be completely effective if you pronounce CHAR as "care" (ref How is that pronounced, exactly?).
Perhaps, like me, you have some interest in the languages of India.
Perhaps you have visited.
Perhaps you were born there, or your family was.
Or perhaps you live there right now.
Either way, if you read my If it was me, I'd say "I love the Rupee, and it was my idea!" last July or Unicode 6.0.0 is [virtually] released! last October or Unicode 6.0.0 is [actually] released! last month, you may have been eagerly waiting to find out when Microsoft would be responding to and providing for support of the new Rupee symbol.
Well, I can tell you now, it is available for download, right now!
The details on how to get it for the various 32-bit and 64-bit versions on the x86, x64, and IA64 architectures can be found in
KB 2496898: An update to support the new currency symbol for the Indian Rupee in Windows Vista, in Windows Server 2008, in Windows 7 and in Windows Server 2008 R2
and you can install it now from the Download Center!
Anticipating a question: there is no update for Windows 2000, XP, or Server 2003, as all are out of support now. If you are running one of them you are on your own for rupee support....
The update includes several important pieces:
And in addition to telling you all of that, I thought I'd point out some interesting info about the package as well!
Some people may be surprised that the locale data was updated, as this is typically not something that has been done in the past. But there are two reasons that the update was done:
There may be a few folks who had to deal with Euro updates from years ago that didn't have this capability. I guess all I can say is that With massive bugs come huge opportunities. :-)
The updated fonts are limited to:
because these fonts are so often the fallbacks and they so often include currency symbols -- discussed before in my Falling back shouldn't mean falling over (though perhaps it does, a bit) -- this saves the patch from having to include hundreds of fonts that might have to see updates in a major release (this way they got away with just 17 files).
And finally there is the fascinating story of the keyboards.
The new en-IN keyboard has the rupee right at ALTGr+4, as people might want, though that comes at a price: on the English keyboard one can use either ALT key to get to the ALT shift state and to accelerator keys, but on this keyboard the right ALT key is no longer an ALT key: it's ALTGr. I spent a little time using it and it tripped me up, so it is just something to keep in mind....
Also, for the other keyboards, many of them already had assignments to the ALTGr+4 key (basically all of the numbers were in the ALTGr state), so the rupee could not be added there.
After a lot of discussion, the consistent placement at CTRL+SHIFT+4 was chosen (CTRL+SHIFT+1 and CTRL+SHIFT+2 already have ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER). And that same key combination was added to the en-IN keyboard just to have that one point in consistency (though that led to six separate emails to me asking about why the rupee was in two places on the new keyboard layout; I suppose I should be happy people were looking!).
Now there are a few things that weren't updated -- like the data behind GetStringTypeW, the collation data, the name that would show up in Character Map, and so forth -- there is really only so much that can be done in between releases, so there are a few things that will stay missing for the time being....
But there you have it -- the Rupee support has finally arrived!
When I think about the fact that the symbol was chosen just eight months ago, the fact that it has made it through the addition/update processes of Unicode, an ISO standard, and four versions of Windows -- all of which can often take years to see even more typical or ordinary changes -- is nothing short of amazing....
So, the last few blogs were an interesting series of contrasts.
On Friday in It is totally awesome if they yell "Løp av sted!" in the Norwegian dub of Monty Python and the Holy Grail, I covered an IE9 localization bug found during the beta. I explained why it was interesting and perhaps even amusing. I also pointed out some of the interesting lessons one can learn about the localization process from it.
Then on Saturday in Lack of confidence in a feature can keep me from installing it. Oh yeah, a BSOD can, too., I covered a Windows 7 SP1 bug in the most negative possible terms and suggested how completely destructive the bug itself, and even more the poor description of it, might prove to be for the usage of MUI and language support.
Both are bugs, but obviously they inspired two completely different reactions from me (and from other people).
Just to be clear on the fact that there is indeed a middle ground, I thought I'd point out a third bug.
Now this one comes from my Norwegian friend and colleague Kim. I'll have to write some day about a few of the more appropriate conversations he and I have had as many are quite interesting. We'll stay away from the Danish topics, which are akin to the conversations I have with my Slovak friend/colleague Zuzka about Slovenian -- neither is the best fodder for blogs.
But there are other conversations, like the one about this bug I'm going to talk about!
Here is the timeline:
2010
Kim installs nb-NO Windows 7.
He mentioned to me that he always does this, and has come to feel quite comfortable with the Norwegian localization quality. On a few occasions he has even pointed out how much it has improved over the years.
January[ish] 2011
Kim installs nb-NO Internet Explorer 9 Beta.
This is how he noticed the bug I mentioned in It is totally awesome if they yell "Løp av sted!" in the Norwegian dub of Monty Python and the Holy Grail, actually. Dogfooding FTW! :-)
February[ish] 2011
Kim installs nn-NO Windows 7 LIP.
I have mentioned the Nynorsk LIP previously in Nordic duck duck goose -- Bokmål, Bokmål, Bokmål, Nynorsk!. I kind of goaded Kim to install Nynorsk, after he mentioned to me an interesting fact: that although he has been reading and writing Bokmål all his life, when he speaks Norwegian it is in many ways more Nynorsk-ian. And that really he is hardly alone here -- most of the country hovers between the two outer markers that these two official dialects represent, and people do understand each other. I convinced him to install it nearly 11 months after it was released (I actually started in January 2011), arguing that he might find it really interesting, certainly enough to be able to give some thoughts on the experience.
A month later he proved that it was quite worthwhile, in fact!
March 2011
Kim tries to install nb-NO Internet Explorer RTW.
The result was the following dialog:
Translation:
Wrong version of Internet Explorer Setup
This version of IE does not support the current Windows language version. Go to [URL] to download the Internet Explorer version for your Windows language version.
Clearly, all of the people who poo-poo'ed claims that IE was an integral part of the operating system failed to consider how easy and natural it is to develop the two with assumptions about language support. The notion that the lack of support of a single language in IE would lead to problems understanding the install experience is something that is (in retrospect) an obvious problem with an even more obvious solution.
Though of course this is harder to contemplate right after you ship!
The failure is easier to recover from and the workaround here is much easier than the bug in Lack of confidence in a feature can keep me from installing it. Oh yeah, a BSOD can, too. because there is no crash and nothing needs to be uninstalled. You just have to change the UI language temporarily from nn-NO to nb-NO and that's it.
Much better, all the way around.
So even though it's not great, it doesn't feel nearly as bad to me...
In the long run, a change to a slightly more flexible design that does the following would probably be best
This bug is clearly somewhere in between the extremes of the blogs from Friday and Saturday, though probably closer in spirit to the Friday bug....
A long time ago, I started a series, with two parts:
I didn't continue the series, though I think maybe I should have. At some point, I imagine I will.
Those kinds of issues aren't "Hug a Localizer Day" sappy points, they really do make for a better product.
But today I am going to talk more about the consequences of software engineers not extending the importance of good engineering to include proper localizability at a far more basic level.
I mean, bad translations are a bad experience, to be sure. And I don't want to minimize that.
But when I consider issues like the bug described in
I almost want to tear my teeth out.
Do you recall all that rhetoric about most blue screens being due to third party drivers? If I were a third party driver, I would keep this link on speed dial to send every time someone made that claim again. Because this one is all Microsoft, and proof that no one can blue screen like the big dog can.....
The article doesn't pull its punches, though it is kind of sparse on the information. An excerpt:
After installing Windows 7 Service Pack 1 (SP1), you might receive the following error message on a blue screen:
"Error C000009A applying update operation {###} of {###} (\Registry...)"
To resolve this issue, restore your computer to a point in time before you installed Windows 7 SP1, uninstall any unused language packs, and then reinstall SP1. To restore your computer to a previous point in time, you'll need to use the System Recovery Options menu.
Lots of questions raised there. Like for example:
After the article goes through the frightening post blue-screen steps and into system restore and then trying to reinstall SP1, it continues:
After you reinstall SP1, if you still see the error message, follow the steps again to restore your computer to a previous point in time, and then uninstall more language packs.
Now let's ignore the fact that many of the most common MUI scenarios involve machines where you don't really know much about other languages or whether/when/how they are installed.
And thus let's also ignore the nightmare helpdesk scenario here -- like what to do if you are an IT person and multiple machines pop up with this error.
In those nightmare scenarios, the IT person will be cussing at Windows 7 as if it were Vista until sailors walk out, embarrassed at all the bad language.
But even in the more ordinary user scenario, the engineering behind language setup, configuration and switching will be very much associated with user pain.
And past that, the most important of all the unanswered questions:
Okay, that last question is important to me. Because even if no one else feels it or admits to feeling it, I am feeling some pain here. And although I'm not empowered by Microsoft to apologize on behalf of the company, let me take this opportunity to apologize to anyone who hits this bug. Because you deserved better. Further, I will apologize for that article, too. I (by which I mean we) ought to know better.
Now the General guidance before installing Service Pack 1 for Windows 7 and Windows 2008 R2 article that points to the one above has a little more information:
Remove any unnecessary language packs from your system. We’ve seen a lot of issues where customers have installed all of the language packs and are hitting issues installing the service pack because they run out of resources on their machine. Long story short, if you don’t need the language pack installed, remove it.
I would love to see metrics on how many of the people who had a Language Pack or a Language Interface Pack installed (and who had to uninstall it for this bug) will choose to reinstall it after their system is back up and running.
It almost reminds me of a service pack install version of the Vista bug I described in Install your language packs like you put on your socks -- ONE AT A TIME!. I guess maybe they never got around to fixing it. Sigh....
I myself am going to uninstall every single language interface pack and language pack on my machines before I try to install SP1. Some may get reinstalled, sure. But not as many and not as quickly. And not as often in the future (given the anticipated impact on a future service pack).
Not until someone provides some actual answers, and inspires confidence in the MUI feature again.
That lack of confidence is a blocking issue, to me. Your mileage may vary....
Sometimes the best bugs are the ones we find before we ship the product.
Because if we shipped it then it is out there in the wild causing problems. It may be interesting or intriguing, sure. It may be embarrassing. But unless you are an alt.i.hate.microsoft deacon then it won't be as truly funny.
However, the bugs found prior to ship, they can be pretty freaking hilarious.
I mean who doesn't look at this screenshot from Does bear sh*t in woods^H^H^H^H^HSlovenia? from the beta Slovenian LIP where Sleep and Hibernate were both translated into the same word:
and find it at least a little bit funny?
Or this beta splash screen from Windows 2000 where they mirrored the logo bitmap:
Whoopsie!
But you see? It's fun because they were found before it shipped and we learned valuable lessons about localizability and localization!
Anyway, my Norwegian friend Kim told me about another such bug just recently, this one in the pre-release Norwegian IE9.
The story is a great tale (some names removed to protect the protected):
I had an issue with ██████████, in that some lines just decided to go blank (who doesn't love ████████...). Turns out there's a patch you need to accept (not fully tested by MS, etc.), and you get a temp password.
I download the patch, and try to execute it. IE9 & SmartScreen Filtering is very suspicious and try to tell me it might be fishy. Looks like SmartScreen has some logic that can detect how often something is downloaded. Anyway, it tells me that this particular patch/update is not downloaded very often and not signed by the author. It continues to warn me that I should delete this program if I don't know for sure whether it's good.
All good so far, and at least I get two options:
The two options are "Delete (recommended)" and "Run away".
The translator (who in this case is an ex-colleague of mine, which I know very very well :) misread "Run anyway" as "Run away".
And our colleague Claus pointed out, in addition to the hilarious bug (which was fixed prior to ship!), the lessons that can be gleaned from it:
it is an excellent example of how important it is to localize in context, and even when you do localize it as such, you preview and ask the question – does what I wrote really make sense? Then look at the source string again to compare – Run Anyway
Of course we can't really get fully past the image that Run Away brings to many people.
That of the Killer Bunny from Monty Python and the Holy Grail:
Run Away!!!
Luckily the QA process provides the appropriate Holy Hand Grenade of Antioch so we don't have to Run Away!
So does anyone who watched the Norwegian dubbed or subtitled version of the movie know if they yelled Løp av sted! in that scene? Because if they did that is totally awesome. :-)
As are funny bugs like this that get found prior to ship!
So a question came up the other day.
Someone was wondering why the following two functions:
RtlIntegerToUnicodeString
RtlUnicodeStringToInteger
both list an IRQL level in the docs of PASSIVE_LEVEL.
Now I've randomly talked about IRQL from time to time in blogs like A difference that makes no difference, makes no difference (aka IRQL <= APC_LEVEL vs IRQL < DISPATCH_LEVEL) but these days I don't have much that I need to do down there so I don't talk about it much.
Now you could think of these functions as the Rtl* versions of _itow and _wtoi. Since that's all they really are.
And in fact that is kind of tied up in why they cannot run at DISPATCH_LEVEL. Because at one time (back when those CRT functions were first done), they both moved stuff in and out of their non-Unicode counterpart functions. RtlIntegerToUnicodeString and RtlUnicodeStringToInteger simply inherited this implementation (which may no longer be there in the CRT now?), and they depend on RtlAnsiStringToUnicodeString and RtlUnicodeStringToAnsiString, neither of which can be run at DISPATCH_LEVEL since either the code or the tables the code uses may not be paged in at the time.
The other, related functions that do the same conversions to IntPtr and such have the same issue.
Now strictly speaking, this could be fixed.
Neither RtlIntegerToUnicodeString nor RtlUnicodeStringToInteger is doing anything all that complicated such that implementing them straight in Unicode would be a terrible speed or size burden.
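To give a feel for how little is involved, here is a minimal sketch of an integer-to-string conversion that writes UTF-16 code units directly, with no ANSI step and no table lookups beyond a 16-character digit string. The name int_to_utf16, its signature and its error convention are all made up for illustration; this is not the Rtl* implementation and does not try to mirror its exact contract.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): writes value in the given base
   (2, 8, 10 or 16) as NUL-terminated UTF-16 code units, without ever
   building an ANSI string first.
   Returns the number of code units written, excluding the terminator,
   or 0 if the base is unsupported or the buffer is too small. */
size_t int_to_utf16(uint32_t value, unsigned base,
                    uint16_t *out, size_t out_len)
{
    static const char digits[] = "0123456789ABCDEF";
    uint16_t tmp[32];   /* 32 binary digits is the worst case for 32 bits */
    size_t n = 0;

    if (base != 2 && base != 8 && base != 10 && base != 16)
        return 0;

    /* Produce the digits least-significant first. */
    do {
        tmp[n++] = (uint16_t)digits[value % base];
        value /= base;
    } while (value != 0);

    if (out == NULL || out_len < n + 1)
        return 0;

    /* Reverse them into the caller's buffer and terminate. */
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = 0;
    return n;
}
```

Everything here lives on the stack or in a tiny constant, which is the point: a version written this way would have no reason to depend on the ANSI conversion routines or their tables.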
Though since drivers often have to target multiple versions and it would be weird to have IRQL info documented as being different in different versions, no one is really clamoring to fix the issue in the more conventional way (create new functions).
To be honest I would put both in a header file under a special define so the fix would optionally be available in any version of Windows. And just fix it once and for all.
But again that lack of clamor would hardly make people want to run out and do a bunch of work here.
Maybe in the scheme of things, these small "corruptions" in the model don't really matter.
Perhaps kernel level and device driver level engineers don't think of functions that only run at PASSIVE_LEVEL as being flawed, even if (in cases like this one) they could run at higher levels, if only they were designed to do so. Designed better.
Maybe they wouldn't even judge this in terms of better or not better (though the purely Unicode function would be both safer and faster, which makes it likely that all things being equal they would pick the purely Unicode function if they had the choice).
The impact with all these kinds of examples (and many others like them!) is little pockets of non-Unicode that will forever hold us back from being fully Unicode, even if you the third party developer move completely to Unicode....
Breaking changes suck.
Even when the thing broken is of limited use and serves limited purpose.
The suckage still exists.
Anyway, the report was a clear enough indictment:
In .NET 2 it used to be that for neutral cultures CultureInfo.KeyboardLayoutId would return some specific culture. For example, new GetCultureInfo(“en”).KeyboardLayoutId would be 1033, while LCID is 9. Same for other neutral cultures. Not anymore in .NET 4: KeyboardLayoutId is 9 in the case above.
I cannot find any information regarding the reasons for this change. The article http://msdn.microsoft.com/en-us/netframework/dd890508.aspx claims the opposite of what I observed, saying that now we’ll get specific values instead of neutral. In reality, we used to get that with .NET 2, now we get neutral values for everything.
Being more specific!
Previous releases of the .NET Framework throw an exception if applications try to access some of the neutral culture properties such as the CultureInfo.DateTimeFormat.FirstDayOfWeek property. In .NET Framework 4, all neutral culture properties will return values which will come from the specific culture which is most dominant for that neutral culture. For example, French neutral locale will retrieve the values of most of its properties from French (France). The CultureInfo.DateTimeFormat.FirstDayOfWeek property would return Monday for French which maps to the value in the French (France) culture.
Some properties will be an exception to this rule where they will have different values from the dominant culture properties such as the language name. For example, the language name of the Norwegian neutral culture is Norwegian while the language name of the specific culture of Norwegian, Bokmål (Norway) is Norwegian (Bokmål).
Some properties and methods of neutral cultures will return specific cultures instead of the neutral cultures such as KeyboardLayoutId property and GetConsoleFallbackUICulture method in CultureInfo class.
Every word of this description is true.
And yes this technically constitutes a breaking change.
Long time readers or web searchers of the KeyboardLayoutId may have run across probably the only useful historical discussion of this property, my blog What the #$!*& is a KeyboardLayoutId, anyway?.
The whole point of this property is to provide a "better than complete and abject failure" case for the InputLanguage.FromCulture method.
So by that metric, the .Net 4.0 regression is clearly not so very good.
From .Net Reflector, looking at the 4.0 framework:
public static InputLanguage FromCulture(CultureInfo culture)
{
    int keyboardLayoutId = culture.KeyboardLayoutId;
    foreach (InputLanguage language in InstalledInputLanguages)
    {
        if ((((int) ((long) language.handle)) & 0xffff) == keyboardLayoutId)
        {
            return language;
        }
    }
    return null;
}
Okay, that code will surely fail on all neutrals that the code used to succeed on.
Regression.
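To see why the match can never succeed for a neutral, here is a small C sketch of the same comparison. The function find_input_language and the sample_installed array are hypothetical stand-ins for the managed loop above, and the two handle values are just sample input locale handles (US English and French (France)) picked for illustration. The PRIMARY_LANG_ID macro does the same masking as PRIMARYLANGID in winnt.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Same masking as PRIMARYLANGID in winnt.h: the low 10 bits of a
   language ID are the primary (neutral) language. */
#define PRIMARY_LANG_ID(langid) ((uint16_t)((langid) & 0x3ff))

/* Hypothetical stand-in for the FromCulture loop: returns the index of the
   first installed handle whose low word equals keyboard_layout_id,
   or -1 if none matches. */
int find_input_language(const uint32_t *installed, size_t count,
                        uint16_t keyboard_layout_id)
{
    for (size_t i = 0; i < count; i++) {
        if ((installed[i] & 0xffff) == keyboard_layout_id)
            return (int)i;
    }
    return -1;
}

/* Sample handles, assumed for illustration: en-US and fr-FR. */
const uint32_t sample_installed[] = { 0x04090409u, 0x040C040Cu };
```

With these sample values, a lookup for 1033 (0x0409) finds the first entry, while a lookup for 9 finds nothing. The low word of an installed handle always carries a full language ID with a sublanguage in it, so a bare primary language ID like PRIMARY_LANG_ID(1033), which is 9, has nothing to match against.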
Now as to why, remember in my older blogs about neutrals from years ago, like What is a neutral culture? What is a neutral locale?, where I've talked about so many of the problems of neutrals and how they were implemented.
In Windows 7, the NLS team worked to fix many of the issues here, such as:
and more, and they did a bunch of work.
As mentioned in Mapping Locale Data:
List Neutral Locales
To enumerate neutral locales for Windows 7 and later, your application can call EnumSystemLocalesEx with dwFlags set to LOCALE_NEUTRALDATA. It can also use GetLocaleInfoEx with LCType set to LOCALE_INEUTRAL.
So they added a way to query for neutrals and get the actual neutrals rather than the defaults -- it is an opt-in flag.
And they added a bunch of script specific neutrals so that each multi-script language would have such a neutral to fall back on.
And then, as a part of syncing back up the managed cultures and the native locales, they gutted the old neutrals and made the new .Net 4.0 neutral cultures depend on this new native locales support.
In the process of this, they broke the KeyboardLayoutId for neutrals.
Thus the blog's title, because this blog is indeed reporting one casualty in Operation Rehabilitate Neutrals; luckily, it was the stupidest member of the unit....
So last week I got a note from Alex via the Contact link:
Subject: 64-bit TableTextService and 32-bit programs
Hi, I am using TableTextService to maintain my own Cantonese input method after reading your post: http://blogs.msdn.com/b/michkap/archive/2006/07/27/679538.aspx The posts you wrote about TableTextService is really informative and useful, I couldn't type Chinese without reading them! Few weeks ago, I upgraded my computer to Windows 7 (64-bit, Home Premium). Everything runs fine except the input method. I have US English and Cantonese input method configured. I found that the input method created by 32-bit TableTextService didn't work in 64-bit programs: I could still switch to Cantonese input method, but the computer kept typing English when I tried to type Chinese in 64-bit programs (such as 64-bit IE). The situation is similar if I try to type Chinese in 32-bit programs with 64-bit TableTextService configured. I wonder if you have experienced similar problems before? Do you have any suggestions/ solutions? Is there any way for TableTextService to work on 32-bit/ 64-bit programs no matter which version (32 or 64 bit) of TableTextService I configured? Many thanks. Alex
Well, I actually did cover this in the series describing the TableTextService, though admittedly it's a little buried.
If you check Behold the Table Driven Text Service, Part 12 (The knights who say நீ, redux, #2), it has a bit in the middle that covers this, basically steps 2a and 4a....
The trick is that you have to register it twice -- in the two different directories with the two different DLLs.
Anyway, to repeat steps 2, 2a, 3, 4, and 4a:
2) Copy the text file to \Program Files\Windows NT\TableTextService on your machine (if the "Program Files" on your machine is another language, use that directory, do not create a new one!). If you drag it in you will be prompted to elevate:
so accept that and copy the file. When you are done it will look something like the following:
Though if you do not have MUI language packs installed those language directories will be missing and you might have some localized path chunks up in the breadcrumb bar.
2a) If you are running on a 64-bit machine then you must also put a copy of this file in \Program Files (x86)\Windows NT\TableTextService on your machine (if the "Program Files (x86)" on your machine is another language, use that directory, do not create a new one!). If you skip this step on a 64-bit machine then you will not be able to use the input method in 32-bit processes.
You will get that same elevation prompt as the other directory if you drag the file in
and when you are done it will look something like this:
3) Open up an elevated command prompt. All you have to do is right click on the icon in the Start Menu or wherever to get this menu:
and select the Run as administrator option.
4) You will be put in the system32 directory. You need to navigate to the place you put the file in step 2, and then run the command to register the input method:
The command to run is
rundll32 TableTextService.dll RegisterProfile TableTextServiceTamil.txt
and it is somewhat case sensitive so do not experiment with the case unless you are being paid by me to test this (since I am paying no one this rules you all out).
You will then be prompted to OKAY this registration you just requested:
Say OK to this.
4a) If you are running on a 64-bit machine, you must also navigate to the place you put the other file in that elevated command prompt:
Run the same command again:
and then you'll get that dialog again:
Just OK it again.