
It was Friday afternoon when Santhosh (Santhosh Pillai, aka THE Santhosh, the guy who helped us with the collation story for Malayalam way back when) was asking a question. The question was:

Hi:

Is there an updated version of this page http://www.microsoft.com/globaldev/keyboards/kbdinmal.htm available now that Malayalam has Atomic Chillus in Unicode 5.1?

Thanks
Santhosh

Interesting.... though of course the real underlying question should be more about the keyboard layout(s) in Windows -- the website is just a description of the set of layouts that are installed in the operating system.

Updating the web site (the actual website, I mean - not the old one we're talking about here!) is a separate matter, but that can't ever happen until/unless the thing the site is meant to describe is updated.

Now back in the old days, there was a time that I was one of the people Microsoft sent to Unicode Technical Committee meetings, one of the people who came back from those meetings working on how and when to make updates to Windows, sometimes the person who made the actual updates to the keyboard layouts, and always the person who checked in the final layouts to the product.

In those days, answering this question would have simply been an act of recollection -- remembering the salient details of

  • the updates in Unicode;
  • the plan for when to get the updates into fonts, character map, collation, and keyboards;
  • how the plan went.

but now things are different.

The Malayalam Chillu debate was going on strong while I was still involved with Unicode, though no final decisions had been made. And whether or not a need existed to include atomic characters for these entities was a fairly central question that would have to be solved before anyone discussed what product changes would be needed and when.

But other people were minding those stores; I was doing other things.

So to answer Santhosh's question, I did it the old-fashioned way - I looked at the product to see what was there.

First, I started in Character Map. I knew the fonts would be updated (Peter Constable was the one who explained to me how the Chillus worked way back when this all first started in Unicode years ago), so I wanted to look and see if there were any characters that were in the font but not in the Charmap list of names.

Indeed, there are 17 of them if you include the atomic Chillus, the ones added for Sanskrit, the symbols, the signs, and so on:

  • ഽ U+0d3d MALAYALAM SIGN AVAGRAHA
  • ൄ U+0d44 MALAYALAM VOWEL SIGN VOCALIC RR
  • ൢ U+0d62 MALAYALAM VOWEL SIGN VOCALIC L
  • ൣ U+0d63 MALAYALAM VOWEL SIGN VOCALIC LL
  • ൰ U+0d70 MALAYALAM NUMBER TEN
  • ൱ U+0d71 MALAYALAM NUMBER ONE HUNDRED
  • ൲ U+0d72 MALAYALAM NUMBER ONE THOUSAND
  • ൳ U+0d73 MALAYALAM FRACTION ONE QUARTER
  • ൴ U+0d74 MALAYALAM FRACTION ONE HALF
  • ൵ U+0d75 MALAYALAM FRACTION THREE QUARTERS
  • ൹ U+0d79 MALAYALAM DATE MARK
  • ൺ U+0d7a MALAYALAM LETTER CHILLU NN
  • ൻ U+0d7b MALAYALAM LETTER CHILLU N
  • ർ U+0d7c MALAYALAM LETTER CHILLU RR
  • ൽ U+0d7d MALAYALAM LETTER CHILLU L
  • ൾ U+0d7e MALAYALAM LETTER CHILLU LL
  • ൿ U+0d7f MALAYALAM LETTER CHILLU K

Then I handed each one to LCMapString, one at a time. None of them had a weight in Vista, but all of them have an assigned weight in Windows 7 (some as numbers, some as symbols, some as letters -- kind of what you might expect by looking at the list).

Okay, good so far -- just no updated character list in Character Map. Unfortunate, but hardly tragic, as the sadness over not seeing the name in the lower left hand corner of that dialog is quickly mitigated by the character's presence in the font itself! :-)

The keyboard story was less fortunate.

I loaded up the one and only Malayalam keyboard layout in MSKLC, took a quick look at the keyboard, and then saved it out as a KLC file so I could look at the code points in case I missed anything.

They aren't there.

Oops.

My first reaction was that somebody must have messed up, been asleep at the switch, etc.

But then I realized that was how everyone felt whenever they came to me because of something they perceived as an omission or bug. Knowing more of the underlying infrastructure does not make me any more psychic than the people who used to come to me -- I cannot read the minds or intents of the owners.

Maybe the update was not so easy to do. The Character Map thing is an obvious omission, but that is just a small bug for someone to pick up and get updated.

The keyboard layout is the complicated one, of course. The layout is based on the INSCRIPT standards coming out of India, and although adding the letters would not have been unreasonable, there are two sides to that story and there may well have been reasons not to add them, too.

Collation beyond the "some weight" question is an interesting one; ideally it would be handled with equivalences the way we did Romanian with the comma below/cedilla below.

Grabbing the table from the Unicode 5.1 update:

Table 1. Atomic Encoding of Chillus

#  Representation in 5.0 and Prior      Preferred 5.1 Representation
1  NNA, VIRAMA, ZWJ (0D23, 0D4D, 200D)  0D7A MALAYALAM LETTER CHILLU NN
2  NA, VIRAMA, ZWJ (0D28, 0D4D, 200D)   0D7B MALAYALAM LETTER CHILLU N
3  RA, VIRAMA, ZWJ (0D30, 0D4D, 200D)   0D7C MALAYALAM LETTER CHILLU RR
4  LA, VIRAMA, ZWJ (0D32, 0D4D, 200D)   0D7D MALAYALAM LETTER CHILLU L
5  LLA, VIRAMA, ZWJ (0D33, 0D4D, 200D)  0D7E MALAYALAM LETTER CHILLU LL
6  undefined                            0D7F MALAYALAM LETTER CHILLU K
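Table 1 is really a mapping from one representation to another. Here is a minimal Python sketch of folding the 5.0-style sequences into the atomic 5.1 characters, which is the kind of thing one might do before comparing strings. The function name is my own invention and not a Windows or Unicode API. Unicode normalization will not do this for you: the atomic chillus have no canonical decompositions, so NFC and NFD leave both forms as they are.

```python
# Table 1 as data: base consonant of a <consonant, VIRAMA, ZWJ> sequence
# -> atomic chillu added in Unicode 5.1. CHILLU K has no 5.0 sequence.
VIRAMA = "\u0D4D"
ZWJ = "\u200D"

CHILLU_FROM_BASE = {
    "\u0D23": "\u0D7A",  # NNA -> MALAYALAM LETTER CHILLU NN
    "\u0D28": "\u0D7B",  # NA  -> MALAYALAM LETTER CHILLU N
    "\u0D30": "\u0D7C",  # RA  -> MALAYALAM LETTER CHILLU RR
    "\u0D32": "\u0D7D",  # LA  -> MALAYALAM LETTER CHILLU L
    "\u0D33": "\u0D7E",  # LLA -> MALAYALAM LETTER CHILLU LL
}


def to_atomic_chillus(text: str) -> str:
    """Rewrite each 5.0-style chillu sequence from Table 1 as its atomic
    5.1 code point; all other text passes through unchanged."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        # Only a listed consonant followed by VIRAMA + ZWJ is converted;
        # a consonant with a bare VIRAMA (no ZWJ) is left alone.
        if ch in CHILLU_FROM_BASE and text[i + 1:i + 3] == VIRAMA + ZWJ:
            out.append(CHILLU_FROM_BASE[ch])
            i += 3
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    old_style = "\u0D28\u0D4D\u200D"  # NA, VIRAMA, ZWJ
    print(hex(ord(to_atomic_chillus(old_style))))  # 0xd7b
```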

Ok, looking at the weights of the first five entries in that table:

Old way (sequence)     Weight                 New way   Weight
U+0d23 U+0d4d U+200d   3a 77 01 01 01 01 00   U+0d7a    3a 72 01 01 01 01 00
U+0d28 U+0d4d U+200d   3a 8b 01 01 01 01 00   U+0d7b    3a 86 01 01 01 01 00
U+0d30 U+0d4d U+200d   3a a7 01 01 01 01 00   U+0d7c    3a a6 01 01 01 01 00
U+0d32 U+0d4d U+200d   3a af 01 01 01 01 00   U+0d7d    3a aa 01 01 01 01 00
U+0d33 U+0d4d U+200d   3a b3 01 01 01 01 00   U+0d7e    3a ae 01 01 01 01 00
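Sort keys are compared byte by byte, so an equivalence would mean the two keys are identical. This small Python sketch reuses the bytes from the table above (it does not call LCMapString) and reports where each pair first diverges and by how much.

```python
# Sort keys copied from the table above, as hex byte strings:
# name -> (old-way sequence key, new-way atomic key)
WEIGHTS = {
    "CHILLU NN": ("3a 77 01 01 01 01 00", "3a 72 01 01 01 01 00"),
    "CHILLU N": ("3a 8b 01 01 01 01 00", "3a 86 01 01 01 01 00"),
    "CHILLU RR": ("3a a7 01 01 01 01 00", "3a a6 01 01 01 01 00"),
    "CHILLU L": ("3a af 01 01 01 01 00", "3a aa 01 01 01 01 00"),
    "CHILLU LL": ("3a b3 01 01 01 01 00", "3a ae 01 01 01 01 00"),
}


def compare_keys(old_hex: str, new_hex: str):
    """Return (equal, index, gap): whether the keys are identical, the first
    byte position where they differ, and old minus new at that position."""
    old, new = bytes.fromhex(old_hex), bytes.fromhex(new_hex)
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return False, i, a - b
    # No differing byte in the common prefix: equal only if same length.
    return len(old) == len(new), None, 0


if __name__ == "__main__":
    for name, (old_hex, new_hex) in WEIGHTS.items():
        equal, index, gap = compare_keys(old_hex, new_hex)
        print(f"{name}: equal={equal} first_diff_byte={index} gap={gap}")
```

On the table's data, every pair first differs at the second byte, with gaps of 5, 5, 1, 5 and 5. The old sequence always sorts just after its atomic counterpart.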

They don't match. I'd have to see what else is in the Malayalam table to know if it is only the equivalence that wasn't done (there might be actual ordering issues also), but I can't tell for sure (I have my hands full trying to learn Tamil and Bengali!). Offhand the weights never look too far from each other, so perhaps it was just a conscious decision not to support the equivalence....

I honestly don't know the answer to any of the questions I posed above, but I can probably ask a question or two of some people next week (post re-org I'm not 100% sure who owns all this stuff now, so it could take me some time to track down who to ask!).

But either way there are at a minimum a few bugs that I found in all this; I'll talk to some testers I know down the hall about those even sooner.

As I said in the title, the two most important components of letting go and moving on are (1) letting go and (2) moving on. But I'm likely to get curious now and again about how things are going....

The question from Jim was:

I am troubleshooting an issue with Vista SP2 where the common dialogs seem to be failing – Notepad – File/Open gives:

“Not enough memory available to complete this operation. Quit one or more applications to increase available memory, and then try again.”
Wordpad and other apps just never bring up the browse dialog box and seem to fail silently. It makes it hard to troubleshoot since their email can't attach any files to send me data.

Is this likely a 3rd party shell extension?   Any other ideas?

Thanks,
Jim

My initial thoughts when I first saw the issue, days into it, were almost Darth Vader like, since I felt something that I had not felt since... etc., you get the picture.

The problem was tracked down to a bad file in the image: a bad comdlg32.dll.mui.

This trackdown was hella complicated and took a bunch of people trying a bunch of different tools.

And then all the pieces fell into place, as I remembered my Random irreverent thoughts about the Ultimate Fallback, which points to Mark Russinovich's The Case of the Notepad that Wouldn't Run.

This new bug was just a recycled version of that same old one, the one that Erik has told me point blank and period is by design, because English is just another language.

But in this world we live within, this world where Microsoft is a company in Redmond, WA, in the USA, trying to proclaim English as being just another language seems a little naive, doesn't it?

I mean, in this world where just about everything we produce will bear the indelible mark of the region and the language of its main source, this policy's real impact is just to add a few more English bugs to the mix -- bugs like the one Jim reported.

Bugs where the [almost] never used, [almost] never tested code paths for core resource failures finally get exercised, unleashing a veritable world of responses in code upon an unsuspecting populace for the occasional random scare like this one, where even the bug reporting tools (tools which encourage attachments) can be stymied just like they were here.

Where not everyone has all of the knowledge and the tools we have here, either.

It all makes me wonder once again whether the DEVDIV solution that would have some ultimate fallback there isn't just a smidge better.
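To make the "ultimate fallback" idea concrete, here is a hypothetical Python sketch. It does not show how MUI resource loading actually works in Windows, and it does not describe the DEVDIV implementation. The idea is simple: try each UI language in order, and if every external resource is missing or unusable (say, a bad .mui file), fall back to strings compiled into the binary. The loader callback and all names are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical loader signature: returns None when a language's resource
# file is missing *or* unusable (for example, a corrupted .mui file).
Loader = Callable[[str, int], Optional[str]]


def load_ui_string(string_id: int,
                   ui_languages: Iterable[str],
                   load_localized: Loader,
                   builtin: Dict[int, str]) -> Optional[str]:
    """Try each UI language in preference order; if every localized
    resource fails, use the strings compiled into the binary itself."""
    for lang in ui_languages:
        text = load_localized(lang, string_id)
        if text is not None:
            return text
    # The "ultimate fallback": no external file is involved here, so a bad
    # resource file cannot take the caller down a rarely tested error path.
    return builtin.get(string_id)
```

With a loader that fails for everything, `load_ui_string(1, ["en-US"], lambda lang, sid: None, {1: "Open"})` still returns "Open".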

Well, what do you think?

I've had more and more people asking me lately about my symptoms -- just what are they, people wonder.

My guts tell me that this is not something to write about, but after 39 years on this planet I've decided that if I have learned anything, it is that my guts really have shit for brains. If you know what I mean.

So here we go, covering the current major symptoms.

There are basically four of them that fall into the "affects my life" category. Which is the category I am going to stick to for now.

First, I've lost proprioception (joint position sense) in my legs. This means that I can put them somewhere but I can't tell where they are by any other means. So I have to trust that they will be where I put them, and of course the corollary to that is that I have to remember where I put them. Sometimes I get the symptom in my arms too, though this is not as worrisome. Summary: I have to trust where I put my legs and feet.

Second, I've got a foot drop on the left side, which means that I place my foot somewhere and it can decide to let gravity assist and pull it down a bit farther than I thought. The workaround is easy enough -- just make sure you lift the foot higher. But it makes the walk look dorky. Summary: I cannot trust where I put my left foot.

You can no doubt see the conflict in these two items already. :-)

Third, I have disequilibrium. This is not like vertigo as I feel nothing spinning -- I just tend to see the ground coming toward me or falling away from me, without warning. I always have it somewhat and sometimes it can be extreme. I have had this symptom for years and in fact it is the one I originally got a cane for (I'd rather that people who saw me fall in the daytime assume I am a gimp, not a drunk!), the one that can at times be the most difficult and debilitating. Summary: I can't always tell where I am in relation to the ground.

Fourth, I get fatigue. Now even at its worst this is not a mental fatigue, it is just as if the body decides it has had too much then it shuts things down. This is a symptom I first had over a decade ago in Amsterdam and have a great memory of Richard Campbell and Stephen Forte carrying me when I couldn't walk any more. I've usually tried to avoid the situation since then and the only time it seems to happen these days is after office moves at Microsoft (even with assistance some maximum gets reached and I am basically on my butt for a few days). Summary: There are times I can't do anything.

Now the iBot has taken many of these items and made them much harder to measure or sometimes even notice. And as I was discussing with my neurologist just the other day, this frustrates the engineer in me that wants to measure things to see how they are doing. The iBot has in many ways ruined a significant piece of my neurological care, since I only see her once every six months and she gets only brief snapshots into how I am doing.

Though when I compare battery life in normal situations vs. extreme ones (in this case a Moby concert where I chose to rock out), the drain is a little more. So more precise measurement of the battery charge would give me back a more precise understanding of how hard the iBot has to work to keep me balanced in some of these situations. Though at the moment they do not seem too disposed to releasing that information or explaining how to easily get it, unfortunately.

The Moby concert was an interesting case, as I was rocking out quite a bit in place (as Cathy and Kevin, who I ran into and ended up hanging out with, can attest), and it did almost edge into fatigue by the end of the performance. It just shows how much movement can impact fatigue, with the movement being outside the control of the iBot.

In any case, in the end I would rather be doing well than obsessively think about the course of the disease. Even this blog you are reading, which makes me think about symptoms I no longer deal with as a huge issue on a daily basis, is unusual, since I no longer really have to think about them. I like a machine that takes all four of my debilitating symptoms and makes them irrelevant, and in a very real sense does it by using technology to replace the functionality my body used to provide.

Getting back to the office move issue -- I am giving up my four computers and slowly moving books home, doing all I can to minimize the issue by making it a small office move. And I'll miss the extra machines just like people who used to borrow books will miss the library (as I will), but a man has got to do what a man has got to do. Although pushing the fatigue limits has no impact on the long term course of the disease, those days off are hardly pleasant and I am incredibly unhappy about them, just as I am always unhappy about gratuitous moves....

The question was deceptively simple:

Hi,

I used all three and I find ToLower() to be fastest. But the Msdn article says that ToLowerInvariant should be faster. http://msdn.microsoft.com/en-us/library/system.string.tolowerinvariant(VS.85).aspx .
Which one is better with respect to performance?

Results are:

173ms ToLower
260ms ToLowerInvariant
265ms ToLowerCultureInfoInvariant

The results seem okay to me, it is the link that is off a bit....

It is a matter of where/how you aim expectations.

You see, starting in .Net 2.0 (and subsequent versions), "invariant" casing actually means "use the operating system casing results" and also "don't do those weird linguistic results".

Thus just using ToLower() will cause you to miss hundreds of characters being mapped, and it will map some characters likely best left alone.

Performance is one reasonable axis to use for the judging of results, but correctness is, too, right?
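To illustrate the "correctness is an axis too" point, here is a Python analogy. It does not reproduce .NET's ToLower or ToLowerInvariant or their casing tables. It compares a narrow ASCII-only lowercasing shortcut against Python's full Unicode `str.lower()` and lists the characters the shortcut silently skips. The function names are hypothetical.

```python
from typing import List


def ascii_only_lower(s: str) -> str:
    """Fast but narrow: maps only A-Z and leaves every other character alone."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def missed_mappings(start: int, end: int) -> List[int]:
    """Code points in [start, end) whose full Unicode lowercase differs from
    what the ASCII-only shortcut produces."""
    missed = []
    for cp in range(start, end):
        c = chr(cp)
        if c.lower() != ascii_only_lower(c):
            missed.append(cp)
    return missed


if __name__ == "__main__":
    print(ascii_only_lower("ÀÉÎ ABC"))      # ÀÉÎ abc
    print(len(missed_mappings(0x80, 0x250)))  # how many the shortcut skips here
```

A speed comparison between two functions that map different sets of characters is only half the story.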

The ad campaign "ToLower may be wrongest, but at least it can be fastest!" is unlikely to be as effective as that 5-year old using Windows 7. :-)

Horrendously, stupendously, tremendously off-topic!!!

I do not believe in discrimination.

It is I think fairly important to point out this simple fact today. You'll see why in a moment. And you might want to take me to task by the time you're done reading so I just wanted to put that out there.

Now this is true regardless (and irregardless!) of gender, race, religion, creed, sexual preference, height, sexual orientation, disability, political beliefs, philosophical beliefs, weight, opinions about Microsoft, opinions about me -- really beliefs or opinions about anything.

I do not believe in discrimination.

With that said, there are some things that simply can't work.

I, for example, as someone in a wheelchair (albeit a cool one, an iBot!) would not make a good fireman. And Bob help the family that would be relying on a gimp like me to save their home from burning to the ground, as this is just something that I would not be able to do effectively.

I haven't actively wanted to be a fireman in (as far as I can recall) a good 34 years, so clearly I have made my peace with this.

But even if I had not, and somehow still dreamed of passing that exam and riding in a hook-and-ladder, in my humble opinion it would not be an act of discrimination if whatever reasonable tests for job fitness proved me to be inadequate for the job. It would be an act of responsibility for the ones testing and an act of mercy toward that family, the one that might now live through the fire at their place.

Now why do I mention this?

Well, because the other day I took a shuttle.

A shuttle from building 80-something-or-other (I forget where I was, it isn't relevant anyway) to building 9. A wheelchair-accessible shuttle, since I am in the iBot and all. And the driver was simply unable to fit comfortably in the spaces between the iBot and the walls of the shuttle, yet insisted on getting all four tie-downs in place.

I did not overly criticize this, but I did point out that the chair can balance itself whether it is tied down or not; I was quickly scolded off this point because of the rules.

Fair enough.

Soon enough the destination is reached; I remove the one tie-down that I can reach and wait for the other three that I cannot to be removed. Two of them are removed with minor struggle, but the last one cannot be reached.

I am directed by the driver to try and move the chair in all kinds of directions to make the tie-down reachable by someone who is literally unable to fit within the space in question, to the point that the struggle against a tether causes the chair to start emitting overheating warnings: first yellow lights, then the dreaded red light. I have to turn it off and back on to reset it, and now I have warnings to clear after I am free and back on the ground.

Finally, I say stop. STOP.

I unbuckle, I slither to the floor of the shuttle, I stretch with my unslender frame which nevertheless is able to reach the tie-down release, and release it.

Then I struggle back into the chair.

I say nothing as I am then lowered via the lift down to the curb. I bite my fucking tongue on the thank you that I know will quickly descend to a brutal tongue lashing about the entire experience. And I simply do not feel like putting that out into the universe. At least not to the driver.

Well, I am doing it now. But not for consequences to the driver. I am not going to complain to the shuttle folks -- I am going to vent about it here in my unofficial, personal blog.

I won't even say it is necessarily the overweight driver at fault; perhaps if they had sent one of the other shuttles with more space on each side, it would have worked out better.

I refuse to put myself in the uncomfortable position of requesting a driver who can fit in the space to take care of the tie-downs, or refusing a shuttle with such a person who shows up.

But I do not really want to be put in the position of refusing to get in a shuttle whose configuration I do not like, either.

And if I can help it, I am never, ever going to take one of the shuttles again for the rest of my time at Microsoft.

Because I'd rather rely on myself than risk another incident of potentially damaging the chair due to either personnel or equipment (both beyond my control) not being able to handle the situation.

Maybe I am just refusing to trust anyone who isn't me, and some may believe this indicates that I am discriminating against people who are not me in certain situations. But it is actually just a strong desire to avoid being in a situation where I might feel uncomfortable enough to discriminate against someone else.

Maybe this makes me a villain. A situational misogynist, even.

Or maybe it is just that I am unwilling to be put in a situation where I might discriminate against someone....

(for those who are curious, I was able to clear the warnings -- there were two -- and the iBot is fine now)

Yesterday, Eran asked (via the Contact link):

Shalom,

I apologize in advance if this is the wrong way to report this issue. I've wasted two hours trying to figure out how to use Connect for this, to no avail. Being an occasional reader, contacting you directly was the only option I could think of.

So, without further ado:
GetKeyboardLayoutList( 0, (HKL*)"garbage" ); works correctly on all 32 bit versions of Windows I've tried (XP, 2003, Vista), and when run from a 64 bit executable it also works on 64 bit versions of Windows (2003, 2008). However, when run from 32 bit executables on 64 bit Windows (2003, 2008) it fails with an "Invalid access to memory location" error. If a NULL is passed instead of the garbage, the call succeeds on all bitnesses and OSes.

In the MSDN docs, nothing is said about the second parameter when the first is 0. This is the first time I have come across an incompatibility between WOW64 and the equivalent 32-bit version of Windows. And it breaks a feature of our product.

I will highly appreciate any response - be it a direction to the appropriate address, an acknowledgment of the bug, or any other advice.

Thank you for your time,
Eran

Sometimes, like the title says, the problem is the premise.

In particular I will yield to Raymond's Basic ground rules for programming - function parameters and how they are used, which is in my view the definitive way to look at the contract here.

And while there are nuances when differences exist between 32bit and WOW64, these nuances are caused by the same kind of problem that comes from people who don't use CallWindowProc but instead call window procs directly -- people who then run afoul of the ANSI/Unicode layer requirements.

So, is it a bug if the WOW64 layer has always behaved this way? Saying NO does mean resigning oneself to a certain kind of backward compatibility break in the whole "run 32-bit apps on a 64-bit OS" scenario, though in my view passing bad parameters is fairly indefensible when it is done just to avoid initializing a buffer (or, more likely, to avoid changing one's existing code to initialize a buffer).
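Here is the contract as a Python sketch. The function below is a hypothetical stand-in, not a binding to user32. It follows the documented rule that a buffer size of 0 asks for the number of layouts needed. It is also deliberately strict about the buffer in that case, roughly the way the WOW64 layer behaved for Eran. The caller shows the safe pattern: ask for the count with a null buffer, allocate, then fill. In C, that simply means passing NULL as the second argument whenever the first is 0. The HKL values are example data.

```python
from typing import List, Optional

# Example data for this sketch: a US English HKL and an IME-style value.
_INSTALLED_HKLS = [0x04090409, 0xE0020404]


def get_keyboard_layout_list(n_buff: int, buf: Optional[List[int]]) -> int:
    """Hypothetical stand-in for the GetKeyboardLayoutList contract.

    n_buff == 0: return the number of entries needed. The buffer must be
    NULL (None) here; this stand-in rejects anything else, much as WOW64
    rejected the garbage pointer in Eran's report.
    n_buff > 0: copy up to n_buff entries into buf and return the count.
    """
    if n_buff == 0:
        if buf is not None:
            raise ValueError("buffer must be NULL when n_buff is 0")
        return len(_INSTALLED_HKLS)
    if buf is None or len(buf) < n_buff:
        raise ValueError("buffer is smaller than n_buff")
    count = min(n_buff, len(_INSTALLED_HKLS))
    buf[:count] = _INSTALLED_HKLS[:count]
    return count


def installed_layouts() -> List[int]:
    """A caller that honors the contract: size query with NULL, then fill."""
    needed = get_keyboard_layout_list(0, None)
    buf = [0] * needed
    got = get_keyboard_layout_list(needed, buf)
    return buf[:got]
```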

Though of course your mileage may vary....

Jim's question from a few weeks ago was:

I’m being given a Unicode string and I need to determine if it will render cleanly using the system font (not displaying any blocks or “non-supported-glyph” symbols). I’ve tried using ScriptGetCMap() and GetGlyphIndices(), but both of these flag a character like 0x0C60 as not having a glyph – although it’s actually composed of multiple glyphs and it does render properly.

Our product allows an administrator to push policy to client machines, which includes a custom message to show in a notification balloon. The administrator can enter the text on a console and might include Japanese characters, for example, and that text gets pushed to a bunch of clients, some of which can’t display those characters. The client software is supposed to display the custom message if possible (no blocks displayed) and fall back to a built-in message if the custom message won’t display correctly.

Any pointers to APIs or sample code that will accurately determine if a string can be drawn?

Sound familiar?

Well, it should!

At first glance, it is the same problem discussed in Is that character in the font or isn't it?, and that is a blog that is chock full of potential solutions!

Unfortunately, Jim's question adds one element to the problem, one new wrinkle.

And that is to also try and figure out the problem for any fonts that the system might map to via linking/fallback/substitution, etc.

And this does not exist.

To be honest, it isn't actually a problem that is worth trying to solve. As Michael Warning pointed out in that thread:

This unfortunately is a really hard problem.  And the answer will be different depending on the text stack you’re using (GDI, GDI+, DWrite).  The problem is that each stack has a different set of rules for font fallback – how it automatically changes fonts around when it encounters a character that isn’t supported in the font you asked for.
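For what it's worth, the naive model everyone starts with looks like the Python sketch below: walk a fallback chain and ask whether some font's cmap covers each character. The font names, the chain and the cmaps are all hypothetical. Real GDI, GDI+ and DirectWrite each apply their own linking and fallback rules, which this does not reproduce. A pure cmap lookup also cannot judge characters like U+0C60 in Jim's report, which only render correctly through shaping.

```python
from typing import Dict, List, Optional, Set

# Hypothetical cmaps: font name -> set of code points it maps to glyphs.
CMAPS: Dict[str, Set[int]] = {
    "BaseFont": set(range(0x20, 0x7F)),                         # printable ASCII
    "LinkedCJK": set(range(0x3040, 0x30A0)) | {0x65E5, 0x672C},  # kana + 2 kanji
}


def covering_font(ch: str, chain: List[str],
                  cmaps: Dict[str, Set[int]]) -> Optional[str]:
    """Return the first font in the chain whose cmap contains ch, else None."""
    for font in chain:
        if ord(ch) in cmaps.get(font, set()):
            return font
    return None


def naive_can_render(text: str, chain: List[str],
                     cmaps: Dict[str, Set[int]]) -> bool:
    """True if every character is covered by some font in the chain.
    Ignores shaping entirely, so complex-script clusters are not judged."""
    return all(covering_font(ch, chain, cmaps) is not None for ch in text)
```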

Now Michael is thinking about the macro problem -- the complexity of all of the different models and trying to deal with how improbable it would be to capture all of these differences in code.

But to be honest the micro problem (looking at any one of these technologies) is still pretty complicated -- the kind of project where one will almost certainly fail, in the end.

So what can we do?

Well, the answer I would suggest will have to wait for the next blog.... :-)

I do try my best to know what it is, and where it's at, as this makes me seem more in touch with things, you know?

The other day, Joe asked:

A friend of mine asked me the following question and I don't know, so I thought I'd see if anyone in here had an idea.

"For some reason I get a list of font names beginning with @ in my font selection dialog (CFontDialog) on Vista, these fonts don't work correctly if I use them, any idea what they are?"

Search engines don't seem to let you search for @ so it makes searching for a solution rather difficult.

Wow, where to start, huh?

Well obviously I could talk about the vertical fonts and point to blogs like Let's get vertical and Rotate it when vertical? or even the more memorable ones for me like Expertise isn't always everything (aka When the one who is learning teaches us something important) -- the great @Arial blog! -- that were great professional relationship forming things (I still work with that team and they remain kick ass and cool with their simultaneously naive yet insightful take on issues!).

But I've already talked about that.
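For Joe's friend, though, the practical answer is short. The @ names are the vertical variants of fonts, with glyphs set up for vertical text, and they are listed alongside the normal faces. An app that never lays out vertical text can simply hide them. ChooseFont, which CFontDialog wraps, documents a CF_NOVERTFONTS flag for listing only horizontally oriented fonts, and that flag is worth checking first. Failing that, a filter like this minimal Python sketch does the job. The function names are hypothetical.

```python
from typing import Iterable, List


def is_vertical_face(name: str) -> bool:
    """Vertical variants are listed under the face name with a leading '@'
    (for example '@MS Mincho' next to 'MS Mincho')."""
    return name.startswith("@")


def horizontal_faces(names: Iterable[str]) -> List[str]:
    """Drop the '@' vertical variants, keeping the original order."""
    return [n for n in names if not is_vertical_face(n)]
```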

I was actually thinking about Tod Nielsen when I saw this, if you must know.

A former marketing VP at Microsoft who I have known for years via MS Access, he wrote a very nice foreword for my book (other, less important things he has been doing: he has also been an Oracle VP and a Crossgain principal and a Borland CEO and now runs VMWare).

One thing people don't tend to think about in association with him is how he managed to inspire fundamental changes in search engines.

Under his watch as the VP of marketing in the Developer Division, they took the awful NGWS (Next Generation Windows Services) message from Steve Ballmer's first CEO address back in 1999 and transformed it into the language that became known as managed code: the .NET Framework, the C# language, and so on.

Now, nearly 10 years later, one has to wonder: as all the Crossgain hoopla was going on, did it help Tod smile now and again to think that the search industry had to change some of its fundamental algorithms, both to properly distinguish conversations about .Net from the second most popular generic top level domain and to make the hash mark/pound sign (#) mean something much more prominent than it used to so as to pick up on C#? :-)

As mischievous as all of that may seem from the other end of the telescope (ooooh! song title!), Joe's question about the @ fonts made me realize how lame all of the current Search technologies are about the @ sign -- both Bing and Google suck as they ignore it in a way that cannot be escaped or overridden -- a fact that for years made it hard for anyone to get good information about the vertical font feature.

And as Joe's experience shows, this experience still kind of blows. In almost every search engine.

Now I often complained in the past that neither Live (now Bing) nor Google really handles my blog all that well -- ref: Google doesn't seem to get blogs and others -- and they still kind of suck in some important ways.

But the fact that they can't handle the @ really proves that they don't understand Twitter either, given its fundamental importance there.

Now Unicode has, on at least one occasion and in response to a proposal, had to dissuade a particular language from using the @ as a letter in its orthography, given the strong usage of it as a symbol and its behavior in search engines. That makes it all the more ironic that Twitter could succeed where even email addresses and teenage IM habits have failed -- in forcing a linguistic meaning on the @ sign!

So perhaps one day they will get their heads out of their collective asses and fix this problem -- and with luck that will mean that search engines might finally fix this 15 year gap in searching for information about vertical fonts in Windows and finally these stodgy relics who try to be so hip and cool might break the generation gap enough to understand text messages. :-)

 

Disclaimer: I once had a woman break up with me because I wasn't texting her enough, though I don't think that is influencing my opinions here; I was texting her plenty, but apparently I would only text her in response to her texting me; I simply never initiated in this area. Fair enough, and she had a point. It is hard to change habits that were formed in an age where only drug dealers and doctors (who are also drug dealers when you think about it!) had cellular phones and thus no one was texting yet.

The other day, in "What kind of soup?" is not exactly a soup question, is it?, I mentioned that I might have a technical example of the issue of

Not exactly a soup question, is it?

so think of this blog as me finally getting around to doing that.

It has to do with triage.

World-ready triage, to be precise.

It was a group that in most cases met twice a week and worked through every bug in Windows that had some kind of globalization/localizability/international issue, giving a recommendation on how important it was to fix it, and by when.

It was pretty important in terms of the fact that a "must fix" recommendation could not be ignored, and it dovetailed nicely into my actual work of assisting other teams with their globalization/localizability/international issues since often a team that did not know exactly how to fix such an issue would benefit from someone who could work with them on how they could!

But I am not going to talk about the fixing so much in this blog.

This blog is about the triage.

The group included experts in a wide number of specialties -- development, test, and program management, for one. But also people specializing in lots of different areas, like:

  • bidirectional scripts
  • globalization
  • localizability
  • pseudo localization
  • geopolitical issues
  • localization

and so on. The number of people would vary from meeting to meeting, with bugs sometimes skipped to the next meeting if the best people to look at a particular bug report weren't in the room.

A tight little group, very efficient in almost every way.

Ironically, the one place they sometimes fell short was due to the very thing that got them their seat at the table -- their various/varied areas of interest and areas of expertise!

Because they had those interests, they would often be interested in bug details such as looking deeper into the description, checking out provided screen shots, asking for more information, and so on.

Here is the kicker -- they would want to do some of these things even if it would in no way change the recommendation of triage.

And if one measures efficiency of a triaging group in terms of how fast they go through bugs (so that they can get through more bugs in a meeting) then the fact that members would so often ask questions not relevant to everyone in the room -- that were not soup questions -- could really affect that efficiency.

Now I myself was at times guilty of the same problem, and it was an effort of will to remember that

If it isn't a soup question for the meeting, keep it out of the meeting!

Now with Windows 7 out the door and me working on something else now (though I do not know what, yet!), I my not be, in fact probably won't be, in that meeting anymore.

But I do know that if I were, I would want to be better about keeping those non-soup questions the hell out! :-)

Apologies for the title (note to self: never write blog titles under the influence in an attempt to appear to be a cunning linguist!)

So it was just the other day that Yong asked:

Ok, so it looks like we got a regression (or a design change) on Vista/Windows 7 from Windows XP.

On XP/W2K3:
============Start of regopts.txt============
[RegionalSettings]
InputLocale = 0409:00000409,0404:E0020404
============End of regopts.txt============
// Just having this adds for example the Chinese Traditional (ChangJie) keyboard.

On Windows Vista SP2/W2K8 SP2
============Start of regopts.xml============

<gs:GlobalizationServices xmlns:gs="urn:longhornGlobalizationUnattend">
  <!-- User List-->
  <gs:UserList>
    <gs:User UserID="Current" CopySettingsToSystemAcct="true" />
  </gs:UserList>
  <!--System locale-->
  <gs:SystemLocale Name="zh-TW"/>

  <gs:InputPreferences>
    <!--en-US-->
      <gs:InputLanguageID Action="add" ID="0409:00000409" Default="true"/>
    <!--zh-TW-ChangJie-->
      <gs:InputLanguageID Action="add" ID="0404:E0020404"/>
  </gs:InputPreferences>

</gs:GlobalizationServices>
============End of regopts.xml============
It fails with:

Unexpected Failure.  Unsupported parameter.

On Windows 7/W2K8 R2.
It fails with:
Event ID: 10008
Source: International
Error while changing keyboard/input method for "0404:E0020404".

This is one of those architected back-compat breaks: GUIDs are now required, replacing the "fake" KLID values of prior versions that forwarded to the appropriate Text Services Framework TIPs. (Those TIPs had been around for several versions, often sitting atop the same KLID values used by the older IMM-based IMEs they replaced.)

It amazed me that after all this time no one seemed to have published the list of GUIDs so that people could update their existing scripts!

In fact, no one had asked me if such a list existed, really.

Which is odd since that is the sort of question I do tend to get a lot.

Anyway, I thought I would just take care of that now.

Here is the big table, with the old and new values:

LANGID | XP/Server 2003 KLID | Language/Script | Input method description | Vista/Windows 7
0411 | 00000411 | Japanese | Japanese keyboard | No Change
0411 | E0010411 | Japanese | Japanese IME | {03B5835F-F03C-411B-9CE2-AA23E1171E36}{A76C93D9-5523-4E90-AAFA-4DB112F9AC76}
0412 | E0010412 | Korean | Korean IME | {A028AE76-01B1-46C2-99C4-ACD9858AE02F}{B5FE1F02-D5F2-4445-9C03-C568F23C99A1}
0412 | 00000412 | Korean | Korean keyboard | No Change
0404 | 00000404 | Traditional Chinese | US Keyboard | No Change
0804 | 00000804 | Simplified Chinese | US Keyboard | No Change
0404 | E0010404 | Traditional Chinese | Phonetic | {531FDEBF-9B4C-4A43-A2AA-960E8FCDC732}{761309DE-317A-11D4-9B5D-0080C882687E}
0804 | E0010804 | Simplified Chinese | QuanPin | {E429B25A-E5D3-4D1F-9BE3-0C608477E3A1}{54FC610E-6ABD-4685-9DDD-A130BDF1B170}
0404 | E0020404 | Traditional Chinese | ChangJie | {531FDEBF-9B4C-4A43-A2AA-960E8FCDC732}{4BDF9F03-C7D3-11D4-B2AB-0080C882687E}
0804 | E0020804 | Simplified Chinese | ShuangPin | {E429B25A-E5D3-4D1F-9BE3-0C608477E3A1}{EF63706D-31C4-490E-9DBB-BD150ADC454B}
0404 | E0030404 | Traditional Chinese | Quick | {531FDEBF-9B4C-4A43-A2AA-960E8FCDC732}{6024B45F-5C54-11D4-B921-0080C882687E}
0404 | Did not exist?!? | Traditional Chinese | New Quick | {531FDEBF-9B4C-4A43-A2AA-960E8FCDC732}{0B883BA0-C1C7-11D4-87F9-0080C882687E}
0804 | E0030804 | Simplified Chinese | ZhengMa | {E429B25A-E5D3-4D1F-9BE3-0C608477E3A1}{733B4D81-3BC3-4132-B91A-E9CDD5E2BFC9}
0404 | E0040404 | Traditional Chinese | Big5 | REMOVED
0404 | E0050404 | Traditional Chinese | Array | {E429B25A-E5D3-4D1F-9BE3-0C608477E3A1}{D38EFF65-AA46-4FD5-91A7-67845FB02F5B}
0804 | E0050804 | Simplified Chinese | NeiMa | REMOVED
0404 | E0060404 | Traditional Chinese | DaYi | {E429B25A-E5D3-4D1F-9BE3-0C608477E3A1}{037B2C25-480C-4D7F-B027-D6CA6B69788A}
0404 | E0070404 | Traditional Chinese | Unicode | REMOVED
0404 | e0080404 | Traditional Chinese | New Phonetic | {531FDEBF-9B4C-4A43-A2AA-960E8FCDC732}{B2F9C502-1742-11D4-9790-0080C882687E}
0404 | e0090404 | Traditional Chinese | New ChangJie | {531FDEBF-9B4C-4A43-A2AA-960E8FCDC732}{F3BA907A-6C7E-11D4-97FA-0080C882687E}
0804 | E00E0804 | Simplified Chinese | PinYin | {81D4E9C9-1D3B-41BC-9E6C-4B40BF79E35E}{F3BA9077-6C7E-11D4-97FA-0080C882687E}
0404 | E01F0404 | Traditional Chinese | Alphanumeric (ABC) | {81D4E9C9-1D3B-41BC-9E6C-4B40BF79E35E}{FCA121D2-8C6D-41fb-B2DE-A2AD110D4820}

A few quick words about this table.

  • Yes, the three input methods based on entering a code point directly (Big5, NeiMa, and Unicode) were removed;
  • And yes, the four keyboards stay in as they were in prior versions, unchanged;
  • The rest of the input methods are now associated with two different GUID values: the first identifying the TIP engine and the second identifying the language profile under that TIP engine.

Note that this information is mostly useless to you, but it does explain why all of the IMEs that use TextTableService.DLL have the same first GUID. You can use this information to sound particularly impressive at a client site, by the way. :-)

The two GUIDs are used the same way that the KLID values used to be used. Thus for Yong's case,

0404:E0020404

becomes

0404:{531FDEBF-9B4C-4A43-A2AA-960E8FCDC732}{4BDF9F03-C7D3-11D4-B2AB-0080C882687E}
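Since the substitution is purely mechanical, updating an old regopts.txt InputLocale line can be scripted. The Python below is a minimal sketch, not a Microsoft tool: the function names are hypothetical, the mapping holds only a three-row subset of the table above, and two behaviors are assumptions rather than documented rules. It treats any KLID starting with "E" as an IME that needs a GUID pair, and it marks the first entry as the default, the way Yong's XML does for 0409.

```python
# Subset of the table above: old IME KLID -> (TIP engine GUID, language profile GUID).
KLID_TO_TIP = {
    "E0020404": ("{531FDEBF-9B4C-4A43-A2AA-960E8FCDC732}",
                 "{4BDF9F03-C7D3-11D4-B2AB-0080C882687E}"),  # Traditional Chinese ChangJie
    "E0010411": ("{03B5835F-F03C-411B-9CE2-AA23E1171E36}",
                 "{A76C93D9-5523-4E90-AAFA-4DB112F9AC76}"),  # Japanese IME
    "E0010412": ("{A028AE76-01B1-46C2-99C4-ACD9858AE02F}",
                 "{B5FE1F02-D5F2-4445-9C03-C568F23C99A1}"),  # Korean IME
}
# The code point based input methods (Big5, NeiMa, Unicode) that were removed.
REMOVED = {"E0040404", "E0050804", "E0070404"}


def convert_input_locale(entry: str) -> str:
    """Convert one XP-style 'LANGID:KLID' entry to the Vista/Windows 7 form."""
    langid, klid = entry.split(":")
    klid = klid.upper()
    if klid in REMOVED:
        raise ValueError(f"{entry}: this input method no longer exists")
    if klid in KLID_TO_TIP:
        tip_engine, profile = KLID_TO_TIP[klid]
        return f"{langid}:{tip_engine}{profile}"
    if klid.startswith("E"):
        # Assumption: an "E..." KLID is an IME; this sketch only maps a few of them.
        raise KeyError(f"{entry}: not in this sketch's mapping")
    return entry  # ordinary keyboard layouts such as 00000409 are unchanged


def to_input_preferences(input_locale: str) -> str:
    """Build a <gs:InputPreferences> element from an XP InputLocale value."""
    lines = ["<gs:InputPreferences>"]
    for n, entry in enumerate(input_locale.split(",")):
        new_id = convert_input_locale(entry.strip())
        default = ' Default="true"' if n == 0 else ""  # assumption: first entry is the default
        lines.append(f'  <gs:InputLanguageID Action="add" ID="{new_id}"{default}/>')
    lines.append("</gs:InputPreferences>")
    return "\n".join(lines)


print(to_input_preferences("0409:00000409,0404:E0020404"))
```

Raising on removed and unmapped IMEs, instead of passing them through, is deliberate: silently keeping an old KLID is exactly what produced Yong's "Unsupported parameter" failure.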

Anyway, sorry I never printed this list before; I did mean to but never got around to it. And then I forgot. :-(

Hopefully this will be of use to people going forward!

Sometimes an implementation makes a certain feature impossible.

Take the way Microsoft does collation: the way its DEFAULT table is implemented (a flat DWORD table for everything from 0x0000 to 0xFFFF) means that you can't ever have compressions in the default table.

Could the implementation be expanded to allow for this feature, so that more languages could be a part of the default table?

Certainly.

But the current implementation offers no solution to the problem.
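To see why a flat table rules out compressions, here is a toy Python sketch. It is not Microsoft's actual table format, whose DWORD entries carry several levels of weights; a single made-up integer stands in for the whole entry. A flat array indexed by code point gives each character its weight in isolation, so there is no slot where a two-character unit like the traditional Spanish "ch" could live. Supporting it takes a separate contraction table plus a lookahead on every character.

```python
# Toy flat default table: one made-up primary weight per code point, 0x0000..0xFFFF.
def build_flat_table():
    table = [0] * 0x10000
    for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz"):
        table[ord(ch)] = (i + 1) * 2  # leave gaps so a tailoring can slot in between
    return table

FLAT = build_flat_table()

def flat_key(s):
    # Every character is looked up on its own; "ch" can only be "c" then "h".
    return [FLAT[ord(c)] for c in s]

# A tailoring for traditional Spanish: "ch" sorts as one letter between "c" and "d".
CONTRACTIONS = {"ch": FLAT[ord("c")] + 1}

def tailored_key(s):
    key, i = [], 0
    while i < len(s):
        pair = s[i:i + 2]          # the lookahead the flat table never needs
        if pair in CONTRACTIONS:
            key.append(CONTRACTIONS[pair])
            i += 2
        else:
            key.append(FLAT[ord(s[i])])
            i += 1
    return key

words = ["dedo", "chico", "cuna"]
print(sorted(words, key=flat_key))      # ['chico', 'cuna', 'dedo']
print(sorted(words, key=tailored_key))  # ['cuna', 'chico', 'dedo']
```

The extra dictionary probe on every character is the kind of cost the UCA text below is worried about when contractions end up in a default table that everyone uses.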

Now the Unicode Collation Algorithm does not impose such a limitation; they allow compressions (they call 'em contractions) in their DUCET (what they call their default table).

Thus questions like Doug Ewell's are obvious ones to ask:

The announcement of the Public Review issue stated:

1. The data files contain weights for all new assigned characters.
      b. The ordering for Tamil and Malayalam has been improved,
         but would still need tailoring for the Tamil and Malayalam
         languages.

I guess I'm puzzled why the default order for these two scripts wouldn't match the overwhelmingly dominant language written in those scripts.  It's often stated that the default ordering for Latin also isn't appropriate for any language, but that's more understandable since so many languages are written in Latin.

I don't claim to be an expert in either Tamil or Malayalam.

So why don't they just put everything in the default table to make it better for languages that have no need of the "dumber" version for these letters?

Why not, indeed!

Well, this is described in the UCA in section 3.2 Default Unicode Collation Element Table:

The Default Unicode Collation Element Table does not aim to provide precisely correct ordering for each language and script; tailoring is required for correct language handling in almost all cases. The goal is instead to have all the other characters, those that are not tailored, show up in a reasonable order. In particular, this is true for contractions, because the use of contractions can result in larger tables and significant performance degradation. While contractions are required in tailorings, in the Default Unicode Collation Element Table their use is kept to the bare minimum to avoid such problems.

In the Default Unicode Collation Element Table, contractions are required in those instances where a canonically decomposable character requires a distinct primary weight in the table, so that the canonically equivalent character sequences are also given the same weights. For example, Indic two-part vowels have primary weights as units, and their canonically equivalent sequence of vowel parts must be given the same primary weight by means of a contraction entry in the table. The same applies to a number of precomposed Cyrillic characters with diacritic marks and to a small number of Arabic letters with madda or hamza marks.

Contractions are also entered in the table for Thai and Lao logical order exception vowels. Because both Thai and Lao both have five vowels that are represented in strings in visual order, instead of logical order, they cannot simply be weighted by their representation order in strings. One option is to require preprocessing of Thai and Lao strings, to identify and reorder all logical order exception vowels around the following consonant. That approach was used in Version 4.0 (and earlier) of the UCA. Starting with Version 4.1 of the UCA, contractions for the relevant combinations of Thai and Lao vowel+consonant have been entered in the Default Unicode Collation Element Table instead.

Those are the only two classes of contractions allowed in the Default Unicode Collation Element Table. Generic contractions of the sort needed, for example, to handle digraphs such as "ch" in Spanish or Czech sorting, should be dealt with instead in tailorings to the default table -- in part because they often vary in ordering from language to language, and in part because every contraction entered into the default table has a significant implementation cost for all applications of the default table, even those which may not be particularly concerned with the affected script. See the Unicode Common Locale Data Repository (CLDR) for extensive tailorings of the DUCET for various languages, including those requiring contractions.

Kind of says it all. There is a strong desire not to slow things down for everyone just to help specific languages; from the point of view of the people who write the spec for the algorithm, a tailoring for those languages ends up being the better option overall.
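The two classes of contraction the UCA text does allow in the DUCET can be sketched in Python with made-up weights (these are not real DUCET values, and the helper name is hypothetical). The first class is canonical equivalence: MALAYALAM VOWEL SIGN O (U+0D4A) canonically decomposes to U+0D46 U+0D3E, so the two-code-point sequence needs its own entry to receive the same weight as the single character. The second class is the Thai logical order exception: a prevowel such as SARA E (U+0E40) followed by a consonant is weighted as if the consonant came first.

```python
import unicodedata

# Made-up primary weights, for illustration only (not DUCET values).
WEIGHTS = {
    "\u0D46": 100,  # MALAYALAM VOWEL SIGN E
    "\u0D3E": 101,  # MALAYALAM VOWEL SIGN AA
    "\u0D4A": 102,  # MALAYALAM VOWEL SIGN O, canonically equivalent to 0D46 + 0D3E
    "\u0E01": 200,  # THAI CHARACTER KO KAI
    "\u0E40": 210,  # THAI CHARACTER SARA E, a logical order exception vowel
}

CONTRACTIONS = {
    # Class 1: the decomposed two-part vowel gets the precomposed vowel's weight.
    "\u0D46\u0D3E": [WEIGHTS["\u0D4A"]],
    # Class 2: prevowel + consonant is weighted as consonant, then vowel.
    "\u0E40\u0E01": [WEIGHTS["\u0E01"], WEIGHTS["\u0E40"]],
}

def collation_keys(s):
    keys, i = [], 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in CONTRACTIONS:
            keys.extend(CONTRACTIONS[pair])
            i += 2
        else:
            keys.append(WEIGHTS[s[i]])
            i += 1
    return keys

precomposed = "\u0D4A"
decomposed = unicodedata.normalize("NFD", precomposed)  # "\u0D46\u0D3E"
# Without the class 1 entry these would get [102] and [100, 101] and compare unequal.
print(collation_keys(precomposed) == collation_keys(decomposed))  # True
print(collation_keys("\u0E40\u0E01"))                             # [200, 210]
```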

Microsoft takes it a step further by not even allowing these exceptional cases in the default table; the only one that is really fascinating is the Thai case as it has an interesting story that I'll talk about another day (tomorrow, maybe?).

Now with all that said, there are times that I simply do not buy either Microsoft's or Unicode's argument, mainly when doing the design for a language that big companies are unlikely to ever provide tailorings for in their software. In such cases, putting the entries in the default table, if it were possible (for Microsoft) or desirable (for Unicode), would mean these languages would work in a LOT of places with no extra support required. And it would be nice for there to be a way to provide optimal support for as many people as possible.

Say Microsoft had a "bonus default table" one could opt into, containing all the compressions that would go into the default table if that were possible.

Unicode could solve the problem the same way, with a general-purpose tailoring meant for everyone, except where the performance gained by leaving it out was essential (if Unicode had this they might even be able to pull out some of the ones they have in there now!)....

Disclaimer: I am not an expert or even an inspired amateur in the financial world, and am not claiming to be here.

I was thinking about Las Vegas the other day.

Not the actual Las Vegas but the one from Ocean's 11, which somehow seemed more real than the one in Ocean's 13.

Anyway, in the movie the Nevada Gaming Commission requires every casino to have cash to cover every chip in play on the gaming floor.

I don't know if this is required in real life, though I'm gonna guess that it might not.

But then I thought about articles like this one, with interesting bits like:

“There is less leverage in the entire financial system,” said David A. Viniar, Goldman’s chief financial officer. At Goldman, $1 in capital now supports about $14 in loans and investments, compared with $24 a year ago.

Now obviously one can lose in Vegas, I have done it myself on occasion (it is why I no longer gamble in Vegas at all, actually, other than sometimes in the choice of event or party I go to!).

And obviously casinos can make money.

But what that notion of requiring a casino to have cash on hand to cover its chips does is guarantee for the people playing that if they win they will never lose anyway by being unable to redeem their chips when they are done.

Note that the financial industry has a safeguard to protect those people too, and themselves -- they have the government and the taxpayers to bail them out when they make mistakes, when they give out more chips than they can cover with their cash on hand.

The (fictional?) Vegas idea seems safer, because in that world a player's problems come from the player's own mistakes, not the casino's.

I never before thought of Las Vegas as being a safe bet, though in this aspect betting on a casino's ability to cover its losses is safer than betting on a brokerage house.

A regulation like requiring a casino to have the money to cover the chips has an interesting consequence, doesn't it? It means that the casino isn't gambling with the players' money, trying to spread the risk enough that it never loses it all.

In other words the casinos would not be able to act like these financial institutions do.

There are differences like the odds ultimately favoring the house in Las Vegas and such. But with government propping them up, seems like the house always wins in the financial industry, too.

Of course there are nuances here that I am almost certainly missing and friends of mine like Monica are certain to talk my ear off about how I am comparing apples to bicycles.

But the notion of Vegas feeling safer is a hard one to shake, especially given the real lack of stories of casinos going out of business and customers being unable to cash in their chips, and the real plethora of stories like that in the banking industry that taxpayers are paying billions to cover.

So forget about the financial industry. I won't even do the safe bets like Vegas....

Microsoft is a company based in Redmond, Washington, in the United States of America.

Yes it is a world wide company.

Yes almost 60% of its products are sold to customers who are not in the US.

Yes, there are development centers around the world, and code written in many of them can ultimately end up in Microsoft products.

But ultimately, that original fact is inescapable:

Microsoft is a company based in Redmond, Washington, in the United States of America.

All it takes is something like the DST 2007 snafu to get people to see it. A bug affected users throughout the world (including in the USA, in places like Indiana) for over a decade with minimal help or work from Microsoft, yet as soon as it affected Redmond too, the push to fix problems and help users had vice presidents, general managers, and directors of Microsoft logging phone hours to help users. Afterward there were numerous presentations about how each team dealt with the problem, all of which ignored the fact that they had been ignoring the problem for a decade.

I could give countless other examples, but many are less well known and some might violate my NDA, so for now you will have to trust me that there are other examples.

Now there is no shame in being a company based in Redmond, Washington, in the United States of America.

And I would not want to imply otherwise, either in a blog or in person.

Though there probably ought to be some shame involved in not realizing the pain one causes others (e.g. those other countries dealing with time zone issues for a decade, something I even not-too-gently but not-too-harshly chastised a couple of directors about when that DST 2007 thing was winding down!).

Anyway, take the above as valid; if you don't, then you may as well skip the rest of this blog and maybe even this Blog (since no relationship can really stay healthy when there is no trust!).

Did you know that any developer who is enlisted in the full sources for Windows (sources that include the compiler, linker, header files, and LIBs as well as source) can build Windows?

It is true.

There are in fact developers in many parts of the world who work on Windows who have to do that very thing either occasionally or regularly. Or both.

Many people inside Microsoft have even given presentations about the strengths of such distributed development models and the advantages of being a company so large as to offer the opportunity of such models.

Now, for the other shoe to drop.

To build the full Windows product, all sources, you really must have a default system locale that will cause your default system code page to be 1252.

Such as US English.

The reason for this is that there are some source files that contain characters that are legal in cp1252 but in other code pages are either interpreted differently (incorrectly) or that will cause the build of those files to fail.

I ran across many of these as I was looking at code all over Windows and in most cases was not allowed to "fix" the problem as no one really saw it as a problem.

In almost every case I saw it was the same character (see Dumb quotes... or maybe they are just smart-ass quotes for which character it was) and the problem was in a comment.

A comment that was clearly created in an email written in Outlook using Word as the mail editor and then copied/pasted into the source.

Of course it is not a bug to make this mistake since it is not a bug to make a file unable to compile on another system locale.

Being a company based in Redmond, Washington, in the United States of America, that just isn't a priority....

Now this is all well and good and is generally an internal issue at Microsoft that never impacts a customer in a way they would realize.

But if you look at recent versions of the Windows SDK (formerly known as the Platform SDK), you may see an exception to this generalization.

First we'll look at the older version of the file in question, shobjidl.idl.

This one compiles everywhere.

The non-offending bit of the file, if you scroll down a bit, is:

// IShellFolder::CompareIDs lParam flags
//
//  SHCIDS_ALLFIELDS is a mask for lParam indicating that the shell folder
//  should first compare on the lParam column, and if that proves equal,
//  then perform a full comparison on all fields.  This flag is supported
//  if the IShellFolder supports IShellFolder2.
//
//  SHCIDS_CANONICALONLY is a mask for lParam indicating that the shell folder
//  that the caller doesn't care about proper sort order -- only equality matters.
//  (Most CompareIDs test for equality first, and in the case of inequality do
//  a UI sort.  This bit allows for a more efficient sort in the inequality case.)

Ok, see the problem?

That was a trick question, there is no problem.

Fast forward to a much newer version, like the one in the 6.1 and 7.0 SDKs:

// IShellFolder::CompareIDs lParam flags
// *these should only be used if the folder supports IShellFolder2*
//
// SHCIDS_ALLFIELDS
//
// only be used in conjunction with SHCIDS_CANONCALONLY or column 0.
// This flag requests that the folder test for *pidl identity*, that is
// “are these pidls logically the same”. This implies that cached fields
// in the pidl that would distinguish them should be tested.
// Without this flag, you are comparing the *object* s the pidls refer to.
//
// SHCIDS_CANONICALONLY
//
// This indicates that the sort should be *the most efficient sort possible*, the implication
// being that the result will not be displayed to the UI: the SHCIDS_COLUMNMASK portion
// of the lParam can be ignored. (Before we had SHCIDS_CANONICALONLY
// we assumed column 0 was the "efficient" sort column.)
//
//

Ok, now we have a party.

We have a couple of those quote characters that don't exist in all code pages; in the Japanese code page the bytes they are stored as are lead bytes that cannot stand on their own, which means the file will not compile.

The long and short of it is that if you have a Japanese system locale, you can't use this .IDL file unless you munge it to remove the bogus quotes.
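To make the failure concrete, here is a small Python sketch. It encodes a comment like the SDK's in code page 1252, where the curly quotes become bytes 0x93 and 0x94, and shows that the same bytes decode fine as cp1252 but not as cp932, the Japanese ANSI code page. The find_non_ascii helper is hypothetical, not an SDK or build tool; it is the kind of quick check that flags such bytes so they can be replaced with plain ASCII quotes.

```python
# A comment with Word-style curly quotes, saved as cp1252 (0x93 and 0x94).
comment = "// \u201care these pidls logically the same\u201d.\n".encode("cp1252")

def decodes_cleanly(data: bytes, codepage: str) -> bool:
    """True if the bytes are valid text in the given code page."""
    try:
        data.decode(codepage)
        return True
    except UnicodeDecodeError:
        return False

def find_non_ascii(data: bytes):
    """Hypothetical lint: (line, column, byte) for every byte above 0x7F."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_no, col, byte))
    return hits

print(decodes_cleanly(comment, "cp1252"))  # True
print(decodes_cleanly(comment, "cp932"))   # False: lead byte 0x94 followed by '.'
print([(l, c, hex(b)) for l, c, b in find_non_ascii(comment)])
# [(1, 4, '0x93'), (1, 39, '0x94')]
```

Note that the opening quote does not fail on its own here: 0x93 is also a lead byte in cp932, so it quietly pairs with the following "a". It is the closing quote, followed by a period that cannot be a trail byte, that breaks the decode.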

Now I don't know of any devs who write either code or comments in Word, but getting an email containing an "updated comment to better explain this bit" seems pretty plausible and not at all uncommon (if you ignore the relative rarity of such updates).

Oops.

This oops is in a couple of Windows SDK editions and some of those that shipped in products like Visual Studio and in the not-yet-shipped VS 2010.

In fact, I don't think it will be fixed for VS 2010 since they ship an already shipped PSDK and there won't be an update they pick up before they ship.

Oops again.

Anyway, they're on it now, and this will get fixed at some point.

That fix will eventually end up everywhere.

If you hit this problem, maybe you will feel somewhat less unhappy knowing that people like me can hit it a bunch of times in a night when doing a full Windows build. So I share your pain....

And we are still a company in Redmond, Washington, in the United States of America.

Regular reader Jan Kučera asked me via the contact link:

Hello Michael,
first time using this contact form, I hope I have chosen the most appropriate way for my question. :-)

I would like to ask if you have any plans attending or speaking at the Tamil Internet conference 2009 in Köln this October...?

Now generally speaking I'd say that Jan chose the ideal way for this type of question, but I have had a couple of other people ask about the conference and whether I would be there, so I thought I'd just blog about it anyway....

Indeed there is a Tamil Internet 2009 being held in Germany (a lot more info about it here and it is indeed being held in Köln, which is to say Cologne).

When I first heard about the conference, I had originally considered submitting a fuller version of that Behind the Proposed Change to Tamil in Unicode presentation I did for Unicode (slides here), with more of the follow-up info and the interesting code chart update issues (like what happened and can keep happening in Unicode, and the one I did and the one Scott did on Wikipedia).

I imagined providing slides with both German and Tamil subtitles, or perhaps just separate German and Tamil versions of the slides. It is [perhaps not so] surprisingly easy to find volunteers to assist with such efforts, as I have discovered in many presentations I have done abroad in the past! :-)

Unfortunately, I can't even get funding from my company to fly down to San Jose for a standards meeting that Microsoft has an official relationship with; flying to Germany is something that I would be totally on my own for and I lack those kinds of funds.

Having been to a few of the previous conferences I know it would have been very interesting, and if an unknown wealthy uncle passed away leaving me a sack of bullion or I won the lotto next week I might be sending some urgent email to the conference chair begging for a last minute slot in which to do the talk.

However:

  • I have no relatives with sacks of gold lying around;
  • And I almost never buy lotto tickets (yes you have to play to win but by coincidence playing is the way to lose as well!);
  • Not to mention the lack of wealthy benefactors in this economy;
  • Plus there is a Halloween party about 1000 miles south of me that I definitely have to be at a week later - meaning there'd be no tourist time in Cologne.

So in the end, this is one I'll really have to sit out.

Perhaps some future conference will be in the US or Canada and an even more updated version of the talk might be in the cards. My Tamil and Bengali studies continue (albeit slowly) and might even be the subject of a second interesting talk about what Unicode does to (slightly help but mostly hinder) language learning.

If you will be there then be sure to have a drink and if possible tell me when ahead of time so I can do the same from here!

It started with an expression.

One I got from a movie.

The name of the movie was Finding Forrester.

This is a movie I liked a lot, though this post is about one thing from it in particular. The relevant dialog from the movie, between William Forrester (Sean Connery) and Jamal Wallace (Rob Brown):

William: You better stir that soup.
Jamal: What?
William: Stir the soup before it firms up.
Jamal: Why doesn't ours get anything on it?
{{William looks out the window through the camcorder he is holding}}
William: Come on. Closer. Now.
Jamal: You got someone doing that kind of yelling?
William: What I have is an adult male. Quite pretty. Probably strayed from the park.
{{William shows Jamal the image on the camcorder}}
William: A Connecticut warbler.
Jamal: You ever go outside to do any of this?
William: You should have stayed with the soup question. The object of a question is to obtain information that matters only to us. You were wondering why your soup doesn't firm up? Probably because your mother was brought up in a house that never wasted milk in soup. That question was a good one, in contrast to, "Do I ever go outside?", which fails to meet the criteria of obtaining information that matters to you.
Jamal: All right. I guess I don't have any more soup questions.

Now this shows up a couple more times in the movie, times when one of them has a question and the other responds:

Not exactly a soup question, is it?

I picked up the whole concept from this movie, and from time to time I find myself thinking about the questions others ask and classifying them by whether or not they are soup questions.

I suppose if you wanted to define a soup question more succinctly, say for that top entry in the Urban Dictionary, you could think of it in terms of its antonym: a question that is "not exactly a soup question" is one that is really none of the asker's business.

I don't usually say it as often as I think it, mainly because most people don't get the reference.

But I find it to be a useful one, as there are entirely too many questions people ask that are not, in fact, soup questions.

I will give a technical example another day (tomorrow, unless something bumps it to later in the week) but for now will stay away from the technical, if that is okay.

And to be frank even if it isn't.... 

Anyway, this last Saturday I happened to post a Twitter tweet/Facebook status:

Michael is sticking to soup questions, and tequila, for the rest of the weekend.

to which my friend Melanie responded:

What kind of soup?

Now this is a fascinating question.

I live in Seattle and Melanie lives in San Francisco.

So if there were actual soup (which there was not; this was a metaphorical thing, as the above exposition implies), then the kind of soup, while relevant to me, would not be important or meaningful to her. There is no way the type of soup could have any effect on her whatsoever.

Thus the question "what kind of soup?" is not, in this case, much of a soup question.

Despite being a question pretty much only about soup!
