Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
Warning: this blog will not be as nice as some of the other blogs in this Blog have been, previously.
Remember how I used to have a Unicode character sponsor every blog in this Blog?
Well yesterday I was in my Twitter account (http://twitter.com/michkap) having some random goofy moments, so I was tweeting some sponsorship tweets, with characters sponsoring me.
Like this one:
is sponsored by ䷲ (U+4df2, aka HEXAGRAM FOR THE AROUSING THUNDER). This one spins itself, no additional comment required. #fb
Now people who had seen the feature on the blog would see this as nothing new. But this was a novel thing for me in Twitter and probably a good reason that most people think Twitter can be a waste of time since it was not doing anything useful.
Now since the primary purpose of Twitter (for me) is to bring some of the discipline that its 140-character limit imposes on tweets and extend it to the statuses in my Facebook account (http://facebook.com/michkap), that #fb hashtag at the end causes a Facebook app to pick up the tweet and make it a Facebook status.
This lets me waste twice as much in the way of people's time while spending half the time that would usually require.
Anyway, I managed to [re]discover a problem this way.
You see, fellow blogger Larry Osterman is also a Facebook friend of mine, and he noticed a problem with this tweet:
do you have a link for those of us who are running a unicode challenged browser like IE?
Now I had been running in FireFox for reasons not terribly relevant here relating to a bug which I am told has been fixed but it takes me a while to recover.
So my view was like this:
Clearly I was able to see the character.
So I launched Internet Explorer to see it as Larry was seeing it:
and yes, the character is not visible there. Bummer.
Now I know some people hate Internet Explorer.
And one could jump on the bandwagon and show this is convincing proof that FireFox rules while Internet Explorer drools but one would be wrong to do so.
Because this problem is not Internet Explorer's fault.
Well, not really.
It is a problem of group focus versus customer scenarios, in actuality. I should probably explain:
You see, the Windows team is most focused on and concerned with the product version they are working on. This makes sense since for the most part there is another sustained engineering group that is most focused on prior versions.
But if you are a member of a group that produces Microsoft Office or Visual Studio or the .NET Framework or SQL Server or Internet Explorer then you know you have to run on other versions. So the exciting features of the latest version of Windows are of interest but hardly the only consideration, since these products have minimal interest in sucking on earlier versions of the operating system.
The picture:
Got it?
Now the NLS folks have owned MLang for years, as part of a restructuring that I believe dates from when the original IE6 team kind of disbanded and went to the four winds. In its own way that is kind of unfortunate, since the reconstituted IE team did not take it back but relied on the NLS team to continue to own this library.
Why is it unfortunate?
Because the NLS team didn't touch it.
They fixed security bugs and occasionally fixed major reported problems (though usually did not touch those either, due to lack of testing resources to verify changes or backcompat concerns).
In my opinion every time a feature was better developed in Windows than MLang, the MLang feature should have been gutted and made into a wrapper around the non-MLang version. And any feature that did not exist elsewhere but was needed by the IE team should continue to be maintained for the simple reason that Internet Explorer was depending on it, and for some people that partner team's product is the only place that some international features would ever be used with any kind of frequency. And if not then we should just give it back to them and let them control their destiny here. This worked with .NET (where we chose to continue to own the globalization pieces) and in Office (where we provided them with snapshots of the data).
I was regularly overruled on this opinion.
So support for several international features in Microsoft's premier Internet platform piece just started falling farther and farther behind.
In this particular case, the case that Larry noticed here is yet another side effect of the problem I mentioned in The importance of Tagalog to Burmese, aka "Of course I'd lie to you, I'm a font!".
Yet another bit of font support that the typography team worked so hard to support - in this case to add to the new Segoe UI Symbol:
becomes a great opportunity to make FireFox look better on Windows 7 while Internet Explorer 8 gets to look dumb for no good reason.
Note that after I first put up The importance of Tagalog to Burmese, aka "Of course I'd lie to you, I'm a font!" I made the recommendation that at a minimum the two bugs get fixed (since those were scripts that Windows claimed to support and were legitimate bugs) but ideally the basic table get updated to support everything in Unicode (which would be harder; the bug only involved entries in a table while the full fix involves some new script IDs which means other work).
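To make the "entries in a table" point concrete, here is a minimal sketch of a range-based lookup of the general kind a font-linking layer might use. Everything in it is an assumption for illustration: the row layout, the labels, and the `lookup` helper are hypothetical and are not taken from MLang or mlflink.cpp. The only hard fact used is that the Yijing Hexagram Symbols block spans U+4DC0 to U+4DFF.

```python
from bisect import bisect_right

# Hypothetical (start, end, label) rows, sorted by start.
# These are NOT the real MLang data, just stand-ins.
OLD_TABLE = [
    (0x0000, 0x024F, "Latin"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0900, 0x097F, "Devanagari"),
]

# The same table with one extra row for the block the tweet used.
NEW_TABLE = sorted(OLD_TABLE + [(0x4DC0, 0x4DFF, "Yijing Hexagram Symbols")])


def lookup(table, code_point):
    """Return the label whose range contains code_point, or None."""
    starts = [row[0] for row in table]
    i = bisect_right(starts, code_point) - 1
    if i >= 0 and table[i][0] <= code_point <= table[i][1]:
        return table[i][2]
    return None


if __name__ == "__main__":
    # U+4DF2 is the hexagram from the tweet.
    print(lookup(OLD_TABLE, 0x4DF2))
    print(lookup(NEW_TABLE, 0x4DF2))
```

In a design like this, a code point that falls outside every row simply has nothing to link to, no matter how good the installed fonts are, which is why a stale table alone would be enough to produce the symptom.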
I was overruled since the idea of updating MLang was simply not one that the folks deciding stuff wanted to entertain.
Personally, I think Internet Explorer should just make a land grab and take back MLang, doing a good solid job on it to bring their support to where it ought to be. Because being owned by the NLS team is a good thing when they are supporting you and your goals, but it really sucks if you are being put in maintenance mode. IE8 is by report a pretty good browser and deserves to be treated with more respect by its partners.
Perhaps this won't happen either, but if nothing else maybe the team that owns MLang now (post Windows re-org I cannot claim to know with 100% certainty who that is) can be shamed into updating a frigging table. Either on their own or with help.
I could do the bug fix work myself in an afternoon with updates to one source file. I'll even give them the updated mlflink.cpp source file myself if they are worried about the time sink to look up the latest Unicode information. I'll even give the update to the SE folks in case they would like to unlameify any of the prior versions of IE. Plus I'd help whoever wanted to do the full fix any way I could....
Internet Explorer 9 (and frankly Internet Explorer 8 and Internet Explorer 7) deserve better.
This post brought to you by ䷐ aka ䷐ if you explicitly tag the font (U+4dd0, aka HEXAGRAM FOR FOLLOWING - as @DaleSchultz pointed out to me, a great character for Twitter!)
A few days ago, via several different methods (the Visual C++ Development Center forum, email to my non-Microsoft account, the contact link here, multiple off-topic comments with increasing impatience apparent in each for a solution), Rajesh asked:
Hello Michael:
I have visited your blog, and know that you are an expert in Windows Uniscribe, here I have some questions about Uniscribe to ask you.
Inter-character spacing for labeling results in a composite text collection with each character being split as a separate one. Hence each character is presented as a separate one and cannot arrive at a combination character. Problem with combinational characters is not only specific to right to left language( Arabic Language- Example:يُساوِي), the problem can exist with left to right language(Hindi Language - Example:ठऑक्षझॉ) also.
So,Please let us know if there exists any API that identifies the given set of pre composed characters comprises a composite character.
Thanks in advance,Rajesh Reddy
Now of course I generally can't do the kind of 1-on-1 support that the many messages entailed, and people who are looking for support like that really need to find a more appropriate method, as I point out in my Contacting Me link.
But the question is an interesting one, and the blog that was going to be put in for today has to have a bit more done to it, so I thought I'd take a stab at it.
For starters we'll have to take the word composite out of the mix. Not that the word isn't descriptive enough, just that it carries some baggage with it. It can confuse people into thinking the question is more about code pages and the difference between what Microsoft calls composite vs. precomposed sequences. This is the problem that the support engineer had in this forum thread at first.
Now the biggest problem is in the assumption that simply adding space in between every character is the right thing to do, as any language/script that does shaping when certain characters are placed next to each other will fail -- and this is the very problem that Rajesh points out.
What someone trying to do a complex operation like full justification could use is the information that Uniscribe returns from its ScriptString_pLogAttr function (if one is using the ScriptString* functions) or the ScriptBreak function (if one is calling the fuller low-level Uniscribe functions) -- in particular the array of SCRIPT_LOGATTR structures that each function returns, which for each character Uniscribe is processing will provide all of the following information:
Now once one has all of this information, one knows the safe places where space can be inserted if one is trying to extend the width of a line in order to make the justification match other lines, if one is using simple space insertion to do so.
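As a rough illustration of "safe places", here is a small sketch that groups a string into clusters so that spacing is only ever considered between clusters, never inside one. It is a simplified stand-in and does not call Uniscribe: it only attaches combining marks (Unicode category M*) to their base and keeps a Devanagari consonant attached after a virama. The `clusters` and `safe_gaps` names are hypothetical, and the sketch knows nothing about Arabic cursive joining, which is one more reason simple space insertion falls short.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA


def clusters(text):
    """Group text into base-plus-marks clusters (a rough approximation)."""
    result = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        # Assumption: a letter right after a virama belongs to the conjunct.
        joins_conjunct = bool(result) and result[-1].endswith(VIRAMA)
        if result and (is_mark or joins_conjunct):
            result[-1] += ch
        else:
            result.append(ch)
    return result


def safe_gaps(text):
    """Return string offsets that fall between clusters, not inside them."""
    offsets, position = [], 0
    for cluster in clusters(text)[:-1]:
        position += len(cluster)
        offsets.append(position)
    return offsets


if __name__ == "__main__":
    print(clusters("e\u0301a"))            # base letter plus combining acute, then "a"
    print(safe_gaps("\u0915\u094d\u0937a"))  # conjunct stays together
```

A real implementation would take the character-stop and word-stop information from SCRIPT_LOGATTR rather than guessing from character categories.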
But this is the wrong approach.
Note that in pretty much all cases such an algorithm has a pretty fundamental flaw, which is that the actual widths one might need to insert can be different and using full characters between the words will make the text jagged on the far edge (as can the different widths of the words themselves).
The better way to perform such operations is by use of the ScriptJustify Function as possibly modified by a more advanced editor, as the function indicates:
This function provides a simple implementation of multilingual justification. It establishes the amount of adjustment to make at each glyph position on the line. It interprets the SCRIPT_VISATTR array generated by a call to ScriptShape, giving top priority to kashida. The function uses interword spacing if no kashida points are available. It uses intercharacter spacing if no interword points are available.
Note: Sophisticated text formatters might generate their own delta dx array by combining formatter-specific features with the information retrieved by ScriptShape in the SCRIPT_VISATTR array.
The application should pass the justified advance widths generated by ScriptJustify to ScriptTextOut in the piJustify parameter.
ScriptJustify creates a justified array containing updated advance widths for each glyph. When an advance width for a glyph is increased, the extra width is rendered to the right of the glyph, with a white space or, for Arabic text, a kashida.
This is the Uniscribe model for dealing with the kind of advanced justification one might see in a program like Word or PowerPoint or Publisher -- as it can be used to precisely place text to allow desired justification to take place....
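The priority order in the quoted documentation (kashida first, then interword spacing, then intercharacter spacing) can be sketched in a few lines. This is only an illustration of that ordering under simplifying assumptions: the `kinds` labels are hypothetical stand-ins for what SCRIPT_VISATTR reports, the extra width is split evenly, and the real ScriptJustify takes more inputs and makes finer distinctions than this.

```python
# Opportunity classes in descending priority, per the quoted documentation.
PRIORITY = ("kashida", "space", "char")


def justify(advances, kinds, extra):
    """Return new advance widths with `extra` units added.

    advances: per-glyph advance widths (integers)
    kinds:    per-glyph label: "kashida", "space", "char" or "none"
              (hypothetical labels for this sketch)
    extra:    total width to add to the line
    """
    justified = list(advances)
    for kind in PRIORITY:
        slots = [i for i, k in enumerate(kinds) if k == kind]
        if slots:
            # Spread the extra width evenly; earlier slots absorb the remainder.
            share, remainder = divmod(extra, len(slots))
            for n, i in enumerate(slots):
                justified[i] += share + (1 if n < remainder else 0)
            break
    return justified


if __name__ == "__main__":
    widths = [10, 5, 10, 5, 10]
    kinds = ["char", "space", "char", "space", "char"]
    print(justify(widths, kinds, 7))
```

The point of the shape is that the adjustment lives in the advance-width array handed to the renderer, so no characters are ever inserted into the text itself.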
For the other issue, the way of getting my (or anyone's) attention, I expect that in most cases it helps if one just thinks of me not as an employee of your company or of you personally, but as someone who has a job and really just blogs because it is fun and interesting to talk about the things that interest me (such as Uniscribe). If you met such a person, how would you approach them? If you had their email address, how would you word the email? And what would your expectation be? I expect the majority of people who frame the question that way will come up with an appropriate answer.
If the answer is needed urgently (which I assume it is) then there are many more formal support options that will guarantee the timeliness of the response, much more effectively than shouting the question from the rooftops (sometimes I end up involved with those too, and I serve at the pleasure of the customer).
I mean if they have an interesting enough question maybe I'll answer anyway. But my interests are pretty hard to pin down sometimes, and even the girl I go out with wonders how she catches my eye (though she does and I suppose once one catches my eye and not my ire then the hardest part is taken care of!).
And all of that is ignoring the challenges of figuring out my blogging schedule!
I got an email from Mike the other day:
Hi Michael, Just a quick FYI, a bit of great news (I guess 15 years is as good a time as any). VC2010 now generates Unicode RC files (when using the project wizard to generate a new app). Wow, I'd never thought I'd see the day. It was a great day when VC2005 actually supported opening and saving of Unicode RC files, but this is the icing on the cake. Now all those people using obsolete source control systems and diff utilities are really gonna have to update to support these newly generated projects that include Unicode RC files or they're in for a surprise :)
Woo hoo!
I agree this is very good news, and very good icing on this particular cake.
Remember when I talked about the first part of this, in The Unicode train is leaving the station, back in 2005?
I remember inside Microsoft how awful the diff'ing situation was until someone updated WinDiff to support Unicode; Mike has a good point about those other diffing programs!
The question that came in was:
My customer have a questions with EUDCEDIT program on Windows XP.
As we know, if we use EUDCEDIT to add some characters on XP. It will create two files: eudc.tte and eudc.euf.
The question is, if we lost eudc.euf file, could we restore this file from corresponding eudc.tte? Because even we only have eudc.tte, the new characters still can work well.
Now I have written about EUDC before, on several occasions. But knowing something about how it is used and how it interacts with different parts of the system doesn't necessarily make someone knowledgeable about the authoring issues.
I mean, I had some thoughts on the subject but this seemed like a better one to get some more expertise on....
Luckily Peter was around to provide the answer that I suspected was true:
I believe the .euf file contains the originally-edited bitmap data from which the .ttf is edited. If the .euf is lost but you still have the .ttf, then you can display those EUDC characters, but any editing of the glyphs would have to be done in a different font-editing tool, such as Fontographer or Fontlab. I don’t know of any way to recover the .euf file from the .ttf file. (In theory, it should be possible to generate an .euf that was approximately the same, but I don’t know of any tools that support that.)
It is possible that he is being a shade optimistic about tools being able to view/edit the .TTE files, but there aren't a whole lot of technical issues blocking it, so if they don't then they ought to. EUDCEdit itself is primitive enough that sophisticated options such as this seem a little out of scope, but of the many tools out there I assume some must be able to do something with TTE files.
Any of the regular readers here know for sure?
Developer Jason (an enthusiastic reader of the Blog) asked:
We need to be able to convert UCS-2/UTF-16 to a user-specified SBCS/DBCS/MBCS code page. Currently, we achieve this by simply taking the UCS-2 string and passing it on to WideCharToMultiByte with dwFlags set to zero. When converting to the Vietnamese code page 1258, this process can’t find a representation for the Vietnamese character U+1ec5 (Latin e with circumflex and tilde) even though one actually does exist (albeit with a combining diacritic from code page 1258: 0xea 0xde).
Converting Vietnamese glyphs from the Unicode BMP to the corresponding glyph representation in the Vietnamese code page seems like a reasonable thing for us to be doing. My question is, should I be expecting WideCharToMultiByte to know this and successfully convert the character? I can’t be the first person to hit this issue and I imagine the mapping tables have been reasonably static, so it seems like perhaps there is something more that I should be doing. Is there, for instance, an expectation that the input string is normalized into some canonical form before calling WCToMB? Presumably decomposed form?
An interesting question that will really draw on information from several different blogs from this Blog:
There are several people who tend to be dismissive about this code page, calling it at best incomplete and at worst broken. From a Unicode standpoint it certainly is, and arbitrary, to boot!
But there is a reasoning behind the code page, a point to which regular reader John Cowan's comment to blog #5 above is particularly relevant:
There's a Vietnamese-specific logic to CP 1258 that transcends the arbitrary Unicode normalization rules. The breve, circumflex, and horn accents, unlike the rest, affect vowel quality. If you look at a Vietnamese alphabet like the one at Wikipedia, you'll see that A WITH BREVE, A WITH CIRCUMFLEX, E WITH CIRCUMFLEX, O WITH CIRCUMFLEX, O WITH HORN, and U WITH HORN (as well as D WITH STROKE, which isn't Unicode-decomposable) are considered separate letters from their unaccented correspondents. Consequently, in 1258 they are encoded using seven precomposed characters.
On the other hand, the grave, acute, hook above, tilde, and dot below accents are tone marks, conceptually not part of the letters they appear on. They're encoded using combining characters, since encoding them using precomposed characters would create a combinatorial explosion of 12 x 6 x 2 = 144 distinct vowel characters. (The VISCII encoding actually does that, at the expense of filling the whole 0x80-0xFF space with letters and even usurping six of the control characters!)
Unsurprisingly, Vietnamese conventions always place the tone mark outside any breve, circumflex, or horn diacritic (and therefore following it according to Unicode rules).
Thus it is incorrect to say that U+1ec5 is not supported by cp1258; it may be true that U+1ec5 (ễ, aka LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE) is not supported as a discrete single code point, but U+00ea U+0303 (ễ, aka LATIN SMALL LETTER E WITH CIRCUMFLEX + COMBINING TILDE) is - and according to Unicode those two things are to be treated as the same thing. Given that the tilde is a tone mark in this particular case the language-specific way in which the various components of the letter are used makes sense whether the form follows Unicode normalization rules or not.
Which in this case it doesn't.
Thus the wider question of which Unicode normalization form to use that was one of the main points of Jason's inquiry is in fact a trick question: the answer is neither!
Instead, the Microsoft-specific normalization pseudo-Form V mentioned in #5 above is what would be needed here if one wanted to convert.
Now that is a big if in that last sentence.
Since Microsoft's Vietnamese keyboard layout produces text that will be perfectly represented on code page 1258, there are only three scenarios where one would need that "pseudo-Form V" to convert out of Unicode:
For the third point the quick answer is to just not do that, if it is possible.
But of course even that is not always possible, so if the sad truth is that some component that cannot be changed is putting the data in some other form, then some type of conversion from [probably] normalization Form C to pseudo-Form V would be needed.
This is something that does not exist, though the only requirement that a single-byte code page such as 1258 cannot handle is the case where one code point would need to be converted to two, e.g.
and so on through all the other various letters covered by the code page.
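The one-code-point-to-two conversion described above can be sketched in Python, whose standard library ships a `cp1258` codec. This is a minimal sketch under stated assumptions, not Microsoft's pseudo-Form V: the `to_cp1258_friendly` helper is hypothetical, and its rule (keep any character cp1258 already encodes, otherwise recompose the base letter with its vowel-quality mark and leave the tone mark as a combining character) is my reading of the logic in John's comment.

```python
import unicodedata

# Marks that change vowel quality: circumflex, breve, horn.
QUALITY_MARKS = {"\u0302", "\u0306", "\u031b"}


def to_cp1258_friendly(text):
    """Rewrite text so that Vietnamese letters fit code page 1258."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        try:
            ch.encode("cp1258")      # already representable as-is
            out.append(ch)
            continue
        except UnicodeEncodeError:
            pass
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        quality = "".join(m for m in marks if m in QUALITY_MARKS)
        tones = "".join(m for m in marks if m not in QUALITY_MARKS)
        # Precompose letter + quality mark; keep the tone mark combining.
        out.append(unicodedata.normalize("NFC", base + quality) + tones)
    return "".join(out)


if __name__ == "__main__":
    converted = to_cp1258_friendly("\u1ec5")
    print([hex(ord(c)) for c in converted])
    print(converted.encode("cp1258"))
```

For U+1EC5 this yields U+00EA followed by U+0303, the pair Jason's question quotes as 0xea 0xde in code page 1258. Characters that have no cp1258 form at all pass through unchanged and would still fail at the final encode step.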
Unfortunately a simple, table-based double byte code page could not properly support such a custom "Vietnamese Plus" code page mapping.
EXTRA CREDIT: Can anyone here discern and/or explain why, exactly? :-)
Thus one could build a DLL-based mapping (as in Custom code pages? Redux) and just keep these tables around in code if one wanted to. But one would obviously have to have some vested interest in wanting to (e.g. a need to support cp1258 data with data in Unicode that isn't currently in pseudo-Form V).
I was most of the way to having this done (auto-generated) to post as a sample before it occurred to me that there might be very good reasons for a full-time Microsoft employee, even a pain in the ass one like me, not to post such a thing.
Though if anyone wants to do it, note that I was using cp 51258, for obvious reasons.
If you wanted to create such a DLL-based code page and there is any way to create a standard usage out of a non-standard/unsupported code page, I would encourage you to do the same! :-)
Now for the record let me say this is an area where I do not really tend to agree with the Microsoft party line completely. I mean, I truly believe that Unicode is the best answer here in the long run, but I am hardly naive enough to believe that everyone has made that change yet and surprisingly [to some] not obnoxious enough to think it is acceptable to do nothing further to assist customers. Especially when we expect people to migrate and we know we aren't the most popular non-Unicode solution, the fact that we provide no assistance here and aren't even remotely apologetic as we vote to make Unicode less and less compatible with our own solution even as we make it harder to use is really not my style. To be honest, the fact that we do not have a better solution for integrating with Unicode in the Vietnamese case is also pretty bad -- not even the excuse of backcompat, the only explanation is that no one wants to do the work because supporting Vietnamese correctly and more consistently with Unicode just doesn't hit anyone's radar. So no one wants to use us and the problem perpetuates itself.
When you consider in particular the history of Microsoft in regard to VNI, it just makes Microsoft look worse. Perhaps there are even legal reasons related to the VNI thing that we are required to suck here that no one has told me about?
But anyway and either way, that is how things are right now, so my dissenting opinion is unlikely to reach any higher level than the blog post you are reading....
This morning, I am over at her place (I'll talk about the identity of "her" some other time!), around her as she hurriedly prepares some victuals for an upcoming "Suffer with the Seahawks" party at a friend's house and I ask her if she'd like me to grab a coffee from the nearby Starbucks.
Grateful about the thought, she mentions she'll need to lend me her key to get back in to the apartment (the building she lives in is secure, with an outer locked fence and of course the locked door for the building itself). Fair enough, I say.
And I head out to get her venti iced coffee with skim, 7 pump classic, very light ice.
I had her text me the coffee order, for fairly obvious reasons. :-)
I get the coffee and head back, briefly acknowledging the people who wanted to tell me how amazing the iBot is. Everyone does that, even the ones who don't see dancing or anything complicated happening in it. I just smile and agree, mostly.
Where was I?
Oh yeah, bringing back the coffee.
I get to the outer gate of the apartment building.
And suddenly realize that having the key is not going to help me.
Because opening that gate requires turning the key and while it is turned pulling the gate open. The heavy gate, which does not seem terribly fond of opening and requires a real effort to open.
I'd pushed the gate open from the outside several times, so I know the iBot can handle the pushing it open part.
But the iBot is slower at the reverse, and I lack the number of hands to turn the key, pull the knob, and pull back with the joystick, all while balancing the coffee in my lap.
After some failed tries that leave splashes of my own grande espresso chocolate truffle, three shots, no whip on my jacket sleeve, I give up on the side gate and head for the front gate.
The front gate's only real difference is that it is wider, it is still heavy and difficult to open from the outside with an iBot. Oh, and this time some of my grande espresso chocolate truffle, three shots, no whip also splashed on my jeans as my attempts failed.
In my mind's eye I recall heading up the hill with her and watching her have trouble with these gates herself at times, and in a too-late flash realize that I need to yell for help.
Maybe I was too hasty and should have tried the regular door next to the garage door to the basement garage of the building. But now I'm wearing enough of the grande espresso chocolate truffle, three shots, no whip that I'd really not want to take the risk that I'd be trying on some venti iced coffee with skim, 7 pump classic, very light ice.
So I text her and she heads down. And she rescues me.
She is doubtlessly thinking about her sister with a complete spinal injury who will be visiting her in this new apartment at some point, and who could easily run into the same sort of problem, as she talks about saying something to the management.
After all, the building is an anomaly in Capitol Hill -- ramps from the street to the building's doors, an elevator to get to any floor. In a location that is as likely to have multiple stairs to get inside the entrance of a four-story walk-up with no elevator, the building is mostly a pleasant surprise that I've been happy about every time I've visited.
Not looking at me as inadequate at all, she understands and even as I apologize in my embarrassment she assures me that it's not my fault.
But I hate the reminder of things I have trouble doing now.
I am back at the 2007 TypeCon pre-con event, at a building that was only accessible via a flight of stairs and that nevertheless had handicapped-accessible bathrooms. And I had to climb up the stairs on my own while others pulled the 70-pound scooter up the stairs. They were embarrassed for the sake of the building they did not own and weren't blaming me at all.
Or maybe back at SDGN in Amsterdam in the evening after the conference with Stephen Forte and Richard Campbell, when while walking with a cane I found I couldn't anymore, and needed Richard to practically carry me the rest of the way back to the hotel. Neither of those two guys was blaming me either, even though I was almost certainly prematurely cutting off the night in Amsterdam.
But then, as now, I don't want people understanding why I can't do stuff; I want to just be able to do stuff.
I want to borrow her key again and practice with that gate, spending five hours if that is what it takes until I figure out how to get in, even knowing in my mind if not my heart that I'd have more fun just going in and hanging out with her.
Feeling pathetic has little to do with whether the people who see you think you look pathetic; that can help but it certainly isn't required!
I'll be over this soon enough, whether I spend those 5 hours to try and win my war with the gate and even whether I succeed.
But how to get over not being able to do stuff that my own experiences tell me is do-able?
The Suggestion Box has been getting way too busy during my extended absence from blogging, so I thought I'd clear out a few today. :-)
First, from Jeroen Ruigrok van der Werven, who asked:
Hi Michael,
any idea why the locale identifier is missing for Corsican at http://msdn.microsoft.com/en-us/library/ms776260
Also, what additional locales does Windows 7 have over Vista?
Now I'll be honest with you about these two topics (Locale Identifier Constants and Strings and Language Identifier Constants and Strings). I don't like them. They are huge lists that are static looks at a list of locales, but neither says when it is from, specifically. Which is really the only useful purpose for HAVING such a static list (knowing what point in time the list is designed to encompass). You could use the list itself to work backward and discern the dates under the "if you already know the answer you'll be able to find out the answer" doctrine, but that's kind of not the point.
The previous paragraph gives a hint as to how one can understand the answer to the original question -- obviously the one missing an LCID came before the one not missing it. Further along the lines of FDA (Forensic Documentation Analysis, a common science seen on the TV show CSI: MSDN) one can look at the top line info on each page, the latter with
MSDN -> MSDN Library -> Win32 and COM Development -> User Interface -> International Support -> Globalization Services -> National Language Support -> National Language Support Reference -> National Language Support Constants -> Language Identifier Constants and Strings
and the former with
MSDN -> MSDN Library
The fact that the one has several locales marked Vista and later and the other has a locale not in the first makes it clear that one is from Vista and the other is from Windows 7, and that the Vista topic is "orphaned" from the full indexing of a Table of Contents.
So the topic without the locale in question should probably just be removed or something.
The second question comes from Jan Kučera:
Hi, glad to see you back, Michael!
Actually I've already tried twice to post the following question some time ago, but it did not go through. However, I know how you like sorting topics, so I thought I might try it once more… here we go:
Say I have a web page, and I'm providing the content in several languages. Now the question is, what do you thing is the best way to present the languages available, in which language and the most interesting – in which order?
I've told to myself – okay, let's see how the MSDN does it. If I open the MSDN web, I see "Česká Republika - Česky" in the top right corner. Hmm well, if you happen to not know the Czech language, I guess you have no idea this is the language selector. Anyway, here is the list expanded:
Argentina (Español)
Australia (English)
Brasil (Português)
Canada (English)
Canada (Français)
中国 (简体中文)
Colombia (Español)
Deutschland (Deutsch)
España (Español)
France (Français)
India (English)
México (Español)
Perú (Español)
Россия (Pусский)
United Kingdom (English)
United States (English)
What is the reason for listing the country first and then the language? I also see both country and language are displayed in the native language, which prevents you discovering the selector if the page is displayed in a language you are not familiar with. Now after couple of your posts about sorting, I see this list is sorted neither by language, nor by country (the non-Latin characters would go down, right?). So my first guess is that the list is sorted by country language in English – an item not in the list – a bit confusing, especially if you don't know the English names – though quite interesting idea for me. And now, what sort is used? Assuming the 中国 thing is China, it seems to be sorted using English rules, because in Czech (in which the web is shown), this would go after Colombia. Wasn't it me to whom did you advised to use the sorting expected by the user? Funny is that if you click to display more languages, the combo box is sorted different way, I would say by native country names (non-Latin at the bottom), accent insensitive (Česká republika before Chile) using English rules (Chile before Colombia) – at least that page is in English only. Though for me, looking for Česká republika in the middle of 'C' names is really weird.
Looking at another sites, I see everyone implemented it differently. So...I wonder, do you have any thoughts what could be the most correct way?
Thanks!
I don't know how back I was in April when Jan put this question up, though I think I'm back now. :-)
There are several questions packed in there; let's try and get all of them.
First, putting the country followed by the language is just a choice that several of the pages and subwebs of microsoft.com make to underscore the reality of the different subsidiaries (both to help foster a sense of "ownership" and to make it clearer who is best to be looking after the interests of accuracy, whether subsidiary or localizer, etc.).
Now putting the region/language in the native language is a common way to keep the readers on the pages they are most likely to find useful - if you can't read it, then why would you click on it? :-)
As for the two lists, the first is the one in the dropdown:
where the second list is on the separate page:
It is clearly based on the less sexy list sorted alphabetically, though not by a specific language, as Jan mentioned.
One of the bad things about it is that it will place all of the non-Latin script names at the end, which almost has the unfortunate connotation of putting some entries at the "back of the bus". So there is the smaller list, which is indeed sorted in the order of the English spelling of each entry; even though you may not see the order, you can almost certainly find your language if it's there. And for the bigger list it is fun to look at the source, where you'll see the locale names, too:
<option value="es-ar">Argentina (Español)</option>
<option value="en-au">Australia (English)</option>
<option value="nl-be">België (Nederlands)</option>
<option value="fr-be">Belgique (Français)</option>
<option value="es-bo">Bolivia (Español)</option>
<option value="pt-br">Brasil (Português)</option>
<option value="en-ca">Canada (English)</option>
<option value="fr-ca">Canada (Français)</option>
<option value="cs-cz">Česká republika (Česky)</option>
<option value="es-cl">Chile (Español)</option>
<option value="es-co">Colombia (Español)</option>
<option value="es-cr">Costa Rica (Español)</option>
<option value="da-dk">Danmark (Dansk)</option>
<option value="de-de">Deutschland (Deutsch)</option>
<option value="es-ec">Ecuador (Español)</option>
<option value="es-sv">El Salvador (Español)</option>
<option value="es-es">España (Español)</option>
<option value="fr-fr">France (Français)</option>
<option value="es-gt">Guatemala (Español)</option>
<option value="es-hn">Honduras (Español)</option>
<option value="en-in">India (English)</option>
<option value="id-id">Indonesia (Bahasa Indonesia)</option>
<option value="en-ie">Ireland (English)</option>
<option value="it-it">Italia (Italiano)</option>
<option value="es-mx">México (Español)</option>
<option value="nl-nl">Nederland (Nederlands)</option>
<option value="en-nz">New Zealand (English)</option>
<option value="es-ni">Nicaragua (Español)</option>
<option value="nb-no">Norge (Norsk)</option>
<option value="de-at">Österreich (Deutsch)</option>
<option value="es-pa">Panamá (Español)</option>
<option value="es-py">Paraguay (Español)</option>
<option value="es-pe">Perú (Español)</option>
<option value="pl-pl">Polska (Polski)</option>
<option value="pt-pt">Portugal (Português)</option>
<option value="es-pr">Puerto Rico (Español)</option>
<option value="es-do">República Dominicana (Español)</option>
<option value="ro-ro">România (Română)</option>
<option value="de-ch">Schweiz (Deutsch)</option>
<option value="en-sg">Singapore (English)</option>
<option value="sk-sk">Slovensko (Slovensky)</option>
<option value="en-za">South Africa (English)</option>
<option value="fr-ch">Suisse (Français)</option>
<option value="fi-fi">Suomi (Suomi)</option>
<option value="sv-se">Sverige (Svenska)</option>
<option value="tr-tr">Turkiye (Türkçe)</option>
<option value="en-gb">United Kingdom (English)</option>
<option selected="selected" value="en-us">United States (English)</option>
<option value="es-uy">Uruguay (Español)</option>
<option value="es-ve">Venezuela (Español)</option>
<option value="el-gr">Ελλάδα (Ελληνικά)</option>
<option value="bg-bg">България (Български)</option>
<option value="kk-kz">Қазақстан (Русский)</option>
<option value="ru-ru">Россия (Pусский)</option>
<option value="uk-ua">Україна (Українська)</option>
<option value="ko-kr">한국(한국어)</option>
<option value="zh-cn">中国(简体中文)</option>
<option value="zh-tw">台灣(繁體中文)</option>
<option value="ja-jp">日本 (日本語)</option>
<option value="ar-sa">الشرق الأوسط - العربية</option>
<option value="he-il">ארצות הברית - אנגלית</option>
See what I mean? Interesting, right? And clearly not in the same order as the smaller list.
Kind of interesting the letters they chose to use NCRs for too, since they did not use all of them.
I actually get asked the final question about what the best order would be all the time, at least 10 times this year (mostly from inside Microsoft but a few from the outside). There really aren't good rules for this, though I find the English-only ordering to be pretty wrong myself. It is not really accent insensitive, since in the English sort that Č is just a letter with an accent, which is a tertiary distinction, so a primary distinction later in the string will win. Maybe they'll do better if we switch languages - say to French:
Well they did localize the More... to Plus... but the list itself is the same, and the list when you click that list is identical, which I would call unfortunate as I was hoping they'd be using the locale name, which they have in the list, for the sorting. Though they did not do so. That really seems less than ideal to me. But there aren't a whole lot of standards out there for sorting these lists so almost anything one does isn't "wrong" in a technical sense.
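The earlier point about Č being only a tertiary distinction in an English sort can be illustrated with a toy two-level comparison. This is a sketch under loud assumptions, not real collation: the accent table covers only a handful of letters that show up in the list, every function name here is made up for illustration, and a real implementation would use a proper collation API rather than anything like this.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy primary-level mapping (hypothetical helper): lowercases ASCII letters
// and strips the accent from a few letters only. Real collation tables cover
// all of Unicode; this one does not.
char32_t PrimaryLetter(char32_t c) {
    switch (c) {
        case U'\u010C': case U'\u010D': return U'c';  // C with caron
        case U'\u00C1': case U'\u00E1': return U'a';  // A with acute
        case U'\u00C9': case U'\u00E9': return U'e';  // E with acute
        case U'\u00D6': case U'\u00F6': return U'o';  // O with diaeresis
        default:
            if (c >= U'A' && c <= U'Z') {
                return static_cast<char32_t>(c - U'A' + U'a');
            }
            return c;
    }
}

// Builds the accent-free, case-free key used for the first comparison level.
std::u32string PrimaryKey(const std::u32string& s) {
    std::u32string key;
    for (char32_t c : s) {
        key.push_back(PrimaryLetter(c));
    }
    return key;
}

// Base letters decide first; accents and case only break ties between
// strings whose primary keys are identical.
bool ToyLess(const std::u32string& a, const std::u32string& b) {
    const std::u32string ka = PrimaryKey(a);
    const std::u32string kb = PrimaryKey(b);
    if (ka != kb) {
        return ka < kb;
    }
    return a < b;
}

// Returns a sorted copy of the names using the toy comparison.
std::vector<std::u32string> SortNames(std::vector<std::u32string> names) {
    std::sort(names.begin(), names.end(), ToyLess);
    return names;
}
```

With this comparison "Česká republika" lands between "Canada" and "Chile", because the difference between "e" and "h" in the second letter is found long before the accent on the first letter is ever consulted.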
The web is rife with different answers to this question, though.... on the whole I don't really like any of them....
I have a blog I have been writing off and on for a couple of years now all about digit substitution.
That blog is coming soon and will be my definitive and final thoughts on the feature and its implementation(s).
This is not that blog.
This is a blog about digit substitution, though.
Digit substitution and GDI+, this one is about, actually.
The question:
Issue:
Need to get the lang id and “standard digits” values that the user has chosen for digits.
Scenario:
the user locale is set to "English (United Status)" In "Additional settings..." in Region and Language, "Standard digits" has been set to ٠١٢٣٤٥٦٧٨٩. "Use native digits" has been set to National.
Problem:
In the above scenario, all the windows UIs like shell, notepad, etc are showing ٠١٢٣٤٥٦٧٨٩.However, when I render text using GDIplus drawstring, it is rendered as 0123456789
I was able to get the desired output when I used format.SetDigitSubstitution(0x0C01, Gdiplus::StringDigitSubstituteNational); (Our code is in C++.)
Question:
Does anyone know the win 32 api or how to get the lang id and “standard digits” values that the user has chosen for digits.
This ought to be easy enough to answer, one might expect.
Unless one has spent any time dealing with digit substitution, of course!
Unfortunately, the answer is that there is no way to query for this information in this scenario, where one changes the fundamental digit choice to be different than that of the user's standards and formats language, also known as the default user locale for the given user.
Now if one looks at the internal tables behind Uniscribe and GDI+ one will see lots of hard-coded, locale-based info (as discussed in Digits -- there is no substitute), but the fact that they are kind of locale based is hidden from the caller of both technologies, though hidden in very different ways (in the case of GDI+ this way is a lot harder to work around, as this problem illustrates, since there is no intrinsic way to say "use the default user locale setting as modified by the user when appropriate"). This is the setting that would easily solve this StringFormat::SetDigitSubstitution Method-based question.
Now in theory one could hope that not calling the above method with its limitations would lead to correct behavior, but this is apparently not the case. Before one gets too haughty it is reasonable to consider that one of the most common reported performance complaints about Uniscribe is its overabundant interest in user locale settings due to the anal retentive checking of the digit substitution settings.
So GDI+ is giving us some gain for the lack of this functionality - one less performance issue!
Unfortunately, if it is not giving a parameter on a StringFormat::SetDigitSubstitution Method overload that lets one say this (like the Uniscribe ScriptRecordDigitSubstitution Function not only allows but explicitly documents) then GDI+ is leaving this hole in support, a hole that really has no good excuse (if the lookup is only triggered by a StringFormat::SetDigitSubstitution Method call with LOCALE_USER_DEFAULT then it is hardly a negative performance issue since no one is monitoring anything at all, really -- it can be a one-time lookup).
Note also that the Uniscribe ScriptRecordDigitSubstitution Function, and in particular the SCRIPT_DIGITSUBSTITUTE structure that both it and the ScriptApplyDigitSubstitution Function depend on, introduce the notion of separate NationalDigitLanguage and TraditionalDigitLanguage values that do not have to be the same as the user locale since they can be updated via other processes.
There is a part of me that would like to mistrust the report from the questioner that passing LOCALE_USER_DEFAULT to the GDI+ StringFormat::SetDigitSubstitution Method really fails here, but I'm going to trust that it did fail to fix the problem as they told me when I made the suggestion.
Worst case has me reporting that I was mistaken here and I will only be the second most embarrassed person and I can live with that! :-)
Plus if it does actually work I'll still get the last word about how weird it would be that LOCALE_USER_DEFAULT and the LCID that is the current default user locale would have different behavior, despite all the work that NLS does to make them the same thing. Even though it is a screwy semantic at times, it has been there for a long time and it really ought to be respected by people who wish to opt in to the "LCID" datatype....
The question was easy enough:
Hi,
I am trying to use the GetTimeFormat to retrieve the Time format from the system. Win7 added the new customized format. How can I get the format(as in the attached image)?Which Locale ID should I use?
Thanks
Well, first let's take a moment to recognize something important that this screenshot shows.
Do you see it?
Hint #1: It is from Windows 7.
Now?
Hint #2: Remember blogs like Customizing the SHORT time format? and We do seem to be short on time... and I see LONG TIME and SHORT TIME; where are SHORTER TIME and SHORTEST TIME? and Predictably (in retrospect), aka Where Wild^H^H^Hindows-Only Things Are, aka SHORT [on ]TIME for a LONG TIME and such.
Oh, never mind. I gave it away.
They added a short time format to Regional and Language Options!
Awesome, right? Now there is parity between this piece of managed code and native code, between NLS and the Globalization classes.
Very cool!
Well, almost.
Getting back to that question:
I am trying to use the GetTimeFormat to retrieve the Time format from the system. Win7 added the new customized format. How can I get the format
Suddenly we lose some awesomeness, here.
You see, GetTimeFormat/GetTimeFormatEx add no new flags to get at this new data item that Regional and Language Options exposes. They have the same old flags they always had, but no new meanings are ascribed to them, and behavior changes would be bad so it is probably good that nothing changed here.
And the Windows 7 version of winnls.h adds no new flags to get at the short time, either (just in case there was some worry that the docs were falling behind the product features).
There is no way to directly get at this new format for formatting the time from the time formatting functions.
Though you can get at it through EnumTimeFormats/EnumTimeFormatsEx with the new TIME_NOSECONDS flag, or GetLocaleInfo/GetLocaleInfoEx with the LOCALE_SSHORTTIME flag.
As a by the way, the LOCALE_SSHORTTIME flag has some really disturbing (to me, at least) information:
Windows 7 and later: Short time formatting string for the locale. Patterns are typically derived by removing the "ss" (seconds) value from the long time format pattern. For example, if the long time format is "h:mm:ss tt", the short time format is most likely "h:mm tt". This constant can specify multiple formats in a semicolon-delimited list. However, the preferred short time format should be the first value listed.
Um, if true, that means that LOCALE_SSHORTTIME does not behave like LOCALE_SSHORTDATE or LOCALE_SLONGDATE. It returns the semicolon-delimited list of short times that EnumTimeFormats/EnumTimeFormatsEx enumerate.
Now if true that would make it easy for the .NET Framework's new upcoming version to ask Windows for information since that is the format it grabs out itself anyway, but not so easy for developers in Windows or outside of Microsoft using Windows, since which entry is the (possibly customized, but if nothing else "current default") one is not documented, and thus the Windows code that grabs formats out of EnumTimeFormats/EnumTimeFormatsEx can't be officially relied on (in practice it should be the first one, I suppose - maybe that should be officially documented).
And even if it could, parsing a semicolon delimited list of formats is easier than calling the enumeration functions which make callbacks and so forth.
That may be why LOCALE_SSHORTTIME has this unusual data return value - the fact that this keeps the .NET folks from having to call the complicated callback function. So at least someone might have had an easier job.
The solution to the original problem?
Well, Step 1 is to call GetLocaleInfo/GetLocaleInfoEx with the LOCALE_SSHORTTIME flag.
And then Step 2 is to take the string that is returned and pass it as the format to GetTimeFormat/GetTimeFormatEx.
And of course the answer to that last question (which locale?) is to use either the LOCALE_USER_DEFAULT or LOCALE_NAME_USER_DEFAULT constant, depending on which function you call.
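A minimal sketch of those two steps in C++, using the Ex versions of the functions. Some loud assumptions: the FirstPattern and FormatUserShortTime names are made up for illustration, the buffer sizes are arbitrary, and taking the text before the first semicolon simply follows the documentation's remark that the preferred format should be listed first (a semicolon inside a quoted literal in the pattern is not handled). The Windows-only part is wrapped in #ifdef _WIN32 so the helper can still be compiled elsewhere, and it needs an SDK that targets Windows 7 or later for LOCALE_SSHORTTIME to be defined.

```cpp
#include <string>

// Hypothetical helper: returns the first entry of a semicolon-delimited
// pattern list, or the whole string when there is no semicolon.
std::wstring FirstPattern(const std::wstring& patterns) {
    return patterns.substr(0, patterns.find(L';'));
}

#ifdef _WIN32
#include <windows.h>

// Hypothetical wrapper around Step 1 and Step 2 for the current user.
// Returns an empty string if either call fails.
std::wstring FormatUserShortTime() {
    wchar_t patterns[128] = {};
    // Step 1: ask for the short time pattern(s), user overrides included.
    if (GetLocaleInfoEx(LOCALE_NAME_USER_DEFAULT, LOCALE_SSHORTTIME,
                        patterns, ARRAYSIZE(patterns)) == 0) {
        return std::wstring();
    }
    const std::wstring pattern = FirstPattern(patterns);

    wchar_t text[128] = {};
    // Step 2: format the current local time (nullptr) with that pattern.
    if (GetTimeFormatEx(LOCALE_NAME_USER_DEFAULT, 0, nullptr,
                        pattern.c_str(), text, ARRAYSIZE(text)) == 0) {
        return std::wstring();
    }
    return text;
}
#endif
```

If the value turns out to be a single pattern rather than a list, FirstPattern just hands it back unchanged, so the sketch behaves the same either way.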
You might get a sense of why I stopped thinking it was awesome.
I mean, don't get me wrong, it is better than nothing.
But there is a lot of room for improvement in future versions, based on the ways that developers might want to make use of the information....
UPDATE 12:32pm - it seems that they did the work in GetTimeFormat[Ex] to support the formatting; if you pass TIME_NOSECONDS they give you the short time in Regional and Language Options, with seconds stripped out if you added them. Though if you want what the user put in you have to go through the above steps, due to the overloading of this flag that has had its meaning changed. Maybe I'll write about that tomorrow....
Over in the Suggestion Box, Andrew West asked:
Hi Michael,
I have used the Table Driven Text Service to create an input method for a not-yet-encoded script (Tangut) using PUA codepoints. It was easy and it works great, except that the PUA Tangut characters do not display on the candidate list, which is quite inconvenient, especially in those cases where two or three characters share the same input sequence. So I was wondering, is there any way to specify what font to use for the candidate list?
Thanks,
Andrew
Now it is interesting that you ask this question, Andrew.
As you know, we aim to please here. :-)
As it turns out, the folks in Taiwan were not thrilled about the way that the Array and DaYi IMEs looked on non-CJK UI language systems -- since DEFAULT_GUI_FONT was being used, the size ended up being 8 on those machines, which is not great for those IMEs on Chinese, Japanese, or Korean UI language systems.
They wanted 9pt, darn it!
Plus any time they were on a Simplified Chinese UI language system, they were getting the font chosen in the IME that they just didn't want (SimSun).
They wanted PMingLiU, gosh darn and to heck with it!
So in Windows 7, the information/option to support them was added to the TableTextService text files. You can check it out in the Windows 7 files for TableTextServiceDaYi.txt and TableTextServicesArray.txt in the configuration section:
[Configuration]
FontFaceName=PMingLiU
FontSize=9
And there you go!
You can put in some other font facename/size there and that font will be used instead (otherwise it will default to whatever GetStockObject(DEFAULT_GUI_FONT) returns, as it does for most of the built-in IMEs, and in Vista)....
So I guess maybe you could say that you love Windows 7, and that it was your [independently requested] idea, Andrew!
Sometimes even when two problems are different, there are very strong similarities -- and the answer is still the same.
Like the other day, when Jacques asked:
Is there a standard way to determine if a given locale orders the firstname/surname as "First Last" or "Last, First"? I have looked through the Globalization namespace but haven't found anything useful. Is there a built-in way to do this or do I need to implement my own?
Thanks.
Now this may remind some regular readers of What's in a name? from way back when, though it was really more talking about parsing names in general (in order to find the last name), rather than the display order of names in general.
However, the two problems are obviously related, and share a common heritage of the overall complexity of names - a complexity that doesn't ever seem to get any easier.
So the quick answer for Jacques is that there is no such beast in the Globalization namespace.
And the long answer is that not only is there no such thing but one should not try to write it oneself, for the very same reason that there is no built-in solution in the .NET Framework. Because any such solution will be incomplete across the huge variety of issues in names!
Now in a way, the two problems are converse issues (one is given a display name how to parse out the constituent parts, while the other is given the constituent parts how to construct the full form of the name for display) so perhaps one could think of the problems as being merely on the other end of the telescope (pardon the oblique musical reference!).
This is the kind of problem that apps like Outlook Express/Windows Mail/Windows Live Mail have had lame solutions for all along, a lameness that is pretty much mitigated by the easy way users have to notice the problems when they occur and override the program's guesses.
But one could take a step back and ask WHY the question is being asked in each case - what scenario leads one to have just one kind of information while needing the other - in order to come to the ideal solution for the problem.
It may involve creating a localizable solution so that a localizer (someone meant to be the most knowledgeable person about users' expectations in a given culture) can give the user sensible defaults that the user can then override when the code falls short.
Or it may involve just making sure to ask for the information, all of the information, explicitly. And not try to be too clever in the code.
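A rough sketch of what combining those two ideas might look like, assuming a made-up PersonName record, made-up {given} and {family} placeholder names, and a pattern string that lives in a localizable resource. None of this is an existing .NET or Windows feature; it only shows the localizer supplying a default and the user's explicit choice winning over it.

```cpp
#include <string>

// Hypothetical record: the parts are asked for explicitly, and the user may
// also supply the exact display form they prefer.
struct PersonName {
    std::string given;
    std::string family;
    std::string displayOverride;  // empty means "use the localized default"
};

// Replaces every occurrence of `from` in `text` with `to`.
void ReplaceAll(std::string& text, const std::string& from, const std::string& to) {
    for (std::size_t pos = text.find(from); pos != std::string::npos;
         pos = text.find(from, pos + to.size())) {
        text.replace(pos, from.size(), to);
    }
}

// `localizedPattern` is assumed to be a resource string owned by the
// localizer, such as "{given} {family}" or "{family}, {given}".
std::string DisplayName(const PersonName& name, std::string localizedPattern) {
    if (!name.displayOverride.empty()) {
        return name.displayOverride;  // the user's own choice always wins
    }
    ReplaceAll(localizedPattern, "{given}", name.given);
    ReplaceAll(localizedPattern, "{family}", name.family);
    return localizedPattern;
}
```

The code makes no guess about the culture at all: the ordering lives entirely in the pattern the localizer provides, and anything the pattern gets wrong is one override away from being fixed by the person whose name it is.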
If you look in not just What's in a name? but in the comments to it, there are countless examples that should tend to dissuade even the most eager developer hoping to try to add this particular feature as a pure code solution meant to solve the problem, since you can't. :-)
This blog is about a fun story.
In fact, this story is one that every time I think about it, I laugh.
Even if it weren't a true story (which it is), I would still think it was fun, but the fact that it is a true story just makes it even more funner.
Perhaps even most funnest!
Anyway, it started back in the early part of the last decade, back when we were all just getting used to the fact that the year 2000 did not cause society to collapse.
I was over at a friend's house, hanging out with her and her husband. Actually he is a friend too, but I met her first. So mentally I just think of them that way. Given the fact that there is a bunch of music that he and I both like while she thinks it sucks (we both suspect her reasons are mainly that we like it, though we haven't conclusively proven it yet), I suppose it could really go either way.
It doesn't matter though for the purposes of the story, and given that the previous paragraph is one of the biggest digressions ever in this Blog, I think we should move on now.
The story actually started just before I was over at their house, maybe the night before.
I was watching the movie True Colors, a movie that would have been much more interesting had its principal characters (John Cusack as the senator's aide who runs for congress in a sleazy way, and James Spader as the justice department agent who is John's friend and the one who runs the sting operation on him) not been so obviously evil and good, respectively.
The role reversal in who was good and who was evil was also fun, since John Cusack seemed to be a good guy in every role even when he is a contract killer, and James Spader had been doing so many bad guy roles that we almost forgot the bumbling, lovable genius who marries the girl and stays behind in the movie version of Stargate (this is something else we forget in our post-SG1 world, but I digress again).
Anyway, there was one interesting plot point in the story, one in which John Cusack (while with James Spader and his girlfriend Imogen Stubbs on New Year's Eve), as the introduction to his claim that he wanted to be a congressman, made a resolution that he would be elected to congress within ten years, betting a case of the champagne they were drinking (against a bottle that they would each give him if he was not elected).
This movie was kind of in my head when I was over at their house and we were drinking, celebrating a promotion that she had just gotten.
I was teasing her a bit since she was always claiming that she was so close to leaving her program management job at Microsoft to become a barista, and now she clearly was moving up the ladder of success. She was a bit embarrassed about it and I felt a little bad putting him in the position of either agreeing with me or claiming his wife of many years wouldn't be successful, but it was all in fun so the topic was a great piece of conversation.
Then, in a fit of mild drunken excitement, I (remembering the movie) picked up the bottle of champagne (well actually it was cava - Spanish style champagne - but close enough for our purposes), and claimed that she would be a Director within ten years, and that I would be willing to bet a case of this champagne that she would accomplish this goal.
Her husband nodded and admitted it was possible, and she just smirked and restated her prediction that she was sure to be a barista by that time, at best a lead barista. But she agreed to the bet, I assume because she was drunk and figured she'd win anyway.
Over the following years the bet would periodically come up as a recurring joke: any time she did some job particularly well, or got promoted, or got recognized by someone important. You know, it was happening frequently enough that the joke was kind of fun....
Then in February of 2007 I thought I had been given a bit of a late Christmas present (always fun for someone who is M.O.T. like moi!) when a new organizational announcement listed her job title as "Director of International Planning and Strategy".
I immediately sent her mail congratulating her and asking her when I could get my champagne, of course!
She (being obviously a bit higher up in the org) had apparently been given some insight into all kinds of rules related to standard titles that were going to be applied to all of the people all across Windows, and said the bet hinged on her title being updated in the Global Address List. I wanted to call shenanigans on this "improvement" but was forced to admit that, since the nature of the directorship had not been specified, the announcement lacked rigor as a bet winning move.
Imagine my disappointment when a few days later her title was "Principal Program Manager Lead".
Congratulations on the promotion and all dear, but not a "Director" in sight! :-(
She stuck to her guns about the need to see the change in the GAL for the bet to be won, and perhaps jokingly added a new alternate job aspiration of "beet farmer" to her barista aspiration.
Sigh....
But they say to never give up hope, and it is quite lucky for me that I never do, despite being such a cynical bastard that I will often point out that the glass may indeed be half full -- but of poison!
Why should one never give up hope?
Well, in the end of October 2009, after our big re-org, an announcement mail about her new role in a new group was sent out, with a title of "DIRECTOR, STANDARDS MARKETING" (caps theirs, bold emphasis mine).
She told me later on that when she had first gotten the offer letter she was incredibly excited, and then about a half hour later she was thinking something else entirely: "Oh crap, I lost the bet."
The message to me via IM, perhaps just moments before the mail was sent out to the world was simple enough:
"you have officially won our bet"
And I instantly knew what she meant and was instantly insufferably pleased. :-)
I pointed out she probably got over it quickly enough, what with the new job she was so excited about and all. She admitted this, and promised to deliver to me the case of cava, forthwith (riding the cava home on my lap in the iBot was also fun, but in a minor way; the winning was the cool part).
This story is one she has shared with a few friends since then and they all loved it. As do I.
And despite the joking claim of friend Holly that "as memory serves me correctly cava is strictly for males, correct?", my sorta girlfriend and I have managed to get lightly toasted with a bottle consumed jointly after earlier drinks. But then I suppose everyone is a rule breaker now and then!
Funny how things work out, isn't it? :-)
Some of you may remember a few years back when a VP over at Adobe commented that perhaps the solution to the problem of rampant software piracy in China that the government there seemed uninterested in combating, or at the very least unable to combat (not to mention other problems), was to simply stop shipping software there.
Now it was very quickly denied as a matter of official Adobe policy, but the point was made - that patience could be exhausted.
I was reminded of all of this after reading David Drummond's (Google Senior VP, Corporate Development and Chief Legal Officer) A New Approach to China on Google's official blog.
Now of course Microsoft has seen both the issues the Adobe VP mentioned and the ones the Google VP mentions, in the case of Google before that company even existed, and despite feeling offended by rampant software piracy in a theoretical sense I have to admit that it makes little real difference to my own personal bottom line. So it bothers me, sure. But I only have so much RAM to devote to issues each day, and it often falls off.
Some of the information issues that Drummond raises feel kind of the same to me, despite the fact that I can recognize I am taking my own morality and using it to judge China's. It makes sense that an Adobe (or a Microsoft) would feel as offended by people stealing their product as Google would feel about the open flow of information being controlled -- in each case the stronger response about China's approach affecting something important to them is obvious.
For me, the China issue has always been most important in terms of the issues that directly affected the things in front of me -- their baffling minority language policies toward Uyghur, their Taiwan policies, their policies toward Tibet and Bhutan and even Kashmir, their terrible and inconsistent approach to international encoding and other standards, all sharing the same feel of being a large country that acts small, that acts so worried that it will be small in people's eyes that it will prove it is big even at the expense of language.
I guess you could say that is my biggest issue, and in a way it is the most arbitrary of all of the reasons to try to take issue with China or any of its policies, any of its politics.
My role vis-a-vis support in China, from minority languages whose support is demanded even as expertise is not made available, to the supplementary DLL to assist with GB18030 support, to GB18030 work in standards, to adding support for more ideographs than China will ever need or use, doesn't make me feel proud of the work I do so much as ashamed that I and the company I work for are so easily bullied.
I don't know how much time I can really spend railing on the issues though. Even if I were a VP at Microsoft (and I'm obviously not), these aren't Microsoft's issues exactly. My "test balloon" therefore carries none of the weight of Adobe's or Google's. But in my own way the fact that it is not my business, my money, my bottom line, that is on the line makes it feel like it is less about self-interest, for what it is worth.
Maybe I'll put some thought into what I can talk about, and then talk about it. Because my passions here do run high and it would be nice to be able to put some of my thoughts out there more specifically....
Gwyneth's question was an interesting one:
Out of curiosity, do you know the history of why Unicode didn’t create separate characters for the Turkish i? The i is only character that changes casing based on the language (Turkish/Azeri). I did a little searching online, but didn’t find any obvious references to the rationale behind this. Thanks!
As was Peter's answer:
That would have been a Unicode 1.0 decision, before I had even heard of Unicode (which I first heard about around 1992/3), so I’m not sure. I suspect it was pre-determined by legacy standards. Encoding a Turkish i probably wouldn’t have been enough; it probably would have been necessary to encode a separate Turkish I as well. Arguably, both would have been duplicating characters and certainly they would have resulted in confusion with 0049/0069 – likely with some data getting encoded one way and other data using the other. Chances are we would have ended up facing the casing issues as well as data in mixed representations.
This sums up the principal reasons quite nicely!
It is unfortunate that the experience most people have with Turkish is how it highlights code that does not handle globalization issues (blogs like this one summarize the approach quite well and the "Turkey Test" is no worse than anything else one could call it).
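For anyone who has not run into this firsthand, here is a minimal sketch of the kind of thing that trips code up. It uses Python only because it is compact; the helper names `turkish_upper` and `turkish_lower` are made up for illustration and are not part of any library. Python's built-in `str.upper()` and `str.lower()` apply the default (language-neutral) Unicode mappings, so the sketch shows what those do with the four letters involved and then one hedged way to layer Turkish/Azeri rules on top.

```python
# Turkish/Azeri casing pairs:  i <-> İ (U+0130)   and   ı (U+0131) <-> I
# Default Unicode casing:      i <-> I            (language-neutral)

DOTTED_CAPITAL_I = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # ı  LATIN SMALL LETTER DOTLESS I


def turkish_upper(text: str) -> str:
    """Hypothetical helper: uppercase using Turkish/Azeri rules for i and ı."""
    # Map the two special letters first, then let the default rules do the rest.
    return text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I").upper()


def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase using Turkish/Azeri rules for İ and I."""
    return text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


# What the default, language-neutral mappings do:
default_upper_i = "i".upper()                   # "I", not "İ"
default_lower_dotted = DOTTED_CAPITAL_I.lower() # "i" + U+0307 COMBINING DOT ABOVE
round_trip = DOTLESS_SMALL_I.upper().lower()    # "ı" -> "I" -> "i", the dotless i is lost

# What the Turkish-aware sketch does:
upper_iyi = turkish_upper("iyi")                # "İYİ"
lower_ilik = turkish_lower("ILIK")              # "ılık"
```

The round trip is the part that bites: since Turkish shares U+0049 and U+0069 with everyone else, nothing in the string itself says which casing rules apply, and the code has to be told the language.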
Though I think I still owe a blog post discussing vowel harmony and other linguistic features affecting Turkish. It is coming, eventually. The goal of giving Turkish a better legacy than the Turkey Test may be tilting at windmills, but maybe I can at least point out there is more out there....
Over in the Suggestion Box, Vincent asks:
Why does Syriac display so poorly in WPF?
If you enter Syriac with vowels, the letters don't join as they should, often using the wrong form of the letter (final instead of medial or initial, etc.). The Syriac abbreviation mark doesn't function at all. You cannot combine a Fatha and Kasra on the same consonant. The tatweel won't join with consonants.
All of this is true in the .NET 4.0 Beta as well as the shipping version, so no progress has been made since initially reporting these issues over a year ago.
Syriac in WPF is just gibberish. The last time I reported these bugs, someone from Microsoft tried to blow me off by suggesting that Syriac wasn't a supported script in WPF, even though it is clearly listed as supported on Microsoft's website. What gives? The Office team doesn't have this many problems with Syriac, why can't the WPF team get it right?
Well, Syriac is interesting here in that the shaping engine work first done in Uniscribe years ago did not seem to get completely ported.
One of the major problems I have with WPF is how internationalization support is a line item which, while important, has to compete with every other line item, and it is hard to get enough attention sometimes based on potential market benefit when you compare to features that are considered "sexier" by which I mean more widely used.
In fact one of the reasons for the resiliency of Uniscribe is no one seems able to step up to support everything the way Uniscribe does - so in Windows Uniscribe is still the best in most cases.
Plus there are some other general Bidi issues with WPF that I'll talk about soon....
I can say whoever claimed this was not a problem is dead wrong, though I have not tried the absolute latest version of .Net 4.0's WPF here so if they have addressed this then someone try it and let me know.
Here is some sample text from the Syriac New Testament - grabbed from here.