Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
This post may not be of interest to all readers since it not only covers technical topics (which can easily turn off half my readers!) but does so in a slightly more sociological kind of way (which can easily turn off the other half!). Feel free to skip, as appropriate.
Although this post shares a conceptual framework with Optimized for English (oh, and also Japanese, and maybe a few others), that post was really about ClearType and the group that was at the time referred to most often as the Advanced Reading Technologies team, while this one is about string comparisons in the .NET Framework.
So we are thinking about two entirely different groups of people sitting in two entirely different [sets of ]buildings in two entirely different divisions of Microsoft, and there is no relevant organizational or architectural connection between them....
My hints about this topic have been sprinkled throughout the blog but one of the more obvious ones can be found in a side paragraph in A&P of Sort Keys, part 13 (About the function that is too lazy to get it right every time):
In another astounding burst of irony, there is an even quicker version of the string comparison function found in the .NET Framework which, due to a bug that I only accidentally found in the beginning of the Whidbey (2.0) product cycle, was never even called in versions 1.0 or 1.1. Subsequent to the decision to fix the bug by enabling the superfast function [1], no fewer than five bugs were found by testers in the logic of this attempt to capture optimizable calls to CompareString-like code, due not only to the problem above but an unrelated obsession with making English faster even if it makes the more complicated languages a bit slower [2]....
Now some may think I was being a bit harsh, but the optimization is for string comparisons done on Unicode strings when the characters are all in the ASCII ( <= U+007f) range. Basically when the string is created (and remember that in .NET strings are immutable so creation time work is pretty much the only time that the contents can really change!), it walks the string and determines whether it is one of these simple ASCII-content strings.
Now obviously that is not all that needs to happen, since word type sorting (as compared to string sorts, discussed earlier in A few of the gotchas of CompareString) affects the way the hyphen (U+002d) and the apostrophe (U+0027) sort, and they are down in that range. And there were other exceptions, too. So a big map of the "fast" characters had to be put together for this "default" case.
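As a rough sketch of the kind of pre-processing pass described above (and emphatically not the actual .NET Framework code), the check might look something like the following in C. The function names are hypothetical, and the exclusion list contains only the two characters named above; the real map had other exceptions that are not listed in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "fast character" test: ASCII only, minus the characters
 * that word sorts treat specially. Only the hyphen and the apostrophe
 * come from the text; the other exceptions are unknown and left out. */
static bool is_fast_char(uint16_t ch)
{
    if (ch > 0x007F) return false;   /* outside the ASCII range */
    if (ch == 0x002D) return false;  /* hyphen: special in word sorts */
    if (ch == 0x0027) return false;  /* apostrophe: special in word sorts */
    return true;
}

/* Walk a UTF-16 string once and report whether every code unit is "fast".
 * This is the cost that gets paid at string creation time. */
static bool is_fast_string(const uint16_t *s, size_t len)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (!is_fast_char(s[i])) return false;
    }
    return true;
}
```

Note that the walk is linear in the length of the string and happens whether or not the string is ever compared, which is the trade-off the rest of this post complains about.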
If you are a developer I'd like you to take a moment and think carefully about the consequences of building such a map and using it in the pre-processing. Incredible effort is being gone to here to look for the most optimizable case, given the very real belief that this will happen often enough that the case is truly worth optimizing.
Of course many locales have exceptions for how the letters in this range sort -- and so these other locales have to be excluded too. The decision for how to decide what locales to use is a rather simple macro in a C++ header file (you can actually find it yourself in the Rotor source if you have spent time in there, it is fairly hard to miss):
//This is the list of locales which we currently understand are fast.
//We should only do fast comparisons in one of these locales.
#define IS_FAST_COMPARE_LOCALE(loc) \
    (((loc)==LOCALE_INVARIANT) || (PRIMARYLANGID(loc)==LANG_ENGLISH))
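To see what that macro actually lets through, here is a small portable sketch with stand-in definitions. The SK_ prefix and the helper function names are made up for illustration; the numeric values (0x007f for the invariant locale, 0x09 for LANG_ENGLISH, and a 10-bit mask for the primary language) are my reading of the Windows headers, restated here so the snippet compiles without them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the Windows header definitions (assumed values). */
#define SK_LOCALE_INVARIANT 0x007Fu
#define SK_LANG_ENGLISH     0x09u
#define SK_PRIMARYLANGID(lgid) ((uint16_t)(lgid) & 0x3FFu)

/* Same shape as the macro quoted above, using the stand-ins. */
#define SK_IS_FAST_COMPARE_LOCALE(loc) \
    (((loc) == SK_LOCALE_INVARIANT) || (SK_PRIMARYLANGID(loc) == SK_LANG_ENGLISH))

static bool is_fast_compare_locale(uint32_t lcid)
{
    return SK_IS_FAST_COMPARE_LOCALE(lcid);
}

/* Usage sketch: every English locale and invariant pass, nothing else. */
static void demo_fast_compare_locale(void)
{
    assert(is_fast_compare_locale(0x0409));   /* English (United States) */
    assert(is_fast_compare_locale(0x0809));   /* English (United Kingdom) */
    assert(is_fast_compare_locale(0x007F));   /* invariant */
    assert(!is_fast_compare_locale(0x0412));  /* Korean */
    assert(!is_fast_compare_locale(0x040C));  /* French (France) */
}
```

Only the low ten bits of the identifier are looked at, so the sublanguage never matters: any English locale qualifies and no other language can.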
Ah, so we are restricting ourselves to invariant and English (all of the English locales that are there) -- nothing else is "fast" here?
But of all of the cultures that the .NET Framework supports, about half of them don't use the default sorting table (i.e. they have exceptions or compressions or reverse diacritics or whatever) -- say about 70 of them, and a little over half of those (say 40 of them) have exceptions/compressions that impact this default table [3]. Thus of all of the cultures in the .NET Framework, just under 3/4 of them (roughly 100 of about 140) could use this optimization.
Yet it is essentially restricted to English and Invariant (which is basically English).
So you notice the team bends over backwards and is even willing to slow down string creation [4] in order to find this optimized case -- but only for English (no one chose to build up a "fast locale map" for the almost 3/4 of cultures that could benefit from the same optimization). Do we need further proof that the framework comes from a US-based company? :-)
And then consider the fact that there was a 1.0/1.1 bug that kept the optimization from ever being used. Once it was turned on rather than being yanked out, they found several bugs in sorting results, and each time I suggested the code be removed but was overruled -- it was a lot of fun being right but not so much fun not being heeded!
In my mind, the proof that the code was never used and thus probably had bugs was sufficient for me -- the old code was good enough, and they could even try to remove some of the string initialization performance hit if they wanted to make all operations faster rather than enabling an unproven optimization in string comparison. It did seem like there was an almost religious battle about the heresy of removing an optimized (but dead) code path merely because it was dead....
Now I have always looked at .NET as a full-size Skunk Works-type project from the very beginning and right on through Silverlight, with small teams of people working to get good proof of concept work and then, after driving the concept, bulking it up and shipping it. The team is obviously huge and there is a lot of process there (so my Skunk Works notion is not a fully accurate picture of how things actually work in practice) but I mean in terms of mindset and focus.
I mostly mean it as a compliment from an engineering standpoint, with the exception of international support issues, which often fall off in proof of concept and don't always get fully added back later.
So what does this team have in common with the ClearType team? Well, clearly there is a particular scenario-based focus on a subset of the customer population in both cases; the .NET case is perhaps slightly worse since I think we wrote a lot of the code here and own most of the code now (even the parts we didn't write). Though on the other hand much of the overall ownership of ClearType started from a small group that splintered from the core typography team which obviously was required to have a wider focus that included other languages, so perhaps both teams are guilty of this scenario narrowing effect?
Now I don't want to make too much of this, since in the end even the original code (which never used the optimization) was fast enough to work. But I do want to point out how the intent shapes the implementation in ways both subtle and not so subtle....
1 - My recommendation to yank it out since it was never called anyway was denied; I was overruled given the universal reluctance to remove an intentional optimization from the code, even one that was never used.
2 - An occupational hazard of a US-based software company. :-(
3 - As an example of one that would not be included, consider Farsi/Persian, which has exceptions but obviously none in the ASCII range.
4 - Anyone who thinks that walking the string is not a performance issue does not understand the lesson taught by That function is always faster! (well, except for that one case when it can actually be slower...)!
This blog brought to you by B (U+0042, LATIN CAPITAL LETTER B)
I know regular readers have been waiting impatiently for the next post in the series after How many ways can a developer say 'File Not Found?' (aka Making your localizer's life easier, Part 1)....
Yesterday, Larry Osterman had a pretty funny blog entitled "We're back and...".
And it is pretty funny when a T-Shirt slogan like
Lan Manager... We're back and we're BAD.
gets translated to the other language equivalent of
Lan Manager... We're back and we're not very good.
And to stop for a moment to take a joke and make it decidedly unfunny by dissecting it, there is an important lesson in localization to be learned here.
The problem is simply stated, and I'll add a bit to it so that items that you may yourself be guilty of can show up in the list:
Avoid colloquial words and phrases, and minimize the use of abbreviations and acronyms
Here are some great negative examples that have been purported to have ended up having localizers needing to do something with them:
Interestingly, just as in Part 1, the advice is also a recommendation for the original pre-localized product as well -- too much of this is likely to be hard for regular users of the unlocalized product and there are often strict guidelines in both user interfaces and documentation guiding behavior here.
But geeks will be geeks, and it is a constant battle to get the right results.
And the guidelines themselves often fail to assist: for example, in documentation on the first occurrence of an acronym one is expected to spell out the acronym. But if one finds GDI confusing one is unlikely to find GRAPHICS DEVICE INTERFACE to be the magical road to understanding. In fact, the guidelines can often increase confusion!
The T-Shirt is a great example of what localization can add to the mix when it is entirely possible for a localizer to either "not understand the text" or "miss the joke", and the results can be less than stellar.
Though admittedly quite hilarious (when not dangerous!).
(Hat tip to the three people who sent me the link to Larry's blog between midnight and five in the morning, all several hours after most of this blog had been written!)
This post brought to you by ⌢ (U+2322, aka FROWN)
People never ask the easy questions, now do they?
It starts from an early age -- why is the sky blue? Why do I have to go to school on days when nothing interesting is being taught? Why do I like that girl? And so on....
And it never gets any easier. Like yesterday, when the questions went something like this:
I work on an application and we use the digit grouping symbol set by the user. We are currently able to support digit grouping symbol one character long, whereas intl.cpl allows the user to set up to 3 character string as the digit grouping symbol. Are there any languages which have 3 character digit grouping symbol and why does windows provide this support?
Like I said, people never ask the easy questions, now do they?
Now if you look at every shipping version of Windows and every possible built-in locale that ever has been or is, there is only one locale with a LOCALE_STHOUSAND that is longer than one WCHAR when returned from GetLocaleInfoW, and that is the test qps-ploc pseudo locale (LANGID value 0x0501) I mentioned in Walking off the end of the eighth bit and The name of the song is not 'ps-PS I Love You' (its value is ,, presumably just to help people test the kind of code assumptions that this application was making).
So the direct answer is that no locale on the system that is really there will have this problem.
But in the words of The Wolf in Pulp Fiction, Let's not go sucking -- er, never mind, you can watch the movie. Let's just not congratulate ourselves just yet....
This particular upper limit has been around Windows since at least NT4 (possibly earlier?) and I believe has been settable programmatically and in the UI all this time. It can be hard to track down why something was done a decade ago in Windows (people have either moved up, moved on, or moved out -- of the fewer than 10 people who have "owned" the data and 30 or so who have made changes to it, probably half of them are still at Microsoft but among their cadre are enough General Manager and Director types that a meeting of the "locale data changes checked in" club seems unlikely!).
But with that said, most of the locale upper limits were based on research into future locales that might need the higher limit (for some other properties the limits had to be raised when future locales proved that the extreme choices were themselves not extreme enough -- that way lies When the roof got raised, and why).
And of course it almost goes without saying that in a custom locale/custom culture, all bets are off and it is quite likely that one of those "not [yet] added to Windows" can suddenly be on a machine.
Given that, the "why" question is probably not as interesting as it seems. If any user can apparently break your application in some way, that sounds serious and it should definitely be considered a bug in the application, though it likely does not happen much or people would be reporting it more -- so it can be triaged accordingly (the fact that no built in locales other than a test one designed to ferret out such applications would hit the problem would also be a good factor in the triage).
But given how easy it is for someone to injure anyone who makes the assumption, the bug should be triaged on the seriousness of the behavior if it does happen -- no one wants an application that can be brought to its knees with a single SetLocaleInfo call.
In part we already have that -- the knowledge base is full of articles like Q251005 which point to the strange things that happen to applications when these features are changed, and the problems are not going away (even some of .NET parsing gives up if you have the decimal and thousands separators the same, to this day!).
But if your application can't work within the documented limits that any user can take advantage of, you are going to find yourself with big problems down the road....
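For what it is worth, removing the one-character assumption is usually not much work. Here is a minimal sketch in portable C of grouping digits with a separator of any length; the function name format_grouped is made up for this post, the fixed groups of three are a simplification (real locales also carry a grouping pattern), and in a real application the separator would come from the user's locale settings rather than a literal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format `value` with `sep` inserted between groups of three digits.
 * `sep` may be any length; nothing here assumes a single character.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if `out` is too small. */
static int format_grouped(unsigned long long value, const char *sep,
                          char *out, size_t out_size)
{
    char digits[32];
    assert(sep != NULL && out != NULL);

    int ndigits = snprintf(digits, sizeof digits, "%llu", value);
    if (ndigits < 0) return -1;

    size_t sep_len = strlen(sep);
    size_t ngroups = ((size_t)ndigits + 2) / 3;
    size_t needed = (size_t)ndigits + (ngroups - 1) * sep_len;
    if (needed + 1 > out_size) return -1;  /* +1 for the terminator */

    size_t pos = 0;
    for (int i = 0; i < ndigits; i++) {
        /* A separator goes before digit i when a whole number of
         * three-digit groups remains to its right. */
        if (i > 0 && (ndigits - i) % 3 == 0) {
            memcpy(out + pos, sep, sep_len);
            pos += sep_len;
        }
        out[pos++] = digits[i];
    }
    out[pos] = '\0';
    return (int)pos;
}
```

The important design point is that the buffer size is computed from the separator's actual length, so a three-character separator simply produces a longer string rather than a truncated or corrupted one.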
This post brought to you by ≠ (U+2260, a.k.a. NOT EQUAL TO)
Over in the Suggestion Box, regular reader Jeroen Ruigrok van der Werven asked:
Michael,any idea why MSDN offers all language packs for Office 2007, yet does not offer the multi-language one? Doing testing on this front makes installing a bit tedious (especially since you cannot just launch the MSIs from the different proofing directories --for installing the newer IMEs-- since it will give you a "Error 1713. Setup cannot install one of the required products for Microsoft IME (...)"). Thankfully I managed to find out about the LAUNCHEDBYSETUPEXE=1 trick to pass to the MSI on the command line, but it sure is annoying.
Of course I can't really speak for anyone on the Office side of the business, but I know from listening to people talk about the (not entirely dissimilar) Windows language pack what has driven these particular SKUs.
It really wasn't the MSDN style environment, which is not really at all what they are about.
They are about a particular interesting deployment scenario -- OEMs who want to build multiple languages into a single image in order to deploy languages that are expected to be used together, to people who pay the extra money to get that kind of a SKU....
And clearly they aren't slicing up the pieces the way you are related to the IMEs, right? I suspect over time those workarounds will stop working since they wanted those this to not fail.
If you look at white papers like Customize a multilanguage deployment of the 2007 Office system and others under the Deploying 2007 Internationally node, it is pretty clear who their targets are (by looking at who they made it easiest for by giving them useful instructions!).
Of course the people who are building such images can always test the ones they create, but making it easier for everyone else to do so doesn't necessarily seem like a scenario they are going out of their way to make easier. Kind of like they aren't against it but they aren't jumping all over making it easy to do.
Now again I have no idea if this is what either Windows or Office has in mind here, and I don't get invited to those kinds of meetings. But from a priorities standpoint it is what things look like to me.
So, to properly "sell" the importance of the scenario, you probably have to explain why it is important to you, and then if that is compelling then perhaps it will become easier....
This post brought to you by O (U+004f, aka LATIN CAPITAL LETTER O)
So yesterday in the post The most important language in the whole wide world is yours, and you hardly even know yours! -- NOT!, I bemoaned a specific situation where the only documentation that existed for a particular issue was some poor hints in scattered MSDN topics and some posts on my blog:
Between you and me, the idea of "documentation" that is only contained in this blog is not something I am entirely comfortable with, and not only for the reasons that inspire disclaimers like Raymond's. This is a topic that I plan to blog about another day since it includes just the mix of Policy, LCA, insanity, and inanity so as to delight and inspire the cynic in me. Stay tuned if this topic interests you! :-)
An even better example of when this whole issue came to a head is in Raymond Chen's Things I've written that have amused other people, Episode 4, which showed a real example that concerned him with his blog being treated as actual official documentation.
I get this from time to time too, though usually not as much -- Raymond once told people that he was envious about how I was able to set the tone here such that people didn't generally take my blog as official documentation (I talked about this issue and gave my own opinions on why that is the case in SIAO: As unofficial as you can get without a prescription).
But then there are times that some feature or bug fix was denied by triage and my lead literally asked if I would be willing to help out by posting about the workaround here in my blog. A good example is the post Cue the smarter version of GetDateFormat... ok, it's a wrap!.
And there are even more concrete examples like the Behold the Table Driven Text Service series, which is specifically being written because the team that was going to document this major feature ended up not doing it (after specifically asking me to delay my posts for over a year in anticipation of that documentation, which ended up never happening). It doesn't read like official documentation, but it is as close to official documentation as this feature is likely to get for the immediate and maybe even known future.
So what does it mean when you aren't official but you are more of an occasional un-official back-channel when no official work or documentation is planned -- or when the work/documentation that was planned was scrubbed?
Putting information here among the "random stuff of dubious value" does not make it magically supported (especially since "here" stubbornly refuses to act like a well-behaved blog that is "on message"), and there is no remedy or recourse to trying to make something supported solely on the basis of it being here.
So while it means something, I am wracking my brain trying to understand what that might be, exactly.
This may be one of the reasons that I continue to treat the blog as something outside of my job despite the fact that it kind of isn't - the fact that what happens here often does not seem to be something that can be (in the words of Fox Mulder) programmed, categorized, or easily referenced.... it doesn't exactly fit....
Just for the record -- this site and its content is not officially supported, and I mean that officially. It is unofficially as useful as you believe it to be, but I am only saying that unofficially.
All characters, in an effort to prove that their sponsorship does not imply endorsement by Microsoft (a Unicode Consortium member) for the contents of this blog, have declined to sponsor this post.
Nothing technical, and yes, more Comcast stuff, different from the earlier stuff, but even so, sorry!
It suddenly occurs to me that someone from Comcast might be paying attention, if this comment is accurate (Gwyn doesn't think so, but even so....).
My Comcast high speed Internet just went down. It went down and although I called right away I was told that they knew about a problem in my area and were working on it.
They told me it should be up some time by 6:30 AM (it is 3:03 AM right now).
Yes my hours are strange. Why do you think my blog is? :-)
It is weird when the high speed Internet service is down (which it is right now) but the cable service is up, which it is right now, an Angel rerun on TNTHD, not much on this time of night beyond uninteresting reruns and softcore adult stuff on Skinemax I'd rather skip. Angel is also a niche fetish kind of thing too, but the moral depths are more appealing to me (and as the poem says, deep roots are not touched by the frost).
Unrelated point, but does anyone else in Seattle notice how the HD cable channels are the East Coast feeds while the regular non-HD versions are the West Coast? I guess I don't really care that much at home but it was annoying in a hotel room a few weeks ago at the last UTC meeting since the schedule is just off on stuff!
Oh yeah, I was talking about weird. Like that last paragraph wasn't? Hmmmm.
I personally prefer to talk to the cable folks than the high speed folks when reporting an outage, they both have the same outage info even if they have different troubleshooting steps when it's not a known outage, and frankly they just seem nicer. Some have suggested that is just that when the high speed goes down I am more strident, but when I talk to the cable folks when the high speed is down, I am just as anxious. They just seem more patient, you know?
One very important part -- when they tell you on the line that they know about an outage that affects you and that you don't have to talk to someone? Wait to talk to someone. Give me a second and I will explain why this is a Very Good Idea™.
A moment later someone is on the line. They verify my address -- that is, they make me repeat mine and I say it fast enough that it is easy to verify but impossible to look up. That's just me, always working to make sure I can see a few moves ahead.
So anyway this is when I am told that everything should be up by 6:30 AM. Sigh.
He heard my sigh, I think. It was not exceptionally loud but all he has is my voice on the phone so it makes sense that he'd hear my frustration.
He points out that he will credit me for the day.
That's nice, actually. I mean since it will be up all day when I'm not here but down for like three hours when I am, this is a very nice gesture. I say so, and thank him.
This is why it was a Very Good Idea™ to stay on the line, by the way.
But then I am curious.
I ask whether I would have been credited that way for the down time if I hung up when the recorded message told me I could and that I did not have to wait to talk to someone.
He admits that I would not have gotten credit for the day, in that case.
Ah, my cynical nature is assuaged nicely here. There are two "flaws" in this design from the overall me-as-the-customer satisfaction standpoint.
Now both points are excusable when there is no known outage, but when something is out both are a bit more suspect.
It is perhaps my cynical nature again, but I would feel much more comfortable never being charged when the service is down and always being charged when it is up -- even though I'd only get a few hours credit rather than a day, I am sure that over the course of a month there would be enough random outages that it would all even out.
Because although the gesture was nice, this is just a perk that the person on the phone is empowered to give a customer who might be unhappy. It is not an institutional thing designed to get people the best service, but a way to help customer satisfaction at a micro level -- in a framework to allow anyone in the country to be helped out this way.
I suppose it looks like I am being cheap here, but it is more complicated than that. I am very much a value-for-value guy, who will not even blink about being charged for weeks of high speed service while I am out of the country but would be unhappy to know that for three hours of those weeks the service was down.
Clearly I am not trying to save money so much as just trying to get what I pay for.
But maybe this is intentional. Perhaps the bean counters at Comcast have calculated the issue both ways and this way is just more profitable.
Or even more deviously, perhaps it is not about the money at all, at least not directly. Maybe the Comcast bean counters were charged with figuring out which way would lead to more customer satisfaction in the long run. Maybe they calculated that there would be much dissatisfaction if people found out how often the service was down, after the fact. Plus if the credit is automatic then the implied sense of entitlement kicks in and no one is especially interested, but if those who are specifically unhappy enough to have a conversation are given a perk then they might actually be converted from unsatisfied to satisfied.
You know, while I was typing this the high speed service has gone up and down a few times.
I knew there was no sense using the RAS to get back in just yet. They weren't done fixing and there is no sense frustrating myself while they are still working on stuff. They kind of have a free pass for the day anyway, right? I mean, if the service goes down tonight I can hardly call and expect them to credit me twice for the same day, right?
Well, I guess I could. But I wouldn't (the whole value-for-value thing again).
I am reading this post over and I can already hear some of my friends pointing out how I am thinking about this way too much. Which might be true, but I am up anyway and Basic Instinct 2 (which to be fair is on Showtime right now and not Skinemax, er Cinemax!) is just not crying out to me to be watched. And the stuff I have to work on at this moment kind of needs that RAS connection.
This is actually how I am -- I do tend to really think all the way through stuff, perhaps in part to make up for the occasional impulsive act that I do without thinking. Which is not to say that I do that, but....
Back to the whole Comcast thing.
In the end, I don't expect Comcast to change how this works. For whatever reason they ended up going with, they ran a bunch of numbers and made a decision that they felt was best for the company overall. On the whole I can't even claim to be dissatisfied since the whole thing appeals to my cynical nature (to compare, I used to love to stay in Grand Cayman the week before, the week of, and the week after Christmas, just to watch the rates go from low to high to low for the exact same room and meals, unapologetically -- you can do the same thing in Las Vegas though I have a much easier time getting rooms comp'd there!).
On the other hand, my kind of satisfaction is decidedly not what Comcast is looking for, since mine is the confident recognition that someone is running a good scam and running it so well that the masses don't notice. No company wants people to think they are the bad guy, even if they give tacit (yet cynical) approval. Especially since people like me talk to others (or to be more accurate, more popular versions of people like me talk to bigger groups of others than I myself can reach), and that monkeys with the whole "satisfied customers" end of the scam....
In any case, my hat is off to Comcast, for coming up with the best way to make the most of their non-service for themselves. Which in the end has a lot more to do with customer satisfaction than their service does. Off the top of my head I can think of many times that both Unicode and Microsoft have managed to handle such situations much more terribly (perhaps topics for future blogs, some other day!).
Looking at the cable modem, the connection has been up solid for long enough now -- it is only 4:30 AM but it looks like Comcast underpromised and overdelivered on that expected fix time. Excellent, I think (in my mental Mr. Burns voice).
This post brought to you by Ⳕ (U+2cd4, aka COPTIC CAPITAL LETTER OLD COPTIC HAT)
Some of you who grew up with Sesame Street may recall the old short they ran with the catchy lyric that inspired the title:
The most important person in the whole wide world is you, and you hardly even know you!
It is something that Paul C. Vitz (in his Psychology as Religion: The Cult of Self-Worship) has denigrated due to how it "...fills the empty self, but it perpetuates passivity and weakness."
Harsh words, but perhaps good ones to hear (I kind of ignored the Sesame Street bit, since even at a young age I knew that I was the most important person in the whole wide world to me even if not to the other misguided souls that are held by gravity to the third rock from the sun and didn't have a ton of respect for the people who needed this kind of thing to feel good about themselves, much preferring to "hanker for a hunka cheese" as the other little short would point out). In fact I suspect that this kind of sappy crap is actually where my cynicism was first born? :-)
Besides, as I just overheard, You don't go and change the color of the carpet when you are invited over for dinner. If you really were the most important person in the whole wide world, such behavior would be quite socially acceptable, as long as you did it. Do we need further proof of the flaw in the reasoning? :-)
In any case, I was thinking about all of this just yesterday, when a question came in to product support from a customer who was seeing some strange behavior changes between versions of Windows:
Below are two lines of code that customer used to reproduce the discrepancy between Vista and XP. Why the difference in behavior? Is there a good work around?
printf(" using 0x409: %d\n", CompareStringW(0x409, 0, L"대담한", -1, L"Roman", -1));
printf(" using 0x412: %d\n", CompareStringW(0x412, 0, L"대담한", -1, L"Roman", -1));
Using these two lines of code on Windows XP (32-bit) the output was:
 using 0x409: 3
 using 0x412: 1
Under Vista (64-bit) the output was:
 using 0x409: 3
 using 0x412: 3
Under Vista (32-bit) the output was:
 using 0x409: 3
 using 0x412: 3
Just to decipher some of the hard-coded numbers for people who don't speak the MicrosoftReturnValue or MicrosoftLCID dialects, as described in MSDN in the Language Identifier Constants and Strings topic and the CompareString/CompareStringEx topics:
0x0409 == MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US)
0x0412 == MAKELANGID(LANG_KOREAN, SUBLANG_KOREAN)
1 == CSTR_LESS_THAN
3 == CSTR_GREATER_THAN
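If you would rather check those equivalences than take my word for them, the following portable sketch rebuilds the two identifiers and handles the return values. The SK_ names and the helper cstr_to_crt are made up for this post; the numeric values are my understanding of what the Windows headers define, and subtracting 2 to get the C runtime convention is the approach the CompareString documentation suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the Windows header definitions (assumed values). */
#define SK_MAKELANGID(primary, sub) \
    ((uint16_t)(((unsigned)(sub) << 10) | (unsigned)(primary)))
#define SK_LANG_ENGLISH       0x09
#define SK_LANG_KOREAN        0x12
#define SK_SUBLANG_ENGLISH_US 0x01
#define SK_SUBLANG_KOREAN     0x01

enum {
    SK_CSTR_LESS_THAN    = 1,
    SK_CSTR_EQUAL        = 2,
    SK_CSTR_GREATER_THAN = 3
};

/* Map a nonzero CompareString-style result onto the C runtime convention
 * (negative, zero, positive) by subtracting 2. A result of 0 means the
 * call failed and has to be handled before getting here. */
static int cstr_to_crt(int cstr_result)
{
    assert(cstr_result >= SK_CSTR_LESS_THAN &&
           cstr_result <= SK_CSTR_GREATER_THAN);
    return cstr_result - 2;
}
```

So the customer's "3" under 0x409 means the Korean string sorts after "Roman", and the "1" under 0x412 on XP means it sorts before.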
Some may recall a previous blog, in fact one of the blogs from the very first month I started blogging, entitled Unlike LCMapString, the sort keys for English characters precede the sort keys for Korean. The blog was trying to explain some confusing text in the depths of the notes for the managed SortKey class help topic:
Working with the value of a SortKey object is equivalent to calling the Windows API LCMapString method with the LCMAP_SORTKEY value specified. However, for the SortKey object, the sort keys for English characters precede the sort keys for Korean characters.
The truth is that this Windows behavior that puts Han and Hangul before all the other scripts in the Korean locale exists for both sort key generation and string comparison, even though it is only really ever documented in obscure places like that SortKey topic, the blog post referenced above, and my later A&P of Sort Keys, part 12 (aka Han sorts first!).
Now which script comes first -- Latin or Hangul, Han/Hangul or everything else -- is entirely arbitrary. And the exact reasoning behind the Korean behavior being different, being at least a decade old, is no longer fully understood beyond the vague notion of a "request from the subsidiary" whose details people within the subsidiary can't recall (or at least they couldn't the last time I asked!), though the generic idea that being first is somehow "better" is as good a theory as any. I suppose we are just lucky that no other subsidiary had such a thing done for their language, too!
Reader George commented in that early blog:
Why can't you just remove this re-ordering in a future version? It seems like a weird step best removed.
Funny George should mention that, huh?
That is exactly what happened in Windows >= Vista -- this weird re-ordering was removed from the product.
Of course as a side effect we are seeing the very issue that is central to the original customer complaint that inspired this blog!
There is no real "workaround", though. The update to the collation tables that happened in Vista was accompanied by a major version change as well, which means such changes can be expected....
Now you may disagree with this direction, in which case you should leave your address in the comments with an invitation to dinner and I will come by to change the color of your carpets as you explain your point of view on the matter to me. :-)
This blog brought to you by 대 (U+b300, aka HANGUL SYLLABLE TIKEUT AE)
Somewhere in between zero and the smallest possible negative number there lies another number.
NEGATIVE ZERO.
I am being facetious here; it is not a number that is any different than zero. In the end,
0 == -0
as just about any normal person in the world will tell you (I learned it on the first day I learned about negative numbers, myself). It has as much meaning as +0 does, in the end. Either way, it is still zero.
But that doesn't stop programmatic folk from coming up with explanations in the BCL Team blog like Decimal Negative Zero Representation, "by design" issues such as In VS2005 C++, Zero reported as negative zero for double type and bugs like Math functions make a distinction between 0 and -0.
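The reason programmers keep tripping over this is that IEEE 754 floating point (the usual representation behind C's double) really does store a sign bit on zero, even though the two zeros compare equal. Here is a minimal sketch in C, assuming an IEEE 754 platform and no fast-math compiler options; the function names are made up for illustration:

```c
#include <math.h>
#include <stdbool.h>

/* True when x is a zero whose sign bit is set, i.e. IEEE 754 negative zero. */
bool is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}

/* Equality ignores the sign of zero, so this returns true:
   just as in the real world, 0 == -0. */
bool zeros_compare_equal(void)
{
    double pos = 0.0;
    double neg = -0.0;
    return pos == neg;
}
```

So the comparison operator agrees with the rest of the world, while anything that inspects the sign bit (formatting code, some math functions) can still tell the two zeros apart.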
We even had an issue in collation, believe it or not!
You see, when people are not using the whole "sort digits like numbers" functionality brought to you by the StrCmpLogicalW function, then digits and all of the rest of the numbers in Unicode have to get sorted as text.
Conceptually it makes total sense to sort all of the values -- so that ⓴ (U+24f4, aka NEGATIVE CIRCLED NUMBER TWENTY) comes first and then all of the other numbers greater than that like 8 (U+0038, aka DIGIT EIGHT) until at the far end you have ∞ (U+221e, aka INFINITY). And that is how it works in collation -- with numbers that have the same digit values, like
given the same primary weight but alternate other weights, in some kind of weird attempt to give them some kind of numerosity.
But then there is ⓿ (U+24ff, aka NEGATIVE CIRCLED DIGIT ZERO).
It was obviously kind of zeroish. Like 0 (U+0030, aka DIGIT ZERO). But the special extra weight slot for circled digits was already taken up by ⓪ (U+24ea, aka CIRCLED DIGIT ZERO).
Uh oh, what to do?
So in the end, in Vista, because of the notion that -0 somehow (conceptually) needs to come before 0, the weights for the ASCII digits were all incremented a bit so there would be room for NEGATIVE ZERO while still giving the digits consistent weights.
This, as it turns out, was not such a great idea, since it increased the size of sort keys that contain innocent items in them like GUIDs. So in Server 2008 it gets stuck somewhere after 0 (U+0030, aka DIGIT ZERO), where there is lots of room for zero-ish things like this.
This (and a handful of other bugs like this one, only fixed in Server 2008 as I explained here) is why Server 2008 has an updated major sorting version.
All because there was no room in the weights to make -0 < 0!
So in the end, in the real world -0 == 0, in Vista -0 < 0, and in Windows Server 2008 -0 ≮ 0!
This post brought to you by ⓿ (U+24ff, aka NEGATIVE CIRCLED DIGIT ZERO)
The specific advice was to do with shell extensions and their "guest" status in the world of Windows Explorer, but the general principle is something that can be and should be applied quite broadly....
Even to blog comments. :-)
I am never the type of superior elitist code snob who feels above people who improve on the things that I do. Or have done.
And even if something is no longer officially supported by Microsoft, I am not the sort to just ignore people who are still trying to work productively with code that I have produced in the past....
Like the other day over in the microsoft.public.platformsdk.mslayerforunicode newsgroup, Igor Solodovnikov wrote:
To build your project with MSLU it is recommended (http://msdn.microsoft.com/msdnmag/issues/01/10/MSLU/default.aspx) to include following in the beginning of project's link list:

    /nod:kernel32.lib /nod:advapi32.lib /nod:user32.lib /nod:gdi32.lib /nod:shell32.lib /nod:comdlg32.lib /nod:version.lib /nod:mpr.lib /nod:rasapi32.lib /nod:winmm.lib /nod:winspool.lib /nod:vfw32.lib /nod:secur32.lib /nod:oleacc.lib /nod:oledlg.lib /nod:sensapi.lib
    unicows.lib
    kernel32.lib advapi32.lib user32.lib gdi32.lib shell32.lib comdlg32.lib version.lib mpr.lib rasapi32.lib winmm.lib winspool.lib vfw32.lib secur32.lib oleacc.lib oledlg.lib sensapi.lib
    <you own libraries>

There is problem when you try to use MSLU in Visual Studio 2005: if you add recommended list of nod's and libraries to "Linker::Input::Additional dependencies" option in project's property ages then every time you will start debug session using "Debug->Start Debugging" or build your project using "Build->Build myprj" you will get something like this:

    ------ Build started: Project: myprj, Configuration: Debug Win32 ------
    Linking...
    Embedding manifest...
    Build Time 0:10
    Build log was saved at "file://c:\myprj\Debug\BuildLog.htm"
    myprj - 0 error(s), 0 warning(s)
    ========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

So i want to say that your project will never be in up-to-date state. Interestingly enough that this problem does not appear if you build problematic solution from command line using msbuild tool:

    msbuild mysolution.sln /p:Configuration=UDebug

This means that building solution from command line is slightly differs from building from IDE.

There is workaround for this problem:

1. Include the following in "Linker::Input::Ignore Specific Library" option: kernel32.lib advapi32.lib user32.lib gdi32.lib shell32.lib comdlg32.lib version.lib mpr.lib rasapi32.lib winmm.lib winspool.lib vfw32.lib secur32.lib oleacc.lib oledlg.lib sensapi.lib
2. Include the following in "Linker::Input::Additional dependencies" option: unicows.lib kernel32.lib advapi32.lib user32.lib gdi32.lib shell32.lib comdlg32.lib version.lib mpr.lib rasapi32.lib winmm.lib winspool.lib vfw32.lib secur32.lib oleacc.lib oledlg.lib sensapi.lib <you own libraries>

Using such settings your project will be up-to-date when it should.
That link at the beginning of what he wrote is to the article Cathy Wissink and I wrote in October 2001 (Develop Unicode Applications for Windows 9x Platforms with the Microsoft Layer for Unicode) and although it has obviously been many years and I can forget details in co-author situations as to who added what information, I am pretty sure the linker settings were part of what I contributed (with help from others behind the scenes, of course!).
Anyway, like I said I have no problem with people suggesting enhancements to what I do.
The only advantage that the original plan has over Igor's enhancement is that it is slightly easier to use one setting than two, but the benefit pales when compared to not forcing rebuilds even when no rebuild is needed. Therefore I am happy to agree that this is a better idea.
So if you are using MSLU and you build it in Visual Studio, feel free to avail yourself of this advice!
This post brought to you by u (U+ff55, aka FULLWIDTH LATIN SMALL LETTER U)
These are my opinions, not Microsoft's. They are not even informed opinions. So please feel free to weigh them with that in mind....
Disclaimer: There is some kind of Comcast/Microsoft relationship, I think. To be honest, I have no idea.
I don't know a whole lot about being a monopoly.
Of course everyone assumes I am full of it when I say that. I work for Microsoft, after all. And they admitted to being a monopoly. So I must know all about them.
Well, not so much.
The bulk of the people in the company have nothing to do with either the reported OEM deals made for Windows or the reported strong-arm tactics using dominance in one market to try to influence other markets. And they also are not involved in the architectural decisions that combine or sever components.
I know what I hear, which is what people on the outside hear. Believe it or not, senior VPs and corporate VPs don't talk to me about their decision-making process, either (I know a few of them and have even had hot cocoa on occasion with one of them, but we didn't talk about this kind of stuff).
In effect, I know what you know about Microsoft as an "evil monopolist" and really nothing more, other than the fact that I know that a lot of what people believe isn't true since I get to see groups and how they work together. And how they don't work together even when I wish they would. I am so frustrated by the slow adoption of custom cultures/custom locales into Microsoft Office that if OpenOffice or really anyone else contacted me interested in integration I'd be happy to help them, just so some customers would get to see the power of having this feature available to them. Line up and integrate! :-)
Anyway, you probably know what I mean if you read here at all.
But this post isn't about Microsoft, even though this blog often is and even though so far I have just talked about Microsoft.
You see, in the past few years I have received countless pieces of mail from Comcast.
Now me, I am a Comcast customer, so this is kind of to be expected. I have all of the pay channels like HBO and Showtime, and I also have high speed Internet service.
Most of the mail I receive is not to do with my monthly bill or any of the channels I have or could have.
It is pretty much all about switching to use Comcast for my phone service.
I really don't want to do this, and I have made it clear that I do not. I have a minimal landline phone service through Verizon with an uptime that Comcast can't touch. And I know they can't, because I know how often my cable goes down in a month and how much it has gone down, and they have admitted to me that when one service (phone, cable, Internet) goes down they all probably would.
And every time I have tried to get some kind of service (another cable box, a different cable modem, a new channel package) they look for the best deal for me and the one big red flag that they ask about is whether I want the phone service -- if I get that, then they can always offer me a better deal for whatever service I am asking for. Even though it has nothing to do with that service.
When I tell them I am not interested, I feel like Donald Sutherland arguing with the pod people in that Invasion of the Body Snatchers remake. They make it clear how easy it is to just get the service -- cheaper for a year, an additional cable box free, a great package deal, whatever I want is mine if I sign on for the phone service.
I'm not joking when I say I expect they'll be offering neck rubs and sports massages any day now, if only I say yes to the PS (phone service, as a clever rhyme).
I point out that when we had the huge power outage at the end of 2006 (the one that I blogged about here and here) my phone service was up (it was the only way I could blog in that time), and the cable/broadband service was as gone as the power was (in fact they were not up until almost a week after the power finally came back up, so it was even more gone -- Bill and Karolyn Slowsky would have been proud of me for my week of being on a slow dial-up connection!).
They give up at that point, sadly -- how can they argue that point, really? But it does not stop the next ten mails (I once got three in a single day) or the way that they will offer me phone service again the next time I work with them. Even at the end of technical service calls and calls about service problems and outages -- it is never the wrong time at Comcast to push the phone service.
I'd really hate (under the circumstances) to work for Comcast and lose a loved one -- I suspect that I'd be expected to sign up mourners at the funeral for Comcast phone service -- that is how pushy this all feels!
It has gotten to the point where the first thing I say when dealing with Comcast is that they should not offer me phone service, no matter what they do. They laugh a bit and then usually don't offer it (sometimes they do anyway), but I can tell where in the script they are skipping the push for phone service even when they respect my request.
Now, generally speaking, cable service is a "legal monopoly" since most markets don't give you a lot of choice in which service you want. It has kind of always been that way.
Technically Comcast is currently the biggest cable company in the US and the second biggest broadband company in the US.
Clearly they are bundling phone service in their cable signal, and making it the number one possible deal as they try to sign up every person they can. If it were not for the fact that I am a stubborn son of a bitch I am sure I would have taken the deal by now.
If I were a phone company, I'd be more than a little concerned about a monopoly bundling competitive technology in a way to coerce customers to use the technology.
Isn't this push to try to turn their monopoly (cable service) into increased market share for their other technology (their bundled phone service) the kind of crap that Microsoft was getting accused of all this time, trying to leverage one de facto monopoly to affect product placement and market share in another market? It isn't like they are offering up their cable lines to rival phone service, either. And they also aren't bundling phone service for free to people who download it. Compared to even what Microsoft was accused of in relation to Windows and Internet Explorer, they aren't even shooting par.
Now I know there are lots of differences and I know that I am not comparing apples to apples, or even apples to oranges. Hell, I may be comparing apples to bicycles or apples to pro wrestlers here.
But I do know that this is a pretty hard push and given the reactions that Comcast employees have to my reactions I suspect that I am either a rare holdout in this plan to bundle the three services together, or these people have received training to make me think that I am a rare holdout.
Like I said, I don't know much about monopolies. Sounds funny when a Microsoft employee says that, I know. But I really don't.
And I know even less about coercive monopolies (I have heard the term, and that is the extent of my knowledge), where companies take that market share and flex those muscles to try to become bigger in a way that is not well-thought-of.
I'm not turning away from the Comcast services I do like, in the end; I'm just not turning toward the ones I don't.
But isn't what they are trying to do in this market illegal, at least by some broad definition of the word? An illegal that is made legal by deregulation, perhaps, but then at least kind of slimy? And in that case perhaps an argument for some de-de-regulation so they stop coloring so far outside the lines?
Yes, I know since I work for Microsoft this is clearly a case of the pot calling the kettle black and whatever, but damn.
What's up with Comcast here, anyway?
I'm just tired of a relationship that is all about a sales pitch.
No, I don't want Comcast phone service, dammit!
This post brought to you by M (U+ff2d, aka FULLWIDTH LATIN CAPITAL LETTER M)
So the other day I noticed a friend of mine had added a My Blogs application to their facebook profile. I figured that with the very small amount of facebook activity I do, I needed a way to fill my mini-feed without having to do actual work -- so something that would add my blog post every day would be awesome!
Unfortunately, it was neither platform double suede nor disco lemonade.
Because while facebook's mini-profile supports Unicode just fine, the My Blogs facebook application created by Space Program (a company with a facebook profile claiming they "...make cool stuff" and a website tagline claiming that they focus on "...delivering extremely high quality products and services to make IT life easier") is not supporting Unicode. The screenshot:
Note the My Blogs application in the middle that puts question marks in for all of the non-CP_ACP characters, and the same entries in the mini-feed on the right that put the actual Unicode characters in.
And yes, the picture of me is my avatar, Dave Sim's delightful character Cerebus the aardvark. He is better looking than me, and has no smile issues since he never smiles!
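I have no idea what the application actually does internally, but question marks like these are the classic symptom of text being squeezed through a non-Unicode code page somewhere along the way. Here is a minimal sketch in portable C of how such a lossy conversion produces them. The function name is hypothetical, and Latin-1 stands in for whatever the real code page is:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a conversion to a single-byte code page.
   The "code page" here is Latin-1: code points up to U+00FF fit in one
   byte, and everything else is replaced by '?'.
   Writes at most cap - 1 characters plus a terminating NUL and returns
   the number of characters written (not counting the NUL). */
size_t to_latin1_lossy(const uint32_t *src, size_t n, char *dst, size_t cap)
{
    size_t i;

    if (cap == 0)
        return 0;

    for (i = 0; i < n && i + 1 < cap; i++)
        dst[i] = (src[i] <= 0xFF) ? (char)(unsigned char)src[i] : '?';

    dst[i] = '\0';
    return i;
}
```

Feed it 'H', 'i' and U+B300 and you get back "Hi?", with the original character gone for good. On Windows the same thing happens when WideCharToMultiByte is called with CP_ACP and the code page has no mapping for a character: the default character, usually '?', is substituted.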
Anyway, I am kind of torn at this point.
Should I give them a chance to fix this? You may have seen near the bottom of my mini-feed that I gave them the feedback on this issue -- perhaps I should give them a chance to address the problem.
On the other hand, I could just make their application invisible while still letting the minifeed pick up the entries. It just seems kind of obnoxious to not give them some more profile space, without even seeing if they planned to do something here (if I hide them then when will I ever see their improvements!).
Not supporting Unicode does not make a company evil; it merely suggests the company is being narrow-minded.
And not doing so in a free application that will never make the authoring company much money also doesn't make a company evil; it merely suggests the company is being lazy.
To disprove both of those suggestions, we'll have to see if they address the inherent limitation in their facebook plug-in....
I can wait to see if they step up here!
If they want help they can ask; as regular readers know, I am all about Unicode enabling!
This post brought to you by 女 (U+2f25, KANGXI RADICAL WOMAN, though represented by its more attractive butt double that it is a compatibility form of, U+5973).
I am not speaking for Microsoft here, so anyone who claims I am is subject to the utter moronic wingnut judgment for their lack of comprehension!
People love to complain about poor localization quality in software from Microsoft.
But if you think about it, other people talk about poor user interface and core product usability even before localization enters into the picture.
And it is really hard to separate the two.
But let's take that second point first and think about it.
Now if you ignore Microsoft Bob (as many in Microsoft like to do!) and that Windows Home Server book for a moment, the truth is that very little of the software produced by Microsoft has more than one user interface, despite the fact that there are definitely different levels of users in terms of knowledge of computers, experience, age, maturity, gender, and numerous other facets. So in the end the software may well be usable to some part of the audience (perhaps only geek developers with the same social skills as the original engineers, perhaps more!), and the usual problem being suggested is that it is not usable to some other segment of the audience, presumably (though not always) a wider audience.
This is easy enough to understand conceptually, and most of the buzz around Microsoft these days from UX people is about the usability, and those UX people have the job of trying to make the product usable by the biggest possible audience.
Some products are better at this than others. I do not know of a single person inside or outside of Microsoft who does not know of at least one negative example here, where a product kind of missed some chance to be considered usable in some error message or dialog or process or technique.
Sometimes that is a pure fault in the product, and other times it is a fault in this general idea of trying to average out the abilities of every single user even though every user is different and there is no way to have everybody find something 100% usable without giving different user interfaces for different groups (which most groups would find prohibitively expensive, and even if they did not, there is no generalized expertise around writing to such different audiences).
And so far such attempts to fix the problem suffer from the "expert mode" problems that Raymond Chen has spoken about in the past.
Plus there are only very limited mechanisms for "training wheels" that can be taken off for users who get more experienced and want to move into a different user interface experience.
Now when you add localization to the mix, you add additional dimensions to the problem:
At which point we get to the essence of the problem -- it is difficult to separate localization complaints that are indeed just purely due to bad localization and/or terminology from those that are due to it just being the wrong terms or the wrong interface or the wrong error message for a specific market segment.
The former problem, when it is discovered, can be treated as a pure localization bug, just as a core problem that affects a localized platform can be treated as a pure localizability bug.
The fact that there are limited means to address the latter problem is kind of beside the point unless you really consider that this is completely the point. We need a mechanism to address this problem.
Because this is the core problem related to localization quality that cannot be solved until after it is recognized and treated as a problem, and more effectively addressed from an engineering standpoint.
In a company and an industry that applauds ideas like Language Interface Packs due to how they make localization cheaper, I don't imagine that there are a whole lot of "Bill Gates" awards for people who make any language more expensive, even if the idea brilliantly solves the problem....
Know what I mean?
This post brought to you by ⷞ (U+2dde, aka ETHIOPIC SYLLABLE GYO)
One of the interesting things about being involved with internationalization is that anytime something vaguely interesting with a hint of internationalization comes up, everyone will forward it to you!
In this case I had nine different people forwarding it to me, each I assume thinking they were giving me a good idea for a blog -- reader Tamara was the first one! :-)
Ultimately Mark E. Shoulson forwarded it to the Unicode List -- and with my pre-existing account at http://www.politicalcartoons.com/ raring to get used, it was easy to pick up a copy for the rest of you to enjoy....
Mark is eagerly awaiting the full Unicode version of scrabble....
A computerized version might be easiest if we can get the fonts together, which makes me wonder if there are any font foundries that want the gig of building that font, the one that will appropriately scale all of the letters so they look okay next to each other -- something kind of fixed width. Anyone?
It also leads to an interesting variation on the usual requirement that people agree on the acceptable dictionary: the requirement that people agree on the version of Unicode. :-)
Ken Whistler identified that last tile at the end as 𝌼 (U+1d33c, aka TETRAGRAM FOR DIMINISHMENT) and with a score of 𝑖 (U+1d456, aka MATHEMATICAL ITALIC SMALL I), it is clear that the score for its usage is indeed the square root of -1, an imaginary number.
So don't pity the one with the tiles; pity the poor soul who has to decide how to score this game!
His move is obvious to me personally -- I immediately would put the Æ next to the G near the top of the board to spell GÆ or GAE, a virtually extinct (there is like one fluent speaker trying to teach others at this point) language of Peru also known as Andoa. That will be 21 points and your turn....
I did find a few other potential moves myself but I like GAE best given the linguistic aspects.
You can try yourself -- here is the cartoon. Think you might recognize the other letters there? Anyone else have a move they want to suggest? :-)
This post brought to you by ש (U+05e9, aka HEBREW LETTER SHIN)
I had a friend tell me the other day (after watching a few of the Love Monkey episodes) that she totally understands why I loved the show and that I am a total Tom Farrell.
Probably the nicest thing anyone has said about me all week! Admittedly, it may not have been intended as quite the compliment that I took from it. :-)
I was talking about it the other night with Andrea, and she agreed with this assessment.
I explained to Andrea that I originally believed there were some real differences (beyond the fact that my job is not as cool, and that most of the time I have neither a Bran nor a Julia in my life!).
The difference I had in mind was kind of implicit in comparing the episode The Window and the episode Mything Persons, and specifically the women who were interested in Tom in those episodes (Indie rock critic Abby Powell played by Rosemarie DeWitt in the former, and physician/life saver Dana played by Erin Daniels in the latter).
In the former case, Tom was not interested in a relationship because what he really wanted from her was a review for the Barbarian Brothers' Hello Nurse album, and in the latter he was very interested in a relationship even though it turned out Dana was heading off to Angola (not the one in Louisiana, the one in Africa!).
I just had trouble seeing myself as being that inflexible; for the longest time after I had seen The Window it didn't sit right with me that the character could realistically be so focused on something (in this case the job) that, even while heavily flirting with and seducing a woman with whom he had so much in common, he could not even realize his own interest and could be so shocked that the other person was interested in him.
With multiple people claiming they saw a bit of me in the character, I took this pretty personally; I just didn't feel that obtuse.
Liz was the one who told me that I could be that obtuse -- in fact, that I had been that obtuse, and probably would be again some day. She told me it was in my nature....
I kind of realized that she was actually talking about herself -- I had for years been not seeing that she was interested in me, and that I really never saw it until after she was gone and it was too late to tell her I finally understood what she was telling me.
I wonder whether I should have given her more. You know, noticed.
The whole situation reminds me of the series finale for Angel (Not Fade Away), when the demon Illyria inhabited Fred's body but Wes said he didn't want to deal with the lie of her pretending she was Fred (as he pointed out, "The first lesson a watcher learns is to separate truth from illusion. Because in the world of magics, it's the hardest thing to do"):
Illyria: [Wesley has been fatally stabbed] You'll be dead within moments.
Wes: I know.
Illyria: Would you like me to lie to you now?
Wes: Yes. Thank you, yes.
[Illyria morphs into Fred]
Wes: Hello there.
Illyria [as Fred]: Oh Wesley. My Wesley.
Wes: Fred, I've missed you.
Illyria [as Fred]: It's gonna be okay. It won't hurt much longer and then you'll be where I am. [Begins crying]
Illyria [as Fred]: We'll be together.
Wes: I-I love you.
Illyria [as Fred]: I love you. My love. Oh, my love.
That last kiss I mentioned was kind of the same thing, in a way.
Though in saying this I self-consciously note that I am not as interesting as Alexis Denisof nor am I likely to be catching the eyes of either an Amy Acker (who played Fred) or an Alyson Hannigan (his real life spouse).
The "Barbarian Brothers" song from their fictional album "Hello Nurse" that plays at the end of the episode? I do not know if that was a fake song that actor James Ransone just sang, or if it is a real song (if it is I want to buy it!). It hit me pretty hard.
It is in fact the song that plays in the mental montage I see when I close my eyes and think of the whole situation -- if not my biggest regret ever then certainly in the Rob Gordon-esque top five biggest regrets I have. I see the whole exchange in the airport as part of the montage at the end of the show -- just after we see Tom watching the band play and just before Jake is looking at the new apartment he is thinking about moving into with his boyfriend. Me kissing Liz goodbye at the airport and her shiver. All the sadder that I only have figured out 90% of the lyrics....
Okay, so maybe my friend had a point. Perhaps one can on occasion be so blind to what is right in front of us. And when I say "one" I mean "I" of course.
The title of this post is a bit from the song. As I said I think I have most of the lyrics figured out now, but what I am hearing may not be 100% right, being colored by what the song reminds me of, so I'll probably keep it to myself for now. No one else ever hears lyrics anyway, if you know what I mean.
Sorry, folks. I expect to be past all of this, some day.
This post brought to you by ♫ (U+266b, aka BEAMED EIGHTH NOTES)