Sorting it all Out Michael Kaplan's random stuff of dubious value Be sure to read the disclaimer here first!
Robert A. Heinlein told a story in his book Expanded Universe back in 1980 (bear with me, I promise I'll be making a point eventually):
A few years ago, I was visited by an astronomer, quite young and brilliant. He claimed to be a long-time reader of my fiction and his conversation proved it. I was telling him about a time I needed a synergistic orbit from Earth to a 24-hour station; I told him the story it was in, he was familiar with the scene, mentioned having read the book in grammar school.
This orbit is similar in appearance to a cometary interplanet transfer but is in fact a series of compromises in order to arrive in step with the space station; elapsed time is an unsmooth integral not to be found in Hudson's Manual but it can be solved by the methods used on the Siacci empiricals for atmosphere ballistic: numerical integration.
I'm married to a woman who knows more math, history, and languages than I do. This should teach me humility (and sometimes does, for a few minutes). Her brain is a great help to me professionally. I was telling this young scientist how we obtained yards of butcher paper, then each of us worked three days, independently, solved the problem and checked each other -- then the answer disappeared into *one* line of *one* paragraph (SPACE CADET) but the effort had been worthwhile since it controlled what I could do dramatically in that sequence.
Dr Whoosis said "But *why* didn't you just shove it through a computer?"
I blinked at him. Then said slowly, gently, "My dear boy--" (I don't usually call Ph.D.'s in hardcore sciences "My dear boy"--they impress me. But this was a special case.) "My dear boy... this was *1947*."
It took him some moments to get it, then he blushed....
It's a story that comes into my head every time I get a question these days that proves the person asking is not thinking about the fact that the passage of events has an influence on what is possible. Nowhere is this more evident than in the subject of this posting -- people who wonder why Microsoft does not support the Unicode Collation Algorithm (UCA). People notice that Windows seems to have a similar framework and they assume that both use the same "default table" that works as a basis for all collations (in other words, they assume that Microsoft's table is based on the Unicode sort weight tables).
The truth is quite different. Unicode's weights have been a part of the UCA, which was first a DRAFT Unicode Technical Report in March of 1997. It did not lose its DRAFT status until November of 1999 and was not a Unicode Technical Standard until August of 1999.
Windows, on the other hand, has had its architecture and its default table in place since NT 3.1 shipped, over a decade ago. How could it be based on the Unicode sort weight tables, which did not exist at that time even in draft form? The temptation to respond to the person asking with a "My dear boy..." (or "My dear girl...") is at times overwhelming!
As to the extra functionality, I'll just say that the past 15 years have seen a lot of language support added to Windows, and the expertise that has been applied to its collation support is truly amazing. It's a daunting functionality to work on at times given how well it has performed over the years. :-)
From a philosophical perspective, collation in Windows has always been based primarily on the linguistic data that is at its core -- the technical issues have always been driven by the data, not the other way around. I think this is a unique strength of the implementation that allows it to outperform others across a range of languages that is also (in my opinion) far superior. The two sets of tables were certainly built up with entirely different linguistic and development philosophies, and ignoring my opinions about which is better, the data of either one would really be a poor fit for the other.
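For anyone who has not seen the "default table plus per-language data" idea in action, here is a toy sketch in Python. To be clear about what is assumed: the weights, the table layout, and the names `DEFAULT_TABLE`, `TAILORINGS`, and `make_sort_key` are all invented for illustration. This is not how Windows stores its data and it is not the UCA's default table either; it only shows why the linguistic data, rather than the machinery, decides the outcome.

```python
# Toy illustration only: every weight here is made up. Nothing below is
# taken from the Windows tables or from the UCA default table.

# A "default table": each letter maps to (primary weight, secondary weight).
DEFAULT_TABLE = {
    ch: (i + 1, 0) for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")
}

# Hypothetical per-language data layered over the default table.
TAILORINGS = {
    # Swedish: a-umlaut is a separate letter that sorts after "z".
    "sv": {"\u00e4": (28, 0)},
    # German dictionary order: a-umlaut sorts with "a", differing only
    # at the secondary level.
    "de": {"\u00e4": (1, 1)},
}


def make_sort_key(language):
    """Build a sort-key function from the default table plus language data."""
    table = {**DEFAULT_TABLE, **TAILORINGS.get(language, {})}

    def sort_key(word):
        # Letters missing from the table sort after everything else.
        weights = [table.get(ch, (10**6, ord(ch))) for ch in word.lower()]
        primary = tuple(w[0] for w in weights)
        secondary = tuple(w[1] for w in weights)
        # Compare all primary weights first, then all secondary weights.
        return (primary, secondary)

    return sort_key


words = ["zebra", "\u00e4pple", "apple"]
swedish = sorted(words, key=make_sort_key("sv"))
german = sorted(words, key=make_sort_key("de"))
print(swedish)  # a-umlaut word lands after "zebra"
print(german)   # a-umlaut word lands right after "apple"
```

The comparison code is identical in both cases; only the data changes, and the same three words come out in a different order. Weights designed for one table's conventions would mean nothing if dropped into another table, which is the "poor fit" point above in miniature.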
It is of note (well, to me at least!) that at the last two Unicode Technical Committee meetings several decisions were made that will cause future versions of the UCA's default table to behave more like Microsoft's. This is not because it's Microsoft's way (we give advice about principles for the UCA but really do not innovate for it since we are not using it to come up with innovations) but because one of the authors of the UCA suggested tweaks to the UCA behavior based on expert advice and user feedback. I guess that means we had the right idea, huh? :-)