Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
Jim's question from a few weeks ago was:
I’m being given a Unicode string and I need to determine if it will render cleanly using the system font (not displaying any blocks or “non-supported-glyph” symbols). I’ve tried using ScriptGetCMap() and GetGlyphIndices(), but both of these flag a character like 0x0C60 as not having a glyph – although it’s actually composed of multiple glyphs and it does render properly.
Our product allows an administrator to push policy to client machines, which includes a custom message to show in a notification balloon. The administrator can enter the text on a console and might include Japanese characters, for example, and that text gets pushed to a bunch of clients, some of which can’t display those characters. The client software is supposed to display the custom message if possible (no blocks displayed) and fall back to a built-in message if the custom message won’t display correctly.
Any pointers to APIs or sample code that will accurately determine if a string can be drawn?
Sound familiar?
Well, it should!
At first glance, it is the same problem discussed in Is that character in the font or isn't it?, and that is a blog that is chock full of potential solutions!
Unfortunately, Jim's question adds one element to the problem, one new wrinkle.
And that is to also try and figure out the problem for any fonts that the system might map to via linking/fallback/substitution, etc.
And this does not exist.
To be honest, it isn't actually a problem that is worth trying to solve. As Michael Warning pointed out in that thread:
This unfortunately is a really hard problem. And the answer will be different depending on the text stack you’re using (GDI, GDI+, DWrite). The problem is that each stack has a different set of rules for font fallback – how it automatically changes fonts around when it encounters a character that isn’t supported in the font you asked for.
Now Michael is thinking about the macro problem -- the complexity of all of the different models and trying to deal with how improbable it would be to capture all of these differences in code.
But to be honest the micro problem (looking at any one of these technologies) is still pretty complicated -- the kind of project where one will almost certainly fail, in the end.
So what can we do?
Well, the answer I would suggest will have to wait for the next blog.... :-)
I do try my best to know what it is, and where it's at, as this makes me seem more in touch with things, you know?
The other day, Joe asked:
A friend of mine asked me the following question and I don't know, so I thought I'd see if anyone in here had an idea.
"For some reason I get a list of font names beginning with @ in my font selection dialog (CFontDialog) on Vista, these fonts don't work correctly if I use them, any idea what they are?"
Search engines don't seem to let you search for @ so it makes searching for a solution rather difficult.
Wow, where to start, huh?
Well obviously I could talk about the vertical fonts and point to blogs like Let's get vertical and Rotate it when vertical? or even the more memorable ones for me like Expertise isn't always everything (aka When the one who is learning teaches us something important) -- the great @Arial blog! -- that were great professional relationship forming things (I still work with that team and they remain kick ass and cool with their simultaneously naive yet insightful take on issues!).
But I've already talked about that.
I was actually thinking about Tod Nielsen when I saw this, if you must know.
A former marketing VP at Microsoft who I have known for years via MS Access, he wrote a very nice foreword for my book (other, less important things he has been doing: he has also been an Oracle VP and a Crossgain principal and a Borland CEO and now runs VMWare).
One thing people don't tend to think about in association with him is how he managed to inspire fundamental changes in search engines.
Under his watch as the VP of marketing in the Developer Division they took the awful NGWS (Next Generation Web Services) message from Steve Ballmer's first CEO address back in 1999 and transformed it into the language that became known as managed code: the .NET Framework, the C# language, and so on.
Now nearly 10 years later one has to wonder if the thought that the search industry had to change some of its fundamental algorithms to properly distinguish conversations about .Net from the second most popular generic top level domain or add the hash mark/pound sign (#) to mean something much more prominent than it was so as to pick up on C# helped Tod smile now and again as all the Crossgain hoopla was going on. :-)
As mischievous as all of that may seem from the other end of the telescope (ooooh! song title!), Joe's question about the @ fonts had me realize how lame all of the current Search technologies are about the @ sign -- both Bing and Google suck as they ignore it in a way that cannot be escaped or overridden -- a fact that really delayed the time before anyone was able to get good information about the vertical font feature.
And as Joe's experience shows, this experience still kind of blows. In almost every search engine.
Now I often complained in the past that neither Live (now Bing) nor Google really handles my blog all that well -- ref: Google doesn't seem to get blogs and others -- and they still kind of suck in some important ways.
But the fact that they can't handle the @ really proves that they don't understand Twitter either, given its fundamental importance there.
Now Unicode has on at least one occasion had to respond to a proposal by discouraging a particular language's use of the @ as a letter in its orthography, given the strong usage of it as a symbol and its behavior in search engines, which makes it all the more ironic that Twitter could succeed where even email addresses and teenage IM habits have failed -- to force a linguistic meaning on the @ sign!
So perhaps one day they will get their heads out of their collective asses and fix this problem -- and with luck that will mean that search engines might finally fix this 15 year gap in searching for information about vertical fonts in Windows and finally these stodgy relics who try to be so hip and cool might break the generation gap enough to understand text messages. :-)
Disclaimer: I once had a woman break up with me because I wasn't texting her enough, though I don't think that is influencing my opinions here; I was texting her plenty but apparently I would only text her in response to her texting me; I simply never initiated in this area. Fair enough, and she had a point. It is hard to change habits that were formed in an age where only drug dealers and doctors (who are also drug dealers when you think about it!) had cellular phones and thus no one was texting yet.
The other day, in "What kind of soup?" is not exactly a soup question, is it?, I mentioned that I might have a technical example of the issue of
Not exactly a soup question, is it?
so think of this blog as me finally getting around to doing that.
It has to do with triage.
World-ready triage, to be precise.
It is a group that in most cases met twice a week, and worked to go through every bug in Windows that had some kind of globalization/localizability/international kind of issue and give a recommendation on how important it was to fix it, and by when.
It was pretty important in terms of the fact that a "must fix" recommendation could not be ignored, and it dovetailed nicely into my actual work of assisting other teams with their globalization/localizability/international issues since often a team that did not know exactly how to fix such an issue would benefit from someone who could work with them on how they could!
But I am not going to talk about the fixing so much in this blog.
This blog is about the triage.
The group included experts in a wide number of specialties -- development, test, and program management, for one. But also people specializing in lots of different areas, like:
and so on. The number of people would vary from meeting to meeting, with bugs sometimes skipped to the next meeting if the best people to look at a particular bug report weren't in the room.
A tight little group, very efficient in almost every way.
Ironically, the one place they sometimes fell short was due to the very thing that got them the seat at the table -- their various/varied areas of interest and areas of expertise!
Because they had those interests, they would often be interested in bug details such as looking deeper into the description, checking out provided screen shots, asking for more information, and so on.
Here is the kicker -- they would want to do some of these things even if it would in no way change the recommendation of triage.
And if one measures efficiency of a triaging group in terms of how fast they go through bugs (so that they can get through more bugs in a meeting) then the fact that members would so often ask questions not relevant to everyone in the room -- that were not soup questions -- could really affect that efficiency.
Now I myself was at times guilty of the same problem, and it was an effort of will to remember that
If it isn't a soup question for the meeting, keep it out of the meeting!
Now with Windows 7 out the door and me working on something else now (though I do not know what, yet!), I may not be, in fact probably won't be, in that meeting anymore.
But I do know that if I were I would want to be better about keeping those non-soup questions the hell out! :-)
Apologies for the title (note to self: never author blog titles under the influence to try to appear as a cunning linguist!)
So it was just the other day that Yong asked:
Ok, so it looks like we got a regression (or a design change) on Vista/Windows 7 from Windows XP.
On XP/W2K3:
============Start of regopts.txt============
[RegionalSettings]
InputLocale = 0409:00000409,0404:E0020404
============End of regopts.txt============
// Just having this adds for example the Chinese Traditional (ChangJie) keyboard.
On Windows Vista SP2/W2K8 SP2
============Start of regopts.xml============
<gs:GlobalizationServices xmlns:gs="urn:longhornGlobalizationUnattend">
    <!-- User List-->
    <gs:UserList>
        <gs:User UserID="Current" CopySettingsToSystemAcct="true" />
    </gs:UserList>
    <!--System locale-->
    <gs:SystemLocale Name="zh-TW"/>
    <gs:InputPreferences>
        <!--en-US-->
        <gs:InputLanguageID Action="add" ID="0409:00000409" Default="true"/>
        <!--zh-TW-ChangJie-->
        <gs:InputLanguageID Action="add" ID="0404:E0020404"/>
    </gs:InputPreferences>
</gs:GlobalizationServices>
============End of regopts.xml============
It fails with:
Unexpected Failure. Unsupported parameter.
On Windows 7/W2K8 R2, it fails with:
Event ID: 10008
Source: International
Error while changing keyboard/input method for "0404:E0020404".
This is one of those architected backcompat breaks that was put in -- GUIDs were now needed, to replace the "fake" KLID values of prior versions that would forward to the appropriate Text Services Framework TIPs (which had been around for several versions, often atop the same KLID values that the older IMM based variants of IME that they replaced used to be on).
It amazed me that after all this time no one seemed to have published the list of the GUIDs so that people could replace existing scripts!
In fact, no one had asked me if such a list existed, really.
Which is odd since that is the sort of question I do tend to get a lot.
Anyway, I thought I would just take care of that now.
Here is the big table, with the old and new values:
A few quick words about this table.
Note that this information is mostly useless to you but does explain why all of the IMEs that use TextTableService.DLL have the same GUID for the first one -- you can use this information to sound particularly impressive at a client site, by the way. :-)
The two GUIDs are used the same way that the KLID values used to be used. Thus for Yong's case,
0404:E0020404
becomes
0404:{531FDEBF-9B4C-4A43-A2AA-960E8FCDC732}{4BDF9F03-C7D3-11D4-B2AB-0080C882687E}
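If you have a pile of old scripts to convert, the substitution is mechanical enough to automate. Here is a minimal sketch in Python, under some loud assumptions: the table name KLID_TO_GUIDS and both function names are made up for illustration, the table holds only the one pair quoted in this post, and entries that are not in the table are assumed to stay exactly as they were.

```python
# Hypothetical lookup table: old "fake" KLID value -> (first GUID, second GUID).
# Only the single pair quoted in this post is filled in; the rest would have
# to come from the full list.
KLID_TO_GUIDS = {
    "E0020404": (
        "{531FDEBF-9B4C-4A43-A2AA-960E8FCDC732}",
        "{4BDF9F03-C7D3-11D4-B2AB-0080C882687E}",
    ),
}


def convert_input_locale(entry):
    """Rewrite 'LANGID:KLID' as 'LANGID:{GUID}{GUID}' when the KLID is in the table."""
    langid, _, klid = entry.partition(":")
    guids = KLID_TO_GUIDS.get(klid.upper())
    if guids is None:
        # Assumption: anything not in the table is passed through untouched.
        return entry
    return f"{langid}:{guids[0]}{guids[1]}"


def convert_input_locale_list(value):
    """Convert a comma-separated InputLocale value, one entry at a time."""
    return [convert_input_locale(item.strip()) for item in value.split(",")]


for converted in convert_input_locale_list("0409:00000409,0404:E0020404"):
    print(converted)
```

Each converted string would then go into the ID attribute of its own gs:InputLanguageID element, in place of the old value.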
Anyway, sorry I never printed this list before; I did mean to but never got around to it. And then I forgot. :-(
Hopefully this will be of use to people, going forward!
Sometimes an implementation makes a certain feature impossible.
Like the way Microsoft does collation, in particular the way its DEFAULT table is implemented (a flat DWORD table for everything 0x0000 to 0xFFFF) means that you can't ever have compressions in the default table.
Could the implementation be expanded to allow for this feature, so that more languages could be a part of the default table?
Certainly.
But the current implementation has no solution here to the problem.
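To make the limitation concrete, here is a toy sketch in Python of what a flat table looks like from the lookup side. It is not Microsoft's actual table: the weights are invented, each entry is a bare number rather than a packed DWORD, and the names flat_table and flat_sort_key are mine. The only thing it shares with the real one is the shape, which is one slot per code unit from 0x0000 to 0xFFFF.

```python
# One weight per UTF-16 code unit, 0x0000 through 0xFFFF.
# The weights are invented for illustration only.
flat_table = [0] * 0x10000
for offset, letter in enumerate("abcdefghijklmnopqrstuvwxyz"):
    flat_table[ord(letter)] = 100 + offset


def flat_sort_key(text):
    """Look up each code unit on its own; assumes BMP-only text."""
    # The index is a single code unit, so there is no slot in which a
    # two-unit sequence such as "ch" could be given one weight of its own.
    return [flat_table[ord(ch)] for ch in text]


print(flat_sort_key("ch"))                                  # [102, 107]
print(sorted(["cz", "ci", "ch"], key=flat_sort_key))        # ['ch', 'ci', 'cz']
```

However the numbers are assigned, "ch" always comes out as the weight of "c" followed by the weight of "h", so a language that wants it treated as one unit has nowhere to say so in this structure.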
Now the Unicode Collation Algorithm does not define such a limitation; it allows compressions (they call 'em contractions) in the DUCET (what they call their default table).
Thus questions like Doug Ewell's are obvious ones to ask:
The announcement of the Public Review issue stated:
1. The data files contain weights for all new assigned characters. b. The ordering for Tamil and Malayalam has been improved, but would still need tailoring for the Tamil and Malayalam languages.
I guess I'm puzzled why the default order for these two scripts wouldn't match the overwhelmingly dominant language written in those scripts. It's often stated that the default ordering for Latin also isn't appropriate for any language, but that's more understandable since so many languages are written in Latin.
I don't claim to be an expert in either Tamil or Malayalam.
So why don't they just put everything in the default table to make it better for languages that have no need of the "dumber" version for these letters?
Why not, indeed!
Well, this is described in the UCA in section 3.2 Default Unicode Collation Element Table:
The Default Unicode Collation Element Table does not aim to provide precisely correct ordering for each language and script; tailoring is required for correct language handling in almost all cases. The goal is instead to have all the other characters, those that are not tailored, show up in a reasonable order. In particular, this is true for contractions, because the use of contractions can result in larger tables and significant performance degradation. While contractions are required in tailorings, in the Default Unicode Collation Element Table their use is kept to the bare minimum to avoid such problems.
In the Default Unicode Collation Element Table, contractions are required in those instances where a canonically decomposable character requires a distinct primary weight in the table, so that the canonically equivalent character sequences are also given the same weights. For example, Indic two-part vowels have primary weights as units, and their canonically equivalent sequence of vowel parts must be given the same primary weight by means of a contraction entry in the table. The same applies to a number of precomposed Cyrillic characters with diacritic marks and to a small number of Arabic letters with madda or hamza marks.
Contractions are also entered in the table for Thai and Lao logical order exception vowels. Because both Thai and Lao both have five vowels that are represented in strings in visual order, instead of logical order, they cannot simply be weighted by their representation order in strings. One option is to require preprocessing of Thai and Lao strings, to identify and reorder all logical order exception vowels around the following consonant. That approach was used in Version 4.0 (and earlier) of the UCA. Starting with Version 4.1 of the UCA, contractions for the relevant combinations of Thai and Lao vowel+consonant have been entered in the Default Unicode Collation Element Table instead.
Those are the only two classes of contractions allowed in the Default Unicode Collation Element Table. Generic contractions of the sort needed, for example, to handle digraphs such as "ch" in Spanish or Czech sorting, should be dealt with instead in tailorings to the default table -- in part because they often vary in ordering from language to language, and in part because every contraction entered into the default table has a significant implementation cost for all applications of the default table, even those which may not be particularly concerned with the affected script. See the Unicode Common Locale Data Repository (CLDR) for extensive tailorings of the DUCET for various languages, including those requiring contractions.
Kind of says it all. There is a strong desire to not slow down for everyone's results just to help specific languages -- a tailoring for those languages just ends up being a better option overall, from the point of view of the people who write the spec for the algorithm.
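For anyone who has not seen what a tailoring with a contraction amounts to, here is a minimal sketch in Python using the "ch" digraph the UCA text mentions. Everything in it is an assumption made for illustration: the weights are invented single numbers rather than real collation elements, DEFAULT_WEIGHTS stands in for a default table, and make_sort_key is a hypothetical helper, not anyone's API.

```python
# Stand-in for a default table: one invented weight per letter, spaced out
# so that a tailoring has room to slot something in between.
DEFAULT_WEIGHTS = {
    letter: 100 + 10 * index
    for index, letter in enumerate("abcdefghijklmnopqrstuvwxyz")
}


def make_sort_key(contractions):
    """Build a key function that tries the tailoring's contractions first."""
    longest = max((len(seq) for seq in contractions), default=1)

    def sort_key(text):
        weights = []
        i = 0
        while i < len(text):
            # Longest match first: try multi-letter sequences before
            # falling back to the one-letter default weight.
            for size in range(longest, 1, -1):
                chunk = text[i:i + size]
                if chunk in contractions:
                    weights.append(contractions[chunk])
                    i += len(chunk)
                    break
            else:
                weights.append(DEFAULT_WEIGHTS.get(text[i], 0))
                i += 1
        return weights

    return sort_key


default_key = make_sort_key({})
# Tailoring: "ch" sorts as one unit after "c" (120) and before "d" (130).
spanish_key = make_sort_key({"ch": 125})

words = ["d", "cz", "ch", "ci"]
print(sorted(words, key=default_key))   # ['ch', 'ci', 'cz', 'd']
print(sorted(words, key=spanish_key))   # ['ci', 'cz', 'ch', 'd']
```

The extra inner loop is the cost the spec is talking about: once contractions exist, every lookup has to check for them, whether or not the text being sorted ever contains one.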
Microsoft takes it a step further by not even allowing these exceptional cases in the default table; the only one that is really fascinating is the Thai case as it has an interesting story that I'll talk about another day (tomorrow, maybe?).
Now with all that said, there are times that I simply do not buy either Microsoft's or Unicode's argument, mainly when doing the design for a language that big companies are unlikely to ever provide tailorings for in their software implementations -- in such cases, putting the entries in the default table if it were possible (for Microsoft) or desirable (for Unicode) would mean no support required to make these languages work in a LOT of places. And it would be nice for there to be a way to provide optimal support for as many people as possible.
Say if Microsoft had a "bonus default table" one could opt into that would contain all compressions that would go into the default table, if possible.
Unicode could solve the problem the same way, with a general purpose tailoring designed for everyone except when the extra performance benefits of its absence made it essential (if Unicode had this they might even be able to pull out some of the ones they have in there now!)....
Disclaimer: I am not an expert or even an inspired amateur in the financial world, and am not claiming to be here.
I was thinking about Las Vegas the other day.
Not the actual Las Vegas but the one from Ocean's 11, which somehow seemed more real than the one of Ocean's 13.
Anyway, in the movie the Nevada Gaming Commission requires every casino to have cash to cover every chip in play on the gaming floor.
I don't know if this is required in real life, though I'm gonna guess that it might not.
But then I thought about articles like this one, with interesting bits like:
“There is less leverage in the entire financial system,” said David A. Viniar, Goldman’s chief financial officer. At Goldman, $1 in capital now supports about $14 in loans and investments, compared with $24 a year ago.
Now obviously one can lose in Vegas, I have done it myself on occasion (it is why I no longer gamble in Vegas at all, actually, other than sometimes in the choice of event or party I go to!).
And obviously casinos can make money.
But what that notion of requiring a casino to have cash on hand to cover its chips does is guarantee for the people playing that if they win they will never lose anyway by being unable to redeem their chips when they are done.
Note that the financial industry has a safeguard to protect those people too, and themselves -- they have the government and the taxpayers to bail them out when they make mistakes, when they give out more chips than they can cover with their cash on hand.
The (fictional?) Vegas idea seems safer, because in that world the mistakes a player makes are the source of problems for the player, and not the mistakes of the casinos.
I never before thought of Las Vegas as being a safe bet, though in this aspect betting on a casino's ability to cover its losses is safer than betting on a brokerage house.
A regulation like requiring a casino to have the money to cover the chips has an interesting consequence, doesn't it? It means that the casino isn't gambling with the player money trying to spread the risk enough so that they never lose it all.
In other words the casinos would not be able to act like these financial institutions do.
There are differences like the odds ultimately favoring the house in Las Vegas and such. But with government propping them up, seems like the house always wins in the financial industry, too.
Of course there are nuances here that I am almost certainly missing and friends of mine like Monica are certain to talk my ear off about how I am comparing apples to bicycles.
But the notion of Vegas feeling safer is a hard one to shake, especially given the real lack of stories of casinos going out of business and customers being unable to cash in their chips, and the real plethora of stories like that in the banking industry that taxpayers are paying billions to cover.
So forget about the financial industry. I won't even do the safe bets like Vegas....
Microsoft is a company based in Redmond, Washington, in the United States of America.
Yes it is a world wide company.
Yes almost 60% of its products are sold to customers who are not in the US.
Yes there are development centers around the world and in many of them code that is written there ultimately can end up in Microsoft products.
But ultimately, that original fact is inescapable:
All it takes is something like the DST 2007 snafu to get people to see it: a bug affects users throughout the world (including in the USA in places like Indiana) for over a decade with minimal help/work from Microsoft, yet as soon as it affects Redmond too, the push to fix problems and help users even has vice presidents and general managers and directors of Microsoft logging phone hours to help users. And afterward there are numerous presentations about how each team dealt with the problem that all ignore the fact that they had been ignoring the problem all along for a decade.
I could give countless other examples but many are less well known and some might violate my NDA so for now you can trust me that there are other examples.
Now there is no shame in being a company based in Redmond, Washington, in the United States of America.
And I would not want to imply otherwise either in a blog or in person.
Though there probably ought to be some shame involved in not realizing the pain one causes others (e.g. those other countries dealing with time zone issues for a decade, something I even not-too-gently but not-too-harshly chastised a couple directors about when that DST 2007 thing was winding down!).
Anyway, take the above as valid; if you don't, then you may as well skip the rest of this blog and maybe even this Blog (since no relationship can really stay healthy when there is no trust!).
Did you know that any developer who is enlisted in the full sources for Windows (sources that include the compiler, linker, header files, and LIBs as well as source) can build Windows?
It is true.
There are in fact developers in many parts of the world who work on Windows who have to do that very thing either occasionally or regularly. Or both.
Many people inside Microsoft have even given presentations about the strengths of such distributed development models and the advantages of being a company so large as to offer the opportunity of such models.
Now, for the other shoe to drop.
To build the full Windows product, all sources, you really must have a default system locale that will cause your default system code page to be 1252.
Such as US English.
The reason for this is that there are some source files that contain characters that are legal in cp1252 but in other code pages are either interpreted differently (incorrectly) or that will cause the build of those files to fail.
I ran across many of these as I was looking at code all over Windows and in most cases was not allowed to "fix" the problem as no one really saw it as a problem.
In almost every case I saw it was the same character (see Dumb quotes... or maybe they are just smart-ass quotes for which character it was) and the problem was in a comment.
A comment that was clearly created in an email written in Outlook using Word as the mail editor and then copied/pasted into the source.
Of course it is not a bug to make this mistake since it is not a bug to make a file unable to compile on another system locale.
Being a company based in Redmond, Washington in the United States of America, that just isn't a priority....
Now this is all well and good and is generally an internal issue at Microsoft that never impacts a customer in a way they would realize.
But if you look at recent versions of the Windows SDK (formerly known as the Platform SDK), you may see an exception to this generalization.
First we'll look at the older version of the file in question, shobjidl.idl.
This one compiles everywhere.
The non-offending bit of the file, if you scroll down a bit, is:
// IShellFolder::CompareIDs lParam flags
//
// SHCIDS_ALLFIELDS is a mask for lParam indicating that the shell folder
// should first compare on the lParam column, and if that proves equal,
// then perform a full comparison on all fields. This flag is supported
// if the IShellFolder supports IShellFolder2.
//
// SHCIDS_CANONICALONLY is a mask for lParam indicating that the shell folder
// that the caller doesn't care about proper sort order -- only equality matters.
// (Most CompareIDs test for equality first, and in the case of inequality do
// a UI sort. This bit allows for a more efficient sort in the inequality case.)
Ok, see the problem?
That was a trick question, there is no problem.
Fast forward to a much newer version, like the one in the 6.1 and 7.0 SDK:
// IShellFolder::CompareIDs lParam flags
// *these should only be used if the folder supports IShellFolder2*
//
// SHCIDS_ALLFIELDS
//
// only be used in conjunction with SHCIDS_CANONCALONLY or column 0.
// This flag requests that the folder test for *pidl identity*, that is
// “are these pidls logically the same”. This implies that cached fields
// in the pidl that would distinguish them should be tested.
// Without this flag, you are comparing the *object* s the pidls refer to.
//
// SHCIDS_CANONICALONLY
//
// This indicates that the sort should be *the most efficient sort possible*, the implication
// being that the result will not be displayed to the UI: the SHCIDS_COLUMNMASK portion
// of the lParam can be ignored. (Before we had SHCIDS_CANONICALONLY
// we assumed column 0 was the "efficient" sort column.)
//
//
Ok, now we have a party.
We have a couple of those quote characters that don't exist on all code pages and in fact for Japanese represent a byte that is illegal to have by itself, which means it will not compile.
The long and short of it is if you have a Japanese system locale you can't use this .IDL file unless you munge the file to remove the bogus quotes.
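The "munge the file" step is easy enough to sketch. What follows is a minimal Python sketch, not anything from the SDK or the Windows build: the helper names (decodes_cleanly, find_high_bytes, dumb_down_quotes) are made up for illustration, and Python's cp932 codec is standing in for the code page of a Japanese system locale, which is an assumption about how closely the two agree.

```python
# Hypothetical helpers for illustration; none of these names come from the SDK.

# cp1252 "smart" punctuation bytes and plain ASCII stand-ins for them.
CP1252_QUOTES = {
    0x91: b"'",   # left single quote
    0x92: b"'",   # right single quote
    0x93: b'"',   # left double quote
    0x94: b'"',   # right double quote
}


def decodes_cleanly(data, codec):
    """Return True if every byte sequence in data is legal in the given codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


def find_high_bytes(data):
    """Return (offset, byte value) for every byte outside 7-bit ASCII."""
    return [(i, b) for i, b in enumerate(data) if b >= 0x80]


def dumb_down_quotes(data):
    """Replace cp1252 smart quotes with ASCII quotes; leave other bytes alone."""
    return b"".join(CP1252_QUOTES.get(b, bytes([b])) for b in data)


# One comment line from the newer header, as it would be stored in cp1252.
line = "// “are these pidls logically the same”. This implies that cached fields".encode("cp1252")

print(find_high_bytes(line))
print(decodes_cleanly(line, "cp1252"), decodes_cleanly(line, "cp932"))
print(dumb_down_quotes(line).decode("ascii"))
```

The closing quote byte (0x94) is followed by a period, which is not a legal trail byte in that double-byte code page, so the strict decode fails. Once the quotes are dumbed down the line is pure ASCII and decodes the same way everywhere.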
Now I don't know of any devs who write either code or comments in Word, but getting an email containing an "updated comment to better explain this bit" seems pretty obvious and not at all uncommon to see (if you ignore the relative uncommonality of such updates).
Oops.
This oops is in a couple of Windows SDK editions, including some that shipped in products like Visual Studio and the one in the not-yet-shipped VS 2010.
In fact, I don't think it will be fixed for VS 2010 since they ship an already shipped PSDK and there won't be an update they pick up before they ship.
Oops again.
Anyway, they're on it now, and this will get fixed at some point.
That fix will eventually end up everywhere.
If you hit this problem, maybe you will feel somewhat less unhappy knowing that people like me can hit this problem a bunch of times in a night if I do a full Windows build. So that I share your pain....
And we are still a company in Redmond, Washington, in the United States of America.
Regular reader Jan Kučera asked me via the contact link:
Hello Michael, first time using this contact form, I hope I have chosen the most appropriate way for my question. :-)
I would like to ask if you have any plans attending or speaking at the Tamil Internet conference 2009 in Köln this October...?
Now generally speaking I'd say that Jan chose the ideal way for this type of question, but I have had a couple of other people ask about the conference and whether I would be there, so I thought I'd just blog about it anyway....
Indeed there is a Tamil Internet 2009 being held in Germany (a lot more info about it here), and it is indeed being held in Köln, which is to say Cologne.
I had originally, when I first heard about the conference, considered submitting a fuller version of that Behind the Proposed Change to Tamil in Unicode presentation I did for Unicode (slides here) with more of the follow-up info and the interesting code chart update issues (like what happened and can keep happening in Unicode and the one I did and the one Scott did on Wikipedia).
I imagined providing slides with both German and Tamil subtitles, or perhaps just separate German and Tamil versions of the slides. It is [perhaps not so] surprisingly easy to find volunteers to assist with such efforts, as I have discovered in many presentations I have done abroad in the past! :-)
Unfortunately, I can't even get funding from my company to fly down to San Jose for a standards meeting that Microsoft has an official relationship with; flying to Germany is something that I would be totally on my own for and I lack those kinds of funds.
Having been to a few of the previous conferences I know it would have been very interesting, and if an unknown wealthy uncle passed away leaving me a sack of bullion or I won the lotto next week I might be sending some urgent email to the conference chair begging for a last minute slot in which to do the talk.
However:
So in the end, this is one I'll really have to sit out.
Perhaps some future conference will be in the US or Canada and an even more updated version of the talk might be in the cards for the future. My Tamil and Bengali learning continue (albeit slowly) and perhaps might even be a subject for a second interesting talk about what Unicode does to (slightly help but mostly hinder) language learning.
If you will be there then be sure to have a drink and if possible tell me when ahead of time so I can do the same from here!
It started with an expression.
One I got from a movie.
The name of the movie was Finding Forrester.
This is a movie I liked a lot, though this is about one thing in particular. The relevant dialog from the movie, between William Forrester (Sean Connery) and Jamal Wallace (Rob Brown):
William: You better stir that soup.
Jamal: What?
William: Stir the soup before it firms up.
Jamal: Why doesn't ours get anything on it?
{{William looks out the window through the camcorder he is holding}}
William: Come on. Closer. Now.
Jamal: You got someone doing that kind of yelling?
William: What I have is an adult male. Quite pretty. Probably strayed from the park.
{{William shows Jamal the image on the camcorder}}
William: A Connecticut warbler.
Jamal: You ever go outside to do any of this?
William: You should have stayed with the soup question. The object of a question is to obtain information that matters only to us. You were wondering why your soup doesn't firm up? Probably because your mother was brought up in a house that never wasted milk in soup. That question was a good one, in contrast to, "Do I ever go outside?", which fails to meet the criteria of obtaining information that matters to you.
Jamal: All right. I guess I don't have any more soup questions.
Now this shows up a couple more times in the movie, times when one of them has a question and the other responds:
The whole concept is one I picked up from this movie, from time to time thinking about the interrogative statements of others and specifically classifying questions as to whether or not they were soup questions.
I suppose if you wanted to more succinctly define a soup question like if you wanted that top entry in the Urban Dictionary, you could think of it in terms of its antonym -- a question that is "not exactly a soup question" is one that is really not the business of the person asking.
I don't usually say it as often as I think it, mainly because most people don't get the reference.
But I find it to be a useful one, as there are entirely too many questions people ask that are not, in fact, soup questions.
I will give a technical example another day (tomorrow, unless something bumps it to later in the week) but for now will stay away from the technical, if that is okay.
And to be frank even if it isn't....
Anyway, on this last Saturday I happened to put in a twitter tweet/facebook status:
Michael is sticking to soup questions, and tequila, for the rest of the weekend.
to which my friend Melanie responded:
What kind of soup?
Now this is a fascinating question.
I live in Seattle and Melanie lives in San Francisco.
So if there were actual soup (which there was not; this was a metaphorical thing as the above exposition implies) then the kind of soup, while relevant to me, is not important or meaningful to her. There is no way that the type of soup would have any effect on her whatsoever and therefore would not be important to her.
Thus the question "what kind of soup?" is not, in this case, much of a soup question.
Despite being a question pretty much only about soup!
The other day I was sent mail about a Connect bug. This Connect bug, in fact.
The title alone (mbstowcs_s does not return an error when the current code page does not support all the characters in mbstr) might suggest what is going on to some of you.
And the description will give a hint to some of you too:
When mbstr contains characters not supported by the current process code page, mbstowcs_s does not return an error and put garbage characters in wcstr.
Example:
    setlocale(LC_CTYPE, ".1252"); //set the process to use a locale with English code page
    //you can also try not setting the locale. The default process LC_CTYPE locale is C
    //which means 7-bit ASCII.
    char* mbTestStr = "Test. 真的."; //this is a 9 character string with 2 Chinese characters.
    size_t charCount;
    wchar_t wcStr[50];
    errno_t error = mbstowcs_s(charCount, wcStr, 50, mbStr, -1);
After the call, no error is returned, charCount becomes 12, and wcStr contains "Test. ÕæµÄ." It seems charCount is the actual byte count in mbStr. The two Chinese characters each takes two bytes in mbStr.
The function should fail to convert the Chinese characters and return an error because the code page does not support Chinese characters.
If I set the locale to ".936" (936 is a code page for simplified Chinese). No error is returned, charCount becomes 10, and wcStr contains "Test. 真的.". Everything is correct.
_mbstowcs_s_l has the same problem if you give it a locale that does not support all the characters in mbstr.
Sound familiar yet? :-)
When people started digging into the issue, they found that under the covers, MultiByteToWideChar was being called with the MB_ERR_INVALID_CHARS flag.
Which should really at first glance be able to protect developers from this kind of thing -- if a character is invalid there are times you would like it to be treated as such!
Unfortunately, like I pointed out back in 2007 in What's up with MB_ERR_INVALID_CHARS?, it doesn't always get to work this way.
In fact the byte in question (0x8F) is not defined in code page 1252, but it is not caught by MB_ERR_INVALID_CHARS -- thus you get this "ignore it" behavior, along with the byte being mapped to a control character that comes up as garbage.
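To see why no error comes back for the string in the bug report, it can help to reproduce the bytes outside of the CRT. This is a rough Python sketch that uses Python's own gbk and cp1252 codecs as stand-ins for code pages 936 and 1252; it does not call mbstowcs_s or MultiByteToWideChar, and the helper name survives_round_trip is invented here. Note that Python's cp1252 table treats 0x8F as undefined and raises, which is not what Windows does, so take the last part as a contrast rather than a reproduction.

```python
def survives_round_trip(text, codec):
    """Return True if text can be encoded in codec and decoded back unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        return False


original = "Test. 真的."

# The string as a Simplified Chinese (code page 936) byte sequence.
raw = original.encode("gbk")

# Every one of those bytes is also a defined cp1252 character, so a cp1252
# decode "succeeds" and quietly yields the garbage from the bug report.
garbled = raw.decode("cp1252")
print(len(raw), repr(garbled))

# A caller who wants an error has to ask a different question: can the
# intended text be represented in the target code page at all?
print(survives_round_trip(original, "gbk"))
print(survives_round_trip(original, "cp1252"))

# Python's cp1252 table leaves 0x8F undefined and refuses to decode it.
try:
    b"\x8f".decode("cp1252")
    undefined_byte_rejected = False
except UnicodeDecodeError:
    undefined_byte_rejected = True
print(undefined_byte_rejected)
```

The 11 bytes here, plus a terminating null, would be consistent with the charCount of 12 in the report. The bytes themselves carry no hint that they were meant to be Chinese; the conversion can only complain about bytes that are invalid in the code page it was told to use.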
So the backcompat issue rears its ugly head, with the argument being that this behavior has always been there.
When I think of all the breaks that have been introduced in the last few years in code pages for stated reasons like security hardening and Unicode conformance, I wonder whether it is a good time to question these issues and clean up crap like this.
Though I may be the only one who feels that way....
It was supposed to be a magical night.
I was given tickets for a show by the funk master George Clinton down at the Seattle Showbox.
My musical tastes often confuse and occasionally frighten people, but in this case whether one is thinking about Parliament or Funkadelic or Parliament-Funkadelic or the P-Funk All Stars, one is thinking George Clinton and I'm hardly alone in feeling like he is the master of something important.
Several important things, in fact.
Because sometimes, we DO need the funk, and gotta have that funk.
I asked Jennifer if she was free that night, she was.
And whether she'd be interested in seeing the show, she was.
Jennifer.
I should probably say a word or two about her to explain what that name is supposed to convey if you don't know her.
Do you know that myth/fantasy that nerds/geeks have (including myself) about really hot blondes who are geeks/nerds themselves, and who are attracted to brainy people?
Well, that's Jennifer.
It was like I had just asked out a unicorn or something!
This was starting to shape up into something hot.
A reservation for dinner at Dahlia Lounge was a no-brainer (admittedly a venue that is better for the other Showbox rather than Showbox Sodo, but the reservation was early enough to make it work).
Anyway, as the event approached, my twitter tweets/facebook statuses started to reveal my excitement about the show coming up.
I even took one of those statuses and said I liked it. And she tagged it the same way: she liked it too.
The anticipation, man. The anticipation.
And then it was happening. :-)
Dinner was wonderful, as expected.
And as usual (well, at least to my thinking of usual given recent examples) Jennifer was wonderful too.
The day might someday come where I tire of hanging out with her but I can't really picture it at the moment, and in any case it wasn't happening last night.
We were slightly late getting out of the restaurant (caught up in the conversation, or rather conversations!) and missed the buses heading that way on 1st. So we figured we'd hoof it to the venue (which worked out interestingly, given we arrived before someone else who was waiting for the bus we could have taken).
We kept talking on the way. And she has a fun eccentricity I have seen before where she would take the same path I would with sidewalk cutouts and such. It works for me. So even the walk there was fun.
We arrived probably 45 minutes after the doors opened, which worked out too since the show didn't start until over 45 minutes after that.
Remember everything about the evening had been great so far.
I'm pretty sure that has been the vibe of the description to now, but I wanted to make it clear, just in case.
Then the show started.
I could do a review of the show, I could.
But I noticed Jonathan Cunningham did one, titled Last Night: George Clinton Stinks it up at The Showbox, which kind of sums up how both of us felt.
The crowd was largely made up of long-time fans who knew all the lyrics and sang them loud.
This was good since George was barely doing 20-30% of the job himself, and neither the man in the wedding dress nor the man wearing the diaper could distract us from the fact that we were witnessing a hollow shell of that which was once the legendary George Clinton.
This is music that is supposed to seduce the crowd.
Draw 'em in, make 'em want more, then give it to 'em.
Make 'em want the funk.
Make 'em need the funk.
By that metric, the show itself, with George at way under half his vocal range and a set that built up nothing for anyone who couldn't have done the show in their heads themselves, was an unmitigated disaster.
Seduction?
This was like going home with someone who knew he could get the job done but was too drunk/high to perform. So you get three times the sex with 1/6 of the foreplay.
Both of us were incredulous, and I was embarrassed.
Maybe some of the fans were so into the show that they felt having the funk was facile, and didn't notice over the sounds of their own voices how ungood the performance was.
Had the gig been watching a show of funk fans carrying the funk master, it would have been a good show.
But as it was, I couldn't watch this train wreck that was robbing the memory of one of my favorite legends.
I asked Jennifer if she wanted to get some air; she did. We then just kept walking.
I apologized profusely; not that it was my fault, but since I had been excited enough about the show to encourage enthusiasm from her, I felt like I had set us up for the debacle.
Would anyone trust my musical tastes after I waxed so enthusiastically?
More importantly, would she?
She did assure me that although the P-Funk didn't deliver the goods, the M-Funk did.
Sincerely enough that I'll believe her. :-)
She is clearly more forgiving than I might be, so this is something I should work on too as she seems to be allowing one helluva mulligan.
Thankful for that I am, geez.
Whoduve thought that George Clinton would be messing up my game, anyway?
There have been nights over the last couple of decades when he was my game, or at least a contributing factor.
Now you can stick a fork in him, as the Funkmaster is done....
Over in the Suggestion Box, Aaron asked:
Michael,
We noticed a problematic breaking change between Windows 7 and previous OS's around ConvertDefaultLocale and the Spanish LCIDs
    LCID lcid1 = ConvertDefaultLocale(PRIMARYLANGID(1034));
    LCID lcid2 = ConvertDefaultLocale(PRIMARYLANGID(3082));
On XP and Vista, these both return 1034 as the value. On Windows 7, this returns 3082.
Same happens if we wrap this call with a MAKELCID/SUBLANG_NEUTRAL call:
    LCID lcid3 = ConvertDefaultLocale(MAKELCID(PRIMARYLANGID(3082), SUBLANG_NEUTRAL));
    LCID lcid4 = ConvertDefaultLocale(MAKELCID(PRIMARYLANGID(1034), SUBLANG_NEUTRAL));
This obviously causes bugs when using this function as part of best-guess resource lookups based off of GetUserDefaultLangID() results.
Do you have any insight into why this changed, and more importantly, what other lcid's have changed?
Thanks,
Aaron
Interesting!
Now I have talked about ConvertDefaultLocale in the past.
Like way back in 2005 (in Change is in the cards for ConvertDefaultLocale....) when I pointed out that ConvertDefaultLocale would likely be changed to fix the inconsistency between native NLS and managed globalization classes methods of fallback.
Or in 2006 when I mentioned it again (in What the hell is wrong with TranslateCharsetInfo, anyway?).
Then the kicker came a few months later in that same year when I wrote How ConvertDefaultLocale sorta broke backward compatibility in Vista, and why. In it I chronicled some though not all of the minor tweaks that ConvertDefaultLocale was using to try to make the two platforms behave more consistently.
This is not Aaron's issue, which is about a different change -- one apparently made in Windows 7.
It looks like they did the work to take on the issue I mentioned we almost fixed in Vista (in The modern solution to the problem of Traditional Spanish in Vista) but then backed out the fix due to backward compatibility problems (in They say it happens to everyone, at some point...).
And in that fix it appears they may have broken some assumptions that a developer might have made related to treating 0x040a and 0x0c0a as two different locales rather than as one locale with one alternate sort version of that locale.
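It may help to pull the two LCIDs apart. This is a small Python sketch that mirrors the bit layout behind the PRIMARYLANGID, SUBLANGID and MAKELANGID macros, assuming the usual winnt.h definitions (low 10 bits for the primary language, the remaining bits of the 16-bit language ID for the sublanguage). The Python function names are illustrative re-implementations, not the Windows macros themselves.

```python
def primary_lang_id(lang_id):
    """Low 10 bits of a language ID: the primary language."""
    return lang_id & 0x3FF


def sub_lang_id(lang_id):
    """Bits above the low 10 of a 16-bit language ID: the sublanguage."""
    return (lang_id & 0xFFFF) >> 10


def make_lang_id(primary, sub):
    """Combine a primary language and a sublanguage into a language ID."""
    return (sub << 10) | primary


for lcid in (1034, 3082):
    print(hex(lcid), primary_lang_id(lcid), sub_lang_id(lcid))
```

Both values share primary language 0x0A (Spanish) and differ only in the sublanguage, 1 versus 3. That also means PRIMARYLANGID(1034) and PRIMARYLANGID(3082) hand ConvertDefaultLocale the very same input, so what differs between Windows versions is which Spanish sublanguage the function picks as the default for it.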
Strictly speaking, the Windows 7 behavior is more correct.
However, the attempt to fix the similar bug in Vista was more correct, too (for all the good it did me and the others working on it!).
I've often suspected that the NLS team would "benefit" from having people like Julie and Cathy and me move on to other work since we wouldn't be around to make hairy nuisances of ourselves over the backcompat issues, and this appears to be another good example.
The behavior is actually better and more sensible from someone paying attention to meaning and context in figuring out behavior, but only time will tell if the ways in which this breaks other usages (like Aaron's) will be serious enough to inspire further tweaking....
The question from Alex Ewing was:
Hi Michael,
I've read a number of your posts regarding the automated installation of supplemental language support on Windows XP but I have a question. I can't seem to get the same exact languages by using my unattend.txt as I get when I click the checkboxes in the Regional and Language settings GUI. Here's what my unattend.txt file looks like:
[RegionalSettings]
LanguageGroup = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
I want to get everything exactly as you get when you check the checkbox with the GUI, not just one or two or a handful of languages. Is there a good way to get this?
Thanks for your time,
Alex Ewing
The answer to this one has its roots where I first described language groups, back in that blog from 2005 (Language groups -- the vestigial tail of NLS).
You see, the whole language group concept, those 17 groups, was eliminated in XP and Server 2003 and replaced with three categories:
Looking at which group is in each category:
Of course category 3 is always there now, installed.
And if you install any one of the language groups in one of the other categories, the whole category is installed.
This is not really mentioned in Q289125, but it is still quite true.
So while you can do that "LanguageGroup = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17" thing, it is just as effective for an XP/Server 2003 install to just use "LanguageGroup = 7,11" kind of thing instead.
And from there, if you have any doubts, look at How to REALLY handle the unattended version of Regional and Language Options for info on logging what is happening, so that if anything is happening incorrectly, it will tell you what.
Who'd have thought it would all come down to 7-Eleven? :-)