April, 2005

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Brett Shirley really ought to blog

    • 6 Comments

    Eric Fleishman posted earlier about the fact that sometimes, blogging really is that hard in a hilarious sendup of Brett Shirley's indecisive plunge into the blogosphere.

    I am glad I had not just sipped any Limonata, because if I had it would be all over the keyboard. Eric's post was freaking hilarious.

    Brett is someone I met last year when Cathy and I were meeting with Andrew Goodsell to talk about NLS, Active Directory, and Longhorn.

    Andy invited Brett to come by too. He struck me as incredibly knowledgeable and I immediately took to him, since he was the first person at Microsoft outside of the Access world who was not afraid of saying that three letter word that starts with "J" without immediately spouting out some kind of curse word, the way most people around here seem to enjoy doing. :-)

    I definitely think that he should start posting in his blog. As soon as he has a single post I will add him to my list of blogs I read.

    C'mon Brett!

     

    Why are the HKL and KLID of the keyboard different?

    • 3 Comments

    I actually get this question on a regular basis, believe it or not.

    People look at the two numbers, see the similarities, and then start assuming they are the same. To review from my old keyboard terminology post:

      • KLID -- a keyboard layout identifier. Traditionally pronounced "Kay-El-Eye-Dee" because some people in the USA get very uptight about certain homonyms (you can catch me slipping on this point from time to time). It's also sometimes called the input locale identifier since the name for HKL has been updated (see the HKL definition for info on why that is incorrect since the HKL is for something different). The KLID can be retrieved for the currently selected keyboard layout in a thread through the GetKeyboardLayoutName API (note the pwszKLID parameter), though that is not true of any other selected or installed keyboard layout. Every keyboard layout on the system has one of these. Each KLID is 32 bits (thus 8 hex digits), and they can all be found in the registry as the subkeys under HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layouts\. The bottom half of the KLID is a LANGID, and the top half is something device-specific. By convention, the first hex digit is usually as follows:

        • 0 -- Most keyboard layouts
        • A -- Keyboard layouts defined by MSKLC
        • D -- Some non-CJK input methods that have been defined by the Text Services Framework (note: reported to me; I have never seen one of these!)
        • E -- CJK input methods, also known as IMEs
      • HKL -- a handle to a keyboard layout, traditionally pronounced "Āch-Kay-El". The terminology folks have pretty aggressively tried to call this an "input locale identifier" despite the obvious problem that it has nothing to do with locales and that it is not the same value as the actual identifier (the KLID). The HKL in actuality is the handle to an input method. Although defined as a handle, only the lower 32 bits are currently used. Of those 32 bits, the bottom 16 bits represent a LANGID, and the top 16 bits represent a value defined by the USER SUBSYSTEM which helps to uniquely identify an installed keyboard layout. This is crucial since any keyboard layout can be installed more than once (by installing it under different languages, which helps user operations like spell checking).

    The differences are not obvious if you install keyboards via the LoadKeyboardLayout API (by the way, note the pwszKLID parameter name for those who doubted me about the whole KLID thing!). In that case, the same LCID is always used, and if the keyboard is one of the many with KLID values like 00000409 or 00000407 then the HKL value will be the same as KLID, further making people think they are the same. However, there are two times when they can and will be different:

    1. Any time the KLID value is more than just the LANGID -- like 00010439 for the Hindi Traditional keyboard layout or 0003041e for the Thai Pattachote (non-ShiftLock) keyboard layout, the HKL will have a high word of the lower DWORD filled with different information.
    2. Any time the keyboard is assigned via the Text Services Framework UI under a different language than the original keyboard (e.g. installing the US keyboard under the French language), the HKL will be based on the language you chose, not the language implicit in the KLID.
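
    The bit layout described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything Windows ships: the helper names (split_klid, split_hkl, klid_kind) are made up here, and the HKL value in the usage note is an illustrative input rather than something read from a live system.

```python
from typing import NamedTuple

# Conventional meanings of the first hex digit of a KLID, per the list above.
KLID_PREFIXES = {
    "0": "most keyboard layouts",
    "a": "keyboard layout defined by MSKLC",
    "d": "non-CJK Text Services Framework input method",
    "e": "CJK input method (IME)",
}


class Words(NamedTuple):
    high: int    # top 16 bits of the lower DWORD
    langid: int  # bottom 16 bits, a LANGID


def split_klid(klid: str) -> Words:
    """Split an 8-hex-digit KLID string into its high word and LANGID."""
    if len(klid) != 8:
        raise ValueError("a KLID is written as exactly 8 hex digits")
    value = int(klid, 16)
    return Words(high=value >> 16, langid=value & 0xFFFF)


def split_hkl(hkl: int) -> Words:
    """Split an HKL; only the lower 32 bits are currently used."""
    low_dword = hkl & 0xFFFFFFFF
    return Words(high=low_dword >> 16, langid=low_dword & 0xFFFF)


def klid_kind(klid: str) -> str:
    """Describe a KLID by its first hex digit (a convention, not a guarantee)."""
    return KLID_PREFIXES.get(klid[0].lower(), "unknown")
```

    For the Hindi Traditional KLID, split_klid("00010439") gives a high word of 0x0001 and a LANGID of 0x0439. Feeding an illustrative HKL such as 0x0409040C to split_hkl gives a LANGID of 0x040C (French) in the low word, which is the "US keyboard installed under French" situation from the second case.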

    One could say that making the HKL a handle was a waste, especially since it is a waste of 32 bits of space on 64-bit platforms, but I kind of wish we had defined LCIDs in the same way; it allows for a lot more flexibility.

    In any case, these two values are based on entirely different principles, and the times that they are the same value are entirely coincidental....

     

    This post brought to you by "Ѐ" (U+0400, a.k.a. CYRILLIC CAPITAL LETTER IE WITH GRAVE)
    A letter whose current importance is almost entirely based on its lack of substance in Windows!

    My linguistic profile....

    • 5 Comments

    While I doubt that it is very scientific, the results were interesting:

    Your Linguistic Profile:

    70% General American English
    20% Yankee
    5% Dixie
    5% Upper Midwestern
    0% Midwestern

    It actually kind of fits someone who was

    • born in Missouri (but left before it had any real influence)
    • moved to Wisconsin for one year
    • lived in Cleveland, OH for ten years
    • was in Philadelphia, PA for two years
    • hung out in Hartford, CT for five years (where I shed the fetish of calling soda anything like "pop" that I picked up in Cleveland)
    • slummed in Columbus, OH for five years (where I only hung out with a few people with a Southern Ohio/Kentucky twang)
    • spent the rest of the time in Redmond, WA

    With of course lots of random travel and short times in other places in between.

    Does not cover those broad midwestern vowels that some people say I still have, but still maybe there is some science buried in there....

    (via Heather Hamilton)

    Not all keyboards are included in MSKLC's lists

    • 6 Comments

    Yesterday, Ovate left a comment to my post What is my locale? Well, which locale do you mean? entitled Chinese locales and Dvorak. Not entirely on-topic and probably better for the suggestion box, but I am not going to judge too harshly. :-)

    Ovate said:

    Hi there, I type with Dvorak for the right hand. However the Chinese locale input method MS-Pinyin 98 that I need to use to type Chinese relies on the Qwerty layout. I need to be able to change this to the right hand Dvorak layout within the Chinese locale (I actually need to be able to do the same for Japanese)

    I have downloaded a copy of MKLC (Microsoft Keyboard layout creator). However when I go to File\ load existing keyboard, it only displays Chinese (PRC) US keyboard option. This is not even a Chinese input method. I already have all the Chinese locals installed on my computer. Would anyone have any idea why the other Chinese input methods do not display or how I can locate them so that I can alter the keys to a Dvorak right hand layout. If I'm going about this totally the wrong way, please excuse the ignorance :-)

    This is unfortunately an intentional filtering that is done in the Microsoft Keyboard Layout Creator.

    Back when I first wrote MSKLC, the "Load Existing Keyboard Layout" functionality was intended to support every layout under the HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layouts subkey, but there were two problems with this:

    • Keyboards installed by third party tools like Keyman often did not use the keyboarding APIs to report results, so the methods MSKLC would use to interrogate the layout were ineffectual;
    • Keyboards attached to IMEs were basically pointers to other keyboards, and thus modifying them would never return the results you would expect.

    Entries like the Chinese (PRC) US layout actually exist if you go to the Add Input Language dialog provided by the Text Services Framework team, as you can see below.

    And those entries are the only ones that MSKLC can actually get good information out of using the standard keyboarding APIs. If you look at the entries in the registry for the IMEs, they look like this:

    Now obviously the "Layout File" is the keyboard DLL that is used. But the rules for the IME-attached DLLs are different. For example the Japanese file actually forwards to different DLLs depending on other system settings that are not documented, and the actual code in the IME has all different kinds of rules for reading the keyboard input, only some of which is Virtual Key-based and none of which is easily discerned with the keyboarding APIs. So in the end I had to filter out those IME entries since I was unable to do anything with them.

    Now in some cases, you can actually modify that Layout File setting to actually get the keyboard you want attached to the IME. In Ovate's case, the United States-Dvorak for right hand info can be found in the 00040409 entry:

    So the layout file is KBDUSR.DLL. It may or may not work to change this value in the IMEs, mainly depending on what the IMEs themselves do to use the layout. But MSKLC could not really get in the business of modifying existing system configuration settings (for obvious reasons), since that is a sure recipe for disaster, even if it would always work.

    So if you choose to make those kinds of changes in the registry, you are on your own. Be sure to save somewhere what the original settings were, by right-clicking on the registry key and choosing to export it to an .REG file:

    so you can restore the old settings and recover from anything that does not work properly. If you do not and you mess up the IME beyond all repair, then be sure to tell the support person on the phone what you did, and don't get mad if they suggest that you reinstall Windows. :-)
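
    If you only want to look at the "Layout File" value rather than change it, a read-only lookup can be sketched with Python's standard winreg module. The helper names here (layout_key_path, read_layout_file) are made up for illustration, and the sketch simply returns None when it is not running on Windows or when the subkey or value is missing.

```python
import sys
from typing import Optional

# Subkey of HKLM that holds one subkey per KLID, as described above.
LAYOUTS_KEY = r"SYSTEM\CurrentControlSet\Control\Keyboard Layouts"


def layout_key_path(klid: str) -> str:
    """Build the registry path (relative to HKLM) for one KLID subkey."""
    return LAYOUTS_KEY + "\\" + klid


def read_layout_file(klid: str) -> Optional[str]:
    """Return the "Layout File" value for a KLID, or None if unavailable."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg  # imported here so the module still loads elsewhere

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, layout_key_path(klid)) as key:
            value, _value_type = winreg.QueryValueEx(key, "Layout File")
            return value
    except OSError:
        return None  # no such subkey or value


if __name__ == "__main__":
    # 00040409 is the United States-Dvorak for right hand entry.
    print(read_layout_file("00040409"))
```

    Nothing in this sketch writes to the registry, so it does not replace the export step above if you do decide to start editing values.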

     

    This post brought to you by "Ḱ" (U+1e30, a.k.a. LATIN CAPITAL LETTER K WITH ACUTE)

    The headline was 'Killer flu samples shipped via FedEx, DHL'

    • 5 Comments

    Nothing technical in this post!

    The headline that brought back a lot of memories for me, on MSNBC on Thursday, was indeed Killer flu samples shipped via FedEx, DHL. Not quite flashbacks but maybe a tamer version of them.

    You see, I actually worked (in a past life) as a part-time contract package handler in the Hartford hub for RPS (Roadway Package Systems), which was eventually bought out by FedEx to become FedEx Ground. That was like ten years ago, so all of my memories of the place may be obsolete. Fair warning!

    The work was part time so they did not have to give health insurance to package handlers, and they managed to keep the unions out. Good honest work.

    I do remember the shoddy way most packages were handled, the way companies would send ten packages with fluorescent bulbs since they expected nine of them would get busted. But no one messed around with the boxes marked hazardous. Mainly because no matter how dumb a person is, they are not that dumb, you know?

    For a lot of the time, I was the guy handling the small packages, sorting them into various bins. It was a definite promotion, not in terms of money but in terms of benefits -- I no longer had to lift 100 pound packages in trucks that heated up to as much as 100° for four hours. A definite benefit, I can promise you.

    Of course I do not have to wonder how easy it would have been to smuggle something out of there; security guards were not brilliant and although they would notice people obviously carting packages out, they were not grabbing one's crotch or ankles or whatever, and if one were willing to put something there for the trip to the car then they probably wouldn't have noticed. I never did it but I remember there were people who bragged about doing it for various fun/novelty items. One got the feeling that 70% of them were bragging and had not taken anything, but that does leave 30% genuine theft. I could imagine someone doing it for something dangerous (though no one ever took credit for something like that!).

    I certainly would not put something packed in dry ice marked hazardous anywhere on my person for any length of time at all. But not everyone was necessarily as smart.

    I had a few friends who worked in the area where they fixed up the damaged packages, and every once in a while there would be a damaged package marked hazardous -- everyone was careful then. The shift managers chose to number among the damaged packages staff a young lady named Gina. In so doing, they (possibly intentionally) caused people to pay a lot more attention to the area, since being able to talk to a woman was kind of a novel thing for some of these guys and they always found excuses to bring packages to her. I even know of times that people would intentionally break packages so they could make that trip, though (a) they were supposed to set them aside so someone could pick them up and (b) Gina did not find dumb sweaty guys to be her cup of tea. In fact, a bunch of us used to go to a sports bar named The Penalty Box (so named because of a small penalty box that you had to sit in if you spilled a drink!). It was fun and a great unwinder; they used to do Karaoke there on Friday nights, and it was pretty amusing when a group of us all got up to do Love Shack by the B-52s. I remember times that Gina told us what she thought of the whole setup at RPS. She thought it was insane the way people would go out of their way to talk to her by breaking boxes and such. It made a lot of sense to me though -- what better way for management to get extra eyes on the people in the best position to take stuff out (the trailers always had 2-3 people unloading, even small packages had two people)?

    I honestly doubt she and Rich would ever take a damn thing anyway. It was a place to put smart people, and both of them were smart. And I knew other women who worked there, one of them named Jen was even a roommate of mine for a while (just good friends looking to split rent, nothing sordid!), so I knew plenty of women there who would spend time in the big trucks (which to be perfectly frank, I never enjoyed much myself). I do have to admit it was interesting to see how such things could disrupt a workplace that did not know quite what to do with women. Maybe a little like parts of Microsoft now that I think about it, if you listen to the stories from the old days. :-)

    Roadway Package Systems was an interesting place to work, and RPS was even more interesting as a company....

    You see, Roadway Express was shipping merchandise to big department stores like G. Fox, and they had a bunch of old tandem trailers they wanted to upgrade. As a way to get rid of them, they spun off a sister company named Roadway Package Systems that was on paper set up to act as a corporate competitor to UPS. Of course in reality everyone expected the whole enterprise to fail, which would allow the tandem trailers to be written off. Who would ever have guessed that the UPS package handlers and truck drivers (both Teamsters Union-controlled) would have chosen this point to have a huge strike? RPS turned out against all expectations to do quite well targeting the corporate space, probably successful enough that the Roadway Express execs did not mind losing their writeoff. When they bought new trailers for RPS a few years later, it was clear that they had moved on to accept the new reality. :-)

    Anyway, to swing on back to FedEx (whose FedEx Ground was formerly RPS), I remember talking to people who had worked for FedEx, UPS, and all the others. The setup was pretty much the same in all these places, unionized or not. I can only hope that the security has gotten better considering all of the things that ship via these services, which was the point of concern in the MSNBC article, though not for the reasons I am concerned, as a former package handler who knows how little people cared about the packages. If someone is making $9.50 an hour (scaled back to circa 1990 wages, not sure what that would be now) and someone offers $10,000 for something out of a package, would a handler stand on principle, or take the money and run? I know I would not be stealing, but I also know that there were people who would bust open boxes costing the company who knows how much just to talk to an attractive woman so I doubt they would balk too much at making some gelt if they could find a way to do it.

    I do hope the security is better now, especially thinking about virulent strains of viruses and such. For all of our sakes....

    When a user sets something, please assume they meant it

    • 10 Comments

    I am not going to claim our UI in Windows is so intuitive that we can trust that anything that is set is what the user really wants. In fact, I have stated many times that Regional Options is not intuitive.

    But when a developer tells me that the reason they do not use the override information is that it may not be valid, in my opinion that reasoning is a bit thin. If you know what I mean.

    After all, if the user never launches Regional Options then the overrides are identical to the original Windows data. If there are any differences, then somebody went into Regional Options and changed something. And they have a good faith basis for believing applications will pick those settings up. Not picking them up is kind of irresponsible in a client machine scenario....

    Now my buddy Mike definitely points out a use of the NLS SetLocaleInfo function that is downright irresponsible, no question about it. After all, if ignoring the user's preferences is disrespectful, then supplanting their preferences with your own is downright obnoxious! What is up with some people?

    Another pet peeve of mine related to all this was one that Dean pointed out recently in his post Disabling ClearType in Reading Layout View. Now I agree that the ClearType settings are pretty hidden, but is the Word replacement any better? If you look at the poor documentation and the way it is buried in the registry in ways that are hard to find, the argument that the Desktop Control Panel settings are obscure is pretty specious. I would be a tremendous fan of anyone who ripped this code out, root and branch, and used the SystemParametersInfo function with the SPI_GETFONTSMOOTHING, SPI_GETFONTSMOOTHINGCONTRAST, and SPI_GETFONTSMOOTHINGTYPE flags. Consider this post a standing offer of a dinner somewhere nice that I will give to any Office developer who accomplishes that. :-)
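
    To give a rough idea of how little code "follow the system setting" takes, here is a minimal sketch in Python using ctypes. The helper names (read_font_smoothing, uses_cleartype) are made up for illustration, the numeric constants are copied from winuser.h and worth double-checking against your SDK headers, and the function just returns None anywhere other than Windows.

```python
import ctypes
import sys
from typing import Optional

# Action codes and type value as defined in winuser.h.
SPI_GETFONTSMOOTHING = 0x004A
SPI_GETFONTSMOOTHINGTYPE = 0x200A
SPI_GETFONTSMOOTHINGCONTRAST = 0x200C
FE_FONTSMOOTHINGCLEARTYPE = 0x0002

_QUERIES = (
    ("enabled", SPI_GETFONTSMOOTHING),
    ("type", SPI_GETFONTSMOOTHINGTYPE),
    ("contrast", SPI_GETFONTSMOOTHINGCONTRAST),
)


def read_font_smoothing() -> Optional[dict]:
    """Query the user's font smoothing settings, or None off Windows."""
    if sys.platform != "win32":
        return None
    user32 = ctypes.windll.user32
    settings = {}
    for name, action in _QUERIES:
        value = ctypes.c_uint(0)
        # SystemParametersInfoW(uiAction, uiParam, pvParam, fWinIni)
        ok = user32.SystemParametersInfoW(action, 0, ctypes.byref(value), 0)
        settings[name] = value.value if ok else None
    return settings


def uses_cleartype(settings: dict) -> bool:
    """True when smoothing is on and the smoothing type is ClearType."""
    return bool(settings.get("enabled")) and (
        settings.get("type") == FE_FONTSMOOTHINGCLEARTYPE
    )
```

    An application that wants to respect the user would call read_font_smoothing() and render accordingly, rather than keeping a private switch of its own somewhere in the registry.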

    The principle is simple -- follow the user preferences. If they did not feel strongly enough to change them, and so left them untouched, then that too may be a preference. And a good developer does not ignore messages the user is sending to them....

     

    This post brought to you by "®" (U+00ae, a.k.a. REGISTERED SIGN)

    Instead of a 'blue screen', I am waiting for the blue screen

    • 5 Comments

    I was reading Mike Poulson's I feel your pain. My first Call to Microsoft PSS (and kb890859). And thinking about my own story about the VP who did the same thing, calling support for a technical problem. And having a degree of knowledge above a naive user.

    Anyway, it got me thinking about how I see blue screens all the time.

    Part of my job is to do some of the data updates in Windows, mostly to the collation bits. I am pretty much the person who authors all of the data changes that require no linguistic talent and I am the person who checks in the changes for the linguist that actually do require linguistic talent. This is because of some of the important rules related to being able to run all the tests and build and such. If you break the build, it is much better to have an answer to questions like "did you build the change?" and "Did you test it?" and so forth. The rules make sense, truly they do.

    Anyway, when I am making these changes, sometimes I mess something up. And the OS is not very forgiving about that sort of mess (any more than if you replaced kernel32.dll with a file resembling cottage cheese or something). So I get a blue screen like the one described all over the KB (like in 885523 and 839517). Of course in my case I am the one who corrupted the system, so it is my own fault:

                                                                                                                            
     A problem has been detected and Windows has been shut down to prevent damage to your computer...                       
     Technical information:                                                                                                 
     STOP: c0000135 {Unable To Locate Component}                                                                            
     This application has failed to start because winsrv was not found. Re-installing the application may fix this problem.  
                                                                                                                            

    I was struck by the number of times that I am rebooting and hoping I will not get a blue screen. I want to boot into Windows normally, and see the logon dialog. And behind that logon dialog there is... (wait for it) a blue background. So I am hoping I do not see a blue screen because what I want to see is a blue screen.

    No one else seems to find that joke funny, but for some reason it appeals to me. It's like my own version of the multithreaded chicken joke, only not as funny.

    I remember once when Cathy tried using a file she had just built for a Server sorting bug fix and replacing the file on her machine, and made it blue screen. We both had forgotten for a moment that the file format had changed between XP and Server so that the two were not compatible. She said it reminded her of the old days when that would happen all the time with data updates....

    Now for the record I have never called Product Support about my own c0000135 issues because frankly I am not that mean. How on earth can anyone train for someone who modifies system files, breaks the OS, and then calls PSS for assistance?

    Though it might be amusing -- they could put together another "STOP: c0000135" KB article that had something like the following:

    CAUSE

    This problem may occur if all the following conditions are true:

    • You are Michael Kaplan.
    • You screwed up a change while trying to update system data.
    • You are bored.

    RESOLUTION

    You are a punk, Michael. Do not call product support again with this kind of nonsense.

    MICROSOFT INTERNAL NOTE

    Remind Michael that it is not wise to tease the Product Support Engineers. Contact Michael Kaplan's manager ASAP so that someone can have a conversation about Michael's apparent lack of focus.

    Maybe just for the internal PSS web? :-)

     

    No character was willing to sponsor this post. So it must be pretty far "out there"....

    From Win32 to .NET (and vice versa)

    • 4 Comments

    Charts like the one in Some suggested updates to the Win32-->.NET mapping for NLS functions are kind of amusing when you consider the context in which the System.Globalization namespace was conceived.

    Internationalization in Windows did not really exist way back in the days of 16-bit Windows, and when all of that code was written back in the early days, all of the expert advice that came from linguists and standards experts went into the pot. But mistakes were still made and not everything was done in the way we might do it today if we had the choice to do it all over again. If you know what I mean.

    Back before the 1.0 version of the .NET Framework shipped, it all started as an interesting "What if there was no legacy, how would we support internationalization knowing everything we do now?"

    Julie Bennett (based on meetings with John McConnell and others) put together an interesting spec that laid out the way such a framework would work. This is the very spec I had found that made me want to meet with her and talk about Globalization support in .NET, as I mentioned in Why/how MSLU came to be, and more, and that led to me getting my first contract with this group.

    It describes a lot of what actually became the System.Globalization classes in the .NET Framework. That big "What if?" discussion document made many interesting changes that do make a lot more sense.

    Yesterday I was talking to a developer working on the Longhorn project who was trying to work on our support of the way Longhorn will consume the results of Whidbey's CultureAndRegionInfoBuilder class (which as discussed earlier will pave the way to how locales will be able to be customized in Longhorn) and was baffled at why a certain bit of code would not return the right results. Until he found the documentation of NumberFormatInfo.NumberGroupSizes and looked at the remarks:

    Every element in the one-dimensional array must be an integer from 1 through 9. The last element can be 0.

    The first element of the array defines the number of elements in the least significant group of digits immediately to the left of the NumberDecimalSeparator. Each subsequent element refers to the next significant group of digits to the left of the previous group. If the last element of the array is not 0, the remaining digits are grouped based on the last element of the array. If the last element is 0, the remaining digits are not grouped.

    For example, if the array contains { 3, 4, 5 }, the digits will be grouped similar to "55,55555,55555,55555,4444,333.00". If the array contains { 3, 4, 0 }, the digits will be grouped similar to "55555555555555555,4444,333.00".

    And compared that to the documentation he already knew about from the GetLocaleInfo topic in Windows and its link to Locale Information that discusses the LCTypes, specifically the one on LOCALE_SGROUPING:

    Sizes for each group of digits to the left of the decimal. An explicit size is needed for each group, and sizes are separated by semicolons. If the last value is zero, the preceding value is repeated. For example, to group thousands, specify 3;0. Indic locales group the first thousand and then group by hundreds, for example 12,34,56,789, which is represented by 3;2;0.

    Note the change -- the .NET Framework says "If the last element is 0, the remaining digits are not grouped" while in Windows it says "If the last value is zero, the preceding value is repeated". Aha! or maybe Eureka! The way to make the code work becomes clear....
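
    The difference between the two conventions is easier to see in code. The following is a rough Python sketch, not the actual implementation in either Windows or the .NET Framework; the function names are made up for illustration. It also assumes that a Win32 grouping string with no trailing zero means "stop grouping after the listed sizes", which is the other half of the rule quoted above, and that a lone "0" means no grouping at all.

```python
from typing import List


def win32_grouping_to_dotnet(sgrouping: str) -> List[int]:
    """Convert a LOCALE_SGROUPING string such as "3;2;0" to .NET-style sizes."""
    sizes = [int(part) for part in sgrouping.split(";")]
    if sizes[-1] == 0:
        # Win32: a trailing 0 means "repeat the preceding size", which is
        # what .NET does when the array does NOT end in 0.
        sizes = sizes[:-1]
        return sizes or [0]  # assumption: a lone "0" means no grouping
    # Win32: no trailing 0 is assumed to mean "stop grouping", which .NET
    # spells with a trailing 0.
    return sizes + [0]


def group_digits(digits: str, sizes: List[int], separator: str = ",") -> str:
    """Group an integer digit string using the NumberGroupSizes rules."""
    groups = []
    end = len(digits)
    index = 0
    while end > 0:
        # Past the end of the array, keep reusing the last element.
        size = sizes[min(index, len(sizes) - 1)]
        if size == 0:
            groups.append(digits[:end])  # 0: remaining digits are not grouped
            break
        start = max(end - size, 0)
        groups.append(digits[start:end])
        end = start
        index += 1
    return separator.join(reversed(groups))
```

    With this, win32_grouping_to_dotnet("3;2;0") yields [3, 2], and group_digits("123456789", [3, 2]) yields "12,34,56,789", matching the Indic example in the Windows documentation.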

    Now by any objective standard, this is a good change, made in the sober spirit of "what if we could do it all over again?" but it is still a bit jarring when you have to come back and put them side by side. Either because you are building a big table that tells you how to take a Win32 API and find its .NET Framework equivalent, or because you are working on code to consume the .NET Framework's format in Win32. Either way you understand why it changed, the terrible freedom you have when you do not have a legacy support issue, and how embarrassing the old ways were, even though they worked quite well for over a decade....

    Yet somehow it does not feel all that embarrassing to me to put them both up, side by side. It feels quite liberating to admit you are wrong, especially when

    1. The "wrong" solution still works just fine, thank you;
    2. The "right" solution exists now, or will exist soon;
    3. We are the team responsible for both;
    4. There are sensible reasons for both to exist as they do;
    5. It comes up in the context of both opening it all up and getting out of the way.

    That is, quite simply, awesome. Both as a legacy and as a future direction.

     

    This post brought to you by "Ă" (U+0102, LATIN CAPITAL LETTER A WITH BREVE)
    Who, after seeing yesterday's sponsorship by "A" wanted to point out that it is more than just an "A" -- it is like an "A+", as it were!

    Comments work here (though not so you'd really notice or anything)

    • 7 Comments

    My friend Cathy politely asked me the other day why no one was making comments on my blog anymore.

    Since I had been accepting comments daily, I respectfully pointed out that she was not actually paying attention -- there were comments aplenty!

    She respectfully pointed out that my blog's home page had plenty of [0] comments notations on it. And I'll be damned if she wasn't right!

    I did at that point respectfully point out if she clicked on the links (which was likely possible for her) then she would see the comments.

    She politely suggested what I could do at that point....

    Anyone who knew me or more to the point her would know that my use of the words "politely" or "respectfully" above is utter crap -- we have a vigorously dynamic sort of relationship! :-)

    Anyway, be sure to ignore the

    [0] comments

    at the bottom of the post. Close your eyes, trust in your feelings, and use the force to guide you to where people talk about either how on track or on crack I am¹.

     

    1 - I have to give credit to Chris Bryant, a Development Lead on the Access team, for first introducing this delightful scoring system, initially applied to describing a spec provided by Program Management. It is useful in so many areas. :-)

    On approaching international programming....

    • 9 Comments

    Yesterday, someone named Ben posted the following comment to my post Invariant and Ordinal Redux:

    I appreciate your enthusiasm for picking out common programming errors like this, but as a professional programmer, I find a lot of these internationalization parameters confusing.
    How do I know if I need to pass the NORM_IGNOREKANATYPE flag to CompareString? How do I know if I want LOCALE_USER_DEFAULT or LOCALE_SYSTEM_DEFAULT, or some other locale?

    I simply don't know. Unless I learn Japanese, or know someone who knows Japanese, I'll never know the answer. The trouble is that the APIs feel like they were written by linguists.

    Me? I just want to compare filenames, or compare entries in a hash table, or compare usernames, etc. I don't want to even have the choice of ignoring kana types. I just want the CompareStrings API do the *right thing* out of the box. If that is too hard for a single function, then let's write some API sets that are easy to use for common cases. I think this would be a more useful endeavor than to write articles about the nuances between CT_CTYPE3 and CT_CTYPE2.

    Sometimes less choice is better. Please please finish that list of do's and don'ts. Please please make a list of "If you want to sort like a dictionary, do this... If you want to put filenames into a hash table, do this..."

    My initial reaction was to point out that the APIs were not written by linguists -- but the developers had expert advice from linguists when the functionality was exposed.

    My second reaction was a technical one, thinking of which ones I had already covered (like What is my locale? Well, which locale do you mean? answering some of that locale question) and which ones might make good future posts (like the care and feeding of NORM_IGNOREKANATYPE) and so on.

    My third reaction was to slow down this "developer" in me trying to solve the technical problem and look to what was really being suggested. Unfortunately, Ben's supposition is correct -- the APIs are complicated, and there is too much functionality to try to distill into simple usage without having detailed articles about the nuances. Articles that could be read by the kind of devs who try to solve the problem you indicated.

    In a very real and almost biblical sense, one can talk about "CompareString which begat lstrcmp and lstrcmpi in the USER kingdom, and was fruitful and multiplied in the SHELL kingdom and begat StrCmp, StrCmpI, IntlStrEqN, IntlStrEqNI, StrCmpN, StrCmpNI, StrIsIntlEqual, some of whom later begat StrCmpLogicalW. And in that kingdom functions which were not begat from CompareString also flourished like those that used the C rules -- StrCmpC, StrCmpIC, StrCmpNC, and StrCmpNIC. And in the kingdom of .NET the managed brother CompareInfo was also fruitful and begat the five overloads of String.Compare and in Whidbey begat the StringComparer class and the StringComparison enumeration. And CompareInfo.IsPrefix and its overrides begat String.StartsWith. And CompareInfo.IsSuffix and its overrides begat String.EndsWith. And..."

    Of course what the SHELL folks and the BCL folks did showed that in attempting to simplify individual functionalities into single APIs, you cause an explosion of simple APIs, and it becomes very tough to unravel which one to use.

    Topically modifying what Hal Holbrook said on The West Wing (playing the cantankerous Albie Duncan) in the episode Game On:

    It's not simple. It's incredibly complicated. I've been doing NLS work for over 10 years and there is no right answer to these questions and software development needs all the words it can get its hands on...

    I could tell you when it is ok to use lstrcmp and lstrcmpi and StrCmpLogicalW.  I could not even try to tell you how to navigate the rest of that stuff in the Shell or a lot of the stuff in .NET, even though a lot of it calls right into us. Because to me it is just a decision of whether one wants one's complexities to be horizontal or vertical, with the bonus of the vertical complexity (the NLS kind) being that all of the functionality is there, versus the individual McNugget that the developer was trying to surface in the simplified method, which will always be missing one or more of the functionalities that are possible, despite seeming to me to be a lot more complex....

    So while I will give practical advice from time to time (like "use the new OrdinalIgnoreCase type comparisons when trying to imitate the OS, because the OS does not know CompareString from Cholesterol"), the bulk of what I say will be exploring that vertical space of the NLS managed and unmanaged APIs and how best to use them to get the results you want.

    Because the problem I have personally with the horizontal space is that when you have to change behavior because the call did not do what you thought it did, the change is more than just passing a new flag; it is often calling a whole new function in a whole new way. Just take the String.StartsWith method as an example -- if you want to do some operations you have to move to CompareInfo.IsPrefix, which has entirely different calling semantics (one is an instance method on a string, the other is a method on a CompareInfo object that takes two strings). Or if I want to change the STRINGSORT/WORDSORT behavior of StrCmp, I have to go figure out all the parameters of CompareString now, which if I had done in the first place I would not have been trapped in the Sargasso of SHLWAPI.
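
    To make the "just pass a new flag" point concrete, here is a minimal Python sketch of the vertical model. It is not how CompareString is implemented: the `Flags` enum, the `compare` function, and the toy word sort rule (ignore hyphens and apostrophes, then break ties on the original text) are all assumptions invented for illustration.

```python
from enum import Flag, auto


class Flags(Flag):
    """Hypothetical comparison flags, loosely inspired by the NLS style."""
    NONE = 0
    IGNORE_CASE = auto()
    STRING_SORT = auto()  # treat hyphen/apostrophe like any other symbol


def _key(s: str, flags: Flags):
    """Build a sort key for one string under the given flags."""
    if Flags.IGNORE_CASE in flags:
        s = s.casefold()
    if Flags.STRING_SORT in flags:
        return (s, "")
    # Toy "word sort": compare without hyphens/apostrophes first,
    # then fall back to the original text to break ties.
    stripped = s.replace("-", "").replace("'", "")
    return (stripped, s)


def compare(a: str, b: str, flags: Flags = Flags.NONE) -> int:
    """Return -1, 0, or 1, in the manner of a C-style comparison."""
    ka, kb = _key(a, flags), _key(b, flags)
    return (ka > kb) - (ka < kb)


# Same function, same call shape; only the flag changes.
print(compare("co-op", "coat"))                     # 1  (word sort)
print(compare("co-op", "coat", Flags.STRING_SORT))  # -1 (string sort)
print(compare("ABC", "abc", Flags.IGNORE_CASE))     # 0
```

    Switching from word sort to string sort is a one-token change at the call site, which is the property the horizontal helpers give up.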

    Hopefully this fits with the model people are expecting here. If not then maybe the Shell or BCL folks will step up and work to provide the uber-conversion charts to know when to call which of the 30 methods that are all designed to simplify the five methods that NLS provides (or in the unmanaged world the 30 functions designed to simplify the one function).

    Simplification is just too complex for me. :-)

     

    This post brought to you by "A" (U+0041, LATIN CAPITAL LETTER A)
    After Happy Days went off the air and everybody realized the Fonz was short, the letter behind "Aaaaay" had its reputation injured a bit and is looking to expand into new markets, like this blog!

  • Sorting it all Out

    AVICAP32.DLL sucks (from the MSLU point of view)

    • 14 Comments

    It is all about perspective.

    I am sure there are people who look at this DLL as being the answer to their prayers in terms of providing helpful interface to AVI capabilities.

    But from my point of view, it kind of sucks. :-(

    The other day someone with the handle PRR posted the following to the microsoft.public.platformsdk.mslayerforunicode newsgroup:

    Problem description: If unicows.lib is included in the project, floating point control word may become invalid during program startup on Windows 98/95 machines (not tested on ME).

    Compiler platform: MS VS.NET 2003 Pro, Platform SDK Feb 2003, Unicows.dll 1.1.3790, Win XP Pro, P4@2.4G, 1G RAM,

    Steps to repro:

    • Create new Win32 Console Application (default settings).
    • Under Project Property Pages dialog choose Release configuration
    • Add following to Linker/Command line:

    /nod:kernel32.lib /nod:advapi32.lib /nod:user32.lib /nod:gdi32.lib /nod:shell32.lib /nod:comdlg32.lib
    /nod:version.lib /nod:mpr.lib /nod:rasapi32.lib /nod:winmm.lib /nod:winspool.lib /nod:vfw32.lib
    /nod:secur32.lib /nod:oleacc.lib /nod:oledlg.lib /nod:sensapi.lib unicows.lib kernel32.lib advapi32.lib
    user32.lib gdi32.lib shell32.lib comdlg32.lib version.lib mpr.lib rasapi32.lib winmm.lib winspool.lib
    vfw32.lib secur32.lib oleacc.lib oledlg.lib sensapi.lib

    • Modify main cpp file to following:

    #include "stdafx.h"
    #include <stdio.h>
    #include <float.h>

    int _tmain(int argc, _TCHAR* argv[])
    {
        printf("%x\n", _control87(0, 0));
        return 0;
    }

    • Compile the Release project and run on Win98SE PC.
      Output: 40003
      Expected output: 9001f

    The problem was reproduced on machine, which has a clean install of Win95OSR2 + clean upgrade to Win98SE. It does not happen all the time. If program should return 9001f, reboot Win98 and try again.

    Note: It does not matter, whether project uses Multi-byte or Unicode charset.

    Note: As soon as unicows.lib is removed from project, program starts acting as expected.

    Now for the record, it is not MSLU that is doing this. In order to have maximum compatibility with every version of Win9x (including Windows 95), there is no dependency whatsoever on the C Runtime. So it is definitely not setting the floating point stuff. It is actually harder to do this than I realized, but Phil Lucido helped me shed the dependency while still using stuff like structured exception handling....

    Now I knew this issue had come up before but honestly could not remember what it was. Luckily, Ted (who unlike me remembered this issue) came to the rescue with the answer for PRR, and a workaround:

    The thing that actually destroys the floating point is avicap32.dll which unicows.dll is dependent on.  Several searches will come up with information about this.

    The solution is to create your own AVICAP32.DLL stub DLL that sits in the same folder as unicows.dll (if you don't rely on functionality in that DLL).

    The problem is that unicows.lib is statically linked to many of the system DLLs that it has to call for the functions it wraps, and AVICAP32.DLL does indeed change these settings in the process. Whether you wanted it to or not.
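
    For anyone wondering what the two hex values in PRR's repro actually say, they can be decoded with a few lines of Python. This is only a sketch: the bit values below are assumed to match the abstract _control87 flags in the Visual C++ <float.h> header (check your own header before relying on them), and `decode` is a hypothetical helper, not part of any API.

```python
# Abstract control word bits, assumed to match Visual C++ <float.h>.
EXCEPTION_MASKS = {
    0x00001: "INEXACT",
    0x00002: "UNDERFLOW",
    0x00004: "OVERFLOW",
    0x00008: "ZERODIVIDE",
    0x00010: "INVALID",
    0x80000: "DENORMAL",
}
MCW_PC = 0x30000  # precision control field
PRECISION = {0x00000: "64-bit", 0x10000: "53-bit", 0x20000: "24-bit"}
MCW_RC = 0x00300  # rounding control field
ROUNDING = {0x000: "near", 0x100: "down", 0x200: "up", 0x300: "chop"}


def decode(control_word: int) -> dict:
    """Split a _control87-style value into its readable parts."""
    masked = [name for bit, name in EXCEPTION_MASKS.items()
              if control_word & bit]
    return {
        "masked": masked,
        "precision": PRECISION.get(control_word & MCW_PC, "reserved"),
        "rounding": ROUNDING[control_word & MCW_RC],
    }


print(decode(0x9001F))  # the value the repro expected
print(decode(0x40003))  # the value seen when avicap32.dll gets loaded
```

    Read that way, the expected 9001f has every exception masked with 53-bit precision, while 40003 masks only the inexact and underflow exceptions and switches to 64-bit precision, so overflow, divide-by-zero, and invalid-operation exceptions are no longer masked.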

    Now this DLL only has two APIs that MSLU wraps: capCreateCaptureWindow and capGetDriverDescription. For all of the trouble that they cause with this floating point crap, I wish no one had noticed these two APIs that were missed for so long (they were added to MSLU on March 24, 2001 and I doubt anyone has actually used the wrappers since then, beyond the meager tests I wrote!).

    Back then, I had toyed with the idea of delay loading all of the DLLs and functions being called, but somewhere in the 15+ DLLs and 550+ APIs it just seemed like an excessive amount of work. And it never ended up happening. It probably should have happened for this one DLL/two functions to work around the floating point problem, but it is not really worth rev'ing the DLL for that one change. I'll put it on the list of things to triage, if and when....

     

    This post brought to you by "ಊ" (U+0c8a, a.k.a. KANNADA LETTER UU)
    A letter that is seldom seen on Win9x but well represented in the Tunga font that ships with Windows XP and Server 2003!

  • Sorting it all Out

    Invariant and Ordinal Redux

    • 5 Comments

    I have talked about LOCALE_INVARIANT / CultureInfo.InvariantCulture before, in Comparison confusion: INVARIANT vs. ORDINAL and Where is the locale? "Its Invariant." In where? and talked a little about the if not noble then at least deterministic intent of this odd locale with no real country and no real language. But when I look at how people use it, what I am most often struck by is two different things:

    1. Developers tend to misuse it at least ten times more often than they use it correctly. And that is a charitable estimate.
    2. The name of this beast is so staggeringly bad that it is probably the reason for the heavy pattern of misuse described in #1.

    That post I link to for LOCALE_INVARIANT has probably the best description I have ever seen of the purpose of Invariant:

    The LOCALE_INVARIANT is a special locale identifier that is locale independent. It is designed for system level functions that require consistent results (for example, sorting in the file system) regardless of the locale that the user has chosen. Typically, an application does not use LOCALE_INVARIANT because it expects the results of an action to depend on the rules governing each individual locale.

    LOCALE_INVARIANT is defined as LANG_INVARIANT for the primary language, SUBLANG_NEUTRAL for the sub-language, and SORT_DEFAULT for the sort id.

    In fact the only real problem with this summary is that it is located in the remarks for the MAKESORTLCID macro, whose only connection to LOCALE_INVARIANT is that like all proper LCIDs you can construct the value with the macro if you did not want to use the predetermined constant. At least they mention how the construction is done (I have seen developers wonder why MAKELANGID(LANG_INVARIANT, SUBLANG_DEFAULT) causes an error when you try to use it -- I tell them to just use the predefined constant and not try to build them all when you do not have to!).
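
    For the curious, the construction can be worked through in a short Python sketch. The helper functions mirror what the MAKELANGID and MAKELCID macros do (sublanguage in the bits above the 10-bit primary language, sort ID above the 16-bit LANGID). The constant values are assumed to match winnt.h, and the Python names are just for illustration.

```python
# Values assumed to match the definitions in winnt.h.
LANG_INVARIANT = 0x7F
SUBLANG_NEUTRAL = 0x00
SUBLANG_DEFAULT = 0x01
SORT_DEFAULT = 0x0


def make_langid(primary: int, sub: int) -> int:
    """Mirror of MAKELANGID: sublanguage sits above 10 bits of primary."""
    return (sub << 10) | primary


def make_lcid(langid: int, sort_id: int) -> int:
    """Mirror of MAKELCID: sort ID sits above the 16-bit LANGID."""
    return (sort_id << 16) | langid


# The construction described in the remarks quoted above.
invariant = make_lcid(make_langid(LANG_INVARIANT, SUBLANG_NEUTRAL), SORT_DEFAULT)

# The construction that developers sometimes try instead.
not_invariant = make_lcid(make_langid(LANG_INVARIANT, SUBLANG_DEFAULT), SORT_DEFAULT)

print(f"{invariant:#06x}")      # 0x007f
print(f"{not_invariant:#06x}")  # 0x047f
```

    The first value is the 0x007f that LOCALE_INVARIANT stands for. The second is a different number altogether, which is one more reason to reach for the predefined constant.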

    The most common misuse is for people to do a ToLower() operation followed by an invariant comparison to validate filenames. If you are a regular reader here, you will know that it is hard to get less accurate results out of code than with this approach. Well, you could write code that reformats the user's hard drive when you run it and that would be worse. But it is in the top five "bad algorithms you can create while soberly writing code." With the bonus of being an extra string allocation.

    Yikes!
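
    As a rough illustration of why the two approaches disagree, here is a Python sketch. It is an approximation under stated assumptions: `casefold()` stands in for a linguistic comparison that treats "ß" and "ss" alike, and the hypothetical `ordinal_ignore_case_equal` imitates an ordinal comparison by uppercasing one character at a time and skipping any mapping that would change the length. Neither is the actual Windows or .NET implementation.

```python
def simple_upper(ch: str) -> str:
    """Uppercase one character, skipping mappings that change the length.

    Assumption: this approximates a one-to-one casing table. Python's
    str.upper() uses full mappings ("ß" -> "SS"), so those are left alone.
    """
    up = ch.upper()
    return up if len(up) == 1 else ch


def ordinal_ignore_case_equal(a: str, b: str) -> bool:
    """Compare position by position after one-to-one uppercasing."""
    if len(a) != len(b):
        return False
    return all(simple_upper(x) == simple_upper(y) for x, y in zip(a, b))


def folded_equal(a: str, b: str) -> bool:
    """Stand-in for a linguistic comparison: full case folding."""
    return a.casefold() == b.casefold()


# Ordinary case differences: both approaches agree.
print(ordinal_ignore_case_equal("README.TXT", "readme.txt"))   # True
print(folded_equal("README.TXT", "readme.txt"))                # True

# Two distinct names that the folded comparison merges.
print(ordinal_ignore_case_equal("straße.txt", "strasse.txt"))  # False
print(folded_equal("straße.txt", "strasse.txt"))               # True
```

    If the thing being imitated compares names ordinally, the last line is the kind of false match that a linguistic comparison can hand you.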

    I have decided that I will from time to time remind people of the difference between ORDINAL and INVARIANT, because neither name is all that intuitive but each of them can be incredibly useful (when used properly/appropriately). As I said in the first article above:

    That problem remains to this day, though every single time I speak at a conference or answer a question in a newsgroup or get someone to look at posts like this one, then there is at least one less developer who has this problem. Maybe this time it is you? :-)

    I kind of feel a summarized list of DOs and DON'Ts coming on in a future post, and the issues surrounding Invariant and Ordinal comparisons will probably have a prominent place in that list (as will some of those casing issues).

     

    This post brought to you by "ლ" (U+10da, a.k.a. GEORGIAN LETTER LAS)

  • Sorting it all Out

    Why the proofing tools are all bundled....

    • 6 Comments

    Earlier this month, saturnin02 posted the following opinion onto at least seven English and Italian Microsoft newsgroups, in response to a general thread about the Microsoft Proofing Tools and the fact that they basically bundle all of the languages together, rather than selling the languages separately:

    This sort of discussion underscores what's wrong with Microsoft (and others) pricing policies, indirectly supporting illegal activities. Having paid a premium price for Office 2003 and using it at 98% of the time with only English, it is ridiculous to have to purchase an entire CD for dozens of languages one would never, ever use. Microsoft is making life easier for itself by bundling them all together, for it figures companies would buy it regardless. To hell with individual, occasional user, who only needs one or two languages for occasional use. Then they turn around and complain about illegal market and how many sales they are loosing. I too have a need for a second language, but I would never pay Microsoft for the CD at that price.

    Our own Mike Williams responded in the thread thusly:

    Over 90% of the tools on that CD are licensed from other companies (mainly in Europe). The companies that produce these proofing tools individually charge more for single elements of their packages (e.g. spelling or grammar or thesaurus) than Microsoft charges for all languages and all the tools (which includes speech and handwriting and other tools) on the entire CD.

    This is a very true point -- and it is a very real truth. I am going to take it a step further.

    There are times that the model of the U.S. Senate (where each state gets the same number of seats) is preferred and times that the model of the U.S. House of Representatives (where each state gets seats based on population) is better. Microsoft is not a government (I have trouble imagining someone acting as the British Ambassador to Microsoft!), but even so there are times that the customer benefits most from separate packages and times that they benefit most from bundled packages. And the Proofing Tools are a time that the bundle is best.

    And not just for the reason that Mike hints at -- a way to charge less for the tools altogether (some of which would cost more for a single language than they do for the whole package as it ships today!).

    Because the people who create the tools need to be worried about recouping their costs, which can obviously be significant. But a company like Microsoft needs to not worry so much about that issue, and certainly not to set different prices for different languages, based on how much they can recoup (which could make some languages less appealing, even if they would be able to see potential use). By bundling them all, they provide a compelling feature set at a price that is still lower than picking up the tools for even a single language might be, if one could find a single language and all of its features bundled.

    The same model explains a lot of why the GIFT team ships almost all of the locale support to all platforms -- because the platform needs to be consistent for developers who are counting on that locale support being present. Even the items that do not get installed automatically which are on the CD are only that way for reasons of performance (extra perf. hit for complex scripts) or size of install (the code pages and the CJK fonts/IMEs can add to the size of the install). But the goal of having international support easily available is really the only line item here. If Windows were splintered out with language support varying for different SKUs then many languages would have to have different pricing models.

    Which would really kind of suck from the standpoint of developers who needed to avoid targeting specific SKUs.

    Now I understand saturnin02's point, and it can be frustrating if you just want one to have to get the whole bundle. Because it is easy to assume that if over 30 languages cost N dollars then one would cost N/30 or better. But that is not really how it is, and the current model lets the price of the Proofing Tools be about something other than the cost of the Proofing Tools, if you catch my drift.

    Just as we do not fragment the market to make all of the languages we do not support yet cost more just because they might involve markets to which Microsoft has not yet been, which may not have many customers. It lets my job be about opening it all up and getting out of the way, and not purely about what each language can get on the auction block. Which is one of the biggest reasons why I accepted the offer to make it my job.

    Imagine if Microsoft did that for languages that needed fonts, and a big Chinese font and the IME that supports it would cost a small fortune, while a font and keyboard for a language that used only characters that are already in fonts would cost nothing. What would happen to the pricing model for languages? And how often would people buy Windows versions based on how little support is installed, even if they could have made use of more? I really would not like such a situation, and I am really glad that it is not the model that Microsoft uses for its language support, either for locales on Windows and the .NET Framework or in the Proofing Tools for Office.

    If you know what I mean.

    This post brought to you by "ঝ" and "ബ" (U+099d and U+0d2c, a.k.a. BENGALI LETTER JHA and MALAYALAM LETTER BA)
    Two letters that exist in fonts that first shipped (along with shaping engines) in Windows XP SP2, with a price to customers of exactly $0. :-)

  • Sorting it all Out

    It is all about making sure developers respect the user's preferences

    • 7 Comments

    Now this is a topic I have talked about before, so I am going to put a slightly different spin on the issue, this time.

    To review, a user can go into Regional Options and make choices about the language they want to use.

    Now I could go on forever (hell, I have gone on forever!) about using the NLS APIs to have that setting respected and how it is such a good idea.

    But developers do not tend to think that way.

    Generally speaking, they develop programs that meet requirements, and the "respect user locale" requirement is such a stealth one that it usually does not show up on the list a developer is working from.

    So, put yourself in my shoes. Or the shoes of a developer working on NLS. How do you make sure that the user preferences are going to be respected?

    Well, the best way is to make sure that by default the APIs pick up the user preferences. It has to be harder to not get those preferences. And we do that -- all the NLS APIs like GetNumberFormat and GetLocaleInfo require developers to pass the LOCALE_NOUSEROVERRIDE flag explicitly to not get the user's preferences. In the same way, the CultureInfo ctor has a default that picks up that same set of user preferences. Sure, you can explicitly pass false to that UseUserOverride parameter, but the default does not require that.

    Now there are people who actually work on .NET who feel that the .NET Framework should have had different defaults here -- better in their mind to go with the locale-insensitive behavior and then make picking up the user settings an "opt-in" process. But such an approach is doomed to be non-intuitive since users who make changes in Regional Options have a realistic expectation that they will be able to see the change picked up. It is the main reason that even the .NET Framework was given mechanisms to pick up these changes -- because people expect to not only make them but to see the results reflected by applications, managed and unmanaged.

    So, over the objection of some people, the .NET Framework does the same thing as Windows functions like lstrcmp and lstrcmpi -- it respects the user's setting by default in comparison, parsing, and formatting. If you are a developer writing code and you want to ignore those settings, you have to explicitly work at it....
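
    The shape of that design is easy to sketch. The Python below is purely hypothetical (the `format_number` function, its `no_user_override` parameter, and the sample settings are invented for illustration and are not the NLS or .NET API), but it shows the idea: the caller gets the user's choices unless they explicitly opt out.

```python
# Hypothetical locale defaults and user choices, for illustration only.
LOCALE_DEFAULTS = {"decimal": ".", "grouping": ","}
USER_OVERRIDES = {"decimal": ",", "grouping": "."}


def format_number(value: float, user_overrides: dict,
                  no_user_override: bool = False) -> str:
    """Format with two decimals; user settings win unless opted out."""
    settings = dict(LOCALE_DEFAULTS)
    if not no_user_override:
        settings.update(user_overrides)
    # Format with fixed separators first, then swap in the chosen ones.
    whole, frac = f"{value:,.2f}".split(".")
    return whole.replace(",", settings["grouping"]) + settings["decimal"] + frac


# The easy call respects the user.
print(format_number(1234567.891, USER_OVERRIDES))
# Ignoring the user takes a deliberate extra argument.
print(format_number(1234567.891, USER_OVERRIDES, no_user_override=True))
```

    The first call prints 1.234.567,89 and the second prints 1,234,567.89, so the path of least resistance is the one that honors what the user picked.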

    Because I am a developer myself, and I know that sometimes the respect must be enforced to make sure developers do it. No offense to the developers (and this does not apply to all of you!) but it applies to enough to make doing it this way a requirement.

    This post brought to you by "ܜ" (U+071c, a.k.a. SYRIAC LETTER TETH GARSHUNI)

  • Sorting it all Out

    What the %#$* is wrong with German sorting?

    • 23 Comments

    (Apologies to those who are offended by the South Park movie scene that inspired the title of this post!)

    About a month ago, Daniel J. Smith asked me something that prompted me to say Dere are qvestions? In zat case...

    Then last week, Martin Müller asked in the microsoft.public.dotnet.internationalization:

    Recently I've stumbled across the fact that the CompareInfo for my default culture de-DE as well as for InvariantCulture considers "ss" and german "ß"  (szlig) equivalent, which is not correct!

    For example, calling "lassen".IndexOf("ß") yields 2 instead of -1.

    CultureInfo.InvariantCulture.CompareInfo.Compare("lassen", "laßen") returns 0, which is wrong, too.

    Using CompareInfo.IndexOf() without special CompareOptions gives the same incorrect results. When I use CompareOptions.Ordinal, however, IndexOf correctly returns -1 and  Compare returns inequality. But CompareOptions.Ordinal cannot be combined with any other flag, so a case insensitive comparison isn't possible this way.

    This bug occurrs with IndexOf and Compare of both String and CompareInfo.

    Any comment on this or info when this will be fixed?

    Well, I have a comment, but things are working as designed so nothing is going to be "fixed". I will explain....

    In the German language, the Sharp S ("ß" or U+00df) is a lowercase letter, and it capitalizes to the letters "SS". Now Microsoft's casing tables only support simple Unicode casing, which does not include any rules that would change the size of the string such as this one. So doing a "ß".ToUpper() call will not return "SS".

    (for more info on those casing rules, see CaseFolding.txt in the Unicode Character Database)

    But in any case, collation can be a bit more flexible. Since the Sharp S is very much a German letter and not one widely used outside of German, it is included in the default table rules used by all locales (which allows German to be kept in the default table and it will be used by all locales that do not conflict).

    But obviously on most locales, "ss" is what uppercases to "SS". Even on German, "ss" would uppercase to "SS".

    So it is only logical to assume that in such a case, if

    "ss".ToUpper() == "ß".ToUpper() == "SS"

    then

    "ss" == "ß"

    at least for the technical purpose of facilitating the ability to treat these other cases properly. This is why on almost all locales (including the invariant locale), "ß" looks so much like "ss".
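
    The same reasoning can be seen in a few lines of Python, with one caveat: Python's `str.upper()` applies the full Unicode case mappings, so unlike the casing tables described above it does turn "ß" into "SS". `casefold()` is used here only as a stand-in for a comparison that treats the two spellings alike; it is an assumption for illustration, not what CompareInfo does internally.

```python
# Full Unicode casing: both spellings uppercase to the same string.
sharp_upper = "ß".upper()    # "SS"
double_upper = "ss".upper()  # "SS"

# Ordinal (code point) search: "ß" does not occur in "lassen".
ordinal_index = "lassen".find("ß")

# Folded search: "ß" folds to "ss", which is found at index 2.
folded_index = "lassen".casefold().find("ß".casefold())

# Folded comparison: the two spellings compare as equal.
folded_equal = "lassen".casefold() == "laßen".casefold()

print(sharp_upper, double_upper, ordinal_index, folded_index, folded_equal)
# SS SS -1 2 True
```

    Those last three values line up with what Martin reported: -1 for the ordinal search, 2 for the default one, and equality for the default comparison.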

     

    This post is brought to you by "ß" (U+00df, a.k.a. LATIN SMALL LETTER SHARP S)
    And really, who else would it be? :-)
