
August, 2006

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    And if your language starts playing a different TUNE

    • 16 Comments

    Warning to readers: this post is completely and totally my own opinions based on my efforts to assist with Tamil's representation in Unicode, and truly have nothing to do with Microsoft's opinions on the matter (whatever they are). If you quote anything from my words here as being 'According to Microsoft' then be aware that you are a complete moron whose only saving grace is that being a moron is a venial and not a mortal sin. You have been warned!

    As I write this post, the lyrics of Roger Waters wash over me:

    And if the cloud bursts, thunder in your ear
    You shout and no one seems to hear.
    And if the band you're in starts playing different tunes
    I'll see you on the dark side of the moon.

    It is quite ironic that these words (from the song Brain Damage, on Pink Floyd's Dark Side of the Moon) seem to so easily link to the insanity that I have seen from afar related to Tamil Unicode - New Encoding (TUNE).

    You can see the introduction page for Tamil Virtual University's Request For Comments here.

    What this standard amounts to is an attempted re-encoding of the Tamil script using Unicode's PUA (Private Use Area) in an attempt to make Tamil into a simple script (rather than a complex one), to build collation support directly into the order of the code points in the encoding, and to encourage ISVs like Adobe to support Tamil.

    The fact that this ignores the rules in Unicode related to the re-encoding of scripts that already exist, the fact that collation is never designed as a part of the order of code points in any language (even English!), the fact that INFITT (the INternational Forum for Information Technology in Tamil) and its 'WG02' Unicode working group (of which I am a member along with several native Tamil speakers from around the world) is on record as disagreeing with the bulk of the claims and assertions made by TUNE supporters, the fact that the Unicode Technical Committee is on record as considering many of the fundamental aspects of TUNE to be entirely unsupportable -- all of these things are ignored.

    After WG02 made its feelings clear on the matter, the TUNE supporters had their own working group created (WG08), and although I am officially the liaison between INFITT and Unicode, I have never been given any communication related to TUNE to present to the Unicode Consortium (I have been told this is due to my obvious bias against TUNE, though no one from WG08 has communicated to Unicode through other means, either).

    So yes it is a request for comments, but one in which if the comments are negative, the commenter can expect little more than to be ignored, or dismissed due to bias.

    So there are two kinds of people here -- those who agree with TUNE, and those who are wrong.... :-(

    Tamil Nadu has had a similar approach to 8-bit standards, where they rejected the TSCII standard that was widely used outside of Tamil Nadu and instead formulated their own TAB/TAM standards. Historically their recent efforts in areas such as encodings and keyboards have not been as well received by members of the Tamil Diaspora as other orthographic changes in the language and script in the last 30-35 years.

    Anyway, back to being out of TUNE. :-)

    For those who are in Tamil Nadu:

    "...TVU is organizing a one day conference for obtaining the public opinion and to deliberate on the comments received. The proposed one day conference will have an inaugural session, a session for open discussion in the forenoon. The conference will be held in the Clive Hall at Taj Coromondal Hotel., Nungambakkam during 9.30 a.m. on 2nd September 2006."

    If any of my readers are in Tamil Nadu and would like to attend this one day conference, please let me know what happens (and if you contribute anything be sure not to mention you agree with anything I say, given my bias and all!). Given the step backwards that I truly believe this whole effort represents, I am truly hoping that those in Tamil Nadu and TVU who are championing the new standard can be finally convinced that they are out of TUNE....

    The lunatic is in their head
    The lunatic is in their head
    they raise the blade, they make the change
    They rearrange it; it's insane
    they lock the door
    and throw away the key
    there's someone in their head but it's not me

    And if bad standards thunder in their ear 
    We shout and TVU doesn't seem to hear.
    And if the standard they're in starts implementing TUNE 
    We'll see them on the dark side of the moon.

    This post brought to you by க் (U+0b95 U+0bcd, a.k.a. TAMIL LETTER KA + TAMIL SIGN VIRAMA, a.k.a. TAMIL KA puLLi, a.k.a. TAMIL LETTER K)
    (A letter that is separately encoded in TUNE, along with several hundred others)

  • Sorting it all Out

    The advanced feature in MSKLC doesn't really scan

    • 2 Comments

    Earlier this month, Emerson asked me via the 'Contacting Michael' link:

    Hello,

    My name is Emerson, I'm from Brazil and I am kind of desperate. I have a question concerning the Microsoft Keyboard Layout Creator and I would like to know if you can help me on that.

    Does the MSKLC really allow the changing of scan codes? The help says it's possible, but I just can't do it. The virtual keys with changed scan codes end up with two scan codes: the original one and the one I assigned by me. It would not be a problem if I could "build DLL and setup package" even that way, but when I try to do that, I get the following:

    1) A message box appears. It says: "There was a problem building the keyboard file. Would you like to see warning/error information?". I click "Yes".

    2) A message box titled "Standard output information" says: "Error 2020 (..\.\tmpLayout01.txt, line 74): VK_OEM_2 (bf) found at scancode 73 and 35." I click OK.

    That's it. Obviously some parameters from the second message box may vary according to the keys I try to change the scan code. By the way, to change the scan code I right-click on the key I want, click on "Properties for VK_ in all shift states", click on the check box for "Advanced View", change the scan code for whatever I want and finally click OK.

    The layout may work with no problems on that text box and it also may be validated with no errors (only warnings) -- it does not matter, it seems impossible to build the DLL and setup package.

    Am I doing something wrong? What would you recommend me to do?

    Oh, by the way, I'm using a Japanese laptop (that's why I want to change some scan codes -- basically adding scan code 73 over some other not used by my keyboard) with Windows XP Service Pack 2 (Brazilian, slipstreamed).

    Actually the only problem I have with this laptop is that I can't type diacritics properly, because the Japanese layout has no dead keys. Since MSKLC does not handle the Japanese layout properly, I found that the best way to get the keyboard working the way I want would be changing some scan codes and adding the dead keys with MSKLC. Also it would be useful if I could change the virtual key name, so the old scan code could keep with the original virtual key, and I would put a new scan code for a new virtual key.

    Sorry for the big message, hope you can help me. Keep the good work up.

    Thanks in advance.

    Regards,

    Emerson

    Ok, I think we're going to have one of those awkward moments now. :-)

    The dialog Emerson is referring to is this one:

    (I am using the Greek polytonic keyboard in the examples here, by the way!)

    With the Advanced view on, the Scan Code is available to be set; in this case it is 0x1b (or as it is shown in the dialog, 1b).

    Now if you take a keyboard, save it out as a .KLC file, and then open that .KLC file in Notepad, you will see (among other things) a table that looks something like this, with one row per key:

    //SC VK_   Cap  0     1     2    6     7
    //-- ----  ---- ----  ----  ---- ----  ----
    19   P      1   03c0  03a0  -1   -1    -1
    1a   OEM_4  0   005b@ 007b@ -1   00ab@ -1
    1b   OEM_6  0   005d@ 007d@ -1   00bb@ 0387

    and so on.

    Note how the first column is the scan code, and the second column is the virtual key. Essentially this mapping is built entirely from scratch by MSKLC, based on some keyboard layout.
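
    To make the column layout concrete, here is a minimal sketch of reading the first three columns of one such row. It is not part of MSKLC; the KlcRow type and the parse_klc_row function are hypothetical names made up for illustration, and it assumes the line has already been converted to a plain narrow string.

```c
#include <stdio.h>

/* One row of the scan code / virtual key table (hypothetical type). */
typedef struct {
    unsigned scan_code;  /* first column, hexadecimal (e.g. 1b) */
    char vk_name[32];    /* second column, e.g. "OEM_6" */
    int caps;            /* third column (Cap) */
} KlcRow;

/* Parses the first three columns of a row such as
 *   "1b   OEM_6  0   005d@ 007d@ -1   00bb@ 0387".
 * Returns 1 on success and 0 for comment, blank, or malformed lines.
 * The shift-state columns that follow are deliberately ignored. */
int parse_klc_row(const char *line, KlcRow *row)
{
    /* Skip leading blanks so indented rows are accepted too. */
    while (*line == ' ' || *line == '\t') {
        line++;
    }
    /* Rows starting with "//" are the table's header comments. */
    if (line[0] == '/' && line[1] == '/') {
        return 0;
    }
    /* %x reads the hex scan code, %31s the VK name, %d the Cap value. */
    return sscanf(line, "%x %31s %d",
                  &row->scan_code, row->vk_name, &row->caps) == 3;
}
```

    Feeding it the OEM_6 row from the table above would fill in a scan code of 0x1b, a virtual key name of "OEM_6", and a Cap value of 0.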

    When we discussed the user interface, initially we wanted to just hide the whole thing, as every method we tried to work out to let people adjust the VK values just got us in trouble with usability. We finally ended up putting in this obscure method to switch scan codes around a bit, which is mostly useless. Since it is up to the hardware keyboard layout where the actual scan codes sit (the keys are the ones that emit the scan codes), being able to switch them would just make people feel better if they wanted to feel like they were moving stuff around.

    Which I knew was kind of dumb even back then, but only realized how dumb when I was playing with the feature as research when writing this article. MSKLC sharply limits stuff like putting in both new scan codes and higher valued scan codes (which is what Emerson is trying to do).

    Given that it is useless in its current form and that MSKLC blocks most of what would be legitimately useful (but which never occurred to anybody to ask for while it was being developed and tested), I think it would be a good idea to just treat the whole scan code "feature" in MSKLC as being completely broken and get a bug in to try and fix this in a future version....

    In the meantime, the only real workaround would be to do some of the work manually. Not for the faint of heart, but I think I can put together an example as an upcoming advanced blog post over the weekend? :-)

     

    This post brought to you by セ (U+30bb, a.k.a. KATAKANA LETTER SE)

  • Sorting it all Out

    If you wanted to get it done with the font...

    • 1 Comments

    It wasn't all that long ago that I was talking about how Sometimes, uppercasing sucks. And between that post and the follow-up one I was showing example strings like

    Ρύθμιση σήματος

    which a native would expect to be capitalized not as

    ΡΎΘΜΙΣΗ ΣΉΜΑΤΟΣ

    but instead as

    ΡΥΘΜΙΣΗ ΣΗΜΑΤΟΣ

    and I pointed out that the NLS casing and collation tables, which pride themselves on features like reversibility and weighted equivalences, simply did not have this type of transformation in mind.

    I even suggested that some people were considering the problem from a typographical point of view. Which really makes sense given that it is essentially an appearance issue much more than anything else -- but an important appearance issue, one that should not really be ignored.

    And luckily, typographical wizard John Hudson pointed me to this thread over on Typophile that digs into the issues of dealing with the expected appearance of uppercase Greek from the point of view of the fonts.... definitely worth a look for how it handles some of the complexities.

    So now that it turns out someone has actually been out there doing what I was just kinda hypothesizing about, I thought it would make sense to take a step back and make sure people understand the trade-offs of each method, so I'll talk about some of these issues in a future post....

     

    This post brought to you by Σ (U+03a3, a.k.a. GREEK CAPITAL LETTER SIGMA)

  • Sorting it all Out

    When you think it couldn't get any harder, it gets easier

    • 4 Comments

    From the prophet Douglas Adams:

    Click, hum.

    The huge grey Grebulon reconnaissance ship moved silently through the black void. It was travelling at fabulous, breath-taking speed, yet appeared, against the glimmering background of a billion distant stars to be moving not at all. It was just one dark speck frozen against an infinite granularity of brilliant night.

    On board the ship, everything was as it had been for millennia, deeply dark and silent.

    Click, hum.

    At least, almost everything.

    Click, click, hum.

    Click, hum, click, hum, click, hum.

    Click, click, click, click, click, hum.

    Hmmm.

    A low level supervising program woke up a slightly higher level supervising program deep in the ship's semi-somnolent cyberbrain and reported to it that whenever it went click all it got was a hum.

    The higher level supervising program asked it what it was supposed to get, and the low level supervising program said that it couldn't remember exactly, but thought it was probably more of a sort of distant satisfied sigh, wasn't it? It didn't know what this hum was. Click, hum, click, hum. That was all it was getting.

    The higher level supervising program considered this and didn't like it. It asked the low level supervising program what exactly it was supervising and the low level supervising program said it couldn't remember that either, just that it was something that was meant to go click, sigh every ten years or so, which usually happened without fail. It had tried to consult its error look-up table but couldn't find it, which was why it had alerted the higher level supervising program to the problem .

    The higher level supervising program went to consult one of its own look-up tables to find out what the low level supervising program was meant to be supervising.

    It couldn't find the look-up table .

    Odd.

    It looked again. All it got was an error message. It tried to look up the error message in its error message look-up table and couldn't find that either. It allowed a couple of nanoseconds to go by while it went through all this again. Then it woke up its sector function supervisor.

    The sector function supervisor hit immediate problems. It called its supervising agent which hit problems too. Within a few millionths of a second virtual circuits that had lain dormant, some for years, some for centuries, were flaring into life throughout the ship. Something, somewhere, had gone terribly wrong, but none of the supervising programs could tell what it was. At every level, vital instructions were missing, and the instructions about what to do in the event of discovering that vital instructions were missing, were also missing.

    Small modules of software - agents - surged through the logical pathways, grouping, consulting, re-grouping. They quickly established that the ship's memory, all the way back to its central mission module, was in tatters. No amount of interrogation could determine what it was that had happened. Even the central mission module itself seemed to be damaged.

    This made the whole problem very simple to deal with. Replace the central mission module. There was another one, a backup, an exact duplicate of the original. It had to be physically replaced because, for safety reasons, there was no link whatsoever between the original and its backup. Once the central mission module was replaced it could itself supervise the reconstruction of the rest of the system in every detail, and all would be well.

    Robots were instructed to bring the backup central mission module from the shielded strong room, where they guarded it, to the ship's logic chamber for installation. 

    This involved the lengthy exchange of emergency codes and protocols as the robots interrogated the agents as to the authenticity of the instructions. At last the robots were satisfied that all procedures were correct. They unpacked the backup central mission module from its storage housing, carried it out of the storage chamber, fell out of the ship and went spinning off into the void.

    This provided the first major clue as to what it was that was wrong.

    Further investigation quickly established what it was that had happened. A meteorite had knocked a large hole in the ship. The ship had not previously detected this because the meteorite had neatly knocked out that part of the ship's processing equipment which was supposed to detect if the ship had been hit by a meteorite.

    The first thing to do was to try to seal up the hole. This turned out to be impossible, because the ship's sensors couldn't see that there was a hole, and the supervisors which should have said that the sensors weren't working properly weren't working properly and kept saying that the sensors were fine. The ship could only deduce the existence of the hole from the fact that the robots had clearly fallen out of it, taking its spare brain, which would have enabled it to see the hole, with them.

    You may be wondering why I shared this 'Mostly Harmless' excerpt. I'll try and see if I can point out the relevance of dealing with a problem where the information you need is not directly available, and you can only find what you need through inference....

    Remember back in March when I posted Getting all you can out of a keyboard layout, Part #9a?

    In part of that post I talked about the fact that MapVirtualKey[Ex] did not distinguish keys such as the left and right CONTROL keys, or the left and right ALT keys, since their scan codes differed only by an extended bit and MapVirtualKey[Ex] strips out the extra bits.

    And without explicitly mentioning how lame it was, I talked about how to work around this problem.... using indirect methods.

    Now you might think that this blog represents such a powerful force at Microsoft that the mere suggestion of such a limitation inspired action at the highest levels and that the developer who owns the code was asked to immediately address the issue.

    Of course, you'd be mistaken. Utterly mistaken.

    Luckily, there was another Windows component that had the same requirement, and there was a developer who was slightly less warped than me who did not realize there was that odd, indirect way to get at the info.

    And that request led to new functionality of MapVirtualKey[Ex] -- an extension to two of the existing flags, a whole new flag, and constants to cover the flag values so you don't have to pass a bunch of numbers.

    The update info that will make its way into the Platform SDK eventually (after a scrub by the doc. writer of course!):

    0      MAPVK_VK_TO_VSC
    uCode is a virtual-key code and is translated into a scan code. If it is a virtual-key code that does not distinguish between left- and right-hand keys, the left-hand scan code is returned. If there is no translation, the function returns 0.

    1      MAPVK_VSC_TO_VK
    uCode is a scan code and is translated into a virtual-key code that does not distinguish between left- and right-hand keys. If there is no translation, the function returns 0.
    Windows Vista: the high byte of uCode can contain 0xe0 or 0xe1 extended scan code prefix to specify the extended scan code.

    2      MAPVK_VK_TO_CHAR
    uCode is a virtual-key code and is translated into an unshifted character value in the low order word of the return value. Dead keys (diacritics) are indicated by setting the top bit of the return value. If there is no translation, the function returns 0.

    3      MAPVK_VSC_TO_VK_EX
    Windows NT/2000/XP: uCode is a scan code and is translated into a virtual-key code that distinguishes between left- and right-hand keys. If there is no translation, the function returns 0.
    Windows Vista: the high byte of uCode can contain 0xe0 or 0xe1 extended scan code prefix to specify the extended scan code.

    4      MAPVK_VK_TO_VSC_EX
    Windows Vista: uCode is a virtual-key code and is translated into a scan code. If it is a virtual-key code that does not distinguish between left- and right-hand keys, the left-hand scan code is returned. If there is no translation, the function returns 0. If the scan code is an extended scan code, the high byte is the extended prefix (either 0xe0 or 0xe1).
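
    As a rough illustration of the "high byte" wording above, here is a small sketch of packing a prefix and a scan code into one value and taking it apart again. It does not call MapVirtualKey[Ex] at all, the helper names are hypothetical, and it assumes "high byte" means bits 8-15 (the high byte of a 16-bit value), so that the right CONTROL key's 0xe0-prefixed scan code 0x1d becomes 0xe01d.

```c
/* Packs an extended scan code prefix (0xe0 or 0xe1, or 0 for none)
 * and a one-byte scan code into a single value: prefix in bits 8-15,
 * scan code in bits 0-7. Hypothetical helper, not a Win32 function. */
unsigned make_extended_scan_code(unsigned prefix, unsigned scan_code)
{
    return ((prefix & 0xffu) << 8) | (scan_code & 0xffu);
}

/* Returns the prefix byte (bits 8-15) of a packed value. */
unsigned scan_code_prefix(unsigned packed)
{
    return (packed >> 8) & 0xffu;
}

/* Returns the scan code itself (bits 0-7) of a packed value. */
unsigned scan_code_base(unsigned packed)
{
    return packed & 0xffu;
}

/* Returns 1 if the packed value carries a 0xe0 or 0xe1 prefix. */
int has_extended_prefix(unsigned packed)
{
    unsigned prefix = scan_code_prefix(packed);
    return prefix == 0xe0u || prefix == 0xe1u;
}
```

    On Vista, a value built this way is what one would presumably pass as uCode with MAPVK_VSC_TO_VK_EX, and the same helpers can split up whatever MAPVK_VK_TO_VSC_EX hands back.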

    Now if I were a lesser person I might have started a rumor about the VP who read my blog and intervened to get the feature in, but honestly I am just so happy that it is there that I see no need to make up any stories about it. :-)

    It does mean that I might need to do a new part to that series to cover the things that are now easier to do in Vista, of course. But I can't remember the last time I was afraid to do a blog post....

     

    This post brought to you by ܞ (U+071e, a.k.a. SYRIAC LETTER YUDH HE)

  • Sorting it all Out

    No, Virginia, Technical Aides do not report to Technical Leads

    • 0 Comments

    So when Bryan asked me the other day whether a Technical Aide was someone who reported to a Technical Lead, I had to smile.

    It is not like the titles, which aim for consistency, really meet their goals.

    A Software Development Engineer reports to a Development Lead who reports to a Development Manager.

    And a Software Test Engineer reports to a Software Development Engineer in Test who reports to a Test Lead who reports to a Test Manager. Which is kind of consistent, too.

    But then does a Program Manager report to a Program Manager Lead who reports to a Program Manager Manager?

    No, a Program Manager reports to a Lead Program Manager who reports to a Group Program Manager.

    Then all three of those managers may report to a Group Manager who may report to a Product Unit Manager who may report to a Director who may report to a General Manager who may report to a Corporate Vice President who may report to a Vice President (or vice versa), and so on.

    Consistency is not our weakness here.

    Now I talked about a Technical Lead a bit in this post, but I can add to that the fact that a Technical Aide does not report to a Technical Lead.

    A Technical Aide is kind of like the Microsoft version of a Lord Privy Seal or a Minister Without Portfolio. They would usually report to someone really important like a CTO or a VP.... which of course I am not.

    Maybe a Technical Lead could report to a Technical Aide, though usually not.

    So Bryan, does that answer your question? :-)

  • Sorting it all Out

    There should always be a fallback available

    • 9 Comments

    So during the season finale of Entourage, one of the sage pieces of advice given to Vince about his plan to fire Ari was that he really ought to get a feel for what is out there in terms of agents before he actually fires Ari. It is all about having a good fallback in case the resource you have been relying on turns out to not be able to come with the goods....

    One of the things that has been built into the Language Interface Pack design, required due to the fact that it is only a partial localization, is the notion of a fallback language to use in case the resources do not exist in the expected language.

    Hopefully you would not fire the LIP since it is not screwing up like Ari did, it is just doing its best. But clearly English is not always the best choice for language fallback.

    In Vista, this notion has been further extended so that it does not have to only be embedded in the locale itself but can also be a user overridable choice. The Regional and Language Options tab will look as follows when a partial localization is chosen:

    Notice how there is even a third language choice you can make if the second one is thought of as a partial localization. Importantly, note that neither the second nor the third choice will be visible if a partial localization is not chosen.

    Which is not to say Dutch is a partial localization in Vista, but temporarily not all of its strings are localized, and that allows the testing of this feature to happen more widely.

    Now all of this is very cool, but I think the design is a bit flawed.

    We actually encourage people to build their own applications with MUI-like features, and we encourage them to use both the MUI resource loading and UI language functions. And clearly our business cases that are used to decide which localizations are full ones and which ones are partials will not always match those of an ISV writing an application with MUI functionality.

    Therefore, the option to say "if you can't find this language, then use that one" should always exist. Because even if a full localization is involved, what we call full and what a user does are two different things....
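
    The "if you can't find this language, then use that one" idea can be sketched as an ordered list of languages that is walked until one of them has the resource. This is only an illustration of the concept, not how the MUI resource loader works; the Resource type, the load_with_fallback function, and the language tags used with it are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One localized string, tagged with the language it is in. */
typedef struct {
    const char *language;
    const char *text;
} Resource;

/* Returns the text for the first language in `preferred` (checked in
 * order) that has an entry in `table`, or NULL if none of them do. */
const char *load_with_fallback(const Resource *table, size_t table_len,
                               const char *const *preferred,
                               size_t preferred_len)
{
    for (size_t i = 0; i < preferred_len; i++) {
        for (size_t j = 0; j < table_len; j++) {
            if (strcmp(table[j].language, preferred[i]) == 0) {
                return table[j].text;
            }
        }
    }
    return NULL;
}
```

    With a table that only has German and English entries and a preference list of Dutch, then German, then English, the German string would come back, which is exactly the kind of choice a user (rather than the locale) may want to make.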

     

    This post brought to you by ຫ (U+0eab, LAO LETTER HO SUNG)

  • Sorting it all Out

    What the hell is wrong with TranslateCharsetInfo, anyway?

    • 6 Comments

    The other day, Colin had a customer with a question about some unexpected results from the Win32 function TranslateCharsetInfo:

    My customer is having issues with the TranslateCharsetInfo API. They are using the TCI_SRCLOCALE to use the LCID as the source

    fResult = TranslateCharsetInfo((DWORD*)(langid), &csi, TCI_SRCLOCALE);

    What they find is that the results of this call do not always match the values listed in http://www.microsoft.com/globaldev/nlsweb/default.mspx

    Specifically:

    • LCID 0x0C04 – Chinese (Hong Kong S.A.R.) - returns codepage 936 instead of the expected codepage 950
    • LCID 0x0004 - Chinese (Simplified) – returns codepage 950 instead of the expected 936

    Is what I’m seeing expected, if not is there a better way to achieve this? 

    We did also try:

    GetLocaleInfo (langid, LOCALE_IDEFAULTANSICODEPAGE, szLocaleData, sizeof(szLocaleData)/sizeof(char)) ;

    However from this the 0x0C04 LCID returns the expected 950 codepage (getting better), but the 0x0004 LCID still returns 950 instead of the 936 that the web documentation suggests I should have returned.

    Or am I missing something?

    Many thanks,

    Colin

    The results of the two LCID values are actually caused by two entirely different issues.

    In both cases, GDI is relying on the info that GetLocaleInfo with the LOCALE_FONTSIGNATURE value for the LCTYPE returns. In other words, GDI is depending on NLS for the info on what to do here.

    In one case, the fact that

    • the .NET Framework considers 0x0004 to be "Simplified Chinese" even though
    • Windows has for so many versions treated the neutral LCID as having an implicit SUBLANG_DEFAULT attached making it 0x0404 (Chinese - Taiwan, a Traditional Chinese locale)

    can be considered a legitimate design flaw in NLS that has only been fixed in Vista (via its ConvertDefaultLocale function whose logic all NLS API functions that take an LCID go through).

    In other words, that bug no longer repros on Vista. :-)
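
    For anyone wondering how 0x0004 turns into 0x0404, here is a minimal sketch of the implicit SUBLANG_DEFAULT behavior described above. It is not the actual ConvertDefaultLocale logic, and the function names are hypothetical; it only relies on the language identifier layout (primary language in the low 10 bits, sublanguage in the next 6) with SUBLANG_NEUTRAL being 0 and SUBLANG_DEFAULT being 1, and it ignores the sort ID bits of a full LCID.

```c
/* Primary language: the low 10 bits of a language identifier. */
unsigned primary_lang(unsigned langid)
{
    return langid & 0x3ffu;
}

/* Sublanguage: the 6 bits above the primary language. */
unsigned sub_lang(unsigned langid)
{
    return (langid >> 10) & 0x3fu;
}

/* Mimics the older behavior: a neutral identifier (sublanguage 0,
 * SUBLANG_NEUTRAL) is treated as if SUBLANG_DEFAULT (1) were attached.
 * Anything that already has a sublanguage is returned unchanged. */
unsigned apply_default_sublang(unsigned langid)
{
    if (sub_lang(langid) == 0u) {
        return (1u << 10) | primary_lang(langid);
    }
    return langid;
}
```

    Passing in 0x0004 yields 0x0404, which is why the customer saw code page 950 for the "Simplified" neutral, while 0x0c04 passes through untouched.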

    Now for the other half of the question, let's use the CultureAndRegionInfoBuilder's secret LOCALESIGNATURE parsing with a simple bit of code like the following:

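    using System;
    using System.Globalization;

    namespace LDML {
        class LDML {
            // Usage: ldml.exe <culture name>, for example: ldml.exe zh-HK
            [STAThread]
            static void Main(string[] args) {
                // Create a replacement culture named after the existing one.
                CultureAndRegionInfoBuilder carib = new CultureAndRegionInfoBuilder(args[0], CultureAndRegionModifiers.Replacement);
                // Load the existing culture data (ignoring user overrides) and region data.
                carib.LoadDataFromCultureInfo(new CultureInfo(args[0], false));
                carib.LoadDataFromRegionInfo(new RegionInfo(args[0]));
                // Write everything out as LDML, e.g. zh-HK.ldml.
                carib.Save(args[0] + ".ldml");
            }
        }
    }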

    Then, after you save this as ldml.cs and compile it:

    csc ldml.cs /r:sysglobl.dll

    then you can save out the LDML that zh-HK (0x0c04) uses, and look at the markup afterward:

          <msLocale:fontSignature>
            <msLocale:unicodeRanges>
              <msLocale:range type="0" />
              <msLocale:range type="1" />
              <msLocale:range type="2" />
              <msLocale:range type="3" />
              <msLocale:range type="5" />
              <msLocale:range type="7" />
              <msLocale:range type="9" />
              <msLocale:range type="31" />
              <msLocale:range type="35" />
              <msLocale:range type="36" />
              <msLocale:range type="37" />
              <msLocale:range type="38" />
              <msLocale:range type="39" />
              <msLocale:range type="42" />
              <msLocale:range type="43" />
              <msLocale:range type="45" />
              <msLocale:range type="46" />
              <msLocale:range type="48" />
              <msLocale:range type="49" />
              <msLocale:range type="50" />
              <msLocale:range type="51" />
              <msLocale:range type="54" />
              <msLocale:range type="59" />
              <msLocale:range type="60" />
              <msLocale:range type="68" />
            </msLocale:unicodeRanges>
            <msLocale:defaultCodePages>
              <msLocale:ansiCodePage />
              <msLocale:ansiOemCodePage>
                <msLocale:codePage type="936" />
              </msLocale:ansiOemCodePage>
              <msLocale:oemCodePage />
            </msLocale:defaultCodePages>
            <msLocale:codePages>
              <msLocale:ansiCodePage />
              <msLocale:ansiOemCodePage>
                <msLocale:codePage type="936" />
              </msLocale:ansiOemCodePage>
              <msLocale:oemCodePage />
            </msLocale:codePages>
          </msLocale:fontSignature>

    Note especially the code page that the LOCALESIGNATURE has (the 936 in the msLocale:codePage elements above). So technically it is not GDI's fault for returning the wrong information, since it is simply relying on NLS.

    Though technically you cannot blame NLS either, since all of the LOCALESIGNATURE data is provided to us by the typography team. Eventually we may want an update from them on this so that GetLocaleInfo can return consistent results between LOCALE_IDEFAULTANSICODEPAGE, LOCALE_IDEFAULTCODEPAGE, and LOCALE_FONTSIGNATURE....

    How to fix it is an interesting question, though.

    In theory, the increased usage of Simplified Chinese in Hong Kong in recent years would make it interesting to change the LOCALESIGNATURE's default code page to be 950, but to list both 936 and 950 in the code page section. And from a descriptive standpoint that might make a lot of sense.

    In practice, however, the LOCALESIGNATURE's raison d'être is to provide information for creating a sensible default font to use for the locale. And generally speaking one does not have fonts that would support both -- the fact that either one may be used does not take into account that it really is an either/or proposition. So the best fix is likely just to make the LOCALESIGNATURE match the locale it sits in....

     

    This post brought to you by ញ (U+1789, KHMER LETTER NYO)

  • Sorting it all Out

    It has not always been so invariant

    • 0 Comments

    The other day, Bill asked the following question:

    Anyway, I have the following code:

        SYSTEMTIME st;
        memset(&st, 0, sizeof(SYSTEMTIME));
        st.wYear = 2005;
        st.wMonth = 9;
        st.wDay = 12;
        st.wHour = 12;
        st.wMinute = 30;
        DATE date;
        SystemTimeToVariantTime(&st, &date);

        VARIANT myVar;
        VariantInit(&myVar);
        myVar.vt = VT_DATE;
        myVar.date = date;
        VARIANT myStr;
        VariantInit(&myStr);

        HRESULT hr = ::VariantChangeTypeEx(&myStr, &myVar, LOCALE_INVARIANT, 0, VT_BSTR);
        printf("%S\n",myStr.bstrVal);


    on Win2K, this will print:

        123000

    on WinXP, this will print:

        09/12/2005 12:30:00

    So my customer asks: "Is this a known issue?  Is there a work around that can be suggested?"

    If you have read this post (which gives the exact date that the change to support the invariant locale was added to the Windows source tree), then you know the answer already.

    In a nutshell, the invariant locale didn't exist until after Windows 2000 shipped.

    So the behavior that is seen here is what you get when an unsupported LCID is passed (we have previously established that no one checks return values!).

    For workarounds, picking any specific LCID here will do the trick. As long as you use a particular one consistently, the results will be consistent....

     

    This post brought to you by ዙ (U+12d9, a.k.a. ETHIOPIC SYLLABLE ZU)

  • Sorting it all Out

    About the Fonts folder in Windows, Part 3 (aka What changes in Vista?)

    • 27 Comments

    Previous posts in this series:

    This time, I will be just quickly talking about the changes in Vista. Quick, because not very much has changed....

    One thing that has not changed is that dialog for adding fonts that I talked about back in Part 2 of the series. Sorry folks, I know people have been wanting this one to go away. It won't be going away for Vista, though.

    Another thing that has not changed much is the typical way people use to install and remove fonts -- dragging them in and out of the Fonts folder. Although, since Administrative permissions are still required to install fonts into the Fonts folder, the addition of the UAC feature to Vista will change the experience for some people. I mean, since even an Admin is not really an Admin anymore unless they okay the elevation.

    Which gets us to something that has changed -- copying files to the Fonts folder and then opening the folder in an Explorer window, one of the weirdest ways to install a font programmatically that I could ever imagine, will no longer work in Vista. As a feature, it never worked all that well anyway. Hopefully people won't miss it too much; if you do, I'd love to know what you were doing with it....

    Perhaps one of the biggest changes for fonts in Vista is that you no longer need to specially install other language fonts via checkboxes in Regional and Language Options. All languages are installed automatically, which is a wonderful thing for almost everybody (though there is a small group of people who are unhappy with the huge font list). I look forward to an update to the ChooseFont dialog in a future version that manages the huge font list a little bit better.

    Otherwise, it is business as usual for the Fonts folder, in Vista.

    I'll be talking about Unicode version support of fonts in a future post....

     

    This post brought to you by F (U+0046, a.k.a. LATIN CAPITAL LETTER F)

  • Sorting it all Out

    I must admit that an example would be nice

    • 7 Comments

    Regular reader and Internationalization MVP Mihai asked in the Suggestion Box:

    Just read the today post ("Creation of transliterating input methods")

    And I do agree with Thakara that the TSF documentation is very poor.
    If TSF is soo alive and well in Vista, and seems to be "the way to go," then maybe is time for a better doc.

    I know, updating the MSDN can take long, but maybe you can shot some articles on the subject, to get us started?

    The post Mihai is talking about is this one....

    Well, I can't say I disagree with the idea; I said as much in this post. Though as I also said there, it will take some time to really dig in there.

    But on the other hand, there are those other samples I have been posting for the Cantonese and Unicode IMEs -- which are also Text Service Framework TIPs. And I will be posting more samples of them, updates to those samples, and information about the format (and how it worked out to help for Yi and Amharic in Vista!).

    So hang in there, I will be doing more posts soon in this area. :-)

     

    This post brought to you by ഇ (U+0d07, a.k.a. Malayalam Letter I)

  • Sorting it all Out

    The myth of cross-product compatibility

    • 0 Comments

    Editorial note: there will be a certain type of Drunkard's Walk feel to this post, but that is because the navigation is actually controlled by a specific customer's attempt to understand behavior in SQL Server. The timeline will be a little abbreviated, but I'll try and hit all of the high points....

    The other day, when I was talking about Decimal vs. hexadecimal LCIDs, backcompat, and being weird, I made a statement about the irony of finding a bug that could have been found in many different COLLATIONS supported in SQL Server in a Unicode-only locale and a binary sort -- one of the specific collations that has nothing to add to either collation or code page behavior.

    In fairness, I should take a bit of that back. The truth is that Unicode only, binary collations have one thing to add to the mix that other collations do not.

    They can add a bug that adds inconsistency, makes SQLCLR integration a bit harder, and is very poorly documented, to boot!

    To see the problem, let's take the following query and see which collations we are talking about (based on all the lessons we learned from this post and comments thereof!):

    SELECT
        name,
        COLLATIONPROPERTY(name, 'CodePage') as CodePage,
        CONVERT(binary(4), COLLATIONPROPERTY(name, 'LCID')) as LCID,
        CONVERT(binary(4), COLLATIONPROPERTY(name, 'ComparisonStyle')) as ComparisonStyle,
        description
    FROM
        ::fn_helpcollations()
    WHERE
        COLLATIONPROPERTY(name, 'CodePage') = 0 AND
        name LIKE '%_BIN'

    This will return just three rows:

    Divehi_90_BIN           0 0x00000465 0x00000000 Divehi-90, binary sort
    Indic_General_90_BIN    0 0x00000439 0x00000000 Indic-General-90, binary sort
    Syriac_90_BIN           0 0x0000045A 0x00000000 Syriac-90, binary sort

    Hmmm... where is the Georgian? I mean, Georgian is a Unicode-only locale!

    Ok, we will try a slightly different query:

    SELECT
        name,
        COLLATIONPROPERTY(name, 'CodePage') as CodePage,
        CONVERT(binary(4), COLLATIONPROPERTY(name, 'LCID')) as LCID,
        CONVERT(binary(4), COLLATIONPROPERTY(name, 'ComparisonStyle')) as ComparisonStyle,
        description
    FROM
        ::fn_helpcollations()
    WHERE 
        name LIKE 'Georgian%' AND
        name LIKE '%_BIN'

    and run it. We get back just one row:

    Georgian_Modern_Sort_BIN 1252 0x00010437 0x00000000 Georgian-Modern-Sort, binary sort

    It thinks the code page is 1252? That is nothing like CultureInfo("ka-GE").TextInfo.ANSICodePage, at all! Somehow SQL Server has made Georgian a locale with a system code page that does not support any of the Georgian characters that are needed in the language.

    So we already know that the sorting doesn't match -- now we know sometimes the codepages don't match, either?

    Let's table that for a moment; we'll come back to it with the explanation.
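    As an aside on the LCID column in those result rows: an LCID packs a 16-bit language identifier in its low word and a sort identifier in the next four bits, which is why the Georgian modern sort shows up as 0x00010437 rather than 0x00000437. The sketch below mirrors the bit layout of the Win32 LANGIDFROMLCID, SORTIDFROMLCID, PRIMARYLANGID and SUBLANGID macros; the helper name split_lcid is made up for this illustration and is not part of any API.

```python
def split_lcid(lcid: int) -> dict:
    """Split a 32-bit LCID into its parts (hypothetical helper, mirrors the Win32 macros)."""
    lang_id = lcid & 0xFFFF          # LANGIDFROMLCID: low 16 bits
    sort_id = (lcid >> 16) & 0xF     # SORTIDFROMLCID: next 4 bits
    return {
        "lang_id": lang_id,
        "sort_id": sort_id,
        "primary_lang": lang_id & 0x3FF,  # PRIMARYLANGID: low 10 bits of the language id
        "sub_lang": lang_id >> 10,        # SUBLANGID: high 6 bits of the language id
    }


# LCID values copied from the query results above.
for name, lcid in [
    ("Divehi_90_BIN", 0x00000465),
    ("Georgian_Modern_Sort_BIN", 0x00010437),
]:
    parts = split_lcid(lcid)
    print(f"{name}: language id 0x{parts['lang_id']:04X}, sort id {parts['sort_id']}")
```

    A non-zero sort id marks an alternate sort for the same language, which is the only thing that distinguishes the modern Georgian collation's LCID from the plain language identifier.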

    It gets worse. That code page value of 0? That happens to be the number behind CP_ACP, the default system code page. Let's try that out with the following script:

    use master
    IF DB_ID (N'sql_test') IS NOT NULL
        DROP DATABASE sql_test;
    GO

    CREATE DATABASE sql_test;
    GO

    use sql_test

    CREATE TABLE sql_cp (
        colD nvarchar (50) COLLATE Divehi_90_BIN NULL,
        colA nvarchar (50) COLLATE Arabic_BIN NULL
    )

    GO

    use sql_test
    INSERT INTO
        sql_cp (colD, colA)
    VALUES (
        'ابةتثجحخدذرزسشصضطظعغ',
        'ابةتثجحخدذرزسشصضطظعغ')

    SELECT * FROM sql_cp;

    If you really did pass 0 to MultiByteToWideChar on a system with an Arabic system code page, it would convert the string via cp1256. What does the query return?

    ????????????????????    ابةتثجحخدذرزسشصضطظعغ

    Hmmm.... it must have changed the meaning of 0 to mean the default database codepage. Which in this case is based on the server collation, which happens to be one of the Latin1_General collations.
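    For a rough feel of why the first column comes back as question marks, here is a small sketch that pushes the same string through two Windows code pages. It is only an analogy, assuming the conversion behaves like an ordinary lossy code page round trip; it says nothing about what SQL Server actually does internally, and through_code_page is a name invented for this example.

```python
text = "ابةتثجحخدذرزسشصضطظعغ"


def through_code_page(s: str, code_page: str) -> str:
    """Round-trip a string through a legacy code page, replacing unmappable characters."""
    # errors="replace" substitutes "?" for any character the code page cannot encode.
    return s.encode(code_page, errors="replace").decode(code_page)


print(through_code_page(text, "cp1252"))  # Latin code page: no Arabic letters available
print(through_code_page(text, "cp1256"))  # Arabic code page: every letter survives
```

    The Latin code page has no slots for Arabic letters, so every character is replaced, while the Arabic code page carries the text through unchanged.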

    Ok, we'll just call that one unexpected rather than a bug. But the fact that Georgian is no longer a Unicode only locale when both Windows and the .NET Framework think it is remains a problem.

    What about other Unicode only locales like Armenian? Well, they are slightly worse off; since they have no collations of their own (no code page requirement and no unique sort beyond the default table means no unique collation in SQL Server), they have lost their identity, in a way.

    And finally we know why this bug in Georgian exists -- because the Traditional Georgian sort is in the default table (along with lots of other Unicode only locales) and since it has the CodePage of 1252, the modern sort has to as well.

    But this means that a whole lot of SQL Server collations do not match their analogous cultures in the .NET Framework or locales in Windows. And that is even before we get into the whole rogue version of the sorting tables issue I've mentioned in the past, or the newer Danish/Norwegian problems or the potential upcoming problem with Swedish/Finnish.

    This is hardly the only reason for the changes that happened with WinFS; if you will notice, it was not really mentioned at all. But these differences had a lot to do with the integration challenges, which had still not been fully worked out at the point that the project's energies were redirected.

    It is a problem that we in GIFT had to deal with ourselves, with a managed child that was engineered to try to fix old problems but which later had to be integrated back with its unmanaged parent for custom locales to work. And we did not even have the excuse of different teams, since it was often the very same people. But we decided to dig in and solve the problems -- because that integration is so crucial.

    What the next version of SQL Server needs here is a fix to the whole problem. A way to consistently represent the locale, codepage, and collation data that it consumes from Windows and then later on tries to use in concert with the operating system and the .NET Framework....

     

    This post brought to you by ྊ (U+0f8a, a.k.a. TIBETAN SIGN GRU CAN RGYINGS)

  • Sorting it all Out

    Shortcuts can be so Goth, you know?

    • 3 Comments

    Cameron mentioned to me via the Contacting Michael... link:

    In your blog posting "Getting rid of your extra yen" on 2005/12/28, someone made a comment that even after the locale was changed from Japanese back to English, "MS Gothic" (rather than Lucida Console) remained in the list of available fonts for the Command Prompt. Today, I ran into exactly the same problem/behavior that Nicholas Allen (the commenter) was describing. Then I realized that any shortcuts made to cmd.exe while the OS was in Japanese locale retained the MS Gothic font whereas cmd.exe itself always displayed the list of fonts applicable to the current locale. So there appears to be something sticky about the codepage that gets recorded in the Shortcuts to cmd.exe. Thought you would like to know.

    Thanks,

    Cameron

    The post Cameron is referring to is here. :-)

    This seemed like a worthwhile tip to share with people here, as it isn't terribly obvious that this is the case and I don't think I've ever seen it in the documentation....

     

    This post brought to you by "¥" and "₩" (U+00a5 and U+20a9, a.k.a. YEN SIGN and WON SIGN)

  • Sorting it all Out

    The Seattle Times plays Quechua-up, or 'How soon is now?'

    • 1 Comments

    Some may recall when I posted about Quechua me if you can! just this past June, which included a link to the download for the Quechua LIP.

    Well, the Seattle Times managed to break the news yesterday in their article Microsoft launches software in Andean language and pointed out that it was released in Peru in June and is "now available for download".

    So, apologies to The Smiths and all, but How soon is now? :-)

     

    This post brought to you by Q (U+0051, a.k.a. LATIN CAPITAL LETTER Q)

  • Sorting it all Out

    It's not my imagination; that function is ignoring me!

    • 5 Comments

    There is an alias at Microsoft that is the front line for the Shell team, the one place where the important issues like build breaks and blocking issues get brought up. Since even within Microsoft so many people assume that Windows == the Shell, this alias gets to triage and appropriately redirect issues a lot, too. Raymond talked about this list late last year in this post.

    And I belong to that list, so that every once in a while when someone complains about a Shell bug that is actually an internationalization bug, I can pitch in (though I am not on the official list to become a Dev O'Day or anything, it is always nice to help out if one has an area that one knows about!).

    Also, every once in a while something interesting to blog about comes up.... :-)

    Take the other day, for example. A problem was reported in Vista, with the SHLWAPI SHSetLocalizedName function, which sets the localized name of a folder. Basically it lets you set things up so that the Shell can call the SHLoadIndirectString function (which I have mentioned before).

    Anyway, the problem was that sometimes it wasn't working. The function was reporting back success (S_OK), but the name was not being set. However, if the folder was made read-only, then the localized name would appear (this post by Raymond explains how the setting actually has nothing to do with the folder being read-only, and why the setting would have an effect).

    On the list it was quickly determined that the change of the attribute worked from an elevated command prompt and failed from a non-elevated one, and was thus quickly determined to be a permissions issue. The caller lacked the permissions to completely succeed in making the call to SHSetLocalizedName.

    Of course the problem remained -- why was SHSetLocalizedName claiming success when it had technically failed?

    I asked Raymond about this, hoping for understanding of some higher purpose behind it all. He quickly set me straight:

    No, it's just a bug.

    I mean, who would be so crazy as to make a directory that you could create and modify files in but which you couldn't attrib +R!

    At the point this error occurs SHSetLocalizedName is kind of in a fix.

    It already made half of the change and got an error making the other half.

    Now what do you do?

    You can't undo the first half because your change to the first half overwrote what used to be there (if the file already had a different localized name).

    Sure, you could remember what used to be there before you overwrote it so you could restore it afterwards, but what if you encounter an error trying to restore it!?

    (or worse, what if somebody else also changed it in the meantime -- now your "restoration" is actually undoing somebody else's change)

    Basically, SHSetLocalizedName got itself into a jam and does its best to crawl out from under the rubble.

    One could even argue that it set the localized name just fine -- but like any situation where there is a setting and then a way to enable it, it simply hasn't been enabled yet. :-)

    Technically, it seems that SHSetLocalizedName is doing its level best in the face of adversity, so perhaps there is a higher purpose here....

    Anyway, we chatted about blogging about this issue. It seems worth doing since setting folder attributes is a pretty public process, and SHSetLocalizedName is a pretty public function. So anyone could run into this very same problem, depending on how the folder was created and where.

    And people really should be using the function to get the job done -- manually mucking about in desktop.ini or the cache in the registry or in file attributes in an NTFS stream or the folder's aura or its astrological sign or whatever other sneaky, undocumented method developers find to do work here can break anytime the underlying implementation of the functionality changes.

    For the record, such a change did happen in Vista, which kind of proves the point, right?

    Anyway, to answer the paranoia implicit in the title of the post -- the function isn't ignoring me, it is simply not pointing out the flaw in my approach when I didn't do everything right on my side. And the Shell is your friend -- what kind of friend is going to nitpick every little detail of a non-atomic operation? :-)

    And now you can prove your worth to the Shell by calling the function to get the work done -- what kind of friend is going to ignore the polite ways to communicate and instead rifle through the drawers while your friend isn't looking?

     

    This post brought to you by ऑ (U+0911, a.k.a. DEVANAGARI LETTER CANDRA O)

  • Sorting it all Out

    Everybody's working for the weekend

    • 1 Comments

    Peter asks:

    My customer is programming with .NET.

    Now he wanted to know if there is a way to determine if a given day in a Calendar is a weekend based on certain culture.

    It is “Friday/Saturday” in the Middle East, “Saturday/Sunday” in Europe/Americas and probably something else in somewhere else.

    We are trying to find a method similar as IsWeekend(Date d).

    Based on my research, this code below will get the first day of week based on the culture.

        CultureInfo myCI = new CultureInfo("en-US");
        MessageBox.Show(myCI.DateTimeFormat.FirstDayOfWeek.ToString());

    Is there a similar method for the information about weekend?

    Thanks!

    Hmmm. Not exactly the same question as the one asked in How many days in a weekend?, but close. And I did hypothesize back in that post from almost a year ago that the weekend was always in LOCALE_SDAYNAME6 and LOCALE_SDAYNAME7. But several people have pointed out that it ain't always so (there are indeed places that consider Sunday to be the first day of the week but Saturday/Sunday to be the weekend).

    So there really is no way in Windows or the .NET Framework to find out when the weekend is, per locale/culture.
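    Since the platform does not carry this data, anyone who needs an IsWeekend-style check has to bring their own table. A minimal sketch of that idea follows, written in Python rather than .NET purely for illustration. The table contents, the culture names used as keys, the Saturday/Sunday fallback and the is_weekend helper are all assumptions made up for this example; real values would have to be researched and maintained by the caller.

```python
from datetime import date

# Python's date.weekday() numbers days Monday == 0 through Sunday == 6.
FRIDAY, SATURDAY, SUNDAY = 4, 5, 6

# HYPOTHETICAL, hand-maintained data: neither Windows nor the .NET Framework
# supplies it, so these entries are illustrative only and must be verified.
WEEKEND_DAYS = {
    "en-US": {SATURDAY, SUNDAY},
    "ar-EG": {FRIDAY, SATURDAY},
}

# Assumed fallback for cultures missing from the table.
DEFAULT_WEEKEND = {SATURDAY, SUNDAY}


def is_weekend(d: date, culture: str) -> bool:
    """Return True if the date falls on a weekend day for the given culture name."""
    return d.weekday() in WEEKEND_DAYS.get(culture, DEFAULT_WEEKEND)


# 4 August 2006 was a Friday.
print(is_weekend(date(2006, 8, 4), "ar-EG"))
print(is_weekend(date(2006, 8, 4), "en-US"))
```

    Keeping the weekend data separate from FirstDayOfWeek avoids the trap described above, where the first day of the week says nothing reliable about which days are the weekend.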

    What do you expect from a company with developers who think that working a half day means putting in 12 hours? :-)

     

    This post brought to you by ⓦ (U+24e6, a.k.a. CIRCLED LATIN SMALL LETTER W)
