April, 2007

Sorting it all Out
Michael Kaplan's random stuff of dubious value

    Giving Yi the weight it deserves

    Bob Eaton asked over in the microsoft.public.win32.programmer.international newsgroup:

    I'm trying to compare two Yi Unicode-encoded strings using lstrcmp (which
    underlyingly uses CompareString).

    But when I compare the two strings, lstrcmp always returns 0 as if they are
    equal, even if they are not (e.g. (ua2d9) (ua2e0) vs. (ua2da) (ua2e0)).

    I presume it's because the OS (XP Pro SP2) doesn't have collation tables for
    Yi, but I thought I'd check if there was a work-around.

    Bob

    Sorry, Yi has no weight in any version of Windows prior to the latest one....

    The only workarounds here are:

    • use strcmp (which does a binary comparison), or
    • use Vista (which gives all of Unicode 5.0 weight)
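
    For the first workaround, here is a rough sketch in C of what a binary comparison looks like for the two strings in Bob's question. Since the strings are Unicode, the sketch uses wcscmp, the wide-character counterpart of strcmp. The helper name compare_ordinal and the two array names are made up for illustration, and the code points are written as numeric values so the source file's encoding does not matter:

```c
#include <assert.h>
#include <wchar.h>

/* Binary (ordinal) comparison: looks only at the numeric value of each
   wchar_t and never consults a collation table.
   Returns <0, 0, or >0, like strcmp. */
int compare_ordinal(const wchar_t *a, const wchar_t *b)
{
    assert(a != NULL && b != NULL);
    return wcscmp(a, b);
}

/* The two Yi strings from the question: U+A2D9 U+A2E0 and U+A2DA U+A2E0. */
const wchar_t yi_first[]  = { 0xA2D9, 0xA2E0, 0 };
const wchar_t yi_second[] = { 0xA2DA, 0xA2E0, 0 };
```

    The result is ordered by numeric value rather than by anything linguistic, so it tells the two strings apart but makes no claim about how Yi ought to sort.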

     

    This post brought to you by (U+a2d9, a.k.a. YI SYLLABLE ZZAX)

    When moving from XslTransform to XslCompiledTransform...

    Marco asked:

    Given a XSLT stylesheet with the following element:

            <xsl:output method="html" encoding="Windows-1252"/>

    NET Framework 1.1 System.Xml.Xsl.XslTransform.Transform method outputs:

            <META HTTP-EQUIV="Content-Type" Content="text/html; charset=Windows-1252">

    Using NET Framework 2.0 System.Xml.Xsl.XslCompiledTransform.Transform method outputs:

            <META http-equiv="Content-Type" content="text/html; charset=utf-8">

    I need the 1.1 output but using NET 2.0,

    How can I do that?

    Thank you very much in advance for any help,

    Luckily Anton was ready with the answer:

    If you pass a TextWriter/XmlWriter to the Transform() method, the encoding specified in the xsl:output element is ignored, and the encoding of that TextWriter/XmlWriter is used instead.  To respect the xsl:output encoding setting, you need to pass to the Transform() method either a Stream or a TextWriter/XmlWriter created with the desired encoding.  For example,

        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load("MyStylesheet.xsl");
        xslt.Transform("MyDocument.xml", XmlWriter.Create(myStream, xslt.OutputSettings));

    This seems quite sensible, though this information does not seem to be captured in the migration information under either the XslCompiledTransform.Transform or XslTransform.Transform methods (e.g. it is not in the Migrating From the XslTransform Class topic).

    Which is why I thought I'd mention it here.... :-)

     

    This post brought to you by (U+0e5b, a.k.a. THAI CHARACTER KHOMUT)

    A way better model for features, part 3 (a.k.a. downlevel LoadMUILibrary)

    You may have seen part 1 and part 2 of this series. Here we are with part 3!

    Pavel S. Tsarevskiy asked last month over in the microsoft.public.win32.programmer.international newsgroup:

    I have a problem with using LoadMUILibrary() function and MUI technology under Windows XP.

    I tried to use Vista  MUI technology which is supposed to work under previous version of Microsoft OS.

    I want to solve the problems with  localization my application.

    I want to make one project which contain neutral resources for example icons, bitmaps, etc. And projects which contain language specified resources for each language. So, The main benefit, we can use LoadMUILibrary which make HMODULE which contain handle to both neutral and language specified resources. Then we change default resource handle using AfxSetResourcesHandle(). There're no resources in exe module. And therefore it's not necessary to indicate HMODULE for some load resource functions which load neutral resources. Otherwise, it's not necessary to duplicate all neutral resources in each language dll library. So, for all load functions resource handles would be clear.

    But, when I did test version of my application I saw that It's working under Windows Vista, but under Windows XP, HModule didn't indicate neutral resources, so it's impossible to load both neutral and language specified resources using one HModule under WinXP.

    Does anybody know about it?

    Microsoft says that this technology is working under XP.

    Pavel is correct here -- the LoadMUILibrary documentation does imply that it will properly redirect as needed to pick up resources in either the language neutral or appropriate language specific directory.

    Unfortunately, in downlevel situations, it does not provide an HINSTANCE that will magically find the resource when it is used in other, later resource function calls. In the words of MUI tester Mike McAdams:

    I agree with the person's post.  You can not expect to get at both the LN and LD using LoadMUILibrary on down-level (meaning pre-Vista).  You can however get at resources in both binaries using that API on Vista.  ...this is because of the resource redirection being done by the RL [Resource Loader].

    This particular "by design" is one that I consider to be rather unfortunate -- there is very little point (in my opinion) to doing all the work to provide a function downlevel that nevertheless behaves differently between the >= Vista and the downlevel case.

    The goal here was clearly to make sure that no special case code is needed, but if you want to use LoadMUILibrary downlevel, you end up having to use it differently....

    Given that, there is very little point to including the function downlevel!

    In my opinion, they should have found a way in the downlevel case and done the extra work here to make sure that DLLs loaded via LoadMUILibrary can behave the same way they do in Vista. Like by hooking the low level Resource Loader functions and redirecting as needed.

    Of course, only a team that owns the resource loader would really have a fair chance to accomplish such a thing, but as luck would have it, the MUI team owns the Resource Loader. :-)

    I think they should try to address this -- in an effort to make the suggested "best practices" more accessible to developers.

    It would be a much better model for supporting the planned feature, in any case!

     

    This post brought to you by (U+0b0a, a.k.a. ORIYA LETTER UU)

    _wsetlocale doesn't support Unicode-only locales

    There are many people who are writing code in C/C++ who try to use the more portable CRT functions rather than the Win32 ones.

    Since in many cases the CRT functions actually wrap the Win32 functions when you run on Windows anyway, this approach allows for more portable code if you do ever want to run on other platforms.

    This does not always work, though.

    I was talking to Ale Contenti over on the C++ team just the other day, and one of the things we talked about was the fact that the locale support sitting underneath setlocale and _wsetlocale is not stored in a Unicode encoding. This is why, even though there is a _wsetlocale function that clearly provides a Unicode interface to the locales, the locales that are in fact set will not be Unicode.

    This is also why the language strings that _wsetlocale supports do not include any of the Unicode-only languages, and why, if you load them anyway via their three-letter Windows code, you can expect a lot of question marks in your future when you try to use the data that the locale provides....

    In the end, the interface is not the most important part -- the underlying support is. So for now you'll have to stick to Win32 if you want all of the locales that the OS can support....

     

    This post brought to you by (U+0d0f, a.k.a. MALAYALAM LETTER EE)

    What type to use for code page values

    I was asked via the Contact link:

    Why does CharPrevExA/CharNextExA take a WORD for code page, whereas MultiByteToWideChar takes a UINT? Which type should I use to store code pages?

    It is true that CharNextExA and CharPrevExA take WORD values for code pages, while MultiByteToWideChar and WideCharToMultiByte take UINT values.

    But then as I pointed out in Is CharNextExA broken?, those two functions that take WORD values don't handle code pages greater than a WORD even though such code pages exist.

    So the functions named are able to accept all of the code page values that they can respectively handle.

    To answer the question about what type to use to store code pages, the same rule could be followed -- use the type that is able to handle all of the code pages that your code can handle....
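
    One way to apply that rule, sketched in C under some assumptions: the code stores code pages in a 32-bit type (the same width as the UINT that MultiByteToWideChar takes) and narrows to 16 bits only at the point of calling a function that takes a WORD. The names code_page_t and code_page_to_word are hypothetical, and the fixed-width standard types stand in for UINT and WORD so the sketch compiles without windows.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical storage type: 32 bits wide, like the UINT parameter of
   MultiByteToWideChar and WideCharToMultiByte. */
typedef uint32_t code_page_t;

/* Narrow a stored code page to the 16-bit width that CharNextExA and
   CharPrevExA accept. Returns false, leaving *out untouched, when the
   value does not fit, instead of silently truncating it. */
bool code_page_to_word(code_page_t cp, uint16_t *out)
{
    assert(out != NULL);
    if (cp > UINT16_MAX) {
        return false;
    }
    *out = (uint16_t)cp;
    return true;
}
```

    A caller would check the return value and take some other path when the value does not fit, rather than passing a truncated code page along.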

     

    This post brought to you by (U+1654, a.k.a. CANADIAN SYLLABICS CARRIER SHU)

    Testing MSLU

    I had someone ask me the other day what kind of testing was done for MSLU.

    I told him that the whole project was an interesting example of how the places you expect to find bugs are fine while the places you think are fine have many bugs!

    As a starting point, I actually took lots of the actual tests run on NT on all of the functions that MSLU covered, compiled them to use MSLU, and ran them on Win9x with MSLU on the machine. This found a few bugs in the beginning, though most of them related to determining the ideal settings for unicows.lib/unicows.dll integration, which ultimately helped define the instructions. There were very few bugs found through those tests.

    From there I downloaded all of the VC++ samples I could get my hands on like CTRLTEST and HELLO and HELLOAPP and MDIDOCVW and OCLIENT and like about sixty others, and converted them all to integrate with MSLU. This was an educational experience for me since I was always much more of a command line builder than someone to use Visual Studio, and I ended up having good instructions that worked with all of the samples I tried -- including features like function overrides and even MSLU loader overrides (required for MFC applications).

    Now with these samples I did find some bugs (surprisingly enough to me more than the actual tests), but again, most of what was learned related to integration, not bugs in actual functions.

    I was given ten machines to set up a small test lab in. After discussion with Julie and others about the fact that we were dealing with three different versions of Windows with three major flavors (SBCS, DBCS, RTL), the machines were set up as follows:

    • English Windows 98 (the first machine I was running all the tests on, later switched to Russian) 
    • German Windows 95 (later switched to Greek)
    • Italian Windows 98 (later switched to Turkish, still later switched to Thai)
    • French Windows Me (later switched to Polish)
    • Japanese Windows 95 (switched to Korean halfway through)
    • Korean Windows 98 (switched to Simplified Chinese halfway through)
    • Traditional Chinese Windows Me (switched to Japanese halfway through)
    • Arabic Windows 95
    • Hebrew Windows 98
    • Hebrew Windows Me (The Arabic failed with an error I couldn't read)

    I was given the lab as my office which was amusing since the A/C was broken in some kind of "always on" way that required me to wear a winter coat when I was in there. Any time I had to talk to people they asked me to meet them in their offices due to the ambient temperature. :-)

    The SBCS machines found very few bugs, the DBCS ones found a bunch of bugs in buffer sizes, and the RTL ones found a few GDI bugs early on but then none afterward.

    Other than being able to verify that a few things worked on Win95 vs. Win98 vs. WinMe that had special case code for the platform (to handle functions that existed in the later but had to be faked in the earlier), there was really no case where either Win95 or Win98 were needed -- the whole thing could have been handled with one Win95, one Win98, and eight WinMe machines with good coverage....

    The source for lots of the code that made its way into MSLU came from various Unicode layers all over Microsoft, from the VSAnsi layer to the MSO functions to the Access ones to many many others (something like 32 in all). In the beginning they were of great help; toward the end I was usually sending people email about all of the things that were broken in their layers!

    Starting in the middle of that time and then continuing on past the initial release, I worked with ISVs and internal devs at Microsoft who were building huge projects and who wanted to consider using MSLU -- from an early version of Product Studio to projects being produced by the then-named Crystal Reports that were going into the VS box to several other, similar applications.

    There were even a few cases where existing applications had link re-run on them to make them MSLU-ized (something I never even considered until "linker god" Dan Spalding suggested it), like Paint, Notepad, Wordpad, both NT4 and Win2000 Character Map, and even a few third party apps I had on my machine. I even got permission from Asmus Freytag to run a copy of UniBook compiled MSLU-style, and at his request made some "code review" suggestions as "payment"....

    I can honestly say that this is where the bulk of the bug reports came from -- on real applications. Which actually makes a lot of sense, when you think about it.

    So, would it all have been done differently had we known at the beginning what we learned by the end?

    Probably.

    But we did cover a lot of ground between that first lunch interview with Julie I described here and the time that Win9x as a platform fell out of support (described here). I probably got to learn more about all of these different projects than I ever would have been able to, otherwise.

    Was it a fun project to work on?

    Indubitably!

     

    This post brought to you by (U+ff37, a.k.a. FULLWIDTH LATIN CAPITAL LETTER W)

    When the system locale is the display language

    Björn pointed out a new take on the system locale (a.k.a. the Language for non-Unicode Programs) in Vista, from the help:

    Now compared to the confusion people hit in Windows 2000 and even in Windows XP/Server 2003, it seems nice to see the area covered in help.

    But notice how the most important points conventionally associated with this setting (the default system code page, the GDI font linking chain, and other font behavior) are not even mentioned?

    Now as a rule, any time you have a non-Unicode application, its user interface language will end up matching the default system locale because if it doesn't, the user interface will show all question marks.

    But obviously there are a lot of languages covered by the same code page in many cases, so while it is important to point out this link due to the potential bad application behavior if you try to work outside of that code page, making this side effect the only topic of the article seems a bit like overkill, doesn't it?

    For completeness we can click on that display language link at the bottom of the page:

    So anyway it seems pretty clear that the help is written from an MUI or user interface language point of view, doesn't it? The actual core effects that the default system locale has are not even really mentioned!

    Though to be honest this is a case where decisions are made by those who show up. And although I and the rest of the NLS team did a lot of review of various documentation topics, the review was really on the Platform SDK topics related to the NLS API. So if the MUI folks do a review and mention their principal concerns and the NLS people don't, then what can we expect the documentation's focus to be?

    Anyway, I expect this will get better in the next version so I'm not too worried. Even this counts as progress! :-)

     

    This post brought to you by (U+0ec3, a.k.a. LAO VOWEL SIGN AY -- where Laos means Canada?)

    Characters, now half off!

    Vivek's question over in the microsoft.public.win32.programmer.international newsgroup really confused me:

    What is the sizeof wchar_t / WCHAR on WIN64 platforms? Is it 4 bytes and
    UTF32 or same as WIN32 -- 2 bytes and UTF16?

    I wasn't confused due to not knowing the answer, which is the latter (as David Lowndes pointed out).

    My confusion was why the person asking was expecting that, while the 32-bit platform supports UTF-16, the 64-bit platform would support UTF-32.

    Maybe the thought was that it would be some kind of special on the characters Windows supports -- 50% off? :-)
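
    To make the 2-byte answer a bit more concrete, here is a small sketch in C of what UTF-16 does with a single code point: anything in the Basic Multilingual Plane takes one 16-bit code unit and anything above it takes a surrogate pair, whether the process is 32-bit or 64-bit. The function name utf16_encode is made up for illustration, and uint16_t stands in for WCHAR so the sketch does not depend on the platform's wchar_t:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
   Returns the number of 16-bit code units written to out (1 or 2),
   or 0 if cp is a surrogate value or lies beyond U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    if (cp < 0x10000) {
        /* Basic Multilingual Plane: the code point is the code unit. */
        out[0] = (uint16_t)cp;
        return 1;
    }
    /* Supplementary planes: split the 20 remaining bits across a
       high (lead) and a low (trail) surrogate. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));
    return 2;
}
```

    So the per-character cost varies by character, not by platform bitness.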

     

    This post brought to you by (U+0d87, a.k.a. SINHALA LETTER AEYANNA)

    'The 44' (*not* 'The 4400')

    The 4400 is an interesting television show that this post has nothing to do with.

    This post is about what happens if you run the script from No Regex in the Unicode room! (and no sex in the champagne room, either!) on a Vista machine.

    Basically, you will still get 44 characters with different results between char.IsLetter and the Regex expression:

    regex: False function: True char in hex: 130 - UppercaseLetter
    regex: False function: True char in hex: 1c5 - TitlecaseLetter
    regex: False function: True char in hex: 1c8 - TitlecaseLetter
    regex: False function: True char in hex: 1cb - TitlecaseLetter
    regex: False function: True char in hex: 1f2 - TitlecaseLetter
    regex: False function: True char in hex: 23a - UppercaseLetter
    regex: False function: True char in hex: 23e - UppercaseLetter
    regex: False function: True char in hex: 3d2 - UppercaseLetter
    regex: False function: True char in hex: 3d3 - UppercaseLetter
    regex: False function: True char in hex: 3d4 - UppercaseLetter
    regex: False function: True char in hex: 3f4 - UppercaseLetter
    regex: False function: True char in hex: 1fc3 - LowercaseLetter
    regex: False function: True char in hex: 1fcc - TitlecaseLetter
    regex: False function: True char in hex: 1ff3 - LowercaseLetter
    regex: False function: True char in hex: 1ffc - TitlecaseLetter
    regex: False function: True char in hex: 2102 - UppercaseLetter
    regex: False function: True char in hex: 2107 - UppercaseLetter
    regex: False function: True char in hex: 210b - UppercaseLetter
    regex: False function: True char in hex: 210c - UppercaseLetter
    regex: False function: True char in hex: 210d - UppercaseLetter
    regex: False function: True char in hex: 2110 - UppercaseLetter
    regex: False function: True char in hex: 2111 - UppercaseLetter
    regex: False function: True char in hex: 2112 - UppercaseLetter
    regex: False function: True char in hex: 2115 - UppercaseLetter
    regex: False function: True char in hex: 2119 - UppercaseLetter
    regex: False function: True char in hex: 211a - UppercaseLetter
    regex: False function: True char in hex: 211b - UppercaseLetter
    regex: False function: True char in hex: 211c - UppercaseLetter
    regex: False function: True char in hex: 211d - UppercaseLetter
    regex: False function: True char in hex: 2124 - UppercaseLetter
    regex: False function: True char in hex: 2126 - UppercaseLetter
    regex: False function: True char in hex: 2128 - UppercaseLetter
    regex: False function: True char in hex: 212a - UppercaseLetter
    regex: False function: True char in hex: 212b - UppercaseLetter
    regex: False function: True char in hex: 212c - UppercaseLetter
    regex: False function: True char in hex: 212d - UppercaseLetter
    regex: False function: True char in hex: 2130 - UppercaseLetter
    regex: False function: True char in hex: 2131 - UppercaseLetter
    regex: False function: True char in hex: 2133 - UppercaseLetter
    regex: False function: True char in hex: 213e - UppercaseLetter
    regex: False function: True char in hex: 213f - UppercaseLetter
    regex: False function: True char in hex: 2145 - UppercaseLetter
    regex: False function: True char in hex: 2c65 - LowercaseLetter
    regex: False function: True char in hex: 2c66 - LowercaseLetter
    TOTAL mismatches: 44

    The remaining characters make up an interesting bunch that give insight into the specific flaws of certain Regex operations:

    • U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) -- no lowercase form in the invariant table, only one in Turkish
    • U+01c5 (LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON) -- no lowercase form in the invariant table
    • U+01c8 (LATIN CAPITAL LETTER L WITH SMALL LETTER J) -- no lowercase form in the invariant table
    • U+01cb (LATIN CAPITAL LETTER N WITH SMALL LETTER J) -- no lowercase form in the invariant table
    • U+01f2 (LATIN CAPITAL LETTER D WITH SMALL LETTER Z) -- no lowercase form in the invariant table
    • U+023a (LATIN CAPITAL LETTER A WITH STROKE) -- no idea why this one fails, there is a lowercase form (U+2c65)
    • U+023e (LATIN CAPITAL LETTER T WITH DIAGONAL STROKE) -- no idea why this one fails, there is a lowercase form (U+2c66)
    • U+03d2 (GREEK UPSILON WITH HOOK SYMBOL) -- this is a symbol; no lowercase form in the invariant table
    • U+03d3 (GREEK UPSILON WITH ACUTE AND HOOK SYMBOL) -- this is a symbol; no lowercase form in the invariant table
    • U+03d4 (GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL) -- this is a symbol; no lowercase form in the invariant table
    • U+03f4 (GREEK CAPITAL THETA SYMBOL) -- this is a symbol; no lowercase form in the invariant table
    • U+1ff3 (GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI) -- no idea why this one fails, it IS a lowercase form
    • U+1ffc (GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI) -- no idea why this one fails, it has a lowercase form (U+1ff3)
    • U+2102 (DOUBLE-STRUCK CAPITAL C) -- this is a symbol; no lowercase form in the invariant table
    • U+2107 (EULER CONSTANT) -- this is a symbol; no lowercase form in the invariant table
    • U+210b (SCRIPT CAPITAL H) -- this is a symbol; no lowercase form in the invariant table
    • U+210c (BLACK-LETTER CAPITAL H) -- this is a symbol; no lowercase form in the invariant table
    • U+210d (DOUBLE-STRUCK CAPITAL H) -- this is a symbol; no lowercase form in the invariant table
    • U+2110 (SCRIPT CAPITAL I) -- this is a symbol; no lowercase form in the invariant table
    • U+2111 (BLACK-LETTER CAPITAL I) -- this is a symbol; no lowercase form in the invariant table
    • U+2112 (SCRIPT CAPITAL L) -- this is a symbol; no lowercase form in the invariant table
    • U+2115 (DOUBLE-STRUCK CAPITAL N) -- this is a symbol; no lowercase form in the invariant table
    • U+2119 (DOUBLE-STRUCK CAPITAL P) -- this is a symbol; no lowercase form in the invariant table
    • U+211a (DOUBLE-STRUCK CAPITAL Q) -- this is a symbol; no lowercase form in the invariant table
    • U+211b (SCRIPT CAPITAL R) -- this is a symbol; no lowercase form in the invariant table
    • U+211c (BLACK-LETTER CAPITAL R) -- this is a symbol; no lowercase form in the invariant table
    • U+211d (DOUBLE-STRUCK CAPITAL R) -- this is a symbol; no lowercase form in the invariant table
    • U+2124 (DOUBLE-STRUCK CAPITAL Z) -- this is a symbol; no lowercase form in the invariant table
    • U+2126 (OHM SIGN) -- this is a symbol; no lowercase form in the invariant table
    • U+2128 (BLACK-LETTER CAPITAL Z) -- this is a symbol; no lowercase form in the invariant table
    • U+212a (KELVIN SIGN) -- this is a symbol; no lowercase form in the invariant table
    • U+212b (ANGSTROM SIGN) -- this is a symbol; no lowercase form in the invariant table
    • U+212c (SCRIPT CAPITAL B) -- this is a symbol; no lowercase form in the invariant table
    • U+212d (BLACK-LETTER CAPITAL C) -- this is a symbol; no lowercase form in the invariant table
    • U+2130 (SCRIPT CAPITAL E) -- this is a symbol; no lowercase form in the invariant table
    • U+2131 (SCRIPT CAPITAL F) -- this is a symbol; no lowercase form in the invariant table
    • U+2133 (SCRIPT CAPITAL M) -- this is a symbol; no lowercase form in the invariant table
    • U+213e (DOUBLE-STRUCK CAPITAL GAMMA) -- this is a symbol; no lowercase form in the invariant table
    • U+213f (DOUBLE-STRUCK CAPITAL PI) -- this is a symbol; no lowercase form in the invariant table
    • U+2145 (DOUBLE-STRUCK ITALIC CAPITAL D) -- this is a symbol; no lowercase form in the invariant table
    • U+2c65 (LATIN SMALL LETTER A WITH STROKE) -- no idea why this one fails, it IS a lowercase form
    • U+2c66 (LATIN SMALL LETTER T WITH DIAGONAL STROKE) -- no idea why this one fails, it IS a lowercase form

    So there you have it -- a combination of ones that shouldn't have failed since they were already lowercase and ones that failed due to that weird optimization to not look at Title Case and Upper Case characters since it attempted to lowercase first.

    That RegexOptions.IgnoreCase is just a nightmare!

    Interestingly, the OS casing table combined with a non-invariant culture (which is not possible in the .NET Framework today) would have picked up many of these letter like symbols and other one way mappings. But not all of them....

     

    This post brought to you by every member of "The 44"

    No Regex in the Unicode room! (and no sex in the champagne room, either!)

    (apologies to Chris Rock for the title!)

    Ted first sent me mail years ago; he was asking some questions about MSLU, and Julie (who knew Ted back from when he was working for Microsoft) sent him to me. If memory serves he actually pointed out an interesting bug or two in the course of answering those questions that I ended up fixing.... :-)

    Anyway, a few years later he came back to Microsoft and from time to time a question would come up about some random Unicode or internationalization thing and I'd often know the answer.

    Though with the question that came up yesterday from his colleague Kevin, I did not know for sure what was going on.

    The problem amounted to a Regex expression that should have returned the same results as char.IsLetter, but didn't. This code listed the characters with the problem:

    using System;
    using System.IO;
    using System.Text;
    using System.Globalization;
    using System.Text.RegularExpressions;
    namespace UnicodeCategory {
        class Program     {
            static void Main(string[] args)
            {
                StringBuilder sb = new StringBuilder();
                int cnt = 0;
                char c = char.MinValue;
                do {
                    const RegexOptions opt = RegexOptions.Compiled
                        | RegexOptions.CultureInvariant
                        | RegexOptions.IgnoreCase
                        | RegexOptions.ExplicitCapture;
                    Regex regex = new Regex(@"^([\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}]+)$", opt);
                    bool regexOK = regex.Match(c.ToString()).Success;
                    bool functionOK = Char.IsLetter(c);
                    if (regexOK != functionOK) {
                        cnt++;
                        sb.AppendLine(string.Format("regex: {0}\tfunction: {1}\tchar in hex: {2:x} - {3}",
                                                    regexOK, functionOK, (int)c, CharUnicodeInfo.GetUnicodeCategory(c)));
                    }
                    if (c == char.MaxValue) {
                        break;
                    }
                    c++;
                } while (true);
                sb.AppendLine(string.Format("TOTAL mismatches: {0}", cnt));
                File.WriteAllText("result.txt", sb.ToString());
            }
        }
    }

    The code was finding a total of 213 characters that were detected by char.IsLetter that the Regex expression that was literally searching for the same Unicode categories was not finding. The full list of characters this code was returning was:

    regex: False    function: True    char in hex: 130 - UppercaseLetter
    regex: False    function: True    char in hex: 1a6 - UppercaseLetter
    regex: False    function: True    char in hex: 1c5 - TitlecaseLetter
    regex: False    function: True    char in hex: 1c8 - TitlecaseLetter
    regex: False    function: True    char in hex: 1cb - TitlecaseLetter
    regex: False    function: True    char in hex: 1f2 - TitlecaseLetter
    regex: False    function: True    char in hex: 1f6 - UppercaseLetter
    regex: False    function: True    char in hex: 1f7 - UppercaseLetter
    regex: False    function: True    char in hex: 1f8 - UppercaseLetter
    regex: False    function: True    char in hex: 218 - UppercaseLetter
    regex: False    function: True    char in hex: 21a - UppercaseLetter
    regex: False    function: True    char in hex: 21c - UppercaseLetter
    regex: False    function: True    char in hex: 21e - UppercaseLetter
    regex: False    function: True    char in hex: 220 - UppercaseLetter
    regex: False    function: True    char in hex: 222 - UppercaseLetter
    regex: False    function: True    char in hex: 224 - UppercaseLetter
    regex: False    function: True    char in hex: 226 - UppercaseLetter
    regex: False    function: True    char in hex: 228 - UppercaseLetter
    regex: False    function: True    char in hex: 22a - UppercaseLetter
    regex: False    function: True    char in hex: 22c - UppercaseLetter
    regex: False    function: True    char in hex: 22e - UppercaseLetter
    regex: False    function: True    char in hex: 230 - UppercaseLetter
    regex: False    function: True    char in hex: 232 - UppercaseLetter
    regex: False    function: True    char in hex: 23a - UppercaseLetter
    regex: False    function: True    char in hex: 23b - UppercaseLetter
    regex: False    function: True    char in hex: 23d - UppercaseLetter
    regex: False    function: True    char in hex: 23e - UppercaseLetter
    regex: False    function: True    char in hex: 241 - UppercaseLetter
    regex: False    function: True    char in hex: 3d2 - UppercaseLetter
    regex: False    function: True    char in hex: 3d3 - UppercaseLetter
    regex: False    function: True    char in hex: 3d4 - UppercaseLetter
    regex: False    function: True    char in hex: 3d8 - UppercaseLetter
    regex: False    function: True    char in hex: 3da - UppercaseLetter
    regex: False    function: True    char in hex: 3dc - UppercaseLetter
    regex: False    function: True    char in hex: 3de - UppercaseLetter
    regex: False    function: True    char in hex: 3e0 - UppercaseLetter
    regex: False    function: True    char in hex: 3f4 - UppercaseLetter
    regex: False    function: True    char in hex: 3f7 - UppercaseLetter
    regex: False    function: True    char in hex: 3f9 - UppercaseLetter
    regex: False    function: True    char in hex: 3fa - UppercaseLetter
    regex: False    function: True    char in hex: 3fd - UppercaseLetter
    regex: False    function: True    char in hex: 3fe - UppercaseLetter
    regex: False    function: True    char in hex: 3ff - UppercaseLetter
    regex: False    function: True    char in hex: 400 - UppercaseLetter
    regex: False    function: True    char in hex: 40d - UppercaseLetter
    regex: False    function: True    char in hex: 48a - UppercaseLetter
    regex: False    function: True    char in hex: 48c - UppercaseLetter
    regex: False    function: True    char in hex: 48e - UppercaseLetter
    regex: False    function: True    char in hex: 4c0 - UppercaseLetter
    regex: False    function: True    char in hex: 4c5 - UppercaseLetter
    regex: False    function: True    char in hex: 4c9 - UppercaseLetter
    regex: False    function: True    char in hex: 4cd - UppercaseLetter
    regex: False    function: True    char in hex: 4ec - UppercaseLetter
    regex: False    function: True    char in hex: 4f6 - UppercaseLetter
    regex: False    function: True    char in hex: 500 - UppercaseLetter
    regex: False    function: True    char in hex: 502 - UppercaseLetter
    regex: False    function: True    char in hex: 504 - UppercaseLetter
    regex: False    function: True    char in hex: 506 - UppercaseLetter
    regex: False    function: True    char in hex: 508 - UppercaseLetter
    regex: False    function: True    char in hex: 50a - UppercaseLetter
    regex: False    function: True    char in hex: 50c - UppercaseLetter
    regex: False    function: True    char in hex: 50e - UppercaseLetter
    regex: False    function: True    char in hex: 1f88 - TitlecaseLetter
    regex: False    function: True    char in hex: 1f89 - TitlecaseLetter
    regex: False    function: True    char in hex: 1f8a - TitlecaseLetter
    regex: False    function: True    char in hex: 1f8b - TitlecaseLetter
    regex: False    function: True    char in hex: 1f8c - TitlecaseLetter
    regex: False    function: True    char in hex: 1f8d - TitlecaseLetter
    regex: False    function: True    char in hex: 1f8e - TitlecaseLetter
    regex: False    function: True    char in hex: 1f8f - TitlecaseLetter
    regex: False    function: True    char in hex: 1f98 - TitlecaseLetter
    regex: False    function: True    char in hex: 1f99 - TitlecaseLetter
    regex: False    function: True    char in hex: 1f9a - TitlecaseLetter
    regex: False    function: True    char in hex: 1f9b - TitlecaseLetter
    regex: False    function: True    char in hex: 1f9c - TitlecaseLetter
    regex: False    function: True    char in hex: 1f9d - TitlecaseLetter
    regex: False    function: True    char in hex: 1f9e - TitlecaseLetter
    regex: False    function: True    char in hex: 1f9f - TitlecaseLetter
    regex: False    function: True    char in hex: 1fa8 - TitlecaseLetter
    regex: False    function: True    char in hex: 1fa9 - TitlecaseLetter
    regex: False    function: True    char in hex: 1faa - TitlecaseLetter
    regex: False    function: True    char in hex: 1fab - TitlecaseLetter
    regex: False    function: True    char in hex: 1fac - TitlecaseLetter
    regex: False    function: True    char in hex: 1fad - TitlecaseLetter
    regex: False    function: True    char in hex: 1fae - TitlecaseLetter
    regex: False    function: True    char in hex: 1faf - TitlecaseLetter
    regex: False    function: True    char in hex: 1fbc - TitlecaseLetter
    regex: False    function: True    char in hex: 1fcc - TitlecaseLetter
    regex: False    function: True    char in hex: 1ffc - TitlecaseLetter
    regex: False    function: True    char in hex: 2102 - UppercaseLetter
    regex: False    function: True    char in hex: 2107 - UppercaseLetter
    regex: False    function: True    char in hex: 210b - UppercaseLetter
    regex: False    function: True    char in hex: 210c - UppercaseLetter
    regex: False    function: True    char in hex: 210d - UppercaseLetter
    regex: False    function: True    char in hex: 2110 - UppercaseLetter
    regex: False    function: True    char in hex: 2111 - UppercaseLetter
    regex: False    function: True    char in hex: 2112 - UppercaseLetter
    regex: False    function: True    char in hex: 2115 - UppercaseLetter
    regex: False    function: True    char in hex: 2119 - UppercaseLetter
    regex: False    function: True    char in hex: 211a - UppercaseLetter
    regex: False    function: True    char in hex: 211b - UppercaseLetter
    regex: False    function: True    char in hex: 211c - UppercaseLetter
    regex: False    function: True    char in hex: 211d - UppercaseLetter
    regex: False    function: True    char in hex: 2124 - UppercaseLetter
    regex: False    function: True    char in hex: 2126 - UppercaseLetter
    regex: False    function: True    char in hex: 2128 - UppercaseLetter
    regex: False    function: True    char in hex: 212a - UppercaseLetter
    regex: False    function: True    char in hex: 212b - UppercaseLetter
    regex: False    function: True    char in hex: 212c - UppercaseLetter
    regex: False    function: True    char in hex: 212d - UppercaseLetter
    regex: False    function: True    char in hex: 2130 - UppercaseLetter
    regex: False    function: True    char in hex: 2131 - UppercaseLetter
    regex: False    function: True    char in hex: 2133 - UppercaseLetter
    regex: False    function: True    char in hex: 213e - UppercaseLetter
    regex: False    function: True    char in hex: 213f - UppercaseLetter
    regex: False    function: True    char in hex: 2145 - UppercaseLetter
    regex: False    function: True    char in hex: 2c00 - UppercaseLetter
    regex: False    function: True    char in hex: 2c01 - UppercaseLetter
    regex: False    function: True    char in hex: 2c02 - UppercaseLetter
    regex: False    function: True    char in hex: 2c03 - UppercaseLetter
    regex: False    function: True    char in hex: 2c04 - UppercaseLetter
    regex: False    function: True    char in hex: 2c05 - UppercaseLetter
    regex: False    function: True    char in hex: 2c06 - UppercaseLetter
    regex: False    function: True    char in hex: 2c07 - UppercaseLetter
    regex: False    function: True    char in hex: 2c08 - UppercaseLetter
    regex: False    function: True    char in hex: 2c09 - UppercaseLetter
    regex: False    function: True    char in hex: 2c0a - UppercaseLetter
    regex: False    function: True    char in hex: 2c0b - UppercaseLetter
    regex: False    function: True    char in hex: 2c0c - UppercaseLetter
    regex: False    function: True    char in hex: 2c0d - UppercaseLetter
    regex: False    function: True    char in hex: 2c0e - UppercaseLetter
    regex: False    function: True    char in hex: 2c0f - UppercaseLetter
    regex: False    function: True    char in hex: 2c10 - UppercaseLetter
    regex: False    function: True    char in hex: 2c11 - UppercaseLetter
    regex: False    function: True    char in hex: 2c12 - UppercaseLetter
    regex: False    function: True    char in hex: 2c13 - UppercaseLetter
    regex: False    function: True    char in hex: 2c14 - UppercaseLetter
    regex: False    function: True    char in hex: 2c15 - UppercaseLetter
    regex: False    function: True    char in hex: 2c16 - UppercaseLetter
    regex: False    function: True    char in hex: 2c17 - UppercaseLetter
    regex: False    function: True    char in hex: 2c18 - UppercaseLetter
    regex: False    function: True    char in hex: 2c19 - UppercaseLetter
    regex: False    function: True    char in hex: 2c1a - UppercaseLetter
    regex: False    function: True    char in hex: 2c1b - UppercaseLetter
    regex: False    function: True    char in hex: 2c1c - UppercaseLetter
    regex: False    function: True    char in hex: 2c1d - UppercaseLetter
    regex: False    function: True    char in hex: 2c1e - UppercaseLetter
    regex: False    function: True    char in hex: 2c1f - UppercaseLetter
    regex: False    function: True    char in hex: 2c20 - UppercaseLetter
    regex: False    function: True    char in hex: 2c21 - UppercaseLetter
    regex: False    function: True    char in hex: 2c22 - UppercaseLetter
    regex: False    function: True    char in hex: 2c23 - UppercaseLetter
    regex: False    function: True    char in hex: 2c24 - UppercaseLetter
    regex: False    function: True    char in hex: 2c25 - UppercaseLetter
    regex: False    function: True    char in hex: 2c26 - UppercaseLetter
    regex: False    function: True    char in hex: 2c27 - UppercaseLetter
    regex: False    function: True    char in hex: 2c28 - UppercaseLetter
    regex: False    function: True    char in hex: 2c29 - UppercaseLetter
    regex: False    function: True    char in hex: 2c2a - UppercaseLetter
    regex: False    function: True    char in hex: 2c2b - UppercaseLetter
    regex: False    function: True    char in hex: 2c2c - UppercaseLetter
    regex: False    function: True    char in hex: 2c2d - UppercaseLetter
    regex: False    function: True    char in hex: 2c2e - UppercaseLetter
    regex: False    function: True    char in hex: 2c80 - UppercaseLetter
    regex: False    function: True    char in hex: 2c82 - UppercaseLetter
    regex: False    function: True    char in hex: 2c84 - UppercaseLetter
    regex: False    function: True    char in hex: 2c86 - UppercaseLetter
    regex: False    function: True    char in hex: 2c88 - UppercaseLetter
    regex: False    function: True    char in hex: 2c8a - UppercaseLetter
    regex: False    function: True    char in hex: 2c8c - UppercaseLetter
    regex: False    function: True    char in hex: 2c8e - UppercaseLetter
    regex: False    function: True    char in hex: 2c90 - UppercaseLetter
    regex: False    function: True    char in hex: 2c92 - UppercaseLetter
    regex: False    function: True    char in hex: 2c94 - UppercaseLetter
    regex: False    function: True    char in hex: 2c96 - UppercaseLetter
    regex: False    function: True    char in hex: 2c98 - UppercaseLetter
    regex: False    function: True    char in hex: 2c9a - UppercaseLetter
    regex: False    function: True    char in hex: 2c9c - UppercaseLetter
    regex: False    function: True    char in hex: 2c9e - UppercaseLetter
    regex: False    function: True    char in hex: 2ca0 - UppercaseLetter
    regex: False    function: True    char in hex: 2ca2 - UppercaseLetter
    regex: False    function: True    char in hex: 2ca4 - UppercaseLetter
    regex: False    function: True    char in hex: 2ca6 - UppercaseLetter
    regex: False    function: True    char in hex: 2ca8 - UppercaseLetter
    regex: False    function: True    char in hex: 2caa - UppercaseLetter
    regex: False    function: True    char in hex: 2cac - UppercaseLetter
    regex: False    function: True    char in hex: 2cae - UppercaseLetter
    regex: False    function: True    char in hex: 2cb0 - UppercaseLetter
    regex: False    function: True    char in hex: 2cb2 - UppercaseLetter
    regex: False    function: True    char in hex: 2cb4 - UppercaseLetter
    regex: False    function: True    char in hex: 2cb6 - UppercaseLetter
    regex: False    function: True    char in hex: 2cb8 - UppercaseLetter
    regex: False    function: True    char in hex: 2cba - UppercaseLetter
    regex: False    function: True    char in hex: 2cbc - UppercaseLetter
    regex: False    function: True    char in hex: 2cbe - UppercaseLetter
    regex: False    function: True    char in hex: 2cc0 - UppercaseLetter
    regex: False    function: True    char in hex: 2cc2 - UppercaseLetter
    regex: False    function: True    char in hex: 2cc4 - UppercaseLetter
    regex: False    function: True    char in hex: 2cc6 - UppercaseLetter
    regex: False    function: True    char in hex: 2cc8 - UppercaseLetter
    regex: False    function: True    char in hex: 2cca - UppercaseLetter
    regex: False    function: True    char in hex: 2ccc - UppercaseLetter
    regex: False    function: True    char in hex: 2cce - UppercaseLetter
    regex: False    function: True    char in hex: 2cd0 - UppercaseLetter
    regex: False    function: True    char in hex: 2cd2 - UppercaseLetter
    regex: False    function: True    char in hex: 2cd4 - UppercaseLetter
    regex: False    function: True    char in hex: 2cd6 - UppercaseLetter
    regex: False    function: True    char in hex: 2cd8 - UppercaseLetter
    regex: False    function: True    char in hex: 2cda - UppercaseLetter
    regex: False    function: True    char in hex: 2cdc - UppercaseLetter
    regex: False    function: True    char in hex: 2cde - UppercaseLetter
    regex: False    function: True    char in hex: 2ce0 - UppercaseLetter
    regex: False    function: True    char in hex: 2ce2 - UppercaseLetter
    TOTAL mismatches: 213

    I probably should have recognized the list since I have dealt with it before. But off the top of my head I didn't, and in the meantime Ryan over on the CLR team stepped in to help explain what was going on:

    This appear to be a bug in the Regex class. If IgnoreCase is present we will translate Lu and Lt to just Ll since we call Char.ToLower for every character in the input.  You would likely know more about this than I do but I verified that Char.ToLower for one of the characters returns the same character presumably because there is no lower case version of the character.  So the expression fails to match because the Unicode category for the character is still uppercase letter and we are trying to match Ll.

    Ah, now it all came together.

    Well, if you are running on Vista and have the updated casing table then they will work. But when you are not running on Vista, the casing table does not cover all of Unicode 5.0, even though the property table in .NET 2.0 does.

    (If you run on .NET 1.1 then you will be missing even more characters since not all characters are identified, though in that case they will not be listed as missing in the script since neither function knows about them!)

    So if you are running on 2.0 or better, this Regex "optimization" is the cause of the bug.

    Strictly speaking, there was no need to pass RegexOptions.IgnoreCase since char.IsLetter is going to pick both of them up anyway. So there is a workaround here -- don't pass flags that slow down the Regex and break its functioning anyway, and you can then freely use the Regex if you like (though it did still seem kinda slow to me, maybe there are some optimizations here.... :-))
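    To make the mechanism Ryan describes concrete, here is a rough model of it in Python rather than the .NET code under discussion. The function names are made up for illustration, Python's own Unicode tables stand in for both the property table and the casing table, and the letter class is assumed to be "any letter", so the count it produces will not be the 213 above. It only shows why an uppercase or titlecase letter with no lowercase form falls through the cracks.

```python
import unicodedata


def is_letter(ch):
    """Stand-in for char.IsLetter: any general category that starts with 'L'."""
    return unicodedata.category(ch).startswith("L")


def to_lower_one_to_one(ch):
    """Model of a one-character ToLower: keep the input when no single-character lowercase exists."""
    lowered = ch.lower()
    return lowered if len(lowered) == 1 else ch


def ignorecase_letter_match(ch):
    """Model of the bug: with IgnoreCase, Lu and Lt are folded into Ll and the input is lowercased first."""
    folded_categories = {"Ll", "Lm", "Lo"}  # Lu and Lt are no longer in the set
    return unicodedata.category(to_lower_one_to_one(ch)) in folded_categories


# Every BMP code point where the two approaches disagree.
mismatches = [
    cp for cp in range(0x10000)
    if is_letter(chr(cp)) and not ignorecase_letter_match(chr(cp))
]

if __name__ == "__main__":
    for cp in mismatches[:10]:
        print(f"char in hex: {cp:x} - {unicodedata.category(chr(cp))}")
    print("TOTAL mismatches:", len(mismatches))
```

    Dropping the case folding (the equivalent of not passing RegexOptions.IgnoreCase) makes the two checks agree again, since the uppercase and titlecase categories are then matched directly.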

     

    This post brought to you by Ⰰ (U+2c00, a.k.a. GLAGOLITIC CAPITAL LETTER AZU)

  • Sorting it all Out

    Sprechen Sie IME?

    • 5 Comments

    The other day, Keith asked in the Suggestion Box:

    In creating an on screen keyboard for Korean, I began to notice that the Korean IME seems to do things differently than, say, the Japanese IME.  In Japanese, to get the characters from the ToUnicodeEx function is as simple as setting the VK_KANA virtual key to the on state when you pass in the Keyboard State array parameter.  However, in Korean it does not seem to behave in this simple a way.  More confusing, the Japanese IME has a Kana status button that turns this virtual key 'on' in the keyboard state to switch character sets.  However, the Han/Eng toggle seems to make no change to the keyboard state.  What happens internally when this button is clicked?  How would I get the correct Korean characters from the ToUnicodeEx function?  Why is this so confusing?

    He also hedged his bets in the microsoft.public.win32.programmer.international newsgroup:

    I am currently enhancing an on-screen keyboard adding the ability to enter Korean.  I am having problems getting the Korean characters to be displayed on the keyboard keys.  The code which does it correctly for other languages but doesn't work for Korean makes a call to the function ToUnicodeEx.  Further, I just noticed an old post on the newsgroups that said this method was problematic for Korean or other IMEs using TSF.  That being the case, how would I go about doing this then?  And why does ToUnicodeEx not work for certain IMEs?

    Thanks for any assistance, Keith.

    (I take no offense, there is definitely no promise of immediate response or anything!)

    Then a couple of months ago, ibon asked in that same newsgroup:

    HWND hWnd = GetForegroundWindow( );
    HIMC himc = ImmGetContext( hWnd );

    But "himc" always returns "NULL".

    If MS has blocked this, is there other ways to access info about the input language on a common IME?

    Sincerely,

    And a few months ago, Matthias asked in that same newsgroup:

    Hello

      i have a problem with korean. For a touch screen application we use a virtual keyboard to enter data. Unfortunatly does the driver generats a mouse click event when the user presses on the screen.
      If the input is korean, the IME interprets such a click as something as a cancel event and stops the composition. Can someone tell me how to surpress this so that a mouse click does NOT interrupts the composition?

    Thanky you
    Matthias

    Then there was another post on that same newsgroup early last month from Digital Ice:

    I try to retrieve the ime candidate list of Microsoft Pinyin IME.
    This code is working fine with Windows XP but not Windows Vista.

    if (msg->message == WM_IME_NOTIFY)
    {
        if (msg->wParam == IMN_OPENCANDIDATE || msg->wParam == IMN_CHANGECANDIDATE)
        {
            HWND hFocus = msg->hwnd;
            HIMC hImc = ImmGetContext(hFocus);
            _ASSERT(hImc);
            DWORD dwSize = ImmGetCandidateList(hImc,0,NULL,0);
            if (dwSize)
            {
                ..........leave out here.

    ImmGetCandidateList always returns zero when it wotrks with Windows Vista.
    ImmGetCandidateList should returns the size of CANDIDATELIST required.
    Why the behavior is different?

    On the whole, if you look into these problems deeply enough, you will find they have a few things in common:

    1. They all have to do with IMEs
    2. All of the IMEs in question are actually Text Services Framework (TSF) Text Input Processors (TIPs)
    3. In each case, the problem being reported has one of two causes, either:
      • The compatibility layer between TSF and the original Input Method Manager (IMM) API within imm32.dll has a bug, somewhat akin to What broke the input language messages? but without as good a justification, or
      • The IME's own interaction with the keyboard handling API within user32.dll is not as full as it is with some other IME.

    Now obviously of those two cases the second one is the one for which there is no specific solution that will allow the code to work -- in those cases, you have to work more directly with the IME rather than the keyboard handling functions, as they simply do not provide the information where it is being requested. Every IME is made up of code and data and if they handle the situation differently then that is what they do -- how many times would you expect to see code written by different developers within different countries supporting the input of different languages where they all worked the same way?

    The first case is a bit less forgivable, though to be honest, after working with the IMM API in the past, I can understand why the legacy support to have TSF support the IMM programming interface would be incomplete -- it is not a terribly easy API to use.

    In the bulk of these cases, the answer is to look at the Text Services Framework and its myriad of classes, interfaces, methods, and properties to work with the IMEs. Starting in XP, where some of them were converted, up until Vista, where just about all of them were, it really is the only answer that is going to avoid frustration that does not have a chance of leading to success....

     

    This post brought to you by ៀ (U+17c0, a.k.a. KHMER VOWEL SIGN IE)

  • Sorting it all Out

    The nature of OrdinalIgnoreCase vs. intuitive expectations

    • 1 Comments

    A while back Patrick asked me:

    Hi Michael,

    Please forward if you’re not the right person to ask the question…

    Let’s say I have the following code snippet.


    int comparison = String.Compare( x, y, StringComparison.CurrentCultureIgnoreCase);
    if (comparison == 0) {
          comparison = String.Compare( x, y, StringComparison.OrdinalIgnoreCase);
    }


    If you compare “A” and “a”, the compared results are the same (i.e. comparison=0 NO MATTER if you use current culture or OrdinalIgnoreCase). If you passed in the Turkish I (i.e. "I" and "ı" ), the first comparison result is 0 (using current culture ignore case). Since it’s 0, the code does further comparison and the value is -232 (using ordinalIgoreCase).

    Since Turkish "I" and "ı", are just upper/lower case, should the 2nd comparison return 0 as well?

    Thanks!
    Patrick

    Well, Patrick is right about the way casing in Turkic works (ref: The [Upper]Case of the Turkish İ (or: Casing, the 2nd)).

    But one of the core ideas in both Ordinal and OrdinalIgnoreCase comparisons is that they stay independent of culture-specific differences, so the casing operation is independent of culture.

    (Starting in .NET 2.0, the casing for OrdinalIgnoreCase even uses the operating system tables, which means on Vista it even uses the tables that were updated, first to Unicode 4.1 and then to 5.0!)
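    The -232 in Patrick's question is just the difference between two code units that the culture-independent casing leaves apart. Here is a small model of that arithmetic in Python. The two casing functions are hypothetical and tiny (the invariant one only knows a-z, the Turkic one only adds the two dotted/dotless mappings), and only the casing step is modeled, not the real culture-sensitive collation.

```python
def invariant_upper(ch):
    """Culture-independent casing model (assumption: only a-z change; the real tables are much larger)."""
    return ch.upper() if "a" <= ch <= "z" else ch


# The two Turkic-specific mappings: i -> İ (U+0130) and ı (U+0131) -> I.
TURKIC_OVERRIDES = {"i": "\u0130", "\u0131": "I"}


def turkic_upper(ch):
    """Culture-sensitive casing model for Turkic locales."""
    return TURKIC_OVERRIDES.get(ch, invariant_upper(ch))


def compare_ignore_case(x, y, upper):
    """Uppercase each code unit with the given table, then compare by numeric value."""
    for a, b in zip(x, y):
        diff = ord(upper(a)) - ord(upper(b))
        if diff != 0:
            return diff
    return len(x) - len(y)


if __name__ == "__main__":
    # 0x0049 - 0x0131 = 73 - 305
    print(compare_ignore_case("I", "\u0131", invariant_upper))
    print(compare_ignore_case("I", "\u0131", turkic_upper))
```

    With the invariant table the dotless ı is left alone, so the comparison comes down to 0x0049 minus 0x0131. With the Turkic table both sides become "I" and the difference disappears.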

    But it actually shows how weird OrdinalIgnoreCase really is, doesn't it? I mean, Ordinal is bad enough in its current form, add in casing and then all bets are off.

    Just think of the operation as quite Некультурные....

     

    This post brought to you by İ (U+0130, a.k.a. LATIN CAPITAL LETTER I WITH DOT ABOVE)

  • Sorting it all Out

    Building your own better Ordinal comparison

    • 0 Comments

    OK, I previously talked about the problem with Ordinal comparisons and one of the more uncool suggested ways around the problem.

    So what is a potential better way to approach the problem?

    Well, if we take the simplified model of the Ordinal comparison where every code point has an equal weight and focus on the fact that the real flaw is in the order, what is the best way to proceed?

    Well, the entire table could be built up by ordering every single code point by their sort keys, breaking ties with the code point's numeric value.

    And then giving each of those code points a weight from 0x0000 to 0xFFFF, beginning to end.

    With each new version of the sorting tables (like when new Unicode versions come out and the new characters are added), this process can be repeated.

    In fact, why wait for Microsoft to do this?

    For the cost of 256k (the amount of space that 2^16 DWORDs will take up), you could implement this yourself! :-)
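    As a rough sketch of what that could look like (in Python, and very much under assumptions): the sort key function below is a hypothetical stand-in built from compatibility decomposition plus case folding, not the real Windows sort keys, and all of the names are invented for illustration. The shape of the idea is the same though: order every UTF-16 code unit by its key, break ties by numeric value, hand out weights from 0x0000 to 0xFFFF, and compare strings weight by weight.

```python
import unicodedata
from array import array

TABLE_BYTES = 0x10000 * 4  # 2^16 four-byte entries


def stand_in_sort_key(cp):
    """Hypothetical stand-in for a real collation sort key."""
    if 0xD800 <= cp <= 0xDFFF:
        return chr(cp)  # lone surrogates: nothing meaningful to decompose
    return unicodedata.normalize("NFKD", chr(cp)).casefold()


def build_weight_table():
    """Order all code units by (sort key, code unit) and number them from beginning to end."""
    order = sorted(range(0x10000), key=lambda cp: (stand_in_sort_key(cp), cp))
    table = array("I", [0]) * 0x10000
    for weight, cp in enumerate(order):
        table[cp] = weight
    return table


WEIGHTS = build_weight_table()


def utf16_units(s):
    """Split a string into UTF-16 code units, the way a .NET string stores it."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def better_ordinal_compare(x, y):
    """Ordinal-style comparison: one weight per code unit, but in a more sensible order."""
    ux, uy = utf16_units(x), utf16_units(y)
    for a, b in zip(ux, uy):
        if WEIGHTS[a] != WEIGHTS[b]:
            return WEIGHTS[a] - WEIGHTS[b]
    return len(ux) - len(uy)


if __name__ == "__main__":
    print(ord("a") - ord("B"))                # plain ordinal puts "a" after "B"
    print(better_ordinal_compare("a", "B"))   # the table puts "a" before "B"
```

    Rebuilding the table when the underlying sort keys change is just a matter of running the build step again, which is the "repeat with each new version" part of the idea.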

    In fact (were I interviewing candidates at this moment!) I think this would make a fun interview question, focusing on not just building the table but designing the interface to use it.

    I'd probably have to think this through a bit more first, and I'll most likely be thinking about something else entirely by then anyway. But if you ever found yourself dismayed by how stupid the results of an ordinal comparison seem then you could code this idea up pretty quickly.

    If you go to the next Unicode Conference you could show me what you came up with and impress me....

     

    This post brought to you by ￮ (U+ffee, a.k.a. HALFWIDTH WHITE CIRCLE)

  • Sorting it all Out

    Maybe it was registry rumination

    • 1 Comments

    One of the things that MSKLC does in its installation packages is add the necessary registry key and related values to support the keyboard layout on the machine.

    These are the ones under HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layouts.

    Anyway, since one of the big features of the 1.4 update was 64-bit support, I had to look into what had to be done differently for 64-bit machines.

    Among other things, I looked up all the information about registry redirection, specifically about registry reflection and shared registry keys.

    And after I read those topics and noticed that my subkey's tree was apparently not covered by either technology, I made a little note to look into whether I would need to write the key twice (once for 32-bit and once for 64-bit!).

    Luckily I did not do more than make that note, since it was not needed. I only had to write the keys once in the install package, and the keyboard worked for both 64-bit and 32-bit (once I was putting the right layout DLLs into the right directories, at least!).

    OK, so it's not redirection or reflection or replication or any of these other technologies (or if it is, it is not documented as such!).

    So I ruminated on this for a bit, and decided to tentatively name the effect "registry rumination" in honor of this intellectual pause I took between Limonatas.... :-)

     

    This post brought to you by ら (U+3089, a.k.a. HIRAGANA LETTER RA)

  • Sorting it all Out

    Don't forget to reboot, please

    • 4 Comments

    I was told about the cause behind an interesting bug a few hours ago.

    The behavior:

    A crash in managed code running on Server 2003 with the following call stack:

    System.ArgumentException: Culture ID 2155 (0x086B) is not a supported culture.
    Parameter name: culture
       at System.Globalization.CultureTableRecord.GetCultureTableRecord(Int32 cultureId, Boolean useUserOverride)
       at System.Globalization.CultureInfo..ctor(Int32 culture, Boolean useUserOverride)
       at System.Globalization.CultureInfo..ctor(Int32 culture)
       at System.Globalization.CultureTable.GetCultures(CultureTypes types)
       at Microsoft.Exchange.Setup.Common.SetupContext.GetExchangeCulture()
       at Microsoft.Exchange.Setup.Common.SetupContext.GetSetupContext(PropertyBag parsedArguments)
       at Microsoft.Exchange.Setup.Common.RootDataHandler.OnReadData()
       at Microsoft.Exchange.Management.SystemManager.WinForms.SingleTaskDataHandler.OnReadData(CommandInteractionHandler interactionHandler)
       at Microsoft.Exchange.Management.SystemManager.WinForms.DataHandler.Read(CommandInteractionHandler interactionHandler)
       at Microsoft.Exchange.Setup.Common.LauncherBase.Run(String[] args)
       at Microsoft.Exchange.Setup.Common.LauncherBase.MainCore[T](String[] args)
       at Microsoft.Exchange.Management.ExSetupUI.ExSetupUI.Main(String[] args)

    What is happening?

    Well, I guess you could blame it on Service Pack 1 of Server 2003....

    Remember ELKs aren't roaming where the servers are?

    Well, they fixed that and added a whole bunch of ELKs to Server 2003.

    That means they added a whole bunch of registry keys saying that these locales were present and an updated locale.nls that contains the data for those locales.

    Unfortunately, locale.nls is one of those files that cannot be replaced without a reboot. So picture what happens when the service pack is installed and the machine is not rebooted.

    Then the machine starts running some managed code that is enumerating all of the locales, including the Windows-only ones that everyone wanted working on Server 2003.

    So EnumSystemLocales is claiming a bunch of locales exist which won't really exist until after the reboot that the person installing the service pack decided not to do yet. And the .NET Framework trusts those results and crashes since it was unable to get the data.
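    The mismatch can be modeled in a few lines of Python. Everything here is a toy: the two data structures stand in for the updated registry list and the not-yet-replaced locale.nls, the locale IDs other than 0x086B are arbitrary examples, and the function names are invented. The second, defensive enumeration is not something described above; it is only there to show where the trust breaks down.

```python
# Updated by the service pack right away: the list of locales that "exist".
REGISTERED_LCIDS = [0x0409, 0x0407, 0x086B]

# Still the old data until the reboot: no entry for the newly added locale.
LOADED_LOCALE_DATA = {0x0409: "en-US", 0x0407: "de-DE"}


class UnsupportedCultureError(ValueError):
    """Toy counterpart of the ArgumentException in the call stack above."""


def load_culture(lcid):
    """Look up the data for one locale, failing the way the managed code does."""
    try:
        return LOADED_LOCALE_DATA[lcid]
    except KeyError:
        raise UnsupportedCultureError(
            f"Culture ID {lcid} (0x{lcid:04X}) is not a supported culture."
        ) from None


def get_cultures_trusting():
    """Trust the enumeration completely: one stale entry takes the whole call down."""
    return [load_culture(lcid) for lcid in REGISTERED_LCIDS]


def get_cultures_defensive():
    """Skip entries whose data is not actually available yet."""
    cultures = []
    for lcid in REGISTERED_LCIDS:
        try:
            cultures.append(load_culture(lcid))
        except UnsupportedCultureError:
            continue
    return cultures


if __name__ == "__main__":
    print(get_cultures_defensive())
    get_cultures_trusting()  # raises UnsupportedCultureError for 0x086B
```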

    SUMMARY: the .NET Framework trusts Windows, Windows trusts the user, and the user trusts that reboots after service packs are optional. Or at least optional so that other things can be installed first.

    I am not sure how to respond exactly....

    Well, other than to suggest that people please reboot after service packs and security updates and such, I mean.

     

    This post brought to you by ⅛ (U+215b, a.k.a. VULGAR FRACTION ONE EIGHTH)
