
Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    What's wrong with what FxCop does for globalization, Part 0

    • 7 Comments

    Some people really like the visibility that globalization gets with FxCop in the managed world.

    I have a net neutral feeling about it, myself. I mean, I like the visibility, I am just not sure that it is improving the behavior much. And I am sure that it is exposing a flaw in the way many of the methods are put together since it makes some code feel less readable to people at times.

    Like the other day, when Navid asked:

    What are the benefits or possible disadvantages of doing the following (assume b is any non-string object):

        string a = string.Concat(b);

    Instead of:

        string a = b.ToString();

    In actuality, to adhere to FxCop guidelines, you’ll need to specify the IFormatProvider when using the ToString() method. Therefore, the correct statement should be:

        string a = b.ToString(CultureInfo.InvariantCulture);

    The first thing to notice is that if b is null, the ToString() method will throw an exception whereas the first statement will work without problems. This may save you a null-check in certain circumstances. Furthermore, since no IFormatProvider is necessary, you don’t have to import System.Globalization into your class either.

    However, I suppose some disadvantages may be that using Concat() instead of ToString() may not be as clear for some readers. Also, if you need to use a specific CultureInfo, then ToString() may be the obvious choice. With respect to performance, I have no idea on the implications but would guess that ToString() is probably slightly faster. It may be worth looking at the IL to see what the differences actually are.

    In any regard, I find the first statement more pleasing but I am unsure if it’s actually The Right Thing To Do. I am more or less using the first statement as a shorthand for:

        string a = string.Empty;
        // b is not a string, so test for null rather than calling string.IsNullOrEmpty
        if (b != null)
        {
            a = b.ToString(CultureInfo.InvariantCulture);
        }

    Now it is unfair to blame FxCop here -- it is making a very valid point, stating (in SDET David Kean's words) "Make sure you pass an IFormatProvider to b.ToString() to make it clear and explicit what IFormatProvider is being used. If you do not specify one, the default in most cases is CultureInfo.CurrentCulture, which will cause the output of ToString to vary depending on the user’s currently selected locale."

    Using System.String.Concat, which is clearly the wrong method from an intent standpoint, rather than ToString, just to avoid the other two evils (an FxCop warning or an unhandled exception) is a clear indication that in this case FxCop has no choice but to either encourage bad behavior or encourage developers to use unclear code constructs.

    Has the actual issue in question been addressed? Is the code more properly globalized?

    In my opinion the only thing that happens to globalization in this case is that developers who wouldn't know Unicode from UNICEF have every reason to feel like globalization is a four letter word.

    And it hurts the reputation of the tool too -- I mean, how did everyone feel about the kid in high school that made them do some silly thing like not run in the halls? Did they feel more educated? Or did they just decide that the person handing out citations is just really annoying?

    (apologies if you were one of those who used to hand out the warnings, but many people did find you to be kind of annoying!)

    Now some people suggest that the problem is that the wrong defaults were chosen in the beginnings of the .NET Framework (Anthony Moore is one of the more prominent holders of this opinion). But if the problem is that people don't like adding a StringComparison.OrdinalIgnoreCase to their String.Compare calls, or alternately if the problem is that they do not find it intuitive to use String.CompareOrdinal, then the bug is not that OrdinalIgnoreCase isn't the default comparison type. Such a default would just mean that we'd be complaining about a whole different set of bugs when people blindly used a default that did not match their scenarios.

    The bug, to the extent that there is a bug, is in the naming scheme being so unintuitive that there is no way on earth that any human would have expected the "preferred" terms to be used unless they either read documentation or blog posts like this one that suggested it to them. And as much as I may think of my little fruit stand here, I am really not quite ready to assume that it is thought of as "intuitive" to read this blog before one writes code! :-)

    Similarly, I do not think the developers who think there ought to be an OrdinalCulture of some type are retarded; I simply think that they are trying to work within a system that simultaneously suggests the importance of always specifying a culture and always using Ordinal comparisons. The developer who is wondering where he can find the OrdinalCulture is arguably the only person involved who doesn't seem to be developmentally disabled, if you know what I mean.

    Now with that said, I believe that there are things that can be done in all of the following to lead to a place where code could be better and globalization might not be thought of along with various four letter words:

    • FxCop
    • Globalization methods/properties within .NET
    • Documentation

    and I'll try blogging about some of my thoughts about what could be done in each area.

    You can think of this post as the introduction to the topic.... :-)

    This post brought to you by ෝ (U+0DDD, a.k.a. SINHALA VOWEL SIGN KOMBUVA HAA DIGA AELA-PILLA)


    Why we have both CurrentCulture and CurrentUICulture

    • 30 Comments

    It was late last year that I got the following through the contact link:

    Michael, I have a question that might be of particular interest to you. You might be able to answer it in one of your 30 to 40 blog entries you'll post today or tomorrow.

    I'm taking Microsoft curriculum classes for various MCTS tests that we're taking and we're doing the training for the 70-526 tests. We were discussing CurrentUICulture and CurrentCulture.

    Will you explain the difference between the two in .Net and also will you explain the differences between localization, globalization, and internationalization. If you would, I'd appreciate it, and I'll share your response with my class.

    Thanks,
    Chris

    I probably missed the deadline of the class, but as I have mentioned previously, this is probably not the place for time-critical responses. Sorry about that, Chris.

    The question about globalization vs. localization vs. internationalization is one I have covered before, both jokingly and more seriously in posts like this one and this one that points at Larry's post. The "official definitions" according to the team behind Dr. International have been substantively changed at least once (and I don't exactly go along with the current "blessed" definitions for what it is worth), and not everyone agrees with the definitions I use (which pretty much match the ones Larry gave).

    Given all of this, as a question it is meaningless if it is just going to be a bunch of people arguing about terminology definitions; conversations about the concepts have a chance of being more interesting.

    So I am just going to focus on the other question of CurrentCulture vs. CurrentUICulture that came up again yesterday in this post and this one.

    Now I have already pointed out that the actual names are probably not the greatest so if you want to talk about that you should probably comment in one of those posts. This post is gonna focus on the concepts.

    The question that people asked yesterday is about wanting to know why these have to be two different properties-- why there is not just one.

    Interestingly, it gets back to the fact that the concepts of internationalization and localization are two different things.

    CurrentCulture, the managed analogue of the default user locale on Windows, comes from a list of many different choices (~208 of them on Vista) that allow a user to take any language version of Windows (even if it is not their native language) and see defaults for date formats and number formats and order of text in sorted lists that make sense to them. Now all you have to do is wait until December 5th for a tax refund check that you were supposed to receive on May 12th due to an ambiguous date like 5/12/2006 to realize that no matter what language one speaks one will have certain fixed opinions on these items. This list has existed in Windows for the entire life of Win32, and serves well the people who learned one of those few lucky languages to have Windows translated into it and who nevertheless had a preference for their formats and such that does not match that language.
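    To make the 5/12/2006 problem concrete, here is a minimal C sketch, assuming nothing more than a short date pattern that puts either the month or the day first. The names parse_short_date, date_order and simple_date are hypothetical and are not part of Windows or the .NET Framework; the point is only that the same string yields two different dates depending on a preference the user grew up with.

```c
#include <stdio.h>

/* Field order of a short date: a stand-in for what a locale's short
   date pattern decides (hypothetical type, for illustration only). */
typedef enum { ORDER_MDY, ORDER_DMY } date_order;

typedef struct { int year, month, day; } simple_date;

/* Parse "a/b/yyyy" and assign a and b to month/day according to order.
   Returns 1 on success, 0 if the text does not match the pattern. */
int parse_short_date(const char *text, date_order order, simple_date *out)
{
    int a, b, y;
    if (sscanf(text, "%d/%d/%d", &a, &b, &y) != 3)
        return 0;
    out->year = y;
    if (order == ORDER_MDY) {
        out->month = a;
        out->day = b;
    } else {
        out->month = b;
        out->day = a;
    }
    return 1;
}
```

    Under ORDER_MDY the string means May 12; under ORDER_DMY it means December 5, which is where the late tax refund comes from.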

    In other words, the concept behind CurrentCulture has been in Windows all along, and the .NET Framework is merely trying to make use of this longstanding concept. Of course the name is not so great, as I pointed out yesterday, and it does not really describe to people what it is so that the name alone suggests appropriate use, but that is a reasonable excuse for when somebody makes a mistake, not an excuse to keep making the mistake once the meaning has been explained to you.

    This setting does not, should not, and will not require one to run a localized version, because these preferences that one grows up with treating as "the way things are" cannot be changed without being a really awful user experience.

    Now let's move over to CurrentUICulture, the managed analogue of the user default UI language in Windows that was introduced in Windows 2000 and is meant to cover the localization (a.k.a. localisation, a.k.a. translation) of Windows into other languages. It is about the resources that are loaded up so that the dialogs and alerts and menus and help screens and tooltips and so forth can be in a specific language. The managed version of this obviously needs to be scaled down to the application level rather than all of Windows. But the intended use is obvious no matter how much the name might suck.

    The connection between the managed and unmanaged versions of these functionalities is not just one of heritage, by the way. The unmanaged Windows settings are the defaults used for the managed .NET ones. Just in case people did not want to rely on documentation or subtle hints, they could fall back to paying attention to this default behavioral imperative....

    Now sometimes these two settings will obviously be expected to be the same. But there are several obvious scenarios where they wouldn't:

    1. Often there are no such resources available, which is true of every version of Windows and every version of the .NET Framework itself, where the number of languages into which the software is translated is smaller than the list of possible locales.
    2. There are lots of times that a person knows more than one language, and if they have the localized version or the MUI version they can switch to, they would like to do it, even if they still want that default user locale setting that affects behavior they really have been accustomed to since, for all intents and purposes, their zygote phase.
    3. Interestingly, there are also times when people (especially developers) have historically preferred the English version due to perhaps perceived stability issues or cultural issues, yet they still want those built-in defaults that they grew up with and expect to be what they always were.

    And there are other scenarios as well, and they are also valid.
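    Here is a minimal sketch of why the two settings need to be stored separately, using a hypothetical application that ships translated resources for only a few languages. The user_settings struct and resolve_ui_language function are illustrative names, not a Windows or .NET API: the UI language falls back when no resources exist (the first scenario above), while the formatting locale is used exactly as the user chose it.

```c
#include <stddef.h>
#include <string.h>

/* Two independent user preferences, modeled loosely on the
   CurrentCulture / CurrentUICulture split (hypothetical type). */
typedef struct {
    const char *format_locale; /* dates, numbers, sort order */
    const char *ui_language;   /* which translated resources to load */
} user_settings;

/* Return the UI language to load: the requested one if resources exist
   for it, otherwise the fallback. The format locale is never consulted. */
const char *resolve_ui_language(const user_settings *s,
                                const char *const *available, size_t count,
                                const char *fallback)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(available[i], s->ui_language) == 0)
            return available[i];
    }
    return fallback;
}
```

    A Sinhala user of such an application gets English menus but keeps si-LK formats, and a developer who prefers the English UI keeps German formats. With a single property, one of those two preferences would be lost.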

    So in the end there are two different concepts that for perfectly valid reasons a user might want as two different defaults.

    Which means they get two different properties. For all the times that they are not the same....

    Which is not to say that everyone has the right view on this; SQL Server, for example, has been doing it wrong at least since it was a Microsoft/Sybase joint project (discussed further here).

    And if the names weren't so non-intuitive, this whole post might not have been so requested! :-)

     

    This post brought to you by ඛ (U+0d9b, a.k.a. SINHALA LETTER MAHAAPRAANA KAYANNA)


    Why I don't like the IsTextUnicode API

    • 12 Comments

    The IsTextUnicode API has been around since NT 3.5, according to the Platform SDK histories. According to the PSDK, its purpose is as follows:

    The IsTextUnicode function determines whether a buffer is likely to contain a form of Unicode text. The function uses various statistical and deterministic methods to make its determination, under the control of flags passed via lpi. When the function returns, the results of such tests are reported via lpi.

    It then goes on to describe the many different tests that it can do when the appropriate flags are passed:

    IS_TEXT_UNICODE_ASCII16
       The text is Unicode, and contains only zero-extended ASCII values/characters.

    IS_TEXT_UNICODE_REVERSE_ASCII16
       Same as the preceding, except that the Unicode text is byte-reversed.

    IS_TEXT_UNICODE_STATISTICS
       The text is probably Unicode, with the determination made by applying statistical analysis. Absolute certainty is not guaranteed. See the following Remarks section.

    IS_TEXT_UNICODE_REVERSE_STATISTICS
       Same as the preceding, except that the probably-Unicode text is byte-reversed.

    IS_TEXT_UNICODE_CONTROLS
       The text contains Unicode representations of one or more of these nonprinting characters: RETURN, LINEFEED, SPACE, CJK_SPACE, TAB.

    IS_TEXT_UNICODE_REVERSE_CONTROLS
       Same as the preceding, except that the Unicode characters are byte-reversed.

    IS_TEXT_UNICODE_BUFFER_TOO_SMALL
       There are too few characters in the buffer for meaningful analysis (fewer than two bytes).

    IS_TEXT_UNICODE_SIGNATURE
       The text contains the Unicode byte-order mark (BOM) 0xFEFF as its first character.

    IS_TEXT_UNICODE_REVERSE_SIGNATURE
       The text contains the Unicode byte-reversed byte-order mark (Reverse BOM) 0xFFFE as its first character.

    IS_TEXT_UNICODE_ILLEGAL_CHARS
       The text contains one of these Unicode-illegal characters: embedded Reverse BOM, UNICODE_NUL, CRLF (packed into one WORD), or 0xFFFF.

    IS_TEXT_UNICODE_ODD_LENGTH
       The number of characters in the string is odd. A string of odd length cannot (by definition) be Unicode text.

    IS_TEXT_UNICODE_NULL_BYTES
       The text contains null bytes, which indicate non-ASCII text.

    IS_TEXT_UNICODE_UNICODE_MASK
       This flag constant is a combination of IS_TEXT_UNICODE_ASCII16, IS_TEXT_UNICODE_STATISTICS, IS_TEXT_UNICODE_CONTROLS, IS_TEXT_UNICODE_SIGNATURE.

    IS_TEXT_UNICODE_REVERSE_MASK
       This flag constant is a combination of IS_TEXT_UNICODE_REVERSE_ASCII16, IS_TEXT_UNICODE_REVERSE_STATISTICS, IS_TEXT_UNICODE_REVERSE_CONTROLS, IS_TEXT_UNICODE_REVERSE_SIGNATURE.

    IS_TEXT_UNICODE_NOT_UNICODE_MASK
       This flag constant is a combination of IS_TEXT_UNICODE_ILLEGAL_CHARS, IS_TEXT_UNICODE_ODD_LENGTH, and two currently unused bit flags.

    IS_TEXT_UNICODE_NOT_ASCII_MASK
       This flag constant is a combination of IS_TEXT_UNICODE_NULL_BYTES and three currently unused bit flags.
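    To see how lpi works as both input (the tests to run) and output (the tests that passed), and how the masks are composed, here is a portable C sketch. The flag values are reproduced from winnt.h so that this compiles without windows.h; verify them against your own SDK header. The check_signature function is hypothetical and only imitates the two signature tests and the odd length test; it is not how the real API is implemented.

```c
#include <stddef.h>

/* Flag values as defined in winnt.h (reproduced so this compiles off
   Windows; verify against your SDK). */
#define IS_TEXT_UNICODE_ASCII16            0x0001
#define IS_TEXT_UNICODE_STATISTICS         0x0002
#define IS_TEXT_UNICODE_CONTROLS           0x0004
#define IS_TEXT_UNICODE_SIGNATURE          0x0008
#define IS_TEXT_UNICODE_REVERSE_ASCII16    0x0010
#define IS_TEXT_UNICODE_REVERSE_STATISTICS 0x0020
#define IS_TEXT_UNICODE_REVERSE_CONTROLS   0x0040
#define IS_TEXT_UNICODE_REVERSE_SIGNATURE  0x0080
#define IS_TEXT_UNICODE_ILLEGAL_CHARS      0x0100
#define IS_TEXT_UNICODE_ODD_LENGTH         0x0200
#define IS_TEXT_UNICODE_NULL_BYTES         0x1000
#define IS_TEXT_UNICODE_UNICODE_MASK       0x000F
#define IS_TEXT_UNICODE_REVERSE_MASK       0x00F0
#define IS_TEXT_UNICODE_NOT_UNICODE_MASK   0x0F00
#define IS_TEXT_UNICODE_NOT_ASCII_MASK     0xF000

/* Hypothetical helper: run only the tests requested in *flags (as the real
   API treats lpi on input) and overwrite *flags with the tests that passed
   (as it reports results on output). Unrequested flags come back clear. */
void check_signature(const unsigned char *buf, size_t len, int *flags)
{
    int requested = *flags;
    int result = 0;
    if (len >= 2) {
        /* Little-endian read of the first 16-bit unit. */
        unsigned int first = buf[0] | (buf[1] << 8);
        if ((requested & IS_TEXT_UNICODE_SIGNATURE) && first == 0xFEFF)
            result |= IS_TEXT_UNICODE_SIGNATURE;
        if ((requested & IS_TEXT_UNICODE_REVERSE_SIGNATURE) && first == 0xFFFE)
            result |= IS_TEXT_UNICODE_REVERSE_SIGNATURE;
    }
    if ((requested & IS_TEXT_UNICODE_ODD_LENGTH) && (len % 2) != 0)
        result |= IS_TEXT_UNICODE_ODD_LENGTH;
    *flags = result;
}
```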

    Sound impressive and interesting enough yet?

    A bit of trivia -- the code for a flag that used to be documented (IS_TEXT_UNICODE_DBCS_LEADBYTE) is still there (and it is still in the header file, obviously -- the PSDK never breaks people like that). But the flag does not work well, so it is probably just as well that it is not documented any more. I highly recommend not passing it, or ignoring it when it is returned. The flag is not dangerous or anything; it's just not too terribly useful for its intended purpose (detecting text that is actually DBCS).

    As I mentioned, the API has been around since NT 3.5. It was written by someone else, outside of the NLS team (such as it was in those days). That is fairly cool since there was not as much Unicode awareness/acceptance back then as there is now....

    In those heady days when to most developers Unicode was little more than a foreign word that translated to "twice the memory and space required for strings", this function was mostly used as a way to decide when to call WideCharToMultiByte to convert strings out of Unicode¹, and there were very few callers even for that not-so-noble purpose. NT 4.0 did not see much of a usage explosion, although Windows 2000 did, as the number of callers throughout the entire Windows source tree just about tripled (to 65 or so callers). Not much movement on the caller side in XP or Server 2003, either. I don't mind this fact much, given why it mostly seemed to be used.

    Some time between XP and Server 2003, I did add it to MSLU, as a nice gesture to developers who were frustrated by NT-only APIs².

    Nevertheless, as the title of this post indicates, I don't like the IsTextUnicode API.

    You may think you know why -- go ahead, I'll give you three guesses.

    Guess #1: Because I do not own it?

    Sorry, that's not it -- but your opinion about my ego is noted. :-)  Strike one!

    I'll give you a hint.

    Hint #1: Look at the Platform SDK description (I'll add emphasis to enhance the hint):

    The IsTextUnicode function determines whether a buffer is likely to contain a form of Unicode text. The function uses various statistical and deterministic methods to make its determination, under the control of flags passed via lpi. When the function returns, the results of such tests are reported via lpi.

    Guess #2: Excuse me, I meant because the NLS team does not own it?

    Hmm, sorry. I figured that was what you meant the first time. Strike Two!

    I'll give you another hint.

    Hint #2: There has only been one substantive change made to this API from the time of its creation until Server 2003 shipped -- a const was added to the lpBuffer parameter.

    Got it now? Think carefully now, this is your last guess.

    Guess #3: Because it considers "CRLF (packed into one WORD)" to be illegal, even though U+0d0a is MALAYALAM LETTER UU?

    Ooh, good one -- that looks like a bug in the IS_TEXT_UNICODE_ILLEGAL_CHARS flag detection. Even cooler that you properly figured out the byte reversal issue. Or maybe you did not notice that part, since both the ASCII CRLF packed into a WORD and the character would reverse on little-endian systems to look like 0x0a0d in memory, and if you did not allow for byte reversal you would have been right anyway.

    Given the support for Malayalam described previously in the post Lions and tigers and bears ELKs, Oh my!, this is kind of embarrassing. Or maybe, given the fact that the code point has been allocated since Unicode 1.1 (according to DerivedAge.txt), which was released in June of 1993 (according to enumeratedversions.html), this is particularly embarrassing. Though that does make the comment over its use in the API source pretty amusing:

                //  The following is not currently a Unicode character
                //  but is expected to show up accidentally when reading
                //  in ASCII files which use CRLF on a little endian machine.

    If you think about it, most UTF-16 big endian files would be from other operating systems and have just a CR or just an LF for their line breaks, even if they were just ASCII. I guess we know why there is no big-endian check for illegal characters? :-)  Makes the whole IS_TEXT_UNICODE_ILLEGAL_CHARS check weird even if it were not totally busted anyway.
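    The byte order arithmetic is easy to check by hand. This small C sketch (read_le_word and contains_le_word are illustrative names, not API functions) reads a buffer as little-endian 16-bit units, the way a UTF-16LE buffer is seen on x86, and tests for one particular value.

```c
#include <stddef.h>
#include <stdint.h>

/* Read two bytes as a little-endian 16-bit code unit, which is how a
   UTF-16LE buffer is interpreted on x86. */
uint16_t read_le_word(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return 1 if any aligned 16-bit unit of a little-endian buffer equals value. */
int contains_le_word(const unsigned char *buf, size_t len, uint16_t value)
{
    for (size_t i = 0; i + 1 < len; i += 2) {
        if (read_le_word(buf + i) == value)
            return 1;
    }
    return 0;
}
```

    An ASCII CR LF pair comes out as the unit 0x0A0D, while correctly encoded UTF-16LE MALAYALAM LETTER UU comes out as 0x0D0A. So a check that treats 0x0D0A as "CRLF packed into one WORD", as the behavior described above implies, does not fire on the ASCII pair it was meant for and does fire on perfectly good Malayalam text.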

    For MSLU fans, yes I ported this bug there as well, though not on purpose. Sorry about that, I am not used to reading code points as reversed bytes....

    Of course, since I did not know about this problem before, it can't be why I started this post not liking the API. Hell, if not for this imaginary conversation I put together, I still wouldn't know about it. Lucky for everyone that I have displayed this psychological dysfunction in public and thus cannot be further embarrassed by reporting the bug on it, right? Strike 3!

    Or we could call it a foul tip, since you found a decade-old bug and all. Ok, it is still Strike 2. :-)

    One more hint:

    Hint #3: There has been no change to this API's underlying mechanics since at least NT 3.51 (and probably since the original NT 3.5 release).

    Any more guesses?

    Guess #4: Because it only seems to test the first 256 bytes, no matter how big of a string I pass?

    Well, no. I never cared too much for that one, even before I came to Microsoft. But I never really found a file where it made a difference. It would be nice if someone were to change this, but I wouldn't lose any sleep over it -- so it's definitely not a reason to dislike an API. Strike 3!

    Ok, I'll just tell you now. Because as an API intended to verify whether a string is following a standard, it wins an award for its obtusitality. Why on earth would the following not have been added, over the years if not in the initial release?

    IS_TEXT_UNICODE_UNPAIRED_SURROGATES
       Since it is invalid to have a high surrogate without a low surrogate following it, or a low surrogate not preceded by a high surrogate, why not detect such non-conformant cases?

    IS_TEXT_REVERSE_UNICODE_ILLEGAL_CHARS
       It seems only fair to round out the checks for UTF-16BE by including the reverse version of this flag, doesn't it?

    IS_TEXT_UNICODE_INVALID_FOR_4_00
       Obviously new flags could be added for each major version -- what better way to check for what is invalid than to check against an official "valid" list?

    IS_TEXT_UNICODE_INVALID_SCRIPT_USAGE
       There are all kinds of sequences that would indicate bad usage, from combining marks from one script used in an unrelated script, to illegal sequences, to text with invalid ordering per the canonical combining classes, and so on.

    IS_TEXT_UNICODE_VALID_UTF8_PER_RFC2279
       The initial description of UTF-8 in RFC 2279, which I think is the method used by Notepad³.

    IS_TEXT_UNICODE_VALID_UTF8_PER_UNICODE
       The stricter definition of UTF-8, which disallows encoded surrogate sequences and other non-shortest forms.

    IS_TEXT_UNICODE_VALID_UTF32 / IS_TEXT_UNICODE_VALID_REVERSE_UTF32
       These flags could be combined with some of the older signature detection flags if a UTF-32 LE or BE signature is found.

    IS_TEXT_UNICODE_UCS2_32 / IS_TEXT_UNICODE_REVERSE_UCS2_32
       Analogous to the IS_TEXT_UNICODE_ASCII16/IS_TEXT_UNICODE_REVERSE_ASCII16 flags, they would detect UTF-32 that looks like it could all be represented as UTF-16 without needing surrogate pairs.
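    Two of these proposed checks are simple enough to sketch. Assuming the buffer has already been read into native-order 16-bit or 32-bit code units, something like the following would do; has_unpaired_surrogates and utf32_fits_in_ucs2 are hypothetical names, since none of these flags exist in the real API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check for the proposed IS_TEXT_UNICODE_UNPAIRED_SURROGATES:
   returns 1 if any high surrogate lacks a following low surrogate, or any
   low surrogate lacks a preceding high surrogate. */
int has_unpaired_surrogates(const uint16_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF) {          /* high surrogate */
            if (i + 1 < n && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
                i++;                                     /* valid pair: skip the low half */
            else
                return 1;
        } else if (s[i] >= 0xDC00 && s[i] <= 0xDFFF) {   /* stray low surrogate */
            return 1;
        }
    }
    return 0;
}

/* Hypothetical check for the proposed IS_TEXT_UNICODE_UCS2_32: every UTF-32
   unit is a BMP scalar value, so each maps to exactly one UTF-16 unit. */
int utf32_fits_in_ucs2(const uint32_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] > 0xFFFF || (s[i] >= 0xD800 && s[i] <= 0xDFFF))
            return 0;
    }
    return 1;
}
```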

    You get the idea -- Unicode is a dynamic standard, getting more interesting and more complicated all the time, not just for its own sake but in how the platform uses it. How can an API that was written a decade ago and never updated, whose job is to ask "is this flipping buffer full of Unicode text?", ever hope to keep up with such a standard?

     

    1 - Notepad being a noteworthy exception to this rule, since it used the API to try to detect when a text file was Unicode without a BOM.

    2 - Similar to why BeginUpdateResource, UpdateResource, and EndUpdateResource were added, though I must admit that for the *UpdateResource APIs it was mainly due to the fact that former MSFTie Matt Curland did all the work to make the functions Win9x-friendly. :-)

    3 - These are the rules that have been used by MultiByteToWideChar in later years. Ironically, the MultiByteToWideChar API is used by Notepad to convert files that it detected as UTF-8 by using RFC 2279 rules, meaning that any illegal sequences will be dropped without so much as a warning. Better keep those CESU-8 files away from recent enough versions of Notepad!
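    The gap between the two proposed UTF-8 flags, and between what Notepad detects and what MultiByteToWideChar accepts, comes down to a few extra rules. Here is a minimal C sketch of a strict validator in the spirit of the later Unicode definition (is_strict_utf8 is a hypothetical name): it rejects overlong forms, encoded surrogates (so CESU-8 fails), code points above U+10FFFF, and the 5- and 6-byte sequences that RFC 2279 allowed.

```c
#include <stddef.h>

/* Return 1 if buf is well-formed UTF-8 under the strict Unicode rules. */
int is_strict_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        unsigned long cp, min_cp;
        size_t extra;
        if (b < 0x80) { i++; continue; }                       /* ASCII */
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min_cp = 0x10000; }
        else return 0;  /* stray trail byte, or a 5/6-byte RFC 2279 lead */
        if (len - i <= extra) return 0;                        /* truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;         /* not 10xxxxxx */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min_cp) return 0;                   /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* encoded surrogate (CESU-8) */
        if (cp > 0x10FFFF) return 0;                 /* beyond the Unicode range */
        i += extra + 1;
    }
    return 1;
}
```

    Dropping the overlong, surrogate, and range checks and accepting 5- and 6-byte lead bytes gets you roughly the looser RFC 2279 behavior.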

     

    This post sponsored by our much-maligned little brother "ഊ" (U+0d0a, a.k.a. MALAYALAM LETTER UU)
    Who, like the rest of the Malayalam script, felt very supported by XPSP2, only to find out that the IsTextUnicode API did not share that opinion....


    Dumb quotes... or maybe they are just smart-ass quotes

    • 28 Comments

    (I think I mentioned 'Smart Quotes' previously, in passing.)

    If I had a dime for every time someone having trouble getting the Regional and Language Options unattend setting to work posted something like this as the command line they were running:

    control intl.cpl,, /f:”filename.txt”

    Then I'd have to worry about the tax bracket I was going to be put into....

    In case you can't see the problem, it is pretty obvious if you blow up the text some:

    control intl.cpl,, /f:”filename.txt”

    At some point the person was looking at instructions in documentation or in email written by a copy of Outlook that has Word set as its mail editor.

    It replaced the regular ASCII quotes with so-called "smart" quotes, which can turn " (U+0022, a.k.a. QUOTATION MARK) into something else such as ” (U+201d, a.k.a. RIGHT DOUBLE QUOTATION MARK). Which of course the command prompt will not recognize.

    Man I hate that feature. Not because it isn't useful, because it can be. But it is not quite smart enough of a feature to know when it isn't helpful!
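    If you are stuck copying commands out of such a document, one workaround is to dumb the quotes back down before running them. A minimal C sketch, assuming the text is UTF-8 and handling only the four most common curly quotes (U+2018, U+2019, U+201C, U+201D); dumb_down_quotes is a hypothetical name, and the other marks Office may substitute are left alone.

```c
#include <stddef.h>
#include <string.h>

/* Copy src (UTF-8) to dst, replacing U+2018/U+2019 with ' and
   U+201C/U+201D with ". dst must hold at least strlen(src) + 1 bytes;
   the output is never longer than the input. */
void dumb_down_quotes(const char *src, char *dst)
{
    const unsigned char *s = (const unsigned char *)src;
    while (*s) {
        /* All four marks are encoded as E2 80 xx in UTF-8. */
        if (s[0] == 0xE2 && s[1] == 0x80 &&
            (s[2] == 0x98 || s[2] == 0x99 || s[2] == 0x9C || s[2] == 0x9D)) {
            *dst++ = (s[2] <= 0x99) ? '\'' : '"';
            s += 3;
        } else {
            *dst++ = (char)*s++;
        }
    }
    *dst = '\0';
}
```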

    Anyway, colleague Gwyneth Marshall provided me with a list of the quotes that some version of Office uses for different languages:

    Symbol | Unicode value(s) | Languages
    'O' | U+0027 | Danish, Dutch, English, Finnish, Norwegian, Swedish
    "O" | U+0022 | Danish, Dutch, English, Finnish, Norwegian, Swedish
    ''O'' | U+0027 | Danish, Dutch, English, Finnish, Norwegian, Swedish
    ‘O’ | U+2018, U+2019 | Dutch, English, Italian, Norwegian, Portuguese, Spanish
    ‛O’ | U+201B, U+2019 | Dutch, English, Italian, Spanish
    ’O’ | U+2019 | Danish, Finnish, Hungarian, Norwegian, Swedish
    ‚O‘ | U+201A, U+2018 | Bulgarian, Czech, German, Icelandic, Lettish, Lithuanian, Polish, Romanian, Russian, Serbian, Slovak, Slovenian, Ukrainian
    ‚O’ | U+201A, U+2019 | Afrikaans, Dutch
    ‛O‚ | U+201B, U+201A | Greek, Italian, Turkish
    “O” | U+201C, U+201D | Dutch, English, Italian, Portuguese, Spanish, Turkish
    ‟O” | U+201F, U+201D | Dutch, English, Italian, Portuguese, Spanish, Turkish
    „O“ | U+201E, U+201C | Bulgarian, Czech, German, Icelandic, Lettish, Lithuanian, Polish, Romanian, Russian, Serbian, Slovak, Slovenian, Sorbish
    „O” | U+201E, U+201D | Afrikaans, Danish, Dutch, Hungarian, Polish, Russian
    ”O” | U+201D | Danish, Finnish, Norwegian, Swedish
    “O„ | U+201C, U+201E | Greek, Italian, Turkish
    ‹ O › | U+2039, U+203A | Albanian, Byelorussian, Estonian, French, Greek, Italian, Lithuanian, Norwegian, Portuguese, Romanian, Russian, Spanish, Turkish
    ›O‹ | U+203A, U+2039 | Danish, Polish, Serbian, Slovak, Slovenian
    ›O› | U+203A | Finnish, Swedish
    « O » | U+00AB, U+00BB | Albanian, Byelorussian, Dutch, Estonian, French, Greek, Italian, Lettish, Lithuanian, Norwegian, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian
    »O« | U+00BB, U+00AB | Croatian, Danish, German, Hungarian, Polish, Serbian, Slovak, Slovenian
    »O» | U+00BB | Finnish, Swedish
    〝O〟 | U+301D, U+301F | East Asian
    〞O〟 | U+301E, U+301F | East Asian
    「O」 | U+300C, U+300D | East Asian
    『O』 | U+300E, U+300F | East Asian
    ׳O׳ | U+05F3 | Hebrew
    ״O״ | U+05F4 | Hebrew

     

    Now if only Office 14/Word 14 could be made smart enough to detect the cases where the feature is not needed, it would save everyone a lot of grief!

    We're all sick of the smartass/dumbass aspects of this particular feature. :-)

     

    This post brought to you by ” (U+201d, a.k.a. RIGHT DOUBLE QUOTATION MARK)


    Consistent garbage text can be incorrect encoding identification (or detection)

    • 15 Comments

    Mushy asked in the Suggestion Box:

    Michael,
    You may have an old post that explains the following situation, so point me to it if you do.  Here is the issue. If you look on my blog page you'll see some quotation marks printed as â€™ instead of ".  When I first posted the material they weren't there.  Now, most of the " marks are in some other code.  How do you get rid of them?  Is this caused by Word, .doc, .txt, .rtc formats?  Which is best to post in?

    P.S. I like your blog and have bookmarked it.

    Thanks,
    Mushy

    (For those who are interested, Mushy's blog is Cross+Hairs.)

    Some people may be familiar with the byte sequence; if you are, then you are a geek. :-)

    But to see what is going on, let us first consider Microsoft Notepad's detection behavior around encodings:

    • If a UTF-16 LE BOM is there, then it's UTF-16 LE.
    • If a UTF-16 BE BOM is there, then it's UTF-16 BE.
    • If a UTF-8 BOM is there, then it's UTF-8.
    • If it appears to be valid UTF-8 according to the old RFC2279 definition, then it is assumed to be UTF-8.
    • Otherwise, it is assumed to be in the default system code page, CP_ACP.
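    Expressed as code, those rules look something like the following C sketch. It is not Notepad's actual code, only a rendering of the list above; guess_encoding and the enum names are hypothetical, and the UTF-8 step checks only lead and trail byte structure, roughly the permissive RFC 2279 view (overlong forms and encoded surrogates pass, and 5- and 6-byte sequences are allowed).

```c
#include <stddef.h>

typedef enum { ENC_UTF16LE, ENC_UTF16BE, ENC_UTF8, ENC_ACP } guessed_encoding;

/* Structural UTF-8 check in the permissive RFC 2279 style: lead bytes
   announce 0 to 5 trail bytes, and every trail byte is 10xxxxxx. */
static int looks_like_rfc2279_utf8(const unsigned char *b, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t extra;
        if (b[i] < 0x80) extra = 0;
        else if ((b[i] & 0xE0) == 0xC0) extra = 1;
        else if ((b[i] & 0xF0) == 0xE0) extra = 2;
        else if ((b[i] & 0xF8) == 0xF0) extra = 3;
        else if ((b[i] & 0xFC) == 0xF8) extra = 4;
        else if ((b[i] & 0xFE) == 0xFC) extra = 5;
        else return 0;                      /* stray trail byte or 0xFE/0xFF */
        if (len - i <= extra) return 0;     /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            if ((b[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}

/* Apply the rules in order: BOMs first, then UTF-8 validity, else CP_ACP. */
guessed_encoding guess_encoding(const unsigned char *b, size_t len)
{
    if (len >= 2 && b[0] == 0xFF && b[1] == 0xFE) return ENC_UTF16LE;
    if (len >= 2 && b[0] == 0xFE && b[1] == 0xFF) return ENC_UTF16BE;
    if (len >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return ENC_UTF8;
    if (looks_like_rfc2279_utf8(b, len)) return ENC_UTF8;
    return ENC_ACP;
}
```

    Pure ASCII comes out as UTF-8 here, which does no harm since the bytes decode the same way. The three bytes 0xE2 0x80 0x99 also come out as UTF-8, which is exactly the behavior the experiment that follows relies on.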

    So, armed with this knowledge, let's try the following:

    • Create a new text file in Notepad
    • Add the following string to it: â€™
    • Save the file and close it.
    • Open the file

    What you will find is that the byte sequence 0xE2 0x80 0x99, which in code page 1252 (and the original saved file) looks like:

    â€™

    has been interpreted by the new instance of Notepad as:

    ’

    in UTF-8 -- because the sequence 0xE2 0x80 0x99 is what U+2019 (RIGHT SINGLE QUOTATION MARK) looks like in the underlying UTF-8 datastream.
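    Going the other way reproduces the garbage: decode the three UTF-8 bytes of U+2019 one byte at a time as code page 1252. In this minimal C sketch the 0x80-0x9F mappings are the standard Windows-1252 assignments, the five undefined positions in that range are mapped to U+FFFD as an arbitrary choice, and decode_cp1252 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Windows-1252 differs from ISO 8859-1 only in 0x80-0x9F. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

/* Decode len bytes as Windows-1252 into out (one code point per byte). */
void decode_cp1252(const unsigned char *in, size_t len, uint16_t *out)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        out[i] = (b >= 0x80 && b <= 0x9F) ? cp1252_high[b - 0x80] : b;
    }
}
```

    The result is U+00E2, U+20AC, U+2122, or â€™. Another code page's table turns the same bytes into a different but equally wrong string.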

    If a web page is showing such sequences, this is usually caused by incorrect charset meta tag info on the page, incorrect header info from the server, incorrect code page detection on the client, or some combination of those issues....

    If the problem occurs with other code pages, the exact representation will be different.

    There are slight differences in some of them, but it helps point out why strange garbage character sequences are often just a failure to properly detect UTF-8....

     

    This post brought to you by ’ (U+2019, a.k.a. RIGHT SINGLE QUOTATION MARK)


    For the [locale] explorer in you....

    • 7 Comments

    One of the big problems with presenting about locales, for example the updated ones in Vista, is that they can be hard to demo -- I mean, how can you show them all, or show what is in each one?

    In many prior posts, MSDN Magazine articles, and presentations, I have been using a slightly modified version of Francois Liger's Culture Explorer sample (I did the minimal modifications to have Windows-only cultures and custom cultures show up).

    Thankfully, I can quit using my hacked up version, since Francois has updated his sample and Culture Explorer 2.0 is now available!

    (24 downloads since the update went up this afternoon, let's see if we can bump that number up a bit!).

    Now if a picture is worth 100 words, then this app is worth 1000 pictures. :-)

    Whether one is looking at the way you can zoom on any text to help make the Lao text more legible:

    or whether one is looking at the hilarious day and month names in the Valley Girl custom locale:

    Or maybe at its number and currency info:

                    

    Maybe the Inuktitut day names will catch your fancy?

    Though I am partial to the Mongolian month names, myself:

    Whether one is staggered by the Oriya full date time pattern:

    or the Uighur native currency name:

    For some it will always be about the Klingon:

    Though the fact that the About... button will get you the link to the email address of the guy who put this one together may be the coolest part for people who have even more suggestions for version 3.0!

    Definitely worth picking up to see Vista's locale support in all its glory, as well as the .NET Framework's support of cultures, which is even more impressive on Vista....

    Whether you want to admire the fascinating typography, the interesting way Francois added support for genitive dates, or the source code so you can see how to do some of it yourself, or whether you noticed the interesting bug hidden in the screenshots above that is either in Windows or the .NET Framework (though I am betting on Windows in this case, and I'll explain why I think so before I even investigate the problem in a day or two!), there is something for everyone in here....

    Enjoy!

     

    This post brought to you by ᠱ (U+1831, a.k.a. MONGOLIAN LETTER SHA)


    What's wrong with what FxCop does for globalization, Part 1


    It was way back in December of last year that I was talking about problems with FxCop and globalization (in posts like this one and this other one).

    Then I sort of forgot about it.

    Sorry, I have been busy. :-)

    But then, over the course of a month, I was once again reminded of these problems by three large projects that were (for lack of a better word) assaulted by them.

    If you are a fan of the shows from the BuffyVerse (Buffy the Vampire Slayer and Angel), then you know that one of the rules with vampires is that they can't come into your home unless you invite them in.

    Well, FxCop is the same way -- you have to run it and then do something to follow its advice; if you don't, then nothing bad can befall you. :-)

    Here is how the FxCop experience works:

    1) You find FxCop and run it.

    2) You don't understand the errors at all, since they seem wrong; then you discover that you are running the wrong version of FxCop for your project.

    3) Repeat steps 1 and 2 another time or two until you get the right version.

    4) Look at the errors and warnings. There are a bunch of level 1 and level 2 errors, so you decide for this version to leave the level 3 and level 4 errors alone¹.

    5) You ship your product.

    6) In the space between ending development work on the version you just shipped and starting the development on the next version, someone decides to tackle all of the globalization errors.

    Here is where the trouble happens, where some well-meaning developer has decided to let the vampires in, and indeed to help to extract the blood for them from your project.

    The specific warnings are described here. There are really just three of them, though they can easily lead to hundreds or even thousands of occurrences in a single project. They apply to every call to a method that has an overload taking a CultureInfo or an IFormatProvider:

    • Set locale for data types
    • Specify CultureInfo
    • Specify IFormatProvider

    The problems with the warnings are twofold:

    1) FxCop assumes that if you call a method that does not let you specify an IFormatProvider or a CultureInfo, or you use an overload that does not let you do so, your usage is wrong.

    2) FxCop shuts up if you specify such a thing, even if it will return results that are entirely wrong (like passing CurrentUICulture for formatting) or will be ignored (like passing a DateTimeFormatInfo-style IFormatProvider when one is formatting numbers).

    #1 can make the code less readable even when it was correct before. The problems with #2 are, I think, self-explanatory; and when one considers how easy it is to add CurrentUICulture to get rid of 90% of the warnings even though the answer is almost always wrong, you can see why I start ascribing fiend-like attributes to the FxCop vampire we invited in.

    In one such project, between the time that Vista was released and the beta 3 date of Server 2008, someone had updated the code in 250 different places, almost all of them wrong, and FxCop was happy with each mistake since it assumes that if you take the time to specify an IFormatProvider or a CultureInfo that you must have chosen one correctly, even if the choice is dead wrong.

    Needless to say, Kathy, the developer left holding the bag on this one (she was not the well-meaning developer who introduced the regressions), was able to bring the product back to health in short order once we walked through the very basic rules by which one could intelligently decide what the right fix should have been.

    Unfortunately, not everyone is as skilled as she is, and not everyone has a team like the International Fundamentals team to ask for help.

    But Kathy was kind enough to let me talk about the project as long as I didn't identify it and as long as I gave the rules that she was able to use to quickly bring the code back to a state of readiness. Here they are:

    • String comparison:
      • String comparison for equality -- Specify StringComparison.Ordinal or OrdinalIgnoreCase. This is true when comparing against string resources, and particularly important when comparing file names.
      • String comparison for sorting – Specify StringComparison.CurrentCulture when sorting lists
    • String.ToUpper – Use ToUpper rather than ToLower, and specify InvariantCulture in order to pick up OS casing rules
    • Determining BiDi languages, for example to load correct icons – Continue to specify CurrentUICulture for these cases.
    • String formatting including numbers, and int.ToString – Specify CurrentCulture
    • String formatting of strings only (no numbers) – the Culture arg here is a no-op.  Use CurrentCulture.
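    One way to express those rules in C# is sketched below. The helper names (IsSameFileName, SortForDisplay, NormalizeKey, FormatCount, UiIsRightToLeft) are hypothetical and invented for illustration. What matters is which StringComparison or CultureInfo argument each kind of call receives. The last rule, string-only formatting where the culture argument is a no-op, needs no code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Equality (file names, string resources): ordinal, here ignoring case.
bool IsSameFileName(string a, string b) =>
    string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

// Sorting a list shown to the user: the user's current culture.
List<string> SortForDisplay(IEnumerable<string> items)
{
    var list = new List<string>(items);
    list.Sort((x, y) => string.Compare(x, y, StringComparison.CurrentCulture));
    return list;
}

// Case normalization: ToUpper (not ToLower) with the invariant culture.
string NormalizeKey(string key) => key.ToUpper(CultureInfo.InvariantCulture);

// Formatting numbers for display: CurrentCulture, not CurrentUICulture.
string FormatCount(int n) => n.ToString(CultureInfo.CurrentCulture);

// Detecting a right-to-left UI language (e.g. to load mirrored icons): CurrentUICulture.
bool UiIsRightToLeft() => CultureInfo.CurrentUICulture.TextInfo.IsRightToLeft;

Console.WriteLine(IsSameFileName("README.TXT", "readme.txt"));
Console.WriteLine(string.Join(", ", SortForDisplay(new[] { "pear", "Apple", "banana" })));
Console.WriteLine(NormalizeKey("file.txt"));
Console.WriteLine(FormatCount(1234567));
Console.WriteLine(UiIsRightToLeft());
```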

    But getting back to the problem with FxCop, it does not do enough real analysis of the usage. And as long as FxCop wants to warn that the code below is bad (switch to "English (United Kingdom)" in Regional Options to see the differences):

    Console.WriteLine(DateTime.Now.ToString());

    While it considers the code below, which returns identical results and is just completely wrong and weird, to be somehow fine:

    Console.WriteLine(DateTime.Now.ToString(NumberFormatInfo.InvariantInfo));

    When the intended code, which returns different results, would be:

    Console.WriteLine(DateTime.Now.ToString(CultureInfo.InvariantCulture));

    Then FxCop for globalization errors has a net effect of leaving the code in a possibly still incorrect and in all likelihood less readable form....
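    The three calls can be compared directly. To keep the results independent of the machine's Regional Options, this C# sketch installs a stand-in current culture. It is a clone of the invariant culture with a day-first short date pattern, roughly what "English (United Kingdom)" would give. The culture setup and the fixed date are assumptions made for the demo.

```csharp
using System;
using System.Globalization;

// Stand-in for a user whose short date pattern is day-first.
var userCulture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
userCulture.DateTimeFormat.ShortDatePattern = "dd/MM/yyyy";
CultureInfo.CurrentCulture = userCulture;

var timestamp = new DateTime(2007, 3, 14, 15, 9, 26);

// 1. What FxCop flags: implicitly uses the current culture.
string implicitCulture = timestamp.ToString();

// 2. What FxCop accepts: NumberFormatInfo carries no date information,
//    so DateTime ignores it and falls back to the current culture anyway.
string ignoredProvider = timestamp.ToString(NumberFormatInfo.InvariantInfo);

// 3. What was probably intended: really use the invariant culture.
string invariant = timestamp.ToString(CultureInfo.InvariantCulture);

Console.WriteLine(implicitCulture);   // day-first
Console.WriteLine(ignoredProvider);   // identical to the line above
Console.WriteLine(invariant);         // month-first
```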

    To spread the blame around a bit, the .NET Framework decision to have the IFormatProvider parameter be one that is implemented as "if the param applies then use it, otherwise treat it as null" is also slightly boneheaded, since that is the sort of thing that can easily lead to the wrong code being put in there in the first place.

    But it is too late to fix all of those methods, so FxCop is our last line of defense, and the tool's strength at defending against these scenarios is quite limited....
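    The "use it if it applies, otherwise treat it as null" behavior can be observed without any custom types, because each format-info class answers GetFormat only for its own type. This C# sketch again uses a hypothetical cloned culture as the current culture, this time with a comma decimal separator. It shows that a DateTimeFormatInfo passed to a number's ToString is silently ignored, which is exactly the case FxCop cannot catch.

```csharp
using System;
using System.Globalization;

// Stand-in for a user whose decimal separator is a comma.
var userCulture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
userCulture.NumberFormat.NumberDecimalSeparator = ",";
CultureInfo.CurrentCulture = userCulture;

// Formatting code asks its IFormatProvider for the one format type it needs.
// NumberFormatInfo and DateTimeFormatInfo each answer only for their own type.
object dateInfoFromNumberInfo = NumberFormatInfo.InvariantInfo.GetFormat(typeof(DateTimeFormatInfo));
object numberInfoFromDateInfo = DateTimeFormatInfo.InvariantInfo.GetFormat(typeof(NumberFormatInfo));
object numberInfoFromCulture = CultureInfo.InvariantCulture.GetFormat(typeof(NumberFormatInfo));

// A null answer is treated like passing no provider, so the current culture wins.
string implicitCulture = 1234.5.ToString();
string ignoredProvider = 1234.5.ToString(DateTimeFormatInfo.InvariantInfo);
string invariant = 1234.5.ToString(CultureInfo.InvariantCulture);

Console.WriteLine(dateInfoFromNumberInfo == null);
Console.WriteLine(numberInfoFromDateInfo == null);
Console.WriteLine(numberInfoFromCulture is NumberFormatInfo);
Console.WriteLine($"{implicitCulture} | {ignoredProvider} | {invariant}");
```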

     

    1 - The bulk of the globalization errors are in those lower levels, I assume not because globalization isn't important but because there are a lot of false positives, so averaged together they are less important per warning.

     

    This post brought to you by ခ (U+1001, a.k.a. MYANMAR LETTER KHA)


    When it is harder to fit your calendar into things than things into your calendar


    Support Engineer Scott Heim had a question he asked yesterday:

    Hi all,

    I have the MonthCalendar control on a form and when this is displayed in XP, the calendar displays correctly; however, on Vista machines the calendar appears larger and the “Saturday” dates are cut off. Has anyone seen this before? Is there a way around this?

    The form the control is on is not much larger than the control itself. I have a small repro here:

    [I compiled the repro and ran it on both Server 2003 and Vista to take the screen shots below -- michkap]

         

    Thanks,

    Scott

    And support engineer Dave Anderson came to the rescue with the following response:

    Yes, the MonthCalendar control is larger when using the V6 common controls on Windows Vista. You can adjust the size of the form based on the size of the control at runtime. I added the following code for the form’s Load event handler:

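            private void Form1_Load(object sender, EventArgs e)
            {
                // Fit the form's client area to the calendar's preferred size, which is
                // larger under the V6 common controls on Windows Vista.
                this.ClientSize = monthCalendar1.PreferredSize;
            }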

    -Dave

    And indeed, when you add this code things fit once again:

    Perfect. :-)

    Now obviously this is a special case (a form that is meant to be the same size as the calendar), but the general principle applies in situations where controls are packed too tightly and changing the size might affect a localized form by causing controls to overlap (definitely something to avoid).

    One thing developers should be very careful about any time they are building dynamic UI metrics this way in projects that are going to be localized is to make sure that the fact that the UI metrics change at runtime is communicated to the localizer -- there are few things more frustrating than truncation bugs that a localizer can't do anything about, especially when it takes multiple iterations to discover that fact!

    And now that I have hijacked the question to get up on my localizability soapbox, I'll close with a message of more general use. :-)

    The message? The fact that the Shell common controls do not guarantee backward compatibility with their metrics is an important issue to keep in mind -- or you could find yourself getting truncated, too....

     

    This post brought to you by ⺦ (U+2ea6, a.k.a. CJK RADICAL SIMPLIFIED HALF TREE TRUNK)


    What's wrong with what FxCop does for globalization, Part 0.5 (a segue)


    I thought I'd talk about the FxCop issue from a slightly different standpoint, and discuss something that has nothing to do with FxCop to give an example of my concerns.... 

    If you look at Writing Culture-Safe Managed Code (a .NET Framework Deployment white paper), you'll see a good and typical picture of how a technically savvy person might approach supporting international code without really trying to delve too deep into it (for an example of what I mean, see the section entitled Other Countries, where a quick enumeration of cultures to "worry" about is given!).

    Incidentally, you may be amused by the section ironically entitled Incorrect Code Example, where you will see one of the earlier beliefs: that CurrentCulture was evil for string comparisons but InvariantCulture was a good idea, something this blog has gone to some trouble to debunk since then.... :-)

    Anyway, if you scroll down a bit, you will see a discussion of the Turkish I (a popular devil when one is trying to talk about culture-safe coding practices!). But the text, which names the Unicode code points for the dotted uppercase and dotless lowercase I (U+0130 and U+0131), actually (presumably unintentionally) shows the capital and small Y with acute (U+00dd and U+00fd):

    What's up with that?

    Well, if you look at the definitions for Windows code page 1252 and Windows code page 1254, you'll see part of the problem: at the same byte values (0xDD and 0xFD), 1254 defines the Turkic I additions while 1252 defines the Y with acute.

    Of course that only tells part of the story. The page itself is encoded as UTF-8, so trying to change to either of these other two encodings will mess up the page:

     

    So what is going on here?

    The most likely problem is that some tool or application that produced the document did not save it as Unicode but instead as the Turkish code page, and then later some other tool, in converting it to Unicode simply assumed it was cp 1252. The text is therefore corrupted at this point, with no clean way to fix it.
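    The suspected round trip is easy to reproduce. This C# sketch assumes .NET Core 3.0 or later for CodePagesEncodingProvider. It saves the two Turkish letters in code page 1254 and then decodes those bytes as code page 1252, producing the same Y-with-acute pair seen in the paper.

```csharp
using System;
using System.Text;

// Make legacy Windows code pages available (needed on .NET Core 3.0+ / .NET 5+).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding turkish = Encoding.GetEncoding(1254);   // Windows Turkish
Encoding western = Encoding.GetEncoding(1252);   // Windows Western European

string original = "\u0130\u0131";                // dotted capital I, dotless small i

// Saved by one tool in the Turkish code page...
byte[] saved = turkish.GetBytes(original);        // 0xDD, 0xFD

// ...then converted to Unicode by another tool that assumed cp 1252.
string converted = western.GetString(saved);      // U+00DD, U+00FD (Y with acute)

Console.WriteLine(BitConverter.ToString(saved));
Console.WriteLine(converted);
```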

    The paper itself reminds me somewhat of that .NET Framework Developer's Guide: Custom Case Mappings and Sorting Rules topic I have discussed previously, in that neither one of them helps with international awareness; they are both written mostly from the standpoint of international mitigation, of how to protect your app from the world.

    In my opinion, this is unfortunately the biggest problem in what FxCop does, the problem underlying the issues I was talking about in What's wrong with what FxCop does for globalization, Part 0. The final result that people seem to most often work toward after reading these pages or running this tool is to "culture proof" their code much more than any kind of attempt to properly support other cultures or enable an application to do so.

    I am not going to blame FxCop here, as I think it is really the many surrounding documents and topics that are directing the effort, like the ones I gave as examples here....

    For the next post of the series, I'll start moving into my suggested solutions. :-)

     

    This post brought to you by ෞ (U+0DDE, a.k.a. SINHALA VOWEL SIGN KOMBUVA HAA GAYANUKITTA)


    Something .NET does less intuitively than they ought


    If you are someone who reads the BCL Team blog, you may have seen Josh Free's String.Compare() != String.Equals() that he just posted.

    Of course this is old hat if you are a regular reader here and remember seeing Invariant vs. Ordinal, the third or Something .NET does more intuitively than Windows, both posted last year.

    Even just today, I was asked by someone to provide some comments for this MSDN topic to clarify something in the comments for the String.Compare Method (String, String, Boolean) overload.

    The specific question was which StringComparison enumeration member corresponds to that third ignoreCase parameter. That is not clear to everyone. But then, that whole overload wasn't necessary (more on this in the future).

    And I just realized that I am sick of all of the extra overloads off of System.String, the StringComparison enumeration, the StringComparer class, and all of the rest of the confusing methods that are there, all of which should have and could have been replaced with a simple usage of CompareInfo for linguistic comparisons.
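    For the record, String.Compare(String, String, Boolean) is a culture-sensitive comparison using the current culture. So ignoreCase = true corresponds to StringComparison.CurrentCultureIgnoreCase, and false corresponds to StringComparison.CurrentCulture. This C# sketch spells the same request three ways, including the plain CompareInfo form. The current culture is pinned to the invariant culture only so that the results are repeatable.

```csharp
using System;
using System.Globalization;

CultureInfo.CurrentCulture = CultureInfo.InvariantCulture; // fixed culture so the sketch is repeatable

string a = "apple", b = "APPLE", c = "Banana";

// The overload in question: String.Compare(String, String, Boolean).
int viaBoolean = string.Compare(a, b, true);

// The same request spelled with the StringComparison enumeration.
int viaEnum = string.Compare(a, b, StringComparison.CurrentCultureIgnoreCase);

// And spelled directly with the current culture's CompareInfo.
int viaCompareInfo = CultureInfo.CurrentCulture.CompareInfo.Compare(a, b, CompareOptions.IgnoreCase);

Console.WriteLine($"{viaBoolean} {viaEnum} {viaCompareInfo}");
```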

    The whole reason methods kept getting added is that although they found the one method with an enumeration confusing, they found the one method with no options too limiting. So they started adding overloads and methods and named them such that no one could ever know which one to use without reading a fifteen-page document that no one understands, not even the really smart developers.

    "But Michael," they tell me, "the System.Globalization namespace is not referenced by default." This is an argument I refuse to buy, since every time there is an important feature, it does get added by default, like Generics did in VS 2005. So System.Globalization is clearly not important enough to include, but it is important enough to wrap dozens of different ways that no one understands.

    Grrrr.

    Ok, I am over it now....

    That is part of letting go when you work in new areas like I do now -- you have to try to not jump into the old area all the time if you think people might be doing something that you don't care for....

    Though on the plus side, I do get to help out some teams work through these issues (I'll blog about this soon), and I suppose I still get to complain here.... :-)

     

    This post brought to you by Щ (U+0429, a.k.a. CYRILLIC CAPITAL LETTER SHCHA)


    Pluralization(s) can be singularly difficult


    A tribute to plurals, with fondest memories of the first comedian I ever enjoyed, Allan Sherman (original inspiration of Weird Al Yankovic for those who don't know the name):

    One Hippopotami

    One hippopotami cannot get on a bus,
    Because one hippopotami is two hippopotamus.
    And if you have two goose, that makes one geese.
    A pair of mouse is mice. A pair of moose is meese.

    A paranoia is a bunch of mental blocks.
    And when Ben Casey meets Kildaire, that's called a paradox.
    When two minks fall in love, with all their heart and soul,
    You'll find the plural of two minks is one mink stole.

    Singulars and plurals are so different, bless my soul.
    Has it ever occurred to you that the plural of "half" is "whole"?

    A bunch of tooth is teeth. A group of foot is feet.
    And two canaries make a pair--they call it a parakeet.
    A paramecium is not a pair.
    A parallelogram is just a crazy square.

    Nobody knows just what a paraphernalia is.
    And what is half a pair of scissors, but a single sciz?
    With someone you adore, if you should find romance,
    You'll pant, and pant once more, and that's a pair of pants

    Pluralization is hard.

    Even in English you need a huge dictionary with all of the weird and interesting exception cases (once you convince yourself that sticking an "s" on the end of every word won't do it).

    It came up again in multiple comments by Centaur on In a much better position to handle inserts, like this one and this other one.
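    To make the dictionary point concrete, here is a deliberately tiny C# sketch of an English pluralizer. It checks a hypothetical exception table first and then falls back on a few common suffix rules. It is nowhere near complete (that is the point), and the word list is an illustrative assumption.

```csharp
using System;
using System.Collections.Generic;

// A tiny sample of the "huge dictionary" of irregular plurals (hypothetical, far from complete).
var irregular = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["goose"] = "geese",
    ["mouse"] = "mice",
    ["tooth"] = "teeth",
    ["foot"] = "feet",
    ["moose"] = "moose",   // not "meese", whatever Allan Sherman says
    ["sheep"] = "sheep",
};

string Pluralize(string noun)
{
    if (irregular.TryGetValue(noun, out var plural))
        return plural;

    // Sibilant endings take "es": box -> boxes, church -> churches.
    foreach (string suffix in new[] { "s", "x", "z", "ch", "sh" })
        if (noun.EndsWith(suffix, StringComparison.Ordinal))
            return noun + "es";

    // Consonant + y becomes "ies": city -> cities (but day -> days).
    if (noun.Length > 1 && noun.EndsWith("y", StringComparison.Ordinal) &&
        "aeiou".IndexOf(noun[noun.Length - 2]) < 0)
        return noun.Substring(0, noun.Length - 1) + "ies";

    return noun + "s";
}

// English uses the singular only for exactly one; zero takes the plural.
string Describe(int count, string noun) => $"{count} {(count == 1 ? noun : Pluralize(noun))}";

Console.WriteLine(Describe(1, "goose"));
Console.WriteLine(Describe(2, "goose"));
Console.WriteLine(Describe(0, "box"));
Console.WriteLine(Describe(3, "city"));
```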

    And his examples were not really overstatements, believe it or not.

    We'll start with the obligatory Wikipedia link on pluralization, which will help to scratch the surface enough to make one realize what one has just gotten oneself into.

    Languages like Spanish are considered pretty simple but can still fill a page with the explanation of them.

    English is reasonably simple with its cases for one, many, and uncountable (where the uncountables are usually in singular form, and zero items take the plural form). But all of the rules for subject/verb agreement are the main reason I didn't pay attention in English class as a youth and was much more interested in the weird rules of other languages than the rules of my own. You could almost blame my linguistic notions on the crazy orgy of inconsistencies embodied in my native tongue.

    And then in French things are pretty simple (ref: here, here, and here). Then again, those pages list exceptions up the wazzoo. Oh, and zero items take a singular form, which also sounds weird to me, though someone from France would find the converse to be true.

    Then there is Hebrew, whose uncountable words tend to take plural form. Oh, and they add gender to the mix as many others do, each with a different suffix. Then there are those bisexual words like "one" which have both a masculine and feminine suffix form. And some words that are feminine yet take a masculine plural suffix.

    Most Indic languages have singular, dual, and plural forms, though Hindi only has singular and plural while Sanskrit has the dual form too.

    Lots of other Indo-European languages also have a dual form.

    Polish has singular and plural like most of them, but then it also has a paucal form for when the last digit is 2, 3, or 4 (not including 12, 13, or 14).
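    The Polish rule just described is small enough to write down. In this C# sketch, PolishForm is a hypothetical helper, and "plik"/"pliki"/"plików" (the forms of the word for "file") are example inputs. Note that only exactly 1 takes the singular, so 21 falls into the general plural.

```csharp
using System;

// Hypothetical helper: picks one of three Polish forms for a non-negative integer count.
string PolishForm(int n, string one, string few, string many)
{
    if (n == 1)
        return one;                         // exactly 1 (21 does not count)

    int lastDigit = n % 10;
    int lastTwo = n % 100;
    if (lastDigit >= 2 && lastDigit <= 4 && (lastTwo < 12 || lastTwo > 14))
        return few;                         // 2-4, 22-24, 32-34, ... but not 12-14

    return many;                            // 0, 5-21, 25-31, ...
}

foreach (int n in new[] { 1, 2, 5, 12, 22, 25, 104, 112 })
    Console.WriteLine($"{n} {PolishForm(n, "plik", "pliki", "plików")}");
```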

    Persian (or is it Farsi? Or maybe not!) has many rules a lot like English, other than the influx of Arabic loan words that come with their plurals and make up a lot of exceptions -- which, come to think of it, is also a lot like English. Though with different loan words (and of course the different script).

    And Slovenian has a special purpose "dual" that is used for all numbers ending in two.

    To put it in programming terms, Jeff Boulter has talked about it in his 5 way(s) to pluralize, and I just noticed that Tom White, who also quoted a bit of that Allan Sherman song, has even made a plea for people with knowledge of other languages to get involved with Java solutions here.

    But C# is out there too -- see dmitryr's Simple English Noun Pluralizer in C#, for example, which has a couple of great comments that delve into additional exceptions and other languages.

    Or fun ones like Bradley Tetzlaff's C# 2.0 Ninety Nine Bottles of Beer Example, which shows a very important practical implementation. :-)

    Even my own IStemmer'ed the tide talks about how stemming is involved with pluralization (among other things).

    The rules are very complex even to get any one language done perfectly, so doing lots of languages is staggering.

    Definitely a hard problem to consider. I think I'll leave that one alone, myself, and just try and stem some tides (leaving the stemmering to others!).

     

    This post brought to you by S (U+0053, a.k.a. LATIN CAPITAL LETTER S)


    Look out for Font Rage


    It is a known fact that some people hate Comic Sans MS (why else have a website like http://www.bancomicsans.com/ if everyone loved it?).

    Though as Mark Liberman pointed out yesterday in Language Log in his post entitled Font Rage, some people are choosing to be pretty extreme.

    I happen to like the font -- it is my main font in email in Outlook, and just like last year I still pine for the day that they make Comic Sans Fixed a reality:

     

    This despite the fact that holding my breath waiting would likely prove fatal....

    I'll dig up another instance of font rage in a few hours.

     

    This post brought to you by ආ (U+0d86, a.k.a. SINHALA LETTER AAYANNA)


    Sad? You can sit at your console and console yourself by putting Consolas in your console


    Several people have asked me about using Consolas in CMD. Why isn't it in the list automatically, if they have the font, some of them have wondered....

    The requirements for a font to appear in the console on Windows are described in the article Necessary criteria for fonts to be available in a command window.

    Also in that article is the way to force the font in there on all of the non-Win9x based versions of Windows. Basically you can create a .REG file with the following text in it:

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont]
    "00"="Consolas"

    Click the file to write to the registry, and then reboot. You can now use Consolas in the console. :-)

     

    This post brought to you by ∞ (U+221e, a.k.a. INFINITY)


    How to turn off the CAPS LOCK key


    Earlier today, I talked about the fact that some people hate the CAPS LOCK key. Well, Jeff D. posted the steps to turn the CAPS LOCK off in Windows XP and Server 2003, and I thought that I would repost them so that others who are not looking at the comments would benefit:

    The other solution I've used is to simply change the cancellation of CAPS LOCK to be done with the Shift key instead. sO THAT i CAN'T ACCIDENTALLY DO THIS.

    1. Control Panel | Regional and Language Options applet | Languages tab | Details button, which gets you to the "Text Services and Input Languages" dialog.
    2. On that page "Add" a second input language (you don't have to make it the default). The two (or more) I usually have installed are "English (Canada) - US" and "English (United States) - US". Once you've got more than one input language installed, the "Key Settings..." button becomes enabled: click it. That opens the "Advanced Key Settings" dialog. Right at the top you'll see:
    3. To turn off CAPS LOCK: [ ] Press the CAPS LOCK key [x] Press the SHIFT key

    Voila! Now if you have accidentally hit the CAPS LOCK key, as soon as you start typing the next sentence, which presumably starts with a capital letter, you'll cancel the CAPS LOCK immediately.

    Very cool.... thanks, Jeff!

     


    Explaining the Windows XP/Server 2003 Regional and Language Options Dialog


    (This page was originally posted at http://i18nWithVB.com/win2k.htm but I thought it could use a wider audience)

    Explaining the Windows XP/Server 2003 Regional and Language Options Dialog

    Disclaimer: This page is not officially sanctioned by Microsoft. If it were, it would almost certainly be more polite. :-)

    A lot of work was done to this dialog since Windows 2000, including massive shifts in terminology. Here is the handy-dandy conversion chart of the most important items:

    Windows 2000 term                Windows XP/Server 2003 term
    Regional Options                 Regional and Language Options
    Default User Locale              Standards and Formats
    Default System Locale            Language for Non-Unicode Programs
    Language Settings for System     Supplemental Language Support

    There are many other changes, as well. While I do welcome change when there are confusing issues, I am not sure how much I welcome change that others will find to be just as confusing. I'll let you decide how you feel about this particular issue yourself....

    Anyway, here are some screenshots for the three important tabs for the dialog:





    The first change is obvious -- the settings that used to show up on the first tab are now spread across three of them.

    Here is each part, explained:

    Language for Standards and Formats - Located in the first tab, these are the preferences that you, the user, have for items like date formats, calendar, preferences for text sorting, etc. Most of these settings can now be handled individually by clicking on the Customize button. You can think of this dropdown combobox as a useful way to be lazy and have settings made automatically based on the locale you choose. There are really no standards per se involved, but not everything there is a format (sorting, for example), so the name needed something besides "Formats" in it.

    I will talk about the Location stuff some other time.

    Supplemental Language Support -- In Windows 2000 this was a list containing various families of locales corresponding to language groups, but now most of the support is already installed and turned on. In fact, there are only two groups that are not:

    1. Languages that require complex script support, including Thai, Hebrew, Arabic, Hindi, Tamil, and all the rest of the Indic locales
    2. East Asian languages, i.e. Chinese, Japanese, and Korean

    This information is in the second tab and is handled by two checkboxes. These two checkboxes control the installation of all the code pages, fonts, keyboards, etc. so that applications can support the particular language. You will probably be prompted for your Windows CD to install the files that you are in essence requesting.

    The top of the second tab handles input methods. I will talk about that more another time.

    User Interface Language - You may not have this control on your regional options at all; it is only there if you have MUI (the Multilingual User Interface) installed. This allows you to change the actual language of Windows itself. It has no effect, I repeat no effect, on your installation of Windows otherwise. At all. Period. If you think it will, then cure yourself of this delusion and realize that you do not need MUI to have a multilingual experience on Windows XP and Server 2003!

    Language for Non-Unicode Programs (aka Default System Locale) - Located on the third tab, this setting is the one that controls, at the machine level, the locale that will be used for all conversions to and from Unicode for applications without Unicode support (like VB 6.0, for example). If you change the Default System Locale, you will be prompted to reboot afterwards (you may be prompted for your Windows CD first if you need to install some files). But I cannot stress it strongly enough: this is the top control on the third tab. You would not believe how many people mess this up and try to change the language at the top of the first tab under "Standards and Formats"! So think carefully and allow yourself to be one of the people laughing about the confusion, rather than one of the people being laughed at.
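    To illustrate why that one setting matters so much, this C# sketch decodes the same three bytes with two ANSI code pages, 1252 (Western European) and 1251 (Cyrillic). These stand in for two different Default System Locale choices. The sketch only simulates the conversion and does not read the machine's actual setting. It assumes .NET Core 3.0 or later for CodePagesEncodingProvider.

```csharp
using System;
using System.Text;

// Make legacy Windows code pages available (needed on .NET Core 3.0+ / .NET 5+).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// The same bytes a non-Unicode program might hand to Windows.
byte[] ansiBytes = { 0xC0, 0xE1, 0xE2 };

// What they turn into depends entirely on which ANSI code page is in effect.
string asWestern = Encoding.GetEncoding(1252).GetString(ansiBytes);
string asCyrillic = Encoding.GetEncoding(1251).GetString(ansiBytes);

Console.WriteLine(asWestern);
Console.WriteLine(asCyrillic);
```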

    Incidentally, it also controls the font "language" that is used for the case of [primarily] East Asian fonts that have more than one name, based on language.

    Under this are the various code pages you can install. I recommend you use Unicode and avoid needing these things. :-)

    Default Settings - Although the title is misleading, this checkbox located on the bottom of the third tab is incredibly useful in many situations. What it does is apply any changes you make on any of the three tabs to .DEFAULT, the default user profile (copied for all new user accounts), and several system accounts. In the case of keyboards, it copies all keyboards that have been selected by the given user, whether they were selected at this time or not, and applies them to the .DEFAULT account. The latter is very useful if you want the ability to switch keyboards in the logon dialog.

    This setting does not exist in prior versions, which is a damn shame, since people try all the time to e.g. set the default user locale on a web server and expect that change to be applied to their IIS. It does not immediately occur to most people that the setting only applies to the currently logged-in user; unfortunately, understanding this is likely to piss off any reasonable person, since Windows 2000 does not provide any user interface to resolve the issue. Thankfully, much of the problem is taken care of with this one confusing setting.

    That's all for now. Let me know if you have any questions or comments about this page!
