December, 2006

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    What's wrong with what FxCop does for globalization, Part 0

    • 7 Comments

    Some people really like the visibility that globalization gets with FxCop in the managed world.

    I have a net neutral feeling about it, myself. I mean, I like the visibility, I am just not sure that it is improving the behavior much. And I am sure that it is exposing a flaw in the way many of the methods are put together since it makes some code feel less readable to people at times.

    Like the other day, when Navid asked:

    What are the benefits or possible disadvantages of doing the following (assume b is any non-string object):

        string a = string.Concat(b);

    Instead of:

        string a = b.ToString();

    In actuality, to adhere to FxCop guidelines, you’ll need to specify the IFormatProvider when using the ToString() method. Therefore, the correct statement should be:

        string a = b.ToString(CultureInfo.InvariantCulture);

    The first thing to notice is that if b is null, the ToString() method will throw an exception whereas the first statement will work without problems. This may save you a null-check in certain circumstances. Furthermore, since no IFormatProvider is necessary, you don’t have to import System.Globalization into your class either.

    However, I suppose some disadvantages may be that using Concat() instead of ToString() may not be as clear for some readers. Also, if you need to use a specific CultureInfo, then ToString() may be the obvious choice. With respect to performance, I have no idea on the implications but would guess that ToString() is probably slightly faster. It may be worth looking at the IL to see what the differences actually are.

    In any regard, I find the first statement more pleasing but I am unsure if it’s actually The Right Thing To Do. I am more or less using the first statement as a shorthand for:

        string a = string.Empty;
        if (b != null)
        {
            a = b.ToString(CultureInfo.InvariantCulture);
        }

    Now it is unfair to blame FxCop here -- it is making a very valid point, stating (in SDET David Kean's words) "Make sure you pass an IFormatProvider to b.ToString() to make it clear and explicit what IFormatProvider is being used. If you do not specify one, the default in most cases is CultureInfo.CurrentCulture, which will cause the output of ToString to vary depending on the user’s currently selected locale."

    Using System.String.Concat, which is clearly the wrong method from an intent standpoint, rather than System.String.ToString, just to avoid the other two evils (an FxCop warning or an unhandled exception) is a clear indication that in this case, FxCop has no choice but to either encourage bad behavior or encourage developers to use unclear code constructs.

    Has the actual issue in question been addressed? Is the code more properly globalized?

    In my opinion the only thing that happens to globalization in this case is that developers who wouldn't know Unicode from UNICEF have every reason to feel like globalization is a four letter word.

    And it hurts the reputation of the tool too -- I mean, how did everyone feel about the kid in high school that made them do some silly thing like not run in the halls? Did they feel more educated? Or did they just decide that the person handing out citations is just really annoying?

    (apologies if you were one of those who used to hand out the warnings, but many people did find you to be kind of annoying!)

    Now some people suggest that the problem is that the wrong defaults were chosen in the beginnings of the .NET Framework (Anthony Moore is one of the more prominent holders of this opinion). But if the problem is that people don't like adding a StringComparison.OrdinalIgnoreCase to their String.Compare calls, or alternately if the problem is that they do not find it intuitive to use String.CompareOrdinal, then the bug is not that OrdinalIgnoreCase isn't the default comparison type. Such a default would just mean that we'd be complaining about a whole different set of bugs when people blindly used a default that did not match their scenarios.

    The bug, to the extent that there is a bug, is in the naming scheme being so unintuitive that there is no way on earth that any human would have expected the "preferred" terms to be used unless they either read documentation or blog posts like this one that suggested it to them. And as much as I may think of my little fruit stand here, I am really not quite ready to assume that it is thought of as "intuitive" to read this blog before one writes code! :-)

    Similarly, I do not think the developers who think there ought to be an OrdinalCulture of some type are retarded; I simply think that they are trying to work within a system that simultaneously suggests the importance of always specifying a culture and always using Ordinal comparisons. The developer who is wondering where he can find the OrdinalCulture is arguably the only person involved who doesn't seem to be developmentally disabled, if you know what I mean.

    Now with that said, I believe that there are things that can be done in all of the following to lead to a place where code could be better and globalization might not be thought of along with various four letter words:

    • FxCop
    • Globalization methods/properties within .NET
    • Documentation

    and I'll try blogging about some of my thoughts about what could be done in each area.

    You can think of this post as the introduction to the topic.... :-)

    This post brought to you by ෝ (U+0DDD, a.k.a. SINHALA VOWEL SIGN KOMBUVA HAA DIGA AELA-PILLA)

  • Sorting it all Out

    For the [locale] explorer in you....

    • 7 Comments

    One of the big problems with presenting about locales, for example the updated ones in Vista, is that they can be hard to demo -- I mean, how can you show them all, or show what is in each one?

    In many prior posts and in MSDN Magazine articles and presentations, I have been using a slightly modified version of Francois Liger's Culture Explorer sample (I did the minimal modifications to have Windows-only cultures and custom cultures show up).

    Thankfully, I can quit using my hacked up version, since Francois has updated his sample and Culture Explorer 2.0 is now available!

    (24 downloads since the update went up this afternoon, let's see if we can bump that number up a bit!).

    Now if a picture is worth 100 words, then this app is worth 1000 pictures. :-)

    Whether one is looking at the way you can zoom on any text to help make the Lao text more legible:

    or whether one is looking at the hilarious day and month names in the Valley Girl custom locale:

    Or maybe at its number and currency info:

                    

    Maybe the Inuktitut day names will catch your fancy?

    Though I am partial to the Mongolian month names, myself:

    Whether one is staggered by the Oriya full date time pattern:

    or the Uighur native currency name:

    For some it will always be about the Klingon:

    Though the fact that the About... button will get you the link to the email address of the guy who put this one together may be the coolest part for people who have even more suggestions for version 3.0!

    Definitely worth picking up to see Vista's locale support in all its glory, as well as the .NET Framework's support of cultures, which is even more impressive on Vista....

    Whether you want to admire the fascinating typography, or the interesting way Francois added support for genitive dates, or the source code so you can see how to do some of it yourself, or whether you noticed the interesting bug hidden in the screenshots above that is either in Windows or the .NET Framework (though I am betting on Windows in this case, and I'll explain why I think so before I even investigate the problem in a day or two!), there is something for everyone in here....

    Enjoy!

     

    This post brought to you by ᠱ (U+1831, a.k.a. MONGOLIAN LETTER SHA)

  • Sorting it all Out

    What's wrong with what FxCop does for globalization, Part 0.5 (a segue)

    • 3 Comments

    I thought I'd talk about the FxCop issue from a slightly different standpoint, and discuss something that has nothing to do with FxCop to give an example of my concerns.... 

    If you look at Writing Culture-Safe Managed Code (a .NET Framework Deployment white paper), you'll see a good and typical picture of how a technically savvy person might approach supporting international code without really trying to delve too deep into it (for an example of what I mean, see the section entitled Other Countries, where a quick enumeration of cultures to "worry" about is given!).

    Incidentally, if you look at the section ironically entitled Incorrect Code Example, you will be amused to see one of the earlier beliefs -- that CurrentCulture was evil for string comparisons but InvariantCulture was a good idea, something that this blog has gone to some trouble to debunk since that time.... :-)

    Anyway, if you scroll down a bit, you will see a conversation about the Turkish I (a popular devil when one is trying to talk about culture-safe coding practices!). But the text, which names the Unicode code points for the dotted uppercase and dotless lowercase I (U+0130 and U+0131), actually (presumably unintentionally) shows the capital and small Y with acute (U+00dd and U+00fd):

    What's up with that?

    Well, if you look at the definitions for Windows code page 1252 and Windows code page 1254, you'll see part of the problem -- where 1254 defines the Turkic I additions, 1252 defines the Y with acute.

    Of course that only tells part of the story. The page itself is encoded as UTF-8, so trying to change to either of these other two encodings will mess up the page:

     

    So what is going on here?

    The most likely problem is that some tool or application that produced the document did not save it as Unicode but instead as the Turkish code page (cp 1254), and then later some other tool, in converting it to Unicode, simply assumed it was cp 1252. The text is therefore corrupted at this point, with no clean way to fix it.
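    The suspected chain of events can be reproduced in a few lines. This is only a sketch using Python's built-in cp1254 and cp1252 codecs (the helper name misdecode is made up for illustration); it shows that saving the two Turkish letters as code page 1254 and then reading the bytes back as code page 1252 yields exactly the two Y-with-acute characters seen on the page.

```python
# Sketch: reproduce the suspected corruption with Python's built-in codecs.
# "misdecode" is a hypothetical helper name, not part of any library.

TURKISH_I = "\u0130\u0131"  # dotted capital I, dotless small i


def misdecode(text, saved_as="cp1254", assumed="cp1252"):
    """Encode text in one code page, then decode the same bytes as another."""
    return text.encode(saved_as).decode(assumed)


corrupted = misdecode(TURKISH_I)
print([f"U+{ord(ch):04X}" for ch in corrupted])  # ['U+00DD', 'U+00FD']
```

    Both letters sit at bytes 0xDD and 0xFD in code page 1254, and those same byte values mean Y with acute in code page 1252, which is why the substitution is so consistent.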

    The paper itself reminds me somewhat of that .NET Framework Developer's Guide: Custom Case Mappings and Sorting Rules topic I have discussed previously, in that neither one of them helps with international awareness; they are both written mostly from the standpoint of international mitigation, of how to protect your app from the world.

    In my opinion, this is unfortunately the biggest problem in what FxCop does, the problem underlying the issues I was talking about in What's wrong with what FxCop does for globalization, Part 0. The final result that people seem to most often work toward after reading these pages or running this tool is to "culture proof" their code much more than any kind of attempt to properly support other cultures or enable an application to do so.

    I am not going to blame FxCop here, as I think it is really the many surrounding documents and topics that are kind of directing the effort. As I was kind of giving some examples of here....

    For the next post of the series, I'll start moving into my suggested solutions. :-)

     

    This post brought to you by ෞ (U+0DDE, a.k.a. SINHALA VOWEL SIGN KOMBUVA HAA GAYANUKITTA)

  • Sorting it all Out

    Changing MUI settings from MUISETUP command line?

    • 3 Comments

    The other day Paul asked me:

    Hi Michael,

    Been playing around with MUISETUP from XP SP2 (with just Australian English installed), trying to get Japanese installed as the default language. I'm running the following from the command line:

    muisetup /i 0411 /d 0411 /l /f /r /s

    Followed by a reboot, which changes the logoff/logon screens to Japanese, but when I log back on as the same user I ran MUISETUP as, the menus are in English.

    I'm sure I'm missing something, but how do I change this for that user so all the menus etc change to Japanese from the command line?

    Rgds

    Paul

    Paul, you aren't missing anything -- the documented MUISETUP.EXE command line parameters do not support setting the user default UI language of the user in whose context the code is run. Per the documentation, here are all the supported command line parameters for Windows 2000/XP/Server 2003:

    Command Prompt Setup

    To enable quiet mode installations, Muisetup.exe accepts parameters entered at the command line. This can be useful either during an unattended installation of the Windows 2000 MultiLanguage Version or simply during the addition and/or removal of user interface languages.
    To use the command line parameters, use the command prompt to navigate to the directory containing the Muisetup program, and then type:

        muisetup.exe

    followed by any of the following switches:

        /i (specifies the user interface language(s) to be installed)
        /d (specifies the default user interface language that will be applied to all new user accounts)
        /u (specifies the user interface language(s) to be uninstalled)
        /r (specifies that the reboot message should not be displayed)
        /s (specifies that the installation complete message should not be displayed)

    When using the /i, /d, and /u switches, the languages must be entered using their four-digit hexadecimal Language ID values. Language IDs should be separated by a space, as in the following example:

        muisetup.exe /i 0411 0409 0c0a /d 0411 /u 0414 040c

    Kind of says it all -- you can install/uninstall UI languages and you can change the UI language for new accounts (which also handles .DEFAULT which does the logon screen UI language). No integrated logon user UI language changing. 
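    For anyone scripting this, the command line is easy to assemble. The sketch below is an illustration only: the function names are made up, it uses just the five documented switches quoted above, and it builds the argument list without running anything. It also splits a language ID into its primary language and sublanguage parts (the low 10 bits and the high 6 bits).

```python
# Sketch: build a MUISETUP command line from numeric language IDs.
# All function names here are hypothetical; only /i /d /u /r /s are used.


def format_langid(langid):
    """Format a language ID as the four-digit hexadecimal string MUISETUP expects."""
    if not 0 <= langid <= 0xFFFF:
        raise ValueError(f"language ID out of range: {langid!r}")
    return f"{langid:04x}"


def split_langid(langid):
    """Return (primary language, sublanguage): low 10 bits and high 6 bits."""
    return langid & 0x3FF, langid >> 10


def build_muisetup_args(install=(), default=None, uninstall=(), quiet=True):
    """Assemble the argument list; nothing is executed."""
    args = ["muisetup.exe"]
    if install:
        args += ["/i", *map(format_langid, install)]
    if default is not None:
        args += ["/d", format_langid(default)]
    if uninstall:
        args += ["/u", *map(format_langid, uninstall)]
    if quiet:
        args += ["/r", "/s"]  # suppress the reboot and completion messages
    return args


print(" ".join(build_muisetup_args(
    install=[0x0411, 0x0409, 0x0C0A],
    default=0x0411,
    uninstall=[0x0414, 0x040C],
    quiet=False,
)))
# muisetup.exe /i 0411 0409 0c0a /d 0411 /u 0414 040c
```

    Note that nothing in this list touches the UI language of the user running the command, which is exactly the gap Paul ran into.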

    For the setting Paul is looking for, you have to go the route of the unattend file and intl.cpl....

    Now of course in Vista there must be a whole lot of changes -- I'll have to cover those changes soon!

     

    This post brought to you by М (U+041c, a.k.a. CYRILLIC CAPITAL LETTER EM)

  • Sorting it all Out

    Sometimes what a person really wants is a LACK of size

    • 21 Comments

    Over the time I have had this blog, I have often had occasion to say nice things about work that the Shell team does. One of the main reasons for this is a particular characteristic that its members tend to have: if there is not an easy way to do something, they don't just complain about the lack. They write the code to get the job done.

    Now sometimes this doesn't work out as well, like in the tons of shlwapi functions that do string comparison, which just leads to confusion as to which function to use. But that's not what I am going to talk about today; this is a "glass half full" post. :-)

    For many years, the MAX_PATH constant of 260 characters has been getting to be progressively more annoying. As hard drives get bigger and therefore can contain more and more files, it has become easier and easier to run into real problems where you can create files in a directory that can no longer be easily accessed due to a path that is, to put it simply, too freaking long (where too freaking long is defined as 260 characters!).

    This is a problem that can get bad faster in international environments due to path names being even longer in languages that tend to have longer words. But in truth no one is immune to the problem.

    Historically, no one has been willing to do anything about it since obviously solving the problem in one component can simply expose it in another, and in the end no one actually gets a fix to the problem. The sheer scope of it keeps people away.

    Anyway, in Vista the Shell team decided to try to do something about it.

    Now first they have some obvious fixes, like using shorter special directory names. I mean \Documents and Settings becoming \Users and getting rid of the zillion My prefixes and so on was a positive step. A small one, but how many times has the problem been trying to access paths starting with \Documents and Settings\All Users\Application Data\Microsoft when \Users\All Users\Microsoft would have done?

    Ok, that is nice but really that is just them trying to fix a problem that they did most of the work in initially causing. So kudos for the reversal, but why was I claiming this was going to be a "glass half full" post?

    Well, it is actually one of the efforts that took place in Vista, a feature whose official name I am not entirely sure about but which can be best described as "auto path shrinking."

    The idea is simple enough -- when a path turns out to be too long, start shrinking individual elements in the path one by one until it is short enough to fit within MAX_PATH.

    Now this is obviously cooler and much more usable than a call to GetShortPathName, which will just shrink the whole thing and make the filename less usable. By keeping as much of the path long as they can (especially the file name) that big problem with nested paths with too many long directories is easily solved (it does not take too many subdirectories with GUID names to hit a MAX_PATH problem!).
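    The idea can be sketched roughly as follows. This is not the Shell's actual algorithm, which is not described here: the left-to-right order, the helper names, and especially the made-up short-name formula are all assumptions for illustration (real 8.3 aliases are assigned by the file system and would have to be looked up, not computed).

```python
# Sketch of "auto path shrinking": shorten directories one at a time until the
# path fits, leaving the file name long. Order and naming are assumptions.

MAX_PATH = 260  # the limit includes the terminating null character


def hypothetical_short_name(component):
    """Stand-in for a real 8.3 alias; a real one must be queried from the file system."""
    stem = "".join(ch for ch in component.upper() if ch.isalnum())[:6]
    return f"{stem}~1"


def shrink_path(path, limit=MAX_PATH, short_name=hypothetical_short_name):
    """Shorten directory components until the path is under the limit.

    The drive (first part) and the file name (last part) are never touched.
    If shortening every directory is still not enough, the path that is
    returned may remain too long.
    """
    parts = path.split("\\")
    for i in range(1, len(parts) - 1):
        if len("\\".join(parts)) < limit:
            break  # already fits; keep the remaining directories long
        candidate = short_name(parts[i])
        if len(candidate) < len(parts[i]):
            parts[i] = candidate
    return "\\".join(parts)


# Eight nested GUID-style directories are enough to exceed MAX_PATH.
dirs = [f"{{{i:08d}-0000-0000-0000-000000000000}}" for i in range(8)]
long_path = "C:\\" + "\\".join(dirs) + "\\report.txt"
print(len(long_path), len(shrink_path(long_path)))
```

    The contrast with shrinking the whole path is that only as many directories as necessary lose their long names, and the file name itself stays readable.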

    I have never seen anyone else suggest this as a possible mitigation for the whole class of MAX_PATH problems that it helps to solve, and it is work that they put into the bulk of the Shell functions dealing with paths. While obviously not solving every instance of the problem, it at least proves that there are ways to try to make things work.

    Anyway, just thought I'd take a moment to say kudos to the Shell team for being a group to actually take the first positive steps I have heard about in a long time to try to help out with the MAX_PATH problem. Thanks!

    To take advantage of auto path shrinking, you do need to not turn off short name generation. So start spreading the word that NtfsDisable8dot3NameCreation may not always be in the best interests of people, no matter what you may hear elsewhere. Just something to consider....

     

    This post brought to you by ಭ (U+0cad, a.k.a. KANNADA LETTER BHA)

  • Sorting it all Out

    The Romanian keyboard layout on XP is the brokenest layout of all

    • 8 Comments

    Technically, the word broken represents an absolute; this means the terms brokener and brokenest are really not words.

    But they say that it is the exception that proves the rule, and the Romanian keyboard on Windows NT represents the surefire god awful damnedest exception in the history of keyboards!

    First we'll start with the layout as it was in Windows 95, as shown in the first edition of Developing International Software for Windows 95 and Windows  NT, in appendix Q here, specifically the AltGr shift state:

    Now you can see how the top row contains a bunch of dead keys, right?

    Well, when this keyboard was ported from Win9x to WinNT, the person doing the porting made a small set of errors, basically not defining any of those dead keys.

    Since this small omission keeps the keyboard from creating 77 different characters, a not entirely insignificant number of which are needed for the Romanian language, it is fair to say that the keyboard is broken.

    Cristi had been trying to tell people about this for a while but was having problems getting the message sent to the core team that owns the keyboard layouts, so it really wasn't until just a few years ago that he bumped into me online and after looking into it I put in the bug entitled "Romanian keyboard is wrong, Wrong, WRONG!", which we finally managed to fix in Vista by restoring those dead keys (and those 77 characters) to their former glory....

    Which of course left us with one additional problem.

    You see, we added two new keyboard layouts for Romanian to Vista, one following the Romanian national standard and the other a commonly used phonetic layout.

    And the two new layouts use the characters with comma below rather than cedilla below (the characters we updated fonts for, mentioned here and here).

    (Another post coming up soon will talk about the larger issues surrounding these letters in national standards and in software!)

    But there was this huge question of whether to update the old layout we fixed to use the new characters or to instead keep the ones that were originally there.

    The particular commitment to never change a layout really speaks more to the intent of the layout (what a person would expect to be typed) rather than the literal characters. So it was a non-trivial question.

    But in the end, the decision was made to have the older keyboard keep typing the older, non-preferred letters, which have at least one real advantage -- they are on the relevant default Windows, OEM, and ISO code pages -- an important consideration in legacy applications.

    Like I mentioned before, the history of these letters, which makes for a fascinating story, is coming up soon and will discuss this issue further. :-)

     

    This post brought to you by ț (U+021b, a.k.a. LATIN SMALL LETTER T WITH COMMA BELOW)

  • Sorting it all Out

    I guess we're not exporting the Zune just yet

    • 24 Comments

    I bought a Zune the other day.

    This is something of a departure for me, as I have not bought a portable music player smaller than a Tablet or laptop since the Walkman I picked up over two decades ago. And generally speaking I don't go for the latest entertainment devices, something that has been true since I bought an Atari 5200 around the time I bought that Walkman.

    But I figured what the hell, I'd go to the MS Company Store and pick one up. :-)

    I bought a white one, which is (according to Carolyn) the least cool of the three colors. But I never claimed to be cool, so that was no problem.

    So I set up my new cool Zune.

    I found that I was quite happy with the sound quality and especially happy that I was able to sync the over 25 GB of music from my Latitude D820 and still have enough room for all eight episodes of Love Monkey in WMV format.

    (that would be Paul Bryan and Aimee Mann from the episode entitled The One That Got Away)

    But I admit I was a little bit less happy when I scrolled through those many gigabytes of music.

    As you can imagine, I have lots of music on my machine that is imported from other countries, and lots of that music is not in English.

    It was very cool to see it in the Zune app on my laptop:

    It even included my humorous playlists! 

    On the other hand, it was decidedly doubleplusuncool to see what it looked like on the device:

    Hmmmm. Well, I guess I have to install some fonts. I better go find out how....

    But I find article 928210 in the MS Knowledge Base (Boxes (□) or other characters appear instead of letters in the name of a song or an album when you browse for music on your Zune device) which confirms my diagnosis with some decidedly un-international text:

    CAUSE

    This behavior occurs when the name of the media contains characters that are not part of the United States English (en-us) font that is installed on the Zune device.

    followed by a suggested seven step workaround, step #6 of which is:

    Modify the album name or the song name so that it contains U.S. characters to help you identify it.

    Hmmm.

    I asked around and nobody had any advice on installing some more fonts. :-( 

    What are we republicans that we assume that US text == English text? Grrr.

    Well, I am not going to return my Zune. But the behavior as well as the text of the KB article (which does not even put in the text that this is a bug) are bad enough that the Zune makes it to the Unicode Lame List of What's [Internationally] Weak this week....

    Microsoft Typography to the bridge! Chekov, fire an MS UI Gothic torpedo at the Zune to port!

    Update 3:14 PM: In the final insult, this somewhat obnoxious KB article has translations into Japanese, Simplified Chinese, Traditional Chinese, French, German, Italian, and Spanish, all of which screw up the NULL glyph.

     

    This post brought to you by バ (U+30d0, a.k.a. KATAKANA LETTER BA)

  • Sorting it all Out

    Strangely Symbolic font issues

    • 3 Comments

    Chad's question was simple enough:

    Hello. I am trying to output character 0x80 from font Wingdings 3.

    I add the text as:

    TCHAR tszText[2] = {0x80, '\0'};

    I create the font as:

    HFONT hFont = CreateFont(14, 0, 0, 0, 0, 0, 0, 0, SYMBOL_CHARSET, OUT_DEFAULT_PRECIS, CLIP_DEFAULT_PRECIS, NONANTIALIASED_QUALITY, FF_DONTCARE, _T("WingDings 3"));

    I output the text as:

    TextOut(hDC, iLeft, iTop, tszText, 1);

    This produces the square that shows that the character isn't found.  However, if you look at the character map, it is there.

    If I change the text to output something like 0x21, then it has no  problem displaying the character. Yes, the Wingdings 3 character is  properly picked.

    Any ideas what the problem may be?  

    Indeed, if you look at the font, the character clearly appears to be where Chad was expecting it:

    The key to understanding the problem here is something I covered last year in More than you ever wanted to know about CP_SYMBOL.

    Basically, within the font itself, this symbol may be at position 0x80.

    But to NLS (and thus any attempt to convert the text via CP_SYMBOL), it is actually going to be at U+f080. Which of course points to the easy workaround....
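    As a minimal sketch of that workaround, assuming the CP_SYMBOL mapping described in the earlier post (bytes 0x20 through 0xFF land at U+F020 through U+F0FF in the Private Use Area, and the control range below 0x20 is left alone), the conversion is a simple offset. The function name is made up for illustration.

```python
# Sketch: map a symbol-font byte to the code point NLS uses for it.
# Assumes the CP_SYMBOL convention: 0x20-0xFF -> U+F020-U+F0FF.


def symbol_byte_to_unicode(byte_value):
    """Return the Unicode character corresponding to a symbol-font byte."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError(f"not a single byte: {byte_value!r}")
    if byte_value < 0x20:
        return chr(byte_value)  # control characters are not shifted
    return chr(0xF000 + byte_value)  # shifted into the Private Use Area


print(f"U+{ord(symbol_byte_to_unicode(0x80)):04X}")  # U+F080
```

    In Chad's case that would mean passing the character at U+F080 in the text rather than 0x80, in a Unicode build.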

    Interestingly, fileformat.info (the provider of all those Unicode links) claims that U+F080 is not a valid Unicode character, despite the fact that it returns data about the Unicode block it is in. I also noticed that it is not using CharUnicodeInfo to get its Microsoft data, which I suppose might be a cool enhancement to that site. :-)

    But in the meantime it causes interesting problems if you don't use those PUA assignments, especially given all of the random places in GDI, GDI+, RichEdit, Word, and Avalon where sometimes 0x80 will work, and other times it won't, up to and including random problems like the one I pointed out in U+0080 is not the Euro!. You may see that currency symbol popping up in random places if the font isn't Wingdings 3!

    For what it's worth, there are suggestions in the Unicode community to encode many of the widely available symbols not already in Unicode, such as ones contained in the Webdings/Wingdings fonts. It is unknown whether that will happen or not (and even if it did it is unlikely that the fonts would be changed; only new fonts would use the new code points if they were added), but one advantage to this approach is that it removes the problems that symbol fonts can so easily run into.

     

    This post brought to you by  (U+f080, a character in the PRIVATE USE AREA)

  • Sorting it all Out

    Vista turns on everything

    • 12 Comments

    The question that came in was:

    Hello All,

    Does anyone tell me how to turn off advanced text services in Windows Vista?

    It's funny but I feel like I have spent nearly 50 years this past half decade answering the opposite question -- how to make sure that advanced text services were turned on, and how to make sure that their functionality was extended to all programs....

    In the end, the answer was that this was user selectable and it was quite easy to turn off both features -- just uncheck the top box and check the bottom one (obviously you can't check both of them, since extending the support of something that is turned off just doesn't make a whole lot of sense -- extensibly non-functional?). Not as pithy of an oxymoron as other common ones like thunderous silence or military intelligence or girlie man, but still kind of fun.

    Now to get to this latest question (how to turn off this functionality in Vista), the answer is you can't. The new dialog gives you many choices:

    But there is no choice to turn it off. Which I am personally quite happy about, not just for the sake of all of those developers and ISVs who needed to have the support on for work they were doing or their apps were, but also for the sake of the many languages in Vista that you would not be able to type in at all with this support turned off.

    So just like the fonts that people can count on being there and the old "feature" of optional NLS support going away, Vista is stopping a lot of this "stunt language functionality" nonsense. Which is a good thing....

    So, I guess you could say that not only is advanced text services in Vista a turn on, but you can also say that now that this sassy little feature is so exciting and excitable that it will never be a turn off! :-)

     

    This post brought to you by 𝌝 (U+1d31d, a.k.a. TETRAGRAM FOR JOY)

  • Sorting it all Out

    Converting a project to Unicode: Part 0 (The introduction)

    • 18 Comments

    It was over a year ago that Jeff D. asked in the Suggestion Box:

    Michael,

    What I'd be interested in reading (assuming you haven't already given us one) is a primer on how to go about converting a non-Unicode app to Unicode compliance... for those of us who are starting to "see the light", as it were.

    Cheers,
    ~Jeff D.

    P.S.
    Forgive me if you've already written an article like this and I simply haven't found it yet.

    First, let me say that I hope Jeff is still around and that he wasn't holding his breath waiting for me to answer. :-)

    Although I left the question "on the books" for all this time, I was hesitant to post about it because it really seems more like a tutorial than one might expect and less like a blog topic.

    But I kind of thought that the next time I had to take a purely non-Unicode app and convert it to Unicode that it would be cool to perhaps try to start an answer to Jeff's question by covering what was involved, and what I did to minimize the amount of work while maximizing the productivity.

    The last app that I had in mind for this idea that I had to do this with (long before Jeff posted his question) was converting the Windows build tool kbdtool.exe to the kbdutool.exe that MSKLC ships and uses. I specifically decided not to write about the kbdtool --> kbdutool conversion though, for a few reasons. First of all, the source for kbdtool.exe doesn't ship in the SDK, so no one could really see the project I did the work on, and second of all, the work was long done and it is easy to forget lots of details that long after the fact.

    Plus I didn't want to just do something that wasn't related to what I'm doing.

    And finally, I decided if I was ever going to do it I wanted it to be something that might be generally useful for others, too.

    Tall order? Definitely.

    And then earlier today, a project that fit my criteria pretty much fell in my lap!

    In response to a bug that was reported earlier today:

    BUG Title: MSKLC 1.4: The bootstrapper, setup.exe, is not Unicode?

    Repro Steps:

    1. Launch MSKLC 1.4
    2. Customize any key so we can build a keyboard layout
    3. Point the working directory to a path that contains Unicode only chars like a Hindi char
    4. Build the keyboard layout
    5. Launch the created setup.exe

    Result: Setup was unable to find the msi package or patch.

    Suddenly I had a project to convert --  the SETUP.EXE Bootstrap sample from the Platform SDK that Heath Stewart mentioned a while back!
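    The reason step 3 of the repro matters can be illustrated with a quick check of whether a path survives conversion to a legacy code page at all. This is a rough sketch, not part of the bootstrapper: the function name and the code page list are chosen for illustration, and Python's codecs are only an approximation of what Windows does (the real conversion may substitute best-fit characters or question marks instead of failing).

```python
# Sketch: which legacy Windows code pages can represent a given path?
# The list and helper name are illustrative; Python codecs approximate Windows.

WINDOWS_CODE_PAGES = [
    "cp874", "cp932", "cp936", "cp949", "cp950",
    "cp1250", "cp1251", "cp1252", "cp1253", "cp1254",
    "cp1255", "cp1256", "cp1257", "cp1258",
]


def code_pages_that_fit(text, code_pages=WINDOWS_CODE_PAGES):
    """Return the code pages in which every character of text can be encoded."""
    fits = []
    for name in code_pages:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character has no byte value here
        fits.append(name)
    return fits


print(code_pages_that_fit("C:\\build\\\u0915"))  # Hindi letter KA: []
print("cp1252" in code_pages_that_fit("C:\\build\\caf\u00e9"))  # True
```

    A non-Unicode program handed the first path has no code page in which the directory name can be expressed, so a file that is plainly on disk cannot be found.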

    As a bonus, perhaps one day the Windows Installer folks might want to pick it up and include in the SDK. :-)

    Now I realized there are many ways to do this kind of thing in a blog.

    I decided to take the approach of multiple posts that anyone could follow along with if they have the project on their machine.

    This is a nice small project (~3500 lines of code or thereabouts) that I can use to show how I approached the idea.

    It also has some other cool aspects I'll be able to point out as I go along (and also a few aspects that don't apply here that I will be able to point out as well!).

    The actual amount of time it took me for this project going from A to Zed to a compiling Unicode version that works was two hours and three minutes. But this series will take a bit longer as I do it again and take the time to comment on what I am doing each day.

    Hopefully everyone will find this a fun and/or useful way to spend some time over the next week or so....

    Each day, the sponsoring character will be a random character not found in any Windows code page that would be suitable for step #3 in the repro steps of the bug, given earlier. :-)

     

    This post brought to you by क (U+0915, a.k.a. DEVANAGARI LETTER KA)

  • Sorting it all Out

    What's in a name? (once more)

    • 16 Comments

    Not all of Microsoft's customers who are involved with software development understand about localizability.

    This is something for which they should be held blameless, since not all of Microsoft's employees who are involved with software development understand about localizability. :-)

    I was reminded of both of these things the other day when the following customer question happened to pass my field of vision:

    I would appreciate if you could get us local names of the "NT AUTHORITY\System" account for all different language Windows Oss. The following article lists some of them but the list probably is not full.

    We just need the entire list. We use an API to get the localized name but would like to have a work around if the API call fails.

    Thank you very much,
    Vladimir

    Setting Up Windows Service Accounts

    Ok, so obviously one can point to blog posts like Administrator vs. Administrateur, et al. and such, and one can talk about using well-known SIDs and the LookupAccountSid function and so forth.

    In fact this is what some people did.

    But clearly we are fighting an uphill battle when topics in SQL Server 2005 Books Online provide tables like this one:

    Language | Name for Local Service | Name for Network Service | Name for Local System | Name for Admin Group
    English, Simplified Chinese, Traditional Chinese, Korean, Japanese | NT AUTHORITY\LOCAL SERVICE | NT AUTHORITY\NETWORK SERVICE | NT AUTHORITY\SYSTEM | BUILTIN\Administrators
    German | NT-AUTORITÄT\LOKALER DIENST | NT-AUTORITÄT\NETZWERKDIENST | NT-AUTORITÄT\SYSTEM | VORDEFINIERT\Administratoren
    French | AUTORITE NT\SERVICE LOCAL | AUTORITE NT\SERVICE RÉSEAU | AUTORITE NT\SYSTEM | BUILTIN\Administrateurs
    Italian | NT AUTHORITY\SERVIZIO LOCALE | NT AUTHORITY\SERVIZIO DI RETE | NT AUTHORITY\SYSTEM | BUILTIN\Administrators
    Spanish | NT AUTHORITY\SERVICIO LOC | NT AUTHORITY\SERVICIO DE RED | NT AUTHORITY\SYSTEM | BUILTIN\Administradores
    Russian | NT AUTHORITY\LOCAL SERVICE | NT AUTHORITY\NETWORK SERVICE | NT AUTHORITY\SYSTEM | BUILTIN\Администраторы

    Now this is not evil in and of itself, but it really describes nothing of what is underneath these names.

    And most importantly, it says nothing of the all-important issue regarding the fact that these account names can be changed, and often are changed for security reasons.

    In the end, any dependence on a list like this is a recipe for trouble. And including it in the form that they did is just a bad idea.

    The argument for wanting the list is that they want it in case the Win32 API function fails. But how far is using an account name going to get a person if the Win32 API security related functions are failing?

    The solution is described in KB 157234 (How to deal with localized and renamed user and group names), with a title that underscores the importance of the issue (though personally I would have put the word "localized" second since some people might skip it without reading the whole title if they figure it does not apply to them!).

    But the answer to the question in the title (What's in a name?) is really both everything and nothing....

    SQL Server Books Online needs to do better with this sort of thing. I think it is great when PSS picks up the slack and covers an issue well, because it is their job to do that when product groups screw up. But in an ideal world, we don't make them work as hard as we tend to do, if you know what I mean. :-)

     

    This post brought to you by А (U+0410, CYRILLIC CAPITAL LETTER A)

  • Sorting it all Out

    Converting a project to Unicode: Part 2 ('Sorry, you're not my type.' 'Um, maybe I could change that?')

    • 14 Comments

    Previous posts in this series (including today's!):

    The alternate title of this post is not something I would ever recommend using in a conversation about a relationship! 

    Ok, first we'll start with the source code. You can either get it from the Platform SDK as I pointed out in Part 1, in the Samples\SysMgmt\Msi\setup.exe\ folder, or you can just download it from right here. So far nothing has happened to it -- this is our Tabula Rasa.

    So now let's start dirtying it. :-)

    I am going to stick it in its own folder (E:\SETUP.EXE\), navigate there from within the Visual Studio 2005 Command Prompt, and use NMAKE to do a build (I have tested with the Platform SDK command prompt and others as well; the steps should work either way).

    For starters, I'll make sure it can build:

    Ok, good start. We probably won't be seeing success much until we are done so let's not get too used to it!

    The first thing that has to happen is a search for uses of Unicode that are happening now -- obvious ones like types such as WCHAR, LPWSTR, and LPCWSTR. Clearly if there is anything that supports Unicode now we want to know where it is so that we can look at it more closely later when we decide what to do with it. Here are the results:

    Find all "WCHAR", Whole word, Subfolders, Find Results 1, "All Open Documents"
    E:\setup.exe\vertrust.cpp(193): WCHAR *szwSetup = 0;
    E:\setup.exe\vertrust.cpp(194): WCHAR *szwPackage = 0;
    E:\setup.exe\vertrust.cpp(226): szwSetup = new WCHAR[cchWide];
    E:\setup.exe\vertrust.cpp(256): szwPackage = new WCHAR[cchWide];
    E:\setup.exe\utils.cpp(232): WCHAR *pch = 0;
    E:\setup.exe\utils.cpp(238): if ((pch = (WCHAR*)LockResource(hGlobal)) != 0)
    Matching lines: 6 Matching files: 6 Total files searched: 10

    Find all "LPCWSTR", Whole word, Subfolders, Find Results 1, "All Open Documents"
    E:\setup.exe\vertrust.cpp(62):itvEnum IsFileTrusted(LPCWSTR lpwFile, HWND hwndParent, DWORD dwUIChoice, bool *pfIsSigned, PCCERT_CONTEXT *ppcSigner)
    E:\setup.exe\setupui.h(87): HRESULT __stdcall OnProgress(ULONG ulProgress, ULONG ulProgressMax, ULONG ulStatusCode, LPCWSTR szStatusText);
    E:\setup.exe\setupui.h(88): HRESULT __stdcall OnStopBinding(HRESULT, LPCWSTR ) {return S_OK;}
    E:\setup.exe\setupui.cpp(372):HRESULT CDownloadBindStatusCallback::OnProgress(ULONG ulProgress, ULONG ulProgressMax, ULONG ulStatusCode, LPCWSTR /*szStatusText*/)
    E:\setup.exe\setup.h(85):itvEnum IsFileTrusted(LPCWSTR szwFile, HWND hwndParent, DWORD dwUIChoice, bool *pfIsSigned, PCCERT_CONTEXT *ppcSigner);
    Matching lines: 5 Matching files: 6 Total files searched: 10

    Find all "LPWSTR", Whole word, Subfolders, Find Results 1, "All Open Documents"
    Matching lines: 0 Matching files: 0 Total files searched: 10

    Find all "MultiByteToWideChar", Whole word, Subfolders, Find Results 1, "All Open Documents"
    E:\setup.exe\vertrust.cpp(225): cchWide = MultiByteToWideChar(CP_ACP, 0, szSetupExe, -1, 0, 0);
    E:\setup.exe\vertrust.cpp(233): if (0 == MultiByteToWideChar(CP_ACP, 0, szSetupExe, -1, szwSetup, cchWide))
    E:\setup.exe\vertrust.cpp(255): cchWide = MultiByteToWideChar(CP_ACP, 0, szPackage, -1, 0, 0);
    E:\setup.exe\vertrust.cpp(263): if (0 == MultiByteToWideChar(CP_ACP, 0, szPackage, -1, szwPackage, cchWide))
    Matching lines: 4 Matching files: 6 Total files searched: 10

    We'll set these lists aside for later, now that we know what they are.

    Now of course this sets us up for the first set of changes -- using the handy dandy Replace in Files dialog:

    Note the important settings here -- all open documents, match whole words, and using the Find Next button to review followed by the Replace button if we like the change.

    LPSTR -> LPTSTR (less than 25 occurrences, no surprises other than perhaps a few that are on our "review later" list)

    LPCSTR -> LPCTSTR (less than 100 occurrences, no surprises)

    char --> TCHAR (over 100 occurrences, no big surprises in any of them)

    Well, there are actually a few issues here that I will be talking about in later posts, but we're starting with the simple approach. :-)

    And of course with these changes, note that we are not doing Unicode builds yet, so for now the build will keep working.
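
    To make the "build will keep working" point concrete, here is a minimal sketch of the idea behind these generic-text types. Everything named sketch_* (and the SKETCH_UNICODE switch) is made up for illustration; the real TCHAR, LPTSTR, and LPCTSTR come from the Windows headers and key off UNICODE/_UNICODE instead. With the switch left undefined, as it is here, the "T" types are just the old char types under a new name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for the SDK's generic-text names; the real ones
// (TCHAR, LPTSTR, LPCTSTR) live in the Windows headers.
#ifdef SKETCH_UNICODE
typedef wchar_t sketch_tchar;                 // Unicode build: wide characters
#else
typedef char sketch_tchar;                    // non-Unicode build: same as before
#endif
typedef sketch_tchar*       sketch_lptstr;    // plays the role of LPTSTR
typedef const sketch_tchar* sketch_lpctstr;   // plays the role of LPCTSTR

// One source line, two possible meanings, picked at compile time.
inline std::size_t sketch_len(sketch_lpctstr s)
{
#ifdef SKETCH_UNICODE
    return std::wcslen(s);
#else
    return std::strlen(s);
#endif
}
```

    Defining SKETCH_UNICODE ahead of the typedefs would flip everything to wchar_t, which is the point where the compiler starts objecting to every narrow string literal that is still lying around.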

    Some things to look forward to here that I'll be talking about in upcoming posts:

    • hardcoded strings
    • GetProcAddress calls
    • A few specialty functions
    • Some cch/cb issues
    • a few non-Unicode ("A") functions being called specifically
    • The MSLU/Win9x question

    Believe it or not, what was done today is likely the second most widespread change we'll have to do for this project!

    Wasn't that easy? :-)

    Stay tuned for the next step....

     

    This post brought to you by ഗ (U+0d17, a.k.a. MALAYALAM LETTER GA)

  • Sorting it all Out

    It's not just ordinary script, dude -- it's superscript!

    • 9 Comments

    The mail I got via the contact link from arya said:

    Hello Sir,

    I hope this finds you well. There is something that is driving me crazy. We want to dynamically superscript numbers in excel. We can only do this for superscript 1,2,3 using the CODE() funcition. However, when you look under verdana it has superscripts 4,5,7,8 but they are under Unicode(hex) and are private use characters. There codes U+F00B etc will not work.

    Is there anyway we can accomplish this?

    Thanks,

    Arya

    No need to involve the Private Use Area, since all the digits are available in superscript form! See:

    • ⁰    U+2070    SUPERSCRIPT ZERO
    • ¹    U+00b9    SUPERSCRIPT ONE
    • ²    U+00b2    SUPERSCRIPT TWO
    • ³    U+00b3    SUPERSCRIPT THREE
    • ⁴    U+2074    SUPERSCRIPT FOUR
    • ⁵    U+2075    SUPERSCRIPT FIVE
    • ⁶    U+2076    SUPERSCRIPT SIX
    • ⁷    U+2077    SUPERSCRIPT SEVEN
    • ⁸    U+2078    SUPERSCRIPT EIGHT
    • ⁹    U+2079    SUPERSCRIPT NINE

    They are all there....

    As for why they are out of order (a common question that people ask), superscript one, two, and three were inherited from ISO 8859-1, and the rest of them were added all at once to fill out the set. Characters cannot be moved, so the odd placement is just the way things are. There are many people who wonder whether some kind of conceptual ordering in tools like the Windows Character Map (charmap.exe) would make sense here, though in practice it might prove to be harder to find things at times. Not to mention all the arguments about whether all the ones go together, or all the superscript things, or all the numbers, and so on.
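
    If you would rather do the mapping in code than look each digit up, the list above boils down to a ten-entry lookup table. Here is a minimal C++ sketch of that; the function name to_superscript is made up for the example, and it returns UTF-32 so that no encoding questions get in the way:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map ASCII digits to their Unicode superscript forms.
// Anything that is not an ASCII digit is passed through unchanged.
inline std::u32string to_superscript(const std::string& digits)
{
    // Indexed by digit value; 1, 2, and 3 are the ones inherited from
    // ISO 8859-1, the rest sit together at U+2070 and U+2074..U+2079.
    static const char32_t kSuperscript[10] = {
        0x2070, 0x00B9, 0x00B2, 0x00B3, 0x2074,
        0x2075, 0x2076, 0x2077, 0x2078, 0x2079
    };
    std::u32string out;
    for (char ch : digits) {
        if (ch >= '0' && ch <= '9')
            out.push_back(kSuperscript[ch - '0']);
        else
            out.push_back(static_cast<char32_t>(static_cast<unsigned char>(ch)));
    }
    return out;
}
```

    The same ten code points can be carried into whatever tool is doing the work; the only real trick is remembering that the table is not contiguous.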

    For what it is worth, things went better with their "below the line" cousins the subscript numbers:

    • ₀    U+2080    SUBSCRIPT ZERO
    • ₁    U+2081    SUBSCRIPT ONE
    • ₂    U+2082    SUBSCRIPT TWO
    • ₃    U+2083    SUBSCRIPT THREE
    • ₄    U+2084    SUBSCRIPT FOUR
    • ₅    U+2085    SUBSCRIPT FIVE
    • ₆    U+2086    SUBSCRIPT SIX
    • ₇    U+2087    SUBSCRIPT SEVEN
    • ₈    U+2088    SUBSCRIPT EIGHT
    • ₉    U+2089    SUBSCRIPT NINE

    The original names in Unicode of several of these numbers prior to the merge with ISO 10646 were SUPERSCRIPT DIGIT * rather than just SUPERSCRIPT *, though since they are not really digits in the conventional sense (their Unicode category is No rather than Nd), this was probably a good change coming out of the merger....

    As you may or may not see in the above, in many cases on different browsers and different platforms, you will not always see the full set. Alternately, some digits may not look like they are using the same style. This is due to the fact that not all fonts cover all 20 characters, so you are seeing fonts being linked or substituted in.

     

    This post brought to you by ¹ (U+00b9, a.k.a. SUPERSCRIPT ONE)

  • Sorting it all Out

    Pretty damn close to top of the line

    • 11 Comments

    Anyway, it wasn't too long ago that I got that dream laptop, a Dell Precision M90.

    They kind of screwed up the order a bit (it came installed with the x86 version of Windows XP SP2 when I had ordered the 64-bit version, and the folks in support did not realize it actually was a 64-bit system until I found a really helpful person who went up to the Intel site and looked at the CPU specs in question and realized that it was indeed 64-bit hardware).

    But I told them I worked for MS so I could get the OS, I just wanted it marked in their records that I was using 64-bit Windows, which he did, so now all is good. The only 64-bit driver I couldn't find on the Dell site was for the Synaptics touchpad (which I found on the Synaptics site) and the modem (which I couldn't find anywhere). But it has been so many years since I've used a modem that I decided I didn't mind. :-)

    I did have to borrow a USB thumb drive from Shelby to get the NIC driver on the machine, but that was probably the only hard part about it.

    The system is marked Windows Vista capable, so after installing Windows 2003 64-bit in a nice snug 15gb partition on the 100gb 7200 RPM drive, I installed 64-bit Vista on the partition made up of the remaining space.

    One amazing part about the setup was that it ran so fast on this clean install that it took more time to run the rating of my system performance than the actual installation. :-)

    And then another amazing part was that all of the drivers that I had to install manually before were there on the CD when I installed Vista (with the exception of the Synaptics Touchpad driver which I once again got off the Synaptics site and the modem driver, which I still didn't need anyway).

    After it took so long to run that "checking my performance" step I figured I should look and see what it decided. So looking at that basic info page:

    And when I clicked on that Windows Experience Index link:

    Now looking at Nick White's post with the in-depth look at the Windows Experience Index, the machine is somewhere between the high end of desktop replacement laptops and top end of the market PCs. Which kind of makes sense. :-)

    Maybe an AMD dual proc would have done better than that Intel Centrino dual core (had that been an option, which it wasn't), and I guess the memory is at the bottom of the top (had it been faster it would have pushed me to a 5.0 score), but it still does really well. And Vista does scream on the machine; performance is excellent and Vista is looking really good, too. I don't think there is anything I can do to improve either the memory test or the processor test, and I couldn't find a faster hard drive at the moment, so I think this is going to be my score for now....

    Now in both Server 2003 and in Vista, that whole limitation around not being able to see 4gb is still there (as you can see above with the latest BIOS update it got me up to seeing 3326mb). But those devices look to be making use of that memory so I won't quibble.

    And test installs of 64-bit keyboards are also looking good, which I admit has nothing to do with the machine's performance (but it has something to do with mine if you want a hint at one of the things the machine is doing at this very moment!).

    I also have all of the Vista MUI language packs and LIPs installed and will keep installing them; you may see a screenshot from time to time here when it makes sense, like for some posts about MUI that are coming up soon....

    As a small side complaint, the D-Bay port is on the back of the machine instead of the side, an unfortunate regression from the Latitude D800 series, where having that port on the side gave one an extra port when docked (since it is on the back the port is blocked when the machine is in the docking station, so you have to get by with the one port on the docking station itself). But I guess the folks who plan out the actual laptop hardware architecture don't get too bogged down by that sort of detail! Still, this is a minor issue on a machine that doesn't have a huge need for extra peripherals in the D-Bay anyway....

    So anyhow, if you are looking for a top of the line machine that can run Vista, this is a choice I would highly recommend. By the time they train their support folks to realize that Intel has 64 bit hardware, they fix the website, and 64-bit Vista is officially being released on this machine, they may even have the modem driver thing figured out. :-)

     

    This post brought to you by D (U+0044, a.k.a. LATIN CAPITAL LETTER D)

  • Sorting it all Out

    Converting a project to Unicode: Part 3 (Can I quote you on that?)

    • 14 Comments

    Previous posts in this series (including today's!):

    (If you are just tuning in and want to start now you can grab the current source from here.) 

    The biggest source of actual changes in most conversions of legacy projects to Unicode is handling hard-coded strings. The simple fact is that what you might have in your code as

    "This is a string"

    and which in a purely Unicode application would be

    L"This is a string"

    now will have to be either

    1)        TEXT("This is a string")

    or

    2)        _TEXT("This is a string")

    or

    3)        _T("This is a string")

    I myself prefer the third one but many people like the first or the second. You can look at the MSDN topic Using Generic-Text Mappings to get more information on _T and _TEXT; the one with no underscore prefix is actually defined in the Platform SDK header file winnt.h and this is the reason why it is used by Windows header files that do not want to include tchar.h in their source files.

    (If you are bored, the section of winnt.h with the // Neutral ANSI/UNICODE types and macros comment is where these all are.)
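
    For the curious, the shape of these macros is roughly the following. This is a sketch with made-up names (SKETCH_TEXT, SKETCH_TEXT_INNER, SKETCH_UNICODE, SKETCH_PROPNAME, sketch_tchar), not a copy of the SDK headers, and it hard-codes the Unicode switch on so the effect is visible:

```cpp
#include <cassert>
#include <cwchar>

#define SKETCH_UNICODE 1   // remove this line to model the non-Unicode build

// Hypothetical macros that mimic the shape of TEXT()/_T(); not the SDK's own.
#ifdef SKETCH_UNICODE
#define SKETCH_TEXT_INNER(quote) L##quote   // paste an L prefix onto the literal
typedef wchar_t sketch_tchar;
#else
#define SKETCH_TEXT_INNER(quote) quote      // leave the literal alone
typedef char sketch_tchar;
#endif
// The extra level lets a macro argument expand before the L is pasted on.
#define SKETCH_TEXT(quote) SKETCH_TEXT_INNER(quote)

#define SKETCH_PROPNAME "BASEURL"   // a narrow literal hidden behind a macro

inline const sketch_tchar* sketch_prop_name()
{
    return SKETCH_TEXT(SKETCH_PROPNAME);    // L"BASEURL" in the Unicode build
}
```

    The two-level definition is the part worth noticing: with a single level, the paste would happen on the macro name itself rather than on the string literal behind it.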

    Since we will need tchar.h for a few CRT functions you can pretty much take your pick -- the other reason some people prefer the shorter one is that they consider it less distracting (I consider them all to be about the same in that respect).

    I am going to use TEXT() for reasons I will point out shortly.

    There are several different ways to approach this kind of change:

    1. Compile the program with UNICODE/_UNICODE tags and catch all of the compile errors that yesterday's changes will cause due to the mismatch between Unicode types and non-Unicode strings
    2. Simply find every occurrence of " (U+0022, a.k.a. QUOTATION MARK) and ' (U+0027, a.k.a. APOSTROPHE) and each time it is appropriate, surround the quoted string with TEXT() or _T()
    3. Use regular expressions to do the find and replace, either using the VS :q (quoted string) or its equivalent in your tool, e.g. (("[^"]*")|('[^']*'))

    If you prefer #1, then you may want to skip today's post and wait for Part 4, tomorrow (which is when we will be doing that). Today is dedicated to taking care of over 100 cases without the compile-time checking....

    I tend to prefer #3 myself, so your find/replace box in VS will look something like this:

    The most important things to note are the syntax for tagging an expression (in VS, surround it in curly braces) and the syntax for using the tagged expression in the replacement string (in VS, \# where # is 1-9 and picks which tagged expression to use).

    There are many strings you won't want to affect, including obvious ones like

    #include "common.h"

    and there are even a few "already done" strings like this one from common.h in the source:

    #define ISETUPPROPNAME_BASEURL              TEXT("BASEURL")

    Note that this code is probably shared with other Windows Installer source code projects (like maybe msistuff.exe's?), which would be why it is written with Unicode in mind even though most of the rest of the project is not. And why it would have a name like common.h.

    If this convinces you that you would rather use TEXT() so that the whole project uses the same thing, great (it is what I chose here!), though like I said you can use whatever you like.

    The other bonus is that it will keep us from having to include tchar.h in files just for this definition (if you have been trying it you will see that the source is still compiling right now, before we move it to Unicode).

    Of course function names are another case where you do not want to wrap them in TEXT macros since the function names will go to GetProcAddress calls. So you would wrap "advapi32.dll" but you would not wrap "CheckTokenMembership" (a function inside advapi32.dll). Though if you mess this up don't worry, it will be a simple compile error later, very easy to fix....

    One other interesting string that needs special handling:

    "\""

    Which we want to become:

    TEXT("\"")

    and not

    TEXT("\")"

    obviously. The simple regular expression is not quite smart enough for the escaped quote case (there are like five of these). If you want to try and create a more complex regular expression you are welcome to!
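
    For anyone who does want to take up that challenge, here is one possible shape for it, sketched in C++ with std::regex rather than in the VS Find dialog (so the syntax differs from the :q approach above). The helper name wrap_literals is made up, it only looks at double-quoted strings, and it knows nothing about comments or GetProcAddress names, so treat it as an aid for review rather than a Replace All button:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: wrap each double-quoted literal on a line in TEXT().
// It skips #include lines and literals already preceded by "TEXT(".
inline std::string wrap_literals(const std::string& line)
{
    if (line.find("#include") != std::string::npos)
        return line;

    // A quote, then any run of ordinary characters or backslash escapes,
    // then the closing quote. The escape branch is what handles "\"".
    static const std::regex literal(R"re("(?:[^"\\]|\\.)*")re");
    static const std::string wrapper = "TEXT(";

    std::string out;
    std::size_t pos = 0;
    for (std::sregex_iterator it(line.begin(), line.end(), literal), end;
         it != end; ++it) {
        const std::size_t start = static_cast<std::size_t>(it->position());
        const std::size_t len = static_cast<std::size_t>(it->length());
        out.append(line, pos, start - pos);           // text before the literal
        const bool wrapped = start >= wrapper.size() &&
            line.compare(start - wrapper.size(), wrapper.size(), wrapper) == 0;
        if (wrapped)
            out.append(line, start, len);             // already done, leave it
        else
            out.append(wrapper).append(line, start, len).append(")");
        pos = start + len;
    }
    out.append(line, pos, std::string::npos);         // trailing text
    return out;
}
```

    Fed a line containing "\"", this produces TEXT("\"") rather than the broken form, and it leaves #include "common.h" and the existing TEXT("BASEURL") definition alone.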

    In any case, hopefully I have convinced you that you will definitely want to be careful about your use of Find Next vs. Replace -- and definitely not be tempted by Replace All. :-)

    Other things you do not need to "fix" are pretty much anything in the makefile, or anything in a comment (unless you want to amuse future code reviewers).

    Now after you go through all of these, you will have noticed that 56 of the strings to be edited were calling one of the three overloads of the DebugMsg function found in utils.cpp. I would recommend you go ahead and fix them up too, since (a) you have already changed their datatypes anyway, (b) they all call OutputDebugString which will map to OutputDebugStringW after we compile UNICODE, and (c) there is no harm in seeing Unicode text if you run a debugger that supports Unicode. :-)

    Amazingly, we are much, much closer now!

    We'll do one more big find/replace in today's post. There are several places in the source code where GetProcAddress is being called to get the address of an ANSI function rather than a Unicode one. Let's fix those up right now. You could search for GetProcAddress, but in this project (as in most other projects) it just goes to constants. Just remember (like I said before) -- you always want to make sure that you do not put the TEXT() macro wrapper around function names since GetProcAddress's second parameter never expects a Unicode string. You DO want them around library names and just about everything else.

    The easiest way to find all of the occurrences is the following search:

    It is pretty rare to ever have a string that ends with a capital A that you wouldn't want to become a capital W, so although you will want to check each one, you are unlikely to have a ton of noise in the results....
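
    The rename itself is mechanical enough to sketch. The helper below is hypothetical (to_wide_export is a made-up name, and the names in the note after it are just illustrations, not a list taken from this project); it works on plain narrow strings, which is exactly what GetProcAddress wants:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: given an export name used with GetProcAddress,
// propose the Unicode ("W") entry point for an ANSI ("A") one.
// The result stays a narrow std::string on purpose: export names are
// never wide, so they must not be wrapped in TEXT().
inline std::string to_wide_export(const std::string& name)
{
    if (name.size() > 1 && name.back() == 'A') {
        std::string wide = name;
        wide.back() = 'W';      // swap the trailing A for a W
        return wide;
    }
    return name;                // no trailing A, nothing to change
}
```

    So to_wide_export("GetModuleFileNameA") gives "GetModuleFileNameW", while a name with no A/W pair such as "CheckTokenMembership" comes back untouched. A name that merely happens to end in a capital A would be changed too, which is why each hit still deserves a look.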

    Believe it or not we are getting rather close now (tomorrow we're going to take the next step to find major things to look at).

    Stay tuned....

     

    This post brought to you by ဃ (U+1003, a.k.a. MYANMAR LETTER GHA)
