October, 2006

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    "àèìòù" < "äëïöü" but "àèìòù " > "äëïöü"

    You may remember my post I need my SPACE, symbolically speaking from this past March.

    There are some interesting consequences of this behavior, which I thought I would talk about a bit further since they have been the subject of several recent bug reports....

    Let's take a simple string like

    àèìòù (U+00e0 U+00e8 U+00ec U+00f2 U+00f9)

    and compare it with

    äëïöü (U+00e4 U+00eb U+00ef U+00f6 U+00fc)

    Just pass them both to CompareStringW using 0x0409 for the LCID, and you will find that "àèìòù" < "äëïöü". But if you add a space to the first string, then you will see that "àèìòù " > "äëïöü".

    Huh? How'd that happen?

    Well, let's look at the sort keys of each of the three strings we are looking at here:

    "àèìòù"
    0e 02 0e 21 0e 32 0e 7c 0e 9f 01 0f 0f 0f 0f 0f 01 01 01 00

    "äëïöü"
    0e 02 0e 21 0e 32 0e 7c 0e 9f 01 13 13 13 13 13 01 01 01 00

    "àèìòù "
    0e 02 0e 21 0e 32 0e 7c 0e 9f 07 02 01 0f 0f 0f 0f 0f 01 01 01 00

    Aha, things maybe are a little clearer now. The letters have consistent weights, as do the diacritics. And so the first string comparison sees equal primary weights but a difference in the secondary weights. And that second comparison sees a difference in the primary weights, so suddenly the order is reversed. Oops!
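
    To make the byte-level story concrete, here is a minimal sketch that takes the three sort keys exactly as listed above and compares them the way a plain byte comparison would. The helper names are made up for illustration; nothing here calls CompareStringW or LCMapStringW, it only replays the bytes from this post.

```python
# Sort keys copied verbatim from the post (hex bytes, whitespace ignored).
KEY_GRAVE = bytes.fromhex(
    "0e 02 0e 21 0e 32 0e 7c 0e 9f 01 0f 0f 0f 0f 0f 01 01 01 00")
KEY_DIAERESIS = bytes.fromhex(
    "0e 02 0e 21 0e 32 0e 7c 0e 9f 01 13 13 13 13 13 01 01 01 00")
KEY_GRAVE_SPACE = bytes.fromhex(
    "0e 02 0e 21 0e 32 0e 7c 0e 9f 07 02 01 0f 0f 0f 0f 0f 01 01 01 00")


def first_difference(a: bytes, b: bytes) -> int:
    """Index of the first differing byte, or -1 if the keys are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))


def compare_keys(a: bytes, b: bytes) -> int:
    """memcmp-style result: -1, 0 or 1."""
    return (a > b) - (a < b)


# Without the space: the first difference is inside the diacritic weights
# (0x0f vs 0x13), after the 0x01 separator that ends the primary weights.
print(first_difference(KEY_GRAVE, KEY_DIAERESIS),
      compare_keys(KEY_GRAVE, KEY_DIAERESIS))

# With the space: the difference comes earlier, where the space's weight
# (0x07 ...) meets the 0x01 separator in the other key.
print(first_difference(KEY_GRAVE_SPACE, KEY_DIAERESIS),
      compare_keys(KEY_GRAVE_SPACE, KEY_DIAERESIS))
```

    The point of the sketch is only that the position of the first differing byte moves from the diacritic section to the primary section once the space is appended, which is what flips the result.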

    Now this will happen with any symbol (or for that matter anything with a primary weight, but for some reason the SPACE and similar characters have results that seem less intuitive!), though simply passing NORM_IGNORESYMBOLS will cause the space or other symbol to be ignored.

    Now this is the first example. I will give some of the others in a later post. And maybe some thoughts about how the issue of intuitive results could perhaps be looked into, and why the solution is less obvious than it may seem at first....

    Have I scared anyone yet? If so, then Happy Halloween! :-)

     

    This post brought to you by " " (U+0020, a.k.a. SPACE)

  • Sorting it all Out

    Mapudungun is not a map to a dungeon

    I am wracking my brains over here in SIAO Plaza.

    It was easy enough to come up with interesting titles talking about LIPs for اردو, Inuktitut, മലയാളം, Qhichwa Simi, فارسی, isiZulu, ಕನ್ನಡ, नेपाली, Afrikaans, कोंकणी, Setswana, বাংলা, తెలుగు, ਪੰਜਾਬੀ, Lëtzebuergisch, татарча, and Nynorsk.

    But then Soren had to go and send mail out talking about the Mapudungun Language Interface Pack for Windows XP and how it was now available.

    I can't think of either anything that rhymes or upon which I can build a clever pun. I guess I'll have to go with "Mapudungun is not a map to a dungeon" and call it a day. I have had worse, as my regular readers will likely attest.

    A bit about Mapudungun:

    Number of speakers:  ~350,000

    Name in the language itself:  Mapudungun

    Mapudungun is an Amerind language spoken by about 250,000 people in the central valley of Chile and another 100,000 in the Argentinean region of Patagonia. It has no official status and so far lacks official protection and promotion, but the Chilean government's support for a Mapudungun LIP can be considered a first substantial step.

    Mapudungun features an interesting grammar in which animate nouns are distinguished from inanimate ones, there is not only a singular and a plural but also a dual (so there is, for example, I, the two of us, and we), and there is an extremely rich scheme of verb conjugation. The last makes it possible to express very dense information in only a few words, as in piñmalkan (from pin = to say), which means to criticize someone who is present, along with others, but without confronting him/her directly.

    Mapudungun is also known as Mapudungu, Mapuche, Araucano, Araukano or Araucanian. Despite the fact that the ISO-639-2 code for the language is "arn", the name Araucanian and its related forms are nowadays avoided by most linguists and native speakers.

    Interesting facts:

    • The name Mapudungun is derived from mapu (earth) and dungun (to speak), so literally meaning language of the earth.
    • The study of Mapudungun was the domain of Jesuits for a long time, from Luis de Valdivia's grammar (1606) to Bernard Havestadt's 3-volume study (1776).
    • Via Spanish the Mapudungun word cauchu (nomad, traveller) has come to English - as gaucho.
    • Mapudungun is the second Windows XP LIP for South America and the fifth one based on Spanish as a base language (the others being Basque, Catalan, Galician and Quechua)

    Classification: Despite attempts by scholars to establish relationships with other Amerind languages, Mapudungun has to be considered an isolate, meaning that it has no known relatives.

    Script: The Latin alphabet is used, extended by ñ and ü.
     

    Enjoy!

     

    This post brought to you by M (U+004d, a.k.a. LATIN CAPITAL LETTER M)

  • Sorting it all Out

    If you are more sensitive, you'll pick up on more problems

    (this post is not about relationship advice!) 

    Regular reader Dean Harding pointed out a few days ago when I talked about When collations collide?:

    Yeah, this is a real problem when you're developing an application that can be installed on somebody else's instance of SQL Server.

    You basically HAVE to develop it on a case-sensitive instance locally, otherwise one of your users invariably has a case-sensitive instance installed and your app breaks. They get upset if you tell them to install it on a new instance that is not case-sensitive :)

    I couldn't agree with this point more, and not only because I suggested the same thing in the post Case/kana/accent/width sensitive SQL Server, for testing back from May of last year. :-)

    The reason why case/kana/accent/width sensitivity finds more bugs is actually kind of a cultural issue for people who do database work as opposed to programmers.

    In database design there is not usually as much of a conscious effort to use case sensitivity as a differentiation feature -- so if you use the uppercase vs. lowercase vs. capitalized version of an identifier and something does not work, it is more likely to be a bug in the actual application than an explicit attempt to use multiple forms.

    Now this is in sharp contrast to some programming language cases, where (for example) it is not uncommon to see developers who use specific conventions like "lowercasing of internal variables/parameters vs. proper casing/camel casing for public properties/methods" where the "collision" one would find in a case insensitive environment would be intentional. And of course in programming languages one cannot usually flip a setting to change the behavior like one can do with SQL Server -- one's choice is implicit in one's language decision.

    Of course if one is providing a database or stored procedure or UDF or sample in SQL Server, ideally one will test both scenarios since it is impossible to know where the information will be used. But if one has to choose, being more sensitive will (in the end) reveal more problems, which makes it a better choice....

    (Which is also probably true of relationships, though I am the last person in the world who one should be listening to on that score!)

     

    This post brought to you by (U+13ea, a.k.a. CHEROKEE LETTER WE)

  • Sorting it all Out

    Semper ubi sub ubi while doing translation badly

    It was a few years back that Julie and Cathy were laughing about the Latin phrase Semper ubi sub ubi, meaning "Always wear underwear." Or I suppose one could take the inverse, Joey Tribbianiesque approach and say Nunquam ubi sub ubi (meaning "never wear underwear") instead.

    Of course, as any Latin speaker can explain, both of these attempts at translation are wrong.

    They both rely on the fact that the Latin word ubi means WHERE (i.e. "the place in which"), which is a homophone for the word WEAR (i.e. to have placed on one's person, like a shirt). So what the phrases actually mean is Always where underwhere and Never where underwhere -- which is to say they are nonsensical to a native speaker even if you add a bunch of punctuation (luckily there are none now, or else they might fail to "get the joke" even more so than Germans don't understand why people in the US think Sprockets is so funny).

    Latin is an especially handy language to do this in, since it is used in so many interesting contexts, enough so that people can puzzle out what it is, realize the joke, and have a little fun. It is almost a joke made for people who are learning Latin.

    Even online machine translation can usually do better than this, though unfortunately that is because it is usually too unsophisticated to mix up homophones; mixing them up would mean it was smart enough to find errors when trying to translate, too. In other words, the quality required to understand this joke is one that feels like a mistake for machine translation to have. But isn't that possibly a flaw in our model of machine translation, which is trying to be so perfect yet is unable to understand the jokes that even beginning students of Latin coming from English understand?

    Perhaps machine translation needs to spend more time in the mistakes, and in the imperfection. Since its goal is to translate for humans, who are also imperfect.

    Of course this is a development that is less likely to happen as long as it is a project led by software companies that deal with highly formal and technical content, an arena where such examples are aberrations, not goals to aspire to....

    But, let's say for a moment that Machine Translation reaches the maturity of its current goals; this will likely not get us closer to the AI-ish world of handling the majority of the world that is not formal documentation. I mean imagine even small projects like this one or an attempted localization of that Jack Winter piece I brought up before.

    Anyway, in an admittedly lame attempt to honor all of this, I will provide this reworked version of George Carlin's "Affair of the Hair" schtick:

    Not sure why some stare at my underwear.
    In fact, it's not fair,
    But some really despair of my underwear.
    But I don't care,
    Cause they're not aware,
    Nor are they debonair.
    In fact, they're just square.

    They see underwear down to there,
    Say, "Beware" and go off on a tear!
    I say, "No fair!"
    A crotch that's bare is really nowhere.
    So be like a bear, be fair with your underwear!
    Show it you care.
    Wear it to there.
    Or to there.
    Or to there, if you dare!

    My wife bought some underwear at a fair, to use as a spare.
    Did I care?
    Au contraire!
    Spare underwear is fair!
    In fact, underwear can be rare.
    Fred Astaire got no underwear,
    Nor does a chair,
    Nor a chocolate éclair,
    And where is the underwear on a pear?
    Nowhere, mon frere!

    So now that I've shared this affair of the underwear,
    I'll admit "Nunquam ubi sub ubi" in my lair where I go bare, do you care? 

     :-)

     

    This post brought to you by (U+a282, YI SYLLABLE WA)

  • Sorting it all Out

    SQL Server: compatibility collations vs. Windows collations

    The other day when I talked about When collations collide, John Ingres commented:

    We've been looking at the implications of moving our database from

        SQL_Latin1_General_CP1_xxxx

    to

        Latin1_General_xxx

    since it is the recommended practice to use Windows collations instead of the legacy SQL collation but this is a large change affecting more than 30 applications and over a hundred production systems and I have been investigating differences between those two collations and information is extremely scarce. Is there a source of information with detailed information? Of course, we will test exhaustively but subtle differences in sort order for example are not always easily apparent.

    thanks

    John 

    There is really no single source of info John, but as the question becomes more and more common for people to ask, it will become more and more important for the folks on the SQL Server team to provide some of those answers (and to provide more consistent messaging around the right collation to use).

    (It goes without saying that the server default collation that the SQL Server product itself picks will also need to change!)

    In the meantime, I have posted here about some of the real problems with the legacy SQL_* collations in posts like these:

    and others as well. I mean, there is room for improvement in the Windows collations too (and there are posts that point out some of the issues there) but they are all much easier to deal with than the ones embedded in the SQL compatibility collations, which even at their best are just a mess that will never be changed or addressed or fixed.

    Some of these posts focus on the international support of these legacy items, which ranges from lame to meager. But other posts focus more on a lot of the consistency problems with these collations, which simply don't match user expectations. They are definitely something to really think about getting rid of, if you can....

    But I don't want to let the SQL Server team off the hook here -- they really need to provide more information on the differences here if they want people to migrate. Their solution is more compatible with prior versions than the one used in the Unicode changes from Jet Red 3.5 to 4.0 (they simply moved from all strings being non-Unicode to all strings being Unicode, period, and I can recall a conversation I had with Ken Whistler a few years back where I pointed this out and he was even more horrified than I was!).

    But some effort to underscore the benefits and especially the migration issues and differences to expect is really the least they can do at this point (in fact I'd say this is a long overdue work item).

     

    This post brought to you by (U+122d, a.k.a. ETHIOPIC SYLLABLE RE)

  • Sorting it all Out

    Why don't the keyboards select themselves when you install them?

    Keith asks via the Contacting Michael link:

    I have been using MKLC for some time now and I have a question about it. 

    It has been at times hard to explain to my users how to use the keyboards after running setup.

    Is there a way to make that automatic? The steps in the help file are a bit much for some of them.

    MKLC is great though. Thanks for it!

    Keith

    Excellent question, Keith.

    There is no way to do this automatically at the moment, though it has been a feature that has been requested from several different users and one we are considering for the next version. Because despite the exciting features in Windows that support multiple users logging in and such, the most common scenario is still the one person who would be installing the keyboard also being the one person who plans to use it. Given that simple fact, it makes a lot of sense to just put the keyboard in the user's Language Bar....

    I'll keep people posted on the next release as soon as there is something to say about it. A handy, small list of important features seems to be making itself known right now (and yes, 64-bit support is one of the items on that list!).

    On a side note, I always wonder what makes some people turn Microsoft Keyboard Layout Creator to MKLC rather than MSKLC, even though all of the documentation from Microsoft points to the latter and not the former. Is it a "purity of acronyms" kind of thing? :-)

     

    This post brought to you by (U+17a6, a.k.a. KHMER INDEPENDENT VOWEL QII)

  • Sorting it all Out

    You can just byte me

    Evan asked in one of the many programming aliases:

    Hi:

    Anyone knows why there are 3 extra characters added to the XML file saved via XmlDocument?

    I viewed the file from a hex editor and found 3 characters (0xEF 0xBB 0xBF) are added to the XML file saved.

    I did a simple test to verify that:

              XmlDocument doc = new XmlDocument();
              doc.Load("test.xml");
              doc.Save("test2.xml");

    I created test.xml in Notepad and view it with hex editor to make sure the first char is “<” (0x3C). And when I view test2.xml, I found the 3 extra characters. These characters are not viewable and don’t affect Notepad, IE, VS.NET from viewing it at all.

    I wonder what are these characters needed for?

    Thanks,

    Evan

    Indeed these three bytes are the well known and somewhat controversial UTF-8 incarnation of the Unicode Byte Order Mark. The controversy is of course whether it is needed in UTF-8, and it comes up on a somewhat regular (though thankfully infrequent) basis....
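
    For anyone who wants to see those three bytes for themselves, here is a small sketch using only Python's standard library. It does not touch XmlDocument at all; it just shows that U+FEFF encoded as UTF-8 is the byte sequence Evan found, and one way a reader could detect or strip it. The function names are illustrative.

```python
import codecs

# U+FEFF encoded as UTF-8 is exactly the three bytes seen in the hex editor.
BOM_BYTES = "\ufeff".encode("utf-8")


def has_utf8_bom(data: bytes) -> bool:
    """True when the byte string starts with the UTF-8 signature."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 signature, if present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


xml_text = "<root/>"
without_bom = xml_text.encode("utf-8")      # first byte is 0x3C, '<'
with_bom = xml_text.encode("utf-8-sig")     # same bytes with the signature in front

print(BOM_BYTES.hex(" "))
print(with_bom[:4].hex(" "))
print(has_utf8_bom(without_bom), has_utf8_bom(with_bom))
```

    Decoding with the "utf-8-sig" codec drops the signature again, while plain "utf-8" leaves U+FEFF as the first character of the string, which is where a lot of the controversy starts.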

     

    This post brought to you by (U+FEFF, a.k.a. ZERO WIDTH NO-BREAK SPACE)

  • Sorting it all Out

    for(int iMoonTrip = 0; iMoonTrip == 0; iMoonTrip++)

    Figured I should get some Hungarian notation involved if I was going to post about Charles Simonyi's efforts to be the first nerd in space!

    If you are thinking that the code in the title is a silly and wasteful way to run a loop one time, I'll just point out three things:

    • Any halfway decent optimizer will probably get rid of it anyway;
    • It was the only way to make sure Community Server did not expand "iMoonTrip < cMoonTrip" to "iMoonTrip &amp;lt; cMoonTrip" in the title;
    • You are probably also a geek and should inquire whether Simonyi will take you with him as checked baggage (he can be seen at http://www.charlesinspace.com/)

    Pretty exciting stuff!

    This post brought to you by (U+263d, a.k.a. FIRST QUARTER MOON)

  • Sorting it all Out

    When collations collide?

    Praveen asks in one of the SQL Server aliases in Microsoft a question about an issue that is not very well understood:

    Hi

    Let me know if this is not a appropriate question for this DL.

    My Sql Server (2005) has case sensitivity turned on. I create a database with case sensitive option turned off (SQL_Latin1_General_CP1_CI_AI), create a table and create a stored procedure. The creation of stored procedure gives me a error indicating that case sensitivity does not work for the datatype in variables. Is this true? Is there some other option I can specify during the create database statement so that this will work too?

    Thanks
    Praveen

    CREATE DATABASE  Test1 COLLATE SQL_Latin1_General_CP1_CI_AI
    go
    use Test1
    go
    CREATE TABLE table1(        Column1         nvarchar(64) NOT NULL  )
    Go

    create proc proc1
        @Column1 nvarchar(64)
    as
    begin
        if exists (select * from table1               where
                   @column1 = Column1)
        begin
            return -1
        end
        return 0
    end

    Error message during the sproc creation:

    sp_helpdb Test1
    Msg 137, Level 15, State 2, Procedure proc1, Line 8
    Must declare the scalar variable "@column1".

    Bart was very quick to point out where the issue is documented:

    See the “Identifier Collation” topic in BOL:

    The collation of an identifier depends on the level at which it is defined. Identifiers of instance-level objects, such as logins and database names, are assigned the default collation of the instance. Identifiers of objects within a database, such as tables, views, and column names, are assigned the default collation of the database. Variables, GOTO labels, temporary stored procedures, and temporary tables can be created when the connection context is associated with one database and then referenced when the context has been switched to another database. Therefore, the identifiers for variables, GOTO labels, and temporary tables are in the default collation of the instance.

    The topic that Bart refers to is right here and has links to several related topics about collation at various levels. None of them really talk about the additional issues related to security, given the 2-3 different possible login logics for comparing user names (server collation, local machine using the NT object namespace, or domain login usually but not always using Active Directory).

    But I thought Praveen's question might get the ball rolling here, with what may be one of the easier scenarios to understand after it is explained. Future topics may delve more into the ones that actually are just as hard to fathom after you understand them as they are before. :-)
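
    As a rough way to picture the rule from the BOL topic, here is a toy model. It is only a sketch under loud assumptions: real collations do far more than fold case, and the function names and the two flags are invented for illustration rather than taken from SQL Server. All it models is "column names are compared with the database's rule, variable names with the instance's rule".

```python
def make_key(case_sensitive: bool):
    """Return a function mapping an identifier to its comparison key."""
    return (lambda name: name) if case_sensitive else str.casefold


def resolve(name: str, declared: list, case_sensitive: bool) -> bool:
    """True when `name` matches a declared identifier under the given rule."""
    key = make_key(case_sensitive)
    return key(name) in {key(d) for d in declared}


# Toy version of Praveen's setup (assumption: case is the only difference).
INSTANCE_CASE_SENSITIVE = True    # server installed case sensitive
DATABASE_CASE_SENSITIVE = False   # database created with a _CI_ collation

# Column names follow the database rule, so "column1" would find "Column1".
column_found = resolve("column1", ["Column1"], DATABASE_CASE_SENSITIVE)

# Variable names follow the instance rule, so "@column1" misses "@Column1".
variable_found = resolve("@column1", ["@Column1"], INSTANCE_CASE_SENSITIVE)

print(column_found, variable_found)
```

    In this model the second lookup fails for the same reason the stored procedure does: the parameter was declared as @Column1 and referenced as @column1, and the comparison that matters for variables is the instance's, not the database's.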

     

    This post brought to you by (U+0e01, a.k.a. THAI CHARACTER KO KAI)

  • Sorting it all Out

    'Managing' [List] Separator Anxiety

    Now I have praised the folks in GIFT Ireland in the past, like in this post and also this one, for example.

    But when you are not talking to someone every day, you sometimes forget they are there. Luckily, sometimes they find a way to remind you that makes you really happy about their presence. This post is an example of that....

    The other day, John Caffrey (an SDET across the puddle) sent the following in mail:

    Hi guys,

    Quick question if that’s ok… so during the testing of Locale Builder, we’ve noticed that the CARIB ignores the list separator data when loading it from an LDML file (for a replacement locale).

    Is this a known issue with the CARIB? If so, can you maybe send me the relevant bug # so I can link our bug?

    Thanks!

    J. 

    There was a brief pause because I had not seen the mail at first and both Shawn and Tarek were out that day, but by the next day I had looked at the code and was able to comment:

    This is an unintentional bundling of a user overridable setting (the list separator) with a non overridable setting (the code pages) -- no one noticed this to copy the setting over.

    Worth putting a bug in and easy enough to fix, though I'm not sure where it would be triaged...

    In my mind I was thinking that the code in the CultureAndRegionInfoBuilder dates back to the original stuff I had written, which meant that (a) this bug was probably my fault though (b) this is at least partially mitigated by the fact that many people had modified and tested the code since and no one had noticed. :-)

    Since it was not intentional, I assumed that problem would be limited to trying to set CultureAndRegionInfoBuilder.TextInfo (which fails in replacement cultures) but that setting the CultureAndRegionInfoBuilder.TextInfo's actual TextInfo.ListSeparator property should work just fine. I had not actually tried it yet or anything like that before Erich Barnstedt (a dev from the Ireland team) responded to my reply to John's mail:

    Yeah, that was my hunch since it sits in TextInfo. We’ll work around it in Locale Builder by reading the value directly from LDML and re-applying it to the CARIB after loading from LDML. Thanks for the quick response,

                    Erich

    Things I think about in my head but don't share with others do not count as ideas, and I certainly didn't put any thought into what to do with this information, so Erich's public statement of the idea really allows him to be credited with inspiring me to write the following code that tests the theory (while Erich put the actual fix into the MS Locale Builder code):

    using System;
    using System.Globalization;

    namespace Testing {
      class oooo {
        [STAThread]
        static void Main(string[] args) {
          CultureInfo ci;
          string stCulture;

          // First figure out the name
          if(args.Length > 0) {
            stCulture = args[0];
          } else {
            stCulture = CultureInfo.CurrentCulture.Name;
          }

          // Create the culture and say what it is
          ci = new CultureInfo(stCulture, false);
          Console.WriteLine("\r\nUsing the following culture: '{0}' ({1})\r\n", ci.DisplayName, ci.Name);

          // Create the replacement and fill it
          CultureAndRegionInfoBuilder carib = new CultureAndRegionInfoBuilder(stCulture, CultureAndRegionModifiers.Replacement);
          carib.LoadDataFromCultureInfo(ci);
          carib.LoadDataFromRegionInfo(new RegionInfo(stCulture));

          // Change the list separator as appropriate
          if(args.Length > 1) {
            carib.TextInfo.ListSeparator = args[1];
          } else {
            carib.TextInfo.ListSeparator = ";";
          }

          // Return some status
          Console.WriteLine("List separator value");
          Console.WriteLine("  ...in the old ci: '{0}'", ci.TextInfo.ListSeparator);
          Console.WriteLine("  ...in the carib:  '{0}'", carib.TextInfo.ListSeparator);

          // Register the replacement
          carib.Register();

          // More status
          ci = new CultureInfo(stCulture, false);
          Console.WriteLine("  ...in the new ci: '{0}'", ci.TextInfo.ListSeparator);
          ci = null;

          // Unregister -- cleanup is important in samples
          CultureAndRegionInfoBuilder.Unregister(stCulture);

          // Even more status
          ci = new CultureInfo(stCulture, false);
          Console.WriteLine("  ...after removal: '{0}'", ci.TextInfo.ListSeparator);
        }
      }
    }

    So if you have .NET 2.0 on your machine you can stick this code into a file named carib.cs and compile it from the .NET command line as follows:

     csc carib.cs /r:sysglobl.dll

    The command line syntax is easy enough:

     carib.exe - uses the current culture and changes the separator to a semicolon

     carib.exe <culturename> - uses the specified culture and changes the separator to a semicolon

     carib.exe <culturename> <separator> - uses the specified culture and changes the separator to what is specified 

    On my machine (which is set to a default user locale of en-AU to look at issues that this guy has been talking about!):

    e:\test>csc carib.cs /r:sysglobl.dll
    Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.42
    for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727
    Copyright (C) Microsoft Corporation 2001-2005. All rights reserved.

    e:\test>carib.exe

    Using the following culture: 'English (Australia)' (en-AU)

    List separator value
      ...in the old ci: ','
      ...in the carib:  ';'
      ...in the new ci: ';'
      ...after removal: ','

    e:\test>

    So there you have it.... an easy workaround for the oversight about the list separator (which at this point pretty much everyone wishes were just a part of CultureInfo rather than TextInfo anyway, since the latter is supposed to be about the writing system and the list separator is definitely more of a cultural preference!). And in the meantime it will be addressed in the final release of the Locale Builder when it goes out.

    Special thanks to both John and Erich for not only finding my bug but also thinking ahead to the fix and making the Locale Builder behave more intuitively! :-)

     

    This post brought to you by  ,  (U+002c, a.k.a. COMMA)

  • Sorting it all Out

    It feels good to help others, doesn't it?

    Last Christmas, a man named Byron had a problem:

    Due to an unfortunate and marriage-threatening miscommunication, I have an Xtra Xbox 360 that appeared under a tree this morning (magical video-game elves in the forest or something). Before we return one to the retailer, is there any interest in buying one from me?

    Now most people were of course not on email since it was literally Christmas night 2005, but as usual I was (as a lonely Jew on Christmas and all). But I couldn't bear to see a marriage dissolved due to hardware problems and games, so I stepped up and offered to help. :-)

    I have been resistant to playing games on computers for a long time, mainly because I already spend so much time on the computer that I figured if I got hooked on an XBox no one would ever see me again. So that XBox with a wireless controller, headset, ethernet and HD cables sat on my shelf for a long time, never opened and never used. But I figured at least I had contributed to someone's marital bliss, so I didn't mind. I knew I'd be able to do something with it, some day.

    So the last time I was back in Cleveland (for my grandmother's 90th birthday), I asked my brother-in-law Zach (well, brother-out-law I guess!) if he had an XBox. He started to explain that he didn't, and he was actually holding off on the XBox 360 and waiting for Sony's new offering. I mentioned that I had this XBox if he was interested (subject to approval by my sister, his wife -- there is no sense saving Byron's marriage if I ruin my sister's, right?), and he said yes so fast that it scared me.

    I think this turn of events did disappoint my father a bit since he was the backup in case my sister decided to ixnay on the xboxay, but I think he got over it. He was still enjoying the present that Zach, Meredith, and I went in on for their anniversary, after all.

    Anyway, I was reminded of all of this the other day when Meredith sent me a piece of email where (among other things) she pointed out to me "You have created a monster in my husband with that damn xbox" which was probably hyperbole since a) I recall a smiley there and b) Zach was a technofreak long before the XBox (at the time he was just waiting for the latest PlayStation to come out, after all, and he corrupted my father into the church of Tivo and HD long before!).

    But it tells me that I may have helped convert a member of the family away from the evil Sony if he was enjoying the XBox, at least. And he obviously is enjoying it. Plus now I know that games would make a suitable gift in the future....

    It feels good to help others, doesn't it? :-)

     

    This post brought to you by (U+26ad, a.k.a. MARRIAGE SYMBOL)
    (A proud member of Unicode since version 4.1)

  • Sorting it all Out

    Out of [implied] range

    The question was something like:

    I have a Japanese OS and Japanese .NET installed on it. The DateTime format is set to the Gregorian calendar in Japanese, and when I execute the following piece of code I get an ArgumentOutOfRangeException. Any idea what could be the problem?

    Console.WriteLine(System.Data.SqlTypes.SqlDateTime.MinValue);

    However, if I do it by providing CultureInfo.InvariantCulture to the string, things appear to work fine. Is it the case that MinValue is culture independent and can't really be converted to a String in certain cultures like JPN?

    As a first step, looking at my prior post Long Live the Emperor, it has the earliest supported date in the Japanese Emperor calendar as being August 9th, 1868.

    As a next step, System.Data.SqlTypes.SqlDateTime.MinValue has as its earliest supported date January 1, 1753.

    So clearly, the most likely explanation for the problem here is that the Gregorian calendar is not what is being used here.
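
    The reasoning in those two steps is just a range check, which can be sketched as below. This is an illustration in Python rather than .NET: the two dates are the ones quoted in this post, and the function name and the ValueError are stand-ins for whatever the Framework does internally, which is not shown here.

```python
from datetime import date

# Both dates are taken from the text of this post, not looked up elsewhere.
SQL_DATETIME_MIN = date(1753, 1, 1)        # SqlDateTime.MinValue
JAPANESE_CALENDAR_MIN = date(1868, 8, 9)   # earliest date quoted for the Emperor calendar


def check_representable(value: date, calendar_min: date) -> date:
    """Return the value if the calendar can express it, otherwise raise."""
    if value < calendar_min:
        raise ValueError(
            f"{value.isoformat()} is earlier than the calendar's minimum "
            f"{calendar_min.isoformat()}")
    return value


try:
    check_representable(SQL_DATETIME_MIN, JAPANESE_CALENDAR_MIN)
    in_range = True
except ValueError as err:
    in_range = False
    print(err)
```

    With a calendar whose earliest date is in 1868, a value from 1753 has nowhere to go, which is why an out-of-range failure points at the Japanese Emperor calendar being in effect rather than the Gregorian one.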

    Perhaps this is actually a good object lesson about being careful about default data type coercions (such as this one between SqlDateTime and System.String) -- because it can be hard to predict how that coercion might happen across multiple people and multiple versions....

    So let's see if we can figure out what did happen. Obviously the .NET Framework felt comfortable with the coercion, so it must have figured it would be lossless. Let's start by looking at all of the Console.WriteLine() overloads....

    Hmmmm. No DateTime overload. Ok, let's go to the DateTime.ToString overloads....

    Aha! If you do not pass a CultureInfo object (which obviously the implicit call does not), then you do not get any special overrides based on the CultureInfo object that has been modified to use the Gregorian calendar. But according to the DateTime.ToString() topic, "The value of the current DateTime object is formatted using the general date and time format specifier ('G').", described in Standard DateTime Format Strings as representing "...a combination of the short date (d) and short time (t) patterns, separated by a space." Which maybe it is doing, but it does not mention what CultureInfo it will be using, or whether overrides/changes in the CultureInfo will be used.

    Or maybe the person asking the question did not actually set the calendar to Gregorian properly, and everything is working as expected?

    An interesting problem not documented terribly well, in any case -- and a great argument to be explicit when it comes to formats? :-)
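
    To make the "be explicit" point concrete, here is a minimal sketch. It assumes nothing about the asker's machine: it passes the culture explicitly instead of relying on the implicit conversion, and it prints which calendar the current culture is really using, so you can check whether the Gregorian setting took effect. The class and method names are made up for illustration.

```csharp
using System;
using System.Data.SqlTypes;
using System.Globalization;

public class ExplicitFormatSketch {
    // Format a SqlDateTime with an explicitly chosen culture instead of
    // relying on the current thread culture and whatever calendar it uses.
    public static string FormatExplicit(SqlDateTime value, CultureInfo culture) {
        return value.Value.ToString("G", culture);
    }

    public static void Main() {
        // The invariant culture always formats with the Gregorian calendar.
        Console.WriteLine(FormatExplicit(SqlDateTime.MinValue, CultureInfo.InvariantCulture));

        // Diagnostic: which calendar is the current culture really using,
        // and what is the earliest date that calendar can represent?
        Calendar current = CultureInfo.CurrentCulture.DateTimeFormat.Calendar;
        Console.WriteLine("Current calendar: {0}", current.GetType().Name);
        Console.WriteLine("Earliest supported date: {0}",
            current.MinSupportedDateTime.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```

    If the second line of output names JapaneseCalendar rather than GregorianCalendar, then a date in 1753 is outside the calendar's range and the exception is to be expected.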

     

    This post brought to you by   (U+1806, a.k.a. MONGOLIAN TODO SOFT HYPHEN)

  • Sorting it all Out

    It is a challenge to make ClearType irrelevant

    • 14 Comments

    DPI (dots per inch, which I have discussed previously) has two entirely different and somewhat-at-odds uses. It can

    • Increase the apparent sizes of fonts on the screen without modifying the applications using the fonts, or
    • Increase the sharpness of the text on the screen while not generally changing the size of applications

    Of course most people only ever see the first use, because if they change the DPI that is what they see. There is no setting to scale down the font sizes in the Shell automatically while increasing the DPI, so you get to watch everything seem to grow.

    But as I point out here, it is quite possible to make everything look quite good and have apps be the same apparent size as they were when it was 96 DPI if you scale down the Shell font sizes to match as you jack up the DPI setting.

    In a wider sense, the reason ClearType exists is that this adjustment does not happen even with new LCD screens where it could happen, and ClearType works by faking a higher DPI. But after hearing Peter Constable talk about using an LCD screen's "natural resolution" (in my case on a Latitude D820, 1680 x 1050), I combined that with scaling down the Shell font sizes (7 pt Segoe UI instead of 9 pt), and apps look about the same size as they always did but sharper.

    In theory (or maybe even in practice!) you could take that even further, jack it up to 300 DPI and push the font sizes down even further. You get to the point where everything is still the same size but ClearType becomes irrelevant, like you cannot even tell the difference between when it is on and when it isn't (other than the fact that you don't have to worry about the ClearType problems I talked about in You say it 'looks good on paper?' It must not be using ClearType, of course!).
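
    The arithmetic behind "scale the fonts down as you jack the DPI up" can be sketched as below. It assumes only the usual convention that a point is 1/72 of a logical inch; it ignores hinting, rounding to whole pixels, and ClearType itself, and the class and method names are made up for illustration.

```csharp
using System;

public class DpiScalingSketch {
    // A point is 1/72 inch, so a font's pixel size is points * dpi / 72.
    public static double PointsToPixels(double points, double dpi) {
        return points * dpi / 72.0;
    }

    // Point size that yields the same pixel size at newDpi as
    // oldPoints did at oldDpi.
    public static double EquivalentPoints(double oldPoints, double oldDpi, double newDpi) {
        return oldPoints * oldDpi / newDpi;
    }

    public static void Main() {
        Console.WriteLine("9 pt at 96 DPI = {0} px", PointsToPixels(9, 96));
        Console.WriteLine("Same size at 120 DPI: {0:F2} pt", EquivalentPoints(9, 96, 120));
        Console.WriteLine("Same size at 300 DPI: {0:F2} pt", EquivalentPoints(9, 96, 300));
    }
}
```

    By this estimate a 9 pt font at 96 DPI matches roughly 7.2 pt at 120 DPI, which is in the neighborhood of the 7 pt Segoe UI mentioned above, and a bit under 3 pt at 300 DPI. The post does not say which DPI setting was actually used, so treat the 120 figure as an assumption.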

    However, it is quite uncommon for people to muck with the font settings, and honestly it is not very common to even use anything other than 96dpi or 120dpi, due to the difficult UI for both of these settings and the lack of any intuitive/automatic connection between them. To add insult to injury, if you change the system locale, custom DPI settings are lost, a bug that exists in every version of XP and is still not fixed in Vista.

    Is it an accident that most spell checkers suggest that Segoe is a misspelling of Segue? What does that mean? :-)

    If I were a more paranoid and cynical person than I am and I were outside of Microsoft, I would wonder if this was not a huge NLS/typography/shell/ClearType conspiracy to keep ClearType as a relevant technology even as monitors push the envelope to make it less potentially relevant.

    Of course being on the inside I am pretty confident that there is no such cabal, as the interaction of all of these things is mostly accidental. :-)

    Maybe a cool DpiPlusSystemLocale applet needs to be put together that coordinates these disparate settings so that they'll work together. Now THAT would be a PowerToy!

     

    This post brought to you by ྅ (U+0f85, a.k.a. TIBETAN MARK PALUTA)

  • Sorting it all Out

    Sometimes in the future 'ANSI' is really going to be unsupported!

    • 4 Comments

    The question to the microsoft.public.win32.programmer.international newsgroup was simple enough:

    TITLE: RegLoadMUIString Vista P Invoke

    Hello,
    I'm trying to get an MUI string out of the registry in display friendly format. From what I've read, strings in the following format:
    “@[path]\dllname,-strID” are MUI strings and receive special handling via the RegLoadMUIString API call.

    This is what I have come up with for the call:

    [DllImport("advapi32.dll")]
    internal static extern long RegLoadMUIString(IntPtr hKey, string pszValue, StringBuilder pszOutBuf, int cbOutBuf, out int pcbData, uint Flags, string pszDirectory);

    I'm calling this function with a valid pointer to an open registry key, passing in the appropriate value key (pszValue) which has the MUI formatted string. When I check the output buffer (pszOutBuf) it's always an empty string.

    Has anyone been able to get this call to work? I cannot find any examples on the web.

    thanks,
    -bp

    A few exchanges back and forth, in which he showed me how he was calling the function, revealed three problems: one caused by him, one caused by the .NET Framework team, and one caused by the MUI team. I'll explain all three of these problems here....

     The problem that was bp's fault was that several Win32 API functions were being called, but the return value was not being checked. So a functionality error was being reported with an implicit assumption that function calls succeeded when there was really no evidence of success. Luckily this same person put the code up on the MSDN Forums in this post where this problem was fixed, and the failure was now known -- it was returning error 120, which can be found in winerror.h:

    //
    // MessageId: ERROR_CALL_NOT_IMPLEMENTED
    //
    // MessageText:
    //
    // This function is not supported on this system.
    //
    #define ERROR_CALL_NOT_IMPLEMENTED 120L
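
    Since the registry functions hand back their error code as the return value, one way to avoid the "assume it worked" trap is to funnel every call through a small check. The helper below is hypothetical (the names are mine, not from the forum thread); it only relies on System.ComponentModel.Win32Exception, which carries the native error code along with the message.

```csharp
using System;
using System.ComponentModel;

public class Win32ErrorSketch {
    const int ERROR_SUCCESS = 0;

    // Hypothetical helper: turn a non-zero registry API return value into
    // an exception instead of silently continuing.
    public static void ThrowIfFailed(int errorCode, string apiName) {
        if (errorCode != ERROR_SUCCESS) {
            throw new Win32Exception(errorCode,
                apiName + " failed with error " + errorCode + ".");
        }
    }

    public static void Main() {
        try {
            ThrowIfFailed(0, "RegOpenKeyEx");        // success: no exception
            ThrowIfFailed(120, "RegLoadMUIString");  // ERROR_CALL_NOT_IMPLEMENTED
        }
        catch (Win32Exception ex) {
            Console.WriteLine("{0} (NativeErrorCode = {1})", ex.Message, ex.NativeErrorCode);
        }
    }
}
```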

    Ok, this leads us to the next problem, one that in my opinion is caused by the .NET Framework. For the sake of backward compatibility with VB4, VB5, and VB6, all P-Invoke calls that do not have charset information attached default to the "A" version of functions. This means that even though the code is running in .NET (where all the strings are Unicode) on Vista (where the registry and everything else is Unicode), everything is being dealt with as if it were a non-Unicode string, and the non-Unicode version of each function is being called.

    Remember when I wrote about how The Unicode train is leaving the station and was quite clear that there would no longer be non-Unicode function calls added to the NLS API? And how we'd be recommending to other teams that they do the same, either the way we did with FindNLSString (not even decorating the name with a "W") or the way the Shell team did with StrCmpLogicalW (a "W" decoration)? Either way, no "A" version is being provided.

    Anyway, we now hit the final problem, which in my opinion is the MUI team's fault. They have only provided a Unicode implementation, but have exposed both RegLoadMUIStringA and RegLoadMUIStringW, and the preprocessor will map RegLoadMUIString to one or the other depending on whether UNICODE is defined.

    RegLoadMUIStringA is, in fact, not supported.

    But if you look at the requirements section in the docs, they clearly claim that the function is "Implemented as RegLoadMUIStringW (Unicode) and RegLoadMUIStringA (ANSI)."

    For those who are interested (someone who spends time on the MSDN forums can pass the word if it has not been answered there yet!), the code that will work on Vista and solves all three of these problems is:

    using System;
    using System.Text;
    using System.Runtime.InteropServices;
    using Microsoft.Win32;

    namespace RegMUITest {
      class Program {
        [DllImport("advapi32.dll", CharSet=CharSet.Unicode, ExactSpelling=true, EntryPoint="RegOpenKeyExW", CallingConvention=CallingConvention.StdCall)]
        public static extern int RegOpenKeyEx(IntPtr hKey, string lpSubKey,int ulOptions,int samDesired, out IntPtr phkResult);

        [DllImport("advapi32.dll", CharSet=CharSet.Unicode, ExactSpelling=true, EntryPoint="RegLoadMUIStringW", CallingConvention=CallingConvention.StdCall)]
        internal static extern int RegLoadMUIString(IntPtr hKey, string pszValue, StringBuilder pszOutBuf, int cbOutBuf, out int pcbData, uint Flags, string pszDirectory);

        [DllImport("advapi32.dll", ExactSpelling=true, CallingConvention=CallingConvention.StdCall)]
        public static extern int RegCloseKey(IntPtr hKey);

        static void Main(string[] args) {
          try {
            IntPtr localMachine = new IntPtr((long)unchecked((int)0x80000002));
            IntPtr regKey;
            int pcbData, retval;

            //NOTE: Open a device key with KEY_READ access rights.
            retval = RegOpenKeyEx(localMachine, @"SYSTEM\CurrentControlSet\Control\Class\{36FC9E60-C465-11CF-8056-444553540000}", 0, 0x20019, out regKey);
            if(retval != 0) {
              Console.WriteLine("RegOpenKeyEx failed with error {0}.", retval);
            } else {
              //NOTE: Build the output buffer reference
              StringBuilder lptStr = new StringBuilder(1024);
              //NOTE: ClassDesc contains the MUI formatted string
              retval = RegLoadMUIString(regKey, "ClassDesc", lptStr, 1024, out pcbData, 0, null);
              if(retval != 0) {
                Console.WriteLine("RegLoadMUIString failed with error {0}.", retval);
              } else {
                //NOTE: Output values to console
                Console.WriteLine("Reg key : {0}", regKey);
                Console.WriteLine("LPWSTR : {0}", lptStr.ToString());
                Console.WriteLine("pcbData : {0}", pcbData);
              }

              //NOTE: Close the key
              RegCloseKey(regKey);
            }
          }
          catch (Exception ex) {
            Console.WriteLine("Exception : " + ex.Message);
          }

          Console.ReadLine();
        }
      }
    }

     Enjoy!

     

    This post brought to you by  (U+a13e, a.k.a. YI SYLLABLE DDAX)

  • Sorting it all Out

    Typos, the 5th Amendment, the 25th Amendment, and Language Log already did it!

    • 6 Comments

    So old friend Andrea IM'ed me last night (yes, that Andrea).

    She had apparently been watching West Wing reruns on Bravo, and found something that caught her eye....

    The conversation went exactly like I type below as I copied it from the IM window, with her permission. :-)

    Andrea: Is there really a typo in the US Constitution?

    Me: Huh?

    Andrea: It's a chat window, Michael. Repeating myself is not necessary.

    Me: I think I need some context. Like maybe your source?

    Andrea: Toby said on West Wing that he found a typo.

    Me: You understand that this is a fictional show, right? :-)

    Andrea: Yes, I do. Are you saying they made it up?

    Me: This was from like the next to last episode of the show, I think. Is that the one you mean?

    Andrea: Thats the one. Did they make it up?

    Me: At the very end of the convo with CJ, I believe Toby makes a reference to Tom Merrill. Which is actually kind of funny, but only in a very obscure way that most people wouldn't get. Hang on, I'll find the text....

    Ok, found it. I remember looking all this up after the episode. It was a fun in-joke that I doubt people got.

    Andrea: Are you going to tell *me* about it? I don't even know who Tom Merrill is!

    Me: He is a law professor. He had testified before the Senate about the Fifth Amendment, claiming that the "Takings Clause", the text of which is "nor shall private property be taken for public use, without just compensation", does not describe a separate restriction of eminent domain on items that are "for public use".

    Andrea: So where is the typo, exactly?

    Me: Well, the problem is that in something like half of the copies out there, a comma exists between the word "use" and the word "without".

    Andrea: I admit English isn't my first language. But that doesn't sound any different to me.

    Me: Ok, here is the link I saved to his testimony. Look at Myth #4:

    Myth #4: The original understanding of the Takings Clause limits the use of eminent domain to cases of government ownership or public access.
    Justice Thomas filed a separate dissenting opinion in Kelo, arguing that the Court should return to the original understanding of the Takings Clause, which he claimed limited eminent domain to acquisitions of property for the government or for actual use by the public. Justice Stevens did not respond to Justice Thomas’s opinion, which may have reinforced the impression in some circles that the Court’s decision was a clear departure from the original understanding.
    Unfortunately, other than the language of the Takings Clause itself (“nor shall private property be taken for public use without just compensation”), there is virtually no direct evidence about what the Framers understood by the words “for public use.” The phrase modifies “taken,” and thus clearly establishes that the Takings Clause is about a subset of takings – those for public use as opposed to other possible types of takings. But this narrowing language does not necessarily mean that the Clause imposes an affirmative requirement that a taking must be for a “public use.” It is also possible that the Framers were simply describing the type of taking for which just compensation must be given – a taking of property by eminent domain as opposed to some other type of taking, such as a taking by tort or taxation. This reading would not, as Justice Thomas argued, render the words “surplusage.” No other words in the Clause tell us the just compensation requirement is about eminent domain (the term “eminent domain” did not enter constitutional discourse until sometime later). Moreover, for all his parsing of old dictionary definitions, Justice Thomas never explained why the prohibitory word “without” is placed before “just compensation” rather than before “public use” – a piece of textual evidence that seems to cut against the thesis that the Clause imposes a public use requirement.
    Given the utter lack of direct evidence, the debate over original meaning probably comes down to whether the Framers understood the power of eminent domain from an “English” perspective, reflecting the views of Locke and Blackstone, or from a “continental” perspective, reflecting the views of natural rights thinkers such as Pufendorf, Grotius, and Vattel. The English perspective emphasized the importance of the property owner’s constructive consent to the taking through the owner’s representation in Parliament. If the Framers viewed takings this way, the most plausible interpretation of “for public use” is that it was just descriptive of the power of eminent domain, i.e., a taking of property authorized by the legislature. The continental perspective emphasized that eminent domain should be used only for certain types of public purposes. If the Framers viewed takings this way, the most plausible interpretation is that public use is an implied limitation on eminent domain. Since the Framers left no clues as to which body of thought was more influential in their thinking, the issue cannot be resolved with any certainty. But it would be hazardous to bet against the English perspective, which was almost certainly familiar to more participants in the ratification process.

    Andrea: Hang on, Michael. Maybe I am reading this wrong, but "taken for public use without just compensation" does not sound different than "taken for public use, without just compensation". What does this have to do with anything?

    Me: I didn't claim it had anything to do with the issue Tom Merrill raised. My understanding was that he was kind of saying that had the phrasing been something like "taken, for public use, without just compensation" that the meaning would be different.

    Andrea: So that's where the comma typo is?

    Me: No, it isn't. But it is within a few words of it, which is why putting a call in to Tom Merrill is humor that is pretty subtle.

    Andrea: Yes, that is *very* subtle.

    Me: Well, they could have gone for a more visceral allusion, like a random case involving self gratification and putting a call in to Paul Reubens (Pee-wee Herman). But maybe that would have been too obvious.

    Andrea: I'll say. {pause}

    You know, punctuation was hardly a science back then.

    Me: Also very true. Look at my blog -- typos a plenty, just thank Bob that I wasn't the constitutional transcriptionist?

    Andrea: The whole country can probably be glad about that.

    Me: Hey, I just found another typo reference in GoogleLive, this one in the 25th amendment. Look here for it -- a much more significant constitutional point if you ask me.

    Andrea: {pause} I'll say! It is much easier to believe in a crisis based on a singular/plural mismatch than the lack of a comma. Too late for the constitutional issue. But maybe you should blog about the singular/plural thing?

    Me: They already talked about it on Language Log. ["Singular they" mailbag]

    Andrea: Maybe you could have a post about "Language Log already did it" or something?

    Me: Nope, South Park already did it, see here: http://en.wikipedia.org/wiki/Simpsons_Already_Did_It

    Andrea: Well, take the chat log from this conversation. Maybe you can make something of that.

    It is unclear whether I have actually made something of it or not. I'll let you folks judge.

    Maybe Andrea should get her own blog!

     

    This post brought to you by ㉕ (U+3255, a.k.a. CIRCLED NUMBER TWENTY FIVE)
