April, 2005

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!

    Pronunciational ambiguity

    (light on the technical content, but there is something linguistical-ish here)

    I walked into a used record store not too long ago, looking for a specific CD -- the 1995 edition of Steve Taylor's "Liver" album. The store clerk did not recognize it, but I took a look anyway. And indeed, I found it. When I took it up, the clerk was a little surprised -- he said "oh, that's how you pronounce it -- Līvər rather than Lĭvər" (forgive me for the IPA (see note 1); think Lie-ver or Live-er for the first entry if the notation does not help). But it's funny how easy it is to slip on these little examples, especially in a case where the word "live" has more than one pronunciation depending on whether it is a verb or an adjective, and in both cases it is an absolute so that the use of an "-er" or "-est" addition is really inappropriate. It was therefore an intentional play on words, and kind of a clever one (except for the fact that no one seems to know about it, including the store clerk!). From the liner notes, on the back of the CD:

    Dear Sir/Madame

    You hold in your hand all that remains of a single night's concert performed by some band and myself during the fall portion of the Squinternational tour in the year of our lord 1994 so if you're buying this because you weren't there to make an unauthorized recording you should know that this is not one of those so-called live albums where everything has been replayed and resung in the studio except some drums and an audience and even the audience has been enhanced to make it sound like a bigger crowd that's a whole lot more excited than they actually were because this album is so much liver than any live record you've ever heard that you can actually close your eyes and pretend you were one of the paying thousands who watched me sing and fall over every night both of which I did on purpose even though the falling over part hurts more than the singing part so if you can't take the naked truth of a live concert with occasional bad notes and buzzes and feeback of the undesirable variety then go buy some Yanni so-called live record but I guarantee you when you close your eyes he won't sing and fall over because it might mess up his spacesuit.

    Your friend,
    Steve

    Well, the meaning of "Liver" is quite unambiguous when you read the notes, unless of course you are a prescriptive grammarian who is too distracted by the misuse of the language (and of course the fact that the whole note above was a single run-on sentence!).

    He goes on, inside the non-Jewel case (Aimee Mann hates those things, too -- a hatred I have also managed to acquire after carefully considering the well-thought-out arguments against jewel cases), to define the "some band" which he referred to in the notes [emphasis mine]. And I realized in reading all of this why I like Steve Taylor's songs. Because he is like me, except more clever. He has the same sardonic wit (except wittier), the same intelligence (except more intelligent), and he has a sense of humor (I have humor, but no sense of it). I certainly don't share his religious faith, and I wonder to some extent how much that faith has hurt his career ("God rock" seems out of style for many). But I can respect it, and him.

    You may have heard Steve Taylor before without realizing it -- in a song that Steve co-wrote, called Tale of the Twister (you can find it on the Soundtrack to the movie Pump Up The Volume, CD version only). The band (Chagall Guevara) was pretty great too, though I think that is mostly Steve's influence. You can definitely feel it in their albums.

    I realized that there is a common thread in the music that I like. I like the songs that are clever, in part because they make me feel clever for liking them, and for understanding them (when I do). As if the singer/songwriter and I are sharing a joke each time. And maybe we are, whether I know them or not.

     

    1 - I have a colleague who thinks I can learn IPA. She is entirely wrong about that, but if I can have delusions of my linguistic aptitude then so can she! :-)

    Not perfect, but perfecting feedback loops

    I was reading Paul Vick's discussion about Imperfect Feedback Loops and it got me thinking a bit about the various community efforts over the years.

    A few years ago I was visiting a friend in Orem, Utah and had an opportunity to see Jeffrey R. Holland speak. I did not agree with everything he was saying, but one particular bit seemed to resonate with me. He said something like "I am not perfect and cannot be perfect, but when I am at my best I am perfecting; we all are." As he started to say this many people around me seemed quite shocked (they clearly did think of him as perfect!), but when he continued they understood what he was trying to say to them. He went on to talk about perfecting as a process, a way that one can better oneself.

    So I was thinking about those words. Perfect and the state of being perfect (perfection) are absolutes -- one cannot really be "more perfect" after all. But they are also absolutes that we as software people are all unlikely to reach, be it in a product or a process or in anything else. But the word perfecting has a different implication (imputation? I sometimes mix them up!) -- it suggests to me striving to try to reach that absolute, as a process. The fact that one cannot reach it does not make the goal any less good of an idea to try, does it?

    Though words are not my strong suit here. Perhaps my intuitive definitions do not match what the actual definitions are. :-)

    As I stated in the end of How whining a whole bunch got a feature added, I and all of the people I know on my team and the BCL team are taking bugs and feature requests posted to the MSDN Product Feedback Center quite seriously. The triaging of the issues is done as quickly as we can manage and there is a sincere effort to look at the issue being raised. As Paul states we cannot really address everything (especially this late in the product cycle) but we do work to address everything we can, and what we cannot is put in the hopper for the next version.

    And as I also stated in the article, I take feedback to posts and articles here quite seriously, even if it annoys me (which does happen sometimes). And in several cases others have taken feedback posted here seriously as well. I admit that I do not always report on those issues; perhaps that is something I should work on as a part of my own perfecting process.

    I have found and fixed bugs based directly and indirectly on both the blog and the feedback center, and thus as imperfect as things may be, I really do feel that things are getting better here, not worse. And both processes have a great deal of transparency since both involve talking much more directly with customers and their thoughts, their ideas, their concerns, and their bug reports.

    So, things are not perfect. But were they ever really going to be? The main thing is that they are getting better all the time.

    Normalization vs. .NET text elements

    Ok, here is the updated code for that internationally savvy palindrome checker. It supports that interesting situation with ligatures like ﬂ (U+fb02, a.k.a. LATIN SMALL LIGATURE FL) on one side vs. an lf on the far side, originally suggested by our old friend Maurits (with comments):

    //////////////////////////////////////////
    //
    // IsPalindrome
    //
    // in : a string
    // out: true if the string passed in is a palindrome.
    //
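    // NOTES: This function handles both canonical/compatibility
    //        equivalences and grapheme clusters (a.k.a. text
    //        elements) as defined by the Unicode Standard.
    //        It needs the System.Globalization namespace
    //        (StringInfo, CultureInfo) and the System.Text
    //        namespace (NormalizationForm).
    //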
    //////////////////////////////////////////
    bool IsPalindrome(string st) {

        // A null string or a ZLS is not a palindrome
        if ((null == st) || (0 == st.Length)) return false;

        // Convert to NFKC and set up the text element detection object
        StringInfo si = new StringInfo(st.Normalize(NormalizationForm.FormKC));
        int count = si.LengthInTextElements;

        for (int i = 0; i < (count / 2); i++) {
            // get the text elements for comparison
            string st1 = si.SubstringByTextElements(i, 1);
            string st2 = si.SubstringByTextElements(count - i - 1, 1);

            // see if the text elements on each side are linguistically equivalent
            if (CultureInfo.CurrentCulture.CompareInfo.Compare(st1, st2) != 0) {
                // they are not, so it is not a palindrome. 
                return(false);
            }
        }

        // both ends appear to be equivalent; it is a palindrome.
        return (true);
    }

    Now, Maurits went on in his Channel 9 posting to discuss sort elements, or cases when two or more characters are to be given a single sort weight (kind of the opposite of an expansion like these ligatures). However, in my opinion these are not really suitable for a palindrome detection algorithm, as I don't think they are usually treated as letters except in the case where they are also treated as unique text elements (the case covered by the StringInfo code).

    Any native speakers of languages with such constructs as the Spanish ch and the Hungarian dzs who think they should or should not be treated as a unit in trying to detect palindromosity should feel free to leave a comment to that effect. Also, if any of my colleagues in the GIFT group agree or disagree here (and they are reading this!) they are invited to do the same (or stop me in the hall and accost me with this information!).

    If I am mistaken on this point, then a very interesting problem develops since there is really not an easy method for detecting such cases given the current collation function set (although I can imagine a few avenues of attack and we obviously have the underlying data if we had to support an IsPalindrome function in Win32 NLS!). This might even make an interesting interview question one day for a very talented candidate.... :-)

    Now, aside from all that, it is important to note that normalization makes some uses of text elements in this context completely unnecessary -- after all, either technology will treat U+0061 U+030a (a + combining ring) as identical to U+00e5 (a ring), one by conversion and the other by giving identical sort weights. Therefore, there is some overlap between the two technologies. However, there are some differences:

    • Normalization will handle those ligature characters. Text comparison can usually handle them too, but not when the equivalent sequences are never directly compared with each other.
    • Text element comparisons will handle cases such as U+0061 U+030a on both sides of the palindrome, rather than just handling two different normalized forms.
    • Normalization will help treat larger ligatures like fdfa as equivalent to 0635 0644 0649 0020 0627 0644 0644 0647 0020 0639 0644 064A 0647 0020 0648 0633 0644 0645, something that comparison will likely not ever be able to fully do (as discussed in my FoldString.NET post).
    • Text elements will handle text elements like 0e41 0e0b 0e4c (THAI CHARACTER SARA AE + THAI CHARACTER SO SO + THAI CHARACTER THANTHAKHAT) which may or may not have precomposed forms.
    • Normalization will handle cases like Hangul Syllables versus Jamos, as is discussed here, a fact that is less important now than it will be once font technologies catch up with Unicode and canonical/compatibility equivalences.
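    The overlap and the differences in the list above can be checked directly. What follows is only a minimal sketch: the class and method names (EquivalenceDemo, SameAfterNfkc, TextElementCount) are made up for illustration, and the only framework calls it relies on are String.Normalize and StringInfo.LengthInTextElements.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper names; not part of the framework or of the checker above.
public static class EquivalenceDemo
{
    // True when NFKC normalization maps both strings to the same sequence.
    public static bool SameAfterNfkc(string a, string b)
    {
        return a.Normalize(NormalizationForm.FormKC)
            == b.Normalize(NormalizationForm.FormKC);
    }

    // Number of text elements in the string, as StringInfo sees them.
    public static int TextElementCount(string s)
    {
        return new StringInfo(s).LengthInTextElements;
    }

    public static void Main()
    {
        // Overlap: a + combining ring is one text element, and it
        // normalizes to the same thing as the precomposed a ring.
        Console.WriteLine(TextElementCount("a\u030A"));
        Console.WriteLine(SameAfterNfkc("a\u030A", "\u00E5"));

        // Difference: the fl ligature is a single text element, and only
        // normalization turns it into the two letters f and l.
        Console.WriteLine(TextElementCount("\uFB02"));
        Console.WriteLine(SameAfterNfkc("\uFB02", "fl"));
    }
}
```

    The ligature case is the one that matters for the palindrome checker: without the NFKC step the ligature stays a single element and never gets matched up against the separate f and l at the other end of the string.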

    So it is fair to say that both technologies as provided in the Whidbey release are potentially useful in (of all things) the detection of palindromes, today and tomorrow. I am sure the people who spec'ed, developed, and tested these features are very proud of the technological advances!

     

    This post brought to you by "แซ์" (U+0e41 U+0e0b U+0e4c, a.k.a. THAI CHARACTERS SARA AE + SO SO + THANTHAKHAT)
    The last two codepoints of which make up a text element, but all three of which make up a unique sort element!

    Getting enough exercise?

    (No technical content in this post)

    I love Typhoon, a Thai restaurant that has multiple locations, the closest one for me in Redmond about 4 miles away. They have a dish there called General's Noodles with the following description on the dinner menu:

    Egg noodles with chicken, shrimp, fried wonton, sprouts, peanuts, sugar and lime.

    I just love it. I have been in the habit lately when I eat there of ordering another one to-go and eating it later or the next day. It is awesome food. I don't think it is a genuine Thai dish since I have never seen it in any other Thai restaurant. And believe me, I have looked. I asked them if they had a cookbook, but no such luck.

    On the whole I would say it is my second favorite food in all the world (right after stuffed grape leaves, but that is a story for another day!).

    Anyway, I was home yesterday, feeling a bit peckish around 5:00pm. And it was sunny out in Redmond. I had just left a foot of April snow in Cleveland. And like I said it is just 4 miles from here. And there is a cool bike trail next to SR 520 that is a straight shot for most of the way. I looked at my fully charged Pride Mobility 3-Wheel Victory Scooter and considered the fact that its 20-25 mile range was almost certainly on flat ground, not on hills. Would I be able to scoot there?

    I decided I would.

    (People are probably shuddering when they think about where this story may be going, especially in the context of the title of the post. Think of it as FORESHADOWING, a sign of quality literature!)

    Now Pride Mobility scooters have a battery gauge on them with colored circles on it -- red for empty, moving into yellow for near empty and then up through to green for fully charged. There is one red, one yellow, and then four greens, each one a little darker. The meter is most effective while you are scooting on level ground -- it is how you know if you really are full or not. Going uphill drops the meter down, and downhill makes it look fully charged when it is not.

    But I jammed quickly, and I made it there in about 40-45 minutes, no problems at all. They messed up my order a little (one of the only flaws of an otherwise wonderful restaurant is the fact that they are not so good at taking to-go orders over the phone). But I was in no hurry. A few minutes later I was off, heading home with my bounty. As I left, even under load I was in the second highest green circle. I figured I was all set.

    Of course that was to change.

    Suddenly, just before I made it to the bike trail (say about 5 miles into this ~8 mile journey), something happened. The power dropped down to between the last and second last green circle.

    Damn.

    I realized there might still be enough juice to make it home, though I remembered most of the trip home being uphill a little. I was a little nervous, but I figured I did not have too much choice so I should just go for it.

    As I continued, the meter was poking dangerously in towards the yellow circle, and then suddenly it hit the red and just stopped.

    I got up, took a look at the freewheel lever, pulled it up, and started pushing the scooter. It weighs about 150 pounds, but I figured if I was holding on to it I would probably not fall. After about 50 yards I noticed there was a circuit breaker reset button, and pressing it gave me back the juice. So I gratefully stopped pushing, turned off the freewheel, and started scooting. I carefully avoided making the scooter go fast enough for the meter to head down to the red circle as that seemed like a surefire way to lose power again, and I slowly plodded home (probably closer to 2mph than the scooter's 5mph maximum, at this point). It looked like I would make it back.

    Or not. Just before 51st St., it decided any hill was too much, and I had to push it all the way back from then on. Mike tells me that it is 1.3 miles that I pushed it, plus a little bit of the time on 40th St. that I was pushing.

    I had a lot of time to think as I was pushing the 150 pound scooter, mostly uphill.

    Mostly wondering if I would ever be stupid enough to let this happen again. And then realizing I probably am that stupid. So as a mitigation strategy, planning on how I should get some extra batteries for it to carry with me for next time, like I can with the smaller scooter. And wondering how best to have someone take a look at the scooter and let me know what is wrong with it now (is one of the two batteries in trouble? Or is there another problem?).

    At one point a biker stopped and asked if there was anything he could do. But I was almost to 40th St. and it seemed like there was no practical way he could help without abandoning his bike. I told him thanks very warmly, but that I was almost there and I would make it. I actually felt really good that someone just saw me having trouble and asked if they could help -- even if there is no practical way to do so, his motivations were purely along the lines of "someone is in trouble, how can I assist?" and that is a great thing about people sometimes.

    Anyway, I made it home finally, though I was barely able to walk or even stand for most of the night. I am still a little shaky now but I am mostly recovered. It was probably the most exercise I have had in a while, even more than the snow shovelling exercise from last week. I am not anxious to repeat it (and may not make a Typhoon jaunt in the scooter again), but like a time many years ago that extreme circumstances caused me to run on a beach in New Jersey when I was usually having trouble walking, it is good to know that I still have reserves that can be tapped, when needed.

    And I still have those General's Noodles to eat! :-)

     

    This post brought to you by "♿" (U+267f, a.k.a. WHEELCHAIR SYMBOL)

    Where did the new StringInfo stuff come from?

    I used it in a very confusing and obfuscated way in Normalization as obfuscation in C#. And then yesterday I used it again in my internationally savvy palindrome checker, in a slightly more intuitive manner.

    It is the all new StringInfo class in Whidbey.

    Now the old StringInfo class had only static methods -- in other words it was a walking FxCop violation.

    And the main method it had was StringInfo.ParseCombiningCharacters, which was a static method that would take a string and return an array of int values, each one of which would be an index into that string that showed where a new text element was started. A text element could be a single letter, a letter and a diacritic, a letter and a bunch of diacritics, a high and low surrogate making up a surrogate pair, etc.

    ParseCombiningCharacters is an incredibly useful method, but it is not very intuitive to use, and certainly not to use effectively. The same goes for the other methods for dealing with text elements (GetTextElementEnumerator and GetNextTextElement) -- people were just getting confused.
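    To make the "array of indexes" idea concrete, here is a minimal sketch of what working with ParseCombiningCharacters directly looks like. The ParseDemo class and its SplitTextElements helper are hypothetical names invented for this illustration; only StringInfo.ParseCombiningCharacters is a framework call.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; shows the bookkeeping the caller has to do by hand.
public static class ParseDemo
{
    // Slices the string into text elements using the start indexes
    // that ParseCombiningCharacters returns.
    public static string[] SplitTextElements(string s)
    {
        int[] starts = StringInfo.ParseCombiningCharacters(s);
        string[] elements = new string[starts.Length];
        for (int i = 0; i < starts.Length; i++)
        {
            // An element runs up to the next start index, or to the end of the string.
            int end = (i + 1 < starts.Length) ? starts[i + 1] : s.Length;
            elements[i] = s.Substring(starts[i], end - starts[i]);
        }
        return elements;
    }

    public static void Main()
    {
        // a + combining ring, then b, then a surrogate pair (U+10000).
        string s = "a\u030Ab\uD800\uDC00";
        int[] starts = StringInfo.ParseCombiningCharacters(s);
        Console.WriteLine(string.Join(",", starts)); // prints 0,2,3
        Console.WriteLine(SplitTextElements(s).Length); // prints 3
    }
}
```

    The index arithmetic in SplitTextElements is exactly the sort of thing that is easy to get subtly wrong, which is a fair summary of why the method confused people.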

    But people have no problem understanding the need to be able to count entities based on what a typical user might think a character is. Once one explains what a text element is, they immediately understand the need for ways to make use of them.

    So we had some meetings to talk about how to make the ways to work with text elements more intuitive, at least as intuitive as the concept of a text element itself. In the last of those meetings, someone pointed out that people usually had no problem understanding the semantics of the Substring method or the Length property of System.String. Maybe we could learn a lesson from that?

    And voilà, the SubstringByTextElements method and the LengthInTextElements property were born!

    Each behaves just like its cousin (the Substring method and the Length property, respectively), but rather than being based on UTF-16 code units, they are based on text elements, or what the user might reasonably point to and call a character. The same thing that the Win32 CharNext and CharPrev functions do (at least, when we have not accidentally broken them!).
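    A small side-by-side comparison shows the difference between the two pairs of members. This is just a sketch; the CousinsDemo class name is invented for illustration, and the sample string is an arbitrary choice.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class comparing String members with their StringInfo cousins.
public static class CousinsDemo
{
    public static void Main()
    {
        // a + combining ring, then b, then c.
        string s = "a\u030Abc";
        StringInfo si = new StringInfo(s);

        // Length counts UTF-16 code units; LengthInTextElements counts
        // what a user would call characters.
        Console.WriteLine(s.Length); // prints 4
        Console.WriteLine(si.LengthInTextElements); // prints 3

        // Substring(0, 1) splits the base letter from its diacritic;
        // SubstringByTextElements(0, 1) keeps the pair together.
        Console.WriteLine(s.Substring(0, 1).Length); // prints 1
        Console.WriteLine(si.SubstringByTextElements(0, 1).Length); // prints 2
    }
}
```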

    Now the method and property are useless if there is not some object that they can hang off of which has the string. People were leery about adding them directly to System.String since they really want to try to keep that object as lightweight as they can (and some would even say they are not trying hard enough on that). That's when somebody remembered this class you could instantiate yet had no instance methods, this FxCop violation with a hat. And we added a constructor that takes a string and a StringInfo.String property to retrieve the string later if you wanted or change it without having to tear down the object.

    Now we were rolling....

    Internally, it just uses that incredibly useful but not-so-intuitive StringInfo.ParseCombiningCharacters and stores that System.Int32 array. That makes StringInfo.LengthInTextElements a simple call to Length on the array, and StringInfo.SubstringByTextElements is a simple tip-toe through the array, using the very start and length parameters that the method contains in order to know where and how far to go. So we get to be intuitive and pretty fast at the same time. And we get to get rid of that FxCop issue, to boot. Everybody wins!
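    The design described above can be sketched in a few lines. To be clear, this is not the actual StringInfo source: TextElementView is a hypothetical class written only to illustrate the idea of caching the index array, and its argument checking is a simplification that may not match what the real class does.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for the idea behind the new StringInfo members.
public sealed class TextElementView
{
    private readonly string text;
    private readonly int[] starts; // start index of each text element

    public TextElementView(string text)
    {
        if (text == null) throw new ArgumentNullException("text");
        this.text = text;
        this.starts = StringInfo.ParseCombiningCharacters(text);
    }

    // The count is simply the length of the cached index array.
    public int LengthInTextElements
    {
        get { return starts.Length; }
    }

    // Walks the cached array to find where the requested run begins and ends.
    public string SubstringByTextElements(int start, int length)
    {
        if (start < 0 || length < 0 || start + length > starts.Length)
            throw new ArgumentOutOfRangeException();
        if (length == 0) return string.Empty;

        int from = starts[start];
        int to = (start + length < starts.Length) ? starts[start + length] : text.Length;
        return text.Substring(from, to - from);
    }
}

public static class TextElementViewDemo
{
    public static void Main()
    {
        TextElementView view = new TextElementView("a\u030Abc");
        Console.WriteLine(view.LengthInTextElements); // prints 3
        Console.WriteLine(view.SubstringByTextElements(1, 2)); // prints bc
    }
}
```

    Parsing once in the constructor is what keeps the later calls cheap: both members become array lookups rather than fresh scans of the string.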

     

    This post brought to you by "¾" (U+00be, a.k.a. VULGAR FRACTION THREE QUARTERS)

    Looking for that internationally savvy palindrome checker....

    Jonathan Payne asked if I had an international thought about the palindrome pseudo interview question at this site:

    http://channel9.msdn.com/ShowPost.aspx?PostID=19171  

    I did. :-)

    Using the new StringInfo stuff in Whidbey Beta 2:


    bool IsPalindrome(string st) {
        StringInfo si = new StringInfo(st);
        int count = si.LengthInTextElements;

        if (count == 0) return false;

        for (int i = 0; i < (count / 2); i++) {
            string st1 = si.SubstringByTextElements(i, 1);
            string st2 = si.SubstringByTextElements(count - i - 1, 1);

            if (CultureInfo.CurrentCulture.CompareInfo.Compare(st1, st2) != 0) {
                return(false);
            }
        }

        return (true);
    }

     

    Quickest way to handle all those cool issues like cultural sensitivity and combining characters and supplementary characters and such!
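    As a rough illustration of why the text element approach matters, here is a sketch that runs the same input through a naive per-char check and through a text element check modelled on the function above. The class and method names (PalindromeDemo, IsPalindromeByChar, IsPalindromeByTextElement) are made up for this example, and the input string is just one convenient test case.

```csharp
using System;
using System.Globalization;

// Hypothetical comparison of a per-char check with a text element check.
public static class PalindromeDemo
{
    // Naive check: compares UTF-16 code units from both ends.
    public static bool IsPalindromeByChar(string s)
    {
        if (string.IsNullOrEmpty(s)) return false;
        for (int i = 0; i < s.Length / 2; i++)
        {
            if (s[i] != s[s.Length - 1 - i]) return false;
        }
        return true;
    }

    // Text element check, following the same shape as the function above.
    public static bool IsPalindromeByTextElement(string s)
    {
        if (string.IsNullOrEmpty(s)) return false;
        StringInfo si = new StringInfo(s);
        int count = si.LengthInTextElements;
        CompareInfo ci = CultureInfo.CurrentCulture.CompareInfo;
        for (int i = 0; i < count / 2; i++)
        {
            string left = si.SubstringByTextElements(i, 1);
            string right = si.SubstringByTextElements(count - i - 1, 1);
            if (ci.Compare(left, right) != 0) return false;
        }
        return true;
    }

    public static void Main()
    {
        // a + combining ring, b, a + combining ring: reads the same both ways
        // to a person, but not code unit by code unit.
        string s = "a\u030Aba\u030A";
        Console.WriteLine(IsPalindromeByChar(s)); // prints False
        Console.WriteLine(IsPalindromeByTextElement(s)); // prints True
    }
}
```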

    Cleaning out the suggestion box a bit

    The theme for this post is going to be disappointing people's hopes.... picking out suggestions that do not go along with the probable desires of those asking the questions.

    The first question comes from marius mihalca:

    Sorry to bother you with this but I found no other way to send you a message. I fallowed all the stept from this article (How to build the 7.1 MFC and the CRT DLLs with MSLU) but my simple VC71 generated application can't display unicode on Windows98 SE.

    Everything sims OK except the fact that instead of bulgarin (or greek) language my application displays ? and _ (or other characters). I am positive that unicows.dll is loaded.

    I tried to make an simple editbox and tried to paste some unicode text. The editbox shows incorect text.

    How can I be sure that my applicatin is using unicows.dll correctly?

    Please help.
    10x

    MSLU is actually behaving as designed here. The Microsoft Layer for Unicode is not, as some people seem to think, a library that provides Unicode support for Win9x. It is, rather, a thin layer over the non-Unicode Win9x that allows one to write a Unicode application. It will still convert those Unicode string parameters from the various Win32 API functions out of Unicode in order to call the underlying operating system. The key is that the Unicode application, when run on an operating system that supports Unicode, will allow one to get full Unicode support.

    This design is the very one that Julie Bennett originally envisioned and that Cathy Wissink and I wrote about for MSDN Magazine.

    Now there are components that provide a degree of actual Unicode support on Win9x, such as Uniscribe, RichEdit, the Shell Common Controls (version 5.80 and later), and others. But those items do not include MSLU, which is not a rewrite of those 550+ APIs but a wrapper around them. I know this may be disappointing to those who were hoping that it was indeed a Unicode solution for Win9x, but it has never been documented anywhere as such a solution....
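    The question marks in the report above are what a conversion "out of Unicode" looks like when the target encoding has no slot for a character. The sketch below is only an analogy in managed code, not how MSLU itself is implemented: it uses Encoding.ASCII as a stand-in for whatever system code page the Win9x machine has (an assumption made purely so the example runs anywhere), and the LossyRoundTrip class is a made-up name.

```csharp
using System;
using System.Text;

// Hypothetical analogy for a Unicode -> code page -> Unicode round trip.
public static class LossyRoundTrip
{
    // Converts the text to the given encoding and back again.
    public static string RoundTrip(string text, Encoding codePage)
    {
        byte[] bytes = codePage.GetBytes(text);
        return codePage.GetString(bytes);
    }

    public static void Main()
    {
        // A Bulgarian greeting, written with escapes to avoid source encoding issues.
        string bulgarian = "\u0417\u0434\u0440\u0430\u0432\u0435\u0439";

        // ASCII has no Cyrillic letters, so each one is replaced with '?'.
        Console.WriteLine(RoundTrip(bulgarian, Encoding.ASCII)); // prints ???????

        // Text that fits the target encoding survives the trip untouched.
        Console.WriteLine(RoundTrip("hello", Encoding.ASCII)); // prints hello
    }
}
```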


    On April 13, I heard from AC:

    I'm receiving this question from the developers who'd like to have the possibility to make the language customization for the users of Windows 9x/ME which would allow them to later more gracefully upgrade to NT/2K/XP:

    - how can they add the additional code page to the Windows 9x/ME?

    The goal is to be able to enter unicode characters by typing them. Good enough MS KLC equivalents for Win 9x/ME already exist.

    I know that once was probably considered not so good idea to open the specs of Windows 9x/ME .NLS files. But I guess now it shouldn't be such a problem?

    Well, the answer to this will also not thrill. There is not going to be any kind of opening up of code pages or other .NLS files on Win9x or elsewhere. In practice, this would not help graceful upgrades but it would delay the time before people did upgrade to NT/2K/XP by offering an incomplete solution for other languages today, in a way that hurts interoperability.


    In the beginning of February, Richard Caruana asked:

    Do you know if the following can or will be instituted :

    1. To be able to have a core fontlet/character in the centre which can have overlay accents or diacritics which can be added/overlaid/plyed at run-time in a unicode text editor.
    2. A section for developers in the UNICODE font (~ 1000 character spaces or less if the above idea can be instituted )

    Also, do you know where I can download a unicode character search program which if you paste in the character (chinese) it gives you the character code (eg 5CFO) and a definition.

    Well, #1 is sort of there now, although it only works well when the font author does the hard work of having all the appropriate code points and attachment points defined.

    #2 is sort of there now, although the private use area of Unicode is only for private use and not for interchange, and I honestly don't see how it could help this situation anyway.

    As for the place to get info about Unicode characters, check out the Unicode Character Search (by fileformat.info) that will accept both characters and code points. I do not know offhand about downloadable tools, but if you could reach my blog then you can reach that site, I suppose. Right? :-)


    Then, back in the end of January, G* asked:

    Care to take a break from the wonders of Unicode and elaborate on the pain that is the console codepage 437?

    Sorry, no. :-)

    I am all about the Unicode thing.

    Unless you have something particular in mind beyond what I have talked about in prior posts about encoding issues, like this one?


    Ok, that is enough disappointment for one day. I'll try to work harder to meet people's expectations/wishes next time....

     

    This post brought to you by "峰" (U+5cf0, an ideograph meaning "peak, summit; hump of camel" according to the Unihan database)

    Reason #8 to not be so anxious to update your Platform SDK?

    Back in March, I gave Reason #124 to update your Platform SDK from time to time (it was to pick up a fix to an AppVerifier bug that existed in unicows.lib).

    A comment from Nektar suggested that there were good reasons to be cautious about doing such updates. And he gave seven very good reasons.

    I will now, to prove that I can argue both sides of an issue, suggest an eighth. :-)

    Many people who have been compiling with VC 6.0 have noted a "debugging information corrupt" error showing up in their debug builds that include MSLU.

    As usual, one of our internationalization MVPs (Ted) stepped up to explain in the most recent post to the microsoft.public.platformsdk.mslayerunicode newsgroup:

    ...that's a known issue with the February 2003 SDK and later.  The unicows.lib contains debugging information compatible only the VC 7 and above.  There's nothing that can be done about this, unfortunately. Microsoft decided that Visual Studio 6.0 is no longer a supported platform for any future Platform SDKs.

    I thought I would explain a bit of the backstory about the issue with the Platform SDK. :-)

    Basically three facts unintentionally cause the problem. Since there are three we can call it a conspiracy (sorry, my Law & Order background is showing through again!). The three facts are:

    1. The .LIB files that make up the Platform SDK are built in the same tree as Windows itself, so that day by day the files like kernel32.lib will have in them whatever they ought to have for being considered a .LIB file in the version of the Platform SDK that goes with that version of Windows.
    2. The toolset used to build Windows (compiler, linker, etc.) is upgraded regularly, to pick up fixes of the sorts of bugs that only a project as huge and complex as Windows can find.
    3. The folks who build the toolset over in the Developer Division upgrade the format of debugging information from time to time, and then sometimes remove the older formats as the years pass. Sometimes the formats are not compatible with the older toolsets.

    When you add these three facts together with a fourth fact:

    4. The MSLU .lib file, unicows.lib, has code inside of it

    the unintentional conspiracy causes the problem with the older toolset in VC6 not working with the debugging information in unicows.lib. The retail build still works just fine, at least. But the debug build will not work.

    Luckily Ted gives the workaround:

    To workaround this issue, you'll have to find another unicows.lib from an SDK before February 2003 SDK (e.g. the October 2002 SDK).  You can obtain that here:

    http://groups.google.co.uk/groups?hl=en&lr=&selm=ev4JMaN0EHA.804%40TK2MSFTNGP12.phx.gbl

    The older .LIB file will allow VC6 to work properly.

     

    This post brought to you by "ű" (U+0171, a.k.a. LATIN SMALL LETTER U WITH DOUBLE ACUTE)

    Good things can happen when religious authorities work with science and technology

    The Hijri calendar is not really subject to an easy algorithm. As Dr. International pointed out back in August of 2000:

    Perhaps Dr. International should provide some background to help explain why SQL Server refers to this as an Arabic style date that uses the Kuwaiti algorithm. The Hijri calendar is a very old and complex calendar, which has an issue when it comes to automating conversion between Gregorian and Hijri: there are specific days that the conversion can potentially be off by a day or two in either direction. The exact reason for this has to do with the proclamation of the new moon by religious authorities based on visibility of lunar crescent. Therefore, the natural temptation of programmers to want to automate everything must be resisted in this case. The Hijri calendar is very important to Saudi Arabia and other countries such as Kuwait, and thus this seemingly unsolveable problem must be solved.

    In an effort to solve this challenging problem, several years ago some of the top developers in Microsoft's Middle East Products Divison (MEPD) did extensive research into it. They had the longest timeline of information on the Hijri calendar as is used in Kuwait, and they took this information and did statistical analysis on it, finally arriving at the most accurate algorithm they could devise. This algorithm is used in many Microsoft products, including all operating systems that support Arabic locales, Microsoft Office, COM, Visual Basic, VBA, and SQL Server 2000. Whether you refer to this as the Hijri date, the Arabic style, or the Kuwaiti algorithm, you should understand that it is technically none of these things; it is simply the most accurate algorithm that Microsoft was able to derive using a large number of known Hijri dates. The actual determination of the new moon by religious authorities does not bow to a computer algorithm (nor should it, obviously!).
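    In managed code, the "off by a day or two in either direction" issue shows up as the HijriAdjustment property of System.Globalization.HijriCalendar, which accepts values from -2 to 2. The following is just a sketch of how a caller might apply such a correction; the HijriDemo class and its Describe helper are hypothetical names, and the sample date is an arbitrary choice.

```csharp
using System;
using System.Globalization;

// Hypothetical demo of converting a Gregorian date with a day adjustment.
public static class HijriDemo
{
    // Formats the Hijri year-month-day for a date, shifted by the given
    // number of days (must be between -2 and 2).
    public static string Describe(DateTime date, int adjustment)
    {
        HijriCalendar hijri = new HijriCalendar();
        hijri.HijriAdjustment = adjustment;
        return hijri.GetYear(date) + "-" + hijri.GetMonth(date) + "-" + hijri.GetDayOfMonth(date);
    }

    public static void Main()
    {
        DateTime date = new DateTime(2005, 4, 15);

        // The algorithmic answer, then the same date nudged a day each way,
        // as one might do to follow an official new moon announcement.
        Console.WriteLine(Describe(date, 0));
        Console.WriteLine(Describe(date, 1));
        Console.WriteLine(Describe(date, -1));
    }
}
```

    The adjustment has to come from outside the program (a setting, an announcement, a person), which is the practical consequence of the sighting-based rule described above.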

    Now, I am not even going to imply that what I am about to say was due to direct help from Microsoft, and I have no knowledge that suggests otherwise.

    But earlier today, colleague Shawn Steele pointed me (and others) at an article entitled Satellite will help set Islamic dates which describes a fairly cool development, in my opinion:

    The Organization of the Islamic Conference, the world's largest Muslim body, said Sunday it plans to launch an $8 million satellite within two years to take pictures of the moon to find lunar calendar dates.

    The 57-nation group said religious scholars would have access to accurate pictures of the shape of the moon instead of having to rely on naked-eye sightings, which have in the past created discrepancies between Muslim countries or led to mistakes.

    "Hopefully the satellite will stop the problems associated with lunar sightings," spokesman Ahmed Imigene said.

    It is ironic that the sort of problem I would struggle with for the technical reason of wanting a purely algorithmic solution is one that bothers some religious authorities as well (many of whom, for obvious reasons, do not want to see even unintentional, innocent mistakes made). I think it is amazingly cool that there are people who are interested in leveraging technology to better serve the intent of the rules used by the religion.

    There are understandably some who are unhappy with the plan, as the article goes on to state:

    It was not immediately clear how many countries will use the technology to determine religious dates. There is already some criticism from religious officials in Saudi Arabia, which uses the lunar calendar.

    "The shape of the moon has to be seen from the ground," said Osama al-Bar, dean of the Custodian of the Two Holy Mosques Institute for Haj Research in Saudi Arabia.

    Now I realize that this hope has all of the problems that the issue of instant replay versus umpire/referee calls in sports has had, with the additional burden of being a LOT more meaningful, if you know what I mean. Knowing how bitter the battles got about the instant replay issue, I can only imagine how many problems this may cause for people who truly believe there is something wrong with the plan.

    The real problem (in my opinion) is that the original intent is not completely known. Even if the motivation for the rules was known and the rules were made because they were the best available at the time, there is still no way to know whether those who made the rules would accept such an innovation or not. Thus it could easily be considered either pious or heretical, depending on how you look at it. And one would be hard pressed to argue the point either way, since it is a legitimate religious question.

    My hope for this development, to help break the stalemate, is that eventually a careful combination of the techniques is used to assist the religious authorities. Every effort would be made to try and spot the shape of the moon, but the appropriate authorities would ideally have access to the data from the satellite as an additional data point to assist them.

    The issue actually reminds me of an issue in Judaism, interestingly enough. It has to do with the laws of Kashrut (כשרות). Kashrut (which means "keeping kosher") is the body of Jewish dietary laws. Food that is allowed to be eaten is kosher (כשר), and food that is not is treif (טרפה). The rules I am referring to are the ones governing the method by which animals must be slaughtered in order to be considered kosher, as described in this article:

    Kosher Slaughter and Preparation

    Jewish law states that kosher mammals and birds must be slaughtered according to a strict set of guidelines, the slaughter (shechita) (שחיטה) being designed to minimize the pain inflicted. This necessarily eliminates the practice of hunting wild game for food, unless it can be captured alive and ritually slaughtered.

    A professional slaughterer, or shochet (שוחט), using a large razor-sharp knife with absolutely no irregularities, nicks or dents, makes a single cut across the throat to a precise depth, severing both carotid arteries, both jugular veins, both Vagus nerves, the trachea and the esophagus, no higher than the epiglottis and no lower than where cilia begin inside the trachea, causing instantaneous loss of blood flow to the brain and death in a few seconds. Any variation from this exact procedure could cause unnecessary suffering; therefore, if the knife catches even for a split second or is found afterward to have developed any irregularities, or the depth of cut is too deep or shallow, the carcass is not kosher (nevelah) and is sold as regular meat to the general public. The shochet must be not only rigorously trained in this procedure, but also a pious Jew of good character who observes the Sabbath, and who remains cognizant that these are God's creatures who are sacrificing their lives for the good of himself and his community and should not be allowed to suffer. In smaller communities, the shochet is often the town rabbi or the rabbi of one of the local synagogues; large factories which produce Kosher meat have professional full time shochets on staff.

    Once killed, the animal is opened to determine whether there are any of seventy different irregularities or growths on its internal organs, which would render the animal non-kosher. The term "Glatt" kosher, although it is often used colloquially to mean "strictly kosher", properly refers to meat where the glatt (גלת) (lungs) are carefully examined for adhesions (i.e. scars from previous inflammation).

    Large blood vessels must be removed, and all blood must be removed from the meat, as Jewish law prohibits the consumption of the blood of any animal. This is most commonly done by soaking and salting, but also can be done by broiling. An interesting fact, little-known outside of Jewish communities, is that the hindquarters of a mammal are not kosher unless the sciatic nerve and the fat surrounding it are removed (Genesis 32:33). This is a very time-consuming process demanding a great deal of special training, and is rarely done outside Israel, where there is a greater demand for kosher meat, since all meat sold in Jewish towns is required to be kosher by law. When it is not done, the hindquarters of the animal are sold for non-kosher meat.

    Now I will be the first to admit that at the time these rules were codified, they were state of the art in the most humane method of slaughter that was really possible. However, I sincerely doubt that it is the most humane possible method today, given all of the technologies that exist. But there is no way to know if the original rules were only to do with picking humane methods (the first time I read about this explanation was a book by Samuel Dresner, a rabbi who freely admitted that he was speculating -- though he did have an awful lot of evidence in his speculation). So the real question is whether technological changes in the shochet's techniques should be allowed?

    I am sure that if such a change were made, some Orthodox Jews would refuse to accept it. The whole system of Kashrut would change, as some would not accept the "Kosher" marks that others would (a minor issue today that would become much more significant).

    So how to decide when technology should be used to help further tradition, and when it should just butt out? The intents of both sides of these kinds of debates are mostly just trying to help. And often they are all very pious people trying to do the best thing. But how can one know when one is doing the best thing?

     

    This post brought to you by "؍" and "✡" (U+060d a.k.a. ARABIC DATE SEPARATOR, and U+2721 a.k.a. STAR OF DAVID)

  • Sorting it all Out

    Intelligent unmanaged string comparison

    • 3 Comments

    If you look at the documentation for CompareString (but not LCMapString, though it probably ought to be there, too), there is a small security note in there:

    Security Alert  Using this function incorrectly can compromise the security of your application. Strings that are not compared correctly can produce invalid input. Test strings to make sure they are valid before using them and provide error handlers. For more information, see Security Considerations: International Features.

    That link about Security Considerations: International Features leads to an interesting discussion:

    Comparison Functions

    String comparisons can potentially present security issues. Because all comparison functions are slightly different, one function might report two strings as equal, but another function might consider them distinct. There are various functions that you can use to compare strings. The following are three examples of such functions.

    • lstrcmpi
    • lstrcmp
    • CompareString

    The lstrcmpi function compares two character strings. The comparison is not case sensitive but is sensitive to the locale selected by the user in Control Panel. The lstrcmpi function does not perform byte comparisons. It compares strings according to the rules of the selected locale. The lstrcmpi function compares the strings by checking the first characters against each other, the second characters against each other, and so on until it finds an inequality or reaches the ends of the strings. The selected locale determines which string is greater (or whether the strings are the same). If no locale (language) is selected, the system performs the comparison by using default values. For some locales, such as Japanese, the lstrcmpi function might not be capable of comparing two strings. For more information, see CompareString.

    The lstrcmp function is like the lstrcmpi function. The only difference is that it performs a case sensitive comparison.

    CompareString is similar to lstrcmpi and lstrcmp except that its first parameter specifies a locale instead of using the user selected locale. Usually, CompareString, lstrcmp, and lstrcmpi evaluate strings character-by-character. However, many languages have multiple-character elements, such as the two-character pair 'CH' in Traditional Spanish. Because CompareString uses the locale passed in the locale parameter to identify multiple-character elements and lstrcmp and lstrcmpi use the thread locale, identical strings might not be found as equal. In addition, CompareString ignores undefined characters so it returns 0 (equal) for many string pairs that are quite distinct. A string might contain values that do not map to any character or it might contain characters with semantics outside the domain of the application, such as control characters within a URL. Test strings to make sure they are valid before using them and provide error handlers.

    (Ignore the typo in RED above, they are going to fix that to read "so it returns CSTR_EQUAL").
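
    The core point of that discussion, that two comparison functions can disagree about whether the same pair of strings is equal, is easy to see even outside Win32. The sketch below is only an analogy: it uses Python's `unicodedata.normalize` as a stand-in for a "linguistic" comparison and plain `==` as the binary one. The helper names are hypothetical, and none of this is how CompareString or lstrcmp are actually implemented.

```python
import unicodedata

precomposed = "caf\u00e9"   # 'é' stored as a single code point
decomposed = "cafe\u0301"   # 'e' followed by COMBINING ACUTE ACCENT


def binary_equal(a: str, b: str) -> bool:
    """Code point by code point comparison, no linguistic knowledge."""
    return a == b


def normalized_equal(a: str, b: str) -> bool:
    """Compare after canonical composition (NFC), so equivalent forms match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(binary_equal(precomposed, decomposed))      # the two strings differ
print(normalized_equal(precomposed, decomposed))  # the two strings match
```

    If one layer of an application validates with the first function and another layer looks things up with the second, the two layers will not agree on which strings are "the same", which is where the security problems start.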

    Regular readers of this blog will recognize many of the concepts that are discussed; from Comparison confusion: INVARIANT vs. ORDINAL to The jury will give this string no weight, the issues here have all been covered. But it all boils down to intelligent use of the APIs. If you are trying to match the results of the file system (or of Win32 namespace objects like the names for events, named pipes, mutexes, etc.) then you should be uppercasing the string and then doing a binary comparison. If you are not, then you have to ask yourself why you are bothering to compare at all, since your comparison will not match the one that the operating system is about to do. It seems like common sense to me.
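
    The "uppercase, then compare binary" idea can be sketched roughly as follows. The big assumption here is the casing table: the operating system uses its own uppercase mapping, while this sketch approximates a simple one-to-one mapping using Python's Unicode data and skips any character whose uppercase form is longer than one code point. The function names are hypothetical and the results will not match Windows in every case.

```python
def simple_upper(text: str) -> str:
    """Uppercase one code point at a time, never changing the string length."""
    out = []
    for ch in text:
        up = ch.upper()
        # Keep the original character when uppercasing would expand it.
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def os_style_equal(a: str, b: str) -> bool:
    """Uppercase both strings, then do a plain binary comparison."""
    return simple_upper(a) == simple_upper(b)


print(os_style_equal("ReadMe.TXT", "README.txt"))
print(os_style_equal("report.doc", "report.docx"))
```

    Note that there is no locale anywhere in this: the whole point is that the answer does not change with the user's settings.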

    But then APIs like _wcsicmp do a lowercase comparison of strings, so what do I know? :-)

    Ok, no fair to pick on an implementation that actually follows a standard; there are only two good reasons to uppercase here:

    1. The operating system does it for other purposes;
    2. The whole Georgian thing on Windows;

    And there is a good reason to lowercase if you are doing full Unicode casing (which no one in Win32 or the CRT is): the Sharp S moves to two characters if you uppercase it, increasing the length of the string.

    So the CRT can hardly be blamed for not going down that road, when no one was really thinking too much about it then anyway, can they?
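
    For anyone who has not run into the Sharp S issue, here it is in a few lines. Python's `str.upper` does full Unicode casing, so it makes a convenient demonstration. It is used here purely as an illustration and says nothing about what Win32 or the CRT do.

```python
word = "stra\u00dfe"      # "straße", six characters

upper = word.upper()      # full casing expands the Sharp S to "SS"
lower = word.lower()      # lowercasing leaves the Sharp S alone

print(len(word), len(upper), len(lower))
print(upper.lower() == word)   # the round trip does not get the original back
```

    A buffer sized for the original string is too small for the uppercased one, which is a good reason to prefer lowercasing if full casing is ever on the table.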

    Now this whole security warning applies equally to LCMapString and sort keys, since they are designed to work the same way as string comparisons; any time they do not, we consider it a bug. Now if the bug is in LCMapString then we can't really change the result without changing the version number so we'd be more likely to break CompareString in the same way. Though in practice for as long as I have been here it is always CompareString that is broken, not LCMapString. Something to do with how much easier it is to make a mistake when you try to do less work, maybe? :-)

    I think what we need is a good way to match the operating system behavior that we can point to. People never read warnings that go on for paragraphs about best practices like this blog, but they do pay attention to "Use function YYYY rather than function XXXX for this particular scenario," if you say it in enough places.

    Of course we'd have to figure out what to call it and all that kind of stuff.

    Let me go think on this for a bit....

     

    This post brought to you by "ᅣ" (U+1163, a.k.a. HANGUL JUNGSEONG YA)

  • Sorting it all Out

    Speaking gig tonight (April 25) in Cleveland, OH, USA

    • 0 Comments

    Just a quick reminder to folks in the area....

    The presentation is at the Microsoft C#/VB.NET SIG on Monday night (TONIGHT!), April 25th at 6:30pm.

    Looking forward to seeing people there!

  • Sorting it all Out

    Limiting the languages of input

    • 0 Comments

    Windows 2000, XP, and Server 2003 have powerful support for the input of text. On a per thread level, you can designate the support of any language you want, whatsoever.

    For some applications, this is not so good. One may want to block the ability to change the language.

    The first line of defense in such cases is to cancel any attempt to change the input language to any language you do not like, either

    This method is not so useful if the language needs to be limited for some fields but not for others. Say, for example, that you want to block the IME in a password field, since allowing an IME would let people view the password that you are masking. Or say you want to make sure the language is not the wrong one to begin with. In those cases, you have to be a bit more clever. When you enter a control that needs the limitation, you can check the current input language, either

    These methods allow you to sniff the input language and/or the actual keyboard being used, and if you do not like the current choice you can change it, either

    to change the input language to one that is more acceptable to the application.

    Make sure there is some indication to the user that the change has been made; it is very unintuitive to make the change without telling the user, since they may type thinking the input language has not changed, and the results will not meet their expectations.

    It is probably a really good idea to cache the GetKeyboardLayout or InputLanguage.CurrentInputLanguage results so that you can switch back to the original input language when they leave the control that needs the restriction (a temporary change is sometimes unintuitive, but at least explainable; not changing it back is not only unintuitive but also a little bit obnoxious for an application to do!).
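
    The cache, switch, notify, and restore pattern described above can be sketched without touching any real API. Everything in this example is hypothetical: `get_layout` and `set_layout` are stand-ins for whatever the platform provides (GetKeyboardLayout or InputLanguage.CurrentInputLanguage for reading, and the corresponding call for changing), and the language tags are just strings.

```python
from contextlib import contextmanager


@contextmanager
def restricted_input_language(get_layout, set_layout, allowed, fallback, notify=print):
    """Limit the input language while a control has focus, then put it back."""
    original = get_layout()              # cache the user's own choice
    if original not in allowed:
        set_layout(fallback)
        # Tell the user; a silent switch is the unintuitive part.
        notify(f"Input language changed from {original} to {fallback} for this field")
    try:
        yield original
    finally:
        if get_layout() != original:     # restore on leaving the control
            set_layout(original)


# Usage with a fake "system" so the sketch runs anywhere.
state = {"layout": "ja-JP"}
messages = []
with restricted_input_language(
    lambda: state["layout"],
    lambda value: state.update(layout=value),
    allowed={"en-US"},
    fallback="en-US",
    notify=messages.append,
):
    inside = state["layout"]             # the restricted field sees the fallback
after = state["layout"]                  # the original choice is back afterwards
```

    In a real application the body of the `with` block would correspond to the time the restricted control has focus, so the enter and exit halves would more likely live in the focus and blur handlers.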

     

    This post brought to you by "฿" (U+0e3f, a.k.a. THAI CURRENCY SYMBOL BAHT)

  • Sorting it all Out

    Immoral? Illegal? Who can say? Well, I will!

    • 10 Comments

    I have gotten several interesting emails over the last few days relating to the Microsoft stance on the anti-discrimination bill.

    Many people have seen the letter Steve Ballmer sent out (either because they are employees, or because they read Scoble's posting of it, or through their own nefarious sources), and the various comments that people have put into their blogs about the issue. I don't really have a "me too" post to add, but I will give an extra thought or two about it that perhaps will cause someone to see it all in a different light, or maybe a slightly different shade of the same light....

    I read Vic Gundotra's thoughts about it, and found myself unconvinced about the attempt to move the issue from being a human rights issue to a moral issue.

    I myself am not gay. That is a personal choice of mine. I have never known anyone to not respect that. For others it may be a moral choice, but I simply do not see it that way myself. I will respect the right of someone to believe it is immoral if that is their belief, it is certainly no worse and no better of a reason to make such a choice than just not being interested in giving it a try or being afraid to do so or whatever. It is a choice. And I respect the right of anyone to make that choice.

    I am vigorously opposed to the notion that such a choice should ever either help or harm my career in any way whatsoever. And certainly if I were gay I would feel the same way, perhaps even more vigorously since someone would essentially be discriminating against me. Either way, the notion that the way I live my life when I am doing nothing illegal should ever impact my career due to discrimination sickens me. If I am the CEO of a company with a strong policy against such discrimination, then how can I say that I personally feel that way and my company policies are shaped that way, but such a policy as law would not be appropriate for me to support?

    I think about my own situation, as a different kind of protected class, being handicapped.

    I love that I work for a company that either meets or exceeds all of the legal requirements related to my handicap, and I also love the support I am being given by a management team that wants me to be able to be a happy and productive member of the team. But I live in the real world where not all companies or management teams would give that level of support. And knowing that they are required to make at least some effort makes me feel safer as a person who is handicapped.

    If Microsoft were to do the same thing for a bill related to my situation, I would probably be shouting my displeasure from the rooftops.

    The moral issue is irrelevant to the issue at hand, because the bill does not legislate morality. It basically requires that you cannot discriminate against someone, even if those are your moral beliefs. If you do it at Microsoft, then you may well be fired. All people who are put in such a situation deserve such protections, even if they do not work for Microsoft.

    Taking a step back for a moment, I will admit that I actually do have certain prejudices.

    In the other direction.

    I tend to assume that someone of a different race, gender, creed, or sexual orientation may actually have a better chance at being good at their job than someone who is not. This is because of the wry fact that they are fighting a harder daily battle and it is much easier for them to either give up or to be drummed out by someone with prejudices against them. The fact that they are still there has some small positive effect (that they have managed to avoid being forced out by those who tend to discriminate).

    This prejudice of mine is deep seated and has probably been around since a good friend fought and won the battle to be a neurosurgeon against a department chief who was inclined to feel that she had no place there, on the basis of her sex. I knew she was an excellent neurosurgeon, and when she asked if I would be comfortable if she scrubbed in when I was having surgery myself I told her I would be honored. And I was. The fact is that had she not been, that department head would have had her drummed out of the program. And I cannot say that all neurosurgeons are held to such high standards. Unfortunately.

    Here in software, it's not quite the same life or death kind of situation, obviously. And I have worked to make sure I would never allow my "prejudice" in favor of a candidate who has overcome a system that generally seems disinclined to help deserving people to get a fair shake to change a decision or cause me to prefer one candidate for a job or a project over another. Because even a "good" prejudice is wrong and it is crucial that I make decisions based on the facts and not any of my preconceived notions. In the end I must have real reasons to support my choices, not just to defend myself from getting in trouble but so I can live with myself.

    I think the decision to not support the bill as a company that clearly does support the tenets that cause the bill to exist is wrong. It is a decision that shows that as a company we may have certain convictions, but that we lack the courage of some of those convictions.

  • Sorting it all Out

    Free at last!

    • 1 Comments

    Ok, with some shovelling by me, some shovelling by my father, and a bit of time with the snowblower from the neighbor, we have made it.

    We are now free!

    I am exhausted. I swear I used to shovel this driveway sometimes. But then I used to carry two golf bags of lousy golfers for 36-45 holes a day. Things change.

    We are now free.... to drive to another house in Beachwood, later today.

    I feel almost like I do when I avoid the bluescreen just so I can see the blue screen come up. :-)

    Thanks to all who sent words of support by email, it is much appreciated....

  • Sorting it all Out

    Trapped in Beachwood, Ohio?, Redux

    • 0 Comments

    Ok, I may have spoken too soon thinking I was not trapped in Beachwood, Ohio. The snow is a foot deep, all the way down the driveway. We may have a real problem if we cannot get someone with a plow to come by.

    I guess the cat is trapped on the roof, at this point. We'll see if we can get him down sometime soon.

    Maybe I'll grab a shovel and see if I can do some damage (to something other than me!).

    I'll keep people posted....
