
August, 2007

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Elegant? Beyond compare...

    • 9 Comments

    Sometimes I wonder if the posts I write are not clear.

    The good thing about the blog is that it is a lot like me talking to people.

    Of course the bad thing about the blog is that it is a lot like me talking to people....

    I was thinking about this when I read Jan Kucers contribution to the Suggestion Box:

    Hello!

    I'm reading your blog for couple of months and I've learned a lot of things.

    We've seen a couple of examples what we really should not do and some hints what is better.

    I'd like to know what is the most right way to compare strings while ignoring the case. (I work with managed classes but others could welcome unmanaged way as well.)

    From some of your posts it is clear that lower-casing is better than upper-casing, since there are lower case characters without upper case equivalents.

    Also StringComparison.OrdinalIgnoreCase seems to be not the best win.

    So strA.ToLower() == strB.ToLower() ?

    or strA.ToLowerInvariant() == strB.ToLowerInvariant() ?

    or string.Compare(strA, strB, true) ?

    or string.Compare(strA, strB, StringComparison.InvariantCultureIgnoreCase)?

    Does using CultureInfo.CurrentCulture for string operations mean that the code will behave differently over the same data when running under different culture? If so, wouldn't it be better to choose any particular culture?

    Well...is trustworthy case unaware comparison possible at all? :-)

    Thanks for any hints on this topics. Or have you already answered this in past?

    Jan

    There is a lot in there that does not represent best practices, unfortunately.

    There is a post in which I suggested a few guiding principles, entitled Browsing the shoals of managed string comparisons. In particular there is the bit at the bottom:

    • If you are trying to compare symbolic identifiers or operating system objects like filenames or named pipes, use an Ordinal type method (or an OrdinalIgnoreCase type method in Whidbey).
    • If you need appropriate linguistic results that work with the myriad of methods that Unicode supports for entering identically appearing strings, then use either a Culture-based comparison or an Invariant culture-based comparison if you need unchanging results.
    • Be aware of how the comparisons are being done in the technologies you are using, and do your best to match their behavior.

    That third rule is the most important one....

    For a slightly more complex breakout of items, you can see the post I mention in Something .NET does less intuitively than they ought that Josh Free wrote. Though for every person who has told me they found the table helpful, I have talked to at least one other person who found it made things more confusing -- the same way having all those different methods does.

    But using the above three principles one should be able to resolve just about any question about appropriate string comparisons (whether case sensitive or insensitive)....
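
    To make the first two principles concrete, here is a minimal sketch in Python rather than .NET. The helper names (ordinal_equals, ordinal_ignore_case_equals, linguistic_equals) are made up for illustration, and str.casefold and NFC normalization are only rough stand-ins for an OrdinalIgnoreCase comparison and a culture-aware comparison; they are not claimed to reproduce the .NET results.

```python
import unicodedata


def ordinal_equals(a: str, b: str) -> bool:
    """Code point by code point comparison, for identifiers and file names."""
    return a == b


def ordinal_ignore_case_equals(a: str, b: str) -> bool:
    """Rough stand-in for a case-insensitive ordinal comparison."""
    return a.casefold() == b.casefold()


def linguistic_equals(a: str, b: str) -> bool:
    """Treat canonically equivalent spellings as the same text."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "caf\u00e9"     # e with acute as a single code point
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT

print(ordinal_equals(composed, decomposed))                    # False
print(linguistic_equals(composed, decomposed))                 # True
print(ordinal_ignore_case_equals("README.TXT", "readme.txt"))  # True
```

    The two spellings of the same word look identical on screen, which is exactly the case where an ordinal comparison and a linguistic one disagree.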

     

    This post brought to you by Ω and Ω (U+2126 and U+03a9, a.k.a. OHM SIGN and GREEK CAPITAL LETTER OMEGA)

  • Sorting it all Out

    The main criteria in determining whether a code page sucks? Suckage, of course!

    • 3 Comments

    Rasqual asks:

    Hello Michael,

    I'll keep the question short:

    What makes a 'good' encoding, and what makes it broken?

    When I think back to the various code pages that I have considered to be broken in one way or another:

    And I really can't discern any particular pattern in them -- some just have weird implementation issues, some are done in less than ideal ways, and some are simply outright broken.

    So it seems like any code page that has troubles is one that I would call broken. I have pretty high standards here, like I do in other areas. :-)

     

    This post brought to you by ။ (U+104b, a.k.a. MYANMAR SIGN SECTION)

  • Sorting it all Out

    When it is harder to fit your calendar into things than things into your calendar

    • 0 Comments

    Support Engineer Scott Heim had a question he asked yesterday:

    Hi all,

    I have the MonthCalendar control on a form and when this is displayed in XP, the calendar displays correctly; however, on Vista machines the calendar appears larger and the “Saturday” dates are cut off. Has anyone seen this before? Is there a way around this?

    The form the control is on is not much larger than the control itself. I have a small repro here:

    [I compiled the repro and ran it on both Server 2003 and Vista to take the screen shots below -- michkap]

         

    Thanks,

    Scott

    And support engineer Dave Anderson came to the rescue with the following response:

    Yes, the MonthCalendar control is larger when using the V6 common controls on Windows Vista. You can adjust the size of the form based on the size of the control at runtime. I added the following code for the form’s Load event handler:

            private void Form1_Load(object sender, EventArgs e)
            {
                this.ClientSize = monthCalendar1.PreferredSize;
            }

    -Dave

    And indeed, when you add this code things fit once again:

    Perfect. :-)

    Now obviously this is a special case (a form that is meant to be the same size as the calendar) but the general principle can be applied in situations where controls are packed too tightly and changing the size might affect localized forms by causing controls to overlap (definitely something to avoid).

    One thing developers should be very careful about any time they are building dynamic UI metrics this way in projects that are going to be localized is to make sure that the fact that the UI metrics change at runtime is communicated to the localizer -- there are few things more frustrating than truncation bugs that a localizer can't do anything about but that they have to go through multiple iterations to discover that fact!

    And now that I have hijacked the question to get up on my localizability soapbox, I'll close with a message of more general use. :-)

    The message? The fact that the Shell common controls do not guarantee backward compatibility with their metrics is an important issue to keep in mind -- or you could find yourself getting truncated, too....

     

    This post brought to you by ⺦ (U+2ea6, a.k.a. CJK RADICAL SIMPLIFIED HALF TREE TRUNK)

  • Sorting it all Out

    Are you Mr. Kaplan?

    • 4 Comments

    Those were the exact words of the person on the phone, those words in the title.

    I should back up a minute.

    I have a land line phone with Verizon that I only use for emergencies like that recent power outage and also to torture telemarketers (I am not on the do-not-call lists; I only give the phone number to the sort of people who might sell things so I can treat every call like entertainment if I am going to even bother answering it).

    The phone has every service stripped down and it is listed (another source of telemarketers, the phone book!).

    Anyway, where was I?

    Oh yeah. This guy on the phone asked "Are you Mr. Kaplan?"

    I figure there is no harm identifying myself since the number is listed. "Yes," I respond.

    "Vice president of Microsoft Customer Service?" he asked.

    Hmmmm.

    "No, that's not me," I reply.

    He ends the call quickly. "I'm sorry, I must have a wrong number."

    I guess he was looking for Richard Kaplan. Perhaps just going through the phone book calling all the Kaplan entries in the Seattle metropolitan area.

    I must admit that it is an interesting way to call product support!

    I might have commented had he not hung up so quickly.

    And you know I always tell people that I am not Microsoft product support, but Richard can't really get away with that, I suppose.

    I'd love to know how this all turns out but Richard and I are not golfing buddies and I am pretty sure we aren't related, so I guess I'll never know....

    As an FYI to people, this is likely not the most effective way to get in touch with customer support (and also I never answer technical questions on that phone so it is not the best way to get a hold of me, either!).

     

    This post brought to you jointly by ℡, ⌕, ☎, ☏, and ✆ (U+2121, U+2315, U+260e, U+260f, and U+2706, a.k.a. TELEPHONE SIGN, TELEPHONE RECORDER, BLACK TELEPHONE, WHITE TELEPHONE, and TELEPHONE LOCATION SIGN)

  • Sorting it all Out

    If the data is invalid, the results can be invalid too

    • 0 Comments

    They say that a good lawyer never asks a question in court without already knowing the answer.

    Well, I'd probably make a lousy lawyer.

    Because when I was doing the research for What's up with MB_ERR_INVALID_CHARS?, I did not know the full extent of the overall limitations in the flag.

    But given what I discovered, I made some recommendations.

    Though I find myself really agreeing with Yossi and the comment Yossi left:

    This inconsistency is pretty bad (the difference between how the actual Code page and best fit tables treat invalid characters). It renders MultiByteToWideChar pretty much useless in certain cases where these invalid characters are finding themselves into the output stream.

    I'm using MSXML2 to read an XML file which was produced after converting an MBCS character stream to Unicode. Since the following characters:

    0x81 0x0081
    0x8d 0x008d
    0x8f 0x008f
    0x90 0x0090
    0x9d 0x009d

    in the 1252 best fit appears to be "OK", the MSXML2 just fails to parse the file.

    Is there a way to resolve this problem (other than scanning the stream in a for-loop and replacing these invalid characters)?

    Is there a version of MSXML2 that is consistent with the behavior of MultiByteToWideChar?

    I do find myself curious about what method msxml2 is using here for its conversions that is managing to fail on these characters that are technically mapped in the code page 1252 that the system defines. How is this component doing its conversions, exactly?

    But on the other hand, I am left with the knowledge that this never-before-defined behavior is hardly referring to bytes that are useful in a stream of text.

    So if you are seeing them, it is entirely reasonable to consider the text to be corrupt. What is that expression? Garbage in, garbage out.

    And then of course relying on code pages in this day and age is not the best plan even when you stay within the valid mapped characters that make up the long-documented portions of the code pages.

    The best thing to do is just stay away from them, especially if you think you might have invalid data like Yossi was seeing (or maybe investigating whatever is converting text to these unexpected code points!).
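
    The five bytes Yossi lists can at least be caught before any conversion happens. A minimal sketch in Python (the helper name find_unassigned_cp1252_bytes is hypothetical, and this is a pre-check, not a description of what MSXML2 or MultiByteToWideChar do internally). Python's own cp1252 codec, unlike the best fit behavior described above, leaves these bytes unmapped and rejects them when decoding strictly.

```python
from typing import List

# Bytes with no character assigned in code page 1252 (the five from the comment above).
UNASSIGNED_CP1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def find_unassigned_cp1252_bytes(data: bytes) -> List[int]:
    """Return the offsets of bytes that code page 1252 leaves unassigned."""
    return [i for i, b in enumerate(data) if b in UNASSIGNED_CP1252]


sample = b"caf\xe9 \x81 ok"
print(find_unassigned_cp1252_bytes(sample))  # [5]

try:
    sample.decode("cp1252")  # strict decoding refuses the 0x81 byte
except UnicodeDecodeError as err:
    print("rejected byte at offset", err.start)  # rejected byte at offset 5
```

    Whether to reject the input or repair it once such a byte turns up is a separate decision; the sketch only reports where the suspect bytes are.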

     

    This post brought to you by ǻ (U+01fb, a.k.a. LATIN SMALL LETTER A WITH RING ABOVE AND ACUTE)

  • Sorting it all Out

    Sometimes a request is made of me and the question is asked in such a way that I just don't know exactly how to respond

    • 2 Comments

    You know the drill, nothing technical.... 

    Sometimes a request is made of me and the question is asked in such a way that I just don't know exactly how to respond....

    Like when I got home today and my neighbor (who was outside on the porch, holding a cigarette like it was something semi-precious that she did not know quite what to do with) kind of held up the cigarette and asked if I had a light.

    I thought back to the last time I had a cigarette (September 24, 1994) and that I honestly didn't remember the last time I had a lighter. I then corrected myself, I had those matches in my apartment still from the outage I mentioned before, though they were far enough away that I couldn't see scooting to my door, going inside, finding the matches, scooting back out to my neighbor, and giving them to her.

    So I chickened out. I just kind of shrugged and told her "Sorry, I don't smoke any more" with what I hope was the right tone to indicate that I was, in fact, sorry.

    Hopefully she is not a regular reader of this blog; if she sees this post then the jig is up, as it were. Sorry about that, I really just didn't see any reasonable logistics to make it happen....

    And then there was earlier in the day, when one of the program managers I talk to from time to time (name omitted for what should become obvious reasons, presently) had an interesting request that I did not know quite what to do with.

    The exact words: "...in the future, you should space out your interesting posts (by interesting, read relevant to me). If I'm having a hard time actually caring about the post I should be caring about because there's another more interesting post, it's a valid point!"

    I had literally no idea what to do with this one, and was essentially rendered speechless.

    Truly anyone who found every post to be useful would scare the living hell out of me (since I find each post somewhat interesting, how I feel about myself should be obvious here!). But since I wouldn't expect anyone to find every post interesting, the notion of spacing out the interesting posts is a fascinating one to contemplate though very hard to actually deliver.

    My response was something very respectful along the lines of "I'll do my best" though the eye roll that went along with it probably negates the sincerity of the words.

    As does this post? :-)

    Which of course leads to a meta question -- would a program manager who wanted interesting posts to be metered find a post about that request to be interesting?

    Infuriating, sure. But interesting?

    Sometimes a request is made of me and the question is asked in such a way that I just don't know exactly how to respond....

     

    This post brought to you by º (U+00ba, a.k.a. MASCULINE ORDINAL INDICATOR)

  • Sorting it all Out

    It looks good until you look at it more closely somewhere else

    • 3 Comments

    No, this post is not to do with the phenomenon sometimes referred to as 'beer goggles' in any way, shape, or form! 

    (by the way, if you search for that term on Google, would that make it become 'beer googles'?)

    The other day Scott asked:

    Hi,

    Great blog!

    I'm working on a really specialised text editor that is used for text from all around the world. To do this we are using Uniscribe to convert text to glyphs etc etc. Pretty normal stuff. We do weird stuff with the glyphs in a printer driver later on!

    However today I'm looking at Bengali, in particular Bengali (Bangladesh), and I found a weirdness in IE that you might be interested in.

    I have been cut-and-pasting text from webpages into my editor to validate that I'm working OK. I have found an issue that is in my editor and in notepad!

    If you look at the webpage:

    http://www.prothom-alo.com/index.news.details.php?nid=OTkzMw==

    If I cut and paste the text into notepad it loses its character order and becomes junk, but what's more, if I save the web page locally and reopen it in IE it turns to junk!

    I can fiddle with the character order manually to sort things out again, but that's not the point really!

    Anyway,

    Keep up the good blog work!

    Interesting, it does indeed contain text that looks good:

    until you try to put it somewhere else (at which point you get lots of dotted circles and such). Very odd!

    I went down the hall to talk to Simon Daniels.

    Like many people such as Raymond Chen and even myself sometimes, Simon is cursed with the burden of knowing stuff. And the problem with knowing stuff is that people will just randomly want to ask you stuff....

    Anyway, he immediately realized what was probably going on. He viewed the source, got the link to the CSS file that was being used, and looked at it:

    /* Embeded Font */
    <!-- /* $WEFT -- Created on 7/16/2007 -- */
      @font-face {
        font-family: Bangsee Alpona;
        font-style:  normal;
        font-weight: normal;
        src: url(http://www.prothom-alo.com/fonts/BANGSEE0.eot);
      }
    <!-- /* $WEFT -- Created on 7/17/2007 -- */
      @font-face {
        font-family: Prothoma;
        font-style:  normal;
        font-weight: 700;
        src: url(http://www.prothom-alo.com/fonts/PROTHOM0.eot);
      }
    -->

    And of course .EOT files created by WEFT (Web Embedding Fonts Tool) actually have the site that the .EOT was generated for embedded in them, so changing the link to remove the "www" so that the link didn't work showed very different results:

    (if you look very carefully you will see lots of dotted circles spread throughout)

    In the end, proper font creation following the rules that have been established in OpenType (e.g. this one for Bengali) is crucial. If the fonts you use don't follow those rules then you have to encode the text to match the expectation of the fonts, and then you have strange behavior any time the font in question is not available to you.

    Now in fairness to the Bangsee Alpona font, it may be a perfectly valid one at this point. Perhaps the version that was used to generate the .EOT files was from before the various changes within Unicode and then later to Microsoft to support the language properly -- and perhaps the editor for the content has some of the same problems -- so new content is created using this slightly different use of Unicode that is not the standard (thus creating text that will not always look right if you try to copy and paste it somewhere else that may not have the font):

    ১২ োসেੳটਹর োথেক রাজৈনিতক দেলর সেਔ অােলাচনা ੂরઔ

    One of the reasons for the effort to provide a standard solution within Unicode is to keep under control the multiple contradictory methods of getting the rendering done, which is clearly what happens here....

     

    This post brought to you by অ (U+0985, a.k.a. BENGALI LETTER A)

  • Sorting it all Out

    Every character has a story #29: U+1000^H^H^H^H0f40, (TIBETAN or MYANMAR LETTER KA, depending on when you ask)

    • 4 Comments

    So I was chatting with Goldie the other day and I think just after or maybe it was just before I made some ridiculous stretch of a joke about Anatevka (forgetting momentarily that she did not go by Golde; her nom de plume was Goldie) she asked me if there was a test case I knew off the top of my head where collation results changed between XP and Server 2003.

    Interestingly, this is a question I have been waiting years for someone to ask, ever since I first pieced together the change that happened! :-)

    You see, prior to Server 2003, there was no version support. You know, those functions I mentioned in posts like this one (IsNLSDefinedString and GetNLSVersion).

    As a part of the Server 2003 update, a bunch of code points got removed from the table. I'll list a bunch of them and you tell me if you see a pattern:

    0x1000  32    2   2  2  ;Tibetan Ka
    0x1001  32    3   2  2  ;Tibetan Kha
    0x1002  32    4   2  2  ;Tibetan Ga
    0x1003  32    5   2  2  ;Tibetan Nga
    0x1004  32    6   2  2  ;Tibetan Ca
    0x1005  32    7   2  2  ;Tibetan Cha
    0x1006  32    8   2  2  ;Tibetan Ja
    0x1007  32    9   2  2  ;Tibetan Nya
    0x1008  32   10   2  2  ;Tibetan Reversed Ta
    0x1009  32   11   2  2  ;Tibetan Reversed Tha
    0x100a  32   12   2  2  ;Tibetan Reversed Da
    0x100b  32   13   2  2  ;Tibetan Reversed Na
    0x100c  32   14   2  2  ;Tibetan Ta
    0x100d  32   15   2  2  ;Tibetan Tha
    0x100e  32   16   2  2  ;Tibetan Da
    0x100f  32   17   2  2  ;Tibetan Na
    0x1010  32   18   2  2  ;Tibetan Pa
    0x1011  32   19   2  2  ;Tibetan Pha
    0x1012  32   20   2  2  ;Tibetan Ba
    0x1013  32   21   2  2  ;Tibetan Ma
    0x1014  32   22   2  2  ;Tibetan Tsa
    0x1015  32   23   2  2  ;Tibetan Tsha
    0x1016  32   24   2  2  ;Tibetan Dza
    0x1017  32   25   2  2  ;Tibetan Wa
    0x1018  32   26   2  2  ;Tibetan Zha
    0x1019  32   27   2  2  ;Tibetan Za
    0x101a  32   28   2  2  ;Tibetan Aa
    0x101b  32   29   2  2  ;Tibetan Ya
    0x101c  32   30   2  2  ;Tibetan Ra
    0x101d  32   31   2  2  ;Tibetan La
    0x101e  32   32   2  2  ;Tibetan Sha
    0x101f  32   33   2  2  ;Tibetan Reversed Sha
    0x1020  32   34   2  2  ;Tibetan Sa
    0x1021  32   35   2  2  ;Tibetan Ha
    0x1022  32   36   2  2  ;Tibetan A
    0x1026   1    0   3  0  ;Tibetan Vowel Sign I
    0x1027   1    0   4  0  ;Tibetan Vowel Sign Short I
    0x1028   1    0   5  0  ;Tibetan Vowel Sign U
    0x1029   1    0   6  0  ;Tibetan Vowel Sign E
    0x102a   1    0   7  0  ;Tibetan Vowel Sign O
    0x102b  32   37   2  2  ;Tibetan Chuchenyige
    0x102c  32   38   2  2  ;Tibetan Visarga
    0x102e   1    0   8  0  ;Tibetan Anusvara
    0x102f  32   39   2  2  ;Tibetan Right Brace
    0x1030   1    0   9  0  ;Tibetan Under Ring
    0x1031  32   40   2  2  ;Tibetan Ditto
    0x1033  32   41   2  2  ;Tibetan Single Ornament
    0x1034  32   42   2  2  ;Tibetan Shad
    0x1035  32   43   2  2  ;Tibetan Tseg
    0x1036   1    0  10  0  ;Tibetan Candrabindu
    0x1037   1    0  11  0  ;Tibetan Candrabindu With Ornament
    0x1038  32   44   2  2  ;Tibetan Comma
    0x1039  32   45   2  2  ;Tibetan Rinchanphungshad
    0x103a  32   46   2  2  ;Tibetan Rgyanshad
    0x103b   1    0  12  0  ;Tibetan Honorific Under Ring
    0x103c  32   47   2  2  ;Tibetan Left Brace
    0x103d   1    0  13  2  ;Tibetan Vowel Sign Ai
    0x103e   1    0  14  2  ;Tibetan Vowel Sign Au
    0x1040  12   16  70  2  ;Tibetan Digit Zero
    0x1041  12   47  70  2  ;Tibetan Digit One
    0x1042  12   66  70  2  ;Tibetan Digit Two
    0x1043  12   84  70  2  ;Tibetan Digit Three
    0x1044  12  102  70  2  ;Tibetan Digit Four
    0x1045  12  121  70  2  ;Tibetan Digit Five
    0x1046  12  140  70  2  ;Tibetan Digit Six
    0x1047  12  158  70  2  ;Tibetan Digit Seven
    0x1048  12  176  70  2  ;Tibetan Digit Eight
    0x1049  12  194  70  2  ;Tibetan Digit Nine
    0x104a  32   48   2  2  ;Tibetan Double Shad
    0x104b   1    0  15  0  ;Tibetan Virama
    0x104c   1    0  16  0  ;Tibetan Lenition Mark

    The problem here? The data is all wrong!

    This version of Tibetan, first described in Unicode Technical Report #2, was removed in Unicode 1.1 when the ISO 10646 merger happened, and then Tibetan was added back in Unicode 2.0 in an entirely different place.

    If you look at DerivedAge.txt, you will see that the new Tibetan was added in July 1996.

    But Windows had been carrying data around from Unicode 1.0 since the very beginning of its 32-bit life, possibly as far back as NT 3.5 or even NT 3.1 (I am almost curious enough to go try and find out which, actually!).

    In Server 2003, it was decided that this incredibly invalid data had to be removed.

    For one thing, it is just really bad to start a formal versioning functionality with crap like that in there.

    And for another, this space that was left empty after the 1.1 merge was actually filled as of Unicode 3.0 in 1999 -- with the Myanmar script. And even though Windows did not add weights for it yet (we did not do so until Vista), keeping known bad data seemed like a pretty bad idea...

    So, all of the above code points had weight in Windows from the early 32-bit days until XP, and then again in Vista (and were essentially weightless in the years between).

    And of course the snapshots in Jet 4.0, ACE (the version of Jet that ships with Access >= 2007), SQL Server 7.0, 2000, and 2005 all have these somewhat bogus code points as well....

    Oops for them (plus we can be snotty and superior about it now that it is fixed in Windows!)

    When you talk to old timers about the 1.1 merge between Unicode and ISO 10646, you have trouble getting a straight answer -- it is like that bit from The Number of the Beast:

    I've given up trying to find out what happened in 1965: "The Year They Hanged the Lawyers." When I asked a librarian for a book on that year and decade, he wanted to know why I needed access to records in locked vaults. I left without giving my name. There is free speech -- but some subjects are not discussed....

    So that is all I can say about the old U+1000 TIBETAN LETTER KA which died in Unicode in the early 1990s only to rise from its ashes in 1996 at U+0f40 with U+1000 being assigned to MYANMAR LETTER KA in 1999. The same character lived on at Microsoft until 2003, only to be reborn along with its Myanmar cousin in Vista....
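
    The current state of affairs is easy to check. A minimal sketch using Python's unicodedata module (Python rather than anything Windows-specific; it reports the names in whatever Unicode version the interpreter ships with, and says nothing about the weights in any Windows collation table):

```python
import unicodedata

# Where the two letters live today: Tibetan at U+0F40, Myanmar at U+1000.
for code_point in (0x0F40, 0x1000):
    print(f"U+{code_point:04X}", unicodedata.name(chr(code_point)))
# U+0F40 TIBETAN LETTER KA
# U+1000 MYANMAR LETTER KA
```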

     

    This post brought to you by ཀ and က (U+0f40 and U+1000, a.k.a. TIBETAN LETTER KA and MYANMAR LETTER KA)

  • Sorting it all Out

    Small case is not just tinier capitals; italics are not merely slanted letters

    • 5 Comments

    The other day Lynn asked:

    Michael,

    I'm not sure which group you are working in, but I am hoping you can forward this message to someone who might be able to look at this.

    We got a message from a customer about the EU Expansion Font Update v.1.02 for XP. He loaded it on his machine and created the attached table to test out the additional characters. In the bottom table, the Italic and Bold columns are printing a U with grave accent instead of a Bulgarian ѝ (Cyrillic Capital letter I with grave, U+045D) for Arial, Times and Trebuchet. Verdana is correct. If I select the symbol for the Cyrillic I with grave out of the symbol table for any of these three fonts, the little pop-up window shows a U with grave accent and if I insert it into the document, a U with grave accent is displayed. This happens for both the upper and lower case characters (U+040D, U+045D).

    I am hoping there might be a fix for this, or that one is forthcoming.

    Thanks for any help.

    I forwarded it on to Judy and Simon since they know lots more about fonts than I.

    Judy verified that this was an expected difference, kind of the Bulgarian version of the Serbian difference in italic forms I talked about in this post.

    Here is an example where you can see this other form:

    Of course Tahoma's italic support is calculated by a bit of GDI slanting, so its results are just an algorithmic thing -- the form that looks more "u-ish" is the appropriate one for italic lowercase U+045d.

    And then Simon pointed out some Latin-based examples of the issue, which are much easier for people who do not know the Cyrillic script to fathom (I took it as a screenshot in case you don't have all the fonts in question):

    Bravo for the small differences -- kind of argues for a Tahoma Italic to get done here, doesn't it? :-)

     

    This post brought to you by ѝ (U+045d, a.k.a. CYRILLIC SMALL LETTER I WITH GRAVE)

  • Sorting it all Out

    There's a hole in the soft top, dear Liza, dear Liza (aka be sure to read the fine print!)

    • 5 Comments

    (title inspired by that old children's song, I remember the Sesame Street version best)

    It was not that long ago that my car (a black 1995 Saab 900 convertible) was parked at the airport in long term parking, and I was somewhere else. San Jose, I think.

    While I was there, someone decided to break into the car.

    Of course there was an alarm, so they had to be careful.

    I was gone for most of a week so they had time to be industrious.

    Their first attempt was to remove the lock on the passenger side. This does not let the door open (and even if the door did open the alarm would go off, so it would not have helped).

    Undeterred, their second attempt was to cut a hole (well, actually two holes) in the soft top.

    Wait, I have some art for this:

    You can see the two rips in the soft top. If you really work at it you could maybe pull something out but the only thing there was to pull out was the GPS unit on the floor mostly under the seat (hard to see, even harder to reach), so it would have been difficult to get it out, I guess. Strike two. A particularly annoying failure, for reasons that will become clear shortly.

    These thieves were quite determined, though. In the end they broke the driver's side window. And they got the GPS unit.

    It was surprising to me that the alarm did not go off. Interestingly it did go off when I unlocked the door by reaching in and lifting it, but the cop mentioned to me that it is possible to shatter a window with vibration but without motion, so it would just be my bad luck. I think I should probably look into a better alarm system. :-(

    In any case, insurance coverage was apparently not as full as it might have been....

    So far, the window and the door lock are covered but the soft top is not due to some kind of exclusion. Though since I had to pay a higher rate for loss/damage/theft for a convertible, it seems like something of an unfair exclusion (though how often can one out-argue an insurance company?).

    I will withhold the name of the insurance company for now while I wait for the appeal to finish up, more on this later....

    If they won't cover it, then I have to worry about the price of the repair -- the cost of the soft top (about $3000, quoted by the dealer) and the labor to replace it (17 hours!!!) is fairly steep. So I may hold off on the full repair until I decide to sell the car. Or maybe find some auto repair school that wants to take in the repair as a project.

    (ironically if someone steals it, then the whole car is covered, including the soft top. I can always just hope somebody steals it some day..... sigh).

    For some strange reason, the roof does not leak -- but that may not be the case forever as the top goes up and down.

    In the end I guess I'll just have to get it replaced whether the insurance company pays or not (one way or another)....

     

    This post brought to you by ⼧ (U+2f27, a.k.a. KANGXI RADICAL ROOF)

  • Sorting it all Out

    157766400 seconds in....

    • 2 Comments

    A colleague and friend of mine, a former v- at Microsoft who went full time, was talking to me just after I had gone full time.

    He predicted that I wouldn't last two years.

    (He himself had already left within a year of when I started)

    I asked him why he thought I would only last two years, and he told me that I'd realize that the people who didn't want me there would make their feelings clear enough for me to realize there was no future for me.

    Intrigued, I turned the question around and asked him why he thought I'd last as long as two years, to which he replied that I am stubborn, idealistic, and cynical -- a combination rarely found in nature but then (as he further pointed out) I did not spend a whole lot of time in nature, either.

    I pointed out that I had been a boy scout in my youth (until roughly the time I discovered the existence of girls, in fact!). To which he said the scouts thing just proved the point about being stubborn, and the thing about girls proved the idealism.

    "What about the cynicism?" I asked him, but he had a response for that too - the fact that I am not in a relationship and seem to shun opportunities when they arise.

    I started to respond but then I realized it was quite possible he had thought this out and was going to be able to outflank any counter-argument I could come up with on the spot. And since no one wins an argument when they think of the comeback a month later, I realized conceding would be more sensible (perhaps a subtle disproof of his theory about me, but I knew enough not to run myself into that trap since pointing it out negates the effect of the concession).

    "We'll just have to see where we stand in a couple of years," I said.

    Today actually marks my fifth year, though. :-)

    I'm not in the office today since it's Sunday and won't bring any special candy tomorrow (I have two huge bowls of candy sitting in my office now, I suppose my friend would say that every day that I have not left is reason to celebrate?).

    I asked him about the fact that his prediction turned out to not be prophetic, and of course he had a parry to the thrust of this argument too -- the Longhorn/Vista ship schedule threw off the timing.

    "We'll just have to see where we stand in a couple of years," I said again.

    "Exactly."

    So here I am, still sorting it all out. I'll be in late tomorrow (waiting for a repairman) but if you are on campus in Redmond and feel like popping by and grabbing a piece of candy during the afternoon, celebrating my stubborn/idealistic/cynical nature, and just generally saying hi then please feel free to do so. :-)

     

    This post brought to you by 𐒥 (U+104a5, a.k.a. OSMANYA DIGIT FIVE)

  • Sorting it all Out

    Blame Kannada! (ಕನ್ನಡ)

    • 2 Comments

    (Inspired by the alternate title from Oh Kannada... (ಕನ್ನಡ) and the South Park movie!)

    One of those interesting issues related to rendering Indic properly came up the other day, in this case with Kannada....

    The string in question, first:

    ಅಹ್ಮ್‌‌ದ್ ಷರೀಫ್

    If you are running on an OS that does the rendering correctly, it will not look identical to this other string:

    ಅಹ್ಮ್ದ್ ಷರೀಫ್

    Or this third string:

    ಅಹ್ಮ್‍ದ್ ಷರೀಫ್

    The customer was in this case seeing that third string visually for all three using some fonts, but not others, and in some technologies, but not others. And it was never working right in .NET 1.1 using GDI+ and its Graphics.DrawString method.
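    Since the three strings can look identical on screen, it may help to dump their code points rather than trust the rendering. The following is only a sketch in Python: it assumes the variants differ solely in what follows the virama after MA (U+200C, nothing, or U+200D), and the names `variants` and `describe` are made up for illustration.

```python
import unicodedata

# Assumption: the three variants share the same Kannada letters and differ
# only in the invisible character (if any) placed after the second virama.
VIRAMA = "\u0ccd"                                   # KANNADA SIGN VIRAMA
HEAD = "\u0c85\u0cb9" + VIRAMA + "\u0cae" + VIRAMA  # A, HA, virama, MA, virama
TAIL = "\u0ca6" + VIRAMA                            # DA, virama

variants = {
    "zwnj": HEAD + "\u200c" + TAIL,   # ZERO WIDTH NON-JOINER after the virama
    "plain": HEAD + TAIL,             # no joiner at all
    "zwj": HEAD + "\u200d" + TAIL,    # ZERO WIDTH JOINER after the virama
}


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


if __name__ == "__main__":
    for label, text in variants.items():
        print(label)
        for entry in describe(text):
            print("   ", entry)
```

    Comparing the dumps makes it obvious which of the three strings a given piece of text really is, regardless of what the font or shaping engine decides to draw.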

    Now as you might have guessed, we are dealing with the combination of several different issues here, including:

    • the one I pointed out in Why don't all the half forms sort right?, and the fact that the decision to unify the meanings of U+200c and U+200d across all Indic scripts is a relatively recent idea recommended by Peter Constable and adopted by Unicode in recent versions;
    • the one I pointed out in A quick look at Whidbey's TextRenderer, and the fact that the GDI+ shaping engines are hopelessly out of date and see little chance of being updated -- so that TextRenderer.DrawText is much preferred over Graphics.DrawString;
    • the fact that (given all the above) later shaping engines and fonts and technologies will have a much better chance of displaying strings correctly.

    These issues are ones that will improve over time as the older implementations that do not have the right rendering story are replaced by those that do. Though I can't help wondering whether it would have been so bad to update all of the supported technologies (including GDI+) so that customers could see text correctly without depending on technology shifts....

     

    This post brought to you by (U+0ccd, a.k.a. KANNADA SIGN VIRAMA)

  • Sorting it all Out

    Depends on what you meant about what you meant...

    • 6 Comments

    I admit I am no fan of either the MPAA or its ratings system.

    But some interesting issues in language are raised by its criteria.

    For example, the use of one of the harsher sexually derived words (e.g. fuck) even a few times can lead to a movie being given a PG-13 rating, while using it only once can lead to an R rating if it is used in a sexual sense.

    The distinction, while obvious and rather easily defined, can at times be problematic, though.

    Take for example the movie Crimson Tide, in which the word is used 28 times, mostly in a non-sexual sense, but certainly enough times to assure an R rating.

    Some are obviously sexual (and more than a little offensive), like the first occurrence:

    Yeah, horses are fascinating animals.
    Dumb as fence posts but very intuitive.
    In that way, they're not too different from high school girls.
    They might not have a brain in their head...
    but they do know all the boys want to fuck 'em.
    Don't have to be able to read Ulysses to know where they're comin' from.

    While most of the others are not, like the second (which can be considered offensive to some for entirely different reasons):

    Somebody asked me if we should have bombed Japan...
    a simple, "Yes, by all means, sir. Drop that fucker. Twice."

    Or the third:

    When you got somethin' to say to me, you say it in private.
    And if privacy doesn't permit itself, then you bite your fucking tongue.

    Now clearly these last two are not meant in a sexual sense, and so it goes for most of them.

    But then there is a third sort of a category, like in the thirteenth use, which comes just after the non-sexual twelfth:

    Sonar/Conn - let me know when our range to that Akula is open to 1,000 yards!
    <<Conn/Sonar>> Aye, sir!
    Damn it! Let's just shoot this fucker!
    What's 1,000 yards for?
    'Cause it takes 1,000 yards for the torpedoes to arm!
    Jesus! Who'd you fuck to get on this ship?

    Now this is a sexual sense, kind of. But not really -- it is obvious hyperbole and not a serious reference to sex. The idea is that someone so incompetent as to not understand such a basic issue about submarine warfare could only have made it onto a submarine as payment for sexual favors (perhaps "Who do you have naked pictures of to get on this ship?" could have been used instead to get a similar meaning across, and it would have been a more theoretically feasible idea in terms of getting assigned to a sub while incompetent). But in any case it is really not talking about a truly sexual context (or if it is, it is a less serious example than that first one).

    Kind of like the twentieth use just after the non-sexual nineteenth:

    Weps, we've been ordered to launch.
    Now why in the world would we do that if they weren't prepared to launch at us?
    We don't know that for sure. That's the whole point.
    That's why he wants time to confirm the message.
    That's the whole fucking point is we don't have time!
    Radchenko is fueling his birds. Now why do you think he's doing that?
    Why? You don't put on a condom unless you gonna fuck!

    Again, it is sexual, sort of. But really only as metaphor -- trying to explain that fueling missiles without arming them would make as much sense as putting on a condom without actually having sex. It is certainly a different degree of sexuality, if it is truly going to be treated as sexual at all.

    Obviously with 28 uses of the word it was going to get an R rating anyway. But if a movie had just one or two examples of this "sexual, but not" kind of reference, I wonder whether it would be PG-13 or R?

    This is a basic problem of having the ratings decision be based on not just the word itself but also on both the semantic context of whether it is being used sexually and the pragmatic context of whether the sexual use really is about sex.

    Perhaps it is a distinction without a difference to some, but the uses seem different to me....

     

    This post brought to you by (U+59d8, a CJK ideograph that may or may not be of some relevance)

  • Sorting it all Out

    MSKLC keyboard layout names in your own language

    • 5 Comments

    When I wrote Getting the language (and more!) of an LCID-less keyboard, which admittedly covered a lot of ground, I realized there were a lot of other points that would have to be clarified.

    Like how MUI (Multilingual User Interface) fits in.

    I mean, it is clear from looking at the registry that something is going on:

    That bit with the Custom Language Display Name and the Layout Display Name and their SHLoadIndirectString style strings is fairly obvious.

    (I still have to talk more about SHLoadIndirectString; I'll do that another day)

    And it leads people like regular reader Ivan Petrov to wonder and even ask:

    Hi Michael :-)

    I've the following question:

    How can someone USE, let's say something like the MUI technology, for the Description text when the custom Keyboard Layout is installed?

    I mean when some user is using English User interface (MUI) to see the English Description text and if some user (on the same machine) is using a Bulgarian User interface, to see a Bulgarian Description text. All this at the Language bar and in the Text Services and Input Languages window in the Installed services under the Keyboard tree as localized node!

    Regards,

    Ivan.

    Now of those two MUI-friendly strings, the Layout Display Name actually was added in Windows XP and is used to support localized keyboard layout names in every user interface language in Windows. All of the strings are in input.dll and the localizers can get to them.

    But for custom keyboard layouts, obviously one cannot add strings to the input.dll file that ships in Windows. So we talked about it and decided that the resources of the layout DLL itself would work just fine. Starting with MSKLC 1.4, we automatically add the language name at string resource 1100 and the layout name at string resource 1000.

    Of course there is no user interface within MSKLC to let you specify the various translations of those two strings, which would seem to defeat the purpose.

    But let's take a closer look at the .KLC file from those adventures the other day, near the bottom of the file:

    DESCRIPTIONS
    0409    Like Totally Fer Shure

    LANGUAGENAMES
    0409    Valley Girl (California)

    And there you have it. For any language you wish to add a translation for, and this is for either or both strings, you can add them here. MSKLC will not let you edit the name directly but if they are there then it will build the keyboard layout DLL containing them. Quite happily, in fact....

    You are LCID (technically LANGID) bound since all of the resources are contained in the one DLL and there is no way to do multilingual resource tagging in one file. Perhaps in a future version this would change to including the various .mui files in the language name directories. And then custom languages might fare better (of course for the time being MUI does not work well with custom locales so MSKLC has some time before anyone needs to worry about getting that bit right). :-)

    It is funny, the feature idea within MSKLC has been suggested for years but it never really got very far, as people struggled over what to make that UI look like -- some big grid where you choose the target language and put in the translation? Or would you give the DLL to some localization company and have them translate? Probably once they are separate DLLs, sure. But for now maybe some UI would have been nice? :-)

    Ah well, no worries. If you want to put in some different translations of the custom language name (ignored unless it is in fact a custom language) and/or the keyboard layout name, adding them to the file is easy enough by just putting in the LANGID and the name, one line to each you add.
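    For anyone who would rather script this than hand-edit the file, here is a rough Python sketch of the idea. It is not part of MSKLC; the helper name `add_translation` is made up, and it assumes that each section's entries run until the first blank line and that the LANGID and the name are separated by a tab. Reading and writing the .KLC file (in whatever encoding MSKLC saved it) is left to the caller.

```python
def add_translation(klc_text, section, langid, name):
    """Append a 'LANGID<TAB>name' line to the named section of KLC text.

    Hypothetical helper. Assumes the section header sits on its own line
    and that the section's entries end at the first blank line (or at the
    end of the text).
    """
    lines = klc_text.split("\n")
    for start, line in enumerate(lines):
        if line.strip() == section:
            break
    else:
        raise ValueError(f"section {section!r} not found")

    # Walk past the existing entries to find where the section ends.
    end = start + 1
    while end < len(lines) and lines[end].strip():
        end += 1

    # LANGIDs are written as four hex digits, as in the 0409 lines above.
    lines.insert(end, f"{langid:04x}\t{name}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        "DESCRIPTIONS\n"
        "0409\tLike Totally Fer Shure\n"
        "\n"
        "LANGUAGENAMES\n"
        "0409\tValley Girl (California)\n"
    )
    # 0x0402 is the Bulgarian LANGID; the name is a placeholder to translate.
    print(add_translation(sample, "DESCRIPTIONS", 0x0402, "<Bulgarian layout name>"))
```

    The edited text can then be saved back and built in MSKLC as usual; the tool will not show the extra names in its UI, but as described above it will carry them into the DLL.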

     

    This post brought to you by (U+0fcf, a.k.a. TIBETAN SIGN RDEL NAG GSUM)

  • Sorting it all Out

    On Becoming Jane

    • 4 Comments

    So I went and saw a movie last night with colleague/comrade/friend Melanie.

    We went to Lincoln Square Cinemas armed with two recommendations (Death at a Funeral and Superbad) but we ended up seeing Becoming Jane, instead....

    This would not have been my first choice, to tell you the truth.

    Not due to any feelings against Jane Austen, mind you -- while not a genuine Austenite (Janeian? Not sure what the authentic term is here), I loved Sense and Sensibility and Emma and the others and read them all on my own after being "forced" to read Pride and Prejudice in grade school (it was not the sort of book one could actually admit to enjoying at that point, so I didn't). But Jane taught me irony, something we all have and experience but so few people recognize, and that was quite a gift, if I do say so myself.

    But I was skeptical about the romance between her and Lefroy, and I was doubtful about Anne Hathaway in the role (though I loved her in The Devil Wears Prada), and I was petrified that a "Hollywood ending" would be bolted on to the story leaving us with more of a "Becoming What Jane Would Be Like Had She Married" rather than a more truthful "Becoming Jane."

    But I'll be honest, it seduced me.

    Perhaps there was no actual even almost relationship between them -- in reality I think they had less than a month for it to happen so in the end it is unlikely they did. But the movie made me believe it and I had no trouble suspending disbelief given the chemistry between Hathaway and McAvoy (though there were several scenes that would have benefited from a Steadicam!). After I got home I had to check dates for Lefroy's daughter Jane and see if things lined up as well as they did in the movie -- they did.

    Plus the preview of The Jane Austen Book Club coming this fall has also tempted me, I'll probably see it too (the book was wonderful).

    I am truly glad we saw this movie.

    Of course there was a price to be paid -- I am not the sort of person who can be moved that much toward romance (even ultimately unrequited) and was up most of the night reading a Thomas Gifford novel to swing my sense, sensibilities, pride, and prejudices back closer to where I usually keep them. There is romance there too, mind you -- but with the added notion of conspiracy and a more cynical edge upon which I can tune my own moral compass....

    But that is just me. Normal people can see the movie and meet the Jane before the Jane who wrote the books they knew so well, and enjoy the thought that she did indeed have a chance to feel the stirrings about which she wrote so very well.

     

    This post brought to you by (U+0d60, a.k.a. MALAYALAM LETTER VOCALIC RR)
