
April, 2009

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Two wrongs don't make a right, but two lefts can take a good whack at it

    • 7 Comments

    When you were growing up, you might have had adults try to indoctrinate you with pithy biscuits of nothingness like

    Two wrongs don't make a right.

    Maybe it was just me, but nonsense like that used to just annoy me.

    Thankfully comedians would add to it, with items like

    Two wrongs don't make a right, but three rights make a left.

    This would not annoy me, because anytime you make fun of something that annoys me, I am entertained.

    Now blogs from over three years ago like Just when you think you know a function... have a different lesson. You know, by showing how you could take two U+200f (RIGHT-TO-LEFT MARK) code points, prepend them to your MessageBox text, and make the MessageBox mirror itself.
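    As a rough illustration of that trick, here is a minimal Python sketch of the string side of it. The helper names (with_rtl_signature, has_rtl_signature) are made up for this example and are not part of any Windows API; the only facts assumed are the code point value U+200F and that the two marks go at the very front of the text.

```python
RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK


def with_rtl_signature(text: str) -> str:
    """Prepend two RLM code points to a message string (hypothetical helper)."""
    return RLM + RLM + text


def has_rtl_signature(text: str) -> bool:
    """Report whether the text starts with two RLM code points."""
    return text.startswith(RLM * 2)


message = with_rtl_signature("Hello")
# Show the first three code points of the resulting string.
print([f"U+{ord(ch):04X}" for ch in message[:3]])
```

    On a Windows machine the resulting string could then be handed to MessageBoxW (for example through ctypes) to see the mirroring; that call is left out here so the sketch runs anywhere.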

    I could have gone for the cute pithy sappy crap and claimed that this proves that

    Sometimes two rights make a right.

    but I did not because it just doesn't seem all that clever. Maybe I could have said, along the same logical lines of three rights make a left that

    Two rights will reverse everything.

    since mirroring a dialog does reverse the layout. Now that is clever. It even makes some linguistic sense.

    I had a very good reason for not saying it though.

    I didn't think of it at the time.

    Anyway, there is another fun trick like that, one that regular reader Mihai mentioned in a comment to that Just when you think you know a function... blog:

    I cannot say this is really "common knowledge". I was unable to find it documented anywhere. The only thing related is adding 2 RLM in the FileDescription of the version info, which causes the full application to be mirrored. This is documented in Developing International Software, 2nd Ed.

    Now Mihai actually had it wrong -- they actually want two LRM characters -- U+200e (LEFT-TO-RIGHT MARK) -- in the FileDescription string of the version info of the binary.

    But luckily the book got it right. :-)

    Now I could get all clever and note that

    While two wrongs don't make a right, two lefts often can.

    In fact I just did. :-)

    I actually like the one in the title better, technically. And from a linguistic standpoint the fact that for most people it would require three lefts to make a right causes this whole area to be pragmatically interesting in a very language-geeky way.

    But the interesting question here is why did they do it this way!

    I mean, doesn't it look like a bug, some kind of typo in the code from years ago that we can't change now but maybe we ought to do something since it seems wrong?

    Two RLMs to make a MessageBox RTL, that makes sense.

    But two LRMs to make a window RTL, that just seems wrong.

    Luckily my colleague, co-worker, and teammate Mohamed is an ace in the Window Manager code and he knew why they did it this way, so when I asked him about the feature, he immediately volunteered:

    It is just kind of a signature that we need to call SetProcessDefaultLayout( LAYOUT_RTL ).

    We did not want to insert RLMs in front of Latin text, that is all. So we needed something that will not have any layout or shaping effect on Latin text. In the end it is a hack to ease the localization work.

    Now this is a good explanation.

    The kind that takes what appears "silly" at first glance and makes some sense of it.
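    To make Mohamed's description a little more concrete, here is a small Python sketch of the "signature" idea. It is not the Window Manager's code: default_layout_for is a hypothetical name, and the check (description starts with two LRMs) is just my reading of the explanation above. The LAYOUT_RTL value is the one from the Windows headers, and the bidi classes come from Python's unicodedata module.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK
RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK
LAYOUT_RTL = 0x00000001  # value of LAYOUT_RTL in the Windows headers


def default_layout_for(file_description: str) -> int:
    """Hypothetical check: two leading LRMs mean 'use LAYOUT_RTL', else 0."""
    return LAYOUT_RTL if file_description.startswith(LRM * 2) else 0


# LRM has the same bidi class as a Latin letter, while RLM does not.
for name, ch in (("LRM", LRM), ("RLM", RLM), ("Latin A", "A")):
    print(name, unicodedata.bidirectional(ch))

print(default_layout_for(LRM + LRM + "My Application"))
print(default_layout_for("My Application"))
```

    The bidi classes are the point of the hack as described: a mark that is already strongly left-to-right changes nothing about how Latin text is laid out.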

    HOWEVER.... :-)

    Anyone want to try to spot the flaw in the logic, though?

    From this maybe comes the way to showcase an actual longstanding bug, and perhaps the way it should have been done, instead.

    I know that most of my regulars aren't even looking since I didn't really tell anyone I was back to blogging, but are there any takers? :-)


    The Unicode characters were released from their original contracts when SiaO went on hiatus; only time will tell if the Characters Union (AFL-CIO) is willing to negotiate new contracts for the characters it represents...
  • Sorting it all Out

    On intentional gaps in calendar lists

    • 0 Comments

    Developer colleague Dmitri asked me:

    Hi Michael,

    I noticed the MSDN documentation for the calendar IDs claims that

    Note  The gap in numbering between the identifiers CAL_GREGORIAN_XLIT_FRENCH and CAL_UMALQURA is intentional. The designator for CAL_UMALQURA is 23, not 13.

    I am wondering if you are aware of the intention by which this gap was created?

    Let me tell you, it is kind of a funny story.

    That topic on Calendar Identifiers does indeed have this note.

    And the table has a gap that kind of begs for some explanation:

    Value  Calendar identifier          Meaning
    1      CAL_GREGORIAN                Gregorian (localized)
    2      CAL_GREGORIAN_US             Gregorian (English strings always)
    3      CAL_JAPAN                    Japanese Emperor Era
    4      CAL_TAIWAN                   Taiwan calendar
    5      CAL_KOREA                    Korean Tangun Era
    6      CAL_HIJRI                    Hijri (Arabic Lunar)
    7      CAL_THAI                     Thai
    8      CAL_HEBREW                   Hebrew (Lunar)
    9      CAL_GREGORIAN_ME_FRENCH      Gregorian Middle East French
    10     CAL_GREGORIAN_ARABIC         Gregorian Arabic
    11     CAL_GREGORIAN_XLIT_ENGLISH   Gregorian transliterated English
    12     CAL_GREGORIAN_XLIT_FRENCH    Gregorian transliterated French
    23     CAL_UMALQURA                 Windows Vista and later: Um Al Qura (Arabic lunar) calendar

    One almost wonders where the hidden calendars are, doesn't one? :-)

    Well, it all comes down to a mistake that was made many years ago.

    Some people might disagree with my characterization of events but I was there and I know I'm right so I'm gonna tell this story my way. If you don't like it then why the hell are you here? :-)

    Anyway, it was a mistake.

    In Office.

    They had taken some empty slots in the LCID table and made some assignments.

    Of course Windows had made assignments for some new locales in the same slots.

    Since Office back in the earlier part of the decade was shipping much more regularly than Windows, they had already shipped products with these values, so Windows changed its "not yet shipped" values to match the Office ones for these newer locales.

    And at the same time a more formal process by which Office and other groups would request values was put in place, so that they would never have this kind of problem come up again.

    Now this process had to be put in for calendars, too.

    So when Office added their support for the Saka calendar (ref: Oh (Saka to me, Saka to me, Saka to me, Saka to me) Whoa Babe (Just a little bit) A little respect (just a little bit)) and .NET added support for all those other calendars like the Jalaali (ref: Behold the PersianCalendar class) and so on, each one was given a CALID value.

    Even if Windows had no formal timeline to support it.

    Even if the technology requesting the calendar would never need it.

    Even if nobody ended up needing to use it.

    Here, in a totally unofficial way, I'll name them all.

    For some of them, you may even know where they are.

    Others you won't, since they aren't anywhere.

    Like placeholders, almost:

    • 13 - Julian
    • 14 - Japanese Lunisolar
    • 15 - Chinese Lunisolar
    • 16 - Saka
    • 17 - Lunar ETO Chinese
    • 18 - Lunar ETO Korean
    • 19 - Lunar ETO Rokuyou
    • 20 - Korean Lunisolar
    • 21 - Taiwan Lunisolar
    • 22 - Persian
    From there, at that time, Um Al Qura was just the next in line....
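    For anyone who wants to see the gap rather than take my word for it, here is a small Python sketch. The two dictionaries simply restate the documented table and the unofficial list above; the names DOCUMENTED, PLACEHOLDERS and missing_ids are made up for the example.

```python
# Documented calendar IDs, as listed in the Calendar Identifiers table.
DOCUMENTED = {
    1: "CAL_GREGORIAN", 2: "CAL_GREGORIAN_US", 3: "CAL_JAPAN",
    4: "CAL_TAIWAN", 5: "CAL_KOREA", 6: "CAL_HIJRI", 7: "CAL_THAI",
    8: "CAL_HEBREW", 9: "CAL_GREGORIAN_ME_FRENCH",
    10: "CAL_GREGORIAN_ARABIC", 11: "CAL_GREGORIAN_XLIT_ENGLISH",
    12: "CAL_GREGORIAN_XLIT_FRENCH", 23: "CAL_UMALQURA",
}

# The unofficial placeholder assignments named in this post.
PLACEHOLDERS = {
    13: "Julian", 14: "Japanese Lunisolar", 15: "Chinese Lunisolar",
    16: "Saka", 17: "Lunar ETO Chinese", 18: "Lunar ETO Korean",
    19: "Lunar ETO Rokuyou", 20: "Korean Lunisolar",
    21: "Taiwan Lunisolar", 22: "Persian",
}


def missing_ids(ids):
    """Return every integer between min(ids) and max(ids) that is not in ids."""
    present = set(ids)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


gap = missing_ids(DOCUMENTED)
print(gap)
# Every hole in the documented table is covered by a placeholder.
print(gap == sorted(PLACEHOLDERS))
```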

    And after that? Who can say for sure? One never knows what the next value might be, or even how (or IF) these placeholders will be used.

    For now it is just the items that cause the intentional gap in the list of supported calendars!



  • Sorting it all Out

    A chess problem begging for a solution...

    • 7 Comments

    A regular reader gave me an interesting problem, from a book.

    If that kind of puzzle is not your thing feel free to ignore....

    Here is the text, from a book, that describes the setup for the problem:

        Behind them in the billiard room a man's voice grumbled, "Damn kid's game. Not a man's game in the place." The speaker intruded his wide shoulders between Ish and Joshua; a big man dressed in black clothing a bit too dandified for a rancher, a knight's head stickpin glinting in the dark silk of his cravat. The smell of whiskey hung faintly around him, but there was, too, an edge of danger, a readiness for trouble that said, Gunfighter.
        At the blackjack table, Jason leaned forward, his red-and-gold waistcoat bright as blood against the white of his sleeves. He looked at his cards, leaned back, and folded.
        Behind them, the big man grumbled, "About as much skill and thinking as Faro. Spit in the Ocean! Acey-Ducey Under-My-Shoosie! Doesn't anybody in this Godfersaken hell play chess?"
        Without so much as turning his head, Ishmael inquired, "At how much a piece?"
        Mate was set at two hundred dollars. Queen went for a hundred ("About the price of any woman in this town," remarked someone), rooks seventy-five, bishops and knights fifty. Pawns were twenty dollars a piece. A mystified owner scoured the surrounding saloons for a chess set and finally came up with one that the owner of Florinda's Place kept for decoration in her parlor.
        Ishmael beat the stranger in seven moves.
        "By God!" roared the big man. "Let me see you try that again, stranger!"
        He caught him with a reverse fool's mate, in three.
        "But that," he said, pocketing his cash, "is a classic fakement."
        The big man stroked his narrow black mustache and regarded his closed-in king thoughtfully through a haze of cigar smoke. Then he looked back up at Ish. "After I beat you this time," he said, "show me that one again."
        Warned, stung and $600 poorer, the gunfighter settled down to grim play....

    Things you should keep in mind:

    • The book contains no additional relevant information that will help solve the problem; it is a good read but the read will not help you;
    • If you don't know about chess then you shouldn't bother;
    • Author Barbara Hambly is disqualified from being allowed to answer.

    The questions that you must answer to solve the problem:

    1. Is the $600 a true calculation based on two actual possible chess games?
    2. If the answer to #1 is yes, who was black and who was white in each game?
    3. If the answer to #1 is yes, what would be an exact sequence of moves in each game to cause the $600 cost of the two games?
    4. If the answer to #1 is yes, are there other potential sequences that could fit all of the known facts given in the problem?
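    If you want to check candidate answers against the $600 figure, a tiny Python helper like the one below may save some arithmetic. The prices are the ones from the passage; the payment rule (the loser pays for mate plus every piece he lost, minus every piece he captured) is only an assumption about how the stakes work, since the text never spells it out, and game_cost is a made-up name.

```python
# Stakes quoted in the passage, in dollars.
PRICES = {
    "mate": 200, "queen": 100, "rook": 75,
    "bishop": 50, "knight": 50, "pawn": 20,
}


def game_cost(lost_by_loser=(), lost_by_winner=()):
    """Net amount the loser pays for one game, under the assumed rule:
    mate price + pieces the loser lost - pieces the loser captured."""
    owed = PRICES["mate"] + sum(PRICES[p] for p in lost_by_loser)
    credit = sum(PRICES[p] for p in lost_by_winner)
    return owed - credit


# Two mates with no captures at all account for this much of the total,
# so the captures across both games have to explain the rest.
two_bare_mates = 2 * game_cost()
print(two_bare_mates, 600 - two_bare_mates)

# An arbitrary illustration, not a proposed answer to the puzzle.
print(game_cost(["pawn", "knight"], ["pawn"]))
```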

    It took me a bit longer to work this one out than I would have liked, but I have been out of practice as I have not played chess in well over 15 years and have not played speed chess in almost half as long.

    The prize?

    I'll be very impressed if you solve the problem, even more so if you beat my time, but most of all if you beat the time I feel I should have solved it in....

    Ready? Set? Go!

     


  • Sorting it all Out

    Double Bite Character Set

    • 8 Comments

    It was just this last Sunday that long time reader Yuhong Bao wrote over in the Suggestion Box:

    BTW, someone mistyped DBCS as Double Bite Character Sets:

    http://www.microsoft.com/downloads/details.aspx?FamilyID=0e56788b-32e8-459d-b9c9-b9155a4836b4

    I was wondering if DBCS is really that painful.

    Interesting question, no? :-)

    In case someone fixes the bug, here is the screenshot from Update for Windows XP (KB961503):

    The KB article (961503: You cannot input characters as expected by using a non-English Input Method Editor in Windows Live Messenger on a Windows XP-based computer) does not have this problem, as it does not mention either BYTE or BITE. It is just the download page.

    There are those who would claim that DBCS does at times bite.

    So perhaps it is amusing to think of this as a Freudian slip of some sort.

    Though I suspect it to be more likely either a simple typo or one of those "typed by someone who does not understand after being briefed by someone who does" kind of situations....

    Or maybe it is an unexpected example of someone making the mistake I described in We're back and we're embarrassing ourselves? (aka Making your localizer's life easier, Part 2) about how people try to spell out acronyms to avoid confusion:

    And the guidelines themselves often fail to assist: for example, in documentation on the first occurrence of an acronym one is expected to spell out the acronym. But if one finds GDI confusing one is unlikely to find GRAPHICS DEVICE INTERFACE to be the magical road to understanding. In fact, the guidelines can often increase confusion!

    Even if the person doing it did not know what it really stood for!

    Thinking back to the days of

    • Single byte character set (SBCS)
    • Multibyte character set (MBCS)
    • Double byte character set (DBCS)

    I really do remember times that I really felt like saying (or even screaming) THIS BITES so there is a part of me that is inspired by the "Freudian slip" explanation. :-)

    Using Unicode is always easier. Always.
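    For those who never had the pleasure, here is a small Python sketch of one classic way DBCS bites. It uses code page 932 (Shift-JIS) purely as a representative double byte code page, and the function names are made up for the example. The point it shows: in a DBCS string the second byte of a character can have the same value as an ASCII backslash, so a byte-by-byte search finds separators that are not there unless the code tracks lead bytes.

```python
def byte_len(ch, codec):
    """Number of bytes ch takes in codec, or None if it cannot be encoded."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None


def is_cp932_lead_byte(b):
    """Lead byte ranges for code page 932."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def naive_backslashes(data):
    """Byte-by-byte search, unaware of double byte characters."""
    return [i for i, b in enumerate(data) if b == 0x5C]


def dbcs_aware_backslashes(data):
    """Search that skips the trail byte after every lead byte."""
    found = []
    i = 0
    while i < len(data):
        if is_cp932_lead_byte(data[i]):
            i += 2  # skip the lead byte and its trail byte together
            continue
        if data[i] == 0x5C:
            found.append(i)
        i += 1
    return found


# Same characters, different byte counts depending on the character set.
for ch in ("A", "ｱ", "あ"):
    print(ch, byte_len(ch, "cp1252"), byte_len(ch, "cp932"), byte_len(ch, "utf-16-le"))

path = "C:\\表\\x"
encoded = path.encode("cp932")
print(naive_backslashes(encoded))       # includes a trail byte that is not a separator
print(dbcs_aware_backslashes(encoded))  # only the real separators
# With a Unicode string there is nothing to track at all.
print([i for i, ch in enumerate(path) if ch == "\\"])
```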

    So maybe we should all treat the B in SBCS, MBCS, and DBCS as BITE rather than BYTE. :-)


