
June, 2008

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    When Unicode's "PDF" character isn't supported, you really *can* say that the app's Bidi support doesn't POP!

    • 3 Comments

    The problem that came in was an interesting one:

    • Win32 application running on Windows XP (Hebrew language);
    • Rich edit control in the application;
    • Try to display the folder path left-to-right (i.e. in the form C:\<Hebrew folder name>.txt, so it is meant to be an "LTR" chunk, independent of the rest of the surrounding information);
    • Current UI language settings should be used;
    • Application uses BIDI control characters LRE/PDF to specify text as embedded left-to-right;

    When displaying the folder path, it does not show the text from left to right order.

    Indeed.

    Let's dig into this one a bit, shall we?

    If you are like me and you have spidey senses about this kind of thing, they are probably tingling right now. You may recall posts like these ones:

    but although there seem to be familiar issues, none of them are quite the same.

    Let's look at UAX#9 to get the definition of these controls:

    2.1 Explicit Directional Embedding

    The following codes signal that a piece of text is to be treated as embedded. For example, an English quotation in the middle of an Arabic sentence could be marked as being embedded left-to-right text. If there were a Hebrew phrase in the middle of the English quotation, then that phrase could be marked as being embedded right-to-left. These codes allow for nested embeddings.

    Abbr. | Code   | Chart                                           | Name                     | Description
    LRE   | U+202A | http://www.unicode.org/cgi-bin/refglyph?24-202A | LEFT-TO-RIGHT EMBEDDING  | Treat the following text as embedded left-to-right.
    RLE   | U+202B | http://www.unicode.org/cgi-bin/refglyph?24-202B | RIGHT-TO-LEFT EMBEDDING  | Treat the following text as embedded right-to-left.

    The effect of right-left line direction, for example, can be accomplished by embedding the text with RLE...PDF.

    Okay, so much for homework. Let's get practical now!

    On the whole I think we should try things out and make sure we can reproduce the issue.

    We will take a nice string that meets the particular criteria, such as:

    C:\שיקול דעת מוטעה.txt

    Now any time someone talks about RichEdit, I like to start in Notepad, then a RichEdit control, then Word. So let's try without the U+202a (LEFT-TO-RIGHT EMBEDDING) and U+202c (POP DIRECTIONAL FORMATTING) first -- in both LTR and RTL contexts.

    In Notepad:


    and then a RichEdit control (I use WordPad here but you could choose any old RichEdit control):


    and then in Word (I am using Word 2003 here):


    Hmmm.... I see the problem here. If you put any kind of RIGHT-TO-LEFT-OSITY on top of the string, despair follows.

    This does seem like a good time for those embedding characters!

    So we'll use the same string but put U+202a (LEFT-TO-RIGHT EMBEDDING) prefixing the string and U+202c (POP DIRECTIONAL FORMATTING) suffixing it.
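    For anyone who wants to build that test string in code rather than by hand, here is a minimal sketch. It is portable C++ rather than Win32, and embedLeftToRight is a hypothetical helper name of mine, not something from the application in question:

```cpp
#include <string>

// Bidi control characters defined in UAX #9.
constexpr char16_t kLRE = 0x202A; // LEFT-TO-RIGHT EMBEDDING
constexpr char16_t kPDF = 0x202C; // POP DIRECTIONAL FORMATTING

// Hypothetical helper: brackets the text so a conforming bidi
// implementation treats it as one embedded left-to-right run.
std::u16string embedLeftToRight(const std::u16string& text) {
    std::u16string out;
    out.reserve(text.size() + 2);
    out.push_back(kLRE);
    out += text;
    out.push_back(kPDF);
    return out;
}
```

    Whether the bracketed string then displays as intended depends entirely on the control that renders it, which is the point of the tests that follow.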

    Again, first in Notepad:


    and then WordPad to look at RichEdit:


    and then finally in Word:


    Damn.

    It works fine in our EDIT control, but not in our RICHEDIT control or in Word.

    What's up with that?

    RichEdit expert Murray Sargent explained what is going on:

    No RichEdit version supports LRE, RLE, and PDF. They’re on the wish list[...]

    Ah, I guess that says it all. I took the liberty of asking him to add an entry to the wishlist on behalf of those of my readers who care about such things. :-)

    Though it really is times like this that I find myself thinking all of the work in RichEdit and Word and other parts of Office to support math¹ might have been more ideally preceded by finishing this important bit about support of bidirectional text².

    Luckily IE does the right thing here, both with Unicode and with its own dir attribute that can be put in places like a paragraph, a span, or a div. And these non-IE components for rich text might have support before we actually need complex mathematical operations to express how long we've been waiting for the support....

     

    1 - First added to Unicode in UTR#25 at the end of August 2003.
    2 - First added to Unicode in UAX #9 in the middle of August 1999.

     

    This blog brought to you by U+202a, U+202b, and U+202c (aka LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING, and POP DIRECTIONAL FORMATTING).

  • Sorting it all Out

    How do[es what] the common controls [call ]convert between ANSI and Unicode?

    • 0 Comments

    The other day, Raymond Chen blogged about How do the common controls convert between ANSI and Unicode?, in response to a question in his suggestion box:

    In the context of an ansi (not unicode) app: How do the common controls (listview for example) decide which code page to use when translating multibyte to widestring?

    I had to debug an ansi app that was displaying corrupt strings on a traditional chinese system because the dialog font was causing the listview to use a codepage other than the system ACP when translating multibyte to widechar.

    Although I would seldom if ever disagree with Raymond about anything that builds out of the Shell depot, in this particular case I know of two specific exceptions to the CP_ACP rule one generally sees, though the differences may have less of a direct relationship to the Shell/comctl32 code, meaning he might still be right within his domain. :-)

    The two other behaviors I have run across in various versions of the common controls:

    • Use of the thread code page (CP_THREAD_ACP)¹.
    • Use of the code page associated with the font charset selected in a device context.

    I honestly don't know much about the first one, but I remember reports of bugs where changing the thread locale (which changes the thread code page) would change the behavior here, and particularly on the pre-6.0 controls there was a real ANSI-Plus thing going on here that tried to move beyond CP_ACP, so while I had no proof it was true I suspected it might be.

    The second one, I have more insight into since I had to debug it on a few occasions -- basically the text would not always be converted to Unicode at all; and the ANSI text is sent to GDI with a DC containing a font set to use a charset most associated with some other code page. GDI would then do its job to render and make choices there that it was kind of asked to, in a bizarre and not well understood sense.

    As a rule, any time GDI tries to get into NLS stuff, the results are predictable -- buggered, every time. Thus we have problems like the ones I pointed out in What the hell is wrong with TranslateCharsetInfo, anyway?. Between problems like that and the one discussed in Double Secret ANSI, part 2 (the brokenest one yet, sorry 'bout that!) and Sometimes when you say 'the fix is in' you mean it in a good way, one thing is clear: the GDI folks should consider taking a trip over to the NLS team and giving them all atomic wedgies.

    Just kidding, but you know what I mean.

    For the Common Controls, when I was doing MSLU work I ran across many cases where having the latest updates on Win9x would give a lot of GDI-influenced support of text, and where adding MSLU and a CP_ACP mechanism broke test applications. That lasted until I changed the code to do something more like this to get the code page to convert with:

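    // g_acp is assumed to be a global holding the system default ANSI code page
    // (for example, cached from GetACP()); it is defined elsewhere.
    UINT CpgFromHdc(HDC hdc) {
        int chs;
        CHARSETINFO csi;

        // Charset of the font currently selected into the DC.
        chs = GetTextCharset(hdc);
        // With TCI_SRCCHARSET the first argument carries the charset value
        // itself, cast to a pointer, rather than pointing at it.
        if(TranslateCharsetInfo((DWORD *)(DWORD_PTR)chs, &csi, TCI_SRCCHARSET))
            return(csi.ciACP);
        else
            return(g_acp);
    }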
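    To make the charset-to-code-page idea concrete without needing a DC, here is a small portable sketch of the kind of lookup involved. The numeric charset values and code pages are the standard ones from the Windows headers, but the function name and the 1252 fallback are my assumptions (standing in for g_acp), and this is not the table GDI itself uses:

```cpp
// Hypothetical stand-in for the system ANSI code page (g_acp in the code above).
constexpr unsigned kFallbackCodePage = 1252;

// Maps a GDI font charset value to the Windows code page usually
// associated with it. Charsets without a fixed code page fall back.
unsigned codePageFromCharset(int charset) {
    switch (charset) {
        case 0:   return 1252; // ANSI_CHARSET
        case 128: return 932;  // SHIFTJIS_CHARSET
        case 129: return 949;  // HANGUL_CHARSET
        case 134: return 936;  // GB2312_CHARSET
        case 136: return 950;  // CHINESEBIG5_CHARSET
        case 161: return 1253; // GREEK_CHARSET
        case 162: return 1254; // TURKISH_CHARSET
        case 163: return 1258; // VIETNAMESE_CHARSET
        case 177: return 1255; // HEBREW_CHARSET
        case 178: return 1256; // ARABIC_CHARSET
        case 186: return 1257; // BALTIC_CHARSET
        case 204: return 1251; // RUSSIAN_CHARSET
        case 222: return 874;  // THAI_CHARSET
        case 238: return 1250; // EASTEUROPE_CHARSET
        default:  return kFallbackCodePage; // e.g. DEFAULT_CHARSET, SYMBOL_CHARSET
    }
}
```

    This is how a dialog font with CHINESEBIG5_CHARSET could end up steering a conversion toward code page 950 even when the system ACP is something else.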

    So anyway, the CP_ACP rule should be the only rule, but there are way too many pieces of Windows that assume they know better what to use....

    On the other hand, so do I -- UNICODE! :-)

    1 - Now you know how I feel about this one if you've ever seen Nothing stinks worse than the thread locale, other than the thread code page. I think I was fairly unsubtle on my feelings.

    This blog brought to you by ਸ਼ (U+0a36, aka GURMUKHI LETTER SHA)

  • Sorting it all Out

    "It makes total sense when you explain that the Turks have four eyes", aka Transcripts are hard, let's go shopping!

    • 3 Comments

    So regular readers might remember the recent blog entitled "It makes total sense when you explain that the Turks have four I's".

    Dale then commented:

    Oh, the transcript of the podcast is better:

    "It makes total sense when you explain that the Turks have four eyes"

    What, they wear glasses???

    Indeed. Here is the piece in question:

    Ah well. As my ex-fiancé used to say, pobody's nerfect....

    The corrected piece of the transcript would be:

    Scott Hanselman: Right, exactly and a very good example of that would be say that you had an application that was going to check for the word 'fail' in a TextBox and the user typed in 'fail' f-a-i-l, all lowercase with an i with a dot on it and in your code you said TextBox.ToUpper(), you would get, in Turkish, FA-capital I with a dot on it-L, and then you would check if that equaled 'fail' and it wouldn't, then your application would fail.

    Michael Kaplan: Right and…

    Scott Hanselman: Makes total sense when you explain that the Turks have four I's.

    Michael Kaplan: Yes.

    Scott Hanselman: But it was very not intuitive.

    Michael Kaplan: But all the wonderful things in language about vowel harmony and the way that the words flow together is lost on everybody. What people remember is that darn Turkish bug.
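    The failure Scott describes can be sketched in a few lines. This is a deliberately simplified illustration in portable C++, not what .NET's ToUpper actually does internally; toUpperSketch and its turkish flag are hypothetical names, and only ASCII letters plus the dotted/dotless i are handled:

```cpp
#include <string>

// Simplified uppercasing: ASCII letters plus the Turkish i rules.
// With turkish == true, 'i' maps to U+0130 (capital I with dot above).
// U+0131 (dotless i) maps to plain 'I' either way.
std::u32string toUpperSketch(const std::u32string& s, bool turkish) {
    std::u32string out;
    for (char32_t c : s) {
        if (turkish && c == U'i') {
            out.push_back(0x0130);
        } else if (c == 0x0131) {
            out.push_back(U'I');
        } else if (c >= U'a' && c <= U'z') {
            out.push_back(c - (U'a' - U'A'));
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

    Comparing toUpperSketch(U"fail", true) against U"FAIL" is false, because the third character is U+0130 rather than 'I', which is exactly the bug people remember.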

    Doing a transcript of any kind of a technical chat is a real challenge, truly.

    We should all go shopping!

    This one was a true LOL (laugh out loud) one, for me -- so I really said HA! :-)

     

    This post brought to you by ሀ (U+1200, ETHIOPIC SYLLABLE HA)

  • Sorting it all Out

    Sorting it all Out a forgery?, aka Cretan or cretan?

    • 9 Comments

    Just kidding, this blog is completely written by me and none of it is a forgery, except in the sense that I sometimes forge it out of my imagination (which is something very different).

    But yesterday over on The Unicode List, John Hudson stirred up a hornet's nest of sorts with an early morning contribution:

    The upcoming issue of the archaeology journal Minerva will apparently
    contain an essay charging that the Phaistos Disk, whose signs were
    recently encoded in Plane 1, is a modern forgery. Should be interesting.

    This particular encoding was not without controversy, in part because this charge has been made in the past, and in part because as John later pointed out:

    As I understand, the government of Greece has refused to allow a
    thermoluminescence test of the Phaistos disc. In this context, a
    significant article challenging the antiquity of the disc may be a good
    thing, since it will increase pressure for a reliable dating test to be
    done.

    There is plenty of buzz about the article in places like Language Hat and elsewhere, and there are certainly plenty of reasons to be somewhat suspicious about this exciting development, including:

    • Michael Everson's comment that put me in mind of a similar line from The Untouchables: "How weird. Would anyone making a forgery do it that way?"
    • Andrew West's comment: "It certainly won't be the first time that someone has claimed that the disc is a modern forgery ... nor the last time no doubt. But it is an opinion that does not represent scholarly consensus."
    • Andrew's other comment: "Incidentally, my spidey senses start tingling when I read that the author of the forthcoming paper is also the editor-in-chief and founder of the magazine in which it is to be published."
    • Asmus Freytag's [I shudder to call it a] contribution: "Who knows, perhaps there's someone out there already working on an article proving that Unicode itself is a moder [sic] forgery :-D"

    Perhaps it is just a board game, this Phaistos Disc, or perhaps it is a forgery. With an inability to really find out for sure at present (why is it that any time progress is stalled we can always find a government to plausibly blame?), this is one of those things that won't really go anywhere, for now....

    I had someone ask me if it would be a huge problem if it did turn out to be a forgery.

    Well, fictional scripts and artificial scripts are on the roadmap and it's not like all scripts aren't invented at some point.

    But it would make the description text:

    Phaistos Disc Symbols: U+101D0—U+101FF

    The Phaistos disc was found during an archeological dig in Phaistos, Crete about a century ago. The disc probably dates from the mid-18th to the mid-14th century BCE. Unlike other ancient scripts, relatively little is known about the Phaistos Disc Symbols. The symbols have not been deciphered and the disc remains the only known example of the writing. Nonetheless, the disc has engendered great interest, and numerous scholars and amateurs spend time discussing the symbols.

    kind of an embarrassment for Unicode since it would be missing out on that all important bit of information that it was, if not a fraternity prank, then at least not on the level.

    Given the (mildly obnoxious) Urban Dictionary definition of cretan, we are left with an interesting boggle -- either the Phaistos disc was created by a Cretan, or else it was created by a cretan.

    My head hurts just typing that one....

     

    This blog brought to you by 𐇕 (U+101d5, aka PHAISTOS DISC SIGN WOMAN)

  • Sorting it all Out

    If they say "it's all relative" then remind them it is not a coincidence that there is a show called Relative Madness on TV

    • 0 Comments

    Sometimes the question you ask will have different answers depending on who you ask.

    Like previously, when I described (in the blogs entitled IsCharSomethingOrOther? and Is Kana 'alphabetic' ? Depends on who you ask....) how the answer would vary depending on whether one was asking the NLS API or the wrapper functions around NLS contained in Win32.

    One could (convincingly) try to argue the differences between these two interfaces, but in the end it usually boils down to different intents translating into a different understanding of the question, and thus coming up with slightly different answers.

    Another such problem came up the other day, when a developer asked:

    I am using PathIsRelative API is my code to ascertain where a path is relative or not. Now if I am giving input as “\Programfiles” or “\” as input the api does not take it as RelativePath. My understanding is that API will consider both of these paths as relative since in command window these inputs take me to root folder or folder relatives to root.

    Let me know if I am wrong and also alternative way to find “\ProgramFiles” or “\” being relative or not.

    One developer pointed out the problem in understanding, from the point of view of the function:

    "\Program files" and "\" are absolute paths. Relative path is different if starts from different current folder. Relative to root is already an absolute path.

    While another developer pointed out the overarching problem of the viewpoint:

    I am confused about the meaning of "relative" here.

    "\" and "\Program Files" are relative—to the current working directory. If cwd is "D:\foo" then "\" means "D:\", but if cwd is "E:\foo", then "\" means "E:\".

    So… why is "relative to root … already an absolute path"? Is this simply how "relative" is defined in the world of the win32 API?

    And that second developer answered this one in a way that not everyone will probably be able to parse completely, but which certainly failed to see the conflict that comes from the other point of view....

    You  are correct. But \Program files has the same meaning when cwd is "d:\foo1" or "d:\foo2". Relative path starts from cwd. \Program files always starts from root (although the root is relative to cwd), so it’s absolute. Maybe you can just take this as routine.

    Another developer volunteered a response to this question that feels a lot like that USER32 response:

    ...a way to determine if paths with a leading "\" are to be considered relative ("\Program Files" was just an example).

    In our code we have the same issue and just special case it ourselves:

    BOOL PathIsRelativeBlah(LPCWSTR wszPath) {
        ASSERT(wszPath != NULL);
        BOOL bIsRelative = ::PathIsRelativeW(wszPath);
        if (bIsRelative)
            return TRUE;
        if (wszPath[0] == L'\\' && wszPath[1] != L'\\')
            return TRUE;
        return FALSE;
    }

    Why would one care?

    Well, if the current drive is unknown then in some cases the SHLWAPI PathIsRelative answer is dead wrong (awful, really), since one cannot make security decisions that depend on the path being unambiguous based on the response that PathIsRelative provides. So in some cases the function is acceptable, in others it is incomplete.
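    One way to sidestep the argument about what "relative" means is to stop asking a yes/no question. The sketch below is portable C++ with hypothetical names (PathKind, classifyPath); it is not how SHLWAPI works, and it ignores things like the \\?\ prefix. It simply separates the cases that the developers above were talking past each other about:

```cpp
#include <string>

enum class PathKind {
    Absolute,      // "C:\dir" or "\\server\share": nothing left to resolve
    RootRelative,  // "\dir": rooted, but on whatever the current drive is
    DriveRelative, // "C:dir": drive given, directory comes from that drive's cwd
    Relative       // "dir\file": depends entirely on the cwd
};

static bool isSep(wchar_t c) { return c == L'\\' || c == L'/'; }

static bool isDriveLetter(wchar_t c) {
    return (c >= L'A' && c <= L'Z') || (c >= L'a' && c <= L'z');
}

PathKind classifyPath(const std::wstring& p) {
    // Two leading separators: treat as UNC.
    if (p.size() >= 2 && isSep(p[0]) && isSep(p[1])) return PathKind::Absolute;
    // One leading separator: rooted on the current drive.
    if (!p.empty() && isSep(p[0])) return PathKind::RootRelative;
    // Drive letter plus colon, with or without a following separator.
    if (p.size() >= 2 && isDriveLetter(p[0]) && p[1] == L':') {
        return (p.size() >= 3 && isSep(p[2])) ? PathKind::Absolute
                                               : PathKind::DriveRelative;
    }
    return PathKind::Relative;
}
```

    A caller making a security decision can then insist on PathKind::Absolute, while a caller that only cares about "does this start at a root" can accept RootRelative as well.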

    Now when one wants to determine whether someone is a relative or not in real life, it is much more complicated, and the many tests include:

    • ABO blood group typing
    • HLA antigen testing
    • PCR (polymerase chain reaction) testing
    • RFLP (restriction fragment length polymorphism) testing
    • DNA testing

    What tests make sense depends a lot on the context -- is the goal to find out just if it is a relative? Or is it a check for parental identification?

    What one is prepared to accept as the answer will have a lot to do with which question one is really asking, and for what purpose....

    Now if one does want to call those "driveless" paths relative, then one can wrap the SHLWAPI PathIsRelative function with one's own additional logic, the same way USER32.DLL's Is* functions wrap the NLS GetStringTypeW function so they can provide their tweaks.

    So that people who want a slightly different bit of logic applied to "is this relative" have the chance to see that done.

    And I'll close this with a true story of a conversation that dates back several years with a friend of mine who was about to marry a single mother and the question of whether to adopt her kids came up:

    Doug: So you had to go through this too?
    Me: almost.
    Doug: Why almost?
    Me: One of the kids wanted me to adopt, the other two felt like it would be a betrayal, and it was mostly moot since the dad would never have gone for it and he had just started getting garnished for support so ironically in having the court acknowledge he was a deadbeat dad he actually had a say whereas he didn't before.
    Doug: And then entirely irrelevant after you didn't actually get married.
    Me: Also true. You have to read the fine print.
    Doug: Windows is easier. One call to SetParent and your problems are solved.
    Me: Actually, there are many times that the SetParent call is actually the start of even bigger problems....
    Doug: So Windows isn't really easier, it just looks that way.
    Me: Pretty much. The fine print gets you every time.

     

    This blog brought to you by P (U+0050, aka LATIN CAPITAL LETTER P)

  • Sorting it all Out

    Accelerator vs. Shortcut, revisited

    • 3 Comments

    In the before time, in the long long ago, when this blog was first finding its voice, I blogged two blogs:

    And a recent conversation with some localizers made me think about these two items from the point of view of a localizer.

    In a strangely ironic turn, LocStudio (the Microsoft internal and privished tool, mentioned previously) reportedly refers to both accelerators and shortcuts as accelerators.

    Anyway, the message I sent out to try and contrast these two items from the standpoint of localization went something like this:

    For shortcuts, localizers do need to translate the term for the key (e.g. CTRL vs. STRG), and they will usually put in the letter that appears on the keyboard layout the user of the language is most likely to be using for whatever is under VK_C (if they don't then the user will not know what the text means). The localizer has no way to change the behavior since it is VK-based and code mediated.

    For accelerators, localizers still need to translate the term for the key (e.g. CTRL vs. STRG). But for the letter itself, they are fully in control of the accelerator (on menu or dialog) and can change it to anything they like, with the choice usually based on a letter in the word they have chosen in the translation. The localizer has the complete control to change the behavior (assuming the developer did not hardcode the strings!), since it is character based and localizable text mediated.

    Localizers therefore have a much more controlling role for accelerators than for shortcuts; in the shortcut case they are forced to just be descriptive, and in rare interesting cases where the VK the code uses does not exist on the keyboard (a problem that has come up in the past), the localizer has no real choice since the application is broken when using that keyboard worldwide (and is just more noticeable in their language).
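    To show what "character based and localizable text mediated" means in practice, here is a small sketch that pulls the accelerator letter out of a menu or dialog string. It assumes the usual convention that '&' marks the letter and "&&" is a literal ampersand; mnemonicOf is a hypothetical helper name, and the code is portable C++ rather than anything USER32 actually exposes:

```cpp
#include <cstddef>
#include <string>

// Returns the character following the first unescaped '&' in the label,
// or 0 if there is none. "&&" is treated as a literal ampersand.
wchar_t mnemonicOf(const std::wstring& label) {
    for (std::size_t i = 0; i + 1 < label.size(); ++i) {
        if (label[i] != L'&') continue;
        if (label[i + 1] == L'&') { ++i; continue; } // skip escaped ampersand
        return label[i + 1];
    }
    return 0;
}
```

    The letter comes entirely from the string, so a localizer who changes "&File" to "&Datei" changes the behavior along with the text. A shortcut like CTRL+C, by contrast, lives in code as a virtual key and no string edit can move it.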

    Thus the fact that the main tool used by MS localizers calls them by the same term even though they play very different roles in the work of those same MS localizers is pretty tragically ironic....

    This blog brought to you by ණ (U+0dab, aka SINHALA LETTER MUURDHAJA NAYANNA)

  • Sorting it all Out

    Meanwhile over in Colemakville

    • 5 Comments

    Colemak devotees -- please control your enthusiasm; comments that get overly preachy will never be posted (as this is my pulpit, not yours!)....

    The keyboard without a Caps Lock key, also known as Colemak, has come up now and again here in posts like this one and this one and this other one.

    Anyway, it got its article deleted from Wikipedia again (the ASfDs for the article - here and here make for fascinating reading).

    Yes I am the Microsoft blog "source" Vquex is talking about:

    The only other sources cited are the article's own previous deletion debate here on Wikipedia and a blog entry by a Microsoft developer stating that they would not be including Colemak in Windows precisely because it is non-notable.

    Well, not exactly. But that is how the Wikipedia text put it too, and it is probably close enough.

    The Colemak forum discussion about all this is 24 posts long spanning about 10 months, and there is an interesting conversation there about me and how I am so polarized on the issue that I would somehow block the keyboard even if there was a valid business case for it. In fact that is the only reason I am writing this blog, to set the record straight. :-)

    As a side note, the million dollar keyboard challenge that actually raised under 200 euro was also interesting. I probably would have not named the contest in such a way had it been me....

    For the record:

    1) I am not the one true source of all the keyboardy goodness at Microsoft. I have a voice, and sometimes people want to hear it when I use it. But if there was a keyboard council (strictly speaking, there isn't), in my current role on the Windows International Fundamentals team I am hardly even an official member of it, let alone the only one with a valid set of opinions.

    2) My bias about specific people in a community who choose to annoy me is something I keep separate from when I am asked about a specific business decision. That separation is important to me and is really the only way I can act in a way that I can feel comfortable is ethically appropriate. So while the way no Colemak fan was able to comment without extolling the virtues of the keyboard is something that grew old so fast that I started trimming comments and will probably moderate comments in this post, that is an entirely separate issue from whether it makes sense for Windows to include a specific keyboard in the box.

    3) The more emotional/zealous in the Colemak community have polarized me about them, to be sure. But I could not list their names now if my life depended on it and wouldn't be looking them up to know them now unless my life did depend on it. So the next group have a clean slate, and the only reason I'll probably have the same reaction is that the same rhetoric will probably be used and my patience, while almost infinite in some cases, is astoundingly scant in this case.

    4) People who are reading here are unlikely to ask the question "Michael Kaplan who?" but outside of a somewhat narrow group of people in a couple of professions, I am sure it would be a common question. I would probably word it more strongly than they did, with a kind of a Reservoir Dogs-esque "Who the fuck is Michael Kaplan" if you know what I mean. Even inside of Microsoft, even if the topic is something like MSLU where I really am "the guy" and not looping me in can sometimes be mildly insane and a tremendous waste of everyone's time, it will sometimes be months before someone knows to loop me in. So when you take something like keyboards, especially when MSKLC is not directly involved, my involvement is not completely assured every time.

    5) Getting off me for a moment.... you can certainly hear Microsoft folks, from the most highly paid executives to the greenest of just started program managers, bandy words like innovation around. But innovation is not a place where the piece of Microsoft involved with software layouts distinguishes itself as an innovator -- because this is a type of decision that is designed to always be following what customers are using. If there was a huge customer demand for a brand new keyboard and that demand had objective and provable numbers behind it, then the hardware folks would be on board and everyone would have the keyboard they wanted. But these items are not created out of subjective interest, any more than they are killed for spite.

    And there it is -- if the Colemak folks convince the people, and then convince the hardware folks to create this keyboard by having the valid business case, then it will happen and no changes to the software layouts are needed.

    Note that this is the kind of innovative keyboard design that laptops of all kinds tend to do, for entirely different reasons.

    It is highly unlikely that Microsoft would ever in a million years just go off and add software layouts that would go against what is on the hardware keys. Microsoft isn't a college dorm with members trying to sneak in past the RA after curfew. The software layouts will follow the reality of what is expected on the hardware layouts -- and there is no Windows OEM who even has this on the radar as a reasonable thing to do.

    Colemak has not yet met that burden. Not by a long shot.

    Which is interestingly different but not entirely unrelated to why they can't get the article to stay on Wikipedia (though of course their burden is much lower than Microsoft's)....

     

    This blog brought to you by ဝ (U+101d, aka MYANMAR LETTER WA)

  • Sorting it all Out

    Using one Unicode input method at a time (Using the Unicode IME on XP)

    • 0 Comments

    So back in the beginning of 2008, Harold Fuchs (in response to my Typing in random Unicode code points blog from almost 2.5 years prior) asked over in the Suggestion Box:

    The method you described in May 2005 ("Typing in random Unicode code points") for entering Unicode characters by installing the Chinese language and IME simply doesn't work on my Win XP Pro + SP2 system.

    The idea is that you select Chinese and then, with Numlock turned on, type the decimal character code on the numeric keypad while holding down the Alt key. I've tried it in IE7, Outlook Express 6 and Wordpad. No joy. For example, the decimal code 10003 (hex 2713) should produce a tick (check mark in American English). It doesn't. It merely produces a double exclamation mark, the same as if Chinese were not even installed, let alone selected.

    Please, what have I done wrong?

    It took me a minute to try and follow this, then I realized the problem.

    My earlier blog started with some simple introductory text:

    People ask all the time how they can type in random Unicode data.

    Some people point out the vast array of supported Keyboard Layouts on Windows.

    Others point out how you can create your own keyboards with MSKLC.

    Still others talk about fancy things you can do with the numeric keypad.

    And then still others like to go on about typing a code point value in Word, highlighting it, and then hitting <Alt+X>.

    Personally, I like to just install the Unicode IME, first added for Traditional Chinese in Windows 2000 and available in every version of Windows since then.

    I then went on to explain how to install and use the Unicode IME.

    Of course, note that the "every version of Windows since then" claim stopped being true after that blog was posted: they stopped including it, beginning in Vista (as I explained here when I first gave a possible alternative!).

    Now my post a couple of months later (Typing in random Unicode code points redux) links to How to enter Unicode characters into Microsoft Windows which gives lots of ways to get the input done, also.

    But for XP SP2, the text in the original article is valid.

    The problem is that Harold was taking more than one of the five different methods I listed that can all be used to enter Unicode code points, and combining them. So when it didn't work, the entire article that happened to enumerate five completely different methods was considered incorrect. :-(

    Taking a closer look:

    The idea is that you select Chinese and then, with Numlock turned on,

    No, this part is not needed and therefore not really right. :-(

    type the decimal character code on the numeric keypad

    No, you use hexadecimal code units, not decimal.

    while holding down the Alt key.

    No, this will make it not use the Unicode IME at all. This will use the numpad method (hinted at in the introductory part of that blog).

    I've tried it in IE7, Outlook Express 6 and Wordpad. No joy.

    Kind of expected under the circumstances. :-(

    For example, the decimal code 10003 (hex 2713) should produce a tick (check mark in American English).

    Actually, unless you add special information to the registry, the numpad method only uses decimal and only takes four decimal digits. So U+2713 (CHECK MARK) would not be produced in this case.

    It doesn't. It merely produces a double exclamation mark, the same as if Chinese were not even installed, let alone selected.

    I am not sure how U+203c (DOUBLE EXCLAMATION MARK) came out of here, but the Unicode IME wasn't being used (and it never uses decimal values here).
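    The decimal/hexadecimal mix-up is easy to check in code. The sketch below is portable C++ with hypothetical helper names. The last function encodes a guess on my part, not something the post or the report confirms: if a plain Alt+numpad entry keeps only the low byte of the number typed, then 10003 turns into 19, and position 19 in the OEM code page 437 glyph set happens to be the double exclamation mark.

```cpp
#include <string>

// Parses a string of hexadecimal digits (e.g. "2713") into a code point value.
unsigned long parseHexCodePoint(const std::string& hex) {
    return std::stoul(hex, nullptr, 16);
}

// Encodes a Basic Multilingual Plane code point (up to U+FFFF) as UTF-8.
std::string toUtf8Bmp(unsigned long cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Assumption: a plain Alt+numpad entry keeps only the low byte of the
// decimal number typed.
unsigned lowByteOfAltCode(unsigned long decimalTyped) {
    return static_cast<unsigned>(decimalTyped % 256);
}
```

    So "2713" typed as hexadecimal into the Unicode IME gives code point 10003 (CHECK MARK), while 10003 typed as decimal with Alt held down would, under that assumption, be treated as 19.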

    So, the solution is definitely to keep the different methods separate and just use one of them -- for XP my preference is still to use the Unicode IME, which if you think about it is the only method I gave more than hints at in terms of instructions for use. :-)

     

    This blog brought to you by ✓ and ‼ (U+2713 and U+203c, aka CHECK MARK and DOUBLE EXCLAMATION MARK)

  • Sorting it all Out

    So logical that even Mr. Spock (and my fiancée?) would approve

    • 6 Comments

    Regular readers may recall having seen the blog entitled Somehow I just get a Visual of the Logical Song (as sung by Supertramp) a few months ago.

    Not everyone is convinced fully, just yet.

    So I thought I'd add some more information to the mix....

    Let's say you go to Regional and Language Options. I'm going to do it here in XP but you can use whatever version you like....

    First change the Standards and Formats locale (aka default user locale) to Arabic - Saudi Arabia. Do not click Apply.

    But do click that Customize... button and switch to the last tab (the Date tab):

    Here is that same view we talked about before, where the logical and the visual varied.

    So that dd/MM/yyyy seemed to look like yyyy/MM/dd in the right-to-left context.

    Well let's go to the Numbers tab.

    Change the Digit substitution setting from Context to National. This time you DO click Apply:

    And now go back to the Date tab:

    Do you see it? The context of the expression (LTR vs. RTL) determines the order even for almost entirely Arabic text.

    But the format string, by convention, is always left-to-right.
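    To make that convention concrete, here is a minimal Python sketch. It is not the Windows implementation: `format_date` and `ARABIC_INDIC` are hypothetical names, and the only point is that the pattern and the resulting string are stored in logical order (day, then month, then year), whatever a bidi-aware renderer later decides about the visual order.

```python
from datetime import date

# Hypothetical helper names; this only illustrates logical (storage) order.
# Arabic-Indic digits U+0660..U+0669, built from code points to avoid any
# ambiguity about their order in the source file.
ARABIC_INDIC = str.maketrans(
    "0123456789", "".join(chr(0x0660 + i) for i in range(10))
)


def format_date(pattern: str, d: date, national_digits: bool = False) -> str:
    """Expand a dd/MM/yyyy-style pattern in logical order."""
    out = (pattern.replace("yyyy", f"{d.year:04d}")
                  .replace("MM", f"{d.month:02d}")
                  .replace("dd", f"{d.day:02d}"))
    return out.translate(ARABIC_INDIC) if national_digits else out


d = date(2008, 6, 25)
latin = format_date("dd/MM/yyyy", d)
national = format_date("dd/MM/yyyy", d, national_digits=True)

# In memory the day always comes first, regardless of digit shapes.
print([f"U+{ord(c):04X}" for c in national[:2]])
```

    Switching the digit shapes changes nothing about the stored order; only the display can differ.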

    Now you may argue that this is confusing, or wrong.

    But after being this way in just about every version of Windows that has ever shipped an Arabic locale, even if you were right (a point that I would probably not agree with you about unless we were in public and were going to get married (as I think it is best to never disagree with one's fiancée in public!)), changing the behavior in Windows would require changing the expectations of a lot of different people.

    Not something to do lightly, if you know what I mean. :-)

    I think I could even convince the fiancée about this last part, since I could point out that we don't want to make all of those people feel like they have been wrong all of these years. Since she is obviously an intelligent and sensitive (if stubborn) woman, she would never want to cause that much confusion....

     

    This blog sponsored by غ (U+063a, aka ARABIC LETTER GHAIN)

  • Sorting it all Out

    Back to Sri Lanka (conceptually)

    • 4 Comments

    I've been blogging about Sinhalese keyboards on and off for some time.

    Like in November of 2005 when in Custom keyboard, custom language? I bemoaned the lack of ability to extend the language list in MSKLC 1.3, which blocked Madhava Temmakoon's efforts to properly label the keyboard.

    Or in August of 2006 when in Creation of transliterating input methods I agreed with Thakara's conclusion that any regular keyboard such as those created by MSKLC is ultimately insufficient for languages like Sinhalese, at least in terms of usability. And this includes the input support in Vista and Server 2008, by the way.

    There are experts in the government in Sri Lanka who agree with all of this -- the distance between

    • (barely) internationally tolerable, and
    • internationally sufficient, and
    • internationally delightful

    is rather vast, from a language/linguistic standpoint.

    The fact that the latest versions of Windows are only at the first stage is a little embarrassing. :-(

    But taking up this cause, I have two sets of data that can be plugged into the framework from the Table Driven Text Service discussed in that so far 11-part but still continuing series of blogs:

    Now between the data that Thakara sent me, and another set from another source, and the Tamil one that I still need to post (which is also relevant to Sri Lanka), I have the data almost together for three layouts covering two languages used in Sri Lanka.

    Starting tomorrow, I am going to jump back in to the Table Driven Text Service once again, with these three new ones to add to the list of samples and examples I have mentioned in prior blogs:

    I am pretty excited about this. How about you? :-)

     

    This blog brought to you by ඣ (U+0da3, aka SINHALA LETTER MAHAAPRAANA JAYANNA)

  • Sorting it all Out

    Reliving life in the time before ASCII?

    • 6 Comments

    Over in the Suggestion Box, regular reader Jan Kučera asks:

    The other day, I was thinking... what would have Michael done differently, if there was no ASCII yet? If he could design encoding from scratch given all he knows today?

    I still have some doubts whether I should ask that but let's try to place the question... so any thoughts?

    Wow, I am sure there are many of the Unicode elders who have some strong opinions about this one!

    If we were starting from scratch and it were up to me, I don't think I'd try to show too much imagination -- the requirements of being compatible with every country and company out there might still be around.

    For all of the various ways of doing things in Unicode, whether one is thinking about normalization or combining classes or canonical equivalence or casing or properties -- really anything -- you can look at Unicode and see many of the wear marks as you see things done one way in some cases, and in another way in others.

    Or maybe Jan is really asking even before that -- literally before there was even ASCII?

    That is even scarier -- I have a hard time believing that I (or really anyone) would be able to convince all of the people to think ahead to the need to support every script in the world, past and present. Sure I could make the arguments, but I doubt anyone would be willing to listen.

    So in the end, I think I would let things unfold as they did but concentrate on all of the weird cases where people now look at Unicode and just point out things that could have been done better -- and just do those things better from the start. With the benefit of hindsight before things have happened, encoding would probably be a lot cleaner, if nothing else. :-)

    You could probably fill in computer language names or runtime library implementations or API definitions or many other items here too for this kind of hypothetical -- would someone starting Win32 from scratch do it all the same? Probably not -- many of the "warts" that come from the project having been around so long and worked on by so many people would be able to be fixed -- and then we would just wait for the new warts to form once we run out of hindsight!

    Once I ran out of hindsight, I (and everyone else in this "secret knowledge" club) would be required to retire or prove that they deserve to stay. Because being able to look at a problem in retrospect and discern the best solution is a great skill, but is NOT the same skill as that involved with making new decisions that have no prior precedent, and assuming that the rock stars in one space will even be competent in the other is likely to hasten the creation of the new warts.

    Lest we forget, it is our rock stars we have now who wrote most of the existing warts -- along with a lot of good work, too!

     

    All of the characters in Unicode, owing their existence to Unicode existing in its current form, have jointly agreed to not sponsor this post. The latest attempt to create a "Character's Union" was narrowly defeated and everyone breathed a sigh of relief as the Cyrillic Local AFL-CIO chapter was not able to be formed...

  • Sorting it all Out

    "I work in Windows, but for this conference I'm here for SQL Server...."

    • 0 Comments

    I am back home in Redmond after an exciting week in sunny Orlando, most of the time spent at Tech·Ed....

    The total cost for me was nothing -- $75 and some miles (as I pointed out before). I would have stayed for the second week too -- if only to see Kim Tripp and some of the other SQLS talks that make me totally dislike the dev/itpro split since many of them really are crucial for developers -- but (despite the fact that I was offered staff credentials for the second week too) I didn't have a free place to stay for the second week, and I really need to get back to work.

    And since the trip was taken as vacation time, I have to be careful not to run out of that!

    A not insignificant number of the attendees I talked to in the SQLS section bemoaned the consequences of the split, and I see Kim mentioned it yesterday, too.

    Several Outlook/Exchange people expressed similar concerns, because there is a lot of overlap for them, too.

    Though for many other technologies, it was not so much of a problem -- the split just worked out better for others, I think....

    Perhaps the lack of much in the way of DEV week globalization content (which ends up being important for developers, for reasons that blend into both performance and security), and the fact that most of the relevant content is in the IT PRO week despite clearly being developer topics, will lead to improvements in future years.

    I had brought my blue Microsoft event shirts in anticipation of the staff thing, and it worked out well. I'll provide links to my Tech*Talk and the excellent panel as soon as they become available....

    It's funny, almost no one outside of Microsoft I talked to was really curious about where specifically in Microsoft I worked, while nearly everyone inside of Microsoft I met had it as just about the first question they asked.

    Just about every single time.

    The text in the title, that was my answer:

    I work in Windows, but for the moment I'm here for SQL Server....

    Several people expressed curiosity about that answer, so I'd explain how I used to work on the SQL Server team, and how I had been helping people out there more recently for SQL Server 2008, in a not-entirely-unsanctioned extension of my current role.

    Some would wonder why I wasn't at Tech·Ed for my own group, and I would point out that very few people/groups from Windows were there at all, really in any Tech·Ed. Certainly none of the learning center booths staffed by so many blue-shirted MS folks (with the exception of a few technologies like Hyper-V). Windows developers were really handled under the Developer Division -- whether through the VS folks or the various programming languages or platforms or libraries.

    So in a world where so few people are there from Windows at all, the idea of having folks there from the International group would seem a little odd.

    The IT PRO week might be different -- the deployment issues with MUI, for example, would probably attract more than a few interested customers. Someone might even be heading to Orlando for that reason in week two of Tech·Ed. But I wasn't going to stick around for that, either.

    Not just for the above reasons, but also because then it wouldn't be vacation from my job. And someone might have something to say about me representing the group just because I decide to without so much as an FYI to the PTB in the group....

    Plus between daytime on the floor and evenings at receptions/parties, there was not much time left for other things (like sleep).

    Though I have decided that going to Tech·Ed presentations that are videotaped is mostly a bad use of an attendee's resource.

    I mean attendees will get a DVD of all those presentations that they can watch any time, after all!

    So unless you have specific questions of the presenter that you want to ask while the session is happening or you want to heckle and/or provide moral support for the speaker, the time is much better spent in the various other presentations that are going on all over the floor throughout the week, and in the exhibition hall talking to vendors, and in the learning center talking to Microsoft and other experts in the various technology areas.

    You know, all of the things that you can't just play back later.

    There were lots of really fascinating questions that I helped people with. In lots of areas -- e.g. database design and replication and of course collations, and a lot of interesting questions related to Unicode and Unicode data types.

    There were even several fans of the blog there, some who even waved as I suggested. :-)

    And I helped some people with their slides, too. Easy when friends are giving talks and I never mind it.

    Somehow I also managed to miss Kate Gregory and Julie Lerman (who were there) all week, which was also sad. I'm sure we'll run into each other again eventually, but some improvement on the timing thing seems like a must!

    A lot of the technical topics discussed in the tech*talk, the panel, and the various conversations will probably show up in future blogs, here. But you probably already knew that....

    Many have asked me why I was offline all week -- that was unavoidable given the lack of connectivity in my hotel room (and the fact that I wasn't in it much, anyway!). But I'll be back to blogging next week.

    So was it a vacation? Abso-freaking-lutely, from beginning to end. Pure play and a ton of fun, from beginning to end!

     

    This blog brought to you by ⧜ (U+29dc, aka INCOMPLETE INFINITY)

  • Sorting it all Out

    How does it work? It cheats, that's how!

    • 5 Comments

    The question developer Cynthia sent via the Contact link was a fun one:

    How can I program the "Fn" key on my laptop keyboard via MSKLC?

    The answer is that you really can't.

    Because the Fn or Function key has a very special job -- its job is to cheat.

    Laptop keyboards are smaller, and they have fewer keys on them than they claim to. They get away with that by having a key that can transform other keys, temporarily changing their identity!

    I'll show you what I mean....

    First we'll go to my Dell, taking that EXE from Handling [Unicode] input in the console to sniff out the keys like I mentioned earlier today.

    I'll type some keys that are on the keyboard and identify them right here so you can see what that EXE tells us. It tells us nothing about the Fn key -- that key does not exist as far as Windows is concerned:

    ReadConsoleInput test
    Ctrl-D to quit.

    #      UC    u/d  VK   SC  State

      0: U+0000 down 0025 004b 0100    <-- Left Arrow  (VK_LEFT)
      1: U+0000  up  0025 004b 0100
      2: U+0000 down 0026 0048 0100    <-- Up Arrow  (VK_UP)
      3: U+0000  up  0026 0048 0100
      4: U+0000 down 0027 004d 0100    <-- Right Arrow  (VK_RIGHT)
      5: U+0000  up  0027 004d 0100
      6: U+0000 down 0028 0050 0100    <-- Down Arrow  (VK_DOWN)
      7: U+0000  up  0028 0050 0100
      8: U+0000 down 0024 0047 0100    <-- Home  (VK_HOME)
      9: U+0000  up  0024 0047 0100
     10: U+0000 down 0021 0049 0100    <-- Page Up  (VK_PRIOR)
     11: U+0000  up  0021 0049 0100
     12: U+0000 down 0023 004f 0100    <-- End  (VK_END)
     13: U+0000  up  0023 004f 0100
     14: U+0000 down 0022 0051 0100    <-- Page Down  (VK_NEXT)
     15: U+0000  up  0022 0051 0100
     16: U+0008 down 0008 000e 0000    <-- Backspace  (VK_BACK)
     17: U+0008  up  0008 000e 0000
     18: U+0000 down 002e 0053 0100    <-- Delete  (VK_DELETE)
     19: U+0000  up  002e 0053 0100

    Now of these 10 keys, note that half of them do not exist as dedicated keys on my MacBook Pro.

    What's a guy who likes his Mac to do?

    No worries, because I can use my Fn key!

    Here is the same output on the MacBook Pro (booted into 64-bit Vista and humming along quite happily):

    ReadConsoleInput test
    Ctrl-D to quit.

    #      UC    u/d  VK   SC  State

      0: U+0000 down 0025 004b 0100    <-- Left Arrow  (VK_LEFT)
      1: U+0000  up  0025 004b 0100
      2: U+0000 down 0026 0048 0100    <-- Up Arrow  (VK_UP)
      3: U+0000  up  0026 0048 0100
      4: U+0000 down 0027 004d 0100    <-- Right Arrow  (VK_RIGHT)
      5: U+0000  up  0027 004d 0100
      6: U+0000 down 0028 0050 0100    <-- Down Arrow  (VK_DOWN)
      7: U+0000  up  0028 0050 0100
      8: U+0000 down 0024 0047 0100    <-- Fn + Left Arrow  (VK_HOME)
      9: U+0000  up  0024 0047 0100
     10: U+0000 down 0021 0049 0100    <-- Fn + Up Arrow  (VK_PRIOR)
     11: U+0000  up  0021 0049 0100
     12: U+0000 down 0023 004f 0100    <-- Fn + Right Arrow  (VK_END)
     13: U+0000  up  0023 004f 0100
     14: U+0000 down 0022 0051 0100    <-- Fn + Down Arrow  (VK_NEXT)
     15: U+0000  up  0022 0051 0100
     16: U+0008 down 0008 000e 0000    <-- Backspace (Delete)  (VK_BACK)
     17: U+0008  up  0008 000e 0000
     18: U+0000 down 002e 0053 0100    <-- Fn + Backspace (Delete)  (VK_DELETE)
     19: U+0000  up  002e 0053 0100

    Pretty clever, huh? Even though Windows can't see the Fn key, the keyboard can. And the hardware can fake it out and make Windows think that one key is another, as needed.

    As far as Windows knows, the key that was pressed really is that totally different key....
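    The two logs above can be summarized as a lookup that lives in the keyboard hardware, not in Windows. The following Python sketch is only a model of that idea: the `FN_REMAP` table and `key_event` function are hypothetical names, while the virtual-key and scan code values are the ones shown in the output above. The Fn state is consumed before anything is reported, so the event Windows sees carries no trace of it.

```python
# (virtual key, scan code) pairs taken from the ReadConsoleInput output above.
KEYS = {
    "Left": (0x25, 0x4B), "Up": (0x26, 0x48),
    "Right": (0x27, 0x4D), "Down": (0x28, 0x50),
    "Home": (0x24, 0x47), "PageUp": (0x21, 0x49),
    "End": (0x23, 0x4F), "PageDown": (0x22, 0x51),
    "Backspace": (0x08, 0x0E), "Delete": (0x2E, 0x53),
}

# Hypothetical model of the MacBook Pro's Fn layer described in the post.
FN_REMAP = {
    "Left": "Home", "Up": "PageUp", "Right": "End",
    "Down": "PageDown", "Backspace": "Delete",
}


def key_event(physical_key: str, fn_held: bool = False) -> tuple[int, int]:
    """Return the (VK, scan code) pair reported to the OS for a key press."""
    reported = FN_REMAP.get(physical_key, physical_key) if fn_held else physical_key
    return KEYS[reported]


# Fn + Left Arrow is reported exactly as a dedicated Home key would be.
print(key_event("Left", fn_held=True) == key_event("Home"))
```

    Nothing in the returned pair says whether Fn was involved, which is why a keyboard layout built in MSKLC has nothing to hook.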

    Now if they could just use the Fn key to fix up the whole "the key types | (U+007c, a.k.a. VERTICAL LINE) yet printed upon its face is ¦ (U+00a6, a.k.a. BROKEN BAR)" issue!

     

    This post brought to you by ¦ (U+00a6, a.k.a. VERTICAL LINE)

  • Sorting it all Out

    Sometimes, GDI respects users (even if no one else does!)

    • 4 Comments

    Regular readers know how I am always encouraging, cajoling, and sometimes even threatening developers to respect the user's locale settings choices.

    Not everyone is of a mind to do this, however....

    Like just the other day when someone was sending me mail about a problem they were having:

    We found that the GDI uses the user locale regardless of  the current thread locale for rendering. Data gets formatted using Current thread locale but at last moment the digit replacement happen using the user locale. Is there a strong reason behind this inconsistent behavior or is this a bug in GDI.

    Of course formatting behavior, via functions like

    • GetCurrencyFormat
    • GetCurrencyFormatEx
    • GetDurationFormat
    • GetDurationFormatEx
    • GetNumberFormat
    • GetNumberFormatEx
    • GetTimeFormat
    • GetTimeFormatEx

    is based on the locale you give it, and there is no constant that tells these NLS API functions to "use the thread locale" (though there are ones that will look up the user locale for the caller if requested). :-)

    So that text in red above is not really accurate, unless there is some rogue library or program calling GetThreadLocale and passing the results to one of the non-Ex functions in the list above....

    Because the truth is that GDI doesn't give a crap about formatting or really anything related to locales, with one single exception:

    Digit Substitution

    Any time you go to render text it will grab those digit substitution settings in the user locale (including the user override information) and use the info to decide how to display numbers.

    And there is no way to override those settings at the level where GDI uses them, like there is (kind of) for Uniscribe (ref: And the digits just keep on coming).

    Well, there is one way....

    If you are doing your GDI rendering via ExtTextOutW, then there are two flags that have been around since Windows 95:

    ETO_NUMERICSLATIN Windows 95 and Windows NT 4.0 and later: To display numbers, use European digits.
    ETO_NUMERICSLOCAL Windows 95 and Windows NT 4.0 and later: To display numbers, use digits appropriate to the locale.

    Basically these are the (very differently named!) equivalents of GetLocaleInfo's LOCALE_IDIGITSUBSTITUTION settings:

    1 No substitution is used. This gives full Unicode compatibility.
    2 Native digit substitution. National shapes are displayed according to LOCALE_SNATIVEDIGITS.

    So you can override the assumption of the GDI code that the user locale preferences are to be respected if you want to, by calling low-level functions and passing one of those two ETO_* flags.
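    As a rough model of that decision (a simulation in Python, not GDI itself), the sketch below treats the user's digit substitution setting as the default and lets the two ExtTextOutW flags override it. The flag values are assumed from wingdi.h, and `render_digits`, `SUBST_NONE`, `SUBST_NATIVE` and `NATIVE_DIGITS` are hypothetical names; the Context setting is left out to keep the sketch short.

```python
ETO_NUMERICSLOCAL = 0x0400  # value assumed from wingdi.h
ETO_NUMERICSLATIN = 0x0800  # value assumed from wingdi.h

SUBST_NONE = 1    # LOCALE_IDIGITSUBSTITUTION: no substitution
SUBST_NATIVE = 2  # LOCALE_IDIGITSUBSTITUTION: native digit substitution

# Stand-in for LOCALE_SNATIVEDIGITS of an Arabic locale (Arabic-Indic digits).
NATIVE_DIGITS = "".join(chr(0x0660 + i) for i in range(10))


def render_digits(text: str, user_setting: int, flags: int = 0) -> str:
    """Model which digit shapes end up on screen for a string of ASCII digits."""
    if flags & ETO_NUMERICSLATIN:
        use_native = False   # caller forces European digits
    elif flags & ETO_NUMERICSLOCAL:
        use_native = True    # caller forces locale digits
    else:
        use_native = user_setting == SUBST_NATIVE  # user preference wins
    if not use_native:
        return text
    return text.translate(str.maketrans("0123456789", NATIVE_DIGITS))


print(render_digits("2008", SUBST_NATIVE))
print(render_digits("2008", SUBST_NATIVE, ETO_NUMERICSLATIN))
```

    Without a flag the user's choice decides the outcome, which is the whole point: the caller has to go out of its way to disrespect it.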

    But there is no way to make it easier than that, sorry. And not because the OS respects me and my advice, but because sometimes the only way the user will get any respect is if we force developers to give it to them.... :-)

    This behavior is completely by design.

     

    This blog brought to you by R (U+0052, aka LATIN CAPITAL LETTER R)

  • Sorting it all Out

    Thirdly, aka Forty two, aka Understanding the answer can require a properly defined question

    • 0 Comments

    Continuing on with the third point from A whole new spin on the term 'Vertical markets' (aka in SiaO we trust?), the one that had not much to do with typography but was a bit more all-encompassing....

    And much more to do with me personally than with anything else.

    Warning -- an overly introspective post that is too technical to be a worthwhile Potpourri read and too navel-gazing to be a worthwhile technical read. It's just therapeutic for me so you can probably skip!

    It has to do with the nature of the work in Windows International.

    I am once again drawn back to The Hitchhiker's Guide to the Galaxy:

        "It was a tough assignment," said Deep Thought mildly.
        "Forty-Two!" yelled Loonquawl. "Is that all you've got to show for seven and a half million years' work?"
        "I checked it very thoroughly," said the computer, "and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question is."
        "But it was the Great Question! The Ultimate Question of Life, the Universe, and Everything," howled Loonquawl.
        "Yes," said Deep Thought with the air of one who suffers fools gladly, "but what actually is it?"
        A slow, stupefied silence crept over the men as they stared at the computer and then at each other.
        "Well, you know, it's just Everything... everything..." offered Phouchg, weakly.
        "Exactly!" said Deep Thought. "So once you know what the question actually is, you'll know what the answer means."

    This will seem relevant in a moment, I think....

    I have spent most of my career at Microsoft with a focus on internationalization, and felt myself being drawn to Windows International all of that time, because I felt that it really was the center of the universe, speaking from the standpoint of Microsoft and internationalization, at least.

    And then eventually I ended up there, and I realized I was right -- it is the center of that universe. It is where the support comes from, where it all starts. Where, if people are not using our stuff or building on our stuff in one way or another then they are likely to be doing it wrong.

    But the problem, and what led me to the changes I mentioned in Track change (a.k.a. A new job that has a few things in common with the old one), is that while I was sitting in the center of this universe, it is by and large just where things start -- not where things happen.

    Where things happen is further out from the center or source of things, in the applications from both Microsoft and the ISVs outside of it.

    The Windows International Fundamentals effort in its purest and most original form (the form that the GM explained to me when I voiced my concerns about the increasing distance between where things start and where they happen) was really intended to be an effort to communicate with those places where things are happening and making sure that they are on course and doing things right. Helping them out in making sure things are happening right.

    And that work is incredibly rewarding since to be honest (to go back to that Duvall quote I cited) this is an area where I feel like I do my best work when I am moving through their world but keeping in mind all of the time that it is their world. I don't feel quite as far from them when I am assisting them....

    But (and you know there had to be a but since the original allusion talked about this all having a 'disturbing' aspect!), the group really seems to be much more Windows focused, most of the time. When I work with groups in Microsoft outside of Windows or customers outside of Microsoft (both groups of which I think make up a large percentage of the SiaO demographic), I am not being told it is a bad thing, but it does feel like it is being treated like it is outside my actual job, but tolerated.

    Kind of like that India trip now that I think about it.

    And that extends to conferences I speak at or meetings I go to -- I am asked "Isn't that a program manager's job?" or whatnot. Maybe it is, but none of them were stepping up and I was doing those things before I was hired, and was pointedly told when I got the job originally that these things were recognized as being important and they wanted those things to keep happening. But now it is treated the same way -- as being outside of my actual job, but tolerated. Just barely, some days.

    Now I am not claiming that Windows is perfect here, or that it needs no help in improving, but I do think that Windows is not the only piece of this puzzle, and I signed up for a great deal more. In the end, Windows is a platform, a foundation. And just as with the foundation under the house, it is doing its best when we aren't thinking about it. Which leaves all the times that we actually are....

    International behavior in Excel/Word is roughly 42 million times more important than Calculator or WordPad/Notepad.

    And SQL Server's (or Access's) use of collation is roughly 56 billion (the British billion) times more important than the way files are ordered being based on the user locale.

    And what the keyboards do in Word is the only thing customers need in order to decide if the keyboard is broken -- behavior in Notepad is of trivial interest by comparison and only occurs to them when we ask them to try it.

    Windows isn't where the important stuff happens most of the time, it is what sits underneath the important things. Which still makes it important, but not the only thing that is important. Because it is not what most customers really see when it comes to international features (even when those features entirely depend on the OS).

    Not all of my colleagues share that same view about the importance though (and not everyone likes my approach to the whole situation, for reasons both good and bad, all of which I accept as valid even if they won't share that feedback with me), and it is hard to gauge how much of what I do amounts in the end to the same sort of strategic miscalculation as Loonquawl and Phouchg were guilty of -- working so hard to find the answer that I did not make sure that the question was being framed correctly.

    Well, that isn't entirely right -- when I originally was thinking about issues like the ones in Open it all up, get out of the way, and then what happens? and Subsets of subsets of subsets of subsets of subsets, I was working to frame the question how I imagined it. But posts in this blog do not amount to team mandates, at least not in this case -- since it is my team in the sense that I am a member, not in the sense of ownership. And when people in Windows have to allocate headcount and budget and resources they can hardly be faulted for framing the question differently than I would, from where I sit.

    I was momentarily stumped.

    But then I had a conversation with someone very wise (way wiser than me, in fact!). A VP in an entirely different business unit and division of Microsoft who I knew from way back when.

    With their help, I had an epiphany.

    Two years from now, if I am asked to identify some of the most important work I did two years prior -- for myself, for my interests, for my passions, for Windows International, for Microsoft, and most importantly for customers -- a lot of it is stuff I can't talk about yet that [technically by the above definition] falls beyond the scope of my job, as some people see it. The distance therefore between the job and what makes the job worthwhile for me is also pretty big.

    So I do think I am doing the right thing by and large, and I'll certainly keep on trying to be doing it.

    But I have to wonder what it is has been costing me from a career standpoint to have my [sometimes meager, other times substantial] efforts tolerated rather than embraced.

    So taking a step back, I let excerpts of the words of the prophet Gavin DeGraw wash over me (obligatory YouTube ref for people who don't have the album here):

    ...Part of where I'm going is knowing where I'm coming from

    I don't want to be anything other than what I've been tryin to be lately
    All I have to do is think of me and I've peace of mind
    I'm tired of looking 'round rooms wondering what I gotta do
    Or who I'm supposed to be
    I don't want to be anything other than me....

    ...I'm surrounded by identity crisis everywhere I turn
    am I the only one who's noticed
    I can't be the only one who's learned...

    ...I came from the mountains
    the crust of creation
    My whole situation made from clay dust stone
    and now I'm telling everybody

    I don't want to be...

    And I have come up with a plan, I think.

    I need to frame the whole problem a bit differently -- with my management and the people in the group -- going forward, pulling the bulk of these extra-occupational pieces that are accepted as personal idiosyncrasy in until they are part of my job. Because they are important to Microsoft's customers, and whether I want to blame myself for not doing something sooner or my management for letting it happen, in the end I own my own happiness. And my own career.

    Therefore in the end, if I want them to intersect, I own that, too.

    Plus other random things of dubious value about language and music and a MacBook Pro and more, in a Blog called Sorting it all Out...

    I assume I will probably still have a job by next Monday, but just in case I don't I think this blog would make for a hell of a coda.

     

    This post brought to you by ䷃ (U+4dc3, aka HEXAGRAM FOR YOUTHFUL FOLLY)
