September, 2010

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Office skipped version "13" (makes you wonder what they will do in 2013!)

    For a long time, I have been talking about how much LCIDs suck.

    Blogs like Your LCID sucks and It is true that your LCID sucks, but your LANGID sucks more, in particular, try to make the point.

    The underlying practical issues are often referenced too, in blogs like Why do LCIDs skip around so much? and Walking off the end of the eighth bit.

    And all of that is true.

    But I think Jon Stewart's Rally To Restore Sanity has had a little bit of impact on me.

    Although it is a really good idea to stop depending on LCID and LANGID numbers, the truth is that some people and more importantly some tools still do depend on them.

    They will never be as completely flexible and as able to conform to standards and as extensible as using locale names, but for the languages that Microsoft itself has to support, the tendency to keep allocating LCID values is one that cannot simply stop, because there are quite simply too many internal tools and processes that break if they do not have these numbers to work with.

    When I point out problems like Walking off the end of the eighth bit, I am falling into the trap of those who would support the literal rhetoric of Stephen Colbert's March To Keep Fear Alive.

    I am fear-mongering, making you afraid of what might break based on the code people might have created years ago that would break once Microsoft ran out of 8-bit LCID values.

    Which kind of ignores the fact that the most worrying of them, conceptually -- 0x0501 -- has already been allocated, for pseudo, and thus no actual locale will be mistaken for Arabic.

    And which also ignores the fact that if there are other, similar "dangerous" LCIDs then they can simply be skipped, just like Office skipped the number thirteen in their version releases (makes you wonder what they and other Microsoft products plan to do in 2013, doesn't it?).

    And which also ignores the fact that while the LCID model cannot scale to every language in the Ethnologue or to any arbitrary custom locale, it can certainly scale to the (comparatively modest) number of locales and user interface languages that Microsoft has to support in its products.
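    The "eighth bit" worry can be made concrete with a minimal sketch. The two macros below mirror the PRIMARYLANGID and SUBLANGID definitions in winnt.h (low 10 bits for the primary language, high 6 bits for the sublanguage); the function names and the LANG_ARABIC_ID constant are hypothetical names made up for illustration. The legacy check stands in for old code that assumed a primary language always fits in 8 bits.

```c
#include <stdint.h>

/* Bit layout of a LANGID, mirroring PRIMARYLANGID and SUBLANGID in winnt.h:
   the low 10 bits are the primary language, the high 6 bits the sublanguage. */
#define PRIMARY_LANG(langid) ((uint16_t)((langid) & 0x3ff))
#define SUB_LANG(langid)     ((uint16_t)((langid) >> 10))

/* Same value as LANG_ARABIC in winnt.h; renamed here to keep the sketch
   independent of the Windows headers. */
#define LANG_ARABIC_ID 0x01

/* Hypothetical legacy check: assumes the primary language fits in 8 bits. */
int looks_arabic_legacy(uint16_t langid) {
    return (langid & 0xff) == LANG_ARABIC_ID;
}

/* Check that uses the full 10-bit primary language field. */
int looks_arabic_correct(uint16_t langid) {
    return PRIMARY_LANG(langid) == LANG_ARABIC_ID;
}
```

    With these definitions, 0x0501 (the pseudo locale) passes the legacy check because its low byte is 0x01, while the 10-bit check sees a primary language of 0x101 and rejects it. An actual Arabic value such as 0x0401 passes both.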

    Just remember to keep the following in mind:

    • Every new locale that gets an LCID will have a name, too;
    • The name is probably the only thing you can use to work on other platforms that you may need to communicate with;
    • In the long run it is where you really ought to be going.

    and I will stop trying to use scare tactics to convince people.

    We need to restore sanity, not keep fear alive....

  • Sorting it all Out

    Dotting the t's and crossing the i's is more work than that, PDF edition

    Regular readers may recall recent blogs discussing problems with PDF in complex scripts like Beauty isn't only glyph deep and Beauty isn't only glyph deep, even for Microsoft and Acrobat PDF: the Yugo vs. the BMW vs. the Ferrari and Providing more information is the best way to assure correct information is received.

    I thought it might be important to follow up on one of the things I felt Adobe should be doing, which I mentioned way back in that first Beauty isn't only glyph deep blog:

    And Adobe calls PDF

    The global standard for trusted electronic documents and forms

    do they? For me Adobe has no credibility on that statement until they either

    • remove the word global and replace it with western OR
    • insert the prefix semi- in front of the word trusted OR
    • explain what they are going to be doing to make sure users are not losing their content by default in South Asia and other places in the future.

    That includes bringing those free PDF writers forward. A hard problem, to be sure, but they created it themselves, and gave it to the planet. I think they owe it to the planet to make moves toward solutions.

    As it turns out, without too much fanfare, Adobe is perhaps at least trying to work toward meeting that burden:

    As far as the rest of the world, one thing that is being done is that the PDF/A-2 specification includes a new “Unicode” conformance level which requires a ToUnicode table and/or ActualText entries, so that all text (visible or not!) has an associated Unicode codepoint.  In addition, there are also discussions on the ISO 32000 (PDF) committee about establishing such requirements for the forthcoming PDF 2.0 standard.

    Now given the complex issues I point out in both Acrobat PDF: the Yugo vs. the BMW vs. the Ferrari and Providing more information is the best way to assure correct information is received, this may turn out to not be enough (both the BMW and the Ferrari coming out of Acrobat meet the minimum suggested burden and yet still have problems, which suggests that more needs to be done).

    It is tempting to think of these issues as bugs in Acrobat to report that have little to do with the standard.

    But it seems likely to me that if Adobe can make the mistake then anyone can -- and thus it makes sense for the standard to close this loophole that allows conformant PDF to be wrong....

    And of course Adobe needs to be pushing people to not do old-style PDF whenever possible. There isn't nearly enough of that pushing going on.

    In the meantime, articles like this one give me hope. Well, a little bit, at least!

    Anyone know where that ISO standard is being discussed, exactly? :-)

  • Sorting it all Out

    Like one of those standards that can't/won't be fully implemented

    Just over this last weekend, while I was off at Fremont Oktoberfest celebrating my birthday, friend/colleague/regular reader Doug Ewell was busy writing me mail:

    At http://www.europatastatur.de/info1-9995-3.pdf you can find Karl Pentzlin's proposal, supposedly approved and published this August, to overhaul ISO 9995-3, among other things by adding a fifth shift state.
    Almost all of the combining diacritics (usually implemented in Windows as dead keys) reside in this fifth state.

    What is the best way, using MSKLC and other standard Windows techniques (i.e. not a third-party layer), to implement this keyboard?  MSKLC Help tells me to stay away from the Ctrl and CapsLock states if at all possible.

    [Feel free to make this a blog in your Blog if you like.]

    I'm glad someone was holding down the fort while I was out abusing my liver with friends! :-)

    Anyway, I came back to a whole big thread in which contributors like Karl Pentzlin and Alain LaBonté discussed some of the many issues here. Thankfully, the contributions fell into a few broad categories in regard to interest, position, and motivation:

    • Doug's primary interest was wondering about my thoughts on the best way to support this standard's fifth shift state (since he knew that MSKLC advised against using the CTRL and CTRL+SHIFT shift states);
    • Karl's primary interest was pointing out that most of the points Doug mentioned were in earlier versions of the standard, and that he was one of the contributors rather than the sole author;
    • Alain's primary interest was pointing out it was hard to claim that Windows didn't support some of this functionality since the Canadian Multilingual Standard keyboard has been in Windows for some time (he cites Windows 3.0 but keyboards there, in Win9x, and WinNT are three separate implementations, so I will assume we mean NT 3.5 or 3.51 for present purposes).

    Now all three of them are completely accurate and correct. Mostly. I mean, close enough for our present purposes, and I am not one of those annoying prigs who will correct people on terminology issues and other such trivia....

    But I'll share my thoughts here.

    My credentials (by which you can judge the validity and/or applicability of my contribution!):

    • Former development owner of all the Windows keyboard layouts other than kbdus and the Japanese/Korean layouts;
    • Principal (well, actually only) developer of the only two versions of MSKLC that have shipped, to date;
    • Possibly the current owner of the Windows keyboard layouts (still being figured out, though I admit I'll be just as happy if some other qualified person volunteers!).

    First of all, there is an MSKLC help topic that I wrote, the text of which is:

    Avoiding the Ctrl Shift State

    While it is possible to define characters for both the Ctrl and Ctrl+Shift shift states, it is highly recommended that you avoid doing this if you can, for two reasons:

    * There may be conflicts with common applications such as Microsoft Word, which use the "Control" key for its intended purpose - control functions. Generally speaking, they will handle the WM_KEYDOWN message (which contains the VK value). Because of this, your keyboard layout will never see that message translated into a WM_CHAR message (which contains the characters). These applications document their usage of these Control keys.

    You may rely on many of the same features without realizing it when you hit Ctrl+S to save, etc.

    * Even if the application does not try to wrest control of these keystrokes, Windows internally tends to map the Ctrl shift states for VK_A - VK_Z to Unicode control characters any time the keyboard layout does not map them. This little known feature has been a part of Windows for a long time, and there are people assuming it will always be true, for compatibility with old DOS-based applications.
    If you feel you must use the Ctrl shift state, then at least consider avoiding VK_A - VK_Z (which is where most control functions would be kept) and also consider not putting any vital characters in your keyboard layout there, to guard against when the application blocks your layout's desired input.

    You may notice specific keys in the Ctrl shift state are assigned when you load an existing keyboard layout. These key assignments are used in certain legacy applications like Microsoft HyperTerminal and should usually not be removed.

    This is what Doug mentioned, about the recommendation to avoid these shift states.

    You won't die (or be killed!) if you use them, but you will have to live with the times that you don't get what you want from them, because an application puts its own needs first.
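    The second reason in the help topic can be sketched roughly as follows. This is not how Windows implements it internally; it is a minimal illustration that assumes the conventional ASCII arrangement, where Ctrl plus a letter yields the control character at the letter's position in the alphabet. The function name is hypothetical, and the 0x41 to 0x5A range is the documented virtual-key range for the A through Z keys.

```c
#include <stdint.h>

/* Virtual-key codes for the letter keys equal the uppercase ASCII values,
   so VK_A..VK_Z span 0x41..0x5A. */
#define VK_LETTER_FIRST 0x41
#define VK_LETTER_LAST  0x5A

/* Hypothetical helper: the control character a Ctrl+letter keystroke falls
   back to when the layout leaves the Ctrl shift state unassigned.
   Assumes the conventional mapping Ctrl+A -> U+0001 ... Ctrl+Z -> U+001A.
   Returns 0 when the key is not a letter key. */
uint16_t default_ctrl_char(uint8_t vk) {
    if (vk >= VK_LETTER_FIRST && vk <= VK_LETTER_LAST)
        return (uint16_t)(vk - VK_LETTER_FIRST + 1);
    return 0;
}
```

    Under that assumption Ctrl+S (VK 0x53) comes out as U+0013, which is the kind of value an old DOS-style application may still expect to receive, and the kind a layout would displace by assigning its own character there.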

    Now second of all, and Alain will have to forgive me here on this point (or he could choose to not do so and just be unhappy with me, but I hope he will not take it in that direction), I don't really care for the Canadian Multilingual Standard keyboard layout, as I explained in Getting all you can out of a keyboard layout, Part #9a and Getting all you can out of a keyboard layout, Part #9b.

    You can think of the code behind these blogs as proof that I could have theoretically written the MSKLC feature to create such keyboards as ones that assign the "LEFT" shift state characters and the "RIGHT" shift state characters as different shift states, were there not tortuously complex usability reasons in the tool to avoid doing so. :-)

    Now this points to yet another reason that I don't like the keyboard layout: after stating why I think using the CTRL shift states is bad, this keyboard not only does that; it does it twice!

    Now this indirectly answers one of Doug's main concerns. Yes, MSKLC recommends against using the CTRL shift states, but the Canadian Multilingual Standard keyboard, considered by some of those creating multilingual keyboard standards based on ISO 9995-3 to be a model, does it with impunity. Therefore, by their definition, the CTRL shift states are just fine for a fifth shift state, at least!

    Okay, there is one problem solved. Well, solved enough for present purposes (I have palmed a card here that I may talk about some other day if someone noticed the card in question!).

    Another problem, which I described in blogs like It is not easy to chain dead keys on Windows, Those keys aren't going to be extended; they're dead! and Only ONE WCHAR per dead key in particular, is that the standard in many places describes a dead key feature that can produce a result with multiple Unicode code points in it.

    They describe a feature Windows cannot do. The closest we come is what I describe in It is not easy to chain dead keys on Windows, which is nowhere near close enough.

    Plus the fact that no one has ever described how to do the dead key chaining thing anyway makes it much more challenging, too. Though since most of the definitions would fail anyway it isn't too much of a loss.

    MSKLC definitely doesn't support it, again in part to avoid the usability issues it would cause with the current UI and the heavy limitations on what would work.
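    The one-WCHAR limit can be illustrated with a small check. This is a sketch, not anything from MSKLC or the keyboard driver; both function names are hypothetical. It counts the UTF-16 code units a desired dead key result would need, on the stated premise that a dead key result has room for exactly one.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of UTF-16 code units needed to store a sequence of code points:
   one per BMP code point, two (a surrogate pair) per supplementary one. */
size_t utf16_units(const uint32_t *code_points, size_t count) {
    size_t units = 0;
    for (size_t i = 0; i < count; i++)
        units += (code_points[i] > 0xFFFF) ? 2 : 1;
    return units;
}

/* Hypothetical check: can this result be the output of a single dead key
   combination, given room for exactly one 16-bit code unit? */
int fits_one_dead_key_result(const uint32_t *code_points, size_t count) {
    return utf16_units(code_points, count) == 1;
}
```

    A precomposed letter such as U+0109 fits. A base letter followed by a combining mark (for example U+0071 U+0303) does not, and neither does any single code point outside the Basic Multilingual Plane, which is why a standard that asks for multi-code-point results asks for more than the platform can give.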

    Now I have been someone to strongly make this point. I did it for years in this blog, I did it when Erkki was working on the Finnish standard and I heavily impressed the issue several times on Microsoft representatives who attended some of the meetings that eventually became that ISO standard, and several others who have been involved with the standard all along.

    I find myself disappointed though perhaps unsurprised that a standard would be written that way despite such feedback, particularly from a vendor whose lack of support would generally limit the overall adoption of the standard in the market.

    But then I remind myself that there are many standards that do not get fully implemented by every (or even any) vendor -- like the SQL Standard, for example.

    I guess these various standards put out there for keyboards that ask for more than the platform can give may have to be in that category....

  • Sorting it all Out

    Megasupport of multiple ways to display text is the new "megafont"

    The blog you are about to read is my opinion; it may not be yours!

    Blogs like Arial Unicode MS effectively [bites|sucks|blows] may have helped you notice how much I dislike megafonts.

    Okay, so I won't have to go over that point again. I appreciate that.

    OpenType optional features (outside of "default" optional features, I mean) have a lot of potential to give people a fine-tuned degree of control over how they want text to appear.

    Potentially, I mean.

    But there is one major problem in the way here, which really blocks the overall usefulness of this feature with all the potential.

    If you are familiar with the area, you probably already know what the problem is.

    But if not, I won't make you guess.

    The problem is the fact that it is a largely unimplemented feature across all of the products that need to have it, in order for it to truly be useful.

    Forget about the Gabriola demo for a moment; I consider that to be a distraction.

    Well, don't forget about it. But you can look at it the way many people (including myself) do: as largely a waste of time.

    I mean, how many people using Publisher really need a font that looks different depending on the size?

    In the end, it is a fancy demo of a largely theoretical requirement that no one in the real world ever really needs, unless they are showing off OpenType optional features.

    Ok, so let's not forget about the Gabriola demo; let's dismiss it as a demo that simply goes so far to prove a point that it becomes slightly pointless.

    Let's take a more theoretically useful feature to consider. Like the one I described in Which form to use if the form keeps changing? for example. Where the conjuncts you want in Sanskrit are nicely contrasted with the half-forms you want in Hindi....

    It can be interesting, in theory, to imagine having a single font that one could use for both Hindi and Sanskrit. Especially since there is just the one main Devanagari font that ships in Windows (Mangal).

    But tell me, if you actually are writing something that requires you to use Sanskrit and Hindi, can you really wait for Word to implement the ability when they have had versions to do it in? Or would you do it the way that works today, worked five years ago, and will work five years from now: two different fonts?

    And of course completely avoiding using Mangal for the project since they changed the way it works and thus broke your Hindi/Sanskrit document anyway.

    Perhaps I am wrong when I postulate a lack of interest in Word just because they haven't ever done it yet, but it isn't like they are talking about new features yet anyway. So you can't write documents that depend on a feature you can't even use yet....

    Put simply:

    Does it look like Word is in a hurry to support optional OpenType features?

    Does it look like font makers are in a hurry to create fonts that have them?

    And really, in the end does it matter that much as features go to have all that functionality in place, even if it were everywhere? Will any user interface ever be simpler than the CHANGE THE FONT interface that has existed in Word since the beginning?

    Perhaps this has been overthought a little. Just like megafonts can be a bad idea, megasupport of multiple ways to display text in one font may not be the best idea either....

  • Sorting it all Out

    If case conversion were harder, people would do it less

    It was just last night that I got the following mail:

    Hi. I have seen your blogs regarding invariant culture, collation, etc and they are very, very useful. Thanks!

    Also, if you don’t mind, I do have a question :-). Do you know why the results below are different.

    Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
    string upper = "FILE";
    string tolowerCultureSenstive = upper.ToLower();
    string tolowerInvariant = upper.ToLowerInvariant();

    The results will be:
    tolowerCultureSenstive --> fıle
    tolowerInvariant --> file

    My guess is that the tr-TR is not in the default collation table?

    Well, there is a little confusion here since the methods involved refer to casing yet the question is about collation.

    And as I have mentioned before, Collation != Case (a.k.a. Collation <> Case).

    Though for both casing and collation, Turkic languages see the specific alternate behavior with the letter "I" as I described in The [Upper]Case of the Turkish İ (or: Casing, the 2nd).

    Now any time one is dealing with files and the like, one needs to use the invariant style methods because it is important to avoid differences based on configuration. Changing the answer for Turkish to questions like "does the file exist" would never be a good idea but in particular is doing the wrong thing for a few other reasons:

    • in Windows, UPPER-casing is used for files and such;
    • the non-linguistic tables are used to avoid the locale-specific results;
    • Windows essentially does an "ordinal, ignore case by uppercasing" comparison, never doing a case conversion.

    If one is trying to mimic in managed code checks done in native code, then doing the same types of checks/comparisons is the best way to avoid bugs and problems down the road....
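    The difference between the two approaches can be sketched in a few lines. This is a deliberately simplified illustration, not the Windows implementation: the casing here covers only ASCII letters plus the Turkish dotless i, where the real tables cover far more, and all four function names are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Invariant-style uppercasing, ASCII letters only (a simplification). */
wchar_t upper_invariant(wchar_t ch) {
    return (ch >= L'a' && ch <= L'z') ? (wchar_t)(ch - L'a' + L'A') : ch;
}

/* Turkish-style lowercasing, ASCII letters only: 'I' becomes U+0131
   (LATIN SMALL LETTER DOTLESS I) instead of 'i'. */
wchar_t lower_turkish(wchar_t ch) {
    if (ch == L'I')
        return (wchar_t)0x0131;
    return (ch >= L'A' && ch <= L'Z') ? (wchar_t)(ch - L'A' + L'a') : ch;
}

/* "Ordinal, ignore case by uppercasing": compare code unit by code unit
   after uppercasing each side, without building a converted string. */
int equals_ordinal_ignore_case(const wchar_t *a, const wchar_t *b) {
    while (*a && *b) {
        if (upper_invariant(*a) != upper_invariant(*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* The pattern to avoid: lowercase both sides with locale-specific rules,
   then compare the results. */
int equals_after_turkish_lower(const wchar_t *a, const wchar_t *b) {
    while (*a && *b) {
        if (lower_turkish(*a) != lower_turkish(*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}
```

    Comparing "FILE" with "file" succeeds under the uppercasing comparison and fails under the Turkish lowercasing one, because the capital I turns into U+0131 while the small i stays as it is. That mismatch is the same one the tr-TR results in the mail show.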

    Personally, I wish case conversion wasn't so easy. If it were harder to do then perhaps it would happen less since in most cases it is the wrong thing to do! :-)

  • Sorting it all Out

    Is there any Esperanto in that?

    When I wrote the So when is Esperanto coming? blog nearly four years ago, I pointed out that Esperanto support, via a keyboard, was unlikely.

    People who were interested would really have to create it themselves, if they wanted to see either one.

    Every once in a while a new comment would get posted, like from Lance Fallin a year ago:

    I actually like using Esperanto to communiocate with others who feel that English is too dominant or difficult to learn, and I am running out of time to study more languages as well. I use Esoperanto to keep myself in practice with any second language possible, and it really is about the easiest there is (aside from maybe Bahasa Indonesia and Swahili?)  and as a result, I have no problems typing and seeing what I type in Esperanto ĉ ŝ ĝ ŭ ĥ ĵ  but I have a problem seeing them as typed by others in say ... an irc environment  #esperanto on freenode for example.  It would be nice to have that, even if i have to install czeĉ language or something?  dankon, ĝis la revido :)

    and Sn yesterday:

    I use a US International layout, it allows me to easily produce the Spanish accented characters such as ñáéíóú, it would be so nice if typing ^j ^c ^h would produce the desired Esperanto ĵ ĉ ĥ etc.  It probably can be created with the MSKLC, but it would be so much nicer if it comes in default.  Wasn't this what an "international" layout supposed to provide?

    It may stretch the definition of "International" a bit, but that US International has clear goals, and is used by some locales just the way it is (like Dutch, for example). It does not make much sense as a dumping ground for all letters, including ones from a language that Windows does not claim to support, like Esperanto, the language that uses those letters:

    ĉ U+0109 LATIN SMALL LETTER C WITH CIRCUMFLEX
    ĝ U+011d LATIN SMALL LETTER G WITH CIRCUMFLEX
    ĥ U+0125 LATIN SMALL LETTER H WITH CIRCUMFLEX
    ĵ U+0135 LATIN SMALL LETTER J WITH CIRCUMFLEX
    ŝ U+015d LATIN SMALL LETTER S WITH CIRCUMFLEX
    ŭ U+016d LATIN SMALL LETTER U WITH BREVE

    Is there a language that has them all? I think MSKLC remains the way here.

    And there is little Microsoft can do here in any shipping version of Windows: absent an effort to rethink the limitations of a locale model that shoves locale usage firmly into buckets, modern, specific region-based usages still rule the roost. And even MSKLC assumes some locale for each keyboard, a locale representing the keyboard's language....
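    The kind of dead key table a custom layout would need for these six letters can be sketched like this. The struct and function names are hypothetical and the format is not what MSKLC emits; the code points are the ones listed above, with U+005E and U+02D8 assumed as the spacing forms of the circumflex and breve dead keys. Only the lowercase letters are covered, to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

#define DEAD_CIRCUMFLEX 0x005E /* ^ CIRCUMFLEX ACCENT */
#define DEAD_BREVE      0x02D8 /* BREVE (spacing form) */

/* One dead key combination: dead key + base character -> one UTF-16 unit. */
typedef struct {
    uint16_t dead;
    uint16_t base;
    uint16_t composed;
} DeadKeyEntry;

/* Lowercase Esperanto letters, using the code points from the list above. */
static const DeadKeyEntry kEsperantoDeadKeys[] = {
    { DEAD_CIRCUMFLEX, 'c', 0x0109 },
    { DEAD_CIRCUMFLEX, 'g', 0x011D },
    { DEAD_CIRCUMFLEX, 'h', 0x0125 },
    { DEAD_CIRCUMFLEX, 'j', 0x0135 },
    { DEAD_CIRCUMFLEX, 's', 0x015D },
    { DEAD_BREVE,      'u', 0x016D },
};

/* Look up the composed character; returns 0 when the pair is not defined. */
uint16_t compose_dead_key(uint16_t dead, uint16_t base) {
    size_t count = sizeof(kEsperantoDeadKeys) / sizeof(kEsperantoDeadKeys[0]);
    for (size_t i = 0; i < count; i++) {
        if (kEsperantoDeadKeys[i].dead == dead &&
            kEsperantoDeadKeys[i].base == base)
            return kEsperantoDeadKeys[i].composed;
    }
    return 0;
}
```

    Note that ŭ takes a breve rather than a circumflex, so a layout that only extends the existing ^ dead key would still be one letter short.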

    As long as Windows thinks about things that way, a whole scenario of some specific locales that are not so easily categorized will continue to leave keyboards out in the cold....

  • Sorting it all Out

    Intentional, or maybe not. Hard to say from the outside....

    There are a lot of things that happen between versions of a product like Windows.

    Not all of them are especially noteworthy in a way to end up in any kind of product review or documentation.

    One of the things that happens that would fall into this category would be small incremental improvements based on complaints from customers.

    Sometimes though, whether the "improvement" is actually better in terms of what it gives the user is unclear, especially when one has small behavior changes.

    It can even make one suspect that the change itself was unintentional, like perhaps a side effect of other, planned changes.

    At that point, there is little to do but have it end up in a KB article.

    Or a blog. :-)

    For example, take the following report:

    OS: Windows Server 2003 (can also reproducible on Windows XP)

    Issue: When we set customized settings for Regional and Language options , the customized settings, in particular currency settings are not saved.

    Details: From the main page "Regional Options" We select a value from the top drop down menu, say English (United States), click on Customize and go to currency tab. We change the value from default $ to some other. Click on apply and ok. We go back to main page.

    After this, from the drop down menu for  "select an item to match its preferences", click on some other value, say English (United Kingdom) and even without clicking on ok or apply, click the drop down menu and select the previous value, i.e. English(United States) and the customized settings we did earlier for that will be reset back to default, i.e. it will show as $.

    I am able to reproduce this on my Windows Server 2003 SP2 machine as well as we have tested this on Customer's XP machine as well.

    However, I am not able to reproduce it on my Windows 7 machine as the customization once set will remain the same and we have additional reset button under customization which will reset it back to default. May be the absence of reset button is the reason behind this behavior on Windows Server 2003 / XP machines.

    This is a perfect example. Users of prior versions had noted the strange inconsistencies with multiple levels of Apply buttons and the fact that settings were not retained when someone might expect them to be.

    And with some of the many changes underneath Regional and Language Options, they were saving more of the interim state prior to that main "Apply" button on the main dialog.

    Now I know for a fact that people have complained about the general issues with state in the control panel applet -- between the changes that temporarily get applied and then get rolled back on cancel, the ones that you hit the apply button but then cancel after a "nested apply", the changes that happen instantly and affect the clock in the system tray.... Many people find the behavior confusing.

    I don't know whether the reported change was intended or not, for what it is worth.

    I know that some of the internal changes to Region and Language that happened in Vista required information to be cached a little differently, so perhaps it is a side effect of that.

    Or perhaps it was intentional. Though to be honest it occurs rarely enough that the only time it would be really useful would be when there were many changes being made -- which would be even more rare.

    One time the change truly comes in handy? If you have many changes, some of which you are copying from other locales! You can make a change, look at the other locale, make the next change, and so on. Not too common, but I can see that being useful....

    I'll be honest, I have no idea if it was done on purpose or not.

    What do you think?

    And do you like the change to this one part of the "state inconsistency cloud" in the interface?

  • Sorting it all Out

    A confluence of circumstances leaves a stone unturned...

    Thinking back to blog posts from earlier this year, like

     It can be important to think back to an earlier one, Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?.

    In it, I talked about how I was meeting with STL. What I didn't mention at the time was that we were meeting to talk about what were the features in the CRT that were broken in regards to Unicode support.

    There was a bunch that was broken, like the locale data only having a non-Unicode backing store, and all of these items were fixed in the version of the CRT that came out most recently.

    However, as that blog points out, when STL was looking at one of my other complaints (the Unicode input/output problems), he found that someone had already done the work to add things like _O_U16TEXT and the like to get the right behavior when dealing with Unicode.

    So he left that stuff alone (since it worked) and focused on the stuff that was insanely broken (like the locale stuff; considering all the conversion code he had to remove he might have had negative LOC numbers during that time!).

    Now all of the blogs I list above do the work directly via the Win32 API (WriteConsoleW/WriteFile) and mention how you can get the input via ReadConsoleW/ReadFile. I had tested all that and all of that works just fine, and has for some time.

    This begs the question: why would I do that extra work after starting it all years ago by pointing out that the comparatively easy code (just a few lines) using the CRT works well?

    Well, it was a side effect of how I was doing the samples!

    You see, in my samples, I was usually writing C code in C# as a way to be more universally accessible to developers who weren't as comfortable with straight C, and I could not figure out how to do the setmode work via the CRT in C#, so I just decided to do it the long way and go from there.

    Unfortunately, there was a consequence of all of the previous history and this explicit choice of mine.

    As it turns out, the work to support stuff like _O_U16TEXT et al., while properly handling stdout and stderr, has no proper support for stdin.

    It was architect Dave Thaler who first found this out when he was doing some essentially unrelated work. Like all good developers, he went the extra mile to try all of the reasonable things he could think of, like these three variations:

        wprintf(L"> ");
        wscanf(L"%ls", Buffer);
        wprintf(L"stdin: %ls\n", Buffer);



        wprintf(L"> ");
        _setmode(_fileno(stdin), _O_U16TEXT);
        wscanf(L"%ls", Buffer);
        wprintf(L"stdin _O_U16TEXT: %ls\n", Buffer);



        wprintf(L"> ");
        _setmode(_fileno(stdin), _O_BINARY);
        wscanf(L"%ls", Buffer);
        wprintf(L"stdin _O_BINARY: %ls\n", Buffer);

    but each of these, and more, were failing -- the first one was converting it in and out of Unicode (ick) and the second two were requiring the user to type ^Z (CTRL+Z) to end the input and then also corrupting the text.

    The investigation showed that the stdin side of this fix for Unicode input/output was never done.

    I was so used to thinking of STL as the guy who did the Unicode work that it was only after he reminded me that I realized that this was work someone else had done, at least four years prior to him even having a chance to look at the code!

    Suddenly I felt a little bit relieved that I had been doing it the hard Win32 way all this time, because my code simply worked.

    Though I felt bad that I hadn't figured out the "setmode from C#" problem and stayed with the "easier" CRT code, since then I would have seen the same bug that Dave ran into later as he was writing something that was using the CRT.

    Anyway, the bug is in now so they can look at fixing it in the future, and in the meantime doing it all in Windows via functions such as ReadConsoleW/ReadFile will do the trick. And maybe someone from CSS could write a KB article about the CRT bug.

    Or failing that, there is this blog you are reading.... :-)
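    For readers who want the Win32 route in plain C rather than C#, a minimal sketch might look like the following. The two function names are hypothetical. It assumes stdin is an actual console; when input is redirected, ReadConsoleW fails and the ReadFile path mentioned above is needed instead, which this sketch does not cover. The Windows-specific part is guarded so the helper still compiles elsewhere.

```c
#include <stddef.h>
#include <wchar.h>

/* Strip a trailing "\r\n" (or a lone '\r' or '\n') in place.
   Returns the new length; text must have room for text[length]. */
size_t trim_line_ending(wchar_t *text, size_t length) {
    while (length > 0 &&
           (text[length - 1] == L'\n' || text[length - 1] == L'\r'))
        length--;
    text[length] = L'\0';
    return length;
}

#ifdef _WIN32
#include <windows.h>

/* Read one line of UTF-16 text straight from the console, bypassing the CRT.
   Returns the number of characters kept, or 0 on failure (for example when
   stdin is redirected and is not a console). An empty line also yields 0. */
size_t read_console_line(wchar_t *buffer, size_t capacity) {
    HANDLE input = GetStdHandle(STD_INPUT_HANDLE);
    DWORD chars_read = 0;

    if (capacity < 2 || input == INVALID_HANDLE_VALUE)
        return 0;
    /* Leave one slot free for the terminator written by trim_line_ending. */
    if (!ReadConsoleW(input, buffer, (DWORD)(capacity - 1), &chars_read, NULL))
        return 0;
    return trim_line_ending(buffer, chars_read);
}
#endif
```

    The line that ReadConsoleW hands back normally ends in a carriage return and line feed, which is why the sketch trims before returning. No ^Z is needed to end the input, since the call returns when Enter is pressed.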

  • Sorting it all Out

    Looking at life a bit more vertically, for a moment...

    The question Craig asked me last December was clear enough:

    When will Windows support a vertical layout for a "real" Japanese localized version?

    Now I have talked about vertical text in the past, in blogs like:

    and really this does not do much more than scratch the surface, though to be honest there is not too much that is really deeper than the surface. :-)

    Japanese vertical layout itself has just a few basic requirements:

    • knowledge of which characters should be rotated and which ones should keep their same orientation, and
    • a technology that can draw the glyphs in a vertical manner, and
    • an algorithm to handle "line breaking" properly, including the rules about acceptable characters to break lines on and not break them on.

    Perhaps it is the lack of overall complication that has kept the bulk of technologies after GDI and its @-prefixed font logic (the former handles #2; the latter #1) from doing much more work here. But this lack does require the applications that want to support vertical text to do a lot of the actual work themselves, without a whole lot of shared code for the bulk of the work.

    Unicode itself doesn't store #1 even though it is an obvious potential UCD property; as such every company that wants to do this work must do it on their own. And they do, pretty much....

    But Craig's question is not just about vertical support in general (which exists, despite some assembly being required), it is about more specifically applying it to entire products.

    Now this is where things get complicated.

    The notions of UI mirroring and UI flipping are fairly well defined, providing a somewhat obvious and not entirely unintuitive notion of what it should do to the user interface.

    But how one would handle the analogous case of vertical text is not nearly as well defined, at all.

    Imagine what has been purported to be a Microsoft interview question in the past, which can be found on many different web sites like this one:

    Imagine you are standing in front of a mirror, facing it. Raise your left hand. Raise your right hand. Look at your reflection. When you raise your left hand your reflection raises what appears to be his right hand. But when you tilt your head up, your reflection does too, and does not appear to tilt his/her head down. Why is it that the mirror appears to reverse left and right, but not up and down?

    Disclaimer: I do not personally know of anyone who has either asked or been asked this question, so it may be apocryphal.

    Obviously the process involved with trying to mirror text horizontally for Hebrew or Arabic scripts is not the same as the process to take vertical text and make it horizontal, but when I mentioned that "interview question" to 20 different software developers over the last few months and asked them if they thought there was a relationship between it and the lack of vertical support that was as consistently methodical as what we have for Bidi, 14 of the 20 said there was probably some relationship (though none could state what it might be).

    For the record, I think that the involvement may mildly exist but not as a direct thing -- it is more of how we look at things on the vertical axis in general (which relates to the mirror question a little bit), and how a Japanese user who has been looking at vertical layout in newspapers and books and signs for their entire life might expect a "vertical layout" of software UI to look in particular (which does not really relate at all).

    For most intents and purposes, I'd say they are two separate problems, though perhaps a real study of how users (most of whom have two eyes placed on a horizontal axis) track vertical vs. horizontal text might find some relationship.

    But clearly the work to support a vertical user interface would require a lot of that actual design planning and consideration in order to determine how it would in fact look. And no one is really doing that kind of work unless they are looking to extend known uses of vertical text today (like newspapers and books) into computers. Even that is not getting nearly enough attention! But one does not see a lot of metaphor extension of the general user interface of computers into a vertical world -- with the possible exception of those green CRT screen images in The Matrix, which themselves have a serious Japanese encoding bias. :-)

    For better or worse, the Japanese market moved from a largely Top-to-bottom/Right-to-left text model to a Left-to-right/Top-to-bottom text model for the sake of typewriters and computers. And thus the best chance computers had for a Unicode Vidi Algorithm was thwarted in almost its infancy, and no one ever really tried to extend the metaphors we use in computers to a vertical market....

    The only things that are ever made vertical are ones that extend metaphors that are vertical in Japanese, and it is limited to products like Internet Explorer, PowerPoint, Publisher, and Word -- plus a little bit of work here and there in places like Access forms, which themselves often model text forms.

    This is not lack of imagination so much as an explicit decision to not use that imagination and apply it here.

    Not that technology is doing much better. Even readers and newspaper type applications are skipping vertical way too often. We must do better here, truly!

    This is especially interesting for cases like Mongolian, which is itself in some ways almost at that same beginning that Japanese was once at, looking at computers. For the most part, they generally feel that a Left-to-right/Top-to-bottom text model would never be acceptable for their user interfaces. And they lack the market share -- a total of fewer than six million users, so that even a reported literacy rate of 90% does not look like it can move the needle enough to inspire that user interface revolution the way the > 120 million could have in Japan, back then.

    Our timid attempts to put it all in a left-to-right context:

    are about as unpopular as you might imagine, or perhaps slightly more so, in Mongolia, according to those I have asked (though they don't mind it so much in China, apparently).

    I wonder what an interview question of a PM candidate (asked to describe how they would attack the problem of user requirements and design for a vertical user interface in Mongolian) would look like. And if their answers would show that spark of original thought in untapped areas (which is rare), or the ability to navigate murky scenarios (which is uncommon), or the ability to founder on the shoals of despair without coming up with much (which is more common than one might have hoped for).

    Tempting!

    Even so, I think that Craig (who has been a computer person working in Mongolia for nearly a decade) and those like him may be the ones behind that next revolution, in the way one might conceptually translate enough user interfaces to a vertical world to allow the large companies like Microsoft and Apple (with the resources to do the work here even if they are not using them) to understand how one might look at the world more vertically.

    Perhaps a small group of people will even put the Japanese market to shame a little bit, by solving the conceptual problem that they shied away from, so many decades ago....

  • Sorting it all Out

    Is there a CLOSED FOR BUSINESS sign on the door or something?

    • 7 Comments

    About a month ago, Satchmo Pops asked in the Suggestion Box:

    Hi Michael, I would like your comment on VB6 today, and the support and the extinct art of programming for the OS. Well, let me explain better.

    1) Not long ago MS released the latest VB6 Cumulative Update (didn't think MS would do this, unless they are looking to finally break VB6 inside out).

    VB6 Update (05/2009)

    http://support.microsoft.com/kb/957924

    Then MS released a hotfix for it (10/2009):

    http://support.microsoft.com/kb/974899

    Would it be worth it to install this?

    2) MS seemed to have abandoned ALL application-development software for Windows to embrace only application development only for platforms (.Net, Azure, ..). Well, any of your thoughts on this.

    3) Not long ago MS discontinued the DLL database (which one could check for information of all MS DLLs including the products that included those versions).

    4) I'm still with the idea of developing applications for the OS, but that seems impossible anymore. Am I missing something or MS still have products that do this?

    Well, that kind of covers what I'd like for an article.

    Well, that is a lot to cover. And none of it really in my area! :-)

    I guess I can start by saying that if programming for the OS is an extinct art, then the very large development team within Windows have definitely been kept out of the loop!

    In other words, it isn't actually true. :-)

    Now, for the sub-questions:

    1) Not long ago MS released the latest VB6 Cumulative Update (didn't think MS would do this, unless they are looking to finally break VB6 inside out).

    Actually, cumulative fixes get released all the time. Usually after a sufficient number of hotfixes have gone out for a component and they decide to bundle them up and put them out there!

    As for the sub-sub-question about KB 974899 ("Would it be worth it to install this?"), there is exact information in KB 974899 that describes the bug being fixed. If the bug applies to you, then you will want to install it....

    2) MS seemed to have abandoned ALL application-development software for Windows to embrace only application development only for platforms (.Net, Azure, ..). Well, any of your thoughts on this.

    This is incorrect. Plenty of development still happens on Windows, both from Microsoft and others.

    Do these other platforms and tools exist? Sure. But many of them have crucial pieces written in C and C++ and calling traditional Win32 API functions.

    3) Not long ago MS discontinued the DLL database (which one could check for information of all MS DLLs including the products that included those versions).

    Was there a question in there? :-)

    Yes, it is true -- Microsoft discontinued the DLL Hell (b.k.a. DLL Help) database (here) on February 8, 2010.

    It was called "DLL Hell" internally. DLL Help is not an eggcorn so much as a way to pass Policheck style tools and not offend people who would prefer Dante's levels of Hell be reserved for less mundane things than software....

    I have no idea why they retired this.

    Though the answer to the Why don't they still have ______? question is almost always money. Probably the group that did it lost their funding, or charter, or got re-org'ed into something that couldn't support it anymore.

    I think it sucks.

    Note that other KB articles still refer to the tool quite a bit as a resource. Anyone from PSS wanna update this one, for example? :-)

    4) I'm still with the idea of developing applications for the OS, but that seems impossible anymore. Am I missing something or MS still have products that do this?

    This confuses me more than anything to date.

    Over 85% of Windows itself uses native code, most of the big Microsoft applications like SQL Server and suites like Office have similar ratios or ratios less charitable to managed code, and significant pieces of Visual Studio also use native code (even when they are building managed code).

    Windows applications aren't only written in VB 6.0, and never really have been written in only VB 6.0.

    And clearly even if one dislikes managed code (which I do sometimes; it comes in spurts whether I do or not) but wants to write Windows applications, one still has options.

    If it comes to that, many people are fond of managed code, too. You can write Windows applications with it as well. :-)

    Anyway, I guess that covers it. Windows is still open for business!

  • Sorting it all Out

    If you are Persian, you may not always want to follow your Parent

    • 6 Comments

    This blog is not making any kind of political statement about Iranian President Mahmoud Ahmadinejad or anyone else from Iran. It is about the Persian Language Interface Pack. Oh and the Pashto one. And the Urdu one. And the Dari one....

    The other day, in a comment to my Farsi? Persian? You'll be getting some LIP about it either way blog, stillife wrote:

    I've been using (and loving) the LIPs since Windows XP. They are really useful for getting people who haven't mastered English yet to still be connected to the world (like my grandmother). So kudos to Microsoft. Now, there is one problem and that is Microsoft's refusal to ship a native PDF reader app with Windows (I understand there may be antitrust issues here, but still). That leaves users to have to go for third-party tools like Adobe or Foxit. The problem is that Adobe doesn't seem to care too much about Farsi users and doesn't offer any language assistance to Farsi speakers. Foxit doesn't have any support either (although I'm in the process of creating a translation). The only useable alternative is SummatraPDF. Now I've been helping the developer of SummatraPDF with the translation into Farsi (which was started and substantially completed by another user). However, we are having a bit of problem trying to figure out how we can get the program to follow the LTR settings of the underlying Windows. A lot of applications seem to have no problem with this and switch the window controls and menus around once the LIP is installed. We are trying to do the same thing with SummatraPDF. I realize this might not be your area, but I was hoping you would be able to at least point us in the right direction. This way we can at least provide an alternative PDF reader for Farsi speakers until (fingers-crossed) Microsoft comes out with a PDF application for Windows (and includes the language files in the LIP).

    Again thanks for your efforts.

    Bests,
    Still

    Now the very beginning of the comment I talked about in On Feedback (some positive, and some the other kind), and I mentioned I'd talk about the rest some other time.

    Think of today as some other time. :-)

    I am not a lawyer, but I am unaware of any specific antitrust issue related to PDF reading or writing. Since it is not my background, I won't comment further on that except to say that after all the words I have said recently about PDF quality with complex scripts that even arguably the ultimate PDF tool (Adobe Acrobat) has, I imagine I would be VERY hard on quality issues for anyone who tried to jump into that foray on the platform side -- for either reading or writing. The world doesn't need yet another application doing it wrong....

    With that said, I wanted to talk about what the bulk of the comment was about, since that is an area I know something about.

    The rest of the comment is referring to a very interesting issue that affects many different Windows Language Interface Packs -- not just the Persian one, but also the Pashto one from prior versions, and the Urdu and Dari ones from Windows 7 and prior versions as well.

    It starts with the issues I discuss in Two wrongs don't make a right, but two lefts can take a good whack at it. And the many issues surrounding directionality on the process level.

    Now if one does not use the myriad of ways to explicitly set the directionality of the process, then the process directionality is inherited from its parent.

    The vast majority of processes that users directly deal with do not set directionality explicitly. And they tend to have Windows Explorer as their parent launching process.

    And here we see our problem, actually.

    You see, when one is on Arabic or Hebrew, one's "parent" has an RTL directionality (Arabic or Hebrew, respectively), but in the case of LIPs, they are all installed atop English, which means that even though the user interface language is Persian (or Urdu or Pashto or Dari), there is an underlying LeftToRight-ness of any process one creates while running on this configuration that does not explicitly try to be something else.

    In the case of SummatraPDF, it appears to be [incorrectly] assuming that the RightToLeft-ness of the localized pieces of the Persian LIP are getting their directionality from a parent inheritance type thing, when they are actually getting it from a couple of LRMs in the FileDescription property of the .mui files attached to the EXE of the process (I doubt SummatraPDF is using .mui files; in their case they just need the localized file to either insert the two LRMs or use one of the many other methods such as a

    SetProcessDefaultLayout( LAYOUT_RTL )

    call placed early during process creation).
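
    A minimal sketch of both options, in Python rather than C just to keep it short. It assumes the two LRMs (U+200E) are expected at the very start of the FileDescription string, and the helper names are hypothetical. The only real API used is SetProcessDefaultLayout from user32, and the call is skipped entirely when not running on Windows.

```python
import sys

LRM = "\u200e"           # LEFT-TO-RIGHT MARK
LAYOUT_RTL = 0x00000001  # value of LAYOUT_RTL in the Windows headers

def description_requests_rtl(file_description: str) -> bool:
    """True if the string starts with two LRMs (assumed marker position)."""
    return file_description.startswith(LRM * 2)

def set_process_rtl() -> bool:
    """Explicitly ask for RTL layout for this process; no-op off Windows."""
    if sys.platform != "win32":
        return False
    import ctypes
    # BOOL SetProcessDefaultLayout(DWORD dwDefaultLayout)
    return bool(ctypes.windll.user32.SetProcessDefaultLayout(LAYOUT_RTL))

# Hypothetical localized description string, tagged the way a .mui file might be
tagged = LRM + LRM + "PDF viewer"
print(description_requests_rtl(tagged))
print(description_requests_rtl("PDF viewer"))
```

    Either way the point is the same: the process says what it wants explicitly, rather than hoping to inherit something from a parent that is English underneath.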

    Now this solution is not perfect, but since there is so much of the user interface that is not "top-level" and thus not localized in a LIP, changing the design would essentially break the directionality of many pieces of the existing user interface since so much of it does currently rely on the inheritance from a parent, and no one wants to start tagging everything with some kind of new "use LTR" tag to update all of the UI in Windows.

    A related issue, the kind of bug discussed in The mythical nature of bidirectional support, and where the wheels come off the wagon, is also relevant, since that directionality also impacts default text layout, and causes several bugs that occur on English that are fixed in Arabic to not be fixed on Persian or other RTL LIPs....

  • Sorting it all Out

    >50% of the web is Unicode? Meh, I say. Meh.

    • 8 Comments

    Back at the end of August, president of the Unicode Consortium, Google engineer, and all around nice guy Mark Davis broke radio silence on his twitter account and tweeted the following:

     That link is to the older article on the Google Blog (Unicode nearing 50% of the web) from January of this year.

    Now everyone repeated this and you can see the "25 retweets" annotation. It sounds like really big news, and very cool.

    I've decided that I am not really all that impressed.

    Sorry, Mark.

    I should explain why I am not impressed.

    First, I'll grab another tweet that has one of those stats of the type you hear all the time from Google exec types, this time mentioned by nomad411, someone else I follow in a tweet:

    Now one can obviously presume that Google has everything on the Internet indexed. They say they do.

    And obviously this quote isn't specifically talking about the Internet (though similar quotes from Google execs that this quote reminds me of often do!).

    So, in an environment that claims that the amount of data will be literally more than doubled in 48 months, it really took seven months to go from "just under 50%" to "just over 50%". By these rough numbers, that time frame would mean perhaps a ~14% increase in the data on the Internet, so the fact that we only moved a few percentage points on the Unicode side seems worrying.
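
    For anyone who wants to check the rough numbers, here is a small sketch of the arithmetic. The ~14% figure looks like a straight-line estimate (seven months out of a 48-month doubling); treating the doubling as compound growth is my own added assumption and gives a somewhat smaller number. Neither model comes from Google.

```python
MONTHS_TO_DOUBLE = 48  # "more than doubled in 48 months"
MONTHS_ELAPSED = 7     # roughly January to August

def linear_growth(months: float, doubling: float = MONTHS_TO_DOUBLE) -> float:
    """Fraction of growth if the doubling is spread evenly over time."""
    return months / doubling

def compound_growth(months: float, doubling: float = MONTHS_TO_DOUBLE) -> float:
    """Fraction of growth if the data grows at a steady compound rate."""
    return 2 ** (months / doubling) - 1

print(f"linear:   {linear_growth(MONTHS_ELAPSED):.1%}")
print(f"compound: {compound_growth(MONTHS_ELAPSED):.1%}")
```

    The straight-line version comes out a little over 14% and the compound version a little over 10%. Either way it is a good deal more than "a few percentage points".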

    Why is so much of the data being created today on the Internet not in Unicode?

    Note that ASCII is UTF-8, etc.
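
    A quick sketch of why that note matters: any pure ASCII text is byte-for-byte identical when encoded as UTF-8, so such pages presumably land on the Unicode side of the ledger already. The helper name is made up for the example.

```python
text = "Sorting it all Out"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# The same bytes come out of both encoders for pure ASCII text
same = ascii_bytes == utf8_bytes

def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)

print(same)
print(is_pure_ascii(utf8_bytes))
print(is_pure_ascii("café".encode("utf-8")))
```

    So whatever is holding the number down, it is not plain English text; it has to be content in the older non-Unicode code pages.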

    This seems a much more interesting question for Google to spend time on, so they can move more of the web -- why not a blog pointing out the nature of the new data that isn't using Unicode?

  • Sorting it all Out

    The messy concept of the LIPs from when XP was LIP-less

    • 2 Comments

    Sometimes, our documentation can be confusing.

    For example, take Comparing Windows XP Professional Multilingual Options, and the text/tables from its Appendix A and Appendix B. These tables have managed to confuse many people.

    I'll provide them here:

    Appendix A: Localized Versions of Windows XP Professional

    There are 24 fully localized versions of Windows XP Professional.

    Arabic                   Hebrew
    Portuguese (Brazil)      Hungarian
    Chinese Hong Kong        Italian
    Chinese Simplified       Japanese
    Chinese Traditional      Korean
    Czech                    Norwegian
    Danish                   Polish
    Dutch                    Portuguese (Portugal)
    Finnish                  Russian
    French                   Spanish
    German                   Swedish
    Greek                    Turkish

    Appendix B: Windows XP Multilingual User Interface Pack Languages

    There are 33 languages available in the Windows XP Professional Multilingual User Interface Pack, which is an add-on to the English version of Windows XP Professional:

    Arabic                   Hebrew
    Portuguese (Brazil)      Hungarian
    Chinese Simplified       Italian
    Chinese Traditional      Japanese
    Czech                    Korean
    Danish                   Norwegian
    Dutch                    Polish
    English                  Portuguese (Portugal)
    Finnish                  Russian
    French                   Spanish
    German                   Swedish
    Greek                    Turkish

    Note: Any of the following supported languages that do not have an official localized version listed in Appendix A are recognized as the only localized version in their corresponding geographic areas.

    Bulgarian                Romanian
    Croatian                 Slovak
    Estonian                 Slovenian
    Latvian                  Thai
    Lithuanian

    Now really all they were trying to convey was that the last CD of the MUI version of XP (which came on several CDs) contained these nine languages, which, unlike the others, were not also available separately as localized versions.

    They were also smaller than the other 24, and could rightfully be considered "Language Interface Packs before Language Interface Packs actually existed".

    In fact, once LIPs did exist, a few years after this was written (ref: Microsoft, you giving us some LIP?), these nine "partial MUI packs" became LIPs!

    These nine "MUI languages" became the only ones that one could find a way to download for Windows XP when one searched around the web.

     Now this of course confused a lot of people later.

    Even though before LIPs existed, these nine MUI packs were just as difficult to get as the others.

    I guess we can blame it on some sloppy definitions in the model that led to a bit of confusion (if you want real confusion, take that last XP MUI CD and install one of these languages and then try and install the corresponding LIP!).

    Of course removing the whole Comparing Windows XP Professional Multilingual Options white paper would make very little sense, since there is a lot of other information in it that is not confusing.

    Editing a 7-year-old white paper can also be a little tricky (it probably should have been edited back in 2005 when the model was changed to allow for LIPs).

    Hopefully something can be done; if not it will just continue to confuse people....

  • Sorting it all Out

    {Insert appropriate pun/allusion to 'When I'm 64' here}

    • 3 Comments

    It is true.

    Every machine I run now is a 64-bit one, running a 64-bit version of a Microsoft operating system (well, in one case a 64-bit version of an Apple operating system....).

    Now last week in 64 bits of awesomesauce, delivered!, I pointed out how the work was happening to add 64-bit versions of all of the Language Interface Packs that weren't up already.

    Well now, I can say that they are all officially up!

    Here are the ones that had been released as "32-bit-only, ever" versions, which now come in both 32 and 64:

     This is really awesome, I think. :-)

    One important difference that does remain between the above list (and the remaining LIPs still coming out) and the original 13 LIPs from Thirteen (13) can be a lucky number is that those 13 (Basque, Catalan, Galician, Hindi, Icelandic, Indonesian, Irish, Luxembourgish, Malay (Malaysia), Maltese, Norwegian (Nynorsk), Vietnamese and Welsh) will have both 32-bit and 64-bit versions in the OEM and System Builder channels as well.

    By the way, I find it very annoying that Galician is not in the blog's built-in spell checker; they suggest Galilean instead. Like how often would that make sense across the MSDN blogs?

    Warning: Almost everything I have written up to now in this has been fact. Everything that follows represents supposition on my part based on my observations and knowledge. You should trust it only insofar as you trust ME, since I am not speaking for Microsoft or any OEM partner or anyone else....

    That difference though for those 13 versus the rest is not such a big deal -- it underscores some of what I think are the reasons behind the original thirteen that I didn't get into in Thirteen (13) can be a lucky number, which involves requests from people building images they were deploying needing to support certain languages!

    Now OEMs are not shy about asking for what they need from Microsoft so that they can achieve their own plans, and if system builders are any less so, it would mainly be that they have fewer ways to get their message delivered than OEMs do (OEMs obviously have to have special connections because of the nature of the relationship). All of those folks also plan way out what they want to do and thus Microsoft wouldn't tend to be caught by surprise or anything in the way of demands or requests....

    But the change to always release 32-bit and 64-bit for all LIPs? That was in many ways driven by customer feedback, some of which happened right here, and by its nature that has a lot less to do with long term plans than customer reaction (e.g. how easy is it to see a "32-bit only" note and start complaining about the problems that causes).

    And then there is the work done by Windows International Test to squeeze all of these languages in, and by Windows Release to do all of that additional work that must happen before the release. Despite the fact that Microsoft has a lot of those same qualities related to long-term planning and scheduling, there just happen to be some amazing people who are able to go the extra mile to get things done when it is needed. A lot of my thanks definitely go to them for their efforts....

    In any case, we are all caught up, and on a much better road now.

    And if in the future communications and plans see OEMs and system builders start to clamor for other languages, they will in all likelihood be able to spot the appropriate additional languages there as well. It is very customer-based.

    If there is anything Microsoft is good at, one of those things is properly supporting the partners who are behind the purchase of tons of copies of our software on behalf of their customers.... :-)

  • Sorting it all Out

    Providing more information is the best way to assure correct information is received

    • 3 Comments

    I thought a small amount of information on PDF and Unicode might make sense as a bit of introductory material for some of the blogs I have already written. :-)

    I owe a great deal of the centralization of the Unicode question in PDF to information from Leonard Rosenthol of Adobe, since although I knew about most of this, I was often missing correct terms and never thought about them in such a good framework of comparison until after communicating with him. Anything incorrect here is probably my fault, though!

    Now there are three basic ways for Unicode data to be in PDF. And really any one, or two, or all three of these methods can be there.

    The first method is storage of the font encoding information itself (in TTF & OTF). Any effort to properly display Unicode data will include this information, which can make it one of the "easiest" pieces of information to include if you are an application that is doing the work to use the font information to display the text. If you are not such an application creating PDF then it is not something you would likely have - but if you are doing the work to render information in an application then it can be some easily available information to include.

    Reverse engineering from that information is obviously quite possible, though I like to think of it as being a way to extract essentially equivalent text, since there are cases -- from Unicode normalization and the way choices are made to favor composed glyphs, as well as other such mappings -- that the text you would get back from using only this data may not be exactly what was originally in the document. This is similar to the issue I pointed out in Documented, schmockumented! It's still kind of cool.... but easier since one has more information available to do the reversal if one has this information.
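
    A minimal sketch of what "essentially equivalent" can mean here. It assumes the original document held a decomposed sequence while the font mapped it to a single composed glyph; the strings are invented for illustration and no PDF library is involved.

```python
import unicodedata

original = "e\u0301"   # e + COMBINING ACUTE ACCENT, as typed in the document
recovered = "\u00e9"   # precomposed é, as a glyph-based reversal might return

def essentially_equal(a: str, b: str) -> bool:
    """Compare two strings after Unicode normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(original == recovered)                   # False: different code points
print(essentially_equal(original, recovered))  # True: canonically equivalent
```

    The two look identical on the page, which is exactly why the difference is easy to miss until something does a binary comparison.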

    The distinction should not matter in most cases, especially if you have the same fonts as were used in creating the original PDFs. But in cases where you don't have the exact same fonts there can be subtle differences that might matter to you.

    Although this is the easiest way to store "Unicode" data in the PDF, since it is for the most part just embedding the work you had to do anyway to render, it is not always the easiest way to extract text. It can be easy to get it wrong when you do that extraction.

    The second method is the ToUnicode table, which is what it sounds like -- the actual Unicode data, with regular pointers (indexes into the text) so the data in the PDF information is directly referencing Unicode data. Depending on what data is stored, you can choose what data will be able to be extracted rather directly.

    Now this gives a great opportunity to work around some of the potential problems with the font encoding information, obviously (since you are putting actual Unicode text in there you are choosing what goes there). Of course a PDF writer can put in information with the same kinds of transformations as well, but it is the opportunity to do better that is interesting.

    I have seen examples where a document would use Arabic compatibility characters -- for more on these see blogs like It Does Not Always Pay to be Compatible and Getting out of dodge (or at least out of the compatibility range!) -- and the PDF's font encoding information would have the same thing, but the ToUnicode table would have the real Unicode Arabic characters. This sort of thing is a great way to help with searching algorithms that scan PDFs so is not an unreasonable way to go here depending on the ultimate goal of the PDF itself.
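
    Here is a toy sketch of that situation. The glyph IDs and both tables are hypothetical (a real PDF stores ToUnicode as a CMap stream, not a Python dict), and using NFKC to build the table is just my shortcut for the example, not something a PDF writer is required to do.

```python
import unicodedata

# Hypothetical glyph IDs mapped to what the font encoding reports:
# Arabic presentation forms from the compatibility range
font_encoding = {
    1: "\ufe8d",  # ARABIC LETTER ALEF ISOLATED FORM
    2: "\ufe8f",  # ARABIC LETTER BEH ISOLATED FORM
}

# A ToUnicode-style table that stores the real Arabic letters instead
to_unicode = {
    gid: unicodedata.normalize("NFKC", ch) for gid, ch in font_encoding.items()
}

def extract(glyph_ids, table):
    """Turn a run of glyph IDs back into text using the given table."""
    return "".join(table[gid] for gid in glyph_ids)

run = [1, 2]
print([hex(ord(c)) for c in extract(run, font_encoding)])
print([hex(ord(c)) for c in extract(run, to_unicode)])
```

    A search for U+0627 finds the second string and misses the first, which is the whole argument for putting the real characters in the table.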

    This is not the most common way to go though -- the thing you will most often see in a ToUnicode table is the original text. If you have it anyway, it is easy to add.

    The only "hit" to storing both is the size increase of the PDF. As Leonard specifically suggested to me, "...they require the least amount of work as the information is associated with the font itself and the page content need not be impacted. For most things, these should be sufficient..."

    The third method is the ActualText tag. This is the magical way to provide easy shortcuts to the exact data to get back when there are contextual glyphs in the PDF.

    Now obviously you can in theory get the same information from either of the previous two methods (which is good because they are the most common!), but it is harder in the case of documents with many contextual glyphs. Only some of these might also be in Unicode, and in most cases when they are in Unicode they are compatibility characters, as most of these contextual glyphs are things like the conjuncts you might see in Sanskrit, and so on -- special forms of text that can often represent explicit choices on the part of a font.

    A great example of this is something I talked about in Which form to use if the form keeps changing?. Judicious use of the ActualText tag can effectively preserve these distinctions without requiring special processing to get the actual text back, something that can be required if you rely on the ToUnicode table to try to get the information back.

    When you think about cases like the "Ferrari" scenario of Acrobat PDF: the Yugo vs. the BMW vs. the Ferrari (for example), it is easy to see how not including good ActualText can make it easy to mess up cases like glyph reordering. I like to think of the ActualText (when it is available, which is comparatively rare since it does require extra work to create well) as a great list of shortcuts to get information that would otherwise be more difficult to get well.

    And the bugs that appear in even the most sophisticated tools tend to bear me out on this opinion; things are incorrect often enough that including this data more often would be one of the best ways to get good results back. Though this does assume that PDF readers will make use of the information (not all of them do!), my general feeling is that providing more information is the best way to assure correct information is received.
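
    To sketch what "making use of the information" might look like in a reader: the function below simply takes the richest source that is present. The preference order (ActualText, then ToUnicode, then the font encoding) and the function itself are my assumptions for illustration; they are not taken from any particular PDF reader.

```python
from typing import Optional

def best_text(actual_text: Optional[str],
              to_unicode_text: Optional[str],
              font_encoding_text: Optional[str]) -> Optional[str]:
    """Return the text from the most explicit source that is available."""
    # Assumed order: the most explicit statement of intent wins
    for candidate in (actual_text, to_unicode_text, font_encoding_text):
        if candidate is not None:
            return candidate
    return None  # nothing to extract from

print(best_text("exact", "close", "rough"))
print(best_text(None, "close", "rough"))
print(best_text(None, None, "rough"))
```

    The more of the three a PDF writer includes, the less often a reader falls through to the roughest answer.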

    Now, if you have PDF reading tools that can look at all of this information you can do a lot of forensic PDF work to determine the actual cause of many of the bugs that exist in PDFs today, especially for complex script cases. But forensic PDF work is really only interesting to two groups of people:

    • Those fixing the bugs, and
    • Those trying to get correct, searchable data out of PDF despite the bugs.

    I tend to find both such processes to be very noble goals whenever they are identified, though unfortunately both tend to be a lot rarer than I would like....
