
October, 2007

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
    Script and Font Support in Windows

    This is some info that Peter Constable put together which I am re-posting with permission... good stuff! :-)

    Script and Font Support In Windows

    Since before Windows 2000, new script support has been added in each major release of Windows. The following describes changes made in each major release.

    Note that support for a script may require certain changes to text stack components as well as changes to fonts. Windows has many different text stack components: GDI, Uniscribe, GDI+, WPF, RichEdit, ComCtl32… The information provided here pertains to GDI and Uniscribe.

    Comments on language usage are included in cases in which associations between scripts and languages may not be well known. The list of languages for any given script is not necessarily exhaustive.

     

    Windows 2000

    This was the first version of Windows that included the Uniscribe component (usp10.dll). Functional benefits included: 
    • Shaping support for complex scripts such as Arabic and Thai was brought together in a single library, allowing all language versions of Windows to be built from a single source and to display text in any supported script. (Previously, enablement support for particular languages was added to localized versions.)
    • Some new complex scripts were supported.
    • Font fallback for different scripts was provided. (This is done in Uniscribe ScriptString* APIs that are typically used in UI components.)

    Support for basic Latin, Greek and Cyrillic (without combining marks) existed with W APIs in Windows NT 4 and Windows ME. In relation to complex scripts, support for Arabic, Hebrew and Thai existed in previous versions and was consolidated in Windows 2000 in the Uniscribe component. The following are new scripts supported in Windows 2000 and associated fonts:

    New scripts         Region where script is from  Fonts                 Comments on language usage
    Armenian            Eurasia                      Sylfaen
    Devanagari          South Asia                   Mangal                Used for many languages including Hindi, Marathi, Sanskrit.
    Georgian            Eurasia                      Sylfaen
    Tamil               South Asia                   Latha

     

    Windows XP

    New scripts supported and associated fonts:

    New scripts         Region where script is from  Fonts                 Comments on language usage
    Gujarati            South Asia                   Shruti
    Gurmukhi            South Asia                   Raavi                 Used in India for the Punjabi language.
    Kannada             South Asia                   Tunga
    Syriac              Middle East                  Estrangelo Edessa
    Telugu              South Asia                   Gautami
    Thaana              South Asia                   MV Boli               Used for the Divehi language.

    A surrogates shaping engine was also added to Uniscribe to allow display of Unicode supplementary-plane characters in the GDI text stack. (GDI supports wide characters, but does not understand UTF-16 surrogate pairs.) Also, Uniscribe's font fallback mechanism was extended to support fallback for Unicode supplementary planes: for each plane from 1 to 16, a single fallback font can be specified in the registry.

    The changes in Uniscribe to support supplementary-plane characters allow Windows XP to support CJK Extension B characters.
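
    To make the surrogate mechanics concrete, here is a minimal Python sketch of how a supplementary-plane code point splits into a UTF-16 surrogate pair, and how its plane number (the value a per-plane fallback font would be keyed on) falls out of the code point. The function names are made up for illustration; this is not Uniscribe's code.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary-plane code point into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 is the BMP, 1 to 16 are supplementary)."""
    return code_point >> 16


# U+20000 is the first CJK Extension B code point.
high, low = to_surrogate_pair(0x20000)
print(f"U+20000 -> {high:04X} {low:04X}, plane {plane_of(0x20000)}")
# prints: U+20000 -> D840 DC00, plane 2
```

    A text stack that treats every 16-bit unit as its own character sees two unrelated values there unless something pairs them up first, which is the job the paragraph above assigns to the surrogates shaping engine.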

     

    Windows XP SP2

    New scripts supported and associated fonts:

    New scripts         Region where script is from  Fonts                 Comments on language usage
    Bengali             South Asia                   Vrinda                Used for the Assamese language as well as Bengali.
    Malayalam           South Asia                   Kartika

     

    Windows Vista

    New scripts supported and associated fonts:

    New scripts         Region where script is from  Fonts                 Comments on language usage
    Canadian Syllabics  North America                Euphemia              Used for several languages including Inuktitut and Cree.
    Cherokee            North America                Plantagenet
    Ethiopic            Africa                       Nyala                 Used for Amharic and other languages.
    Khmer               Southeast Asia               DaunPenh, MoolBoran
    Lao                 Southeast Asia               DokChampa
    Mongolian           North Asia                   Mongolian Baiti
    Oriya               South Asia                   Kalinga
    Sinhala             South Asia                   Iskoola Pota
    Tibetan             Central Asia                 Microsoft Himalaya
    Yi                  China                        Microsoft Yi Baiti

    Character coverage for several already-supported scripts was extended to Unicode 4.1 or 5.0. Several fonts were updated accordingly. In particular, support for Latin, Greek, Cyrillic, Hebrew and Arabic was extended in the following fonts:

    • Arial
    • Courier New
    • Microsoft Sans Serif
    • Tahoma
    • Times New Roman

    The new Aero-theme UI font, Segoe UI, also provides Unicode 5.0 support for Latin, Greek, Cyrillic and Arabic.

    Extension B fonts for the SimSun and MingLiU families were added, as well as a variation of MingLiU with HKSCS support.

    The Uighur language uses Arabic script, which was already supported. However, a different font is required for Uighur typography. The Microsoft Uighur font was added to support this.

    Uniscribe's font fallback mechanism for Unicode supplementary planes was enhanced to support different fallback fonts depending on the starting font. In this way, fallback for CJK Extension B characters uses appropriate fonts for the given language. For example, for UI displayed using the SimSun font (Simplified Chinese), the fallback font for Extension B is SimSun-ExtB.

    New APIs were added to Uniscribe to support OpenType advanced typographic functionality in non-complex scripts. This provides a way for clients to expose advanced font capabilities such as language-specific glyphs; discretionary ligatures; true typographic small caps, superscripts and subscripts; old-style as well as tabular digits.

    Windows Vista includes the Cambria Math font, which has additional tables used to support layout of mathematical formulas. Special software support is also required to render math formulas, however. This is provided in Office 2007, but not in Windows Vista text-stack components.

    Of course the obvious question is what's coming next, but it is a bit early for that at the moment.

    We should spend at least a little time revelling in what we have, right? :-)

     

    This post brought to you by ༂ (U+0f02, a.k.a. TIBETAN MARK GTER YIG MGO -UM RNAM BCAD MA)

    Making things better out from under people: the story of MUI

    Over in the Suggestion Box, Kriz asked a question related to something I have been wanting to cover for some time:

    Can you please shed some light on the inner workings of SHLoadIndirectString ()?

    I'm trying to "MUI enable" some of my application and it looks like SHLoadIndirectString() plays a key role in understanding how language switching works in Windows (Explorer).

    The documentation on MSDN (http://msdn2.microsoft.com/en-us/library/ms538594.aspx) is rather lacking especially the filename part of an indirect string and how the resource dlls should be named and I need to put those dlls.

    I'll start by pointing out that at the moment, that link does not work. Just a residue of some pages moving around for some reason or another. But if you use this other SHLoadIndirectString link instead, you'll have much better luck.... :-)

    Now when you look at MUI in terms of coverage, Windows 2000 claimed about 90% coverage if you look at most resources. The remaining 10%? As discussed in Question #4 of the MUI FAQ:

    Much of the additional localized coverage in Windows XP is achieved through "MUI-enabling" Windows XP system modules and applications, specifically by:

    1. Transferring User Interface strings from the registry to Windows resource files.
    2. Removing User Interface strings from the kernel.
    3. Using the MUI-enabled shell to display localized strings for Start menu items, desktop shortcuts, shell menu items, file type names and shell verbs (shell's right-click menu items).
    4. Making Windows services impersonate the current interactive user instead of the system default when displaying User Interface.
    5. Barring the use of hard-coded file paths when loading resource files including help files, so that an alternate resource path can be used to load the resource file.
    6. Provide special code in each component to install and load the User Interface resource if a non traditional Win-32 resource is used (such as XML or HTML based resource)

    Now item #1 was largely handled by taking hard-coded strings in the registry and changing them to a new format that allowed one to redirect the resource loading to a DLL somewhere (and the DLLs, by supporting MUI, can handle multiple different languages).

    This format works quite well not only for SHLoadIndirectString but also for the new RegLoadMUIString in Vista as well (since SHLWAPI is a bit higher in the food chain than some callers needed it to be for items like time zones and since in most cases the strings were indeed stored in the registry, this new function was quite a useful addition!).
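
    For the curious: as far as I understand the documented format, a redirected value has the shape @<module path>,-<resource ID>. Here is a rough Python sketch that only recognizes that shape. It does not load any resource, the helper names are hypothetical, and the sample values in the usage lines are illustrative rather than copied from a real registry.

```python
import re
from typing import NamedTuple, Optional


class IndirectString(NamedTuple):
    module: str        # DLL (optionally with a path) that holds the string table
    resource_id: int   # string resource ID inside that module


# Assumed shape: "@<module>,-<id>". Anything else is treated as a literal string.
_INDIRECT = re.compile(r"^@(?P<module>[^,]+),-(?P<id>\d+)$")


def parse_indirect_string(value: str) -> Optional[IndirectString]:
    """Return the module/ID pair for a redirected value, or None for a plain string."""
    match = _INDIRECT.match(value)
    if match is None:
        return None
    return IndirectString(match.group("module"), int(match.group("id")))


print(parse_indirect_string("@tzres.dll,-212"))
print(parse_indirect_string("Pacific Standard Time"))
```

    The point of the split is that the registry only names where the string lives; which language actually comes back is decided when the resource is loaded from the MUI-aware DLL.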

    But their use is definitely not central to MUI.

    They can be thought of as one of the crucial percentages of the last 10% of MUI. :-)

    Whether a user would need either of these functions is not entirely clear though -- the only way either one would be needed is if you were storing resources in a non-MUI-friendly format like the registry or a text file or an INI/INF file, and you wanted to redirect that string load without re-architecting the application. Which is not the most common scenario for most applications.

    One of the most common scenarios where one wants to avoid re-architecting things is when one has a backcompat requirement. However, as I hinted at in Sometimes when sorting the index is the last thing you want to use, there is often a cost to taking this approach.

    For example, if you look at the keyboard layout names (migrated in XP to this indirect format and a SHLoadIndirectString call as I discussed here) or time zone names (migrated in Vista to this indirect format and a RegLoadMUIString call as I discussed here), you have to compare the two models:

    • The old way of doing things allowed applications loading the strings to use only the initial install language of Windows;
    • The new way of doing things allowed any installed UI language to be used, but apps using the old way would only ever load English strings.

    A good example of such an application is older versions of Outlook, which used the time zone strings in their own UI.

    Now suddenly in Vista, those strings which were previously localized into the initial install language of Windows (which sucks since they did not respect the user's UI language of either Windows or Office) were now always English (which sucks worse since they now respected neither the initial install language nor the user's UI language) -- unless they made a code change.

    The code change is not unreasonable for new applications and/or new versions, but asking Outlook all the way back to Outlook 97 to change after the fact might cross the line of reasonable request.

    But wait a moment -- before you think of poor innocent downlevel Outlook, keep in mind that these registry keys were never documented or supported, and they were therefore just hitching an illegal ride with Windows. Nothing would have kept them from doing their own localization efforts in this space other than the desire to avoid the extra maintenance of keeping the strings in sync with Windows. Therefore an honest and unbiased assessment of that fact will change such apps in our eyes from innocent martyrs injured by the big bad Vista to cynical bastards who had been existing on borrowed time for far too many versions.

    Now I will talk more about SHLoadIndirectString and RegLoadMUIString soon, as there is some interesting under-the-covers stuff to talk about there, too. :-)

     

    This post brought to you by ䷘ and 𒁁 (U+4dd8 and U+12041, a.k.a. HEXAGRAM FOR INNOCENCE and CUNEIFORM SIGN BAD)

    The L word (Limonata, I mean - the *other* L word)

    Even the most occasional reader of this blog probably has some notion of my love for San Pellegrino Limonata.

    Anyway, after Trader Joe's stopped carrying this vital substance (to me, less important than oxygen since I inhale but more important than water since I don't hydrate), I switched over to Larry's Market and once again enjoy those periodic trips to pick up my Limonata cases.

    I have been getting 10 cases at a time, a nice easy number and large enough to work out some kind of deal for....

    Then this last week I was running low so I called my contact over there. The conversation always starts the same way....

    "Hello, is _________ there?"

    "One moment please."

    {I am placed on hold}

    "Hello?"

    "This is Michael Kaplan, I was calling about getting an order of..."

    She interrupts me and puts the word in herself: "Limonata!"

    "You remembered! How sweet!"

    "Like I'm going to forget this order?" {smile}

    "How about 15 cases this time -- how soon would they be in?"

    "Well, ordinarily it would come in on Monday. But we just put in our order and with the holidays starting to come up I'm not supposed to put in a second order. So it would probably not be until Friday."

    {pause}

    "But let me see if I can get them to add it to the order. 15 cases, hopefully Monday but worst case Friday. I'll call you on Monday either way."

    "Thanks, I really appreciate it."

    "Any time!"

    She was successful in getting it added, and so there would be no interruption in Limonata. But there was one problem....

    The ordering changed a bit, and while ordering was done by the case, the price was by the six-pack (there are 4 six-packs to a case).

    The order went through and she multiplied by 4 to get the price/quantity right, but that actually quadrupled the quantity.

    So waiting for me was 60 cases.

    Oops!

    Yes, I had a Math is Hard moment. :-)

    I had to admit to her that my car wouldn't fit that much. But I asked her whether, if I bumped the order up to 25, she could make the deal any sweeter.

    She laughed and said she'd see what she could do.

    So yesterday morning I drive to Larry's Market and enter the store.

    I ask the cashier for the assistant manager.

    "You're the Limonata guy?"

    Heh. "Yeah, that's me."

    "OK, I have the price here so I can ring you up, I'll call her so she can bring it to your car."

    I ask whether she will have problems with too much stock.

    "Not at all, it's a fast seller even without your orders. By the time you're back I'll be ordering new cases to fill the order."

    Whew, I definitely didn't want to cause any extra stress!

    We manage to fill the backseat with 20 cases and the last five go in the front seat.

    I drop those five off in the garage but leave the 20 in the back to deal with after work. I drive into work -- who the hell would steal cases of Limonata from my car! :-)

    My neighbor at work pops by to mention he saw my car -- the license plate made him pretty sure but the backseat full of Limonata clinched it....

    And when I get home I make five trips between the garage and the apartment, taking five cases each time. They are now stacked in the kitchen. I should probably take a picture of it....

     

    This post brought to you by L (U+004c, a.k.a. LATIN CAPITAL LETTER L)

    EXPECTED is in the eye of the [non-expecting type of ]expectant

    Lori asked:

    I’m seeing results with VB’s StrComp function that I would not expect.  For example:

        StrComp("Lee-P", "Leema", vbTextCompare)

    Returns 1, but

        StrComp("Lee-P", "Leema", vbBinaryCompare)

    Returns -1 as I would expect.

    Why does this first return a 1 rather than -1? 

    PS: I see the same behavior in VB6 as well as VB.NET.

    This may look familiar to regular readers, especially people who have seen Punctuation... now, isn't that SPECIAL [weights]? or the more recent A&P of Sort Keys, part 9 (aka Not always transitive, but punctual and punctuating)....

    Yes, it is good old word sorting -- VB does it too! :-)

    (The vbBinaryCompare constant is what it sounds like, and the vbTextCompare one uses the default user locale.)
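
    Here is a toy Python model of why the two comparisons disagree on Lori's strings. It is only a sketch: the real word sort works from weight tables, while this version simply sets hyphens and apostrophes aside, folds case, and lets the punctuation matter only as a tie-breaker. The function names are invented for the example.

```python
def binary_compare(a: str, b: str) -> int:
    """Compare by raw code point order; returns -1, 0, or 1."""
    return (a > b) - (a < b)


def word_sort_compare(a: str, b: str) -> int:
    """Toy word sort: hyphen and apostrophe barely count, case is ignored."""
    low_weight = "-'"

    def key(s: str) -> str:
        return "".join(ch for ch in s if ch not in low_weight).casefold()

    ka, kb = key(a), key(b)
    if ka != kb:
        return (ka > kb) - (ka < kb)
    # Everything else tied, so let the punctuation decide (arbitrary order here).
    return binary_compare(a, b)


print(binary_compare("Lee-P", "Leema"))     # -1: '-' (U+002D) sorts before 'm'
print(word_sort_compare("Lee-P", "Leema"))  #  1: compares "leep" with "leema"
```

    Once the hyphen stops counting, the comparison is really between "leep" and "leema", and "p" comes after "m".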

    Expected really is in the eye of the beholder, ain't it?

    Now, we are sitting on the opposite side of an implementation that was first written over 15 years ago, though I wonder whether in retrospect it would be more intuitive to make SORT_STRINGSORT the default behavior and only do word sorting when it was explicitly asked for. There would be fewer complaints about cases that are not as intuitive since they aren't as straightforward as the co-op vs. coop case, but at the cost of a lot of less than ideal results.

    I have gone back and forth on this one a whole bunch -- trying to make it easier on confused developers, trying to make it better for ordinary users who just wouldn't be expecting punctuation symbols to be weighed so heavily.

    Kind of a time waster, but every time this issue comes up I walk through the alternatives again....

    Which is better? If it were up to you, what would you have suggested as the default a decade and a half ago?

     

    This post brought to you by - (U+002d, a.k.a. HYPHEN-MINUS)

    How are *you* feeling today? And can I quote you on that?

    Yet another metablog post -- and this one even has linguistic delusions! 

    A good friend warned me the other day that some people might be afraid to talk to me, since they would fear me directly or indirectly quoting them here on my blog.

    In honesty I don't know that I have seen as much of that (people who are worried about such things tend to tell me explicitly that "this is not for the blog" before they tell me things).

    Though I am keeping the issue in mind in case I am wrong (any responsible strategy always allows for the possibility of one's assumptions about the situation being completely wrong!).

    With that said, I did see another phenomenon today.

    I was scooting by Carolyn Parsons' office and stopped to say hello, and I asked how she was.

    She paused for a moment, carefully considering how to answer.

    Her eventual response after that pause was completely and utterly awesome:

    Somewhere between Michael Vick's dog and Ellen DeGeneres' dog.

    I laughed, and immediately asked whether I could quote her on that.

    Her reply: "Why do you think it took me a minute to come up with it?"

    Now my first thought was about how this was a new spin on the idea of people who were afraid of being quoted -- the people who would (if they are perhaps going to be quoted) rather have it be a clever quote. :-)

    This idea may not scale though, ultimately.

    In this particular case, Carolyn's response was freaking brilliant, but of course delusional belief in the law of averages helps assure us that not every one of them will be.

    Could I be building up resentment in the long run over things that I wouldn't choose to blog (either because I forgot, or I couldn't think of a hook[1], or I just didn't think it was as smart as the other person did, or whatever)? I would tend to fear the whole You Wish factor.

    Anyway, back to Carolyn's response (to which the above does not apply!).

    It reminded me of the issues behind Star Trek with linguistic pretensions. After all, since I understood both references, it really was a case of a story being told by its story references!

    So maybe I need to rethink the theory of that episode. As long as you have people with enough shared stories, perhaps they could speak in metaphor, after all....

     

    1 - Like the recent silent auction our group did for the Giving Campaign. I just couldn't think of a hook. I'm still working on it, though....

     

    This post brought to you by պ (U+057a, a.k.a. ARMENIAN SMALL LETTER PEH)

    If the Novantrone juice isn't worth the squeeze...

    Jim asked me via the Contact link:

    Hello Michael.....I am interested in your research on Novantrone. I also am currently using it. I have not anything good or bad to report. Could you possibly share the experiences that others have told you about it with me? It is after all a poison with a black box warning. When i question my Doc about it, he plays down the side effects and has told me that out of thousands of people using it there are only a couple of instances of bad side effects. After reading your blog, I see that two people have already spoke out against it due to cardiac issues. My heart is doing alright but I also worry about lukemia....

    There is no way to sugarcoat it -- Novantrone is a serious drug.

    Since I first talked about this Napalm of the MS world and on through until my decision to stop the treatment for now, I have received a lot of mail and some comments from people who find themselves doing pretty badly on it. From the heart problems to the AML (secondary Acute Myelogenous Leukemia) and everything else that it can do, there is no way to avoid feeling a certain sense of karma if you do poorly based on the whole "I voluntarily took poison; should I be surprised that it poisoned me?" aspect of the drug.

    But horror stories won't help anyone make a decision. At least not a responsible one....

    But with that said, if anyone is taking Novantrone and it is either showing no obvious benefits, or it was helpful originally but no further benefit is being seen, then I would strongly recommend talking with your neurologist about doing what I ended up doing: putting off the intentionally self-destructive behavior of the drug.

    Even if the risk is that later on it may not be okay to take.

    Because it's not the kind of drug you take just because you can....

    For me, Novantrone was useful. And if my short term symptoms hit the breaking point that would make it worth considering resuming it, then it is worth considering. But for me, some drugs are too serious to be taking unless you know that the Novantrone juice is worth the squeeze....

     

    This post brought to you by 독 (U+b3c5, a.k.a. HANGUL SYLLABLE TIKEUT OKIYEOK)

    Microsoft is a Form 'C' shop, Part 1

    Microsoft has had Unicode as a part of its operating system offerings since the earliest days of its 32-bit platforms.

    And a lot of that support predates anything that Unicode later chose to provide. Thus we don't use the Unicode Collation Algorithm for our sorting, for years we did not use Unicode normalization for our equivalences, and all kinds of random snafus (like that somewhat random Tibetan/Myanmar thing with us not picking up Unicode changes when they happened) still manage to pop up after all of these years.

    Now for the most part, data coming out of Microsoft's keyboards, data entry methods, functions, methods, and algorithms has always been in what we for years called the precomposed form, which Unicode calls Unicode Normalization Form "C" in their UAX #15. Other than hiccups like code page 1258 (discussed here and here), data always tended to be in Form "C".

    In fact, if you convert data to Form "D" then there are a bunch of places like in collation that you won't get the most accurate results, even in Vista where most of the equivalent forms were added to the tables to try to make the impact of using Form "D" text less noticeable....

    Yet even today if you convert to Form "D" then all kinds of languages from Korean to Tibetan won't always sort as expected or as designed. And Vista features like LINGUISTIC_IGNORE* flags won't always return exactly equivalent results if you compare Form "C" text to Form "D" text. If you are getting text from other sources, you are always better off converting it to Form "C" before using the NLS API on it....
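
    A small Python sketch of what "convert first" looks like in practice. It uses Python's standard unicodedata module purely as a stand-in (it is not the NLS API), and to_form_c is a name made up for this example.

```python
import unicodedata


def to_form_c(text: str) -> str:
    """Normalize incoming text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two look identical on screen but are different code point sequences.
print(precomposed == decomposed)             # False
print(to_form_c(decomposed) == precomposed)  # True

# A precomposed Hangul syllable turns into three conjoining jamo in Form D.
syllable = "\ud55c"
print(len(syllable), len(unicodedata.normalize("NFD", syllable)))  # 1 3
```

    The Korean case shows why collation written around Form "C" can stumble: the same syllable arrives as one code point or as three, depending on who produced the text.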

    Chalk it up to gremlins in the computers and such.... not converting what they do not seem to handle on their own....

    Now note that products like Access and SQL Server, being based on similar technologies only updated less often, still have problems even today.

    Anyway, future posts in this series will be explaining other consequences of our "Form 'C'-ness". This is just the intro.

     

    This post brought to you by ೀ (U+0cc0, a.k.a. KANNADA VOWEL SIGN II)

    Sometimes RunOnce is one time too many

    Former colleague, and consistently Australian regular reader Mike Williams asked in the Suggestion Box:

    The issue at http://blogs.msdn.com/michkap/archive/2007/01/25/1526224.aspx

    still hasn't been fixed. I think it is about a year since it was reported while Vista and IE7 were in beta.

    Maybe you'd like to write that long-promised article about non-US English locales;-)

    That was indeed the blog from the beginning of the year entitled Internet Explorer 7.0's language settings? This may be the last straw.... that inspired all of the healthy discussion!

    What became clear in the investigation is that Windows was doing the right thing here, and Internet Explorer's settings prior to being launched (which are essentially based on the Regional Options settings) were also being handled in the expected way.

    The user interface contained in that RunOnce page, however, is still pretty consistently ignoring all of the above settings (not just for UK English, either -- lots of languages are affected), and I have not had much luck discerning or divining the owners of this particular generic service page.

    I am generally of the opinion that the page should simply be bypassed entirely -- in the rare case where

    1. the user runs setup and makes choices, yet
    2. the first time they launch the browser they would want different choices

    the user can clearly find the special UI to make the changes. In all of the common scenarios, even if this UI worked right it is not needed, and the fact that it is still broken after all this time kind of indicates that not a whole lot of work is happening with it.

    Rather than graduating from broken to useless, I think we should skip useless and move right to absent!

    I will forward this suggestion onto the IE folks for them to consider....

    Now the other issue about non-US English is going to need its own post, as I have talked to some people and have a better understanding of some of the competing forces here. So that bit will have to wait for a little later in the week (perhaps given its frightening nature the upcoming All Hallows' Eve would be best!).

     

    This post brought to you by த and ௲ (U+0ba4 and U+0bf2, a.k.a. TAMIL LETTER TA and TAMIL NUMBER ONE THOUSAND)

    Sometimes everyone is happier when the game is Fixed (aka Consoling Consolas lovers)

    Oscar asked over in the Suggestion Box:

    Hi Michael,

    I was just curious if any progress had been made on adding line drawing glyphs to Consolas.  You'd mentioned it on October 19, 2006:

    http://blogs.msdn.com/michkap/archive/2006/10/19/842895.aspx#844137

    <<Per Greg Hitchcock on the ClearType team back in February of this year: "...we’re in the process right now of arranging the work for adding the missing glyphs to support Consolas in the console window.">>

    Thanks,

    Oscar

    Yes, Oscar is right that Greg (as I mentioned in a comment to Sad? You can sit at your console and console yourself by putting Consolas in your console) and other folks on the typography side have expressed interest in fonts in general in the console and certainly Consolas.

    But this is the kind of change that requires not just support within the font but verification of behavior in the console by test, etc.

    It is not the sort of change that one would expect in a service pack, so I expect that it would not be seen until the next major version of a product that ships the font.

    Now which product that might be is a topic one could speculate about, obviously --- but I'm going to refrain from that and just wait until I know for sure before I do anything....

    For now, we must console ourselves that registry info is the ticket to not having to wait, which for now can be used by the people who want to use Consolas in the console.

    Though if it were up to me, a "Comic Fixed" or "Comic Serif Fixed" like the one I joked about here would be a real priority instead of a small personal fantasy of mine. :-)

     

    This post brought to you by ∞ (U+221e, a.k.a. INFINITY)

    Japanese line breaking rules can be quite complex

    A very long time ago, John Black asked in what had been the oldest post in the Suggestion Box:

    I've noticed when developing web pages that use Japanese characters that many web browsers treat *every* character as a potential word boundary, meaning if you try to rely on minimum-width side effects of text blocking, you end up with a lot of overly-wrapped Japanese text.

    This seems to be particular to Japanese language sets, and on more than one browser, so that's why I thought I would ask here (rather than rant about a potential browser bug.)

    How are word-boundary rules for different locales handled by Windows apps -- is it different from app to app, or is there some centralized property of the maps that have this info?

    And then not very long ago, a slightly different question was asked on an internal list:

    Hi,

    I am facing this problem in one Microsoft portals Japanese locale, below is the description of the problem:-

    We experience problems with kinsoku shori, where the line breaks at a specific character that shouldn't be broken.  This is also inconsistent in every browser because of personal preferences with resolution and font-size.
     
    The content is being pulled from a content management system (intaglio) and coupled with the product's css styles.

    Most Japanese websites I visited do not seem to have any special CSS styles within their blocks of text, so I am unclear of how they handle the issue or if they even handle it.  The available CSS styles that resolve these problems are CSS3, which every browser is yet to be compatible with.
    http://www.css3.com/css-line-break/ (Please go thru this link for better understanding)

    As a temporary fix, we have manually entered in white spaces in the content to introduce or force a line break, which may still be inconsistent in other browser settings.  I was hoping that you could give me a little insight on what can be done if you have seen any methods to resolve this.  We may need to have a more permanent solution in the future.

    Please reply to me if you have any information/suggestions on handling Kinsuko.

    Anyway, there are lots of other people noticing problems relating to Japanese text in the browser, an issue which almost gives lie to the cultural worries about lack of complexity in Japanese I talked about a bit in If you aren't adequate, I guess that means you're inadequate; if you're not complex, I suppose that means you're simple?, if you ask me....

    The truth is that there is indeed much complexity that is captured in CSS, it just isn't as well understood as it could be, I think.

    Michel Suignard responded to the second question:

    To me it does not look like a bug, it is basically loose kinsoku which allows break inside syllable like ‘sho’. If you want to disable that you have to set kinsoku (line-break) to strict. I could not find the sequence for ‘apurikeeshono’ in the appended txt file which seems to be the kana sequence that is ‘problematic’. This is consistent with typical Japanese text processing.

    And then Paul Nelson provided a sample that you could put in an .HTM page and then watch line breaking act a bit more appropriately:

    <html>
    <body>
    <p>ビジネス アプリケーション</p>
    <p>ビジネスを実行するオンラインアプリケーション</p>
    <p style="line-break:strict;">ビジネスを実行するオンラインアプリケーション</p>
    </body>
    </html>
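
    To make the loose versus strict distinction a bit more concrete, here is a minimal Python sketch. Python and every name in it are assumptions for illustration; none of it comes from the browsers or from Paul's sample. It hard-codes a partial list of small kana plus the prolonged sound mark and treats every other position as a break opportunity, which real line breakers certainly do not do. The only point is to show which breaks strict mode takes away inside アプリケーション.

```python
# Simplified sketch: NOT the full CSS or Unicode line-breaking rule set.
# Characters that strict line breaking keeps off the start of a line:
# Japanese small kana and the prolonged sound mark (partial, hand-picked list).
SMALL_KANA_AND_MARKS = set("ぁぃぅぇぉっゃゅょゎァィゥェォッャュョヮー")


def break_opportunities(text: str, strict: bool) -> list[int]:
    """Return indexes i where a line may break between text[i-1] and text[i]."""
    positions = []
    for i in range(1, len(text)):
        if strict and text[i] in SMALL_KANA_AND_MARKS:
            continue  # strict: never start a line with small kana or the long-vowel mark
        positions.append(i)
    return positions


word = "アプリケーション"  # "application", taken from the sample page
loose = break_opportunities(word, strict=False)
strict = break_opportunities(word, strict=True)

# Breaks that loose mode allows but strict mode forbids, shown as "before|after".
forbidden = [word[i - 1] + "|" + word[i] for i in sorted(set(loose) - set(strict))]
print(forbidden)
```

    Under these assumptions the two breaks that disappear are the one before ー and the one inside "sho" (before ョ), which lines up with Michel's description.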

    The full info on this subject is itself worthy of an article or a blog post probably, though I don't have the expertise to get into it without a bit of research being done first....

    So consider this post to be an acknowledgment of complexity, a promise of future effort, and then if you're interested you can just stay tuned. :-)

     

    This post brought to you by ビ (U+30d3, a.k.a. KATAKANA LETTER BI)

  • Sorting it all Out

    They say a leopard can't change its spots, but I upgraded anyway!

    • 4 Comments

    I decided to say what the hell and pick up a new copy of Mac OS X 10.5 (Leopard) and went to upgrade my Mac earlier today (you know, the MacBook Pro I have mentioned before).

    This was a scary thing to do, since I had no AC adapter (I accidentally left it in San Jose after my last IUC 31 talk, after having previously left it twice during the conference and getting it back!).

    So, armed with a battery with what I thought was only a 50% charge, and even with the setup warning me that I should plug in, I decided to throw caution to the wind.

    Leaving the DVD aside for days and not installing it just seemed like a worse idea!

    Of course as soon as the time had been calculated I decided I was in trouble -- it was saying the setup would take 2 hours and as it happens I had only 48% left on my battery.

    "I'm screwed," I decided.

    Luckily, Apple is just as bad as Microsoft is in this area (and Mac is better on battery life than Vista!) -- the time went down much faster than the battery power did, and setup was able to finish. :-)

    I even had enough to boot into my 32-bit and 64-bit Vista Boot Camp partitions and install the updated drivers right off the Mac OS X 10.5 disk.

    I barely had time to look at it yet, but 10.5 looked nice and spacey and 32-bit Vista seemed okay.

    The 64-bit Vista looked good before the reboot but would not let me log back in after (the keyboard wouldn't recognize the DELETE key for the CTRL+ALT+DEL and it failed with the soft keyboard as well -- I'll have to plug in a USB keyboard once I have a power cord again and try to figure out what is failing there).

    But for the most part, it looks like the Tiger is now a Leopard and the spots have been upgraded! :-)

    I am sure I'll be blogging about this more once I have a power source again!

     

    This post brought to you by X (U+0058, a.k.a. LATIN CAPITAL LETTER X)

  • Sorting it all Out

    What it means to me to have a Blog containing a blog in which I blog about stuff (like 'Go [[to ]the ]Prom')

    • 2 Comments

    I might be a bit of a closet prescriptivist.

    Which I think sucks, given how much I ridicule the practice, especially when I think back to instructors and teachers who would do it while treating the issues as if they were natural law, or at the very least crimes against grammar.

    But I watched myself fighting the tendency to call a blog post a "blog" since in my mind the blog is the whole sheaf of them.

    And I watched myself shudder to use the word as a verb even as I enjoyed wearing a shirt that did it.

    But the force of repetitive usage has worn me down, and finally I'll accept these things. Which were never mine to accept or deny in the first place....

    It still sounds "wrong" to me, even when I do it myself. Silly, but there it is.

    So I was thinking about this and I realized the first time I noticed the phenomenon....

    To me, one would talk about going to the prom. Like you would tell people "Sheila and I are going to the prom" and so forth.

    Many other people would say "we're going to prom". Which seemed incorrect since the vast majority of people who say it have only one event within their grasp and therefore "the" might be more appropriate. But I could live with that, barely.

    But then there were those other people, the ones who would say "Let's go prom" or somesuch. No "the". And no "to".

    How was that even possible?

    It was like fingernails on the chalkboard to me, which was unfortunate since I was dating someone in high school who said it that way, and dates go so much better when they don't involve fingernails on the chalkboard.

    Believe me. :-)

    This one I never really got used to, though I found it is a regional thing and obviously once high school was over I did not have to hear it ever again (though two years and associations across five different high schools meant I had to hear the phrase and its variants a lot back then).

    This is not like Mark Liberman's Go X, exactly. But maybe there are similar forces at work in my Go X where X=Noun or maybe with "Go [[to ]the ]X" too. I'm going to have to wade in here at some point....

    I am self-aware enough to realize that if my notions of linguistic aptitude were really anything beyond notions, I would be more accepting of these natural variations and changes that are a pretty fundamental part of language.

    I'm even stubborn enough to want to go learn enough to cure me of this rampant provincialism and, let's face it, linguistic snobbery from someone who knows so little about it. The main thing stopping me from just enrolling at the UW is the fact that given the schedules for available classes there is no way to do it and also have a full time job. And there is a lot about my full time job that I love enough to be unwilling to shed it for the sake of shedding a language fetish. :-)

     

    This post brought to you by ො (U+0ddc, a.k.a. SINHALA VOWEL SIGN KOMBUVA HAA AELA-PILLA)

  • Sorting it all Out

    Schrödinger's post?

    • 3 Comments

    After I posted Jokes that aren't really all that funny in the end (aka At least SQL Server isn't on our case) and really had a chance to reflect on the fact that I thought the post was going to be quite funny and then before I posted it realized that it really might not be, the whole blog was in a weird place.

    The post about to go live was both funny and not-funny, and until the waveform collapsed and a cat lived or died, there was really no way to know whether it was funny or not.

    Reader Bulletmagnet put it all in perspective with a comment:

    Perhaps this post is simultaneously funny and unfunny, like:

    http://xkcd.com/45/

    And indeed I saw the conceptual parallel:

    Schrodinger

    Shit.

     

    This post brought to you by 抒 (U+6292, a CJK IDEOGRAPH)

  • Sorting it all Out

    I am 20 out of 21 and flexible on the capital punishment issue

    • 6 Comments

    This post originally was part of a very different blog which (thankfully) no longer exists and could be best described as a rawer version of SiaO -- a fledgling blogger's first effort at transformation. It has been reposted here for essentially no good reason whatsoever. I have no excuse yet still make no apologies....

    My latest task is to try and make myself more attractive to Canadian singer/songwriter Alanis Morissette.

    It is not that I am undeniably attracted to her in a conventional sense (I have to admit that I am not).

    And it is not that she falls under the "angry singer/songwriter" category that so many believe I fancy.

    To be completely honest, the principal appeal is that she has explicitly listed out what she is looking for from a man, and I think I can meet a lot of the "qualities that she prefers".

    So often men wonder what women want, and (ignoring trite Gibson/Hunt comedies) so seldom do we get to find out -- it's quite amazing to have qualities listed explicitly this way.

    The time savings alone makes this a worthwhile consideration!

    The song (for those who do not memorize song lyrics such as I do) goes like this:

    Do you derive joy when someone else succeeds?
    Do you not play dirty when engaged in competition?
    Do you have a big intellectual capacity but know
    That it alone does not equate wisdom?
    Do you see everything as an illusion?
    But enjoy it even though you are not of it?
    Are you both masculine and feminine? politically aware?
    And don't believe in capital punishment?

    These are 21 things that I want in a lover
    Not necessarily needs but qualities that I prefer

    Do you derive joy from diving in and seeing that
    Loving someone can actually feel like freedom? are you funny?
    à la self-deprecating? like adventure? and have many formed opinions?

    These are 21 things that I want in a lover
    Not necessarily needs but qualities that I prefer
    I figure I can describe it since I have a choice in the matter
    These are 21 things I choose to choose in a lover

    I'm in no hurry I could wait forever
    I'm in no rush cuz I like being solo
    There are no worries and certainly no pressure
    In the meantime I'll live like there's no tomorrow

    Are you uninhibited in bed? more than three times a week?
    Up for being experimental? are you athletic?
    Are you thriving in a job that helps your brother?
    are you not addicted?

    These are 21 things that I want in a lover
    Not necessarily needs but qualities that I prefer
    I figure I can describe it since I have a choice in the matter
    These are 21 things I choose to choose in a lover

    ...curious and communicative...

    By my count (depending on how you number them), I have somewhere between 19 and 20 of these items (I say 20, a friend of mine says 19). And I have frankly been wavering on the capital punishment issue for a few years now and am very willing to be flexible here.

    Now if you consider the fact that her engagement to Ryan Reynolds is over and then ignore the fact that the odds of her ever even talking to me are close enough to nil as to make a lottery win seem certain, I may have a shot, right? :-)

    Well, actually, I don't. I recall an interview (sometime after the above was written) where she claimed that she actually had more like 673 things she was looking for, and that the actual list was updated after each relationship. So this idealized view of making relationships easier by having if not the needs at least the qualities that a woman might prefer is tragically not available, for her or anyone else.

    Even that movie What Women Want is a superficial take on things that would have been better titled What Women Think since that is what the premise actually provided. And while it succeeded in making Mel Gibson's character less of a prick, it still proved to ultimately be worse rather than better for relationships as a permanent gift. It was at best a parlor trick that made seduction in one-night-stands easier.

    What we learn is that even if a man could magically know what a woman was thinking it may provide nothing in the way of explaining what that woman might want. Because how often is anyone (man or woman) literally thinking specifically about what they want in anything but the most superficial circumstances?

    I guess that means I'll have to sort it all out the hard way?

    Something that has gone so well for me, to date [1]. :-)

     

    1 - Sarcasm implied.

     

    This post brought to you by ⳺ (U+2cfa, a.k.a. COPTIC OLD NUBIAN DIRECT QUESTION MARK)

  • Sorting it all Out

    VK_DECIMAL is always valid (except formerly in Serbia)

    • 0 Comments

    Regular reader, comrade, veritable demigod in all things related to keyboards, the man behind Tavultesoft Keyman, and provider of Australian beer Marc Durdin mentioned to me:

    Hi Michael,

    Just an FYI - didn't see a mention of it on your blog...

    We had a report recently of a hang in Keyman associated with the Serbian Cyrillic keyboard. After tracing it back to our on screen keyboard, we discovered that the issue was with a ClearKeyboardBuffer function (developed independently, but then updated to use VK_DECIMAL as you suggest in
    http://blogs.msdn.com/michkap/archive/2006/04/06/569632.aspx).

    With the Serbian Cyrillic layout, ToUnicodeEx(VK_DECIMAL, sc, lpKeyStateNull, sb, sb.Capacity, 0, hkl) returns 0. Unfortunately, ClearKeyboardBuffer can never be terribly robust as we don't know what the keyboard layout is going to give us...

    We checked the function against every other layout in XP, and Serbian Cyrillic was the only one to fail. Haven't checked against Vista layouts.

    Cheers,

    Marc

    Marc is right. The ClearKeyboardBuffer function, first introduced in Getting all you can out of a keyboard layout, Part #7, does have a bug in it.

    A bug that is reproducible in the Serbian (Cyrillic) keyboard layout that ships with Windows NT 4.0, 2000, XP and Server 2003, and only because that keyboard layout contained a bug in it.

    Here is the "broken" layout:

    Note especially that key for the Decimal Separator -- looks like there is nothing in it, right?

    Looking at the key in all shift states:

    Indeed, there is nothing in it.

    So much for VK_DECIMAL being the one key that is in literally every layout!

    Now the problem is fixed in the layout in Vista:

    I just looked at the source code logs, and it turns out that I am the one who checked in the fix for this, back in the end of April 2004. But someone else made the fix, I was just checking it in for him....

    The slightly updated version of the function, which allows for this ZERO case and also the cleared out state where multiple characters are returned (which still means that things are cleared out) is:

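    private static void ClearKeyboardBuffer(uint vk, uint sc, IntPtr hkl) {
        StringBuilder sb = new StringBuilder(10);
        int rc;
        // Keep calling until the result is non-negative: a negative value means a
        // dead key is still pending, while 0 or any positive count means it is cleared.
        do {
            rc = ToUnicodeEx(vk, sc, lpKeyStateNull, sb, sb.Capacity, 0, hkl);
        } while(rc < 0);
    }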

    The broken keyboard, as I mentioned, has been around for some time. And obviously one could create one if they wanted to that was just as broken, which makes this a good fix to put in....
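
    The reasoning behind the rc < 0 test can be checked without Windows by simulating the return codes. The following Python sketch is only a model: make_fake_to_unicode_ex, clear_keyboard_buffer and the canned code sequences are hypothetical stand-ins, not real ToUnicodeEx behavior captured from any layout. It relies only on the documented meaning of the return value (negative for a dead key, 0 for no translation, positive for the number of characters written).

```python
from itertools import chain, repeat


def make_fake_to_unicode_ex(codes, then):
    """Hypothetical stand-in for ToUnicodeEx: replays canned return codes,
    then keeps returning `then` forever. It never touches Windows."""
    stream = chain(codes, repeat(then))
    return lambda: next(stream)


def clear_keyboard_buffer(to_unicode_ex, max_calls=100):
    """Mirror the do/while(rc < 0) loop and return how many calls it took.

    max_calls is a safety net for this simulation only; the C# function has none.
    """
    for calls in range(1, max_calls + 1):
        if to_unicode_ex() >= 0:  # 0, 1, or more all mean the buffer is clear
            return calls
    raise RuntimeError("buffer never cleared")


# Made-up sequence for a layout with an empty VK_DECIMAL key:
# one pending dead key, then 0 ("no translation") on every later call.
broken_calls = clear_keyboard_buffer(make_fake_to_unicode_ex([-1], then=0))

# Made-up sequence for a typical layout: the first call already returns
# two characters (the pending dead key plus the decimal separator).
typical_calls = clear_keyboard_buffer(make_fake_to_unicode_ex([], then=2))

print(broken_calls, typical_calls)
```

    In this model both layouts finish after a call or two, because 0 and "more than one character" are treated the same as a clean single character. A loop that insisted on a positive return would spin forever on the first sequence.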

     

    This post brought to you by , (U+002c, a.k.a. COMMA)
