September, 2008

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    The difference between Six Sigma and Sigma Diaresis is one must never fail; the other seems to do so by default

    As the regular font ninjas and font experts and font mavens will likely be quick to agree with me about, I am not a font maven, expert, or ninja.

    But every once in a while I can contribute productively. :-)

    The other day, over on the VOLT users community forum, an interesting question was raised, by JGlavy:

    Greetings,
     
    I'm trying to make a font opentype friendly toward using Greek letters with non-canonical diacritics (such as used in Arvantic and Karamanli, etc).  I can't even get the dieresis (combining, or otherwise) to keep from colliding with the capital Sigma.  I've tried work-arounds such as precomposing a Sigma with dieresis but can't get it to output the precomposition when typing Σ and ¨ .  I can make up some completely arbitrary letter like S with ¨ and then link it to the precomposed Sigma-with-dieresis and THAT will output properly....but only with Latin letters.
     
    Any advice on how I can resolve this...or is this one of the Uniscribe limitations?  It does seem that the Greek range won't let me put any diacritics over Capital or lower case that are as tall as capital letters.
     
    Please help
     
    JGlavy

    Now of course the first part of this is easy and any OpenType maven can get into it -- the liga or Standard Ligatures OpenType feature, which you can read about here:

    Tag: 'liga'

    Friendly name: Standard Ligatures

    Registered by: Microsoft/Adobe

    Function: Replaces a sequence of glyphs with a single glyph which is preferred for typographic purposes. This feature covers the ligatures which the designer/manufacturer judges should be used in normal conditions.

    Example: The glyph for ffl replaces the sequence of glyphs f f l.

    Recommended implementation: The liga table maps sequences of glyphs to corresponding ligatures (GSUB lookup type 4). Ligatures with more components must be stored ahead of those with fewer components in order to be found. The set of standard ligatures will vary by design and script.

    Application interface: For sets of GIDs found in the liga coverage table, the application passes the sequence of GIDs to the table and gets back a single new GID. Note that full sequences must be passed.

    UI suggestion: This feature serves a critical function in some contexts, and should be active by default.

    Script/language sensitivity: Applies to virtually all scripts.

    Feature interaction: This feature may be used in combination with other substitution (GSUB) features, whose results it may override.

    Okay, that seems easy enough.
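
    The ordering rule in the recommended implementation (longer ligatures ahead of shorter ones) can be sketched in a few lines. This is only an illustration of the matching order, not how any real shaping engine is written; the names GlyphId, LigatureRule and applyLigatures are made up for the example, and so are the glyph IDs.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the checks in the usage note
#include <cstddef>
#include <cstdint>
#include <vector>

using GlyphId = std::uint16_t;  // hypothetical glyph ID type

// One substitution: a sequence of component glyphs and the single
// glyph that replaces it.
struct LigatureRule {
    std::vector<GlyphId> components;
    GlyphId ligature;
};

// Walks a glyph run left to right and replaces every matching component
// sequence with its ligature glyph. Rules are sorted so that rules with
// more components are tried before rules with fewer components.
std::vector<GlyphId> applyLigatures(const std::vector<GlyphId>& run,
                                    std::vector<LigatureRule> rules) {
    std::stable_sort(rules.begin(), rules.end(),
                     [](const LigatureRule& a, const LigatureRule& b) {
                         return a.components.size() > b.components.size();
                     });

    std::vector<GlyphId> out;
    std::size_t i = 0;
    while (i < run.size()) {
        bool matched = false;
        for (const LigatureRule& rule : rules) {
            const std::size_t n = rule.components.size();
            if (n == 0 || n > run.size() - i) {
                continue;  // empty rule, or not enough glyphs left
            }
            if (std::equal(rule.components.begin(), rule.components.end(),
                           run.begin() + static_cast<std::ptrdiff_t>(i))) {
                out.push_back(rule.ligature);
                i += n;
                matched = true;
                break;
            }
        }
        if (!matched) {
            out.push_back(run[i]);  // no rule applies, keep the glyph
            ++i;
        }
    }
    return out;
}
```

    With made-up glyph IDs f=1, l=2, ff=10 and ffl=11, the run {1, 1, 2} comes back as {11} rather than {10, 2}, because the three-component rule is tried first.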

    Update 30 Sep 2008: Actually, it turns out the ccmp feature is better for this than the liga one; you can see the comments for details on why. I think I mentioned, not a font ninja? :-)

    So all I need is a SIGMA and a DIAERESIS and I am done, right?

    Well, not quite. And this is where I can become more useful, since we are moving into a place where I am (on a scale of ONE TO NINJA) going to place higher than I do when it comes to typography -- keyboards. :-)

    So we'll back up. We assume that JGlavy knows about the font stuff since he is talking about several things intelligently that I don't fully grok (this is admittedly using the same principle I suggested here that Dale had an even better citation for!).

    And we look at what is being done:

    I've tried work-arounds such as precomposing a Sigma with dieresis but can't get it to output the precomposition when typing Σ and ¨

    The important question here -- what is the ¨ here?

    The liga entry has you include individual Unicode values. But almost no keyboards include U+00a8, aka DIAERESIS, except in the case of dead key combinations when no valid pair of characters is found or the broken Romanian keyboard is used.

    Dead keys will, like fonts, use the precomposed characters.

    Unicode would want you to use U+0308, aka COMBINING DIAERESIS. So you would type Σ (U+03a3, aka GREEK CAPITAL LETTER SIGMA) followed by that U+0308. Only it does not usually tend to exist on keyboards unless you include it in a custom keyboard you create!

    Now of course the liga feature in OpenType has no higher knowledge of what we'll call "stupid sequences", so in theory nothing stops you from programming any sequence in and having the font use it whenever it notices such a sequence. But in the interests of text that is correct and is easy to do searches on later, let's try to get the right sequence in the text stream -- U+03a3 U+0308.
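
    To make the difference between the two diaeresis characters concrete, here is a small sketch that checks a UTF-32 string for the sequence U+03a3 U+0308, and that rewrites a spacing U+00a8 after a capital sigma into the combining form. The function names are hypothetical, and the rewrite step is my own illustration rather than anything a keyboard or font does for you.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstddef>
#include <string>

constexpr char32_t kCapitalSigma       = 0x03A3;  // GREEK CAPITAL LETTER SIGMA
constexpr char32_t kCombiningDiaeresis = 0x0308;  // COMBINING DIAERESIS
constexpr char32_t kSpacingDiaeresis   = 0x00A8;  // DIAERESIS (spacing)

// True if the text contains a capital sigma immediately followed by
// the combining diaeresis, i.e. the sequence U+03A3 U+0308.
bool containsSigmaDiaeresis(const std::u32string& text) {
    const std::u32string needle{kCapitalSigma, kCombiningDiaeresis};
    return text.find(needle) != std::u32string::npos;
}

// Returns a copy of the text in which a spacing diaeresis that directly
// follows a capital sigma is replaced by the combining diaeresis.
std::u32string useCombiningDiaeresis(std::u32string text) {
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (text[i - 1] == kCapitalSigma && text[i] == kSpacingDiaeresis) {
            text[i] = kCombiningDiaeresis;
        }
    }
    return text;
}
```

    A string holding U+03a3 U+00a8 fails the containsSigmaDiaeresis check until it has been passed through useCombiningDiaeresis, which is the kind of mismatch that makes later searches unreliable.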

    You'll need to have an input method that lets you type it (or build your own, a-la-MSKLC!), and the font that does the magic so you can see the dots above the sigma.

    Now, changing tracks a bit, if you use that sequence without help, you get Σ̈ and you can maybe make out the dots hiding behind the sigma. I wonder whether the habit of not including diacritics on the capital letters in Greek had more to do with the fact that there was nothing there to make them shape more harmoniously together (ref: Sometimes, uppercasing sucks), and that maybe if the right glyphs were added here then everyone would be okay with the diacritics.

    You could then solve it all with some liga entries in your font.

    Now the bonus of this supposition is that even if I am dead wrong, you can still solve the problem I mention in Sometimes, uppercasing sucks with the liga entry, simply using the letter without the diacritic as the form to show when combined with the capital letter. Thus being wrong is no blocker to me having provided the correct solution. :-)

     

    This blog brought to you by all of the previously mentioned characters in this post

  • Sorting it all Out

    What a tangled web we weave when a KLID from an HKL we must receive

    For the title to work best, you have to use the pronunciations from Some keyboarding terms for the terms...

    In a series of newsgroup posts, Udi Raz asked many related questions:

    There are two Bengali layouts on XP.

    I am able to load and activate the Bengali layout using :

    CString strLCID = _T("00000445");
    HKL hNewKeyboard = LoadKeyboardLayout(strLCID,KLF_SUBSTITUTE_OK);
    HKL hkl = ActivateKeyboardLayout(hNewKeyboard,0);

    or to activate the bengali on a different window using :
    ::PostMessage( m_foregrounfWnd, WM_INPUTLANGCHANGEREQUEST, 1, 0x04450445 );

    the second layout, Bengali inscript does not work.

    I obtain the Bengali Inscript by adding it to the language bar, choosing it
    and using :
    DWORD dwThreadID = ::GetWindowThreadProcessId (foregrounfWnd, &dwProcessID);
    HKL focusLangId = ::GetKeyboardLayout(dwThreadID);

    The value I got is : focusLangId = 0xF02A0445

    Please advice how to load it
    Thanks,
    Udi Raz



    Hi,

    The following code does not activate Bengali Inscript but do activate
    Benagli (0x00000445) 

     LoadKeyboardLayout(strLCID,KLF_SUBSTITUTE_OK);
    ::PostMessage( m_foregrounfWnd, WM_INPUTLANGCHANGEREQUEST,
    INPUTLANGCHANGE_SYSCHARSET, 0x00010445);

    while the code below works well for both Bengali layouts
    HKL hNewKeyboard = LoadKeyboardLayout(strLCID,KLF_ACTIVATE);
    ActivateKeyboardLayout(hNewKeyboard,KLF_SETFORPROCESS);

    I would like to change the language of other application and therefore I am
    using PostMessage. It looks like that the postmessage does not call
    ActivateKeyboardLayout with KLF_SETFORPROCESS flag.

    Please advice

    Now the biggest problem here is a misunderstanding I cover in one of my earliest blogs (Some keyboarding terms) -- the difference between a KLID and an HKL.

    Now the topic has come up a few times since then, like in Thursday morning wrestling: LCID vs. HKL, Why are the HKL and KLID of the keyboard different?, and many others.

    Now the problem here is kind of like the one I discuss in How do I get the @!#$% name of the keyboard?, which is trying to get the KLID when one has the HKL.

    This is essentially the same problem -- one has the HKL but one wants to get the KLID.

    Now of course solving the converse problem is easy -- a simple call to LoadKeyboardLayout will take a KLID and give you an HKL. And then the only thing you need to do is make sure that you unload it (via an UnloadKeyboardLayout call) if it was not already loaded, a problem I talk about in Getting all you can out of a keyboard layout, Part #2. But that is really very easy.

    Now admittedly that problem would have been much easier to handle if USER's keyboarding API had borrowed a page from the ADVAPI32 registry API with its RegCreateKeyEx/RegOpenKeyEx, or from KERNEL32 with its LoadLibrary/GetModuleHandle -- distinguishing between pointing to something if it is already around, versus loading it first if it's not there.

    And life would be much easier if that existed, I will admit -- in fact my own life would have been easier when the MSKLC work was going on. But the problem is still easily solvable so there are no real worries there. It is probably time to get over it and stop whining so much. :-)

    But as I pointed out in How do I get the @!#$% name of the keyboard?, the other way around is not so easy at all.

    The only function that exists to help is GetKeyboardLayoutName. And it will only deal with the current input language in a thread, not the other ones that may be loaded with their own HKL values.

    Changing the current HKL (via ActivateKeyboardLayout, for example) is a nightmare of unwanted notification messages and communications that is so awful that even .NET's InputLanguage class saw its developers throwing up their hands and writing their own code to parse the almost-impossible-to-figure-out HKCU keys to find out about loaded input languages, in order to avoid tripping up its own input language events functionality.

    It's a freaking clusterfrick, if you know what I mean.

    Even if you are willing to live with the unwanted events and such, using the same code How do I get the @!#$% name of the keyboard? uses to only unload what is already loaded, you still won't have everything.

    Because a long time ago it was decided that you could put any keyboard under any language. And the HKL's LANGID is based on the language you put the keyboard under, not the actual keyboard layout that the KLID expresses.
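
    As a quick illustration of that split, here is a sketch that pulls apart the two 16-bit halves of a 32-bit HKL value like the ones in the newsgroup posts. It uses plain integers rather than the real HKL type so that it compiles anywhere; the helper names are hypothetical.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstdint>

// Low word of an HKL: the LANGID the layout was installed under.
constexpr std::uint16_t hklLangId(std::uint32_t hkl) {
    return static_cast<std::uint16_t>(hkl & 0xFFFFu);
}

// High word of an HKL: the device handle for the layout itself.
// This is not a KLID, and no KLID can be read out of it directly.
constexpr std::uint16_t hklDevice(std::uint32_t hkl) {
    return static_cast<std::uint16_t>(hkl >> 16);
}
```

    For the value Udi Raz reported, hklLangId(0xF02A0445) is 0x0445 and hklDevice(0xF02A0445) is 0xF02A, while 0x04450445 has 0x0445 in both halves. Both values share the Bengali LANGID, which is why the low word alone cannot tell the two layouts apart.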

    In fact the TSF functions are the only way to even add such language/keyboard combinations yourself, though there is a newly documented function (only really available in Vista) that makes this all at least possible (EnumEnabledLayoutOrTip). It returns a bunch of LAYOUTORTIPPROFILE structures, which are defined as follows:

    typedef struct tagLAYOUTORTIPPROFILE {
        DWORD  dwProfileType;       // InputProcessor or HKL
    #define LOTP_INPUTPROCESSOR 1
    #define LOTP_KEYBOARDLAYOUT 2
        LANGID langid;              // language id
        CLSID  clsid;               // CLSID of tip
        GUID   guidProfile;         // profile description
        GUID   catid;               // category of tip
        DWORD  dwSubstituteLayout;  // substitute hkl
        DWORD  dwFlags;             // Flags
        WCHAR  szId[MAX_PATH];      // KLID or TIP profile for string
    } LAYOUTORTIPPROFILE;

    Note that this structure will give you both the LANGID and the KLID, though interestingly not the HKL.

    Though now, even armed with EnumEnabledLayoutOrTip, you still need ActivateKeyboardLayout, GetKeyboardLayoutName, LoadKeyboardLayout, and UnloadKeyboardLayout if you really want to get the KLID of a specific HKL, unless you trust that the never-explained LAYOUTORTIPPROFILE member dwSubstituteLayout is a 32-bit HKL (shadows of the 32 bit vs. 64 bit HKLs? problem now, since this function works the same on 64-bit Windows as on 32-bit!).
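
    If you did decide to trust it, the comparison might look something like the sketch below. It rests on two assumptions that are labeled in the code: that dwSubstituteLayout really holds an HKL, and that a pointer-sized HKL on 64-bit Windows carries the same information in its low 32 bits. Neither is documented here, and the helper names are hypothetical.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstdint>

// ASSUMPTION: the meaningful part of a pointer-sized HKL is its low
// 32 bits, with the upper bits being padding or sign extension.
inline std::uint32_t low32OfHkl(std::uint64_t hkl) {
    return static_cast<std::uint32_t>(hkl & 0xFFFFFFFFu);
}

// ASSUMPTION: dwSubstituteLayout is a 32-bit HKL. Compares it against
// a pointer-sized HKL by looking only at the low 32 bits.
inline bool matchesSubstituteLayout(std::uint64_t hkl,
                                    std::uint32_t dwSubstituteLayout) {
    return low32OfHkl(hkl) == dwSubstituteLayout;
}
```

    Under those assumptions a 64-bit value such as 0xFFFFFFFFF02A0445 and a 32-bit 0xF02A0445 compare as the same layout, where a naive comparison of the full values would not.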

    We'll just leave the actual problem Udi Raz was having, of trying to load anything other than the first, default input language under the same LANGID as the underlying KLID, as a difficult one (though all of the clues are now here for how to solve the problem in Windows >= Vista, and most of the clues are now here for prior versions of Windows).

    In my opinion the fact that it is so hard for users to do something in this space ought to be considered a genuine bug, though this is just my opinion and I have no proof or belief that the owners of the components would feel the same way.

    An upcoming blog later this week will clear up the loose ends and make this all a bit easier to work out the solution here.

    Perhaps some of this will even be coded up in a future blog beyond that, too.

    And I still have a bug to explain as well -- this will be in a blog coming up tomorrow or the next day...


    This blog brought to you by ঐ (U+0990, aka BENGALI LETTER AI)

  • Sorting it all Out

    I'll start by saying לשנה טובה to everyone who understands what I just @#%&*! said

    In this blog I am not speaking for Microsoft or that bar I was in whose name I don't recall or any branch of Judaism or any particular Jew other than myself. And the UMW rule definitely applies for people who ignore this truth!

    So I was asked a question the other day. It was one of those wild evenings of what turned out to be a fascinating and scintillating birthday weekend.

    It kind of screwed with the story behind On why I think my birthday sucks a bit. Or maybe a tad. Some small amount like that....

    Anyway, I was talking about the question I got.

    The really random question.

    Soon after the hellos and all that, she came out with "You're Jewish, right?"

    It was kind of a weird opener, but I figured what the hell. "Yes, I am," I answered.

    No, that was not the interesting question.

    But there was one. I'll get to it in a minute.

    First she explained "I guess I'm half Jewish, my dad. But he left before I was born so I don't know much about it."

    "No shame there," I reassured her. "It was good enough for Goldie Hawn."

    She nodded (though I think the joke fell flat), but continued "What's up with Jews and The Jazz Singer?"

    As questions go, it is a pretty good one.

    The Jazz Singer is, by pretty much any objective standard of movies, pretty lame.

    It is hardly an accident that it has a 15% rating on rottentomatoes.com, and there are many sound reasons why Roger Ebert gives it just one star.

    Everything he put in that review is true.

    I think the fact that I knew about the 15% thing was not really something that helped me here, in my answer.

    Though she did like my actual response....

    I explained that "VHS came out at the end of the '70s. By the early '80s people certainly started having VHS VCRs. Somehow in America, where Jews are perpetually worried about assimilation, this crazy story of the cantor with a very American dream whose very religious father accepts him and he gets to have both lives just turned into the Hello World movie that every Jewish family wanted to have the tape for, whether they bought the (at the time expensive) prerecorded one or just recorded it themselves off of HBO. How many mainstream movies even brought up Judaism at all? And here was one that had a great answer to people trying to figure out where they fit. The movie became available to them at about the perfect time for this."

    This may not be exactly what I said, but I think it is fairly close. It may have been less coherent since I was a little buzzed at the time!

    "You too?" she asked.

    I nodded. "Indeed, I am pretty sure it was the first pre-recorded movie they bought -- I remember the black box with the gold shape of Neil Diamond's outline -- and also one of the first movies my parents recorded off of HBO that was just kept in the library. Some kind of milestone. The fact that it is such a turd never came up in conversation."

    "Are you orthodox?" she wondered.

    "No, I'm kind of nothing at the moment though reform is easier in Seattle for high holidays. But I was raised conservative, which is ironically kind of liberal."

    She senses there are jokes here that she isn't getting, and not just because she's a blond. I'm losing my grasp on the situation, clearly.

    But she searches for a firmer foothold. "What did you mean by Hello World before?" she asked, as if I had just said it.

    "Oh, I am a computer software guy. The 'Hello World' program is like the first one you create and then when you run it the program just greets the world."

    She looked thoughtful. "Okay, I think I understand that. But I don't do much with computers. Back to the movie -- every Jew thinks this?"

    "Oh no," I replied. "I think it's mostly a subconscious thing. I doubt most of the people who like it despite it being such a crap movie are nearly as obnoxious or self-important or as total narcissists as Neil Diamond was in the movie. But the overall themes of the movie feel comfortable -- it just resonates, you know?"

    So we talked for a little longer after that. Soon she had to leave, so after that I turned back to my friends and found that several of them were not there. I went to ask people where everyone was and such.

    So there you have it -- my personal theory about why such a terrible movie ended up being such a staple for so many Jews. It wasn't such a religious message, really. In fact from an actual faith standpoint it's a shitty message for a religion where even the most lax can make it to Shul for Rosh Hashannah and Yom Kippur, to make it clear that a trained cantor being at neither unless his dad has a heart attack is a good thing somehow. And from a secular standpoint Roger Ebert had it all down in particular with one part:

    One sequence that is not predictable has Neil Diamond abandoning the (now pregnant) Lucie Arnaz in order to hit the highway and become a road show Kristofferson. This stretch of the film, with Diamond self-consciously lonely and hurting, is supposed to be affecting, but it misfires, it drips with so much narcissism.

    There's a great message in there, knowing that the temper tantrums we have when we are ten are acceptable when we are middle-aged. That we can leave one wife who is living with our father and then we just walk away from the woman we left her for who we are sleeping with and who we actually knocked up without a forwarding address. This film is just brimming with family values, let me tell you.


    This blog brought to you by ✡ (U+2721, aka STAR OF DAVID)

  • Sorting it all Out

    On why I think my birthday sucks

    A lot of people know I am not a big fan of celebrating my birthday.

    The whole practice seems kind of silly to me.

    I had nothing to do with the conception (essentially my move-in date), and the eviction nine months later (well, over nine months -- I was late because apparently I wanted to finish my standard one-year lease) was not my idea either.

    So how many evictions from places you used to live do you celebrate?

    Probably just the one, the one you cried about at the time like most people do.

    There aren't many times in my life that I have cried that I would celebrate; they all tended to be kind of un-fun. And though I really don't remember much about that first time (I think I was still really tired from the move, once they completed serving me with the eviction, too tired to deal with all of the people talking to me like I was an idiot -- how condescending is baby talk? Geez!), I imagine that I was not enjoying myself.

    Or maybe the George Carlin approach is best -- if we are truly going to celebrate the anniversary of when a person doesn't put their diaphragm in, then shouldn't it be the conception date that is celebrated? Or whatever, you know what I am saying.

    My only actual uncle crashed his motorcycle trying to get back to where my mother was when she was having me (there is a reason that folks in a doctor's office I used to work at called them donorcycles, clearly). He recovered, but it seems like I was not the only one having a bad day then. Why remember it at all?

    Anyway....

    So today I am 38.

    I was born in the central time zone, in Creve Coeur, Missouri, so I guess I actually should have posted this two hours ago. But I was born at 3:12am, so I guess I am actually one hour and eleven minutes early.

    Creve Coeur is actually from French, and I think it means something like "heart breaker", from which I will let you draw whatever conclusions you feel the need to.

    I am not 100% positive on the French there, sorry. But when I was in third grade I was given the option: I could take Spanish and be Miguel or French and be Michel, so it is little wonder that I chose Spanish. Do teachers realize how dumb they sound when they ask questions like that?

    Technically I think that my heart has been broken more times than I have broken the hearts of others, but Michaels tend to be hell-raisers by and large so I doubt I would really be able to convince anyone without enumerating both lists, which I really am not prepared to do....

    I don't think I look 38, and looking younger is probably a good thing, though I often feel older (that is the MS thing, perhaps both MS things, kicking in).

    Julie and Cathy tell me I am 12, though they have been telling me that for nearly eight years now, which should make me 20 by their logic yet somehow does not. Perhaps Julie (who is now in management and no longer needs the math skills she had to have as a dev!) and Cathy (what with her tendency to double estimates of her wine cellar contents and also to equate eight with infinity!) can be forgiven the errors in math that seem so obvious to others.

    Or perhaps it is some kind of comment about maturity level, to which I will plead Nolo contendere since looking at life through the eyes of a twelve year old would have its advantages, too.

    If the actuarial tables are to be believed I have already crossed the midpoint of my life (and if Edgardo Vega Yunque only made it to 72 then it is unrealistic to assume I'll make it all that much further), so celebrating it seems odd.

    What comedian pointed out that birthday traditions are kind of suspect anyway?

    I mean, when you are a kid you just get a few candles that you love, but if anyone ever wheeled out a cake with 38 candles on it then I would throw something at them. Something heavy if I have such an item nearby.

    This comedian's theory is that we would go to actuarial tables at the point of birth to get the estimate of life expectancy (an estimate which, while obviously not 100% accurate at the specific level, is quite accurate generally). On your first birthday you start with that many candles and then each year you remove one candle. If you make it down to no candles then you have a REAL party because you have essentially beaten the insurance companies.

    Now that is a cause that anyone can get behind.

    I mentioned Edgardo Vega Yunque a moment ago. I was thinking about this when I read the words of his (essentially) daughter Suzanne Vega on her blog (the words she said at the memorial):

    Over the last week and a half I have been hearing Ed Vega described in various ways like “difficult” and “cantankerous” and “feisty”. Which he was. But the way I would describe him was “angry”. That’s how I knew him, and that’s how I will remember him.

    And along with that anger, there was a lot of passion, especially passion for knowledge, which is what he tried to teach us, his children.

    I wonder whether that is the kind of way people will describe me when someday I'm gone. I can't help wondering on an anniversary of a birth how the death will play out. I imagine that many people will describe me as angry (just the other day Jaimee was explaining why she wouldn't let me drink at a group unwinder a few weeks ago -- she thought I seemed especially angry at the time and didn't want to have alcohol added to that mix, because essentially she was worried about me. Though in retrospect I did happen to go out and get drunk that night and everyone survived so I guess it would have been okay).

    But many people use those same words (angry, difficult, cantankerous, feisty, passionate) to describe me. And though I lack the talents that Ed Vega had as a writer (among other things), I have trouble imagining these qualities as being so truly negative if someone whose talents I admire so highly thinks well of someone who embodied them for so many.

    I am not saying I am as good as he was. Because I know I'm not.

    Though I still have a few years left to work on that....

     

    This blog brought to you by ∞ (U+221e, aka INFINITY)

  • Sorting it all Out

    When Flat is not the Standard, the System is null?


    The question to the alias was simple enough:

    I have an issue dealing with winform button,  I want to get the font used by button by sending WM_GETFONT message, if button’s FlatStyle=Standard, it returns NULL, however if button’s FlatStyle=System,  it returns the correct font handle, anyone can take a look?

    According to MSDN, if WM_GETFONT returns NULL, it means system font, but I specifically set the button font to a non-system font, and this seems it’s a bug to me.

    Everything is actually behaving as it was designed to, though.

    What is important to remember is that Windows Forms or "WinForms" is a layer on top of Windows -- sometimes a very thick layer with a lot of functionality, and other times a very thin layer that gives the smallest amount of an illusion as it can get away with.

    Now for the FlatStyle=System setting on the button, the wrapper becomes very thin -- and the OS is used to do everything, including the rendering.

    When FlatStyle=Standard is used, however, the control goes the "owner draw" route, and since Windows doesn't really need a font itself in that case (and since .NET has to have its own managed Font object to do its own work), there is no benefit to creating an extra object and storing it when it is not needed.

    Now this design had very interesting consequences when WinForms was supported on Win9x -- because it meant that FlatStyle=System would not support text outside of the default system code page, while FlatStyle=Standard would support any text you had the font for.

    And even moving into NT-based platforms, you could easily find yourself dealing with the differences between GDI/Uniscribe being used in one case vs. GDI+ in the other.

    This particular WM_GETFONT issue is an interesting side effect (the reason the questioner wanted to use the WM_GETFONT message was for a test tool that was itself using native code, trying to verify what the font was on a particular control).

    And this even provides a workaround!

    I mean WinForms doesn't bother setting up an object that it does not need. And I can respect that (plus you should, too).

    But if your particular application does care about it, then you have no reason to stay with the limitation....

    If your managed code wants to call WM_SETFONT itself by taking the Font object you have and using Font.ToHfont, it can definitely do so -- and then the font will be available to the process that wants to be using WM_GETFONT. Though keep in mind the rules that Font.ToHfont mentions:

    When using this method, you must dispose of the resulting Hfont using the GDI DeleteObject method to ensure the resources are released.

    This mirrors what the WM_SETFONT message says:

    The application should call the DeleteObject function to delete the font when it is no longer needed; for example, after it destroys the control. 

    in case you weren't convinced yet. :-)

    Now note that you do not necessarily have to wait until the control is destroyed -- no one is using that HFONT other than you, so you can destroy it when you are done with it. Though you should probably call WM_SETFONT with a NULL wParam just in case some other process you don't know about has similar ideas to yours and tries to use an object after you have stopped doing so.

    Now moving back to these kinds of behavioral differences that pop up, where entirely different code paths get utilized based on solitary settings like the FlatStyle of a button, as I said way back in the beginning, that's by design. Though I wouldn't mind seeing more advanced documentation cover consequences like the ones mentioned here, to make it easier to understand the results in one's application....

     

    This blog brought to you by ি (U+09bf, aka BENGALI VOWEL SIGN I)

  • Sorting it all Out

    You're not my type if you have no culture

    This blog is about CultureTypes.

    The list can be kind of confusing, perhaps even a bit daunting to people.

    But it all makes sense, if you know where it came from.

    Thus this blog!

    The full list of members of the CultureTypes enumeration is:


    Member name: Description

    NeutralCultures: Cultures that are associated with a language but are not specific to a country/region. The names of .NET Framework cultures consist of the lowercase two-letter code derived from ISO 639-1. For example: "en" (English) is a neutral culture.
    SpecificCultures: Cultures that are specific to a country/region. The names of these cultures follow RFC 4646 (Windows Vista and later). The format is "<languagecode2>-<country/regioncode2>", where <languagecode2> is a lowercase two-letter code derived from ISO 639-1 and <country/regioncode2> is an uppercase two-letter code derived from ISO 3166. For example, "en-US" for English (United States) is a specific culture.
    InstalledWin32Cultures: All cultures that are installed in the Windows operating system. Note that not all cultures supported by the .NET Framework are installed in the operating system.
    AllCultures: All cultures that ship with the .NET Framework, including neutral and specific cultures, cultures installed in the Windows operating system, and custom cultures created by the user.

    UserCustomCulture: Custom cultures created by the user.
    ReplacementCultures: Custom cultures created by the user that replace cultures shipped with the .NET Framework.
    WindowsOnlyCultures: Cultures installed in the Windows operating system but not the .NET Framework.
    FrameworkCultures: Neutral and specific cultures shipped with the .NET Framework.

    Now you will notice that there are two categories there -- the ones supported by the XNA Framework/the Compact Framework, and the ones that are not.

    A better way to look at these two categories is as the culture types that existed prior to the 2.0 .NET Framework, and the ones added in 2.0.

    Now prior to 2.0, the model for the split was as follows:

    • The neutral/specific categorization was designed along necessary functional lines since the nature of the need for cultures really requires distinguishing them this way based on the data that is needed;
    • The InstalledWin32Cultures was a fledgling attempt to allow support for applications that needed to care about whether the underlying operating system also supported a specific culture -- crucial in a world where one might be running on Windows 95 (and one may be quite excited to do so), but support of Hindi is obviously not going to be there whether the BCL supports it or not.

    Note that when .NET 1.0 shipped, based on the Server 2003 data, it was (with one solitary almost entirely unknown exception) a superset of everything else available. Therefore the above model handles the scenarios one might throw at .NET quite well.

    Arguing for handling the world of "supported in Windows but not in .NET" would be foolish -- that was an empty set for literally years.

    But then XP SP2 and ELKs came, then eventually Vista came.

    .NET suddenly had to care about the scenario that never mattered before.

    Starting in 2.0, the split between

    • Cultures only supported by the .NET Framework's data store, and
    • Cultures only supported by the underlying operating system upon which the .NET Framework sits, and
    • Cultures only supported by user-provided data

    all had to be there, since there were many potential cases where an application might in theory have to change its behavior.

    You know, change it based on what kind of culture is being used.
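
    As a rough sketch of what "changing behavior based on the kind of culture" could look like, the snippet below reads the CultureInfo.CultureTypes property (added in 2.0) and reports which categories apply. The CultureKind class and its Describe method are hypothetical names made up for illustration, and which flags come back depends on the runtime and the operating system underneath it.

```csharp
using System;
using System.Globalization;

public static class CultureKind {
    // Hypothetical helper: describes a culture using the CultureTypes flags.
    public static string Describe(CultureInfo ci) {
        CultureTypes t = ci.CultureTypes;
        // Every named culture is either neutral ("en") or specific ("en-US").
        string kind = (t & CultureTypes.NeutralCultures) != 0 ? "neutral" : "specific";
        // The remaining flags say where the data comes from.
        if ((t & CultureTypes.UserCustomCulture) != 0) kind += ", user custom";
        if ((t & CultureTypes.ReplacementCultures) != 0) kind += ", replacement";
        if ((t & CultureTypes.InstalledWin32Cultures) != 0) kind += ", installed in the OS";
        return kind;
    }

    public static void Main() {
        Console.WriteLine(Describe(new CultureInfo("en")));
        Console.WriteLine(Describe(new CultureInfo("en-US")));
    }
}
```

    An application that only cares about one of these cases can test a single flag the same way rather than building the whole description.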

    I have run across such applications in the past so I know they exist, and some of the scenarios are ones that I find totally reasonable.

    Further reading of random circumstances where this has come up:

    You get the point. The consequences of trying to build a Culture architecture that will support all manner of applications running on all kinds of versions of Windows, or hosted in several different versions of SQL Server, or on several different versions of web servers, and so on, are huge. The current potential available feature set in .NET works with what is there now.

    Though the average application never has to really care about any of this.

    Hopefully the ones that do will see its designers much happier after they have read this blog. :-)

     

    No character really felt like sponsoring today; I get the feeling they are up to something!

  • Sorting it all Out

    When to make a change, when to stay the same

    • 0 Comments

    The roots of this blog run pretty deep.

    It starts within months of the very beginning of this Blog, with FoldString.NET? No, but Whidbey has Normalization (which is kinda more cooler), where I pointed out that the NLS API function FoldString, which predates Unicode normalization by many years, provides very Normalization-esque functionality.

    In fact, the analogues given are valid on their face since they conceptually do the same thing:

    FormC      MAP_PRECOMPOSED
    FormD      MAP_COMPOSITE
    FormKC     MAP_PRECOMPOSED | MAP_FOLDCZONE
    FormKD     MAP_COMPOSITE | MAP_FOLDCZONE
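
    For the Unicode side of that table, here is a small sketch of what the four forms do, using only String.Normalize. The class and method names are made up for illustration; the two sample characters are U+00E9 (a precomposed e with acute) and U+FB01 (the fi ligature, a compatibility character).

```csharp
using System;
using System.Text;

public static class NormalizationForms {
    // Thin wrapper so the calls below read like the table above.
    public static string Apply(string s, NormalizationForm form) {
        return s.Normalize(form);
    }

    public static void Main() {
        string eAcute = "\u00e9";
        string fiLigature = "\ufb01";

        // FormD splits the letter into e + U+0301; FormC puts it back together.
        string decomposed = Apply(eAcute, NormalizationForm.FormD);
        Console.WriteLine(decomposed.Length);                                     // 2
        Console.WriteLine(Apply(decomposed, NormalizationForm.FormC) == eAcute);  // True

        // The canonical forms leave the ligature alone; the K forms fold it.
        Console.WriteLine(Apply(fiLigature, NormalizationForm.FormC) == fiLigature); // True
        Console.WriteLine(Apply(fiLigature, NormalizationForm.FormKC));              // fi
    }
}
```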

    As an interesting side point, the fact that Asmus Freytag was heavily involved with both the Microsoft function when the spec was originally being developed and the UAX when it was being written is not a coincidence -- he was instrumental in both of them.

    Of course there can be (or, in this case, is) a wide chasm to deal with.

    You know, the chasm between the concept and the reality.

    As an example, if you run the following code on an XP machine to compare Microsoft's FoldString functionality with Unicode's normalization, looking just at the Basic Multilingual Plane (U+0001 to U+ffff):

    using System;
    using System.Text;
    using System.Globalization;
    using System.Runtime.InteropServices;

    public class Test {
        [DllImport("kernel32.dll", CharSet=CharSet.Unicode, EntryPoint="FoldStringW", ExactSpelling=true, CallingConvention=CallingConvention.StdCall)]
        private static extern int FoldString(uint dwMapFlags, string lpSrcStr, int cchSrc, StringBuilder lpDestStr, int cchDest);

        private const uint MAP_PRECOMPOSED = 0x00000020;  // convert to precomposed chars
        private const uint MAP_COMPOSITE = 0x00000040;  // convert to composite chars

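        public static void Main() {
            // Walk the BMP one UTF-16 code unit at a time (the loop stops before U+FFFF, a noncharacter).
            for(ushort uch = 0x0001; uch != 0xffff; uch++) {
                // Skip surrogate code units and unassigned code points.
                UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory((char)uch);
                if((cat == UnicodeCategory.Surrogate) || (cat == UnicodeCategory.OtherNotAssigned)) {
                    continue;
                }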
                StringBuilder sb = new StringBuilder(10);
                string st = ((char)uch).ToString();
                string stFormD = st.Normalize(NormalizationForm.FormD);
                string stComposite;
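                // cchSrc of -1 means "null-terminated"; the returned count then includes the terminator.
                int ret = FoldString(MAP_COMPOSITE, st, -1, sb, sb.Capacity);
                if(ret > 0) {
                    // Drop the trailing null before comparing with the managed string.
                    stComposite = sb.ToString(0, ret - 1);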
                    if(stComposite != stFormD) {
                        Console.Write("USV: ");
                        Console.Write(uch.ToString("x4"));
                        Console.Write("   |||   Microsoft: ");
                        for(int ich=0; ich < ret - 1; ich++) {
                            Console.Write(((ushort)stComposite[ich]).ToString("x4"));
                            Console.Write(' ');
                        }
                        Console.Write("   |||   Unicode: ");
                        for(int ich=0; ich < stFormD.Length; ich++) {
                            Console.Write(((ushort)stFormD[ich]).ToString("x4"));
                            Console.Write(' ');
                        }
                        Console.WriteLine();
                    }
                }
            }
        }
    }

    You will see that even just looking at the MAP_COMPOSITE vs. Normalization Form D case, there are 12,224 entries that have different results.

    Now 11,172 of those are Korean so we'll throw those out for a moment.

    There are still 1,052 differences between the two.
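
    The 11,172 figure is exactly the number of precomposed Hangul syllables (U+AC00 through U+D7A3), every one of which Normalization Form D breaks down into its jamo. The sketch below checks that count using only String.Normalize; the class and method names are invented for illustration, and the guess that the older FoldString tables simply leave those syllables alone is an inference from the numbers rather than something stated above.

```csharp
using System;
using System.Text;

public static class HangulDecomposition {
    // Counts the precomposed Hangul syllables that FormD changes.
    public static int CountDecomposing() {
        int count = 0;
        for (int cp = 0xAC00; cp <= 0xD7A3; cp++) {
            string s = ((char)cp).ToString();
            if (s.Normalize(NormalizationForm.FormD) != s) {
                count++;
            }
        }
        return count;
    }

    public static void Main() {
        int hangul = CountDecomposing();
        Console.WriteLine(hangul);          // 11172 (19 leads * 21 vowels * 28 trails)
        Console.WriteLine(12224 - hangul);  // 1052, the differences left over
    }
}
```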

    Now in Vista, the work was done to dump these older tables and instead call the normalization functionality provided by the NormalizeString function that was now also a part of the NLS API. That work involved some interesting tradeoffs that might be worthy of a blog another day, but for now I have a totally different set of things to talk about....

    You see, we now have to talk about the other place that these prehistoric normalization-esque tables are used.

    In the MultiByteToWideChar function and its MB_PRECOMPOSED and MB_COMPOSITE flags, which map as one would expect to the MAP_PRECOMPOSED and MAP_COMPOSITE flags from FoldString.

    Now I am not going to pretend that these flags are such great things -- in fact I am on record (ref: A few of the gotchas of MultiByteToWideChar and The MB_PRECOMPOSED flag is stupid, and the MB_COMPOSITE ain't no genius either) explaining how one of them is bad and the other is not needed since it is the default and can cause bugs by being passed gratuitously.

    It is also connected to the WC_COMPOSITECHECK flag for WideCharToMultiByte that I blogged about in A few of the gotchas of WideCharToMultiByte, though not nearly as much to worry about except occasionally. We'll ignore it for now. :-)

    Now it is true that changing a mapping function whose job it is to try its best to map as requested, so that it provides better data, is for the majority of people a good decision. I believe that, and it is a decision I would defend as being the correct behavior.

    But changing the behavior of MultiByteToWideChar to cause it to return so many potential differences, that is another kind of decision entirely. In that case I would defend the decision not to change the behavior.

    Especially when we are on record as not ever wanting to add, remove, extend, or modify code pages!

    Even though this extra operation is not in the code pages themselves, it is behavior that is built into the functions. And claiming on the one hand that we won't make changes ever while deciding on the other hand to make over a thousand changes? It is easy to imagine customers being unhappy with the end result.

     

    Sponsor? We don't need no stinking sponsor! :-)

  • Sorting it all Out

    It used to be right, dammit!

    • 2 Comments

     

    The GetDateFormat function from the NLS API has been around for a while.

    Like since NT 3.5 and Windows 95.

    The documentation for this function has of course been expanded and modified and edited many times since it was initially added to the Platform SDK.

     

    One bit from the documentation has probably been around since the very beginning:

    dwFlags
        [in] Flags specifying various function options that can be set if lpFormat is set to a null pointer. If lpFormat is not set to a null pointer, this parameter must be set to 0. The application can specify a combination of the following values.

    One bit of it:

    If lpFormat is not set to a null pointer, this parameter must be set to 0.

    used to be right.

    And it was right for a reason -- because most of the flags relate to specifying the kind of format to use and such things are really only interesting if you aren't passing a format.

    I mean, how on earth can a function respect a request to modify the built-in format like DATE_SHORTDATE or DATE_LONGDATE or DATE_YEARMONTH or LOCALE_NOUSEROVERRIDE if you pass your own format that would ignore your request?

    Better to fail fast and fail consistently!

    But then new flags like

    • LOCALE_USE_CP_ACP, that only affects the encoding (code page) of the returned string when the ANSI function is called, and
    • DATE_USE_ALT_CALENDAR, that mostly only affects the date value itself that will be formatted and not the underlying format used, and
    • DATE_LTRREADING and DATE_RTLREADING, that only affect characters to insert into the resultant formatted string
    started getting added.

    Any of those might make sense to want to add even when you explicitly pass a format.

    It would be completely unfair if we didn't let you pass one of them if you needed the results.

    Luckily, the function itself works for these cases, even though the documentation doesn't know about this!

    Especially since there is that GetDateFormat wrapper that will sometimes add flags even if the lpFormat is not null....
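
    To make the distinction concrete, here is a sketch of the reasoning in code form. It is emphatically not the validation that GetDateFormat itself performs; the DateFlagCheck class and its method are hypothetical, and the only real things in it are the flag values, which are the ones defined in the Windows headers.

```csharp
using System;

public static class DateFlagCheck {
    // Flag values as defined in the Windows SDK headers.
    public const uint DATE_SHORTDATE        = 0x00000001;
    public const uint DATE_LONGDATE         = 0x00000002;
    public const uint DATE_USE_ALT_CALENDAR = 0x00000004;
    public const uint DATE_YEARMONTH        = 0x00000008;
    public const uint DATE_LTRREADING       = 0x00000010;
    public const uint DATE_RTLREADING       = 0x00000020;
    public const uint LOCALE_USE_CP_ACP     = 0x40000000;
    public const uint LOCALE_NOUSEROVERRIDE = 0x80000000;

    // Flags that pick or adjust a built-in format, so they are
    // meaningless when the caller supplies an explicit format string.
    private const uint FormatSelecting =
        DATE_SHORTDATE | DATE_LONGDATE | DATE_YEARMONTH | LOCALE_NOUSEROVERRIDE;

    // Hypothetical rule: with a custom format, anything is fine
    // as long as it does not try to select a built-in format.
    public static bool MakesSenseWithCustomFormat(uint flags) {
        return (flags & FormatSelecting) == 0;
    }

    public static void Main() {
        Console.WriteLine(MakesSenseWithCustomFormat(DATE_LONGDATE));                           // False
        Console.WriteLine(MakesSenseWithCustomFormat(DATE_RTLREADING | DATE_USE_ALT_CALENDAR)); // True
    }
}
```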

    No worries, the documentation can be updated. The weird part is that this really has been wrong for a long time. No one ever noticed it before, even to complain about how the wrapper function might be wrong?

    I guess it means no one reads the documentation either.... :-)


    This post brought to you by U+200f, RIGHT-TO-LEFT MARK

  • Sorting it all Out

    Tavultesoft is one of the company names mispronounced more often than Trigeminal

    • 13 Comments

    Regular readers may recall that I have mentioned Marc Durdin in the past, especially in posts like the recent The key to key messages is a key contribution, where I went on for a bit about the fact he impresses me professionally....

    I also enjoyed the Australian beer that he and Gary McMullan brought for me when I saw them last. I suppose that might be being happy with the two of them personally. :-)

    I vaguely recall that the night when Marc and Peter Constable were ordering Thai food in Thai was also fun. If he were not spending so much of his life down under he'd be cool to hang out with, I imagine.

    And Marc's father John Durdin has probably forgotten more about Lao than I might ever have the opportunity to know even if I moved to Laos tomorrow and spent the rest of my life there. And I am not just talking about his sorting efforts, which we wouldn't be as good as even if we were working properly. Note that this paragraph has nothing to do with anything, except to point out that his dad impresses me too!

    Anyway, I bring up Marc for a reason.

    The other day something very cool happened.

    Tavultesoft joined the Unicode Consortium as an Associate Member!

    Tavultesoft Pty Ltd.

    In their own words from their site:

    Tavultesoft is the developer of a market-leading keyboard mapping software, Keyman. Keyman brings a simple solution to the complexity of typing in a range of languages and scripts. It is the solution for languages that are either unsupported or only partly supported by the operating system. The Keyman product family includes keyboard design tools, Windows-based keyboard mapping, and web-based JavaScript keyboards. Keyman has attracted users from around the globe who both benefit from the software and the keyboard layouts available. Linguistic experts around the globe contribute their expertise and skills to develop keyboard layouts for both common languages and languages that otherwise would have no support on computers. Keyman is now in its 7th release since it was first developed in 1992.

    Now Keyman is a product that I think is cooler than MSKLC for several reasons, including the obvious such as the fact that it covers scenarios that MSKLC doesn't, such as Win9x.

    Though one of the things that really impresses me is that Marc Durdin of Tavultesoft actually dug in to all of the Text Services Framework interfaces and such and figured them out. Enough to produce a working product, and enough to be able to push back on Microsoft when they ran into bugs -- which more often than not were actual bugs and limitations in the Text Services Framework!

    The people who can dig in to complex components like this (the work of Rick Cameron of Crystal Decisions to support Uniscribe and also to support and encourage the extension of MSLU while it was in early beta under development and there was no information about it is another example) are impressive because the lack of samples and sometimes even documentation does not daunt them.

    They know that we are likely full of crap if we claim it's easy since we don't have samples out there, but they go in and figure out the hard stuff anyway.

    Plus the many things that Keyman can do -- that MSKLC and Text Based TSF TIPs can't -- make it fairly unique among such tools and required for sensible input methods in more languages than many experienced folks in this area can fathom....

    Anyway, enough gushing. Welcome, Tavultesoft, to Unicode, as an associate member!


    This blog brought to you by ຟ (U+0e9f, aka LAO LETTER FO SUNG)

  • Sorting it all Out

    As the comma turned (in space!)

    • 2 Comments

    Unicode List regular Karl Pentzlin wrote to the list the other day:

    On http://www.iau.org/public_press/news/release/iau0807/ ,
    the IAU (International Astronomical Union) publishes a press release
    of 2008-09-17 "IAU names fifth dwarf planet Haumea".

    There, also the names of two moons of this dwarf planet are announced,
    the larger of them being named Hiʻiaka (after a Hawaiian goddess).

    It is pleasant to see that this name is in fact spelled correctly in
    the recent version of that press release, including U+02BB as the
    correct encoding for the Hawaiian ʻokina.

    This even is done in the plain text file downloadable from that
    site, which is UTF-8 encoded.

    Thus we have now a celestial body which is officially given a name
    which requires Unicode to be spelled correctly, rather than simply
    ASCII (aka ISO 646) or ISO 8859-1.

    From that press release:

    Haumea sits among the trans-Neptunian objects, a vast ring of distant cold and rocky bodies in the outer Solar System. At this moment it is roughly 50 times the Sun-Earth distance from the Sun, but at its closest the elliptical orbit of Haumea brings it 35 times the Sun-Earth distance from our star.

    Haumea is the name of the goddess of childbirth and fertility in Hawaiian mythology. The name is particularly apt as the goddess Haumea also represents the element of stone and observations of Haumea hint that, unusually, the dwarf planet is almost entirely composed of rock with a crust of pure ice.

    Hawaiian mythology says that the goddess Haumea's children sprang from different parts of her body. The dwarf planet Haumea has a similar history, as it is joined in its orbit by two satellites that are thought to have been created by impacts with it in the past. During these impacts, parts of Haumea's icy surface were blasted off. The debris from these impacts is then thought to have gone onto form the two moons.

    After their discovery, in 2005, the moons were also given provisional designations, but have now too been given names by the CSBN and the WGPSN. The first and largest moon is to be called Hiʻiaka, after the Hawaiian goddess who is said to have been born from the mouth of Haumea and the matron goddess of the island of Hawaiʻi. The second moon of Haumea is named Namaka, a water spirit who is said to have been born from Haumea's body.

    Now that is indeed quite cool, for reasons I am having trouble fully identifying.

    I mean, language is everywhere.

    And Unicode captures language.

    So why be so interested when people get better and better at supporting language?

    No idea. But I love it. :-)
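
    For anyone who wants to see Karl's point in code, here is a tiny sketch. It treats ASCII as everything up to U+007F and ISO 8859-1 as everything up to U+00FF, which is a simplification, and the OkinaCheck class and FitsBelow method are names invented for this example.

```csharp
using System;
using System.Linq;

public static class OkinaCheck {
    // True if every UTF-16 code unit in s is at or below the given limit.
    public static bool FitsBelow(string s, int limit) {
        return s.All(c => c <= limit);
    }

    public static void Main() {
        string moon = "Hi\u02bbiaka";  // spelled with U+02BB, the okina

        Console.WriteLine(FitsBelow(moon, 0x7F));      // False: not ASCII
        Console.WriteLine(FitsBelow(moon, 0xFF));      // False: not ISO 8859-1 either
        Console.WriteLine(FitsBelow("Namaka", 0x7F));  // True: the other moon is plain ASCII
    }
}
```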

     

    This blog brought to you by ʻ (U+02bb, aka MODIFIER LETTER TURNED COMMA - typographical alternate for U+02BD or U+02BF, used in Hawai`ian orthography as `okina (glottal stop))

  • Sorting it all Out

    Survey says? Nine Inch Wails!

    • 4 Comments

    Nothing that is even familiar with something vaguely reminiscent of anything vaguely technical....

    So one of those interesting mails from the Nine Inch Nails mailing list included information about a survey they wanted some people to take, which included a little bit of the bribe (er, inducement) to encourage people to take it:

    As an incentive, everyone who completes the survey will be able to download a video of live performance from this most recent tour (and I know what's going through your little minds right now: "I'll just grab this off a torrent site and not have to fill out the survey!!!" and guess what? You will be able to do just that and BEAT THE SYSTEM!!!! NIN=pwn3d!!!)
    BUT
    What if we were to select some of those that DO complete the survey and provide them with something really cool? I'm not saying we'll ever get around to it, but if we did maybe something like signed stuff, flying someone to a show somewhere in the world, a magic amulet that makes you invisible, a date with Jeordie White (condoms supplied of course), you know - something cool. See, you'd miss that opportunity...

    I was left wondering, I mean given my real disinterest in a date with bassist Jeordie White (not to mention fear of any activity that would make condoms seem prudent were such an event to happen -- seriously!), whether I had filled out some prior survey incorrectly or had signed up for the wrong mailing list way back when or whether the rule about not getting too drunk at the show had been broken once too often and I had forgotten something really important....

    It actually reminded me of this blog, and this XKCD comic:

    Voting Machines

    Because, while better than the alternative, clearly this would be a date gone horribly wrong....

    On the other hand, being flown to a show somewhere in the world is a real treat I've seldom (but not never) had though never with NIN, and the magic amulet could really come in handy!

    So the list was not entirely repugnant to me.

    Plus Jeordie is a hell of a bassist, and I'd certainly be willing to buy him a glass of whatever he was drinking.

    So why exactly did my mind rebel about this survey, just because of the paragraph? "Was I really such a cliché?" I wailed softly as I was telling this story to Andrea a little while ago, and she asked me to close my eyes, after which she read the paragraph again but inserted the name of an attractive female bassist I've never expressed interest in previously, in the same spot Mr. White originally occupied.

    And that actually bothered me too -- as much as if not more than the original words.

    So, as Andrea was able to determine via nearly scientific methods, I'm a prude. :-)

    Brilliant!

    After we got off the phone, I decided to fill out the survey. I figured I could always give the date with Jeordie to Sam if I won it. She's a true groupie and I'm sure she'd appreciate it more than I would, anyway.

    The video was cool, in any case. :-)

    That reminds me! I managed to acquire tickets to see Liz Phair next month. A very different show than Trent Reznor would put on, obviously. But both tend to be quite memorable even if in two very different ways, even to a prude like me.

    Maybe I'll even blog about it....


    This blog brought to you by Ｎ (U+ff2e, aka FULLWIDTH LATIN CAPITAL LETTER N)

  • Sorting it all Out

    A lot of problems to enumerate...

    • 5 Comments

    The question (asked over in the microsoft.public.win32.programmer.international newsgroup) was:

    I think there is something fundamental that I missing about the CompareString API.  Distilling a lot of confusion, I have an example where something like...

    LPCTSTR cTest = L"{\rtf";

    CompareString( LOCAL_INVARIANT, 0, cTest, -1, cTest, -1)  does not return CSTR_EQUAL.

    CompareString( LOCAL_INVARIANT, NOM_IGNORECASE, cTest, -1, cTest, -1) does not return CSTR_EQUAL.

    It does return CSTR_EQUAL on XP with English locales (UK and US), but fails with a German locale.

    Shouldn't a string be equal to itself?  What am I missing?

    Alternatively, can someone suggest a better way to do a case insensitive test of equality that works independent of the locale?  I can see where sorting would be locale dependent, but it seems that equality should not depend on locale.

    Thanks.

    Now there are a whole bunch of problems here, but I thought I'd open it up to the peanut gallery (er, readers) if anyone wants to enumerate errors here, and by errors I mean technical errors, process errors, errors in the questions being asked, usage errors, conventional errors, best practice errors, errors in language usage, errors in spelling, errors in the color coordination of the person asking, literally anything -- provided you can prove it by using the text above to do so (if you can figure out how to prove the color coordination issue I will be quite impressed!).

    Anyone who was there for the first time but has not forgotten the answers, and anyone who tries to find the old thread in the newsgroups is disqualified.

    Psychic debugging encouraged, and I'll be using it to determine if people cheat on any of the above. :-)

    Ready, set, go!

  • Sorting it all Out

    Michael hasn't worn pants in months

    • 1 Comments

    I recently had an epiphany, about some of the ultra-fast communication technologies such as TEXT'ing and Twitter (aka blogs for the ADHD).

    It came up because I was having dinner with someone who happens to be a big twitterer.

    My exact words were along the lines of "this is the first time I have worn pants in months."

    As the statement alludes to, I was wearing pants at the time. Though I was wearing sandals, sans socks. I only know of two people these days whose company I can stand who avoid wearing shoes pretty much as often as possible. And I am not one of them. Thus the sandals.

    Please don't worry, there was no danger of me being arrested for indecent exposure, though. Because I was wearing shorts all of those other times throughout the summer.

    I was going to talk about the epiphany.

    You see, I witnessed what could really only be a serious temptation to send out a tweet to the twitterverse that relayed the "Michael hasn't been wearing pants for months" message.

    A message that kind of had its interest obviated a moment later when the incredulous grin inspired me to explain that I have been wearing shorts all summer.

    Now dinner was just the two of us, but imagine a third person was there. Say a female friend who was flirting with the attractive male server and who therefore missed the whole "pants can't touch him" issue.

    And let us further suppose that the laugh and "HUH?" response from my first friend was enough to distract the other friend from her server seduction. My first friend would volunteer the "Michael apparently stopped wearing pants a few months ago!" information and a whole little subconversation would start.

    Is that so different than what would have happened with the tweet and the subsequent tweets it would inspire?

    So there you go -- Twitter is the alternative to being right there. For some, who will send tweets to people who are also there, it is even the semi-private subvocalized bit of conversation as well.

    People often think of Twitter as supporting "microblogging". Now if you blog like Scoble does that may be true, but for most people it isn't. :-)

    Now if you think about the way people might (usually, hopefully) think before they blog -- not me, of course, but some people -- they may be less likely to think before they say something due to how much less permanent conversation feels.

    And thus we get to why I don't tend to use Twitter, or to send text messages.

    Because though I can name aloud a small list of people who believe I never think before I blog, they are dead wrong. I do, especially after one post that I blogged the same way one might tweet, and regretted not long after.

    And although it seems less permanent than blogs, in its own way tweeting is not -- someone determined to find examples of your mistakes will have an easier time if they just follow you on Twitter. It is like having someone carry a microrecorder around to keep your conversations on file (though with Twitter the ones being recorded are volunteering the content).

    None of that is in my nature.

    Now with all of that said, I actually do use Twitter.

    However, I only use it as a way to forcibly lower the size of my blog titles (a problem I had no other way to control previously -- I am just too wordy!).

    But by using Twitter to send my blog titles, which become Facebook status messages, the fact that I send out a blog title this way keeps the size of the title down since Twitter limits one to 140 characters per tweet.

    I follow no one on Twitter, and no one follows me, which might seem irresponsible since it would provide an easy mechanism by which people could know that a blog went live.

    But remembering the traditional Twitter is about literal conversation, I hardly think that "Michael just blogged a blog on his Blog" is important enough to say out loud in the middle of a conversation. Thus it would be inappropriate for me to use Twitter for such a thing -- I'm simply not that important. :-)

    Perhaps if I were, I might have been wearing pants today....

     

    This blog brought to you by t (U+0074, aka LATIN SMALL LETTER T)

  • Sorting it all Out

    The Company Meeting, the interesting science of Forensic Typography, and what happened after

    • 4 Comments

    So a few days ago, I was at the Microsoft Company Meeting.

    Nothing I haven't done before, mind you. Regular readers might even remember when I have blogged about it previously (eg here).

    This time, I figured I've actually been doing the 545 bus into Seattle often enough that I could trust that particular way of getting there, scooter and all.

    I had already learned about the accessibility options and such, so I felt ready.

    And it actually worked out pretty well. :-)

    I wore the group tshirt they handed out, and I brought the hat (though I didn't wear the hat; I'm not a hat guy!).

    As happened last time, I did not bother with the assigned section for the group since it is about as accessible as K2 would be, for the scooter. I went right to the field.

    The funny part was that unlike the first time, no one questioned where I was going -- I think people assume if you look confused that you might be in the wrong place. But if you go boldly forward then they just assume you know what the hell you are doing....

    I won't talk about the meeting content itself, other than to say I did not leave feeling inspired. If people are interested then Mini might have something to say. :-)

    But I did have a chance to talk to several of the other folks on the field level, and I got to learn about what a bunch of other groups do (and as usual I got to explain that no I don't speak a lot of languages, and no I won't do localization!).

    Like one person who actually reads my blog, who congratulated me on the Bulldog Award. I'll admit it, that was cool. And no, I don't carry the award around with me, though it was a very funny question to be asked, as was whether the award helps me with the ladies (no, I explained, it doesn't -- being stubborn can often be a detriment there!).

    And then one group in particular fascinated me -- the STB Finance group.

    They had these great shirts with text that said:

    Got Budget?

    Server & Tools Finance

    Now yes I thought it was funny.

    But that wasn't what got me.

    What got me was the font they used -- Comic Sans MS.

    I told them I thought this was totally awesome!

    They were actually surprised that I knew what the font was, so we talked for a bit about the fascinating science of Forensic Typography, something I am really just a beginner in (it is amazing to watch the experts) but after seeing Helvetica I have been noticing both that font and Comic Sans MS more often. They just kind of jump out at you.

    As the meeting was winding down they even gave me one of the shirts:

    Awesome!

    That night I finally went on one of those outings from the Seattle Barnight group on Facebook (they are a little less formal than the Seattle Anti-Freeze folks, and they don't require one to be a couple to be at the event). I had been missing events for months (always other stuff going on), but with the scheduled bar just blocks away from the unofficial after-party I was at from the Company Meeting, I just decided to go for it.

    And I had a great time, plus met a bunch of people who were all from the area but weren't all employees of Microsoft (no offense, but 'Softies do have their drawbacks, particularly when it comes to non-geeky conversations!).

    All things considered a great use of a day, and although slightly hungover in the morning the next day (Friday), I was able to jump in and get a bunch of interesting work done. Because the v.next work is still quite interesting, even if I can't talk about it just yet....

     

    This blog brought to you by ෲ (U+0df2, aka SINHALA VOWEL SIGN DIGA GAETTA-PILLA)

  • Sorting it all Out

    Sorting the Vowels all Out

    • 0 Comments

    The other day in Sorting the DPRK all Out, I mentioned:

    Let's ignore the vowels for a moment, I'll talk about those another time (I have different linguistic theories to draw in for them!).

    Well, it is officially time to stop ignoring the vowels. :-)

    I started writing up the blog about the vowels and then I saw that Wikipedia already had most of the info I was going to be writing about, plus adding one or two additional points.

    First on the historical order for the vowels:

    ㆍ ㅡ ㅣ ㅗ ㅏ ㅜ ㅓ ㅛ ㅑ ㅠ ㅕ 

    [ARAEA  EU  I  O  A  U  EO  YO  YA  YU  YEO]

    Then on the South Korean ordering for the vowels:

    ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ 

    The modern monophthongal vowels come first, with the derived forms interspersed according to their form: first added i, then iotized, then iotized with added i. Diphthongs beginning with w are ordered according to their spelling, as ㅏ or ㅓ plus a second vowel, not as separate digraphs.

    [A  AE  YA  YAE  EO  E  YEO  YE  O  WA  WAE  OE  YO  U  WEO  WE  WI  YU  EU  YI  I]

    And finally the North Korean ordering for the vowels:

    ㅏ ㅑ ㅓ ㅕ ㅗ ㅛ ㅜ ㅠ ㅡ ㅣ ㅐ ㅒ ㅔ ㅖ ㅚ ㅟ ㅢ ㅘ ㅝ ㅙ ㅞ 

     All digraphs and trigraphs, including the old diphthongs ㅐ and ㅔ, are placed after all basic vowels, again maintaining Choe's alphabetic order.

    [A  YA  EO  YEO  O  YO  U  YU  EU  I  AE  YAE  E  YE  OE  WI  YI  WA  WEO  WAE  WE]

    Now despite the claim of maintaining the original alphabetic order, really both the ROK and the DPRK ordering clearly deviate from Choe's order in several cases - each heading off in slightly different directions. Different enough to make one feel like one is looking at an English vs. Lithuanian on where to put the "Y" type split!

    Even ignoring the comments about the basis for the differences, you can kind of see the basis each one is using just by looking at the vowels themselves (I added the Unicode names to each to hopefully be of some help here, though for a couple of the vowels this may actually be a bit of a hindrance!).

    Though once again this kind of highlights the huge differences between the ordering of the modern 11,172 Hangul syllables, since each of the 11,172 contains a vowel and clearly the vowels have been shifted quite a bit here.
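
    To see how far apart the two orderings pull even a short list, here is a small sketch that sorts the same vowels both ways. The two order strings are copied from the lists above; the VowelOrder class and its Sort method are hypothetical names for this example, and this is only a toy over the compatibility jamo, not how any real collation of Hangul syllables is implemented.

```csharp
using System;

public static class VowelOrder {
    // Vowel orders as listed above, using the Hangul compatibility jamo.
    public const string South = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ";
    public const string North = "ㅏㅑㅓㅕㅗㅛㅜㅠㅡㅣㅐㅒㅔㅖㅚㅟㅢㅘㅝㅙㅞ";

    // Sorts the given vowels by their position in the chosen order string.
    // Characters missing from the order string sort first (IndexOf gives -1).
    public static string Sort(string vowels, string order) {
        char[] chars = vowels.ToCharArray();
        Array.Sort(chars, (a, b) => order.IndexOf(a).CompareTo(order.IndexOf(b)));
        return new string(chars);
    }

    public static void Main() {
        string sample = "ㅣㅐㅗㅏ";
        Console.WriteLine(Sort(sample, South));  // ㅏㅐㅗㅣ
        Console.WriteLine(Sort(sample, North));  // ㅏㅗㅣㅐ
    }
}
```

    The South Korean order keeps ㅐ right next to ㅏ, while the North Korean order pushes it past all ten basic vowels, which is the split described above.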

    The Wikipedia Vowel Jamo design topic gives some interesting background on the vowels in general. I will likely talk about the vowel harmony that used to happen in Korean 600-odd years ago and what happened to it in modern times. And all that is left other than that is to talk about the additional Jamo used for Old Hangul encoded today, as well as the new Jamo being added to Unicode that I mentioned in Using a character proposal for a 'repertoire fence' extension.

    I haven't decided whether that is one more blog post or two, but you'll know the next time I blog on this subject. :-)

     

    This blog brought to you by 꺙 (U+ae99, aka HANGUL SYLLABLE SSANGKIYEOK YA IEUNG)
