
February, 2008

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Sorting it all Out

    Behold the Table Driven Text Service, Part 10 ("Even Jove nods," Atlas shrugged, "so we'll convert their old files, now.")



    Last time, I explained a bit about what this one would be about -- it is a pure IME blog posting, so if IMEs do not interest you then you can skip on and wait for the next part....

    My exact words were:

    Next time I'll look at how to convert from some of the old style IMEs if you have been using them in the past...

    I will now explain what the hell I was talking about. :-)

    You see, in prior versions of Windows, Microsoft provided for Traditional and Simplified Chinese users the dotIME, which was similar to the Table Driven TIP in terms of lots of its functionality, though it was based more directly on the old IMM (Input Method Manager) based IMEs.

    In addition, end users could create their own dotIME by providing a "dictionary" text file and then running one of the conversion tools (named ImeGen.exe and UImeTool.exe).

    Now this text file format, while very useful for its time, was not the format that the new Table Driven Text Service IMEs use. Just a slightly different direction, one that is not being carried forward into Vista and beyond. Mea culpa, etc. -- this new format, maybe it will last longer!

    Therefore with Vista, those tools (ImeGen.exe and UImeTool.exe) are no longer supported, but Microsoft wanted to provide a way so that a customer could provide the text file and it would convert the dictionary to the Table Driven TIP dictionary format....

    The syntax to make the conversion happen (all on one line):

    RunDll32.exe "%ProgramFiles%\Windows NT\TableTextService\TableTextService.dll" DictionaryGenerator [options] <output text file name>

    Note that special, case-sensitive DictionaryGenerator keyword.

    And the options are as follows:

    -format:<name>

    Specify input text file format

    <format>                Traditional | Simplified | HongKong

    -section:<name>:<input text file name>

    Specify input text file name for each particular section

    This option can specify one or more of the following:

    <name>                  SettingFile | KeyStroke | Radical | Text | Phrase | Symbol

    <input text file name>  Specify input text file name

    -cp:<code page>

    Specify the code page used to convert the text file to Unicode if the input text file(s) are in some other code page. Note that there is only one setting, which is used for all files.

    <output text file name>

    Specify output text file name

    Now this converter will only work from the text file, not the binary format file that the older tools (ImeGen.exe and UImeTool.exe) generated.

    But if you have the text files this converter will reportedly save you some trouble. :-)
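    To make the option syntax above a little more concrete, here is a minimal Python sketch that only assembles the command line as a string; it does not run anything. The function name, the input file names, and the order in which the options are emitted are all assumptions made for illustration, and it assumes file names without spaces since the post does not say how the tool handles quoting. Note that %ProgramFiles% is left unexpanded, so the string is meant to be pasted into a command prompt.

```python
# Sketch only: builds the DictionaryGenerator command line described above.
# Option names and allowed values come from the post; everything else
# (function name, option order, sample file names) is hypothetical.

DLL_PATH = r"%ProgramFiles%\Windows NT\TableTextService\TableTextService.dll"
FORMATS = {"Traditional", "Simplified", "HongKong"}
SECTIONS = {"SettingFile", "KeyStroke", "Radical", "Text", "Phrase", "Symbol"}


def build_command(fmt, sections, output_file, code_page=None):
    """Return the one-line command as a string.

    sections maps a section name to its input text file name.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    unknown = set(sections) - SECTIONS
    if unknown:
        raise ValueError(f"unknown section name(s): {sorted(unknown)}")

    # DictionaryGenerator is the case-sensitive keyword from the post.
    parts = ["RunDll32.exe", f'"{DLL_PATH}"', "DictionaryGenerator"]
    parts.append(f"-format:{fmt}")
    for name, input_file in sections.items():
        parts.append(f"-section:{name}:{input_file}")
    if code_page is not None:
        # One code page applies to every input file.
        parts.append(f"-cp:{code_page}")
    parts.append(output_file)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_command(
        "Traditional",
        {"Text": "old_text.txt", "Phrase": "old_phrase.txt"},
        "converted.txt",
        code_page=950,
    ))
```

    The example call uses code page 950 (Big5) purely as a plausible value for Traditional Chinese input files.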

     

    This post brought to you by ⑩ (U+2469, aka CIRCLED NUMBER TEN)

  • Sorting it all Out

    Moving is a huge horking pain, even to a window office


    It was just a few months ago (in In my opinion, the only thing worse than an office move is a largely gratuitous one and Clarification on my concerns with gratuitous moves) that I expressed my displeasure at moves done for let's say less noble reasons. You may have read about it back then (though there were 100 posts that month so perhaps you might have missed it).

    Anyway, I also mentioned at one point that our CSS contingent had moved out and that there were seven window offices right near me that were now available, so I suppose it was inevitable that I would be tapped.

    As those who know me can probably imagine, when I was sent the congratulatory email from the admin, it was not a simple thing.

    I had conditions. :-)

    I was in India. And the moving date was also set up for when I was still going to be in India.

    Oops!

    So I did push back a bit on the early move date.

    Plus I was going to be coming back (date uncertain at the time) yet I had to be at the end of the (then upcoming) UTC meeting, so the move had to be after the UTC with enough days after that I was able to get some work done.

    At the meeting (at Apple) I discovered at 1 Infinite Loop that the new Limonata cans are so cool that even kickass Adobe software engineers like Eric Muller pale in comparison:

    Plus 1% more lemon juice. I wish my company cared enough about me to stock Limonata in the cafeterias. Hell, it's not like it costs Apple anything (they charge for it there). Sigh....

    Okay, back to the office move thing.

    So we had successfully negotiated the move dates. But I had more conditions!

    I asked whether there would be a straight shot from the door to where I could park the scooter, so that I could stop parking it in the hall.

    She said she wasn't sure.

    I got a little stubborn.

    Essentially I pointed out the fact that in that case, I wasn't sure that I wanted to move.

    Pause before the next email arrived.

    I might have almost talked myself out of the move by this point.

    Something similar almost happened with my original job offer, in fact.

    But admins are made of sterner stuff.

    She looked at the office in person and answered back that it looked fine for the scooter. And since I would be in town on the move day I could pick the right layout to guarantee that.

    So the move was set up for after 5pm on the 11th of February.

    Just after the admin set this up the announcement that she was leaving the group went out. I won't claim they are related, just like I don't blame myself when my 5th grade teacher Ms. Simon retired the year after she was my teacher despite our regular manipulation of her via the scheme I invented entitled "Operation Happy Scan" that proved to be so popular with getting us out of certain kinds of work. It was when I first learned that adults who are not one's parents can also be subject to manipulation.....

    In any case, let's just say that the admin leaving after this whole mess of setting up my move was a coincidence. :-)

    Of course packing is a huge horking pain in the ass, but then again so is asking for help....

    Though I did get help from Russ to move an extra desk into the new office (the movers have problems with putting two corner pieces into a single office).

    And then I moved the computers and a few fragile items myself. I have never trusted the movers and have never had cause to regret my lack of trust.

    The next day I unpacked most of the boxes and left early.

    And the day after that I was so wiped out that I only went in for a few hours and then took the rest of the day off sick. The whole MS thing making me go home sick from MS....

    I had a few meetings to go to, one of them with a Principal Group Program Manager who seemed concerned that I was so tired and then surprised that they would do a move right now. Which once again makes me suspect that the whole gratuitous move is a rumor on its way to reality soon. :-(

    Anyway, I am now moved in, and here is the office:

    Oh and here is the other side of the room with the important things like the bookshelf and the candy:

    The refrigerator in that first picture is stocked with a full case of Limonata and the microwave and toaster are plugged in (as are the three computers on the switcher). And the Miro is there too, though not yet on the wall (but then neither are the posters -- that move rumor is freaking me out too much for some of that at the moment!).

    I worked from home yesterday. I had no meetings and I was answering questions by email all day; the only three people I had to talk to by phone (Chris and Brian and Nico) never even noticed that my office phone was forwarded.

    If Mike hadn't wanted to steal some lollipops just after lunch, I doubt anyone would have noticed I wasn't in. :-)

    So open for business again, and by the time the weekend is over I'll be fully recovered and ready to be back at work full days.

    As you can perhaps see from the picture, the window has a scenic view of a bunch of trees. You can see 156th Ave NE if you squint hard enough through them but all things considered, the trees are probably a better choice.

    Damn, but moving is a pain in the ass....

     

    This post brought to you by ڢ (U+06a2, aka ARABIC LETTER FEH WITH DOT MOVED BELOW)

  • Sorting it all Out

    More on license plates in Bengalūru, and in India


    A quick follow up on Canada isn't Kannada, ay (ಎ)? that was inspired by a comment from Sandeep:

    License plates are generally a literal transliteration of English pronunciations into respective language in India. In Maharashtra also this is common. So a number like MH 1 A 1001 will be like  एम एच १ ए १००१.

    Now my first thought was that it was cool that my non-native-speaker Spidey senses were nevertheless good enough to have figured that out, but that is only mildly impressive (and probably not at all impressive to the native speakers who would obviously figure that out so much faster anyway).

    But my second thought was perhaps more interesting (and I hope Sandeep or another reader knows the answer!):

    What happens with vanity license plates?

    I assume that they exist -- the real question is that if they do, and if there is overlap between those who get vanity plates where they choose the numbers/letters and those who get these transliterated-into-another-language plates, do you get to choose the transliteration?

    It actually takes us back to a unique solution to the whole PenIsland vs. PenisIsland issue I mentioned before in this post and this other one -- if one is given the opportunity to provide a guide to the pronunciation, one can help assure that the way that people read the plate matches one's intent!

    Of course I doubt that two or more identical plates would be issued to different people just because the transliteration was different, but even so it is very cool to contemplate -- it is (as the post title indicates) a sort of ruby for the masses, a real chance to have the best possible control over the way people will by default interpret the text.

    Unless people ignore the transliterated license plates when they see them -- which might be easy since they do not exist on every vehicle.

    This raises another interesting idea -- remember how cool I thought it was that the license plate I saw seemed to "spell out" an English word in part of the Kannada text? How about vanity plates that take advantage of that idea to spell out other words?

    Over here the DMV makes sure to do an "obscenity check" on applications to be sure no offensive plates are issued -- would the analogous departments in India have to account for this extra level of indirection in their approvals? Do they have to now for the existing plates?

    Might be fun for someone who happened to be a Microsoft employee working on the Zune project to choose to put 7819 which would put

    ೭೮೧೯

    on their license plate, right?

    Now this raises another question -- can people use other Indic languages? I imagine in Hyderabad the Telugu/Urdu/Hindi thing has to raise its head. Can one choose one's script?

    Can they mix their choice? How about after the government and the people who make the plates move more to Unicode?

    Were I still a consultant, I think I would want the gig to move the license plate programs to use Unicode. That seems like it would be a fun Unicode conversion project to me -- they could pay me off with vanity plates of my choosing. :-)

    The possibilities for expression seem endless!

     

    This post brought to you by ೭೮೧೯ (U+0ced U+0cee U+0ce7 U+0cef, aka KANNADA DIGIT SEVEN, KANNADA DIGIT EIGHT, KANNADA DIGIT ONE, and KANNADA DIGIT NINE)

  • Sorting it all Out

    Blogger lunches and such, and Have a Happy Valentine's Day, gentle reader


    Very little that is technical, you know the drill. If you don't like it, then please either get over it or skip it! :-)

    So yesterday there was a lunch for Microsoft bloggers in the Building 16/17/18 cafeteria.

    I felt a little like a food snob since I scooted over to Shamiana in Building 25 to pick up my food and then scooted back to 16/17/18 for the lunch. But it just seemed like a better food choice....

    Of course the lunch was on that island in the middle of the cafeteria, the one you can go all the way around and never find a ramp to get up the couple of stairs, but we worked it out (I handed my food over to someone and then parked out of the way, on the low ground).

    It was a nice small crowd (a bunch of people may have been over at TechReady?) but it is always fun to talk with saner bloggers (who don't try to write off hours, who do proofread their posts, and who don't post twice a day "just cuz" like I do).

    I even got to see KC Lemson there, who I have shared an email or two with. I actually think of her blog the way that Cathy thinks of Vogue or Cosmo on a plane -- total guilty pleasure, no real specific call to read it but I do anyway. She did not remember me, but that's okay. I remembered her. I don't think we have ever actually met before, come to think of it....

    I'll just put up a bunch of their blogs if you want to put yourself at the table when we kind of went around and explained who we were and what we do:

    I was not afraid to say the blog can be like therapy, it almost is sometimes. And I guess that makes blogger lunches and dinners like Group Therapy, which is worth the time for its own sake!

    I honestly think if you tried to you could not come up with eight more different Microsoft[-ish] blogs, but that is actually a good thing in this kind of get-together, all things considered. I think fun was had by all.... :-)


    And now for something completely different!

    Today is actually Valentine's Day, which, through my own particular genius that lets me avoid buying stuff for people, I have managed to arrive at once again unattached for such purposes.

    Imagine what I save on flower deliveries and such!

    Though I did get a call last night from Andrea, who asked me to be her valentine from across the ocean. I agreed, and note that this too saves me from having to do anything!

    Andrea does give a bit of a lie to the whole No lyrics required idea, since she and I have also discussed song a few times over the years.

    Though with her it was mainly because she knew I liked it, so she figured it was a reasonable way for her to "repay" me for some of the music I helped introduce her to. I suddenly remember why Miss Manners answered her mails with "Gentle reader" and how it applies here. In both directions.

    We talked about Liz for a bit, and I promised we could still talk about songs now and again (she is another late night, why care about time zones person which is technically even worse since she is like nine hours away!). But they really are two different kinds of conversations, and not because it is two different people, but because we mostly talk about language and collation and internationalization, with a little bit about music sometimes since I helped introduce her to some of it.

    Plus we even went out for a little while, but really decided we made more sense as friends given the distance and the fact that we were really both uninterested in moving and even less interested in a long-distance relationship.

    I think I tend to make myself emotionally unavailable more often than not.

    And that is a choice.

    Or a pattern.

    Andrea suggested that it might be why I talk about song lyrics from time to time -- to make myself emotionally available (on a limited basis). There may be something to that....

    She also thinks I have women sprinkled all over the world, though I pointed out she is taking some artistic license there since (a) we are not talking about all that many women nor (b) all that much of the world, and (c) if you break up with someone over distance issues, it is not a definite thing that you end up together if you remove the distance (something about the people you can live without are not the ones you turn to just because it becomes more convenient!).

    I read her the earlier bit about the lunch today, and she sees a pattern.

    I can almost hear the sad smile behind her voice. "You can't really cook like you used to, but you're kind of a food snob. And not so much with the women like you used to be, but you're kind of a relationship snob, too. I think you miss your old life, Michael."

    She can probably hear the head shake and not just because of the bluetooth headset. "That's no mystery, though -- I have said as much, at times. But I was those kinds of snob about things originally too, so it is not cause/effect -- it is more like a residual part of me. Something one could perhaps no longer deserve but it's the only way I know."

    "You still deserve to be who you are, Michael. But it is a bit like cooking. A little bit of a bitter herb can improve the flavor of a dish when you add it the right way. But take away the dish and if all you have left is the herb, then all people see is the bitter."

    Holy on target insight, Batman. I am momentarily flummoxed.

    She continues, "see, Michael? You're still easy to stun silent, dear."

    "But I'm not bitter about everything, though."

    "True, " she allows. "But you could probably detoxify more of yourself than you have."

    "Like I have any idea how to do that," I say shaking my head.

    "Michael, you've helped other people with the exact same problem. Even me once or twice, remember?"

    I am nodding despite the distance. "Yes," I admit.

    "So I'll tell you what you told me. And what Liz told you by the way. Leave yourself open to the possibility, 'kay?"

    "I can try and work on that."

    "Good. And now you should get some sleep."

    So we agree to keep talking occasionally about whatever, pretty much whenever, like before -- basically to do what we had been doing on and off. We even got off the phone at a semi-decent hour.

    I'll update the blog before it goes live to tell you if I got a good night's sleep.

    [LATER] I did.

     

    This post brought to you by ⋏ (U+22cf, aka CURLY LOGICAL AND)

  • Sorting it all Out

    Canada isn't Kannada, ay (ಎ)?


    More fun from that India trip....

    One of the interesting things that can happen as a government works to keep language in the minds and hearts of people is that the language ends up getting used. The results can be interesting or amusing or even dangerous (thinking back to previous blogs about Irish and Welsh) but there are even more fascinating concepts when you involve Indic languages like Kannada (a-la-Bangalore or Bengaluru (Bengalūru)?).

    Let's take a look at the back of the buses in Bengalūru for a moment....

    Like vehicles around the world, they have license plates. But let's take a closer look:

    I was seeing these license plates all week and every time I did I scrambled for my camera but kept missing getting the picture all week -- this one I got on the way to the airport, it was the closest to non-blurry that I was able to get from the back of a moving taxi!

    On the left side is the license plate like you see on vehicles throughout the city. But if you look on the right, there is that same license plate, using the Kannada script and language!

    Now this is fascinating on several levels -- first and foremost it is the sort of thing that makes the idea of internationalizing StrCmpLogicalW of more than just theoretical interest!

    Then there is taking 0123456789 in Kannada when you get ೦೧೨೩೪೫೬೭೮೯ and by coincidence 01 looks like ON and 1019 looks like NONE. :-)
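    For anyone who wants to play with the digit mapping, here is a small Python sketch. The function name is made up for illustration; the only facts it relies on are that the Kannada digits occupy the ten consecutive code points U+0CE6 through U+0CEF and that Python's int() accepts any Unicode decimal digits.

```python
import unicodedata

ASCII_DIGITS = "0123456789"
# KANNADA DIGIT ZERO..NINE are the consecutive code points U+0CE6..U+0CEF.
KANNADA_DIGITS = "".join(chr(0x0CE6 + i) for i in range(10))
TO_KANNADA = str.maketrans(ASCII_DIGITS, KANNADA_DIGITS)


def to_kannada_digits(text):
    """Replace ASCII digits with Kannada digits; leave everything else alone."""
    return text.translate(TO_KANNADA)


if __name__ == "__main__":
    plate_number = to_kannada_digits("1019")
    print(plate_number)
    for ch in plate_number:
        # Show the code point and its Unicode character name.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    # Going the other way needs no table at all.
    print(int(plate_number))
```

    This covers the digits only; the letters on the plate are transliterated by sound, not by a one-to-one table, so they would need something smarter than str.translate.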

    But there is another interesting thing that happens for the language. Do you see it?

    Just like with Hebrew, where IBM becomes יבמ rather than יבם (mentioned previously here), KA becomes not ಕ (KANNADA LETTER KA) or ಕಾ (ending with a KANNADA VOWEL SIGN AA) or even ಕೆ (ending with a KANNADA VOWEL SIGN E); it instead ends with ಎ (an independent KANNADA LETTER E) -- which gives hints on how it is pronounced if one is a native speaker, while clearly making sure people don't read it as a single word KA but instead as something that sounds like KAY-AAY (the KANNADA A and AA sound more like AHH than AAY, as opposed to the E).

    Do you also see how that KANNADA LETTER E surrounds the second word in order to allow the letters to be pronounced not as a word but as individual letters, and how it seems possibly happier not using ೞ (KANNADA LETTER FA) but instead something closer maybe to ಫ (KANNADA LETTER PHA)?

    A native speaker can probably explain better about what is going on here exactly, but one thing is clear -- the license plate really is quite a literal transliteration of English pronunciations into Kannada. That is indescribably cool, in my opinion!

     

    This post brought to you by ಎ (U+0c8e, aka KANNADA LETTER E)

  • Sorting it all Out

    It's not what you know or even who you know; it's what you're near (aka C is for Copy, that's good enough for me)


    Over in the Suggestion Box, Amie asked:

    What exactly does Ctrl - V stand for and how did it come about.. besides the fact that the V is near X and C? What does the V stand for in the shortcut? Can someone help me figure this one out please?

    Thank you!

    -- Amie (:

    And regular reader Jan Kučera responded (also in the Suggestion Box, where he had moments ago posted an unrelated question that I have not gotten to yet):

    To Amie regarding Ctrl+V meaning:

    ... 'Paste' in Czech is 'Vložit' ...  so ... maybe? ..hm? :-D

    Cute!

    But I am inclined to doubt that the Czech language had much to do with it. :-)

    As far as I can see, both CTRL+X and CTRL+V owe their ubiquitous assignments to their proximity to CTRL+C, which is a very sensible choice for the copy operation....

    Sorry about that, Jan!

     

    This post brought to you by C (U+0043, aka LATIN CAPITAL LETTER C)

  • Sorting it all Out

    Behold the Table Driven Text Service, Part 9 (Will you be content if I tell you how some content can be defined?)



    It is ironic that after those last two blogs with all of the various configuration settings, I am only now getting to the actual settings involving content....

    Of course before you can have the content, first you have to point to where all the content is! That is done in the...

    Profile section

    The Profile section should begin with the “[Profile]” section name.
    In the [Profile] section, a Table Driven TIP could have any or all of the following definitions:

    KeystrokeFile = Path and file name value
    RadicalFile = Path and file name value
    DictionaryFile = Path and file name value
    PhraseFile = Path and file name value
    PhraseFromKeystroke = Path and file name value
    SymbolFile = Path and file name value
    DirectInputFile = Path and file name value

    Now you could have each of these sections within the very same file, the very file that you have been editing all this time.

    Or alternately, you can have a single file that multiple Table Driven TIPs point to, if in fact your goal is to have multiple TIPs with different configuration settings that share content. This can obviously be very helpful when there are large sections that might be shared, like in IMEs.

    Each entry in the section above that is included has its own section that has the actual data in it, which I will now cover below....
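    As a rough illustration of the [Profile] section, here is a Python sketch that reads one from a string using the standard configparser module. This is not how the Table Driven TIP itself reads its files; the real files may use an encoding or syntax details that configparser does not handle, and the file names in the sample are hypothetical. Only the seven key names come from the list above.

```python
import configparser

# The seven definitions listed for the [Profile] section.
PROFILE_KEYS = [
    "KeystrokeFile",
    "RadicalFile",
    "DictionaryFile",
    "PhraseFile",
    "PhraseFromKeystroke",
    "SymbolFile",
    "DirectInputFile",
]

# Hypothetical sample: two entries point at one shared file.
SAMPLE = """
[Profile]
KeystrokeFile = shared.txt
DictionaryFile = shared.txt
PhraseFile = my_phrases.txt
"""


def read_profile(text):
    """Return {definition name: path} for the definitions that are present."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the key names case-sensitive
    parser.read_string(text)
    if not parser.has_section("Profile"):
        return {}
    section = parser["Profile"]
    return {key: section[key] for key in PROFILE_KEYS if key in section}


if __name__ == "__main__":
    for name, path in read_profile(SAMPLE).items():
        print(f"{name} -> {path}")
```

    Definitions that are left out simply do not appear in the result, which matches the "any or all" wording above.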

    Keystroke sections

    Defines the Table Driven TIP’s composition window keystroke data.
    In the [Keystroke.Composition] section, the form is:
    Key code value = Function value

    Defines the Table Driven TIP’s candidate window keystroke data.
    In the [Keystroke.Candidate] section, the form is:
    Key code value = Function value

    Defines the Table Driven TIP’s candidate window with wildcard search keystroke data.
    In the [Keystroke.Candidate.Wildcard] section, the form is:
    Key code value = Function value

    The key code value in any of the three sections above is basically the VK value, as you have seen in previous samples in earlier posts like in Part 1.

    "Function value" in any of the three of the above sections is defined as follows:

    Function Definition           Description
    INPUT                         This keystroke is defined as input to the Table Driven TIP composition engine
    CANCEL_TEXTSTORE_AND_INPUT    This keystroke is defined as finalize composition or candidate
    MOVE_PAGE_DOWN                This keystroke is defined as move one page down on candidate list

    Radical section

    Radical is not used for any conversion method. It just redirects the character displayed in the reading window for a particular keystroke.
    In the [Radical] section, the form is:
    "Key" = "Char"

    Text section

    In the [Text] section, the form is:
    "Keys" = "Converted string"

    Phrase section

    In the [Phrase] section, the form is:
    "String" = "Phrase 1" , ... , "Phrase N"

    Phrase from keystroke section

    In the [PhraseFromKeystroke] section, the form is:
    "Keys" = "Phrase 1" , ... , "Phrase N"

    Symbol section

    In the [Symbol] section, the form is:
    "Keys" = "Char 1" , ... , "Char N"

    Direct input section

    In the [DirectInput] section, the form is:
    "Key" = "Char"

    Now of course you wouldn't always be using every one of these sections (as the earlier samples showed), but they have their uses when you need them. :-)
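    The quoted forms above all share one shape: a quoted key, an equals sign, then one or more quoted values separated by commas. Here is a minimal Python sketch of a parser for a single line of that shape. The function name is made up, and it assumes that quoted strings never contain a double quote themselves, since the post does not describe any escaping rules.

```python
import re

# A quoted key, "=", then the rest of the line.
_ENTRY = re.compile(r'^\s*"([^"]*)"\s*=\s*(.*)$')
# Any quoted string (assumption: no embedded double quotes).
_QUOTED = re.compile(r'"([^"]*)"')


def parse_entry(line):
    """Split '"Keys" = "A" , ... , "N"' into (keys, [values])."""
    match = _ENTRY.match(line)
    if match is None:
        raise ValueError(f"not a quoted entry: {line!r}")
    values = _QUOTED.findall(match.group(2))
    if not values:
        raise ValueError(f"no quoted values in: {line!r}")
    return match.group(1), values


if __name__ == "__main__":
    # One-value forms: [Radical], [Text], [DirectInput].
    print(parse_entry('"KB" = "㎅"'))
    # Multi-value forms: [Phrase], [PhraseFromKeystroke], [Symbol].
    print(parse_entry('"中" = "中标" , "中部"'))
```

    Because values are pulled out by their quotes and not by splitting on commas, a key or value that is itself a comma still comes through intact.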

    Next time I'll look at how to convert from some of the old style IMEs if you have been using them in the past and then after that I'll show some examples of these sections in some of the built-in Table Driven TIP profiles in Vista/Server 2008, after which we'll start building some useful profiles for other languages....

     

    This post brought to you by ⑧ and ⑨ (U+2467 and U+2468, aka CIRCLED DIGIT EIGHT and CIRCLED DIGIT NINE)

  • Sorting it all Out

    Making a different keyboard indicate a different font


    Over in the Suggestion Box, Andy asks:

    Hi,

    I am working with Word 2003, and I have a question about language keyboards and fonts.  I work with Hebrew in a lot of the papers I have to write for school, and I have found Unicode support very helpful to that end.  Switching between English and Hebrew keyboards is very smooth and easy, but I get frustrated because Times New Roman just doesn't cut it for a Hebrew font.

    Is there any way that Word can automatically switch to another font when I change to Hebrew Keyboards?  What I mean is this: Is there some setting in Word to make it so that when I switch from English to Hebrew keyboards, that the font will automatically switch from Times New Roman to Ezra SIL SR (the Hebrew font I use)?  Right now, I have to manually switch between the two, and it sometimes is frustrating.

    Thanks for your help!

    ~Andy

    This is actually not too hard to do, as long as you define a keyboard under a language that Word thinks of as a complex script one (such as Hebrew).

    All you have to do is launch the Font... dialog, which looks like this in Word 2003:

    and like this in Word 2007:

    Just change the value in that Complex Scripts section, and then when you change the keyboard to a complex script language, Word will usually switch to that font....

    Now I say usually since there are times that Word will believe it knows better and will override the choice given here, mainly when it believes that the font you selected cannot handle the input language or cannot handle what the characters in question require.

    But in Andy's case it should do well....

     

    This post brought to you by ﭏ (U+fb4f, aka HEBREW LIGATURE ALEF LAMED)

  • Sorting it all Out

    Behold the Table Driven Text Service, Part 8 (Configuration 'junk in the trunk', part 2)



    After last time, I had it impressed on me that the majority of the people who read this series over time will not care so much about the order of blogs as the actual information, so posts with settings in them before every single window is fully defined are not necessarily a bad thing.

    So today we're going to do some more settings! :-)

    Configuration section for text candidate list window

    This group is for the text candidate list window; however, it should be inside the “[Configuration]” section.

    • Keystroke sort

    Text candidate list items are sorted in keystroke order.

    Form of keystroke sort is:
    KeystrokeSort = integer value
    Where:     0     - turn off keystroke sort (default)
               Not 0 – turn on keystroke sort

    • Text sort

    Text candidate list items are sorted in text string order.

    Form of text sort is:
    TextSort = integer value
    Where:     0     - turn off text sort (default)
               Not 0 – turn on text sort

    • Hide text candidate list window

    Form of hide text candidate list window is:

    CandidateList.Text.HideWindow = integer value
    Where:     0     - show text candidate window (default)
               Not 0 – hide text candidate window

    • Conversion only one item

    When a keystroke is converted and the resulting text candidate list has only one item, or a phrase file is available but has no phrase list matching the converted key, the converted key is finalized.

    Form of conversion only one item is:

    Composition.ConversionOnlyOneItem = integer value
    Where:     0     - turn off conversion only one item (default)
               Not 0 – turn on conversion only one item

    For example, in Chinese Traditional Array, the keystroke “AAES” has only one text string, "虣", and the phrase file doesn’t have this character. For this keystroke, the character "虣" is finalized when the Conversion key is pressed.

    •  Conversion only one item on convert

    When a keystroke is converted and the resulting text candidate list has only one item, or a phrase file is available but has no phrase list matching the converted key, the converted key is finalized.

    Form of conversion only one item on convert is:

    Composition.ConversionOnlyOneItemOnConvert = integer value
    Where:     0     - turn off conversion only one item on convert (default)
               Not 0 – turn on conversion only one item on convert

    For example, in Chinese Traditional DaYi, the keystroke “,,,” has only one text string, “劦”, and the phrase file doesn’t have this character. For this keystroke, the character “劦” is finalized when the Conversion key is pressed.

    You might be wondering at this point: What is the difference between ConversionOnlyOneItem and ConversionOnlyOneItemOnConvert?

    Imagine you have two different profiles (one with each setting). Both profiles define the strings below, in which each particular keystroke has exactly one candidate converted string.

    [Text]
    "KB" = "㎅"
    "MB" = "㎆"
    "GB" = "㎇"
    "Hz" = "㎐"
    "kHz" = "㎑"
    "MHz" = "㎒"
    "GHz" = "㎓"
    "THz" = "㎔"

    In the “ConversionOnlyOneItem” profile, when we type “kb”, the converted string “㎅” is finalized immediately.

    On the other hand, in the “ConversionOnlyOneItemOnConvert” profile, when we type “kb”, the converted string “㎅” is not finalized until we press the Convert key.
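    The difference between the two settings can be modeled in a few lines of Python. This is a toy model of the behavior described above, not the TIP's actual logic: the function names are invented, the lookup is assumed to ignore case because the example types “kb” against a "KB" entry, and anything other than exactly one candidate is treated as out of scope.

```python
# The sample [Text] entries from the two imagined profiles.
TEXT_TABLE = {
    "KB": "㎅", "MB": "㎆", "GB": "㎇",
    "Hz": "㎐", "kHz": "㎑", "MHz": "㎒", "GHz": "㎓", "THz": "㎔",
}


def candidates_for(keys, table):
    """Case-insensitive lookup (an assumption based on the "kb" example)."""
    folded = keys.casefold()
    return [text for entry, text in table.items() if entry.casefold() == folded]


def finalized_text(keys, table, setting, convert_pressed=False):
    """Return the finalized string, or None if nothing is finalized yet."""
    matches = candidates_for(keys, table)
    if len(matches) != 1:
        return None  # zero or several candidates: not covered by this sketch
    if setting == "ConversionOnlyOneItem":
        return matches[0]  # finalized as soon as the keys are typed
    if setting == "ConversionOnlyOneItemOnConvert":
        # Held back until the Convert key is pressed.
        return matches[0] if convert_pressed else None
    raise ValueError(f"unknown setting: {setting!r}")


if __name__ == "__main__":
    print(finalized_text("kb", TEXT_TABLE, "ConversionOnlyOneItem"))
    print(finalized_text("kb", TEXT_TABLE, "ConversionOnlyOneItemOnConvert"))
    print(finalized_text("kb", TEXT_TABLE, "ConversionOnlyOneItemOnConvert",
                         convert_pressed=True))
```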

    Configuration section for phrase candidate list window

    This group is for the phrase candidate list window; however, it should be inside the “[Configuration]” section.

    • Phrase sort

    Phrase candidate list items are sorted in phrase string order.

    Form of phrase sort is:

    PhraseSort = integer value
    Where:     0     - turn off phrase sort (default)
               Not 0 – turn on phrase sort

    • Make phrase from text

    If the profile does not have a phrase file but you would like to show phrases drawn from the text dictionary, turn this switch on.

    Form of make phrase from text is:

    MakePhraseFromText = integer value

    For example, in Chinese Simplified QuanPin, the keystroke “ZHONG” shows several text candidates; if you select the item “中”, the Table Driven TIP builds a phrase list from the text dictionary entries that start with the “中” character, collecting the phrases below:

    "zhongbiao"="中标"
    "zhongbu"="中部"
    "zhongceng"="中层"
    "zhongdang"="中档"
    "zhongdeng"="中等"
    "zhongdengche"="中等城市"
    "zhongdengjia"="中等教育"

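    A minimal Python sketch of that idea follows. It assumes the text dictionary can be treated as a plain mapping of keystrokes to strings; the "zhong" and "guojia" entries and the function name phrases_from_text are made up for illustration, and the real lookup may well differ.

```python
# Sketch only: a plain dict stands in for the profile's text dictionary.
TEXT_DICTIONARY = {
    "zhong": "中",         # illustrative single-character entry
    "zhongbiao": "中标",
    "zhongbu": "中部",
    "zhongceng": "中层",
    "guojia": "国家",      # illustrative entry that should not match
}


def phrases_from_text(selected, dictionary):
    """List entries longer than `selected` whose text starts with it."""
    return [
        text
        for text in dictionary.values()
        if text.startswith(selected) and len(text) > len(selected)
    ]


print(phrases_from_text("中", TEXT_DICTIONARY))
```
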
    • Hide phrase candidate list window

    Form of hide phrase candidate list window is:

    CandidateList.Phrase.HideWindow = integer value
    Where:     0     - show phrase candidate window (default)
               Not 0 – hide phrase candidate window

    • Title of phrase candidate list window

    Shows the specified string as the title of the phrase candidate window.

    Form of title of phrase candidate list window is:

    CandidateList.Phrase.Title = String value

    For example, Chinese Traditional DaYi shows the string “Shift + Numeric” in the title bar after the keystrokes “A” + “Space” + “Space”.

    • Modifier of phrase candidate list window

    To select an item in the phrase candidate list from the keyboard, a modifier value can be specified.

    Form of modifier of phrase candidate list window is:

    CandidateList.Phrase.Modifier = Modifier value
    Where:     Modifier value is TF_MOD_xxx value, as previously discussed here.

    For example, Chinese Traditional DaYi uses TF_MOD_SHIFT, so a phrase candidate item is finalized with Shift + a numeric key.

    Preserved key section

    The preserved key section begins with the “[PreservedKey]” section name.

    The Table Driven TIP can specify the following three preserved keys:

    • IME mode
    • Full / Half character width (also called Double/Single byte)
    • Punctuation switch

    Each preserved key has similar items, which are:

    • GUID value (required if the preserved key is supported)
      Each Table Driven TIP language profile and each preserved key needs its own unique GUID value if the target language profile is to use that preserved key.
    • Preserved key (required if the preserved key is supported)
      Defines the preserved key as a key code value.
    • Description (optional)
      Defines the description of this preserved key.
    • Initial state (optional)
      Defines the initial state of this preserved key.

    In the [PreservedKey] section, the following definitions can be specified for the Table Driven TIP:

    Preserved key section for IME mode
    GuidImeMode = GUID value
    KeyDefineImeMode = Key code value
    DescriptionImeMode = String value
    ImeMode = integer value

    Preserved key section for Double/Single byte
    GuidDoubleSingleByte = GUID value
    KeyDefineDoubleSingleByte = Key code value
    DescriptionDoubleSingleByte = String value
    DoubleSingleByte = integer value

    Preserved key section for Punctuation
    GuidPunctuation = GUID value
    KeyDefinePunctuation = Key code value
    DescriptionPunctuation = String value
    Punctuation = integer value
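
    As a rough illustration of the "required if supported" rule, here is a small Python check. It assumes the [PreservedKey] section can be read as ordinary INI text, which may not hold for a real profile file; the sample values are placeholders (the actual GUID and key code formats are not shown here), and missing_required is a hypothetical helper.

```python
import configparser

# Placeholder values only: the real GUID and key code formats are not shown here.
SAMPLE = """
[PreservedKey]
GuidImeMode = <GUID value>
KeyDefineImeMode = <Key code value>
DescriptionImeMode = IME mode
GuidPunctuation = <GUID value>
"""

PRESERVED_KEYS = ("ImeMode", "DoubleSingleByte", "Punctuation")


def missing_required(ini_text):
    """Return required entries missing from partially defined preserved keys."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep entry names case-sensitive
    parser.read_string(ini_text)
    if not parser.has_section("PreservedKey"):
        return []
    section = parser["PreservedKey"]
    missing = []
    for name in PRESERVED_KEYS:
        required = (f"Guid{name}", f"KeyDefine{name}")
        present = [entry for entry in required if entry in section]
        # Neither entry: the key is simply unsupported. Only one: incomplete.
        if len(present) == 1:
            missing.extend(e for e in required if e not in section)
    return missing


print(missing_required(SAMPLE))
```

    Here the punctuation key has a GUID but no key definition, so it is reported; the Double/Single byte key has neither entry and is treated as not supported.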

    Language bar section

    The language bar section begins with the “[LanguageBar]” section name.

    The Table Driven TIP can specify the following three language bar items:

    • IME mode
    • Full / Half character width (also called Double/Single byte)
    • Punctuation switch

    Each language bar has similar items, which are:

    • Description (required if language bar item is supported)
    • Tooltip description (required if language bar item is supported)
    • Enable / Disable language bar item (optional)
    • Button ON icon file (optional)
    • Button ON icon index (optional)
    • Button OFF icon file (optional)
    • Button OFF icon index (optional)

    The icon file/index entries are the same as were previously discussed here.

    In the [LanguageBar] section, the Table Driven TIP can specify the following definitions:

    Language bar section for IME mode
    DescriptionImeMode = String value
    TooltipImeMode = String value
    EnableImeMode = integer value
    ImeModeOnIcon = Path and file name value
    ImeModeOnIconIndex = integer value or predefined value
    ImeModeOffIcon = Path and file name value
    ImeModeOffIconIndex = integer value or predefined value

    Language bar section for Double/Single byte
    DescriptionDoubleSingleByte = String value
    TooltipDoubleSingleByte = String value
    EnableDoubleSingleByte = integer value
    DoubleSingleByteOnIcon = Path and file name value
    DoubleSingleByteOnIconIndex = integer value or predefined value
    DoubleSingleByteOffIcon = Path and file name value
    DoubleSingleByteOffIconIndex = integer value or predefined value

    Language bar section for Punctuation
    DescriptionPunctuation = String value
    TooltipPunctuation = String value
    EnablePunctuation = integer value
    PunctuationOnIcon = Path and file name value
    PunctuationOnIconIndex = integer value or predefined value
    PunctuationOffIcon = Path and file name value
    PunctuationOffIconIndex = integer value or predefined value
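
    The three groups above follow one naming pattern, which a few lines of Python can make explicit. The helper language_bar_keys is hypothetical and only builds the entry names already listed; it says nothing about how the values are interpreted.

```python
LANGUAGE_BAR_ITEMS = ("ImeMode", "DoubleSingleByte", "Punctuation")


def language_bar_keys(item):
    """Build the [LanguageBar] entry names for one language bar item."""
    return [
        f"Description{item}",
        f"Tooltip{item}",
        f"Enable{item}",
        f"{item}OnIcon",
        f"{item}OnIconIndex",
        f"{item}OffIcon",
        f"{item}OffIconIndex",
    ]


# Print an empty template for all three items.
print("[LanguageBar]")
for item in LANGUAGE_BAR_ITEMS:
    for key in language_bar_keys(item):
        print(f"{key} =")
```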

    Shortcut section

    The Table Driven TIP can define shortcut keys for the text candidate list.

    In the [Candidate.Shortcut.Original] section, the form is as follows, where the integer value is the item number in the candidate list.

    “Key” = integer value

    Next part, I'll finish up the definitions by defining the actual content sections of the file, and jump into some more examples. Stay tuned!

  • Sorting it all Out

    Who assigns the VK_OEM_* values in keyboards?

    • 0 Comments

    Someone named Michael who is not me asked via the Contact link:

    Hello,

    I was interested in the process Windows uses for converting scan codes to virtual keys. Do you know anywhere where it is described precisely? Notably, things like what determines which scan codes get assigned to VK_OEM_x for a given keyboard layout (or whether it even differs, although I assume it must, since the VK_OEM_x must move around VK_OEM_PLUS and things).

    Thanks

    The actual mapping between scan code and virtual key is arbitrary and implicit in the individual keyboard layouts, and the Getting all you can out of a keyboard layout series (particularly Part 1 and Part 4) goes through how that mapping was built into the sample to get the actual mappings.

    In order to reduce the complexity of things, MSKLC (Microsoft Keyboard Layout Creator) actually goes to great effort to not allow arbitrary changes to the mapping, though by letting you load any one existing layout you can inherit the arbitrary mapping that a specific, pre-existing keyboard layout uses.

    And if you do want to specifically modify the mapping, you can open up a saved .KLC file directly, since the file's main layout table has as its first two columns the SC and VK values (which can be modified as needed). Though if one goes down this road, try to avoid badnesses like duplicating SC or VK values. :-)
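
    For anyone hand-editing, a quick sanity check for those duplicates might look like the Python sketch below. It assumes the layout rows have already been pulled out of the file as text lines with SC and VK in the first two whitespace-separated columns; the sample rows and the helper name duplicate_sc_vk are made up for illustration, and reading the real file is left out.

```python
from collections import Counter


def duplicate_sc_vk(rows):
    """Return (duplicate scan codes, duplicate virtual keys) for layout rows."""
    scan_codes, virtual_keys = Counter(), Counter()
    for row in rows:
        row = row.split("//", 1)[0].strip()  # drop trailing comments
        columns = row.split()
        if len(columns) < 2:
            continue                         # blank or malformed line
        scan_codes[columns[0].lower()] += 1
        virtual_keys[columns[1].upper()] += 1

    def repeated(counter):
        return sorted(key for key, count in counter.items() if count > 1)

    return repeated(scan_codes), repeated(virtual_keys)


# Illustrative rows only, not copied from a real layout.
SAMPLE_ROWS = [
    "02  1          0  0031  0021",
    "03  2          0  0032  0040",
    "0c  OEM_MINUS  0  002d  005f",
    "0d  OEM_MINUS  0  003d  002b   // VK repeated by mistake",
]

print(duplicate_sc_vk(SAMPLE_ROWS))
```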

    But as I pointed out at the beginning of the blog, there are no real rules determining what the mapping ought to be, so there is no real pattern to determining what it will be -- though since MSKLC is used for the bulk of all new layouts, its limitations that try to stop random change here should have a positive impact on new layouts in future versions of Windows.

    In prior versions before MSKLC existed, keyboard layouts were created by directly creating text files that look a lot like those .KLC files. They were mostly created by copying an existing one and changing key assignments, so the idea of VK_OEM_* values being inherited from other layouts was pretty common even back then....

     

    This post brought to you by ộ (U+1ed9, a.k.a. LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW)

  • Sorting it all Out

    Microsoft still does not use the UCA; the converse is also true

    • 5 Comments

    In recent conversations about the atomic Malayalam chillu on Unicode's Indic list, we do find that the fact they have been encoded has not stopped widespread argument about them from several people, even now.

    In the midst of all that, several comments about the DUCET (the Default Unicode Collation Element Table) came up, with people arguing that whether or not the chillu are encoded atomically impacts whether the DUCET can support Malayalam without tailoring.

    This is of course not true; the Unicode Collation Algorithm has no technical barrier to supporting any kind of collation needed.

    With that said, there is a performance impact to contractions, such that the proposed update text to Unicode 5.1 actually includes the following text. Here is the old text, slated to be removed:

    Contractions are provided for those instances where a canonical decomposable character needed to be given a distinct primary weight in the main weight table, which implied that the canonically equivalent character sequences should also be given the same weights. These currently include Indic two-part vowels and with some Cyrillic accented characters, to match the expected collating behavior for those scripts. Contractions are also provided for Thai/Lao reordering.

    And here is new text in the latest version under public review:

    The Default Unicode Collation Element Table does not aim to provide precisely correct ordering for each language and script; tailoring is required for correct language handling in almost all cases. The goal is instead to have all the other characters, those that are not tailored, show up in a reasonable order. In particular, this is true for contractions, because the use of contractions can result in larger tables and significant performance degradation. While contractions are required in tailorings, in the Default Unicode Collation Element Table their use is kept to the bare minimum to avoid such problems.

    In the Default Unicode Collation Element Table, contractions are required in those instances where a canonically decomposable character requires a distinct primary weight in the table, so that the canonically equivalent character sequences are also given the same weights. For example, Indic two-part vowels have primary weights as units, and their canonically equivalent sequence of vowel parts must be given the same primary weight by means of a contraction entry in the table. The same applies to a number of precomposed Cyrillic characters with diacritic marks and to a small number of Arabic letters with madda or hamza marks.

    Contractions are also entered in the table for Thai and Lao logical order exception vowels. Because both Thai and Lao both have five vowels that are represented in strings in visual order, instead of logical order, they cannot simply be weighted by their representation order in strings. One option is to require preprocessing of Thai and Lao strings, to identify and reorder all logical order exception vowels around the following consonant. That approach was used in Version 4.0 (and earlier) of the UCA. Starting with Version 4.1 of the UCA, contractions for the relevant combinations of Thai and Lao vowel+consonant have been entered in the Default Unicode Collation Element Table instead.

    Those are the only two classes of contractions allowed in the Default Unicode Collation Element Table. Generic contractions of the sort needed, for example, to handle digraphs such as "ch" in Spanish or Czech sorting, should be dealt with instead in tailorings to the default table -- in part because they often vary in ordering from language to language, and in part because every contraction entered into the default table has a significant implementation cost for all applications of the default table, even those which may not be particularly concerned with the affected script. See the Unicode Common Locale Data Repository (CLDR) for extensive tailorings of the DUCET for various languages, including those requiring contractions.

    The upshot of this is that while it may be true that there is no technical limitation blocking the support of any language's needs within the DUCET, in practice there is a policy to limit contractions within the DUCET, due to the performance cost on all implementations to have such an addition.

    People who need language-specific support, therefore, should turn to CLDR to get tailoring, and not expect that the DUCET will support all aspects of collation.
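
    To make the idea of a contraction concrete, here is a toy Python sort key in which "ch" gets its own primary weight between "c" and "d", the Spanish-style digraph case mentioned in the quoted text. The weights are invented, and this is neither the UCA nor any real CLDR tailoring, just the longest-match lookup in miniature.

```python
# Invented primary weights; "ch" is a contraction with its own weight.
WEIGHTS = {"a": 1, "b": 2, "c": 3, "ch": 4, "d": 5, "h": 6, "u": 7}


def sort_key(word):
    """Build a list of primary weights, preferring two-letter contractions."""
    key, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in WEIGHTS:
            key.append(WEIGHTS[pair])     # contraction matched as one unit
            i += 2
        else:
            key.append(WEIGHTS[word[i]])  # ordinary single letter
            i += 1
    return key


words = ["d", "ch", "cu", "ca"]
print(sorted(words, key=sort_key))
```

    Every lookup now has to try the longer match first, which is a small-scale version of the cost that the policy on DUCET contractions is trying to avoid.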

    Now as I pointed out over three years ago, Microsoft does not use the Unicode Collation Algorithm. We just couldn't wait for it, given when we wanted to add collation support to Windows, especially since there was no way to know there would be something to wait for.

    I have had it suggested to me in the past by people both inside and outside of Microsoft that details on the collation implementation with data tables were requested of people over a decade ago and that request was in fact turned down, which led to the eventual UCA creation (back in early 1997) not having as any kind of source or basis the work that Microsoft has included. Though since they are trying to model the same thing they can often return the same results, the sometimes arbitrary nature of collation definitely can lead to substantial differences between the two.

    Anyway, although Microsoft does not use the UCA, its own implementation of its version of the DUCET -- its default collation table -- is implemented currently as a flat table covering the Unicode code values from 0x0000 to 0xFFFF -- thus there is no room for contractions (what we call compressions, a term that would have been confusing to use that way in the UCA due to the important discussion of sort key compression) within.

    There is also no support for supplementary characters -- anything from 0x10000 to 0x10FFFF -- other than as surrogate pairs, which is how they are currently implemented -- this issue led directly to the way by which the mathematical sort (discussed in What is SORT_INVARIANT_MATH for?) was implemented, as that article discusses.
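
    For readers who have not looked at surrogates lately, this short Python sketch shows why a table with one slot per 16-bit code unit cannot index a supplementary character directly. The flat_table list is only a stand-in with an assumed shape; nothing here reflects how the actual Windows tables are laid out.

```python
def to_surrogates(code_point):
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# Stand-in for a flat table: one slot per 16-bit code unit, weights zeroed.
flat_table = [0] * 0x10000

high, low = to_surrogates(0x1D7CE)  # MATHEMATICAL BOLD DIGIT ZERO
print(hex(high), hex(low))

# The code point itself is past the end of the table; only its two
# surrogate code units can be used as indexes.
print(0x1D7CE < len(flat_table), high < len(flat_table), low < len(flat_table))
```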

    I'm not really directly involved with any of those things anymore, though all things being equal if it were up to me I'd probably be inclined to just extend the flat table to be bigger than it currently is, though of course the size increase would make one want to rethink the flat table idea a bit and would lead to some mildly interesting decisions when it comes to high surrogates and low surrogates for data largely designed to support UTF-16-based functions....

    For Microsoft, which can manage to avoid the public scrutiny that Unicode has to deal with for some of these issues, the decision of what to do with newly added scripts is a very real one (I talk about some of the issues in How does Microsoft assign new collation weights?).

    But it is interesting how over the past few years these two completely different implementations have been moving closer to each other, for example in

    • handling of Thai/Lao and their visual encoding (both implementations are now table based, with Unicode moving away from the former algorithm-based approach);
    • support of the entire range covered by Unicode (both now try to cover the entire space though prior to Vista Microsoft only covered the subset of Unicode covering some of the world's languages);
    • contractions in the default table (impossible in Microsoft's implementation and highly unlikely in Unicode's, with no such restriction officially described previously).

    and so on. They are getting closer to each other conceptually.

    Now all that really remains that is different is the arbitrary nature of weight assignment, a difference that is unfortunate since it makes it impossible to (for example) use the tailoring defined in one implementation with the other, since they are both modifications of entirely different starting points (default tables).

    Because of this there is not much that Microsoft can do for Unicode other than give technical advice when UCA drafts come up and not much Unicode can do for Microsoft since we can't use any of the CLDR-defined tailorings directly (we also kind of rely on our current model of working with linguists, native speakers, and language experts that essentially amounts to a procedural difference in how data is added between the two platforms).

    I wonder now, on the far side of that long ago decision to not share data whether it would have been better to share it, though to be honest I doubt they would have been likely to match given the many differences Microsoft has had over the years like Korean re-ordering and not entirely compatible ideas of how to handle default table and multiple language/script scenarios.

    Would the situation be easier to unravel now? The world may never know....

     

     

  • Sorting it all Out

    Are they premature classics, or am I just getting old?

    • 8 Comments

    Warning: amazingly, astoundingly off-topic by any definition of topicality!

    Growing up, I never had a sense of what my parents really liked in the way of music and movies and such.

    We had movie channels and watched what went by and we owned The Jazz Singer on VHS but then every freaking Jewish family did that, but beyond that it was a mystery to me.

    In fact, when Mike once complained about my snubbing some particular hip-hop music that I was "becoming my father", I pointed out that I honestly wasn't since I had no sense of musical tastes from him, so perhaps I was becoming Mike's father, but not necessarily my own? :-)

    Movies are what I really wanted to talk about, though. It seems like the definition of classic movies used to seem more like "movies from before you were born" but now, with the help of TCM (Turner Classic Movies) has morphed even past "movies you grew up with" into "movies almost two decades old" or something.

    Like a bunch of yesterday's picks, in a row from 5pm to 1am:

    Now these were all interesting movies, in their own individual ways.

    I personally put The Sandlot (1993) ahead of Stand By Me (1986) from the genre standpoint, and not just because I found the quest to be more palatable as a 15-16 year old (death would not fascinate me until the first time I was mugged at gunpoint almost two years later), but also the whole lard-ass sequence was just kind of disgusting and actually soured me on Wil Wheaton (the one who told the story) even before the Star Trek fanbase landed on him years later, despite the positive influence of the whole That's weird, what the hell is Goofy? sequence that served as foreshadowing for Gonzo from The Muppet Show since although it came after Gonzo it was set before him.... 

    But in my mind are they classics?

    I guess I think of the music definition, where the classics were things I listened to even though they were technically music from before I was born, or at worst before I was listening to music. Maybe movies work under different rules.

    Or perhaps that means I really am getting old -- that any movie that I remember when it first came out in the theater can be called a classic now.

    Some of my younger readers don't have this problem -- these movies came out either before they were born or when they were too young to see movies (or in some cases to even know English!). So for them the definition is pretty much intact. And they don't mind if movies like Starman (1984) end up on TCM because of it.

    But I am left wondering whether they have been kind of shifting the definition and thus making me a bit further along the whole "old man" trail than I thought I was otherwise, or whether the definition has not changed and I am basically old.

    My parents were born in 1939 and 1943 (I only know this because I can literally picture their ages on my birth certificate, I can never remember their ages directly), and since they were in their late 20s and early 30s when I was born they confound the generational bit a tad by not being in that group that had kids at or around becoming 20, so perhaps they ended up with the same definition -- it is easy to include one's childhood if one is a few years older by the time the next generation gets started.

    And my sister did not start having kids until her mid 30s, so she is in the same boat.

    I myself am (with no current serious prospects) unlikely to reproduce, and have actually had a vasectomy, a reversal, and then another vasectomy (the former targeted as a post MS diagnosis scare foreshadowing this issue, the latter two targeted at relationships and my partner at the time's view on the matter of children -- this is likely a topic for a future blog) -- the net effect making progeny incredibly unlikely (even the reversal was only partially successful, though the technical term for people who rely on that sort of thing for birth control is daddy).

    So I'll have to rely on Meredith and Zachary for the next generation. Good luck on that... :-)

    But back to classics -- maybe what I am rebelling against is the notion that I was able to be alive for them -- don't they seem like they should really be the thing that came before me?

    This could all be crap anyway, since I have no idea what the target demographic of TCM is -- do they aim at people younger than me, making my concern irrelevant? Or am I really getting old?

    My younger friends who think I may as well be 100 respond by saying Hell yes, Michael! and my older friends who themselves are feeling like they are getting old respond by saying Shut the hell up, Michael which might mean that I have some kind of hellfire scenario to look forward to either way. Good time to be more agnostic, huh? :-)

    According to Wikipedia, TCM is defined as follows:

    Turner Classic Movies (TCM) is a cable television channel featuring commercial-free classic movies, mostly from the Turner Entertainment and Warner Bros. film libraries, which include many MGM, United Artists, RKO and Warner Bros. titles.

    Maybe I would feel differently if I ever look at AMC for movies, but it is probably about the same.

    These days I take the Emo approach. You know, that "Sometimes, people come up to me concerned.... that I'll reproduce." but figuring they don't need to worry. Though I have worked in schools and afterschool programs and day care centers, as a babysitter and even as a nanny for one year, I don't have a whole lot of contact with kids these days (the sole exception being my nieces, who I do not see nearly as often as I probably ought).

    And I guess I do feel old more often than not -- and not just because I don't quite keep up with the young ones these days....

    This blog post may not actually be about anything. But it was written on a quiet Saturday for posting on an even quieter Sunday morning. What better time to not be about anything?

     

    This post brought to you by ፵ (U+1375, aka ETHIOPIC NUMBER FORTY)

  • Sorting it all Out

    If the [IE7 on Vista] browser doesn't have anything nice to display, it might display nothing?

    • 2 Comments

    Another recent Suggestion Box item from favored "regular" reader Tanveer Badar:

    Another problem from the [not] random reader.

    Your post
    No charset meta tag? causes FF 3 beta 2 to crash with a buffer overflow.

    Hehehe. Sleek :).

    And in IE7 it does not show anything although the html is there if you do "View Source".

    Could you investigate the issue so I can enter a bug for IE7 somewhere?

    Okay, so here is what I am seeing across a wide range of browsers I have installed:

    • IE 6.0 on XP SP2 -- page looks good
    • IE 7.0 on XP SP2 -- page looks good
    • IE 7.0 on Vista -- all text is missing from the post, just as was reported
    • FireFox 2.0.0.2.12 on Vista -- page looks good
    • Opera 9.25 on Vista -- page looks good
    • Safari 3.0.4 (523.13) on Vista -- page looks okay, though I'm getting notdef glyphs rather than YI even though the font is installed

    And then there is that bug Tanveer mentioned about the FireFox beta someone may want to look into.... :-)

    Is anyone from the IE team reading SiaO these days?

     

    This post brought to you by Ｆ (U+ff26, FULLWIDTH LATIN CAPITAL LETTER F)

  • Sorting it all Out

    SGCAPS and dead keys don't mix

    • 0 Comments

    Over in the Suggestion Box, Alec McAllister asks:

    I'm a long-time user of MSKLC, having made keyboard layouts to support many languages, but I'm stumped by one thing: does MSKLC allow a SGCAP+somekey combination to be used as a dead key, or not?

    MSKLC offers a Dead Key option by such keys, allows us to create listings of characters, and correctly types those characters with that dead key in Keyboard Layout Testing. However, it seems reluctant to activate those dead keys when the DLL is built.

    Hebrew and a few other layouts use SGCAP combinations, but not as dead keys.

    Unfortunately, Alec isn't missing anything here.

    Dead keys and SGCAPS don't mix.

    It's funny, the organic and essentially spec-less process by which MSKLC was built is in some ways responsible for the many layers by which features ended up being added.

    First came dead keys -- not supporting dead keys was considered a ship killer (an attitude which I was later able to inject into The Keyboard Convert Service, which initially did not support dead keys until I helped convince them that a) this was a really bad idea and b) that it was not too hard to add support for!).

    Then came SGCAPS -- which PM was actually against at first as it seemed extraneous. I persisted since their original goal (supporting as many keyboards from the ones we ship as possible) could not be met without SGCAPS support, and eventually they agreed that incomplete loading of all of those keyboards would not be such a great user experience....

    Then came the test surface, that built-in way to test the keyboard without installing it. That one was entirely inspired by Bala, my lead at the time (he had never installed the beta builds of MSKLC until the release candidate but when he did he was really impressed, with the exception of it requiring installation of the keyboard to try it out). We had a big conversation about mitigation strategies and based on an idea from Bala a prototype of that test surface was built. It proved to be the only option people (including Bala) really liked, so we decided to ship it....

    The test surface was taking the data built into the keyboard and just kind of applying it, without any notion of what may or may not work later. It was a wonderful idea, truly -- one I wish we would have added sooner since (as it turned out) very few people were actually installing their random test keyboards and testing them purely to find bugs -- and the test surface flushed out a lot of bugs!

    But the fact that

    • multiple characters in a single keystroke
    • ALTGR and Shift+ALTGR states
    • dead keys

    all don't work as one might expect are limitations that a spec and the investigation into the covered feature set that the spec would be expected to include would probably have revealed.

    In fact, as far as I can recall, only the first two of those bullet points have ever been reported -- Alec's report is a longstanding unreported MSKLC bug!

    If one of the current owners of MSKLC would make sure that bug report gets put in the proper place, it seems like something to consider for next version -- locking down the UI so the dead key checkbox is disabled/hidden for those two shift states and the dead key dialog can't be reached from SGCAPS seems like a pretty good idea, since the UI is kind of writing checks that Windows keyboards can't cash....

     

    This post brought to you by ✓ (U+2713, aka CHECK MARK)

  • Sorting it all Out

    Overheard, via email

    • 0 Comments

    A quote that was so perfect that I just had to capture it here....

    The context is not provided since I am not trying to slam the specific component (though one really could in this case), I just loved the quote:

    This works fairly efficiently in most cases, and even looks good sometimes, but usually just looks bad, and in certain cases, completely wrong. So it's great for the developer that wants to get something working, but terrible for anyone doing something that is supposed to be robust.

    Things work this way in software entirely too often, and in internationalization features more often, still.

    This post has absolutely nothing to do with Cathy or Goldie or Victoria; their names are mentioned here only to see if that mention leads to them seeing the blog and mentioning it at some point....

     

    This post brought to you by ䷴ (U+4df4, aka HEXAGRAM FOR DEVELOPMENT)
