Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
So it was back in Behold the Table Driven Text Service, Part 6 (it is sufficient to be iconic, aka DaYi, DaYi Aynu!) when I showed this picture
and was taken to task for not just looking in the resource directory so they could all be the same size.
"But what size to use?" I asked.
"Try the 'tiled' size," they told me.
So be it:
Somehow this looks worse for some of them (the green color that shows up in the "transparent" portions of some of them is gone now, for example), and now you can see which ones were not designed to be scaled up like this, and why.
Ah well, I am done playing with the icons, and have no plans to make them any better than this.
I guess I am just not a media icon....
I hope you are not surprised by this.
This post brought to you by ⌘ (U+2318, aka PLACE OF INTEREST SIGN)
I had somebody send me mail the other day, asking me if I wanted people to send me song lyrics questions like Liz used to (yes, that Liz -- should I be worried that I almost need to create a Liz blog topic now, and rename it to Sorting Liz all Out?).
Anyway, I think this reader was wondering if I was looking for a substitute to help me get past the memory of the person who used to fill the role.
I have to tell you, I actually have probably three dozen different songs that we had talked about in the past that I just haven't had the heart to write about, though when I concentrate I can recall the conversations, hear her voice.
Because it wasn't ever the blogs, it was the chat -- the late night, dynamic conversations that would infuriate and frustrate, yet simultaneously fascinate and captivate me.
I just enjoyed blogging about the ones that amused me most from time to time, and I know she liked reading them later on....
Could I have the same kind of conversations with someone else? I probably have, or do, or will.
But would they be on the same topics? Probably not.
At the moment I am not even listening to new music, which really sucks given some of what is coming out soon.
And would they be from someone who claimed (when asked) to be morally opposed to time zones and thus unafraid to call someone who was getting the call at 2:00 am? Again, probably not.
I am sure I will find other ways to amuse myself.
And life is likely to be full of fresh tragedies and celebrations in the future.
One day I might try to tackle some of the songs we did talk about, too....
So I don't think I need any lyrics, really. For now I can't even do the ones I have on tap, and I somehow doubt the life/personal circumstances would allow me to synthesize the same kind of relationship again, really. I don't know anyone who was as truly strange as her (well, other than me, I mean).
But it was a very sweet offer, in any case! :-)
This post brought to you by 𝄺 (U+1D13A, aka MUSICAL SYMBOL MULTI REST)
I have been neglecting the Suggestion Box as of late, so I figured I should cover something over there. This one comes from Gé van Gasteren:
My problem is caused by two circumstances:
1. In the Netherlands, most keyboards have US-style keytops with one extra key (and a smaller left Shift key). The Windows keyboard layout for Dutch doesn't match these keytops, and doesn't allow easy typing of accent characters, so most people use the US International keyboard.
2. Dutch has sequences like:
’t ’n ’s-Gravenhage ’98
which should be written with a single curly close quote (U+2019).
The problem is that these sequences usually come out with a single curly open quote (U+2018) because of some smart-quote routine.
To try and avoid this behaviour, I have tried (with MSKLC) to make the quote key on the US-International layout produce a proper close quote, but then that key stops working as a dead key, and produces two close quotes every time it's typed!
This error does not occur when I put the acute accent on the quote key -- maybe because it's in the ASCII range?
A work-around I came up with was to include the necessary sequences in the dead-key table, but multiple code units aren't allowed there, as documented amply in your blogs...
If there is a way to create a proper keyboard layout, I would publicize it and try to get some language sites promote its use. I already got one supporter: a Nobel-prize laureate with last name: ’t Hooft, who receives lots of letters with his name containing the meanwhile very annoying close quote. Actually, he indicated to me that he has tried to convince Microsoft to change the Windows keyboard layout for Dutch, but without success. That would be a better solution of course, if possible.
All help in this matter is greatly appreciated!
The first issue is entirely by design and Windows has noted the market preference for the US International keyboard layout in the Netherlands for quite some time. So there is not much to fix there with that one (well, ignoring bugs like the problem I noted here).
The second issue is an interesting one -- in part due to an issue that I have talked about before in posts like this one and this other one, but which I will not discuss further, as I have been told that there are people who think I am "unreliable" and "don't get along well with others" because of my diatribes about the "features" just because they cause problems in "uncommon scenarios".
Whatever.
If you don't like Office's "brilliant quotes" feature here then turn it off (I do in every Office application, every version I run).
Or alternately, feel free to call product support and suggest that better handling within languages like Dutch might help me shut up more often, which I am sure the feature team would appreciate getting hints about. :-)
The other half that I can blog about without pissing off random people is technically user error, but to be perfectly honest it is a case that (were I still the development owner of MSKLC) I would want to see handled in MSKLC directly rather than requiring the user to do the extra work here....
Let me explain.
First let's put up the keyboard layout so everyone knows what we're talking about:
That is the key in question, also -- notice the nice tooltip listing all of the dead key pairs....
Oh and before I forget, I have a friend who claims that she skims most of the blog posts but skips the keyboard ones entirely. To test this theory, I'll say Hi Goldie! here and since she skips these "boring" keyboard posts she'll never see that I said hello.
Back to the post....
Okay, so let's right click on that key:
We have a nice set of options here, and we will go to the view for all shift states. It brings up this handy dialog
in which we can easily change the dead letter from U+0027 to U+2019, as Gé would like....
Okay, so we're done now, right?
Let's go back to the main dialog and take a look:
Do you see the problem here?
We did change the dead letter properly, but let's look at the MSKLC help file warnings on dead keys in validation, which include this text:
Last entry in a dead key table should use a space as its base character
Now the convention actually goes a bit further than this: the last entry should produce a spacing version of the dead key letter.
Do you see it now?
That last entry was not updated!
That reminds me, I have another regular reader who claims to read every post. She knows her name and who she is so I'll just tell her Hi! and leave it at that. We'll see if she finds this without the name to search by :-)
Where was I?
Oh yeah.
So, let's go back and update it now:
There we go.
Let's look at the tooltip again now:
That is more like it!
Now to get the U+2019 you do have to hit the space bar after typing the dead key, but that is the way dead keys work when you want to make them live and have them show the character.
But notice how MSKLC did not change the last entry (which by convention is supposed to be either the same character or, if the original dead key is nonspacing, a spacing version of it) to match this convention?
Think of how much trouble MSKLC could save in keyboard maintenance if it did little usability tricks like this, or, failing that, if validation at least gave a warning when the convention is not followed?
(I'd prefer to change it automatically, personally -- but any change would be better than the current confusion....)
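Here is a rough sketch in Python of the kind of validation check, and automatic fix, I have in mind. This is not MSKLC code, and it is not how MSKLC represents dead key tables. It assumes a dead key table is simply a list of (base, result) pairs in table order. The SPACING_FORM table is a small hypothetical sample that maps a few nonspacing marks to their spacing versions.

```python
# Hypothetical sample: nonspacing dead key characters and their spacing versions.
SPACING_FORM = {
    "\u0300": "\u0060",  # COMBINING GRAVE ACCENT -> GRAVE ACCENT
    "\u0301": "\u00B4",  # COMBINING ACUTE ACCENT -> ACUTE ACCENT
    "\u0308": "\u00A8",  # COMBINING DIAERESIS -> DIAERESIS
}


def expected_space_result(dead_char: str) -> str:
    # Convention: the same character, or its spacing version if it is nonspacing.
    return SPACING_FORM.get(dead_char, dead_char)


def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)


def check_dead_key_table(dead_char: str, entries: list[tuple[str, str]]) -> list[str]:
    """Return warnings for a dead key table given as (base, result) pairs."""
    if not entries:
        return ["dead key table is empty"]
    warnings = []
    base, result = entries[-1]
    if base != " ":
        warnings.append("last entry should use a space as its base character")
    expected = expected_space_result(dead_char)
    if result != expected:
        warnings.append(
            f"space should produce {code_points(expected)}, not {code_points(result)}"
        )
    return warnings


def fix_dead_key_table(dead_char: str, entries: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop any existing space entry and append the conventional one last."""
    kept = [(b, r) for b, r in entries if b != " "]
    return kept + [(" ", expected_space_result(dead_char))]
```

Take an illustrative two-entry table in which the dead letter was changed to U+2019 but the space entry still produces U+0027. check_dead_key_table("\u2019", [("e", "\u00E9"), (" ", "'")]) reports "space should produce U+2019, not U+0027". After fix_dead_key_table, the check comes back clean.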
This post brought to you by ’ (U+2019, aka
Prior posts in the series:
I have been putting off providing a whole bunch of configuration information at one time, since it just felt like it would be so dry. You know, like reading an encyclopedia article or whatever about the technology.
Then yesterday someone reminded me that we aren't expecting to see documentation on all of this for some time. And if I keep dishing out settings 1-3 at a time, I'll be hitting part 79 before I'm done!
So I surrender, and I will put in a bunch of info on various configuration settings here. I will try to avoid making it dry if I can. :-)
I am mostly going to do a bunch of copy/paste from a document intended to help someone who would translate it into English, though since I am underqualified for that job, I'll just do my best. :-)
First there is the rest of the [Configuration] section to talk about, which I started in Part 2:
Wildcard search switch (optional)
If this switch is specified, wildcard search of the dictionary is enabled when the composition string includes either of the following two things:
The asterisk “*” wildcard character, which matches any zero or more characters;
The question mark “?” wildcard character, which matches exactly one character.
The form of the wildcard search setting is:
Wildcard = integer value
Where: 0 - turn off the wildcard search option (default); not 0 – turn on the wildcard search option
Disable wildcard character at first composition (optional)
If the wildcard search option is turned on, but you do not want wildcard search to apply at the first composition character, turn this switch on. The form of the disable wildcard at first composition setting is:
DisableWildcardAtFirst = integer value
Where: 0 - wildcard search is allowed at the first composition character (default); not 0 – wildcard search is disabled at the first composition character
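Here is a minimal Python sketch of the matching rule described above, useful for checking a table before loading it. It is an illustration under assumptions, not the Table Driven Text Service's implementation. The dictionary is modeled as a plain dict from keystroke strings to candidate lists, and lookup is a hypothetical helper. When DisableWildcardAtFirst is on and the composition starts with a wildcard, I assume the string is simply looked up literally.

```python
def wildcard_match(pattern: str, text: str) -> bool:
    """True if text matches pattern; '*' = zero or more chars, '?' = exactly one."""
    p = t = 0
    star = -1   # position of the most recent '*' in pattern
    mark = 0    # position in text where that '*' began matching
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            star, mark = p, t
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", text[t]):
            p += 1
            t += 1
        elif star != -1:
            p = star + 1        # let the last '*' swallow one more character
            mark += 1
            t = mark
        else:
            return False
    while p < len(pattern) and pattern[p] == "*":
        p += 1                  # trailing '*'s can match nothing
    return p == len(pattern)


def lookup(composition: str, dictionary: dict[str, list[str]],
           wildcard: int = 0, disable_wildcard_at_first: int = 0) -> list[str]:
    """Hypothetical helper: return candidates for a composition string."""
    # Both settings follow the "0 = off, not 0 = on" rule, defaulting to 0.
    use_wildcards = wildcard != 0 and any(c in "*?" for c in composition)
    if use_wildcards and disable_wildcard_at_first != 0 and composition[0] in "*?":
        use_wildcards = False   # assumption: fall back to a literal lookup
    if not use_wildcards:
        return list(dictionary.get(composition, []))
    results: list[str] = []
    for key, candidates in dictionary.items():
        if wildcard_match(composition, key):
            results.extend(candidates)
    return results
```

For example, with the entries {"ab": ["X"], "abc": ["Y"], "b": ["Z"]} and Wildcard = 1, typing a* returns the candidates for both ab and abc, while a? returns only those for ab.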
Then there are a few items that should be in the [Configuration] section but they aren't, related to the Reading window:
Configuration section for reading window
This is the group of settings for the reading window; however, they should be inside the “[Configuration]” section.
Hide reading window
The form of the hide reading window setting is:
ReadingWindow.HideWindow = integer value
Where: 0 - show the reading window (default); not 0 – hide the reading window
Reading window width
The form of the reading window width setting is:
ReadingWindow.Width = integer value
If the reading window width is not specified, then the default value is 6.
Then there are some settings for the Composition window that are in the right [Configuration] section:
Configuration section for composition window
This is the group of settings for the composition window; however, they should be inside the “[Configuration]” section.
Quit and error on conversion
Show an error display if the composition string doesn’t convert to any strings. The form of the quit and error on conversion setting is:
Composition.QuitAndErrorOnConversion = integer value
Where: 0 - turn off error display (default); not 0 – turn on error display
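Here is a minimal Python sketch of how a tool might read the reading window and composition window settings above, applying the documented defaults and the "0 versus not 0" rule. It assumes the table file can be parsed as an ordinary INI file with Python's configparser. That is my assumption, not something the documentation says. read_window_settings is a hypothetical helper name, while the section and key names come from the settings listed above.

```python
import configparser


def read_window_settings(table_text: str) -> dict:
    # Assumption: the table file parses as a plain INI file.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. ReadingWindow.Width
    parser.read_string(table_text)
    section = parser["Configuration"] if parser.has_section("Configuration") else {}

    def flag(key: str) -> bool:
        # "0" (the default) means off; any other integer means on.
        return int(section.get(key, "0")) != 0

    return {
        "hide_reading_window": flag("ReadingWindow.HideWindow"),
        "reading_window_width": int(section.get("ReadingWindow.Width", "6")),
        "quit_and_error_on_conversion": flag("Composition.QuitAndErrorOnConversion"),
    }
```

A missing key simply falls back to the documented default. So an empty [Configuration] section yields a visible reading window 6 wide, with the conversion error display turned off.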
Then there are also some entries for the Candidate window:
Configuration section for candidate window
This is the group of settings for the candidate window; however, they should be inside the “[Configuration]” section.
Candidate window width
The form of the candidate window width setting is:
CandidateWindow.Width = integer value
If the candidate window width is not specified, the default value is 6.
Candidate list index
The form of the candidate list index setting is:
CandidateListIndex = index list which specifies each index value as x,y,z
or
CandidateListIndex = index list which specifies a range of indexes as m–n
The initial setting of the candidate list index is 1,2,3,4,5,6,7,8,9,0.
Don’t show next key sequence
When this switch is on, in Explicit Conversion mode the candidate list does not show the keystroke data after the conversion character item.
The form of the don’t show next key sequence setting is:
CandidateList.dontShowNextKeySequence = integer value
Where: 0 - show next key sequence; not 0 – don’t show next key sequence
Keep candidate list for invalid key
When an invalid character (one whose key is not defined in “[Keystroke.Composition]”) is typed, this switch lets the invalid character be injected into the application while the Table Driven TIP still keeps the composition characters.
The form of the keep candidate list for invalid key setting is:
CandidateList.KeepCandidateListForInvalidKey = integer value
Where: 0 - close the candidate window; not 0 – keep the candidate window
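Here is a small Python sketch of how the two CandidateListIndex forms (x,y,z and m–n) could be expanded into an ordered list of selection keys, and how a key then picks a candidate. This is my own reading of the description above, not TableTextService.dll's code. The function names are hypothetical. Accepting either an en dash or a plain hyphen for ranges, and treating range endpoints as integers, are both assumptions.

```python
def parse_candidate_list_index(value: str = "1,2,3,4,5,6,7,8,9,0") -> list[str]:
    """Expand a CandidateListIndex value into an ordered list of selection keys."""
    keys: list[str] = []
    for part in (p.strip() for p in value.split(",")):
        # Assumption: a range may be written with an en dash or a hyphen.
        dash = "\u2013" if "\u2013" in part else ("-" if "-" in part else None)
        if dash is None:
            keys.append(part)  # a single index value, e.g. "0"
        else:
            first, last = (int(x) for x in part.split(dash, 1))
            keys.extend(str(n) for n in range(first, last + 1))  # "m–n", inclusive
    return keys


def candidate_for_key(key: str, candidates: list[str], index_keys: list[str]):
    """Return the candidate a selection key picks on the current page, or None."""
    if key not in index_keys:
        return None
    position = index_keys.index(key)
    return candidates[position] if position < len(candidates) else None
```

With the initial setting, 1 selects the first candidate on the page and 0 selects the tenth. That is why 0 comes last in 1,2,3,4,5,6,7,8,9,0.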
It did suddenly occur to me that I should maybe stop and describe the different kinds of windows a bit further, so I will do that tomorrow....
This post brought to you by ⑦ (U+2466, aka CIRCLED DIGIT SEVEN)
In some of the prior posts in this series, I have shown various screen shots of the Table Driven Text Service input methods, and they all had a particular icon that they used.
Like in Part 1, where the TIP had nothing to do with ideographs really, yet had kind of an ideographic icon next to it. I mean this one:
So, the obvious question is where the hell is that coming from? :-)
The answer is simple enough, though. In the [System] section of the file, there is a setting there called the IconIndex, set like so:
IconIndex = <predefined value or integer>
The index value is for an image built into TableTextService.dll. If you don't specify one, it takes the first one by default (which also happens to be the one for the DaYi IME). The full table of possible entries is:
Note that:
Now that last point is an important one; luckily, there is a happy response.
In addition to the IconIndex entry, there is an Icon entry, which takes the path and file pointing to an icon file. So you can make your own icon (though you will not be likely to find someone as cool as my colleague Jennifer Shepherd to create it for you!). It goes in that same [System] section and looks something like:
Icon = C:\path\file\colliconthoughnotascoolasonefromjenny.ico
and there you go!
Now as you can see a lot of the built-in IMEs use other resources in TableTextService.dll like their names; in all cases you can point to your own DLL instead, and still do right by other user interface languages in Windows.
And then you too can be iconic -- without being required to sing DaYi, DaYi Aynu and wonder whether it would be sufficient. :-)
This post brought to you by ⑥ (U+2465, aka CIRCLED DIGIT SIX)
Between the time I blogged about Bangalore Sans Scooter, aka Counting backward from ten in Sanskrit and now, many of my regular readers have been concerned about the situation with me and my scooter.
I know this for a fact.
Not because I am arrogant (which I unquestionably am, it just didn't apply here).
And not because I am self centered (which I am not, though lots of people who don't know me all that well assume I am).
But because so many people have been asking me! :-)
Not everyone even knew I was here, a side effect of the last minute nature of the trip (my favorite message was the one from Cathy which I received soon after I landed and checked mail -- the title was OMG -- You're in India? and I was laughing for several minutes at the message from this friend of mine who just came back from a vacation in South Asia to find out that her friend was going on "vacation" in another part of South Asia!).
Anyway, I have been getting many messages about the scooter situation, and people have generally been either concerned or sympathetic or outraged (or some conventional and occasionally unconventional combination of these traits), so I thought I should let people know what is going on so people do not have to worry....
They were able to find a place to repair the broken parts of the scooter, and in fact I am sitting on it right now!
Just a few minutes ago I was scooting around the building -- nominally to test the scooter out, but actually to greet the many people who had mainly had to come to me or whom I had met at lunch; now I finally got to go talk with them, and it was very nice.
I had actually met Vidya, Cathy's opposite number here (the Director of Strategy at MSRI) and we had the chance to talk about Tamil for a bit, which was fun (she is actually Bengali not Tamil but she knows about the language and many of the projects that happen related to it so it was a great little language policy conversation between two people who just find language issues interesting).
Then I had some other conversations about places to see which will help me over the next few days, more on this as it happens....
Now they never did find the basket, but it looks like they will just reimburse me for that. And then everything will be back to normal.
And things are close enough to normal now that I can get back to doing stuff the way I am used to doing it.
Special thanks to Joy for pushing them to do the right thing and for everyone else up and down the tree for helping to further expedite things, lending their time and effort and position and reputation.
And of course thanks to Jet Airways for doing the right thing. In my largess I will assume they would have done it without that extra help and support from people in MSRI, but just in case that help was needed I am truly thankful for it being there....
This post brought to you by ॐ (U+0950, DEVANAGARI OM)
Last time I promised:
Next up, I'll cover some of the other settings in the [Configuration] section, or I might cover the language settings. Whichever one I don't do next will be done the time after....
But who are we kidding here? Language settings are always gonna come first for me; we all know that, right?
Not that I like being so boring and predictable, mind you. I'll explain how I have been working on that another day.... :-)
Anyway, the language settings.
We start in the
[System]
section of the file, which is (by convention) right at the top of the file, and (again, by convention) the first line in the section will be
LangId = x
where x takes one of five different possible forms. These possible forms are:
1) One of the many LANG_* constants defined in winnt.h from the time that Vista shipped, like LANG_MACEDONIAN or LANG_BENGALI or LANG_TAMIL or whatever, e.g.
LangId = LANG_BENGALI
and yes those spaces are okay there. Let's do me a favor and not get me started on the spaces there....
This seems pretty straightforward, right?
I know the list is the Vista RTM one because, right after I had gotten that Amharic IME in (as previously mentioned in Hang on just a [Hansel]Minute! and We weren't Vista heroes, but I think we were kinda heroic), I looked to make sure that LANG_AMHARIC was there, and it wasn't. And most of the new Vista locales weren't either, so I added them. :-)
Then I worked with colleague Jennifer Shepherd to get an icon in for the Amharic input method....
The LANG_* constant works by calling GetLocaleInfo with that value, which means it will expand to LANG_*, SUBLANG_DEFAULT as GetLocaleInfo is wont to do. It will be called with the LOCALE_SABBREVLANGNAME LCType, which will be used as described in LOCALE_SABBREVLANGNAME is more than just an ISO-639 code and LOCALE_SABBREVLANGNAME is so not an ISO-639 code.
2) One of the many LANG_* constants defined in winnt.h from the time that Vista shipped, like LANG_MACEDONIAN or LANG_BENGALI or LANG_TAMIL or whatever, followed by a comma and a space, followed by one of the many SUBLANG_* constants defined in winnt.h from the time Vista shipped, like SUBLANG_DEFAULT or SUBLANG_BENGALI_INDIA or whatever, e.g.
LangId = LANG_BENGALI, SUBLANG_BENGALI_INDIA
Again this works the same way via that GetLocaleInfo call, with the MAKELANGID macro turning the two values into a single LANGID; the function will be called with the LOCALE_SABBREVLANGNAME LCType, which will be used as described in LOCALE_SABBREVLANGNAME is more than just an ISO-639 code and LOCALE_SABBREVLANGNAME is so not an ISO-639 code.
3) A numeric value of any valid PRIMARYLANGID or LANGID, e.g.
LangId = 0x0409
which will see GetLocaleInfo be called with the LOCALE_SABBREVLANGNAME LCType, which will be used as described in LOCALE_SABBREVLANGNAME is more than just an ISO-639 code and LOCALE_SABBREVLANGNAME is so not an ISO-639 code.
4) #2 with a LANG_* constant for the PRIMARYLANGID and 0xFFFF for the SUBLANGID, e.g.
LangId = LANG_BENGALI, 0xFFFF
This odd 0xFFFF value is something that TableTextService.dll thinks of as the neutral setting, and it will then add the input method as an option to all of the SUBLANGID values under that PRIMARYLANGID -- thus under Bengali (India) and Bengali (Bangladesh) in the above example.
5) #3 with the 0xFFFF value (or #2 with 0xFFFF for both the PRIMARYLANGID and SUBLANGID values), e.g. either of the following:
LangId = 0xFFFF
LangId = 0xFFFF, 0xFFFF
Either of these values will cause the input method to be included as an option under all languages.
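To make the five forms concrete, here is a small Python sketch of how a LangId value could be resolved and then matched against a LANGID. It mirrors the winnt.h MAKELANGID, PRIMARYLANGID and SUBLANGID macros, where a LANGID is (sublanguage << 10) | primary language. It is my own illustration, not TableTextService.dll's code. It knows only the handful of constants listed and does not call GetLocaleInfo. Treating a bare primary value as LANG_*, SUBLANG_DEFAULT is an assumption that follows the description of form 1.

```python
# Values from winnt.h for the few constants this sketch needs.
CONSTANTS = {
    "LANG_ENGLISH": 0x09,
    "LANG_BENGALI": 0x45,
    "SUBLANG_DEFAULT": 0x01,
    "SUBLANG_BENGALI_INDIA": 0x01,
    "SUBLANG_BENGALI_BANGLADESH": 0x02,
}
NEUTRAL = 0xFFFF  # the "neutral" marker described in forms 4 and 5


def make_langid(primary: int, sub: int) -> int:
    return (sub << 10) | primary      # same as MAKELANGID


def primary_langid(langid: int) -> int:
    return langid & 0x3FF             # same as PRIMARYLANGID


def sub_langid(langid: int) -> int:
    return langid >> 10               # same as SUBLANGID


def parse_token(token: str) -> int:
    token = token.strip()
    return CONSTANTS[token] if token in CONSTANTS else int(token, 0)


def resolve(setting: str) -> tuple[int, int]:
    """Turn a LangId value into (primary, sub); NEUTRAL means 'any'."""
    values = [parse_token(t) for t in setting.split(",")]
    if len(values) == 2:              # forms 2 and 4, and "0xFFFF, 0xFFFF"
        return values[0], values[1]
    if len(values) != 1:
        raise ValueError(f"unexpected LangId value: {setting!r}")
    value = values[0]
    if value == NEUTRAL:              # form 5: 0xFFFF on its own
        return NEUTRAL, NEUTRAL
    if sub_langid(value) == 0:        # form 1, or form 3 with a bare primary
        return value, CONSTANTS["SUBLANG_DEFAULT"]  # assumption: as in form 1
    return primary_langid(value), sub_langid(value)  # form 3 with a full LANGID


def applies_to(setting: str, langid: int) -> bool:
    """Would an input method with this LangId be listed under langid?"""
    primary, sub = resolve(setting)
    if primary == NEUTRAL:
        return True
    if primary != primary_langid(langid):
        return False
    return sub == NEUTRAL or sub == sub_langid(langid)
```

Under these assumptions, LANG_BENGALI, 0xFFFF matches both Bengali (India) (0x0445) and Bengali (Bangladesh) (0x0845). LANG_BENGALI, SUBLANG_BENGALI_INDIA matches only the first.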
Anyhow....
At this point you may be wondering one or both of the two things I was wondering about:
The answer to both questions will likely not be the one you are looking for (they were not the ones I wanted to hear!).
Not supported.
But just to round out the [System] section information, here are some more entries (these ones pulled from the Yi IME -- you would fill in your own info):
GuidProfile={<fill in your own unique curly-brace delimited GUID value, please!>}
Description="Yi Input Method"
Display Description="@%programFiles%\Windows NT\TableTextService\TableTextService.dll,-16"
Note the MUI-friendly string -- you can fill in your own DLL name and string index, obviously....
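If you need a fresh GUID for that GuidProfile line, Python's standard uuid module can generate one. This is just one convenient way to do it, not a requirement of the text service. Uppercase hex inside curly braces is the conventional Windows formatting, and that choice is mine here.

```python
import uuid


def new_guid_profile() -> str:
    # A fresh random (version 4) GUID in the curly-brace form.
    return "{" + str(uuid.uuid4()).upper() + "}"


print("GuidProfile=" + new_guid_profile())
```

Run it once and paste the result in. Every input method needs its own value, so don't copy one from another table file.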
I'll cover the rest of the settings like icon info in an upcoming post.
This post brought to you by ⑤ (U+2464, aka CIRCLED DIGIT FIVE)
Over on the Microsoft VOLT users community, SpaceyT-17 asked:
What updates related to Uniscribe, locales and language support are to be released in Windows XP Service Pack 3?
To which O2K answered:
I wanted to ask the same. But seems it's not finalize yet and they are not disclosing anything about it. :(
Though to be honest, some of this has been answered in blogs like ELK stampede! (which has the 10 locales added post-XPSP2 that are possibly going to be in the service pack).
And then there is How many versions does a bug have to exist before backporting the fix can't be successfully argued?, which pointed out a request to put the Romanian keyboard updates into the service pack, and the similar question about the Romanian/Bulgarian font updates I have talked about many times in the past.
So we don't know for sure, but clearly there are ten locales, a bunch of fonts, some keyboards, and no Uniscribe updates that are potential candidates based on these already available downloadable updates....
This post brought to you by Ș (U+0218, a.k.a. LATIN CAPITAL LETTER S WITH COMMA BELOW)
If you are a regular reader, you may recall the recent SiaO @ B.NET blog discussing a speaking engagement in Bangalore set up by regular reader Pavanaja U B.
Now this had to happen without the scooter, because of the adventures I just pointed out in Bangalore Sans Scooter. But I figured I'd be sitting anyway, so no worries at all. :-)
I was a little worried about how the venue changed three times (I pointed out how at Microsoft, if you move a meeting three times, sometimes people decide to skip it out of fear that people will go to all three rooms and the actual meeting won't happen!). But we had a nice crowd and people kept trickling in even as the presentation went on....
The talk was a ton of fun, even more than I expected -- in some part because of how fascinating it is to watch people respond to the same talk but find different points amusing due to different cultural contexts. In particular I had taken the time to add a bit of Kannada to some of the samples, which I think also went well. :-)
The real proof was in the Q&A -- the geek presentation version of the singer/songwriter's encore. We blew the house out, with a time that threatened to be longer than the original presentation if you include some questions that happened during, too. We covered everything from sorting to fonts to keyboards to Unicode to not using Unicode and LIPs and more -- many points raised are going to find their way into future blog posts, particularly some of the font issues.
The whole thing was really amazing, and I had a ton of fun (I hope they did too!).
After the meeting, I was presented with a gift -- two books, that were the Hindi translations of two Harry Potter books (Philosopher's Stone and Chamber of Secrets).
Very cool!
I was shown one extremely cool part that they knew I'd love to see -- the way that the translator handled the chapter that in English was entitled Mirror of Erised. I decided I was going to look into this a bit deeper. :-)
So that night when I probably ought to be sleeping I am looking through the chapter and slowly deciphering it -- slowly since I don't know much Hindi but I do know the script a bit and know the original book and have resources to translate words (including people at the front desk of the 37th Crescent Hotel -- including several Harry Potter fans -- who were happy and I think quite amused to help me on this odd quest!).
Now from the original book:
Harry thought. Then he said slowly, "It shows us what we want... whatever we want..."
"Yes and no," said Dumbledore quietly. "It shows us nothing more or less than the deepest, most desperate desire of our hearts."
Of course Erised is "desire" spelled backward.
This is an interesting translation issue for Hindi. The translator of course knew that spelling a word backward is a bit more involved in Devanagari, so to be true to the intent of the original he took the Hindi word
ख़्वाहिश
or KHA NUKTA VIRAMA VA AA HA I SHA
(which in my humble opinion was a reasonable choice as it means wish/desire) and reverses it to
SHA HA I VA AA KHA NUKTA VIRAMA ZWJ
(that ZWJ is required to look most like the book's printed text, for reasons I discussed in the very talk I gave earlier and in the blog Why my IUC31 talks were presented on Vista (even though running on a MacBook Pro)), so that the chapter title becomes
शहिवाख़् का दर्पण
which is basically backward as ordered though it creates a kind of awkward word in Hindi for all sorts of reasons, including the ending half form that is not represented perfectly for what is in the book in Unicode (and is not comfortably pronounceable in Hindi, either).
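Here is a few-line Python sketch of the difference between reversing code points and reversing the way the translator did. It is my own illustration, not anything the translator used. It assumes each consonant (or independent vowel), together with the nukta, virama and vowel signs that follow it, forms one unit. It reverses those units and appends a ZWJ when the result ends in a virama. A plain code point reversal (word[::-1]) would instead leave the vowel signs and the nukta in front of the wrong letters.

```python
VIRAMA = "\u094D"
ZWJ = "\u200D"


def starts_unit(ch: str) -> bool:
    # Independent vowels and consonants (U+0904..U+0939) and the precomposed
    # nukta consonants (U+0958..U+095F) each begin a new unit.
    return ("\u0904" <= ch <= "\u0939") or ("\u0958" <= ch <= "\u095F")


def consonant_units(word: str) -> list[str]:
    units: list[str] = []
    for ch in word:
        if starts_unit(ch) or not units:
            units.append(ch)
        else:
            units[-1] += ch  # nukta, virama, vowel sign: stays with its consonant
    return units


def reverse_by_units(word: str) -> str:
    result = "".join(reversed(consonant_units(word)))
    if result.endswith(VIRAMA):
        result += ZWJ  # ask for a visible half form at the end
    return result


khwahish = "\u0916\u093C\u094D\u0935\u093E\u0939\u093F\u0936"  # ख़्वाहिश
print(reverse_by_units(khwahish) == "\u0936\u0939\u093F\u0935\u093E\u0916\u093C\u094D\u200D")  # True
```

The output is SHA HA I VA AA KHA NUKTA VIRAMA ZWJ, the same sequence described above.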
It is a fair bet that the book was not typeset in Unicode (which is yet another issue we talked about in the Q&A that I will blog about some time soon -- Adobe beware!).
I wondered whether I should be writing a proposal for the UTC to cover what this word brings up, since it appears in a published work? :-)
I kept reading, out of curiosity I think -- plus I really wanted to be comfortable with my conclusions (I feel like I did not listen closely enough when they were explaining meanings and felt properly self conscious that I may have missed something). I was reminded of an old BtVS bit of dialog:
Dawn: I've been reading this old Turkish spell book. There's an old conjuration that the ancient Turks used to communicate with the dying.
Willow: Oh, yeah. I think I've read a translation of that.
Dawn: There's a translation of it?
[Dawn makes a frustrated sound]
Dawn: I'm over it.
I mean, maybe it is funny for me to be trying to read the book, but then as Cathy (back in Redmond from Thailand, shocked to find out I was in India, but still only an IM away) put it when I feared she'd be laughing about this:
You gotta remember that I took at least 6 dead language courses + German + French + Mandarin in college.
Fair enough. :-)
Eventually, I felt comfortable that I had properly reverse engineered the translator's intent here, but I still imagine I will keep cracking open these two books from time to time, as I truly enjoyed the experience of this one chapter and found myself curious about what the translator did with things like the Halloween references (do they even have Halloween in India?).
In any case, a truly enjoyable presentation, a wonderful crowd, a delightful gift, and a pretty half-decent blog post about it all. What more could one ask for? :-)
This post brought to you by ख (U+0916, aka DEVANAGARI LETTER KHA)
No, the title is not about a new font name for the next version of Windows!
It was inspiring; I could almost hear the Chad Michael Murray voiceover (re-used script from the One Tree Hill episode Pictures of You, with the writers' strike and all):
You ever wonder how long it takes to change your life? What measure of time is enough to be life altering? Is it four years like high school, one year, an eight week rock tour? Can your life change in a month, or a week or a single day? We're always in a hurry to grow up to go places to get ahead but when you're young one hour can change everything.
Perhaps I should explain how and/or why I think something changed so fast....
Another one of my India trip posts, of course.
The trip from Chennai to Bangalore really did manage to put a cat among the pigeons, let me tell you.
I arrived ninety minutes prior to departure.
They smiled and told me I had plenty of time.
They seemed uncertain about checking the scooter plane-side. They were willing to go along with me, but it just did not seem to them like something people do. No one could really imagine that I wouldn't want to sit back and be pushed places rather than be in control of my own movement.
I count from ten backward to myself. Ten, Nine, Eight, Seven, Six, Five, Four, Three, Two, One.
On the other hand the scooter had everyone interested -- we had two supervisors, four loaders, and two women whose exact role was never explained (the supervisors wore ties and introduced themselves as supervisors, the loaders were very deferential to them, but the exact relationship between the women and the others was indeterminate).
Of course the guy in the uniform we had to pass before going through security refused entry of the scooter.
"It can't go through the x-ray machine," he explained.
Fair enough. We head back to the ticket counter and I get a baggage claim ticket. The loaders now wrestle with the scooter. I offer to help in case they have to take it apart or tilt it or whatever. They assure me they have the matter in hand (as one of them lifts his end by the seat and it comes off the base in his hand).
He smiles, apologizes, and lifts by the bottom.
I count backward from ten to myself in Portuguese. dez, nove, oito, sete, seis, cinco, quatro, três, dois, um.
They put it on a baggage cart and head off.
Me they put in a wheelchair. One of those wheelchairs with really small back wheels so I can't drive it myself but am absolutely dependent on someone to push me.
Only one of the women remains; all of the supervisors and loaders and the other woman have vanished without a trace.
I suppose a man on a scooter is more interesting than one in a wheelchair.
The woman, who is not wearing a tie but is wearing a beautiful purple outfit, is trying to find a loader to push the wheelchair.
I can't help noticing the irony that when I needed no one and could do it all myself I was surrounded by a small entourage of Jet Airways employees, yet now that I was put in a situation where I needed someone the only person there was someone who clearly had no intention of pushing a wheelchair.
I am a bit too embarrassed to ask for a chair with big back wheels that I could move.
And way too embarrassed to ask her if she could push me herself. She is very slight, the sort of person who one could imagine blowing away on a blustery day, and pushing me plus two laptops? I am happy to accept her decision to look for a loader, though her attempts to signal one over do not seem to be having much success.
I take a moment to actually look at her, mainly since I don't know what else to do. She is young, attractive, a bit distracted trying to find a loader and with the stress caused by this has some worrying lines on her forehead. I hope we figure this out soon; her face definitely looks better when she is not stressed. The purple is something I can't decide how I feel about -- very exotic, to be sure, and it stands out a bit. But it has an odd effect on her eyes that I cannot quite place. I can't really determine whether I like it or not, to be honest.
It occurs to me that I am not going to be telling her any of this, conversationally or otherwise.
A brief sigh of regret, and I let it go and pay attention to our situation again. I simply lack that kind of nerve, these days....
It is now 45 minutes to departure, and I am not through security yet. My companion clothed in purple and I are going nowhere.
I am struck by the fact that the loaders were eagerly surrounding a powered scooter yet no one is even willing to acknowledge the existence of a beautiful woman.
Very odd. I suppose there are many beautiful women around while the scooter is something of a novelty. But still....
Finally a loader responds to her now furtive yet still subtle gestures.
I am pushed up to security and I give them my laptops to go through the x-ray. They do not ask me to remove them from the case, or to take off my shoes or my belt, or to put my wallet in the tray.
They ask me twice to put my phone in a tray.
When I am through the officer asks me if I can stand. I am thinking about how tired I am at the end of the day and I grimace a bit, saying "If I have to, I can."
He shakes his head and gives me the most lackadaisical pat down I have ever been given.
I am almost offended -- I could have been sitting on a SCUD missile and he would have missed it. On the other hand, look how poorly those SCUD missiles did in Iraqi hands. I am too tired to argue that they should look at me more closely, so I just let myself be led over to retrieve my bags.
One of the officers wants to look in one compartment of one of the laptop cases.
It is now just twenty minutes to departure.
I count from ten to myself in Swedish. tio, nio, åtta, sju, sex, fem, fyra, tre, två, en.
He opens it and takes everything out, asking what each item is (it holds my two Zunes, three batteries for the MacBook Pro, two USB-powered personal fans, about ten different adapters, chargers, and converters for power outlets, a few USB keys, the charger for the phone, and the AC adapter for the MacBook Pro).
He packs the bag back up and lets me go.
We are now just ten minutes to departure.
They put me on a bus (three men lift me and the chair onto it), and we drive to the plane. They lift me off the bus, and wheel me up to just behind the plane. The woman in purple leaves with the bus, and I am sitting alone at the bottom of a staircase leading up to the rear door on the plane.
Finally a flight attendant sees me and tries to get some people to help.
I count from ten backward to myself in French.... dix, neuf, huit, sept, six, cinq, quatre, trois, deux, un.
At this point I am just way too frustrated, so I go ahead and climb the stairs. I know I may regret it later, but for now the adrenaline pushes me up the stairs. Someone has taken the bags off my shoulders and carries them up.
The flight is uneventful and takes less time than the flight from Seattle to Portland, or to Spokane.
Everyone else is off the plane, but the flight attendant won't let me get up, "Someone is coming to help you down," she explains.
I count from ten backward to myself in Basque. hamar, bederatzi, zortzi, zazpi, sei, bost, lau, hiru, bi, bat.
Two men finally come up and carry me down the stairs, almost tipping me once and almost dropping me twice. I am briefly terrified, and this does not improve until I am on the ground.
Someone pushes me into the terminal. I look for my bag and the scooter, and find my bag. But the scooter is nowhere to be seen.
Suddenly I see a piece of it. Harrison Ford's voice pops into my head.
I've got a bad feeling about this.
I count from ten backward to myself in Greek. thaeca, eney-ya, octo, aeft, aeksee, paendae, taessaera, treea, thee-o, aena.
I ask if someone can push me up to the scooter piece and they do so.
It is actually two pieces hanging together oddly - they removed the clip connecting the front section to the rear section but did not disconnect the wiring plug between them (it is now cracked and wires are exposed).
I count from ten backward to myself in Hebrew. esser, taysha, shmoneh, shehvah, shesh, chamash, arbah, shalosh, shtayim, echad.
I stand up and say "please let me do this" before they break it any further.
While I am taking care of these pieces, the seat and the battery arrive (the basket is still nowhere to be found).
I put the whole unit together, noting the cracked wiring case, presumably due to it being kept attached rather than separated when the clip holding the pieces together was removed.
I count from ten backward to myself in Esperanto. dek, naŭ, ok, sep, ses, kvin, kvar, tri, du, unu.
I mentally try to decide what to do about that. No biggie, I'll probably just buddy tape it. If everything works then the buddy tape will be fine.
Then I sit in the chair and pull out the key from my pocket (realizing suddenly that the pat down did not find that either).
I put the key in and the meter goes to full -- I start to smile, as the connection seems okay.
My smile stops when the scooter starts beeping insistently.
I count the beeps -- five of them, then another five, and so on.
I did not have the manual in front of me then (I do now), but I knew what it meant anyway:
Solenoid brake trip. The manual freewheel lever may be in the freewheel position.
It's okay, an easy fix.
I remove the key, take off the seat and the battery, and try to pull back the freewheel lever.
It is jammed. It won't move forward to freewheel mode or backward into drive mode.
I count from ten backward to myself in Sanskrit.
Wait, I'm totally kidding -- I don't know how to do it in Sanskrit. But I do close my eyes and count OM ten times slowly.
I ask to see a supervisor.
When Charumathy arrives I tell her what has happened and show her what is working and what is failing, what is broken, and what is missing (the basket).
She has me fill out the damage complaint form, gives me her card with her name and number on it and assures me someone will call by the next morning at 11am.
She did call (it was about 11:30am but she apologized) and told me that they were going to work on this and that, although she could do nothing until after the weekend, someone would call me on Monday evening. I mention I have some information on wholesale prices and such and she asks me to forward this to her, which I agree to do.
Everything changed, either way. I was on the verge of taking a leave of absence for the rest of the time on the visa from this India trip and travelling for longer, but the knowledge of how close I am to helpless scared that notion right out of consideration.
It was like the first time I fell; I never felt the same again. I doubt this one is going to be any easier to get over as a former invulnerability becomes certain knowledge that I am even more vulnerable than I used to be....
Anyhow, this post will go live on Monday morning (Bangalore time), a few hours from when I am typing this, so it will still be some hours before I know what is happening next.
But I can update everyone once I know what is happening.
And what language I am counting back from ten in at that point....
I had the opportunity yesterday to speak at Anna University (at the computer science center of the College of Engineering at Guindy), and it was, in a word, amazing.
The idea of doing something there started with someone who read here about my India trip and suggested to Soma, via his blog, that such a presentation might make sense. From there, the right people at Microsoft talked to the right people at the University, and then it all came together.
Special thanks to Chandar Sundaram and Dr. S. Swamynathan for their assistance in this regard!
So many things about the experience impressed me -- their knowledge, their passion, watching them think about their own language in a context where the majority of them are quite used to only thinking about English, as often happens in the technical sphere. You could see lights going on and interest in thinking further about the issues on their faces. Their passion was at times intoxicating....
Afterward, I had the chance to look at some of the many Tamil language projects going on in Dr. T.V. Geetha's lab. And while they did not see thinking about the Tamil language in the context of technology as a huge mind-shift (it after all being their day job!), the various projects they were working on were also quite fascinating.
My only regret is that the short notice for the India trip led to short notice for the Anna University visit, and I would have loved to be able to really plan the opportunity, perhaps even doing some interesting technical talks for their .NET and C# classes and on Unicode/globalization/localizability support. Also, the interaction with bright minds eager to learn almost made me sorry it wasn't also a college recruiting trip for Microsoft (although I suspect some of the people I met may end up at Microsoft eventually!).
The talk covered a lot of ground -- from Unicode to the many ISCII/TAB/TAM/TSCII codepages to keyboards to fonts to the Tamil Language Interface Pack and more -- it was pretty exciting and if I am lucky inspired some of the students to look more closely at this crucial area of programming so often overlooked, even by those who have lived their lives in cultures with rich language traditions that make up so much of their upbringing....
Other interesting issues to note include the fact that the college's female population in technical tracks exceeds 30%, something unheard of in the US in any coed institution. In Geetha's lab it was well over 50% and to be honest I think among the students who attended my presentation it was well over 40% (I could try to blame the latter on my undeniable charisma and charm but that would be silly -- even if I had such impact they would not have known it before they came into the room!).
I have to wonder about the cultural issues in the USA that keep women out of technology and how much we lose as a country for having lost sight of the fact that people are simply people, and technical people can be both male and female. To Err is human, but to Geek is divine, after all.
Did anyone else know that ANUSVARA in some spell checkers suggests UNSAVORY. Talk about your irony....
An amazing opportunity, and one I hope I have the chance to do again! :-)
This post brought to you by ஂ (U+0b82, aka TAMIL SIGN ANUSVARA -- the most un-Tamil of all that is Tamil in Unicode)
The exact words:
I apologize for being so shameless.
She was subsequently assured that she had nothing to be ashamed of, since casual subconscious flirting between people slightly bored by their surroundings who would never take it any further given the genuine lack of means, motive, and opportunity does not now and would not ever constitute a reason to feel such a thing.
A smile proved this assuaged the shame. :-)
It seemed flattering, nevertheless....
But then maybe I need to get out more. Or stay in more. Or whatever.
The ears of SiaO are open, and one never knows what might be overheard....
The various characters in Unicode are ashamed at even the implied sponsorship caused by letters being used to represent the words in this post.
The first blog in this series was On reversing the irreversible (the introduction) and the second was On reversing the irreversible (The Set-Up).
The real question that comes up is how to make InitializeSortkeyReverseData[Ex] do the job here -- starting with how to store the information?
Looking at our prototypical empty sort key value, discussed in A&P of Sort Keys, part 0 (aka The empty string sorts the same in every language):
01 01 01 01 00
or with the placeholders for actual sort key data included:
[all Unicode sort weights] 01 [all Diacritic weights] 01 [all Case weights] 01 [all Special weights] 01 [Punctuation weights] 00
For each sort key value obtained, we need to separately store these five chunks, hashing them by the original string used to get the sort key.
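A minimal Python sketch of that storage step follows, assuming (as the empty-key example implies) that the byte 01 only shows up as one of the four section separators and 00 only as the terminator. The helper names split_sort_key and store_sort_key are hypothetical and not part of any Windows API; on Windows the key bytes themselves would come from a call such as LCMapStringEx with LCMAP_SORTKEY.

```python
# Hypothetical helpers for illustration; not a Windows API.
SEPARATOR = 0x01   # separates the five weight sections
TERMINATOR = 0x00  # ends the sort key
SECTIONS = ("unicode", "diacritic", "case", "special", "punctuation")


def split_sort_key(key: bytes) -> dict:
    """Split a sort key into its five weight chunks.

    Assumes 01 only appears as a separator and 00 only as the terminator.
    """
    if not key or key[-1] != TERMINATOR:
        raise ValueError("sort key must end with the 00 terminator")
    chunks = key[:-1].split(bytes([SEPARATOR]))
    if len(chunks) != len(SECTIONS):
        raise ValueError(f"expected {len(SECTIONS)} sections, found {len(chunks)}")
    return dict(zip(SECTIONS, chunks))


def store_sort_key(cache: dict, source: str, key: bytes) -> None:
    """Store the five chunks, keyed (hashed) by the original string."""
    cache[source] = split_sort_key(key)


# Usage: the empty string's key, 01 01 01 01 00, yields five empty chunks.
cache = {}
store_sort_key(cache, "", bytes([0x01, 0x01, 0x01, 0x01, 0x00]))
```

Keeping the chunks separate (rather than the whole key as one blob) is what later lets you match, say, only the Unicode weights while ignoring case or diacritics.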
And how to get the big bunch of sort key values?
Well, you may recall some of my prior discussion on sort elements, such as Sort element vs. text element and More on sort elements.
(if not then you can read them now!)
Well, the goal is to isolate each call to retrieve only one single sort element. There are two important consequences of this need:
1) Any time you have an expansion -- discussed in that second post above and more rigorously in A&P of Sort Keys, part 5 (aka EXPANSIONing your horizons) -- you do not need to store the information. Because you are really getting two or three sort elements there, your later attempts to do lookups of single sort keys are never going to find them (the individual sort elements will be found instead).
The consequence? You will in most cases not be able to retrieve Æ (U+00c6, LATIN CAPITAL LETTER AE) since it will for most locales only ever be treated like AE.
It may be tempting to remove the items (perhaps detecting them via a call to FoldString with the MAP_EXPAND_LIGATURES flag), but as I point out in Why doesn't FoldString take an LCID? there is no way to get the information for anything but the default table. So on the whole storing this data is a harmless thing, and nothing you really need to worry about (though if you wanted to remove them to have fewer entries you could try that FoldString route).
2) Any time you have a compression in a locale -- more on these all over the place in the blog like A Microsoft convention for compressions in sorting and the definitive A&P of Sort Keys, part 6 (aka Relax, be calm, and deCOMPRESS if you are feeling out of sorts) -- you will get a single sort element from multiple letters.
So reversing them is easy, but in order to store them you have to know what they are either because you know the language and its rules or because you had the actual data (or you have to do huge brute force calls to get every single letter combination to look for sort key values that are of different length than they are in the default table which has no compressions).
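Here is a rough Python sketch of that brute-force route, comparing the length of the Unicode-weight chunk for each letter combination under the locale against the default table. The callables locale_key and default_key are hypothetical stand-ins for real sort key calls (for example LCMapStringEx with LCMAP_SORTKEY for the two locales), and treating a length difference as the signal is an assumption drawn from the description above, not a documented rule.

```python
from itertools import product
from typing import Callable, Iterable, List

SortKeyFn = Callable[[str], bytes]  # stand-in for a real sort key call


def unicode_weights(key: bytes) -> bytes:
    """Return the Unicode-weight chunk: everything before the first 01."""
    return key.split(b"\x01", 1)[0]


def find_compressions(letters: Iterable[str], locale_key: SortKeyFn,
                      default_key: SortKeyFn, max_len: int = 2) -> List[str]:
    """Brute-force search for letter combinations whose Unicode weights differ
    in length between the locale and the default table (which has no
    compressions). Longer combinations that merely contain a shorter
    compression are reported too, so filter those afterward if needed."""
    letters = list(letters)
    found = []
    for n in range(2, max_len + 1):
        for combo in product(letters, repeat=n):
            text = "".join(combo)
            if len(unicode_weights(locale_key(text))) != len(unicode_weights(default_key(text))):
                found.append(text)
    return found
```

The cost grows as len(letters) ** max_len, which is exactly why knowing the language's rules (or having the actual data) beats the brute force approach.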
This is actually the easiest part for people to skip (and most people who attempt to solve this problem do skip it), even though it means you miss out on a lot of what Windows has to offer in the way of collation support. But in many ways being willing to be flexible and work a little harder to build the data cache up will reap huge rewards and is worth the extra effort....
So how to proceed? Well, that part is easy to start -- naively you can scroll through every single code point from U+0001 to U+10fffd and store the sort key value. Less naively you can generally skip lots of different code points, for example by calling GetStringTypeW and skipping anything that is C1_CONTROL, or, for some purposes, considering whether you want to throw out C1_BLANK, C1_PUNCT, or C3_SYMBOL values.
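A small Python sketch of the less naive enumeration follows. It swaps in Python's unicodedata general categories as a stand-in for the GetStringTypeW flags; the two classifications overlap ("Cc" is roughly C1_CONTROL) but are not identical, so treat this as an illustration of the filtering idea rather than a faithful port.

```python
import unicodedata
from typing import FrozenSet, Iterator

# General categories to skip. "Cc" approximates C1_CONTROL; lone surrogates
# ("Cs") cannot form a valid string on their own. Add "Zs", or individual
# punctuation/symbol categories such as "Po" or "Sm", to also drop blanks,
# punctuation, or symbols.
DEFAULT_SKIP: FrozenSet[str] = frozenset({"Cc", "Cs"})


def candidate_code_points(first: int = 0x0001, last: int = 0x10FFFD,
                          skip: FrozenSet[str] = DEFAULT_SKIP) -> Iterator[str]:
    """Yield each code point in [first, last] whose category is not skipped."""
    for cp in range(first, last + 1):
        ch = chr(cp)
        if unicodedata.category(ch) not in skip:
            yield ch
```

Each yielded string would then be handed to the sort key call and stored as described above.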
You don't have to do this of course, but you have to ask yourself whether it is worth retaining all of that data or whether it is extraneous -- like you may want the punctuation saved so that the word don't can be retained, but do you really care about control characters? And if you are going to put in a SPACE for all of the characters you skip then maybe you could take that smallest symbolic weight (ref: I need my SPACE, symbolically speaking) and plug it in for many of these non-letter type characters.
And when you find a duplicate sort key, you can do one of two things -- you can throw away the new one, or you can replace the old one with it. And you probably want to not pass a bunch of NORM_IGNORE* flags as I mentioned before in (The Set-Up) -- I'll discuss how to get those same results later, but for now let's keep the data.
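The duplicate decision can be sketched in Python as a single policy flag. Here get_sort_key is a hypothetical callable standing in for the real sort key call, and the cache is keyed by sort key (the direction the reversal will eventually need) so that duplicates are detectable at all.

```python
from typing import Callable, Dict, Iterable


def build_reverse_cache(strings: Iterable[str],
                        get_sort_key: Callable[[str], bytes],
                        replace_duplicates: bool = False) -> Dict[bytes, str]:
    """Map each sort key back to a string that produced it.

    When two strings share a sort key, either keep the first one seen
    (the default) or let the newer one replace it.
    """
    cache: Dict[bytes, str] = {}
    for s in strings:
        key = get_sort_key(s)
        if key in cache and not replace_duplicates:
            continue  # throw away the new one
        cache[key] = s  # first sighting, or replacing the old one
    return cache
```

Since the order of enumeration decides which string "wins", the same code point order should be used every time the cache is built.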
By default we'll assume that we should get all of the language specific stuff so we'll figure out a huge naive way of doing it -- we'll optimize it later, after we figure out what we have.
Now I have skipped stuff that is seemingly quite important that has come up in this blog before, like Japanese Kana, Korean Jamo, and more. But for now just trust me when I say that they are either nothing to worry about (or at worst no more awful than compressions, above), and in an upcoming blog I will explain why.
I will also explain in an upcoming blog how the actual reversal operation will use this data that has been conceptually built up, so that the combination of how we get the data and how we are gonna use it can help drive how to implement the actual storage....
Think of this series like a bunch of interview questions, but ones that I am going to answer more than I ask (which is not to say I won't do some quizzing of you readers!).
Stay tuned -- this series is only gonna get cooler! :-)
This post brought to you by Ặ (U+1eb6, aka LATIN CAPITAL LETTER A WITH BREVE AND DOT BELOW)
Now at this point some of the basic ideas are here as well as three kind of illustrative but not terribly useful samples, and the obvious question to ask is what gets covered next.
To start, I'll clear up some of the mystery from Part 3:
At this point, you may be wondering about two more things -- the meaning of the INPUT keyword here, and the meaning of the number 0. The fact that every single text file on Vista includes these very same entries is just a special bonus to people trying to figure out the meaning without assistance or documentation....
The INPUT thing describes the functionality to use, and I'll get to that eventually when describing the other options here.
And that 0 thing? The meaning is "handle all shift states". You can replace it with flags that can be added together (literally added, with a PLUS sign and everything). The full list of them is:
If you ever had to deal with keystroke events, you might see where some or all of this is coming from. Essentially you can use these flags to limit when the TABLE DRIVEN TEXT SERVICE will be handling specific key strokes and when it will ignore them and let them get passed through to the underlying keyboard.
Unfortunately, the TEXT section does not give you a way to distinguish between what shift state(s) are being pressed, which means if you add entries for both a and A then you will get two items in the candidate list to choose from, rather than just one. The good news is that the presence of these modifier flags means that a future release can fix this limitation, since the DLL really is paying attention to what modifier key(s) are being pressed for filtering purposes and is positioned to extend that code to modify potential candidates in the future.
So for now when I am building these text files, I simply act like the case does make a difference so that when/if they fix this bug everything will work, and in the meantime everything will still work (albeit more slowly since a candidate list will exist).
This "workaround" only affects the "case" distinction; there is currently no way to specify the other modifier states and there is no way to know what this syntax will look like when/if that limitation is addressed.
But if I ever do find out you (as my loyal readers) would be the first to know....
An interesting consequence is that you can literally add duplicates if you want to -- and they will just show up in the candidate list. Since filtering is done based on really any of the modifiers you specify (or no filtering if you use that 0 that the built-in ones seem to use), you can add the ALTGR choice in with the same candidate characters as the non-ALTGR version, and for now it will just be on the candidate list.
Easy!
You can also simply add pure duplicates with no plans to ever distinguish them further, giving people two options....
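To make that behavior concrete, here is a tiny Python model of how the candidate list ends up with both the a and A entries (plus any pure duplicates). It is only a model of the behavior described above, assuming keystrokes are matched without regard to case and that file order is preserved; it is not how the Table Driven Text Service DLL is actually implemented.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (keystroke sequence, candidate text), in file order


def candidate_list(entries: List[Entry], typed: str) -> List[str]:
    """Collect candidates for `typed`, ignoring case as the [Text] section
    currently does, and keeping the order the entries appear in the file."""
    return [text for keys, text in entries if keys.lower() == typed.lower()]


# Usage: entries for "a" and "A" both land on the list, as does a duplicate.
entries = [("a", "\u03b1"), ("A", "\u0391"), ("b", "\u03b2"), ("a", "\u03b1")]
print(candidate_list(entries, "a"))  # three candidates: α, Α, α
```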
But whenever there is more than one option in a candidate list, the order of the list can be very important (for obvious reasons). This is specifiable in the [Configuration] section of the text file, with one of the following options:
Keystroke sort
Text candidate list item should sort by keystroke order.
Form of keystroke sort is:
KeystrokeSort = integer value
Where: 0 - turn off keystroke sort (default)
Not 0 – turn on keystroke sort
Text sort
Text candidate list item should sort by text string order.
Form of text sort is:
TextSort = integer value
Where: 0 - turn off text sort (default)
Not 0 – turn on text sort
When you have pure duplicate entries, neither option will help much: KeystrokeSort cannot distinguish them and so should keep them in the order of the text file, while TextSort cannot distinguish them either and should also keep them in file order. But when the entries are not duplicates you can see different results, and it is worth trying both to see which order you prefer.
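A Python sketch of those two options follows. Two assumptions are made here and are not from the documentation: comparison is a plain ordinal string comparison, and KeystrokeSort takes precedence if both are turned on. The stable sort is what models the "pure duplicates stay in file order" behavior described above.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (keystroke sequence, candidate text), in file order


def order_candidates(entries: List[Entry], keystroke_sort: int = 0,
                     text_sort: int = 0) -> List[Entry]:
    """Order candidates per the [Configuration] options.

    0 means off (the default); any other value means on. Python's sort is
    stable, so pure duplicates keep their text-file order either way.
    """
    if keystroke_sort != 0:
        return sorted(entries, key=lambda e: e[0])  # by keystroke string
    if text_sort != 0:
        return sorted(entries, key=lambda e: e[1])  # by candidate text
    return list(entries)  # both off: file order
```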
This post brought to you by ④ (U+2463, aka CIRCLED DIGIT FOUR)
Now I did not start this at the beginning, I kind of picked it up in the middle. Here is some info that is nice to know very near to the beginning, though....
Here are the pre-defined keystrokes and how they behave in the Candidate Window.
Note that they are Virtual Key (VK) based. I will get to the consequences of that in a moment.
While composing:
Virtual Key: Function
VK_LEFT: Move caret to left in composition char. If we have the Incremental Candidate List, then cancel composition.
VK_RIGHT: Move caret to right in composition char. If we have the Incremental Candidate List, then cancel composition.
VK_RETURN: Finalize composition char
VK_ESCAPE: Cancel composition char
VK_BACK: Delete one composition char
VK_SPACE: Convert composition char
VK_UP: If we have the Incremental Candidate List, then move candidate selection to up
VK_DOWN: If we have the Incremental Candidate List, then move candidate selection to down
VK_PRIOR: If we have the Incremental Candidate List, then move candidate selection to prior
VK_NEXT: If we have the Incremental Candidate List, then move candidate selection to next
VK_HOME: If we have the Incremental Candidate List, then move candidate selection to home
VK_END: If we have the Incremental Candidate List, then move candidate selection to end
And then in the Candidate/Phrase Window:
Move candidate selection to up
Move candidate selection to down
Move candidate selection to prior
Move candidate selection to next
Move candidate selection to home
Move candidate selection to end
Finalize candidate list
Cancel candidate list
VK_1 - VK_9: Select list item for corresponding number, one through nine
Now each text file has a section in it that is key to how every other character is handled. Here is what it looks like in our MSDN_TableTextService_Explicit.txt file:
[Keystroke.Composition]
VK_1, 0 = INPUT // 1
VK_2, 0 = INPUT // 2
VK_3, 0 = INPUT // 3
VK_4, 0 = INPUT // 4
VK_5, 0 = INPUT // 5
VK_6, 0 = INPUT // 6
VK_7, 0 = INPUT // 7
VK_8, 0 = INPUT // 8
VK_9, 0 = INPUT // 9
Now as that list might or might not seem to indicate, all other keys just pass right through to the underlying keyboard that sits underneath the Table Driven TIP.
But adding a VK_* value to this list will cause keystrokes from that VK to be processed by the Table Driven Text Service -- and if those letters never appear in the [Text] section then those keystrokes will be eaten.
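A rough Python sketch of reading that section follows, just to make the "listed keys are processed, everything else passes through" rule concrete. The parsing rules are assumptions based on the sample above: each line is "VK_x, modifiers = FUNCTION // comment", and the modifier field is either 0 (handle all shift states) or flag names added together with a plus sign. The real flag names are not reproduced here, so FLAG_A and FLAG_B in the usage are placeholders.

```python
import re
from typing import Dict, List, Tuple

# One "VK_x, modifiers = FUNCTION" line (the // comment is stripped first).
LINE = re.compile(r"^\s*(VK_\w+)\s*,\s*([^=]+?)\s*=\s*(\w+)")


def parse_composition(section: str) -> Dict[str, Tuple[List[str], str]]:
    """Map each VK name to (modifier flags, function).

    An empty flag list stands for 0, meaning "handle all shift states".
    """
    result: Dict[str, Tuple[List[str], str]] = {}
    for raw in section.splitlines():
        line = raw.split("//", 1)[0]
        match = LINE.match(line)
        if not match:
            continue  # section header, blank line, or something unrecognized
        vk, modifiers, function = match.groups()
        if modifiers.strip() == "0":
            flags: List[str] = []
        else:
            flags = [f.strip() for f in modifiers.split("+")]
        result[vk] = (flags, function)
    return result


def is_handled(vk: str, composition: Dict[str, Tuple[List[str], str]]) -> bool:
    """True if the Table Driven Text Service processes this key; any VK not
    listed passes through to the underlying keyboard."""
    return vk in composition


# Usage with placeholder flag names:
sample = "[Keystroke.Composition]\nVK_1, 0 = INPUT // 1\nVK_2, FLAG_A+FLAG_B = INPUT // 2\n"
parsed = parse_composition(sample)
```

Note that being handled is not the same as producing output: a handled key with no matching [Text] entry is the case where the keystroke gets eaten.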
Now there are a few more advanced features, which I'll discuss soon.
And then there is the interaction between the predefined keys and the ones you define in the file. The rule is easy and simple for the DLL: it always wins. That means that if you define something, it will always pay attention to you unless it has its own override (which is why VK_1 - VK_9 have that weird behavior where, as soon as you have a numbered candidate list, candidate item selection takes priority over the actual input -- which leads to VERY confusing behavior).
I'll discuss this more soon, as well. :-)
If you are one of those people who found that confusing, you can take comfort in the fact that I was too, so I'll be getting to this very soon, in an upcoming blog in this series. :-)
Coming up soon, maybe next -- some really explicit information about the [TEXT] section, and more information on some of the configuration sections....
This post brought to you by ③ (U+2462, aka CIRCLED DIGIT THREE)