Sorting it all Out Michael Kaplan's random stuff of dubious value Be sure to read the disclaimer here first!
Last month I was talking about how Feature ideas don't always turn out to be good ones. And I mentioned how I'd probably talk about other cases in the future.
What can I say besides welcome to the future. :-)
In Vista, from the time when it was just Longhorn, there has been enhanced collation support for all of the CJK locales. The stroke count sorts and Mandarin pronunciation (both Pinyin and Bopomofo) sorts all covered more characters, the Korean Hangul pronunciation sort was enhanced too, and the Japanese locale got a new alternate sort to cover everything in JIS X 0213. Basically a lot of work was done.
But there was one area that was not covered that was really bothering me -- there was no support for a Cantonese sort of any kind.
"But isn't Cantonese," you might ask, "a spoken dialect, not a written one?"
The Wikipedia article Written Cantonese gives a good answer to this question in its introduction:
Written Cantonese refers to the written language used to write colloquial standard Cantonese using Chinese characters.

Cantonese is usually referred to as a spoken variant, and not as a written variant. Spoken vernacular Cantonese is different from standard written Chinese, which is essentially formal Standard Mandarin in written form. Written Chinese spoken word for word in Cantonese sounds overly formal and distant. As a result, the necessity of having a written script which matched the spoken language increased over time. This resulted in the formation of additional Chinese characters to complement the existing characters. Many of these represent phonological sounds not present in Mandarin. A good source for well documented written Cantonese words can be found in the scripts for Cantonese drama and Cantonese opera.

With the advent of the computer and standardization of character sets specifically for Cantonese, many printed materials in predominantly Cantonese spoken areas of the world are written to cater to their population with these written Cantonese characters. As a result, mainstream media such as newspapers and magazines have become progressively less conservative and more colloquial in their dissemination of ideas. Generally speaking, some of the older generation of Cantonese speakers regard this trend as a step "backwards" and away from tradition. This tension between the "old" and "new" is a reflection of a transition that is taking place in the Cantonese speaking population.
And if you look at the major population centers with people who use Cantonese, there are clear efforts to support this development among many of the native speakers (and writers) of Cantonese.
There are some cultural issues that even I was faced with when doing research here that I will discuss further in a follow-up post....
Of course one of the big problems has been that there are multiple romanizations used to represent the pronunciations, and unfortunately they are often used in the same lists (like phonebooks in Macau and elsewhere that allow people to simply enter the pronunciation). How can you hope to sort the phone book consistently if the people providing the pronunciations have different ideas of how even identical pronunciations are to be represented?
But lots of work has been done to try to help with this issue, for example the Jyutping system produced by the Linguistic Society of Hong Kong (LSHK). And many people have been trying to use it -- for example the government of the Hong Kong SAR's Chinese Language Interface Advisory Committee (CLIAC) has produced the Cantonese Pronunciation List of the Characters for Computers, a huge set of data providing Cantonese "Pinyin-esque" style pronunciations for much of the Hong Kong Supplemental Character Set (HKSCS).
When I first saw that we would have a list of over 30,000 ideographs and their pronunciations, I was excited -- perhaps this data could be used to provide a Cantonese sort for the people in Hong Kong and elsewhere who wanted it?
But unfortunately, while there is much that is good about Jyutping, it has one liability at present, one that it shares with Yale and other romanization systems: it is only one of several romanization systems, and none of them is ubiquitous yet.
Another problem is that for the 30,764 unique ideographs given pronunciations in the CLIAC-provided doc, there are fewer than 2,000 unique pronunciations (fewer than 700 if you do not include the tone values).
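To make the "with tones" versus "without tones" counts concrete, here is a minimal sketch of how they could be computed. The handful of (ideograph, Jyutping) pairs is an illustrative sample I made up for the example, not the CLIAC data, and the function names are hypothetical.

```python
import re

# Illustrative sample only; the real CLIAC list has 30,764 ideographs.
SAMPLE = [
    ("詩", "si1"),
    ("史", "si2"),
    ("試", "si3"),
    ("時", "si4"),
    ("市", "si5"),
    ("事", "si6"),
    ("思", "si1"),
    ("廣", "gwong2"),
]


def strip_tone(jyutping):
    """Drop the trailing tone digit from a Jyutping syllable."""
    return re.sub(r"[1-9]$", "", jyutping)


def count_pronunciations(pairs):
    """Return (ideographs, distinct syllables with tone, distinct without tone)."""
    chars = {ch for ch, _ in pairs}
    with_tone = {jp for _, jp in pairs}
    without_tone = {strip_tone(jp) for _, jp in pairs}
    return len(chars), len(with_tone), len(without_tone)


print(count_pronunciations(SAMPLE))  # (8, 7, 2)
```

Even in this tiny sample the pronunciations collapse quickly once the tone digit is dropped, which is the same effect the full list shows at scale.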
And yet another problem is in the decision about tones -- some number the tones in Cantonese at nine, while others claim that three of these are unimportant distinctions and that there are only six to worry about. So it is not just the romanization systems that differ (and they differ enough that place names like Canton and Guangzhou come from the same word); even people who agree on the romanization may disagree about the tones (with some believing that tones 7, 8, and 9 actually fold into 1, 3, and 6 respectively).
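The nine-versus-six disagreement matters for any data that mixes the two conventions. As a rough sketch, assuming syllables are written with a trailing tone digit (the function name and the sample syllables are mine, chosen only for illustration), folding 7, 8, and 9 into 1, 3, and 6 could look like this:

```python
# Folding described in the text: tones 7, 8, 9 become 1, 3, 6.
FOLD = {"7": "1", "8": "3", "9": "6"}


def fold_tone(syllable):
    """Rewrite a nine-tone syllable such as 'sik7' in six-tone style ('sik1')."""
    if syllable and syllable[-1] in FOLD:
        return syllable[:-1] + FOLD[syllable[-1]]
    # Tones 1-6 (or a syllable with no tone digit) pass through unchanged.
    return syllable


print([fold_tone(s) for s in ["sik7", "baat8", "sik9", "si4"]])
# ['sik1', 'baat3', 'sik6', 'si4']
```

Note that the folding only goes one way: once tone 7 has been written as 1, the digit alone no longer tells you which convention the source used.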
And the final problem: there is not yet a clear and established standard on how to break ties -- once you decide which Han have the same pronunciation, how do you decide which one comes first?
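To show where the tie-breaking question bites, here is a small sketch of a pronunciation sort. It orders by syllable, then tone, and then falls back to Unicode code point for ties. That last choice is purely an assumption for the example (stroke count or radical order would be just as defensible), and the tiny pronunciation table is illustrative rather than real CLIAC data.

```python
import re

# Illustrative table; a real sort would need the full pronunciation list.
PRONUNCIATIONS = {
    "詩": "si1",
    "思": "si1",
    "時": "si4",
    "事": "si6",
    "廣": "gwong2",
}


def sort_key(ch):
    """Key of (syllable, tone, code point) for one ideograph."""
    match = re.fullmatch(r"([a-z]+)([1-9])", PRONUNCIATIONS[ch])
    if match is None:
        raise ValueError(f"unexpected pronunciation for {ch!r}")
    # ord(ch) is the assumed tie-breaker for identical pronunciations.
    return (match.group(1), int(match.group(2)), ord(ch))


def cantonese_sort(chars):
    return sorted(chars, key=sort_key)


print(cantonese_sort(["時", "詩", "廣", "事", "思"]))
# ['廣', '思', '詩', '時', '事']
```

Here 思 and 詩 share the pronunciation si1, so their relative order comes entirely from the tie-breaker, which is exactly the part with no agreed standard.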
There was just not enough of a consensus yet to try to push ahead in Windows with providing such a sort, because Microsoft has no interest in dictating language policy; we just want to identify it so that we can represent things the way customers would like them.
But this now brings us to input methods.
Like I said way back in December of 2004, IMEs have it easy. In this case that is because (if for no other reason) when you identify a rich new source of pronunciations, you can simply add them to the IME if you like them. Or you can provide different IMEs using the different systems, too (assuming you have enough data!).
Anyway, enough of the backstory, right? Let's get to the IME, like I said I would!
The steps are the same as they were with the Unicode IME. Just grab the file from here (871 kb) or you can grab the zipped version here (144 kb).
1) Copy the text file to \Program Files\Windows NT\TableTextService on your Vista machine (if the "Program Files" on your machine is another language, use that directory, do not create a new one!).
2) Open an elevated command prompt and navigate to that directory.
3) Run the following from that command prompt:
rundll32 TableTextService.dll RegisterProfile TableTextServiceCantonese.txt
4) Say OK to the dialog that comes up verifying you want to install it.
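For anyone who would rather script steps 1 through 3, here is a rough sketch. The directory, DLL name, entry point, and table file name come from the steps above; the helper names and the idea of wrapping the call in Python are my own assumptions, and the script would still need to run from an elevated process on the Vista machine.

```python
import ntpath  # Windows-style path joins, whatever platform reads this sketch
import shutil
import subprocess

TABLE_FILE = "TableTextServiceCantonese.txt"  # file name from step 3


def service_dir(program_files):
    """Directory from step 1, under the machine's own Program Files folder."""
    return ntpath.join(program_files, "Windows NT", "TableTextService")


def register_command(table_file=TABLE_FILE):
    """Argument list for the rundll32 call shown in step 3."""
    return ["rundll32", "TableTextService.dll", "RegisterProfile", table_file]


def install(source_path, program_files):
    """Copy the table file (step 1), then register it (steps 2 and 3)."""
    target = service_dir(program_files)
    shutil.copy(source_path, ntpath.join(target, TABLE_FILE))
    # cwd mirrors step 2: the command runs from the TableTextService directory.
    subprocess.run(register_command(), cwd=target)


print(register_command())
```

Calling something like install(r"C:\Downloads\TableTextServiceCantonese.txt", r"C:\Program Files") would do the copy and the registration; the confirmation dialog from step 4 still has to be answered by hand.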
You can now add the Chinese Hong Kong Cantonese IME to the Chinese (Hong Kong S.A.R.) locale by going through the following steps that are illustrated here.
Now, like the Unicode IME, this is a sample, and further, it is a work in progress. There are lots of settings I would like to tweak here, such as how (or whether) the list should be sorted.
(And if I find other huge caches of Cantonese pronunciations in other romanizations I might even see whether they could be productively combined.)
And like I said, in an upcoming post I will talk about many of the cultural issues I ran across while doing the research here -- they are fascinating!
This post brought to you by 䕫 (U+2f9b2, an Extension B ideograph in HKSCS with a Jyutping pronunciation of kwai4)
I don't know if there is a bug or not, but the method you described to register a new IME does not work on my Vista RC2.
And I am wondering if there is any more info on the variables inside TableTextService. A search on Google does not return anything useful.
Thanks in advance
Ksec, are you absolutely sure that you're using an elevated command prompt (that is, one with admin privileges) and not a regular one? If not, right-click on the command prompt and choose to run it with admin privileges. That solved it for me.
The (simplified) story with the tones is this:
Middle Chinese had four tones, conventionally named ping, shang, qu, ru. Ru tone was used only for syllables that ended in a stop: p, t, or k.
In each of the modern dialects, one or more of these tones split into two tones, conventionally called the yin and the yang varieties, mostly on the basis of whether the original syllable began with a voiced stop, r, or l (yang) or not (yin). Voiced stops have been lost in the Chinese languages (other than Shanghainese, which is radically different and may not be a tone language any more).
In Mandarin, ping tone split into modern tones 1 (yin) and 2 (yang), shang tone became modern tone 3, and qu tone became modern tone 4. Ru tone disappeared when Mandarin lost all final stops, and the syllables were redistributed among the other tones.
In Cantonese, the story is way more complicated. All four of the old tones split, and what's more, ru tone split *twice*. Consequently, ping tone became modern tones 1 (yin) and 4 (yang), shang tone became modern tones 2 (yin) and 5 (yang), qu tone became modern tones 3 (yin) and 6 (yang), and ru tone became modern tones 7 and 8 (yin) and 9 (yang). That's your nine tones, which represent a full structural analysis.
However, except for the final stop making the syllable shorter, tones 7, 8, and 9 are pronounced exactly like tones 1, 3, and 6 (some older speakers still pronounce 1 falling and 7 level, but for most they are both level now). This is why on the phonetic level the nine structural tones are reduced to just six.
In addition, there are two "changed tones" which have no counterparts in the other dialects, and which signal a variant meaning of the basic word (unlike other tones, which have no individual meanings any more than a vowel or a consonant has). So structurally there are really 11 tones, but the changed tones are phonetically just lengthened versions of 1 (or 7) and 2, leaving us once more with six.
I tried it but I got a DLL error: "Error loading TableTextService.dll The specific module could not be found." How do I get the TableTextService.dll?