We have come a long way in this series, haven't we? :-)
Look at all that we have covered:
- Part 0: An enumeration with all of the Virtual Key values defined in winuser.h;
- Part 1: Getting Scan codes, Virtual Keys, and a single character in one shift state;
- Part 2: Making sure to not unload the keyboard if the user already had it loaded;
- Part 3: Finding dead keys and ligatures;
- Part 4: Moving SC/VK code determination that is static per layout out of the inner loop;
- Part 5: Character detection for the easy shift states;
- Part 6: Getting numeric keypad assignments;
- Part 7: Getting the base and combining characters for all of the (previously detected) dead keys;
- Part 8: Detecting usage of the CAPS LOCK key (both SGCAPS and persistent shifting);
- Parts 9a and 9b: Character detection of the harder shift states.
What a long, strange trip it has been!
There is really only one item left on that list of things to do that I mentioned originally, and that is dealing with chained dead keys, another feature that MSKLC does not support.
It is a slightly more interesting feature to discuss given that it (unlike the harder shift states) does not really seem to exist in any keyboard layout that ships with Windows. So unless someone has been working with the DDK to build such a keyboard, there is no way to readily test what is being done.
I'll try not to let the somewhat theoretical nature of this post dissuade me too much. In order to avoid dissuading you, the reader, I will try to point out more immediately relevant items as they come up. :-)
Anyone who has both used MSKLC and also been following along with this series will notice that I have completely ignored the connection between the dead key assignments and the tables that exist, one per dead key. This connection (which is a definite part of the MSKLC UI) is completely ignored in the code that this series has put together.
This omission is intentional; no direct connection actually exists. The code here in this series has it right!
Think about the consequences of that notion -- that you explicitly tell a key that it is a dead key and it will then look up its own individual dead key table on the next keystroke. Such an architecture goes a long way toward explaining why you must have a valid character at each stage of a chained dead key -- because once you jump to a new "dead key table" there is no state information about the old "dead key table". And since the dead key tables only allow a single UTF-16 code unit for the base character and one more for the combined character, there is simply nowhere to store the knowledge of or the need for additional characters.
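To make that constraint a little more concrete, here is a minimal sketch in Python. It is not code from this series and not how Windows stores anything: the `DEAD_KEY_TABLES` dictionary, its contents, and the `compose`, `is_dead`, and `type_sequence` helpers are all made up for illustration, and "a character is dead if it has a table" is a simplifying assumption. The only state carried from one keystroke to the next is a single pending character, which is why every stage of a chain has to be a real character.

```python
# Hypothetical dead key tables, looked up by the dead key *character* alone.
# Each table maps one base character to one combined character
# (a single UTF-16 code unit on each side).
DEAD_KEY_TABLES = {
    # U+00A8 DIAERESIS
    "\u00a8": {"a": "\u00e4", "u": "\u00fc", "\u00b4": "\u0385"},
    # U+0385 GREEK DIALYTIKA TONOS, reachable here only as a chained result
    "\u0385": {"\u03b9": "\u0390"},
}


def compose(dead_char, base_char):
    """Return the combined character, or None if the table has no entry."""
    return DEAD_KEY_TABLES.get(dead_char, {}).get(base_char)


def is_dead(ch):
    """Simplifying assumption: a character is dead if it has its own table."""
    return ch in DEAD_KEY_TABLES


def type_sequence(chars):
    """Feed characters one at a time, remembering only one pending dead char."""
    pending = None
    output = []
    for ch in chars:
        if pending is None:
            if is_dead(ch):
                pending = ch
            else:
                output.append(ch)
            continue
        combined = compose(pending, ch)
        if combined is None:
            # No entry: emit both characters (a simplification of real behavior).
            output.extend([pending, ch])
            pending = None
        elif is_dead(combined):
            # Chained dead key: the old table is forgotten, only this char survives.
            pending = combined
        else:
            output.append(combined)
            pending = None
    return "".join(output)
```

In this toy model, typing U+00A8, then U+00B4, then U+03B9 passes through U+0385 on the way to U+0390. If there were no U+0385 to stand in for the intermediate stage, there would be nothing to hang the second table on.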
This may cause people to ask why MSKLC was designed the way it was -- it was rather intensely discussed at the time, and there was just no intuitive way we could find to show this "disconnected" model where dead key tables were not more directly tied to their dead keys.
Given how uncommon the scenario of putting the same dead key into multiple keys is, it would be hard to really notice the problem (for example, to date no one has actually ever reported the issue!).
Clever people who are following this discussion can probably come up with a bug or two in MSKLC if they put some thought to it. These would basically be known limitations, but I won't give any more hints about that in case there are people who wanted to try to spot a bug. :-)
Could this all be changed? Well, obviously any architecture that blocks a particular feature admits to a single last-resort workaround: re-architecting how the code works. But that would cause all kinds of other problems like breaking backcompat with any keyboard already created, not to mention taking code that is very stable and putting it in play again. And that does not even get into needing to create multiple versions of every keyboard layout so that you could install on the old and the rearchitected code. Code that does not even belong to our team, so we'd have to convince another team to do this work.
The price is just a bit too high, sorry. :-(
Anyway, getting back to the chained dead keys.
All you would need to do is change the code in a few places:
- In the DeadKey class, add the notion of saying whether the combined character is itself a dead key;
- In the ProcessDeadKey procedure, allow it to have knowledge of multiple dead keys that would need to be applied when it scrolls through every other character;
- In the ProcessDeadKey procedure, when the rc of the call to ToUnicodeEx is -1, check to make sure the dead key is not identical to either the one in process or any of the ones already processed; if it is neither of these, then use that new DeadKey class feature and call ProcessDeadKey again, recursively.
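Until the real code shows up, the three changes above can be sketched roughly like this. To be clear about what is assumed: this is Python rather than the series' own code, `FAKE_LAYOUT` and `fake_to_unicode` are hypothetical stand-ins for a loaded keyboard layout and the call to ToUnicodeEx (only the "rc of -1 means dead key" convention is taken from the list above), and the `DeadKey` class here is a toy version with just enough in it to show the recursion and the guard against loops.

```python
from dataclasses import dataclass, field


@dataclass
class DeadKey:
    dead_char: str
    # base char -> (combined char, True if the combined char is itself dead)
    entries: dict = field(default_factory=dict)


# Hypothetical layout data: dead char -> {base char: (combined char, is_dead)}.
# The entries that lead back to U+0385 and U+00A8 exist to exercise the guard.
FAKE_LAYOUT = {
    "\u00a8": {"a": ("\u00e4", False), "\u00b4": ("\u0385", True)},
    "\u0385": {
        "\u03b9": ("\u0390", False),
        "\u00b4": ("\u0385", True),  # leads back to the dead key in process
        "\u00a8": ("\u00a8", True),  # leads back to one already processed
    },
}


def fake_to_unicode(dead_char, base_char):
    """Stand-in for the real call: returns (rc, char), with rc == -1 for dead."""
    entry = FAKE_LAYOUT.get(dead_char, {}).get(base_char)
    if entry is None:
        return 0, ""
    combined, is_dead = entry
    return (-1 if is_dead else 1), combined


def process_dead_key(dead_char, base_chars, found=None):
    """Collect the table for dead_char, recursing into chained dead keys."""
    if found is None:
        found = {}
    dead_key = DeadKey(dead_char)
    # Register before recursing so the key in process counts as "seen".
    found[dead_char] = dead_key
    for base in base_chars:
        rc, combined = fake_to_unicode(dead_char, base)
        if rc == 1:
            dead_key.entries[base] = (combined, False)
        elif rc == -1:
            dead_key.entries[base] = (combined, True)
            # Recurse only for a dead key that is neither in process
            # nor already processed.
            if combined not in found:
                process_dead_key(combined, base_chars, found)
    return found
```

Calling `process_dead_key("\u00a8", ["a", "\u00b4", "\u03b9", "\u00a8"])` on this made-up data finds two dead keys and stops, even though the second one points back at both itself and the first.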
One more problem that I did not really take care of initially (I admit I was waiting to see if someone would ask about it -- no one is getting any jobs this time around!) is the poor use of the ArrayList class to store:
- The collection of DeadKey objects in the keyboard layout;
- The collection of Base characters in each DeadKey object.
I mean, the characteristics of both of these collections are:
- a somewhat unbounded (or at least unknown) size;
- the items within them that would act as keys must be unique;
- keys are the size of one UTF-16 code unit, basically a ushort;
- must be able to easily look up the members of the collection by the key;
- must be able to dump out all of the members of the collection.
Currently, the code in both cases scrolls through the entire collection to look for duplicates, since the ArrayList class that is so well suited for the first and fifth of these items is so piss poor at the second, third, and fourth. Certainly there are data structures that are better suited here, right? :-)
The Hashtable class is obviously a better choice, I think -- using (ushort)char for the key values.
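As a rough illustration of why a keyed collection fits those five characteristics, here is a small Python sketch that uses a `dict` as a stand-in for the Hashtable class. The `CodeUnitCollection` name and its methods are hypothetical, not part of the series' code; the point is only that uniqueness and lookup by code unit come for free, while dumping every member is still easy.

```python
class CodeUnitCollection:
    """Toy collection keyed by a single UTF-16 code unit (0x0000-0xFFFF)."""

    def __init__(self):
        self._items = {}

    def add(self, ch, value):
        """Add value under ch; return False if that key is already present."""
        key = ord(ch)
        if key > 0xFFFF:
            raise ValueError("key must fit in one UTF-16 code unit")
        if key in self._items:
            return False  # uniqueness check without scanning the collection
        self._items[key] = value
        return True

    def get(self, ch):
        """Look up a member by its key; returns None when it is missing."""
        return self._items.get(ord(ch))

    def dump(self):
        """Return every (code unit, value) pair, sorted by code unit."""
        return sorted(self._items.items())

    def __len__(self):
        return len(self._items)
```

The duplicate check in `add` is a single keyed lookup rather than a walk through the whole list, which is exactly the part the ArrayList version does badly.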
This last change is not required, obviously. But it would save our performance a bit. Not to mention it would keep us from needing to shudder as I have for the last few revs of our code. :-)
Of course without a keyboard layout to test this new code on, the primary goal will be to make sure the existing scenarios do not regress. I'll post up the new code soon in Part 10b of the series.
And perhaps how to create these sorts of keyboards another day....
This post brought to you by "A" (U+0041, LATIN CAPITAL LETTER A)
A Unicode character that is in the very small family of those whose VK value is the same as its code point, also used for the hexadecimal version of the number 10!