Saturday, November 27, 2004 1:08 AM
Michael S. Kaplan
Some keyboarding terms
This posting will try to clear up some of the problems in documentation and info regarding keyboards, since there is plenty left in those things to be confusing and there is no need to throw bewildering terms into the mix. Future posts will build on this one, so if you already know about keyboards you might be able to skip it (though I would not advise it!). It is not really a glossary since it is not alphabetized; the order is arbitrary based on either when I thought of a term to add or when I thought dramatic effect could be increased.
- LCID -- a locale identifier. Traditionally pronounced like "El-Sid". It is a value that has no real meaning at all to keyboards but some people who ought to know better seem to think it does. Those people are actually thinking of LANGID, another entry in the glossary. LCIDs are only 32 bits, a fact that sucks for reasons I'll talk about another day, but it's twice as much as keyboards use, anyway.
- LANGID -- a language identifier. Traditionally not pronounced like "Lan-Gid", instead "Lang-Eye-Dee" is preferred. This is the bottom WORD of the DWORD that makes up an LCID and essentially represents a number that signifies a unique language/region/script combination. This combination is what is needed for keyboards, which are intended to be typical input methods for such combinations. Obviously, looking ahead to the world of custom cultures and custom locales, this may not be the best design architecture. But it's a little too late to change now....
- Layout ID -- a layout identifier. This has no special pronunciation, probably because calling something a "Lid" would sound dumb. These numbers are used to help the USER SUBSYSTEM manage the situation when multiple keyboards use the same LANGID (something that happens for many keyboards that ship with Windows). Each keyboard layout using the same LANGID after the first must (1) have a Layout ID and (2) not duplicate one that is already assigned on the system. Failure on either of these points will cause a layout to not be properly selected.
- KLID -- a keyboard layout identifier. Traditionally pronounced "Kay-El-Eye-Dee" because some people in the USA get very uptight about certain homonyms (you can catch me slipping on this point from time to time). It's also sometimes called the input locale identifier since the name for HKL has been updated (see the HKL definition for info on why that is incorrect since the HKL is for something different). The KLID can be retrieved for the currently selected keyboard layout in a thread through the GetKeyboardLayoutName API (note the pwszKLID parameter), though that is not true of any other selected or installed keyboard layout. Every keyboard layout on the system has one of these. Each KLID is 32 bits (thus 8 hex digits), and they can all be found in the registry as the subkeys under HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layouts\. The bottom half of the KLID is a LANGID, and the top half is something device-specific. By convention, the first hex digit is usually as follows:
- 0 -- Most keyboard layouts
- A -- Keyboard layouts defined by MSKLC
- D -- Some non-CJK input methods that have been defined by the Text Services Framework (note: reported to me; I have never seen one of these!)
- E -- CJK input methods, also known as IMEs
- HKL -- a handle to a keyboard layout, traditionally pronounced "Āch-Kay-El". The terminology folks have pretty aggressively tried to call this an "input locale identifier" despite the obvious problem that it has nothing to do with locales and that it is not the same value as the actual identifier (the KLID). The HKL in actuality is the handle to an input method. Although defined as a handle, only the lower 32 bits are currently used. Of those 32 bits, the bottom 16 bits represent a LANGID, and the top 16 bits represent a value defined by the USER SUBSYSTEM which helps to uniquely identify an installed keyboard layout. This is crucial since any keyboard layout can be installed more than once (by installing it under different languages, which helps user operations like spell checking).
- MKLC -- see MSKLC. The only people who call it MKLC are the ones who object to MSKLC since it's not a true acronym (Microsoft being a single word). None of the user interface or documentation calls it this, so it's unbelievable that it is getting such a large entry in this glossary, but people outside of Microsoft use it all the time.
- MSKLC -- the Microsoft Keyboard Layout Creator, traditionally pronounced "Em-Es-Kay-El-See" by people not taken in by the MKLC arguments. It is a tool released by Microsoft which allows someone to build a custom keyboard layout and build a setup package to install it on Windows NT 4.0, 2000, XP, or Server 2003. The help file contains a ton of information about the best way to design keyboard layouts that work well on Windows. I am the developer on it and love to hear feedback any time people have it, since there is always room for improvement.
- IME -- Input method editor, traditionally pronounced "Eye-Em-Eee". An IME is an engine that converts keystrokes into phonetic or ideographic characters. It is a commonly used abstraction that allows a keyboard with only 100 or so keystrokes to be able to support character sets that contain up to 20,000 or more ideographs. There are old samples in the Platform SDK using the Input Method Manager (IMM) APIs, but today most IMEs written by Microsoft use the much more approachable Text Services Framework.
- Supported keyboard layout -- this is an odd bit of terminology that actually means a keyboard layout is defined on the system. It may not be currently selectable by a user (e.g., if it's a Thai keyboard layout and Thai/complex script support is not enabled). It can also be an IME or a speech-to-text converter, so DEFINED INPUT METHOD might have been a better term. This terminology is slowly being removed from documentation and it's not entirely clear what is replacing it.
- Installed keyboard layout -- another odd bit of terminology, it means a keyboard layout that a user has selected. It can also be an IME or a speech-to-text converter, so SELECTED INPUT METHOD might have been a better term. This terminology is slowly being removed from documentation and it's not entirely clear what is replacing it.
- Scan code -- The numeric value given to each physical key on a keyboard; the scan code is a hardware-dependent number that identifies the key. Scan codes have a fixed position on the physical keyboard, regardless of the keyboard layout chosen by the user.
- Virtual Key -- Also called the VK, the code that is given by the Windows USER subsystem to represent a keystroke. It is mapped from a scan code by using the keyboard layout definition and is thus entirely dependent on the user's chosen layout. The reason for this is the [unfortunate, IMHO] choice to have, e.g., VK_A be used for the 'A' key, which meant that on keyboards that put 'A' in a different position the VK would have to be moved.
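
The bit layouts described in the LCID, LANGID, KLID, and HKL entries above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Windows itself does it: the function names and the prefix table are made up for this example (the table simply restates the convention listed under KLID), and the sample values in the usage note are illustrative.

```python
from typing import Tuple
import string

# Hypothetical lookup table restating the first-hex-digit convention
# listed under the KLID entry. It is a convention, not a guarantee.
KLID_PREFIX_CONVENTION = {
    "0": "most keyboard layouts",
    "A": "keyboard layout defined by MSKLC",
    "D": "non-CJK input method defined by the Text Services Framework",
    "E": "CJK input method (IME)",
}


def langid_from_lcid(lcid: int) -> int:
    """Return the bottom WORD (low 16 bits) of a 32-bit LCID: the LANGID."""
    return lcid & 0xFFFF


def split_klid(klid: str) -> Tuple[int, int, str]:
    """Split a KLID written as 8 hex digits.

    Returns (top half, LANGID, first hex digit in upper case).
    """
    if len(klid) != 8 or any(c not in string.hexdigits for c in klid):
        raise ValueError("a KLID is written as exactly 8 hex digits")
    value = int(klid, 16)
    return (value >> 16) & 0xFFFF, value & 0xFFFF, klid[0].upper()


def split_hkl(hkl: int) -> Tuple[int, int]:
    """Split an HKL, looking only at its lower 32 bits.

    Returns (top 16 bits assigned by the USER subsystem, LANGID).
    """
    value = hkl & 0xFFFFFFFF  # anything above the lower 32 bits is ignored
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def describe_klid(klid: str) -> str:
    """Describe a KLID using the first-digit convention, if it applies."""
    _, langid, prefix = split_klid(klid)
    kind = KLID_PREFIX_CONVENTION.get(prefix, "no convention listed")
    return f"LANGID 0x{langid:04X}, {kind}"
```

For instance, `langid_from_lcid(0x00000409)` gives `0x0409`, and `split_klid("a0000409")` gives `(0xA000, 0x0409, "A")`, which `describe_klid` would report as a layout defined by MSKLC. Note that the top half of an HKL is not the same thing as the top half of a KLID, so a value pulled apart by `split_hkl` should not be reassembled and treated as a KLID.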
That's all I can think of at the moment, but I am sure I will be updating this topic any time I think of something else.