A recent post describes how RichEdit chooses default fonts for Unicode characters. The method assigns a character repertoire (CharRep) to each character and queries fonts to find out which CharReps they support. If the current font doesn’t support a character’s CharRep, RichEdit chooses a font that does. Various heuristics handle the tricky cases.
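The binding step just described can be sketched in a few lines of C. This is a minimal illustration under simplifying assumptions, not RichEdit’s code. Each font is reduced to a hypothetical bitmask of the CharReps it supports, the `CharRep` values are made-up labels, and the character’s CharRep is taken as already assigned. None of RichEdit’s heuristics are modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CharRep labels for this sketch; RichEdit's real set is larger. */
typedef enum { CR_LATIN, CR_CYRILLIC, CR_GREEK, CR_ARABIC, CR_HEBREW, CR_THAI } CharRep;

/* A font described only by a name and a bitmask of supported CharReps:
   bit n set means the font supports CharRep n. */
typedef struct {
    const char *name;
    uint32_t charreps;
} Font;

static int font_supports(const Font *f, CharRep cr) {
    return (int)((f->charreps >> cr) & 1u);
}

/* Keep the current font if it supports cr; otherwise return the first
   candidate font that does, or NULL if no candidate supports cr. */
const Font *bind_font(const Font *current, const Font *candidates,
                      size_t n, CharRep cr) {
    if (font_supports(current, cr))
        return current;
    for (size_t i = 0; i < n; i++)
        if (font_supports(&candidates[i], cr))
            return &candidates[i];
    return NULL;
}
```

For example, if the current font covers only Latin, `bind_font` keeps it for Latin text and switches to the first candidate whose mask includes Arabic for Arabic text.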
A simple extension of this method reveals whether a font supports a given language. Many languages share a CharRep (e.g., most Western European languages use the Latin CharRep), but very few languages use more than one. So a font supports a language if it supports that language’s CharRep. RichEdit has a table mapping languages to CharReps. The table is useful for sophisticated clients like OneNote and PowerPoint, which typically know or guess the language(s) of text runs but don’t know the CharReps.
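Here is a small sketch of that reasoning: look up the language’s CharRep, then test whether the font supports it. The lookup table holds only a few PLID rows copied from the table in this post. The enum labels and the font bitmask are hypothetical stand-ins, not RichEdit’s internal representation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CharRep labels; CR_NONE marks "PLID not in this excerpt". */
typedef enum { CR_LATIN, CR_EASTEUROPE, CR_CYRILLIC, CR_GREEK, CR_TURKISH,
               CR_HEBREW, CR_ARABIC, CR_BALTIC, CR_THAI, CR_NONE } CharRep;

/* A few PLID -> CharRep rows taken from the table in this post. */
static const struct { uint16_t plid; CharRep cr; } kLangTable[] = {
    {0x01, CR_ARABIC},     {0x02, CR_CYRILLIC}, {0x03, CR_LATIN},
    {0x05, CR_EASTEUROPE}, {0x08, CR_GREEK},    {0x0D, CR_HEBREW},
    {0x1E, CR_THAI},       {0x1F, CR_TURKISH},  {0x25, CR_BALTIC},
};

CharRep charrep_from_plid(uint16_t plid) {
    for (size_t i = 0; i < sizeof kLangTable / sizeof kLangTable[0]; i++)
        if (kLangTable[i].plid == plid)
            return kLangTable[i].cr;
    return CR_NONE;
}

/* font_charreps is a hypothetical bitmask: bit n set if the font supports
   CharRep n. A font supports a language if it supports the language's CharRep. */
int font_supports_language(uint32_t font_charreps, uint16_t plid) {
    CharRep cr = charrep_from_plid(plid);
    return cr != CR_NONE && ((font_charreps >> cr) & 1u);
}
```

A font whose mask covers Latin and Cyrillic therefore reports support for Catalan (03) and Bulgarian (02), but not for Arabic (01).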
A simple language descriptor suffices for font binding: the Win32 locale ID (LCID). In fact, most cases need only the primary language ID (PLID), given by the low-order 10 bits of the LCID, to get the appropriate CharRep. A few cases also need the secondary language ID (SLID), given by the next six bits, to choose among possible CharReps, e.g., Simplified vs. Traditional Chinese, or Cyrillic vs. Turkish for Azeri.
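The bit layout can be made concrete with portable mirrors of the Win32 `LANGIDFROMLCID`, `PRIMARYLANGID` and `SUBLANGID` macros. The function names below are illustrative. The real macros live in the Windows SDK headers and behave the same way: the language ID is the low 16 bits of the LCID, the PLID is its low 10 bits, and the SLID is its top 6 bits.

```c
#include <stdint.h>

/* Low 16 bits of an LCID form the language ID (bits 16-19 hold the sort ID). */
uint16_t langid_from_lcid(uint32_t lcid) { return (uint16_t)(lcid & 0xFFFF); }

/* Primary language ID (PLID): low-order 10 bits of the language ID. */
uint16_t plid_from_lcid(uint32_t lcid) { return (uint16_t)(langid_from_lcid(lcid) & 0x3FF); }

/* Secondary (sub)language ID (SLID): the next six bits. */
uint16_t slid_from_lcid(uint32_t lcid) { return (uint16_t)(langid_from_lcid(lcid) >> 10); }
```

For instance, 0x0804 (Chinese, PRC) and 0x0404 (Chinese, Taiwan) share PLID 0x04 and differ only in SLID (0x02 vs. 0x01). Likewise, 0x042C and 0x082C are Azeri with SLIDs 0x01 (Latin) and 0x02 (Cyrillic).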
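| CharRep | PLID | Language | Comment |
|---|---|---|---|
| Default | 00 | Undefined | |
| Arabic | 01 | Arabic | |
| Cyrillic | 02 | Bulgarian | |
| Latin | 03 | Catalan | |
| GB2312 | 04 | Chinese | SLID gives traditional vs. simplified |
| EastEurope | 05 | Czech | |
| | 06 | Danish | |
| | 07 | German | |
| Greek | 08 | Greek | |
| | 09 | English | |
| | 0A | Spanish | |
| | 0B | Finnish | |
| | 0C | French | |
| Hebrew | 0D | Hebrew | |
| | 0E | Hungarian | |
| | 0F | Icelandic | |
| | 10 | Italian | |
| ShiftJis | 11 | Japanese | |
| Hangul | 12 | Korean | |
| | 13 | Dutch | |
| | 14 | Norwegian | |
| | 15 | Polish | |
| | 16 | Portuguese | |
| | 17 | Rhaeto-Romanic | |
| | 18 | Romanian | |
| | 19 | Russian | |
| | 1A | Croatian | |
| | 1B | Slovak | |
| | 1C | Albanian | |
| | 1D | Swedish | |
| Thai | 1E | Thai | |
| Turkish | 1F | Turkish | |
| | 20 | Urdu | |
| | 21 | Indonesian | |
| | 22 | Ukrainian | |
| | 23 | Belarusian | |
| | 24 | Slovenian | |
| Baltic | 25 | Estonian | |
| | 26 | Latvian | |
| | 27 | Lithuanian | |
| | 28 | Tajik | Tajikistan |
| | 29 | Farsi | |
| Viet | 2A | Vietnamese | |
| Armenian | 2B | Armenian | |
| | 2C | Azeri | SLID gives Latin/Cyrillic |
| | 2D | Basque | |
| | 2E | Sorbian | |
| | 2F | Macedonian | FYROM |
| | 30 | Sutu | |
| | 31 | Tsonga | |
| | 32 | Tswana | |
| | 33 | Venda | |
| | 34 | Xhosa | |
| | 35 | Zulu | |
| | 36 | Afrikaans | |
| Georgian | 37 | Georgian | |
| | 38 | Faroese | |
| Devanagari | 39 | Hindi | Indic |
| | 3A | Maltese | |
| | 3B | Sami | |
| | 3C | Gaelic | |
| | 3D | Yiddish | |
| | 3E | Malay | |
| | 3F | Kazakh | |
| | 40 | Kyrgyz | |
| | 41 | Swahili | |
| | 42 | Turkmen | |
| | 43 | Uzbek | |
| | 44 | Tatar | |
| Bengali | 45 | Bengali | |
| Gurmukhi | 46 | Punjabi | |
| Gujarati | 47 | Gujarati | |
| Oriya | 48 | Oriya | |
| Tamil | 49 | Tamil | |
| Telugu | 4A | Telugu | |
| Kannada | 4B | Kannada | |
| Malayalam | 4C | Malayalam | |
| | 4D | Assamese | |
| | 4E | Marathi | |
| | 4F | Sanskrit | |
| Mongolian | 50 | Mongolian | Mongolia |
| Tibetan | 51 | Tibetan | Tibet |
| | 52 | Welsh | Wales |
| Khmer | 53 | Khmer | Cambodia |
| Lao | 54 | Lao | |
| Myanmar | 55 | Burmese | |
| | 56 | Gallego | Spain |
| | 57 | Konkani | |
| | 58 | Manipuri | |
| | 59 | Sindhi | |
| Syriac | 5A | Syriac | Syria |
| Sinhala | 5B | Sinhalese | Sri Lanka |
| Cherokee | 5C | Cherokee | |
| Aboriginal | 5D | Inuktitut | |
| Ethiopic | 5E | Amharic | |
| | 5F | Tamazight | Berber/Arabic, also Latin |
| | 60 | Kashmiri | |
| | 61 | Nepali | Nepal |
| | 62 | Frisian | Netherlands |
| | 63 | Pashto | Afghanistan |
| | 64 | Filipino | |
| Thaana | 65 | Maldivian | Divehi |
| | 66 | Edo | Nigeria |
| | 67 | Fulfulde | |
| | 68 | Hausa | |
| | 69 | Ibibio | |
| | 6A | Yoruba | |
| | 6B | Quechua | Bolivia, Ecuador, Peru |
| | 6C | Sesotho sa Leboa | |
| | 6D | Bashkir | |
| | 6E | Luxembourgish | Luxembourg |
| | 6F | Greenlandic | Greenland |
| | 70 | Igbo | |
| | 71 | Kanuri | |
| | 72 | Oromo | Ethiopia |
| | 73 | Tigrigna | Ethiopia/Eritrea |
| | 74 | Guarani | Paraguay |
| | 75 | Hawaiian | United States |
| | 76 | | |
| | 77 | Somali | Somalia |
| Yi | 78 | Yi | China |
| | 79 | Papiamentu | |
| | 7A | Mapudungun | Chile |
| | 7B | - | |
| | 7C | Mohawk | |
| | 7D | | |
| | 7E | Breton | France |
| | 7F | | |
| | 80 | Uighur | China, Arabic script |
| | 81 | Maori | |
| | 82 | Occitan | |
| | 83 | Corsican | |
| | 84 | Alsatian | |
| | 85 | Yakut | Russia |
| | 86 | K'iche | Guatemala |
| | 87 | Kinyarwanda | Rwanda |
| | 88 | Wolof | Senegal |
| | 89 | | |
| | 8A | | |
| | 8B | | |
| | 8C | Dari | |
| | 8D | Malagasy | Madagascar |
| | 8E | | |
| TaiLe | 8F | Tai Le | |
| NewTaiLu | 90 | New Tai Lu | |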
In this table, the EastEurope, Baltic and Turkish CharReps use the Latin script but are distinct because they correspond to distinct charsets. The PLIDs are in hexadecimal.
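The two rows whose CharRep depends on the SLID (Chinese, PLID 04, and Azeri, PLID 2C) can be handled with a small special case before falling back to a PLID-only lookup. This is a sketch, not RichEdit’s implementation. The enum labels are hypothetical, and `CR_BIG5` is an assumed name for the Traditional Chinese CharRep, which the table doesn’t name. The SLID values are the Win32 `SUBLANG_CHINESE_*` and `SUBLANG_AZERI_*` constants. Mapping Azeri (Latin) to the Turkish CharRep follows the text’s “Cyrillic vs. Turkish” remark.

```c
#include <stdint.h>

/* Hypothetical labels; CR_FROM_TABLE means "the PLID alone decides". */
typedef enum { CR_FROM_TABLE, CR_GB2312, CR_BIG5, CR_TURKISH, CR_CYRILLIC } CharRep;

#define PLID_CHINESE 0x04  /* LANG_CHINESE */
#define PLID_AZERI   0x2C  /* LANG_AZERI */

/* Resolve the PLIDs whose CharRep depends on the SLID. For any other PLID,
   return CR_FROM_TABLE so the caller uses the PLID-only table. */
CharRep charrep_needing_slid(uint16_t plid, uint16_t slid) {
    switch (plid) {
    case PLID_CHINESE:
        switch (slid) {
        case 0x01: /* SUBLANG_CHINESE_TRADITIONAL (Taiwan) */
        case 0x03: /* SUBLANG_CHINESE_HONGKONG */
        case 0x05: /* SUBLANG_CHINESE_MACAU */
            return CR_BIG5;
        default:   /* 0x02 PRC, 0x04 Singapore; others assumed simplified */
            return CR_GB2312;
        }
    case PLID_AZERI:
        /* SUBLANG_AZERI_CYRILLIC is 0x02; SUBLANG_AZERI_LATIN (0x01) is
           treated as Turkish, as the text describes. */
        return slid == 0x02 ? CR_CYRILLIC : CR_TURKISH;
    default:
        return CR_FROM_TABLE;
    }
}
```

Keeping the SLID special cases separate from the PLID table mirrors the observation that only a few PLIDs need the SLID at all.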
To tell RichEdit 5.0 or later to format a text run with the CharRep for a given LCID, send the message EM_SETCHARFORMAT with wParam containing the flag SCF_CHARREPFROMLCID (0x0100) and lParam pointing to a CHARFORMAT2 structure whose lcid member holds the LCID.
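A sketch of that call follows, assuming a RichEdit window handle and the Windows SDK headers. Several details are assumptions of this sketch, not statements from the text: combining the flag with `SCF_SELECTION` so that the current selection is formatted, setting `CFM_LCID` in `dwMask`, and defining `SCF_CHARREPFROMLCID` from the value above in case an older SDK header lacks it. The portable `make_lcid` helper is an illustrative mirror of `MAKELCID(MAKELANGID(plid, slid), SORT_DEFAULT)`. It lets the LCID arithmetic compile and run off Windows, where the Win32 part is skipped.

```c
#include <stdint.h>

/* Mirror of MAKELCID(MAKELANGID(plid, slid), SORT_DEFAULT): the SLID goes in
   bits 10-15, the PLID in bits 0-9, and the sort ID (0) in bits 16-19. */
uint32_t make_lcid(uint16_t plid, uint16_t slid) {
    uint16_t langid = (uint16_t)((slid << 10) | plid);
    return (uint32_t)langid;
}

#ifdef _WIN32
#include <windows.h>
#include <richedit.h>

#ifndef SCF_CHARREPFROMLCID
#define SCF_CHARREPFROMLCID 0x0100  /* value given in the text */
#endif

/* Ask a RichEdit 5.0+ control to format the selection with the CharRep
   implied by lcid. Returns nonzero on success, as EM_SETCHARFORMAT does. */
LRESULT set_charrep_from_lcid(HWND hwndRichEdit, LCID lcid) {
    CHARFORMAT2W cf;
    ZeroMemory(&cf, sizeof(cf));
    cf.cbSize = sizeof(cf);
    cf.dwMask = CFM_LCID;   /* only the lcid member is meaningful here */
    cf.lcid = lcid;
    return SendMessageW(hwndRichEdit, EM_SETCHARFORMAT,
                        (WPARAM)(SCF_SELECTION | SCF_CHARREPFROMLCID),
                        (LPARAM)&cf);
}
#endif
```

For example, `set_charrep_from_lcid(hwnd, make_lcid(0x04, 0x01))` would request the CharRep for Traditional Chinese (LCID 0x0404) on the current selection.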