A recent post describes how RichEdit chooses default fonts for Unicode characters. The method assigns a character repertoire (CharRep) to each character and queries fonts to find out which CharReps they support. If the current font doesn’t support the CharRep for a character, RichEdit chooses a font that does. A variety of heuristics are used to fix up tricky cases.

A simple extension reveals whether a font supports a given language. Although many languages may use the same CharRep (for example, most Western European languages use the Latin CharRep), very few languages use more than one CharRep. RichEdit has a table mapping languages to CharReps. The table is useful for sophisticated clients like OneNote and PowerPoint, which typically know or guess the language(s) for text runs but don’t know the CharReps.

A simple language descriptor suffices for font binding purposes, namely the Win32 locale ID (LCID). In fact, to get the appropriate CharRep, most cases need only the primary language ID (PLID) given by the low-order 10 bits of the LCID. A few cases need the secondary language ID (SLID, the next six bits) to choose among possible CharReps, e.g., Simplified vs. Traditional Chinese or Cyrillic vs. Turkish.
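The bit layout just described can be sketched in a few lines of C++. The function names below are made up for illustration and are not RichEdit or Win32 identifiers; the Win32 headers provide similar macros (PRIMARYLANGID and SUBLANGID) that operate on a 16-bit language ID.

```cpp
#include <cstdint>

// Split a Win32 LCID into the two fields described above.
// Only the low 16 bits of the LCID are examined; the higher bits
// carry sort information and are ignored here.

// Primary language ID (PLID): the low-order 10 bits.
constexpr std::uint32_t PrimaryLangId(std::uint32_t lcid)
{
    return lcid & 0x3FF;
}

// Secondary language ID (SLID): the next six bits.
constexpr std::uint32_t SecondaryLangId(std::uint32_t lcid)
{
    return (lcid >> 10) & 0x3F;
}
```

For example, the LCID 0x0409 (English, United States) splits into PLID 0x09 and SLID 0x01, and 0x0804 (Chinese, PRC) splits into PLID 0x04 and SLID 0x02.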

 

| CharRep | PLID | Language | Comment |
|---|---|---|---|
| Default | 00 | Undefined | |
| Arabic | 01 | Arabic | |
| Cyrillic | 02 | Bulgarian | |
| Latin | 03 | Catalan | |
| GB2312 | 04 | Chinese | SLID gives traditional vs. simplified |
| EastEurope | 05 | Czech | |
| Latin | 06 | Danish | |
| Latin | 07 | German | |
| Greek | 08 | Greek | |
| Latin | 09 | English | |
| Latin | 0A | Spanish | |
| Latin | 0B | Finnish | |
| Latin | 0C | French | |
| Hebrew | 0D | Hebrew | |
| EastEurope | 0E | Hungarian | |
| Latin | 0F | Icelandic | |
| Latin | 10 | Italian | |
| ShiftJis | 11 | Japanese | |
| Hangul | 12 | Korean | |
| Latin | 13 | Dutch | |
| Latin | 14 | Norwegian | |
| EastEurope | 15 | Polish | |
| Latin | 16 | Portuguese | |
| Default | 17 | Rhaeto-Romanic | |
| EastEurope | 18 | Romanian | |
| Cyrillic | 19 | Russian | |
| EastEurope | 1A | Croatian | |
| EastEurope | 1B | Slovak | |
| EastEurope | 1C | Albanian | |
| Latin | 1D | Swedish | |
| Thai | 1E | Thai | |
| Turkish | 1F | Turkish | |
| Arabic | 20 | Urdu | |
| Latin | 21 | Indonesian | |
| Cyrillic | 22 | Ukrainian | |
| Cyrillic | 23 | Byelorussian | |
| EastEurope | 24 | Slovenian | |
| Baltic | 25 | Estonian | |
| Baltic | 26 | Latvian | |
| Baltic | 27 | Lithuanian | |
| Default | 28 | Tajik | Tajikistan |
| Arabic | 29 | Farsi | |
| Viet | 2A | Vietnamese | |
| Armenian | 2B | Armenian | |
| Turkish | 2C | Azeri | SLID gives Latin/Cyrillic |
| Latin | 2D | Basque | |
| Default | 2E | Sorbian | |
| Cyrillic | 2F | FYROM | Macedonian |
| Latin | 30 | Sutu | |
| Latin | 31 | Tsonga | |
| Latin | 32 | Tswana | |
| Latin | 33 | Venda | |
| Latin | 34 | Xhosa | |
| Latin | 35 | Zulu | |
| Latin | 36 | Afrikaans | |
| Georgian | 37 | Georgian | |
| Latin | 38 | Faroese | |
| Devanagari | 39 | Hindi | Indic |
| Latin | 3A | Maltese | |
| Latin | 3B | Sami | |
| Latin | 3C | Gaelic | |
| Hebrew | 3D | Yiddish | |
| Latin | 3E | Malaysian | |
| Cyrillic | 3F | Kazakh | |
| Cyrillic | 40 | Kyrgyz | Cyrillic |
| Latin | 41 | Swahili | |
| Latin | 42 | Turkmen | |
| Turkish | 43 | Uzbek | SLID gives Latin/Cyrillic |
| Latin | 44 | Tatar | |
| Bengali | 45 | Bengali | Indic |
| Gurmukhi | 46 | Punjabi Gurmukhi | Indic |
| Gujarati | 47 | Gujarati | Indic |
| Oriya | 48 | Oriya | Indic |
| Tamil | 49 | Tamil | Indic |
| Telugu | 4A | Telugu | Indic |
| Kannada | 4B | Kannada | Indic |
| Malayalam | 4C | Malayalam | Indic |
| Bengali | 4D | Assamese | Indic |
| Devanagari | 4E | Marathi | Indic |
| Devanagari | 4F | Sanskrit | Indic |
| Mongolian | 50 | Mongolian | Mongolia |
| Tibetan | 51 | Tibetan | Tibet |
| Latin | 52 | Welsh | Wales |
| Khmer | 53 | Khmer | Cambodia |
| Lao | 54 | Lao | Lao |
| Myanmar | 55 | Burmese | Myanmar |
| Latin | 56 | Gallego | Portugal |
| Devanagari | 57 | Konkani | Indic |
| Bengali | 58 | Manipuri | Indic |
| Gurmukhi | 59 | Sindhi | Indic |
| Syriac | 5A | Syriac | Syria |
| Sinhala | 5B | Sinhalese | Sri Lanka |
| Cherokee | 5C | Cherokee | |
| Aboriginal | 5D | Inuktitut | |
| Ethiopic | 5E | Amharic | Ethiopic |
| Default | 5F | Tamazight | Berber/Arabic, also Latin |
| Default | 60 | Kashmiri | |
| Devanagari | 61 | Nepali | Nepal |
| Latin | 62 | Frisian | Netherlands |
| Arabic | 63 | Pashto | Afghanistan |
| Latin | 64 | Filipino | |
| Thaana | 65 | Maldivian | Divehi |
| Latin | 66 | Edo | Nigeria |
| Latin | 67 | Fulfulde | Nigeria |
| Latin | 68 | Hausa | Nigeria |
| Latin | 69 | Ibibio | Nigeria |
| Latin | 6A | Yoruba | Nigeria |
| Latin | 6B | Quechua | Bolivia, Ecuador, Peru |
| Latin | 6C | Sesotho sa Leboa | |
| Cyrillic | 6D | Bashkir | Cyrillic |
| Latin | 6E | Luxembourgish | Luxembourg |
| Latin | 6F | Greenlandic | Greenland |
| Latin | 70 | Igbo | Nigeria |
| Latin | 71 | Kanuri | Nigeria |
| Ethiopic | 72 | Oromo | Ethiopia |
| Ethiopic | 73 | Tigrigna | Ethiopia/Eritrea |
| Latin | 74 | Guarani | Paraguay |
| Latin | 75 | Hawaiian | United States |
| Latin | 76 | Latin | |
| Latin | 77 | Somali | Somalia |
| Yi | 78 | Yi | China |
| Latin | 79 | Papiamentu | |
| Latin | 7A | Mapudungun | Chile |
| Latin | 7B | - | |
| Latin | 7C | Mohawk | |
| Latin | 7D | - | |
| Latin | 7E | Breton | France |
| Latin | 7F | | |
| Arabic | 80 | Uighur | China Arabic |
| Latin | 81 | Maori | |
| Latin | 82 | Occitan | France |
| Latin | 83 | Corsican | France |
| Latin | 84 | Alsatian | France |
| Cyrillic | 85 | Yakut | Russia |
| Latin | 86 | K'iche | Guatemala |
| Latin | 87 | Kinyarwanda | Rwanda |
| Latin | 88 | Wolof | Senegal |
| Latin | 89 | - | |
| Latin | 8A | - | |
| Latin | 8B | - | |
| Arabic | 8C | Dari | Afghanistan |
| Latin | 8D | Malagasy | Madagascar |
| Latin | 8E | - | |
| TaiLe | 8F | Tai Le | China |
| NewTaiLu | 90 | New Tai Lu | China |

 

In this table, the EastEurope, Baltic and Turkish CharReps use the Latin script, but are distinct since they correspond to distinct charsets. The PLIDs are in hexadecimal.
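As a rough illustration of how such a table might be consulted, the following C++ sketch maps an LCID to a CharRep for a handful of rows and uses the SLID only where the table says it matters. The enum, the function name, and the Big5 value for Traditional Chinese are assumptions made for this example (the table names only GB2312 for Chinese), and RichEdit's actual implementation is not shown in this post. The sublanguage values used are the standard Win32 ones: for Chinese, 0x02 (PRC) and 0x04 (Singapore) are Simplified; for Azeri and Uzbek, 0x02 is Cyrillic.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not RichEdit's own identifiers.
enum class CharRep { Default, Latin, Cyrillic, Turkish, EastEurope, GB2312, Big5 };

// Look up the CharRep for an LCID, covering only a few rows of the table.
CharRep CharRepFromLcid(std::uint32_t lcid)
{
    const std::uint32_t plid = lcid & 0x3FF;         // low-order 10 bits
    const std::uint32_t slid = (lcid >> 10) & 0x3F;  // next six bits

    switch (plid) {
    case 0x04:  // Chinese: SLID gives traditional vs. simplified
        return (slid == 0x02 || slid == 0x04) ? CharRep::GB2312
                                              : CharRep::Big5;  // Big5 is an assumption
    case 0x2C:  // Azeri: SLID gives Latin/Cyrillic
    case 0x43:  // Uzbek: SLID gives Latin/Cyrillic
        return slid == 0x02 ? CharRep::Cyrillic : CharRep::Turkish;
    case 0x05: case 0x0E: case 0x15:  // Czech, Hungarian, Polish
        return CharRep::EastEurope;
    case 0x02: case 0x19: case 0x22:  // Bulgarian, Russian, Ukrainian
        return CharRep::Cyrillic;
    case 0x1F:                        // Turkish
        return CharRep::Turkish;
    case 0x07: case 0x09: case 0x0C:  // German, English, French
        return CharRep::Latin;
    default:                          // rows omitted from this sketch
        return CharRep::Default;
    }
}
```

A complete version would more naturally index an array by PLID and keep the switch only for the few PLIDs that need the SLID.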

To tell RichEdit 5.0 or later to format a text run with the CharRep for a given LCID, send the message EM_SETCHARFORMAT with wParam containing the flag SCF_CHARREPFROMLCID (0x0100) and lParam pointing to a CHARFORMAT2 that contains the LCID.
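A minimal sketch of that call might look like the following. Several details are assumptions rather than anything stated above: the helper names are invented, the flag is combined with SCF_SELECTION so that the format applies to the current selection, and dwMask is set to CFM_LCID to mark the lcid member as valid. The Win32 part is guarded so that the flag arithmetic still compiles on other platforms, and the fallback definitions cover SDK headers that lack the newer flag.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#include <richedit.h>
#endif

// Fallback definitions; 0x0100 is the value quoted in the text.
#ifndef SCF_SELECTION
#define SCF_SELECTION 0x0001
#endif
#ifndef SCF_CHARREPFROMLCID
#define SCF_CHARREPFROMLCID 0x0100
#endif

// wParam for "apply to the selection, taking the CharRep from the LCID".
// Combining with SCF_SELECTION is an assumption of this sketch.
constexpr std::uint32_t CharRepFromLcidFlags()
{
    return SCF_SELECTION | SCF_CHARREPFROMLCID;
}

#ifdef _WIN32
// Hypothetical helper: formats the current selection of a RichEdit
// window with the CharRep that corresponds to the given LCID.
bool SetSelectionCharRepFromLcid(HWND hwndRichEdit, LCID lcid)
{
    CHARFORMAT2W cf = {};
    cf.cbSize = sizeof(cf);
    cf.dwMask = CFM_LCID;  // assumption: mark the lcid member as valid
    cf.lcid = lcid;
    return SendMessageW(hwndRichEdit, EM_SETCHARFORMAT,
                        CharRepFromLcidFlags(),
                        reinterpret_cast<LPARAM>(&cf)) != 0;
}
#endif
```

On Windows, calling SetSelectionCharRepFromLcid(hwnd, 0x0411) would ask RichEdit to format the selected run with the CharRep for Japanese, which the table lists as ShiftJis.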