Can I get my characters into Unicode?

Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!



The other day, Ivan Petrov pointed out:

...maybe the BIGGEST one, is about the absence of many of the Cyrillic vowel letters with graves in Unicode, respectively in ANSI 1251 Codepage. There are defined only 2+2=4 (CAPITAL and SMALL letters with graves – #CYRILLIC CAPITAL LETTER IE WITH GRAVE, #CYRILLIC CAPITAL LETTER I WITH GRAVE, #CYRILLIC SMALL LETTER IE WITH GRAVE and #CYRILLIC SMALL LETTER I WITH GRAVE) in Unicode.
The whole list of the Cyrillic vowel letters must be:

#CYRILLIC CAPITAL LETTER A WITH GRAVE
#CYRILLIC CAPITAL LETTER IE WITH GRAVE
#CYRILLIC CAPITAL LETTER I WITH GRAVE
#CYRILLIC CAPITAL LETTER O WITH GRAVE
#CYRILLIC CAPITAL LETTER U WITH GRAVE
#CYRILLIC CAPITAL LETTER HARD SIGN WITH GRAVE
#CYRILLIC CAPITAL LETTER YERU WITH GRAVE (only for Russian language)
#CYRILLIC CAPITAL LETTER E WITH GRAVE (only for Russian language)
#CYRILLIC CAPITAL LETTER YU WITH GRAVE
#CYRILLIC CAPITAL LETTER YA WITH GRAVE
#CYRILLIC SMALL LETTER A WITH GRAVE
#CYRILLIC SMALL LETTER IE WITH GRAVE
#CYRILLIC SMALL LETTER I WITH GRAVE
#CYRILLIC SMALL LETTER O WITH GRAVE
#CYRILLIC SMALL LETTER U WITH GRAVE
#CYRILLIC SMALL LETTER HARD SIGN WITH GRAVE
#CYRILLIC SMALL LETTER YERU WITH GRAVE (only for Russian language)
#CYRILLIC SMALL LETTER E WITH GRAVE (only for Russian language)
#CYRILLIC SMALL LETTER YU WITH GRAVE
#CYRILLIC SMALL LETTER YA WITH GRAVE

So my third question is:
“What can be done about this problem?”

For more information you can see:
http://titus.uni-frankfurt.de/unicode/unicsel/unicself.htm#Cyrillic

Well, when I look at the list, I can only think of one thing (well, one stream of things!) to say:

А̀ Ѐ Ѝ О̀ У̀
Ъ̀ Ы̀ Э̀ Ю̀ Я̀
а̀ ѐ ѝ о̀ у̀
ъ̀ ы̀ э̀ ю̀ я̀

or in Unicode code points....

0410 0300 0415 0300 0418 0300 041e 0300 0423 0300
042a 0300 042b 0300 042d 0300 042e 0300 042f 0300
0430 0300 0435 0300 0438 0300 043e 0300 0443 0300
044a 0300 044b 0300 044d 0300 044e 0300 044f 0300

These characters already exist in Unicode, as combining sequences (the decomposed form): a base letter followed by U+0300 COMBINING GRAVE ACCENT. Note that they look better in some fonts than they do in others. Fixing that is mainly a matter of letting the font foundries that work to support these languages know that these particular sequences need good font hints, so that a good result does not depend on the "accident" of the combining character guessing how best to work with the base character.
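For anyone who wants to check this on their own machine, here is a minimal Python sketch that rebuilds the sequences from the code points listed above and asks the standard `unicodedata` module whether NFC normalization collapses each one into a single code point. The names `BASE_CODE_POINTS`, `with_grave`, and `has_precomposed_form` are made up for this illustration, and the result reflects whatever Unicode version your Python build ships with.

```python
import unicodedata

GRAVE = "\u0300"  # U+0300 COMBINING GRAVE ACCENT

# Base vowel letters, taken from the code point listing above.
BASE_CODE_POINTS = [
    0x0410, 0x0415, 0x0418, 0x041E, 0x0423,
    0x042A, 0x042B, 0x042D, 0x042E, 0x042F,
    0x0430, 0x0435, 0x0438, 0x043E, 0x0443,
    0x044A, 0x044B, 0x044D, 0x044E, 0x044F,
]

def with_grave(code_point: int) -> str:
    """Return the base letter followed by the combining grave accent."""
    return chr(code_point) + GRAVE

def has_precomposed_form(sequence: str) -> bool:
    """True if NFC collapses the sequence into a single code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1

for cp in BASE_CODE_POINTS:
    seq = with_grave(cp)
    hex_points = " ".join(f"{ord(ch):04x}" for ch in seq)
    note = "also precomposed" if has_precomposed_form(seq) else "sequence only"
    print(f"{hex_points}  {unicodedata.name(chr(cp))} + GRAVE  ({note})")
```

Only the IE and I sequences should come back as "also precomposed", which matches the 2+2=4 letters mentioned in the question; the rest are representable only as sequences, but they are representable.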

If you wanted to try to get them added to Unicode in the precomposed form, the submission process for new characters is very straightforward. However, as the proposal information clearly states:

  • Often a proposed character can be expressed as a sequence of one or more existing Unicode characters. Encoding the proposed character would be a duplicate representation, and is thus not suitable for encoding. (In any event, the proposed character would disappear when normalized.) For example, a g-umlaut character is not suitable for encoding, since it can already be expressed with the sequence <g, combining diaeresis>. For further information on such sequences see Where is my Character and the FAQ page Characters, Combining Marks question 12 and question 14.
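The normalization point in that quote can be seen with a small Python sketch. It takes the g-umlaut sequence from the quote and shows that every normalization form leaves it as the same two code points. It then uses U+0958 DEVANAGARI LETTER QA purely as an analogy: that is an existing precomposed character excluded from composition, so NFC replaces it with its sequence, which is my understanding of what "disappear when normalized" would mean for a newly added precomposed letter. The helper name `normalized_forms` is invented for this example.

```python
import unicodedata

def normalized_forms(text: str) -> dict:
    """Map each normalization form name to the code points it produces."""
    return {
        form: [f"{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }

# g followed by U+0308 COMBINING DIAERESIS: the "g-umlaut" from the quote.
g_umlaut = "g\u0308"
print(normalized_forms(g_umlaut))

# U+0958 DEVANAGARI LETTER QA is an existing precomposed character that is
# excluded from composition; it is used here only as an analogy.
print(normalized_forms("\u0958"))
```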

So it would appear that these characters are unlikely to be separately encoded.

As for the request to add these code points to cp1251, I will deal with that in a separate post, perhaps later today (or sometime soon).

 

This post brought to you by "Ѡ" (U+0460, CYRILLIC CAPITAL LETTER OMEGA)
