The PUA outside of Unicode

Sorting it all Out
Michael Kaplan's random stuff of dubious value

Colleague Aldo Donetti asked me:

Hi Michael, I was investigating a bug and it turns out that this character (U+E843) is in the Private use range but it is also part of the Chinese 936 codepage.

The issue is whether to consider characters in the private use area as valid characters in Identifiers (e.g. in VB/C#/WebService names/…) – I would not allow them but I’m not too familiar with that range so I’m double checking with you. At present it is not allowed (as weird as it may seem).

Thanks!
Aldo

Now the Private Use Area is a part of Unicode that I have discussed before (ref: previous posts). In particular, I have talked about the relationship between the PUA and EUDC (End User Defined Characters) like in this post.

But an important thing to keep in mind is that the PUA is not just a Unicode thing.

In fact, all of the East Asian code pages have areas set aside for private use, specifically for the kind of characters that EUDC is intended to support. The various ranges used (shown in the registry at HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage\EUDCCodeRange) are:

  • 932 --- 0xF040-0xF9FC
  • 936 --- 0xAAA1-0xAFFE, 0xF8A1-0xFEFE, 0xA140-0xA7A0
  • 949 --- 0xC9A1-0xC9FE, 0xFEA1-0xFEFE
  • 950 --- 0xFA40-0xFEFE, 0x8E40-0xA0FE, 0x8140-0x8DFE, 0xC6A1-0xC8FE
  • Unicode --- U+E000-U+F8FF

Looking at U+E843 (which is definitely in the Unicode PUA, covered in the defined range above) and its code page 936 mapping to 0xFE7E, it just kind of makes sense that the various ranges map to each other -- where else could they really map to if not to each other?
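To make the range check concrete, here is a minimal Python sketch that holds the ranges listed above as data and tests the two values mentioned in this post. The names EUDC_RANGES and in_listed_range are made up for illustration, and the check treats each range as a plain numeric interval, which is a simplifying assumption: it ignores the lead byte/trail byte structure of the double-byte code pages.

```python
import unicodedata

# Ranges copied from the list above, keyed by code page name.
# Treating each one as a simple numeric interval is an assumption
# made for this sketch, not a statement about how Windows reads them.
EUDC_RANGES = {
    "932": [(0xF040, 0xF9FC)],
    "936": [(0xAAA1, 0xAFFE), (0xF8A1, 0xFEFE), (0xA140, 0xA7A0)],
    "949": [(0xC9A1, 0xC9FE), (0xFEA1, 0xFEFE)],
    "950": [(0xFA40, 0xFEFE), (0x8E40, 0xA0FE), (0x8140, 0x8DFE), (0xC6A1, 0xC8FE)],
    "Unicode": [(0xE000, 0xF8FF)],
}

def in_listed_range(table, value):
    """Return True if value falls inside any range listed for table."""
    return any(lo <= value <= hi for lo, hi in EUDC_RANGES[table])

print(in_listed_range("Unicode", 0xE843))  # True
print(in_listed_range("936", 0xFE7E))      # True
# The Unicode Character Database agrees: "Co" is the private use category.
print(unicodedata.category("\ue843"))      # Co
```

The sketch only answers "is this number between these two endpoints"; it does not perform or verify any code page conversion.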

But the behavior that does not allow them in identifiers sounds like a very good one, and it should not change. Because whether one is in the Unicode PUA or the PUA of a code page, one is not looking at good candidates for identifiers....
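For what it is worth, the same rule is easy to sketch outside of VB or C#. The Python below is only an illustration, not the check those compilers use; contains_private_use and is_acceptable_identifier are hypothetical helper names. It relies on the general category "Co" (private use) from the Unicode Character Database.

```python
import unicodedata

def contains_private_use(name):
    """True if any character in name has general category Co (private use)."""
    return any(unicodedata.category(ch) == "Co" for ch in name)

def is_acceptable_identifier(name):
    """Hypothetical validator: a legal Python identifier with no PUA characters."""
    return name.isidentifier() and not contains_private_use(name)

print(is_acceptable_identifier("name"))         # True
print(is_acceptable_identifier("na\ue843me"))   # False
```

Python's own str.isidentifier() already rejects private use characters, so the explicit category test is redundant here; it is spelled out only to show where the rule comes from.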

 

This post brought to you by U+E843 (a code value in the Unicode Private Use Area)
