Michael Entin asks in the Suggestion Box:

Hi Michael.

I want to revisit UTF-8 discussion.

In several posts you wrote that it is impossible to support UTF-8 as NT code page, since there is a lot of legacy code that assumes maximum of 2 bytes per char. So it is impossible to fix all this code to support UTF-8.

I don't quite understand how then does Windows support GB 18030 encoding? It appears it is a very similar encoding, where a character can be encoded by up to 4 bytes.

What are the differences between these two encodings? How come Windows can support one, but not the other?

I believe he is referring to this post and/or this post and/or the comments in this one....

And it is still true that UTF-8 (code page 65001) cannot be an ACP ("ANSI" code page) for a locale.

But from a technical standpoint, neither can GB-18030 (code page 54936) -- for pretty much the same reason.
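To make "the same reason" concrete, here is a small sketch of how many bytes each encoding needs per character. It uses Python's built-in codecs as a stand-in for the Windows code pages (an assumption on my part: "cp936" for the existing zh-CN ACP, "gb18030" for code page 54936, "utf-8" for code page 65001), and the helper name `byte_len` is made up for illustration.

```python
# Sketch: bytes needed per character in a double-byte code page versus
# GB18030 and UTF-8. Python codecs stand in for the Windows code pages.

def byte_len(ch, encoding):
    """Return the encoded length of ch, or None if the encoding cannot represent it."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None

# ASCII letter, common Han character, CJK Extension A, CJK Extension B
samples = ["A", "\u4e2d", "\u3400", "\U00020000"]

for ch in samples:
    widths = {enc: byte_len(ch, enc) for enc in ("cp936", "gb18030", "utf-8")}
    print(f"U+{ord(ch):04X}", widths)

# Widest character seen in each multi-byte encoding for these samples
max_gb18030 = max(byte_len(ch, "gb18030") for ch in samples)
max_utf8 = max(byte_len(ch, "utf-8") for ch in samples)
```

Code written against a double-byte code page never sees more than two bytes per character; both GB18030 and UTF-8 break that assumption as soon as a character outside the double-byte range shows up.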

The GB-18030 question is a bit more interesting since I am pretty sure there was an official request that we change the default system code page of the zh-CN locale to GB-18030, but unfortunately the answer was the same.

These code pages are present so that people can convert text a user might run across into and out of them; they are not for the legacy ("ANSI") support in the Win32 API which, since The Unicode train is leaving the station, is not being added to or updated. So they work great in MultiByteToWideChar and WideCharToMultiByte, but the core OS is not going to be updated to work internally off of either one.
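The conversion-only role looks roughly like the following sketch. It is not the Win32 API itself: Python's codecs play the part of MultiByteToWideChar (bytes to Unicode) and WideCharToMultiByte (Unicode to bytes), and the `transcode` helper is a hypothetical name used only for illustration.

```python
# Sketch: treat GB18030 and UTF-8 purely as interchange encodings,
# with Unicode text as the only internal representation.

def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from src into Unicode text, then encode that text as dst."""
    text = data.decode(src)   # analogous to MultiByteToWideChar
    return text.encode(dst)   # analogous to WideCharToMultiByte

# Two common Han characters plus one CJK Extension B character
gb_bytes = "\u4e2d\u6587\U00020000".encode("gb18030")

utf8_bytes = transcode(gb_bytes, "gb18030", "utf-8")
round_trip = transcode(utf8_bytes, "utf-8", "gb18030")

print(len(gb_bytes), len(utf8_bytes), round_trip == gb_bytes)
```

Everything between the two conversions works on Unicode, which is why neither encoding has to be an ACP for the conversion to be useful.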

Now the job would not be entirely impossible, though I suspect it is fairly improbable. I say this as someone who has written a Unicode Layer for Win9x Systems, and who was once asked by another company to write a UTF-8 Encoding Layer for NT (or UELNT, I guess?). This would require a serious and non-trivial development effort, whether one is inside or outside of Microsoft, and there simply isn't a specific reason or benefit to doing it that would outweigh the cost.

Now if I ever retired, that UELNT project might be something interesting to take a shot at if someone really wanted to fund it. But I would probably have to run out of other stuff to do first, and that doesn't seem likely to happen any time soon. :-)

 

This post brought to you by ໜ (U+0edc, a.k.a. LAO KO LA)