Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
In the past I have had less than kind words to say about code pages 20127, 20269, and 1258. Well, with this post you will be able to add code page 864 to the list....
Over in the microsoft.public.platformsdk.mslayerforunicode newsgroup, shanab asked:
how i can encode from arabic(1256) to ibm(864) in c#.net?i need help for this problem because i cannot encode and decode this.
Of course this question has nothing to do with MSLU, but it is common for people to see the word Unicode in the title and just post their question there. I think everyone is used to it. :-)
Now in theory, since both code pages exist on Windows, you could just pivot them through Unicode: a simple Encoding.GetEncoding(1256).GetString() to go from Windows code page 1256 to Unicode, and a simple Encoding.GetEncoding(864).GetBytes() to go from Unicode to cp864.
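For comparison, here is roughly the same pivot sketched in Python. Python's built-in `cp1256` and `cp864` codecs stand in for .NET's `Encoding.GetEncoding`. This is a stand-in, not the .NET behavior. Python's tables may not match the Windows ones exactly. Also, with the default `errors="strict"` Python raises an exception, whereas Windows may quietly substitute a "best fit" character or a question mark. The function name `pivot_1256_to_864` is hypothetical.

```python
def pivot_1256_to_864(data: bytes, errors: str = "strict") -> bytes:
    """Hypothetical helper: Windows-1256 bytes -> Unicode -> cp864 bytes."""
    text = data.decode("cp1256")              # like Encoding.GetEncoding(1256).GetString()
    return text.encode("cp864", errors=errors)  # like Encoding.GetEncoding(864).GetBytes()


# ASCII survives the trip; a plain ARABIC LETTER ALEF (0xC7 in cp1256) does not,
# because cp864 only carries presentation forms of the letters.
print(pivot_1256_to_864(b"abc 123"))
print(pivot_1256_to_864(bytes([0xC7]), errors="replace"))
```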
If we take a slightly modified version of the code from this post and run it to get the list of characters in cp864, though (looking above 0x7f, of course!):
U+009b 0x9b   U+009c 0x9c   U+009f 0x9f   U+00a0 0xa0   U+00a2 0xc0   U+00a3 0xa3   U+00a4 0xa4   U+00a6 0xdb
U+00ab 0x97   U+00ac 0xdc   U+00ad 0xa1   U+00b0 0x80   U+00b1 0x93   U+00b7 0x81   U+00bb 0x98   U+00bc 0x95
U+00bd 0x94   U+00d7 0xde   U+00f7 0xdd   U+03b2 0x90   U+03c6 0x92   U+060c 0xac   U+061b 0xbb   U+061f 0xbf
U+0640 0xe0   U+0651 0xf1   U+0660 0xb0   U+0661 0xb1   U+0662 0xb2   U+0663 0xb3   U+0664 0xb4   U+0665 0xb5
U+0666 0xb6   U+0667 0xb7   U+0668 0xb8   U+0669 0xb9   U+2219 0x82   U+221a 0x83   U+221e 0x91   U+2248 0x96
U+2500 0x85   U+2502 0x86   U+250c 0x8d   U+2510 0x8c   U+2514 0x8e   U+2518 0x8f   U+251c 0x8a   U+2524 0x88
U+252c 0x89   U+2534 0x8b   U+253c 0x87   U+2592 0x84   U+25a0 0xfe   U+f8be 0xa6   U+f8bf 0xa7   U+f8c0 0xff
U+fe7d 0xf0   U+fe80 0xc1   U+fe81 0xc2   U+fe82 0xa2   U+fe83 0xc3   U+fe84 0xa5   U+fe85 0xc4   U+fe8b 0xc6
U+fe8d 0xc7   U+fe8e 0xa8   U+fe8f 0xa9   U+fe91 0xc8   U+fe93 0xc9   U+fe95 0xaa   U+fe97 0xca   U+fe99 0xab
U+fe9b 0xcb   U+fe9d 0xad   U+fe9f 0xcc   U+fea1 0xae   U+fea3 0xcd   U+fea5 0xaf   U+fea7 0xce   U+fea9 0xcf
U+feab 0xd0   U+fead 0xd1   U+feaf 0xd2   U+feb1 0xbc   U+feb3 0xd3   U+feb5 0xbd   U+feb7 0xd4   U+feb9 0xbe
U+febb 0xd5   U+febd 0xeb   U+febf 0xd6   U+fec1 0xd7   U+fec5 0xd8   U+fec9 0xdf   U+feca 0xc5   U+fecb 0xd9
U+fecc 0xec   U+fecd 0xee   U+fece 0xed   U+fecf 0xda   U+fed0 0xf7   U+fed1 0xba   U+fed3 0xe1   U+fed5 0xf8
U+fed7 0xe2   U+fed9 0xfc   U+fedb 0xe3   U+fedd 0xfb   U+fedf 0xe4   U+fee1 0xef   U+fee3 0xe5   U+fee5 0xf2
U+fee7 0xe6   U+fee9 0xf3   U+feeb 0xe7   U+feec 0xf4   U+feed 0xe8   U+feef 0xe9   U+fef0 0xf5   U+fef1 0xfd
U+fef2 0xf6   U+fef3 0xea   U+fef5 0xf9   U+fef6 0xfa   U+fef7 0x99   U+fef8 0x9a   U+fefb 0x9d   U+fefc 0x9e
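The code that produced that listing isn't reproduced here. As a rough stand-in, this Python sketch builds a similar listing from Python's built-in `cp864` codec. It decodes each byte from 0x80 to 0xFF, skips any byte the codec treats as undefined, and sorts the results by code point. Python's table is not the Windows one. Some entries above may not appear in its output, such as the C1 control and private-use mappings (U+F8BE..U+F8C0). The function name `cp864_upper_half` is hypothetical.

```python
def cp864_upper_half() -> list[tuple[int, int]]:
    """Hypothetical helper: (code point, byte) pairs for bytes 0x80..0xFF in cp864."""
    pairs = []
    for b in range(0x80, 0x100):
        try:
            ch = bytes([b]).decode("cp864")
        except UnicodeDecodeError:
            continue  # this codec leaves the byte undefined
        pairs.append((ord(ch), b))
    pairs.sort()  # order by Unicode code point, like the listing above
    return pairs


if __name__ == "__main__":
    for cp, b in cp864_upper_half():
        print(f"U+{cp:04x} 0x{b:02x}")
```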
A quick glance at what cp864 supports explains why I described doing this as something theoretical. :-)
Apart from a handful of punctuation marks and signs (U+060C, U+061B, U+061F, U+0640, and U+0651), the only code points cp864 takes from the regular Arabic block in Unicode are the Arabic-Indic digits, which are incidentally not in cp1256. Every Arabic letter in cp864 comes from the Arabic Presentation Forms, which are not characters you want to be using if you can help it, as I pointed out in "It does not always pay to be compatible".
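The mismatch can be checked with Python's codecs and `unicodedata`, again as a stand-in for the Windows tables. The sketch assumes Python's strict `cp1256` and `cp864` codecs do no best-fit mapping. It checks three characters. ARABIC LETTER BEH is in cp1256 but not cp864. Its isolated presentation form is in cp864 but not cp1256. The Arabic-Indic digit zero is only in cp864. It also shows that NFKC normalization folds the presentation form back to the ordinary letter, which is a sign of its compatibility-only status. The helper name `encodable` is hypothetical.

```python
import unicodedata


def encodable(text: str, codec: str) -> bool:
    """Hypothetical helper: True if every character of text maps in codec (strict)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


BEH = "\u0628"           # ARABIC LETTER BEH (regular Arabic block)
BEH_ISOLATED = "\ufe8f"  # ARABIC LETTER BEH ISOLATED FORM (presentation form)
ZERO = "\u0660"          # ARABIC-INDIC DIGIT ZERO

for ch in (BEH, BEH_ISOLATED, ZERO):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"cp1256={encodable(ch, 'cp1256')} cp864={encodable(ch, 'cp864')}")

# Presentation forms carry compatibility decompositions, so NFKC maps them back.
print(unicodedata.normalize("NFKC", BEH_ISOLATED) == BEH)
```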
Even worse, it does not support all four contextual forms (isolated, initial, medial, and final) of even the basic Arabic letters. That is no fault of cp864 itself; no single-byte code page has enough room for them all!
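To see which contextual forms survive for a given letter, one approach is to collect its presentation forms from Unicode's compatibility decompositions and test each one against the codec. This sketch scans only Arabic Presentation Forms-B (U+FE70..U+FEFC) and uses Python's `cp864` codec, which is assumed to be close to, but not identical with, the Windows table. The function names are hypothetical.

```python
import unicodedata

FORM_TAGS = ("isolated", "final", "initial", "medial")


def presentation_forms(base: str) -> dict[str, str]:
    """Hypothetical helper: form name -> presentation-form character for one letter."""
    forms = {}
    for cp in range(0xFE70, 0xFEFD):
        # e.g. U+FE91 decomposes to "<initial> 0628"
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and int(parts[1], 16) == ord(base):
            tag = parts[0].strip("<>")
            if tag in FORM_TAGS:
                forms[tag] = chr(cp)
    return forms


def forms_in_cp864(base: str) -> set[str]:
    """Hypothetical helper: which of the letter's forms Python's cp864 can encode."""
    supported = set()
    for tag, ch in presentation_forms(base).items():
        try:
            ch.encode("cp864")
            supported.add(tag)
        except UnicodeEncodeError:
            pass
    return supported


for letter in ("\u0627", "\u0628", "\u0639"):  # ALEF, BEH, AIN
    all_forms = presentation_forms(letter)
    have = forms_in_cp864(letter)
    missing = [t for t in FORM_TAGS if t in all_forms and t not in have]
    print(unicodedata.name(letter), "has:", sorted(have), "missing:", missing)
```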
So, the only real way to move between cp1256 and cp864 would be to write custom code that maps each letter into the appropriate presentation form and, any time that form is not supported, I suppose just puts in the wrong one. That would be a lot of work to support something that does not work very well anyway....
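Here is a very naive sketch of that custom code, written in Python rather than C#. It makes several simplifying assumptions.

- A letter joins the following letter if Presentation Forms-B has an initial form for it, and joins the preceding letter if it has a final form.
- Tatweel joins on both sides.
- Vowel marks simply break the join.
- The mandatory lam-alef ligatures are not produced.

When the ideal form is missing from cp864, the sketch falls back to another form of the same letter, which is exactly the "wrong form" compromise. Anything still unmappable becomes `?`. All function and table names are hypothetical.

```python
import unicodedata
from functools import lru_cache

TAGS = ("isolated", "final", "initial", "medial")
TATWEEL = "\u0640"  # join-causing; cp864 encodes it directly


def build_forms() -> dict[str, dict[str, str]]:
    """base letter -> {form name: presentation form}, from Presentation Forms-B."""
    forms: dict[str, dict[str, str]] = {}
    for cp in range(0xFE70, 0xFEFD):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep single-letter forms like "<initial> 0628"; ligatures and
        # vowel-sign forms decompose to two code points and are skipped.
        if len(parts) == 2 and parts[0].strip("<>") in TAGS:
            base = chr(int(parts[1], 16))
            forms.setdefault(base, {})[parts[0].strip("<>")] = chr(cp)
    return forms


FORMS = build_forms()

# Order in which to try forms when the ideal one is missing from cp864.
FALLBACK = {
    "isolated": ("isolated", "final", "initial", "medial"),
    "final": ("final", "isolated", "medial", "initial"),
    "initial": ("initial", "medial", "isolated", "final"),
    "medial": ("medial", "initial", "final", "isolated"),
}


@lru_cache(maxsize=None)
def in_cp864(ch: str) -> bool:
    try:
        ch.encode("cp864")
        return True
    except UnicodeEncodeError:
        return False


def joins_forward(ch: str) -> bool:
    # Simplification: "has an initial form" stands in for dual-joining.
    return ch == TATWEEL or "initial" in FORMS.get(ch, {})


def joins_backward(ch: str) -> bool:
    return ch == TATWEEL or "final" in FORMS.get(ch, {})


def shape(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if not forms:
            out.append(ch)  # Latin, digits, punctuation, tatweel, vowel marks...
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        to_prev = joins_forward(prev) and joins_backward(ch)
        to_next = joins_forward(ch) and joins_backward(nxt)
        wanted = {(False, False): "isolated", (True, False): "final",
                  (False, True): "initial", (True, True): "medial"}[(to_prev, to_next)]
        # First acceptable form in fallback order; else keep the base letter.
        out.append(next((forms[t] for t in FALLBACK[wanted]
                         if t in forms and in_cp864(forms[t])), ch))
    return "".join(out)


def cp1256_to_cp864(data: bytes) -> bytes:
    # Characters cp864 still cannot represent (most vowel marks, for example)
    # are replaced with '?'.
    return shape(data.decode("cp1256")).encode("cp864", errors="replace")


# BEH + BEH: the second BEH wants a final form that cp864 lacks,
# so the isolated form is used instead.
print(cp1256_to_cp864(bytes([0xC8, 0xC8])).hex())
```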
This post brought to you by "ﻱ" (U+fef1, a.k.a. ARABIC LETTER YEH ISOLATED FORM)
If you really need to use an OEM codepage for Arabic, there is codepage 720.
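With code page 720 the simple pivot through Unicode becomes workable. The sketch assumes that Python's `cp720` codec, like DOS code page 720, maps the ordinary Arabic letters rather than presentation forms, so no shaping step is needed. Letters that cp1256 has but cp720 may lack, such as the extra Persian and Urdu letters, would still raise in strict mode. The name `cp1256_to_cp720` is hypothetical.

```python
def cp1256_to_cp720(data: bytes, errors: str = "strict") -> bytes:
    """Hypothetical helper: pivot Windows-1256 bytes through Unicode into cp720."""
    text = data.decode("cp1256")          # bytes -> Unicode (base Arabic letters)
    return text.encode("cp720", errors)   # Unicode -> cp720, no shaping needed


sample = bytes([0xC8, 0xC7])  # cp1256 bytes for BEH + ALEF
converted = cp1256_to_cp720(sample)
print(converted.hex(), converted.decode("cp720") == sample.decode("cp1256"))
```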
864 also maps the ASCII percent sign to U+066A (ARABIC PERCENT SIGN). Because the percent sign is used, for example, as the escape character in URLs, Mozilla wants to remove support for it from Firefox:
www.w3.org/.../show_bug.cgi
bugzilla.mozilla.org/show_bug.cgi
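This small Python illustration shows the commenter's point. It assumes Python's `cp864` codec decodes byte 0x25 to U+066A. Once the percent sign becomes U+066A, a URL unescaper no longer sees any escape sequences. The URL fragment is made up for illustration. This is not how any browser actually decodes URLs.

```python
from urllib.parse import unquote

raw = b"a%20b"  # made-up URL fragment, as bytes

as_latin1 = raw.decode("latin-1")  # '%' stays U+0025 PERCENT SIGN
as_cp864 = raw.decode("cp864")     # '%' becomes U+066A ARABIC PERCENT SIGN

print(repr(unquote(as_latin1)))  # the %20 escape is recognized
print(repr(unquote(as_cp864)))   # no '%' left, so nothing is unescaped
```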