Saturday, April 22, 2006 10:48 AM
Michael S. Kaplan
Unicode? Zip don't need no stinking Unicode!
I have talked about the limitations in ZIP before in the post Zipping up Unicode file names, but Heath has pointed out a new and interesting wrinkle in the problem in his post Update for the Palm Treo 700w Available, with Problems.
Now Heath may seem to some to be some kind of lightning rod for Unicode Lame List stories, but he isn't -- he is just a smart developer who is finding himself thrown into bad software situations that he did not design....
In this case we see the biggest problem with not using Unicode -- the basic problem of deciding what code page to use. It is probably not so much that zipfldr.dll is specifically using cp437 and cp1252; rather, it is using CP_OEMCP and CP_ACP, which happen to be those two code pages on a US English system.
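To make the mismatch concrete, here is a minimal Python sketch. It assumes a US English system, where CP_ACP is cp1252 and CP_OEMCP is cp437, and it uses Python's codecs as a stand-in for the Win32 conversions; it is not what zipfldr.dll actually runs. The helper names are made up for illustration.

```python
# Assumption: US English defaults, CP_ACP = cp1252 and CP_OEMCP = cp437.
ACP = "cp1252"
OEMCP = "cp437"

REGISTERED = "\u00ae"   # REGISTERED SIGN
TRADE_MARK = "\u2122"   # TRADE MARK SIGN


def roundtrip(text, write_cp, read_cp):
    """Encode with one code page, then decode the same bytes with another."""
    return text.encode(write_cp).decode(read_cp)


def encodable(text, code_page):
    """True if every character in text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for ch in (REGISTERED, TRADE_MARK):
        out = roundtrip(ch, ACP, OEMCP)
        print(f"U+{ord(ch):04X} -> U+{ord(out):04X}")
    # U+00AE -> U+00AB
    # U+2122 -> U+00D6
    print(encodable(REGISTERED + TRADE_MARK, OEMCP))  # False
```

Written as cp1252 and read back as cp437, the two signs turn into a left guillemet and an O with diaeresis. Going the other way is worse: neither character exists in cp437 at all, so there is nothing sensible to write.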
What causes such a mistake to not get noticed, though? I mean, it is pretty unnatural to be using both constants, isn't it?
As luck (or unluck) would have it, they are not. The problem starts with the Shell folks, who are using funky macros wrapped around funky shlwapi wrappers like SHAnsiToUnicode and SHUnicodeToAnsi. I call them funky because they are. They are also quite consistent in always using CP_ACP underneath.
And as for the rest of the problem, it looks like the CP_OEMCP is coming from the fact that a console app is running things, so some of the translations are happening in that different context....
How smart is Palm feeling for putting ™ and ® in the filename, at this point? No wonder they took the update down. :-)
Clearly we'll need to see people using ASCII file names until people move up to Unicode. Code pages are just too damn confusing!
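A rough sketch of why ASCII is the safe fallback, under the same assumption as before that the code pages in play are cp437 and cp1252 (with UTF-8 thrown in for comparison). The function name and the sample file names are hypothetical, not the real Palm ones.

```python
# Assumption: these are the code pages a ZIP tool might pick between.
CODE_PAGES = ("cp437", "cp1252", "utf-8")


def same_bytes_everywhere(name):
    """True if name encodes to identical bytes in every code page above."""
    encodings = set()
    for cp in CODE_PAGES:
        try:
            encodings.add(name.encode(cp))
        except UnicodeEncodeError:
            return False  # not even representable in this code page
    return len(encodings) == 1


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(same_bytes_everywhere("Updater.exe"))        # True
    print(same_bytes_everywhere("Updater\u2122.exe"))  # False
```

A pure ASCII name produces the same bytes no matter which of these code pages gets picked, so it does not matter who guesses wrong.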
This post brought to you by "®" and "™" (U+00AE and U+2122, a.k.a. REGISTERED SIGN and TRADE MARK SIGN)