Tuesday, May 10, 2005 6:31 PM
Michael S. Kaplan
Zipping up Unicode file names
Let's create the following filenames:
- αβγδεζηθ.txt
- АБВГДЕЖЗ.txt
- אבגדהוזח.txt
- กขฃคฅฆจ.txt
(they can be empty or have data in them)
And then try to zip them up with your favorite program (I'll use WinZip, you can use anything you like here).
The zip will fail. In the case of WinZip, it fails with the following error:
---------------------------
WinZip
---------------------------
Error: No files were found for this action that match your criteria - nothing to do. (C:\TEMP\TEMP.zip).
---------------------------
OK Help
---------------------------
And then if you choose to look at the error log, you will see why you had zero files instead of the four you asked it to zip up:
Action: Add (and replace) files Include subfolders: yes Save full path: no
Include system and hidden files: yes
"C:\TEMP\aß?de???.txt" is not a valid file name and was skipped
"C:\TEMP\????????.txt" is not a valid file name and was skipped
"C:\TEMP\????????.txt" is not a valid file name and was skipped
Warning: name not matched: C:\TEMP\????????.txt
"C:\TEMP\aß?de???.txt" is not a valid file name and was skipped
"C:\TEMP\????????.txt" is not a valid file name and was skipped
"C:\TEMP\????????.txt" is not a valid file name and was skipped
Warning: name not matched: C:\TEMP\????????.txt
"C:\TEMP\???????.txt" is not a valid file name and was skipped
Warning: name not matched: C:\TEMP\???????.txt
"C:\TEMP\aß?de???.txt" is not a valid file name and was skipped
Warning: name not matched: C:\TEMP\aß?de???.txt
Error: No files were found for this action that match your criteria - nothing to do. (C:\TEMP\TEMP.zip)
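Your mileage may vary if your default system code page supports one of these filenames. Those question marks, together with the best fit mappings for the Greek name, will probably give you the clue as to what is going on here: each filename is being converted to the default system code page before the program ever tries to open the file.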
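The mangled names in the log can be reproduced in miniature. This is only a sketch, not what WinZip actually runs: it assumes a default system code page of 1252, the function name `to_code_page` is made up for illustration, and the hypothetical `BEST_FIT` table holds just the four Greek mappings visible in the log above (the real Windows best fit tables are far larger).

```python
# Assumption: only the four best fit pairs that show up in the log above.
BEST_FIT = {"α": "a", "β": "ß", "δ": "d", "ε": "e"}


def to_code_page(name, codepage="cp1252"):
    """Return the name as it looks after a lossy trip through the code page."""
    out = []
    for ch in name:
        try:
            ch.encode(codepage)  # character exists in the code page: keep it
            out.append(ch)
        except UnicodeEncodeError:
            # Not representable: use a best fit if we have one, else "?"
            out.append(BEST_FIT.get(ch, "?"))
    return "".join(out)


names = ["αβγδεζηθ.txt", "АБВГДЕЖЗ.txt", "אבגדהוזח.txt", "กขฃคฅฆจ.txt"]
for name in names:
    print(to_code_page(name))
```

Once a name has turned into question marks, nothing on disk matches it (and "?" is not a legal filename character on Windows anyway), which fits the "not a valid file name" and "name not matched" lines in the log.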
The ZIP format is fine with Unicode data inside the files, but is not so fine with the filenames themselves being off of the default system code page.
Curses, foiled again!
Now one could work around this by using the short file names, but the names that end up in the ZIP file would then tell you very little about which file is which:
- αβγδεζηθ.txt --> 3864~1.TXT
- АБВГДЕЖЗ.txt --> 833B~1.TXT
- אבגדהוזח.txt --> A0E9~1.TXT
- กขฃคฅฆจ.txt --> 0344~1.TXT
I think we need to have someone look into an extension to the ZIP format....
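One shape such an extension could take is to store the name as UTF-8 and set a flag on the entry saying so. Later revisions of the ZIP specification define general purpose bit 11 for this purpose, and Python's `zipfile` module sets it when a name is not plain ASCII. A minimal sketch, assuming Python 3; the helper names and the `UTF8_NAME_FLAG` constant are made up for illustration, and whether any particular archiver honors the flag is a separate question.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8

names = ["αβγδεζηθ.txt", "АБВГДЕЖЗ.txt", "אבגדהוזח.txt", "กขฃคฅฆจ.txt"]


def zip_in_memory(filenames):
    """Write one empty entry per name into an in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in filenames:
            zf.writestr(name, b"")
    return buf.getvalue()


def read_entries(data):
    """Return (name, utf8_flag_set) for every entry in the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(info.filename, bool(info.flag_bits & UTF8_NAME_FLAG))
                for info in zf.infolist()]


entries = read_entries(zip_in_memory(names))
for name, flagged in entries:
    print(name, "UTF-8 flag set" if flagged else "legacy code page name")
```

The names here never touch the file system or the system code page, so the sketch only shows the archive side of the problem: with the flag set, all four names come back intact.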
This post brought to you by "Ž" (U+017D, a.k.a. LATIN CAPITAL LETTER Z WITH CARON)
(which is unfortunately not a zippable file name character on most code pages)