05 April 2006

.NET System.IO.Compression and zip files

.NET Zip Library

.NET 2.0 does Compression

[Update 30 October 2007]: I moved this library to a CodePlex project. See www.codeplex.com/DotNetZip

There's a new namespace in the .NET Framework base class library for .NET 2.0, called System.IO.Compression. It has classes called DeflateStream and GZipStream.

These classes are streams; they're useful for compressing a stream of bytes as you transfer it, for example across the network to a cooperating application (a peer, a client, whatever). DeflateStream implements the Deflate algorithm; see the IETF's RFC 1951, "DEFLATE Compressed Data Format Specification version 1.3." GZipStream builds on Deflate: it wraps the Deflate-compressed data in a header and trailer, and adds a cyclic redundancy check. For more on GZip, see IETF RFC 1952, "GZIP file format specification version 4.3."
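To make the stream idea concrete, here is a minimal sketch of an in-memory round trip through DeflateStream. The class and method names (DeflateRoundTrip, Compress, Decompress) are made up for illustration; only the System.IO and System.IO.Compression types are the framework's own.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class DeflateRoundTrip
{
    // Compress a byte array into the raw Deflate format (RFC 1951).
    public static byte[] Compress(byte[] input)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen = true keeps the MemoryStream usable after the DeflateStream is disposed.
            using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            } // Disposing the DeflateStream flushes the final compressed block.
            return output.ToArray();
        }
    }

    // Reverse of Compress: read until the DeflateStream reports end of data.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (DeflateStream inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = inflate.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, n);
            }
            return output.ToArray();
        }
    }

    public static void Main()
    {
        byte[] original = Encoding.UTF8.GetBytes(new string('a', 1000) + " hello, deflate");
        byte[] packed = Compress(original);
        byte[] unpacked = Decompress(packed);

        Console.WriteLine("original: {0} bytes, compressed: {1} bytes", original.Length, packed.Length);
        Console.WriteLine("round trip ok: {0}",
            Encoding.UTF8.GetString(unpacked) == Encoding.UTF8.GetString(original));
    }
}
```

Swapping DeflateStream for GZipStream in both methods should give the same round trip with the gzip header and CRC wrapped around the data.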

Gzip has been done

The GZip format described in RFC 1952 is also used by the popular gzip utility included in many *nix distributions. The Base Class Library team at Microsoft previously published example source code for a simple utility that behaves just like the *nix gzip, but is written in .NET and based on the GZipStream class. This simple utility interoperates with the *nix gzip: it can read and write .gz files.
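This is not the BCL team's utility, just a rough sketch of the core of such a tool, assuming all you want is to turn one file into a .gz next to it. The MiniGzip class and its CompressFile method are hypothetical names of my own.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class MiniGzip
{
    // Writes sourcePath + ".gz" next to the source file and returns the new path.
    public static string CompressFile(string sourcePath)
    {
        string targetPath = sourcePath + ".gz";
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream target = File.Create(targetPath))
        using (GZipStream gzip = new GZipStream(target, CompressionMode.Compress))
        {
            // Copy in chunks; the GZipStream compresses as the bytes pass through.
            byte[] buffer = new byte[8192];
            int n;
            while ((n = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                gzip.Write(buffer, 0, n);
            }
        }
        return targetPath;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: MiniGzip <file>");
            return;
        }
        Console.WriteLine("wrote {0}", CompressFile(args[0]));
    }
}
```

The resulting file should be readable by the *nix gzip, since both follow RFC 1952.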

What about .zip files?

As a companion to that example, enclosed here as an attachment (see the bottom of this post) is an example class that can read and write zip archives. It is packaged as a reusable library, along with a couple of companion example command-line applications that use the library. The example apps are useful on their own, for example for quickly zipping up a directory from within a script or a command prompt. But the library is useful too, for adding zip capability to arbitrary applications. For example, you could include a zip task in an MSBuild session, or in a smart-client GUI application. I've included both the binaries and source code here.

This is the class diagram for the ZipFile class, and the ZipEntry class, as generated by Visual Studio 2005. The ZipFile is the main class.

If you don't quite grok all that notation, I will point out a few highlights. The ZipFile itself supports a generic IEnumerable interface. This means you can enumerate the ZipEntry objects within the ZipFile using a foreach loop, which makes usage really simple. (Implementing that little trick is also dead simple, thanks to the new support for iterators in C# 2.0 and the "yield return" statement.)
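For anyone who hasn't used iterators yet, here is a small sketch of the pattern. It is not the ZipFile source: Archive and Entry are hypothetical stand-ins, there only to show how "yield return" gives you a generic IEnumerable with very little code.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for an archive entry; not the library's ZipEntry.
public class Entry
{
    private readonly string name;
    public Entry(string name) { this.name = name; }
    public string Name { get { return name; } }
}

// Hypothetical container that can be walked with foreach.
public class Archive : IEnumerable<Entry>
{
    private readonly List<Entry> entries = new List<Entry>();

    public void Add(string name)
    {
        entries.Add(new Entry(name));
    }

    // The compiler turns this method into an enumerator class for us.
    public IEnumerator<Entry> GetEnumerator()
    {
        foreach (Entry e in entries)
        {
            yield return e;
        }
    }

    // The non-generic interface just forwards to the generic one.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class IteratorDemo
{
    public static void Main()
    {
        Archive archive = new Archive();
        archive.Add("readme.txt");
        archive.Add("photo.jpg");

        // Prints the two names in the order they were added.
        foreach (Entry e in archive)
        {
            Console.WriteLine(e.Name);
        }
    }
}
```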

Using the ZipFile class

You can extract all files from an existing .zip file by doing this:

        ZipFile zip = ZipFile.Read("MyZip.zip");
        foreach (ZipEntry e in zip)
        {
            // Extract each entry into the NewDirectory folder
            e.Extract("NewDirectory");
        }

Of course, you don't have to extract the files; you can just fiddle with the properties on the ZipEntry objects in the collection. Creating a new .zip file is also simple:

      ZipFile zip = new ZipFile("MyNewZip.zip");
      zip.AddDirectory("My Pictures", true); // AddDirectory recurses subdirectories
      zip.Save();

You can add a directory at a time, as shown above, and you can add individual files as well. It seems to be pretty fast, though I haven't benchmarked it. It doesn't compress as much as WinZip; this library is at the mercy of the DeflateStream class, and that class doesn't support multiple levels of compression.

Hmmm, What About Intellectual Property?

I am no lawyer, but it seems to me the ZIP format is PKWARE's intellectual property. PKWARE has some text in their zip spec which states:

PKWARE is committed to the interoperability and advancement of the .ZIP format. PKWARE offers a free license for certain technological aspects described above under certain restrictions and conditions. However, the use or implementation in a product of certain technological aspects set forth in the current APPNOTE, including those with regard to strong encryption or patching, requires a license from PKWARE. Please contact PKWARE with regard to acquiring a license.

I checked with PKWARE for more on that. I described what I was doing with this example, and got a nice email reply from Jim Peterson at PKWARE, who wrote:

From the description of your intended need, no license would be necessary for the compression/decompression features you plan to use.

Which would mean, anyone could use this example without a license. But like I said, I am no lawyer.

Later,

-Dino

[Update 11 April 2006 10:36am US Pacific time]: After a bit of testing it seems that there are some anomalies with the DeflateStream class in .NET. One of them is that it performs badly with already-compressed data. For example, imagine that you use the zipDir.exe tool included in the attachment here to zip a directory that itself contains a large zip file. The DeflateStream in .NET can actually inflate the size of the stream. The output is still valid, but it is larger than the input rather than smaller.

The base class library team is aware of this anomaly and is considering it. If you'd like to weigh in on this behavior, and I encourage you to do so if you value this class, use the Product Feedback Center.
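If you want to see the anomaly for yourself, a quick check along these lines should do it. This is only a sketch: random bytes stand in for already-compressed data (an assumption, since both are essentially incompressible), and DeflateGrowthCheck and DeflatedSize are names I made up. How much the random input grows, if at all, depends on the version of the framework you run it on.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class DeflateGrowthCheck
{
    // Returns the number of bytes DeflateStream produces for the given input.
    public static long DeflatedSize(byte[] input)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen = true so the length can be read after the DeflateStream is closed.
            using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(input, 0, input.Length);
            }
            return output.Length;
        }
    }

    public static void Main()
    {
        // Random bytes: a stand-in for data that has already been compressed.
        byte[] random = new byte[1 << 20];
        new Random(12345).NextBytes(random);

        // All zeros: data that should compress very well.
        byte[] zeros = new byte[1 << 20];

        Console.WriteLine("random input: {0} bytes -> {1} bytes", random.Length, DeflatedSize(random));
        Console.WriteLine("zero input:   {0} bytes -> {1} bytes", zeros.Length, DeflatedSize(zeros));
    }
}
```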

[Updated with latest source, 28 March 2007]


Comments

# Tim Heron said:
It's a pity that GZipStream doesn't work with streams over 4GB in size. GNU gzip can cope with >4GB files, so why this limitation? http://www.gzip.org/#faq10
05 April 06 at 12:06 PM
# TravisOwens said:
Don't get PKZIP and GZIP confused. PKZIP came about 2-3 years before GZIP, but GZIP is not a *nix implementation of PKZIP, although both deflators support each other's format.

If .NET is using GZIP's method (which fully works in PKZIP, WinZip, etc.) then the licensing is a non-issue anyway.
05 April 06 at 1:48 PM
# Jeff Parker said:
Ohhhh, brilliant I was looking for something like this the other day when I was playing in the compression namespace.
05 April 06 at 1:49 PM
# DotNetInterop said:
Travis, thanks for reminding us all that Gzip and Pkzip are different. I should have pointed that out. Both Gzip (the *nix utility) and Pkzip (the commercial tool) do standard compression (see the IETF RFCs mentioned in the original entry). But Gzip compresses a single file, and Pkzip builds compressed archives.
I think you are jumping to conclusions when you suggest that because .NET's compression library uses the Deflate algorithm, there are no IP issues. PKWARE defines the format for .zip files, and that format is theirs to license. They don't have a license on the compression format, but on the surrounding data that describes the multi-file archive.

I contacted PKWARE and they agreed that the usage here is covered under their "free" license terms.  But it is still PKWARE's intellectual property, and it is still a license, though I did not pay for it.  Keep in mind, I am not a lawyer.
-Dino
06 April 06 at 3:34 PM
# DotNetInterop said:
Tim, I don't know whether the 4GB limit is real or, if it is, why it is there.
I would suggest posting to http://forums.microsoft.com/MSDN/showforum.aspx?forumid=39&siteid=1

-Dino
06 April 06 at 3:55 PM
# Nicolai's Blog said:
Link list 08.04.2006

Software
HFS - Http File Server - A small file server that needs no installation. Source code is also available. [via Portable Freeware]

.Net
.NET System.IO.Compression and zip files - A zip library based
08 April 06 at 7:11 PM
# Colin said:
Hi

I have managed to implement the zipping of a file, but when I close down my Form, I get a

"System.MissingMethodException". This seems to relate to the zip.Dispose method.

Any ideas on how I can correct this?

Many thanks
Colin
10 May 06 at 11:41 AM
# ignition00 said:
It just does not work, it won't add files. It writes the header but not the data.
sux0rs
31 May 06 at 6:02 AM
# DotNetInterop said:
Umm, I just downloaded the attachment and tried it, and it works for me. You must be having some other problem. -Dino
01 June 06 at 4:24 PM
# This Old Code said:

Finding a way to use system.io.compression for zip archives

16 February 07 at 10:28 PM
# Jon Galloway said:

Overview SharpZipLib provides best free .NET compression library, but what if you can't use it due to

25 October 07 at 4:08 AM