05 April 2006

.NET System.IO.Compression and zip files

DotNetZip Library

.NET 2.0 does Compression

[Update 30 October 2007]: I moved this library to a CodePlex project, called DotNetZip. See www.codeplex.com/DotNetZip.  It does zip creation, extraction, passwords, ZIP64, Unicode, SFX, and more. It is open source, free Free FREE to use, has a clear license, and comes with .NET-based ZIP utilities. It works on the Compact Framework or the regular .NET Framework.  It is not the same as #ziplib or SharpZipLib.  DotNetZip is independent.

There's a new namespace in the .NET Framework base class library for .NET 2.0, called System.IO.Compression. It has classes called DeflateStream and GZipStream.

These classes are streams; they're useful for compressing a stream of bytes as you transfer it, for example across the network to a cooperating application (a peer, or a client, whatever). The DeflateStream implements the Deflate algorithm; see the IETF's RFC 1951, "DEFLATE Compressed Data Format Specification version 1.3." The GZipStream builds on Deflate: it wraps the deflated data in a header and a trailer that includes a cyclic redundancy check. For more on GZip, see the IETF's RFC 1952, "GZIP file format specification version 4.3."
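As a rough illustration of how these stream classes are used, here is a minimal sketch that compresses a byte array with GZipStream and reads it back, all in memory. The class and method names (GzipRoundTrip, Compress, Decompress) are made up for this example and are not part of the framework or of the attached library.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class, for illustration only.
static class GzipRoundTrip
{
    // Compress a byte array into the gzip format (RFC 1952).
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the last block and the gzip trailer.
            using (GZipStream gz = new GZipStream(output, CompressionMode.Compress))
            {
                gz.Write(data, 0, data.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Decompress gzip data back into the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gz = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int n;
            // Read returns 0 once the end of the compressed data is reached.
            while ((n = gz.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, n);
            }
            return output.ToArray();
        }
    }

    static void Main()
    {
        byte[] original = Encoding.UTF8.GetBytes(new string('a', 10000));
        byte[] packed = Compress(original);
        byte[] unpacked = Decompress(packed);
        Console.WriteLine("{0} bytes -> {1} bytes -> {2} bytes",
            original.Length, packed.Length, unpacked.Length);
    }
}
```

Swapping GZipStream for DeflateStream in both methods gives the raw Deflate variant, without the gzip header and trailer.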

Gzip has been done

The GZip format described in RFC 1952 is also used by the popular gzip utility included in many *nix distributions. The Base Class Library team at Microsoft previously published example source code for a simple utility that behaves just like the *nix gzip, but is written in .NET and based on the GZipStream class. This simple utility can interoperate with the *nix gzip, and can read and write .gz files.

What about .zip files?

As a companion to that example, enclosed here as an attachment (see the bottom of this post) is an example class that can read and write zip archives. It is packaged as a re-usable library, along with a couple of companion example command-line applications that use the library. The example apps are useful on their own, for example for just zipping up a directory quickly, from within a script or a command prompt. But the library will be useful also, for including zip capability in arbitrary applications. For example, you could include a zip task in an MSBuild session, or in a smart-client GUI application. I've included both the binaries and source code here.

This is the class diagram for the ZipFile class, and the ZipEntry class, as generated by Visual Studio 2005. The ZipFile is the main class.

If you don't quite grok all that notation, I will point out a few highlights. The ZipFile itself supports a generic IEnumerable interface. What this means is you can enumerate the ZipEntry objects within the ZipFile using a foreach loop. Makes usage really simple. (Implementing that little trick is also dead-simple, thanks to the new support for iterators in C# 2.0, and the "yield return" statement.)
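To show the iterator trick in isolation, here is a minimal sketch. SketchArchive and SketchEntry are hypothetical stand-ins invented for this example; this is not the actual ZipFile source, just the general shape of a class that supports foreach via "yield return".

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for an archive entry.
class SketchEntry
{
    public readonly string FileName;
    public SketchEntry(string fileName) { FileName = fileName; }
}

// Hypothetical stand-in for an archive that can be enumerated with foreach.
class SketchArchive : IEnumerable<SketchEntry>
{
    private readonly List<SketchEntry> entries = new List<SketchEntry>();

    public void Add(string fileName)
    {
        entries.Add(new SketchEntry(fileName));
    }

    // The compiler turns this method into an enumerator class for us.
    public IEnumerator<SketchEntry> GetEnumerator()
    {
        foreach (SketchEntry e in entries)
            yield return e;
    }

    // The non-generic interface just forwards to the generic version.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

static class IteratorDemo
{
    static void Main()
    {
        SketchArchive archive = new SketchArchive();
        archive.Add("a.txt");
        archive.Add("b.txt");

        foreach (SketchEntry e in archive)
            Console.WriteLine(e.FileName);
    }
}
```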

Using the ZipFile class

You can extract all files from an existing .zip file by doing this:

        ZipFile zip = ZipFile.Read("MyZip.zip");
        foreach (ZipEntry e in zip)
        {
            e.Extract("NewDirectory");
        }

Of course, you don't have to extract the files; you can just fiddle with the properties on the ZipEntry objects in the collection. Creating a new .zip file is also simple:

        ZipFile zip = new ZipFile("MyNewZip.zip");
        zip.AddDirectory("My Pictures", true); // AddDirectory recurses subdirectories
        zip.Save();

You can add a directory at a time, as shown above, and you can add individual files as well. It seems to be pretty fast, though I haven't benchmarked it. It doesn't compress as much as WinZip; this library is at the mercy of the DeflateStream class, and that class doesn't support multiple levels of compression.

Hmmm, What About Intellectual Property?

I am no lawyer, but it seems to me the ZIP format is PKWARE's intellectual property. PKWARE has some text in their zip spec which states:

PKWARE is committed to the interoperability and advancement of the .ZIP format. PKWARE offers a free license for certain technological aspects described above under certain restrictions and conditions. However, the use or implementation in a product of certain technological aspects set forth in the current APPNOTE, including those with regard to strong encryption or patching, requires a license from PKWARE. Please contact PKWARE with regard to acquiring a license.

I checked with PKWARE for more on that. I described what I was doing with this example, and got a nice email reply from Jim Peterson at PKWARE, who wrote:

From the description of your intended need, no license would be necessary for the compression/decompression features you plan to use.

Which would mean, anyone could use this example without a license. But like I said, I am no lawyer.

Later,

-Dino

[Update 11 April 2006 1036am US Pacific time]: After a bit of testing it seems that there are some anomalies with the DeflateStream class in .NET. One of them is, it performs badly with already compressed data. The DeflateStream in .NET can actually Inflate the size of the stream. The output is still a valid Deflate stream, but it isn't compressed as you'd like. The DotNetZip implementation works around this by using the STORE method rather than DEFLATE when data size increases.  But still....
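The fallback idea can be sketched roughly like this: deflate the data into memory, and keep the original bytes if deflating did not make them smaller. This is only an illustration of the approach under my own assumptions, not the library's actual code; StoreOrDeflate and Pack are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, for illustration only.
static class StoreOrDeflate
{
    // Returns the bytes to write for an entry. "stored" is set to true when
    // the original bytes are returned (the ZIP STORE method), and false when
    // the deflated bytes are returned (the ZIP DEFLATE method).
    public static byte[] Pack(byte[] data, out bool stored)
    {
        byte[] deflated;
        using (MemoryStream output = new MemoryStream())
        {
            // Disposing the DeflateStream flushes the final block.
            using (DeflateStream ds = new DeflateStream(output, CompressionMode.Compress))
            {
                ds.Write(data, 0, data.Length);
            }
            deflated = output.ToArray();
        }

        // If deflating did not shrink the data, keep the original bytes.
        stored = deflated.Length >= data.Length;
        return stored ? data : deflated;
    }

    static void Main()
    {
        byte[] repetitive = new byte[10000];   // all zeros, compresses well
        byte[] random = new byte[10000];       // stands in for already-compressed data
        new Random(1).NextBytes(random);

        bool stored;
        Pack(repetitive, out stored);
        Console.WriteLine("repetitive data stored: {0}", stored);
        Pack(random, out stored);
        Console.WriteLine("random data stored: {0}", stored);
    }
}
```

Buffering the whole entry in memory keeps the sketch short; a real implementation would more likely stream to the output and rewind if the size grew.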

The base class library team is aware of this anomaly and is considering it. If you'd like to weigh in on this behavior, and I encourage you to do so if you value this class, use the Product Feedback Center, see here.

 


Comments

# Tim Heron said:
It's a pity that GZipStream doesn't work with streams over 4GB in size.  GNU gzip can cope with >4GB files so why this limitation ? http://www.gzip.org/#faq10
05 April 06 at 12:06 PM
# TravisOwens said:
Don't get PKZIP and GZIP confused, while PKZIP came about 2-3yrs before GZIP, GZIP is not a *nix implementation of PKZIP, although both deflators support each other's format.

If .Net is using GZIP's method (which fully works in PKZIP, WinZip, etc) then the licensing is a non issue anyways.
05 April 06 at 1:48 PM
# Jeff Parker said:
Ohhhh, brilliant I was looking for something like this the other day when I was playing in the compression namespace.
05 April 06 at 1:49 PM
# DotNetInterop said:
Travis, thanks for reminding us all that Gzip and Pkzip are different.  I should have pointed that out.  Both Gzip (the *nix utility) and Pkzip (the commercial tool) do standard compression (see the IETF RFC's mentioned in the original entry).  But Gzip compresses a single file, and pkzip builds compressed archives.
I think you are jumping to conclusions when you suggest that because .NET's compression library uses the Deflate algorithm, there are no IP issues.  PKWARE defines the format for .zip files, and that format is theirs to license.  They don't have a license on the compression format, but on the surrounding data that describes the multi-file archive.

I contacted PKWARE and they agreed that the usage here is covered under their "free" license terms.  But it is still PKWARE's intellectual property, and it is still a license, though I did not pay for it.  Keep in mind, I am not a lawyer.
-Dino
06 April 06 at 3:34 PM
# DotNetInterop said:
Tim, I don't know about the 4GB limit - if it is real, and if so, why it is there.  
I would suggest posting to http://forums.microsoft.com/MSDN/showforum.aspx?forumid=39&siteid=1

-Dino
06 April 06 at 3:55 PM
# Nicolai's Blog said:
Link list 08.04.2006

Software
HFS - Http File Server - A small file server that needs no installation. Source code is also available. [via Portable Freeware]

.Net
.NET System.IO.Compression and zip files - A zip library based
08 April 06 at 7:11 PM
# Colin said:
Hi

I have managed to implement the zipping of a file, but when I close down my Form, I get a

"System.MissingMethodException". This seems to relate to the zip.Dispose method.

Any ideas on how I can correct this?

Many thanks
Colin
10 May 06 at 11:41 AM
# This Old Code said:

Finding a way to use system.io.compression for zip archives

16 February 07 at 10:28 PM
# Jon Galloway said:

Overview SharpZipLib provides best free .NET compression library, but what if you can't use it due to

25 October 07 at 4:08 AM
# Mohan said:

Hai

i have used it in my code but have a problem with it. my folder size before zipping is 599 kb and after zipping is 998 kb so what is the way to zip  it in  a way  to decrease the file size

28 August 08 at 2:07 AM
# DotNetInterop said:

Mohan - What version of the Zip Library are you using? You will want to get the latest version of this library, from www.codeplex.com/DotNetZip. It corrects the problem where some files get "inflated" when they are zipped.

02 September 08 at 11:56 AM
# john said:

the 4gig limit is probably due to the physical memory limitations of addressing space in the system.

IE the .net framework is not designed to touch the disk whilst it compresses

If you wanted to exceed this then you would have to write a pagefile like system to store processed data whilst using the 4 gig as a buffer

18 December 08 at 10:41 AM
# DotNetInterop said:

@john, the 4GB limit mentioned above has nothing to do with the physical memory of the machine. It is related to the DeflateStream implementation. I haven't explored it well, so I cannot say more than that.

It does not have to do with whether the implementation is streaming or not (viz, "not designed to touch the disk while compressing").

23 December 08 at 1:19 AM
