Compressing messages in WCF part one - Fixing the GZipMessageEncoder bug


The out-of-the-box compression options for WCF are limited in .NET 4.0. However, a sample is provided for GZip compression that shows how to write your own MessageEncoder that wraps the output of another encoder and applies GZip to the messages. If your environment has limited network bandwidth, compressing the messages going across the wire could be very helpful. In this series, we will look at how to use the GZip message encoder and what effect it has on performance.

Download the WCF/WF Samples from here: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=35ec8682-d5fd-4bc3-a51a-d8ad115a8792&displaylang=en

The first thing to do is examine the code for GZipMessageEncoder itself. Download and install the WCF/WF samples to the directory of your choice. Then navigate to the WCF/Extensibility/MessageEncoder/Compression/CS directory and open the solution. Right-click on the solution in the Solution Explorer pane and choose "Set Startup Projects". Choose the "Multiple startup projects" radio button and use the dropdowns to change the actions of the client and service projects to "Start". Then you should be able to hit F5. The service and client windows should come up and execute, exchanging a couple of messages back and forth.

The GZipMessageEncoder works by using another encoder underneath. In the sample, buffered messages are used. This means that the entire message is stored in a single contiguous byte[]. We can examine the effect of compression on the buffered message by altering the code a bit to write the sizes before and after compression. To do this, open the GZipMessageEncoderFactory.cs file. Navigate to the GZipMessageEncoder class and the WriteMessage method that returns an ArraySegment<byte>. Alter the code as shown below:

//One of the two main entry points into the encoder. Called by WCF to encode a Message into a buffered byte array.
public override ArraySegment<byte> WriteMessage(Message message, int maxMessageSize, 
    BufferManager bufferManager, int messageOffset)
{
    //Use the inner encoder to encode a Message into a buffered byte array
    ArraySegment<byte> buffer = innerEncoder.WriteMessage(message, maxMessageSize, 
        bufferManager, 0);
    //Compress the resulting byte array
    System.Diagnostics.Debug.WriteLine("Original size: {0}", buffer.Count);
    buffer = CompressBuffer(buffer, bufferManager, messageOffset);
    System.Diagnostics.Debug.WriteLine("Compressed size: {0}", buffer.Count);
    return buffer;
}

This just writes the size of the buffer to the debug output, before and after compression, so we can see how well our messages are being compressed. Hit F5 again to run and then bring up the Output window in Visual Studio. You should see something like this:

Original size: 751
Compressed size: 1024
Original size: 426
Compressed size: 512
Original size: 2714
Compressed size: 1024
Original size: 2382
Compressed size: 1024

There are a couple of problems here. First, it looks like small messages actually get bigger. Second, the compressed sizes are all exact powers of two.

The first problem could be explained somewhat by the second problem. Let's examine the CompressBuffer code to see if we can find out what's wrong.

//Helper method to compress an array of bytes
static ArraySegment<byte> CompressBuffer(ArraySegment<byte> buffer, BufferManager bufferManager, 
    int messageOffset)
{
    MemoryStream memoryStream = new MemoryStream();
    
    using (GZipStream gzStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
    {
        gzStream.Write(buffer.Array, buffer.Offset, buffer.Count);
    }

    byte[] compressedBytes = memoryStream.ToArray();
    int totalLength = messageOffset + compressedBytes.Length;
    byte[] bufferedBytes = bufferManager.TakeBuffer(totalLength);

    Array.Copy(compressedBytes, 0, bufferedBytes, messageOffset, compressedBytes.Length);

    bufferManager.ReturnBuffer(buffer.Array);
    ArraySegment<byte> byteArray = new ArraySegment<byte>(bufferedBytes, messageOffset, 
        bufferedBytes.Length - messageOffset);

    return byteArray;
}

The last argument passed to the ArraySegment<byte> constructor, bufferedBytes.Length - messageOffset, is what's causing our problem. The bufferedBytes variable is a buffer taken from the BufferManager. The BufferManager will give you a buffer that is at least as large as what you asked for, usually rounding up to the nearest power of two. This means that when we use bufferedBytes.Length - messageOffset as the number of bytes in the ArraySegment, we're reporting the size of the pooled buffer rather than the size of the compressed data. That is also why the small messages appeared to grow. Instead, replace bufferedBytes.Length - messageOffset with compressedBytes.Length. Run the test again to see the improvements:

Original size: 751
Compressed size: 592
Original size: 426
Compressed size: 377
Original size: 2714
Compressed size: 874
Original size: 2382
Compressed size: 670

This looks much better! For those of you who are curious, I've already reported this bug to the samples team and it should be cleared up in the next release.
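
Putting the fix together, the corrected helper might look roughly like the sketch below. This is not the samples team's official fix, just the one-line change described above applied to the sample's code. The wrapper class name, the Main method, the test payload and the pool sizes passed to CreateBufferManager are assumptions added for illustration, and the project needs a reference to System.ServiceModel.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.ServiceModel.Channels; // BufferManager (reference System.ServiceModel)
using System.Text;

// Hypothetical wrapper class, only here so the sketch compiles on its own.
static class CompressBufferSketch
{
    // Same logic as the sample's helper, except for the Count of the returned segment.
    public static ArraySegment<byte> CompressBuffer(ArraySegment<byte> buffer,
        BufferManager bufferManager, int messageOffset)
    {
        byte[] compressedBytes;
        using (MemoryStream memoryStream = new MemoryStream())
        {
            using (GZipStream gzStream = new GZipStream(memoryStream, CompressionMode.Compress, true))
            {
                gzStream.Write(buffer.Array, buffer.Offset, buffer.Count);
            }
            // The GZipStream must be closed before reading, so the trailer is flushed.
            compressedBytes = memoryStream.ToArray();
        }

        // TakeBuffer may return an array that is longer than the requested length.
        byte[] bufferedBytes = bufferManager.TakeBuffer(messageOffset + compressedBytes.Length);
        Array.Copy(compressedBytes, 0, bufferedBytes, messageOffset, compressedBytes.Length);

        bufferManager.ReturnBuffer(buffer.Array);

        // The fix: report the compressed length, not the length of the pooled array.
        return new ArraySegment<byte>(bufferedBytes, messageOffset, compressedBytes.Length);
    }

    static void Main()
    {
        // Pool limits are arbitrary values chosen for this demo.
        BufferManager manager = BufferManager.CreateBufferManager(1024 * 1024, 64 * 1024);

        // A made-up, repetitive payload standing in for an encoded message.
        byte[] payload = Encoding.UTF8.GetBytes(new string('x', 200) + "<Envelope>hello</Envelope>");
        byte[] input = manager.TakeBuffer(payload.Length);
        Array.Copy(payload, input, payload.Length);

        ArraySegment<byte> compressed = CompressBuffer(
            new ArraySegment<byte>(input, 0, payload.Length), manager, 0);

        Console.WriteLine("Original size: {0}", payload.Length);
        Console.WriteLine("Pooled array length: {0}", compressed.Array.Length);
        Console.WriteLine("Compressed size: {0}", compressed.Count);

        manager.ReturnBuffer(compressed.Array);
    }
}
```

The pooled array length printed by Main can be larger than the compressed size, which is exactly the difference the original code was reporting by mistake.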

  • Thank you for this!  Just what I needed.

  • What if one wants to use it in a custom binding that encrypts the message? How do you compress it before it is encrypted? (I know that there is a security issue with doing this.)

  • Hi there,

    Is there a way to support dynamic compression, i.e. check if the client accepts gzip?

    Regards, Jeroen

  • Hi Jeroen,

    Yes there is. Actually it's built into WCF now.

    Take a look here: msdn.microsoft.com/.../aa751889(v=vs.110).aspx

    Scroll down to the heading "Compression and the Binary Encoder".

  • Dustin,

    Question 1:

    Why is

    byte[] bufferedBytes = bufferManager.TakeBuffer(totalLength);

    not

    byte[] bufferedBytes = bufferManager.TakeBuffer(compressedBytes.Length);

    Question 2:

    Why is

    Array.Copy(compressedBytes, 0, bufferedBytes, messageOffset, compressedBytes.Length);

    not

    Array.Copy(compressedBytes, 0, bufferedBytes, 0, compressedBytes.Length);

    Question 3:

    Why is

    ArraySegment<byte> byteArray = new ArraySegment<byte>(bufferedBytes, messageOffset,

           bufferedBytes.Length - messageOffset);

    not

    ArraySegment<byte> byteArray = new ArraySegment<byte>(bufferedBytes, 0,

           compressedBytes.Length);

    Thanks for any feedback.

  • @David

    It might help to look at this example as well: msdn.microsoft.com/.../ms195359(v=vs.110).aspx

    My understanding is that the messageOffset is telling you where the message should start inside the buffer. There is no content in that part of the buffer that you have to copy over, but WCF is asking you to leave space for something there. I'm not sure if that's for a header or for some other use.

  • @Dustin

    I initially thought that as well.  The more I thought about it though the more I didn't understand if the basis of the message handling was the BufferManager (to avoid a lot of array allocations to be GC) why wouldn't there just be another buffer for things of that nature rather than intermixing it with a message content buffer?

    The help file description of the messageOffset parameter was not very enlightening.

    Dug downstream into the TextMessageEncoder and the BinaryMessageEncoder to see if they stuffed anything in that space and they don't appear to.

    referencesource.microsoft.com/.../TextMessageEncoder.cs.html

    referencesource.microsoft.com/.../BinaryMessageEncoder.cs.html

    Further found interesting the notable use of 0 for messageOffset in

    public ArraySegment<byte> WriteMessage(Message message, int maxMessageSize, BufferManager bufferManager)

           {

               ArraySegment<byte> arraySegment = WriteMessage(message, maxMessageSize, bufferManager, 0);

               return arraySegment;

           }

    found in referencesource.microsoft.com/.../MessageEncoder.cs.html

    So that leaves upstream .... I think.

  • @David - Ya, it's definitely weird. It's one of those things that I think made sense to someone a long time ago but they didn't document it.
