The issue

Today I’m investigating a WCF compatibility issue reported by a customer. The scenario: the customer has a WCF service running on a machine with .NET 4.0 and a WCF client running on another machine with .NET 4.0, and everything works fine. Then the customer upgrades the client machine to .NET 4.5, and all of a sudden the client and the service can no longer communicate.

Here is the DataContract being used:

 

[DataContract(Namespace = "http://temp/schemas")]
public abstract class BaseClass
{
    [DataMember(IsRequired = false, EmitDefaultValue = false)]
    private int _rowId;

    [DataMember(IsRequired = false, EmitDefaultValue = false)]
    private DateTime _addDate;

    [DataMember(IsRequired = false, EmitDefaultValue = false)]
    private string _developerNotes;

    [DataMember(Order = int.MaxValue, IsRequired = true, Name = "IsDirty", EmitDefaultValue = true)]
    public bool IsDirty { get; set; }
}

Looking at the XML serialization produced by WCF in .NET 4.0 and in .NET 4.5, we can see that the members are ordered differently. No wonder the client and the service cannot talk to each other.

XML serialization produced by WCF in .NET 4.0:

<BaseClass xmlns="http://temp/schemas" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <IsDirty>true</IsDirty>
    <_addDate>2013-04-01T01:01:02</_addDate>
    <_developerNotes>DeveloperNotes</_developerNotes>
    <_rowId>1</_rowId>
</BaseClass>

XML serialization produced by WCF in .NET 4.5:

<BaseClass xmlns="http://temp/schemas" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <_addDate>2013-04-01T01:01:02</_addDate>
    <_developerNotes>DeveloperNotes</_developerNotes>
    <_rowId>1</_rowId>
    <IsDirty>true</IsDirty>
</BaseClass>

The cause

My initial thought was that some change in WCF had altered the member order, but I would have been surprised if that were the case, as it would be a big breaking change. It turned out not to be the case. But something must have changed, right? Exactly! What changed was the sort algorithm in the BCL.

Wait, you might ask, shouldn’t a change of sort algorithm be transparent to upper-level applications? Well, theoretically it should be, but there are always situations where reality rears its ugly head.

Before WCF serializes a data contract class into XML, it sorts the members according to its ordering rule. In short, a member that has an order specified appears after a member that doesn’t. If both members have an order specified, the one with the smaller order appears first. If both have the same order value, alphabetical order of the names is used.

In fact the ordering rule is a bit complicated, especially when it has to be expressed as a comparer. But WCF managed to implement one simple comparer that worked almost perfectly, until int.MaxValue came into the picture.

Below is the comparer class that WCF implemented. To make sure members that have no order specified appear before members that have one, it uses a trick: it assigns them an order of -1. That way it can treat all members in a unified way.

internal class DataMemberComparer : IComparer<DataMember>
{
    public int Compare(DataMember x, DataMember y)
    {
        int num = x.Order - y.Order;
        if (num != 0)
        {
            return num;
        }
        return string.CompareOrdinal(x.Name, y.Name);
    }
}

Now let’s look at what happens during sorting. Below is the first comparison that happens on .NET 4.0 and on .NET 4.5.

| Runtime | x.Name / x.Order | y.Name / y.Order | x.Order - y.Order | CompareResult |
| --- | --- | --- | --- | --- |
| .NET 4.0 | IsDirty / 2147483647 | _rowId / -1 | -2147483648 | < 0 |
| .NET 4.5 | _rowId / -1 | IsDirty / 2147483647 | -2147483648 | < 0 |

Can you see the difference? And can you see what’s wrong there?

Actually there are two issues here. The first is that the comparison WCF performs on .NET 4.0 produces an incorrect result. It’s a typical wrap-around error that happens in boundary cases. Mathematically, int.MaxValue - (-1) = int.MaxValue + 1 = 2147483648, still a positive value. In reality, however, the largest value a 32-bit signed integer can represent is 2147483647; adding 1 overflows and the result becomes -2147483648 (int.MinValue), which is negative and wrong.
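The wrap-around is easy to reproduce in isolation. The following is a minimal sketch, not WCF code: the names `OverflowDemo`, `NaiveDifference` and `ExactDifference` are made up for illustration. It assumes unchecked arithmetic, which is the C# default and which the `unchecked` keyword makes explicit here.

```csharp
using System;

public static class OverflowDemo
{
    // The same subtraction the comparer performs, made explicitly unchecked
    // so that it wraps around instead of throwing an OverflowException.
    public static int NaiveDifference(int xOrder, int yOrder)
    {
        return unchecked(xOrder - yOrder);
    }

    // The mathematically correct difference, computed in 64 bits.
    public static long ExactDifference(int xOrder, int yOrder)
    {
        return (long)xOrder - yOrder;
    }
}
```

Calling `OverflowDemo.NaiveDifference(int.MaxValue, -1)` returns int.MinValue, while `OverflowDemo.ExactDifference(int.MaxValue, -1)` returns 2147483648, the value the comparer would have needed.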

The second issue is that in .NET 4.5 the change of sort algorithm causes the two members to be passed to the comparer in the reverse order. This time, because a 32-bit signed integer can represent one more negative value than positive values, the comparison produces the right result: -1 - int.MaxValue = -2147483648 = int.MinValue.
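Put together, the two rows of the table mean that the comparer contradicts itself for this pair: it reports "less than" whichever argument comes first. The sketch below makes that visible. `OrderedName` is a hypothetical stand-in for WCF’s internal DataMember type, and `NaiveOrderComparer` is a copy of the subtraction-based logic written for demonstration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for WCF's internal DataMember type.
public sealed class OrderedName
{
    public OrderedName(string name, int order) { Name = name; Order = order; }
    public string Name { get; private set; }
    public int Order { get; private set; }
}

// Copy of the subtraction-based comparison, for demonstration only.
public sealed class NaiveOrderComparer : IComparer<OrderedName>
{
    public int Compare(OrderedName x, OrderedName y)
    {
        int num = unchecked(x.Order - y.Order); // wraps around on overflow
        if (num != 0)
        {
            return num;
        }
        return string.CompareOrdinal(x.Name, y.Name);
    }
}
```

For an `isDirty` member with Order int.MaxValue and a `rowId` member with Order -1, both `Compare(isDirty, rowId)` and `Compare(rowId, isDirty)` return a negative number, so where IsDirty ends up depends on which of the two calls the sort algorithm happens to make.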

The compatibility issue is caused by the combination of these two issues. Even if WCF in .NET 4.0 sorted the members incorrectly, as long as it is talking to .NET 4.0, everything is fine, as the other side expects the same (wrong) order. The same goes for .NET 4.5. But when .NET 4.0 and 4.5 are mixed, the problem arises because the two sides expect different orders.

Had WCF implemented the comparison correctly, it would be resilient to any sort algorithm changes. Had the sort algorithm not changed in .NET 4.5, WCF would continue to produce the same (incorrect) result and interact well across different versions.
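A comparison that cannot overflow is straightforward to write. The sketch below is not the actual WCF code or an official fix. `Member` is a hypothetical stand-in for WCF’s internal DataMember type and `SafeMemberComparer` is an illustrative name. It replaces the subtraction with `int.CompareTo`, which returns a correctly signed result for any pair of values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for WCF's internal DataMember type.
public sealed class Member
{
    public Member(string name, int order) { Name = name; Order = order; }
    public string Name { get; private set; }
    public int Order { get; private set; }
}

public sealed class SafeMemberComparer : IComparer<Member>
{
    public int Compare(Member x, Member y)
    {
        // CompareTo cannot overflow, unlike x.Order - y.Order.
        int byOrder = x.Order.CompareTo(y.Order);
        if (byOrder != 0)
        {
            return byOrder;
        }
        // Same tie-breaker as the original comparer.
        return string.CompareOrdinal(x.Name, y.Name);
    }
}
```

With this comparer, comparing IsDirty (Order int.MaxValue) against _rowId (Order -1) gives a positive result and the reverse call gives a negative one, so any correct sort algorithm places IsDirty last regardless of the sequence of comparisons it performs.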

The options

OK, now that we understand the root cause, let’s have the joy of evaluating the different options.

The first option is to revert the change to the BCL sort algorithm. But the algorithm didn’t do anything wrong: as long as you implement the comparer correctly, it still produces the right sort result, and much faster! The change in .NET 4.5 is for the better. Even if you don’t touch your application, it gets a much faster sort for free when moving from .NET 4.0 to 4.5. In my specific case, the sort used to take 17 comparisons in .NET 4.0, but takes only 6 in .NET 4.5. That should give you an idea of how much faster it is. So 99.99% of .NET 4.5 customers benefit from this change. Does it make sense to revert it just for a few customers? This choice might not be too hard. If you are interested in the new sort algorithm implemented in .NET 4.5, see the Remarks in the Array.Sort documentation, which describe it as an introspective sort (introsort).

The second option is to fix the comparer in WCF. However, this isn’t an easy task, for several reasons. First, we can only change the behavior in .NET 4.5 and later versions, as that’s where the behavior change happened. But in our case the comparison on 4.5 happens to be done correctly, so to make it produce the same result as 4.0 we would need to emulate the wrong behavior of 4.0. That is mission impossible: there is simply no reliable way to emulate it, because whether the overflow occurs depends on the order in which the members reach the comparer, and the wrong behavior can just as well happen in .NET 4.5. The only possible way might be to have WCF carry a copy of the .NET 4.0 sort algorithm in .NET 4.5 so that the comparison sequences remain unchanged, but obviously that’s not an easy task, and there is no guarantee of 100% compatibility. Second, and even worse, .NET 4.5 has been released for a while and a lot of new applications could have been written against the new behavior, so reverting to the 4.0 behavior would break those new applications. What a dilemma!

The third option is to fix the application. Compared with the two options above, this is probably the most straightforward. Just change int.MaxValue to another value (e.g. int.MaxValue - 1) and you’re guaranteed to have the same member order across different .NET versions.
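Applied to the data contract from the beginning of this post, the change is a one-liner. This is a sketch of the third option, assuming nothing else in the contract uses an Order near int.MaxValue:

```csharp
using System;
using System.Runtime.Serialization;

[DataContract(Namespace = "http://temp/schemas")]
public abstract class BaseClass
{
    [DataMember(IsRequired = false, EmitDefaultValue = false)]
    private int _rowId;

    [DataMember(IsRequired = false, EmitDefaultValue = false)]
    private DateTime _addDate;

    [DataMember(IsRequired = false, EmitDefaultValue = false)]
    private string _developerNotes;

    // int.MaxValue - 1 still sorts after every other member of this contract,
    // but (int.MaxValue - 1) - (-1) == int.MaxValue, so the comparer's
    // subtraction no longer overflows.
    [DataMember(Order = int.MaxValue - 1, IsRequired = true, Name = "IsDirty", EmitDefaultValue = true)]
    public bool IsDirty { get; set; }
}
```

Keep in mind that on .NET 4.0 this changes the serialized position of IsDirty from first to last, so the service and the client should presumably both be rebuilt against the updated contract.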

Conclusion

If you have run into this exact issue and have read this far, you already know how to solve it.

I think WCF should check the value of Order when validating data contracts and disallow the use of int.MaxValue, but unfortunately that didn’t happen and might never happen. That’s the whole point of this post: to raise people’s awareness of this issue and hopefully to prevent it from happening in the future.

The only thing you should remember is, as the title says: if you’re using data contracts, don’t ever set DataMember.Order to int.MaxValue. Otherwise this issue will bite you some time later, when your services are deployed onto different versions of .NET (4.0- vs. 4.5+).