The Compare Contract [Kim Hamilton]

The Compare Contract [Kim Hamilton]

Rate This
  • Comments 13

A breaking change?

We recently heard from a customer who observed different sorting behavior in .NET FX 3.5 SP1 compared to 3.5 RTM.

The different behavior was demonstrated with the following code. The class StringWrapper provided a custom sort in which nulls (null StringWrapper references) were moved to the end of the array. To achieve this, StringWrapper implemented IComparable<StringWrapper> and in its CompareTo method, nulls were always greater than non-nulls.

public class MyClass {
    public static void Main() {
        StringWrapper a = new StringWrapper();
        a.Value = "a";
        StringWrapper b = new StringWrapper();
        b.Value = "b";
        StringWrapper c = new StringWrapper();
        c.Value = "c";
        StringWrapper d = null;

        Console.WriteLine("Sort 1:");
        StringWrapper[] src1 = new StringWrapper[] { a, c, d, b };
        Array.Sort(src1);
        PrintStringWrappers(src1); // print elements, method included at end
    }
}

public class StringWrapper : IComparable<StringWrapper> {
    private string _value;
    public string Value {
        get { return _value; }
        set { _value = value; }
    }

    // Recall that CompareTo returns:
    // <0 if this object is less than other
    //  0 if this object is equal to other
    // >0 if this object is greater than other
    public int CompareTo(StringWrapper other) {
        if (other == null) return -1; // nulls are greater than any non-null
        return Value.CompareTo(other.Value);
    }

    public override string ToString() {
        return Value;
    }
}

This custom comparison apparently worked in .NET FX 3.5 RTM, but not .NET FX 3.5 SP1.

3.5 RTM output:

a
b
c
null

3.5 SP1 output:

a
b
null
c

Did the custom comparer really work?

The custom comparer worked in that example, in which only one of the array elements was null. Let’s throw in another null and see what happens:

Console.WriteLine("Sort 2:");
StringWrapper[] src2 = new StringWrapper[] { a, d, d, c, b };
Array.Sort(src2);
PrintStringWrappers(src2);

3.5 RTM output:

a
b
null
null
c

The problem is in the CompareTo method, but it isn’t obvious. The first line in CompareTo actually violates the IComparable<T>.CompareTo contract that any object compares greater than a null reference.

public int CompareTo(StringWrapper other) {
    if (other == null) return -1; // violates CompareTo contract!
    return Value.CompareTo(other.Value);
}

The Compare Contract

IComparable<T>.CompareTo has the following requirements (from the MSDN docs)

For objects A, B, and C, the following must be true:

  • A.CompareTo(A) is required to return zero.
  • If A.CompareTo(B) returns zero, then B.CompareTo(A) is required to return zero.
  • If A.CompareTo(B) returns zero and B.CompareTo(C) returns zero, then A.CompareTo(C) is required to return zero.
  • If A.CompareTo(B) returns a value other than zero, then B.CompareTo(A) is required to return a value of the opposite sign.
  • If A.CompareTo(B) returns a value x that is not equal to zero, and B.CompareTo(C) returns a value y of the same sign as x, then A.CompareTo(C) is required to return a value of the same sign as x and y.

By definition, any object compares greater than a null reference (Nothing in Visual Basic), and two null references compare equal to each other.

The requirement that any object compares greater than null is a bit of a footnote at the end, so it makes sense that this may not be well known (we should highlight this more).

Still, why did the behavior change? (Gory details and a 3.5 RTM performance bug)

The high-level problem is that .NET’s sorting makes assumptions based on the Compare contract, so in some cases sorting will special case null, because it “knows” your comparer will adhere to the contract and return values consistent with that assumption. If you don’t adhere to the contract, you’ll get bitten at some point.

The details are messier. Over the years, a variety of changes were made to improve the performance of Sort. There was a brief window (released in 3.5 RTM) where, during QuickSort, the swap and pivot steps were broken, and in intermediate steps, it would actually unsort certain already sorted arrays (the array had to contain  null). The end result would be correct, but because the elements were incorrectly swapped, sorting took much longer than it should.

We fixed this bug in SP1, and this fix caused the different behavior between RTM and SP1.

Comply with Compare contract

The only way to ensure stability across versions is to comply with the Compare contract. Otherwise you fall prey to implementation quirks (or even bugs) in the runtime sort implementation. We’d like to be free to change it between releases, because we’d like to keep improving the performance!

Enough preaching, what are my options?

The interesting thing is that the StringWrapper implementation gets you mostly there. The apparent goal of the wrapper is to implement a special sort of strings that pushes nulls to the end. You can do what you want if you create a StringWrapper with null values, as follows:

public int CompareTo(StringWrapper other) {
    if (other == null) return 1;
    if (Value == null && other.Value == null) return 0;
    if (Value == null) return 1;
    if (other.Value == null) return -1;
    return Value.CompareTo(other.Value);
}

This actually obeys the compare contract and won’t be brittle to changes in our sort implementation.

Full Source

using System;
using System.Collections.Generic;

public class MyClass {
    public static void Main() {

        StringWrapper a = new StringWrapper();
        a.Value = "a";
        StringWrapper b = new StringWrapper();
        b.Value = "b";
        StringWrapper c = new StringWrapper();
        c.Value = "c";
        StringWrapper d = null;
        StringWrapper e = new StringWrapper();
        e.Value = "e";
        StringWrapper f = new StringWrapper();
        f.Value = "f";


        Console.WriteLine("Sort 1:");
        StringWrapper[] src1 = new StringWrapper[] { a, c, d, b };
        Array.Sort(src1);
        PrintStringWrappers(src1);
        // .NET FX 3.5 RTM output: a b c null
        // .NET FX 3.5 SP1 output: a b null c

        Console.WriteLine("-----");
        Console.WriteLine("Sort 2:");
        StringWrapper[] src2 = new StringWrapper[] { a, d, d, c, b };
        Array.Sort(src2);
        PrintStringWrappers(src2);
        // .NET FX 3.5 RTM output: a b null null c
    }

    private static void PrintStringWrappers(StringWrapper[] swArray) {
        foreach (StringWrapper sw in swArray) {
            if (sw == null)
                Console.WriteLine("null");
            else
                Console.WriteLine(sw);
        }

    }
}

public class StringWrapper : IComparable<StringWrapper> {
    private string _value;
    public string Value {
        get { return _value; }
        set { _value = value; }
    }

    // original CompareTo -- violates Compare contract!
    public int CompareTo(StringWrapper other) {
        if (other == null) return -1;
        return Value.CompareTo(other.Value);
    }

    // alternate CompareTo, which obeys Compare contract and moves null Values to the end
    /*
    public int CompareTo(StringWrapper other)
    {
        if (other == null) return 1;
        if (Value == null && other.Value == null) return 0;
        if (Value == null) return 1;
        if (other.Value == null) return -1;
        return Value.CompareTo(other.Value);
    }
    */

    public override string ToString() {
        return Value;
    }
}
  • PingBack from http://www.easycoded.com/the-compare-contract-kim-hamilton/

  • This requirement treat nulls as the smallest values is documented as applying to:

    - System.IComparable.CompareTo

    - System.IComparable<T>.CompareTo

    - System.Collections.IComparer.Compare

    - System.Collections.Generic.IComparer<T>.Compare

    But does it apply to the System.Comparison<T> delegate?  The documentation does not say.

  • Of course it doesn't Kalle, since it accepts two arguments and you can perform any sorting you like there.

  • IComparer.Compare and IComparer<T>.Compare accept two arguments too, yet they have the same requirement.  I can understand that nulls must be treated specially in IComparable and IComparable<T>, because it is not possible to call ((IComparable)a).Compare(b) when a is null.  In IComparer and IComparer<T> however, either parameter could easily be null.  I don't see any reason to require IComparer and IComparer<T> to sort nulls first, but the documentation requires that anyway.  And it seems quite odd to me if Comparison<T> and IComparer<T>.Compare have different requirements.

  • Yes, that's what I mean. Since the CompareTo is an instance method it won't be called on nulls.

    IComparer<T> and Comparison<T>, be it odd or not, behave differently. Try it yourself:

    using System;

    using System.Collections.Generic;

    public class StringWrapper : IComparable<StringWrapper>

    {

    public string Value { get; set; }

    public StringWrapper(string value)

    {

    Value = value;

    }

    public int CompareTo(StringWrapper other)

    {

    if(other == null)

    return -1;

    return Value.CompareTo(other.Value);

    }

    }

    public class MyComparer : IComparer<StringWrapper>

    {

    public int Compare(StringWrapper first, StringWrapper second)

    {

    if(first == null)

    return 1;

    if(second == null)

    return -1;

    return first.Value.CompareTo(second.Value);

    }

    }

    public class MyClass

    {

    public static void Main()

    {

    StringWrapper a = new StringWrapper("a");

    StringWrapper b = new StringWrapper("b");

    StringWrapper c = new StringWrapper("c");

    StringWrapper d = null;

    StringWrapper e = new StringWrapper("e");

    StringWrapper f = null;

    var array1 = new[] {e, c, f, b, d, a};

    var array2 = new[] {e, c, f, b, d, a};

    var array3 = new[] {e, c, f, b, d, a};

    Array.Sort(array1);

    foreach(var el in array1)

    WL(el != null ? el.Value : "null");

    WL("");

    Array.Sort(array2, new MyComparer());

    foreach(var el in array2)

    WL(el != null ? el.Value : "null");

    WL("");

    Array.Sort(array3, new MyComparer().Compare);

    foreach(var el in array3)

    WL(el != null ? el.Value : "null");

    }

    private static void WL(string text)

    {

    Console.WriteLine(text);

    }

    }

  • Hi Kim, it was enjoyable to get some insights into the BCL. I've been looking through some of the xml classes and I would love to know more about their internal design. I realize this is not the right place to ask, but do you know a better place?

    For example, I'd like to know why XmlTextReader has an internal class called XmlTextReaderImpl.

    Thanks in advance!

    Regards,

    Daniel Lidström,

    Stockholm, Sweden

  • Hi Kalle and Simone,

    Sorry for the delay. It's correct that, with IComparables, nulls may be special cased to avoid calling CompareTo on null.

    If you specify a comparer other than the default comparer, then that comparer will be used.

    Simone - are you saying when you run that repro you get different results for IComparer<T> and Comparison<T>? I get the same output, i.e.:

    a, b, c, e, null, null

    Let me know if you're seeing different behavior.

    To get back to the original question: I'm not sure why our docs say this requirement for IComparer and IComparer<T>, because no special cases are being made. My guess is it's either a doc error or based on an earlier assumption that IComparers should always do this. I'll look into this and reply.

    Thanks,

    Kim

  • @Kim

    no I'm not saying that IComparer and Comparison have different behavior, quite the opposite. I mean that they have different behavior compared to IComparable, that is, they can sort nulls correctly.

    The docs of IComparer<T>.Compare method say:

    A null reference is considered to be less than any reference that is not null.

    To me, this doesn't mean anything. I am doing the custom sorting, so who is considering null references less than non null references?

  • Let's say we want to sort a List<T> and put null values at the end, how can we achieve that goal then ? Seems to me that you impose a non obvious semantic contract for the sake of performance only.

    IComparable, IEquatable and others similar interfaces are broken anyway as their behavior on null is undefined. IMO a functional approach would have been much better (ie Comparison<T> or even IComparer<T>).

    Why have so many different ways (IEquatable<T>, IEqualityComparer<T>, static and virtual Objects.Equals, Comparison<T> ....) to do something as basic as comparing ? It just makes things complicated, inconsistent and cumbersome. It is particularly annoying when one library cannot "talk" to another because one did not do it the same way as the other.

  • I did some detective work and have a recommendation about the Comparison<T> delegate discrepancy. But to make sure we’re all on the same page, I’ll start with a more thorough description of the problem.

    The docs for IComparable.Compare (generic and nongeneric; signatures below) say null should always compare least; i.e. null is less than any non-null reference. As discussed above, that’s reasonable in this case, since CompareTo is an instance method and one may be null.

    IComparable.CompareTo(Object obj)

    IComparable<T>.CompareTo(T other)

    The docs for IComparer.Compare (generic and non-) also say null should compare least

    IComparer.Compare(Object x, Object y)

    IComparer<T>.Compare(T x, T y)

    Interestingly, the Comparison<T> delegate takes 2 args (like IComparer<T>.Compare), but it doesn’t specify that null should always compare least.

    delegate int Comparison<T>(T x, T y);

    So why does IComparer<T>.Compare say null should compare least but Comparison<T> has no such requirement?

    Unfortunately the history is missing, but the most likely explanation seems to be rooted in the Remarks section for IComparer.Compare (note, non-generic). This says the preferred implementation is to use CompareTo method (from the IComparable interface) of one of the parameters. If this is the case, it makes sense that it would expect parity with IComparable and null should compare least.

    IComparer<T>.Compare (the generic version of above) get trickier. It doesn’t mention that the preferred implementation is to use CompareTo, but it does say null should always compare least.

    As discussed in the comments above, special casing null is a reasonable restriction if the compare method is an instance method on the comparands, one of which may be null. However, this isn’t the case for IComparer, IComparer<T>, and Comparison<T> .

    I think the “preferred implementation” guidance is the only reason IComparer.Compare requires that null compare least.  Furthermore, the “preferred implementation” guidance is questionable because it overlooks a key selling point of IComparer – the ability to provide a custom comparison. It’s good that the preferred implementation guidance didn’t carry over to IComparer<T>.Compare. Unfortunately, the requirement to special-case null for IComparer<T>.Compare is inconsistent from this perspective. However, this guidance has been there for a while and possibly the best thing we can do for now is leave this as is.

    Lastly, the IComparison<T> delegate currently has neither restriction. I think we should leave it as is instead of letting legacy restrictions creep around. Why not give the flexibility? Any comments?

  • Certainly the documentation of IComparer.Compare and IComparer<T>.Compare should be clearer on whether nulls first is only a recommendation or a requirement.  I think a reasonable compromise would be:

    - At IComparer.Compare and IComparer<T>.Compare, say that these methods were originally intended to treat nulls as the smallest values, and that some callers may require this but the Sort methods included in the .NET Framework do not.

    - At each Sort method that takes an IComparer or IComparer<T> parameter, say whether or not that Sort method expects the comparer to sort nulls first.

    That would leave the interface specification vague but it's better to be vague explicitly than implicitly.

  • Definitely agree that doc clarification is needed -- I'll follow up on this.

    In fact, I like the phrasing of your first bullet. That will reduce confusion for readers who've been through former iterations of guidance.

    Also, perhaps I should rename this post to the CompareTo contract, to avoid further confusion. :)

  • Hi Daniel,

    Sorry for the delay. Have you tried http://blogs.msdn.com/xmlteam/?

    Thanks,

    Kim

Page 1 of 1 (13 items)