Spot the defect: Bad comparisons, part two

Spot the defect: Bad comparisons, part two

Rate This
  • Comments 21

Suppose I want to sort a bunch of strings into order first by length, and then, once they are sorted by length, sort each group that is the same length by some other comparison. We can easily build such a device with higher-order programming:

static Comparison<string> FirstByLength(Comparison<string> thenBy)
{
  return (string x, string y) =>
  {
    // Null strings are sorted before zero-length strings; remember, we need to provide a total ordering.
    if (x == null && y == null)
      return 0;
    if (x == null)
      return -1;
    if (y == null)
      return 1;
    if (x.Length > y.Length)
      return 1;
    if (x.Length < y.Length)
      return -1;
    // They are the same length; sort on some other criterion.
    return thenBy(x, y);
  };
}

Super. This idea of composing new comparison functions out of old ones is pretty neat. We can even built a reversal device:

static Comparison<string> Reverse(Comparison<string> comparison)
{
  return (string x, string y) => -comparison(x, y);
}

Something is subtly wrong in at least one of these comparison functions. Where's the defect?

.

.

.

.

.

.

.

.

Let's restate that contract again. The comparison function returns a negative integer if the first argument is smaller than the second, a positive integer if the first is greater than the second, and zero if they are equal. Any negative integer will do, and in particular, Int32.MinValue is a negative integer. Suppose we have a bizarre comparison function that returns Int32.MinValue instead of -1:

Comparison<string> bizarre = whatever;

and we compose it:

Comparison<string> reverseFirstByLength = Reverse(FirstByLength(bizarre));

Suppose two strings are equal in length and bizarre returns Int32.MinValue for those strings. Reverse should return a positive number, but -Int32.MinValue either throws an exception (in a checked context) or returns Int32.MinValue right back (in an unchecked context). Remember, there are more negative numbers that fit into an integer than positive numbers, by one.

The right implementation of Reverse is either to spell it out:

static Comparison<string> Reverse(Comparison<string> comparison)
{
  return (string x, string y) =>
  {
    int result = comparison(x, y);
    if (result > 0) return -1;
    if (result < 0) return 1;
    return 0;
  };
}

Or, as some commenters noted, to swap left and right:

static Comparison<string> Reverse(Comparison<string> comparison)
{
  return (string x, string y) => comparison(y, x);
}

Next time: One more defect to spot.

  • Couldn't you implement reverse as "return (string x, string y) => comparison(y,x);" instead?

  • Why don't you change the implementation to:

    return (string x, string y) => comparison(y, x);

    ?

  • @Brian and @Patrick: You can. It's just that that doesn't *appear* immediately to be any better than returning the result of negating the original call. You basically need to remember the int.MinValue problem. Once you know it, it's easy to remember it... but I suspect that relatively few people who hadn't already come across it would pick that up in code review.

  • (Oops, hadn't spotted the longer implementation at the end of the post.)

    I would suggest that the "comparison(y, x)" implementation works in any *sane* implementation of IComparer<T>. No doubt you could produce weird implementations (e.g. ones that favoured the first argument somehow) but I'd argue that at that point all bets are off anyway. I can't think of any guarantees provided by Sort in terms of which value will be used for the first parameter and which for the second.

  • If we generalized Comparison to something that took two different data types then I could see why you'd need the longer version.  E.g. something like Comparison <int,string> where you compare the value of the int to the length of the string.  But I can't think of a great example where such a comparison would be useful.

  • I wonder why you introduced FirstByLength anyway...? It seems to me that Reverse(bizarre) will have the same unwanted behaviour - or am I missing something?

  • I'm assuming the next defect to spot isn't the missing '};' in the last code sample.

    According to Raymond's article that Eric referenced on Thursday: (blogs.msdn.com/.../55408.aspx)

    "...your comparison function needs to follow these rules:

    ...

    •Anti-Symmetry: Compare(a, b) has the opposite sign of Compare(b, a), where 0 is considered to be its own opposite. "

    So if we can make that assumption about the comparison function, we should be able to reverse it by reversing arguments. If we can't make that assumption, I agree with Jon, "all bets are off anyway."

    This is of course ignoring the fact that Eric's implementation discards any information in the return value of the inner comparison other than the sign. I'll leave it to others to decide whether that's a good thing or a bad thing.

  • @Tim Goodman: A comparison on two different types is useful when you want to update one sorted list using another sorted list of updates.  You need to compare elements from the two lists to do a merge, but update objects won't necessarily be of the same type as the objects being updated.

  • "I would suggest that the "comparison(y, x)" implementation works in any *sane* implementation of IComparer<T>"

    I suspect that a not-uncommon way to implement the Comparison<T> method is, when given input that is either numerical or easily mapped to something numerical, to just return the result of subtracting the second input from the first.

    While I would just delegate to the numerical type's own CompareTo() method, a) that might not always be possible (for custom numerical types that are not as fully implemented as the built-in types), and b) I can't say it's technically insane to use subtraction instead (in fact, in a micro-optimization sort of way it could be more efficient in some cases, trading the math for a method call).

    And subtraction makes even more sense if the difference calculation somehow just falls out of the data type naturally without an actual project to a numerical data type (not that I can think of any valid examples of that at the moment…but I don't feel comfortable ruling them out at the moment :) ).

    At least, this is what I came up with when I starting thinking about why the Comparison<T> (and related comparison APIs) are defined this way, rather than just requiring -1, 0, and 1 as return values in the first place.

  • @pete.d, consider subtraction and these results. Comparison<int> comparer = (a, b) => a - b; int x = comparer(3, 4); int y = comparer(int.MinValue, int.MaxValue); What's x? What's y?

  • I don't know if it's a "defect" per se, but composing comparisons like this is going to become particularly unwieldy as soon as you want to sort *first* by *descending* size and *then* in ascending order of some other comparison function. It seems it would be cleaner not to clutter up our length comparison function with comparison composition functionality, and instead write a separate function to compose an arbitrary number of comparisons. (Or if we're really crazy on the functional programming to write a function to compose two comparisons and then apply a fold to it.) Then instead of something nasty like:

    var comparison = Reverse(FirstByLength(Reverse(CompareAlphabetic)));

    we can write more naturally:

    var comparison = Compose(Reverse(CompareByLength), CompareAlphabetic);

  • I think using extension methods on the Comparison<T> type would probably be a cleaner way to compose, and would more closely match the LINQ syntax people are used to:

    var comparison = CompareByLength.Reverse().ThenBy(CompareAlphabetic);

  • @Anthony P: "int y = comparer(int.MinValue, int.MaxValue); What's x? What's y?"

    You are missing the point. I'm not talking about a general purpose comparison function. I'm talking about a scenario where the inputs are known to work fine for subtraction.

  • Now, wait a sec: "once they are sorted by length (!!!), sort each group that is the same length by some other comparison," isn't that what we ar supposed to do? If so, then why are we sorting by "some other comparison" within the same function that sorts by length, which is clearly before they are sorted by length??? Maybe my math is bad; maybe the result will be the same... but my gut feeling, plus a bit of military background that says "do exactly what you are told to do", are against this approach. Sorry if it does not sound very convincing...

  • @Denis, he's saying this: sort by length, let another comparison be a tiebreaker for matching lengths. Perhaps it's a straight alphabetic sort, maybe it's by the number of vowels. In SQL, it would be Order By Length, Foo.

Page 1 of 2 (21 items) 12