Spot the defect: Bad comparisons, part four

Spot the defect: Bad comparisons, part four

Rate This
  • Comments 16

One more easy one. I want to "sort" a list into a random, shuffled order. I can do that by simply randomizing whether any two elements are greater than, less than, or equal to each other:

myList.Sort((x, y) => (new Random()).Next(-1, 2));

That generates a random -1, 0 or 1 for every comparison, right? So it will sort the list into random order, right?

.

.

.

.

.

.

.

There are multiple defects here. First off, clearly this violates all our rules for comparison functions. It does not produce a total ordering, and in fact it can tell you that two particular elements are equal and then later tell you that they have become unequal. The sort algorithm is allowed to go into infinite loops or crash horribly when given such an ill-behaved comparison function. And in fact, some implementations of sorting algorithms attempt to detect this error and will throw an exception if they determine that the comparison is inconsistent over time.

Second, every time this thing is called it creates a new Random class seeded to the current time, and therefore it produces the same result over and over again; hardly random.

Shuffling is not sorting; it is the opposite of sorting, so don't use a sort algorithm to shuffle. There are lots of efficient shuffle algorithms that are easy to implement.

  • Plus, it's unlikely this would produce a uniform distribution or any other well-defined distribution (assuming sort() even completes without an exception).

  • I mean, even if you fix the random seeding defect.

  • Long ago, when I worked as a developer in the Excel group, we used Excel for everything (I assume they still do :) ). Including shuffling. By sorting. :)

    The technique was simple: in a worksheet, one column of the data you want to sort (this was often a list of the programmers, for morale-related lotteries :) ). In the adjacent column, a fill-enter of "=rand()". Then, copy/paste special/values to lock the random numbers in place, finish up with a data/sort… to rank the rows by the random number.

    You could do the same exact thing with a deck of cards. Or whatever.

    So, you see: shuffling IS sorting. At least, some of the time. :)

  • A classic sort-based shuffle is to assign a random number to each piece of data and then sort them on that number.  I am not a fan of that algorithm.  My favorite shuffling algorithm remains Fisher Yates shuffle (aka Knuth Shuffle): en.wikipedia.org/.../Fisher–Yates_shuffle .

  • If you really wanted to use sort for shuffling, would there be harm in producing a tuple or a key/value pair, sort those, then split them up again for results?

  • @Joe: Sure it would pe possible, but here are two problems.

    First, you must ensure that you don't assign the same random number twice, as your sort algorithm might be stable (i.e. leave input in the smae original order if they have the same random number).

    Secondly, sorting has O(n log(n)) complexity, while shuffling can be done in O(n).

    In other words, for sorting to work, you need to shuffle nu numbers 1 to n. But if you can shuffle those, why not shuffle what you need to shuffle in the first place? Secondly, the sorting is likely to be slower than just plain shuffling.

    Brian is right, have a look at Fisher-Yates shuffling.

  • Even if the randomization worked as expected, wouldn't this maintain some of the original ordering?  With this approach you effectively have 3 queues (-1, 0, 1) so the first item would almost never be the last except on the instance when it's the only item in the +1 queue.  So, on average, the first item would come from the first 3 items, the next from the second 3, etc.

  • It seems like you <em>could</em> get at least a stable result by using the following for comparison:

    list.Sort((x, y) => new Random(x).Next().CompareTo(new Random(y).Next()));

    That is, compare elements based on the first value produced by a Random object seeded using each element. This would at least give you consistency between comparisons of the same numbers. Of course, it would also create an excessive number of superfluous Random objects. I also have no idea how statistically "good" it would be (I'm guessing not very).

    Still, it seems like it's at least one <em>possible</em> way to "shuffle by sorting."

  • @Dan: Comparisons also have to be transitive: if a < b and b < c then a < c.

  • An interesting read on how non-uniform random distribution caused problems for the Microsoft default-browser form can be found at:

    www.robweir.com/.../microsoft-random-browser-ballot.html

  • @Simon Buchan: Thinking out loud, new Random(n).Next() will produce the same value each time it's called.  So really all that's happening is 1..x..y..n is getting mapped to R(1)...R(x)...R(y)...R(n) which will have a different order, but the transitive rule will still apply for this new order because of the consistent result.

  • @Dan Problem is, the Random constructor takes an int as a seed. So, your method only works for List<int>. You could however, grab the hash code for the seed. You can do it as crazily as you want, or can just use something that works: Fisher-Yates.

  • "…but the transitive rule will still apply for this new order because of the consistent result."

    I agree, but it also means that it's not really a random ordering. You will always get the same new ordering every time you apply that algorithm. It's dependent only on the specific RNG being used.

    You might as well just shift all the bits some fixed amount, or multiply by some large prime, or some other kind of "mix-up" transformation and compare that result instead.

  • "The sort algorithm is allowed to go into infinite loops ...   "

    No, Young master.

    proof infinite of random very hard it is.

  • @Simon Buchan: As @Harold D said, the comparison I posed as a hypothetical shuffle-by-sort method should maintain transitivity. But also as @pete.d pointed out, it would be a pretty poor choice as a reusable algorithm since every "shuffle" of the same list would have the same result (this is in addition to the fact that it's inefficient and wasteful to begin with). @Martinho: I had that same thought--the most obvious way to make it generic would be to use GetHashCode. But you're right, I didn't mean to suggest that it was a reasonable thing to do, given the existence of, e.g., Fisher-Yates. It was more just a think-out-loud exercise of one possible way that you COULD try and shuffle with a sorting algorithm. Another more correct way off the top of my head would be something like:

    var r = new Random();

    var shuffled = list.Select(x => new { Key = r.Next(), Value = x })

       .OrderBy(x => x.Key)

       .Select(x => x.Value);

Page 1 of 2 (16 items) 12