A Simple Puzzle

A Simple Puzzle

Rate This
  • Comments 24

My original version of the histogram-generating code that I whipped up for the previous episode of FAIC contained a subtle bug. Can you spot it without going back and reading the corrected code?

private static int[] CreateHistogram(IEnumerable<double> data, int buckets, double min, double max)
{
  int[] results = new int[buckets];
  double multiplier = buckets / (max - min);
  foreach (double datum in data)
  {
    int index = (int) ((datum - min) * multiplier);
    if (0 <= index && index < buckets)
      results[index] += 1;
  }
  return results;
}

Note that of course if this were production code, instead of demo code I whipped up in five minutes, it would be a lot more robust in its error detection; the bug that I am looking for is a bona fide error in the logic of the method, rather than things like "the method does not verify that min is smaller than max", and so on.

A hint: the first time I ran this code and displayed the results, the generated histogram looked fine. Then I made a small change to the arguments and the resulting histogram image was obviously wrong. Can you spot the defect?

  • I just watched a presentation by Bret Victor last night called Inventing on Principle, and he talks about an interface that gives algorithm developers immediate insight into the results of their functions.  If you haven't seen it, you should watch this section of it: www.youtube.com/watch

    Watching the video and then reading this post made me think: Man, if Eric had this tool, he may not have had that bug. :)

  • @Ali I don't think that has anything to do with the bug, does it? Mathematicians would normally write something like (min<=value<max) to more clearly express how value falls inside a range with a lower and an upper bound. Indeed some languages such as Python actually allow that syntax. In languages like C# that don't allow that syntax, (min<=datum && datum<max) is the closest you can get.

  • This all sounds very familiar. This bears some relation to one of Eric's previous posts, "What's the difference? Remainder vs modulus":

    blogs.msdn.com/.../what-s-the-difference-remainder-vs-modulus.aspx

    In fact, I think in the comments I linked to an instance of a very similar bug.

  • @weeble0 I think you misunderstand me, my comment was "first check if the current point is in range, and if it is, only then calculate the index" seems a more natural way to code the algorithm. I wondered if there is some disadvantage I cannot see. (Except that, then there would be no interesting bug/puzzle to talk about)

  • @Ali Ah, sorry, I did misunderstand you.

  • private static int[] CreateHistogram(IEnumerable<double> data, int buckets, double min, double max)

    {

     int[] results = new int[buckets];

     double multiplier = buckets / (max - min);

     foreach (double datum in data)

     {

       int index = (int) ((datum - min) * multiplier);

       if (0 <= index && index < buckets)

         results[index] += 1;

       min = datum;

     }

     return results;

    }

  • The moral of all this: static typing does not replace testing ;-)

  • @Ali

    The disadvantage of that method is, that you now need to be *very* careful about rounding errors. I would not be surprised if there are values min<=x<max for which the calculated bucket is outside of the array range. Floating point rounding is a tricky beast.

    So I prefer Eric's code (with a fix for his bug of course).

    ---

    Since IMO the build in Random class in .net sucks, I'm currently creating my own RNG. A similar issue crept up:

    One feature is generating a uniform double that satisfies min<=x<max from a uniform double that satisfies 0<=x<1.

    The naive implementation if this is simply: `return min+(max-min)*Uniform()`. Unfortunately if `Uniform()` returns `1-Epsilon`, the result might get rounded to `max`, violating the contract.

    So I've added a do-while loop ensured the result is smaller than max.

  • (max - min) / buckets ?

Page 2 of 2 (24 items) 12