A Simple Puzzle

A Simple Puzzle

Rate This
  • Comments 24

My original version of the histogram-generating code that I whipped up for the previous episode of FAIC contained a subtle bug. Can you spot it without going back and reading the corrected code?

private static int[] CreateHistogram(IEnumerable<double> data, int buckets, double min, double max)
{
  int[] results = new int[buckets];
  double multiplier = buckets / (max - min);
  foreach (double datum in data)
  {
    int index = (int) ((datum - min) * multiplier);
    if (0 <= index && index < buckets)
      results[index] += 1;
  }
  return results;
}

Note that of course if this were production code, instead of demo code I whipped up in five minutes, it would be a lot more robust in its error detection; the bug that I am looking for is a bona fide error in the logic of the method, rather than things like "the method does not verify that min is smaller than max", and so on.

A hint: the first time I ran this code and displayed the results, the generated histogram looked fine. Then I made a small change to the arguments and the resulting histogram image was obviously wrong. Can you spot the defect?

  • It seems a bit odd that you are casting an int to an int inside the array index. It looks like you may have mistakenly included a hint.

    You are correct; that was an editing error. I've removed it. Thanks! -- Eric

  • unchecked
    {
       var value = (Int32)4294967296.0;
       Console.WriteLine(value == 0);
       // And so on...
    }

    I'm not following you. How does this relate to the bug in my code? -- Eric

     

  • CreateHistogram discards all data outside the [min, max) range - specifically, any datum equal to max is discarded.  That may be intentional?

    That is intentional; the intention is to display the portion of the histogram between the ranges. However, you're on the right track. Does it in fact discard all data outside the desired range? -- Eric

  • Ah tricky. I admit that it took me a few more minutes to see the issue even after I looked up the correct code.

    (Spoiler: I knew it would somehow be about negative numbers but the code was correct even when min is negative, at that point I lost hope and cheated)

  • Hmph.  And I thought I knew C#. :)

    Eric, please explain why that doesn't behave the way I expected it to.

  • Is it related to the relationship of buckets to max-min? If there were only one bucket (which is permitted even if silly) it looks like you could get an index that matches no bucket.

    I need to read these kind of puzzles on something other than my iPhone.

    The issue I have in mind will come up if there is only one bucket, yes, but the issue I have in mind affects histograms with any number of buckets.  -- Eric

  • [Addition to my first comment]

    Double is a 64bit type and Int32 is 32bit. The cast will erase a part of the index before we ensure that the initial operation is in range.

    Indeed, this is a problem; good catch. However, in this particular case all the data are reasonably sized and the program still has a bug. -- Eric

  • Agreeing with Lars... any datum equal to the maximum incorrectly gets discarded.

    And what's the problem with that? It's a histogram of tens of thousands of random data; the odds of one of them being exactly the maximum are tiny, and if it is, who cares if one datum in a hundred thousand is accidentally not displayed? We're already not displaying any data outside of the given range. -- Eric

  • Isn't this skewed by 0.5 in the cast to int?

    You tell me. Is it, or isn't it? -- Eric

  • I'm stuck on the cast to int... it looks like you should be performing some actual rounding logic there to make sure index is the appropriate value when dealing with a min and max that are negative.

    You're getting close. -- Eric

  • Data that would be in the first bucket before the histogram will be included in the first bucket of the histogram. (Because "(int)x" will evaluate to 0 for x in ]-1,1[)

    You got it! -- Eric

  • Short and sweet:

    any time (datum-min)*multiplier gives you a result between -1 and 0, not inclusive, the cast to int will always result in a zero.

    Thus not all data is excluded, the bucket at index 0 gets one buckets worth of extra data, however much that may be.  If you do the cast after checking the range, you will exclude those values.

    You got it! Beaten by mere seconds by jhominal. -- Eric

  • Not sure about the C# semantics on the (int) cast from double. My first assumption was that it would truncate, so that eventually, with large enough (or small enough) values of datum, index would wrap around back into the allowed range.

    Also, I'm agreeing with the others that if datum == max, it's discarded.

    When I tested casting with gcc, I got this behavior

    (int)4294967297 -> -2147483648

    In fact, casting any large double to int gave me -2147483648.

    If I cast it to a long, then cast the long to an int, I got 1, as I would expect with truncation.

  • Eric, If I were to write this code, I would automatically write it as:

    foreach (var datum in data)

    {

       if(datum>=min && datum<max)

       {

           bla bla

       }

    }

    and there can be no bug even if I'm not careful. Is there any particular reason you did not choose to write that way?

  • The only time it goes into the last bucket is when the value is equal to max.  Did you mean to use a rounding function when going from double to int instead?

    I'm not following your train of thought here. Suppose min is 0, max is 5 and buckets is 10. Multiplier is 2. If one of the data is, say, 4.7 then we multiply by 2 to get 9.4 and then round off to 9, which is the index of the last bucket. But 4.7 is not equal to 5. So I've provided a counterexample to your claim; would you care to clarify the claim? -- Eric

Page 1 of 2 (24 items) 12