Bug Psychology

Fixing bugs is hard.

For the purposes of this posting, I’m talking about those really “crisp” bugs -- those flaws which are entirely due to a failure on the developer’s part to correctly implement some mechanistic calculation or ensure some postcondition is met. I’m not talking about oops, we just found out that the product name sounds like a rude word in Urdu, or the specification wasn’t quite right so we changed it or the code wasn’t adequately robust in the face of a buggy caller. I mean those bugs where you were asked to compute some value and you just plain get the result wrong for some valid inputs.

Let me give you an example.

The first bug I ever fixed at Microsoft as a full-time employee was one of those. To understand the context of the bug, start by reading this post from the early days of FAIC, and then come back.

Welcome back, I hope you enjoyed that little trip down memory lane as much as I did.

Now that you understand how a VT_DATE is stored, you can make sense of this bizarre behaviour in VBScript:

print DateDiff("h", #12/31/1899 18:00#, #12/30/1899 6:00#) / 24
print DateDiff("h", #12/31/1899 18:00#, #12/29/1899 6:00#) / 24

This prints –1.5 and –2.5, as you’d expect. There’s a day and a half between 6 AM December 30th and 6 PM December 31st, and two and a half days between the other two dates. This is perfectly understandable. But if you just subtract the dates:

print #12/31/1899 18:00# - #12/30/1899 6:00#
print #12/31/1899 18:00# - #12/29/1899 6:00#

You get 1.5 and 3, not 1.5 and 2.5. Because of the bizarre date format that VT_DATE uses, when you convert dates to numbers you cannot safely subtract them if they straddle the magic zero date. That’s why you need helper methods such as “DateDiff” and “DateAdd”.
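
To see where the 3 comes from, here is a minimal sketch in C of the arithmetic. It assumes the VT_DATE layout described in the linked post: signed whole days from 12/30/1899, with the time of day stored as an unsigned fraction. The names make_vtdate and naive_days_between are hypothetical, made up for this illustration; they are not part of VBScript or OLE Automation.

```c
#include <assert.h>

/* Hypothetical helper: encode a date as a VT_DATE-style double.
   days         = signed whole days from 12/30/1899
   day_fraction = portion of that day gone by, 0 <= day_fraction < 1
   The fraction is stored unsigned after the decimal point, so for negative
   dates it makes the number MORE negative even though time moves forward. */
double make_vtdate(int days, double day_fraction)
{
    assert(day_fraction >= 0.0 && day_fraction < 1.0);
    return days < 0 ? days - day_fraction : days + day_fraction;
}

/* Hypothetical helper: what the raw "-" operator effectively does. */
double naive_days_between(double earlier, double later)
{
    return later - earlier;
}
```

With these, 12/31/1899 18:00 is make_vtdate(1, 0.75), or 1.75, and 12/29/1899 06:00 is make_vtdate(-1, 0.25), or -1.25. The naive difference is 1.75 - (-1.25) = 3, although only 2.5 days actually elapsed. When both dates fall on or after the zero date, as with 0.25 and 1.75, the same subtraction gives the correct 1.5.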

The bug I was assigned was that testing had discovered a particular pair of dates which DateDiff was not subtracting correctly. I took a look at the source code for one of the helper methods that DateDiff used to do one of the computations it needed along the way. To my fresh-out-of-college eyes it looked something like this:

if (frob(x) > 0 && blarg(y)) return x - y;
else if (frob(x) < blarg(y) && blah_blah(x) > 0 || blah_de_blah_blah_blah(x,y)) return frob(x) - x + y + 1;
else if…

There were seven such cases.

My urge was to dive right in and add an eighth special case that fixed the bug. But my ability to get it right in the face of all this complexity concerned me. It seemed like this was an awfully complicated function already for what it was trying to do.

I researched the history of the code a bit and discovered that in fact variations on this bug had been entered… seven times. Each special case in the code corresponded to a particular bug that had been “fixed”, a term I use guardedly in this case. A great many of those “fixes” had actually introduced new bugs, regressing existing correct behaviour, which then in turn were “fixed” by adding special cases on top of the broken special cases that had been added to “fix” previous bugs.

I decided that this coding horror would end here. I deleted all the code (all seven lines of it! I was bold!) and started over.

Deep breath.

Spec the code requirements first. Then design the code to meet the spec. Then write the code to the design.

Spec:

  • Input: two doubles representing dates in VT_DATE format.
  • VT_DATE format: signed integer portion of double is number of days since 12/30/1899, unsigned fractional part is portion of day gone by. For example: –1.75 = 12/29/1899, 6 PM.
  • Output: double containing number of days, possibly fractional, between two dates.  Differences due to daylight savings time, and so on, to be ignored.

Design strategy:

  • Problem: Some doubles cannot simply be subtracted because negative dates are not absolute offsets from epoch time
  • Therefore, convert all dates to a more sensible date format which can be simply subtracted.

Code:

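// Convert a VT_DATE to a plain signed offset in days from the epoch,
// so that two converted values can simply be subtracted.
double SensibleDate(double vtdate)
{
  // Negative dates like -2.75 mean "go back two days, then forward .75 days".
  // Transform that into -1.25, meaning "go back 1.25 days".
  return DatePart(vtdate) + TimePart(vtdate);
}

// Number of days, possibly fractional, from vtdate1 to vtdate2.
double DateDiffHelper(double vtdate1, double vtdate2)
{
  return SensibleDate(vtdate2) - SensibleDate(vtdate1);
}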

I already had helper methods DatePart and TimePart, so I was done. The new code was shorter, far more readable, generated smaller, faster machine code and most important, was clearly correct. No special cases; no bugs.
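
The DatePart and TimePart helpers are not shown in this post. For readers who want to try the approach, here is a self-contained sketch in C with stand-in versions of them. Their bodies are assumptions about what such helpers would do (truncate toward zero, and take the fraction without its sign), not the original scripting engine code.

```c
#include <assert.h>

/* Stand-in (assumed behaviour): signed whole-day part, truncated toward zero. */
double DatePart(double vtdate)
{
    /* The cast below is only defined for values that fit in a long long;
       any realistic date is far inside this range. */
    assert(vtdate > -1e15 && vtdate < 1e15);
    return (double)(long long)vtdate;
}

/* Stand-in (assumed behaviour): unsigned fraction of the day, always >= 0. */
double TimePart(double vtdate)
{
    double fraction = vtdate - DatePart(vtdate);
    return fraction < 0.0 ? -fraction : fraction;
}

/* Convert a VT_DATE to a plain signed offset in days from 12/30/1899. */
double SensibleDate(double vtdate)
{
    return DatePart(vtdate) + TimePart(vtdate);
}

/* Days, possibly fractional, from vtdate1 to vtdate2. */
double DateDiffHelper(double vtdate1, double vtdate2)
{
    return SensibleDate(vtdate2) - SensibleDate(vtdate1);
}
```

With these definitions, SensibleDate(-2.75) is -1.25, and DateDiffHelper(1.75, -1.25) is -2.5, which matches the DateDiff result for the 12/31/1899 18:00 and 12/29/1899 6:00 pair.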

It’s not that my coworkers were dummies. Far from it. These were smart people. But computer geek psychology is such that it is very easy to narrow-focus on the immediately wrong thing, and try to tweak it until it does the right thing.

When faced with these sorts of “crisp” bugs, I try to restrain myself from diving right in. Rather, I try to psychoanalyze the person – who is, of course, usually my past self – who caused the bug. I ask myself “how was the person who wrote the buggy code fooled into thinking it was correct?” Did they not have a clear specification of what the method was supposed to do? Was it misleading? Did they have a clear plan for how to proceed? If so, where did it go wrong?

If there never was either a spec or a plan, then for all you know the whole thing might only be working by sheer accident. There could be any number of design flaws in the thing that just haven’t come to light yet. Editing such a beast means adding unknown to unknown, which seldom leads to good results. Sometimes coming up with a new spec, a new plan and scrapping an existing bug farm is the best way to proceed.

For many years after that, I would ask how to implement DateDiffHelper as my technical question for fresh-out-of-college candidates that I was interviewing for the scripting dev team. I reasoned that if that was the sort of problem I was given on my first day in the office, then maybe that would be a reasonable question to ask a candidate.

When you ask the same question over and over again, you really get to see the massive difference in aptitude between candidates. I had some candidates who just picked up a marker, wrote a solution straight out on the board, wrote down the test cases they’d use to verify it, ran a few of the tests in their head, and then we’d have another half hour to chat about the weather. And I had some candidates who tried earnestly to write the version using special cases, even after they got stuck on the third special case and I specifically told them “you might consider transforming this bad format into something more pleasant to work with”. I’d point out a bug and immediately they’d write down code for another special case, rather than stopping to think about the fact that they’d just written buggy code three times already and told me it was correct three times.

  • There are bugs for which there are no fixes.  Accept it.  Software is just a flawed thought process made real.  But that doesn't mean all bugs can be fixed.  Because underneath the hood of that bug, there are 700 other bugs.

    Just try and stop whining about other coders.  Fact is software can never ever be bug free. Ever.  

    Ever.

  • Fixing bugs is hard. In the context of this article, I am talking about fairly “crisp” bugs -- those flaws which

  • @bob

    Most implementations of Hello World are pretty bug free. It's also perfectly possible to structure your code so that it's provably correct. Read some of Edsger Dijkstra's works if you wish to be convinced of that.

    I might be able to agree that sufficiently complex software is astronomically unlikely to be bug free  (for a sufficiently vague definition of 'sufficiently complex').

  • I have to disagree with Bob's comment of "There are bugs for which there are no fixes".

    There is ALWAYS a fix of some type for any bug.

    As an example (going back to the mid-1980s), there was one 4GL RDBMS company which provided a new release of their product. Certain functionality which worked great in the previous version was now crippled. Their response: "It is not a bug, it is a feature with a negative performance impact".

    Thus the bug was fixed.

  • Essentially, code modification usually can be done to refactor existing code to be  simpler, better or more robust.  Also, improve the inline comments with notes about the cases handled by the code.

    Continual application of this slow 0.01% improvement will turn a large system from a mess into much less of a mess within a year or two.  Six Sigma.

  • I remember a similar bug from the days of Visual C 4.1 (mid-90s):

    We were coding a healthcare system and old patients, born before 12/30/1899, got their happy birthday greetings etc. on the wrong day when we upgraded from VC 4.0 to VC 4.1. I think it was a datediff() flavored function that had the same bug as you mention VBScript had. It got fixed in VC 4.2.

    BTW, there are so many facets re time and dates and computers: I remember in the 80's when Microsoft acquired Lattice C and later released their own Microsoft C 5 and later version 6, great products. One of the time functions had a safeguard so that when run on a PC without a hardware clock, it would instead return a date from 1955. And I always wondered if that was Bill Gates's birthday...

    Rgrds Henry

  • @Steve Owens, I totally agree with your assumptions. What I was trying to point out (not very successfully, a bug to my queue :-) ) is that here we have a management problem that results in dodgy code. Or a corporate culture problem, if you will.

    There are some developers who really are over-sensitive about their code and believe in a sort of Samurai code (as in "code of honour") demanding that whoever let a bug slip should perform a harakiri. But most people are twitchy simply because they are worried that, if they are noticed to have stuffed up a number of times, they will be singled out for the next staff reduction, with all that implies.

    The remedy to that, I believe, is to promote a culture of fixing the problem rather than the blame: find out what's stuffed up and fix it. In light of that, the question you propose  -- "How did they (I) get fooled?" -- may be rephrased into "How did it get stuffed up?" WHO exactly goofed is less of an issue here, because  -- and this is an additional assumption to those you propose -- EVERYONE can, and DOES, get goofed, stuffs up, or whatever is your local dialect expression for it. :-)

    A good team leader/manager should be able to drive home the idea that there is no shame in admitting a mistake and putting your back into fixing it; persisting in your erroneous ways is much, MUCH more pathetic, whether it is done out of misplaced pride, or for political (politi-corporate, I mean) reasons of never, ever, wanting to be seen as someone capable of making a mistake.

    And the way to achieve that is to lead by example, I reckon...

  • Grant, as interesting as the story is about how the standard railway gauge is derived from ancient Roman standards, it's purely made-up. http://en.wikipedia.org/wiki/Standard_gauge says why: there were many gauges used in the old days of railroads. Since old railways (often used to deliver coal from mines) were horse-driven, it's not surprising that the gauges were mostly around 5 feet, about the width of two horses. Therefore it shouldn't be surprising that ancient Roman wheel ruts were also about the width of two horses. The similarities are only a coincidence.

    The current standard gauge only came about as a convergence of many different standards, not because of a direct lineage to Roman chariots.

  • Interesting read. I was going to ask the same question as Faraz asked. :)

  • Welcome to the 49th edition of Community Convergence. The big excitement of late has been the recent release

  • My favorite maxim to apply in situations like these is that, by the time you're done, the code should look like the author knew what he was doing from the beginning.

  • Consider the product out there that depends on the original, buggy implementation. Your crisp, clean solution breaks backwards compatibility.

  • @Denis, when fixing bugs, it is very helpful to know who did it, so you know the style of thinking of whoever did it.

    Trying to figure out why somebody did something in a certain way is far more difficult if you don't have background info (like "this guy always follows the happy path, be wary of unchecked previous errors").

    The "How did he/she get fooled on this" is easier to answer then.

  • Thanks for this post. I think DateAdd and DateDiff are really useful.

  • Gabe and Grant, just to ruin the old classic a bit more... Not only does the gauge only have a very vague relation to Roman chariots, the parts of the shuttle were never transported through a tunnel like that described in the myth. Who comes up with these things?
