Larry Osterman's WebLog

Confessions of an Old Fogey

I can make it arbitrarily fast if I don’t actually have to make it work.


Digging way back into my pre-Microsoft days, I was recently reminded of a story that I believe was told to me by Mary Shaw back when I took her Computer Optimization class at Carnegie-Mellon…

During the class, Mary told an anecdote about a developer, “Sue”, who found a bug that another developer, “Joe”, had introduced with a performance optimization.  When “Sue” pointed the bug out to “Joe”, his response was “Oops, but it’s WAY faster with the bug”.  “Sue” exploded: “If it doesn’t have to be correct, I can calculate the result in 0 time!” [1].

Immediately after telling this anecdote, she discussed a contest that the CS faculty held for the graduate students every year.  Each year the CS faculty posed a problem to the graduate students with a prize awarded to the grad student who came up with the most efficient (fastest) solution to the problem.  She then assigned the exact same problem to us:

“Given a copy of the Declaration of Independence, calculate the 10 most common words in the document.”

We all went off and built programs that parsed the words in the document, inserted them into a tree (tracking usage counts), and read off the 10 most frequent words.  The next assignment was “Now make it fast – the 5 fastest apps get an ‘A’, the next 5 get a ‘B’, etc.”

So everyone in the class (except me :)) went out and rewrote their apps to use a hash table so that their insertion time was constant and then they optimized the heck out of their hash tables[2].
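
For the curious, here’s roughly what both versions boil down to – a minimal sketch in Python rather than the Pascal the contest required (my reconstruction, with a made-up file name and an assumed definition of a “word”):

    import re
    from collections import Counter

    def top_ten(path):
        with open(path) as f:
            # Assumed definition of a "word": maximal runs of letters, lowercased.
            words = re.findall(r"[a-z]+", f.read().lower())
        # Counter is a hash table underneath, so each insertion is constant time.
        return Counter(words).most_common(10)

    print(top_ten("declaration_of_independence.txt"))  # hypothetical file name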

After our class had our turn, Mary shared the results of what happened when the CS grad students were presented with the exact same problem.

Most of them basically did what most of the students in my class did – built hash tables and tweaked them.  But a couple of results stood out.

  • The first one simply hard coded the 10 most common words in their app and printed them out.  This was disqualified because it was perceived as breaking the rules.
  • The next one was quite clever.  The grad student in question realized that the program would run much faster if it were written in assembly language.  But the rules of the contest required that the program be written in Pascal.  So he essentially declared an array on the stack, introduced a buffer overflow, loaded his assembly language version of the program into the buffer, and used that as a way of getting it to run.  IIRC he wasn’t disqualified, but he didn’t win because he had circumvented the rules (I’m not sure – it’s been more than a quarter century since Mary told the class this story).
  • The winning entry was even more clever.  He realized that he didn’t actually need to track all the words in the document.  Instead he decided to track only some of the words in the document in a fixed array.  His logic was that each of the 10 most frequent words was likely to appear somewhere in the first <n> words of the document, so all he needed to do was figure out what “n” was and he’d be golden (there’s a sketch of the idea just after this list).
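
Here’s that winning idea under one plausible reading of the trick – the exact scheme isn’t recorded in the story, and the cutoff n is the guess that makes or breaks it:

    def top_ten_sampled(words, n=500):
        # The "fixed array": only words that appear among the first n words of
        # the document are ever counted; n = 500 is my guess, not his.
        counts = {w: 0 for w in words[:n]}
        for w in words:
            if w in counts:
                counts[w] += 1
        # If any of the true top 10 first shows up after word n, the answer is wrong.
        return sorted(counts, key=counts.get, reverse=True)[:10]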

 

So the moral of the story is “Yes, if it doesn’t have to be correct, you can calculate the response in 0 time.  But sometimes it’s ok to guess and if you guess right, you can get a huge performance benefit from the result”. 

 

 

[1] This anecdote might also come from Jon L. Bentley’s “Writing Efficient Programs”, I’ll be honest and say that I don’t remember where I heard it (but it makes a great introduction to the subsequent story).

[2] I was stubborn and decided to take my binary tree program and make it as efficient as possible but keep the basic structure of the solution (for example, instead of comparing strings, I calculated a hash for the string and compared the hashes to determine if strings matched).  I don’t remember if I was in the top 5 but I was certainly in the top 10.  I do know that my program beat out most of the hash table based solutions.
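
For what it’s worth, that trick looks something like this (a reconstruction in Python, not my original Pascal): order the tree by hash value and fall back to a real string compare only on a hash tie – comparing hashes alone would silently conflate colliding words.

    class Node:
        def __init__(self, h, word):
            self.h, self.word, self.count = h, word, 1
            self.left = self.right = None

    def insert(node, word, h=None):
        if h is None:
            h = hash(word)               # computed once, then compared cheaply
        if node is None:
            return Node(h, word)
        # Compare the integer hashes first; the strings are only compared on a
        # hash tie, which also guards against collisions.
        if (h, word) == (node.h, node.word):
            node.count += 1
        elif (h, word) < (node.h, node.word):
            node.left = insert(node.left, word, h)
        else:
            node.right = insert(node.right, word, h)
        return node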

  • The point of the exercise is to return the top 10 used words in the Declaration as quickly as possible. I'd argue he's the -only- one who understood the point of the exercise.

  • This reminds me of a solution I did in college. The instructor asked us to write our own brand new sort algorithm. Most students tweaked bubble sort or quicksort, etc... I turned in what I learned later was the six pack sort.

    Randomly shuffle the whole array, then check whether each element is no greater than its neighbour. If not, shuffle again. The instructor laughed and gave me an A: in theory it could be more efficient than quicksort, but in reality it could never, ever finish.
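
    For the record, that algorithm is usually known as bogosort; a minimal sketch:

        import random

        def six_pack_sort(a):
            # Shuffle until the array happens to come out sorted.  Best case
            # O(n) - "more efficient than quicksort" on one lucky shuffle -
            # but expected O(n * n!), so for any real input it never finishes.
            while any(a[i] > a[i + 1] for i in range(len(a) - 1)):
                random.shuffle(a)
            return a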

  • This reminds *me* of the debate about voting machines.  There always seems to be a lobby group who insist on getting election results quickly and aren't too worried about accuracy.  Given that specification, I always recommend a random number generator; not only is it fast, but it's also much more efficient than actual voting. :-)

  • @Scott: that's yet another good illustration of the pitfalls of specifications, because "the top 10 most common words" is a meaningless phrase when you have fewer than 10 words to create that list from. An argument can be made that the programs would be allowed to simply fail in that case (in whatever way they choose), since there's no correct answer.

    Of course, it's trivial to extend the concept of a "top 10" to allow for fewer entries in such a case, or to reformulate the problem so that the problematic "top 10" doesn't appear. You (and many others) probably expect the programmer to do this on his own initiative, since nobody likes a failing program if there's a reasonable way to make it not fail.

    In fact, I'd say the most valuable thing to take away from this exercise is not the actual algorithm but an object lesson in how specifications should be done, can be done, will be done and how the pragmatic programmer has to cope with reality. Programming is not mechanically translating specifications into programs -- that's the compiler's job.

  • Hmm, hard-coding the results is exactly what I'd do in 'live' code, given that specification.

    I use that technique occasionally. If I can convert a rather complicated algorithm into a lookup table, that's usually a win.  So I write a program(1) to read the raw data and write out a C/C++ structure; this gets compiled into program(2).

    Program(1) will be executed once so efficiency is not a concern. Program(2) is nice and fast because it has a simpler algorithm.

    More skilled persons than I might write program(1) with C++ template metaprogramming.
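
    A toy version of the technique, with made-up file names and a stand-in transform: program(1) reads the raw data and emits the table as source, which program(2) compiles in and indexes.

        # program(1): runs once at build time, so its speed doesn't matter.
        def expensive_transform(v):
            return v * v                # stand-in for the complicated algorithm

        values = [int(line) for line in open("raw_data.txt")]  # hypothetical input
        with open("table.h", "w") as out:
            out.write("static const int TABLE[] = {\n")
            out.write(",\n".join("    %d" % expensive_transform(v) for v in values))
            out.write("\n};\n")
        # program(2) then just indexes: int result = TABLE[i];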

  • A classic challenge problem ... see Bentley, J.,  Knuth, D., and McIlroy, D. (1986) Programming pearls: a literate program Communications of the ACM 29(6) (direct link http://cacm.acm.org/magazines/1986/6/10222-programming-pearls/pdf?dl=no). I still lean toward the shell script. Even though the run time is longer, the development time is *very* fast.

  • My version of this story was an assignment to have the fastest program that would calculate whether an arbitrary number in the range 2..10000 was prime.  My program beat the pants off everybody - it was just a simple lookup table of 10000 bools.

    I got docked a couple points for being a smartass but otherwise it was accepted.
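
    The same precompute-everything idea in miniature – a sketch that builds the 10000-entry table with a sieve, after which every query is a single index:

        LIMIT = 10000
        is_prime = [True] * (LIMIT + 1)
        is_prime[0] = is_prime[1] = False
        for p in range(2, int(LIMIT ** 0.5) + 1):
            if is_prime[p]:
                for m in range(p * p, LIMIT + 1, p):
                    is_prime[m] = False

        def prime(n):
            return is_prime[n]          # O(1): the whole "algorithm" is the table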

  • Of note, guessing is exactly the same thing as "getting it wrong most of the time."

    Guessing may make the code faster, but it hardly makes it any more useful.


  • if (input.size < SOME_THRESHOLD) {
        use_lookup_table(input); // hard to be faster than that
    } else {
        use_scalable_but_more_complicated_solution(...);
    }

    In the Declaration of Independence case, the size of the input is known so the threshold can be set accordingly.  Is this still cheating?  Probably.

    I would have been interested to see a grading scale that tested performance across a range of inputs, some of which were very large (_War and Peace_ comes to mind.)

    Practically speaking, such an approach is problematic (in all but the most performance-critical of scenarios) since you have twice as many code paths to test.

  • That exact same story, with almost the same punchline, occurs in Code Complete: two devs arguing over the speed of an algorithm.  "Your code takes x seconds per input; mine is much faster."  "Yes, but your code doesn't work; if mine doesn't have to, I can have it return instantly."  Or something to that effect.  I wonder if this is some sort of development story archetype.

  • I'd argue that, based on the information stated here, the hard-coded answer is the right solution. If they wanted something different, they should have asked, "Given any text document..."

    I remember a million years ago on the ACM programming team, a problem was, "Print the decimal value of 80 factorial." On then-current hardware, it was impossible to do this within the two-minute time limit. The right answer was to do it quick and dirty, let it run, and then submit a program that just prints the (hard-coded) answer.
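
    A sketch of that two-step trick (trivial on modern hardware, but the shape is the same): run the slow computation once on your own time, then generate the program you actually hand in.

        import math

        # Step 1: quick and dirty; let it run offline for as long as it takes.
        answer = math.factorial(80)

        # Step 2: emit the one-line program to submit, hard-coded answer and all.
        print('print("%d")' % answer)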
