Larry Osterman's WebLog

Confessions of an Old Fogey
Blog - Title

I can make it arbitrarily fast if I don’t actually have to make it work.

I can make it arbitrarily fast if I don’t actually have to make it work.

Rate This
  • Comments 27

Digging way back into my pre-Microsoft days, I was recently reminded of a story that I believe was told to me by Mary Shaw back when I took her Computer Optimization class at Carnegie-Mellon…

During the class, Mary told an anecdote about a developer “Sue” who found a bug in another developer’s “Joe” code that “Joe” introduced with a performance optimization.  When “Sue” pointed the bug out to “Joe”, his response was “Oops, but it’s WAY faster with the bug”.  “Sue” exploded “If it doesn’t have to be correct, I can calculate the result in 0 time!” [1].

Immediately after telling this anecdote, she discussed a contest that the CS faculty held for the graduate students every year.  Each year the CS faculty posed a problem to the graduate students with a prize awarded to the grad student who came up with the most efficient (fastest) solution to the problem.  She then assigned the exact same problem to us:

“Given a copy of the “Declaration of Independence”, calculate the 10 most common words in the document”

We all went off and built programs to parse the words in the document, inserting them into a tree (tracking usage) and read off the 10 most frequent words.  The next assignment was “Now make it fast – the 5 fastest apps get an ‘A’, the next 5 get a ‘B’, etc.”

So everyone in the class (except me :)) went out and rewrote their apps to use a hash table so that their insertion time was constant and then they optimized the heck out of their hash tables[2].

After our class had our turn, Mary shared the results of what happened when the CS grad students were presented with the exact same problem.

Most of them basically did what most of the students in my class did – built hash tables and tweaked them.  But a couple of results stood out.

  • The first one simply hard coded the 10 most common words in their app and printed them out.  This was disqualified because it was perceived as breaking the rules.
  • The next one was quite clever.  The grad student in question realized that they could write the program much faster if they wrote it in assembly language.  But the rules of the contest required that they use Pascal for the program.  So the grad student essentially created an array on the stack and introduced a buffer overflow and he loaded his assembly language program into the buffer and used that as a way of getting his assembly language version of the program to run.  IIRC he wasn’t disqualified but he didn’t win because he circumvented the rules (I’m not sure, it’s been more than a quarter century since Mary told the class this story).
  • The winning entry was even more clever.  He realized that he didn’t actually need to track all the words in the document.  Instead he decided to track only some of the words in the document in a fixed array.  His logic was that each of the 10 most frequent words were likely to appear in the first <n> words in the document so all he needed to do was to figure out what "”n” is and he’d be golden.

 

So the moral of the story is “Yes, if it doesn’t have to be correct, you can calculate the response in 0 time.  But sometimes it’s ok to guess and if you guess right, you can get a huge performance benefit from the result”. 

 

 

[1] This anecdote might also come from Jon L. Bentley’s “Writing Efficient Programs”, I’ll be honest and say that I don’t remember where I heard it (but it makes a great introduction to the subsequent story).

[2] I was stubborn and decided to take my binary tree program and make it as efficient as possible but keep the basic structure of the solution (for example, instead of comparing strings, I calculated a hash for the string and compared the hashes to determine if strings matched).  I don’t remember if I was in the top 5 but I was certainly in the top 10.  I do know that my program beat out most of the hash table based solutions.

  • Given that specification, I would have hard-coded the results as a constant too; though if required to write real code, I would have started with a hash table, as they're generally even easier to write than tree programs. The extra-speed magic would be in calculating the best hash function, well-matched to the input document.

  • Yeah, the hard-coded results seem like the best option given the specification. :)

    Which is why the exact input shouldn't be part of an assignment specification.  You should probably get a C if your program parses the given example input ok, but the As and Bs should be reserved for the programs that do well on unknown (but within spec) input.

    This reminds me actually of an assignment I had to do at Uni; we were asked to write an algorithm that was faster at sorting a large array of numbers than the CRT's qsort.  IIRC, I ended up still using a qsort style algorithm but used the numbers' bit patterns to drive the partitioning.  Worked quite well, actually.

  • Without knowing about the text in advance, the winning entry can't know that the input text doesn't end with a new word repeated 1000 times. IMO it still falls into the "fast but wrong" category.

    If you were allowed to write the code with assumptions about the input text then the entries which hardcoded the results should have been allowed (at least if the question explicitly said that the Declaration of Independence was the one and only input).

    I guess "wrong" is a fuzzy thing. Maybe the winning entry is "right often enough" and Joe's bug that caused Sue to explode was more egregious than that.

  • Interesting story about calculating the wrong answer in 0 time.

    I've read something similar in Kernighan and Plauger's book on programming style. There they mention how many programmers don't use bound-checking compilers because they're inefficient. KP then quip: "Presumably this means that it is vital to get the wrong answers quickly."

    I loved that sentence!

  • This is a somewhat odd contest, since the winner is the person who most accurately guesses judges' interpretation of the ambiguous rules.

    They should have either required a general solution, or stated the rules in an otherwise non-ambiguous way. For example, they could have said that the program must get 8 out of the top 10 words right on a body of text unknown to the students.

    But, the moral you concluded with is still a good moral. :-)

  • So if the Declaration had read "When in the course of human events... inalienable rights... etc... we mutually pledge to each other our lives, our fortunes and our sacred potato potato potato potato potato potato potato potato potato potato potato potato potato potato potato potato potato potato potato potato", the winning entry would in fact not have won. Also, the United States might look very different today, but that's another matter.

    If you argue that the nature of the input is part of the specification, it's hard to argue that the guy who hard-coded the answer was breaking the rules. The problem would have to be formulated much more carefully to make the winning entry unambiguously adhering to the rules and the hard-coding entry unambiguously in violation of them. After all, you can make things arbitrarily fast if you can redefine "make it work".

  • Let us commence a journey into the much travelled topic of hashing. Advancments in hashing can be linked to many areas. Given that its influence pervades our society, several of todays most brilliant minds seem incapable of recognising its increasing relevance to understanding future generations. Inevitably hashing is often misunderstood by those most reliant on technology, who are yet to grow accustomed to its disombobulating nature. Though I would rather be in bed I will now examine the primary causes of hashing.

    Social Factors

    As Reflected in classical mythology society is complicated. When blues legend 'Bare Foot D' remarked 'awooooh eeee only my dawg understands me' [1] he borrowed much from hashing. While deviating from the norm will always cause unrest amongst ones peers, hashing bravely illustrates what we are most afraid of, what we all know deep down in our hearts.

    Special care must be taken when analysing such a delicate subject. On the other hand anyone that disagrees with me is an idiot. Just as a dog will return to its own sick, society will return to hashing, again and again.

    Economic Factors

    The preceding section may have shed some light on society but to really understand man you must know how he spends his money. We shall examine the Watkis-Teeth-Pulling model, a complex but ultimately rewarding system. Annual

    Military

    Budget  

     hashing

    There is no longer a need to argue the importance of hashing, it is clear to see that the results speak for themselves. The question which surfaces now is, how? In spite of the best efforts of The World Bank the annual military budget world wide are driven entirely by hashing. Perhaps to coin a phrase hashingeconomics will be the buzz word of the century

    Political Factors

    No man is an island, but what of politics? Politicians find it difficult to choose between what has become known in politics as - 'The two ways' - 0

    To quote a legend in their own life time, Bonaventure Rock 'Man's greatest enemy is complacency with regards to personal and political hygiene.' [2] Primarily, he is referring to hashing. To paraphrase, the quote is saying 'hashing wins votes.' Simple as that.

    The question which we must each ask ourselves is, will we allow hashing to win our vote?

    Conclusion

    To conclude hashing has played a large part in the development of man in the 20th Century and its influence remains strong. It fills a hole, it stimulates and never hides.

    What a great essay. Finally a word from super-star Courteney Pfeiffer: 'It's been nice educating you.' [3]

    --------------------------------------------------------------------------------

    [1] Bare Foot D - Classic - 1967 Stinton Records

    [2] Rock - Roll It Up - 1977 - F. Lower Publishing

    [3] Sham Magazine - Issue 124 - Monkey Books

  • The anecdote actually comes from one of Gerald Weinberg's books (Psychology of Computer Programming, maybe?).  It was Weinberg showing the software team at a car manufacturer how to avoid the horrible logic bugs in their ad-hoc designed program which spits out a list of parts for a given model of car. (It did crazy things like specify that 6 wheels were needed etc.)

    So it was Weinberg who, when told his program was much slower said, "I can make it take 0 seconds if I don't have to make it actually work."

  • This reminds me of recurring tests that were given at my school. It was a basic programming test, with exercises of increasing difficult, starting at such simple things as "reverse a string", up to moderately difficult ones like "count the number of islands of the same character in an ASCII image".

    The test was administered every month, with different exercises each time, to all students that hadn't gottent yet the requested mark.

    Now, the subject of each exercise had a small section with the list of allowed external functions, which generally amounted to "nothing except write() to output the result in the console". Some exercises allowed malloc(), but after a few months, everybody was used to seeing "nothing" and didn't read the section anymore.

    One day, the subject of the last exercise was "Implement the MD5 algorithm", followed by a copy-paste of the specification of the algorithm. You could see the expression change on the face of most students as they read the subject and started to despair... except for a me and a few other students that had read the "allowed functions" section of the subject and started grinning. That day, it read "all of libc".

    It took me one minute to type "man 3 md5" and write the three calls to MD5_Init/MD5_Update/MD5_Final.

    Now that was an easy subject... (I got the points for it of course !) The next month the MD5 exercise was back, but with a "nothing" limitation again.

  • AKA "If I can't be right I might as well be fast."

  • I'd be pretty upset if my hard coded program was disqualified for breaking the rules while the winning program determined "n" in advance, unless the programmer discovered a value of "n" that worked for all documents.  This seems rather unlikely in light of potato potato potato potato potato potato potato potato potato.

  • I'm reminded of a line from Bruce McKinney's "Hardcore Visual Basic" - "It doesn’t matter how fast your code is if it doesn’t work". The Acknowledgements give the original utterer as Gene Apperson.

  • The moral of the story is actually "noone likes a smartass".  :)

    The guy who was disqualified did not break the letter of the rules, but quite obviously wasn't actually paying any attention to the point of the exercise.

    I'ld agree that specifying the input text in advance is kinda silly - but an important developer skill is "interpretation of specifications that are less precise than actual code or even over-specify details that clearly don't actually need to be that specific" and therefore there is something to be said for not being an ass.

    Another moral - sometimes a catch-all "the judges get to decide who wins, and no argument will be tolerated" is better than trying to be "fair" and listing all the rules and inevitably get caught up in debates about how anyone who finds a loophole is technically obeying the rules.

    Typical rules lawyers who think that rules are necessary because the people running the competition are big meanies are unlikely to be popular enough that anyone will care how upset they are.  :)

    Another funny story:

    Many years ago, a quiz show did the "guess the name of the person, progressively more detailed clues, the first player to get the answer wins but if you're wrong, you're out" thing. Usually the format was "who am I? clue 1... clue 2.." This one time, however, one smartass got as far as "who am I?", hit his buzzer, and gave the name of the host. Which was technically correct.

    Fortunately for him, the officials let him away with it.  :)

  • The thing that you should remember about running programming contests is that you need to give inputs that are likely to cause the programs to fail.

    Students often struggle with error checking and edge cases. I'd bet most of the submissions would have fallen apart if the input had fewer than ten words.

  • I'm surprised the winning entry didn't use a radix tree. It would be perfect for this because its insertion time is O(n) where n is the length of the word you're inserting (like a hash table).

    Of course, since you know the input, you could easily write a hash function that was both perfect (no collisions) and relatively fast to compute.

Page 1 of 2 (27 items) 12