This weekend I got 84th place in the online 10th annual International Conference on Functional Programming contest. Not bad, given that I hadn't any intention in, you know, actually competing.
The task: in this year's competition, you are given 7MB string of characters representing "alien DNA". Like human DNA, each character represents one of four bases -- but instead of A, T, C, G, the four bases are I, C, F and P.
You are also given detailed instructions to decode the alien DNA. You can implement this any language you want, using any tools you deem necessary. Essentially, bases are consumed from the left end of the string, and the consumed bases indicate how to mutate the remainder of the string, as well as generating a series of 7-character strings called "RNA". The process is iterated until the DNA string is completely consumed.
The series of RNA strings represent drawing instructions. "Executing" the RNA should result in the image of an unhappy alien and a crashed spaceship.
And if you are able to get to this point, the puzzle has just begun ...
The real object of the competition is not to generate the image of an unhappy alien, but to "morph" the image into a happy alien and a fixed spaceship. This is done by prepending a "prefix" to the DNA string, which alters how the rest of the (self-modifying) DNA is interpreted.
Oh, and did I mention? You only have 72 hours to write this all up.
As you can imagine, it's as hard as hell. But also extremely fun!
Day 1 (Friday): I've never done an ICFP contest before, so I had no idea what I was in for. I had learned about it from reddit earlier on in the week, and even briefly glanced at the contest materials on Friday when they were unveiled, but never gave much thought into actually joining the quest. But when I got home from work on Friday evening, I saw I had a few hours to kill, and what the hell, why not at least do a little bit and see how far I get?
And so I begin the long, slow slide into the abyss ...
The coding was very straightforward, translating pseudocode into C++. I could have used Java or C#, but I'm at the point where programming at a high level in C++ is just as easy, and I wanted to retain the option of quickly dropping down into some low-level bit-twiddling if it came down to it. Erlang is my other favorite language for hacking around in, but was worried about running into some performance problem that I couldn't extricate myself from.
By the end of the night (i.e., 2am), I had a mostly-working DNA-to-RNA translator. But it was awfully slow. According to the contest materials, complete execution of the DNA requires 1.9 million iterations -- and so even at 100 iterations a second, I calculated it would take around 6 hours to finish.
Day 2 (Saturday): I woke uncharacteristically early on Saturday morning and got immediately to the business of writing a faster interpreter. Originally I had a used a std::deque to represent the DNA, owing to the frequent case of removing bases from the front of the string. But much of the time was spent copying sequences of DNA megabytes in length. It stands to reason that copying a million bytes a million times would take an extremely long time. The contest materials even suggest that "It seems particularly important to ensure that appending unquoted references perform better than in linear time".
My second attempt was to replace the std::deque with a std::list of characters, so that I could used the splice( ) method to combine subsequences. I didn't get too far down that path before the problems became obvious. First, not all subsequences were splice()-able -- some would need to be duplicated, and recognizing those situations added complexity. But an even more grave problem was that the common case of skipping down the list by n bytes went from O(1) to O(n). I wasn't making progress; I was only trading one problem for another.
I quickly abandoned that train of thought and went to write my own container. At first I started writing my own deque, implementing it with a std::vector under the hood. After all, in the usual case only a few hundred bases would be consumed, and then the string would be thrown out and replaced with a mutated string. It wasn't necessary to aggressively reclaim the memory the way that deque does. This improved performance a great deal -- instead of a ridiculously-long 6 hour runtime, I was looking at a ridiculously-long 2 hour runtime.
My fourth try was a ugly mess (I was getting tired at this point, and my code was getting sloppy). The idea was that I would represent only the diffs between DNAn and DNAn+1 -- and then chain this information together. To determine the 1000th character of iteration 10, I'd find where that character was in iteration 9, which entailed finding its location in iteration 8, etc. etc., until I found its location in iteration 0, where the data is actually stored. Every 100 iterations or so I would renormalize this structure so that the number of pointers I was chasing wouldn't grow without bound. But needless to say, the performance sucked for just about every use case.
I finally got it right the fifth time around. Idea #4 wasn't a horrible idea -- the only poor decision I made was to keep 100 iterations worth of history around. The truth is, I only needed to keep one buffer of DNA, and one list of "blocks" with indicies into that buffer.
At 2am in the morning, my DNA-to-RNA translator weighed in at 530 lines of code and took about 3 minutes to run.
Day 3 (Sunday): my wife and I ran a few errands in the early afternoon, so I didn't get back to coding until around 5pm. At this point only eight hours remained in the competition -- and I hadn't even gotten to the second half, which was to convert the RNA into a picture! Even if I got that part working, I would only have a picture of an unhappy alien to show for it. I was completely at a loss for how to gain points by "morphing" the image into something even vaguely resembling a happy alien.
Fortunately the RNA-to-picture converter was much easier than the DNA-to-RNA converter. Three hours and 300 lines of straightforward coding later, I had it working flawlessly. There was only one bug in the whole thing, which didn't take too long to ferret out and fix. The converter dumped out the final image in 24-bit uncompressed BMP format -- it was the simplest thing I could come up with at the time, and I figured I could always just use GIMP to translate to a better format if need be.
The contest materials indicate that "something curious happens if the prefix IIPIFFCPICICIICPIICIPPPICIIC is used". That "something curious" turned out to be a self-check page (which turned up the only one bug remaining in my DNA-to-RNA converter, a corner case involving a out-of-bounds index).
That leaves five hours left to figure out how to morph the image ...
My first idea was to start with the image background. It was relatively uncomplicated -- all I'd have to do is lighten it up a bit. That shouldn't be too hard -- I could just fix that part and then call it a night. All I'd have to do is find the part of the DNA that dealt with drawing the background, change some constants, and voila ...
But first I'd have to find out where that DNA was.
I inserted some code to dump out intermediate results from the RNA-to-picture converter. And that's when I noticed it -- a string of DNA letters, hidden in the first few drawing instructions, which is immediately overwritten by the "unhappy alien" image. I punched in that string, reran the process, and out popped a secret message from the contest creators. Something to the effect of "it looks like you're trying to morph an image. We don't recommend you do it yourself, but if you absolutely have to, here's some technical details about how to do it".
It then gave two DNA prefixes. One of the prefixes generated an "almost there" image -- the nighttime scene had turned to day, but many details were missing. I assumed this was a "checkpoint" that you could submit and get credit for at least getting this far. It also explained why a number of contestants had all submitted the same 28-byte answer.
I quickly registered a new account and sent in my discovery. Three and a half hours left in the contest, and I was finally on the board, at position 151 (of 860+)! It turns out that a great number of contestants hadn't even gotten this far -- and from looking at the top of scoreboard, it seems other teams had gotten a lot further than I had.
(Later on I discovered that the 28-character freebie was one character longer than it needed to be. Other contestants had submitted 27-byte entries which, because it was one byte shorter, scored one point higher than the others. I figured out this trick, which moved me up to 84th place).
Back to the secret message. The other prefix turned out to generate another secret message: "navigating the field repair guide". Unfortunately, this page contained only cryptic instructions and the number "1337". A lot of trial-and-error later I finally was able to figure out what the instructions were getting at, and I was able to generate the "table of contents" to the "field repair guide".
By 2am I had most of the "field repair guide" printed out. Some of it was helpful, but other pages were amusing joke pages. The repair guide seemed to indicate the string-replacement DNA system was a fully Turing-complete programming language in itself. The middle section, which was never modified, contained subroutines ("genes") which would be copied to the front of the string (where they would be executed). The end of the string appears to be a temporary "stack" containing intermediate values, parameters and return values.
By this point, however, I realized that with one hour left in the competition, I was not likely to learn anything that would move my score any higher -- and seeing that I had to be at work Monday morning, finally chose to bag it for the night.
Conclusion: if I had to do it all over again, there's a few things I'd do differently.