Alex Barnett posted a comment on my last post referencing a video on the University of Washington website: http://uwtv.org/programs/displayevent.asp?rid=902. This is a worthwhile video, and I wanted to comment on some of its contents. Of course, I freely admit my bias towards some of his opinions, because he seems to share my view that Richard Dawkins is alpha to Stephen Jay Gould in the world of evolutionary scientists. Then again, perhaps it is my bias that caused me to infer this!

In this video, Daniel Dennett plays a clip showcasing some computer applications in which genetic programming was used to evolve computerized "beings" that are modified and selected via artificial selection - selection based on fitness as defined by tasks like swimming, walking, etc. He also shows a couple of slides from Dawkins' The Blind Watchmaker - drawings of "biomorphs" produced by a genetic algorithm. The PC version of this software is harder to find, but the Mac version is still available on Amazon.com.
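To make the mechanics concrete, here is a minimal sketch of the evaluate-select-mutate loop that drives demonstrations like these. It's Python rather than anything from the video, and the fitness function is entirely made up - a stand-in for "how well does it swim or walk?":

    import random

    def fitness(creature):
        # Made-up stand-in for "how well does it swim or walk?" - here,
        # just reward creatures whose genes sum close to a target value.
        return -abs(sum(creature) - 42)

    def mutate(creature, rate=0.1):
        # Copy the gene list, occasionally nudging a gene.
        return [g + random.uniform(-1, 1) if random.random() < rate else g
                for g in creature]

    # Start with a random population of 20 creatures, 8 genes each.
    population = [[random.uniform(0, 10) for _ in range(8)] for _ in range(20)]
    for generation in range(100):
        # Artificial selection: keep the fittest half...
        population.sort(key=fitness, reverse=True)
        survivors = population[:10]
        # ...and refill the population with mutated copies of survivors.
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(10)]

    print(max(fitness(c) for c in population))

Nobody writes the winning creature by hand; it falls out of repeated selection against the fitness function.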

Are examples such as these the reason why some people hear the term "genetic programming" and immediately think that it has nothing to offer them? When you look at the output of these programs, you do get a sense of wonder. Here is this interesting and complicated thing that nobody actually took the time to create. But is a set of boxes that can simulate walking useful to the average developer? John Carmack can produce things that simulate walking around on your computer screen that are a lot more compelling to the average person than some boxes. Yes, it's a fascinating novelty because there was no intelligent designer behind those boxes, but what is the practical use?

Like most developers I know, I spend most of my time writing software that helps businesses run. I may want to write some value from the database to the screen. For my purposes, a simple Console.WriteLine is good enough. I may want to draw a chart on the screen. For my purposes, writing some GDI+ code to render that chart is good enough. As long as I have rendered the information quickly enough that my user isn't bored or dissatisfied, gaining a few milliseconds may not be important. In fact, I am often more concerned with writing maintainable code to pass along to the next intelligent designer than with maximizing performance. Genetic programming has a tendency to produce somewhat obscure code. It may be provably correct, but more often than not, after you invest a huge amount of time attempting to understand it, you will shrug your shoulders and think, "I never would have thought of that. I still probably wouldn't, and I have already seen it."

Does that mean, however, that there is no practical use, simply because this is not a pervasive programming style today? This is where I would disagree. There are many problems that are exceptionally difficult to solve. My most common experiences with difficult problems are around performance and scalability. If I need to write one line of text to a screen, then I can do that good enough, make it understandable, and save a lot of time doing it. If I need to write one line of text to 10 million screens, all in under one second, then suddenly I have a very different problem domain. Of course, I could develop, measure, tune, measure, tune, measure, and repeat until I get the application working good enough (which is, after all, what truly matters). But, depending on what I know, this could be an extremely long cycle. It also tends to be somewhat mundane. How do I know that I am tweaking the right thing? What if I never get there? Can this be automated? This automation is genetic programming.
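As a rough sketch of what that automation might look like, here is a bare-bones evolutionary loop that tunes configuration parameters against a measured cost, instead of a human doing the tweaking. Everything here is hypothetical - the parameter names and the simulated "measurement" are stand-ins for a real benchmark run:

    import random

    def measure(config):
        # Hypothetical stand-in for a real benchmark - in practice this
        # would deploy the candidate configuration and time it under load.
        buffer_size, batch_size, thread_count = config
        return (abs(buffer_size - 8192) / 1000.0
                + abs(batch_size - 64)
                + abs(thread_count - 16))

    def perturb(config):
        # Tweak one randomly chosen parameter - the automated version of
        # "how do I know that I am tweaking the right thing?"
        c = list(config)
        i = random.randrange(len(c))
        c[i] = max(1, int(c[i] * random.uniform(0.5, 1.5)))
        return tuple(c)

    best = (1024, 10, 2)        # a naive starting configuration
    best_cost = measure(best)
    for _ in range(5000):
        candidate = perturb(best)
        cost = measure(candidate)
        if cost < best_cost:    # selection: keep only improvements
            best, best_cost = candidate, cost

    print(best, best_cost)

The developer defines what "good" means (the measurement); the loop does the mundane tweaking, for as many cycles as you can afford.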

The vast majority of problems do not require such performance. Most developers can write an intranet application that works good enough. That doesn't mean that there is no benefit; it just means that the current expectations that people have of their software are being met. But if you could render more interesting information on your computer faster, or have more robust interaction with a low-bandwidth, occasionally connected device, then you enable entirely new scenarios. That is another place where genetic programming might be valuable.

The problem, of course, is that genetic programming is hard to do. There are no robust tools to work with. The discipline is still very young. Dr. Dennett brings up an exceptionally good point, which is worth considering with regard to this problem. He points out the lack of noise in a computer simulation. The computer has nothing that you don't program into it. In one sense, this is very true. In another sense, we could consider that the computer program has been artificially filtering the noise, and that we then need to re-introduce some subset of it to be interesting. As I pointed out in my last post, there are 54,993,666,708,469,390,869,140,625 ways to construct a 10-machine-instruction program. That seems like a lot of noise to me! We have a huge problem with the combinatorial nature of our software, and we combat this by filtering noise and then attempting to deliberately re-introduce some of this noise.
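Incidentally, that count works out to exactly 375^10 - the number of ways to fill 10 slots from an instruction set of 375 possibilities - which is easy to verify:

    # The count quoted above is just exponentiation: an instruction
    # set of 375 possibilities, chosen independently for 10 slots.
    print(375 ** 10)  # 54993666708469390869140625
    assert 375 ** 10 == 54993666708469390869140625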

What we really want is some solution to our combinatorial dilemma. The underlying problem we are trying to solve is how best to go about arriving at the best possible solution out of an unfathomably huge number of alternatives. It is incredibly unlikely that even the most intelligent of intelligent designers is going to be able to pick the single best solution. There are more ways to write a program the size of Microsoft Windows than there are atoms in the entire universe. Obviously, we can't just try all of them. So, let's just wander around parts of the solution domain to see what we can solve. What is the best approach to taking a subset while still ensuring that you consider alternatives that may not seem intuitively obvious? Be random. Randomness plus natural selection is simply a tool that we can use to wander around the set of all possible solutions, in the hope of finding something better than what is most obvious to humans, with our innate tendency to see patterns (even where they don't actually exist).
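Here is a minimal sketch of that "wander around and keep the best" idea: pure random search over a made-up objective function with many local peaks, where intuition offers no obvious answer:

    import math
    import random

    def quality(x):
        # A made-up bumpy landscape with many local peaks - a stand-in
        # for the quality of one point in a huge solution domain.
        return math.sin(5 * x) * math.exp(-abs(x - 3))

    # Wander the solution domain at random, keeping the best so far.
    best_x, best_q = 0.0, quality(0.0)
    for _ in range(10000):
        x = random.uniform(-10, 10)
        q = quality(x)
        if q > best_q:   # selection: only the fitter candidate survives
            best_x, best_q = x, q

    print(best_x, best_q)   # lands near x = 2.83 on most runs

No pattern-matching, no intuition - just sampling and selection. Real genetic algorithms are smarter about where to sample next, but the spirit is the same.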

We can see the power of genetic algorithms in some of the work that has been done already. It's even more evident just by looking in a mirror. Randomly meandering about the problem domain can produce edge cases such as us that are pretty impressive. We simply need a good technical solution for introducing constrained randomness. Today, that is frequently done by limiting the number of instructions that a genetic program will use. We also introduce mutations at a controlled rate. (If your microscope is just barely out of focus, wouldn't you rather give it a small turn than a big one?) And we have observed from the biological world that sexual reproduction is an effective means of introducing necessary diversity when rates of reproduction are slow, and we attempt to incorporate an analogue of it - crossover - into genetic algorithms. What is challenging is figuring out where to mutate. In the biological world, mutations are guided by chemical properties. Recombination is based on molecular interactions and the properties of the molecules that make the proteins. With computers, we just have a huge number of transistors that are either on or off. We have to decide how to break them up and mutate them. How do we decide what is a good way to cross two programs? I have not seen this answered satisfactorily.
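In code, those two knobs - a bounded program size with a controlled mutation rate, and some crossover scheme - might look like the sketch below. The opcodes are hypothetical, not a real instruction set, and the single-point crossover is simply the most common naive choice; its blindness is exactly the unsolved problem described above:

    import random

    OPCODES = ['LOAD', 'STORE', 'ADD', 'SUB', 'JMP', 'NOP']  # hypothetical

    def mutate(program, rate=0.01):
        # Controlled mutation rate: the small turn of the microscope.
        # Each instruction has only a small chance of being replaced.
        return [random.choice(OPCODES) if random.random() < rate else op
                for op in program]

    def crossover(mom, dad):
        # Single-point crossover, the GP analogue of sexual reproduction.
        # The cut point is chosen blindly - nothing ensures it lands on a
        # meaningful boundary between "working parts" of either program.
        point = random.randrange(1, len(mom))
        return mom[:point] + dad[point:]

    # Fixed program length constrains the search space.
    mom = [random.choice(OPCODES) for _ in range(10)]
    dad = [random.choice(OPCODES) for _ in range(10)]
    child = mutate(crossover(mom, dad))
    print(child)

Biology gets its crossover boundaries from chemistry; here the boundary is an arbitrary index into a list, which is precisely why the results so often defy understanding.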

So, I see the sweet spot of genetic programming as very tangible. We have huge opportunities to improve performance and scalability, both in distributed architectures and in smart client applications that render rich graphics. I see this sweet spot as having more meaning to the average developer once there are some tools in place, as well as some meaningful standards on how to implement mutation and crossing. I don't think it's irrelevant at all - I just think it's too hard. But as technology continues to get more and more sophisticated, we are going to need better tools to crawl the combinatorial problem space that this sophistication brings with it. Of course, much of the work may be done at the class library level, so the average developer doesn't need to understand the underlying sophistication - just benefit from the results.