Genetic Programming is a fascinating field of study. Essentially, this is the study of software that writes software, selecting the software it has written that exhibits the highest degree of fitness, and allowing this software to continue to evolve over time. In essence, what Genetic Programming is trying to do is find some route to a higher degree of fitness that is more efficient than either random selection or intelligent design.
To understand the magnitude of the problem, consider how quickly the size of the problem can grow. Say, for example, that you are using IA-32 assembly language on a processor much like my humble little Pentium M in my Tablet PC. How many discrete instructions does the IA-32 instruction set provide? Hundreds. Let's simplify things and say that there are 375 available instructions. We'll ignore the register architecture for the moment, which adds additional complexity. There are, consequently, 54,993,666,708,469,390,869,140,625 different ways to construct a program that is only 10 instructions long. Most of these don't work. Of those combinations that do work, most of them don't do anything useful. The very small number of combinations which do perform some useful activity don't do very much of it.
Most of us are all too familiar with just how easy it is to come up with a combination of instructions that just doesn't work.
Now consider the fact that your program may not necessarily be 10 instructions long. You could potentially solve the exact same problem in 10 instructions, 5 instructions, or 38 instructions. In fact, any solution greater than 10 instructions could potentially be a successful solution that happens to use a lot of NOP instructions (unless, of course, the context of the EIP register happens to be important to determining the correctness of the solution). All put together, there are a LOT of ways to create a program to solve a problem, and the overwhelming majority of them are wrong.
Genetic Programming is an attempt to find a quicker way to the solution from this enormous number of potential solutions. It is not generally practical to try every possible solution to the problem to discover the best one. By starting off with some solutions, selecting the best, and then randomly mutating and recombining, Genetic Programming explores the problem domain much more quickly. Of course, because it covers the set of potential solutions randomly, it is possible that it could still miss the best solution, but any alternative that does not systematically test every possible combination of instructions is subject to the same limitation. The assumption, of course, is that solutions that are a smaller number of mutations away from the "best" solution will show a higher degree of fitness on the test used by your Genetic Programming algorithm.
I don't want to delve too far into Genetic Programming today. What is important to note about genetic programming is the unit of selection: genetic program modifies the source code of the most successful organisms when it is performing mutation or crossover operations.
When I was doing my initial exploration of the terminology in this post, I considered binary code to be analogous to DNA. This is still a perfectly suitable analogy in some regards. (That's the problem with analogies - they are never precise.) Binary code is still the device that drives the expression of phenotype. It is still sufficient to duplicate a piece of software. But, in one very important regard, I don't believe it is always the best analogy, unless you happen to be using binary code as the instruction set.
An important concept is the unit of selection. If you have written an application in C#, and are maintaining that application, those constructs which you happen to save in that C# source code are those that survive to the next generation. This is, indeed, the unit of selection. The binary source code is, of course, a hugely important aspect of the ontogeny of the organism, but I wanted to redefine the analogy I had used previously to be more suitable to some of the concepts that I believe are interesting to explore. As the unit of selection, the source code (in whichever language) is the most suitable analogy to DNA for my purposes.
In my last entry, I discussed complexity in evolution, and how the most highly complex software is, in fact, the edge case. Far more software is less complex; more people have written a "Hello World" program than have written an application of the complexity of, say, Microsoft BizTalk Server.
This begs the question - how can I maximize my experience in building less complex applications? How do I do it at all? For anybody who loves to build fantastic software and change the world, it's important to leverage these opportunities to both improve and enjoy yourself.
Chris Sells writes, "I found out something about myself: I'm really good at digging into the state of the art, whether it's one technology or a feature across technologies, if I have a problem I'm trying to solve. However, if I'm just wandering in a space w/o an explicit goal, e.g. give a presentation, build an app, write an article, I'm lost; I just can't muster any juice."
I find this to be true as well. Say that I decide to come up with a piece of software to write which I can complete in a reasonable amount of time, gain some new experiences from, and share with as many people as I am interested in impacting. Frequently, I will spend more time coming up with the idea than I do actually implementing the project.
There are a couple of useful tools that I have found for overcoming these obstacles. The first is the Coding for Fun website. There are tons of starter ideas, as well as fully fleshed out ideas here. This is a great place to start, because you may find something you've been interested in learning here.
More importantly, Microsoft is starting to define a platform for making your smaller, less complex applications more globally useful: gadgets. Not that this is a new approach. Stardock supports the notion of gadgets with its DesktopX software. Konfabulator has desktop gadgets. Desktop Sidebar has sidebar gadgets. The difference is that these gadgets ran inside of an applicaiton. They are now coming at the platform level. Check out Microsoft Gadgets to read more and to find links to other information.
I think this is important for a couple of reasons. First, it makes it easier to write something fairly cool in a short period of time. Why? You don't have to write all of the platform-level technologies such as containers and windows. You can just create the gadget to do something neat. Second, since there will always be more software that is less complex, we can expose that less complex software more robustly, make it more useful, and generally improve the overall ecosystem.