I want to take a moment to go back and review some of the terminology I have been using, to ensure that there is no confusion. The reader will kindly indulge any ambiguity in my language up to this point – I am quite literally making this up as I go along.

Binary Code == DNA

In the analogy I have been using, binary code represents the DNA of a software organism. Why the binary code, rather than the original source code, or the diagrams that you used to design the source code? The binary code does the work. It drives the expression of the phenotype. The source code, and any documentation that guided its creation, is an artificial construct used to generate this particular binary code. (I will speak more on these constructs later.) At the most basic level, consider the fact that you can make a copy of software using the binary code only.

Single Installation of Software == Cell

Given a set of DNA, you now host that encoded information in an environment where it can develop. This environment is a single installation on a computer somewhere. That environment provides the means of expressing phenotype, and of survival itself, for that cell. For example, consider a scenario where you write a software application that depends on the .NET Framework. Your application therefore depends on the expression of that particular set of DNA in order to operate. Now, how you draw this analogy is a matter of some debate. Since the framework is also DNA in that particular cell, it really is not different from the DNA in your application. In other situations, you may consider some DNA analogous to mitochondria, which survive independently and provide critical services to the entire cell. This really is a detail of implementation – the cell has two distinct mechanisms contained inside of one barrier. We will not concern ourselves with perfecting our analogy to this degree. What is critical is that you begin with a set of DNA, and it must work in concert with the other DNA in that “cell” in order to express itself, or even for the cell to survive. It may depend on other DNA being present and expressing itself, and the presence and expression of other DNA may also adversely affect it.

Entire Installed Base == Organism

While a single installation of software must cooperate with, compete with, or ignore the other software DNA on a particular installation, the entire installed base must concern itself with the entire ecosystem of survival. Will it meet with broad acceptance and acquisition, or will it drift away into obscurity? This delves into issues such as economics and emotional reaction to the software. The ability to behave well on a single installation does not guarantee survival, just as having perfect cells may not help you much when you happen to be sitting in a room full of hungry leopards.

Note that I am not so bold as to suggest that a SKU is the best way to define where one organism stops and another begins. A single software organism may consist of several products combined in some sort of useful and interesting way.

Selection

In my analogy, selection at the cellular level determines the extent to which any organism can grow. If software does not operate in a majority of installations, the organism itself will remain small. (Say, for example, that the software only works on an obscure operating system, and must be paired with an extremely expensive companion software package.) Selection at the organism level determines the extent to which a viable set of DNA, perfectly able to grow and operate in a number of environments, will actually be able to compete with other software organisms for acquisition and use.

Mutation

When you think of true Darwinian evolution, you must take into account the idea of mutation. This mutation is decidedly non-random. Rather, it is dependent on the laws of physics. Certain molecules will change at measurable rates. Combinations of molecules will change at measurable rates. (For example, this is how we can determine the half-life of a given molecular structure.) These mutation rates differ between different sequences of molecules. In addition to varying rates of spontaneous mutation, there are also differences in the success rates of mutations. For example, the rate of variation in the histone genes is remarkably small across all eukaryotes specifically because variation in these genes is extremely maladaptive. (The proteins they encode play a pivotal role in gene regulation, as well as forming the spools around which DNA winds.) However, the phenotypic expressions of gene mutation are random. If one sequence of DNA mutates at a rate of once every 10 years, this mutation will occur regardless of whether it gives rise to perfect eyesight or to a complete lack of a liver. (Not that either one is likely to result from a single mutation.)

Software mutation, on the other hand, is not at all random concerning phenotype (although human imperfection certainly makes it seem like this is the case at times). Software mutation takes place with the explicit purpose of creating a new, and supposedly superior, phenotype. This is an important distinction between software and our biological analogy. While we are still somewhat concerned with error checking to determine the health of our DNA, this is explicitly in response to parasitic modification in a particular cell. We do not need to regulate our software’s own tendency towards spontaneous mutation with random phenotypic results. We guide the evolution of software with intelligent design.
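
To make the error-checking point concrete, here is a minimal sketch of how one installation might check the health of its DNA against a known-good fingerprint. It assumes a SHA-256 digest is the fingerprint of choice. The function names and the sample bytes are hypothetical, made up for this illustration and not taken from any real product.

```python
import hashlib
import hmac


def fingerprint(binary: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact binary."""
    return hashlib.sha256(binary).hexdigest()


def is_unmodified(binary: bytes, expected: str) -> bool:
    """Report whether the binary still matches its known-good fingerprint."""
    # compare_digest performs a constant-time comparison of the two digests.
    return hmac.compare_digest(fingerprint(binary), expected)


# Hypothetical stand-in for the bytes of a shipped executable.
original = b"example binary payload"
known_good = fingerprint(original)

# A parasitic modification: extra bytes appended to the original binary.
tampered = original + b"\x90"

print(is_unmodified(original, known_good))
print(is_unmodified(tampered, known_good))
```

Note what the check does and does not do. It detects that the DNA in a particular cell has been altered, but it tells us nothing about whether a deliberate change to the software was a good idea.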

So, how then do we guide the evolution of our software to take into account that which parallels biological life, but also that which is fundamentally different (intelligent design)?