The App Compat Guy

Chris Jackson's Semantic Consonance

June, 2005

  • The App Compat Guy

    Single Step Selection

    Have you ever pondered some really amazing feature of the biological world? The eye? The ear? The sense of touch? Bird flight? These features evolved very gradually, over many generations. The net result is something that seems incredibly impressive, sometimes so much so that it is hard to imagine that it could have evolved at all. Yet it did.

    None of these features evolved through single step selection. A group of eyeless creatures did not suddenly have offspring with an impressive eye, complete with a focusing lens, a retracting iris, and appropriate neural connections. It is far more likely that one cell happened to be more light sensitive, and because that provided a competitive advantage, its descendants evolved even more light sensitivity. Eventually, a mutation came along that produced two light-sensitive eyes. This gradual and naturally selected process continued until the eyes that we see today in all sorts of creatures had developed. (This, incidentally, happened more than once.) Of course, it is theoretically possible for something as complicated as an eye to evolve in a single generation. However, the odds of that happening, and happening in a way that is beneficial to the organism, are incredibly small. In fact, there are mechanisms in place to prevent such massive changes in DNA structure, because most massive mutations are not good things.
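
    The gap between single step selection and cumulative selection can be made concrete with a toy model. The sketch below is an illustration only, not a model of real genetics: the 20-site binary genome, the function names, and the seed are all assumptions introduced here. It compares the number of blind attempts you would expect to need to hit one specific 20-site genome all at once (2^20, since each site has two states) with the number of generations a simple keep-what-works process takes to reach the same genome.

```python
import random


def cumulative_selection(target, rng, max_generations=100_000):
    """Mutate one random site per generation and keep only beneficial changes.

    Returns the final genome and the number of generations used.
    """
    genome = [rng.randint(0, 1) for _ in target]
    generations = 0
    while genome != target and generations < max_generations:
        generations += 1
        site = rng.randrange(len(genome))
        # Flipping a site that already matches the target would lower fitness,
        # so selection discards it. Flipping a mismatched site raises fitness,
        # so selection keeps it.
        if genome[site] != target[site]:
            genome[site] = target[site]
    return genome, generations


def expected_single_step_attempts(length):
    """Average number of blind attempts needed to get every site right at once."""
    return 2 ** length


if __name__ == "__main__":
    target = [1, 0] * 10          # hypothetical 20-site "ideal" genome
    rng = random.Random(2005)     # fixed seed so the run is repeatable
    genome, generations = cumulative_selection(target, rng)
    print("cumulative selection, generations used:", generations)
    print("single step selection, expected attempts:",
          expected_single_step_attempts(len(target)))
```

    Under these assumptions, the cumulative process typically finishes in well under a thousand generations, while the all-at-once approach needs about a million attempts on average. The difference grows exponentially with the length of the genome.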

    The same is true of software. Most great software has evolved over many versions. Seldom is it exactly right the first time. Of course, a version of software is a somewhat arbitrary measure, because release criteria are not always equivalent. One developer may release 30 beta versions over 3 years before finally declaring a build V1.0. Another developer may release just two betas over 6 months before declaring a build V1.0. Therefore, we cannot take the version number at face value. The idea, however, is still the same. You are probably not going to get your software right the first time. You will release it into the wild, and all of a sudden, you will find a number of things wrong with it. In response, you evolve and refactor your software (assuming that it survives the first release), and the second release is generally better because you have evolved it directionally (as opposed to randomly) with a particular phenotype in mind.

    Avoiding single step selection is a good idea. You wouldn’t want to just sit down with a text editor and start writing code (which will generate digital DNA) for an operating system, proceed until you have implemented everything, and then compile and expect that, in a single step, you will get exactly the phenotype you are hoping for.

    This has behavioral implications. This is why a daily build is such a good idea. You generate a functioning organism every single day, and you can measure its phenotype against the expected phenotype, incorporating necessary mutations in the work that follows. It also makes beta releases a good idea. You are placing a particular set of DNA into an environment that more closely represents the “wild” of released software, so that again you can measure the actual phenotype against expectations.

    I am a huge proponent of Microsoft’s latest beta approach – the Community Technology Preview. This puts the organism in the hands of as many people as possible, so that natural selection determines which features make it into the final version, which features mutate, and which features disappear entirely. Allowing the actual environment to influence mutation earlier and more frequently helps ensure that the organism is as viable as possible when you release the final version of your software.

    Avoiding single step selection is also a compelling argument to avoid complete re-writes of software. Occasionally, a fresh developer will look at the code and suggest throwing away all of the existing code and starting again from scratch. After all, that code is old and full of messy bug fixes. What they apparently do not understand is that those mutations were necessary for the software’s survival. While another mutation might exhibit the same phenotype, how likely are you to find that more elegant mutation on the first try? It is critical to leverage the value of the mutations to the software organism’s survival, and continue to mutate with future survival in mind. It is best to avoid mutating code simply because it is old. In software, as in biological life, genes that have survived for quite a long time are very likely to be genes well adapted to survival. Why would you want to lose them?

  • The App Compat Guy

    Mutation and Genes

    From the comments I received, it is apparent that I rushed through my description of mutation, which seems to have led to some confusion. I will attempt to rectify that shortcoming.

    When I speak of mutation being non-random in biological life, there are a couple of ways to think of this. First, consider spontaneous mutation – a piece of the genetic code transforms itself from one message to another: we add a new nucleotide, or we remove an existing nucleotide. Assume no external influence. The rate at which these bits change depends entirely on what these nucleotides are. Some portions of that genetic code are more likely to change than others, due to the inherent stability of the molecules and the degree to which these molecules really like being next to each other. Not every nucleotide is equally likely to change in an event of spontaneous mutation. Thus, it is not random. Now, you can also consider external factors. If the new combination is particularly unstable, that mutation may not last long at all, and instead trigger additional mutation. That mutation may produce a phenotype that is incapable of survival. These are also highly non-random.

    What is seemingly random is the phenotypic response to a particular mutation. It is convenient for the molecule itself to mutate – this does not occur for the benefit of the organism. A nucleotide sequence in DNA will mutate purely because it is more unstable than another one. It does not matter to the forces governing the mutation whether the resulting organism will benefit from this mutation or not. There is no intelligent design for this mutation. The DNA sequences governing eyesight, for example, are not going to mutate purposefully in order to improve that phenotypic response. If eyesight improves, it will be the result of random mutation, and disproportionate survival of organisms whose eyesight turned out better because of these mutations (natural selection).

    Mutation in software occurs primarily with explicit intent. (Yes, there are examples of viruses and other malicious software modifying the underlying software instructions, in much the same way that a gene researcher may insert a new sequence into an existing biological organism.) A jnz instruction will not mutate spontaneously into a jz instruction, for example. (A particular instance of this software – one cell – may do that in the event of hardware failure, but the software itself does not.) The creator, and the overseer of this organism, will change the underlying code with the intent of creating a phenotypic response that increases its chance of survival.
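
    To make the hardware-failure case concrete: on x86, the short conditional jumps jz and jnz are encoded as the opcodes 0x74 and 0x75, which differ in a single bit. The sketch below simulates one corrupted bit in one installed copy. The five-byte fragment and the function name are illustrative assumptions, not code taken from any real product.

```python
JZ_REL8 = 0x74   # x86 short jump if zero
JNZ_REL8 = 0x75  # x86 short jump if not zero


def flip_bit(code: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of `code` with one bit inverted, as a memory fault might do."""
    mutated = bytearray(code)
    mutated[byte_index] ^= 1 << bit
    return bytes(mutated)


# Illustrative fragment: cmp eax, 0 followed by jnz +5
shipped = bytes([0x83, 0xF8, 0x00, JNZ_REL8, 0x05])

# One cell suffers a single-bit fault in the jump opcode.
damaged = flip_bit(shipped, byte_index=3, bit=0)

print(hex(shipped[3]), "->", hex(damaged[3]))
```

    The shipped bytes (the organism's DNA) are untouched. Only the one damaged copy now branches on the opposite condition.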

    Given the concept of intelligent design and purposeful mutation, we can consider how we want to go about mutating our software (since I believe we can all agree that software is not yet perfect). This brings up another reader question – what is the boundary of the gene, as opposed to the entire body of DNA for a piece of software? The truth is that this is something that is under the shared control of both our tools and us, as software developers. The reader suggested something like a class or a struct. While there certainly are some elements of the analogy that strongly suggest this, my tendency is to disagree. Why? A class is an artifact of the tools we happen to use for intelligent design of software DNA. I create a class because it makes it easier for me to create my software. The design of my class drives the creation of a particular type of software DNA only when I am using the exact same compilers in the exact same environment. If I use the same compiler for that class, I will end up with the same DNA. If I use a different compiler, I may end up with completely different DNA that exhibits a completely different phenotype. For example, I may compile some C++ code using a compiler optimized meticulously for one particular processor. The result will – presumably – be code that runs faster on that processor. If my intent is to create a high performance library that only needs to run on that processor, then the same source input has produced a vastly different expression with vastly different survival characteristics.

    To me, a gene is a segment of binary code – a unit of deployment, if you will. I have seen (poorly written) software that embeds a huge amount of logic into a single logical unit of deployment – both at the human-readable source code level and the actual binary DNA level. Others more carefully analyze their software to ship it as a bundle of genes that can re-use each other and make the process of evolution more straightforward.

    However, since developers are predominantly human, it is probably more useful to think in terms of the genes as having some meaning at the source code level. Even though I do not personally believe this is the best analogy, it is far more useful in a world where very few people write their code in 1’s and 0’s.

  • The App Compat Guy

    Terminology and Non-Random Mutation

    I want to take a moment to go back and review some of the terminology I have been using, to ensure that there is no confusion. The reader will kindly indulge any ambiguity in my language up to this point – I am quite literally making this up as I go along.

    Binary Code == DNA

    In the analogy I have been using, binary code represents the DNA of a software organism. Why the binary code, rather than the original source code, or the diagrams that you used to design the source code? The binary code does the work. It drives the expression of the phenotype. The source code, and any documentation that guided its creation, is an artificial construct used to generate this particular binary code. (I will speak more on these constructs later.) At the most basic level, consider the fact that you can make a copy of software using the binary code only.

    Single Installation of Software == Cell

    Given a set of DNA, you now host that encoded information in an environment where it can develop. This environment is a single installation on a computer somewhere. That environment provides the means of expressing phenotype, and of survival itself for that cell. For example, consider a scenario where you write a software application that depends on the .NET Framework. You, therefore, depend on the expression of that particular set of DNA in order to operate. Now, how you draw this analogy is a matter of some debate. Since this is also DNA in that particular cell, it really is not different from the DNA in your application. In other situations, you may consider some DNA analogous to mitochondria, where it survives independently and provides critical services to the entire cell. This really is a detail of implementation – the cell has two distinct mechanisms contained inside of one barrier. We will not concern ourselves with perfecting our analogy to this point. What is critical is that you begin with a set of DNA, and it must work in concert with the other DNA in that “cell” in order to express itself or even to have the cell survive. It may depend on other DNA being there and expressing itself, and other DNA being there and expressing itself may adversely affect it.

    Entire Installed Base == Organism

    While a single installation of software must cooperate with, compete with, or ignore the other software DNA on a particular installation, the entire installed base must concern itself with the entire ecosystem of survival. Will it meet with broad acceptance and acquisition, or will it drift away into obscurity? This delves into issues such as economics and emotional reaction to the software. The ability to behave well on a single installation does not guarantee survival, just as having perfect cells may not help you much when you happen to be sitting in a room full of hungry leopards.

    Note that I am not so bold as to suggest that a SKU is the best way to define where one organism stops and another begins. A single software organism may consist of several products combined in some sort of useful and interesting way.

    Selection

    In my analogy, selection at the cellular level determines the extent to which any organism can grow. If software does not operate in a majority of installations, the organism itself will remain small. (Say, for example, that the software only works on an obscure operating system, and must be paired with an extremely expensive companion software package.) Selection at the organism level determines the extent to which a viable set of DNA, perfectly able to grow and operate in a number of environments, will actually be able to compete with other software organisms for acquisition and use.

    Mutation

    When you think of true Darwinian evolution, you must take into account the idea of mutation. This mutation is decidedly non-random. Rather, it is dependent on the laws of physics. Certain molecules will change at measurable rates. Combinations of molecules will change at measurable rates. (For example, this is how we can determine the half-life of a given molecular structure.) These mutation rates differ between different sequences of molecules. In addition to varying rates of spontaneous mutation, there are also differences in the success rates of mutations. For example, the rate of variation in the histone gene is remarkably small across all eukaryotes specifically because variation in this gene is extremely maladaptive. (The protein it provides the recipe for plays a pivotal role in gene regulation, as well as forming the spools around which DNA winds.) However, the phenotypic expressions of gene mutation are random. If one sequence of DNA mutates at a rate of once every 10 years, this mutation will occur whether or not that mutation gives rise to either perfect eyesight or a complete lack of a liver. (Not that either one is likely to result from a single mutation.)

    Software mutation, on the other hand, is not at all random concerning phenotype (although human imperfection certainly makes it seem like this is the case at times). Software mutation takes place with the explicit purpose of creating a new, and supposedly superior, phenotype. This is an important differentiation between software and our biological analogy. While we are still somewhat concerned with error checking to determine the health of our DNA, this is explicitly in response to parasitic modification in a particular cell. We do not need to regulate our software’s own tendency towards spontaneous mutation with random phenotypic results. We guide the evolution of software with intelligent design.
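
    The error checking mentioned above can be as simple as comparing a digest of the installed binary against a digest recorded at release time. What follows is a minimal sketch, assuming SHA-256 as the checksum and using hypothetical names (digest, is_healthy) and placeholder bytes; real products often rely on code signing rather than a bare hash comparison.

```python
import hashlib


def digest(binary: bytes) -> str:
    """Fingerprint a binary image with SHA-256."""
    return hashlib.sha256(binary).hexdigest()


def is_healthy(installed: bytes, known_good_digest: str) -> bool:
    """True when the installed copy still matches the released DNA."""
    return digest(installed) == known_good_digest


# Placeholder bytes standing in for a released binary.
released = b"\x83\xf8\x00\x75\x05"
known_good = digest(released)

# A parasite rewrites one byte in one installation (one cell).
infected = released[:3] + b"\x74" + released[4:]

print("clean copy healthy:   ", is_healthy(released, known_good))
print("infected copy healthy:", is_healthy(infected, known_good))
```

    The check says nothing about whether a deliberate new version is better or worse. It only detects that one cell no longer carries the DNA the developer released.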

    So, how then do we guide the evolution of our software to take into account that which parallels biological life, but also that which is fundamentally different (intelligent design)?

  • The App Compat Guy

    On the Nature of Software Organisms and Selection

    In my last entry, I attempted to illustrate (hopefully with some degree of success) the reasoning behind viewing software as an organism, and all of the associated learning we may gain from such a comparison. In this entry, I am hoping to clarify this analogy a bit more, in order to provide for us a launching point to leverage this analogy more productively.

    The aspect I hope to clarify specifically is the boundaries of the organism, as well as the boundaries of taxonomy. What would we consider a single instance of a software organism? What defines a species? This does influence how we are able to draw some conclusions, so I believe this exercise truly is important.

    Consider, for example, Microsoft Word (which I am using to author this post). The underlying DNA behind this application is the binary code for the 2003 version, Service Pack 1, with all of the latest patches applied. Does this particular instance of Microsoft Word represent an organism, or do all instances of a particular version (at the micro-level, meaning that the next time a patch comes out I will have a new version), put together, represent a single organism?

    The best logical argument I can come up with will classify the entire collection of instances of a particular version as a single organism. Using our analogy, consider the human body. It consists of a large number of cells, each containing the same DNA. These cells, depending on their location, will exhibit a phenotype that depends on the chemical environment within and around that cell.

    In a similar way, one instance of Microsoft Word may be trying to operate in an environment where it cannot survive. (For example, an operating system other than the one the developers targeted.) It may be operating in an environment where it does not perform well (such as an instance running on a very busy e-commerce server). It may be running on a computer where a virus has changed its binary code – literally modifying its DNA so it exhibits a different phenotype. The poor health of any one instance will not necessarily harm the organism itself, until such time as inflexibility to variations in the electronic environment causes people to stop acquiring and using the product, eliminating it in a process of natural selection in favor of a superior alternative.

    It is convenient that this classification also happens to be very useful. By considering every instance of a version as an organism, we can then consider other instances of the same species (in this case, a rival word processor) as well as ancestry (previous versions of the same application).

    It also offers us the opportunity to measure success – perhaps using sales figures, download rates, and lifespan. We gain the concept of selection. When a developer releases a version of software, that software organism grows to a particular size. The nature and rate of growth of that organism determine whether the developer creates another version (organism). It also may give rise to competitive organisms, which seek the same resources (money) that the existing organism is consuming.

    This provides a strong analogy to evolution. Genetic code. Selection. Mutation. Embryology (the environment in which the organism grows). Assuming that we agree on this as a starting point, it’s probably about time to start leveraging this analogy productively rather than continue to strengthen the case for using it.

    I have had a couple of comments regarding where I am going with this. All have suggested genetic algorithms, which are interesting, but this was not where I was originally heading. (Of course, now I feel almost obligated to head there at some point.) To Ralf, I say yes – I do intend to explore how we could leverage this to build the next ERP system. :-)

    In addition, somebody contacted me because he was unable to leave a comment on the blog itself. For now, I have enabled anonymous comments, and we will see how that goes.

  • The App Compat Guy

    Software as an Organism

    Can we correctly describe software as an organism?

    I believe that we can make a compelling argument to do exactly that. To achieve this, I first intend to run through analogies that will describe some of the correlations between software and biological life, which may help to explain why we would want to embark on such an exercise in the first place. If we can agree on this, then we can explore some more compelling arguments to use this terminology, which I hope will lead us to some conclusions that will advance the way we think about, design, and build software.

    When you run software, you experience the phenotype generated by the software code. Therefore, to use biological terms, the source code is analogous to DNA, and the executing binary is analogous to the physical manifestation of that DNA after it develops in the environment where that DNA happens to be situated (typically an egg cell of some size and shape).

    Of course, this may be a flawed analogy. DNA, after all, is more of a recipe than a blueprint. Source code may be more of a blueprint than a recipe. For example, there is no DNA dictating exactly how to build an eye, which we could remove, insert into an egg cell, and grow only that eye. Rather, the DNA tells the original egg cell to divide in such a way that there is a slight difference between the two resulting cells chemically. In these two new cells, the slight differences trigger the reading of slightly different DNA strands from this cookbook, producing two cells each that are, again, slightly differentiated. This process continues until a single cell has been chemically prepared enough to be the precursor to an eye, which enables the reading of the DNA that specifies the design of that eye. In effect, you have a recipe for creating the chemical environment necessary to generate an eye and read additional components of the DNA that direct any variations in eye design that the chemical variation is designed to support.

    We generally conceive of software, on the other hand, as much more of a blueprint. A menu item exists because software code specifically dictates that the computer should draw a menu item there, with the following attributes. However, this way of thinking about it may be too simplistic. How many times have you had one computer operation work repeatedly, but suddenly, on one occasion, this operation no longer works? Maybe the computer does not draw that menu item for some reason. (We can seldom explain that reason without a healthy dose of knowledge and some time with a debugger.) The phenotype of that software has now changed because of changes in the electronic (as opposed to chemical) environment surrounding that software! At commercial software companies, we see this sometimes with bug fixes. We fix one piece of software, which may fix one problem but alters the electronic environment for all other software. Suddenly, this other software (which depended on a particular electronic environment – which it may or may not be aware of) stops working and begins to exhibit a different phenotype despite no change whatsoever in the underlying source code.

    Of course, this is not nearly enough evidence to consider software itself an organism. Rather, most could probably agree that we can define life as something that is able to perpetuate itself. DNA is the basis of all known life precisely because it is so efficiently and accurately able to replicate itself. To some extent, we can see some software that is able to replicate itself – think of a computer virus. The problem with computer viruses is that they are so very efficient at replicating themselves. However, you do not typically think of a program such as Microsoft Word replicating itself wildly. If we dig a bit deeper, however, we can see a better comparison. DNA, in and of itself, really is not that terribly useful. As soon as you introduce enzymes which are able to read that DNA and duplicate it, then you have a powerful self-replication system. (You further need the ability to read the DNA, create RNA, and generate proteins if you want that DNA to exhibit a phenotype. Otherwise, life as we know it would be nothing more advanced than a huge number of strands of DNA floating around in the primordial ooze.) These enzymes are an agent. Another example of an agent providing the means of replication is with an actual (not computer) virus. Many times, they are nothing more than a simple strand of DNA, optimized for entering host cells and utilizing their resources to replicate. They cannot replicate without a host cell. Most software, similarly, does not replicate itself. However, you can use a host (such as a CD burning facility, or a web site) to generate copies of the source code, and thereby spawn additional instances of that phenotype.

    So, at its root, both DNA and source code are codes (one digital with two symbols, the other with four) that can be read in a certain environment to exhibit a phenotype, and furthermore can be replicated to perpetuate their own lives over time. To me, that means that we have a kind of non-biological organism. Conceiving of software in this way allows us to open our minds to many of the things that we have discovered in the biological realm, which we can potentially leverage to improve our analogous software. Most interesting to me is the concept of evolution.
