So.. I've finally got around to blogging about Genetic Algorithms, apologies it's taken me so long. I'll kick off with a very brief tutorial around the theory behind genetic algorithms, before moving on to a concrete example showing how to solve the Path Finder problem in my next post. It's not an easy topic, but bear with me and I'll do the best I can to make sense of it for you.
So, first things first, we need to refresh our Biology knowledge regarding the theory of evolution and natural selection, as this forms the basis of Genetic Algorithms.
Evolution and Natural Selection
OK, Evolution is the term used to describe changes in a populations traits from generation to generation. These traits are encoded in segments of nucleic acid called genes, which connect together to form chromosomes. When species reproduce, their encoded traits, genes, are copied and passed on to their offspring in a process called Recombination, so for instance half of an offspring’s genes could come from one parent and half from another, the exact combination though is determined by the crossover rate. This is one way in which the traits of the next generation are determined.
Natural selection simply states that favourable traits in one population are more likely to survive and be passed onto subsequent generations. For example, a gene directly related to a predator have excellent sight would be more likely to be passed on to future generations as that predator would in theory be able to catch more prey and have a greater chance of survival (thereby living longer and reproducing more), passing it’s genes on (think ‘survival of the fittest’). As you’ll see further on in this tutorial, the concept of ‘fitness’ for an organism is absolutely critical and one of the most difficult things about Genetic Algorithms is defining and calculating the fitness for your conceptual organism.
As well as natural selection and recombination affecting the genes of subsequent generations, on occasion mutation can occur. Mutation could result in no change to the organism with regard to its fitness, it could affect it’s fitness negatively or it could have a positive effect on the fitness. Clearly a mutation that resulted in a usually four legged creature being born with three legs wouldn’t ‘stand it’ in good stead ; )
So how do we take advantage of this evolutionary process to solve problems?
One of the first things we need to do is decide upon a convention for representing possible solutions (guesses) to the problem in hand. You can represent these guesses in any way you choose, in this tutorial we will be encoding them as a binary string. These guesses will be our version of a chromosome (chain of genes). In very basic terms, here is how we will attempt to solve a problem:
1) First off, we will need to create a starting population of random chromosomes to work with
2) Following on from this we will take each one in turn and work out how good it is at solving our particular problem. The better the chromosome is at solving our problem, the higher a ‘fitness score’ we will give it
3) We can then implement our version of recombination by selecting two chromosomes from the population. The fitness score directly relates to the chance of a chromosome being selected and effectively being allowed to reproduce in our program. There are different methods of selection that can be used in Genetic Algorithms (Roulette Wheel, Tournament and Elitism), we will use Roulette Wheel selection
4) We then need to work out how the offspring chromosome will be made up from both parent chromosomes. This process is called crossover
5) Evaluate the chromosome genes for mutation
6) Repeat as necessary to create a population of the right size
Making any sense? I tend to find that the entire process only really starts to make some sense when the scheme used to encode a potential solution as a chromosome has been explained, as well as how to evaluate a chromosomes ‘fitness’ rating. So hang on in there for a short while yet J
In the process listed above, step 3 mentions the selection process for deciding which two chromosomes (remember these are our guesses) will get the chance to create offspring. The selection methodology we are going to use is Roulette wheel selection. The easiest way to imagine this method is to picture a pie chart in your head. Now... with that in mind, assign each chromosome a slice of the chart..... BUT.... the chromosomes with the higher fitness score will receive a larger slice of the pie chart than those with a lower fitness score. This will result in the higher scoring chromosomes having an increased, but not definite chance of being able to ‘reproduce’.
Now comes the interesting part.... how do the genes within each parent chromosome make up the genes in the offspring chromosome? Two things in our simulation will affect how this is made up, the Crossover and Mutation rate. More imagery will help illustrate this point. Consider the following two chromosomes in a hypothetical population that have been ‘selected’ to reproduce.
0011010001001100
0100101010101001
First off, we need to decide if the two chromosomes will ‘swap’ their bits. We use a constant factor to decide this, a good starting point being 0.6 or 0.7. If the decision is to swap, we select a random point within the string of bits, let’s say 12, and then swap all the bits that follow. So, the chromosomes above will become:
0011010001001001
0100101010101100
We then evaluate the chromosomes to see if mutation needs to be applied to each bit in the chromosomes, the chance of this happening should be set to something very low as mutation should be a rare event.
And... that's the bones of it. I'll put together some source code that illustrates this in action and publish it asap. If anyone has any idea for applying this let me know and we can put something together if it's interesting.