I recently interviewed Denny Lee from the SQL Customer Advisory Team about applying data mining to medical research; you can watch the video on Channel 8. I asked Denny about other areas of academic research where data mining could be applied, and I also gave the question some thought on my own. One interesting area could be genetic algorithm (GA) research.

My advisor for my Computer Science degree was Dr. Jeff Horn, whose main area of research was GAs. I am by no means an expert on the topic, but I had the opportunity to work with the technology a bit. GAs are generally used to find “best fit” solutions to NP-complete problems, where testing every possible combination is “impossible,” so a method for finding a good-enough solution is needed.

GAs use a paradigm based on evolutionary biology in which candidate solutions are combined, crossed over, and mutated over successive generations. The goal is to find the solution with the best “fitness”. The GAs we were using 10 years ago took a number of startup parameters, such as the number of generations and the probabilities of mutation and crossover. Depending on the problem set, different GA runs would arrive at solutions of varying fitness.
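To make the “startup parameters” idea concrete, here is a minimal GA sketch. It is not the code we used back then: the toy “OneMax” objective (maximize the number of 1 bits), tournament selection, one-point crossover, and bit-flip mutation are all assumptions chosen for brevity, and `GAParams` and `run_ga` are hypothetical names. The point is that every run is driven by a handful of knobs and produces a per-generation fitness history that is worth keeping.

```python
import random
from dataclasses import dataclass


@dataclass
class GAParams:
    """Hypothetical startup parameters for one GA run."""
    generations: int = 50
    population_size: int = 30
    crossover_rate: float = 0.7
    mutation_rate: float = 0.01
    genome_length: int = 20


def fitness(genome: list[int]) -> int:
    # Toy "OneMax" objective: count the 1 bits. A real problem plugs in here.
    return sum(genome)


def tournament(pop: list[list[int]], rng: random.Random, k: int = 3) -> list[int]:
    # Pick k random individuals and keep the fittest one.
    return max(rng.sample(pop, k), key=fitness)


def run_ga(params: GAParams, seed: int = 0) -> tuple[int, list[int]]:
    """Run one GA; return final best fitness and best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(params.genome_length)]
           for _ in range(params.population_size)]
    history = []
    for _ in range(params.generations):
        next_pop = []
        while len(next_pop) < params.population_size:
            a, b = tournament(pop, rng), tournament(pop, rng)
            if rng.random() < params.crossover_rate:
                # One-point crossover: prefix of one parent, suffix of the other.
                cut = rng.randrange(1, params.genome_length)
                child = a[:cut] + b[cut:]
            else:
                child = a
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < params.mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
        history.append(max(fitness(g) for g in pop))
    return history[-1], history
```

Calling `run_ga(GAParams(mutation_rate=0.05), seed=7)` versus `run_ga(GAParams(), seed=7)` is exactly the kind of experiment that, repeated many times, produces the “varying solutions in terms of fitness” described above.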

Now you may be asking yourself “What about data mining?”

We would run these GAs over and over, logging and analyzing the results. My suggestion would be to store the startup parameters, together with the results of each run or the performance of each generation, in a data warehouse. Examples of questions that data mining techniques such as clustering and classification could help answer include:
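As a rough illustration of what “store the startup parameters and results” might look like, here is a sketch that uses an in-memory SQLite database as a stand-in for a real data warehouse. The two-table layout (one row per run, one row per generation), the column names, and the helper functions are all hypothetical; a production warehouse would more likely use a proper star schema.

```python
import sqlite3

# Hypothetical schema: one row per GA run (startup parameters + outcome)
# and one row per generation, so both levels of detail can be mined later.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ga_run (
    run_id          INTEGER PRIMARY KEY,
    generations     INTEGER NOT NULL,
    population_size INTEGER NOT NULL,
    crossover_rate  REAL    NOT NULL,
    mutation_rate   REAL    NOT NULL,
    seed            INTEGER NOT NULL,
    best_fitness    REAL    NOT NULL
);
CREATE TABLE IF NOT EXISTS ga_generation (
    run_id       INTEGER NOT NULL REFERENCES ga_run(run_id),
    generation   INTEGER NOT NULL,
    best_fitness REAL    NOT NULL,
    PRIMARY KEY (run_id, generation)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_run(conn: sqlite3.Connection, params: dict, seed: int,
            history: list[float]) -> int:
    """Store one run's parameters plus its per-generation best fitness.

    The run-level best_fitness is the best value seen in any generation.
    """
    cur = conn.execute(
        "INSERT INTO ga_run (generations, population_size, crossover_rate,"
        " mutation_rate, seed, best_fitness) VALUES (?, ?, ?, ?, ?, ?)",
        (params["generations"], params["population_size"],
         params["crossover_rate"], params["mutation_rate"], seed, max(history)),
    )
    run_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO ga_generation (run_id, generation, best_fitness)"
        " VALUES (?, ?, ?)",
        [(run_id, g, f) for g, f in enumerate(history)],
    )
    conn.commit()
    return run_id


def fitness_by_mutation_rate(conn: sqlite3.Connection) -> list[tuple]:
    # A simple first question: average best fitness per mutation-rate setting.
    return conn.execute(
        "SELECT mutation_rate, AVG(best_fitness), COUNT(*) FROM ga_run"
        " GROUP BY mutation_rate ORDER BY mutation_rate"
    ).fetchall()
```

Keeping the per-generation table alongside the per-run table means you can later ask questions about convergence speed, not just final fitness.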

·         Which ranges of startup parameters cluster together, and which of those clusters produce the highest fitness?

·         Is the algorithm producing clusters of solutions? If so, what are they, and what causes them?
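The first question can be approached with a clustering algorithm. SQL Server Analysis Services ships a Microsoft Clustering algorithm for this kind of work; as a tool-neutral stand-in, the sketch below uses plain k-means (Lloyd's algorithm) written in standard-library Python. It assumes each logged run has been reduced to a hypothetical `(crossover_rate, mutation_rate, best_fitness)` tuple, clusters runs on the two parameters only (min-max scaled so neither dominates the distance), and then ranks the clusters by the mean fitness their runs achieved.

```python
import random
from statistics import mean


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2
                                            for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            new_centroids.append(
                tuple(mean(dim) for dim in zip(*members)) if members
                else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def _scale(rows):
    # Min-max scale each column to [0, 1] so no parameter dominates.
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(r, lo, hi)) for r in rows]


def rank_parameter_clusters(runs, k, seed=0):
    """runs: list of (crossover_rate, mutation_rate, best_fitness) tuples."""
    params = [(cx, mu) for cx, mu, _ in runs]
    _, labels = kmeans(_scale(params), k, seed=seed)
    summary = []
    for c in range(k):
        members = [r for r, lab in zip(runs, labels) if lab == c]
        if not members:
            continue
        cxs, mus = [r[0] for r in members], [r[1] for r in members]
        summary.append({
            "crossover_range": (min(cxs), max(cxs)),
            "mutation_range": (min(mus), max(mus)),
            "mean_fitness": mean(r[2] for r in members),
            "runs": len(members),
        })
    # Highest-fitness parameter region first.
    return sorted(summary, key=lambda s: s["mean_fitness"], reverse=True)
```

Reporting each cluster as a min/max range per parameter, rather than as a centroid in scaled space, maps directly onto the question of which *ranges* of startup parameters work best. The choice of `k` is itself an assumption you would want to vary.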

These questions only scratch the surface, but I think researchers who live and breathe this stuff could benefit from data mining tools to complement their research and tune their algorithms.

Do any readers out there have other ideas for how data mining could be applied in academia?