Despite the beautiful sunny weather in the Bay Area this weekend, I spent most of my time indoors reading Toby Segaran's book "Programming Collective Intelligence". (Ok, I did fit in a couple of trips to the Dog Park - expectations are firmly fixed there.) This is the most engaging 'data book' I've read since the Kimball Group's Data Warehouse series. Although most business scenarios I encounter can be solved by looking at simple metrics, e.g. Orders Shipped, Page Views, Conversion Rates, eCPM, Toby's book challenged me to think about the issues more broadly. I've read a few data mining books in the past that covered many of the same algorithms (e.g. collaborative filtering, decision trees, SVM), but none were as approachable as Programming Collective Intelligence.

Toby captures the relevant theory and mechanics of each algorithm in the narrative - many books leave this as a series of equations with minimal textual explanation. And he uses Python code to implement each algorithm. It's awesome! Something about seeing working code for these algorithms speaks clearly to me. Yes, yes, I could crack open R's source code. I've done that a couple of times, but despite R being a wonderful language for data analysis, its code is not as readable as Python. Nor do the comments in the R source code paint the picture as clearly as Toby's narrative. Furthermore, Python is more of a mainstream language, so incorporating it into current IT environments would be easier.

In short, if you are looking for a friendly introduction to data mining, check out 'Programming Collective Intelligence'. It may just make you want to crack open your web logs or customer database for another look. More about Toby and his book can be found on his blog (http://kiwitobes.com/). Also, in case you are wondering, I don't know Toby; I just admire his book :).

 -Daniel