Blog Log Analysis

I've been keeping a blog for about two months now, and I thought it would be an interesting exercise to do some analysis of the logs. The blogging application that this site uses (BlogX) records the daily hits each blog gets in a tab-delimited file, so I used Data Transformation Services to clean the data up a bit and import it into SQL Server, and then used Analysis Services to create a multidimensional cube that I could manipulate with Excel. This process worked very smoothly, and removed the need to purchase a specialised web reporting tool. I'll document this process more fully at a later stage, but the information gleaned from the analysis was quite revealing about the current status of the blogging world:

  • At the moment my blog averages around 40,000 hits per month. I've no idea how that compares to other blogs out there, but knowing that your blog is read is definitely a motivating factor when writing new entries! I suspect that most people stumble across this blog because it's posted on the main GotDotNet blogs page; I'm certainly under no illusions that it's to do with any personal fame. Like any other website, one of the biggest challenges of a blog is capturing and maintaining traffic to the site. For bloggers without the inherent advantage of working for Microsoft, aggregation sites such as PDC Bloggers are probably one of the best ways to spread the word.
  • I'm amused and amazed at how many people have wound up at the blog by means of a Google search. Unsurprisingly, searching for "Tim Sneath" brings the blog more or less to the top of the results, but I've had hits that have come from such bizarre search terms as "lossless wma", "Sitar music that you can listen to on the net", and "Frank Zappa AND Albanian Music"! Approximately 5% of browser hits to the site come via Google; other search engines might as well not exist for the traffic they bring.
  • There's an astonishing variety of blog aggregators and browsing tools in use: I counted over 500 distinct user agent strings. Of the aggregators, the various versions of SharpReader are the most popular, with a 46% share; Newsgator comes next with 23%; NewzCrawler has a 5% share, and many others have smaller shares. (Incidentally, 8% of visitors have an empty user agent string, a surprisingly high number.) I'm a SharpReader user myself; although I've never done an exhaustive survey of aggregation tools, I've certainly heard good things about Newsgator. What's NewzCrawler like (I've not come across it before)?
  • The most popular blog entries have been ADO.NET Tips and Tricks, Mind Mapping and New C# Features in Whidbey. The last of the three can be explained by a link from Robert Scoble's immensely popular blog, but the other two were a little more unexpected. I'll write more on ADO.NET shortly.
  • Traffic drops by about 20% at the weekend. I was expecting a bigger drop, but I guess many people leave their computers on permanently, so the aggregators continue to poll for new content.
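
For anyone who wants to try this kind of breakdown without a cube, here is a minimal Python sketch of the user agent share and weekend drop calculations above. It is not the DTS and Analysis Services process I actually used, and the file layout is an assumption: three tab-separated columns (ISO date, user agent, hit count) with no header row. The real BlogX log format may well differ, and the function names and sample data are purely illustrative.

```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical sample data; 2003-09-05 is a Friday, 2003-09-06 a Saturday.
SAMPLE = (
    "2003-09-05\tSharpReader\t60\n"
    "2003-09-05\tNewsgator\t40\n"
    "2003-09-06\tSharpReader\t50\n"
    "2003-09-06\t\t30\n"
)


def load_hits(text):
    """Parse tab-delimited rows of (ISO date, user agent, hit count)."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        if len(record) != 3:
            continue  # skip blank or malformed lines
        day, agent, hits = record
        rows.append((date.fromisoformat(day), agent.strip(), int(hits)))
    return rows


def agent_share(rows):
    """Return each user agent's percentage share of total hits."""
    totals = Counter()
    for _, agent, hits in rows:
        totals[agent or "(empty)"] += hits
    grand_total = sum(totals.values())
    return {agent: 100.0 * n / grand_total for agent, n in totals.items()}


def weekend_drop(rows):
    """Return the percentage by which average daily hits fall at the weekend."""
    per_day = Counter()
    for day, _, hits in rows:
        per_day[day] += hits
    # date.weekday() gives Monday=0 .. Sunday=6, so 5 and 6 are the weekend.
    weekday = [n for d, n in per_day.items() if d.weekday() < 5]
    weekend = [n for d, n in per_day.items() if d.weekday() >= 5]
    weekday_avg = sum(weekday) / len(weekday)
    weekend_avg = sum(weekend) / len(weekend)
    return 100.0 * (1 - weekend_avg / weekday_avg)


if __name__ == "__main__":
    rows = load_hits(SAMPLE)
    for agent, share in sorted(agent_share(rows).items(), key=lambda kv: -kv[1]):
        print(f"{agent}: {share:.1f}%")
    print(f"Weekend drop: {weekend_drop(rows):.1f}%")
```

To run it against a real log, read the file's contents into a string and pass that to load_hits in place of SAMPLE. Note that weekend_drop needs at least one weekday and one weekend day in the data, otherwise it will divide by zero.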

Overall it's been an intriguing experiment. I look forward to repeating it in a couple of months to see whether there have been any noticeable changes of trend as weblogging continues to mature.

  • Hi! Is there a general place I could go to work up a solution like yours? I'm trying to analyse my blog, and I like your solution, but I know nothing about datacubes :)
  • Addendum: Have now written up the "HOWTO" for creating the Analysis Services cube at the following location:
  • Here's another google search that put your blog on the 1st page - "most popular blogs". That should keep you writing. If you find yourself ever at a loss for something to comment on you can go to the Random Thoughts section of for a collection of rarely, if ever, voiced ideas.