Experimenting with Yukon Analysis Services

Experimenting with Yukon Analysis Services

  • Comments 3

Following on from the last blog entry, when I wrote about creating an Analysis Services cube to work with blog logs, I've been experimenting with doing the same things with Yukon. I hit a few road bumps along the way, which helped me understand some important differences in the Yukon Analysis Services architecture.

I should start with a brief eulogy of praise for the new version of Analysis Services. The Yukon team have put an enormous amount of effort into revamping the architecture, and the result of their labours is an environment that blows away the traditional barriers between relational and multidimensional databases. You can now create a cube based on a fully normalised relational database schema; you can use proactive caching features to access active data in real-time; and you can do away with some of the traditional limitations of one fact table per cube and one hierarchy per dimension. The feature that gets a round of applause every time I demonstrate it, even from a traditionally cynical British audience, is the "one click cube", more properly known as IntelliCube. With this feature, you can point Analysis Services at a data source and it can automatically determine the appropriate fact tables, dimensions, hierarchies and levels - a wizard that can potentially turn hours of repetitive work into minutes. Trust me - once you've played with the new stuff, it's painful to go back to the old way of working, good though the 2000 release of Analysis Services is.

When I used IntelliCube on the website log table, however, I ran into a couple of difficulties. The wizard ran through seamlessly, but it only detected one dimension. Not only that, but it didn't find any hierarchies. Even when I tried to create a time dimension, it didn't allow me to split it by a hierarchy such as year / quarter / month, meaning that each of the 350,000 rows appeared separately without any aggregation occurring. To be absolutely clear, the data source here is unusual: both facts and dimensions occur in the same table. This is something that's not generally recommended for good scalability, but quite viable for a comparatively small quantity of data such as this. Given this fact, I posted a provocatively titled mail to the Picasso team, suggesting that they'd perhaps not considered this particular usage scenario.

The development team were good enough to respond very quickly and a short phone call later (many thanks, Thierry!) we'd determined what was going wrong. It turned out to my surprise that Yukon was doing more or less the right thing, and the problem was a few subtle changes in terminology that were causing me confusion.

Firstly, Yukon introduces the concept of attributes that can form part of a dimension. An attribute offers an alternative to dimensions for slicing data. For instance, a Customer dimension might include a traditional location-based hierarchy of country / region / city / street, as well as attributes representing (for instance) gender, household income, and marital status. Wherever you'd traditionally use a dimension, you can optionally choose to use an attribute instead, or even use multiple attributes to create a custom dimension. On my cube, the IntelliCube wizard had simply created each of the slicers (URL, referrer, user agent, time) as attributes rather than the dimensions as existed in the previous release. So rather than having 1 cube and 4 dimensions, I instead have 1 cube, 1 dimension and 4 attributes in the Yukon version.

The second discrepancy related to the time dimension. Here there is a missing feature in the Beta 1 release of the IntelliCube wizard, as the wizard only supports time dimensions where the field has already been exploded into its constituent parts (year / month / day etc.) Analysis Services 2000 could automatically handle the separation of a timestamp into its constituent parts, but it only created dimension members where a row of data existed. This would mean that if your data only included values for the 1st,2nd,4th and 5th of a month, you would wind up with holes in the dimension members - as if the 3rd of the month never even existed! Yukon can automatically create the members for a particular date period, which resolves this problem, but in Beta 1 this has to be done outside of the wizard.

The bottom line I hope people take away from this - Analysis Services in Yukon is a very different beast: it's far more powerful and plays even better with the rest of your operational data, but you'll need to make a few mental leaps if you're already used to the current release of Analysis Services. Fortunately, we also include an import utility that enables you to bring existing cubes across without fuss.

I'll write more about Analysis Services in a couple of weeks, and hopefully drill further into some of the changes that have been made.

  • I just recently got to use yukon beta and play around with it. the one-click cube feature is fantastic. Since i was born curious, Ive taken it upon myself to determine the algorithm behind such a feature.

    Im starting with just trying to figure out how you would determine a fact table first - and then worry about dimensions later - but its difficult. Fun though.. but difficult.

    Heres my initial cut.

    POTENTIAL FACT TABLES
    --------------------------------
    1) Locate all Tables in database (one db only)
    that have a foreign key constraint on them.
    2) Locate all Tables that are alone and not connected to any other tables.

    But using this formula and applying it with the northwind database gives me more fact tables than IntelliCube determines.
  • Hi Tim. I think I've hit those speedbumps too.

    Can you tell me whether the June CTP of Yukon Analysis Services still has the 'unable to automatically form a time hierarchy from a datetime field' problem?
  • Krakow accommodation
    http://hotels.krakow.pl
Page 1 of 1 (3 items)