Tim Sneath

Musings of a Client Platform Guy

November, 2003

  • Tim Sneath

    Best Practices Analyzer for SQL Server

    • 0 Comments

    This tool slipped out of Redmond with little fanfare, but it looks quite promising. It does for SQL Server databases roughly what FxCop does for .NET Framework applications: specifically, it scans through the database schemas and associated SQL code using a rules-based engine to identify any deviations from accepted best practices.

    It's currently in beta, and I found the tool interface a little idiosyncratic (someone took the Windows Update design metaphor slightly too far, perhaps), but it has a fairly comprehensive set of rules (you can access them here so that you can view them without having to install the product). Each rule seems to be implemented as a separate .NET assembly, but for this beta version there's no documentation on how to create your own rule.

    The analyzer tool has a lot of potential value for both developers and DBAs; given that you can analyse a remote server, it's worth loading up on a development workstation to iron out any T-SQL glitches before you release a database into production, and then repeating the process once it's in place to spot any operational oversights.

  • Tim Sneath

    .NET User Groups in the UK

    • 4 Comments
    <PublicServiceAnnouncement>

    It's been quite exciting over the last twelve months to see a number of new .NET user groups spring up in the UK. I've got a high regard for user groups in general - they often provide support (of both social and technical varieties), excellent technical information and a better sense of community than I think we'd ever be able to offer ourselves. Meeting up with other local developers can be helpful as a way to build up contacts in the industry too, particularly for freelancers and contractors, and you can learn a lot from hearing others' war stories of how they dealt with development challenges. On a personal level, I've reason to be grateful to a couple of user groups for giving me some of my first big breaks before I joined Microsoft.

    Unfortunately it's quite difficult to find a list of local developer user groups on the Microsoft web site; I'm trying to get that fixed, but in the meantime, here's a list of groups that I'm aware of. Please drop me a line if you're a member or co-ordinator of a relevant UK-based group that's not on the list, and I'll update it.

    • VBUG. Focuses on Visual Basic and .NET development, with over 1200 members and 10 regional chapters. Nationwide.
    • DDG. General focus on professional .NET development, with a regular magazine and meetings in Reading and London. South East.
    • London .NET User Group. Regular meetings at Microsoft's London offices, covering practical .NET development. London.
    • Scottish Developers. Fairly new, but rapidly growing. Regular meetings covering .NET and agile development. Scotland.
    • UK SQL Server Group. Covering all aspects of database development and administration. Nationwide.
    • .NET Exchange. Run by the VBUG team, language-agnostic with a focus on current and future technology. South East.
    • York .NET User Group. Volunteer-led group - don't know much about them at the moment. North.
    </PublicServiceAnnouncement>
  • Tim Sneath

    Experimenting with Yukon Analysis Services

    • 3 Comments

    Following on from the last blog entry, when I wrote about creating an Analysis Services cube to work with blog logs, I've been experimenting with doing the same things with Yukon. I hit a few road bumps along the way, which helped me understand some important differences in the Yukon Analysis Services architecture.

    I should start with a brief eulogy of praise for the new version of Analysis Services. The Yukon team have put an enormous amount of effort into revamping the architecture, and the result of their labours is an environment that blows away the traditional barriers between relational and multidimensional databases. You can now create a cube based on a fully normalised relational database schema; you can use proactive caching features to access active data in real-time; and you can do away with some of the traditional limitations of one fact table per cube and one hierarchy per dimension. The feature that gets a round of applause every time I demonstrate it, even from a traditionally cynical British audience, is the "one click cube", more properly known as IntelliCube. With this feature, you can point Analysis Services at a data source and it can automatically determine the appropriate fact tables, dimensions, hierarchies and levels - a wizard that can potentially turn hours of repetitive work into minutes. Trust me - once you've played with the new stuff, it's painful to go back to the old way of working, good though the 2000 release of Analysis Services is.

    When I used IntelliCube on the website log table, however, I ran into a couple of difficulties. The wizard ran through seamlessly, but it only detected one dimension. Not only that, but it didn't find any hierarchies. Even when I tried to create a time dimension, it didn't allow me to split it by a hierarchy such as year / quarter / month, meaning that each of the 350,000 rows appeared separately without any aggregation occurring. To be absolutely clear, the data source here is unusual: both facts and dimensions occur in the same table. This is something that's not generally recommended for good scalability, but quite viable for a comparatively small quantity of data such as this. Given this fact, I posted a provocatively titled mail to the Picasso team, suggesting that they'd perhaps not considered this particular usage scenario.

    The development team were good enough to respond very quickly and a short phone call later (many thanks, Thierry!) we'd determined what was going wrong. It turned out to my surprise that Yukon was doing more or less the right thing, and the problem was a few subtle changes in terminology that were causing me confusion.

    Firstly, Yukon introduces the concept of attributes that can form part of a dimension. An attribute offers an alternative to dimensions for slicing data. For instance, a Customer dimension might include a traditional location-based hierarchy of country / region / city / street, as well as attributes representing (for instance) gender, household income, and marital status. Wherever you'd traditionally use a dimension, you can optionally choose to use an attribute instead, or even use multiple attributes to create a custom dimension. On my cube, the IntelliCube wizard had simply created each of the slicers (URL, referrer, user agent, time) as attributes rather than the dimensions as existed in the previous release. So rather than having 1 cube and 4 dimensions, I instead have 1 cube, 1 dimension and 4 attributes in the Yukon version.

    The second discrepancy related to the time dimension. Here there is a missing feature in the Beta 1 release of the IntelliCube wizard, as the wizard only supports time dimensions where the field has already been exploded into its constituent parts (year / month / day etc.) Analysis Services 2000 could automatically handle the separation of a timestamp into its constituent parts, but it only created dimension members where a row of data existed. This would mean that if your data only included values for the 1st,2nd,4th and 5th of a month, you would wind up with holes in the dimension members - as if the 3rd of the month never even existed! Yukon can automatically create the members for a particular date period, which resolves this problem, but in Beta 1 this has to be done outside of the wizard.

    The bottom line I hope people take away from this - Analysis Services in Yukon is a very different beast: it's far more powerful and plays even better with the rest of your operational data, but you'll need to make a few mental leaps if you're already used to the current release of Analysis Services. Fortunately, we also include an import utility that enables you to bring existing cubes across without fuss.

    I'll write more about Analysis Services in a couple of weeks, and hopefully drill further into some of the changes that have been made.

  • Tim Sneath

    Using DTS and Analysis Services to Analyse Blog Logs

    • 2 Comments

    Following on from the entry I wrote about analysing blog logs, one or two people have asked me how I set everything up to do this. It's a really good use of Analysis Services, actually, and offers a good example of how Microsoft's business intelligence tools work together. Nothing is too complicated, although there are a number of steps involve to get everything working. If you've a general idea of what DTS and Analysis Services are, you should be able to follow these instructions fairly well - drop a note in the comments section if you're struggling and I'll see what I can do to help out. Note that the blogs hosted here use BlogX, and so what I've written here presumes a log format that matches the one on our server.

    Step 1: Generating a raw log file
    The first step is of course to get hold of the raw log files that contain the information to be analysed. The log files on my server are stored in a separate file per day, so I use a standard command-line utility called wget to pull down these files over HTTP using a batch file. (Incidentally, here's a quick-and-dirty C# version of wget that I wrote.) Once I've got hold of the files, I concatenate them together into a single log file called bloglog.txt using a batch file. This is about 60MB in size, as of the last count.

    Step 2: Generating a destination table
    Next we need to create a database table into which the cleansed, transformed version of the log files will be inserted. To achieve this, simply execute the following SQL (or similar):

       CREATE TABLE [Hits] (
          [HitID] [bigint] IDENTITY (1, 1) NOT NULL,
          [HitTime] [datetime] NULL,
          [URL] [varchar] (400) NULL,
          [URLReferrer] [varchar] (400) NULL,
          [UserAgent] [varchar] (400) NULL,
          CONSTRAINT [PK_Hits] PRIMARY KEY CLUSTERED ([HitID])  
       ) 

    Step 3: Generating a DTS Package
    The main thing to do here is to copy across the data from the raw tab-delimited file to the Hits table. You can do this by creating a Transform Data task that copies the appropriate columns across (using Copy Column transforms). For the date column, you need to use an ActiveX Script transform, since the date format is different in each case. Here's the VBScript to slice the input appropriately:

       '  Copy each source column to the destination column
       Function Main()
          DTSDestination("HitTime") = Left(DTSSource("Col003"), 10) & _
             " " & Mid(DTSSource("Col003"), 12, 8)
          Main = DTSTransformStat_OK
       End Function
    

    Test the package, and then execute it. If all goes well, you should now have a SQL Server table containing the log file. (If you'd like to save time in this step, you can download the DTS package I created and modify it to match your own table and source file names.)

    (Optional) You might now want to go back and add some steps to the DTS package to clean up the table before each execution, run the batch file created in step 1, and even process the cube we'll create in step 4. This is how the final DTS package should look:

    Step 4: Creating an Analysis Services Cube
    Within Analysis Manager, create a new database. Set the data source to the SQL database you've just created. Now right-click on the Cubes node and choose the Cube Wizard. In a more complex data source, you'd split the fact table (the values you're browsing) and the dimension tables (the things you use to slice and dice the values) out into separate tables, but here everything is conveniently stored in the one table. The HitID is the measure - although the value has no meaning, we'll set the usage of this later.

    Create four dimensions, using the star schema model and accepting all the defaults:

    1. URL (standard dimension type based on the URL column)
    2. Referrer (standard dimension type based on the URLReferrer column)
    3. User Agent (standard dimension type based on the UserAgent column)
    4. Time (time dimension type, year / quarter / month / day hierarchy, based on the HitTime column)

    When the wizard dumps you unceremoniously into the Cube Editor, select the HitId measure and click the Properties button in the bottom left hand corner. Set the Aggregate Function to be Count, not Sum. This switches the aggregations to count how many entries there are, not the values contained in them. Lastly, choose Tools / Process Cube, designing storage using the defaults and then executing the process. The cube is now available for browsing.

    NB Make sure you choose the option to incrementally update the shared dimensions, if you've created the dimensions above as shared - otherwise the processing step will fail on repeated attempts.

    Step 5: Use Excel to Browse the Cube
    Now comes the fun part! In Excel, choose Data / PivotTable and PivotChart Report. Select External Data Source and get the data from the OLAP cube (you might have to set up a new query using Microsoft Query). Accept all the other defaults, and you'll wind up with an empty PivotTable embedded in the spreadsheet. You can now drag fields to the PivotTable - make sure you drag HitID to the centre of the table, and then drag the dimensions to rows or columns. The context-sensitive menus allow you to customise sorting, grouping and formatting. Create a PivotChart to see a graphical view, such as this:

    Well done if you've got this far! Have fun analysing your blog data and identifying some of the trends. You might be surprised as to which kinds of blog entries are the most popular, or what aggregators are visiting your site...

  • Tim Sneath

    PDC Sessions Now Available Online

    • 0 Comments

    In case you've not seen them, the PDC sessions are now available online.

Page 1 of 3 (11 items) 123