I recently had the opportunity to build a data mart to analyse web logs from IIS servers. The solution used the SQL Server 2012 Database Engine, Integration Services, Analysis Services and Excel 2010 (to slice and dice the data).

Loading the IIS logs (text files with space-separated fields) was fairly challenging, as the column definitions could change partway through a file. I therefore used the .NET StreamReader class to read the metadata rows and detect changes in the field list.

The data flow is shown below.

[Image: the SSIS data flow]

A snippet of the code in the Load Data Script Component (the data flow source) is presented below.

// Get variables
strSourceFile = Variables.vCurrentSourceFileName;
intAuditLogFileIdentifier = Variables.vAuditExecutionFileIdentifier;

try
{
    // Create an instance of StreamReader to read from a file.
    // The using statement also closes the StreamReader.
    using (StreamReader sr = new StreamReader(strSourceFile))
    {
        string line;
        int intNumberOfFields = 0;
        string[] strListOfFields = null;

        Trace.WriteLine("Log File: " + strSourceFile);

        // Output the source file name as the first line (debugging purposes)
        OutputLogFileRawDataBuffer.AddRow();
        OutputLogFileRawDataBuffer.colRawData = strSourceFile;

        // Read lines from the file until the end of the file is reached
        while ((line = sr.ReadLine()) != null)
        {
            // ... detect #Fields: metadata rows here and emit each
            // data row to the output buffer (see the sketch below)
        }
    }
}
catch (Exception ex)
{
    // Completion of the truncated snippet: surface the failure to the SSIS runtime
    bool pbCancel = false;
    ComponentMetaData.FireError(0, "Load Data", ex.Message, string.Empty, 0, out pbCancel);
}
 

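IIS W3C logs declare their column layout in #Fields: directive rows, and a new directive can appear partway through a file whenever the logging configuration changes. The sketch below shows one way such a metadata row could be detected and parsed; the helper name is mine and the exact handling in the original component may differ.

// Illustrative helper (not the original code): parse a #Fields: directive
// from a W3C log line, e.g. "#Fields: date time c-ip cs(User-Agent) sc-status".
// Returns the new field list, or null if the line is not a directive.
private static string[] TryParseFieldsDirective(string line)
{
    const string prefix = "#Fields:";
    if (line == null || !line.StartsWith(prefix))
        return null;

    return line.Substring(prefix.Length)
               .Trim()
               .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}

Inside the read loop, a non-null result would replace strListOfFields and intNumberOfFields, so that subsequent data rows are interpreted against the current field list.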
Extracting the data from the file was relatively straightforward: I split each line into an array based on the fixed spacing between fields. From the data, I was able to extract useful information such as browser type (e.g. Safari, Chrome, IE) and even browser version.
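As an illustration of that splitting, the sketch below classifies the browser from the cs(User-Agent) field; the column index is an assumption, since the real position depends on the current #Fields: list.

// Illustrative (not the original code): split a data row and classify
// the browser from the cs(User-Agent) field; index 9 is an assumed position.
string[] fields = line.Split(' ');
string userAgent = fields[9];

// Chrome user-agent strings also contain "Safari", so test for Chrome first.
string browser;
if (userAgent.Contains("Chrome")) browser = "Chrome";
else if (userAgent.Contains("Safari")) browser = "Safari";
else if (userAgent.Contains("MSIE")) browser = "IE";
else browser = "Other";

The browser version can be pulled out in the same way, for example by taking the token that follows "MSIE" or "Chrome/".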

IP addresses were mapped to geolocations using the free GeoLite data from http://www.maxmind.com/app/geolite (the CSV data was imported into the database). I converted the longitude and latitude to the spatial geography data type and presented the points against a world map, to validate that the GeoLite data mapped each IP address to the correct country/city.

USE BiKitchen;

DECLARE @SRID int = 4326
DECLARE @pLat nvarchar(max)
DECLARE @pLong nvarchar(max)
DECLARE @g geography

-- Check longitude and latitude for London
SET @pLat = (SELECT CAST(Latitude AS nvarchar(max)) FROM [GeoData].[GeoLiteCity-Location] WHERE locid = 13547)
SET @pLong = (SELECT CAST(longitude AS nvarchar(max)) FROM [GeoData].[GeoLiteCity-Location] WHERE locid = 13547)

SET @g = geography::STPointFromText('POINT(' +
        @pLong + ' ' +
        @pLat + ')', @SRID).BufferWithTolerance(11000, 1000, 0)

SELECT @pLat AS Latitude, @pLong AS Longitude
-- SELECT @g.Lat, @g.Long

-- Map the geography type to base world map data
-- View the result in the spatial tab to validate coordinates
SELECT @g AS spatiallocation
UNION ALL SELECT geog FROM World_Borders
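To join the web-log IP addresses to the imported GeoLite data, each address first needs the integer encoding that the GeoLite Blocks CSV (the companion to the Location table queried above) uses for its IP range columns. A minimal sketch of that conversion (the helper name is mine):

// Convert a dotted-quad IPv4 address to the integer form used by the
// GeoLite Blocks data: a.b.c.d -> a*16777216 + b*65536 + c*256 + d
private static long ToIpNumber(string ipAddress)
{
    string[] octets = ipAddress.Split('.');
    return long.Parse(octets[0]) * 16777216L
         + long.Parse(octets[1]) * 65536L
         + long.Parse(octets[2]) * 256L
         + long.Parse(octets[3]);
}

The resulting number can then be matched to a location with a BETWEEN startIpNum AND endIpNum lookup against the Blocks table.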
 

The star schema was built and deployed in SQL Server 2012 Analysis Services (UDM). I found named calculations to be an incredibly powerful way of extending the data model and making attributes more meaningful for end users.

[Image: the Analysis Services data model]

The data was presented using Excel 2010; screenshots are shown below. I found slicers to be extremely useful for filtering the data.

[Images: Excel 2010 reports with slicers]

 

I thought it would be interesting to see what Excel 2013 had to offer, so I tried to create a Power View report, but this is not currently supported against the UDM. There are, however, some nice enhancements to the chart types, so I’ll be looking at these in more detail.

[Image: Excel 2013]