Benjamin Wright-Jones

Exploring data and distributed systems

July, 2012

  • Benjamin Wright-Jones

    Building a data mart to analyse web log traffic

    I recently had the opportunity to build a data mart in order to analyse web logs from IIS servers.  The solution utilised SQL Server 2012 Database Engine, Integration Services, Analysis Services and Excel 2010 (to slice/dice the data).   

    Loading IIS logs (text files with fixed-width spacing) was fairly challenging, as the column definitions would change throughout the files.  I therefore used the .NET StreamReader class to read the metadata and detect changes in the fields.

    The data flow is shown below.


    A snippet of the code in the Load Data (Source Script Component) is presented below:

    Code Snippet
    // Get variables
    strSourceFile = Variables.vCurrentSourceFileName;
    intAuditLogFileIdentifier = Variables.vAuditExecutionFileIdentifier;

    try
    {
        // Create an instance of StreamReader to read from a file.
        // The using statement also closes the StreamReader.
        using (StreamReader sr = new StreamReader(strSourceFile))
        {
            String line;
            int intNumberOfFields = 0;
            string[] strListOfFields = null;

            Trace.WriteLine("Log File: " + strSourceFile);

            // Output the source file name as the first line (debugging purposes)
            OutputLogFileRawDataBuffer.AddRow();
            OutputLogFileRawDataBuffer.colRawData = strSourceFile;

            // Read and display lines from the file until the end of the file is reached.
            while ((line = sr.ReadLine()) != null)


    Extracting the data from the file was relatively straightforward.  I placed the string into an array based on the fixed spacing between fields.  From the data, I was able to extract useful information such as browser type (e.g. Safari, Chrome, IE) and even browser version.
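To make the parsing approach concrete, here is a minimal sketch in Python (rather than the C# used in the package) of reading a W3C-format IIS log, picking up changes in the #Fields: directive mid-file, and splitting each record into named fields.  The field lists and sample lines below are invented for illustration:

```python
# Minimal sketch (not the original SSIS script): read a W3C-format IIS log,
# track changes in the "#Fields:" directive, and split each record on spaces.
sample_log = """#Fields: date time c-ip cs(User-Agent) sc-status
2012-07-01 10:01:02 192.168.1.10 Mozilla/5.0+(Windows)+Chrome/20.0 200
#Fields: date time c-ip cs-method cs(User-Agent) sc-status
2012-07-01 10:05:00 192.168.1.11 GET Safari/5.1 404
"""

def parse_log(lines):
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line.split()[1:]   # column definitions can change mid-file
            continue
        if line.startswith("#") or not line:
            continue                    # skip other directives and blank lines
        yield dict(zip(fields, line.split()))

records = list(parse_log(sample_log.splitlines()))
print(records[0]["cs(User-Agent)"])
print(records[1]["cs-method"])
```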

    IP addresses were mapped to geolocation using the free GeoLite information (CSV data imported into the database).  I converted longitude and latitude to the spatial geography data type and presented this against a world map (I wanted to validate that the GeoLite data correctly mapped to the IP address e.g. country/city to IP address).

    USE BiKitchen;

    DECLARE @SRID int = 4326
    DECLARE @pLat nvarchar(max)
    DECLARE @pLong nvarchar(max)
    DECLARE @g geography

    -- Check longitude and latitude for London
    SET @pLat = (SELECT CAST(Latitude AS nvarchar(max)) FROM [GeoData].[GeoLiteCity-Location] WHERE locid = 13547)
    SET @pLong = (SELECT CAST(Longitude AS nvarchar(max)) FROM [GeoData].[GeoLiteCity-Location] WHERE locid = 13547)

    SET @g = geography::STPointFromText('POINT(' +
            @pLong + ' ' +
            @pLat + ')', @SRID).BufferWithTolerance(11000, 1000, 0)

    SELECT @pLat AS Latitude, @pLong AS Longitude
    -- SELECT @g.Lat, @g.Long

    -- Map the geography type to base world map data
    -- View the result in the spatial tab to validate coordinates
    SELECT @g AS spatiallocation
    UNION ALL SELECT geog FROM World_Borders
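For context, the GeoLite CSV keys each IP block by integer start/end addresses, so mapping an IP address to a location reduces to converting the dotted address to an integer and doing a range lookup.  A small Python sketch; the block row and coordinates below are invented (reusing locid 13547 from the query above with approximate London coordinates):

```python
import bisect

# Hypothetical rows in the style of the GeoLite CSV files (real files are much larger):
# blocks: (start_ip_as_int, end_ip_as_int, locid); locations: locid -> (city, lat, long)
blocks = [(3221225984, 3221226239, 13547)]      # 192.0.2.0 - 192.0.2.255, example range
locations = {13547: ("London", 51.5142, -0.0931)}

def ip_to_int(ip):
    """Convert a dotted IPv4 address to the integer form used by the GeoLite blocks CSV."""
    a, b, c, d = (int(octet) for octet in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def lookup(ip):
    """Binary-search the sorted block list for the range containing this address."""
    n = ip_to_int(ip)
    i = bisect.bisect_right([start for start, _, _ in blocks], n) - 1
    if i >= 0 and blocks[i][0] <= n <= blocks[i][1]:
        return locations.get(blocks[i][2])
    return None

print(lookup("192.0.2.10"))
```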

    The star schema was built and deployed in SQL Server 2012 Analysis Services (UDM).  I found named calculations to be an incredibly powerful way of extending the data model and making attributes more meaningful for end-users.


    The data was presented using Excel 2010; a screenshot is shown below.  I found slicers to be extremely useful.




    I thought it would be interesting to see what Excel 2013 had to offer so I tried to create a Power View report but this is not currently supported against the UDM.   There are however some nice enhancements to chart types so I’ll be looking at this in more detail.


  • Benjamin Wright-Jones

    Redux: Using an SSIS package to monitor and archive the default trace file


    I’ve recently been asked for details regarding the SSIS package I wrote to copy and archive the default trace file which is generated by SQL Server.  The contents of the file can be quite useful for troubleshooting or diagnosis purposes.

    I’ve updated the package to work with SQL Server 2008 R2 and SQL Server 2012.  

    The detection of a new trace file is implemented using a Script Task which watches the \Log folder for new files.  When a new file is detected, it copies the previous file to an archive location.   The logic is written in Visual Basic .NET (not my personal choice, but I was limited to it in SQL Server 2005 when I originally wrote the package). 


    The archive process renames the file with the date and time and then copies it to a chosen location.   I should point out that I use expressions on a few variables to alter outputs such as the filename, e.g. date_time_filename. 
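The renaming step can be sketched as follows.  This is an illustrative Python version of the date_time_filename pattern; the actual package builds the name with SSIS expressions, and the exact format string here is an assumption:

```python
from datetime import datetime

def archive_name(filename, now=None):
    """Build an archive name in the date_time_filename pattern.
    The format string is illustrative, not taken from the package."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d_%H%M%S") + "_" + filename

print(archive_name("log_1110.trc", datetime(2012, 7, 1, 9, 30, 0)))
```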

    I also noticed that the service needs appropriate permissions both to access the \LOG directory and to copy to the target directory (in my scenario, \\server\share).  When testing, I launched SSDT (SQL Server Data Tools) with Administrator privileges as a quick workaround to permission issues.


    Here is the code for the Script Task (apologies for the word wrap; the Live Writer plug-in seems to do this to fit it on the page).  I have commented out some of the WriteLine commands I was using to debug the package when it was initially developed.

    Code Snippet
    'Disclaimer:
    'The sample scripts and SSIS package are not supported under any Microsoft standard support program or service.
    'The sample scripts and SSIS package are provided AS IS without warranty of any kind.
    'Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose.
    'The entire risk arising out of the use or performance of the sample scripts and documentation remains with you.
    'In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits,
    'business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.

    Public Sub Main()

        Dim vars As Variables
        Dim strComputer As String
        Dim objWMIService As Object
        Dim colMonitoredEvents As Object
        Dim objEventObject As Object
        Dim strSourceDirectory As String
        Dim strServerName As String
        Dim strSourceErrorLogDirectory As String
        Dim strSourceErrorLogDirectoryWithQuotes As String

        Try
            ' Use the SSIS variables in this code for the WMI query
            strServerName = Dts.Variables("v_ServerName").Value
            strSourceErrorLogDirectory = Dts.Variables("v_DT_SourceLogDirectory").Value

            Console.WriteLine("Servername: " + strServerName)
            Console.WriteLine("Monitoring \Log Directory: " + strSourceErrorLogDirectory)

            ' Replace \ with \\\\ which is needed for the WMI query
            strSourceErrorLogDirectory = Replace(strSourceErrorLogDirectory, "\", "\\\\")
            strSourceErrorLogDirectoryWithQuotes = Chr(34) & strSourceErrorLogDirectory & Chr(34)

            'MsgBox("Server Name: " + strServerName)

            ' Connect to the WMI source
            objWMIService = GetObject("winmgmts:\\" & strServerName & "\root\cimv2")

            ' Monitor the error log folder for instances of ERRORLOG.1 as this is the file we want to archive
            ' The directory name is parameterised and populated from the SSIS variable

            ' Monitor the directory for new default trace files
            colMonitoredEvents = objWMIService.ExecNotificationQuery _
                ("SELECT * FROM __InstanceCreationEvent WITHIN 10 WHERE " _
                    & "Targetinstance ISA 'CIM_DirectoryContainsFile' and " _
                        & "TargetInstance.GroupComponent= " _
                                & "'Win32_Directory.Name=" & strSourceErrorLogDirectoryWithQuotes & "'")

            objEventObject = colMonitoredEvents.NextEvent()

            'MsgBox("A new file was just created: " + objEventObject.TargetInstance.PartComponent)

            Dim strReturned, strFilePath As String

            strReturned = objEventObject.TargetInstance.PartComponent
            strFilePath = Split(strReturned, "CIM_DataFile.Name=")(1)
            strFilePath = Replace(strFilePath, """", "")
            strFilePath = Replace(strFilePath, "\\", "\")
            'MsgBox("Sliced file: " + strFilePath)

            ' strFilePath is C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\log_nnn.trc
            ' Array element (6) is log_nnn.trc, this assumes the above directory structure
            Dim strFilename As String

            'strFilename = Split(strFilePath, "\")(6)
            'MsgBox("Split: " + strFilename)

            strFilename = System.IO.Path.GetFileName(strFilePath)
            'MsgBox("IO.Path: " + strFilename)

            ' If filename like log_ then enter this code path
            ' The default trace filename is always log_ so we can rely on this for filename matching
            If strFilename Like "log_*.trc" Then

                Console.WriteLine("A new default trace file was just created in \LOG called " + strFilename)
                Trace.WriteLine("A new default trace file was just created in \LOG called " + strFilename)

                ' Archive the previous default trace file
                'MsgBox("Default Trace found, now process the file")

                Dim arrContainer As Array
                Dim intTraceFileNumber As Integer
                Dim strArchiveFileName As String

                arrContainer = Split(strFilename, "_")

                'Console.WriteLine(arrContainer(0).ToString)
                'Console.WriteLine(arrContainer(1).ToString)

                ' Split 1111.trc so we only store 1111 to convert to int
                arrContainer = Split(arrContainer(1), ".")

                ' This is the active default trace file number
                'Console.WriteLine(arrContainer(0).ToString)

                ' Convert the active trace file number to int and decrease by 1
                intTraceFileNumber = CInt(arrContainer(0)) - 1

                ' Convert back to string and create the default trace file name
                strArchiveFileName = "log_" + CStr(intTraceFileNumber) + ".trc"
                'Console.WriteLine("Archiving: " + strArchiveFileName + " to " + Dts.Variables("v_DT_DestinationDefaultTraceDirectory").Value)

                'MsgBox(strArchiveFileName)

                'Write the filename to the SSIS variable
                Dts.Variables("v_DT_ActiveFileName").Value = strArchiveFileName

                Console.WriteLine("Archiving: " + strArchiveFileName + " to " + Dts.Variables("v_DT_DestinationDefaultTraceDirectory").Value)
                MsgBox("Output to SSIS Variable: " + Dts.Variables("v_DT_ActiveFileName").Value + " strFilename: " + strArchiveFileName)

                ' Indicate success to move on to the next step
                Dts.TaskResult = ScriptResults.Success
            End If

            ' Error handling
        Catch ex As Exception
            Console.WriteLine(System.DateTime.Now.ToString + " - SSIS Script Task Error: " + ex.Message.ToString)
            Dts.TaskResult = ScriptResults.Failure
        End Try

    End Sub
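Stripped of the WMI plumbing, the filename arithmetic above (archive the trace file numbered one below the newly created one) is simply this; sketched here in Python for clarity:

```python
def previous_trace_file(active):
    """Given the newly created default trace file name (log_<n>.trc),
    return the name of the file to archive (log_<n-1>.trc)."""
    number = int(active.split("_")[1].split(".")[0])
    return "log_{}.trc".format(number - 1)

print(previous_trace_file("log_1111.trc"))
```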


    I hope you find this useful.

    I will try and attach the SSIS package to this post later.

  • Benjamin Wright-Jones

    Redux: Using a C# script task in SSIS to download a file over http


    A few people have asked for further information about the C# script task which I blogged about quite a while ago.  I forgot to include the full source code at the time, sorry everyone.  Here is the link to the original blog post:

    I have since imported the SSIS package into Visual Studio 2010 (BIDS) and the code compiles without error.  Some of the code below is truncated on the right (just a formatting issue I need to resolve) but the core of the code is there.  

    As always, let me know if there are any problems. 

    /*
       Download a file over http using Script Task in SQL Server 2008 R2 Integration Services.
       Two key variables: vSSOReportURL, which is constructed in a prior Script Task e.g. http://www..
       vSSOLocalFileName, which is the fully qualified reference for the downloaded file e.g. c:\
    */
    using System;
    using System.Data;
    using Microsoft.SqlServer.Dts.Runtime;
    using System.Windows.Forms;
    using System.Net;
    using System.Net.Security;

    namespace ST_7e897e41dd5945f3b77366d32f0a97e0.csproj
    {
        [Microsoft.SqlServer.Dts.Tasks.ScriptTask.SSISScriptTaskEntryPointAttribute]
        public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
        {
            #region VSTA generated code
            enum ScriptResults
            {
                Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
                Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
            };
            #endregion

            public void Main()
            {
                WebClient myWebClient;
                string RemoteURI;
                string LocalFileName;
                bool FireAgain = true;
                Variables vars = null;

                Dts.VariableDispenser.LockForRead("User::vSSOReportURL");
                Dts.VariableDispenser.LockForRead("User::vSSOLocalFileName");
                Dts.VariableDispenser.LockForWrite("User::vSSOReportURLIndicator");
                Dts.VariableDispenser.GetVariables(ref vars);

                try
                {
                    // Ignore certificate warnings
                    ServicePointManager.ServerCertificateValidationCallback =
                        new RemoteCertificateValidationCallback(delegate { return true; });

                    // Initiate webclient download, use default credentials (current login)
                    myWebClient = new WebClient();
                    myWebClient.Credentials = CredentialCache.DefaultCredentials;

                    RemoteURI = vars["User::vSSOReportURL"].Value.ToString();
                    LocalFileName = vars["User::vSSOLocalFileName"].Value.ToString();

                    // Log provider notification
                    Dts.Events.FireInformation(0, String.Empty,
                        String.Format("Downloading '{0}' from '{1}'", LocalFileName, RemoteURI),
                        String.Empty, 0, ref FireAgain);

                    // Download the file
                    myWebClient.DownloadFile(RemoteURI, LocalFileName);

                    // Set report URL indicator, this is used to determine the http source of the
                    // download i.e. vSSOReportURL or vSSOReportURLRetry for the message which is
                    // written to the table
                    vars["User::vSSOReportURLIndicator"].Value = 0;

                    // Return success
                    Dts.TaskResult = (int)ScriptResults.Success;
                }
                catch (Exception ex)
                {
                    // Catch and handle error
                    Dts.Events.FireError(0, String.Empty, ex.Message, String.Empty, 0);
                    Dts.TaskResult = (int)ScriptResults.Failure;
                }
            }
        }
    }

    The files are extracted using an Execute Process Task (with 7-Zip) as shown below:


    And the arguments are set using the expression (below).  There are probably better ways of doing this but I found this worked well.


    The .zip file is then archived using a File System task and the extracted file is renamed to .xlsx. 
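For illustration, the argument string the expression builds for 7-Zip looks something like the following (the paths and variable values are hypothetical; in the 7-Zip command line, e extracts, -o sets the output directory, and -y answers yes to all prompts):

```python
# Hypothetical sketch of the argument string passed to 7z.exe by the
# Execute Process Task; the paths here are illustrative, not from the package.
zip_file = r"C:\Downloads\report.zip"
extract_dir = r"C:\Downloads\Extracted"
arguments = 'e "{0}" -o"{1}" -y'.format(zip_file, extract_dir)
print(arguments)
```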

  • Benjamin Wright-Jones

    Are you interested in Data Science?


    The University of Dundee is now gauging interest in a number of data science focused courses (see below).


    You can register interest via this link.
