Working with SharePoint, I often find myself in situation where I need to analyze IIS log files in a variety of ways to understand what kind of load is being placed on the system. In an effort to make this easier for me, and others, I have tried to automate, script, etc., wherever possible.
The first pass that I typically run on the IIS data is to slice the IIS traffic by hour. IIS Logs files record their time in UTC; however, most people live/operate in a different time zone. In order to make the results more reflective of the actual user environment, you will need to adjust the times, either forward or backwards relative to UTC, to compensate for your particular environment.
For ease of use, I typically put my Log Parser scripts into a file and then reference them via the command line. This has the added advantage of allowing me to pass in arguments from the command line. As I mentioned before, I have two scripts – one for negative offset to UTC and the other for positive offset to UTC.
HitsByHourMinusGMT.sql
1: SELECT 2: SUB(QUANTIZE(TO_TIMESTAMP(date, time), 3600), TIMESTAMP(‘%tzoffset%’,’hh:mm’)) AS Hour, 3: COUNT(*) AS Hits 4: INTO 5: %destination% FROM %source% 6: GROUP BY 7: Hour 8: ORDER BY 9: Hour ASC
1: SELECT
2: SUB(QUANTIZE(TO_TIMESTAMP(date, time), 3600), TIMESTAMP(‘%tzoffset%’,’hh:mm’)) AS Hour,
3: COUNT(*) AS Hits
4: INTO
5: %destination% FROM %source%
6: GROUP BY
7: Hour
8: ORDER BY
9: Hour ASC
HitsByHourPlusGMT.sql
1: SELECT 2: ADD(QUANTIZE(TO_TIMESTAMP(date, time), 3600), TIMESTAMP(‘%tzoffset%’,’hh:mm’)) AS Hour, 3: COUNT(*) AS Hits 4: INTO 5: %destination% FROM %source% 6: GROUP BY 7: Hour 8: ORDER BY 9: Hour ASC
2: ADD(QUANTIZE(TO_TIMESTAMP(date, time), 3600), TIMESTAMP(‘%tzoffset%’,’hh:mm’)) AS Hour,
Traffic.cmd
1: @if /i "%DEBUG%" neq "yes" echo off 2: REM 3: REM Generate data for Traffic analysis 4: REM 5: @SET FileList=090715,090716 6: 7: @SET ServerList=1,2,3,4 8: 9: @ECHO Initializing... 10: FOR %%i IN (%ServerList%) DO @IF EXIST %%i.csv DEL %%i.csv /F /Q 11: 12: @ECHO Generating Traffic Analysis For Servers 13: FOR %%i IN (%ServerList%) DO @IF EXIST data\*%%i*.log LOGPARSER file:HitsByHourMinusGMT.sql?source=data\*%%i*.log+destination=%%i.csv+tzoffset="07:00" -i:iisw3c -o:csv 14: 15: @SET FileList= 16: @SET ServerList= 17:
1: @if /i "%DEBUG%" neq "yes" echo off
2: REM
3: REM Generate data for Traffic analysis
4: REM
5: @SET FileList=090715,090716
6:
7: @SET ServerList=1,2,3,4
8:
9: @ECHO Initializing...
10: FOR %%i IN (%ServerList%) DO @IF EXIST %%i.csv DEL %%i.csv /F /Q
11:
12: @ECHO Generating Traffic Analysis For Servers
13: FOR %%i IN (%ServerList%) DO @IF EXIST data\*%%i*.log LOGPARSER file:HitsByHourMinusGMT.sql?source=data\*%%i*.log+destination=%%i.csv+tzoffset="07:00" -i:iisw3c -o:csv
14:
15: @SET FileList=
16: @SET ServerList=
17:
In order for all of this to work properly, we have to assume a minimum amount of IIS log fields have been captured. Typically, I will recommend that they all be captured. That gives you the most flexibility when it comes to parsing. Also, when I do this, I’m usually analyzing multiple machine simultaneously. In order to do that efficiently, there has to be some type of naming scheme in use to differentiate the various log files from one another. It may be as simple as adding the server name/number to the beginning/end of the filename. In our example, we have the following:
Let’s break this name down just a bit.
For the purposes of this example, I’m going to assume that the servers we’re getting data from are located in Redmond, WA (where else :-)). That is an offset of UTC-7 hours. Also, to make things a bit easier, I want to generate a file per machine. So, the command would resemble the following:
logparser file:HitsByHourMinusGMT.sql?source=ex0907151.log+destination=1.csv+tzoffset=”07:00” –i:IISW3C –o:csv
logparser file:HitsByHourMinusGMT.sql?source=ex0907152.log+destination=2.csv+tzoffset=”07:00” –i:IISW3C –o:csv
logparser file:HitsByHourMinusGMT.sql?source=ex0907153.log+destination=3.csv+tzoffset=”07:00” –i:IISW3C –o:csv
logparser file:HitsByHourMinusGMT.sql?source=ex0907154.log+destination=4.csv+tzoffset=”07:00” –i:IISW3C –o:csv
This will create 4 csv files that contain data for each server. What if we had a lot of days and multiple servers? This approach would get a bit tiresome. Let’s leverage the command line a little bit more to do the heavy lifting.
I have multiple days worth of data in a folder called, appropriately enough, data.
By modifying the FileList variable in the Traffic.cmd script, I can account for all them. Now, if I simply run the Traffic.cmd script from the command line, it generates a csv file per server. The output should look something similar to the following:
If we look at the contents of one of these files, we see the following:
This is a breakdown, by hour, of the number of hits that this server received. A hit is defined as anything that generates a line in the IIS Log file.
Well, we’ve got all these raw numbers that we need to do something with. Let’s pull them into Excel and graph them to see if we can see anything that interests us.
Let’s create a simple spreadsheet that contains a summary of all of the data. I like to break it out by server, by date, etc.
As you can see, down the first column are the times (localized to the server timezone), and then the hits/hour for each day for each server. So now what? This is good information (I think) but what do I do with it? Well, you use Excel to do what Excel does best – you create some graphs so that you can see patterns in the data. In this case, the graphing could not be simpler. We are simply going to create line graphs per server/per day.
Let’s talk about some of things that these graphs are telling us:
Another view that I find useful is to look at the aggregate traffic per day per hour across the farm to get an idea of what daily traffic patterns look like. All things being equal, I would expect to see roughly the same type of pattern during weekdays, a decline during the weekend, etc.
There are several notable areas of interest that warrant further investigation
Something to keep in mind while you’re analyzing this type of data is patterns. Patterns are very difficult to discern from a small dataset; so, the more data you have to analyze/graph, the better.
In the next installment, I will take a look at dissecting the IIS logs by unique users/IP addresses to see who is doing what where.
Enjoy!