Familiar. Collaborative. Managed.
Today we’re excited to have a guest blog post by Microsoft MVP Chris Webb. Chris is a UK based consultant (http://www.crossjoin.co.uk/) and trainer (http://www.technitrain.com/) specialising in SQL Server Analysis Services, MDX, PowerPivot and DAX. He is the co-author of three books: “MDX Solutions”, “Expert Cube Development with SQL Server Analysis Services 2008” and “SQL Server Analysis Services 2012: The BISM Tabular Model”, and blogs regularly on Microsoft BI topics at http://cwebbbi.wordpress.com/
PowerPivot is an amazing self-service business intelligence tool – it can handle massive amounts of data and perform complex calculations at lightning speeds – but its true killer feature is the fact that it integrates so well with Excel. Not only is Excel the tool of choice for the vast majority of people who work with and analyze data as part of their jobs, which means that anyone that knows Excel automatically feels at home with PowerPivot, but Excel’s ubiquity means there is a vast array of other tools available that integrate with Excel and which therefore can be used for self-service BI alongside PowerPivot. In this post we’ll look at some free tools that can be used for BI purposes and see how Excel opens up limitless possibilities for BI.
First of all, we need some data to play with. Roger Jennings recently uploaded five months of FAA data on airline flight delays to the Windows Azure Marketplace and wrote a very detailed blog post on how you can import this data into PowerPivot and a number of other different tools here. This is a great dataset – large, detailed, and something that anyone who’s sat in an airport terminal waiting for a delayed flight will be interested in.
Assuming we’ve imported all this data into PowerPivot, how can we start to explore it? Roger’s post shows how to build a pivot table and start to look at average delays by carrier, but there are many other interesting types of analysis possible as well. The first thing we can look at is what the most popular routes and airports are. Taking data from December 2011, we can build a PivotTable in Excel from the PowerPivot data very easily and find the number of flights to and from each airport, but a PivotTable isn’t a great way to visualize this type of data as you can see:
One alternative is to use NodeXL, a tool that was developed in part by Microsoft Research, that makes it very easy to explore network graphs inside Excel. The screenshot below shows what it can do with the same data: the most connected airports such as JFK are at the center of the diagram, while the least connected such as WRG (Wrangell airport in Alaska) are at the edge. The opacity of the lines connecting the airports reflects the number of flights on a route.
It’s also very easy to zoom in and out, highlight a single airport such as Houston (HOU), and see the other airports you can fly to from there.
Maybe it would be better to see the airports on a map though? With a simple web search we can find a csv file with the latitudes and longitudes of all the airports of the world in it and csv files are usually very easy to import into PowerPivot. However, in this case there are a few issues with the data that cause problems for PowerPivot if you load it in directly: it contains a lot of airports that don’t have a three-letter IATA code and there are a few cases where the IATA code is duplicated. To clean up the data we can use a new tool from the SQL Azure Labs called Data Explorer which allows us to create automated procedures that clean, transform, filter and perform calculations on data and then load it directly into Excel. For this data, filtering on airports in the US which have a scheduled service and an IATA code avoids the problem:
With this hurdle overcome it is now very simple to plot each airport on a map using another tool from Microsoft Research called Layerscape. Layerscape is an Excel addin that allows you to take spatial data from Excel and visualize it using Microsoft World Wide Telescope; you can find a more detailed explanation of how it can be used with PowerPivot here and here on my blog.
In the screenshot below, the size of the marker used for each airport is related to the number of flights that originate from it:
It’s also very easy to zoom in, pan, tilt and explore your data once it’s in World Wide Telescope – here we can see more detail about the airports in the northwest of the USA with the number of flights now shown through different-sized circles:
What about those flight delays though? We can use the Excel Data Mining Addin to look at this: it makes it very easy to apply a number of machine learning and statistical techniques to data in Excel tables. The ‘Analyze Key Influencers’ functionality (there are good explanations of how to use this here and here) can help to determine what values in our data are associated with departure delays.
It reveals that one or two airlines and airports are by far the worst offenders as far as delayed flights go – but I won’t name them here for obvious reasons! Less controversially, we can see the influence that date had on departure delays, and from the results we can see that the 2011 holiday period was definitely a bad time to travel:
This has been a whirlwind tour of just a few of the tools that are out there that can help you analyze your data in Excel alongside PowerPivot. All of the tools that have been mentioned are capable of much, much more than I’ve had time to show, and there are hundreds of other tools out there from Microsoft and its partners that can be used for similar purposes – using Excel as the platform for your self-service BI opens up limitless possibilities. Hopefully you’ll have seen something to inspire you to look at your data in a new way!
Great blog post with awesome examples.