Fantasy football draft season is here, and the UX test lead for Power Query, Ben Martens, thought that picking his fantasy team seemed like an interesting test case for Power Query. Here it is how he did it in his own words:
There’s a lot of data available and I thought that past performance might give some indicator of future results. Before I could figure that out, I needed to pull all the data into a single table. Here’s a walkthrough of my solution. It’s not short so I might be brief in some areas. Please feel free to contact our team if you want more assistance!
1. Find the data - I did a search inside Power Query but didn’t find the NFL data that I needed so I headed to the web. The trick was to find a site that listed every single player on one page and had an easy URL scheme so I could change the year and get more data. Pro-football-reference.com fit the bill nicely. My initial focus was on the 2012 season.
2. Pull in Passing, Rushing and Receiving data – I want every passing, rushing and receiving data point for every player. Those three type of data are stored on three separate pages so that means I used the From Web button three times. The data was pretty clean but I had to rename the column of names to “Player” and then use the Replace Values function twice to get rid of the * and + after some of the names. Finally I removed everything except Player, Yards, TD and Int since those are the stats that are important for passing in our fantasy league. Finally I renamed the query to “2012 Passing”. Rushing and Receiving were similar.
3. Merge - Now it gets a little bit tricky because one player might show up in two or three of those tables. For example, Drew Brees usually passes the ball, but he also runs and even catches balls on trick plays. That means that we have to merge the three tables together. However, we can’t use the Merge button in the ribbon because that functionality currently only includes rows from the second table that match the first. We want a full outer join. That’s possible, but we have to get our hands dirty and write a formula. Make sure that you have chosen to show the formula bar in the query editor by clicking Settings > Show Formula Bar. Also, from the ribbon in Excel, in the Machine Settings group, choose Options and then check Enable Advanced Query Editing. In the Excel ribbon choose From Other Sources and then pick Blank Query. (This option gets added when you enable advanced query editing.) You may need to tweak the following formula a bit depending on your query and column names. Enter it into the formula bar in the query editor. For more help with this function, please see section 11.6.12 of the Power Query Formula Library Specification.
= Table.Join(Table.PrefixColumns(#"2012 Passing", "Passing"), "Passing.Player", Table.PrefixColumns(#"2012 Receiving", "Receiving"), "Receiving.Player", JoinKind.FullOuter)
Player names are scattered across two columns, but we can get them all into a single column. Choose Insert Custom Column from the table menu and create a new column that holds the player’s name. Here’s the formula to enter into the Insert Custom Column builder:
if([Passing.Player]<>null) then [Passing.Player] else [Receiving.Player]
Next you can remove the Passing.Player and Receiving.Player columns since the new custom column has the correct name in every row. Choose all the stats columns and replace null values with 0. To really be thorough, we need one more step: sometimes players switch teams in the middle of the season and they’d end up getting reported on separate rows in this table. Do a Group By operation on the player name and then add a summed column for each of the stats fields.
4. Merge again – In step 3 we merged Passing and Receiving so now it’s time to repeat that step again. Merge the results of step 3 with the Rushing query. Don’t forget the additional cleanup steps such as the big Group By operation.
5. Calculate scores - I took our fantasy football league’s scoring rules and created a custom formula that calculated the total points scored by every player in the league. (Well, almost every player. This ignores kickers and defenses.)
6. Done - Sort by the new score column in descending order and click Done to run the entire query and add it to your worksheet.
We now have a pretty useful table that pulls in data from three separate web pages, joins them all together by player name and then runs a custom math formula over the various player statistics. However, it would be even MORE useful if we could get a bunch of previous years and then use that additional data to try and refine our picks for the next year. The obvious way to do this would be to run through steps 2-6 again for each year but that’s a lot of manual labor. We can simplify things by creating a single function that will run steps 2-6 and given only the year as a parameter. There is currently no fantastic support for what we’re about to do, but with a little manual formula wrangling, it can be done.
7. New Workbook – When I’m converting queries into formulas, I find it easiest to create a brand new workbook so I can keep the original work as a reference. Make sure you have enabled advanced query editing in the settings dialog as we’ll be using it heavily in the next steps.
8. Create a “Passing” Function – We’ll start with the passing statistics. In the old workbook, go to the Passing query and click Filter & Shape. On the right side of the formula bar, next to the drop down arrow is a little paper icon. Click on that to bring up the advanced query editor. Now you can see the raw formulas for each step in the query. Highlight all of the code and copy it.
In your new workbook, click From Other Sources > Blank Query. Bring up the advanced query editor again. Paste in the contents of your clipboard. Rename the query to just be “Passing”, wrap the entire text in a function with one parameter, and use that parameter in the URL of the Source step. All of those changes are highlighted in the screenshot below.
Click Done and that should be it. You can test the function if you want by invoking it and giving it a string argument “2012”. Note that the invoke function dialog assumes that 2012 is a number so it generates the wrong formula for you. You’ll need to change it to = result("2012") to see it working. Once you’re confident that the function was created correctly, make sure you delete that invoke step. You query editor should look like this and then you can click Done to send it to the worksheet.
Repeat this for each of the other 4 queries that you created in your old workbook (Rushing, Receiving, the first merge and then the final merge.) The final result should be one function that you can call to get a table of all of the stats for the specific year. Pretty cool!
9. Run the function – Now it’s time to harvest the fruits of your labor. Create a new spreadsheet for each year of data that you want to collect. Each sheet should simply invoke your function from the previous step with a different year. I was able to go back to 2008 with no problems and that was plenty of data for my needs.
10. Analyze – At this point you can pick out the data you need and create whatever formulas you want (either in Power Query, Excel or another tool) to figure out who you should draft in your fantasy league. I chose to go one step further and merge all of the tables from step 9 into one more giant table and then operated on that table.
When you come out on top of your fantasy league, be sure to tell all your friends that you used statistics and Power Query to beat them!
To learn more about Power Query and Power BI:
Comments in this blog are open and monitored for each post for a period of two weeks after the posting date. If you have a specific question about a blog post that is older than two weeks, please submit your question via our Twitter handle @MSPowerBI