DAX (Data Analysis Expressions) Measures in PowerPivot

DAX (Data Analysis Expressions) Measures in PowerPivot

  • Comments 6

Introduction to DAX Measures

It’s been a while since I talked about the DAX formula language in PowerPivot, and wanted to take this opportunity to present some examples showing how DAX formulas can be used to define measures in PowerPivot.

Before we get to the new functions and examples, let me review some basic points about the DAX expression language:

DAX formulas can be used to define calculated columns in a table and they can also be used to define measures. Calculated columns are just like in Excel – you enter a formula, and that formula is evaluated for each row in the table, effectively filling in the entire column with values.

A measure is different. When you define a measure, you provide a DAX formula and a name. This measure can be placed onto an Excel PivotTable, and it will be evaluated many times, once for each cell in the values area of that PivotTable. Let’s look at a simple example.

image

This PivotTable has only one measure, named “Sales” with a single DAX formula defined as

=SUM(FactSales[SalesAmount])

That single formula is being evaluated 72 distinct times, with 72 different results. If the format of the PivotTable changes, by adding or removing any fields to the row or column labels, this measure will be dynamically re-evaluated in that new layout.

The dialog in which the measure is defined has multiple entry points:

  • “New Measure” button in the PowerPivot ribbon
  • Right-click menu item for each table in the field list that says “Add New Measure...”
  • Right-click menu item for each measure in the field list that says “Edit Formula”
  • Right-click menu item for each measure in the Values well below the field list that says “Edit Measure...”

Choosing any of these entry points will bring up the Measure Settings dialog that allows you to specify the name and formula for your measure, and that dialog looks like this:

image

Sample Data comes from the Contoso database

In case you would like to replicate my examples, they use data that has been imported from the “Contoso” SQL Server relational database that is available from the Microsoft Download Center as part of Microsoft Contoso BI Demo Dataset for Retail Industry: http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=868662dc-187a-4a85-b611-b7df7dc909fc To replicate my examples, you will want to import the following tables: FactSales, DimChannel, DimDate, DimStore, DimGeography, DimProduct, DimProductSubcategory, and DimProductCategory.

Filter Context

I’d like to pause for a moment to define the term Filter Context because I’ll need to use the term repeatedly. The Filter Context is the set of filters that have been applied to a particular calculation. Each of the 72 results for the Sales measure has a distinct filter context. The filter context for each value cell includes the items on the row labels, the column labels, the report filter, (and slicers when/if they are being used.) The cell that is highlighted in the PivotTable above has the following Filter Context.

  • [RegionCountryName] = “Germany”
  • [CalendarYear] = 2008
  • [ChannelName]=”Store”
  • [ContinentName]=”Europe”

This can be thought of as applying four filters to the FactSales table prior to evaluating the formula in that particular cell. The formula calculates the sum of the [SalesAmount] values for the rows that are left after applying those four filters. This is where the name Filter Context comes from.

Relationships and Filter Context

When there is a relationship between two tables in PowerPivot, that relationship is always considered a “many-to-one” relationship that goes from the table containing “many” records for each instance of the key to the table containing “one” record for each instance of the key. For example, there is a many-to-one relationship from the table FactSales to the table DimProducts, with many rows for each product in FactSales, and only one row for each product in DimProducts.

It turns out that applying a filter to a table on the “one” side of the relationship will also cause the table on the “many” side of the relationship to be filtered. For example, if I filter DimProducts so that only one product remains, then FactSales will also be filtered so that it only contains the sales transactions for that one product. Applying a filter to the table on the “many” side of a relationship (filter FactSales to include only certain products) will not have any impact on the table on the “one” side of the relationship.

This is how PivotTables are designed to work, and if you look at some examples, it makes perfect sense. Consider the PivotTable below where DimGeography has been filtered to include only two countries. It’s quite natural that FactSales is constrained by the same filter, even though [RegionCountryName] isn’t a field in the FactSales table. It’s a field in a table that is on the “one” side of a relationship to FactSales.

image

Measures and Filter Context

The formula that defines a measure is always evaluated multiple times – once for each cell in the Values area. And each of those evaluations has its own filter context. Even the grand total in the PivotTable above for all countries and for all years is evaluated in its own context, namely the context where [RegionCountryName] = ALL, [CalendarYear]=ALL, and [ChannelName]=”Store”.

The presence of filter context for all measure evaluations (and the presence of row context for calculated columns,) is one of the reasons why you often cannot use the same formula to define a measure as you would use to define a calculated column. Generally, because a measure must be evaluated many times, including for any totals rows or columns, measure formulas must do some sort of aggregation. Most measure formulas will either be the aggregation (Sum, Count, Average, Min, or Max) of an expression, or a formula involving other measures, or some combination of the two.

CALCULATE(Expression, SetFilter1, SetFilter2,...)

The CALCULATE function is very powerful and very useful, but is conceptually a bit more difficult than the functions you’re likely to use in a calculated column. CALCULATE allows any DAX expression to be evaluated in a specified filter context. This is equivalent to defining a measure and also defining a set of filters that will be used to provide a context in which the measure is to be evaluated. CALCULATE does the following:

  1. Using the SetFilter arguments, modify the current Filter Context
  2. If there is a Row Context, (a current row,) move that Row Context onto the Filter Context
  3. Evaluate the Expression in the newly modified Filter Context

Here’s an example that illustrates how this works.

Let’s say that we want to do some analysis on sales data. We’ve created a simple measure on the FactSales table named [Sales] that is defined by this DAX formula:

=SUM(FactSales[SalesAmount])

[Sales] is nothing more than the sum of the [SalesAmount] field for all the sales transactions contained in any particular slice of the data. But there’s a twist because the Contoso sales data includes sales made via four distinct channels: There are “Catalog” sales, “Online” sales, “Reseller” sales, and “Store” sales. Each of these is a separate row in the DimChannel table, and every FactSales row is identified as belonging to one of these channels by the [ChannelKey] field in that table. (There is of course a many-to-one relationship from FactSales to DimChannel.)

Suppose that the sales analysis I want to perform is to be based on Store Sales worldwide. I expect various people in my department to slice the sales data a variety of ways – by country, by time, by store, by product, and by any of the fields available in the database. But I want to make sure that no matter what they choose, they are only looking at sales in the “Store” channel. If I don’t do anything special, I run the risk that the analysis will inadvertently combine store sales with other kinds of sales, because they might forget to place a Channel filter on their PivotTable.

Note that this would be an easy mistake to make because the sales data for Contoso includes Store sales in 34 countries. In 30 of those countries, Store sales is all there is. In this data set, Catalog sales are limited to the United States, Online sales are limited to three countries (US, Germany, China), and Reseller sales are also limited to three countries (US, France, China). Someone who is analyzing sales in Canada would not see any difference between Store Sales and Sales for all channels, but someone analyzing data for the US or China would see a dramatic difference.

To force the values to be from the “Store” channel, I will define a new measure named [StoreSales] which is defined as:

= CALCULATE ([Sales], DimChannel [ChannelName]=”Store”)

The SetFilter argument within CALCULATE says that we should override any filter (if any) on the Channel Name column with a new filter that says we only want sales in the Store channel. By using this measure, we are guaranteed to be looking at Store sales, and we don’t have to rely on PivotTable authors to place the Channel somewhere on their PivotTable.

image

Syntax Shortcut for CALCULATE (expression [,SetFilter1] [,SetFilter2]...)

When you have a CALCULATE function where the expression to be evaluated is simply a measure, then there is a convenient shortcut available. Let’s consider the example above.

= CALCULATE ([Sales], DimChannel [ChannelName]=”Store”)

In order to make certain common cases easier to author and easier to read, we offer the following shortcut: When the expression to be evaluated is a measure, a shortcut for CALCULATE ( <measure>, <optional SetFilters>) is to use the name of the measure as if it were a function name and write this instead:

= <MeasureName> (<optional SetFilters>)

When we apply this to our example, we find that StoreSales can be defined as

= [Sales](DimChannel [ChannelName]=”Store”)

This is easier to read and author and I’ll use this notation in several places going forward in this paper.

Note: There are known issues with auto-complete for formulas that use this syntax shortcut.

VALUES(column)

The VALUES function will return all the valid values for that column in the current filter context. The result comes back as a table of values, even though the table may have only one value (or even no values). This function is an easy way to ask about the current filter context. This function will often be used within an IF function. Imagine a scenario where you want to do one calculation when the country is the United States, and a different calculation when the country is not the United States. This can be accomplished by testing to see if VALUES(DimGeography[RegionCountryName]) = “United States”. Of course there might be a PivotTable where the country isn’t specified, or a situation where multiple countries have been selected and we can only compare a table to a single value when the table has only one value. This forces us to write something like this to determine what the current country is:

= IF (COUNTROWS(VALUES(DimGeography[RegionCountryName]))=1 ,

VALUES(DimGeography[RegionCountryName]) ,

"No single country selected")

This formula says that if there is only one country selected in the current context, return the name of that country, else return the expression “No single country selected”. The result of placing this measure in a PivotTable where the country is on Row Labels looks like this:

image

ALL(Table) and ALL(Column1 [,Column2]...)

The ALL function is a table valued function that that causes the filter context for the specified columns to be ignored. This function is particularly useful when used as a table valued SetFilter argument within the CALCULATE function. Consider some examples:

ALL (DimProduct) returns a table containing all the rows within DimProduct, ignoring any filters that may be present in the filter context. Duplicate rows will be removed. Because the argument to ALL is a table, the filter context is effectively removed from all columns in that table.

ALL (column1) returns a table containing all the values from column1 (without any duplicates) after removing/ignoring any filters from the filter context that may have been present on that column.

When multiple column arguments are supplied to the ALL function, it is required that all of those columns belong to the same table. For example, ALL (column1, column2) returns a table with two columns from a source table, consisting of all the combinations of column1 and column2 that were present in the original data removing/ignoring any filters from the filter context that may have been present on those columns.

Consider the measure FactSales[Sales] that has been defined as =SUM(FactSales[SalesAmount]). By itself, the measure [Sales] is the sum of the Sales Amount column for whatever the current context may be. Looking at the PivotTable below, you see that this is a different number in each cell of the PivotTable because each cell has a different context (a different product category and year).

image

Now, let’s add another measure to the PivotTable. Let’s add RatioAllProd which is the sales for a given context divided by sales for all products (in the same year). There are two ways to write this, one using CALCULATE, and another using our shortcut syntax:

  • = [Sales]/CALCULATE([Sales], ALL(DimProduct))
  • = [Sales]/[Sales](ALL(DimProduct))

image

There is one important thing to note about this formula: = [Sales]/[Sales](ALL(DimProduct)). We have changed the filter context for the denominator of this ratio without changing it for the numerator. This is a very common BI scenario. We need to calculate something compared to that same thing but for a different set of products, or for a different year, or for a different region, etc. Changing some part of the filter context for some portion of the formula is very powerful.

Let’s look at one more example, this time comparing sales for each of our four channels to store sales and also to all sales. I’ll define two new measures:

  • RatioStore = [Sales]/[StoreSales]
  • RatioAllChan = [Sales]/[Sales](ALL(DimChannel))

Then I’ll place these measures in a couple of different PivotTable that have different row and column headers to illustrate how a measure can make sense no matter what PivotTable it is placed on. This makes it important to define measures carefully, because once defined, users are free to place them on PivotTables that might be organized quite differently from what was originally anticipated.

image

image

Time Intelligence to be discussed in a subsequent blog post

One of the most common calculations performed in data analysis is to compare some number to a comparable number for a different time period. Calculations that make comparisons to last month or to the same period from a year ago are very important for any business intelligence tool. Toward that end, DAX introduces 35 new functions expressly for the purpose of working with time based data, and I’ll talk about them in my next blog post!

Leave a Comment
  • Please add 7 and 4 and type the answer here:
  • Post
  • I was wondering if it is possible to do PERCENTILE/Quartile calculations within PowerPivot.

  • Nice. But the betas have expired and to my knowledge there is no way to try this out.

  • Hope i can use this.

    When expiration issue is finally solved!!!!

  • While we don't have built-in DAX functions for percentile or quartile calculations, you can author these sorts of calculations on your own using DAX measures.

  • Can you provide an example of how I'd author a 90th percentile measure?

  • Very Nice.Thanks for the blog

Page 1 of 1 (6 items)