We have now seen several blog posts introducing us to several feature areas of Data Explorer. One particular feature was visible in numerous screen shots, but has not yet been given specific attention: the formula bar. This post delves into some of the depths of what makes Data Explorer tick. Feel free to skim and read selectively. We will follow up with further posts covering related details and go even deeper, so taking a bit of time with today’s post should pay off in laying grounds for those future posts.

When building up tasks in Data Explorer, the formula bar tracks what each step is about. Just as in Excel, it is possible to directly edit the formula in the formula bar. To do so, we need to understand the “language” used to write a formula: the Data Explorer formula language. This language is, by design, quite close to the one found in Excel. So before taking a closer look at the Data Explorer formula language, let’s take a brief look at Excel’s first.

When working with spreadsheets, it is often necessary to enter formulas that compute a cells value based on other values in the spreadsheet. In Excel, the formula bar just above the current worksheet shows the formula “behind” the currently selected cell.

An Excel formula always starts with an equal sign, followed by an expression. The expression may contain simple operators familiar from basic math (+ - * /) and function invocations. For instance, in order to sum up the orders from a table containing the number of orders from each listed customer, the SUM function can be used.

Spreadsheets and Excel are very widely used. Many people have some level of understanding of the Excel formula language. To ease the adoption of Data Explorer, its usage model follows a design similar to Excel’s. In Excel, we typically start out “building” formulas in the user interface by clicking on cells and completing forms. Over time, we see how the formulas we build turn out looking in the formula bar. From there, it is a short step to begin “tweaking” formulas – by fixing a cell reference, editing a constant used in a formula, and so on. Finally, some users venture into just typing formulas into the formula bar.

Data Explorer follows that same model. A lot can be achieved by clicking and completing forms in the user interface. Behind the scenes, there is always the formula language at play. So, let’s take a closer look at the formula language and how it works. Here is a simple example of performing a calculation using the formula bar:

Consider the summation of values in a list. Lists can be written using curly braces – although it is much more common for lists of data to come from some data source.

Once we have a list, we can call functions to calculate interesting properties of the list. For instance, we can sort the list in descending order.

 

Here, we use the function List.Sort to sort the given list. The second argument to that function, Order.Descending, is a pre-defined named value that indicates the desired sort order (descending, in this case). You may wonder where List.Sort is coming from. It is a function included in the Data Explorer standard library that today offers over 300 functions, with more to come in the future.

There are a few things worth noting before we move on. As in Excel, function names can contain dots. In Data Explorer, by convention, function names are formed as pairs of nouns that name the area the function belongs to followed by the specific function. Unlike Excel, function names use lower- and upper-case characters and the casing is significant. (LIST.SUM would be a different name and will not work when the intention is to use List.Sum.) Function names are also spelled out to make them easier to read. (In Excel, many function names are abbreviated. Excel’s STDDEV is called List.StandardDeviation in Data Explorer.)

In Excel, the values are usually spread out across cells of a row or column and summation is achieved by specifying the range of cells that hold the values we want to sum up. It is also possible to name such a range of cells in Excel and then use that name to specify which values to sum up. In Data Explorer, there is no concept of cell ranges. Instead, data sets are commonly referred to by name. If we have a table of values, then we can extract the list of values found in one of the table’s columns and compute the sum of those values.

 

In this example, we have a list of monthly pay rates for the employees of a company. By summing up the values in that list (using the List.Sum function) and dividing by the count of employees (using the List.Count function), we get the average monthly pay rate. We can also use the function List.Average to directly compute the average value.

In Data Explorer, it is possible to see all formulas behind a section of a mashup document at once. To get there, right-click a section tab and select “View Formulas”.

The structure of formulas in a section relates very closely to the structure of resources and task streams seen in the UI. In our example, if we select the first resource, “AveragePay”, we see the defining formula for that resource.

If we select the second resource, “EmployeePay”, we see a task stream, a series of formulas applied on top of each other. When selecting any one of the tasks in that stream, we see the formula for that task. In the screenshot, we selected the task “ExtractedColumn” and the formula for that task extracts the column “Pay” from the “InsertedCustom” table, returning it as a list.

Now, let’s look at the combined formulas for the (only) section of our document.

 

Each of the resource definitions (“AveragePay”, “EmployeePay”) appears in the section’s formulas as a name followed by an equal sign followed by the definition. If a resource is defined by a simple formula, as is the case for AveragePay in our example, then that formula appears on the right-hand side of the equal sign:

AveragePay = List.Average(EmployeePay) ;

For resources that are defined by a task stream, as is the case for EmployeePay in our example, we can see a further feature of the Data Explorer formula language: the “let-in” expression. As we can see, EmployeePay is defined as a sequence of named expressions, each of which corresponds to one task in the task stream. All but the first of these expressions refer to the immediate preceding task by name, thus forming the task stream. Finally, the last named expression is referenced in the “in” part of the “let-in” expression.

The Localhost expression returns a view of all the databases installed in the local machine’s default database. (We are using the Sql.Databases function, included in the Data Explorer standard library.) The value of that expression is a record – a piece of data that has named parts to it, called record fields or just fields. In this example, each field corresponds to an installed database. We also say that the Localhost expression evaluates to a record value.

 

Let’s take a closer look at that “let-in” expression defining EmployeePay. The first named expression in the “let” part opens the databases on the local machine’s default database server:

Localhost = Sql.Databases("localhost"),

We then extract the specific database “M_AdventureWorks” and from there the specific table “HumanResources.EmployeePayHistory”:

HumanResources.EmployeePayHistory =
Localhost[M_AdventureWorks][ HumanResources.EmployeePayHistory],

The square brackets used above indicate accessing a part of some data, selected by name. We saw that Localhost evaluates to a record value. Writing someRecord [ fieldName ] yields the value of the record’s field that has name fieldName.

The result of selecting the “M_AdventureWorks” database is another record holding all the named tables in that database. So, the expression:

Localhost[M_AdventureWorks][HumanResources.EmployeePayHistory]

evaluates to the value of the “HumanResources.EmployeePayHistory” table.

 

A table is, essentially, a list of rows, where each row is a record of the same “shape”. Specifically, all row records of a table have the same number of like-named fields.

Tables are such an important concept in Data Explorer that they are handled specially. A table is displayed with column headers and an open-ended list of rows. In addition, the ribbon changes to present specific table tasks.

Let’s say we want to add a new column to the table. For each row, we want the new column to hold the product of the existing “Rate” and “PayFrequency” values in that row. In Excel, we would achieve that by typing in a product expression that multiplies two cell values. We would then “fill down” that expression to get this product for each row of our table. As we saw earlier, tables in Data Explorer don’t live in a spreadsheet grid and can have huge numbers of rows. Instead of filling down a formula to every row of a table, we instead want to specify the desired behavior at table level.

By selecting “Insert Column”, “Custom Column” in the ribbon, we get to fill out a formula builder that helps us form the right task in Data Explorer.

 

The final formula reads:

= Table.AddColumn( HumanResources.EmployeePayHistory, "Pay",
each [Rate] * [PayFrequency] )

This formula looks a little daunting, so let’s take it apart to understand what it actually does.

First of all, we are adding a column to an existing table. The function Table.AddColumn does just that. It takes an existing table (here the one named HumanResources.EmployeePayHistory introduced in the previous task of our task stream), the name of the new column (here “Pay”), and an expression that defines what value the new column should have in each of the table’s rows.

It is really only that last expression, the “each” expression, that introduces us to a few new concepts. We will return to the exact mechanics behind “each” expressions in a future blog post, but for now it suffices to think of “each” as saying: calculate the “each” expression separately for each row of the table that Table.AddColumn (or any other table-manipulating function) visits when building the new column. Then, for each of the rows, our expression each [Rate] * [PayFrequency] extracts the value of the “Rate” column in that row and multiplies it by the value of the “PayFrequency” column in that same row.

So, with a little squinting, we can actually read our entire add-column expression as: ‘Add a column to the given table, name it “Pay”, and for each row in the given table, compute a value for the new column that is the product of the given “Rate” and “PayFrequency” columns.’ With a little practice, this becomes second nature when using Data Explorer and its formula language since table-level manipulations are so very common.

At this point, we have seen a fairly large part of the Data Explorer formula language already – enough to make serious headway when using Data Explorer. We will continue visiting the formula language in future posts, delving into various more technical aspects. Our next language-related post will explain more of the basics of how things actually work, including the various shapes of data that are supported, and introduce a few new concepts such as types and custom functions. However, rest assured that the language, despite its power, is overall fairly simple and easy to learn.