One of the most interesting scenarios that “Data Explorer” enables is the ability to work with data from different kinds of sources. “Data Explorer” provides the ability to enrich data by mashing up different sources of data that might be subsequently published to be consumed from various other “last-mile” tools. If you want to learn more about the metaphor for first-mile and last-mile tools, you can check out this post written by Tim Mallalieu, our Group Program Manager for Data Explorer.

In this post we focus on a concrete scenario (based on the famous Northwind sample database, which many of you might already be familiar with). We mash up data from different sources that contain information about products, product categories, suppliers and order details. In addition to the mash up capabilities, we cover how to enrich the data by doing some calculations on top of it (such as calculating the total sales per product) and adding custom columns to various resources.

We start from the mashup editor where we have already added a few resources:

1. Product Categories: Information about the different categories of products that Northwind sells. This information is contained in an Excel file that we have added to “Data Explorer”.

image

2. Northwind OData Service: This service is publicly accessible at http://services.odata.org/Northwind/Northwind.svc/. In this post we consume two different entity sets from this service: Products and Order_Details.

image

3. Suppliers: A text document that contains information about supplier contacts for each product. We need to apply some filtering tasks using “Data Explorer” in order to convert the text file contents into a table that we can use in subsequent steps.

image

This is how the mashup editor looks after we have imported these three data sources and selected the two different entity sets from the Northwind OData Service.

image

The ProductCategoriesAndOrders resource is ready to be used as it doesn’t require any additional filtering, cleanup or calculations.

The Products table, however, has too many columns that we don’t need; we are only interested in the ProductID, ProductName, SupplierID and CategoryID (we use the latter two columns to combine this table with Categories and Suppliers).

image

We use “Data Explorer” to keep only the columns that we are interested in. For that purpose, we have two complementary tasks in the ribbon, named Show Columns and Hide Columns. In this case, as we only want to keep four columns, we use the Show Columns task, which lets us select the columns that we want to keep in our resource.

image

Once we select the columns that we are interested in, we want to add a new lookup column to this table in order to also display the Category Name for each product.

7

The CategoryID can be used as the column to match since it exists in both Products and ProductsCategoriesAndOrders resources. We can name this new column CategoryName and specify it as the column to lookup in the ProductCategoriesAndOrders table. Once we specify these settings, the preview is automatically updated to display the Products table with the newly added CategoryName column.

8

We can now hide the CategoryID column in the Products table since it is no longer needed…

9

Once we are done with the Products table, we can focus on shaping the Suppliers resource as needed. This resource has been imported from a text document. We can see a preview of it using “Data Explorer”, as displayed below.

10

Our goal is to convert this text document into a table. In order to do that we can use the task named To Table, under the Convert group in the ribbon. This task lets us configure a few settings as part of the conversion:

  • Delimiter: When dealing with text files there are many different possibilities for what the delimiter could be between columns; sometimes it will be simply a white space, other times it can be a custom character (such as a dash (-), or a bar ( | )…). In this case, the delimiter is actually a tab.
  • Number of lines to skip from start to text: There might be cases in which the entire document strictly contains only the desired table. In this particular case, however, there are three lines at the beginning of the file that we need to ignore for the table conversion.
  • Use First Row as Headers: This is an option that also appears when importing an Excel file, for instance. In many cases, what we want to use as the column headers or names is part of the first row or line in the input file. In this case, indeed, our desired column headers are located in the 1st line after we skip the three noisy rows as we just discussed.

Once we have specified these options for the “text to table” conversion, the preview pane will update and show the result of applying this task to our data.

11

We have successfully converted the imported text document into a table but, similar to what we had in the Products table, there are columns in this table that we don’t need. We are only interested in keeping the first three columns: SupplierID, CompanyName and ContactName. We can use Show Columns in order to perform this filtering task.

You can see that even when the data source for each resource is different in nature (Excel, text, OData, etc.) from the others, once we import the data into “Data Explorer” we have the same set of tools and the same expressivity over all of them. We believe this is one of the most powerful capabilities that “Data Explorer” provides for dealing with heterogeneous data sources: you can focus on the meaning and structure of your data and forget about the particular technicalities on how to access it.

12

Once we have done this we can work on the last resource that we need to prepare for our mashup: Order_Details. As we said earlier, this is another feed that we consume from the Northwind OData service. Similar to what we did with the other resources, we keep only a few columns of interest: OrderID, ProductID, UnitPrice and Quantity.

13

After doing this filtering, we start processing the information in this table. We would like to calculate the total sales for each product. We do that in two separate steps:

  • First we need to calculate the line total by adding a new column that applies the formula Unit Price x Quantity on each row. We achieve this by adding a new column, in this case, a Custom Column and specifying the name and formula for the new column. We will get into more details about the formula language in a subsequent post.

14

  • After calculating the line total we “Summarize” or group the rows by ProductID, by specifying an aggregate expression as a formula. For this example, we want our aggregate expression to be the sum of line totals for all the rows with the same ProductID.

15

Finally, we combine the information that we prepared in these different resources into a single table result we can easily share with others. To combine these resources (Products, Suppliers and Order_Details) we start by doing a Merge operation between the Products and Suppliers resources. We need to specify a few options in order to properly merge the two resources:

  • Right Table: We are using the Products table as left table since we started the merge operation while that resource was selected, however, we can select the right table from all the existing resources available. In this case we use the Suppliers resource.
  • Prefixes: To avoid column names collisions or to better keep a notion of where each column in the merged table comes from, we can specify a prefix to be automatically added to each column name. In this case, since we are dealing with only a few columns in each table and since none of them will collide, we simply skip this setting.
  • Include rows that don’t match: We can specify that we want to include in the resulting table all the rows that don’t match from either the left or right table. In this case, we want to stick to the rows that match so we will not check any of these two options.
  • Match condition: Above each column name from the left table in the preview, there is a dropdown menu that lets us pick the column from the right table that we would want to use in order to match values from each table. In this case, we use the SupplierID as the column to match in both the Products and the Suppliers table.

Once we have specified all these settings, the preview pane will update with the view of the merged table which includes information about Products and Suppliers (as well as Categories, since that was already in the Products as a lookup column we added earlier).

17

Note that the merged table has been automatically added as a new resource in the resource pane on the left. We can rename this resource to have a more descriptive name: Products&Suppliers.

Similar to what we did earlier with the Products table after adding the Category Name lookup column, we can hide the SupplierID column since we no longer need it.

Now we are left with the last operation we need to do as part of this mashup i.e. adding another lookup column with Order_Details information to the Products&Suppliers resource. The Order_Details resource has only two columns: ProductID and Total Sales per Product. As you can probably guess, we want to use this ProductID column as the match column for the lookup operation and display this Total Sales column in our final result.

18

Finally, we can rename this resource to “Total Sales per Product” as that better represents the contents of our mashup result.

19

In subsequent posts we explore the different publish alternatives that “Data Explorer” offers in order to share this result with others… Hope you enjoyed this mashup!

Stay tuned for more posts demonstrating various capabilities of codename “Data Explorer”, and don’t forget to sign up to try it if you like it!