One of the most interesting scenarios that “Data Explorer” enables is the ability to work with data from different kinds of sources. “Data Explorer” provides the ability to enrich data by mashing up different sources of data that might be subsequently published to be consumed from various other “last-mile” tools. If you want to learn more about the metaphor for first-mile and last-mile tools, you can check out this post written by Tim Mallalieu, our Group Program Manager for Data Explorer.
In this post we focus on a concrete scenario (based on the famous Northwind sample database, which many of you might already be familiar with). We mash up data from different sources that contain information about products, product categories, suppliers and order details. In addition to the mash up capabilities, we cover how to enrich the data by doing some calculations on top of it (such as calculating the total sales per product) and adding custom columns to various resources.
We start from the mashup editor where we have already added a few resources:
1. Product Categories: Information about the different categories of products that Northwind sells. This information is contained in an Excel file that we have added to “Data Explorer”.
2. Northwind OData Service: This service is publicly accessible at http://services.odata.org/Northwind/Northwind.svc/. In this post we consume two different entity sets from this service: Products and Order_Details.
3. Suppliers: A text document that contains information about supplier contacts for each product. We need to apply some filtering tasks using “Data Explorer” in order to convert the text file contents into a table that we can use in subsequent steps.
This is how the mashup editor looks after we have imported these three data sources and selected the two different entity sets from the Northwind OData Service.
The ProductCategoriesAndOrders resource is ready to be used as it doesn’t require any additional filtering, cleanup or calculations.
The Products table, however, has too many columns that we don’t need; we are only interested in the ProductID, ProductName, SupplierID and CategoryID (we use the latter two columns to combine this table with Categories and Suppliers).
We use “Data Explorer” to keep only the columns that we are interested in. For that purpose, we have two complementary tasks in the ribbon, named Show Columns and Hide Columns. In this case, as we only want to keep four columns, we use the Show Columns task, which lets us select the columns that we want to keep in our resource.
Once we select the columns that we are interested in, we want to add a new lookup column to this table in order to also display the Category Name for each product.
The CategoryID can be used as the column to match since it exists in both Products and ProductsCategoriesAndOrders resources. We can name this new column CategoryName and specify it as the column to lookup in the ProductCategoriesAndOrders table. Once we specify these settings, the preview is automatically updated to display the Products table with the newly added CategoryName column.
We can now hide the CategoryID column in the Products table since it is no longer needed…
Once we are done with the Products table, we can focus on shaping the Suppliers resource as needed. This resource has been imported from a text document. We can see a preview of it using “Data Explorer”, as displayed below.
Our goal is to convert this text document into a table. In order to do that we can use the task named To Table, under the Convert group in the ribbon. This task lets us configure a few settings as part of the conversion:
Once we have specified these options for the “text to table” conversion, the preview pane will update and show the result of applying this task to our data.
We have successfully converted the imported text document into a table but, similar to what we had in the Products table, there are columns in this table that we don’t need. We are only interested in keeping the first three columns: SupplierID, CompanyName and ContactName. We can use Show Columns in order to perform this filtering task.
You can see that even when the data source for each resource is different in nature (Excel, text, OData, etc.) from the others, once we import the data into “Data Explorer” we have the same set of tools and the same expressivity over all of them. We believe this is one of the most powerful capabilities that “Data Explorer” provides for dealing with heterogeneous data sources: you can focus on the meaning and structure of your data and forget about the particular technicalities on how to access it.
Once we have done this we can work on the last resource that we need to prepare for our mashup: Order_Details. As we said earlier, this is another feed that we consume from the Northwind OData service. Similar to what we did with the other resources, we keep only a few columns of interest: OrderID, ProductID, UnitPrice and Quantity.
After doing this filtering, we start processing the information in this table. We would like to calculate the total sales for each product. We do that in two separate steps:
Finally, we combine the information that we prepared in these different resources into a single table result we can easily share with others. To combine these resources (Products, Suppliers and Order_Details) we start by doing a Merge operation between the Products and Suppliers resources. We need to specify a few options in order to properly merge the two resources:
Once we have specified all these settings, the preview pane will update with the view of the merged table which includes information about Products and Suppliers (as well as Categories, since that was already in the Products as a lookup column we added earlier).
Note that the merged table has been automatically added as a new resource in the resource pane on the left. We can rename this resource to have a more descriptive name: Products&Suppliers.
Similar to what we did earlier with the Products table after adding the Category Name lookup column, we can hide the SupplierID column since we no longer need it.
Now we are left with the last operation we need to do as part of this mashup i.e. adding another lookup column with Order_Details information to the Products&Suppliers resource. The Order_Details resource has only two columns: ProductID and Total Sales per Product. As you can probably guess, we want to use this ProductID column as the match column for the lookup operation and display this Total Sales column in our final result.
Finally, we can rename this resource to “Total Sales per Product” as that better represents the contents of our mashup result.
In subsequent posts we explore the different publish alternatives that “Data Explorer” offers in order to share this result with others… Hope you enjoyed this mashup!
Stay tuned for more posts demonstrating various capabilities of codename “Data Explorer”, and don’t forget to sign up to try it if you like it!