If you are reading this blog post, you are probably aware of the Internet, a veritable cornucopia of data just sitting there waiting for you to use it. Much of that data comes from search engines, social networks, and other interactive media. In this post, we describe the experience of bringing such data into Data Explorer and using it in your mashups. The steps in this post will give details about retrieving and using data from Bing, but can also be used as a template for interacting with other similar sites that have the same interface.

Bringing Web Data into Data Explorer

Let’s begin by seeing how to bring a web data source into Data Explorer. Download and install the "Data Explorer” Desktop Client if you don’t have it yet. Start by creating a new Mashup and you will see the Add Data page.

Using the Add Data button, or the splash screen that appears when opening a new mashup, select the “Web Content” option from the list of available data sources:

Clicking the Web Content button will give a small dialog box where you can put the address of the site at which your precious, precious data is located.

Once you enter an address and click Done, if this is the first time in your mashup that you have used data from that site, you will see a dialog box that asks for credentials.

For most web sites, no credentials are necessary, so just the default option of “Anonymous Access” is the right choice; click Continue to clear the dialog to pull the data into Data Explorer. For most web pages – for instance, a Wikipedia page on frozen yogurt – the most common actions at this point will be to either get access to the raw text of the page (Open As Text in the figure below) or to the structure of the page (Open as Html in the figure below).

If at this point, however, you would like to do more complex, interesting things with our data (and of course, we do), we can access additional features by clicking on the More Tools button.

Doing so will allow us to see other potential actions as buttons, but also the formula bar, which shows us the expression formula that we have built. At this point, it is a simple call to the function Web.Contents with a single argument, the URL of our data source.

Diving Beneath the Surface of the Web

Of course, many sources of data on the web need user input to provide data. Unlike Wikipedia or FreeBase pages and other content pages with relatively static content, search engines and web data portals like Bing or Google have a more interactive interface. These kinds of sites often have two different interfaces. One interface is meant for human beings to ask questions and view data, for instance by going to http://www.bing.com and typing in a search term, resulting in something that looks like this:

The same source may also have an alternative way to access the same data in a way that is more accessible to computers, returning the data in a structured form without all of the pretty formatting. For Bing, that means the same luscious frozen yogurt data as above can be retrieved using a single URL, which returns something like this:

This is the same data, now returned in a much more computer-friendly form. The URL used to retrieve this data looks like this:

http://api.bing.net/json.aspx?Appid=12345&sources=web&query=frozen+yogurt

This URL can be taken apart into several components:

  • The base URL http://api.bing.net/json.aspx. This address is the location of the service that we are using. The suffix of the base URL determines what kind of response you would like from the service. If it ends in “json.aspx”, the response will be in JSON format. If it ends in “xml.aspx”, the result will come back in XML format.
  • A parameter “Appid”, given a value of “12345”. This API key value is given by the Bing service, and represents an account number that tracks the usage of calls using this kind of interface.
  • A parameter “sources”, given a value of “web”. The Bing resource can return many kinds of data, including spell checking, phonebook information, and the web data as shown above.
  • A parameter “query”, given a value of “frozen+yogurt”. This parameter is the term for which we search, but properly “escaped” into a valid URL piece by replacing spaces with the “+” character.

These components put together are actually a programmable web interface to all of Bing’s services. More information on that API can be found here, including information about additional parameters that can be used in the URL (for instance, web.count, which determines how many results to return for the query – the default is 10 if no value is given).

Using a Programmable Web API with Data Explorer

When adding a new Web data source from a programmable web source, you can just enter the full URL as constructed above into the same interface as the Wikipedia example:

However, these programmable web interfaces have such a common, consistent interface that Data Explorer offers a slightly cleaner way to access the same data. The Web.Contents function has a second, optional parameter that passes options to the function and makes the construction and use of programmable web URLs a little simpler in a number of ways. For instance, consider the following expression in the Data Explorer formula language:

Web.Contents(“http://api.bing.net/json.aspx”,
        [ Query =
               [
                      Appid = “12345”,
                      sources = “web”,
                      query = “frozen yogurt"
               ]
        ])

The “Query” field of the record is itself a record, a collection of name-value pairs that represent the parameters to be added to the URL. Putting this formula into Data Explorer will give you exactly the same result as if you put the full URL into the first parameter.

Notice that in the above formula, we did not need to replace the space character in our search query term. One major advantage in using the additional parameter to Web.Contents is that you do not need to worry about any kind special kind of URL construction or string processing. The function will do all of that for you when it forms the request to the remote web site.

The “Appid” parameter can be treated just like any other parameter in the query. However, many different web data sources use this form of “API key” to grant and track access to web data sources. The name of the parameter may differ between various sites; different sources may call the parameter “APIKey”, “LuggageCombination”, or “Bacon”, but in all cases, the parameter represents the same thing: a value, assigned by the site to a user, to track usage of the API. Data Explorer uses an additional field to encapsulate this “API key” pattern:

Web.Contents(“http://api.bing.net/json.aspx”,
              [
                     ApiKeyName = “Appid”,
                     Query =
                          [
                                 sources = “web”,
                                 query = “frozen yogurt"
                          ]
               ])

If you use this extra field, you no longer enter the API key itself as part of the formula. Instead, you enter it when Data Explorer asks you for credentials, like so:

When this credential window comes up, enter the value of your Bing API key into the field marked above. Using the credential store to manage the API key has several key advantages over just treating the key as another query parameter:

  • If you send your mashup code or mashup file to someone else, you won’t send your precious key along with it. That way, other people won’t be able to send millions of requests to Google or to Wolfram Alpha and send you the bill. When they try to run the same formula, they will get the same credential window and be able to enter their own key.
  • The API key is no longer part of the formula itself, so it won’t show up in things like blog posts talking about Data Explorer.

Using the Result of Programmable Web Calls

The results of a call to a web API may come back in any number of formats, but the most common formats are JSON and XML. The running example so far returns data in the JSON format. Data Explorer has a function, Json.Document, that can parse such data into structured forms that is much simpler to work with and manipulate. To use it, in the formula bar, wrap the current expression inside a call to Json.Document (there is not yet a convenient button to do this, but there will be):

From here, you can navigate through the response until you get to the data that you want, navigating to “SearchResponse”, then “Web”, then “Results”:

The result happens to be a list of records representing the return values from the search query. You may suspect that this data could actually be shaped into a table, and you would be correct – pressing the “Into Table” button will tell Data Explorer to attempt to analyze the current data and see if it can be shaped into a table:

The builder for this function gives you the option to shape the schema of the table, just in case Data Explorer’s analysis of the table is not complete. For instance, if all of the available rows in the table have values for the “Title”, but you want that column to be able to hold null values as well, you would check the checkbox next to “Title” in the builder making the column nullable. However, in this case, the builder gives us the right results by default, so we need not do anything further. We now have a resource in Data Explorer that queries Bing for the search term “frozen yogurt”, and returns the first 10 results as a nicely structured table.

Dust Off Your Propeller Hat

For those of you wanting to stretch your Data Explorer muscles at this point, we go one step further: Let’s turn this resource into our very own Bing search function. We will create a new function that can be called on any search term and return its top ten results, because (contrary to what you may have seen on our blogs so far) there are interesting topics besides frozen yogurt.

Actually, in the Data Explorer formula language, turning a resource into a function – and using it – is quite simple. Begin by viewing the formulas for your current mashup (directions on how to do so can be conveniently found in a previous blog post). You will see a formula that looks like the following:

 

section Section1;

shared BingFrozenYogurt =

let

BingFrozenYogurt = Json.Document(Web.Contents("http://api.bing.net/json.aspx", [ ApiKeyName = "Appid", Query = [ sources = "web", query = "frozen yogurt" ] ])),

Results = BingFrozenYogurt[SearchResponse][Web][Results],

IntoTable = Results as table [Title = Text.Type, Description = Text.Type, Url = Text.Type, CacheUrl = Text.Type, DisplayUrl = Text.Type, DateTime = Text.Type]

in

IntoTable;

To this formula, we will make three very simple changes:

  • Rename the resource to “BingSearch” from “BingFrozenYogurt”
  • Add a parameter to the resource, making it into a function that we can call later
  • Use the new parameter in place of the existing frozen yogurt search term

The resulting formula will look like this:

section Section1;

shared BingSearch = (searchTerm) =>

let

BingFrozenYogurt = Json.Document(Web.Contents("http://api.bing.net/json.aspx", [ ApiKeyName = "Appid", Query = [ sources = "web", query = searchTerm ] ])),

Results = BingFrozenYogurt[SearchResponse][Web][Results],

IntoTable = Results as table [Title = Text.Type, Description = Text.Type, Url = Text.Type, CacheUrl = Text.Type, DisplayUrl = Text.Type, DateTime = Text.Type]

in

IntoTable;

Clicking “Done” in the formula window will let you view your new resource function:

Notice the “Invoke Function” button in the ribbon. If you click that button, it will allow you to call that function on a value of your choice. Sending “frozen yogurt” to that function will give you the same familiar results that we saw before:

The other button, “Invoke over table”, allows you to choose another resource in your mashup if it is a table and call your function on it, once per row. Using that feature, you can take a table from any other source – SQL, for example – and use your new Bing function to look up search terms for each item. In case your propeller hat is still spinning at this point, there will likely be a blog post in the near future to talk more in-depth about writing and using functions in Data Explorer.

While the examples given in this post have focused on the Bing service, the combination of base URL, API key, and parameters is a common pattern amongst many web data sources that offer a programmable web API. Google, Yahoo, Wolfram Alpha, and Twitter all have interfaces with a similar pattern, and so the steps used above will work for those services as well without trouble.