May, 2011

  • Windows Azure Marketplace DataMarket Blog

    An Introduction to DataMarket with PHP


    Hi! I’m Jovana, and I’m currently interning on the DataMarket team. I come from sunny Western Australia, where I’ve almost finished a degree in Computer Science and Mechatronics Engineering. When I came here I noticed that there wasn’t too much available in the way of tutorials for users who wanted to use DataMarket data in a project, but weren’t C# programmers. I’d written a total of one function in C# before coming here, so I’d definitely classify myself in that category. The languages I’m most familiar with are PHP, Python and Java, so over the next few weeks I’ll do a series of posts giving a basic introduction to consuming data from DataMarket using these languages. I’ll refer to the 2006 – 2008 Crime in the United States ( dataset for these posts, which is free to subscribe to, and allows unlimited transactions.

    In this post I’ll outline two methods for using PHP to query DataMarket; using the PHP OData SDK, and using cURL to read and then parse the xml data feed. For either method, you’ll firstly need to subscribe to a dataset, and make a note your DataMarket account key. Your account key can be found by clicking “My Data” or “My Account” near the top of the DataMarket webpage, then choosing “Account Keys” in the sidebar.


    The PHP OData SDK

    DataMarket uses the OData protocol to query data, a relatively new format released under the Microsoft Open Specification Promise. One of the ways to query DataMarket with PHP is to use the PHP OData SDK, developed by Persistent Systems Ltd. This is freely available from CodePlex, however unfortunately there seems to be little developer activity on the project since its release in March 2010, and users report that they need to do some source code modifications to get it to work on Unix systems. Setting up the SDK also involves making some basic changes to the PHP configuration file, potentially a problem on some hosted web servers.

    A word of warning: not all DataMarket datasets can be queried with the PHP OData SDK! DataMarket datasets can have one of two query types, fixed or flexible. To check which type a particular set is, click on the “Details” tab in the dataset description page. The SDK only supports datasets with flexible queries. Another way to check is to take a look at the feed’s metadata. Copy the service URL, also found under the “Details” tab into your browser’s address bar and add $metadata after the trailing slash. Some browsers have trouble rendering the metadata; if you get an error, save the page and open it up in notepad. Look for the tab containing <schema xmlns=”…”> (There will probably be other attributes, such as namespace, in this tab). The PHP OData SDK will only work with metadata documents specifying their schema xmlns ending in one of “/2007/05/edm”, “/2006/04/edm” or “/2008/09/edm”.

    Generating a Proxy Class

    The PHP OData SDK comes with a PHP utility to generate a proxy class for a given OData feed. The file it generates is essentially a PHP model of the feed. The command to generate the file is

    php PHPDataSvcUtil.php /uri=[Dataset’s service URL]

    /out=[Name out output file]

    Once generated, check that the output file was created successfully. The file should contain at least one full class definition. Below is a snippet of the class generated for the Crime dataset. The full class is around 340 lines long.

    * Function returns DataServiceQuery reference for
    * the entityset CityCrime
    * @return DataServiceQuery
    public function CityCrime()
    return $this->_CityCrime;
    Using the Proxy class

    With the hardest part complete, you are now ready to start consuming data! Insert a reference to the proxy class at the top of your PHP document.

    require_once "datagovCrimesContainer.php";

    Now you are ready to load the proxy. You’ll also need to pass in your account key for authentication.

    $key = [Your Account Key];

    $context = new datagovCrimesContainer();

    $context->Credential = new WindowsCredential("key", $key);

    The next step is to construct and run the query. There are a number of query functions available; these are documented with examples in the user guide. Keep in mind that queries can’t always be filtered by any of the parameters– for this particular dataset we can specify ROWID, State, City and Year. The valid input parameters can be found under the dataset’s “Details” tab. Note that some datasets have mandatory input parameters.

    $query = $context->CityCrime()
    ->Filter("State eq 'Washington' and Year eq 2007");
    $result = $query->Execute();
    catch (DataServiceRequestException $e)
    echo "Error: " . $e->Response->getError();
    $crimes = $result->Result;

    (If you get a warning message from cURL that isn’t relevant to the current environment, try adding @ in front of $query to suppress warnings.)

    In this example we’ll construct a table to display some of the result data.

    echo “<table>”;
    foreach ($crimes as $row)
    echo "<tr><td>" . htmlspecialchars($row->City) . "</td>";
    echo "<td>" . htmlspecialchars($row->Population) . "</td>";
    echo "<td>" . htmlspecialchars($row->Arson) . "</td></tr>";
    echo "</table>";

    DataMarket will return up to 100 results for each query, so if you expect more than 100 results you’ll need to execute several queries. We simply need to wrap the execute command in some logic to determine whether all results have been returned yet.

    $nextCityToken = null;
    while(($nextCityToken = $result->GetContinuation()) != null)
    $result = @$context->Execute($nextCityToken);
    $crimes = array_merge($crimes, $result->Result);

    The documentation provided with the SDK outlines a few other available query options, such as sorting. Some users have reported bugs arising if certain options are used together, so be sure to test that your results are what you expect.

    Using cURL/libcurl

    If the PHP OData SDK isn’t suitable for your purpose, another option is to assemble the URL to the data you are after, then send a request for it using cURL and parse the XML result. DataMarket’s built in query explorer can help you out here – add any required parameters to the fields on the left, then click on the blue arrow to show the URL that corresponds to the query. Remember that any ampersands or other special characters will need to be escaped.

    The cURL request

    We use cURL to request the XML feed that corresponds to the query URL from DataMarket. Although there are a number of options that can be set, the following are all that is required for requests to DataMarket.

    $ch = curl_init(); 
    curl_setopt($ch, CURLOPT_URL, $queryUrl); 
    curl_setopt($ch, CURLOPT_USERPWD, ":" . $key);  
    curl_setopt($ch, CURLOPT_RETURNTRANSFER,  true); 
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $response = curl_exec($ch); 

    The $response variable now contains the XML result for the query.

    Parsing the response

    Before using the data, you’ll need to parse the XML. Because each XML feed is different, each dataset needs a parser tailored especially to it. There are a number of methods of putting together a parser, the example below uses xml_parser.

    The first step is to create a new class to model each row in the result data.

    class CityCrime
    var $City;
    var $Population;
    var $Arson;
    public function __construct()

    I’m also going to wrap the all the parser functions in a class of their own. This function will be called with the query uri and account key. Firstly I’ll give it some class variables to store the data that has been parsed.

    class CrimeParser
    var $entries = array();
    var $count = 0;
    var $currentTag = "";
    var $key = "";
    var $uri = "";
    public function __construct($key, $uri) 
    $this->key = $key;
    $this->uri = $uri;

    The parser requires OpenTag and CloseTag functions to specify what should happen when it reaches an open tag or close tag in the XML. In this case, we append or remove the tag name from the $currentTag string.

    private function OpenTag($xmlParser, $data)
    $this->currentTag .= "/$data";
    private function CloseTag($xmlParser, $data)
    $tagKey = strrpos($this->currentTag, '/');
    $this->currentTag = substr($this->currentTag, 0, $tagKey);

    Now we are ready to write a handler function. Firstly declare the tags of all the keys that you wish to store. One method of finding the tags is to run the code using a basic handler function that simply prints out all tags as they are encountered.

    private function DataHandler($xmlParser, $data)
    print "$this->currentTag <br/>";

    The switch statement in the handler needs a case for each key. We also need to let it know when it reaches a new object – from running the code with the previous handler, I knew that the properties for each row started and finished with the tag /FEED/ENTRY/CONTENT, so I’ll add a class variable to keep track of when the handler comes across that tag – every second time it comes across it I know that the result row has been fully processed.

    var $contentOpen = false;
    const rowKey = '/FEED/ENTRY/CONTENT';
    private function DataHandler($xmlParser, $data)
    case strtoupper(self::rowKey):
    if ($this->contentOpen)
    $this->contentOpen = false;
    $this->entries[$this->count] = new CityCrime();
    $this->contentOpen = true;
    case strtoupper(self::cityKey):
    $this->entries[$this->count]->City = $data;
    case strtoupper(self::populationKey): 
    $this->entries[$this->count]->Population = $data;
    case strtoupper(self::arsonKey): 
    $this->entries[$this->count]->Arson = $data;

    Now we create the parser, and parse the result from the cURL query.

    $xmlParser = xml_parser_create(); 
    xml_set_element_handler($xmlParser, "self::OpenTag","self::CloseTag"); 
    xml_set_character_data_handler($xmlParser, "self::DataHandler"); 
    if(!(xml_parse($xmlParser, $xml)))
    die("Error on line " . xml_get_current_line_number($xmlParser)); 

    After the call to xml_parse, the $entries will be populated. A table of the data can now be printed using the same foreach code as the SDK example, or manipulated in any way you see fit.

    Final Thoughts

    The two methods of consuming data from DataMarket with PHP both have their strengths and weaknesses. A proxy class generated from the OData SDK is very easy to add to existing code, but setting up the library can be tedious, and there is not much support available for it. Using cURL and parsing the xml provides slightly more flexibility, but requires much more coding to set up.

    Since it only requires an URL and an Account key, opening the connection to DataMarket is very straightforward, whichever method is chosen. If the dataset you’re connecting to is free, I suggest opening Service Explorer and trying out various queries to get a feel for the data. Both methods shown above will result in the dataset’s conversion to an associative array, from which data can be manipulated using any of the PHP functions available.

    At this stage, if you want to access a flexible query dataset, and are able to modify your PHP configuration file, the PHP OData SDK is a good tool for accessing OData feeds. However, if you want access to a fixed query dataset, or are unable to modify the configuration file, using cURL and parsing the result is straightforward enough to still be a valid option.

    - Jovana Taylor

  • Windows Azure Marketplace DataMarket Blog

    Announcing the release of the Windows Azure Marketplace DataMarket Service Update 2!


    Announcing the release of the DataMarket Service Update 2! With this release, we’re rolling out a little something for everyone. First up: check out our shiny new Service Explorer:


    The offering query UI has evolved into more of an all-purpose toolbox. Now you can create data visualizations right there in your browser. And if you want to do more with the data – but don’t feel like whipping up some code – now you can download your query results in XLS, CSV and XML formats (you already know about our DataMarket Add-In for Excel, don’t you? The CTP2 of the Add-In, shipped today, will no longer require Account key and you can simply use your Windows Live ID credentials).


    Have you found yourself exploring the marketplace and found an offering you wanted to share – maybe some historic weather observations from WeatherBug (is it really normal to have this much rain in Seattle?), or nutrition information from Gregg London Consulting (hmm, better not have the second helping of dessert after all)? Now you can tell all of your friends about it with our new Facebook Like Button.

    For all you developers out there, we’ve made it easier for you to bring valuable data into your apps by enabling Free Trials! This functionality allows you to subscribe to offerings for free, if the content publisher has chosen to publish a trial, so you can wrap your brain around the data and the best way to pop it into your app. Dun & Bradstreet, Boundary Solutions, MetricMash and StrikeIron helped us launch this new feature by creating a bunch of free trials – be sure to check out their free stuff.

    Did you want to your application to access the dataset on behalf of the user? Now you can with OAuth v2 integration in SU2! Surely enough, the user has to actually approve the usage as part of the consent flow (enjoy the snapshot below).


     Hello everyone outside the United States! We haven’t forgotten about you. With this release we’re introducing the first phase of our international expansion. If you’re in Australia, Austria, Canada, France, Germany, Italy, Spain and the United Kingdom, now you can subscribe to commercial DataMarket offerings – paying for them in your local currency! Thank you MetricMash and StrikeIron for helping us launch this important expansion by making your offerings available to people outside the US.
    PS – If you’re not in one of the countries listed above, fear not, we still haven’t forgotten about you…

    And last, but certainly not least, we’re also releasing offerings from three new content publishers: Digital Trowel, InfoChimps and MetricMash. Welcome to the DataMarket, guys! And, best of all… all three have published some sort of free service so you can get started with their stuff at no cost!

    • Powerlinx Database of US based companies and professionals from Digital Trowel: Powerlinx contains contact information of more than 10 million USA companies with website addresses, 25M detailed company profiles, 25M executives including 5 million in-depth profiles with email addresses and phone numbers.
    • Trstrank from Infochimps: The Infochimps Trstrank measures how important a Twitter user is. It’s a sophisticated network measure of centrality, not just a count of the number of followers a user has.
    • U.S. Unemployment Data - 1948 to Current from MetricMash: This dataset is based on the Current Population Survey (CPS) published by the U.S. Bureau of Labor Statistics and is updated monthly. The data is broken down into these unemployment categories: rate; length, population and labor force participation.

    The DataMarket Team

  • Windows Azure Marketplace DataMarket Blog

    Leveraging DataMarket to Create Cloud-Powered Digital Dashboards


    On May 24 at 8 am PST, join our upcoming MSDN webcast “Leveraging DataMarket to create Cloud-Powered Digital Dashboards. Milos Glisic, Director of Development, ComponentArt Inc., and Christian Liensberger, Program Manager, Microsoft Corporation will showcase ComponentArt’s Data Visualization technology to create interactive, web-based digital dashboards in Microsoft Silverlight 4 and mobile dashboards on Windows Phone 7 using Windows Azure DataMarket.

    Click here to register

    The DataMarket Team

Page 1 of 1 (3 items)