The goal of Astoria is to make data available to loosely coupled systems for querying and manipulation. In order to do that we need to use protocols that define the interaction model between the producer and the consumer of that data, and of course we have to serialize the data in some form that all the involved parties understand. So protocols and formats are an important topic in our design process.
With that in mind, we’ve made sure that Astoria can handle a number of options from the system architecture perspective. Of course, each additional option has the potential of introducing added complexity to the system, and also adds to the overall development and testing time of the system, so we have to pick a small set to start with for the first release.
This entry focuses on formats. In general, introducing support for a format in Astoria means going through the exercise of mapping the data model used in Astoria to the different constructs of the target format. Astoria is largely an EDM-based system from the data model perspective, so one could see the problem as creating serialization forms for EDM-schematized data.
Key questions:
NOTE: we’ve discussed these formats to a ridiculous level of detail during our design meetings. Here I’m including the most relevant aspects and those that will show up immediately when looking at a request or response. Once we have cleaner specs we can post all the details if there is demand for them.
Multiple formats, one protocol (almost)
For the most part there is a single “protocol”, and by that I mean the set of HTTP headers for requests and responses, as well as the overall interaction model. In certain cases in order to make a format really look natural to clients we do need to introduce a format-specific protocol element, but we try to keep those to a minimum.
Also, any added protocol elements on top of HTTP need to be done so that unsophisticated agents can ignore a lot of that, do “plain HTTP” and still get by for the most part.
A full discussion on protocol details needs its own write-up, and it’ll come later on. The short story is that we leverage as much as HTTP as we can; HTTP verbs are used for the basic operations, and negotiation such as content-type and charsets are done in the standard way.
Now, with the almost-single protocol in place, the question comes to which formats should we do. Right now we’re thinking ATOM/APP, Web3S, and JSON. We need to define the basic requirements for any format used in Astoria, and then map those to each format we want to support. That’s what comes next.
Astoria requirements for formats
For each format we want to support in Astoria we need to introduce representations for the following constructs:
These formats are used, in somewhat different contexts, for all Astoria operations, including retrieval, update and query over resources. Formats are as symmetric as we can make them for retrieval and update (you get from, and send to the server the same thing, except if very rare cases).
In general Astoria services don’t embed a lot of typing information in the payloads. Some formats include some “light” typing (e.g. JSON), but if a client wants strict typing then the metadata resource should be accessed and the complete typing information extracted from it (more about metadata resources in a future post).
ATOM/APP
This format follows the specification described in APP and ATOM (and it goes beyond formats to include protocol-related aspects). I resisted APP at first. My question was “why do you want to use a feed format to represent data that is a temporal feed”. After pretty much everybody inside/outside Microsoft and within the Astoria team pushed for it, I realized that I was missing an obvious aspect: the fact that it’s a feed format is completely irrelevant. The important characteristic about ATOM/APP is that everybody speaks it, and it’s increasingly becoming kind of a lingua franca of the web when it comes to data exchange between unrelated agents.
It turns out that many of the ATOM and APP concepts map pretty well to the way Astoria exposes data.
From the perspective of manipulating data (i.e. adding, removing, modifying entries), APP follows HTTP very closely, which makes our life easy from the Astoria perspective, as the system is built from the ground up to map to HTTP operational semantics.
There are some caveats with mapping data to an ATOM feed, and to APP more in general to cover for updates, service documents and collections. Most of them have to do with certain required elements in ATOM that do not have a direct mapping in Astoria or EDM in general. For example “title” and “author” elements of “feed” are tricky to map. ATOM was the last protocol in our design list so it’s the least baked right now. Once we have all the rules for it we’ll write an ATOM/APP specific post to discuss the details.
The production code does not yet produce APP, so I can’t include an example right now. Once this is complete we’ll iterate again on the topic of APP and include samples.
JSON
The JSON (Javascript Object Notation) format has become popular lately mostly (my guess) because it makes it easy to consume results from services in Javascript environments, particularly inside web browsers. JSON is a simple format, with no prescription and no widely adopted standards on top of the base format. On one hand that means a fairly clean serialization form, on the other it means that we have to do a few tricks in order to accommodate our requirements.
In addition to be compliant, we tried to design the JSON format so that it’s friendly to their main consumers: javascript clients. What that means in practice is that whenever possible we structured the format so that the result of doing an “eval” of a given service response results in an object graph that is fairly natural and can be used directly from Javascript code without seeing any artifacts in it.
The JSON view of Astoria resources is fairly simple:
To show an example I’ll use the typical Northwind sample database, and pick a single “Customer” entity; this response is a set like the one that would be returned from a top-level container:
[
{
__metadata: {
uri: "Customers!\'ALFKI\'",
type: "NorthwindModel.Customer"
},
CustomerID: "ALFKI",
CompanyName: "Alfreds Futterkiste",
ContactName: "Maria Anders",
ContactTitle: "Sales Representative",
Address: "Obere Str. 57",
City: "Berlin",
Region: null,
PostalCode: "12209",
Country: "Germany",
Phone: "030-0074321",
Fax: "030-0076545",
Orders: {
__deferred: {
uri: "Customers!\'ALFKI\'/Orders"
}
CustomerDemographics: {
uri: "Customers!\'ALFKI\'/CustomerDemographics"
]
Web3S
In the first CTP release of Astoria we had a “plain” XML format we informally called POX for plain-old XML. The goal of this format was to introduce something that did not added extra concepts on top of Astoria/EDM and just presented the data in XML for easy parsing.
It turns out that at the same time the folks at the Live organization in Microsoft were working on a protocol for data exchange that would be used uniformly across many different services. That protocol is called Web3S and its extensively documented here, and there is also a FAQ here.
The obvious question is: should we still roll our own custom XML format or try to unify with Web3S. Since in this kind of thing less is often more, we’ve been working hard trying to unify them. What that means in practice is that Astoria would speak Web3S, and we would make some tweaks to Web3S so that it work well with Astoria services and clients.
Making Astoria work with Web3S is a bit tricky…the main issue is that Web3S models data as a big tree, where as Astoria uses a record-oriented view of the world (entities and associations to be precise). The mapping goes more or less as follows:
Here is an example of the typical Customer entity from the Northwind sample database in Web3S format; in this case it includes a wrapping Customers element, as in when the entity is returned as part of a set:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<Customers xmlns:web3s="Web3S:" xmlns:dw="http://schemas.microsoft.com/ado/2007/08/dataweb" xml:base="/Northwind.svc" xmlns="Web3SBase:NorthwindModel">
<Customer>
<web3s:ID>'ALFKI'</web3s:ID>
<web3s:Uri>Customers/Customer('ALFKI')</web3s:Uri>
<CustomerID>ALFKI</CustomerID>
<CompanyName>Alfreds Futterkiste</CompanyName>
<ContactName>Maria Anders</ContactName>
<ContactTitle>Sales Representative</ContactTitle>
<Address>Obere Str. 57</Address>
<City>Berlin</City>
<Region>-</Region>
<PostalCode>12209</PostalCode>
<Country>Germany</Country>
<Phone>030-0074321</Phone>
<Fax>030-0076545</Fax>
<Orders>
<web3s:DeferredContent />
</Orders>
<CustomerDemographics>
</CustomerDemographics>
</Customer>
</Customers>
What happened with RDF?
The May 2007 CTP also included support for RDF. While we got positive comments about the fact we supported it, we didn’t see any early user actually using it and we haven’t seen a particular popular scenario where RDF was a must-have. So we are thinking that we may not include RDF as a format in the first release of Astoria, and focus on the other 3 formats (which are already a bunch from the development/testing perspective).
My personal take is that while I understand how RDF fits in the picture of the semantic web and related tools, the semantic web goes well beyond a particular format. The point is to have well-defined, derivable semantics from services. I believe that Astoria does this independently of the format being used. That, combined with the fact that we didn’t see a strong demand for it, put RDF lower in our priority lists for formats.
Retrieval, query and update semantics
This post did not elaborate a lot on how payloads are actually used in retrieval and update operations. This is already a long post as it is, so let’s call this “the formats initial write-up” and stop it here. Future posts will discuss these topics and also dig into specific details of each particular format.
Pablo CastroTechnical LeadMicrosoft Corporationhttp://blogs.msdn.com/pablo
This post is part of the transparent design exercise in the Astoria Team. To understand how it works and how your feedback will be used please look at this post.