Different applications have different requirements around consistency and how concurrent modifications are handled. I’ll oversimplify and put all these applications in two buckets: either you care about controlling concurrent changes or you don’t.
If you’re creating a REST interface to your data and don’t care about concurrency (e.g. there are no deep consistency rules, or your resources always change as whole, self-consistent units), then you can use the basic HTTP methods to retrieve (GET) and manipulate (POST, PUT, DELETE) resources directly, without any more context than the representations of your resources. You get “last one wins” semantics on updates in this case.
On the other hand, if you do care about concurrency in your REST interface, there are more aspects to take into consideration. If your resources are atomic (no further structure that’s interesting from the concurrent-changes perspective than the resource as a whole), then you can use an out-of-band mechanism to create a “version number” for each resource (typically a monotonically increasing number) and use HTTP’s existing mechanism to ensure you only overwrite state that you know about. In HTTP you can attach an “entity tag”, or ETag, to your responses; it contains an opaque value that denotes the version or state of a resource. Later on, when you want to modify a resource, you can send that value in an “if-match” request header to make sure that your knowledge about the state of the resource you’re modifying is still current. If it’s not, the resource on the server will have an ETag that doesn’t match the one you provided, and you’ll get back a 412 “Precondition Failed” status code. All of that is standard HTTP 1.1, described in RFC 2616. (ETags are also used for caching and conditional GETs, in addition to the scenario described here.)
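To make the if-match flow concrete, here is a minimal in-memory sketch in Python. The `Resource` and `Store` classes, their method names, and the tuple return values are all hypothetical and exist only for illustration; the only parts taken from HTTP are the ETag value, the “*” wildcard, and the 412 status.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    body: str
    version: int = 1  # monotonically increasing version number

    @property
    def etag(self) -> str:
        # ETags are opaque quoted strings; here the version number is the tag.
        return f'"{self.version}"'


@dataclass
class Store:
    resources: dict = field(default_factory=dict)

    def get(self, key):
        """Return (status, etag, body) for a GET."""
        res = self.resources[key]
        return 200, res.etag, res.body

    def put(self, key, body, if_match=None):
        """Return (status, etag) for a PUT, honoring an optional if-match."""
        res = self.resources[key]
        if if_match is not None and if_match != "*" and if_match != res.etag:
            # The caller's view of the resource is stale.
            return 412, None
        res.body = body
        res.version += 1
        return 200, res.etag
```

A client that does a GET, keeps the returned ETag, and passes it as `if_match` on the next PUT succeeds only if nobody else changed the resource in between. Omitting `if_match` in this sketch gives the “last one wins” behavior described earlier.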
Now, REST data services that expose structured data have to deal with various challenges beyond the basics, which I’ll go into in detail below. While I discuss this in the context of the ADO.NET Data Services Framework (Astoria), I’m sure some of these problems apply to a broader set of applications.
Creating ETags: concurrency tokens
The data services framework has to deal with the fact that we don’t control the data sources that we expose through the REST interface. Sometimes each entity that we turn into a resource will have a nice clean property that’s a timestamp or similar and maps perfectly to ETag semantics (e.g. whenever we change the entity in a significant way, the value of this property changes). However, often the schema of the underlying data is not under the control of the service developer, so we have to work with what we have. What that means in practice is that you can tell the data services framework which properties of each entity type are “concurrency tokens”. Changing those values means that you changed the version of the resource.
The way you do that in the framework is by using an [ETag(props…)] attribute in your class or an annotation in your EDM schema. For types that don’t have any concurrency tokens we won’t generate ETags for the responses for those types, and they get “last one wins” update behavior.
Once you have indicated which property or properties are your concurrency tokens, we can produce ETags by using the values of those properties for the particular instance we’re returning.
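As a rough illustration of deriving an ETag from concurrency-token values, here is a small Python sketch. The function names and the exact text format (single-quoted strings, comma-separated values, percent-encoding) are assumptions modeled on the `m:etag` value in the sample entry in this post; they are not a specification of what the framework actually emits.

```python
from urllib.parse import quote


def format_token(value) -> str:
    """Render one concurrency-token value as text (hypothetical format)."""
    if value is None:
        return "null"
    if isinstance(value, str):
        # Wrap strings in single quotes, doubling any embedded quote.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def make_etag(entity: dict, token_props: list) -> str:
    """Build an ETag value from the current values of the token properties."""
    joined = ",".join(format_token(entity[p]) for p in token_props)
    # Percent-encode so the value is safe in a header or an XML attribute,
    # leaving the quote and comma separators readable.
    return quote(joined, safe="',")
```

With a single string token, `make_etag({"CompanyName": "A Bike Store"}, ["CompanyName"])` produces `'A%20Bike%20Store'`. Because the tag is built only from the token properties, changes to other properties leave it unchanged.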
During an update, the data services runtime works with the data source to determine whether the concurrency token values that were marshaled through ETags and if-match headers still match and, if so, performs the update/delete operation. If they don’t match, a 412 response is sent to the client.
Including ETags in headers and/or payloads
The HTTP spec describes the ETag response header to transport the entity tag for a given resource. That works great for us in cases where we respond with a single entity (e.g. an entry in Atom terms), but it doesn’t work when we return a collection of entries from a URL (e.g. an Atom feed). For the latter scenario, we include the ETag as part of the resource representation (in the entry for Atom, in the “__metadata” property for JSON), for example:
<entry m:type="BikesModel.Customer" m:etag="'A%20Bike%20Store'">
<!-- rest of the entry -->
</entry>
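On the client side, reading per-entry ETags out of a feed could look roughly like the Python sketch below. The sample feed, the entry ids, and the second company name are made up for illustration, and the namespace URI bound to the `m` prefix is an assumption, so check it against the payloads your service actually returns.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumed namespace URI for the "m" prefix; verify against real payloads.
META_NS = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

# Made-up sample feed with two entries, each carrying its own m:etag.
FEED = f"""
<feed xmlns="{ATOM_NS}" xmlns:m="{META_NS}">
  <entry m:type="BikesModel.Customer" m:etag="'A%20Bike%20Store'">
    <id>urn:example:customer:1</id>
  </entry>
  <entry m:type="BikesModel.Customer" m:etag="'Progressive%20Sports'">
    <id>urn:example:customer:2</id>
  </entry>
</feed>
"""


def etags_by_entry(feed_xml: str) -> dict:
    """Map each entry's <id> to the ETag carried in its m:etag attribute."""
    root = ET.fromstring(feed_xml)
    result = {}
    for entry in root.findall(f"{{{ATOM_NS}}}entry"):
        entry_id = entry.findtext(f"{{{ATOM_NS}}}id")
        result[entry_id] = entry.get(f"{{{META_NS}}}etag")
    return result
```

A client can keep this mapping around and send the matching value in an if-match header when it later updates or deletes one of the entries.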
Validation during side-effecting operations
Concurrency tokens are validated whenever you perform an operation that affects the state of an existing resource. In the data services REST interface that means HTTP PUT and DELETE methods.
As I mentioned above, validation happens during update processing by extracting the “original” values from the ETag (which was sent back through the if-match header) and comparing them with the data in the data source. If they are the same, we consider the whole resource the same and proceed with the modification.
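One common way to do this kind of check against a relational data source is to fold the original token values into the WHERE clause of the update and look at the affected row count. The Python sketch below uses SQLite with a made-up `customers` table and a hypothetical helper name; it illustrates the general technique and is not a description of how the data services runtime implements it.

```python
import sqlite3


def update_if_unchanged(conn, customer_id, original_name, new_name) -> bool:
    """Apply the update only if the token column still holds the original value."""
    cur = conn.execute(
        "UPDATE customers SET company_name = ? "
        "WHERE id = ? AND company_name = ?",
        (new_name, customer_id, original_name),
    )
    conn.commit()
    # Zero rows touched means the token changed (or the row is gone),
    # which a service would report as 412 Precondition Failed.
    return cur.rowcount == 1


# Hypothetical schema: company_name plays the role of the concurrency token.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, company_name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'A Bike Store')")
conn.commit()
```

The comparison and the write happen in a single statement, so there is no window between checking the token and applying the change.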
An interesting question is whether presenting an ETag in an if-match header should be mandatory for resources that have concurrency tokens. Put another way: should the decision of whether it’s OK to potentially overwrite changes based on stale knowledge be up to the client, or restricted by the server? The HTTP spec defines a special value of “*” for the if-match header that effectively means “any value will match”. The behavior that we are planning for is that if an entity type has concurrency tokens, then we’ll always require an “if-match” header in modification operations. The header value can be an actual ETag obtained through a GET request, or “*”, meaning “I know this type supports concurrency control, but I’ll overwrite it anyway”.
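The planned policy can be summarized as a small decision function. This Python sketch uses hypothetical names (`Outcome`, `check_if_match`) and deliberately leaves the missing-header case as a symbolic outcome, since the status code the server would return for it is not stated here.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PROCEED = "proceed"
    MISSING_IF_MATCH = "reject: if-match required"
    PRECONDITION_FAILED = "412 Precondition Failed"


def check_if_match(
    has_tokens: bool, if_match: Optional[str], current_etag: Optional[str]
) -> Outcome:
    """Decide whether a PUT or DELETE may go ahead."""
    if not has_tokens:
        # No concurrency tokens on this type: last one wins.
        return Outcome.PROCEED
    if if_match is None:
        # Types with tokens always require the header.
        return Outcome.MISSING_IF_MATCH
    if if_match == "*" or if_match == current_etag:
        # Either an explicit overwrite or an up-to-date ETag.
        return Outcome.PROCEED
    return Outcome.PRECONDITION_FAILED
```

Requiring the header means a client cannot overwrite by accident: it either proves its knowledge is current or opts in to overwriting with “*”.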
Almost, but not quite, a perfect match
HTTP ETags and conditional operations are almost a perfect match for what we need to handle concurrent activity in RESTful data services. There are, however, a few glitches. This is where we get into the fine print that’s not necessarily common knowledge. Mike brought up many of these details that I wasn’t aware of.
ETags can be “strong entity tags” or “weak entity tags”. Weak ETags are very similar to what happens when we have entities for which only some properties are designated as concurrency tokens. From section 13.3.3 of the HTTP spec:
“However, there might be cases when a server prefers to change the validator only on semantically significant changes, and not when insignificant aspects of the entity change. A validator that does not always change when the resource changes is a "weak validator."”
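The problem is that weak ETags only apply to GET operations; they cannot be used for PUT/DELETE, which is what we’re trying to do. (The spec requires the strong comparison function when evaluating if-match, and a weak ETag never satisfies it.)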
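The difference between the two comparison functions can be sketched in a few lines of Python. This follows my reading of section 13.3.3 of RFC 2616, where weak tags carry a `W/` prefix; the function names are hypothetical and the parsing is simplified (no validation of the quoted string).

```python
def parse_etag(raw: str):
    """Split a raw entity tag into (is_weak, opaque_tag)."""
    raw = raw.strip()
    if raw.startswith("W/"):
        return True, raw[2:]
    return False, raw


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: tags identical and neither marked weak."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: opaque tags identical, weakness ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

Under these rules `W/"v1"` and `"v1"` match weakly but not strongly, so a weak tag sent in if-match can never pass a strong comparison.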
For cases where you own the data, the data services framework can expose a compliant interface by using constructs such as timestamps (if using a database as a data source), where any change in the entity will be reflected in the ETag. You can also use a relaxed form of ETags, where the entity might change but the ETag stays the same. It’s not completely HTTP compliant and may confuse intermediate systems, but it may be your only option in some scenarios.
As always, thoughts and feedback are welcome.
Pablo Castro
Software Architect
Microsoft Corporation
This post is part of the transparent design exercise in the Astoria Team. To understand how it works and how your feedback will be used please look at this post.
If you're going to implement ETags, it would also be helpful to support the if-none-match request header for GETs. This allows efficient conditional GETs for cache updates.
Also, many systems already use a timestamp for simplistic optimistic concurrency (timestamp meaning date and time, not mssql timestamp). This has the advantage of being human-parseable.
It would seem that these timestamp concurrency tokens would map well to the http last-modified header and allow the use of conditional if-(un)modified-since request-headers.
Also, I just thought I'd point out that all of this behavior would be impossible to utilize in the current Silverlight 2 beta (with a direct client-side call), due to the limitations in the current HTTP stack regarding headers and response status codes. Are you working with the SL team on this?
John: we actually do support if-none-match for GETs already. It's not that cheap on the server (we still need to ask the data source for the resource), but the resource does not have to be sent over the wire to the client. We don't plan to do timestamps right now, but we may in the future. Note that the primary goal of implementing ETags in Astoria is concurrency support and not so much optimization.
Thanks for pointing out the Silverlight thing. We know about it and we're working with the Silverlight team to make sure the experience is there once all the pieces are done.
Hey Pablo. FYI, the issue you describe wrt weak etags and PUT requests is not quite as clear cut as you make it out to be. There's been a bunch of discussion about similar issues in the HTTPBIS WG (which you guys should be participating in IMO!!).
Some people believe that weak etags should be able to be used on a PUT with If-None-Match. I think they could be used anywhere a strong etag could be used: the meaning of the message will change, that's all. e.g. PUT with If-Match with a weak etag means "set the state to that represented in the provided representation so long as the current state is semantically equivalent to the state identified by the etag I'm providing". With a strong etag, that meaning would change to "... so long as the current state is bit-wise equivalent to the state identified by the etag I'm providing".
While this has been discussed in the WG, it's on the periphery and may require its own issue. Do you want to handle that or will I?
Hi Mark, thanks for the feedback. It's really interesting to hear this, as we struggled with it quite a bit.
I completely agree with you that weak ETags should be applicable anywhere. In fact, given that actual resources and their actual representation(s) are different constructs, I found it surprising that the ETag checks were based on bit-wise comparison and not on semantics alone.
I would be interested in bringing up this issue with the WG if there is enough momentum. I wrote to the AtomPub group because this seemed to be an application-level issue. That said, now that I know we're not alone, I'd be happy to bring it up in other forums as well. Is the HTTPBIS WG mailing list still active and a good place to send this?
If so, can you supply some code/articles?