PDC05

A theme that has come up repeatedly here at the PDC is confusion around WS-AtomicTransactions (WS-AT) and how it composes with the general web services architecture.

The strawman syllogism goes something like this:

Premise 1: WS-AT is a specification for atomic transactions over web services.
Premise 2: Web services are for reaching across trust domains and doing business on the internet.
Conclusion: Microsoft wants us to share our atomic transactions across trust domains.
Corollary: Microsoft is crazy!

This is not an accurate reflection of what we’re accomplishing with WS-AT.

Atomic transactions are not a new invention. They have been described in academia and used in industry for over thirty years. Every transaction processing monitor worthy of the name has provided some form of atomic transactions. Some have extended the concept to distributed atomic transactions, adapting variations of the two-phase commit protocol to (generally proprietary) transport protocols. For example, Microsoft’s Distributed Transaction Coordinator (DTC) has supported distributed atomic transactions over an RPC protocol (OLE Transactions) for about ten years.

Atomic transactions exist to solve a common business problem: how to ensure the ACID properties (most notably atomicity) across multiple stores or resources that are used inside a single unit of work. The nature of the two-phase commit (2PC) agreement protocol leads to an obvious conclusion about the applicability of an atomic transaction as a business solution: there is both an implicit trust and an implicitly tight coupling between all actors in an atomic transaction. Superior transaction coordinators must reserve local resources (log space) to ensure atomicity for subordinates. Subordinate transaction coordinators must rely on their superiors both to provide them with the correct outcome and to do so in a timely fashion. All resource managers involved in an atomic transaction are likely to be holding locks (on what might be essential business data) until the transaction’s outcome is known. A rogue or malfunctioning actor can therefore deny service very effectively: this is a feature of the 2PC protocol, not a hole in any implementation of it.
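A toy model makes the trust and coupling argument concrete. The sketch below is a minimal, single-process simulation of two-phase commit, not WS-AT or DTC code; the `Coordinator` and `Participant` classes and the `deliver_outcome` flag are hypothetical names introduced only for illustration. It shows that once a participant votes to prepare, it holds its locks until the coordinator delivers the outcome, so a coordinator that never reaches phase 2 leaves every prepared participant blocked.

```python
from enum import Enum
from typing import List, Optional


class Vote(Enum):
    PREPARED = "prepared"
    ABORTED = "aborted"


class Participant:
    """Hypothetical resource manager: holds locks from prepare until it learns the outcome."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.holding_locks = False
        self.outcome: Optional[str] = None  # None while the outcome is unknown ("in doubt")

    def prepare(self) -> Vote:
        if not self.can_commit:
            return Vote.ABORTED
        # After voting PREPARED the participant may not decide on its own:
        # it keeps its locks and waits for the coordinator's decision.
        self.holding_locks = True
        return Vote.PREPARED

    def finish(self, outcome: str) -> None:
        self.outcome = outcome
        self.holding_locks = False


class Coordinator:
    """Hypothetical coordinator running a single 2PC round."""

    def __init__(self, participants: List[Participant]) -> None:
        self.participants = participants
        self.log: List[str] = []  # stands in for the durable log space the coordinator reserves

    def run(self, deliver_outcome: bool = True) -> str:
        # Phase 1: ask every participant to prepare and collect the votes.
        votes = [p.prepare() for p in self.participants]
        decision = "commit" if all(v is Vote.PREPARED for v in votes) else "abort"
        # The decision is recorded before any participant is told about it.
        self.log.append(decision)
        # Phase 2: tell every participant the outcome. deliver_outcome=False
        # simulates a coordinator that stalls or misbehaves at this point.
        if deliver_outcome:
            for p in self.participants:
                p.finish(decision)
        return decision


if __name__ == "__main__":
    inventory, billing = Participant("inventory"), Participant("billing")
    Coordinator([inventory, billing]).run(deliver_outcome=False)
    # Both prepared participants are now in doubt and still holding locks.
    print(inventory.holding_locks, inventory.outcome)  # True None
```

Real implementations add durable participant logs, timeouts and recovery, but none of these lets a prepared participant safely decide the outcome by itself, which is why the coupling is inherent in the protocol rather than in any one implementation.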

In short: it is absolutely true that atomic transactions are best suited for tightly-coupled scenarios where trust between actors is a given.

The flaw in the strawman syllogism is actually in its second premise. Web services are not only for reaching across trust domains, and we do not recommend that customers use WS-AT across them. WS-AT was not designed to bring atomic transactions into the web services world; instead, it was designed to bring web services into the glass house, where trust between actors is already a given.

Let me be categorical about this: the general topologies where atomic transactions have been used in the past are the same topologies that are appropriate for WS-AT. In other words, WS-AT does not represent a radical change in the scenarios and uses of atomic transactions. Instead, it provides evolutionary improvements in the following areas:

1. Atomic transaction interoperability. With WS-AT, we have significantly raised the bar for functional interoperability across vendors, systems, languages, runtimes and technologies. While there have been earlier attempts to accomplish this (e.g. the mostly-ignored Transaction Internet Protocol, or TIP, specification), none has enjoyed the industry support that WS-AT has today.

2. Web service ubiquity. Although web services have traditionally been targeted at crossing trust domains and firewalls, there is no conceptual obstacle to deploying web service technologies inside those trust domains as well. In fact, the very availability of WS-AT makes it easier to create internal solutions that aren’t restricted to a single vendor’s technology stack.
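Part of what makes WS-AT interoperable is that the transaction context travels as ordinary XML in a SOAP header, defined by WS-Coordination, which any vendor’s stack can produce or parse. The sketch below builds such a `CoordinationContext` element with Python’s standard library. The namespace URIs are assumed to be those of the 2004/10 WS-Coordination and WS-AT submissions (with the 2004/08 WS-Addressing draft), the `coordination_context` helper is hypothetical, and the registration URL is a made-up internal endpoint; check element details against the specifications before relying on them.

```python
import uuid
import xml.etree.ElementTree as ET

# Assumed namespace URIs (2004/10 WS-Coordination and WS-AT, 2004/08 WS-Addressing);
# verify against the published specifications.
WSCOOR = "http://schemas.xmlsoap.org/ws/2004/10/wscoor"
WSAT = "http://schemas.xmlsoap.org/ws/2004/10/wsat"
WSA = "http://schemas.xmlsoap.org/ws/2004/08/addressing"

# Register prefixes so the serialized XML uses readable wscoor:/wsa: prefixes.
ET.register_namespace("wscoor", WSCOOR)
ET.register_namespace("wsa", WSA)


def coordination_context(registration_url: str) -> ET.Element:
    """Hypothetical helper: build a CoordinationContext for an atomic transaction."""
    ctx = ET.Element(f"{{{WSCOOR}}}CoordinationContext")
    # A unique identifier for this activity.
    ET.SubElement(ctx, f"{{{WSCOOR}}}Identifier").text = f"urn:uuid:{uuid.uuid4()}"
    # The coordination type says "this is an atomic transaction".
    ET.SubElement(ctx, f"{{{WSCOOR}}}CoordinationType").text = WSAT
    # Where participants register with the coordinator (an endpoint reference).
    reg = ET.SubElement(ctx, f"{{{WSCOOR}}}RegistrationService")
    ET.SubElement(reg, f"{{{WSA}}}Address").text = registration_url
    return ctx


if __name__ == "__main__":
    header = coordination_context("https://coordinator.example.internal/registration")
    print(ET.tostring(header, encoding="unicode"))
```

Because the context is plain, namespace-qualified XML rather than a proprietary binary format, a receiver on a different runtime only needs an XML parser and the specification to recognize it.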

This leaves an open question: if atomic transactions are not intended for long-running activities between businesses, then what should be used instead? The traditional answer has been to use a compensating model, in which business resources are durably committed during execution of a long-running activity, then un-done or compensated if the activity is ultimately cancelled. This model scales to the business needs of long-running activities far better than atomic transactions, and is clearly more suited to web service scenarios that cross trust, ownership, reliability and timing boundaries. However, it is significantly more difficult to leverage than the atomic model (which can be made virtually transparent to the user).
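A minimal sketch of the compensating model, assuming that each step of the activity commits on its own and supplies an undo action. The `run_activity` function, the step tuples and the travel-booking steps are hypothetical, and real long-running activity protocols add durable state, retries and coordination messages that are omitted here.

```python
from typing import Callable, List, Tuple

# Hypothetical step shape: (name, action, compensation).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_activity(steps: List[Step], journal: List[str]) -> bool:
    """Run each step to completion; if one fails, compensate the completed ones in reverse order."""
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()  # the step's work is committed immediately; no locks span steps
        except Exception:
            # Cancel the activity: undo already-committed work, newest first.
            for done_name, undo in reversed(completed):
                undo()
                journal.append(f"compensated {done_name}")
            return False
        journal.append(f"committed {name}")
        completed.append((name, compensate))
    return True


def decline() -> None:
    raise RuntimeError("card declined")


if __name__ == "__main__":
    journal: List[str] = []
    ok = run_activity(
        [
            ("reserve flight", lambda: None, lambda: None),
            ("reserve hotel", lambda: None, lambda: None),
            ("charge card", decline, lambda: None),
        ],
        journal,
    )
    print(ok, journal)
```

The extra difficulty shows up directly: every step needs a hand-written compensation, and between a step committing and being compensated, other parties can see its effects, an isolation gap that an atomic transaction would have hidden.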

I’ll talk about the long-running activity model some more next time.