PDC05

The other day I mentioned some of the issues surrounding the scenarios in which it is appropriate to deploy atomic transactions. I also mentioned the compensating model for long-running activities. Choosing between these two models is a common dilemma when building real-life distributed applications.

Atomic transactions are a fundamental building block for a vast number of software problems. It is difficult to overstate the benefits provided to developers by the atomic transaction model, in terms of simplifying the reliability, concurrency, and error handling of a transactional program. However, atomic transactions require strong assurances of trust and relative liveness between all participants. In scenarios where these assurances are not possible, a more loosely-coupled model must be used.
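To make the all-or-nothing property concrete, here is a minimal sketch in Python using sqlite3 as a stand-in local transactional store (this is an analogy for the model, not the distributed infrastructure discussed below). A transfer either fully commits or fully rolls back; the error-handling code never has to clean up a partial debit:

```python
import sqlite3

# In-memory database with two accounts for a transfer example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst as a single atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            new_balance = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?",
                (src,)).fetchone()[0]
            if new_balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
    except ValueError:
        pass  # the aborted transaction left no partial debit behind

transfer(conn, "alice", "bob", 60)   # succeeds
transfer(conn, "alice", "bob", 500)  # aborts; balances are untouched
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# balances are now alice=40, bob=60 -- the failed transfer left no trace
```

The simplification for the developer is exactly the point made above: the failure path requires no bookkeeping, because an aborted transaction undoes itself.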

An example of such a scenario is the painfully real-life sequence of events and decisions with which anyone building a business application backend will be familiar. A business may wait for supplies ordered from different vendors, require human input to make decisions, await confirmations from customers, tolerate partial failures, resubmit orders, offer customers alternative products, etc. When such a sequence is cancelled, compensating actions are applied to undo every action that was taken as part of the sequence of events leading up to the cancellation. Both actions and compensating actions are likely to use atomic transactions in their internal implementations, but that is a detail at a lower level of granularity than the high-level business process.

While the end result of a compensating workflow may be a concrete outcome, it would be limiting to label the entire set of states, events and decisions a “transaction”. In fact, the use of that word in this context is so misleading that you should immediately slap anyone who uses “transaction” to refer to a long-running activity. An atomic transaction is an aggregate set of operations that either complete or fail in unison, usually in timeframes approaching a few milliseconds. Furthermore, an aborted transaction leaves no changes in its wake: the final state of the world is necessarily identical to the initial state. A cancelled workflow activity, on the other hand, is likely to leave the world in a state that is merely as close to the initial state as it can manage. The quality of that equivalence will depend on the nature of the actions and compensating actions taken during the workflow activity.
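The distinction between “identical” and “equivalent” can be illustrated with a toy append-only ledger in Python (a hypothetical example, not any real billing system). Compensating a charge means appending a refund, not erasing history; the balance returns to its initial value, but the state of the world does not:

```python
# A tiny append-only ledger: compensation restores the balance, not the history.
ledger = []

def charge(amount):
    ledger.append(("charge", amount))

def refund(amount):
    # Compensating action: a new entry, not an erasure of the original charge.
    ledger.append(("refund", amount))

charge(25)
refund(25)  # compensate the cancelled order

balance = sum(a if kind == "charge" else -a for kind, a in ledger)
print(balance)      # 0 -- equivalent to the initial state
print(len(ledger))  # 2 -- but the world remembers both actions
```

An aborted atomic transaction would have left the ledger empty; the compensated activity leaves it balanced but two entries longer.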

The greater complexity of the compensating model leads to an interesting conclusion. From the user’s perspective, an infrastructure that provides support for atomic transactions can provide a highly transparent interface. However, an infrastructure that provides support for compensating activities has no obvious transparent interface.

A good example of the former is the new System.Transactions namespace in .NET 2.0, which provides a simple and lightweight veneer on top of what is actually a highly complex system for propagating and coordinating distributed transactions. All of the complexity is buried underneath the veneer: in a very real sense, we implemented System.Transactions and MSDTC so that you didn’t have to.

On the other hand, a framework for long-running compensating activities is likely to be an exercise in delegation to user code: both the logic that makes forward progress and the logic that performs compensating operations fall within the domain of the user’s business needs. Consequently, a compensating infrastructure is far less able to isolate a user from his own system’s complexity.
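A sketch of that delegation, in Python with invented names (this is not any real framework’s API): the infrastructure can sequence the steps and run compensators in reverse on failure, but both the forward logic and the compensating logic must come from user code.

```python
# A minimal sketch of a compensating-activity runner. The user supplies both
# the forward step and its compensator; the framework only sequences them.
def run_activity(steps):
    """Run (action, compensate) pairs; on failure, undo completed steps in reverse."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()  # best-effort undo, newest step first
        return False
    return True

def fail():
    raise RuntimeError("shipping failed")

log = []
steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"),   lambda: log.append("refund card")),
    (fail,                                lambda: None),
]
print(run_activity(steps))  # False
print(log)  # ['reserve stock', 'charge card', 'refund card', 'release stock']
```

Note how little the runner itself contributes: every line that actually touches the business domain is user-supplied, which is why such an infrastructure cannot hide the user’s own complexity the way System.Transactions can.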

With that said, the workflow modeling environment provided by a sister component of Indigo/WCF, the Windows Workflow Foundation, provides an excellent basis for designing applications that use the compensating model for long-running activities. The combination of WCF for distributed messaging and WWF for state machine and message exchange pattern (MEP) design, as well as general implementation modeling, is extraordinarily powerful.

As WCF and WWF evolve towards their eventual release next year, you should expect to see guidelines and best practices from us on how to build distributed compensating workflows using these components.