Compensating Transactions

 

At the Federal Architect Forum a couple weeks ago I was asked “when is SQL Server going to have compensating transactions?”  Since this comes up pretty regularly, I thought I would write a little background on what this means.

 

First, let’s understand what a compensating transaction is and why we you should care.  In a normal database transaction, any rows that you change are locked so that no other transaction can either change the row again or see the new values until you decide that everything is OK and commit the transaction.  Rolling back one of these transactions is pretty easy.  The database knows that no other transaction has changed the value or done any work based of the uncommitted changes so it can just copy the original values back into the database, release the locks, and continue as if the transaction has never happened.  This is great in the ideal world of database integrity but real life is often more messy than this.  I might order a new HD LCD TV from an online retailer and then cancel the order the next day when my wife finds out what I have done.  Obviously, the database transaction for my order was committed within milliseconds of when I placed the order so when I cancel it the next day, rolling it back isn’t an option any more.  To get rid of my order, a new database transaction is going to have to undo the effects of the original “place my order” transaction.  This new transaction “compensates” for the effects of the original transaction so it is called a “compensating transaction”.

 

At first glance, it might appear to be a simple job for the database to compensate for the order.  If the original transaction is still in the transaction log (or in a backup of the log) the database could just find the original transaction in the log and write the original values back into the database as it would have done if the transaction was rolled back before it was committed.  This works great for the order itself because when the original values are restored, the order will disappear.  The reason this approach doesn’t work in general is that when the original transaction was committed, the changed rows were unlocked so other transactions may have changed the values or made decisions based on the changed values.  For example, if the original order changed the on-hand quantity in inventory from 100 to 99 and the current on-hand quantity is 35, restoring it to 100 will likely upset the warehouse manager.  You might say “well, instead of restoring the original value, just add the order quantity back into the on-hand quantity”.  Again that works if you cancel the order fast enough but if the TV has already been picked from the shelf and transferred to the shipping room cancelling the order will involve putting the TV back on the shelf and maybe even telling the shipping company to not deliver it if it’s already in transit.  These are the kind of things that a database obviously can’t do by copying the original values back into the database.  There might also be parts of the transaction that you don’t want undone.  For example if I changed my shipping address or credit card number when I was placing the order, I would expect those changes to remain if I cancelled the order.  It’s also important to note that while we generally refer to this as a compensating transaction, it often involves many transactions in many different databases.

 

In many modern applications, business processes like taking an order are controlled by some kind of workflow engine that takes the application through the steps necessary to complete the order.  In this type of system, it’s common to be able to define a compensating transaction process as part of the workflow.  The “cancel an order” workflow might delete the order, notify the shipping system to not ship it, notify the billing system to not bill it, notify the warehouse system to restock the items, send a confirmation of the cancellation, etc.  If the reverse process is defined in the same place as the forward process, it’s easy to keep them in synch so when steps are added to the forward process they are also compensated for in the reverse process.  BizTalk orchestrations have done this for quite a while.

 

Service Broker (you knew I was going to talk about Service Broker eventually) doesn’t do compensation or even allow you to define a compensating transaction but it does do a couple things that may make compensating transactions easier.  First, because many Service Broker applications involve multiple asynchronous services associated with a Conversation Group and the sate of the application is generally stored in state tables keyed by the ConversationGroupID, if you know the ConversationGroupID of the transaction you want to do compensation for, It’s usually pretty easy to find out what needs to be undone by looking at the information in the state tables.  Second, if for some reason the state tables don’t give you enough information to do compensation, you can turn on the RETENTION option for the queues and Service Broker will save all the messages involved in the conversation so you can use the messages to find out exactly what was done so you can undo it.  Remember that messages are only retained until the conversation ends so if the cancellation comes in after the order is completely finished, the messages may be gone.  In reality you will probably keep the conversation open until the order has been shipped and billed and cancelling an order at that point will probably require manual intervention so the RETENTION option may work for your application.  In general, using the state information is probably a better option if possible because the state may be kept as long as necessary and won’t have an impact on performance like the RETENTION option.

 

I hope I have convinced you that automatic compensation isn’t a realistic option for most applications.  There might be parts of the originally transaction that the database could compensate for but there’s really no way for the database to know which parts it can automatically issue compensating transactions for and which parts are going to require application intervention.  The only safe option is for your application to do the compensation because only your application knows what is necessary to compensate for a transaction.  Another thing to think about is that compensation will often be different depending on when it starts.  The process for cancelling an order a couple seconds after it was placed is very different that cancelling it a weeks later when the items have shipped and been billed. 

 

As more applications are built as collections loosely coupled, asynchronous services, workflow engines will become more common for hooking services together into and application and even as part of the service itself to provide the flexibility to make the service reusable in a number of applications.  When this happens, including a compensating workflow with the forward workflow will become a common design pattern and compensating transaction will be easier to handle.  If you’re interested in exploring this, I suggest you look into the Windows Workflow Foundation (WF) included in the preliminary versions of WinFX.  There’s a lot of information out there.  I have found this list to be useful: http://geekswithblogs.net/marcost/archive/2006/05/09/77720.aspx