Architecting Service Broker applications (part 2)

 

This is the second in a series on architecting SQL Server 2005 Service Broker (SSB) applications.  If you haven’t read part 1 (http://blogs.msdn.com/rogerwolterblog/archive/2006/09/08/Roger.aspx), I would advise reading it first.  The first article covered the Service Broker “materials” available for building Service Broker applications; this article covers the actual design of the application.

 

Presumably you have done the analysis I recommended in the first article to decide that Service Broker is the appropriate solution for meeting your customer’s requirements so at this point we can assume you are building a reliable, asynchronous, database application.  If not, go back and run through the analysis phase again because you made the wrong choice.  If you’re not concerned with reliability, your activities need to be synchronous and you don’t have any data to store, then designing a Service Broker solution that meets your requirement will be pretty difficult.  You can use Service Broker to implement synchronous activities or add a database to your application just to be able to use Service Broker but in general, this is probably only justified if there is a compelling need for SSB reliability or if there are other parts of the application already using SSB.

 

I’ll start with a series of decisions that you should go through when designing a Service Broker application.  Not every decision is required for a given application but it’s worth thinking about all of them to be sure you’re not missing something important.  I present them in order but in reality, design is a very cyclical process and you will often have to revisit earlier decisions as you proceed to later phases of the design.  I put the steps in the order I do them but you may find a different order works best for you.

 

Identify SSB Services

 

Service Broker enables asynchronous communication between services so the first thing you have to decide is what those services should be.  In many cases, the services are obvious from the problem definition.  If you are using Service Broker to log CREATE TABLE events, for example, the services are the CREATE TABLE command and your logging code.  The event code is part of SQL Server so you are left with one service to design.

 

Most Service Broker dialogs involve three pieces of code:

1)     An Initiator that begins a dialog and sends a message (the initiator code usually isn’t the service specified in the FROM SERVICE parameter of the BEGIN DIALOG command)

2)     A target service that receives this message, does some work, and sends a response

3)     A Response service that handles the response message.  This is the service specified in the FROM SERVICE parameter of the BEGIN DIALOG command.

 

It may seem strange at first that the initiator does not receive the response but that’s the nature of an asynchronous application.  If the initiator waited around for the response, it would be synchronous (in fact this is how you implement synchronous requests over an asynchronous messaging system).  In an asynchronous system, the initiator kicks off the asynchronous activity and then goes on to do something else.  When the response comes back, the initiator may be processing a different request or may even be gone.  If the response is handled by a different service, that service is activated when the response message arrives.  The same service can handle responses from a number of initiators.  A good example is an order entry application that initiates a dialog to a shipping service to ship the ordered item.  As soon as the shipping message is sent, the order entry program can go on to handle other orders.  When the shipping service responds with a ship confirmation (maybe days or weeks later), a service in the order entry database receives the message, updates the order status to “Shipped” and sends an email to the customer.

 

Even if the target does not return a response message, there has to be a minimal service on the initiator side to handle Service Broker control messages such as errors and end dialog messages.  Because of the asynchronous nature of Service Broker, a successful SEND just means that the message was put on a queue (either the target queue or sys.transmission_queue).  A SEND to a remote service that does not exist will succeed because Service Broker can’t tell the difference between a service that doesn’t exist and a service that isn’t currently running.  Any error that eventually results is delivered later as a message on the dialog, so there has to be a service on the initiator side to receive it.  That’s why FROM SERVICE is a required parameter in the BEGIN DIALOG command.
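To make the three pieces concrete, here is a minimal T-SQL sketch of the order entry example.  Every object name in it (the //Example/... message types, contract and services, and the two queues) is made up for illustration, and both services live in the same database just to keep the sketch short.  ENCRYPTION = OFF is there only so the sketch does not depend on a database master key being set up.

```sql
-- All names below are hypothetical, chosen only for illustration.
CREATE MESSAGE TYPE [//Example/ShipRequest]      VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [//Example/ShipConfirmation] VALIDATION = WELL_FORMED_XML;

CREATE CONTRACT [//Example/ShippingContract]
(
    [//Example/ShipRequest]      SENT BY INITIATOR,
    [//Example/ShipConfirmation] SENT BY TARGET
);

-- Response service: receives confirmations, errors, and end dialog messages.
CREATE QUEUE OrderResponseQueue;
CREATE SERVICE [//Example/OrderResponseService] ON QUEUE OrderResponseQueue;

-- Target service: receives ship requests sent under the contract.
CREATE QUEUE ShippingQueue;
CREATE SERVICE [//Example/ShippingService]
    ON QUEUE ShippingQueue ([//Example/ShippingContract]);
GO

-- Initiator code: begins the dialog, sends one message, and moves on.
DECLARE @dialog UNIQUEIDENTIFIER;

BEGIN TRANSACTION;

BEGIN DIALOG CONVERSATION @dialog
    FROM SERVICE [//Example/OrderResponseService]   -- where responses will go
    TO SERVICE   '//Example/ShippingService'
    ON CONTRACT  [//Example/ShippingContract]
    WITH ENCRYPTION = OFF;

SEND ON CONVERSATION @dialog
    MESSAGE TYPE [//Example/ShipRequest]
    (N'<ShipRequest><OrderId>42</OrderId></ShipRequest>');

COMMIT TRANSACTION;
```

Note that the batch that does the SEND never reads OrderResponseQueue.  Whatever reads that queue later (an activated procedure, for example) is the response service.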

 

In a more complex application there are many more functions involved, some synchronous and some asynchronous.  Choosing which functions to make Service Broker services is important.  The first candidates for SSB services are functions that don’t have to complete before the main logic completes.  Some examples are order shipping and billing, stock trade settlement, and hotel and car-rental reservations for a travel itinerary.  In all these cases, the original transaction may have completed long before the response is returned and the response status is returned to the user either through out-of-band communications (email) or through a status which the user can query later.

 

If a function must be complete before control is returned to the user, it still may make sense to use a Service Broker service if two or more services can execute in parallel.  A classic example scenario is a call center application I worked on once.  Incoming callers were identified through caller ID and all the customer’s records from all the internal systems were retrieved so they could be displayed to the service representative when the call was answered.  The problem was that this involved remote queries into seven systems and sometimes this would take so long that the customer would give up before the call was answered.  We made this work by starting all seven queries in parallel and then returning the results when they all returned.  This decreased the response time from 5 seconds to 1 second.  Note that this is an “anti-scalability” approach.  Instead of one thread, this used eight database threads so our improved response time was purchased at the price of lower scalability but the effect wasn’t huge and we had to do it to meet the response requirements.  Another asynchronous use case might be a mortgage application web site where you ask the user for the size of loan they want and then kick off a bunch of amortization table calculations in the background while you go on to ask the customer for other information.  By the time they are ready to look at loan options, you have all the results ready so the customer thinks you’re calculating them instantaneously.  With a little thought, I’m sure you can come up with dozens of similar scenarios.  That’s why asynchronous activities are used so often in high-performance applications.

 

The final point on choosing services is that it’s usually a mistake to make SSB services too fine grained.  SSB messages are written to the database so if you design a service that makes dozens of calls to other services to execute, you may find that the database overhead is larger than the actual processing time.  A Service Broker service should do a reasonably significant piece of work that justifies the overhead of the message handling.  There are exceptions to this if reliability, remote execution, or security context isolation make a Service Broker service attractive even if the service is very small.  This is really no different than DCOM where a few large DCOM calls are much more efficient than many small DCOM calls to do the same work.

 

Define Dialogs

 

Once you have defined your services you need to define the dialogs that they use to communicate.  Basically, this consists of deciding what messages are required to communicate and which services need to send them.  Dialogs are more complex than the usual request-reply semantics that you’re used to in DCOM or Web Services because dialogs can be long-winded conversations that involve many messages in both directions and last for months or years.  A dialog should model an entire business transaction.  For example, completing a purchase order might involve submitting the order, acknowledging the order, negotiating a price, negotiating ship dates, status information, ship notification, receipt acknowledgement, and billing.  This transaction may continue for months and involve the exchange of dozens of messages.  With Service Broker, this whole conversation should be modeled as a single dialog.  The dialog will correlate and order the messages across time so all messages dealing with this purchase order will have the same dialog handle and conversation group ID.  If you store the conversation group ID in the purchase order headers table, your application will easily identify which PO the incoming message is for.  Because dialogs are persistent, the ID stays the same even if the dialog lasts for years.
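As a rough illustration of storing the conversation group ID with the purchase order, the sketch below records it when the dialog is begun and uses it to find the PO when a message comes back.  The table, queue, and service names are all hypothetical, the PO number is a made-up value, and the DEFAULT contract is used only to keep the sketch short.

```sql
-- Hypothetical objects for illustration only.
CREATE TABLE dbo.PurchaseOrderHeader
(
    PurchaseOrderId     INT              NOT NULL PRIMARY KEY,
    ConversationGroupId UNIQUEIDENTIFIER NOT NULL UNIQUE,
    Status              NVARCHAR(20)     NOT NULL
);
CREATE QUEUE PurchasingQueue;
CREATE SERVICE [//Example/PurchasingService] ON QUEUE PurchasingQueue;
GO

DECLARE @dialog UNIQUEIDENTIFIER;

BEGIN TRANSACTION;

-- No ON CONTRACT clause, so the dialog follows the DEFAULT contract.
BEGIN DIALOG CONVERSATION @dialog
    FROM SERVICE [//Example/PurchasingService]
    TO SERVICE   '//Example/SupplierService'
    WITH ENCRYPTION = OFF;

-- Record the conversation group of the new dialog with the PO header.
INSERT INTO dbo.PurchaseOrderHeader (PurchaseOrderId, ConversationGroupId, Status)
SELECT 1001, conversation_group_id, N'Submitted'
FROM sys.conversation_endpoints
WHERE conversation_handle = @dialog;

COMMIT TRANSACTION;
GO

-- Later (possibly months later): find the PO for an incoming message.
DECLARE @groupId UNIQUEIDENTIFIER, @body VARBINARY(MAX);

BEGIN TRANSACTION;

RECEIVE TOP (1)
    @groupId = conversation_group_id,
    @body    = message_body
FROM PurchasingQueue;

SELECT PurchaseOrderId, Status
FROM dbo.PurchaseOrderHeader
WHERE ConversationGroupId = @groupId;

COMMIT TRANSACTION;
```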

 

Dialogs are reasonably cheap but not free.  Beginning or ending a dialog might incur a database write and there’s a database message to ensure that both endpoints know the dialog is ending.  For this reason, when services are engaged in a business transaction, the dialogs used to communicate with other services should be kept around until the service is done with them.  Some very high performance SSB applications reuse dialogs for multiple business transactions.  This can significantly improve performance but if not done right can lead to blocking issues.  There’s a discussion of the issues involved here:  http://blogs.msdn.com/rogerwolterblog/archive/2006/05/20/602938.aspx  Recycling dialogs can improve performance but the performance comes at the price of increased complexity.  If your application is simple or if maximum performance is a key requirement you should look at recycling dialogs but in general, it’s best to limit dialog lifetime to a single business transaction unless you discover you need a performance boost.

 

Note:  I have used the term “business transaction” several times without clearly defining it.  For purposes of this paper, a business transaction is a complete activity at the business level.  In the purchase order example, the business transaction was processing the purchase order which took many days and involved dozens of database transactions.  Another example is booking a trip with a travel site.  The business transaction of booking the trip involves the hotel reservations system, the car-rental system, the airline system, the billing system, a bank or credit card system, and possibly several other systems.  There are many database transactions in many databases involved in booking the trip.

 

Dialogs enforce ordering of the messages in the dialog.  This ordering is enforced across transactions from a number of different services.  It survives database restarts and failover.  This is a very powerful feature that allows the application to depend on the order of message delivery.  For example, an SSB application doesn’t need to deal with an order line arriving before its corresponding order header if they are in the same dialog.  Another problem that dialogs can solve is mixed types of data in a message.  In web services applications, one of the more difficult problems to solve is binary data embedded in an XML document.  For example, an employee message might be an XML document that contains a photograph, fingerprint, certificate, etc. as embedded binary data.  With Service Broker, you could send the XML, picture, fingerprint, certificate, etc. as separate messages.  When the application receives the XML document, it can just receive the picture, fingerprint, etc. on the same dialog and be assured that the messages will arrive in the proper order and on the same thread because SSB ordering and locking take care of it.

 

Ordering is a very powerful feature but your design must be aware that dialog messages will always be processed in order.  One of the more common questions I get involves developers who open a dialog and then start sending a bunch of messages on it.  They set up activation to start many queue readers but they find only one of them processing any messages.  To ensure dialog message order, Service Broker must use Conversation Group locks to ensure that only one database transaction can receive messages from a particular dialog at a time.  If you want to use the multi-threaded capabilities of SSB, you must have at least as many dialogs active as you have threads.  (Properly I should have said as many conversation groups active as threads because the conversation group is what is locked.  If 10 dialogs in three conversation groups are active, only three threads can receive messages at a time.)

 

The decisions you make about which messages will be sent by each service are used to create the CONTRACT for the dialog.  When you begin a dialog, you specify which contract Service Broker will use to govern which messages can be sent on the dialog.
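For the purchase order dialog described earlier, the contract might look something like the sketch below.  The message type and contract names are hypothetical and the list of messages is only an illustrative subset.  SENT BY ANY marks messages that either side may send, such as the price negotiation messages.

```sql
-- Hypothetical message types for a purchase order dialog.
CREATE MESSAGE TYPE [//Example/PO/Submit]         VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [//Example/PO/Acknowledge]    VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [//Example/PO/PriceProposal]  VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [//Example/PO/ShipNotice]     VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [//Example/PO/ReceiptAck]     VALIDATION = WELL_FORMED_XML;
CREATE MESSAGE TYPE [//Example/PO/Invoice]        VALIDATION = WELL_FORMED_XML;

-- The contract states which side of the dialog may send each message type.
CREATE CONTRACT [//Example/PO/Contract]
(
    [//Example/PO/Submit]        SENT BY INITIATOR,
    [//Example/PO/Acknowledge]   SENT BY TARGET,
    [//Example/PO/PriceProposal] SENT BY ANY,        -- negotiation goes both ways
    [//Example/PO/ShipNotice]    SENT BY TARGET,
    [//Example/PO/ReceiptAck]    SENT BY INITIATOR,
    [//Example/PO/Invoice]       SENT BY TARGET
);
```

A dialog begun with ON CONTRACT [//Example/PO/Contract] can then carry only these message types, each in the direction the contract allows.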

 

Define Conversation Groups

 

In many cases, a dialog is a pretty independent entity but some applications will use several dialogs to complete a business transaction.  We have already talked about an order entry system where the order entry service communicates with a shipping service, an inventory service, a credit limit service, a CRM service, and a billing service to complete an order.  Generally, the order entry service will begin dialogs with each of these services in parallel to optimize processing efficiency.  Because the services can return messages at any time in any order, the dialogs for all these services should be put into the same conversation group.  When a message on any one of these dialogs is received, the conversation group lock that is shared by all the dialogs in the group will ensure that no other thread can process messages from any of the dialogs in this group.  This will prevent issues like a credit OK message and an inventory status message being processed simultaneously on different threads and making conflicting updates to the order state.
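Here is a small sketch of how the order entry service might put its dialogs into one conversation group, using the RELATED_CONVERSATION option of BEGIN DIALOG.  The service and queue names are hypothetical, and the DEFAULT contract is used only to keep the sketch short.

```sql
-- Hypothetical initiator-side service for illustration only.
CREATE QUEUE OrderEntryQueue;
CREATE SERVICE [//Example/OrderEntryService] ON QUEUE OrderEntryQueue;
GO

DECLARE @shipping  UNIQUEIDENTIFIER,
        @inventory UNIQUEIDENTIFIER,
        @billing   UNIQUEIDENTIFIER;

BEGIN TRANSACTION;

-- The first dialog gets a new conversation group.
BEGIN DIALOG CONVERSATION @shipping
    FROM SERVICE [//Example/OrderEntryService]
    TO SERVICE   '//Example/ShippingService'
    WITH ENCRYPTION = OFF;

-- The remaining dialogs join the conversation group of the first one.
BEGIN DIALOG CONVERSATION @inventory
    FROM SERVICE [//Example/OrderEntryService]
    TO SERVICE   '//Example/InventoryService'
    WITH RELATED_CONVERSATION = @shipping, ENCRYPTION = OFF;

BEGIN DIALOG CONVERSATION @billing
    FROM SERVICE [//Example/OrderEntryService]
    TO SERVICE   '//Example/BillingService'
    WITH RELATED_CONVERSATION = @shipping, ENCRYPTION = OFF;

COMMIT TRANSACTION;

-- All three dialogs should report the same conversation_group_id.
SELECT conversation_handle, conversation_group_id
FROM sys.conversation_endpoints
WHERE conversation_handle IN (@shipping, @inventory, @billing);
```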

 

I often get questions about the performance aspects of conversation group locks.  Your intuition tells you that locking all these dialogs will cause blocking and slow things down.  In reality, these locks can improve performance.  If two or more threads are working on the same order object simultaneously, they will have to serialize access to the database rows for the order to prevent conflicting updates which means two or three threads will be blocked waiting for each other to finish.  The conversation group lock will block access to messages for a given order while one of the application threads is processing that order.  This means that only one thread is involved instead of having multiple threads blocking each other.  The threads that are freed up by this can then be used to process messages for other orders which improves the overall performance.  So you can see, conversation group locks not only make the application logic simpler, they make it more efficient.

 

Define Message Types

 

The dialog definition process determines which messages are required to implement the dialog so in this step we need to decide what the contents of each message will be.  As far as Service Broker is concerned, a message is a bucket of up to 2 GB of binary data.  Service Broker will do all the disassembly and assembly required to transport the message to its destination.  The contracts used to define the contents of a dialog contain message type definitions.  The minimal message type definition is just a name for the message type that your service can use to determine what kind of message it has received.  If the message is an XML document, Service Broker can optionally check the message content to ensure it is well-formed XML or that it is valid against an XML schema.

 

For each message in a dialog, you need to define what the message body will contain.  XML is commonly used because it makes the service more flexible and loosely coupled but there is some overhead involved with parsing the XML.  If the message contents are binary (images, music, programs, etc.) then just send them as a binary message.  I have seen quite a few customers use a serialized .NET object as a message body.  This obviously makes the initiator and target services tightly coupled because they both have to have the same version of the object but it is pretty efficient and easy to code so it is commonly done.  If the message body is XML, you should define a schema for it as part of the design process.  You may or may not want to have Service Broker validate the contents against the schema but having one gives you the option and a schema is an unambiguous way of telling the developers what the message has to look like.

 

Using a schema to validate incoming messages can be fairly expensive because each message is loaded into a parser as it is received.  If your service then loads the message into its own parser to process it, each message is parsed twice.  I generally recommend that schema validation is turned on for development and unit testing but then turned off for integration testing and production.  The exception to this would be a service that receives messages from a variety of untrusted sources so the extra parsing overhead is justified because bad messages are rejected early.
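Switching validation on and off does not require redefining the contract.  The sketch below, with a hypothetical schema collection and message type, creates a schema-validated message type for development and then relaxes it with ALTER MESSAGE TYPE.  The schema itself is a made-up minimal example.

```sql
-- Hypothetical schema describing a ship request message.
CREATE XML SCHEMA COLLECTION ShipRequestSchema AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="ShipRequest">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="OrderId" type="xsd:int" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>';
GO

-- Development and unit testing: validate every message against the schema.
CREATE MESSAGE TYPE [//Example/ValidatedShipRequest]
    VALIDATION = VALID_XML WITH SCHEMA COLLECTION ShipRequestSchema;
GO

-- Integration testing and production: skip the extra parse.
ALTER MESSAGE TYPE [//Example/ValidatedShipRequest]
    VALIDATION = NONE;
```

VALIDATION = WELL_FORMED_XML is a middle ground that still parses each message but does not check it against a schema.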

 

Design Services

 

The last step in application design is designing the services that process Service Broker messages.  While this is a big job, I won’t spend a lot of time on it because most of the effort goes into the business logic that actually processes the message contents.  I’ll just point out a few things you should consider when designing your services:

·      Service location – should the service run as a stored procedure or as an external application?  If it’s a stored procedure should it be CLR or TSQL?  In many Service Broker applications, the service primarily does database stuff.  If the service primarily does database updates and doesn’t do a lot of processor intensive stuff, it should be a stored procedure.  If it does a lot of database IO but also does a significant amount of processing, it should be a CLR stored procedure.

Services that don’t do a lot of database work or do a lot of processor intensive work or do disk or network IO should generally run as external applications that connect to the database to get messages.  All an application has to do to process SSB messages is to open a database connection.  This means that a service that does a lot of processing or network IO can run on a different box and connect to the database server to get messages and do other TSQL stuff.  Most significant business logic can run this way.  Another common application is interfacing with web services.  While you can do some of this in SQL Server, the network overhead and XML processing overhead of Web Services make it attractive to do this processing on a commodity server rather than your very expensive database server.  If the external server goes down, all the transactions it had open roll back.  The messages go back on the queue so if there’s more than one server processing messages, everything continues without interruption.  If the queue starts filling up, you can hook more commodity servers to the network to handle the load.

·      Message Processing loop – almost all Service Broker services are built around the same message processing loop.  This loop is used in a number of examples so I won’t cover it extensively here.  Basically the loop has a RECEIVE command that receives one or more messages, processes the messages, SENDs any output messages, and then starts over.  Most of the examples have a RECEIVE TOP (1) at the top of the loop.  This makes for simple sample code but is not necessarily the most efficient thing to do.  Without the TOP (1) clause the RECEIVE command will return all the messages on the queue from a single conversation group.
 
Doing one receive command and getting back a bunch of messages is more efficient than receiving them one at a time so it’s worth considering this in your design.  The reason almost none of the samples show this is that they are simple request-reply dialogs where only one message is sent on a given dialog so leaving out the TOP (1) clause wouldn’t change the application behavior much.  If your application sends many messages in a row on the same dialog (a logging application for example), receiving many messages per RECEIVE statement will greatly improve efficiency.

The other bad thing that most samples do is to commit the transaction at the end of each loop.  Again, this will simplify your logic and if performance is adequate, this is the best design but writing the commit to the transaction log is often the ultimate limiting factor on performance so if you really need the best performance possible, you may want to only commit after a few trips through the processing loop.  Doing this will improve performance but may increase latency because no responses will be sent until the transaction commits.  In most asynchronous operations, latency isn’t too important so this is a good tradeoff.  Transaction rollback handling is tricky in this case because you have to go back and process the messages one at a time to get around the bad one.
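A processing loop that receives a batch instead of one message at a time might be sketched as follows.  The queue, table, and //Example/LogEntry message type names are hypothetical, and the sketch assumes log entry bodies are XML.  It still commits once per trip through the loop, which is the simple design discussed above.

```sql
-- Hypothetical objects for illustration only.
CREATE QUEUE LogQueue;
CREATE TABLE dbo.EventLog
(
    EventLogId         INT IDENTITY(1, 1) PRIMARY KEY,
    ConversationHandle UNIQUEIDENTIFIER NOT NULL,
    Body               XML NULL
);
GO

DECLARE @messages TABLE
(
    queuing_order       BIGINT,
    conversation_handle UNIQUEIDENTIFIER,
    message_type_name   SYSNAME,
    message_body        VARBINARY(MAX)
);
DECLARE @handle UNIQUEIDENTIFIER;

WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION;

    -- No TOP (1): returns all available messages for one conversation group.
    WAITFOR (
        RECEIVE queuing_order, conversation_handle, message_type_name, message_body
        FROM LogQueue
        INTO @messages
    ), TIMEOUT 5000;

    IF (@@ROWCOUNT = 0)
    BEGIN
        ROLLBACK TRANSACTION;   -- nothing arrived within five seconds
        BREAK;
    END;

    -- Process the whole batch with one set-based statement.
    INSERT INTO dbo.EventLog (ConversationHandle, Body)
    SELECT conversation_handle, CAST(message_body AS XML)
    FROM @messages
    WHERE message_type_name = N'//Example/LogEntry'
    ORDER BY queuing_order;

    -- End our side of any dialog that the other side ended or failed.
    WHILE EXISTS (SELECT * FROM @messages
                  WHERE message_type_name IN
                      (N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog',
                       N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'))
    BEGIN
        SELECT TOP (1) @handle = conversation_handle
        FROM @messages
        WHERE message_type_name IN
            (N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog',
             N'http://schemas.microsoft.com/SQL/ServiceBroker/Error');

        END CONVERSATION @handle;
        DELETE FROM @messages WHERE conversation_handle = @handle;
    END;

    COMMIT TRANSACTION;

    DELETE FROM @messages;      -- table variables are not cleared automatically
END;
```

Keep in mind that table variables are not affected by a rollback, so any error handling added to this loop has to clear @messages itself.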

·      State Handling – Read any SOA book and you will find that services should be stateless.  While this improves scale-out and performance, in reality, a lot of real business transactions involve many messages over a significant time so state has to be held somewhere.  If state isn’t held persistently, it can be lost which forces the whole business transaction to fail.  With Service Broker, dialogs and conversation groups are both persistent so they inherently maintain state.  Your application can use this fact along with the fact that the messages are in the database anyway to maintain state in a scalable, high performance manner.  This makes long-running business transactions easier to implement.

While there are many ways to design state handling, Service Broker applications have a special advantage.  All messages from any of the dialogs in a conversation group have the same conversation group ID.  Since a RECEIVE command only returns messages from a single conversation group, all the messages returned will have the same conversation group ID and if you designed your dialogs correctly will all be associated with the same business transaction.  The advantage of this is that if you store your application state in tables with the conversation group ID as the key, you can get the key to the state from any message received.  This means you can easily write a TSQL batch that will return both the messages in the queue and the state information required to process them so state handling is quick and easy.  Also, remember that only one thread can process messages from a particular conversation group at a time so if the conversation group ID is the state key, only one thread will be accessing the application state at a time so conflicting updates aren’t an issue.
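One way to write such a batch is sketched below.  It uses GET CONVERSATION GROUP to lock the next conversation group that has messages, then returns the state row and the messages for that group.  The queue and state table are hypothetical, and the state columns are only examples.

```sql
-- Hypothetical objects for illustration only.
CREATE QUEUE OrderStateQueue;
CREATE TABLE dbo.OrderState
(
    ConversationGroupId UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
    OrderId             INT              NOT NULL,
    CreditApproved      BIT              NOT NULL DEFAULT (0),
    InventoryReserved   BIT              NOT NULL DEFAULT (0)
);
GO

DECLARE @groupId UNIQUEIDENTIFIER;

BEGIN TRANSACTION;

-- Lock the next conversation group with messages waiting (if any).
WAITFOR (
    GET CONVERSATION GROUP @groupId FROM OrderStateQueue
), TIMEOUT 5000;

IF (@groupId IS NOT NULL)
BEGIN
    -- First result set: the application state for this business transaction.
    SELECT OrderId, CreditApproved, InventoryReserved
    FROM dbo.OrderState
    WHERE ConversationGroupId = @groupId;

    -- Second result set: every waiting message for the same group.
    RECEIVE conversation_handle, message_type_name, message_body
    FROM OrderStateQueue
    WHERE conversation_group_id = @groupId;
END;

-- The group stays locked until the transaction ends, so state updates made
-- before this point cannot conflict with another queue reader.
COMMIT TRANSACTION;
```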

·      Poison Messages – A poison message is a message that can never be processed correctly.  A simple example is an order header with an order number that already exists in the database.  When you try to insert the header into the database, the insert will fail with a unique constraint violation, the transaction will roll back and the message will be back on the queue.  No matter how many times you try, the insert will fail so the service will go into a loop processing the same message over and over.  This will cause the order entry to hang and can severely impact database performance.  To keep a poison message from bringing your server to its knees, Service Broker will disable the queue if there are five rollbacks in a row.  This allows the rest of the server to continue but the order entry application is dead because the queue is disabled. 

The way to avoid this is to only roll back the transaction that did the RECEIVE if there’s some hope that trying again will make the transaction succeed next time.  If your transaction fails because of a lock timeout, being selected as a deadlock victim, low memory, etc., rolling back the transaction and trying again makes sense but if the error is permanent, you need to handle it in your application.  The most common way of handling it is ending the dialog with an error and logging the error to an error table.  The method you choose to handle poison messages depends on your application requirements but it’s important to include poison message handling in your design.
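A rough sketch of this kind of handling for the order header example follows.  The queue, tables, and the 50001 error code are hypothetical, and the list of errors treated as transient (1205 for a deadlock victim, 1222 for a lock timeout) is an assumption you would extend for your own application.  The sketch also ignores Service Broker control messages to stay short.

```sql
-- Hypothetical objects for illustration only.
CREATE QUEUE OrderHeaderQueue;
CREATE TABLE dbo.OrderHeader (OrderId INT NOT NULL PRIMARY KEY);
CREATE TABLE dbo.MessageErrors
(
    ConversationHandle UNIQUEIDENTIFIER NOT NULL,
    ErrorNumber        INT              NOT NULL,
    ErrorText          NVARCHAR(3000)   NULL,
    MessageBody        VARBINARY(MAX)   NULL,
    LoggedAt           DATETIME         NOT NULL DEFAULT (GETDATE())
);
GO

DECLARE @handle UNIQUEIDENTIFIER, @body VARBINARY(MAX), @xml XML,
        @errorNumber INT, @errorText NVARCHAR(3000);

BEGIN TRANSACTION;

WAITFOR (
    RECEIVE TOP (1) @handle = conversation_handle, @body = message_body
    FROM OrderHeaderQueue
), TIMEOUT 5000;

IF (@handle IS NULL)
    ROLLBACK TRANSACTION;               -- no message arrived
ELSE
BEGIN
    BEGIN TRY
        SET @xml = CAST(@body AS XML);
        INSERT INTO dbo.OrderHeader (OrderId)
        SELECT @xml.value('(/Order/OrderId)[1]', 'INT');
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        SELECT @errorNumber = ERROR_NUMBER(), @errorText = ERROR_MESSAGE();

        IF (XACT_STATE() <> 1 OR @errorNumber IN (1205, 1222))
        BEGIN
            -- Transient error, or a transaction that can no longer commit:
            -- roll back so the message goes back on the queue for a retry.
            -- (A permanent error that dooms the transaction would still loop
            -- here; a real service needs extra handling for that case.)
            IF (XACT_STATE() <> 0) ROLLBACK TRANSACTION;
        END
        ELSE
        BEGIN
            -- Permanent error: keep the RECEIVE, log the message, and end
            -- the dialog with an error instead of rolling back.
            INSERT INTO dbo.MessageErrors
                (ConversationHandle, ErrorNumber, ErrorText, MessageBody)
            VALUES (@handle, @errorNumber, @errorText, @body);

            END CONVERSATION @handle
                WITH ERROR = 50001 DESCRIPTION = @errorText;

            COMMIT TRANSACTION;
        END;
    END CATCH;
END;
```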

·      Message Priority – I’ll say up front that Service Broker doesn’t have a built-in way to enforce message priority.  This leads to a lot of angst among developers who are used to using message priority to ensure that certain messages are processed first.  My personal experience is that most people don’t really need absolute message priority in their applications.  They just need to be able to ensure that high priority messages don’t get queued behind a bunch of low priority messages.  The easiest way to make that happen in Service Broker is with a high priority and low priority queue.  You can use activation to assign enough queue readers to the high priority queue to handle the load and assign a single queue reader to the low priority queue.  This means that low priority messages are processed in parallel with high priority messages but high priority messages are never blocked by low priority messages.  If you need more control over priority than this approach gives you, there are some other approaches discussed here:  http://blogs.msdn.com/rogerwolterblog/archive/2006/03/11/549730.aspx and http://blogs.msdn.com/rogerwolterblog/archive/2006/03/17/554134.aspx
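The two-queue arrangement could be set up along the following lines.  The queue, service, and procedure names are hypothetical, the reader counts (8 and 1) are arbitrary examples, and the procedure bodies are stubs that show only the receive loop.

```sql
-- Hypothetical queues and services for illustration only.
CREATE QUEUE HighPriorityQueue;
CREATE QUEUE LowPriorityQueue;
CREATE SERVICE [//Example/HighPriorityService] ON QUEUE HighPriorityQueue ([DEFAULT]);
CREATE SERVICE [//Example/LowPriorityService]  ON QUEUE LowPriorityQueue  ([DEFAULT]);
GO

CREATE PROCEDURE dbo.ProcessHighPriority
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @handle UNIQUEIDENTIFIER, @body VARBINARY(MAX);
    WHILE (1 = 1)
    BEGIN
        SET @handle = NULL;
        BEGIN TRANSACTION;
        WAITFOR (
            RECEIVE TOP (1) @handle = conversation_handle, @body = message_body
            FROM HighPriorityQueue
        ), TIMEOUT 1000;
        IF (@handle IS NULL)
        BEGIN
            ROLLBACK TRANSACTION;
            BREAK;              -- queue is empty, let the reader exit
        END;
        -- Business logic for one high priority message would go here.
        COMMIT TRANSACTION;
    END;
END;
GO

CREATE PROCEDURE dbo.ProcessLowPriority
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @handle UNIQUEIDENTIFIER, @body VARBINARY(MAX);
    WHILE (1 = 1)
    BEGIN
        SET @handle = NULL;
        BEGIN TRANSACTION;
        WAITFOR (
            RECEIVE TOP (1) @handle = conversation_handle, @body = message_body
            FROM LowPriorityQueue
        ), TIMEOUT 1000;
        IF (@handle IS NULL)
        BEGIN
            ROLLBACK TRANSACTION;
            BREAK;
        END;
        -- Business logic for one low priority message would go here.
        COMMIT TRANSACTION;
    END;
END;
GO

-- Many readers for the high priority queue, a single reader for the other.
ALTER QUEUE HighPriorityQueue WITH ACTIVATION
    (STATUS = ON, PROCEDURE_NAME = dbo.ProcessHighPriority,
     MAX_QUEUE_READERS = 8, EXECUTE AS OWNER);

ALTER QUEUE LowPriorityQueue WITH ACTIVATION
    (STATUS = ON, PROCEDURE_NAME = dbo.ProcessLowPriority,
     MAX_QUEUE_READERS = 1, EXECUTE AS OWNER);
```

The sender picks the priority simply by choosing which service it begins the dialog with.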

·      Compensating Transactions – Service Broker services process received messages in a different transaction and possibly at a much later time than the service that sent the message.  A business transaction can include dozens of database transactions.  For these reasons, you can’t just roll back a business transaction if there’s an error.  Even if you could, undoing the effects of a business transaction generally involves much more than just reverting the database back to a previous state.  To cancel an order for example, you may have to transfer the item from shipping back to inventory (or even pull it off a truck), cancel credit card charges, send out a cancellation notice, etc.  Your service design may have to include provisions for compensating transactions to undo the effects of an activity that errors or is cancelled.  There’s a more complete discussion of compensating transactions here:  http://blogs.msdn.com/rogerwolterblog/archive/2006/05/24/606184.aspx

 

Next

 

So far we’ve talked about what kinds of applications make good Service Broker applications and how to design a Service Broker application.  In the next installment, we’ll discuss a few infrastructure and deployment considerations.