Architecting Service Broker Applications

 

I signed up to do a presentation on Architecting Service Broker Applications so I thought I would write up my thinking here to see if I got any useful feedback.  Pretty much all the presentations, papers, and books I have seen or written talk about how to build a Service Broker (SSB) application.  In this presentation, I’m trying to go up a level to talk about how to make the architectural choices required for an SSB application.  This is based on a bunch of thinking I have done over the past four years and on helping quite a few people design their SSB solutions.  Let me know what you think.

 

I wanted to be an architect when I was a kid (I think this came somewhere between Astronaut and Aerospace Engineer).  This ambition was probably because our neighbor was an architect and when we did neighborhood projects, he used drawings and a transit while the rest of us used hammers and shovels. Now that I’m a software architect, I like to think about software in real architecture terms.  There are a lot of parallels between designing a building and designing an application.

 

The art and science of design

 

When you design a building, the land it’s going to be built on is usually a given.  You might start with a building design and then go looking for a place to build it but this is unusual.  The land imposes constraints on your design.  The size of the land determines the maximum dimensions of the foundation.  The soil conditions (or zoning laws) may limit the maximum height.  The slope may determine what kind of building you build.  The surrounding buildings will influence the design and materials used.  In the case of an SSB application, the land would be Windows and SQL Server.  While SSB is a database feature which can be accessed in a variety of ways from many different platforms (basically anything that can connect to SQL Server), you can’t build a Service Broker without SQL Server and you can’t run SQL Server on anything but Windows.

 

The biggest constraint on a building design is the customer requirements.  Does he need a place to live or an office building?  Does he need two bathrooms or 200?  Is the customer a bank or a steel mill?  No architect would start construction before the critical decisions are made.  Some decisions can be deferred – the color of paint, the furniture, etc. but all the decisions necessary for each phase of construction must be made and signed off on before construction starts.  If a customer decides he really needs to build a three bedroom split entry house after 50 floors of steel has been completed on his original request for an office building, it’s going to cost him a lot of money.  There’s a perception that software is flexible so major changes in requirements can be accommodated at any point.  While the cost of changing software requirements might not be as big as changing the design of a half-built building, there are costs and changes hurt so getting the requirements right is vital.

 

The next step in building design is selecting the materials and tools to use.  In some cases, there’s quite a bit of freedom in these choices but if you’re building a 30 story office tower, framing it with pine 2X4’s is not an option.  The customer’s requirements may also limit your options.  If the customer loves brick, you may not be able to use adobe – even if it’s the best material for the job.  The ultimate constraint is often the customer’s budget.  Some decisions the customer will make and others he will leave to the architect’s best judgment.  He may care passionately about whether the roof is wood or tile but leave the choice of copper or PVC pipe up to the architect.  There are other times when the customer’s requirements are either unwise or impossible so it’s the architect’s responsibility to change the customer’s mind.  For example, no matter how much the customer likes lead pipes, you can’t let him have them.  The same kinds of choices and tradeoffs apply to designing a Service Broker application.  You first have to decide whether Service Broker should be used at all.  While it’s the best thing since sliced bread, Service Broker isn’t the answer to everything.  Service Broker has a large number of features and options.  You have to use a combination of the customer’s requirements, Service Broker capabilities, design constraints, and your own judgment to determine which features you should use and how you should use them.

 

Once the constraints, requirements, and materials available have been determined, the architect can design a building that satisfies the requirements, fits the constraints including the time and resources available, and satisfies the architect’s and the customer’s aesthetic sense and professional pride.  A professional architect won’t design an ugly eyesore even if that’s exactly what the customer wants because it will reflect badly on his reputation and perceived competence.  Similarly, a software architect should never design an application that won’t work just because the customer demands it.  If what the customer wants won’t work, it’s the architect’s job to tell him that and if that means he finds someone else to design the application, at least your name won’t be associated with a software disaster.

 

Once the architect has a design, his responsibilities don’t end.  The architect must closely monitor construction, be ready to make design changes as circumstances change, and be responsible for ensuring that the building satisfies the design criteria.  Similarly, a software architect can’t throw a design over the wall to the developers.  The software architect must shepherd the project through implementation, test, and deployment.

 

WOW, that was a lot more philosophical than I intended but I want to use this analogy as a framework for explaining SSB application architecture.  Let me know if it was helpful or too over the top.

 

When is Service Broker the right material?

 

Service Broker is a platform for building loosely coupled, reliable, distributed database applications.  It is built into all editions of SQL Server 2005 so it can be used in any SQL Server 2005 application.  If you would like more information about Service Broker, you might try: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sqlsvcbroker.asp   If you want a lot more information I recommend:   http://www.amazon.com/Rational-Server-Service-Broker-Guides/dp/1932577270/sr=1-1/qid=1158009357/ref=pd_bbs_1/103-0049023-7244631?ie=UTF8&s=books

 

While I like to think all applications are potential Service Broker applications, the reality is only most of them are.  The way an architect decides whether to use a building material is by matching its characteristics against the requirements.  I’ll try to give you some guidance on how to do this for Service Broker.

 

Queues

 

One of the fundamental features of Service Broker is a queue as a native database object.  Most large database applications I have worked with use one or more tables as queues.  An application puts something it doesn’t want to deal with right now into a table and at some point either the original application or another application reads the queue and handles what’s in it.  A good example of this is a stock trading application.  The trades have to happen with sub-second response time or money can be lost and SEC rules violated but all the work to complete the trade – transferring the shares, exchanging money, billing the clients, paying commissions, etc. – can happen later.  This “back office” work can be processed by putting the required information in a queue.  The trading application is then free to handle the next trade and the settlement will happen when the system has some spare cycles.  It’s critical that once the Trade transaction is committed, the settlement information is not lost because the trade isn’t complete until settlement is complete.  That’s why the queue has to be in a database.  If the settlement information was put into a memory queue or a file, a system crash could result in a lost trade.  The queue also must be transactional because the settlement must either be completed or rolled back and started over.  A real settlement system would probably use several queues so that each part of the settlement activity can proceed at its own pace and in parallel.
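
To make the "transactional queue" requirement concrete, here is a minimal sketch in Python using the standard sqlite3 module.  It is not Service Broker code; it is the table-as-queue pattern described above, and the table name settlement_queue and the helper functions are made up for illustration.  The point it shows is that if the handler fails, the receive rolls back and the message is still on the queue.

```python
import sqlite3

def open_queue():
    """Create a database holding a hypothetical settlement_queue table.

    ":memory:" keeps the demo self-contained; a real queue would live in a
    database file so that it survives a crash.
    """
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
    conn.execute(
        "CREATE TABLE settlement_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
    )
    return conn

def send(conn, body):
    """Put a message on the queue."""
    conn.execute("INSERT INTO settlement_queue (body) VALUES (?)", (body,))

def receive(conn):
    """Remove and return the oldest message, or None if the queue is empty."""
    row = conn.execute(
        "SELECT id, body FROM settlement_queue ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM settlement_queue WHERE id = ?", (row[0],))
    return row[1]

def process_one(conn, handler):
    """Receive and handle one message inside a single transaction."""
    conn.execute("BEGIN")
    try:
        body = receive(conn)
        if body is not None:
            handler(body)
        conn.execute("COMMIT")
        return body
    except Exception:
        # The handler failed: undo the delete so the message is not lost.
        conn.execute("ROLLBACK")
        raise
```

This sketch ignores everything that makes tables hard to use as queues in practice, such as concurrent readers, locking, and poison messages.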

 

So queues in the database are a good thing but why not just use tables as queues instead of inventing a new database object?  The answer is that it’s hard to use tables as queues.  Concurrency, lock escalation, deadlocks, poison messages, etc. are all difficult problems to resolve.  The Service Broker team spent years coming up with a reliable, high-performance queue so that you can just call CREATE QUEUE to take advantage of all that work in your application.  The logic to put a message on a queue, pull it off, and delete it when you’re done with it has been incorporated into new TSQL commands (SEND and RECEIVE) so you don’t have to write that logic either.

 

SSB queues can be used for just about any asynchronous activity a database application wants to do.  An order-entry application might want to do shipping and billing asynchronously to improve order response times.  A trigger that needs to do a significant amount of processing might use SSB to do the processing asynchronously so updates to the original table aren’t impacted.  A stored procedure might need to call several other stored procedures in parallel.  This list goes on. 

 

The asynchronous queue pattern is applicable to a tremendous number of applications.  Just about any large, scalable application uses queues somewhere.  Windows does almost all its IO through queues.  IIS receives http messages on queues.  SQL Server TSQL commands are all executed from a queue.  The obvious question here is if Service Broker queues are so great, why don’t these applications use them?  The short answer is that Service Broker queues are persistent.  That means that putting messages on an SSB queue or removing and processing them involves a write to the SQL Server transaction log.  This is a very good thing if you’re doing trade settlement or billing and you want to make sure that the trade or order is not lost if the power goes off but if the power goes off on SQL Server or IIS, the incoming connections are dropped and the work in progress disappears.  At this point the messages in the queue are worthless because the applications waiting for a response are gone and so persisting the queue is a waste of resources and an unnecessary slowdown.  It’s cool to think about a reliable query mode where your queries are persisted so the answer is returned even if the client or the server crashes in the meantime (and several customers use Service Broker to do that) but that’s not possible for the thousands of existing applications.

 

The sweet spot for using SSB queues is for database applications that need to do things reliably and asynchronously.  If the action must happen synchronously then a normal function call or COM or RPC is the right technology.  If the action needs to be started asynchronously but it’s OK for it to disappear if the application dies then some kind of in-memory queue will perform better.  Service Broker is a SQL Server feature so it probably doesn’t make sense to use it unless there’s a SQL Server database around.  There are applications where reliable, persistent queues are so important that adding a SQL Server database just to do the queuing is justified but in general SSB is a better fit for database applications.  It’s also worth noting that because SSB is accessed through TSQL commands, any platform, language, and application that can connect to a SQL Server database can send messages to or receive messages from a Service Broker queue.  This makes it easy to integrate applications on many platforms reliably, transactionally, and asynchronously.

 

Dialogs

 

One of the most distinctive features of Service Broker is the dialog.  A dialog is a reliable, ordered, persistent, bi-directional stream of messages.  In most messaging/queuing systems the messaging primitive is the message – each message is independent and not related to other messages at a messaging level.  If the application wants to establish relationships between messages – linking a request to a response for example – the application is responsible for doing the tracking.

 

In SSB, the dialog is the messaging primitive.  Messages sent on a dialog are processed in the order they were sent – even if they were sent in different transactions from different applications.  Dialogs are bi-directional so request-response relationships are automatically tracked.  Dialogs are persistent so the dialog remains even when both ends of the dialog go away, the database is shut down, the database is moved to another server, etc.  This means you can use dialogs to implement long-running conversational business transactions that last for months or years.  For example, processing a purchase order typically involves a long-running exchange of messages between the purchaser and supplier as prices are negotiated, delivery dates agreed upon, status communicated, delivery confirmed and payment exchanged.  This whole exchange can be a single Service Broker dialog that may last for months.
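
As a rough illustration of what "processed in the order they were sent" means for the receiving end of a dialog, here is a toy model in Python.  The class name DialogReceiver and the use of a per-dialog sequence number are assumptions of this sketch, not a description of how Service Broker is implemented; the sketch only shows that messages arriving out of order or more than once can still be handed to the application exactly once and in send order.

```python
class DialogReceiver:
    """Toy model of one end of a dialog: releases messages in send order."""

    def __init__(self):
        self.next_seq = 0      # sequence number the application expects next
        self.pending = {}      # messages that arrived ahead of their turn
        self.delivered = []    # what the application has been given, in order

    def on_arrival(self, seq, body):
        """Accept a message from the network, in whatever order it shows up."""
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate of something already delivered or buffered
        self.pending[seq] = body
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

For example, if the price quote (sequence 1) arrives before the purchase order (sequence 0), nothing is delivered until the purchase order shows up, and then both are released in order.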

 

Conversation Groups

 

Dialogs exist in Conversation Groups.  A conversation group is the unit of locking for a Service Broker queue.  Every time a RECEIVE or SEND is executed, the conversation group that contains the dialog used for the RECEIVE or SEND is locked.  One of the more difficult problems with asynchronous, queued applications is that if related messages are received by different application threads, the application’s state can get corrupted because of simultaneous changes or changes processed out of order.  For example, an order line might be processed before its order header, causing the order line to be rejected.  In many cases, this can only be resolved by making the application single-threaded which obviously limits scalability and performance.  With Service Broker, the application puts all the dialogs related to a given business transaction in a single conversation group so that only one thread will be processing that business transaction at one time.  For example, an order entry application would put all the dialogs associated with a given order into the same conversation group so that when hundreds of threads are processing hundreds of order messages simultaneously, the messages for any given order are only processed on one thread at a time.  This allows you to write a single-threaded application and let Service Broker manage running it on hundreds of threads simultaneously.
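
The locking rule can be modeled in a few lines of Python.  This is a simplified, single-process sketch with made-up names (GroupLockedQueue, commit); it is not how Service Broker is implemented.  A receive hands a reader the waiting messages of one unlocked group and locks that group until the reader commits, so two readers never work on the same order at the same time.

```python
from collections import deque

class GroupLockedQueue:
    """Toy queue in which a receive locks a whole conversation group."""

    def __init__(self):
        self.messages = deque()  # (group_id, body) pairs in arrival order
        self.locked = set()      # groups currently held by some reader

    def send(self, group_id, body):
        self.messages.append((group_id, body))

    def receive(self):
        """Lock the first unlocked group that has messages and return them.

        Returns (group_id, [bodies]) or None when every waiting message
        belongs to a group that another reader has locked.
        """
        for group_id, _ in self.messages:
            if group_id not in self.locked:
                break
        else:
            return None
        self.locked.add(group_id)
        batch = [b for g, b in self.messages if g == group_id]
        self.messages = deque((g, b) for g, b in self.messages if g != group_id)
        return group_id, batch

    def commit(self, group_id):
        """End the reader's transaction and release the group lock."""
        self.locked.discard(group_id)
```

In this model, if one reader holds order 1 and a new line item for order 1 arrives, a second reader that calls receive gets a different order (or nothing), and the new line item waits until the first reader commits.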

 

A multi-reader queue is probably the most efficient load-balancing system available.  The queue readers, whether they are on the database server or on remote servers, open a connection to the database and start receiving and processing messages.  After each message is processed the queue reader application receives another one.  In this way, each queue reader receives as much work as it is able to process.  If one of the readers slows down for some reason, it just receives less often and other readers are free to pick up the slack.  If one of the readers shuts down or crashes, the receive transaction for the message it was processing rolls back and the message appears on the queue again for another reader to handle.  If the queue starts growing because the readers can’t keep up, you can start up another one and it will start processing messages.  There’s no reconfiguration necessary, just start and stop readers as required.  Conversation Group locking makes this all possible.  The sending application doesn’t know or care how many queue readers there are or where they are running.

 

Activation

 

One of the fundamental issues with asynchronous, queued applications is that the RECEIVE command pulls messages off the queue for processing.  This means that the receiving application has to be running when a message arrives on the queue.  There are several approaches to this such as receiving from a Windows service which always runs or from a startup stored procedure that starts when the database starts.  These are good solutions when messages arrive at a constant rate but in many cases the receiving application is wasting resources when there are no messages on the queue and getting behind when the message arrival rate peaks.

 

Service Broker offers a better alternative called activation.  To use activation, you associate a queue with a stored procedure that knows how to handle messages in that queue.  When a message arrives on the queue, the SSB logic that handles commits checks to see if there is a copy of the stored procedure running.  If there is a copy running then the commit continues.  If there isn’t, the activation logic starts one.  This is better than the triggers that some messaging systems offer because a new copy is only started when it is needed.  Activation assumes that the stored procedure will keep reading messages until the queue is empty while triggers will start a new reader for every message.  If 1000 messages arrive on a queue per second, activation will start one reader while triggers would start 1000.  Activation also looks at whether the queue is growing because messages are arriving faster than the stored procedure is processing them.  If the queue is growing, activation will start new copies of the stored procedure until the queue stops growing.  When the queue is empty, the stored procedures should exit because there is no work to do.  In this way, activation assures that there are enough resources dedicated to processing messages on the queue but no more resources than are needed.
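
The activation rules above can be played out in a small discrete-time simulation.  This is only a sketch of the decision logic as described in this post, not the real activation algorithm; the function name, the per_reader_rate parameter, and the tick-based timing are all assumptions made for illustration, and max_readers is a cap on how many copies may run at once.

```python
def simulate_activation(arrivals, per_reader_rate=2, max_readers=4):
    """Play the activation rules forward one tick at a time.

    arrivals: number of messages arriving in each tick.
    Returns a list of (active_readers, queue_length_after_tick) tuples.
    """
    queue_len = 0
    prev_len = 0   # queue length at the end of the previous tick
    readers = 0
    history = []
    for arriving in arrivals:
        queue_len += arriving
        if queue_len > 0 and readers == 0:
            readers = 1                      # nobody is reading: start one copy
        elif queue_len > prev_len and readers < max_readers:
            readers += 1                     # queue is growing: start another copy
        active = readers
        queue_len = max(0, queue_len - active * per_reader_rate)
        if queue_len == 0:
            readers = 0                      # queue drained: readers exit
        history.append((active, queue_len))
        prev_len = queue_len
    return history
```

Calling simulate_activation([1, 10, 10, 0, 0, 0]) starts one reader for the first message, lets it exit when the queue empties, starts one again when the burst arrives, and adds a second reader only while the queue is still growing.  Compare that with a trigger-per-message design, which would start 21 readers for the same 21 messages.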

 

Activation has another useful side benefit.  You can execute a stored procedure by sending a message to a queue.  This stored procedure runs in the background in a different execution, transaction, and security context than the stored procedure that sent the message.  This is what enables asynchronous triggers and stored procedures that start up multiple other stored procedures in parallel.  Because the activated procedure runs in a different security context, it can have more or fewer privileges than the caller.  Because the activated procedure runs in a different transaction, deadlocks or failures don’t affect the original transaction.  For example, I worked with a customer who inserted an audit record into a log table at the end of every transaction.  In too many cases, this insert would cause a deadlock or timeout and the whole transaction would be rolled back – leading to user frustration.  They changed their auditing logic to SEND a message to an SSB queue and now a problem with the audit table doesn’t cause the original transaction to fail.  Another customer wrote a simple stored procedure that receives a message from a queue, calls EXEC on the contents of the message, and sends the results back to the originator on the same dialog.  They can now run TSQL commands in the background on any system in their data center by sending a message to it.  SSB security makes this more secure than allowing an administrator to log on to the server and SSB reliable delivery means the commands and responses are never lost.

 

Reliable Messaging

 

We’ve already seen the value of Service Broker in designing asynchronous, queued applications.  Dialogs, Conversation Group locking, and Activation make Service Broker a unique platform for building loosely coupled database services.  Once you understand all the powerful applications you can write with SSB queues, it won’t take long for you to come up with application ideas that require putting messages on a queue in another database.  If the other database runs on a different server, SSB reliability assurances require that the message is sent reliably.  This means that the remote database acknowledges receipt of the message and the local Service Broker keeps sending it until an acknowledgement is received.  In this way an application can SEND a message to a remote queue and have the same reliability assurances as if it was sent to a local queue.  In fact, the application doesn’t know or care whether the message it is sending will be processed locally or remotely.  Writing distributed, queued, asynchronous Service Broker applications is no different than writing local applications.  This means that you can start with a local application and make it distributed as processing load or business requirements change.
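
The "keep sending until an acknowledgement is received" idea looks roughly like the following Python sketch.  It is a toy model, not Service Broker’s wire protocol: the Receiver class, the send_reliably function, and the link_up callback that stands in for an unreliable network are all hypothetical.  Note that the receiver has to discard duplicates, because a lost acknowledgement makes the sender transmit a message that already arrived.

```python
class Receiver:
    """Toy remote queue that acknowledges messages and ignores duplicates."""

    def __init__(self):
        self.queue = []
        self.seen = set()

    def accept(self, seq, body):
        if seq not in self.seen:   # a retransmission must not be enqueued twice
            self.seen.add(seq)
            self.queue.append(body)
        return seq                 # the acknowledgement

def send_reliably(messages, receiver, link_up, max_attempts=10):
    """Send each message until it is acknowledged; return total transmissions.

    link_up() returns True when a single one-way transmission gets through.
    """
    attempts = 0
    for seq, body in enumerate(messages):
        for _ in range(max_attempts):
            attempts += 1
            if not link_up():
                continue               # message lost on the way out
            ack = receiver.accept(seq, body)
            if link_up() and ack == seq:
                break                  # acknowledgement made it back
        else:
            raise RuntimeError(f"message {seq} was never acknowledged")
    return attempts
```

The sending application in this model just hands over its messages; the retries and duplicate handling happen underneath it, which is the same division of labor the paragraph above describes.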

 

Unfortunately, including reliable messaging in Service Broker has led to a lot of confusion.  As soon as people see reliable messaging they think MSMQ or MQ Series.  While SSB has a lot of the same capabilities, it is primarily a platform for building distributed database applications.  For example, it’s trivially easy for a stored procedure to reliably and asynchronously start a stored procedure in a remote database with Service Broker but doing the same thing using MSMQ would be very difficult.  I have some more thoughts on these issues here http://blogs.msdn.com/rogerwolterblog/archive/2006/02/28/540803.aspx and here http://www.architecturejournal.net/2006/issue8/F1_Reliability/

 

Because Service Broker communicates reliably between database queues, all the reliability and fault tolerance built into SQL Server automatically applies to Service Broker messages.  Whatever measures your organization takes to ensure your database is available – clusters, SAN’s, transaction logs, backups, database mirroring, etc. – also work to keep SSB messages available.  For example, if you are using Database Mirroring for high availability, when your database fails over to the secondary, all the messages fail over with it and the queues remain transactionally consistent with the rest of the data.  In addition, Service Broker understands mirroring so when the database fails over to the secondary, all the other Service Brokers it is communicating with immediately notice the change and start communicating with the secondary database.

 

What’s Next?

 

Well that’s enough for now.  I’ll post this to see if I get any reaction and write up the next installment over the weekend.  Now that we’ve seen what the Service Broker “material” characteristics are, next we’ll delve into the decisions you have to make when designing a Service Broker application.

 

Part 2:  http://blogs.msdn.com/rogerwolterblog/archive/2006/09/15/Roger_Wolter.aspx

 

Part 3:  http://blogs.msdn.com/rogerwolterblog/archive/2006/09/16/758387.aspx