Developing a BizTalk Server solution can be challenging, and especially complex for those who are unfamiliar with it. Developing with BizTalk Server, like any software development effort is like playing chess. There are some great opening moves in chess, and there are some great patterns out there to start a solution. Besides being an outstanding communications tool, design patterns help make the design process faster. This allows solution providers to take the time to concentrate on the business implementation. More importantly, patterns help formalize the design to make it reusable. Reusability not only applies to the components themselves, but also to the stages the design must go through to morph itself into the final solution. The ability to apply a patterned repeatable solution is worth the little time spent learning formal patterns, or to even formalize your own. This entry looks at how architectural design considerations associated to BizTalk Server regarding messaging and orchestration can be applied using patterns, across all industries. The aim is to provide a technical deep-dive using BizTalk Server anchored examples to demonstrate best practices and patterns regarding parallel processing, and correlation.
The blog entry http://blogs.msdn.com/b/quocbui/archive/2009/10/16/biztalk-patterns-the-parallel-shape.aspx examines a classic example of parallel execution. It is consists of de-batching a large message file into smaller files, apply child business process executions on each of the smaller files, and then aggregating all the results into a single large result. Each subset (small file) represents an independent unit of work that does not rely or relate to other subsets, except that it belongs to the same batch (large message). Eventually each child process (the independent unit of work) will return a result message that will be collected with other results, into a collated response message to be leveraged for further processing.
This pattern is a relatively common requirement by clients and systems that interact with BizTalk Server, where a batch is sent in and they require a batch response with the success/failure result for each message. With this pattern one can still leverage the multithreaded nature of BizTalk's batch disassembly while still conforming to requirement of the client.
Let’s closely examine how this example works. First, it uses several capabilities of BizTalk that are natively provided – some that are well documented, and some that are not. A relatively under-utilized feature - that BizTalk Server natively provides, is the ability to de-batch a message (file splitting) from its Receive pipeline. BizTalk Server can take a large message with any number of child records (which are not dependent with one another, except for the fact that they all belong to the same batch), and split out the child records to become independent messages for a child business process to act on. This child business process can be anything – such as a BizTalk Orchestration function, or a Windows Workflow.
To learn how to split a large file (also known as as de-batching) into smaller files, please refer to the BizTalk Server 2006 samples – Split File Pipeline example. Even though the sample is based on BizTalk Server 2006 (R1) – it remains relevant for BizTalk Server 2006 R2, BizTalk Server 2009, and BizTalk Server 2010. Note: There are several examples on how to de-batch messages, such as de-batching from SQL Server, as communicated on Rahul Garg’s blog on de-batching.
After the large file has been split into smaller child record messages, child business processes can simultaneously work on each of these smaller files, which will eventually create child result messages. These child result messages can then be picked up by BizTalk Server and published to the BizTalk MessageBox database.
To aggregate the results back to the same batch, correlation has to be used. BizTalk Orchestration provides the ability to correlate. Aggregation with Orchestration is relatively easy to develop. Using Orchestration, however, may require more hardware resources (memory, CPU) than necessary. One Orchestration instantiation will be required per batch (so if there’s 1000 large batch messages, then there will be 1000 Orchestration instances). Orchestrations linger until the last result message has arrived (or it can time out from a planned exception handling). There is an alternative way to aggregate, without Orchestration, but correlation is still necessary and required to collate all the results into the same batch. The trick is to correlate the result messages without Orchestration.
Paolo Salvatori, in his first blog entry http://blogs.msdn.com/paolos/archive/2008/07/21/msdn-blogs.aspx describes how to correlate without Orchestration, by using a two-way receive port. This is important to note, because the two-way receive port provides key information that can be leveraged, by promoting them to the context property bag.
These key properties are
Correlation without Orchestration is as easy as just promoting the right information to the context property bag. Of course, this information needs to be persisted, maintained, and returned with the result messages. However, this method only works with a two-way receive port. What about from a one-way receive port, such as from a file directory or MSMQ?
That is still possible because the de-batching (file splitting) mechanism of BizTalk Server provides a different set of compound key properties that can be used.
These are
The final step is to aggregate all the result messages and ensure that they’re correctly collated into the same batch that their original source messages from from. The UID that was used for correlation, is the parent identifier that can be used to group these result messages into. An external database table can be used to temporarily store the incoming messages, and a single SQL Stored Procedure can be used to extrapolate these messages into a single output file.
This method of using a SQL aggregator was communicated in http://geekswithblogs.net/asmith/archive/2005/06/06/42281.aspx
Note that the XSLT can be eventually replaced by a custom pipeline component or directly by an envelope (in this case the send pipeline needs to contain the Xml Assembler component).
Acknowledgements to: Thiago Almeida (Datacom), Werner Van Huffel (Microsoft), Ji Young Lee (Microsoft), Paolo Salvatori (Microsoft)