BizTalk Core Engine's WebLog

  • Hidden Gem in BizTalk 2006 R2

    Howdy everyone,

       So first off, I should probably say that my posts might be even more few and far between (not that they are all that frequent as it stands :). A longtime co-worker, Kartik, will probably be manning the ship on the blog, as I have decided to expand my skills a bit and work on SQL Server Analysis Services. I figured I had been helping people build the processes which actually execute their business logic; now I wanted a bit more experience with the tools they use to analyze what exactly they have been doing at their business. This blog, though, will remain a BizTalk blog since it was never meant to be "my" blog ... just look at the name. Okay, enough of that. :)

     As a "parting gift", I managed to convince the team to let me add a small enhancement to persistence as long as I did not push changes to the UI. Ostensibly, this was really meant for PSS, but personally, I would rather have everyone else fix / figure out their problems without having to call PSS. So don't tell them I told you, but here you go :)

    As part of default dehydration now, the orchestration engine also persists, in the clear, the identifier of the shape on which it was blocked. I have seen many customers who turn on shape-level tracking for orchestrations just so that if things get "hung up" and lots of orchestrations end up in the dehydrated state, they can open the orchestration in the debugger view and see what it has executed and what it is waiting on. Using the orchestration debugger for this is tremendous overkill: it adds overhead to processing, and now you have to manage the potential buildup of this information in the tracking db. All just so you can know what an orchestration is blocked on. With R2, you have the potential to figure this out yourself. :)

    Unfortunately, we did not get this into the UI. It would have been really cool to view it in the MMC, where dehydrated orchestrations would show the name of the blocking shape. But we gave you a start, and now if we ever do add the UI tools, we won't have to change the underlying db schema or any of the engine. So here are the details.

    In the Instances table in the messagebox there is a new column called nvcLastAction. For dehydrated instances, this column contains a GUID. Sorry, all I had access to in the engine was a GUID. However, in the DTA database you will find a table called dta_ServiceSymbols, which contains some XML for each deployed service, regardless of whether you have any tracking turned on. Some of our tools use this XML to translate the GUIDs into shape names. It is relatively easy to look at this and build a little tool which will map the GUID to the actual shape name. Sorry I have not written the tool ... Remember to be very careful when looking directly in the messagebox. Be careful of locking tables and causing all sorts of blocking issues. Use NOLOCK hints and never, ever, ever update / delete anything. If you are at all unsure about the rules around this, please see http://blogs.msdn.com/biztalk_core_engine/archive/2004/09/20/231974.aspx. That post is specific to 2004 and mostly obsolete for 2006, since the new enhancements to the MMC make these direct queries unnecessary, but the rules for querying are still valid. Be very, very, very careful. If you change things or cause problems by introducing blocking, your chances of support are pretty much nil.
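    To give an idea of the little mapping tool described above, here is a minimal Python sketch. It assumes you have already pulled the nvcLastAction value from Instances (with NOLOCK) and the symbol XML from dta_ServiceSymbols. The element and attribute names (ShapeInfo, shapeID, shapeText) and the sample XML are hypothetical placeholders, since the real layout of that XML is not described here; inspect a row in your own DTA database and pass in the real names.

```python
import uuid
import xml.etree.ElementTree as ET

# HYPOTHETICAL names: the real schema of the XML in dta_ServiceSymbols is not
# documented here. Open a row yourself and pass the matching names in.
DEFAULT_TAG = "ShapeInfo"
DEFAULT_ID_ATTR = "shapeID"
DEFAULT_NAME_ATTR = "shapeText"


def build_shape_map(symbol_xml, tag=DEFAULT_TAG,
                    id_attr=DEFAULT_ID_ATTR, name_attr=DEFAULT_NAME_ATTR):
    """Return {UUID: shape name} for every `tag` element found in symbol_xml."""
    root = ET.fromstring(symbol_xml)
    shapes = {}
    for elem in root.iter(tag):
        raw_id, name = elem.get(id_attr), elem.get(name_attr)
        if raw_id and name:
            # uuid.UUID ignores braces and letter case, so the textual GUID
            # format used in either database does not matter for lookups.
            shapes[uuid.UUID(raw_id)] = name
    return shapes


def blocking_shape(last_action, shape_map):
    """Translate an nvcLastAction value into a shape name (None if unknown)."""
    if not last_action:
        return None
    return shape_map.get(uuid.UUID(last_action))


# Usage with made-up sample data:
SAMPLE_XML = """<Symbols>
  <ShapeInfo shapeID="{6F9619FF-8B86-D011-B42D-00C04FC964FF}" shapeText="Receive_PurchaseOrder"/>
</Symbols>"""
shape_map = build_shape_map(SAMPLE_XML)
print(blocking_shape("6f9619ff-8b86-d011-b42d-00c04fc964ff", shape_map))  # Receive_PurchaseOrder
```

    Normalizing through uuid.UUID means you do not have to care whether one side stores the GUID with braces or in a different case.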

     A couple of caveats. We did not do extensive testing on this for parallel shapes, where you might be blocked on more than one shape. It could technically pick either shape with no guarantees. Also, we are simply depending on the orchestration engine to provide this information, as we hook in to grab it in the same fashion as the tracking interceptors do.

    So there you go. Have some fun with it. Perhaps if you ask nicely, you could convince someone to put this in for 2006 SP1. You have to ask nicely, though. :)

    Hope everyone is having a great day.

    Lee

  • Disaster Recovery Part 2

    Okay, so some people noticed that I deleted pretty much all of the content from an earlier post on DR because, well, it was no longer valid. Most of it was sample scripts that have now been replaced by supported, shipping versions of those scripts. However, I guess there was some information that was still useful, so I will put back a few comments here and address a question that was raised about the correctness of changing the recovery model for our databases / when you can do this.

     So first, let me say that the online documentation should be extremely good for our Disaster Recovery at this point. Michael McConnell, Shu Zhang and I spent a good bit of time making sure we covered every detail we needed and tried to order it so that it was more readable. If you have any complaints about the docs, please send feedback via this blog posting. I will get it to the appropriate person (Michael) to update the docs if necessary. You can find these docs at http://msdn2.microsoft.com/en-us/library/aa562140.aspx.

    I will not repeat everything which is in the docs here, but most of the docs are spent explaining how to set up our backup jobs, how to configure log shipping, and how to restore the system in the event of a disaster. A number of companies have had their DBAs complain about having to use our backup process versus their standard process, which is to back up a database at perhaps 1am every day and then take periodic log backups. This link (http://msdn2.microsoft.com/en-us/library/aa577848.aspx), a subtopic of the above link, gives a good explanation of what we are doing ... transaction log marking. The problem with what the DBAs are doing is that they are treating each db as a completely separate entity. BizTalk, though, will at times use distributed transactions to ensure reliability of actions across databases. If you treat each db as a separate entity, then if DB1 and DB2 participate in a distributed transaction together and I do a log backup of DB1 before the transaction commits and a log backup of DB2 right after the transaction commits, then when I restore the two of them, my environment will be in an inconsistent state: DB2 contains the transaction's changes and DB1 does not. For that reason we use transaction log marking with "restore to mark" for all the databases in our environment (as described in the previous link and in many online SQL docs).
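    A toy Python model (not BizTalk or SQL Server code) may make the inconsistency above concrete. Log positions stand in for time, and the backup job's mark is modeled as one more distributed transaction committed in every database, which is a simplification of BEGIN TRANSACTION ... WITH MARK followed by RESTORE LOG ... WITH STOPATMARK.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy database: its log is a list of (position, txn_id) commit records."""
    name: str
    log: list = field(default_factory=list)

    def commit(self, position, txn_id):
        self.log.append((position, txn_id))

    def restore_until(self, position):
        """Transactions recovered from a log backup that ends at `position`."""
        return {txn for pos, txn in self.log if pos <= position}

    def restore_to_mark(self, mark_id):
        """Like RESTORE LOG ... WITH STOPATMARK: stop at this DB's mark record."""
        mark_pos = next(pos for pos, txn in self.log if txn == mark_id)
        return self.restore_until(mark_pos)


def is_consistent(restored, distributed_txns):
    """Each distributed txn must be in every restored database or in none."""
    for txn in distributed_txns:
        present = [txn in committed for committed in restored.values()]
        if any(present) and not all(present):
            return False
    return True


db1, db2 = Database("DB1"), Database("DB2")
db1.commit(1, "local-a")
db2.commit(2, "local-b")
# A DTC transaction commits in both databases at log position 5.
db1.commit(5, "dtc-1")
db2.commit(5, "dtc-1")
# The backup job writes one mark into every database in a single
# distributed transaction, so it lands at the same logical point in each.
db1.commit(7, "MARK-1")
db2.commit(7, "MARK-1")

# Independent log backups: DB1 cut before the commit, DB2 cut after it.
independent = {"DB1": db1.restore_until(4), "DB2": db2.restore_until(6)}
marked = {db.name: db.restore_to_mark("MARK-1") for db in (db1, db2)}
print(is_consistent(independent, ["dtc-1"]))  # False
print(is_consistent(marked, ["dtc-1"]))       # True
```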

    There is no magic here. This is all standard T-SQL. You can read through our code and figure out everything that we are doing. We never install custom components on database servers for doing external procedure calls and we never use xp_cmdshell, so this is all code that any database developer could have written; it is just that most never seem to care about DTC (even if maybe they should).

    The question has come up as to whether there are any alternatives. The quick answer is that if you want to use log backups, then no, there are no alternatives. You would have to write all the same code that we do, so you need to use the code that we wrote. However, if you don't care so much about the data in your system and are okay with, say, a 24-hour data loss window (your data loss window with log backups is the interval between log backups), then you could optionally just have a period of time every day where you shut the entire system down and do full backups. Since there is no activity going on in the system when it is totally down, you don't need to worry about distributed transactions. To shut the entire system down, you need to shut down all BizTalk Host Instances, make sure no traffic is coming in via Isolated Hosts (for example the HTTP / SOAP receive adapters), shut down the SQL Agent processes on the various database servers, and make sure that no other administrative tasks are ongoing (like changing configuration, redeploying, adding a new send port, ...). You should be able to verify that nothing is happening on the system via SQL Profiler or some other tool monitoring the databases. Once the whole system is down, you can take a backup of all of the databases in the system. To get a list of all of the databases which the BizTalk backup process would back up, you just need to run "SELECT * FROM admv_BackupDatabases" against the BizTalkMgmtDb database. That is a view which all of our backup code uses during our DR process. If you add custom databases to our backup process (http://msdn2.microsoft.com/en-us/library/aa561198.aspx), they will also show up in this list. It is important to note that our log shipping story also copies SQL Agent jobs out of msdb when you configure the destination system and restores them when you finally restore to mark on the destination system. If you choose not to use our log shipping story and instead take the system-shutdown full backup approach, you need to make sure you handle backing up and restoring msdb with the SQL Agent jobs.
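    For the shutdown approach, a small script generator can reduce typing mistakes. The following Python sketch assumes you have already reduced the output of admv_BackupDatabases to (server, database) pairs; the backup directory, file naming scheme, and WITH INIT choice are illustrative assumptions, and msdb is appended per server because the SQL Agent jobs live there.

```python
from collections import defaultdict


def quote_name(identifier):
    """Bracket-quote a name the way T-SQL QUOTENAME does (']' is doubled)."""
    return "[" + identifier.replace("]", "]]") + "]"


def build_full_backup_scripts(databases, backup_dir, include_msdb=True):
    """Return {server: T-SQL script} with one BACKUP DATABASE per database.

    databases: iterable of (server, database_name) pairs. Feeding it rows from
    admv_BackupDatabases assumes you have already picked out those two values.
    """
    per_server = defaultdict(list)
    for server, db in databases:
        if db not in per_server[server]:
            per_server[server].append(db)

    scripts = {}
    for server, dbs in per_server.items():
        if include_msdb and "msdb" not in dbs:
            dbs.append("msdb")  # SQL Agent jobs live in msdb
        # Named instances look like HOST\INSTANCE; keep the file name flat.
        file_prefix = server.replace("\\", "_")
        statements = []
        for db in dbs:
            path = f"{backup_dir}\\{file_prefix}_{db}.bak".replace("'", "''")
            statements.append(
                f"BACKUP DATABASE {quote_name(db)} TO DISK = N'{path}' WITH INIT;")
        scripts[server] = "\n".join(statements)
    return scripts


# Run the generated scripts only after the whole system is confirmed idle.
scripts = build_full_backup_scripts(
    [("SQL01", "BizTalkMgmtDb"), ("SQL01", "BizTalkMsgBoxDb"),
     ("SQL02\\BTS", "BizTalkDTADb")],
    r"D:\Backups")
print(scripts["SQL02\\BTS"])
```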

    At this point, if you are choosing to accept a 24-hour data loss window and use scheduled, daily downtime to create a complete set of full backups for recovery, then you should also change the recovery model on the databases to SIMPLE. This will automatically manage the size of the transaction log by essentially doing "truncate log on checkpoint". By default, our databases are all set to the FULL recovery model, which will only truncate the log when the user performs a log backup or when the user manually forces the log to truncate (i.e. a BACKUP LOG ... WITH TRUNCATE_ONLY type command). There are lots of docs out there on this; just search for SQL Server Recovery Models. The key is that if you leave it in FULL recovery mode and then don't do log backups (with our job), your log will grow unbounded and you will eventually run out of disk space.
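    To see why FULL recovery without log backups eventually fills the disk, here is a deliberately crude Python model of the active log size, sampled hourly. Real checkpoints happen far more often than once an hour and real truncation only frees the inactive part of the log, so treat the numbers as illustrative only; the switch itself is done with ALTER DATABASE <name> SET RECOVERY SIMPLE.

```python
def simulate_log(recovery_model, hours, mb_per_hour,
                 log_backup_every=None, checkpoint_every=1):
    """Toy model of active transaction log size, sampled each hour.

    SIMPLE: a checkpoint frees the log. FULL: only a log backup frees it.
    Each sample is the size just before any truncation in that hour.
    """
    if recovery_model not in ("SIMPLE", "FULL"):
        raise ValueError("recovery_model must be 'SIMPLE' or 'FULL'")
    size, history = 0, []
    for hour in range(1, hours + 1):
        size += mb_per_hour
        history.append(size)
        if recovery_model == "SIMPLE" and hour % checkpoint_every == 0:
            size = 0
        elif (recovery_model == "FULL" and log_backup_every
              and hour % log_backup_every == 0):
            size = 0
    return history


print(max(simulate_log("SIMPLE", 48, 100)))                    # 100
print(max(simulate_log("FULL", 48, 100, log_backup_every=4)))  # 400
print(simulate_log("FULL", 48, 100)[-1])                       # 4800: no backups, no truncation
```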

    Hope that helps to clear up some confusion.

     Thx

    Lee

     Oh yeah ... for those of you who always ask about database mirroring ... I am really sorry that we can't support it. If you check out http://msdn2.microsoft.com/en-us/library/ms366279.aspx you will find a write-up by the SQL team on issues with mirroring and distributed transactions. Even if you put everything in one database, we still use DTC transactions occasionally (we did not optimize to detect when everything is in one database, and it is only with the latest System.Transactions work that there is automatic support for promoting local transactions to distributed ones). Thx.

  • What you can and can't do with the Messagebox Database server.

    This question has come up a lot in internal and external discussions. It got to the point where we had to put out a KB article because so many DBAs out there thought that it was okay to go and "tweak" the settings (http://support.microsoft.com/kb/912262/en-us). So here is an attempt on my part to explain what you can and can't do in relatively simple terms, and why.

    What can't you do?

    You may not make any changes which could affect how our queries are executed (i.e. their execution plans). I will list some specific things which can affect the query plans, but they are just examples. If you know of a setting which I don't list and you aren't sure, then don't change it. Here are a couple of things to understand around this:

    1) Think of the SQL code (stored procedures) which we ship as uncompiled C# (or whatever your favorite language is) code. Every customer who installs our product gets their own compiled version of this code courtesy of the SQL Server engine. You can read up on how the SQL engine performs this compilation (or go to one of Paul Randal's talks at Tech Ed or some other conference), but it is important to realize that it is being done. SQL Server even has a performance counter you can monitor (SQL Re-Compilations/sec) which shows how often "code" is being recompiled, potentially while your application is running at full throttle. SQL Server uses various statistics about the table layout, the cardinality of indexes, and such to attempt to choose the plan which reduces IO and provides the best performance. One thing that the optimizer cannot anticipate is the locking side effects certain plan choices could have. It is therefore possible that in an environment in which lots of threads are accessing the database at very high rates, a plan which is optimized for IO access could cause tremendous locking and hence be a much worse overall plan choice.

    2) The next very, very important thing to understand is that while the database is installed on your server, it is not "your" database. It is our (or if you catch me on the wrong day, not thinking, "my") database. When you have performance problems with it, the buck stops over here, not at your DBA's desk. For that reason, it is unbelievably important that the system runs the same on every installation. If every customer who called in with a question had to gather a dump of the plan cache from within SQL Server and send it to us so that we could reconstruct exactly what SQL Server is doing, support would be pretty much impossible.

    3) Access to the messagebox is very controlled and very limited. Via our Admin MMC (in BTS 2006) you can construct somewhat flexible queries against the data, but even that is somewhat limited. From the runtime, there is a set of precanned generic stored procedures and some per-host templated stored procedures which provide all the access that the runtime needs. Since we have locked down access to the database, we can understand all of the potential queries. With that in mind, we have done everything we possibly can to "hardcode" the query plans for all of our stored procedures. This means using index hints, force order hints, occasionally join hints, norecompile hints (why recompile if you are just going to generate the same plan?) and every trick we can come up with to tie the hands of the optimizer (the part of the SQL engine which calculates the best query plan). The SQL Server team is well aware of what we are doing. Technically, in SQL 2005 there is support for USE PLAN, where we could really hardcode the plan completely, but (in order of importance): (a) we support SQL 2000, (b) it is only supported on SELECT statements, which make up only ~30% of our code, and (c) it is non-trivial for generated stored procedures like the ones we use for per-host access. Along with attempting to force the optimizer to choose the plan we want, we also disable some of the features of SQL Server which "help" the optimizer choose the best plan. This includes statistics generation and updating. Statistics provide information about the layout of a table so that the optimizer might decide it is more efficient to scan the clustered index than to seek over a non-clustered index (as an example). Since we are hardcoding the plan, statistics are not so important. We also disable parallelism (setting max degree of parallelism, MAXDOP, to 1 at the server level). Parallelism provides more options for the optimizer to choose amongst for plans.
Typically it is very useful for queries over very large databases (warehouse-type applications). We are an OLTP (online transaction processing) style app which executes the same sets of sprocs on multiple threads at very high rates; we already provide our own parallelism. I "apologize" that we chose to set this at the server level, which could cause it to affect any other database you have installed on that server, but you may not change this setting, or else we will simply push back on any support until you switch it back (no matter how large a customer you are). If you have only one server, install an alternate instance of SQL Server on the same machine and put us on the separate instance. In our testing, when SQL Server chooses a plan with parallelism in it for one of our queries, it can cause orders of magnitude performance degradation.

    4) We are not perfect (but we try :). Occasionally we do not include enough hints, so the optimizer can choose another plan which is worse. I have personally fixed two bugs in which we needed to add hints or slightly change the structure of our query because the optimizer chose a less than optimal plan. It can happen, but we do our darnedest, and I have only had, I think, 3 cases of it in about 3 years, which is not bad. It is also not fair to blame this on the optimizer. As I said above, since we are hardcoding the plans for all of our queries, we turn off a lot of the features which might help the optimizer (but add overhead to the overall system).

    A quick list of things which can affect the query plans:

    Statistics (don't enable these)

    Parallelism (don't turn this on)

    Table structure (don't add indexes, columns, triggers, ... If you do, you will hear silence when you call for help)

    Stored procedures (don't change them. You can look all you want, but no touching)
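    If you want to catch accidental "tweaks" before you call for help, a periodic comparison against a known-good baseline can help. This Python sketch is a hypothetical helper, not a BizTalk tool: the setting labels are made up, how you collect the values is up to you, and the schema check simply diffs object-name lists captured right after installation against what is there now.

```python
# Values taken from this post. The dictionary keys are HYPOTHETICAL labels;
# map them to however you collect the settings (sp_configure, database
# properties, etc.).
REQUIRED = {
    "max degree of parallelism": 1,   # server level, set by BizTalk
    "auto create statistics": False,
    "auto update statistics": False,
}


def find_setting_violations(observed):
    """List settings whose observed value differs from what BizTalk expects.

    Only keys present in `observed` are checked.
    """
    return [
        f"{key}: expected {expected!r}, found {observed[key]!r}"
        for key, expected in REQUIRED.items()
        if key in observed and observed[key] != expected
    ]


def find_schema_drift(baseline_objects, current_objects):
    """Compare object names (indexes, triggers, procedures...) against a
    snapshot taken right after installation. Returns (added, removed)."""
    baseline, current = set(baseline_objects), set(current_objects)
    return sorted(current - baseline), sorted(baseline - current)


print(find_setting_violations({"max degree of parallelism": 0,
                               "auto create statistics": False}))
print(find_schema_drift(["dbo.Spool", "dbo.Parts"],
                        ["dbo.Spool", "dbo.Parts", "dbo.IX_MyCustomIndex"]))
```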

     

    So what can you do?

    You can change the underlying storage model for our data on the physical disks. By this I mean stuff like filegroups and files. By default, all of our tables are set up on a single (PRIMARY) filegroup which has one file. This was admittedly perhaps not the best decision. You are more than welcome to create multiple files for a given filegroup and also to create multiple filegroups and move our tables around. This can give significantly better IO performance and hence overall system performance. We are currently working on a paper (and perhaps a tool to automate it) which will give recommendations on how to lay out our tables given however many physical disks you have access to. You can also change settings related to how SQL Server pre-allocates space for the files and for some of their internal structures. None of this affects how our plans are executed; it simply allows for faster IO access and better overall performance. One interesting setting which we do not enable by default, but have been playing with for a little while now, is "TEXT IN ROW". This table-level option tells SQL Server whether to store image column data in the same data page as the row itself (for our tables, the leaf level of the clustered index) or in a separate page. By default this is set to off. If you think about an application like Outlook / Exchange, this makes a lot of sense, as the image data (your email message) is potentially only loaded when the user clicks on a specific email to read it (disregarding the preview pane). Storing the image data in a separate page allows more rows to be stored in one page, and hence less IO is needed to read a large set of data. However, in BizTalk, when a message is delivered to a subscribing service, that message is going to be "processed" and hence loaded every time (we already do special handling for large messages to fragment them). So instead of an optimization, we incur an extra IO to look up the image data in the separate data page.
I don't guarantee it will always provide a performance benefit, but we have seen some decent gains when processing small messages. As always, you should test this out in a QA environment first. You can read more about text in row simply by searching for "SQL SERVER text in row" and picking one of the links. Text in row is enabled at the table level, so you need to have some understanding of our table structures. Really, there are only two tables we need to worry about. The first table is the Spool. This table contains, for every message, its MessageID, a timestamp, some other properties, and the blob form of the message context. The second table you need to worry about is called the Parts table. This table contains both the part property bag (for things like content type and charset) and the first fragment for each part. Enabling text in row on these tables could provide a benefit for high-throughput systems which are processing smaller messages. Since a page is only 8 KB, if your messages are larger than this, enabling text in row probably won't help, since SQL Server will be forced to push the data to a separate page anyway (although we do compress your data, so it is smaller in storage than you might think). You can also play around with the maximum size for the image column before it is offloaded to a separate page (http://msdn2.microsoft.com/en-us/library/ms173530.aspx). As always, have someone with database experience read over this and understand it before testing anything out. A quick script to turn on the option would look like:

    exec sp_tableoption N'Spool', 'text in row', 'ON'

    exec sp_tableoption N'Parts', 'text in row', 'ON'
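    To reason about which messages would actually stay in the row, here is a rough Python model of the option. The limits come from the sp_tableoption documentation ('ON' means 256 bytes; an explicit value such as exec sp_tableoption N'Spool', 'text in row', '7000' must be between 24 and 7000 bytes); blob sizes should be the stored, compressed sizes, and the single extra lookup for off-row data is a simplification.

```python
# Limits documented for sp_tableoption 'text in row': 'ON' means 256 bytes,
# and an explicit limit must be between 24 and 7000 bytes.
ON_LIMIT = 256
MIN_LIMIT, MAX_LIMIT = 24, 7000


def stored_in_row(blob_bytes, limit=None):
    """True if an image value of blob_bytes would stay in the data row.

    limit=None models the option being OFF. Simplification: real placement
    also depends on free space left in the row.
    """
    if limit is None:
        return False
    if not MIN_LIMIT <= limit <= MAX_LIMIT:
        raise ValueError("text in row limit must be between 24 and 7000 bytes")
    return blob_bytes <= limit


def page_lookups(blob_bytes, limit=None):
    """Toy count: the data page, plus one extra lookup when the blob is off-row."""
    return 1 if stored_in_row(blob_bytes, limit) else 2


for size in (200, 5000, 9000):
    print(size, page_lookups(size), page_lookups(size, ON_LIMIT),
          page_lookups(size, MAX_LIMIT))
```

    The 9000-byte case shows the point made above: anything larger than the biggest allowed limit is pushed off-row no matter what you set.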

     

     

    So the conclusion is that you should never do anything which affects the way our queries are executed, but you are more than welcome to try and optimize the IO access to improve performance. Hope this helps some of you out there.

     

    Lee

     

    Addendum: 

    I was asked about index rebuilds. First, we do not support online index defragmentation. That process involves page locks, which we explicitly disable as part of our installation on all of our tables. Page locks make lock acquisition ordering almost impossible and will cause us to deadlock all over the place. We are very careful about how we acquire locks and expect that the granularity is at the row level so that our ordering is enforced. Having said that, offline rebuilding of indexes is perfectly okay and supported. In our labs, we have not seen significant performance gains from doing this, but it is true that our indexes will fragment, since most of them are on GUIDs. The SQL Server team is not a huge fan of GUID-based indexing, but we have done numerous tests which show that as long as you don't ever scan the table, they can perform better than identity-based indexes (for our specific workloads). They will of course get fragmented, but usually our tables don't get too big, as data flows in and out of them at a relatively steady pace. If you have large amounts of data which are expected to build up in the messagebox, you are more than welcome to periodically do index rebuilds during scheduled downtime. The same applies to the tracking database, where you can rebuild indexes during downtime.
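    The reason row-level locking matters is lock ordering: if every caller acquires the rows it needs in one global order, no cycle of waits can form, whereas a page lock covers whatever rows happen to share the page and so breaks any ordering. The Python sketch below illustrates the general technique with in-process locks; it is not how the messagebox stored procedures are written.

```python
import threading


class OrderedRowLocks:
    """One lock per row key; callers always take them in sorted key order.

    Because every caller follows the same global order, no cycle of waits can
    form, so acquiring several row locks cannot deadlock.
    """

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def acquire_all(self, keys):
        ordered = sorted(set(keys))
        for key in ordered:
            self._lock_for(key).acquire()
        return ordered

    def release_all(self, ordered):
        for key in reversed(ordered):
            self._lock_for(key).release()


locks = OrderedRowLocks()
processed = []


def worker(keys, rounds=500):
    for _ in range(rounds):
        held = locks.acquire_all(keys)
        processed.append(tuple(held))  # work done while holding both rows
        locks.release_all(held)


# The two callers list the same rows in opposite order; sorting makes them agree.
threads = [threading.Thread(target=worker, args=(k,), daemon=True)
           for k in (["row-b", "row-a"], ["row-a", "row-b"])]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=10)
print(all(not t.is_alive() for t in threads), len(processed))  # True 1000
```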

  • Utility available for stitching together archives of your tracking database

    A co-worker of mine, Vishal, has posted on GotDotNet a new utility which enables customers to take the archives of their tracking database and stitch them together into a single, large database.

    http://www.gotdotnet.com/codegallery/codegallery.aspx?id=67bbd6ea-850e-4d93-be87-df6788976cab

    This can be a very useful tool when used in combination with the Archiving and Purging features of BizTalk 2006, which will also be available in BizTalk 2004 SP2. Now you can schedule regular archiving of your tracking database and then reconstruct a single database against which you can mine your data. Thanks to Vishal for putting together this tool for everyone.

     

    For those of you heading to Boston for Tech-ed, drop by the Connected Systems area and say hi. I will be there all week and am doing a co-presentation on Wednesday (my bit is on performance) and have a chalk-talk on Tuesday on Ordered Delivery. Hope to see a lot of you there.

    Lee 

  • Whole buncha stuff

    So it has been a while. In the meantime, WE SHIPPED!!!! It is such a great feeling. I have been on this team for almost 7 years now and this release is definitely the best (one would of course hope that every release is better than the previous one, but you never know :) :). I am super happy to have this out there and hope you are all going to be really happy with it, and we are already working on really cool stuff for the next release. There is rarely a break from innovating. :) :) However, I did manage to take two weeks recently and go to New Zealand, which is an absolutely amazing country. You Kiwis really know what is up. Even if a town has only 5 buildings (one of which will always sell fish and chips and another will have postcards of sheep), you can always find an information building which will tell you everything to do, book it for you and send you on your way. It is the best country for tourists. I went surfing, caving, kayaking, glacier hiking, tramping, river surfing, bungy jumping and lots more. It was awesome. I am also apparently now a Hurricanes fan, 'cause the Crusaders win too much and the Blues, well, they are the Blues. :) :) I think I should spend more time with customers in New Zealand ... we'll see.

    My role on the team has changed a bit and now I have a team to help work on all the great features. Sumitra S has been working with me for a while now and I am going to work on getting her to post to this blog as appropriate. She rocks, and along with a bunch of work on the messagebox core functionality, she wrote the (internal) OM which exposes all of the data and operations in the Admin MMC Group Hub Page. Also joining us now is Adrian Hamza, who most recently worked on the SharePoint adapter. He has his own blog, which you should all check out at http://blogs.msdn.com/ahamza. He is great and we are really looking forward to his work on the core engine. Craig C has also joined the team and will be helping define some of the long-term direction we are going towards. Should make an awesome team. :) :) :)

    So now maybe I will give you some good technical details so that you are not just reading fluff. :) One thing I have really been wanting to post is some learnings I have had using Yukon (SQL Server 2005): 2 tips that could really help some people avoid the gotchas that we hit when porting code directly that must run on both Yukon and Shiloh (SQL Server 2000). (I'd post more but it is 1 am.)

    1) After a number of conversations with the optimizer team, we discovered a hidden gem in the use of WITH SCHEMABINDING on UDFs. Our UDF simply took in three datetime variables and did a somewhat complex comparison of them and returned 1 or 0. It did not touch any tables and so did not affect any data. In Yukon there is a new feature / attribute associated with a UDF called SystemDataAccess. This property can be seen via:

    SELECT OBJECTPROPERTYEX(OBJECT_ID('<MyFunction>'), 'SystemDataAccess')

    Without the WITH SCHEMABINDING option, this property is always set to 1, so the optimizer assumes that you might access data and will add a sort / spool to your plan to protect itself from data changes. Adding the schemabinding option causes the function to be analyzed when it is created; then you can see this property set to 0 and the sort / spool disappears. It is really quite nice and can be a very large perf improvement, which I believe will be KB'd at some point. In our case, this is a major gain in the performance of dequeue for large queue sizes. Thanks to the SysRepublic guys for discovering the perf issue with large queues and triggering this discovery. You will find that our performance for large queues is better on Yukon in some scenarios because of this.

    2) Always owner qualify tables and sprocs / udfs which you execute. SQL plan cache access actually uses this as part of its lookup in the cache and missing this can cause a certain amount of extra cache misses and plan (re)compiles resulting in higher CPU utilization and potentially lower performance. What I mean is instead of "SELECT * FROM Foo" you would write "SELECT * FROM dbo.Foo" assuming the owner of Foo is the dbo. Unfortunately this is something we found too late and it actually didn't make it into our RTM bits as it was not a functional bug and was such an extensive change that we could not risk our RTM date since we had reached our perf goals. We are pushing to QFE this, though, and I will let you know if / when this happens. All it means is that in certain scenarios, on Yukon, we would use more CPU than on Shiloh.
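    The effect of owner qualification on the plan cache can be shown with a simplified Python model. The rule it encodes (an unqualified object reference makes the caller's default schema part of the cache key, so the same text can be compiled once per schema) is a simplification of SQL Server 2005 behavior, and the regular expression is only a crude stand-in for real name resolution.

```python
import re

_OBJECT_REF = re.compile(r"\b(?:FROM|JOIN|EXEC)\s+([\w.\[\]]+)", re.IGNORECASE)


class ToyPlanCache:
    """Simplified model, not SQL Server internals: when any referenced object
    is not owner-qualified, the caller's default schema becomes part of the
    cache key, so the same text can compile into several plans."""

    def __init__(self):
        self.plans = {}
        self.compiles = 0

    @staticmethod
    def fully_qualified(sql):
        names = _OBJECT_REF.findall(sql)
        return all("." in name for name in names)

    def get_plan(self, sql, default_schema):
        key = (sql, None if self.fully_qualified(sql) else default_schema)
        if key not in self.plans:
            self.compiles += 1  # cache miss: compile a new plan
            self.plans[key] = f"plan {self.compiles}"
        return self.plans[key]


for text in ("SELECT * FROM Foo", "SELECT * FROM dbo.Foo"):
    cache = ToyPlanCache()
    for user_schema in ("app1", "app2", "app3"):
        cache.get_plan(text, user_schema)
    print(f"{text!r}: {cache.compiles} compile(s)")
```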

    These two things are just so weird that I figured I would post about them: case one could be a big gain for those who do this (which might not be a lot of you), case two probably applies to all of us, and both are fixes you can make which have no effect on Shiloh but can have large gains on Yukon. Hope this helps some of you, and I expect the Yukon guys will KB this stuff at some point (as well as investigating why #2 is currently worse than on Shiloh).

    Next time, I will post something cool about BizTalk, but now I have a couple more things to catch up on before I go to sleep. :)

    Thx

    Lee

  • Job Openings

    Hi all. Hope everyone watched as the Seahawks pummelled the Carolina Panthers yesterday. For our remote team members out in Charlotte ... ouch ... 6 total yards in the first quarter ... ouch (I know it hurt, Brian :). However, on a positive note, the BizTalk team is growing and has a lot of cool new work to do, which we are planning right now. The bad news, though, is that you would probably at some level have to work with me. If you can get past that and are interested ... the official blurb is below. Thx. Go Hawks!

     

    We have several open positions on the BizTalk team. We just started planning our next release of BizTalk Server and have a lot of exciting ideas and opportunities. Send email to kerryv@microsoft.com if you want to participate in building the next generation of BizTalk Server, or if you know someone who could be interested in growing his/her career at Microsoft.

  • ManageMessageRefCountLog SQL Agent Job failing in BETA2

    Since I have seen enough questions on this, I will officially post here about it (and we might have a KB article on it ... not sure how that works for BETAs though). There is a new SQL Agent job you will find on the messagebox database server called ManageMessageRefCountLog_<messagebox>. In the BETA 2 bits you will find that this job fails occasionally on both SQL 2000 and 2005. This is a known issue which we are working to fix in RTM. The failure is ignorable. The error stems from this job attempting to start a secondary SQL Agent job: if that job is already running, we get an error. The approach we were using to determine whether the job was already running had a race in it, and we did not get to fix this for BETA2. The job is scheduled to restart itself every minute, so even if it fails, it will just restart again. This job is part of our cleanup process for deleting messages, and the expectation is that it will run forever. The only negative side effect the failure of this job can have is if you are running VERY high-throughput performance testing: a one-minute downtime of this job could cause enough spool growth to trigger our throttling mechanism, in which case you would see throttling kick in due to spool growth. We are of course working to fix this for RTM for this reason and for the fact that customers don't seem to like the red X that shows up when it fails. :) :) If you don't know how to monitor throttling counters, I recommend reading through our documentation on this and then checking out the MessageAgent perf counters, which are now accessible via perfmon. Significant enhancements have been made here to help maintain a healthy system, so you should read up on it.
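    The bug above is a classic check-then-act race. Here is a minimal Python illustration of the race and one way to close it; it is not the actual job, which is T-SQL run by SQL Agent. Treating "already running" as a normal return value rather than an error matches the point above that the failure is ignorable.

```python
import threading


class SingleInstanceJob:
    """Start a job only if it is not already running, without a race.

    The racy version is "if not running: start()" done as two separate steps:
    two callers can both see "not running" and the second start then fails.
    Doing the check and the state change under one lock closes that window.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.running = False
        self.starts = 0

    def try_start(self):
        """Return True if this call started the job; False (not an error) if
        it was already running."""
        with self._lock:
            if self.running:
                return False
            self.running = True
            self.starts += 1
            return True

    def finish(self):
        with self._lock:
            self.running = False


job = SingleInstanceJob()
results = []
callers = [threading.Thread(target=lambda: results.append(job.try_start()))
           for _ in range(20)]
for c in callers:
    c.start()
for c in callers:
    c.join()
print(results.count(True), job.starts)  # 1 1
```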

     

    Thx

    Lee

  • Check out the new Backup / Restore stuff in BETA2

    Hi all. I am still here. I just get a bit busy and disappear for a while. :) :) So one of the cool things in Pathfinder is some updated backup and restore features. It is still based around the same log shipping story, but we have automated almost all of the steps and vastly improved our documentation to try and make this easier. We are still working on making it better with things like a UI and support for alternatives instead of just log shipping, but hopefully you will find the new docs easy to understand and the new procedure easier to do. I am in the process of trying to find a way to back port this to BTS 2004, but in the meantime, a lot of the work can be used as "samples" of how to automate the process for 2004. This includes a more finalized version of the scripts I have included below for the automated restoration. They are vbs files, so slight modifications for 2004 can be done (I know the TDDS connection string is in a slightly different format in 2004, so that will need to be changed). Also, it has some automation for restoring to the mark, and you can use the way I import data to populate the LogShippingDatabases table as an example, as this manual process can be error prone (I was on the phone with a customer today who had a typo while performing this step which caused failures). The underlying technology for our log shipping story is the same from 2004 to 2006; we have just tried to make it a lot easier. I highly recommend everyone check out the new docs on this feature and try it out on 2006, and then see how you can use it on 2004 while I work to officially backport it. Please send feedback as we continue to work to make this feature better for you. Hope you all have a happy holiday season.
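    For the manual step of populating LogShippingDatabases, a quick cross-check can catch typos like the one mentioned above. This Python sketch assumes you can produce the expected list of database names (for example, from admv_BackupDatabases on the source system) and makes no assumptions about the columns of LogShippingDatabases.

```python
import difflib


def check_database_list(entered, expected):
    """Compare hand-typed database names with the names you expect.

    Returns (missing, unknown): names you forgot to enter, and a dict mapping
    each unexpected name to its closest expected name (usually a typo).
    Matching is exact and case-sensitive.
    """
    entered_set, expected_set = set(entered), set(expected)
    missing = sorted(expected_set - entered_set)
    unknown = {
        name: difflib.get_close_matches(name, sorted(expected_set), n=1)
        for name in sorted(entered_set - expected_set)
    }
    return missing, unknown


missing, unknown = check_database_list(
    entered=["BizTalkMgmtDb", "BizTalkMsgBxDb", "BizTalkDTADb"],
    expected=["BizTalkMgmtDb", "BizTalkMsgBoxDb", "BizTalkDTADb"])
print(missing)  # ['BizTalkMsgBoxDb']
print(unknown)  # {'BizTalkMsgBxDb': ['BizTalkMsgBoxDb']}
```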

     

    Lee

  • Get Connected to Free Product Support and Tremendous Online Collaboration

    Have you ever wanted to speak to Microsoft developers of a specific feature of BizTalk Server? I am sure your answer was “Yes let me at them”, so the Business Process Integration Division is extending an invitation to all customers to join our key feature developers, program managers, and testers in the following newsgroups:

     

    • microsoft.public.biztalk.accelerator.forsuppliers
    • microsoft.public.biztalk.newuser
    • microsoft.public.biztalk.accelerator.rosettanet
    • microsoft.public.biztalk.admin
    • microsoft.public.biztalk.appintegration
    • microsoft.public.biztalk.framework
    • microsoft.public.biztalk.general
    • microsoft.public.biztalk.library
    • microsoft.public.biztalk.nonxml
    • microsoft.public.biztalk.orchestration
    • microsoft.public.biztalk.sdk
    • microsoft.public.biztalk.server
    • microsoft.public.biztalk.setup
    • microsoft.public.biztalk.tools
    • microsoft.public.biztalk.xlangs
    • microsoft.public.biztalk.xsharp

    We’ve been working very hard over the past year to connect with folks just like you, and we want to include you in our community of Most Valuable Professionals (MVPs), developers, information technology professionals, chief information officers, chief executive officers, and people in any other role within large, medium, and small companies who hang out in our online newsgroup communities. Join this vibrant online community to ask the questions you always wanted to ask but did not know where to take. Well, now you know where to go, so come on in and join us!

     

    If you are new to BizTalk Server, try out the NewUser newsgroup, microsoft.public.biztalk.newuser.

     

    We’re offering two levels of interaction with Microsoft Corporation employees as follows:

     

    1. Managed Newsgroup Support
    2. Unmanaged Newsgroup Support

    Managed Newsgroup Support

    MSDN managed newsgroups are available in English to MSDN Universal, Enterprise, Professional and Operating Systems subscribers to receive free technical support on select Microsoft technologies as well as to share ideas with other subscribers. MSDN managed newsgroups provide:

    • Unlimited on-line technical support - keep your PSS incidents
    • A commitment to respond to your post within two business days
    • Over 200 newsgroups to choose from
    • Spam protection for your e-mail address when posting items

    Go to the following URL to sign up: http://msdn.microsoft.com/newsgroups/managed. These newsgroups are monitored by Microsoft support engineers and product group team members as described above.

    Unmanaged Newsgroup Support

    MSDN unmanaged newsgroups are available to all individuals.

    Go to the following URL to participate: http://msdn.microsoft.com/newsgroups. These newsgroups are monitored by Microsoft product group members, other customers like you, most valuable professionals, and various other individuals.

    Questions, suggestions, and direct feedback can be sent to me.

     

    James Fort

    BPI Community Lead

    mailto:jfort@microsoft.com 

  • Backup and Restore of your SQL resources (w/ sample automation scripts)

    I am removing this post as it is no longer needed. With SP2 for 2004 public now, we have backported the Backup / Restore work from 2006 which includes scripts for automating the recovery of the system after a failure (which is what I had posted here). Those files are vbs scripts so there is no mystery as to what they are doing, and you should use them, following the DR instructions in our online docs.

     

    Thx
    Lee

  • Announcements

    Okay, periodically the marketing team and other teams ask me to post things to make sure the word gets out about events. My guess is that all of you are reading Scott Woodgate's blog (see my links), and if you are not, you should, so these will probably be duplicates. Please read them anyway, and in this case please give feedback so we can have a strong Tech Ed session this year (I will be there, so please stop by my talk and we can chat :). Also, I think I have wrangled the owner of the orchestration compiler, who is now also one of the key developers on the engine, into putting some info on the blog. That would be great stuff. He is the ultimate source of information on compiler questions and on how to express certain design patterns in your orchestrations. Looking forward to hearing from Paul. Okay, the Tech Ed announcements:

    Industry leaders will conduct special in-depth workshops on Sunday, June 5, from 1 P.M. to 6 P.M. These concentrated “Deep Dive” sessions will last five hours and provide an in-depth look at important technology areas and solutions. 

    The pre-con starts with an overview session led by Rebecca Dias covering Service Orientation (SO), Microsoft solutions in the SO space, and how they work together. It is followed by a session by Yasser Shohoud on how to build web services on .NET, and then by a web services lab. Next comes an HIS session by Paul Larsen that shows how to extend the reach of SOA to the mainframe via HIS. Following Paul’s session, the earlier web services lab is extended with HIS to expose legacy IBM systems as a web service. Mike Wood’s session picks up from there and shows how to orchestrate, expose, and integrate with web services. Mike’s talk is followed by another hands-on lab where we extend the web services / HIS lab with an orchestration in the middle that orchestrates these services, possibly followed by exposing the resulting orchestration as a web service.

    Title: Building Simple and Complex Web Services using the .Net Framework, BizTalk Server and Host Integration Server

    Are you convinced that Service Orientation is the way to go, but don’t yet have the technical skills to begin implementation, or are you confused by the plethora of choices Microsoft offers for building Service Oriented systems? Do you want fundamentals on Microsoft’s integration technologies and guidance on the trade-offs of using one solution vs. another in different scenarios? If so, this is a training you won’t want to miss. This training will provide the guidance and information you need to make an informed decision on how best to construct your integration solution using Microsoft technologies, so that it is extensible, maintainable, and interoperable. Our experts will give you highlights of each technology by demonstrating how you can construct loosely-coupled, composite, and secure systems that bridge corporate firewalls based on core .NET Framework technologies, expose line of business applications in your IBM datacenters as web services using Host Integration Server, and orchestrate the activities of disparate web services into a composite application using Microsoft’s Business Process Management solution, BizTalk Server. To solidify your learning, each session is followed by a lab that gives you hands-on experience with the technologies just covered. These labs build upon one another, so by the end of the day you will have the experience of building a real-world Service Oriented solution using all the concepts and technologies you just learned.

    I am pretty sure they want you to register for the class, and I think the Tech Ed site is at www.msteched.com.

    This year TechEd is hosting Birds of a Feather (BOF) sessions for both the IT Pro and Developer audiences. Topics are posted to the website, where everyone can go in and vote for their favorite sessions. On April 11th we'll close the voting, and the sessions with the most votes will be held at Tech Ed. This tool is based on feedback we received at Tech Ed last year: attendees wanted to drive this more and be able to submit and choose which BOF topics were presented.

    TechEd BOF site: http://www.msteched.com/content/bof.aspx

     

    Thanks

    Lee

  • Hidden gem in SP1 ... cleaning up the msgbox in a test environment

    So now that Jean is posting, this blog should get pretty interesting. Jean is the man, behind the man, behind the man. :) On the current BizTalk team he is about employee number 5 or so which makes him a true wealth of knowledge and I have been working with him for 5 1/2 years and he hasn't killed me yet so he must have a lot of patience too. :) Good to have him on board. :) :)

    I have gotten many questions over time from people asking how to clean up their messagebox when they are running in a *test* environment without having to reconfigure all of the databases. Sometimes you run a test, something fails, and you end up with 1000s of suspended messages in the database that you don't know what to do with (and HAT takes too long to terminate them ... which we are trying to fix). Well, we have had an internal tool for doing this for quite a while, which I was finally given permission to include in SP1, but we hid it, so here is my chance to let you know that it exists and how to use it (there is a KB article on it, but I haven't found it ... haven't looked too hard :) ).

    In the <install dir>\schema directory you will find a file called msgbox_cleanup_logic.sql (it is installed with SP1). This file contains the definition for a stored procedure called bts_CleanupMsgbox. By default, this stored procedure exists in your msgbox except that the default implementation is empty so it does nothing. To use this script first run the .sql file against all msgboxes using Query Analyzer. This will simply create the stored procedure. It won't actually do anything to your box as far as cleanup goes. Then:

    1) Shut down all bts servers

    2) If you are using HTTP or SOAP, run IISRESET from the cmd prompt to recycle IIS and shut down our out-of-process host instance

    3) If you have any custom Isolated Adapters, make sure they are shutdown also.

    4) run "exec bts_CleanupMsgbox" on all of your msgboxes

    5) Restart everything and away you go

    What will this do???

    You can certainly read the stored procedure to see what we are doing (since you guys seem to read all the other ones I write :), but the summary is that it deletes all running instances and all information about those instances, including state, messages, and subscriptions. It leaves all activation subscriptions in place so that you do not have to re-enlist your orchestrations or send ports. Everything will just work, and you no longer have 50,000 instances sitting in a suspended state.

    Couple of notes / gotchas:

    1) If you install a hotfix onto your test system which runs msgboxlogic.sql, it will overwrite this with the empty stored procedure, so you will have to recreate it by rerunning the .sql file.

    2) If you create a new msgbox, the stored procedure will be empty on the new msgbox and you will have to run the .sql file there as well.

    3) THIS IS NOT SUPPORTED ON PRODUCTION SYSTEMS!!!! I have no idea why you would ever use this on a production box, since it will delete all of your data, but do not do it. Do not even run the msgbox_cleanup_logic.sql file on your production system. Don't even think about it, because your data will disappear and I can't imagine that you would be happy.

    4) READ NUMBER 3. :) :)

    5) This script does not actually delete all of the subscriptions. It marks them for deletion and lets the subscription cleanup job run by SQL Agent take care of them. They will not be filled by the routing process since they have been "ghosted", but if you have hundreds of thousands of subscriptions, this can add a bit of overhead. Right now, in those situations, you just have to wait for the job to finish. Sorry. This is a pretty rare scenario. To make sure you are okay, after running the script, go to the Management node in Enterprise Manager for your server and, under Jobs for SQL Agent, kick off the PurgeSubscriptionsJob_<msgboxname> job and wait for it to finish. Once it finishes, you are golden. This is very rare, but if you are worried about it, do this step for peace of mind.

    Besides that, you are golden. This is a very useful script when rerunning tests after fixing an issue in your configuration. Do not abuse this script, though, or use it as a crutch for not fixing stress issues in your scenario. If you have issues that cause the msgbox to overload and become unstable, they should be investigated, because you do not want them to occur in your production system. Keep that in mind. Hopefully this script will make it easier for you to run tests efficiently.
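    To make the ordering concrete, here is a small Python sketch that turns the steps and gotchas above into a checklist. It only builds text; it does not connect to SQL Server or run anything. The function and parameter names are hypothetical, and placing the optional PurgeSubscriptionsJob step before the restart is an assumption based on note 5, not an official procedure.

```python
def cleanup_runbook(msgboxes, uses_http_or_soap=False, has_isolated_adapters=False,
                    many_subscriptions=False, is_production=False):
    """Build the test-environment msgbox cleanup checklist as a list of strings.

    Hypothetical helper: it only produces text and never touches a database.
    """
    if is_production:
        # Notes 3 and 4: never run this on a production system.
        raise ValueError("bts_CleanupMsgbox is NOT supported on production systems")
    if not msgboxes:
        raise ValueError("at least one msgbox is required")

    # Creating the procedure does no cleanup by itself.
    steps = [f"Run msgbox_cleanup_logic.sql against {mb} (creates bts_CleanupMsgbox)"
             for mb in msgboxes]
    steps.append("Shut down all BizTalk servers")
    if uses_http_or_soap:
        steps.append("Run IISRESET to recycle IIS and shut down the out-of-process host instance")
    if has_isolated_adapters:
        steps.append("Shut down all custom isolated adapters")
    steps += [f'Run "exec bts_CleanupMsgbox" on {mb}' for mb in msgboxes]
    if many_subscriptions:
        # Note 5: ghosted subscriptions are removed later by the purge job.
        steps += [f"Start PurgeSubscriptionsJob_{mb} and wait for it to finish"
                  for mb in msgboxes]
    steps.append("Restart everything")
    return steps
```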

     

    Thx

    Lee

  • Large messages in BizTalk 2004, what's the deal?

    The large message support story in BizTalk Server 2004 is a complex one, mainly because the definition of large message varies significantly. This, in turn, is complicated by the fact that our customers expect everything to work with all of the possible variations of "large message". So, how large a message can BizTalk Server 2004 really handle?

    The real answer is "it depends", and the considerations vary about as much as the large message cases themselves, so below I give some rules of thumb to go by. This post attempts to characterize the major classes of scenarios we have seen that require transferring significant amounts of data, and for each, which features in the core engine are well suited to managing that data and, more importantly, which are not.

    The major classes of large message that we have seen are the following:

    1. Large flat file document, containing many independent records. The records themselves are small, but they were batch delivered occasionally, and the documents need to be processed. Sizes here vary from 100KB to 100MB.
    2. Large flat file document, wrapped in a single CDATA section node in an XML document in order to carry the data through the system. This has typically come in the form of an exposed web service that carries a large flat file document which must be extracted and processed. Sizes of the flat data are similar to case 1, but I've only seen them go to about 1MB.
    3. Large XML document that is effectively the structural equivalent of the flat file, with hundreds of thousands to millions of "rows" that were batched together. A variant of this case is data coming from the BizTalk SQL Adapter, where the execution pulls back a number of records that are converted to XML for internal processing.
    4. EDI-style interchanges where the file or data contains medium-size documents (10KB - 100KB) that are intended to be processed independently or in aggregate.
    5. Large flat document with a header and trailer at the start and end of the file, which is really considered one report but has potentially millions of records. Each record could be processed separately from the others, but the entire sequence must be processed in order to complete properly.

    Of the above cases, the hardest one to deal with in BizTalk in general is case 2. The problem is that our internal processing of the data is based on .NET, and in particular on the XmlReader class. For CDATA, and text in general, this class does not provide a way to access the data in a form that is friendly to large message processing. Basically, when we get to this node, we effectively have to ask for the entire string to be materialized into memory in order to process it. This happens in all of the native BizTalk disassembler and assembler components (XML, flat file, and BTF), because the data is streamed through the components via the XmlReader interface. If possible, avoid this style of XML, because it forces the single string to be materialized. The problem is aggravated in web services scenarios, because the data is materialized into a complete .NET object and then converted into a message structure before BizTalk processes it, resulting in at least three copies of the data in process memory. If you must work with this data, the best we can recommend is not to send more than 1MB of it into BizTalk without some form of custom processing or large-memory machines. Custom processing is difficult here as well, because most mechanisms for dealing with the XML data load the entire string into memory, defeating the purpose of streaming the data.
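    The effect is easy to see with any pull-style XML reader. The Python sketch below uses `xml.etree.ElementTree.iterparse` as a stand-in for the .NET XmlReader (an analogy, not BizTalk's code): it streams the document and reports the length of the largest text value handed back for any element. Record-style XML never hands back more than one small field at a time, while the same data wrapped in a single CDATA section comes back as one string the size of the whole payload.

```python
import io
import xml.etree.ElementTree as ET


def largest_text_value(xml_bytes):
    """Stream a document and return the length of the largest text value returned."""
    largest = 0
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        largest = max(largest, len(elem.text or ""))
        elem.clear()  # discard finished elements so record-style input stays small
    return largest


flat = "\n".join(f"line{i:06d}" for i in range(10_000))  # about 110 KB of flat data

# Same data as one element per record ...
as_records = ("<Batch>" + "".join(f"<Line>{l}</Line>" for l in flat.split("\n"))
              + "</Batch>").encode()
# ... and wrapped in a single CDATA section.
as_cdata = ("<Wrapper><![CDATA[" + flat + "]]></Wrapper>").encode()

print(largest_text_value(as_records))  # 10: one record's worth of text
print(largest_text_value(as_cdata))    # 109999: the entire payload as one string
```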

    Processing Requirements:

    What is required of BizTalk when processing these documents pretty much breaks into two flavors, pass-thru routing (potentially with tracking) or mapping along with the routing.

    Routing Only:

    The really easy case is pure routing, so let me start with that. The desire here tends to be to use BizTalk as a pure message routing infrastructure and do minimal processing on the message itself. What may be required is promoting several key fields that are important for routing, but nothing more. In this scenario, everything except case 2 works well, because we use the XmlReader interface internally and can "stream" the processing of the nodes into the database without loading any single part into memory. For pure pass-thru cases with property promotion, we have tested messages up to 1GB in size, getting them into and out of the processing server. This does not take seconds, or minutes, but it can be done; I believe we saw something like 4 hours for the 1GB case.

    The major consideration for time in this case is the chunk size we use to fragment the data into the database. The default size is 100KB, and it is controlled in the group settings for BizTalk. In the 1GB case, that means we take 10,000 round trips to the database to store all of the data arriving on the stream. Keeping a transaction open that long causes internal processing issues, so raising this value to 1MB to 10MB allows the data to be chunked into the database faster, improving the overall execution time. This requires a larger memory footprint, but the machines used for this kind of processing tend to be high-memory machines (multi-GB). To keep processing consistent, we use (when necessary) a distributed transaction to lock all of the resources. We have seen, however, that at around 300K to 400K messages per submitted batch, we keep so many locks open at once that SQL Server 2000 sometimes gives us "out of locks" errors. We've also tried this on 64-bit SQL Server 2000, and it has worked much better for that many documents in a single submission.
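    The round-trip arithmetic above is easy to check. Here is a small Python sketch using decimal units (1GB = 10^9 bytes, 100KB = 10^5 bytes) to match the 10,000 figure; with binary units (2^30 bytes and 100 × 1024 bytes) the count comes out to 10,486 instead. The function name is illustrative only.

```python
KB, MB, GB = 10**3, 10**6, 10**9  # decimal units, matching the figures in the text


def fragment_round_trips(message_bytes, fragment_bytes):
    """Number of fragments (database round trips) needed to store one message."""
    if fragment_bytes <= 0:
        raise ValueError("fragment size must be positive")
    return -(-message_bytes // fragment_bytes)  # ceiling division on integers


for label, size in [("100KB (default)", 100 * KB), ("1MB", 1 * MB), ("10MB", 10 * MB)]:
    trips = fragment_round_trips(1 * GB, size)
    print(f"1GB message, {label} fragments: {trips:,} round trips")
```

    Going from the 100KB default to 1MB or 10MB cuts the round trips by 10x or 100x, which is where the time savings come from, at the cost of holding a larger fragment in memory.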

    Mapping:

    This much harder case to support is unfortunately the one most frequently attempted with our product, sometimes with disastrous effect. Given the statement "we can handle large messages", the first thing people try is to process messages through maps. Unfortunately, this is still a huge problem for BizTalk, primarily because of the lack of a good means to do large message transformations. The issue is that although we pass a stream object to the .NET XslTransform class, the stream is loaded into an XPathDocument that the XslTransform class then processes. As the transform executes, the XPathDocument caches information about the nodes of the XML along with the data itself to allow faster access, and this redundant data sitting in the objects causes severe performance penalties. This is where 90%+ of the Out Of Memory (OOM) exceptions that cause orchestrations and receive / send ports to fail come from. The blow-up factor can easily reach 10x or more, consuming all of the memory on the machine. The only recommendation here is to see whether the purpose of the mapping can be accomplished some other way, because even a 10MB document may be enough (with JITted product and user code assemblies and other messages flowing through the process) to blow the process up to 200-500 MB in memory.

    One important consideration here is that if you are dealing with non-XML messages (usually called flat files), you have even more to worry about. Flat files seem to have evolved when transmitting data was paid for by the character, so they were designed to be as efficient as possible in order to minimize cost. XML explicitly stated this as a non-goal, with readability a much higher priority. Because BizTalk converts the business data (not attachments) to XML for internal processing, you have gone from a space-efficient format to one that, well, isn't. Effectively, a 1K flat file message becomes 4K or 5K in XML simply by adding the tags. For example, fields in flat files typically aren't named, so a 3 alone in its particular position might define, say, a quantity. If you use elements, that becomes <Quantity>3</Quantity>. Those tags take up space as well, and between the savings from missing optional elements and the XML expansion, we typically see a 3-4x increase in message size. So your flat files don't need to be very big, given reasonably detailed names, to turn into a huge stealth XML document. Because it never materializes on disk, most people never see it or think of it that way, but it blows up in memory and causes the system to consume more memory than you think it should.
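    A rough way to see the "stealth" expansion is to render one positional record as XML and compare sizes. The Python sketch below uses a made-up purchase-order line (the field names and values are hypothetical) and ignores padding, delimiters, and namespaces, all of which shift the real ratio; because this record has no padding to drop, its ratio comes out higher than the 3-4x typically seen. Treat it as illustrative, not as a measurement of BizTalk.

```python
def flat_size(fields):
    """Size of a positional flat record: just the values, no names or delimiters."""
    return sum(len(value) for _name, value in fields)


def xml_record(fields, record_name="Line"):
    """Render the same record as XML elements, one element per field."""
    body = "".join(f"<{name}>{value}</{name}>" for name, value in fields)
    return f"<{record_name}>{body}</{record_name}>"


# Hypothetical fixed-width record: part number, quantity, unit price, ship date.
record = [("PartNumber", "A1234"), ("Quantity", "3"),
          ("UnitPrice", "19.99"), ("ShipDate", "20050301")]

xml = xml_record(record)
print(flat_size(record), len(xml), round(len(xml) / flat_size(record), 1))  # 19 122 6.4
```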

    We've seen maps used in several distinct cases:

    Extract/Set properties necessary for performing business logic.

    In this case, the best approach is to use distinguished fields and/or property promotion in your process. Orchestration does not load the data of the message stream unless required, and that will happen with map execution. If instead the field you want to read or update is marked as a distinguished field, orchestration will "grab" the right value without loading the whole message into memory, and update the value during persistence, also without loading the message. This is a powerful means to manipulate key fields without loading the whole document into memory.

    Loop processing

    The condition is that each record is independent, but the requirement is that the messages be processed in order, so a map is used. This accounts for about half of the cases where large document styles 1, 3 and 5 have been used. In this case, the best approach is to couple the in-order capabilities of orchestration with the ability to disassemble the flat file into records. The flat file disassembler takes the documents one by one and publishes a separate message for each record. Each message is received into an orchestration that maps each message separately and sends it out using DeliveryNotification = true; assuming the destination is a file, it is a dynamic send where the filename is set and the mode is Append. This completes processing of all of the messages and maintains their order.
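    Here is a minimal Python sketch of that shape of processing, assuming a simple line-per-record flat file. `disassemble` stands in for the flat file disassembler, `map_record` is a hypothetical per-record map, and appending to the output file one record at a time stands in for the dynamic send with mode Append; finishing each write before handling the next record plays the role of DeliveryNotification = true. Only one record is held in memory at a time, which is the point of the pattern.

```python
import io
import os
import tempfile


def disassemble(lines):
    """Stand-in for the flat file disassembler: yield one record at a time, in order."""
    for line in lines:
        record = line.rstrip("\r\n")
        if record.strip():
            yield record


def map_record(record):
    """Hypothetical per-record map: turn 'sku, qty' into 'sku|qty'."""
    return "|".join(field.strip() for field in record.split(","))


def process_in_order(lines, out_path):
    """Map each record separately and append it to the output file in arrival order."""
    count = 0
    for record in disassemble(lines):
        # Open in append mode per record, like a send port whose mode is Append;
        # the write completes before the next record is handled.
        with open(out_path, "a", encoding="utf-8") as out:
            out.write(map_record(record) + "\n")
        count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out.txt")
    n = process_in_order(io.StringIO("A1, 3\nB2, 5\n\nC3, 1\n"), out_path)
    with open(out_path, encoding="utf-8") as f:
        result = f.read()

print(n, repr(result))  # 3 'A1|3\nB2|5\nC3|1\n'
```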

    The sticking point here is how the orchestration finishes, because a convoy subscription is required to correlate all of the messages together. One solution is to use a modified FixMsg component. On every call to Execute, it generates a GUID and promotes it on the message. In the stream processing, it prepends the string "FixMSGStart" to the stream and appends the string "FixMSGEnd". This attaches a starting message and an ending message to the stream, giving the orchestration definite message types that mark the start and end of a particular file. This avoids zombie instances.
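    The FixMsg idea can be sketched in a few lines of Python. This is not the actual FixMsg sample or BizTalk's pipeline API: the `context` dict stands in for the message context, `BatchId` is a hypothetical property name, and the line terminators after the markers are an assumption so that each marker lands on its own line. The second function shows why the markers help: the consumer knows exactly when a batch starts and ends instead of guessing, which is what keeps the convoy from leaving zombie instances.

```python
import uuid
from itertools import chain


def fix_msg(stream_chunks, context):
    """Wrap a message body with start/end markers and promote a per-call GUID.

    context stands in for the message context; "BatchId" is a hypothetical
    property name used for correlation.
    """
    context["BatchId"] = str(uuid.uuid4())  # new GUID on every call (every Execute)
    # Lazily prepend and append the markers without buffering the body.
    return chain([b"FixMSGStart\n"], stream_chunks, [b"FixMSGEnd\n"])


def collect_batch(lines):
    """Consume one wrapped batch; the end marker says deterministically when to stop."""
    it = iter(lines)
    if next(it, None) != "FixMSGStart":
        raise ValueError("batch does not begin with FixMSGStart")
    records = []
    for line in it:
        if line == "FixMSGEnd":
            return records
        records.append(line)
    raise ValueError("stream ended before FixMSGEnd")


context = {}
wrapped = b"".join(fix_msg([b"r1\n", b"r2\n"], context))
print(wrapped.decode().splitlines())  # ['FixMSGStart', 'r1', 'r2', 'FixMSGEnd']
print(collect_batch(wrapped.decode().splitlines()))  # ['r1', 'r2']
```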

    Aggregation

    The simplest form of this is accumulating some result across all of the records, for example accumulating a total price across a batch of line items. If the requirements are also simple, this can be an augmentation of the loop-processing case above, where a custom variable is used to accumulate the required value. Again, distinguished fields or property promotion work here to extract the field in a memory-efficient manner.
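    A minimal Python sketch of streaming aggregation, assuming a batch of `LineItem` records that each carry a `Price` element (hypothetical names). It reads only the one field it needs from each record and then discards that record's contents, which is the same idea as using a distinguished field instead of a map; it is an analogy, not BizTalk's implementation.

```python
import io
import xml.etree.ElementTree as ET
from decimal import Decimal


def total_price(xml_stream, record_tag="LineItem", field_tag="Price"):
    """Sum one field across all records while keeping only one record's data at a time."""
    total = Decimal("0")  # Decimal avoids binary floating-point rounding on prices
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == record_tag:
            total += Decimal(elem.findtext(field_tag, default="0"))
            elem.clear()  # discard the finished record's contents before the next one
    return total


batch = (b"<Batch>"
         b"<LineItem><Sku>A1</Sku><Price>1.50</Price></LineItem>"
         b"<LineItem><Sku>B2</Sku><Price>2.25</Price></LineItem>"
         b"</Batch>")
print(total_price(io.BytesIO(batch)))  # 3.75
```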

    Other

    This is where more complicated logic falls: combinations of the above in a single transform that cannot easily be factored into separate processing steps. The guidance here is pretty grim, as it comes down to the fundamental issue with map execution stated above. For this, the only thing we can really say is "good luck", because we can't control the amount of caching done by .NET, so we have no control over the growth of memory. At this point, the best we can say is that messages above 10MB of XML data are unsupported (this may correspond to smaller flat files; remember that the XML tags around each field count toward the total message size), because the memory requirements are not easily predictable and could cause OOM exceptions even when, on the surface, they shouldn't.

    It is difficult to describe the above in a single support statement, because of the considerations involved. When asked if we support large messages, the answer always comes to "it depends", and the above should give an idea on what the issues and potholes are.

    In summary:

    1. If you are looking at a total size of 1MB or higher and require mapping, investigate the style of mapping required to see if it can be done without using the mapping tool. If orchestration coupled with in-order processing will accomplish your needs, go with that, taking care to avoid races that force the instance to complete prematurely (so-called "zombied" instances, as Lee earlier discussed).
    2. Mapping works up to about 10 MB, where the complexity of the map or the number of nodes can cause significant memory inflation, 10x or more. Beyond that, given the current architecture, it becomes much harder to get working.
    3. Even without mapping, you can still get burned by flat files wrapped in CDATA sections. If that data gets past 1MB, you are going to have issues; 10MB or more is not recommended and will not be reasonably supported.
    4. Otherwise, the major consideration is the fragment size that is controlled at the group level. See the performance whitepaper for details on the messaging fragmentation threshold. These cases fall under almost pure routing and can handle far bigger messages, up to 1GB.
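    For quick reference, the rules of thumb in this summary can be written down as a small Python function. It only restates the thresholds above (1MB and 10MB for mapping and for CDATA-wrapped data, fragment size for routing); the function name, parameters, and wording of the advice are illustrative, and real limits depend on the map, the machine, and the load.

```python
MB = 10**6


def large_message_advice(xml_size_bytes, needs_mapping, cdata_wrapped_flat_file=False):
    """Map a scenario onto the rough rules of thumb from the summary above."""
    advice = []
    if cdata_wrapped_flat_file:
        if xml_size_bytes >= 10 * MB:
            advice.append("CDATA-wrapped data of 10MB or more: not recommended")
        elif xml_size_bytes > 1 * MB:
            advice.append("CDATA-wrapped data over 1MB: expect memory issues")
    if needs_mapping:
        if xml_size_bytes > 10 * MB:
            advice.append("mapping over ~10MB: very hard to make work; avoid the mapper")
        elif xml_size_bytes >= 1 * MB:
            advice.append("mapping at 1MB or more: look for an alternative such as "
                          "disassembly plus in-order orchestration")
    if not advice:
        advice.append("mapping under 1MB: below the thresholds discussed" if needs_mapping
                      else "routing: main consideration is the group-level fragment size")
    return advice
```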

    Hopefully, this helps clear up some of the confusion (or at the very least, explain why there is much confusion) around the large message story.  Interestingly, this is actually significantly improved from BTS 2000/2002, since we could never handle messages on the order of a GB even in pure routing scenarios.  And yes, we are working on this for future releases.  We understand that this is painful, and are working very hard to try and make more large message scenarios easy with BizTalk Server.

  • New poster on the core engine blog

    Hi,

    Lee's been trying for a while now to get more of us to post our knowledge to the blog, so here I am.  My name is Jean-Emile Elien, and I work on the BizTalk Messaging Runtime along with Lee. I've been keeping notes on a couple of topics that I have been seeing a lot of discussion on lately, so I thought it was time to actually post something.  My first post will be on large message handling in BTS 2004, the good, the bad, the ugly.  This is not well understood by, well it seems anyone, so I'm going to see if I can shed some light on this.

    Just saying hi, and hopefully Lee will stop sending me emails to add something.  :)

  • SP1 has shipped

    Just want to let everyone know that it is out and it is goodness. Woodgate has posted links to it as well as some information on where to download the .Net Framework GDR on which we have a dependency just in case you miss it in our docs. Check out his links for more information.

    Info on .Net GDR

    Download SP1

    Thx

    Lee


© 2008 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement