
Call orchestration with ref parameter in a loop

Some time back I worked on a case where the spool count would grow very high. I have been meaning to blog about it for a while, and today I decided to finally process this queued item in my mind.

I had a customer who saw a surge in the spool count whenever an orchestration instance started; the spool would return to its original count only once that instance completed. Since a high spool count degrades BizTalk processing, the customer was understandably worried about it.

Looking into the orchestration, we found a Loop shape that could iterate 40K times or even more. Inside this loop, they were calling an orchestration with a BizTalk message passed as a ref parameter.

We found that with a ref parameter, BizTalk has to keep the messages around until the calling orchestration completes. That is why the spool count keeps growing until the loop finishes; once the orchestration instance completes, the BizTalk jobs clean the messages up from the spool. This is by design.

To demonstrate this, we built a simple orchestration with a loop inside. It loops only 50 times.

image

Inside the loop, we call another orchestration and pass a BizTalk message as a ref parameter.

image

If we test it and monitor the spool count in perfmon, we see that the spool keeps growing until the loop completes; once the orchestration instance completes, the BizTalk jobs quickly clear the spool.

image

As a workaround for the ref parameter, we can pass the message as an in parameter and add an out parameter of the same type to the called orchestration to return the result. With this approach I see lower spool growth than with a ref parameter, but for very large loops the spool count can still grow substantially.

image

Also, where possible, we can use variables instead of messages, since variables can be reused.

So the moral of the story is to avoid large loops that use a Call Orchestration shape with a ref parameter. We suggest using a smaller loop, or removing the call structure altogether by using an Expression shape, a custom object, etc.

Posted by Atin Agarwal | 0 Comments

How to join two schemas in a map when they contain namespaces

I worked on an issue where we received two different messages (Message1 and Message2) inside an orchestration, each with multiple records. We had to join these messages in a map inside the orchestration based on TRANID and PROCID. Basically, for every TRAN record in Message1, we have to take the value of TRANID, look for the record in Message2 whose PROCID matches it, get the value of that record's SUM element, and populate it in the output message.

clip_image002

Input Message1

clip_image004

Input Message2

clip_image006

Output Message

There is already a solution for this at <http://geekswithblogs.net/synboogaloo/archive/2005/04/22/37335.aspx>. The only difference is that in this case the input schemas contain namespaces, which complicates the XPath query used in the XSLT call template. We will see in the steps below how to quickly build an XPath query that uses namespaces in this scenario.

1) First, create the schemas for the three messages.

2) Now create an orchestration like the one below. It receives the two input messages, constructs the output message using a map inside a Transform shape, and then sends the output to a file location. Since we have to receive two different messages using the File adapter and we do not have any unique identifier for correlation, I used BTS.ReceivePortName for correlation. This may not be very practical, but I just want to demonstrate joining two schemas that have namespaces. Do make sure that both receive locations are in the same receive port.

image

3) In the transform shape, select the msgInput1 and msgInput2 as Inputs and msgOutput as Output. Open the map now.

4) In the map, add a Looping functoid as in the figure below. Link TRANID and DESC from the input to the output. The main thing remaining is to retrieve the value of SUM from INPUT2 where PROCID matches TRANID. Now drop a Scripting functoid onto the map and connect TRANID as its input and SUM as its output. It should look something like the figure below.

image

5) If the schemas did not have namespaces, the Scripting functoid would look something like the one below. But here, this XPath query inside the ‘Inline XSLT Call Template’ gives us only blank results.

image

6) So we need to use the namespace prefix in our query to get the desired output, which means finding out the namespace prefix the map uses for the INPUT2 schema. To find it, we need to view the XSLT of the map, so we have to validate the map first: right-click the map file and choose Validate Map. The output then contains a link to the map's XSLT. Open this XSL file; in it we can find the prefix used for the namespace “http:\\Input2”, which is ‘s1’.

image

7) Now use the prefix "s1" inside the XPath query as below. There may be other ways to write the XPath query, but this was the best I could figure out.

image

8) Deploy the solution and test it. That’s all.

I believe there are several other solutions for this; if you know one, please share it. I would really be interested to know.
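To see the namespace problem from steps 5 to 7 outside BizTalk, here is a minimal Python sketch of the same join. The element layout (TRAN and PROC as direct children of the roots), the sample values, and the namespace URIs http://Input1 and http://Input2 are assumptions made for illustration, since the real schemas are only shown in the screenshots. The prefixes s0 and s1 simply mirror the kind of prefix the map XSLT assigns. This is not the XSLT that the map generates, only an illustration of why an unprefixed query finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample messages; every element is in its schema's namespace.
INPUT1 = """<Input1 xmlns="http://Input1">
  <TRAN><TRANID>1</TRANID><DESC>first</DESC></TRAN>
  <TRAN><TRANID>2</TRANID><DESC>second</DESC></TRAN>
</Input1>"""

INPUT2 = """<Input2 xmlns="http://Input2">
  <PROC><PROCID>2</PROCID><SUM>20</SUM></PROC>
  <PROC><PROCID>1</PROCID><SUM>10</SUM></PROC>
</Input2>"""

# Prefix-to-namespace mapping, playing the role of the xmlns:s0/xmlns:s1
# declarations found in the map's XSLT (URIs are assumed).
NS = {"s0": "http://Input1", "s1": "http://Input2"}


def unprefixed_match_count(input2_xml):
    """Count PROC records found by a query that ignores the namespace."""
    return len(ET.fromstring(input2_xml).findall("PROC"))


def join_messages(input1_xml, input2_xml):
    """Return (TRANID, DESC, SUM) for each TRAN, matching TRANID to PROCID."""
    doc1 = ET.fromstring(input1_xml)
    doc2 = ET.fromstring(input2_xml)

    # Index Input2 by PROCID using namespace-qualified paths.
    sums = {}
    for proc in doc2.findall("s1:PROC", NS):
        proc_id = proc.findtext("s1:PROCID", default="", namespaces=NS)
        sums[proc_id] = proc.findtext("s1:SUM", default="", namespaces=NS)

    rows = []
    for tran in doc1.findall("s0:TRAN", NS):
        tran_id = tran.findtext("s0:TRANID", default="", namespaces=NS)
        desc = tran.findtext("s0:DESC", default="", namespaces=NS)
        rows.append((tran_id, desc, sums.get(tran_id, "")))
    return rows
```

Calling unprefixed_match_count(INPUT2) finds no records, which corresponds to the blank results in step 5, while join_messages(INPUT1, INPUT2) pairs each TRAN with its SUM once the prefixes are used.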

Posted by Atin Agarwal | 0 Comments

Throttling

In BizTalk we sometimes see server performance drop: suddenly the server becomes slow.

Most of the time this is because of host throttling.

Now, what is host throttling? :) See http://msdn.microsoft.com/en-us/library/aa559591.aspx

The throttling mechanism moderates the workload of the host instance to ensure that the workload does not exceed the capacity of the host instance or any downstream host instances.

There are many scenarios in which a host throttles. It can be because of process memory pressure, a large database size, a high thread count, and a few more.

It is easy to find the reason for throttling. Go to Start --> Run, type perfmon, and switch to the Report view (as shown in the following figure).

clip_image002[4]

In the Report view, choose the BizTalk:Message Agent counters.

clip_image002[6]

Then select all counters and all instances, and click Add.

image

The two counters I have highlighted are the throttling states for publishing and delivery. These states tell us whether BizTalk is throttling.

If the value of a state is zero, there is no throttling. If the value is anything other than zero, the host is throttling, and the tables below tell you what type of throttling it is.

Message delivery throttling state

A flag indicating whether the system is throttling message delivery (affecting XLANG message processing and outbound transports).

  • 0: Not throttling
  • 1: Throttling due to imbalanced message delivery rate (input rate exceeds output rate)
  • 3: Throttling due to high in-process message count
  • 4: Throttling due to process memory pressure
  • 5: Throttling due to system memory pressure
  • 9: Throttling due to high thread count
  • 10: Throttling due to user override on delivery

Message publishing throttling state

A flag indicating whether the system is throttling message publishing (affecting XLANG message processing and inbound transports).

  • 0: Not throttling
  • 2: Throttling due to imbalanced message publishing rate (input rate exceeds output rate)
  • 4: Throttling due to process memory pressure
  • 5: Throttling due to system memory pressure
  • 6: Throttling due to database growth
  • 8: Throttling due to high session count
  • 9: Throttling due to high thread count
  • 11: Throttling due to user override on publishing

Refer to the following link for more information: http://msdn.microsoft.com/en-us/library/aa578302.aspx

So by looking at the state you can tell the reason for throttling, and depending on the state you can find the solution.
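If you collect these counters from a script, the two tables above can be captured as a small lookup. The following is only a sketch in Python: the state codes and descriptions are copied from the tables in this post, while the function name and the "delivery"/"publishing" keys are hypothetical names chosen for the example.

```python
# State codes copied from the two tables above.
DELIVERY_STATES = {
    0: "Not throttling",
    1: "Throttling due to imbalanced message delivery rate (input rate exceeds output rate)",
    3: "Throttling due to high in-process message count",
    4: "Throttling due to process memory pressure",
    5: "Throttling due to system memory pressure",
    9: "Throttling due to high thread count",
    10: "Throttling due to user override on delivery",
}

PUBLISHING_STATES = {
    0: "Not throttling",
    2: "Throttling due to imbalanced message publishing rate (input rate exceeds output rate)",
    4: "Throttling due to process memory pressure",
    5: "Throttling due to system memory pressure",
    6: "Throttling due to database growth",
    8: "Throttling due to high session count",
    9: "Throttling due to high thread count",
    11: "Throttling due to user override on publishing",
}


def describe_throttling(kind, state):
    """Translate a throttling state counter value into its description.

    kind is "delivery" or "publishing" (hypothetical keys for this sketch).
    """
    tables = {"delivery": DELIVERY_STATES, "publishing": PUBLISHING_STATES}
    if kind not in tables:
        raise ValueError("kind must be 'delivery' or 'publishing'")
    return tables[kind].get(state, f"Unknown {kind} throttling state {state}")
```

For example, describe_throttling("publishing", 6) returns the database growth description, which is the case discussed next.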

For example, if the message publishing throttling state is 6, it is because of a large database size. This can happen if the BizTalk databases are not being maintained properly, for instance because not all of the required jobs are running. You need to have a look at the jobs here.

If the message publishing throttling state is 4, it is because of process memory pressure.

In that case, go to the host's Properties and open the throttling thresholds. There you can raise the process memory threshold and check whether the throttling goes away.

clip_image002[9]

The host has many settings related to throttling. Refer to http://msdn.microsoft.com/en-us/library/aa559628.aspx to understand more about these settings.

Hope this helps.

Posted by sachipa | 0 Comments

Messages not appearing in the dropdown box of Transform configuration

If we use a multi-part message inside an orchestration and any of its parts is of a simple type, then we cannot directly access any part of that multi-part message from the Transform shape inside the orchestration.

This is a limitation of the multi-part message type. Only if the multi-part message type contains ONLY schema types can it be accessed directly in a Transform shape; otherwise we cannot access any of its parts from the Transform shape. This commonly happens when we reference a WSDL that contains both simple-type and complex-type message parts: the resulting parts of the multi-part message type are not available for selection in the Transform configuration of the orchestration.

Let's say we referenced a WSDL file and the resulting multi-part message type contains parts of both simple and schema types. In the figure below, the multi-part message type contains 2 parts of simple types (outlined in red) and 1 part of a schema type (outlined in green).

image

Now, create a message of this Web Message Type.

image

If we drop a transform shape in the orchestration, we will not see any message in the drop down box. It means that we cannot directly access any part of this Multi-part message from the transform shape.

image

So now the question is: how do we construct this multi-part message so that we can call our web service? Let's figure it out.

The workaround is to first create a message in the Orchestration View for the schema-type part of the multi-part message.

Untitled

We can now select this message in the Transform configuration and easily construct this part of the multi-part message in the Transform shape.

untitled3

Now the only thing remaining is to construct the multi-part message itself. Once we have constructed the schema-type parts, we can easily construct the multi-part message using a Message Assignment shape, as shown in the figure below. Here we have assigned values directly to the simple-type parts and assigned the schema-type message created above to the schema part of the multi-part message.

untitled

So, this is what my orchestration looks like.

untitled1

Posted by Atin Agarwal | 2 Comments

BAM tracking data gone missing

Recently, I worked on a BAM issue and I wanted to share with you all some interesting facts which I found while researching on this.

The customer had implemented BAM activities with event streams, using OrchestrationEventStream within orchestrations.

In short, this is what their flow looks like:

Orch1:  Start Orch1 --> BeginActivity --> UpdateActivity --> EnableContinuation --> Send message to Orch2 (which instantiates it), passing the continuation token --> EndActivity
Orch2:  Start Orch2 --> UpdateActivity with continuation token --> EndActivity with continuation token

The customer observed that sometimes there is a row in the BAM Active table with ActivityId set to the ContinuationToken and IsVisible set to NULL. This tracked BAM data sits idle in the Active table and never moves to the Completed table. Also, since IsVisible is NULL, the row is not shown through a BAM view or the BAM Portal unless you write a custom query.

Researching this, I found that IsVisible being NULL means that BeginActivity was never called. So what was happening in the customer's case was that the BAM events from Orch2 were getting processed and reaching BAMPrimaryImportDB, while the BAM events from Orch1 never made it or got stuck somewhere in between. The question was how the events from Orch1 could go unprocessed while the events from Orch2, which is instantiated after Orch1, were processed, keeping in mind that the issue was intermittent.

Let's now understand how the OrchestrationEventStream (OES) API works. The OES APIs are asynchronous: the API first stores the tracking data in the BizTalk MessageBox database, and periodically the data is processed and persisted to the BAMPrimaryImport database by the Tracking Data Decode Service (TDDS). Four tables inside the MessageBox database store BAM tracking data before it is moved to BAMPrimaryImportDB: trackingdata_0_0, trackingdata_0_1, trackingdata_0_2, and trackingdata_0_3 (note: the other four tables, trackingdata_1_x, store the tracking data for BizTalkDTADb). For a particular orchestration instance, OES uses the orchestration ID as the StreamID, so all of its events are written to the same table and TDDS can guarantee that they are processed in order. http://msdn.microsoft.com/en-us/library/microsoft.biztalk.bam.eventobservation.bufferedeventstream.streamid.aspx

But since we have two different orchestrations here, their BAM tracking data can go to different tracking tables inside the MessageBox, and then we cannot guarantee that it is processed in sequence. However, since we are using continuations, BAM guarantees that once all the events have been processed (no matter in what sequence), we see the correct result in the BAM Completed table. That's the magic of BAM.

We then queried the four trackingdata tables inside the MessageBox database and found that trackingdata_0_0 had more than 50000 rows, a count that kept rising and never decreased. It seemed that the BAM data stored in this table was stuck there and TDDS was not able to move it to the BAMPrimaryImport database. This explains why we sometimes saw BAM events from Orch2 in BAMPrimaryImport but not those from Orch1: the events from Orch2 had been stored in a table other than trackingdata_0_0 and were successfully processed by TDDS. It also explains the intermittent nature of the problem, since at other times the BAM events from both Orch1 and Orch2 were written to tables other than trackingdata_0_0. And sometimes the events from both Orch1 and Orch2 could be written to trackingdata_0_0, in which case we would see no BAM data tracked in BAMPrimaryImport at all.
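The reasoning above boils down to a small decision table, sketched here in Python. This is only a model of the explanation in this post, not of how TDDS is implemented. The function name and the outcome strings are hypothetical, and the case where only the Orch2 events are stuck was not observed in this case, so its outcome is an assumption based on how continuations are described above.

```python
# Outcome labels for this sketch (hypothetical names).
COMPLETED = "row reaches the Completed table"
ORPHAN = "Active row with ActivityId = continuation token and IsVisible NULL"
WAITING = "Active row waiting for the continuation to end (assumed, not observed)"
NOTHING = "no BAM data in BAMPrimaryImport"


def bam_outcome(orch1_table, orch2_table, stuck_tables):
    """Predict what shows up in BAMPrimaryImport for one Orch1/Orch2 pair.

    orch1_table and orch2_table name the trackingdata table that received
    each orchestration's events; stuck_tables is the set of tables that
    TDDS is failing to drain.
    """
    orch1_processed = orch1_table not in stuck_tables
    orch2_processed = orch2_table not in stuck_tables

    if orch1_processed and orch2_processed:
        return COMPLETED
    if orch2_processed:
        # Orch2's updates arrive but BeginActivity from Orch1 never does.
        return ORPHAN
    if orch1_processed:
        return WAITING
    return NOTHING
```

With stuck_tables = {"trackingdata_0_0"}, the three situations described above map to the ORPHAN, COMPLETED, and NOTHING outcomes, depending on which table each orchestration's events were written to.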

Later, on more digging, the issue was found to be external to BizTalk: it was related to the network and firewall between the BizTalk and SQL servers. After those issues were fixed, all the data from the trackingdata_0_0 table moved to BAMPrimaryImportDB correctly.

Posted by Atin Agarwal | 0 Comments

Message for the File Send Adapter remains in active state

This was an interesting issue on which I worked with Shaheer (Biztalk Escalation Engineer) for 2 customers, so I thought I would share it with all.

The customer is using the File adapter on a send port to write files to a file server. Sometimes the outgoing message gets stuck inside BizTalk with Active status and a 0 KB file is created on the share on the file server. The message remains in the Active state until we restart the host instance, after which the message is written to the file share.

We took BizTalk traces. Although they showed nothing for the send ports other than that they were stuck, we could see that at the same time the receive locations hitting the same file server were throwing error 0x800703E5 and Win32 Error = 56 in the traces, and sometimes even 80070038:

[filelistener]Network connection to location (\\<fileshare>) is down. Error = 80070038

--- 0x80070038 means:
ERROR_TOO_MANY_CMDS winerror.h
# The network BIOS command limit has been reached.

--- 0x800703E5 means:
ERROR_IO_PENDING winerror.h
Overlapped I/O operation is in progress.

--- Win32 Error = 56 means:
Network Bios Command limit has been reached.
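The two ways the limit shows up in the trace (0x80070038 and Win32 Error = 56) are the same error. An HRESULT whose facility is 7 (FACILITY_WIN32) carries the Win32 error code in its low 16 bits. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the bit layout matches the HRESULT macros in winerror.h.

```python
FACILITY_WIN32 = 7


def win32_code_from_hresult(hresult):
    """Extract the Win32 error code wrapped inside a FACILITY_WIN32 HRESULT."""
    facility = (hresult >> 16) & 0x1FFF  # facility field of the HRESULT
    if facility != FACILITY_WIN32:
        raise ValueError("not a Win32-derived HRESULT")
    return hresult & 0xFFFF  # low 16 bits hold the Win32 error code


# 0x80070038 wraps Win32 error 56 (ERROR_TOO_MANY_CMDS).
too_many_cmds = win32_code_from_hresult(0x80070038)
# 0x800703E5 wraps Win32 error 997 (ERROR_IO_PENDING).
io_pending = win32_code_from_hresult(0x800703E5)
```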

This means the network BIOS command limit has been reached between the BizTalk Server and the file server. This can happen when there is a very high load of SMB traffic between Windows servers, since the default network BIOS command limit may not be high enough to handle this load. This affects any SMB traffic between BizTalk Server and that file server, so even though the error in the trace occurred for a receive location, it applies equally to a send port hitting the same file server.

Now comes the resolution part:

Follow the steps below, as per KB 810886, to increase the command limit by setting BOTH of the following registry values on all the BizTalk Servers AND the file server AND any other file server that you may have problems with:
1. Click Start, click Run, type regedit, and then click OK.
2. Locate and then click the following key in the registry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanworkstation\parameters
3. In the right pane, double-click the MaxCmds value and in the Value data box change the value to decimal 5000. (If the MaxCmds value does not exist, create it as a new REG_DWORD value of decimal 5000)
4. Locate and then click the following key in the registry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters
5. In the right-pane, double-click the MaxMpxCt value and in the Value data box change the value to decimal 5000. (If the MaxMpxCt value does not exist, create it as a new REG_DWORD value of decimal 5000)
6. Quit Registry Editor.
7. Reboot the computer

We suggested the value 5000 above, taking the less cautious route of starting with a high value and working your way down. If you want to be cautious, you can start by increasing it to 1000, check whether you still have the problem, and if so increase it again. In a very high load environment even 5000 may not be enough, and the value would have to be increased further. The only downside we could think of with a high value, even one as high as 50,000, is that the file server may run into high CPU usage if it can't handle the load.

[Note: The maximum number of simultaneous, active requests between an SMB client and the server is determined when a client/server session is negotiated. The maximum number of requests that a client supports is determined by the MaxCmds registry value. The maximum number of requests that a server supports is determined by the MaxMpxCt registry value. For a particular client and server pair, the number of simultaneous, active requests is the lesser of these two values. This is documented at KB 810886.]
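The note above is the reason both values have to be raised on both sides. A tiny Python sketch of the rule follows; the function name is hypothetical and the numbers are only the illustrative values mentioned in this post.

```python
def effective_smb_limit(client_max_cmds, server_max_mpx_ct):
    """Simultaneous active requests for one client/server pair.

    Per the note above, this is the lesser of the client's MaxCmds
    and the server's MaxMpxCt.
    """
    return min(client_max_cmds, server_max_mpx_ct)


# Raising MaxCmds on the BizTalk Server alone does not help if the
# file server's MaxMpxCt stays lower.
only_client_raised = effective_smb_limit(5000, 1000)
both_raised = effective_smb_limit(5000, 5000)
```

Here only_client_raised is still capped by the server-side value, while both_raised reaches the full 5000.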

Posted by Atin Agarwal | 0 Comments

Simple Task of Passing a Double Quote to a String Functoid

This is probably not limited to just double quotes but what seemed to be a simple thing gave me quite a few head-scratching minutes yesterday. 

The task was straightforward.  Customer has a double quote as part of a data string inside of a XML element.  For example: 

<attributeData>data"withdoublequote</attributeData>

The mapping logic required us to output the location of the double quote within the data string.  OK, no problem.  We would use the String Find functoid for this. 

image

 

Unfortunately, when we used the double quote character as 2nd parameter to the functoid, we got an error when testing map:

'userCSharp:StringFind(string(attributeData/text()) , """)' is an invalid XPath expression. This is an unclosed string.

OK, so maybe I needed to use an escape sequence with the double quote character.  For the next 15 minutes, I tried different combinations, including &QUOT; and \”.  None of them worked.

Of course, we could go with the old stand-by, a Script Functoid, with the following inline C#:

public int findQuote(string str)
{
    return (str.IndexOf("\"") + 1);
}

That worked fine, but it is not as easy to maintain and reuse.  In the end, the combination that worked used an ASCII to Character Conversion functoid together with the String Find functoid.  For the ASCII to Character functoid, I used the ASCII value 34 for the double quote character.  I then connected its output to the String Find functoid as the 2nd parameter and got the correct position.  It is just a simple trick, and I think it can be used with other “special” characters as well.
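Outside the mapper, the same trick looks like the short Python sketch below: build the double quote from its ASCII value instead of typing it as a literal, then report a 1-based position. It mirrors the inline C# above rather than the functoid implementation, and the function name is hypothetical.

```python
def find_quote(text):
    """Return the 1-based position of the first double quote, or 0 if absent."""
    quote = chr(34)  # ASCII 34 is the double quote character
    # str.find returns -1 when the character is missing, so adding 1
    # gives 0, matching the C# IndexOf(...) + 1 behaviour.
    return text.find(quote) + 1


position = find_quote('data"withdoublequote')
```

For the sample element content used in this post, position is 5, because the quote follows the four characters of "data".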

image

 

image

Posted by evyang | 0 Comments

Unable to start BizTalk RFID Process when it is bound to too many devices (Device Provider specific issue)

Recently I worked on a very interesting BizTalk RFID issue where the RFID Services appear to hang when trying to start a process that is bound to 25 devices. After a great deal of isolation and some invaluable assistance from Mark Simms (Thank you Mark!) we were able to identify that the issue was occurring as the custom provider was running out of worker threads in its .NET Thread Pool.

The symptoms observed were as follows:

2 BizTalk RFID Processes

1 of them connected to 6 devices and the other to 15 devices.

The minute we try to bind one of the processes to 3 more devices (a total of 24 devices), the performance seen in RFID Manager deteriorates and all the devices go into the Retrying state.

(The performance deterioration mentioned here does not refer to the performance of RFID Manager itself, but to the behavior observed when the MMC tries to retrieve the latest state of the various components (providers, devices, processes, etc.), which basically means that the MMC is waiting for the Device Manager, Provider Manager, or Process Manager to respond.)

After being busy for a very long time (much longer than the default device connection timeout of 00:01:00), the MMC finally returns with the following error:

"This request operation sent to net.tcp://crpcltiis51e:7891/rfid/service/ProcessManager did not receive a reply within the configured timeout (00:01:00).  The time allotted to this operation may have been a portion of a longer timeout.  This may be because the service is still processing the operation or because the service was unable to send a reply message.  Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client."

Troubleshooting

Considering that there were too many variables involved (multiple processes, numerous devices, a 3rd party device provider), I decided to start from a more basic test and then add each of the components gradually. Here are some of the questions/tests I had in mind to try to isolate the issue:

1. Does the issue happen with a single process connected to multiple devices?

Yes, this was indeed the case. A single process was also failing when trying to connect to 25 devices.

2. What is the threshold at which we were failing?

We created a test process and bound it to 5 devices to start with; the test was successful. We added 5 more devices in each successive test and found that we could connect to 20 devices successfully. It worked even with 22 devices, but failed after binding the process to the 23rd device.

3. Could the issue be with the 23rd device that we just added?

It could be, but that was not the case here: we were able to connect to that device when it was the only one in the binding list. The failure occurs only when a process is bound to too many devices.

4. Is there a ceiling to the number of devices that this specific provider can manage?

Not for this provider. This was confirmed by the fact that the customer had other implementations where the same provider was being used to manage/connect to more than 28 devices.

So far, here is what we know:

- There is no limit that BizTalk RFID service imposes, to the number of devices that an RFID process can connect to.

- There is no limit imposed by the Custom provider either.

- This has got to be an environmental issue.

5. What does the RFID Services log show?

For devices where we were able to connect successfully, the following combination of entries was found in RFIDServices.log:

48| Info|031909 15:31:44|In UpdateDeviceConnectionsThreadPoolProc for device Device1|[DeviceManager] (The point where we start connecting to a device)

...

...

48| Info|031909 15:31:46|Setup connection on Device1 succeeded|[Device] (Confirms a successful connection attempt)

versus, for the device where the connection attempt failed, the following combination of entries were found:

35| Info|031909 15:32:00|In UpdateDeviceConnectionsThreadPoolProc for device Device23|[DeviceManager]

35| Info|031909 15:32:00|Opening the device Device|Device23..

...

35|Warning|031909 15:33:08|Dedicated thread SetupConnection35 timing out. Regaining control forcibly by firing an event and throwing an exception|[ProviderManager]

35|Warning|031909 15:33:08|Provider CustomProvider Provider misbehaved: ThreadOrphaned|[ProviderManager]

35| Error|031909 15:33:08|The device Device|Device23... could not be connected to because of exception [The method SetupConnection has timed out. If this is a provider/device operation, the timeout can occur if the provider/device could not complete the operation within a specified time period. If you encounter this frequently, consider increasing the value of delegateThreadTimeoutMilliseconds in ProviderManagerConfiguration under RfidServerRuntimeConfiguration.] [ at MS.Internal.Rfid.Service.Provider.LocalDomainDeviceMarshaller.RunDelegateMethod(String methodName, Type[] externalTypes, Object[] externalargs)

at MS.Internal.Rfid.Service.Provider.LocalDomainDeviceMarshaller.SetupConnection(AuthenticationInformation authenticationInfo)

at MS.Internal.Rfid.Service.Devices.RfidDevice.ActuallySetupConnection(Boolean fCheckAndHandleConflicts)].|[Device]

(Notice the time difference between the point where the connection attempt started and the point of the error - so it makes sense why a timeout was occurring.)
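To make the time difference concrete, the timestamps from the log entries above can be subtracted directly. This Python sketch assumes the log timestamp format is MMDDYY HH:MM:SS (so 031909 is March 19, 2009), which is an inference from the entries shown, and the function name is hypothetical.

```python
from datetime import datetime

# Assumed layout of the timestamp column in RFIDServices.log.
LOG_TIME_FORMAT = "%m%d%y %H:%M:%S"

# Default device connection timeout of 00:01:00 mentioned earlier.
DEFAULT_TIMEOUT_SECONDS = 60


def elapsed_seconds(start, end):
    """Seconds between two log timestamps given as 'MMDDYY HH:MM:SS'."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


# Device1: connection attempt to success.
device1 = elapsed_seconds("031909 15:31:44", "031909 15:31:46")
# Device23: connection attempt to the SetupConnection timeout error.
device23 = elapsed_seconds("031909 15:32:00", "031909 15:33:08")
```

Device1 connects in 2 seconds, while the Device23 attempt runs for 68 seconds, which is past the 60 second timeout.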

In fact on tracking the connection time to the various devices in the device binding list, here is what we saw:

clip_image002

This performance degradation, seen as near-linear growth in connection times, seemed to indicate that we were running out of some resource over a period of time.

(Sure, at this point we could increase the device connection timeout as suggested in the initial error message - but this would be like applying a band-aid on a more serious problem underneath. Reason: why does it take more than 1 minute to connect to a device under one scenario i.e. too many devices in the binding list, whereas it takes almost no time to connect to the same devices in a different scenario i.e. fewer devices being bound to.)

The next logical step would be to look into the provider logs to find out what was happening with the corresponding connection attempt at the provider level. To be very honest, I tried to identify the specific connection attempt but was unsuccessful :( - hence I thought of reaching out to the provider writers i.e. 3rd party vendors, and see if they could find anything in the provider logs corresponding to the "Provider misbehaved" warning.

6. On speaking with the provider writers we learnt that they were using the .NET Thread Pool to manage connections to the various devices. This prompted us to review the following article:

Contention, poor performance, and deadlocks when you make Web service requests from ASP.NET applications

This confirmed that the resource we were running out of over a period of time was .NET Worker Threads. On increasing the following settings in machine.config we were able to resolve this issue:

Configuration Setting           Value
maxconnection                   12 * #CPUs
maxIoThreads                    100
maxWorkerThreads                100
minFreeThreads                  88 * #CPUs
minLocalRequestFreeThreads      76 * #CPUs

Please note: the above values are only recommended values and are by no means the *exact* values needed to resolve the issue. Since this is a performance tuning scenario, the right values depend on the environment.
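Since three of the settings scale with the number of CPUs, it can help to compute the concrete numbers for a given machine before editing machine.config. The Python sketch below only applies the formulas from the table above; the function name is hypothetical, and the results are starting points for tuning, not prescribed values.

```python
def recommended_thread_settings(cpu_count):
    """Apply the per-CPU formulas from the table above to a CPU count."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return {
        "maxconnection": 12 * cpu_count,
        "maxIoThreads": 100,
        "maxWorkerThreads": 100,
        "minFreeThreads": 88 * cpu_count,
        "minLocalRequestFreeThreads": 76 * cpu_count,
    }


# Example: a 4-CPU server (the CPU count is an illustrative assumption).
settings = recommended_thread_settings(4)
```

For 4 CPUs this gives maxconnection 48, minFreeThreads 352, and minLocalRequestFreeThreads 304, with both thread limits at 100.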

Slow component instantiation

I am dedicating this post to Jozsef because I’d have missed out on this interesting issue without his escalation. 

Yesterday Jozsef sent out an email about an orchestration performance issue.  I thought this would be a good test run for PTrace after Dwaine improved the speed of that utility.  The issue sounded strange.  The customer complained that his orchestration was slow.  The unusual part was that, after using the orchestration debugger, Jozsef isolated the bottleneck to the start of an atomic scope.  The delay was actually 16-17 seconds.  The delay was so far out of the normal range that I thought at first it must be a debugger error.  I should know better than to blame the tool.

After getting a BizTalk trace, I parsed it in PTrace and loaded up the orchestration events.  Sure enough, there was a huge gap of time between the completion of an expression shape and the beginning of an atomic scope.  That just didn’t make any sense at first.  The delay was at the start of the scope, we didn’t even hit any shape in the scope yet.  We were not at the end of a scope or just completed a send so persistence didn’t even come into play.  Filtering down to the thread level in the trace, I saw that the delay happened right before AtomicTransaction.CreateInstance().  I did a quick look up in the source code to confirm that this event is logged AFTER we have completed object instantiation, not before.  As it turned out, customer declared a variable at the scope level so all signs are pointing at the delay being caused by this variable’s instantiation.  Even then, hard to believe what in the constructor can cause such delay. 

Fortunately the customer shared his .odx file with us.  It was easy to find the variable and the type associated with it.  I was able to reflect the constructor.  The only thing that jumped out was a call into the static DatabaseFactory class in the Enterprise Library.  I searched through KB articles and on the web, thinking that we might be hitting a known issue specific to this method.  To my surprise, I got many hits on delays caused by certificate revocation list lookups that the .NET CLR performs against signed assemblies.  In this case, the Enterprise Library assembly was signed with a Microsoft certificate.  Finally we had some direction.  If there was a networking issue where the CLR couldn't reach a certificate revocation list server, that would delay the assembly load until the CLR gave up on the lookup.  I had a similar issue recently with MSIEXEC trying to do the same with a signed assembly as part of a hotfix package.  All we had to do to verify this hypothesis was to temporarily disable the certificate revocation check, which you can do within Internet Explorer (Internet Options –> Advanced tab).  The customer tested this and confirmed that it resolved the issue.  We no longer see the 16-17 second delay.

I am not saying you should disable certificate revocation check permanently.  That is obviously a security risk.  However, if you ever run into similar issue, it is simple to test this possibility.  If it turns out to be the same issue, you can shift your troubleshooting to why you are unable to reach the certificate revocation list server from your BizTalk machine.

Posted by evyang | 0 Comments

BizTalk Accelerator for RosettaNet (BTARN) - continued

Troubleshooting BTARN Web Pages

One of the more frustrating things about working with BTARN is troubleshooting the web pages. The code causes the error event log entry to be either 400 or 500. The actual error is lost.

private void HandleError(ErrorLevel level, Exception failure)
{
   ExceptionManager.Publish(failure, 3001, ExceptionManager.CategoryIdentifier.RNIFSenderWebApplication);

   switch(level)
   {
      case ErrorLevel.ParamValidationFailure:
      case ErrorLevel.UnknownFailure:
         System.Diagnostics.EventLog.WriteEntry("BARN", "400 From Page", System.Diagnostics.EventLogEntryType.Information);
         Response.StatusCode = 400;
         break;

      case ErrorLevel.ProxyToOuterFailure:
         System.Diagnostics.EventLog.WriteEntry("BARN", "500 From Page", System.Diagnostics.EventLogEntryType.Information);
         Response.StatusCode = 500;
         break;

      default:
         // status code must have already been set
         break;
   }
}

Equally frustrating is the lack of a debugging switch in the code. The only option is to do this yourself. The BTARN SDK folder provides the web application code. Strategically adding lines of code like the following can be helpful.

System.Diagnostics.EventLog.WriteEntry("BARN", "Load In", System.Diagnostics.EventLogEntryType.Information);

Place these statements throughout the page for testing. Change the second parameter to indicate what part of the code executed.
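If you would rather not remove the statements again afterwards, one option is to put them behind a switch of your own. The sketch below is only an illustration: the BtarnTrace class and the "BtarnDebug" switch name are made up and are not part of the BTARN SDK. It uses the standard System.Diagnostics.BooleanSwitch class, which is off unless it is enabled in configuration or in code, and it writes to the same "BARN" source as the statement above.

```csharp
using System.Diagnostics;

// Hypothetical helper, not part of the BTARN SDK.
public static class BtarnTrace
{
    // "BtarnDebug" is an assumed switch name. On the .NET Framework it can be
    // turned on in web.config with:
    //   <system.diagnostics><switches><add name="BtarnDebug" value="1" /></switches></system.diagnostics>
    private static readonly BooleanSwitch DebugSwitch =
        new BooleanSwitch("BtarnDebug", "Extra tracing for the BTARN web pages");

    // Lets the switch be flipped from code as well as from configuration.
    public static bool Enabled
    {
        get { return DebugSwitch.Enabled; }
        set { DebugSwitch.Enabled = value; }
    }

    // Writes the step name to the event log only when the switch is on.
    // Returns true if an entry was written.
    public static bool Step(string step)
    {
        if (!DebugSwitch.Enabled)
        {
            return false;
        }
        EventLog.WriteEntry("BARN", step, EventLogEntryType.Information);
        return true;
    }
}
```

Each tracing statement then becomes a call such as BtarnTrace.Step("Load In"), and the pages stay quiet when the switch is off.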

It is also a good idea to log any HTTP errors from the post. Locate the ProxyToOuterRequest method and find this line:

outerRequestReqStm = outerRequest.GetRequestStream();

Wrap it in a try/catch block:

try
{
   outerRequestReqStm = outerRequest.GetRequestStream();
}
catch (WebException ex)
{
   // Log the failure status, the message and the target URI before rethrowing
   System.Diagnostics.EventLog.WriteEntry("BTARN", ex.Status.ToString(), System.Diagnostics.EventLogEntryType.Information);
   System.Diagnostics.EventLog.WriteEntry("BTARN", ex.Message, System.Diagnostics.EventLogEntryType.Information);
   System.Diagnostics.EventLog.WriteEntry("BTARN", outerRequest.RequestUri.ToString(), System.Diagnostics.EventLogEntryType.Information);

   // Rethrow with "throw;" rather than "throw ex;" so the original stack trace is kept
   throw;
}

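This is an ideal time to add code to output the message. Place the following code in the load method.

Request.SaveAs(@"c:\io\" + System.Guid.NewGuid().ToString() + ".txt", false);

The second argument (includeHeaders) is false, so the HTTP headers are not written to the file. Don't forget to change the save location to a folder that exists and that the web application can write to.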

Adding the code and deploying it would not be very difficult if that were all there was to it. Not so fast: this is BTARN. Back up the default virtual site by copying it to a safe location. Then, for any version of BTARN after 3.0, download the hot fix from KB933500.

The hot fix is needed because of Visual Studio: it provides detailed code changes so the pages will compile with the newer versions. Once that is completed, deploy the new DLLs per the text file provided with the hot fix. The text file also contains an outdated link. The correct link is:

http://msdn.microsoft.com/en-us/library/aa479568.aspx

It is not necessary to change both the send and receive pages, because the design allows separate deployment. It makes sense to do both pages for future use, though. After the changes, the new DLLs can be copied in. Operate BTARN and analyze the application event log to understand where the process is failing.

Once the problem has been resolved, simply copy your shiny new troubleshooting virtual folder to a safe location and restore the original virtual folder.

These changes are intended for troubleshooting. If custom pages are intended to be a permanent part of the environment, keep the original virtual folder. It will be required for Microsoft product group support, because any problem will need to be reproduced using the original code.

Posted by Larry2 | 0 Comments

Biztalk Server issue affecting the # of running orchestrations in R2

Recently I noticed a trend while working on several performance cases that I would like to share with the blogosphere. In one case in particular, the problem was that, under stress, many hundreds of BizTalk orchestrations were running at one time. BizTalk receive locations can typically bring messages into the MessageBox faster than orchestration instances can complete, so when several thousand messages were brought into the system, we saw several hundred running orchestrations. This was confirmed using the performance counter XLANG/s Orchestrations - Running Orchestrations. The net effect was that the large number of running orchestrations caused problems downstream on the send hosts and also in the orchestration hosts.

In the old days of BizTalk 2004 on a dual-processor machine, we'd never see more than 40 running orchestrations. That total of 40 (20 per processor) was capped by the highwatermark settings in the adm_serviceclass table in the management database. In fact, I used to tune this setting in BizTalk 2004 and it worked like a charm. In this case, however, the BizTalk Server had 4 quad-core processors, meaning that 16 * 20, or 320, orchestrations could potentially be running at one time. We wanted to control this, so we tuned the highwatermark settings down to 2, expecting that this would get us no more than 32 running orchestrations. Wrong: we had over 400 running orchestrations under stress.

After some more research I found that the maximum number of running orchestrations in 2006 and R2 is in fact controlled by the MaxWorkerThreads setting under the CLR Hosting key in the registry for each service. That is, if it is set.

 

So, following the documentation, we figured that with 16 procs, if we wanted to minimize the number of running orchestrations, we would set MaxWorkerThreads to 2 and get a maximum of 32 running orchestrations.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\BTSSvc$BiztalkServerApplication\CLR Hosting]

"MaxWorkerThreads"=dword:00000002

 

What happened on host instance restart? The host instance failed to start. That's right: the minimum value we could set was 16, which also happens to be the number of processors, or procs * cores.

 

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\BTSSvc$BiztalkServerApplication\CLR Hosting]

"MaxWorkerThreads"=dword:00000010

So what do you think happened when we set MaxWorkerThreads to 16 (dword:00000010 is hexadecimal) and we had 16 processors? 16 * 16 = 256 running orchestrations, right?

No, we had 16 running orchestrations. That's as high as it got, and it performed much better than when we had 400 running orchestrations. The important thing this confirmed is that the limit is not the number of processors * MaxWorkerThreads, but simply MaxWorkerThreads, which tells the host instance how many orchestrations it can run.
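One thing that is easy to trip over: the dword values in the registry fragments in this post are hexadecimal, so dword:00000010 means 16, and the values recommended further down (00000032 and 00000019) mean 50 and 25. Here is a small sketch for decoding them. The RegDword class is just an illustrative name, not part of any BizTalk tooling.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for reading values copied out of a .reg file.
public static class RegDword
{
    // Converts a .reg style value such as "dword:00000032" to its numeric value.
    public static uint Parse(string regValue)
    {
        const string prefix = "dword:";
        if (regValue == null || !regValue.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            throw new FormatException("Expected a value of the form dword:XXXXXXXX");
        }
        // The digits after the prefix are hexadecimal.
        return uint.Parse(regValue.Substring(prefix.Length),
                          NumberStyles.AllowHexSpecifier,
                          CultureInfo.InvariantCulture);
    }
}

// RegDword.Parse("dword:00000002") returns 2
// RegDword.Parse("dword:00000010") returns 16
// RegDword.Parse("dword:00000032") returns 50
// RegDword.Parse("dword:00000019") returns 25
```

Going by the observation above, a MaxWorkerThreads of dword:00000032 would cap a host instance at 50 running orchestrations, regardless of the processor count.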

As it turns out, there is an issue with the 3.5 framework when installed on a BizTalk R2 server: the MaxWorkerThreads default value has been raised from 25 in previous versions to 250 in the 3.5 framework.

.NET 2.0:
http://msdn.microsoft.com/en-us/library/system.threading.threadpool(VS.80).aspx

.NET 3.5:
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx

The moral of the story.

On an orchestration host, make sure you set these settings so that orchestrations do not grow unbounded.  Do it even if the 3.5 framework is not installed.

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\BTSSvc$BiztalkServerApplication\CLR Hosting]
"MaxIOThreads"=dword:00000032
"MaxWorkerThreads"=dword:00000032
"MinIOThreads"=dword:00000019
"MinWorkerThreads"=dword:00000019

Posted by mshea | 0 Comments

Schema and Mapper Versions – Another Day in Deployment Fun

One thing that holds true in support is that similar issues tend to arrive in waves. No deployment issues for weeks, but all of a sudden I seem to spend my day swimming in deployment.

Today, I’d like to point out a UI issue that does not affect runtime. Let’s say you have a couple of schemas in one BizTalk assembly and a map in another. After you make some changes to a schema, you increase the assembly version number of the schema project and deploy it side by side with the older version. This is not an unusual scenario, since you may have active instances that depend on artifacts in the older version of the assembly and you want to make sure they are drained before removing the older assembly.

 

image

So now, with both schema versions in place, you want to make sure that any maps configured in Receive Ports or Send Ports are using the right version. In the Receive Port or Send Port properties, you can select which versions of the schemas and the map are used. So you open up your Receive Port and you find that you can select the map version. You need to do this to make sure you are using the latest map. But when you try to update the schema version, only one shows up and it is the older version. This leads one to think that maybe BizTalk is going to use the older schema in the transform, which would fail with the updated message type. Fortunately this is limited to a UI issue. The map picker does not list all of the schema versions, but the BizTalk runtime will use the updated schema. The outbound document from the map will be correct.

Just a small observation.  Now back to work I go…

Posted by evyang | 0 Comments

Do you see the following errors on your BizTalk Server every time you reboot your Domain Controller?

Event ID 6913

Event Type: Error
Event Source: BizTalk Server 2006
Event Category: BizTalk Server 2006
Event ID: 6913
User: N/A
Computer: <Computer name>
Description:
An attempt to connect to <SQL server name> SQL Server database on server <Server name> failed with error: "Login failed for user '(null)'. Reason: Not associated with a trusted SQL Server connection.".

Event ID 5410

Event Type: Error
Event Source: BizTalk Server 2006
Event ID: 5410
User: N/A
Computer: <Computer name>
Description:
An error occurred that requires the BizTalk service to terminate. The most common causes are the following:
1) An unexpected out of memory error. OR
2) An inability to connect or a loss of connectivity to one of the BizTalk databases. The service will shutdown and auto-restart in 1 minute. If the problematic database remains unavailable, this cycle will repeat.

Error message: Login failed for user '(null)'. Reason: Not associated with a trusted SQL Server connection.
Error source: BizTalk
host name: <Server name>
Windows service name: <Service name>

 

The issue here lies with the Windows Net Logon Service and not BizTalk.   The Domain Controller returns the "NO_SUCH_USER" status code in response to BizTalk and SQL Server logon requests. This happens when the Domain Controller that received the logon request is in the process of shutting down. 

When the "NO_SUCH_USER" status code is received, domain member computers (BizTalk and SQL Servers) and domain controllers do not establish a new security channel with another domain controller that is running correctly. Therefore, the logon requests that are sent by users or by applications may time out. The application that originated the logon requests may time out or may fail unless the application has failover logic or retry logic.
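For custom applications in the same domain that log on to SQL Server themselves, the retry logic mentioned above could look roughly like the sketch below. This is a generic illustration, not something BizTalk exposes or something taken from the KB article. The Retry class, the attempt count and the delay are all assumptions, and real code would normally catch only the specific exceptions it knows to be transient.

```csharp
using System;
using System.Threading;

// Hypothetical helper, shown only to illustrate simple retry logic.
public static class Retry
{
    public delegate T Operation<T>();

    // Runs the operation up to maxAttempts times, sleeping 'delay' between attempts.
    // The last failure is rethrown if every attempt fails.
    public static T Execute<T>(Operation<T> operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException("maxAttempts");
        }
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception)
            {
                // Catching everything keeps the sketch short; narrow this in real code.
                if (attempt >= maxAttempts)
                {
                    throw; // out of attempts, surface the last failure
                }
                Thread.Sleep(delay);
            }
        }
    }
}
```

An application would wrap its connection attempt in a call such as Retry.Execute<bool>(delegate { connection.Open(); return true; }, 5, TimeSpan.FromSeconds(10)), where connection is the application's own database connection object.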

 

The hotfix in KB 942636 (linked below) has to be applied on the Domain Controllers and on the Domain Clients (BizTalk and SQL Servers). After you apply this hotfix, the domain controllers will return a "STATUS_INVALID_SERVER_STATE (0xc00000dc)" status code during the shutdown process. When the client (BizTalk and SQL Servers) receives this status code, it can contact other domain controllers.

More information can be found in the link below.

942636  Windows Server 2003-based domain controllers may incorrectly return the "NO_SUCH_USER (0xc0000064)" status code in response to logon requests

http://support.microsoft.com/default.aspx?scid=kb;EN-US;942636

 

 

In addition to KB 942636, the following KB has been found to help with the above errors:

906736  You experience a delay in the user-authentication process when you run a high-volume server program on a domain member in Windows 2000 or Windows Server 2003

http://support.microsoft.com/default.aspx?scid=kb;EN-US;906736

The registry change in the above article disables validation of the Privilege Attribute Certificate (PAC) signature in the Kerberos ticket, which reduces the RPC requests between the client (BizTalk Server and SQL Server) and the domain controller. This is the default behavior in Windows Server 2008.

Posted by AnzioB | 0 Comments

Help is on the way - new BizTalk Utilities

Read on to learn about two utilities guaranteed to make your BizTalk administration and troubleshooting much easier. Many users have already discovered MessageBox Viewer (MBV). Very soon a complement to MBV will be available, the Terminator. This utility makes database maintenance and analysis considerably less complex. Let's look at each.

MessageBox Viewer

MBV can be unzipped onto the BizTalk box and executed by the BizTalk administrator. It will collect a snapshot of the entire BizTalk configuration. It takes analysis much further by identifying problems and potential problems in an easy to understand format. The most significant problems are displayed in red. Other potential issues are shown in yellow. Each item includes links to explain or fix the issue. The entire report is formatted as an HTML file. The process takes less than five minutes and is safe for production systems.

It would be difficult to cover all the information provided by MBV. Some of the common issues identified include:

1) Registry values changed causing poor performance

2) Table size issues

3) SQL job failures

4) Missing critical hot fixes

5) BizTalk tracking configuration

6) SQL configuration changes not recommended for BizTalk

7) Counts of zombies, orphans, RFR's and any other monsters lurking in the database

8) Recommended hot fixes and supportability issues

This is just a sample of what this tool can do. Every BizTalk shop should run this utility at least once a month to help manage BizTalk. More often if problems occur. Keep the reports to provide historical data.

Terminator

Terminator can be unzipped on any box (not necessarily a BizTalk box) and pointed at the BizTalk group. It does require BizTalk admin access to operate. Like MBV it does not require installation.

In the past SQL scripts and WMI calls were used to correct common BizTalk database issues. Terminator helps by providing a tool for these actions. It contains the latest SQL scripts and an advanced GUI to make WMI calls. The GUI displays descriptions and presents required parameters based on the selected task.

Check the boxes indicating compliance with program requirements, database backup and stopping BizTalk activities. Enter the SQL Server and management database names. The Terminator responds with access to various tabs.

· Basic tab presents the user with the most common and safe scripts used for BizTalk maintenance.

· Advanced tab for more complex scripts and MBV interoperation.

· WMI tab allows more complex operations than those offered by the BizTalk admin console.

· Results tab displays messages from executed scripts or WMI calls after execution.

· Help tab explains how to use the tool in detail.

With MBV identifying problems and Terminator fixing them, it made sense to get them working together. MBV automatically creates an XML file for the items Terminator can address. Terminator provides a browse button to locate this file. Clicking the MBV button limits the tasks to those required for addressing the MBV issues. 

Terminator is currently only available from BizTalk support. This will change in the near future as it goes through final testing. MBV is available right now at http://blogs.technet.com/jpierauc/. BizTalk professionals, don't get caught without these tools. Check the MBV download site frequently, as new features are added all the time. Stay tuned for the public release of the Terminator.

For details on how to resolve common issues identified by MBV using Terminator, check out http://blogs.msdn.com/biztalkcpr/pages/using-biztalk-terminator-to-resolve-issues-identified-by-biztalk-msgboxviewer.aspx.

Posted by Larry2 | 0 Comments

Random Encounters during BizTalk Deployment

To be honest, deployment and version management are topics that I understand in theory but do not practice often.  Fortunately this week I picked up a deployment issue and went through a refresher course.  Just thought I would share a few observations that resurfaced after being lost in dormant brain cells. 

I started off with two simple BizTalk projects.  Project A contained my schemas and maps.  Project B referenced Project A and contained an orchestration that used the schemas and maps in Project A.  I deployed both assemblies and bound the orchestration to a file receive port and file send port.  Message flow worked fine without error at this point.

I then decided to make a modification to a schema. It was a minor change and the project built without error. I did a full stop of the BizTalk application before deploying the updated schema assembly. Deployment was successful. I restarted the BizTalk host instance in addition to my BizTalk application to be certain the updated schema assembly was loaded. All looking good so far, right? So I dropped a test document into the file receive location again, but this time no output file was created.

A bit surprised, I looked at the application event log for a clue. I didn’t expect to see a routing failure being reported. I hadn’t changed the binding information of the orchestration and my BizTalk application started without error, so I assumed that the orchestration was enlisted and running. A refresh of the application in BizTalk Administration Console revealed the problem: the orchestration was no longer deployed, and therefore there was no subscriber to the message.

What happened here was that, because the orchestration assembly depended on the schema assembly, the re-deployment of the schema assembly caused both assemblies to be un-deployed first. After that, the schema assembly was re-deployed but the orchestration assembly was not. Everything worked fine again once I re-deployed the orchestration assembly. I now remember writing documentation for this behavior a while back. Not a big deal, but good to keep in mind when you deploy/un-deploy assemblies with dependencies.

Interestingly, the above behavior does not happen if you update (overwrite) an assembly as a resource from BizTalk Administration Console. Just don’t delete the schema assembly first. That’ll cause both assemblies to be removed.

Lastly, I decided to up the assembly version of my schema assembly and deployed the new version. Now I had 2 versions of my schema assembly deployed side by side. Again, no error with deployment. I restarted my BizTalk application and host instance. Again, the test message failed. This time the failure was in XLANG: the orchestration received an “unexpected message type”. When the message was published, it was resolved to the latest deployed schema assembly, but because I had added the updated assembly this time with BizTalk Administration Console, the orchestration was not updated. It was still expecting the earlier version of the message. A quick rebuild and redeploy of the orchestration assembly resolved the issue.

The observations above are all very straightforward.  Just little things to keep in the back of the mind.   More to come as I dig deeper into lost memory.

Posted by evyang | 0 Comments