BizTalk Server performance relies on several hardware (disks, cpu(s)...), software (SQL Server, BizTalk Host Instance…), network components and the application design by itself.Each of these above component involved in the architecture may impact heavily the BizTalk server performance.The basic recommendation of an architecture that will ensure the high availability is to have at least:_ a group of 2 BizTalk servers within a group_ a cluster SQL server instance hosting the BizTalk server databases_ a Master secret server running as a generic resource on the SQL server instanceIn the next pages, I'll be reviewing each component and I'll mention what to check in order to make sure if the component is really the bottleneck in the chain of the components that forms the architecture.
Perform some health check operations on the SQL server instanceSometimes the CPU spikes to 100 percent due to several reasons: CPUs, SQL requests, large messages,… and the BizTalk messages processing performance starts to show a slowness over the time. Other times, users report some problems related to the memory pressure and I/O issues. From a troubleshooting perspective, please consider to walk through the steps below:
If over time the numbers returned are keeping growing or you are experiencing periodic jumps/spike or the numbers stay relatively high most likely the system is I/O bound.In order to identify the cause you will have to dive further and mainly to monitor the physical disk (see below or next post on physical disk).
If a number of "runnable" tasks per each scheduler are always greater than 1 and all of your queries have a correct plan then you will need more CPUs (or performing ones).As mentioned above, before suspecting the CPUs system, be sure that the SQL optimizer is not generating a bad query plan. Having the statistics out of date or you are trying to perform a "plumber" optimization will lead the optimizer to generate a bad query plan.
You may also want to use the sys_dm_exec_requests DMV.This DMV returns information about each request that is currently executing within the SQL server instance.One row is returned for every executing query. The DMV is very useful to determine the operation of queries that take a long time to complete. The following query returns details about all the user queries currently executing on the instance of the SQL Server.select session_id, command, total_elapsed_time, status, reads,writes, start_time, sql_handle from sys.dm_exec_requests where session_id > 50order by total_elapsed_time DESCTo determine the text of the corresponding SQL statement returned by sys_dm_exec_requests DMV, use the sys.dm_exec_sql_text DMV that takes the sql_handle as input parameter select text from sys.dm_exec_sql_text(0x03000C00A090EB542946E500449900000100000000000000)
Physical disk I/O operations
Under a heavy work load and if you have a slow I/O subsystem, the BizTalk messages processing will start to show a slowness and users will start to experience peformance problems such as slow response times, tasks that abort due to timeouts,...To identify I/O bottlenecks, you will need to look into multiple performance counters to cross check the outcome of your findings.Don't rely ONLY on one performance counter or on a short period of time of analysis.The following performance counters will help you to determine if you are experiencing I/O bottlenecks or very slow hard disk:
Avg. Disk Sec/Read This performance counter represents the average time, in seconds, of a read of data from the disk.
Less than 10 ms - very good Between 10 - 20 ms – Ok Between 20 - 50 ms - slow, needs attention Greater than 50 ms – Serious I/O bottleneck
Avg. Disk Sec/Write This performance counter is the average time, in seconds, of a write of data to the disk.For value and numbers, Please refer to the guideline in the previous bullet.
Physical Disk: %Disk TimeThis performance counter is the percentage of elapsed time that the selected disk drive was busy servicing read or writes requests.A general guideline is that if this value is greater than 50 percent, it represents an I/O bottleneck.
Avg. Disk Reads/Sec This counter is the rate of read operations on the disk. You need to make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.
Avg. Disk Writes/Sec This counetr is the rate of write operations on the disk. Make sure that this number is less than 85 percent of the disk capacity. The disk access time increases exponentially beyond 85 percent capacity.Of course, when using the above counters, you need to adjust the values for RAID configurations.The following formulas.should be respected:
Let’s say, you have a RAID-1 system with two physical disks and you performance counters show the following values:
Disk Reads/sec 80Disk Writes/sec 70Avg. Disk Queue Length 5
In that case, you are encountering (80 + (2 * 70))/2 = 110 I/Os per disk and your disk queue length = 5/2 = 2.5 which indicates a border line I/O bottleneck
If the physical disks ar dediacted to the SQL Server databases then you may want to identify the I/O bottlenecks by examining the latch waits.
These latch waits account for the physical I/O waits when a page is accessed for reading or writing and the page is not available in the buffer pool. When the page is not found in the buffer pool, an asynchronous I/O is posted and then the status of the I/O is checked. If I/O has already completed, the worker proceeds normally. Otherwise, it waits on PAGEIOLATCH_EX or PAGEIOLATCH_SH, depending upon the type of request. The following DMV query can be used to find I/O latch wait statistics.
Select wait_type, waiting_tasks_count, wait_time_ms
from sys.dm_os_wait_stats
where wait_type like 'PAGEIOLATCH%'order by wait_type