CSS SQL Server Engineers

This is the official team Web Log for Microsoft Customer Service and Support (CSS) SQL Support. Posts are provided by the CSS SQL Escalation Services

TCP Chimney Offload – Possible Performance and Concurrency Impacts to SQL Server Workloads

TCP Chimney Offload – Possible Performance and Concurrency Impacts to SQL Server Workloads

  • Comments 4

TCP Chimney is enabled by default if you apply Windows Server 2003 Sp2.  This is an operating system feature that provides capability to offload TCP/IP packet processing from the processor to the network adapters and some other balancing options.  (For a full description of this feature see http://support.microsoft.com/kb/912222.)

TCP Chimney has been known to cause issues on SQL Server systems such as general network errors and working set trimming.  The following articles document these known issues:

http://support.microsoft.com/kb/942861

http://support.microsoft.com/kb/918483

We’ve also identified situations where TCP Chimney has impacted transaction throughput and caused delays between when a statement has been completed by the SQL engine and the time to receive the begin event of the next statement.  This impact can be significant especially in application workloads that have throughput requirements to execute a series of statements within a certain time boundary. 

For example, your application has a key transaction that consists of multiple statements.  Each individual statement on the engine side is optimized and has very short duration.  The overall duration of the transaction is short because each statement has low duration and the time in between the batches is short as well.  A profiler trace of this transaction typically shows a pattern like the following.  Note that there is very short time in between the complete of one batch and the start of the next batch:

image

However with TCP Chimney enabled, you notice there is a marked delay between a batch completed and the start of the next batch for the exact same series of statements and work.  In this example, note how there is approximately a 500 ms. delay in between the complete and start of the next batch:

image

In this scenario with the 500 ms. delay in between statements you would see the SPID spend most of its time awaiting command in sys.sysprocesses with a waittype of 0x000.

This type of delay can affect application throughput as well as concurrency.  For example if the above statements are all encompassed in an implicit transaction, with the added delay the overall duration of the implicit transaction is significantly increased, locks would then be held longer than normal and you may see unexpected blocking.  If you do a comparison test of the same implicit transaction between two systems, one with TCP Chimney enabled and the other with TCP Chimney disabled and you compare the sum of the duration of the individual statements vs. the total duration of the entire transaction, you may see that the overall transaction is significantly increased when TCP Chimney is enabled.  With TCP Chimney enabled, the delta between the sum of the statement duration from the overall transaction duration shows that the majority of time is spent awaiting the next batch/command. 

Here is an example comparison of the same workload with TCP Chimney enabled and disabled.  Note the significant increase in transaction duration and the large delta (difference between transaction duration vs. the sum duration of all statements within transaction) when TCP Chimney is enabled:

Implicit Transaction Summary TCP Chimney Enabled

spid    TransactionID  TranStart     TranEnd       TranDuration  sum_batch_duration   batch_count    delta
------- -------------- ------------- ------------  ------------- -------------------- -------------- --------
57      916972         09:40:24.450  09:41:17.623  53173         601                  516            52572
57      896243         09:39:31.620  09:40:01.840  30220         322                  301            29898
57      877227         09:39:12.120  09:39:15.293  3173          306                  161            2867
57      876313         09:38:58.590  09:38:58.603  13            0                    1              13
57      895388         09:39:18.510  09:39:18.527  16            16                   4              0
57      915675         09:40:02.653  09:40:02.670  16            16                   4              0

Implicit Transaction Summary TCP Chimney Disabled

spid    TransactionID  TranStart     TranEnd       TranDuration  sum_batch_duration   batch_count    delta
------- -------------- ------------  ------------  ------------- -------------------- -------------- --------
54      127910         11:13:47.287  11:13:52.490  5203          4060                 516            1143
54      107344         11:13:23.380  11:13:24.427  1046          382                  301            664
51      87187          11:12:50.067  11:12:50.550  483           0                    1              483
54      88182          11:13:03.987  11:13:07.237  3250          2878                 161            372
51      106432         11:13:10.487  11:13:10.487  0             0                    1              0
54      126550         11:13:25.490  11:13:26.007  516           516                  4              0

 

If you observe a similar pattern and suspect TCP Chimney, you may want to disable TCP Chimney to provide immediate relief.  Another option is to follow up with your network adapter vendor to see if they have an updated driver that will address the problem and allow for use of TCP Chimney.  For additional information see  http://support.microsoft.com/default.aspx?scid=kb;EN-US;948496

 

TCP Chimney is off by default in Windows Server 2008 - see http://support.microsoft.com/kb/951037.

Sarah Henwood | Microsoft SQL Server Escalation Services

Leave a Comment
  • Please add 4 and 6 and type the answer here:
  • Post
  • I appreciate the information on the impact of SNP on SQL Server, and the desire to disable SNP to avoid the impact as a short-term work-around.

    I would assume that a feature which offloads network processing workload from the system processor(s) to the network adapter(s) would result in greater capacity and better performance of the main system workload.  Evidently, the specific implementation has flaws that block this goal.  Are there resolutions in process to correct these defects and allow the desired gains – and in what time frame?  None were mentioned in the article, but the question begs to be asked.

    Since the feature leverages capabilities within the OS and within specific network adapter(s), it is possible that the defect is in the network adapter (a hardware vendor responsibility), the OS (a Microsoft responsibility), or both.

    I am interested in hearing more information on the root cause(s) and possible resolution(s) for this issue.

    Thanks,

    Scott R.

  • ScalableNetworkingPack(常见问题详细描述)在Server2003安装SP2后会默认被打开,打开此功能可以把对网络数据包的处理从CPU分担一部分到网卡上,但这可能会引起S...

  • By now most of us are using Windows 2003 SP2 unless there are strict application requirements and your

  • This is an excellent article that addresses an issue we have been trying to troubleshoot for some time. I would like to learn more about the actual steps used to obtain the data and the SQL Queries used to extract the data. My guess is the fn_trace_gettable was used to pull the information from the Trace file but I would greatly appreciate getting the actual steps and Queries used to provide these results as they would be instrumental in proving how Offload reduces SQL performance.

    Best Regards,

    SQLRookie

Page 1 of 1 (4 items)