The Issue:

We recently worked on a customer issue, where a distributed SQL query was executed in a single transaction using  SQL  – SQL   as linked server. The 2nd SQL Server has about 80 linked servers defined to 80 different remote SQL Servers.  The distributed query updates a table, which has update trigger defined.  The update trigger was implemented to execute distributed queries to more than 65 different SQL linked servers.

 The distributed query that was executed in transaction scope was failing with following error

 

The OLE DB provider "SQLNCLI10" for linked server "<Name of the linked Server>" reported an error. The provider reported an unexpected catastrophic failure.

 Error: 7391, Severity: 16, State: 2

 The operation could not be performed because OLE DB provider "SQLNCLI10" for linked server "<Name of the Linked Server>" was unable to begin a distributed transaction.

 

Since the error indicates some challenges in starting the distributed transaction, we collected MS DTC ETW traces and analyzed.

 

As per the traces, the query execution was successful when the table update trigger executed distributed queries in  less than or equal  to  64 number of linked servers:

'“;"transaction propagated to < Name of the linked Serve > as transaction child node #1"

….

'“;"transaction propagated to < Name of the linked Serve > as transaction child node #64"

 

But failed when trigger executed distributed queries on the 65th linked servers and logged following error

 

'" ;"failed to propagate transaction to child node < Name of the linked Serve >  because the limit on the number of children has been reached at the local transaction manager"

 

The Cause:

We found the issue is by design as the MSDTC is designed to communicate, maximum up to only 64 DTC transaction managers   in a single transaction. The solution is to make sure a single transaction is not propagated for more than 64 servers/ DTC transaction managers.

 

Reviewed by: David Qiu - Escalation Engineer at Microsoft.