Welcome to MSDN Blogs Sign in | Join | Help

A service’s endpoint identity is an important entity in an authentication process. Any WCF developer should have a good understanding of what it is, how it is set at the service and client and what part it plays in the overall authentication process. For those in doubt, MSDN article on ‘Service Identity and Authentication’ will surely make a lot of those things clear.

Today I will concentrate my discussion on how a service’s endpoint identity can be set within a client application. Most common approach will be to configure it within a client configuration file enclosed within a <identity> element. For the sake of the topic, I will consider the following binding:

     <wsHttpBinding>

        <binding name="Message_Windows">

          <security mode ="Message">

            <message clientCredentialType="Windows"

                     establishSecurityContext="false"

                     negotiateServiceCredential="false"

                     algorithmSuite="Basic128"/>

          </security>

        </binding>

      </wsHttpBinding>

 

Let’s say that my service is hosted on IIS and the application pool is running under a custom service account. I have a HTTP SPN associated with that service account. Check this blog for more insights into SPN.

 

                Account name: somedomain\tyler

                Configured SPN: HTTP/tyler.durden.com

Going back to how we can provide this information inside a client configuration file:

<client>

            <endpoint address="http://somedomain.com/SimpleKerberosTest/Service.svc"

                binding="wsHttpBinding" bindingConfiguration="Message_Windows"

                contract="ServiceReference.IService" name="WSHttpBinding_IService">

                <identity>

                    <servicePrincipalName value="HTTP/tyler.durden.com" />

                </identity>

            </endpoint>

</client>

 

All we need now is to instantiate a service proxy within our client application passing to it this endpoint configuration name.

ServiceClient proxy = new ServiceClient("WSHttpBinding_IService");

Console.WriteLine(proxy.GetData(10));

 

Sometimes there is a necessity for us to pass the endpoint address dynamically via the code. That can easily be accomplished by using any of System.ServiceModel.ClientBase<T> constructors which accept a EndpointAddress object as one of its parameters.

ServiceClient proxy = new ServiceClient("WSHttpBinding_IService", <EndpointAddress object> or remote endpoint as a string input);

  

This whole scenario works like a charm in .NET 3.5. With the application of SP1, the same scenario fails with the following exception message:

<ExceptionType>System.ServiceModel.Security.MessageSecurityException, System.ServiceModel, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</ExceptionType>

<Message>The token provider cannot get tokens for target 'http://somedomain.com/SimpleKerberosTest/Service.svc'.</Message>

 

Inner Exception Details:

 

<ExceptionString>System.ComponentModel.Win32Exception: The specified target is unknown or unreachable</ExceptionString>

<NativeErrorCode>80090303</NativeErrorCode>

 

How does the same environment now start complaining about an unreachable or unknown target? Collect WCF trace for the error and have a look at the determined identity (collect verbose tracing and look for ‘Identity was determined for an EndpointReference’):

<Identity xmlns="http://schemas.xmlsoap.org/ws/2006/02/addressingidentity">

<Dns> somedomain.com </Dns>

</Identity>

<EndpointReference xmlns="http://www.w3.org/2005/08/addressing">

<Address> http://somedomain.com/SimpleKerberosTest/Service.svc</Address>

</EndpointReference>

 

To our surprise what we see is a DNS generated by the framework. It then appends ‘HOST’ to this value to generate a SPN identity (HOST/somedomain.com). Eventually a HOST SPN is submitted by the client to active directory in request for a service ticket. In my test environment, I do not have any HOST SPNs registered. Hence the error message ‘target is unknown or unreachable’. Usually a HOST SPN will be associated with a computer account. In that scenario, the same request will fail with either of the following error messages:

 

The HTTP request is unauthorized with client authentication scheme 'Negotiate'. The authentication header received from the server was 'Negotiate oYG7MIG4oAMKAQGigbAEga1ggaoGCSqGSIb3EgECAgMAfoGaMIGXoAMCAQWhAwIBHqQRGA8yMDA4MDgxNDEwMzk0OVqlBQIDBlSwpgMCASmpJRsjUEFSVFRFU1QuRVhUUkFORVRURVNULk1JQ1JPU09GVC5DT02qRTBDoAMCAQOhPDA6GwRob3N0GzJuZ2ltdHdpcGNybWFwcC5wYXJ0dGVzdC5leHRyYW5ldHRlc3QubWljcm9zb2Z0LmNvbQ

 

OR

 

The remote server returned an error: (401) Unauthorized

 

This proves that with the application of SP1, we do not read the identity element from the client configuration file. This is a DESIGN CHANGE implemented with SP1. However one will encounter this issue only when using a ClientBase<T> which accepts a remote address as one of its parameters.

 

 Good news is that we have a couple very simple workarounds to resolve this issue:

 

1.       Use ChannelFactory to create a proxy.

 

IService proxy = null;

ChannelFactory<IService> factory = null;

 

EndpointIdentity identity = EndpointIdentity.CreateSpnIdentity("HTTP/tyler.durden.com");

EndpointAddress address = new EndpointAddress(new Uri("http://somedomain.com/SimpleKerberosTest/Service.svc"), identity);

 

       factory = new ChannelFactory<IService>("WSHttpBinding_IService", address);

 

proxy = factory.CreateChannel();

Console.WriteLine(proxy.GetData(10));

               

In the above scenario, the only limitation is hard coding of identity inside the code. That can easily be altered by reading the value from a configuration file.

 

2.       Use a dummy proxy to read the identity value from configuration file and use it to create another proxy object.

 

ServiceClient proxy = new ServiceClient("WSHttpBinding_IService");

EndpointIdentity identity = proxy.Endpoint.Address.Identity;

EndpointAddress address = new EndpointAddress(new Uri("http://somedomain.com/SimpleKerberosTest/Service.svc "), identity);

 

ServiceClient newProxy = new ServiceClient("WSHttpBinding_IService", address);

 

Console.WriteLine(proxy.GetData(10));

               

This way we can read the identity element from configuration file as well as set endpoint address dynamically via code.

 

Either ways will result in the client using correct SPN to submit to active directory. Now if you look into WCF trace one more time, you will see correct SPN:

 

<EndpointReference xmlns="http://www.w3.org/2005/08/addressing">

<Address> http://somedomain.com/SimpleKerberosTest/Service.svc </Address>

<Identity xmlns="http://schemas.xmlsoap.org/ws/2006/02/addressingidentity">

<Spn>HTTP/tyler.durden.com</Spn>

</Identity>

</EndpointReference>

 

While it was pretty simple for me to figure out why was the request failing (due to incorrect SPN), but it took me quite a while to figure out that it was a design change implemented by SP1. Hope this will save others a lot of precious time. 

Working with issued token is always fun. The whole possibility of making 3 components (client, RP and STS) to work seamlessly excites me to no end. Of late I worked on an interesting token renewal issue where the client was not requesting a new SAML token from the STS, even after expiration. An effect of that was client authentication failure at the service end as the presented token had already expired.

 

Exception thrown by the service:

 

<ExceptionType>System.IdentityModel.Tokens.SecurityTokenException, System.IdentityModel, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</ExceptionType>

<Message>The SamlToken is not time valid. The current time '9/22/2009 6:54:35 PM' is outside the Effective '9/22/2009 6:53:54 PM' and Expiration '9/22/2009 6:54:24 PM' time of the token.</Message>

<StackTrace>

at System.IdentityModel.Selectors.SamlSecurityTokenAuthenticator.ValidateTokenCore(SecurityToken token)

at System.IdentityModel.Selectors.SecurityTokenAuthenticator.ValidateToken(SecurityToken token)

at System.ServiceModel.Security.ReceiveSecurityHeader.ReadToken(XmlReader reader, SecurityTokenResolver tokenResolver, IList`1 allowedTokenAuthenticators, SecurityTokenAuthenticator&amp; usedTokenAuthenticator)

at System.ServiceModel.Security.ReceiveSecurityHeader.ReadToken(XmlDictionaryReader reader, Int32 position, Byte[] decryptedBuffer, SecurityToken encryptionToken, String idInEncryptedForm, TimeSpan timeout)

at System.ServiceModel.Security.ReceiveSecurityHeader.ExecuteFullPass(XmlDictionaryReader reader)

at System.ServiceModel.Security.ReceiveSecurityHeader.Process(TimeSpan timeout)

</StackTrace>

 

Initial thought was that a lifetime of a SAML token was governed by the attributes set inside <saml:Condition> element :

 

           <saml:Conditions NotBefore="2009-09-22T18:53:54.425Z"

                                     NotOnOrAfter="2009-09-22T18:54:24.425Z"></saml:Conditions>

 

Once de-serialized at the client, the ‘validTo’ and ‘validFrom’ properties of the returned SecurityToken at the client should reflect the same values. However in my case, things were a bit different:

 

SecurityToken.ValidFrom  à DateTime.UtcNow

       SecurityToken.ValidTo    à DateTime.MaxValue

 

This was really disconcerting. Where the heck were these values coming from ? A max value for ValidTo meant that the issued token would never be renewed !!! Had to dig into the reflected source code and figure out the root cause of the issue. Take a look at the method responsible for de-serializing a RSTR at the client. Just showing the lines of interest from System.ServiceModel.Security.WSTrust+Driver.CreateRequestSecurityTokenResponse:

 

DateTime utcNow = DateTime.UtcNow;

DateTime maxUtcDateTime = DateTime.MaxValue;

if ((element2.LocalName == this.DriverDictionary.Lifetime.Value) && (element2.NamespaceURI == this.DriverDictionary.Namespace.Value))

{

   XmlElement element4 = XmlHelper.GetChildElement(element2, "Created",  "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd");

   if (element4 != null)

   {

      utcNow = DateTime.ParseExact(XmlHelper.ReadTextElementAsTrimmedString(element4),WSUtilitySpecificationVersion.AcceptedDateTimeFormats, DateTimeFormatInfo.InvariantInfo,DateTimeStyles.None).ToUniversalTime();

   }

   XmlElement element5 = XmlHelper.GetChildElement(element2, "Expires", "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd");

   if (element5 != null)

   {

      maxUtcDateTime = DateTime.ParseExact(XmlHelper.ReadTextElementAsTrimmedString(element5),WSUtilitySpecificationVersion.AcceptedDateTimeFormats,DateTimeFormatInfo.InvariantInfo,DateTimeStyles.None).ToUniversalTime();

   }

}

Note the highlighted elements above. WCF looks for these elements inside an incoming RSTR to set the lifetime of a security token, NOT inside SAML token. If not present, these individual values, ValidTo and ValidFrom, are set to their corresponding values as highlighted earlier. While the solution to this issue is very simple, I will come to that a bit later. First, we need to understand the significance of such an implementation. A saml:Condition defines the lifetime of a SAML token. A lifetime element defines the lifetime of a RSTR received by the client. This makes perfect sense since an issued token can be something other than a SAML token as well. Taking that into light, a token lifetime at the client and hence any subsequent renewal decision should not be tightly coupled with a SAML token lifetime.

 

Coming back to the solution, just ensure that you define a lifetime element when generating a RSTR inside your custom STS. A lifetime element inside a RSTR looks as follows:

 

<Lifetime>

<Created xmlns="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">2009-09-22T19:03:44.4697845Z</Created>

<Expires xmlns="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">2009-09-22T19:03:54.4697845Z</Expires>

</Lifetime>

 

A token with the above lifetime element will be valid for 10 minutes before a renewal request is made by the client to STS.

 

Now let’s make things a bit more interesting by considering these 2 elements in union. Consider a generated security token with the following values set:

 

saml:Conditions NotBefore : 2009-09-22T19:03:44.4697845Z

saml:Conditions NotOnOrAfter : 2009-09-22T19:03:49.4697845Z

Lifetime Created : 2009-09-22T19:03:44.4697845Z

Lifetime Expires : 2009-09-22T19:03:54.4697845Z

 

This means that while the lifetime of a SAML token is set to 5 minutes, that of the RSTR is set to 10 minutes. So the client will not renew the token till the 11th minute, while the SAML token will be invalidated by the service starting from the 6th minute. Make sure the lifetime of the issued security token by STS <= lifetime of SAML token to prevent such a scenario.

 

There is one very important parameter which I have not spoken of in this article: binding. In this explained scenario, secure sessions were disabled. Things change when we enable that either through binding or use one which implicitly sets up a secure session (netTcpBinding). I will talk about that in my following post. Till then enjoy playing around with issued tokens.

 

We all are aware that WCF provides a new TCP-based network protocol (net.tcp://) for high performance communication. People who are familiar with this particular protocol will know that for a IIS 7.0 hosted WCF service to utilize this protocol, we need to enable appropriate protocol (namely net.tcp) at the web site and virtual directory level for things to work. Highlighted below is the section where the configurations goes in :

                              bindings

Coming to the issue that I want to share today. I was working on a basic WCF client – service scenario where I had a service hosted on IIS 7.0 over net.tcp protocol. Service binding was as follows

<netTcpBinding><binding name="TransportWithWindows"> <security mode="Message"> <message clientCredentialType ="Windows"/> </security> </binding> </netTcpBinding>

With this simple setting in place, the client was failing with the following exception :

at System.ServiceModel.Channels.ConnectionUpgradeHelper.DecodeFramingFault(ClientFramingDecoder decoder, IConnection connection, Uri via, String contentType, TimeoutHelper& timeoutHelper)

at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.SendPreamble(IConnection connection, ArraySegment`1 preamble, TimeoutHelper& timeoutHelper) at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.DuplexConnectionPoolHelper.AcceptPooledConnection(IConnection connection, TimeoutHelper& timeoutHelper) at System.ServiceModel.Channels.ConnectionPoolHelper.EstablishConnection(TimeSpan timeout) at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(TimeSpan timeout) at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout) at System.ServiceModel.Channels.ServiceChannel.OnOpen(TimeSpan timeout) at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)

That connectivity is the issue out here is evident from the above stack trace. However I had my bindings configured at IIS, Net.Tcp Port Sharing Service was running on my machine, ‘WCF Non-HTTP Activation’ was installed. Where do we look in this case ? Back to the same bindings as highlighted in the image above :

                            nettcp

While including the binding information, I had made the mistake of entering it in UPPER CASE instead of lower case (net.tcp). Quite an innocuous error but took quite a while to figure out.

One other point which one should keep in their mind while configuring a protocol binding. DO NOT LEAVE ANY SPACE in between individual bindings.

.NET does not like the following : http, net.tcp. (NOTE A SPACE IN BETWEEN ‘,’ AND ‘n’)

Client will once again fail with a EndpointNotFoundException, while service activation will fail with the following exception :

InvalidOperationException: Could not find a base address that matches scheme net.tcp for the endpoint with binding NetTcpBinding. Registered base address schemes are [http,https].]

Correct : http,net.tcp

Good news is that this little nuance has been resolved with .NET 4.0. Hope these will save some precious time which can be spent elsewhere.

Scenario: A WCF service hosted in IIS that is invoked from another WCF service or ASP.NET application hosted in IIS on the same machine, throws the following error:
                 "The HTTP request is unauthorized with client authentication scheme 'Ntlm'. The authentication header received from the server was 'Negotiate,NTLM"'.
The key here is that it fails only for local requests. If you call the same WCF service from a different server, it works fine.

One possible cause of this issue could be the DisableLoopbackCheck introduced in Windows Server 2003 SP1. The following article talks about this.
926642 Error message when you try to access a server locally by using its FQDN or its CNAME alias after you install Windows Server 2003 Service Pack 1: "Access denied" or "No network provider accepted the given network path"
http://support.microsoft.com/default.aspx?scid=kb;EN-US;926642

A quick test to find out if you are running into this issue is turn off disable loop back check(Method 2 in the article above). If this fixes the issue, you should use the Method1 from the article above for a long term solution.

Method 1 (recommended): Create the Local Security Authority host names that can be referenced in an NTLM authentication request

To do this, follow these steps for all the nodes on the client computer:

  1. Click Start, click Run, type regedit, and then click OK.
  2. Locate and then click the following registry subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0
  3. Right-click MSV1_0, point to New, and then click Multi-String Value.
  4. In the Name column, type BackConnectionHostNames, and then press ENTER.
  5. Right-click BackConnectionHostNames, and then click Modify.
  6. In the Value data box, type the CNAME or the DNS alias, that is used for the local shares on the computer, and then click OK.

    Note Type each host name on a separate line.

    Note If the BackConnectionHostNames registry entry exists as a REG_DWORD type, you have to delete the BackConnectionHostNames registry entry.
  7. Exit Registry Editor, and then restart the computer.

Method 2: Disable the authentication loopback check

Re-enable the behavior that exists in Windows Server 2003 by setting the DisableLoopbackCheck registry entry in the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa registry subkey to 1. To set the DisableLoopbackCheck registry entry to 1, follow these steps on the client computer:

  1. Click Start, click Run, type regedit, and then click OK.
  2. Locate and then click the following registry subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa
  3. Right-click Lsa, point to New, and then click DWORD Value.
  4. Type DisableLoopbackCheck, and then press ENTER.
  5. Right-click DisableLoopbackCheck, and then click Modify.
  6. In the Value data box, type 1, and then click OK.
  7. Exit Registry Editor.
  8. Restart the computer.

Note You must restart the server for this change to take effect. By default, loopback check functionality is turned on in Windows Server 2003 SP1, and the DisableLoopbackCheck registry entry is set to 0 (zero). The security is reduced when you disable the authentication loopback check, and you open the Windows Server 2003 server for man-in-the-middle (MITM) attacks on NTLM.


The same scenario could also occur in case of an asp.net application calling a webservice(.asmx) locally.
The following blog discusses the details about this issue.
http://blogs.msdn.com/lukaszp/archive/2008/07/18/reporting-services-http-401-unauthorized-host-headers-require-your-attention.aspx

Problem Description
You have a COM+ server application. The application is set to run as a particular user. After working for sometime on Windows Server 2008 the application may stop working and keep failing. Unless you restart the COM+ application, it won’t come back. In the meantime you may see an error like this in the application event log on the CLIENT machine:

Event Type:        Error
Event Source:    DCOM
Event Category:                None
Event ID:              10006
Date:                     10/17/2009
Time:                    1:36:39 PM
User:                     Domain\user
Computer:          *****
Description:
DCOM got error "Unspecified error " from the computer ‘servername’ when attempting to activate the server: {EF047BF9-F91A-4D5B-A18F-BED49553703B}

In this case the event message tells you that the error (E_FAIL or 80004005 or Unspecified error ) is returned from the server during activation vs. a method call. The component CLSID is {EF047BF9-F91A-4D5B-A18F-BED49553703B}

Cause 
The identity user initially logged on to the server when the application launched. The issue happens when the identity user logs off and the COM+ application can no longer read registry keys in the profile of the identity user because of a new User Profile Service functionality of forcing the unload of the user profile on Windows 2008 when the user logs off. Note this new User Profile Service functionality is built into the OS by default.This is a situation where the functionality of forcing the unload of the user profile may break an application if registry handles are not closed in the process.

If you enable COM tracing, you’ll see the error ERROR_KEY_DELETED in the ole32 trace log:

[2] 0BA8.15D0::10/17/2009-13:07:54.390 [OLECOM](:CComRegCatalog::GetClassInfoW) CLSID:ecabafae-7f19-11d2-978e-0000f8757e2a 1018(ERROR_KEY_DELETED)
[2] 0BA8.15D0::10/17/2009-13:07:54.390 [OLECOM](:CComCatalog::GetClassInfoInternal) CLSID:ecabafae-7f19-11d2-978e-0000f8757e2a Flags:0 IID:00000000-0000-0000-c000-000000000046 0x800703fa(ERROR_KEY_DELETED)

You'll see events like this in the application event log:

Log Name:      Application
Source:        Microsoft-Windows-User Profiles Service
Date:          10/26/2009 8:22:13 AM
Event ID:      1530
Task Category: None
Level:         Warning
Keywords:      Classic
User:          SYSTEM
Computer:      DAVIDQIU2008
Description:
Windows detected your registry file is still in use by other applications or services. The file will be unloaded now. The applications or services that hold your registry file may not function properly afterwards. 

DETAIL -
1 user registry handles leaked from \Registry\User\S-1-5-21-1049297961-3057247634-349289542-1004_Classes:
Process 2428 (\Device\HarddiskVolume1\Windows\System32\dllhost.exe) has opened key \REGISTRY\USER\S-1-5-21-1049297961-3057247634-349289542-1004_CLASSES

Resolution
As a workaround it may be necessary to disable this feature which is the default behavior. The policy setting 'Do not forcefully unload the user registry at user logoff' counters the default behavior of Windows 2008. When enabled, Windows 2008 does not forcefully unload the registry and waits until no other processes are using the user registry before it unloads it.

The policy can be found in the group policy editor (gpedit.msc)
Computer Configuration->Administrative Templates->System-> UserProfiles
Do not forcefully unload the user registry at user logoff

Change the setting from “Not Configured” to “Enabled”, which disables the new User Profile Service feature.

'DisableForceUnload' is the value added to the registry

Note the same issue can happens on Vista, Windows 7 and Windows 2008 R2.

SYMPTOMS
A COM+ application that uses Oracle 9i fails intermittently on Windows 2003 Service Pack 2. The application logs the follow error in the event log:

Event Type:     Error
Event Source:  ************
Event Category:          None
Event ID:        0
Date:               9/14/2009
Time:               10:46:51 PM
User:                N/A
Computer:       ************
Description:
Error: PID 1160 -- CATCHFAILED2 ERROR: 0x8004d00a Provider New transaction cannot enlist in the specified transaction coordinator.  Unable to enlist in the transaction.

If the component is a .NET servicedcomponent using System.Data.OracleClient, you'll see a System.InvalidOperationException with the following error message:

Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

CAUSE
Examining the MSDTC trace logs you will see the event RECEIVED_ABORT_REQUEST_FROM_BEGINNER that is logged 90 seconds after the transaction has started.

pid=3188       ;tid=5532       ;time=10/05/2009 18:53:42.035   ;seq=1504       ;eventid=TRANSACTION_BEGUN                        ;tx_guid=c1ff5502-3c67-40ba-b6b8-2b21b54a39f2     ;"transaction got begun, description : 'ECP.ECPOutstandingOutProcessor.1'"
pid=3188       ;tid=4736       ;time=10/05/2009 18:55:13.098   ;seq=1505       ;eventid=RECEIVED_ABORT_REQUEST_FROM_BEGINNER     ;tx_guid=c1ff5502-3c67-40ba-b6b8-2b21b54a39f2     ;"received request to abort the transaction from beginner"
pid=3188       ;tid=4736       ;time=10/05/2009 18:55:13.098   ;seq=1506       ;eventid=TRANSACTION_ABORTING                     ;tx_guid=c1ff5502-3c67-40ba-b6b8-2b21b54a39f2     ;"transaction is aborting"
pid=3188       ;tid=4736       ;time=10/05/2009 18:55:13.098   ;seq=1507       ;eventid=TRANSACTION_ABORTED                      ;tx_guid=c1ff5502-3c67-40ba-b6b8-2b21b54a39f2     ;"transaction has been aborted"

In our case the DTC transaction timeout value is set to 0 (infinite) so the transaction isn't timed out in default 60 seconds. Otherwise you may see the event ABORT_DUE_TO_TRANSACTION_TIMER_EXPIRED in the trace when DTC is timed out.

A dump before the error will show the thread stack like the following:

0:000> ~*kb

  12  Id: 1148.16c0 Suspend: 1 Teb: 7ffaa000 Unfrozen
ChildEBP RetAddr  Args to Child             
0120cfbc 7c826f69 77e41ed5 00000000 0120d000 ntdll!KiFastSystemCallRet [d:\nt\base\ntos\rtl\i386\userdisp.asm @ 545]
0120cfc0 77e41ed5 00000000 0120d000 00000000 ntdll!ZwDelayExecution+0xc [d:\nt\base\ntdll\daytona\obj\i386\usrstubs.asm @ 611]
0120d028 77e424fd 00002328 00000000 0120d084 kernel32!SleepEx+0x68 [d:\nt\base\win32\client\synch.c @ 1655]
0120d038 61173b62 00002328 0120d0d0 00000000 kernel32!Sleep+0xf [d:\nt\base\win32\client\synch.c @ 1579]
0120d084 61174679 44701780 08f60fd8 02cc8714 msdtcprx!CIResourceManager::Init+0xee [d:\nt\com\complus\dtc\dtc\msdtcprx\src\dtcprxrm.cpp @ 520]
0120d0a4 02e9f188 00038a68 0120d0d4 02d1e358 msdtcprx!CIResourceManagerFactory::Create+0x79 [d:\nt\com\complus\dtc\dtc\msdtcprx\src\dtcprxrm.cpp @ 186]
WARNING: Stack unwind information not available. Following frames may be wrong.
0120d0f8 02e9e618 00038a68 02d1f1a0 00fab438 oramts+0xf188
0120d524 02e9ffba 00faa420 44701780 08f60fd8 oramts+0xe618
0120eb6c 02ea667d 011003c8 00001000 00000000 oramts+0xffba
0120ed8c 02ea656a 00effb60 011003c8 00001000 oramts!OraMTSJoinTxnEx+0x10e
0120eda0 02726e38 00effb60 011003c8 00000000 oramts!OraMTSJoinTxn+0x15
0120edd0 4c851e69 02304348 02cc8634 027069fd OraOLEDB+0x26e38
0120eddc 027069fd 02304354 027043ce 02cc8634 oledb32!ATL::CComContainedObject<CRCM>::AddRef+0x11 [d:\nt\enduser\databaseaccess\src\mdac\shared\mpcs_atl\atlcom.h @ 835]
0120ede4 027043ce 02cc8634 0120ee34 011003c8 OraOLEDB+0x69fd
0120edf4 4c883f42 02cc8698 011003c8 ffffffff OraOLEDB+0x43ce
0120ee34 4c883de7 02cc868c 011003c8 ffffffff oledb32!CTransactionJoinPass<CSCM>::JoinTransaction+0xbd [d:\nt\enduser\databaseaccess\src\mdac\oledb\core\src\cm\transactions.hpp @ 224]
0120ee50 4c884b96 023046e4 02305e74 023038e0 oledb32!CTransactionJoinCache<CDCM>::PassTransactionInfo+0x1c [d:\nt\enduser\databaseaccess\src\mdac\oledb\core\src\cm\transactions.hpp @ 158]
0120ee90 4c88cb5f 02304348 00172c80 0312601c oledb32!CDCM::EnlistSCM+0x87 [d:\nt\enduser\databaseaccess\src\mdac\oledb\core\src\cm\dcm.cpp @ 625]
0120eeb8 4a757523 0230357c 023038e0 011003c8 oledb32!CDCMPool::EnlistResource+0xbf [d:\nt\enduser\databaseaccess\src\mdac\oledb\core\src\dpo\dcmpool.cpp @ 1306]
0120eef0 4a7580f4 0312601c 023038e0 011003c8 comsvcs!CHolder::SafeDispenserDriver::EnlistResource+0x22 [d:\nt\com\complus\src\comsvcs\dispman\cholder.cpp @ 177]
0:000> ?00002328
Evaluate expression: 9000 = 00002328

0:000> lmvm oramts
start end module name
02e90000 02eb6000 oramts (export symbols) oramts.dll
Loaded symbol image file: oramts.dll
Image path: D:\oracle\ora92\bin\oramts.dll
Image name: oramts.dll
Timestamp: Wed Nov 12 05:31:04 2003 (3FB20BE8)
CheckSum: 0003257C
ImageSize: 00026000
File version: 9.2.0.4
Product version: 9.2.0.4
File flags: 0 (Mask 0)
File OS: 4 Unknown Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: Oracle Corporation
ProductName: ORAMTS
InternalName: ORAMTS.DLL
OriginalFilename: ORAMTS.DLL
ProductVersion: 9.2.0.4.0
FileVersion: 9.2.0.4.0
FileDescription: Oracle Microsoft Transaction Server Integration DLL
LegalCopyright: Copyright © Oracle Corporation 1979,1999. All rights reserved.
Comments: Implements the API to integrate Oracle clients with Microsoft Transaction Server


0:000> lmvm OraOLEDB
start end module name
02700000 02763000 OraOLEDB C (export symbols) OraOLEDB.dll
Loaded symbol image file: OraOLEDB.dll
Image path: D:\oracle\ora92\bin\OraOLEDB.dll
Image name: OraOLEDB.dll
Timestamp: Sun May 05 17:34:29 2002 (3CD5A565)
CheckSum: 00000000
ImageSize: 00063000
File version: 9.2.0.1
Product version: 9.2.0.1
File flags: 0 (Mask 3F)
File OS: 4 Unknown Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: Oracle Corporation
ProductName: Oracle Provider for OLE DB
InternalName: OraOLEDB
OriginalFilename: OraOLEDB.dll
ProductVersion: 9.2.0.1.0
FileVersion: 9.2.0.1.0
PrivateBuild: 9.2.0.1.0
SpecialBuild: 9.2.0.1.0
FileDescription: OraOLEDB Module
LegalCopyright: Copyright © 2001
LegalTrademarks: Copyright © 2001
Comments: Copyright © 2001

The code is stuck in a while loop that breaks after 10 sleeps with each sleep being 9 seconds. This is caused by a bug in the Oracle drivers. Please read Robert Hurlbut's .NET Blog.

About getting memory dumps and displaying callstacks, you can reference my other blogs such as A COM+ Application That Calls the DOMDocumentWrapper::load Method in msxml3 Is Hung Intermittently.

RESOLUTION
There's a patch on the metalink site that addresses this. The patch number is #3252645. The version of oramts.dll that includes the fix is 9.2.0.4.1

SYMPTOMS 
Intermittently MSDTC 4156 error message with RPC_S_INVALIDA_ARG in the event details is logged in the event log when MSDTC is configured with Mutual Authentication not required.

Event Type: Error
Event Source: MSDTC
Event Category: CM
Event ID: 4156
Date: 10/29/2009
Time: 4:22:02 PM
User: N/A
Computer: *********
Description:
String message: RPC raised an exception with a return code RPC_S_INVALIDA_ARG.

ENVIRONMENT
Windows 2003 Service Pack 1 and above.

CAUSE
The 4156 event typically happens when a session from a DTC to its partner DTC is getting established. The code always attempts to bind using Mutual Authentication to establish a secure session. If secure negotiation fails it will try unsecure binding if Mutual Authentication is not required. The attempt to try Mutual Authentication generates the 4156 event, however the immediate unsecure retry succeeds.

If you enable CM tracing, the CM Trace files reflect this behavior as follows:

10-29-2009 16:22 02:920: CM Error Value = 0x000006d2, Received Unauthenticated Rpc Call, d:\nt\com\complus\dtc\dtc\cm\src\iomgrsrv.cpp (704)
10-29-2009 16:22 02:951: CM Error Value = 0x00000057, Call to RpcBindingSetAuthInfoEx failed, d:\nt\com\complus\dtc\dtc\cm\src\iomgrclt.cpp (543)
10-29-2009 16:22 02:966: CM Error Value = 0x80070057, Call to SetRpcSecurity failed, d:\nt\com\complus\dtc\dtc\cm\src\iomgrclt.cpp (570)
10-29-2009 16:22 02:966: CM Error Value = 0x80070057, SecureBuildContextWrapper call failed. This is usually due to security/network configuration issues., d:\nt\com\complus\dtc\dtc\cm\src\iomgrclt.cpp (329)

Add the following Registry Value to enable CM Tracing:

Key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC
Name: TraceCMErr
Type: REG_DWORD
Value: 1 - Tracing enabled
Default Value: 0 - Tracing disabled

Restart MSDTC service for tracing to be enabled and collect the files from the "C:\Documents and Settings\NetworkService\Local Settings\Temp" folder. The trace file names will be in the following format: "MsDtcCMErr-<number>.txt"

How to enable diagnostic tracing for MSDTC on a computer that is running Windows XP or Windows Server 2003
How to enable diagnostic tracing for MS DTC on a Windows Vista-based computer

RESOLUTION
The 4156 event is safe to ignore if Mutual Authentication is not required and the CM trace log does not contain an additional trace messages of "BuildContextWrapper call failed. This is usually due to network configuration issues". The event is logged from RpcStatusToHresult, a utility function that is called by SetRPCSecurity after RpcBindingSetAuthInfoEx failed. We can consider a Warning message instead of the Error type in the future release.

Problem Description

A WCF message inspectors pass a custom message header from the client caller to the WCF service hosted in a Windows service over TCP.

  • A ClientMessageInspector adds in the custom header. 
  • A DispatchMessageInspector looks for the custom header and interrogates it if it is present.
  • The DispatchMessageInspector looks for the custom header and then calls ToString() on it to get its contents. 

The service WcfMyIncomingMessageInspector : IDispatchMessageInspector calls the WcfMyRequestSoapHeader(System.ServiceModel.Channels.Message message) constructor.

internal WcfMyRequestSoapHeader(System.ServiceModel.Channels.Message message)
{
    foreach (System.ServiceModel.Channels.MessageHeaderInfo header in message.Headers)
    {
        if (header.Name == "MyRequestSoapHeader")
        {
            String documentToLoad = String.Empty;
            try
            {
                XmlDocument document = new XmlDocument();
                documentToLoad = @“<?xml version='1.0' ?>” + header.ToString();
                document.LoadXml(documentToLoad);
                foreach (System.Xml.XmlNode childNode in document.DocumentElement.ChildNodes)
                {
                    WcfUtility.LoadContextElements(this.LoggerContextElements, childNode);
               }
            }
       }
}

The header contents retrieved should look something like:

<MyRequestSoapHeader xmlns="MyRequestSoapHeader">
        <LoggerContextElements>
                <entry name="LocalInstanceId" value="b0b31130-99e4-4fe8-a976-cc39d81c9c7b"></entry>
        </LoggerContextElements>
</MyRequestSoapHeader>

However the string “System.ServiceModel.Channels.HeaderInfoCache+HeaderInfo” is returned from the ToString() method. Next when LoadXml tries to load <?xml version='1.0' ?>System.ServiceModel.Channels.HeaderInfoCache+HeaderInfo and it fails because of the invalid XML.

<Message>Data at the root level is invalid. Line 1, position 23.</Message>

Cause

The reason that “System.ServiceModel.Channels.HeaderInfoCache+HeaderInfo” is returned from the ToString() method is because the type of ‘header’ is System.ServiceModel.Channels.HeaderInfoCache+HeaderInfo instead of System.ServiceModel.Channels.MessageHeader as expected. There is no ToString override in System.ServiceModel.Channels.HeaderInfoCache+HeaderInfo so the type name itself is returned. The problem with this is that MessageHeaders is a collection of MessageHeaderInfo and not of MessageHeader. It is not a good idea to count on objects enumerated by this class to have a particular type other than MessageHeaderInfo, which does not override ToString(). 

MyRequestSoapHeader in this case does derive from MessageHeader, however, since MessageHeader is not serializable, neither is WcfMyRequestSoapHeader, instead this class manages writing itself to the message and so, must manage retrieving itself from the message as well. 

Resolution

Here is a changed version of the request and response headers that solves the problem. The only change is in the constructor – instead of relying on the type of the header in the MessageHeaderCollection (that is, relying that the type will have an override of the ToString() method), these constructors access the header XML directly.

       internal WcfMyRequestSoapHeader(System.ServiceModel.Channels.Message message)
        {
            StringWriter writer = new StringWriter();
            writer.Write(@“<?xml version='1.0' ?>”);
           try
            {
                message.Headers.WriteHeader(message.Headers.FindHeader(this.Name, this.Namespace), new    XmlTextWriter(writer));
                XmlDocument document = new XmlDocument();
                document.LoadXml(writer.ToString());
                foreach (System.Xml.XmlNode childNode in document.DocumentElement.ChildNodes)
                {
                    WcfUtility.LoadContextElements(this.LoggerContextElements, childNode);
                }
            }

Another alternative would be to define WcfMyRequestSoapHeader using a DataContract and use one of the GetHeader overrides to retrieve the header as the appropriate type – this would simplify the code a good deal and remove the need for parsing the header contents as an XmlDocument.

Problem Description

The component is a COM+ event class. It is also a COM+ QueuedComponent (QC) and has object pooling enabled. The clients are a COM+ event publisher. They call into the event component to fire an event. Subscribers are notified of the event. The application works for about a few minutes and then stops working. The event class stops firing any event.

Below is the entry in the application event log while the app stops working:

Event Type:        Error
Event Source:    COM+
Event Category:                (103)
Event ID:              4772
Date:                     2/22/2009
Time:                     2:13:11 AM
User:                     N/A
Computer:          ComputerName
Description:
The COM+ Queued Components Player was unable to create an instance of a Queued Component. CPlayer BindToObject
Server Application ID: {B3DEC044-DFAB-46E0-B424-EBB3FDF90474}
Server Application Instance ID:
{6E75B9BA-DE44-4044-ACE4-CB69F6C73E73}
Server Application Name: ApplicationName
Error Code = 0x8004e024 : COM+ activation failed because the activation could not be completed in the specified amount of time.
COM+ Services Internals Information:
File: d:\srvrtm\com\complus\src\comsvcs\qc\player\player.cpp, Line: 467
Comsvcs.dll file version: ENU 2001.12.4720.1830 shp

Cause

First of all, there is a fundamental issue to be aware of when using QC with a pooled component.

The QC listener in a COM+ server application is designed to scale up to the configured maximum number of threads and it aggressively retrieves messages with as many threads as it has running.  When it first retrieves a message, the message is more or less opaque.  At this point we would not be able to throttle the number of threads based on the maximum pool size of a particular pooled component, for example, or really any contextual information about that component because we don’t know yet what component that message is intended for.  The maximum listener threads is really intended to be the only control on concurrency of QC message processing.

The problem is that object pooling is another control on concurrency in COM+ applications. One of the primary uses is managing limited resources like database connections, and being designed with this in mind, the maximum pool size is really a hard cutoff on the number of instances of a particular component that can exist at any one time.  Although object pooling might also be used to improve performance of some scenarios, this cutoff needs to be kept in mind.  If you configure a component to have a maximum pool size of 12, for example, this is a concurrency limit for that component.

Now let’s look at what happens if the QC listener tries to do more than the object pool can handle. Let’s say QCListenerMaxThreads is 64 (the default for a 4 proc machine = 4*16), and the pool size for one of the components with queuing enabled is 12.  If there are 64 messages in the queue, all intended for the component with max pool size 12, it is possible that 64 threads will be each processing one of these messages. Each of these thread tries to create an instance of the component. Only 12 will immediately get an instance, and the rest will block, possibly until the activation timeout is reached and activation fails with a CO_E_ACTIVATIONFAILED_TIMEOUT (0x8004e024) error. Note that as soon as one of the threads is finished with its instance and returns it to the pool it will almost immediately enable another message to be pulled from the queue, which might also contend for that component. Object pooling uses a semaphore object, which doesn’t guarantee first-in, first-out.

Something along these lines is very likely to be the cause of the event log error. In this particular case, since we failed to even activate the component, the message gets sent to the application’s dead queue.

And it may get worse if there are normal activations coming in for this component as well. These are contending for the same objects as the listener threads, and have no concurrency limit. Whenever you have both normal and queued activations for a component, there is no way to configure it to completely avoid this issue.

Resolution

The component in this particular case is an event class and it shouldn’t need pooling. Compared to components that need to get database connections, etc., pooling probably isn’t buying much in this scenario. Our recommendation is to choose either QC or pooling and disable the other.

The takeaway from all this is that every queued, pooled component in a COM+ application needs to have a maximum pool size at least as large as the maximum number of listener threads to avoid this issue.

We don’t have any KB articles specific to this, but this one gets close to the issue:

PRB: Virtual Deadlock When You Call Pooled Component from Queued Component

Problem Description

A .NET WinForms application creates a distributed transaction that inserts a row into a remote Informix OLE DB database. The .NET app uses System.Transactions and the IBM Informix OLE DB Provider. The transaction starts timing out when calling TransactionScope.Complete().  It is throwing a TransactionAbortedException in one minute. The inner exception is TimeoutException.

If TransactionScope is removed, the code inserts correctly. Using TransactionScope, the record is inserted into the database that can be seen with ISOLATION LEVEL set to DIRTY READ until the transaction aborts.

Cause

You can take a hang dump of the WinForm application process before the exception is thrown using ADPlus or Debug Diagnostic Tool v1.1.

Open the dump in WinDbg.exe and set the symbol path as below:

0:000> .sympath SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
0:000> .reload

Display the unmanaged callstack::

0:000> kL
ChildEBP RetAddr 
0012ec34 7c90d1fc ntdll!KiFastSystemCallRet
0012ec38 7c8023f1 ntdll!NtDelayExecution+0xc
0012ec90 7c802455 kernel32!SleepEx+0x61
0012eca0 03ef8c74 kernel32!Sleep+0xf
WARNING: Stack unwind information not available. Following frames may be wrong.
00000000 00000000 ifxoledbc!InitDlgProc+0x19974

0:000> lmvm ifxoledbc
start    end        module name
03ed0000 03f20000   ifxoledbc   (export symbols)       ifxoledbc.dll
    Loaded symbol image file: ifxoledbc.dll
    Image path: C:\Program Files\IBM\Informix\Client-SDK\bin\ifxoledbc.dll
    Image name: ifxoledbc.dll
    Timestamp:        Wed Apr 23 07:18:56 2008 (480F1B20)
    CheckSum:         00054EFD
    ImageSize:        00050000
    File version:     3.50.0.13407
    Product version:  3.50.0.13407
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        1.0 App
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      IBM Corporation
    ProductName:      IBM Informix OLE DB Provider
    InternalName:     Ifxoledbc
    OriginalFilename: Ifxoledbc.dll
    ProductVersion:   3.50.TC1
    FileVersion:      3.50.0000
    FileDescription:  IBM Informix OLE DB Provider
    LegalCopyright:   Copyright 2007 IBM Corporation

It is stuck in the IBM Informix OLE DB Provider.

Load the CLR debugger extension sos.dll from the framework directory.

0:000> .load c:\Windows\Microsoft.NET\Framework\v2.0.50727\sos.dll

Display the managed callstack:

0:000> !clrstack
OS Thread Id: 0x12f4 (0)
ESP       EIP    
0012eeb0 7c90e4f4 [ComPlusMethodFrameGeneric: 0012eeb0] System.Data.Common.NativeMethods+ITransactionJoin.JoinTransaction(System.Object, Int32, Int32, IntPtr)
0012eecc 655eff63 System.Data.OleDb.OleDbConnectionInternal.EnlistTransactionInternal(System.Transactions.Transaction, Boolean)
0012ef10 655f008d System.Data.OleDb.OleDbConnectionInternal.Deactivate()
0012ef20 6522dc3a System.Data.ProviderBase.DbConnectionInternal.CloseConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory)
0012ef54 655e99dd System.Data.OleDb.OleDbConnection.Close()
<snip>

OleDbConnectionInternal.Deactivate calls OleDbConnectionInternal.EnlistTransactionInternal that in turn calls ITransactionJoin::JoinTransaction passing a null to it. When null is passed in, System.Data.Common.NativeMethods+ITransactionJoin.JoinTransaction unenlists the transaction (see ITransactionJoin::JoinTransaction). The IBM Informix OLE DB Provider doesn't return from this call until it is timing out.

Resolution

A fix for the bug is available from IBM.

Problem Description

Intermittently a COM+ application that called the DOMDocumentWrapper::load method in msxml3 is hung. A restart of the application was required to temporarily solve the problem on a Windows 2000 SP4 machine.

Cause

To identify the cause of the issue we took several hang dumps of the COM+ application (dllhost.exe) when it was hung using ADPlus. Alternatively we could use Debug Diagnostic Tool v1.1 to get memory dumps.

Opened the dump in WinDbg.exe and set the symbol path as below:

0:000> .sympath SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
0:000> .reload

Used ~*kb to display all thread stacks. Many threads were seen waiting for a URLMON critical section.

0:000> ~*kb

Took thread 9 as an example, which was one of waiting threads.

0:000> ~9s
eax=00000000 ebx=00000000 ecx=00000400 edx=00000000 esi=000c0f94 edi=00000000
eip=77f88f13 esp=021d31e8 ebp=021d3258 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
NTDLL!NtWaitForSingleObject+0xb:
77f88f13 c20c00          ret     0Ch

0:009> kbL
ChildEBP RetAddr  Args to Child             
021d31e4 77f8f295 000011d8 00000000 00000000 NTDLL!NtWaitForSingleObject+0xb
021d3258 77f87f26 000c0f00 1a41abe8 000c0f94 NTDLL!RtlpWaitForCriticalSection+0x9e
021d3260 1a41abe8 000c0f94 021d3320 1a4071f8 NTDLL!RtlEnterCriticalSection+0x46
021d32ac 1a40ff92 000c0f88 0449ec08 021d32cc URLMON!CProtMgr::FindFirstCF+0x2c
021d32d0 1a415f67 000c0f20 0449ec08 00000000 URLMON!COInetSession::CreateHandler+0x2b
021d332c 1a416d10 00000000 0449ec08 07b48278 URLMON!CTransaction::LoadHandler+0x90
021d3354 1a416e11 07b480c8 00000016 00000000 URLMON!CTransaction::DispatchReport+0x50b
021d337c 1a416eac 07b480c8 07e2ed00 00000000 URLMON!CTransaction::DispatchPacket+0x2e
<snip>

0:009> dd 000c0f94 l4
000c0f94  000bde40 00000013 00000001 000014a0
0:009> ~~[000014a0]
  13  Id: 1464.14a0 Suspend: 0 Teb: 7ffa6000 Unfrozen
      Start: comsvcs!STAThread::STAThreadWorker (787c343e)
      Priority: 0  Priority class: 32  Affinity: ff

0:009> lmvm urlmon
start    end        module name
1a400000 1a47d000   URLMON     (private pdb symbols)  C:\symsrv\symbols\urlmon.pdb\48F658411\urlmon.pdb
    Loaded symbol image file: URLMON.DLL
    Image path: C:\WINNT\system32\URLMON.DLL
    Image name: URLMON.DLL
    Timestamp:        Wed Oct 15 16:53:21 2008 (48F65841)
    CheckSum:         000785A8
    ImageSize:        0007D000
    File version:     6.0.2800.1617
    Product version:  6.0.2800.1617
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     UrlMon.dll
    OriginalFilename: UrlMon.dll
    ProductVersion:   6.00.2800.1617
    FileVersion:      6.00.2800.1617
    FileDescription:  OLE32 Extensions for Win32
    LegalCopyright:   © Microsoft Corporation. All rights reserved.

The URLMON critical section was owned by thread 13 in the process. Looking at its stack we saw that thread was holding the lock while making an outbound COM call to a different thread in the same process. Additional inbound/outbound COM calls were piled up on the stack. On the top of the stack the outbound COM call for a MTA component was dispatched.

0:009> ~13s
eax=00000102 ebx=00000002 ecx=df010101 edx=00000000 esi=77f88ef8 edi=00000002
eip=77f88f03 esp=022d0530 ebp=022d057c iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
NTDLL!ZwWaitForMultipleObjects+0xb:
77f88f03 c21400          ret     14h
0:013> KbL100
ChildEBP RetAddr  Args to Child             
022d052c 7c59a26d 00000002 022d0554 00000001 NTDLL!ZwWaitForMultipleObjects+0xb
022d057c 77e1e9db 022d0554 00000001 00000000 KERNEL32!WaitForMultipleObjectsEx+0xea
022d05d8 7ce7a494 022d05a4 022d0620 000003e8 USER32!MsgWaitForMultipleObjectsEx+0x153
022d0600 7ce846a2 022d0620 000003e8 022d0630 OLE32!CCliModalLoop::BlockFn+0x79
022d0628 7ce846e6 ffffffff 00086448 7cef6a47 OLE32!ModalLoop+0x59
022d0634 7cef6a47 00086448 022d068c 0591b914 OLE32!SwitchSTA+0x1e
022d0650 7cef3ab6 022d068c 0591b914 022d06c8 OLE32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0xdd
022d0690 7ce7a2ff 0591b914 022d07cc 022d0788 OLE32!CRpcChannelBuffer::SendReceive2+0x96
<snip>
022d29ac 7cee0f73 7ceff694 00000001 022d3374 OLE32!CClientContextActivator::GetClassObject+0x75
022d29c0 7ce9196a 022d2da0 022d3374 003a0043 OLE32!ActivationPropertiesIn::DelegateGetClassObject+0x46
022d3380 7ce8baaf 00091f7c 00000001 00000000 OLE32!ICoGetClassObject+0x2a5
022d33ac 7ce60ebb 022d348c 00000001 00000000 OLE32!CComActivator::DoGetClassObject+0x6f
022d33cc 1a41acb9 022d348c 00000001 00000000 OLE32!CoGetClassObject+0x19
022d3428 1a40ff92 00000000 059ffdc8 022d3448 URLMON!CProtMgr::FindFirstCF+0xfd
<snip>
022d4b98 72e73710 0824a6b8 081b93b0 00000000 URLMON!CUrlMon::BindToStorage+0x50
022d4be4 72e72bdd 0824a6b8 00000000 00000000 msxml3!URLStream::OpenURL+0x3b3
022d4c00 72e72b26 022d4c1c 00000000 04a75170 msxml3!URLStream::Open+0x8f
022d4c34 72e729d9 00000000 04a75270 00000001 msxml3!XMLParser::PushURL+0x146
022d4c60 72e629b4 043df1cb 00000000 04a75270 msxml3!XMLParser::SetURL+0x47
022d4ce4 72e73251 00000001 00000000 00000000 msxml3!Document::_load+0x12a
022d4d00 72e73133 04a75260 00000001 022d4ea8 msxml3!Document::load+0x44
022d4d80 011c4cfc 049e70b0 cccc0008 cccccccc msxml3!DOMDocumentWrapper::load+0x1cc
<snip>

Used the dd command to deference the memory address 022d068c on OLE32!SwitchSTA above.

0:013> dd 022d068c l1
022d068c  07d93630

The call from the owner (~13) of the critical section was dispatched to MTA thread 54 because it was the only thread that 07d93630 appears on the stack.  Thread 54 was also waiting for the lock. The owner stack could not unwind and the lock would never be released until the process was restarted. It was a DEADLOCK!

0:013> ~54s
eax=00000000 ebx=00000000 ecx=00007ece edx=00000000 esi=000c0f94 edi=00000000
eip=77f88f13 esp=04169434 ebp=041694a4 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000206
NTDLL!NtWaitForSingleObject+0xb:
77f88f13 c20c00          ret     0Ch
0:054> kbL100
ChildEBP RetAddr  Args to Child             
04169430 77f8f295 000011d8 00000000 00000000 NTDLL!NtWaitForSingleObject+0xb
041694a4 77f87f26 000c0f00 1a41abe8 000c0f94 NTDLL!RtlpWaitForCriticalSection+0x9e
041694ac 1a41abe8 000c0f94 0416956c 1a4071f8 NTDLL!RtlEnterCriticalSection+0x46
041694f8 1a40ff92 000c0f88 043a8948 04169518 URLMON!CProtMgr::FindFirstCF+0x2c
<snip>
0416acb4 72e72bdd 07ff7838 00000000 00000000 msxml3!URLStream::OpenURL+0x3b3
0416acd0 72e72b26 0416acec 00000000 04a3bf68 msxml3!URLStream::Open+0x8f
0416ad04 72e729d9 00000000 01999710 00000001 msxml3!XMLParser::PushURL+0x146
0416ad30 72e629b4 08512543 00000000 01999710 msxml3!XMLParser::SetURL+0x47
0416adb4 72e73251 00000001 00000000 00000000 msxml3!Document::_load+0x12a
0416add0 72e73133 01999700 00000001 0416af78 msxml3!Document::load+0x44
0416ae50 011c4cfc 049e7f28 cccc0008 cccccccc msxml3!DOMDocumentWrapper::load+0x1cc
<snip>
0416fe54 7ce7a72f 07ac00fc 00000000 081f1340 OLE32!CCtxComChnl::ContextInvoke+0xbb
0416fe70 7cef54e2 07d93680 00000001 081f1340 OLE32!MTAInvoke+0x18
0416fea0 7cef5c06 07d93630 07ac00fc 081f1340 OLE32!AppInvoke+0xb5
0416ff60 7cef5950 04363b64 00000000 00000000 OLE32!ComInvokeWithLockAndIPID+0x297
0416ff80 7ce84714 07d93630 7c59a031 07f2efe8 OLE32!ComInvoke+0x41
0416ff90 7ce85bc5 07d93630 7ce20000 07f2efe8 OLE32!ThreadDispatch+0x21
0416ffa8 7ce85b1d 77d9138f 08005f88 7c57b3bc OLE32!CRpcThread::WorkerLoop+0x22
0416ffb4 7c57b3bc 07f2efe8 77d9138f 08005f88 OLE32!CRpcThreadCache::RpcWorkerThreadEntry+0x1a
0416ffec 00000000 7ce85b03 07f2efe8 00000000 KERNEL32!BaseThreadStart+0x52

Resolution:

It is a known URLMON bug that was fixed in IE7 ONLY. The bug here is holding a lock across the COM activation/call. To workaroud issue we passed the data as a string to xmlDOC->LoadXML(). LoadXMLdoes not use URLMON. If you run into the same bug on Windows 2000 SP4, you can try a similar workaround.

Problem Description

We saw slow Web response and high CPU in a 3-tier application under load. The application consisted of an IIS Web application front end, COM+ applications on the app server and the DB application on the backend.

Cause

To identify the cause of the performance problem we took several hang dumps of the RPCSS service(svchost.exe) using ADPlus on the app server during the period of slow performance. Alternatively we could use Debug Diagnostic Tool v1.1.

Opened the RPCSS dump in WinDbg.exe and set the symbol path as below:

0:000> .sympath SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols
0:000> .reload

Used ~*kb to display all thread stacks. Many threads were seen to be backed up acquiring the rpcss!CRWLock.

0:000> ~*kb

Took thread 14 as an example, which was one of waiting threads.

0:000> ~14s
eax=00f3f632 ebx=13001000 ecx=00000003 edx=00f30010 esi=00000188 edi=00000000
eip=7c8285ec esp=00f3f344 ebp=00f3f3b4 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
ntdll!KiFastSystemCallRet:
7c8285ec c3              ret
0:014> kL
ChildEBP RetAddr 
00f3f340 7c827d0b ntdll!KiFastSystemCallRet
00f3f344 77e61d1e ntdll!NtWaitForSingleObject+0xc
00f3f3b4 77e61c8d kernel32!WaitForSingleObjectEx+0xac
00f3f3c8 7fda9a2f kernel32!WaitForSingleObject+0x12
00f3f3f8 7fd8b1d1 rpcss!CRWLock::AcquireWriterLock+0x155
00f3f4ac 7fd8ab20 rpcss!CComClassInfo::GetClassContext+0x189
00f3f4dc 7fd89e16 rpcss!CComRegCatalog::CheckForceHKLMForClass+0x2d
00f3f660 7fd8a1fa rpcss!CComRegCatalog::GetClassInfoW+0x293
00f3f6dc 7fd89feb rpcss!CComCatalog::GetClassInfoInternal+0x41d
00f3f700 7fd8f146 rpcss!CComCatalog::GetClassInfoW+0x22
00f3f720 7fd8f0d4 rpcss!GetClassInfoFromClsid+0x2d
00f3f760 7fd8f36b rpcss!ActivationPropertiesIn::UnmarshalInterface+0xcb
00f3f7ac 7fd920cd rpcss!PerformScmStage+0xa1
00f3f8c8 77c80193 rpcss!SCMActivatorCreateInstance+0xdd
00f3f8f8 77ce33e1 rpcrt4!Invoke+0x30
00f3fcf8 77ce35c4 rpcrt4!NdrStubCall2+0x299
00f3fd14 77c7ff7a rpcrt4!NdrServerCall2+0x19
00f3fd48 77c8042d rpcrt4!DispatchToStubInCNoAvrf+0x38
00f3fd9c 77c80353 rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0x11f
00f3fdc0 77c811dc rpcrt4!RPC_INTERFACE::DispatchToStub+0xa3

The owner of the rpcss!CRWLock was thread 87 in this case, which was reading COM class information from the registry.

0:014> ~87s
eax=00000000 ebx=0118c1a8 ecx=01f4f144 edx=7c8285ec esi=0118c0f0 edi=00000000
eip=7c8285ec esp=01f4f564 ebp=01f4f5a4 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000206
ntdll!KiFastSystemCallRet:
7c8285ec c3              ret
0:087> kL
ChildEBP RetAddr 
01f4f560 7c827d7b ntdll!KiFastSystemCallRet
01f4f56c 7fd8aba0 ntdll!ZwYieldExecution+0xc
01f4f5a4 7fd90150 rpcss!GetRegistryStringValue+0x3d
01f4f66c 7fd8a243 rpcss!CComClassInfo::GetClassContext+0x29d
01f4f6dc 7fd89feb rpcss!CComCatalog::GetClassInfoInternal+0x45e
01f4f700 7fd8f146 rpcss!CComCatalog::GetClassInfoW+0x22
01f4f720 7fd8f0d4 rpcss!GetClassInfoFromClsid+0x2d
01f4f760 7fd8f36b rpcss!ActivationPropertiesIn::UnmarshalInterface+0xcb
01f4f7ac 7fd920cd rpcss!PerformScmStage+0xa1
01f4f8c8 77c80193 rpcss!SCMActivatorCreateInstance+0xdd
01f4f8f8 77ce33e1 rpcrt4!Invoke+0x30
01f4fcf8 77ce35c4 rpcrt4!NdrStubCall2+0x299
01f4fd14 77c7ff7a rpcrt4!NdrServerCall2+0x19
01f4fd48 77c8042d rpcrt4!DispatchToStubInCNoAvrf+0x38
01f4fd9c 77c80353 rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0x11f
01f4fdc0 77c811dc rpcrt4!RPC_INTERFACE::DispatchToStub+0xa3
01f4fdfc 77c812f0 rpcrt4!LRPC_SCALL::DealWithRequestMessage+0x42c
01f4fe20 77c88678 rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0x127
01f4ff84 77c88792 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430
01f4ff8c 77c8872d rpcrt4!RecvLotsaCallsWrapper+0xd

Resolution

The owner of the rpcss!CRWLock was accessing the registry. If the access was slowed down for some reason such as anti-virus software scanning, threads in RPCSS could be backed up while acquiring the lock. We re-enabled RPCSS caching, which brought the performance up tremendously.

HKLM\Software\Microsoft\OLE
EnableCatalogCachingForBaseCOM
REG_DWORD
1

911650 How to re-enable the RPCSS caching of COM class information in Windows Server 2003 SP1, in Windows XP SP2, and in Windows 2000 Post-Service Pack 4

Problem Description

An application calls CreateObject to activate a COM+ component or a DCOM server component which runs on a remote machine. The call to the CreateObject may fail with the error 80070721 which translates to "A security package specific error occurred." Both machines are Windows server 2008 or Vista.

Cause

This happens because when the client requests a Kerberos ticket for the COM+ application or the DCOM service it uses a user name as the SPN (Service Principal Name). The KDC (Key Distribution Center) finds the account but it does not have any SPN.

If you get a network trace with Microsoft Network Monitor you will see the Kerberos error in the trace similar to this:

172.17.10.21 172.17.10.17 KerberosV5 KerberosV5:TGS Request Realm: <domain> Sname: DCOMServiceAccount
172.17.10.17 172.17.10.21 KerberosV5 KerberosV5:KRB_ERROR - KDC_ERR_S_PRINCIPAL_UNKNOWN (7)

DCOMServiceAccount is the identity of the COM+ application or the DCOM server application.

Resolution

Registering the following SPN's should workaround the issue:

setspn -A DCOMService/DCOMServer Domain\DCOMServiceAccount
setspn -A DCOMService/DCOMServerFQDN Domain\DCOMServiceAccount

DCOMService:
The name of the COM+ application or the DCOM server application.

DCOMServer:
The NetBIOS name of the server machine where the COM+ application or the DCOM component resides.

DCOMServerFQDN:
The Fully Qualified Domain Name of the server machine where the COM+ application or the DCOM component resides.

DCOMServiceAccount:
The identity of the COM+ application or the DCOM server application..

To run setspn.exe to register a SPN for a domain account, you need to be a domain admin. Setspn.exe can be run on any machine in the domain.

Problem

On windows 2000 SP4 machines, you may experience AVs (C0000005 exceptions)  in SQL Server or any application using msdtcprx.dll after installing the update MS09-012. This is seen only on the machines with ole32.dll version earlier than 5.0.2195.7036

You will not see this issue if the server has Microsoft security bulletin MS06-018 OR COM+ rollup 32 OR subsequent rollups

Workaround

To avoid running into this issue, install MS06-018 OR COM+ rollup 37 prior to installing MS09-012

More information

Due to a code change in msdtcprx.dll, it was running into the following known issue:

An access violation (C0000005) occurs if you call the CoWaitForMultipleHandles function before you call either the CoInitialize function or the CoInitializeEx function. This is first addressed in COM+ roll up 32

Below is the call stack of the AV in SQL Server

0:000> kL 100
ChildEBP RetAddr
061df560 03c30136 OLE32!CoWaitForMultipleHandles+0x4e
061df598 03c3f96c msdtcprx!CITransaction::InitForImport+0x1a6
061df5c0 009a2f53 msdtcprx!CITransactionImport::Import+0xec
061df808 009a831a sqlservr!CDTCState::ImportXact+0xe2
061df828 00796d44 sqlservr!FullXact::GetDTCXactByPropInfo+0x1e
061df884 009a27c8 sqlservr!CMsqlXact::EnlistDTCXact+0xce
061df8a8 009a41a2 sqlservr!CMsqlXact::EnlistDTCXact+0x3a
061dfa48 009a52d2 sqlservr!rm_ods_handler_body+0xa9a
061dfa7c 00425ace sqlservr!rm_ods_handler+0x67
061dfefc 41075309 sqlservr!process_commands+0x171
061dff6c 41074978 ums!ProcessWorkRequests+0x2d9
061dff80 7c34940f ums!ThreadStartRoutine+0x98
061dffb4 7c57b3bc msvcr71!_threadstart+0x6c
061dffec 00000000 KERNEL32!BaseThreadStart+0x52

To get the latest version of ole32, install COM+ rollup 37

Warning You should not enable tracing functionality on production systems unless a Microsoft support representative indicates that the tracing information is required to diagnose an issue. The performance of the computer may be affected. As soon as the reason for the issue has been found and resolved, you should disable tracing functionality. By default, tracing functionality is disabled. Therefore, there is no performance effect on a regular installation

 

Event Tracing for Windows[ETW] provides a mechanism to trace and log events that are raised by user-mode applications and kernel-mode drivers. In Microsoft Windows Server 2003 SP2, Windows Vista and newer Operating Systems, you can use ETW to trace events for some COM+ components and for some base COM components. COM tracing can help troubleshooting issues related to registration errors (such as 80004002, 80040154, 80040155 etc.,) OR Access Denied's (80070005), Activation Failures etc.,

How to enable COM+ and COM diagnostic tracing
http://support.microsoft.com/default.aspx?scid=kb;EN-US;926098

 

In the above KB article, we discuss how to start and stop tracing using logman.exe. Logman.exe is shipped with the Operating System. However, on some Windows 2003 systems, logman may not work and return an error  “collection does not exist”. We are investigating this issue.

 If you run into this problem, please use tracelog.exe that is shipped with Windows Resource Kit or Windows DDK. More info on tracelog at http://msdn.microsoft.com/en-us/library/ms797927.aspx

 

If you are tracing ole32, make sure to enable tracing. Please follow the steps to enable ole32 provider tracing from Q926098

Starting tracing using tracelog:

tracelog -start OLE32 -guid c:\ole32.ctl  -flags 15 -f c:\OLE32.etl

tracelog -start DCOMSCM -guid c:\dcommscm.ctl  -flags 15 -f c:\DCOMSCM.etl

tracelog -start COMSVCS -guid c:\comsvcs.ctl  -flags 15 -f c:\comsvcs.etl

tracelog -start COMADMIN -guid c:\comadmin.ctl  -flags 15 -f c:\comadmin.etl

                       

 

Note the .ctl files just contain the GUID for the specific provider. You can create a text file and rename it to .ctl file and past the GUID of the specific provider

 

The ole32 GUID is bda92ae8-9f11-4d49-ba1d-a4c2abca692e

The DCOMSCM GUID is 9474a749-a98d-4f52-9f45-5b20247e4f01

The COMSVCS GUID is b46fa1ad-b22d-4362-b072-9f5ba07b046d

The COMADMIN GUID is a0c4702b-51f7-4ea9-9c74-e39952c694b8

 

 

For example, ole32.ctl file would contain just the following content

 

 b46fa1ad-b22d-4362-b072-9f5ba07b046d

 

 

Stopping tracing:

 

After running the repro, you can stop tracing using:

 

tracelog –stop OLE32

tracelog –stop DCOMSCM

tracelog –stop COMSVCS

tracelog –stop COMADMIN

 

 

Analyzing traces:

Currently formatting COM traces relies on private symbols, so please contact Microsoft Customer Service Support to have the files formatted into text files.

 

 

Posted Friday, June 19, 2009 6:19 PM by SudhaThota | 0 Comments
Filed under:

Attachment(s): ole32.ctl
More Posts Next page »
 
Page view tracker