
This is one of a series of posts that I plan to write on troubleshooting content deployment issues. Here’s a scenario:

The destination farm is running on a Windows Server 2008 computer. When deploying to the destination, the content deployment fails with the following error:

The remote server returned an error: (404)

The first thing to remember is that any errors related to content deployment will be reported to the application event log. This is the default diagnostic logging setting for Content Deployment category. Any medium events will also be reported to the ULS logs. Therefore the first place to find out why the deployment is failing is to look at the Application event log of the export server.

In my case, here’s what I found from application event logs:

Failed to transfer files to destination server for Content Deployment job '80 - 500 Full Deployment'. Exception was: 'System.Net.WebException: The remote server returned an error: (404) Not Found. at System.Net.HttpWebRequest.GetResponse()
at Microsoft.SharePoint.Publishing.Internal.Administration.HttpDataTransfer.UploadFile(String sourceFilePath, String postToUrl, String& responseXml)
at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob.UploadFilesToRemoteServer(Guid remoteJobId, String adminPortUrl, ArrayList dataFiles)

at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob.DoServerToServer()'


Publishing: Content deployment job failed. Error: 'System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at Microsoft.SharePoint.Publishing.Internal.Administration.HttpDataTransfer.UploadFile(String sourceFilePath, String postToUrl, String& responseXml)
at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob.UploadFilesToRemoteServer(Guid remoteJobId, String adminPortUrl, ArrayList dataFiles)
at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob.DoServerToServer()
at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob.ExecuteJob()
at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob.Run(Boolean runAsynchronously)'

Publishing: Content deployment job failed. Error: 'System.Net.WebException: The remote server returned an error: (404) Not Found.
at System.Net.HttpWebRequest.GetResponse()
at Microsoft.SharePoint.Publishing.Internal.Administration.HttpDataTransfer.UploadFile(String sourceFilePath, String postToUrl, String& responseXml)
at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob.UploadFilesToRemoteServer(Guid remoteJobId, String adminPortUrl, ArrayList dataFiles)
at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob.DoServerToServer()
at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob.ExecuteJob()
at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob.Run(Boolean runAsynchronously)
at Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJobDefinition.Execute(Guid targetInstanceId)'

The Execute method of job definition Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJobDefinition (ID 70d6c4ef-bda5-4b5e-84a0-8c07e4ec3e42) threw an exception. More information is included below.

So as we can see, the exception was: The remote server returned an error: (404) Not Found. Where do we start troubleshooting? We should start by understanding what status code 404 means and what type of operation we were in when this failure occurred. The second part of the question can be easily answered by looking at the above callstack. At the top of the callstack, right below HttpWebRequest.GetResponse(), is the HttpDataTransfer.UploadFile() method. This means we were uploading a file to the target farm when the error occurred.

Info: Content deployment transfers contents to the destination by splitting them into multiple cab files and then uploading those files to a page in the target central administration site. Thus if we open the IIS log for the target central administration website, we should see the failure and more specific information about HTTP 404 response. Here’s what I found

POST /_admin/Content+Deployment/DeploymentUpload.aspx filename=%22ExportedFiles1.cab%22&remoteJobId=%223c392e22-67d9-46cb-b6c3-225cade93479%22 81 - 404 13 0 20

The salient parts of this IIS log entry are the name of the file being uploaded, ExportedFiles1.cab, and the status fields 404 13. The value 13 right after 404 is a sub-status code that gives more information about the type of failure. The IIS status codes and their meanings are listed in the Microsoft support article on IIS HTTP status codes.

So from this article, we end up with: 404.13 - Content length too large
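As a rough illustration of reading that log entry, here is a minimal Python sketch. It assumes the fields appear in the same order as in the excerpt above (method, URI stem, URI query, port, user name, status, sub-status); a real IIS log can be configured with a different field set, so check the #Fields header in your own log. The function name parse_entry is made up for this example.

```python
from urllib.parse import unquote

# The log entry from the target Central Administration site, as shown above.
LOG_LINE = (
    "POST /_admin/Content+Deployment/DeploymentUpload.aspx "
    "filename=%22ExportedFiles1.cab%22&remoteJobId=%223c392e22-67d9-46cb-b6c3-225cade93479%22 "
    "81 - 404 13 0 20"
)


def parse_entry(line):
    """Split one log entry, assuming the field order seen in the excerpt."""
    method, uri_stem, uri_query, port, user, status, substatus = line.split()[:7]
    # The query string carries the cab file name and the remote job id.
    params = dict(pair.split("=", 1) for pair in uri_query.split("&"))
    return {
        "method": method,
        "page": uri_stem,
        "port": int(port),
        "filename": unquote(params["filename"]).strip('"'),
        "status": f"{status}.{substatus}",
    }


entry = parse_entry(LOG_LINE)
print(entry["filename"], entry["status"])
```

The combined status.sub-status string is what you look up in the status code article.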

So what is the size of the file ExportedFiles1.cab that we are trying to upload to the destination? We can find this out easily by opening the path to the temporary files on the source (Export Server) as configured in the content deployment settings within Central Administration. By default, this path is %windir%\TEMP\ContentDeployment. Now there is a catch here. By default, content deployment removes the temporary files. You must first configure content deployment to keep the temporary files in the event of a failure. This setting cannot be changed from the User Interface. The correct procedure to change the setting is to use STSADM command line

STSADM.EXE -o editcontentdeploymentpath -pathname "MyPath" -keeptemporaryfiles Failure

After making this change in setting, on the next run, the temporary files will not be removed and you can easily find out the size of the file it was trying to upload by opening this folder. In my case, it turned out to be 61 MB. So why can’t I upload a file of this size? It was possible to do this in Windows Server 2003! Also, who is throwing this error? These are the next set of questions we need to find answers to.

As I just mentioned, it is very important to figure out who is throwing this error: is it IIS or is it SharePoint? We know this behaviour was not present in Windows 2003, so there is a good chance that IIS 7, which ships with Windows 2008, is the cause here. A little bit of IIS domain knowledge will help you conclude very easily where the problem is. If you do not have the domain knowledge, you can still troubleshoot using the Failed Request Tracing feature of IIS to pinpoint exactly where the failure occurred. Failed Request Tracing is a different subject altogether; if you are interested in how to use this feature, see the official documentation.

I know from experience that IIS 7 introduced a change whereby the maximum size of a request is limited. The feature implements something like what URLScan did for us in the previous versions. Reading through the documentation on requestLimits in IIS 7, I can see that the request filtering module can return a 404.13 status, and that the allowed request size is specified by the maxAllowedContentLength attribute, with a default value of 30000000 bytes, which is approximately 28.6 MB.

So now we have identified why the server returned this status code and the solution is to simply increase this limit. Here’s how to do that using the new command line feature of IIS 7

On the destination farm, Open a command prompt and switch to C:\Windows\System32\inetsrv folder
Execute:
appcmd set config "SharePoint Central Administration v3" -section:system.webserver/security/requestFiltering -requestLimits.MaxAllowedContentLength:524288000
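The numbers in this post are easy to sanity-check. The short Python sketch below only does the arithmetic: it converts the default and the new maxAllowedContentLength values to megabytes and checks whether my 61 MB cab file fits under each. The helper names are made up for illustration.

```python
MIB = 1024 * 1024

DEFAULT_LIMIT = 30_000_000   # IIS 7 default maxAllowedContentLength, in bytes
NEW_LIMIT = 524_288_000      # value passed to appcmd above, in bytes
CAB_SIZE_MB = 61             # size of ExportedFiles1.cab in my case


def to_mib(size_in_bytes):
    """Convert a byte count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return size_in_bytes / MIB


def fits(size_mb, limit_in_bytes):
    """True if a request body of size_mb megabytes is within the limit."""
    return size_mb * MIB <= limit_in_bytes


print(round(to_mib(DEFAULT_LIMIT), 1))   # default limit in MB
print(to_mib(NEW_LIMIT))                 # new limit in MB
print(fits(CAB_SIZE_MB, DEFAULT_LIMIT), fits(CAB_SIZE_MB, NEW_LIMIT))
```

So the new value is exactly 500 MB, which leaves plenty of headroom for the cab files in this deployment.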

 

I was recently rebuilding one of my farms when I ran into the problem I describe below.

I logged onto the machine using my domain account. Then created local user accounts for use with my standalone farm on this machine and then ran SharePoint Post Installation Configuration Wizard. After creating a new farm with a new configuration database, it also created the SharePoint Central Administration site for me automatically. I then attempted to provision my Shared Services Provider and guess what I run into? After several minutes, I get a page with the following error:

Provisioning for Shared Services Provider 'My SSP' has failed and will be retried.  Reason: User cannot be found. 

The error message is very odd! It doesn’t tell us which user it cannot find and what it wants to do. So I turn on our friendly diagnostic logging from the Operations tab in SharePoint Central Administration to get a verbose output of the actions. What I ended up finding was amusing. Although I didn’t completely understand why it works this way, here’s what I found in the ULS logs. I am including only the data in the message column for brevity:

Retrieved central administration site 'http://CASite:81'.    
Central administration site owner is 'Domain\UserName'.    
Creating shared services administration site 'My SSP'.    
Creating site ssp/admin in content database MySite_Content_82    
Error in resolving user 'Domain\UserName' :

System.ComponentModel.Win32Exception: Unable to contact the global catalog server    

at Microsoft.SharePoint.Utilities.SPActiveDirectoryDomain.GetDirectorySearcher()    

at Microsoft.SharePoint.WebControls.PeopleEditor.SearchFromGC(SPActiveDirectoryDomain domain, String strFilter, String[] rgstrProp, Int32 nTimeout, Int32 nSizeLimit, SPUserCollection spUsers, ArrayList& rgResults)    

at Microsoft.SharePoint.Utilities.SPUserUtility.ResolveAgainstAD(String input, Boolean inputIsEmailOnly, SPActiveDirectoryDomain globalCatalog, SPPrincipalType scopes, SPUserCollection usersContainer, TimeSpan searchTimeout, String customFilter)    

at Microsoft.SharePoint.Utilities.SPActiveDirectoryPrincipalResolver.ResolvePrincipal(String input, Boolean inputIsEm...    

Microsoft.SharePoint.SPException: User cannot be found.    

at Microsoft.SharePoint.Administration.SPSiteCollection.Add(SPContentDatabase database, String siteUrl, String title, String description, UInt32 nLCID, String webTemplate, String ownerLogin, String ownerName, String ownerEmail, String secondaryContactLogin, String secondaryContactName, String secondaryContactEmail, String quotaTemplate, String sscRootWebUrl, Boolean useHostHeaderAsSiteName)    

at Microsoft.SharePoint.Administration.SPSiteCollection.Add(String siteUrl, String title, String description, UInt32 nLCID, String webTemplate, String ownerLogin, String ownerName, String ownerEmail, String secondaryContactLogin, String secondaryContactName, String secondaryContactEmail, Boolean useHostHeaderAsSiteName)    

at Microsoft.Sh...    

So it turns out that while provisioning the Shared Services, there is a lookup on who the Site collection administrator is for the Central Administration Website. This makes sense because it needs to update all these settings for SSP database. While provisioning the Central Administration site, it used my domain account for the site collection administrator. Now, during the provisioning of the Shared Services, it was trying to resolve the account specified in the Central Administration site’s site collection administrators list, but failed to do so as it could not contact the global catalog server, resulting in this behaviour.

Resolution:

  1. Open Central Administration and then click on Site Actions in top right corner, then site settings.
  2. Under users and permissions column, click on Site Collection Administrators.
  3. Remove the domain account from here and add the local Administrator account that you want to use for administering the local farm.
  4. Click OK and wait for the Shared Services to be provisioned automatically (SharePoint will periodically re-attempt a failed provisioning of SSP)

 

I use the “Add to Favorites” feature in Internet Explorer a lot. After moving to Windows Vista, I ran into a problem where I’d get an Access Denied error any time I attempted to create a folder within Favorites. I was curious to find out why I could no longer create folders there, so I decided to begin with the basics. I checked the permissions on the Favorites folder in my user profile and found I had Full Control. I also created a folder directly through the file system, which was successful. The next step was to find out why the iexplore.exe process could not create the folder on my behalf. To figure this out, I used Process Monitor. I captured a Process Monitor trace filtered on the iexplore.exe process and here’s what I found:

[Screenshot: Process Monitor event captured for iexplore.exe]

As you can see from the event captured, the iexplore.exe process is running with Low Integrity level. Integrity levels are a new feature within Windows Vista and upwards. So if a process has a low integrity level, irrespective of whether the identity is part of local administrators group, operations like these on file system will still fail. Understandably, this is a security feature that I like and dislike at the same time. I like it because it prevents any website that I browse to from running code on my system that may be harmful. On the other hand I cannot save links to my favourites, which is not good either.

Weighing both options, I think I will stick to running iexplore process in low integrity level and save the links manually. The other option is to run iexplore.exe process as Administrator by right clicking on the shortcut and selecting “Run as Administrator”.

CAUTION: I do not recommend you do this at all because this runs the browser with elevated privileges and therefore it will be very easy for any malicious code to infect the machine.

Ever run into a situation where the people picker in SharePoint will fail to resolve usernames that are within a domain in a totally different forest? Assuming the trust relationships are setup properly? Well think again. Here is a quick check list:

  • Ensure your people picker property is configured correctly in SharePoint.
  • Configure your trust relationships properly.
  • Ensure the ports required for inter-server communication are open. A list can be found <insert hyperlink>
  • Ensure your DNS configuration is correct. This is specifically important because the web server will need to locate the Global Catalog Servers and the Domain Controllers in the source & target domains.

Each of the above points is in itself a big task. A failure in any one of these dependent components will cause people picker to fail. To allow SharePoint to query AD of a different domain, you need to configure it to use a specific account from the trusted domain. Here’s how you do that using STSADM command line

  • stsadm -o setapppassword -password ********
    stsadm -o setproperty -pn peoplepicker-searchadforests -pv "domain:FQDN of trusted domain,Account in trusted domain,Password" -url URL of App

    Recycle AppPool for changes to take effect (Optional)

Alright. Now that we know the basic checklist, we need to know how this works normally before we can troubleshoot any issues. In other words, unless you know what is normal, you cannot spot the abnormal.

Here’s a description of how PeoplePicker works in SharePoint.

The web server contacts one of the DCs in its domain and requests a SID lookup using the Windows API LsarLookupNames4. The LsarLookupNames4 method translates a batch of security principal names to their SID form. This traffic is encrypted, and the web server and domain controller talk via RPC. The RPC endpoint mapper has the UUID E1AF8308-5D1F-11C9-91A4-08002B14A0FA. Now because this is initiated from LSASS, the LSARpc identifier is 12345778-1234-ABCD-EF00-0123456789AB. You should see both of these in a network trace. A successful request/response indicates that RPC communication is working.

So with LsarLookupNames4 API, we should get a SID. The next thing that happens is an LDAP query trying to lookup this SID and see if the name matches with what the user entered. To perform this, you need to have Kerberos traffic flowing properly. If Kerberos is working properly, you should also see that traffic just before the LDAP query with the username that you configured within SharePoint. After Kerberos authentication, SharePoint server then sends the LDAP query to one of the DCs in the trusted domains and does a search - something like:

LDAP:Search Request, MessageID: 26, BaseObject: DC=SharePoint,DC=com, SearchScope: WholeSubtree, SearchAlias: neverDerefAliases
LDAP:Search Result Entry, MessageID: 26, Status: Success

A filter is also passed to indicate that the search is based on the SID. Filter: (&(objectSID=))

The search result contains the properties requested for the user including the user’s SID. If everything matches, then we are done and the user’s full name should be displayed.
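One practical detail when reading such a trace: the objectSid attribute travels over LDAP as a binary value, not as the familiar S-1-5-... string. The sketch below, in Python, decodes that binary layout (revision, sub-authority count, a 6-byte big-endian identifier authority, then little-endian 32-bit sub-authorities). The function name is made up for illustration, and the sample bytes are the well-known BUILTIN\Administrators SID rather than anything taken from my trace.

```python
import struct


def sid_to_string(raw: bytes) -> str:
    """Decode a binary SID into its S-R-I-S... string form."""
    revision = raw[0]
    sub_authority_count = raw[1]
    # The identifier authority is 6 bytes, big-endian.
    authority = int.from_bytes(raw[2:8], "big")
    # Each sub-authority is a 32-bit little-endian unsigned integer.
    sub_authorities = struct.unpack(
        "<%dI" % sub_authority_count, raw[8:8 + 4 * sub_authority_count]
    )
    parts = ["S", str(revision), str(authority)]
    parts.extend(str(s) for s in sub_authorities)
    return "-".join(parts)


# Well-known SID for BUILTIN\Administrators, used here only as sample input.
BUILTIN_ADMINS = bytes.fromhex("01020000000000052000000020020000")
print(sid_to_string(BUILTIN_ADMINS))
```

With this you can compare the SID bytes in the LDAP search result against the SID string returned by the earlier lookup.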

So that’s how it is “expected” to work. But most of the time, when a support engineer is looking at the problem, he will not find the above traffic. Instead he is looking at the traffic in the broken scenario, and there may be several reasons why the feature is not able to find the user. For example:

  • What if the trust relationship is not setup properly? Can we verify that using a network trace? Is it possible?
  • What if the MSRPC is broken? Can we determine that using a network trace?
  • What if the DNS entries are not setup properly? Can you determine that using the network trace?

So well, the answer is “depends”. A lot of times you can make a good conclusion depending on what you see in the network trace if you have domain specific knowledge.

  • If you never see the MSRPC bind requests getting a success response, chances are that the trust is not setup properly.
  • If you do not see Kerberos traffic or connecting with the username specified in SharePoint, then your SharePoint configuration is probably not correct.
  • If you see DNS related errors in the network trace (filter by DNS traffic), then your DNS is probably broken and needs to be fixed.

Obviously, what needs to be fixed depends on what is broken. No matter what works and what does not, at the end of the day, if you perform a Check Name operation within People Picker, we must match the user with a SID. To do that, SharePoint goes to great lengths. SharePoint will attempt a query based on Person or Group and also perform a wildcard search. Here’s an example of the filters that you may see in the network trace for LDAP queries:

First Attempt:

filter: (objectCategory=person)
filter: (objectClass=user)
filter: (!(BIT_AND: (userAccountControl)&2))
filter: (|(name=Sharepoint\Skumar)(displayName=Sharepoint\Skumar)(cn=Sharepoint\Skumar)(mail=Sharepoint\Skumar)(samAccountName=Sharepoint\Skumar)(proxyAddresses=SMTP:Sharepoint\Skumar)(proxyAddresses=sip:Sharepoint\Skumar))

filter: (objectCategory=group)
filter: (BIT_AND: (groupType)&2147483648)
filter: (|(name=Sharepoint\Skumar)(displayName=Sharepoint\Skumar)(cn=Sharepoint\Skumar)(samAccountName=Sharepoint\Skumar))

Result: None

Second Attempt

filter: (objectCategory=person)
filter: (objectClass=user)
filter: (!(BIT_AND: (userAccountControl)&2))
filter: (|(name=Sharepoint\Skumar*)(displayName=Sharepoint\Skumar*)(cn=Sharepoint\Skumar*)(mail=Sharepoint\Skumar*)(sn=Sharepoint\Skumar*)(SamAccountName=Skumar*)(proxyAddresses=SMTP:Sharepoint\Skumar)(proxyAddresses=sip:Sharepoint\Skumar))

filter: (objectCategory=group)
filter: (BIT_AND: (groupType)&2147483648)
filter: (|(name=Sharepoint\Skumar*)(displayname=Sharepoint\Skumar*)(cn=Sharepoint\Skumar*)(SamAccountName=Skumar*))

Chances are that you may get back a response with a wild card search – as in the second case on my machine, because an OR search on SamAccountName=SKumar* found a record but not with SamAccountName=SharePoint\SKumar as in the first case. However, what happens right after that is, the system will pick the SID from the response (if any) and attempt a match with the SID. If that fails, Check Name operation will throw an error that it could not find the user. So the key to getting that to work is ensure that we can perform a SID lookup successfully.
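To make the two attempts easier to compare, here is a Python sketch that builds the user filters as standard LDAP filter strings. This is my own reconstruction from the decoded trace above, not SharePoint's code: the attribute list is shortened, the function names are made up, and the bitwise-AND test on userAccountControl is written with the Active Directory matching rule OID 1.2.840.113556.1.4.803. Note how the wildcard attempt searches sAMAccountName without the domain prefix, which is why it found a record on my machine.

```python
BIT_AND = "1.2.840.113556.1.4.803"  # Active Directory bitwise-AND matching rule


def escape_ldap(value):
    """Escape characters that are special in an LDAP filter value."""
    return "".join(
        "\\%02x" % ord(ch) if ch in "\\*()\0" else ch for ch in value
    )


def user_filter(term, wildcard=False):
    """Build a user search filter similar to the ones seen in the trace."""
    star = "*" if wildcard else ""
    full = escape_ldap(term) + star
    # In the wildcard attempt the trace shows the domain prefix dropped
    # for sAMAccountName only.
    account = escape_ldap(term.split("\\")[-1]) + star if wildcard else full
    clauses = [f"({attr}={full})" for attr in ("name", "displayName", "cn", "mail")]
    clauses.append(f"(sAMAccountName={account})")
    return (
        "(&(objectCategory=person)(objectClass=user)"
        f"(!(userAccountControl:{BIT_AND}:=2))"
        f"(|{''.join(clauses)}))"
    )


print(user_filter("Sharepoint\\Skumar"))
print(user_filter("Sharepoint\\Skumar", wildcard=True))
```

The backslash in the domain\user input has to be escaped as \5c inside a filter value, which is one more reason the exact-match attempt is unlikely to find anything.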

So what tools are there to check if we can resolve the SIDs?

Microsoft Support uses PSGetSID from SysInternals. It is one of the tools you can use to verify that SID lookup is working properly in your environment, and it is really easy to use. From a command line, run: PSGetSid <domain\username>.

If this tool fails to get the SID, ignore the SharePoint part and focus on fixing your environment first. 

Have you used this feature before? I see a lot of people on the internet using this feature and there are a couple of scenarios in which this feature won’t work “as expected”. I recently had a chance to spend some time on this and here are my observations after hours of testing.

If you attempt copying a document from one document library to another and the source and target libraries are in different web applications, then this feature won’t work with Office 2003 clients and MOSS 2007. My guess is that this behaviour occurs because, when Office 2003 was released, it wasn’t built to support features that MOSS 2007 provides. MOSS 2007 was released 4 years later!

This feature will work with Office 2007 clients, though. However if you are copying from one library to another which are located in different site collections AND there does not exist a root site collection at the target, then this feature won’t work. If you see these symptoms, creating an empty root site collection should resolve the problem. It appears to me that it may have something to do with not being able to find the path to the child if the root site collection does not exist.

The Send to... Other location feature works by using an ActiveX control (the Multiple Document Upload control) installed with Microsoft Office 2007. After you provide the destination URL (BTW, the destination URL must end with the library name and not a view name such as AllForms.aspx) and click OK, the JScript on the web page invokes this control with a call to new ActiveXObject("STSUpld.CopyCtl");. To successfully instantiate the control, the following registry keys must be present.

  • HKCR\STSUpld.CopyCtl
  • HKCR\STSUpld.CopyCtl.CLSID\(Default)
  • HKCR\STSUpld.CopyCtl.CLSID
The control itself is located in the DLL present @ C:\Program Files\Microsoft Office\Office12\STSUPLD.DLL

When instantiated, this control contacts a web service on the SharePoint server to copy the contents from source to destination. It first makes a request to the source to get the document and its properties and then POSTs the data to the target location. Now, if the Multiple Upload control cannot be loaded for some reason, SharePoint will attempt to perform this operation via server side by sending a POST to CopyResults.aspx. This is what you see in the browser address bar. Now, the CopyResults.aspx page is capable of copying of documents only if the source and destination are in the same web application. The functionality appears to be by design – In other words, the page restricts copying across application boundaries. The API that causes this behaviour is also documented on MSDN.

ValidateDomainCompatibility Method

Also, a post on the SharePoint Designer Team blog, and the Program Manager’s responses to it, mention that copying across applications is not allowed. FYI, there isn’t a setting to change this behaviour. This is probably something that the development team knows about and may fix in later versions of MOSS, hopefully.

When application domain boundaries need to be crossed, only the web service - Copy.asmx works because it completely extracts all the contents (encrypted of course!) and the properties of the document before posting it to the target document library. I am still not sure why this functionality cannot work from server side code though; but that appears to be what is going on.

Coming back to problems instantiating the Multiple Upload control installed with Office 2007 clients, the best way to fix it will be to run a repair installation of Microsoft Office because the DLL in question is not something you can register using regsvr command. Also, when running the Office Setup, ensure Windows SharePoint Services support component is selected.

I hope the information shared above was helpful to people who have/are running into this behaviour.

 

 

I recently ran into a problem where custom SharePoint solutions will randomly fail to deploy on any of the web servers in the farm. The nature of the problem was hard to debug. The failure can occur on any web server and on any resource file being deployed or when the solution is being retracted. The error message would be as follows:

"Copying of this file failed. This operation uses the SharePoint Administration service (spadmin), which could not be contacted. If the service is stopped or disabled,  start it and try the operation again."

So it turns out that the internal exception that causes this problem is:

Failed to connect to an IPC Port: The system cannot find the file specified. 

There is a new hotfix available for this which updates a .NET Framework DLL, which resolves this issue. The specific KB article that talks about this is

A System.Runtime.Remoting.RemotingException exception is thrown when you deploy a SharePoint solution on a SharePoint Web server that is running the .NET Framework 2.0 SP2

FYI: You need to have .NET Framework 2.0 Service Pack 2 installed before applying this fix.

If you want to debug this and confirm that the internal exception is actually the one noted above, there are 2 ways to do it.

Easy way: Look at the application event logs to see if this exception is logged.

Hard way: !!! Do not try this in Production !!! Read on…

  1. Download and install Debugging tools for Windows for the platform on which you are running.
  2. Using ADPlus, log all first chance .NET CLR exceptions.
  3. Note down the PID of the IIS worker process (w3wp.exe) corresponding to your app, using Task Manager.
  4. Save the script below to the Debugging Tools for Windows folder as LogExceptions.cfg
  5. Execute the command as: CScript adplus.vbs -p <PID> -c LogExceptions.cfg
  6. Reproduce the problem.
  7. Inside the debugging tools for Windows folder there is another folder that is created that begins with Crash…The exceptions are logged in a text file that is named as PID-<PID>.exe…

Here is the script:

<ADPlus>
<Settings>
<RunMode>Crash</RunMode>
<Option>Quiet</Option>
</Settings>

<PreCommands>
<Cmd>.loadby sos mscorwks</Cmd>
</PreCommands>

<Exceptions>
<Option>NoDumpOnFirstChance</Option>
<Option>NoDumpOnSecondChance</Option>
<Config>
<Code>clr</Code>
<Actions1>Log;Stack</Actions1>
<CustomActions1>.time;!pe -nested;!clrstack;!threads</CustomActions1>
<ReturnAction1>gn</ReturnAction1>
</Config>
</Exceptions>
</ADPlus>

Ever run into a problem where your published links for office client applications do not show up in My SharePoint sites Link in Office applications? If so, read through…

First, the basics. Ensure it is configured right. This feature works only with Office 2007 client applications. The published links you create will appear in the Office client applications of all users whose personal sites are stored within the same SSP. The mechanism that makes this work is a client-side “pull”. The Office client application requests the published links when it starts by sending a POST to /personal/username/_vti_bin/publishedlinksservice.asmx. You can catch this via Fiddler. The response code must be 200.

  1. Browse to SharePoint Central Administration site.
  2. In the Quick Launch Window in the left pane, click on your SSP under Shared Services Administration
  3. Under User Profiles and My Sites, click on Published Links to Office client applications.
  4. Click on New and enter the link, description and select the Type of link.
  5. Click OK

The next step is to open a browser Window on the client machine and browse to the library or link that you just published. Then click on the MySite Link on the top right corner of the web page. If your MySite has not been created yet, then it will be created for you. Just below Site Actions Link, you will find a link “Set as default My Site”. Click on that link. You should now be prompted with a message:

---------------------------
Configure My Site for Microsoft Office
---------------------------
Microsoft Office can remember your My Site to synchronize documents stored here in Outlook and to show it when opening and saving files.  Do you want Office to remember this site ('<link>')?  Only select 'yes' if you trust this site.
---------------------------
Yes   No  
---------------------------

Click Yes. What this does is update registry key called PersonalSiteURL located at:

HKEY_CURRENT_USER\Software\AppDataLow\Microsoft\Office\12.0\Common\Portal (Vista)
HKEY_CURRENT_USER\Software\AppDataLow\Microsoft\Office\12.0\Common\Portal (XP)


Next, open an Office application such as Excel and select Save As or File, Open. At this point, the Office application reads the string value Url under the registry key HKEY_CURRENT_USER\Software\Policies\Microsoft\Office\12.0\Common\Portal\Link Providers\MySiteHost to determine which is your MySite host. It then makes a POST to the web service, Publishedlinksservice.asmx, hosted on the SSP of your SharePoint server. If you use Fiddler, check the user agent, which indicates what application is making the request. User-Agent: Microsoft Office/12.0 (Windows NT 6.1; Microsoft Office Word 12.0.6425; Pro)

The server then sends a response as XML, which includes various <ServerLink> XML which has the published links. Eg:

Response headers:

HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Set-Cookie: WSS_KeepSessionAuthenticated=100; path=/
Persistent-Auth: true
MicrosoftSharePointTeamServices: 12.0.0.6421

Response body:

<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><GetLinksResponse xmlns="http://microsoft.com/webservices/SharePointPortalServer/PublishedLinksService"><GetLinksResult><ServerLink><Title>Your Site Documents</Title><Url>http://yoursite:80/sites/site1/shared%20documents</Url><LinkType>33554432</LinkType><IsMember>false</IsMember><IsPublished>true</IsPublished></ServerLink><ServerLink><Title>My Site</Title><Url>http://yoursite:100/personal/username/</Url><LinkType>2</LinkType><IsMember>true</IsMember><IsPublished>true</IsPublished></ServerLink><ServerLink><Title>Profile Site</Title><Url>http://yoursite:100/Person.aspx?user=</Url><LinkType>72057594037927936</LinkType><IsMember>false</IsMember><IsPublished>true</IsPublished></ServerLink></GetLinksResult></GetLinksResponse></soap:Body></soap:Envelope>
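If you save a response body like this from Fiddler, a few lines of Python are enough to list the links it contains. This is only a sketch for inspecting a capture: the function name is made up, and the sample below is a shortened copy of the response above with the XML declaration left off.

```python
import xml.etree.ElementTree as ET

# Namespace used by the GetLinksResponse element in the captured response.
NS = {"pl": "http://microsoft.com/webservices/SharePointPortalServer/PublishedLinksService"}

SAMPLE = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body>"
    '<GetLinksResponse xmlns="http://microsoft.com/webservices/SharePointPortalServer/PublishedLinksService">'
    "<GetLinksResult>"
    "<ServerLink><Title>Your Site Documents</Title>"
    "<Url>http://yoursite:80/sites/site1/shared%20documents</Url>"
    "<LinkType>33554432</LinkType><IsMember>false</IsMember>"
    "<IsPublished>true</IsPublished></ServerLink>"
    "<ServerLink><Title>My Site</Title>"
    "<Url>http://yoursite:100/personal/username/</Url>"
    "<LinkType>2</LinkType><IsMember>true</IsMember>"
    "<IsPublished>true</IsPublished></ServerLink>"
    "</GetLinksResult></GetLinksResponse></soap:Body></soap:Envelope>"
)


def server_links(soap_xml):
    """Return (title, url, link type) for every ServerLink in the response."""
    root = ET.fromstring(soap_xml)
    links = []
    for link in root.iter("{%s}ServerLink" % NS["pl"]):
        links.append((
            link.findtext("pl:Title", namespaces=NS),
            link.findtext("pl:Url", namespaces=NS),
            int(link.findtext("pl:LinkType", namespaces=NS)),
        ))
    return links


for title, url, link_type in server_links(SAMPLE):
    print(title, url, link_type)
```

Each URL printed here is one the Office application will then request, so these are the links to test for an HTTP 200.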

After calling the web service, the Office Client application then makes an HTTP request to the personal site link that is specified under the <ServerLink> xml. If you are able to get a successful response (HTTP 200 – Verify using Fiddler or Netmon) for that link, you should see it come up in the Save As dialog box when you select “My SharePoint Sites” link. If you cannot, then you will not see that link because the Office Application will delete that link. The Office application then updates the registry key by creating 2 new keys under:

HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Common\Server Links\Published

BTW, if you are using Windows 2008 or R2 as your client machine, then you need to enable WebDAV and Desktop Experience support to make this work. Desktop Experience installs the WebDAV redirector, which is required for this functionality to work properly.

[Update]

Microsoft released a new WebDAV extension module that was completely re-written for IIS 7.0 on Windows Server 2008. I tested a SharePoint site running in Classic mode and the response to the web service requests to SSP sites was a 405. At this time I am not 100% sure if that is expected but a resolution I found to making it work is to remove the WebDAV module for the SSP web application.

  1. In IIS manager, select your SSP Web application and then double click the Modules icon.
  2. Right click on WebDAVModule and select Remove.
  3. Recycle your SSP application pool.
  4. On your client machine, open the system registry and locate the registry key: HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\Common\Portal
  5. Remove the entry LinkPublishingTimestamp.
  6. Restart your office application and test

In my previous posts, I explained how to use the DebugDiag tool to capture high-memory dumps with leak tracking enabled, and how to use the built-in memory analysis scripts to get a report of memory usage. In this post, I discuss how to do things manually using Debugging Tools for Windows (Windbg). Again, I have tried to provide a generic approach with an example; it doesn’t apply to every situation.

I have a memory dump which is about 500 MB in size, captured when the web applications started throwing out-of-memory errors. The first thing to find out is where most of the memory is. I discussed this a bit in one of my earlier blog posts.

0:000> !address -summary

-------------------- Usage SUMMARY --------------------------
    TotSize (      KB)   Pct(Tots) Pct(Busy)   Usage
    1806000 (   24600) : 01.17%    01.19%    : RegionUsageIsVAD
    14f3000 (   21452) : 01.02%    00.00%    : RegionUsageFree
    23e9000 (   36772) : 01.75%    01.77%    : RegionUsageImage
    2200000 (   34816) : 01.66%    01.68%    : RegionUsageStack
      88000 (     544) : 00.03%    00.03%    : RegionUsageTeb
   78c82000 ( 1978888) : 94.36%    95.34%    : RegionUsageHeap
          0 (       0) : 00.00%    00.00%    : RegionUsagePageHeap
       1000 (       4) : 00.00%    00.00%    : RegionUsagePeb
       1000 (       4) : 00.00%    00.00%    : RegionUsageProcessParametrs
       2000 (       8) : 00.00%    00.00%    : RegionUsageEnvironmentBlock
       Tot: 7fff0000 (2097088 KB) Busy: 7eafd000 (2075636 KB)

Largest free region: Base 57818000 - Size 00068000 (416 KB)

From this output we can see that 94.36% of the entire virtual address space is in RegionUsageHeap, which means heap memory. We can also see the size: 1,978,888 KB, or about 1.89 GB! Remember that the dump file itself is only about 500 MB in size. The most likely explanation is that the RegionUsageHeap figure includes memory that is reserved but not committed, whereas the dump file contains only the committed bytes plus some other information. We can also see that the largest contiguous free region is just 416 KB, which explains why this process ran into out-of-memory errors: there is simply no large contiguous free block left to satisfy allocation requests.
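The TotSize column of !address -summary is in hex bytes, so it is worth checking the conversions yourself. The following Python sketch redoes the arithmetic for the RegionUsageHeap row and the largest free region; the helper name is mine, and the hex values are copied from the output above.

```python
def hex_bytes_to_kb(hex_size):
    """Convert a hex byte count, as printed by !address, to whole KB."""
    return int(hex_size, 16) // 1024


heap_kb = hex_bytes_to_kb("78c82000")      # RegionUsageHeap
total_kb = hex_bytes_to_kb("7fff0000")     # Tot
busy_kb = hex_bytes_to_kb("7eafd000")      # Busy
largest_free_kb = hex_bytes_to_kb("68000") # Largest free region

heap_gb = heap_kb / (1024 * 1024)
pct_of_total = 100 * heap_kb / total_kb
pct_of_busy = 100 * heap_kb / busy_kb

print("Heap: %d KB (%.2f GB)" % (heap_kb, heap_gb))
print("Share of address space: %.2f%%, share of busy memory: %.2f%%"
      % (pct_of_total, pct_of_busy))
print("Largest free region: %d KB" % largest_free_kb)
```

The KB figures and percentages line up with what the debugger prints, which is a quick sanity check that you are reading the right row.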

A process will have at least one heap: the default process heap, which the operating system creates for you when the process starts. This heap is used for allocating memory if no other heaps are created and used. Components loaded within the process can create their own heaps, for example the C Runtime heap. Many of you will know the C Runtime library as MSVCRT.dll.

OK, so how many heaps are there, and which heap has the most allocations? The trick I usually use is to look at all the heaps and check how many segments each heap has. I think the maximum number of segments a heap can have is 64. Segments are contiguous blocks of memory which hold smaller memory ranges of various sizes. These ranges are handed out to applications when they request memory. If a segment does not have enough memory to satisfy an allocation request, a new segment is created. The more segments a heap has, the more likely it is to be our problem heap. Recommended reading. To view the segments, you can use the built-in Windbg extension command !heap

From this example: !heap 0

115:   02990000 <------------- Heap Handle
    Segment at 02990000 to 029d0000 (00030000 bytes committed)
    Segment at 0bc10000 to 0bd10000 (00037000 bytes committed)
    Segment at 0e350000 to 0e550000 (00007000 bytes committed)
    Segment at 15fe0000 to 163e0000 (00002000 bytes committed)
    Segment at 59530000 to 59d30000 (00001000 bytes committed)
    .
    .
    .
    Segment at 5e980000 to 5e997000 (00001000 bytes committed)
    Segment at 60040000 to 60057000 (00001000 bytes committed)
    Segment at 611e0000 to 611f7000 (00001000 bytes committed)


117:   02a10000 <------------- Heap Handle
    Segment at 02a10000 to 02a50000 (00040000 bytes committed)
    Segment at 0fc90000 to 0fd90000 (000b7000 bytes committed)
    Segment at 17640000 to 17840000 (0000e000 bytes committed)
    Segment at 21ba0000 to 21fa0000 (00001000 bytes committed)
    Segment at 58530000 to 58d30000 (00001000 bytes committed)
    Segment at 5e9c0000 to 5f9c0000 (00001000 bytes committed)
    .
    .
    .
    Segment at 7fe70000 to 7ff23000 (00001000 bytes committed)
    Segment at 23de0000 to 23e3a000 (00001000 bytes committed)
    Segment at 52770000 to 527ca000 (00001000 bytes committed)
    Segment at 52900000 to 5295b000 (00001000 bytes committed)
    Segment at 584c0000 to 5851b000 (00001000 bytes committed)
    Segment at 5a270000 to 5a2cb000 (00001000 bytes committed)

I have truncated the above output for brevity, but essentially there were many segments. An easier way to see how many segments a heap has is to use the !heap command again with the -s switch (for statistics) followed by the heap handle. Thus: !heap -s 02a10000

Take a look at lines #12 & # 13 in the following output.

   1: 0:000> !heap -s 02a10000
   2: Walking the heap 02a10000 .......................................................................
   3:  0: Heap 02a10000
   4:    Flags          00001003 - HEAP_NO_SERIALIZE HEAP_GROWABLE 
   5:    Reserved memory in segments              184708 (k)
   6:    Commited memory in segments              18014398506966656 (k)
   7:    Virtual bytes (correction for large UCR) 1252 (k)
   8:    Free space                               254 (k) (45 blocks)
   9:    External fragmentation          0% (45 free blocks)
  10:    Virtual address fragmentation   201004% (77 uncommited ranges)
  11:    Virtual blocks  0 - total 0 KBytes
  12:    Lock contention 2989
  13:    Segments        64
  14:    896 hash table for the free list
  15:        Commits 0
  16:        Decommitts 0
  17:  
  18:                     Default heap   Front heap       Unused bytes
  19:    Range (bytes)     Busy  Free    Busy   Free     Total  Average 
  20: ------------------------------------------------------------------ 
  21:      0 -   1024       64    142      0      0          0      0
  22:   1024 -   2048      279     24      0      0       2280      8
  23:   2048 -   3072       21      3      0      0        176      8
  24:   3072 -   4096        2      3      0      0         16      8
  25:   4096 -   5120       69      6      0      0        560      8
  26:   5120 -   6144        6      0      0      0         48      8
  27:   6144 -   7168       35      3      0      0        280      8
  28:   7168 -   8192        0      2      0      0          0      0
  29:   8192 -   9216        0      1      0      0          0      0
  30:   9216 -  10240        2      1      0      0         16      8
  31:  12288 -  13312        2      0      0      0         16      8
  32:  13312 -  14336        0      1      0      0          0      0
  33:  19456 -  20480        2      0      0      0         16      8
  34:  24576 -  25600        2      0      0      0         16      8
  35:  36864 -  37888        0      1      0      0          0      0
  36: ------------------------------------------------------------------ 
  37:   Total              484    187      0      0       3424      7

From the above output, you can also see the ranges of memory and their utilization. We can also obtain the worst offender byte size and the worst offender count size by using the -stat parameter of the !heap command. Here’s the output.

   1: 0:000> !heap -stat -h 02a10000
   2:  heap @ 02a10000
   3: group-by: TOTSIZE max-display: 20
   4:     size     #blocks     total     ( %) (percent of total busy bytes)
   5:     1008 45 - 45228  (26.78) <----------- Worst offender Bytes (WOB)
   6:     19f8 21 - 358f8  (20.75)
   7:     418 c6 - 32a90  (19.62)
   8:     5e8 21 - c2e8  (4.72)
   9:     6008 2 - c010  (4.65)
  10:     5a0 21 - b9a0  (4.49)
  11:     4c08 2 - 9810  (3.68)
  12:     1440 6 - 7980  (2.94)
  13:     808 d - 6868  (2.53)
  14:     3008 2 - 6010  (2.33)
  15:     2708 2 - 4e10  (1.89)
  16:     1808 2 - 3010  (1.16)
  17:     a18 4 - 2860  (0.98)
  18:     6c0 5 - 21c0  (0.82)
  19:     408 6 - 1830  (0.59)
  20:     c08 2 - 1810  (0.58)
  21:     ac0 2 - 1580  (0.52)
  22:     a90 2 - 1520  (0.51)
  23:     778 2 - ef0  (0.36)
  24:     450 1 - 450  (0.10)

All values in the above output are in hex except the percent column. From line #5 we can say that the worst offender bytes [WOB - allocation size that is using the most bytes in the heap] is 0x1008 bytes [4,104 bytes, or about 4 KB], and it adds up to a total of 0x45228 bytes [283,176 bytes, or about 276 KB].
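Since everything in the !heap -stat rows is hex, a small script saves a lot of mental arithmetic. This Python sketch converts one row to decimal and checks that size times block count equals the printed total. The function name is hypothetical; the three hex values per row are taken from the output above.

```python
def describe_stat_row(size_hex, blocks_hex, total_hex):
    """Convert one '!heap -stat' row to decimal and cross-check the total."""
    size = int(size_hex, 16)
    blocks = int(blocks_hex, 16)
    total = int(total_hex, 16)
    return {
        "size_bytes": size,
        "blocks": blocks,
        "total_bytes": total,
        "consistent": size * blocks == total,
    }


# Row "1008 45 - 45228": the worst offender by total bytes.
wob = describe_stat_row("1008", "45", "45228")
# Row "418 c6 - 32a90": the size with the most blocks.
most_blocks = describe_stat_row("418", "c6", "32a90")

print(wob)
print(most_blocks)
```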

Similarly, you can group by block count to find the worst offender count size [WOC - allocation size that has the most duplicates in the heap] and its count by using the -grp switch.

0:000> !heap -stat -h 02a10000 -grp B
 heap @ 02a10000
group-by: BLOCKCOUNT max-display: 20
    size     #blocks    total     ( %) (percent of totalblocks)
    418 c6 - 32a90  (47.26)  <----------- Worst offender count (WOC)
    1008 45 - 45228  (16.47)
    19f8 21 - 358f8  (7.88)
    5e8 21 - c2e8  (7.88)
    5a0 21 - b9a0  (7.88)
    808 d - 6868  (3.10)
    1440 6 - 7980  (1.43)
    408 6 - 1830  (1.43)
    6c0 5 - 21c0  (1.19)
    a18 4 - 2860  (0.95)
    6008 2 - c010  (0.48)
    4c08 2 - 9810  (0.48)
    3008 2 - 6010  (0.48)
    2708 2 - 4e10  (0.48)
    1808 2 - 3010  (0.48)
    c08 2 - 1810  (0.48)
    ac0 2 - 1580  (0.48)
    a90 2 - 1520  (0.48)
    778 2 - ef0  (0.48)
    450 1 - 450  (0.24)

Thus in this case, the most duplicates are of allocation size 0x418 bytes [1,048 bytes] and there are 0xc6 [198] of them. You could also dump the allocations in the 1 KB to 4 KB range and then dump out the contents using the address value in the UserPtr column. To do that, execute: dc <address value in UserPtr column>

Warning: This command can generate a huge output as it dumps allocations in the specified range from all heaps.

!heap -flt r 418 1008
    _HEAP @ 2a10000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        02a14a58 0084 00f0  [01]   02a14a60    00418 - (busy)
        02a15000 0084 0084  [01]   02a15008    00418 - (busy)
        02a20330 0144 0102  [01]   02a20338    00a18 - (busy)
        02a25a40 0102 0144  [01]   02a25a48    00808 - (busy)
        02a265f0 00be 0102  [01]   02a265f8    005e8 - (busy)
        02a2d1b0 00b5 00be  [01]   02a2d1b8    005a0 - (busy)
        5e9c0040 0084 00b6  [00]   5e9c0048    00418 - (free)
        7c1d0040 0084 0084  [00]   7c1d0048    00418 - (free)
    .
    .

dc 02a14a60

So our story so far…

  • There are a couple of heaps that have lots of segments.
  • A lot of memory is reserved – in MBs and there are few committed blocks – in KBs.
  • The allocations appear to be in the range of 1 KB to 4 KB.

Next questions: So what are these heaps and who is allocating here?

If you want to see the stack back trace for the allocation, you can dump out the page heap information for a given address [UserPtr], but stack back trace is only displayed when available. If I remember correctly, it is available when page heap is enabled for the process.

0:000> !heap -p -a 7c1d0048
    address 7c1d0048 found in
    _HEAP @ 2a10000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        7c1d0040 0084 0000  [00]   7c1d0048    00418 - (free)
        Trace: 0025
        7c96d6dc ntdll!RtlDebugAllocateHeap+0x000000e1
        7c949d18 ntdll!RtlAllocateHeapSlowly+0x00000044
        7c91b298 ntdll!RtlAllocateHeap+0x00000e64
        102c103e MSVCR90D!_heap_alloc_base+0x0000005e
        102cfd76 MSVCR90D!_heap_alloc_dbg_impl+0x000001f6
        102cfb2f MSVCR90D!_nh_malloc_dbg_impl+0x0000001f
        102cfadc MSVCR90D!_nh_malloc_dbg+0x0000002c
        102db25b MSVCR90D!malloc+0x0000001b
        102bd691 MSVCR90D!operator new+0x00000011
        102bd71f MSVCR90D!operator new[]+0x0000000f
        4113d8 MyModule1!AllocateMemory+0x00000028
        41145c MyModule1!wmain+0x0000002c
        411a08 MyModule1!__tmainCRTStartup+0x000001a8
        41184f MyModule1!wmainCRTStartup+0x0000000f
        7c816fd7 kernel32!BaseProcessStart+0x00000023

The above output is just an example but you get the idea of how you can use this technique to help track the source of leaks in your application.
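When you have many such traces, the interesting frame is usually the first one that is not part of the allocator itself. Here is a rough Python sketch that skips the heap manager and C Runtime frames and returns the first caller. The list of modules to skip is my own assumption for this example; adjust it for the runtimes loaded in your process.

```python
# Modules treated as allocator plumbing. This list is an assumption for
# illustration, not something the debugger reports.
ALLOCATOR_MODULE_PREFIXES = ("ntdll", "msvcr", "kernel32")


def first_caller_frame(trace_lines):
    """Return the first 'module!function+offset' frame outside the allocator."""
    for line in trace_lines:
        parts = line.split()
        # A frame looks like: "<address> <module>!<function>+<offset>"
        if len(parts) < 2 or "!" not in parts[1]:
            continue
        module = parts[1].split("!", 1)[0].lower()
        if not module.startswith(ALLOCATOR_MODULE_PREFIXES):
            return parts[1]
    return None


TRACE = """
        Trace: 0025
        7c91b298 ntdll!RtlAllocateHeap+0x00000e64
        102db25b MSVCR90D!malloc+0x0000001b
        102bd691 MSVCR90D!operator new+0x00000011
        102bd71f MSVCR90D!operator new[]+0x0000000f
        4113d8 MyModule1!AllocateMemory+0x00000028
        41145c MyModule1!wmain+0x0000002c
""".splitlines()

caller = first_caller_frame(TRACE)
print(caller)
```

Counting how often each caller shows up across many allocations of the worst offender size is a simple way to shortlist suspects.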

When memory at a given address is de-allocated, the heap manager checks how many contiguous bytes are free around that address. After that check is complete, the heap manager can do one of two things:

  • Keep the contiguous memory block committed.
  • De-commit the contiguous memory block and mark it as reserved only.

There is a registry key that controls the de-commit behavior. That key is:

  • Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager
  • Key: HeapDeCommitFreeBlockThreshold
  • Value Type: Reg_DWORD

For the sake of completing this blog post: adjusting the value of this registry key was the resolution in my example. It could be something else in your case, depending on the circumstances under which the problem occurs. Once a software developer has enough information about the pattern and source of the memory consumption, they will be able to recommend changes or make a suitable fix to resolve the issue.

In this blog post, I made an attempt to show how you can track down native memory leaks manually vs. using DebugDiag scripts as discussed in this blog post. Again, this doesn’t apply to every situation as there are umpteen possibilities to the cause of a leak. Hopefully this blog post is a good starter and a future reference.

 

In my last post, I discussed a generic approach to collecting memory dumps using Debug Diagnostics tool. In this post, I discuss how to use DebugDiag’s memory pressure scripts. Please note that the current version of DebugDiag does not have the ability to look up .NET heaps and draw conclusions. For .NET debugging, the best resources are the following blogs:

Tess’s Blog

Doug's Blog

Step 1: Capture a high memory dump as discussed in this post.

Step 2: Start Debug Diagnostics Tool. If prompted to select a rule, click Cancel.

dbgdiag1

Step 3: Select the Analysis Tab and select the memory pressure analysis scripts

dbgdiag2

Step 4: Add the dump files for analysis

dbgdiag3

Step 5: Start analysis

dbgdiag4

Wait for DebugDiag to finish. DebugDiag will automatically connect to the Microsoft Public symbol server, download and cache symbols on your local drive for analysis. You can also add your custom symbol stores and the location where you want to cache the symbols using the Tools, Options & Settings dialog box.

Have fun!

Debugging native memory leaks is one of the most difficult things to do - (at least for me). There are a few Escalation Engineers at Microsoft Product Support Services who are extremely good at debugging all kinds of issues. I learn a lot from these guys whenever I get an opportunity.

In this blog post, I am not going to talk about a specific issue, but rather a general approach to debugging native memory leaks. I work in the IIS/ASP support group and therefore some things I discuss may be more IIS/ASP specific at times.

To solve common debugging problems, Escalation Engineers in the IIS support group created a fantastic tool called the Debug Diagnostics Tool. This link points you to the 32-bit (x86) version. To obtain the 64-bit (x64) version, you need to call Microsoft Product Support at this time. The tool lets you inject a module called Leaktrack.dll into the target process so that it starts collecting allocation/de-allocation information. The concept is simple: create a heap where you track allocations from various memory managers. It works by hooking into the known Windows memory managers such as NTDLL and MSVCRT.

How it works

When a module makes an allocation request, Leaktrack increments that module's allocation count, records the size of the allocation and adds it to a running total. When the same component makes a de-allocation request, Leaktrack decrements the count and updates the totals. For this to work effectively, you must inject Leaktrack soon after the process starts. When the process has consumed upwards of 700 MB of memory, you can dump the process and then run DebugDiag’s built-in memory pressure analysis scripts against that dump file. DebugDiag will connect to the public Microsoft symbol server, download the symbols, analyze the dump and create a nice report about the memory allocations and the components responsible for them. This makes it easy to identify issues related to memory leaks and fragmentation. DebugDiag is very effective against issues in web applications hosted in IIS worker processes because it uses heuristics that are accurate in many cases. Below are the screenshots showing how to set up a leak rule in DebugDiag.

NOTE: If you are debugging a web application hosted in IIS that is leaking memory, restart IIS and send the first request to the application before you set up a memory leak rule. This starts the IIS worker process and lets you track allocations from the beginning of the life of the process.
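To make the bookkeeping described above concrete, here is a toy Python model of it. This is not how Leaktrack.dll is implemented; the class, its methods and the module names are all hypothetical, and the sketch only mirrors the counting idea of incrementing on allocation and decrementing on free.

```python
from collections import defaultdict


class AllocationTracker:
    """Toy model of per-module allocation counting (illustration only)."""

    def __init__(self):
        self.outstanding_count = defaultdict(int)
        self.outstanding_bytes = defaultdict(int)
        self._live = {}  # address -> (module, size)

    def on_alloc(self, module, address, size):
        self._live[address] = (module, size)
        self.outstanding_count[module] += 1
        self.outstanding_bytes[module] += size

    def on_free(self, address):
        entry = self._live.pop(address, None)
        if entry is None:
            # Allocated before tracking started, so nothing to update.
            return
        module, size = entry
        self.outstanding_count[module] -= 1
        self.outstanding_bytes[module] -= size

    def top_offenders(self):
        """Modules sorted by outstanding bytes, largest first."""
        rows = [(m, self.outstanding_count[m], b)
                for m, b in self.outstanding_bytes.items()]
        return sorted(rows, key=lambda row: row[2], reverse=True)


tracker = AllocationTracker()
for address in (0x1000, 0x2000, 0x3000):
    tracker.on_alloc("ModuleA", address, 1048)
tracker.on_alloc("ModuleB", 0x4000, 4104)
tracker.on_free(0x2000)   # ModuleA frees one of its three blocks
tracker.on_free(0x4000)   # ModuleB frees everything it allocated
tracker.on_free(0x9999)   # unknown address: allocated before tracking began

report = tracker.top_offenders()
print(report)
```

The unknown-address case shows why the tracking has to start early in the life of the process: anything allocated before that point cannot be attributed to a module.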

Step 1: Open Debug Diagnostics Tool

DDv1-1-1

Step 2: If prompted to select a rule, select Memory & Handle Leak OR click Add Rule button to get to this screen

DDv1-1-2

Step 3: Click Next to get to the Select Target screen. Then select w3wp.exe if you are debugging an IIS process, or select the process that you wish to debug. If you see multiple worker processes and are not sure which w3wp.exe instance to select, run the following command from a command prompt running as Administrator

CScript %windir%\system32\iisapp.vbs

The above script will output the IIS web application pool name and its corresponding PID value that you can use below.

DDv1-1-3 

Step 4: Click Next, then click the Configure button

DDv1-1-4 

Step 5: Setup the rules as follows

  • Generate a userdump when private bytes reach - (Enter value)
  • And each additional 100 MB thereafter.
  • Auto-create a crash rule to get userdump on unexpected process exit

DDv1-1-5

Step 6: Click Save & Close and then the Next Button from the previous screen.

DDv1-1-6

Step 7: Type in any name that you like for the rule and the path where you want the dumps to be generated. This drive must have lots of free disk space, as each dump file will be about the size of the process when the dump is captured. Since we are capturing at 800 MB and upwards in this example, this will create 10 dumps (by default) of 800 MB or more each.

DDv1-1-7
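To get a feel for how much disk space the rule can use, here is a quick Python estimate. It assumes each dump is roughly as large as the private bytes value that triggered it, which is a lower bound at best, and it uses the 800 MB start, 100 MB step and 10 dump limit from this example. The function name is made up for the sketch.

```python
def estimate_dump_sizes_mb(first_dump_mb, step_mb=100, max_dumps=10):
    """Rough size of each dump if one is taken at every threshold."""
    return [first_dump_mb + i * step_mb for i in range(max_dumps)]


sizes = estimate_dump_sizes_mb(800)
total_mb = sum(sizes)

print("Dump thresholds (MB):", sizes)
print("Minimum free space needed: about %.1f GB" % (total_mb / 1024))
```

In practice the process may crash or be recycled before all ten dumps are written, so this is the worst case for the rule as configured here.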

Step 8: Finish up the rule and activate it. Then make sure you see the information screen like below

DDv1-1-8

You are done! You can see the rules that you just configured in the rules window. When a dump is captured, the userdump count column will have a value of 1 or more.

DDv1-1-9

Next Post: Using Analysis Scripts.

So, you have a managed dump and you want to find out the request headers. Here’s one of the methods I use to find this information. I use it especially when I want to view the session ID or cookies.

  • Load SOS
  • Run !aspxpages and note down the HttpContext address you are interested in. Eg:

0x120bea74    54000 Sec       yes                       XXX        200   GET /default.aspx
0x14244228    54000 Sec        no       284 Sec     XXX        200   GET /default.aspx
0x1426f4c0     54000 Sec        no       101 Sec     XXX        200   GET /default.aspx
0x163a3038    54000 Sec        no       193 Sec     XXX        200   GET /default.aspx
0x1e0c360c    54000 Sec        no       376 Sec      39          200   GET /default.aspx

  • Dump the HttpContext noted in the previous step: !do 0x1426f4c0
  • Dump the _wr field, which is the HttpWorkerRequest object.
  • Dump the _basicServerVars field: !dumparray <address of _basicServerVars>
  • Dump the last entry: !do <address of last entry in the list>
  • You should get the output like:

String: Connection: Keep-Alive
Cookie: ASP.NET_SessionId=5kwvjlzd3ksgii45ephn00aq; HTTP_REFERER::6841=
Host: Skyraider
User-Agent: DebugDiag Service HTTP Pinger

Hopefully this is what you wanted.
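If you paste that string into a script, pulling out the session ID is straightforward. A minimal Python sketch follows; the two function names are mine, and the parsing is deliberately naive (it splits each line on the first colon and the Cookie value on semicolons), so treat it as a starting point.

```python
def parse_headers(raw):
    """Split the dumped header string into a name -> value dictionary."""
    # SOS prefixes the dumped value with "String: "; drop it if present.
    if raw.startswith("String: "):
        raw = raw[len("String: "):]
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


def parse_cookies(cookie_header):
    """Split a Cookie header value into a name -> value dictionary."""
    cookies = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


DUMPED = (
    "String: Connection: Keep-Alive\n"
    "Cookie: ASP.NET_SessionId=5kwvjlzd3ksgii45ephn00aq; HTTP_REFERER::6841=\n"
    "Host: Skyraider\n"
    "User-Agent: DebugDiag Service HTTP Pinger"
)

headers = parse_headers(DUMPED)
cookies = parse_cookies(headers.get("Cookie", ""))
print("Host:", headers["Host"])
print("Session ID:", cookies["ASP.NET_SessionId"])
```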

In one of my earlier posts, I discussed one of the reasons for compression failure and how we identified it using ETW traces and resolved it. Below is the list of other reason codes for your reference.

NO_ACCEPT_ENCODING: The HTTP request did not contain the Accept-Encoding header.
COMPRESSION_DISABLED: No permissions for the IIS_WPG group or application pool identity on the w3svc/filters/compression node in the IIS metabase file.
NO_COMPRESSION_10: The request is HTTP 1.0 and the IIS metabase key HcNoCompressionForHttp10 has a value of TRUE.
NO_COMPRESSION_PROXY: The HTTP request contains a Via header, which means the request is relayed via a proxy server, AND the IIS metabase setting HcNoCompressionForProxies has a value of TRUE.
NO_MATCHING_SCHEME: IIS could not find a matching configuration entry for the file extension of the requested web page. This typically means the extension was not added to the HcFileExtensions or HcScriptFileExtensions list in the IIS metabase.
UNKNOWN_ERROR: Unknown reason.
NO_COMPRESSION_RANGE: The HTTP request contains a Range header and the IIS metabase setting HcNoCompressionForRange is set to TRUE.
FILE_TOO_SMALL: The file size is too small to be compressed.
FILE_ENCRYPTED: The requested file is encrypted.
COMPRESS_FILE_NOT_FOUND: The compressed file was removed since it was compressed. This may also happen on the first request.
COMPRESS_FILE_STALE: The compressed file has changed since it was compressed. It may be due to the symptoms described in Microsoft Knowledge Base article 817442.

For additional information, refer to this blog post from IIS support team.
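When you are scanning a long trace, a small lookup from reason code to the metabase property to check first can save time. Here is a Python sketch of such a table. It only covers the reason codes in the list above that name a metabase property, and the function name is hypothetical.

```python
# Reason codes from the list above that are tied to a named metabase property.
METABASE_PROPERTY_FOR_REASON = {
    "NO_COMPRESSION_10": "HcNoCompressionForHttp10",
    "NO_COMPRESSION_PROXY": "HcNoCompressionForProxies",
    "NO_COMPRESSION_RANGE": "HcNoCompressionForRange",
    "NO_MATCHING_SCHEME": "HcFileExtensions / HcScriptFileExtensions",
}


def metabase_property_for(reason_code):
    """Return the metabase property to inspect, or None if the code has none."""
    return METABASE_PROPERTY_FOR_REASON.get(reason_code.strip().upper())


for code in ("NO_COMPRESSION_RANGE", "FILE_TOO_SMALL"):
    print(code, "->", metabase_property_for(code))
```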

You may have used the Event Tracing for Windows feature, aka ETW, for debugging many server-side problems related to IIS. When I first learnt about ETW and started using it, I found it to be really cool! Unfortunately, there’s not a lot of documentation around using it, for example on when to use which provider. It helps to know which providers emit what information so that we can use a specific set of providers rather than a whole bunch of them, which of course will generate a ton of data. Looking through lots of data can be painful. Take an example where you want to enable ETW tracing but it may take a day or two for the problem to reproduce. Parsing the generated log can be a nightmare! So I decided to put together this blog post, which gives information about some of the providers, if not all.

For a list of providers available on your machine, execute the following from a command prompt:

Logman Query Providers
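The provider list can be long, so it may be handy to filter it with a script. The Python sketch below parses text in the "name followed by {GUID}" layout that the command prints and keeps only the IIS providers. The sample output and its GUIDs are placeholders I made up for the example, not real provider GUIDs, and the layout is an assumption you should check against your own machine.

```python
import re

# One provider per line: a name, whitespace, then a GUID in braces.
PROVIDER_RE = re.compile(r"^(?P<name>.+?)\s+(?P<guid>\{[0-9A-Fa-f-]{36}\})\s*$")


def parse_providers(output):
    """Return a provider name -> GUID dictionary from pasted command output."""
    providers = {}
    for line in output.splitlines():
        match = PROVIDER_RE.match(line)
        if match:
            providers[match.group("name").strip()] = match.group("guid")
    return providers


def iis_providers(providers):
    """Keep only the providers whose name starts with 'IIS:'."""
    return {name: guid for name, guid in providers.items()
            if name.startswith("IIS:")}


# Placeholder GUIDs, for illustration only.
SAMPLE_OUTPUT = """\
Provider                                 GUID
-------------------------------------------------------------------------------
IIS: WWW Server                          {00000000-0000-0000-0000-000000000001}
ASP.NET Events                           {00000000-0000-0000-0000-000000000002}
HTTP Service Trace                       {00000000-0000-0000-0000-000000000003}
"""

all_providers = parse_providers(SAMPLE_OUTPUT)
print(sorted(iis_providers(all_providers)))
```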

The following table lists the details about providers (that I use usually) & their trace areas (where available). Use any combination of these providers depending on what problem you are troubleshooting.

Provider                          Trace Areas
IIS: WWW Server                   IISAuthentication, IISSecurity, IISFilter, IISStaticFile, IISCGI, IISCompression, IISCache, IISAll
IIS: IISADMIN Global              Startup, Shutdown
IIS: WWW Global                   Startup, Shutdown, All
IIS: SSL Filter                   SSL related events
IIS: Request Monitor              -
IIS: Active Server Pages (ASP)    Events from ASP ISAPI
IIS: WWW Isapi Extension          -
HTTP Service Trace                -
ASP.NET Events                    All ASP.NET events

NOTE: ETW tracing is also very helpful when you want to view what is happening on the server side over a SSL connection.

I already have a blog post on using ETW providers to capture data & parsing ETW traces.

Windbg is a native debugger and you can use it to set a breakpoint on a virtual address. Any managed code running within the process wouldn’t have a virtual address associated with it until it is JIT compiled. Thus setting a breakpoint on a managed function is a bit tricky in Windbg. You can set a breakpoint on managed methods using windbg only:

  • When you are performing a live debug & not on a post mortem dump file.
  • You have a .RUN file from a Time Travel Debug trace.

When I started learning how to set managed breakpoints, one of the first questions I had was how to set a breakpoint on a specific line of code in a managed method, because that is what we usually do in IDEs like Visual Studio. This is very difficult to do: although you can get the virtual address where your method starts using the SOS commands, you also need to know the exact offset from the method’s starting address that corresponds to your line of code, and it isn’t easy at all to correlate that to your source code. You need an extremely good understanding of IL code, and you have to unassemble the function using the !u command and then set the breakpoint on that address. I do not have that skill yet, but will surely put out a post once I figure that out. So here I will describe how to set a breakpoint on a managed method for .NET Framework 2.0.

STEP 1: Assuming you are doing a live debug, the first step is to attach to the process that you want to debug. You can use the attach option in the Windbg user interface [File menu]. Then load the SOS debugger extension: .loadby sos mscorwks

STEP 2: You need to know which method you want to set a breakpoint on. The SOS command you need is !dumpmt with the -md parameter, which lists the method table. For example, dump the method table of System.TimeSpan:

!dumpmt -md 0x7911228c
EEClass: 791121e4
Module: 790c2000
Name: System.TimeSpan
mdToken: 02000114  (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
BaseSize: 0x10
ComponentSize: 0x0
Number of IFaces in IFaceMap: 3
Slots in VTable: 56
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
796d2710   7914fb28     NONE System.TimeSpan.ToString()
793624d0   7914b950   PreJIT System.Object.Finalize()
796c07f8   7914fb08     NONE System.TimeSpan.CompareTo(System.TimeSpan)
796d2708   7914fb18     NONE System.TimeSpan.Equals(System.TimeSpan)
79381054   79266eb8   PreJIT System.TimeSpan..ctor(Int64)
7939f058   79266ec0   PreJIT System.TimeSpan..ctor(Int32, Int32, Int32)
7939f07c   79266ed8   PreJIT System.TimeSpan.get_Ticks()
794002c8   79266ee0   PreJIT System.TimeSpan.get_Days()
794002e8   79266ee8   PreJIT System.TimeSpan.get_Hours()
79400328   79266ef0   PreJIT System.TimeSpan.get_Milliseconds()
7940036c   79266ef8   PreJIT System.TimeSpan.get_Minutes()
794003ac   79266f00   PreJIT System.TimeSpan.get_Seconds()
794003ec   79266f08   PreJIT System.TimeSpan.get_TotalDays()
7940040c   79266f10   PreJIT System.TimeSpan.get_TotalHours()
79380c10   79266f18   PreJIT System.TimeSpan.get_TotalMilliseconds()

STEP 3: [Optional] Using the method descriptor command, !dumpmd you can view if the code is JITted. See line #7 below. You can skip this and go to STEP 4 directly using the corresponding MethodDesc value from the previous output.

   1: !dumpmd 79266f18 
   2: Method Name: System.TimeSpan.get_TotalMilliseconds() 
   3: Class: 791121e4 
   4: MethodTable: 7911228c 
   5: mdToken: 0600101e 
   6: Module: 790c2000 
   7: IsJitted: yes 
   8: m_CodeOrIL: 79380c10

STEP 4: Add the breakpoint using the !bpmd -md command with the MethodDesc value.

!bpmd -md 79266f18
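If you do this often, the lookup from STEP 2 through STEP 4 can be scripted: find the MethodDesc for a method name in pasted !dumpmt -md output and build the breakpoint command from it. The Python sketch below does that. Both function names are hypothetical, the sample is a few rows from the listing in STEP 2, and for overloaded methods it simply returns the first match.

```python
def find_method_desc(dumpmt_output, method_name):
    """Return the MethodDesc column for the first row matching method_name."""
    for line in dumpmt_output.splitlines():
        # Columns: Entry, MethodDesc, JIT state, then the full method signature.
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[3].strip().startswith(method_name + "("):
            return parts[1]
    return None


def bpmd_command(method_desc):
    """Build the SOS breakpoint command for a MethodDesc address."""
    return "!bpmd -md " + method_desc


SAMPLE = """
   Entry MethodDesc      JIT Name
796d2710   7914fb28     NONE System.TimeSpan.ToString()
7939f058   79266ec0   PreJIT System.TimeSpan..ctor(Int32, Int32, Int32)
7940040c   79266f10   PreJIT System.TimeSpan.get_TotalHours()
79380c10   79266f18   PreJIT System.TimeSpan.get_TotalMilliseconds()
"""

method_desc = find_method_desc(SAMPLE, "System.TimeSpan.get_TotalMilliseconds")
command = bpmd_command(method_desc)
print(command)
```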

Another way…

Syntax: !bpmd <ModuleName> <FunctionName>

Example: !bpmd mscorlib.dll System.TimeSpan.get_TotalMilliseconds

Notes

  1. The method names are case sensitive.
  2. In many cases, the breakpoints you set may be indicated as “Pending breakpoints”, which is normal, because your method may not yet be JITted.

Once your breakpoints are set, you can execute the g command to let the process execute till it hits the breakpoint. Once it hits the breakpoint you can do other tasks like examine callstacks, stack objects, local variables etc.

Have fun!

Issues related to high memory utilization on an IIS application server are common. With .NET, there is a common misconception that the Garbage Collector (GC) will clean up objects and therefore the process can never run out of memory. This isn’t true. The GC will never clean up an object that is still in use. If it did, you can imagine the kind of problems it would create.

While debugging memory problems, it is a good idea to capture a memory dump when the process memory consumption is at its peak. For .NET applications, a System.OutOfMemoryException is thrown when the GC fails on a VirtualAlloc() call.

So how do we capture a memory dump when this Exception is thrown? Here’s how.

For .NET Framework version 1.1

Open the registry path: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework

Key: GCFailFastOnOOM

Type: DWORD

Value: 2

For .NET Framework version 2.0 and above

Open the registry path: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework

Key: GCBreakOnOOM

Type: DWORD

Value: 2

Setting the above key causes a DebugBreak within the process when a System.OutOfMemoryException is encountered. You can then use a tool like DebugDiag or a debugger like WinDBG/CDB/NTSD to capture a dump on this DebugBreak exception. The WinDBG/CDB/NTSD debuggers are for advanced users; DebugDiag is generally preferred because it is easier to use and is designed for production environments.
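If you need to apply the setting on several servers, a .reg file is convenient. The Python sketch below renders the text of such a file from the path, name and DWORD value given above. The function name is mine, and the output uses the standard .reg layout; review the generated text before importing it on a production machine.

```python
def render_reg_dword(key_path, value_name, value):
    """Render .reg file text that sets one DWORD value under key_path."""
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        "[%s]\n"
        '"%s"=dword:%08x\n' % (key_path, value_name, value)
    )


reg_text = render_reg_dword(
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework",
    "GCBreakOnOOM",   # use GCFailFastOnOOM for .NET Framework 1.1
    2,
)
print(reg_text)
```

Save the printed text with a .reg extension and import it with regedit, then remember to remove the value once you have captured the dump.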

Configuring Debug Diagnostic Tool

  1. Download and install DebugDiag to a drive with at least 4-5 GB of disk space.
  2. Open DebugDiag. If prompted to select a rule, select Crash. Else click on Add Rule button and select Crash.
  3. Click Next & select “A specific process”
  4. Select the process name. For IIS 6.0, this will be w3wp.exe. Click Next
  5. Under Advanced Settings, click on Exceptions, then click on Add Exception
  6. From the list of exceptions, select 80000003 Breakpoint Exception
  7. Set Action Type to Full userdump & Action limit to 1. Click OK
  8. Click Save & Close button
  9. Click Next and provide a name for the rule and location where the dump files must be saved.
  10. Click Next and then Finish button.

NOTE: When the dump is captured, the Userdump count column will be incremented by 1. You can then do post mortem debugging using Windbg and SOS.

 
