Welcome to MSDN Blogs Sign in | Join | Help

So in my previous posts, I explained how you can use DebugDiag tool to capture high memory dumps with leak tracking enabled and also how to use the inbuilt memory analysis scripts to get a report of memory usage. In this post, I discuss how you can do things manually using Debugging tools for Windows or Windbg. Again, I have tried to provide a generic approach, but with an example. It doesn’t apply to each & every situation.

So I have a memory dump which is about 500 MB in size and was captured when web applications started throwing out of memory errors. The first thing to do find out is where most of the memory is. I discussed this a bit in one of my earlier blog posts.

   1: 0:000> !address -summary
   2:  
   3: -------------------- Usage SUMMARY --------------------------
   4:     TotSize (      KB)   Pct(Tots) Pct(Busy)   Usage
   5:     1806000 (   24600) : 01.17%    01.19%    : RegionUsageIsVAD
   6:     14f3000 (   21452) : 01.02%    00.00%    : RegionUsageFree
   7:     23e9000 (   36772) : 01.75%    01.77%    : RegionUsageImage
   8:     2200000 (   34816) : 01.66%    01.68%    : RegionUsageStack
   9:       88000 (     544) : 00.03%    00.03%    : RegionUsageTeb
  10:    78c82000 ( 1978888) : 94.36%    95.34%    : RegionUsageHeap
  11:           0 (       0) : 00.00%    00.00%    : RegionUsagePageHeap
  12:        1000 (       4) : 00.00%    00.00%    : RegionUsagePeb
  13:        1000 (       4) : 00.00%    00.00%    : RegionUsageProcessParametrs
  14:        2000 (       8) : 00.00%    00.00%    : RegionUsageEnvironmentBlock
  15:        Tot: 7fff0000 (2097088 KB) Busy: 7eafd000 (2075636 KB)
  16:  
  17: Largest free region: Base 57818000 - Size 00068000 (416 KB)

So from this output we can see that 94.36% of the entire virtual address space is in RegionUsageHeap, which means heap memory. We can also see the size – 1,978,888 KB or 1.88 GB! Remember I indicated a few moments back that our dump file itself is just 500 MB in size. So what this most likely means is that this value is the reserved memory vs. committed bytes plus other information that the dump file contains. We can also see that the largest contiguous free region is just 416 KB, which explains why this process ran into out of memory errors. There is just no large contiguous free block to satisfy allocation requests.

A process will have at least one heap, the default process heap which is created by the operating system for you when the process starts. This heap is used for allocating memory if no other heaps are created and used. Components loaded within the process can create their own heaps. For eg the C Runtime heap. Many of you will remember it as MSVCRT.dll – that’s our C Runtime library.

OK, so how many heaps and which heap has the most number of allocations? The trick I usually use is to look at all the heaps and check how many segments each heap has. I think the maximum number of segments a heap can have is 64. Segments are contiguous blocks of memory which hold smaller memory ranges of various sizes. These ranges are of various sizes are handed out to applications when they request memory. Thus, if a segment does not have enough memory to satisfy an allocation request, a new segment is created. The more number of segments, the more the chances are that it is our problem heap. Recommended reading. To view the segments, you can use the inbuilt Windbg extension command !heap

From this example: !heap 0

115:   02990000 <------------- Heap Handle
    Segment at 02990000 to 029d0000 (00030000 bytes committed)
    Segment at 0bc10000 to 0bd10000 (00037000 bytes committed)
    Segment at 0e350000 to 0e550000 (00007000 bytes committed)
    Segment at 15fe0000 to 163e0000 (00002000 bytes committed)
    Segment at 59530000 to 59d30000 (00001000 bytes committed)
    .
    .
    .
    Segment at 5e980000 to 5e997000 (00001000 bytes committed)
    Segment at 60040000 to 60057000 (00001000 bytes committed)
    Segment at 611e0000 to 611f7000 (00001000 bytes committed)


117:   02a10000 <------------- Heap Handle
    Segment at 02a10000 to 02a50000 (00040000 bytes committed)
    Segment at 0fc90000 to 0fd90000 (000b7000 bytes committed)
    Segment at 17640000 to 17840000 (0000e000 bytes committed)
    Segment at 21ba0000 to 21fa0000 (00001000 bytes committed)
    Segment at 58530000 to 58d30000 (00001000 bytes committed)
    Segment at 5e9c0000 to 5f9c0000 (00001000 bytes committed)
    .
    .
    .
    Segment at 7fe70000 to 7ff23000 (00001000 bytes committed)
    Segment at 23de0000 to 23e3a000 (00001000 bytes committed)
    Segment at 52770000 to 527ca000 (00001000 bytes committed)
    Segment at 52900000 to 5295b000 (00001000 bytes committed)
    Segment at 584c0000 to 5851b000 (00001000 bytes committed)
    Segment at 5a270000 to 5a2cb000 (00001000 bytes committed)

I have truncated the above entry for brevity, but essentially there were many segments. An easier way to see how many segments are there in a heap is to use the !heap command again with the –s switch (for statistics) followed by heap handle. Thus: !heap –s 02a10000

Take a look at lines #12 & # 13 in the following output.

   1: 0:000> !heap -s 02a10000
   2: Walking the heap 02a10000 .......................................................................
   3:  0: Heap 02a10000
   4:    Flags          00001003 - HEAP_NO_SERIALIZE HEAP_GROWABLE 
   5:    Reserved memory in segments              184708 (k)
   6:    Commited memory in segments              18014398506966656 (k)
   7:    Virtual bytes (correction for large UCR) 1252 (k)
   8:    Free space                               254 (k) (45 blocks)
   9:    External fragmentation          0% (45 free blocks)
  10:    Virtual address fragmentation   201004% (77 uncommited ranges)
  11:    Virtual blocks  0 - total 0 KBytes
  12:    Lock contention 2989
  13:    Segments        64
  14:    896 hash table for the free list
  15:        Commits 0
  16:        Decommitts 0
  17:  
  18:                     Default heap   Front heap       Unused bytes
  19:    Range (bytes)     Busy  Free    Busy   Free     Total  Average 
  20: ------------------------------------------------------------------ 
  21:      0 -   1024       64    142      0      0          0      0
  22:   1024 -   2048      279     24      0      0       2280      8
  23:   2048 -   3072       21      3      0      0        176      8
  24:   3072 -   4096        2      3      0      0         16      8
  25:   4096 -   5120       69      6      0      0        560      8
  26:   5120 -   6144        6      0      0      0         48      8
  27:   6144 -   7168       35      3      0      0        280      8
  28:   7168 -   8192        0      2      0      0          0      0
  29:   8192 -   9216        0      1      0      0          0      0
  30:   9216 -  10240        2      1      0      0         16      8
  31:  12288 -  13312        2      0      0      0         16      8
  32:  13312 -  14336        0      1      0      0          0      0
  33:  19456 -  20480        2      0      0      0         16      8
  34:  24576 -  25600        2      0      0      0         16      8
  35:  36864 -  37888        0      1      0      0          0      0
  36: ------------------------------------------------------------------ 
  37:   Total              484    187      0      0       3424      7

From the above output, you can also see the ranges of memory and their utilization. We can also obtain worst offender byte sizes and worst offender count size by using the –stat parameter of !heap command. Here’s the output.

   1: 0:000> !heap -stat -h 02a10000
   2:  heap @ 02a10000
   3: group-by: TOTSIZE max-display: 20
   4:     size     #blocks     total     ( %) (percent of total busy bytes)
   5:     1008 45 - 45228  (26.78) <----------- Worst offender Bytes (WOB)
   6:     19f8 21 - 358f8  (20.75)
   7:     418 c6 - 32a90  (19.62)
   8:     5e8 21 - c2e8  (4.72)
   9:     6008 2 - c010  (4.65)
  10:     5a0 21 - b9a0  (4.49)
  11:     4c08 2 - 9810  (3.68)
  12:     1440 6 - 7980  (2.94)
  13:     808 d - 6868  (2.53)
  14:     3008 2 - 6010  (2.33)
  15:     2708 2 - 4e10  (1.89)
  16:     1808 2 - 3010  (1.16)
  17:     a18 4 - 2860  (0.98)
  18:     6c0 5 - 21c0  (0.82)
  19:     408 6 - 1830  (0.59)
  20:     c08 2 - 1810  (0.58)
  21:     ac0 2 - 1580  (0.52)
  22:     a90 2 - 1520  (0.51)
  23:     778 2 - ef0  (0.36)
  24:     450 1 - 450  (0.10)

All values are in hex in the above output except the percent column. So from the above output (line # 5) we can say that worst offender bytes [WOB - allocation size that is using the most bytes in the heap] is 0x1008 Bytes [4,104 Bytes or 4K] and it adds up to a total of 0x45228 Bytes [283,176 Bytes or 276 KB]

Similarly, you could group by block size if you want to figure the worst offender count size [WOC - allocation size that has the most duplicates in the heap] and count of worst offender count by using the –grp switch.

   1: 0:000> !heap -stat -h 02a10000 -grp B
   2:  heap @ 02a10000
   3: group-by: BLOCKCOUNT max-display: 20
   4:     size     #blocks    total     ( %) (percent of totalblocks)
   5:     418 c6 - 32a90  (47.26)  <----------- Worst offender count (WOC)
   6:     1008 45 - 45228  (16.47)
   7:     19f8 21 - 358f8  (7.88)
   8:     5e8 21 - c2e8  (7.88)
   9:     5a0 21 - b9a0  (7.88)
  10:     808 d - 6868  (3.10)
  11:     1440 6 - 7980  (1.43)
  12:     408 6 - 1830  (1.43)
  13:     6c0 5 - 21c0  (1.19)
  14:     a18 4 - 2860  (0.95)
  15:     6008 2 - c010  (0.48)
  16:     4c08 2 - 9810  (0.48)
  17:     3008 2 - 6010  (0.48)
  18:     2708 2 - 4e10  (0.48)
  19:     1808 2 - 3010  (0.48)
  20:     c08 2 - 1810  (0.48)
  21:     ac0 2 - 1580  (0.48)
  22:     a90 2 - 1520  (0.48)
  23:     778 2 - ef0  (0.48)
  24:     450 1 - 450  (0.24)

Thus in this case, the most duplicates are of allocation size 0x418 bytes [1048 Bytes] and there are 0xc6 [196] of them. You could also dump the allocations in the 1 K – 4 K range and then dump out the contents using the address value in UserPtr column. To do that execute: dc <address value in UserPtr column>

Warning: This command can generate a huge output as it dumps allocations in the specified range from all heaps.

   1: !heap -flt r 418 1008
   2:     _HEAP @ 2a10000
   3:       HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
   4:         02a14a58 0084 00f0  [01]   02a14a60    00418 - (busy)
   5:         02a15000 0084 0084  [01]   02a15008    00418 - (busy)
   6:         02a20330 0144 0102  [01]   02a20338    00a18 - (busy)
   7:         02a25a40 0102 0144  [01]   02a25a48    00808 - (busy)
   8:         02a265f0 00be 0102  [01]   02a265f8    005e8 - (busy)
   9:         02a2d1b0 00b5 00be  [01]   02a2d1b8    005a0 - (busy)
  10:         5e9c0040 0084 00b6  [00]   5e9c0048    00418 - (free)
  11:         7c1d0040 0084 0084  [00]   7c1d0048    00418 - (free)
  12:     .
  13:     .
  14:  
  15: dc 02a14a60

So our story so far…

  • There are a couple of heaps that have lots of segments.
  • A lot of memory is reserved – in MBs and there are few committed blocks – in KBs.
  • The allocations appear to be in the range of 1 KB to 4 KB.

Next questions: So what are these heaps and who is allocating here?

If you want to see the stack back trace for the allocation, you can dump out the page heap information for a given address [UserPtr], but stack back trace is only displayed when available. If I remember correctly, it is available when page heap is enabled for the process.

   1: 0:000> !heap -p -a 7c1d0048
   2:     address 7c1d0048 found in
   3:     _HEAP @ 2a10000
   4:       HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
   5:         7c1d0040 0084 0000  [00]   7c1d0048    00418 - (free)
   6:         Trace: 0025
   7:         7c96d6dc ntdll!RtlDebugAllocateHeap+0x000000e1
   8:         7c949d18 ntdll!RtlAllocateHeapSlowly+0x00000044
   9:         7c91b298 ntdll!RtlAllocateHeap+0x00000e64
  10:         102c103e MSVCR90D!_heap_alloc_base+0x0000005e
  11:         102cfd76 MSVCR90D!_heap_alloc_dbg_impl+0x000001f6
  12:         102cfb2f MSVCR90D!_nh_malloc_dbg_impl+0x0000001f
  13:         102cfadc MSVCR90D!_nh_malloc_dbg+0x0000002c
  14:         102db25b MSVCR90D!malloc+0x0000001b
  15:         102bd691 MSVCR90D!operator new+0x00000011
  16:         102bd71f MSVCR90D!operator new[]+0x0000000f
  17:         4113d8 MyModule1!AllocateMemory+0x00000028
  18:         41145c MyModule1!wmain+0x0000002c
  19:         411a08 MyModule1!__tmainCRTStartup+0x000001a8
  20:         41184f MyModule1!wmainCRTStartup+0x0000000f
  21:         7c816fd7 kernel32!BaseProcessStart+0x00000023

The above output is just an example but you get the idea of how you can use this technique to help track the source of leaks in your application.

When memory at a given address is de-allocated, the heap manager checks how many contiguous bytes are free around that address. After that check is complete, the heap manager can do one of two things:

  • Keep the contiguous memory block committed.
  • De-commit the contiguous memory block and mark it as reserved only.

There is a registry key that controls the de-commit behavior. That key is:

  • Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager
  • Key: HeapDeCommitFreeBlockThreshold
  • Value Type: Reg_DWORD

For sake of completing this blog post, adjusting the value for this registry key was the resolution in my example. It could be something else in your case depending on the circumstances under which this occurs. Once a software developer has enough information about the pattern and source of the memory consumption, he will be able to recommend changes or make a suitable fix to resolve the issue.

In this blog post, I made an attempt to show how you can track down native memory leaks manually vs. using DebugDiag scripts as discussed in this blog post. Again, this doesn’t apply to every situation as there are umpteen possibilities to the cause of a leak. Hopefully this blog post is a good starter and a future reference.

 

In my last post, I discussed a generic approach to collecting memory dumps using Debug Diagnostics tool. In this post, I discuss how to use DebugDiag’s memory pressure scripts. Please note that the current version of DebugDiag does not have the ability to look up .NET heaps and draw conclusions. For .NET debugging, the best resources are the following blogs:

Tess’s Blog

Doug's Blog

Step 1: Capture a high memory dump as discussed in this post.

Step 2: Start Debug Diagnostics Tool. If prompted to select a rule, click Cancel.

dbgdiag1

Step 3: Select the Analysis Tab and select the memory pressure analysis scripts

dbgdiag2

Step 4: Add the dump files for analysis

dbgdiag3

Step 5: Start analysis

dbgdiag4

Wait for DebugDiag to finish. DebugDiag will automatically connect to the Microsoft Public symbol server, download and cache symbols on your local drive for analysis. You can also add your custom symbol stores and the location where you want to cache the symbols using the Tools, Options & Settings dialog box.

Have fun!

Debugging native memory leaks is one of the most difficult things to do - (at least for me). There are a few Escalation Engineers at Microsoft Product Support Services who are extremely good at debugging all kinds of issues. I learn a lot from these guys whenever I get an opportunity.

In this blog post, I am not going to talk about a specific issue, but rather a general approach to debugging native memory leaks. I work in the IIS/ASP support group and therefore some things I discuss may be more IIS/ASP specific at times.

To solve the problems of common debugging issues, Escalation Engineers in the IIS support group created a fantastic tool called Debug Diagnostics Tool. This link points you to to the 32 bit (x86 version). To obtain the 64 bit (x64) version, you need to call Microsoft Product Support at this time. What this tool allows you to do is inject a module called Leaktrack.dll into the target process so that it starts collecting allocation/de-allocation information. The concept is simple - create a heap where you track allocations from various memory managers. It works by hooking into the known Windows memory managers NTDLL, MSVCRT etc.

How it works

When a module makes an allocation request, it increments the count and also gets the size of allocation and also maintains a total size of allocation. When a de-allocation request is made by the same component, it reduces the count and updates the totals. For this to work effectively, you must inject leaktrack soon after you start the process. When the process has consumed memory in the upwards of 700 MB, you can dump out the process and then run Debug Diag’s inbuilt memory pressure analysis scripts against that dump file. Debug Diag is so cool that it will connect to the public Microsoft symbol server, download the symbols, analyze and create a nice report about the memory allocations and components responsible for those allocations. It is very easy to determine issues related to memory leaks & fragmentation with DebugDiag script. DebugDiag is very effective against issues in web applications hosted in IIS worker processes because it uses heuristics and is accurate many times. Below are the screen shots on how to setup a leak rule in Debug Diag.

NOTE: If you are debugging a web application hosted in IIS that is leaking memory, before you setup a memory leak rule, restart IIS and then send the first request to the application. This is to start tracking from the beginning of the life of the process and also to start the IIS worker process.

Step 1: Open Debug Diagnostics Tool

DDv1-1-1

Step 2: If prompted to select a rule, select Memory & Handle Leak OR click Add Rule button to get to this screen

DDv1-1-2

Step 3: Click Next to get to the Select Target Screen. Then select w3wp.exe if debugging IIS process or the process that you wish to debug. If you see multiple worker processes & is not sure which w3wp.exe instance to select, run the following command from a command prompt running as Administrator

CScript %windir%\system32\iisapp.vbs

The above script will output the IIS web application pool name and its corresponding PID value that you can use below.

DDv1-1-3 

Step 4: Click Next, then click On the Configure button

DDv1-1-4 

Step 5: Setup the rules as follows

  • Generate a userdump when private bytes reach - (Enter value)
  • And each additional 100 MB thereafter.
  • Auto-create a crash rule to get userdump on unexpected process exit

DDv1-1-5

Step 6: Click Save & Close and then the Next Button from the previous screen.

DDv1-1-6

Step 7: Type in any name that you like for the rule and also type in the path where you want the dumps to be generated. This drive must have lots of disk space as each dump file will be equal to the size of the process when the dump is captured. So since we are capturing it at 800 MB upwards here as in this example, this will create 10 dumps (by default) of 800 MB or higher each.

DDv1-1-7

Step 8: Finish up the rule and activate it. Then make sure you see the information screen like below

DDv1-1-8

You are done! You can see the rules that you just configured in the rules window. When a dump is captured, the userdump count column will have a value of 1 or more.

DDv1-1-9

Next Post: Using Analysis Scripts.

So, you have a managed dump and you want to find out the request headers. Here’s one of the methods I use to find this information. I use it especially when I want to view the session ID or cookies.

  • Load SOS
  • Run !aspxpages and note down the HttpContext address you are interested in. Eg:

0x120bea74    54000 Sec       yes                       XXX        200   GET /default.aspx
0x14244228    54000 Sec        no       284 Sec     XXX        200   GET /default.aspx
0x1426f4c0     54000 Sec        no       101 Sec     XXX        200   GET /default.aspx
0x163a3038    54000 Sec        no       193 Sec     XXX        200   GET /default.aspx
0x1e0c360c    54000 Sec        no       376 Sec      39          200   GET /default.aspx

  • Dump the HttpContext noted in Step 2: !do 0x1426f4c0
  • Dump the _wr field which is HttpWorkerRequest object.
  • Dump the _basicServerVars field: !dumparray <address of _basicServerVars>
  • Dump the last entry: !do <address of last entry in the list>
  • You should get the output like:

String: Connection: Keep-Alive
Cookie: ASP.NET_SessionId=5kwvjlzd3ksgii45ephn00aq; HTTP_REFERER::6841=
Host: Skyraider
User-Agent: DebugDiag Service HTTP Pinger

Hopefully this is what you wanted.

In one of my earlier posts, I discussed one of the reasons for compression failure and how we identified it using ETW traces and resolved it. Below are the list of other reason codes for your reference.

NO_ACCEPT_ENCODING The HTTP request did not contain the Accept-Encoding header
COMPRESSION_DISABLED No permissions for the IIS_WPG group or application pool identity on w3svc/filters/compression node in the IIS metabase file.
NO_COMPRESSION_10 The request is HTTP 1.0 and the IIS metabase key HcNoCompressionForHttp10 value = TRUE
NO_COMPRESSION_PROXY HTTP request contains a Via header which means the request is relayed via a proxy server AND the IIS metabase setting – HcNoCompressionForProxies has a value of TRUE
NO_MATCHING_SCHEME IIS could not find a matching configuration entry for the file extension of the requested web page. This typically means the extension was not added to HcFileExtensions or HcScriptFileExtensions list in IIS metabase.
UNKNOWN_ERROR Unknown reason.
NO_COMPRESSION_RANGE HTTP request contains a Range header and the IIS metabase setting – HcNoCompressionForRange is set to TRUE.
FILE_TOO_SMALL The file size is too small to be compressed.
FILE_ENCRYPTED The requested file is encrypted
COMPRESS_FILE_NOT_FOUND The compressed file was removed since it was compressed. This may also happen on the first request.
COMPRESS_FILE_STALE The compressed file has changed since it was compressed. It may be due to the symptoms described in Microsoft Knowledge Base article: 817442

For additional information, refer to this blog post from IIS support team.

You may perhaps have used Event Tracing Feature of Windows aka ETW for debugging many server side problems related to IIS. When I first learnt about ETW and started using it, I found it to be really cool! Unfortunately there’s not a lot of documentation around using it. For Eg: When to use which provider. it will be helpful to know which providers emit what information so that we can use a specific set of providers rather than a whole bunch of them, which of course will generate a ton of data. Looking through lots of data can sometimes be painful. Take an example where you want to enable ETW tracing but it may take a day or two for the problem to reproduce. Parsing the generated log can be a nightmare! So… I decided to put together this blog that gives information about some of the providers, if not all.

For a list of providers available on your machine, execute the following from a command prompt:

Logman Query Providers

The following table lists the details about providers (that I use usually) & their trace areas (where available). Use any combination of these providers depending on what problem you are troubleshooting.

Provider Trace Areas
IIS: WWW Server IISAuthentication, IISSecurity, IISFilter, IISStaticFile, IISCGI, IISCompression, IISCache, IISAll
IIS: IISADMIN Global Startup, Shutdown
IIS: WWW Global Startup, Shutdown, All
IIS: SSL Filter SSL related events
IIS: Request Monitor -
IIS: Active Server Pages (ASP) Events from ASP ISAPI
IIS: WWW Isapi Extension -
HTTP Service Trace -
ASP.NET Events  All ASP.net events

NOTE: ETW tracing is also very helpful when you want to view what is happening on the server side over a SSL connection.

I already have a blog post on using ETW providers to capture data & parsing ETW traces.

Windbg is a native debugger and you can use it to set a breakpoint on a virtual address. Any managed code running within the process wouldn’t have a virtual address associated with it until it is JIT compiled. Thus setting a breakpoint on a managed function is a bit tricky in Windbg. You can set a breakpoint on managed methods using windbg only:

  • When you are performing a live debug & not on a post mortem dump file.
  • You have a .RUN file from a Time Travel Debug trace.

When I started learning how to set managed breakpoints, one of the first questions I had is: How to set a breakpoint on a specific line of code in a managed method – because that is what we usually do in other IDE environments like Visual Studio. This is somewhat very difficult to do because, though you can get the virtual address where your method starts using the SOS commands, you will need to know the exact offset from the method’s starting virtual address [The actual address which corresponds to your line of code] and it isn’t easy at all to co-relate that to your source code. You will need to have an extremely good understanding of IL code, un-assemble the function using !u command and then set the breakpoint on that address. I do not have that skill yet, but will surely put out a post once I figure that out. So over here, I will describe how to set a breakpoint on a managed method for .NET Framework 2.0.

STEP 1: So assuming you are doing a live debug, the first step is to attach to the process that you want to debug. You can use the attach option in Windbg user interface [File menu]. Then load the SOS debugger extension - !loadby SOS mscorwks

STEP 2: You need to know which method you want to set a breakpoint on. The SOS command you need is !dumpmt with the –md parameter. This lists out the method table. For example, Dump the method table of System.Timespan

   1: !dumpmt -md 0x7911228c 
   2: EEClass: 791121e4 
   3: Module: 790c2000 
   4: Name: System.TimeSpan 
   5: mdToken: 02000114  (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll) 
   6: BaseSize: 0x10 
   7: ComponentSize: 0x0 
   8: Number of IFaces in IFaceMap: 3 
   9: Slots in VTable: 56 
  10: -------------------------------------- 
  11: MethodDesc Table 
  12:    Entry MethodDesc      JIT Name 
  13: 796d2710   7914fb28     NONE System.TimeSpan.ToString() 
  14: 793624d0   7914b950   PreJIT System.Object.Finalize() 
  15: 796c07f8   7914fb08     NONE System.TimeSpan.CompareTo(System.TimeSpan) 
  16: 796d2708   7914fb18     NONE System.TimeSpan.Equals(System.TimeSpan) 
  17: 79381054   79266eb8   PreJIT System.TimeSpan..ctor(Int64) 
  18: 7939f058   79266ec0   PreJIT System.TimeSpan..ctor(Int32, Int32, Int32) 
  19: 7939f07c   79266ed8   PreJIT System.TimeSpan.get_Ticks() 
  20: 794002c8   79266ee0   PreJIT System.TimeSpan.get_Days() 
  21: 794002e8   79266ee8   PreJIT System.TimeSpan.get_Hours() 
  22: 79400328   79266ef0   PreJIT System.TimeSpan.get_Milliseconds() 
  23: 7940036c   79266ef8   PreJIT System.TimeSpan.get_Minutes() 
  24: 794003ac   79266f00   PreJIT System.TimeSpan.get_Seconds() 
  25: 794003ec   79266f08   PreJIT System.TimeSpan.get_TotalDays() 
  26: 7940040c   79266f10   PreJIT System.TimeSpan.get_TotalHours() 
  27: 79380c10   79266f18   PreJIT System.TimeSpan.get_TotalMilliseconds()
  28:  

STEP 3: [Optional] Using the method descriptor command, !dumpmd you can view if the code is JITted. See line #7 below. You can skip this and go to STEP 4 directly using the corresponding MethodDesc value from the previous output.

   1: !dumpmd 79266f18 
   2: Method Name: System.TimeSpan.get_TotalMilliseconds() 
   3: Class: 791121e4 
   4: MethodTable: 7911228c 
   5: mdToken: 0600101e 
   6: Module: 790c2000 
   7: IsJitted: yes 
   8: m_CodeOrIL: 79380c10

STEP 4: Add the breakpoint using !bpmd –md command.

   1: !bpmd –md 79380c10

Another way…

Syntax: !bpmd <ModuleName> <FunctionName>

Example: !bpmd mscorlib.dll System.TimeSpan.get_TotalMilliseconds

Notes

  1. The method names are case sensitive.
  2. In many cases, the breakpoints you set may be indicated as “Pending breakpoints”, which is normal, because your method may not yet be JITted.

Once your breakpoints are set, you can execute the g command to let the process execute till it hits the breakpoint. Once it hits the breakpoint you can do other tasks like examine callstacks, stack objects, local variables etc.

Have fun!

Issues related to high memory utilization on an IIS application server are common. With .NET there is a little misconception that the Garbage Collector (GC) will clean up objects and therefore the process can never run out of memory. This isn’t true. GC will never clean up an object which is in use. If that was the case, you can imagine the kind of problems it would create.

While debugging memory problems, it is a good idea to capture memory dump when the process memory consumption is at its peak maximum usage. For .NET applications, a System.OutOfMemoryException is thrown when GC fails on a VirtualAlloc().

So how do we capture a memory dump when this Exception is thrown? Here’s how.

For .NET Framework version 1.1

Open the registry path: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework

Key: GCFailFastOnOOM

Type: DWORD

Value: 2

For .NET Framework version 2.0 and above

Open the registry path: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework

Key: GCBreakOnOOM

Type: DWORD

Value: 2

Setting the above key causes a DebugBreak within the process when a System.OutOfMemoryException is encountered. You can then use a tool like DebugDiag or a Debugger like WinDBG/CDB/NTSD to capture a dump on this DebugBreak Exception. Windbg/CDB/NTSD debuggers are for advanced users and DebugDiag is generally preferred due to ease of use and is designed to be used in production environments.

Configuring Debug Diagnostic Tool

  1. Download and install DebugDiag to a drive with at least 4-5 GB of disk space.
  2. Open DebugDiag. If prompted to select a rule, select Crash. Else click on Add Rule button and select Crash.
  3. Click Next & select “A specific process”
  4. Select the process name. For IIS 6.0, this will be w3wp.exe. Click Next
  5. Under Advanced Settings, click on Exceptions, then click on Add Exception
  6. From the list of exceptions, select 80000003 Breakpoint Exception
  7. Set Action Type to Full userdump & Action limit to 1. Click OK
  8. Click Save & Close button
  9. Click Next and provide a name for the rule and location where the dump files must be saved.
  10. Click Next and then Finish button.

NOTE: When the dump is captured, the Userdump count column will be incremented by 1. You can then do post mortem debugging using Windbg and SOS.

 

Continuing my conversations on using tools, today I want to explain how to capture an ETW trace and parse it. Event tracing for Windows (ETW) is a very powerful, tracing mechanism built into the Windows operating system that allows you to view messages from various subsystems. This is very helpful in troubleshooting problems on the server side. We use it a lot in the IIS support group to troubleshoot various customer issues. In an earlier post, I discussed and explained the various Providers in Windows that spit information. We will use some of these providers today.  To view a list of providers available on your system, run the following command in a command prompt window.

Logman query providers

Stage 1: Capture an ETW trace in Windows Server 2003 with Service Pack 1 or upwards.

NOTE: You must be logged on as a computer administrator to perform these steps:

  1. Create a Trace text file that contains a list of the providers that you want to trace.
  2. Create a folder called ETW in C drive root
  3. Open a command prompt.
  4. Switch to C:\ETW folder.
  5. Type: Logman Query Providers > Providers.txt <enter> [This will output the results into Providers.txt file]
  6. Open Providers.txt in Notepad. you will have 2 columns – Provider & GUID.
  7. Remove the lines of those providers you do not wish to use. For IIS it will be something like this

IIS: WWW Global                          {d55d3bc9-cba9-44df-827e-132d3a4596c2}
IIS: SSL Filter                          {1fbecc45-c060-4e7c-8a0e-0dbd6116181b}
IIS: Request Monitor                     {3b7b0b4b-4b01-44b4-a95e-3c755719aebf}
IIS: WWW Server                          {3a2a4e84-4c21-4981-ae10-3fda0d9b0f83}
IIS: Active Server Pages (ASP)           {06b94d9a-b15e-456e-a4ef-37c984a2cb4b}
IIS: IISADMIN Global                     {DC1271C2-A0AF-400f-850C-4E42FE16BE1C}
IIS: WWW Isapi Extension                 {a1c2040e-8840-4c31-ba11-9871031a19ea}
HTTP Service Trace                       {dd5ef90a-6398-47a4-ad34-4dcecdef795f}

Now delete the values from the first column. We get:

{d55d3bc9-cba9-44df-827e-132d3a4596c2}
{1fbecc45-c060-4e7c-8a0e-0dbd6116181b}
{3b7b0b4b-4b01-44b4-a95e-3c755719aebf}
{3a2a4e84-4c21-4981-ae10-3fda0d9b0f83}
{06b94d9a-b15e-456e-a4ef-37c984a2cb4b}
{DC1271C2-A0AF-400f-850C-4E42FE16BE1C}
{a1c2040e-8840-4c31-ba11-9871031a19ea}
{dd5ef90a-6398-47a4-ad34-4dcecdef795f}

Now we will add flags to each provider. Flags indicate the areas to trace and the verbosity levels. Each flag will be separated by a TAB. After each entry, press Tab and type 0xFFFFFFFF, press Tab and type 0x5. For IIS: WWW Server only, this will be 0xFFFFFFFE 0x5. The flags 0xFFFFFFFF & 0xFFFFFFFE indicate all areas and 0x5 indicates full verbose mode.

Thus, we get:

{d55d3bc9-cba9-44df-827e-132d3a4596c2}    0xFFFFFFFF    0x5
{1fbecc45-c060-4e7c-8a0e-0dbd6116181b}    0xFFFFFFFF    0x5
{3b7b0b4b-4b01-44b4-a95e-3c755719aebf}    0xFFFFFFFF    0x5
{3a2a4e84-4c21-4981-ae10-3fda0d9b0f83}    0xFFFFFFFE    0x5
{06b94d9a-b15e-456e-a4ef-37c984a2cb4b}    0xFFFFFFFF    0x5
{DC1271C2-A0AF-400f-850C-4E42FE16BE1C}    0xFFFFFFFF    0x5
{a1c2040e-8840-4c31-ba11-9871031a19ea}    0xFFFFFFFF    0x5
{dd5ef90a-6398-47a4-ad34-4dcecdef795f}    0xFFFFFFFF    0x5

  1. Save the Providers.txt file.
  2. Go back to command prompt. We are now ready to start ETW Tracing.
  3. At the command prompt, type: Logman -start MyTrace -pf providers.txt –ets <enter>
  4. MyTrace is a name for the trace and is a mandatory parameter.

Now, we can reproduce the problem. Execute the HTTP request from any HTTP capable client or make the request via browser. Once the problem has been reproduced, we can stop the trace as follows:

At the command prompt, type: Logman –stop MyTrace –ets <enter>

We should now have a file called MyTrace.ETL in C:\ETW folder. This file is not readable using any editor. It first needs to be parsed.

Stage 2: Parsing an ETW Trace.

OK, so we now have a trace file that cannot be read by any editors because it is a binary file. To parse an ETW trace file, we need another tool – LogParser. I am sure many have heard great stories about this. This is another powerful tool and provided by Microsoft as a free download. Download LogParser v2.2. Then install it to C:\LogParser folder.

Important: You need to parse an ETW trace on the same version of the OS from where it was captured. This is because the ETW providers will be different for different versions of Windows.

Before you can use LogParser, there are a couple of things you need to do  - Register LogParser.dll in system registry and make a slight modification to the parsing Windows script so that it doesn’t prompt you.

  1. Open another command prompt and switch to C:\LogParser folder.
  2. Type: Regsvr32 Logparser.dll <Enter>. Click OK after the message.
  3. Copy the DumpTraceReqs.js file from C:\LogParser\Samples\Scripts to C:\Logparser [this is just for convenience]
  4. Open DumpTraceReqs.js in Notepad.
  5. Search for the text: “//Prompt the user to press a key”
  6. Comment out the lines WScript.Echo("[Hit ENTER...]"); and WScript.StdIn.ReadLine(); It should be as
    • //WScript.Echo("[Hit ENTER...]");
    • //WScript.StdIn.ReadLine();
  7. Save the file & close Notepad.

We are now ready to parse the ETW trace file. At the second command prompt, type the following command and press ENTER

Cscript DumpTraceReqs.js C:\ETW\MyTrace.etl > Output.txt <enter>

Wait for the command to finish and then open the file, Output.txt in Notepad. You should now have a file that contains a very organized collection of information that shows the activities in each stage of the request processing pipeline. Any kind of permissions problems, compression failure etc should show up here. It is self explanatory. However, if you do have questions on what you see in the ETW trace file, please share it or post it here and I’ll help.

LogParser is a very powerful tool and you can use SQL like commands for a variety of purposes including reading event logs, IIS logs, Pure text files, Network trace files and many others. It also includes many functions which you can use to transform data. Take a look at this post that contains a lot of examples.

I hope you found this post useful in debugging problems. Please feel free to leave your feedback.

 

Recently, we ran into a problem where static or dynamic compression would not work on a few websites, specially SharePoint sites, but works on other sites.

I always use our software tools and take a data driven approach for troubleshooting any kind of software problem. Software tools not only help you pin point problem area, but also assists in effective troubleshooting. Without data driven approach, we are pretty much into a guessing game or trial & hit method.

OK. So the next step is identify what tool(s) to use and the answer is – It depends. It depends on where you are troubleshooting (client side/server side etc). We know that IIS server is responsible for sending compressed responses. So naturally, we want to know why IIS isn’t compressing responses. There are many cases where IIS will not compress a response. For e.g. requests with status code 401. For a list of reasons, please refer to this blog post from Mike Laing, Escalation Engineer, Microsoft IIS support.

The most effective tools to trace into IIS are:

  1. Event Tracing for Windows aka ETW Tracing
  2. DebugView Tracing

ETW is the most easiest and preferred method because, unlike DebugView trace, it doesn’t require any registry changes and restart of the IIS services. In an earlier post, I discussed how to capture and parse an ETW trace.

Back to the problem… So in this case, here’s what the ETW trace showed us:

IISCompression: STATIC_COMPRESSION_NOT_SUCCESS - IIS has been unsuccessful doing static compression
Reason: COMPRESSION_DISABLED

CAUSE:

The problem occurs only when you use a custom account for application pool identity, which was the case here as well. When using a custom account for application pool, you must provide read permissions for that application pool identity on the metabase compression keys within IIS configuration file – metabase.xml, located in %windir%\system32\inetsrv, so that it can read the settings. A failure reading the settings will cause a status of COMPRESSION_DISABLED.

Honestly, I didn’t know the cause at the top of my head, but a genius (Vivek Kumbhar) in my team who fiddles around a lot with his machine knew the cause & resolution. I am trying to find out how we can track down this failure due to lack of permissions on certain metabase keys. I will update this post after that. I will also document the various reasons and their associated causes.

 

SOLUTION

So when using a custom identity for the application pool, take these additional steps

  1. Add the application pool identity to the IIS_WPG group.
  2. Provide the application pool identity Read permissions on the website content folder using NT ACLs.
  3. Application pool identity/IIS_WPG must have query permissions in the following metabase keys: (use IIS 6.0 Metabase Explorer from IIS Resource Kit):
  4. LM –> Read
  5. W3SVC –> Read
  6. AppPools -> Query Unsecure Property (Special Permission)
  7. Filters -> Read and Write
  8. Compression –> deflate (Read and Write - Inherited)
  9. Compression –> gzip (Read and Write - Inherited)
  10. Restart IIS

Internet Explorer 8 is expected to be available in the near future. It packs a whole lot of features. I downloaded the latest RC1 build from the Microsoft site and did some testing. This version packs in a whole lot of browser extensibility features besides many other CSS improvements.

One of the extensibility features is Accelerators. Accelerators allow users to quickly send information to a web service and get back some information. For example, you may be browsing a web page and came across a word for which you want to lookup the meaning. This is one of the scenarios where you can use accelerators. Websites can offer to install Accelerators as part of their service. A end user can then install Accelerators of his choice and just highlight a word on the web page and then select an Accelerator to use from a list of installed Accelerators. Internally, Internet Explorer 8 sends the highlighted text as a parameter to the selected Web Service and opens a new Internet Explorer Window OR a preview window where you can see the results.

So, how do you create a new accelerator for your website? It’s really simple! Here’s how to create one.

Step 1: Offer to install the Accelerator from your web page. For example, in your web page, add the following:

   1: <h1>Dictionary Lookup Accelerator</h1>
   2: <button onclick="window.external.AddService('http://www.mysite.com/dictionary.xml')"></button>

So what is window.external.AddService() method? That’s one of the new extensibility API in IE 8. More information is here

Step 2: Create your Accelerator definition XML file.

   1: <?xml version="1.0" encoding="UTF-8" ?>
   2: <openServiceDescription xmlns="http://www.microsoft.com/schemas/openservicedescription/1.0">
   3:     <homepageUrl>http://www.mysite.com/HomePage.asp</homepageUrl>
   4:     <display>
   5:         <name>Lookup in Dictionary</name>
   6:     </display>
   7:     <activity category="Define">
   8:         <activityAction context="selection">
   9:                 <preview method="post" action="http://www.mysite.com/Dictionary.asp">
  10:                     <parameter name="txtSearchFor" value="{selection}" type="text" />
  11:                 </preview>
  12:                 <execute method="post" action="http://www.mysite.com/Dictionary.asp">
  13:                     <parameter name="txtSearchFor" value="{selection}" type="text" />
  14:                 </execute>
  15:         </activityAction>
  16:     </activity>
  17: </openServiceDescription>

The XML file we just created is called the service description file and it contains specific description about the service provider and type of service it provides.

The first 2 lines define the OpenServiceFormat using the xml file version and the namespace. The namespace merely defines all elements in this specification. These lines must be in there.

The <homepageUrl> property : Defines the home page of the service provider. This domain name value must match the other URLs defined in this XML file, or you will get an error when Internet Explorer 8 attempts to install the accelerator. This is to avoid spoofing.

The <display> element : Defines the text that is presented to the user when they attempt to install the accelerator. A popup window appears on the browser that shows information about the type of Accelerator. This is where you provide that information. There can be only one definition of display element in the XML file.

The <name> element : The child of display element & contains the name of the service. There can only be one instance of this element in the XML file and the number of characters cannot exceed 50.

You can add 2 more child elements under the display element : 

  • <icon>URL</icon> – Holds the URL to the icon associated with the service.
  • <description>Your accelerator description</description> – Holds the description about the service.

The next section describes the Accelerator and its capabilities

The <activity> element : Describes the activities of this accelerator. The category attribute defines what the category of the accelerator such as Search, Map, Define. You can define your own category but there must be only one instance of this element.

The <activityAction> element : Child of <activity> element. This section defines information that is specific to a context of the activity using a context attribute. The context attribute indicates on what the accelerator should act upon. It can be one of the following values:

  • Selection (default) – The text selected by the user.
  • Document – The entire document.
  • Link – The URL, if selected by the user.

The <preview> element : Child of <activityAction> element. You can preview the information without opening the site. Internet Explorer 8 will open a box where you can preview the information without actually opening the web page. The preview element defines the action for preview. In the above example, we asked it to post the information to a specific web service URL – dictionary.asp using the action attribute. This is similar to the action attribute in a web page. The method attribute is self explanatory.

The <parameter> element : Child of <preview> or <execute> element. The <parameter> element defines the name and values you want to pass to the URL when executing a preview or execute. This is the posted data to the service. If you are not sure what the parameter names are for a service, such as live search or Encarta dictionary, you can use Fiddler to figure it out. The {selection} is like a variable. It will be replaced by the user selected value when invoking the web service. The Type defines the parameter data type.

The <execute> element : This element defines the action to take upon executing the accelerator. Typically this opens a new instance of the browser and displays the results using the posted data.

If you want to get started quickly, simply copy the XML from above and modify the parameters and use it. If you don’t yet have a site or IIS, simply download and install Visual Web Developer from here. Then

  • Create a new ASP.net website using File System option.
  • Create a new HTML file and paste the HTML above.
  • Create the XML file with the desired parameters.
  • Press CTRL + F5 to run your web page and install the Accelerator.
  • You can then use it with any website out there.

Isn’t it simple and really cool feature?

That’s it for now. Hope you will have fun creating your own accelerators and improve your browsing experience.

 

Unlike native debugging, you don’t need symbols for debugging managed code. SOS.dll can also be used with WinDBG and Visual Studio debugger. Simply use the intermediate window in Visual Studio to load SOS.dll and then you can use the commands provided by the extension.

My favorite blog for managed debugging (and almost everyone else’s) is by Tess Ferrandez-Norlander. Tess is an Escalation Engineer with Microsoft. I don’t think anyone else explained managed debugging better than Tess. It is a good place to start with the basics & I highly recommend reading her blog. Her blog not only teaches the techniques of debugging, but also contains code examples that illustrate various problems. The comments section of each post also has very valuable information.

So here, I explain how I go about looking at high memory dumps:

Big Question: I always have a question in my mind: Where is most of the memory? Then start looking for it. Once I identify what all components are majorly contributing to memory usage, I check to see if the problem can be resolved with a hotfix. While this usually requires identifying patterns and some experience knowledge, for beginners, it is still worth checking the Microsoft support site to see if the problem you are running into is described in a KB article and if a hotfix is available. The easiest way is to stay updated because Microsoft continuously investigates such issues and provides hotfixes. If the problem is traced down to application code, the developers should work to reduce memory usage.

First step - Open the dump in WinDBG. For instructions on setting up WinDBG, please see this post.

First, we need to find out where “most” of the memory is. A debugger command is available that will help us here – !address –summary. To effectively understand and interpret what the output means we need to first understand what the output values represent. Take a look at this MSDN article. In the output of !address –summary command, the column named “KB” indicates in KiloBytes, the amount of memory in each of these “areas” and the Pct(Tots) column indicates a percentage of the value in KB column against Total address space (4 GB on 32 bit).

Thus, if RegionUsageIsVAD is high, it indicates that most memory is in virtual allocations. .NET heaps are made with calls to VirtualAlloc. The GC in .NET uses the Microsoft Win32® VirtualAlloc() application programming interface (API) to reserve a block of memory for its heap. Depending on the flavor of GC in use, it may be 32 MB (Workstation GC) allocations or 64 MB (Server GC). For ASP.net applications running in IIS, by default, the Server GC is automatically selected, if the computer has 2 or more processors.

Similarly, RegionUsageFree indicates most of memory is “Free”. If applications start throwing out of memory exceptions and a captured dump shows most of memory is Free, then it indicates fragmentation within the process address space. Typically if 30% or more of memory is Free, then I’d suggest investigating why memory is fragmented.

If RegionUsageImage is high, then in means you have lots of DLLs being loaded in the process. Ideally 100 MB or less is good. ASP.NET applications should not have debug attribute set to TRUE in web.config. Enabling debugging emits debug information which can be an overhead and cause memory problems. Similarly, there should not be any modules that are built in debug mode.

If RegionUsageHeap is high, it means most of the memory is on heaps. Eg the C Runtime heap, MDAC (Microsoft Data Access Components) heap, heaps used for compression etc. When you use the MSVCRT.dll library, it internally ends up allocating on the C runtime heap by calls to the CAlloc API.

Note that there are many tools out there (not just winDBG) that allows you to identify the allocation profiles of applications. Eg: For .NET applications you can use CLRProfiler (Microsoft) and Ant’s Profiler (RedGate). I do not know about profilers for native C, C++ applications. However Debug Diagnostics Tool v1.1 from Microsoft includes a capability for injecting a DLL – Leaktrack into the process that hooks into the calls like CAlloc and can keep track of who made the allocations. DebugDiag also includes a Memory pressure analysis script that can read a dump with leak track injected and give you a very nice report on where the allocations are – By size, by count and many other parameters. It’s 99% helpful in debugging web applications hosted on IIS. I will discuss this in another blog post.

WinDBG is usually used in post mortem debugging. Profilers should be used only in development and test environments, not production. They are very invasive and will reduce performance by about 10 times.

Continuing a bit more about RegionUsageIsVAD… If you determine that most memory is from Virtual Allocations AND you have ASP.net web applications running in IIS, then almost always it is likely that large amounts of memory is allocated in managed heaps. You will then need to investigate into .NET heaps to determine the top memory consumers. That’s where the !dumpheap –stat command comes into the picture.

More in my next post.

In one of my earlier post I explained how to use Microsoft Network Monitor to debug a networking problem. Network trace tools aren’t very useful in debugging problems when the channel is secured (HTTPS) and you need to view the data to make your conclusions. However you can still debug SSL handshake failures using network monitor. Here’s a scenario.

A client (Not browser) is trying to connect to a IIS web server by sending its client certificate to post some data. However, the IIS machine always rejects the authentication.

The first step was to take a network trace as usual. For instructions on how to capture simultaneous traces, see this post

Analyzing the network trace:

I filtered the traffic by the keyword SSL. In the display filter tab, type SSL and click on Apply button.

Here are some of the frames the we picked from the capture:

Frame Src Dst Protocol Description
7 Client Server SSL SSL:SSLv2RecordLayer, ClientHello(0x01)
9 Server Client SSL SSL:  Server Hello. Certificate.
12 Server Client SSL SSL:Continued Data: 1460 Bytes
13 Server Client SSL SSL:Continued Data: 1460 Bytes
18 Server Client SSL SSL:Continued Data: 1460 Bytes
22 Server Client SSL SSL:Continued Data: 1460 Bytes
27 Server Client SSL SSL:  Encrypted Handshake Message.
30 Client Server SSL SSL:  Certificate. Client Key Exchange. Change Cipher Spec. Encrypted Handshake Message.
32 Server Client SSL SSL:  Encrypted Alert.

During SSL connection negotiation process, the client and server can mutually exchange certificates for authentication. The client authentication can be optional. To understand more about how the SSL negotiation takes place, please see these Microsoft KBs

Description of the Secure Sockets Layer (SSL) Handshake

Description of the Server Authentication Process During the SSL Handshake

Essentially, what is going on here can be summarized as:

  1. The client makes a hello request in Frame 7
  2. The server responds with its certificate and then continued bytes from the server certificate.
  3. The client then sends an "Encrypted handshake message"
  4. The client then sends its certificate with Client Key exchange and also indicates a change of cipher spec.
  5. In frame 32, we can see an encrypted alert!

While performing any type of debugging, we need to follow the data; we need to look for something that is “interesting”. In this case an “alert” from the server is being sent. So we looked at what this alert is. 

Since our “interesting” frame is 32, we looked more at the headers and the details in the frame. Here is the frame in detail

  Frame: Number = 32, Captured Frame Length = 61, MediaType = ETHERNET
+ Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[MAC Address],SourceAddress:[Source MAC Address]
+ Ipv4: Src = ServerIP, Dest = ClientIP, Next Protocol = TCP, Packet ID = 5022, Total IP Length = 47
+ Tcp: Flags=...AP..., SrcPort=5443, DstPort=1100, PayloadLen=7, Seq=2764379044 - 2764379051, Ack=3896915131, Win=64240 (scale factor 0x0) = 64240
- Ssl:   Encrypted Alert.
  - TlsRecordLayer:
     ContentType: Encrypted Alert
  - Version: TLS 1.0
      Major: 3 (0x3)
      Minor: 1 (0x1)
     Length: 2 (0x2)
     EncryptedData: Binary Large Object (2 Bytes)

Here is the handshake data in Hex (This data isn't encrypted yet because the handshake is still in progress)

00 1E 68 0F 3E 80 00 30 48 7E 1B 90 08 00 45 00 00 2F 13 9E 40 00 3D 06 D2 0E 0A F4 40 0F 0A 33 02 E7 15 43 04 4C A4 C5 13 A4 E8 46 34 BB 50 18 FA F0 2B B8 00 00 15 03 01 00 02 02 2A

We can conclude a lot from this frame. We know that this is an ethernet packet. The TCP flags set are AP, which means transfer data to the end application on the client without buffering at the TCP level, the communication port on the server is 5443, the client port is 1100, the window size for data transmission is 64240 bytes and so forth. The most interesting set of data is the TLS record layer… It shows the major and minor versions in use after negotiation and the length of the data, which is 2 bytes. This data is encrypted.

So here’s the trick now. This is an encrypted alert and RFC defines the alert descriptions. You can find it here for TLS 1.0 (Transport layer security)

Take the hex value of the 2 bytes of this message, which is 2A (The last 2 values, all values before this are headers in hex and we can ignore that because network monitor already gave us that information)

Using Windows calculator, convert hex 2A into decimal. We get: 42.

Now, let’s go look at the TLS 1.0 RFC to find out what what decimal value 42 represents for Alert Description. You can search for “AlertDescription” on the page. You should get:

decompression_failure(30),
handshake_failure(40),
bad_certificate(42),
unsupported_certificate(43),
certificate_revoked(44),
certificate_expired(45),
certificate_unknown(46),
illegal_parameter(47),
unknown_ca(48),
access_denied(49),
decode_error(50),
decrypt_error(51),
export_restriction(60),
protocol_version(70),
insufficient_security(71),
internal_error(80),
user_canceled(90),
no_renegotiation(100),
(255)
} AlertDescription;

Thus, we can see that 42 translates to bad_certificate. So we can say that the client sent a bad certificate to the server and therefore the server rejected the connection request. Armed with this information, we checked the client certificate in the certificate store and indeed, we had a bad certificate. Replacing it with a good certificate fixed the problem!

Software has some a long way and enabled people around the world to do many things. It has also made our lives so much easier. However these new capabilities have also created new problems and the answers to those problems are software tools which are excellent utilities for debugging these problems. I want to once again thank all those developers who made these tools, continue to improve and build new tools. It makes the job of a support professional much easier and helps a lot of customers resolve complex problems. Thank you!

IISreset.exe is a magical command that resolves many problems! Err… well, yeah that command restarts all IIS related processes and thus essentially running it cleans up all the “mess”. So is there anything special about IISReset.exe in Windows 2008? Yes and I do not recommend using it and here’s very useful information that you must know about what this command does.

Windows 2008 introduced a new service called Windows Process Activation service aka WAS. In IIS 7.0 you can run HTTP and non HTTP services (like net.tcp) in the same application pool. WAS manages application pools and worker processes for HTTP and non HTTP protocols. IIS uses the HTTP protocol with WAS. With WAS we opened it up for others to use our process model. The IIS process model is very efficient and I believe we made this change due to good feedback.

Back to IISERSET…. IISReset.exe restarts the following Windows Services

  • IISADMIN
  • W3SVC
  • WAS

So do you see what would happen if your ran IISReset.exe with the intention to restart ONLY IIS, but you also have other services dependent via WAS?

Conclusion: Running IISReset.exe will restart WAS and restarting WAS will restart all services that are dependent on it, including IIS, if installed. This can be problematic in production environments where you have non HTTP services implemented with WAS but your intention is to only restart IIS. An example to this will be with Small Business Server.

Therefore, the recommended way to restart IIS on Windows 2008 machines will be to use one of the following methods:

  • Open IIS Manager and then select the server node. In the right side pane, you will have an option to restart IIS.
  • Open a command prompt as Administrator and run the following commands
  1. net stop w3svc
  2. net stop IISAdmin
  3. net start IISAdmin
  4. net start w3svc

There are a lot of software tools provided by Microsoft and written by other companies that really make the job of a support engineer easy. Without software tools, it is extremely difficult to track down software problems. Mark Russinovich is famous for the Windows tools he has written and they are widely used by Microsoft support teams to help debug various customer issues. Microsoft also provides a lot of software tools. In the IIS support group, we deal with HTTP failures that are a result of networking problems. Once we isolate it, the next thing to do is get simultaneous network traffic captures from the client and the server. That way we can see how the traffic flows between client and server and draw good conclusions on where the problem might be. There are two tools that can do this job and both are extremely good & very popular tools. Microsoft Network Monitor & Wireshark are a couple of tools that we use. You can download Microsoft Network Monitor from the Microsoft Download Center and Wireshark from Wireshark.org

For a quick start on reading network traces using Network Monitor, please see this Microsoft article: Basics of Reading TCP/IP traces

Recently we used Network Monitor to resolve a problem – An ASP.net web page streams files to clients using Response.WriteFile method. All other pages work just fine, however when sending the file, the clients end up getting – Page cannot be displayed. I had earlier written this blog post about this error, but which was for different reason. In that case, no web pages would work but in this case only the file download page failed.

One of the first things we do is “isolate the problem area”. Isolating the problem helps you focus on a specific part rather than looking at too many variables & possibilities. In this case we wanted to isolate if this an application related problem or just networking. We could never reproduce the problem by browsing the page from the web server console. We always had success from the server console. By browsing the page from the server console, we take the network out of the equation. Because we had success in this case, it was determined that this problem is caused by something going on in the networking layer.

The next thing to do is get data about the underlying traffic looks during the failure. Remember that you always need to get good data to draw useful conclusions. Without good data, we are just shooting in the dark. In this case we captured simultaneous network traces. Please refer to this post for steps to take simultaneous network traces. Here’s how we went about analyzing the traces.

Opened the trace in Microsoft Network Monitor

The next thing to do is filter the traffic we are interested in. Take a moment to look at the user interface items of Network Monitor that I highlighted in red circles.

NetmonUI

The Display Filter tab allows you to specify keywords or expressions that will help you filter traffic. For Eg. if you want to see only HTTP traffic, you can type http in the Display Filter text area and click on Apply button.

In this particular case I not only wanted to see HTTP traffic but also the TCP frames (between the web server this the client) and therefore I used a different filter, which was:

(tcp.SrcPort == 3117 && tcp.DstPort == 80)
||
(tcp.SrcPort == 80 && tcp.DstPort == 3117)

So how did I figure the port numbers? HTTP port number on a web server is almost always 80 unless the URL in the browser contained the port number like http://localhost:8080. So that is how I got the DstPort value. Next, I wanted to get the SrcPort. I filtered using http and looked for a frame with the URL that we used in reproducing the problem. Then selected that frame and looked at the Frame Details pane to get the SrcPort & DstPort values.

Compare the traces from the client & server captures using Frame Summary Window.

This is where it gets a bit tricky for people who are not familiar with reading traces. For most its just a lot of data and numbers, but let me help you read these traces. Pay special attention to the coloring as they are important.

Client Capture

Frame Time Src IP Dst IP Protocol Description
182 11:38:16.449 CLIENT SERVER TCP TCP:Flags=......S., SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=1608257832
183 11:38:16.465 SERVER CLIENT TCP TCP:Flags=...A..S., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3131352675, Ack=1608257833
184 11:38:16.465 CLIENT SERVER TCP TCP:Flags=...A...., SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=1608257833, Ack=3131352676
185 11:38:16.465 CLIENT SERVER HTTP HTTP:Request, POST /Server/AppFolder/SendFile.aspx
186 11:38:16.465 CLIENT SERVER TCP TCP:[Continuation to #185]Flags=...AP..., SrcPort=3117, DstPort=HTTP(80), PayloadLen=1095, Seq=1608259213 - 1608260308
188 11:38:16.496 SERVER CLIENT TCP TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3131352676, Ack=1608259213
189 11:38:16.496 SERVER CLIENT TCP TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3131352676, Ack=1608260308
1751 11:39:46.509 SERVER CLIENT TCP TCP:Flags=...A...F, SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3131352676, Ack=1608260308
1752 11:39:46.509 CLIENT SERVER TCP TCP:Flags=...A...., SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=1608260308, Ack=3131352677
1753 11:39:46.509 CLIENT SERVER TCP TCP:Flags=...A...F, SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=1608260308, Ack=3131352677
1754 11:39:46.525 SERVER CLIENT TCP TCP:[Segment Lost]Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3131352677, Ack=1608260309

Server Capture

Frame Time Src IP Dst IP Protocol Description
2789 11:28:18.442 CLIENT SERVER TCP TCP:Flags=......S., SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=2256040559, Ack=0
2790 11:28:18.442 SERVER CLIENT TCP TCP:Flags=...A..S., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3177618710, Ack=2256040560
2791 11:28:18.442 CLIENT SERVER TCP TCP:Flags=...A...., SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=2256040560, Ack=3177618711
2792 11:28:18.442 CLIENT SERVER HTTP HTTP:Request, POST /Server/AppFolder/SendFile.aspx
2793 11:28:18.442 CLIENT SERVER TCP TCP:[Continuation to #2792]Flags=...AP..., SrcPort=3117, DstPort=HTTP(80), PayloadLen=12, Seq=2256041928 - 2256041940, Ack=3177618711
2794 11:28:18.442 SERVER CLIENT TCP TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3177618711, Ack=2256041940
2795 11:28:18.442 CLIENT SERVER HTTP HTTP:HTTP Payload, URL: /Server/AppFolder/SendFile.aspx
2859 11:28:18.563 SERVER CLIENT TCP TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3177618711, Ack=2256043035
6447 11:29:48.425 CLIENT SERVER TCP TCP:Flags=...A...F, SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=2256043035, Ack=3177618711
6448 11:29:48.425 SERVER CLIENT TCP TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3177618711, Ack=2256043036

Observation

  1. If you compare the sequence (Seq) numbers (highlighted in red) for each frame in both captures, they are different. So what does this mean? It means, someone in between the client and server changed the sequence numbers. In a normal capture, the sequence numbers will be the same in both captures for each corresponding frame.


  2. The value for PayLoadLen (highlighted in yellow) are different in the server and client captures. What does this mean? It would indicate that the packets were split – by some device/program in between.


  3. The client capture indicates that the server closed the connection by sending a “FIN” (see frame 1751 in the client capture). However, you do not see this in the server capture. The server never set the FIN TCP flag!

Summary & Conclusion

With this data, it is clear that there is a “middle man”, perhaps a device or software in between the client and server that isn’t handling the data flow correctly. The next step is to look at the networking infrastructure or get a network administrator to look at the devices that are in between the clients and IIS web server and isolate the offending device.

More Posts Next page »
 
Page view tracker