Heap corruption is by nature a complicated issue to troubleshoot, and in some cases luck matters as much as debugging knowledge. A customer asked us for help with a W3WP process that crashed intermittently. He reported that the server was under heavy load; sometimes it returned HTTP 500 errors, and sometimes the process simply terminated unexpectedly. It crashed with a second-chance access violation (C0000005) exception, and we were able to capture a crash dump with DebugDiag.

 

From the dump, we could see that the thread crashed in CUSTOM_ERROR_TABLE::FindCustomError. From the IIS source code (which I can't share with you), I know that this function looks up the applicable custom error entry for a given status code and subcode.

 

Here was the call stack:

0:033> knL

# ChildEBP RetAddr 

00 06cee2cc 5a49fb48 w3core!CUSTOM_ERROR_TABLE::FindCustomError+0x18

01 06cee428 5a42392b w3core!ISAPI_REQUEST::GetCustomError+0x8e

1b 06ceffb8 7c82482f msvcrt!_threadstartex+0x74

1c 06ceffec 00000000 kernel32!BaseThreadStart+0x34

 

The call stack was quite clean, with no third-party components to suspect. :) It is also very rare for a crash to happen inside an IIS module. As you may know, w3core.dll is the core component of IIS 6 (iiscore.dll in IIS 7), and a bug in this code path is very unlikely: IIS 6 has been released for more than 7 years and FindCustomError is called countless times. If there were a bug here, it would not survive for a minute.

 

But the fact is that it crashed right there. Why? I had no idea, so I started by checking the registers: ESI is null, which looks like the direct culprit.

 

eax=06ce0000 ebx=06ceedb4 ecx=017c4774 edx=000006e2 esi=00000000 edi=06ceedb4

eip=5a49fbd2 esp=06cee2c4 ebp=06cee2cc iopl=0         nv up ei ng nz ac po cy

cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010293

w3core!CUSTOM_ERROR_TABLE::FindCustomError+0x18:

5a49fbd2 668b4608        mov     ax,word ptr [esi+8]      ds:0023:00000008=????

 

Next, we wanted to find out why ESI was null, so we analyzed the assembly below. I have to admit that I rarely did any assembly programming after my undergraduate years; what I remember is as simple as push, pop, and mov. So don't stop reading here if you are not familiar with assembly. :)

 

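The assembly below is still fairly simple: it is a linked-list walk. ESI starts at the Flink of the list head that ECX points to; each entry's status code (at +8) and subcode (at +0Ah) are compared with the arguments, and ESI then advances to the entry's own Flink until it comes back around to ECX. Nothing checks ESI for null, so a zeroed Flink turns into a read of address 0x00000008, which is exactly the access violation (not a C++ exception) we see in the dump.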

 

w3core!CUSTOM_ERROR_TABLE::FindCustomError [d:\nt\inetsrv\iis\iisrearc\iisplus\ulw3\customerror.cxx @ 48]:

5a49fbac 8bff            mov     edi,edi

5a49fbae 55              push    ebp

5a49fbaf 8bec            mov     ebp,esp

5a49fbb1 56              push    esi

5a49fbb2 57              push    edi

5a49fbb3 8b7d10          mov     edi,dword ptr [ebp+10h]

5a49fbb6 85ff            test    edi,edi

5a49fbb8 0f8467fdffff    je      w3core!CUSTOM_ERROR_TABLE::FindCustomError+0x59 (5a49f925)

5a49fbbe 837d1400        cmp     dword ptr [ebp+14h],0

5a49fbc2 0f845dfdffff    je      w3core!CUSTOM_ERROR_TABLE::FindCustomError+0x59 (5a49f925)

5a49fbc8 8b31            mov     esi,dword ptr [ecx]<=================esi = poi(ecx), the list head's Flink (01dc7540)

5a49fbca 3bf1            cmp     esi,ecx

5a49fbcc 0f8448fdffff    je      w3core!CUSTOM_ERROR_TABLE::FindCustomError+0x38 (5a49f91a)

0:033> u

w3core!CUSTOM_ERROR_TABLE::FindCustomError+0x18 [d:\nt\inetsrv\iis\iisrearc\iisplus\ulw3\customerror.cxx @ 70]:

5a49fbd2 668b4608        mov     ax,word ptr [esi+8]  <==============esi is null

5a49fbd6 663b4508        cmp     ax,word ptr [ebp+8]

5a49fbda 7404            je      w3core!CUSTOM_ERROR_TABLE::FindCustomError+0x22 (5a49fbe0)

5a49fbdc 8b36            mov     esi,dword ptr [esi]<==============esi = entry's Flink; 0 for the entry at 01dc7540

5a49fbde ebea            jmp     w3core!CUSTOM_ERROR_TABLE::FindCustomError+0x34 (5a49fbca)

5a49fbe0 668b460a        mov     ax,word ptr [esi+0Ah]

5a49fbe4 663b450c        cmp     ax,word ptr [ebp+0Ch]

5a49fbe8 0f841c780000    je      w3core!CUSTOM_ERROR_TABLE::FindCustomError+0x3f (5a4a740a)
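Read back into C++, the loop above is roughly the following. This is my own hypothetical reconstruction, not the IIS source: the field order follows the dt output further down, the real entry also has _strError and _fIsFile, and the names ListEntry, InitListHead and InsertTailList are stand-ins made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for the Windows LIST_ENTRY: a circular, doubly linked
// list whose head points back to itself when the list is empty.
struct ListEntry {
    ListEntry* Flink;  // next
    ListEntry* Blink;  // previous
};

// Hypothetical mirror of CUSTOM_ERROR_ENTRY: links first, then the
// 16-bit status code (+8 on x86) and sub-error (+0Ah on x86).
struct CustomErrorEntry {
    ListEntry listEntry;
    std::uint16_t statusCode;
    std::uint16_t subError;
};

void InitListHead(ListEntry* head) { head->Flink = head->Blink = head; }

void InsertTailList(ListEntry* head, ListEntry* entry) {
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

// Same shape as the disassembly:
//   esi = head->Flink;
//   while (esi != head) { compare [esi+8], [esi+0Ah]; esi = esi->Flink; }
// Like the original, it never checks for a null Flink, so with a zeroed
// entry the next iteration reads from a null pointer (in practice, address 8
// on a 32-bit build), which is the fault seen in the dump.
CustomErrorEntry* FindCustomError(ListEntry* head, std::uint16_t status,
                                  std::uint16_t subError) {
    for (ListEntry* p = head->Flink; p != head; p = p->Flink) {
        // listEntry is the first member of a standard-layout struct,
        // so converting the link pointer back to the entry is valid.
        auto* entry = reinterpret_cast<CustomErrorEntry*>(p);
        if (entry->statusCode == status && entry->subError == subError) {
            return entry;
        }
    }
    return nullptr;  // walked back to the head: no match
}
```

With an intact list the walk terminates at the head; the only thing that can make it fault is a link that no longer points at a valid entry.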

 

 

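So why did ESI end up null? Under the thiscall calling convention (used for C++ non-static member functions on x86), the "this" pointer is passed in ECX. dt this shows the type, CUSTOM_ERROR_TABLE*, but the debugger claims "this" is stored in EDX, and that is wrong for this optimized code (dv below even reports this = 0x000006e2, which is simply the value of EDX). The real "this" is in ECX, 0x017c4774, and ECX itself is not null. ECX points at the list head, poi(ecx) is the first entry at 0x01dc7540, and the first DWORD of that entry, its Flink, is 0. That 0 is what the loop loads into ESI.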
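To make "passing this" concrete, here is a small portable C++ illustration with a made-up Table type: calling a member function through std::invoke with an explicit object pointer gives the same result as the normal member call, because the object address is just a hidden first argument. Where that argument physically lives (ECX under MSVC's x86 __thiscall) is decided by the calling convention and is not visible in portable code.

```cpp
#include <cassert>
#include <functional>

// Hypothetical table type, used only to show the hidden 'this' argument.
struct Table {
    int base;
    int Lookup(int code) const { return base + code; }
};

// t.Lookup(code) is, in effect, Lookup(&t, code): the object address is
// passed as an implicit first argument. std::invoke makes that explicit.
int CallExplicitly(const Table& t, int code) {
    return std::invoke(&Table::Lookup, &t, code);
}
```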

 

0:033> dt this

Local var @ edx Type CUSTOM_ERROR_TABLE*

 

0:033> dd poi(ecx) L1

01dc7540  00000000

 

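Then we walked the CUSTOM_ERROR_ENTRY list, starting from the list head at 0x017c4774, to see where it breaks. The first record !list prints is the list head inside CUSTOM_ERROR_TABLE itself, so its _StatusCode and _SubError values are not meaningful (0xbda0 is 48544, not a valid HTTP status). The second record, the entry at 0x01dc7540, is all zeros: Flink, Blink, status code and subcode.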

0:033> !list "-t ntdll!_LIST_ENTRY.Flink -e -x \"dt w3core!CUSTOM_ERROR_ENTRY @$extret\" 017c4774"

dt w3core!CUSTOM_ERROR_ENTRY @$extret

   +0x000 _listEntry       : _LIST_ENTRY [ 0x1dc7540 - 0x1d60620 ]

   +0x008 _StatusCode      : 0xbda0

   +0x00a _SubError        : 0x1db

   +0x00c _strError        : STRU

   +0x03c _fIsFile         : 0n0

 

dt w3core!CUSTOM_ERROR_ENTRY @$extret

   +0x000 _listEntry       : _LIST_ENTRY [ 0x0 - 0x0 ]

   +0x008 _StatusCode      : 0

   +0x00a _SubError        : 0

   +0x00c _strError        : STRU

   +0x03c _fIsFile         : 0n0

 

0:033> ? 0xbda0

Evaluate expression: 48544 = 0000bda0

 

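Clearly, the custom error list is corrupted: the entry at 0x01dc7540 has been zeroed out, so the walk can never reach the custom error the request was looking for, 500.100 (Internal Server Error - ASP error), which is also what the arguments of frame 0 show.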

 

0:033> .frame 0

00 06cee2cc 5a49fb48 w3core!CUSTOM_ERROR_TABLE::FindCustomError+0x18 [d:\nt\inetsrv\iis\iisrearc\iisplus\ulw3\customerror.cxx @ 70]

0:033> dv

           this = 0x000006e2

     StatusCode = 0x1f4<=====500

       SubError = 0x64<======100

       pfIsFile = 0x06ceedb4

      pstrError = 0x06cee334
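Checking a list like this by hand is what !list does for us. As a rough sketch of the same sanity check in C++ (ListEntry and FindBrokenLink are hypothetical names, not an IIS or Windows API), a walker can flag the first node whose links are inconsistent instead of crashing on them:

```cpp
#include <cassert>
#include <cstddef>

struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

// Walk a circular doubly linked list and return the first node whose
// outgoing link is inconsistent, or nullptr if the list looks intact.
// Checks: Flink must not be null (the corruption in this dump), and the
// next node's Blink must point back at the current node.
// maxNodes bounds the walk so a corrupted cycle cannot loop forever.
const ListEntry* FindBrokenLink(const ListEntry* head, std::size_t maxNodes) {
    const ListEntry* p = head;
    for (std::size_t i = 0; i <= maxNodes; ++i) {
        const ListEntry* next = p->Flink;
        if (next == nullptr || next->Blink != p) return p;
        if (next == head) return nullptr;  // back at the head: intact
        p = next;
    }
    return p;  // too many nodes: probably a cycle that skips the head
}
```

For the state in this dump (head.Flink pointing at an entry whose Flink and Blink are both 0), the check already fails at the head, because the entry's Blink no longer points back at it.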

 

 

We also checked the list entry addresses 0x1d60620 and 0x1dc7540. Both fall in the region 01d50000-01e50000, which belongs to heap 0x2b0000, the msvcrt heap.

 

0:033> !address 0x1d60620

                                   

Usage:                  Heap

Allocation Base:        01d50000

Base Address:           01d50000

End Address:            01e50000

Region Size:            00100000

Type:                   00020000               MEM_PRIVATE

State:                  00001000               MEM_COMMIT

Protect:                00000004             PAGE_READWRITE

More info:              heap containing the address: !heap 0x2b0000

More info:              heap entry containing the address: !heap -x 0x1d60620

 

Heap 3 - 0x002b0000

Heap Name   msvcrt!_crtheap

Heap Description   This heap is used by msvcrt

Reserved memory   3.13 MBytes

Committed memory   1.64 MBytes (52.38% of reserved) 

Uncommitted memory   1.49 MBytes (47.63% of reserved) 

Number of heap segments   3 segments

Number of uncommitted ranges   1 range(s)

Size of largest uncommitted range   1.43 MBytes

Calculated heap fragmentation   3.94%

 

At this stage, we believed it was a typical heap corruption. Debugging heap corruption is not easy, because the thread that hits the exception is usually not the thread that caused the corruption (FindCustomError is only the victim here). We can still enable full page heap with pageheap.exe (the full switch) and capture another IIS crash dump, so that the code that corrupts memory is much more likely to fault at the moment it does so. After several days of monitoring, we collected the dump we wanted and found the culprit module. We were lucky that PageHeap didn't stay silent.
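The reason full page heap catches the corrupting code is the guard-page layout. The sketch below imitates that idea with POSIX mmap/mprotect as an analogue only (PageHeap does this inside the Windows heap manager, not with this code); GuardedAlloc is a hypothetical name, alignment padding is ignored, and no matching free is provided, for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Each allocation gets its own pages. The user block is pushed to the very
// end of the last accessible page, followed by one inaccessible guard page,
// so writing even one byte past the block faults in the offending code
// instead of silently damaging a neighbour (such as a list entry).
void* GuardedAlloc(std::size_t size) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t dataPages = (size + page - 1) / page;
    const std::size_t total = (dataPages + 1) * page;  // +1 guard page
    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    char* guard = static_cast<char*>(base) + dataPages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {  // make the guard page inaccessible
        munmap(base, total);
        return nullptr;
    }
    return guard - size;  // block ends exactly where the guard page begins
}
```

The cost is at least one page per allocation plus a guard page, which is why full page heap is something you enable on one process for a troubleshooting window rather than leave on.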

 

Something else we tried was searching memory for the 0001003f pattern in the hope of finding some clues. No luck this time! The technique is really useful in some cases, but it also takes some luck. Here is a good post about the 0001003f pattern for your reference: http://blogs.msdn.com/slavao/archive/2005/01/30/363428.aspx
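Why that particular value: as far as I know, 0x0001003f is the ContextFlags value of an x86 CONTEXT record captured with CONTEXT_ALL, and ContextFlags is the first DWORD of the record, so each hit on a stack is a candidate register context from an earlier exception. The constants below restate the WinNT.h x86 definitions so the sketch compiles anywhere; the scanner is only a toy model of what WinDbg's s -d command does over real memory, and FindContextCandidates is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// x86 CONTEXT flag values (as in WinNT.h), redefined for portability.
constexpr std::uint32_t kContextI386          = 0x00010000;
constexpr std::uint32_t kContextControl       = kContextI386 | 0x01;
constexpr std::uint32_t kContextInteger       = kContextI386 | 0x02;
constexpr std::uint32_t kContextSegments      = kContextI386 | 0x04;
constexpr std::uint32_t kContextFloatingPoint = kContextI386 | 0x08;
constexpr std::uint32_t kContextDebugRegs     = kContextI386 | 0x10;
constexpr std::uint32_t kContextExtendedRegs  = kContextI386 | 0x20;
constexpr std::uint32_t kContextAll =
    kContextControl | kContextInteger | kContextSegments |
    kContextFloatingPoint | kContextDebugRegs | kContextExtendedRegs;
static_assert(kContextAll == 0x0001003f, "x86 CONTEXT_ALL");

// Return the index of every DWORD equal to the CONTEXT_ALL flags value.
// Each hit only marks a candidate CONTEXT record; it still has to be
// inspected (for example with .cxr in WinDbg) to see if it is genuine.
std::vector<std::size_t> FindContextCandidates(
        const std::vector<std::uint32_t>& words) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (words[i] == kContextAll) hits.push_back(i);
    }
    return hits;
}
```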

Regards,

Yawei