<sigh>

 

So this issue was incredibly interesting, but you know, at the end it just makes you mad. This past week a customer was having a very clear-cut, reproducible issue: when attempting to attach an item to a message via OWA, the Information Store would crash with what appeared to be stack corruption. Boy, I didn’t know where this would take me and everyone else who got drawn into it. Below is our debug journey to understand what was happening. The conclusion was a collective effort of several Escalation Engineers, so I am not taking full credit for all of this investigation… The information below may make it look easy, but it had us going in an infinite loop for a couple of hours <grin>

 

First, the stack looked like this after the crash (yes, I had the correct symbols in place):

0:058> kb

ChildEBP RetAddr  Args to Child             

WARNING: Frame IP not in any known module. Following frames may be wrong.

7c576b4e f8458d51 0875ff50 ffffb7e8 0fc085ff 0x77f87cac

 

So based upon my past experience with stack corruption, I looked at a raw dump of the stack, which showed, over and over, attempts to locate an exception handler; that dispatching chewed up the remnants of the original stack. Fortunately the customer allowed me to perform a remote debug, so I didn’t have to decipher the mess left on the stack or attempt to find the real .cxr (context record). Besides, there were errors when this userdump was created related to not being able to read memory… so we really couldn’t trust everything in the userdump. It turns out there was a valid reason for this error...


So after manually walking through the functions that are called, I eventually found the crash was occurring in this code path:

 

ChildEBP RetAddr 
25b0cf10 001f3cae ESE!ErrOSSLVFileOpen+0x61
25b0d68c 001edb85 ESE!ErrSLVSetColumnFromFile+0x5b
25b0dc84 001c4bc8 ESE!ErrSLVSetColumn+0x4d9
25b0dcc0 001c49a6 ESE!ErrIsamSetColumn+0x1ec
25b0dcd8 001c48f1 ESE!ErrESESetColumn+0x6f
25b0dd10 001c4885 ESE!JetSetColumnEx+0x4d
25b0dd54 004041d1 ESE!JetSetColumn+0x4c
25b0dd84 0040f710 store!JetSetColumnFn+0x4a
25b0ddc8 004a78ba store!JTAB_BASE::EcSetColumn+0x1aa
25b0de08 004a50ea store!ATTACH::EcFlushFileHandleProps+0x94
25b0de9c 004a45c9 store!ATTACH::EcFlushProps+0xac7
25b0df18 004a42d6 store!ATTACH::EcSaveChanges2+0x4a1
25b0df30 004a429a store!EcSaveOneAttach+0x19
25b0df58 213bb966 store!EcSaveChangesAttachOp2+0x81
25b0df6c 213bb7c2 EXOLEDB!CStoreLogon::ScSaveChangesAttach+0x42
25b0e030 213bb43d EXOLEDB!CWMMultiPartFormStream::ScCommitAttachment+0x1c6
25b0e06c 213c0233 EXOLEDB!CWMMultiPartFormStream::FWrite+0x9b
25b0efe0 2138ddb0 EXOLEDB!CRequestStream::ScProcessContinuation+0x8e
25b0f014 2138d9c9 EXOLEDB!ScDispatchRequest+0x1e1
25b0fec8 2138d88b EXOLEDB!CDavServer::ProcessItem+0x94

 

So in this code path we take the information about the file and create a file handle by calling NtCreateFile(). This is the call that causes the crash: simply stepping into this call instruction makes the store crash, which means the problem is somewhere inside NtCreateFile.
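
For context, here is roughly what that call amounts to. This is not ESE’s actual code, just a minimal sketch of opening a file through the native NtCreateFile export in ntdll; the file name, access flags, and dispositions below are made up for illustration:

#include <windows.h>
#include <winternl.h>

#pragma comment(lib, "ntdll.lib")          /* or resolve NtCreateFile via GetProcAddress */

#ifndef OBJ_CASE_INSENSITIVE
#define OBJ_CASE_INSENSITIVE 0x00000040
#endif

/* Native create-disposition / create-option values, per the NtCreateFile documentation. */
#define FILE_OPEN_IF                  0x00000003
#define FILE_SYNCHRONOUS_IO_NONALERT  0x00000020

int main(void)
{
    HANDLE hFile = NULL;
    IO_STATUS_BLOCK iosb;
    UNICODE_STRING name;
    OBJECT_ATTRIBUTES oa = {0};

    /* NT-style path; this name is purely illustrative. */
    RtlInitUnicodeString(&name, L"\\??\\C:\\temp\\attachment.tmp");
    oa.Length = sizeof(oa);
    oa.ObjectName = &name;
    oa.Attributes = OBJ_CASE_INSENSITIVE;

    /* The very first instruction executed inside this call is the ntdll stub
       we are about to single-step in the debugger. */
    NTSTATUS status = NtCreateFile(&hFile,
                                   GENERIC_READ | GENERIC_WRITE | SYNCHRONIZE,
                                   &oa, &iosb,
                                   NULL,                          /* AllocationSize */
                                   FILE_ATTRIBUTE_NORMAL,
                                   FILE_SHARE_READ,
                                   FILE_OPEN_IF,                  /* create or open */
                                   FILE_SYNCHRONOUS_IO_NONALERT,  /* CreateOptions  */
                                   NULL, 0);                      /* no extended attributes */

    if (status >= 0)   /* NT_SUCCESS */
        CloseHandle(hFile);
    return 0;
}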

 

0:014>r
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=001f429f esp=20b9cb54 ebp=7c576b4e iopl=0         nv up ei pl zr na po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000247
ESE!ErrOSSLVFileOpen+0x3b7:
001f429f ff159c253200 call dword ptr [ESE!pfnNtCreateFile (0032259c)]{ntdll!ZwCreateFile (77f87cac)} ds:0023:0032259c=77

 

OK, so I dumped the parameters about to be passed to NtCreateFile (not shown), and nothing looked out of the ordinary. Next I unassembled NtCreateFile to see what was about to happen:

 

0:014> u eip
ntdll!ZwCreateFile:
77f87cac b820000000       mov     eax,0x20
77f87cb1 8d542404         lea     edx,[esp+0x4]
77f87cb5 cd2e             int     2e
77f87cb7 c22c00           ret     0x2c
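
For reference, those four instructions are the entire clean stub: load the system service number into eax, point edx at the caller’s arguments on the stack, trap into the kernel with int 2e, and clean up the 11 DWORD parameters on the way out. Here are the same bytes as a commented table (the annotations are mine):

static const unsigned char CleanZwCreateFileStub[] = {
    0xB8, 0x20, 0x00, 0x00, 0x00,  /* mov eax, 20h       ; system service number for NtCreateFile on this build */
    0x8D, 0x54, 0x24, 0x04,        /* lea edx, [esp+4]   ; edx -> the caller's argument list                     */
    0xCD, 0x2E,                    /* int 2Eh            ; transition to kernel mode                             */
    0xC2, 0x2C, 0x00               /* ret 2Ch            ; pop the 11 DWORD parameters (0x2C bytes)              */
};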

 

Still nothing out of the ordinary compared to a working system. Stepping into NtCreateFile actually lands me in ZwCreateFile (in usermode ntdll the two names resolve to the same stub, which the symbols display as ZwCreateFile). Here is the register state as we sit poised at the first instruction of ZwCreateFile.

 

0:014> t
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=77f87cac esp=20b9cb50 ebp=7c576b4e iopl=0         nv up ei pl zr na po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000247
ntdll!ZwCreateFile:
77f87cac b820000000       mov     eax,0x20

 

 

So I executed the first assembly instruction of ZwCreateFile, but $#%* what happened? EIP didn’t advance to the next instruction as expected; instead EIP now holds a weird value, while the values of all the other registers are exactly the same. We didn’t skip ahead and execute the int 2e instruction to cause a kernel-mode transition, so what the heck happened?

 

0:014> p
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=7ffa488d esp=20b9cb50 ebp=7c576b4e iopl=0         nv up ei pl zr na po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000247
7ffa488d ??               ???

 

So what instruction(s) are these?

 

0:014> u eip
7ffa488d ??               ???
           ^ Memory access error in 'u eip'

 

Why couldn’t we read this memory? This is a live debug; the memory should be readable whether it is currently paged in or not. It turns out this is also the reason the userdump could not be created successfully. So I reset the repro and stepped through ESE!ErrOSSLVFileOpen again to ensure we hadn’t missed anything… and of course we found nothing. At this point the impossible was happening... a benign instruction was blowing up. So it was time to get a different perspective on this, and I asked a few people to give me their thoughts... well, a few hours later there were eight people huddled over my desk, all scratching their heads.
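
As an aside, when a usermode debugger like cdb displays or disassembles target memory, it is simply calling ReadProcessMemory, which bottoms out in ntdll!NtReadVirtualMemory. A minimal sketch of the probe we were effectively making (the process id and address are from this repro; treat them as placeholders):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD storePid = 0xf54;           /* store.exe Cid from this repro; a placeholder otherwise */
    unsigned char buf[16] = {0};
    SIZE_T got = 0;

    HANDLE hProc = OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, FALSE, storePid);
    if (hProc == NULL) {
        printf("OpenProcess failed: %lu\n", GetLastError());
        return 1;
    }

    /* This is the same kind of read cdb issues for 'u eip'; it ends up in NtReadVirtualMemory. */
    if (!ReadProcessMemory(hProc, (LPCVOID)0x7ffa488d, buf, sizeof(buf), &got))
        printf("ReadProcessMemory failed: %lu\n", GetLastError());
    else
        printf("read %lu bytes, first byte %02x\n", (unsigned long)got, buf[0]);

    CloseHandle(hProc);
    return 0;
}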

 

Hmmm… time to break out the kernel debugger <grin>

 

OK, so once again we stepped to the failing instruction, and this time we had the customer generate a kernel dump.

 

So now, looking at the kernel dump, let’s first find our process (note the output is truncated for brevity):

0: kd> !process 0 0

**** NT ACTIVE PROCESS DUMP ****

 

PROCESS ff629400  SessionId: 0  Cid: 0f54    Peb: 7ffdf000  ParentCid: 0130

    DirBase: 041eb000  ObjectTable: 81432428  TableSize: 3056.

    Image: store.exe

 

 

PROCESS fd43e1e0  SessionId: 0  Cid: 0f5c    Peb: 7ffdf000  ParentCid: 0df0

    DirBase: 046c7000  ObjectTable: 811a54c8  TableSize: 265.

    Image: cdb.exe

 

Now let’s pull the details for the store.exe process (note the VadRoot, we will need it in a moment):

0: kd> !process ff629400 1

PROCESS ff629400  SessionId: 0  Cid: 0f54    Peb: 7ffdf000  ParentCid: 0130

    DirBase: 041eb000  ObjectTable: 81432428  TableSize: 3056.

    Image: store.exe

    VadRoot fd434fe8 Clone 0 Private 6435. Modified 445462. Locked 67.

    DeviceMap 8208c128

    Token                             e2c4a470

    ElapsedTime                        0:14:19.0750

    UserTime                          0:00:07.0809

    KernelTime                        0:00:04.0998

    QuotaPoolUsage[PagedPool]         325628

    QuotaPoolUsage[NonPagedPool]      207428

    Working Set Sizes (now,min,max)  (58595, 50, 345) (234380KB, 200KB, 1380KB)

    PeakWorkingSetSize                61146

    VirtualSize                       560 Mb

    PeakVirtualSize                   561 Mb

    PageFaultCount                    340776

    MemoryPriority                    BACKGROUND

    BasePriority                      8

    CommitCharge                      9763

    DebugPort                         e2602040

 

Now let’s go see the state of the memory range that could not be read from usermode (truncated output):

0: kd> !vad fd434fe8

VAD     level      start      end    commit

<snip>

fb8882e8 ( 7)      7ffa0    7ffa4         5 Private      EXECUTE_READWRITE

 

The VAD start and end are page numbers, so that entry covers 7ffa0000 through 7ffa4fff: five committed pages of private EXECUTE_READWRITE memory. So can we dump this memory from the kernel debugger? You bet…

 

0: kd> dd 7ffa0000

7ffa0000  000000e8 b62d5800 c300405d 3d2d2e5f

7ffa0010  6361485b 2072656b 65666544 7265646e

7ffa0020  2e2d3d5d 0000005f 00000000 00000800

7ffa0030  72656b00 336c656e 6c642e32 6553006c

7ffa0040  73614c74 72724574 4300726f 74616572

7ffa0050  69614d65 6f6c736c 47004174 614d7465

7ffa0060  6c736c69 6e49746f 57006f66 65746972

7ffa0070  656c6946 61655200 6c694664 6c430065

 

What is the state of this entry in the Page Tables?

0: kd> !pte 7ffa0000

7FFA0000  - PDE at C03007FC        PTE at C01FFE80

          contains 0D634867      contains 035BC8C6

        pfn d634 --DA--UWV       not valid

                               Transition:   35bc

                               Protect:  6

 

0: kd> !pte 7ffd4000

7FFD4000  - PDE at C03007FC        PTE at C01FFF50

          contains 0D634867      contains 028070C2

        pfn d634 --DA--UWV       not valid

                               PageFile    1

                               Offset 2807

                               Protect:  6
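
For reference, the Protect field is the memory manager’s protection encoding; assuming the standard x86 values, it decodes like this (my annotation, not part of the dump):

/* Memory manager protection encoding (low three bits), standard x86 values. */
enum MiProtection {
    MM_ZERO_ACCESS        = 0,
    MM_READONLY           = 1,
    MM_EXECUTE            = 2,
    MM_EXECUTE_READ       = 3,
    MM_READWRITE          = 4,
    MM_WRITECOPY          = 5,
    MM_EXECUTE_READWRITE  = 6,   /* <- the "Protect: 6" shown by !pte above */
    MM_EXECUTE_WRITECOPY  = 7
};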

 

OK, so the ‘Protect’ value says this memory should be perfectly accessible from usermode (it decodes to EXECUTE_READWRITE, matching the VAD), yet usermode could not read it. Why? This condition isn’t normal. Hmmm… let’s look at the memory again, this time using the dc command:

 

0: kd> dc 7ffa0000

7ffa0000  000000e8 b62d5800 c300405d 3d2d2e5f  .....X-.]@.._.-=

7ffa0010  6361485b 2072656b 65666544 7265646e  [Hacker Defender

7ffa0020  2e2d3d5d 0000005f 00000000 00000800  ]=-._...........

7ffa0030  72656b00 336c656e 6c642e32 6553006c  .kernel32.dll.Se

7ffa0040  73614c74 72724574 4300726f 74616572  tLastError.Creat

7ffa0050  69614d65 6f6c736c 47004174 614d7465  eMailslotA.GetMa

7ffa0060  6c736c69 6e49746f 57006f66 65746972  ilslotInfo.Write

7ffa0070  656c6946 61655200 6c694664 6c430065  File.ReadFile.Cl

0: kd> dc

7ffa0080  4865736f 6c646e61 65470065 766e4574  oseHandle.GetEnv

7ffa0090  6e6f7269 746e656d 69726156 656c6261  ironmentVariable

7ffa00a0  65470057 646f4d74 46656c75 4e656c69  W.GetModuleFileN

7ffa00b0  41656d61 70754400 6163696c 61486574  ameA.DuplicateHa

7ffa00c0  656c646e 65724300 50657461 65636f72  ndle.CreateProce

7ffa00d0  00417373 74697845 65726854 43006461  ssA.ExitThread.C

7ffa00e0  74616572 72685465 00646165 61657243  reateThread.Crea

7ffa00f0  69506574 50006570 4e6b6565 64656d61  tePipe.PeekNamed

 

<lights and sirens go off>

Hacker Defender??? What the heck is that? (I’ll let you go dig around the web for that thing and remember to read the disclaimer on my blog before you do.)

 

So this is the thread that was stopped in the usermode debugger at the time the kernel dump was taken:

 

 

       THREAD 8477f020  Cid f54.fd8  Teb: 7fee1000  Win32Thread: a217aea8 WAIT: (WrLpcReply) KernelMode Non-Alertable
SuspendCount 1
            8477f208  Semaphore Limit 0x1
        Waiting for reply to LPC MessageId 00034466:
        Pending LPC Reply Message:
            e2602041: [40000000,01000000]
        Not impersonating
        Owning Process ff629400
        Wait Start TickCount    244909        Elapsed Ticks: 1690
        Context Switch Count    1200                   LargeStack
        UserTime                  0:00:00.0004
        KernelTime                0:00:00.0006
        Start Address KERNEL32!BaseThreadStartThunk (0x7c574333)
        Win32 Start Address MDBTASK!MdbTaskPoolThread (0x61bd13e2)
        Stack Init bd412000 Current bd41169c Base bd412000 Limit bd40f000 Call 0
        Priority 10 BasePriority 8 PriorityDecrement 0 DecrementCount 0

 

        ChildEBP RetAddr  Args to Child
        bd4116b4 8042c413 8477f020 8477f1d8 8477f208 nt!KiSwapThread+0x1b1
        bd4116dc 80434085 8477f208 00000011 00000000 nt!KeWaitForSingleObject+0x1a3
        bd41170c 804c8113 e2602040 e2602040 bd411740 nt!LpcpRequestWaitReplyPort+0x52f
        bd411720 80526c43 e2602040 bd4118c0 bd411740 nt!LpcRequestWaitReplyPort+0x13
        bd4118a0 80526cf0 bd4118c0 e2602040 00000001 nt!DbgkpSendApiMessage+0x43
        bd411938 8042f24a bd411d10 00000001 00000000 nt!DbgkForwardException+0x78
        bd411cf4 80467735 bd411d10 00000000 bd411d64 nt!KiDispatchException+0x172
        bd411d5c 804676d0 00000000 00000000 00000000 nt!CommonDispatchException+0x4d
        bd411d64 00000000 00000000 00000000 00000000 nt!KiUnexpectedInterruptTail+0x1e1

 

bd411cf4 80467735 nt!KiDispatchException(
                        struct _EXCEPTION_RECORD * ExceptionRecord = 0xbd411d10,
                        struct _KTRAP_FRAME * ExceptionFrame = 0x00000000,
                        struct _KTRAP_FRAME * TrapFrame = 0xbd411d64,
                        char PreviousMode = 1 '',
                        unsigned char FirstChance = 0x01 '')+0x172
 

 

Using the trap frame from that stack, we can see what this thread was really about to execute, and it is hooked:

 

0: kd> .trap bd411d64
ErrCode = 00000000
eax=0e84cb90 ebx=80000000 ecx=0e84cbc0 edx=0e84cba8 esi=00000000 edi=00000716
eip=77f87cac esp=0e84cb50 ebp=7c576b4e iopl=0         nv up ei pl zr na po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000247
ntdll!ZwCreateFile:
001b:77f87cac e9dccb0108  jmp     7ffa488d
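
Decoding that by hand: e9 is a relative jmp, and the displacement bytes dc cb 01 08 are added to the address of the instruction that follows the 5-byte jmp. A quick sanity check of the arithmetic (mine, using the addresses from this dump):

#include <stdio.h>

int main(void)
{
    /* Bytes found at ntdll!ZwCreateFile in the trap frame: e9 dc cb 01 08 (jmp rel32). */
    unsigned int hookAddress  = 0x77f87cac;   /* where the jmp lives           */
    unsigned int displacement = 0x0801cbdc;   /* the rel32, read little-endian */

    /* Target of a jmp rel32 = address of the next instruction (+5) plus the displacement. */
    unsigned int target = hookAddress + 5 + displacement;

    printf("hook target = %08x\n", target);   /* prints 7ffa488d, the bogus EIP we hit in usermode */
    return 0;
}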

 

Well, this isn’t the same code that usermode was showing us… recall this from the usermode debug at exactly the same address:

0:014> t
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=77f87cac esp=20b9cb50 ebp=7c576b4e iopl=0         nv up ei pl zr na po cy
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000247
ntdll!ZwCreateFile:
77f87cac b820000000       mov     eax,0x20

 

So this means Hacker Defender was also intercepting the calls the debugger made during the live debug to read process memory, and showing us what would be expected at that address instead of what was really there… From kernel mode we can clearly see the jmp that sends execution to 7ffa488d, the instruction that was attempting to be executed. That address falls inside the 7ffa0000-7ffa4fff range we determined was unreadable from usermode.
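
Incidentally, this flavor of inline hook is easy to spot from inside the affected process itself, because a clean 32-bit stub starts with mov eax, imm32 (0xB8) while a detour like this one starts with jmp rel32 (0xE9). A rough 32-bit self-check sketch, with the obvious caveat that a rootkit which also filters in-process reads or patches the checker can still lie to you:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* NtCreateFile and ZwCreateFile are the same export in usermode ntdll. */
    HMODULE ntdll = GetModuleHandleA("ntdll.dll");
    unsigned char *stub = ntdll ? (unsigned char *)GetProcAddress(ntdll, "NtCreateFile") : NULL;
    if (stub == NULL)
        return 1;

    if (stub[0] == 0xE9) {
        /* jmp rel32: someone has detoured the stub. Compute where it goes (32-bit only). */
        unsigned int target = (unsigned int)(UINT_PTR)(stub + 5) + *(unsigned int *)(stub + 1);
        printf("NtCreateFile is hooked, jmp -> %08x\n", target);
    } else if (stub[0] == 0xB8) {
        /* mov eax, imm32: looks like an unmodified stub; the imm32 is the service number. */
        printf("NtCreateFile looks clean, service number %08x\n", *(unsigned int *)(stub + 1));
    } else {
        printf("NtCreateFile starts with unexpected byte %02x\n", stub[0]);
    }
    return 0;
}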

 

The jmp that actually occurred pointed the thread’s instruction pointer (EIP) into that region of memory, the one that could not be accessed from usermode, before the thread ever transitioned to kernel mode, and that is why the Information Store was crashing. So it appears this rootkit has a bug…

 

TOO FUNNY!!!!