• Ntdebugging Blog

    The Mystery of Lsass.exe Memory Consumption, (When all components get involved)

    • 4 Comments

    Hi All, this is Karim Elsaid and I’m a Support Escalation Engineer working with the Dubai platforms support team.  Recently I was working on a very challenging and interesting case, and I wanted to share that experience with you.

     

    One of our customers was experiencing a problem on all his Domain Controllers (Running x86 Windows Server 2003), where intermittently the DCs ran out of resources.  When one of the servers ran out of resources it had to be restarted to restore its functionality.  The following event was logged on the DC when the problem occurred:

     

    Event Type: Error
    Event Source: NTDS General
    Event Category: Internal Processing
    Event ID: 1169
    Date:
    Time:
    User: N/A
    Computer:
    Description:
    Active Directory could not allocate the needed amount of memory.

    Memory (bytes):
    4568

    Active Directory will continue to operate, but may not function correctly.

    User Action
    Restart this domain controller. If this condition continues, increase the available
    physical or virtual memory.

    Additional Data
    Internal ID:
    30205bd
    For more information, see Help and Support Center at
    http://go.microsoft.com/fwlink/events.asp.

     

    This event gets logged when the Local Security Authority Subsystem (lsass.exe) process is unable to allocate memory from the heap.

     

    When this problem occurs, users are not able to authenticate and Exchange servers cease functioning until the DC gets rebooted.

     

    By the time we got the call, the DC that was exhibiting the issue had been rebooted, hence when we checked the lsass.exe memory usage it was very low, but when we checked another DC on the same site, the lsass.exe memory usage was around 1.7GB.  Well, that’s pretty high but it is worth mentioning that by default lsass.exe may utilize a large amount of virtual memory for the ESENT Cache (Active Directory database), nevertheless this should not cause any issues on DCs as it should back-off when needed.

     

    The issue here is that the memory usage for lsass.exe kept growing and never got released back again - this is called a process memory leak.

     

    To confirm our theory we obtained a Perfmon Log from the DC and checked how Lsass.exe is doing with memory, which proved the theory that lsass.exe is actually leaking memory.

    Performance Monitor

     

    We can see from the graph above a steady increase in lsass.exe private bytes.  The DC was booted without /3GB and hence the limit of the user mode process address space is 2GB.

     

    For more information on this topic, please check the previous post “The Memory Shell Game

     

    Now, we need to explore the options on how to track and address such leaks. We could obtain an lsass.exe process dump and check the heap, or obtain a Full memory dump of the server while in the failed state, to check the lsass.exe heap usage and overall system state, but the thing is none of the above may immediately reveal what’s going on.  We would need to be very lucky to get the dump while the actual memory call that happens inside lsass.exe process is executing, which is very slim.

     

    What we are looking for is a “pattern”, the pattern which is causing the actual leak inside the lsass.exe process.

     

    For that we will turn to a tool that I see many of the customers are not aware of, but in fact it is very handy and easy to troubleshoot such a memory leak problem. We will use the “User Mode Dump Heap” or UMDH. By using this tool it will do the job for us, it will monitor the all heap calls that happen in the lsass.exe process.  All we need to do is to download the “Debugging Tools For Windows”, enable user mode stack acquisition from the command line, download the symbols from Microsoft public symbol servers, reboot the server and we are good to go.

     

    You can find information on UMDH; how to set it up and use it “Here

     

    So now, all heap blocks that the lsass.exe process allocates will get monitored by UMDH, and the tool will tell us information about that specific call stack.  For example, the number of allocations using this stack, number of bytes consumed, etc…

     

    Back to our main problem: We have setup the DC for UMDH and started monitoring the lsass.exe process.  We verified UMDH is working after the reboot by generating a sample log -  it is very critical to generate this simple log to make sure that you are able to resolve the symbols correctly.

     

    We kept generating UMDH logs daily and then the interesting pattern started to appear, which shows:

     

    + 123097760 ( 123098480 -    720) 1538731 allocs      BackTrace4BA0

    + 1538722 ( 1538731 -      9) BackTrace4BA0     allocations

     

          ntdll!RtlAllocateHeapSlowly+00000041

          ntdll!RtlAllocateHeap+00000E9F

          msvcrt!malloc+0000006C

          NTDSA!ABBind_local+0000031C

          NTDSA!NspiBind+0000013D

          RPCRT4!Invoke+00000030

          RPCRT4!NdrStubCall2+00000299

          RPCRT4!NdrServerCall2+00000019

          RPCRT4!DispatchToStubInCNoAvrf+00000038

          RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0000011F

          RPCRT4!RPC_INTERFACE::DispatchToStub+000000A3

          RPCRT4!OSF_SCALL::DispatchHelper+00000149

          RPCRT4!OSF_SCALL::DispatchRPCCall+0000010D

          RPCRT4!OSF_SCALL::ProcessReceivedPDU+0000057F

          RPCRT4!OSF_SCALL::BeginRpcCall+00000194

          RPCRT4!OSF_SCONNECTION::ProcessReceiveComplete+00000435

          RPCRT4!ProcessConnectionServerReceivedEvent+00000021

          RPCRT4!LOADABLE_TRANSPORT::ProcessIOEvents+000001B8

          RPCRT4!ProcessIOEventsWrapper+0000000D

          RPCRT4!BaseCachedThreadRoutine+0000009D

          RPCRT4!ThreadStartRoutine+0000001B

          kernel32!BaseThreadStart+00000034

     

    From the backtrace 4BA0 we can see that there are ~1.5 heap allocation calls that have occurred without being freed from the first day we started to monitor lsass using UMDH.

     

    There is also another call stack with exactly the same symptoms:

     

    + 51525199 ( 51525506 -    307) 1538802 allocs  BackTrace4B92

    + 1538793 ( 1538802 -      9) BackTrace4B92     allocations

     

          ntdll!RtlAllocateHeapSlowly+00000041

          ntdll!RtlAllocateHeap+00000E9F

          LSASRV!LsapAllocatePrivateHeapNoZero+0000004F

          RPCRT4!operator new+0000000D

          RPCRT4!W2AAttachHelper+00000032

          RPCRT4!CNlDelUnicodeAnsi::Convert+00000014

          NTDSA!ABBind_local+0000005D

          NTDSA!NspiBind+0000013D

          RPCRT4!Invoke+00000030

          RPCRT4!NdrStubCall2+00000299

          RPCRT4!NdrServerCall2+00000019

          RPCRT4!DispatchToStubInCNoAvrf+00000038

          RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0000011F

          RPCRT4!RPC_INTERFACE::DispatchToStub+000000A3

          RPCRT4!OSF_SCALL::DispatchHelper+00000149

          RPCRT4!OSF_SCALL::DispatchRPCCall+0000010D

          RPCRT4!OSF_SCALL::ProcessReceivedPDU+0000057F

          RPCRT4!OSF_SCALL::BeginRpcCall+00000194

          RPCRT4!OSF_SCONNECTION::ProcessReceiveComplete+00000435

          RPCRT4!ProcessConnectionServerReceivedEvent+00000021

          RPCRT4!LOADABLE_TRANSPORT::ProcessIOEvents+000001B8

          RPCRT4!ProcessIOEventsWrapper+0000000D

          RPCRT4!BaseCachedThreadRoutine+0000009D

          RPCRT4!ThreadStartRoutine+0000001B

          kernel32!BaseThreadStart+00000034

     

    From both call stacks above we can see that the originating function that leads to the heap allocation is “NTDSA!NspiBind

     

    NSPI is the interface which the Exchange Address Book provider uses to talk to the Global Catalogs. Typically when Outlook clients want to talk to the directory for address book, they talk to the Exchange server directly and then the Exchange server will look up the appropriate GC, forward the request to it and finally send the result back to the originating clients.  This method is called the DSproxy method.  Outlook clients can also send referral requests (RFR) to the Exchange server to get a referral to a GC and then start talking to this GC directly.  You can find more information about this operation “Here

     

     

    With this new information in hand, we turned our attention to the Exchange server for a while. We did so by obtaining a netmon trace from the Exchange server, and started examining it to reveal a massive number of RFR requests originating from outlook client machines toward the Exchange server:

    3795  11:50:59.1426170  10.3876170        GC_SERVER   EXCHSERVER NSPI  NSPI:NspiBind Response, ReturnValue=0x0     {MSRPC:535, TCP:488, IPv4:118}

    3796  11:50:59.1426170  10.3876170        EXCHSERVER 10.208.164.176    NSPI      NSPI:NspiBind Response, ReturnValue=0x0   {MSRPC:502, TCP:496, IPv4:326}

    3797  11:50:59.1426170  10.3876170        EXCHSERVER 10.208.164.176    NSPI      NSPI:NspiBind Response, ReturnValue=0x0   {MSRPC:509, TCP:508, IPv4:326}

    3798  11:50:59.1426170  10.3876170        EXCHSERVER 10.208.164.176    NSPI      NSPI:NspiBind Response, ReturnValue=0x0   {MSRPC:494, TCP:484, IPv4:326}

    9807  11:51:13.1894920  24.4344920        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1144, TCP:1142, IPv4:326}

    10084 11:51:13.3769920  24.6219920        EXCHSERVER GC_SERVER   NSPI  NSPI:NspiBind Request, Flags=0x0      {MSRPC:1145, TCP:1143, IPv4:118}

    10085 11:51:13.3769920  24.6219920        GC_SERVER   EXCHSERVER NSPI  NSPI:NspiBind Response, ReturnValue=0x0     {MSRPC:1145, TCP:1143, IPv4:118}

    10086 11:51:13.3926170  24.6376170        EXCHSERVER 10.208.164.176    NSPI      NSPI:NspiBind Response, ReturnValue=0x0   {MSRPC:1144, TCP:1142, IPv4:326}

    10103 11:51:13.3926170  24.6376170        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1182, TCP:1180, IPv4:326}

    10108 11:51:13.3926170  24.6376170        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1151, TCP:1149, IPv4:326}

    10111 11:51:13.3926170  24.6376170        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1200, TCP:1196, IPv4:326}

    10115 11:51:13.3926170  24.6376170        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1164, TCP:1163, IPv4:326}

    10148 11:51:13.4863670  24.7313670        10.34.51.69 EXCHSERVER NSPI  NSPI:NspiBind Request, Flags=0x0      {MSRPC:447, TCP:446, IPv4:367}

    10149 11:51:13.4863670  24.7313670        EXCHSERVER GC_SERVER   NSPI  NSPI:NspiBind Request, Flags=0x0      {MSRPC:473, TCP:448, IPv4:69}

    10172 11:51:13.4863670  24.7313670        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1199, TCP:1197, IPv4:326}

    10174 11:51:13.4863670  24.7313670        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1187, TCP:1186, IPv4:326}

    10180 11:51:13.4863670  24.7313670        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1170, TCP:1168, IPv4:326}

    10183 11:51:13.4863670  24.7313670        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1153, TCP:1152, IPv4:326}

    10185 11:51:13.4863670  24.7313670        GC_SERVER   EXCHSERVER NSPI  NSPI:NspiBind Response, ReturnValue=0x0     {MSRPC:473, TCP:448, IPv4:69}

    10186 11:51:13.4863670  24.7313670        EXCHSERVER 10.34.51.69 NSPI  NSPI:NspiBind Response, ReturnValue=0x0     {MSRPC:447, TCP:446, IPv4:367}

    10224 11:51:13.5176170  24.7626170        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1209, TCP:1208, IPv4:326}

    10227 11:51:13.5176170  24.7626170        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1192, TCP:1190, IPv4:326}

    10232 11:51:13.5176170  24.7626170        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1159, TCP:1157, IPv4:326}

    10235 11:51:13.5176170  24.7626170        10.208.164.176    EXCHSERVER NSPI      NSPI:NspiBind Request, Flags=0x0    {MSRPC:1175, TCP:1173, IPv4:326}

    10240 11:51:13.5176170  24.7626170        10.34.51.69 EXCHSERVER NSPI  NSPI:NspiBind Request, Flags=0x0      {MSRPC:447, TCP:446, IPv4:367}

    10241 11:51:13.5176170  24.7626170        EXCHSERVER GC_SERVER   NSPI  NSPI:NspiBind Request, Flags=0x0      {MSRPC:473, TCP:448, IPv4:69}

    10251 11:51:13.5332420  24.7782420        GC_SERVER   EXCHSERVER NSPI  NSPI:NspiBind Response, ReturnValue=0x0     {MSRPC:473, TCP:448, IPv4:69}

    10255 11:51:13.5332420  24.7782420        EXCHSERVER 10.34.51.69 NSPI  NSPI:NspiBind Response, ReturnValue=0x0     {MSRPC:447, TCP:446, IPv4:367}

    Snipp…..

     

    In the netmon capture we saw an excessive number of NspiBind requests been sent from the Outlook client machines to the Exchange server and Exchange server in turn sending these NspiBind requests to the GC, in the netmon capture we didn’t find any frames for NspiUnBind requests, which should be used to destroy that connection.

     

    When we checked the Exchange server security logs we found a massive number of logon events that are occurring for every single client:

    Event Type: Success Audit

    Event Source:     Security

    Event Category:   (2)

    Event ID:   540

    Date:       5/26/2010

    Time:       3:49:09 PM

    User:      

    Computer:   EXCHSERVER

    Description:

    Successful Network Logon:

          User Name:  User_Name

          Domain:           CONTOSO.COM

          Logon ID:         (0x0,0x3DA1F42)

          Logon Type: 3

          Logon Process:    Kerberos

          Authentication Package: Kerberos

          Workstation Name:

          Logon GUID: {b03f9ad2-3150-e1be-f50a-b3abb9d2c09d}

          Caller User Name: -

          Caller Domain:    -

          Caller Logon ID:  -

          Caller Process ID: -

          Transited Services: -

          Source Network Address: 10.11.12.13

          Source Port:      0

     

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

     

    So, now we know the problem of lsass.exe memory consumption on the DCs is actually occurring because of a massive number of NspiBind requests that are being sent to the DCs without proper NspiUnbind requests.

     

    So we turned our attention to the client machines that have Outlook installed and started to list all 3rd party add-ins that are installed on them.

     

    We found that all affected client machines are actually sharing 3rd party add-ins for email content archival. Further investigation on this software revealed a strange MAPI Wrapper DLL library, and our customer confirmed that the start time of the problem approximately matches the time they have rolled out the content archival software on the client machines.

     

    We uninstalled this content archival add-in from a couple of machines and these machines stopped hammering the Exchange Server and GC with NspiBind requests.

     

    Later we uninstalled it from the rest of the client machines and the lsass.exe process on the DCs has finally stabilized and we no longer have any issues with its memory usage.

     

    I’d like to mention that starting with Windows Server 2008 we have implemented a “NSPI Max Sessions Per User” where you can limit the number of NSPI requests from a client to the server.

     

    You can read more information about this new functionality “Here

     

    That’s it! I hope you enjoyed reading and hopefully this should help you to troubleshoot any memory leaks you may encounter.  The information contained here can be applied to other processes when you suspect heap leaks.

     

    See you again on another adventure.

     

    Best Regards,

    Karim Elsaid

  • Ntdebugging Blog

    Control Panel Policy and the Missing Icon

    • 1 Comments

    A customer recently reported a rather peculiar problem.  They were working on a roll out of Windows 7, and one of the policies they employed on their domain was the “Show only specified control panel applets” setting.  As its name implies, this policy allows an administrator to specify the names of any Control Panel icons they want their users to see, and all others are not shown.  This company chose to allow a few of the Control Panel items included with Windows as well as the Mail icon that is added when Microsoft Office (specifically Outlook) is installed, which allows users to configure their Outlook profiles.

     

    The problem the customer noticed was that, with the policy was in place, the first time a user opened the Control Panel on a computer, all of the allowed icons appeared except for Mail.  If the user closed Control Panel and reopened it, the icon suddenly appeared.  In fact, from that point on, the Mail icon would appear for that user on that computer.  However, if that user logged on to a different computer, or a different user logged in to the first user’s computer, the icon was missing until Control Panel was opened for a second time.

     

    Figure 1: Policy setting on our test system

    Figure 1: Policy setting on our test system

     

    Figure 2: Result from first open

    Figure 2: Result from first open

     

    Figure 3: Result on subsequent opens (desired)

    Figure 3: Result on subsequent opens (desired)

     

    Control Panel isn’t something we get many cases on, so I had to figure out how it worked, and what would cause it to fail only the first time.  One of the first things I discovered as I debugged the issue was that the Control Panel is actually loaded in two passes – a “fast” pass and a “slow” pass.  This is done so icons we know will be loaded quickly can be displayed first so a user doesn’t get stuck with an empty window waiting for some of the slower icons to load.  What makes an icon fast or slow?  It turns out shell maintains a cache of Control Panel items, so it can display them quickly without having to query the actual item for its name, description, icon, etc.  If we have to ask the item for its information, that will take longer, so it is loaded during the second, slow phase:

     

    shell32!CControlPanelDataWorkItem::_LoadSlowData

    shell32!CControlPanelDataWorkItem::DoWork

    shell32!CFrameTask::InternalResumeRT

    shell32!CRunnableTask::Run

    shell32!CShellTask::TT_Run

    shell32!CShellTaskThread::ThreadProc

    shell32!CShellTaskThread::s_ThreadProc

    shlwapi!ExecuteWorkItemThreadProc

    ntdll!RtlpTpWorkCallback

    ntdll!TppWorkerThread

    kernel32!BaseThreadInitThunk

    ntdll!__RtlUserThreadStart

    ntdll!_RtlUserThreadStart

     

    Luckily for these icons, they won’t stay relegated to slow loading status forever.  During the slow loading phase, one of the things we do is add the item’s information to the Control Panel cache, so we can load it during the fast phase in the future.

     

    shell32!CPLD_AddControlToReg

    shell32!CControlPanelEnum::_NextNonCachedCpl

    shell32!CControlPanelEnum::Next

    shell32!CRegFolderEnum::Next

    shell32!CControlPanelAppletList::_AddAppletsToCategories

    shell32!CControlPanelAppletList::LoadSlowApplets

    shell32!CControlPanelDataWorkItem::_LoadSlowData

    shell32!CControlPanelDataWorkItem::DoWork

    shell32!CFrameTask::InternalResumeRT

    shell32!CRunnableTask::Run

    shell32!CShellTask::TT_Run

    shell32!CShellTaskThread::ThreadProc

    shell32!CShellTaskThread::s_ThreadProc

    shlwapi!ExecuteWorkItemThreadProc

    ntdll!RtlpTpWorkCallback

    ntdll!TppWorkerThread

    kernel32!BaseThreadInitThunk

    ntdll!__RtlUserThreadStart

    ntdll!_RtlUserThreadStart

     

    This explains why the item was being loaded on subsequent opens of Control Panel – the item was added to the user’s cache on the first run, so the next time, it was loaded during the fast pass:

    shell32!CControlPanelEnum::_NextCachedCpl

    shell32!CControlPanelEnum::Next

    shell32!CRegFolderEnum::Next
    shell32!CControlPanelAppletList::_AddAppletsToCategories

    shell32!CControlPanelAppletList::EnsureLoaded

    shell32!CControlPanelDataWorkItem::_LoadFastData

    shell32!CControlPanelDataWorkItem::DoWork

    shell32!CFrameTask::InternalResumeRT

    shell32!CRunnableTask::Run

    shell32!CShellTask::TT_Run

    shell32!CShellTaskThread::ThreadProc

    shell32!CShellTaskThread::s_ThreadProc

    shlwapi!ExecuteWorkItemThreadProc

    ntdll!RtlpTpWorkCallback

    ntdll!TppWorkerThread

    kernel32!BaseThreadInitThunk

    ntdll!__RtlUserThreadStart

    ntdll!_RtlUserThreadStart

     

    But that only answers half the question – while we now know why the icon appears every other time, we still don’t know why it doesn’t show up the first time.  To figure this out, I debugged a system with the “Show only specified control panel applets” policy applied and one without the policy set, both as they opened Control Panel for the first time, and compared where the return codes from the functions diverged. 

     

    Starting with CControlPanelEnum::_NextNonCachedCpl, in the non-working case, we can see that it returns 0:

    0:003> bp shell32!CControlPanelEnum::_NextNonCachedCpl

    0:003> g

    Breakpoint 0 hit

    eax=14c5fa8c ebx=14c5fa7c ecx=11224630 edx=00000000 esi=11224630 edi=14c5fa8c

    eip=76977e11 esp=14c5f9b4 ebp=14c5f9c8 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    shell32!CControlPanelEnum::_NextNonCachedCpl:

    76977e11 8bff            mov     edi,edi

    0:020> k

    ChildEBP RetAddr 

    14c5f9b0 76977dff shell32!CControlPanelEnum::_NextNonCachedCpl

    14c5f9c8 768a854d shell32!CControlPanelEnum::Next+0x5a

    14c5fa18 769710d2 shell32!CRegFolderEnum::Next+0x22e

    14c5fa94 769783e7 shell32!CControlPanelAppletList::_AddAppletsToCategories+0x35

    14c5fac4 76978353 shell32!CControlPanelAppletList::LoadSlowApplets+0x68

    14c5faec 76974db4 shell32!CControlPanelDataWorkItem::_LoadSlowData+0x2e

    14c5fb00 769e06ad shell32!CControlPanelDataWorkItem::DoWork+0x4b

    14c5fb18 768d83a6 shell32!CFrameTask::InternalResumeRT+0x14

    14c5fb38 7691639b shell32!CRunnableTask::Run+0xce

    14c5fb54 76918c1f shell32!CShellTask::TT_Run+0x167

    14c5fb9c 76918d53 shell32!CShellTaskThread::ThreadProc+0xa3

    14c5fba4 760cb2b1 shell32!CShellTaskThread::s_ThreadProc+0x1b

    14c5fbb4 77c8d877 shlwapi!ExecuteWorkItemThreadProc+0xe

    14c5fc28 77c90842 ntdll!RtlpTpWorkCallback+0x11d

    14c5fd80 00000000 ntdll!TppWorkerThread+0x572

    0:020> gu  <-- This steps out to the instruction after the call to CControlPanelEnum::_NextNonCachedCpl in CControlPanelEnum::Next.

    ModLoad: 35c70000 35c8b000   c:\progra~1\common~1\system\msmapi\1033\mlcfg32.cpl

    eax=00000000 ebx=14c5fa7c ecx=bea347f1 edx=006d2904 esi=11224630 edi=14c5fa8c

    eip=76977dff esp=14c5f9bc ebp=14c5f9c8 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    shell32!CControlPanelEnum::Next+0x5a:

    76977dff 84c0            test    al,al

     

    But on the machine without the policy set, it returns 1:

    eax=0c83fc34 ebx=0c83fc24 ecx=04ab1060 edx=00000000 esi=04ab1060 edi=0c83fc34

    eip=76977e11 esp=0c83fb5c ebp=0c83fb70 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    shell32!CControlPanelEnum::_NextNonCachedCpl:

    76977e11 8bff            mov     edi,edi

    0:012> k

    ChildEBP RetAddr 

    0c83fb58 76977dff shell32!CControlPanelEnum::_NextNonCachedCpl

    0c83fb70 768a854d shell32!CControlPanelEnum::Next+0x5a

    0c83fbc0 769710d2 shell32!CRegFolderEnum::Next+0x22e

    0c83fc3c 769783e7 shell32!CControlPanelAppletList::_AddAppletsToCategories+0x35

    0c83fc6c 76978353 shell32!CControlPanelAppletList::LoadSlowApplets+0x68

    0c83fc94 76974db4 shell32!CControlPanelDataWorkItem::_LoadSlowData+0x2e

    0c83fca8 769e06ad shell32!CControlPanelDataWorkItem::DoWork+0x4b

    0c83fcc0 768d83a6 shell32!CFrameTask::InternalResumeRT+0x14

    0c83fce0 7691639b shell32!CRunnableTask::Run+0xce

    0c83fcfc 76918c1f shell32!CShellTask::TT_Run+0x167

    0c83fd44 76918d53 shell32!CShellTaskThread::ThreadProc+0xa3

    0c83fd4c 760cb2b1 shell32!CShellTaskThread::s_ThreadProc+0x1b

    0c83fd5c 77c8d877 shlwapi!ExecuteWorkItemThreadProc+0xe

    0c83fdd0 77c90842 ntdll!RtlpTpWorkCallback+0x11d

    0c83ff30 778b3c45 ntdll!TppWorkerThread+0x572

    0c83ff3c 77cc37f5 kernel32!BaseThreadInitThunk+0xe

    0c83ff7c 77cc37c8 ntdll!__RtlUserThreadStart+0x70

    0c83ff94 00000000 ntdll!_RtlUserThreadStart+0x1b

    0:012> gu

    ModLoad: 35c70000 35c8b000   c:\progra~1\common~1\system\msmapi\1033\mlcfg32.cpl

    eax=00000001 ebx=0c83fc24 ecx=bea347f1 edx=00002bba esi=04ab1060 edi=0c83fc34

    eip=76977dff esp=0c83fb64 ebp=0c83fb70 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    shell32!CControlPanelEnum::Next+0x5a:

    76977dff 84c0            test    al,al

     

    So we know something in this function is causing us to return FALSE in the failing case and TRUE in the working case.  An easy way to see where the return result of function calls is the wt (watch trace) debugger command with the -oR switch to see the return values.  I set the level (-l) to 1 so I’d only see functions directly called from CControlPanelEnum::_NextNonCachedCpl.  Here were the results.

     

    Failing case:

     

    eax=14c5fa8c ebx=14c5fa7c ecx=11224630 edx=00000000 esi=11224630 edi=14c5fa8c

    eip=76977e11 esp=14c5f9b4 ebp=14c5f9c8 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    shell32!CControlPanelEnum::_NextNonCachedCpl:

    76977e11 8bff            mov     edi,edi

    0:020> wt -oR -l 1

    Tracing shell32!CControlPanelEnum::_NextNonCachedCpl to return address 76977dff

       31     0 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       10     0 [  1]   shell32!DSA_GetItemPtr

       17     0 [  1]   comctl32!DSA_GetItemPtr eax = 1123aa78

       48    27 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

    ModLoad: 35c70000 35c8b000   c:\progra~1\common~1\system\msmapi\1033\mlcfg32.cpl

       34     0 [  1]   shell32!CPLD_InitModule eax = 1

       55    61 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       15     0 [  1]   shell32!CDPA_Base<RESULT_FOLDER_ITEM,CTContainer_PolicyUnOwned<RESULT_FOLDER_ITEM> >::AppendPtr eax = 0

       59    76 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       14     0 [  1]   shell32!CPL_AddModuleReference eax = 0

       61    90 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       21     0 [  1]   shell32!CPL_ReleaseModuleReference eax = 0

       76   111 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

      128     0 [  1]   shell32!CPLD_AddControlToReg eax = 1

       78   239 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       12     0 [  1]   shell32!FindCPLModuleInList eax = c666998

       84   251 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       10     0 [  1]   shell32!DSA_GetItemPtr

       17     0 [  1]   comctl32!DSA_GetItemPtr eax = c5e7410

       91   278 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       90     0 [  1]   shell32!CControlPanelEnum::_CanEnumerateApplet eax = 0

       97   368 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       21     0 [  1]   shell32!CPL_ReleaseModuleReference eax = 0

      108   389 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

      143     0 [  1]   shell32!CPLD_FlushRegModules eax = 0

      116   532 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

        3     0 [  1]   shell32!__security_check_cookie eax = 0

      118   535 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

     

    Working case:

     

    eax=0c83fc34 ebx=0c83fc24 ecx=04ab1060 edx=00000000 esi=04ab1060 edi=0c83fc34

    eip=76977e11 esp=0c83fb5c ebp=0c83fb70 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    shell32!CControlPanelEnum::_NextNonCachedCpl:

    76977e11 8bff            mov     edi,edi

    0:012> wt -oR -l 1

    Tracing shell32!CControlPanelEnum::_NextNonCachedCpl to return address 76977dff

       31     0 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       10     0 [  1]   shell32!DSA_GetItemPtr

       17     0 [  1]   comctl32!DSA_GetItemPtr eax = 4a732c8

       48    27 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

    ModLoad: 35c70000 35c8b000   c:\progra~1\common~1\system\msmapi\1033\mlcfg32.cpl

       34     0 [  1]   shell32!CPLD_InitModule eax = 1

       55    61 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       15     0 [  1]   shell32!CDPA_Base<RESULT_FOLDER_ITEM,CTContainer_PolicyUnOwned<RESULT_FOLDER_ITEM> >::AppendPtr eax = 0

       59    76 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       14     0 [  1]   shell32!CPL_AddModuleReference eax = 0

       61    90 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       21     0 [  1]   shell32!CPL_ReleaseModuleReference eax = 0

       76   111 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

      128     0 [  1]   shell32!CPLD_AddControlToReg eax = 1

       78   239 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       12     0 [  1]   shell32!FindCPLModuleInList eax = 2ff3968

       84   251 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       10     0 [  1]   shell32!DSA_GetItemPtr

       17     0 [  1]   comctl32!DSA_GetItemPtr eax = 4aa81a8

       91   278 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

      111     0 [  1]   shell32!CControlPanelEnum::_CanEnumerateApplet eax = 1

      104   389 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

       21     0 [  1]   shell32!CPL_ReleaseModuleReference eax = 0

      112   410 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

        3     0 [  1]   shell32!__security_check_cookie eax = 1

      114   413 [  0] shell32!CControlPanelEnum::_NextNonCachedCpl

     

    Here, we can see that the function CControlPanelEnum::_CanEnumerateApplet returns true in the working case, but false in the failing case.  This is what is causing the return value from _NextNonCachedCpl to differ, which is ultimately what causes the icon to load/not load.  So let’s take a look at _CanEnumerateApplet.  Using wt inside this function showed something rather interesting:

     

    Failing:

     

    eax=0c5e7410 ebx=769298fd ecx=11224630 edx=00000000 esi=11224630 edi=0c666998

    eip=7693a72f esp=14c5f980 ebp=14c5f9b0 iopl=0         nv up ei pl nz na po nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202

    shell32!CControlPanelEnum::_CanEnumerateApplet:

    7693a72f 8bff            mov     edi,edi

    0:020> wt -oR -l 1

    Tracing shell32!CControlPanelEnum::_CanEnumerateApplet to return address 7693a68e

       27     0 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

      737     0 [  1]   shlwapi!PathFindFileNameW eax = c6669f8

       40   737 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

       59     0 [  1]   shell32!IDControlCreate eax = 0

       44   796 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

       43     0 [  1]   shlwapi!SHWindowsPolicy eax = 1

       53   839 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

        2     0 [  1]   shell32!CRegFolder::QueryInterface

        5     0 [  1]   shell32!CKnownFoldersFolder::QueryInterface

       12     0 [  1]   shell32!CAggregatedUnknown::QueryInterface eax = 0

       71   858 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

       48     0 [  1]   shell32!CRegFolder::GetDetailsEx eax = ffffffff`80070002

       77   906 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

       24     0 [  1]   shell32!CRegFolder::Release eax = 3

       79   930 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

        6     0 [  1]   shell32!ATL::CComVariant::Clear eax = 0

       88   936 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

        3     0 [  1]   shell32!__security_check_cookie eax = 0

       90   939 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

     

    Working:

     

    0:012> wt -oR -l 1

    Tracing shell32!CControlPanelEnum::_CanEnumerateApplet to return address 7693a68e

       27     0 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

      737     0 [  1]   shlwapi!PathFindFileNameW eax = 2ff39c8

       40   737 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

       59     0 [  1]   shell32!IDControlCreate eax = 0

       44   796 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

       43     0 [  1]   shlwapi!SHWindowsPolicy eax = 1

       53   839 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

        2     0 [  1]   shell32!CRegFolder::QueryInterface

        5     0 [  1]   shell32!CKnownFoldersFolder::QueryInterface

       12     0 [  1]   shell32!CAggregatedUnknown::QueryInterface eax = 0

       71   858 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

       48     0 [  1]   shell32!CRegFolder::GetDetailsEx eax = 0

       77   906 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

       28     0 [  1]   shell32!CPL_DoesPolicyAllow eax = 1

       82   934 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

       24     0 [  1]   shell32!CRegFolder::Release eax = 3

       84   958 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

        6     0 [  1]   shell32!ATL::CComVariant::Clear eax = 0

       98   964 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

      358     0 [  1]   shell32!CControlPanelFolder::GetCustomAttributes eax = 0

      109  1322 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

        3     0 [  1]   shell32!__security_check_cookie eax = 1

      111  1325 [  0] shell32!CControlPanelEnum::_CanEnumerateApplet

     

    Looking at the two, we can see that they both call GetDetailsEx, but in the failing case we seem to get an error result: 80070002, and in the working case we get 0, and then proceed to call CPL_DoesPolicyAllow.  CPL_DoesPolicyAllow is not called on the machine with the policy applied.  A quick code review of CPL_DoesPolicyAllow showed that this function checks if a policy prevents or allows an item from being displayed.  So if we can just get to that function, the icon will load.  Now we need to know why GetDetailsEx fails, preventing us from calling CPL_DoesPolicyAllow.   If we look up the error code, it is pretty generic:

     

    0:020> !error 80070002

    Error code: (HRESULT) 0x80070002 (2147942402) - The system cannot find the file specified.

     

    Next I stepped through and into GetDetailsEx.  I’ll save you the tedious steps and output, since the procedure is similar to what I’ve already shown above.  Anyway, I was able to trace the error 2 as being returned by a call to the registry a few functions deep into GetDetailsEx:

     

    kernel32!RegOpenKeyExW

    shell32!CControlPanelFolder::_GetExtPropRegKey

    shell32!CControlPanelFolder::_InitExtPropRegValNameCache

    shell32!CControlPanelFolder::_PropertyFromPidl

    shell32!CControlPanelFolder::GetDetailsEx

    shell32!CRegFolder::GetDetailsEx

    shell32!CControlPanelEnum::_CanEnumerateApplet

    shell32!CControlPanelEnum::_NextNonCachedCpl

     

    This is good news, because it means we can find out what we’re looking for that we aren’t finding.  First I stepped out to get back to CControlPanelFolder::_InitExtPropRegValNameCache.  Sure enough, eax was 2, which is our ‘cannot find the file’ error.

     

    0:020> gu

    eax=00000002 ebx=00000000 ecx=778ac2da edx=00000002 esi=14c5eff0 edi=00000000

    eip=76975ead esp=14c5ed40 ebp=14c5efc4 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    shell32!CControlPanelFolder::_GetExtPropRegKey+0xb1:

    76975ead 50              push    eax

    0:020> !error 2

    Error code: (Win32) 0x2 (2) - The system cannot find the file specified

     

    Since we know that RegOpenKeyEx takes the subkey it is looking for as the second parameter, let’s look back at the assembly before the call to see what we passed in:

     

    0:020> ub @eip

    shell32!CControlPanelFolder::_GetExtPropRegKey+0x98:

    76975e94 7c1f            jl      shell32!CControlPanelFolder::_GetExtPropRegKey+0xb9 (76975eb5)

    76975e96 56              push    esi

    76975e97 6a01            push    1

    76975e99 53              push    ebx

    76975e9a 8d8590fdffff    lea     eax,[ebp-270h]

    76975ea0 50              push    eax

    76975ea1 ffb588fdffff    push    dword ptr [ebp-278h]

    76975ea7 ff15a0128776    call    dword ptr [shell32!_imp__RegOpenKeyExW (768712a0)]

     

    Remember that function parameters are passed in in reverse order, so the second parameter is the second to last value passed to the function.  In this case, we can see that it was stored in eax.  Of course, eax has been overwritten by our return value of 2, but we can see that just before pushing eax, we get the value from ebp-0x270.  Dumping that out as a Unicode string, we get the key we were looking for:

     

    0:020> du @ebp-270h

    14c5ed54  "Software\Microsoft\Windows\Curre"

    14c5ed94  "ntVersion\Control Panel\Extended"

    14c5edd4  " Properties\System.ApplicationNa"

    14c5ee14  "me"

     

    The first parameter is the root key, which was:

     

    0:020> dc @ebp-278h L1

    14c5ed4c  80000002 

     

    We can look up this value in the Windows SDK header files and see that it is for HKLM:

    #define HKEY_LOCAL_MACHINE                  (( HKEY ) (ULONG_PTR)((LONG)0x80000002) )

     

    Aha!  So now we know we were trying to access “HKLM\ Software\Microsoft\Windows\CurrentVersion\Control Panel\Extended

     Properties\System.ApplicationName” which didn’t exist, so we were unable to proceed with loading the icon. 

     

    What’s that key for, you ask?  MSDN tells us that it is used to store the Canonical Name of the Control Panel item.  Canonical Names were added to the Control Panel in Windows Vista as a better way to organize and name the items in the Control Panel.  By convention, the canonical name of “Mail” is “Microsoft.Office.Outlook.Profiles.”  Because the “Show only specified control panel applets” policy can now accept either the legacy name (“Mail”) or the canonical name, we are checking the registry for this string.  Since the key is not present, GetDetailsEx cannot determine the canonical name, and the icon does not load.

     

    As a test, I created the key HKLM\ Software\Microsoft\Windows\CurrentVersion\Control Panel\Extended

     Properties\System.ApplicationName and added a REG_SZ to it with a name of the file location of the Mail CPL file ("C:\Program Files\Common Files\SYSTEM\MSMAPI\1033\MLCFG32.CPL") and a value of "Microsoft.Office.Outlook.Profiles".  Sure enough, the icon loads up on the first try for new users.

     

    While new Control Panel items should implement a canonical name, that doesn’t work for existing CPLs, like Mail from Outlook 2003.  Since the canonical name isn’t actually required (CPL_DoesPolicyAllow works with both name formats), we’re following up with the development team to allow CPL_DoesPolicyAllow to be called even if a canonical name is not found.

     

    -Matt Burrough

Page 1 of 1 (2 items)