Some of this post overlaps my previous CPU one, but the other one seemed a bit lopsided towards LSASS so I thought I would do another one which wasn't..

Below is a perfmon log where the machine is 80-90 cpu utilized.

The first thing to determine is if the CPU usage is in kernel mode or user mode.

 

The white line is process \ % privileged time – the process in this case is winlogon.exe

krnl_perfmon 

 

The  medium jaggies are the 4 processors averaging about 40% each.

 

 

 

 

 

Processor

 

% Privileged Time is the percentage of elapsed time that the process threads spent executing code in privileged mode.  When a Windows system service in called, the service will often run in privileged mode to gain access to system-private data. Such data is protected from access by threads executing in user mode. Calls to the system can be explicit or implicit, such as page faults or interrupts. Unlike some early operating systems, Windows uses process boundaries for subsystem protection in addition to the traditional protection of user and privileged modes. Some work done by Windows on behalf of the application might appear in other subsystem processes in addition to the privileged time in the process.

 

% User Time is the percentage of elapsed time the processor spends in the user mode. User mode is a restricted processing mode designed for applications, environment subsystems, and integral subsystems.  The alternative, privileged mode, is designed for operating system components and allows direct access to hardware and all memory.  The operating system switches application threads to privileged mode to access operating system services. This counter displays the average busy time as a percentage of the sample time.

 

Process

 

% Privileged Time is the percentage of elapsed time that the process threads spent executing code in privileged mode. When a Windows system service is called, the service will often run in privileged mode to gain access to system-private data. Such data is protected from access by threads executing in user mode. Calls to the system can be explicit or implicit, such as page faults or interrupts. Unlike some early operating systems, Windows uses process boundaries for subsystem protection in addition to the traditional protection of user and privileged modes. Some work done by Windows on behalf of the application might appear in other subsystem processes in addition to the privileged time in the process.

 

 

Once we have determine if it is kernel  versus usermode we can develop a plan of attack.

 

If you own the application, and it reproduces easily for you , you can use qslice ( I find it a lighter and easier to use than perfmon when the machine is pegged )  to find the thread\process info.

 

 

qslice

 

In the picture above, we see csrss.exe using CPU ( lets pretend it was 100% ) and then you can ‘zoom’ in and see that it is spending time in kernel – specifically thread ID 538.

 

If the time is being spent in usermode – break in to the process a few times and see what that thread is up to. This is the easiest method IMO, but not always available.

You can use the !runaway command to get info about who is hogging time.

 

If you can’t use a live debugger, the next best choice  ( IMO ) for CPU issues is Adplus. http://support.microsoft.com/kb/286350

 I would use ADplus instead of userdump.exe because it will also automatically get you the !runaway  results which you can not get from a static dump.  Specifically – I use ADplus in “hang mode “ and get 3-4 dumps to compare against. Is the thread moving at all? What routines are we repeating? Do you see a pattern?

 

 

Hang mode

In this mode, ADPlus immediately produces full memory dumps for all the processes that are specified on the command line after the script has completed. Each .dmp file that is created is put in a folder that contains the date/time stamp when ADPlus was run. Each file name contains the process name, the process ID, and the date/time stamp when ADPlus was run. While the process memory is being dumped to a file, the process is frozen. After the memory dump file has been created, the process is resumed by using a non-invasive attach/detach with the CDB debugger.

 

 

When you get the adplus info, it will be  in the  *.log file that adplus makes. The output looks like the sample below.

 

0:000> * ------ Thread CPU usage information ------

0:000> !runaway

 User Mode Time

  Thread       Time

   2:324       0 days 0:03:30.921

   0:2f0       0 days 0:00:26.687

  14:6b94      0 days 0:00:25.625

   9:c18       0 days 0:00:02.109

  15:6c6c      0 days 0:00:00.062

  18:615c      0 days 0:00:00.031

   5:494       0 days 0:00:00.015

  21:a758      0 days 0:00:00.000

  20:4108      0 days 0:00:00.000

  19:2004      0 days 0:00:00.000

  17:5814      0 days 0:00:00.000

 

 

 

Here is a recent example- these are from 3 different dumps taken about 10 seconds apart. The time was spent in usermode

 

You can see that the stacks all are the same except for where we are processing the SaferpLevelObjpTableCompare function. In all 3 we are in some state of the same function we can tell this via the various offsets +13, +22 and 0 . This leads us to examine this function and the parameters passed to it in order to see why we are spinning here.

 

 

ChildEBP RetAddr 

0875c618 7c921496 ADVAPI32!SaferpLevelObjpTableCompare

0875c630 7c9215f0 ntdll!FindNodeOrParent+0x1b

0875c648 7c9215d1 ntdll!RtlLookupElementGenericTableFull+0x16

0875c660 77dd926a ntdll!RtlLookupElementGenericTable+0x18

0875c6d4 77ddc648 ADVAPI32!CodeAuthzLevelObjpLookupByLevelId+0x26

0875c8d0 77ddc02b ADVAPI32!CodeAuthzLevelObjpLoadTable+0x291

0875c8f4 77ddcbf0 ADVAPI32!CodeAuthzpImmediateReloadCacheTables+0x94

0875c954 7c819983 ADVAPI32!SaferIdentifyLevel+0x79

0875cb50 7c819054 kernel32!BasepCheckWinSaferRestrictions+0x5d0

0875d594 7c80235e kernel32!CreateProcessInternalW+0xae9

0875d5cc 694575b3 kernel32!CreateProcessW+0x2c

0875df70 694582b1 faultrep!StartDWException+0x556

0875efe4 7c8633b1 faultrep!ReportFault+0x533

0875f684 7c83ab20 kernel32!UnhandledExceptionFilter+0x587

0875f68c 7c839b09 kernel32!BaseThreadStart+0x4d

0875f6b4 7c9037bf kernel32!_except_handler3+0x61

0875f6d8 7c90378b ntdll!ExecuteHandler2+0x26

0875f788 7c90eafa ntdll!ExecuteHandler+0x24

0875f788 7c911f6c ntdll!KiUserExceptionDispatcher+0xe

0875fca8 77e78609 ntdll!RtlAllocateHeap+0x335

 

ChildEBP RetAddr 

0875c618 7c921496 ADVAPI32!SaferpLevelObjpTableCompare+0x22

0875c630 7c9215f0 ntdll!FindNodeOrParent+0x1b

0875c648 7c9215d1 ntdll!RtlLookupElementGenericTableFull+0x16

0875c660 77dd926a ntdll!RtlLookupElementGenericTable+0x18

0875c6d4 77ddc648 ADVAPI32!CodeAuthzLevelObjpLookupByLevelId+0x26

0875c8d0 77ddc02b ADVAPI32!CodeAuthzLevelObjpLoadTable+0x291

0875c8f4 77ddcbf0 ADVAPI32!CodeAuthzpImmediateReloadCacheTables+0x94

0875c954 7c819983 ADVAPI32!SaferIdentifyLevel+0x79

0875cb50 7c819054 kernel32!BasepCheckWinSaferRestrictions+0x5d0

0875d594 7c80235e kernel32!CreateProcessInternalW+0xae9

0875d5cc 694575b3 kernel32!CreateProcessW+0x2c

0875df70 694582b1 faultrep!StartDWException+0x556

0875efe4 7c8633b1 faultrep!ReportFault+0x533

0875f684 7c83ab20 kernel32!UnhandledExceptionFilter+0x587

0875f68c 7c839b09 kernel32!BaseThreadStart+0x4d

0875f6b4 7c9037bf kernel32!_except_handler3+0x61

0875f6d8 7c90378b ntdll!ExecuteHandler2+0x26

0875f788 7c90eafa ntdll!ExecuteHandler+0x24

0875f788 7c911f6c ntdll!KiUserExceptionDispatcher+0xe

0875fca8 77e78609 ntdll!RtlAllocateHeap+0x335

 

 

0:028> KL

ChildEBP RetAddr 

0875c618 7c921496 ADVAPI32!SaferpLevelObjpTableCompare+0x13

0875c630 7c9215f0 ntdll!FindNodeOrParent+0x1b

0875c648 7c9215d1 ntdll!RtlLookupElementGenericTableFull+0x16

0875c660 77dd926a ntdll!RtlLookupElementGenericTable+0x18

0875c6d4 77ddc648 ADVAPI32!CodeAuthzLevelObjpLookupByLevelId+0x26

0875c8d0 77ddc02b ADVAPI32!CodeAuthzLevelObjpLoadTable+0x291

0875c8f4 77ddcbf0 ADVAPI32!CodeAuthzpImmediateReloadCacheTables+0x94

0875c954 7c819983 ADVAPI32!SaferIdentifyLevel+0x79

0875cb50 7c819054 kernel32!BasepCheckWinSaferRestrictions+0x5d0

0875d594 7c80235e kernel32!CreateProcessInternalW+0xae9

0875d5cc 694575b3 kernel32!CreateProcessW+0x2c

0875df70 694582b1 faultrep!StartDWException+0x556

0875efe4 7c8633b1 faultrep!ReportFault+0x533

0875f684 7c83ab20 kernel32!UnhandledExceptionFilter+0x587

0875f68c 7c839b09 kernel32!BaseThreadStart+0x4d

0875f6b4 7c9037bf kernel32!_except_handler3+0x61

0875f6d8 7c90378b ntdll!ExecuteHandler2+0x26

0875f788 7c90eafa ntdll!ExecuteHandler+0x24

0875f788 7c911f6c ntdll!KiUserExceptionDispatcher+0xe

0875fca8 77e78609 ntdll!RtlAllocateHeap+0x335

 

 

 

 

 

 

 

 

 

 

If the time  was being spent in kernel mode – qslice will show it as red – and perfmon will show as “privileged time” then you can use kernrate.exe to get some more info.

 

You can specify the module to zoom in on

 

----- Zoomed module NTKRNLPA.EXE (Bucket size = 16 bytes, Rounding Down) --------

Percentage in the following table is based on the Total Hits for this Zoom Module

 

Time   67741 hits, 25000 events per hit --------

 Module                                                 Hits      msec                %Total  Events/Sec

CmRegisterCallback                               36050     379273            52 %     2376256

KiDeliverApc                                          18989     379273            27 %     1251670

wctomb                                                 2581     379273              3 %      170128

KiCheckForKernelApcDelivery                 1388     379273              2 %       91490

SeQueryInformationToken                        891     379273              1 %       58730

KeReleaseInStackQueuedSpinLockFromDpcLevel        786     379273     1 %       51809

ZwYieldExecution                                  748     379273               1 %       49304

<snip>

 

 

We can see that we are at 52% of this module – the Cm** api’s are registry API’s.. However – this does not tell us why we are spending time here.

 

In order to do that we can use either a crash dump which will give us a single timeslice – or we can use LiveKd.exe  http://www.microsoft.com/technet/sysinternals/SystemInformation/LiveKd.mspx and  “break in “  as many times as we need to without rebooting the server and possibly losing the repro.

 

I found that livekd is a little picky about it’s commands ( sometime it seems to hang ) but if you keep at it you should be able to dump the thread you identified via qslice or perfmon.

Then, quite livekd – fire it up again and get another !thread and see where it is now.. keep doing this until you have an idea of where  you are spending the time.

 

 

 

0: kd> !thread 8cacfa18

THREAD 8cacfa18  Cid 02ec.0324  Teb: 7ffd8000 Win32Thread: bc1f9da0 RUNNING on processor 0

IRP List:

    8d2dfa38: (0006,0094) Flags: 00000800  Mdl: 00000000

Impersonation token:  d9ad4530 (Level Impersonation)

Unable to load NT!CmpKeyObjectType

Unable to load NT!ObpSymbolicLinkObjectType

Owning Process            8ca0ad88       Image:         winlogon.exe

Wait Start TickCount      163258430

Context Switch Count      37599399                 LargeStack

UserTime                  00:00:30.921

KernelTime                15 Days 21:48:58.046

Win32 Start Address 0x0527abd1

LPC Server thread working on message Id 527abd1

Start Address 0x77e6b5f3

Stack Init b6a59000 Current b6a58924 Base b6a59000 Limit b6a55000 Call 0

Priority 13 BasePriority 13 PriorityDecrement 0

ChildEBP RetAddr  Args to Child

WARNING: Frame IP not in any known module. Following frames may be wrong.

b6a5892c b6a58954 ffffffff b6a58954 80a576a3 0x1737000

b6a58960 80a55c68 80a55c6e ffffffff f7737b01 0xb6a58954

b6a58960 e0dfe3a0 80a55c6e ffffffff f7737b01 hal!HalpClockInterruptPn+0xc0 (FPO: [0,2] TrapFrame @ b6a589fc)

b6a58a1c 8082d8d0 dd0b2080 00000000 d83305b0 0xe0dfe3a0

dd0b2080 808bc5b5 00000000 808cde64 808cdeb0 nt!KiDeliverApc+0x1cc (FPO: [Non-Fpo])

dd0b2090 808dced7 808ce442 808ce2ec 808ce1aa nt!HvpReleaseCellMapped+0xed

dd0b2094 808ce441 808ce2ec 808ce1aa d830a000 nt!CmpFindControlSet+0x22b

dd0b2098 808ce2eb 808ce1aa d830a000 00000c48 nt!CmpFileRead+0x155

dd0b209c 808ce1a9 d830a000 00000c48 d9a8dce0 nt!CmpCreateEvent+0xad

dd0b20a0 d830a000 00000c48 d9a8dce0 00000010 nt!CmpFileWriteThroughCache+0x103

dd0b20a4 00000000 d9a8dce0 00000010 0000018c 0xd830a000

 

 

 

 

 

 

 

!thread 8cacfa18        **************************************************

THREAD 8cacfa18  Cid 02ec.0324  Teb: 7ffd8000 Win32Thread: bc1f9da0 RUNNING on processor 3

IRP List:                                                             ***

    8d2dfa38: (0006,0094) Flags: 00000800  Mdl: 00000000              ***

Impersonation token:  d9ad4530 (Level Impersonation)                  ***

***    In order for this command to work properly, your symbol path   ***

***    must point to .pdb files that have full type information.      ***

***                                                                   ***

***    Certain .pdb files (such as the public OS symbols) do not      ***

Unable to load NT!MmSectionObjectTypeon.  Contact the group that      ***

Unable to load NT!MmSectionObjectTypes if you need this command to    ***

Unable to load NT!ObpSymbolicLinkObjectType                           ***

Owning Process            8ca0ad88       Image:         winlogon.exe  ***

Wait Start TickCount      163310971    erInfo                         ***

Context Switch Count      37602369                 LargeStack         ***

UserTime                  00:00:30.921

KernelTime                15 Days 22:02:38.796

Win32 Start Address 0x0527abd1                                        ***

LPC Server thread working on message Id 527abd1                       ***

Start Address kernel32!BaseThreadStartThunk (0x77e6b5f3)              ***

Stack Init b6a59000 Current b6a58924 Base b6a59000 Limit b6a55000 Call 0*

Priority 13 BasePriority 13 PriorityDecrement 0ly, your symbol path   ***

ChildEBP RetAddr  Args to Child              l type information.      ***

WARNING: Frame IP not in any known module. Following frames may be wrong.

b6a5892c b6a58954 ffffffff b6a58954 80a576a3 0x173f000ls) do not      ***

b6a58958 80a55c5e b6a58a20 80a55c68 80a55c6e 0xb6a58954roup that      ***

b6a58958 e0dfe3a0 b6a58a20 80a55c68 80a55c6e hal!HalpClockInterruptPn+0xb6 (FPO: [0,2] TrapFrame @ b6a589fc)

b6a58a88 808bc0a4 dd0b2080 0015b020 d83305b0 0xe0dfe3a0               ***

b6a58a9c 808bcea2 dd0b2080 0015b020 0015b020 nt!HvpGetHCell+0x10 (FPO: [Non-Fpo])

b6a58ae4 808de677 dd0b2080 0015b020 808bc4c8 nt!HvIsCellAllocated+0x7c (FPO: [Non-Fpo])

b6a58b10 808cf280 010b2080 b6a58b28 dd0b25ec nt!CmpValidateHiveSecurityDescriptors+0x5b (FPO: [Non-Fpo])

b6a58b2c 808d6c0a dd0b2080 00000001 b6a58c54 nt!CmCheckRegistry+0x54 (FPO: [Non-Fpo])

b6a58b6c 808c8801 b6a58ba8 00000005 00000000 nt!CmpInitializeHive+0x2c8 (FPO: [Non-Fpo])

b6a58bc4 808ca2bc b6a58cf4 00000000 b6a58c54 nt!CmpInitHiveFromFile+0x91 (FPO: [Non-Fpo])

b6a58be8 808c4797 b6a58cbc b6a58c58 b6a58c4c nt!CmpCmdHiveOpen+0x1e (FPO: [Non-Fpo])

b6a58c98 808bbebb b6a58cd4 b6a58cbc 00000000 nt!CmLoadKey+0xcb (FPO: [Non-Fpo])

b6a58d3c 808bc06c 007cf400 007cf418 00000000 nt!NtLoadKeyEx+0x26b (FPO: [Non-Fpo])

b6a58d54 80888c7c 007cf400 007cf418 007cf448 nt!NtLoadKey+0x14 (FPO: [Non-Fpo])

b6a58d54 7c82ed54 007cf400 007cf418 007cf448 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b6a58d64)

007cf3e0 00000000 00000101 00000000 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

 

 

 

 

0: kd> THREAD 8cacfa18  Cid 02ec.0324  Teb: 7ffd8000 Win32Thread: bc1f9da0 RUNNING on processor 0

IRP List:

    8d2dfa38: (0006,0094) Flags: 00000800  Mdl: 00000000

Impersonation token:  d9ad4530 (Level Impersonation)

Owning Process            8ca0ad88       Image:         winlogon.exe

Wait Start TickCount      163361125

Context Switch Count      37606489                 LargeStack

UserTime                  00:00:30.921

KernelTime                15 Days 22:15:42.015

Win32 Start Address 0x0527abd1

LPC Server thread working on message Id 527abd1

Start Address kernel32!BaseThreadStartThunk (0x77e6b5f3)

Stack Init b6a59000 Current b6a589cc Base b6a59000 Limit b6a55000 Call 0

Priority 13 BasePriority 13 PriorityDecrement 0

ChildEBP RetAddr  Args to Child

WARNING: Frame IP not in any known module. Following frames may be wrong.

b6a589d4 8cacfa5c 00000001 b6a58d64 8ca0ad88 0x0

b6a589fc 8082dba2 00000000 00000000 00000000 0x8cacfa5c

b6a58a1c 8082d8d0 dd0b2080 00000000 d83305b0 nt!KiCheckForKernelApcDelivery+0x1c (FPO: [0,0,0])

b6a58a88 808bc0a4 dd0b2080 0015b020 d83305b0 nt!KiDeliverApc+0x1cc (FPO: [Non-Fpo])

b6a58a9c 808bcea2 dd0b2080 0015b020 0015b020 nt!HvpGetHCell+0x10 (FPO: [Non-Fpo])

b6a58ae4 808de70a dd0b2080 0015b020 808bc4c8 nt!HvIsCellAllocated+0x7c (FPO: [Non-Fpo])

b6a58b10 808cf280 010b2080 b6a58b28 dd0b25ec nt!CmpValidateHiveSecurityDescriptors+0xee (FPO: [Non-Fpo])

b6a58b2c 808d6c0a dd0b2080 00000001 b6a58c54 nt!CmCheckRegistry+0x54 (FPO: [Non-Fpo])

b6a58b6c 808c8801 b6a58ba8 00000005 00000000 nt!CmpInitializeHive+0x2c8 (FPO: [Non-Fpo])

b6a58bc4 808ca2bc b6a58cf4 00000000 b6a58c54 nt!CmpInitHiveFromFile+0x91 (FPO: [Non-Fpo])

b6a58be8 808c4797 b6a58cbc b6a58c58 b6a58c4c nt!CmpCmdHiveOpen+0x1e (FPO: [Non-Fpo])

b6a58c98 808bbebb b6a58cd4 b6a58cbc 00000000 nt!CmLoadKey+0xcb (FPO: [Non-Fpo])

b6a58d3c 808bc06c 007cf400 007cf418 00000000 nt!NtLoadKeyEx+0x26b (FPO: [Non-Fpo])

b6a58d54 80888c7c 007cf400 007cf418 007cf448 nt!NtLoadKey+0x14 (FPO: [Non-Fpo])

b6a58d54 7c82ed54 007cf400 007cf418 007cf448 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ b6a58d64)

007cf3e0 00000000 00000101 00000000 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

 

 

 

 

 

 

We can tell a few  things right off the bat – a huge amount of time in kernel and we at least appear to be in a continuous loop starting at CmpValidateHiveSecurityDescriptors.

 

We looked at the usermode portion of the call and we were processing the user logon in winlogon.exe – loading the users profile hive ( ntuser.dat  ).  In the end , the hive had been corrupted and had invalid security descriptors in many places.

 

Hopefully this will help you diagnose your problems and if you do have to call into support  - then at least you have a head start. I was wowed the other day when someone had a whole Power point done with their analysis and dumps etc.. all ready to go. Makes my job a  lot easier  J

 

spatdsg