Desktop heap is probably not something that you spend a lot of time thinking about, which is a good thing. However, from time to time you may run into an issue that is caused by desktop heap exhaustion, and then it helps to know about this resource. Let me state up front that things have changed significantly in Vista around kernel address space, and much of what I’m talking about today does not apply to Vista.
Laying the groundwork: Session Space
To understand desktop heap, you first need to understand session space. Windows 2000, Windows XP, and Windows Server 2003 have a limited, but configurable, area of memory in kernel mode known as session space. A session represents a single user’s logon environment. Every process belongs to a session. On a Windows 2000 machine without Terminal Services installed, there is only a single session, and session space does not exist. On Windows XP and Windows Server 2003, session space always exists. The range of addresses known as session space is a virtual address range. This address range is mapped to the pages assigned to the current session. In this manner, all processes within a given session map session space to the same pages, but processes in another session map session space to a different set of pages.
Session space is divided into four areas: session image space, session structure, session view space, and session paged pool. Session image space loads a session-private copy of Win32k.sys modified data, a single global copy of win32k.sys code and unmodified data, and maps various other session drivers like video drivers, TS remote protocol driver, etc. The session structure holds various memory management (MM) control structures including the session working set list (WSL) information for the session. Session paged pool allows session specific paged pool allocations. Windows XP uses regular paged pool, since the number of remote desktop connections is limited. On the other hand, Windows Server 2003 makes allocations from session paged pool instead of regular paged pool if Terminal Services (application server mode) is installed. Session view space contains mapped views for the session, including desktop heap.
Session Space layout:
Session Image Space: win32k.sys, session drivers
Session Structure: MM structures and session WSL
Session View Space: session mapped views, including desktop heap
Session Paged Pool
Sessions, Window Stations, and Desktops
You’ve probably already guessed that desktop heap has something to do with desktops. Let’s take a minute to discuss desktops and how they relate to sessions and window stations. All Win32 processes require a desktop object under which to run. A desktop has a logical display surface and contains windows, menus, and hooks. Every desktop belongs to a window station. A window station is an object that contains a clipboard, a set of global atoms and a group of desktop objects. Only one window station per session is permitted to interact with the user. This window station is named "Winsta0." Every window station belongs to a session. Session 0 is the session where services run and typically represents the console (pre-Vista). Any other sessions (Session 1, Session 2, etc) are typically remote desktops / terminal server sessions, or sessions attached to the console via Fast User Switching. So to summarize, sessions contain one or more window stations, and window stations contain one or more desktops.
You can picture the relationship described above as a tree. Below is an example of this desktop tree on a typical system:
- Session 0
| ---- WinSta0 (interactive window station)
|      | ---- Default (desktop)
|      | ---- Disconnect (desktop)
|      | ---- Winlogon (desktop)
| ---- Service-0x0-3e7$ (non-interactive window station)
|      | ---- Default (desktop)
| ---- Service-0x0-3e4$ (non-interactive window station)
| ---- SAWinSta (non-interactive window station)
|      | ---- SADesktop (desktop)
- Session 1
- Session 2
| ---- WinSta0 (interactive window station)
       | ---- Default (desktop)
       | ---- Disconnect (desktop)
       | ---- Winlogon (desktop)
In the above tree, the full path to the SADesktop (as an example) can be represented as “Session 0\SAWinSta\SADesktop”.
Desktop Heap – what is it, what is it used for?
Every desktop object has a single desktop heap associated with it. The desktop heap stores certain user interface objects, such as windows, menus, and hooks. When an application requires a user interface object, functions within user32.dll are called to allocate those objects. If an application does not depend on user32.dll, it does not consume desktop heap. Let’s walk through a simple example of how an application can use desktop heap.
1. An application needs to create a window, so it calls CreateWindowEx in user32.dll.
2. User32.dll makes a system call into kernel mode and ends up in win32k.sys.
3. Win32k.sys allocates the window object from desktop heap.
4. A handle to the window (an HWND) is returned to the caller.
5. The application and other processes in the same session can refer to the window object by its HWND value
Where things go wrong
Normally this “just works”, and neither the user nor the application developer needs to worry about desktop heap usage. However, there are two primary scenarios in which failures related to desktop heap can occur:
1. Session view space for a given session becomes fully utilized, so no new desktop heaps can be created.
2. An individual desktop heap becomes fully utilized, so threads that use that desktop can no longer allocate new user interface objects.
So how do you know if you are running into these problems? Processes failing to start with a STATUS_DLL_INIT_FAILED (0xC0000142) error in user32.dll is a common symptom. Since user32.dll needs desktop heap to function, failure to initialize user32.dll upon process startup can be an indication of desktop heap exhaustion. Another symptom you may observe is a failure to create new windows. Depending on the application, any such failure may be handled in different ways. Note that if you are experiencing problem number one above, the symptoms would usually only exist in one session. If you are seeing problem two, then the symptoms would be limited to processes that use the particular desktop heap that is exhausted.
Diagnosing the problem
So how can you know for sure that desktop heap exhaustion is your problem? This can be approached in a variety of ways, but I’m going to discuss the simplest method for now. Dheapmon is a command line tool that will dump out the desktop heap usage for all the desktops in a given session. See our first blog post for a list of tool download locations. Once you have dheapmon installed, be sure to run it from the session where you think you are running out of desktop heap. For instance, if you have problems with services failing to start, then you’ll need to run dheapmon from session 0, not a terminal server session.
Dheapmon output looks something like this:
Desktop Heap Information Monitor Tool (Version 7.0.2727.0)
Copyright (c) 2003-2004 Microsoft Corp.
Session ID: 0 Total Desktop: ( 5824 KB - 8 desktops)
WinStation\Desktop              Heap Size(KB)   Used Rate(%)
WinSta0\Default                      3072            5.7
WinSta0\Disconnect                     64            4.0
WinSta0\Winlogon                      128            8.7
Service-0x0-3e7$\Default              512           15.1
Service-0x0-3e4$\Default              512            5.1
Service-0x0-3e5$\Default              512            1.1
SAWinSta\SADesktop                    512            0.4
__X78B95_89_IW\__A8D9S1_42_ID         512            0.4
As you can see in the example above, each desktop heap size is specified, as is the percentage of usage. If any one of the desktop heaps becomes too full, allocations within that desktop will fail. If the cumulative heap size of all the desktops approaches the total size of session view space, then new desktops cannot be created within that session. Both of the failure scenarios described above depend on two factors: the total size of session view space, and the size of each desktop heap allocation. Both of these sizes are configurable.
Configuring the size of Session View Space
Session view space size is configurable using the SessionViewSize registry value. This is a REG_DWORD and the size is specified in megabytes. Note that the values listed below are specific to 32-bit x86 systems not booted with /3GB. A reboot is required for this change to take effect. The value should be specified under:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management
                        Size if no registry value configured    Default registry value
Windows 2000 *
Windows Server 2003
* Settings for Windows 2000 are with Terminal Services enabled and hotfix 318942 installed. Without Terminal Services installed, session space does not exist, and desktop heap allocations are made from a fixed 48 MB region for system mapped views. Without hotfix 318942 installed, the size of session view space is fixed at 20 MB.
The sum of the sizes of session view space and session paged pool has a theoretical maximum of slightly under 500 MB for 32-bit operating systems. The maximum varies based on RAM and various other registry values; in practice it is around 450 MB for most configurations. Increasing the above values reduces the virtual address space available to some combination of nonpaged pool, system PTEs, system cache, and paged pool.
Configuring the size of individual desktop heaps
Configuring the size of the individual desktop heaps is a bit more complex. Speaking in terms of desktop heap size, there are three possibilities:
· The desktop belongs to an interactive window station and is a “Disconnect” or “Winlogon” desktop, so its heap size is fixed at 64 KB or 128 KB, respectively (for 32-bit x86)
· The desktop heap belongs to an interactive window station, and is not one of the above desktops. This desktop’s heap size is configurable.
· The desktop heap belongs to a non-interactive window station. This desktop’s heap size is also configurable.
The size of each desktop heap allocation is controlled by the Windows value under the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems
The default data for this registry value will look something like the following (all on one line):
SharedSection=1024,3072,512 Windows=On SubSystemType=Windows
The numeric values following "SharedSection=" control how desktop heap is allocated. These SharedSection values are specified in kilobytes.
The first SharedSection value (1024) is the shared heap size common to all desktops. This memory is not a desktop heap allocation, and the value should not be modified to address desktop heap problems.
The second SharedSection value (3072) is the size of the desktop heap for each desktop that is associated with an interactive window station, with the exception of the “Disconnect” and “Winlogon” desktops.
The third SharedSection value (512) is the size of the desktop heap for each desktop that is associated with a "non-interactive" window station. If this value is not present, the size of the desktop heap for non-interactive window stations will be same as the size specified for interactive window stations (the second SharedSection value).
Consider the two desktop heap exhaustion scenarios described above. If the first scenario is encountered (session view space is exhausted), and most of the desktop heaps are non-interactive, then the third SharedSection can be decreased in an effort to allow more (smaller) non-interactive desktop heaps to be created. Of course, this may not be an option if the processes using the non-interactive heaps require a full 512 KB. If the second scenario is encountered (a single desktop heap allocation is full), then the second or third SharedSection value can be increased to allow each desktop heap to be larger than 3072 or 512 KB. A potential problem with this is that fewer total desktop heaps can be created.
What are all these window stations and desktops in Session 0 anyway?
Now that we know how to tweak the sizes of session view space and the individual desktop heaps, it is worth talking about why you have so many window stations and desktops, particularly in session 0. First off, you’ll find that every WinSta0 (interactive window station) has at least three desktops, and each of these desktops uses various amounts of desktop heap. I’ve alluded to this previously, but to recap, the three desktops for each interactive window station are:
· Default desktop - desktop heap size is configurable as described above
· Disconnect desktop - desktop heap size is 64 KB on 32-bit systems
· Winlogon desktop - desktop heap size is 128 KB on 32-bit systems
Note that there can potentially be more desktops in WinSta0 as well, since any process can call CreateDesktop and create new desktops.
Let’s move on to the desktops associated with non-interactive window stations: these are usually related to a service. The system creates a window station in which service processes that run under the LocalSystem account are started. This window station is named Service-0x0-3e7$, after the LUID of the LocalSystem account, and it contains a single desktop named Default. However, service processes that run as LocalSystem and are marked interactive start in WinSta0 so that they can interact with the user in Session 0 (but still run in the LocalSystem context).
Any service process that starts under an explicit user or service account has a window station and desktop created for it by service control manager, unless a window station for its LUID already exists. These window stations are non-interactive window stations. The window station name is based on the LUID, which is unique for every logon. If an entity (other than System) logs on multiple times, a new window station is created for each logon. An example window station name is “service-0x0-22e1$”.
A common desktop heap issue occurs on systems that have a very large number of services. This can be a large number of unique services, or one (poorly designed, IMHO) service that installs itself multiple times. If the services all run under the LocalSystem account, then the desktop heap for Session 0\Service-0x0-3e7$\Default may become exhausted. If the services all run under another user account that logs on multiple times, each time acquiring a new LUID, a new desktop heap will be created for every instance of the service, and session view space will eventually become exhausted.
Given what you now know about how service processes use window stations and desktops, you can use this knowledge to avoid desktop heap issues. For instance, if you are running out of desktop heap for the Session 0\Service-0x0-3e7$\Default desktop, you may be able to move some of the services to a new window station and desktop by changing the user account that the service runs under.
I hope you found this post interesting and useful for solving those desktop heap issues! If you have questions or comments, please let us know.
- Matthew Justice
[Update: 7/5/2007 - Desktop Heap, part 2 has been posted]
[Update: 9/13/2007 - Talkback video: Desktop Heap has been posted]
[Update: 3/20/2008 - The default interactive desktop heap size has been increased on 32-bit Vista SP1]
Hi! My name is Tate. I’m an Escalation Engineer on the Microsoft Critical Problem Resolution Platforms Team. I wanted to share one of the most common errors we troubleshoot here on the CPR team, its root cause being pool consumption, and the methods by which we can remedy it quickly!
This issue is commonly misdiagnosed; however, 90% of the time it is actually quite possible to determine the resolution quickly without any serious effort at all!
First, what do these events really mean?
Event ID 2020 Event Type: Error Event Source: Srv Event Category: None Event ID: 2020 Description: The server was unable to allocate from the system paged pool because the pool was empty.
Event ID 2019 Event Type: Error Event Source: Srv Event Category: None Event ID: 2019 Description: The server was unable to allocate from the system NonPaged pool because the pool was empty.
This is our friend the Server service reporting that, while trying to satisfy a request, it was not able to find enough free memory of the respective type of pool: 2020 indicates paged pool, and 2019 nonpaged pool. This doesn’t mean that the Server service (srv.sys) is broken or the root cause of the problem; more often it is simply the first component to see the resource problem and report it to the event log. Thus, there could be (and usually are) a few more symptoms of pool exhaustion on the system, such as hangs, out-of-resource errors reported by drivers or applications, or all of the above!
What is Pool?
First, pool is not the amount of RAM on the system; it is a segment of the virtual address space that Windows reserves at boot. These pools are finite because address space itself is finite. A 32-bit (x86) machine can address 2^32 bytes = 4 GB, and by default Windows uses 2 GB for applications and 2 GB for the kernel. Other things must also fit into that kernel 2 GB, such as Page Table Entries (PTEs), which is why the maximum amount of paged pool on 32-bit (x86) is only ~460 MB. That puts our realistic limits per processor architecture in perspective. As this implies, 64-bit (x64 and IA64) machines have less of a problem here due to their larger address space, but there are still limits and thus no free lunch.
*For more about determining current pool limits see the common question post “Why am I out of Paged Pool at ~200MB…” at the end of this post.
*For more info about pools: About Memory Management > Memory Pools
*This has changed a bit for Vista, see Dynamic Kernel Address space
What are these pools used for?
These pools are used by the kernel directly, by the kernel indirectly in support of various structures it creates on behalf of application requests (CreateFile, for example), and by drivers installed on the system for memory allocations made via the kernel pool allocation functions.
Literally, NonPaged means that this memory when allocated will not be paged to disk and thus resident at all times, which is an important feature for drivers. Paged conversely, can be, well… paged out to disk. In the end though, all this memory is allocated through a common set of functions, most common is ExAllocatePoolWithTag.
Ok, so what is using it/abusing it? (our goal right!?)
Now that we know the culprit is either Windows itself or a component shipping with Windows, a driver, or an application requesting lots of things that the kernel has to create on its behalf, how can we find out which?
There are really four basic methods that are typically used (listed in order of increasing difficulty):
1.) Find By Handle Count
Handle Count? Yes, considering that we know that an application can request something of the OS that it must then in turn create and provide a reference to…this is typically represented by a handle, and thus charged to the process’ total handle count!
The quickest way by far if the machine is not completely hung is to check this via Task Manager. Ctrl+Shift+Esc…Processes Tab…View…Select Columns…Handle Count. Sort on Handles column now and check to see if there is a significantly large one there (this information is also obtainable via Perfmon.exe, Process Explorer, Handle.exe, etc.).
What’s large? Well, typically we should raise an eyebrow at anything over 5,000 or so. Now that’s not to say that over this amount is inherently bad, just know that there is no free lunch and that a handle to something usually means that on the other end there is a corresponding object stored in NonPaged or Paged Pool which takes up memory.
So for example let’s say we have a process that has 100,000 handles, mybadapp.exe. What do we do next?
Well, if it’s a service we could stop it (which releases the handles), or if it’s an application running interactively, try to shut it down and see how much total Kernel Memory (Paged or NonPaged, depending on which one we are short of) we get back. If we were at 400 MB of Paged Pool (look at Performance Tab…Kernel Memory…Paged) and, after stopping mybadapp.exe with its 100,000 handles, we are now at a reasonable 100 MB, well, there’s our bad guy. Following up with the owner, or further investigating what type of handles are being consumed (with Process Explorer from Sysinternals or the Windows debugger, for example), would be the next step.
For essential yet legacy applications, which there is no hope of replacing or obtaining support, we may consider setting up a performance monitor alert on the handle count when it hits a couple thousand or so (Performance Object: Process, Counter: Handle Count) and taking action to restart the bad service. This is a less than elegant solution for sure but it could keep the one rotten apple from spoiling the bunch by hanging/crashing the machine!
2.) By Pooltag (as read by poolmon.exe)
Okay, so no handle count gone wild? No problem.
For Windows 2003 and later machines, a feature is enabled by default that allows tracking of the pool consumer via something called a pooltag. For previous OS’s we will need to use a utility such as gflags.exe to Enable Pool Tagging (which requires a reboot unfortunately). This is usually just a 3-4 character string or more technically “a character literal of up to four characters delimited by single quotation marks” that the caller of the kernel api to allocate the pool will provide as its 3rd parameter. (see ExAllocatePoolWithTag)
The tool that we use to get information about which pooltag is using the most is poolmon.exe. Launch it from a cmd prompt, hit B to sort by bytes descending and P to cycle the list by type (Paged, NonPaged, or both), and we have a live view into what’s going on in the system. Look specifically at the Tag Name and its respective Byte Total column for the guilty party!
The cool thing is that we have most of the OS utilized pooltags already documented so we have an idea if there is a match for one of the Windows components in pooltag.txt. So if we see MmSt as the top tag for instance consuming far and away the largest amount, we can look at pooltag.txt and know that it’s the memory manager and also using that tag in a search engine query we might get the more popular KB304101 which may resolve the issue!
We will find pooltag.txt in the ...\Debugging Tools for Windows\triage folder when the debugging tools are installed.
Oh no, what if it’s not in the list? No problem…
We might be able to find its owner by using one of the following techniques:
• For 32-bit versions of Windows, use poolmon /c to create a local tag file that lists each tag value assigned by drivers on the local machine (%SystemRoot%\System32\Drivers\*.sys). The default name of this file is Localtag.txt.
• For Windows 2000 and Windows NT 4.0 (really, this works on all versions), use Search to find files that contain a specific pool tag, as described in KB298102, How to Find Pool Tags That Are Used By Third-Party Drivers.
3.) Using Driver Verifier
Using driver verifier is a more advanced approach to this problem. Driver Verifier provides a whole suite of options targeted mainly at the driver developer to run what amounts to quality control checks before shipping their driver.
However, should pooltag identification be a problem, there is a facility here in Pool Tracking that does the heavy lifting in that it will do the matching of Pool consumer directly to driver!
Be careful, however: the only option we will likely want to check is Pool Tracking, as the other settings are potentially costly enough that, if our installed driver set is not perfect, we could get into an un-bootable situation with constant bluescreens notifying us that xyz driver is doing abc bad thing, with some follow-up suggestions.
In summary, Driver Verifier is a powerful tool at our disposal but use with care only after the easier methods do not resolve our pool problems.
4.) Via Debug (live and postmortem)
As mentioned earlier, the API being used here to allocate this pool memory is usually ExAllocatePoolWithTag. If we have a kernel debugger set up, we can set a breakpoint here to brute-force debug who our caller is…but that’s not usually how we do it; can you say, “extended downtime?” There are other creative live debug methods which are a bit more advanced that we may post later…
Usually, debugging this problem involves a post-mortem memory.dmp taken from a server that has experienced Event ID 2020 or Event ID 2019, is no longer responsive to client requests, is hung, or often both. We can gather this dump via the Ctrl+Scroll Lock method (see KB244139), even while the machine is “hung” and seemingly unresponsive to the keyboard or Ctrl+Alt+Del!
When loading the memory.dmp via windbg.exe or kd.exe we can quickly get a feel for the state of the machine with the following commands.
Debugger output Example 1.1 (the !vm command)
2: kd> !vm
*** Virtual Memory Usage ***
  Physical Memory:    262012   ( 1048048 Kb)
  Page File: \??\C:\pagefile.sys
    Current: 1054720Kb  Free Space: 706752Kb
    Minimum: 1054720Kb  Maximum: 1054720Kb
  Page File: \??\E:\pagefile.sys
    Current: 2490368Kb  Free Space: 2137172Kb
    Minimum: 2490368Kb  Maximum: 2560000Kb
  Available Pages:     63440   (  253760 Kb)
  ResAvail Pages:     194301   (  777204 Kb)
  Modified Pages:        761   (    3044 Kb)
  NonPaged Pool Usage: 52461   (  209844 Kb)  <<NOTE! Value is near NonPaged Max
  NonPaged Pool Max:   54278   (  217112 Kb)
  ********** Excessive NonPaged Pool Usage *****
Note how the NonPaged Pool Usage value is near the NonPaged Pool Max value. This tells us that we are basically out of NonPaged Pool.
Here we can use the !poolused command to give the same information that poolmon.exe would have but in the dump….
Debugger output Example 1.2 (!poolused 2)
Note the 2 value passed to !poolused orders pool consumers by NonPaged
2: kd> !poolused 2
   Sorting by NonPaged Pool Consumed

  Pool Used:
             NonPaged            Paged
  Tag    Allocs     Used     Allocs     Used
  Thre   120145  76892800         0        0
  File   187113  29946176         0        0
  AfdE    89683  25828704         0        0
  TCPT    41888  18765824         0        0
  AfdC    90964  17465088         0        0
We now see the “Thre” tag at the top of the list as the largest consumer of NonPaged Pool, so let’s go look it up in pooltag.txt….
Thre - nt!ps - Thread objects
Note, the nt before the ! means that this is NT or the kernel’s tag for Thread objects.
So from our earlier discussion, if we have a bunch of thread objects, we probably have an application on the system with a ton of handles and/or a ton of threads, so it should be easy to find!
Via the debugger we can find this out easily via the !process 0 0 command which will show the TableSize (Handle Count) of over 90,000!
Debugger output Example 1.3 (the !process command continued)
Note the two zeros after !process separated by a space gives a list of all running processes on the system.
PROCESS 884e6520  SessionId: 0  Cid: 01a0  Peb: 7ffdf000  ParentCid: 0124
    DirBase: 110f6000  ObjectTable: 88584448  TableSize: 90472
    Image: mybadapp.exe
We can dig further here into looking at the threads…
Debugger output Example 1.4 (the !process command continued)
0: kd> !PROCESS 884e6520 4
PROCESS 884e6520  SessionId: 0  Cid: 01a0  Peb: 7ffdf000  ParentCid: 0124
    DirBase: 110f6000  ObjectTable: 88584448  TableSize: 90472
    Image: mybadapp.exe
        THREAD 884d8560  Cid 1a0.19c  Teb: 7ffde000  Win32Thread: a208f648  WAIT
        THREAD 88447560  Cid 1a0.1b0  Teb: 7ffdd000  Win32Thread: 00000000  WAIT
        THREAD 88396560  Cid 1a0.1b4  Teb: 7ffdc000  Win32Thread: 00000000  WAIT
        THREAD 88361560  Cid 1a0.1bc  Teb: 7ffda000  Win32Thread: 00000000  WAIT
        THREAD 88335560  Cid 1a0.1c0  Teb: 7ffd9000  Win32Thread: 00000000  WAIT
        THREAD 88340560  Cid 1a0.1c4  Teb: 7ffd8000  Win32Thread: 00000000  WAIT
And the list goes on…
We can examine the thread via !thread 88340560 from here and so on…
So in this rudimentary example the offender is clear in mybadapp.exe in its abundance of threads and one could dig further to determine what type of thread or functions are being executed and follow up with the owner of this executable for more detail, or take a look at the code if the application is yours!
Why am I out of Paged Pool at ~200MB when we say that the limit is around 460MB?
This is because at boot the memory manager decided, given the current amount of RAM on the system and other memory manager settings such as /3GB, that our maximum is some amount less than the absolute maximum. There are two ways to see the maximums on a system:
1.) Process Explorer: View…System Information…Kernel Memory section.
Note that we have to specify a valid path to dbghelp.dll and Symbols path via Options…Configure Symbols.
c:\<path to debugging tools for windows>\dbghelp.dll
2.) The debugger (live, or against a memory.dmp, via the !vm command)
* NonPaged pool size is not configurable other than via the /3GB boot.ini switch, which lowers NonPaged pool’s maximum: 128 MB with the /3GB switch, 256 MB without.
Conversely, Paged Pool size is often able to be raised to around its maximum manually via the PagedPoolSize registry setting which we can find for example in KB304101.
So what is this Pool Paged Bytes counter I see in Perfmon for the Process Object?
This counter tracks allocations charged to a process via ExAllocatePoolWithQuotaTag. Typically ExAllocatePoolWithTag is used instead, and thus this counter is less effective…but hey…don’t pass up free information in Perfmon, so be on the lookout for this easy win.
“Who's Using the Pool?” from Driver Fundamentals > Tips: What Every Driver Writer Needs to Know
Poolmon Remarks: http://technet2.microsoft.com/WindowsServer/en/library/85b0ba3b-936e-49f0-b1f2-8c8cb4637b0f1033.mspx
I hope you have enjoyed this post and hopefully it will get you going in the right direction next time you see one of these events or hit a pool consumption issue!
This information applies specifically to Windows 2000, Windows XP, and Windows Server 2003. I will blog separately on the boot changes in Windows Vista.
Additional information about this topic may be found in Chapter 5 of Microsoft Windows Internals by Russinovich and Solomon and also on TechNet: http://technet.microsoft.com/en-us/library/bb457123.aspx
The key to understanding and troubleshooting system startup issues is to accurately ascertain the point of failure. To facilitate this determination, I have divided the boot process into the following four phases.
1. Initial
2. Boot Loader
3. Kernel
4. Logon
Over the next few weeks, I’ll be describing each of these phases in detail and providing appropriate guidance for troubleshooting relevant issues for each phase.
The Initial Phase of boot is divided into the Power-On Self Test (POST) and Initial Disk Access.
Power-On Self Test
POST activities are fully implemented by the computer’s BIOS and vary by manufacturer. Please refer to the technical documentation provided with your hardware for details. However, regardless of manufacturer, certain generic actions are performed to ensure stable voltage, check RAM, enable interrupts for system usage, initialize the video adapter, scan for peripheral cards and perform a memory test if necessary. Depending on manufacturer and configuration, a single beep usually indicates a successful POST.
Troubleshooting the POST
· Make sure you have the latest BIOS and firmware updates for the hardware installed in the system.
· Replace the CMOS battery if it has failed.
· Investigate recently added hardware (RAM, video cards, SCSI adapters, etc.).
· Remove recently added RAM modules.
· Remove all adapter cards, then replace them individually, ensuring they are properly seated.
· Move adapter cards to other slots on the motherboard.
· If the computer still will not complete the POST, contact your manufacturer.
Initial Disk Access
Depending on the boot device order specified in your computer’s BIOS, your computer may attempt to boot from a CD-ROM, Network card, Floppy disk, USB device or a hard disk. For the purposes of this document, we’ll assume that we’re booting to a hard disk since that is the most common scenario.
Before we discuss the sequence of events that occur during this phase of startup, we need to understand a little bit about the layout of the boot disk: the Master Boot Record comes first, followed by the partitions, each of which begins with a boot sector.
Hard disks are divided into Cylinders, Heads and Sectors. A sector is the smallest physical storage unit on a disk and is almost always 512 bytes in size. For more information about the physical structure of a hard disk, please refer to the following Resource Kit chapter: http://www.microsoft.com/resources/documentation/windowsnt/4/server/reskit/en-us/resguide/diskover.mspx
There are two disk sectors critical to starting the computer that we’ll be discussing in detail:
· Master Boot Record (MBR)
· Boot Sector
The MBR is always located at Sector 1 of Cylinder 0, Head 0 of each physical disk. The Boot Sector resides at Sector 1 of each partition. These sectors contain both executable code and the data required to run the code.
Please note that there is some ambiguity involved in sector numbering. Cylinder/Head/Sector (CHS) notation begins numbering at C0/H0/S1. However, Absolute Sector numbering begins numbering at zero. Absolute Sector numbering is often used in disk editing utilities such as DskProbe. These differences are discussed in the following knowledge base article:
Q97819 Ambiguous References to Sector One and Sector Zero
Now that we have that straight, when does this information get written to disk? The MBR is created when the disk is partitioned. The Boot Sector is created when you format a volume. The MBR contains a small amount of executable code called the Master Boot Code, the Disk Signature and the partition table for the disk. At the end of the MBR is a 2-byte structure called a Signature Word or End of Sector marker, which should always be set to 0x55AA. A Signature Word also marks the end of an Extended Boot Record (EBR) and the Boot Sector.
The Disk Signature, a unique number at offset 0x1B8, identifies the disk to the operating system. Windows 2000 and higher operating systems use the disk signature as an index to store and retrieve information about the disk in the registry subkey HKLM\System\MountedDevices.
The Partition Table is a 64-byte data structure within the MBR used to identify the type and location of partitions on a hard disk. Each partition table entry is 16 bytes long (four entries max). Each entry starts at a predetermined offset from the beginning of the sector as follows:
Partition 1 0x1BE (446)
Partition 2 0x1CE (462)
Partition 3 0x1DE (478)
Partition 4 0x1EE (494)
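Since each entry is 16 bytes and the table starts at offset 0x1BE, the offsets above are simple arithmetic. A quick sketch to confirm:

```python
# Partition table entries live at fixed offsets within the 512-byte MBR.
ENTRY_SIZE = 16
FIRST_ENTRY_OFFSET = 0x1BE  # 446 decimal

offsets = [FIRST_ENTRY_OFFSET + ENTRY_SIZE * i for i in range(4)]
for i, off in enumerate(offsets, start=1):
    print(f"Partition {i}: 0x{off:03X} ({off})")
```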
The following is a partial example of a sample MBR showing three partition table entries in-use and one empty:
000001B0: 80 01
000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00 00 00
000001D0: 81 0A 07 FE FF FF 8A F5 - 7F 00 3D 26 9C 00 00 00
000001E0: C1 FF 05 FE FF FF C7 1B - 1C 01 D6 96 92 00 00 00
000001F0: 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 55 AA
Let’s take a look at each of the fields of a partition table entry individually. For each of these explanations, I’ll use the first partition table entry above (the 80 01 at the end of the 000001B0 line, followed by the 000001C0 line). Keep in mind that the multi-byte values are little-endian.
Boot Indicator (offset 0x00): 0x80 in our example.
00=Do Not Use for Booting
80=Active partition (Use for Booting)
Starting Head (offset 0x01): 0x01 in our example.
Starting Sector (offset 0x02): Only the first 6 bits are used. The upper 2 bits of this byte are used by the Starting Cylinder field.
Starting Cylinder (offset 0x03): Uses 1 byte + 2 bits from the Starting Sector field to make up the cylinder value. The Starting Cylinder is a 10-bit number with a maximum value of 1023.
System ID (offset 0x04): Defines the volume type. 0x07=NTFS
Other Possible System ID Values:
0x01 FAT12 primary partition or logical drive (fewer than 32,680 sectors in the volume)
0x04 FAT16 partition or logical drive (32,680–65,535 sectors or 16 MB–33 MB)
0x06 BIGDOS FAT16 partition or logical drive (33 MB–4 GB)
0x07 Installable File System (NTFS partition or logical drive)
0x0B FAT32 partition or logical drive
0x0C FAT32 partition or logical drive using BIOS INT 13h extensions
0x0E BIGDOS FAT16 partition or logical drive using BIOS INT 13h extensions
0x0F Extended partition using BIOS INT 13h extensions
0x42 Dynamic disk volume
0x86 Legacy FT FAT16 disk *
0x87 Legacy FT NTFS disk *
0x8B Legacy FT volume formatted with FAT32 *
0x8C Legacy FT volume using BIOS INT 13h extensions formatted with FAT32 *
Partition types denoted with an asterisk (*) indicate that they are also used to designate non-FT configurations such as striped and spanned volumes.
Ending Head (offset 0x05): 0xFE = 254 decimal in our example.
Ending Sector (offset 0x06): As with the Starting Sector, only the first 6 bits of the byte are used. The upper 2 bits are used by the Ending Cylinder field.
Ending Cylinder (offset 0x07): Uses 1 byte in addition to the upper 2 bits from the Ending Sector field to make up the cylinder value. The Ending Cylinder is a 10-bit number with a maximum value of 1023.
Relative Sectors (offset 0x08, 4 bytes): The offset from the beginning of the disk to the beginning of the volume, counting by sectors.
0x0000003F = 63
Total Sectors (offset 0x0C, 4 bytes): The total number of sectors in the volume.
0x007FF54B = 8,385,867 sectors = 4 GB
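Putting the fields together, the first entry from the dump above can be decoded in a few lines of Python. This is a sketch for illustration, not a disk tool; the byte values are taken directly from the sample dump:

```python
import struct

# First partition table entry from the sample MBR dump above:
# the "80 01" at the end of the 000001B0 line plus the 000001C0 line.
entry = bytes.fromhex("80 01 01 00 07 FE BF 09 3F 00 00 00 4B F5 7F 00")

boot_indicator = entry[0]   # 0x80 = active partition
starting_head  = entry[1]   # 0x01
system_id      = entry[4]   # 0x07 = NTFS
ending_head    = entry[5]   # 0xFE = 254

# The two 4-byte fields are little-endian, hence the "<" format.
relative_sectors, total_sectors = struct.unpack_from("<II", entry, 8)

print(hex(system_id), relative_sectors, total_sectors)
# 0x7 63 8385867
```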
Are you with me so far? Good! Now, Cylinder/Sector encoding can be a bit tricky, so let’s take a closer look.
Cylinder - Sector Encoding
The Sector and Cylinder values share their two bytes as follows: the first byte holds the Sector in its low 6 bits, and its upper 2 bits are cylinder bits 9 & 10 (the two high-order bits of the Cylinder); the second byte holds cylinder bits 1-8. In our example, the starting values for Cylinder and Sector are 01 00. The low 6 bits of 0x01 give us Sector 1, its upper 2 bits are 0, and the cylinder byte is 0x00, so our starting values are Cyl 0, Sec 1.
Our ending values are more interesting: BF 09. The sector byte 0xBF is 10111111 in binary. The 6 low bits are all set to 1, therefore our Sector value is 111111, or 63. The upper 2 bits (10) are prepended to the cylinder byte 0x09 (00001001), giving a 10-bit Cylinder value of 1000001001, or 521. Since both Cylinder and Head values begin numbering at zero, we have a total of 522 Cylinders and 255 Heads represented here. This gives us an ending CHS value of: 522 x 255 x 63 = 8,385,930 sectors.
Subtracting the starting CHS address (Cylinder 0, Head 1, Sector 1) (63) gives us the total size of this partition: 8,385,867 sectors or 4GB. We can verify this number by comparing it to the Total Sectors represented in the partition table: 4B F5 7F 00. Applying reverse-byte ordering gives us 00 7F F5 4B which equals 8,385,867 sectors.
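The bit-twiddling above can be expressed in a few lines of Python to double-check the math (the helper name is mine, for illustration):

```python
def decode_cyl_sec(sector_byte: int, cyl_byte: int):
    """Unpack a packed Cylinder/Sector byte pair from a partition table entry."""
    sector = sector_byte & 0x3F                        # low 6 bits
    cylinder = ((sector_byte & 0xC0) << 2) | cyl_byte  # 2 high bits + 8 low bits
    return cylinder, sector

# Ending Cylinder/Sector bytes from the example entry: BF 09
cyl, sec = decode_cyl_sec(0xBF, 0x09)
print(cyl, sec)  # 521 63

# Cylinders 0-521 (522), Heads 0-254 (255), Sectors 1-63 (63):
ending_chs_sectors = (cyl + 1) * (0xFE + 1) * sec
print(ending_chs_sectors)       # 8385930
print(ending_chs_sectors - 63)  # 8385867, matching the Total Sectors field
```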
So, now that we have an understanding of what is contained within the structures on the disk, let’s look at the sequence of events that occur. Remember, this is just after POST has successfully completed.
1. The motherboard ROM BIOS attempts to access the first boot device specified in the BIOS. (This is typically user configurable and can be edited using the BIOS configuration utility.)
2. The ROM BIOS reads Cylinder 0, Head 0, and Sector 1 of the first boot device.
3. The ROM BIOS loads that sector into memory and tests it.
a. For a floppy drive, the first sector is a FAT Boot Sector.
b. For a hard drive, the first sector is the Master Boot Record (MBR).
4. When booting to the hard drive, the ROM BIOS looks at the last two signature bytes of Sector 1 and verifies they are equal to 55AA.
a. If the signature bytes do not equal 55AA the system assumes that the MBR is corrupt or the hard drive has never been partitioned. This invokes BIOS Interrupt 18, which displays an error that is BIOS-vendor specific such as “Operating System not found”.
b. If the BIOS finds that the last two bytes are 55AA, the MBR program executes.
5. The MBR searches the partition table for a boot indicator byte 0x80 indicating an active partition.
a. If no active partition is found, BIOS Interrupt 18 is invoked and a BIOS error is displayed such as “Operating System not found”.
b. If any boot indicator in the MBR has a value other than 0x80 or 0x00, or if more than one boot indicator indicates an active partition (0x80), the system stops and displays “Invalid partition table”.
c. If an active partition is found, that partition’s Boot Sector is loaded and tested.
6. The MBR loads the active partition’s Boot Sector and tests it for 55AA.
a. If the Boot Sector cannot be read after five retries, the system halts and displays “Error loading operating system”.
b. If the Boot Sector can be read but is missing its 55AA marker, “Missing operating system” is displayed and the system halts.
c. The bootstrap area is made up of a total of 16 sectors (0-15). If sector 0 for the bootstrap code is valid, but there is corruption in sectors 1-15, you may get a black screen with no error. In this case, it may be possible to transplant sectors 1-15 (not Sector 0) from another NTFS partition using DskProbe.
d. If the Boot Sector is loaded and it passes the 55AA test, the MBR passes control over to the active partition’s Boot Sector.
7. The active partition’s Boot Sector begins executing and looks for NTLDR. The contents of the Boot Sector are entirely dependent on the format of the partition. For example, if the boot partition is a FAT partition, Windows writes code to the Boot Sector that understands the FAT file system. However, if the partition is NTFS, Windows writes NTFS-capable code. The role of the Boot Sector code is to give Windows just enough information about the structure and format of a logical drive to enable it to read the NTLDR file from the root directory. The errors at this point of the boot process are file system specific. The following are possible errors with FAT and NTFS Boot Sectors.
a. FAT: In all cases it displays “press any key to restart” after either of the following errors.
(1) If NTLDR is not found, “NTLDR is missing” is displayed.
(2) If NTLDR is on a bad sector, “Disk Error” displays.
b. NTFS: In all cases it displays “Press CTRL+ALT+DEL to restart” after any of the following errors.
(1) If NTLDR is on a bad sector, “A disk read error occurred” is displayed. This message may also be displayed on Windows 2000 or higher systems if Extended Int13 calls are required, but have been disabled in the CMOS or SCSI BIOS. This behavior may also be seen if an FRS (File Record Segment) cannot be loaded or if any NTFS Metadata information is corrupt.
(2) If NTLDR is not found, “NTLDR is missing” displays.
(3) If NTLDR is compressed, “NTLDR is compressed” displays.
8. Once NTLDR is found, it is loaded into memory and executed. At this point the Boot Loader phase begins.
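The signature and active-partition checks in steps 4 and 5 can be sketched in a few lines. This is a simplified model assuming a 512-byte sector buffer, not the actual BIOS/MBR code; the function name is hypothetical and the error strings follow the text above:

```python
def check_mbr(sector: bytes) -> str:
    """Simplified model of the BIOS/MBR checks in steps 4 and 5."""
    # Step 4: the last two bytes of the sector must be the 55AA signature word.
    if len(sector) != 512 or sector[510:512] != b"\x55\xAA":
        return "Operating System not found"   # Int 18h; text is BIOS-specific
    # Step 5: scan the four boot indicator bytes in the partition table.
    flags = [sector[0x1BE + 16 * i] for i in range(4)]
    if any(f not in (0x00, 0x80) for f in flags) or flags.count(0x80) > 1:
        return "Invalid partition table"
    if 0x80 not in flags:
        return "Operating System not found"   # no active partition
    return "boot active partition"

# Build a dummy MBR: one active entry plus the signature word.
mbr = bytearray(512)
mbr[510:512] = b"\x55\xAA"
mbr[0x1BE] = 0x80
print(check_mbr(bytes(mbr)))  # boot active partition
```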
Troubleshooting the Initial Phase (Initial Disk Access)
If you are unable to boot Windows and the failure is occurring very early in the boot process, begin by creating a Windows boot floppy and using it to attempt to boot the system.
Q119467 How to Create a Bootable Disk for an NTFS or FAT Partition
If you are unable to boot to Windows NT with a floppy, skip to the section titled “Unable to boot using floppy”.
If you can successfully start the computer from the boot disk, the problem is limited to the Master Boot Record, the Boot Sector, or one or more of the following files: NTLDR, NTDetect.com, or Boot.ini. After Windows is running, you should immediately back up all data before you attempt to fix the Boot Sector or Master Boot Record.
1. Run a current virus scanning program to verify that no virus is present.
2. Select the method of troubleshooting based upon the error generated.
· Operating System not found
Typical BIOS-specific error indicating no 55AA signature on the MBR. This must be repaired manually using a disk editor such as DskProbe.
Warning: Running FixMBR or FDISK /MBR in this case will clear the partition table.
· Invalid Partition Table
May be repaired using DskProbe; check for multiple 0x80 boot indicators.
· Error Loading Operating System
This error indicates an invalid Boot Sector. Use the “Repairing the Boot Sector” section below.
· Missing Operating System
This error indicates that the 55AA signature is missing from the Boot Sector. This may be replaced using DskProbe or by using the “Repairing the Boot Sector” section below.
· NTLDR is missing
Use the “Repairing the NTLDR file” section below.
· Disk error (FAT only)
Indicates that NTLDR is located on a bad sector.
· A disk read error occurred (NTFS only)
Indicates that NTLDR is on a bad sector. Use the “Repairing the NTLDR file” section below.
· NTLDR is compressed
Indicates that NTLDR is compressed. Uncompress or replace NTLDR. Note: This error is rare.
· Computer boots to a black screen with blinking cursor
Could be an issue with the MBR or the active partition’s boot code (Q228734). Use the appropriate section below.
Repairing the MBR
Use the FIXMBR command from the Windows Recovery Console to re-write the boot code located in the first 446 bytes of the MBR, but to leave the partition table intact.
326215 How To Use the Recovery Console on a Windows Server 2003-Based Computer That Does Not Start
WARNING: DO NOT use this method if you are receiving “Operating System not found” or other BIOS-specific error. FIXMBR will zero out the partition table if the 55AA signature is missing from the MBR. The data will be unrecoverable.
Repairing the Boot Sector
Use the FIXBOOT command from the Windows Recovery Console to write a new Boot Sector to the system partition.
If you are not able to repair the Boot Sector by using this method, you may repair it manually using the following knowledge base article:
Q153973 Recovering NTFS Boot Sector on NTFS Partitions
Repairing the NTLDR file
Use the Recovery Console to copy NTLDR from the \i386 directory of the Windows CD-ROM to the root of the hard drive.
Unable to boot using floppy
If you are unable to start the computer with a boot floppy and you are not receiving an error message, then the problem is most likely with your partition tables. If invalid partitions are present and you are unable to start your computer with a boot disk, formatting the disk, reinstalling Windows and restoring from backup may be your only option. It may be possible to recreate the partition table using DskProbe if a backup of the partition information is available. It also may be possible to manually reconstruct the partition table however this is outside the scope of this blog post. We may cover this later.
If you do not have a current backup of the data on the computer, as a last resort a data recovery service may be able to recover some of your information.
Helpful knowledge base articles:
Q272395 Error Message: Boot Record Signature AA55 Not Found
Q155892 Windows NT Boot Problem: Kernel File Is Missing From the Disk
Q228004 Changing Active Partition Can Make Your System Unbootable
Q155053 Black Screen on Boot
Q314503 Computer Hangs with a Black Screen When You Start Windows
Q153973 Recovering NTFS Boot Sector on NTFS Partitions
Okay boys & girls, that’s it for now. Next time, I’ll continue with the boot sequence and cover the Boot Loader phase. This is where things really get interesting…
Cache is used to reduce the performance impact when accessing data that resides on slower storage media. Without it your PC would crawl along and become nearly unusable. If data or code pages for a file reside on the hard disk, it can take the system 10 milliseconds to access the page. If that same page resides in physical RAM, it can take the system 10 nanoseconds to access the page. Access to physical RAM is about 1 million times faster than to a hard drive. It would be great if we could load up all the contents of the hard drive into RAM, but that scenario is cost prohibitive and dangerous. Hard disk space is far less costly and is non-volatile (the data is persistent even when disconnected from a power source).
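The arithmetic behind that "1 million times faster" figure is simple enough to check:

```python
disk_access = 10e-3  # ~10 milliseconds per page from the hard disk
ram_access  = 10e-9  # ~10 nanoseconds per page from physical RAM

# RAM access is roughly a million times faster than disk access.
print(round(disk_access / ram_access))  # 1000000
```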
Since we are limited in how much RAM we can stick in a box, we have to make the most of it. We have to share this crucial physical resource with all running processes, the kernel, and the file system cache. You can read more about how this works in “The Memory Shell Game” post.
The file system cache resides in kernel address space. It is used to buffer access to the much slower hard drive. The file system cache will map and unmap sections of files based on access patterns, application requests and I/O demand. The file system cache operates like a process working set. You can monitor the size of your file system cache's working set using the Memory\System Cache Resident Bytes performance monitor counter. This value will only show you the system cache's current working set. Once a page is removed from the cache's working set it is placed on the standby list. You should consider the standby pages from the cache manager as a part of your file cache. You can also consider these standby pages to be available pages. This is what the pre-Vista Task Manager does. Most of what you see as available pages is probably standby pages for the system cache. Once again, you can read more about this in "The Memory Shell Game" post.
Too Much Cache is a Bad Thing
The memory manager works on a demand-based algorithm. Physical pages are given to wherever the current demand is. If the demand isn't satisfied, the memory manager will start pulling pages from other areas, scrub them, and send them to help meet the growing demand. Just like any process, the system file cache can consume physical memory if there is sufficient demand.
Having a lot of cache is generally not a bad thing, but if it is at the expense of other processes it can be detrimental to system performance. There are two different ways this can occur - read and write I/O.
Excessive Cached Write I/O
Applications and services can dump lots of write I/O to files through the system file cache. The system cache's working set will grow as it buffers this write I/O. System threads will start flushing these dirty pages to disk. Typically the disk can't keep up with the I/O speed of an application, so the writes get buffered into the system cache. At a certain point the cache manager will reach a dirty page threshold and start to throttle I/O into the cache manager. It does this to prevent applications from overtaking physical RAM with write I/O. There are, however, some isolated scenarios where this throttle doesn't work as well as we would expect. This could be due to bad applications or drivers, or to not having enough memory. Fortunately, we can tune the amount of dirty pages allowed before the system starts throttling cached write I/O. This is handled by the SystemCacheDirtyPageThreshold registry value as described in Knowledge Base article 920739: http://support.microsoft.com/default.aspx?scid=kb;EN-US;920739
Excessive Cached Read I/O
While the SystemCacheDirtyPageThreshold registry value can tune the number of write/dirty pages in physical memory, it does not affect the number of read pages in the system cache. If an application or driver opens many files and actively reads from them continuously through the cache manager, then the memory manager will move more physical pages to the cache manager. If this demand continues to grow, the cache manager can grow to consume physical memory and other processes (with less memory demand) will get paged out to disk. This read I/O demand may be legitimate or may be due to poor application scalability. The memory manager doesn't know whether the demand is due to bad behavior or not, so pages are moved simply because there is demand for them. On a 32 bit system, the file system cache working set is essentially limited to 1 GB. This is the maximum size that we blocked off in the kernel for the system cache working set. Since most systems have more than 1 GB of physical RAM today, having the system cache working set consume physical RAM with read I/O is less likely.
This scenario, however, is more prevalent on 64 bit systems. With the increase in pointer length, the kernel's address space is greatly expanded. The system cache's working set limit can and typically does exceed how much memory is installed in the system. It is much easier for applications and drivers to load up the system cache with read I/O. If the demand is sustained, the system cache's working set can grow to consume physical memory. This will push other process and kernel resources out to the page file and can be very detrimental to system performance.
Fortunately we can also tune the server for this scenario. We have added two APIs to query and set the system file cache size - GetSystemFileCacheSize() and SetSystemFileCacheSize(). We chose to implement this tuning option via API calls to allow setting the cache working set size dynamically. I’ve uploaded the source code and compiled binaries for a sample application that calls these APIs. The source code can be compiled using the Windows DDK, or you can use the included binaries. The 32 bit version is limited to setting the cache working set to a maximum of 4 GB. The 64 bit version does not have this limitation. The sample code and included binaries are completely unsupported. It is just a quick and dirty implementation with little error handling.
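For illustration, here is a minimal ctypes sketch of the kind of call such a tool makes. This is a hypothetical, unsupported snippet, not the sample application mentioned above: it is Windows-only, requires the increase-quota privilege, and the choice of a zero minimum plus the FILE_CACHE_MAX_HARD_ENABLE flag is an assumption on my part:

```python
import ctypes
import sys

# winbase.h: enforce the maximum cache size as a hard limit (assumed usage).
FILE_CACHE_MAX_HARD_ENABLE = 0x1

def cap_file_cache_mb(max_mb: int) -> None:
    """Cap the system file cache working set at max_mb megabytes (Windows only)."""
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    ok = kernel32.SetSystemFileCacheSize(
        ctypes.c_size_t(0),                     # minimum working set size (assumed)
        ctypes.c_size_t(max_mb * 1024 * 1024),  # maximum working set size, in bytes
        FILE_CACHE_MAX_HARD_ENABLE)
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())

if sys.platform == "win32":
    cap_file_cache_mb(512)  # e.g. cap the cache working set at 512 MB
```

On a non-Windows machine the sketch only defines the helper; the guarded call at the bottom is where a real tool would apply the limit.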
Matthew here again – I want to provide some follow-up information on desktop heap. In the first post I didn’t discuss the size of desktop heap related memory ranges on 64-bit Windows, 3GB, or Vista. So without further ado, here are the relevant sizes on various platforms...
Windows XP (32-bit)
· 48 MB = SessionViewSize (default registry value, set for XP Professional, x86)
· 20 MB = SessionViewSize (if no registry value is defined)
· 3072 KB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)
· 512 KB = Non-interactive desktop heap size (defined in the registry, SharedSection 3rd value)
· 128 KB = Winlogon desktop heap size
· 64 KB = Disconnect desktop heap size
Windows Server 2003 (32-bit)
· 48 MB = SessionViewSize (default registry value)
· 20 MB = SessionViewSize (if no registry value is defined; this is the default for Terminal Servers)
Windows Server 2003 booted with 3GB (32-bit)
· 20 MB = SessionViewSize (registry value has no effect)
You may also see reduced heap sizes when running with 3GB. During the initialization of the window manager, an attempt is made to reserve enough session view space to accommodate the expected number of desktop heaps for a given session. If the heap sizes specified in the SharedSection registry value have been increased, the attempt to reserve session view space may fail. When this happens, the window manager falls back to a pair of “safe” sizes for desktop heaps (512KB for interactive, 128KB for non-interactive) and tries to reserve session space again, using these smaller numbers. This ensures that even if the registry values are too large for the 20MB session view space, the system will still be able to boot.
Windows Server 2003 (64-bit)
· 104 MB = SessionViewSize (if no registry value is defined, which is the default)
· 20 MB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)
· 768 KB = Non-interactive desktop heap size (defined in the registry, SharedSection 3rd value)
· 192 KB = Winlogon desktop heap size
· 96 KB = Disconnect desktop heap size
Windows Vista RTM (32-bit)
· Session View space is now a dynamic kernel address range. The SessionViewSize registry value is no longer used.
· 3072 KB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)
· 512 KB = Non-interactive desktop heap size (defined in the registry, SharedSection 3rd value)
Windows Vista (64-bit) and Windows Server 2008 (64-bit)
Windows Vista SP1 (32-bit) and Windows Server 2008 (32-bit)
· 12288 KB = Interactive desktop heap size (defined in the registry, SharedSection 2nd value)
· 64 KB = Disconnect desktop heap size
Windows Vista introduced a new public API function: CreateDesktopEx, which allows the caller to specify the size of desktop heap.
Additionally, GetUserObjectInformation now includes a new flag for retrieving the desktop heap size (UOI_HEAPSIZE).
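As a hedged sketch, the UOI_HEAPSIZE query might look like the following in ctypes. The constant value (5) and the KB-sized ULONG output buffer are my assumptions from winuser.h; the snippet is Windows-only and only defines the helper elsewhere:

```python
import ctypes
import sys

UOI_HEAPSIZE = 5  # winuser.h index for GetUserObjectInformation (assumed value)

def desktop_heap_kb() -> int:
    """Return the calling thread's desktop heap size in KB (Windows only)."""
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    hdesk = user32.GetThreadDesktop(kernel32.GetCurrentThreadId())
    size_kb = ctypes.c_ulong(0)
    needed = ctypes.c_ulong(0)
    if not user32.GetUserObjectInformationW(
            hdesk, UOI_HEAPSIZE,
            ctypes.byref(size_kb), ctypes.sizeof(size_kb),
            ctypes.byref(needed)):
        raise ctypes.WinError(ctypes.get_last_error())
    return size_kb.value

if sys.platform == "win32":
    print(desktop_heap_kb())
```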
[Update: 3/20/2008 - Added Vista SP1 and Server 2008 info.]
I recently came across a very interesting profiling tool that is available in Vista SP1 and Server 2008 called the Windows Performance Analyzer. You can use this tool to profile and diagnose different kinds of symptoms that the machine is experiencing. This tool is built on top of the Event Tracing for Windows (ETW) infrastructure. It uses the ETW providers to record kernel events and then display them in a graphical format.
Performance Analyzer provides many different graphical views of trace data, including CPU sampling, disk I/O, and DPC and interrupt activity, among others.
Download the latest version of the Windows Performance Tools Kit and install it on your machine. (http://www.microsoft.com/whdc/system/sysperf/perftools.mspx : Windows Performance Tools Kit, v.4.1.1 (QFE)) You will need to find the toolkit that corresponds to your processor architecture. Currently there are 3 versions available: x86, IA64, and x64.
After installation you should be able to see 2 new tools. The first one is Xperf, which is a command line tool that is used to capture the trace. The second is called XperfView, which graphically interprets the trace that has been collected by Xperf.
You will need to run the Xperf and XperfView from an elevated command prompt for all functionality.
For many tasks all you need for effective analysis is a kernel trace. For this example, we'll use the -on DiagEasy parameter to enable several kernel events, including image loads, disk I/O, process and thread events, hard faults, deferred procedure calls, interrupts, context switches, and performance counters. From an elevated command prompt, launch xperf -on DiagEasy.
This starts the kernel logger in sequential mode, logging to the default file "\kernel.etl" with a default buffer size of 64 KB, a minimum of 64 buffers, and a maximum of 320 buffers.
To stop a trace, type xperf -d <filename>.etl at the command line. This will stop the trace and output the file.
There are 2 ways to view the trace. From an elevated command prompt, launch xperf <filename>.etl, or launch the XperfView tool and open the file manually. When you open the trace file, XperfView displays a series of charts.
NOTE - While you need to run Xperf from an elevated command prompt in order to record a trace, you do not need an elevated command prompt in order to *analyze* a trace.
Using the Chart Selector tab, you can select all the graphs that you want to look at. To drill down in each chart, you can select the Summary table. For instance, in the CPU Sampling chart, the summary table gets you the summary of the processes that were running, with information like the amount of CPU time, CPU %, stacks (if the stacks were collected in the trace, see below). When looking at the Summary table for the Disk I/O chart, you can see which processes were writing files (the filename too!) to disk, as well as how much time it took.
You also have the ability to zoom in on a selected area. Another really cool feature is the ability to overlay multiple graphs on one frame. This way you can correlate different pieces of data together very easily.
Also, you can select which counter instances you want to see in each specific chart. On the top right corner of each chart is a drop down box from which you can select the counter instances. For instance, on the Disk I/O chart, you can select Disk 0, Disk 1, or a combination as well.
You can also view detailed information about the system that the trace was taken on. Click on the Trace menu item, and select System Configuration.
In the first sample Xperf command, we ran xperf -on DiagEasy. I am sure many of you were wondering what DiagEasy means. DiagEasy is a group of kernel events that are predefined by the Windows Performance Toolkit. This group includes Process, Thread, Kernel and User Image Load/Unload, Disk I/O, DPC, and Context Switch events.
When we used the xperf -on DiagEasy command, we did not specify an individual provider, so we enabled the kernel events for all the ETW providers on the system. If you want to enable events for a specific provider, you can use the following format: xperf -on (GUID|KnownProviderName)[:Flags[:Level]]. For more information about ETW providers, Kernel Flags and Groups, you can run the xperf -help providers command.
One of the most powerful features in Performance Analyzer is the ability to visualize stacks. It's important to note that this requires no special instrumentation in the code – only that you have symbols for the binary components you are interested in analyzing.
When the trace is set up to collect the stacks, Performance Analyzer will display call stack summary information for the events that had stack walking enabled. Here is an example that takes a trace (with stack tracing enabled) of the entire system while running a "find string" utility. We can use the Stack Tracing feature of Xperf to record a stack when certain events happen, or to take samples at regular intervals over time. See the xperf -help stackwalk output for more info.
Below, we will use the Stack Tracing feature of Xperf to take stack samples at regular intervals. With this output, we will be able to determine where the CPU is spending most of its time within a process.
xperf -on latency -stackwalk Profile
Latency is the kernel group to enable certain events, including the profile event which records the CPUs' activity every millisecond. The "-stackwalk Profile" flag tells Xperf to record stack walks on every profile event, which makes the profile information much more useful. In other words, in order to get profile information with stack walks you need to turn on the profile event, and turn on stack walking for that event.
Note that decoding of stacks requires that symbol decoding be configured. However stacks can be recorded without symbols, and can even be viewed without symbols, although they are much less useful without symbols. I only mention this in the event you're trying to record a trace of a problematic machine with little time to mess around with _NT_SYMBOL_PATH.
To get a trace with the stack information, run the command above, stop the trace with xperf -d as before, and open the resulting file in XperfView.
Click on the selector tab to bring up the column chooser list. Then select "Process name", "Process", "Stack", "Weight" and "%Weight". These are the most useful columns when looking at stacks from the sample profile event.
At this point I need to mention a few of the restrictions on stack walking, and when and how it works.
· Xperf stack walking is not available on XP
· On Vista stack walking is available for x86, and is available for x64 as of Vista SP1.
· On Windows 7 stack walking is available.
· Stack walking on x64 is complicated. You have to set DisablePagingExecutive in the registry, as documented here:
REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f
I recently came across a case where the customer was complaining that DPC processing was taking up too much CPU time. We ran Xperf on the machine and drilled down into the DPC activity on the machine.
From the Xperf graph, I was able to confirm that the customer was actually seeing high DPC usage. I selected the Summary for this chart, and got the list of drivers that were actually taking up CPU time.
Right off the bat, I could identify the driver that had a lot of DPC activity. I also noted that the average duration for each DPC from that driver was taking 788 microseconds. This is way too high. Each DPC should be taking a maximum of 100 microseconds.
Performance.Analyzer.QuickStart.xps - This is shipped with the performance toolkit.
From an elevated command prompt, launch xperf -help
Excessive cached read I/O is a growing problem. For over a year we have been working on this problem with several companies. You can read more about it in the original blog post, "Too Much Cache is a Bad Thing".
On 32 bit systems, the kernel could address at most 2 GB of virtual memory. This address range is shared and divided up for the many resources that the system needs, one of which is the System File Cache's working set. On 32 bit systems the theoretical limit is almost 1 GB for the cache’s working set; however, when a page is removed from the working set it will end up on the standby page list. Therefore the system can cache more than the 1 GB limit if there is available memory. The working set, however, is limited to what can be allocated within the kernel's 2 GB virtual address range. Since most modern systems have more than 1 GB of physical RAM, the System File Cache's working set size on a 32 bit system typically isn't a problem.
With 64 bit systems, the kernel virtual address space is very large and is typically larger than physical RAM on most systems. On these systems the System File Cache's working set can be very large and is typically about equal to the size of physical RAM. If applications or file sharing performs a lot of sustained cached read I/O, the System File Cache's working set can grow to take over all of physical RAM. If this happens, then process working sets are paged out and everyone starts fighting for physical pages and performance suffers.
The only way to mitigate this problem is to use the provided APIs of GetSystemFileCacheSize() and SetSystemFileCacheSize(). The previous blog post "Too Much Cache" contains sample code and a compiled utility that can be used to manually set the System File Cache's working set size.
The provided APIs, while offering one mitigation strategy, have a couple of limitations:
1) There is no conflict resolution between multiple applications. If you have two applications trying to set the System File Cache's working set size, the last one to call SetSystemFileCacheSize() will win. There is no centralized control of the System File Cache's working set size.
2) There is no guidance on what to set the System File Cache's working set size to. There is no one-size-fits-all solution. A high cache working set size is good for file servers, but bad for large-memory applications, and a low working set size could hurt everyone's I/O performance. It is essentially up to 3rd party developers or IT administrators to determine what is best for their server, and often the limits are determined by a best guesstimate backed by some testing.
We fully understand that while we provide one way to mitigate this problem, the solution is not ideal. We spent a considerable amount of time reviewing and testing other options. The problem is that there are so many varied scenarios on how users and applications rely on the System File Cache. Some strategies worked well for the majority of usage scenarios, but ended up negatively impacting others. We could not release any code change that would knowingly hurt several applications.
We also investigated changing some memory manager architecture and algorithms to address these issues with a more elegant solution; however, the necessary code changes are too extensive. We are experimenting with these changes in Windows 7, and there is no way that we could backport them to the current operating systems. If we did, we would be changing the underlying infrastructure that everyone has become accustomed to. Such a change would require stress tests of all applications that run on Windows. The test matrix and the chance of regression are far too large.
So that brings us back to the only provided solution - use the provided APIs. While this isn't an ideal solution, it does work, but with the limitations mentioned above. In order to help address these limitations, I've updated the SetCache utility to the Microsoft Windows Dynamic Cache Service. While this service does not completely address the limitations above, it does provide some additional relief.
The Microsoft Windows Dynamic Cache Service uses the provided APIs and centralizes the management of the System File Cache's working set size. With this service, you can define a list of processes that you want to prioritize over the System File Cache; the service monitors the working set sizes of those processes and backs off the System File Cache's working set size accordingly. It is always running in the background, monitoring and dynamically adjusting the System File Cache's working set size. The service provides many options, such as adding additional slack space to each process' working set or backing off during a low memory event.
Please note that this service is experimental and includes sample source code and a compiled binary. Anyone is free to re-use this code in their own solution. You may experience some performance side effects while using this service, as it cannot possibly address all usage scenarios; some edge scenarios may be negatively impacted. The service only attempts to improve the situation given the current limitations. Please report any bugs or observations here on this blog post. While we may not be able to fix every usage problem, we will try to offer best-effort support.
Side Effects may include:
Cache page churn - If the System File Cache's working set is too low and there is sustained cached read I/O, the memory manager may not be able to properly age pages. When forced to remove some pages in order to make room for new cache pages, the memory manager may inadvertently remove the wrong pages. This could result in cached page churn and decreased disk performance for all applications.
Version 1.0.0 - Initial Release
NOTE: The memory management algorithms in Windows 7 and Windows Server 2008 R2 operating systems were updated to address many file caching problems found in previous versions of Windows. There are only certain unique situations when you need to implement the Dynamic Cache service on computers that are running Windows 7 or Windows Server 2008 R2. For more information on how to determine if you are experiencing this issue and how to resolve it, please see the More Information section of Microsoft Knowledge Base article 976618 - You experience performance issues in applications and services when the system file cache consumes most of the physical RAM.
Welcome to the Microsoft NTDebugging blog! I’m Matthew Justice, an Escalation Engineer on Microsoft’s Platforms Critical Problem Resolution (CPR) team. Our team will be blogging about troubleshooting Windows problems at a low level, often by using the Debugging Tools for Windows. For more information about us and this blog, check out the about page.
To get things started I want to provide you with a list of tools that we’ll be referencing in our upcoming blog posts, as well as links to some technical documents to help you get things configured.
The big list of tools:
The following tools are part of the “Debugging Tools for Windows” – you’ll definitely need these
Sysinternals provides some great tools that we’ll be discussing
· Process Explorer
· Process Monitor
There are many tools contained in “MPS Reports” (MPSRPT_SETUPPerf.EXE), but I’m listing it here specifically for Checksym
“Windows Server 2003 Resource Kit Tools” is another great set of tools. In particular Kernrate is a part of that package
Windows XP SP2 Support Tools
· tracefmt (64-bit versions available in the DDK)
“Visual Studio” – in addition to the compilers and IDE, the following tools come in handy:
Perfwiz (Performance Monitor Wizard)
Userdump (User Mode Process Dumper)
Dheapmon (Desktop Heap Monitor)
§ Go to http://connect.microsoft.com/
§ Sign in with your passport account
§ Choose "Available Connections" on the left
§ Choose "Apply for Network Monitor 3.0” (once you've finished with the application, the selection appears in your "My Participation" page)
§ Go to the Downloads page (on the left side), and select the appropriate 32 or 64 bit build.
Some articles you may find useful:
Debugging Tools and Symbols: Getting Started
Boot Parameters to Enable Debugging
How to Generate a Memory Dump File When a Server Stops Responding (Hangs)
After installing the “Debugging Tools for Windows”, you’ll find two documents at the root of the install folder that are helpful:
· kernel_debugging_tutorial.doc - A guide to help you get started using the kernel debugger.
· debugger.chm - The help file for the debuggers. It details the commands you can use in the debugger. Think of this as a reference manual, rather than a tutorial.
Hello, this is Somak. Today I’d like to drop some Memory Manager info on the blog that I’ve used to communicate in brief (believe it or not) how the system deals with memory. If you are ever faced with checking how much Available Memory you have (or don’t have), poor system performance, questions about page faults, a Working Set Trimming performance issue, or if you just want a primer on how Windows manages memory on the system, be not afraid and read on!
How this memory stuff works
The fastest and most expensive memory is built into the CPU. The next fastest and less expensive is physical RAM. Next we have the hard drive, followed by remote storage or backup. Each step down the ladder adds significantly to access time. For example, physical RAM can be almost one million times faster than a hard disk. Since it is volatile and costs considerably more than hard disk space, we are limited in how much we can put in the system. This limit is far less than the much slower and non-volatile hard disk. Due to these design constraints we have a tiered memory structure in the computer. To achieve the fastest response times and best overall performance, an operating system must efficiently use this tiered memory structure. It must do its best to reduce the need to retrieve data from a slower storage medium like the hard disk. It has to do this while juggling the memory and I/O demand from all running processes. The following paragraphs are an overview of how the Memory Manager achieves this in Windows. A more detailed description can be found in Chapter 7 of Microsoft Windows Internals, Fourth Edition (ISBN: 0-7356-1917-4). This book is a great source for how stuff works in Windows.
First let’s lay out some definitions. They will be useful later on when I talk about the interactions. These definitions are high level to maintain brevity.
Virtual Memory – This is the range of memory addresses that an operating system can address. Regardless of the amount of physical RAM or hard drive space, this number is limited by your processor architecture. On a 32 bit processor you are limited to 4GB of addressable virtual memory (2^32). With a default installation on a 32 bit box (not using /3GB) the kernel reserves 2GB for itself. Applications are left with 2GB of addressable virtual memory. When applications execute they are only presented with this 2GB of addressable memory. Each application gets its own 2GB of virtual memory to play with. If you have 50 processes running, you’ll have 50 independent 2GB Virtual Memory address spaces and one 2GB Virtual Address space for the kernel. This is possible because virtual memory, true to its name, doesn't have to physically exist. We basically lie to the application and say, here is 2GB of memory address space for you to use. This memory isn’t allocated until the application explicitly uses it. Once the application uses a page, it becomes committed. A virtual memory page is then translated to a physical memory page. Through this translation, the virtual page can reside in physical RAM or on the hard disk.
Physical Memory – This is the physical storage media. It can be physical RAM, the hard disk, optical disks, tape backups, etc. This is anything that can store data. Most times when people talk about Physical Memory, they refer to physical RAM (the memory sticks on your motherboard), but with virtual page translation, physical memory can also be on the hard drive (in your paging file). Physical RAM is limited by your processor architecture.
Committed Memory – When an application touches a virtual memory page (reads/writes/programmatically commits) the page becomes a committed page. It is now backed by a physical memory page. This will usually be a physical RAM page, but could eventually be a page in the page file on the hard disk, or it could be a page in a memory mapped file on the hard disk. The memory manager handles the translations from the virtual memory page to the physical page. A virtual page could be located in physical RAM, while the page next to it could be on the hard drive in the page file.
Commit Limit – This is the maximum amount of memory that all your applications and the OS can commit. If you had 50 applications fully allocate their 2 GB of virtual address space, you would need 100GB of commit limit (ignore kernel memory usage to keep the numbers simple). So 100GB of committed pages can be backed by physical RAM or the hard drive. If you have 100 GB of RAM, you could handle this memory load. In most cases, 100 GB of RAM isn't economically feasible so the Commit Limit is comprised of physical RAM and the page file. If you have 2 GB of physical RAM and 98 GB of page file, then your commit limit would be 100 GB.
Page file – This is the storage area for virtual memory that has been committed. It is located on the hard drive. Since hard drive space is cheaper than physical RAM, it is an inexpensive way to increase the commit limit.
Working Set – This is the set of virtual memory pages (that are committed) for a process that are located in physical RAM. These pages fully belong to the process. A working set is like a "currently/recently working on these pages" list.
Modified pages - Once a virtual memory page leaves the process's working set, it is moved to another list. If the page has been modified, it is placed on the modified page list. This page is in physical RAM. A thread will then write the page to the page file and move it to the standby list.
Standby pages - This is a page that has left the process' working set. This page is in physical RAM. A standby page is like a cache for virtual memory pages. It is still associated with the process, but not in its working set. If the process touches the page, it is quickly faulted back into the working set. That page also has one foot out the door. If another process or cache needs more memory, the process association is broken and it is moved to the free page list. Most of the pages in available memory are actually standby pages. This makes sense when you realize that these pages can be quickly given to another process (hence available), but you should also understand that they are page caches for working sets and can be quickly given back if the process touches the page again. The vast majority of available memory is not wasted or empty memory.
Free pages - When a page is taken off of the standby page list, it is moved to the Free page list. This page is in physical RAM. These pages are not associated with any process. When a process exits, all of its pages are then dumped onto this list. Typically, there are few if any free pages hanging around in physical RAM.
Zeroed pages - When a free page is zeroed out, it is placed on the Zero page list. This page is in physical RAM. These are the pages that are given to processes that are making memory allocations. Due to C2 security requirements, all pages must be scrubbed before being handed to a new process. When the system is idle, a thread will scrub free pages and put them on this list. Only a small number of zero pages are required to handle the typical small memory allocations of processes. Once this list is depleted, and if there is demand for more pages, we pull pages off of the Free page list and scrub them on the fly. If the Free page list is depleted, then we pull pages off of the standby list, scrub them on the fly and hand them to the new process.
What is Task Manager telling me?
Prior to Windows Vista, Task Manager reports memory usage using accounting methods that you probably are not expecting. Because of these accounting practices, we rarely use Task Manager to gauge system performance and memory usage. We typically use it for a quick overview or to kill processes. I highly recommend using Performance Monitor (perfmon.msc) for investigating performance issues. Here's the breakdown of the numbers on the Performance tab:
Total - This is the total physical RAM installed in the system.
Available - This is the total of the Standby, Free and Zeroed lists. Free and Zeroed make sense, but Standby seems odd at first. Standby pages were added to this number because they can be quickly scrubbed and given to a new process. So they are technically available (with minimal effort).
System Cache - This is the total of the Standby list and the size of the system working set (which includes the file cache). Standby pages are added to this number because they are cached pages for working sets.
PF Usage - This is the total number of committed pages on the system. It does not tell you how many are actually written to the page file. It only tells you how much of the page file would be used if all committed pages had to be written out to the page file at the same time.
Total - This is the total virtual memory that has been committed. This includes all committed memory for all processes and the kernel.
Limit - This is the maximum amount of committed memory this system can handle. This is a combination of physical RAM and the page file.
Peak – This is the highest amount of memory committed thus far on this system, since boot.
How does this work?
The memory manager optimizes physical RAM usage across the entire system. Since physical RAM is a finite resource, it has to balance sharing this critical resource amongst all processes, the kernel and file I/O. It tries to keep disk I/O to a minimum, which results in a more responsive system. It does this by moving pages around to meet the demand of the system.
Typically, large sections of physical RAM are used for the file cache. This cache is necessary to improve disk performance. Without it, disk I/O would make the system crawl along at a nearly unusable pace. The file system cache is just like a working set for a process. Pages removed from the file cache are moved to the standby or modified page list. Many of the standby pages in RAM are probably file cache pages that were removed from its working set. For example, on a file server, if you see 8 GB of available memory, most of these pages are probably standby pages for the file cache. The file cache's working set could be 500 MB, but the 8 GB of standby pages should also be considered part of the file cache.
Now let's take a look at how the memory manager handles processes. While an application is working with its virtual memory pages, the memory manager keeps the pages in the process' working set. Since the vast majority of applications do not use all of their memory all the time, some pages will age. Old pages are removed from the working set. If they are modified, they are moved to the modified list. The page is saved to the page file and moved to the standby list. If the page hasn't been modified, it is moved directly to the standby list. These pages will remain on the standby page list until there is demand for them.
If the application touches the page again, it is soft faulted back into the process' working set. If the process doesn't use the page for a very long time, or if the demand for the page is greater elsewhere, the page is moved off of the standby list. It is disassociated from the process and moved to the free page list. From the free page list, the page is scrubbed on demand or lazily and placed on the zero page list. It is from the zero page list that other processes, the kernel, or the file cache will get a new page.
If after a very long time the application once again needs a page that is not in its working set, the memory manager will handle the memory fault. If the page is on the standby list, it is quickly put back into the process' working set. If the page is no longer on the standby list, a hard fault occurs. The memory manager issues I/O to the hard disk to read the page(s) from the page file. Once the I/O completes, the page is placed back into the process' working set.
All of this is done to keep physical RAM highly utilized and disk I/O to a minimum. We don't want to allow processes to hoard physical RAM for pages that are rarely used. The physical RAM must be shared with other processes, the kernel and the file cache. If you see lots of available memory on your system, rest assured that it is not going to waste. The vast majority is on standby lists for processes and the file cache.
Also note that page file usage isn't that bad. The page file allows the Memory Manager to save modified pages before placing the page on the standby list. The page is still in physical RAM and can be quickly faulted back into the process. This method gives the process a chance to reclaim an old page and it allows the page to be quickly used if there is demand elsewhere.
The best way to see the totals of these lists is to use a kernel debugger (live or postmortem). Use the !memusage command and you'll get an output like this:
0: kd> !memusage
loading PFN database
loading (100% complete)
Compiling memory usage data (99% Complete).
Zeroed: 414 ( 1656 kb)
Free: 2 ( 8 kb)
Standby: 864091 (3456364 kb)
Modified: 560 ( 2240 kb)
ModifiedNoWrite: 30 ( 120 kb)
Active/Valid: 182954 (731816 kb)
Transition: 2 ( 8 kb)
Bad: 0 ( 0 kb)
Unknown: 0 ( 0 kb)
TOTAL: 1048053 (4192212 kb)
Of the 4GB of physical RAM, only 1.6 MB are on Zeroed or free pages. 731 MB is in process, system and file cache working sets. 2 MB are on the modified page list. The vast majority, 3.4 GB, is on the standby list. On this server, most people will see 3.4 GB of wasted physical RAM, but you will know better.
What should I be worried about?
Typically you shouldn't be worried about these things, until you have a performance problem. If your system is sluggish or slow to respond or you are getting errors about out of memory, then you need to rely on this information. You will need to collect a performance monitor log of the problem time. If the counter list is daunting, then use the Performance Monitor Wizard to configure the performance monitor log.
Once the log is collected, you'll need to analyze several counters. I'm not going into detail about how to review performance monitor logs this time. I'll save that lengthy topic for another time. For now I'll focus on the counters relevant to the Memory Manager.
One of the biggest causes of slow performance and sluggish system responsiveness is a disk bottleneck. Look at the Physical Disk\% Idle Time, Avg. Disk sec/Read and Avg. Disk sec/Write counters. If your system drive is under 50% idle or your disk response times are way above your drive specifications, then you need to investigate further. Look at Memory\Available Mbytes. You should have a couple hundred Mbytes of Available Memory. This is one of the most important performance monitor counters. If this number drops too low, your standby lists, process working sets and cache will be greatly reduced. You'll need to find out if a process is consuming physical RAM. Check for large process working sets or for a large file cache.
You will also need to see if paging is really affecting system performance. Take a look at Memory\Pages Input/sec and correlate that to Physical Disk\Avg. Disk sec/Read. Pages Input/sec is the number of pages being read in from the page file. These are the hard faults (when the page wasn't on the standby list). If your Avg. Disk sec/Read is close to your drive's specification and the drive's idle time is high, then paging really isn't a problem. Small numbers of hard faults are expected, as applications will every once in a while re-touch an old page. As long as this I/O is not sustained and the disk can keep up, you probably will not notice this impact.
You can also look at Memory\Pages Output/sec and Physical Disk\Avg. Disk sec/Write. These are the writes to the page file that occur when a modified page is removed from a process' working set. As long as the disk can keep up with the writes, this shouldn't be a problem. Remember that once the page is saved to the page file, it is placed on the standby list. If there isn't a great demand for new pages, it can remain on the standby list for a very long time. When the process touches the old virtual page again, it can be soft faulted back into the working set. If there is great demand for memory, you'll see process working sets aggressively being trimmed. Unless there is memory pressure, this is all done with lazy I/O, so you should not see much of an impact from this activity.
The Memory Manager works to meet current demand and prepares for future demand when possible. You need to look at a performance monitor log to see if there is memory pressure on the system. You'll see this in low Available Mbytes and reductions in process working sets. You'll be able to correlate this to increased disk I/O to the page file. If you have established that there is memory pressure on the box, you need to figure out where that demand is coming from. Check for working set increases from processes, file cache or bottlenecked disk I/O to data drives.
Recently I had the opportunity to debug the clipboard in Windows, and I thought I’d share some of the things I learned. The clipboard is one of those parts of Windows that many of us use dozens (hundreds?) of times a day and don’t really think about. Before working on this case, I had never even considered how it works under the hood. It turns out that there’s more to it than you might think. In this first article, I’ll describe how applications store different types of data to the clipboard and how it is retrieved. My next post will describe how applications can hook the clipboard to monitor it for changes. In both, I’ll include debug notes to show you how to access the data from the debugger.
Let’s start by discussing clipboard formats. A clipboard format is used to describe what type of data is placed on the clipboard. There are a number of predefined standard formats that an application can use, such as bitmap, ANSI text, Unicode text, and TIFF. Windows also allows an application to specify its own formats. For example, a word processor may want to register a format that includes text, formatting, and images. Of course, this leads to one problem, what happens if you want to copy from your word processor and paste into Notepad, which doesn’t understand all of the formatting and pictures?
The answer is to allow multiple formats to be stored in the clipboard at one time. When I used to think of the clipboard I thought of it as a single object (“my text” or “my image”), but in reality the clipboard usually has my data in several different forms. The destination program gets a format it can use when I paste.
So how does this data end up on the clipboard? Simple: an application first opens the clipboard via the OpenClipboard function. Once it has done that, it can empty the clipboard, and take ownership of it, with EmptyClipboard. Finally, it is ready to place data on the clipboard using SetClipboardData. SetClipboardData takes two parameters; the first is the identifier of one of the clipboard formats we discussed above. The second is a handle to the memory containing the data in that format. The application can continue to call SetClipboardData for each of the various formats it wishes to provide, going from best to worst (since the destination application will select the first format in the list it recognizes). To make things easier for the developer, Windows will automatically provide converted formats for some clipboard format types. Once the program is done, it calls CloseClipboard.
When a user hits paste, the destination application will call OpenClipboard and one of these functions to determine what data format(s) are available: IsClipboardFormatAvailable, GetPriorityClipboardFormat, or EnumClipboardFormats. Assuming the application finds a format it can use, it will then call GetClipboardData with the desired format identifier as a parameter to get a handle to the data. Finally, it will call CloseClipboard.
Now let’s take a look at how you can find what data is being written to the clipboard using the debugger. (Note that all of my notes are taken from a Win7/2008 R2 system – so things might vary slightly in different versions of the OS.) Since the clipboard is part of Win32k.sys, you’ll need to use a kernel debugger. I like to use win32k!InternalSetClipboardData+0xe4 as a breakpoint. The nice thing about this offset is that it is right after we’ve populated the RDI register with data from SetClipboardData in a structure known as tagCLIP.
kd> u win32k!InternalSetClipboardData+0xe4-c L5
fffff960`0011e278 894360 mov dword ptr [rbx+60h],eax
fffff960`0011e27b 8937 mov dword ptr [rdi],esi
fffff960`0011e27d 4c896708 mov qword ptr [rdi+8],r12
fffff960`0011e281 896f10 mov dword ptr [rdi+10h],ebp
fffff960`0011e284 ff15667e1900 call qword ptr [win32k!_imp_PsGetCurrentProcessWin32Process (fffff960`002b60f0)]
kd> dt win32k!tagCLIP
+0x000 fmt : Uint4B
+0x008 hData : Ptr64 Void
+0x010 fGlobalHandle : Int4B
Here’s what a call to SetClipboardData from Notepad looks like:
Child-SP RetAddr Call Site
fffff880`0513a940 fffff960`0011e14f win32k!InternalSetClipboardData+0xe4
fffff880`0513ab90 fffff960`000e9312 win32k!SetClipboardData+0x57
fffff880`0513abd0 fffff800`01482ed3 win32k!NtUserSetClipboardData+0x9e
00000000`001dfad8 00000000`7792e494 USER32!ZwUserSetClipboardData+0xa
00000000`001dfae0 000007fe`fc5b892b USER32!SetClipboardData+0xdf
00000000`001dfb20 000007fe`fc5ba625 COMCTL32!Edit_Copy+0xdf
00000000`001dfb60 00000000`77929bd1 COMCTL32!Edit_WndProc+0xec9
00000000`001dfc00 00000000`779298da USER32!UserCallWinProcCheckWow+0x1ad
00000000`001dfcc0 00000000`ff5110bc USER32!DispatchMessageWorker+0x3b5
00000000`001dfd40 00000000`ff51133c notepad!WinMain+0x16f
00000000`001dfdc0 00000000`77a2652d notepad!DisplayNonGenuineDlgWorker+0x2da
00000000`001dfe80 00000000`77b5c521 kernel32!BaseThreadInitThunk+0xd
So here, we can dt RDI as a tagCLIP to see what was written:
kd> dt win32k!tagCLIP @rdi
+0x000 fmt : 0xd
+0x008 hData : 0x00000000`00270235 Void
+0x010 fGlobalHandle : 0n1
Fmt is the clipboard format. 0xd is 13, which indicates this data is Unicode text. We can’t just ‘du’ the value in hData, however, because this is a handle, not a direct pointer to the data. So now we need to look up the handle. To do that, we need to look at a win32k global structure – gSharedInfo:
kd> ? win32k!gSharedInfo
Evaluate expression: -7284261440224 = fffff960`002f3520
kd> dt win32k!tagSHAREDINFO fffff960`002f3520
+0x000 psi : 0xfffff900`c0980a70 tagSERVERINFO
+0x008 aheList : 0xfffff900`c0800000 _HANDLEENTRY
+0x010 HeEntrySize : 0x18
+0x018 pDispInfo : 0xfffff900`c0981e50 tagDISPLAYINFO
+0x020 ulSharedDelta : 0
+0x028 awmControl :  _WNDMSG
+0x218 DefWindowMsgs : _WNDMSG
+0x228 DefWindowSpecMsgs : _WNDMSG
aheList in gSharedInfo contains an array of handle entries; taking the last 2 bytes of hData, multiplying by the size of a handle entry, and adding that to the aheList base gives us the address of our handle entry:
kd> ?0x00000000`00270235 & FFFF
Evaluate expression: 565 = 00000000`00000235
kd> ?? sizeof(win32k!_HANDLEENTRY)
unsigned int64 0x18
kd> ? 0xfffff900`c0800000 + (0x235*0x18)
Evaluate expression: -7693351766792 = fffff900`c08034f8
kd> dt win32k!_HANDLEENTRY fffff900`c08034f8
+0x000 phead : 0xfffff900`c0de0fb0 _HEAD
+0x008 pOwner : (null)
+0x010 bType : 0x6 ''
+0x011 bFlags : 0 ''
+0x012 wUniq : 0x27
If we look in phead at offset 0x14, we’ll get our data (this offset may vary on different platforms):
kd> du fffff900`c0de0fb0 + 0x14
fffff900`c0de0fc4 "Hi NTDebugging readers!"
Let’s consider one other scenario. I copied some text out of Wordpad, and a number of SetClipboardData calls were made to accommodate different formats. The Unicode format entry looks like this:
Breakpoint 0 hit
fffff960`0011e284 ff15667e1900 call qword ptr [win32k!_imp_PsGetCurrentProcessWin32Process (fffff960`002b60f0)]
+0x008 hData : (null)
+0x010 fGlobalHandle : 0n0
hData is null! Why is that? It turns out that the clipboard allows an application to pass in null to SetClipboardData for a given format. This indicates that the application can provide the data in that format, but is deferring doing so until it is actually needed. Sure enough, if I paste the text into Notepad, which needs the text in Unicode, Windows sends a WM_RENDERFORMAT message to the WordPad window, and WordPad provides the data in the new format. Of course, if the application exits before populating all of its formats, Windows needs all of the formats rendered. In this case, Windows will send the WM_RENDERALLFORMATS message so other applications can use the clipboard data after the source application has exited.
That’s all for now. Next time we’ll look at how applications can monitor the clipboard for changes using two hooks. If you want to know more about using the clipboard in your code, this is a great reference.
Part 2 of this article can be found here: