Heap corruption is very bad since it means that memory in the process has been smashed (overwritten). This typically occurs when an application allocates a block of heap memory of a given size and then writes to memory addresses beyond the requested size of the block. Heap corruption can also occur when an application writes to a block of memory that has already been freed. Such corruption can cause misbehavior and even crashes.

 

The symptoms of corruption may not appear for a while, may never appear, or may appear very often, depending upon how significant the corruption is. The corruption may not manifest until a software update of some sort is installed (examples would be service packs and hot fixes to the OS, the hosting application or the end application), and it can even manifest due to a hardware change or a device driver change.

One way to check for heap corruption is to use gflags.exe in combination with a debugger (you can use ADPlus, which will attach a debugger for you) to take a dump. This check will not catch 100% of corruptions; however, it works fairly well. Using gflags for this type of check will cause an application (for example, w3wp.exe, the IIS worker process for a web application) to crash when there is heap corruption. When a debugger (such as the one ADPlus uses) is attached to the process and a corruption is detected, a crash dump will be created which can be used to find which module (dll, etc.) caused the corruption. There was a similar program in the past called pageheap.exe; however, its functionality has been merged into gflags.exe.

 How heap corruption detection works:

  • Corruptions in heap blocks are discovered by either placing a non-accessible page at the end of the allocation, or by checking fill patterns when the block is freed.
  • There are two heaps (full-page heap and normal page heap) for each heap created within a process that has page heap enabled.
    • Full-page heap reveals corruptions in heap blocks by placing a non-accessible page at the end of the allocation. The advantage of this approach is that you achieve "sudden death," meaning that the process will raise an access violation (AV) exactly at the point of failure. This behavior makes failures easy to debug. The disadvantage is that every allocation uses at least one page of committed memory. For a memory-intensive process, system resources can be quickly exhausted.
    • Normal page heap can be used in situations where memory limitations render full-page heap unusable. It checks fill patterns when a heap block is freed. The advantage of this method is that it drastically reduces memory consumption. The disadvantage is that corruptions will only be detected when the block is freed. This makes failures harder to debug.
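
To make the fill-pattern idea concrete, here is a minimal sketch in C of the kind of check normal page heap performs when a block is freed. It is only an illustration and not how the Windows heap manager is actually implemented: the names guarded_alloc and guarded_free, the 8-byte suffix and the 0xA5 pattern value are all assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical values for illustration only; the real page heap uses
   its own internal layout and fill patterns. */
#define SUFFIX_LEN  8     /* bytes of fill pattern placed after the user data */
#define SUFFIX_BYTE 0xA5  /* assumed pattern value */

/* Small header stored in front of the user data so that the free
   routine knows where the suffix pattern starts. */
typedef struct {
    size_t user_size;     /* size the caller asked for */
} block_header;

/* Allocate user_size bytes followed by SUFFIX_LEN bytes of fill pattern.
   Returns a pointer to the user data, or NULL on failure. The returned
   pointer is only aligned for byte access in this sketch. */
void *guarded_alloc(size_t user_size)
{
    block_header *h = malloc(sizeof *h + user_size + SUFFIX_LEN);
    if (h == NULL)
        return NULL;
    h->user_size = user_size;
    unsigned char *user = (unsigned char *)(h + 1);
    memset(user + user_size, SUFFIX_BYTE, SUFFIX_LEN);
    return user;
}

/* Free a block obtained from guarded_alloc.
   Returns 0 if the suffix pattern is intact, or -1 if it was overwritten
   (the caller wrote past the end of the block at some earlier point). */
int guarded_free(void *ptr)
{
    assert(ptr != NULL);
    block_header *h = (block_header *)ptr - 1;
    const unsigned char *suffix = (const unsigned char *)ptr + h->user_size;
    int status = 0;
    for (size_t i = 0; i < SUFFIX_LEN; i++) {
        if (suffix[i] != SUFFIX_BYTE) {
            status = -1;  /* pattern damaged: overrun detected */
            break;
        }
    }
    free(h);
    return status;
}
```

With this sketch, writing 17 bytes into a block obtained from guarded_alloc(16) goes unnoticed at the time of the write, and guarded_free only reports the damage later by returning -1. That delay is the reason normal page heap failures are harder to debug than full-page heap failures, where the write itself hits the non-accessible page.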

For IIS:

Enable pageheap corruption checking using the following command:

gflags.exe -p /enable w3wp.exe /full

Now, recycle the Application Pool associated with the application. Be sure that you don't recycle the others or they will pick up the settings since the settings for w3wp.exe will stay in effect until turned off. 

Now, set up for a crash:

adplus -crash -p <pid>

Note:   You will need to know the process ID for the process running your web application.

 

After the crash, reset the setting:

gflags.exe -p /disable w3wp.exe

   Note:   You will need to recycle any application pools restarted after the /enable command was used in order for the /disable to take effect.

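The /full switch in the enable command above selects full-page heap checking on w3wp.exe:

gflags.exe -p /enable w3wp.exe /full

Omitting /full enables normal page heap instead, which uses far less memory.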

For Outlook:

Enable pageheap corruption checking using the following command:

  gflags.exe -p /enable outlook.exe /full

 

Make sure Outlook is not already running.

Now, set up for a crash:

adplus -crash -sc "C:\Program Files\Microsoft Office\Office12\outlook.exe"

Note: This will launch Outlook and attach to it. If the debugger were attached to a process that was already running, the memory corruption might already have happened. The path for Outlook may vary depending upon the Outlook version, OS and install options.

After the crash, reset the setting:

gflags.exe -p /disable outlook.exe

Note:   You will need to restart Outlook after the /disable command is used in order for the /disable to take effect.

Additional commands:

You can use the following command line to see if page heap checking is enabled:

gflags.exe -p

 

To see a list of available commands:

gflags.exe -h

When heap checking is enabled for an image such as notepad.exe, the settings are stored in a subkey named after the image under the following registry key:

 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options

 

System-Wide Page Heap Checking:

Turning on system-wide checking is sometimes needed when an issue is being caused by something in kernel mode (such as a device driver). As an example, GDI runs in kernel mode and is used by GDI+ in user mode, which is in turn used by Outlook. So, if heap corruption occurs in a video driver, the corruption can manifest in an Outlook add-in dialog window.

 

The GFlags tool is used to enable system-wide page heap. In order for a GFlags command to take effect, you must restart your computer after you issue the command.

To enable system-wide normal page heap:

1. Type the following at the command line: gflags -r +hpa

2. Restart your computer.

To disable system-wide normal page heap:

1. Type the following at the command line: gflags -r -hpa

2. Restart your computer.

The dump File(s):

Dumps will be created each time an access violation occurs. This means that one or more dumps will be generated. The dump with "1st_chance_CONTRL_C_OR_Debug_Break__mini" in its name is the one which will have the detected heap corruption in it. If you load the dump using windbg, you should be able to see where the access violation occurred by using the "!analyze -v" command.

You should also check the first chance dump files (the ones with "_1st_chance_AccessViolation__mini" in their name) to see if the caught exceptions are in the same area as the uncaught exception (they may provide further clues). In one case, a customer saw flashing red crosses in a dialog for a custom add-in in Outlook and Outlook was hanging. Heap corruption was detected in the add-in. However, the first chance access violations caught in the dumps taken earlier in the same run showed access violations in GDI+. This led to two areas to check out: one being the add-in and the other the video driver. The cause turned out to be the video driver.

 

Here is an example of dumps created when heap checking is enabled:

   

For more information on heap corruptions:

Managing Heap Memory in Win32

http://msdn.microsoft.com/en-us/library/ms810603.aspx

 

What a Heap of ... (Part One)

http://blogs.technet.com/askperf/archive/2007/06/29/what-a-heap-of-part-one.aspx    

 

What a Heap of ... (Part Two)

http://blogs.technet.com/askperf/archive/2007/06/29/what-a-heap-of-part-two.aspx

 

How to use Pageheap.exe in Windows XP, Windows 2000, and Windows Server 2003

http://support.microsoft.com/kb/286470  

 

Example 12: Using Page Heap Verification to Find a Bug

http://msdn.microsoft.com/en-us/library/ff543097.aspx