Many of you SQL folks out there have extended stored procedures installed and being used in your SQL Server. Most often this is installed by some 3rd party applications like backup applications or linked server providers. There are times when some custom business logic is implemented in XPROC’s and hence these are loaded inside the SQL Server process address space.
I am going to talk about something that not many people know about, but which does play a role in SQL Server, hence a DBA/DB-Developer must be aware of this. This scenario was brought to me by one of our customers and it goes something like this… Scenario
I have an extended stored procedure called xp_customproc which is written in C/C++. During the course of migrating from SQL 2005 to SQL 2008, we noticed that we were getting Access Violation’s (AV) when running this xp_customproc. Here is the environment with the results of our testing.
SQL Server Version
SQL Server 2005
Windows Server 2003
Windows Server 2008
SQL Server 2008
Whenever it failed with an AV, a mini-dump was captured by SQL Server and the following error was reported back to the client.
Msg 109, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server. (Provider: Shared Memory Provider, error: 0 - The pipe has been ended.)
Not only this, once this happened our SQL Server used to crash with the following entry in the event log.
Faulting application sqlservr.exe, version 2007.100.1600.22, time stamp 0x4875735e, faulting module ntdll.dll, version 6.0.6001.18000, time stamp 0x4791a7a6, exception code 0xc0000374, fault offset 0x000b015d, process id 0xa10, application start time 0x01cb7556b688c75b.
When I looked into the mini-dump, I noticed the following call stack which had generated the exception.
77a30d68 ntdll!RtlReportCriticalFailure+0x2e 77a30e56 ntdll!RtlpReportHeapFailure+0x21 779f0531 ntdll!RtlpLogHeapFailure+0xa1 779f0531: ntdll!RtlFreeHeap+0x00000060 7623c56f: kernel32!HeapFree+0x00000014 544f3228: xp_popup!xp_func1+0x00000e58 544f2279: xp_popup!xp_customproc+0x00000959
If you notice the highlighted entry in the stack above, it is talking about freeing a memory segment called “Heap”. Here are my conclusions after looking into the exception in detail.
Q: Why is SQL Server crashing?
Ans: Heap Corruption (NT_HEAP_CORRUPTION) is a specific type of memory corruption which will cause a process termination. Since xp_customproc.dll is loaded inside SQL Server process, any goof-up which happens here will affect the parent process, which in this case is SQLServr.exe. This is the reason for the AV and subsequent SQL Server crash.
Q: Corruption, what type of corruption?
Ans: Double Free which means, trying to free an already free heap. i.e. trying to free an already free block of memory. You can read more about this here à http://support.microsoft.com/kb/286470
Q: Who is corrupting the heap?
Ans: Obvious, xp_customproc.dll itself.
Q: Who is creating these heaps?
Ans: xp_customproc.dll itself is creating these heaps. To confirm this you need to talk to the developer/vendor who wrote the custom xproc code.
Now that we have ascertained what happened and why, the obvious question in your mind is this,
Q: Why does this happen only with SQL 2008 on Windows 2008? Ans: Windows 2008 (actually Windows Vista onwards) introduced a lot of changes to the NT Heap Manager. The Heap Manager is smarter and is more capable of detecting heap corruption and better at heap memory management. By default, now it has the capability to detect certain types of heap corruption and whenever ANY PROCESS runs into this, it is detected and terminated. This is what happened to SQL Server as well.
Whenever Windows does this, it will automatically capture a dump of the parent process and it is stored in the following location along with the process name. C:\ProgramData\Microsoft\Windows\WER\ReportQueue\*.mdmp or *.dmp
For E.g., C:\ProgramData\Microsoft\Windows\WER\ReportQueue\AppCrash_sqlservr.exe_7561dcdfc5bdc1befcdfd7575b83ae72c949f1d3_cab_92e2688f
SQL Server 2008 is one of the applications which makes use of the new heap manager since it offers a lot more types of heaps and better performance (like LFH heaps). This is the reason why you only see it with this combination.
Is this a good thing?
YES! I would want any kind of memory corruption detected and captured rather than let the process continue. It can lead to data corruption, data loss, incorrect results and all other sorts of problems, if the process continues to run.
Okay smart guy, but why don’t I see this on Windows 2003?
Like I said earlier, this default behavior was introduced starting with Vista and for you to see this behavior on Windows 2003, you need to enable an option called PageHeap which turns on this extensive heap detection mechanism. This can be done following this KB article,
How to use Pageheap.exe in Windows XP, Windows 2000, and Windows Server 2003 http://support.microsoft.com/kb/286470
You can use GFlags.exe that is available with WinDbg to enable this option using the GUI. Here is an example for sqlservr.exe.
Note: This is not something to have turned ON always for sqlservr.exe since enabling PageHeap means extensive tracking of memory allocations and will affect performance. So never do this unless guided by MS Support. I have mentioned this here purely to help you understand what was happening behind the scenes.
Apart from the mini-dump, Windows also logs a specific exception code in the event log to help you identify what has occurred.
Windows Error Reporting 1001 None "Fault bucket, type 0 Event Name: APPCRASH Error 10/5/2010 2:30:38 PM Application Error 1000 (100) "Faulting application name: sqlservr.exe, version: 2009.100.1600.1, time stamp: 0x4bb6b40b Faulting module name: ntdll.dll, version: 6.1.7600.16385, time stamp: 0x4a5be02b Exception code: 0xc0000374 0xc0000374 is a new error code added in Windows Vista+ and above for reporting heap corruption. Exception Code: 0xc0000374 (STATUS_HEAP_CORRUPTION) Description: A heap has been corrupted.
For those of you lost with all this technical mumbo-jumbo, let me explain this in very simplistic terms using the following diagram. This time the Xproc is called “xp_popup.dll”.
Simplistic Overview of the Problem
Q: All Right, I get it. Now what?
Ans: Since the true problem is happening in your custom code in xp_customproc.dll, you will have to debug this to see where the problem lies. This means talking to the vendor or developer who wrote the code in the extended stored procedure. If they have a support team, call away.
Sample Test Code for Heap Corruption
For those who are wondering what heap corruption is, here is a sample C++ code I wrote to show you what is wrong,
int _tmain (int argc, _TCHAR* argv)
printf("Press any key to start\n");
BYTE* pByte=(BYTE*) HeapAlloc(GetProcessHeap(), 0, 10);
HeapFree(GetProcessHeap(), 0, pByte);
HeapFree(GetProcessHeap(), 0, pByte); /* Doing Free twice */
If you run this with PageHeap enabled using GFlags, you will notice that windows caught the heap corruption and terminates the process. Something similar is happening to the SQL Xproc’s.
Hope this was informative. As always, stay tuned for more. Cheers!
Regards, Sudarshan Narasimhan Technical Lead, Microsoft SQL Server CSS