
There's more to switching stacks than just loading a new stack pointer

Sometimes people think they can switch stacks by just loading a new value into the ESP register. This may seem to work but in fact it doesn't, because there is more to switching stacks than just loading a new value into ESP.

On the x86, the exception chain is threaded through the stack, and the exception dispatch code verifies that the exception chain is "sane" before dispatching an exception. If you summarily yank ESP into a location outside the stack the operating system assigned to the thread, then the exception chain will appear to be corrupted, and once the exception dispatch code notices this, it will declare your program to be unrecoverably corrupted. It can't even raise an exception to indicate that this has happened, even if it wanted to, because it doesn't even know where the exception handlers are!
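To make the idea of a "sane" chain concrete, here is a toy model in portable C. It is not the operating system's dispatcher code; the names Registration, CHAIN_END, and chain_is_sane are made up for illustration, and the exact checks are an assumption. The sketch requires each record to lie inside the stack range assigned to the thread, to be pointer-aligned, and to sit at a higher address than the previous record (that last rule is this sketch's own choice, so that the walk always terminates).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an x86 exception registration record. */
typedef struct Registration {
    struct Registration *next;  /* next-outer record, or CHAIN_END */
    void *handler;              /* handler address (unused in this model) */
} Registration;

/* Hypothetical end-of-chain marker for this model. */
#define CHAIN_END ((Registration *)(uintptr_t)-1)

/* Returns 1 if every record in the chain lies inside
   [stack_limit, stack_base), is pointer-aligned, and is at a higher
   address than the record before it. Returns 0 otherwise. */
int chain_is_sane(const Registration *head,
                  uintptr_t stack_limit, uintptr_t stack_base)
{
    const Registration *r = head;
    uintptr_t prev = 0;

    while (r != CHAIN_END) {
        uintptr_t addr = (uintptr_t)r;

        /* A record outside the thread's assigned stack makes the
           whole chain look corrupted. */
        if (addr < stack_limit || addr + sizeof *r > stack_base)
            return 0;
        if (addr % sizeof(void *) != 0)
            return 0;
        if (addr <= prev)
            return 0;

        prev = addr;
        r = r->next;
    }
    return 1;
}
```

In this model, a record pushed after ESP has been moved to memory outside the assigned range fails the first check, which is the situation the paragraph above describes.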

There are other parts of the system that rely on the stack pointer remaining inside the correct stack. For example, the code that expands the stack on demand needs to know where the stack is and how big it can get. (And the ia64 architecture has two stack pointers.) If a part of the system needs to do work with those values and it notices that the real stack pointer is "in la-la land", it will start taking drastic measures (typically by terminating the program).

If you want to switch stacks, use a fiber. Fibers provide a way to capture the state of a computation, which includes the instruction pointer and the stack.

Published Friday, February 15, 2008 7:00 AM by oldnewthing

Comments

# re: There's more to switching stacks than just loading a new stack pointer

Friday, February 15, 2008 11:28 AM by Tom

My impression was that fibers were unofficially deprecated. This, of course, does not imply that they are not useful. From my study of SQL Server 2005 <http://msdn2.microsoft.com/en-us/library/ms191135.aspx>, it would seem that using fibers in combination with asynchronous I/O might be beneficial, but I would have trouble justifying implementing fibers when I/O completion ports already seem to give very good concurrency.

SQL Server does, however, give you the option to enable Fibers during execution.

# re: There's more to switching stacks than just loading a new stack pointer

Friday, February 15, 2008 1:09 PM by nksingh

@Tom:

Fibers probably will not ever be removed from the OS, but they are usually a lot more trouble than they are worth (unless you're already going to the trouble of munging esp directly). I can't imagine what awful breakage Raymond had to look at to find this problem...

The difficulty with fibers is that some components of the OS store state in the TEB of your thread. If code running in a fiber expects to be able to store unique data in the TEB, two fibers on the same thread could trample each other's TEB state. Kernel threads also build up a bit of state that might be relevant to user programs, and this might get messed up if you move a fiber from one thread to another.

# re: There's more to switching stacks than just loading a new stack pointer

Friday, February 15, 2008 2:04 PM by Nathan_works

Am I the only one who finds it strange that the guy wants to swap stacks to handle a stack overflow exception? How about finding the infinite recursion and fixing his code, rather than fixing the symptom of his bad code?

# re: There's more to switching stacks than just loading a new stack pointer

Friday, February 15, 2008 10:15 PM by Doug

@Nathan_works:

It'd definitely be better if he could fix his code rather than write a crash handler. However, writing a crash handler is either an admission of defeat on that front, or a back-stop debugging aid.

Other platforms (and embedded/OS programming) support ways of switching stacks to handle the case of stack overflows. For example, Unix supports such a mechanism, though as you imply, the wisdom of it is perhaps dubious, and it's fraught with all sorts of dangers. And of all the uses of alternate signal stacks that I can think of, crash handlers are the "least bad" use. If Windows' POSIX subsystem supported alternate signal stacks even for stack overflows, then you might be able to use a similar mechanism, but I believe it isn't supported. This article says that sigaltstack isn't supported by Services for Unix and has no equivalent: http://technet.microsoft.com/en-us/library/bb497013.aspx

Anyway, it'd be better if people didn't have to or want to write crash handlers, but switching stacks is pretty much required if you want a crash handler to work in the case of a stack overflow.
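For readers who have not seen the Unix mechanism mentioned above, here is a minimal POSIX sketch of it, not Windows code. It registers an alternate signal stack with sigaltstack and asks for the SIGSEGV handler to run on it via SA_ONSTACK. The names alt_stack, on_segv, and install_overflow_handler, the 64 KB buffer size, and the message text are all made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Memory the handler runs on; the size is an arbitrary choice. */
static char alt_stack[64 * 1024];

static void on_segv(int sig)
{
    static const char msg[] = "fatal: SIGSEGV (possibly a stack overflow)\n";
    ssize_t n;

    (void)sig;
    /* Only async-signal-safe calls here: write() and _exit(), not printf(). */
    n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;
    _exit(1);
}

/* Returns 0 on success, -1 on failure. */
int install_overflow_handler(void)
{
    stack_t ss;
    struct sigaction sa;

    /* Tell the system where the alternate stack lives. */
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    /* SA_ONSTACK makes the handler run on the alternate stack, so it can
       still run after the normal stack has been exhausted. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sa.sa_flags = SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

The handler only reports and exits; it makes no attempt to recover, which matches the "crash handler" use described above.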

# re: There's more to switching stacks than just loading a new stack pointer

Saturday, February 16, 2008 9:45 AM by Igor Levicki

I understand that some people want to show a message when their application crashes because of a stack overflow, instead of leaving the user baffled by the application window suddenly disappearing without a trace.

Raymond already explained that the OS cannot do that for you because the application has lost the stack and the OS would want to run the error message in the context of your application.

If the OS had its own thread and stack for displaying application errors which worked for _any_ application error (stack overflow included), then nobody would have to even think about writing such hacks.

# On second thought...

Saturday, February 16, 2008 12:05 PM by Igor Levicki

Could this code work?

#include <stdio.h>
#include <windows.h>

/* Small local array: fits comfortably on the stack. */
void no_overflow(void)
{
    int dummy[1024];
    dummy[1023] = 0;
}

/* 1 MB local array: large enough to overflow a default-sized stack. */
void overflow(void)
{
    int dummy[262144];
    dummy[262143] = 0;
}

long __stdcall Filter(struct _EXCEPTION_POINTERS *ep)
{
    DWORD Code;
    PVOID Address;
    DWORD *esp;

    Code = ep->ExceptionRecord->ExceptionCode;
    Address = ep->ExceptionRecord->ExceptionAddress;
    esp = (DWORD*)ep->ContextRecord->Esp;   /* 32-bit x86 only */

    if (Code == STATUS_STACK_OVERFLOW) {
        printf("Stack overflow at address 0x%p\n", Address);
        /* Guess: treat stack slots as return addresses and back up over
           a 5-byte call instruction. */
        printf("Called from %p\n", (void*)(esp[1] - 5));
        printf("Called from %p\n", (void*)(esp[3] - 5));
        printf("Terminating execution");
        return EXCEPTION_EXECUTE_HANDLER;
    } else {
        return EXCEPTION_CONTINUE_SEARCH;
    }
}

int main(int argc, char *argv[])
{
    SetUnhandledExceptionFilter(Filter);
    no_overflow();
    overflow();
    return 0;
}

It is just a quick hack. It works for this simple test case, but I am not sure whether it would work in a debug build or in a complex application, and it most definitely wouldn't work in 64-bit mode without changes.

[It depends on your definition of "work". Most of the time, you can't assume that esp[1]-5 is the return address and that you aren't re-entering printf. There may be other problems; those are just the ones that I noticed right off the bat. -Raymond]

# Update...

Saturday, February 16, 2008 12:31 PM by Igor Levicki

It works if alloca causes the stack overflow. It doesn't work for stray pointers that hit the stack beyond the guard page.

MSDN says: "The exception handler specified by lpTopLevelExceptionFilter is executed in the context of the thread that caused the fault. This can affect the exception handler's ability to recover from certain exceptions, such as an invalid stack."

# re: There's more to switching stacks than just loading a new stack pointer

Saturday, February 16, 2008 7:07 PM by MadQ

This is just the kernel of an idea... if you're expecting a stack overflow, how about converting the thread to a fiber to begin with, then copying the EXCEPTION_POINTERS data to a pre-allocated location, and switch fibers in the exception handler?

I haven't really thought this through yet, but it could be worth considering.

# re: There's more to switching stacks than just loading a new stack pointer

Saturday, February 16, 2008 11:14 PM by Igor Levicki

I have experimented a bit with this and here is what I have found -- catching a stack overflow generated by alloca() is not a problem. Sure, Raymond has a point about esp[1]-5 and about re-entering printf() (I never said it was good anyway), but that is not the real problem.

The problem is catching the stack overflow caused by accessing the stack beyond the guard page by a stray pointer.

Since the exception handler executes in the context of the thread that caused the exception, the process gets terminated and you don't even get the chance to catch the exception.

If I understand all of this correctly, the only way to catch the _real_ stack overflow would be to launch a process that monitors the thread in some way and make that process dump the context before the thread gets terminated. Obviously, that is exactly what any debugger can do.

I would be very interested to hear of some more lightweight approach to this problem.

The more I think of the whole concept of exception handling, the more it seems to me like an arbitrary and rather poor design. It may be inherited from the underlying x86 architecture, but I still believe that the whole winding/unwinding thing with pointers to exception structures intermingled with the random bits of data on the stack doesn't even remotely rhyme with the word "reliable".

I remember the Motorola 68000 CPU, where you could directly modify the CPU exception handler pointers (those were called traps), so if your program did something bad you were sure to catch it. In my Amiga 500 I had a 68010, which differed from the 68000 in that it didn't allow unaligned word accesses to memory. Some programs and games refused to work. My workaround was to write an exception handler that could fit into the boot sector. It intercepted the alignment fault and emulated access to odd addresses by using byte accesses instead of word accesses. If only handling stack overflow on x86 were that easy.

By the way, Motorola also had two stack pointers -- supervisor and user stack pointer both using the same register A7.

# Great link on debugging...

Sunday, February 17, 2008 12:02 AM by Igor Levicki

Anyone interested in some advanced application debugging should see this:

http://www.debuginfo.com/examples.html
