Larry Osterman's WebLog

Confessions of an Old Fogey

Structured Exception Handling Considered Harmful


I could have sworn that I wrote this up before, but apparently I’ve never posted it, even though it’s been one of my favorite rants for years.

In my “What’s wrong with this code, Part 6” post, several of the commenters indicated that I should be using structured exception handling to prevent the function from crashing.  I couldn’t disagree more.  In my opinion, SEH, if used for this purpose, takes simple, reproducible, easy-to-diagnose failures and turns them into subtle, hard-to-debug corruptions.

By the way, I’m far from being alone on this.  Joel Spolsky has a rather famous piece “Joel on Exceptions” where he describes his take on exceptions (C++ exceptions).  Raymond has also written about exception handling (on CLR exceptions).

Structured exception handling is in many ways far worse than C++ exceptions.  There are multiple ways that structured exception handling can truly mess up an application.  I’ve already mentioned the guard page exception issue.  But the problem goes further than that.  Consider what happens if you’re using SEH to ensure that your application doesn’t crash.  What happens when you have a double free?  If you don’t wrap the function in SEH, then it’s highly likely that your application will crash in the heap manager.  If, on the other hand, you’ve wrapped your functions with try/except, then the crash will be handled.  But the problem is that the exception caused the heap code to blow past the release of the heap critical section – the thread that raised the exception still holds the heap critical section. The next attempt to allocate memory on another thread will deadlock your application, and you have no way of knowing what caused it.

The example above is NOT hypothetical.  I once spent several days trying to track down a hang in Exchange that was caused by exactly this problem – because a component in the store didn’t want to crash the store, they installed a high level exception handler.  That handler caught the exception in the heap code, and swallowed it.  And the next time we came in to do an allocation, we hung.  In this case, the offending thread had exited, so the heap critical section was marked as being owned by a thread that no longer existed.
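The failure mode does not need the real heap manager to reproduce.  Here is a rough sketch in portable C++, with standard exceptions and a toy lock standing in for SEH and the heap critical section.  Every name in it (ToyLock, fake_heap_lock, fake_heap_free, swallow_and_continue, next_allocation_would_hang) is made up for illustration and is not how the Windows heap is actually implemented.  It only shows the shape of the bug: the exception skips the release, the handler swallows it, and the lock stays held.

```cpp
#include <atomic>
#include <stdexcept>

// Toy non-recursive lock standing in for the heap critical section (hypothetical).
class ToyLock {
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_acquire() { return !held_.exchange(true); }
    void release() { held_.store(false); }
private:
    std::atomic<bool> held_{false};
};

static ToyLock fake_heap_lock;

// Models a heap routine that notices corruption while holding its lock.
// It was never written to be unwound through: the throw skips release().
void fake_heap_free(bool corrupted) {
    while (!fake_heap_lock.try_acquire()) {
        // spin; a real critical section would block here
    }
    if (corrupted) {
        throw std::runtime_error("heap corruption detected");
    }
    fake_heap_lock.release();
}

// The "robustness" wrapper: swallow everything and keep running.
// Returns true if an exception was swallowed.
bool swallow_and_continue(bool corrupted) {
    try {
        fake_heap_free(corrupted);
        return false;
    } catch (...) {
        return true;  // the process survives, but fake_heap_lock is still held
    }
}

// Reports whether the next caller of fake_heap_free would wait forever.
bool next_allocation_would_hang() {
    if (fake_heap_lock.try_acquire()) {
        fake_heap_lock.release();
        return false;
    }
    return true;
}
```

Calling swallow_and_continue(true) returns normally, which is exactly the problem: nothing crashes, and next_allocation_would_hang() then reports that the lock is never coming back.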

Structured exception handling also has performance implications.  Structured exceptions are considered “asynchronous” by the compiler – any instruction might cause an exception.  As a result, the compiler can’t perform flow analysis in code protected by SEH, so it disables many of its optimizations in routines protected by try/except (or try/finally).  This does not happen with C++ exceptions, by the way, since C++ exceptions are “synchronous” – the compiler knows if a method can throw (or rather, the compiler can know if a method will not throw).

One other issue with SEH was discussed by Dave LeBlanc in Writing Secure Code, and reposted in this article on the web.  SEH can be used as a vector for security bugs – don’t assume that because you wrapped your function in SEH that your code will not suffer from security holes.  Googling for “structured exception handling security hole” leads to some interesting hits.

The bottom line is that once you’ve caught an exception, you can make NO assumptions about the state of your process.  Your exception handler really should just pop up a fatal error and terminate the process, because you have no idea what’s been corrupted during the execution of the code.

At this point, people start screaming: “But wait!  My application runs 3rd party code whose quality I don’t control.  How can I ensure 5 9’s reliability if the 3rd party code can crash?”  Well, the simple answer is to run that untrusted code out-of-proc.  That way, if the 3rd party code does crash, it doesn’t kill YOUR process.  If the 3rd party code crashes while processing a request, then the individual request fails, but at least your service didn’t go down in the process.  Remember – if you catch the exception, you can’t guarantee ANYTHING about the state of your application – it might take days for your application to crash, thus giving you a false sense of robustness, but…

 

PS: To make things clear: I’m not completely opposed to structured exception handling.  Structured exception handling has its uses, and it CAN be used effectively.  For example, all NT system calls (as opposed to Win32 APIs) capture their arguments in a try/except handler.  This is to guarantee that the version of the arguments to the system call that is referenced in the kernel is always valid – there’s no way for an application to free the memory on another thread, for example.

RPC also uses exceptions to differentiate between RPC initiated errors and function return calls – the exception is essentially used as a back-channel to provide additional error information that could not be provided by the remoted function.

Historically (I don’t know if they do this currently) the NT file-systems have also used structured exception handling extensively.  Every function in the file-systems is protected by a try/finally wrapper, and errors are propagated by throwing exceptions.  This way, if any code DOES throw an exception, every routine in the call stack has an opportunity to clean up its critical sections and release allocated resources.  And IMHO, this is the ONLY way to use SEH effectively – if you want to catch exceptions, you need to ensure that every function in your call stack also uses try/finally to guarantee that cleanup occurs.
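To make the “cleanup on every frame” discipline concrete, here is a small sketch in standard C++, where a destructor plays the role that a try/finally block plays in SEH code.  This is an analogy rather than the file-system code itself: FsLock, ScopedRelease, volume_lock, file_lock and the three functions are all hypothetical names.  Note also that under MSVC, whether destructors run for structured (as opposed to C++) exceptions depends on the exception handling model the code is compiled with.

```cpp
#include <atomic>
#include <stdexcept>

// Toy lock standing in for a file-system resource (hypothetical).
class FsLock {
public:
    bool try_acquire() { return !held_.exchange(true); }
    void release() { held_.store(false); }
    bool is_held() const { return held_.load(); }
private:
    std::atomic<bool> held_{false};
};

// Releases the lock when the scope exits, whether normally or by unwinding.
// This is the standard C++ counterpart of a try/finally block.
class ScopedRelease {
public:
    explicit ScopedRelease(FsLock& lock) : lock_(lock) {}
    ~ScopedRelease() { lock_.release(); }
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;
private:
    FsLock& lock_;
};

static FsLock volume_lock;
static FsLock file_lock;

// Innermost frame: takes its own lock and may fail.
void inner_operation(bool fail) {
    if (!file_lock.try_acquire()) throw std::logic_error("file_lock already held");
    ScopedRelease cleanup(file_lock);
    if (fail) throw std::runtime_error("simulated disk error");
}

// Outer frame: it does not handle the error, but it still cleans up after itself.
void outer_operation(bool fail) {
    if (!volume_lock.try_acquire()) throw std::logic_error("volume_lock already held");
    ScopedRelease cleanup(volume_lock);
    inner_operation(fail);
}

// Top of the stack: by the time the handler runs, every frame has released its lock.
// Returns true on success, false if the operation failed.
bool run_and_report(bool fail) {
    try {
        outer_operation(fail);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

After run_and_report(true) returns false, both locks are free again, so catching at the top is only safe because no frame in between skipped its cleanup.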

Also, to make it COMPLETELY clear.  This post is a criticism of using C/C++ structured exception handling as a way of adding robustness to applications.  It is NOT intended as a criticism of exception handling in general.  In particular, the exception handling primitives in the CLR are quite nice, and mitigate most (if not all) of the architectural criticisms that I’ve mentioned above – exceptions in the CLR are synchronous (so code wrapped in try/catch/finally can be optimized), the CLR synchronization primitives build exception unwinding into the semantics of the exception handler (so critical sections can’t dangle, and memory can’t be leaked), etc.  I do have the same issues with using exceptions as a mechanism for error propagation as Raymond and Joel do, but that’s unrelated to the affirmative harm that SEH can cause if misused.

  • I'm pretty much in the same camp as Joel wrt exceptions. I just wish that __try/__finally were actually usable in C++. Not being able to use it in a scope that contains C++ objects makes it pretty much useless.
  • Use _set_se_translator if you have to use try/catch -- it lets you take SEH exceptions and throw them as C++ exceptions.
  • The only time I ever had to use SEH was to recover from predictable crashes in IE when I was hosting the web browser control from shdocvw.
  • SEH is not evil; __try/__except(1) and catch(...) are.
  • > Use _set_se_translator if you have to use
    > try/catch -- it lets you take SEH exceptions
    > and throw them as C++ exceptions.

    Don't forget to enable /EHa (and as a result, say good-bye to a lot of compiler optimizations) if you do this.

  • What optimizations are affected by /EHa?
  • Never mind; uncovered it in the VC documentation eventually.

    For the curious:

    "In previous versions of Visual C++, the C++ exception handling mechanism supported asynchronous (hardware) exceptions by default. Under the asynchronous model, the compiler assumes any instruction may generate an exception.

    With the new synchronous exception model, now the default, exceptions can be thrown only with a throw statement. Therefore, the compiler can assume that exceptions happen only at a throw statement or at a function call. This model allows the compiler to eliminate the mechanics of tracking the lifetime of certain unwindable objects, and to significantly reduce the code size, if the objects' lifetimes do not overlap a function call or a throw statement. The two exception handling models, synchronous and asynchronous, are fully compatible and can be mixed in the same application.

    Catching hardware exceptions is still possible with the synchronous model. However, some of the unwindable objects in the function where the exception occurs may not get unwound, if the compiler judges their lifetime tracking mechanics to be unnecessary for the synchronous model."
  • Pavel,
    The problem is that if you're not going to do __try/__except(1) or (catch(...)), then what do you do?

    The hard part of getting SEH correct is that people don't know what to do for the __except(1) part - that is very, very hard to get right, and is app specific (so Microsoft can't provide a "right" answer).

    People look at SEH and their first assumption is that they can use it to add robustness to their applications. All I'm trying to say is that SEH cannot be used as a robustifier, it usually has the exact opposite effect.

  • If /EHa is used, the compiler assumes that every instruction could raise a C++ exception, so it needs to do a lot of bookkeeping to ensure, for example, that local variables with destructors are cleaned up properly.

    I don't know how much of an impact this has on performance but presumably it was important enough to switch to /EHs by default in VC6 (or was it VC5?) and even add things like __declspec(nothrow).
  • > The problem is that if you're not going to
    > do __try/__except(1) or (catch(...)), then
    > what do you do?

    In theory, you could use SEH to do relatively safe things like lazily committing memory by catching access violations when the buffer grows beyond its initial size (I think FormatMessage does this when you tell it to allocate the buffer for you). You just need to be careful to not catch more than you need - instead of using __except(1), write a filter that makes sure the exception code is right, the referenced address is where you expect it to be, etc. Return EXCEPTION_CONTINUE_SEARCH for everything that you don't recognize.

    In practice however I think that you're right - most apps should probably stay away from SEH. It doesn't play well with C++ exception handling, and complicates debugging, especially if you use it to handle critical exceptions like AVs (windbg stops on 1st chance AVs).
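The narrow filter described above can be sketched as plain decision logic.  This is a portable model, not real SEH code: the constants copy the Windows SDK values so the sketch compiles anywhere, while GrowableBuffer and lazy_commit_filter are hypothetical names.  On Windows, the same checks would presumably sit in a filter function called from __except, reading the exception code and faulting address from the exception record and committing pages (for example with VirtualAlloc) before resuming.

```cpp
#include <cstdint>

// Values mirror the Windows SDK definitions; redefined here for portability.
constexpr std::uint32_t kAccessViolation = 0xC0000005u;  // EXCEPTION_ACCESS_VIOLATION
constexpr int kContinueExecution = -1;                   // EXCEPTION_CONTINUE_EXECUTION
constexpr int kContinueSearch = 0;                       // EXCEPTION_CONTINUE_SEARCH

// Hypothetical bookkeeping for a buffer whose address range is reserved up front
// and committed page by page as it grows.
struct GrowableBuffer {
    std::uintptr_t reserved_begin;  // first reserved address
    std::uintptr_t reserved_end;    // one past the last reserved address
    std::uintptr_t committed_end;   // one past the last committed address
    std::uintptr_t page_size;
};

// Decides what to do with an exception. Only an access violation inside the
// reserved-but-uncommitted part of our own buffer is treated as "ours".
int lazy_commit_filter(std::uint32_t code, std::uintptr_t fault_address,
                       GrowableBuffer& buf) {
    if (code != kAccessViolation) {
        return kContinueSearch;  // not the exception we know how to fix
    }
    if (fault_address < buf.reserved_begin || fault_address >= buf.reserved_end) {
        return kContinueSearch;  // someone else's bad pointer
    }
    if (fault_address < buf.committed_end) {
        return kContinueSearch;  // already committed, so this fault is a real bug
    }
    // Real code would commit the pages here; this model only records the new
    // boundary, rounded up to the end of the faulting page.
    std::uintptr_t new_end = (fault_address / buf.page_size + 1) * buf.page_size;
    if (new_end > buf.reserved_end) new_end = buf.reserved_end;
    buf.committed_end = new_end;
    return kContinueExecution;   // retry the faulting instruction
}
```

Everything the filter does not positively recognize falls through to kContinueSearch, so a genuine crash still reaches the debugger or the crash dump.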
  • I had to chuckle a little when I saw this posted right after I removed a __try/__except(1) that was wrapping an entire program and was met with a chorus of other developers objecting with "but that stops it from crashing!". Now I can just send a link instead of a long-winded explanation whenever I hear that! Thanks Larry!
  • As I mentioned in a post the other day, I still think you argue from a point where you assume that the one catching an exception would not handle it properly. Also it seems presupposed without saying that adding exception handling to your application as a way of adding fault tolerance is always a bad thing. And here I will of course disagree with you, as a properly designed exception strategy will not in any way endanger the state of your process.

    It is all about care in design. Exceptions are indeed resource demanding and performance degrading, but if properly designed and used where they make sense, they will spare you a lot of grief. I could give you an endless list of successful implementations of this from my experience, and never did we have trouble assuring the state of the process. We might be naive, but I would like to believe that we were rather more specific in what kinds of exceptions we wanted to guard against and we tried to fully analyze the impact of those exceptions occurring.

    You mentioned the double free scenario, and I do agree that it is an unfortunate situation, and catching it will not make your life better. HOWEVER, it will not make your life worse; if it did, the diagnostics built into your application were not good enough, since an exception's origin can be pinpointed through proper logging. The double free will still put your application in a bad state and it will not run; however, it was already not running, so you gained nothing, but lost nothing (or then again you might be so lucky that the heap did detect it and you are "safe").

    Now, since we are building a fault tolerant application, a hanging process does not pose a problem for us, since we do indeed have supervision of the process itself, which will merely terminate the process if it stops responding in the way we expect it to. So your application will still hang, but your application will still restart, just as it would without the exception handling. The lesson learned is that some faults are not meant to be caught; such faults will be weeded out, and so will all other exceptions that you do actually handle, except that the process was kept alive and kicking.

    And if you are indeed trying to debug a double free problem, then you are apparently able to somehow reproduce the problem; if you are, there are plenty of good tools available to pinpoint that problem for you.


    You mentioned that the only effective way to use SEH is to have try/finally in your entire call stack. However, at some point you have to swallow that exception or crash; if you crash, then what good does the exception handling do? (except of course if you weren't fiddling with process global resources).

    In any case, I will not argue the fact that exception handling can be disastrous, but then again, nor do I argue that casting is lethal and buffer overruns shouldn't occur. There are ways to avoid them all by proper design, still they occur. I will not argue that threading can be the worst thing that ever happened to you, but yet again I will say it is a very powerful design tool that should be used. Exception handling is a very powerful design tool, and as such it should be used, with the same precautions that you would use with any other tool. I suppose you see where I am going: there are always bad apples to make the cake go sour, but that does not mean there aren't plenty of apples that will make it sweet.

    I do however agree that exceptions are not the best way to communicate an error condition, but they might be the very easiest way to communicate a severe, critical and rare occurrence, which I believe is written into the word exception. I do not agree with the people using this as a means of "normal" error propagation.

    I also do not agree that simple crash analysis gets harder or simpler. If it was hard, then the diagnostic functions of the application were not up to par with the rest of the design.

    C++ exceptions are indeed synchronous by default, but they (or rather the compiler) can be set to asynchronous if you wish. This is why you can use C++ exception syntax to catch asynchronous exceptions too, either directly or via _set_se_translator.

    And finally, you will not get a robust application just by adding a try/catch sporadically in the code; it comes from a proper and thorough design of the full system, not from each function by itself. You can have 95% of code not bothering with exception handling, and still have a very robust application.

    In my experience the grief caused by exceptions is nowhere near the grief caused by race conditions and other subtle problems you might encounter. Race conditions, contrary to the common view (in my experience), are equally hard in single threaded applications as in multi threaded ones. Subtle race conditions are time dependent, and time, as we know, is a funny beast.

    And in the end, exceptions should not be used IMHO if proper function is more important than availability. If proper function is the higher concern, then your main concern is to keep your state intact at all costs, and exceptions make this extremely difficult, at least if you want a 100% consistent state. However, if minor glitches in the system are tolerated, but availability is important, then proper exception handling will save you time.

    As I tend to think, a malfunctioning car is tolerated by most, but a total shutdown of the car is a big annoyance; actually that annoyance might discourage you from ever buying that brand again, whereas the car that still worked might encourage you to buy it again. This is of course given that both cars were put through the same kind of failure scenario.

    Then again one could argue that this is just an issue of robust application design. True, it indeed is, and robust exception handling for failure scenarios you weren't fully thinking of is part of it IMHO.

    Also in a robust environment, if the supervision of the application detects a high error rate in any part of the application, it should alert, or even actually reboot the process (in rare cases however, as you want an operator to do it).

    An improper exception design can make your application more vulnerable to DoS, but then again a proper design without exception handling can make you less or equal to one with...

    Exceptions are exceptions and not a rule, apply that to your design, but do not turn down a powerful and useful tool.
  • As it took me forever to compile that post in this small window, I now notice that plenty of posts have come in since I started.

    Ian:
    What did you gain by removing the catch(1)?

    Pavel:
    In what way does exception handling complicate debugging?


  • Great post Larry.

    I'd also like to add one further argument against SEH:
    I believe it is patented and so is not supported on other compilers and platforms.
  • > In what way does exception handling complicate debugging?

    Extensive use of exceptions, especially low-level ones like access violations, complicates debugging because you can no longer tell truly exceptional cases from normal program operation.

    Here's a scenario that I've seen many times. You suspect that you have a crash somewhere in your program. You don't know for sure because somebody is catching it with __except(1) or catch(...), so instead of a nice memory dump with a callstack that tells you exactly where the problem happened, you get a deadlock with some orphaned locks, or your process simply disappears, or dies with some undebuggable error.

    You try running the program under debugger so that you can catch 1st chance access violations and other "bad" exceptions, only to find that it actually raises dozens of such exceptions during its normal operation. You waste even more time trying to filter out the noise and locate the real problems.

    All this because two fundamental rules of exception handling have been violated:

    1. Don't catch exceptions that you don't know how to recover from.

    2. Only use exceptions for exceptional cases.
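As a small illustration of rule 1 in standard C++ (TransientError and run_with_one_retry are hypothetical names invented for this sketch), a handler can name the one failure it knows how to recover from and let everything else propagate to whoever can diagnose it:

```cpp
#include <stdexcept>

// Hypothetical error type for a failure that is known to be safe to retry.
struct TransientError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs op, retrying once if it reports a transient failure.
// There is deliberately no catch (...): a logic error, an out-of-memory
// condition or anything else unrecognized propagates to the caller untouched.
template <typename Op>
int run_with_one_retry(Op op) {
    try {
        return op();
    } catch (const TransientError&) {
        return op();  // the one case with a known recovery: try again
    }
}
```

If the second attempt fails too, that exception also escapes, which is the point: the handler recovers from exactly one known situation and hides nothing else.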