Asynchrony in C# 5, Part Eight: More Exceptions

(In this post I'll be talking about exogenous, vexing, boneheaded and fatal exceptions. See this post for a definition of those terms.)

If your process experiences an unhandled exception then clearly something bad and unanticipated has happened. If it's a fatal exception then you're already in no position to save the process; it is going down. You might as well leave it unhandled, or just log it and rethrow it. If it had been anticipated because it's a vexing or exogenous exception then there would be a handler in place for it. An unhandled vexing/exogenous exception is a bug, but probably one which does not actually indicate a logic problem in the program's algorithms; it's just an oversight.

But if you have an unhandled boneheaded exception then that is evidence that your program has a very serious bug indeed, a bug so bad that its operation cannot continue. The boneheaded exception should never have been thrown in the first place; you never handle them, you make for darn sure they cannot possibly happen. If a boneheaded exception is thrown then you have no idea whatsoever what locks were released early, what internal state is now corrupt or inconsistent, and so on. You can't do anything with confidence, and often the best thing to do in that case is to aggressively shut down the process before things get any worse.

We cannot easily tell the difference between a bug that is merely a missing handler for a vexing/exogenous exception and a bug that has crashed the program because something is broken in the implementation. The safest thing to do is to assume that every unhandled exception is either a fatal exception or an unhandled boneheaded exception. In both cases, the right thing to do is to take down the process immediately.

This philosophy underlies the implementation of unhandled exceptions in the CLR. Way back in the CLR v1.0 days the policy was that an unhandled exception on the "main" thread took down the process aggressively, but an unhandled exception on a "worker" thread simply killed the thread and left the main thread running. (And an exception on the finalizer thread was ignored and finalizers kept running.) This turned out to be a poor choice; the scenario it leads to is that a server assigns a buggy subsystem to do some work on a bunch of worker threads; all the worker threads go down silently, and the user is stuck with a server that is sitting there waiting patiently for results that will never come because all the threads that produce results have disappeared. It is very difficult for the user to diagnose such a problem; a server that is working furiously on a hard problem and a server that is doing nothing because all its workers are dead look pretty much the same from the outside. The policy was therefore changed in CLR v2.0 such that an unhandled exception on a worker thread also takes down the process by default. You want to be noisy about your failures, not silent.

I am of the philosophical school that says that sudden, catastrophic failure of a software device is, of course, unfortunate, but in many cases it is preferable that the software call attention to the problem so that it can be fixed, rather than trying to muddle along in a bad state, possibly introducing a security hole or corrupting user data along the way. Software that terminates itself upon encountering unexpected exceptions is software that is less vulnerable to attackers taking advantage of its flaws. As Ripley said, when things go wrong you should take off and nuke the entire site from orbit; it's the only way to be sure. But does this awesome philosophy serve the async scenario well?

Last time I mentioned two interesting scenarios: (1) what happens if a task-returning async method does a WhenAll or WhenAny on multiple tasks, several of which throw exceptions? and (2) what if a void-returning async method awaits a task which completes abnormally? What happens to that exception?

Let's consider the first case first.

WhenAll collects all the exceptions from its completed sub-tasks and stuffs them into an aggregating exception. When all its sub-tasks complete, it completes its task abnormally with the aggregated exception. A slightly bizarre fact, however, is that by default, the EndAwait only re-throws the first of those exceptions; it does not re-throw the entire aggregating exception. The reasoning is that the more common scenario is for any try-catch surrounding an "await" to be catching some set of specific exceptions; making you always write code that goes and unpacks the aggregating exception seems onerous. For more details on why this is a reasonable idea see Jon Skeet's recent posts on the topic.
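
To make that behaviour concrete, here is a minimal sketch. It assumes the API names that later shipped in .NET 4.5 (Task.WhenAll) rather than the CTP's TaskEx helpers, and FailAsync is a hypothetical helper invented purely for the illustration. The await re-throws a single exception, while the task object still carries the whole aggregate.

```csharp
using System;
using System.Threading.Tasks;

static class WhenAllDemo
{
    // Hypothetical helper: a task that fails asynchronously with the given message.
    static async Task FailAsync(string message)
    {
        await Task.Yield();
        throw new InvalidOperationException(message);
    }

    // Returns how many exceptions the WhenAll task recorded.
    public static async Task<int> CountFailuresAsync()
    {
        Task all = Task.WhenAll(FailAsync("first"), FailAsync("second"));
        try
        {
            // Re-throws one InvalidOperationException, not the aggregate.
            await all;
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("await re-threw: " + ex.Message);
        }

        // The task itself still holds every failure in an AggregateException.
        return all.Exception.InnerExceptions.Count;
    }

    static void Main()
    {
        int count = CountFailuresAsync().GetAwaiter().GetResult();
        Console.WriteLine("Exceptions recorded on the task: " + count);
    }
}
```

If you do care about every failure, catching the single exception and then inspecting the task's Exception property, as the last line of CountFailuresAsync does, is one way to get at the rest.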

The WhenAny case is similar. Suppose the first sub-task completes, either normally or abnormally. That completes the WhenAny task, either normally or abnormally. Suppose one of the additional sub-tasks completes abnormally; what happens to its exception? The WhenAny is done: it has already completed and called its continuation, which is now scheduled to run on some work queue if it hasn't already.
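
A small sketch of the WhenAny scenario, again assuming the names from the released .NET 4.5 API rather than the CTP. One difference worth knowing: in the released API the WhenAny task itself always completes normally, and its result is the winning task, which you then inspect. Here the "slow" task is driven by hand with a TaskCompletionSource so that the timing is deterministic; the names are made up for the example.

```csharp
using System;
using System.Threading.Tasks;

static class WhenAnyDemo
{
    // Returns true if the losing task faulted without anything re-throwing.
    public static async Task<bool> LoserFaultsQuietlyAsync()
    {
        Task<int> fast = Task.FromResult(42);          // already completed normally
        var slowSource = new TaskCompletionSource<int>();
        Task<int> slow = slowSource.Task;              // still pending

        // The WhenAny task completes as soon as any sub-task completes.
        Task<int> winner = await Task.WhenAny(fast, slow);

        // The remaining sub-task now fails, after WhenAny has finished.
        slowSource.SetException(new TimeoutException("too late to matter"));

        // Nothing is thrown here. Checking IsFaulted does not observe the
        // exception; only reading slow.Exception or awaiting slow would.
        return ReferenceEquals(winner, fast) && slow.IsFaulted;
    }

    static void Main()
    {
        bool quiet = LoserFaultsQuietlyAsync().GetAwaiter().GetResult();
        Console.WriteLine("Loser faulted without anyone noticing: " + quiet);
    }
}
```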

In both the WhenAll and WhenAny cases we have a situation where there could be an exception that goes "unobserved" by the creator of the WhenAll or WhenAny task. That is to say, in both these cases there could be an exception that is thrown, automatically caught, cached and never thrown again which in the equivalent synchronous code would have brought down the process.

This seems potentially bad. Should an unobserved exception from a task that was asynchronously awaited take down the process, as the equivalent synchronous code would have?

Suppose we decide that yes, an unobserved exception should take down the process. When does that happen? That is, when do we definitively know that the exception actually was not re-thrown? We only know that if the task object is finalized without its result ever being observed. After all, a "living" task object that has completed abnormally could have its continuation executed at any time in the future; it cannot know when that continuation is going to be scheduled. There could be any number of queued-up tasks on this thread that get to run between the time this task completed abnormally and its result is requested. As long as the task object is alive then its exception could be observed.

OK, so, great, if a task is finalized, and it completed abnormally then we... what? Throw the exception on the finalizer thread? Sure! That will take down the process, right? In CLR v2.0 and above, unhandled exceptions on any thread take down the process. But let's take a step back. Remind me, why do we want an unobserved exception to take down the process? The philosophical reason is: we cannot tell whether this was a boneheaded exception that indicates a potentially horrible, security-impacting situation that needs to be dealt with by immediate termination, or simply the result of a missing handler for an unanticipated exogenous exception. The safe thing to do is to say that it was a boneheaded exception with a security impact and immediately take the process down. Which is precisely what we are not doing! We are waiting for the task to be collected by the garbage collector and then trying to take the process down in the finalizer thread. But in the gap between the exception being recorded in the task and the finalizer observing the exception, we've potentially kept right on running dozens more tasks, any of which could be using the inconsistent state caused by the boneheaded exception.

Furthermore, we anticipate that most async tasks that throw exceptions in realistic code will in fact be throwing exogenous exceptions like "the password for this web service is wrong" or "you don't have permission to read this file", or "this operation timed out", rather than boneheaded exceptions like "you dereferenced null" or "you tried to pop an empty stack". In these realistic cases it seems much more plausible to say that if for some reason a task completes abnormally and no one bothers to observe its result, it's because some asynchronous unit of work was abandoned; any of its sub-tasks that ran into problems connecting to web servers (or whatever) can safely be ignored.

In short, an unobserved exception from a finalized task is one that no one cares about, is probably harmless, and if it was harmful, then we've already delayed taking action too long to prevent more harm. Either way, we might as well just ignore it.
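
For anyone who would still like to hear about such exceptions, here is a hedged sketch of one way to log them. It relies on the TaskScheduler.UnobservedTaskException event, which exists in the Task Parallel Library and is raised when a faulted task whose exception was never observed gets finalized. The class and method names are invented for the example, and because the event depends on garbage collection and finalization, whether and when the handler runs in a given build is not guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

static class UnobservedDemo
{
    // Creates a faulted task and drops every reference to it.
    // NoInlining keeps the references out of the caller's stack frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void AbandonFaultedTask()
    {
        var source = new TaskCompletionSource<int>();
        source.SetException(new InvalidOperationException("nobody looked at me"));
    }

    // Returns how many unobserved exceptions the handler logged.
    public static int Run()
    {
        int logged = 0;
        EventHandler<UnobservedTaskExceptionEventArgs> handler = (sender, e) =>
        {
            // e.Exception is an AggregateException wrapping the original failure.
            Console.WriteLine("Unobserved: " + e.Exception.InnerException.Message);
            e.SetObserved();   // mark it as dealt with
            Interlocked.Increment(ref logged);
        };

        TaskScheduler.UnobservedTaskException += handler;
        try
        {
            AbandonFaultedTask();

            // Forcing a collection is for demonstration only; real programs
            // simply receive the event whenever finalization happens to run.
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
        finally
        {
            TaskScheduler.UnobservedTaskException -= handler;
        }
        return logged;
    }

    static void Main()
    {
        Console.WriteLine("Handler invocations: " + Run());
    }
}
```

Note that this reports the failure only at finalization time, so it gives you a log entry, not fail-fast behaviour; the window described above is still there.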

This does illustrate that asynchronous programming introduces a new flavour of security vulnerability. If there is a security vulnerability caused by a bug that would normally take down the process, and if that code is rewritten to be asynchronous, and if the buggy task is abandoned without observation of its exception, then the bug might not result in an aggressive destruction of the now-vulnerable process. And even if the exception is eventually observed, there might be a window in time between when the bug introduces the vulnerability and the exception is observed. That window might be large enough for an attacker to succeed. That sounds like a tortuous chain of things that have to go wrong - because it is - but attackers will take whatever they can get. They are crafty, they have all the time in the world, and they only have to succeed once.

I never did say what happens to a void-returning method that awaits a task; you can think of this as a "fire and forget" sort of method. Perhaps a void-returning button-click event handler awaits fetching some data asynchronously and then updating the user interface; there's no "caller" of the event handler that cares to hold on to a task, and will never observe its result. So what happens if the data-fetching task completes abnormally?

In that case, when the void-returning method (which registered itself as a continuation, remember) starts up again, it checks to see if the task completed abnormally. If it did, then it immediately re-throws the exception to its caller, which is, of course, probably some message loop. I believe the plan of action here is to be consistent with the behaviour described above; in that scenario the message loop will discard the exception, assuming that the fire-and-forget asynchronous method failed in some benign way.

Having been an advocate of the "nuke from orbit" philosophy of unhandled exceptions for many years, emotionally this does not sit well with me, but I'm unable to marshal a convincing argument against this strategy for dealing with exceptions in task-based asynchrony. Readers: What do you think? What is in your opinion the right thing to do in scenarios where exceptions of tasks go unobserved?

And on that somewhat ominous note, I'm going to take a break from talking about the new Task Asynchrony Pattern for now. Please download the CTP, keep sending us your feedback and questions, and start thinking about what sorts of things will work well or work poorly with this new feature. Next time: we'll pick up with more fabulous adventures after American Thanksgiving; I'm cooking turkey for 19 this year, which should be quite the adventure in and of itself.

  • Hi Eric,

    this might be a good time to get back to an old feature request: Make compiler-generated types serializable! This would have been useful for C# 2.0 iterators, and now even more so for async methods. The idea is that, if you can serialize the closure object, you can code some long-running stuff in a procedural or OO fashion, store it away, and then resume it later. Useful for workflow programming, or (in my case) CPS web programming (resume on next request, with support for state servers), like so:

    async Task ShowMyPages()
    {
        await ShowPage("Page1.aspx");
        if (await ConfirmAction())
            await ShowPage("Page2.aspx");
    }

    Wouldn't that be cool? Here's the new feature proposal for async:

    connect.microsoft.com/.../tasks-returned-by-async-methods-should-be-serializable-async-ctp

    I realize that this is quite a stunt to do, and you can never fully support it for any case (like, mixing this with real asynchronous code, using non-serializable local variables etc.) But it could work under defined constraints, and it would be way cool (and useful too).

  • I'm concerned this will become another place where one can forget that an exception might be thrown. I'm OK with not killing the process, but (for instance) I'd like to log it so I can see what's happening. Currently I'll add a handler to the AppDomain (in a WinForms/Console app) or the Application_Error event in ASP.NET where I log exceptions. Will this still work or is there another place to put this kind of code now?

  • Maybe it's actually time to start classifying exceptions?

    So you actually know that the exception is the result of some method interface violation (like ArgumentException, also think of code contracts) or pure bug (like NullReferenceException) or fatal system exception (like OutOfMemoryException) as opposed to just some failed operation not caused by a bug in the program.

    In that case you actually could kill the process by default if it's a bug, and ignore by default if it's not a bug.

  • I vote to nuke 'em from space (I wish Ripley was in charge of mining operations on Pandora in Avatar). While you will not prevent the security vulnerability, at least you will know that there was a problem (and thus a potential security vulnerability). Otherwise you may never find out about the issue. What is more, you will filter out some attacks that cannot fit in the window of time where your system is in an inconsistent state and has not yet crashed.

  • How would you handle cases where one of the many tasks being awaited on is, let's say, updating a database or has invoked (or is just about to invoke) a service to do a transfer between accounts?

    If you bring down the process you are now clueless as to what tasks succeeded or failed.

  • I vote to leave it like this.

    If an exception is thrown in a non-void returning asynchronous method and goes unnoticed, that means its continuation method isn't executed, but that means the developer doesn't need the result anymore, otherwise the continuation method would execute sometime. So if he doesn't care anymore about the result, why would he care whether the result was acquired successfully?

  • While you can argue that "fail fast" is the right thing to do, it's pretty hard to imagine when "fail eventually" is the right policy. Thus, you are correct that failed tasks shouldn't randomly cause your process to fail at times unrelated to when the original failure happened.

    That said, you should definitely make it possible to catch these unhandled exceptions. Otherwise there will be no way to log the errors.

  • Seems to me that you could use some extra syntax here. Some kind of "async catch" or something? Something that'd let you plug into the WhenAny or WhenAll and asynchronously catch the aggregated exceptions as they happen and react to them. I'm not sure exactly how that would work, though, or how to detect an uncaught exception.

    When you were discussing CPS you talked about having two continuations, one for normal completion, one for errors. Seems to me like an aggregating task, like WhenAny and WhenAll, really wants to be able to call the error continuation repeatedly with different errors. But I don't know if there's a sane way to express that in the Task object. I wonder if there's a way to detect "uncaught exception" eagerly by telling the Task object when it's created exactly what catch blocks are around at the time? Then if an exception is thrown that is unhandled, it could fail aggressively rather than waiting to find out it's unhandled later.

  • Thanks for posting this in detail! I've had my concerns about how async was dealing with exceptions.

    It seems to me that true fail-fast (not fail-eventually) would be doable by adding a flag to CreationOptions. This leaves the question of whether fail-fast is the desirable choice.

    To me, either way is acceptable, but I'll insist on good documentation and I'll request end-user overrides. e.g., the fail-fast approach would work as long as there was an easy way to say "mark this particular task as ignore-exceptions" (possibly using yet another CreationOptions flag). Although I do lean towards the opposite approach: ignore exceptions by default and have a task-specific continuation handle error conditions.

    One final observation: Task.WaitAny suffers from the same problem. For consistency, could you make Task.WaitAny work like WhenAny/WhenAll in vNext? It is a breaking change, but I would think it's worth it.

  • I agree that, if it is not deemed fatal, then there should be a means, on a per app domain basis, to get at the silent exceptions and either log them, or do your own Environment.Exit if you feel like it.

    Those implementing a more secure environment can then up their aggressiveness to something not quite Ripley-esque but certainly Vasquez levels of Gung-Ho ("Let's Roooock").

    The question of which thread you are on when this callback happens is interesting, though you could argue convincingly that for the intended use you should never care, only whether or not multiple such callbacks can happen at once.

  • "If it did, then it immediately re-throws the exception to its caller, which is, of course, probably some message loop."

    Well, that depends on the context. It's possible this is executing on a thread-pool thread to start with. It's *unlikely* that you'd fire-and-forget an async method on the thread pool, but it's possible. At that point it'll bring down your process in a Ripleyesque fashion:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class Test
    {
        static void Main()
        {
            ThreadPool.QueueUserWorkItem(ignored => AsyncMethod());
            Console.WriteLine("Sleeping...");
            Thread.Sleep(5000);
        }

        static async void AsyncMethod()
        {
            Console.WriteLine("In async method");
            await TaskEx.Run(() => { throw new Exception(); });
        }
    }

    Interestingly, while coming up with this example I *thought* I saw it not actually fail, once. (It completed normally after 5 seconds.) I don't know how that happened though, and I'm going to blame user error until I can reproduce it...

  • What would happen if you would disallow void returning async methods altogether? Users could still call an async method without actually assigning the returned Task to a variable, but they would do this at their own risk.

  • Leon: One inconvenience there would be that you couldn't use async methods as event handlers.

  • I'm of the mindset that I want immediate feedback in exceptional circumstances. Don't continue to work, break. Let normal exception handling rules apply so that I can *choose* to continue working, but don't let that be the default behavior. Give me a chance to fix the bug or add handling where possible.
