Locks and exceptions do not mix

Locks and exceptions do not mix

Rate This
  • Comments 39

A couple years ago I wrote a bit about how our codegen for the lock statement could sometimes lead to situations in which an unoptimized build had different potential deadlocks than an optimized build of the same source code. This is unfortunate, so we've fixed that for C# 4.0. However, all is still not rainbows, unicorns and Obama, as we'll see.

Recall that lock(obj){body} was a syntactic sugar for

var temp = obj;
Monitor.Enter(temp);
try { body }
finally { Monitor.Exit(temp); }

The problem here is that if the compiler generates a no-op instruction between the monitor enter and the try-protected region then it is possible for the runtime to throw a thread abort exception after the monitor enter but before the try. In that scenario, the finally never runs so the lock leaks, probably eventually deadlocking the program. It would be nice if this were impossible in unoptimized and optimized builds.

In C# 4.0 we've changed lock so that it now generates code as if it were

bool lockWasTaken = false;
var temp = obj;
try { Monitor.Enter(temp, ref lockWasTaken); { body } }
finally { if (lockWasTaken) Monitor.Exit(temp); }

The problem now becomes someone else's problem; the implementation of Monitor.Enter takes on responsibility for atomically setting the flag in a manner that is immune to thread abort exceptions messing it up.

So everything is good now, right?

Sadly, no. It's consistently bad now, which is better than being inconsistently bad. But there's an enormous problem with this approach. By choosing these semantics for "lock" we have made a potentially unwarranted and dangerous choice on your behalf; implicit in this codegen is the belief that a deadlocked program is the worst thing that can happen. That's not necessarily true! Sometimes deadlocking the program is the better thing to do -- the lesser of two evils. 

The purpose of the lock statement is to help you protect the integrity of a mutable resource that is shared by multiple threads. But suppose an exception is thrown halfway through a mutation of the locked resource. Our implementation of lock does not magically roll back the mutation to its pristine state, and it does not complete the mutation. Rather, control immediately branches to the finally, releasing the lock and allowing every other thread that is patiently waiting to immediately view the messed-up partially mutated state! If that state has privacy, security, or human life and safety implications, the result could be very bad indeed. In that case it is possibly better to deadlock the program and protect the messed-up resource by denying access to it entirely. But that's obviously not good either.

Basically, we all have a difficult choice to make whenever doing multi-threaded programming. We can (1) automatically release locks upon exceptions, exposing inconsistent state and living with the resulting bugs (bad) (2) maintain locks upon exceptions, deadlocking the program (arguably often worse) or (3) carefully implement the bodies of locks that do mutations so that in the event of an exception, the mutated resource is rolled back to a pristine state before the lock is released. (Good, but hard.)

This is yet another reason why the body of a lock should do as little as possible. Usually the rationale for having small lock bodies is to get in and get out quickly, so that anyone waiting on the lock does not have to wait long. But an even better reason is because small, simple lock bodies minimize the chance that the thing in there is going to throw an exception. It's also easier to rewrite mutating lock bodies to have rollback behaviour if they don't do very much to begin with.

And of course, this is yet another reason why aborting a thread is pure evil. Try to never do so!

  • Great post, but a scary topic. I can't help feel that the "take-away" from all this is to not use the C# lock keyword in production code. It seems like it is broken and isn't getting fixed really.

    No, that's not the takeaway. The takeaway is that locking is only a starting point in ensuring thread safety. Ensuring that the lock body is truly atomic is hard. Locks are not magic dust that you sprinkle on code and suddenly it gets safe. -- Eric

  • Since the only situation this change affects is when an exception is thrown immediately after Monitor.Enter and before the try is entered, which also means before the protected state has been mutated, I don't see how this change implies any particular opinion about deadlocking versus state inconsistency.

    Perhaps I was unclear. This change does not imply any such opinion. The whole design of the lock statement implies that opinion. This change does not affect the more general problem, which was my point. -- Eric

    More generally, isn't the issue of an exception resulting in partially mutated state orthogonal to the issue of locking? The same situation would be true if there were not threading or locking involved, and an exception is thrown while mutating state. Anything which tries to read the partially mutated state after the exception is thrown could be in trouble. That's why the general guidance on handling exceptions is, if the process might be executing in an unknown state, let it die.

    Yes, you have similar problems in single-threaded land, but it is harder to avoid the problem in mutli-threaded land. In a single-threaded situation, someone up the call stack can catch the exception and fix up the state if need be, or cleanly shut down the process before touching it again, or whatever. In a multi-threaded situation, the very moment the lock is unlocked, some thread could immediately be in there messing with the state. -- Eric

  • I don't think locks have anything to do with the fact that a unexpected exception leaves a resource in an inconsistent state.

    The worst thing is asynchronous exceptions from which there's no true safeguard, unless you want to write some crazy almost impossible to test code with CERs, and which is not as easy to write as it might seem. And even if you do write it, I really don't know how you could test it completely without trowing a an asynchronous exception at every instruction.

    On the other hand synchronous exceptions ideally should never affect the state of you application.

    But in order to achieve this you need a programming model, where you never commit state changes to shared resources until you finished all processing on them.

    Ex:

      lock (syncKey)

      {

              tempResult1 = DoSomePotentiallyDangerousWork(sharedResource);

              tempResolt2 = DoSomeMoreDangerousWorkThatMightThrow(sharedResource, tempResul1);

              tempResult3 = DoMoreWork(sharedResource, tempResult2);

              * commit tempResult1, tempResoult2 , tempResoult3

      }

    Ideally the commit is a simple assignment of a reference or value, that has very small changes of failure, or operations that follow the same principle, in case of exceptions the state is never mutated.

    I always think my code should behave as small units of work, or small transactions that should change state in the safest way possible, only committing state changes at the end of each unit.

  • ihmo, the issue is not how the Lock statement is implemented, but rather is about why you should never use the ThreadAbortException.  Using the Abort method to stop a thread seems like a terrible thing to do.  Think about it - you stop a thread by arbitrarily triggering an exception in it, without underlying cause.  Therefore, the answer is, avoid using Thread.Abort like the plague.  If you need to be able to stop a thread, define a boolean flag, set it to abort the thread, and let the thread abort itself in a proper fashion, by checking the flag and shutting itself down cleanly, under its own control.  I have done multithreaded client and server code often, and never needed Thread.Abort.  Without Thread.Abort, the lock statement quirk described becomes a nonissue.  Ultimately, I think thread.Abort and the ThreadAbortException should just be eliminated from the framework to solve the problem.  But I agree with the design decision that a deadlock is the worst thing to happen.  I do not know of a case where it is not.  In server side apps, it means you have the whole IT department on your back, since they have nothing better to do than count server reboots, and for a client application, nothing is more exasperating for a user than an app that just freezes and sits there, needing to be end tasked.

    Indeed, thread abort exceptions are pure unmitigated evil and you should never use them. -- Eric

     

  • I've got to agree with the other commenters.  The lock statement does one thing and one thing only; ensure that the monitor enter and monitor exit calls are matched properly.  That is getting fixed.  Making sure all the application code is exception neutral is a completely separate issue.

  • I also agree with others. It's the responsibility of the programmer to ensure appropriate  exception safety of the code. lock() must only ensure that resource is unlocked when some exception is thrown within its body

  • I completely admire your writing and have great respect for your work. However in this case, as the other posters have said, exception safety is different from lock safety.

    We appear to be in violent agreement. :-) The point of this article is that getting the locks right is only the first step; getting the program truly threadsafe in the face of all possible exceptions requires more thought. -- Eric

    You write this:

    "Sometimes deadlocking the program is the better thing to do."

    And yet the deadlock situation that has been fixed is one where the deadlock occurs before the try block begins executing, meaning we haven't had a chance to mutate state yet.

    So, when this deadlock ever did occur in c# 3.0, it wasn't preventing half-mutated state from occurring. It was just preventing my program from continuing to run.

    Life and safety critical code is definitely important. However, a deadlock can be deadly too. If my heart defibrillator or vehicle/aircraft/missile guidance system stops being able to respond to my commands, someone could be in big trouble.

    So let's fix the problem we can right now, which is that deadlock (and especially the consistency problem).

    Meanwhile, let's keep educating developers on how to write exception-tolerant code, and to avoid Thread.Abort whenever possible. Also, I look forward to framework and other enhancements that make it easier to write safe code. Transactional memory, for example, looks promising.

    Informing developers that there is a problem is the first step in education, which is what I'm attempting to do here. As for transactional memory, I think it is a great idea, but we have yet to see an implementation that does not entail an unacceptably high performance price for the common (transaction succeeds) scenario. Once we have that, I definitely want to look at language enhancements that take advantage of it. -- Eric

  • I think this whole thing is much ado about nothing. If you don't use thread abort exceptions, this is a non-issue. If you do use them, you are screwed no matter what.

    So just don't use thread abort exceptions and stop worrying about it.

  • Will the C#4.0 fix also allow a TimeSpan or milliseconds to be passed in as a parameter to Enter so that an atomic setting of the lockWasTaken flag works when specifying a timeout?  It would be great to have a new version of the lock statement like below:

    lock (obj, timeout) { lockbody } else { timeoutbody }

    Sugar Coating for:

    bool lockWasTaken = false;
    var temp = obj;
    try {
       Monitor.Enter(temp, timeout, ref lockWasTaken);
        if (lockWasTaken) { lockbody } else { timeoutbody };
    }
    finally { if (lockWasTaken) Monitor.Exit(temp); }

    Sadly, no. I agree that this would be nice, but the cost of the feature doesn't pay for the benefit of the sugar. -- Eric

  • @DRBlaise, t I don't think "timed lock aqquires" is suck a frequent technique used in .Net that it requires a language construct to support it directly.

    Actually I've just ran an analyze of the Monitor.TryEnter(object, int) and Monitor.TryEnter(object, TimeSpan) using  Reflector and they aren't used a single time throughout the standard .Net 3.5 assemblies (WPF + WCF included)

  • I understand that the sugar coating was just pie in the sky, but will an Monitor.Enter or Monitor.TryEnter function be available to get an atomic setting of the lockWasTaken flag when passing in a TimeSpan or milliseconds?  I wouldn't think that would take too much time to implement.

  • I don't know how exception handling works in C#, but I'm guessing it's not any more powerful than C++.

    Read Chris Brumme's exegesis if you want the details of the relationship between C++ exception handling, win32 exception handling and managed exception handling. It is a fascinating read. -- Eric 

      I don't really know Lisp either, but I know that it has a fundamentally different and more powerful exception handling system.  For example, it's possible to handle an exception, clean up, and go back to what you were doing.  You don't need to unroll the stack and deal with things in some outer scope.  In a sense, you can cancel exceptions.  It's sort of like how you can ignore exceptions in VB, except you can do real cleanup as well.  I think it's actually more powerful than that, but the point is, you can do things with Lisp that are just flat impossible in C#.

    The price you pay is that it can be awfully brain-mangling to understand programs written using continuation passing style. See my blog archive for some articles on how to wrap your head around it. -- Eric

    I haven't thought it through, but I wouldn't be surprised if this problem had a simple solution given a sufficiently powerful exception handling system.  A more appropriate subtitle might be "Locks and excessively simplistic exception handling systems do not mix".

    Functional languages with CPS also emphasize programming without mutable state, eliminating the need for locks in the first place. -- Eric

    But, assuming you're stuck with C#, you still need to clean up, and you need to do it while the object is locked.  The syntactic sugar always needs to avoid the deadlock.  It's job is not to clean up after execptions.  You still need to do that manually.  It's easy:

    lock(foo) {
       try {
           body
       }
       finally {
           cleanup
       }
    }

    I think you got the code wrong. Surely that is meant to be

    lock(foo) {
       try {
           body
       }
       catch {
           cleanup
           throw;
       }
    }

    You don't want to clean up the mutation if no exception is thrown! -- Eric

  • Since you mention the ThreadAbortException, what happens when:

    try {
    Monitor.Enter(temp, ref lockWasTaken);
    { body }
    }
    finally
    {
       if (lockWasTaken)
           // <---- The exception happens here
           Monitor.Exit(temp);
    }

    Good question. In CLR v2 and higher the exception is delayed until the finally block is done. Which means that the thread might never end at all. Which is another reason why attempting to abort a thread is pure evil. No guarantee that it will actually work. -- Eric

  • Just as an aside, javac does

    monitorenter

    try { body }

    catch (Throwable t)

    {

     monitorexit

     throw t;

    }

    monitorexit

    which is the same as in a previous post, and seems a lot more sane to me.

  • Isn't using also a problem? As in:

    using(var s = new StreamReader()) {...}

    which translates into

    var s = new StreamReader();
    try { ... } finally { s.Dispose(); }

    and now you have the same issue.

    (I've seen people using IDisposable for locks also quite bit actually).

    Any plans to address this?

    Suppose an exception is thrown after the new but before the try. What's the worst thing that can happen? The disposer does not run, and the garbage collector eventually cleans up the resource a while later. Remember, "using" was designed to allow you to more easily optimize the cleanup of scarce resources. If a thread abort happens to un-optimize that process, it's probably not a big deal. (And if you are using "using" for something other than optimizing the cleanup of a scarce resource, then you're using it off-label. I recommend that you do not use "using" to implement locking semantics.) -- Eric

Page 1 of 3 (39 items) 123