Locks and exceptions do not mix

Locks and exceptions do not mix

Rate This
  • Comments 39

A couple years ago I wrote a bit about how our codegen for the lock statement could sometimes lead to situations in which an unoptimized build had different potential deadlocks than an optimized build of the same source code. This is unfortunate, so we've fixed that for C# 4.0. However, all is still not rainbows, unicorns and Obama, as we'll see.

Recall that lock(obj){body} was a syntactic sugar for

var temp = obj;
Monitor.Enter(temp);
try { body }
finally { Monitor.Exit(temp); }

The problem here is that if the compiler generates a no-op instruction between the monitor enter and the try-protected region then it is possible for the runtime to throw a thread abort exception after the monitor enter but before the try. In that scenario, the finally never runs so the lock leaks, probably eventually deadlocking the program. It would be nice if this were impossible in unoptimized and optimized builds.

In C# 4.0 we've changed lock so that it now generates code as if it were

bool lockWasTaken = false;
var temp = obj;
try { Monitor.Enter(temp, ref lockWasTaken); { body } }
finally { if (lockWasTaken) Monitor.Exit(temp); }

The problem now becomes someone else's problem; the implementation of Monitor.Enter takes on responsibility for atomically setting the flag in a manner that is immune to thread abort exceptions messing it up.

So everything is good now, right?

Sadly, no. It's consistently bad now, which is better than being inconsistently bad. But there's an enormous problem with this approach. By choosing these semantics for "lock" we have made a potentially unwarranted and dangerous choice on your behalf; implicit in this codegen is the belief that a deadlocked program is the worst thing that can happen. That's not necessarily true! Sometimes deadlocking the program is the better thing to do -- the lesser of two evils. 

The purpose of the lock statement is to help you protect the integrity of a mutable resource that is shared by multiple threads. But suppose an exception is thrown halfway through a mutation of the locked resource. Our implementation of lock does not magically roll back the mutation to its pristine state, and it does not complete the mutation. Rather, control immediately branches to the finally, releasing the lock and allowing every other thread that is patiently waiting to immediately view the messed-up partially mutated state! If that state has privacy, security, or human life and safety implications, the result could be very bad indeed. In that case it is possibly better to deadlock the program and protect the messed-up resource by denying access to it entirely. But that's obviously not good either.

Basically, we all have a difficult choice to make whenever doing multi-threaded programming. We can (1) automatically release locks upon exceptions, exposing inconsistent state and living with the resulting bugs (bad) (2) maintain locks upon exceptions, deadlocking the program (arguably often worse) or (3) carefully implement the bodies of locks that do mutations so that in the event of an exception, the mutated resource is rolled back to a pristine state before the lock is released. (Good, but hard.)

This is yet another reason why the body of a lock should do as little as possible. Usually the rationale for having small lock bodies is to get in and get out quickly, so that anyone waiting on the lock does not have to wait long. But an even better reason is because small, simple lock bodies minimize the chance that the thing in there is going to throw an exception. It's also easier to rewrite mutating lock bodies to have rollback behaviour if they don't do very much to begin with.

And of course, this is yet another reason why aborting a thread is pure evil. Try to never do so!

  • > I am aware that this abuse is widespread. Being widespread does not make it a good idea. It is a bad idea to use our tools "off label" for purposes that they were never intended to be used for. And it should not be a surprise that when you do, you sometimes get unexpected bad behaviour.

    Eric, but isn't using+IDispose just the C# implementation of RAII; and if so, why not use it as such?

    As you yourself said already, Thread.Abort breaks so many other things that it is no surprise it breaks RAII too. But in its absence, what's wrong with using?

  • Aborting a thread is quite safe if you run the thread in its own AppDomain and recreate its domain after aborting the thread. This is not as difficult or inefficient as it sounds and is often a very practical approach.

    Correct. You don't want to take a thread down unless you are doing so in the process of taking down its appdomain. If you cannot write a thread that can shut itself down cleanly, then the appdomain is the sensible structure to use for work that may be arbitrarily cancelled. -- Eric

    Please don't deprecate or in any way cripple Thread.Abort - it is a powerful and essential feature if used properly. Without it, you wouldn't be able to cancel a runaway query in LINQPad!

    Joe

  • "...but it's not all rainbows, unicorns, and Obama..."

    Haha! Thanks for that.

  • I am probably going to receive some serious hate mail for this blog posting. As the years have gone by, I've grown to believe that exceptions were a laudable idea gone almost entirely awry. Perhaps there is some esoteric programming...

  • By "rainbows, unicorns and Obama" are you perhaps referring to this?

    http://iamchriscollins.com/badpaintingsofbarackobama/images/4.jpg

  • - Do not do things that can throw exceptions in a critical section

    - Always have an explicit cleanup after the criticial seciotn is exited to handle obscure errors

       bool lock_obtained = false

       lock (abc)

           {

           lock_obtained = true

           do work

           unlock (abc)

           }

      if (lock_obtained == TRUE OR abc.is_locked()) then unlock(abc)

    - Important for code quality/readability - Put as little code as possible inside of the lock/unlock body by making the guts of a locked section a function call.  This makes it much easier to visually see that the lock obtain and lock cleanup code exists around the critical section.

    - Important for correctness - Make functions that are used inside of a critical section STATIC so that they do not rely on an instantiated object and pass in all data used by the critical section (i.e., no global variables) if possible.  This avoids the common problem of shared access to data in multithreded systems.  The critical section should not refer to any resource that it is not passed and if possible not rely on allocated objects (open file handles).

    - A safer alternative is to queue work done by critical sections if possible.  Winforms and win32 gui applications would routinely receive an interrupt message and call postmessage to queue it back to the main GUI's thread without doing any real work so that the main GUI thread could process the message/event normally without concurrency penalty or critical sections.  This is commonly done in old school C applications with the interrupt handlers since many older operating systems could not handle reentrant executables witout great programming pain (dos, unix system V, etc)

  • Eric, I must respectfully disagree with the statement that the change makes things consistently bad.  The implicit contract here that once you exit the braces the lock will always be released.  My opinion is  having syntactic sugar that does not perform as expected is very nasty.

    "The takeaway is that locking is only a starting point in ensuring thread safety. Ensuring that the lock body is truly atomic is hard. Locks are not magic dust that you sprinkle on code and suddenly it gets safe."

    I completely agree.  The truth is that writing good multithreaded code is very hard; and its not any easier when things do not work as expected

  • Regarding the comment earlier on this thread regarding transactional memory.....

    I was involved in a project that achieved this nearly 20 years ago, with acceptable performance (ie you could not statistically measure any performance difference using the transactional memory and normal memory)

    Of course I had one BIG advantage....it was custom HARDWARE that implemented the transactional memory.

    Great post on while lock(...) did not and does not, and probably never will magically make a system "safe"..on conventional hardware..

  • If I'm understanding you correctly you are saying attempt the lock but return it to its previous state as a sort of roll back mechanism?

    public void Process(Car car, string make, string model)

    {

               var originalCar = car;

               var lockObj = new object();

               try

               {

                   lock (lockObj)

                   {

                       car.Make = make;

                       car.Model = model;

                   }

               }

               catch

               {

                   //an error occured

                   //return car to previous state

                   car = originalCar;

               }

               finally

               {

               }

    }

Page 3 of 3 (39 items) 123