What is this thing you call "thread safe"?


Caveat: I am not an expert on multi-threading programming. In fact, I wouldn't even say that I am competent at it. My whole career, I've needed to write code to spin up a secondary worker thread probably less than half a dozen times. So take everything I say on the subject with some skepticism.

A question I'm frequently asked: "is this code thread safe?" To answer the question, clearly we need to know what "thread safe" means.

But before we get into that, there's something I want to clear up first. A question I am less frequently asked is "Eric, why does Michelle Pfeiffer always look so good in photographs?" To help answer this pressing question, I consulted Wikipedia:

"A photogenic subject is a subject that usually appears physically attractive or striking in photographs."

Why does Michelle Pfeiffer always look so good in photographs? Because she's photogenic. Obviously.

Well, I'm glad we've cleared up that mystery, but I seem to have wandered somewhat from the subject at hand. Wikipedia is just as helpful in defining thread safety:

 "A piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads."

As with photogenicity, this is obvious question-begging. When we ask "is this code thread safe?" all we are really asking is "is this code correct when called in a particular manner?" So how do we determine if the code is correct? We haven't actually explained anything here.

Wikipedia goes on:

"In particular, it must satisfy the need for multiple threads to access the same shared data, ..."

This seems fair; this scenario is almost always what people mean when they talk about thread safety. But then:

"...and the need for a shared piece of data to be accessed by only one thread at any given time."

Now we're talking about techniques for creating thread safety, not defining what thread safety means. Locking data so that it can only be accessed by one thread at a time is just one possible technique for creating thread safety; it is not itself the definition of thread safety.

My point is not that the definition is wrong; as informal definitions of thread safety go, this one is not terrible. Rather, my point is that the definition indicates that the concept itself is completely vague and essentially means nothing more than "behaves correctly in some situations". Therefore, when I'm asked "is this code thread safe?" I always have to push back and ask "what are the exact threading scenarios you are concerned about?" and "exactly what is correct behaviour of the object in every one of those scenarios?"

Communication problems arise when people with different answers to those questions try to communicate about thread safety. For example, suppose I told you that I have a "threadsafe mutable queue" that you can use in your program. You then cheerfully write the following code that runs on one thread while another thread is busy adding and removing items from the mutable queue:

if (!queue.IsEmpty) Console.WriteLine(queue.Peek());

Your code then crashes when the Peek throws a QueueEmptyException. What is going on here? I said this thing was thread safe, and yet your code is crashing in a multi-threaded scenario.

When I said "the queue is threadsafe" I meant that the queue maintains its internal state consistently no matter what sequence of individual operations is happening on other threads. But I did not mean that you can use my queue in any scenario that requires logical consistency maintained across multiple operations in a sequence. In short, my opinion of "correct behaviour" and your opinion of the same differed because what we thought of as the relevant scenario was completely different. I care only about not crashing, but you care about being able to reason logically about the information returned from each method call.
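To make the mismatch concrete, here is a minimal sketch of such a queue. `LockedQueue<T>` and `CheckThenActDemo` are hypothetical names invented for illustration, and the sketch assumes the queue simply wraps .NET's `Queue<T>` behind a single lock, so an empty `Peek` throws `InvalidOperationException` rather than the `QueueEmptyException` mentioned above. The "other thread" is replayed on one thread so that the bad interleaving happens every time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical queue: every member takes the same lock, so its internal
// state is never corrupted, yet nothing ties one call to the next.
public sealed class LockedQueue<T>
{
    private readonly object gate = new object();
    private readonly Queue<T> items = new Queue<T>();

    public bool IsEmpty { get { lock (gate) { return items.Count == 0; } } }
    public void Enqueue(T item) { lock (gate) { items.Enqueue(item); } }
    public T Dequeue() { lock (gate) { return items.Dequeue(); } }
    public T Peek() { lock (gate) { return items.Peek(); } }
}

public static class CheckThenActDemo
{
    // Replays, on one thread, the interleaving that breaks the caller:
    // the emptiness check passes, "another thread" dequeues, Peek throws.
    public static bool PeekFailsAfterStaleCheck()
    {
        var queue = new LockedQueue<int>();
        queue.Enqueue(42);

        bool sawItem = !queue.IsEmpty;   // the caller's check: an item is there
        queue.Dequeue();                 // stands in for the other thread

        try
        {
            if (sawItem) Console.WriteLine(queue.Peek());
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;                 // Queue<T>.Peek throws when empty
        }
    }

    public static void Main()
    {
        Console.WriteLine(PeekFailsAfterStaleCheck());
    }
}
```

Each method is individually atomic, which is the author's sense of "threadsafe"; the two-call sequence is not, which is the user's sense.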

In this example, you and I are probably talking about different kinds of thread safety. Thread safety of mutable data structures is usually all about ensuring that the operations on the shared data always operate on the most up-to-date state of the shared data as it mutates, even if that means that a particular combination of operations appears to be logically inconsistent, as in our example above. Thread safety of immutable data structures is all about ensuring that use of the data across all operations is logically consistent, at the expense of the fact that you're looking at an immutable snapshot that might be out-of-date.

The problem here is that the choice about whether to access the first element or not is based on "stale" data. Designing a truly thread-safe mutable data structure in a world where nothing is allowed to be stale can be very difficult. Consider what you'd have to do in order to make the "Peek" operation above actually threadsafe. You'd need a new method:

if (queue.Peek(out first)) Console.WriteLine(first);

Is this "thread safe"? It certainly seems better. But what if after the Peek, a different thread dequeues the queue? Now you're not crashing, but you've changed the behaviour of the previous program considerably. In the previous program, if, after the test there was a dequeue on another thread that changed what the first element was, then you'd either crash or print out the up-to-date first element in the queue. Now you're printing out a stale first element. Is that correct? Not if we always want to operate on up-to-date data!
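A rough sketch of that staleness, using .NET's `ConcurrentQueue<T>.TryPeek` as a stand-in for the combined test-and-read method described above. `TryPeekDemo` and `StaleHead` are hypothetical names, and the competing dequeue is again replayed on the same thread so the outcome does not depend on timing.

```csharp
using System;
using System.Collections.Concurrent;

public static class TryPeekDemo
{
    // Returns the value the caller ends up printing together with the
    // queue's actual head at the moment of printing.
    public static (int printed, int actualHead) StaleHead()
    {
        var queue = new ConcurrentQueue<int>();
        queue.Enqueue(1);
        queue.Enqueue(2);

        int printed = -1;
        if (queue.TryPeek(out int first))   // atomic test-and-read: never throws
        {
            queue.TryDequeue(out _);        // stands in for the other thread
            printed = first;                // a snapshot that is already stale
        }

        queue.TryPeek(out int head);        // what the head really is now
        return (printed, head);
    }

    public static void Main()
    {
        var result = StaleHead();
        Console.WriteLine(result.printed + " vs " + result.actualHead);
    }
}
```

The crash is gone, but the caller is now acting on a value that is no longer at the head of the queue.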

But wait a moment -- actually, the previous version of the code had this problem as well. What if the dequeue on the other thread happened after the call to Peek succeeded but before the Console.WriteLine call executed? Again, you could be printing out stale data.

What if you want to ensure that you are always printing out up-to-date data? What you really need to make this threadsafe is:

queue.DoSomethingToHead(first=>{Console.WriteLine(first);});

Now the queue author and the queue user agree on what the relevant scenarios are, so this is truly threadsafe. Right?

Except... there could be something super-complicated in that delegate. What if whatever is in the delegate happens to cause an event that triggers code to run on another thread, which in turn causes some queue operation to run, which in turn blocks in such a manner that we've produced a deadlock? Is a deadlock "correct behaviour"? And if not, is this method truly "safe"?
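One way such a deadlock can arise, sketched under the assumption that `DoSomethingToHead` holds the queue's lock while the delegate runs. `HeadQueue<T>`, `TryEnqueue`, and `CallbackDemo` are hypothetical; the timed `TryEnqueue` exists only so that the demonstration gives up after 200 ms instead of hanging forever, as a plain blocking `Enqueue` would.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical queue whose head callback runs while the lock is held.
public sealed class HeadQueue<T>
{
    private readonly object gate = new object();
    private readonly Queue<T> items = new Queue<T>();

    public void Enqueue(T item) { lock (gate) { items.Enqueue(item); } }

    // Gives up after the timeout instead of blocking forever.
    public bool TryEnqueue(T item, TimeSpan timeout)
    {
        if (!Monitor.TryEnter(gate, timeout)) return false;
        try { items.Enqueue(item); return true; }
        finally { Monitor.Exit(gate); }
    }

    // The head cannot change while the callback runs, because the lock
    // is held for the whole call. Returns false if the queue was empty.
    public bool DoSomethingToHead(Action<T> action)
    {
        lock (gate)
        {
            if (items.Count == 0) return false;
            action(items.Peek());
            return true;
        }
    }
}

public static class CallbackDemo
{
    // The callback waits for a second thread that needs the queue's lock.
    // With a blocking Enqueue in place of TryEnqueue, neither thread
    // could ever make progress.
    public static bool OtherThreadIsLockedOut()
    {
        var queue = new HeadQueue<int>();
        queue.Enqueue(1);

        bool enqueued = true;
        queue.DoSomethingToHead(first =>
        {
            var worker = new Thread(() =>
            {
                enqueued = queue.TryEnqueue(2, TimeSpan.FromMilliseconds(200));
            });
            worker.Start();
            worker.Join();   // wait for the worker while still holding the lock
        });
        return !enqueued;
    }

    public static void Main()
    {
        Console.WriteLine(OtherThreadIsLockedOut());
    }
}
```

The queue cannot know what the delegate will wait on, so it cannot rule this out by itself; the contract has to say what callers may do inside the callback.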

Yuck.

By now you take my point, I'm sure. As I pointed out earlier, it is unhelpful to say that a building or a hunk of code is "secure" without somehow communicating which threats the utilized security mechanisms are and are not proof against. Similarly, it is unhelpful to say that code is "thread safe" without somehow communicating what undesirable behaviors the utilized thread safety mechanisms do and do not prevent. "Thread safety" is nothing more nor less than a code contract, like any other code contract. You agree to talk to an object in a particular manner, and it agrees to give you correct results if you do so; working out exactly what that manner is, and what the correct responses are, is a potentially tough problem.

************

(*) Yes, I'm aware that if I think something on Wikipedia is wrong, I can change it. There are two reasons why I should not do so. First, as I've already stated I'm not an expert in this area; I leave it to the experts to sort out amongst themselves what the right thing to say here is. And second, my point is not that the Wikipedia page is wrong, but rather that it illustrates that the term itself is vague by nature.

 

  • If something is thread safe, say an API, it can be used by several threads at the same time without the caller having to think about it. This is as old as computers.  It is not something you can wave your hand at and say is vague and then ignore. If something is not thread safe, you should only use it from a single thread, or it is likely to crash, hang, or just silently go wrong. If you want to develop in a thread safe manner, and you do, you must think defensively, and at the very least use a global lock so that in your code, things only happen single threaded. Later on, carefully break stuff up into separate locks, then maybe use read/write locks. Yes, it's harder, and there are many traps, but there are many off-the-shelf tools and techniques, plus most of computer history, to help you. That or develop in a language that does it for you, but that will always be more limited than a manual environment. Just like with cars and gears, all racing cars are manual and you can't get the same miles per gallon without manual. You're fooling yourself if you think automatic is as good.

  • I have to agree with Eric on this one.  While there are good definitions of "thread safe", there is no universal agreement on which of the many good definitions is "correct".  Correctness, of course, depends on the circumstances.  As a result, the term "thread safe" with no further qualification is vague at best, dangerously misleading at worst.

    Thread safety is somewhat analogous to exception safety.  The C++ community has settled on a multi-tier definition of "exception safe" - I would propose that a similar family of thread safety guarantees would be a useful addition to the dialog.  The mathematical definition that Alun provided above sounds like a good candidate for being "the strong guarantee".  At another extreme, a class that's documented as providing a single method that can be invoked by fewer than 3 threads under specific circumstances would be an example of a very weak guarantee.

    The problem with the very strong guarantee that Alun provided - just like the strong exception safety guarantee in C++ - is that most cases don't require a guarantee that strong, and generally speaking, providing such a strong guarantee is more difficult and less efficient (naturally, there are always exceptions).

  • I'm hoping (fingers crossed!) that the inclusion of the "Michelle Pfeiffer" tag means that Ms. Pfeiffer will make additional appearances in future examples. I for one welcome our new photogenic example overlords.

  • Hi Eric,

    "My whole point was that simply saying that an object is "thread-safe" tells you almost nothing about how to correctly use that object."

    No. It tells you something very, very important about how to correctly use that object: It tells you that you do not have to restrict the number or relative timing of threads that call into it for the /rest/ of the documentation to hold. This is a non-trivial piece of knowledge, and the meaning is clearly defined. I can document up my code as much as I like, but if I don't include the information that the code is thread-safe then any user must assume that it will all go to hell if they allow multiple threads to access with arbitrary timings.

    "Thread safe" means "correct".

    No, it doesn't. It means that whatever the code has been described to do, it will continue to do it properly regardless of the number and relative timing of threads that call it. Clearly, what "correct" means depends on what the code is intended to do, and clearly this must be defined. But that point is orthogonal to thread-safety; /all/ code must be documented so that the user knows what it is supposed to do. The fact that code must be documented before it is usable does not diminish the usefulness of the term "thread-safe" as part of that documentation.

    "Rather, the object must be extensively documented so that its exact contract can be stated."

    /Any/ code must be extensively documented so that its exact contract can be stated. "Thread-safe" is a useful part of that documentation.

    The problem here is that you are proposing a broader meaning of "thread-safe" than has ever existed, and then pointing out that your broader proposed meaning is too broad to be meaningful. Well, yes; your proposed meaning /is/ way too broad. But the meaning of "thread-safe" has always been much more narrow and specific. Usefully so. "Thread-safe" means the code will work as documented regardless of the number and relative timing of the calling threads.

    "Thread-safe" does /not/ mean that the code will help solve whatever threading challenges your application has. It does not mean, and has never meant, that it will meet any and all needs for consistent presentation of data to multiple threads. It does not mean that it will save you from having to figure out how to solve those problems yourself. How could it possibly mean that?

    Look, you and I agree on what "thread safe" typically means in typical documentation. You don't need to convince me of that. (Whether what it means is useful is another question, which I'll come to in a moment.)

    My point, which you seem to both be doing an admirable job of forcefully supporting, and yet at the same time completely missing, is that when I am asked by a customer "is this code thread safe?" nine times out of ten, they have a definition of "thread-safe" in their head that you would describe as "not at all the real meaning of thread-safe". Whether a precise mathematical definition exists or not is irrelevant; the way the term is used in practice by customers who ask me questions is completely vague. Which is why I call out that when I'm asked whether code is "thread safe" I have to immediately push back in order to determine exactly what the customer believes "thread safe" means. Because they almost certainly mean what Denis means above, not what you and I mean, and certainly not what our mathematically-inclined friend above means. -- Eric

     

     

    Every single one of your examples involves problems in, or requirements of, the calling code, and how the Queue does not solve or meet these. "Thread-safe" has never been about any of these sorts of requirements. It has only ever been about /one/ thing: whether or not the code works as documented when called by multiple threads.

    You say that code should say what specific security threats it is armoured against rather than just saying it is "secure". Well, saying code is "thread-safe" is doing /exactly/ what you want; it is stating a particular /and very specific/ capability of the code. It is saying that whatever the code is documented to do, it will do so correctly regardless of the number and relative timing of threads calling into it.

    In summary: "thread-safe" does /not/ mean "correct". It does /not/ mean "useful in your situation". It does /not/ mean "provides the consistent view of shared data that you need". Yes indeed, all of those things must be described by the rest of the documentation. Then the term "thread-safe" lets you know that all of that rest of the documentation will continue to hold in the presence of multiple threads.

    Now, all that said, I think you've overstated the case somewhat. Let's take my naive threadsafe queue as an example. Suppose the documentation for IsEmpty() says: "This method of the threadsafe queue returns true if the queue is empty and false otherwise". Is this documentation accurate? Does it actually do that in a multithreaded situation? No, it does not. It returns true if the queue has ever been empty in the past, and false if the queue has ever been non-empty in the past. If both conditions are met, which one you get depends on timings of other operations on other threads. That's very different! If you want to know what the queue is NOW, you have to put a lock around every access to the queue.

    Now, the documentation should probably say that (and the method should probably be called "MightOrMightNotHaveBeenEmptyAtSomePointInThePast()"). What conclusion could we reach other than "if you use IsEmpty then every access to this allegedly-threadsafe queue needs to be synchronized, exactly as though it were not threadsafe in the first place"?

    Does a queue that needs to be globally synchronized in order for its most basic methods to be used predictably seem "threadsafe" to you? Maybe to you it does. To most of the customers who ask me questions like this? Certainly not.

    Another example. Consider my recent post on the thread safety of event delegate invocations. A lot of people ask "is this event invocation code threadsafe?" Are they asking "does this code never dereference null?" or are they asking "does this code never invoke a previously-removed event handler?" Both seem like perfectly reasonable interpretations of "is this code threadsafe?" but the answer to one of those questions is "yes" and the answer to the other is "no", so it is rather important to know which one they're asking about.
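The two readings can be told apart with a small sketch. It assumes the widely used copy-to-a-local-then-null-check invocation pattern, which may or may not be the exact code the post in question discusses. `Publisher`, `BetweenCopyAndInvoke`, and `EventDemo` are hypothetical; the hook merely stands in for the moment at which another thread could unsubscribe.

```csharp
using System;

public sealed class Publisher
{
    public event EventHandler Changed;

    // Hypothetical test hook standing in for "another thread runs here".
    public Action BetweenCopyAndInvoke;

    public void RaiseChanged()
    {
        EventHandler handler = Changed;     // snapshot of the delegate
        BetweenCopyAndInvoke?.Invoke();     // a handler may be removed here
        if (handler != null)                // the local copy cannot become null
            handler(this, EventArgs.Empty);
    }
}

public static class EventDemo
{
    // Counts how often a handler runs after it has been unsubscribed.
    public static int CallsAfterUnsubscribe()
    {
        var publisher = new Publisher();
        int calls = 0;
        EventHandler onChanged = (sender, e) => calls++;

        publisher.Changed += onChanged;
        publisher.BetweenCopyAndInvoke = () => { publisher.Changed -= onChanged; };
        publisher.RaiseChanged();
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CallsAfterUnsubscribe());
    }
}
```

In this sketch the local copy rules out the null dereference, yet the handler still runs once after it was removed, so the answer depends on which of the two questions was being asked.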

    Most of the time when people say that their code is "threadsafe" they mean that it happens to have the almost completely useless property that it maintains internal consistency robustly in the face of arbitrary thread timings. Though I suppose that's a nice property to have, does it really matter whether an object has that property if in order to use any of its basic methods, you're going to have to synchronize access to it exactly as though it did not armor its internals against race conditions?

    Basically, what you're saying is that a non-threadsafe object is an object that has completely undefined behaviour when called from multiple threads; if you want defined behaviour, you have to synchronize access to it. A threadsafe object, by contrast, has defined behaviour, and that behaviour is defined to be inconsistent and timing-dependent. If you want consistent behaviour then again, you have to synchronize access, exactly the same as if it had been non-threadsafe in the first place. Frankly, the difference between "undefined" and merely "inconsistent" seems pretty weak, hardly worth the effort of making the object threadsafe in the first place if every consumer of the object is going to need to synchronize access to it anyway. -- Eric

  • Eric:

    If I said that "this code is thread un-safe" would it be less vague than "is this code thread safe"?

  • I don't think the assertion of thread safety tells us nothing, or even almost nothing. I've always interpreted "thread-safe" to assert the *minimal* contract of thread-safety: instance separation (no mutable static state) and method/property atomicity; operations on a shared instance in multiple threads will behave the same way and leave the instance in the same state at sequence points as the same operations called in an unpredictable, arbitrary order on one thread. In other words, if one thread calls obj.A () and another thread calls obj.B (), then those calls will be equivalent to calling obj.A (); obj.B (); or obj.B (); obj.A () in one thread. I never have to lock a thread-safe object to do a single call; I have to lock it (or do something special) only when I want to be sure of the *order* of calls.

    As you note, this is not a very strong contract, and I'm not at all surprised that some people might believe "thread safety" asserts a stronger contract. But minimal thread-safety is still a non-trivial contract.

  • I agree with Larry, and I'll add that a lack of thread-safety doesn't just affect ordering. Code that isn't thread-safe being called from multiple threads easily results in state corruption, seemingly arbitrary exceptions and memory corruption (not in all languages). When you call code that's simply "thread-safe", you can expect behaviour approximately matching that of the contract.

    If there's a thread-safe queue with TryPop, I know I can use that from multiple threads without breaking anything and that each call will get a uniquely-added entry (or false,null). However, Peek will obviously always be meaningless in such circumstances.

    This does all make me think, though; there may be some good mileage in having thread-safe libraries go out of their way to induce failure in multi-threaded callers, in an appropriate debug mode. Many queue methods, for example, could delay a little, in hope of returning an unexpected null to a caller or returning the same result simultaneously to two callers (where Peek is involved).

    Hopefully, more people will start using multi-threading idioms other than shared-state-with-locks, and much of the problem will be reduced.

  • @Dan: "This raises the bigger question - is Michelle Pfeiffer  thread safe? "

    Michelle Pfeiffer can do whatever she wants to my threads :)

    (sorry for the OT, I couldn't resist)

  • I would define a method to be thread-safe if at least one of these is true:

    1. it is a pure function

    2. For functions that read and/or write mutable shared state, if the class invariants can NOT be violated as a consequence of the number, timing and/or interleaving of threads executing this very same method concurrently.

    Eric, I would say that you are taking on a whole different animal in your example by talking about thread-safety at the level of conducting multiple operations on an object whose state can be modified by different threads. Thread-safe constructs as they are used now do NOT compose (EVER). That is to say in:

    if (!queue.IsEmpty) Console.WriteLine(queue.Peek());

    I might take and release a lock in IsEmpty and do the same in Peek(), but the outcome of composing these into the above statement is NOT thread safe. This is, I would argue,  not a knock on the thread-safety of the member functions of Queue at all. You could easily have a TryPeekAndPrint() call on Queue that does the above operation in a thread-safe manner by composing all the other operations.

    This is pretty much where Software Transactional Memory comes in since you want to make sure that the outcome of doing IsEmpty and Peek is not influenced by other threads or transient behavior.

  • There's exactly the same remark in Java Concurrency in Practice, section 2.1...

  • Hi Eric,

    "My point, which you seem to both be doing an admirable job of forcefully supporting, and yet at the same time completely missing, is that when I am asked by a customer "is this code thread safe?" nine times out of ten, they have a definition of "thread-safe" in their head that you would describe as "not at all the real meaning of thread-safe"."

    Are you really going to propose that the meaningfulness or usefulness of a technical term depends on how well customers understand it? Really? How many meaningful or useful technical terms would we have left if that were the standard?

    Look: the fact that a customer (or you, or any of the hypothetical other people you mention) might misunderstand what "thread-safe" means does not mean it is meaningless or useless. Multi-threading is /hard/. It is hard to understand the issues, and it is hard to get right even when using thread-safe components. The fact that one has to have a good understanding of threading to write correct multi-threaded code does /not/ mean that "thread-safe" is meaningless.

    "Suppose the documentation for IsEmpty() says that it "This method of the threadsafe queue returns true if the queue is empty and false otherwise". Is this documentation accurate? Does it actually do that in a multithreaded situation? No, it does not. It returns true if the queue has ever been empty in the past, and false if the queue has ever been non-empty in the past. If both conditions are met, which one you get depends on timings of other operations on other threads. That's very different! If you want to know what the queue is NOW, you have to put a lock around every access to the queue."

    Anyone who understood multithreading would know this immediately, without having to be told. Someone who didn't definitely might not. This does /not/ show that the term "thread-safe" is meaningless. It shows that people who do not understand threading issues will likely not understand what "thread-safe" really means. In other shocking news, people who do not understand quantum mechanics will not likely understand what "superposition" really means.

    To be blunt (but not unkind), I think your surprise/outrage at how useless IsEmpty would be in a multithreaded scenario says much more about your experience with multithreading than anything else. It is, as I said before, Threading 101. Most people haven't taken Threading 101. Their misunderstanding of threading terms does not say much about the usefulness or meaningfulness of those terms.

    I've written many, many thread-safe queues. None of them have an IsEmpty that is intended to be used when anything else could be accessing the queue at the same time. Neither do they have a Peek(). They have a "bool TryDequeue(out T value)", and even with /that/ one has to be very careful to make sure that whatever thread is servicing the queue won't end up stranding some entry that was put in right after the last time they checked and got back a "false". So then one adds an Event or a "bool WaitPop(TimeSpan wait, out T value)", or both, to /help/ the calling code do things correctly. Not /guarantee/. /Help/. There is /nothing/ I can do to make sure someone understands for sure how to use my queue correctly. And even if my queue only had useless methods like IsEmpty and Peek, I could /still/ call it thread-safe as long as it wouldn't become internally inconsistent when two threads called into it at the same time. Thread-safe, yes. Useful, no. Thread-safe does /not/ mean "useful" or "correct". It means "thread-safe".

    "Most of the time when people say that their code is "threadsafe" they mean that it happens to have the almost completely useless property that it maintains internal consistency robustly in the face of arbitrary thread timings."

    Eric, the fact that you consider this property to be "almost completely useless" indicates to me that you can't possibly have had to write much multithreaded code. Good thread-safe components are gold. Try to write a good Queue that helps one thread pass information to another, and you'll see.

    "A threadsafe object, by contrast, has defined behaviour, and that behaviour is defined to be inconsistent and timing-dependent. If you want consistent behaviour then again, you have to synchronize access, exactly the same as if it had been non-threadsafe in the first place."

    No, no, no. Please take your IsEmpty example and erase it from your mind. Nobody with any competence writes a thread-safe queue with an IsEmpty that is intended to be used when multiple threads could be running. Nobody with any competence at multi-threading would ever attempt to use IsEmpty when multiple threads are running. All you are demonstrating is that it is possible to be incompetent at multi-threading, and thread-safe components can't save you from that. This is not a surprise, and it does not advance your argument at all.

    Take my Queue. If one or more threads want to push information to one or more other threads, all the supplier threads have to do is call Queue.Enqueue(T value), and all the listener threads need to do is call WaitDequeue() with some timeout (so they can periodically check other things, including whether they should exit). Done! Information safely gets from suppliers to listeners, with no other synchronization necessary.

    Now, imagine the queue wasn't thread-safe! The whole thing becomes useless. You can't just wrap all access in an external lock; the suppliers wouldn't be able to call Enqueue when a listener was blocked in WaitDequeue. Thread-safety makes the queue useful.
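The hand-off described here can be sketched with .NET's `BlockingCollection<T>`, used as a stand-in for the commenter's own queue, which is not shown. `TryTake` with a timeout plays the role of `WaitDequeue`; `HandoffDemo` and `SumThroughQueue` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class HandoffDemo
{
    // One supplier pushes 1..itemCount; one listener sums what it receives.
    // The only synchronization between them is inside the collection.
    public static int SumThroughQueue(int itemCount)
    {
        using (var queue = new BlockingCollection<int>())
        {
            int sum = 0;
            var listener = new Thread(() =>
            {
                // Loop until the supplier has finished and the queue is drained.
                while (!queue.IsCompleted)
                {
                    // Timed wait, so the listener can periodically check
                    // other things, such as whether it should exit.
                    if (queue.TryTake(out int value, TimeSpan.FromMilliseconds(50)))
                        sum += value;
                }
            });
            listener.Start();

            for (int i = 1; i <= itemCount; i++)
                queue.Add(i);              // legal while the listener is blocked
            queue.CompleteAdding();        // tells the listener nothing more is coming

            listener.Join();
            return sum;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumThroughQueue(100));
    }
}
```

Wrapping a non-threadsafe queue in one external lock could not reproduce this, because the listener would hold that lock while waiting and the supplier could never add anything.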

    Eric, I will be blunt once again (and once again, not unkindly): I think you might want to consider whether your admitted lack of experience with multi-threading is the real issue here.

    It is completely unsurprising that IsEmpty is useless in a multi-threaded scenario. It is completely unsurprising that any Queue that fails to expose an atomic check-and-dequeue will be useless regardless of thread-safety. It is completely unsurprising that even useful thread-safe components do not absolve the user of worrying about their own code, and issues like deadlocks. Multi-threading is hard, and the fact that thread-safe components fail to make it as easy as single-threading does not mean that thread-safe components are useless, or that the term "thread-safe" is badly defined.

    My final word on this: do some serious multi-threaded coding, write some useful thread-safe components, and then we'll see how "almost completely useless" you think the term and the components are.

  • Hi,

    Imagine I have some code, for which I give the specification: "This is a simple addition of two integers, but if more than one thread accesses it at a time, it will completely explode and burn your computer".

    Then I run it, create 2 threads and execute my code ... it explodes and burns my computer.

    This is the "correct" behavior (as stated in the specification), and therefore my code is thread-safe!

    ...

    That's why coders should never write specs ;)

    Bye!

  • I think nobody should ever attest to an "object" being thread-safe. You may have stateful behaviour, or you may have stateless method calls (each method takes all the data it operates on as parameters and does not maintain state).

    The most we could attest to is that an "operation" is thread-safe. Once you follow this definition, you may alternatively attest that "each" operation in the Queue class is thread-safe.

  • ""Thread-safe" has never been about any of these sorts of requirements. It has only ever been about /one/ thing: whether or not the code works as documented when called by multiple threads."

    That's pretty straightforward. "Does the code work as documented?" Simple.

    "Information safely gets from suppliers to listeners, with no other synchronization necessary.

    Now, imagine the queue wasn't thread-safe! The whole thing becomes useless."

    I am imagining receiving your component with no documentation. By your definition above, since I have no documentation, the component cannot behave as documented, and is therefore not thread-safe. Strangely, I go ahead and use the component and, being well-designed, it is not useless. It's a great piece of code, and I find it indispensable, because it is, in fact, thread-safe, regardless of any documentation or lack thereof. Clearly, the definition of thread-safety revolving around the documentation is not helpful.

    Also, I believe the small aside concerning the exact definition of "begging the question" is not merely coincidental. It clearly illustrates that, however well documented the original use of some term may be, that definition is vague and useless when faced with overwhelming popular opinion. In similar fashion, because there are likely to be far more consumers of your code than creators of it(!), you might find that catering to the popular definition is of more use and promotes more clear communication. Certainly, not caring what definition of terms is popular amongst the majority of a product's customer base will neither promote that product, nor help people to use it correctly if use of the term is critical to the use of the product.

    To be blunt (but not unkind), I think your surprise/outrage at how little people understand the term (particularly in the face of your less than stellar definition) says much more about your experience with large groups of code consumers than anything else. It's Customer Relations 101. Most people haven't taken Customer Relations 101 (particularly customer service agents!). Their misunderstanding of threading terms says much about the usefulness, if not meaningfulness, of those terms.

  • If you updated Wikipedia you'd need to rewrite your opening analogy. You wouldn't want to be referencing stale data.