What is this thing you call "thread safe"?

What is this thing you call "thread safe"?

Rate This
  • Comments 30

Caveat: I am not an expert on multi-threading programming. In fact, I wouldn't even say that I am competent at it. My whole career, I've needed to write code to spin up a secondary worker thread probably less than half a dozen times. So take everything I say on the subject with some skepticism.

A question I'm frequently asked: "is this code thread safe?" To answer the question, clearly we need to know what "thread safe" means.

But before we get into that, there's something I want to clear up first. A question I am less frequently asked is "Eric, why does Michelle Pfeiffer always look so good in photographs?" To help answer this pressing question, I consulted Wikipedia:

"A photogenic subject is a subject that usually appears physically attractive or striking in photographs."

Why does Michelle Pfeiffer always look so good in photographs? Because she's photogenic. Obviously.

Well, I'm glad we've cleared up that mystery, but I seem to have wandered somehwat from the subject at hand. Wikipedia is just as helpful in defining thread safety:

 "A piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads."

As with photogenicity, this is obvious question-begging. When we ask "is this code thread safe?" all we are really asking is "is this code correct when called in a particular manner?" So how do we determine if the code is correct? We haven't actually explained anything here.

Wikipedia goes on:

"In particular, it must satisfy the need for multiple threads to access the same shared data, ..."

This seems fair; this scenario is almost always what people mean when they talk about thread safety. But then:

"...and the need for a shared piece of data to be accessed by only one thread at any given time."

Now we're talking about techniques for creating thread safety, not defining what thread safety means. Locking data so that it can only be accessed by one thread at a time is just one possible technique for creating thread safety; it is not itself the definition of thread safety.

My point is not that the definition is wrong; as informal definitions of thread safety go, this one is not terrible. Rather, my point is that the definition indicates that the concept itself is completely vague and essentially means nothing more than "behaves correctly in some situations". Therefore, when I'm asked "is this code thread safe?" I always have to push back and ask "what are the exact threading scenarios you are concerned about?" and "exactly what is correct behaviour of the object in every one of those scenarios?"

Communication problems arise when people with different answers to those questions try to communicate about thread safety. For example, suppose I told you that I have a "threadsafe mutable queue" that you can use in your program. You then cheerfully write the following code that runs on one thread while another thread is busy adding and removing items from the mutable queue:

if (!queue.IsEmpty) Console.WriteLine(queue.Peek());

Your code then crashes when the Peek throws a QueueEmptyException. What is going on here? I said this thing was thread safe, and yet your code is crashing in a multi-threaded scenario.

When I said "the queue is threadsafe" I meant that the queue maintains its internal state consistently no matter what sequence of individual operations are happening on other threads. But I did not mean that you can use my queue in any scenario that requires logical consistency maintained across multiple operations in a sequence. In short, my opinion of "correct behaviour" and your opinion of the same differed because what we thought of as the relevant scenario was completely different. I care only about not crashing, but you care about being able to reason logically about the information returned from each method call.

In this example, you and I are probably talking about different kinds of thread safety. Thread safety of mutable data structures is usually all about ensuring that the operations on the shared data always operate on the most up-to-date state of the shared data as it mutates, even if that means that a particular combination of operations appears to be logically inconsistent, as in our example above. Thread safety of immutable data structures is all about ensuring that use of the data across all operations is logically consistent, at the expense of the fact that you're looking at an immutable snapshot that might be out-of-date.

The problem here is that the choice about whether to access the first element or not is based on "stale" data. Designing a truly thread-safe mutable data structure in a world where nothing is allowed to be stale can be very difficult. Consider what you'd have to do in order to make the "Peek" operation above actually threadsafe. You'd need a new method:

if (!queue.Peek(out first)) Console.WriteLine(first);

Is this "thread safe"? It certainly seems better. But what if after the Peek, a different thread dequeues the queue? Now you're not crashing, but you've changed the behaviour of the previous program considerably. In the previous program, if, after the test there was a dequeue on another thread that changed what the first element was, then you'd either crash or print out the up-to-date first element in the queue. Now you're printing out a stale first element. Is that correct? Not if we always want to operate on up-to-date data!

But wait a moment -- actually, the previous version of the code had this problem as well. What if the dequeue on the other thread happened after the call to Peek succeeded but before the Console.WriteLine call executed? Again, you could be printing out stale data.

What if you want to ensure that you are always printing out up-to-date data? What you really need to make this threadsafe is:

queue.DoSomethingToHead(first=>{Console.WriteLine(first);});

Now the queue author and the queue user agree on what the relevant scenarios are, so this is truly threadsafe. Right?

Except... there could be something super-complicated in that delegate. What if whatever is in the delegate happens to cause an event that triggers code to run on another thread, which in turn causes some queue operation to run, which in turn blocks in such a manner that we've produced a deadlock? Is a deadlock "correct behaviour"? And if not, is this method truly "safe"?

Yuck.

By now you take my point I'm sure. As I pointed out earlier, it is unhelpful to say that a building or a hunk of code is "secure" without somehow communicating which threats the utilized security mechanism are and are not proof against. Similarly, it is unhelpful to say that code is "thread safe" without somehow communicating what undesirable behaviors the utilized thread safety mechanisms do and do not prevent. "Thread safety" is nothing more nor less than a code contract, like any other code contract. You agree to talk to an object in a particular manner, and it agrees to give you correct results if you do so; working out exactly what that manner is, and what the correct responses are, is a potentially tough problem.

************

(*) Yes, I'm aware that if I think something on Wikipedia is wrong, I can change it. There are two reasons why I should not do so. First, as I've already stated I'm not an expert in this area; I leave it to the experts to sort out amongst themselves what the right thing to say here is. And second, my point is not that the Wikipedia page is wrong, but rather that it illustrates that the term itself is vague by nature.

 

  • It can get even more complicated.  In our implementation of the work-stealing queue (based on the excellent work of Danny Hendler, et al), we have thread-safety in one method only, namely TrySteal().  The other methods, Push() and TryPop() are by-design always called from the same thread.  Calling them from different threads would be a violation of the contract.  However, TrySteal() can be called by a number of different threads and not corrupt the state of the queue.  What's even more amazing is that the implementation achieves this without any locks or loops!

    For those curious, the C# implementation of LockFreeWorkStealingDeque is available under Apache 2 in MindTouch Dream.

  • Wouldn't putting access to data structures that mutate in a critical section(lock) be better? Thread safe (simultaneous access by many threads) should only be for code that dont handle shared data structures...

  • In this particular example, the queue is perfectly thread safe, the code using the queue isn't. The queue is only responsible for itself, the fact that you're doing multi-threaded operations on the queue implies that any information you get from the queue can be out of date a few moments later.

    Your example code does not take into account the fact that the queue can be modified at any time by other threads, and your code is thus not thread safe. If you need multiple calls to the queue to be consistent, you have an unguarded critical path in _your code_ not in the queue's, and it's your responsibility to guard it. In this case it could be solved by synchronizing on the queue.

  • When you reference Wikipedia, especially by link as you have done, it's especially important to use the "Permanent Link" button in the bottom left. That allows you to permanently link to the same version of the article that you saw.

    Otherwise what you're linking to could be vandalized (popular articles frequently are, though they are corrected that much more quickly,) and your users could see something other than what you intended, or problems that you point out could be corrected, and your point is moot. :)

  • > As with photogenicity, this is obvious question-begging.

    How is this question-begging?

    Question begging does not mean "to avoid providing the answer to a question" it means that a new question needs to be answered as a result of the answer to the original question.

    That is all.

    No, it doesn't mean that. Let me give you another example of question begging.

    Suppose I asked "why is diamond hard but butter is soft?" and you answered "diamond and butter are both made out of atoms; the atoms of diamonds are hard and the atoms of butter are soft." You would have begged the question; your answer to my question "why are some things hard and some things soft" is "because some things are made out of stuff that is hard and some things are made out of stuff that is soft" -- that is, you've avoided answering the question by providing an "explanation" that itself cannot be understood without answering the original question -- namely, why is some stuff hard and some stuff soft? This pseudo-explanation has no predictive power; it doesn't tell us anything new, it just circles back on itself. 

    A non-question-begging answer would be "diamond and butter are both made of atoms; the atoms of diamond are all identical and arranged in a stable, rigid lattice where every point in the lattice is reinforced by a strong bond to four other points. The atoms of butter are a disorganized collection of many different atoms that hold weakly to each other. It takes only a small force to disrupt the loose arrangement of butter atoms but a very large force to disrupt the strong arrangement of diamond atoms."

    Now, this explanation does *raise* more questions. It raises questions like "why are some lattices strong and some weak?" and "why are some objects composed of many different kinds of atoms, and some composed of just one atom?" Question-begging is not the act of raising more questions. Every explanation raises more questions. This particular explanation is testable, and has predictive power; we can investigate the hardness or softness of other substances, and make predictions about what sorts of atomic structures they will have -- or, vice versa, we can look at an atomic structure and try to figure out from it how hard the substance will be.

    My point here is that "because she's photogenic" is question-begging. Why does she look so good? Because she's photogenic. Why is she photogenic? Because she looks so good. We have learned nothing about photogenicity (or Ms. Pfeiffer).

    Similarly, if you ask "why is this code thread-safe?" and the answer is "because it can be correctly called on multiple threads", you've begged the question. Why is it thread-safe? Because it's correct. Why is it correct? Because it's thread-safe. Again, we have learned nothing about the nature of thread safety.

    -- Eric

  • > Question begging does not mean "to avoid providing the answer to a question"

    Correct, it does not mean that.

    > it means that a new question needs to be answered as a result of the answer to the original question.

    It doesn't mean that either.

    The way it's used in the article is correct, see for more information: http://en.wikipedia.org/wiki/Begging_the_question

  • This raises the bigger question - is Michelle Pfeiffer  thread safe?

  • >In this particular example, the queue is perfectly thread safe, the code using the queue isn't.

    I am inclined to agree. I think that trying to enforce some sort of "thread-safe" contract on the caller is too much work. I was just thinking of the LINQ Select where the selector takes anything, and gives you anything. I could make my selector throw and exception, but I don't think that is the fault of the Select statement.

    I see it as an implicit contract that the caller of Select will not pass in an invalid function (or one with exceptions, etc...), and if the caller does pass in garbage, then it is the responsibility of caller to catch and handle the exception.

    Now to turn this around on the "thread-safe" issue. I don't believe stateless method calls like Select should worry about anything but what they are designed to do. Even if the selector has state wrapped in a closure, I don't think that is something Select has to worry about.

    The wrench in the works is shared state like a queue, and I don't have an answer for that one.

  • Hey throw some shared memory in there for yet more fun.

  • I'd correct the Wikipedia, except the wikipedia doesn't allow you to know anything, it merely allows you to paraphrase other references. The definition in the wikipedia as it was at the time of that article is patently wrong.

    "A piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads."

    That's called "thread lucky", not "thread safe".

    A piece of code is thread safe if it

    a) _always_ functions according to spec irrespective

    b) of the number of threads running,

    c) or the number of threads invoking that code simultaneously,

    d) or the order in which the code is invoked by those threads,

    e) or the timing and sequence of external events that may trigger a thread context switch.

    That's what Thread Safety is... but it is incredibly hard to _prove_ anything is thread safe!

    Note this is subtly different from multi-processor safe or scheduler safe, which are somewhat harsher requirements. (However I consider it Bad Form to code to the assumption a particular scheduler or number of CPU's!)

    Alas, the Queue class _is_ thread safe. End of Story.

    The initial code examples you give using the Queue class isn't. Using Thread unsafe code without appropriately serializing access to it makes your code unsafe.However, using Thread Safe code does _not_ make your code Thread safe. So going back to the "meaning of Thread Safety" and what to do about the fact proving Thread Safety is so hard...

    Instead of trying to prove code is thread safe... we can either design our code to be so simple, that it is obviously thread safe.

    Or we, as is usual for this industry, design it so complex that there are no obvious thread safety issues.... and then rely on inspection to round up the usual suspects...

    The most usual causes of thread unsafety come down to one of...

    * Thread races. This is the problem the blogger had. A thread race on access to the shared queue resource.

    * Deadlocks.

    * Caching invalidation (Usually the failure to use the volatile in the rather rare cases where needed.)

    * Priority Inversion.

    * Starvation.

    While that may be a shortish list... the ways in which they may occur are

    legion and incredibly subtle.

    Why? Thread safety is a global property of a system, not a local property. (Although it is possible to code in such a manner that it _is_ a local property. In fact I heartily recommend _always_ coding that way for many other Good Reasons!)

  • I disagree. The fact that thread-safe code can be invoked by broken code does not mean that the term "thread-safe" is meaningless.

    I didn't say it was meaningless. I said it was vague. -- Eric

    The term "thread-safe" is perfectly well-defined. It means that the code in question will operate correctly regardless of the number and relative timing of threads calling into it.

    And now you are re-stating my point; I'm glad we agree. "Thread safe" means "correct". What does the entirely vague "correct" mean? That depends on what the code is supposed to do in a particular situation. We haven't said anything new. -- Eric

    It specifically does /not/ imply that any external code that invokes the thread-safe code is similarly guaranteed to operate correctly. Obviously it cannot provide any such guarantee, as the author has no control over the external code . The examples given in the post are thus completely beside the point. They in no way show that the term "thread-safe" is meaningless. They only show that it is possible to write broken code that uses thread-safe code.

    Obviously, the author of a piece of code is responsible for documenting how the code works: if Queue.Peek() will throw on an empty queue, this must be documented. And, of course, a thread-safe queue is more useful if it provides an atomic Peek() such as was suggested. But the bottom line is that it is not the author's responsibility if a user of the code is not sufficiently thread-savvy to know that calling IsEmpty and Peek sequentially introduces a hazard. That's Threading 101.

    Apparently we are in violent agreement. My whole point was that simply saying that an object is "thread-safe" tells you almost nothing about how to correctly use that object. Rather, the object must be extensively documented so that its exact contract can be stated. -- Eric

  • Well, I cannot, obviously, speak for all people who ask whether or not a piece of code is "thread-safe", but what I mean by this question is, basically, "can I be sure that this code will have no surprises for me when I execute if from within multiple threads, or, more specifically, does each thread that runs this code have to worry about the other threads, or can it 'believe it's the one and only', so to speak?"

    Two points. First, consider my example of a queue that you can ask if its empty, but then immediately peeking it throws an exception. It certainly wouldn't do that if there were only one thread. So for you, is this behaviour surprising? And is the object therefore not "thread-safe"?

    Second, I note that "Denis finds the object's behaviour unsurprising" is not a particularly useful definition of "thread safe". (If you'd like, I can simply forward all the questions I get about thread safety to you, so that you can tell people whether you're surprised or not.) -- Eric

    And, personally, I won't even insist on the code being "correct" (having no bugs or gotchas); I would just like to know that those bugs and gotchas are the same (as are the ways to work around them), no matter how many threads run the thing.

    In short, "thread-safe" means, to me, "don't have to worry about multi-threading: don't need any special care, any additional code to manage it, and so on".

    Of course, this is a highly subjective definition: I would, for example, define "financial security" as "don't have to worry about money: never so little as to think where to find more, and never so much as to think what to do with it; just enough to never think about it". Not many people will agree with that. :-)

    But, back to the subject, my definition of thread-safety would work well for any server-side web application: no matter how many people hammer the site, the code behaves the same (well, maybe, a little slower at the end), full stop. Not such a bad thing, is it?

  • RogueWave/Sun define the terms mt-hot vs mt-safe:

    http://www.roguewave.com/kb/index.php?View=entry&EntryID=1169

  • "Thread safe" is a bit of a C throw-back, where you can determine if a particular function is thread safe if it isn't dependent on any stored state.  However an object could only be deemed thread safe in the same manner if it didn't maintain any state, which wouldn't be much of an object.

  • "Thread-safe" is not at all vague.

    It's a statement that if each thread's rely condition is satisfied then that thread satisfies its guarantee condition and each thread's guarantee condition implies the others' rely conditions.

    It's only vague when we choose to rely on intuition rather than mathematics when we describe a system.

Page 1 of 2 (30 items) 12