All about Async/Await, System.Threading.Tasks, System.Collections.Concurrent, System.Linq, and more…
We're currently working on the Beta of .NET 4.0 (no dates to announce) and there are lots o’ new stuff for the Parallel Extensions. We hope you’re as excited about it as we are. Given that we have so much coming down the pipes, we decided to roll out posts on what’s coming in digestible chunks. What’s better to start with than the low-level infrastructure? In addition to the coordination data structures, you can expect posts on changes in the Task Parallel Library (TPL) as well as Parallel LINQ (PLINQ) and a few miscellaneous topics.
Since our last CTP in September ‘08, we’ve added a few new types to add to your parallel-programming arsenal.
ConcurrentBag<T> In short, ConcurrentBag<T> is a thread-safe, unordered collection of objects. Sounds a bit like a set, but unlike a set, ConcurrentBag<T> allows duplicates. So now you’re thinking, what’s so interesting about an unordered collection that allows duplicates? Truth be told, in a single-thread world, not very much. In a multi-threaded world, however, removing ordering restrictions allows us to do some pretty cool optimizations that makes for a really scalable and efficient type in certain situations.
You can think of ConcurrentBag<T> as a thread-safe linked list of thread-safe double-ended queues (deques). Each thread that touches the bag has it’s own deque to which it adds and removes items from the top. When any thread’s deque is empty, it pulls from the bottom of another thread’s non-empty deque. You can see why this is scalable and efficient: in many cases, there is no contention at all. Contention is only an issue when a thread does not have any items and is forced to “steal” items from another thread. It’s worth noting that the add-to-the-top-steal-from-the-bottom approach plays nice with the cache: threads get to work on items in nice contiguous chunks. In fact, this type of work-stealing queue works so well that it’s principally how the Thread Pool in .NET 4.0 load-balances work.
Just to make it clear, ConcurrentBag<T> isn’t always the fastest choice. Consider the single producer/single consumer scenario: the producer would always steal from the consumer’s deque – ConcurrentQueue<T> would be much more efficient. Scenarios in which threads both produce and consume values will benefit from ConcurrentBag<T> the most.
Figure 1. How threads access elements in a ConcurrentBag<T>.
ConcurrentLinkedList<T> and ConcurrentLinkedListNode<T> ConcurrentLinkedList<T> is essentially a thread-safe version of LinkedList<T>. Of course there are some caveats, this is parallel programming we’re talking about. :) Just inserting a value into a linked list becomes difficult in a multi-threaded world. Say you want to add items to a list in sorted order. With single-threaded linked lists, this is a simple as walking the list and finding the position where the left node is less than the value you want to insert and the right node is greater than or equal to that value. In a concurrent environment, between the time you’ve found the location and actually added the value, another value could have been inserted in that place or one of the neighboring nodes removed. Thus, ConcurrentLinkedList<T> provides methods that support these tasks with atomic operations, like the TryInsertBetween method:
list.TryInsertBetween( (left, right) => ((left.Value < item) && (right.Value > item)), item);
TryInsertBetween will walk the list and insert the value only when the supplied predicate returns true. It does all this atomically. If it fails, it returns false without changing the list.
Partitioner<T> and OrderablePartitioner<T> When TPL or PLINQ consume a data structure, they do their best to load balance the elements between threads. Of course, TPL and PLINQ don’t have detailed knowledge about the data sources, such as each structures internals, the values of the data, and the length of the data source – all of which are factors in execution time. The APIs simply see that a type implements IEnumerable<T> or IList<T> and start processing the elements based on enumeration or indexing. Often times, this means that significant performance gains are overlooked. Partitioners allow you to achieve these gains.
For example, say I’ve created a thread-safe collection that protects elements with striped-locking (where elements 0, 3, and 6 are protected by lock A, 1, 4, and 7 by lock B and 2, 5, and 8 protected by lock C). If we just use the standard enumerator and the pattern will look like: acquire lock A, acquire lock B, acquire lock C, acquire lock A, acquire lock B, and so on. This is hardly an efficient manner to partition out elements, especially if other threads are playing around with the data structure. It would be much more efficient if we acquired each lock just once. In a perfect world, if Parallel.ForEach decided that three threads wanted to enumerate this hypothetical data structure, we’d want to give all of the elements under lock A to one thread, the elements of lock B to another and the elements of lock C to the last, significantly reducing the amount of time spent acquiring locks and potentially waiting for locks to be given up.
We’ll be posting more detail on partitioners soon, but suffice it to say that Partitioner<T> allows you to create custom partitioning algorithms for any data source. OrderablePartitioner<T> does the same, but keeps account of each element’s original index so the data can be reordered when necessary. Each of these abstract classes supports both static partitioning as well as dynamic partitioning. In addition to these extension points, we’re providing a few out-of-the-box partitioners, but I’ll keep you in suspense – check back for more detailed posts on partitioners soon!
In addition to adding new types, we also decided to cut one out.
WriteOnce<T> In short, WriteOnce<T> was never really more than a simplification on top of Interlocked.CompareExchange and that doesn't provide enough value at this point to warrant a new type in the .NET Framework. Farewell!
There are lots of minor refactors and name changes coming in Beta 1 but there’s only one major change for the coordination data structures that we should call out: LazyInit<T>. I won’t list every name change an name space change but it’s important to see how LazyInit<T> has been refactored. Originally, LazyInit<T> was a struct which gave it all the nuances that come with a value-type. We’ve since refactored LazyInit<T> into four types:
In addition to the refactoring, Lazy<T> and LazyVariable<T> now have both thread-safe and non-thread-safe initialization modes for those that need the lazy initialization functionality without the overhead of synchronization.
In the Beta 1 release, you’ll find a few types for a unified cancellation model intended to be used throughout the .NET framework. Cancellation holds enough merit to warrant a dedicated post on it so we’ll be posting one soon. What’s exciting is that we’ve added cancellation support throughout TPL, PLINQ and the coordination data structures. For the data structures specifically, we’ve added cancellation overloads to any of our APIs that may block, including:
Again, there are too many minor changes to call out but just be aware that some types have moved to different namespaces and might have some APIs that were renamed.
There are two API additions worth calling out, however:
SpinWait.SpinUntil We realized that a lot of scenarios for SpinWait went something like this… Check a condition. If false, spin. Check a condition, if false, spin. Check a condition. If false spin… We decided to make it a tad easier by allowing you to pass a Func<Boolean> to SpinWait, causing it to spin until the condition was met or a timeout occurred.
ConcurrentStack<T>.PushRange and ConcurrentStack<T>.PopRange We’re all about squeezing out that last drop of performance and where we can, we do. Take ConcurrentStack<T> for example, which works on a single CompareAndSwap (CAS) operation if there is no contention. Many scenarios involve pushing or popping a series of items on or off the stack which typically results in looping through a collection, calling Push() on each item. This of course, results in N CAS operations and potentially a lot more if the stack is being contended on by multiple threads. Since we only need to do one CAS operation to update ConcurrentStack<T> anyway, we might as well build up a stack segment first and then push the whole segment onto the stack with a single CAS - reducing the number of potential retry CAS operations. Enter PushRange and PopRange, which do exactly that.
Check back soon for what’s new in TPL and PLINQ!
wow cool stuff :)
ive been dyin to get an update on this stuff :)
will there be channel9 interviews? :)
Glad to know you're interested; there's much more info coming.
I don't think we have anything planned for Channel9 but that's definitely good feedback for me to bring to the team. Thanks!
Please, post schematic interfaces of these new types in the next blog (or in comments here).
Also I'm curious as to how you have implemented per-thread/per-object storage (I mean ConcurrentBag<T>). Native Concurrency folks didn't blab how they have done it (combinable<T>) :)
Basically am I allowed to create let's say... 1'000'000 concurrent bags and access them chaotically from 100 threads on 32-bit machine?
For legal reasons, we can't post schematics until the beta releases (if it releases) and by that time, they'll be up on MSDN. :)
Regarding ConcurrentBag<T>, could you explain the scenario you're thinking about? When would you need to instantiate such a large number of the same data structure?
ConcurrentBag<T> is really useful when then the same set of threads are both adding AND removing elements from the same instance.
If you had 100 threads that each generated 5K elements and removed 5K elements (for a total 1M elements) in any order, this would much likely perform much better than most of the other collections.
If, however, you had a large set of producer threads that only added elements and a large set of consumer threads that only removed elements, the perf would be pretty terrible because every remove operation would result in a steal.
Hope that helps a bit.
Are you working with MS Silicon Valley R&D on providing PLINQ hooks into the wider multi-machine environment of DRYAD-LINQ? Is there any progress on making the DRYAD cluster architecture public, maybe as a set of tools in VS 2011, or should I be asking the R&D team that question?
Thanks for your question. We've looked at Dryad-LINQ and broader HPC integration in general but currently we don't have any plans to announce. As for what's going to happen with Dryad, I don't have much information on their plans. You might have better luck getting in touch with them directly. Sorry I didn't have too much to share!
In my previous post I described how to create a thread safe data cache using PFX. PFX however is scheduled
Is there a PushRange/PopRange on the ConcurrentQueue as well? This is something that I use all the time in my hand-rolled objects.
Also, is there any sort of UnqiueQueue concept? The use case is as follows.
1. I recieve some information tagged "123". I insert it into the queue for processing.
2. Before item "123" is processed, I get a new version. The original "123" is no longer needed.
3. I search the queue for the old one and replace it with the new version.
What I do, and would like at the framework level, is a way to indicate deduplication logic for the queue so work that doesn't need to be performed can be skipped.
The PushRange/PopRange APIs take advantage of the fact that a stack has only one "tail" to manage. It's because of this, that we can do a single interlocked operation to concatenate two stacks. A queue, on the other hand, has both a head and tail, and thus, doesn't have the same advantage. Therefore, there really wouldn't be much of a perf advantage for something like EnqueueRange/DequeueRange. If you just like the API as a matter of convenience, it could definitely be pulled off with a simple extension method.
Regarding UniqueQueue: Perhaps there's something I'm not understanding -- wouldn't a set or dictionary take care of this? Or is it just that you'd like the combination of uniqueness + FIFO ordering + thread-safety?
A number of improvements have been made to Parallel Extensions since the Visual Studio 2010 CTP across
PingBack from http://www.codedstyle.com/what%e2%80%99s-new-in-beta-1-for-parallel-linq-plinq-2/
PingBack from http://www.codedstyle.com/what%e2%80%99s-new-in-beta-1-for-parallel-linq-plinq/
PingBack from http://www.codedstyle.com/what%e2%80%99s-new-in-beta-1-for-parallel-linq-plinq-4/
We’re very excited that the .NET Framework 4 Beta is now available for public download, as .NET 4 has
Visual Studio 2010 Beta 1 est disponible pour les abonnés MSDN depuis quelques jours mais maintenant