Feedback requested: Enumerating Concurrent Collections

The June 2008 CTP of Parallel Extensions contained a first look at some of the work we're doing to augment the .NET Framework with a set of additional coordination data structures that aid in the development of highly concurrent applications.  This included two thread-safe collections, ConcurrentQueue&lt;T&gt; and ConcurrentStack&lt;T&gt;, and it's probably not giving away too much to say that we're working diligently on additional collections that aid in key scenarios for parallel programming.  For the most part, these collections are similar to the existing non-thread-safe counterparts in the .NET Framework, except with APIs designed with concurrency patterns in mind, and of course implementations that tolerate multithreaded access.  Some of these designs come naturally and, from an implementation standpoint, are easily enabled.  However, there are times when we have to make significant trade-offs in functionality or performance in order to achieve certain designs, and for those situations it's very important that we have a good understanding of the key scenarios and use cases in which the collections will be used.

One important design point we keep in mind for all of the collections we're working on is how to deal with concurrent modifications while a collection is being enumerated.  With the standard collections in the .NET Framework today, modifications to a collection invalidate any outstanding enumerators.  For example, consider the following code:

var q = new Queue<int>(new[] { 1, 2, 3, 4, 5 });
foreach (var item in q)
    if (item <= 5) q.Enqueue(item * 6);

This will compile fine, but upon execution the call to q.Enqueue inside of the foreach loop invalidates the enumerator, and the next iteration of the loop (the enumerator's next MoveNext call) throws an InvalidOperationException: "Collection was modified after the enumerator was instantiated."  In contrast, if you change the Queue&lt;int&gt; to a ConcurrentQueue&lt;int&gt;, e.g.:

var q = new ConcurrentQueue<int>(new[] { 1, 2, 3, 4, 5 });
foreach (var item in q)
    if (item <= 5) q.Enqueue(item * 6);

no exception will be thrown, and upon examination of the queue after the loop, you'll find that it contains 10 elements.  This is an explicit design decision we've made to break the mold of the existing collections.  The idea behind this change in direction is that there are key scenarios where one thread needs to be enumerating while other threads are reading from and writing to the collection (of course, as shown in this example, the thread doing the reading and writing may be the same one doing the enumerating).

This does raise some questions, of course, especially around the guarantees that can be provided for such an enumeration.  If data is added to the collection while another thread is enumerating, must the enumerating thread see the new data?  If data is removed while another thread is enumerating, must the enumerating thread be guaranteed not to see the removed data?  Is it acceptable for an item to be observed twice in the same enumeration, or not at all, because the organization of the collection is being changed concurrently?  And so forth.  For some collections, like ConcurrentStack&lt;T&gt; and ConcurrentQueue&lt;T&gt;, we're able to provide moment-in-time snapshot semantics, but for other collection types, doing so isn't necessarily possible without incurring serious performance penalties.
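To make the snapshot semantics concrete, here's a small sketch using ConcurrentStack&lt;T&gt; as it later shipped in .NET 4 (the CTP's type lived in a different namespace), where GetEnumerator is documented to capture a moment-in-time snapshot of the stack's contents:

```csharp
using System;
using System.Collections.Concurrent;

class SnapshotDemo
{
    static void Main()
    {
        var stack = new ConcurrentStack<int>(new[] { 1, 2, 3 });

        // The enumerator captures a moment-in-time snapshot of the stack
        // at the point GetEnumerator is called.
        using (var e = stack.GetEnumerator())
        {
            stack.Push(4); // modification made after the snapshot was taken

            int count = 0;
            while (e.MoveNext()) count++;

            Console.WriteLine(count);       // 3: the snapshot doesn't see the push
            Console.WriteLine(stack.Count); // 4: the stack itself does
        }
    }
}
```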

Given this, we're very curious to hear your feedback on a few things:

  1. Given the standard array of collections that exist in .NET, if you had thread-safe versions of them, are there scenarios where you would benefit from being able to enumerate concurrently with other threads modifying the same collection?  We already know about some, but we'd love to hear about more.
  2. Assuming you do have scenarios for #1, what are the minimum guarantees you'd need about the data returned by the enumerator for it to be useful?  For example, for a thread-safe dictionary you might say that if there are no concurrent modifications (adds/updates/removals), the enumeration returns exactly what's in the dictionary, and that if there are concurrent accesses, you'll never get back something that was never in the dictionary, you may not see concurrent adds or updates, and you may still see items that are concurrently removed.
  3. If we did support enumeration on collections that were thread-safe in the face of concurrent modifications, would you ever want the ability to revert to the "throw on modifications" behavior?
  4. Finally, what are the most important collections that you'd like to see thread-safe and scalable counterparts for?
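As one concrete illustration of the weaker guarantees described in #2, consider the ConcurrentDictionary&lt;TKey,TValue&gt; that later shipped in .NET 4: its enumerator is documented as safe to use concurrently with reads and writes, but it does not represent a moment-in-time snapshot, so concurrent adds and removes may or may not be observed.  A sketch (the exact count seen will vary from run to run):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class DictionaryEnumerationDemo
{
    static void Main()
    {
        var map = new ConcurrentDictionary<int, string>();
        for (int i = 0; i < 1000; i++) map[i] = "v" + i;

        // Mutate the dictionary from another thread while we enumerate it.
        var mutator = Task.Run(() =>
        {
            for (int i = 1000; i < 2000; i++) map.TryAdd(i, "late" + i);
            for (int i = 0; i < 500; i++) map.TryRemove(i, out _);
        });

        // No InvalidOperationException, even while mutations are in flight;
        // the concurrent adds/removes may or may not be visible to this loop.
        int seen = 0;
        foreach (var pair in map) seen++;

        mutator.Wait();
        Console.WriteLine(seen); // varies run to run; not throwing is the point
    }
}
```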

Thanks in advance for any input you may have!

  • Your problem can be compared to the problem solved (or not) by database designers.  In SQL Server this is called the ISOLATION LEVEL.

    SQL SELECT statements (for reading a collection) support hints (options), among them "nolock" (the reader may see new entries inserted by another thread) and "readpast" (the reader will never see new entries), both of which work under fully parallel access.

    So it may be a good idea to provide either a single enumerator with an isolation-level option, or at least two enumerators that behave differently: one for nolock-like reading, the other for readpast-like reading.

    I hope I'm not off subject!
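This isolation-level suggestion might be sketched as follows; nothing like this exists in the CTP, and the enum and method names here are invented purely for illustration.  (As it happens, ConcurrentQueue&lt;T&gt;'s own enumerator already behaves like the snapshot branch, so the "live" branch below is hypothetical.)

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical isolation levels for enumerating a concurrent collection,
// loosely mirroring SQL Server's "readpast" and "nolock" hints.
public enum ReadIsolation
{
    Snapshot, // readpast-like: concurrent inserts are never seen
    Live      // nolock-like: concurrent inserts may be seen
}

public static class IsolatedEnumeration
{
    // Sketch over ConcurrentQueue<T>: Snapshot copies the contents up front,
    // so later enqueues are invisible; Live enumerates the collection directly.
    public static IEnumerable<T> Enumerate<T>(
        ConcurrentQueue<T> queue, ReadIsolation isolation)
    {
        return isolation == ReadIsolation.Snapshot
            ? (IEnumerable<T>)queue.ToArray()
            : queue;
    }
}
```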

  • Frank, thanks for the suggestion.  

    Thomas, thanks for the feedback.  There are a multitude of cases where seeing the latest and greatest version of a collection isn't all that important, and for those cases we've now heard quite a bit of feedback that support for concurrent enumeration is crucial.  As Softlion points out, this is very similar in nature to isolation levels in databases, and they have been used successfully for a long time.  Still, I definitely understand your concerns, as such capabilities do make it possible for developers to more easily make mistakes, assuming that the data they're looking at is both a snapshot and consistent with the current state of the collection.

    Softlion, thanks for the comparison.  You're not off subject at all.

  • This (below link) article describes a Concurrent collection in C# - MultiMap (similar to Dictionary)

  • I second all the requests for the concurrent priority queue implementation. It is badly needed in the framework, and being concurrent is paramount. Thanks!

  • The ability to support a "delete what doesn't belong" pattern is very useful, and I see no reason an IEnumerable should not be allowed to support such a thing (even though some styles of collection might have trouble implementing it, I see no reason for the contract to require an exception in cases where an enumeration would otherwise be able to handle things sensibly).

    If one is implementing a new design, I would suggest a "set options" method, along with an "available options" property, which would specify the acceptable/available behaviors for an enumerator in case of insertion and deletion (the bitmap passed to the method would offer a choice of acceptable options).  If none of the acceptable behaviors were available, an exception would be thrown when attempting to set the option.  Even if there were no settable options, the method could do nothing if the calling program's acceptable choices would work, and throw an exception if they would not.

    For most scenarios, I would suggest that the preferred handling would be to specify that (1) an object which is inserted during an enumeration may or may not appear in that enumeration, with or without a reset; (2) an object which is deleted during an enumeration may or may not appear even with a reset, but if the object has already been enumerated it will not reappear until a reset; (3) all objects which existed prior to the start of enumeration and which are not deleted will be enumerated once each; (4) if an ordering is specified for the enumerator, items will be output consistent with that order (e.g. if items are to be enumerated in alphabetical order and "aardvark" is added after "elephant" has been enumerated, "aardvark" should not be included in that enumeration); (5) ordering of items following a reset is arbitrary unless specified otherwise.

    If one is designing a new interface for an enumerable type, it may be helpful to add a 64-bit 'ChangeSeed' value which, if sampled before the start of an enumeration and after its completion, will indicate whether anything in the collection may have changed during the enumeration.  There are many times when throwing an exception because something changed would be very nasty behavior, but knowing that something changed would nonetheless be very useful.
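A rough sketch of this "available options"/"set options" idea; all of these names are invented for illustration, and nothing like them exists in the framework:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bitmap of enumerator behaviors a caller can accept.
[Flags]
public enum EnumerationBehavior
{
    None                 = 0,
    MaySkipInsertedItems = 1 << 0, // items inserted mid-enumeration may not appear
    MayStillShowDeleted  = 1 << 1, // items deleted mid-enumeration may still appear
    ThrowOnModification  = 1 << 2  // classic fail-fast behavior
}

public interface IConfigurableEnumerable<T> : IEnumerable<T>
{
    // The behaviors this collection's enumerators can actually provide.
    EnumerationBehavior AvailableBehaviors { get; }

    // Throws if none of the caller's acceptable behaviors are available;
    // otherwise configures subsequently created enumerators.
    void SetAcceptableBehaviors(EnumerationBehavior acceptable);
}
```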

  • Here's a scenario I'm looking at presently:

    A highly scalable distributed in-memory architecture, for millions of users.  On a single server, you want all operations done in memory.  Instead of using a disk-based database with caching, the primary source of data is in memory.  However, you want to synchronize changes to some data on disk (in a database) as a live backup.

    One solution I'm looking at (that I'll have to benchmark) is to use queue(s) to flag updated data objects.  Background process(es) could then walk the queue, writing changes to disk and removing objects from the queue as they go.

    During the queue navigation, objects can be added to the queue from other in-memory updates.

    In that scenario, it does not really matter whether objects added to the queue are visible while navigating it.  If we reach the end, we can check whether new objects were added to the queue.  If not, we can sleep a bit and then check the queue again.

    Of course, the example is simplified, but you get the idea.  Such a system would need to be benchmarked and would probably need some multithreading balance to make sure the queues don't grow faster than they can eventually be emptied.
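This queue-draining pattern can be sketched with ConcurrentQueue&lt;T&gt; (shown with the .NET 4 namespace); the persistence step is omitted and the sleep intervals are arbitrary placeholders:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class WriteBehindSketch
{
    static readonly ConcurrentQueue<string> Dirty = new ConcurrentQueue<string>();

    static void Main()
    {
        using (var stop = new CancellationTokenSource())
        {
            var drainer = Task.Run(() => Drain(stop.Token));

            // In-memory updates flag their objects as dirty; enqueues are
            // safe even while the drainer is walking the queue.
            for (int i = 0; i < 100; i++) Dirty.Enqueue("object-" + i);

            Thread.Sleep(200); // give the drainer time to catch up (demo only)
            stop.Cancel();
            drainer.Wait();

            Console.WriteLine(Dirty.IsEmpty);
        }
    }

    static void Drain(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // Empty the queue, persisting each object to the database
            // (persistence omitted in this sketch).
            while (Dirty.TryDequeue(out _)) { }

            Thread.Sleep(10); // nothing new: sleep a bit, then check again
        }
    }
}
```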
