Task Parallel Library changes since the MSDN Magazine article


Back in the October 2007 issue of MSDN Magazine, we published an article on the beginning stages of what has become the Task Parallel Library (TPL) that's part of the Parallel Extensions to the .NET Framework.  While the core of the library and the principles behind it have remained the same, as with any piece of software in the early stages of its lifecycle, the design changes frequently. In fact, one of the reasons we've put out an early community technology preview (CTP) of Parallel Extensions is to solicit your feedback on the APIs so that we know if we're on the right track, how we may need to change them further, and so forth.  Aspects of the API have already changed since the article was published, and here we'll go through the differences.

First, the article refers to System.Concurrency.dll and the System.Concurrency namespace.  We've since changed the name of the DLL to System.Threading.dll, and TPL is now contained in two different namespaces, System.Threading and System.Threading.Tasks.  AggregateException, the higher-level Parallel class, and Parallel's supporting types (ParallelState and ParallelState<TLocal>) are contained in System.Threading, while all of the lower-level task parallelism types are in System.Threading.Tasks.
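In code, the reorganization amounts to a change in using directives and assembly references; a sketch based on the namespace split described above:

```csharp
// Assembly reference: System.Threading.dll (formerly System.Concurrency.dll)
using System.Threading;        // AggregateException, Parallel, ParallelState, ParallelState<TLocal>
using System.Threading.Tasks;  // Task, Future<T>, and the other lower-level task parallelism types
```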

Next, the second page of the article states "if any exception is thrown in any of the iterations, ... the first thrown exception is rethrown in the calling thread."  The semantics here have changed such that we now have a common exception handling model across all of the Parallel Extensions, including PLINQ.  If an exception is thrown, we still cancel all unstarted iterations, but rather than just rethrowing one exception, we bundle all thrown exceptions into an AggregateException container exception and throw that new exception instance.  This allows developers to see all errors that occurred, which can be important for reliability.  It also preserves the stack traces of the original exceptions.
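Consuming code can catch the container exception and inspect every failure. Here's a minimal sketch of the pattern; the InnerExceptions member shown is how the shipped AggregateException exposes its contents, and the CTP's exact member names may differ:

```csharp
try
{
    Parallel.For(0, 100, i =>
    {
        // Any iteration may throw; unstarted iterations are then canceled.
        if (i % 42 == 0) throw new InvalidOperationException("failed at " + i);
    });
}
catch (AggregateException ae)
{
    // All exceptions thrown across all iterations are available,
    // each with its original stack trace preserved.
    foreach (Exception inner in ae.InnerExceptions)
        Console.WriteLine(inner);
}
```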

In the section on aggregation, the article talks about the Parallel.Aggregate API.  Since the article was published, we've dropped this method from the Parallel class.  Why?  Because a) PLINQ already supports parallel aggregation through the ParallelEnumerable.Aggregate extension method, and b) aggregation can be implemented with little additional effort on top of Parallel.For if PLINQ's support for aggregation isn't enough.  Consider the example shown in the article:

int sum = Parallel.Aggregate(0, 100, 0,
                   delegate(int i) { return isPrime(i) ? i : 0; },
                   delegate(int x, int y) { return x + y; });

We can implement this with PLINQ as follows:

int sum = (from i in ParallelEnumerable.Range(0, 100)
                where isPrime(i)
                select i).Sum();

Of course, this benefits from LINQ and PLINQ already supporting a Sum method.  We can do the same thing using the Aggregate method for general reduction support:

int sum = (from i in ParallelEnumerable.Range(0, 100)
                where isPrime(i)
                select i).Aggregate((x, y) => x + y);

If we prefer not to use PLINQ, we can do a similar operation using Parallel.For (in fact, this is very similar to how Parallel.Aggregate was implemented internally):

int sum = 0;
Parallel.For(0, 100, () => 0, (i, state) =>
{
    if (isPrime(i)) state.ThreadLocalState += i;
},
partialSum => Interlocked.Add(ref sum, partialSum));

Here, we're taking advantage of the overload of Parallel.For that supports thread-local state.  On each thread involved in the Parallel.For loop, the thread-local state is initialized to 0 and is then incremented by the value of every prime number processed by that thread.  After the thread has completed processing all iterations it's assigned, it uses an Interlocked.Add call to store the partial sum into the total sum.  In fact, if you found yourself needing Aggregate functionality a lot, you could generalize this into your own ParallelAggregate method, something like the following:

static T ParallelAggregate<T>(
    int fromInclusive, int toExclusive, T seed,
    Func<int, T> selector, Func<T, T, T> aggregator)
{
    T result = seed;
    object aggLock = new object();
    Parallel.For(fromInclusive, toExclusive, () => seed, (i, state) =>
    {
        state.ThreadLocalState =
            aggregator(state.ThreadLocalState, selector(i));
    },
    partial => { lock (aggLock) result = aggregator(partial, result); });
    return result;
}

With this, you can use ParallelAggregate just as is done with Parallel.Aggregate in the article:

int sum = ParallelAggregate(0, 100, 0,
                   delegate(int i) { return isPrime(i) ? i : 0; },
                   delegate(int x, int y) { return x + y; });

Moving on to the Task class, we've made some fairly substantial changes to the public facing API.  Here's a summary of the differences from what's described in the article:

  • As shown in the article, Task does not have a base class.  In the CTP, Task now derives from TaskCoordinator, a base class which provides all of the waiting and cancelation functionality.
  • The Task class described in the article exposes a constructor that accepts an Action delegate.  During construction, the Task is queued for execution.  Instead, we changed this so that Task doesn't expose any constructors.  Tasks are now created using the Task.Create static method; we felt this factory approach was clearer than the constructor approach... please let us know what you think regarding this change.  Were we right?  Is one approach better than the other?  Would it be a positive or a negative if both approaches were supported?
  • The type of the delegate used by Task.Create is Action<Object> rather than just Action, and overloads of Task.Create support providing a state argument that will be passed to the Task's action.
  • The article states that calling Cancel on a Task will cancel that task and all of the tasks that task created.  We've modified this approach so that the model is now opt-in, meaning that by default tasks created by a task will not be canceled when the parent task is canceled.  That behavior can be opted into, however, by specifying the RespectParentCancelation flag when the child Task instances are created.
  • The article describes a Task<T> type that derives from Task.  This has been renamed Future<T>.  As with Task, it no longer exposes a constructor that accepts a Func<T> delegate.  Instead, Future<T> provides a static Create factory method.  In addition, to better support type inference, we provide a non-generic Future class that exposes generic static Create<T> methods.  This makes it possible to create a Future<T> where T is an anonymous type returned from the provided Func<T>.
  • The article describes the ReplicableTask class as deriving from Task.  For the CTP, we've gotten rid of ReplicableTask, but we've kept the concept.  To create a Task as a replicable task, provide the TaskCreationOptions.SelfReplicating value to the Task.Create method when creating the Task.
  • The article describes the exception handling model for self-replicating tasks being one where only one exception is rethrown even if multiple exceptions were thrown from multiple invocations of the replicable task.  As with the loop exception handling model described earlier, we now collect all thrown exceptions into one new AggregateException and throw that exception instead of picking one of the thrown exceptions at random.
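Pulling several of these bullets together, task creation in the CTP looks roughly like the following sketch. The overload shapes are assumed from the descriptions above, and DoWork is a hypothetical method standing in for real work:

```csharp
// The factory replaces the constructor; the task is queued as part of creation.
Task parent = Task.Create(state => Console.WriteLine(state), "some state");

// A child task opts in to being canceled along with its parent.
Task child = Task.Create(
    _ => DoWork(),  // DoWork is hypothetical
    TaskCreationOptions.RespectParentCancelation);

// A replicable task, formerly the ReplicableTask class.
Task replicable = Task.Create(
    _ => DoWork(),
    TaskCreationOptions.SelfReplicating);

// The non-generic Future class lets type inference determine T,
// which also permits anonymous result types.
var future = Future.Create(() => new { Answer = 42 });
```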

In addition to changes to the Task class, there have also been changes to the TaskManager class described in the article:

  • In the article, TaskManager exposed two constructors, one that's parameterless and one that accepts an integer representing the maximum concurrency level for the TaskManager.  We've modified that second constructor to instead accept a TaskManagerPolicy, a type which represents the allowed behavior of the TaskManager, providing information such as desired concurrency level and execution context flow behavior.
  • Similarly, rather than exposing a MaxConcurrentThreads property, TaskManager now exposes a TaskManagerPolicy property, which allows the current policy to be retrieved and a new policy to be provided dynamically.
  • The article shows Task providing a constructor that accepts a TaskManager, in order to associate that Task with a specific TaskManager.  Such behavior is still possible, but the TaskManager is provided to overloads of Task.Create.
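As a rough sketch of the revised TaskManager surface (the constructor arguments, the policy property name, and DoWork are all assumptions based on the descriptions above, not the CTP's exact signatures):

```csharp
// A policy describes the allowed behavior of the manager,
// e.g. the desired concurrency level.
TaskManagerPolicy policy = new TaskManagerPolicy(/* desired settings */);
TaskManager manager = new TaskManager(policy);

// Association with a specific TaskManager now happens at creation time,
// via an overload of Task.Create rather than a Task constructor.
Task t = Task.Create(_ => DoWork(), manager);

// The current policy can be retrieved, and a new one provided dynamically,
// through the TaskManagerPolicy property.
manager.Policy = new TaskManagerPolicy(/* new settings */);
```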

That sums up the changes we've made.  Even with these changes, the article should still provide you with a good overview of the library and its intended usage.  For more information, check out the documentation that's included with the CTP, and stay tuned to this blog!  As mentioned before, we're very interested in your feedback on the API, so please let us know what you think (the good and the bad).


  • I really don't like the idea of not having constructors. Whenever I want to use a class for the first time, the first thing I look at it is its constructors.

    In addition, not having a constructor means we can never inherit from the class.

    And for what? Writing "Task.Create" instead of "new Task"?

  • Thanks for the feedback!  

    This is a design issue we went back and forth on.  We ended up with the factory approach for several reasons.  One is that we got feedback that, with the ctor approach, it wasn't clear the tasks were being scheduled when they were constructed, and that doing so went against .NET design guidelines.  We explored alternatives, like not scheduling from the ctor and exposing a Start method, but this then required more code to create a task, and it also made futures difficult to work with (it also gave some folks the impression through the API that the same task could be scheduled more than once).  We explored a variety of other options (including going with both the factory and the ctor approach), but at least for the CTP, we settled on just using the factory.  Are there particular scenarios you find more difficult with the factory approach than with the ctor approach?

    As far as the inheritance issue, this was an explicit design choice, as we're trying to keep it a closed system for now.  Are there important scenarios you're unable to implement due to this decision?

  • Question: Why does Future descend from Task, instead of aggregating it?

    Sure, the implementation of a future is pretty similar to the implementation of a task, but it seems to me that the usage patterns would be way different. I can't imagine creating a Future and passing it to a method that takes a Task parameter -- they're conceptually different; it wouldn't make sense. It feels like a violation of the Liskov Substitution Principle here, although admittedly, I haven't scrutinized the code yet.

    It feels to me like Task should be a sealed class, and so should Future. The intent is for users to extend them via aggregation (passing in a delegate) rather than inheritance. If they have a lot of public methods in common, then make an ITask, so that *if* there's an obscure case where you want to treat them the same, you can. Give them a common base class, if it makes sense for implementation. But it doesn't feel like one should descend from the other.

    At that point, my main complaint with the new factory -- its name -- could be addressed. I really like the idea of a factory; I like the idea of constructors doing nothing but constructing, and having a factory method when there's extra work to do (like scheduling a thread). But readability is essential, and I don't think "Create" makes it clear to the reader that any extra work is happening.

    For Task, I think the factory method's name should be more descriptive. Perhaps "CreateAndSchedule" (though "schedule" is pretty technical lingo for a library that's supposed to be making this stuff easier), or "CreateAndStart", or even just "Task.Start".

    For Future, again, the usage patterns are different. Conceptually, when you create a future, you're saying "here's a question that I will want the answer to later" -- the "and schedule" is implied by the very nature of a future. I think Future's factory method *could* be called "Create" without confusion. Or perhaps something a bit different -- but not necessarily the same name as on Task. Which is another reason they shouldn't descend from each other: it doesn't make sense to force them to have identical factory names.

  • That's great feedback, Joe, thank you!

    re: inheritance vs aggregation

    We did strongly consider the aggregation model, where a Future would contain a Task rather than derive from it.  And we haven't completely abandoned the idea, so it's good to know that someone might prefer that approach.  One thing that drove us away from the aggregation model was performance, with the extra object allocations and increase in instance size representing a non-trivial performance loss.  But aside from that, the idea that a Future<T> is a Task that adds a return value was appealing to folks, and that "is a" of course is a typical indication of an inheritance relationship.  There also are scenarios where you may want to treat a Future<T> as a Task, such as to wait on it or cancel it (sometimes in concert with other Tasks), though that can also be accomplished by passing around its contained Task or by having Future<T> still derive from a shared base class.  At the end of the day, there are pros and cons to both approaches, and for now we felt that the pros to the inheritance model (both from a usability and performance standpoint) won out over the pros of the aggregation model, but this kind of design decision is exactly why we've released an early CTP, to get feedback on it; nothing is set in stone.

    re: factory name

    Both Task.CreateAndStart and Task.Start were considered, and we opted away from "*Start" because it implied to some folks that the Task was being run immediately (like Thread.Start); "Scheduled" has the problem you mention, in that we don't necessarily want folks to have to think about "scheduling" things.  We went with Task.Create for the CTP (you're not only creating a Task instance, but also you're conceptually creating a task), but we'll certainly consider other names moving forward, especially if we get a lot of feedback like yours that Task.Create isn't clear.  If you have other suggestions, please do pass those along.

  • I'm with Joe on this one...

    I don't like the idea of a constructor doing something more than just creating an instance of Task. That said, I think the same applies to factory methods. If I were to call Task.Create(), I would assume that was exactly what it did: create an instance of Task. The next thing I would look for would be some way to start/schedule/etc. the Task (not sure of the best naming, however).

    Andy

  • Thanks, Andy.  Assuming we were to stick with the factory approach, I'd be interested in hearing any naming suggestions you may have to replace "Create".

  • The basic issue around the naming of {Start, Schedule, Run, Create} etc. seems to be caused by the implicit use of a common static TaskManager when Task.Create() is called.  Perhaps it would be clearer if Task constructors and factory methods only created a Task, and TaskManager.Run(new Task()) or similar were the default way to add and run a new task.

    If people are expecting synchronous or immediate execution, then perhaps add Async to naming.

    eg so that simple usages are

     Task t = new Task(..)

     t.RunAsync()

    or

     TaskManager.RunAsync(new Task(..))

    The idea of a syntactic rewrite of "new X()" to "X.Create()" often comes up.  I think that this usage should at least be standardised so that if we ever see Class.Create() we know that is a simple constructor call, nothing more.  But I think "new X()" is just fine and avoids confusion in an API.

    Perhaps also the common static TaskManager can have a special name so that confusion with instance TaskManagers is avoided.

  • Thanks, Mike!  It's a good suggestion, though it does introduce additional issues.  For example, can a task be run twice?  Can it be run twice concurrently (through RunAsync)?  If so, what happens with a Future<T> (which derives from Task) regarding its Value property (does a second invocation overwrite the first)?  How does that affect things like Completed events?  And how should we deal with concurrent access to a Task that may or may not have been started yet (e.g. I create a Task, pass it off to several threads, and they all try to run it)?  Then there are issues around expectations concerning other methods on Task... what if I call Wait on an unstarted Task?  And so on.

    And from a performance perspective, even if we come up with a clean design around this, there may be issues in terms of the cost of tracking all of this (extra space required in the types, interlocked operations used to ensure consistency, etc.).  Obviously many if not all of these are issues that could be worked/designed around, but they all start adding levels of complexity that increase the concept count a developer needs to understand (and keep track of) to use the system.

    My point (and this was largely a random stream of consciousness) is that regardless of name, one of the nice things about the static factory that both constructs and schedules a Task is that it's easy to use (the basic operation is one method call, very similar in concept to ThreadPool.QueueUserWorkItem), it eliminates the need to be concerned about the same task being run twice and all of the various ramifications that could result, and it keeps the API a bit simpler in that we don't need to provide support for running a previously created task.

    As always, though, we're keeping our minds open, and this kind of dialogue is exactly what we were hoping to get by releasing a CTP (so thank you, thank you to everyone participating).  If we get a lot of feedback that this would be to more customers' liking than the current design, we'll definitely take a good, in-depth look at it.

    And I'm still interested in other naming suggestions for the-API-currently-known-as Create. ;)

  • Hmm... seems like you just touched on a good possibility for a name: how about a static Task.Run() method? Creates a new task with the delegate you pass in, and returns the new Task instance.

    Of course, a name like Run doesn't make it immediately obvious that there's a return value; so you'd have less-experienced developers calling Task.Run() and ignoring the return value. But you know, I think that's fine. It'd act pretty much the same as QueueUserWorkItem: start this task, and don't bother me about it again.

    Actually, if I understand it correctly, tasks can be set up to have cascading cancels, e.g., task A creates task B with a certain flag; then task A gets canceled, which causes task B to automatically be canceled as well. So Task.Run() would be even a little bit cooler than QueueUserWorkItem.

  • How about Task.Prepare(...); ?

  • ...or Task.CreateAndPrepare(...);?

  • Task.Run, Task.Prepare, Task.CreateAndPrepare... cool, thanks for the suggestions.

