What’s new in Beta 1 for the Task Parallel Library? (Part 3/3)

So what else is new in TPL for Beta 1 (finally)?  In the last post, we mentioned that TaskFactory offers more static helper methods than just StartNew.  In this post, we’ll cover those methods (FromAsync, ContinueWhenAll, and ContinueWhenAny) as well as the new TaskScheduler class.

FromAsync

To better integrate the Asynchronous Programming Model (APM) with TPL, we added FromAsync to TaskFactory.  Here’s an example for reading asynchronously from a FileStream:

FileStream fs = ...;
byte[] buffer = ...;

Task<int> t = Task<int>.Factory.FromAsync(
    fs.BeginRead, fs.EndRead, buffer, 0, buffer.Length, null);

The overloads provided for FromAsync support all implementations of the APM pattern up to a certain number of parameters (which covers the vast majority of the APM implementations in the .NET Framework); additional overloads that work directly with IAsyncResult objects are also available to cover the stragglers.

FromAsync returns a Task that represents the asynchronous operation, and once you have that Task, you have all of the functionality available to any other Task: you can wait on it, schedule continuations off of it, and so on.
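As a quick sketch of that composability (the file name, buffer size, and logging step here are illustrative assumptions, not from the Framework samples):

    // Illustrative sketch: continuing off a FromAsync task.
    FileStream fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read,
                                   FileShare.Read, 4096, true /* async */);
    byte[] buffer = new byte[4096];

    Task<int> read = Task<int>.Factory.FromAsync(
        fs.BeginRead, fs.EndRead, buffer, 0, buffer.Length, null);

    // Like any other Task: schedule a continuation, or just Wait on it.
    Task logged = read.ContinueWith(t =>
    {
        fs.Close();
        Console.WriteLine("Read {0} bytes asynchronously.", t.Result);
    });
    logged.Wait();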

ContinueWhenAll and ContinueWhenAny

The ContinueWith method on Task enables a slew of powerful patterns and is a fundamental building block in many higher-level implementations.  However, while being able to run a task when another completes is useful, it’s also useful to be able to run a task when any or all of a set of tasks completes.  For these purposes, we added ContinueWhenAll and ContinueWhenAny to TaskFactory:

var tasks = new Queue<Task>();
for (int i = 0; i < N; i++)
{
    tasks.Enqueue(Task.Factory.StartNew(() =>
    {
        ...
    }));
}

// Schedule a task to execute when all antecedent tasks complete.
var continuationOnAll = Task.Factory.ContinueWhenAll(
    tasks.ToArray(), (Task[] completedTasks) =>
    {
        ...
    });

// Schedule a task to execute when any one antecedent task completes.
var continuationOnAny = Task.Factory.ContinueWhenAny(
    tasks.ToArray(), (Task completedTask) =>
    {
        ...
    });

TaskScheduler

In previous releases, you may have seen a TaskManager class.  TaskManager is no more; it has been replaced by the new TaskScheduler class.  By default, tasks get scheduled on TaskScheduler.Default, an implementation based on the new work-stealing queues in the .NET 4 ThreadPool (as mentioned in the first post).  However, TaskScheduler is an abstract class, so developers can derive from it and create their own custom schedulers!  The default scheduler should be sufficient in most cases, but a custom scheduler might be needed for some special scenarios (strict FIFO scheduling, special priorities, etc.).
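For example, a strict FIFO scheduler that runs all tasks on a single dedicated thread might be sketched like this (an illustrative implementation, not anything shipped in the Framework):

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    // Sketch only: strict FIFO scheduling on one dedicated thread.
    public sealed class SingleThreadFifoScheduler : TaskScheduler
    {
        private readonly BlockingCollection<Task> _queue =
            new BlockingCollection<Task>();

        public SingleThreadFifoScheduler()
        {
            var worker = new Thread(() =>
            {
                // Execute queued tasks one at a time, in arrival order.
                foreach (Task task in _queue.GetConsumingEnumerable())
                    TryExecuteTask(task);
            });
            worker.IsBackground = true;
            worker.Start();
        }

        protected override void QueueTask(Task task)
        {
            _queue.Add(task);
        }

        protected override bool TryExecuteTaskInline(
            Task task, bool taskWasPreviouslyQueued)
        {
            return false; // never inline, to preserve strict FIFO order
        }

        protected override IEnumerable<Task> GetScheduledTasks()
        {
            return _queue.ToArray();
        }
    }

Tasks can then be targeted at an instance of such a scheduler via the StartNew and ContinueWith overloads that accept a TaskScheduler.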

One of those special scenarios, scheduling onto the UI thread, is supported out-of-the-box.  The TaskScheduler.FromCurrentSynchronizationContext method returns a scheduler that wraps the current synchronization context.  Thus, if it is called on the GUI thread of a Windows Forms application, you’ll get back a TaskScheduler that marshals any queued tasks to the GUI thread.  Here’s an example:

public void Button1_Click(…)
{
    var ui = TaskScheduler.FromCurrentSynchronizationContext();

    Task.Factory.StartNew(() =>
    {
        return LoadAndProcessImage(); // compute the image
    }).ContinueWith(t =>
    {
        pictureBox1.Image = t.Result; // display it
    }, ui);
}

In this set of posts, we covered most of the major updates to TPL for Beta 1: changes under the covers, renames of some of our core types, redesigns of some core functionality, and the addition of brand new types.  Some topics, such as debugger and profiler integration, were left out.  Stay tuned for posts on those, and thanks for reading!

Comments
  • awesome stuff :) the ui scheduler and APM helpers will be a real productivity booster, as well as helping to reduce bugs :)

    i have a question though,

    is there any particular reason why ContinueWhenAny/All takes a Task[] and not IEnumerable<Task>? (or maybe it does, it's just that you used the ToArray method)

  • On a related note, ContinueWhenAny/All would make great extension methods for IEnumerable<Task> :) any chance of that happening?

  • What is the algorithm for the default scheduler for creating new threads? Is it like the current .NET threadpool? Does it use the .NET threadpool?

    Any thoughts on scheduling tasks on IO completion port threads (like WCF does)?

    When replacing the default scheduler, how much of the underlying infrastructure is available (like local work stealing queues etc.)?

  • Michael,

    Let me answer your questions in the order you asked:

    TPL and the .NET ThreadPool are now tightly integrated (TPL tasks run on ThreadPool worker threads; the most significant difference is how they can take advantage of the thread-local work queues that enable work stealing).  Given this, the policies for creating new threads for the default TPL task scheduler are exactly the same as the ThreadPool's.

    Currently we don't support scheduling tasks on IO completion port threads, although we do provide the new FromAsync API set for creating tasks that wrap asynchronous operations.  I'm not sure about the scenario you have in mind, but you probably want to check out the FromAsync methods.  I should also note that they indirectly take advantage of the IO threadpool under the hood.

    And about replacing the default scheduler: in summary, the answer to your question is no, none of the local work-stealing queue structures are available for direct reuse in custom schedulers.  But read on...

    When you implement a custom scheduler, you need to implement all the major building blocks of that scheduler's functionality.  This includes work queue structures, management of worker threads, and the dispatch loop itself.  We chose to go this way because it enables a whole variety of application-specific scheduling policies.

    That being said, the interface between the TPL runtime and the custom scheduler is laid out in such a way that, if the scheduler implementer decides to provide a scheduler with local work queues and work stealing, then the portions of TPL that can take better advantage of local queues will continue to do so with that custom scheduler.  Specifically, I'm referring to task inlining behavior, which is used in various TPL code paths involving waits.  This can be enabled by a custom scheduler if it provides a proper implementation of the TaskScheduler.TryExecuteTaskInline() override.

    Please let us know if you have further questions. Thanks.

    -Huseyin Yildiz

  • A few additional points, on top of Huseyin's excellent response:

    - It is possible and relatively straightforward to implement a custom scheduler that schedules to the I/O ThreadPool, as WCF does.  Once the Beta is out, we'll make sure to release a sample that shows how to do this, in case it's important for your needs.

    - We're also planning to release a sample that shows how to implement a work-stealing scheduler, in line with Huseyin's comments.

    aL, regarding your questions:

    - ContinueWhenAny/All are actually instance methods on the TaskFactory/TaskFactory<TResult> classes, so new, additional methods would be necessary to support them as extensions (which need to be static methods).  You'll of course be able to add these easily on your own if you need them, just by creating extension methods that delegate, e.g.:

    public static class MyExtensions
    {
        public static Task ContinueWhenAll(this IEnumerable<Task> tasks, Action<Task[]> continuationAction)
        {
            return Task.Factory.ContinueWhenAll(tasks.ToArray(), continuationAction);
        }

        ...
    }

    - Regarding why these work with Task[] instead of IEnumerable<Task>, we went that route for consistency with existing APIs like WaitHandle.WaitAll.  Additionally, we wanted consistency between WaitAll and WaitAny, and for WaitAny (which provides the index of the completed item), one needs to be able to index into the original set.  From a usability perspective, do you value this kind of consistency? How important do you consider support for IEnumerable<Task> instead of Task[]?
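    To make the indexing point concrete, here's a hypothetical snippet (the sleeps are just stand-ins for real work):

        // WaitAny returns an index into the array you passed in, so the
        // caller needs an indexable set to recover the completed task.
        Task[] tasks = new Task[]
        {
            Task.Factory.StartNew(() => Thread.Sleep(100)),
            Task.Factory.StartNew(() => Thread.Sleep(10)),
        };
        int completedIndex = Task.WaitAny(tasks);
        Task completed = tasks[completedIndex]; // index into the original set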

    Thanks.

  • hi,

    i would like to suggest that you concentrate your efforts more on features instead of performance and extensibility.

    it's nice that you can implement your own scheduler, but who actually does that? there will probably be about 3-5 community-driven and significant implementations of this. you don't need to provide a sample for that.

    also, acceptance and usage will be driven by features and usability, not by performance, as long as it is in a close range below optimal. please use your time to make coders more productive instead of investing 50% more work to achieve a 10% perf improvement.

    nevertheless i really enjoy using this library.

  • The scenario for scheduling tasks on IOCP threads is:

    We have a highly parallel server application that continuously has about 1000 tasks scheduled. We try to keep CPU load at 100% (we have some heavy calculations that we would like to run continuously).

    We currently have a very simple framework where we schedule tasks on IOCP threads (we plan to switch to TPL when it is released). When using the .NET ThreadPool we very easily end up with 1000 threads after running for some time (due to the .NET ThreadPool adding threads whenever there are queued work items).

    When using regular threads we can remedy the problem a bit, but we still end up having either way too many threads (utilizing too many resources and incurring a lot of context switches) or too few threads (prone to deadlocks and less-than-optimal utilization of all CPUs whenever a task is waiting).

    Scheduling tasks on IOCP threads provides better throughput in our scenario and is less prone to deadlocks than scheduling on regular threads (due to having way more IOCP threads ready for scheduling).

    On a somewhat related note, now that you embrace the Asynchronous Programming Model, do you have any plans for asynchronous synchronization primitives (e.g. an asynchronous monitor or readerwriterlock)?

  • hello, thanks for the reply :)

    i don't have any particular scenario where i have to have IEnumerable<Task> and not Task[] but i generally prefer ienumerable since it's lazy :)

    i do like consistency, but i'd rather have the existing apis also take IEnumerables :)

    however maybe it doesn't really matter that the arguments for ContinueWhenAny/All can't be lazily evaluated, and there might be other side effects of this that i haven't thought of :)

    my main bother is that i have to write ToArray more. if they took ienumerables, passing in Task[] would work anyway, so from a caller's perspective it would still be sort of consistent :)

  • what about taking a middle ground in the debate about enumerables and Task[]? you said that you need to index into the collection that is passed in, so why don't you use IList<Task>? ie<Task> would always require you to internally call ToList, wasting cpu cycles.

    i also doubt that i will ever see an ie<Task> that is supposed to be evaluated lazily. evaluation of an ie<T> is conceptually side-effect free, like property getters are conceptually read-only operations. materializing tasks, however, is not side-effect free, thereby violating intuition. also, the lazy evaluation immediately breaks down because the Wait method has to evaluate the ie<Task> internally.

  • also i'd like to hear a statement about scheduling io-heavy workloads or webservice calls. the degree of parallelism is only weakly correlated with the number of cpu cores in this case, so cpu-based scheduling will under- or over-utilize these resources. being able to conveniently specify the exact degree of parallelism would help a lot here.

  • Michael, thanks for the extra information.  Regarding asynchronous locks (I'm assuming you mean something like http://msdn.microsoft.com/en-us/library/microsoft.ccr.core.interleave.aspx), it's definitely something we're investigating for the future, but we don't currently have plans to ship anything like that in .NET 4.

    aL, thanks for the follow-ups on IEnumerable<Task>.  We'll take a hard look at this and see if it's something that should be revamped.

    tobi, thanks for the IList<Task> suggestion.  Regarding always calling ToList internally: in effect we have to do that anyway, even if we're provided with an array.  As we're dealing with concurrency, and as we need to work on the provided data concurrently with the thread that provided it to us, we need to make a defensive copy of the array/list/enumerable so that we can operate on it without fear that the calling code will change the array after the call returns.  This enables the calling thread to reuse the array safely, without having to worry about additional lifetime and concurrency-safety issues (at the expense, of course, of an allocation and copy).

    Regarding your question around CPU vs. I/O workloads, this is one of the reasons we've integrated so tightly with the .NET ThreadPool.  The CLR team will be talking more about the work that's been done here for 4.0, but the ThreadPool has been augmented with new thread injection/retirement heuristics meant to very quickly ramp up to the optimal number of threads to maximize work-item throughput.  What limits that maximum is largely irrelevant to the ThreadPool, whether it's the number of CPUs, the bandwidth of the network connection, disk throughput, etc.  If you do want to place minimums and maximums, you can still use ThreadPool.SetMin/MaxThreads.  And with the ability to write custom TaskSchedulers, you can also completely customize the scheduling logic to your heart's content.  Your scheduler could use your own threads managed explicitly by your code, it could be built on top of the worker thread pool (simply layering additional logic on top of delegations to QueueUserWorkItem), it could be built on the I/O thread pool, etc.
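    For instance, placing an explicit cap on the pool might look like this (the value 64 is purely illustrative):

        // Illustrative: explicitly bounding the worker thread pool for
        // workloads where CPU count is the wrong parallelism signal.
        int workerMax, ioMax;
        ThreadPool.GetMaxThreads(out workerMax, out ioMax);

        // Cap worker threads at 64 (an arbitrary example value), leaving
        // the I/O completion port thread limit unchanged.
        ThreadPool.SetMaxThreads(64, ioMax);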

  • Also, tobi, thanks for your suggestions on priorities, and I'm glad you enjoy using the library; I hope that when the Beta comes out, you're excited about the features it offers.  Please let us know at that point if there's something specific and high-value to you that you see missing.

  • glad to provide food for thought :) i'm by no means an expert though, so i might be totally wrong...

    one way to get around indexing might be to have WaitAny provide the actual task and not the index :) but perhaps a change like that would break a lot of existing code

    lazy eval might not be the best thing (again, i'm no expert) but it would seem you'd be able to save allocations in Wait/ContinueWhenAny if, let's say, the first task out of 1000 immediately finishes; then you wouldn't even need to instantiate the other 999 :)

    also, i've heard that ie<T> promotes shorter object lifetimes and that that is beneficial for garbage collection. i might be wrong about that though
