"Task.Factory.StartNew" vs "new Task(...).Start"

With TPL, there are several ways to create and start a new task.  One way is to use the Task constructor followed by a call to the Start method, e.g.
        new Task(...).Start();
and another is to use the StartNew method of TaskFactory, e.g.
        Task.Factory.StartNew(...);
This raises the question: when and why would you use one approach versus the other?

In general, I recommend using Task.Factory.StartNew unless the particular situation provides a compelling reason to use the constructor followed by Start.  There are a few reasons for this.  For one, it's generally more efficient.  We take a lot of care within TPL to make sure that the "right" thing happens when tasks are accessed from multiple threads concurrently.  A Task is only ever executed once, which means we need to ensure that concurrent calls to a task's Start method from multiple threads result in the task being scheduled only once.  This requires synchronization, and synchronization has a cost.  If you construct a task using the task's constructor, you pay this synchronization cost when calling Start, because we need to protect against the chance that another thread is calling Start at the same time.  However, if you use Task.Factory.StartNew, we know that the task will have already been scheduled by the time we hand the task reference back to your code, which means threads can no longer race to call Start: every such call will fail.  As such, for StartNew we can avoid that additional synchronization cost and take a faster path for scheduling the task.
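
To make the "every call to Start will fail" point concrete, here is a minimal sketch.  The class and method names are hypothetical and exist only for illustration; the behavior shown is that Start throws an InvalidOperationException when called on a task that StartNew has already scheduled.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo class, not part of TPL.
public static class StartTwiceDemo
{
    // Returns true if calling Start on a task created by StartNew throws.
    public static bool StartOnStartedTaskThrows()
    {
        Task t = Task.Factory.StartNew(() => { });
        try
        {
            t.Start(); // the task was already scheduled by StartNew
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        finally
        {
            t.Wait(); // let the task finish before returning
        }
    }
}
```

Calling StartTwiceDemo.StartOnStartedTaskThrows() is expected to return true, whether or not the task body has finished running by the time Start is called.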

There are, however, some cases where creating a new task and then starting it is beneficial or even required (if there weren't, we wouldn't have provided the Start method).  One example is if you derive from Task.  This is an advanced case and there's typically little need to derive from Task, but if you do derive from it, the only way to schedule your custom task is to call the Start method, since in .NET 4 Task.Factory.StartNew always returns the concrete Task or Task<TResult> types.  Another, even more advanced, use case is dealing with certain race conditions.  Consider a task body that needs access to its own Task reference, for example because the task wants to schedule a continuation off of itself.  You might try to accomplish that with code like:
    Task t = null;
    t = Task.Factory.StartNew(() =>
    {
        ...
        t.ContinueWith(...);
    });
This code, however, is buggy.  There is a chance that the ThreadPool will pick up the scheduled task and execute it before the Task reference returned from StartNew is stored into t.  If that happens, the body of the task will see Task t as being null.  One way to fix this is to separate the creation and scheduling of the task, e.g.
    Task t = null;
    t = new Task(() =>
    {
        ...
        t.ContinueWith(...);
    });
    t.Start();
Now, we know that t will in fact be properly initialized by the time the task body runs, because we're not scheduling it until after it's been set appropriately.
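
For a version of this pattern that can be compiled as-is, here is a minimal sketch.  The class name and the TaskCompletionSource used to observe the continuation are assumptions added for illustration; the elided parts of the snippets above are not filled in.

```csharp
using System.Threading.Tasks;

// Hypothetical demo class, not part of TPL.
public static class SelfContinuationDemo
{
    // Returns the status the continuation observed on its antecedent.
    public static TaskStatus Run()
    {
        // Illustration-only plumbing so the caller can see that the continuation ran.
        var observed = new TaskCompletionSource<TaskStatus>();

        Task t = null;
        t = new Task(() =>
        {
            // Safe: t was assigned before Start was called, so it is not null here.
            t.ContinueWith(antecedent => observed.SetResult(antecedent.Status));
        });
        t.Start();

        return observed.Task.Result; // blocks until the continuation has run
    }
}
```

The continuation only runs once the task body has finished, so SelfContinuationDemo.Run() should return TaskStatus.RanToCompletion.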

In short, there are certainly cases where taking the "new Task(...).Start()" approach is warranted.  But unless you find yourself in one of those cases, prefer TaskFactory.StartNew.

  • I was wondering if it is possible to introduce a race condition if you call ContinueWith on the task returned from StartNew? Or another way of putting it: can you call ContinueWith on a task that has already completed?

    Suppose I have the following:

    Task t = Task.Factory.StartNew(() => ...);

    t.ContinueWith(...);

    It's possible that by the time I arrive at the ContinueWith call, task t has already completed. Is this a problem?

  • Hi Ronald-

    It's certainly possible by the time you arrive at t.ContinueWith that t has completed, but that's ok.  The Task class handles this internally... if the task has already completed when you call ContinueWith, the continuation will be scheduled immediately.
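
A minimal sketch of that behavior, with a hypothetical class name and values chosen only for illustration: the antecedent is forced to finish before the continuation is attached, and the continuation still runs.

```csharp
using System.Threading.Tasks;

// Hypothetical demo class, not part of TPL.
public static class CompletedContinuationDemo
{
    public static int Run()
    {
        Task<int> t = Task.Factory.StartNew(() => 21);
        t.Wait(); // make sure the antecedent has completed before attaching the continuation

        // Attaching to an already-completed task is fine; the continuation is still scheduled.
        Task<int> continuation = t.ContinueWith(antecedent => antecedent.Result * 2);
        return continuation.Result;
    }
}
```

CompletedContinuationDemo.Run() returns 42 here.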

  • Perfect timing Toub! I was just writing a sample where I needed to use your "advanced" usage of a closure to get the instance of the very thing I needed inside a delegate.

    Also, I was wondering about the t.ContinueWith thing too.

  • I'm baffled by the .StartNew() implementation.  I assume I'm doing something wrong, but I'm really not sure what.  I'm finding that .StartNew() doesn't actually start anything.  The following code doesn't cause ANY tasks to run for me:

    Task<Domain> domainCreationTask = Task<Domain>.Factory.StartNew(() => ImportDomain(domainConstructor, domainAttributes, importDate));

    However, if I add a call to .Result, then the task runs and I get the expected output:

    Task<Domain> domainCreationTask = Task<Domain>.Factory.StartNew(() => ImportDomain(domainConstructor, domainAttributes, importDate));
    Domain d = domainCreationTask.Result;

    Of course, I've just negated the benefit of the TPL by causing my main thread to block awaiting the result.

    Any suggestions on what I might be doing wrong?  I then tried replacing the access to the Result property with a call to Start(), and found that 1000 calls resulted in only one task being dispatched, presumably because it looked the same--even though one of the parameters (domainAttributes) contains a different value each time this code is hit.

  • Hi iissystems-

    When you call StartNew to create domainCreationTask, what is the type of the instance returned from calling TaskScheduler.Current?  It sounds like you're calling it from within another task that's scheduled to the UI thread.  Unless you specify another scheduler, StartNew schedules tasks to TaskScheduler.Current, which will be the scheduler for whatever task you're currently running within.
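
If that diagnosis applies, one option is to pass a scheduler explicitly instead of relying on TaskScheduler.Current.  The following is a minimal sketch under that assumption; the class name is hypothetical, and the task body merely reports which scheduler it sees.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of TPL.
public static class ExplicitSchedulerDemo
{
    // Returns true if the task body sees the default scheduler as its current scheduler.
    public static bool RunsOnDefaultScheduler()
    {
        Task<bool> t = Task.Factory.StartNew(
            () => TaskScheduler.Current == TaskScheduler.Default,
            CancellationToken.None,
            TaskCreationOptions.None,
            TaskScheduler.Default); // explicit target, regardless of the caller's TaskScheduler.Current
        return t.Result;
    }
}
```

With the scheduler specified this way, the task is queued to the default (ThreadPool-based) scheduler even when StartNew is called from inside a task running on a different scheduler.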

  • I've been working through the examples in the Visual C# 2010 Step By Step in Chapter 27, introducing the TPL.  The code uses Task.Factory.StartNew() in the following manner:

       Task first = Task.Factory.StartNew(() => generateGraphData(data, 0, pixelWidth / 8));
       Task second = Task.Factory.StartNew(() => generateGraphData(data, pixelWidth / 8, pixelWidth / 4));
       Task third = Task.Factory.StartNew(() => generateGraphData(data, pixelWidth / 4, pixelWidth * 3 / 8));
       Task fourth = Task.Factory.StartNew(() => generateGraphData(data, pixelWidth * 3 / 8, pixelWidth / 2));
       Task.WaitAll(first, second, third, fourth);

    This seems to work just fine, but thinking a bit differently, I tried:

       Task[] tasks = new Task[4];
       for (int idx = 0; idx < 4; idx++)
       {
           tasks[idx] = Task.Factory.StartNew
           (
               () => generateGraphData(data, pixelWidth * idx / 8, pixelWidth * (idx + 1) / 8)
           );
       }
       Task.WaitAll(tasks[0], tasks[1], tasks[2], tasks[3]);

    The above does not behave predictably.  If I put a Debug.Write in the generateGraphData function, it occasionally has duplicate values for the second and third parameters.  Sometimes it has 2 sets of values, but it never produces the full result.  To check that my approach was sound, I tried the following, which seems to work fine (substituting a hard-coded value for "idx"):

       Task[] tasks = new Task[4];
       tasks[0] = Task.Factory.StartNew(() => generateGraphData(data, pixelWidth * 0 / 8, pixelWidth * (0 + 1) / 8));
       tasks[1] = Task.Factory.StartNew(() => generateGraphData(data, pixelWidth * 1 / 8, pixelWidth * (1 + 1) / 8));
       tasks[2] = Task.Factory.StartNew(() => generateGraphData(data, pixelWidth * 2 / 8, pixelWidth * (2 + 1) / 8));
       tasks[3] = Task.Factory.StartNew(() => generateGraphData(data, pixelWidth * 3 / 8, pixelWidth * (3 + 1) / 8));
       Task.WaitAll(tasks[0], tasks[1], tasks[2], tasks[3]);

    This also seems to work fine, as does the first listing.  I am a bit new to C#, but are there some limitations on index variables used in this way?  (I know that the Parallel class attempts to do partitioning automatically, but I wanted to see if I could get mine to work.)

    Thanks for any suggestions.

    Regards,

    Robert

  • Hi Robert-

    The problem has to do with C# closure capture rules around your for loop iteration variable idx.  That variable is being shared across all of the lambdas you're creating, so when your for loop iterates, you're actually updating the shared idx value you passed to all of your lambdas.  See blogs.msdn.com/.../closing-over-the-loop-variable-considered-harmful.aspx for more info.

    I hope that helps.
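
One common workaround is to copy the loop variable into a local declared inside the loop body, so that each lambda captures its own copy.  A minimal sketch follows; the class name is hypothetical, and the task body simply returns the captured index as a stand-in for the call to generateGraphData.

```csharp
using System.Linq;
using System.Threading.Tasks;

// Hypothetical demo class, not part of TPL.
public static class LoopCaptureDemo
{
    // Returns the index value each task observed, in task order.
    public static int[] Run()
    {
        var tasks = new Task<int>[4];
        for (int idx = 0; idx < 4; idx++)
        {
            int local = idx; // fresh variable per iteration; each lambda captures its own copy
            tasks[idx] = Task.Factory.StartNew(() => local); // stand-in for generateGraphData(...)
        }
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

LoopCaptureDemo.Run() returns the values 0, 1, 2, 3, no matter when each task actually executes relative to the loop.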

  • If I am correct, this problem with loop variables captured in lambdas is fixed in the upcoming version of C#.

  • Ajay, it's addressed for 'foreach', but not for 'for'.  See the update on Eric Lippert's blog at blogs.msdn.com/.../closing-over-the-loop-variable-considered-harmful.aspx.

  • Thanks for the article Stephen, but I am wondering if you could please explain how this:

       Task t = null;
       t = Task.Factory.StartNew(() =>
       {
           ...
           t.ContinueWith(...);
       });

    is different from this (and why you would want to do it the previous way):

       Task.Factory
           .StartNew(() =>
           {
               ...
           })
           .ContinueWith(...);

  • Z, as outlined in the post, the former code snippet is actually buggy, in that the task could execute and try to process "t.ContinueWith" before 't' has actually been assigned to, resulting in a NullReferenceException.  That said, this kind of pattern can arise when dealing with recursion, where the async operation needs to do some processing before deciding what the continuation should be (or whether there even needs to be a continuation); it's relatively rare and quite advanced.
