Why is TaskContinuationOptions.ExecuteSynchronously opt-in?


For a relatively advanced feature, I've been surprised how often this question has come up recently.
 
When a task completes, its continuations become available for execution, and by default, a continuation will be scheduled for execution rather than executed immediately.  This means that the continuation has to be queued to the scheduler and then later retrieved so that it may be run.  Given that there's some overhead involved there, why would we choose to make that the default behavior rather than avoiding that overhead by executing the continuation synchronously upon completion of the antecedent? There are a few reasons.
 
First, it's quite common for multiple continuations to be created off of the same task.  If continuations were executed synchronously by default, we would lose out on a valuable opportunity for parallelism, as they would all be executed synchronously one after the other.  By scheduling the continuations to run asynchronously rather than executing them synchronously, we expose those continuations to be picked up by other available threads, thereby allowing them to run in parallel.
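
As a rough illustration of this first point, the sketch below attaches several continuations to one antecedent and counts how many distinct threads ran them.  The names FanOutDemo and DistinctThreads are made up for this example, and Thread.Sleep merely stands in for real work.  With the default options the count may be greater than one, depending on what the ThreadPool has available; with ExecuteSynchronously, continuations registered before the antecedent completes run one after another on the thread that completed it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FanOutDemo
{
    // Hypothetical helper: attaches `count` continuations to a single antecedent
    // and returns how many distinct threads ended up running them.
    public static int DistinctThreads(int count, TaskContinuationOptions options)
    {
        var threadIds = new ConcurrentBag<int>();
        var antecedent = new Task(() => { });
        var continuations = new Task[count];

        for (int i = 0; i < count; i++)
        {
            continuations[i] = antecedent.ContinueWith(t =>
            {
                threadIds.Add(Thread.CurrentThread.ManagedThreadId);
                Thread.Sleep(50); // stand-in for real work
            }, options);
        }

        // All continuations are registered before the antecedent runs.
        antecedent.Start();
        Task.WaitAll(continuations);
        return threadIds.Distinct().Count();
    }

    public static void Main()
    {
        // The default count varies with machine and load, so no value is promised here.
        Console.WriteLine("Default: " +
            DistinctThreads(4, TaskContinuationOptions.None));
        Console.WriteLine("ExecuteSynchronously: " +
            DistinctThreads(4, TaskContinuationOptions.ExecuteSynchronously));
    }
}
```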
 
Second, it's quite common for long chains of continuations to be formed, with one task continuing off of another, and another off of that, and another off of that, and so on.  If these continuations were all executed synchronously, the completion logic from one task would invoke the next task, and its completion logic would invoke the next... each of these would lead to additional stack frames piling up on top of each other, and with a long enough chain, we could end up overflowing the stack.
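
To make the stack growth visible without actually overflowing anything, here is a small sketch that records the stack depth inside the final continuation of a short chain.  ChainDepthDemo and FramesAtEndOfChain are names invented for this example, and the exact frame counts depend on the runtime and build, so only the trend matters: the synchronous chain should report noticeably more frames than the default one.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ChainDepthDemo
{
    // Hypothetical helper: builds a chain of `length` continuations and returns
    // the number of stack frames observed inside the final one.
    public static int FramesAtEndOfChain(int length, TaskContinuationOptions options)
    {
        int frames = 0;
        var first = new Task(() => { });
        Task last = first;

        for (int i = 0; i < length; i++)
        {
            bool isFinal = (i == length - 1);
            last = last.ContinueWith(t =>
            {
                // Only the last link in the chain records its stack depth.
                if (isFinal) frames = new StackTrace().FrameCount;
            }, options);
        }

        first.Start();
        last.Wait();
        return frames;
    }

    public static void Main()
    {
        // A deliberately short chain: the point is the trend, not an overflow.
        Console.WriteLine("Default: " +
            FramesAtEndOfChain(10, TaskContinuationOptions.None));
        Console.WriteLine("ExecuteSynchronously: " +
            FramesAtEndOfChain(10, TaskContinuationOptions.ExecuteSynchronously));
    }
}
```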
 
Third, a common solution to such overflow conditions as discussed above is to use a "trampoline," where you store a reference to some work to be done, back out of your current stack frame(s), and have a higher-level frame (typically a looping construct) look for the stored reference and execute it.  That way, after every invocation, rather than picking up the next piece of work immediately, you store the reference, back out, and then execute it.  This, as it happens, is exactly the solution TPL employs to make asynchronous execution fast.  Remember that as part of .NET 4, the ThreadPool's internal implementation was augmented with work-stealing queues to which TPL has access.  When work running on a ThreadPool thread schedules a Task for execution, that Task is put into a work-stealing queue local to that thread.  The thread is able to push and pop work items from it very efficiently and with minimal synchronization.  Now, when a task completes, it's typically completing on a ThreadPool thread, and as such all of the continuations it queues get queued to the local work-stealing queue.  The thread will then go in search of work to do, first checking its local queue, and immediately find one of the continuations it just queued.  This is, in effect, the trampoline.  The thread picks off the most recently queued continuation efficiently and begins processing it.
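
The general trampoline idea can be sketched in a few lines.  This is only an illustration of the pattern, not TPL's implementation: TrampolineSketch and RunChain are hypothetical names, and a plain Stack<Action> stands in for a thread's local work-stealing queue.  Each step stores its successor instead of calling it, and a loop at the top pops and runs whatever was stored most recently, so the stack stays shallow no matter how long the chain is.

```csharp
using System;
using System.Collections.Generic;

public static class TrampolineSketch
{
    // Runs a chain of `length` steps and returns how many were executed.
    public static int RunChain(int length)
    {
        if (length <= 0) return 0;

        // Hypothetical stand-in for a thread's local queue (last in, first out).
        var localQueue = new Stack<Action>();
        int executed = 0;

        Action<int> step = null;
        step = remaining =>
        {
            executed++;
            if (remaining > 1)
            {
                // Store a reference to the next piece of work and back out,
                // rather than invoking it from inside this frame.
                localQueue.Push(() => step(remaining - 1));
            }
        };

        localQueue.Push(() => step(length));

        // The higher-level loop: pick up the most recently stored work and run it.
        while (localQueue.Count > 0)
        {
            localQueue.Pop()();
        }

        return executed;
    }

    public static void Main()
    {
        Console.WriteLine(RunChain(100000)); // prints 100000
    }
}
```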
 
These are the primary reasons why we default to queueing continuations rather than just executing them synchronously: it provides more opportunities to leverage parallelism, it's the safer choice, and the difference in performance is typically not important.  Of course, microbenchmarks will highlight a non-negligible performance difference, so if you’re dealing with continuations that contain very few instructions, have little risk of blocking, etc., ExecuteSynchronously can be worthwhile.  Consider the following simple test:

using System;
using System.Diagnostics;
using System.Threading.Tasks;

class Program
{
    const int NUM_CONTINUATIONS = 100000;

    // Times one run of a chain of NUM_CONTINUATIONS empty continuations.
    static long Test(bool executeSynchronously)
    {
        // Start each run from a clean heap so earlier garbage doesn't skew the timing.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        // Build the whole chain before starting the first task.
        var first = new Task(() => { });
        var last = first;
        for (int i = 0; i < NUM_CONTINUATIONS; i++)
        {
            last = last.ContinueWith(delegate { }, executeSynchronously ?
                TaskContinuationOptions.ExecuteSynchronously :
                TaskContinuationOptions.None);
        }

        // Time only the execution of the chain, not its construction.
        var sw = Stopwatch.StartNew();
        first.Start();
        last.Wait();
        return sw.ElapsedMilliseconds;
    }

    static void Main(string[] args)
    {
        while (true)
        {
            long withoutExecuteSynchronously = 0;
            long withExecuteSynchronously = 0;
            for (int i = 0; i < 5; i++)
            {
                withoutExecuteSynchronously += Test(false);
                withExecuteSynchronously += Test(true);
            }
            // Ratio of total times: values above 1.00 mean ExecuteSynchronously was faster.
            Console.WriteLine((withoutExecuteSynchronously /
                (double)withExecuteSynchronously).ToString("F2"));
        }
    }
}

 

This test creates a chain of NUM_CONTINUATIONS continuations, each of which does zero work, seeing how long it takes to execute the whole chain, and comparing the cases where the continuations are and are not created with the ExecuteSynchronously option.  On the laptop on which I’m writing this blog post, I see the ExecuteSynchronously version running faster, with at most a 2x difference in throughput.  This highlights why we made the ExecuteSynchronously option available even though it's not the default.

  • I thought you might like to know that the email address on this blog is bouncing.

    Your message did not reach some or all of the intended recipients.

         Subject: MSDN Blogs: Contact request: ConcurrentDictionary question

         Sent: 5/24/2010 9:50 PM

    The following recipient(s) cannot be reached:

         pfx@microsoft.com on 5/24/2010 9:50 PM

               The e-mail system was unable to deliver the message, but did not report a specific reason.  Check the address and try again.  If it still fails, contact your system administrator.

               < mail5-tx2-R.bigfish.com #5.0.0 X-Postfix; host    winse-6216-mail6.customer.frontbridge.com[131.107.115.215] said: 550 5.1.1    User unknown (in reply to RCPT TO command)>

    Rick

  • Thanks for letting us know, Rick!  We'll look into why it's not working.  In the meantime, please feel free to post your ConcurrentDictionary question here or in our forums at social.msdn.microsoft.com/.../threads.

  • Code from the article executes terribly slowly within VS2010 in x86 mode under Windows 7 64-bit.

    In x64 mode within VS2010 and from the command prompt (both x86 and x64) all is OK.

    What's wrong?

    Oleg Subachev

  • Here are approximate timings:

    Environment          NUM_CONTINUATIONS   withoutExecuteSynchronously   withExecuteSynchronously
    VS2010, x86          100 (<- !!!)        ~ 2530 ms                     ~ 80 ms
    VS2010, x64          100000              ~ 610 ms                      ~ 210 ms
    command prompt, x86  100000              ~ 670 ms                      ~ 280 ms
    command prompt, x64  100000              ~ 620 ms                      ~ 200 ms

    Oleg Subachev

  • Hi Oleg-

    When you say from VS2010, are you running with the debugger attached, e.g. pressing F5?  Have you enabled the IntelliTrace events for threading?  If so, that would very likely slow it down, and as IntelliTrace wouldn't be recording those events for x64 nor when you run from outside of VS, that could explain the difference.  I just tried it on my box, and without those events enabled in IntelliTrace, with just 100 continuations it's way too fast to measure.  With those events enabled for IntelliTrace, I see a similar x86 slowdown.

  • Yes! Thank you! You are absolutely right!!!

    The problem is exactly as you described.
