It’s been a few months since April when we last released a Community Technology Preview (CTP) of System.Threading.Tasks.Dataflow.dll, aka “TPL Dataflow”. Today for your programming pleasure, we have another update.
As mentioned in “What’s New for Parallelism in .NET 4.5”, System.Threading.Tasks.Dataflow.dll is part of the .NET Framework 4.5 Developer Preview released last week at the BUILD conference. In addition to that release, however, we’ve also refreshed the standalone CTP bits available for download on the MSDN DevLabs site at http://msdn.microsoft.com/en-us/devlabs/gg585582. You’ll find that the two DLLs are very similar though not identical. We’re continuing to work hard on getting both the API surface area and the implementation of TPL Dataflow to be just right, addressing your great feedback about the library’s functionality and also focusing on improving performance. Resulting changes are highlighted in the TPL Dataflow CTP on DevLabs.
In addition to bug fixes around robustness, here are some of the improvements you’ll find:
As an example of some of these performance improvements, consider a few microbenchmarks (and as always with microbenchmarks, take these with large grains of salt, both in terms of what’s being measured and in terms of the representativeness of the hardware on which they’re executed). Here’s a microbenchmark that just sees how fast we can push data through an ActionBlock:
    using System;
    using System.Diagnostics;
    using System.Threading;
    using System.Threading.Tasks.Dataflow;

    class Program
    {
        static void Main()
        {
            var sw = new Stopwatch();
            const int ITERS = 6000000;
            var are = new AutoResetEvent(false);

            var ab = new ActionBlock<int>(i => { if (i == ITERS) are.Set(); });

            while (true)
            {
                sw.Restart();
                for (int i = 1; i <= ITERS; i++) ab.Post(i);
                are.WaitOne();
                sw.Stop();
                Console.WriteLine("Messages / sec: {0:N0}", (ITERS / sw.Elapsed.TotalSeconds));
            }
        }
    }
On my 64-bit quad-core 1.6GHz i7 laptop, here are example throughput numbers I see from the April DevLabs CTP and the September DevLabs CTP:
                                             April CTP      Sept CTP
    ActionBlock throughput (messages/sec)    4,801,434    10,942,715
Now consider a minor modification to this benchmark, simply configuring the ActionBlock to use the new SingleProducerConstrained option:
    var ab = new ActionBlock<int>(
        i => { if (i == ITERS) are.Set(); },
        new ExecutionDataflowBlockOptions { SingleProducerConstrained = true });
With that change, I get:
                                               Sept CTP
    ActionBlock throughput (messages/sec)    37,456,691
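To make the contract behind that option a little more concrete, here is a minimal sketch rather than anything taken from the benchmark itself. It assumes that SingleProducerConstrained is a promise from the caller that only one thread at a time will use the block (posting to it or completing it), and that the Complete and Completion members are available on the build you are using. The class and method names (SingleProducerSketch, SumThroughBlock) are hypothetical.

```csharp
using System.Threading.Tasks.Dataflow;

static class SingleProducerSketch
{
    // Posts 1..count from the calling thread only, then returns the sum
    // that the block's delegate accumulated.
    public static long SumThroughBlock(int count)
    {
        long sum = 0;

        var block = new ActionBlock<int>(
            // The default MaxDegreeOfParallelism is 1, so the delegate never
            // runs concurrently with itself and needs no lock here.
            i => { sum += i; },
            // Assumption: this is safe only because exactly one thread
            // (the caller) ever posts to or completes the block.
            new ExecutionDataflowBlockOptions { SingleProducerConstrained = true });

        for (int i = 1; i <= count; i++) block.Post(i);

        block.Complete();          // signal that no more messages will arrive
        block.Completion.Wait();   // block until every posted message is processed
        return sum;
    }
}
```

If several threads might post concurrently, the option should stay at its default of false; the sketch is only meant to show the single-producer shape that the option is designed for.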
As another example, consider the performance of sending and receiving asynchronously from a bounded buffer block, a common case when implementing asynchronous producer/consumer scenarios where you want to limit production so that producers never get too far ahead of consumers, and doing so in a way where all operations are represented using Tasks (which could then be awaited):
    using System;
    using System.Diagnostics;
    using System.Threading.Tasks.Dataflow;

    class Program
    {
        static void Main()
        {
            var sw = new Stopwatch();
            const int ITERS = 1000000;

            var bb = new BufferBlock<int>(
                new DataflowBlockOptions { BoundedCapacity = 1 });

            while (true)
            {
                sw.Restart();
                for (int i = 0; i < ITERS; i++)
                {
                    bb.SendAsync(i);
                    bb.ReceiveAsync();
                }
                sw.Stop();
                Console.WriteLine("Messages / sec: {0:N0}", (ITERS / sw.Elapsed.TotalSeconds));
            }
        }
    }
On that same machine with the two different builds, I see the following:
                                                          April CTP     Sept CTP
    SendAsync / ReceiveAsync throughput (messages/sec)      671,619    1,216,397
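The microbenchmark issues its sends and receives back to back on one thread, which is good for measuring overhead but does not look much like a real producer/consumer pair. As a rough sketch of the awaited form described above, the following runs a producer and a consumer over the same bounded BufferBlock. It assumes a compiler with async/await support and that the OutputAvailableAsync extension method is present in the build you are using; the names BoundedPipelineSketch, RunAsync, ProduceAsync, and ConsumeAsync are hypothetical.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class BoundedPipelineSketch
{
    // Sends 1..count through a buffer of capacity 1 and returns the sum received.
    public static async Task<long> RunAsync(int count)
    {
        var buffer = new BufferBlock<int>(
            new DataflowBlockOptions { BoundedCapacity = 1 });

        Task producer = ProduceAsync(buffer, count); // starts producing immediately
        long sum = await ConsumeAsync(buffer);
        await producer;                              // surface any producer exception
        return sum;
    }

    static async Task ProduceAsync(BufferBlock<int> target, int count)
    {
        for (int i = 1; i <= count; i++)
        {
            // With BoundedCapacity = 1, this task completes only once the
            // buffer has room, so the producer cannot run ahead of the consumer.
            await target.SendAsync(i);
        }
        target.Complete(); // no more items will be sent
    }

    static async Task<long> ConsumeAsync(BufferBlock<int> source)
    {
        long sum = 0;
        // OutputAvailableAsync yields false once the block is completed and drained.
        while (await source.OutputAvailableAsync())
        {
            sum += await source.ReceiveAsync();
        }
        return sum;
    }
}
```

A caller that cannot await can block on the result instead, for example with BoundedPipelineSketch.RunAsync(1000).GetAwaiter().GetResult().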
We hope you enjoy these updates! As always, feedback is very welcome and encouraged, and we look forward to hearing from you. You can discuss the TPL Dataflow CTP in the TPL Dataflow forum on MSDN.
I think ~200 CPU cycles per fully synchronized ActionBlock message is a pretty good achievement when you consider that an interlocked memory access takes 100-200 cycles.
Thanks. We're also excited about the improvements.