The aim of this post is to help developers writing applications in which operations may need to be performed and then later undone due to a subsequent failure. It shows a pattern for how to maintain such a consistent application state by utilizing functionality from the Task Parallel Library (TPL) in the .NET Framework 4.
For the purposes of this blog post, a program/routine is a state machine where each state is equally valid and important and should be handled correctly regardless of whether it belongs to a happy path or not. Specifically, a step that rolls back an incomplete state transition is equally important as a step that makes a forward state transition. A lot could be written about the advantages and disadvantages of exceptions vs. error codes, and such a discussion is not the goal of this post: for the purpose of this article, it is only worth mentioning that throwing and catching exceptions is focused on the happy path (it keeps the code clean and easy to comprehend), while returning and checking error codes is focused on detecting the places of the code (program states) where deviations from the happy path may occur.
In general, a routine that consists of n atomic forward steps, where each forward step may fail and then needs to be undone, could be modeled as the following state machine:

Assuming a step is atomic means we don't have to undo that particular step if it fails. That is why the undo sequences in the diagram start with the last successful step.
Note: It is also assumed that undoing may not fail. If failures are possible to occur during rollback, then you should strongly consider using a real transactional resource manager like a database server.
Now that we've stated the problem, what is the best way to approach it? Obviously, there are solutions through either error checking or through exception handling, but both of those come at the expense of convoluting the code. Notice that the number of state transition paths is about three times the number of forward steps. Ideally, we are looking for a solution that has a clean happy path as well as clean undo paths without any if statements or separate Exception types for each forward step. In other words, we are looking for a way to describe a program as a state machine.
That's exactly what the TPL does through tasks (as states) and continuations (as state transitions).
Let's take a concrete application as an example and write some real code. Let's say we have a system that manages the process of making a sandwich. In a terminal state our resources, bread and ham, are in the fridge. (For the sake of simplicity, we produce one sandwich at a time.) First, we take a slice of bread out of the fridge. Second, we take a slice of ham out of the fridge. Third, we assemble the bread and the ham into a sandwich. If an operation fails, we have to return whatever ingredients are currently on the line back to the fridge so they don't rot:
And here is the code of our routine:
// HAPPY PATH
Task retrieveBread = Task.Factory.StartNew(RetrieveBreadFromFridge);
Task retrieveHam = retrieveBread.ContinueWith(_ => RetrieveHamFromFridge(),
TaskContinuationOptions.OnlyOnRanToCompletion);
Task assembleSandwich = retrieveHam.ContinueWith(_ => AssembleSandwich(),
TaskContinuationOptions.OnlyOnRanToCompletion);
// RESET STATE
TaskCompletionSource<bool> reset = new TaskCompletionSource<bool>();
assembleSandwich.ContinueWith(_ => reset.SetCanceled(),
TaskContinuationOptions.OnlyOnRanToCompletion);
// UNDO PATH
Task returnBread = new Task(ReturnBreadToFridge);
returnBread.ContinueWith(_ => reset.SetResult(true),
TaskContinuationOptions.OnlyOnRanToCompletion);
Task returnHam = new Task(ReturnHamToFridge);
returnHam.ContinueWith(_ => returnBread.Start(),
TaskContinuationOptions.OnlyOnRanToCompletion);
// HAPPY PATH -> UNDO PATH
// On a failure in bread retrieval - signal "reset".
retrieveBread.ContinueWith(_ => reset.SetResult(true),
TaskContinuationOptions.OnlyOnFaulted);
// On a failure in ham retrieval - return bread
retrieveHam.ContinueWith(_ => returnBread.Start(),
TaskContinuationOptions.OnlyOnFaulted);
// On a failure in sandwich assembly- return ham and bread
assembleSandwich.ContinueWith(_ => returnHam.Start(),
TaskContinuationOptions.OnlyOnFaulted);
// WAIT
// Log the execution of the HAPPY PATH tasks.
// Additionally, wait on the reset state to get signaled.
Task[] loggedTasks = new Task[]
{retrieveBread, retrieveHam, assembleSandwich, reset.Task};
Task.Factory.ContinueWhenAll(loggedTasks,LogErrors)
.Wait();
As you can see, there are no if statements, nor are there any Exception types customized to provide information what step failed.
Finally, I added a logging task that effectively waits for all the state machine tasks to finish, and then traverses the given task array and logs any error messages. (Notice that I still have to wait for that task to finish.) The body of that method is straightforward:
private void LogErrors(Task[] tasks)
{
Console.WriteLine("\n\tERRORS:");
foreach (Task task in tasks)
{
if (task.IsFaulted)
{
// Use the InnerException since it is wrapped in an AggregateException
Console.WriteLine("\t{0}", task.Exception.InnerException.Message);
}
}
}
In conclusion, TPL enables writing state-sensitive applications on the .NET platform without convoluting the application code with many if statements or customized exception types. While the code is still a little verbose, it closely resembles the state machine's diagram. The attached zip file contains the complete C# project. Feel free to download it and play with it. Your feedback is welcome.
ASP.NET applications already get a lot of concurrency for free. The .NET Framework load balances incoming requests among ThreadPool worker threads, striving for optimal use of available CPUs. As long as you minimize blocking in your ASP.NET page code, ASP.NET will process requests concurrently. In most cases, and in particular for Web applications with heavy usage, it is probably not necessary to introduce extra parallelism since adding more work items will only result in competition for CPU time and ultimately reduce request throughput.
Dealing with I/O bound work
If most of the work being done in an ASP.NET request is asynchronous in nature (such as I/O), doing the asynchronous work synchronously can be a huge scalability bottleneck. Solutions based on Asynchronous Programming Model (APM) and Event-based Asynchronous Pattern (EAP) have been recommended to ease this bottleneck. For an in-depth discussion on this refer to Scalable Apps with Asynchronous Programming in ASP.NET and Asynchronous Pages in ASP.NET 2.0. The article Improving ASP.NET Performance also has some good pointers to improving the scalability of your web applications.
New features in the .NET Framework 4 can also be used to make programming asynchronous pages easier. The System.Threading.Tasks.Task class (and the Task<TResult> class that derives from it) can be used to represent asynchronous operations, both classes implement IAsyncResult, and they provide capabilities for coordinating between multiple asynchronous activities. Since part of ASP.NET’s asynchronous pages support is based on the Asynchronous Programming Model (APM) pattern and IAsyncResult, Task can play a role in easing the implementation of asynchronous pages. In particular, Task is most useful if you want to structure your code with continuations, which can be useful if you have multiple stages of asynchronous activity that need to happen before the rest of the page continues execution. For more details, refer to Tasks and the Event-based Asynchronous Pattern and Tasks and the APM Pattern
Dealing with CPU intensive work
Web applications that need to perform expensive computations may still benefit from parallelism if the latency of an individual request is more important than overall request throughput. If this is the case, the new APIs for parallelism in .NET 4 such as Task Parallel Library and PLINQ can simplify writing the parallel code. When integrating parallelism into your web application, consider the following factors:
If requests are computationally cheap to process, then parallelism is probably an unnecessary overhead.
If the incoming request rate is high, then adding more parallelism will likely yield few benefits and could actually decrease performance, since the incoming rate of work may be high enough to keep the CPUs busy.
If the incoming request rate is low, then the Web application could benefit from parallelism by using the idle CPU cycles to speed up the processing of an individual request. We can use either PLINQ or TPL (either Parallel loops or the Task class) to parallelize the computation over all the processors. Note that by default, however, the PLINQ implementation in .NET 4 will tie-up one ThreadPool worker per processor for the entire execution of the query. As such, it should only be used in Web applications that see few but expensive requests.
If the incoming request rate is variable, i.e. there are long periods when request rate is low (say, at night) and then other periods when request rate is high (say, midday), we need a strategy that will dynamically adjust to the available resources. When the load is high, we don’t want to add to the contention but when the load is low, we want to use the idle resources. For this scenario, we can use TPL’s Parallel or Task constructs since they can adapt to use available resources within a process. If the server is already loaded, the Parallel loops can use as little as one worker and make forward progress. If the server is mostly free, they can grow to use as many workers as the ThreadPool can spare.
Developing libraries for ASP.NET
If you’re developing a library that uses the parallel programming features of .NET 4, you should consider whether it is going to be to be used within ASP.NET. If it is, you should consider exposing knobs from your library that enable controlling how much parallelism is employed by the library. This is particularly important for libraries that utilize PLINQ. In .NET 4, PLINQ by default uses a fixed number of workers equal to the number of logical processors. By exposing control to the consumer of the library, the consumer can specify a maximum amount of parallelism to be employed, and this value can be configured based on the environment. The number of workers PLINQ utilizes is controllable through the WithDegreeOfParallelism operator; the maximum number of workers utilized by the Parallel loops is controllable through the ParallelOptions class, an instance of which is supplied as a parameter to overloads of the looping constructs.
Conclusion
ASP.NET already takes advantage multiple processors on your server. Most developers will not need to explicitly add any parallelism into their ASP.NET Web applications. However, if your particular situation requires explicit parallelism, the new parallelism APIs in .NET 4 can be beneficial to you.
Igor Ostrovsky is a developer on the Parallel Extensions team. On his blog, he's
documented a great set of examples for how caches can affect application performance; this is important to think through when writing parallel applications, but as Igor demonstrates, it applies equally to serial applications. Check out his post.
Are you using Parallel LINQ (PLINQ), the Task Parallel Library (TPL), or any of the new coordination and synchronization primitives in .NET 4 (or in the Parallel Extensions June 2008 CTP or with the recent Reactive Extensions release)? Are you planning to use or are you already using this support in a production application or library?
We'd love to hear about it, and if you have time, what your experiences have been (good or bad). Please email me at stoub at microsoft dot com... we're looking forward to hearing from you. Thanks!
Several months ago, Microsoft announced for academic customers the availability of DryadLINQ. DryadLINQ is a LINQ provider developed by Microsoft Research that enables .NET developers to use the LINQ programming model for writing distributed queries and computations against a cluster of computers using Windows HPC Server. DryadLINQ enables developers to harness and tame the distributed data storage and computational resources of a cluster, all with a familiar LINQ-based syntax, just as PLINQ enables developers to more easily take advantage of multi-core and manycore. (In fact, DryadLINQ is capable of using PLINQ internally to harness multiple cores available on each cluster node.)
It’s a pleasure to announce that, as of today, MSR has also released DryadLINQ under an additional license agreement, one that allows for non academic use. The academic and non academic releases are largely identical: key differences are in the licenses themselves, and that the academic release supplies source code for the programming model layer whereas the commercial release is a binary-only distribution.
In order to download the new release, you will need to register on the Dryad connect site. You will also need a Windows HPC Server cluster (three nodes will suffice), for which you can download a free evaluation version at http://www.microsoft.com/hpc/en/us/try-it.aspx.
We are looking forward to receiving your feedback about this release!
Are you using the CCR (Microsoft Robotics' "Concurrency & Coordination Runtime") today in production applications or libraries, and in particular for non-robotics purposes? If so, we’d love to hear about your experiences, and any and all information you’re willing to share would be very welcome.
What do you like about it and the programming model it employs? What don’t you like about it? What features are crucial to you, and what features do you never use? How are you architecting your applications with it? Any key code samples that are representative of your application you’d like to share, or even better, a standalone implementation that highlights how you use it? If you experiemented with it but ended up not using it, why? Etc. If you’re interested and willing to share, please send me an email at stoub at microsoft dot com.
We’re excited to hear from you!
Thanks,
Stephen
(This answer is based on the .NET Framework 4. As the details below are undocumented implementation details, they may change in future releases.)
No. All of the collections in the new System.Collections.Concurrent namespace employ lock-free techniques to some extent in order to achieve general performance benefits, but traditional locks are used in some cases.
It’s worth noting that purely relying on lock-free techniques is sometimes not the most efficient solution. When we say “lock-free,” we mean that locks (in .NET, traditional mutual exclusion locks are available via the System.Threading.Monitor class, typically via the C# “lock” keyword or the Visual Basic “SyncLock” keyword) have been avoided by using memory barriers and compare-and-swap CPU instructions (in .NET, “CAS” operations are available via the System.Threading.Interlocked class).
ConcurrentQueue<T> and ConcurrentStack<T> are completely lock-free in this way. They will never take a lock, but they may end up spinning and retrying an operation when faced with contention (when the CAS operations fail).
ConcurrentBag<T> employs a multitude of mechanisms to minimize the need for synchronization. For example, it maintains a local queue for each thread that accesses it, and under some conditions, a thread is able to access its local queue in a lock-free manner with little or no contention. Therefore, while ConcurrentBag<T> sometimes requires locking, it is a very efficient collection for certain concurrent scenarios (e.g. many threads both producing and consuming at the same rate).
ConcurrentDictionary<TKey,TValue> uses fine-grained locking when adding to or updating data in the dictionary, but it is entirely lock-free for read operations. In this way, it’s optimized for scenarios where reading from the dictionary is the most frequent operation.
All of these terms are overloaded, even in the context of parallel computing. However, we’ve used them extensively to describe how well our parallel algorithms and demo applications work. And sometimes, we throw them around carelessly on the blog, forums, etc., so here are our general definitions.
Performance is an attribute that refers to the total elapsed time of an algorithm’s execution. Less elapsed time means higher performance.
Speedup is a metric that quantifies performance by comparing two elapsed time values. In parallel computing, these two values are usually generated by the execution of a serial algorithm and a parallelized version of the same algorithm. Speedup is then calculated using the following equation:
Speedup = Serial Execution Time / Parallel Execution Time
So if a serial algorithm takes 100 seconds to complete, and the parallel version takes 40 seconds, the speedup is “2.5x”.
Efficiency is a metric that builds on top of speedup by adding awareness of the underlying hardware. It is usually calculated using the following equation:
Efficiency = Speedup / # of cores
So if speedup is “2.5x” on a 4-core machine, efficiency is 0.625 or 62.5%.
Scalability is an attribute that refers to the speedup of an algorithm given different numbers of cores/processors. The efficiency metric is good for quantifying scalability, because if efficiency holds constant as the number of cores changes, we have linear scaling (or awesome scalability).
The following code correctly observes and handles a Task exception and should print “gotcha!” to the console. By default though, the Debugger will report a crash.
Task t = Task.Factory.StartNew(() => { throw new Exception("poo"); });
try { t.Wait(); }
catch (AggregateException) { Console.WriteLine("gotcha!"); }
The issue has to do with the “Just My Code” mode (enabled by default), which causes the Debugger to break in immediately when an exception leaves user code (the Task delegate) and enters non-user code (TPL internal). This is usually a good thing, because it allows you to pinpoint exactly where an exception is going unhandled. However, in this case, the Debugger is breaking before TPL can observe the exception.
Running without debugging or disabling “Just My Code” (Tools -> Options -> Debugging -> General) should resolve the issue. Also, note that the Debugger actually broke in as though it had hit a breakpoint, so Continuing (F5) or Stepping (F10/F11) should allow further execution.

ConcurrentDictionary<TKey,TValue> is a new type in the .NET Framework 4, living in the System.Collections.Concurrent namespace. As noted in the MSDN documentation, ConcurrentDictionary “represents a thread-safe collection of key-value pairs that can be accessed by multiple threads concurrently.”
While ConcurrentDictionary implements IDictionary<TKey, TValue> just as does Dictionary<TKey,TValue>, it also has several unique mechanisms for adding / updating key/value pairs in the dictionary, mechanisms specific to its concurrent focus. Here is a summary of when to use which mechanism.
- If you want to…
- Add a new item to the dictionary only if the key doesn’t currently exist in the dictionary…
- Use TryAdd. TryAdd accepts the key and the value, and adds the pair to the dictionary if the key doesn’t currently exist in the dictionary. The method returns whether or not the new pair was added.
- public bool TryAdd(TKey key, TValue value)
- Update the value for an existing key in the dictionary, but only if the existing value for that key is equal to a specific value…
- Use TryUpdate. TryUpdate accepts the key and the new value to set that key to, but it also accepts the value that the key in the dictionary must currently contain in order for the update to succeed. You can think of TryUpdate like Interlocked.CompareExchange but for an element in the dictionary.
- public bool TryUpdate(TKey key, TValue newValue, TValue comparisonValue)
- Store a key/value pair into the dictionary unconditionally, overwriting any value for that key if the key already exists…
- Use the indexer’s setter, e.g. dictionary[key] = newValue.
- public TValue this[TKey key] { get; set; }
- Add a key/value pair to the dictionary if the key doesn’t exist in the dictionary, or if the key does exist, update the value for the key based on the key’s existing value…
- Use AddOrUpdate. AddOrUpdate has two overloads:
- One overload accepts the key as well as two delegates. The first delegate is used if the key doesn’t exist in the dictionary; it accepts the key and returns the value that should be added for the key. The second delegate is used if the key does exist: it accepts both the key and the current value for the key, and it returns the new value that should be set for the key.
- public TValue AddOrUpdate(TKey key, Func<TKey, TValue> addValueFactory, Func<TKey, TValue, TValue> updateValueFactory)
- The other overload accepts the key, an add value, and the update delegate. This is exactly the same as the former overload, except that if the key isn’t in the dictionary, the provided value is added for the key, rather than invoking a delegate to get the necessary value
- public TValue AddOrUpdate(TKey key, TValue addValue, Func<TKey, TValue, TValue> updateValueFactory)
- Get the value for a key in the dictionary, adding the value to the dictionary (and returning it) if the key doesn’t exist…
- Use GetOrAdd. You can think of this as lazy-initialization for a key/value pair in the dictionary, getting the value if it’s there, or adding it and then getting it if it’s not. As with AddOrUpdate, GetOrAdd provides two overloads: one takes the value to be added if the key doesn’t exist, and the other takes a delegate that will generate the value if the key doesn’t exist.
- public TValue GetOrAdd(TKey key, TValue value)
- public TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory)
All of these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary. The only caveat to the atomicity of each operation is for those which accept a delegate, namely AddOrUpdate and GetOrAdd. For modifications / writes to the dictionary, ConcurrentDictionary employs fine-grained locking to ensure thread-safety (reads on the dictionary are performed in a lock-free manner); however, these delegates are invoked outside of the locks in order to avoid the myriad of problems that can arise from executing unknown code under a lock. As such, the code executed by these delegates is not subject to the atomicity of the operation.
We’ll be regularly posting answers to frequently asked questions that we’ve gotten on the forum, internal email lists, etc. Here’s the first – enjoy!
Why is the speedup not X on my X-way machine? Or, why does my parallel code run slower?
Less than ideal speedup can typically be attributed to two things:
1. Sequential costs. It’s often the case that parts of a particular algorithm must be executed sequentially. As the parallelizable parts get faster (with the addition of more cores), the sequential parts become more significant (Amdahl's Law).
2. General overheads. For example, the costs associated with partitioning/merging data, scheduling tasks to the runtime, synchronization to support additional features, or delegate invocations.
The former can sometimes be addressed by choosing a different algorithm that requires less serial execution. The latter can often be addressed by reducing communication between tasks in a parallel operation, making fine-grained parallelism more coarse-grained, etc. Also, Microsoft is constantly working to reduce the impact of both points, where possible.
On Code Gallery, we have a plethora of samples that highlight aspects of the .NET Framework 4 that help with writing scalable and efficient parallel applications. This post examines each of those samples, providing an overview of what each provides.
| Project Name: AcmePizza Languages: C#, Visual Basic Description: This simple and somewhat silly application demonstrates using concurrent collections with WPF. The collections are wrapped with observable facades, such that multiple threads may modify the collections concurrently, and those updates are safely propagated to UI controls.
|
| Project Name: BabyNames Languages: C#, Visual Basic Description: This is one of the first applications we ever built to use Parallel LINQ. Using LINQ and PLINQ, it queries a data set of baby name popularity information, sorts the results, and displays the results in a simplistic WPF user interface.
|
| Project Name: BlendImages Languages: C# Description: A demo of very simple image manipulation using a Parallel.For loop. The application allows the user to load up two images and blends them together into a single, new image.
|
| Project Name: ComputePi Languages: C#, Visual Basic Description: A console application that estimates the value of PI using a variety of both serial and parallel implementations, the latter done with both PLINQ and the Parallel class. |
| Project Name: DiningPhilosophers Languages: C#, Visual Basic Description: A WPF application that demonstrates the classic “Dining Philosophers” synchronization problem. The application implements several solutions, including one based on asynchronous techniques using Tasks. |
| Project Name: EditDistance Languages: C#, Visual Basic Description: A console application that uses Tasks to parallelize a dynamic programming problem, that of computing the “edit distance” between two strings.
|
| Project Name: GameOfLife Languages: C#, Visual Basic Description: This application provides an implementation of Conway’s Game of Life, using the Parallel class to parallelize the processing of the cellular automata. |
| Project Name: ImageColorizer Languages: C#, Visual Basic Description: This application manipulates an image by converting the majority of the image to grayscale, except for portions of the image containing user-selected hues. |
| Project Name: LINQRayTracer Languages: C#, Visual Basic Description: Based on Luke Hoban’s LINQ implementation of a ray tracer, this application parallelizes a computationally intensive LINQ query using PLINQ. |
| Project Name: MandelbrotFractals Languages: C#, C++/CLI Description: This application provides an implementation of the classic Mandelbrot fractal, parallelizing the processing of the fractal using the Parallel class. |
| Project Name: Morph Languages: C# Description: Implements a morphing algorithm between two images. Parallelization is done using the Parallel class. |
| Project Name: NQueens Languages: C#, Visual Basic Description: This application implements a solution to the N-Queens problem, using both LINQ and PLINQ. |
| Project Name: OptionPricing Languages: C#, Visual Basic Description: This Excel VSTO application utilizes PLINQ to price Asian Options.
|
| Project Name: ParallelGrep Languages: C#, Visual Basic Description: This console application implements “grep” functionality across a file system using PLINQ. |
| Project Name: Raytracer Languages: C#, Visual Basic, F# Description: This Windows application provides an animated, ray-traced bouncing ball. Sequential and parallel implementations are provided, as is a special parallel implementation that colors the animated image based on which thread was used to calculate which regions. |
 | Project Name: ShakespeareanMonkeys Languages: C#, Visual Basic Description: This application implements and parallelizes a genetic algorithm for breeding monkeys able to speak text from Hamlet. |
| Project Name: SpellChecker Languages: C#, Visual Basic Description: This application implements and parallelizes a spellchecking algorithm based on the same edit distance calculation in the Edit Distance sample. |
| Project Name: Strassens Languages: C# Description: This application implements several algorithms for performing and parallelizing matrix multiplication, including the Strassen algorithm. |
| Project Name: Sudoku Languages: C#, Visual Basic Description: This is a fun application that provides a full Sudoku experience, including on-demand puzzle generation and solving. Unlike many Sudoku demos which parallelize the solver, this implementation parallelizes the generator, using PLINQ. It also demonstrates a use for speculative execution. |
| Project Name: VisualizePartitioning Languages: C#, Visual Basic Description: This application demonstrates various approaches to partitioning as employed by both Parallel and PLINQ. |
| Project Name: ParallelExtensionsExtras Languages: C# Description: This class library provides a plethora of interesting and useful extensions to take advantage of and complement the functionality available in the .NET Framework 4 for parallel programming. |
We've refreshed our Beta 2 samples for parallel programming with the .NET Framework 4. Thanks to the gracious assistance of the fabulous Lisa Feigenbaum and others on the Visual Basic team, in this refresh the majority of the samples are now available not only in C# but also in Visual Basic. The samples are available for download at http://code.msdn.microsoft.com/ParExtSamples, and a description of the samples is available at http://blogs.msdn.com/pfxteam/archive/2009/12/09/9934811.aspx.
Enjoy!
Check out the updated Parallel Computing Developer Center on MSDN. Both the look and the content have been revised significantly…
Enjoy!
Attendees at PDC09 this past week were privy to quite a few sessions on parallel computing. Now that the videos of these sessions are online, you can view them as well from the comfort of your own home. Here are some of the key parallelism-related sessions from this past week:
Overview
Managed code in Visual Studio 2010
Native code in Visual Studio 2010
HPC Server
Research and Incubation
Enjoy!