Attendees at PDC09 this past week were privy to quite a few sessions on parallel computing. Now that the videos of these sessions are online, you can view them as well from the comfort of your own home. Here are some of the key parallelism-related sessions from this past week:
Overview
Managed code in Visual Studio 2010
Native code in Visual Studio 2010
HPC Server
Research and Incubation
Enjoy!
It’s awesome to see the Reactive Extensions to .NET (Rx) live on the DevLabs site. If you haven’t checked out this exciting project, we urge you to do so. Not only is it cool and useful technology, but the download includes a back ported (and unsupported) release of Parallel Extensions for the .NET Framework 3.5 in the form of System.Threading.dll. Rx relies on this, and you can also use it to try out our new parallel programming support for .NET 4 but with your .NET 3.5 projects.
I recently sat down with Wes Dyer from the Rx team to discuss the Reactive Extensions integration with Parallel Extensions. You can view a video of that discussion at Wes Dyer and Stephen Toub: Rx and Px - Working Together.
While unsupported, the 3.5 back port does contain all of the types and constructs present in the .NET 4 supported release. In fact, using the back ported version is very similar to using the bits in .NET 4, albeit with a few known differences:
- In order to use the features with .NET 3.5 applications, you will need to add a reference to the back ported standalone DLL named System.Threading.dll. This additional step is not present with .NET 4, since the parallel technology is baked into core CLR components.
- The performance of the parallel types is optimized for and predicated on the .NET 4 Framework. You should not assume that an application using the .NET 3.5 version will exhibit similar performance characteristics to one using the supported .NET 4 version. Parallel Extensions extensively leverages advancements made in the .NET 4 ThreadPool and other performance optimizations present in .NET 4.
- Unlike in .NET 4, the OperationCanceledException type does not have a constructor that accepts a CancellationToken, nor does the exception have a CancellationToken property. In order to throw an exception to cooperatively acknowledge cancellation within a Task, you may use the ThrowIfCancellationRequested method present on the CancellationToken type. Refer to MSDN documentation for more details.
- To handle exceptions thrown due to cancellation, users should catch the OperationCanceledException. Although the exception thrown is of type OperationCancelledException2, users will not be able to catch this type as it is an internal exception derived from OperationCanceledException (necessary as part of the back port to maintain consistency).
- The Parallel Task and Parallel Stack tool windows in Visual Studio 2010 will not highlight tasks from the .NET 3.5 back port. These features require .NET 4.
- The concurrency visualizations in the Visual Studio 2010 profiler will not show parallel region markers for parallel loops in the .NET 3.5 back port. This is due to the .NET 3.5 version not outputting any of the ETW events output by the .NET 4 release.
Even with these limitations, we hope you enjoy the release! You can also expect further integration between Rx and Parallel Extensions in future drops on the DevLabs site.
The new parallel debugger windows in Visual Studio 2010 (Parallel Tasks and Parallel Stacks) have had many fixes and updates.
I have refreshed the existing content and also added new material for Beta 2. Find links to all of it from my blog post on Parallel Debugging.
Cheers
Daniel
The Microsoft Biology Initiative (MBI) is a combined project in Microsoft Research focused on two components: the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT). Both the framework and the tools are related to the areas of computation biology, genomics, and bioinformatics. MBF is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework. Currently it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological Web services such as NCBI BLAST.
Microsoft Research has released Beta 1 of MBF, including a version that runs on .NET 4 Beta 2 and that’s using Parallel Extensions to provide impressive speedups on some new algorithms that have been added in the release. From the release notes:
“Support for .NET Parallel Extensions. In addition to the complete Beta 1 version of the project being released, we are also providing the Beta 1 - Dev10 Preview release. This version of the code and binaries are utilizing the latest in technology improvements in the .NET 4.0 and Visual Studio 2010 Beta 2 release. The inclusion of a novel multiple sequence alignment algorithm, PAMSAM, provided as an example of how this technology can be used to turn a commodity desktop computer into a valuable research asset.”
If you’re interested in bioinformatics, definitely check this out.
PLINQ enables developers to scale up computations in order to leverage the multiple cores available in modern hardware. For many problem domains, this is quite useful and sufficient. What happens, however, when a workload being processed is so big that even a manycore machine is insufficient to adequately handle the load? This can be the case with massively complex calculations. Moreover, in the age of “big data,” more and more we as an industry are encountering problems as we attempt to analyze and churn through data sets measured in terabytes and even petabytes.
In such situations, one solution is to scale out, distributing the computation amongst multiple computers aggregated into a computing cluster. That’s the domain of HPC, High Performance Computing, for which Microsoft provides valuable support through Windows HPC Server.
For those of you attending PDC09 or SC09 next week, we’re excited that we’ll be showcasing DryadLINQ, a project from Microsoft Research that runs LINQ queries on an HPC Server cluster. Instead of just using an IEnumerable, DryadLINQ provides a PartitionedTable entity that represents data partitioned across the nodes of the cluster. You can create a PartitionedTable from any IEnumerable, or you can preload the data onto the cluster manually. When you write a LINQ expression on a PartitionedTable, that expression is parsed and shipped to the cluster, which executes the expression in a distributed fashion. The Dryad execution infrastructure running on HPC Server and targeted by DryadLINQ handles the communication amongst the cluster nodes, deals with data partitioning, ensures proper aggregation, and so forth. The cluster nodes themselves run PLINQ in order to fully utilize as many cores as are available in the machine in order to achieve maximum speedup locally. Results can be streamed back into your program, or you can have them saved back into another partitioned table on the cluster to be reused by a subsequent query or manually at a later time.
Here’s an example of using DryadLINQ to implement the core of the PageRank algorithm:
public static IQueryable<Rank> Step(
IQueryable<Page> pages, IQueryable<Rank> ranks)
{
// Join pages with ranks, and disperse updates.
var updates = from page in pages
join rank in ranks on page.name equals rank.name
select page.Disperse(rank);
// Re-accumulate.
return from list in updates
from rank in list
group rank.rank by rank.name into g
select new Rank(g.Key, g.Sum());
}
…
var pages = PartitionedTable.Get<Page>(“pages.pt”);
var ranks = pages.Select(page => new Rank(page.name, 1.0));
ranks = Step(pages, ranks);
ranks.ToPartitionedTable<Rank>(“ranks.pt”);
As you can see, the actual algorithm is just a standard LINQ query and could run as well on a single machine (also in parallel if you used PLINQ’s AsParallel operator). However, as the data source is a PartitionedTable loaded from the cluster, this query will run now in a distributed fashion utilizing the remote environment, and all with very little additional effort.
If you’ll be at PDC09, we encourage you to go to John Vert’s presentation on DryadLINQ (currently scheduled in room 515A on Tuesday at 3:00 PM). If you’ll be attending SC09, come visit us at the Microsoft Exhibitor Booth for live demos, more details, and conversations about scenarios. Dryad and DryadLINQ are currently research projects, and we’d love any feedback you may have.
See you all soon!
I've recently written a detailed paper, "Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4", on implementing a variety of common parallel patterns with the .NET Framework 4. The paper is now live in the Microsoft Download Center, and you can grab it from http://www.microsoft.com/downloads/details.aspx?FamilyID=86b3d32b-ad26-4bb8-a3ae-c1637026c3ee.
Any and all feedback is welcome. I very much hope you enjoy it and find it useful.
Thanks!
Stephen
Rejoice! Visual Studio 2010 Beta 2 is upon us and it includes lots of great changes for the Parallel Extensions. TPL guru Danny Shih has already covered what’s new in TPL and PLINQ aficionado Ed Essey has covered the best way to use LINQ (specifically between Beta 1 and Beta 2) but your favorite set of thread-safe collections and synchronization primitives are in on the action too.
So what’s changed? Overall, we’ve been fixing bugs, of course, but we’ve also spent a lot of time fine-tuning these types to get the best performance possible. That said, performance varies greatly from architecture to architecture and from scenario to scenario, so make sure you stick on your detective hat, fire up the Concurrency Visualizer and go sleuthing around your app’s performance profile to ensure that you’re using the right types in the right way. We’re always happy to see perf results and field perf-related questions on our forum.
I’m not going to talk about specific bugs or perf improvements here but let’s jump into functionality that’s been added or changed:
Barrier
In Beta 1, Barrier used to track phase numbers using a 32-bit integer. While most scenarios for Barrier couldn’t conceive of exhausting a range of 4+ billion numbers, for long running services, it’s not infeasible. That’s why we’ve decided to change Barrier to track phases using a 64-bit integer (so check for any code that called AddParticipant{s} or CurrentPhaseNumber). We encourage you to take comfort in the fact that your tasks and threads can joyously join together in cyclic fashion for an eternity*.
Also, in previous releases, if you decided to give Barrier a Post Phase Action to execute when all threads completed a phase, any exceptions thrown from that action would be thrown unwrapped and only on the thread that actually ended up executing the action. We’ve since realized that that’s a little silly. If your Post Phase Action faulted, more than likely, your algorithm is corrupt but only one thread would know about it. In Beta 2, all of your threads participating in a Barrier are now privy to these kinds of failures: any exceptions thrown from your action are wrapped in a BarrierPostPhaseException and thrown on all the threads that participated in that phase.
*Not really an eternity. A really really long time though.
BlockingCollection<T>
Good ol’ blocking collection. It’s giving producer threads and consumer threads everywhere a common ground upon which to communicate and brimming with thread-safety at every API. What’s that you say? In Beta 1 BlockingCollection<T>.Add() and BlockingCollection<T>.CompleteAdding() were not thread-safe with respect to each other? Using them concurrently might’ve corrupted the data structure so in Beta 1 you couldn’t do something like a nice parallel graph-traversal?
var targetNode = …;
var bc = new BlockingCollection<Node>(startingNodes);
// since we expect GetConsumingEnumerable to block, limit parallelism to the number of
// procs, avoiding too much thread injection
var parOpts = new ParallelOptions() { MaxDegreeOfParallelism = Enivronment.ProcessorCount };
Parallel.ForEach(bc.GetConsumingEnumerable(), parOpts, (node,loop) =>
{
if (node == targetNode)
{
Console.WriteLine(“hooray!”);
bc.CompleteAdding();
loop.Stop();
}
else
{
foreach(var neighbor in node.Neighbors) bc.Add(neighbor);
}
});
Well worry no longer. Our team of concurrency-geniuses has crammed even more thread-safety into BlockingCollection<T> for your graph-traversing pleasure by making these two methods thread-safe. (Note that Add() will now throw an exception if called after CompleteAdding() has been called so with Beta 2, you’ll need to put a catch block around the call to Add in this example.)
ConcurrentDictionary<TKey,TValue>
Since Beta 1 has released, we’ve gotten a lot of feedback, both internal and external, about ConcurrentDictionary<TKey,TValue>. CD is slightly more special than the other concurrent collections added in .NET 4: unlike the other collections, it provides direct access to all of it’s elements. This, of course, supports a whole different set of scenarios. While ConcurrentStack<T>, ConcurrentQueue<T>, and ConcurrentBag<T> are really about processing arbitrary elements, ConcurrentDictionary<TKey,TValue> is about processing elements associated with known keys. With the former, threads will only race on access to the end elements of the collection. With the latter, access to every element may result in a race. That said, in Beta 1, it was very difficult to do any multi-step operations with the elements in a ConcurrentDictionary<TKey,TValue>. Suppose you just wanted to create a simple thread-safe cache. You were forced to check if the value existed and add it if it did not. Of course, in the time between checking for the item and adding it, an element with the same key could have been added, forcing you to check for such a condition and trying again.
Enter two new atomic multi-step operations* on CD: GetOrAdd and AddOrUpdate.
GetOrAdd allows you to specify a key as well as a value (or delegate to produce the value). The method will atomically check for the existence of a given key and, if the key already exists, return the existing value. If the key does not exist, this method will add the new value you’ve specified (with or without the delegate). For example:
private ConcurrentDictionary<string, Data> _cache =
new ConcurrentDictionary<string,Data>();
// on multiple threads
public Data ProcessRequest(string input)
{
return _cache.GetOrAdd(input, (key) =>
{
...;
return ...;
}
}
AddOrUpdate enables you to either add a new element at a given key or update an existing one if that key already exists, great for scenarios that need thread-safe lookup-tables like counting the number of occurrences of a given string in parallel:
private ConcurrentDictionary<string,int> _counts =
new ConcurrentDictionary<string,int>();
// on multiple threads
public int CountRequest(string input)
{
return _counts.AddOrUpdate(input, 1, (key,old) => old + 1);
}
With AddOrUpdate, you specify the key that you’re interested in, the value to add if the key is not present (or a delegate to create that value), and, finally, a delegate that will generate the new value in the case that the key is already present. This delegate passes along the key and existing value key so your delegate can use it to determine the new value.
Nifty!
*Atomic with regards to other mutating methods on the collection (e.g. TryAdd/TryUpdate/TryRemove/etc.), excluding the execution of the user-provided delegate. This means that if multiple threads race to call GetOrAdd all of their delegates may be called and some delegates may be called multiple times. Users are expected to rollback any unwanted side-effects.
ConcurrentLinkedList<T> and ConcurrentLinkedListNode<T>
In each and every software professional’s career there comes a point where he or she might have to swallow their pride and let a creation that they love go. For some reason or another, their fancy invention just ultimately, doesn’t provide enough value to justify its existence. Now I know we went and got you all excited about ConcurrentLinkedList<T> in Beta 1 but we had to let it go (we did warn you though!). Unfortunately, with the time we had available we just couldn’t get it to be usable and perform well. There seem to be many thread-safe linked list implementations that are very scalable but usually that scalability is based on some assumption or odd caveat in the design that severely degrades the practicality of the type. It hurts us deeply to take CLL<T> out but its performance just wasn’t good enough to ship it. No need to send any flowers.
Lazy<T>, LazyInitializer, and ThreadLocal<T>
Oh boy has Lazy<T> been fun! In Beta 1, we told you that we renamed LazyInit<T> to Lazy<T>, added a new struct-based version called LazyVariable<T> and moved the thread-local mode over to its own type (ThreadLocal<T>). Well, in the interest of making these primitives as usable as possible, we’ve gone and done it again! Here’s a breakdown of the lazy-initialization primitives in Beta 2:
- Lazy<T> is still a class-based lazy initialization primitive that’s safe to pass around. We’ve given it it a simple constructor that accepts a bool to indicate whether it needs to be thread-safe or not.*
Also, we now detect recursive calls to Value in the valueFactory delegate. If your valueFactory delegate attempts to access the Value property on the Lazy<T> to which it was passed, an InvalidOperationException will be thrown.
Furthermore, we firmed up our stance on exception handling. In the interest of keeping the type observationally pure, if a valueFactory delegate throws an exception, that exception will be cached and subsequent calls to Value will result in that same exception being rethrown.
- LazyExecutionMode has been removed now that we have only two simple initialization modes on Lazy<T>*
- LazyVariable<T> has also been removed (we said it would!). There were just too many issues with having a struct-based version. If you need lighter-weight lazy initialization, use LazyInitializer. If you need the lighter-weight and an object to pass around, it’s simple enough to create your own struct that uses LazyInitializer’s methods internally.
- LazyInitializer still contains all the advanced static methods for lazy initialization patterns.
- ThreadLocal<T> remains unchanged other than some great perf improvements!
*Though in Beta 2 we reduced the number of execution modes to simply thread-safe and not-thread-safe, customer requests have proven that we still need the three. By RTM, there will be an additive constructor to support advanced developers that need tighter control over how thread-safety is achieved. Mostly, this is to enable library writers to use third-party initialization delegates without having to worry if they take locks and will potentially cause a deadlock. Regardless, feel free to use the constructors provided today as any changes are purely additive.
Serialization
In Beta 1, ConcurrentBag<T>, ConcurrentDictionary<TKey,TValue>, ConcurrentQueue<T>, ConcurrentStack<T>, and Lazy<T> all implemented ISerializable. Turns out we were a little behind the times. ISerializable is not version tolerant. We’ve remedied the situation and all of these types now use VTS for serialization, with little to no impact on you!
Have fun with Beta 2 and drop us a line. We’d love any feedback you have on the changes!
Josh and the coordination data structures crew
Included in the .NET 4 Framework Beta 2 is a more robust and faster version of PLINQ. Between B1 and B2, PLINQ changes have mainly been under the covers, so hopefully no need to rewrite any of your applications to see the improvements.
1. Many improvements to performance and scalability
2. GroupBy and GroupJoin now preserve ordering within the groups
3. Better integration with performance profiling, which now shows PLINQ markers
4. Cancellation consistently works across all PLINQ operators
5. Scoped some of the boundary conditions of PLINQ in .NET 4 in Beta2 of Parallel Extensions as follows:
a. Max Degree of Parallelism changed from 64 to 63
b. Greater consistency regarding maximum input length
i. PLINQ does not support inputs of length > Int32.MaxValue. Queries may throw OverflowException for many operators.
ii. LongCount no longer counts beyond Int32.MaxValue
6. Improved robustness
a. Reliability and concurrency focus
b. More consistent exception handling and error model
7. Other bug fixes
If any of these severely impact your applications, particularly the boundary conditions, please let us know so that we can include that in our planning efforts. For now, we found that it makes the most sense to scope some inputs and give a reliable and predictable experience in this release.
Happy Coding!
Related posts:
Last time, we covered Tasks being detached by default and some refactorings in our multiple-Task continuation APIs. The final post of this series will discuss Nested Tasks and Unwrap, a Parallel namespace change, and some changes under the covers.
Nested Tasks and Unwrap
We’ve added the Unwrap APIs to address scenarios that deal with nested Tasks. Before jumping into Unwrap, let’s first talk about nested Tasks, e.g. a Task<Task> or Task<Task<TResult>>. For example:
var nestedTask = Task.Factory.StartNew(() =>
{
return Task.Factory.StartNew(() =>
{
return 42;
});
});
Nested Tasks commonly lead to unexpected behavior in applications. For example, consider an API that provides the following functionality for asynchronously logging into a web service (like one from Facebook), retrieving a list of friends, and sending an email to each friend.
// Given a user name and password, returns a Task whose
// result is a UserToken object.
public Task<UserToken> LogOn(string username, string password);
// Given a UserToken, returns a Task whose result
// is a collection of Friend objects.
public Task<FriendCollection> GetFriendList(UserToken userToken);
// Given a Friend, subject, and body, returns a Task that
// represents an email sending operation.
public Task SendEmail(Friend friend, string subject, string body);
A user would like to be able to write code like the following, utilizing these APIs and Task continuations:
var userToken = LogOn(user, pass);
var friends = userToken.ContinueWith(_ => GetFriendList(userToken.Result));
var emails = friends.ContinueWith(_ =>
{
var sentMails = new List<Task>();
foreach(var friend in friends.Result)
{
sentMails.Add(SendEmail(friend, subject, body));
}
return Task.Factory.ContinueWhenAll(
sentMails.ToArray(), tasks => Task.WaitAll(tasks));
});
emails.ContinueWith(_ => Console.WriteLine(“All emails sent”));
The bolded code is problematic. Because the GetFriendList method returns a Task<FriendCollection>, the ‘friends’ variable is actually going to be a Task<Task<FriendCollection>>. This will cause a compiler error at the foreach loop, because ‘friends.Result’ will return a Task<FriendCollection> instead of a FriendCollection. The compiler error in this case is a good thing, of course, highlighting a programming error. However, once a developer realizes the type mismatch, he still has to code additional logic to unwrap the ‘friends’ Task so that it returns an actual FriendCollection. This logic is nontrivial, especially if it is to correctly deal with exceptions, cancellation, etc. The last line is also problematic. The emails variable here is actually of type Task<Task>. The outer Task will complete once the inner Task is returned from its body, even if the inner Task hasn’t completed. The net result of this is that “All emails sent” could be written out prior to all of the email tasks actually completing.
Now, you may have noticed that, in Beta 1, we provided special ContinueWith overloads to deal with this type of scenario. For example:
public Task<TResult> ContinueWith<TResult>(
Func<Task, Task<TResult>> continuationFunction);
The Func returns a Task<TResult>, so normally, ContinueWith would return a Task<Task<TResult>>. But this ContinueWith overload does some magic under the covers to return a Task<TResult>. There were a number of reasons why we didn’t like this approach, including:
· Too much magic. It’s hard to explain why one set of ContinueWith overloads is “just different”.
· Only works for ContinueWith. What if user scenarios result in nested Tasks for other Task creation APIs like StartNew, ContinueWhenAll, etc.?
· What if nested Tasks are actually desired? If a user actually wants that Task<Task<TResult>>, he still might unknowingly bind to this magical overload.
Given these reasons, our solution for Beta 2 and beyond is two Unwrap extension methods.
namespace System.Threading.Tasks
{
public static class TaskExtensions
{
public static Task Unwrap(
this Task<Task> task);
public static Task<TResult> Unwrap<TResult>(
this Task<Task<TResult>> task);
}
}
These methods may be used to transform any Task<Task> or Task<Task<TResult>> into a Task or Task<TResult>, respectively. The transformation performed produces a Task or Task<TResult> that fully represents the original nested Task, including exceptions, cancellation state, etc.
With Unwrap, we can fix the above scenario (note the bolded).
var userToken = LogOn(user, pass);
var friends = userToken.ContinueWith(
_ => GetFriendList(userToken.Result)).Unwrap();
var emails = friends.ContinueWith(_ =>
{
var sentMails = new List<Task>();
foreach(var friend in friends.Result)
{
sentMails.Add(SendEmail(friend, subject, body));
}
return Task.Factory.ContinueWhenAll(
sentMails.ToArray(), tasks => Task.WaitAll(tasks));
}).Unwrap();
emails.ContinueWith(_ => Console.WriteLine(“All emails sent”));
Parallel Namespace Change
We’ve moved the Parallel class from the System.Threading namespace to the System.Threading.Tasks namespace. We found that most applications needed to bring in both namespaces when using TPL, so why not put everything into one namespace? Additionally, Parallel is such a common word (and will likely become more so in the future), and System.Threading such a common namespace, we wanted to reduce the chances of conflict with other .NET types as much as possible.
Here’s a useful IDE tip. Use “Ctrl + .” to automatically bring in the relevant namespace once you’ve typed a class/type name.

Under the Covers
Beta 2 contains quite a few bug fixes not done for Beta 1. All of them were important, but we’ll focus on only two here.
First, Parallel.For and ForEach have been tweaked for better load-balancing with other workloads in the current or other AppDomains. Essentially, the change was to service the parallel loops with Tasks that periodically retired and re-queued themselves, allowing other contenders to grab Threads and make progress.
Second, waiting for Tasks in parent/child relationships has become more efficient. In Beta 1 and before, parent Tasks waited for their children using explicit Waits, so even if a parent completed first, it would burn a thread until all of its children completed. In Beta 2, parent Tasks wait for their children using callbacks; the parent maintains a count for number of children it has, and each child decrements the count as it completes. This waiting logic significantly improves scalability.
That’s it folks! We hope you’ve enjoyed this series. Download Beta 2 and try it out!
Are you attending PDC09? Want to meet with members of the Parallel Computing Platform team? See
http://blogs.msdn.com/visualizeparallel/archive/2009/11/02/the-parallel-computing-platform-team-pcd-09.aspx for more details.
We've posted a whole bunch of samples on Code Gallery showcasing how to use the new parallelism support in the .NET Framework 4. You can find them at http://code.msdn.microsoft.com/ParExtSamples. Enjoy!
Last week, I had the privilege of touring around Tennessee, Kentucky, Ohio, and Michigan, speaking about the new parallel computing support in Visual Studio 2010 and the .NET Framework 4. Many folks I spoke with were interested in getting a copy of the slide deck I used, so I’ve attached it to this blog post. Enjoy!
The goal of PLINQ is to execute computationally intensive LINQ to Objects queries efficiently by splitting up the work across multiple cores on multi-core machines. However, not all queries are equally appropriate for parallelism.
Usually, the best way to use PLINQ is to write short, simple queries with an expensive delegate. This is one example of such query:
var q = src.AsParallel()
.Where(x => ExpensiveFilter(x));
foreach(var x in q) { ... }
And here is another:
int sum = src.AsParallel()
.Select(x => ExpensiveFunc(x))
.Sum();
One design goal behind PLINQ is maximum parity with LINQ to Objects. So, you can combine LINQ operators in all kinds of ways, and PLINQ will correctly execute the query. However, performance characteristics get trickier as queries get more complex.
In some cases, PLINQ may decide to run the query in its entirety or in part sequentially. For example, this query will execute sequentially up to and including the TakeWhile operator:
src.AsParallel()
.Select(x => Foo(x))
.TakeWhile(x => Filter(x))
.Select(x => Bar(x))
.ToArray();
The TakeWhile operator introduces a tricky sequential dependency – whether an element is or isn’t included in the output depends on the result of Filter() on all previous elements in the sequence. There are various ways to execute this query partly in parallel that take different trade-offs. Depending on the selectivity of Filter and the costs of Foo, Bar and Filter, there are different algorithms which may be appropriate.
Whether a particular query executes in part sequentially depends on the combination of operators in the query. The precise rules are fairly complex, but they can be summarized in a simple way. If a query contains one of these operators, PLINQ may decide to run it sequentially:
· First, Last
· Take, Skip
· TakeWhile, SkipWhile
· Concat
· Special overloads of Select, Where and SelectMany that pass position indices into the user delegate
· ElementAt
· Zip
For example, if the Take operator follows a Where operator, PLINQ will execute the Where and the Take sequentially by default. Filtering followed by a Take is a tricky operation to parallelize – different algorithms are appropriate depending on the selectivity of the filter, the size of the argument passed to Take, and other details. However, if the Take is applied straight to an array (or an array followed by a Select), PLINQ can handle the Take operator efficiently simply by restricting execution to a section of the array.
Also, in some queries, SelectMany causes the part of the query that comes before the SelectMany to run sequentially. I haven’t seen a practical example where this would be an issue, though. SelectMany produces multiple output elements for each input, and the part of the query before the SelectMany is normally not the computationally expensive part.
You can prevent PLINQ from falling back to sequential execution by turning on the ForceParallelism mode. In the ForceParallelism mode, PLINQ will always use parallel algorithms to execute the query, even if potentially expensive algorithms will be used. This is desirable if the query contains an expensive delegate:
src.AsParallel()
.WithExecutionMode(
ParallelExecutionMode.ForceParallelism)
.Select(x => Foo(x))
.TakeWhile(x => Filter(x))
.Select(x => Bar(x))
.ToArray();
Alternatively, you can try breaking up the query so that only the simple but computationally expensive part of the query is done in PLINQ, and the rest of the processing is done in LINQ to Objects:
src.Select(x => Foo(x))
.TakeWhile(x => Filter(x))
.AsParallel() // <- only parallelize here
.Select(x => Bar(x))
.ToArray();
This implementation will scale well if the calls to Bar() represent the bulk of the work in the query. This solution is often preferable, as the part of the query that executes in parallel is clearly marked, and thus the code is easier to understand.
Related posts:
Last week, we talked about how TPL adopted a new, better cancellation model. Today, we’ll cover a change that makes Tasks Detached by Default, some ContinueWhenAll/Any Refactoring, and the handy UnobservedTaskException event.
Tasks are Detached by Default
In Beta 2, we have changed an important default. Tasks are now created as detached (instead of attached) if no options specify otherwise.
Let’s consider the following code to review the difference between attached and detached Tasks.
Task p = Task.Factory.StartNew(() =>
{
Task c = Task.Factory.StartNew(() =>
{
DoWork();
});
});
p.Wait();
In Beta 1 and before, since the default options are used, ‘c’ is created as a child Task of Task ‘p’, the parent Task; we refer to this as Task ‘c’ being “attached” to Task ‘p’. This means that the p.Wait() statement will not return until the call to DoWork completes, because parent Tasks do not complete until all of their child Tasks complete. To opt out of this behavior, a user needs to create ‘c’ with the DetachedFromParent option:
Task c = Task.Factory.StartNew(() =>
{
DoWork();
}, TaskCreationOptions.DetachedFromParent);
The original code shown behaves differently in Beta 2. Now, by default, ‘c’ is not related to ‘p’ (it’s “detached” by default), and the p.Wait() statement will return as soon as ‘p’ completes, regardless of the status of Task ‘c’ and thus regardless of when DoWork returns. To opt in to the parent/child relationship, a user needs to create ‘c’ with the AttachedToParent option:
Task c = Task.Factory.StartNew(() =>
{
DoWork();
}, TaskCreationOptions.AttachedToParent);
Here is a summary of the changes:
· Removed the DetachedFromParent option
· Added the AttachedToParent option
· Changed the default behavior so that Tasks do not enlist in parent/child relationships when no options are specified.
There were a number of reasons why we decided that detached is the correct default and to move forward with this change, including:
· Many users were using attached Tasks unknowingly. The vast majority of the time, users create Tasks for simple, asynchronous work. In such scenarios, parent/child relationships (and the implicit waiting) are not needed. We found through many interactions that folks were just going with the default options and were accidentally opting in to this behavior. In the best case, this would only result in a slight performance cost. In the worst case, this would bring with it incorrect behavior that would lead to difficult to diagnose errors.
· Easier migration from ThreadPool.QueueUserWorkItem. Tasks are now the recommended way to queue work to the ThreadPool, but the easiest way to create Tasks resulted in different behavior from QueueUserWorkItem (where there’s no concept of parent/child work items). This change makes Task.Factory.StartNew (with no options) a true replacement for QueueUserWorkItem.
· Additional behavior should be opt-in and pay-for-play. Almost everything in TPL that results in additional behavior is opt-in, e.g. cancellation, LongRunning, PreferFairness. With the Beta 1 default, users opt-out of parent/child relationships. In Beta 2, users opt-in, making it consistent. This makes the extra functionality provided by parent/child relationships pay-for-play, such that you don’t pay the cost for parents implicitly waiting for their children or for exceptions propagating from children to parents unless you need that functionality.
ContinueWhenAny/All Refactoring
We have refactored the set of ContinueWhenAny and ContinueWhenAll overloads to make things more intuitive, consistent, and complete.
To demonstrate the main issue, let’s consider the following overload that was provided in Beta 1.
public class TaskFactory<TResult>
{
public Task<TNewResult> ContinueWhenAny(
Task<TResult>[] tasks,
Func<Task<TResult>, TNewResult> continuationFunction);
}
This confused the meaning of TaskFactory<TResult>, which is meant to create tasks of type Task<TResult>. However, with these overloads, TaskFactory<TResult> could be used to create tasks of type Task<TNewResult>. As an example, consider the code:
Task<int>[] taskOfInts = ...;
Task<string> t = Task<int>.Factory.ContinueWhenAll(taskOfInts, _ => “”);
This compiles and works just fine, but the type parameter mismatch (shown in bold) is certainly odd. To address this, we changed a bunch of overloads, so that instead of taking Task<TResult>s and returning a Task<TNewResult>, they take Task<TAntecedentResult>s and return Task<TResult>s. For example, the overload that replaced the above is:
public Task<TResult> ContinueWhenAny<TAntecedentResult>(
Task<TAntecedentResult>[] tasks,
Func<Task<TAntecedentResult>, TResult> continuationFunction);
And the above example becomes:
Task<int>[] taskOfInts = ...;
Task<string> t = Task<string>.Factory.ContinueWhenAll(taskOfInts, _ => “”);
In addition to this change, we also added, removed, or modified a number of other overloads to make the set consistent and complete. Now, the entire set of ContinueWhenAll and ContinueWhenAny overloads follow these clear rules:
· A TaskFactory creates Tasks, but also provides overloads to create Task<TResult>s.
· A TaskFactory<TResult> only ever creates Task<TResult>s (never Tasks or Task<TNewResult>s).
UnobservedTaskException event
We’ve added an event that fires for every Task exception that goes unobserved. Recall that to “observe” a Task’s exceptions, you must either Wait on the Task or access its Exception property after it has completed. At least one of these actions must be done before the Task object is garbage collected, or its exceptions will propagate (currently this occurs on the finalizer thread).
The new static event resides on the TaskScheduler class, and subscribing to it is straightforward. Here’s an example to log all unobserved exceptions and mark them as observed (preventing them from being propagated).
TaskScheduler.UnobservedTaskException +=
(object sender, UnobservedTaskExceptionEventArgs exceptionArgs) =>
{
exceptionArgs.SetObserved();
LogException(exceptionArgs.Exception);
};
Some customers have complained that TPL’s exception policy is too strict. The UnobservedTaskException event provides an easy way out by allowing you to simply squash all Task exceptions in an application (though using it in this manner is not recommended). The primary reason that we made the addition was to support host-plugin scenarios where a host application can still be perfectly useful in the presence of some truly harmless exceptions (thrown by buggy plugins). These scenarios may be achieved using the UnobservedTaskException event in conjunction with AppDomains to sandbox plugins. Look for a future post that describes this in more detail!
We’re done for now! The 3rd and final post of this series will cover the new Unwrap APIs, a Parallel namespace change, and some changes under the covers.
If you’re going to PDC this year, we have four great talks on parallelism coming you’re way and, if you’re not, may we suggest you sign up?
We don’t have the exact dates of the talks yet (we’ll let you know when we do) but here are the talks you won’t want to miss!
Patterns of Parallel Programming: A Tutorial on Fundamental Patterns and Practices for Parallelism
(by Richard Ciapala, Ade Miller, Herb Sutter, and our very own Stephen Toub)
A workshop for experienced developers who are relatively new to parallel computing. Learn how established software patterns can help you build on Microsoft’s Parallel Computing Platform (including deep dives into TPL and PLINQ).
Manycore and the Microsoft .NET Framework 4: A Match Made in Microsoft Visual Studio 2010
(by Stephen Toub)
A deep dive into the System.Threading.Tasks and System.Collections.Concurrent namespaces, cutting-edge concurrency views in the Visual Studio profiler, and debugger tool windows for analyzing the state of concurrent code.
PLINQ: LINQ, but Faster!
(by Igor Ostrovsky and Ed Essey)
Our very own Igor and Ed dive deep into PLINQ via Visual Studio 2010. See what it looks like from the perspective of LINQ developers, the debugging and profiling support, how it's implemented under the covers, and how to best incorporate it into your applications in order to reap the performance benefits of the manycore era.
The State of Parallel Programming
(by supercomputing luminary Burton Smith)
A “relatively recent consensus view about what is needed for productive parallel programming, and why.”
F# for Parallel and Asynchronous Programming
(by Luke Hoban)
Luke will take you through the core concepts of the F# language and show you how ideas like immutability, functional design, async workflows, agents, and more can be used to meet the challenges of today’s real-world applications.
C++ Forever: Interactive Applications in the Age of Manycore
(by Rick Molloy)
Come for a deep dive into the power of actor-based and dataflow programming in Microsoft Visual C++ 2010.
Lighting up Windows Server 2008 R2 Using the ConcRT on UMS
(by Dana Groff)
See examples of how to use C++ and the new Concurrency Runtime (ConcRT) to take advantage of new technologies on Windows Server 2008 R2, such as the ability to scale beyond 64 cores and User-Mode Scheduling (UMS) of threads
Have fun!
Josh Phillips | Program Manager | Parallel Computing Platform |Microsoft