PLINQ is a very cool technology, and I believe it will prove useful for parallelizing operations in a wide range of important scenarios. Moreover, I believe that the programming model it provides will enable a wide range of developers to easily take advantage of concurrency in their applications. However, one of the risks of providing such a simple programming model is that we don't currently shield developers from all of the issues they may run into when using it to parallelize code.
As an example, there is a set of 101 LINQ samples available at http://msdn2.microsoft.com/aa336746.aspx. Unfortunately, many of these samples rely on implementation details and behaviors that don't necessarily hold when moving to a parallel model like the one employed by PLINQ. In fact, some of them are dangerous when it comes to PLINQ.
On our team, we've been referring to these kinds of dangers as "parallelism blockers," and they're one of the primary reasons we're not planning to automatically replace all LINQ-to-Objects queries with PLINQ queries; instead, we support the opt-in model available through the AsParallel extension method. We've talked about some of these issues before (the CTP documentation takes a more thorough look at them), but as a refresher, consider a few examples:
The 101 LINQ samples are not good exemplars for PLINQ. Consider the Query Execution category. These samples contain code like:
int i = 0;
var q = from n in numbers select ++i;
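NOOOOOOO!!!!!!!!!!!!!! With PLINQ, those increments to i will potentially happen in parallel, and ++ is not atomic: it is a read, an add, and a write, so two threads can read the same value and one of the updates gets lost. You could fix this by using Interlocked.Increment rather than ++, but then every thread would contend on the same variable, and you'd likely destroy your performance.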
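To make the trade-off concrete, here is a minimal sketch of the two ways out. It assumes the PLINQ API shape that shipped in .NET 4 and later (AsParallel, AsOrdered, and the indexed Select overload), which may differ from the CTP; the class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical helper class for illustration; not part of the 101 samples.
static class CounterSketch
{
    // Keeps the shared counter but makes each increment atomic.
    // Correct, but every worker thread contends on the same variable.
    public static int[] WithInterlocked(int[] numbers)
    {
        int i = 0;
        return numbers.AsParallel()
                      .Select(n => Interlocked.Increment(ref i))
                      .ToArray();
    }

    // Removes the shared state entirely: each result is derived from the
    // element's position in the (ordered) source rather than from a counter.
    public static int[] WithIndex(int[] numbers)
    {
        return numbers.AsParallel()
                      .AsOrdered()
                      .Select((n, index) => index + 1)
                      .ToArray();
    }

    public static void Main()
    {
        int[] numbers = Enumerable.Range(0, 100000).ToArray();

        // Every value from 1 to N shows up exactly once, in no particular order.
        int[] atomic = WithInterlocked(numbers);
        Console.WriteLine(atomic.Distinct().Count() == numbers.Length);

        // The values 1 to N come back in source order.
        int[] indexed = WithIndex(numbers);
        Console.WriteLine(indexed.SequenceEqual(Enumerable.Range(1, numbers.Length)));
    }
}
```

The second variant is usually the better direction: when a query doesn't mutate anything outside itself, there is nothing to synchronize in the first place.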
Luckily, only a few of the 101 samples modify state. But that doesn't mean switching over to PLINQ will produce the same results as LINQ for all of the others. Almost all of the 101 samples have the potential to produce different output orderings than the LINQ versions, because order is not preserved by default (at least in the CTP; we're reconsidering whether that is a good default, so if you have any opinion on the issue, please let us know). For most of the samples, this doesn’t matter semantically (except to a hypothetical tool that does a straight output-to-output comparison between LINQ-to-Objects and PLINQ runs); for example, the first sample lists all of the numbers from the array { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 } that are less than 5, and the second sample lists all of the sold-out products from a dynamically generated list of products.
However, there are some samples where (depending on the intended usage) ordering may matter. For example, the eighth sample generates the name of each number in the array (e.g. 5 to “five”); the output ordering may not match the input array in this case, and thus while you’re successfully generating all of the relevant outputs, the correlation between input and output is lost (you’d either need to update the query to include the original position in the results, or you’d need to enable order preservation). Other sample queries have this same issue.
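Both fixes mentioned for the number-name sample can be sketched roughly as follows. This assumes the AsOrdered operator as it exists in .NET 4 and later (the CTP's way of enabling order preservation may differ); the Names array and the class and method names are assumptions made for this illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper class for illustration.
static class OrderingSketch
{
    // Assumed lookup table, indexed by digit.
    static readonly string[] Names =
        { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

    // Option 1: enable order preservation so output i corresponds to input i.
    public static string[] NamesInOrder(int[] numbers) =>
        numbers.AsParallel()
               .AsOrdered()
               .Select(n => Names[n])
               .ToArray();

    // Option 2: leave the query unordered, but carry the original position
    // in each result so the input/output correlation survives any reordering.
    public static (int Position, string Name)[] NamesWithPosition(int[] numbers) =>
        Enumerable.Range(0, numbers.Length)
                  .AsParallel()
                  .Select(pos => (Position: pos, Name: Names[numbers[pos]]))
                  .ToArray();

    public static void Main()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

        Console.WriteLine(string.Join(", ", NamesInOrder(numbers)));

        // Results may arrive in any order; sorting by position restores it.
        foreach (var r in NamesWithPosition(numbers).OrderBy(r => r.Position))
            Console.WriteLine($"{r.Position}: {r.Name}");
    }
}
```

Order preservation generally costs something, so the second option can be the cheaper one when the consumer only needs to know which input produced which output.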
Additionally, almost all of the samples deal with very small data sets and do very little work on each item. It's likely that PLINQ's overhead will dominate in many of these cases, producing slower runs than if LINQ-to-Objects were used. We're spending a lot of time now on PLINQ performance, and we're hoping to see that overhead decrease as much as possible, but even with all of the optimizations we plan to throw at it, there will likely remain some queries for which non-parallel LINQ-to-Objects is the better choice. That's one place where we need your help. If you have queries that you believe should be executing much faster in parallel than they are, please send them our way; we'd love to analyze them to figure out where the bottlenecks may be.
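If you want to check whether a particular query is one of those cases, a rough comparison along these lines is a reasonable starting point. It is only a sketch: the class and the Time helper are hypothetical, a single Stopwatch run includes JIT and thread start-up costs, and the numbers will vary by machine, so treat any single measurement with suspicion.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper class for illustration.
static class OverheadSketch
{
    // Runs a query once and reports how long it took.
    public static TimeSpan Time(Func<int[]> query, out int[] result)
    {
        var sw = Stopwatch.StartNew();
        result = query();
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        // The tiny array from the first of the 101 samples.
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

        TimeSpan sequential = Time(
            () => numbers.Where(n => n < 5).ToArray(), out int[] seqResult);
        TimeSpan parallel = Time(
            () => numbers.AsParallel().Where(n => n < 5).ToArray(), out int[] parResult);

        Console.WriteLine($"LINQ-to-Objects: {sequential.TotalMilliseconds} ms");
        Console.WriteLine($"PLINQ:           {parallel.TotalMilliseconds} ms");

        // Same elements either way, though PLINQ may return them in a different order.
        Console.WriteLine(seqResult.OrderBy(n => n).SequenceEqual(parResult.OrderBy(n => n)));
    }
}
```

For a query this small, the parallel version has to partition ten elements and merge the results, which is likely to cost more than the filtering itself.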
How do you want them passed on? I've got a portion of a library that I can tweak a bit to basically form the parse phase of a C# compiler... it has plenty of points of parallelization and should be able to handle large source code files. It's not something I'd prefer to post onto a forum right now, but it's an interesting test case because it's very recursive and very threaded (but also very incomplete for what it is ultimately intended for).