All about Async/Await, System.Threading.Tasks, System.Collections.Concurrent, System.Linq, and more…
Community Technology Preview (CTP) releases from Microsoft typically provide early looks at the technologies a team is working on. Frequently, CTP quality is nowhere near what folks might expect from Beta releases and the like, and that's ok. The idea is to give all of you in the community a look at what we're working on, giving you enough time to provide deep feedback on the direction the technology is taking, and giving the product team working on that technology enough time to react to the feedback so that it actually has an impact.
We're excited to have released a CTP of the Parallel Extensions to the .NET Framework today, and we're very interested in the feedback you have on the approach, on the APIs we're providing, and so forth. However, it's also important to keep in mind the quality level present in this release.
We've been working on PLINQ for quite a while now, and the CTP release contains relatively stable bits. They're certainly not reliable and robust enough to put into production (please don't!) nor is performance anywhere near what we expect it to ultimately be by the time the technology is released in final v1 form. However, except for a few known correctness bugs, the quality should be good enough for you to start trying out the APIs, seeing what sort of business value they'll provide in your applications, providing correctness bug reports or feature requests to us, sending us code samples representing cool things you've done or things that you expected to work but didn't, and so forth.
The Task Parallel Library has a much lower quality bar for this release. We've been referring to it internally as "API evaluation quality," meaning that while key scenarios will work with the bits, portions of the API we have in our specs have not been included in this release, there are several known and important correctness issues, and performance hasn't been a priority thus far (it absolutely will be going forwards!). We really need your feedback here. We know we have correctness bugs, so while those are important to know about, we're more interested in feedback on the design of the APIs. Will these APIs help you to introduce enough parallelism into your applications? Do they require any funky gyrations when solving problems that other APIs we could provide wouldn't require? With the Parallel class, are For/ForEach/Do the right methods to expose, or are there other common loops you'd like to see? We've already started revamping the APIs slightly here and there based on internal and customer feedback, but we've a ways to go, so definitely let us know what's on your mind.
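For readers who haven't tried the Parallel class yet, here's a minimal sketch of the For/ForEach shape. Note this uses the API as it eventually shipped in System.Threading.Tasks; the CTP surface differed slightly (for example, the CTP's Parallel.Do later became Parallel.Invoke):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int[] data = { 1, 2, 3, 4, 5 };
long sum = 0;

// Parallel.For runs the body for each index in [0, data.Length),
// potentially on multiple threads; Interlocked keeps the shared sum safe.
Parallel.For(0, data.Length, i => Interlocked.Add(ref sum, data[i]));
Console.WriteLine(sum); // 15

// Parallel.ForEach does the same over any IEnumerable<T>; this overload
// also hands the body the element's index, so each iteration can write
// to its own slot without locking.
int[] doubled = new int[data.Length];
Parallel.ForEach(data, (item, state, index) => doubled[index] = item * 2);
Console.WriteLine(string.Join(",", doubled)); // 2,4,6,8,10
```

The loop body is just a delegate, so existing sequential loop bodies can often be moved over with little change, which is exactly the kind of migration experience we'd like feedback on.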
(Note that several times we've described PLINQ as building on top of the Task Parallel Library. If you take a look inside System.Threading.dll, however, you'll probably notice that PLINQ is actually using the .NET ThreadPool rather than TPL's Task APIs. What's up with that? Given what I've just said about the quality differences between PLINQ and TPL, you can probably guess: because we had a higher quality bar for PLINQ in this release, we opted to have PLINQ use the ThreadPool for now. As we improve the correctness, performance, and reliability of TPL, expect to see this change in a future release.)
You can provide feedback to us through our Connect site. You can also see a list of known correctness issues.
Is it normal that using Parallel.For is SLOWER than a normal for loop?
I made a little test with matrix multiplication, and the parallelized part was more than two times slower.
That's the code I used:
static void Main(string[] args)
{
    double[,] res = new double[10, 10];
    Console.WriteLine("press a key");
    Console.ReadKey();
    for (int i = 0; i < 1000000; i++)
    {
        MatrixMult(10, getRandMat(10), getRandMat(10), res);
        ParMatrixMult(10, getRandMat(10), getRandMat(10), res);
    }
}

private static double[,] getRandMat(int p)
{
    double[,] res = new double[p, p];
    Random rand = new Random();
    for (int i = 0; i < p; i++)
        for (int j = 0; j < p; j++)
            res[i, j] = (int)(rand.NextDouble() * 1000);
    return res;
}

static void ParMatrixMult(int size, double[,] m1, double[,] m2, double[,] result)
{
    Parallel.For(0, size, delegate(int i)
    {
        for (int j = 0; j < size; j++)
        {
            result[i, j] = 0;
            for (int k = 0; k < size; k++)
                result[i, j] += m1[i, k] * m2[k, j];
        }
    });
}

static void MatrixMult(int size, double[,] m1, double[,] m2, double[,] result)
{
    for (int i = 0; i < size; i++) // Parallel.For(0, size, delegate(int i)
        for (int j = 0; j < size; j++)
        {
            result[i, j] = 0;
            for (int k = 0; k < size; k++)
                result[i, j] += m1[i, k] * m2[k, j];
        }
}
A 10×10 matrix is too small. In this case parallel matrix multiplication will give you no gain, but rather performance degradation, as you observed.
Instead of using Parallel.For inside the calculation algorithm, try moving it one scope higher, to where you iterate 1,000,000 times. Or try testing it on a bigger matrix.
Remember that parallel computing/programming is not a cure for everything; you have to know when to use it, and when to fall back to sequential algorithms.
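That restructuring can be sketched generically. SmallWork below is a hypothetical stand-in for one small, independent unit of work (like a single 10×10 multiply); the point is to parallelize the outer loop over many such units rather than the tiny loops inside each one:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for one small, independent unit of work
// (think: one 10x10 matrix multiply).
double SmallWork(int seed)
{
    double acc = 0;
    for (int k = 1; k <= 100; k++) acc += (seed % 7) * k;
    return acc;
}

// Coarse-grained: parallelize the OUTER loop over many independent units.
// Each iteration writes only to its own slot, so no locking is needed.
double[] results = new double[100000];
Parallel.For(0, results.Length, i => results[i] = SmallWork(i));

// Sequential reference for comparison.
double expected = 0;
for (int i = 0; i < results.Length; i++) expected += SmallWork(i);

double actual = 0;
foreach (double r in results) actual += r;
Console.WriteLine(actual == expected); // True
```

With work items this small, the per-iteration scheduling overhead of a parallel loop can easily dominate unless each iteration is given a meaningful chunk of work.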
Yes, thanks, you're right. I noticed this myself after more thorough testing.
On the speed issue, just for kicks I wanted to try converting a generic grammar parser I wrote, because a) it's highly recursive with some relatively tightly-written core algorithms, and b) I had already started refactoring it for a threading recompiler I abandoned recently.
On the plus side, it was super-easy to wrap the central code in a Parallel.For/ForEach. I tried both ForEach and later For, trying a couple of ways of getting the results into the result collection. I'm not really expecting a performance gain on my single-core setup with preview bits anyway, but it was interesting in terms of the different PFX methods for enumerating. My Parallel.ForEach version ran 10x slower than the original using a synch-heavy collection scheme, but with For, where I store my results for each iteration in an array element and then gather them at the end, it was 100x slower.
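That store-per-index-then-gather pattern can be sketched like this (hypothetical inputs, not the actual parser code):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical inputs standing in for units of text to split.
string[] inputs = { "a b", "c d e", "f" };

// Each iteration writes only to its own slot, so no locking is needed,
// and the index preserves the original order.
string[][] perIndex = new string[inputs.Length][];
Parallel.For(0, inputs.Length, i => perIndex[i] = inputs[i].Split(' '));

// Gather at the end, in order.
string[] gathered = perIndex.SelectMany(x => x).ToArray();
Console.WriteLine(string.Join(",", gathered)); // a,b,c,d,e,f
```

The pattern itself is lock-free and order-preserving, so any slowdown usually comes from how the final gather/merge is done rather than from the parallel loop.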
I'm interested in comparing it with a PLINQ version of my loop, but that's where it gets a bit more complex in my head to integrate the parallel stuff (extension methods and the SQL-ness of it don't feel natural to me). I do expect performance to improve with later CTPs and upon final release, but comparing the different approaches within PFX is interesting, as is trying to restructure my code to be more "pure" (in PFX doc parlance).
Thanks, Keith. Very interesting. If you wanted to put together a small repro code sample that highlights the kinds of bottlenecks you're running into, we'd be happy to look at it. I'm most curious, though, about your 100x slowdown reported with Parallel.For. Are you comparing that Parallel.For implementation to your original sequential implementation, or to a sequential implementation that mimics the same new approach you're taking?
toub: The slowdown was caused by the different method I used for joining the results of my parse, not by the PFX library itself; I was trying to restructure my outer code to work in a parallel manner without explicit locks.
I've been trying to wrap my head around the issue, and I think the solution lies within ParallelEnumerable.Concat, as it probably has a much more elegant means of preserving order and sharing the result buffer than my brute-force methods. I'm still trying to get my head around the paradigm enough to rewrite my logic, but basically what I do is take in an object with a string value, split it via some delimiter, and end up with the same object, except that instead of containing a string value it has a bunch of children of the same type as itself, and now *they* have the string values (or, depending on how many levels the grammar def has, their children have the strings). The catch is that each thread is expected to dump its results into a shared collection while preserving order, and that isn't the most thread-friendly thing to do (which is also why I want to try using the PLINQ concatenator instead of my brute-force attempts).
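One way to express that split-and-gather with order preservation in PLINQ, sketched with hypothetical inputs and using the AsOrdered operator as PLINQ eventually shipped it (the CTP exposed ordering through a different knob):

```csharp
using System;
using System.Linq;

// Hypothetical chunks standing in for the delimited string values.
string[] chunks = { "x y", "z w", "q" };

// AsOrdered() asks PLINQ to preserve source order in the results, so the
// children come out in the same order a sequential SelectMany would produce,
// even though the splits may run on multiple threads.
string[] children = chunks
    .AsParallel()
    .AsOrdered()
    .SelectMany(s => s.Split(' '))
    .ToArray();
Console.WriteLine(string.Join(",", children)); // x,y,z,w,q
```

Letting PLINQ own the result buffer and the ordering sidesteps the shared-collection problem entirely: no thread ever writes into a collection another thread is reading.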