All about Async/Await, System.Threading.Tasks, System.Collections.Concurrent, System.Linq, and more…
I've recently written a detailed paper, "Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4", on implementing a variety of common parallel patterns with the .NET Framework 4. The paper is now live in the Microsoft Download Center, and you can grab it from http://www.microsoft.com/downloads/details.aspx?FamilyID=86b3d32b-ad26-4bb8-a3ae-c1637026c3ee.
Any and all feedback is welcome. I very much hope you enjoy it and find it useful.
I added the link to my blog.
Great document, a bible for parallel developers.
Thank you so much.
Thanks!! Very interesting stuff. As I'm starting to figure out, thread-safe code does not equal effectively utilizing all those cores. This kind of information is very welcome! Keep up the good work.
If I have only one processor, how can I increase the number of tasks/threads for this specific processor? Without that, the code seems to run the same as it does without parallelism.
The ThreadPool serves as the default scheduling mechanism for the Task Parallel Library, and it does not limit the number of threads to the number of cores. If its thread injection heuristic determines that more threads will help with throughput, it will likely add more. You can also influence it further through additional APIs, like ThreadPool.SetMinThreads and SetMaxThreads. On top of this, PLINQ provides a WithDegreeOfParallelism method which allows you to control how many tasks PLINQ uses.
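To make the knobs mentioned above concrete, here is a minimal sketch. The class name ParallelismKnobs, the helper SumOfSquares, and the value 4 are illustrative assumptions, not recommendations; whether extra threads help at all on a single processor depends on how much of the work blocks rather than computes.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class ParallelismKnobs
{
    // Hypothetical helper: sums the squares of 1..count with an explicit
    // PLINQ degree of parallelism.
    public static long SumOfSquares(int count, int degree)
    {
        return Enumerable.Range(1, count)
            .AsParallel()
            .WithDegreeOfParallelism(degree) // caps how many tasks PLINQ runs concurrently
            .Select(i => (long)i * i)
            .Sum();
    }

    public static void Main()
    {
        int workers, io;
        ThreadPool.GetMinThreads(out workers, out io);

        // Raise the minimum worker-thread count (4 is an arbitrary example value).
        // The call returns false if the pool rejects the requested values.
        bool accepted = ThreadPool.SetMinThreads(Math.Max(workers, 4), io);
        Console.WriteLine("SetMinThreads accepted: " + accepted);

        Console.WriteLine(SumOfSquares(1000, 4));
    }
}
```

For purely CPU-bound work on one processor, more threads generally just add context switching, so these settings are most likely to matter when tasks spend time waiting on I/O or locks.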
Great paper, however here is a pattern that I did not find any guidance on:
We have a highly parallel system with a few thousand very independent tasks executing concurrently (I hope soon to get a 64 core box to run it on). The system runs for days. From each of these tasks I would like to collect statistics: Number of items processed, Average processing time per item, etc. At regular intervals I would like to sample these statistics.
A naive implementation would protect the collected statistics with a lock, essentially serializing updates to it. However, such a pattern does not scale very well (and I would not like to introduce such locks into our otherwise almost lock-free design).
A nicer solution would keep an array of statistics data. Each update would only update one slot of statistics data thus removing the need for a lock. When sampling the data, we would loop through the array and aggregate statistics. The tricky thing is to find a good key for the array. ThreadId is not a good candidate as it can grow quite large (As I understand it, there are ~500 IO completion port threads allocated per core). A “core id” would be better, although there are some interesting synchronization issues there.
I suspect the new ThreadLocal<T> class will not solve the problem either, as there will be one slot per thread (which is the essence of the ThreadLocal abstraction).
What are your comments?
Glad you enjoyed the paper. Consider checking out the ReductionVariable type in the Beta 2 samples at http://code.msdn.microsoft.com/ParExtSamples. It's a very thin wrapper on top of ThreadLocal<T> that maintains a list of all of the local values created. Each of your threads can update the .Value property, which could contain the statistics for the local thread, and then your sampling routine would loop through the IEnumerable<T> .Values property.
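The idea can be sketched roughly as follows. This is not the ReductionVariable code from the samples; it assumes .NET 4.5 or later, where ThreadLocal<T> itself can track all per-thread values through its trackAllValues constructor argument and Values property. The Stats and StatsCollector types are hypothetical names made up for this illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-thread statistics slot; only its owning thread writes to it.
public sealed class Stats
{
    public long Items;
    public long TotalTicks;
}

public static class StatsCollector
{
    // trackAllValues: true exposes every thread's slot through Local.Values.
    private static readonly ThreadLocal<Stats> Local =
        new ThreadLocal<Stats>(() => new Stats(), true);

    // Called by workers: touches only the current thread's slot, so no lock is taken.
    public static void Record(long elapsedTicks)
    {
        Stats s = Local.Value;
        s.Items++;
        s.TotalTicks += elapsedTicks;
    }

    // Called by the sampler: aggregates one slot per thread that has recorded anything.
    public static Stats Sample()
    {
        var total = new Stats();
        foreach (Stats s in Local.Values)
        {
            total.Items += Interlocked.Read(ref s.Items);
            total.TotalTicks += Interlocked.Read(ref s.TotalTicks);
        }
        return total;
    }

    public static void Main()
    {
        Parallel.For(0, 10000, i => Record(3));
        Stats t = Sample();
        Console.WriteLine(t.Items + " items, " + t.TotalTicks + " ticks");
    }
}
```

A sample taken while workers are still running is only approximate: the two counters of a slot are read separately, so an in-flight update may be reflected in one and not the other. For periodic monitoring that is usually acceptable, but it is a trade-off of skipping the lock.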
But that still leaves me with potentially thousands of values to sample (in case different IOCP threads are posting threadlocal results).
What I was hoping for was more like "CoreLocal<T>" ... that way I would only need to sample one value per core.
It's extremely unlikely that you'll end up with hundreds or thousands of threads, and even if you do, you likely have more fundamental issues to deal with than having to check one piece of state per thread. Have you actually seen this approach to be problematic?
Great paper. I'm really looking forward to the F# version!
Are you sure about the SerialPi algorithm on page 67? On my machine, with NUM_STEPS = 1000, the result is 4.28601621355331E+298; NUM_STEPS = 10000 gives Infinity.
OK, I've found the problem:
double partial = sum + 4.0 / (1.0 + x * x);
sum += partial;
You're adding sum twice. It should be:
double partial = 4.0 / (1.0 + x * x);
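For context, here is how the corrected line might sit in a complete routine. The surrounding loop is not quoted in this thread, so everything except the two lines above is an assumption: the sketch uses the common midpoint-rule approximation of the integral of 4/(1+x^2) over [0,1], and takes the step count as a parameter rather than the paper's NUM_STEPS constant.

```csharp
using System;

public static class PiSketch
{
    // Assumed shape of the routine: midpoint-rule integration of 4 / (1 + x^2) on [0, 1].
    public static double SerialPi(int numSteps)
    {
        double sum = 0.0;
        double step = 1.0 / numSteps;
        for (int i = 0; i < numSteps; i++)
        {
            double x = (i + 0.5) * step;          // midpoint of the i-th interval
            double partial = 4.0 / (1.0 + x * x); // corrected: no "sum +" here
            sum += partial;
        }
        return step * sum;
    }

    public static void Main()
    {
        Console.WriteLine(SerialPi(1000));
    }
}
```

With the original line, each iteration more than doubles sum, which would explain the overflow to Infinity reported above.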
Thanks, Richard! Nice catch. Copy and paste error... I'll fix it when I repost the paper.
Excellent patterns. Where can I download the complete samples from this book? The section CONTINUATION CHAINING is a bit unclear to me, and I'm a bit confused about Task Walk<T>.
AB, for my paper, there's no separate download with the code. If you're interested in the recent book from the patterns & practices team on parallel patterns (msdn.microsoft.com/.../ff963553.aspx), you can find the code for that at parallelpatterns.codeplex.com.