News

  • These postings are provided "AS IS" with no warranties and confer no rights. All code and tools presented are done so under the Microsoft Public License.
Patterns for Parallel Programming with the .NET Framework

I've recently written a detailed paper, "Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4", on implementing a variety of common parallel patterns with the .NET Framework 4.  The paper is now live in the Microsoft Download Center, and you can grab it from http://www.microsoft.com/downloads/details.aspx?FamilyID=86b3d32b-ad26-4bb8-a3ae-c1637026c3ee.

Any and all feedback is welcome.  I very much hope you enjoy it and find it useful.

Thanks!
Stephen

Posted: Monday, November 09, 2009 2:51 PM by toub

Comments

Josh said:

I added the link to my blog.

Very useful

THANKS,

Josh

# November 9, 2009 10:17 PM

ericv said:

Great document, a bible for parallel developers.

Thank you so much,

Bruno

# November 9, 2009 11:29 PM

GJ said:

Thanks!! Very interesting stuff. As I'm starting to figure out, thread-safe code does not equal effectively utilizing all those cores. This kind of information is very welcome! Keep up the good work,

GJ

# November 10, 2009 2:28 AM

Arik said:

If I have only one processor, how can I increase the number of tasks/threads for that processor? Without that, the code seems to run the same as it does without parallelism.

Thanks,

# November 11, 2009 2:43 PM

toub said:

Hi Arik-

The ThreadPool serves as the default scheduling mechanism for the Task Parallel Library, and it does not limit the number of threads to the number of cores.  If its thread injection heuristic determines that more threads will help with throughput, it will likely add more.  You can also influence it further through additional APIs, like ThreadPool.SetMinThreads and SetMaxThreads.  On top of this, PLINQ provides a WithDegreeOfParallelism method which allows you to control how many tasks PLINQ uses.

# November 11, 2009 5:06 PM
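
As a rough illustration of the APIs mentioned in the reply above, here is a minimal sketch that raises the ThreadPool's minimum worker-thread count and runs a PLINQ query with an explicit degree of parallelism. The class name, method names, and the sum-of-squares workload are hypothetical and chosen only for the example; ThreadPool.GetMinThreads, ThreadPool.SetMinThreads, and WithDegreeOfParallelism are the real .NET Framework 4 calls.

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical helper class; not taken from the paper.
public static class OversubscriptionSketch
{
    // Raises the pool's minimum worker-thread count so that extra threads
    // can be created on demand without waiting for the injection heuristic.
    // Returns false if the pool rejects the request.
    public static bool RaiseMinWorkerThreads(int minWorkers)
    {
        int workers, ioThreads;
        ThreadPool.GetMinThreads(out workers, out ioThreads);
        return ThreadPool.SetMinThreads(Math.Max(workers, minWorkers), ioThreads);
    }

    // Runs a PLINQ query that caps the number of concurrently executing
    // tasks at the requested degree of parallelism.
    public static long SumOfSquares(int count, int degreeOfParallelism)
    {
        return ParallelEnumerable.Range(0, count)
            .WithDegreeOfParallelism(degreeOfParallelism)
            .Select(i => (long)i * i)
            .Sum();
    }
}
```

Calling OversubscriptionSketch.SumOfSquares(1000, 4) asks PLINQ to use up to four tasks even on a single-processor machine. Extra threads on one processor tend to help only when the work blocks (for example, on I/O); purely CPU-bound work will probably not run faster.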

Michael Cederberg said:

Great paper, however here is a pattern that I did not find any guidance on:

We have a highly parallel system with a few thousand very independent tasks executing concurrently (I hope soon to get a 64 core box to run it on). The system runs for days. From each of these tasks I would like to collect statistics: Number of items processed, Average processing time per item, etc.  At regular intervals I would like to sample these statistics.

A naive implementation would protect the collected statistics with a lock, essentially serializing updates to them. However, such a pattern does not scale very well (and I would not like to introduce such locks into our otherwise almost lock-free design).

A nicer solution would keep an array of statistics data. Each update would only update one slot of statistics data thus removing the need for a lock. When sampling the data, we would loop through the array and aggregate statistics. The tricky thing is to find a good key for the array. ThreadId is not a good candidate as it can grow quite large (As I understand it, there are ~500 IO completion port threads allocated per core). A “core id” would be better, although there are some interesting synchronization issues there.

I suspect the new ThreadLocal<T> class will not solve the problem either, as there will be one slot per thread (which is the essence of the ThreadLocal abstraction).

What are your comments?

# November 17, 2009 6:49 AM

toub said:

Hi Michael-

Glad you enjoyed the paper.  Consider checking out the ReductionVariable type in the Beta 2 samples at http://code.msdn.microsoft.com/ParExtSamples.  It's a very thin wrapper on top of ThreadLocal<T> that maintains a list of all of the local values created.  Each of your threads can update the .Value property, which could contain the statistics for the local thread, and then your sampling routine would loop through the IEnumerable<T> .Values property.

# November 17, 2009 7:41 AM
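
To make the idea in the reply above concrete, here is a minimal sketch of a thin wrapper over ThreadLocal<T> that remembers every per-thread value it creates. It is not the source of the ReductionVariable type from the samples; the names TrackedThreadLocal, Stats, RecordItem, and StatsSampler are hypothetical, and the statistics fields are assumptions based on the numbers Michael wants to collect.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical per-thread statistics slot; only its owning thread writes to it.
public sealed class Stats
{
    public long ItemsProcessed;
    public long TotalTicks;

    // Interlocked keeps 64-bit updates atomic on 32-bit platforms; the slot
    // is private to one writer thread, so the operations are uncontended.
    public void RecordItem(long elapsedTicks)
    {
        Interlocked.Increment(ref ItemsProcessed);
        Interlocked.Add(ref TotalTicks, elapsedTicks);
    }
}

// Hypothetical wrapper: ThreadLocal<T> plus a list of all values created.
public sealed class TrackedThreadLocal<T> : IDisposable
{
    private readonly ConcurrentQueue<T> _values = new ConcurrentQueue<T>();
    private readonly ThreadLocal<T> _local;

    public TrackedThreadLocal(Func<T> factory)
    {
        _local = new ThreadLocal<T>(() =>
        {
            T value = factory();
            _values.Enqueue(value); // remember every per-thread instance
            return value;
        });
    }

    // The calling thread's own slot, created on first access.
    public T Value { get { return _local.Value; } }

    // All slots created so far; enumerating the queue is thread-safe.
    public IEnumerable<T> Values { get { return _values; } }

    public void Dispose() { _local.Dispose(); }
}

public static class StatsSampler
{
    // Aggregates the per-thread counters without taking a lock.
    public static long TotalItems(TrackedThreadLocal<Stats> stats)
    {
        long total = 0;
        foreach (Stats s in stats.Values)
            total += Interlocked.Read(ref s.ItemsProcessed);
        return total;
    }
}
```

Each worker would call stats.Value.RecordItem(elapsedTicks) after processing an item, and the sampling routine would call StatsSampler.TotalItems(stats) at its regular interval. The sampler may observe a total that is slightly out of date, which seems acceptable for periodic statistics.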

Michael Cederberg said:

But that still leaves me with potentially thousands of values to sample (in case different IOCP threads are posting threadlocal results).

What I was hoping for was more like "CoreLocal<T>" ... that way I would only need to sample one value per core.

# November 17, 2009 12:32 PM

toub said:

It's extremely unlikely that you'll end up with hundreds or thousands of threads, and even if you do, you likely have more fundamental issues to deal with than having to check one piece of state per thread.  Have you actually seen this approach to be problematic?

# November 18, 2009 12:14 AM

Matt said:

Great paper. I'm really looking forward to the F# version!

# November 20, 2009 2:10 PM

Richard said:

Great paper.

Are you sure about the SerialPi algorithm on page 67? On my machine, with NUM_STEPS = 1000, the result is 4.28601621355331E+298; NUM_STEPS = 10000 gives Infinity.

# November 23, 2009 12:50 PM

Richard said:

OK, I've found the problem:

double partial = sum + 4.0 / (1.0 + x * x);

sum += partial;

You're adding sum twice. It should be:

double partial = 4.0 / (1.0 + x * x);

sum += partial;

# November 23, 2009 12:58 PM
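
For reference, here is a sketch of what the corrected serial routine might look like as a whole. Only the two corrected lines come from the comment above; the surrounding loop, the midpoint-rule setup, the class name PiEstimator, and passing the step count as a parameter (rather than using the NUM_STEPS constant) are assumptions. In the buggy version, each iteration computes sum = 2 * sum + term, so the total roughly doubles every step, which is consistent with the overflow Richard reported.

```csharp
using System;

// Hypothetical container class; the paper's surrounding code may differ.
public static class PiEstimator
{
    // Approximates pi by integrating 4 / (1 + x^2) over [0, 1]
    // with the midpoint rule and numSteps equal-width intervals.
    public static double SerialPi(int numSteps)
    {
        double sum = 0.0;
        double step = 1.0 / numSteps;
        for (int i = 0; i < numSteps; i++)
        {
            double x = (i + 0.5) * step;          // midpoint of interval i
            double partial = 4.0 / (1.0 + x * x); // corrected: no "sum +" here
            sum += partial;
        }
        return step * sum;
    }
}
```

With this version, PiEstimator.SerialPi(1000) stays finite and lands close to Math.PI.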

toub said:

Thanks, Richard!  Nice catch.  Copy and paste error... I'll fix it when I repost the paper.

# November 23, 2009 1:00 PM