<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Developer Division Performance Engineering blog : Visual Studio Team Developer</title><link>http://blogs.msdn.com/ddperf/archive/tags/Visual+Studio+Team+Developer/default.aspx</link><description>Tags: Visual Studio Team Developer</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Parallel Scalability Isn’t Child’s Play, Part 3: The Problem with Fine-Grained Parallelism</title><link>http://blogs.msdn.com/ddperf/archive/2009/06/09/parallel-scalability-isn-t-child-s-play-part-3-the-problem-with-fine-grained-parallelism.aspx</link><pubDate>Tue, 09 Jun 2009 23:17:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9718379</guid><dc:creator>MarkBFriedman</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/ddperf/comments/9718379.aspx</comments><wfw:commentRss>http://blogs.msdn.com/ddperf/commentrss.aspx?PostID=9718379</wfw:commentRss><description>&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT face=Calibri&gt;&lt;A title="Part 2 in this series" href="http://blogs.msdn.com/ddperf/archive/2009/04/29/parallel-scalability-isn-t-child-s-play-part-2-amdahl-s-law-vs-gunther-s-law.aspx" mce_href="http://blogs.msdn.com/ddperf/archive/2009/04/29/parallel-scalability-isn-t-child-s-play-part-2-amdahl-s-law-vs-gunther-s-law.aspx"&gt;&lt;FONT size=3&gt;In the last blog entry in this series&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3&gt;, I introduced the model for parallel program scalability proposed by Neil Gunther, which I praised for being a realistic antidote to more optimistic, but better known, formulas. Gunther’s model adds a new parameter to the more familiar Amdahl’s law. The additional parameter&lt;I style="mso-bidi-font-style: normal"&gt; k&lt;/I&gt;, representing &lt;I style="mso-bidi-font-style: normal"&gt;coherence&lt;/I&gt;-related delays, enables Gunther’s formula to model behavior where the performance of a parallel program can actually degrade at higher and higher levels of parallelization.&amp;nbsp;&lt;/FONT&gt;&lt;FONT size=3&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;Although I don’t know that the coherence delay factor that Gunther’s formula adds fully addresses the range and depth of the performance issues surrounding fine-grained parallelism, it is certainly one of the key factors Gunther’s law expresses that earlier formulations do not.&lt;/FONT&gt;&lt;/P&gt;&lt;/FONT&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Developers experienced in building parallel programs recognize that Gunther’s formula echoes an inconvenient truth, namely, that the task of achieving performance gains using parallel programming techniques is often quite arduous. For example, in a recent blog entry entitled “&lt;/FONT&gt;&lt;A href="http://software.intel.com/en-us/articles/when-to-say-no-to-parallelism/"&gt;&lt;FONT size=3 face=Calibri&gt;When to Say No to Parallelism&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt;,” Sanjiv Shah, a colleague at Intel, expressed similar sentiments. One very good piece of advice Sanjiv gives is that you should not even be thinking about parallelism until you have an efficient single-threaded version of your program debugged and running. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Let’s continue, for a moment, in the same vein as “&lt;/FONT&gt;&lt;A href="http://software.intel.com/en-us/articles/when-to-say-no-to-parallelism/"&gt;&lt;FONT size=3 face=Calibri&gt;When to Say No to Parallelism&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt;.” Let’s look at the major sources of coherence-related delays in various kinds of parallel programs, how and why they occur, and what, if anything, can be done about them. Ultimately, I will try to tie this discussion into one about tools, especially some great new tools in Visual Studio Team System (see &lt;/FONT&gt;&lt;A href="http://blogs.msdn.com/hshafi/archive/2009/05/18/visual-studio-2010-beta-1-parallel-performance-tools.aspx"&gt;&lt;FONT size=3 face=Calibri&gt;Hazim Shafi’s blog&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt; for details) that help you understand contention in your multi-threaded apps. When you use these new tools to gather and analyze the thread contention data for your app, it helps when you understand some of the common patterns to look for. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;The first aspect of the coherence delays Gunther is concerned with that we will look at are the bare &lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;I style="mso-bidi-font-style: normal"&gt;minimum&lt;/I&gt; additional costs that a parallel program running multiple threads must pay, compared to running the same program single threaded. To simplify matters, we will look at the best possible prospect for parallel programming, an algorithm that is both &lt;I style="mso-bidi-font-style: normal"&gt;embarrassingly parallel&lt;/I&gt; and easy to partition into roughly equal sized subprogram chunks. There are two basic costs to consider: one that is paid up front for initialization of the parallel runtime environment, and one that is paid incrementally each time one of the parallel tasks executes. It is also worth noting that these are unavoidable costs. I will lump both costs into an &lt;I style="mso-bidi-font-style: normal"&gt;overhead&lt;/I&gt; category associated with Gunther’s coherence delay factor &lt;I style="mso-bidi-font-style: normal"&gt;k&lt;/I&gt;. The embarrassingly parallel programs we will consider&amp;nbsp;here will incurr these minimum processor overhead penalties when they are transformed to execute in parallel.&lt;/FONT&gt;&lt;/P&gt;
&lt;H3 style="MARGIN: 10pt 0in 0pt"&gt;&lt;FONT color=#4f81bd size=3 face=Cambria&gt;Fine grained parallelism. &lt;/FONT&gt;&lt;/H3&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;To frame this part of the discussion, let’s also consider the characterization of workloads and their amenability to parallelization into either &lt;I style="mso-bidi-font-style: normal"&gt;fine-grained&lt;/I&gt; or &lt;I style="mso-bidi-font-style: normal"&gt;coarse-grained&lt;/I&gt; ones. This distinction implicitly recognizes the impact of coherency delay factors on scalability. With fine-grained parallelism, the overhead of setting up the parallel runtime &amp;amp; executing the tasks in parallel can easily exceed the benefits. By definition, the initialization overhead of spinning up multiple threads and dispatching them is not nearly so big an issue when the program lends itself to coarse-grained parallelism. Plus, when you are looking at a very long running process with many opportunities to exploit parallelism, it is important to understand you should only have to incur the setup cost once. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Coarse-grained parallelism occurs when each parallel worker thread is assigned to computing a relatively long running function:&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal mce_keep="true"&gt;&lt;IMG style="WIDTH: 338px; HEIGHT: 419px" title="coarse-grained parallelism" alt="coarse-grained parallelism" src="http://5l3vgw.bay.livefilestore.com/y1pa53nIchv7Sk5zdNifva6i3kLD1Eu2MsrdAIOayN3r2OUL9XBpRy72kD_wGhKM1uYeb9TjLCGfyiz9E3hg5bqtg/Coarse-grained%20parallelism%20Fork-Join%20flowchart.jpg" width=338 height=419 mce_src="http://5l3vgw.bay.livefilestore.com/y1pa53nIchv7Sk5zdNifva6i3kLD1Eu2MsrdAIOayN3r2OUL9XBpRy72kD_wGhKM1uYeb9TjLCGfyiz9E3hg5bqtg/Coarse-grained%20parallelism%20Fork-Join%20flowchart.jpg"&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal mce_keep="true"&gt;&lt;STRONG&gt;Figure 5. Coarse-grained parallelism.&lt;/STRONG&gt;&lt;/P&gt;&lt;?xml:namespace prefix = o /&gt;&lt;o:wrapblock&gt;&lt;?xml:namespace prefix = v ns = "urn:schemas-microsoft-com:vml" /&gt;&lt;v:shapetype id=_x0000_t75 coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f"&gt;&lt;v:stroke joinstyle="miter"&gt;&lt;/v:stroke&gt;&lt;v:formulas&gt;&lt;v:f eqn="if lineDrawn pixelLineWidth 0"&gt;&lt;/v:f&gt;&lt;v:f eqn="sum @0 1 0"&gt;&lt;/v:f&gt;&lt;v:f eqn="sum 0 0 @1"&gt;&lt;/v:f&gt;&lt;v:f eqn="prod @2 1 2"&gt;&lt;/v:f&gt;&lt;v:f eqn="prod @3 21600 pixelWidth"&gt;&lt;/v:f&gt;&lt;v:f eqn="prod @3 21600 pixelHeight"&gt;&lt;/v:f&gt;&lt;v:f eqn="sum @0 0 1"&gt;&lt;/v:f&gt;&lt;v:f eqn="prod @6 1 2"&gt;&lt;/v:f&gt;&lt;v:f eqn="prod @7 21600 pixelWidth"&gt;&lt;/v:f&gt;&lt;v:f eqn="sum @8 21600 0"&gt;&lt;/v:f&gt;&lt;v:f eqn="prod @7 21600 pixelHeight"&gt;&lt;/v:f&gt;&lt;v:f eqn="sum @10 21600 0"&gt;&lt;/v:f&gt;&lt;/v:formulas&gt;&lt;v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"&gt;&lt;/v:path&gt;&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:lock v:ext="edit" aspectratio="t"&gt;&lt;/o:lock&gt;&lt;/v:shapetype&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;v:shape style="Z-INDEX: 251662336; POSITION: absolute; MARGIN-TOP: 0px; WIDTH: 231.55pt; HEIGHT: 286.85pt; MARGIN-LEFT: 0px; mso-position-horizontal: center" id=_x0000_s1026 type="#_x0000_t75"&gt;&lt;v:imagedata src="file:///C:\Users\markfr\AppData\Local\Temp\msohtmlclip1\01\clip_image001.wmz" o:title=""&gt;&lt;/v:imagedata&gt;&lt;?xml:namespace prefix = w ns = "urn:schemas-microsoft-com:office:word" /&gt;&lt;w:wrap type="topAndBottom"&gt;&lt;/w:wrap&gt;&lt;/v:shape&gt;&lt;FONT size=3 face=Calibri&gt;while fine-grained parallelism looks more like this:&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;IMG style="WIDTH: 338px; HEIGHT: 300px" title="fine-grained parallelism" alt="fine-grained parallelism" src="http://5l3vgw.bay.livefilestore.com/y1pdnte4FSJ18igb7DEZZ2veO2esTg2BykZz6FavZ-37bbyLejUIQlFBc4mboJUbzN3jUp3uq-FV8MA2RGunfMe8Q/Fine-grained%20parallelism%20Fork-Join%20flowchart.jpg" width=338 height=300 mce_src="http://5l3vgw.bay.livefilestore.com/y1pdnte4FSJ18igb7DEZZ2veO2esTg2BykZz6FavZ-37bbyLejUIQlFBc4mboJUbzN3jUp3uq-FV8MA2RGunfMe8Q/Fine-grained%20parallelism%20Fork-Join%20flowchart.jpg"&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;STRONG&gt;Figure 6. Fine-grained parallelism.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;The difference, of course, lies in how long, relatively speaking, the worker thread processing the unit of work executes.&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;The rationale for the fine-grained:coarse-grained distinction is its significance to performance. We identify those parallel algorithms that execute in the worker thread long enough to recover the costs of setting up and running the parallel environment as coarse-grained. The benefits of running such programs in parallel are much easier to realize. On the other hand, the one full proof way to identify fine-grained parallelism is to find embarrassingly parallel programs with very short execution time spans for each parallel task. When executing fine-grained parallel programs, there is a very high risk of slowing down the performance of the program, instead of improving it. (If this sounds like a bit of circular reasoning, it most surely is.) &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Now let’s drill into these costs. In the .NET Framework, setting up a parallel execution environment is usually associated with the ThreadPool object. (If you are not very familiar with how to set up and use a ThreadPool in .NET, you might want to read &lt;/FONT&gt;&lt;A href="http://msdn.microsoft.com/en-us/library/0ka9477y.aspx"&gt;&lt;FONT color=#0000ff size=3 face=Calibri&gt;this&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt; bit of documentation first that shows some simple C# examples. If you really want to understand all the ins and outs of the .NET ThreadPool, you should considering picking up a copy of Joe Duffy’s very thorough and authoritative book, &lt;/FONT&gt;&lt;A href="http://www.amazon.com/Concurrent-Programming-Windows-Microsoft-Development/dp/032143482X/ref=sr_1_1?ie=UTF8&amp;amp;s=books&amp;amp;qid=1241140965&amp;amp;sr=1-1"&gt;&lt;FONT color=#0000ff size=3 face=Calibri&gt;Concurrent Programming in Windows&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt;.) &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal mce_keep="true"&gt;&lt;BR style="mso-ignore: vglayout" clear=all&gt;&lt;FONT size=3 face=Calibri&gt;The thing about using the ThreadPool object in .NET is that you don’t need to write a lot of code on your own to get it up and running. With very little coding effort, you can be running a parallel program. In the .NET Framework, there are newer programming constructs in the &lt;/FONT&gt;&lt;A href="http://msdn.microsoft.com/en-us/concurrency/default.aspx"&gt;&lt;FONT color=#0000ff size=3 face=Calibri&gt;Task Parallel Library&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt; that are designed to make it easier for developers to express parallelism and exploit multi-core and many-core computers. But underneath the covers, many of the new TPL constructs are still using the CLR ThreadPool. So whatever overheads are associated with this older, less elegant approach still apply to any of the newer parallel programming constructs.&lt;/FONT&gt;&lt;/P&gt;
&lt;H3 style="MARGIN: 10pt 0in 0pt"&gt;&lt;FONT color=#4f81bd size=3 face=Cambria&gt;The ThreadPool in .NET&lt;/FONT&gt;&lt;/H3&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;The basic pattern for using the ThreadPool is to call the QueueUserWorkItem() method on the ThreadPool class, passing a delegate that performs the actual processing of the request, along with a set of parameters that are wrapped into a singleton Object. The parameters delineate the unit of work that is being requested. Typically, you also pass to the delegate a &lt;A title="ManualResetEvent reference" href="http://msdn.microsoft.com/en-us/library/system.threading.manualresetevent.aspx" mce_href="http://msdn.microsoft.com/en-us/library/system.threading.manualresetevent.aspx"&gt;ManualResetEvent&lt;/A&gt;, what is known as a &lt;EM&gt;synchronization primitive&lt;/EM&gt;. This event is used by the delegate to signal the Main task that the worker thread is finished processing the Work Item request.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Structurally, you have to write:&lt;/FONT&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;a class that wraps the Work Item parameter list,&lt;/FONT&gt;&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;a C# delegate that runs in the worker thread to process the Work Item request,&lt;/FONT&gt;&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;a dispatcher routine that queues work items for the thread pool delegate to process, and&lt;/FONT&gt;&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;finally, an event handler to get control when the worker thread completes.&lt;/FONT&gt;&lt;/DIV&gt;&lt;/LI&gt;&lt;/OL&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;For example, to implement the Fork/Join pattern in .NET using the built-in ThreadPool Object, create (1) a wrapper for the parameter list:&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; COLOR: blue; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;public&lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;class&lt;/SPAN&gt; &lt;SPAN style="COLOR: #2b91af"&gt;WorkerThreadParms&lt;/SPAN&gt; &lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;{&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; TEXT-INDENT: 0.5in; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; COLOR: blue; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;private&lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: #2b91af"&gt;ManualResetEvent&lt;/SPAN&gt; _thisevent;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; TEXT-INDENT: 0.5in; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;…&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; COLOR: blue; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;public&lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: #2b91af"&gt;ManualResetEvent&lt;/SPAN&gt; thisevent&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;{&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; TEXT-INDENT: 0.5in; MARGIN: 0in 0in 0pt 0.5in; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; COLOR: blue; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;get&lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt; { &lt;SPAN style="COLOR: blue"&gt;return&lt;/SPAN&gt; _thisevent; &lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; TEXT-INDENT: 0.5in; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;}&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; TEXT-INDENT: 0.5in; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;…&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;public&lt;/SPAN&gt; WorkerThreadParms(&lt;SPAN style="COLOR: #2b91af"&gt;ManualResetEvent&lt;/SPAN&gt; signalwhendone, …,)&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;{&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="mso-tab-count: 1"&gt;&lt;/SPAN&gt;&lt;SPAN style="mso-tab-count: 1"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;_thisevent = signalwhendone;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-tab-count: 2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;…&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;}&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: auto auto 0pt; mso-add-space: auto; mso-layout-grid-align: none" class=MsoNormalCxSpMiddle&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;}&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;SPAN style="mso-no-proof: yes"&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;(2) the ThrealPool delegate that unwraps the parameter list, performs some work, then signals the main thread when it is done:&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; COLOR: blue; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;public&lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;void&lt;/SPAN&gt; ThreadPoolDelegate(&lt;SPAN style="COLOR: #2b91af"&gt;Object&lt;/SPAN&gt; parm)&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;{&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; TEXT-INDENT: 0.5in; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; COLOR: #2b91af; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;WorkerThreadParms&lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt; p = (&lt;SPAN style="COLOR: #2b91af"&gt;WorkerThreadParms&lt;/SPAN&gt;) parm;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="COLOR: #2b91af"&gt;ManualResetEvent&lt;/SPAN&gt; signal = p.thisevent;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;… &lt;SPAN style="COLOR: #00b050"&gt;//Do some work here&lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;…&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="TEXT-INDENT: 0.5in; MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;signal.Set(); &lt;SPAN style="COLOR: #00b050"&gt;//Signal main task when done&lt;/SPAN&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;}&lt;/SPAN&gt;&lt;/P&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;(3) a (simple) dispatcher loop:&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; COLOR: #2b91af; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;ManualResetEvent&lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;[] thisevent = &lt;SPAN style="COLOR: blue"&gt;new&lt;/SPAN&gt; &lt;SPAN style="COLOR: #2b91af"&gt;ManualResetEvent&lt;/SPAN&gt;[tasks];&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; COLOR: blue; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;for&lt;/SPAN&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt; (&lt;SPAN style="COLOR: blue"&gt;int&lt;/SPAN&gt; j = 0; j &amp;lt; tasks; j++)&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;{&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; TEXT-INDENT: 0.5in; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;thisevent[j] = &lt;SPAN style="COLOR: blue"&gt;new&lt;/SPAN&gt; &lt;SPAN style="COLOR: #2b91af"&gt;ManualResetEvent&lt;/SPAN&gt;(&lt;SPAN style="COLOR: blue"&gt;false&lt;/SPAN&gt;);&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="COLOR: #2b91af"&gt;WorkerThreadParms&lt;/SPAN&gt; p = &lt;SPAN style="COLOR: blue"&gt;new&lt;/SPAN&gt; &lt;SPAN style="COLOR: #2b91af"&gt;WorkerThreadParms&lt;/SPAN&gt;(thisevent[j],…);&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="COLOR: #2b91af"&gt;WorkerThread&lt;/SPAN&gt; worker = &lt;SPAN style="COLOR: blue"&gt;new&lt;/SPAN&gt; &lt;SPAN style="COLOR: #2b91af"&gt;WorkerThread&lt;/SPAN&gt;();&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="COLOR: #2b91af"&gt;ThreadPool&lt;/SPAN&gt;.QueueUserWorkItem (worker.ThreadPoolDelegate,p);&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="LINE-HEIGHT: normal; MARGIN: 0in 0in 0pt; mso-layout-grid-align: none" class=MsoNormal&gt;&lt;SPAN style="FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;}&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;SPAN style="mso-no-proof: yes"&gt;&lt;o:p&gt;&lt;FONT size=3 face=Calibri&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;SPAN style="mso-no-proof: yes"&gt;&lt;FONT size=3&gt;&lt;FONT face=Calibri&gt;followed by (4) a WaitForMultipleObjects in the Main thread:&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #2b91af; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;ManualResetEvent&lt;/SPAN&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; FONT-SIZE: 9pt; mso-no-proof: yes"&gt;.WaitAll(thisevent);&lt;/SPAN&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 9pt"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;FONT face=Calibri&gt;&lt;FONT size=3 face=Calibri&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;You can see there isn’t very much code for you to write to get up and running in parallel and start taking advantage of all those multi-core processor resources. You should take Sanjiv Khan’s advice and write &amp;amp; debug the delegate code you intend to parallelize by testing it in a single threaded mode first. Once the single threaded program is debugged and optimized, you can easily restructure the program to run in parallel by encapsulating that processing in your worker thread delegate following this simple recipe.&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;Even though there isn’t very much code for you to write, there are overhead considerations that you need to be aware of. Let’s look at the simplest case where the program is embarrassingly parallel (as discussed in the previous blog entry). This allows us to ignore complications introduced by serialization and locking (which we will get to later). These overheads include &lt;/P&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="TEXT-INDENT: -0.25in; MARGIN: 0in 0in 0pt 0.5in; mso-list: l0 level1 lfo1" class=MsoListParagraphCxSpFirst&gt;&lt;SPAN style="mso-bidi-font-family: Calibri; mso-bidi-theme-font: minor-latin"&gt;&lt;SPAN style="mso-list: Ignore"&gt;&lt;FONT size=3 face=Calibri&gt;(1)&lt;/FONT&gt;&lt;SPAN style="FONT: 7pt 'Times New Roman'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;FONT size=3 face=Calibri&gt;work done in the Common Language Runtime (CLR) on your behalf to spin up the worker threads in the ThreadPool initially, and &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="TEXT-INDENT: -0.25in; MARGIN: 0in 0in 10pt 0.5in; mso-list: l0 level1 lfo1" class=MsoListParagraphCxSpLast&gt;&lt;SPAN style="mso-bidi-font-family: Calibri; mso-bidi-theme-font: minor-latin"&gt;&lt;SPAN style="mso-list: Ignore"&gt;&lt;FONT size=3 face=Calibri&gt;(2)&lt;/FONT&gt;&lt;SPAN style="FONT: 7pt 'Times New Roman'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;FONT size=3 face=Calibri&gt;the additional cost when running the parallel program associated with queuing work items, dispatching them to a thread pool thread to process, and signaling the main dispatcher thread when done. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;In the case of the initialization work, this is something that only needs to be done once. The other costs accrue each time you need to dispatch a worker thread. For parallelism to be an effective scaling strategy, it is necessary to amortize this overhead cost over the life of the parallel threads. The worker thread delegates need to execute long enough that there is a benefit to executing in parallel. And this is for a best case for parallelism where the underlying program is both embarrassingly parallel and easy to partition into roughly equivalent work requests. &lt;/FONT&gt;&lt;/P&gt;
&lt;H3 style="MARGIN: 10pt 0in 0pt"&gt;&lt;FONT color=#4f81bd size=3 face=Cambria&gt;A Parallel.For example.&lt;/FONT&gt;&lt;/H3&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Using the new Parallel.For construct in the Task Parallel Library, by the way, there is even less code you have to write. All you need to code the Parallel.For is write is the worker thread delegate, because the TPL library handles the remaining boilerplate tasks. However, the underlying overhead considerations are almost identical.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;The challenge of speeding up programs that have fine-grained parallelism grows in tandem with making it easier for developers to write concurrent programs. Take a look at the following example of parallelization in C# using the new Parallel.For construct taken verbatim from a Microsoft white paper entitled “&lt;A title="Taking Parallelism Mainstream" href="http://download.microsoft.com/download/D/5/9/D597F62A-0BEE-4CE7-965B-099D705CFAEE/Taking%20Parallelism%20Mainstream%20Microsoft%20February%202009.docx" mce_href="http://download.microsoft.com/download/D/5/9/D597F62A-0BEE-4CE7-965B-099D705CFAEE/Taking%20Parallelism%20Mainstream%20Microsoft%20February%202009.docx"&gt;Taking Parallelism Mainstream&lt;/A&gt;.” The white paper describes some of the new language facilities in the Task Parallel Library that make it easier for developers to write parallel programs. These new language facilities include Parallel For loops, Parallel LINQ, Parallel Invoke, Futures and Continuations, and messaging using asynchronous agents. To the extent that these parallel computing initiatives succeed, they will generate the need for better performance analysis tools to understand the performance of concurrent programs because not everyone who implements these new constructs is going to see impressive speed-up of his or her applications. Some developers will even see the retrograde performance predicted by Gunther’s scalability formula.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Here’s the C# program that illustrates one of the new parallel programming constructs:&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;IEnumerable&amp;lt;StockQuote&amp;gt; Query(IEnumerable&amp;lt;StockQuote&amp;gt; stocks) {&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;var results = new ConcurrentBag&amp;lt;StockQuote&amp;gt;();&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;Parallel.ForEach (stocks, stock =&amp;gt; {&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;if (stock.MarketCap &amp;gt; 100000000000.0 &amp;amp;&amp;amp;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;stock.ChangePct &amp;lt; 0.025 &amp;amp;&amp;amp;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;stock.Volume&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&amp;gt; 1.05 * stock.VolumeMavg3M) {&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;results.Add(stock);&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;}&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;});&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;return results;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 0pt" class=MsoNormal&gt;&lt;B style="mso-bidi-font-weight: normal"&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Courier New'; COLOR: #4f81bd; FONT-SIZE: 9pt; mso-themecolor: accent1"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;}&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;The example uses a &lt;I style="mso-bidi-font-style: normal"&gt;Parallel.ForEach&lt;/I&gt; enumeration loop, along with one of the new concurrent collection classes, the &lt;I style="mso-bidi-font-style: normal"&gt;ConcurrentBag&lt;/I&gt;, to evaluate stock prices based on some set of selection criteria. The problem, the white paper author notes, is one that is considered “embarrassingly parallel” because it is easily decomposed into independent sub-problems that can be executed concurrently. The new C# Task Parallel Library language features provide an elegant way to express this parallelism. Underneath the Parallel.For expression is a run-time library that understands how to partition the body of the parallel For loop into multiple work items, and dispatches them to separate worker threads that are then scheduled to execute concurrently. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;The .NET Task Parallel Library (TPL) provides the run-time machinery to turn this program into a parallel program. At run-time, it automatically parallelizes the lambda expression that the &lt;I style="mso-bidi-font-style: normal"&gt;Parallel.For&lt;/I&gt; construct references. In this example, the Task Parallel Library takes the If Statement in the lambda expression and queues it up to run in parallel using the concurrent runtime library in .NET. The concurrent runtime library creates a thread pool and then delegates the processing of the lambda expression to these worker threads. The concurrent runtime attempts to allocate and schedule an optimal number of worker threads to this parallel task.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;While TPL makes it easier to express parallelism in your programs and eliminates most of the grunt work in setting up the runtime environment associated with parallel threads, it cannot guarantee that running this program in parallel on a machine with four or eight processors will actually speed up its execution time. That is the essence of the challenge of fine-grained parallelism. There is overhead associated with queuing work items to the thread pool the parallel run-time manages. This is overhead that the serial version of the program does not encounter. Only when the amount of work done inside the lambda expression executes for a long enough time does the benefit of parallelizing the lambda expression exceeds this cost, which must be amortized over each concurrent execution of the inner body of the Parallel.For loop. Note that this particular set of overheads is unavoidable. When you are dealing with fine-grained parallelism, the overhead of setting up the parallel run-time environment alone often exceeds the potential benefit of executing in parallel, notwithstanding other possible sources of contention-related delays that could further slow down execution time.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;So, it is important to realize that a code sample like the one I have taken here from the parallel programming white paper was chosen to illustrate the expressive power of the new language constructs. The example was intended to show the &lt;EM&gt;pattern&lt;/EM&gt; that developers should adhere to -- I am certain it wasn't intended to illustrate something specific you would actual do. When you are taking advantage of these new parallel programming language extensions, you need to be aware of the fine-grained:coarse-grained distinction.&amp;nbsp;This is emphatically not an example of a program that you will necessarily speed up by running it in parallel. It will take considerably more anlysis to figure out if parallelism is the right solution here. Speeding up a serial program by running portions of it in parallel isn’t always easy – even when that program has sections that are “embarrassingly parallel.”&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;&lt;SPAN style="mso-spacerun: yes"&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;What we’d like to be able to do is to estimate the actual performance improvement we can expect of this parallel program, compared to its original serial version. “Embarrassingly parallel” is another way of saying that, once parallelized, the serial portion of this program is expected to be quite small. However, this is also an example of fine-grained parallelism because the amount of code associated with the lambda expression that is passed to worker thread delegate to execute is also quite small. An experienced developer at this point should be asking whether the overhead of creating these working threads and dispatching them might, in fact, be greater than the benefit of executing this task in parallel. This relatively fixed overhead is especially important when the amount of work performed by each delegate is quite small – there is only a limited opportunity to amortize this overhead to initiate and manage the multithreaded operation across the execution time of each of the worker threads. It is extremely important to understand this in the case of fine-grained parallelism.&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;Next, we will turn to coarse-grained parallelism, where the odds are much better that you may be able to speed up program execution substantially using concurrent programming techniques. In the next blog entry, I will start to tackle more promising examples. The analysis of the performance costs associated with parallel programming tasks will become more complex. I will try to illustrate this analysis using a concrete programming example that will simulate coarse-grained parallelism. As we look at how this simple programming example scales on a multi-core machine, it will bring us face-to-face with the pitfalls even experienced developers can expect to encounter when they attempt to parallelize their existing serial applications. &lt;/FONT&gt;&lt;/P&gt;
&lt;P style="MARGIN: 0in 0in 10pt" class=MsoNormal&gt;&lt;FONT size=3 face=Calibri&gt;We will also start to look into the analysis tools that we are available in the next version of the Visual Studio Profiler that greatly help with understanding the performance of your .NET parallel program. In the meantime, if you'd like to get a head start on these new tools in the Visual Studio Profiler , be sure to check out &lt;A href="http://blogs.msdn.com/hshafi/archive/2009/05/18/visual-studio-2010-beta-1-parallel-performance-tools.aspx"&gt;&lt;FONT size=3 face=Calibri&gt;Hazim Shafi’s blog&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3 face=Calibri&gt; for more details.&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/o:wrapblock&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9718379" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/ddperf/archive/tags/.NET/default.aspx">.NET</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Scalability/default.aspx">Scalability</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Parallel+programming/default.aspx">Parallel programming</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Performance/default.aspx">Performance</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Beta/default.aspx">Beta</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Visual+Studio+Profiler/default.aspx">Visual Studio Profiler</category><category domain="http://blogs.msdn.com/ddperf/archive/tags/Visual+Studio+Team+Developer/default.aspx">Visual Studio Team Developer</category></item></channel></rss>