<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx</link><description>The Parallel class represents a significant advancement in parallelizing managed loops. For many common scenarios, it just works, resulting in terrific speedups. However, while ideally Parallel.For could be all things to all people, such things rarely</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Samples for Parallel Programming with the .NET Framework 4</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9703168</link><pubDate>Sun, 07 Jun 2009 02:34:18 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9703168</guid><dc:creator>Samples for Parallel Programming with the .NET Framework 4</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://dougfinke.com/blog/index.php/2009/06/06/samples-for-parallel-programming-with-the-net-framework-4/"&gt;http://dougfinke.com/blog/index.php/2009/06/06/samples-for-parallel-programming-with-the-net-framework-4/&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9703243</link><pubDate>Sun, 07 Jun 2009 03:33:49 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9703243</guid><dc:creator>David Cuccia</dc:creator><description>&lt;p&gt;Thanks! Just the guidance I was looking for. I implemented something similar to the first example, but I like the elegance of the second solution.&lt;/p&gt;
&lt;p&gt;Any news on when we might see the TPL in Silverlight? Working on an interactive computational tool for light simulation in Biomedical Optics...would love to use all those procs!&lt;/p&gt;
</description></item><item><title>Interesting Finds: June 7, 2009</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9704480</link><pubDate>Sun, 07 Jun 2009 19:41:35 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9704480</guid><dc:creator>Jason Haley</dc:creator><description>&lt;p&gt;Interesting Finds: June 7, 2009&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9707830</link><pubDate>Mon, 08 Jun 2009 12:57:43 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9707830</guid><dc:creator>Jack</dc:creator><description>&lt;p&gt;Using foreach will be faster than for, because when using for, the system will check the range automatically.&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9728342</link><pubDate>Fri, 12 Jun 2009 03:23:57 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9728342</guid><dc:creator>toub</dc:creator><description>&lt;p&gt;Hi David-&lt;/p&gt;
&lt;p&gt;Thanks for the enthusiasm and feedback. &amp;nbsp;Regarding Silverlight, no news, but thanks for the scenario and it's great to know you're interested.&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9729824</link><pubDate>Fri, 12 Jun 2009 07:35:36 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9729824</guid><dc:creator>David Cuccia</dc:creator><description>&lt;p&gt;Hi Stephen,&lt;/p&gt;
&lt;p&gt;Thanks for letting me know re: Silverlight. Could you say whether there's anything truly &amp;quot;missing&amp;quot; in the CoreCLR to port a library like the June CTP, or is it just a productization issue?&lt;/p&gt;
&lt;p&gt;Thanks,&lt;/p&gt;
&lt;p&gt;David&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9736069</link><pubDate>Fri, 12 Jun 2009 19:57:18 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9736069</guid><dc:creator>toub</dc:creator><description>&lt;p&gt;Hi David-&lt;/p&gt;
&lt;p&gt;There's nothing I know of inherent to Silverlight or CoreCLR that would prevent a library similar to TPL from working. &amp;nbsp;Building TPL purely in transparent code would require some changes and would likely suffer from a performance perspective (to what extent I can't say), but the bulk of the public API surface can surely be made to work.&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9737668</link><pubDate>Fri, 12 Jun 2009 22:12:53 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9737668</guid><dc:creator>David Cuccia</dc:creator><description>&lt;p&gt;That's great to hear, thanks for following up! FWIW, the types of parallelization we use PFX for on the desktop (that we'd like to port to our SL tool) are very granular parameter sweeps, so any performance hit for coordination/task-stealing/etc would be minimal for us. We wait in patient anticipation. ;) In the meantime, we have the ThreadPool, PowerThreading, and Joe Duffy's masterpiece.&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9753387</link><pubDate>Mon, 15 Jun 2009 17:30:52 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9753387</guid><dc:creator>Luke Puplett</dc:creator><description>&lt;p&gt;I notice that many examples (and I know that they're for the purposes of example) use the ProcessorCount as a guide to division of work.&lt;/p&gt;
&lt;p&gt;However, in a highly parallel environment such as those we hope to be building as standard practice in the coming years or in ASP or WCF services today, the code may already be executing down a &amp;quot;branch&amp;quot; of concurrent work, so it may not be desirable to parallelize a task further.&lt;/p&gt;
&lt;p&gt;Has the team thought about a supervising component that can be 'asked' about how many concurrent tasks are executing and whether to split?&lt;/p&gt;
&lt;p&gt;Where I have recursive calls to a parallel method I use a static field to keep a count of DOP; work may continue on the current thread and then if some other work completes the next iteration takes up the slack by going parallel again.&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9753457</link><pubDate>Mon, 15 Jun 2009 17:51:28 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9753457</guid><dc:creator>toub</dc:creator><description>&lt;p&gt;Hi Luke-&lt;/p&gt;
&lt;p&gt;Thanks for the note. &amp;nbsp;Yes, we have thought about it, and in fact that's similar in principle to how Parallel.For/ForEach behave. &amp;nbsp;They don't immediately partition into N==# of logical partitions. &amp;nbsp;Rather, they scale up dynamically as resources become available; if all of the other threads in the ThreadPool are busy doing other work, Parallel.For/ForEach will in effect execute sequentially, until such time as other threads are available, at which point they'll increase their degree of parallelism appropriately. (There are a few issues with this behavior in Beta 1, but they're being addressed.)&lt;/p&gt;
&lt;p&gt;As you mention, this is important for environments like ASP.NET. &amp;nbsp;You should be able to implement a parallelized component and use that component in ASP.NET. &amp;nbsp;At peak times when your servers are saturated with requests, the loops will likely run sequentially. &amp;nbsp;But at off-hours when there are more CPU resources to go around, the loops will be able to increase their degree of parallelism, helping to decrease the latency for individual requests.&lt;/p&gt;
&lt;p&gt;Now, there may still be times when you want to employ a counter like what you've described. &amp;nbsp;As much as we try to keep the cost down, there is some overhead to using Parallel.* and Tasks, so it's possible on a scenario by scenario basis you could avoid some of this cost by using interlocks to keep track of a work count and only going parallel if that current work count is below some threshold.&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9753866</link><pubDate>Mon, 15 Jun 2009 19:59:52 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9753866</guid><dc:creator>Luke Puplett</dc:creator><description>&lt;p&gt;Thanks very much for the prompt response, Stephen.&lt;/p&gt;
&lt;p&gt;From what you say, I now understand that I can code with Parallel.* and not have to think about this issue.&lt;/p&gt;
&lt;p&gt;I'm considering the view of a developer that knows not of her code's 'position' within a call stack, or a component that may end up being called recursively (even if indirectly via side-effects).&lt;/p&gt;
&lt;p&gt;So last question before lunch:&lt;/p&gt;
&lt;p&gt;Would your guidance be to base sync/async decisions on the available ThreadPool threads or is there a TPL way to just keep on banging out new tasks safe in the knowledge that the DOP is being managed by people much smarter than me :) ?&lt;/p&gt;
&lt;p&gt;(I'm sorry if I've drifted away from topic, but I hope I'm not the only one who worries about writing something that if used unexpectedly, explodes into zillions of queued items) --Luke&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9757771</link><pubDate>Tue, 16 Jun 2009 06:55:28 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9757771</guid><dc:creator>toub</dc:creator><description>&lt;p&gt;Hi Luke-&lt;/p&gt;
&lt;p&gt;In an ideal world, I'd like to say that Task and Parallel.For/ForEach have no overhead so you can use them without concern for that. &amp;nbsp;Unfortunately, parallelization does incur overheads, and while we work hard to keep those overheads minimal, they still exist. &amp;nbsp;Thus, there are several facets to an answer to your question.&lt;/p&gt;
&lt;p&gt;We have explicitly designed our APIs with composition in mind, such that if you for example create nested Parallel.For calls, they'll play nicely together. &amp;nbsp;It doesn't matter whether they're both in your assembly, or whether your assembly has a Parallel.For that calls into another assembly that has a Parallel.For, they should still play nicely together. &amp;nbsp;So, if you have an individual safe loop that has enough work to warrant parallelizing it, by all means, parallelize it. &amp;nbsp;Others can consume your library in their code parallelized with other TPL constructs, and they'll play together nicely.&lt;/p&gt;
&lt;p&gt;Now, a key statement in my previous paragraph was &amp;quot;loop that has enough work to warrant parallelizing it&amp;quot;. &amp;nbsp;It's often the case that we want to use Tasks, for example, to parallelize a large operation, such as a recursive divide-and-conquer decomposition, e.g. quicksort. &amp;nbsp;Using Tasks towards the root of the decomposition is likely a good thing to do, as the work represented by those tasks will be significant. &amp;nbsp;If, however, you continue to divide-and-conquer, using tasks for every level of the recursion, eventually you'll end up sorting just one or two elements in the body of a Task, which is extremely little work when compared to the cost of a Task (which, as I mentioned, while small, isn't negligible). &amp;nbsp;Thus, in order to get the best speedups, you do want to consider switching over to a sequential implementation at some point in the recursion, similar to how in a recursive sequential implementation of quicksort, at some point in the recursion you likely switch over to a different algorithm, like insertion sort, for better overall performance. &amp;nbsp;At what point you switch over to sequential in such a problem really needs to be based on the details of the problem and on performance testing. &amp;nbsp;If, however, each of the Tasks you're spinning up contains enough work in and of itself to justify the costs of that Task, then by all means keep on &amp;quot;banging out new tasks&amp;quot;, as that will enable your code to scale well on larger and larger machines.&lt;/p&gt;
&lt;p&gt;I hope that helps.&lt;/p&gt;
</description></item><item><title>Achieving Speedups with Small Parallel Loop Bodies | VishwaTech IT News</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9769231</link><pubDate>Wed, 17 Jun 2009 15:22:05 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9769231</guid><dc:creator>Achieving Speedups with Small Parallel Loop Bodies | VishwaTech IT News</dc:creator><description>&lt;p&gt;PingBack from &lt;a rel="nofollow" target="_new" href="http://www.vishwatech.com/globalnews/?p=1719"&gt;http://www.vishwatech.com/globalnews/?p=1719&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>re: Achieving Speedups with Small Parallel Loop Bodies</title><link>http://blogs.msdn.com/pfxteam/archive/2009/06/06/9703059.aspx#9788426</link><pubDate>Fri, 19 Jun 2009 17:44:18 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9788426</guid><dc:creator>Luke Puplett</dc:creator><description>&lt;p&gt;Thanks again. You're remarkably perceptive because I am indeed working on a Conqueror&amp;lt;T&amp;gt; class which is designed to take a Lambda to split the sides of a collection. It then, depending upon DOP and a 'work size' threshold, works on one side of the results (using anon methods or recursive calls) on a new thread while the calling thread falls through to execute the other 'side' delegate.&lt;/p&gt;
&lt;p&gt;It got me thinking about how even MS approach this puzzle, considering even within the AppDomain some code may call into, say, the Robotics SDK with its CCR API (if both look to the same threadpool to make decisions then no worries I guess) &lt;/p&gt;
&lt;p&gt;-- it's like &amp;quot;I'm going parallel&amp;quot;, &amp;quot;Well, so am I&amp;quot;, &amp;quot;Well, that'd just create a switching storm, back away from my threadpool!&amp;quot;&lt;/p&gt;
&lt;p&gt;Anyway, it's way off topic now and I'm suitably furnished with what you've said. Thanks.&lt;/p&gt;
</description></item></channel></rss>