<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Parallel Aggregations in PLINQ</title><link>http://blogs.msdn.com/pfxteam/archive/2008/01/22/7211660.aspx</link><description>Quick Overview of LINQ Aggregations In order to explain the issues we encounter when parallelizing aggregations in PLINQ, let's first take a quick look at how aggregations work in LINQ. Aggregation is an operation that iterates over a sequence of input</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>re: Parallel Aggregations in PLINQ</title><link>http://blogs.msdn.com/pfxteam/archive/2008/01/22/7211660.aspx#7222782</link><pubDate>Thu, 24 Jan 2008 17:54:02 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7222782</guid><dc:creator>Thomas Danecker</dc:creator><description>&lt;p&gt;I don't understand why you're building the Aggregate method that complicated. If you make func a binary method of the form TAccumulate x TAccumulate -&amp;gt; TAccumulate, you're getting an algebraic structure.&lt;/p&gt;
&lt;p&gt;So, (TAccumulate, func) has to be a commutative Monoid which implies all the properties you described.&lt;/p&gt;
&lt;p&gt;This can be easily achieved by introducing a valueSelector: TSource -&amp;gt; TAccumulate.&lt;/p&gt;
&lt;p&gt;You're getting the following method signature:&lt;/p&gt;
&lt;p&gt;public static TResult Aggregate&amp;lt;TSource, TAccumulate, TResult&amp;gt;(&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;this IEnumerable&amp;lt;TSource&amp;gt; source,&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;Func&amp;lt;TSource, TAccumulate&amp;gt; valueSelector,&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;TAccumulate identityElement,&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;Func&amp;lt;TAccumulate, TAccumulate, TAccumulate&amp;gt; func,&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;Func&amp;lt;TAccumulate, TResult&amp;gt; resultSelector&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;)&lt;/p&gt;
&lt;p&gt;{&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;TAccumulate accumulator = identityElement;&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;foreach (TSource elem in source)&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;{&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;accumulator = func(accumulator, valueSelector(elem));&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;}&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;return resultSelector(accumulator);&lt;/p&gt;
&lt;p&gt;}&lt;/p&gt;
&lt;p&gt;Now you can split up the source and compute thread-local accumulators and use also func(accumulator1, accumulator2) to combine them.&lt;/p&gt;
&lt;p&gt;I do not like the fact to provide two different accumulator functions. It makes no sense either because the second method must be quasi the same as the first because of the nondeterminism (i.e. every value could be processed by a seperate thread).&lt;/p&gt;
&lt;p&gt;With this version, you can implement the Aggregate method like a binary tree where the two childs are always independent from the other.&lt;/p&gt;
&lt;p&gt;func(&lt;/p&gt;
&lt;p&gt; &amp;nbsp;func(&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;valueSelector(elem0),&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;valueSelector(elem1))&lt;/p&gt;
&lt;p&gt; &amp;nbsp;func(&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;valueSelector(elem2),&lt;/p&gt;
&lt;p&gt; &amp;nbsp; &amp;nbsp;valueSelector(elem3)))&lt;/p&gt;
&lt;p&gt;The leaves of the tree are just calls to valueSelector(elem). In this tree you have always the option to run a child in an seperate thread.&lt;/p&gt;
</description></item><item><title>re: Parallel Aggregations in PLINQ</title><link>http://blogs.msdn.com/pfxteam/archive/2008/01/22/7211660.aspx#7226373</link><pubDate>Thu, 24 Jan 2008 23:07:38 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7226373</guid><dc:creator>igoro</dc:creator><description>&lt;p&gt;Thanks for your suggestion, Thomas.&lt;/p&gt;
&lt;p&gt;We have thought about the approach you are proposing previously. I agree that it is logically cleaner, and has all the desired algebraic properties. Unfortunately, it also comes with a price: an extra delegate invocation for each element in the input.&lt;/p&gt;
&lt;p&gt;Another possible issue is that your proposed approach would rule out aggregations with mutable accumulators. For example, say that you have an aggregation over an IEnumerable&amp;lt;char&amp;gt;, and you want to compute how many times each letter (A-Z) appears. TAccumulate might be an integer array with 26 elements, representing the count of each letter. Value selector would have to return an array of 26 integers, so we'd end up constructing an array for each input element. On the other hand, a (TAccumulate, TElement)-&amp;gt;TAccumulate reduce function could simply increment one element in the array, and return the same array.&lt;/p&gt;
&lt;p&gt;Note that as of the CTP release, PLINQ does not support mutable accumulators, but it is an extension we are considering. We would need an overload where the user provides a seedConstructor delegate instead of a seed instance, so that we can have multiple independent accumulators.&lt;/p&gt;
&lt;p&gt;So, that's my take on this issue. However, the Aggregate design is far from closed, and we are are open to consider other solutions.&lt;/p&gt;
&lt;p&gt;Igor&lt;/p&gt;
</description></item><item><title>re: Parallel Aggregations in PLINQ</title><link>http://blogs.msdn.com/pfxteam/archive/2008/01/22/7211660.aspx#7358615</link><pubDate>Fri, 01 Feb 2008 00:07:39 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7358615</guid><dc:creator>David Callahan</dc:creator><description>&lt;p&gt;Another issue with simply mapping TSource =&amp;gt; TAccumator is that operation of combining TSource's may be singificantly cheaper than combining TAccumator's (think Maxtrix-matrix multiplication instead of matrix-vector multiplication).&lt;/p&gt;
&lt;p&gt;A different use of a function mapping TSource-&amp;gt;TAccumaultor is to avoid the change of &amp;quot;seed&amp;quot; to &amp;quot;identify&amp;quot;. Seed is nice because it gives semantics to zero-length sequences. A mapping from source to accumulator can be used to handle sequences of lenght 1 and avoid the need for a separate zero element or change in semantics of the seed elelement.&lt;/p&gt;
</description></item></channel></rss>