<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>MSDN Utopia : Parallelism</title><link>http://blogs.msdn.com/salvapatuel/archive/tags/Parallelism/default.aspx</link><description>Tags: Parallelism</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Design: Task Parallel Library explored</title><link>http://blogs.msdn.com/salvapatuel/archive/2007/11/11/task-parallel-library-explored.aspx</link><pubDate>Mon, 12 Nov 2007 00:11:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:6117352</guid><dc:creator>Salva Patuel</dc:creator><slash:comments>10</slash:comments><comments>http://blogs.msdn.com/salvapatuel/comments/6117352.aspx</comments><wfw:commentRss>http://blogs.msdn.com/salvapatuel/commentrss.aspx?PostID=6117352</wfw:commentRss><description>&lt;FONT face=Calibri&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;As some of you may know, the .NET thread pool helps many developers use multiple threads in their applications, “sometimes” increasing responsiveness and taking over all the thread-switching responsibility. But those of you who like to know how it really works usually find that it is not the best thread pool library. The internal reason is mostly legacy code. If we had to write it again we would do it in a completely different way, but that is a story for another post. What I really want to focus on today is the inner workings of the Task Parallel Library (TPL), which complements an important part of it: organized parallelism.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;I am not planning to explain what the TPL is; if you need that information, I recommend the basic article that introduces the library &lt;/FONT&gt;&lt;A href="http://msdn.microsoft.com/msdnmag/issues/07/10/Futures/default.aspx"&gt;&lt;FONT size=3&gt;here&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3&gt;. The idea is to show you how the internal architecture works and some considerations for using it: it is not a magical library, and you can still write bad code with it.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;Let’s start with the general architecture. As the first figure shows, the TPL sits just above the CLR; both teams have been working together to coordinate the implementation. The TPL contains the task scheduler, which is available for extensibility. The best examples of this extensibility are the Parallel framework and PLINQ.&lt;/FONT&gt;&lt;/P&gt;&lt;/FONT&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 12pt; TEXT-ALIGN: justify"&gt;&lt;FONT face=Calibri&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;IMG style="WIDTH: 313px; HEIGHT: 164px" height=164 src="http://5uaopw.blu.livefilestore.com/y1pvtkJMzymZotiu3c4SFdKvsHxrZNMCMoPDfITBLeafAgDgjpPSG9nXcTcFQiMpoDAucq4MJEFi7MeVicEXxxqkA/TPL%20Architecture.jpg" width=313 mce_src="http://5uaopw.blu.livefilestore.com/y1pvtkJMzymZotiu3c4SFdKvsHxrZNMCMoPDfITBLeafAgDgjpPSG9nXcTcFQiMpoDAucq4MJEFi7MeVicEXxxqkA/TPL%20Architecture.jpg"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;FONT face=Calibri&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;Let’s start from the top of the stack: Parallel FX publishes a set of methods that give developers access to the library (note that they are static, or Shared in VB):&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="BACKGROUND: #eeeeee; MARGIN: 0cm 0cm 12pt; VERTICAL-ALIGN: top; LINE-HEIGHT: 18pt; tab-stops: 45.8pt 91.6pt 137.4pt 183.2pt 229.0pt 274.8pt 320.6pt 366.4pt 412.2pt 458.0pt 503.8pt 549.6pt 595.4pt 641.2pt 687.0pt 732.8pt"&gt;&lt;SPAN style="COLOR: black; mso-fareast-language: EN-GB; mso-bidi-font-family: 'Courier New'; mso-fareast-font-family: 'Times New Roman'"&gt;&lt;FONT size=3&gt;Parallel.For (start, end, delegate(int param) { a[param] = a[param]*a[param]; });&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;The method invokes the TPL, passing the parameters and the delegate to execute (note that this can be a lambda expression as well!). The interesting bit comes at this stage, when the request enters the main queue. The library evaluates how many threads the task will require, trying to find the optimal number based on the number of cores and the length of the task. It may even decide to use a single thread in sequential mode. This scales well because developers do not need to know how many cores the final users will have, and the application will behave differently if the system is scaled up.&lt;/FONT&gt;&lt;/P&gt;
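&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;For those who want to run the call shown earlier, here is a minimal, self-contained sketch. It assumes the Parallel.For overload that later shipped in .NET 4 (namespace System.Threading.Tasks), which may differ from the CTP; the SquareDemo class and the SquareAll method are hypothetical names used only for illustration. Note that the code never says how many threads to use.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```csharp
using System;
using System.Threading.Tasks;

public static class SquareDemo
{
    // Hypothetical helper: squares every element in place and returns the same array.
    public static int[] SquareAll(int[] a)
    {
        // Each iteration touches only a[i], so the iterations are independent
        // and the library is free to run them on as many threads as it sees fit.
        Parallel.For(0, a.Length, delegate(int i) { a[i] = a[i] * a[i]; });
        return a;
    }

    public static void Main()
    {
        int[] squares = SquareAll(new int[] { 1, 2, 3, 4, 5 });
        foreach (int value in squares)
        {
            Console.WriteLine(value);
        }
    }
}
```
&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;Because every iteration writes to its own array slot, the result is the same no matter how the iterations are distributed.&lt;/FONT&gt;&lt;/P&gt;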
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 12pt; TEXT-ALIGN: justify"&gt;&lt;/FONT&gt;&lt;FONT face=Calibri&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;IMG style="WIDTH: 396px; HEIGHT: 224px" height=224 src="http://5uaopw.blu.livefilestore.com/y1pvtkJMzymZosQa2Yj1zDl7QVPyzt8l84vTGNaq2I5koI2PevELfAjVHYGk6vkpf3F0wkRQTCfNjSmsoRcKlbzVA/TPL%20Architecture%202.jpg" width=396 mce_src="http://5uaopw.blu.livefilestore.com/y1pvtkJMzymZosQa2Yj1zDl7QVPyzt8l84vTGNaq2I5koI2PevELfAjVHYGk6vkpf3F0wkRQTCfNjSmsoRcKlbzVA/TPL%20Architecture%202.jpg"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;FONT face=Calibri&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;To avoid a bottleneck when the threads consume the queue, the TPL implements one queue per thread. The main queue distributes the tasks equally across the available threads, and each thread starts executing the delegates waiting in its own queue; this is a common pattern applied in OpenMP. In theory this works fine but, as we know, the cores are not always available to the threads, so one thread may finish its queue earlier than the others. This situation is not optimal, so the queue-stealing model is applied. When a thread has no more tasks in its queue, it starts to query the neighbouring queues looking for tasks. If it finds any, it removes them from the busy queue, respecting the order, which optimizes processor time. As you may have guessed by now, the threads are not based on the thread pool, as they have core affinity.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;This execution model does not guarantee the execution order: if you have 1000 tasks on a 4-core machine, each thread should get 250 tasks, but a thread that finishes early will consume tasks from the others. This is an important consideration if you need to respect a certain execution order. As I said at the beginning, it is important to use these extensions only when you are completely sure that the tasks are independent. The same applies if the delegates use shared memory: contention is possible in these scenarios, and it will limit scalability.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;The final architectural consideration concerns exceptions. In a normal sequential “for”, the loop stops if an exception is thrown; with the parallel library this becomes a little more difficult.&lt;/FONT&gt;&lt;/P&gt;
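&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;To make the shared-memory point concrete, here is a small sketch in which every iteration updates one shared total. It assumes the Parallel.For and Interlocked.Add APIs as they exist in released versions of .NET; SharedSumDemo and SumTo are hypothetical names. The atomic add keeps the result correct whatever order the iterations run in, but all threads contend on the same variable, which is exactly the kind of cost that limits scalability.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SharedSumDemo
{
    // Hypothetical helper: adds 0, 1, ..., count - 1 into one shared variable.
    public static long SumTo(int count)
    {
        long total = 0;
        Parallel.For(0, count, delegate(int i)
        {
            // Atomic add: correct in any execution order,
            // but every iteration contends on the same memory location.
            Interlocked.Add(ref total, i);
        });
        return total;
    }

    public static void Main()
    {
        // 0 + 1 + ... + 999 = 499500, regardless of how the work is split.
        Console.WriteLine(SumTo(1000));
    }
}
```
&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;A plain, non-atomic addition in the same place could silently lose updates, because two threads may read the old total at the same time.&lt;/FONT&gt;&lt;/P&gt;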
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 12pt; TEXT-ALIGN: justify"&gt;&lt;/FONT&gt;&lt;FONT face=Calibri&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;IMG style="WIDTH: 314px; HEIGHT: 280px" height=280 src="http://5uaopw.blu.livefilestore.com/y1pvtkJMzymZov4pWxxZiLyH9P5PO6EB1-WC4_7dJsLtR5YcPWncPuiVnyglQq4FR8eVaWxzwCJWS1LEZtgZYkc5g/TPL%20Architecture%203.jpg" width=314 mce_src="http://5uaopw.blu.livefilestore.com/y1pvtkJMzymZov4pWxxZiLyH9P5PO6EB1-WC4_7dJsLtR5YcPWncPuiVnyglQq4FR8eVaWxzwCJWS1LEZtgZYkc5g/TPL%20Architecture%203.jpg"&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;FONT face=Calibri&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;If an exception occurs in one of the threads, the TPL bubbles it up and informs the other threads that the execution needs to be cancelled. This happens automatically, but the TPL cannot guarantee that no tasks are executed after the exception. Now, what happens if two threads throw an exception at the same time? The parallel library throws a special exception type that contains both exceptions. This is something important to consider.&lt;/FONT&gt;&lt;/P&gt;
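&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;Here is a hedged sketch of what handling that combined exception can look like. It assumes the released .NET API, where the wrapper type is System.AggregateException and the individual failures are exposed through InnerExceptions; the CTP may use a different type. ExceptionDemo and CountFailures are hypothetical names.&lt;/FONT&gt;&lt;/P&gt;
&lt;PRE&gt;
```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionDemo
{
    // Hypothetical helper: runs a loop in which some iterations fail
    // and returns how many exceptions the loop reported.
    public static int CountFailures()
    {
        try
        {
            Parallel.For(0, 100, delegate(int i)
            {
                if (i % 10 == 0)
                {
                    throw new InvalidOperationException("item " + i + " failed");
                }
            });
            return 0;
        }
        catch (AggregateException ex)
        {
            // One entry per exception that was thrown before the loop stopped.
            foreach (Exception inner in ex.InnerExceptions)
            {
                Console.WriteLine(inner.Message);
            }
            return ex.InnerExceptions.Count;
        }
    }

    public static void Main()
    {
        Console.WriteLine("failures reported: " + CountFailures());
    }
}
```
&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;The number of reported failures can change from run to run, because it depends on how many iterations were already running when the first exception stopped the loop.&lt;/FONT&gt;&lt;/P&gt;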
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;To summarize this post, I must confess that I am quite excited about the potential of this library. It is only the first version, but I can see the future of parallel computing in a single place. If we think about the next step... mmm... maybe including grid computing... :) I had better stop here.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0cm 0cm 10pt; TEXT-ALIGN: justify"&gt;&lt;FONT size=3&gt;UPDATED: New Microsoft Parallel Computing Developer Section (including CTP download: &lt;A href="http://msdn2.microsoft.com/en-us/concurrency/default.aspx" target=_new rel=nofollow&gt;&lt;FONT color=#35648c&gt;http://msdn2.microsoft.com/en-us/concurrency/default.aspx&lt;/FONT&gt;&lt;/A&gt;)&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=6117352" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/salvapatuel/archive/tags/Parallelism/default.aspx">Parallelism</category><category domain="http://blogs.msdn.com/salvapatuel/archive/tags/Architecture/default.aspx">Architecture</category><category domain="http://blogs.msdn.com/salvapatuel/archive/tags/TPL/default.aspx">TPL</category></item><item><title>Design: How good SOA can help you dealing with multi-cores</title><link>http://blogs.msdn.com/salvapatuel/archive/2007/07/09/design-how-good-soa-can-help-you-dealing-with-multi-cores.aspx</link><pubDate>Mon, 09 Jul 2007 17:59:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3782615</guid><dc:creator>Salva Patuel</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/salvapatuel/comments/3782615.aspx</comments><wfw:commentRss>http://blogs.msdn.com/salvapatuel/commentrss.aspx?PostID=3782615</wfw:commentRss><description>&lt;FONT face=Calibri&gt;
&lt;P style="TEXT-ALIGN: justify; MARGIN: 0cm 0cm 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 12pt"&gt;Unless you have been living under a rock for the last 5 years, you should know about the multi-core architecture that the new processors bring to our world. Processors can hardly go any faster with the current technology, so we need to start having parallel thoughts...&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="TEXT-ALIGN: justify; MARGIN: 0cm 0cm 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 12pt"&gt;So, you are about to design a solution and you are facing the challenge &lt;I style="mso-bidi-font-style: normal"&gt;&lt;SPAN style="COLOR: #984806; mso-themecolor: accent6; mso-themeshade: 128"&gt;“How can I make the most of the multi-core processor that my server has?” &lt;/SPAN&gt;&lt;/I&gt;It is not always easy to take advantage of this extra horsepower, because parallel design skills are often lacking; I have seen many attempts to achieve this that produced more headaches than solutions. &lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="TEXT-ALIGN: justify; MARGIN: 0cm 0cm 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 12pt"&gt;After working on several projects where we were trying to improve the performance of certain components, I came across multiple solutions, but one tendency for solving this problem comes from within the SOA shadow. Thinking in terms of services can help you divide the functionality into pieces that can be parallelized; running them as different processes provides a natural processor optimization and gets the maximum output from the multi-core environment.&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P style="TEXT-ALIGN: justify; MARGIN: 0cm 0cm 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 12pt"&gt;I have often seen a SOA model that includes a multi-layer service hierarchy derived from an ontology model, where the hierarchy leads to the conclusion that some services must be componentized within a single service to reduce complexity. The problem with this merge is that developers sometimes fail to keep the components separate (adding dependencies), or they limit the multi-threading capabilities of the hardware layer.&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/FONT&gt;
&lt;P style="TEXT-ALIGN: justify; MARGIN: 0cm 0cm 10pt" class=MsoNormal&gt;&lt;o:p&gt;&lt;FONT face=Calibri&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;IMG style="WIDTH: 481px; HEIGHT: 127px" src="http://5uaopw.blu.livefilestore.com/y1pnZqMcd2-6vzqkJUJJOkU8t3rJVbvQ5ndEcrX9U6nnKl_QkUcTsqPpFy37BscyTaeqn6nI8KOZjQ/SOA.jpg" width=481 height=127 mce_src="http://5uaopw.blu.livefilestore.com/y1pnZqMcd2-6vzqkJUJJOkU8t3rJVbvQ5ndEcrX9U6nnKl_QkUcTsqPpFy37BscyTaeqn6nI8KOZjQ/SOA.jpg"&gt;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;&lt;FONT face=Calibri&gt;
&lt;P style="TEXT-ALIGN: justify; MARGIN: 0cm 0cm 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 12pt"&gt;In the figure above we can see the service hierarchy that encapsulates services C1 and C2. When they are implemented in a single process, the services need to handle the threading model and the communication model properly, using memory synchronization patterns. This adds locking and latency problems that reduce performance and scalability when the service is scaled up.&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/FONT&gt;
&lt;P style="TEXT-ALIGN: justify; MARGIN: 0cm 0cm 10pt" class=MsoNormal&gt;&lt;o:p&gt;&lt;FONT face=Calibri&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;IMG style="WIDTH: 193px; HEIGHT: 281px" src="http://5uaopw.blu.livefilestore.com/y1pWwfjZBHtKoKGfVkWDQbjgXwLPk0yRolZ-T32p8i9AQCoxkXXtG6GoWE264sYxHI_BN2gwdhjQYo/SOA%202.jpg" width=193 height=281 mce_src="http://5uaopw.blu.livefilestore.com/y1pWwfjZBHtKoKGfVkWDQbjgXwLPk0yRolZ-T32p8i9AQCoxkXXtG6GoWE264sYxHI_BN2gwdhjQYo/SOA%202.jpg"&gt;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;&lt;FONT face=Calibri&gt;
&lt;P style="TEXT-ALIGN: justify; MARGIN: 0cm 0cm 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 12pt"&gt;If we decide to divide our services into different processes and use the common contract as the communication layer, we can solve the problem, reducing complexity and isolating the threads. This helps you reduce component coupling, and it scales up better and more cheaply. There is no need to synchronize shared resources, as process isolation takes care of the individual heaps.&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="TEXT-ALIGN: justify; MARGIN: 0cm 0cm 10pt" class=MsoNormal&gt;&lt;o:p&gt;&lt;FONT face=Calibri&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;IMG style="WIDTH: 214px; HEIGHT: 298px" src="http://5uaopw.blu.livefilestore.com/y1psE7tmHP46RC4-JAqGKDV86mDX5WCg08SUNME8hf_XXHAfifkwIcx2NCsZ6xG0LiDtpOT-E4utv4/SOA%203.jpg" width=214 height=298 mce_src="http://5uaopw.blu.livefilestore.com/y1psE7tmHP46RC4-JAqGKDV86mDX5WCg08SUNME8hf_XXHAfifkwIcx2NCsZ6xG0LiDtpOT-E4utv4/SOA%203.jpg"&gt;&lt;/FONT&gt;&lt;/o:p&gt;&lt;/P&gt;&lt;FONT face=Calibri&gt;
&lt;P style="MARGIN: 0cm 0cm 10pt" class=MsoNormal&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-SIZE: 12pt; mso-fareast-language: EN-GB"&gt;What is more, the model can use processor affinity to control the load on the hardware. This can become more common in a 4x4 architecture, where 16 processors can give you not only service isolation but also hardware isolation. As software architects need to deal with teams that lack the proper parallelism skills, this type of approach can help you reduce project risk and deliver a proper set of services to the business processes, which gives you more flexibility if a new process needs the C1 service directly.&lt;/SPAN&gt;&lt;SPAN style="LINE-HEIGHT: 115%; FONT-FAMILY: 'Times New Roman','serif'; FONT-SIZE: 12pt; mso-bidi-font-family: 'Times New Roman'; mso-fareast-language: EN-GB; mso-bidi-theme-font: minor-bidi"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;o:p&gt;&lt;FONT face=Calibri&gt;&lt;/FONT&gt;&lt;/o:p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3782615" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/salvapatuel/archive/tags/SOA/default.aspx">SOA</category><category domain="http://blogs.msdn.com/salvapatuel/archive/tags/Parallelism/default.aspx">Parallelism</category><category domain="http://blogs.msdn.com/salvapatuel/archive/tags/Architecture/default.aspx">Architecture</category><category domain="http://blogs.msdn.com/salvapatuel/archive/tags/Design/default.aspx">Design</category></item></channel></rss>