<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Parallel Scalability Isn’t Child’s Play</title><link>http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx</link><description>In a recent blog entry, Dr. Neil Gunther, a colleague from the Computer Measurement Group (CMG), warned about unrealistic expectations being raised with regard to the performance of parallel programs on current multi-core hardware. Neil’s blog entry</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>re: Parallel Scalability Isn’t Child’s Play</title><link>http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx#9482197</link><pubDate>Tue, 17 Mar 2009 03:40:08 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9482197</guid><dc:creator>Sunny</dc:creator><description>&lt;p&gt;There is no FSB in Nehalem processor-based platforms. Nehalem uses Intel QuickPath Interconnect technology to connect multiple processors and the I/O hub. See &lt;a rel="nofollow" target="_new" href="http://www.intel.com/technology/quickpath/index.htm"&gt;http://www.intel.com/technology/quickpath/index.htm&lt;/a&gt; for more information.&lt;/p&gt;
</description></item><item><title>re: Parallel Scalability Isn’t Child’s Play</title><link>http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx#9483972</link><pubDate>Tue, 17 Mar 2009 18:53:28 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9483972</guid><dc:creator>Antonio Drusin</dc:creator><description>&lt;p&gt;I think there is at least one (semi) PIM machine out there: the Cell processor by IBM/Toshiba/Sony. It has 256KB of RAM attached to each processing unit and uses explicit DMA to push/pull info from external memory. It seems to perform well.&lt;/p&gt;
</description></item><item><title>Parallel Scalability Isn’t Child’s Play, Part 2: Amdahl’s Law vs. Gunther’s Law</title><link>http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx#9575050</link><pubDate>Wed, 29 Apr 2009 08:23:36 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9575050</guid><dc:creator>Developer Division Performance Engineering blog</dc:creator><description>&lt;p&gt;Part 1 of this series of blog entries discussed results from simulating the performance of a massively&lt;/p&gt;
</description></item><item><title>re: Parallel Scalability Isn’t Child’s Play</title><link>http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx#9718433</link><pubDate>Tue, 09 Jun 2009 23:49:08 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9718433</guid><dc:creator>Jibey Jacob</dc:creator><description>&lt;p&gt;Don't most Web servers theoretically host massively parallel apps? Each HTTP request is processed by spinning off a thread, and HTTP is ideal for this since it's stateless. So, according to this article, IIS wouldn't perform exponentially well going from two to four cores and higher:&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx"&gt;http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx&lt;/a&gt;&lt;/p&gt;
</description></item><item><title>re: Parallel Scalability Isn’t Child’s Play</title><link>http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx#9718446</link><pubDate>Wed, 10 Jun 2009 00:02:41 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9718446</guid><dc:creator>Jibey Jacob</dc:creator><description>&lt;p&gt;Allow me to refine what I've said. Since this article is suggesting that the problem is with multicore hardware, all claims that have been made about all these Web servers being massively scalable are false.&lt;/p&gt;
</description></item><item><title>re: Parallel Scalability Isn’t Child’s Play</title><link>http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx#9718625</link><pubDate>Wed, 10 Jun 2009 01:38:01 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9718625</guid><dc:creator>MarkBFriedman</dc:creator><description>&lt;p&gt;Excellent question/comment. &lt;/p&gt;
&lt;p&gt;I wasn’t particularly headed in that direction originally. But I do like the direction, and I promise I will address it in more detail at some point down the road before I get to the end of this topic. Web-site scalability isn't always child's play either. Reviewing published TPC benchmark results is very instructive in this regard -- see www.tpc.org for details. Note how the price per request tends to increase as you run the benchmarks on machines with more and more processors.&lt;/p&gt;
&lt;p&gt;As you suggest, the characteristics of most Web-based applications do, in fact, lend themselves to parallel processing. The stateless nature of the HTTP protocol certainly promotes this. HTTP requests can normally be processed in parallel by asynchronous agents, which in ASP.NET are dispatched and processed by individual worker threads from a thread pool. Transaction processing workloads, in general, share these characteristics. As you continue through the blog entries in this series, you will see that I do start to drill in on the performance characteristics of the .NET thread pool. &lt;/p&gt;
&lt;p&gt;In the context of this set of postings, however, I have been avoiding transaction processing workloads generally. I am focusing mainly on the parallel programming initiatives here that are designed to give developers better programming abstractions and models for developing concurrent programs. HTTP request processing already has an inherent concurrent programming model that you are correct in thinking software developers have had no difficulty in adopting.&lt;/p&gt;
&lt;p&gt;Some approaches to parallel programming that I regard very highly try to apply a transaction processing pattern to other types of computing problems. This can be traced at least as far back as Tony Hoare's &amp;quot;communicating sequential processes&amp;quot; paradigm. A recent example is Rick Molloy's MSDN article entitled &amp;quot;Solving The Dining Philosophers Problem With Asynchronous Agents&amp;quot; available at &lt;a rel="nofollow" target="_new" href="http://msdn.microsoft.com/en-us/magazine/dd882512.aspx"&gt;http://msdn.microsoft.com/en-us/magazine/dd882512.aspx&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I am a big proponent of this approach. For example, there was a Microsoft Research project called Singularity (which you can read about here: &lt;a rel="nofollow" target="_new" href="http://msdn.microsoft.com/en-us/magazine/cc163603.aspx"&gt;http://msdn.microsoft.com/en-us/magazine/cc163603.aspx&lt;/a&gt;) that relied exclusively on message passing between asynchronous agents as the sole method of interprocess communication in an OS. That is a great idea, I believe.&lt;/p&gt;
&lt;p&gt;There is one caveat to be aware of, though, when considering ASP.NET workloads. For a number of reasons, not all ASP.NET request processing conforms to the pure, connectionless and stateless model for HTTP requests. Most transaction processing web sites maintain at least some session-oriented state in order to improve the user experience in trafficking on the web. (A good example from commercial transaction processing web sites is associating the user request with a specific, retained shopping cart value.) &lt;/p&gt;
&lt;p&gt;Moreover, to improve performance, many ASP.NET applications rely on caching, where frequently re-used state is retained across requests. (A decent, although unfortunately not particularly current, overview of the ASP.NET Caching mechanisms is here: &lt;a rel="nofollow" target="_new" href="http://msdn.microsoft.com/en-us/library/aa478965.aspx#aspnet-cachingtechniquesbestpract_topic4"&gt;http://msdn.microsoft.com/en-us/library/aa478965.aspx#aspnet-cachingtechniquesbestpract_topic4&lt;/a&gt;.) In ASP.NET, View state, Session state, the Page cache, and the Application cache are all mechanisms for retaining state across requests that larger web sites rely on to improve performance and scalability. All these caching mechanisms have significant performance and scalability considerations. ASP.NET also supports standard state-retention mechanisms that are incorporated into the HTTP protocol, such as cookies and query strings. IIS itself supplies additional caching mechanisms. And, finally, there is a Microsoft distributed caching technology known as Velocity (see &lt;a rel="nofollow" target="_new" href="http://msdn.microsoft.com/en-us/library/cc645013.aspx"&gt;http://msdn.microsoft.com/en-us/library/cc645013.aspx&lt;/a&gt; and &lt;a rel="nofollow" target="_new" href="http://code.msdn.microsoft.com/velocity"&gt;http://code.msdn.microsoft.com/velocity&lt;/a&gt; for details) that can be used across clustered ASP.NET web applications. Maintaining cache coherence in the ASP.NET Application-level cache, for example, has many of the same issues around parallel programming and coherence-oriented delays that I am highlighting here.&lt;/p&gt;
&lt;p&gt;Thanks for the comment, and stay tuned for more on this topic.&lt;/p&gt;
&lt;p&gt;-- Mark Friedman&lt;/p&gt;
</description></item><item><title>re: Parallel Scalability Isn’t Child’s Play</title><link>http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx#9718721</link><pubDate>Wed, 10 Jun 2009 02:40:25 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9718721</guid><dc:creator>MarkBFriedman</dc:creator><description>&lt;p&gt;To your first point, transaction processing workloads (which include web-oriented request processing) are inherently parallel. (But they can still block -- there are lots of gotchas.)&lt;/p&gt;
&lt;p&gt;To your second comment, not exactly the way I would phrase it, but I am cynical, too, about any general claim to massive scalability. In the immortal words of Ronald Reagan, &amp;quot;Trust, but verify.&amp;quot;&lt;/p&gt;
&lt;p&gt;It is worth noting that shared-nothing clustered approaches in massively parallel transaction processing have achieved some impressive results in scaling near linearly up to 1,000 machines, for example. We are generally trying to achieve something similar today on more general purpose (i.e., commodity) hardware, which is a challenge. Scalability can be achieved, but requires a long, tough push to get there on both the hardware and software sides. That holds for web applications, databases, HPC (e.g., OpenMP), etc. Minimizing shared state is crucial to achieving good scalability in all these application domains.&lt;/p&gt;
</description></item><item><title>re: Parallel Scalability Isn’t Child’s Play</title><link>http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx#9724474</link><pubDate>Wed, 10 Jun 2009 18:48:32 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9724474</guid><dc:creator>Jibey Jacob</dc:creator><description>&lt;p&gt;ASP.NET adds a lot of functionality over HTTP to make Web programming easier, and it has been my experience that state management in ASP.NET does not scale very well. Cookies and query strings in HTTP do not in any way impose performance penalties like those imposed by Session State, View State, etc. in ASP.NET.&lt;/p&gt;
&lt;p&gt;While the new APIs introduced in Windows Server 2003 for applications to take advantage of NUMA architectures are useful, well-written multithreaded code that simply uses primitive thread synchronization mechanisms would scale well on NUMA architectures, as the Windows scheduler is intelligent enough to facilitate those apps with the advantages offered by NUMA architectures.&lt;/p&gt;</description></item><item><title>re: Parallel Scalability Isn’t Child’s Play</title><link>http://blogs.msdn.com/ddperf/archive/2009/03/16/parallel-scalability-isn-t-child-s-play.aspx#9724530</link><pubDate>Wed, 10 Jun 2009 19:20:58 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9724530</guid><dc:creator>Jibey Jacob</dc:creator><description>&lt;p&gt;And there's a real problem generally in any computing network that uses Ethernet that I haven't seen addressed. Ethernet's global hierarchical model and the use of Ethernet device drivers in the kernel with full privileges open up the kernel to all kinds of attacks. Multithreaded programming would be impossible on such networks. I've sworn to never again work for any employer unless:&lt;/p&gt;
&lt;p&gt;1. Everything attached to the Internet around the planet is upgraded to IPv6. This implies use of InfiniBand at Layer 2, in lieu of Ethernet.&lt;/p&gt;
&lt;p&gt;2. The Internet is enhanced with the logic to biometrically authenticate human users and determine that the person being authenticated cannot be at more than one place at the same time. This requires the development of a new routing protocol that enhances IPv6 and one that uses GPS-based addressing in addition to the functionality in IPv6 addressing.&lt;/p&gt;
&lt;p&gt;3. WiMAX is rolled out around the planet so I can use a VoIP phone on this newly enhanced Internet wherever I go. My phone must be intelligent enough to tell me whether any phone number is on a PSTN or cell network, or whether it's a VoIP number, and I wouldn't want to deal with anyone on a PSTN or cell network.&lt;/p&gt;
&lt;p&gt;4. When I do start working again, I wouldn't want any device drivers running in the Windows kernel other than signed drivers from Microsoft. All other drivers will have to be user-mode drivers signed by their respective developer companies and certified by Microsoft WHQL.&lt;/p&gt;
&lt;p&gt;5. All monies that are due to me must be paid to me ASAP.&lt;/p&gt;
&lt;p&gt;Without all of this in place, I just can't be sure that I'm getting paid for my work.&lt;/p&gt;
</description></item></channel></rss>