<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Brandon Werner : concurrency</title><link>http://blogs.msdn.com/brandonwerner/archive/tags/concurrency/default.aspx</link><description>Tags: concurrency</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Software Transactional Memory: Debunked?</title><link>http://blogs.msdn.com/brandonwerner/archive/2009/01/14/software-transactional-memory-debunked.aspx</link><pubDate>Wed, 14 Jan 2009 12:45:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9319099</guid><dc:creator>brandon_werner</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/brandonwerner/comments/9319099.aspx</comments><wfw:commentRss>http://blogs.msdn.com/brandonwerner/commentrss.aspx?PostID=9319099</wfw:commentRss><wfw:comment>http://blogs.msdn.com/brandonwerner/rsscomments.aspx?PostID=9319099</wfw:comment><description>&lt;p&gt;If I go in to my excellent academic article organizer, &lt;a title="Papers" mce_href="http://mekentosj.com/papers/" href="http://mekentosj.com/papers/"&gt;Papers&lt;/a&gt;, and search for "software transaction memory" or "stm" I get at least 30 results of papers both high level and detailed regarding this next big thing that will allow us to finally, without any effort, take advantage of our multi-core CPUs and handle all the nasty locking and synchronization issues for us with nothing more than a language keyword. 
So much publicity has been given to this idea that no fewer than three presenters at the Google Scalability Conference mentioned it, with one presentation being nothing but a glimpse into the STM future.&lt;/p&gt;  &lt;p&gt;As &lt;a title="Thoughts On Google’s Conference on Scalability In Seattle" mce_href="http://www.brandonwerner.com/2008/06/16/thoughts-on-googles-conference-on-scalability-in-seattle/" href="http://www.brandonwerner.com/2008/06/16/thoughts-on-googles-conference-on-scalability-in-seattle/"&gt;I wrote then&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;This was the trend though, as all of the presentations had a bit of hand waving regarding performance metrics and distribution of computation. This was highlighted by the talk of Vijay Menon of Google - whose work at Intel I was familiar with - discussing Software Transactional Memory. He illustrated the challenges of implementing this in an imperative language (I’m suspicious you can even do STM well in an imperative language with state - &lt;a title="The Rise Of Functional Programming: F#/Scala/Haskell and the failing of Lisp" mce_href="http://www.brandonwerner.com/2008/01/13/the-rise-of-functional-programming-fscalahaskell-and-the-failing-of-lisp/" href="http://www.brandonwerner.com/2008/01/13/the-rise-of-functional-programming-fscalahaskell-and-the-failing-of-lisp/"&gt;as I discussed before&lt;/a&gt;) but beyond suggesting the keyword “atomic” to replace “synchronized” in the Java language there was very little real content discussed for those already familiar with the issue of locks and multiprocessors. Concurrent Haskell wasn’t even mentioned. &lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;It turns out that, according to a paper published in the Communications of the ACM, &lt;a title="Software transactional memory: why is it only a research toy?" 
mce_href="http://doi.acm.org/10.1145/1400214.1400228" href="http://doi.acm.org/10.1145/1400214.1400228"&gt;Software transactional memory: why is it only a research toy?&lt;/a&gt;, Software Transactional Memory may not work at all. The article presents research from IBM, who built the &lt;a title="IBM XL C/C++ for Transactional Memory for AIX" mce_href="http://www.alphaworks.ibm.com/tech/xlcstm/" href="http://www.alphaworks.ibm.com/tech/xlcstm/"&gt;IBM XL C/C++ for Transactional Memory for AIX&lt;/a&gt;, known as IBM STM, and also takes benchmarks from the Intel STM and the Sun TL2 STM. In the paper, they put the STM implementations through the wringer using a B+ tree and the Delaunay Mesh Refinement algorithm. It's well worth a read.&lt;/p&gt;  &lt;p&gt;Their final analysis puts a deep nail in the coffin of STM:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Based on our results, we believe that the road ahead for STM is quite challenging. Lowering the overheads of STM to a point where it is generally appealing is a difficult task and significantly better results have to be demonstrated. If we could stress a single direction for further research, it is the elimination of dynamically unnecessary read and write barriers—possibly the single most powerful lever toward further reduction of STM overheads. However, given the difficulty of similar problems explored by the research community such as alias analysis, escape analysis, and so on, this may be an uphill battle. And because the argument for TM hinges upon its simplicity and productivity benefits, we are deeply skeptical of any proposed solutions to performance problems that require extra work by the programmer.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Many academics take the approach that most developers don't need to be aware of, much less optimize for, atomic transactions in their code. Much like pointers and Aunt May's apple pie, it's best to leave those things to the professionals and their compilers. 
This is the approach argued by Bryan Cantrill and Jeff Bonwick from Sun Microsystems in their article &lt;a title="Real-world concurrency" mce_href="http://doi.acm.org/10.1145/1400214.1400227" href="http://doi.acm.org/10.1145/1400214.1400227"&gt;Real-world concurrency&lt;/a&gt;. Seeing as these are the same people who brought us DTrace and lockstat in Solaris, it's probably best to take their word for it.&lt;/p&gt;  &lt;p&gt;From the article:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;The most important conclusion from our foray into the history of concurrency is that concurrency has always been employed for one purpose: to improve the performance of the system. This seems almost too obvious to make explicit. Why else would we want concurrency if not to improve performance? And yet for all its obviousness, concurrency's raison d'être is seemingly forgotten, as if the proliferation of concurrent hardware has awakened an anxiety that all software must use all available physical resources. Just as no programmer felt a moral obligation to eliminate pipeline stalls on a superscalar microprocessor, no software engineer should feel responsible for using concurrency simply because the hardware supports it. Rather, concurrency should be considered and used for one reason only: because it is needed to yield an acceptably performing system...&lt;/p&gt;    &lt;p&gt;...To make this concrete, in a typical Model/View/Controller application, the View (typically implemented in environments like JavaScript, PHP, or Flash) and the Controller (typically implemented in environments like J2EE or Ruby on Rails) can consist purely of sequential logic and still achieve high levels of concurrency provided that the Model (typically implemented in terms of a database) allows for parallelism. 
Given that most don't write their own database (and virtually no one writes their own operating system), it is possible to build (and indeed, many have built) highly concurrent, highly scalable MVC systems without explicitly creating a single thread or acquiring a single lock; it is concurrency by architecture instead of by implementation.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;However, I think that some easy atomic transactional wrappers would be helpful for developers, and I hope that research into an easy and accessible way of implementing atomic transactions continues. Of course, I am still skeptical that any imperative language with living objects that have state can easily make its transactions atomic, and it would appear this paper agrees with this skepticism. Anyone for &lt;a title="Concurrent Haskell" mce_href="http://en.wikipedia.org/wiki/Concurrent_Haskell" href="http://en.wikipedia.org/wiki/Concurrent_Haskell"&gt;Concurrent Haskell&lt;/a&gt;?&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9319099" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/concurrency/default.aspx">concurrency</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/stm/default.aspx">stm</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/architecture/default.aspx">architecture</category></item><item><title>Goodbye Map Reduce - Hello Cascading</title><link>http://blogs.msdn.com/brandonwerner/archive/2008/09/19/goodbye-map-reduce-hello-cascading.aspx</link><pubDate>Fri, 19 Sep 2008 19:35:00 GMT</pubDate><guid 
isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8959063</guid><dc:creator>brandon_werner</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/brandonwerner/comments/8959063.aspx</comments><wfw:commentRss>http://blogs.msdn.com/brandonwerner/commentrss.aspx?PostID=8959063</wfw:commentRss><wfw:comment>http://blogs.msdn.com/brandonwerner/rsscomments.aspx?PostID=8959063</wfw:comment><description>
&lt;p&gt;An &lt;a href="http://blog.rapleaf.com/dev/?p=33#more-33" title="GOODBYE MAPREDUCE, HELLO CASCADING" mce_href="http://blog.rapleaf.com/dev/?p=33#more-33"&gt;interesting post&lt;/a&gt; from Nathan Marz regarding an abstraction layer from Chris Wensel called &lt;a href="http://www.cascading.org/" title="Cascading" mce_href="http://www.cascading.org/"&gt;Cascading&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
We have been doing a lot of batch processing with Hadoop MapReduce lately, and we quickly realized how painful it can be to write MapReduce jobs by hand. Some parts of our workflow require up to TEN MapReduce jobs to execute in sequence, requiring a lot of hand-coordination of intermediate data and execution order. Additionally, anyone who has done really complex MapReduce workflows knows how hard it is to keep “thinking” in MapReduce.
Luckily, we discovered a great new open source product called Cascading which has alleviated a ton of our pain. Cascading is the brainchild and work of Chris Wensel, and he’s done a great job developing an API which solves many of our problems. Cascading abstracts away MapReduce into a more natural logical model and provides a workflow management layer to handle things like intermediate data and data staleness.
&lt;/blockquote&gt;

&lt;p&gt;It is a very good walkthrough of how they take a tuple problem set and use Cascading to simplify the management of pipes, particularly forking and merging pipes together. &lt;/p&gt;&lt;p&gt;You may also want to see &lt;a href="http://research.yahoo.com/node/90" title="Yahoo Research Pig" mce_href="http://research.yahoo.com/node/90"&gt;Yahoo Research's Pig&lt;/a&gt; as another example of an abstraction layer over MapReduce. Such layers seem to be all the rage now, as we need an easy way to query, join and generally work with these large datasets. Yahoo's Pig seems to rely heavily on SQL-like syntax - an approach I'm not as fond of as the one Cascading takes.
&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8959063" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/scalability/default.aspx">scalability</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/cloud+computing/default.aspx">cloud computing</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/concurrency/default.aspx">concurrency</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/mapreduce/default.aspx">mapreduce</category></item><item><title>The Rise Of Functional Programming: F#/Scala/Haskell and the failing of Lisp</title><link>http://blogs.msdn.com/brandonwerner/archive/2008/09/16/the-rise-of-functional-programming-f-scala-haskell-and-the-failing-of-lisp.aspx</link><pubDate>Tue, 16 Sep 2008 11:17:22 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8953635</guid><dc:creator>brandon_werner</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/brandonwerner/comments/8953635.aspx</comments><wfw:commentRss>http://blogs.msdn.com/brandonwerner/commentrss.aspx?PostID=8953635</wfw:commentRss><wfw:comment>http://blogs.msdn.com/brandonwerner/rsscomments.aspx?PostID=8953635</wfw:comment><description>&lt;p&gt;Over at Lambda The Ultimate, the best academic programming blog on earth, there is &lt;a title="Prediction for 2008" href="http://lambda-the-ultimate.org/node/2600"&gt;a large debate going on&lt;/a&gt; regarding what the future of languages will be for 2008. The most important thing to emerge from the discussion is the larger role functional programming will play. It seems like a safe bet. 
This year has seen an explosion of interest in, and creation of, functional languages such as Apple OS X's &lt;a title="Programming Nu" href="http://programming.nu/"&gt;Nu&lt;/a&gt;, Java's JVM using &lt;a title="Scala" href="http://www.scala-lang.org/"&gt;Scala&lt;/a&gt; and Microsoft Research's .NET language &lt;a title="F#" href="http://research.microsoft.com/fsharp/fsharp.aspx"&gt;F#&lt;/a&gt;.&lt;/p&gt; &lt;p&gt;I am ecstatic at this change.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;The Failure Of Lisp&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;It's hard to understand where it came from. Certainly one can argue the broader academic community had nothing to do with it; the old guard Common Lisp hackers are still as fickle and as judgmental toward newcomers as ever. Also, the old standards in Lisp languages, &lt;a title="Fraz" href="http://www.franz.com/"&gt;Franz&lt;/a&gt; and &lt;a title="LispWorks" href="http://www.lispworks.com/"&gt;LispWorks&lt;/a&gt;, have not lowered their prices to anything approachable for the casual developer. There are open source ANSI Lisp implementations without all the supporting engines and functionality, such as SBCL. In fact, the most linked thing I've ever written in my career is the &lt;a title="How To Setup Your Environment Using Allegro" href="http://common-lisp.net/project/cl-semantic/installation.shtml"&gt;installation walk-through I did for installing SBCL and Allegro&lt;/a&gt;, which includes adding your repository and packages for CLOS and automatically compiling the FASL files, especially dealing with the asdf differences between the implementations. The complexity of this in itself points to problems with portability and configuration in Lisp. 
However, even that project, which targeted Lisp's bread and butter, the parsing of semantic ontologies for the Semantic Web, was met in the message boards with worries about whether there would be enough developer participation using such an odd language, and with recommendations to move it to Java.&lt;/p&gt; &lt;p&gt;In reality, Common Lisp showed its failure as a community by sitting out the enthusiasm that has been generated around functional programming languages. It didn't have to be that way. I recall my first awareness of functional programming's growth was the awesome work of &lt;a title="A mostly Lisp weblog by John Wiseman" href="http://lemonodor.com/"&gt;Lemonodor&lt;/a&gt;'s blog and Sriram Krishnan posting "&lt;a title="Lisp is sin" href="http://blogs.msdn.com/sriram/archive/2006/01/15/lisp_is_sin.aspx"&gt;Lisp Is Sin&lt;/a&gt;". I was happy at the time that Lisp was getting such attention, as well as functional language architectures in general. I imagined that as OO languages had grown so verbose and feature-dense that even the IDEs to develop your applications run into the tens of gigabytes, a new evolution "Back To The Future" was inevitable. Even more, I believe long-suffering Lisp deserves to be back in favor again; it has certainly spent its time in purgatory. Yet, it didn't happen. You can blame the 50-year-old men sitting on IRC channels for that. It was the most thorny and uninspiring community I've ever participated in, despite my extreme interest in the language. It's jaw dropping that a language with such promise has sat out the resurgence, and it speaks to what an unfriendly and uninviting community can do to a technology platform. I would be the first to march it off to the grave.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;The Rise Of Functional Languages&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;The interest in functional programming actually grew up around more academic but pure languages like Scheme and Haskell. 
Although these languages sit within their own island and lack many of the "dirty" aspects of Lisp's CLOS environment that make it easy to access OS and hardware resources, they are still strikingly useful in learning things that are the staple of functional languages, such as Closures and Lambdas. Indeed, one could argue that the movement to bring Closures into OO languages (first C#, now Java) was in part due to the rise of awareness of functional languages.&lt;/p&gt; &lt;p&gt;Further, it seems to me that functional programming languages answered two prayers of those more ambitious engineers who don't seem to want to stick with the script and Java worlds they were taught in college. Those two large wins, far more important than the semantic features of functional languages that have gotten all the attention, are architectural foundations of functional languages:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Referential Transparency / Side Effects &lt;/li&gt; &lt;li&gt;Concurrency &lt;/li&gt;&lt;/ul&gt; &lt;p&gt;&lt;strong&gt;Referential Transparency&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;To those coming from a pure OO world, Referential Transparency and the restriction of side-effects can be something hard to get their heads around. The best way I can describe this concept is by hitting at the root of their assumptions: everything they deal with is dead. The objects are dead, the variables are dead, the entire atmosphere is dead, as if something had come along and killed everything in your stack and you have to assemble your program from only what's been given to you, nothing more. There are no instances, objects do not "come alive" and have state; a state that you have to poke into and a state that can change at any time. 
A function will always do what you expect, and nothing can come along and change that behavior.&lt;/p&gt; &lt;p&gt;One of the things that seems to appeal to developers most about the promise of SOA in enterprise environments, if you're smart enough to pry it out of them, is that they get the same referential transparency in services. No one can override a service (besides versioning, which is explicit to the developer) and a service will return the same thing it did earlier in your code and earlier in the year. This forces developers to design services that have the same relationship to the world as the functions that functional programmers write. This is perhaps the trickiest part of migrating enterprise teams to a services based model: their expectations of the mutability of the services they are accessing and their inability to anticipate what working in that world will be like. Especially for those who use tools or libraries to convert service interaction into an object, the interaction can be jarring.&lt;/p&gt; &lt;p&gt;However, they soon find the predictability and the safety of such an environment liberating. In much the same way that OO programmers were used to making their objects or variables immutable to maintain their contracts and relationships with other objects, often sacrificing many of the benefits that OO programming promised their stack, they now have immutability and transparency in an environment where functional paradigms are key. They do not expect to be able to "embrace and extend" services. They are what they are. 
This tends to cascade out to the living instantiated code a developer writes as well, as there is no point in entering the world of the living if what you have to return to is a dead function.&lt;/p&gt; &lt;p&gt;This was hinted at in an article in the ACM Queue magazine by Terry Coatta, entitled "&lt;a title="From Here to There, The SOA Way" href="http://www.acmqueue.org/modules.php?name=Content&amp;amp;pa=showpage&amp;amp;pid=507"&gt;From Here to There, The SOA Way&lt;/a&gt;". He states,&lt;/p&gt; &lt;blockquote&gt; &lt;p&gt;&lt;br&gt;Objects are still a very good way to model systems and they function reasonably efficiently in the local context. But they don't distribute well, particularly if one tries to use them in a naive way. A service-oriented architecture solves this problem by dealing with the latency issues up front. It does this by looking at the patterns of data access in a system and designing the service-layer interfaces to aggregate data in such a way as to optimize bandwidth, usage, and latency.&lt;br&gt;&lt;/p&gt;&lt;/blockquote&gt; &lt;p&gt;SOA limitations are not the only thing affecting the consciousness of a software engineer; the other issue is the sharp rise in the complexity of managing a large enterprise library written in an OO language. One of the largest pain points of any large application is the management of graph upon graph of live objects and the living data within them. When software engineers experience the lack of side-effects in functional languages, it's a breath of fresh air.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Concurrency&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;A funny thing happened on the way to those multi-core processors. People loaded their applications on them and noticed nothing got much faster, particularly when it came to transaction-intensive tasks. 
Turns out Intel and AMD left out an important fact about their Moore's Law-cheating multi-core environment: you can't wring as much performance out of it without changing the way you manage concurrency and threads. Sequential programming could always rely on going faster as the single processor speed got faster, but as multicores come into play that isn't always the case. You want to farm off transactions to occur on separate processors, and in the living world of mutable objects and variables, breaking out two transactions to work concurrently that operate on the same living data is a bad idea. Add structural programming's solution to this problem, optimistic and pessimistic locking, and you have deadlocks in short order.&lt;/p&gt; &lt;p&gt;Functional programming has been a natural place to explore parallel processing and new ways of doing atomic transactions because of the reasons above. More importantly, these atomic structures are composable, a property that is lost when using locks in structural programming. A lot of the buzz has been generated around the idea of &lt;a title="Composable Memory Transactions" href="http://research.microsoft.com/%7Esimonpj/papers/stm/"&gt;software transactional memory&lt;/a&gt;, where execution blocks can be flagged and managed and built upon. The best introduction to this topic is the paper by Tim Harris entitled &lt;a title="Concurrent Programming Without Locks" href="http://research.microsoft.com/~tharris/drafts/cpwl-submission.pdf"&gt;Concurrent Programming Without Locks&lt;/a&gt;. 
Although this used to be expressed only in the confines of &lt;a title="Concurrent Haskell" href="http://en.wikipedia.org/wiki/Concurrent_Haskell"&gt;Concurrent Haskell&lt;/a&gt;, others have shown how the same techniques can be used in other functional languages, such as &lt;a title="First Class Functions:2 Multiprogramming" href="http://leibnizdream.wordpress.com/2007/12/22/first-class-functions2-multiprogramming/"&gt;F#&lt;/a&gt; using nothing more than PowerList.&lt;/p&gt; &lt;p&gt;This experimentation is one of the main reasons why functional languages have become more important as software engineers wrestle with the problems and promise of multi-core processors in transaction processing. Although not every engineer will be interested in the deeper details of STM or other strategies in concurrent programming, the fact that these libraries will emerge and only be available in the functional realm will force software engineers to learn the core concepts and bring even more visibility to the functional programming space.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Functional Hybrids: Functional Programming Is Now Approachable&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;The other driver for adoption of functional programming languages, besides their architectural benefits for solving current problems, is the fact that languages such as F# and Scala have adopted a more hybrid model in their language design, where a developer isn't forced completely outside her comfort zone. Scala is a combination of functional and deeper OO methodologies (as in Smalltalk) and has access to the entire Java library, significantly reducing the learning curve. The same can be said for F# and .NET, and for Nu and Objective-C. This does have drawbacks however, as both F# and Scala have not been able to use more of the STM strategies that Concurrent Haskell allows because the underlying thread architecture of the VMs they run against is built for structural programming languages. 
It is easy to see how this can be fixed, however, and allow those using hybrid functional languages the same power as those who express their ideas in Haskell or even Lisp.&lt;/p&gt; &lt;p&gt;As I said, I am excited about this new resurgence in functional programming languages, and I am enthusiastic that 2008 will have even more to offer those who are just getting their toes wet. I personally know some college freshmen who started out using Nu as their first language, and are already contributing to the community. The future of software engineering is bright.&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8953635" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/lisp/default.aspx">lisp</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/programming/default.aspx">programming</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/concurrency/default.aspx">concurrency</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/haskell/default.aspx">haskell</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/stm/default.aspx">stm</category></item><item><title>Thoughts On Google's Conference on Scalability In Seattle</title><link>http://blogs.msdn.com/brandonwerner/archive/2008/09/16/thoughts-on-google-s-conference-on-scalability-in-seattle.aspx</link><pubDate>Tue, 16 Sep 2008 07:33:52 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8953605</guid><dc:creator>brandon_werner</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/brandonwerner/comments/8953605.aspx</comments><wfw:commentRss>http://blogs.msdn.com/brandonwerner/commentrss.aspx?PostID=8953605</wfw:commentRss><wfw:comment>http://blogs.msdn.com/brandonwerner/rsscomments.aspx?PostID=8953605</wfw:comment><description>&lt;p&gt;&lt;img style="margin: 5px" height="99" alt="Google Scalability Conference Logo" 
src="http://brandonwerner.blob.core.windows.net/images/google-scale.gif" width="297"&gt;&lt;/p&gt; &lt;p&gt;If you are looking for a good collection of notes regarding the topics covered at the &lt;a title="Seattle Conference on Scalability" href="http://www.google.com/events/scalability_seattle/"&gt;Seattle Conference on Scalability&lt;/a&gt;, you can do no better than what &lt;a title="Google Seattle Conference on Scalability" href="http://perspectives.mvdirona.com/2008/06/15/GoogleSeattleConferenceOnScalability.aspx" rel="colleagueOf"&gt;James Hamilton put together&lt;/a&gt;. Instead, I'll write a quick commentary on what I experienced.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Scalability Is Your Problem Too&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;The goals of the conference are laudable. Scalability is an issue that almost all practitioners of software engineering face, especially as we move towards offering services both inside and outside the enterprise. Many are taken off guard by the sudden issues that confront them after wiring up a large scale services-based environment; especially around distributing load, distributing the data, and writing the data quickly. Sadly, I didn't see too many people from large companies there - most were software companies like Microsoft, Google, MySpace and Amazon.com. The attendance may be a consequence of the subject matter. This was some intense stuff dealing with MPI at Cray and its hopeful successor, Wikipedia redone with DHT and Erlang, a b-tree vs. Hashmap debate and scalable storage issues when dealing with billions of files. 
A more fun loving person would have done better going over to Adobe and hanging out at &lt;a title="BarCampSeattle" href="http://pathable.com/events/barcampseattle"&gt;BarCampSeattle&lt;/a&gt;, which was going on at the same time.&lt;/p&gt; &lt;p&gt;Despite the intimidating material, there are real architectural and design issues that these discussions present that should be in the mind of anyone dealing with large datacenters that scale globally or even nationally. The approach of &lt;a title="GIGA+" href="http://www.sosp2007.org/talks/WIP/1%20-%20sosp07-wip.pdf"&gt;GIGA+ file storage&lt;/a&gt;, maidsafe's new computer architecture, and &lt;a title="NetWorkSpaces for R" href="http://nws-r.sourceforge.net/"&gt;NetWorkSpaces for the R language&lt;/a&gt; was uniform: off-loading responsibility for management of data (meta or otherwise) to all vertices in the deployment graph instead of a central repository. NetWorkSpaces in R and maidsafe even discussed computational scalability - while Cray's new &lt;a title="Chapel:The Cascade High-Productivity Language" href="http://chapel.cs.washington.edu/"&gt;Chapel language&lt;/a&gt; and the discussion around Software Transactional Memory focused on scalability across processing cores as well as machines.&lt;/p&gt; &lt;p&gt;&lt;img style="margin: 0px 5px 5px" alt="GIGA+ Bitmap Example" src="http://brandonwerner.blob.core.windows.net/images/giga.gif" align="left"&gt;&lt;/p&gt; &lt;p&gt;GIGA+'s approach of maintaining a small bitmap file on each node and passing that around - while anticipating and accepting stale data on a few edge nodes - was brilliant in the patterns it hinted at, including that perhaps being right all the time isn't as important as being fast. You can be right most of the time and accept the performance hit of not being right some of the time. 
There are many people who would cringe at this, but at this point we're going to have to play loose and leave a few balls up in the air as we juggle - doing the math of how often one may fall while keeping the rest going as fast as we can.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Pay No Attention To The Man Behind The Curtain&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;Yet if I had to sum up the content of the conference I would say it was big on strategy and architecture but short on implementation. There were a lot of things hinted at "behind the curtain" but nothing assured hand raising from the compsci geeks in the room more than hand waving when you got to the distributed piece of your solution. For instance, one of the big benefits of Chapel - the MPI successor that Bradford Chamberlain of Cray presented - was that you could have distributed arrays and graphs that would be automatically sliced up to be distributed to parallel cores or even other "locales" if desired. How the language determines where to split these large arrays and graphs and farm them out was not discussed. One of the more interesting slides showed dashed lines drawn across various nodes and vertices of a graph, symbolizing how it would be chopped and distributed. Someone in the audience raised their hand at this - but he moved on and the hand went back down. To be fair, Chapel was called a "multi-resolution" language where one could start fairly abstract and then add more detail and control to get the best desired result - something I assume you have to do to get good or intelligent chopping and distribution of the data. Given that one of his slides was a comparison of code lines between Fortran using MPI and Chapel, seeing a working code snippet of Chapel would have been helpful. 
It may turn out to be the same amount of work after you get past the "global view".&lt;/p&gt; &lt;p&gt;This was the trend though, as all of the presentations had a bit of hand waving regarding performance metrics and distribution of computation. This was highlighted by the talk of Vijay Menon of Google - whose work at Intel I was familiar with - discussing Software Transactional Memory. He illustrated the challenges of implementing this in an imperative language (I'm suspicious you can even do STM well in an imperative language with state - &lt;a title="The Rise Of Functional Programming: F#/Scala/Haskell and the failing of Lisp" href="http://www.brandonwerner.com/2008/01/13/the-rise-of-functional-programming-fscalahaskell-and-the-failing-of-lisp/"&gt;as I discussed before&lt;/a&gt;) but beyond suggesting the keyword "atomic" to replace "synchronized" in the Java language there was very little real content discussed for those already familiar with the issue of locks and multiprocessors. &lt;a title="Concurrent Haskell on Wikipedia" href="http://en.wikipedia.org/wiki/Concurrent_Haskell"&gt;Concurrent Haskell&lt;/a&gt; wasn't even mentioned. A better introduction and discussion is to be had by &lt;a title="OSCON - Simon Peyton-Jones" href="http://s5.video.blip.tv/1720000615947/OSCON-OSCON2007SimonPeytonJones914.mov"&gt;watching the O'Reilly OSCON video&lt;/a&gt; from &lt;a title="Simon Peyton-Jones" href="http://research.microsoft.com/%7Esimonpj/"&gt;Simon Peyton-Jones&lt;/a&gt; (the writer of GHC and now at Microsoft Research) on the subject. 
After that, if you're still hungry, his collection of &lt;a title="Papers and presentations about transactional memory in Haskell" href="http://research.microsoft.com/%7Esimonpj/papers/stm/"&gt;papers on his Microsoft Research site&lt;/a&gt; is a delight.&lt;/p&gt; &lt;p&gt;Of course the point of these conferences is the discussions that occur during the breaks and in the networking event afterwards - something that I treasure, having newly moved to the Seattle area from Cincinnati. Instead of just observing and blogging from afar - I get to be at the same table as Vijay Menon, &lt;a title="Thorsten Schuett" href="http://www.zib.de/schuett/" rel="colleagueOf"&gt;Thorsten Schuett&lt;/a&gt;, &lt;a title="Swapnil Patil" href="http://www.cs.cmu.edu/~svp/" rel="colleagueOf"&gt;Swapnil Patil&lt;/a&gt;, &lt;a title="Paul Watson" href="http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/w/Watson:Paul.html" rev="hasMet colleagueOf"&gt;Paul Watson&lt;/a&gt; and others.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Summary of the Architectural Patterns I Saw&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;If I had to summarize what I took away from the conference from a high-level architectural standpoint, here they are:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;Every node must be aware of the state of every other node without a centralized controller.  &lt;li&gt;To do this, a mechanism should be in place to share state quickly but peer-to-peer.  &lt;li&gt;It's ok to let some nodes go stale.  &lt;li&gt;Client/Server is now one thing. Pub/Sub with computation. Every node on the graph should do work.  &lt;li&gt;As much as possible, each node should maintain its own security and state. You should be able to have anonymous resources appear in your data center and be put to use without much configuration.  &lt;li&gt;As much as possible, abstract the distribution of processing away from programmers.  
&lt;li&gt;Key/Value with hashes is best for scalability and distribution (it seems to have won out in all the solutions presented here). Blame &lt;a title="MapReduce" href="http://labs.google.com/papers/mapreduce.html"&gt;MapReduce&lt;/a&gt;.  &lt;li&gt;Ants can be used to demonstrate anything. &lt;/li&gt;&lt;/ul&gt; &lt;p&gt;I hope everyone had as good a time as I did.&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8953605" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/scalability/default.aspx">scalability</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/concurrency/default.aspx">concurrency</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/stm/default.aspx">stm</category><category domain="http://blogs.msdn.com/brandonwerner/archive/tags/computer+science/default.aspx">computer science</category></item></channel></rss>