
Software Transactional Memory: Debunked?

If I go into my excellent academic article organizer, Papers, and search for "software transactional memory" or "stm", I get at least 30 papers, both high level and detailed, on this next big thing that will finally let us take advantage of our multi-core CPUs without any effort, handling all the nasty locking and synchronization issues for us with nothing more than a language keyword. So much publicity has been given to this idea that no fewer than three presenters at the Google Scalability Conference mentioned it, with one presentation being nothing but a glimpse into the STM future.

As I wrote then:

This was the trend though, as all of the presentations had a bit of hand waving regarding performance metrics and distribution of computation. This was highlighted by the talk of Vijay Menon of Google - whose work at Intel I was familiar with - discussing Software Transactional Memory. He illustrated the challenges of implementing this in an imperative language (I’m suspicious you can even do STM well in an imperative language with state - as I discussed before) but beyond suggesting the keyword “atomic” to replace “synchronized” in the Java language there was very little real content discussed for those already familiar with the issue of locks and multiprocessors. Concurrent Haskell wasn’t even mentioned.

It turns out that, according to a paper published in the Communications of the ACM, Software transactional memory: why is it only a research toy?, Software Transactional Memory may not work at all. The article presents research from IBM, who built the IBM XL C/C++ for Transactional Memory for AIX, known as IBM STM, and also takes benchmarks from the Intel STM and the Sun TL2 STM. In the paper, they put the STM implementations through the wringer using a B+ tree and the Delaunay Mesh Refinement algorithm. It's well worth a read.

Their final analysis puts a deep nail in the coffin of STM:

Based on our results, we believe that the road ahead for STM is quite challenging. Lowering the overheads of STM to a point where it is generally appealing is a difficult task and significantly better results have to be demonstrated. If we could stress a single direction for further research, it is the elimination of dynamically unnecessary read and write barriers—possibly the single most powerful lever toward further reduction of STM overheads. However, given the difficulty of similar problems explored by the research community such as alias analysis, escape analysis, and so on, this may be an uphill battle. And because the argument for TM hinges upon its simplicity and productivity benefits, we are deeply skeptical of any proposed solutions to performance problems that require extra work by the programmer.
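To make "read and write barriers" concrete, here is a toy sketch of an optimistic STM in Python. It is not how IBM STM, Intel STM, or TL2 are implemented; the names TVar, Transaction, and atomically are my own for illustration, and the single global commit lock is a simplifying assumption. The point is only that every transactional read and write goes through a bookkeeping call, which is the overhead the authors want to eliminate where it is dynamically unnecessary.

```python
import threading

_commit_lock = threading.Lock()  # assumption: one global lock serializes commits


class TVar:
    """A transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        # Read barrier: every read is logged so it can be validated at commit.
        if tvar in self.writes:
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        # Write barrier: the update is buffered, not applied, until commit.
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            # Abort if anything we read has been changed by another commit.
            if any(tvar.version != seen for tvar, seen in self.reads.items()):
                return False
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(fn):
    """Run fn(tx) repeatedly until its transaction commits."""
    while True:
        tx = Transaction()
        result = fn(tx)
        if tx.commit():
            return result


balance = TVar(100)
atomically(lambda tx: tx.write(balance, tx.read(balance) + 1))
```

Even in this tiny version, a one-line increment costs two dictionary operations, a lock acquisition, and a validation pass. A compiler that cannot prove a variable is thread-local has to pay that price on every access, which is why the paper ties progress to problems like alias and escape analysis.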

Many academics take the approach that most developers don't need to be aware of, much less optimize for, atomic transactions in their code. Much like pointers and Aunt May's apple pie, it's best to leave those things to the professionals and their compilers. This is the approach argued by Bryan Cantrill and Jeff Bonwick from Sun Microsystems in their article Real-world concurrency. Seeing as these are the same people who brought us DTrace and lockstat in Solaris, it's probably best to take their word on it.

From the article:

The most important conclusion from our foray into the history of concurrency is that concurrency has always been employed for one purpose: to improve the performance of the system. This seems almost too obvious to make explicit. Why else would we want concurrency if not to improve performance? And yet for all its obviousness, concurrency's raison d'être is seemingly forgotten, as if the proliferation of concurrent hardware has awakened an anxiety that all software must use all available physical resources. Just as no programmer felt a moral obligation to eliminate pipeline stalls on a superscalar microprocessor, no software engineer should feel responsible for using concurrency simply because the hardware supports it. Rather, concurrency should be considered and used for one reason only: because it is needed to yield an acceptably performing system...

...To make this concrete, in a typical Model/View/Controller application, the View (typically implemented in environments like JavaScript, PHP, or Flash) and the Controller (typically implemented in environments like J2EE or Ruby on Rails) can consist purely of sequential logic and still achieve high levels of concurrency provided that the Model (typically implemented in terms of a database) allows for parallelism. Given that most don't write their own database (and virtually no one writes their own operating system), it is possible to build (and indeed, many have built) highly concurrent, highly scalable MVC systems without explicitly creating a single thread or acquiring a single lock; it is concurrency by architecture instead of by implementation.

However, I think that some easy atomic transactional wrappers would be helpful for developers, and I hope that research into implementing atomic transactions in an easy and accessible way continues. Of course, I am still skeptical that any imperative language with living objects that have state can easily make its transactions atomic, and it would appear this paper agrees with that skepticism. Anyone for Concurrent Haskell?

What I've Been Working On: Microsoft Online Launched

One of the things that is hard to get my head around is what to be secret about and what I am free to talk about. Therefore, I have decided to not talk about what I'm working on very much. Some choose not to talk at all. I considered that, and can see the merit. However, I also like participating in the community quite a bit. This awkward compromise will have to do.

There have been a lot of changes working for Microsoft - but it is also the most challenging and rewarding job I've ever had. Being part of a product that seeks to take an entire company, and its millions upon millions of customers, in a new direction is a humbling experience and demands you give your best. Twenty-four-hour email and high-stress ship decisions are part of life.

It's with that enthusiasm that I'd like to talk about a huge milestone our team accomplished - shipping Microsoft Online. More specifically, the Business Productivity Online Suite, known by us acronym lovers as "BPOS". It is our first offering that combines Exchange, SharePoint and Live Meeting together "in the cloud". Your company can either exist completely in the cloud - or you can sync your Active Directory at various times throughout the day from your own corporate datacenter to the cloud so that your infrastructure is always in sync across the enterprise. Already, large customers such as Eddie Bauer, Energizer, and Blockbuster have made the switch. Saying those names in a blog would have had me escorted out of the building a few weeks ago - but now that the launch event has occurred and they have talked about their experience using our product, it's out in the open that we are launching with some great companies evaluating and using our products in the cloud.

This is just the beginning - and something I'm very excited to participate in. Microsoft is a great place to work - I can't believe I get to do this stuff every day.

Congratulations to the team and co-workers for a great release - and even better to come.

By the way, if you'd like to see our work on building a great community for MS Online users, check out the TechNet forums for Microsoft Online. Feel free to participate.

Posted by brandon_werner

Un-PC Reality

One thing all the better leadership training seminars I've been to in my career have shared is an insistence on and dedication to reality. The best usually strike hard by announcing a string of facts about what is changing and causing organizational problems, then asking everyone to confront them.

There are some who have taken the "I'm a PC" advertising campaign as validation that their world has not changed. Some continue to believe we can maintain the way things have existed - the way our revenue and ecosystem have worked in the past - and confuse that desire with the goals of the PC campaign. Certainly, Windows as a platform is not going anywhere - the market share numbers still show remarkable power. The problem is that Google and others have proven that not only is that not a hindrance to their efforts - they can use it against us by layering a platform on top of it. That is the reality we find ourselves in. We risk becoming a pretty TCP/IP stack for the larger world.

That is why it is dangerous to assume "I'm a PC" means an old client PC with a tower and flat panel screen in the corner of the den running Windows 95. It's tempting to want that old model back, but the reality is Windows has moved away from the PC. Which Windows is the air we breathe? Are you talking just of the client? What about Windows in the cloud? If I spend 90% of my time on my MacBook Pro inside mesh.com using Silverlight - am I then - in your opinion - still running Windows?

To borrow from Emerson: there is no wall where I, the device, ends and you, the operating system, begins. Windows now comes to see us without bell. The walls are taken away.

I'm still trying to figure out if we have seriously - each of us - looked that reality dead in the face. There is a sense of urgency that needs to spread through everyone - an urgency that I've seen so I'm optimistic we can achieve a lot more in the future than we have in the past. What we have to guard against, however, is assuming that this future looks like the past - a PC on every desktop running Microsoft software.

Perhaps we should follow the new mission statement with a vision: "the network, through every device, running Microsoft software".


I think in some there is real fear of this new reality - fear that needs to be addressed, with a way forward clearly communicated for them. That, I believe, would help Microsoft regain its spirit - which never relied on the current products - but always the future. We should aspire to make our customers fans - fans of the brand and the innovation, not of just the PC.

On The New Communications of the ACM Redesign

A while ago ACM embarked on an ambitious mission: to change their flagship publication for Association for Computing Machinery members, Communications of the ACM, into the JAMA of Computer Science. If the new July 2008 issue of the redesigned CACM is any indication, they will succeed. In the first few pages we have quantum computing, modeling to eliminate errors in software, an analysis of cloud computing, a debate about the future of the computer science curriculum and what it means for students' career paths as programming becomes offshored, and the history of the IT industry in India.

.. and I'm only on page 33.

There are 112 pages.

It used to be that way - from the inception of CACM on through the 1970s the magazine was a collection of computer science research for the academic professional. However, as the 1980s and 1990s moved computers into people's homes and the IT field changed from PhDs toying with large Turing machines to undergrads who used Visual Basic and Java for basic business purposes, the magazine changed. These new practitioners didn't come from the academic field, didn't really understand the basic underpinnings of a computer, and usually didn't care. The funding of the ACM dried up as well, even as the number of people in the field boomed. The CACM changed to grab these people by becoming more of a mainstream magazine geared towards those new entrants - maybe to attract these people to the ACM membership. It didn't seem to work. The magazine lost its way.

Now we are once again approaching a change in the computer science field. Much like the way Cloud Computing is taking us back to the large machines in the back rooms and thin clients at the edge, software engineering is changing back from large numbers of engineers with basic knowledge to a smaller number with more specialized knowledge. The Googles of this world are not as worried about basic applications written across millions of detached machines - things that usually create reusable patterns and easy software construction from a weekend's reading of O'Reilly books. Instead, they are worried about problems of concurrency, massively scalable storage systems and parallel processing while sharing the same memory space. The choice of the language has changed to an implementation detail to express these ideas and can be interchangeable. These problems require knowledge of tuples and binary trees and graph theory, to name a few.

At the same time, programming jobs that boomed in the 90s and 00s are being outsourced to cheaper and cheaper labor overseas, with the harder proofs being demonstrated once on the internet and then communicated across the world for others to incorporate. Pre-packaged software for businesses is becoming more configurable to existing systems, removing the need for custom software from programmers in non-software companies. This means that those who are serious about the profession are diving deeper into the roles of architect, designer and academic - while those who aren't as interested are moving on to other careers. These two changes are providing an entrance for a journal like CACM to come alive again and publish the best research available to solve these hard problems.

The new CACM couldn't come at a better time.

Tech Trends For Fall Reading: Software Transactional Memory, Cloud Computing Storage, and more

Now that the summer is over – the tech industry is back to work – and the new product and service announcements are coming quickly, why not do some good reading to prepare for the fall, when everyone returns from vacation and you get back to the serious business of deadlines, programming and of course geeky arguments about the topics of the day? Here is a good reading list to bookmark.

Get up to speed on Generic Programming, or Programming In General

My first recommendation is the collected papers of Alexander Stepanov, which you can get from his website entitled... Collected Papers of Alexander Stepanov. For those who don't know, Stepanov is the key person behind the C++ Standard Template Library, which he started to develop around 1993. He had earlier been working for Bell Labs close to Andrew Koenig and tried to convince Bjarne Stroustrup to introduce something like Ada Generics in C++. His papers are a treasure of thought on generic programming, logic, robotics and anything else that made you turn to the Computer Science page in your university's catalog. Best of all, he also provides slides for his book in progress, written with Paul McJones, called Programming Elements. This is a great book for refreshing your knowledge of abstract and concrete concepts in a quick and easy PowerPoint format. Just take a look at the table of contents and I dare you not to click on at least one of the chapter links. Don't worry, I won't tell.

The "Core" Debate of the Community: Concurrent Programming and Software Transactional Memory

Yes, the pun was bad. It does however illustrate one of the facets of the problem that is burning up academic and commercial researchers alike, and is responsible for a large number of papers flooding the ACM portal: Software Transactional Memory (STM). Well, actually, that's a possible answer to the problem - not the problem itself. The two are often confused now. The problem is that since Intel and AMD have decided to start putting more cores onto a single chip, we have to deal with the big problem that comes along with that: managing the threads of multiple cores trying to do the same work on behalf of the system they're working for. It also scales into bigger problems with any type of work you may want to farm off to "locales" that may need to cross boundaries and work on the same data within a transaction (for more information on some of this, see my post from the Google Scalability Conference regarding Cray's work to replace MPI with a new concurrent language, Chapel, and the GIGA+ filesystem below).

I think Simon Peyton Jones from Microsoft Research in Cambridge illustrates it best in his paper Composable Memory Transactions (PPoPP '05):

The dominant programming technique is based on locks, an approach that is simple and direct, but that simply does not scale with program size and complexity. To ensure correctness, programmers must identify which operations conflict; to ensure liveness, they must avoid introducing deadlock; to ensure good performance, they must balance the granularity at which locking is performed against the costs of fine-grain locking. Perhaps the most fundamental objection, though, is that lock-based programs do not compose: correct fragments may fail when combined. For example, consider a hash table with thread-safe insert and delete operations. Now suppose that we want to delete one item A from table t1, and insert it into table t2; but the intermediate state (in which neither table contains the item) must not be visible to other threads. Unless the implementor of the hash table anticipates this need, there is simply no way to satisfy this requirement. Even if she does, all she can do is expose methods such as LockTable and UnlockTable - but as well as breaking the hash-table abstraction, they invite lock-induced deadlock, depending on the order in which the client takes the locks, or race conditions if the client forgets. Yet more complexity is required if the client wants to await the presence of A in t1, but this blocking behaviour must not lock the table (else A cannot be inserted). In short, operations that are individually correct (insert, delete) cannot be composed into larger correct operations.
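The hash table scenario in that passage can be sketched in a few lines of Python. This is my own illustration rather than code from the paper: SafeTable, move_naive, move_locked and in_either are hypothetical names, and the lock-based "fix" assumes the two tables are distinct objects.

```python
import threading


class SafeTable:
    """A table whose insert and delete are each thread-safe on their own."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def insert(self, key, value):
        with self._lock:
            self._items[key] = value

    def delete(self, key):
        with self._lock:
            return self._items.pop(key)

    def contains(self, key):
        with self._lock:
            return key in self._items


def move_naive(src, dst, key):
    """Two correct operations, one incorrect composition."""
    value = src.delete(key)
    # Another thread running here sees the key in neither table.
    dst.insert(key, value)


def move_locked(src, dst, key):
    """Closes the gap, but only by reaching into both tables' private locks.

    The locks are taken in a fixed global order (by id) so that two opposite
    moves cannot deadlock. Assumes src and dst are different tables.
    """
    first, second = sorted((src, dst), key=id)
    with first._lock, second._lock:
        dst._items[key] = src._items.pop(key)


def in_either(t1, t2, key):
    """Readers must follow the same locking protocol to get a consistent view."""
    first, second = sorted((t1, t2), key=id)
    with first._lock, second._lock:
        return key in t1._items or key in t2._items
```

Notice what move_locked costs: the table's abstraction is broken, every caller has to know the lock ordering rule, and a single caller who forgets it reintroduces the race or the deadlock. That is the composition problem in miniature.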

The most that has come out of this is that we know it's a problem and we'd love to use the keyword "atomic" to wrap our transactional code in our languages. Beyond that, it's a lot of hand waving and PowerPoint slides. Some people, though, are actually trying to work it out. The best starting point here is the work of the aforementioned researcher Simon Peyton Jones. His collection of papers on STM offers a good introduction to the problem and to some possible solutions. In his papers he uses Haskell, and his work has led to Concurrent Haskell. Haskell lends itself to STM for reasons I won't go into here, and it will be quite a bit more of a challenge to get the same functionality in Java and C#. That said, there is already an API for C# Software Transactional Memory from Microsoft Research you may want to explore.

If you don't care about this, just don't go naming classes atomic and you should be fine.

Storing The Cloud: How Do We Scale?

Solid State (read: Flash) drives aren't the only thing showing the age of our old file system technologies. As we expose software as services and begin taking on large numbers of tenants for our software, cloud computing needs clusters with thousands of nodes that, with the multi-core technology mentioned above, will impose a challenge for storage systems. We will need the ability to scale to handle data generated by applications executing in parallel in tens of thousands of threads. There have been some solutions posed, such as IBM General Parallel File System (GPFS) and Microsoft Research's Boxwood technology.

I was lucky enough to watch a presentation on GIGA+, another solution that is being researched by Swapnil V. Patil at Carnegie Mellon University. One of its neatest ideas is leaving the header-table behind, using a bitmap instead. I got to sit down with him afterward and talk about the challenges we face in this space. It was a great time. His primary concern about GPFS and Boxwood is the use of hashing and B-trees, which causes the possibility of bottlenecks and synchronization issues. By using a bitmap, and keeping it small so that it can be shared across nodes easily, GIGA+ eliminates a need for "metanodes" or other controllers on the HPC storage architecture.

His paper, GIGA+ : Scalable Directories for Shared File Systems, is a great read for those interested in the problems of both high-performance computing and storage. Their work seeks to maintain the UNIX file structure, however, so those who care about scaling Microsoft infrastructure may find less to enjoy, but the overall architecture and problems outlined in the paper are applicable to any massively large storage cluster technology.

Enough Already

That should be enough to get you through August. When your boss comes back from his Alaskan cruise, nothing will ensure he leaves you alone more than talking about Concurrent Haskell or how much you enjoyed Chapter 9 of Programming Elements: Algorithms on increasing ranges. Enjoy the air conditioning you lucky bums.

Goodbye Map Reduce - Hello Cascading

An interesting post from Nathan Marz regarding an abstraction layer from Chris Wensel called Cascading:

We have been doing a lot of batch processing with Hadoop MapReduce lately, and we quickly realized how painful it can be to write MapReduce jobs by hand. Some parts of our workflow require up to TEN MapReduce jobs to execute in sequence, requiring a lot of hand-coordination of intermediate data and execution order. Additionally, anyone who has done really complex MapReduce workflows knows how hard it is to keep “thinking” in MapReduce. Luckily, we discovered a great new open source product called Cascading which has alleviated a ton of our pain. Cascading is the brainchild and work of Chris Wensel, and he’s done a great job developing an API which solves many of our problems. Cascading abstracts away MapReduce into a more natural logical model and provides a workflow management layer to handle things like intermediate data and data staleness.

It is a very good walkthrough of how they take a tuple problem set and use Cascading to simplify the management of pipes, particularly forking and merging pipes together.

You may also want to see Yahoo Research's Pig as another example of an abstraction layer over MapReduce. Such layers seem to be all the rage now, as we need a way to query, join and generally work with these large datasets in an easy way. Yahoo's Pig seems to rely heavily on SQL-like syntax - an approach I'm not as fond of as the approach Cascading takes.
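To give a feel for the pain Nathan describes, here is a small in-memory sketch in Python. It uses neither Hadoop nor the Cascading API; map_reduce and pipeline are hypothetical helpers I made up to show the difference between hand-carrying intermediate data between jobs and declaring the chain once.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """One in-memory map / shuffle / reduce pass, returning (key, value) pairs."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [(key, reducer(key, values)) for key, values in sorted(groups.items())]


# Job 1: count words.
def split_words(line):
    for word in line.split():
        yield word.lower(), 1


def sum_counts(word, counts):
    return sum(counts)


# Job 2: group words by how often they occur.
def by_count(pair):
    word, count = pair
    yield count, word


def collect_words(count, words):
    return sorted(words)


def pipeline(*stages):
    """Chain (mapper, reducer) stages; intermediate data is passed along for you."""
    def run(records):
        for mapper, reducer in stages:
            records = map_reduce(records, mapper, reducer)
        return records
    return run


lines = ["a b a", "b c a"]

# By hand: the caller names and threads every intermediate result.
word_counts = map_reduce(lines, split_words, sum_counts)
words_by_count = map_reduce(word_counts, by_count, collect_words)

# Declared once: the plumbing between jobs disappears.
words_by_frequency = pipeline((split_words, sum_counts), (by_count, collect_words))
```

With two jobs the hand-written version is tolerable. With the ten jobs mentioned in the quote, plus forks and merges, the bookkeeping of intermediate names and execution order is where the mistakes creep in, and that is the part a workflow layer takes over.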

Microsoft Live Mesh on Apple Mac OS X

This is a screenshot of Mesh running on the Silverlight platform on Mac OS X. Pretty neat example of the future.

By the way, if you're interested in developing for Live Mesh, there are some new videos posted on Microsoft Videos that provide an impressive amount of content. The RESTful services and the Pub/Sub model are of particular interest to me, since I think they will unleash a host of service aggregation possibilities in the future. The ability to ask the cloud to give you JSON or RSS without mapping or conversion is amazing.

Microsoft Mesh Running In Safari Browser on OS X

The Rise Of Functional Programming: F#/Scala/Haskell and the failing of Lisp

Over at Lambda The Ultimate, the best academic programming blog on earth, there is a large debate going on regarding what the future of languages will be for 2008. The most important thing to emerge from the discussion is the larger role functional programming will play. It seems like a safe bet. This year has seen an explosion of interest in, and creation of, functional languages such as Nu on Apple's OS X, Scala on the JVM and Microsoft Research's .NET language F#.

I am ecstatic at this change.

The Failure Of Lisp

It's hard to understand where it came from. Certainly one can argue the broader academic community had nothing to do with it; the old guard Common Lisp hackers are still as fickle and as judgmental toward newcomers as ever. Also, the old standards in Lisp languages, Franz and LispWorks, have not lowered their prices to anything approachable for the casual developer. There are open source ANSI Lisp implementations, such as SBCL, but without all the supporting engines and functionality. In fact, the most linked thing I've ever written in my career is the installation walk-through I did for SBCL and Allegro, which includes adding your repository and packages for CLOS and automatically compiling the FASL files, especially dealing with the asdf differences between the implementations. The complexity of this in itself points to problems with portability and configuration in Lisp. However, even that project, which targeted Lisp's bread and butter, the parsing of semantic ontologies for the Semantic Web, was met on the message boards with worries about whether there would be enough developer participation using such an odd language, and with recommendations to move it to Java.

In reality, Common Lisp showed its failure as a community by sitting out the enthusiasm that has been generated around functional programming languages. It didn't have to be that way. I recall my first awareness of functional programming's growth was the awesome work of Lemonodor's blog and Sriram Krishnan posting "Lisp Is Sin". I was happy at the time that Lisp was getting such attention, as well as functional language architectures in general. I imagined that as OO languages had grown so verbose and feature dense that even the IDEs to develop your applications run into the tens of gigabytes, a new evolution "Back To The Future" was inevitable. Even more, I believe long-suffering Lisp deserves to be back in favor again; it has certainly spent its time in purgatory. Yet it didn't happen. You can blame the 50-year-old men sitting on IRC channels for that. It was the most thorny and uninspiring community I've ever participated in, despite my extreme interest in the language. It's jaw-dropping that a language with such promise has sat out the resurgence, and it speaks to what an unfriendly and uninviting community can do to a technology platform. I would be the first to march it off to the grave.

The Rise Of Functional Languages

The interest in functional programming actually grew up around more academic but pure languages like Scheme and Haskell. Although these languages sit on their own island and lack many of the "dirty" aspects of Lisp's CLOS environment that make it easy to access OS and hardware resources, they are still strikingly useful for learning the staples of functional languages, such as closures and lambdas. Indeed, one could argue that the movement to bring closures into OO languages (first C#, now Java) was in part due to the rising awareness of functional languages.
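For anyone who hasn't met closures and lambdas yet, here is a minimal sketch in Python rather than Scheme or Haskell. The names make_scaler and compose are just illustrative.

```python
def make_scaler(factor):
    """Return a function that remembers `factor` after make_scaler has returned."""
    def scale(x):
        return x * factor  # `factor` is captured from the enclosing call: a closure
    return scale


double = make_scaler(2)

# A lambda is simply an anonymous function value.
square = lambda x: x * x


def compose(f, g):
    """Build a new function from two existing ones."""
    return lambda x: f(g(x))


double_then_square = compose(square, double)
doubled = list(map(double, [1, 2, 3]))
```

Functions here are ordinary values: they can be created on the fly, passed around, and combined, which is the habit of mind the pure languages teach best.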

Further, it seems to me that functional programming languages answered two prayers of those more ambitious engineers who don't want to stick with the scripting and Java worlds they were taught in college. Those two large wins, far more important than the semantic features of functional languages that have gotten all the attention, are architectural foundations of functional languages:

  • Referential Transparency / Side Effects
  • Concurrency

Referential Transparency

To those coming from a pure OO world, Referential Transparency and the restriction of side effects can be hard to get their heads around. The best way I have found to describe the concept is by hitting at the root of their assumptions: everything they deal with is dead. The objects are dead, the variables are dead, the entire atmosphere is dead, as if something had come along and killed everything in your stack and you have to assemble your program from only what's been given to you, nothing more. There are no instances; objects do not "come alive" and have state, a state that you have to poke into and that can change at any time. A function will always do what you expect, and nothing can come along and change that behavior.
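Here is the "living" versus "dead" distinction as a short Python sketch. The classes LiveCart and Cart and the functions add_item and total are hypothetical names for illustration, and a frozen dataclass is only an approximation of what a pure language enforces everywhere.

```python
from dataclasses import dataclass, replace


class LiveCart:
    """A 'living' object: what total() returns depends on when you ask."""

    def __init__(self):
        self.items = []

    def total(self):
        return sum(self.items)


@dataclass(frozen=True)
class Cart:
    """A 'dead' value: once built, it can never change."""
    items: tuple = ()


def add_item(cart, price):
    # Nothing is mutated; a new Cart is returned and the old one is untouched.
    return replace(cart, items=cart.items + (price,))


def total(cart):
    # Same cart in, same number out, no matter what else has run.
    return sum(cart.items)


empty = Cart()
one_item = add_item(empty, 5)
two_items = add_item(one_item, 7)
```

Because total(one_item) can never change, any call to it can be replaced by its result without altering the program. With LiveCart, anyone holding a reference can append to items between two calls to total(), so the same expression can yield two different answers.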

One of the things that seems to appeal to developers most about the promise of SOA in enterprise environments, if you're smart enough to pry it out of them, is that they get the same referential transparency in services. No one can override a service (besides versioning, which is explicit to the developer), and a service will return the same thing it did earlier in your code and earlier in the year. This forces developers to design services that have the same relationship to the world that functional programmers' functions have. This is perhaps the trickiest part of migrating enterprise teams to a services-based model: their expectations about the mutability of the services they are accessing and their inability to anticipate what working in that world will be like. Especially for those who use tools or libraries to convert service interaction into an object, the interaction can be jarring.

However, they soon find the predictability and safety of such an environment liberating. OO programmers were used to making their objects or variables immutable to maintain their contracts and relationships with other objects, often sacrificing many of the benefits that OO programming promised their stack. Now that they have immutability and transparency in an environment where functional paradigms are key, they do not expect to be able to "embrace and extend" services. They are what they are. This tends to cascade out to the living instantiated code a developer writes as well, as there is no point in entering the world of the living if what you have to return to is a dead function.

This was hinted at in an article in the ACM Queue magazine by Terry Coatta, entitled "From Here to There, The SOA Way". He states,


Objects are still a very good way to model systems and they function reasonably efficiently in the local context. But they don't distribute well, particularly if one tries to use them in a naive way. A service-oriented architecture solves this problem by dealing with the latency issues up front. It does this by looking at the patterns of data access in a system and designing the service-layer interfaces to aggregate data in such a way as to optimize bandwidth, usage, and latency.

SOA limitations are not the only thing affecting the consciousness of a software engineer. The other issue is the large rise in the complexity of managing a large enterprise library written in an OO language. One of the largest pain points of any large application is the management of graphs upon graphs of live objects and the living data within them. When software engineers experience the lack of side effects in functional languages, it's a breath of fresh air.

Concurrency

A funny thing happened on the way to those multi-core processors. People loaded their applications on them and noticed nothing got much faster, particularly when it came to transaction-intensive tasks. It turns out Intel and AMD left out an important fact about their Moore's Law cheating multi-core environment: you can't wring as much performance out of it without changing the way you manage concurrency and threads. Sequential programming could always rely on going faster as the single processor got faster, but as multiple cores come into play that isn't always the case. You want to farm off transactions to occur on separate processors, and in the living world of mutable objects and variables, breaking out two transactions to work concurrently on the same living data is a bad idea. Add structural programming's solution to this problem, optimistic and pessimistic locking, and you have deadlocks in short order.
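How quickly do the deadlocks arrive? The Python sketch below stages the classic case: two threads that each need the same two locks but take them in opposite orders. The function names are my own, and the timeout plus the barrier exist only so the demonstration reports the problem and terminates instead of hanging the way real code would.

```python
import threading


def transfer(first, second, barrier, results, name):
    with first:
        barrier.wait()  # both threads now hold their first lock
        got_second = second.acquire(timeout=0.1)
        if got_second:
            second.release()
        results[name] = got_second
        barrier.wait()  # keep holding `first` until both attempts have finished


def run_opposite_order():
    lock_a, lock_b = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    results = {}
    t1 = threading.Thread(target=transfer,
                          args=(lock_a, lock_b, barrier, results, "t1"))
    t2 = threading.Thread(target=transfer,
                          args=(lock_b, lock_a, barrier, results, "t2"))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


outcome = run_opposite_order()
```

Neither thread gets its second lock, so both entries in outcome are False. Remove the timeout and this program never ends. The usual cure is a global lock ordering that every piece of code must respect, which is exactly the kind of program-wide convention that stops scaling as the code base grows.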

Functional programming has been a natural place to explore parallel processing and new ways of doing atomic transactions for the reasons above. More important, these atomic structures can be composed, an ability that is lost when using locks in structural programming. A lot of the buzz has been generated around the idea of software transactional memory, where execution blocks can be flagged and managed and built upon. The best introduction to this topic is the paper by Tim Harris entitled Concurrent Programming Without Locks. Although this used to be expressed only in the confines of Concurrent Haskell, others have shown how the same techniques can be used in other functional languages, such as F# using nothing more than PowerList.

This experimentation is one of the big reasons why functional languages have become more important as software engineers wrestle with the problems and promise of multi-core processors in transaction processing. Although not every engineer will be interested in the deeper details of STM or other strategies in concurrent programming, the fact that these libraries will emerge and only be available in the functional realm will force software engineers to learn the core concepts and bring even more visibility to the functional programming space.

Functional Hybrids: Functional Programming Is Now Approachable

The other driver for adoption of functional programming languages, besides the architectural benefits they bring to current problems, is the fact that languages such as F# and Scala have adopted a more hybrid model in their language design, where a developer isn't forced completely outside her comfort zone. Scala is a combination of functional and deeper OO methodologies (as in Smalltalk) and has access to the entire Java library, significantly reducing the learning curve. The same can be said for F# and .NET, and for Nu and Objective-C. This does have drawbacks, however, as both F# and Scala have not been able to use more of the STM strategies that Concurrent Haskell allows, because the underlying thread architecture of the VMs they run against is built for structured programming languages. It is easy to see how this can be fixed, however, to give those using hybrid functional languages the same power as those who express their ideas in Haskell or even Lisp.

As I said, I am excited about this resurgence in functional programming languages, and I am optimistic that 2008 will have even more to offer those who are just getting their feet wet. I personally know some college freshmen who started out using Nu as their first language and are already contributing to the community. The future of software engineering is bright.

Thoughts On Google's Conference on Scalability In Seattle

If you are looking for a good collection of notes regarding the topics covered at the Seattle Conference on Scalability, you can do no better than what James Hamilton put together. Instead, I'll write a quick commentary on what I experienced.

Scalability Is Your Problem Too

The goals of the conference are laudable. Scalability is an issue that almost all practitioners of software engineering face, especially as we move towards offering services both inside and outside the enterprise. Many are taken off guard by the sudden issues that confront them after wiring up a large-scale services-based environment, especially around distributing load, distributing the data, and writing the data quickly. Sadly, I didn't see too many people from large enterprises there - most were from software companies like Microsoft, Google, MySpace and Amazon.com. The attendance may be a consequence of the subject matter. This was some intense stuff dealing with MPI at Cray and its hopeful successor, Wikipedia redone with DHT and Erlang, a B-tree vs. hashmap debate and scalable storage issues when dealing with billions of files. A more fun-loving person would have done better going over to Adobe and hanging out at BarCampSeattle, which was going on at the same time.

Despite the intimidating material, there are real architectural and design issues that these discussions present that should be in the mind of anyone dealing with large datacenters that scale globally or even nationally. The approach of GIGA+ file storage, maidsafe's new computer architecture, and NetWorkSpaces for the R language was uniform: off-loading responsibility for management of data (meta or otherwise) to all vertices in the deployment graph instead of a central repository. NetWorkSpaces in R and maidsafe even discussed computational scalability - while Cray's new Chapel language and the discussion around Software Transactional Memory focused on scalability across processing cores as well as machines.

GIGA+ Bitmap Example

GIGA+'s approach of maintaining a small bitmap file on each node and passing that around - while anticipating and accepting stale data on a few edge nodes - was brilliant in the patterns it hinted at, including that perhaps being right all the time isn't as important as being fast. You can be right most of the time and accept the performance hit of not being right some of the time. There are many people who would cringe at this, but at this point we're going to have to play loose and leave a few balls up in the air as we juggle - doing the math of how often one may fall while keeping the rest going as fast as we can.
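The "fast first, correct eventually" pattern can be sketched generically. The Python below is not GIGA+'s actual protocol or data layout. It assumes a hypothetical cluster where each client keeps a cached copy of the placement map and only learns that the copy is stale when a node it contacted turns out not to own the partition. The class and field names are made up for the example.

```python
class Cluster:
    """Authoritative placement: which node owns which partition."""
    def __init__(self, owners):
        self.owners = dict(owners)  # partition id -> node name

    def handle(self, node, partition):
        """The contacted node serves the request or hands back a fresh map."""
        if self.owners[partition] == node:
            return ("ok", f"{node} served partition {partition}")
        return ("stale", dict(self.owners))

class Client:
    def __init__(self, cluster):
        self.cluster = cluster
        self.cached = dict(cluster.owners)  # may silently go out of date
        self.redirects = 0

    def request(self, partition):
        node = self.cached[partition]
        status, payload = self.cluster.handle(node, partition)
        if status == "stale":
            # Pay for the mistake once: refresh the cache and try again.
            self.cached = payload
            self.redirects += 1
            node = self.cached[partition]
            status, payload = self.cluster.handle(node, partition)
        return payload
```

Nobody broadcasts the new map when a partition moves. A client with an old copy pays one extra hop the first time it guesses wrong, and every request after that is fast again.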

Pay No Attention To The Man Behind The Curtain

Yet if I had to sum up the content of the conference I would say it was big on strategy and architecture but short on implementation. There were a lot of things hinted at "behind the curtain", and nothing assured hand raising from the compsci geeks in the room more than hand waving when a speaker got to the distributed piece of the solution. For instance, one of the big benefits of Chapel - the MPI successor that Bradford Chamberlain of Cray presented - was that you could have distributed arrays and graphs that would be automatically sliced up to be distributed to parallel cores or even other "locales" if desired. How the language determines where to split these large arrays and graphs and farm them out was not discussed. One of the more interesting slides showed dashed lines drawn across various nodes and vertices of a graph, symbolizing how it would be chopped and distributed. Someone in the audience raised their hand at this - but Chamberlain moved on and the hand went back down. To be fair, Chapel was called a "multi-resolution" language where one could start fairly abstract and then add more detail and control to get the best desired result - something I assume you have to do to get good or intelligent chopping and distribution of the data. Given that one of his slides was a comparison of code lines between Fortran using MPI and Chapel, seeing a working code snippet of Chapel would have been helpful. It may turn out to be the same amount of work after you get past the "global view".
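Since the talk did not say how Chapel picks its cut points, the following is not Chapel's algorithm. It is only a Python sketch of one common policy, an even block split, with each block summed by its own worker standing in for a "locale". The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def block_ranges(length, parts):
    """Split indices 0..length-1 into `parts` contiguous, near-equal blocks."""
    base, extra = divmod(length, parts)
    ranges = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` blocks get one more
        ranges.append(range(start, start + size))
        start += size
    return ranges

def distributed_sum(values, locales):
    """Sum each block on its own worker, then combine the partial sums."""
    chunks = block_ranges(len(values), locales)
    with ThreadPoolExecutor(max_workers=locales) as pool:
        partials = pool.map(lambda block: sum(values[i] for i in block), chunks)
        return sum(partials)
```

For ten elements over three locales the blocks cover indices 0-3, 4-6 and 7-9. A flat array makes this look trivial. The open question from the talk is what the equivalent of block_ranges is for an irregular graph, where a bad cut means constant cross-locale traffic.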

This was the trend though, as all of the presentations had a bit of hand waving regarding performance metrics and distribution of computation. This was highlighted by the talk of Vijay Menon of Google - whose work at Intel I was familiar with - discussing Software Transactional Memory. He illustrated the challenges of implementing this in an imperative language (I'm suspicious you can even do STM well in an imperative language with state - as I discussed before) but beyond suggesting the keyword "atomic" to replace "synchronized" in the Java language there was very little real content discussed for those already familiar with the issue of locks and multiprocessors. Concurrent Haskell wasn't even mentioned. A better introduction and discussion is to be had by watching the O'Reilly OSCON video from Simon Peyton Jones (a principal author of GHC and now at Microsoft Research) on the subject. After that, if you're still hungry, the collection of papers on his Microsoft Research site is a delight.

Of course the point of these conferences is the discussions that occur during the breaks and in the networking event afterwards - something that I treasure, having newly moved to the Seattle area from Cincinnati. Instead of just observing and blogging from afar, I get to be at the same table as Vijay Menon, Thorsten Schuett, Swapnil Patil, Paul Watson and others.

Summary of the Architectural Patterns I Saw

If I had to summarize what I took away from the conference from a high-level architectural standpoint, here they are:

  • Every node must be aware of the state of every other node without a centralized controller.
  • To do this, a mechanism should be in place to share state quickly but peer-to-peer.
  • It's ok to let some nodes go stale.
  • Client/Server is now one thing. Pub/Sub with computation. Every node on the graph should do work.
  • As much as possible, each node should maintain its own security and state. You should be able to have anonymous resources appear in your data center and be put to use without much configuration.
  • As much as possible, abstract the distribution of processing away from programmers.
  • Key/value stores with hashes are best for scalability and distribution (they seem to have won out in all the solutions presented here). Blame MapReduce.
  • Ants can be used to demonstrate anything.
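The first three points can be sketched together. This is a minimal Python simulation of push-pull gossip, assuming a hypothetical cluster where each node keeps a versioned view of every node it has heard about. None of the presenters showed code like this; the names and the merge rule (highest version wins) are illustrative assumptions.

```python
import random

class Node:
    def __init__(self, name):
        self.name = name
        # What this node believes about the cluster: name -> (version, status)
        self.view = {name: (1, "up")}

    def merge(self, other_view):
        """Keep whichever entry carries the higher version number."""
        for name, entry in other_view.items():
            if name not in self.view or entry[0] > self.view[name][0]:
                self.view[name] = entry

def gossip_round(nodes, rng):
    """Every node swaps views with one randomly chosen peer."""
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        snapshot = dict(node.view)  # what we push to the peer
        node.merge(peer.view)       # pull
        peer.merge(snapshot)        # push

def rounds_until_converged(nodes, rng, max_rounds=1000):
    """Gossip until every node holds the same view; None if the cap is hit."""
    for completed in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if all(n.view == nodes[0].view for n in nodes):
            return completed
    return None
```

There is no coordinator anywhere in that loop. Between rounds some nodes hold stale views, and the design simply tolerates that until the next exchange catches them up.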

I hope everyone had as good a time as I did.

 