Software Transactional Memory: Debunked?
If I go in to my excellent academic article organizer, Papers, and search for "software transaction memory" or "stm" I get at least 30 results of papers both high level and detailed regarding this next big thing that will allow us to finally, without any effort, take advantage of our multi-core CPUs and handle all the nasty locking and synchronization issues for us with nothing more than a language keyword. So much publicity has been given to this idea that no less than three presenters at the Google Scalability Conference mentioned it, with one presentation being nothing but a glimpse in to the STM future.
As I wrote then:
This was the trend though, as all of the presentations had a bit of hand waving regarding performance metrics and distribution of computation. This was highlighted by the talk of Vijay Menon of Google - whose work at Intel I was familiar with - discussing Software Transactional Memory. He illustrated the challenges of implementing this in an imperative language (I’m suspicious you can even do STM well in an imperative language with state - as I discussed before) but beyond suggesting the keyword “atomic” to replace “synchronized” in the Java language there was very little real content discussed for those already familiar with the issue of locks and multiprocessors. Concurrent Haskell wasn’t even mentioned.
It turns out that, according to a paper published in the Communications of the ACM, Software transactional memory: why is it only a research toy?, Software Transactional Memory may not work at all. The article presents research from IBM, who built the IBM XL C/C++ for Transactional Memory for AIX, known as IBM STM, and also takes benchmarks from the Intel STM and the SUN TL2 STM. In the paper, they put the STM implementations through the ringer using b+tree and the Delaunay Mesh Refinement algorithm. It's well worth a read.
Their final analysis puts a deep nail in the coffin of STM:
Based on our results, we believe that the road ahead for STM is quite challenging. Lowering the overheads of STM to a point where it is generally appealing is a difficult task and significantly better results have to be demonstrated. If we could stress a single direction for further research, it is the elimination of dynamically unnecessary read and write barriers—possibly the single most powerful lever toward further reduction of STM overheads. However, given the difficulty of similar problems explored by the research community such as alias analysis, escape analysis, and so on, this may be an uphill battle. And because the argument for TM hinges upon its simplicity and productivity benefits, we are deeply skeptical of any proposed solutions to performance problems that require extra work by the programmer.
Many academics takes the approach that most developers don't need to be aware of, much less optimize for, atomic transactions in their code. Much like pointers and Aunt May's apple pie, it's best to leave those things to the professionals and their compilers. This is the approach argued by Bryan Cantrill and Jeff Bonwick from Sun Microsystems in their article Real-world concurrency. Seeing as these are the same people who brought us D-Trace and lockstat in Solaris OS, so it's probably best to take their word on it.
From the article:
The most important conclusion from our foray into the history of concurrency is that concurrency has always been employed for one purpose: to improve the performance of the system. This seems almost too obvious to make explicit. Why else would we want concurrency if not to improve performance? And yet for all its obviousness, concurrency's raison d'être is seemingly forgotten, as if the proliferation of concurrent hardware has awakened an anxiety that all software must use all available physical resources. Just as no programmer felt a moral obligation to eliminate pipeline stalls on a superscalar microprocessor, no software engineer should feel responsible for using concurrency simply because the hardware supports it. Rather, concurrency should be considered and used for one reason only: because it is needed to yield an acceptably performing system...
...To make this concrete, in a typical Model/View/Controller application, the View (typically implemented in environments like JavaScript, PHP, or Flash) and the Controller (typically implemented in environments like J2EE or Ruby on Rails) can consist purely of sequential logic and still achieve high levels of concurrency provided that the Model (typically implemented in terms of a database) allows for parallelism. Given that most don't write their own database (and virtually no one writes their own operating system), it is possible to build (and indeed, many have built) highly concurrent, highly scalable MVC systems without explicitly creating a single thread or acquiring a single lock; it is concurrency by architecture instead of by implementation.
However, I think that some easy atomic transactional wrappers would be helpful for developers, and hope that the research in to some way of implementing atomic transactions in an easy and accessible way continues. Of course, I am still skeptical that any imperative language with living objects that have state can easily have their transactions atomic and it would appear this paper agrees with this skepticism. Anyone for Concurrent Haskell?