Transactional memory has not been immune to the Gartner hype cycle. Nearly two years ago Ali-Reza Adl-Tabatabai of Intel reminded me of the hype cycle, and we commiserated that we appeared to be at the “Peak of Inflated Expectations”. We had both just finished a week at ACM’s PPoPP and were at what we viewed as the climax of the week, the Transact ’08 workshop. Every day that week we heard research on transactional memory – and that was before the workshop dedicated an entire day to the technology. There was so much enthusiasm about its promise that it seemed every university was devoting significant research effort to it. It was an exciting time.
Yet this was alarming to both of us: we had each been working at commercial companies on implementations of transactional memory, and both of us had already gone through our own personal hype cycles. I think I can honestly say that Ali was reminding me of the hype cycle because he was worried about the impact of the “trough of disillusionment”. The higher the enthusiasm, the harder the fall feels when you dip into the trough, and it felt like we were flying pretty high at that conference.
Looking back, maybe we were already past that peak and didn’t know it. Later that year both ACM Queue and Communications of the ACM published critical articles on software transactional memory (STM). Calin Cascaval presented some rather damaging evidence in both venues, calling STM a research toy in CACM and in Queue. The harshest terms were left to Bryan Cantrill, who, while polite in print, resorted to name-calling on his blog. (I tried not to take it personally; I have never met the man, and I prefer civil discourse.)
The economy then helped accelerate the fall into the trough of disillusionment, along with what appears to have been Sun’s decision to abandon the Rock processor. (The decision must have been made months before the day the Times reported it.) And looking at the pace of research today, we are not seeing the same number of papers published on STM as in the past.
I can talk about my own “hype cycle” -- how I expected transactional memory to be a pervasive solution, and how, as I gained more experience with it, I learned that it is no silver bullet, just one piece of the puzzle.
Parallel programming is hard. Transactional memory is just a tool – by itself it doesn’t make parallel programming any easier, but in the hands of skilled programmers, and with tools such as Parallel Extensions or the Concurrency Runtime, it can appreciably enhance productivity. Parallel programming still requires the ability to decompose your work into logical pieces that can run in parallel; you should still strive to reduce the amount of shared state; and the more performance you try to squeeze from the hardware, the more you need to understand its memory model.
I came to this conclusion after writing some horrible parallel code with transactional memory that performed abysmally. I went after a problem naively, trying to see if it was solvable. It was, although I found I still had to be careful with the STM’s memory model and synchronize on an object boundary. Even so, my application performed worse than the serialized code, but it was correct. That is the inherent value of transactions – correctness. As anyone experienced with database technology or on-line transaction processing (OLTP) can attest, transactional actions can and often do impact performance.
Transactions underscore the inherent tension between performance and correctness.
My colleague Pat Helland, a major pioneer in the field of distributed transactions, denounced the two-phase commit protocol as an “anti-availability protocol”. While some may think he dislikes transactions, he was just dramatically describing the dangers inherent in the technology. Transactions create a dependency tree, and as your application develops more dependencies, it becomes more fragile. This is especially true if those dependencies are between components in different recovery domains. To put it another way, the reliability of a transactional system is the intersection of the availability of all its participants. If the availability of two participants rarely overlaps, the likelihood of the transaction succeeding is low. This is a hard problem that requires a lot of thought and compromise. Jim Johnson, others, and I worked on gluing various transaction management domains together, and now Windows has transactions available from managed code through to the kernel.
When you apply this knowledge to transactional memory, the discussion changes only slightly. Availability is not the Achilles’ heel of transactional memory, since a typical TM system’s memory is shared across one or more CPUs within a single computer and is available as long as the computer is running. In TM, availability is replaced by contention: the transaction is broken not by a participant leaving or voting “no” but by participating memory being incompatibly accessed by two different transactions. The larger the set of memory involved in a transaction (its read-set), the greater the likelihood of contention between that transaction and other transactions. Again, the more work you put under transactions, the higher the performance impact you accept.
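To make the read-set observation concrete, here is a small simulation. This is my own toy model, not any particular TM implementation: two transactions each touch a random set of locations, and they “conflict” whenever those sets overlap. Even modest growth in the read-set size drives the conflict rate up quickly.

```python
import random

def conflict_rate(n_locations, read_set_size, trials=20_000, seed=42):
    """Estimate how often two transactions with random read-sets of the
    given size touch at least one common location (a potential conflict)."""
    rng = random.Random(seed)
    locations = range(n_locations)
    conflicts = 0
    for _ in range(trials):
        a = set(rng.sample(locations, read_set_size))
        b = set(rng.sample(locations, read_set_size))
        if a & b:  # overlapping access means one transaction must retry
            conflicts += 1
    return conflicts / trials

# Growing the read-set size far more than proportionally grows contention.
for k in (2, 8, 32):
    print(k, round(conflict_rate(1000, k), 3))
```

With 1000 locations, a read-set of 2 almost never conflicts, while a read-set of 32 conflicts more often than not – the same qualitative lesson OLTP designers learned about long transactions.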
The optimist in me is not too worried about incompatible accesses by well-written applications. There are well-documented studies showing that locks are not always necessary. So what do well-written applications do? They make sure their time under lock is small and that there is little or no data sharing. That mirrors the OLTP and database-transaction guidance to keep your transactions as small as possible, and the observation that a successful transactional system is one where all participants are reliably available – or, in STM’s case, one with little or no contention.
Those who come out of the compiler world tend to focus on a transaction’s inherent overhead – it must do at least two stores for every single store, and it likely does more than one read for every read. This pervasive overhead is what critics point to when they call transactional memory a “research toy” or, worse, hopeless. But they are critics, and pessimistic about this technology; personally, I think they are focusing on the wrong problem.
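The “two stores per store” bookkeeping can be sketched with a toy undo log. This is a deliberate simplification of one common eager-versioning STM design, with names of my own invention, not the internals of any real system:

```python
class ToyTransaction:
    """Toy eager-versioning transaction, illustrative only.
    Each transactional store does two stores (an undo-log entry plus the
    real store); a real STM's loads also pay for conflict validation."""
    def __init__(self, memory):
        self.memory = memory   # shared dict standing in for the heap
        self.undo_log = {}     # address -> value before this transaction

    def store(self, addr, value):
        if addr not in self.undo_log:
            self.undo_log[addr] = self.memory.get(addr)  # store #1: bookkeeping
        self.memory[addr] = value                        # store #2: the real write

    def load(self, addr):
        return self.memory.get(addr)  # plus validation reads in a real STM

    def abort(self):
        for addr, old in self.undo_log.items():  # roll back on conflict
            self.memory[addr] = old

mem = {"x": 1}
txn = ToyTransaction(mem)
txn.store("x", 99)
txn.abort()
print(mem["x"])  # back to 1 after rollback
```

The doubled store is the price of being able to abort at all – which is exactly the trade the critics are weighing.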
Databases have the same issue: from a naïve examination, they have to do two writes for every write, and yet they are the pervasive solution for multi-user access to durable data. How is it that transactions are “OK” for durable media but not “OK” for volatile memory? The expense of reads and writes on spinning media should make this inherent problem much worse for databases than for volatile memory.
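The database version of the doubled write is the write-ahead log. Here is a minimal sketch, my own toy and not any particular database engine, showing the two writes and the recovery they buy:

```python
def wal_put(log, table, key, value):
    """Minimal write-ahead-log sketch: every logical write happens twice,
    once to the append-only log, once to the data itself."""
    log.append(("put", key, value))  # write #1: the log record
    table[key] = value               # write #2: the data page

def recover(log):
    """Rebuild the table by replaying the log, as after a crash."""
    table = {}
    for op, key, value in log:
        if op == "put":
            table[key] = value
    return table

log, table = [], {}
wal_put(log, table, "k", 1)
wal_put(log, table, "k", 2)
print(recover(log) == table)  # replay reproduces the table exactly
```

Nobody calls databases a research toy for paying that cost; they engineered around it, which is the point of the next paragraph.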
Database developers have worked for many years to batch up their writes, reduce the number of seeks the disk head must make, and create innovative logging techniques, all for the stated purpose of improving performance. Compilers can do the same. They can determine what data is “local” and does not need to be instrumented, or perform static and dynamic analyses that detect that specific variables are only accessed inside transactions or that specific transactions work on disjoint sets of memory. Basically, the tricks the database community has developed over the past thirty-ish years may well have – and probably do have – parallels that compilers can exploit. I allude to this in our Ch9 interview last year.
Further, the critics are focusing on the negative. If you create micro-benchmarks that measure only time under synchronization, transactional memory’s serial performance will be distressing. But the real measure is how fast the overall application is and how much it speeds up when more processors are available – how well does it scale?
Users of transactional memory can take a page out of the database programming manual and keep their atomic regions as small as possible. This is interesting from a performance point of view. If the amount of time your application spends inside a transactional-memory block is small – so small, in fact, that the serial cost of executing that block does not affect overall application performance – then the serial overhead of transactional memory is much less of a concern. Combined with real-world blocking events such as I/O, it might not even be a measurable cost.
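That advice has the same shape regardless of the synchronization mechanism. In this sketch a plain lock stands in for an atomic block, since Python has no STM, and the names are mine: all the heavy work happens outside the “transaction”, and only the tiny publish step runs inside it.

```python
import threading

results = {}
results_lock = threading.Lock()  # stand-in for a small atomic block

def expensive_index(doc_id, text):
    # The heavy, thread-local work happens outside any synchronization.
    word_counts = {}
    for word in text.split():
        word_counts[word] = word_counts.get(word, 0) + 1
    # The "atomic region" is only the publish step, so contention stays low
    # and the serial cost of synchronization is negligible overall.
    with results_lock:
        results[doc_id] = word_counts

threads = [threading.Thread(target=expensive_index,
                            args=(i, "to be or not to be"))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(results[0]["to"])  # each document counted "to" twice
```

Shrink the atomic region to the publish step and the overhead the critics measure simply stops mattering to end-to-end time.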
The real advantage of transactional memory, then, is programmer productivity. Instead of a grand replacement of locks with transactions, the goal is to enable application architectures that scale without complex locking mechanisms. Keep what little shared state you have managed by the transaction, and free yourself from non-deterministic deadlocks.
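With ordinary locks, deadlock-freedom is the programmer’s burden. The classic example is two concurrent transfers in opposite directions: without discipline, transfer(a, b) racing transfer(b, a) can deadlock. The usual manual fix, sketched below with made-up names, is a global lock order – exactly the kind of ceremony an atomic block lets you stop writing.

```python
import threading

class Account:
    _next_id = 0
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        self.id = Account._next_id   # stable key for a global lock order
        Account._next_id += 1

def transfer(src, dst, amount):
    # Without this ordering, two opposite transfers can each grab one lock
    # and wait forever for the other -- a non-deterministic deadlock.
    first, second = sorted((src, dst), key=lambda acct: acct.id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount

a, b = Account(100), Account(100)
threads = [threading.Thread(target=transfer, args=pair)
           for pair in [(a, b, 5), (b, a, 5)] * 50]
for t in threads: t.start()
for t in threads: t.join()
print(a.balance + b.balance)  # money is conserved: 200
```

The sorted-acquisition trick works, but it is precisely the lock-hierarchy bookkeeping the paragraph above says STM should make unnecessary.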
This observation was recently underscored by multiple research reports. The first, “Transactional Memory versus Locks – A Comparative Case Study”, was delivered at ICSE 2009 by a research team that tested the usability and scalability of transactional memory. In what I consider a groundbreaking report on STM usability, researchers at the University of Karlsruhe had 12 students create a parallel desktop search engine using Intel’s STM C compiler. Three teams worked with traditional locks and three teams used STM. Their results demonstrated that STM is usable, produced more maintainable code, and produced an application that scales: the best TM application was more than three times faster than the best built with traditional synchronization mechanisms. The STM teams developed their solution in less time; their code was judged to be more readable; and the majority of their time was spent in sequential code – focusing on the real problem they were solving, not on the mechanisms of parallelism and shared state. To put this in clearer terms:
The value of STM is that it allows you to focus on designing applications which happen to scale instead of the mechanisms employed to scale those applications.
This is the promise of STM – not that it makes your application scale, but it frees you from worrying about lock ordering or even building a system that imposes some lock-hierarchy. Instead, focus on your problem and make it scalable. STM is simply a synchronization tool -- one that helps you get your work done without too much effort.
The second report is from WDDD 2009. I was unfamiliar with that conference; it is the Eighth Annual Workshop on Duplicating, Deconstructing, and Debunking, held at ISCA (the International Symposium on Computer Architecture). Honestly, that sounds like a really hard audience, and I would love to have heard this talk. The UT Austin team asked, “Is Transactional Programming Actually Easier?” So what do you think they found?
Now, this isn’t just a talk; it’s a full paper. Their results are impressive – and very defensible. They ran a “user-study in which 147 undergraduate students in an operating systems course implemented the same programs using coarse and fine-grain locks, monitors, and transactions.” They then made both quantitative and qualitative observations on the students’ work through code analysis and surveys. They did this over two years and documented all the results.
The results are as I would have hoped. Some quotes from the paper (the emphasis is mine):
· “Overwhelmingly, the number and types of programming errors the students made was much lower for transactions than for locks. On a similar programming problem, over 70% of students made errors with fine-grained locking, while less than 10% made errors with transactions.”
· “…we found that coarse locks and transactions required less time than fine-grain locks on the more complex two-lane assignments. This echoes the promise of transactions, removing the coding and debugging complexity of fine-grain locking and lock ordering when more than one lock is required.”
· “…transactional programming really is less error-prone than high-performance locking, even if newbie programmers have some trouble understanding transactions…. for similar programming tasks, transactions are considerably easier to get correct than locks.”
To be fair, the students thought coarse-grained locks were the easiest to use. But to judge that observation fairly, note that the Austin team was also using a more arcane implementation of STM than the Karlsruhe study; they commented that the second year’s usability scores improved when they adopted a different STM library. The easy take-away here is that STM does deliver on its promise of safe parallelism. The more implied result is that while both fine-grained locks and TM provide scalability, fine-grained locking was hard to do and error-prone. I say “implied” because they did not present scalability or performance benchmarks for the students’ work; but between these two studies, I think this is a viable conclusion.
At the end of last year there were many detractors who, as Ali and I feared, significantly blunted the TM hype and plunged it into the “trough of disillusionment”. STM’s detractors argue that the serial cost of transactional memory is too high to make it more than a research toy. As I discuss here, the serial cost is the least important part of the equation; it is making a scalable application that you need to focus on. If STM can free you from worrying about correct execution in the very few cases where you need to synchronize, then this productivity tool helps you scale your application. If you use it gratuitously, such that the serial performance costs of STM become noticeable, then it is not helping you. In fact, I will argue you need to rethink how you are using it and will likely need to change your code to make it scalable – it is not STM that is the problem but how you are using it.
Does STM help you create this scalable architecture? I don’t think so. It does not impose a “well-disciplined” pattern on your work; you have to gain that discipline through other means. What it does provide is a tool to avoid deadlock and to reason easily about the correct behavior of the small amount of code that touches shared state.
What I am hoping is that research such as that presented by UT Austin and the University of Karlsruhe signals an exit from the trough, so that we as an industry can focus on delivering this technology rather than on addressing hype. Their students used STM to create scalable, reliable applications. With less effort, these students produced more readable and maintainable code that scaled better than the code of students who were forced to use traditional synchronization methods.
I encourage you to read the paper from WDDD and review the slides from ICSE. Is there actionable guidance there as well? I think that one area that practitioners of STM should investigate is how to make its programming model more approachable, understandable, and maybe even prescriptive in its use. I wonder what that looks like.
What do you think? Are we coming up onto the “Slope of Enlightenment”?