The basic computational model we decided to adopt for Dryad is the directed acyclic graph (DAG). Each node in the graph is a computation, and each edge in the graph is a stream of data traveling in the direction of the edge. The amount of data on any given edge is assumed to be finite, the computations are assumed to be deterministic, and the inputs are assumed to be immutable. This isn’t by any means a new way of structuring a distributed computation (for example Condor had DAGMan long before Dryad came along), but it seemed like a sweet spot in the design space given our other constraints.


So, why is this a sweet spot? A DAG is very convenient because it induces an ordering on the nodes in the graph. That makes it easy to design scheduling policies: define a node to be ready when all of its inputs are available, and at any time you can schedule as many ready nodes as you like, in whatever order you like; as long as at least one node is always scheduled you will continue to make progress and never deadlock. It also makes fault-tolerance easy: given our determinism and immutability assumptions, you can backtrack as far as you want in the DAG and re-execute as many nodes as you like to regenerate intermediate data that has been lost or is unavailable due to cluster failures.
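The ready-node scheduling policy above can be sketched in a few lines. This is a hypothetical in-memory toy, not the Dryad scheduler: nodes are zero-argument functions, edges are (producer, consumer) pairs, and a node becomes ready once all of its inputs have run.

```python
from collections import defaultdict, deque

def schedule(nodes, edges):
    """Run a DAG of computations by repeatedly executing ready nodes.

    nodes: dict mapping node name -> zero-argument function (the computation).
    edges: list of (src, dst) pairs; dst consumes the output of src.
    A minimal sketch assuming the graph is acyclic, as the model requires.
    """
    consumers = defaultdict(list)
    missing = {n: 0 for n in nodes}           # count of unfinished inputs
    for src, dst in edges:
        consumers[src].append(dst)
        missing[dst] += 1

    ready = deque(n for n, m in missing.items() if m == 0)
    order = []
    while ready:                              # acyclicity guarantees progress
        n = ready.popleft()                   # any choice of ready node works
        nodes[n]()                            # run the computation
        order.append(n)
        for c in consumers[n]:                # n's outputs are now available
            missing[c] -= 1
            if missing[c] == 0:
                ready.append(c)
    return order
```

Because every prefix of the execution respects the edge ordering, any scheduling choice inside the loop is safe, which is exactly the freedom the paragraph above describes. Fault-tolerance falls out the same way: re-running a node (and, transitively, its lost inputs) is just re-entering it into the ready queue.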


The obvious way to generalize the DAG model would be to allow cycles in the graph, but that makes both scheduling and fault-tolerance more complicated. They are still possible, but we decided to go with the simpler approach and see what we ended up being able to do with it; and leave as a research question the extent to which adding cycles would be useful.


On the other hand, there’s no obvious way to restrict the DAG model that makes the underlying system any simpler. Although MapReduce is, I think, a simpler programming model than Dryad, arguably the system itself is, at least conceptually, more complicated, since the Map nodes and the Reduce nodes have different scheduling and fault-tolerance properties and implementations. In addition, when the system wants to optimize some data transfer, e.g. building an aggregation tree or sampling a dataset to find a balanced range partition, the optimization can usually be expressed as a subgraph of a DAG, and doesn’t need to be “hand-rolled” the way it would be if you wanted to add it to a system like Hadoop. So the big advantage of adopting a general DAG as the low-level computational abstraction is that there’s just one state machine that handles all computation nodes. Of course, any real implementation of MapReduce or of a DAG-driven system like Dryad can get complicated, but the topology of the data flow is not in itself a source of complexity in Dryad.
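To make the "optimization as a subgraph" point concrete, here is a hypothetical helper (not Dryad's actual API) that emits the edges of an aggregation tree: the combiners are just ordinary graph nodes, so the same scheduler and fault-tolerance machinery applies to them as to everything else.

```python
def aggregation_tree(leaves, fan_in):
    """Build the edges of a tree that aggregates `leaves` inputs.

    leaves: list of integer node ids holding the partial results.
    fan_in: how many children each internal combiner node merges.
    Returns (edges, root); edges are (child, parent) pairs.
    A sketch assuming a non-empty leaf set and integer node ids.
    """
    edges = []
    layer = list(leaves)
    next_id = max(leaves) + 1          # fresh ids for internal nodes
    while len(layer) > 1:
        next_layer = []
        for i in range(0, len(layer), fan_in):
            node = next_id
            next_id += 1
            for child in layer[i:i + fan_in]:
                edges.append((child, node))
            next_layer.append(node)
        layer = next_layer
    return edges, layer[0]
```

The resulting edge list can simply be spliced into the job's DAG; nothing about scheduling or recovery needs to know that these nodes exist for performance rather than for the user's logic.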


One thing we learned fairly early in the Dryad project was that people don’t want to build DAGs by hand: it’s too low-level, and people don’t want to have to learn how to think about their algorithms in terms of dataflow graphs. So this seems like it might be a disadvantage of something like Dryad compared with MapReduce, whose simple programming model is often touted as its big advantage. But with a couple of years of experience, and having talked to a lot of people in academia and industry who have been using Hadoop, I’m not sure that’s really true. In practice, once you get beyond a coursework assignment, most interesting computations cannot be expressed purely as a single Map followed by a single Reduce. Instead, it turns out, you typically need to take the output of the first Reduce, do another round of Map and another round of Reduce, and so on. Often you are trying to write an iterative algorithm (e.g. something like k-means), or very commonly you need to do something like a Join that combines information from two input sets. Join is the basic construct that is used for most graph traversal algorithms (you are combining data at the nodes with an edge map), and it is also used all over the place when there are common subexpressions in a computation: the common result is computed once and then “joined” with other data several times as necessary.
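To see why a Join doesn't fit in a single "one Map, one Reduce" pass, here is the standard reduce-side join pattern sketched as a toy in-memory MapReduce round. The `map_reduce` and `join` names and the tagging scheme are illustrative assumptions, not any particular system's API: the map phase tags each record with its source, and the reduce phase pairs up the tagged values that share a key.

```python
from collections import defaultdict
from itertools import product

def map_reduce(records, map_fn, reduce_fn):
    """One MapReduce round, in memory: map each record to (key, value)
    pairs, group the values by key, then reduce each group."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in map_fn(rec):
            groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reduce_fn(key, values))
    return out

def join(left, right, left_key, right_key):
    """Reduce-side equi-join of two datasets expressed as one round."""
    tagged = [("L", r) for r in left] + [("R", r) for r in right]

    def map_fn(tagged_rec):
        tag, rec = tagged_rec                  # tag says which input it came from
        key = left_key(rec) if tag == "L" else right_key(rec)
        yield key, (tag, rec)

    def reduce_fn(key, values):
        lefts = [r for t, r in values if t == "L"]
        rights = [r for t, r in values if t == "R"]
        for l, r in product(lefts, rights):    # pair matching records
            yield (key, l, r)

    return map_reduce(tagged, map_fn, reduce_fn)
```

Chaining comes for free: the list returned by one round is the input to the next, which is exactly the "output of the first Reduce feeds another Map" pattern described above, and exactly what a DAG expresses directly.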


In other words, MapReduce is really too low-level for most programming tasks as well; programmers don’t want to have to break a complicated algorithm into a set of Maps and Reductions, manually manage all the intermediate types, and hand-write a script to sequentially execute a series of MapReduce jobs. Good evidence for this comes from the proliferation of higher-level language layers that have been built on MapReduce and Hadoop, including Sawzall, PigLatin, HIVE, Cloudbase, and Yahoo!’s Hadoop SQL.


So: if programmers aren’t actually writing Map and Reduce functions directly but are instead using a higher-level language; and the MapReduce system isn’t easier to implement than a general DAG model (especially once you have added optimizations for things like sorting and hierarchical aggregation and distribution); and the high-level language layer can generate better-performing execution plans when it has access to a general DAG rather than a limited set of constructs such as Map and Reduce; then why would you choose to implement MapReduce over something like Dryad? We don’t think you would, but then we are biased :)


Michael Isard