Watcher of the skies

Channel 9, Astrobiology, Operating Systems, and other unrelated things

Distributed Data-Parallel Computation in the Cloud: Dryad - Now Available for Academic Use

Microsoft Research recently announced the availability, under Academic Licensing, of Dryad, an infrastructure which allows a programmer to use the resources of a computer cluster or a data center for running data-parallel programs. A Dryad programmer can use thousands of machines, each of them with multiple processors or cores, without knowing anything about concurrent programming.

That's quite a statement! Think about what happens when you submit a query to a general-purpose search engine like Bing: the work on the other side of the fence (the distributed search infrastructure) happens so quickly and efficiently because of highly parallel computations across many servers. You don't have to understand the details of the search computation occurring behind the scenes to input a search term and get a bunch of results... DryadLINQ is similar from a programming perspective: you create a LINQ query which, on the surface, is just a set of sequential query commands (but only on the surface...). The DryadLINQ compiler takes the resulting AST and creates a Dryad vertex topology graph that gets handed over to the Dryad runtime.

"The computation is structured as a directed graph: programs are graph vertices, while the channels are graph edges. A Dryad job is a graph generator which can synthesize any directed acyclic graph. These graphs can even change during execution, in response to important events in the computation. Dryad is quite expressive. It completely subsumes other computation frameworks, such as Google's map-reduce, or the relational algebra. Moreover, Dryad handles job creation and management, resource management, job monitoring and visualization, fault tolerance, re-execution, scheduling, and accounting", say the Dryad people.
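To make the "programs are vertices, channels are edges" idea concrete, here is a toy sketch in Python. It is not Dryad's API: `run_job`, the vertex names, and the word-count example are all made up for illustration, and everything runs in a single process. It only shows the shape of the model, where each vertex is a small sequential program and the graph decides what feeds what.

```python
from collections import Counter
from graphlib import TopologicalSorter


def run_job(vertices, edges, sources):
    """Run a toy job graph (hypothetical helper, not part of Dryad).

    vertices: name -> function that takes a list of inputs
    edges:    list of (producer, consumer) pairs, i.e. the channels
    sources:  name -> input list for vertices that have no producers
    """
    # Map each vertex to the set of vertices it reads from.
    deps = {name: set() for name in vertices}
    for producer, consumer in edges:
        deps[consumer].add(producer)

    outputs = {}
    # static_order() yields producers before consumers and raises
    # graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(deps).static_order():
        if deps[name]:
            inputs = [outputs[p] for p in sorted(deps[name])]
        else:
            inputs = sources[name]
        outputs[name] = vertices[name](inputs)
    return outputs


def count_words(lines):
    """Vertex program: count the words in one partition of the input."""
    return Counter(word for line in lines for word in line.split())


def merge_counts(partials):
    """Vertex program: merge the partial counts from upstream vertices."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


vertices = {"count_a": count_words, "count_b": count_words, "merge": merge_counts}
edges = [("count_a", "merge"), ("count_b", "merge")]
sources = {
    "count_a": ["dryad runs graphs", "graphs of vertices"],
    "count_b": ["vertices and channels"],
}

result = run_job(vertices, edges, sources)["merge"]
```

Neither `count_words` nor `merge_counts` knows anything about concurrency. In a real system the two counting vertices could run on different machines, since no edge connects them.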

DryadLINQ is the managed, high-level programming abstraction (think LINQ to DistributedDataParallelComputation :-)) used to compose the Dryad vertex topology graphs that the Dryad infrastructure uses to partition, manage and schedule parallel computations.

In essence, Dryad and DryadLINQ enable a sequential programming experience over code that will execute concurrently across potentially thousands of machines (depending upon the computational complexity of the program). There's a good introductory piece on Dryad/DryadLINQ over on Channel 9 that covers the basics and provides a glimpse into some of the thinking behind the thinking...
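The "sequential on the surface" idea can be sketched in a few lines of Python. This is only an analogy, not DryadLINQ: `query` and `run_partitioned` are hypothetical names, and a local thread pool stands in for the cluster. The point is that the query is written as if it ran over one collection, while a separate piece of plumbing applies it to each partition and stitches the results together.

```python
from concurrent.futures import ThreadPoolExecutor


def query(records):
    """Sequential-looking query: keep the even numbers and square them."""
    return [x * x for x in records if x % 2 == 0]


def run_partitioned(query_fn, partitions, max_workers=4):
    """Apply the same query to every partition concurrently, then concatenate.

    Hypothetical stand-in for a runtime; threads play the role of machines.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() returns results in the same order as the partitions.
        partial_results = list(pool.map(query_fn, partitions))
    return [row for part in partial_results for row in part]


partitions = [[1, 2, 3], [4, 5, 6], [7, 8]]

parallel = run_partitioned(query, partitions)
sequential = query([x for part in partitions for x in part])
```

Here `parallel` and `sequential` hold the same rows, because this particular query treats each record independently. Queries that group, join, or sort need data to move between partitions, which is the kind of work the vertex graph and its channels exist to describe.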

In the near future, Channel 9 will present a Going Deep episode that digs into the architecture and composition of Dryad with one of the scientists who designed and implemented the system.

Published Thursday, July 16, 2009 1:45 PM by LifeOnTitan
