Supercomputers and parallel processing have long been complex topics in Computer Science. Practical applications typically involve massive datasets that can be easily broken down and distributed across many computational nodes, such as those in the IBM Blue Gene supercomputer.  Workloads that are good candidates for parallelization include scientific data analysis, such as genetic/genome mapping, and commercial applications, such as rendering ray-traced frames for motion pictures.  Most of the customers I work with are in the Enterprise space, but occasionally these customers need to take large datasets and perform calculations to test a theory or identify a trend in the data.  These customers seek an easy-to-use method for building and deploying such applications, often to a set of servers that lack the shared memory buses and high-speed interconnects characteristic of the clusters on the top-10 list of supercomputers in the world.

While on paper one might hope for a direct linear increase in performance when breaking a complex task across N physical nodes, there are theoretical limits involving the overhead of breaking work apart, the physical properties of electrons, and the size of the dataset being evaluated by a given application.  Amdahl's Law models the potential speed-up of a given task with a calculation based on the ratio of serial to parallel components in the computational algorithm used.  Gustafson's Law builds on Amdahl's Law to show that, with larger datasets and larger compute clusters, more work can be performed in the same amount of time as a smaller dataset on a smaller cluster.
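To make the two laws concrete, here is a short sketch (in Python, purely for illustration; the function names and the 95%-parallel example workload are my own, not from any specific source) that computes both speed-up figures for a workload with parallel fraction p running on n nodes:

```python
# Amdahl's Law: for a fixed-size problem, speedup is capped by the
# serial fraction (1 - p) no matter how many nodes you add.
# Gustafson's Law: if the problem size grows with the node count, the
# scaled speedup grows nearly linearly in n.

def amdahl_speedup(p: float, n: int) -> float:
    """Speedup of a fixed-size problem on n nodes (Amdahl)."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p: float, n: int) -> float:
    """Scaled speedup when the problem grows with n (Gustafson)."""
    return (1.0 - p) + p * n

# Example: a workload that is 95% parallelizable on 64 nodes.
# Amdahl's fixed-size speedup is only about 15.4x, while Gustafson's
# scaled measure is roughly 60.9x -- illustrating why bigger datasets
# make bigger clusters worthwhile.
print(amdahl_speedup(0.95, 64))     # ~15.4
print(gustafson_speedup(0.95, 64))  # ~60.9
```

The contrast between the two numbers is the whole point: adding nodes to a fixed-size problem hits a wall set by the serial fraction, while growing the dataset alongside the cluster keeps the nodes productively busy.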

Theory aside, my colleague came up with a brilliantly simple solution for partitioning work across physical nodes using the Task Parallel Library (TPL) in .NET 4.5.  Check out the details of the solution and download the code sample from the October 2012 issue of MSDN Magazine.
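The article itself uses TPL's C# APIs, so see the MSDN sample for the real implementation. As a rough, language-agnostic sketch of the general pattern involved (split a large dataset into chunks, fan the chunks out to workers, combine partial results), here is a hedged Python analogue; the `partition` and `process_chunk` helpers are illustrative stand-ins of my own, not code from the article:

```python
# Illustrative sketch only: the MSDN solution uses .NET's Task Parallel
# Library; this Python analogue shows the same basic partition/fan-out/
# combine pattern using the standard library's process pool.
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    """Stand-in worker: sum one slice of the dataset
    (a placeholder for the real per-node computation)."""
    return sum(chunk)

def partition(data, chunks):
    """Split data into roughly equal contiguous slices."""
    size, rem = divmod(len(data), chunks)
    start = 0
    for i in range(chunks):
        end = start + size + (1 if i < rem else 0)
        yield data[start:end]
        start = end

if __name__ == "__main__":
    dataset = list(range(100_000))
    # Fan the chunks out to worker processes and gather partial results.
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(process_chunk, partition(dataset, 4)))
    total = sum(partials)  # combine step
    print(total)
```

The same shape (partition, distribute, reduce) underlies most coarse-grained parallel solutions, whether the workers are TPL tasks on one machine or processes spread across many nodes.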