Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

Welcome to the Native Concurrency blog!

Welcome to the Parallel Programming in Native Code blog.  I started this blog so that I and others on my team would have a place to talk about topics relating to native concurrency.  I want to use it to provide early looks into what we’re thinking about, announce publicly available content and CTPs, and of course respond to the feedback we receive from readers and customers.

We’ve talked before about the shift that is occurring as hardware vendors move towards multi- and many-core machines. We’ve also suggested that to take full advantage of this new hardware, the development platform will very likely need to grow and change.

Today, I’d like to share some of the major obstacles we see recurring for C++ developers, provide insight into our design goals, and give a very brief glimpse of a technology we’re exploring to help address those obstacles: the Concurrency Runtime.  I’ll also touch upon a parallel library built on top of it, intended to help C++ developers be more productive at building applications that scale.

If you’re at TechEd 2008 in Florida, you may have already seen this content as part of a talk titled “Parallelize Your C++ Applications with the Concurrency Runtime.”  I’m really only covering the first few minutes of it here; in future posts, I’ll walk through most of what was presented in that talk.

What are the major problems?

I’ve talked to a lot of C++ developers and ISVs about concurrency and the technical challenges that they face.  Two major themes always seem to come up.

First, and almost invariably, I hear that multithreaded programming is considered quite difficult because of the challenge of ensuring that concurrent code is correct, reliable, and free of races and deadlocks.  Often folks will tell me that only a very small number of senior developers or concurrency experts do this work within an organization or company.

Another thing I hear a lot is that the amount of actual work and code required to take a portion of code and make it concurrent is significant.  The APIs for threading, events and locks are relatively low level and it isn’t always particularly obvious how to build concurrency into an application with them.  Building an application that scales well across a mix of hardware with these APIs is even harder.
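To make the amount of hand-written plumbing concrete, here is a minimal sketch of summing an array with raw threads. It assumes standard C++ `std::thread` rather than any particular platform threading API, and `ManualParallelSum` is a hypothetical helper invented for illustration, not something from the talk. Note how much of the code is about partitioning, launching and joining rather than about the sum itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the post): sums `data` by splitting it
// into contiguous chunks by hand, one chunk per thread.
double ManualParallelSum(const std::vector<double>& data, unsigned workers) {
    assert(workers > 0);  // the caller must pick a thread count up front

    // One result slot per thread, so the workers never write to shared state.
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back([&data, &partial, w, begin, end] {
            double sum = 0.0;
            for (std::size_t i = begin; i < end; ++i) sum += data[i];
            partial[w] = sum;
        });
    }

    // Every thread must be joined before `partial` is read.
    for (std::thread& t : threads) t.join();

    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

Even in this small case the caller has to choose the number of threads, which is exactly the decision that becomes hard when the same binary must scale across a mix of hardware.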

Enabling productive concurrency

This is where our team comes in: we’re looking to help overcome these barriers and improve productivity for C++ developers.

We want to make expressing concurrency easier by adding abstractions for describing opportunities for parallelism that maintain the original intent, readability and composability of the code. 

We’re trying to minimize the number of new concepts we introduce to ensure that the model remains approachable and familiar to mainstream C++ developers.

We’re exploring ways for developers to overcome the challenges of shared memory by providing a means of describing applications as isolated components that communicate with a rich message passing interface.

We’re looking at providing a common and efficient Concurrency Runtime that supports a broad range of parallel abstractions and removes the need for developers to build this infrastructure.

A simple example with matrix multiply

I’ll provide a brief example before I close this post.  Here’s a naive matrix multiplication; I’d like to show how easy it can be to express concurrency in something as simple as a for loop:

// Assumes result is zero-initialized; each entry accumulates a dot product.
void MatrixMult(int size, double** m1, double** m2, double** result){
    for (int i = 0; i < size; i++){
        for (int j = 0; j < size; j++){
            for (int k = 0; k < size; k++){
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }
}

And here’s a possible parallel version (note how similar it is):

// Iteration i writes only to row i of result, so the iterations are independent.
void MatrixMult(int size, double** m1, double** m2, double** result){
    parallel_for (0,size,1,[&](int i){
        for (int j = 0; j < size; j++){
            for (int k = 0; k < size; k++){
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    });
}

In this example I’ve taken the outer for loop and replaced it with a call to a parallel_for template function.  I’m also using a new C++0x language feature called lambdas, which saves me from manually writing a functor and capturing the variables used in the function.
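For comparison, here is a rough sketch of the functor one would otherwise write by hand in place of the lambda. The names `MultiplyRow`, `serial_for` and `MatrixMultWithFunctor` are hypothetical and exist only for this illustration. `serial_for` is a plain serial stand-in that mirrors the argument order of the parallel_for call above so the sketch compiles with any standard compiler; it is not the runtime's parallel_for. The functor stores copies of the pointers rather than references to them, which behaves the same way here.

```cpp
#include <cassert>

// Hand-written equivalent of the lambda [&](int i){ ... }:
// every variable the loop body uses becomes a data member.
class MultiplyRow {
public:
    MultiplyRow(int size, double** m1, double** m2, double** result)
        : size_(size), m1_(m1), m2_(m2), result_(result) {}

    // Computes row i of the product, accumulating into result_[i][j].
    void operator()(int i) const {
        for (int j = 0; j < size_; j++) {
            for (int k = 0; k < size_; k++) {
                result_[i][j] += m1_[i][k] * m2_[k][j];
            }
        }
    }

private:
    int size_;
    double** m1_;
    double** m2_;
    double** result_;
};

// Hypothetical serial stand-in: calls body(i) for i in [first, last),
// advancing by step. A parallel version would distribute these calls.
template <typename Func>
void serial_for(int first, int last, int step, const Func& body) {
    assert(step > 0);
    for (int i = first; i < last; i += step) body(i);
}

// Assumes result is zero-initialized, as in the versions above.
void MatrixMultWithFunctor(int size, double** m1, double** m2, double** result) {
    serial_for(0, size, 1, MultiplyRow(size, m1, m2, result));
}
```

The class, its constructor and its four members are the boilerplate that the lambda's `[&]` capture generates implicitly.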

We’ll discuss the libraries and runtime in more detail in future posts, but that’s all for now.  See you next time!

-Rick
