Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

Using function objects instead of lambdas with C++ AMP

Most of the code samples shared by our team pass lambda expressions to parallel_for_each. Lambdas, which Visual C++ has supported since Visual Studio 2010, are a feature of C++11, the standard published just over two months ago. There seems to be no good reason not to use lambdas, but since someone asked: yes, you can absolutely use function objects (a.k.a. functors) instead. Read on to see how!

Original algorithm using lambda expressions

A good starting point is a simple matrix multiplication algorithm, whose implementation using a lambda was introduced as follows:

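#include <amp>
#include <vector>

void MatrixMultiplySimple(std::vector<float>& vC,
                          const std::vector<float>& vA,
                          const std::vector<float>& vB,
                          int M, int N, int W)
{
    using namespace concurrency;

    // Row-major host data viewed as 2D: A is M x W, B is W x N, C is M x N.
    array_view<const float, 2> a(M, W, vA);
    array_view<const float, 2> b(W, N, vB);
    array_view<float, 2> c(M, N, vC);
    c.discard_data(); // C is only written, so its old contents need not be copied over

    // One invocation per element of C; [=] captures a, b, c and W by value.
    parallel_for_each(c.extent, [=](index<2> idx) restrict(amp)
    {
        int row = idx[0]; int col = idx[1];
        float sum = 0.0f;
        for(int i = 0; i < W; i++)
            sum += a(row, i) * b(i, col);
        c[idx] = sum;
    });
}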

Transforming the lambda expression to a function object

Under the hood, lambdas are just function objects that the compiler generates for your convenience; they provide no functionality that a hand-crafted functor could not. Transforming one into the other is therefore a matter of a few simple steps:

  1. Define a class with a unique name; recall that this can be done locally, inside a function.
  2. Define a constructor that accepts one parameter per variable captured by the lambda (including those captured implicitly) and stores them in data members, keeping the pass-by-value and pass-by-reference semantics intact.
  3. Define the class call operator having the same return type, parameter list, restriction specifier and body as the original lambda expression.

The concurrency::parallel_for_each call site then changes slightly, to accept an instance of the function object instead.
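Because the transformation is purely mechanical, it can be tried outside C++ AMP as well. The sketch below is plain standard C++: it uses std::for_each rather than parallel_for_each so that it builds with any C++11 compiler, and the names sum_with_lambda, sum_with_functor and scaled_sum are made up for this illustration. It applies the three steps to a lambda that captures one variable by value and another by reference:

```cpp
#include <algorithm>
#include <vector>

// Sums x * scale over the data with a lambda that captures
// `scale` by value and `total` by reference.
int sum_with_lambda(const std::vector<int>& data, int scale)
{
    int total = 0;
    std::for_each(data.begin(), data.end(),
                  [scale, &total](int x) { total += x * scale; });
    return total;
}

// The same computation with a hand-written function object.
int sum_with_functor(const std::vector<int>& data, int scale)
{
    // Step 1: a local class with a unique name.
    class scaled_sum
    {
    public:
        // Step 2: one constructor parameter per captured variable,
        // keeping the by-value and by-reference semantics.
        scaled_sum(int scale, int& total) : scale(scale), total(total) {}

        // Step 3: same return type, parameter list and body as the lambda.
        // Like a lambda's call operator it is const; writing through the
        // reference member is still allowed.
        void operator()(int x) const { total += x * scale; }

    private:
        int scale;  // was captured by value
        int& total; // was captured by reference
    };

    int total = 0;
    std::for_each(data.begin(), data.end(), scaled_sum(scale, total));
    return total;
}
```

Both functions return the same value for the same input: the reference member preserves the by-reference capture of total, while scale is copied into the object.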

Applying the steps to the presented example will result in the following function:

#include <amp>
#include <vector>

void MatrixMultiplySimple(std::vector<float>& vC,
                          const std::vector<float>& vA,
                          const std::vector<float>& vB,
                          int M, int N, int W)
{
    using namespace concurrency;

    array_view<const float, 2> a(M, W, vA);
    array_view<const float, 2> b(W, N, vB);
    array_view<float, 2> c(M, N, vC);
    c.discard_data();

    // Hand-written equivalent of the lambda: each variable it captured
    // (a, b, c and W, all by value) becomes a data member.
    class functor
    {
    public:
        functor(const array_view<const float, 2>& a,
                const array_view<const float, 2>& b,
                const array_view<float, 2>& c,
                int W)
            : a(a)
            , b(b)
            , c(c)
            , W(W)
        {
        }

        // Same return type, parameter list, restriction specifier and body as the lambda.
        void operator()(index<2> idx) const restrict(amp)
        {
            int row = idx[0]; int col = idx[1];
            float sum = 0.0f;
            for(int i = 0; i < W; i++)
                sum += a(row, i) * b(i, col);
            c[idx] = sum;
        }

    private:
        array_view<const float, 2> a;
        array_view<const float, 2> b;
        array_view<float, 2> c;
        int W;
    };

    parallel_for_each(c.extent, functor(a, b, c, W));
}

The result is clearly more explicit or, if you will, unnecessarily long-winded.

Rules for lambdas in parallel_for_each also apply to functors

The function object's call operator is invoked in place of the lambda to perform the computation, so it is subject to the same requirements as a lambda used in the same context: it must be restrict(amp), and it must be callable with an index<N> argument (or a tiled_index argument when the compute domain is a tiled_extent).

Additionally, the function object has to adhere to the rules for compound types used in the amp context.
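Running C++ AMP code requires a suitable accelerator, so here is a CPU-only sketch of the same idea that assumes nothing beyond standard C++. The index2 struct and the serial_for_each and multiply functions are hypothetical stand-ins (not the C++ AMP types or API) that call the function object once per index of an M x N extent, serially. Raw pointers replace array_view purely to keep the sketch self-contained; this is not how data reaches a real amp kernel. The point is that the caller relies only on the call operator being invocable with an index:

```cpp
#include <vector>

// Hypothetical stand-in for concurrency::index<2> (not the C++ AMP type).
struct index2
{
    int row;
    int col;
};

// Hypothetical serial stand-in for parallel_for_each over an M x N extent.
// It only requires that kernel(index2) is a valid call on a const object.
template <typename Kernel>
void serial_for_each(int M, int N, const Kernel& kernel)
{
    for (int row = 0; row < M; ++row)
        for (int col = 0; col < N; ++col)
            kernel(index2{row, col});
}

// CPU analogue of the matrix multiplication functor (row-major storage).
class matmul_functor
{
public:
    matmul_functor(const float* a, const float* b, float* c, int N, int W)
        : a(a), b(b), c(c), N(N), W(W) {}

    void operator()(index2 idx) const
    {
        float sum = 0.0f;
        for (int i = 0; i < W; i++)
            sum += a[idx.row * W + i] * b[i * N + idx.col];
        c[idx.row * N + idx.col] = sum;
    }

private:
    const float* a; // M x W
    const float* b; // W x N
    float* c;       // M x N
    int N;
    int W;
};

// Multiplies an M x W matrix by a W x N matrix and returns the M x N result.
std::vector<float> multiply(const std::vector<float>& vA,
                            const std::vector<float>& vB,
                            int M, int N, int W)
{
    std::vector<float> vC(M * N);
    serial_for_each(M, N, matmul_functor(vA.data(), vB.data(), vC.data(), N, W));
    return vC;
}
```

For example, multiply({1, 2, 3, 4}, {5, 6, 7, 8}, 2, 2, 2) multiplies two 2 x 2 row-major matrices and yields {19, 22, 43, 50}.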

Functors are still… functors

Apart from the aforementioned rules, the function object behaves as a normal C++ object. It can implement member functions that run only on the CPU, as well as restrict(amp) member functions that its call operator calls. Finally, it can use both inheritance and templates (in my opinion, these may be the only valid reasons for preferring it over a lambda).
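As a rough illustration of those points, the following sketch combines a class template, inheritance, a CPU-only member function and a helper called from the call operator. It is standard C++ without restrict specifiers so that it compiles anywhere; comments mark where restrict(amp) would go in a C++ AMP version (a member function with no restrict clause is implicitly restrict(cpu)). All class and function names are hypothetical, and the std::vector reference member exists only for this CPU sketch:

```cpp
#include <vector>

// Hypothetical base class template holding state shared by several kernels.
template <typename T>
class elementwise_base
{
public:
    explicit elementwise_base(T factor) : factor(factor) {}

    // CPU-only member function (no restrict clause, i.e. restrict(cpu) in C++ AMP).
    T get_factor() const { return factor; }

protected:
    // Helper invoked from the derived call operator; a C++ AMP version
    // would declare it restrict(amp).
    T scale(T x) const { return x * factor; }

private:
    T factor;
};

// Hypothetical templated functor that reuses the base class.
template <typename T>
class scale_and_offset : public elementwise_base<T>
{
public:
    scale_and_offset(T factor, T offset, std::vector<T>& out)
        : elementwise_base<T>(factor), offset(offset), out(out) {}

    // A C++ AMP version would take an index<1> and be declared restrict(amp).
    void operator()(int i) const { out[i] = this->scale(out[i]) + offset; }

private:
    T offset;
    std::vector<T>& out;
};

// Serial CPU driver: invokes the functor once per element.
template <typename T>
void apply_in_place(std::vector<T>& v, T factor, T offset)
{
    const scale_and_offset<T> kernel(factor, offset, v);
    for (int i = 0; i < static_cast<int>(v.size()); ++i)
        kernel(i);
}
```

Calling apply_in_place(v, 2, 1) on a std::vector<int> holding {1, 2, 3} leaves {3, 5, 7} in it, and the same template works unchanged for double, which a C++11 lambda cannot offer.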

The function object may be created anywhere in the program and passed around by pointer, by reference or by value. Eventually it reaches a parallel_for_each call, where it is passed by value. Be aware that, following the C++ rules, a copy constructor is invoked at this point; in the subsequent processing inside the C++ AMP runtime, however, the object is merely blitted (copied bitwise), so no copy constructors, move constructors or destructors are called.

The last thing to note is that, unlike std::for_each, parallel_for_each does not return the function object, so any changes made to its member fields are lost. (Data written through an array_view member, such as c in the example above, is not affected.)
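For contrast, this is the std::for_each behaviour referred to above, in standard C++ (even_counter and count_evens are illustrative names): std::for_each returns its function object after applying it to every element, so state accumulated in a data member can be read afterwards.

```cpp
#include <algorithm>
#include <vector>

// Function object that counts even values in a data member.
struct even_counter
{
    int count = 0;
    void operator()(int x) { if (x % 2 == 0) ++count; }
};

int count_evens(const std::vector<int>& data)
{
    // std::for_each applies its own copy of the function object to every
    // element and returns that copy, so the accumulated count is observable.
    even_counter result = std::for_each(data.begin(), data.end(), even_counter{});
    return result.count;
}
```

Since parallel_for_each returns void, an equivalent result in C++ AMP would have to be written through an array_view member rather than read back from the function object.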
