Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

section example in C++ AMP


In this blog post I will give a simple example of using the section member function of array and array_view, demonstrating how to offset your origin point in order to operate on a smaller section of the data in your computation. For example, if your data is a matrix that looks like this:

array_view<float, 2> qin(height, width, data);

Where height and width are divisible by 2, you can view it in four quarters as follows:

array_view<float, 2> q1 = qin.section(index<2>(0, 0), extent<2>(height/2, width/2));

array_view<float, 2> q2 = qin.section(index<2>(height/2,0), extent<2>(height/2, width/2));

array_view<float, 2> q3 = qin.section(index<2>(0,width/2), extent<2>(height/2, width/2));

array_view<float, 2> q4 = qin.section(index<2>(height/2, width/2));
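To make the offsets concrete, here is a minimal host-side sketch in plain C++ (no amp.h) of what a 2D section means over row-major data. `Section2D`, `make_section` and `make_ramp` are hypothetical names invented for this illustration and are not part of C++ AMP; only the idea of an origin plus an extent over the parent's storage mirrors what `section` does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2D section: a window over row-major parent data.
// It does not own the data, so the parent vector must outlive it.
struct Section2D {
    const float* base;           // first element of the parent storage
    int parent_width;            // row pitch of the parent, in elements
    int origin_row, origin_col;  // where the window starts in the parent
    int rows, cols;              // extent of the window

    // Read element (r, c) relative to the section's own origin.
    float operator()(int r, int c) const {
        assert(r >= 0 && r < rows && c >= 0 && c < cols);
        const std::size_t row = static_cast<std::size_t>(origin_row + r);
        const std::size_t col = static_cast<std::size_t>(origin_col + c);
        return base[row * static_cast<std::size_t>(parent_width) + col];
    }
};

// Build a section from an origin and an extent, checking it fits in the parent.
Section2D make_section(const std::vector<float>& data, int height, int width,
                       int origin_row, int origin_col, int rows, int cols) {
    assert(data.size() == static_cast<std::size_t>(height) * width);
    assert(origin_row >= 0 && origin_col >= 0 && rows >= 0 && cols >= 0);
    assert(origin_row + rows <= height && origin_col + cols <= width);
    return Section2D{data.data(), width, origin_row, origin_col, rows, cols};
}

// Demo data: element i of the row-major storage holds the value i.
std::vector<float> make_ramp(int height, int width) {
    std::vector<float> data(static_cast<std::size_t>(height) * width);
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] = static_cast<float>(i);
    }
    return data;
}
```

With a 4 x 4 ramp, `make_section(d, 4, 4, 2, 0, 2, 2)` plays the role of q2 above: its element (0, 0) is the parent's element (2, 0).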


Below is a complete code example that sums all elements of the array_view ‘qin’ and places the result in its first element. The algorithm treats the data as two-dimensional and splits it into four quarters, then adds the four quarters element by element into a single quarter, ‘qout’. Repeating this operation with ‘qout’ as the new ‘qin’ leaves the overall reduction result in qin(0,0).
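If you want to check the folding scheme without an accelerator, the same idea can be sketched on the CPU. This is a plain C++ illustration under the same constraint as the sample (a square matrix whose side is a power of 2); `reduce_by_quarters` is a hypothetical helper name, and the loops stand in for what the C++ AMP kernel does per index.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU-only sketch: fold a square, power-of-two matrix into its top-left
// quarter repeatedly until one element remains, then return that element.
float reduce_by_quarters(std::vector<float> in, int size) {
    assert(size > 0 && (size & (size - 1)) == 0);  // side must be a power of 2
    assert(in.size() == static_cast<std::size_t>(size) * size);

    while (size > 1) {
        const int half = size / 2;
        std::vector<float> out(static_cast<std::size_t>(half) * half);
        for (int r = 0; r < half; ++r) {
            for (int c = 0; c < half; ++c) {
                // Same (r, c) read from each quarter of the size x size input.
                out[r * half + c] = in[r * size + c]                     // q1
                                  + in[(r + half) * size + c]            // q2
                                  + in[r * size + (c + half)]            // q3
                                  + in[(r + half) * size + (c + half)];  // q4
            }
        }
        in = std::move(out);  // the output quarter becomes the next input
        size = half;
    }
    return in[0];
}
```

For the 16 x 16 ramp used in the sample (values 0 to 255), this returns 32640, which is 255 * 256 / 2.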


The C++ AMP code below demonstrates the section functionality, but it is not intended to be (and indeed isn’t) an optimal implementation of a reduction algorithm (we have one of those in the pipeline); it was written simply to demonstrate usage of the section API.

 1: #include <amp.h>
 2: #include <cstdio>
 3: #include <vector>
 4: using namespace concurrency;
 5: using std::vector;
 6: int main()
 7: {
 8:   // a small data size for the example
 9:   // this sample requires width and height to be equal and a power of 2
 10:   int width = 16;
 11:   int height = 16;
 12:  
 13:   // generate dummy data
 14:   vector<float> data (width * height);
 15:  
 16:   for (int x = 0; x < (width * height); x++)
 17:   {
 18:     data[x] = x * 1.0f;
 19:   }
 20:  
 21:   // wrap data so it is ready to copy to accelerator
 22:   array_view<float,2> qin(height, width, data);
 23:  
 24:   // repeat the reduction
 25:   // until the data can't be reduced any further
 26:   while(width > 1)
 27:   {
 28:     height /= 2;
 29:     width /= 2;
 30:     extent<2> quarterdim(height, width);
 31:     array<float,2> qout(quarterdim);
 32:  
 33:     // view the data as 4 quarters
 34:     // create an array_view offset to each quarter
 35:     const array_view<const float,2> q1 =
 36:             qin.section(index<2>(0, 0) /*origin*/, quarterdim /*extent*/);
 37:     const array_view<const float,2> q2 =
 38:             qin.section(index<2>(height, 0), quarterdim);
 39:     const array_view<const float,2> q3 =
 40:             qin.section(index<2>(0, width), quarterdim);
 41:     const array_view<const float,2> q4 =
 42:             qin.section(index<2>(height, width));
 43:  
 44:     // execute the kernel to accumulate all quarters into the first one
 45:     parallel_for_each(quarterdim, [=, &qout] (index<2> idx) restrict(amp)
 46:     {
 47:       // accumulate all quarters in output quarter
 48:       // using same index but in different section
 49:       qout[idx] = q1[idx] + q2[idx] + q3[idx] + q4[idx];
 50:     });
 51:  
 52:     // set output data array as input view
 53:     // for next loop
 54:     // NOTE: this does not sync data from the GPU to the host
 55:     qin = qout;
 56:  
 57:     // only for demo, print output data
 58:     // transition after every iteration
 59:     for(int y = 0; y < height; y++)
 60:     {
 61:       for (int x = 0; x < width; x++)
 62:       {
 63:         // accessing qin here forces a sync of that quarter back to the host
 64:         // which causes a performance hit
 65:         printf( "%0.1f ", qin(y, x));
 66:       }
 67:       printf("\n");
 68:     }
 69:     printf("===============================================\n");
 70:  
 71:   } // while loop
 72:  
 73:   // final summation result can be obtained from
 74:   // qin(0,0) here
 75: }
 76: // Sample print out
 77:  
 78: //272.0 276.0 280.0 284.0 288.0 292.0 296.0 300.0
 79: //336.0 340.0 344.0 348.0 352.0 356.0 360.0 364.0
 80: //400.0 404.0 408.0 412.0 416.0 420.0 424.0 428.0
 81: //464.0 468.0 472.0 476.0 480.0 484.0 488.0 492.0
 82: //528.0 532.0 536.0 540.0 544.0 548.0 552.0 556.0
 83: //592.0 596.0 600.0 604.0 608.0 612.0 616.0 620.0
 84: //656.0 660.0 664.0 668.0 672.0 676.0 680.0 684.0
 85: //720.0 724.0 728.0 732.0 736.0 740.0 744.0 748.0
 86: //===============================================
 87: //1632.0 1648.0 1664.0 1680.0
 88: //1888.0 1904.0 1920.0 1936.0
 89: //2144.0 2160.0 2176.0 2192.0
 90: //2400.0 2416.0 2432.0 2448.0
 91: //===============================================
 92: //7616.0 7680.0
 93: //8640.0 8704.0
 94: //===============================================
 95: //32640.0
 96: //===============================================
 
Observe in the sample that the array_view objects captured in the kernel need only read access to the data, which is why I declared them as array_view<const float,2>.

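Also notice that the creation of ‘q1’ (line 35) can use one of the other section overloads to retrieve the same view:

array_view<float,2> q1 = qin.section(quarterdim);

In this case only the extent is given, and the origin is taken to be (0, 0). The overload used for ‘q4’ works the other way around: only the origin is given, and the extent is inferred to cover the rest of the parent array/array_view. A third form takes the origin and the extent as plain integers:

array_view<float,2> q1 = qin.section(0, 0, height, width);

Similarly q2 and q3 can be created using the latter section function call.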
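The way the overloads fill in the missing piece can be modelled in a few lines of plain C++. This is only a sketch of the selection rule, not C++ AMP code: `Rect`, `Parent`, `section_from` and `section_of` are hypothetical names (plain C++ cannot overload two functions that both take two ints, so the origin-only and extent-only forms get separate names here).

```cpp
#include <cassert>

// Hypothetical description of a 2D section: where it starts and how big it is.
struct Rect {
    int row, col;    // origin within the parent
    int rows, cols;  // extent
};

bool operator==(const Rect& a, const Rect& b) {
    return a.row == b.row && a.col == b.col &&
           a.rows == b.rows && a.cols == b.cols;
}

// Models only the parent's extent and which window each form selects.
struct Parent {
    int height, width;

    // Origin and extent both given explicitly.
    Rect section(int row, int col, int rows, int cols) const {
        assert(row >= 0 && col >= 0 && rows >= 0 && cols >= 0);
        assert(row + rows <= height && col + cols <= width);
        return Rect{row, col, rows, cols};
    }

    // Origin only: the extent covers the rest of the parent.
    Rect section_from(int row, int col) const {
        return section(row, col, height - row, width - col);
    }

    // Extent only: the origin is (0, 0).
    Rect section_of(int rows, int cols) const {
        return section(0, 0, rows, cols);
    }
};
```

For a 16 x 16 parent, `section_of(8, 8)` selects the same window as `section(0, 0, 8, 8)` (the q1 case), and `section_from(8, 8)` selects the same window as `section(8, 8, 8, 8)` (the q4 case).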

Finally, one might look closely at ‘q1’ and ask: couldn’t ‘qin’ itself be used in its place, reducing the number of lines of code? The answer is “yes”, but that would introduce a performance overhead: instead of copying 4 quarters to GPU memory, this change would copy 3 quarters plus the whole matrix. Copying data back to the host would likewise copy the whole matrix instead of just one quarter of it.
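To put a number on that overhead, the element counts can be worked out directly. The sketch below takes the description above at face value (four quarters copied in one case, three quarters plus the whole matrix in the other); `CopyCost` and `first_pass_copy_cost` are hypothetical names, and what the runtime actually transfers may differ.

```cpp
#include <cassert>
#include <cstddef>

// Elements copied to the accelerator on the first pass, per the text above.
struct CopyCost {
    std::size_t using_q1;   // q1, q2, q3 and q4 captured as sections
    std::size_t using_qin;  // qin captured in place of q1, plus q2, q3, q4
};

CopyCost first_pass_copy_cost(std::size_t height, std::size_t width) {
    assert(height % 2 == 0 && width % 2 == 0);
    const std::size_t whole = height * width;
    const std::size_t quarter = (height / 2) * (width / 2);
    return CopyCost{4 * quarter, 3 * quarter + whole};
}
```

For the 16 x 16 sample this gives 256 elements against 448, that is, 1.75 times as much data on the way in.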

That completes my example for creating sub-sections using the section member function. Feel free to ask questions in the comments section below or in our MSDN forum.
