Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

concurrency::array_view – discard_data

The previous posts in this series on C++ AMP array_view covered:

  1. Introduction to array_view and some of its key semantic aspects
  2. Implicit synchronization on destruction of array_views

This post will describe the discard_data function on array_views.

discard_data

Often, your algorithm may need to use an array_view purely as an output, in which case the current contents of the array_view are inconsequential. When such an output array_view is accessed on an accelerator_view for a computation, copying the current contents underlying the array_view to the accelerator is wasted work. C++ AMP provides the discard_data method as a means for you to indicate to the runtime that the contents of the portion of data underlying the array_view are not interesting and need not be copied when the array_view is accessed on an accelerator_view where it is not already cached. Calling this method on an array_view can be thought of as trashing the existing contents of the portion of data referenced by the array_view; the discard applies to the array_view and to all other array_views that are its sections or projections.

It is important to note that the effect of discard_data is transient: as soon as a discarded array_view is written to (on the host or on any accelerator_view), the contents of the array_view become valid again and are no longer considered discarded. Hence, if an array_view is discarded, then written to on accelerator_view “av1”, and then accessed at another location, the new contents of the array_view are transferred from “av1” to the location where it is next accessed.

Another use of the discard_data method is to avoid unwanted implicit synchronization upon destruction of the last array_view of a data source. A typical scenario is an algorithm that uses temporary data sources, and array_views over them, to store intermediate results whose final contents need not be synchronized back to the data source.
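The caching behaviour described in the last three paragraphs can be illustrated with a small host-only toy model. This is a sketch under stated assumptions, not the C++ AMP runtime: ToyView, Location and the transfers counter are hypothetical names, the "accelerator" is simply a second std::vector, and the model only counts the copies the runtime would need to make. It shows that discard_data skips the copy-in on first access, that a write ends the discarded state, and that a view whose only valid copy lives on the accelerator is copied back to its data source on destruction unless it has been discarded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model of array_view caching (NOT the C++ AMP runtime).
// "Host" is the data source; "Accelerator" is a cached copy. Every copy the
// real runtime would have to perform between them is counted in `transfers`.
enum class Location { Host, Accelerator };

class ToyView {
public:
    ToyView(std::vector<float>& source, int& transfers)
        : host_(source), acc_(source.size()), transfers_(transfers) {}

    // Models discard_data(): the current contents need not be preserved.
    void discard_data() { discarded_ = true; }

    // Models accessing the view at `loc` (CPU code or a parallel_for_each).
    void access(Location loc) {
        bool& hereValid = (loc == Location::Host) ? hostValid_ : accValid_;
        if (!hereValid && !discarded_) {
            copy_to(loc);  // stale copy must be refreshed before use
        }
        hereValid = true;
    }

    // Models writing element i at `loc`; a write ends the discarded state.
    void write(Location loc, std::size_t i, float value) {
        assert(i < acc_.size());
        access(loc);
        data(loc)[i] = value;
        discarded_ = false;  // the effect of discard_data is transient
        hostValid_ = (loc == Location::Host);
        accValid_ = (loc == Location::Accelerator);
    }

    // Models implicit synchronization when the last view is destroyed.
    ~ToyView() {
        if (!discarded_ && !hostValid_) {
            copy_to(Location::Host);
        }
    }

private:
    std::vector<float>& data(Location loc) {
        return loc == Location::Host ? host_ : acc_;
    }

    // Copies the whole contents to `loc` from the other location.
    void copy_to(Location loc) {
        if (loc == Location::Host) {
            host_ = acc_;
        } else {
            acc_ = host_;
        }
        ++transfers_;
    }

    std::vector<float>& host_;  // the data source (home location)
    std::vector<float> acc_;    // cached accelerator copy
    int& transfers_;
    bool hostValid_ = true;
    bool accValid_ = false;
    bool discarded_ = false;
};
```

In this model, writing one element on the accelerator without a prior discard_data costs two transfers (copy-in, then copy-back on destruction). Calling discard_data first removes the copy-in, and calling it again before the view is destroyed, as Guideline B below recommends, removes the copy-back as well.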

Guidelines on discard_data

Guideline A: For an array_view that is exclusively used for output in your CPU code or a parallel_for_each invocation, call discard_data on the array_view before capturing the array_view in the parallel_for_each kernel or accessing it in your CPU code.

Keeping in mind the transient effect of a discard_data call, C++ AMP programmers are encouraged to call discard_data on the output array_view just before the parallel_for_each invocation. This serves a dual purpose: it self-documents the output-only nature of the array_view in that parallel_for_each, and it ensures that the effect of the discard is not inadvertently lost due to intermediate writes to the array_view between the discard_data call and the parallel_for_each invocation where the array_view is used for output.

Guideline B: Call discard_data on array_views created over temporary data sources once they are no longer needed, before the array_views over such data sources are destroyed.

This prevents any pending modifications to such array_views (on locations other than the data source’s home location) from being implicitly synchronized to the data source when the array_views are destroyed.

#include <amp.h>
#include <vector>

using namespace concurrency;

// Reduces a row with func. Assumes rowView.extent[0] is a power of two (and
// at least 2), and that func is callable from restrict(amp) code.
template <typename BinaryFunction>
float ReduceRow(const array_view<const float> &rowView,
                const BinaryFunction &func)
{
    const int halfLen = rowView.extent[0] / 2;
    std::vector<float> tempVec(halfLen);
    array_view<float> tempView(halfLen, tempVec);

    // Guideline A: Call discard_data on an array_view that is to be used
    // purely as an output, to avoid unnecessary copying to the accelerator_view
    tempView.discard_data();

    // First pass: combine element i with element i + halfLen of the row
    parallel_for_each(tempView.extent, [=](index<1> idx) restrict(amp) {
        float a = rowView(idx);
        float b = rowView(idx[0] + halfLen);
        tempView(idx) = func(a, b);
    });

    // Remaining passes: halve the active range until one element is left
    for (int stride = halfLen / 2; stride > 0; stride /= 2)
    {
        parallel_for_each(extent<1>(stride), [=](index<1> idx) restrict(amp) {
            float a = tempView(idx);
            float b = tempView(idx[0] + stride);
            tempView(idx) = func(a, b);
        });
    }

    // Copy only the first element (the reduced value) back to the host
    float result;
    concurrency::copy(tempView.section(0, 1), &result);

    // Guideline B: Call discard_data on the temporary view before it goes
    // out of scope to prevent the view from being synchronized to the CPU on destruction
    tempView.discard_data();

    return result;
}
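The two loops in ReduceRow implement a pairwise halving (tree) reduction. The sequential host-only sketch below, ReduceRowOnHost (a hypothetical helper, not part of C++ AMP), performs the same sequence of combine steps, which can make the access pattern easier to follow and can serve as a CPU reference when checking the accelerator result. Like ReduceRow, it assumes the row length is a power of two and at least 2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential reference for the halving reduction in ReduceRow.
// Assumes row.size() is a power of two and >= 2, as ReduceRow does.
template <typename BinaryFunction>
float ReduceRowOnHost(const std::vector<float>& row, const BinaryFunction& func)
{
    assert(row.size() >= 2 && (row.size() & (row.size() - 1)) == 0);

    // First pass (first kernel): combine element i with element i + n/2.
    const std::size_t halfLen = row.size() / 2;
    std::vector<float> temp(halfLen);
    for (std::size_t i = 0; i < halfLen; ++i) {
        temp[i] = func(row[i], row[i + halfLen]);
    }

    // Later passes (loop of kernels): halve the active range each time.
    // Within a pass, reads at i + stride never touch slots written in that
    // pass, which is why the parallel version needs no extra synchronization.
    for (std::size_t stride = halfLen / 2; stride > 0; stride /= 2) {
        for (std::size_t i = 0; i < stride; ++i) {
            temp[i] = func(temp[i], temp[i + stride]);
        }
    }
    return temp[0];
}
```

For example, with the row {1, 2, 3, 4, 5, 6, 7, 8} and an addition functor, the first pass produces {6, 8, 10, 12}, the next pass {16, 20}, and the final pass 36.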

 

In closing

In this post we looked at the discard_data function for array_views. Subsequent posts will dive into other functional and performance aspects of array_view - stay tuned!

I would love to hear your feedback, comments and questions below or in our MSDN forum.
