Parallel Programming in Native Code

Parallel programming using C++ AMP, PPL and Agents libraries.

concurrency::array_view - Implicit synchronization on destruction


The previous post in this series introduced array_views, why you should prefer using them in your C++ AMP code, and some key semantic aspects of array_view. In this post we will look at the implicit synchronization behavior on destruction of array_views.

Implicit synchronization on destruction

Our earlier blog post on array_view synchronization lists the common operations that result in implicit synchronization of an array_view’s contents across different locations (accelerator_views and/or the host CPU). While most instances that trigger an implicit synchronization are fairly intuitive and easy to reason about, the semantics around the implicit synchronization upon destruction of an array_view are somewhat subtle.

When the last array_view associated with a data source is destructed, any pending modifications on a location other than the home storage location of the data source are synchronized back to the data source (unless the modifications were explicitly discarded using the discard_data API).

This is done to ensure that modifications to the data are not inadvertently lost. The implicit synchronization happens only on destruction of the last array_view associated with a data source (basically, concurrency::array objects and the top-level array_views created from CPU data). Synchronizing on destruction of every array_view would be undesirable: as long as another array_view on that data source is still alive, doing so would only cause expensive and unnecessary data transfers.
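To make the "last array_view" rule concrete, here is a toy model in standard C++ (no amp.h needed). It is only a sketch: ToyState, ToyView, write_remote and run_demo are hypothetical names invented for illustration, and the shared_ptr bookkeeping is an assumption about how such behavior could be modeled, not a description of the C++ AMP runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model only. These types are hypothetical and say nothing about how
// the C++ AMP runtime is actually implemented.
struct ToyState {
    explicit ToyState(std::vector<int>& h) : home(&h), remote(h) {}

    // Runs once, when the last ToyView sharing this state is destroyed.
    ~ToyState() {
        if (dirty) *home = remote;   // copy pending modifications back home
    }

    std::vector<int>* home;    // caller-owned data (the "home" location)
    std::vector<int> remote;   // copy that lives on another location
    bool dirty = false;        // true while remote holds unsynchronized writes
};

class ToyView {
public:
    explicit ToyView(std::vector<int>& home)
        : state_(std::make_shared<ToyState>(home)) {}

    // Models a write performed on a location other than the home location.
    void write_remote(std::size_t i, int value) {
        assert(i < state_->remote.size());
        state_->remote[i] = value;
        state_->dirty = true;
    }

    // Models the effect of discard_data: pending modifications are dropped.
    void discard_data() { state_->dirty = false; }

private:
    std::shared_ptr<ToyState> state_;   // copies of a view share one state
};

// Value of data[0] observed after the inner view dies and after the last
// view dies.
struct Observed { int after_inner; int after_last; };

Observed run_demo(bool discard) {
    std::vector<int> data{1, 2, 3};
    Observed seen{};
    {
        ToyView outer(data);
        {
            ToyView inner = outer;      // second view on the same data source
            inner.write_remote(0, 42);
            if (discard) inner.discard_data();
        }                               // inner destroyed: no copy-back yet
        seen.after_inner = data[0];
    }                                   // last view destroyed: copy-back if dirty
    seen.after_last = data[0];
    return seen;
}
```

In this sketch, run_demo(false) still sees the original value while the outer view is alive and sees 42 only once the last view is gone, whereas run_demo(true) never copies anything back because the modification was discarded.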

As mentioned in the introductory post of this series, the runtime relies on the type of the array_view to determine whether the contents of an array_view captured in a parallel_for_each may be modified on the accelerator_view by that parallel_for_each invocation. An array_view<const T> object captured in a parallel_for_each tells the runtime that the array_view is only read from and will not be modified on the accelerator_view. An array_view<T> object captured in a parallel_for_each indicates read-write access. The compiler attempts to analyze how the array_view is used in the parallel_for_each kernel and informs the runtime if it is only read from, even when the type of the array_view object indicates read-write access. However, there is no guarantee that the compiler will always be able to determine this, so we highly recommend that you use an array_view<const T> object for read-only data to explicitly communicate the read-only intent to the runtime.
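The idea that the element type alone can carry the read-only intent can be sketched in standard C++. ToyCapturedView and may_be_modified are hypothetical names used only for this illustration; the real runtime's decision logic is not shown in this post, so treat this as an analogy rather than the actual mechanism.

```cpp
#include <type_traits>

// Toy model only: a hypothetical stand-in for a view captured by a kernel.
template <typename T>
struct ToyCapturedView {
    // A view of const T cannot be written through, so modification can be
    // ruled out from the type alone, without analyzing the kernel body.
    static constexpr bool may_be_modified() {
        return !std::is_const<T>::value;
    }
};

// Checked at compile time: only the non-const view has to be treated as
// potentially modified.
static_assert(!ToyCapturedView<const float>::may_be_modified(),
              "read-only view: nothing to copy back");
static_assert(ToyCapturedView<float>::may_be_modified(),
              "read-write view: must be assumed modified");
```

The point of the analogy is that const in the element type is a guarantee available up front, while inferring read-only usage from the kernel body is a best-effort analysis.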

Another important detail about the implicit synchronization of modifications on array_view destruction is that the destructor swallows any exceptions raised during that synchronization, following the general C++ rule that destructors should not let exceptions escape.
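The consequence of a destructor that swallows exceptions is easy to reproduce in plain C++. The sketch below uses a hypothetical FlakyView class whose data transfer is assumed to fail; it is not C++ AMP code, and the names FlakyView, implicit_only and explicit_first exist only for this illustration.

```cpp
#include <stdexcept>
#include <string>

// Toy model only: a hypothetical view whose copy-back can be made to fail.
class FlakyView {
public:
    explicit FlakyView(bool transfer_fails) : transfer_fails_(transfer_fails) {}

    // Explicit synchronization: a failed transfer surfaces as an exception.
    void synchronize() {
        if (!pending_) return;      // nothing left to copy back
        pending_ = false;
        if (transfer_fails_) throw std::runtime_error("transfer failed");
    }

    // Destructors are noexcept by default, so an escaping exception would
    // terminate the program. The destructor therefore swallows the failure.
    ~FlakyView() {
        try { synchronize(); } catch (...) { /* error is lost here */ }
    }

private:
    bool transfer_fails_;
    bool pending_ = true;           // models pending modifications
};

// Relies on the destructor: the failure never reaches the caller.
std::string implicit_only() {
    try {
        FlakyView view(true);
    } catch (const std::exception&) {
        return "caught";
    }
    return "silent";
}

// Synchronizes explicitly first: the failure propagates to the caller.
std::string explicit_first() {
    try {
        FlakyView view(true);
        view.synchronize();
    } catch (const std::exception& e) {
        return std::string("caught: ") + e.what();
    }
    return "silent";
}
```

Here implicit_only() returns "silent" even though the transfer failed, while explicit_first() returns "caught: transfer failed", so only the explicit call gives the application a chance to react.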

Guidelines regarding implicit synchronization on destruction

 

Guideline A: Use an array_view<const T> object when the array_view is only read from (not written to) in a parallel_for_each or in your CPU code.

The use of array_view<const T> leads to self-documenting code and also precludes the runtime from thinking that the data is being modified on an accelerator_view. Not doing so may result in an unexpected implicit synchronization on destruction of the last array_view associated with that data source. There are other reasons too why you should explicitly indicate the read-onliness of the array_view but let us save those for another post.

Guideline B: Do not rely on the implicit synchronization on destruction. Always synchronize explicitly before the array_view is destroyed, unless the array_view contents have been discarded using discard_data.

This ensures that any exceptions encountered during the synchronization are propagated to the application.

Let us see these guidelines in action:

#include <amp.h>
using namespace concurrency;

template <typename T>
void VectorAddition(const T *A, const T *B, T *C, int numElements)
{
    // Guideline A: Explicitly specify read-onliness by creating array_view<const T>.
    // Even if the compiler fails to infer read-onliness, the contents will then not
    // be unnecessarily synchronized from the accelerator_view to the CPU data source
    // on destruction of the views.
    array_view<const T> viewA(numElements, A);
 
    array_view<const T> viewB(numElements, B);
 
    array_view<T> viewC(numElements, C);
    viewC.discard_data();
 
    parallel_for_each(viewC.extent, [=](index<1> idx) restrict(amp) {
        viewC(idx) = viewA(idx) + viewB(idx);
    });
 
    // Guideline B: Explicitly synchronize instead of relying on the implicit
    // synchronization on destruction which would swallow exceptions.
    viewC.synchronize();
}

 

In closing

In this post we looked at the implicit synchronization behavior for array_views on destruction. Subsequent posts will dive into other functional and performance aspects of array_view - stay tuned!

I would love to hear your feedback, comments and questions below or in our MSDN forum.
