If you follow the excellent Parallel Programing with .Net blog, you will have read a recent post by Emad Omara demonstrating a Parallel Merge Sort using Barrier. While there may be more efficient parallel sorting options, as this post notes, this is a good demonstration of the usage of a Barrier, and presents a reasonable parallel sorting solution. As such I thought it would be useful to present this code in F#.
Before going into the code in detail here are some comparisons of the F# parallel sorting performance on my quad core laptop, for an array of 5 million floats.
As you can see the parallel sort performance is as stated quite reasonable; at least doubling the base sort performance. The code to be demonstrated will provide an implementation for all six operations; sort, sortInPlace, sortBy, sortInPlaceBy, sortWith, and, sortInPlaceWith.
A quick point about the original implementation. In the original code it is the original array that is sorted. F# sort operations support sorts that return a new array and sort that operate InPlace. As such the InPlace performance will be slightly faster. The reason for this is that for the non-InPlace version the original array is copied into a secondary array, which is sorted InPlace.
So without further adieu, what does the code look like? Here is a complete listing:
As you can see, the internal method SortInPlaceInternal is the actual implementation of the sort. The remaining members deal with settings parameters for calling this function, based on the sort options.
Whereas the nature of this sort is identical to the original implementation, there are some subtle differences. These mostly dealing with comparison, optional projection for the sortBy, and array references.
Firstly it is worth talking about the implementation of the Barrier and array references. In this Barrier implementation the partition size is decreased (as before) and a flag is set indicating that a swap has occurred. So what does this mean? The swapped flag is used to determine the direction of the merge operation after the initial parallelized sorts. When a merge is performed the From and To arrays are determined function in which the returned array is determined by this swapped flag:
So why do this? The rational behind this was so that the array references between merge passes did not have to change. Instead the merge operation just gathers a reference to the appropriate array.
As mentioned one of the big differences in this implementations is the processing of comparisons and the optional projection for sortBy operations.
During the sort process there are 2 comparisons that take place. The first is that for the initial Array.Sort, and the second is that used for the merge steps. The reason for this distinction is the use of Array.Sort for sorting the sections of the array in Parallel. Array.Sort supports the optional use of an optional IComparer(T) generic interface. Thus when a comparer is specified it has to be converted to this interface. Luckily F# makes this easy:
When dealing with the optional projection for sortBy one could similarly construct an object implementing IComparer using a comparer like:
However, for sorting performance, I found a more efficient approach was to define an array containing the projected keys for the array. This allows the usage of the override for Array.Sort that takes a set of keys, in addition to the array.
Thus combing both the comparer and projection requirements for sorting, each parallel range is sorted with the following:
For merging, if possible structural comparisons are used. Thus the comparison of array elements is defined using the following:
With the internal sorting process defined all that remains is to actually perform the sort:
As in the previous example an optimization is in place to ensure that small arrays are sorted using the base sort operations. One can however configure this to suit as necessary; as one can also configure the total worker threads.
As one of objectives of this exercise was to provide sort operations on the Array.Parallel module, a few extensions are defined:
These definitions allow one to perform parallel sort operations as one would do for base sorting:
So if you have the need to perform parallel sort operation hopefully this will get you off the ground. If you want to see the code run, here is an fsx source file definition that I have used for testing.
In future posts I will look at QuickSort options in more detail.
Written by Carl Nolan