We are delighted to announce that CMA has written an addendum to the C++ AMP case study, detailing their problem and how they used C++ AMP. For context, we recommend reading the case study itself, "Financial Data Leader Shortens Time-to-Market, Increases Speed with Right Tools". It goes without saying that we are very thankful to Moody Hadi and CMA for providing this addendum, which is reproduced below.
CMA needs to build multiple interest rate curves per currency, and the curves within a currency are related to one another. When market inputs are volatile, whether because of illiquidity in a section of the curve or because of genuine volatility, the analysts have to intervene to validate that the moves are genuine and, if they are not, rely on other sources to derive that segment of the curve accurately. This process is highly iterative: the analyst tries different market inputs, regenerates the curves, and then checks the impact of the changes on the entire set of curves. A change in one market input can have a ripple effect on other curves in the same set. The daily time window for this process is short (~20 minutes), and the analysts have to validate multiple asset classes and currencies within it. The user experience for every analyst therefore needs to be kept as close to instantaneous as possible.
The relationships among the curves in a set for the same currency translate into a number of constraints, which ensure that those relationships still hold after the curves are derived from market inputs. In addition, each individual curve has to satisfy a number of constraints to ensure smoothness, some degree of locality, and monotonicity. Finally, all the generated curves need to be consistent with all the market inputs used to generate them. The algorithm needs to generate curves that satisfy all of these constraints.
We use a multidimensional, gradient-based root-finding algorithm that has to solve for approximately 20 degrees of freedom and satisfy all the constraints to reach a solution. The algorithm itself is iterative and so depends on the input instrument selection. We used C++ AMP to remove the bottlenecks in cases where the instruments require more processing time; the gains there compound across iterations, since each iteration takes much less time.
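As an editorial illustration (not part of CMA's addendum, and not CMA's actual solver), here is a minimal sketch of one common gradient-based root-finding scheme: Newton's method with a forward-difference Jacobian. It assumes a square system, that is, as many residuals as unknowns. The names `Residual`, `solve_linear`, and `newton_solve`, as well as the step size and tolerances, are hypothetical choices made for this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
// Hypothetical residual function: F(x) == 0 at the solution (square system).
using Residual = std::function<Vec(const Vec&)>;

// Solve A * x = b by Gaussian elimination with partial pivoting.
// A and b are taken by value because they are modified in place.
// Returns false if the matrix is numerically singular.
bool solve_linear(std::vector<Vec> A, Vec b, Vec& x) {
    const std::size_t n = b.size();
    for (std::size_t col = 0; col < n; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        if (std::fabs(A[pivot][col]) < 1e-14) return false;
        std::swap(A[col], A[pivot]);
        std::swap(b[col], b[pivot]);
        for (std::size_t r = col + 1; r < n; ++r) {
            const double f = A[r][col] / A[col][col];
            for (std::size_t c = col; c < n; ++c) A[r][c] -= f * A[col][c];
            b[r] -= f * b[col];
        }
    }
    // Back substitution on the upper-triangular system.
    x.assign(n, 0.0);
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t c = i + 1; c < n; ++c) s -= A[i][c] * x[c];
        x[i] = s / A[i][i];
    }
    return true;
}

// Newton iteration: solve J(x) * step = -F(x), then x += step.
// J is approximated column by column with forward differences.
// Returns the number of iterations taken, or -1 if it did not converge.
int newton_solve(const Residual& F, Vec& x, double tol = 1e-10, int max_iter = 50) {
    const std::size_t n = x.size();
    for (int it = 0; it < max_iter; ++it) {
        const Vec f = F(x);
        double norm = 0.0;  // max-norm of the residual vector
        for (double v : f) norm = std::fmax(norm, std::fabs(v));
        if (norm < tol) return it;

        std::vector<Vec> J(n, Vec(n, 0.0));
        for (std::size_t j = 0; j < n; ++j) {
            Vec xh = x;
            const double h = 1e-7 * std::fmax(1.0, std::fabs(x[j]));
            xh[j] += h;
            const Vec fh = F(xh);
            for (std::size_t i = 0; i < n; ++i) J[i][j] = (fh[i] - f[i]) / h;
        }

        Vec neg_f(n);
        for (std::size_t i = 0; i < n; ++i) neg_f[i] = -f[i];
        Vec step;
        if (!solve_linear(J, neg_f, step)) return -1;
        for (std::size_t i = 0; i < n; ++i) x[i] += step[i];
    }
    return -1;
}
```

For instance, with the two residuals x^2 + y^2 - 4 and x - y and the starting point (1, 1), `newton_solve` converges to x = y = sqrt(2). Each iteration evaluates F once per unknown plus once more, so with roughly 20 unknowns the cost of a single residual evaluation dominates the run time of a scheme like this.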
Intel Core i7-860
Windows 8 Professional, 64-bit
The machine is also used to process multiple currencies and asset classes, so we are trying to use the same hardware as efficiently as possible. During the analyst validation period, many requests arrive at the machine concurrently, so if a certain set of curves has expensive instruments, it can delay the processing of other requests.
We were able to reduce the processing time for curve groups from ~20 to 30 seconds (single-threaded) to ~10 to 15 seconds (CPU, multi-threaded) and then to less than 5 seconds (GPU, multi-threaded).
An additional benefit is that, by using the GPU, we are able to achieve a good degree of load balancing during critical time periods and avoid delays caused by processing-heavy requests, whether those stem from incongruent market inputs or from expensive instruments. Requests that can be processed quickly on the CPU are sent there, and expensive requests can simultaneously be sent to the GPU, giving all users a fast and consistent experience.
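As an editorial illustration (not part of CMA's addendum), the routing idea can be sketched as below. The addendum does not say how requests are classified, so everything here is an assumption: the `CurveRequest` type, its `estimated_cost_ms` field, the fixed threshold, and the function names are all hypothetical.

```cpp
#include <string>
#include <vector>

enum class Device { Cpu, Gpu };

// Hypothetical request record; the cost estimate is assumed to be
// available up front (for example, from the instrument mix).
struct CurveRequest {
    std::string currency;
    double estimated_cost_ms;
};

// Cheap requests stay on the CPU; requests at or above the threshold go to the GPU.
Device choose_device(const CurveRequest& request, double gpu_threshold_ms) {
    return request.estimated_cost_ms >= gpu_threshold_ms ? Device::Gpu : Device::Cpu;
}

// Two work lists that could be drained concurrently by CPU and GPU workers.
struct RoutedRequests {
    std::vector<CurveRequest> cpu;
    std::vector<CurveRequest> gpu;
};

// Partition incoming requests by the device chosen for each one.
RoutedRequests route_requests(const std::vector<CurveRequest>& requests,
                              double gpu_threshold_ms) {
    RoutedRequests routed;
    for (const CurveRequest& request : requests) {
        if (choose_device(request, gpu_threshold_ms) == Device::Gpu)
            routed.gpu.push_back(request);
        else
            routed.cpu.push_back(request);
    }
    return routed;
}
```

A real dispatcher would likely also account for how busy each device already is; this sketch only shows the cost-based split described above.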
The C++ AMP API was easy to use and allowed us to quickly address and deploy fixes for bottlenecks without having to resort to more complicated APIs or perform major rewrites of the existing code base. The use of lambdas and an STL-like library made the entire coding process much easier, so we were able to deploy our solution quickly, with a good degree of reusability and without needing knowledge of proprietary, GPU-specific APIs.