Excel Workbook Offloading Beta Transition Guide

The Beta 2 release of Windows HPC 2008 R2 that was made available this past spring included the first public release of a feature allowing developers to take a computationally intensive workbook, add some VBA code, and run it on a cluster entirely within the workbook. By implementing a set of pre-defined VBA functions to parallelize and coordinate computation, existing business logic can used across a set of workbooks running in parallel on a cluster.

This post is for those developers and partners that have already started using HPC Beta 2 to offload workbook computation or received training on the API as it was defined in the Beta release. Based on customer and partner feedback, we made a number of breaking changes for the release candidate (RC) to ensure that the first official release met the needs of our customers in an intuitive and robust manner. This guide will walk through those changes, why they were made, and how to update your existing Beta workbooks to use the new APIs. For those of you who are unfamiliar with this feature, I highly suggest reading the HPC Services for Excel white papers (links available here ) and encourage you to think about what workbooks you use that may benefit from parallelization (see also Developer resources for Excel).

This article addresses the following changes from Beta 2 to RC:

Macro Specification

  • For workbook offloading, the workbook must now include a new macro called HPC_GetVersion().

ExcelClient COM API

  • ExcelClientV1 and IExcelClientV1 have changed to ExcelClient and IExcelClient, respectively.
  • To select a resource type, use SessionUnitType instead of ResourceUnitType.
  • The workbook path is now a parameter for the ExcelClient.OpenSession method instead of for the ExcelClient.Initialize method.
  • The ExcelClient.OpenSession method includes new parameters to specify the workbook path, the service name, and a job template.
  • The ExcelClient.DeployWorkbook method was removed.

Macro Specification

As part of the Beta 2 release, we provided a specification for macros that must be implemented to allow a workbook computation to be divided into pieces (HPC_Partition), calculated in parallel (HPC_Execute), and aggregated and displayed to the user (HPC_Merge). In addition to these and the other optional macros defined in Beta 2 (HPC_Initialize, HPC_Finalize, and HPC_ExecutionError), a final macro, HPC_GetVersion is now required. It is expected to be implemented as follows.

Function HPC_GetVersion() As String
{
  HPC_GetVersion = "1.0"
}

This macro is necessary for future versioning support in the eventuality that we change the macro specification to support additional features in future releases. By implementing HPC_GetVersion now, you will be ensuring that your workbook will run correctly across future releases of HPC Services for Excel. This is a required function and ExcelClient.Initialize will throw if it is not implemented or returns an incompatible version ("1.0" is the only compatible version for this release).

ExcelClient COM API

Type Names

To initiate calculation and request cluster resources, a COM-visible library is available called Microsoft_HPC_Excel. In the Beta 2 release, we exposed two types: ExcelClientV1 and IExcelClientV1, the class and interface respectively. In the RC, the names of these types have changed to ExcelClient and IExcelClient, dropping the V1 suffix. As our future versioning plans solidified we realized that this naming scheme will better enable us to seamlessly support forward/backward compatibility among workbooks written targeting multiple versions of the HPC Services for Excel libraries.

SessionUnitType Enumeration

In Beta2, developers were instructed to use the ResourceUnitType enumeration to select the type of resource (core, node, or socket) requested in ExcelClient.OpenSession. This type was defined in Microsoft.Hpc.Scheduler.Properties. In an effort to increase the modularity and independence of our components, we now expose a separate enumeration over COM called SessionUnitType in Microsoft.Hpc.Excel. This enum contains the same possible selections of core, socket, or node allocation.

ExcelClient.Initialize Method

In Beta 2, the ExcelClient.Initialize method signature was defined as follows.

ExcelClient.Initialize (Microsoft.Office.Interop.Excel.Application app, string remoteWorkbookPath)

  • app - Currently running Excel application
  • remoteWorkbookPath - path where the compute nodes in the cluster can find the workbook

The first thing we realized was that specifying the remoteWorkbookPath in ExcelClient.Initialize was unintuitive. Initialize may only be called once per ExcelClient instance, so changing the workbook path between calculations was not possible without disposing and recreating an ExcelClient instance. In addition, this path is not required if calculating locally and it doesn't make sense to require a parameter that is never used. To remedy both of these issues, remoteWorkbookPath is now a parameter to the ExcelClient.OpenSession method.

The remaining parameter to ExcelClient.Initialize, the Excel application, was also changed. It is possible to have multiple workbooks open within a single Excel instance, but when offloading workbook computations, all of the interaction is with the workbook, so it is less ambiguous to pass a reference to the workbook that implements the partition/execute/merge macro specification rather than the application which has that workbook (and potentially other workbooks) open.

With those changes, ExcelClient.Initialize now has the signature:

ExcelClient.Initialize (Microsoft.Office.Interop.Excel.Workbook workbook)

  • workbook - Excel workbook that contains the functions that implement the macro specification

ExcelClient.OpenSession Method

In Beta 2, the ExcelClient.OpenSession method signature was defined as follows.

ExcelClient.OpenSession(string headNode, int minResources, int maxResources, ResourceUnitType resourceType)

  • headNode - name of the head node of the cluster to use
  • minResources - minimum number of resources required
  • maxResources - maximum number of resources required
  • resourceType - type of resource requested. Defined by Microsoft_Hpc_Scheduler_Properties.ResourceUnitType. Core, Socket, or Node types available.

As discussed in the ExcelClient.Initialize Method section above, the remoteWorkbookPath parameter specifying where cluster compute nodes can find the workbook is now included in the parameter list for ExcelClient.OpenSession. ResourceUnitType was replaced by SessionUnitType as discussed above as well. Several additional parameters were also added while making a number of parameters optional.

The additional parameters added are a job template and a service name. The job template parameter allows a cluster administrator to create a template that defines things like job priority and node groups. This is particularly helpful for Excel workbook offloading because many clusters may only have Excel installed on a subset of compute nodes, and creating an Excel-specific job template could allow jobs to be run only on those nodes that have Excel installed.

The default service name is Microsoft.Hpc.Excel.ExcelService and a configuration file called Microsoft.Hpc.Excel.ExcelService_1.0.config is installed to the service registration directory. Unfortunately, this implies that the same configuration can be used for all Excel workbook offloading jobs. In some cases (extremely long running HPC_Execute macros, for example), it may be important to provide different configurations for different workbooks. This can be accomplished by creating a separate configuration file with a distinct name, say LongRunningExcel.config, and passing the service name, in this case "LongRunningExcel", to ExcelClient.OpenSession.

Finally, with the exception of the headNode parameter which selects a cluster and the remoteWorkbookPath parameter which specifies where compute nodes can find the workbook, all parameters were made optional and product-wide defaults will be used for each unspecified parameter.

After these changes, the method signature is as follows.

ExcelClient.OpenSession(string headNode, string remoteWorkbookPath, [int minResources], [int maxResources], [SessionUnitType resourceType], [string jobTemplate], [string serviceName])

  • headNode - name of the head node of the cluster to use
  • remoteWorkbookPath - path where the compute nodes in the cluster can find the workbook
  • minResources - optional parameter specifying the minimum number of resources required
  • maxResources - optional parameter specifying the maximum number of resources required
  • resourceType - optional parameter specifying the type of resource requested. Defined by Microsoft_Hpc_Excel.SessionUnitType. Core, Socket, or Node types available.
  • jobTemplate - optional parameter specifying a job template to use
  • serviceName - optional parameter specifying a service name to use

ExcelClient.DeployWorkbook Method

In an effort to clarify what functionality we provide make use of existing functionality where already provided, we have removed the DeployWorkbook method. This method essentially saved a copy of the workbook in a specified directory, which is functionality provided by the ThisWorkbook.SaveCopyAs method provided in Excel's API.

Example Transition

Beta 2 API usage

 

Final API usage after transition

 

Additional Changes to Notice

  • Similar changes were made in the ExcelClient .NET API, and using this guide should make .NET code transition easy as well.
  • ExcelClient .NET API now uses an event property rather than a parameter in OpenSession to specify a delegate for handling errors returned by the cluster.
  • Event Viewer Tracing has moved from [Applications and Services Logs]\[Microsoft HPC Excel Pack] to [Applications and Services Logs]\[Microsoft]\[HPC]\[Excel] where it is organized into the traditional admin/operational/debug events. Significant expansion in tracing is also apparent and local tracing of has been added to help diagnose client problems.
  • Performance and robustness improvements.