One question that comes up quite often is whether HMC Provisioning is "thread safe", i.e., does it support multiple simultaneous provisioning actions? The full answer can be complicated; the simple answer is "yes, but there are caveats."

To understand the caveats, we first have to understand the basic architecture of HMC Provisioning. For this discussion we will break the architecture into three blocks.

HMC and MPF Namespaces

Layered business/service logic defined as XML workflow descriptions.

MPF Engine

COM service that executes requests based on workflows defined in the namespaces. This is the component that provides transaction-based compensation/rollback support.

Providers and Underlying Product APIs

DLLs that run under the context of the MPF Engine. These DLLs wrap the product-specific APIs that are used to perform provisioning actions.

Now, to understand the impact each of these components can have on the HMC concurrency story, we will start from the bottom and work our way up the stack.

Providers and Underlying Product APIs

The core provisioning capabilities of HMC are generally defined at, and limited by, the underlying product APIs, and specifically by the way in which the Provider DLLs expose those APIs. This is particularly true when it comes to the concurrency characteristics of HMC. The easiest way to explain this is to look at two core Providers and their concurrency characteristics.

  • SQL Provider: This provider supports execution of SQL stored procedures and ad-hoc requests within the context of a DTC transaction that can be bound to the executing thread inside the MPF Engine (more on MPF Engine process controller threads below). This allows for transaction scoping all the way down to the underlying system, in this case SQL, leaving the responsibility for compensation in the case of a failure to DTC and SQL. This is a very powerful feature of the SQL Provider, but it also means that any locks on a table or row in SQL are held for the entirety of the transaction. This must be taken into consideration when designing named procedures for Namespaces that orchestrate multiple SQL Provider requests. The SQL Provider is used mostly by the ManagedPlans Namespace, for almost every plan-related operation.
  • Active Directory Provider: This provider uses standard directory services interfaces, so while the API exposed by this provider is fully thread safe, there is no transactional scoping beyond the MPF Engine. Therefore, actions taken on one thread can directly impact other threads, e.g., if thread A deletes an object before thread B tries to retrieve it, thread B will be impacted by the change made by thread A. This also must be taken into account when designing highly concurrent systems, particularly when dealing with procedures that act on global or organization-wide objects.

These kinds of variations in concurrency characteristics exist across most of the MPF providers, though the SQL Provider is unique in that it is the only provider that supports transactional scoping all the way down to the underlying system. It is generally these variations that result in concurrency-related failures within HMC.

MPF Engine

The MPF Engine was designed for high levels of concurrency. Each incoming request is processed on a separate "process controller" thread, and each process controller thread is fully isolated from other process controllers within the context of the MPF Engine. Each process controller thread is also an MPF transaction: all actions performed within the context of a transaction are persisted, and if a failure occurs these actions are rolled back. So while this component in and of itself enables high-concurrency request processing, as a developer you must take into account that the other two blocks in the architecture, Providers and Underlying Product APIs (discussed above) and HMC and MPF Namespaces (discussed below), have a significant impact on the concurrency behavior of the overall system.
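
As a rough illustration of what "record every action, roll back on failure" means, here is a toy Python model. It is only a sketch of the general compensation pattern, not how the MPF Engine is actually implemented; `ToyTransaction`, `provision`, and the set used as a stand-in directory are all made-up names for this example.

```python
class ToyTransaction:
    """Toy model of a transaction that records one undo step per action."""

    def __init__(self):
        self._undo_steps = []

    def run(self, action, undo):
        """Perform the action, then remember how to compensate for it."""
        result = action()
        self._undo_steps.append(undo)
        return result

    def rollback(self):
        """Run the recorded undo steps in reverse order."""
        while self._undo_steps:
            self._undo_steps.pop()()


def provision(directory, name, fail_at_mailbox=False):
    """Hypothetical two-step request: create a user, then a mailbox."""
    txn = ToyTransaction()
    try:
        txn.run(lambda: directory.add(("user", name)),
                lambda: directory.discard(("user", name)))
        if fail_at_mailbox:
            raise RuntimeError("mailbox step failed")
        txn.run(lambda: directory.add(("mailbox", name)),
                lambda: directory.discard(("mailbox", name)))
        return True
    except RuntimeError:
        txn.rollback()  # undo the user creation that already happened
        return False
```

Note what this model does not give you: between the user being added and the rollback running, anything else looking at the same directory can see the half-finished work. Compensation undoes changes after the fact; it does not hide them from other threads while the transaction is in flight.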

HMC and MPF Namespaces

This is where the majority of the business or service logic is defined. Namespace logic is defined as multiple layers of named procedures that either call other named procedures or execute provider methods. The layering and orchestration capabilities of MPF are extensive and very powerful. Unfortunately, this is also the root of almost all concurrency-related failures in HMC. Some are easily avoided; others require careful design consideration or, in some cases, external throttling and/or retry mechanisms. The following are some high-level examples of concurrency-related failures in HMC and MPF Namespaces.

MPF Requests that operate on global, or organization wide objects can cause failures under high concurrency

While this may seem fairly straightforward, there are some corner scenarios where this can cause concurrency-related problems.
Take, for example, the scenario where a SharePoint site is being created at the exact same time as a separate SharePoint site belonging to the same organization is being deleted. One might not expect these two requests to have any impact on each other; however, both rely on the organization-wide SharePointSites service pointer for tracking purposes. If the request to delete a SharePoint site removes one of the site pointers at the same time that the request to create a SharePoint site is enumerating the list of SharePoint sites, the create request might fail because the service pointer is overwritten by the delete operation. This issue was alluded to above in the discussion of the Active Directory Provider. It is important to note, though, that any provider that interacts with Active Directory or other similar systems (Resource Manager has similar characteristics) is susceptible to this kind of failure.
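
To make the interleaving concrete, here is a small Python sketch that replays it deterministically, without real threads. Everything in it is an assumption for illustration: the dictionary stands in for the organization-wide list of site pointers, and `enumerate_sites` and `read_sites` are invented names, not HMC or MPF procedures.

```python
# Toy stand-in for an organization-wide list of site pointers (made-up data).
site_pointers = {"site-a": "http://a.example", "site-b": "http://b.example"}


def enumerate_sites(pointers):
    """Step 1 of the 'create' request: snapshot the current pointer names."""
    return list(pointers)


def read_sites(pointers, names):
    """Step 2 of the 'create' request: dereference every name in the snapshot."""
    return [pointers[name] for name in names]


snapshot = enumerate_sites(site_pointers)  # the create request starts
del site_pointers["site-b"]                # the delete request runs in between
try:
    read_sites(site_pointers, snapshot)    # the create request resumes
    outcome = "ok"
except KeyError as missing:
    outcome = f"failed: {missing.args[0]} vanished"
```

The create request never touched `site-b` on purpose; it fails only because its view of the shared list went stale between two of its own steps. Nothing in the store rolls the other request's change out of sight, which is the behavior described for the Active Directory Provider above.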

MPF Requests that bundle multiple "locking" SQL Provider requests

This issue typically occurs when an MPF named procedure bundles two or more HMC named procedures, from the Hosted Namespace layer, that write to or read from the PlanManager database. The root of this issue is that the Managed Plans Namespace API uses the SQL Provider to manipulate the PlanManager database. Since the SQL Provider establishes and holds locks for the duration of a transaction, transactions or requests that bundle multiple Managed Plans named procedures carry an increased risk of SQL deadlocks. SQL deadlocks result in one or more of the MPF requests failing.

Steps have been taken within the PlanManager database to try to prevent these deadlocks in the most common scenarios where a customer might want to bundle requests for efficiency. However, there are still some scenarios where bundling requests will result in this failure. For example, a transaction that attempts to add or modify a customer plan and then subsequently assign the plan to a customer will fail under concurrency.
As a general rule, one must take into account the cost of SQL transactions when designing highly orchestrated MPF named procedures. This goes along with considering the cost of rollback when bundling large numbers of procedure calls into a transaction.

OK, now on to the practical question: how do you design a custom namespace or process to facilitate successful bulk import or creation of organizations and users/mailboxes?

As a general rule, the HMC Provisioning System was designed to operate under high levels of concurrency. However, in a high-scale, high-volume HMC environment it may not be possible to avoid concurrency-related issues completely. There are cases where a simple retry is the best solution to the problem. There are also cases, however, where there are known failure scenarios, and in these cases we strongly suggest that you take steps or put mechanisms in place to avoid concurrency. The most common of these cases are listed below.

  • Bulk creation of Mailboxes within an organization.
    • It is best to avoid creating a user, the mailbox, and other mailbox features like UM in the same transaction, because of AD replication-induced delays and PlanManager-related concurrency failures.
  • Bulk creation and/or deletion of SharePoint sites within an organization.
    • Avoid deleting and creating the same site in the same transaction. Split the calls apart and put logic into the calling code to retry certain failed operations.
  • Bulk enablement of OCS users within an organization.
    • Avoid creating a user and immediately OCS-enabling it. The OCS Admin API does not have a preferredDomainController concept, so AD replication delay has to be taken into account.
  • Requests that bundle the creation/modification of a plan with assignment of that plan to an organization and/or a user in the same organization.
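
For the cases where a simple retry in the calling code is the right fix, a minimal Python sketch might look like the following. The names are assumptions made for this example: `TransientProvisioningError` is a hypothetical exception your own calling code would raise when it recognizes a retryable failure (such as being picked as a deadlock victim), and `submit` is whatever function actually sends one request. Neither is part of HMC or MPF.

```python
import time


class TransientProvisioningError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a deadlock victim)."""


def submit_with_retry(submit, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call submit() up to `attempts` times, doubling the wait between tries.

    Only TransientProvisioningError is retried; any other exception, and the
    final transient failure, is passed straight through to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return submit()
        except TransientProvisioningError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Two design points are worth keeping even if you write this differently: retry only failures you have positively identified as transient, and back off between attempts so that retries do not pile more concurrent load onto the same objects. Where AD replication delay is the cause, the wait itself is what fixes the problem.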

In general, it is OK to bulk load multiple organizations in parallel; however, you should avoid bulk provisioning objects within a single organization in parallel. In other words, requests to bulk load objects within a single organization boundary should be serialized.
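
One way to get "parallel across organizations, serial within an organization" in a bulk-import driver is to give each organization its own ordered queue and run each queue on a single worker. The Python sketch below shows the idea under stated assumptions: `bulk_load`, `provision_org`, and the `submit(org, request)` callable are names invented for this example, and `submit` stands in for however you actually send a request to the provisioning system.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def provision_org(org, requests, submit):
    """Submit one organization's requests strictly in order, one at a time."""
    return [submit(org, request) for request in requests]


def bulk_load(requests, submit, max_parallel_orgs=4):
    """Run a bulk import. `requests` is an iterable of (org, request) pairs.

    Requests are grouped by organization. Each group runs as a single task,
    so different organizations proceed in parallel while the requests inside
    any one organization never overlap. Returns {org: [results in order]}.
    """
    by_org = defaultdict(list)
    for org, request in requests:
        by_org[org].append(request)

    with ThreadPoolExecutor(max_workers=max_parallel_orgs) as pool:
        futures = {org: pool.submit(provision_org, org, reqs, submit)
                   for org, reqs in by_org.items()}
        return {org: future.result() for org, future in futures.items()}
```

For example, `bulk_load([("contoso", "u1"), ("fabrikam", "u1"), ("contoso", "u2")], submit)` sends the two contoso requests one after the other, while the fabrikam request is free to run at the same time. The `max_parallel_orgs` value is a throttle you would tune for your own environment; the default here is arbitrary.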

Finally

While this covers some of the most common scenarios we see in support, I am sure there are many others out there. Do you have a scenario not covered above, and are you unsure whether this discussion applies to it? Post a comment and I will be happy to expand the discussion.

Until next time (I promise it will not be 2 ½ years)

Mike