Cluster Resource Dependency Expressions


Cluster resource dependencies used to be a fairly limited relationship.  A resource could depend on one or more resources (we call them “provider” resources in the context of the dependency relationship).  This dependency relationship was an “and” relationship, in that the dependent resource R depends on provider resource P1 AND provider resource P2, etc.  In other words, every one of the provider resources must be online for the dependent resource to be online.  If even one provider fails, then the dependent must be terminated (while we restart the provider and/or invoke group failover, etc.).

There are scenarios, however, where something more flexible is called for.  For instance, you may have multiple redundant network interfaces in your server, and as long as one of them is functional, your highly available service can keep running.  Or perhaps you’ve got both IPv4 and IPv6 addresses allocated, and you only need one of them up and running. 

So what you really want is the ability to say that my highly available service can keep running as long as provider P1 OR provider P2 is alive.

 

New and Improved

Enter Dependency Expressions, the new way of specifying resource dependencies in Windows Server 2008.  There are two advantages here.  First, you can specify all of your resource dependencies with one expression string (and one API call), instead of having to call AddClusterResourceDependency once for each provider.  Second, you can now specify “OR” relationships between providers. 

Old way:

                AddClusterResourceDependency( r, p1 );
                AddClusterResourceDependency( r, p2 );

New way:

                SetClusterResourceDependencyExpression( r, L"([p1] or [p2])" );

              

Note: for convenience, you can use the provider resource’s name inside the square brackets.  However, the parsing becomes difficult if the resource name happens to contain a square bracket itself (e.g. “Data Disk A]”).  In this case, you must use the resource’s ID, which you can obtain via the CLUSCTL_RESOURCE_GET_ID resource control. 
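
To make the note above concrete, here is a minimal sketch of choosing between a name and an ID when building one OR group. The helpers providerToken and orGroup are hypothetical names invented for this illustration, not part of the cluster API, and the sketch assumes that an ID goes inside the square brackets the same way a name does and never contains a bracket itself. Obtaining the ID (via CLUSCTL_RESOURCE_GET_ID) is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the bracketed token for one provider.
// The name is used when it is safe to parse; otherwise the ID is used.
std::wstring providerToken(const std::wstring& name, const std::wstring& id) {
    const bool nameHasBracket =
        name.find_first_of(L"[]") != std::wstring::npos;
    // Assumption: the resource ID itself never contains a square bracket.
    assert(id.find_first_of(L"[]") == std::wstring::npos);
    return L"[" + (nameHasBracket ? id : name) + L"]";
}

// Hypothetical helper: joins tokens into one OR group, e.g. "([p1] or [p2])".
std::wstring orGroup(const std::vector<std::wstring>& tokens) {
    std::wstring expr = L"(";
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (i > 0) expr += L" or ";
        expr += tokens[i];
    }
    return expr + L")";
}
```

The resulting string is what you would pass as the second argument to SetClusterResourceDependencyExpression; calling c_str() on it yields the wide-character pointer that call expects.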

 

Power Good, Complexity Bad

Here’s the conundrum: one of the major goals of the new Failover Clustering product was simplicity.  We made great strides in reducing the complexity involved in creating a cluster, in configuring a clustered application such as a file share, and in many other operations.  So when faced with the question of just how much flexibility to allow in these new dependency expressions, there was a tradeoff.  How do we add this powerful new feature without making it an administrative nightmare?

If any valid Boolean expression involving ANDs and ORs were allowed, we would risk ending up with spaghetti dependencies like “(P1 AND P2) OR P3 AND (P4 OR (P5 AND P6))” … well, you get the picture. 

We also considered the possibility of “m of n” dependencies, in other words “4 out of 10 of these resources must be online”.   How do you specify that in an expression – and expose it in the UI in a human-friendly way?  And is this a compelling enough scenario to justify the extra complexity?
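
As a side observation (plain Boolean logic, not something the product does for you): “at least m of n online” is equivalent to saying that every subset of n - m + 1 providers contains at least one online member. The sketch below enumerates those subsets to show how quickly the number of OR groups grows. The function names are made up for this illustration, and nothing here says whether the cluster service would accept an expression of that size.

```cpp
#include <cassert>
#include <string>
#include <vector>

using OrGroup = std::vector<std::wstring>;

// Collects every subset of `providers` that has exactly `k` members.
void subsetsOfSize(const std::vector<std::wstring>& providers, std::size_t k,
                   std::size_t start, OrGroup& current,
                   std::vector<OrGroup>& out) {
    if (current.size() == k) {
        out.push_back(current);
        return;
    }
    for (std::size_t i = start; i < providers.size(); ++i) {
        current.push_back(providers[i]);
        subsetsOfSize(providers, k, i + 1, current, out);
        current.pop_back();
    }
}

// Returns the OR groups that, ANDed together, mean "at least m of these
// providers are online". Requires 1 <= m <= providers.size().
std::vector<OrGroup> atLeastMOfN(const std::vector<std::wstring>& providers,
                                 std::size_t m) {
    assert(m >= 1 && m <= providers.size());
    std::vector<OrGroup> groups;
    OrGroup current;
    subsetsOfSize(providers, providers.size() - m + 1, 0, current, groups);
    return groups;
}
```

For “2 of 3” this yields three groups of two providers each. For the “4 out of 10” case mentioned above it yields 120 groups of seven, which gives a feel for why this was hard to express in a human-friendly way.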

Finally, what about priority?  E.g. you might want to have both P1 and P2 brought online before the dependent, but maybe P2 is totally optional, while P1 is not.  “P1 OR P2” doesn’t quite capture this. 

 

The End Result: ANDs of ORs

In the end, the balance we arrived at was to allow “ANDs of ORs”.  In other words, you can have groups of OR dependencies that are all ANDed together, like this:

( [P1] OR [P2] ) AND ( [P3] OR [P4] ) AND [P5] AND ( [P6] OR …

This lets us provide the functionality behind some powerful new failover clustering scenarios (including geographically distributed clusters, or “multi-site clusters” as they’re also called, with some nodes in one location and some nodes in another), while at the same time not making the new API so complex as to be unusable.
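
One way to picture the “ANDs of ORs” shape is as a list of OR groups, all of which must be satisfied. The sketch below is only a model for reasoning about such expressions; it is not how the cluster service evaluates them, and the type and function names are invented for this example. It also renders the model back into the expression syntax shown earlier, writing single-member groups without parentheses as in the [P5] term above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using OrGroup = std::vector<std::wstring>;  // any one member online suffices
using AndOfOrs = std::vector<OrGroup>;      // every group must be satisfied

// True when each OR group has at least one provider in the online set.
bool isSatisfied(const AndOfOrs& expr, const std::set<std::wstring>& online) {
    for (const OrGroup& group : expr) {
        assert(!group.empty());  // an empty OR group could never be satisfied
        bool anyOnline = false;
        for (const std::wstring& provider : group) {
            if (online.count(provider) > 0) {
                anyOnline = true;
                break;
            }
        }
        if (!anyOnline) return false;
    }
    return true;
}

// Renders the model as text, e.g. "([P1] or [P2]) and [P5]".
std::wstring toExpressionString(const AndOfOrs& expr) {
    std::wstring out;
    for (std::size_t g = 0; g < expr.size(); ++g) {
        if (g > 0) out += L" and ";
        const OrGroup& group = expr[g];
        const bool needsParens = group.size() > 1;
        if (needsParens) out += L"(";
        for (std::size_t i = 0; i < group.size(); ++i) {
            if (i > 0) out += L" or ";
            out += L"[" + group[i] + L"]";
        }
        if (needsParens) out += L")";
    }
    return out;
}
```

With the groups {P1, P2}, {P3, P4} and {P5}, the dependent can run when P2, P3 and P5 are online, but not when only P1, P2 and P5 are, because the second group then has no online member.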

 

Solution to the “Priority” Example

Also, as it turned out, the priority example above can be handled (in a fairly awkward way, admittedly) with the existing dependency expressions.  Say you have essential provider P1, and non-essential provider P2.  P1 has to stay up, but P2 can fail without dire consequence to the dependent resource or group. 

If you set the dependencies as “P1 OR P2”, that wouldn’t work, because P1 could fail, and as long as P2 was online, the dependent resource would stay online.  Similarly, “P1 AND P2” doesn’t work because if P2 fails, the dependent will be terminated, and that’s not what we want, since P2 is optional.

Well, if you use the Boolean trick that “TRUE OR X” is true for all values of X, then you could create a dummy resource (perhaps with a simple script and the GenScript resource) that is always online and never fails.  Then you can use the following dependency expression to give P2 the “optional” status we desired:

                [P1] AND ( [P2] OR [AlwaysOnline] )

where AlwaysOnline is the dummy resource.
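
To check the trick, the small sketch below walks the full truth table for this expression, treating AlwaysOnline as permanently true. It is just the Boolean logic written out, with made-up function names, and it says nothing about online ordering, only about whether the dependent may stay online.

```cpp
#include <cassert>
#include <initializer_list>

// Evaluates [P1] AND ( [P2] OR [AlwaysOnline] ) for the given provider states.
bool dependentCanStayOnline(bool p1Online, bool p2Online) {
    const bool alwaysOnline = true;  // the dummy resource never fails
    return p1Online && (p2Online || alwaysOnline);
}

// Walks all four P1/P2 combinations and asserts the result tracks P1 alone.
void checkOptionalTrick() {
    for (bool p1 : {false, true}) {
        for (bool p2 : {false, true}) {
            assert(dependentCanStayOnline(p1, p2) == p1);
        }
    }
}
```

In every row the outcome equals the state of P1, so a P2 failure never forces the dependent offline while a P1 failure always does.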

 

Thanks,

Jonathan Fischer

  • Are OR dependencies valid for adding new volumes for SQL Server? I'd like the ability to add new drives to a SQL Server instance with zero downtime. Is it possible? Will it be possible with SQL2008/Win2008?

    Thank you, Eladio

  • There are two answers to your question:

    First, OR dependencies may (in fact, probably do) need to be specifically supported by the resource type.  One can't necessarily just set up OR dependencies under any old resource and assume that they will work.  For instance, we had to make some major changes to the builtin Network Name resource type to enable it to support multiple IP addresses in an OR dependency.  Because Netname has a fairly intimate relationship with the IP addresses that it depends on, it needed to be coded in such a way that it can deal with IP addresses coming online and going offline underneath it.

    So in other words, if you added disks in an OR dependency under SQL, and SQL's resource type wasn't coded to support it, then SQL itself may fail in its health check if it detects that one of its databases is inaccessible because of a failed disk.

    (And I don't happen to know the current or future state of the SQL resource type as far as plans for OR dependency support goes.  Perhaps one of our program managers can answer that question).

    However, what you are specifically talking about doesn't need OR dependencies.  In Win2003, all dependency modifications required the resources to be offline.  But this is no longer a requirement in Win2008.  If resource D is online, and you want to make it dependent on resource P, then as long as resource P is online, you can add the dependency without taking D offline.

    Thanks,

    Jonathan Fischer
