Last updated 2/22/2012
In the Microsoft Multipath I/O Step-By-Step guide for Windows Server 2008 R2, two new MPIO registry keys (UseCustomPathRecoveryInterval and PathRecoveryInterval) were introduced to mitigate a transient error. We previously issued post-release guidance on these keys and have recently updated it as described below.
For additional information about MPIO settings, please refer to the Microsoft Multipath I/O Step-By-Step guide here:
http://technet.microsoft.com/en-us/library/ee619749(WS.10).aspx
The two new settings were introduced in Windows Server 2008 R2 to help mitigate the issue detailed below:
The end result is that the system has at least one path and one device online, but no pseudo-LUN to represent that device.
MPIO has a path recovery mechanism that can be used to avoid this issue. However, by default, the period at which path recovery is attempted is set to twice the PDORemovePeriod. In the majority of cases, the default is acceptable, but it does not solve the problem in this particular scenario.
This is where the UseCustomPathRecoveryInterval and PathRecoveryInterval settings come into play. They allow you to configure a timer that determines the period at which path recovery attempts are performed. By setting the PathRecoveryInterval to less than the PDORemovePeriod, the path recovery attempt executes before the pseudo-LUN is removed, the path is detected as back online, and the pseudo-LUN is saved from removal.
We recommend that you test the use of this value before widespread deployment in production to ensure that path recovery attempts are not happening so frequently that they have a significant impact on regular I/O.
Because the default settings leave open the possibility that a path recovery is missed under high load, we are making the following updated recommendation on the use of these settings. Note, however, that as always, settings should be evaluated for potential impact in a test environment before changes are implemented in production environments.
We now recommend that the keys above be considered for wider use since they have the potential to allow path recovery under load in situations that might otherwise result in a path failure and I/O delays.
A warning about this value: the PathRecoveryInterval controls how often MPIO checks whether the device has returned after an error. A lower value means more frequent checks, which translates to a greater amount of traffic to the array. Use caution when implementing this setting, because a value that is too low may adversely affect performance.
Our general guidance going forward for this setting is as follows:
It is also important to note that the PDORemovePeriod must be set to a value less than the global Windows Disk Timeout, to allow path recovery prior to I/O timeouts. For more information on the global Windows “Disk” timeout registry key, please see the article link at the end of this post.
For example, if the Windows Disk timeout is 30 seconds and the PDORemovePeriod is 25 seconds, then a good starting point value for PathRecoveryInterval would be 15 to 20 seconds.
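The ordering described above (PathRecoveryInterval below PDORemovePeriod, and PDORemovePeriod below the Windows Disk timeout) can be sanity-checked before any values are deployed. The following is a minimal Python sketch of that check. The function name and its parameters are hypothetical and are not part of MPIO or any Microsoft tool. It only compares numbers that you supply and does not read the registry.

```python
def check_mpio_timers(disk_timeout_s, pdo_remove_period_s, path_recovery_interval_s):
    """Return a list of problems with a proposed set of timer values (in seconds).

    Hypothetical helper for illustration only; it encodes the two ordering
    rules stated in this article and nothing else.
    """
    problems = []

    # PDORemovePeriod must be less than the global Windows Disk timeout,
    # so that path recovery can happen before I/O times out.
    if pdo_remove_period_s >= disk_timeout_s:
        problems.append("PDORemovePeriod must be less than the Windows Disk timeout")

    # PathRecoveryInterval is only used when it is not 0, and it must be
    # less than PDORemovePeriod so recovery runs before the pseudo-LUN is removed.
    if path_recovery_interval_s <= 0:
        problems.append("PathRecoveryInterval must be greater than 0 to be used")
    elif path_recovery_interval_s >= pdo_remove_period_s:
        problems.append("PathRecoveryInterval must be less than PDORemovePeriod")

    return problems


# The example values from this article: 30 s disk timeout, 25 s PDORemovePeriod,
# 15 s PathRecoveryInterval.
print(check_mpio_timers(30, 25, 15))  # prints []
```

An empty list means the values respect the ordering. It says nothing about whether the interval is too aggressive for your array, which still has to be established by testing.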
Registry location: HKLM\System\CurrentControlSet\Services\mpio\Parameters\

Setting: UseCustomPathRecoveryInterval
Definition: If this key exists and is set to 1, it allows the use of PathRecoveryInterval.

Setting: PathRecoveryInterval
Definition: Represents the period after which PathRecovery is attempted. This setting is only used if it is not set to 0 and UseCustomPathRecoveryInterval is set to 1.
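To make the two settings above concrete, here is a minimal Python sketch that builds the text of a .reg file for them, so it can be reviewed before being imported on a test system. The function name is hypothetical. The sketch assumes both values are stored as REG_DWORD and that the interval is expressed in seconds, as in the example in this article; please confirm both against the Step-By-Step guide before use. It does not modify the registry itself.

```python
# Registry location given in this article, written in the long form used by .reg files.
MPIO_PARAMETERS_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\mpio\Parameters"


def build_mpio_reg_text(path_recovery_interval_s):
    """Return .reg file text enabling a custom path recovery interval.

    Hypothetical helper for illustration; assumes both values are REG_DWORD.
    """
    # The setting is ignored when it is 0, and a DWORD cannot exceed 32 bits.
    if not 0 < path_recovery_interval_s <= 0xFFFFFFFF:
        raise ValueError("PathRecoveryInterval must be between 1 and 4294967295")

    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{MPIO_PARAMETERS_KEY}]",
        # 1 allows the use of PathRecoveryInterval.
        '"UseCustomPathRecoveryInterval"=dword:00000001',
        # .reg files write DWORD data as eight hexadecimal digits.
        f'"PathRecoveryInterval"=dword:{path_recovery_interval_s:08x}',
    ]
    return "\n".join(lines) + "\n"


print(build_mpio_reg_text(15))
```

Generating the text rather than writing the values directly keeps the change reviewable and makes it easy to apply the same values across a set of test servers.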
Regardless of the values that you choose for MPIO, it is crucial that the following rules be used when setting the timeouts referenced in this article:
Note: The settings detailed in this article are also useful in the recovery of paths with the iSCSI Initiator and MPIO.
Additional References:
http://blogs.msdn.com/b/san/archive/2011/09/01/the-windows-disk-timeout-value-understanding-why-this-should-be-set-to-a-small-value.aspx
Thanks,
The MPIO Team