Failover Clustering and Network Load Balancing Team Blog
Since Failover Clustering traditionally requires shared storage (you could also use hardware or software replication between nodes), we get a lot of questions about which shared storage types we support. Windows Server Failover Clustering has a very flexible storage model that allows a wide variety of third-party storage and volume management solutions to integrate with and extend the functionality of clustering. One question I commonly get asked is about Dynamic Disk support on Windows Server Failover Clusters, so I thought I would take a moment to address it.
Yes, Dynamic Disks absolutely are supported; however, support is not provided natively in-box in Windows Server for Failover Clusters. It requires an add-on product from Symantec called Storage Foundation for Windows to enable support of Dynamic Disks on Windows Server Failover Clusters. You can learn more about the Storage Foundation for Windows product here: http://www.symantec.com/business/storage-foundation-for-windows
This KB article also discusses support for Dynamic Disks on Windows Server Failover Clusters: http://support.microsoft.com/kb/237853
Dynamic Disks do provide a number of different features, so it is good to think about the applicable scenarios, as misunderstandings are common.
When I ask why someone wants to use Dynamic Disks, I commonly hear three answers: to create volumes larger than 2 TB, to dynamically grow or shrink volumes, and to use software RAID.
Well, did you know that you actually don’t need Dynamic Disks to accomplish those?
While Basic disks that use an MBR partition table only support partitions of up to 2 TB, Basic disks that use a GUID partition table (GPT) allow partitions greater than 2 TB and are fully supported on Failover Clusters with Windows Server 2008 and later. If you happen to still be using Windows Server 2003, you can add support for GPT-based disks with a post-Service Pack 2 hotfix, available at: http://support.microsoft.com/kb/919117
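The 2 TB ceiling comes from the MBR format storing a partition's length as a 32-bit count of sectors. Here is a minimal sketch of that arithmetic, assuming 512-byte logical sectors; the sector size and the function names are illustrative assumptions, not something taken from the KB articles.

```python
# Sketch of the MBR partition size limit.
# Assumption: 512-byte logical sectors (other sector sizes change the result).
SECTOR_BYTES = 512
MBR_LENGTH_FIELD_BITS = 32  # MBR records partition length as a 32-bit sector count


def mbr_max_partition_bytes(sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest partition an MBR entry can describe, in bytes."""
    max_sectors = 2 ** MBR_LENGTH_FIELD_BITS - 1
    return max_sectors * sector_bytes


def needs_gpt(volume_bytes: int, sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if a volume of this size cannot fit in a single MBR partition."""
    return volume_bytes > mbr_max_partition_bytes(sector_bytes)


if __name__ == "__main__":
    tib = 2 ** 40
    print(mbr_max_partition_bytes() / tib)  # just under 2 TiB
    print(needs_gpt(3 * tib))               # a 3 TiB volume needs GPT
```

Under that assumption the limit works out to just under 2 TiB, which is why anything larger calls for GPT.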
If you want to learn more about the advantages of GPT disks, here is a good FAQ: http://www.microsoft.com/whdc/device/storage/GPT_FAQ.mspx
So, you can create large volumes with in-box Basic GPT disks.
With Windows Server 2003 and later you can dynamically increase the size of a partition on a Failover Cluster. There is a simple right-click option in the Disk Management (DiskMgmt.msc) snap-in to “Extend Volume”. Starting with Windows Server 2008 R2, you can not only extend a volume but also “Shrink Volume”. Both can be done on clustered disks with no downtime.
In Windows Server 2003 this needed to be done via the command line with DiskPart, as described in this KB article: http://support.microsoft.com/kb/304736
So, you can dynamically grow or shrink volumes with in-box Basic disks.
Most Failover Clusters are deployed using external storage (Fibre Channel, iSCSI, FCoE, or SAS), and all clusterable SAN class storage supports Hardware RAID. So most customers choose to go with the Hardware RAID they already have in the storage array instead of using Software RAID.
Some customers use low-cost commodity JBODs and then want to use software RAID to create resilient shared storage. With Windows Server 2012 this can be accomplished using the in-box Storage Spaces feature. Spaces creates resilient storage in software from external shared JBODs, and is fully supported on a Failover Cluster.
So, you can create resilient storage from JBODs with in-box Storage Spaces.
There are fewer reasons to need Dynamic Disks these days, since much of this functionality is now possible with Basic disks. So what are the reasons you might actually need them? That is fair to discuss as well, and I commonly hear two answers.
With spanning volumes, it really is a matter of how you do your SAN management when increasing capacity. Most storage arrays these days support dynamically expanding the size of a LUN and, as I said earlier, with Basic disks you can dynamically increase the size of the volume to match the new, larger LUN. You can also use Thin Provisioning to create LUNs without fully allocating the disk space at provisioning time. However, some people prefer to concatenate LUNs and span a single volume over multiple LUNs; when they want to add capacity, they just create a new LUN and span the volume over it. For old-school IT departments that are highly siloed, I sometimes hear it is ‘easier’ for SAN admins to just create a new LUN, as opposed to tracking down the right LUN and expanding it. I have no right or wrong answer for you here; it’s a matter of how you manage your SANs. With Storage Spaces you can use a Simple Space to combine multiple physical disks into a single logical volume; however, Spaces is designed to be a SAN alternative, not another layer to put on top of your SAN. So it doesn't fit the bill for what most people are looking for in this specific scenario.
I hope this helps in understanding that Dynamic Disks are supported with Windows Server Failover Clustering through the add-on product Storage Foundation for Windows from Symantec. Microsoft has continually striven to build functionality into Windows that provides ease of use and convenience for server administrators. This is the case with the ability to dynamically expand and shrink volumes and to create large volumes using GPT-formatted disks. As mentioned, Dynamic Disks, enabled by this add-on product, do extend the functionality of Failover Clustering, and I hope the above information allows you to make informed choices about what is right for you.
Thanks,
Elden Christensen
Principal Program Manager Lead
Clustering & High-Availability
Microsoft
One benefit of dynamic disks in a cluster out of the box still exists though: You'd be able to migrate disks from one storage array to another transparently to the application and without interruption of the service.
Right now you need third party tools for that.
Just to make sure I understand: you are suggesting that another way Software RAID can be leveraged is for migrating from one storage array to another. You present a LUN from the new array to the same server, create a RAID 1 mirror between the two LUNs, and then break off the old LUN, which effectively mirrors and copies all data to the new LUN with no downtime. While not the mainstream use for Software RAID, this is a creative solution for this scenario and you are absolutely correct.
Yes, that's exactly what I mean. It would be very, very helpful.
That feature alone is worth the price of Symantec Storage Foundation, because storage migrations then require zero downtime and incur only a minor performance hit during the process.
What is the agreement between Microsoft and Symantec regarding support for Veritas Storage Foundation with Windows Server 2008 R2 and SQL Server 2008 R2? The validation of the cluster environment seems to indicate that it is not supported. Is there a joint statement from both parties regarding support? If yes, what exactly are the details of the support structure?
I have 2 servers in a Hyper-V cluster currently connecting to the same SAN iSCSI LUN for the CSV.
I would like to add another SAN node to the mix and present the new LUN through the iSCSI initiator as a new disk to both Hyper-V hosts.
I then want to convert the current CSV disk to a dynamic disk, mirror the two LUNs/disks, and present them as a single disk to both Hyper-V hosts.
Is this doable with the Symantec overlay, and what specifically do I need?
Sorry, but I don't agree. Software RAID would be an ideal solution for a geocluster scenario (2 datacenters separated geographically). Even if the SAN in each datacenter supports many kinds of RAID, I would still need to mirror the shared disk between them (for example in a CSV storage system), so that if one datacenter burns down there is no service downtime, thanks to that mirror.
Even Linux and Solaris distributions support clusters with software mirroring, so not allowing it without third-party software such as Symantec's leads me to the same guess Keith made two posts earlier, about an agreement between Microsoft and Symantec to keep things as they are now.
Some good discussions, but somewhat down different paths. Some answers / thoughts:
Software RAID in its traditional sense is a way of configuring a set of JBOD disks in a RAID1 mirror or RAID5 parity set for redundancy against the loss of a disk. For protection from hardware faults, since Hardware RAID is available on all cluster-class storage, hardware RAID protection is the obvious choice. RAID0 striping is another option, and there used to be theories that software striping in the OS across multiple LUNs (even with hardware RAID) gave better performance. The Windows storage stack has evolved to the point that this no longer holds true in current versions.
Data Migrations – As Robert described, creating a mirror can be a handy trick for doing a data migration. Traditionally this is used when moving from one SAN to another: you present a new LUN from the new storage array and then create a mirror between the new and old LUN. Once synced, you break the mirror, and you have a simple, seamless move of all data to the new LUN. While not what it was intended to be used for, I’ll admit that this is a clever solution to the scenario.
Software/Host based Replication – Akuma, what you describe I would characterize more as data replication software, not traditional Software RAID. Replication solutions are used for disaster recovery scenarios and replicate between two different servers, either synchronously or asynchronously. Since a host-based replication solution for DR could be asynchronous, I do not consider it RAID1 mirror Software RAID. DR replication is another feature Symantec Storage Foundation for Windows can provide. Additionally, there are a number of solutions on the market that integrate with Windows Server Failover Clustering for DR scenarios, such as Double-Take Availability or SteelEye DataKeeper Cluster Edition, to name two.
Jens, regarding your question about CSV support with software/host based replication solutions: no, none of the above-mentioned software replication solutions are compatible with CSV. However, Sanbolic (which also integrates with Failover Clustering), with its integrated software replication, may be an option for you.
This is great feedback for us to be thinking about, thank you!
How about host level queuing performance issues with large LUNs? ( en.wikipedia.org/.../Little%27s_law ) By using multiple LUNs, you can distribute IO across multiple LUNs relieving the problems with IO limitations of a single queue.
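To put numbers on the Little's law argument: the average number of outstanding IOs equals throughput multiplied by latency, so a fixed queue depth caps the IOPS a single queue can sustain at a given latency. The sketch below is only an illustration of that relationship. The queue depth of 32, the 5 ms latency, and the function names are hypothetical, and whether a per-LUN queue limit is really the bottleneck depends on the storage stack in use.

```python
# Little's law: outstanding_ios = iops * latency.
# All figures used here are hypothetical examples, not measurements.
US_PER_SECOND = 1_000_000


def max_iops_for_queue(queue_depth: int, latency_us: int) -> float:
    """Highest sustained IOPS one queue can carry at the given average latency."""
    return queue_depth * US_PER_SECOND / latency_us


def queues_needed(target_iops: int, latency_us: int, queue_depth: int) -> int:
    """Number of queues (for example LUNs) needed to keep target_iops in flight."""
    outstanding_scaled = target_iops * latency_us   # outstanding IOs * 1e6
    per_queue_scaled = queue_depth * US_PER_SECOND  # queue depth * 1e6
    return -(-outstanding_scaled // per_queue_scaled)  # integer ceiling division


if __name__ == "__main__":
    print(max_iops_for_queue(queue_depth=32, latency_us=5000))
    print(queues_needed(target_iops=20000, latency_us=5000, queue_depth=32))
```

With these made-up figures, one queue of depth 32 at 5 ms tops out at 6400 IOPS, so 20000 IOPS would need four such queues.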
A decade ago, I would have totally agreed with you, but it is outdated thinking these days. With any modern storage array, LUNs are completely abstracted. They are just logical representations; the storage arrays handle intelligently placing data on the spindles, and new technologies such as storage tiering and SSDs have really changed this. On the Windows side, the storage stack has evolved so significantly that carving up multiple LUNs in an effort to help Windows queue more intelligently across them really doesn't apply anymore with current OS releases.
Anyone know if Windows 2008 (non-R2) supports shrinking a disk in a cluster? The document clearly states 2008 R2, but in my case I have vanilla 2008.
I think what we really need is support for an SSD PCIe card in each cluster box. A really good card is $4000, but you could put one in each server in a cluster, at least in two-member ones.
If Windows clustering or the drive subsystem (or both) could replicate data between these two SSD cards (and quickly), you could essentially avoid the expense of an external shared storage device and get even better performance.
In the case of SQL Server you might want to write the log (.LDF) file on both cluster nodes so that it gets written much faster. You might need two SSDs or two partitions to make that happen, but you can imagine the speed you'd get.
Right now we are stuck using mirroring if we want our servers to have super-fast SSD PCIe cards.
Basic disk (partitioned disk): you can expand a partition only if it's the last partition on the disk, i.e. there are contiguous free blocks after it. This means that if you may need to expand a simple disk in the future, you have to plan in advance and either make that volume the last partition on the disk or put only one partition per disk.
Also, regarding dynamic LUN expansion (expanding the backend array LUN and then running DiskPart): though it is a supported way, from a risk perspective I believe adding a LUN is much more straightforward than expanding a backend array LUN.
There are numerous other operational benefits of dynamic disks: storage migration, second mirror break-off, software-based BCVs, snaps, etc.
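The contiguity constraint described in this comment can be sketched as a simple check: a partition can only grow into free blocks that sit immediately after it. This is a simplified model for illustration, not how Windows or DiskPart is implemented; the `Partition` type, the `extendable_blocks` function, and the block counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    name: str
    start: int   # first block of the partition
    length: int  # size in blocks

    @property
    def end(self) -> int:
        """First block after the partition."""
        return self.start + self.length


def extendable_blocks(disk_blocks: int, partitions: List[Partition], name: str) -> int:
    """Free blocks immediately following the named partition (0 if none)."""
    target = next(p for p in partitions if p.name == name)
    later_starts = [p.start for p in partitions if p.start >= target.end]
    # The partition can grow up to the next partition, or to the end of the disk.
    boundary = min(later_starts, default=disk_blocks)
    return boundary - target.end


if __name__ == "__main__":
    layout = [Partition("A", 0, 400), Partition("B", 400, 300)]
    print(extendable_blocks(1000, layout, "A"))  # blocked by B
    print(extendable_blocks(1000, layout, "B"))  # free space follows B
    print(extendable_blocks(1500, layout, "B"))  # after the LUN is grown
```

In this model, growing the underlying LUN only helps the last partition, which is the planning concern raised above.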
This article is now 3 years old. I would like to use dynamic disks in a cluster out of the box: I'd be able to migrate disks from one storage array to another transparently to the application and without interruption of the service.
I present a LUN from the new array to the same server, create a RAID 1 mirror between the two LUNs and, once synced, break off the old LUN. Doing so, I mirror and copy all data to the new LUN with no downtime.
Does the same limitation exist for Windows Server 2008 R2?