Adventures with Hyper-V and Backup

A while ago I talked about how I use Hyper-V in my house.  One of the problems that I identified with my current setup was that I had most of my virtual machines (except for the Windows Home Server) running on the same disk as the system disk for the management operating system.

Apart from being a bad practice in general – this has always concerned me as that disk represents a pretty large single point of failure in my server (if that disk fails I will lose my domain controller, FTP server, SCVMM server, SCOM server, MED-V server and WDS server).

Recently I also discovered that the disk in question is the oldest (and slowest) disk in the system – and this is causing performance issues for all of the virtual machines running off of it. 

Given all of this I decided to shuffle some disks out of other systems in my house and set up a higher-performance two-disk mirror for my system disk.  This would at least address the issues of performance and resiliency to disk failure.  The problem I faced was how to transfer my current system disk to a new RAID configuration.

After some failed attempts at using various cloning programs out there – it struck me that this was an ideal use of our backup technology.  I would just back up the current system disk – and restore it to the new physical disk.

As this was just going to be a “once off” backup – I did not want to spend the time to set up a full enterprise backup solution (like DPM) but just wanted to use Windows Server Backup.

I knew that Windows Server Backup does not support Hyper-V by default – so I went off to get the details of how to enable this from the appropriate KB article (http://support.microsoft.com/kb/958662) and was pleasantly surprised to find that a “Fix it” has been made for this issue – so I was able to complete this step without too much trouble.

Side note: You may wonder what happens if you do not apply this Fix it.  Simply put, by default Windows Server Backup will not engage our VSS backup components.  This means that it will just copy the files of the virtual machines without doing anything to prepare them for backup.  If your virtual machines are turned off – this is fine.  If your virtual machines are running – this can result in your backup having corrupt data in the virtual machines (but it will not affect the currently running virtual machines).

Once you apply the Fix it – there is nothing in the Windows Server Backup user interface that indicates that anything is different.  But now when you back up a drive that contains virtual machines we will either use VSS inside the virtual machine in order to guarantee a valid backup is taken – or we will momentarily put the virtual machine into a saved state (if VSS is not supported by the guest operating system) and resume it after the backup is taken.
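
To make the two paths concrete, here is a minimal Python sketch of the decision described above. It is only a model of the behavior as I have described it, not Hyper-V code: the `VirtualMachine` class, the `BackupAction` names and the `plan_backup_action` function are all made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class BackupAction(Enum):
    # Hypothetical labels for the three cases described in the post.
    COPY_FILES = "copy files as-is"                    # VM is off, files are already consistent
    GUEST_VSS = "snapshot using VSS inside the guest"  # VM keeps running
    SAVE_AND_RESUME = "save state, back up, resume"    # VM pauses momentarily


@dataclass
class VirtualMachine:
    name: str
    running: bool
    guest_supports_vss: bool


def plan_backup_action(vm: VirtualMachine) -> BackupAction:
    """Pick how a VM is prepared for a host-level backup (illustrative model only)."""
    if not vm.running:
        return BackupAction.COPY_FILES
    if vm.guest_supports_vss:
        return BackupAction.GUEST_VSS
    return BackupAction.SAVE_AND_RESUME


if __name__ == "__main__":
    machines = [
        VirtualMachine("domain-controller", running=True, guest_supports_vss=True),
        VirtualMachine("windows-xp", running=True, guest_supports_vss=False),
    ]
    for vm in machines:
        print(vm.name, "->", plan_backup_action(vm).value)
```

The only case that interrupts a running guest is the last one, which is why a guest without VSS support is the one to watch during a backup.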

Most of my virtual machines support VSS, but I did fire up a Windows XP virtual machine just to watch the backup progress – otherwise there is no way for me to know that anything actually happened to the virtual machines :-)

I then fired up Windows Server Backup and requested to do a custom backup, and selected to only do a “Bare metal recovery” backup.  This meant that I was able to back up my system disk without backing up the (rather large) data disks used by my Windows Home Server virtual machine:

image

But then things started to go sideways.

On my first attempt, the backup failed after 10 minutes with an error message that stated:

“(0x81000101) The creation of a shadow copy has timed out. Try this operation again.”

Searching on this error message revealed nothing of particular interest – and as the whole reason for this backup was the slow performance of the disk I was trying to back up – I figured a timeout was not too surprising.  So I decided to do as the error message advised – and try again.

The second attempt got further – about 30 minutes in – when it failed with an I/O error message.  A bit of investigation quickly revealed that the USB disk that I was trying to back up the system to had chosen this particular point in time to die.  Hmmm… Ominous.

For the third attempt I tried to back up over the network to my main desktop computer (after having to shuffle a lot of virtual machines around to make space).  This time I received an error message that stated:

“(0x80042336) The writer experienced a partial failure.”

Sigh.  At least I knew about this error message.  Basically – VSS (the backup infrastructure in Windows) prefers to have applications either succeed or fail an entire backup process.  The problem that we have is that we can succeed on all but a single virtual machine – in which case we need to report failure back to the backup application, but we also need to indicate that a specific virtual machine caused the problem.
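
As a rough illustration of that roll-up, the sketch below collapses per-virtual-machine results into a single status plus a list of the machines that failed. The function name, the dictionary shape and the use of 0 for success are assumptions for the example, not the actual writer interface; only the 0x80042336 code comes from the error message above.

```python
from typing import Dict, List, Tuple

# Code quoted in the error message above; the constant name is my own.
PARTIAL_FAILURE = 0x80042336
SUCCESS = 0  # assumed success value for this sketch


def summarize_writer_result(vm_results: Dict[str, bool]) -> Tuple[int, List[str]]:
    """Roll per-VM outcomes up into one overall status (illustrative model only).

    Returns (SUCCESS, []) when every VM succeeded. Otherwise returns the
    partial-failure code together with the names of the VMs that failed,
    which is the detail you then go looking for in the event log.
    """
    failed = sorted(name for name, ok in vm_results.items() if not ok)
    status = PARTIAL_FAILURE if failed else SUCCESS
    return status, failed


if __name__ == "__main__":
    status, failed = summarize_writer_result(
        {"domain-controller": True, "ftp-server": False, "wds-server": True}
    )
    print(f"status=0x{status:08X} failed={failed}")
```

One failing virtual machine is enough to fail the whole backup, so the per-machine detail has to travel separately from the overall status.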

Seeing this error message I went to check the event log.  Looking in the Admin section of the Hyper-V-VMMS log showed me that it was my FTP server that had caused the problem:

image

From here I went to look in the event log inside my FTP server. 

At first I checked the System log – and saw a number of error messages from the VDS Basic Provider that stated:

“Unexpected failure. Error code: 490@01010004”

One of these occurred around the time of the failed backup – but there were a number of other instances that did not appear to correlate to any backup activity.  A quick web search turned up this KB article: http://support.microsoft.com/kb/979391, which explains that this is a benign error message that can be safely ignored.

Next I checked the Application log – and saw an error message at the right time that looked like the culprit:

“Volume Shadow Copy Service error: Unexpected error calling routine.  IVssBackupComponents::SetContextInternal.  hr = 0x80042301, A function call was made when the object was in an incorrect state.”

Unfortunately searching on this error message revealed nothing but random people struggling with random variations of the error message – and none of them related to Hyper-V.  After reading through a number of these I decided that the layman's interpretation of this error message was “something went wrong deep in the guts of VSS”.  With such insight in hand I decided that I would just give it another shot.

The fourth time the backup went through without a hitch.
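
For reference, the three error codes I hit along the way can be collected into a small lookup table. This is just a Python sketch of my own notes: the codes and messages are the ones quoted in this post, the "next step" text summarizes what I actually did, and the `KNOWN_ERRORS` and `describe_error` names are made up for the example.

```python
# Error codes and messages as quoted in this post, paired with what I did next.
KNOWN_ERRORS = {
    0x81000101: (
        "The creation of a shadow copy has timed out.",
        "Retry the backup.",
    ),
    0x80042336: (
        "The writer experienced a partial failure.",
        "Check the Hyper-V-VMMS Admin log on the host for the failing VM, "
        "then the Application log inside that VM.",
    ),
    0x80042301: (
        "A function call was made when the object was in an incorrect state.",
        "Retry the backup.",
    ),
}


def describe_error(code: int) -> str:
    """Return a one-line summary for an error code seen during these backups."""
    message, next_step = KNOWN_ERRORS.get(
        code, ("Unknown error.", "No notes for this code.")
    )
    return f"0x{code:08X}: {message} Next step: {next_step}"


if __name__ == "__main__":
    for code in KNOWN_ERRORS:
        print(describe_error(code))
```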

I honestly did not expect this process to be so painful – but the nice thing is that (with the exception of my Windows XP virtual machine, which does not support VSS) through this whole process none of my running virtual machines were disturbed.  In fact – I was watching video streaming off of one of them for pretty much the entire time.

Unfortunately this story is yet to have a happy ending – as while I have been able to confirm that a valid and complete backup was taken (ironically by restoring the backup to a virtual machine on my Hyper-V server – which worked fine) I cannot get the darned thing to restore to my new disk configuration.

So for now my server continues to run a little slow on the old disk, and I am hunting down Windows Server Backup people to try and figure out why my restore is failing.  On the plus side – if I do have a hardware failure now I will have a valid backup to restore the system from (once I get that part working).

Cheers,
Ben

Update: This discussion is continued at http://blogs.msdn.com/virtual_pc_guy/archive/2010/03/10/adventures-in-backup-continued.aspx

  • Sounds like déjà vu. I use Backup Exec with the Hyper-V agent on 6 VM servers. Two of them just refuse to back up. It's trying a new thing each night with the hopes it will finally work. VSS is a blessing and a pain all at once. I am almost at the point of giving up and having the servers shut down during the backup.

  • And now you see why it's so hard to trust this... I run Hyper-V in a production environment with four VMs - one of them running SQL 2008 - and I never feel safe when the backup is taken.

    There's no guarantee that a backup is good, unless I test every single one. I can't trust the platform as it is. Too many false signals, noise and error messages that are not documented anywhere else.

  • Hi Ben,

    This is exactly why I use my PowerShell script to inform me if the last backup was successful. I have had some weird crashes causing the backup service to fail. :\

    http://mindre.net/post/Backing-up-Virtual-Machines-using-Windows-Server-Backup-in-Server-2008-R2.aspx

  • Yes.  On one hand I still think this is really cool technology (I mean - my virtual machines never missed a beat through this whole process) but clearly this is something that needs to be more robust / reliable.

    Cheers,

    Ben

  • This is one area where VMware seemingly has the upper hand, sadly. I've tried Backup Exec, Windows Backup and DPM and they all seem to randomly fail with VSS errors.

    DPM was the most entertaining as it snapshots every 15 mins and you can imagine the amount of error logs you get in a day on 20 servers (physical and VM).

    VSS seems to need _way_ more error checking internally and some sensible error messages would be nice too. Actually scratch that, if it "just worked" I wouldn't care about error messages cos I'd never see them :)

  • Hey Ben,

    Can you not just install W2K8 R2 on your new disk configuration and then restore your old W2K8 R2 backup over the top?  Have you tried this?  That may be a route forward.  I'd suggest other than that it is likely that the restore isn't correctly starting up the necessary RAID controller drivers.  The old way around that would be to install the RAID controller drivers before taking the backup.  Not sure if that still applies to W2K8R2 as not done that for a while! :o)

    Good luck

    Janson

  • Hi Ben,

    Sorry that this is off-topic but I'd appreciate your comments on this.

    I read here (http://www.microsoft.com/windows/enterprise/products/mdop/med-v.aspx) that MED-V SP1 will support both 32 and 64-bit guest operating systems when it is released.

    "MED-V 1.0 SP1 with support for Windows7 (32bit and 64bit) will be available in the first quarter of calendar year 2010.

    MED-V 1.0 SP1 will rely on Virtual PC 2007 technology, and will not require hardware-assisted virtualization (e.g. Intel VT, AMD-V)."

    Since this is based on Microsoft Virtual PC 2007 technology, this would imply that MS are working on making Virtual PC work with 64-bit guests.

    Will there be a standalone version of Virtual PC 2007 which support 64-bit guests in the future ?

    Are you able to comment on this at all ?

  • It is good to see I am in the same boat.  The biggest downfall of Hyper-V is that reliable backups are nearly impossible in my experience.  It does not matter what backup software you actually use because the problems always arise from VSS.

    We have had to revert back to "within" backups of guests or a shutdown/suspend + robocopy.  It is just not possible to rely on VSS in production.

  • Janson -

    Yes, that idea has crossed my mind - but I would like to figure out why a bare metal restore is not working on my specific hardware - as this is my preferred route.

    Paul Lynch -

    The 64-bit reference here is support for 64-bit host operating systems - which MED-V does not support today.

    Dave / Doug -

    After this experience I have been thinking about trying to set up nightly backups of my system so that I can get a better feel for the sorts of issues encountered.  The thing that really annoys me is that (apart from the busted USB disk) both of the errors I encountered had no corrective action other than "try it again".

    Cheers,

    Ben

  • By the way -

    You may be wondering why I made a blog post that seems to shine a negative light on our backup functionality.  

    The main reason for this is:

     - I blog what I see, good or bad

     - Having spent a couple of hours getting this working I wanted to share my experience, so that others would know what to do in a similar situation (e.g. how to diagnose a 0x80042336 error)

    I still think our backup functionality is pretty darned cool - and remain impressed that even while I was seeing these random failures - I suffered no downtime.  But clearly we do need to make this system more reliable / predictable.

    Cheers,

    Ben

  • Hi Ben

    If you haven't researched further already, this link may help: http://social.technet.microsoft.com/Forums/en/windowsbackup/thread/268e5d38-fc99-41b0-9d79-31e6f5e98d96

    My understanding reading this is that Bare Metal Recovery is only possible/supported on similar hardware, so the change of drive controller is likely causing the issue.

    Hope that helps

    Janson

  • Regarding no downtime - that is true, until the only way to resolve a VSS issue becomes "reboot the server." :)

  • Ben,

    As an interim and additional safety measure for your "eggs in one disk" problem:

    Grab a copy using disk2vhd. It should still let you watch TV while you run it, and then you will have hedged your bets with both a backup to restore and a VHD file to play with.

    If you get time you could push for disk2vhd2disk.

    Good luck

  • Ben,

    Great time to post this. I was actually thinking about how I want to back up my Hyper-V R2 VMs. Since I have JBOD with VMs on them so I could have as many spindles as possible, I have the same single point of failure as you.

    I'm thinking about getting a large external USB drive and just letting it back up to there every other day or so. The only downside after reading the KB article is that you can only do a whole drive, not individual VMs.

  • To offer some contrasting perspective to these comments, I would like to say that I am running a production Hyper-V environment with a fully virtualized corporate server infrastructure.  Many of the VMs have undergone P2V migration, restoration from backup, or OS upgrades – and they continue to work as expected.  I use DPM for more granular backups during the week and WSB for full images of the Hyper-V hosts + VMs on the weekend.  And I have found this to be the most stable, complete, and efficient backup system I’ve ever encountered, though it took a while to work out the bugs.  More to the point, I feel safer than ever that I will be able to revert or recover when necessary, and IT management is much easier.  So thanks MS, and in particular, thank you Ben for all the great virtualization work you have done over the years – it is inspiring to me, and it keeps getting better with every release!

    I like watching the Hyper-V manager when the backups run.  In the status column it shows saving/restoring for VMs that hibernate, or “Creating VSS snapshot set…succeeded!” for those with full integration services enabled.  This gives me a warm, secure feeling inside.

    That being said, I do think the VSS infrastructure can be dramatically enhanced and improved to address the issues described in this post (I’ve dealt with many of them myself), but I have faith that the developers will get there in time, as more R&D investments are made in the technology.  A big complaint I have is that initially MS did not implement backing-up certain things with WSB.  Exchange 2007 didn’t get it until SP2.  DPM 2007 can’t use WSB to make a full image of itself (system volume only, not the replicas).  And as for Hyper-V, I would have thought the registry key enabler for WSB would have been added automatically in 2008 R2, and was surprised it wasn’t.

    In short, if it is a MS product, WSB (or its future incarnation) should be able to back it up natively with VSS, right out of the box…period!  Personally, I think lack of support for certain applications was a deliberate attempt to push DPM on enterprise customers.  But now I use both technologies in tandem, and am pleased with the level of protection this combination offers.   Full images are vital to have in a crunch, plus DPM incremental grabs for good measure.
