System Requirements:
- Windows Server 2008 R2
- Windows Server 2012
- Windows Server 2012 R2
The Problem:
You backup, right? Of course you do! Only the cool people backup – and you are one of the cool people aren’t you?
…If only life was that simple.
So imagine for a moment that you are attempting to use VVS and WIndows Server Backup to backup a server. In particular a fully loaded Hypervisor running Windows Server 2012 R2 Datacentre in this case.
The backup process goes OK for the most part, but fails to complete on a number (but by no means all) VMs. The process fails with the following errors on Windows Server 2008 VMs, but not necessarily newer ones.
From Windows Server Backup:
In the Hyper-V-VMMS\Admin log in Event Log:
and…
and…
and of course most helpfully…
More Info
If you actually look at the backup file, you will see what looks to be a complete file set for the backup, however given that this error represents an error in VSS, you would not be advised to trust it.
As usual with Hyper-V error logs, the error message have little if anything to do with the actual issue and someone in the Microsoft Development team just needs to be shot for it… but I digress.
The odd think was that the issue was occurring on all of the Windows Server 2008 (R1) VMs, the Windows Server 2008 R2 and higher VMs were backing up correctly.
The Fix
So before I get into the issue I encountered, lets run past the generic fixes
- Check that you have enough disk space on the volume to perform the VSS. If your volume is sub-15%, try using the Hyper-V Manager to change the snapshot directory to another volume – plug in an external NTFS formatted hard drive if you have to.
- Check the permissions of the VHD stated in the error.
icacls “C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks\<VHD File>” /grant “NT VIRTUAL MACHINE\Virtual Machines”:F /TSource: Technet
Source: System Center Central - Ensure that Hyper-V is patched fully.
Windows Server 20102 R2 users see: https://support.microsoft.com/en-us/kb/2920151 - Run chkdsk on the physcial volume on the Hypervisor and on the virtual volume in the VM
- Ensure that the Integration Service Components are at the latest version and that they VSS Writer module for it is enabled in the VM properties in Hyper-V Manager
Now the less well documented approaches
- Check that you can manually checkpoint/snapshot the VM while it is running.
In Hyper-V Manager or in PowerShell, force a checkpoint on the VM and then delete it and wait for it to merge back. If this works, you are not having a physical VSS issue. If it fails, you need to troubleshoot this and not the WSB error. - Live Migrate the VM off of the current server and onto a different Hypervisor, attempt the backup here, then bring it back to the original server and try again. This process will reset the permissions on the VM file set. If you cannot live or offline migrate the VM, the you need to troubleshoot this and not the WSB error.
My fix
In my case, the issue was to do with having the VM VHDX files split across a couple of different storage LUN/volumes. I usually move VM page files onto a dedicated partition on a dedicated spindle (usually an SSD) and leave OS and data volumes on larger arrays. This helps to keep the VMs running smoothly and keeps unnecessary paging operations off of parity checked storage volumes.
So imagine that the VM has the following file storage structure
Physical Hypervisor SSD (this is where the Page File’s live)
– D:\my-virtual-server-d-drive.vhdxPhysical Hypervisor Storage Array
– E:\Virtual Machines\<VM Name>\Planned Virtual Machines
– E:\Virtual Machines\<VM Name>\Snapshots
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-c-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-e-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Machines
It is actually this structure which breaks the WSB backup job. Contrary to the VSS event log error, the problem drive is NOT my-virtual-server-c-drive.vhdx it is actually my-virtual-server-d-drive.vhdx. The event log will actually log that the error was caused on the first drive attached to the system bus (I think).
If you weren’t too clever when you followed my advice above and live migrated all of the storage to the same location on a different Hypervisor, you probably found this out for yourself – the backup should have worked.
When you split the job back into separate LUNs, it fails again. The fix is oddly simple and continues to allow you to have split LUN storage if you wish. Change the file system structure to:
Physical Hypervisor SSD (this is where the Page File’s live)
– D:\<VM Name>\my-virtual-server-d-drive.vhdxPhysical Hypervisor Storage Array
– E:\Virtual Machines\<VM Name>\Planned Virtual Machines
– E:\Virtual Machines\<VM Name>\Snapshots
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-c-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-e-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Machines
Note the introduction of a folder on the D drive with the same name as the <VM Name> folder on the E drive. Do NOT shutdown the VM and move the storage there yourself, use the Hyper-V Manager or PowerShell processes to perform a “move” on the ‘storage only’ and just move the one drive. This will ensure that permissions are correct.
The next time that you run the backup, it will VSS correctly.
As for why it can do this on its own with Windows Server 2008 R2 or higher VMs, but not Windows Server 2008 or lower VMs… I have no idea although I suspect it to have something to do with the capabilities of the integration services components.
Edit: A post publish search on the issue reveals that I’m not alone in working this out
View: Technet