Netgear ReadyNAS Duo v2 as a Windows Server Backup Target across SMB while allowing differencing in the backup type

System Requirements:

  • Netgear ReadyNAS Duo v2 or any SMB capable NAS
  • Windows Server 2008, 2008 R2, 2012, 2012 R2

The Problem:

One of the most frustrating “features” of Windows Server since the release of Windows Server 2008 has been Windows Server Backup’s handling of backup sets. Windows Server Backup added support for backing up to SMB, however only if you perform a full, rather than an incremental or differential, backup of the host server.

The main problem with this is the time it takes to perform the backup. Depending on the size of the disk array involved, a normal backup job can take tens of hours, even days. If you want to run the backup job daily and the job is taking more than a day to complete while saturating the network, then it is not a very effective backup solution.

Yet the real power of using the network in the first place is that it permits the distribution of the backup to a remote location without the need to physically disconnect a drive and carry it. The drives can also be a lot further away than with USB, eSATA or FireWire; in another building or even in another country.

Furthermore, the array is more expandable than a typical USB disk. The maximum supported size of this little ReadyNAS Duo v2 is 2x4TB in RAID 0, resulting in 8TB of storage. I could also run it in RAID 1 if I needed higher levels of data security. This is much better than a typical USB disk. With a 4 or 8 bay NAS, you can even grow the array by adding new drives and expand the VHDX file according to your needs (up to the limit of the native NAS file system or the NTFS volume limit). Devices with more bays also allow for additional RAID types and associated data security such as RAID 5, 6 or 10.

In many situations, you can use iSCSI for this purpose. Most high end and Enterprise NAS storage and SAN solutions are designed to provide thick or thin provisioned iSCSI targets which you can easily mount via the iSCSI initiator in Windows. An iSCSI mounted drive in Windows is – at least as far as Windows is concerned – presented as a local disk and therefore you can perform a differencing backup under the control of Windows Server Backup (WSB).

So what can you do if you have a consumer grade NAS appliance or an old model device that does not expose iSCSI services? A device such as the Netgear ReadyNAS Duo v2? While the v1 model has an unofficial iSCSI Target plugin, it does not work on the v2 model, and with a very low power, ARM based NAS lying around with 6TB of disks in it, it seems a shame to relegate it to the dustbin.

More Info

Storage virtualisation is the answer.

Simply put, Windows Server Backup (WSB) cannot itself perform differential backups to a SMB share, however a SMB share (even a SMB 2.0 share) can host a virtualised storage disk… and Windows can mount a virtualised disk across SMB. Once mounted, WSB is agnostic to the underlying disk location or the fact that it is stored on a SMB share, as Windows presents the disk as being locally attached and abstracts the ‘what’ and ‘where’ entirely to the virtualisation layer.

The Test

If you are going to attempt this, I strongly recommend that you enable Jumbo Frames on the device as you may be able to squeeze 10-20Mbps of additional write speed out of the device.

View: Netgear ReadyNAS Duo V2 and Jumbo Frames

  1. Ensure that your machine can connect to the ReadyNAS over SMB i.e. \\<ipAddress>\<shareName>
  2. Create a share on the NAS for the backup. Create a dedicated one so that you minimise SMB file system update requests to the share. As will be mentioned below, this causes a 10-20Mbps loss of performance even if nothing is actually happening in Windows Explorer.
  3. Disable as many services as you can on the NAS. The less work the CPU is doing and the more free RAM, the better this will be.
  4. Open PowerShell on Windows 8, 8.1, 10, 2012, 2012 R2 or 2016 and enter:
    New-VHD -Path "\\<ipAddress>\<shareName>\Backup.vhdx" -SizeBytes 4096GB

    Substitute the 4096GB (4TB) with the size that you require. This will create a dynamically expanding Hyper-V Virtual Hard Disk on the ReadyNAS.

  5. In PowerShell issue the following command to mount the VHDX file
    Mount-VHD -Path "\\<ipAddress>\<shareName>\Backup.vhdx"

    Now use DiskPart or Disk Manager to initialise, partition and format the disk. Remember to format it using a 64K allocation unit size as this will be important to preserve performance for the large files involved.

    Alternatively, you can execute the entire initialisation and mounting process in PowerShell using:

    Mount-VHD -Path "\\<ipAddress>\<shareName>\Backup.vhdx" -Passthru | Initialize-Disk -Passthru | New-Partition -DriveLetter B -UseMaximumSize
    Format-Volume -DriveLetter B -FileSystem NTFS -NewFileSystemLabel "Backup Disk" -AllocationUnitSize 65536 -Confirm:$false -Force

    This will create a 64K NTFS partition called “Backup Disk” and mount it on B:\ using the VHDX file found on the ReadyNAS

  6. Now, if you attempt to use Windows Server Backup you will be able to create a differencing disk backup set.

Does it work?

Windows Server Backup (WSB) certainly accepts the disk without any complaints and is dutifully able to create the first normal backup, after which it can easily perform the delta backups familiar from an incremental or differential backup type. So yes, it does work. It tricks Windows into accepting the SMB target.

Performance is however a sticking point.

To apply some context: According to a “Legit Reviews” review of the WD Red 5400rpm 3TB drives in the NAS, each drive should easily have been able to manage a write speed of 80MB/s or 640Mbps at a minimum – with something around 147MB/s or 1176Mbps being expected for sequential writes.

Source: WD Red 3TB NAS Hard Drive Review (Page 3)

Creating a Hyper-V VHDX file and writing linear zeros to it across the network results in a write speed variance of between 445Mbps and 495Mbps on the wire (55.6MB/s – 61.25MB/s). The highest that I saw it peak at was 537Mbps in burst.

Performing a backup onto the drive took 23 hours and 2 minutes with the MTU set to 1500 bytes with the average bit rate being approximately 420Mbps – 430Mbps for the backup. Particularly painful for the first normal backup. This is however comparable to the performance of a USB 2.0 drive.

So we can safely conclude that the bottleneck is not the drives. The bottleneck is the ReadyNAS Duo v2. Other, newer devices with more CPU horsepower, more RAM, larger NIC buffers, native Jumbo Frame offload support and more NICs (as well as more drives) should be able to offer better performance.

As an interesting side observation, having a Windows Explorer session sitting open on a SMB share and doing nothing slowed the zeroing process by 10-20Mbps on its own. This highlights the impact of having the NAS CPU performing other actions and its effect on write performance.

Reality Check

There are however some problems here. Windows Server Backup is not aware that this is a virtual drive; it expects the drive to perform and present like a physical hard drive and it will treat it as such. Consequently:

  1. It is going to have poor support for and tolerance of power management (suspend and standby).
  2. It is going to have little to no tolerance for an unreliable network connection i.e. never try to do this over wireless or an unstable Internet connection.
  3. It is going to be extremely susceptible to power outages. You really should use a UPS on the NAS, switch(es) and the source machine to prevent data corruption during a power outage. Note that the important part here is that the source machine stops the backup and dismounts the VHDX in the time it spends on the UPS. After that it can all turn off quite happily.
  4. Windows is not going to automatically mount the VHDX. WSB will not do this for you. You will have to either ensure that it mounts at boot or schedule it to mount before the backup.
  5. Windows is going to need to ensure that it cleanly dismounts the VHDX during shutdown and power management operations. WSB is not going to do this for you either.
  6. Use write caching on the NAS and on the host operating system at your own risk, i.e. definitely have a UPS if you want this performance benefit.
  7. If you need to perform a bare metal recovery of the server, the extra steps of getting the VHDX mounted in the boot recovery environment may prove frustrating.
  8. While a VHDX in Server 2012 R2 can technically be used as a shared medium, you should probably avoid even contemplating trying to share one VHDX between multiple WSB hosts. Create one VHDX for each server.
  9. The current maximum size of a VHDX is 64TB. If this is an issue: 1) why are you using a consumer grade NAS? 2) you need a SAN; 3) you shouldn’t be using WSB.

It should be noted that all of the above potential disadvantages also apply to some degree to the use of iSCSI. The advantage of this approach is that you get the benefit of storage virtualisation, whereas with iSCSI your NAS would have to expose it, i.e. you can literally just pick up the VHDX and move it to a new HDD, array, NAS or SAN and WSB isn’t going to care or even notice.

So what can you do? My suggestion is this: do not use the WSB UI to schedule the backup. Use task scheduler and the WSB command line tool WBAdmin.exe to perform the backup in a PowerShell script. Something like the following:

Mount-VHD -Path "\\192.168.0.100\Backup\Backup.vhdx"

Start-Sleep -s 60       # Wait 60 seconds for the disk to come online

C:\Windows\System32\wbadmin.exe start backup -backupTarget:B: -allCritical -include:C: -systemState -vssFull -quiet

Start-Sleep -s 120      # Wait 120 seconds for the disk to go offline

DisMount-VHD -Path "\\192.168.0.100\Backup\Backup.vhdx"

When task scheduler fires the script it will mount the VHDX, wait 60 seconds to allow the file system to mount, perform the backup, wait 120 seconds for the backup sub-system to shutdown and then cleanly dismount the VHDX.
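
For completeness, a minimal sketch of registering such a script with Task Scheduler from PowerShell follows (Windows 8/Server 2012 or later; the script path and task name are illustrative assumptions):

# Illustrative: assumes the mount/backup/dismount script above is saved as C:\Scripts\NasBackup.ps1
$action = New-ScheduledTaskAction -Execute "PowerShell.exe" -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\NasBackup.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 21:00
Register-ScheduledTask -TaskName "NAS VHDX Backup" -Action $action -Trigger $trigger -User "SYSTEM" -RunLevel Highest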

Optimisation and issues

The 256MB RAM, ARM based ReadyNAS Duo v2 was never intended for these kinds of workloads and that does show. Most of the issues encountered with it are simply as a result of the low power, low resource hardware specification.

I have already covered the need to:

  • Use Jumbo Frames on the NIC
  • Avoid wireless connections when mounting the VHDX
  • Use a 64K allocation unit size on the NTFS volume
  • Optionally use the write cache setting on the ReadyNAS
  • Optionally enable write caching and prevent buffer flushing on the volume as exposed via the host operating system

To this I will add the following:

Do not use VHD files, only use VHDX. VHDX files are far, far safer to use over SMB compared to VHD as they have error correction and handling built in. Consequently, there is a reasonable chance that the file will actually survive a disconnect of the network cable or a power loss of the source or destination. This does however restrict you to using Windows 8/Server 2012 or higher, excluding Windows Server 2008/2008 R2.

Only use 1Gbps or 10Gbps networks. Do not use 802.11 wireless and do not use 10/100 Fast Ethernet.

Use server grade NICs in your devices if you can

Use MPIO and multiple switches if you can spare/afford the hardware

Keep your VHDX defragged just like any other NTFS formatted hard drive
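
As a hedged sketch of one way to do this, Optimize-Volume can defragment the NTFS volume inside the mounted VHDX, while Optimize-VHD (from the Hyper-V module) can compact the file once it has been dismounted:

Optimize-Volume -DriveLetter B -Defrag -Verbose
# Compacting is only really relevant if you stayed with a dynamically expanding VHDX
Dismount-VHD -Path "\\192.168.0.100\Backup\Backup.vhdx"
Optimize-VHD -Path "\\192.168.0.100\Backup\Backup.vhdx" -Mode Full
Mount-VHD -Path "\\192.168.0.100\Backup\Backup.vhdx"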

If you have managed switches, consider preventing broadcast and multicast traffic from reaching the NAS. This will reduce CPU load a little although it will prevent NetBIOS discovery and may impact other services.

Do not use the NAS for anything else, especially small SMB file storage. Client access will degrade the write performance and consume CPU time. In particular do not leave the NAS SMB mount point mounted as a network drive as this also holds a SMB session open with the Linux Samba service.

Do not use dynamically expanding VHDX files. Using a dynamically expanding VHDX file was in reality fine (if you accept the limitations of the device). It took nearly 4TB of data without incident, however the use of dynamically expanding disks is itself inefficient. Dynamic disks have a performance penalty associated with them as the disk head is constantly being told to zero the trailing 12MB of the VHDX file to permit future growth of the VHDX. There are also associated writes to the metadata of the VHDX to update the file boundary markers. In trying to squeeze every last bit of performance out of the ReadyNAS Duo v2, I wanted to use a fixed size VHDX file to see if it was any more efficient.

One of the first issues encountered was the length of time it takes to allocate and deallocate space in the Linux disk journal. Allocation is proportionately faster than deallocation; however, on attempting to allocate 5.4TB of disk space to a single VHDX file, it would take the system an extended period to process and the VHDX creation process on Windows would time out, causing the VHDX to be corrupted. At this point the VHDX would be deleted by Windows. This storage deallocation could take upwards of 20 minutes to appear as released in the ReadyNAS web UI.

Looking at ‘top’ in the SSH session, it was clear that the CPU was the culprit, capping out at 100% throughout the entire operation before dropping down to <1% once the journal had been updated.

After some trial and error, I found that with the web UI closed, only necessary services running, SSH logged out and no active Windows Explorer sessions open, I could allocate 2TB at a time without it causing a timeout.

The following script can thus be used to create the VHDX at 2TB, expand it to 4TB and then expand it to the desired 5.3TB (the maximum size of the ReadyNAS volume I was using was 5.4TB).

New-VHD -Path "\\192.168.0.100\Backup\Backup.vhdx" -Fixed -SizeBytes 2TB
Resize-VHD -Path "\\192.168.0.100\Backup\Backup.vhdx" -SizeBytes 4TB
Resize-VHD -Path "\\192.168.0.100\Backup\Backup.vhdx" -SizeBytes 5.3TB

Remember, this script is creating a Fixed Size VHDX file. Consequently it is going to pre-zero each sector on the disk instead of performing a constant 12MB zeroing chase at the end of the file. This means that it will take an extremely long time to complete (especially at only ~470Mbps) i.e. over 24 hours! So I suggest that you copy and paste all three lines at once into the PowerShell buffer and walk away. Once it has finished chewing over all three lines, mount the disk, partition it and format it as outlined earlier in the article.

Note: There are a couple of utilities out on the Internet that can create a fixed size VHDX from free space without performing the zeroing operation. You can save yourself a lot of time using such tools, however you should NEVER use them in production or in a shared environment for reasons of data safety, privacy and security.

After the allocation of the space in the Journal and during the zeroing process the CPU use remains high, running constantly at 100% with about 20MB of RAM showing as free out of the 256MB total. This proves that the sub 500Mbps cap on the transfer speed is being caused by the CPU and not the disks. You must thus be realistic about the capability of the appliance or pay for more robust, more capable hardware.

You can technically also disable journalling on the volume using SSH, however you must ensure that you have a UPS wired into the NAS and that the UPS can perform a controlled shutdown of the NAS if you try to use it. I elected not to do this.

tune4fs -O ^has_journal /dev/sda0
e4fsck -f /dev/sda0
sudo reboot

Final Results

If you have read this and the Jumbo Frames article on the ReadyNAS Duo v2, I am sure that you might be interested to hear what the cumulative impact of all of the performance tweaks and optimisations was.

With write caching enabled on both the NAS and the Windows Server and buffer flushing disabled on Windows Server, plus all of the other tweaks listed, backup throughput rose to a fairly consistent 560Mbps – 590Mbps with bursts up to 638Mbps. That equates to 70MB/s – 73.75MB/s and 79.75MB/s at burst. While nowhere near the capability of the drives themselves, it is at least now tantalisingly close to the benchmark value for the drives’ random write performance test and network write performance is nearly 200Mbps faster.

Performing the backup job (which without any optimisations on a Dynamic VHDX took 23 hours and 2 minutes) with all optimisations enabled – and actually a significantly larger workload due to the addition of VM state backups in the job – took some 16 hours and 47 minutes. A considerable improvement! That works out at around 200GB per hour.

Most importantly, when the job ran again the next evening, it took less than 30 minutes thanks to it only having to backup file differences.

So why is this? It is predominantly related to the fixed size VHDX file. The higher throughput is being achieved because the ReadyNAS CPU is sitting at around 5% – 30% idle during the ~600Mbps copy. The Linux file system sees the write process as constituting changes that are internal to the VHDX file and the file itself isn’t growing, therefore the file system driver on the NAS has significantly less work to do. It is instead NTFS on the backup server that is processing the MFT updates into the block allocation table (BAT) of the VHDX, completely transparently to the NAS. This means that the CPU work has been transferred to the backup server, resulting in a performance increase (and a slightly cooler, less power consuming NAS).

Error 0x80070002 when attempting to backup a Hyper-V Virtual Machine using Windows Server Backup

System Requirements:

  • Windows Server 2008 R2
  • Windows Server 2012
  • Windows Server 2012 R2

The Problem:

You back up, right? Of course you do! Only the cool people back up – and you are one of the cool people, aren’t you?

…If only life was that simple.

So imagine for a moment that you are attempting to use VSS and Windows Server Backup to back up a server. In particular, a fully loaded Hypervisor running Windows Server 2012 R2 Datacenter in this case.

The backup process goes OK for the most part, but fails to complete on a number of (but by no means all) VMs. The process fails with the following errors on Windows Server 2008 VMs, but not necessarily newer ones.

From Windows Server Backup:

Windows Server Backup “Failed” -or- “Completed with warnings” -or-“Backup failed to complete”

The component <VM Name>(Online) was skipped during the snapshot and will not be available for recovery. Error: The writer experienced a non-transient error. If the backup process is retried, the error is likely to reoccur

In the Hyper-V-VMMS\Admin log in Event Log:

‘<VM Name>’ cannot create the storage required for the checkpoint using disk E:\Virtual Machines\<VM Path>\Virtual Hard Disks\<VHD Filename>.vhdx: The system cannot find the file specified. (0x80070002). (Virtual machine ID <VM GUID>)

and…

Checkpoint operation for ‘<VM Name>’ failed. (Virtual machine ID <VM GUID>)

and…

Could not create backup checkpoint for virtual machine ‘<VM Name>’: The system cannot find the file specified. (0x80070002). (Virtual machine ID <VM GUID>)

and of course most helpfully…

The operation failed.

More Info

If you actually look at the backup file, you will see what looks to be a complete file set for the backup; however, given that this error represents an error in VSS, you would be ill-advised to trust it.

As usual with Hyper-V error logs, the error messages have little if anything to do with the actual issue and someone in the Microsoft Development team just needs to be shot for it… but I digress.

The odd thing was that the issue was occurring on all of the Windows Server 2008 (R1) VMs, while the Windows Server 2008 R2 and higher VMs were backing up correctly.

The Fix

So before I get into the issue I encountered, let’s run through the generic fixes:

  1. Check that you have enough disk space on the volume to perform the VSS snapshot. If your volume is below 15% free, try using the Hyper-V Manager to change the snapshot directory to another volume – plug in an external NTFS formatted hard drive if you have to.
  2. Check the permissions of the VHD stated in the error.
    icacls "C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks\<VHD File>" /grant "NT VIRTUAL MACHINE\Virtual Machines":F /T
    Source: TechNet
    Source: System Center Central
  3. Ensure that Hyper-V is patched fully.
    Windows Server 2012 R2 users see: https://support.microsoft.com/en-us/kb/2920151
  4. Run chkdsk on the physical volume on the Hypervisor and on the virtual volume in the VM
  5. Ensure that the Integration Services components are at the latest version and that the VSS Writer (Backup) module is enabled in the VM properties in Hyper-V Manager

Now the less well documented approaches

  1. Check that you can manually checkpoint/snapshot the VM while it is running.
    In Hyper-V Manager or in PowerShell, force a checkpoint on the VM and then delete it and wait for it to merge back (see the sketch after this list). If this works, you are not having a physical VSS issue. If it fails, you need to troubleshoot this and not the WSB error.
  2. Live Migrate the VM off of the current server and onto a different Hypervisor, attempt the backup there, then bring it back to the original server and try again. This process will reset the permissions on the VM file set. If you cannot live or offline migrate the VM, then you need to troubleshoot this and not the WSB error.
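
A minimal sketch of the checkpoint test from item 1, assuming the Hyper-V PowerShell module (substitute your own VM name):

Checkpoint-VM -Name "<VM Name>" -SnapshotName "VSS-Test"
# Delete the test checkpoint and allow it to merge back before drawing conclusions
Remove-VMSnapshot -VMName "<VM Name>" -Name "VSS-Test"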

My fix

In my case, the issue was to do with having the VM VHDX files split across a couple of different storage LUN/volumes. I usually move VM page files onto a dedicated partition on a dedicated spindle (usually an SSD) and leave OS and data volumes on larger arrays. This helps to keep the VMs running smoothly and keeps unnecessary paging operations off of parity checked storage volumes.

So imagine that the VM has the following file storage structure

Physical Hypervisor SSD (this is where the Page Files live)
– D:\my-virtual-server-d-drive.vhdx

Physical Hypervisor Storage Array
– E:\Virtual Machines\<VM Name>\Planned Virtual Machines
– E:\Virtual Machines\<VM Name>\Snapshots
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-c-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-e-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Machines

It is actually this structure which breaks the WSB backup job. Contrary to the VSS event log error, the problem drive is NOT my-virtual-server-c-drive.vhdx; it is actually my-virtual-server-d-drive.vhdx. The event log will actually report the error against the first drive attached to the system bus (I think).

If you weren’t too clever when you followed my advice above and live migrated all of the storage to the same location on a different Hypervisor, you probably found this out for yourself – the backup should have worked.

When you split the job back into separate LUNs, it fails again. The fix is oddly simple and continues to allow you to have split LUN storage if you wish. Change the file system structure to:

Physical Hypervisor SSD (this is where the Page Files live)
– D:\<VM Name>\my-virtual-server-d-drive.vhdx

Physical Hypervisor Storage Array
– E:\Virtual Machines\<VM Name>\Planned Virtual Machines
– E:\Virtual Machines\<VM Name>\Snapshots
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-c-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-e-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Machines

Note the introduction of a folder on the D drive with the same name as the <VM Name> folder on the E drive. Do NOT shut down the VM and move the storage there yourself; use the Hyper-V Manager or PowerShell to perform a ‘storage only’ move of just the one drive, as sketched below. This will ensure that the permissions are correct.
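
A hedged sketch of that storage-only move in PowerShell; the -VHDs parameter takes one hashtable per disk to relocate, and the paths here are illustrative:

Move-VMStorage -VMName "<VM Name>" -VHDs @(@{
    "SourceFilePath" = "D:\my-virtual-server-d-drive.vhdx";
    "DestinationFilePath" = "D:\<VM Name>\my-virtual-server-d-drive.vhdx"
})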

The next time that you run the backup, it will VSS correctly.

As for why it can do this on its own with Windows Server 2008 R2 or higher VMs, but not Windows Server 2008 or lower VMs… I have no idea although I suspect it to have something to do with the capabilities of the integration services components.

Edit: A post-publish search on the issue reveals that I’m not alone in working this out.
View: Technet

Error 0x80070005 when attempting to Perform a Shared Nothing migration between Hyper-V hosts or move a Hyper-V VM between CSV’s in the same or separate Clusters

System Requirements:

  • Windows Server 2012 R2
  • Windows Server 2016

The Problem:

Hyper-V 2012 R2 has a lot of new features that are worthy of note and one of the most appealing features for Virtualisation Administrators is shared nothing migration between hosts via SMB. If you are in an environment that doesn’t have shared storage, it is useful enough in itself, as for VM purposes it may have just validated your decision not to buy shared storage in the first place. Yet less well documented is the feature’s value for setups where you do have shared storage, as you can use shared nothing migration as a mechanism to live migrate VMs between clusters that are backed onto shared storage – or more specifically between “Cluster Shared Volumes” (CSV).

The picture on the back of the box of the smiling, happy systems administrator performing a shared nothing migration makes it look so easy, right? This is however an all too common occurrence:

0x80070005 Error

'General access denied error'('0x80070005')

There was an error during move operation.

Virtual machine migration operation failed at migration source.

Failed to create folder.

Virtual machine migration operation for ‘<VM Name>’ failed at migration source ‘<Source Hypervisor name>’. (Virtual machine ID <VM-SID>)

Migration did not succeed. Failed to create folder ‘<RPC path>…\Virtual Hard Disks’: ‘General access denied error'(0x80070005’).

If you look at the specified destination path (e.g. c:\ClusterStorage\Volume1\test) after receiving this error, you will find that it has created the test folder and it will have created a ‘Planned Virtual Machines’ folder beneath it which will in turn contain a folder named with the VM’s VM-SID (the Virtual Machines unique security ID) and a .xml file named with the same VM-SID.

The migration will however not progress any further.

If you attempt to perform the same operation in PowerShell you will receive the PowerShell version of the same error:

VERBOSE: Move-VM will move the virtual machine "<VM Name>" to host "<Destination Server>"
Move-VM : Virtual machine migration operation for '<VM Name>' failed at migration source '<Source Server>'. (Virtual machine ID<VM-SID>)
Migration did not succeed. Failed to create folder
'\\<Destination Server>\<Source Server>.762091686$\{e166ba26-8a4a-4029-ac34-c2466451e439}\<VM Name>\Virtual Hard Disks': 'General access denied error'('0x80070005').
You do not have permission to perform the operation. Contact your administrator if you believe you should have permission to perform this operation.
At line:1 char:69
+ $vm = Get-VM -Name 'test' -ComputerName "<Source Server>" | Move-VM -Des ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : PermissionDenied: (Microsoft.Hyper...VMMigrationTask:VMMigrationTask) [Move-VM], VirtualizationOperationFailedException + FullyQualifiedErrorId : AccessDenied,Microsoft.HyperV.PowerShell.Commands.MoveVMCommand

Please Note: This document does not specifically address 0x80070005 for Hyper-V Replication Troubleshooting, which is a slightly different (yet related) issue.

More Info

Understanding the topology involved in my setup also reveals my reason for needing to get this working – this is important, as your setup and reasons may differ slightly. What I was attempting to do was migrate between two multi-node Windows Hyper-V Server 2012 R2 clusters while being able to initiate the migration from a third device, a Windows 8.1 management console.

Much of the discussion surrounding 0x80070005 suggests that you simply need to deal with the fact that you need to log onto the source workstation and initiate a push of the VM from the source server to the destination server using CredSSP. This is fine if you have a general purpose commodity server that happens to have Hyper-V on it. In the real world if you have a Hyper-V Cluster, you should not be running it in GUI mode, you should be using Server Core – and if you are using Windows Hyper-V Server to begin with, you don’t even have the option of a GUI.

So we can eliminate the use of the GUI tools or the simplicity of “just RDP into the server” immediately from this discussion. People answering as such are running in very simple Hyper-V setups and in environments with simple, very liberal security policies.

You can of course use PowerShell to perform a CredSSP migration on a Server Core installation and, as a matter of good practice, the ability to transfer VMs using CredSSP should be confirmed as working before you start out with Kerberos. To do that, log onto the Source Server and execute the following command in a PowerShell session:

Get-VM -Name '<VM Name To Move>' | Move-VM -DestinationHost "<Destination Server>" -DestinationStoragePath "C:\ClusterStorage\Volume1\<VM Name to Move>" -Verbose

If that doesn’t work, I recommend that you troubleshoot this issue before you look to go any further on the 0x80070005 issue.

Additionally, make sure that you have performed the basic troubleshooting steps and that you are simplifying the problem as much as possible before starting. The following provides an overview of such steps in no particular order:

  • Log in as a Domain Admin to perform this test (if possible). After you have that working, migrate down to delegated users and troubleshoot any issues that they are experiencing
  • Only try to ‘shared nothing’ migrate a VM that is turned off (create a new VM, attach a default sized dynamically expanding disk, don’t add any networks and leave it off; this means that you will only have 4MB of data to test move). Once you can migrate a VM that is off, attempt to migrate a running VM with a Live Migration.
  • Only test migrate between the Source Cluster storage (CSV) owner node and the Destination Cluster storage owner node
  • If possible, make the owner of the source and destination cluster core resources the same node that owns the CSV
  • Remember that you must use Hyper-V Manager after you have de-clustered the VM from within Failover Cluster Manager before you can perform a shared nothing migration – the fact that your VM has anything to do with a cluster is an aside for Hyper-V. Treat this process as a Hypervisor to Hypervisor move that happens to be on a CSV and forget about the cluster.
  • On the ‘Choose a new location for virtual machine’ page of the migration wizard, remember that you must enter a file system path (e.g. C:\ClusterStorage\volume 1\test) and not a UNC path (e.g. \\server\c$\ClusterStorage\volume 1\test). The migration is going to take place using RPC and not SMB. Thus do not use a UNC path.
    'Choose a new location for virtual machine' wizard page
  • Ensure that you can migrate the VM using CredSSP as discussed at the beginning of this section
  • Ensure that your Domain Controllers are running Windows Server 2008 or higher (or at least your logon server), Windows Server 2003 Domain Controllers are known to have issues here (possibly due to lack of AES support). Your domain / forest functional levels can reportedly be Windows Server 2003 if required. I have only tested with Windows Server 2008 domain functional and Windows Server 2008 forest functional levels
  • If you are attempting to move between servers in a domain trust, you must ensure that the domain trust supports AES
  • Keep your initial testing paths simple and avoid overly complicated NTFS structures. For example, target the destination to be a local sub folder of C:\ and not a junction (such as ClusterStorage\Volume #) or a non-drive letter NTFS Mount Point (i.e. an iSCSI share or drive mount point exposed as a sub-folder to a higher file system). See the links below for more on this.
    View: Snapshot – General access denied error (0x80070005)
    View: Migrating a Virtual Machine problem
    Note: The iCACLS command listed in the second link does not use the principle of least permission. The command to enact the principle of least permission would be as follows:

    icacls F:\hvtest /grant “NT VIRTUAL MACHINE\Virtual Machines”:(OI)(CI)(R,RD,RA,REA,WD,AD) /T

    Finally, keep in mind that for delegation purposes, permissions must be valid for the user account that you are using to perform the move as well as the SYSTEM account.

  • Initially, forget about testing the migration into the cluster CSV itself. Instead, create a new folder on the root of the C Drive of the destination server and migrate into this. There are a few suggestions online that you need to put a couple of folder depths between the root of the drive and the VM itself, so try something like C:\VM Store\Test\
  • If you are following my advice, you will be testing with a 4MB VM called ‘test’ so there won’t be any issue with storage space and the use of the C Drive for testing
  • Use PowerShell for testing, otherwise you will go insane from having to repeatedly re-enter information in the Move VM wizard. The general gist of the command is:
    Get-VM -Name '<VM Name To Move>' -ComputerName "<Source Server>" | Move-VM -DestinationHost "<Destination Server>" -DestinationStoragePath "C:\ClusterStorage\Volume1\<VM Name to Move>" -Verbose

    With the 0x80070005 error, you should find that it will get to 2% and then error after a few seconds.

  • Ensure that you have enabled Kerberos authenticated Live Migrations in the properties for the Hypervisor in Hyper-V Manager
    Hypervisor Properties
    Note: You can perform this action in PowerShell using:

    Enable-VMMigration -ComputerName <Server Hostname>
    Set-VMHost -ComputerName <Server Hostname> -VirtualMachineMigrationAuthenticationType Kerberos
  • Ensure that your Hypervisors and the Windows 8.1 management machine are up to date (at the same patch level) and are joined to the same domain
  • Ensure that all parties in the process have properly registered DNS records in AD DNS
  • Check your Windows Firewall rules – for testing purposes just turn them off if you can (remember to turn them back on afterwards!)
  • Check your ASA/Hardware Firewall rules for the same
  • Keep an eye on the Hyper-V event logs for any additional information. The log of consequence is found in Event Viewer under:
    Applications and Services Logs > Microsoft > Windows > Hyper-V-VMMS > Admin
    If you are experiencing the same problem that I was, you will see three events in the Source Server’s log (20414, 20770 and 21024). The 20770 error is the one being reflected by PowerShell or the Hyper-V Management console. Shortly thereafter, the Destination Server will log a 13003 event informing you that the virtual machine from the Source Server (with the same VM-SID) was deleted, indicating that the Destination Server performed a clean-up of the initial migration process.
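
On Server Core, a quick way to pull the most recent entries from that log without the Event Viewer UI is Get-WinEvent, using the channel name as it appears in Event Viewer:

Get-WinEvent -LogName "Microsoft-Windows-Hyper-V-VMMS-Admin" -MaxEvents 20 | Format-Table TimeCreated, Id, LevelDisplayName, Message -AutoSize -Wrap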

Permissions

There is a lot of discussion about permissions and 0x80070005 errors. Let us look at the salient points:

VERBOSE: Move-VM will move the virtual machine "<VM Name>" to host "<Destination Server>"
Move-VM : Virtual machine migration operation for '<VM Name>' failed at migration source '<Source Server>'. (Virtual machine ID <VM-SID>)
Migration did not succeed. Failed to create folder
'\\<Destination Server>\<Source Server>.762091686$\{e166ba26-8a4a-4029-ac34-c2466451e439}\<VM Name>\Virtual Hard Disks': 'General access denied error'('0x80070005').
You do not have permission to perform the operation. Contact your administrator if you believe you should have permission to perform this operation.
At line:1 char:69
+ $vm = Get-VM -Name 'test' -ComputerName "<Source Server>" | Move-VM -Des ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : PermissionDenied: (Microsoft.Hyper...VMMigrationTask:VMMigrationTask) [Move-VM], VirtualizationOperationFailedException + FullyQualifiedErrorId : AccessDenied,Microsoft.HyperV.PowerShell.Commands.MoveVMCommand
  1. The Migration failed at the Source Server
  2. The Source Server failed the migration because it could not ‘create a folder’
  3. We know that the folder in question is the Source Server being unable to create a ‘<VM Name>\Virtual Hard Disks‘ folder
  4. We know that the Source Server was able to create a ‘<VM Name>\Planned Virtual Machines’ folder because we can see it in the file system if we use the GUI Wizard to perform the migration.
    Note: The PowerShell version cleans up after itself!
  5. You have told the Hypervisor to use Kerberos to perform the migration

What does this tell us? It tells us that YOU, the administrator, are being told that you cannot create the folder. You are using Kerberos to perform the migration, not CredSSP, so the entire process is being run end-to-end using YOUR credentials. The Management Workstation is logging onto the Source Server as YOU. The Management Workstation is telling the Source Server to initiate the move and in turn the Source Server is delegating your authentication session to the Destination Server and telling it to receive instructions from the Source Server using your credentials. At this point it has nothing to do with ‘NT Virtual Machine’ or VM-SID permissions; those come into play after the migration of the core parts of the VM, during initialisation of the VM on the Destination Server. We are not there yet.

So the first thing to check is that your account is authorised to perform the move. If you are a Domain Admin, you should be OK; however, you should ensure that the Domain Admins security group is a member of the Local Administrators Group on all participating machines – source server, destination server and management workstation.

If you do not want the user account to have full local admin rights you can add them to the “Hyper-V Administrators” group on each server. To add an account to a local group on Server Core or Windows Hyper-V Server:

net localgroup "Hyper-V Administrators" /add domain\user
net localgroup "Administrators" /add domain\user

Constrained Delegation

When viewing the Delegation tab on the computer account in Active Directory Users & Computers (ADUC) ensure that:

  1. You are using “Trust this computer for delegation to specified services only” (it doesn’t appear to work if you use the “any service” option)
  2. You have selected “Use Kerberos only”
  3. You tick the ‘Expanded’ checkbox to view the full list of entries
  4. That (once Expanded) there are two entries for each type (types being CIFS and Microsoft Virtual System Migration Service), one entry will have the NetBIOS Name and the other will have the FQDN i.e. there are 4 entries for each delegated host, two with NetBIOS Names and two with FQDN entries.
  5. When you create the Kerberos Constrained Delegation, you need to ensure that the “Service Name” column is blank. If there is something listed in the Service Name column, your delegation is not going to work properly.
  6. You need to have the same number of “CIFS” entries for each host as you do for “Microsoft Virtual System Migration Service”
  7. It is not necessary to add the Management Workstation to the Constrained Delegation
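
To sanity check what has actually been stamped onto a computer account, the msDS-AllowedToDelegateTo list can be dumped in PowerShell (a sketch assuming the ActiveDirectory RSAT module; HVNode01 is an illustrative name):

Get-ADComputer -Identity "HVNode01" -Properties msDS-AllowedToDelegateTo | Select-Object -ExpandProperty msDS-AllowedToDelegateTo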

When you issue the Move-VM command in PowerShell, try substituting the -ComputerName and -DestinationHost values for the four combinations of the NetBIOS Name and FQDN.

Get-VM -Name '<VM Name To Move>' -ComputerName "<Source Server>" | Move-VM -DestinationHost "<Destination Server>" -DestinationStoragePath "C:\ClusterStorage\Volume1\<VM Name to Move>" -Verbose

For example, if you have Server1 and Server2 and your domain is domain.local, the combinations to test are:

Source                   Destination
Server1                  Server2
Server1.domain.local     Server2
Server1                  Server2.domain.local
Server1.domain.local     Server2.domain.local

If you find that one of these works while the others do not, you have an error in the constrained delegation setup for DNS or NetBIOS aliasing. Carefully recreate the delegation.

After you have setup the delegation, go into a LDAP browser, ADSI Edit or the Attribute Editor in ADUC. For each delegated server, find the servicePrincipalName property and look at the value list. You should have two of each of the following entries (one with the NetBIOS Name and the other with the FQDN).

  • Hyper-V Replica Service/
  • Microsoft Virtual System Migration Service/
  • RestrictedKrbHost/

If you do not see these, you have a Delegation Error and/or an issue in creating SPN records. Either delete and try to recreate them by recreating the delegation or carefully add them by hand.
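
As a quick alternative to an LDAP browser, the setspn command line tool can list the SPNs registered against an account from any domain-joined machine (HVNode01 is an illustrative name):

setspn -L HVNode01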

DNS

Bindings. I know that you checked them, but check them again. Trust me. On Server Core where you have very little contact with the actual server console this is very easy to overlook.

Constrained delegation may work with both NetBIOS and DNS, however Kerberos does not care for NetBIOS. If your DNS doesn’t work, you aren’t going to get a successful ticket session creation that you will need in order to pass credentials forward as part of the Constrained Delegation setup.

Check the following using shorthand and FQDN lookups, i.e. nslookup server1.domain.local and just nslookup server1. Are they both going where you expect? Crucially, which server NIC is the DNS query going out of and, once the reply comes back, which NIC is being used to attempt to contact the host?

  1. The management console can query all domain controllers in DNS
  2. The management console can query all Hypervisors in DNS
  3. The hypervisors can all query the management console in DNS
  4. The hypervisors can all query all domain controllers in DNS
  5. The hypervisors can all query each other in DNS

This also requires you to check your default gateway settings.
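
As a sketch of working through that checklist, Resolve-DnsName and Get-NetIPConfiguration (Windows 8/Server 2012 and later) let you test both lookup forms and see which NIC holds which DNS servers (server1/domain.local are illustrative):

Resolve-DnsName server1                # short name via the DNS suffix search list
Resolve-DnsName server1.domain.local   # FQDN
Get-NetIPConfiguration | Select-Object InterfaceAlias, IPv4Address, DNSServer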

This is important in the following scenario. Most of you will not encounter this because of the scale of your operations; however, the fact is that at Enterprise level I did encounter this problem, hence why I am able to write about it.

  1. Let’s assume that you follow best practice and have separate public, management, cluster, iSCSI and heartbeat networks.
  2. Your management network is data centre local, on a private network with minimal routing and is designated for the management of servers, IPC traffic, unrouted VMs etc. in a secure fashion
  3. Local DNS is available on the management network but does not expose Internet Resolution
  4. Your public VM address ranges come from the public network and are not exposed via NAT/PAT, i.e. routing and firewalls
  5. Your domain controllers exist on a public routed network subnet that is separate from the public VM address ranges used for VM’s
  6. You followed best practice and made the management network's adapter the first in the binding order on the hypervisors
  7. You will now receive 0x80070005 when you attempt to replicate, live migrate or offline migrate a VM between cluster nodes using Kerberos Constrained Delegation

The problem is the adapter binding order combined with the use of local DNS on a network that offers no connectivity to the domain controllers. When the KDC attempts to generate a Kerberos ticket for the constrained delegation, the lookups for the domain controllers will be performed using the DNS servers on the management network and Windows will mistakenly attempt to connect to the domain controllers via the management network. This is simply going to time out – causing the wait during migration. Once it times out, Windows DNS doesn’t defer to the next set of DNS servers or attempt to get to the DCs on a different NIC. It simply gives up.

The resulting very helpful error code that Hyper-V offers back is Access Denied while seemingly attempting to create files in the file system – the Hypervisor will log that it was unable to create the ‘Virtual Hard Disks’ folder on the destination Hypervisor. What it should actually say here is that it could not properly initialise the end to end Kerberos Constrained Delegation ticket session due to a timeout. It of course doesn’t do that.

In this situation the fixes are one of:

  1. Add an interface on the domain controllers on the management LAN
  2. Add a network interface which can connect to the domain controllers in a higher adapter binding order position in the Hypervisor binding order
  3. Remove the DNS servers from the management networks TCP/IP properties, thus forcing Windows Server to use the first available DNS server configuration on a lower ordinal adapter
  4. Allow routing from the management LAN to the domain controllers. Alias, stub or secondary zone the domain controllers in the management network's DNS and hope you remember to keep them up to date when you make changes to Domain Controller DNS records

Assuming that your constrained delegations are correct, it will start working as soon as the DNS updates have propagated.

The Fix

Ultimately the problem that I had was in the setup of the Constrained Delegation and, in another case as discussed above, the DNS binding order. For the Constrained Delegation issue, I only had NetBIOS values for the ‘Microsoft Virtual System Migration Service’ entries while I only had FQDN values for the CIFS entries, which in turn meant that the associated SPN records were missing.

I was originally using a script by Robin CM for this purpose; it appears that it is this script which wasn’t quite ticking all of the boxes.

View: Robin CM’s IT Blog – PowerShell: Kerberos Constrained Delegation for Hyper-V Live Migration

In my environment, the following represents a corrected version of the script.

The script assumes that you have placed all of your Hypervisors in a dedicated OU. The script will obtain a list of all servers in the OU and automatically create the constrained delegation complete with both pairs of the NetBIOS Name and FQDN records.

In addition, the script also now ensures that the system is not adding a constrained delegation back to itself into the AD database.

You must be a domain admin or have permissions to write to msDS-AllowedToDelegateTo objects in AD in order to run this script.

# Requires the Hyper-V and ActiveDirectory (RSAT) PowerShell modules
Import-Module ActiveDirectory

$OU = [ADSI]"LDAP://OU=Hypervisor's,OU=Servers,DC=ad,DC=domain,DC=co,DC=uk"
$DNSSuffix = "ad.domain.co.uk"
$Computers = @{} # Hash table

# Add each computer in the OU to the hash table
foreach ($child in $OU.PSBase.Children){
    if ($child.ObjectCategory -like '*computer*'){
        $Computers.Add($child.Name.Value, $child.distinguishedName.Value)
    }
}

# Process each AD computer object in the OU in turn
foreach ($ADObjectName in $Computers.Keys){
    Write-Host $ADObjectName
    Write-Host "Enable VM Live Migration"
    Enable-VMMigration -ComputerName $ADObjectName
    Write-Host "Set VM migration authentication to Kerberos"
    Set-VMHost -ComputerName $ADObjectName -VirtualMachineMigrationAuthenticationType Kerberos
    Write-Host "Processing KCD for AD object"
    # Add delegation to the current AD computer object for each computer in the OU,
    # skipping the creation of a delegation back to itself
    foreach ($ComputerName in $Computers.Keys){
        if ($ComputerName.toUpper() -ne $ADObjectName.toUpper()) {
            Write-Host ("  Processing "+$ComputerName+", added ") -NoNewline
            $ServiceString = "cifs/"+$ComputerName+"."+$DNSSuffix
            Set-ADObject -Identity $Computers.$ADObjectName -Add @{"msDS-AllowedToDelegateTo" = $ServiceString}
            $ServiceString = "cifs/"+$ComputerName
            Set-ADObject -Identity $Computers.$ADObjectName -Add @{"msDS-AllowedToDelegateTo" = $ServiceString}
            Write-Host ("cifs") -NoNewline
            $ServiceString = "Microsoft Virtual System Migration Service/"+$ComputerName
            Set-ADObject -Identity $Computers.$ADObjectName -Add @{"msDS-AllowedToDelegateTo" = $ServiceString}
            $ServiceString = "Microsoft Virtual System Migration Service/"+$ComputerName+"."+$DNSSuffix
            Set-ADObject -Identity $Computers.$ADObjectName -Add @{"msDS-AllowedToDelegateTo" = $ServiceString}
            Write-Host (", Microsoft Virtual System Migration Service")
        }
    }
}

Once you have run it, give the system a few minutes so that AD can distribute the update to all DCs and for the Kerberos sessions on the respective nodes to refresh.

Update for Windows Server 2016

So I decided to reinstall a node with Hyper-V Server 2016 and have a play with it in amongst Hyper-V Server 2012 R2.

The experience did not go swimmingly. Here is a quick overview of some issues I encountered (or created for myself) to keep in mind when troubleshooting:

  1. The Hyper-V server Win32 installer will perform an in-place upgrade as a clean install. Remember that this means that you will need to delete the AD computer account object and DNS records and then re-join the system to the domain in the correct OU.
  2. Once you have done this, you will need to re-create the Kerberos Constrained Delegation records for all Hyper-V nodes
  3. I was experiencing a problem where I could use Kerberos to Live Migrate or offline migrate to the Hyper-V 2016 host, however I could not migrate back unless I logged onto the 2016 node and used CredSSP to move it back again. Looking at the Windows Server 2008 R2 domain controller security logs, Kerberos authentication was failing. In the end the fix was to add a delegation for the CIFS and ‘Microsoft Virtual System Migration Service’ delegation classes of the computer account object — TO ITSELF. Yes, if you have Computer Accounts HVNode01, HVNode02 and HVNode03, the delegation tab for HVNode01 must include CIFS and MVSM entries in DNS and NetBIOS nomenclature for not only HVNode02 and HVNode03 but ALSO HVNode01 (itself). Once I did this, I could magically migrate the VMs back again.
  4. If you are using Jumbo Frames, remember to perform a test using the following command. If it doesn’t work, fix this before doing anything else
    ping <ipAddress> -l 8500 -f
  5. I made a silly mistake in late-night PowerShell command entry when setting up the networking on the 2016 box. I entered:
    add-vmnetworkadapter -managementos -Name Management

    when I actually meant to enter

    add-vmnetworkadapter -managementos -Name Management -SwitchName VS_Management

    This hooked up a new virtual network adapter on the Hypervisor called ‘Management’ to each and every Virtual Switch on the Hypervisor. So I wound up with 3 NICs called Management, all on different networks. They went off and got their own IP addresses from DHCP, registered themselves in DNS and created chaos in the adapter binding order. Naturally the one on the unrouted Management network wound up at the top of the binding order and things got a little upset!

  6. The very first randomly selected non-production critical VM that I attempted to migrate was the nodes local console VM. This VM was not designed to move from the node and didn’t have CPU compatibility mode enabled. This caused additional failure issues.
  7. The second randomly selected non-production critical VM that I attempted to migrate gave no hex error code or message whatsoever, either through the UI or the event log, just throwing Event ID 24024 and stating that the migration failed and the error message could not be found. To cut a long winded story short, in the end I (correctly) assumed it was the VM itself at fault and decided to Export / Import it in order to lazily cycle the file system permissions. It turns out that when I attempted to re-import the VM (as a restore) the import wizard notified me that it was expecting to find a snapshot file but that the snapshot itself was unavailable (this VM had no snapshot in the UI and no snapshot file in the export snapshots folder). The wizard asked me if it could clear the snapshot remnant and imported the VM. Once it was imported again, it could live migrate and offline migrate properly. It had nothing to do with the 2016 node.
    Note: Remember to check on the source Hypervisor for remnants of the original exported VM which may be left in place on the file system.

With the above issues resolved, everything is working correctly between the Hyper-V Server 2012 R2 nodes and the test Hyper-V Server 2016 node.

Error 0x80070490 or 0x00000490 when attempting to connect to a Printer queue on a Windows Print Server

System Requirements:

  • Windows Vista, 7, 8, 8.1, 10
  • Windows Server 2008, 2008 R2, 2012, 2012 R2

The Problem:

I was having some problems automating the connection to a printer queue from a set of managed Windows 7 systems to a foreign, Windows Server managed SafeCom printer queue. The device in question was a generic follow-me printing queue for Xerox WorkCentre 7655 devices (not that it is especially relevant).

On opening the SMB share to the print server and connecting to the printer queue, the system would go off for 60 seconds before coming back with error 0x00000490 and no description.

Exploration of this error in event viewer under Event Viewer > Applications and Services > Microsoft > Windows > PrintService > Admin reveals:

Installing printer driver Xerox GPD PS V3.2.303.16.0 failed, error code 0x490, HRESULT 0x80070490. See the event user data for context information.

The only additional information available in the user data of any substance was either

Parse Inf
ProcessDriverDependencies failed

or

PerformInfInstallActions
ParseInf failed

More Info

I also tried the following recommendations from general troubleshooting/elsewhere:

  1. Attempting to manually install the driver didn’t help
  2. Using pnputil -d to delete the driver oemXX.inf didn’t help (i.e. clearing the driver out of C:\Windows\System32\DriverStore\FileRepository)
  3. Using pnputil -a to manually add the desired driver didn’t help
  4. Using the Print Management MMC snap-in to flush the driver out (including renaming any referenced DLLs under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Print\Environments\Windows x64\Print Processors\winprint to xxx.old and restarting the print server to allow the deletion of stuck print drivers) didn’t help
  5. Obtaining older and new drivers from Xerox didn’t help
  6. The Microsoft Printer Troubleshooter tool
    View: Fixing Printer Problems
  7. Enabling the Operational log under Event Viewer > Applications and Services > Microsoft > Windows > PrintService didn’t add anything significant to the troubleshooting process, just:
    ParseInfAndCommitFileQueue
    PerformInfInstallActions failed
    ProcessDriverDependencies
    FindLatestCoreDrivers failed
  8. Checking setupapi.app.log and setupapi.dev.log under C:\Windows\inf did not show any errors; everything was reporting ‘success’

The Fix

In downloading and installing the older version of the print driver, event viewer showed an identical error for the driver install to that shown for the most up to date driver version. However, comparing the source driver files to the destination files that appeared after repository injection in C:\Windows\System32\DriverStore\FileRepository revealed some slight differences in file dates: the print server was sending slightly modified versions of the driver compared to the vanilla Xerox source of the same version number (2015 file dates for a 2013 Xerox driver package).

Some sleuthing through monitoring tools ultimately presented the cause of the issue and its fix. Windows was downloading the driver package from the target foreign print server (with its modified files) and injecting it into the repository correctly. Immediately afterwards, however, it was going off to our internal, public driver repository (a SMB share on a build server), finding additional copies of a compatible x64 Xerox driver, discovering that they were newer and then attempting to use the newer driver.

Without the driver customisations (presumably part of the SafeCom suite configuration for follow-me printing), the print server was immediately rejecting the connection.

So the lesson from this experience was that even if you explicitly tell Windows to use a specific driver version, if it can find a newer version in a driver search path it will attempt to pick it up and use it instead. Remove any media with drivers (UFD, CD, DVD, Floppy) and check/modify your driver search paths for conflicting drivers as listed in:

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\DevicePath
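
You can inspect the current value from PowerShell with, for example:

Get-ItemProperty -Path "HKLM:\Software\Microsoft\Windows\CurrentVersion" -Name DevicePath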