Booting Windows Hyper-V Server from USB: Lessons From Practice

System Requirements

  • Windows Hyper-V Server 2008 R2
  • Windows Hyper-V Server 2012
  • Windows Hyper-V Server 2012 R2

The Problem

I recently wanted to explore the viability of pulling a Server RAID controller into a workstation. A few choice pieces of electrical tape to cover PCIe pins later and the card worked as intended… until it melted down a few minutes later.

The inevitable failure got me thinking. Like most enterprise hardware, the PERC 5/i and 6/i do not support processor power management. The onboard processor runs at 100% speed, 100% of the time. As a result the heat that it generated easily overwhelmed the modest airflow of a desktop. The thermals went well past 80 degrees C before it tripped out.

Most of the servers that we are running in one particular production stack were using the same controllers. Despite this, none of them were actually being used as RAID controllers. They were set as HBA/JBOD devices with a single drive attached – i.e. no disk redundancy. The reason why we have a production setup with such a bad design? These servers are clustered hypervisors. It doesn’t much matter if they burn out. There are 20 more to take their place and all actual client data is held within a fully redundant, complex storage network. An admin simply needs to replace the broken part, rebuild the OS and throw it back into the pool.

Cost Rationalisation

Was changing the design of these servers feasible? Each 10,000 RPM 70GB hard drive was at best holding 20GB of data – and less than 15GB in most cases. Each of those drives is consuming 15-25W of power, making noise and never sleeping. At the same time each controller is consuming 6-18W of power and, again, never sleeping. Both are adding to the heat being thrown down through the backplane and out into the hot aisle. All pretty much needlessly.

Based upon my domestic energy tariff, the potential per-server electricity cost saving stands to be between £3.29 and £4.38 per month, or £39.48 and £52.56 per year. This does not include any residual savings in air conditioning costs. While it doesn’t seem a lot, on a cluster of 20 servers that’s between £789.60 and £1,051.20 per year. At that level the potential savings start to add up.

As an IT designer, it also gives me a budgetary value that I can rationalise any savings against. If we split the difference over 12 months between the upper and lower estimate we get a £46.02 average. If it costs more than that – particularly for old server hardware – it isn’t worth doing: so £46.02 became the ‘per-machine budget’ for my experiment.
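The arithmetic behind these figures can be checked with a short sketch. It uses only the monthly savings and the cluster size quoted above; the variable and function names are illustrative, and the underlying tariff is not reproduced because the article does not state it.

```python
from decimal import Decimal

# Figures quoted in the text: GBP saved per server, per month.
MONTHLY_LOW = Decimal("3.29")
MONTHLY_HIGH = Decimal("4.38")
SERVERS = 20  # cluster size used in the text


def annual(monthly: Decimal) -> Decimal:
    """Twelve months of a monthly saving."""
    return monthly * 12


annual_low = annual(MONTHLY_LOW)
annual_high = annual(MONTHLY_HIGH)

# Scale the per-server saving up to the whole cluster.
cluster_low = annual_low * SERVERS
cluster_high = annual_high * SERVERS

# Midpoint of the two annual estimates: the per-machine budget.
budget = (annual_low + annual_high) / 2

print(f"Per server, per year: £{annual_low} to £{annual_high}")
print(f"Cluster of {SERVERS}, per year: £{cluster_low} to £{cluster_high}")
print(f"Per-machine budget: £{budget}")
```

Decimal is used rather than float so that the pence figures stay exact.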

Options to Consider

With that said, and on the understanding that there is no RAID redundancy involved in the setup I am (re)designing, there were four options to explore:

  1. Pull the RAID controller and attempt to utilise the DVD drive SATA connector with an SSD. This would solve the heat issue, solve the noise issue and reduce power consumption (to ~4w). It will also be faster than the 10,000 RPM rotational drive. The down side is that getting hold of affordable SSD’s (as of writing) isn’t yet an option. Not to mention that various adapters and extra cabling would be required to get the SSD mounted properly (at extra cost). Modifying new cable runs into 1u servers can often be a challenge (it’s bad enough in 3u). The Server BMC also complicates matters as under Dell, OpenManage will notice that you aren’t using a Dell approved drive and this will quickly hit your environmental reporting data. Approximate cost ~£70+ per server. Well over budget.
  2. Pull the RAID controller and mount a SSD/mSATA/m.2 into a PCI-e slot (even potentially the RAID controllers slot) on a PCIe adapter. This solves the cabling problem and has the added advantage of clearing both drive slots. It also means that I can control the bus specification, potentially getting a boost from a SATA III or NVMe controller. Of course this is more expensive although it is easier to get hold of smaller mSATA SSD’s than it is 2.5″ ones. Cost per-server ~£125+. Again, over budget.
  3. Look at SATA DOM or booting from Compact Flash/SD Card. SATA DOM isn’t an option for the PowerEdge 1950 and a NAND flash solution would require modification of the chassis. The headache of managing boot support would also be an issue, rendering this unrealistic.
  4. Pull the RAID controller, disk and boot the entire enclosure from USB. This solves pretty much all problems but does add one in that these servers do not have an internal USB port. The active OS drive would therefore need to be insecurely exposed and accessible within the rack. Think malicious intent through to “I need a memory stick… ah, no one will notice if I use that one”. The cost of an average consumer USB 3.0/ 16GB USB Flash Drive (UFD) is about £7 – and it just so happened that operations have boxes of new ones lying around for the pilfering fully authorised, fully funded project.

I decided to experiment with option 4 and started to investigate how to boot Hyper-V from USB.

How to

Running Hyper-V Server from a UFD is a supported mechanism (as long as you use supported hardware types and not a consumer off the shelf UFD like I am).

The main Microsoft article on this topic was written for Hyper-V Server 2008 R2, however a set of liner notes with hardware recommendations is also available for 2012/R2.

View: Run Hyper-V Server from a USB Flash Drive

View: Deploying Microsoft Hyper-V Server 2008 R2 on USB Flash Drive

 

So far, so good. The basic premise is that you use disk virtualisation and the Windows 7/8 boot loader to bootstrap the operating system. Hyper-V Server is installed into a VHD; the boot loader then mounts the VHD and loads Windows from it as if it were any other Virtual Machine. Performance will suffer, but for Windows Server Core this really doesn’t matter.

Microsoft states that USB 2.0 or higher must be used and that (for OEM redistributors) the UFD must not report itself as being ejectable.

Microsoft recommends the following drives

  • Kingston DataTraveler Ultimate
  • Super Talent Express RC8
  • Western Digital My Passport Enterprise

The closest that I could find were 16GB Kingston DataTraveler G4’s. Based upon UserBenchmark data, these offer 45% lower performance vs. the DataTraveler Ultimate G3 and 131% lower write speeds. Similarly, USB3Speed reports that the G4 read/write is 102.86/31.48 MB/second on a USB 3.0 bus vs. 174.76/38.46 for the Ultimate G3. So there is a decisive bottleneck being introduced as a result of using a cheaper UFD model.
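To put the USB3Speed figures into relative terms, the following sketch expresses the G4 as a percentage below the Ultimate G3. Only the four read/write numbers quoted above are used; the function name is illustrative, and the UserBenchmark percentages are not recomputed because their method is not described here.

```python
# USB3Speed figures quoted in the text, in MB/s on a USB 3.0 bus.
G4_READ, G4_WRITE = 102.86, 31.48
G3_READ, G3_WRITE = 174.76, 38.46


def percent_lower(slower: float, faster: float) -> float:
    """How far `slower` falls below `faster`, as a percentage of `faster`."""
    return (faster - slower) / faster * 100


read_gap = percent_lower(G4_READ, G3_READ)
write_gap = percent_lower(G4_WRITE, G3_WRITE)

print(f"G4 read speed is {read_gap:.1f}% lower than the Ultimate G3")
print(f"G4 write speed is {write_gap:.1f}% lower than the Ultimate G3")
```

These are sequential benchmark figures, so they say little about how either stick copes with the small random writes an operating system generates.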

The Microsoft article recommends the use of 16GB UFD’s rather than 8GB ones to allow for the installation of future updates. I grabbed 4x 16GB DataTraveler G4 sticks and proceeded to prepare them to support the boot process.

View: UserBenchmark: DataTraveler Ultimate G3 vs. DataTraveler G4

View: USB3Speed

Check your Server for Suitability

The Microsoft article states that USB 2.0 is supported for Hyper-V Server USB Booting. I confirmed through empirical experimentation that the PowerEdge 1950 does support USB 2.0 and that its firmware supported booting from USB in a reliable, consistent way.

What I mean here is that you don’t want to have to go into an F12 boot menu every time you restart the server because the BIOS/UEFI will not automatically attempt to boot from the USB port. You should – as a matter of course – update your server firmware, including (but not limited to) the BIOS/UEFI as a way to mitigate against any potentially solvable issues in this regard. Do remember however that in a clustered environment, you should normalise your hardware and firmware setup on all participating nodes before you set out to create the cluster.

In testing, the PowerEdge 1950 demonstrated that it could boot properly from the UFD without intervention. Thus with another tick in the box, the idea was looking increasingly viable.

The USB Stick & VHD(X) Creation Process

I am not going to repeat the instructions for creating the bootable USB stick; they are clear enough on the Microsoft website. It is a shame that Microsoft closed the TechNet Code Library, meaning that you can no longer get access to the automated tool.

What I will add is that as I was installing Windows Hyper-V Server 2012 R2, I decided to attempt to convert the VHD to the newer VHDX format. The advantages here are nominal; better crash recovery and support for 4K drives are the main headlines. Regardless, I wanted to start with the latest rather than using the VHD as prescribed in the 2008 R2 creation guide.

It didn’t work. The boot loader seemed unable to read the VHDX file. Running the VHDX back through Hyper-V’s disk editor and into a VHD did however work. After some testing, I discovered that the issue was the VHD migration process. To use VHDX you must update the BCD using the Windows 8.1 version of bcdedit and start with the Windows 8.1 boot loader. Repeating the process from scratch in a native VHDX did however result in a bootable OS.

I had initially started testing Hyper-V Server on an 8GB UFD. During the process and having obtained a 16GB drive, I decided to expand the size of the VHD from 7 to 14GB. This was a mistake. The VHD will expand fine, however Windows will not allow you to resize the VHD’s primary partition to fill the newly available space via the GUI or DiskPart. So unless you have access to partition management tools that can work with a mounted VHD(X), you will need to ensure that the size of the VHD is correct when you create it.
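Because the VHD size has to be right at creation time, it helps to work out up front how large a VHD a given stick can hold. The sketch below is one way to estimate it, under stated assumptions: that sticks are labelled in decimal gigabytes, that the VHD is sized in whole binary gigabytes, and that a hypothetical 256MB reserve is kept back for the boot files and the Tools folder. File system overhead is ignored, and the function name and reserve are illustrative only.

```python
GIB = 2 ** 30   # bytes in one binary gigabyte
MIB = 2 ** 20   # bytes in one binary megabyte


def largest_vhd_gib(stick_gb: int, reserve_mib: int = 256) -> int:
    """Largest whole-GiB VHD that fits on a stick labelled `stick_gb` GB.

    Assumes the label means stick_gb * 10**9 bytes and that `reserve_mib`
    (a hypothetical figure) is kept free for boot files and tools.
    """
    stick_bytes = stick_gb * 10 ** 9
    usable_bytes = stick_bytes - reserve_mib * MIB
    return usable_bytes // GIB


for label in (8, 16):
    print(f"{label}GB stick: largest VHD is about {largest_vhd_gib(label)}GB")
```

Under these assumptions the estimate matches the 7GB and 14GB VHD sizes used in this article for the 8GB and 16GB sticks.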

The file copy of the 14GB VHD file from the management computer onto the UFD (with write cache enabled) was excruciating. At around 12.8MB/s from a USB 2.0 port, it was getting far less than the benchmarked speed of 31MB/s.

Windows file copy showing 12.8MB/s

Finally, I created a Tools folder on the root of each UFD and copied the Windows 8.1 x64 versions of:

  • ImageX.exe
  • bcdedit.exe
  • bcdboot.exe
  • bootsect.exe

I also copied the x86 version of a Microsoft utility called dskcache.exe into here. dskcache can be used to enable/disable write caching and buffer flushing on connected hard drives. You could directly inject these into the VHD if you wanted to, however if left on the UFD, they are serviceable.

Also note that this is your best opportunity to inject drivers into the VHD should you have any special hardware requirements.

The Results

USB 2.0

Despite the Microsoft article stating that USB 2.0 is supported, it became obvious within about 20 seconds of the boot process that something was not right. The time that it took to boot was agonising. Given the poor sustained file write speed shown above, this shouldn’t be overly surprising.

It took well over 60 seconds for the boot loader itself to start booting, let alone bootstrap the VHD and load the rest of the operating system. The initial boot time was about 25 minutes – although it does have to go through OOBE and perform the driver and HAL customisation processes during the initial boot, so it isn’t very fair to be overly critical at this stage.

The next point of suffering was encountered at the lock screen. On pressing Ctrl + Alt + Del, a 15 second delay elapsed before the screen refreshed and offered the log-in text fields. After resetting the password and logging on, the blue Hyper-V Server configuration sconfig script took around 90 seconds to load. In short, the system was painfully unresponsive.

I had expected it to be sluggish – but I was not expecting it to be quite this bad.

Windows had loaded the UFD’s VHD file with the write cache enabled but buffer flushing (‘advanced features’) disabled. I thus used dskcache.exe to enable both settings.

dskcache +p +w

… and rebooted.

The boot time was around 4 minutes, the Ctrl + Alt + Del screen was still sluggish as was the login process – but it was certainly faster. Having completed the first Windows Update run, boot times to a password entry screen had reduced to a far more respectable 1 minute and 17 seconds. The sluggishness (while still there) had again reduced to 10-15 seconds from log-in to sconfig.

So what is the problem? There are certainly a lot of variables here:

  • The DataTraveler G4 does not offer the performance it is supposed to
  • The bus is USB 2.0
  • There is an artificial abstraction layer being imposed by the disk virtualisation process in and out of the VHD
  • Behind the scenes, Windows still likely thinks that this device is removable and is reacting accordingly
  • While the VHD upload process was a linear one that consisted of a single large file, the operating system will be making thousands of random seeks and random small writes. Random I/O and Linear I/O always offer different statistics – the latter being more synthetic than real world usage will otherwise offer.
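The last point, the gap between linear and random I/O, can be explored with a rough timing sketch such as the one below. It is not the method used in this article: it simply writes small blocks to a scratch file in order and then in shuffled order, forcing each write to the device, and reports a throughput for each. The block size, block count and function names are all assumptions chosen so that the sketch runs quickly; a meaningful test on a real UFD would need a far larger file created on the stick itself (via the `directory` argument).

```python
import os
import random
import tempfile
import time
from typing import List, Optional, Tuple

BLOCK = 4096   # bytes per write: small, OS-style I/O (assumed size)
COUNT = 64     # number of blocks: kept tiny so the sketch finishes quickly


def timed_writes(path: str, offsets: List[int]) -> float:
    """Write one block at each offset, syncing every write; return MB/s."""
    payload = os.urandom(BLOCK)
    start = time.perf_counter()
    with open(path, "r+b") as handle:
        for offset in offsets:
            handle.seek(offset)
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the write through to the device
    elapsed = max(time.perf_counter() - start, 1e-9)
    return BLOCK * len(offsets) / elapsed / 1_000_000


def compare(directory: Optional[str] = None) -> Tuple[float, float]:
    """Return (sequential MB/s, random MB/s) for a scratch file."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.truncate(BLOCK * COUNT)  # pre-size the scratch file
        sequential = [index * BLOCK for index in range(COUNT)]
        shuffled = sequential[:]
        random.shuffle(shuffled)
        return timed_writes(path, sequential), timed_writes(path, shuffled)
    finally:
        os.remove(path)


if __name__ == "__main__":
    seq, rnd = compare()
    print(f"sequential: {seq:.2f} MB/s, random: {rnd:.2f} MB/s")
```

The results depend heavily on the device, the file system and caching, so treat any numbers it prints as indicative at best.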

8GB

As I mentioned previously, my original test with Hyper-V Server 2012 R2 was on an 8GB UFD with a 7GB primary partition. After install, Hyper-V Server 2012 R2 consumes 2.98GB with no Page File. By the time Windows Update had scanned, downloaded and attempted to install updates – including the 870MB Windows Server 2012 R2 Update 1 (KB2919355) – there was only 154MB of free disk space available. It was unable to complete the installation as a result.

Having ascertained that I could not resize the partition post-creation, I recreated the VHDX once again, from scratch, onto one of the 16GB drives.

16GB

Installing Hyper-V Server 2012 R2 into a 14GB VHD on a 16GB stick left plenty of available disk space. By the time that Windows Update had downloaded and subsequently attempted to install all available Windows Server 2012 R2 updates, there was 4.52 GB free.

At this point the Hypervisor itself still had not been configured, and required support tools such as security software, Dell OpenManage or Dell EqualLogic Host Integration Tools had yet to be installed.

Therefore, as with the advice offered in the Microsoft article, do not attempt to run Windows Hyper-V Server 2012 R2 from anything smaller than a 16GB memory stick. If you do, you are going to encounter longevity and maintenance problems with your deployments. In practice you should not consider using anything smaller than 32GB. I can see a time within the next couple of years when a 16GB stick will (as with the 8GB one) be too small for the installation to continue to self-update.

This is significant and should be something that you factor in during design. If Failover Cluster Manager spots a mismatched DSM driver version (i.e. an out of sync Windows Update state between cluster nodes in the case of the Microsoft driver), the validation will fail and Microsoft will not offer support for your setup. Being unable to install updates is therefore not a situation that you want for your clustered Hyper-V Server environments.

Windows Update

As a side note, it is worth pointing out that I ran Windows Update on the live UFD in the server while it was booted. One of the advantages of the UFD approach is that it is easy to keep a box of pre-configured UFD’s in a drawer that can be grabbed as a fast way to stand up a new server or recover a server when its existing UFD has failed. Windows Update maintenance of these UFD’s is made far easier if you use DISM to off-line service the VHD’s and apply Windows Updates to the image before you even start to use the memory stick.

You can periodically update the box of UFD’s to the latest patch revision, meaning that should you ever need to use one, you will have a far more up to date fresh install of the OS to hand. This is significantly faster than performing on-line servicing.
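The article does not record the exact DISM commands used, so the following is only a sketch of what an off-line servicing pass might look like. It builds (and prints, rather than runs) a mount, add-package and commit sequence. It assumes a DISM version that can mount a VHD(X) directly with /Mount-Image, and every path shown is a hypothetical placeholder.

```python
from typing import List


def offline_patch_commands(vhd: str, mount_dir: str, update: str) -> List[List[str]]:
    """Build a DISM mount / add-package / unmount-and-commit sequence.

    All three arguments are caller-supplied paths; nothing is executed here.
    """
    return [
        # Mount the first volume image in the VHD(X) to an empty folder.
        ["dism", "/Mount-Image", f"/ImageFile:{vhd}", "/Index:1",
         f"/MountDir:{mount_dir}"],
        # Apply an update package (.msu or .cab) to the mounted image.
        ["dism", f"/Image:{mount_dir}", "/Add-Package",
         f"/PackagePath:{update}"],
        # Unmount and write the changes back into the VHD(X).
        ["dism", "/Unmount-Image", f"/MountDir:{mount_dir}", "/Commit"],
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    commands = offline_patch_commands(
        r"E:\HyperV.vhdx", r"C:\Mount", r"C:\Updates\update.msu"
    )
    for command in commands:
        print(" ".join(command))
```

Printing the commands first makes it easy to review them before running anything from an elevated prompt, for example by passing each list to `subprocess.run`.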

Improving Performance

There were three choices at this point. Abandon the project, focus on the UFD and buy the higher spec drive (£30 vs £7) or focus on the controller. Looking at prices, the controller was the cheaper option to explore.

USB 2.0 is an old technology. Its maximum theoretical bit rate is 480Mbps (that’s Megabits per second, not MegaBytes). This equates to 60MB/s (MegaBytes per second). If we compare this with USB 3.0, whose maximum theoretical bit rate is 5Gbps (Gigabits) or 625MB/s, we can see a very clear route to better performance. In practice, USB 3.0 isn’t going to get anywhere near 625MB/s, however a quick trip to eBay revealed controller pricing of between £6 and £35, making it something that was easier to swallow inside my £46.02 budget.
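The bit-rate to byte-rate conversion is simply a division by eight, as the sketch below shows. It also compares the two best write speeds measured in this article against those ceilings. The function names are illustrative, and the figures ignore protocol and line-encoding overhead, which lowers the real-world ceiling further.

```python
def mbps_to_mbytes(megabits_per_second: float) -> float:
    """Convert a bit rate in Mbps to a byte rate in MB/s (8 bits per byte)."""
    return megabits_per_second / 8


def share_of_ceiling(measured_mbytes: float, ceiling_mbytes: float) -> float:
    """Measured throughput as a percentage of the theoretical ceiling."""
    return measured_mbytes / ceiling_mbytes * 100


usb2_ceiling = mbps_to_mbytes(480)     # USB 2.0: 480 Mbps
usb3_ceiling = mbps_to_mbytes(5000)    # USB 3.0: 5 Gbps = 5000 Mbps

print(f"USB 2.0 ceiling: {usb2_ceiling:.0f} MB/s")
print(f"USB 3.0 ceiling: {usb3_ceiling:.0f} MB/s")

# Best large-file write speeds measured in this article, in MB/s.
print(f"24.6 MB/s on USB 2.0 is {share_of_ceiling(24.6, usb2_ceiling):.0f}% of the ceiling")
print(f"88.2 MB/s on USB 3.0 is {share_of_ceiling(88.2, usb3_ceiling):.0f}% of the ceiling")
```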

After researching chipset options, I narrowed it down to three. The Etron EJ198, which seems to have the fastest benchmark figures. The Renesas (formerly NEC) D720202, the newer version of the D720201, which came in a close second. And finally the cheap and cheerful VIA Labs (VLI) 805-06 1501.

After further research, I found a lot of reports of compatibility issues with the Etron which, coupled with its higher price, meant I abandoned it. So I picked up a £6.45 VLI dual port card and a dual port Renesas card for £12.66, simply as a means to have two different chips to test with.

Total spend on project: £25.91. Still well within budget.

Before starting I had a working theory that introducing the USB PCIe controller was going to break the BIOS’s ability to boot from the USB port. Despite extensive research, I was unable to find any controller cards online that stated the presence of an Option ROM to explicitly offer boot support. So ultimately I may have spent £25.91 for nothing, particularly as USB 3.0 may not be able to add anything to the already I/O constrained cheaper 16GB UFD; but at this point there was still £20.11 left in the budget which was available to use if chasing USB 3.0 was a red herring. Consequently I was able to pick up a Kingston DataTraveler Ultimate G3 for £16.99 from eBay to allow for a thorough exploration of both avenues.

Total spend on project: £42.90. Still £3.12 left in the budget for a cup of tea!
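The running totals can be verified with a few lines of exact decimal arithmetic. The sketch uses only the budget, the spend after buying the two controllers and the price of the Ultimate G3 as quoted above; the variable names are illustrative.

```python
from decimal import Decimal

BUDGET = Decimal("46.02")             # per-machine budget set earlier
SPEND_AFTER_CARDS = Decimal("25.91")  # total after buying both controllers
ULTIMATE_G3 = Decimal("16.99")        # DataTraveler Ultimate G3 from eBay

# Headroom before and after the extra memory stick.
left_before_g3 = BUDGET - SPEND_AFTER_CARDS
total_spend = SPEND_AFTER_CARDS + ULTIMATE_G3
left_after_g3 = BUDGET - total_spend

print(f"Left before the G3: £{left_before_g3}")
print(f"Total spend: £{total_spend}")
print(f"Left for tea: £{left_after_g3}")
```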

Kingston DataTraveler Ultimate G3

The first thing to note was that it is a far larger memory stick and as such is a lot more obvious and significantly more intrusive sitting on the rear I/O plane of the server. You would definitely want to internally mount this larger UFD simply to protect it from damage caused during routine maintenance and cable management activities.

I elected not to re-create the experiment from scratch with a full 30/31GB VHDX, so instead I copied the existing 14GB VHDX from the existing UFD. Over a USB 2.0 bus a UFD to UFD copy resulted in an 18.1MB/s transfer speed – an immediate 5.3MB/s improvement. Repeating the file transfer from the hard drive onto the UFD increased this further to 24.6MB/s – an improvement of 11.8MB/s and nearly a doubling of the write speed onto the UFD.

Windows file copy showing 24.6MB/s

Testing the connection on the server’s USB 2.0 bus, the performance difference was immediate. While still occasionally lagging and significantly slower than even a 7k rotational hard drive, its responsiveness was now at a point where I concluded that performance was acceptable – even for the USB 2.0 bus.

Rolling the VHDX back to an older, un-patched version of the image and having the server self-update was a better experience; with the update process lasting for a period of a few hours rather than all day.

I did however start to experience some operational problems with the higher specification drive. For example, while I had no problems with the cheaper drive (eventually) completing tasks, the DataTraveler Ultimate G3 could not complete some DISM servicing activities, citing “Error: 1726 The remote procedure call failed”. This could be illustrative of the start of a drive failure or some form of corruption in the VHD.

USB 3.0

The cheaper £6.45 USB 3.0 controller arrived first and I threw it into a PCIe 1x slot on the test system. I then retested the file copy on both the Ultimate G3 and the DataTraveler G4 to see if there was any improvement in performance.

Windows file copy showing 16.3MB/s

The DataTraveler G4 copied up at around the 16.3 MB/s mark. This is a 3.5 MB/s improvement over the 12.8 MB/s from the USB 2.0 controller, but nothing compared to the 24.6 MB/s of the DataTraveler Ultimate G3 on the USB 2.0 controller.

So what about the performance of the DataTraveler Ultimate G3 on the USB 3.0 bus? The result was quite phenomenal in comparison.

Windows file copy showing 88.2MB/s

88.2 MB/s, some 63.6 MB/s faster than the same drive on the USB 2.0 bus – some 705.6 Megabits per-second. Not bad for a £6.45 VIA Labs chip from eBay!

As anticipated however, the lack of an OptionROM was the downfall to the experiment. The BIOS was unable to ‘see’ the USB 3.0 controller as an add-in device during POST and thus was unable to boot from it.

I attempted to create a dual USB boot solution where the VHDX file lived on a memory stick attached to the USB 3.0 bus and a second memory stick containing the boot loader sat on the motherboard’s USB 2.0 port. Sadly however no amount of tinkering could get the system to link one to the other. 'No bootable device -- insert boot disk and press any key'.

The second, more expensive Renesas USB 3.0 controller arrived around a week later. Just as with the cheaper VIA Labs controller, there was no possibility of getting it to boot directly either.

Writing onto the cheaper DataTraveler G4 using the Renesas controller actually managed a throughput of 19.3 MB/s. Repeating the test with the DataTraveler Ultimate G3 yielded a write speed of 93.0 MB/s, again showing an improvement over the VIA, albeit not a particularly significant one given that it was double the price.

In summary the write speeds for performing the large file transfer of the VHD onto the memory stick are shown below.

Write Speed: MegaBytes per second (MB/s) – Higher is better

Memory Stick               USB 2.0   USB 3.0 (VIA)   USB 3.0 (Renesas)
DataTraveler G4            12.8      16.3            19.3
DataTraveler Ultimate G3   24.6      88.2            93.0

From a subjective point of view, the use of the DataTraveler Ultimate G3 on the USB 2.0 bus was “acceptable”. Acceptable given what the system needed to do. Thus the random read/write bottleneck can be concluded as being in the memory stick and not the controller itself.
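To make the measured write speeds more tangible, the sketch below converts each one into the time needed to copy the 14GB VHD, and shows how many times faster the Ultimate G3 is than the G4 on each bus. It assumes the VHD is 14 x 1024 MB and that the copy sustains the quoted speed throughout; the names are illustrative.

```python
# Measured write speeds in MB/s: (USB 2.0, USB 3.0 VIA, USB 3.0 Renesas).
SPEEDS = {
    "DataTraveler G4": (12.8, 16.3, 19.3),
    "DataTraveler Ultimate G3": (24.6, 88.2, 93.0),
}
BUSES = ("USB 2.0", "USB 3.0 (VIA)", "USB 3.0 (Renesas)")
VHD_MB = 14 * 1024  # assumption: the 14GB VHD counted as 14 x 1024 MB


def copy_minutes(mb_per_second: float) -> float:
    """Minutes to copy the VHD at a sustained write speed."""
    return VHD_MB / mb_per_second / 60


def stick_ratio(bus_index: int) -> float:
    """How many times faster the Ultimate G3 is than the G4 on one bus."""
    g4 = SPEEDS["DataTraveler G4"][bus_index]
    g3 = SPEEDS["DataTraveler Ultimate G3"][bus_index]
    return g3 / g4


for stick, speeds in SPEEDS.items():
    for bus, speed in zip(BUSES, speeds):
        print(f"{stick:<26}{bus:<20}{copy_minutes(speed):5.1f} min")

for index, bus in enumerate(BUSES):
    print(f"{bus}: Ultimate G3 is {stick_ratio(index):.1f}x the G4")
```

These are large sequential writes only, so they indicate how long imaging a stick takes rather than how responsive the booted OS will feel.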

Update 08/04/2019: The VIA controller only lasted around 6 months before it started causing system instability (blue screens). Shortly there-after it died. The Renesas controller is still going strong!

 

Conclusion

So having spent £42 on the experiment, what conclusions can be drawn?

Many of you have probably been shouting the obvious here: that the best way to reduce costs would be to obtain more efficient servers and consolidate the old ones into fewer appliances. This is true because newer servers:

  • Have more efficient, less power hungry, higher capacity components
  • Emit less heat
  • Have more efficient power supplies
  • Have newer, better fans
  • Can consolidate more virtual servers

In the real world most of us don’t work for Google or Microsoft and we cannot get management to agree to write blank cheques. Neither can most start-ups, home lab builders, ‘hand-me-down’ dev-test environments or backup environments. The short of it is that if you want to save some money, reduce heat and in turn reduce noise (always useful in a home environment), a £40 – £50 saving a year can go a long way. So spending £42 wasn’t unreasonable.

USB 2.0 is ‘good enough’, especially for testing environments. There are clear performance advantages with USB 3.0, however you are going to need USB 3.0 enabled boot support to make practical use of this technology. Even if you have that, you should consider other solutions such as a small SSD or SATA DOM before considering USB 3.0. If you are in a position to add bootable USB 3.0 to your system, however, it is a very viable option.

The biggest headline from this process has been that not all UFD’s are created equally. The wide and varied margin between different models from the same company was surprising – especially with both devices claiming USB 3.0 feature sets. The benchmark statistics are so stark as to prove that there is virtually no point in having USB 3.0 if you are going to use a low-end UFD.

For Hyper-V Server, with the correct investment in your UFD, you can make USB 2.0 suffice for your needs as long as you realise that it will not be as fast as a rotational drive. Despite this, if you do not reboot your environment very often, it might just be good enough for your requirements.

For me personally, I will be getting the testing cluster migrated over to VHDX/UFD booting hypervisors. There is a cost saving rationale that helps me to keep the testing devices running. On a more personal level, for home, I have created UFD devices for a couple of desktop machines that are in my lab and these have been set up as off-line nodes in my cluster. The value here is that they can become hypervisors for a short time without interfering with the OS or drives. Even more importantly, I do not have to worry about multi-booting. With these UFD’s I plan on simplifying the maintenance process of the main environment so that I no longer need to have down time on my setup.

So why would you want to consider creating a UFD boot setup for your hypervisors? There are some advantages, just as there are clearly some disadvantages.

Advantages

  • There is potentially a financial saving to be made as a result of power consumption reduction. This is especially true for large clusters and whole racks of servers using shared storage
  • It is a very easy way to make a low-cost, reportable environmental sustainability push. This is particularly true if you are not yet able to dispose of your legacy hardware
  • It works well with Microsoft’s push towards the use of SMB 3.0 for low-cost Hyper-V shared storage setups for SMB’s
  • If you accept RAID as being unnecessary in a clustered environment, then in the event of a UFD failure you can easily keep a box of pre-configured UFD’s in a drawer, allowing you to get the Hypervisor up and running again and back into the cluster very quickly. Offline servicing can also be used to very easily keep the off-line UFD devices patched
  • Heat reduction was my main driver. By removing the hot RAID/SAS JBOD controllers there is a thermal saving. There is also potentially an area of additional cost saving in environmental cooling
  • It is extremely cheap to implement. Not specifying your new Hyper-V Server purchase with hard drives will more than pay for the cost and time of setting up the environment. Most new servers will have an internal USB port within the chassis and you can use this to your advantage for security. The UFD approach is cheaper than similar SSD/mSATA alternatives
  • Removing hard drives cuts down on power use, heat and noise. This is less important for Enterprise but for a small business or an average home/home lab user this might be a very important driver
  • The convenience of a UFD makes this a very good option to keep in mind for emergency planning/disaster recovery. You can throw a pre-configured UFD into any server or even desktop and have it running a serviceable hypervisor in minutes, all without impacting the original server’s drives. Simply remove the UFD and reboot and it goes back to doing whatever it was doing previously. This is potentially very useful for an SMB with limited resources who needs to service a running Hypervisor without downtime. If you can temporarily promote a separate machine to be a Hypervisor by plugging in a UFD and rebooting, you can creatively increase your organisational uptime
  • It is becoming difficult to purchase small (32/64GB) SSD drives while it remains easy to obtain smaller UFD’s. This saves money as you will not need to buy a 128GB SSD to support a 20GB requirement
  • You can use either the VHD or the newer VHDX format. VHDX offers better failure safeguards, 4k sector support and is the only real choice for UEFI setups

Disadvantages

  • There is no support for disk redundancy from the setup described in this article. If you require OS’s underpinned by a mirror, then this is not something to consider and you should look at SSD’s
  • Most Enterprise scale deployments will make use of scripted, rapidly provisioned PXE deployments of Hyper-V Server. The use of VHDX means that you will be unable to use these technologies
  • The UFD to VHD(x) abstraction process introduced by disk virtualisation adds a performance penalty
  • As has been demonstrated, UFD’s are slower than rotational drives and are considerably slower than SSD’s
  • The longevity of the UFD being used for this purpose is unknown. In the absence of reliable MTBF figures, most Enterprise users probably wouldn’t (and shouldn’t) consider it
  • Integration with server management tools such as OpenManage may be a problem for your OEM. This in turn may have an impact on support and warranty options.

 

In summary: For the average Enterprise user on primary production kit it may not be something that you want to consider. In some use cases, such as for backup, testing or disaster recovery environments there are clear advantages. Especially if you are prepared to be creative!