Error 1635 from Microsoft Office Installer when installing from a SMB File Source

System Requirements:

  • Office 2007
  • Office 2010
  • Office 2013 (Enterprise)
  • Office 2016 (Enterprise)

The Problem:

You’re one of the lucky people who has access to a version of Office that doesn’t need to use the awful “Click to Run” to perform the install. You are trying to use SCCM, Group Policy, a BAT file or some other remote invocation method to deploy Office from a source repository on a SMB file share.

Office starts to install, the installation looks like it is working and even one or two app’s appear on the start menu… before it unexplainably performs a roll back and uninstalls itself leaving your machine with no Office installation.

If you check the log files (in %temp% or c:\users\<user>\AppData\Local\Temp by default) you will find a Windows Installer Error 1635 entry listed before it commenced the rollback.

More Info

Windows Installer Error 1635 is “This patch package could not be opened. Verify that the patch package exists and is accessible, or contact the application vendor to verify that this is a valid Windows Installer patch package”.

If you are using SCCM, then you are probably attempting to execute this through the NT System account or via local admin.

Obviously you’ve checked the share permissions, perhaps even enabled a Null Session Share to allow anonymous access to the share… and it still doesn’t work. The client script is obviously executing successfully because the setup programme loads and runs for 10 minutes or more quite happily before erroring out.

The Fix

Quite simply, save yourself a headache and script the process to call the Office setup.exe from the local machine rather than directly against the SMB Share. This will solve the problem. Something in line with the following will download the installer and files from the repository to the OS Temp directory, execute the installer process and then clean-up

mkdir C:\Windows\Temp\OfficeSetup

xcopy "\\server\share" "C:\Windows\Temp\OfficeSetup" /Y /E /V /C /Q /H /R

:: Pre-Install actions here

call "C:\Windows\Temp\OfficeSetup\setup.exe" <switches go here>

:: Post-Install actions here

rmdir "C:\Windows\Temp\OfficeSetup" /S /Q

Error 0x80070002 when attempting to backup a Hyper-V Virtual Machine using Windows Server Backup

System Requirements:

  • Windows Server 2008 R2
  • Windows Server 2012
  • Windows Server 2012 R2

The Problem:

You backup, right? Of course you do! Only the cool people backup – and you are one of the cool people aren’t you?

…If only life was that simple.

So imagine for a moment that you are attempting to use VVS and WIndows Server Backup to backup a server. In particular a fully loaded Hypervisor running Windows Server 2012 R2 Datacentre in this case.

The backup process goes OK for the most part, but fails to complete on a number (but by no means all) VMs. The process fails with the following errors on Windows Server 2008 VMs, but not necessarily newer ones.

From Windows Server Backup:

Windows Server Backup “Failed” -or- “Completed with warnings” -or-“Backup failed to complete”

The component <VM Name>(Online) was skipped during the snapshot and will not be available for recovery. Error: The writer experienced a non-transient error. If the backup process is retried, the error is likely to reoccur

In the Hyper-V-VMMS\Admin log in Event Log:

‘<VM Name>’ cannot create the storage required for the checkpoint using disk E:\Virtual Machines\<VM Path>\Virtual Hard Disks\<VHD Filename>.vhdx: The system cannot find the file specified. (0x80070002). (Virtual machine ID <VM GUID>)

and…

Checkpoint operation for ‘<VM Name>’ failed. (Virtual machine ID <VM GUID>)

and…

Could not create backup checkpoint for virtual machine ‘<VM Name>’: The system cannot find the file specified. (0x80070002). (Virtual machine ID <VM GUID>)

and of course most helpfully…

The operation failed.

More Info

If you actually look at the backup file, you will see what looks to be a complete file set for the backup, however given that this error represents an error in VSS, you would not be advised to trust it.

As usual with Hyper-V error logs, the error message have little if anything to do with the actual issue and someone in the Microsoft Development team just needs to be shot for it… but I digress.

The odd think was that the issue was occurring on all of the Windows Server 2008 (R1) VMs, the Windows Server 2008 R2 and higher VMs were backing up correctly.

The Fix

So before I get into the issue I encountered, lets run past the generic fixes

  1. Check that you have enough disk space on the volume to perform the VSS. If your volume is sub-15%, try using the Hyper-V Manager to change the snapshot directory to another volume – plug in an external NTFS formatted hard drive if you have to.
  2. Check the permissions of the VHD stated in the error.
    icacls “C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks\<VHD File>” /grant “NT VIRTUAL MACHINE\Virtual Machines”:F /TSource: Technet
    Source: System Center Central
  3. Ensure that Hyper-V is patched fully.
    Windows Server 20102 R2 users see: https://support.microsoft.com/en-us/kb/2920151
  4. Run chkdsk on the physcial volume on the Hypervisor and on the virtual volume in the VM
  5. Ensure that the Integration Service Components are at the latest version and that they VSS Writer module for it is enabled in the VM properties in Hyper-V Manager

Now the less well documented approaches

  1. Check that you can manually checkpoint/snapshot the VM while it is running.
    In Hyper-V Manager or in PowerShell, force a checkpoint on the VM and then delete it and wait for it to merge back. If this works, you are not having a physical VSS issue. If it fails, you need to troubleshoot this and not the WSB error.
  2. Live Migrate the VM off of the current server and onto a different Hypervisor, attempt the backup here, then bring it back to the original server and try again. This process will reset the permissions on the VM file set. If you cannot live or offline migrate the VM, the you need to troubleshoot this and not the WSB error.

My fix

In my case, the issue was to do with having the VM VHDX files split across a couple of different storage LUN/volumes. I usually move VM page files onto a dedicated partition on a dedicated spindle (usually an SSD) and leave OS and data volumes on larger arrays. This helps to keep the VMs running smoothly and keeps unnecessary paging operations off of parity checked storage volumes.

So imagine that the VM has the following file storage structure

Physical Hypervisor SSD (this is where the Page File’s live)
– D:\my-virtual-server-d-drive.vhdx

Physical Hypervisor Storage Array
– E:\Virtual Machines\<VM Name>\Planned Virtual Machines
– E:\Virtual Machines\<VM Name>\Snapshots
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-c-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-e-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Machines

It is actually this structure which breaks the WSB backup job. Contrary to the VSS event log error, the problem drive is NOT my-virtual-server-c-drive.vhdx it is actually my-virtual-server-d-drive.vhdx. The event log will actually log that the error was caused on the first drive attached to the system bus (I think).

If you weren’t too clever when you followed my advice above and live migrated all of the storage to the same location on a different Hypervisor, you probably found this out for yourself – the backup should have worked.

When you split the job back into separate LUNs, it fails again. The fix is oddly simple and continues to allow you to have split LUN storage if you wish. Change the file system structure to:

Physical Hypervisor SSD (this is where the Page File’s live)
– D:\<VM Name>\my-virtual-server-d-drive.vhdx

Physical Hypervisor Storage Array
– E:\Virtual Machines\<VM Name>\Planned Virtual Machines
– E:\Virtual Machines\<VM Name>\Snapshots
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-c-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Hard Drives\my-virtual-server-e-drive.vhdx
– E:\Virtual Machines\<VM Name>\Virtual Machines

Note the introduction of a folder on the D drive with the same name as the <VM Name> folder on the E drive. Do NOT shutdown the VM and move the storage there yourself, use the Hyper-V Manager or PowerShell processes to perform a “move” on the ‘storage only’ and just move the one drive. This will ensure that permissions are correct.

The next time that you run the backup, it will VSS correctly.

As for why it can do this on its own with Windows Server 2008 R2 or higher VMs, but not Windows Server 2008 or lower VMs… I have no idea although I suspect it to have something to do with the capabilities of the integration services components.

Edit: A post publish search on the issue reveals that I’m not alone in working this out
View: Technet

Booting Windows Hyper-V Server from USB: Lessons From Practice

System Requirements

  • Windows Hyper-V Server 2008 R2
  • Windows Hyper-V Server 2012
  • Windows Hyper-V Server 2012 R2

The Problem

I recently wanted to explore the viability of pulling a Server RAID controller into a workstation. A few choice pieces of electrical tape to cover PCIe pins later and the card worked as intended… until it melted down a few minutes later.

The inevitable failure got me thinking. Like most enterprise hardware, the PERC 5/i and 6/i do not support processor power management. The onboard processor runs at 100% speed, 100% of the time. As a result the heat that it generated easily overwhelmed the modest airflow of a desktop. The thermals went well past 80 degrees C before it tripped out.

Most of the servers that we are running in one particular production stack were using the same controllers. Despite this, none of them were actually being used as RAID controllers. They were set as HBA/JBOD devices with a single drive attached – i.e. no disk redundancy. The reason why we have a production setup with such a bad design? These servers are clustered hypervisors. It doesn’t much matter if they burn out. There are 20 more to take their place and all actual client data is held within a fully redundant, complex storage network. An admin simply needs to replace the broken part, rebuild the OS and throw it back into the pool.

Cost Rationalisation

Was changing the design of these servers feasible? Each 10,000 RPM 70GB hard drive was at best using 20GB of data – and less than 15GB in most cases. Each of those drives is consuming 15-25w of power, making noise and never sleeping. At the same time each controller is consuming 6-18w of power and again, never sleeping. Both are adding to the heat being thrown down through the backplane and out into the hot isle. All for pretty much needlessly.

Based upon my domestic energy tariff, the potential per-server electricity cost saving stands to be between £3.29 and £4.38 per month. £39.48 and £52.56 per year. This does not include any residual savings in air conditioning costs. While it doesn’t seem a lot. On a cluster of 20 servers that’s between £789.60 and £1051.20 per-year. At that level the potential savings to start to add up.

As an IT designer, it also gives me a budgetary value that I can rationalise any savings against. If we split the difference over 12 months between the upper and lower estimate we get a £46.02 average. If it costs more than that – particularly for old server hardware – it isn’t worth doing: so £46.02 became the ‘per-machine budget’ for my experiment.

Options to Consider

With that said and on the understanding that there is no RAID redundancy involved in the setup I am (re)designing. There were three options to explore:

  1. Pull the RAID controller and attempt to utilise the DVD drive SATA connector with an SSD. This would solve the heat issue, solve the noise issue and reduce power consumption (to ~4w). It will also be faster than the 10,000 RPM rotational drive. The down side is that getting hold of affordable SSD’s (as of writing) isn’t yet an option. Not to mention that various adapters and extra cabling would be required to get the SSD mounted properly (at extra cost). Modifying new cable runs into 1u servers can often be a challenge (it’s bad enough in 3u). The Server BMC also complicates matters as under Dell, OpenManage will notice that you aren’t using a Dell approved drive and this will quickly hit your environmental reporting data. Approximate cost ~£70+ per server. Well over budget.
  2. Pull the RAID controller and mount a SSD/mSATA/m.2 into a PCI-e slot (even potentially the RAID controllers slot) on a PCIe adapter. This solves the cabling problem and has the added advantage of clearing both drive slots. It also means that I can control the bus specification, potentially getting a boost from a SATA III or NVMe controller. Of course this is more expensive although it is easier to get hold of smaller mSATA SSD’s than it is 2.5″ ones. Cost per-server ~£125+. Again, over budget.
  3. Look at SATA DOM or booting from Compact Flash/SD Card. SATA DOM isn’t an option for the PowerEdge 1950 and a NAND flash solution would require modification of the chassis. The headache of managing boot support would also be an issue. Rending this unreaslistic.
  4. Pull the RAID controller, disk and boot the entire enclosure from USB. This solves pretty much all problems but does add one in that these servers do not have an internal USB port. The active OS drive would therefore need to be insecurely exposed and accessible within the rack. Think malicious intent through to “I need a memory stick… ah, no one will notice if I use that one”. The cost of an average consumer USB 3.0/ 16GB USB Flash Drive (UFD) is about £7 – and it just so happened that operations have boxes of new ones lying around for the pilfering fully authorised, fully funded project.

I decided to experiement with option 4 and started to investigate how to boot Hyper-V from USB.

How to

Running Hyper-V Server from a UFD is a supported mechanism (as long as you use supported hardware types and not a consumer off the shelf UFD like I am).

The main Microsoft article on this topic was written for Hyper-V Server 2008 R2, however a set of liner notes with hardware recommendations are also available for 2012/R2.

View: Run Hyper-V Server from a USB Flash Drive

View: Deploying Microsoft Hyper-V Server 2008 R2 on USB Flash Drive

 

So far, so good. The basic premise is that you use disk virtualisation and the Windows 7/8 boot loader to boot strap the operating system. The Hyper-V Server is installed into a VHD and once the boot loader mounts the VHD and loads Windows as if it were any other Virtual Machine. The performance will suffer, but for Windows Server Core, this really doesn’t matter.

Microsoft states that USB 2.0 or higher must be used and that (for OEM redistributors) the UFD must not report itself as being ejectable.

Microsoft recommends the following drives

  • Kingston DataTraveler Ultimate
  • uper Talent Express RC8
  • Western Digital My Passport Enterprise

The closest that I could find were 16GB Kingston DataTraveler G4’s. Based upon UserBenchmark data offer 45% lower performance vs. the DataTraveler Ultimate G3 and 131% lower write speeds. Similarly, USB3Speed reports that the G4 read/write is 102.86/31.48 MB/second on a USB 3.0 bus vs. 174.76/38.46 for the Ultimate G3. So there is a decisive bottle neck being introduced as a result of using a cheaper UFD model.

The Microsoft article recommends the use of 16GB UFD’s rather than 8GB ones to allow for the installation of future updates I grabbed 4x 16GB DataTraveler G4 sticks and proceeded to prepare them to support the boot process.

View: UserBenchmark: DataTraveler Ultimate G3 vs. DataTraveler G4

View: USB3Speed

Check your Server for Suitability

The Microsoft article states that USB 2.0 is supported for Hyper-V Server USB Booting. I confirmed through empirical experimentation that the PowerEdge 1950 does support USB 2.0 and that its firmware supported booting from USB in a reliable, consistent way.

What I mean here is that you don’t want to have to go into a F12 boot menu every time you restart the server because the BIOS/UEFI will not automatically attempt to boot from the USB port. You should – as a matter of course – update your server firmware, including (but not limited to) the BIOS/UEFI as a way to mitigate against any potentially solvable issues in this regard. Do remember however that in a clustered environment, you shold normalise your hardware and firmware setup on all participating nodes before you set out to create the cluster.

In testing, the PowerEdge 1950 demonstrated that it could boot properly from the UFD without intervention. Thus with another tick in the box, the idea was looking increasingly more viable.

The USB Stick & VHD(X) Creation Process

I am not going to repeat the instructions for creating the bootable USB stick, they are clear enough on the Microsoft website. It is a shame that Microsoft closed the Technet Code Library meaning that you can no longer get access to the automated tool.

What I will add is that as I was installing Windows Hyper-V Server 2012 R2, I decided to attempt to use convert VHD to the newer VHDX format. The advantages here are nominal; better crash recovery and support for 4K drives are the main headlines. Regardless I wanted to start with the latest rather than using the VHD as prescribed in the 2008 R2 creation guide.

It didn’t work. The boot loader seemed unable to read the VHDX file. Running the VHDX back through the Hyper-V’s disk editor and into a VHD did however work. After some testing, I discovered that the issue was in the issue was the VHD migration process. To use VHDX you must update the BCD using the Windows 8.1 version of BCD edit and start with the Windows 8.1 boot loader. Repeating the process from scratch in a native VHDX did however result in a bootable OS.

I had initially started testing Hyper-V Server on an 8GB UFD. During the process and having obtained a 16GB drive, I decided to expand the size of the VHD from 7 to 14GB. This was a mistake. The VHD will expands fine, however Windows will not allow you to resize the VHD’s primary partition to fill the newly available space via the GUI or DiskPart. So unless you have access to partition management tools that can work with a mounted VHD(x), you will need to ensure that the size of the VHD is correct when you create it.

The file copy from the management computer onto the UFD of the 14GB VHD file (with write cache enabled) was excruciating. Making around 12.8MB/s from a USB 2.0 port is was getting far less than the benchmarked speed of 31MB/s.

Windows file copy showing 12.8MB/s

Finally, I created a Tools folder on the root of each UFD and copied the Windows 8.1 x64 versions of:

  • ImageX.exe
  • bcdedit.exe
  • bcdboot.exe
  • bootsect.exe

I also copued the x86 version of a Microsoft utility called dskcache.exe into here. dskcache can be used to enable/disable write caching and buffer flusging on connected hard drives. You could directly inject these into the VHD if you wanted to, however if left on the UFD, they are servicable.

Also note that this is your best opportunity to inject drivers into the VHD should you have any special hardware requirements.

The Results

USB 2.0

Despite the Microsoft article stating that USB 2.0 is supported, it became obvious within about 20 seconds of the boot process that something was not right. The time that it took to boot was agonising. Given the poor sustain file write speed shown above, this shouldn’t be overly surprising.

It took well over 60 second for the boot loader itself to start booting, let alone bootstrap the VHD load rest of the operating system. The initial boot time was about 25 minutes – although does have to go through OOBE and perform the driver and HAL customisation processes during the initial boot, so it isn’t very fair to be overly critical at this stage.

The next point of suffering was encountered at the lock screen. On pressing Ctrl + Alt + Del, a 15 second delay elapsed before the screen refreshed and offers the log-in text fields. After resetting the password and logging on, the blue Hyper-V Server configuration sconfig script took around 90 seconds to load. In short, the system was painfully unresponsive.

I had expected it to be sluggish – but I was not expecting it to be quite this bad.

Windows had loaded the UFD’s VHD file with the write cache enabled but buffer flusging (‘advanced features’) disabled. I thus used dskcache.exe to enable both settings.

dskcache +p +w

… and rebooted.

The boot time was around 4 minutes, the Ctrl + Alt + Del screen was still sluggish as was the login process – but it was certainly faster. Having completed the first Windows Update run, boot times to a password entry screen had reduced to a far more respectable 1 minute and 17 seconds. The sluggishness (while still there) had again reduced to 10-15 seconds from log-in to sconfig.

So what is the problem? There is certainly a lot of variables here:

  • The Datatraveler G4 does not offer the performance it is supposed to
  • The bus is USB 2.0
  • There is an artificial abstraction layer being imposed by the disk virtualisation process in and out of the VHD
  • Behind the scenes, Windows still likely thinks that this device is removable and is reacting accordingly
  • While the VHD upload process was a linear one that consisted of a single large file, the operating system will be making thousands of random seeks and random small writes. Random I/O and Linear I/O always offer different statistics – the latter being more synthetic than real world usage will otherwise offer.

8GB

As I mentioned previously, my original test with Hyper-V Server 2012 was on an 8GB UFD with a 7GB primary partition. After install, Hyper-V Server 2012 R2 consumes 2.98GB with no Page File. By the time Windows Update had scanned, downloaded attempted to install updates – including the 870MB Windows Server 2012 R2 Update 1 (KB2919255) – there was only 154MB of free disk space available. It was unable to complete the installation as a result.

Having ascertained that I could not resize the partition post-creation I recreated the VHDX once again, from scratch onto one of the 16GB.

16GB

Installing the Hyper-V Server 2012 R2 into a 14GB VHD on a 16GB stick at left plenty of available disk space. By the time that Windows Update had got around to having downloaded and subsequently attempted to install all available Windows Server 2012 R2 updates, there was 4.52 GB free.

At this point the Hypervisor itself still has not been configured and required support tools such as security software, Dell OpenManage or Dell EqualLogic Host Integration Tools.

Therefore, as with the advice offered on the Microsoft article, do not attempt to run Windows Hyper-V Server 2012 R2 from anything smaller than a 16GB memory stick. If you do, you are going to encounter longevity and maintenance problems with your deployments. In practice you should not consider using anything smaller than 32GB. I can see a time within the next couple of years when the 16GB installation will (as with the 8GB installation) be too large to continue to self-update.

This is significant and should be something that you factor during design as if Fail over Cluster Manager spots a mismatched DSM driver version (i.e. out of sync Windows Update state between cluster nodes in the case of the Microsoft driver), the validation will fail and Microsoft will not offer support for your setup. Therefore being unable to install updates is not an a situation that you want for your clustered Hyper-V Server environments.

Windows Update

As a side note, it is worth pointing out that I ran Windows Update on the live UFD in the server while it was booted. One of the advantages of the UFD approach is that it is easy to keep a box of pre-configured UFD’s in a draw that can be grabbed as a fast way to stand-up a new server or recover a server when its existing UFD has failed. Windows Update maintenance of these UFD’s is made far easier if you use DISM to off-line service the VHD’s and apply Windows Updates to the image before you even start to use the memory stick.

You can periodically update the box of UFD’s to the latest patch revision meaning that should you ever need to use one and you will have a far more up to date fresh install of the OS to hand. Something that is significantly faster than performing on-line servicing.

Improving Performance

There were three choices at this point. Abandon the project, focus on the UFD and buy the higher spec drive (£30 vs £7) or focus on the controller. Looking at prices, the controller was the cheaper option to explore.

USB 2.0 is an old technology. Its maximum theoretical bit rate is 480Mbps (that’s Megabits per second, not MegaBytes). This equates to 60MB/s (MegaBytes second). If we compare this with USB 3.0 whose maximum theoretical bit rate is 5Gbps (Gigabits) or 640MB/s we can see a very clear route to better performance. In practice, USB 3.0 isn’t going to get anywhere near 640MB/s, however a quick trip to eBay revealed controller pricing of between £6 and £35 making it something that was easier to swallow inside my £46.02 budget.

After researching chipset options, I narrowed it down to there being three chip options. The Etrom EJ198, which seems to have the fastest benchmark figures. The Renesas (formerly NEC) D720202 which is the new version of the D720201 which (coming in a close second) and finally the cheap and cheerful VIA Labs (VLI) 805-06 1501.

After further research, I found a lot of reports of compatibility issues with the Etrom which, coupled with its higher price, meant I abandoned it. So I picked up a £6.45 VLI dual port card and a dual port Renesas card for £12.66 simply as a means to have two different chips to test with.

Total spend on project: £25.91. Still well within budget.

Before starting I had a working theory that introducing the USB PCIe controller was going to break the BIOS’s ability to boot from the USB port. Despite extensive research, I was unable to find any controller cards online that stated the presence of an Option ROM to explicitly offer boot support. So ultimately I may have spent £25.91 for nothing particularly as USB 3.0 may not be able to add anything to the already I/O constrained cheaper 16GB UFD; but at this point there was still £20.11 left in the budget witch was available to use if chasing USB 3.0 was a red-herring. Consequently I was able to pickup a Kingston DataTraveler Ultimate G3 for £16.99 from eBay to allow for a thorough exploration of both avenues.

Total spend on project: £42.90. Still £3.20 left in the budget for a cup of tea!

Kingston DataTraveler Ultimate G3

The first thing to note was that it is a far larger memory stick and as such is a lot more obvious and significantly more intrusive sitting on the rear I/O plane of the server. You would definitely want to internally mount this larger UFD simply to protect it from damage caused during routine maintenance and cable management activities.

I elected not to re-create the experiment from scratch with a full 30/31GB VHDX, so instead I copied the existing 14GB VHDX from the existing UFD. Over a USB 2.0 bus a UFD to UFD copy resulted in a 18.1MB/s transfer speed – an immediate 5.3MB/s improvement. Repeating the file transfer from the hard drive onto the UFD increased this further to 24.6MB/s – an improvement. of 11.8MB/s and nearly a doubling of the write speed onto the UFD.

Windows file copy showing 24.6MB/s

Testing the connection on the Servers USB 2.0 bus, the performance difference was immediate. While still occasionally lagging and significantly slower than compared to even a 7k rotational hard drive. Its responsiveness was now at a point where I concluded that performance was acceptable – even for the USB 2.0 bus.

Rolling the VHDX back to an older, un-patched version of the image and having the server self-update was a better experience; with the update process lasting for a period of a few hours rather than all day.

I did however start to experience some operational problems with the higher specification drive. For example, while I had no problems with the cheaper drive (eventually) completing tasks, the DataTraveler Ultimate G3 could not complete some DISM servicing activities, citing “Error: 1726 The remote procedure call failed”. This could be illustrative of the start of a drive failure or some form of corruption in the VHD.

USB 3.0

The cheaper £6.45 USB 3.0 controller arrived first and I threw it into a PCIe 1x slot on the test system. I then retested the file copy on both the Ultimate G3 and the DataTraveler G4 to see if there was any improvement in performance.

Windows file copy showing 16.3MB/s

The DataTraveler G4 copied up at around the 16.3 MB/s mark, this is a 3.5 MB/s improvement over the 12.8 MB/s off of the USB 2.0 controller but nothing compared to the 24.6 MB/s of the DataTraveler Ultimate G3 on the USB 2.0 controller.

So what about the performance of the DataTraveler Ultimate G3 on the USB 3.0 bus? The result was quite phenomenal in comparison

Windows file copy showing 88.2MB/s

88.2 MB/s, some 63.6 MB/s faster than the same drive on the USB 2.0 bus – some 705.6 Megabits per-second. Not bad for a £6.45 VIA Labs chip from eBay!

As anticipated however, the lack of an OptionROM was the downfall to the experiment. The BIOS was unable to ‘see’ the USB 3.0 controller as an add-in device during POST and thus was unable to boot from it.

I attempted to create a dual USB boot solution where the VHDX file lived on a memory stick attached to the USB 3.0 bus. A second memory stick containing the boot loader existed on the motherboards USB 2.0 port. Sadly however no amount of tinkering could get the system to link one to the other. 'No bootable device -- insert boot disk and press any key'.

The second, more expensive Renesas USB 3.0 controller arrived around a week later. Just as with the cheaper VIA Labs controller, there was no possibility of getting it to boot directly either.

Writing onto the cheaper DataTraveler Ultimate G4 using a Renesas driver actually managed a throughput of 19.3 MB/s. Repeating the test with the DataTraveler Ultra G3 yielded a write speed of 93.0 MB/s, again showing an improvement over the VIA. Be it a not particularly significant one given that it was double the price.

In summary the write speeds for performing the large file transfer of the VHD onto the memory stick are shown below.

Write Speed: MegaBytes second (MB/s) – Higher is better
Memory Stick USB 2.0 USB 3.0 (VIA) USB 3.0 (Renesas)
DataTraveler G4
12.8
16.3
19.3
DataTraveler Ultimate G3
24.6
88.2
93.0

From a subjective point of view, the use of the DataTraveler Ultimate G3 on the USB 2.0 bus was “acceptable”. Acceptable given what the system needed to do. Thus the randiom read/write bottleneck can be conculded as being in the memory stick and not the controller itself.

Update 08/04/2019: The VIA controller only lasted around 6 months before it started causing system instability (blue screens). Shortly there-after it died. The Renesas controller is still going strong!

 

Conclusion

So having spend £42 on the experiment, what can conclusions can be drawn.

Many of you have probably been shouting the obvious here. That the best way to reduce costs would be to obtain more efficient servers and consolidate the old ones into fewer appliances. This is true beceause newer servers:

  • Have more efficient, less power hungry, higher capacity components
  • Emit less heat
  • Have more efficient power supplies
  • Have newer, better fans
  • Can consolidate more virtual servers

In the real world most of us don’t work for Google or Microsoft and we cannot get management to agree to write blank cheques. Neither can most start-ups, home lab builders, ‘hand-me-down’ dev-test environments or backup environments. The short of it is if you want to save some money, reduce heat and in turn reduce noise (always useful in a home environment). A £40 – £50 saving a year can go a long way. So spending £42 wasn’t unreasonable.

USB 2.0 is ‘good enough’, especially for testing environments. There are clear performance advantages with USB 3.0, however you are going to need USB 3.0 enabled boot support to make practical use of this technology. Even if you have that, you should consider other solutions such as a small SSD or SATA DOM before considering USB 3.0. If you are in a position to add bootable USB 3.0 to your system. It is however a very viable option.

The biggest headline from his process has been that not all UFD’s are created equally. The wide and varied margin between different models from the same company was surprising – espeically with both devices claiming USB 3.0 featuresets. The benchmark statistics are so stark as to prove that there is virtually no point in having USB 3.0 if you are going to use a low-end UFD.

For Hyper-V Server, with the correct investment in your UFD, you can make USB 2.0 suffice for your needs and as long as you realise that it will not be as fast as a rotational drive. Despite this, if you do not reboot your envrionment very often, it might just be good enough for your requirements.

For me personally, I will be getting the testing cluster migrated over to VHDX/UFD booting hypervisors. There is a cost saving rational that helps me to keep the testing devices running. On a more personal level, for home, I have created UFD devices for a couple of desktop machines that are in my lab and these have been setup as off-line nodes in my cluster. The value here is that they can become hypervisors for a short time without interfering with the OS or drives. Even more importantly I do not have to worry about multi-booting. With these UFD’s I plan on simplifying the maintenance process of the main environment so that I no longer need to have down time on my setup.

So why would you want to consider creating a UFD boot setup for your hypervisors? There are some advantages just as there are clearly some disadvantages

Advantages

  • There is potentially a financial saving to be made as a result of power consumption reduction. This is especially true for large clusters and whole racks of servers using  shared storage
  • It is a very easy way to make a low-cost, reportable environmental sustainability push. This is particularly true if you are not yet able to dispose of your legacy hardware
  • It works well with Microsoft’s push towards the use of SMB 3.0 for low-cost Hyper-V shared storage setups for SMB’s
  • If you accept RAID as being unnecessary in a clustered environment. In the event of a UFD failure you can easily keep a box of pre-configured UFD’s in a draw. Allowing you to get the Hypervisor up and running again and back into the cluster very quickly. Offline servicing can also be used to very easily keep the off-line UFD devices patched
  • Heat reduction was my main driver. By removing the hot RAID/SAS JBOD controllers there is a thermalsaving. There is also potentially an area of additional cost saving in environmental cooling
  • It is extremely cheap to implement. Not specifying your new Hyper-V Server purchase with hard drives will more than pay for the cost and time of setting up the environment. Most new servers will have an internal USB port within the chassis and you can use this to your advantage for security. The UFD approach is cheaper than similar SSD/mSATA alternatives
  • Removing hard drives cuts down on power, use, heat and noise. This is less important for Enterprise but for a small business or an average home/home lab user this might be a very important driver
  • The convenience of a UFD makes this a very good option to keep in mind for emergency planning/disaster recovery. You can throw a pre-configured UFD into any server or even desktop and have it running a serviceable hypervisor minutes. All without impacting the original server’s drives. Simply remove the UFD and reboot and it goes back to doing whatever it was doing previously. This is potentially very useful for a SMB with limited resources who need to service a running Hypervisor without downtime. If you can temporarily promote a seperate machine to be a Hypervisor by plugging in a UFD and rebooting. You can creatively increase your organisational uptime
  • It is becoming difficult to purchase small (32/64GB) SSD drives while it remains easy to obtain smaller UFD’s. This saves money as you will not need to buy a 128GB SSD to support a 20GB requirement
  • You can use both VHD or the newer VHDX formats. VHDX offers better failure safeguards, 4k sector support and is the only real choice for UEFI setups

Disadvantages

  • There is no support for disk redundancy from the setup described in this article. If you require OS’s underpinned by a mirror. Then this is not something to consider and you should look at SSD’s
  • Most Enterprise scale deployments will make use of scripted, rapidly provisioned PXE deployments of Hyper-V Server. The use of VHDX means that you will be unable to use these technologies
  • The UFD to VHD(x) abstraction process introduced by disk virtualisation adds a performance penalty
  • As has been demonstrated, UFD’s are slower than rotational drives and are considerably slower than SSD’s
  • The longevity of the UFD being used for this purpose is unknown. In the absence of reliable MTBF figures, most Enterprise users probably wouldn’t (and shouldn’t) consider it
  • Integration with server management tools such as OpenManage may be a problem for your OEM. This in turn may have an impact on support and warranty options.

 

In summary: For the average Enterprise user on primary production kit it may not be something that you want to consider. In some use cases, such as for backup, testing or disaster recovery environments there are clear advantages. Especially if you are prepared to be creative!

IPMI: Lessons from Practice

System Requirements:

  • IPMI / BMC equipped hardware
  • Sideband or Dedicated IPMI interfaces

The Problem:

I have recently been saved (and hindered) by having to pull the less often used IPMI tool from the proverbial administrators utility belt. In doing so I found a couple of design issues that you might like to consider if you are starting out with this technology.

More Info

These are a couple of design gotcha’s to keep in mind.

IPMI Physical & Logical Network Design

If you want to implement IPMI what should and should you not be doing? In looking around and thinking about it, this seems to be a poorly documented topic so I wanted to write up some thoughts on the subject.

Physical: Sideband or Dedicated Port?

Sadly this is always a choice, as your hardware will likely be one or the other. Some Dell hardware is equipped with the option to use the dedicated port or a logical MAC addresses presented through one of the motherboard integrated NICs (the so called LOM interfaces or LAN on Motherboard interfaces which are sometimes referred to as Sideband IPMI interfaces).

With a sideband interface, the physical NIC(s) is allocated a second Ethernet MAC address in software through which the IPMI controller listens for traffic destined for the IPMI controller. By definition there has to be a penalty for doing this, even if it is extremely minor as the NIC must check which MAC address the inbound packet is destined for and redirect it accordingly. Most of the time traffic will be destined for the production network rather than the IPMI virtual interface and thus the processing overhead is being performed needlessly.

In contrast the use of the Sideband interface removes 1 cable from your rack and one used switch port or, to scale that up in the case of a 50u rack, 5 cables requiring 50 switch ports which just needed an extra 2+ switches to receive all of the extra ethernet cabling. It is for this reason that sideband ports are a viable option. Some IPMI implementation also allow you to change the physical LOM port in software while others are fixed.

Both therefore have their uses although in general terms I prefer the use of dedicated management channels as the means there is less to potentially go wrong with production systems or accidental cable pulls of the IPMI Sideband LOM port as you can colour code your cabling accordingly to represent the fact that “blue equals IPMI/ILO/iDRAC etc”.

You do not always have the luxury of choice however. Servers such as the Dell PowerEdge 1950 that do not have an iDRAC card have no dedicated port option and the software LOM is fixed on LOM Port #1. Consequently I posit the following advice:

  1. If you need to NIC Team (e.g. LCAP) across the Motherboard NIC + any other expansion NIC, do not include the IPMI sideband port’s VLAN in the team otherwise you may find that IPMI traffic never finds its way up the correct bit of copper!
  2. If you need to team the LOM (and only 2+ ports on the LOM with no expansion cards included) check to see if this is supported by the server vendor. If there is only a single LOM port without support for teaming on your hardware you will encounter the same problem as outlined above; IPMI commands may simply get lost as they never make it to the correct port *.
  3. If you are not going to be using NIC Teaming, set the IPMI sideband port to be the port that services your management network, not your iSCSI or production networks. This allows you to reconfigure the NIC without impacting production services, make the IPMI NIC a member of the management network directly, make it a member of an alias management network for IPMI traffic or assign it as part of a VLAN that splits IPMI and non-IPMI management traffic.
  4. Unless you have a rack/switch density issue or do not have the budget for dedicated switches to support IPMI I recommend using the dedicated port option on your hardware if available — lets be honest, a dedicated IPMI switch even in here in 2015 can safely be a 10/100 Fast Ethernet switch, you do not need 10GB hardware to issue the occasional 2 byte PSU reset command. This is especially true when you realise that Serial Over LAN (SOL) is only itself operating at 9600bps up to 115200bps to provide you with serial console access via the IPMI system.

* I believe that Dell 9th Generation + servers all support NIC Teaming on the LOM, thus 8th Generation and lower will have issues with NIC Teaming. The solution implemented by Dell to support this is that all LOM NICs listen for IPMI traffic while only the named IPMI port sends IPMI response and Event Trap data. Dell firmware refers to this as “Shared Mode” while “Failover Mode” simply adds the option to dynamically change the IPMI Tx channel to a different live port in the event that NIC 1 fails.

Source: Dell OpenManage Baseboard Management Controller Utilities User’s Guide

Logical: Isolation/No-Isolation

If you actually look into IPMI as a technology, it is fair I think to say that it is not the most secure, particularly if you don’t spend time setting up manual encryption keys on each of your servers. So what can/should you do with traffic?

Firstly it depends on how complicated you want it to be/need it to be and secondly it depends on whether you have managed switches or not. If you do NOT have managed switches, then the best that you can muster is one of the following three options:

  1. Assign address allocations to the IPMI interfaces inside your production LAN (Hint: Do Not Do This)
  2. If you are not willing to have dedicated switches or sacrifice a physical server NIC port as a dedicated IPMI port: Assign address allocations to the IPMI interfaces on the servers and access them as an overlay network using IP Alias Addresses (see below for more details)
  3. If you are willing to have dedicated switches or sacrifice a physical server NIC port as a dedicated IPMI port: Isolate your IPMI traffic in hardware

If you do have managed switches then in addition to the above (not recommended) three options you also have the following two options:

Add the IPMI interface to your existing single management network. You have one, right? You know, the network that you perform Windows Updates off of, perform administrative file transfers across, have hooked up to change, configuration, asset and remote management tools and through which you perform remote administration of your servers from the one and only client ethernet port that connects directly to your management workstation. Yep, that management network! The advantage of this is that you already have the VLAN, you likely already have DHCP services on it (it is your call on whether you want to hunt around in the DHCP allocation list to find the specific allocation when you need to rescue a server in an emergency) and it is already reasonably secure.

The disadvantages of using the existing VLAN are that you may want to use encryption on the management network and there may be packet sniffers involved as part of the day to day management of the network. IPMI is not going to fit in nicely with your enterprise policy and encryption software e.g. IPSec. The inherent insecurity of IPMI (particularly before version 2.0) means that exposing the ability to power off systems to anything sniffing the management network may not be ideal given the damage that could be caused. Thus, good design suggests that your IPMI, ILO and DRAC traffic should be further isolated into their own VLAN, separate from the main production network (that should go without saying) and separate again from the main management network.

Which way you choose to go down is a design time decision based upon the needs of your own environment. Personally if given a choice, I prefer to cable the management and IPMI/ILO/iDRAC networks using different colours on dedicated ports to mitigate against disconnection accidents and in turn isolate them into their own VLAN with a minimum number of intersection points – usually just a management VM or workstation.

The footprint for something to go wrong is significantly reduced under this model as is the need for you to meticulously configure security and keep firmware patched (it isn’t a reason not to do it mind!). As an abstraction you could also think of it in terms of a firmware management VLAN (IPMI) and a software management VLAN (Windows/Linux management tools) or a day to day management VLAN and your ‘get out of jail free’ VLAN.

Accessing the servers IPMI interface using an Alias Address

When you are starting out with IPMI, or for more simplistic configurations and are using a Sideband port (where the IPMI port is shared with an active, production network connection) you may want to connect to other IPMI devices directly from one of the servers that itself is an IPMI host. Alternatively you may have a single NIC management PC and you want to connect to the same isolated IPMI network without losing access to your main network.

You can avoid needing to sacrifice one of your network ports as a dedicated IPMI management port by making use of an Alias IP Address. The main advantage of this is that you do not have to make use of VLAN’s (which are not supported properly anyway under IPMI 1.5 or older) and thus do not require managed switch hardware.

An Alias address is simply the process of offering one or more additional IP addresses to an already addressed network adapter which do not necessarily have to be part of the same network subnet.

IP Address Alias Screenshot

As you can see in the above screenshot, the servicing 128 address is complimented with a 192 address. This means that the network card can actively participate in both networks although the latter is limited to its own broadcast domain as it doesn’t have a gateway configured.

You do not need to alias any IPMI adapter to have the server participate in being a servicing IPMI client, the firmware does this once configured and that configuration is completely transparent to the Operating System. However if you need the server itself (or a management PC) to be able to connect to the IPMI overlay network without resorting to additional NIC’s, an alias address is a good way to get started without resorting to VLAN’s

To create an Alias

In Unix like operating systems you would issue

ifconfig en1 inet 10.0.1.1/32 alias

In Windows you can either use the Advanced TCP/IP Settings screen shown above or you can use the following to achieve the same result

netsh interface ip add address “Local Area Connection” 10.0.1.1 255.255.255.255

Note: The alias address that you set on the adapter should not be the same as the one on the IPMI virtual firmware interface. It must be independent and unique on the IPMI subnet.

There is however a problem with this approach if you are attempting to access IPMI from an IPMI enable server; the server will not be able to see itself.

In testing ping, ipmitool and IPMI Viewer are unable to view the “localhost” servers interface. when using this type of configuration. This of course isn’t really too much of a problem because your management server is working as demonstrated by the fact that you are administering IPMI using it. However if you attempt to communicate with its own interface or keep a list of all hosts in IPMI Viewer, the host will simply appear as offline.

Using a Virtual Machine as an IMIP Management Console

The issue outlined above may seem trivial, however the reason that I have covered it is to record (highlight) the issue that I directly encountered with the seemingly sensible idea of having the management server as a virtual server hosted on a large node cluster. From a safety standpoint, having a multi-node cluster is a fairly safe place to keep your management tools while allowing easy sharing amongst admin team members without having to route vast amounts of VLAN’s into desktop machines (although I always keep one standby physical server bound to all management networks and excluded from public production and client network for such emergencies).

As an expansion on the issue outlined above there is an obvious pathway flaw in the logical network. If the physical server cannot access itself, then the hypervisor cannot access itself. Thus, when a management server node is running on the hypervisor stack, whichever node the management server is running on at the time will appear offline.

If the management server VM moves to a different hypervisor, then the offline hypervisor will change once ARP cache expiration time limitations have been taken into account — although the new host will appear offline immediately the now available one will not – in testing – appear until ARP has fully flushed.

Again, in a real world scenario this isn’t much of an issue: if you need to IPMI into a host, most of the time it is because that host is down. Thus, if the host is down then the Virtual Management server is not going to be running on it any more, it will have migrated to a new host and thus you will be able to see the downed server. Of course stranger things have happened!

The point of this article is to highlight what caused me a little head scratching and some incorrect troubleshooting where by I assumed that IPMI controller on a hypervisor had failed because it could not be raised from a management VM that (you guessed it) was running (it turned out) on the very same hypervisor. Once I had moved the VM and expired the ARP cache, the offline server changed to the new host node and the previously offline server popped back to life.

Notes on VLAN use

If you plan to use a VLAN to isolate the traffic onto your IPMI interface, obviously you need to ensure that your IPMI BMC is capable of filtering and in turn tagging VLAN traffic against its interface.

One thing to keep in mind is that Intel (particularly) have dropped a lot of driver update support for anything older than the I340 series with the release of Windows Server 2012 R2 (something that I learnt the hard way when I assumed that Windows Server 2012 ‘R1’ support would automatically equate to R2 support.

In practice this means that the Windows driver itself will not expose the Intel Device Manager extensions and thus you might find it difficult to tag the non-IMPI side of the shared interface onto the correct non-IPMI VLAN.

Remember that you will need to set the switch port to trunk mode and set the allowed VLAN ID’s for the port to both your production VLAN and your IPMI VLAN – or worse more if you are using the port as part of a virtualised converged fabric. Just remember that the ports that you will be able to send the IPMI VLAN down to are limited to one or a sub-set of all available NIC ports (depending upon the specification of your BMC).

Finally, you should be aware that NIC ‘link-up’ convergence can be noticeably slower when using VLAN trunk ports and tag isolation. This isn’t limited to just IPMI interfaces, but for the purposes of troubleshooting you should double the time that you allow to check that the port is in fact not working. For example, if your convergence time on a non-VLAN tagged access port is 8-10 pings you should wait for 16-20 pings while troubleshooting to allow for a VLAN to come up and register with both Windows and/or the IPMI BMC.

In real word situations, this means that Windows can be past the kernel initialisation process on a Dell PowerEdge before the IPMI interface is responding to traffic following a warm boot. The point of this: the use of VLAN tagging can lull you into a sense of insecurity in that you think that things are not working when all that is actually required is patience.