Installing Plex Media Server on Windows Server 2016 or Windows Server 2019 Core

System Requirements

  • Windows Server Core
  • Windows Server 2012, Windows Server 2012 R2, Windows Server 2016, Windows Server 2019
  • Plex Media Server

 

The Problem

“Just because you shouldn’t do something, doesn’t mean you can’t”

Plex Server, the sometimes controversial media streaming hub, is a staple of the media diet of many home-brew media centre connoisseurs. I personally keep it installed as a gateway between Smart TVs and my music/video/photo library, as it is a convenient way of getting DLNA support onto the network. Where a device lacks Kodi support, Plex also gives a consistent alternative front-end user interface.

The problem with Plex Server is that it isn’t quite a “server”. It’s a service, but one that insists on running in userland (as a tray icon). If you log off from its user account, the service shuts down and you no longer have a working Plex environment.

 

Why is this a problem?

At home, the only computers that I have running 24/7 are servers, and these are exclusively hypervisors. I want Plex to be always on, but not to be sharing a VM performing other duties. Neither do I want to be forced to leave a logged-on VM running that does something else, thus increasing the attack surface.

To date, my answer has been to run Plex Server in a Windows 10 VM, but this means consuming a £120+ Windows 10 Pro license so that it can effectively molly-coddle a tray icon.

Ah ha! I hear you cry. How is consuming a £900 Windows Server license any better?

It’s not, obviously… unless you’ve got Windows Server Datacenter licenses. If you fall into this category, it literally doesn’t matter how many VMs you install on your hypervisor. The argument is academic as long as you have the horsepower on your server to keep piling on additional VMs.

More commonly, however, and perhaps more practically, you may find that you have some old Windows Server 2012, 2012 R2 or 2016 Standard licenses knocking around from recent server decommissions. This may become more common again as your organisation starts migrating to Windows Server 2019.

The advantage of using even a down-version of Windows Server comes from the fact that versions of Windows Server from 2012 upwards all remain part of the Microsoft Long-term Servicing Branch (LTSB) support model. Consequently, by re-using the licenses your Plex install will receive security patches for many years to come, while remaining lighter than a client edition of Windows and – in the case of Windows 10 – saving you from the 6-monthly ache of having to Feature Update Windows 10. In other words, Server Core gives you a stable platform to ‘set it and forget it’.

So, for these minority edge cases, an experiment was born to see whether Plex Media Server could in fact run on Windows Server Core.

 

Why Windows Server Core?

Partly because I’m a glutton for punishment and partly because at ~5GB (Windows Server 2019), it represents a considerable disk and resource saving over the ~18GB of Windows 10. My Windows 10 Plex Server VM, with Windows 10 Pro, Plex and its various databases (but no local media assets), weighed in at 33GB (after defragging and compressing). Its RAM utilisation typically sat between 1.4 GB and 1.8 GB (remember that it is sitting at a user account lock screen most of the time, but a user is logged on nonetheless).

This gives us some numbers against which to judge the relative success or failure of the experiment.

 

How To

The new VM was setup with the following specs:

  • 3 CPU Cores*
  • 1024 MB Startup RAM with dynamic memory between 400 MB and 2048 MB
  • A 127 GB dynamic VHDX
  • Connected to the correct network
  • Set (in my case) to PXE boot and install from my build server
  • Windows Server 2019 Core as the install source

*I find that at 2 cores, Plex rides the CPU at 90% during library updates. With 3 cores, it is usually below 40% and does make use of the additional threads afforded by the extra core.

 

Minimising

Firstly, remove any unwanted Windows Features. My build server is configured to enable several features by default, so we’ll strip these off. Fewer features and services mean a leaner VM footprint. Use Get-WindowsFeature in PowerShell* to view the state of play with yours and remove as appropriate. For example:

Remove-WindowsFeature -Name Hyper-V

Remove-WindowsFeature -Name Windows-Defender

* At the command prompt, type “start powershell” and hit enter to launch a PowerShell console.

Similarly, go through Get-WindowsOptionalFeature -Online | ? {$_.State -eq 'Enabled'} to check for more things to disable e.g.

Disable-WindowsOptionalFeature -online -FeatureName <name>

As well as Get-WindowsCapability -Online | ? {$_.State -eq 'Installed'}

Remove-WindowsCapability -online -Name <name>

… and Get-WindowsPackage -online | ? {$_.PackageState -eq 'Installed'} using

Remove-WindowsPackage -online -PackageName <name>

Note: Do not remove WOW64 from the install as you will require it to run Plex.
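
As a convenience, the checks above can be run in one pass. A minimal sketch (assuming a standard Server Core build; the names returned vary between builds, so review the output before removing anything):

# List everything currently installed/enabled so you can decide what to strip
Get-WindowsFeature | ? { $_.Installed } | Sort-Object Name | Format-Table Name, DisplayName
Get-WindowsOptionalFeature -Online | ? { $_.State -eq 'Enabled' } | Sort-Object FeatureName
Get-WindowsCapability -Online | ? { $_.State -eq 'Installed' } | Sort-Object Name
Get-WindowsPackage -Online | ? { $_.PackageState -eq 'Installed' } | Sort-Object PackageName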

 

Preparing

If you aren’t automated, patch it, join it to the domain and make any registry and config changes that you need (such as IP addressing and enabling Remote Desktop).

Decide which account your Plex Server install will run under. Obviously, you’ll be sitting in an administrator account after install, and you don’t want to run Plex in that! I have a user account on the domain that has minimal permissions and access to the multimedia shares. You should decide what will work for you.

Set the Windows Firewall so that you can perform remote management. Here are some examples of functions that you may wish to enable (they may differ depending on the Windows Server Edition). We need to enable File and Printer Sharing (SMB) access so that we can copy the Plex installer over to the VM from a management workstation.

enable-netfirewallrule -displaygroup "Core Networking"

enable-netfirewallrule -displaygroup "File and Printer Sharing"

enable-netfirewallrule -displaygroup "Network Discovery"

enable-netfirewallrule -displaygroup "Performance Logs and Alerts"

enable-netfirewallrule -displaygroup "Remote Desktop"

enable-netfirewallrule -displaygroup "Remote Event Log Management"

enable-netfirewallrule -displaygroup "Remote Event Monitor"

enable-netfirewallrule -displaygroup "Remote Scheduled Tasks Management"

enable-netfirewallrule -displaygroup "Remote Service Management"

enable-netfirewallrule -displaygroup "Remote Shutdown"

enable-netfirewallrule -displaygroup "Remote Shut-down"

enable-netfirewallrule -displaygroup "Remote Volume Management"

enable-netfirewallrule -displaygroup "Windows Firewall Remote Management"

enable-netfirewallrule -displaygroup "Windows Remote Management"

enable-netfirewallrule -displaygroup "Windows Management Instrumentation (WMI)"

enable-netfirewallrule -displaygroup "Windows Backup"

Before you can run Plex Server, you will also need to enable Windows Media Foundation services.

Add-WindowsFeature -Name Server-Media-Foundation

Now jump to a Management machine, something with Windows 10 1809 and RSAT installed on it.

On the management machine, open Computer Management from the Start button right-click menu or by running compmgmt.msc. Right click on “Computer Management (Local)” at the top of the left-hand pane and connect to the machine by hostname or IP address. You can now:

  • Manage Task Scheduler
  • View the Event Logs
  • Manage Shared Folders
  • Manage Local Users & Groups

Note: If you are in a workgroup, you need to ensure that the user account and password used to open Computer Management matches the administrator account on the Plex VM. Otherwise you will see ‘Access Denied’. You will also need to have set up WinRM, which is beyond the scope of this article.

 

Auto Log-on

Installing on Windows Server does not change Plex’s behaviour. It will still run as a tray application even though there isn’t a systray to display its icon in. This means that the virtual machine must auto log-on at reboot in order to start Plex Server’s processes.

To set auto-logon, from an administrator account add the following registry values.

Note: You can type regedit at the command prompt to gain access to the standard Windows registry editor if you prefer to do it manually.

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /t REG_SZ /v "DefaultUserName" /d "Plex" /f

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /t REG_SZ /v "DefaultPassword" /d "your_password_here" /f

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /t REG_SZ /v "DefaultDomainName" /d "your_domain_here" /f

reg add "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /t REG_SZ /v "AutoAdminLogon" /d "1" /f

To test whether you have successfully set up auto-logon, simply reboot the server VM.

Note: The password is inserted in fully readable plain text in the registry. Keep that in mind when designing the security for this account!

 

Install Plex Media Server

Use the following process to install Plex on the new Windows Server Core VM:

  1. Log onto the VM using your preferred Plex user account. For the rest of this article we will call the username for that account “Plex”. This first logon creates the user profile structure.
  2. Download the latest Plex Server installer file from www.plex.tv.
    Note: For some silly reason at the time of writing, the download link is in the page footer with the copyright. It’s almost as if they don’t want you to download it… but I digress.
  3. In file explorer on the management machine, open a SMB share to the VM either using \\<FQDN>\c$ or \\<ipAddress>\c$. Copy the Plex installer file into \\<host>\c$\Users\Plex
  4. Return to the VM via Remote Desktop or your hypervisor, and ensure that you are logged on as the Plex user account. You should be at a command prompt reading “C:\Users\Plex>”
    1. If your user account is a member of the local administrators group: type “Plex” and hit Tab; it should auto-complete the full file name of the Plex installer. Hit return e.g.
      Plex-Media-Server-1.14.0.5470-9d51fdfaa.exe
    2. If the Plex account is a standard user: Type “runas /noprofile /user:domain\adminUsername Plex-Media-Server-1.14.0.5470-9d51fdfaa.exe” and hit return.
  5. Should you receive any errors from the installer, you can access the log file via the management machine at the following path to troubleshoot the problem:
    \\<host>\c$\Users\Plex\AppData\Local\Temp

Once the installer has finished, the Launch button will not do anything as it attempts to open the default web browser – and there isn’t a default web browser on Windows Server Core. Simply exit the installer to complete the installation.

 

Post-install

At this point you will have the Plex Server binary files installed, however unlike on a GUI install, Plex will not yet function correctly.

 

Drive Maps

Should you need to set up drive maps for media content, you can use Group Policy or map drives under the local account to your media shares using

net use <driveLetter> \\server\share /persistent:yes

 

Auto-Start

Now that Plex is installed, it is necessary to start its processes. As Windows Explorer (and the Startup folder) does not exist to do this for us, you will have to set it up manually.

The obvious way would have been to use task scheduler.

SchTasks /Create /SC ONLOGON /TN "Plex Server" /TR "C:\Program Files (x86)\Plex\Plex Media Server\Plex Media Server.exe"

However, I was unable to get the event to fire at logon and the service never started.

Equally, I was unable to get an auto-run working from HKCU\Software\Microsoft\Windows\CurrentVersion\Run on a non-administrative account, although your mileage may vary if you are using an administrative account.

In the interest of time, the quickest way to achieve this is to use the following procedure:

  1. Log in as a system administrator
  2. Open Regedit
  3. Navigate to:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\AlternateShell\AvailableShells
  4. Right click on the AvailableShells key, click Permissions…
  5. Click Advanced
  6. Change the Owner to the administrators group and cascade the ownership change to sub-objects
  7. Set the Administrators group to have Full Control of ‘This key and subkeys’
  8. OK back to Regedit
  9. Edit the REG_SZ under AvailableShells so that you add cmd.exe /k “C:\Program Files (x86)\Plex\Plex Media Server\Plex Media Server.exe” into the Value data string e.g.
    cmd.exe /c "cd /d "%USERPROFILE%" & start cmd.exe /c "C:\Program Files (x86)\Plex\Plex Media Server\Plex Media Server.exe" & start cmd.exe /k runonce.exe /AlternateShellStartup"
    Note: The last command (each & is the start of a separate command) to be executed will be the window on top after the boot completes.

Note: You cannot use SC as a mechanism to invoke the auto-start as Plex requires the user account to functionally access remote file shares. If all of your media is stored locally on the Plex Server VM then technically you could use SC and in this case you would not need to auto-logon the VM at all.

If you log off and log on again, you should see the Plex Media Server.exe process running in Task Manager.

 

Adding the ability to Shutdown the VM

If you want your non-administrative user to be able to shut down the VM (without having to log off, log onto an administrator account and then perform the shutdown), you need to modify the local security policy (or Group Policy) to grant your low-privilege account this right.

You can either

  1. Export a modified policy as a template from your management machine in Local Security Policy (Security Settings > Local Policies > User Rights Assignment > Shut down the system) and then import it onto the Windows Server Core VM using secedit /configure /cfg <exportFilePath> /db secedit.sdb
  2. Use ntrights.exe from the Windows Server 2003 Resource Kit Tools and issue the command ntrights.exe -U "domain\username" +R SeShutdownPrivilege

Download: Windows Server 2003 Resource Kit Tools
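
If you take the secedit route, the relevant section of the template looks something like the following sketch (the INF header is boilerplate; the account name is a placeholder and *S-1-5-32-544 is the built-in Administrators group, which you should keep in the list):

[Unicode]
Unicode=yes
[Version]
signature="$CHICAGO$"
Revision=1
[Privilege Rights]
SeShutdownPrivilege = *S-1-5-32-544,your_domain\Plex

Apply it on the VM with secedit /configure /db secedit.sdb /cfg <templateFilePath> /areas USER_RIGHTS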

 

Open Plex

To start using Plex as part of a new install, return to your management machine, open a web browser and navigate to:

http://<ipAddress>:32400/web/

You should be presented with the beginning of the Plex configuration wizard in your browser. Do not be surprised if Plex knows who you are based upon your IP address if you are an existing user. You should be able to sign in and configure Plex as required, treating it as a new install.
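
If the page does not load, check that the Windows firewall on the VM is not blocking Plex. The installer normally registers its own rules, but if it has not, a sketch of a manual rule for the default web port (32400):

New-NetFirewallRule -DisplayName "Plex Media Server (TCP 32400)" -Direction Inbound -Protocol TCP -LocalPort 32400 -Action Allow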

 

Migrating your Plex Server

If you wish to migrate an existing Plex Server into the VM, use the following procedure to perform the migration:

  1. Ensure that both source and destination Plex Installs are running the same version
  2. Shut down the Plex Media Server processes on both the source and destination Plex installs by entering
    net stop PlexUpdateService
    tskill "Plex Media Server"
    tskill "PlexScriptHost"
  3. On the old server, export the entirety of the “HKEY_CURRENT_USER\Software\Plex, Inc.” registry key and import it onto the new server
  4. On the new server, rename “C:\Users\Plex\AppData\Local\Plex Media Server” to “C:\Users\Plex\AppData\Local\Plex Media Server-OLD”
  5. Copy the “C:\Users\Plex\AppData\Local\Plex Media Server” folder from the old server to the new server (see the robocopy sketch after this list). This folder will be very large and the copy will be slow as it contains a large number of files and folders – in my case some 662,915 files and folders totalling around 18 GB.
  6. Ensure that your old Plex install remains offline
  7. Reboot the new Plex VM
  8. Test
  9. Delete “C:\Users\Plex\AppData\Local\Plex Media Server-OLD”
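
For the bulk copy in step 5, robocopy from the management machine is considerably quicker and more resilient than an Explorer drag-and-drop. A sketch, assuming both servers expose their administrative C$ shares (the hostnames are placeholders):

robocopy "\\oldserver\c$\Users\Plex\AppData\Local\Plex Media Server" "\\newserver\c$\Users\Plex\AppData\Local\Plex Media Server" /E /COPY:DAT /R:1 /W:1 /MT:16

The /MT:16 switch copies 16 files in parallel, which helps considerably with the very large number of small metadata files.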

 

The Results

At the beginning of the article, I outlined that the old Windows 10 VM disk was sitting at 33 GB with typical idle RAM use sitting around 1.4 GB.

After defragging and compressing the virtual disk for the Windows Server Core VM, the VHDX file size was 27 GB; a small improvement. RAM use was also better, with typical idle values of around 550 MB, library updates staying below 720 MB and observed highs of around 900 MB.

Boot times for the VM are considerably faster compared to Windows 10, not that it is especially important for media consumption. As an early superficial observation, the library load times between a Smart TV and the Plex DLNA enumeration service appear snappier than under the previous install. I leave that as a subjective and not an empirical observation however.

So is it worth it? The answer should depend on your comfort level with managing Windows Server Core. If you want to play with Server Core to learn it, or are already familiar with it, then it is worth considering for the RAM saving alone. The promise of a long-term stable platform under LTSB servicing does allow you to “set it and forget it” and, if like me you are fed up of contending with large 6-monthly full reinstalls of Windows 10 for no intrinsic gain, it really does offer a streamlined way to host Plex.

With that said, you do lose three practical things by using Server Core, and not much else:

  1. The omnipresent tray icon, which lets you shortcut into the web GUI or manually initiate library scans (all of which you can do from the web UI).
  2. The ability to open the web UI on the VM itself is lost.
  3. Being able to troubleshoot with a GUI in Windows Explorer is occasionally useful. You must now use an intermediate management machine/VM to do this. Any admin who already manages Server Core will already have this environment, and will also be used to viewing the local server console as a weapon of last resort, not first resort as it will be for the majority of GUI administrators.

Once working on Server Core, Plex is essentially managed exclusively through the web UI. There are only very occasional needs to interact with Windows Installer on the console during version upgrades. If you want your Plex VM to do something other than just Plex, then it probably isn’t worth going down this route. Should you think like a server admin, however, and prefer task isolation, then why do you need a GUI, Game Bar and Candy Crush Saga to serve multimedia content to your TVs? And if you think like a savvy consumer, why do you need the extra licensing overhead?

Windows NT 4.0 on Hyper-V 2016

System Requirements:

  • Windows Server 2016, Hyper-V Server 2016
  • Windows 10
  • Windows NT 4.0 Advanced Server, Server, Terminal Server Edition, Workstation

The Problem:

For reasons that defy any sane logic, I decided that I needed to install NT 4.0.

It’s 2018 and over the last few years I have been slowly clearing out all of my old IT hardware, to the point now that I no longer have any legacy motherboards or systems in the house or office. So when I recently needed to fire up Windows NT 4.0 once again – for reasons that defy logic – you would assume that Virtualisation was the easy win.

Sadly – and especially with Hyper-V – this is not the case. Microsoft’s virtualisation solution is (and always has been) designed around its currently supported operating systems, with a little Linux added in to the mix in more recent times. Down-level operating systems are not supported and by default, are not going to work. This is especially true of what in effect is Windows 1996, the workhorse wonder that was Windows NT 4.0.

I am sure that the non-masochists among you will just use something like VMware or VirtualBox to do their bidding and carry on with their day… but I digress….

Note: This process will be very similar for Windows NT 3.5 and NT 3.51 as it is for Windows 2000 – however Windows 2000 does not have the 8GB disk/2GB partition initial size limitation.

The Fix

The following procedure will get you up and running with a working NT 4.0 install under Hyper-V 2016. I am assuming that you know your way around Hyper-V and this article is intended as a results based guide, not a step-by-step ‘click here, go here’ guide.

Create the VM

Use the following configuration when creating your VM (a PowerShell sketch follows the list):

  1. Create a generation 1 Virtual Machine. In our case this will be “NT 4.0 Server”
  2. Set the RAM to 512 MB (or lower)
  3. You can set it to 1 or 2 CPU cores as required
  4. Do not connect to a network. Remove the default network adapter completely. Add a new Legacy Network Adapter
  5. Create a new virtual hard drive. The drive can be fixed or dynamically expanding, however set the maximum disk size to 6GB or lower. Ensure that both the VHDX and the virtual DVD drive are connected to the IDE bus, not the SCSI bus
  6. Attach your NT 4.0 install CD/ISO to the virtual DVD drive
  7. [If applicable] attach the NT 4.0 virtual floppy boot disk to the virtual floppy drive
  8. Set the required boot order (Floppy or CD ahead of HDD)
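
If you prefer to script the VM creation rather than click through Hyper-V Manager, a rough PowerShell sketch of the above (the paths and ISO name are assumptions; adjust to taste):

# Generation 1 VM with 512 MB RAM and a small dynamic VHDX on the IDE bus
New-VM -Name "NT 4.0 Server" -Generation 1 -MemoryStartupBytes 512MB -NewVHDPath "D:\VMs\NT4.vhdx" -NewVHDSizeBytes 6GB
Set-VMProcessor -VMName "NT 4.0 Server" -Count 1
# Swap the default (synthetic) network adapter for a Legacy Network Adapter
Remove-VMNetworkAdapter -VMName "NT 4.0 Server"
Add-VMNetworkAdapter -VMName "NT 4.0 Server" -IsLegacy $true
# Attach the install media and boot from CD first
Add-VMDvdDrive -VMName "NT 4.0 Server" -Path "D:\ISO\NT4.iso"
Set-VMBios -VMName "NT 4.0 Server" -StartupOrder @("CD", "IDE", "Floppy", "LegacyNetworkAdapter")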

Pre-configure Hyper-V

By default, Hyper-V will attempt to run the VM under its default modern architectures mode, compatible with Windows Vista+ systems. The 1996 Windows NT 4.0 code-base is not compatible with modern platforms or CPU instruction sets and if you attempt to boot to the NT text mode installer without addressing this issue, NT 4 will blue screen while attempting to bootstrap the installer.

To fix this, you need to enable the legacy CPU compatibility. This used to be a GUI option in Hyper-V 1.0 under Windows Server 2008, but the option was removed in later releases. Despite being removed from the GUI, the option does still exist in the Hyper-V core and can be re-enabled for the VM using PowerShell.

To enable compatibility mode, open an elevated PowerShell session on the hypervisor and enter the following commands:

Set-VMProcessor "NT 4.0 Server" -CompatibilityForOlderOperatingSystemsEnabled $true
Get-VMProcessor "NT 4.0 Server"

Text Mode Setup

Boot your virtual machine from the floppy/CD, enter text mode setup and follow through the setup process.

  1. You do NOT need to add any additional mass storage device drivers (this includes the NT 4.0 SP4 ATAPI update; if you attempt to add the updated driver, the installer will ignore it).
  2. When prompted to choose the keyboard layout and language, and to confirm the computer type, change the computer type to “Standard PC” for a single core VM or “MPS Multiprocessor PC” if you require access to two cores. Enter your preferred keyboard settings as required.
  3. In the drive partitioning section, create an NTFS partition of less than 2048MB. I would suggest 1024MB for simplicity. Do not attempt to create a larger partition. The reason for this is that NT 4.0 will initially format the VHDX as FAT16, which has a maximum partition size of 2GB. During the later installer process and before entering GUI mode setup, NTFS conversion will be run over the FAT partition, converting it into an NTFS 1.2 file system. You will patch it to NTFS version 3.0 after installing NT 4.0 SP4 or later.

If you receive an installation failure because setup cannot write to the Windows folder or a setup error stating that permissions could not be created, this is most likely caused by you creating the initial VHDX larger than 8GB.

GUI Mode Setup

There are no special requirements or steps to perform during GUI Mode Setup.

Auto detection of the Network card will work with the Hyper-V Legacy Network Adapter. Ensure that you properly configure TCP/IP and remove IPX/SPX from the protocol list (unless you specifically need it).

Post Install

  1. Install SP6a (SP4 at a minimum).
  2. Turn off the machine.
  3. Increase the RAM from 512MB if required.
  4. In Hyper-V Manager/PowerShell edit the virtual disk and set the maximum size to your required size (e.g. the default 127GB).
  5. In Windows Server 2016, locate the VHDX and mount the disk. Using PowerShell or Disk Manager, expand the partition to fill the entire size of the disk.
  6. [1/2] If you want Windows NT 4.0 to turn off automatically when you click the shutdown button (instead of telling you it is now safe to turn off your computer):
    1. Use 7-zip (or similar) and extract the hal.dll.softex file from the SP6a installer, rename it HAL.dll and copy it into C:\WinNT\System32\
      Note: If you are using a multi-processor VM, rename the halmps.dll.softex to HAL.dll and do the same.
  7. Unmount the VHDX.
  8. Reboot the Virtual Machine.
  9. [2/2] If you want Windows NT 4.0 to turn off automatically when you click the shutdown button (instead of telling you it is now safe to turn off your computer):
    1. Add the following to the registry:
      REGEDIT4

      [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon]
      "PowerdownAfterShutdown"="1"
  10. Install Internet Explorer 6.0 SP1, patch and update the Windows install, then install applications and configure to your needs.

Things the inexperienced user may not know

If you are playing with NT 4.0 for the first time, then there may be some things that you are not aware of. Here is a list of a few points worth noting should you be the Windows equivalent of a millennial (pun intended).

  1. The total install size (less the page file), after patching and cleaning up uninstall data was ~320MB. They don’t make them like that any more!
  2. NT 4.0 does not support plug and play. If you want to add hardware, you have to do it manually via a plethora of different places in the control panel – there is no device manager!
  3. NT 4.0 is extremely insecure by default. Know that it has no built-in firewall and that the base system policy and security configuration is insecure by default (even file system permissions are a free for all). You should keep this in mind when attempting to do anything at all with NT. If it really needs to be on the network you should at a minimum harden system policy and add a firewall (ZoneAlarm Free was the go to back in the day).
  4. There are no display drivers for Hyper-V. This means that there is no mouse integration and as such you will be unable to install NT 4.0 over a Remote Desktop session. It also means that you will be stuck at a maximum resolution of 800×600 in 16 colours using official means.
    1. Unofficially, you can make use of the great work of the VBEMP NT project to increase the resolution and get NT 4.0 running at modern resolutions and up to ‘True Colour’ (24-bit). This does not offer any cursor integration between the VM and the Hyper-V Manager, preventing mouse use over a Remote Desktop connection and requiring Ctrl + Alt + Left arrow to escape the Hyper-V Connection window.
      View: VBEMP NT (tested with NT 4.0 stable version 3.0)
  5. There are no sound drivers for NT 4.0 in Hyper-V (unlike there used to be in Virtual PC) as Hyper-V does not emulate any sound adapters.
  6. The disk performance is fairly poor, until you have patched up to SP6a + the SP6a URP (Q299444). You can further improve performance by enabling DMA Mode on the IDE adapter and write caching on the VHDX.
  7. NT 4.0 by default does not use SMB signing and uses LAN Manager authentication instead of NTLM. It can use NTLM v1/2 once it has been fully patched. However, be aware that this means that it will be unable to communicate with Windows XP SP2+ or Windows Server 2003 in their default configurations. You will have to perform some security hardening on NT 4.0 or security weakening on XP+ to get SMB working. Hint: It’s the same process in the registry on both, so security harden NT 4.0 after installing SP6a.
  8. NT 4.0 ONLY supports SMB 1.0 / CIFS (“SMB 1.5”). Microsoft have been removing support for SMB 1.0 with each successive Windows release. Under Windows Server 2016 and Windows 10, SMB 1.0 support is an optional component/feature that you may need to install manually.
    Note: You should not be using SMB 1.0 at all in 2018 as it is a 100% exploitable security risk.
  9. After you have performed the install, you may be looking for the easiest way to copy SP6a, Internet Explorer, patches and app installers to the VM. To do this as fast as possible without having to pre-harden the OS, either burn the updates into an ISO or do it via Guest-authenticated SMB by:
    1. Enable the guest account
    2. Create a SMB share on the root of the C Drive and set Guest access to read/write (modify) under NTFS and Share permissions.
      Note: before it is patched, you will struggle to SMB into NT 4.0 using a username and password combination unless you weaken the security policy on the calling client. Using the guest account bypasses the problem.
    3. Use an intermediate level VM/system as a bridge between newer and older SMB versions. For example, I used a Windows Server 2008 VM to pull data from a third server with an SMB 2.x file share of updates and drop them onto the NT 4.0 SMB 1.0 share found at c:\shared.
    4. Once NT 4.0 is patched, you should disable the guest account again, remove its permissions to the file share and authenticate into NT 4.0 using a normal user account found in the SAM database. Do note my warning above about SMB signing however, which will scupper you unless you have made mitigations via hardening.
Once you have done all of the above and have a fully patched system, you will have something resembling the below running in Hyper-V 2016.
NT 4.0 in Hyper-V 2016
NT 4.0 on Windows 10 via Windows Server 2016 Hyper-V install

“RPC server unavailable. Unable to establish communication between and ” when connecting to Hyper-V 2008, 2008 R2, 2012, 2012 R2 from Hyper-V Manager version 1709

System Requirements:

  • Windows 10 1709
  • Windows Server 2016
  • Hyper-V Management Console
  • RSAT 2016/1709 for Windows 10 version 1709

The Problem:

After upgrading to Windows 10 version 1709 and installing the updated Windows Server 2016 (version 2016 or version 1709) RSAT tools for Windows 10 1709, on attempting to connect to a down-level Windows Server 2012 R2, 2012, 2008 R2 or 2008 Hyper-V Server via the Hyper-V Manager MMC snap-in, you receive the following error even though no configuration changes have been made on the Hyper-V hosts:

"RPC server unavailable. Unable to establish communication between <management host> and <Hyper-V host>"

At this point you are unable to manage down-level versions of Hyper-V from Windows 10. This issue does not impact the management of remote Windows Server 2016 or Windows Server 1709 Hyper-V instances.

View: Remote Server Administration Tools for Windows 10 (RSAT)

The Fix

This appears to be related to a change in the default firewall behaviour on Windows 10 1709 installs. To fix the problem, on the client system where you have installed RSAT to remotely manage the hypervisor (i.e. not on the hypervisor itself):

  1. Open ‘Administrative Tools’ in the Windows Control Panel
  2. Open ‘Windows Defender Firewall with Advanced Security’
  3. Select ‘Inbound Rules’ from the left hand side
  4. Scroll down until you get to ‘Windows Management Instrumentation (ASync-In)’
  5. Enable the rule for domain/private/public networks as required
    Note: By default the Windows firewall MMC will only display WMI rules for domain and private networks. If you are not running on a domain and Windows has not been explicitly told that you are on a private network, Windows will assume that you are on a public network. Check the network settings in the Settings app to ensure that you are not running on a public network, or if you are, edit the firewall rule to include public networks. In general, it is a bad idea to open WMI up to traffic on public networks.
  6. Restart Hyper-V Manager

You should now find that you can connect to down-level versions of Hyper-V from Windows 10 1709.
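
If you are working without the firewall MMC to hand, the same change can be made from an elevated PowerShell prompt. A sketch (the rule display name may vary slightly between builds, so check with Get-NetFirewallRule first):

# Confirm the exact rule name, then enable it
Get-NetFirewallRule -DisplayGroup "Windows Management Instrumentation (WMI)" | Format-Table DisplayName, Enabled, Profile
Enable-NetFirewallRule -DisplayName "Windows Management Instrumentation (ASync-In)"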

Overview of iSCSI MPIO Recommendations & Best Practice on Windows Server

System Requirements:

  • Windows Server 2008 Storage Server
  • Windows Server 2008 R2 Storage Server
  • Windows Server 2012, 2012 R2, 2016

The Problem:

I needed to outline some of the general thinking relating to exactly how a practitioner should logically and physically understand MPIO, however most of the discourse on the subject skips a fair amount of the obvious questions that people starting out with the technology may be asking (or trying to answer). I therefore present some thinking on the subject of understanding MPIO optimisation and best practice for iSCSI.

The information presented in this document is intended for those who are new to the concept of iSCSI and MPIO and is not intended to be product specific.

More Info

Multi-Path Input/Output, or MPIO, is a server technology that usually sits on the storage side of load balancing, failover and aggregation technologies. If you are getting into SAS, iSCSI or enterprise RAID solutions, where it is most commonly used (encountered), then this may (or may not) help you with understanding what MPIO is and why it (possibly) isn’t what you think it is!

The document is written from the perspective of an iSCSI user where it can be conceptually a little harder for new users to understand the best way to approach MPIO.

Logically understanding what MPIO is all about

So you have 2x 1Gbps ports in an MPIO team; that means you’ve got a 2Gbps link, right? Wrong. That isn’t what is going on with MPIO.

MPIO (and in fact pretty much the majority of balancing and aggregation technologies) doesn’t double the speed, but it does roughly double the bandwidth available to the system. Confused? Think of it like this:

You own a car. The car has a top speed of 70mph and not one mph more. You get on a one way, single track road in a country where there are no speed limits. You are now happily driving along at 70mph. Some bright spark at your local council decides that you should be able to drive at 140mph, so they cut down the trees on one side of the road and add a second one-way carriage way, going in the same direction as the first.

Can your car now drive at 140mph because of the new lane? No. The public official is wrong. Your engine can only offer you 70mph. The extra lane doesn’t help you, but it does help the guy in the car next to you also driving at 70mph arrive at the other end of the road at the same time. It also means that when you encounter a tractor ambling along in your lane, you have somewhere else to go without slowing down.

This is fundamentally what MPIO is doing. So why isn’t it a 2Gbps link? Basically, because networking technology is a serial communications medium; by adding a second lane and calling it a faster way to get data to the end of it, you move into the different world of parallel communications. Under parallel communications you have to split (fragment) information into smaller pieces and push it down each one of the wires to the destination. This in turn implies the need for more complicated buffer/caching designs to store information, as part of a strategy designed to cope with each section of the data arriving at a different time, arriving all at the same time, arriving in a different order than intended or, of course, not arriving at all – something known as clock skew.

To fix this, you need to introduce overhead: either synchronising delivery to be reliable (thus slowing it down and reducing error tolerance) or adding overhead mechanisms designed to deconstruct, sequence, wait for or re-request missing or corrupt data sections and track timing – all things that you really don’t want in an iSCSI or SAS environment where response time (latency) is king. Consequently, there is a diminishing return on how much of this parallel working you can derive a benefit from in any system, including an MPIO system. iSCSI MPIO, if correctly configured, will offer something at around about the boundary between worthwhile and not bothering in the first place. Yet it is important to understand that it will not be a 100% increase in performance, nor will it likely be a 50% increase, but more realistically something around the 30-40% mark.

Performance is only one of the intended design considerations for MPIO, and in that it is not the primary consideration. The primary consideration is for fault tolerance and reliability.

In a correctly designed iSCSI system, independent NICs are connected to more than one switch, and usually to more than one controller on the storage side and more than one server on the host side. If one of these fails, in a correctly implemented system, your production service probably won’t even notice. You can even be so bold as to perform live switch re-wiring on iSCSI systems without impacting the client services involved – although it should be stressed that this is for bragging rights and in practice should not be attempted.

To summarise, MPIO allows you to get twice (+) as much data down to the end of the link, but you cannot get it there any faster. In general, if you can avoid using fragmented streams, you will reap the maximum benefit. The obvious approach here is that each “lane” should be carrying unrelated data: instead of carving up a single video file and pushing little bits of it down each lane one bit at a time (MPIO can do this), one lane is used for the video and the second lane is used for literally anything and everything else. This is a simplification of what MPIO generally does, however in practice it offers a good way to get your head around it.

Techniques

So how does MPIO carve up the traffic?

There are, broadly speaking, 4 different paradigms for carving up MPIO traffic:

Failover/Redundant: In this mode, one link is active, while the other is passive i.e. up, but not doing anything. If the first link fails, the second path takes over and all existing traffic streams continue to receive the same bandwidth (% of the total available pie) on the same terms as before. This gives us a completely separate road that can only be used in emergencies. It may not be as fast or robust, or it may be identically spec’d and just as capable. A failover design may or may not return traffic to the first channel once it becomes available again.

Round Robin: This mode alternates traffic between channel 1 and channel 2, then goes back to channel 1, channel 2 and so on. Both links are active and both receive traffic, with a slight skew as the data is de-queued at the sender. This offers the 2 lane analogy used above, with each 70mph car getting to the end at roughly the same time.

Least Queue Depth: This puts the traffic onto the channel that has the least amount on it (or, more accurately, about to go onto it). If one channel is busier than the other (e.g. the large video file) then it will put other traffic down the second channel, allowing the video to transfer without needing to slow down to let new traffic join, delaying its delivery. There are many different algorithms for how this is achieved, including varieties that use hashing to offer clients consistent paths based upon Layer 2 or Layer 3 addresses.

Path Weighting: Weighted path and least blocking methods assess the state/capabilities of the channels. This is more useful if there are lots of hops between source and destination, multiple routes to a destination, or different channels with different capabilities. For example, if you have iSCSI running through a routed network, then there could be multiple ways for it to get there. One route may go through 5 routers and another 18 routers. Generally, the 5 router path might be preferable, provided the lower hop route genuinely gets the data there faster. Equally, the weighting could be based upon the speed of the path through to the recipient or, finally, if channel 1 is 10Gbps and channel 2 is only 1Gbps, then you might prefer the 10Gbps path to be used with a higher preference. Usually, a lower weighting number means a higher preference. This would be the equivalent of a 70mph road with a backup road with a max speed of 50mph. You know that it will get you to the destination, but you can guarantee that if you have to use it, it will take longer.
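
On Windows Server, these paradigms map loosely onto the load-balance policies exposed by the in-box Microsoft DSM. A sketch of inspecting and setting the global default with the built-in MPIO cmdlets (a vendor DSM, where installed, may override these; Weighted Paths is normally set per-LUN, for example with mpclaim, rather than as a global default):

# Show the current global default policy for the Microsoft DSM
Get-MSDSMGlobalDefaultLoadBalancePolicy
# FOO = Fail Over Only, RR = Round Robin, LQD = Least Queue Depth, LB = Least Blocks
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy LQD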

So, more lanes equal more stuff then?

Sounds simple doesn’t it? Just keep throwing lanes into the road and then everyone gets to travel smoothly at 70mph.

In principle, it is a nice idea, but in practice it doesn’t actually work in most iSCSI implementations.

For starters, server grade network cards (which you should be using for MPIO, not client adapters) are expensive, and server backplanes can only accept a finite number of them. Server NICs also consume power, and power costs money! Keep that in mind if you do decide to throw extra ports at an iSCSI solution.

The reality is that if you have an MPIO solution that will allow you to experiment with more than 2 NIC adapters in an MPIO group, you will likely see the performance gain rapidly tail off. In turn it will actually wind up presenting you with steadily worsening performance, not the increase that you are expecting.

Attempting to MPIO iSCSI traffic across 4x 1Gbps NICs actually offers worse read and write speeds for a virtual machine than 2x 1Gbps under a Hyper-V environment (see tests E and F below). The system wastes so much time trying to break apart and put back together each lane’s worth of traffic that it just doesn’t help the hypervisor.

Where a 4 NIC configuration is beneficial is actually in providing you with a “RAID 6” MPIO solution. Here you can have 2 active and 2 passive adapters – remember that in an idealised scenario they could be 2x 10GbE and 2x 1GbE with a hard-coded preference for the 10GbE and a method of failing traffic back to the 10GbE. Just be aware that you can only use the 10GbE set OR the 1GbE set at the same time, not one port from each. The exception to this rule is for hashing based channel assignment, as these offer more paths to “permanently” assign data into, without the overhead of path swapping or de-fragmentation of traffic.

Some DSMs (effectively an OEM specific MPIO driver under Windows, such as Dell Host Integration Tools [HIT] or NetApp Host Utilities) logically limit an MPIO group to two active NICs if the storage controller is only exposing 2 usable NICs back to the DSM instance. Dell EqualLogic Host Integration Tools (the EqualLogic DSM) will grab the first two paths it finds and shut down any others into a passive state, no matter how hard you try to start them up.

What should a MPIO network “look” like?

Ultimately this is down to what you want to get out of the MPIO solution and within the bounds of what your hardware vendor will support.

There are effectively three schools of thought here (I won’t comment on which is right because as you’ll see, it isn’t that simple)

MPIO is about Meshing

If you see MPIO as a mesh, then 2 NICs in a server connecting to 2 NICs in a storage appliance equals a mesh where each NIC has a path to every other. This is more aligned with how you probably already think about Ethernet networks.

MPIO is about Pathing

If you see MPIO in this model, it is simply about more than one line being drawn between two different end points, with no line crossing or adding any complexity, complication or confusion. This is more aligned with how you likely currently think about SAS, Fibre Channel and hard drive wiring.

MPIO is about Redundancy

This is the purest of the three views. It sees the complexity and overheads associated with MPIO as being a problem – there will always be some sort of increase in latency and a drop in some aspect of performance when trying to squeeze more bandwidth out of MPIO. This view attempts to keep the design simple, run everything at unimpeded wire speed and maintain the failover functionality afforded by MPIO.

The three schools of thought are outlined in the diagram below.

Why not Meshing?

When you start out with MPIO, you may be tempted towards implementing option 1. After all, your server NICs (circles) are likely connected to a switch, as is your storage array (squares). The switch allows you to design to this topology, and if you allow the MPIO system to have knowledge of all possible permutations of connectivity, the system will be highly redundant, making it very robust.

Yes and no! Yes, it is very robust, but at this point in your implementation, how do you know which path traffic is taking? How do you know that it is optimised? What is stopping Server NIC1 and Server NIC2 from both talking to Storage NIC1 at the same time? If they do that, then they have to share 1Gbps of bandwidth between them while Storage NIC2 is left idle. Suddenly all of your services will have intermittent bursts of speed and infuriating drops in performance. The more server NICs that you add, the faster the decrease in performance will be. With 4 server NICs, there is nothing to stop the MPIO load balancer from intermittently pushing the data from all 4 server NICs towards a single storage NIC.

In a Round Robin setup with a full mesh design (as shown in #1), the system will likely order the RR protocol in the order that you gave it access to the paths. Given the following IP addresses

Server: 192.168.0.1, 192.168.0.2
Storage: 192.168.0.11, 192.168.0.12

The RR table could look like this

  1. 192.168.0.1 -> 192.168.0.11
  2. 192.168.0.2 -> 192.168.0.11
  3. 192.168.0.1 -> 192.168.0.12
  4. 192.168.0.2 -> 192.168.0.12

Or it could look like this

  1. 192.168.0.1 -> 192.168.0.11
  2. 192.168.0.1 -> 192.168.0.12
  3. 192.168.0.2 -> 192.168.0.11
  4. 192.168.0.2 -> 192.168.0.12

In both examples you either have two different sets of traffic being sent from the same Server NIC concurrently or received by the same Storage NIC concurrently. This is going to undermine performance, not improve it (this is outlined in Mbps terms in the tests shown later in this document).

In a failure situation, the performance issue is exacerbated

  • If #3 fails, then nothing changes in performance or bandwidth.
  • If #2 fails then the total bandwidth available to the system halves and all services contend using the first link.
  • If #1 fails then as with #2, all services suffer with contended bandwidth, however the system also has the overhead of MPIO to further reduce performance.

What benefit is there to MPIO operating in scenario #1? In this failed state, should one of the Storage NICs also fail, the system will continue to operate. In #2, if the working Storage NIC fails, the entire system will fail despite the fact that the Storage NIC on the second path is actually working. It is up to you and your design whether the performance hit that you will experience is worth this extra safeguard. In a highly secure, mission critical or safety system it may be worth the extra overhead.

There are however some middleware layers that can manage this for you. Dell Host Integration Tools (HIT), for example, does attempt to undertake some management of these types of situations, optimising the mesh by putting the links that would cause overhead into a failover-only state while maintaining the optimal number of active mesh links. In my experience though, the HIT solution is not able to manage the risk perfectly. It does not provide any consideration for redundant NIC controllers. For example, if you have 2 physical dual port NICs in your server with the intention of one port from each NIC making up the active “pair”, Dell HIT is not able to detect, or be programmed to ensure, that the active paths are prioritised so that the correct controller is being used. In my experience, it will tend to bunch them together onto the same physical NIC controller, leaving the second controller idle.

Fixing this problem requires an additional layer of complex, expensive and usually proprietary middleware logic, further impacting performance and increasing cost. Therefore, industry best practice is to avoid thinking of iSCSI MPIO as being a Full or even a Partial Mesh, but instead think of it as offering independent channels akin to those shown in #2. It is for this reason that virtually all iSCSI MPIO vendors insist that each Server -> Storage NIC pair exist on its own logical IP subnet as this completely negates the possibility of interweaving the MPIO paths while also ensuring that any subnet-local issue (such as a broadcast or unicast storm) is only likely to take down one of the subnets, not both.
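
As a purely illustrative example (the addresses and VLANs are hypothetical), a 1:1 path layout for a two-NIC server and a two-NIC array might look like this:

Path 1: Server NIC1 10.10.1.11/24 -> Storage NIC1 10.10.1.21/24 (VLAN 101)
Path 2: Server NIC2 10.10.2.11/24 -> Storage NIC2 10.10.2.21/24 (VLAN 102)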

iSCSI as part of a Virtual Network Adapter, Converged Fabric LBFO Team

Since the release of Windows Server 2012, Microsoft have hinted at the idea of using iSCSI through Converged Fabric* Load Balancing and Failover (LBFO) teams – as long as the iSCSI NICs are virtual and they connect through a Hyper-V VM Switch which itself backs onto a Windows Server LBFO team. Even the venerable Aidan Finn has hinted at it. I have, however, never seen a discussion of it being attempted online, nor have I ever seen it benchmarked.

To be clear about what we are talking about when I say a virtualised, Converged Fabric, LBFO team (a PowerShell sketch follows the list):

  1. 4x 1Gbps Ethernet physical adapters
  2. Grouped into a Windows Server 2016 LBFO Team, appearing to Windows as a single logical network adapter called “ConvergedNIC”
  3. “ConvergedNIC” is connected to an External Virtual Switch called “ConvergedSwitch”
  4. A Virtual Machine Network Adapter is created on the Hypervisor’s Parent Partition (ManagementOS) and this is assigned to the correct VLAN, given an IP address and hooked up to the iSCSI Target
  5. 4 physical NICs, no MPIO, 1 logical NIC
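
For reference, a rough PowerShell sketch of that configuration (the adapter names, VLAN ID, QoS weight and IP addressing are all assumptions):

# Team the four physical NICs and hang a virtual switch with weight-based QoS off the team
New-NetLbfoTeam -Name "ConvergedNIC" -TeamMembers "NIC1","NIC2","NIC3","NIC4" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "ConvergedNIC" -MinimumBandwidthMode Weight -AllowManagementOS $false
# Create the iSCSI vNIC in the parent partition, tag its VLAN and reserve it a slice of bandwidth
Add-VMNetworkAdapter -ManagementOS -Name "iSCSI" -SwitchName "ConvergedSwitch"
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "iSCSI" -Access -VlanId 100
Set-VMNetworkAdapter -ManagementOS -Name "iSCSI" -MinimumBandwidthWeight 40
New-NetIPAddress -InterfaceAlias "vEthernet (iSCSI)" -IPAddress 192.168.50.10 -PrefixLength 24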

So, does it work?

Yes! It does work and it appears to be stable and even usable; but with some sacrifice in performance (keep reading for some benchmark numbers as “test A” below). I have however had test VMs running under this design for nearly a year without any perceivable issues in either VM or hypervisor stability.

* If you are not familiar with the Concept of a Converged Fabric: A Converged Fabric is a data centre architecture model in which the concept of 1 NIC = 1 Network/Subnet/VLAN/Traffic Type is abandoned. Instead, NICs are usually pooled together into Teams with multiple traffic types, Networks, Subnets and VLANs being allowed to use any of the available bandwidth within the team. Quality of Service (QoS) algorithms are used to ensure that priority traffic types are defined (such as iSCSI in this example), ensuring that the iSCSI system is never starved for bandwidth by someone performing a large file transfer across the team. A Converged Fabric architecture is considered to be more efficient, lower cost and offer better failover reliability than traditional methods in which entire 1GbE or 10GbE NICs could be left idle, waiting for traffic that while high bandwidth, may be infrequent. A Converged Fabric architecture allows other users/systems to benefit from the available bandwidth when not needed by its primary application. It can also offer the primary application additional bandwidth in some situations.

If you have an 8 NIC hypervisor setup with 2 physical iSCSI NICs, 2 physical production network NICs, 1 physical heartbeat NIC, 1 physical live migration NIC, 1 management network NIC and 1 out-of-band management NIC, then you are paying to power NICs 4-8 but deriving little benefit from them due to how infrequently they are used. If this sounds familiar to you, then you should consider migrating to a Converged Fabric design.

Quantifying Best Practice

So far, this article has discussed MPIO, meshing, pathing and redundancy as well as a quick detour into using converged fabric LBFO for iSCSI connections. So let’s look at some numbers that underpin these approaches.

Tests were undertaken using the following hardware configuration:

  • Dell EqualLogic PS4110x running firmware 9.1.1 R436216, with 2 active 1GbE NICs on a single controller
  • Dell PowerEdge P630 with 8x 1GbE adapters (4x Broadcom NetXtreme and 4x Intel I350 adapters) with 9K Jumbo Frames correctly enabled
  • Windows Server 2016
  • Switching over Cat6a cabling via 2x Cisco Catalyst 2960-48s
  • The 64K block, GPT formatted, 3TB target LUN was set up as a CSV and the nodes were in a cluster, with a second identical node idling as a second cluster member (CSV-FS has a natural performance hit compared to NTFS)

7 tests were performed as outlined in the following table

Test | Description | Physical Paths (Active / Passive) | Active NICs (Intel / Broadcom) | LBFO Team | Dell HIT | MPIO Mode
A | 4 NICs in LBFO Team, no MPIO | 4 / 0 | 0 / 4 | Y | N | n/a
B | 4 NICs, fully meshed, RR | 8 / 0 | 2 / 2 | N | N | Round Robin
C | 2 NICs, no mesh (point to point) | 2 / 2 | 2 / 0 | N | N | Round Robin
D | 1 NIC only (control test) | 1 / 1 | 1 / 0 | N | N | n/a
E | 4 NICs, fully meshed, LQD | 8 / 0 | 2 / 2 | N | N | Least Queue Depth
F | 4 NICs, partial mesh, RR | 4 / 0 | 2 / 2 | N | N | Round Robin
G | 2 NICs, no mesh (point to point) with EqualLogic Host Integration Tools | 1 / 1 | 2 / 0 | N | Y | Least Queue Depth

If you are more visual, the following diagram summarises the above in a graphical format

The Results

The following table summarises the read/write performance of each test on Sequential 4MB reads and writes, as reported by “Anvil’s Storage Utilities”, version 1.1.0, build 1st January 2014. All tests were performed on the same Windows 10 Enterprise VM without rebooting between tests and without performing any other activity on the VM disk.

The results below are ordered by test, from the test offering the best performance to the test offering the worst performance, using the Read MB/s column as the sort index.

Sequential 4MB (Read)

Test | Response (ms) | MB read | IOPS | MB/s | Control Deviance (%)
C | 30.4791 | 1052 | 32.81 | 131.24 | 32.17
F | 39.801 | 804 | 25.13 | 100.50 | 1.21
D | 40.2814 | 796 | 24.83 | 99.30 | 0
A | 51.3782 | 624 | 19.46 | 77.85 | -21.60
G | 60.7197 | 528 | 16.47 | 65.88 | -33.66
E | 273.9667 | 120 | 3.65 | 14.60 | -85.30
B | 404.65 | 80 | 2.47 | 9.89 | -90.04

Sequential 4MB (Write)

Test | Response (ms) | MB written | IOPS | MB/s | Control Deviance (%)
C | 21.7266 | 1024 | 46.03 | 184.11 | 70.25
F | 468.9896 | 772 | 2.13 | 8.53 | -92.11
D | 36.9883 | 1024 | 27.04 | 108.14 | 0
A | 89.5977 | 1024 | 11.16 | 44.64 | -58.72
G | 23.8047 | 1024 | 42.01 | 168.03 | 55.38
E | 1010.7556 | 360 | 0.99 | 3.96 | -96.34
B | 964.766 | 376 | 1.04 | 4.15 | -96.16

Response (ms) = Lower is better
MB read/written = Higher is better
IOPS = Higher is better

Control Deviance (%) = the positive or negative impact in MB/s performance compared to the single NIC, no MPIO control test (test D).

Test A | Converged Fabric LBFO

The Microsoft dream of virtualising everything does hold up – at least in not being completely terrible. Sitting in the middle of the table, using a fully converged fabric, virtualised setup across 4 NICs resulted in a 22% reduction in read speed compared to a single NIC and a 59% reduction in write speed.

There may be some improvements to be made by creating multiple virtual iSCSI interfaces connected to the virtual switch, however these were not tried. Based upon the current view of the technology, while it works and offers a data centre design simplification, that simplification factor is not worth the performance sacrifice.

Test B | Round Robin, Full Mesh

This test proves that viewing an iSCSI setup as a full mesh and throwing NICs at the proverbial problem is going to do nothing to help you. Your iSCSI should be configured in a 1:1 “path” setup between initiator and target. Any additional NICs should be put into “Round Robin with subset” i.e. made to be passive fail-over adapters. That is a 90% and 96% reduction in respective read/write performance!

Test C | Round Robin, 1:1 Paths

This test proves how you are supposed to use iSCSI. Two non-crossing paths allow for a full bandwidth connection down each path between the initiator and the target. This configuration provided an increase in performance over a single adapter and was the only test that provided improvements to both read and write metrics.

Test D | Control

This was the baseline control test for this experiment. 1 NIC talking to 1 controller port. Nothing complicated here.

Test E | Least Queue Depth, Full Mesh

This test repeated Test B, but changed the MPIO model from RR to LQD to see if it made any difference. Read performance was slightly better than under RR, but was still 85% worse than the control test.

Test F | Partial Active Mesh

This test looked to see whether having a partial active mesh made any difference. There was a very small 1% increase in read performance, but a significant write penalty. In practice, you cannot push/pull 2Gbps to/from a 1Gbps source, so the design is not conducive to improved speed under a synthetic load.

Test G | Least Queue Depth, 1:1 Paths

Test G was a genuine surprise. I was expecting to see Dell EqualLogic Host Integration Tools (HIT) version 4.9 offer an increase in performance, not a decrease. However, repeating the test yielded the same results. In my experience, this has never usually been the case, with VMs feeling more responsive with HIT installed than without. Experience suggests to me that something else was at play here, perhaps the HIT version being poorly optimised for Windows Server 2016, or the Dell stack getting grumbly about the use of a retail Intel I350-T4 adapter instead of a Dell one. Dell HIT forces the use of pathing no matter what you try and sets all other adapters into passive mode. It used LQD as the MPIO algorithm. Evidently this resulted in an increase in write performance but a reduction in read performance against the control, with neither metric reaching the heights of the non-HIT 1:1 test (test C).

Although not shown in the results above, HIT did help improve performance in some of the Anvil tests. The high queue depth tests resulted in slightly higher IOPS figures for both read and write values. None of the other tests yielded such an improvement.

Conclusion

As you can see from these results, there is only one way that you should be conceptually thinking about your iSCSI environment – 1:1, point to point paths. Anything over and above this should be set to passive/failover/offline in order not to impact performance.

General Subnet Recommendations

Subnet recommendations go hand in hand with this, but you should note that they are generally made by the storage vendor – and you should follow their advice. I have encapsulated the general recommendations/requirements of a number of providers in the table below. The subnet count column is in essence a statement that, for each NIC on the storage device, there should be a dedicated subnet (and ideally broadcast domain/VLAN) back to the iSCSI server.

Vendor | Subnet Count | Source
Dell (Non-EqualLogic) | 2 | View
Dell EMC | 2 | View
Dell EqualLogic | 1 | View
Microsoft | 2 | View
NetApp | ? | I couldn’t find any guidance from an official source. There is community evidence of both being used by end-users
NetGear | 2 | View
QNAP | 2 | View
Synology | 2 | View

As you can see, with the exception of Dell EqualLogic, which provides a middleware solution known as the Host Integration Tools (HIT) to cope with this, most vendors are quite specific on the use of a “single path” logical topology for server/storage connectivity – aka one subnet per storage appliance NIC.

General Advice

I will end this piece with some general advice and tips for working with MPIO. It isn’t exhaustive, but they are some quick observations from experience of using the technology for many years. Some of them are obvious, some of them might help you avoid a head scratcher.

  1. If you are using an enterprise iSCSI solution, follow the vendor’s advice, forget anything you read on the Internet. Everyone is a know-it-all on the internet and there are plenty of “I’m a Linux user so I know best” screaming matches about how EqualLogic are wrong about the recommendations for EqualLogic’s own hardware. I’m pretty sure that EqualLogic… uh, tested their stuff before writing their user manual.
  2. If you are using an enterprise solution and the vendor offers a DSM (MPIO driver), use it. Dell HIT is noticeably faster than the generic Microsoft DSM for Windows Server, but naturally only works with Dell SAN hardware. Also ensure that you keep your DSMs up to date.
  3. Follow your vendor’s guidelines with respect to subnets. If in doubt, drop them an email. You’ll usually find them quite accommodating.
  4. Unless your vendor has expressly told you to, do not MPIO back from the storage system – i.e. don’t team, MPIO or load balance on the storage side. Do it all on the server initiating the request.
  5. Stick to two port/1:1 path MPIO designs. If you need more create multiple pairs and have each on different networks going to different storage systems so that the driver knows where to send traffic explicitly while maintaining isolation.
  6. If you want to think about your MPIO as a meshing design, it has to be meshed for redundancy, not active links (unless your system needs to keep living, breathing human beings alive and do so at all costs).
  7. With iSCSI and SAN MPIO, try and avoid network hops (routers).
  8. All ports in a group must be the same type, speed and duplex.
  9. Disable port negotiation and manually set the speed on the client and switch, this will make failover/failback processes faster for your redundant paths.
  10. Use VLAN’s as much as possible (try and avoid overlaying broadcast domains across a shared Layer 2 topology).
  11. Use Jumbo Frames as much as possible unless the iSCSI subnet involves client traffic.
    Hint: Your iSCSI subnet should not involve client traffic!
  12. Ensure that your NIC drivers and firmware are kept up-to-date
  13. Disable all Windows NIC service bindings apart from vanilla IPv4 on your iSCSI networks. For example, Client for Microsoft Networks, QoS Packet Scheduler, File and Printer Sharing for Microsoft Networks etc. If you aren’t using it, disable IPv6 too on the iSCSI interfaces to prevent IPv6 node-chatter.
  14. In the driver config for your server grade NIC (because you are using server grade NIC’s, right?) max out the send and receive buffer sizes on the iSCSI port. If the server NIC has iSCSI features that are relevant (such as iSCSI offloading), enable them.
  15. When you are building a Windows Server, script the MPIO install, enable MPIO during the script and set the default policy as part of the build process (see the sketch at the end of this list) – then patch and REBOOT the system before you even start configuration. If I had a £1 for every time I’d had to rescue someone from not doing that and then not REBOOTING…
  16. If you are using a SOHO/SME general purpose commodity NAS, if (and only if) you have a UPS, disable Journaling and/or Sync Writes on your iSCSI partitions/devices. There is a benefit, but remember if you are hosting SMB shares on a commodity appliance you actually do want Journaling running on those volumes.
  17. Keep your NAS/SAN firmware up to date.
  18. Keep your storage system and iSCSI block sizes, cluster and sector sizes optimised for the workload. Generally this means bigger is better for virtualisation storage and video. 256/64K, 128/64K or 64/64K depending on what your solution can offer.
  19. Keep volumes under 80% of capacity as much as possible.
  20. Use UPS’s: Remember, iSCSI and SAS are hard drive/storage protocols. They are designed to get data onto permanent storage medium just like RAID controllers. RAID controllers have backup batteries because you do not want to lose what is in process in the RAID controller cache when the power goes out. Similarly, you need to think of your iSCSI and External SAS sub-systems much the same as you would a RAID sub-system.
  21. If you have a robust UPS solution, enable write caching and write behind/write back cache features on your storage systems and iSCSI mounted services to gain extra performance benefits. Be mindful that there is risk in this if your power and shutdown solution isn’t bullet proof.
  22. Test it! Build a test VM and yank a cable out a few times. You’ll be glad you sacrificed a Windows install or two to ensure it is right when you actually pull an iSCSI cable out of a running server… Believe me I know what a relief that is.
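
As a footnote to point 15, a minimal sketch of what that scripted MPIO step might look like on Windows Server using the in-box cmdlets (the chosen policy is only an example; follow your vendor’s guidance):

# Install the MPIO feature, claim iSCSI-attached disks with the Microsoft DSM,
# set a default load-balance policy and reboot before any further configuration
Install-WindowsFeature -Name Multipath-IO
Enable-MSDSMAutomaticClaim -BusType iSCSI
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR
Restart-Computer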