Redesigning the Hardware for the Virtual TV Streaming Server

This article discusses a hardware design change to the Virtual TV Streaming Server discussed in Creating a Virtual TV Streaming Server.

If you are not familiar with the previous setup, the design revolved around an array of TV tuners connected to a 7-port USB 3.0 hub. In turn, this connected to a USB 3.0 controller which was passed through to a Windows 10 Virtual Machine using Discrete Device Assignment (DDA). The VM ran DVBLogic TV Mosaic, the IP TV streaming software.

 

Virtual TV Streaming Server Meltdown

The solution has run extremely well. There have been no crashes from TV Mosaic, the VM or the Hypervisor. Until last week.

The system missed last Saturday’s recording schedules and, on Sunday afternoon, wouldn’t initiate playback. On inspection of the VM, one of the tuners was showing as “unknown” on the TV Mosaic console. The others were all fine. Once this phantom tuner was removed from the console, everything started working again.

I initially thought that the fault was related to a coincidental BIOS update on the server, but it turned out that the tuner was simply dead. I RMA’d it with DVBLogic – who didn’t challenge my diagnosis or offer any resistance – but I did have to ship it internationally at my own expense.

A week later, I came to use the system again and, once more, it was dead. A trip to the attic later and it turned out that the USB hub was dead. A multimeter confirmed that the power supply had died, and I began an RMA process with StarTech this morning.

 

Analysis

If the power supply on the StarTech was defective, it could potentially have caused the fault with the TV Butler tuner, although this is speculative and unprovable. My main suspicion is that the problems were caused by heat. The attic roof space is uninsulated and it is summer in the UK, so the temperature in the attic is certain to have reached the 40s (°C).

Unlike the physical TV server that this setup replaced, which had fans, this setup has none. PCIe TV Tuners are intrinsically designed to withstand higher thermal variances than USB ones. The StarTech and TV Butler products are quite simply basic consumer devices. It is possible that this factor led to both of their demises.

There was a power outage mid-week last week, and the StarTech itself was not sitting on the server UPS – but it was on a surge protector. It is my belief that this did not contribute to the issue.

 

Hardware Redesign

The brief for the redesign is simple:

  1. Remove essential electrical components from the attic
  2. Minimise space use
  3. Minimise electrical consumption (as everything will now be powered through the UPS)
  4. Do not clutter up the backplane of the server with dongles

 

Power

To accommodate #1, #2 and #3 the USB Hub is going to be eliminated from the design. The TV Tuners will now connect directly to the DDA USB controller. In order to do this, the dual port controller will need to be replaced.

After deliberating on whether to get an externally powered or bus powered 4-port controller, I chose a bus powered card. This is a risk, given my previous experience here. The DG-PCIE-04B reviewed better than a similarly priced externally powered one. The decider was that it uses an NEC chipset and not a RealTek/SiS (i.e. cheap) chip. Finally, the fact that each of the ports has its own voltage management and fuse circuit is a valuable quality safeguard.

 

Patch Panel TV

To satisfy design brief #4, the USB TV Tuners will need to be mounted away from the server. To achieve this, I am going to mount the Tuners in the patch panel using a set of keystone jacks. A USB lead will run between the USB controller and each keystone jack in the patch panel, and the TV Tuners will simply plug into the other side of the jacks.
TNP USB 3.0 Keystone Jack Image

The patch panel happens to be near the ceiling, directly above the TV aerial distributor for the house. Using 4m coaxial cable, the aerial feed can route through the existing ceiling cable run and clip neatly into the TV Tuners.

The Amazon order consisted of

  • 1x
  • 1x Pack of 5
  • 4x Rankie USB 3.0 Type A Male to Male Data Cable, 3m (Server – Patch Panel)
  • 3x Ex-Pro White Coax F Plug Type – to – Male M Coax plug Connection Cable Lead – 4m (Aerial distributor – TV Tuners)

 

Installation

The installation was extremely simple.

  1. Replace the existing 2 port USB controller with the 4 port one
  2. Clip the USB 3.0 keystones into the patch panel
  3. Run cables between the USB controller and the front profile (base) of the USB keystones
  4. Pass the USB controller through to the VM in Hyper-V
    1. Shut down the Virtual Streaming TV Server VM
    2. Get the Device Instance Path of the USB controller from Device Manager (Details tab > Device instance path), e.g.
      PCI\VEN_1912&DEV_0014&SUBSYS_00000000&REV_03\4&1B96500D&0&0010
    3. Use PowerShell to dismount the USB Controller from the Hypervisor and attach it to the VM
$vmName = 'TvServer'
# Find the USB controller on the host by its Device Instance Path
$pnpdevs = @(Get-PnpDevice -PresentOnly | Where-Object {$_.InstanceId -eq 'PCI\VEN_1912&DEV_0014&SUBSYS_00000000&REV_03\4&1B96500D&0&0010'})
$instanceId = $pnpdevs[0].InstanceId
$locationPath = ($pnpdevs[0] | Get-PnpDeviceProperty DEVPKEY_Device_LocationPaths).Data[0]
Write-Host "    Instance ID: $instanceId"
Write-Host "    Location Path: $locationPath"

# Disable the Device on the Host Hypervisor
Disable-PnpDevice -InstanceId $instanceId -Confirm:$false

# Wait for the dismount to complete
Start-Sleep -s 15

# Dismount the Device from the Host Hypervisor
Dismount-VmHostAssignableDevice -locationpath $locationPath -Force

# Attach the PCIe Device to the Virtual Machine
Add-VMAssignableDevice -LocationPath $locationpath -VMName $vmName

# Note: You may need to reboot the Hypervisor host at this point.
# If the VM's Device Manager reports that it can see the controller but is unable to initialise
# the controller's USB Root Hub, a reboot should fix it.
  5. Clip the DVBLogic TV Butler TV Tuners into the patch panel USB keystone jacks using the inside (top) port on the keystones
  6. Start the TV Server VM
Photograph of USB Tuners mounted in patch panel
The patch panel now has three USB ports – the left-most TV Butler is missing as the RMA replacement has not yet arrived.
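
Before dismounting anything from the host, it can be worth double-checking that the Device Instance Path you copied really points at the intended controller. The following is only a rough sketch, written in Python purely for illustration; the function name and the regular expression are my own assumptions about the general shape of a PCI instance path, not anything taken from Microsoft documentation.

```python
import re

# Assumed layout of a PCI Device Instance Path:
#   PCI\VEN_xxxx&DEV_xxxx&SUBSYS_xxxxxxxx&REV_xx\<instance>
PCI_ID = re.compile(
    r"^PCI\\VEN_(?P<vendor>[0-9A-F]{4})&DEV_(?P<device>[0-9A-F]{4})"
    r"&SUBSYS_(?P<subsys>[0-9A-F]{8})&REV_(?P<rev>[0-9A-F]{2})\\(?P<instance>.+)$",
    re.IGNORECASE,
)


def parse_pci_instance_path(path: str) -> dict:
    """Split a PCI Device Instance Path into its ID fields (hypothetical helper)."""
    match = PCI_ID.match(path)
    if match is None:
        raise ValueError(f"not a PCI device instance path: {path!r}")
    return match.groupdict()


# The instance path quoted in step 4 above.
example = r"PCI\VEN_1912&DEV_0014&SUBSYS_00000000&REV_03\4&1B96500D&0&0010"
fields = parse_pci_instance_path(example)
print(fields["vendor"], fields["device"], fields["rev"])
```

If the vendor and device IDs do not match the card you expect, stop before running the dismount commands.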

The Virtualised Windows 10 Streaming TV Server came back online and there hasn’t been any instability caused by the bus-powered USB controller. The TV Butlers are warm to the touch but have plenty of air-flow, and the ambient temperature can be monitored via existing sensors in the room.

The completed assembly in the Patch Panel

With any luck, I will not need to revisit this project for quite some time!

VBScript Timer() function precision

This article explores the numeric precision of the ASP and VBScript Timer() function, outlining that it may be more accurate than it initially appears.

 

Timer()

Prior to the arrival of PowerShell, the VBScript Timer() function was the closest thing that script creators had for undertaking high precision timings in the Windows Scripting Host (WSH) environment.

The Timer() function returns a VBSingle – aka a Single Precision floating point number or a “Real” – value as a representation of the system’s real time clock. As a Single precision value, the permitted range is -3.402823E38 to -1.401298E-45 for negative values; 1.401298E-45 to 3.402823E38 for positive values. If you print out the value of Timer() in ASP/VBScript:

WScript.Echo Timer()

You will get a value like

52191.62

This value is a representation of the number of seconds that have elapsed on the local system since its clock last hit midnight. Consequently, evaluating Timer() against a DateDiff() evaluation of the number of seconds since midnight will result in the same answer (save for the fractional part).

WScript.Echo Timer()
WScript.Echo DateDiff("s", #2019-07-01#, Now())

Which results in

52191.62
52191
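
The same relationship can be sketched outside of VBScript. The snippet below is an illustration in Python only; the helper name and the sample timestamp are my own, chosen so that the result lines up with the 52191.62 reading shown above (14:29:51.62 on the day in question).

```python
from datetime import datetime


def seconds_since_midnight(moment: datetime) -> float:
    """Seconds elapsed since local midnight, including the fractional part."""
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    return (moment - midnight).total_seconds()


# Hypothetical sample time, picked to match the example output above.
sample = datetime(2019, 7, 1, 14, 29, 51, 620000)

fractional = seconds_since_midnight(sample)   # behaves like Timer()
whole = int(fractional)                       # behaves like DateDiff("s", midnight, Now())
print(fractional, whole)
```

The whole-second figure is simply the fractional one with everything after the decimal point discarded.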

 

The hidden detail

Masked by the default printout of 2 decimal places, Timer() is significantly more detailed than just two decimal places. Depending on the precision of your system’s Real Time Clock (RTC), the precision may be up to 7 decimal places. You can view your system’s capability by subtracting the integer part from the timer value:

WScript.Echo Timer() - Int(Timer())

Which may result in a value such as

0.6171875

At 7 decimal places, the precision of Timer() is – floating point number inaccuracy aside – considerably better than that of VBDateTime. Under VBDateTime, the second is the atomic value, offering no more precision.

0         - Second
0.6       - Decisecond  / 1 tenth
0.61      - Centisecond / 1 hundredth
0.617     - Millisecond / 1 thousandth
0.6171    -               1 ten thousandth
0.61718   -               1 hundred thousandth
0.617187  - Microsecond / 1 millionth
0.6171875 -               1 ten millionth

This demonstrates that there is flexibility in VBScript for more precise clock operations. But is the resolution high enough?
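
One caveat is worth checking before reading too much into the seven digits. A Single has only a 24-bit significand, so the gap between adjacent representable values grows with the size of the number being stored. The sketch below uses Python rather than VBScript and assumes that Timer() really is held as an IEEE 754 single, as described above; the helper names are mine.

```python
import math
import struct


def to_single(value: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def single_spacing(value: float) -> float:
    """Gap between adjacent singles near a normal, non-zero value."""
    _, exponent = math.frexp(abs(value))   # value = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** (exponent - 24)          # a single carries a 24-bit significand


reading = 52191.62                 # the example Timer() value from above
step = single_spacing(reading)
print(step)                        # smallest step a Single can show at this time of day
print(to_single(reading))          # what 52191.62 becomes once stored as a Single
print(0.6171875 / step)            # the sample fraction expressed in steps
```

Under that assumption the step at this time of day works out at 1/256 of a second (0.00390625), and the sample fraction 0.6171875 is exactly 158 such steps. The long decimal expansion is therefore at least partly a by-product of binary fractions rather than proof of a ten-millionth of a second clock. Between 65536 and 86400 seconds (after 18:12:16) the same calculation gives a step of 1/128 of a second.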

 

Fidelity

The resolution and fidelity of the Timer() function are what make it valuable (or not) to a programmer. Even on a modern system with a High Precision Event Timer (HPET), the update interrupt will only fire so many times per second to update the clock.

Running an imprecise test as follows:

for i = 0 to 999
  WScript.Echo Timer() - Int(Timer())
next

The output value only changed once every 10 to 31 cycles (reflective of the CPU scheduler performing other tasks during execution). The counter incrementation was consistent, updating 48 times with an increment of between 0.0117187 and 0.0195313 seconds, or once every 20.8 cycles on average.

0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.0195313
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.015625
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
0.0117187
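
For anyone wanting to repeat the test, the summary figures quoted above (number of updates, size of each increment, polls per update) can be derived from a list of readings with something like the following. This is a minimal sketch in Python with a hypothetical helper name, and the readings in it are synthetic, generated only to exercise the function; they are not my measured data.

```python
def summarise_changes(readings):
    """Summarise how often a polled timer value actually changed.

    Returns (number_of_changes, increments, average_polls_per_change).
    """
    increments = []
    last = readings[0]
    for value in readings[1:]:
        if value != last:                 # the clock ticked between two polls
            increments.append(value - last)
            last = value
    if increments:
        polls_per_change = len(readings) / len(increments)
    else:
        polls_per_change = float("inf")   # the value never changed
    return len(increments), increments, polls_per_change


# Synthetic readings for illustration only: a clock that ticks on every 4th poll.
readings = [(poll // 4) * 0.015625 for poll in range(20)]
changes, increments, polls_per_change = summarise_changes(readings)
print(changes, set(increments), polls_per_change)

# The figures from the test above: 1000 polls and 48 updates.
print(round(1000 / 48, 1))
```

Feeding it the 1000 values printed by the loop would give the update count and the list of increments directly.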

In practice this means that the viable, comparable resolution of Timer() is not much better than once every 0.02 seconds. Over a longer time period, Timer() can offer higher accuracy, provided you aren’t polling for an update more often than every 0.02 of a second.

Scanning and repairing drive 9% complete – the curse of chkdsk

This article discusses an issue of a computer getting stuck at boot with the message “Scanning and repairing drive 9% complete” with chkdsk hanging at 9%.

The hypervisor was 12 months overdue for a BIOS update. Updating the UEFI should be simple enough; however, SuperMicro have a nasty habit of clearing the CMOS during BIOS updates. Why most other OEMs are able to transfer settings and SuperMicro insists on not doing so is one of only a few gripes that I have ever had with the firm. Yet it is a persistent one that I’ve had with them going back to 1998.

The Fault

After the successful update, I reset the BIOS to the previous values as best I could recall. Unfortunately I also enabled the firmware watchdog timer.

SuperMicro’s firmware level watchdog timer does not operate as you might expect. It requires a daemon or service to be present within the running operating system that polls the watchdog interrupt periodically. If the interrupt isn’t polled, the firmware forces a soft reboot. Supermicro do not provide a driver to do this for Windows, although their IPMI implementation can do so.

Five minutes after the POST, the hypervisor performed an ungraceful, uninitiated reset. Following the first occurrence I assumed it was completing Windows Update. After the second, I was looking for a problem and, after the third (and a carefully placed stopwatch), I had a suspicion that I must have turned on the UEFI watchdog.

I was correct and, after disabling it, the issue was resolved.

This particular hypervisor has SSD block storage for VMs internally and large block storage for backup via an external USB 3.1 enclosure – a lot of it. Without giving it any thought, I told the system to

chkdsk <mountPoint> /F

Note that this does not include the /R switch to perform a 5 step surface scan. I told chkdsk not to dismount the volume, but to bundle all of the scans together during the required reboot to scan the C:. Doing it this way meant that I could walk away from the system. In theory this would mean that when chkdsk finished, it would rejoin the Hyper-V cluster on its own and become available to receive workloads.

… and restarted.

 

Scanning and repairing drive 9% complete

chkdsk skipped the SSD storage as it is all configured as ReFS. Under ReFS, disk checking is not required as it performs journaling activities in the background to preserve data integrity. Unfortunately, the external backup enclosure volume was NTFS. It would be scanned – and it was also quite full.

The system rebooted and sat at the intermediate chkdsk stage of the NT boot process. It zipped through the SSD NTFS boot volume in a few seconds, before hitting the external enclosure. Within around 5 minutes it had arrived at the magic “9% complete” threshold.

1 hour, 2 hours, 4 hours… 8 hours. That turned into 24 hours, and the message was still the same.

Scanning and repairing drive (F:): 9% complete.

Crashing the chkdsk

The insanity of waiting over 24 hours had to come to an end and I used IPMI to forcefully shutdown the server.

After a minute or two, I powered it back on, only to be met with a black screen of death from Windows after the POST.

The message stated that c:\pagefile.sys was corrupt and unreadable, with the options to perform a system recovery or press Enter to load the boot menu. On pressing Enter, the single option to boot Windows Server 2019 was present and, after a few moments, Windows deleted the corrupt pagefile.sys, recreated it and booted, much to my relief.

I then ran

chkdsk c: /f

and rebooted, which completed within a few seconds and marked the volume as clean, with no reported anomalies.

The Windows System Event Log contained no errors (in fact, as you might expect, no data) for the 24 hour period that the server had been ‘down’. There were no ‘after the event’ errors added to the System log or any of the Hardware or Disk logs either. For all intents and purposes, the system reported as fine.

 

Trying chkdsk for a second time

I decided to brave running chkdsk on the external enclosure again, initially in read-only mode:

chkdsk F:

Note the absence of the /F switch here.

It zipped through the process in a few seconds stating

Windows has scanned the file system and found no problems.
No further action is required.

Next I ran a full 3-phase scan

chkdsk F: /F

Again, it passed the scan in a few seconds without reporting any errors. So much for the last 24 hours!

 

Analysis

The corruption in the page file indicates that Windows was doing something. The disk array was certainly very active: the disk activity LEDs, the noise from the drives and the data from the power monitor on the server all confirmed that “something” was happening. Forcibly shutting down the system killed the page file during a write. Had it been a 5-step chkdsk F: /f /r scan, I could understand the length of time that it was taking.

With chkdsk /f /r – assuming a hard drive with 512 byte sectors – the system has to test 1,953,125,000 sectors for each terabyte of disk space. Depending on the drive speed, CPU speed and RAM involved, it isn’t uncommon to hear of systems taking 5 hours per terabyte to scan. This scan was not a 5-step scan, just a 3-step, and a live Windows environment could scan the same disk correctly in a few seconds.
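
The sector figure is simple arithmetic, and it can be pushed one step further to see what “5 hours per terabyte” implies in throughput terms. The sketch below is in Python for illustration and assumes a decimal terabyte (10^12 bytes) and 512 byte sectors; the 5 hour figure is the anecdotal one quoted above, not a measurement from this system.

```python
SECTOR_BYTES = 512
TERABYTE = 10**12                 # decimal terabyte, as used by drive vendors

sectors_per_tb = TERABYTE // SECTOR_BYTES
print(f"{sectors_per_tb:,} sectors per terabyte")

hours_per_tb = 5                  # anecdotal figure for a /f /r surface scan
scan_seconds = hours_per_tb * 3600

sectors_per_second = sectors_per_tb / scan_seconds
megabytes_per_second = TERABYTE / scan_seconds / 1e6
print(f"{sectors_per_second:,.0f} sectors/s, about {megabytes_per_second:.1f} MB/s")
```

That works out at roughly 108,500 sectors per second, or a little under 56 MB/s of sustained reading, which puts the 24 hours spent at 9% on a scan without /r into perspective.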

Resources were not an issue in this system. Being a hypervisor, it had 128GB of RAM and was running with 2018 manufactured processors.

My suspicion is that the problem exists because of a bad interaction between the boot level USB driver and the USB enclosure. The assumption is that Windows fell into either a race condition or a deadlocked loop. During this fault, chkdsk was genuinely scanning the disk and diagnostic data was being tested in virtual memory (i.e. in the page file) but it was never able to successfully exit.

The lesson that I will take away from this experience is to avoid using a boot cycle chkdsk to perform a scan on a USB disk enclosure.

“TsManager.exe – Entry Point Not Found” error during MDT task sequence

This article discusses an error that you may receive when performing an MDT Windows Client Upgrade Task Sequence. The error message “TsManager.exe – Entry Point Not Found” may halt, but not end, your Task Sequence process, or may cause it to fail prematurely part-way through the task sequence.

Error Message Screenshot of TsManager.exe - Entry Point Not Found

TsManager.exe - Entry Point Not Found
The procedure entry point MDMIsExternallyManaged could not be located in the dynamic link library C:\WINDOWS\CCM\lsutilities.dll

Note: The above error is for “LSUtilities.dll” not “iSUtilities.dll”

The name of the entry point stated in the error may differ depending on:

  1. The stage in the MDT Task Sequence where the error occurs
  2. The version of the SCCM Client installed on the computer. The above error is from SCCM Client version 5.00.8790.1025 (SCCM 1902 Update 1) at the end of the Post-Processing phase of the Task Sequence

 

More Info

“TsManager.exe – Entry Point Not Found” is caused by a mismatch between the MDT integration with SCCM and the SCCM Client LSUtilities.dll – which provides Location Services for the SCCM Client.

Microsoft have made the assumption that if you are using the SCCM client and MDT, that MDT has been integrated into SCCM. If you are running a stand-alone MDT LTI build process, you may encounter this problem. The issues occur because MDT integration modifies MDT to work with the task sequence and the SCCM client. Without the integration, the SCCM client’s MDT hooks still fire, but cannot function due to MDT not having been upgraded.

 

Check the Client Version

Ensure that the SCCM Client version you are deploying is an approximate match to the version that was in General Availability at the time the MDT build was released.

For example, MDT version 8456 was released 25th January 2019. By using the SystemCentreDudes SCCM Client Version list, you can see that the active release of SCCM in January 2019 would have been SCCM 1810. Consequently, you should ensure that the SCCM Client version deployed on the system/via the MDT Task Sequence is an SCCM 1810 release.

This means that you should avoid using the SCCM 1902 client with a stand-alone MDT 8456 build server.

Ensuring client version parity will not prevent all Entry Point errors, however it will minimise the possible range of errors that can be encountered by the Client/MDT. SCCM will self-update the client to the current version as part of its client servicing activities post-install.
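
If you want to script the parity check, a comparison along the following lines may be enough. It is a sketch in Python with hypothetical helper names, and it rests on an assumption of mine: that the first three fields of the client version identify the SCCM release while the fourth only changes with updates to that release. The 1902 value is the one quoted earlier in this article; the second value is a placeholder for illustration and should be replaced with the real number from the version list.

```python
def parse_client_version(text: str) -> tuple:
    """Turn a version such as '5.00.8790.1025' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def same_release(installed: str, expected: str) -> bool:
    """True when two client versions share their first three fields.

    Assumption: those three fields identify the SCCM release.
    """
    return parse_client_version(installed)[:3] == parse_client_version(expected)[:3]


installed = "5.00.8790.1025"      # SCCM 1902 Update 1 client, as quoted above
expected = "5.00.8740.1000"       # placeholder only: look up the real 1810 value

print(same_release(installed, expected))
```

A False result here is the cue to swap the client package for one that matches the MDT build before running the task sequence.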

View: SystemCentreDudes “SCCM Version Numbers and Cumulative Update List”

 

Preventing an irrecoverable failure of the task sequence

You must uninstall the SCCM client from the workstation before the operating system upgrade commences. You can automate this as part of your task sequence. The only requirement is for it to have been uninstalled before the first reboot of the system by MDT.

  1. Open the “Task Sequence” tab in the properties of your Task Sequence
  2. In the “Preparation” section, after “Validate” add a new “Run Command Line”
  3. Name the task:
    Uninstall SCCM Client
  4. Enter the Command Line:
    cmd.exe /c ""C:\Windows\ccmsetup\ccmsetup.exe" /uninstall & RD /S /Q "C:\Windows\ccmsetup" & RD /S /Q "C:\Windows\CCM""

The first part of the command shuts down all SCCM client services and performs the uninstall:

"C:\Windows\ccmsetup\ccmsetup.exe" /uninstall

The second part of the command deletes the “C:\Windows\ccmsetup” and “C:\Windows\CCM” folders, ensuring that LSUtilities.dll cannot be present during the imaging process.

Note: Deleting the CCM folder will delete all historic log data associated with the client.

 

Reinstall the SCCM Client

To reverse the removal and restore the SCCM client, you must introduce its re-installation later in the task sequence. To reinstall the SCCM Client add the SCCM client setup as a new application in the Applications section of your Deployment Share. The following example is for the SCCM 1902 client.

MDT Application Definition for SCCM Client
MDT Application SCCM Client Install Command

Note: Do not select “Reboot the computer after installing this application”. Doing so will force the SCCM client to interact with MDT.

An example of the install command would be:

ccmsetup.exe /service /mp:<management point FQDN> /forceinstall SMSSITECODE=<site code> DNSSUFFIX=<domain suffix> FSP=<server> SMSMP=<server FQDN>

Return to the Task Sequence tab in the properties of the task sequence.

Create an “Install Application” task as late as possible in the “Post-Processing” section of the Task Sequence. Select a single application and browse for the newly created SCCM Client package.

Note: Avoid running additional tasks after the reinstallation of the client as you increase the risk of experiencing an associated fault once the system has restarted.