QNAP TVS-1271U Top Row of Drives (drives 9-12) do not start on boot/reboot of the NAS appliance

System Requirements:

  • QNAP TVS-1271U
  • QTS 4.3

The Problem:

A brand-new, out-of-the-box 12-drive 2U NAS appliance, with its “firmware” up to date, gets put into production. During its first maintenance cycle, the firmware is updated again and the appliance is rebooted.

On completion of the reboot, the array is offline. Visual inspection of the appliance reveals that the top row of drives in the array (drives 9, 10, 11 and 12) is offline. They are simply not powered. Drives 1-8 are online and all green.

Rebooting the array shows the diagnostic LEDs on all 12 drives flash red, but the 4 drives in question then go into a powered-down state and the boot completes with an inconsistent volume layout.

  1. Hot swapping the drives did not make a difference.
  2. Rebooting the array did not make a difference.
  3. Testing all of the top-row drives externally showed no errors on any of them.
  4. Performing a couple of hard power-downs and cold boots did solve the problem and allowed the array to start normally.

The same thing happened during the next test and diagnostics window.

The Fix:

Disgruntled customer note: I want to add at this point that trying to phone QNAP UK is an exercise in futility. I sat in their queue system for more than half an hour listening to the same loop of music without getting anywhere. By the time that I gave up, I’d managed to get the thing to start through cold boot cycles and done quick SMART tests on all of the drives. When it came to this occurring the second time, I used the online chat feature with their people in Taiwan, who were more responsive and most helpful, but would not entertain speaking on the phone come what may.

My bigger issue is that QNAP support seems to have little sympathy for the needs of a production IT department, whereas you would have thought that QNAP would be the kings of working to such constraints.

 

Once I had got through to second-line support, the fix itself was quite simple in the end, but its implementation leaves something to be desired. The system needed a BIOS update.

You might assume that when you update your “firmware”, this sort of thing is accommodated in those updates. Apparently not! It seems that this new device may have been sitting in a warehouse for some time and so was out of date, but its firmware was updated as soon as it was first booted. You would have expected this sort of well-known, intermittent issue to be dealt with through the update delivery mechanism.

As soon as the BIOS of the TVS-1271U was updated to version QW10IR12 and the unit rebooted, the problem was fixed. Why QNAP has not put information about this in its website knowledge base, I do not know. That would seem more than sensible. QNAP does itself more reputational damage by trying to hide the issue and hoping that only a few people see it. The reality is that they are likely causing stress and grief to end users and unnecessary RMAs for their suppliers.

After checking the power rails, our initial conclusion was that the backplane was dead and that it would have to be an RMA. Fortunately it wasn’t, and fortunately I went to QNAP before going back to the supplier, but you do not always do that, especially with companies increasingly insisting that you deal with suppliers for RMA processes. A little public disclosure about this known issue would have saved a lot of headaches (and disclosure makes you look good, QNAP!).

The execution of the BIOS update itself was unfortunately quite cumbersome, requiring the support tech to back up and then re-programme all of the NIC MAC addresses built into the motherboard. This is something that could easily have been sorted in a shell script and then transparently bundled into the firmware delivery, saving QNAP Taiwan more than an hour’s worth of time and some 8 emails.

It did, however, fix the problem, and the Taiwan support team were pretty accommodating about doing it at 7am UK time.
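As a rough illustration of the kind of shell script I have in mind for the MAC address step, here is a minimal sketch. It is not QNAP's procedure. The hal_app options are copied from the support session shown further down, while the function names, the HAL_APP and MAC_BACKUP variables and the backup file path are my own assumptions for the example.

```shell
#!/bin/sh
# Sketch only: save and restore the on-board NIC MAC addresses around a BIOS flash.
# HAL_APP and MAC_BACKUP are hypothetical, overridable settings (not QNAP names).
HAL_APP="${HAL_APP:-hal_app}"
MAC_BACKUP="${MAC_BACKUP:-/share/Public/mac_backup.txt}"

# backup_macs COUNT: write "index mac" lines for ports 0..COUNT-1 to $MAC_BACKUP.
backup_macs() {
    count="$1"
    : > "$MAC_BACKUP"
    i=0
    while [ "$i" -lt "$count" ]; do
        mac="$("$HAL_APP" --se_sys_get_mac obj_index="$i")" || return 1
        # Refuse to store anything that is not shaped like xx:xx:xx:xx:xx:xx.
        case "$mac" in
            ??:??:??:??:??:??) ;;
            *) echo "unexpected MAC for port $i: $mac" >&2; return 1 ;;
        esac
        echo "$i $mac" >> "$MAC_BACKUP"
        i=$((i + 1))
    done
}

# restore_macs: write every saved address back to its port.
restore_macs() {
    while read -r idx mac; do
        # </dev/null stops the tool from swallowing the rest of the backup file.
        "$HAL_APP" --se_sys_set_mac obj_index="$idx",value="$mac" </dev/null || return 1
    done < "$MAC_BACKUP"
}
```

The idea would be to run `backup_macs 4` before flashing (4 being the INTERNAL_NET_PORT_NUM value reported in the session below) and `restore_macs` afterwards, before the reboot. Keeping the addresses in a file rather than in someone's scrollback is the whole point.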

Here, for the benefit of the rest of the world, is the process that the tech went through to flash the BIOS. I have substituted real MAC addresses with fake ones below.

Note: I strongly recommend that you do not try this yourself and that you contact QNAP via their web chat support if you have a need to perform this procedure. If you try it, it is entirely at your own risk.

login as: admin
admin@192.168.1.1's password:
[~] # md_checker
Welcome to MD superblock checker (v1.4) - have a nice day~
Scanning system...
HAL firmware detected!
Scanning Enclosure 0...
RAID metadata found!
UUID: 3e5d7c85:95d82d3e:42647860:c0aaec32
Level: raid6
Devices: 12
Name: md2
Chunk Size: 512K
md Version: 1.0
Creation Time: Apr 19 11:57:42 2017
Status: ONLINE (md2) [UUUUUUUUUUUU]
===============================================================================
Disk | Device | # | Status | Last Update Time | Events | Array State
===============================================================================
1 /dev/sdc3 0 Active Jun 16 07:15:08 2017 156 AAAAAAAAAAAA
2 /dev/sdd3 1 Active Jun 16 07:15:08 2017 156 AAAAAAAAAAAA
3 /dev/sde3 2 Active Jun 16 07:15:08 2017 155 AAAAAAAAAAAA
4 /dev/sdf3 3 Active Jun 16 07:15:08 2017 155 AAAAAAAAAAAA
5 /dev/sdk3 4 Active Jun 16 07:15:08 2017 155 AAAAAAAAAAAA
6 /dev/sdl3 5 Active Jun 16 07:15:08 2017 155 AAAAAAAAAAAA
7 /dev/sdm3 6 Active Jun 16 07:15:08 2017 155 AAAAAAAAAAAA
8 /dev/sdn3 7 Active Jun 16 07:15:08 2017 155 AAAAAAAAAAAA
9 /dev/sdg3 8 Active Jun 16 07:15:08 2017 155 AAAAAAAAAAAA
10 /dev/sdh3 9 Active Jun 16 07:15:08 2017 155 AAAAAAAAAAAA
11 /dev/sdi3 10 Active Jun 16 07:15:08 2017 155 AAAAAAAAAAAA
12 /dev/sdj3 11 Active Jun 16 07:15:08 2017 155 AAAAAAAAAAAA
===============================================================================
[~] # cd /share/CACHEDEV2_DATA/Public/
[/share/CACHEDEV2_DATA/Public] # ls
@Recycle/ messages
[/share/CACHEDEV2_DATA/Public] # wget http://download.qnap.com/Storage/tsd/bios/TVS-1271U_QW10IR12.zip
--2017-06-16 07:16:51-- http://download.qnap.com/Storage/tsd/bios/TVS-1271U_QW10IR12.zip
Resolving download.qnap.com (download.qnap.com)... 2.17.149.135, 2a02:26f0:ec:38c::1b52, 2a02:26f0:ec:398::1b52
Connecting to download.qnap.com (download.qnap.com)|2.17.149.135|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5931607 (5.7M) [application/zip]
Saving to: ‘TVS-1271U_QW10IR12.zip’
TVS-1271U_QW10IR12.zip 100%[=====================================================================>] 5.66M 5.50MB/s in 1.0s
2017-06-16 07:16:52 (5.50 MB/s) - ‘TVS-1271U_QW10IR12.zip’ saved [5931607/5931607]
[/share/CACHEDEV2_DATA/Public] # chmod +x TVS-1271U_QW10IR12.zip
[/share/CACHEDEV2_DATA/Public] # unzip TVS-1271U_QW10IR12.zip
Archive: TVS-1271U_QW10IR12.zip
creating: BIOS_QW10IR12/
inflating: BIOS_QW10IR12/flashrom
inflating: BIOS_QW10IR12/QW10IR12.bin
[/share/CACHEDEV2_DATA/Public] # ls
@Recycle/ BIOS_QW10IR12/ TVS-1271U_QW10IR12.zip* messages
[/share/CACHEDEV2_DATA/Public] # dmidecode -t bios | grep version
[/share/CACHEDEV2_DATA/Public] # cd
[~] # cd /
[/] # dmidecode -t bios | grep version
[/] # cd -
/root
[~] # cd /share/CACHEDEV2_DATA/Public/
[/share/CACHEDEV2_DATA/Public] # head /etc/config/uLinux.conf
[System]
Model = TS-X71U
Internal Model = TS-X71
Server comment =
Version = 4.3.3
Build Number = 20170606
Number = 0224
Time Zone = Europe/London
Enable Daylight Saving Time = TRUE
Workgroup = QNAP
[/share/CACHEDEV2_DATA/Public] # cat /etc/model.conf | grep INTERNAL_NET_PORT_NUM
INTERNAL_NET_PORT_NUM = 4
[/share/CACHEDEV2_DATA/Public] # hal_app --se_sys_get_mac obj_index=0
35:35:35:35:36:31
[/share/CACHEDEV2_DATA/Public] # hal_app --se_sys_get_mac obj_index=1
35:35:35:35:36:32
[/share/CACHEDEV2_DATA/Public] # hal_app --se_sys_get_mac obj_index=2
35:35:35:35:36:33
[/share/CACHEDEV2_DATA/Public] # hal_app --se_sys_get_mac obj_index=3
35:35:35:35:36:34
[/share/CACHEDEV2_DATA/Public] # cd
[~] # cd /share/Public
[/share/Public] # ls
@Recycle/ BIOS_QW10IR12/ TVS-1271U_QW10IR12.zip* messages
[/share/Public] # cd BIOS_QW10IR12/
[/share/Public/BIOS_QW10IR12] # ls
QW10IR12.bin flashrom
[/share/Public/BIOS_QW10IR12] # ls
QW10IR12.bin flashrom
[/share/Public/BIOS_QW10IR12] # chmod +x *
[/share/Public/BIOS_QW10IR12] # ls
QW10IR12.bin* flashrom*
[/share/Public/BIOS_QW10IR12] # ./flashrom -c MX25L128050 --programmer internal -w QW10IR12.bin
flashrom v0.9.8-unknown on Linux 4.2.8 (x86_64)
flashrom is free software, get the source code at http://www.flashrom.org
Error: Unknown chip 'MX25L128050' specified.
Run flashrom -L to view the hardware supported in this flashrom version.
[/share/Public/BIOS_QW10IR12] # ./flashrom -c MX25L12805D --programmer internal -w QW10IR12.bin
flashrom v0.9.8-unknown on Linux 4.2.8 (x86_64)
flashrom is free software, get the source code at http://www.flashrom.org
Calibrating delay loop... OK.
Found chipset "Intel C226".
This chipset is marked as untested. If you are using an up-to-date version
of flashrom *and* were (not) able to successfully update your firmware with it,
then please email a report to flashrom@flashrom.org including a verbose (-V) log.
Thank you!
Enabling flash write... Warning: SPI Configuration Lockdown activated.
OK.
Found Macronix flash chip "MX25L12805D" (16384 kB, SPI) mapped at physical address 0xff000000.
Reading old flash chip contents... done.
Erasing and writing flash chip... Erase/write done.
Verifying flash... VERIFIED.
[/share/Public/BIOS_QW10IR12] # hal_app --se_sys_set_mac obj_index=0,value=35:35:35:35:36:31
eth port = 0, Set MAC address = 35:35:35:35:36:31,ret = 0
[/share/Public/BIOS_QW10IR12] # hal_app --se_sys_set_mac obj_index=1,value=35:35:35:35:36:32
eth port = 1, Set MAC address = 35:35:35:35:36:32,ret = 0
[/share/Public/BIOS_QW10IR12] # hal_app --se_sys_set_mac obj_index=2,value=35:35:35:35:36:33
eth port = 2, Set MAC address = 35:35:35:35:36:33,ret = 0
[/share/Public/BIOS_QW10IR12] # hal_app --se_sys_set_mac obj_index=3,value=35:35:35:35:36:34
eth port = 3, Set MAC address = 35:35:35:35:36:34,ret = 0
[/share/Public/BIOS_QW10IR12] # reboot
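
A note on checking the result after the reboot. In the session above, `dmidecode -t bios | grep version` printed nothing, most likely because dmidecode capitalises the field name ("Version:") and grep is case-sensitive. The sketch below shows one way to pull the version out regardless of case. The function names are mine, and I am assuming that dmidecode reports the version in the same QW10IR12 form as the file name, which I did not confirm on the unit.

```shell
#!/bin/sh
# Sketch only: read the BIOS version from `dmidecode -t bios` output.

# bios_version: print the value of the first "Version:" line read on stdin.
bios_version() {
    # Anchoring at the start of the line skips lines such as "BIOS Revision: 4.6".
    sed -n 's/^[[:space:]]*[Vv]ersion:[[:space:]]*//p' | head -n 1
}

# check_bios_version EXPECTED: succeed only if the running BIOS reports EXPECTED.
check_bios_version() {
    expected="$1"
    actual="$(dmidecode -t bios | bios_version)"
    [ "$actual" = "$expected" ]
}
```

Something like `check_bios_version QW10IR12 && echo "BIOS updated"` would then confirm the flash took effect. Simply using `grep -i version` in the original command would also have done the job.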