Dell Latitude CSx J500XT (CSx H500XT)

System Requirements:

  • Dell Latitude CSx

The Problem:

Welcome to wonderland, Alice!

I was given a Dell Latitude CSx J500XT (POST screen) – a Latitude CSx H500XT (BIOS) – and wanted to see what this slim line pup was made of. This article simply outlines my playing with the device.

More Information:

It’s quite a natty little device for its age – very light and slim line – and it has some teeth, given its PIII Coppermine back end.

When I originally got my hands on the system, it was billed as being non-functional and was equipped with the following specification:

  • PIII 500MHz
  • 128MB PC100 DIMM
  • 20GB Hitachi Hard Drive
  • Windows NT 4.0 Workstation

Windows 2000

A system with a PIII and 384 MB of RAM can hack something a little more modern than NT 4.0 (not that I have anything against NT 4.0 mind), so I figured that I would try Windows 2000 SP4 (see my slipstream guide) on the system. There was a slight problem however.

No CD/DVD drive. This system is slim line and cannot accept the usual Dell modular bay hardware; instead it is supposed to have an external bay device caddy for the drive and a cable that attaches to the port replicator on the right-hand side – neither of which I had. Incidentally, I have since discovered that the part required is the 10NRN External Media Cable.

I pulled the disk out and threw it into a Latitude CPi, wiped the disk, partition tables and FDisk /mbr’d the system. I then copied the i386 folder from the slipstream disk onto the device and threw it back into the CSx. Would it boot? Like hell!
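
For anyone repeating the trick, the rough sequence from a DOS prompt looks something like the following – drive letters, and the exact switches, are from memory and purely illustrative:

```bat
REM On the donor machine (the CPi), with the CSx disk as C:
REM and the slipstreamed install media as D:
fdisk /mbr                  REM rewrite the master boot record
format c: /s                REM fresh FAT volume with DOS system files
xcopy d:\i386 c:\i386 /s /e

REM Disk back in the CSx, booted to the DOS prompt on that drive:
smartdrv                    REM without a disk cache, setup's file copy takes hours
c:\i386\winnt.exe
```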

To cut a very long story short, I had to perform some rather deep scrubbing of the 512-byte boot sector on the disk, because for reasons unbeknownst to me, none of my standard disk tools seemed able to scrub the boot sector on the hard disk, and the install process that embeds NTLDR onto the disk wasn’t able to overwrite it either! So one assumes that the disk must be damaged. I tried:

  • ChkDsk
  • ScanDisk
  • Partition Magic
  • SeaTools for DOS
  • Replacing the hard disk and installing a known working Seagate of equal capacity

Nothing worked after purging that drive using all of these standard tools and fixes:

  • FDisk
  • PowerQuest Partition Magic
  • Acronis
  • Windows 2000 Setup

None of these could get it. To cut this story short, in the end I discovered a nice little utility that blitzed the disk’s first few sectors at the hardware level, and it whirred back into life, installing Windows 2000 in a comfortable amount of time and working very nicely.
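
For the curious, the low-level operation such a utility performs is nothing exotic: it simply overwrites the first few sectors of the drive with zeros, below the level at which partition tools operate. A minimal Python sketch of the idea (the filename and sector count are illustrative, and it is demonstrated here against a throwaway image file – never point something like this at a disk you care about):

```python
import os

SECTOR_SIZE = 512  # a classic MBR boot sector is a single 512-byte sector

def scrub_first_sectors(path, count=4):
    """Overwrite the first `count` sectors of a disk (or disk image) with zeros."""
    with open(path, "r+b") as disk:
        disk.seek(0)
        disk.write(b"\x00" * (SECTOR_SIZE * count))
        disk.flush()
        os.fsync(disk.fileno())  # force the write out to the device

if __name__ == "__main__":
    # Fake a "dirty" disk with 8 sectors of random junk, then scrub it
    with open("demo.img", "wb") as f:
        f.write(os.urandom(SECTOR_SIZE * 8))
    scrub_first_sectors("demo.img")
    with open("demo.img", "rb") as f:
        assert f.read(SECTOR_SIZE * 4) == b"\x00" * (SECTOR_SIZE * 4)
```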

Grab the chipset driver and the graphics drivers from the Dell website and get them on the install – the NeoMagic driver offers you a little more control over the TV output and screen settings on the motherboard.

Windows XP

You have to try, don’t you? I used a PIII 733 Dell Inspiron 3800 for several years with Windows XP and 384 MB RAM, so theoretically the performance level shouldn’t be that much different, particularly given that I have installed a larger, faster hard drive in the CSx.

Again, the installation had to take place using a local system i386 install process from a DOS prompt – hooray for my new multi-mode (ATA, 2.5″ ATA and SATA) to USB hard drive converter.

RAM Upgrade

The first thing that I always do (of course) is update the firmware of my equipment, so I hit the BIOS up to the latest A13 and found and loaded a firmware update for the Hitachi hard drive in the device.

According to Dell, Crucial and Kingston, the maximum RAM that this system can take is a rather unusual 320 MB (256 MB + 64 MB). Well, I had a 256 MB DIMM and a box full of 128’s lying about, so I threw them in there, expecting it to beep code at me, and…

BIOS With 384MB

Here the device is, posting and happily running with 384 MB (256 MB + 128 MB) in the BIOS.

I then came into possession of another 256MB chip, this time a PC133 CAS latency 3 chip (you can tell by the chip label; CL3 chips are PC133-333-520). This refused to POST at all, so I assumed that I had hit the maximum RAM. However, I fortuitously came into possession of another 256MB chip, this time a CL2 chip (PC133-322-620), and tried again.

The Latitude CSx H500XT posted with 512MB of PC133, loaded Windows XP and to this day still runs happily with 512MB and XP SP3 – proving that you can’t really rely upon the manufacturer or RAM supplier sites to know exactly what systems are capable of.

The Processor

Dell shipped this notebook with a:

  • Intel Mobile Pentium III
  • 500MHz
  • 100MHz FSB
  • 256 KB L2 Cache
  • Specification Finder Code: SL43P

I, on the other hand, happened to have one of these (an x86 Family 6 Model 8 Stepping 3):

  • Intel Mobile Pentium III
  • 650MHz
  • 100MHz FSB
  • 256 KB L2 Cache
  • Specification Finder Code: SL442

sSpec Number               SL43P         SL442
CPU Speed                  500 MHz       650 MHz
Bus Speed                  100 MHz       100 MHz
Bus/Core Ratio             5             6.5
L2 Cache Size              256 KB        256 KB
L2 Cache Speed             500 MHz       650 MHz
Package Type               Micro-PGA2    Micro-PGA2
Manufacturing Technology   0.18 micron   0.18 micron
Core Stepping              PB0           PB0
CPUID String               0683          0683
Thermal Design Power       16.8 W        21.5 W
Thermal Specification      100 °C        100 °C
VID Voltage Range          1.6 V         1.6 V / 1.35 V

PIII SL442

It doesn’t seem illogical to assume that this CPU would operate in the Latitude, and one would be correct in that sentiment… with a but. The system does not seem to actually acknowledge the extra megahertz that it has in it. According to the Windows System Information tool, the operating system is running at 489 MHz (exactly the same as with the original 500 MHz chip in it).
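
The arithmetic here is worth spelling out, because it explains both numbers in the table above. A Coppermine's core clock is simply the front-side bus clock multiplied by the (locked) bus/core ratio, so if the board pins the ratio at 5, the SL442's 6.5 multiplier never comes into play. A quick sketch:

```python
def core_clock(fsb_mhz, multiplier):
    """Core clock = front-side bus clock x bus/core ratio."""
    return fsb_mhz * multiplier

# The two sSpecs from the comparison table:
assert core_clock(100, 5) == 500     # SL43P: 100 MHz FSB x 5
assert core_clock(100, 6.5) == 650   # SL442: 100 MHz FSB x 6.5

# With the ratio hard-locked at 5, the SL442 can only ever reach 500 MHz;
# the reported 489 MHz then implies Windows is measuring the bus at
# roughly 489 / 5 = 97.8 MHz rather than a clean 100 MHz.
assert round(489 / 5, 1) == 97.8
```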

Jumpers (Click to Enlarge)

The Intel Processor Frequency ID utility identifies the processor as running at 500MHz, but curiously also states that it was expecting 500MHz to begin with! I find this rather odd.

Finally, the Dell BIOS does not properly detect the processor. On the initial overview screen (shown at the top of this article), the processor is listed as:

  • Intel(R) SpeedStep(tm) 0/0 MHz

Perhaps more interestingly, the configuration option for the CPU boot compatibility mode now toggles between compatible and “0 MHz” – quite a feat!

This tells me that there must not be any headroom in the Dell BIOS programming for additional microcodes on this system – or that the multiplier has been hard locked on the board. I had a look around the motherboard and there is not much in the way of jumpers to configure, the only ones that are visible on the top side can be seen in the photo on the right (click to enlarge). I was not willing to fully disassemble the system and check the back of the board, however knowing Dell, there would not be a hardware option to change the multiplier.

 

I did of course experiment with tripping the four J8 pins, and each time, without fail, and no matter what the combination, the system would not boot – or even power on. I have no idea what this header is for, however it doesn’t appear to do anything to the processor. If anyone else is looking for information on upgrading the processor in the Latitude CSx series, then there is your answer, unless you happen to know of a BIOS hack/trick that can be used to modify the multiplier and get the system past the 500MHz barrier.

I have left the 650 MHz processor in the Latitude, it runs quite well with it in, with no obvious thermal issues or instability (and you never know your luck, the hardware monitors could be reporting incorrectly).

If you would like a photo of the top-side of the motherboard – click here.

Error: “Error during write buffer commit. Please check all cables and connections. Also verify that proper drive termination is used” while attempting to upgrade the firmware on Dell PowerVault 100T DDS4

System Requirements:

  • Dell PowerVault 100T DDS4

The Problem:

While attempting to upgrade the firmware on a PowerVault 100T DDS4 you receive the following error message from the Dell Windows updater.

error

The firmware updater exits from the session without upgrading the firmware.

More Information:

I have two Dell PowerVault 100T DDS drives, one in a PowerEdge 2600 and the other in the legend that is my PowerEdge 2400. Both systems run Windows Server 2003 and are pretty much vanilla Microsoft setups. The 2600 quite happily takes the firmware updates for the 100T DDS4, while the 2400 always drops out of the update procedure with the error message listed above.

Luckily the drive suffers no ill effects from encountering the error so far as I can see.

 

Obviously, do pay attention to the error message and check your termination and cabling if necessary. However, there is a simpler solution: you’re flashing the wrong firmware.

There are two different versions of the 100T DDS4, and in their infinite wisdom (and for some reason) Dell didn’t think to add “v2” to the hardware name. What it boils down to is that there are two firmware kernels, the v8000 and the v9000. If you have an older drive then you have a v8000… and the Windows firmware updater is designed only for the v9000 drive.

 

How do I know which drive I have?

You can either scoot off and look in the PnP IDs for the system, or you could reboot the system and watch the POST… but that means downtime!

The lazy way to do it is to re-run the Windows updater package for the v9000. Before the flash begins you will be prompted with the following dialogue:

DDS4 Version Check

If the Installed Version string begins with an 8 (825B in this case) you have the older v8000 drive. If you have a v9000 drive at this juncture you really do need to check your termination and cabling.
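
The decision rule is trivial enough to state in code – a sketch of the check the dialogue implies (the function name is mine, not Dell's):

```python
def drive_generation(installed_version):
    """Classify a PowerVault 100T DDS4 by the firmware version string the
    Windows updater reports: v8000-kernel drives report versions beginning
    with '8', v9000 drives with '9'."""
    if installed_version.startswith("8"):
        return "v8000"
    if installed_version.startswith("9"):
        return "v9000"
    return "unknown"

# The drive from the dialogue above reports 825B, so it is the older v8000:
assert drive_generation("825B") == "v8000"
```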

 

Updating the firmware

Once you are secure in the knowledge that you have a v8000, the procedure is quite simply to download the hard drive install package, follow the instructions and initiate the flash through the (provided) terminal application.

I have checked from A05 through to the latest A17 (at the time of writing) and while the highest FWID firmware revision seems to be labelled 825B, the date/time stamps on the images are being updated, which makes me suspect that Dell simply aren’t updating the build number if they are indeed doing anything with it. In contrast, the v9000 series is receiving incremental version numbering. Nonetheless, grab yourself the latest and greatest and get flashing!

 

Update 31st December 2007: There is now an A18, and while the version number is identical the date stamps have again changed.

Dell PowerEdge 2400

System Requirements:

  • Dell PowerEdge 2400

The Problem:

Dell PowerEdge 2400 with 2 Gigabytes of main RAM (4x512MB PC133 ECC, Registered) blue screens out sporadically with the following kernel error:

*** Hardware Malfunction ***
Call your hardware vendor for support
NMI: Parity Check / Memory Parity Error
*** The system has halted ***

More Information:

The PowerEdge 2400 is functionally quite a nice little server assembly, it was the last of the PIII Slot 1 generation systems released by Dell in a tower chassis form factor.

When I originally got my hands on the system, it was billed as being non-functional and was equipped with the following specification:

  • 1x PIII 600EB
  • 2GB Registered ECC PC133 (4x512MB) DIMM
  • 3x8GB SCSI 160 Hard Drives
  • 3Com high capacity server 10/100 NIC

The original stated issue with the server was a propensity towards unreliability and interrupt BSODs, so it had been written off by an IT service company – also involved with this particular client – that was more interested in tendering a new (and unnecessary) £24,000 hardware requisition than in trying to fix the problem.

The source of their problem was so spectacularly simple that I had solved it less than 20 minutes after dusting it down. In attempting to be clever, they had installed the reference drivers for the integrated Intel motherboard NIC released by the Intel Corporation. The higher driver resource files had got stuck on the system and couldn’t be removed, preventing driver regression. Anyone who has ever had any experience with integrated NICs on Dell motherboards will know instantly not to use reference drivers on Dell-modified hardware. Clearly someone here didn’t.

 

NMI Memory Parity Error

This is the point at which the NMI parity errors started showing through on the system. Initially they would only crop up during low-load periods, approximately every 3 days; if the system was being kept busy, the error never seemed to appear.
The problem was that the more I cleaned up the messy Windows 2000 install, the more frequent the problems became, until I got around to updating all of the system’s primary firmware:

  • BIOS A09
  • ESM / Backplane A51
  • Raid Controllers 2.8.1.6098

Interestingly, this “strongly recommended” set of updates brought the frequency of the NMI parity errors to a head, with them now occurring approximately 20-30 seconds after the boot splash screen has finished animating.

Let us stop here and have a quick look at what an NMI error is.
Put as simply as possible: an NMI, or Non-Maskable Interrupt, is an interrupt (commonly referred to as an IRQ, best described as an ‘attention signal’) which can be generated by the lower-level hardware devices in a system. Standard interrupt signals are used in a computer system to request that the processor pay attention to the initiating device as a priority operation over general data processing tasks. What distinguishes an NMI event is that, generally speaking, when an NMI is triggered by a hardware device, the processor (and fundamentally the operating system kernel) is not at liberty to ignore it.

It is because the Windows kernel has been coded to comply with the inability to ignore the NMI that the parity error on the PowerEdge doesn’t behave like your average BSOD – there is no memory dump, no automatic reboot, no ACPI responsiveness. It is the hardware demanding that Windows terminate, rather than Windows deciding that it needs to stop – hence it is distinct from the usual NT BSOD or ***** STOP ***** error.

 

The generic, catch-all approach to fixing NMI Memory Parity errors

I’ve no doubt that if you’re suffering from this problem and have got here through some sort of search engine, you’ve waded through numerous posts telling you exactly the same thing already. So, without wanting to spend too much time on the generics, here’s a quick recap.

  1. “It’s a Memory Parity Error, Stupid” – I’ve not seen a T-shirt with that on yet, but I suspect I will someday. The blatantly obvious cause of the error is that you have actually experienced a memory parity error. Hard to believe, I know!
    In the server system, you are using registered (buffered) ECC main RAM. On each write, a parity value (a mathematically formulated true/false check on the data) is computed and stored alongside the data being fed into RAM, and checked again when the data is read back. The parity check is designed to prevent errors being passed back into the processor, wasting additional processor cycles or, worse, causing the system to crash – errors should get caught directly in RAM and be fixed through processes present at the hardware level. If you get the NMI parity error because you have had a genuine failure in RAM, the system bottles out because it can’t work out how to fix it, so instead of sending gibberish back to the system registers, it panics and suspends processing.
    What causes it? Simple: bad RAM. This can be the result of a manufacturing error, badly inserted DIMMs, sub-standard contact between the DIMM pins and the DIMM slot connectors (due to bent pins, damaged motherboard slots, grit, grime and dust), using DIMMs of different quality standards together (always try to match DIMMs to the same manufacturer, model and batch), or it is even possible that you have the DIMMs inserted in the wrong order – always place the largest DIMMs towards the processor source/sink and the smallest at the end of the array. On the 2400, this is DIMM bank A (farthest from the CPU) followed by B, C and finally D.

    Memory-test the DIMMs using as many testing metrics as you can find, both environmentally (outside of the operating system) and natively (inside the operating system). Enable the full RAM test (POST) in the BIOS if you have this option available to you. Microsoft also has some free tools that you can obtain from Microsoft OCA to do this, and there are plenty of others – including the Win32 and bootable Dell Diagnostics & F10 management partitions on Dell-configured disks. Whatever you do, don’t think that you can run one iteration and “it means it’s OK”. Be prepared to walk away for a day or two, come back to a green panel, and THEN move onto the next test.

  2. Start pulling out your DIMMs and sequence test them, see if the crashing only happens with a certain combination, or in a certain slot. If it does, you can bet with a fair level of certainty that you have a real Parity error from either a bad DIMM or a bad DIMM slot.
  3. Update the system. I spend a lot of time reminding people of this. If you don’t have the latest kernel enhancements provided through Service Packs and patches, how can you expect drivers/firmware written after their release to be fully interoperable with the older systems?
    If you update your firmware, then update the drivers to match the firmware revision.
    If at this point you are attempting to run NT 3.51 or NT 4.0 with 2GB of RAM in a modern hardware environment – go get a nice shiny, shrink-wrapped version of 2003 Server. OK, so I threw that one in for no particular reason… someone has to keep Mr. Gates in the manner he’s accustomed to! Don’t they?
  4. Check that you aren’t overloading your PSU (power supply). If you have a poor flow down your power rails you can upset the components inside a computer.
  5. Update stroppy, meddling kernel-mode system drivers to the latest versions that you can find. I’m going to include anti-virus applications in this as well; I have heard of people experiencing NMI issues when using Symantec anti-virus server products (why? I have to ask) – update the fundamentals, or try testing the system with the latest version of the product if you aren’t already rolled out, not forgetting to strip out its lower-level driver routines!
  6. Yank the RAID array, throw in a blank disk and reinstall Windows. I know it’s a rather unruly suggestion, however you’ve already spent 72 hours performing constant RAM diagnostics, the idea of spending 39 minutes reinstalling Windows 2003 in a non-destructive manner isn’t that big a deal in the greater scheme of things.
    What is this designed to tell you? If the system still dies under this test with a clean install of Windows, then you are likely looking at a hardware problem rather than a genuine Windows malconfiguration. If it works, try installing your base application rollout, drivers and so on and continuity test the system.
  7. Do NOT in-place upgrade/repair the operating system. It is utterly pointless. If it is a Windows error bite the bullet and spend the time performing a clean install. As you’ll see below, upgrading the in place operating system does no good what so ever.
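
Incidentally, the parity scheme described in step 1 boils down to one extra bit stored per word of data. A toy Python illustration of how a single flipped bit gets caught (real hardware does this per memory word, in silicon, but the logic is the same):

```python
def even_parity_bit(word):
    """Even-parity bit for an integer word: 1 if the number of set bits is
    odd, so that the total count of set bits (data + parity) is even."""
    return bin(word).count("1") % 2

def parity_ok(word, stored_parity):
    """A parity error (NMI territory) is a mismatch between the recomputed
    parity and the bit stored alongside the data."""
    return even_parity_bit(word) == stored_parity

data = 0b10110010            # 4 set bits -> parity bit 0
p = even_parity_bit(data)
assert parity_ok(data, p)    # a clean read passes the check

flipped = data ^ 0b00000100  # a single bit flip in storage...
assert not parity_ok(flipped, p)  # ...is caught on the next read
```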

 

The experiment(s)

Clearly, all of the above had been performed on my victim… er, test subject, and the PowerEdge, running Windows 2000 Server, was still experiencing the problems. So how to fix it? I have outlined all the additional steps that I took, along with a brief note on the outcome of each. I’ve listed them primarily to give you some ideas of what you can do if you are experiencing the same problem. Do remember that just because something did or didn’t impact my predicament doesn’t for a moment mean the same will be true for you.

The first and most immediate solution became apparent quite quickly in the generic testing.

  1. Don’t run it with 2GB RAM. My system has 4x512MB ECC DIMMs in it. This fills all four banks and maxes out the RAM capacity of the system. If I take out ANY of the four DIMMs from ANY of the slots, the problem vanishes. Completely. All the RAM passed all the testing I threw at it, but for some reason it seems that the system just didn’t want to address, and remain stable with, 2048MB. At least the system could quickly be made production worthy. However… I don’t like leaving things unfinished. It’s not a very professional way to approach the trials that life throws at us.
  2. Put 4x128MB ECC PC133 DIMMs in the system. This was one of the last tests I did, and the system seemed to operate flawlessly with 512MB in it – proving that the issue is not a bus-related one.
  3. Try a non-Microsoft operating system. I have to confess that at this point I was plumb out of SCSI drives to use, so I elected to perform testing with Live CDs instead – it’s not perfect – but several renditions of Linux showed no observable problems, and perhaps more interestingly, neither did any Live CD versions of Windows.
    I tested using Windows PE for 2003 Server and Bart PE, and although both use the core of Windows XP, they exhibited no obvious problems and certainly no NMI BSODs, even with most of the system drivers loaded (except the PERC 2/Si controller).
  4. Dump the RAID configuration and start from Scratch. Nothing happened except having to pull one of the 512’s out to get it to go through setup without BSOD stalling.
  5. Tape restore the system and in-place upgrade from Windows 2000 Server to Windows 2003 Standard Server. New Kernel, more robust operating system you think? Alas no. At this point 2003 Server became the test operating system.
  6. Replace the PIII 600EB processor with a PIII 733EB processor. I got a nice speed boost but, alas, no fix to the problem.
  7. Replace the uni-processor PIII 733EB with two PIII 1000 (1GHz) SL4BS’s (making the system dual-processor). I was quite hopeful on this one, but sadly no. Best £19 I ever spent though!
  8. Pull everything from the PCI bus, disable all the integrated BIOS hardware (NIC, USB etc). No change.
  9. Reset the EEPROM (Pull the BIOS battery) – no change
  10. Alternate and re-sequence the use of the RAID controller positions on the backplane – no change.
  11. Replace the RAID controller DIMM – no change

 

Ah!

All seems rather hopeless doesn’t it?

I pulled the server from the farm (again) and brought it somewhere a little more comfortable to work with (again). Cleaned the DIMM slots, cleaned the pins on the DIMMs (again). I have always felt that the NMI error in this case has been something of a misnomer.

I have been gravitating towards the backplane/RAID controller for some time in my other experiments, so primarily out of having no better ideas, I decided to completely disconnect the backplane assembly. I unlinked the ribbon cables and cleaned them, removed all the PCB’s, straightened out the wiring, cleaned off the heavily dust laden drive connectors and cleaned the connector pins on both the daughter board and the motherboard. I then fully cleaned out the inside of the case around the drive bays, reseated the backplane and put the cabling back.

I fired the 2400 up and was immediately presented with a POST warning stating that the ESM firmware revision was out of date:

!!!!****Warning: Firmware is out of date, please update****!!!!

Particularly strange, considering everything had long since been flashed to the latest version before the system was last powered down. I can only surmise that part of the update is held in battery-backed storage and that, with the disconnection of the battery sources, the firmware reverted to its original settings.

To my surprise, the system booted straight into Windows 2003 Server and hasn’t NMI’d out since!
Deciding to tempt fate, I downloaded and re-flashed the ESM controller to the latest version, and I am pleased to report that even with the re-flash it hasn’t (yet) fallen over.

I have made no driver changes to the installation; everything is running on Windows 2003 default drivers, with the exception of the tape streamer driver from Windows Update. I have installed McAfee Enterprise 8.0i (Patch 14, latest DATs/engine) on the system and set up IIS in its production configuration. The PowerEdge has mainly been idling since the reinstall (and it’s been lying on its side, as well as counting the system idle process!), but don’t worry – if this holds out for a little longer, I will put those two new SL4BS’s to good use.

All being well, you’re receiving this website from it.

 

Update 15th April 2007: Well, all has been well and you are seeing this website from the PowerEdge 2400. The server has offered impeccable performance since I wrote this article (shame I can’t say the same for Microsoft’s patching downtime requirements so that I could prove the uptime). The server also survived the cold winter without incident, unlike its 2600 counterpart which fell over numerous times. PowerEdge 2400, a workhorse and a graceful lady. Good job Dell.

 

Update 17th June 2007: All still seems to be good. I updated to McAfee 8.0.0i Patch 15, and a couple of days later had a complete system drop out (one of those black screen, no response to anything, just spinning the electricity meter moments). It wasn’t NMI related though, I suspect a PSU fluke in this case. Aside from this, the system continues to run flawlessly.

 

Update 31st December 2007: Everything seems to be running just fine with the server although I have concluded that it doesn’t want to run with 2.0 GB RAM in it as it seems to force the System Management Controller to reboot it around every 2 months. I have reduced it to 1.5 GB where it runs comfortably without any unexpected reboots by the SMC. I have also updated the tape firmware to A18.