Installing Capita’s SIMS FMS on a Citrix Metaframe server (and having it work)

System Requirements:

  • Citrix
  • Capita’s SIMS
  • FMS for SIMS
  • Windows Server 2003 / Windows Server 2003 R2

The Problem:

Capita’s SIMS, or ‘Schools Information Management System’, is a widely used UK school management application made by the masters of conglomerate outsourcing, the Capita group. The propaganda states that it is used by over 22,000 schools here in the UK…

FMS (Financial Management System) is a bursary module for SIMS providing financial support backed by the wider SIMS database sub-system. FMS makes use of its own database environment, separated from the main SIMS data store. Should you attempt to install FMS (any version below 6.91) on a Citrix server, you are going to end up with a copy of FMS that does not work.

Installing FMS on the Citrix terminal server leaves the user’s session misconfigured, resulting in a DSN error.

SIMS configuration requires the presence of a User level DSN on the system (At the NTLM SAM level) in order for the client application to make a connection to the database server.

The problem has been diagnosed as threefold:

  1. The FMS installation process is not MSTSC/Citrix compatible and is unable to correctly configure the client system at run-time. The current assumption is that this is due to the use of a pre-Windows/very early Windows 2000 driver (dated 1999) which is certified for NT4 (non-TSE) and not for enterprise TS environments.
  2. The User DSN database settings are not being registered into the default profile on the server, and as such no user accounts are receiving the user configuration settings.
  3. The legacy database connectivity driver is unable to register itself on the system automatically, relying upon the installation process to perform all required integration into the ODBC environment. As the installer is failing, so is the driver registration. This means that the User DSN cannot be created on the TS without intervention.

Lovely problem, terrible system

As you all know, I like a challenge.

FMS makes use of the Borland Database Engine, or more accurately the Sybase SQL Anywhere 5.0 DBMS. The version used by SIMS FMS was compiled way back in 1999 when men were men, terminal servers were never used and Windows 2000 was not exactly being used by anyone other than… me.

Here we are in 2007, trying to push the FMS configuration data off of a standalone workstation and onto a dedicated Citrix server; it wasn’t going to play nicely, was it?

FMS will install, but it will never let you connect to the database, citing connectivity issues.

The Fix:

This problem is squarely due to Capita having never evolved their database components. It isn’t FMS causing the issue, and it isn’t Citrix, Windows Server 2003 or SIMS. It’s SQL Anywhere 5.0.

I really would urge you as a school administrator to head to Capita at this point and perform a cost / benefit analysis on obtaining FMS 6.91 or higher, which will port your old databases across to a new MS SQL 2005 environment. I guess even Capita realised that they were on borrowed time!

There are two distinct problems that must be solved before you will be able to use FMS over Citrix:

  1. The installer for the SQL Anywhere 5.0 driver set is not designed to run in Terminal Server Installation Mode. As a direct consequence, once the installation is complete you will be left with the FMS database binaries scattered around your server’s hard drive, but they will not be in a state where they are able to do anything! Windows is left oblivious to their presence.
  2. FMS makes use of a mandatory DSN connection hosted at the user level to initiate connections between the client (on the Citrix server) and the SIMS / FMS database (hopefully stored somewhere other than the Citrix server). The user level DSN is not installed as part of the Citrix install and must be constructed by hand in the required user accounts.

 

Fixing the Sybase SQL Anywhere 5.0 Driver

The failure here is down to the age of the database components being used. Sybase’s developers must have been using legacy design standards and/or cut some corners in the construction of their ODBC driver set, because there is not a single component in version 5.0 which is capable of performing any self-registration functions under Win32.

The long and short of it is that the entire system had to be pulled to pieces, analysed and a manual driver registration system generated.

You will be pleased to hear that I served my penance, did my good deed for society and the future generation and am offering it up to you here to download.

Download: FMS-Terminal-Server-Hack.zip (2KB)

 

The following steps require that the FMS installation be performed on the TS as far as is permissible before failure.

The steps assume that the default installation paths are used on all components of the database client system.

The following documentation assumes that future connection attempts are being made to the FMS database server using the same database name, conventions and settings as found at the time of writing.

Prerequisites

  1. Ensure that FMS is installed
  2. Ensure that the Sybase SQL Anywhere 5.0 driver components are located on the client at the following location:
    c:\SQLANY50\WIN32
    Note: The default installation for the client software is C:\SQLANY50\win32\. If you have installed the client applications to another location, you must edit the path in the hack file you downloaded above before registering it.
  3. Ensure that you have the DBclient.exe in the above path
  4. Ensure that you have FMS-Terminal-Server-Hack.reg from the zip file accompanying this document
  5. Make any edits to the file that are required to tailor it to your system (see note above)
  6. Log into the terminal server with Administrative permissions to begin the process

Server configuration

Merge FMS-Terminal-Server-Hack.reg into the Windows registry (double-click the file, or import it from regedit).
This should be all that is required to configure the server. A reboot should not be necessary.
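
For readers curious about what a manual ODBC driver registration generally looks like, here is a minimal sketch in Python that builds the text of a .reg file. It is not the contents of FMS-Terminal-Server-Hack.reg. It only follows the standard ODBC convention of listing the driver under ODBCINST.INI\ODBC Drivers and giving it "Driver" and "Setup" values. The function name and the DLL file name (driver.dll) are placeholders of mine; take the real file names from a working client.

```python
def odbc_driver_reg(driver_name, install_dir, driver_dll, setup_dll):
    """Build .reg text registering an ODBC driver machine-wide.

    driver_dll and setup_dll are file names inside install_dir. The values
    used in the example below are placeholders, not the real Sybase names.
    """
    base = r"HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBCINST.INI"
    root = install_dir.rstrip("\\")

    def esc(path):
        # Backslashes must be doubled inside quoted .reg string values.
        return path.replace("\\", "\\\\")

    lines = [
        "REGEDIT4",
        "",
        "[" + base + "\\ODBC Drivers]",
        '"' + driver_name + '"="Installed"',
        "",
        "[" + base + "\\" + driver_name + "]",
        '"Driver"="' + esc(root + "\\" + driver_dll) + '"',
        '"Setup"="' + esc(root + "\\" + setup_dll) + '"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder DLL names: replace with the ones found on a working client.
    print(odbc_driver_reg("Sybase SQL Anywhere 5.0",
                          "c:\\SQLANY50\\WIN32",
                          "driver.dll", "driver.dll"))
```

If your install path differs from the default, this is the same path you would need to change in the downloaded .reg file.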

To validate the reconfiguration

  1. Start > Run > odbcad32.exe > OK
  2. User DSN tab
  3. Add
  4. Scroll to the bottom of the list and verify the presence of: Sybase SQL Anywhere 5.0 version 5.05.04.1980 date 10/08/1999
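
The same check can be scripted rather than clicked through. The sketch below parses the output of reg query against the standard ODBC driver list key and reports whether the driver is marked as installed. It assumes the registry value name matches the name shown in the ODBC administrator ("Sybase SQL Anywhere 5.0"), and the function name is my own. If you have no Python on the server, capture the output with reg query into a text file and parse it elsewhere.

```python
import re
import subprocess
import sys

# Standard location of the machine-wide ODBC driver list.
DRIVERS_KEY = r"HKLM\SOFTWARE\ODBC\ODBCINST.INI\ODBC Drivers"
# Assumed to match the name shown in the ODBC administrator dialog.
EXPECTED = "Sybase SQL Anywhere 5.0"


def installed_odbc_drivers(reg_query_output):
    """Return the set of driver names that reg query reports as Installed."""
    drivers = set()
    for line in reg_query_output.splitlines():
        # Value lines are indented: <name> <separator> REG_SZ <separator> <data>.
        # The separator is tabs or spaces depending on the Windows version.
        match = re.match(r"^\s+(.+?)\s+REG_SZ\s+(.*)$", line)
        if match and match.group(2).strip() == "Installed":
            drivers.add(match.group(1))
    return drivers


if __name__ == "__main__":
    if sys.platform == "win32":
        result = subprocess.run(["reg", "query", DRIVERS_KEY],
                                capture_output=True, text=True)
        print(EXPECTED in installed_odbc_drivers(result.stdout))
    else:
        print("Run this on the terminal server, or parse saved reg query output.")
```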

Client Configuration

The use of a mandatory User DSN means that the DSN must be configured on the client account, not the administrator account as it is not a global configuration setting.

The settings will be replicated to the user account profile.

 

Note: I cannot/will not provide screen shots of the process that follows in the interest of preserving client security. You will need to source the information from a working FMS client system from within your organisation.

 

To add the User DSN by hand:

  1. Start > Run > odbcad32.exe
  2. User DSN tab
  3. Add
  4. Select Sybase SQL Anywhere 5.0 from the bottom of the list, click Finish

Configure the Sybase DSN as follows

  • DSN: SIMS32
  • Description: General
  • User ID: <blank>
  • Password: <blank>
  • Server Name: <YOUR FMS INFO FROM A WORKING CLIENT>
  • Database Name: <YOUR FMS INFO FROM A WORKING CLIENT>
  • Database File: <YOUR FMS INFO FROM A WORKING CLIENT>
    Type: Custom
  • Translator Name: <No Translator>
  • Enable: Microsoft Applications (Keys in SQLStatistics)

Click on the OPTIONS button

  • Start Command: c:\sqlany50\win32\dbclient.exe -ta 600 -x NamedPipes,TCPIP
    Note: Modify the path above to match your Sybase SQL Anywhere 5.0 drive path
  • Database Switches: <blank>
  • Agent: Client
  • Autostop Database: Ticked
  • Isolation Level: <blank>
  • Describe Cursor Behaviour: If Required

Click Ok twice to save the DSN
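
If you have more than a handful of accounts to configure, clicking through the dialog for each one gets old quickly. As a rough sketch, the Python below builds a per-user .reg file following the standard ODBC layout for User DSNs: an entry under HKEY_CURRENT_USER\Software\ODBC\ODBC.INI\ODBC Data Sources plus a key named after the DSN. The driver-specific value names are not documented here, so the settings dictionary is meant to be filled from a registry export of a working client. The names in the example (ExampleSetting, driver.dll) are placeholders of mine and not real Sybase settings.

```python
def user_dsn_reg(dsn, driver_name, driver_path, settings):
    """Build .reg text that creates a User DSN for the importing user.

    settings maps driver-specific value names to strings. Copy both the
    names and the values from an export taken on a working FMS client.
    """
    base = r"HKEY_CURRENT_USER\Software\ODBC\ODBC.INI"

    def quoted(value):
        # Escape backslashes and double quotes for a .reg string value.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [
        "REGEDIT4",
        "",
        "[" + base + "\\ODBC Data Sources]",
        quoted(dsn) + "=" + quoted(driver_name),
        "",
        "[" + base + "\\" + dsn + "]",
        quoted("Driver") + "=" + quoted(driver_path),
    ]
    for name in sorted(settings):
        lines.append(quoted(name) + "=" + quoted(settings[name]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values only: substitute the data from your working client.
    print(user_dsn_reg("SIMS32", "Sybase SQL Anywhere 5.0",
                       "c:\\sqlany50\\win32\\driver.dll",
                       {"ExampleSetting": "value from a working client"}))
```

Because the keys live under HKEY_CURRENT_USER, the resulting file has to be imported while logged on as each user (a logon script is one option), which matches the point above that the DSN is not a global setting.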

Run FMS to prove that it works, then publish it into your Citrix environment according to your specific organisational needs.

Error: ActiveX component can not create object: “WScript.Shell” when running WScript application

System Requirements:

  • Windows 95, 98, 98SE, Millennium
  • Windows NT 4.0, 2000, XP, 2003, Vista

The Problem:

When you run a .vbs file with a call to CreateObject("WScript.Shell"), the script/application terminates with the following error message:

ActiveX component can not create object: "WScript.Shell"
Code: 800A01AD

The script then exits

More Information:

Your Windows Scripting Host has a mis-registered control. If you have just installed Microsoft Internet Explorer <anything>.<anything>, chances are the install went wrong. Check the install log in %SystemRoot% for failures.

I would recommend reinstalling IE properly to be safe, but chances are the problem will still be there afterward (it was with this particular muse and MSIE 7.0).

The fix is very simple however (assuming that your error is down to Windows Scripting Host and not bad programming; you are on your own on that one).

From cmd, or from a .bat run:

regsvr32 dispex.dll
regsvr32 jscript.dll
regsvr32 scrobj.dll
regsvr32 scrrun.dll
regsvr32 vbscript.dll
regsvr32 wshcon.dll
regsvr32 wshext.dll
regsvr32 wshom.ocx

I recommend that you start with wshom.ocx and test. No reboot is required for shell initiated .vbs files. If the problem is with IIS, you should restart the IIS Administrative and WWW services.

Be sure that you are using WScript 5.6 (unless running Vista or XP with IE7). You can download the latest release here:
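
If you want to turn the list above into a batch file that tries wshom.ocx first, a small sketch like the following will generate the command lines in that order. The function name is my own. The optional /s switch is regsvr32’s silent mode, which suppresses the confirmation dialog after each registration.

```python
# The Windows Scripting Host components listed above.
WSH_COMPONENTS = [
    "dispex.dll", "jscript.dll", "scrobj.dll", "scrrun.dll",
    "vbscript.dll", "wshcon.dll", "wshext.dll", "wshom.ocx",
]


def regsvr32_commands(components, first="wshom.ocx", silent=False):
    """Return regsvr32 command lines with `first` moved to the front."""
    # sorted() is stable, so everything except `first` keeps its order.
    ordered = sorted(components, key=lambda name: name != first)
    flag = "/s " if silent else ""
    return ["regsvr32 " + flag + name for name in ordered]


if __name__ == "__main__":
    # Redirect this output into a .bat file and run it from cmd.
    for command in regsvr32_commands(WSH_COMPONENTS):
        print(command)
```

Leaving the silent switch off while testing is probably wise, since the dialog tells you straight away whether each registration succeeded.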

Dell PowerEdge 2400

System Requirements:

  • Dell PowerEdge 2400

The Problem:

Dell PowerEdge 2400 with 2 Gigabytes of main RAM (4x512MB PC133 ECC, Registered) blue screens out sporadically with the following kernel error:

*** Hardware Malfunction ***
Call your hardware vendor for support
*** The system has halted ***
NMI: Parity Check / Memory Parity Error

More Information:

The PowerEdge 2400 is functionally quite a nice little server assembly, it was the last of the PIII Slot 1 generation systems released by Dell in a tower chassis form factor.

When I originally got my hands on the system, it was billed as being non-functional and was equipped with the following specification:

  • 1x PIII 600EB
  • 2GB Registered ECC PC133 (4x512MB) DIMM
  • 3x8GB SCSI 160 Hard Drives
  • 3Com high capacity server 10/100 NIC

The original stated issue with the server was a propensity towards unreliability and interrupt BSODs, so it had been written off by an IT service company also involved with this particular client, one that was more interested in tendering a new (and unnecessary) £24,000 hardware requisition than in trying to fix the problem.

The source of their problem was so spectacularly simple that I had solved it in less than 20 minutes after dusting it down. In attempting to be clever, they had installed the reference drivers for the integrated Intel motherboard NIC released by the Intel Corporation. The newer driver resource files had got stuck on the system and couldn’t be removed, preventing driver regression. Anyone who has ever had any experience with integrated NICs on Dell motherboards will know instantly not to use reference drivers on Dell modified hardware. Clearly someone here didn’t.

 

NMI Memory Parity Error

This is the point at which the NMI parity errors started showing through on the system. Initially they would only crop up during low load periods, approximately every 3 days. If the system was being kept busy it never seemed to crop up.
The problem was that the more I cleaned up the messy Windows 2000 install, the more frequent the problems became, until I got around to updating all of the system’s primary firmware.

  • BIOS A09
  • ESM / Backplane A51
  • Raid Controllers 2.8.1.6098

Interestingly, this “strongly recommended” set of updates brought the frequency of the NMI Parity errors to a head, with them now occurring approximately 20-30 seconds after the boot splash screen has finished animating.

Let us stop here and have a quick look at what an NMI error is.
Put as simply as possible: an NMI, or Non-Maskable Interrupt, is an interrupt (interrupts are commonly referred to as IRQs and are best described as an ‘attention signal’) which can be generated by the lower level hardware devices in a system. Standard interrupt ‘signals’ are used in a computer system to request that the processor pay attention to the initiating device as a priority operation over general data processing tasks. What distinguishes an NMI event is that, generally speaking, when an NMI is triggered by a hardware device, the processor (and fundamentally the operating system kernel) is not at liberty to ignore it.

It is because the Windows Kernel has been coded to comply with the inability to ignore the NMI that the Parity error on the PowerEdge doesn’t behave like your average BSOD – there is no memory dump, no automatic reboot, no ACPI responsiveness. It is the hardware demanding that Windows terminate, rather than Windows deciding that it needs to stop; hence the synonymous NT BSOD or ***** STOP ***** error.

 

The Generic, catch all approach to fixing NMI Memory Parity errors

I’ve no doubt that if you’re suffering from this problem, and have got here through some sort of search engine, you’ve waded through numerous posts telling you exactly the same thing already. So without wanting to spend too much time on the generics, here’s a quick recap.

  1. “It’s a Memory Parity Error, Stupid” – I’ve not seen a T-Shirt with that on yet, but I suspect I will someday. The blatantly obvious cause of the error is that you have actually experienced a Memory Parity Error. Hard to believe, I know! In the server system, you are using Registered ECC main RAM (registered memory is also known as buffered memory). The ECC part is a little sub-process performed on the clock which checks to see if there are any parity (mathematically formulated true/false checks on data) errors in the data being fed into RAM. The parity check is designed to prevent errors being passed back into the processor and wasting additional processor cycles or, worse, causing the system to crash – they should get caught directly in RAM and be fixed through processes present at the hardware level. If you get the NMI Parity Error because you have had a genuine failure in RAM, the system bottles out because it can’t work out how to fix it, so instead of sending gibberish back to the system registers, it panics and suspends processing.
    What causes it? Simple: bad RAM. This can be the result of a manufacturing error, badly inserted DIMMs, sub-standard contact between the DIMM pins and the DIMM slot connectors (due to bent pins, damaged motherboard slots, grit, grime and dust), or using DIMMs of different quality standards together (always try and match DIMMs to the same manufacturer, model and batch). It is even possible that you have the DIMMs inserted in the wrong order – always place the largest DIMMs towards the processor source/sink and the smallest at the end of the array – on the 2400, this is DIMM bank A (farthest from the CPU) followed by B, C and finally D.

    Memory test the DIMMs using as many testing metrics as you can find, both environmentally (outside of the operating system) and natively (inside the operating system). Enable the full RAM test (POST) in the BIOS if you have this option available to you. Microsoft also have some free tools that you can obtain from Microsoft OCA to do this, and there are plenty of others – including the Win32 and bootable Dell Diagnostics & F10 management partitions on Dell configured disks. Whatever you do, don’t think that you can run one iteration and “it means it’s OK”. Be prepared to walk away for a day or two and come back to a green panel and THEN move onto the next test.

  2. Start pulling out your DIMMs and sequence test them, see if the crashing only happens with a certain combination, or in a certain slot. If it does, you can bet with a fair level of certainty that you have a real Parity error from either a bad DIMM or a bad DIMM slot.
  3. Update the system. I spend a lot of time reminding people of this. If you don’t have the latest Kernel enhancements provided through Service Packs and patches, how can you expect drivers/firmware written after their release to be fully interoperable with the older systems?
    If you update your firmware, then update the drivers to match the firmware revision.
    If at this point you are attempting to run NT 3.51 or NT 4.0 with 2GB of RAM in a modern hardware environment – go get a nice shiny, shrink wrapped version of 2003 Server. OK, so I threw that one in for no particular reason… someone has to keep Mr. Gates in the manner he’s accustomed to! Don’t they?
  4. Check that you aren’t overloading your PSU (power supply). If you have a poor flow down your power rails you can upset the components inside a computer.
  5. Update stroppy, meddling kernel-mode system drivers to the latest versions that you can find. I’m going to include anti-virus applications in this as well; I have heard of people experiencing NMI issues when using Symantec anti-virus server products (why? I have to ask) – update the fundamentals, or try and test the system with the latest version of the product if you haven’t already rolled it out, not forgetting to strip out its lower level driver routines!
  6. Yank the RAID array, throw in a blank disk and reinstall Windows. I know it’s a rather unruly suggestion, however you’ve already spent 72 hours performing constant RAM diagnostics, the idea of spending 39 minutes reinstalling Windows 2003 in a non-destructive manner isn’t that big a deal in the greater scheme of things.
    What is this designed to tell you? If the system still dies under this test with a clean install of Windows, then you are likely looking at a hardware problem rather than a genuine Windows malconfiguration. If it works, try installing your base application rollout, drivers and so on and continuity test the system.
  7. Do NOT in-place upgrade/repair the operating system. It is utterly pointless. If it is a Windows error, bite the bullet and spend the time performing a clean install. As you’ll see below, upgrading the in-place operating system does no good whatsoever.
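
To make the parity check described in item 1 a little more concrete, here is a toy sketch of single-bit even parity in Python. It is an illustration of the principle only, not how the 2400’s memory controller is implemented: real ECC DIMMs use a wider code that can typically correct single-bit errors as well as detect them. The function names are my own.

```python
def even_parity_bit(byte):
    """Return the extra bit that makes the total count of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte, stored_parity):
    """True if the byte still agrees with the parity bit stored alongside it."""
    return even_parity_bit(byte) == stored_parity


data = 0b10110010                  # four 1 bits, so the parity bit is 0
stored = even_parity_bit(data)     # written to RAM together with the data

single_flip = data ^ 0b00000100    # one bit corrupted in memory
double_flip = data ^ 0b00000110    # two bits corrupted in memory

print(parity_ok(data, stored))         # True: data is intact
print(parity_ok(single_flip, stored))  # False: the error is detected
print(parity_ok(double_flip, stored))  # True: two flips cancel out and slip through
```

Note that plain parity can only say that something is wrong, not which bit is wrong, which is why an uncorrectable error ends in a halt rather than a quiet fix.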

 

The experiment(s)

Clearly all of the above have been performed on my victim… er test subject and the PowerEdge, running Windows 2000 Server is still experiencing the problems. So how to fix it? I have outlined all the additional steps that I took, and outlined briefly the outcome of each. I’ve listed them primarily to give you some ideas over what you can do if you are experiencing the same problem. Do remember that just because something impacted or didn’t impact my predicament, doesn’t for a moment mean the same will be true for you.

The first and most immediate solution became apparent quite quickly in the generic testing.

  1. Don’t run it with 2GB RAM. My system has 4x512MB ECC DIMMs in it. This fills all four banks and maxes out the RAM capacity of the system. If I take out ANY of the four DIMMs from ANY of the slots, the problem vanishes. Completely. All the RAM passed all the testing I threw at it, but for some reason it seems that the system just didn’t want to address and remain stable with 2048MB in it. At least the system could be quickly made production worthy. However… I don’t like leaving things unfinished. It’s not a very professional way to approach the trials that life throws at us.
  2. Put 4x128MB ECC PC133 DIMMs in the system. This was one of the last tests I did, and the system seemed to operate flawlessly with 512MB in it, suggesting that the issue is not a bus related one.
  3. Try a non-Microsoft operating system. I have to confess that at this point I was plumb out of SCSI drives to use, so I elected to perform testing with Live CDs instead – it’s not perfect – but several renditions of Linux showed no observable problems and, perhaps more interestingly, no Live CD versions of Windows did either.
    I tested using Windows PE for 2003 Server and Bart PE, and although both use the core of Windows XP, they exhibited no obvious problems and certainly no NMI BSOD’s even with most of the system drivers loaded. Except the PERC 2/Si controller.
  4. Dump the RAID configuration and start from Scratch. Nothing happened except having to pull one of the 512’s out to get it to go through setup without BSOD stalling.
  5. Tape restore the system and in-place upgrade from Windows 2000 Server to Windows 2003 Standard Server. New Kernel, more robust operating system you think? Alas no. At this point 2003 Server became the test operating system.
  6. Replace the PIII 600EB processor with a PIII 733EB processor. I got a nice speed boost but, alas, no fix to the problem.
  7. Replace the uni-processor PIII 733EB with two PIII 1000 (1GHz) SL4BS’s (making the system dual-processor). I was quite hopeful on this one, but sadly no. Best £19 I ever spent though!
  8. Pull everything from the PCI bus, disable all the integrated BIOS hardware (NIC, USB etc). No change.
  9. Reset the EEPROM (Pull the BIOS battery) – no change
  10. Alternate and re-sequence the use of the RAID controller positions on the backplane – no change.
  11. Replace the RAID controller DIMM – no change
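
If you want to repeat the leave-one-out test from experiment 1 methodically, it helps to write the configurations down before you start pulling hardware. The sketch below enumerates every pairing of "one DIMM removed" with "one slot left empty" for four DIMMs and four slots, giving 16 layouts to boot and soak test. The DIMM labels are hypothetical (label yours however you like); the slot letters follow the 2400’s banks A to D.

```python
def leave_one_out_plan(dimms, slots):
    """List every layout with exactly one DIMM removed and one slot empty."""
    plan = []
    for spare in dimms:
        fitted = [d for d in dimms if d != spare]
        for empty in slots:
            used = [s for s in slots if s != empty]
            # Remaining DIMMs go into the remaining slots in order.
            plan.append({
                "spare": spare,
                "empty": empty,
                "layout": dict(zip(used, fitted)),
            })
    return plan


if __name__ == "__main__":
    dimms = ["DIMM1", "DIMM2", "DIMM3", "DIMM4"]  # hypothetical labels
    slots = ["A", "B", "C", "D"]
    for step in leave_one_out_plan(dimms, slots):
        print(step)
```

If crashes follow one particular DIMM or one particular slot around the table, suspect that part. If none of the 16 layouts crash, as was the case here, the fault is more likely tied to running the full 2GB than to any single module.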

 

Ah!

All seems rather hopeless doesn’t it?

I pulled the server from the farm (again) and brought it somewhere a little more comfortable to work with (again). Cleaned the DIMM slots, cleaned the pins on the DIMMs (again). I have always felt that the NMI error in this case has been something of a misnomer.

I have been gravitating towards the backplane/RAID controller for some time in my other experiments, so primarily out of having no better ideas, I decided to completely disconnect the backplane assembly. I unlinked the ribbon cables and cleaned them, removed all the PCB’s, straightened out the wiring, cleaned off the heavily dust laden drive connectors and cleaned the connector pins on both the daughter board and the motherboard. I then fully cleaned out the inside of the case around the drive bays, reseated the backplane and put the cabling back.

I fired the 2400 up and was immediately presented with a POST warning stating that the ESM firmware revision was out of date.

!!!!****Warning: Firmware is out of date, please update****!!!!

Particularly strange, considering everything had long since been flashed to the latest version before the system was last powered down. I can only surmise that the flashed firmware is held in volatile storage and that, with the disconnection of the battery sources, it reverted to its original revision.

To my surprise, the system booted straight into Windows 2003 Server and hasn’t NMI’d out since!
Deciding to tempt fate, I downloaded and re-flashed the ESM controller to the latest version, and I am pleased to report that even with the re-flash it hasn’t (yet) fallen over.

I have made no driver changes to the installation; everything is running on Windows 2003 default drivers with the exception of the Tape stream driver from Windows update. I have installed McAfee Enterprise 8.0i (Patch 14, latest DATs/Engine) on the system and set up IIS in its production configuration. The PowerEdge has mainly been idling since the reinstall (and it’s been lying on its side, as well as counting the system idle process!), but don’t worry. If this holds out for a little longer, I will put those two new SL4BS’s to good use.

All being well, you’re receiving this website from it.

 

Update 15th April 2007: Well, all has been well and you are seeing this website from the PowerEdge 2400. The server has offered impeccable performance since I wrote this article (shame I can’t say the same for Microsoft’s patching downtime requirements so that I could prove the uptime). The server also survived the cold winter without incident, unlike its 2600 counterpart which fell over numerous times. PowerEdge 2400, a workhorse and a graceful lady. Good job Dell.

 

Update 17th June 2007: All still seems to be good. I updated to McAfee 8.0.0i Patch 15, and a couple of days later had a complete system drop out (one of those black screen, no response to anything, just spinning the electricity meter moments). It wasn’t NMI related though, I suspect a PSU fluke in this case. Aside from this, the system continues to run flawlessly.

 

Update 31st December 2007: Everything seems to be running just fine with the server although I have concluded that it doesn’t want to run with 2.0 GB RAM in it as it seems to force the System Management Controller to reboot it around every 2 months. I have reduced it to 1.5 GB where it runs comfortably without any unexpected reboots by the SMC. I have also updated the tape firmware to A18.