This chapter describes known problems in the current versions of the console software products. Most problem reports include a way to work around the problem.
Unless otherwise stated, fixes for the problems listed here will be considered for inclusion in a future release of the product.
ATTENTION Troubleshooting time can be greatly reduced if the clocks on the console PC and the Quads are synchronized.
When upgrading CSW, one or more script files may be read-only. In this case, you are asked whether to overwrite each file, with the choices Yes, No, and Cancel. If you click Cancel, the dialog box disappears and InstallShield may hang.
Workaround.
Use the Windows NT Task Manager to end the InstallShield process. Connect to the VCS directories and remove the read-only attribute from the files to be overwritten. After cleaning up the directories, restart the installation.
If a Quad has been replaced, the previous Quad still persists in the operating system naming database. For example, if quad2 has been replaced in a three-Quad system, the OS names the Quads quad0, quad1, quad3. This can be viewed with the ptx command: dumpconf | grep quad.
On the console PC, the configuration tree still contains an entry for quad2, because the new Quad took the place of the old one, but when the OS boots, entries for quad3 are added according to the OS naming database. This creates what appears to be a ghost Quad on the console. If a hardware fault then occurs, VCS crashes with a Dr. Watson report.
Workaround.
The safest workaround is the following:
Log in to the operating system as root.
Use the devctl -D command to delete the Quad that was removed. Refer to the devctl man page for specific information.
Use the devctl -n command to rename the new Quad to the name of the removed Quad.
Shut down and restart VCS.
If you create eight systems in VCS (not a supported configuration), VCS crashes when the eighth system is created. After VCS is restarted, the eighth system cannot be removed, and after a few seconds VCS crashes again. The first seven systems can be destroyed using the VCS CLI, but the only way to remove the eighth system is through the registry. As a side effect, after the registry has been cleaned up and VCS is restarted and then closed, a GPF/Dr. Watson error occurs.
Quads with memory (VLM) populated in the range between 3.75 GB and 4 GB can lose up to 512 MB because a row in the QMC card is deconfigured.
Workaround.
Do not populate memory in this region.
Using manual deconfig, it is possible to deconfigure all of memory. If you then attempt to boot the system, it hangs at BIOS with POST code 0x73, indicating that there is no memory. (The POST code can be seen in the VCS log.)
Workaround.
If this occurs, use the VCS CLI to do the following. This example assumes that your system is named numaq1, and that quad0 is where the BIOS hang has occurred.
-> cd /numaq1
-> reset -f on
-> cd quad0/qbb/qbb_rsic/
-> rs ext1
Use the value returned for ext1 to determine a new value nn that clears bit 7 of ext1, then enter:
-> ws ext1 <nn>
-> cd /numaq1/quad0
-> reset off
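Working out nn is a single bit operation. The following is a minimal Python sketch, assuming you read the value printed by rs ext1 as a hexadecimal number; the sample reading 0xC5 and the function name clear_bit7 are hypothetical, and you should confirm the number format that ws expects before typing the result in.

```python
def clear_bit7(ext1: int) -> int:
    """Return ext1 with bit 7 (mask 0x80) cleared and every other bit unchanged."""
    if ext1 < 0:
        raise ValueError("register value must be non-negative")
    return ext1 & ~(1 << 7)


if __name__ == "__main__":
    # Hypothetical value read with `rs ext1`; replace it with the real reading.
    ext1 = 0xC5
    nn = clear_bit7(ext1)
    # Assumption: ws accepts a hexadecimal value; check this before using it.
    print(f"-> ws ext1 {nn:#04x}")  # prints "-> ws ext1 0x45"
```

Masking with ~(1 << 7) leaves all other bits exactly as they were, so only the bit named in the workaround changes.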
If the console displays a dialog box with the error "VcsApp initialization error - Terminating this instance", or if the message "Sx Qxx ERROR SiteMgr Open - error getting IP address for MDC net" is written to the VCS log, a problem has occurred with the MDC NET. Either the MDC NET settings in the registry do not match the IP address of the Ethernet card, or a physical failure has occurred.
Workaround.
To check for an MDC NET mismatch:
Check the current IP address setting for the Ethernet card in the Windows NT Control Panel, under the Network option.
Check the current MDC NET IP address setting in the registry using the setreg.exe program. If the two addresses do not match, change the registry setting to match.
If the settings match, check the physical MDC NET connection. A failure can be caused if the MDC NET is not connected, the connection is broken, or the connection is improperly terminated. Check the connection, then reboot Windows NT.
Failure can also occur if Windows NT was booted when the connection was not correct. In this case, a dialog box is displayed a short time after booting indicating a service or driver did not start. An entry is added to the Windows NT Event Log detailing which ethernet driver did not start. Check the connection and reboot Windows NT. Power cycling the console PC is recommended.
If the MDC Net is suspect, a small network consisting of a "T" adaptor and two terminators can be used instead.
Occasionally when a system is deleted, the MDCs switch to Error Mode, displaying an E on their LED display.
Workaround.
Power cycle the Quad or IQ-Ring module.
In some cases, AC powering off Quads and IQ-Ring modules can cause VCS to crash.
Workaround.
Restart VCS.
In some cases, such as clicking Reset On while the sak.dat file is loading, the Console and message windows are sent behind all other open windows on the console PC. The message stays hidden until the operator closes or moves the other windows.
VCS seems to hang during the shutdown process. It can take up to three minutes to complete the shutdown. This is noticeable if you close VCS then immediately close Launcher. The Launcher window remains on the screen for several minutes.
During this time, the console software performs internal data structure cleanup and other routine checks. Once these tasks are complete, Launcher exits.
The autoreconfig and fault reboot features are designed to reduce the downtime and speed up the recovery by automatically reconfiguring the hardware to reboot the system from an alternate boot PBay.
If the system is configured to automatically reconfigure and boot but does not have an alternate boot PBay, the Quads are switched in the event of a failure. This results in invalid device paths and incorrect probes, and the system goes into a loop, for example:
SAK: 00:
S1 Qxx INFO SysMon SAK: 00: NOTICE: invalid device path specified for root
S1 Qxx INFO SysMon SAK: 00: Returning to Firmware.
S1 Qxx INFO SysMon state change: 14: Unix Kernel Shutdown
S1 Qxx DEBUG TargMgr SetSystemState - System state has been set to (Operating System Shutdown, OS Shutdown, Running O/S)
Workaround.
Run the sysdef -l command and make sure the Quads in the output are in the right order. The Quad with the QLC and the boot PBay must also have the boot device. (If the OS is in a multi-user state, the lh -r command shows each Quad's components, including the QLC.) You can read the seven-segment LED number on each Quad to confirm that the Quads are in order. If they are not, use the sysdef -d command to delete the system, then recreate it in the proper order with sysdef -c. The sysdef -l command shows the actual name list.
-> sysdef -d bambam
sysdef: 'bambam' deleted successfully
-> sysdef -l
sysdef: Unassigned Mdc(s)
sysdef: MDC0: Quad_1200005 State: Active Net: 138.95.158.105 (8.0.47.2.1.A1)
sysdef: MDC1: Quad_3900012 State: Active Net: 138.95.158.103 (8.0.47.2.7.9D)
sysdef: MDC2: Quad_3900009 State: Active Net: 138.95.158.101 (8.0.47.2.7.50)
sysdef: MDC3: Quad_3900007 State: Active Net: 138.95.158.102 (8.0.47.2.7.89)
sysdef: MDC4: Lash_1100004 State: Active Net: 138.95.158.100 (8.0.47.2.7.5E)
-> sysdef -c bambam 1 Quad_3900007 Quad_3900009 Quad_3900012 Quad_1200005 Lash_1100004
sysdef: 'bambam' created successfully
-> cd bambam
-> sysdef -l
sysdef: System1: bambam State: (Config Mgmt, Activating, Power Off)
MDC0: quad0 (Quad_3900007) State: Active Net: 138.95.158.102 (8.0.47.2.7.89)
MDC1: quad1 (Quad_3900009) State: Active Net: 138.95.158.101 (8.0.47.2.7.50)
MDC2: quad2 (Quad_3900012) State: Active Net: 138.95.158.103 (8.0.47.2.7.9D)
MDC3: quad3 (Quad_1200005) State: Active Net: 138.95.158.105 (8.0.47.2.1.A1)
MDC4: lash4 (Lash_1100004) State: Active Net: 138.95.158.100 (8.0.47.2.7.5E)
-> cd quad0
-> bootflags
bootflags: nodeid "7f003204"
bootflags: masterid ""
bootflags: faultreconfig "1"
bootflags: faultreboot "0"
bootflags: autoboot "1"
bootflags: bootpath "0 quad(0)pci(5,0)scsi(1,0)disk(0,0,0)"
bootflags: autodump "20"
bootflags: dumppath "0 dump -f /etc/dumplist"
bootflags: ptxdebug "off"
bootflags: biosbootpath "lynxer.elf"
bootflags: lynxflags "0x00000001 0x0000fc0f 0x00000000 0x00000000"
bootflags: lynxerptxpath "ptxldr.elf"
bootflags: lynxerntpath "not used"
bootflags: lynxermquadpath "quad.elf"
bootflags: lynxerseqcodepath "str3_07.obj"
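Checking the Quad order by eye is error-prone on larger systems. The following Python sketch parses the system-level sysdef -l listing (in the format shown in the example) and compares the Quad serial names with an order you supply. The function names are hypothetical, the regular expression assumes the listing format shown above, and the expected order (the serial name of each Quad, arranged by its LED number) still has to be worked out from the hardware.

```python
import re

# Matches system-level lines such as:
#   MDC0: quad0 (Quad_3900007) State: Active Net: 138.95.158.102 (...)
# Lash lines (for example "MDC4: lash4 (Lash_1100004)") are ignored.
QUAD_LINE = re.compile(r"MDC(\d+):\s*(quad\d+)\s*\((Quad_\w+)\)")


def quads_in_listing(listing: str) -> list[tuple[str, str]]:
    """Return (quad name, MDC serial name) pairs sorted by quad number."""
    pairs = [(m.group(2), m.group(3)) for m in QUAD_LINE.finditer(listing)]
    return sorted(pairs, key=lambda pair: int(pair[0][len("quad"):]))


def check_quad_order(listing: str, led_order: list[str]) -> list[str]:
    """Compare the listing with serial names arranged in LED order.

    Returns a list of problems; an empty list means the order matches.
    """
    pairs = quads_in_listing(listing)
    problems = []
    if len(pairs) != len(led_order):
        problems.append(
            f"expected {len(led_order)} Quads, listing shows {len(pairs)}")
    for expected, (quad, actual) in zip(led_order, pairs):
        if expected != actual:
            problems.append(f"{quad}: expected {expected}, found {actual}")
    return problems


# System lines from the bambam example above.
LISTING = """\
MDC0: quad0 (Quad_3900007) State: Active Net: 138.95.158.102 (8.0.47.2.7.89)
MDC1: quad1 (Quad_3900009) State: Active Net: 138.95.158.101 (8.0.47.2.7.50)
MDC2: quad2 (Quad_3900012) State: Active Net: 138.95.158.103 (8.0.47.2.7.9D)
MDC3: quad3 (Quad_1200005) State: Active Net: 138.95.158.105 (8.0.47.2.1.A1)
MDC4: lash4 (Lash_1100004) State: Active Net: 138.95.158.100 (8.0.47.2.7.5E)
"""

if __name__ == "__main__":
    # Hypothetical expected order: serial names of the Quads showing
    # LED 0, 1, 2, 3 on their seven-segment displays.
    led_order = ["Quad_3900007", "Quad_3900009", "Quad_3900012", "Quad_1200005"]
    print(check_quad_order(LISTING, led_order) or "order OK")
```

If the check reports problems, the fix is still the one described above: delete the system with sysdef -d and recreate it in the proper order.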
System initialization can take a minute or more to complete and uses most of the MDC's resources. We recommend waiting two minutes after powering on the system before booting the operating system, flashing firmware, or updating the QBB BIOS.
If a Quad has existed as part of a system, it persists in the operating system naming database. For example, if quad2 has been replaced in a three-Quad system, the OS names the Quads quad0, quad1, quad3. This can be viewed with the command: dumpconf | grep quad.
On the console PC, the configuration tree still contains an entry for quad2, because the new Quad took the place of the old one, but when the OS boots, entries for quad3 are added according to the OS naming database. This creates what appears to be a ghost Quad on the console.
Workaround.
The safest workaround is the following:
Log in to the operating system as root.
Use the devctl -D command to delete the Quad that was removed. Refer to the devctl man page for specific information.
Use the devctl -n command to rename the new Quad to the name of the removed Quad.
Shut down and restart VCS.
During a Lynx Bus MBE error test, the OS panicked as expected, deconfigured the Quad, and rebooted. However, the QMI link failed. At the login prompt in the console window, the error "Qmisend to Thread 0x65 failed" is generated for each character typed. Resetting and rebooting the system produces the same result, and logging in is impossible because of the QMI errors. Diary entries were added correctly during the panic.
If an SCI cable is unplugged, a diagnostic runs but does not detect the problem and does not deconfigure the Quad. When the system is rebooted, it fails because of the missing SCI connection.
When resizing the number of rows on the OS Console window using the Options -> Rows/Columns dialog, sometimes the window does not resize.
Workaround.
Use the resize handles on the window border to resize the window after using the dialog box to set the new number of rows.
In CSW V1.7.3, the audit command may fail with an "Invalid Password specified" error. The audit command queries all valid hardware, assigned or unassigned, on a connected console (when audit is run from the highest level, //). To audit unassigned hardware, the audit script runs the assign script, which determines which devices are not assigned and then assigns them using the sysdef command. In CSW V1.7.3, sysdef requires a password to create or delete a system definition. Because a password cannot be passed to sysdef on the command line, the assign script fails.
After running the audit command, CSW appears to halt, displaying the following messages:
Sx Qxx TEXT VcsCli ->audit
Sx Qxx ERROR VcsCli *E* sysdef: Invalid Password specified with -c option
script file: 'c:\vcs\scripts\sequent\assign.cli' line 79
command: 'sysdef -c $sysName $sysID $nameList'
Workaround.
Change to a system level directory. For example,
-> cd /numaq1
Manually assign all hardware to the system using the sysdef -l and sysdef -c commands.
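When assigning hardware by hand, the sysdef -c command line (system name, system number, then the list of MDC names, as in the bambam example earlier in this chapter) can be assembled from the unassigned-MDC listing. This Python sketch parses both listing styles used in this chapter, with and without a space after the colon, and checks that every requested MDC is actually unassigned. The helper names are hypothetical; the sketch only prints the command for you to type into the VCS CLI and does nothing about the password that CSW V1.7.3 requires.

```python
import re

# Matches unassigned-MDC lines in either spacing style, for example:
#   sysdef: MDC0: Quad_1200005 State: Active Net: ...
#   MDC0:Quad_3701000 State: Active Net: ...
UNASSIGNED_LINE = re.compile(r"MDC(\d+):\s*((?:Quad|Lash)_\w+)\s+State:")


def unassigned_mdcs(listing: str) -> list[str]:
    """Return the MDC names found in an unassigned-MDC listing."""
    return [m.group(2) for m in UNASSIGNED_LINE.finditer(listing)]


def sysdef_create_command(sys_name: str, sys_id: int,
                          listing: str, order: list[str]) -> str:
    """Build a `sysdef -c` command that assigns MDCs in the given order.

    Raises ValueError if a requested MDC is not in the unassigned listing
    or is requested more than once.
    """
    available = set(unassigned_mdcs(listing))
    missing = [name for name in order if name not in available]
    if missing:
        raise ValueError(f"not in the unassigned listing: {missing}")
    if len(set(order)) != len(order):
        raise ValueError("an MDC is listed more than once")
    return f"sysdef -c {sys_name} {sys_id} " + " ".join(order)


# Unassigned listing from the bambam example.
UNASSIGNED = """\
sysdef: Unassigned Mdc(s)
sysdef: MDC0: Quad_1200005 State: Active Net: 138.95.158.105 (8.0.47.2.1.A1)
sysdef: MDC1: Quad_3900012 State: Active Net: 138.95.158.103 (8.0.47.2.7.9D)
sysdef: MDC2: Quad_3900009 State: Active Net: 138.95.158.101 (8.0.47.2.7.50)
sysdef: MDC3: Quad_3900007 State: Active Net: 138.95.158.102 (8.0.47.2.7.89)
sysdef: MDC4: Lash_1100004 State: Active Net: 138.95.158.100 (8.0.47.2.7.5E)
"""

if __name__ == "__main__":
    order = ["Quad_3900007", "Quad_3900009", "Quad_3900012",
             "Quad_1200005", "Lash_1100004"]
    print(sysdef_create_command("bambam", 1, UNASSIGNED, order))
```

With this input the sketch reproduces the sysdef -c bambam 1 ... command shown in the earlier example; the order list is where you apply the LED-based ordering.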
If the VCS command updateprocinfo is used, wait at least 60 seconds for the command to finish so that the information is completely written to the Diary RAM cache. Then shut down and restart VCS.
If VCS is installed on a console set to a different time zone from the machine that built VCS, the fwcheck command can report that files have changed when they have not. This bug does not alter any data. If fwcheck reports no changes, the result can be trusted; if it reports changes, some or all of them may be spurious.
Workaround.
Confirm all changes reported by fwcheck by comparing the creation date of the files in /quadfw/ with the creation date of the files displayed by the mdcflash -l CLI command.
Do not delete a system while the NUMA-Q system is in the (Config Mgmt, Activating) state. Access to the MDCs becomes very slow, then fails and appears to time out.
Workaround.
Restart VCS.
If the first command in a script or CLI window is invalid, its error message is not displayed until after the second command is entered, so the message appears to come from the second, valid command instead of the first, invalid one.
Workaround.
Put a reporting-only command, such as vcsrev, at the top of each script.
Using the right mouse button to paste information in a CLI window does not work. The information is pasted, but cannot be executed.
Workaround.
Use Shift-Insert, Ctrl-v or select Paste from the Edit menu.
The lashinit command is intended to work only in the path /xxx/lashX. Currently, the command can be issued anywhere in the hardware tree; outside that path it neither generates an error nor performs the specified action.
The sciprobe command appears to be reporting the status from the boot, rather than performing a new check when the command is issued.
Workaround.
Delete and re-create the system before issuing the sciprobe -v command.
The ConfigProm information for hardware components is cached in the VCS console configuration tree to improve the performance of the diary command. If you replace a hardware component and enter the new ConfigProm data, VCS continues to report the old data until VCS is restarted. Another workaround is to delete and then re-create the system using the sysdef command.
In a system with three or more Quads, the Offline Diagnostics MultiQuad 2 test causes COPB errors.
Workaround.
Break the system into smaller (1 or 2 Quad) systems and test them separately.
On MQuads, the SPC Board Continuity Tests intermittently fail.
In a 16-Quad system, the Offline Diagnostics can hang.
Workaround.
Break the system into two 8-Quad systems and test them separately.
During offline diagnostics, you may see the error message Quad[1] COPB reported an SERR NMI or Quad[1] COPB : Time-out on Host Bus Detected by COPB agent_id=0x3, apic_id=0x0 followed by a test failure message. This is a known diagnostics software bug; these specific error messages can be ignored.
After a redundant power supply has failed and been removed from a running system, the offline diagnostics may report spurious QBB RSIC scan test failures. These failures are reported every 30 seconds.
This is a known diagnostics software bug; these specific error messages can be ignored.
In a multi-Quad system, the Offline Diagnostics module takes a considerable length of time to perform a Commit Changes request. If testing is started before the request completes, the tests are run against the old list rather than the new one.
Workaround.
Place the Offline Diagnostics window behind another window. When the Offline Diagnostics window pops to the front, the Commit Changes request is complete.
Testing occasionally stops without an error message. If testing is taking an inordinate amount of time, check the VCS log. If more than one hour has passed since the last offline diagnostic message was logged, the tests have stopped running.
Workaround.
Exit VCS. Restart VCS and then restart the offline diagnostic testing.
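The one-hour rule is a simple timestamp comparison. In this Python sketch, the timestamp of the last offline diagnostic message is copied from the VCS log by hand; the sketch does not parse the log, and the function name diagnostics_stalled is hypothetical. Because the comparison uses the console PC's clock, it assumes that clock is set correctly.

```python
from datetime import datetime, timedelta

# Threshold from the known-problem description: more than one hour without
# an offline diagnostic message means the tests have stopped running.
STALL_LIMIT = timedelta(hours=1)


def diagnostics_stalled(last_message: datetime, now: datetime) -> bool:
    """Return True if more than STALL_LIMIT has passed since last_message."""
    return now - last_message > STALL_LIMIT


if __name__ == "__main__":
    # Hypothetical timestamp copied from the last offline diagnostic
    # message in the VCS log.
    last = datetime(1999, 6, 1, 14, 5)
    if diagnostics_stalled(last, datetime.now()):
        print("No diagnostic messages for over an hour: exit and restart VCS,")
        print("then restart the offline diagnostic testing.")
```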
The FRU list for the Offline Diagnostics is not implemented for this release of the console software.
The Online Diagnostic synctest is incorrect. A different version can be run from a DYNIX/ptx prompt.
Workaround.
Use the following commands from a DYNIX/ptx prompt:
ln -s /usr/sync/lib/.mach_dep/sync/sync_test /usr/onldiag/ to
ln -s /usr/chat/bin/chatdiag /usr/onldiag/
The Online Diagnostics are not set up properly by any installation method that supports upgrading the base OS: INIT ALT DISK DELTA, ALT DISK DELTA, and SCRATCH. The symptom is that VCS Online Diagnostics does not connect to the dispatcher after ptx 4.5.0 is installed.
The problem occurs when the installation is on a partition other than the current root partition. During a normal installation, the new partition is mounted at /installmnt and the file systems are built there, so Online Diagnostics is installed at /installmnt/usr/onldiag. The Online Diagnostics install script runs the onldiag_cleanup and onldiag_setup scripts. Because $ROOT has changed from / to /installmnt, these scripts clean up the current /usr/onldiag installation and set it up again, which includes copying the S99 scripts to /etc/rc2.d and executing them to start the dispatcher.
Workaround.
After rebooting into the new installation, run the onldiag_cleanup and onldiag_setup scripts. This copies the S99 scripts to /etc/rc2.d and starts the dispatcher.
The Online Diagnostic memtest can fail on systems with large memory configurations, such as SQuads with 8 GB or multiple SQuads with 4 GB each. The following error is reported:
memtest: malloc XXXXXXXXX bytes, ret int 0, ulong 0: Not enough space (12)
Workaround.
Edit the file c:\vcs\bin\memory.tlg
Change the command option variables from n(2,4,2) to n(4,4,0)
This enables all four processors during the Online Diagnostic memtest.
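If you prefer to script the memory.tlg edit rather than make it in an editor, the following Python sketch performs the same substitution. It assumes the option appears literally as n(2,4,2) in the file, keeps a .bak copy of the original before writing, and reports how many occurrences were changed; the function names are hypothetical.

```python
from pathlib import Path

OLD_OPTION = "n(2,4,2)"
NEW_OPTION = "n(4,4,0)"


def update_memtest_options(text: str) -> tuple[str, int]:
    """Replace every OLD_OPTION with NEW_OPTION; return (new text, count)."""
    count = text.count(OLD_OPTION)
    return text.replace(OLD_OPTION, NEW_OPTION), count


def patch_memory_tlg(path: Path) -> int:
    """Edit the file in place, saving a .bak copy first if anything changes."""
    original = path.read_text()
    updated, count = update_memtest_options(original)
    if count:
        # Keep the original as, for example, memory.tlg.bak.
        path.with_name(path.name + ".bak").write_text(original)
        path.write_text(updated)
    return count


# Usage on the console PC:
#   patch_memory_tlg(Path(r"c:\vcs\bin\memory.tlg"))
```

A count of 0 means the expected option string was not found, in which case the file should be inspected by hand rather than assumed to be correct.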
In this release of the console software, Online Diagnostics does not support concurrent testing of multiple FDDI or CDDI boards.
Workaround.
Install a loopback fixture on, and manually enable testing of, only one FDDI or CDDI board at a time.
If a Quad has existed as part of a system, it persists in the operating system naming database. For example, if quad2 has been replaced in a three-Quad system, the OS names the Quads quad0, quad1, quad3. This can be viewed with the command: dumpconf | grep quad.
On the console PC, the configuration tree still contains an entry for quad2, because the new Quad took the place of the old one, but when the OS boots, entries for quad3 are added according to the OS naming database. This creates what appears to be a ghost Quad on the console.
Workaround.
The safest workaround is the following:
Log in to the operating system as root.
Use the devctl -D command to delete the Quad that was removed. Refer to the devctl man page for specific information.
Use the devctl -n command to rename the new Quad to the name of the removed Quad.
Shut down and restart VCS.
When a new log file is created, the default fragment and log sizes are used even if new values are specified at the same time.
Workaround.
Issue the log command twice: the first time to change the name of the log file, and the second time to change the size of the new log file. Reordering the arguments to the log command does not help; the command must be run twice.
Under certain loading conditions, the Online Diags Test Dispatcher and the OS Config Daemon may generate spurious ping timeout warning messages.
The messages appear in the log in the form of:
WARN VcsApp OsConfig::, System 1.
No response from config daemon for 240 seconds
These warnings do not affect normal operation of the system and can be ignored.
Alternatively, the warnings can be turned off with the VCS Registry Setup program. On the first settings page, set the Ping Warning and Ping Error times to 0. This prevents all future spurious ping timeout messages.
This section provides instructions for loading vendor-supplied software (Emulex®) for the PCI-bus FC host adapter (the IBM NUMA-Q FC-P board), and downloading it to firmware on the board. This procedure is only needed if the host-based ffutil utility is not accessible.
ATTENTION If you are upgrading from Console Software 1.3.1 to 1.3.2 or 1.3.3, it is not necessary to flash the FC-P Host-adapter firmware. The files did not change for the 1.3.2 or 1.3.3 release.
Use these instructions to:
Upgrade current FC-P host-adapter firmware to a new release level.
Upgrade a down-revision replacement FC-P board to the current firmware level.
The procedure to flash the firmware on an FC-P board in a Quad is unusual in that the DYNIX/ptx operating system must be shut down and the BIOS of a Quad must be accessed to boot DOS and run a vendor-supplied utility. Because the utility only works with Quad 0, each Quad in the system will have to be temporarily defined one at a time as a single-Quad system. Two upgrade procedures are provided. The short-form procedure is intended for installers who have previously performed the upgrade. The detailed procedure is intended for installers performing the upgrade for the first time.
Before beginning an upgrade procedure, check first to make sure that the following conditions are present in the host system:
Document the existing system definition.
Bring the operating system to run-level 0.
Use the Power slider switch on the console front panel to soft-power down all Quads in the system (this turns off DC power but not AC power), and close the VCS PTX Console window.
Install the optical loopback plugs originally supplied with each FC-P board.
ATTENTION Extra caution is required when unplugging and handling fibre cables. When a cable is unplugged, the plastic dust covers/caps should be placed on the ends of the cable. The caps from the optical loopback cable can be used.
Delete the old system definition and create a new single-Quad system definition.
Change the biosbootpath and boot DOS on the Quad with the power on command in the VCS CLI window.
Open a new PTX Console window.
Run the Emulex utility program to upgrade and test all boards in that Quad.
Close the VCS PTX Console window.
Reset the biosbootpath to the normal path.
From the VCS CLI window, use the power off command to soft-power down the Quad (turning off DC power but not AC power).
Remove the optical loopback plug from each FC-P board in the Quad and reconnect the fiber-optic duplex cable.
Repeat Steps 5 through 10 until all Quads in the system have been upgraded.
Redefine the original system using the information taken in Step 1.
Open a new VCS PTX Console window for this system and power up all Quads.
Use the cd /system_name and the sysdef -l commands to document the current system definition. This information is necessary to restore the system after all FC-P host adapters have been flashed with the DOS utility.
Bring the operating system down to run-level 0 with the init 0 command.
From the VCS CLI window, soft power-down all Quads in the system with the power off command for each Quad.
Close the VCS PTX Console window.
For each FC-P board in the system, disconnect the existing fiber-optic duplex cable.
ATTENTION Extra caution is required when unplugging and handling fibre cables. When a cable is unplugged, the plastic dust covers/caps should be placed on the ends of the cable. The caps from the optical loopback cable can be used.
Install the optical loopback plug supplied with each FC-P board to allow running the onboard diagnostic tests after the upgrade firmware has been installed.
Use Windows NT Explorer to verify that the file dosflist.txt is in the directory c:\vcs\scripts. If it is not present, copy it there from C:\vcs\quaddos\dosflist.txt.
Use the sysdef -d system_name command to delete the current system definition.
List the unassigned MDCs (sysdef -l).
->sysdef -l
Unassigned Mdc(s)
MDC0:Quad_3701000 State: Active Net: 138.95.158.100 (8.0.47.2.0.42)
MDC1:Quad_3601060 State: Active Net: 138.95.158.102 (8.0.47.2.0.45)
MDC2:Lash_0700027 State: Active Net: 138.95.158.101 (8.0.47.2.0.52)
Use the sysdef -c command to define an unassigned Quad as a system by itself. In this example, we are defining a system named flash1, system number 1, composed of the first unassigned MDC Quad_3701000. From the VCS CLI window, type:
->sysdef -c flash1 1 Quad_3701000
In the VCS CLI window, enter the following commands:
-> cd /sys_name/quad0
-> mdcflash -w c:/vcs/quaddos/dosboot.exe
-> bootflags biosbootpath dosboot.exe
Open a new VCS PTX Console window for this system.
Enter the following commands in the VCS CLI window:
-> power on
-> reset off
Click in the upper left pane of the Console window. A blinking cursor appears.
Wait for DOS to load (this may take several minutes) and begin running. The AUTOEXEC.BAT program prompts you before starting the Emulex Diagnostic and Firmware Update Utility.
Press any key to continue (for example, the spacebar).
Record the firmware versions for all of the Host Adapters that are displayed. Press Enter to continue.
ATTENTION At any menu prompt, a 0 entry returns you to the previous menu until the Main Menu is reached.
Select the Maintenance option (option number 5).
Select the Update Firmware option (option number 1).
Select, by number, all host adapters to be upgraded, using the syntax described in the on-screen instructions.
The firmware image file to be flashed at this point is the base firmware that contains all the operational files, including diagnostics. The Quad file name is:
ff222.awc
The SQuad file name is:
sf222.awc
Enter 1 to download the new firmware. When complete, press Enter to continue.
When the flash is complete for all boards, press 0 until you return to the Main Menu.
From the Main Menu, select the Restart Host Adapters option (option number 3).
Select the Test Host Adapters option (option number 1).
Verify that all host adapters pass the diagnostic tests.
Confirm that the firmware for each host adapter in the system has been updated to show the new firmware file name (minus the extension).
From the Main Menu select the Quit option.
From the VCS PTX Console window, click the Reset button.
When the reset is complete, click the Power OFF button.
Close the VCS PTX Console window.
In the VCS CLI window, type the following commands:
-> cd /sys_name/quad0
-> bootflags biosbootpath lynxer.elf
Repeat all of the steps in this section (Flashing an Individual Quad) for each Quad in the system.
When all FC-P boards have been flashed, delete the last system definition (sysdef -d) and confirm that all Quads are unassigned (sysdef -l) and in the soft-powered down condition (AC power still applied).
Remove the optical loopback plug and reconnect the fiber-optic duplex cable to each FC host adapter board.
ATTENTION Extra caution is required when unplugging and handling fibre cables. Be sure to put the plastic dust covers/caps back on the optical loopback cable.
Use the sysdef -c command to redefine the original system from the information gathered in Step 1 of the Preparing the System section.
Open a new PTX Console window.
Use the Power slider switch to turn the DC power on.
This completes the flashing procedure for updating the firmware of an FC-P host adapter board.
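Several steps of the detailed procedure depend on getting file names right: dosflist.txt must be present in c:\vcs\scripts, the image to flash is ff222.awc for a Quad or sf222.awc for an SQuad, and each updated adapter should show the image name without its extension. The Python sketch below collects those checks. The function names are hypothetical, the adapter names must be typed in from the Emulex utility's display, and the case-insensitive comparison is an assumption about how the utility shows the name.

```python
import shutil
from pathlib import Path

# Firmware images named in the procedure.
FIRMWARE_IMAGES = {"Quad": "ff222.awc", "SQuad": "sf222.awc"}


def firmware_image(quad_type: str) -> str:
    """Return the image file to flash for a "Quad" or an "SQuad"."""
    try:
        return FIRMWARE_IMAGES[quad_type]
    except KeyError:
        raise ValueError(f"unknown Quad type: {quad_type!r}") from None


def adapters_not_updated(reported: dict[str, str], image: str) -> list[str]:
    """List adapters whose displayed firmware name is not the image stem.

    `reported` maps an adapter number to the firmware name shown for it.
    The comparison ignores case (an assumption about the display).
    """
    expected = Path(image).stem.lower()
    return [adapter for adapter, name in reported.items()
            if name.strip().lower() != expected]


def ensure_dosflist(quaddos_dir: Path, scripts_dir: Path) -> bool:
    """Copy dosflist.txt into scripts_dir if it is missing.

    Returns True if a copy was made, False if the file was already there.
    """
    target = scripts_dir / "dosflist.txt"
    if target.exists():
        return False
    shutil.copy2(quaddos_dir / "dosflist.txt", target)
    return True


# Usage on the console PC (paths from the procedure):
#   ensure_dosflist(Path(r"C:\vcs\quaddos"), Path(r"c:\vcs\scripts"))
#   image = firmware_image("Quad")                             # "ff222.awc"
#   adapters_not_updated({"0": "ff222", "1": "ff222"}, image)  # []
```

Any adapter number returned by adapters_not_updated still shows old firmware and should be flashed again before the loopback plugs are removed.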