Chapter 2
FC I/O Subsystem-Level Problem Reports


ATTENTION

IBM strongly recommends that, after the system has been successfully brought up for the first time, the system owner save a reference copy of the dumpconf listing that shows the complete and correct configuration of all system hardware devices.

After any power interruption, reboot, or devctl -c reconfiguration command, compare the new dumpconf listing against this reference file to determine whether any devices failed to reestablish their links and disappeared from the system's configuration tree.
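
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the layout of the dumpconf listing is not described in this chapter, so the sketch assumes that each device appears on its own line and that the first whitespace-delimited field is the device name. The function names are hypothetical.

```python
def device_names(listing_text):
    """Collect the first whitespace-delimited field of every non-blank line.

    Assumption: that field is the device name. Header lines are collected
    too, but they appear in both listings and cancel out in the comparison.
    """
    names = set()
    for line in listing_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_devices(reference_text, current_text):
    """Return, sorted, the names found in the reference listing but not in the new one."""
    return sorted(device_names(reference_text) - device_names(current_text))
```

To use it, read the saved reference file and the new listing into two strings and pass them to missing_devices(). Any name returned is a candidate for a device that failed to reestablish its link and should be checked by hand.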


The problems described in this chapter involve more than one FC device of the I/O subsystem. Problems specific to a particular device are documented in the separate chapter for that device.


2.1 FC Device Initialization Problems

At this release, there are no known subsystem-level problems regarding FC device initialization.


2.2 EMC® Storage Subsystem-related Problems

The following problems occasionally occur when an EMC Symmetrix® Storage Subsystem is connected to an FC Switch.

EMC Disk I/O Timeouts Caused by Coincidence with Internal EMC Tests (PR 251177, 251703)

Under certain unknown I/O loads, an EMC Symmetrix subsystem running 5265 firmware can cause sequence time-outs. These errors are detected by inspecting the xSeries 430 or IBM NUMA-Q system ktlog for sequence time-out messages from EMC disks. As long as the system is configured for a Level-2 or -3 resource domain, retries and rerouting will protect against data loss.

The probable cause is interference from the internal environmental tests that the EMC system is programmed to run at a certain time of day. Check the ktlog messages to see whether the time-outs recur at the same time of day. Because the incidence appears to be load-dependent, they may not occur every day, so review the log over many days. Ask the EMC field engineer to check the scheduled times of the internal environmental tests within the EMC system, in particular the time scheduled for "Environmental Test 1," the most likely culprit. If the ktlog time-out messages occur within 120 seconds after the internal EMC process completes, this is the most likely cause.
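
The 120-second correlation check can be sketched as follows. This is a minimal illustration, not a supported tool: it assumes the time-out timestamps have already been extracted from the ktlog and the test completion times have been obtained from the EMC field engineer, both as Python datetime values. The function name and the clock_skew parameter are hypothetical. The clock_skew parameter allows for a difference between the host clock and the EMC clock.

```python
from datetime import timedelta

# From the text: time-outs within 120 seconds after the test completes are suspect.
WINDOW = timedelta(seconds=120)


def timeouts_near_test(timeout_times, test_end_times, clock_skew=timedelta(0)):
    """Return the time-outs that fall within WINDOW after any test completion.

    clock_skew is (host clock - EMC clock). It is added to each EMC
    completion time to express that time on the host clock before comparing.
    """
    hits = []
    for timeout in timeout_times:
        for end in test_end_times:
            delta = timeout - (end + clock_skew)
            if timedelta(0) <= delta <= WINDOW:
                hits.append(timeout)
                break  # count each time-out once
    return hits
```

If most of the time-outs collected over many days are returned by this check, the scheduled internal tests are the likely cause. If few or none are returned, look elsewhere.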


ATTENTION

Check first for any discrepancies between the xSeries 430 or IBM NUMA-Q system clock and the EMC system clock.


Workaround: Ask the EMC field engineer to reschedule those internal tests to a known low-load time of day. An alternative is to upgrade the EMC subsystem to the first general availability release of the 5266 firmware.

SCSI-Ported EMC Disks Sometimes Respond Inappropriately to Probing From FC Bridge (PR 250790)

Under certain unknown I/O loads in a Level-2 or -3 resource domain, a disk in a SCSI-ported EMC subsystem can be missed when an FC Bridge is configured back into the system (devctl -c) following a service procedure such as a replacement. Under these loads, an EMC disk responds to the probe with a "SCSI busy" signal, which the operating system does not recognize as a retryable event, so that disk is not included in the I/O configuration for that path. This error is detected by checking the dumpconf output after the procedure to verify that all devices were found.


ATTENTION

As long as the system is configured for a Level-2 or -3 resource domain, retries and rerouting will protect against data loss.


Workaround: If a previously configured disk is found to be missing from a scsibus connected to the FC Bridge that was reconfigured, issue a devctl -c scsibusx command, where x identifies the affected scsibus, to recover the attached disk.
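
When several buses are affected, the recovery commands can be listed for review before they are entered. The sketch below only builds command strings in the devctl -c scsibusx form given above and does not run anything. It assumes the operator has already worked out, from the dumpconf comparison, which disks are missing on which scsibus. The function name and the mapping passed to it are hypothetical.

```python
def recovery_commands(missing_by_bus):
    """Build one 'devctl -c scsibusN' command string per bus that has missing disks.

    missing_by_bus maps a scsibus number to the list of disk names found
    missing on that bus. Buses with an empty list are skipped.
    """
    commands = []
    for bus in sorted(missing_by_bus):
        if missing_by_bus[bus]:
            commands.append("devctl -c scsibus%d" % bus)
    return commands
```

After issuing the commands, check the dumpconf output again to confirm that the disks have returned to the configuration.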

EMC Sequence Timeouts Sometimes Turn into Hard Errors in xSeries 430 or IBM NUMA-Q Clusters (PR 251635)

In xSeries 430 or IBM NUMA-Q clusters, if a hardware-error interrupt (B_INT) occurs on one of the nodes during certain unknown load conditions, an EMC I/O may fail on the other node(s) of the cluster, causing soft and hard I/O errors on those nodes.

Workaround: There is no known workaround at this time. The probability of a B_INT on an xSeries 430 or IBM NUMA-Q system is very low.