Isolating problems with the SAN File System

This topic explains how to begin problem determination by isolating problems with the SAN File System, SAN, and LAN.

In most cases, you can use the logs provided by the SAN File System to begin isolating problems.
  • For problems that seem to be related to clients, use the logs that are available with the client operating system (such as the system log on AIX® clients and the Event Log on Windows® clients) to determine the cause of the problem. In addition, you can use the information provided in Troubleshooting a SAN File System client.
  • For problems that seem to be related to administrative user access, use the security log and the administrative log to determine the cause of the problem. If you access these logs through the master metadata server, you will see a consolidated view of the logs from each of the metadata servers in the cluster. If you access these logs through a subordinate metadata server, you will see the logs for that particular metadata server.

    In addition, you can use the information provided in Troubleshooting an administrative server.

  • For problems that seem to be related to the cluster, metadata servers, or metadata, use the server log to determine the cause of the problem.

    In addition, you can use the information provided in Troubleshooting the cluster. If you access the server log through the master metadata server, you will see a consolidated view of the logs from each of the metadata servers in the cluster (called the cluster log). If you access the server log through a subordinate metadata server, you will see only the log for that particular metadata server.

  • For problems that seem to be related to the engines in the SAN File System, refer to the server documentation to determine the cause of the problem.
In cases where you are not sure whether the problem is related to the SAN File System, SAN, or LAN, use the information in this section to begin isolating the problem.
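
The guidance above can be summarized as a small lookup table. The following Python sketch is illustrative only: the category keys and the function name where_to_look are hypothetical, and the log names are taken from the list above rather than from any SAN File System interface.

```python
# Hypothetical lookup table. The category keys are illustrative;
# the log names come from the bulleted list in this topic.
LOGS_BY_PROBLEM_AREA = {
    "client": ["client operating system log"],
    "administrative access": ["security log", "administrative log"],
    "cluster": ["server log"],
    "metadata server": ["server log"],
    "metadata": ["server log"],
    "engine": ["server documentation"],
}

# Fallback when the problem area is not known.
ISOLATION_STEPS = ["SAN, IP network, and storage isolation steps"]


def where_to_look(problem_area):
    """Return the suggested starting points for a problem area."""
    key = problem_area.strip().lower()
    return LOGS_BY_PROBLEM_AREA.get(key, ISOLATION_STEPS)


if __name__ == "__main__":
    for area in ("Client", "administrative access", "unknown"):
        print(area, "->", ", ".join(where_to_look(area)))
```

An unrecognized area falls through to the isolation steps, which mirrors the advice to use this section when the source of the problem is unclear.
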
Note: Certain events, such as operating system reboots or cable disconnects, can cause the SAN File System to lose connectivity to LUNs. If the logs indicate I/O failures for a client or metadata server, verify the following:
  • Configured LUNs are visible from the metadata servers and SAN File System clients. From the server, you should see both the user LUNs and system LUNs. From a client, you should see only the user LUNs.

    You can configure each client to see only the select group of user LUNs that it needs to access, or you can configure it to see all of the user LUNs.

  • A SAN fabric switch has not lost the zoning configuration. If the operating system on the switch is rebooted, it is possible for the fabric to lose the zoning configuration, which prevents metadata servers and SAN File System clients from reaching LUNs.
You might need to force the SAN File System to remap LUNs in the event of lost connectivity. Please see your SAN administrator for help with LUN rediscovery options specific to your operating environment.
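
The LUN visibility check described in the note can be sketched as a set comparison. This is a minimal sketch, assuming you have already collected the list of LUN identifiers that a host reports by whatever means your operating environment provides. The function name check_lun_visibility and the LUN identifiers are hypothetical.

```python
def check_lun_visibility(role, visible, user_luns, system_luns):
    """Compare the LUNs a host sees with the LUNs it should see.

    role        -- "server" (metadata server) or "client"
    visible     -- LUN identifiers the host currently reports
    user_luns   -- user LUNs this host is configured to see
                   (for a client, possibly only a select group)
    system_luns -- system LUNs in the configuration

    Returns (missing, unexpected) as sorted lists.
    """
    visible = set(visible)
    user_luns = set(user_luns)
    system_luns = set(system_luns)

    if role == "server":
        # A metadata server should see both user and system LUNs.
        expected = user_luns | system_luns
    elif role == "client":
        # A client should see only user LUNs.
        expected = user_luns
    else:
        raise ValueError("role must be 'server' or 'client'")

    missing = sorted(expected - visible)
    unexpected = sorted(visible - expected)
    return missing, unexpected


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(check_lun_visibility("server", ["user1", "sys1"],
                               ["user1", "user2"], ["sys1"]))
    print(check_lun_visibility("client", ["user1", "sys1"],
                               ["user1"], ["sys1"]))
```

A nonempty missing list suggests lost connectivity or lost zoning. A nonempty unexpected list on a client suggests that the client can reach LUNs it should not see.
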

Identifying SAN problems

Perform the following steps to determine whether the problem is related to the SAN itself:
  1. Determine whether the SAN configuration was recently changed, for example, by changing the Fibre Channel cable connections or the switch zoning. If so, verify that the changes were correct and, if necessary, reverse them.
  2. Verify that all switches and RAID controllers that are used by the SAN File System are powered on and are not reporting any hardware failures. If problems are found, resolve them before proceeding further.
  3. Verify that the Fibre Channel cables that connect the metadata servers to the switches are securely connected.
  4. Use the datapath query commands to view statistics and information about paths and adapters. These commands are part of the IBM® Subsystem Device Driver (SDD) version 1.5.1, which is provided with the SAN File System and supports multipath environments. For information about using the datapath query commands, refer to the Subsystem Device Driver User's Guide, which is provided on the SAN File System documentation CD-ROM.
  5. If you are running a SAN management tool that you are familiar with, use it to view the SAN topology and isolate the failing component. If you are not using a SAN management tool, you can start IBM Tivoli® SAN Manager on the master console and use it to view the SAN topology and isolate the failure. For information about SAN problem determination with IBM Tivoli SAN Manager, contact the Tivoli Storage Area Network (SAN) support center.

Identifying IP networking problems

Perform the following steps to determine whether the problem is related to the IP network itself:
  1. Verify that all switches used by the SAN File System are powered on and are not reporting any hardware failures. If problems are found, resolve them before proceeding further.
  2. Verify that the Ethernet cables that connect the metadata servers to the switches are securely connected.
  3. Verify that the metadata servers, clients, and storage devices are on the same network and subnet.
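
The subnet check in step 3 can be scripted with the Python standard library ipaddress module. This is a minimal sketch, assuming you already know the subnet and the IP address of each device. The function name hosts_outside_subnet, the host names, and the addresses (taken from the reserved documentation ranges) are hypothetical.

```python
import ipaddress


def hosts_outside_subnet(subnet, hosts):
    """Return the names of hosts whose address is not in the subnet.

    subnet -- network in CIDR notation, for example "192.0.2.0/24"
    hosts  -- mapping of host name to IP address string
    """
    network = ipaddress.ip_network(subnet)
    return sorted(
        name
        for name, address in hosts.items()
        if ipaddress.ip_address(address) not in network
    )


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = {
        "mds1": "192.0.2.10",
        "client1": "192.0.2.55",
        "storage1": "198.51.100.7",
    }
    print(hosts_outside_subnet("192.0.2.0/24", inventory))
```

Any name that the function returns identifies a device that is not on the expected subnet and is worth checking first.
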

Identifying storage problems

Perform the following steps to determine whether the problem is related to the storage devices:
  1. Determine whether any other hosts that may be attached to the storage devices are having the same problems.
  2. Determine whether a single metadata server or client is having trouble accessing the storage device, or whether all metadata servers and clients are experiencing I/O errors.
  3. Refer to the documentation for the storage devices for more information about isolating problems with those devices.
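
The comparison in steps 1 and 2 can be sketched as a simple classification of which hosts report I/O errors. This is an illustrative sketch only: the function name classify_io_error_scope and the host names are hypothetical, and the error flags are assumed to come from your own review of the logs. As a general rule of thumb, and not a statement from the product documentation, errors on every host tend to point to a shared component such as the storage device or the fabric, while errors on a single host tend to point to that host's own path.

```python
def classify_io_error_scope(io_errors):
    """Classify how widespread I/O errors are.

    io_errors -- mapping of host name to True if that host
                 reports I/O errors, otherwise False

    Returns (scope, failing_hosts), where scope is one of
    "none", "single host", "some hosts", or "all hosts".
    """
    failing = sorted(host for host, bad in io_errors.items() if bad)

    if not failing:
        return "none", failing
    if len(failing) == len(io_errors):
        return "all hosts", failing
    if len(failing) == 1:
        return "single host", failing
    return "some hosts", failing


if __name__ == "__main__":
    # Hypothetical observations, for illustration only.
    observed = {"mds1": False, "mds2": False, "client1": True}
    print(classify_io_error_scope(observed))
```
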

Parent topic: Troubleshooting

Related reference
Client diagnostic tools
Server diagnostic tools

(C) Copyright IBM Corporation 2003, 2004. All Rights Reserved.