Isolating problems with the SAN File System

This topic explains how to determine and isolate problems with the SAN File System, SAN, and LAN.

In most cases, you can use the logs provided by the SAN File System to begin isolating problems.
  • For problems that seem to be related to clients, use the logs that are available with the client operating system (such as the system log on AIX® clients and the Event Log on Windows® clients) to determine the cause of the problem. In addition, you can use the information provided in Troubleshooting a SAN File System client.
  • For problems that seem to be related to administrative user access, use the security log and the administrative log to determine the cause of the problem. If you access these logs through the master metadata server, you see a consolidated view of the logs from each of the metadata servers in the cluster. If you access these logs through a subordinate metadata server, you see the logs for that particular metadata server.

    In addition, you can use the information provided in Troubleshooting an administrative server.

  • For problems that seem to be related to the cluster, metadata servers, or metadata, use the server log to determine the cause of the problem.

    In addition, you can use the information provided in Troubleshooting the cluster. If you access this log through the master metadata server, you see a consolidated view of the logs from each metadata server in the cluster (called the cluster log). If you access this log through a subordinate metadata server, you see the log for that particular metadata server only.

  • For problems that seem to be related to the installation, refer to Troubleshooting the installation. The topics provide commands and suggestions that you can use to diagnose installation problems.
  • For problems that seem to be related to the RSA (Remote Supervisor Adapter) II, refer to Troubleshooting the RSA II adapter. This topic provides information about problems you might encounter with the RSA II during the installation and upgrade processes.
  • For problems that seem to be related to the engines in the SAN File System, refer to the server documentation to determine the cause of the problem.
If you are not sure whether the problem is related to the SAN File System, SAN, or LAN, use the information in this section to begin isolating the problem.
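
The log guidance in the list above can be condensed into a small lookup table. The following Python sketch is only an illustration: the area labels (such as "client" and "admin_access") and the function name are hypothetical, and the table simply restates the logs and topics named in this topic.

```python
# Illustrative lookup table. The keys are hypothetical labels; the values
# restate the logs and topics named in the list above.
TRIAGE = {
    "client": ("client operating system logs (AIX system log, Windows Event Log)",
               "Troubleshooting a SAN File System client"),
    "admin_access": ("security log and administrative log",
                     "Troubleshooting an administrative server"),
    "cluster": ("server log", "Troubleshooting the cluster"),
    "installation": (None, "Troubleshooting the installation"),
    "rsa": (None, "Troubleshooting the RSA II adapter"),
    "engine": (None, "server documentation"),
}


def where_to_look(area):
    """Return a one-line hint for a problem area, or fallback advice."""
    if area not in TRIAGE:
        return "Unknown area: work through the SAN, IP network, and storage checks."
    log, topic = TRIAGE[area]
    if log is None:
        # No specific log is named for this area, only a topic to consult.
        return f"Refer to: {topic}"
    return f"Check: {log}; then refer to: {topic}"
```

For example, where_to_look("cluster") points to the server log and to Troubleshooting the cluster.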
Note: Certain events, such as restarting the operating system or disconnecting cables, can cause the SAN File System to lose connectivity to logical unit numbers (LUNs). If the logs indicate I/O failures for a client or metadata server, verify the following:
  • LUN visibility is configured correctly. Configured system LUNs should be visible only from metadata servers, and each metadata server must see all configured system LUNs. Configured user LUNs should be visible only from SAN File System clients. Not all user LUNs need to be seen by all clients: a client can see a subset of the total user LUNs, but it must see all LUNs configured for any pool to which it has access. If the client cannot see all LUNs associated with a particular pool, a warning is written to the server log.

    You can configure clients to see only the group of user LUNs that they need to access, or configure them to see all of the user LUNs.

  • A SAN fabric switch has not lost the zoning configuration. If the operating system on the switch is restarted, it is possible for the fabric to lose the zoning configuration, which prevents metadata servers and SAN File System clients from reaching LUNs.
You might need to force SAN File System to remap LUNs if you lose connectivity. Refer to your SAN administrator for help with LUN rediscovery options specific to your operating environment.
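
The LUN visibility rules in the note above can be expressed as a set comparison. The following Python sketch assumes that you have already collected, by whatever means your environment provides, the set of LUNs that each metadata server and each client can see. The function name, the argument names, and the LUN identifiers are all hypothetical; nothing here queries the SAN File System itself.

```python
def check_lun_visibility(system_luns, user_luns, server_views, client_views,
                         pool_luns, client_pools):
    """Return a list of violations of the LUN visibility rules.

    system_luns, user_luns: sets of LUN identifiers
    server_views: dict of metadata server name -> set of LUNs it can see
    client_views: dict of client name -> set of LUNs it can see
    pool_luns:    dict of pool name -> set of user LUNs in that pool
    client_pools: dict of client name -> set of pools it has access to
    """
    problems = []
    for server, seen in sorted(server_views.items()):
        # Each metadata server must see all configured system LUNs.
        missing = system_luns - seen
        if missing:
            problems.append(f"{server}: cannot see system LUNs {sorted(missing)}")
        # User LUNs should be visible only from clients.
        extra = seen & user_luns
        if extra:
            problems.append(f"{server}: sees user LUNs {sorted(extra)}")
    for client, seen in sorted(client_views.items()):
        # System LUNs should be visible only from metadata servers.
        leaked = seen & system_luns
        if leaked:
            problems.append(f"{client}: sees system LUNs {sorted(leaked)}")
        # A client must see every LUN of every pool it has access to.
        for pool in sorted(client_pools.get(client, ())):
            missing = pool_luns[pool] - seen
            if missing:
                problems.append(
                    f"{client}: cannot see LUNs {sorted(missing)} of pool {pool}")
    return problems
```

An empty result means that the collected views are consistent with the rules; it does not prove that the LUNs are reachable at the moment.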

Identifying SAN problems

Perform the following steps to determine if the problem is related to the SAN itself:
  1. Determine whether the SAN configuration was recently changed, such as changing the Fibre Channel cable connections or switch zoning. If so, verify that the changes were correct and, if necessary, reverse those changes.
  2. Verify that all switches and storage devices used by SAN File System are powered on and are not reporting hardware failures. If you find problems, resolve them before proceeding.
  3. Verify that the Fibre Channel cables that connect the metadata servers to the switches are securely connected.
  4. Use the datapath query commands to view statistics, as well as information about paths and adapters. These commands are part of IBM® Subsystem Device Driver (SDD) version 1.5.1, which is provided with SAN File System and provides support for multipath environments. For information about using the datapath query commands, refer to the Subsystem Device Driver User's Guide provided on the SAN File System documentation CD-ROM.
  5. If you are running a SAN management tool that you are familiar with, use it to view the SAN topology and isolate the failing component. If you are not using a SAN management tool, you can start IBM Tivoli® SAN Manager on the master console and use it to view the SAN topology and isolate the failure. For information about SAN problem determination with IBM Tivoli SAN Manager, contact the Tivoli Storage Area Network (SAN) support center.

Identifying IP networking problems

Perform the following steps to determine whether the problem is related to the IP network itself:
  1. Verify that all switches used by the SAN File System are powered on and are not reporting any hardware failures. If problems are found, resolve them before proceeding further.
  2. Verify that the Ethernet cables that connect the metadata servers to the switches are securely connected.
  3. Verify that the metadata servers, clients, and storage devices are on the same network and subnet.
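
The subnet check in step 3 can be scripted with the Python standard library. This is a minimal sketch: the host names, addresses, and subnet are made-up examples, and the function name is hypothetical.

```python
import ipaddress


def hosts_outside_subnet(subnet, hosts):
    """Return the names of hosts whose address is not in the given subnet.

    subnet: subnet in CIDR notation, for example "192.168.10.0/24"
    hosts:  dict of host name -> IP address string
    """
    net = ipaddress.ip_network(subnet, strict=True)
    return sorted(name for name, addr in hosts.items()
                  if ipaddress.ip_address(addr) not in net)


# Made-up example inventory of metadata servers and clients.
example_hosts = {
    "mds1": "192.168.10.11",
    "client1": "192.168.10.40",
    "client2": "192.168.11.5",
}
```

With this example inventory, hosts_outside_subnet("192.168.10.0/24", example_hosts) reports client2 as the only host outside the subnet.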

Identifying storage problems

Perform the following steps to determine whether the problem is related to the storage devices:
  1. Determine whether any other hosts that may be attached to the storage devices are having the same problems.
  2. Determine whether only a single metadata server or client is having trouble accessing the storage device, or whether all metadata servers and clients are experiencing I/O errors.
  3. Refer to the documentation for the storage devices for more information about isolating problems with those devices.
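
The comparison in step 2 can be sketched as a small classifier. The following Python example assumes that you have recorded, for each metadata server and client, whether it reports I/O errors; the function name and the scope labels are hypothetical. How to interpret the result is a judgment call: errors on a single host usually suggest looking at that host and its path first, while errors on every host suggest looking at the storage device or a shared part of the SAN.

```python
def classify_io_scope(io_errors):
    """Classify how widespread I/O errors are.

    io_errors: dict of host name -> True if that host reports I/O errors
    Returns a (scope, failing_hosts) tuple, where scope is one of
    "none", "single", "some", or "all".
    """
    failing = sorted(host for host, bad in io_errors.items() if bad)
    if not failing:
        return "none", failing
    if len(failing) == len(io_errors):
        # Every host that was checked reports errors.
        return "all", failing
    if len(failing) == 1:
        return "single", failing
    return "some", failing
```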

Parent topic: Troubleshooting

Related reference
Client diagnostic tools
Server diagnostic tools

(C) Copyright IBM Corporation 2003, 2004. All Rights Reserved.
IBM TotalStorage SAN File System v2.2