Troubleshooting the local network

Use the information in this topic to troubleshoot problems with the local network.

Problem

The local network over which the metadata servers communicate has a problem. The problem can be one of the following:
  • A network fault. A local network fault can occur if an engine has a faulty Ethernet adapter or if the Ethernet cable between the Ethernet adapter and the IP network is not connected. When a local network fault occurs, the cluster reacts as if the metadata server on which the fault occurred is down.

    The master metadata server reforms the cluster, excluding the failed metadata server. The failed metadata server itself goes into a wait state, and any filesets assigned to it are no longer available to clients.

  • A network partition. A local network partition can occur if a problem in the Ethernet network causes two or more metadata servers to lose communication with the master metadata server. The partition that contains the master metadata server reacts as if the metadata servers in the other partition are down. The metadata servers in the other partition react as if the master metadata server is down.
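
The two cases above can be told apart by how many metadata servers have lost contact with the master. The following Python sketch models that distinction only; it is an illustration under stated assumptions, not part of SAN File System. The function name `cluster_views`, the result keys, and the server names are hypothetical.

```python
def cluster_views(servers, master, unreachable):
    """Model how each side of a local network problem sees the cluster.

    servers:     names of all metadata servers in the cluster
    master:      name of the master metadata server
    unreachable: servers that can no longer communicate with the master

    All names here are hypothetical; the rules follow the descriptions above.
    """
    # Only count known servers, and never count the master itself.
    cut_off = sorted((set(unreachable) & set(servers)) - {master})

    if not cut_off:
        kind = "healthy"
    elif len(cut_off) == 1:
        kind = "fault"        # one server lost its connection
    else:
        kind = "partition"    # two or more servers lost the master

    return {
        "kind": kind,
        # The master's side reacts as if these servers are down.
        "master_treats_as_down": cut_off,
        # In a partition, the other side reacts as if the master is down.
        "cut_off_treat_as_down": [master] if kind == "partition" else [],
        # In a fault, the affected server goes into a wait state.
        "wait_state": cut_off if kind == "fault" else [],
    }


if __name__ == "__main__":
    print(cluster_views(["mds1", "mds2", "mds3", "mds4"], "mds1", ["mds3", "mds4"]))
```

Knowing which case applies tells you which of the two procedures under Investigation to follow.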

Investigation

If there is a local network fault with one of the subordinate metadata servers, take the following actions.

[Illustration: a local network fault in the cluster]
Perform the following steps in order until the problem is resolved:
  1. Use the RSA II Web interface to access the RSA II adapter for the engine hosting the subordinate metadata server.
  2. Shut down the engine from the RSA II Web interface. You cannot use the Administrative command-line interface or the SAN File System console to shut down the metadata server, because the master metadata server already considers it to be shut down.
  3. Reassign (move) the filesets from the subordinate metadata server to another metadata server that is online. Base the decision to reassign filesets on how long you expect the repair of the metadata server to take.
    1. Run lsserver to verify that the metadata server to which you are going to assign the filesets is up and running.
    2. Refer to Reassigning filesets to metadata servers for the procedure to reassign the filesets.
  4. After repairing the network fault, you can have the metadata server rejoin the cluster:
    1. Start the metadata server.
    2. Wait for the cluster to be reformed to include this metadata server.
    3. Use the SAN File System console or the Administrative command-line interface to verify that all metadata servers in the cluster are in an Online state.
  5. If you previously reassigned the filesets for this metadata server to another server, you can now assign them back to this metadata server.
    1. Run lsserver to verify that the metadata server to which you are going to assign the filesets is up and running.
    2. Refer to Reassigning filesets to metadata servers for the procedure to reassign the filesets.
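
Step 4 above asks you to wait for the cluster to reform and then verify that every metadata server is in an Online state. The following Python sketch shows one way to express that wait as a polling loop. It is an assumption-laden illustration: `wait_until_all_online` and `get_states` are hypothetical names, and how you obtain the states (for example, from the SAN File System console or from lsserver output) is left to the caller. Only the "Online" state name comes from this topic.

```python
import time


def wait_until_all_online(get_states, timeout_s=300.0, interval_s=5.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until every metadata server reports the Online state.

    get_states: hypothetical callable returning {server_name: state_string}
    Returns True if all servers are Online before the timeout, else False.
    """
    deadline = clock() + timeout_s
    while True:
        states = get_states()
        # Servers that have not yet rejoined the cluster as Online.
        pending = sorted(name for name, state in states.items()
                         if state != "Online")
        if states and not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated state snapshots; the non-Online state name is a placeholder.
    snapshots = iter([
        {"mds1": "Online", "mds2": "Starting"},
        {"mds1": "Online", "mds2": "Online"},
    ])
    print(wait_until_all_online(lambda: next(snapshots), interval_s=0))
```

The timeout keeps the loop from running indefinitely if the repaired metadata server never rejoins; in that case, return to the earlier steps rather than assigning filesets back.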
If there is a network partition, take the following actions.

[Illustration: a network partition]
Perform the following steps in order until the problem is resolved:
  1. Use the RSA II Web interface to access the RSA II adapter for each of the partitioned engines.
  2. Shut down each engine from the RSA II Web interface. You cannot use the Administrative command-line interface or the SAN File System console to shut down the metadata servers, because the master metadata server already considers them to be shut down.
  3. Reassign (move) the filesets from each partitioned metadata server to another metadata server that is online. Base the decision to reassign filesets on how long you expect the repair to take.
    1. Run lsserver to verify that the metadata server to which you are going to assign the filesets is up and running.
    2. Refer to Reassigning filesets to metadata servers for the procedure to reassign the filesets.
  4. After repairing the network partition, you can have the metadata servers rejoin the cluster:
    1. Start each metadata server.
    2. Wait for the cluster to be reformed to include these metadata servers.
    3. Use the SAN File System console or the Administrative command-line interface to verify that all metadata servers in the cluster are in an Online state.
  5. If you previously reassigned the filesets for these metadata servers to other servers, you can now assign them back to these metadata servers.
    1. Run lsserver to verify that the metadata server to which you are going to assign the filesets is up and running.
    2. Refer to Reassigning filesets to metadata servers for the procedure to reassign the filesets.
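
Steps 3 and 5 in both procedures move filesets away from the affected metadata servers and later move them back. Recording the original assignments before you move anything makes the later step straightforward. The following Python sketch is a planning aid only and does not call any SAN File System command; the function names, fileset names, server names, and the non-Online state name are all hypothetical. The actual reassignment is done with the procedure in Reassigning filesets to metadata servers.

```python
from itertools import cycle


def plan_reassignment(assignments, failed, states):
    """Choose an Online target for each fileset owned by a failed server.

    assignments: {fileset_name: owning_server} before the problem
    failed:      set of servers that were shut down
    states:      {server_name: state_string}, e.g. as read from lsserver
    """
    targets = sorted(name for name, state in states.items()
                     if state == "Online" and name not in failed)
    if not targets:
        raise ValueError("no online metadata server available")
    rotation = cycle(targets)  # spread filesets across the targets
    return {fileset: next(rotation)
            for fileset in sorted(assignments)
            if assignments[fileset] in failed}


def plan_restore(assignments, moves):
    """Map each moved fileset back to the server that originally owned it."""
    return {fileset: assignments[fileset] for fileset in moves}


if __name__ == "__main__":
    original = {"fs_a": "mds3", "fs_b": "mds4", "fs_c": "mds1"}
    states = {"mds1": "Online", "mds2": "Online",
              "mds3": "Down", "mds4": "Down"}
    moves = plan_reassignment(original, {"mds3", "mds4"}, states)
    print(moves)
    print(plan_restore(original, moves))
```

The round-robin choice of target is an arbitrary simplification; in practice, choose targets according to the load on each online metadata server.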

Parent topic: Troubleshooting the cluster

Related tasks
Reassigning filesets to metadata servers
Taking a metadata server offline

(C) Copyright IBM Corporation 2003, 2004. All Rights Reserved.