Use the information in this topic to troubleshoot problems that
you are having with the local network.
Note: For more information on metadata server failures, refer to Troubleshooting
the cluster.
Problem
A problem exists with the local network on which the metadata servers communicate. The problem might be a network fault or a network partition:
- A network fault. A local network fault can occur if an engine has a bad Ethernet adapter or if the Ethernet cable between the Ethernet adapter and the IP network is disconnected. When a local network fault occurs, the cluster reacts as if the metadata server on which the fault occurred is down.
The master metadata server excludes the failed metadata server and reforms the cluster. The excluded metadata server is aborted, and a server core file is written to /usr/tank/server. If the abort fails, the RSA II stops and restarts the engine that hosts the failed metadata server. Review the logs (log.std and log.stopengine) on the master and the log.std log on the failed metadata server; see the example after this list.
- A network partition. A local network partition
can occur if there is a problem in the Ethernet network that causes two or
more metadata servers to lose communication with the master metadata server
or with the rest of the cluster. Both sides of the partition attempt to take over, but the side that contains the majority of the metadata servers generally survives and reforms the cluster. If the partition results in an even number of metadata servers on each side, the side that contains the master survives and reforms the cluster. On the side that does not survive, the metadata servers are aborted or their engines are shut down and restarted by the RSA II. Review the logs (log.std and log.stopengine) on the master of the surviving partition and the log.std log on all subordinate servers in the losing partition.
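For example, you can review these logs from a shell on the relevant engines with standard commands such as the following. This is only a sketch: the exact directory that holds log.std and log.stopengine is not given above, so the paths below assume that the logs are kept under /usr/tank/server, the directory named for the server core file. Adjust the paths to match your installation.

   # On the master metadata server (or the master of the surviving partition)
   tail -n 200 /usr/tank/server/log.std
   tail -n 200 /usr/tank/server/log.stopengine

   # On the failed metadata server, or on each subordinate server in the losing partition
   tail -n 200 /usr/tank/server/log.std

   # Check whether an aborted metadata server left a core file
   ls -l /usr/tank/server/core*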
Investigation
- If a local network fault occurs with one of the metadata servers, SAN
File System performs the following actions to resolve the problem:

- The master aborts the failed metadata server or, if the abort fails, the
RSA II automatically shuts down and restarts the engine hosting the failed
metadata server. SAN File System fails over all filesets served by the failed
subordinate metadata server to another metadata server.
- After repairing the network fault, one of the following situations occurs:
- When its engine restarts, the failed metadata server attempts to restart and goes into Initializing state if it still cannot communicate with the master. After the network fault is fixed, the metadata server completes the initialization and automatically rejoins the cluster.
- If autorestart is disabled, run the following command from the command-line
interface to restart the server: /usr/tank/admin/bin/sfscli startautorestart
- Wait for the cluster to be reformed to include this metadata server.
- Use the SAN File System console or the administrative CLI to verify that
all metadata servers in the cluster are in an online state.
- SAN File System automatically fails back any static filesets assigned
to the restored metadata server.
- If there is a network partition, SAN File System performs the following
actions to resolve the problem:

- The master attempts to abort the failed metadata server. If the abort fails, the RSA II automatically shuts down and restarts the engine hosting the failed metadata server. SAN File System fails over all filesets served by the failed metadata server to another metadata server.
- After repairing the network partition, one of the following situations occurs:
- When its engine restarts, the failed metadata server attempts to restart and goes into Initializing state if it still cannot communicate with the master. After the network partition is fixed, the metadata server completes the initialization and automatically rejoins the cluster.
- If autorestart is disabled, run the following command from the command-line
interface to restart the server: /usr/tank/admin/bin/sfscli startautorestart
- Wait for the cluster to be reformed to include this metadata server.
- Use the SAN File System console or the administrative CLI to verify that all metadata servers in the cluster are in an online state; see the example after this list.
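For reference, the following commands illustrate the restart and verification steps described above for both scenarios. The startautorestart command is quoted from this topic; the lsserver subcommand is an assumption about a typical listing command and might differ on your release, so use whichever administrative CLI command or SAN File System console panel your installation provides to display metadata server states.

   # Re-enable automatic restart so that the metadata server is restarted (as documented above)
   /usr/tank/admin/bin/sfscli startautorestart

   # List the metadata servers and confirm that each one is online
   # (lsserver is assumed here; check the sfscli help on your system)
   /usr/tank/admin/bin/sfscli lsserver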