Troubleshooting a metadata server

Use the information in this topic to troubleshoot problems that you are having with a metadata server.

Problem

A metadata server fails with a hard fault or fails when auto-restart is disabled (for example, when the metadata server has not restarted). When a metadata server fails in this manner, SAN File System dynamically and automatically distributes the failed metadata server's filesets to one of the surviving metadata servers. If the metadata server that failed is the master, SAN File System automatically elects a new master so that the cluster remains accessible. Clients pause while the cluster reforms, a new master is detected, or filesets move. When these operations complete, clients continue to access file metadata as before.

A metadata server fails with a soft fault or fails when auto-restart is enabled (for example, when the metadata server restarts immediately). When a metadata server fails in this manner, there is no fileset movement and no mastership change. However, the client pauses briefly while the cluster reforms. Clients continue to access file metadata after the cluster reformation completes.

Investigation

If a metadata server fails

Illustration (not reproduced here): failure of a subordinate metadata server.

  1. Use the SAN File System console or the administrative command-line interface (CLI) to view the status of the metadata server.
  2. View the cluster message log to verify that SAN File System could not restart the metadata server. In addition, the messages in this log indicate a reason for the failure.
    • To view the cluster log from the master console, click Monitor system > Cluster log.
    • To view the cluster log from the CLI, enter this command on the master metadata server: /usr/tank/admin/bin/sfscli catlog -log cluster -entries 25.
  3. Make sure that the metadata server is actually down and that all restart attempts were unsuccessful. Use the /usr/tank/admin/bin/sfscli lsautorestart command to view the current state of the restart mechanism. If the Service state indicates "failed" and the Last Probe state of the metadata server is "absent", the auto-restart feature was unable to restart the metadata server.
    1. Run the /usr/tank/admin/bin/sfscli lsengine command to verify that the engine hosting the metadata server is shut down.
    2. Run the /usr/tank/admin/bin/sfscli lsserver command on the master metadata server to verify that the metadata server assigned the role of master is up and running.
  4. Examine the RSA-II logs for the failed metadata server to find any hardware-related errors in the logs.
    Note: The auto-restart feature might have restarted the server. If it did not restart automatically, continue with the following steps.
  5. After you repair the failed metadata server, restart it. If the automatic restart service is enabled, the previously failed metadata server then restarts automatically. If the service is disabled, enable it so that the metadata server restarts automatically. The restarted metadata server comes up as a subordinate metadata server.
  6. If there were any static filesets assigned to the restarted server, SAN File System automatically fails back the static filesets to the restarted server when it rejoins the cluster. Depending on the fileset load balance, SAN File System might also redistribute dynamic filesets.
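
The diagnostic commands in steps 2 and 3 can be gathered into a single pass, as in the following minimal sketch. It uses only the sfscli invocations named in this topic. The DRY_RUN variable and the function names (run, collect_status, autorestart_gave_up) are hypothetical and exist only for illustration. Because this topic does not describe the layout of the lsautorestart output, the sketch assumes that you read the Service state and Last Probe state values from the output yourself and pass them in as arguments.

```shell
#!/bin/sh
# Sketch only: intended for the master metadata server.
# The sfscli path comes from this topic; DRY_RUN and all function
# names below are illustrative assumptions, not part of the product.
SFSCLI=${SFSCLI:-/usr/tank/admin/bin/sfscli}

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 2 and 3: cluster log, restart-mechanism state,
# engine status, and metadata server status.
collect_status() {
    run "$SFSCLI" catlog -log cluster -entries 25
    run "$SFSCLI" lsautorestart
    run "$SFSCLI" lsengine
    run "$SFSCLI" lsserver
}

# Succeeds when the two values read from the lsautorestart output
# match the condition in step 3: Service state "failed" and
# Last Probe state "absent". The comparison ignores case.
autorestart_gave_up() {
    service_state=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    last_probe=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$service_state" = "failed" ] && [ "$last_probe" = "absent" ]
}
```

With DRY_RUN=1 set, calling collect_status prints the four commands instead of running them, which is a convenient way to review the sequence first. Calling autorestart_gave_up with the two values read from the output returns success only when both match the condition in step 3.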

Troubleshooting metadata server access to metadata

If a metadata server cannot access metadata LUNs, run the /usr/tank/admin/bin/sfscli lslun command from the administrative command-line interface on the master metadata server to verify that the LUNs are available. If the disks are not available, you can rediscover all disks by running the following commands from a metadata server:
  1. Run the /usr/tank/admin/bin/sfscli stopserver command from the administrative command-line interface to stop a single server. This command might not work when trying to stop a server with an I/O problem.
  2. Run the rmmod qla2300 command.
  3. Run the modprobe qla2300 command.
  4. Run the /usr/tank/server/bin/device_init.sh command to recreate raw devices needed by SAN File System.
  5. Run the /usr/tank/admin/bin/sfscli rediscoverluns command from the administrative command-line interface.
  6. Run the /usr/tank/admin/bin/sfscli startserver command from the administrative command-line interface to start the metadata server.
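
The rediscovery steps above can be sketched as one shell function. This is a minimal sketch under stated assumptions, not a supported script: the DRY_RUN variable and the function names (run, check_luns, rediscover_disks) are hypothetical, and the sfscli commands are written exactly as they are named in this topic, so any arguments that stopserver or startserver require in your environment still need to be added. Unloading and loading the qla2300 module requires root authority.

```shell
#!/bin/sh
# Sketch only: intended for the affected metadata server, as root.
# Paths and commands come from this topic; DRY_RUN and the function
# names are illustrative assumptions.
SFSCLI=${SFSCLI:-/usr/tank/admin/bin/sfscli}

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Verify LUN availability first (run on the master metadata server).
check_luns() {
    run "$SFSCLI" lslun
}

# Steps 1 to 6 in order. The chain stops at the first command that
# fails. Note that stopserver might not work on a server with an
# I/O problem; in that case nothing further is run, so that the
# administrator can decide how to proceed.
rediscover_disks() {
    run "$SFSCLI" stopserver &&
    run rmmod qla2300 &&
    run modprobe qla2300 &&
    run /usr/tank/server/bin/device_init.sh &&
    run "$SFSCLI" rediscoverluns &&
    run "$SFSCLI" startserver
}
```

Setting DRY_RUN=1 before calling rediscover_disks prints the six commands in order without changing the system. Stopping at the first failure is a deliberately cautious choice, because reloading the host bus adapter driver under a metadata server that is still running is not something this topic describes.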

Parent topic: Troubleshooting the cluster

Related tasks
Accessing the RSA II adapter
Reassigning filesets to metadata servers
Taking a metadata server offline

(C) Copyright IBM Corporation 2003, 2004. All Rights Reserved.
IBM TotalStorage SAN File System v2.2