Troubleshooting a metadata server

Use the information in this topic to troubleshoot problems that you are having with a metadata server.

Problem

A metadata server has failed. Attempts by the SAN File System to restart it automatically have also failed, so manual intervention is required. Clients cannot access file metadata and, if the failed server is the master metadata server, the cluster itself is no longer available.
Note: If a metadata server loses power, reset the RSA II card on the engine hosting the metadata server to restore proper communication between the card and the metadata server. To reset the card, access the Web interface for the RSA II card and select the option for resetting it.

Investigation

If a subordinate metadata server has failed, take the following actions:

Figure: Failure of a subordinate metadata server.
Perform the following steps until the problem is resolved:
  1. Use the SAN File System console or the administrative command-line interface to view the status of the subordinate metadata server.
  2. View the cluster message log to verify that the SAN File System has not been able to restart the subordinate metadata server. In addition, the messages in this log can provide you with an indication of what the problem may be.
  3. Reassign (move) the filesets from the subordinate metadata server to another metadata server that is online. The decision to reassign filesets will be based on the length of time it will take to repair the metadata server.
    Note: Make sure that this metadata server is actually down and all restart attempts were unsuccessful before reassigning filesets:
    1. Run lsserver to verify that the metadata server is shut down.
    2. Run lsengine to verify that the engine hosting the subordinate metadata server is shut down.
    3. Run lsserver to verify that the metadata server to which you are going to reassign the filesets is up and running.
  4. Resolve any problems found in the cluster message log that are related to this metadata server. If the messages indicate a hardware error, use the RSA II Web interface to access the RSA II adapter for the engine hosting the subordinate metadata server. The RSA II Web Interface can assist you in isolating the hardware problem.
  5. If the metadata server terminated abnormally, clients may begin to report errors even after the problem has been resolved, whether the server recovered through the automatic restart feature or you took the failed metadata server offline and reassigned its filesets to another metadata server. If you see these errors, restart the affected clients:
    • On clients running the AIX® operating system:
      1. Run rmstclient to unmount the global namespace, remove the virtual client, and unload the file-system driver.
      2. Run setupstclient to load the file-system driver, create the virtual client, and mount the global namespace.
    • On clients running the Windows® operating system, reboot the system.
  6. After repairing the failed metadata server, bring the server back online (use the startserver command from the administrative command-line interface).
  7. If you previously reassigned the filesets for this metadata server to another server, you can now assign them back to this metadata server.
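
The checks in the note under step 3 can be gathered into a small wrapper so that they always run in the same order. The following is a minimal sketch, not part of the product. It assumes that lsserver and lsengine are invoked through sfscli at the path used by the command examples in this topic, and that both can be run without arguments to list every metadata server and engine. The names run, precheck, and DRY_RUN are hypothetical. By default the script only prints the commands.

```shell
#!/bin/sh
# Assumption: sfscli lives at the path used by the examples in this topic.
SFSCLI="${SFSCLI:-/usr/tank/admin/bin/sfscli}"
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Checks to make before reassigning filesets. The output is meant to be
# read by the operator; this sketch does not parse it.
precheck() {
    run "$SFSCLI" lsserver   # metadata server states, per the note in step 3
    run "$SFSCLI" lsengine   # engine states, per the note in step 3
}

precheck
```

Setting DRY_RUN=0 would run the commands instead of printing them. Reassign filesets only after the output confirms the conditions listed in the note.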
If the master metadata server has failed, take the following actions.

Figure: Failure of a master metadata server.
Perform the following steps until the problem is resolved:
  1. Use the administrative command-line interface to view the server message log and verify that the SAN File System has not been able to restart the master metadata server. In addition, the messages in this log can provide you with an indication of what the problem may be.
  2. Define a new master metadata server for the cluster.
    Note: Make sure that this metadata server is actually down and all restart attempts were unsuccessful before attempting to set a new master or before reassigning filesets:
    1. Run lsserver to verify that the cluster does not have a master metadata server.
    2. Run lsengine to verify that the engine hosting the master metadata server is shut down.
    3. Run lsserver to verify that the metadata server to which you are going to assign the role of master is up and running.
  3. Reassign (move) the filesets from this metadata server to another metadata server that is online.
    1. List all of the filesets assigned to the failed metadata server.
      /usr/tank/admin/bin/sfscli lsfileset -server metadata_server_name
    2. Assign the filesets to a different metadata server.
      /usr/tank/admin/bin/sfscli setfilesetserver -server new_metadata_server_name fileset1 fileset2 fileset3
  4. Resolve any problems found in the cluster message log that are related to this metadata server. If the messages indicate a hardware error, use the RSA II Web interface to access the RSA II adapter for the engine hosting the former master metadata server. The RSA II Web Interface can assist you in isolating the hardware problem.
  5. If the metadata server terminated abnormally, clients may begin to report errors even after the problem has been resolved, whether the server recovered through the automatic restart feature or you took the failed metadata server offline and reassigned its filesets to another metadata server. If you see these errors, restart the affected clients:
    • On clients running AIX or Linux:
      1. Run rmstclient to unmount the global namespace, remove the virtual client, and unload the file-system driver.
      2. Run setupstclient to load the file-system driver, create the virtual client, and mount the global namespace.
    • On clients running Windows, reboot the system.
    • On clients running Solaris:
      1. Run umount to unmount the global namespace.
      2. Run mount to mount the global namespace again.
  6. After repairing the failed metadata server, bring the server back online (use the startserver command from the administrative command-line interface).
  7. If you choose, you can now set this metadata server to be the master metadata server once again. To set this metadata server to be the new master, you must first shut down the existing master metadata server and power off the engine hosting the master.
  8. If you previously reassigned the filesets for this metadata server to another server, you can now assign them back to this metadata server.
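
The fileset reassignment in step 3 can be scripted so that the setfilesetserver command line is built from an explicit list of filesets and reviewed before it is run. The following is a minimal sketch under stated assumptions: the function name build_reassign_cmd, the server name mds2, and the fileset names are hypothetical, and the command syntax is the one shown in step 3. The function only prints the command; it does not run it.

```shell
#!/bin/sh
# Assumption: sfscli lives at the path used by the examples in this topic.
SFSCLI="${SFSCLI:-/usr/tank/admin/bin/sfscli}"

# Print the setfilesetserver command that moves the given filesets to
# the target metadata server.
# Usage: build_reassign_cmd target_server fileset...
build_reassign_cmd() {
    if [ "$#" -lt 2 ]; then
        echo "usage: build_reassign_cmd target_server fileset..." >&2
        return 1
    fi
    target="$1"
    shift
    # "$*" joins the remaining arguments (the fileset names) with spaces.
    echo "$SFSCLI setfilesetserver -server $target $*"
}

# Hypothetical names, for illustration only.
build_reassign_cmd mds2 fileset1 fileset2 fileset3
```

Take the fileset names from the lsfileset -server output for the failed metadata server, and confirm with lsserver that the target server is up and running before you run the printed command.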

(C) Copyright IBM Corporation 2003, 2004. All Rights Reserved.