Troubleshooting the cluster

This topic provides an overview of how to resolve problems with the SAN File System cluster.

A SAN File System cluster can contain from two to eight engines, each running a separate instance of a metadata server. The metadata servers have one of the following roles:
  • Master.

    The master metadata server manages system metadata for the entire cluster. It controls all operations involving system metadata, such as allocation of storage space, coordination of most administrative operations, and access to the global namespace. The master metadata server can also perform the same tasks as a subordinate metadata server: managing file metadata and workload for one or more filesets.

    Only one metadata server at a time can act as the master in a cluster.

  • Subordinate.
    Subordinate servers manage user metadata and workload for one or more filesets.
    Note: A fileset can be managed by only one metadata server.
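
The role rules above can be expressed as a few invariants on the cluster layout: two to eight servers, a single master, and each fileset owned by exactly one server. The following is a minimal sketch under those assumptions; the `Role`, `MetadataServer`, and `validate_cluster` names are hypothetical and are not part of any SAN File System interface. It treats the steady state as having exactly one master, so a cluster whose master has just failed would also be reported.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    MASTER = "master"
    SUBORDINATE = "subordinate"


@dataclass
class MetadataServer:
    # Hypothetical model of one metadata server (one per engine).
    name: str
    role: Role
    filesets: set[str] = field(default_factory=set)


def validate_cluster(servers: list[MetadataServer]) -> list[str]:
    """Return a list of rule violations; an empty list means the layout is valid."""
    problems = []
    # A cluster contains from two to eight engines, one metadata server per engine.
    if not 2 <= len(servers) <= 8:
        problems.append(f"cluster has {len(servers)} servers; expected 2 to 8")
    # Only one metadata server at a time can act as the master.
    masters = [s.name for s in servers if s.role is Role.MASTER]
    if len(masters) != 1:
        problems.append(f"expected exactly one master, found {len(masters)}")
    # A fileset can be managed by only one metadata server.
    owner: dict[str, str] = {}
    for s in servers:
        for fs in s.filesets:
            if fs in owner:
                problems.append(f"fileset {fs!r} is managed by both {owner[fs]} and {s.name}")
            else:
                owner[fs] = s.name
    return problems
```

For example, a layout with a master `mds1` managing `fs_a` and a subordinate `mds2` managing `fs_b` and `fs_c` (all hypothetical names) produces an empty list, while assigning `fs_a` to both servers is reported as a violation.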

To obtain access to the user data in a specific fileset, clients communicate with the metadata server that manages that fileset.
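
Because each fileset has exactly one managing server, a client has no alternative server to fall back on: it must reach that server or wait. The sketch below models that lookup under stated assumptions. The `route_request` function, the `owners` mapping, and the `FilesetUnavailable` exception are hypothetical illustrations, not the SAN File System client protocol.

```python
class FilesetUnavailable(Exception):
    """Raised when the server that manages a fileset is not currently serving."""


def route_request(fileset: str, owners: dict[str, str], available: set[str]) -> str:
    """Return the metadata server a client should contact for `fileset`.

    owners maps each fileset name to the single server that manages it;
    available is the set of servers currently responding.
    """
    try:
        server = owners[fileset]
    except KeyError:
        raise LookupError(f"no metadata server manages fileset {fileset!r}") from None
    # There is no fallback: only the managing server can serve this fileset,
    # so while it is down, requests for the fileset fail or must wait.
    if server not in available:
        raise FilesetUnavailable(f"{server} (manager of {fileset!r}) is unavailable")
    return server
```

With `owners = {"fs_a": "mds1", "fs_b": "mds2"}`, a request for `fs_b` goes to `mds2`; if `mds2` is not in `available`, the request fails even though `mds1` is still running.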

Metadata server failures

When a subordinate metadata server becomes unresponsive or fails (for example, because the operating system crashes or hangs), its engine is automatically restarted. If the automatic restart service is enabled (it is enabled by default), the metadata server is also automatically restarted.
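
The recovery sequence can be summarized as a small state model. This is a sketch only: the `State` values and function names are hypothetical, and the behavior when the automatic restart service is disabled (the server stays down until an administrator starts it) is an assumption inferred from the text rather than documented behavior.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    RESTARTING = "restarting"  # engine and metadata server coming back up
    STOPPED = "stopped"        # engine is up, metadata server not started


def states_after_failure(autorestart_enabled: bool = True) -> list[State]:
    """Sequence of states a failed subordinate metadata server passes through."""
    # The engine is restarted automatically in every case.
    states = [State.RESTARTING]
    if autorestart_enabled:
        # The automatic restart service (enabled by default) also restarts
        # the metadata server, which then resumes serving clients.
        states.append(State.RUNNING)
    else:
        # Assumption: without the service, the metadata server stays down
        # until an administrator starts it.
        states.append(State.STOPPED)
    return states


def can_serve_clients(state: State) -> bool:
    # Only a running server answers client requests; in every other state,
    # requests for its filesets fail or are delayed.
    return state is State.RUNNING
```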

While the subordinate metadata server is restarting, it cannot respond to client requests:
  • Client requests to the metadata server, and client access to any files that it serves, fail or are delayed.
  • Client applications experience a pause in service while the metadata server is unavailable, typically lasting one to two minutes. During this time, active operations in some applications can begin to time out. Whether further errors occur depends on how each client application responds to a timeout.
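
Since the outcome depends on how the application handles timeouts, one option is for an application to retry timed-out operations for longer than a typical restart. The following is a minimal sketch of retry with exponential backoff, assuming the operation signals the outage by raising Python's built-in `TimeoutError`. The `call_with_retry` name and the 180-second default deadline are illustrative choices, not values from SAN File System.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    op: Callable[[], T],
    deadline_s: float = 180.0,
    first_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry `op` on TimeoutError with exponential backoff until `deadline_s` elapses."""
    start = clock()
    delay = first_delay_s
    while True:
        try:
            return op()
        except TimeoutError:
            # Give up only if the next wait would exceed the deadline; the
            # default of 180 s is chosen to outlast a one-to-two-minute restart.
            if clock() - start + delay > deadline_s:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
```

The `sleep` and `clock` parameters exist so that the retry logic can be exercised with a simulated clock instead of real waiting. An application that cannot safely repeat an operation should not use a blind retry like this one.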

When the master metadata server becomes unresponsive or fails, clients attempting to access filesets managed by the master experience the same results as clients of a failed subordinate metadata server. In addition, the unavailability of the master can affect the subordinate metadata servers. Metadata servers in the cluster rely on a heartbeat mechanism to verify availability. Depending on how long the master metadata server is unavailable, subordinate metadata servers may detect the loss of its heartbeat and cease all activity until the master is available again or you set a new master.
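
The heartbeat behavior explains why a short master outage may go unnoticed while a longer one halts the subordinates: a subordinate only reacts once the gap since the last heartbeat exceeds some threshold. The sketch below shows that idea from one subordinate's point of view. The `MasterHeartbeatMonitor` class and the `timeout_s` parameter are hypothetical; the actual heartbeat interval and timeout used by SAN File System are not stated here.

```python
import time
from typing import Callable


class MasterHeartbeatMonitor:
    """Tracks heartbeats from the master as seen by one subordinate server."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_beat = clock()

    def record_heartbeat(self) -> None:
        # Called whenever a heartbeat from the current master arrives.
        self.last_beat = self.clock()

    def master_lost(self) -> bool:
        # The master is considered lost once no heartbeat has arrived for timeout_s.
        return self.clock() - self.last_beat > self.timeout_s

    def may_serve(self) -> bool:
        # A subordinate that has lost the master stops all activity until
        # heartbeats resume (the master returns or a new master is set).
        return not self.master_lost()
```

With a hypothetical 30-second timeout, a master that is silent for 10 seconds is not noticed, whereas a silence of 45 seconds causes the subordinate to stop serving until the next heartbeat is recorded.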

(C) Copyright IBM Corporation 2003, 2004. All Rights Reserved.