How many metadata servers do I need?

This topic helps you determine the number of metadata servers that are needed to handle the workload in your environment.

Context

Estimating the number of metadata servers requires a strong understanding of your environment and application workloads. If you do not have this information, start with two to three metadata servers. You can add metadata servers dynamically later, based on observed performance. Three metadata servers provide additional metadata-server capacity and higher availability if a metadata server needs service.

There are two methods for estimating the number of metadata servers: available storage and application workload. Which method to use depends on how well you understand your environment and application workload. The application-workload method generally gives a more accurate picture of what to expect, but it requires a strong understanding of the I/O pattern of the application workload. The available-storage method, which is based on the number of disk drives, is simpler because it relies only on the physical attributes of the environment. However, it might not be as accurate for workloads that have very high or very low metadata-transaction rates. Typically, workloads that perform many file creations, file deletions, or space-allocation activities generate large amounts of metadata traffic, whereas workloads that perform I/O operations over a small, fixed number of file system objects are generally not metadata intensive.

The number of metadata servers needed is proportional to the sum of all metadata transactions that all connected clients generate for a given workload. However, because the SAN File System clients maintain an intermediate metadata cache, under typical working conditions the volume of metadata transactions, or metadata server operations per second (OPs), might be relatively low compared to the volume of file operations per second (FOPs) produced by a given workload of application operations per second (APPOPS).
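
The proportionality described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the simple model in which the client cache absorbs a fixed fraction of file operations, the per-server capacity, and every number in the usage example are hypothetical and do not come from SAN File System measurements. The minimum of two servers reflects the starting point suggested earlier in this topic.

```python
import math


def estimate_metadata_servers(client_fops, cache_hit_ratio, ops_per_server,
                              min_servers=2):
    """Hypothetical sizing helper (not a SAN File System tool).

    client_fops     -- file operations per second (FOPs) for each client
    cache_hit_ratio -- assumed fraction of FOPs absorbed by the client cache
    ops_per_server  -- assumed metadata OPs that one server can sustain
    min_servers     -- lower bound on the result
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    if ops_per_server <= 0:
        raise ValueError("ops_per_server must be positive")

    # Sum the file operations generated by all connected clients.
    total_fops = sum(client_fops)

    # Assumed model: only cache misses reach the metadata servers.
    metadata_ops = total_fops * (1.0 - cache_hit_ratio)

    # Round up to whole servers and apply the lower bound.
    needed = math.ceil(metadata_ops / ops_per_server)
    return max(min_servers, needed)


if __name__ == "__main__":
    # Illustrative values only.
    print(estimate_metadata_servers([2000, 3000, 5000], 0.75, 1000))
```

Replace the assumed hit ratio and per-server capacity with values observed in your own environment before relying on an estimate of this kind.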

The cache on the SAN File System client plays an important role in SAN File System operation and generally operates like any other least-recently-used (LRU) object cache. It has the typical characteristics of a cache: the larger the cache, the higher the performance, and the larger the hotset size (the number of objects in active use), the greater the potential for lower performance.

Applications that are I/O intensive over only a few file system objects tend to have the smallest hotset size. Hence, they can potentially perform the best under SAN File System.
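
The relationship between cache size and hotset size can be illustrated with a small simulation of a generic least-recently-used cache. This is a sketch, not the SAN File System client cache: the class and function names are hypothetical, and the uniformly random access pattern is an assumption chosen for simplicity.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Generic LRU object cache (illustrative only)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()

    def access(self, key):
        """Return True on a cache hit, False on a miss."""
        if key in self._entries:
            # Mark the object as most recently used.
            self._entries.move_to_end(key)
            return True
        # Miss: cache the object, evicting the least recently used if full.
        self._entries[key] = True
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
        return False


def hit_ratio(capacity, hotset_size, accesses=10_000, seed=0):
    """Fraction of accesses served from the cache.

    Assumes each access picks one of hotset_size objects uniformly at random.
    """
    rng = random.Random(seed)
    cache = LRUCache(capacity)
    hits = sum(cache.access(rng.randrange(hotset_size))
               for _ in range(accesses))
    return hits / accesses


if __name__ == "__main__":
    # Same cache size, increasingly large hotsets.
    for hotset in (50, 200, 1000):
        print(hotset, hit_ratio(capacity=100, hotset_size=hotset))
```

When the hotset fits in the cache, almost every access after warm-up is a hit. As the hotset grows beyond the cache capacity, the hit ratio falls, and more requests would have to go to the metadata servers.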

(C) Copyright IBM Corporation 2003, 2004. All Rights Reserved.