Platform Computing Corp.

Tuning the Cluster


Tuning LIM

LIM provides critical services to all LSF components. In addition to the timely collection of resource information, LIM provides host selection and job placement policies. If you are using Platform MultiCluster, LIM determines how different clusters should exchange load and resource information. You can tune LIM policies and parameters to improve performance.

LIM uses load thresholds to determine whether to place remote jobs on a host. If one or more LSF load indices exceeds the corresponding threshold (too many users, not enough swap space, etc.), then the host is regarded as busy and LIM will not recommend jobs to that host. You can also tune LIM load thresholds.

You can also change default LIM behavior and pre-select hosts to be elected master to improve performance.


Adjusting LIM Parameters

There are two main goals in adjusting LIM configuration parameters: improving response time, and reducing interference with interactive use. To improve response time, tune LSF to correctly select the best available host for each job. To reduce interference, tune LSF to avoid overloading any host.

LIM policies are advisory information for applications. Applications can either use the placement decision from LIM, or make further decisions based on information from LIM.

Most of the LSF interactive tools use LIM policies to place jobs on the network. LSF uses load and resource information from LIM and makes its own placement decisions based on other factors in addition to load information.

The files that affect LIM are lsf.shared and lsf.cluster.cluster_name, where cluster_name is the name of your cluster.

RUNWINDOW parameter

LIM thresholds and run windows affect the job placement advice of LIM. Job placement advice is not enforced by LIM.

The RUNWINDOW parameter defined in lsf.cluster.cluster_name specifies one or more time windows during which a host is considered available. If the current time is outside all the defined time windows, the host is considered locked and LIM will not advise any applications to run jobs on the host.
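As an illustration, a run window could be defined in the Host section of lsf.cluster.cluster_name. The host name and column layout below are hypothetical and abbreviated; time windows use the [day:]hour[:minute]-[day:]hour[:minute] format:

```
# Illustrative lsf.cluster.cluster_name fragment: hostA is considered
# available for LIM job placement only between 6 p.m. and 8 a.m.
Begin Host
HOSTNAME   model  type  server  RESOURCES  RUNWINDOW
hostA      !      !     1       ()         (18:00-8:00)
End Host
```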

Load Thresholds

Load threshold parameters define the conditions beyond which a host is considered busy by LIM, and are a major factor influencing performance. LIM's policy will not dispatch jobs to a busy host. Each of these parameters is a load index value; if the host's load goes beyond that value, the host becomes busy.


Thresholds can be set for any load index supported internally by the LIM, and for any external load index.

If a particular load index is not specified, LIM assumes that there is no threshold for that load index. Define looser load threshold values if you want jobs to run on a host more aggressively.

See Load Thresholds for more details.
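For illustration, thresholds like those shown in the lshosts -l example under Comparing LIM load thresholds could be set in the Host section of lsf.cluster.cluster_name. The column layout is abbreviated and hypothetical, and the assumption here is that swp and mem thresholds are given in MB:

```
# Illustrative lsf.cluster.cluster_name fragment: hosts become busy
# when r1m exceeds 3.5, pg exceeds 15, swp falls below 2 MB, or mem
# falls below 1 MB. A "-" would leave an index unthresholded.
Begin Host
HOSTNAME   model  type  server  r1m  pg  swp  mem  RESOURCES
hostA      !      !     1       3.5  15  2    1    ()
hostD      !      !     1       3.5  15  2    1    ()
End Host
```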

Load indices that affect LIM performance

Load index   Description
r15s         15-second CPU run queue length
r1m          1-minute CPU run queue length
r15m         15-minute CPU run queue length
pg           Paging rate in pages per second
swp          Available swap space
it           Interactive idle time
ls           Number of users logged in

For more details on load indices see Load Indices.

Comparing LIM load thresholds

To tune LIM load thresholds, compare the output of lsload to the thresholds reported by lshosts -l.

The lsload and lsmon commands display an asterisk * next to each load index that exceeds its threshold.

Example

Consider the following output from lshosts -l and lsload:

lshosts -l  
HOST_NAME:  hostD
...
LOAD_THRESHOLDS:
     r15s   r1m  r15m   ut   pg    io   ls   it   tmp   swp   mem
     -      3.5  -      -    15    -    -    -    -     2M    1M

HOST_NAME:  hostA
...
LOAD_THRESHOLDS:
     r15s   r1m  r15m   ut   pg    io   ls   it   tmp   swp   mem
     -      3.5  -      -    15    -    -    -    -     2M    1M 
lsload 
HOST_NAME status r15s  r1m  r15m   ut    pg   ls  it  tmp  swp  mem
hostD     ok     0.0   0.0  0.0    0%    0.0  6   0   30M  32M  10M
hostA     busy   1.9   2.1  1.9    47%  *69.6 21  0   38M  96M  60M 

In this example, the hosts have the following characteristics:

hostD is ok: all of its load indices are below their configured thresholds, so LIM can recommend jobs to it.

hostA is busy: its paging rate (pg = 69.6) exceeds the threshold of 15, as flagged by the asterisk in the lsload output.
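The busy test can be sketched in shell. This is a simplified illustration, not LSF source code: a host is busy as soon as any load index crosses its threshold. Only the pg (paging rate) index is checked here, and check_host and all values are hypothetical, taken from the example output above.

```shell
#!/bin/sh
# Simplified sketch of LIM's busy test: a host is considered busy
# as soon as a load index (here, pg) exceeds its threshold.
pg_threshold=15

check_host() {
    host=$1
    pg=$2
    # awk handles the floating-point comparison portably in POSIX sh
    if awk -v v="$pg" -v t="$pg_threshold" 'BEGIN { exit !(v > t) }'; then
        echo "$host busy (pg=$pg exceeds $pg_threshold)"
    else
        echo "$host ok"
    fi
}

check_host hostD 0.0    # prints "hostD ok"
check_host hostA 69.6   # prints "hostA busy (pg=69.6 exceeds 15)"
```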

If LIM often reports a host as busy

If LIM often reports a host as busy when the CPU utilization and run queue lengths are relatively low and the system is responding quickly, the most likely cause is the paging rate threshold. Try raising the pg threshold.

Different operating systems assign subtly different meanings to the paging rate statistic, so the threshold needs to be set at different levels for different host types. In particular, HP-UX systems need to be configured with significantly higher pg values; try starting at a value of 50.
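For example, a Host section could give an HP-UX host a much higher pg threshold than the other hosts. The host names and column layout below are illustrative:

```
# Illustrative lsf.cluster.cluster_name fragment: the HP-UX host
# starts with pg=50, as suggested above; the other host keeps pg=15.
Begin Host
HOSTNAME   model  type  server  r1m  pg  RESOURCES
hostHP     !      !     1       3.5  50  ()
hostA      !      !     1       3.5  15  ()
End Host
```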

There is a point of diminishing returns. As the paging rate rises, eventually the system spends too much time waiting for pages and the CPU utilization decreases. Paging rate is the factor that most directly affects perceived interactive response. If a system is paging heavily, it feels very slow.

If interactive jobs slow down response

If you find that interactive jobs slow down system response too much while LIM still reports your host as ok, reduce the CPU run queue lengths (r15s, r1m, r15m). Likewise, increase CPU run queue lengths if hosts become busy at low loads.

Multiprocessor systems

On multiprocessor systems, the CPU run queue length thresholds (r15s, r1m, r15m) are compared to the effective run queue lengths, as displayed by the lsload -E command.

CPU run queue lengths should be configured as the load limit for a single processor. Sites with a variety of uniprocessor and multiprocessor machines can use a standard value for r15s, r1m and r15m in the configuration files, and the multiprocessor machines will automatically run more jobs.

Note that the normalized run queue length displayed by lsload -N is scaled by the number of processors. See Load Indices for the concept of effective and normalized run queue lengths.

Changing Default LIM Behavior to Improve Performance

You may want to change the default LIM behavior when hosts in the cluster cannot share a common configuration directory or an exact replica of it, or when you want to control which hosts can be elected master in order to improve performance.

Default LIM behavior

By default, each LIM running in an LSF cluster must read the configuration files lsf.shared and lsf.cluster.cluster_name to obtain information about resource definitions, host types, host thresholds, etc. This includes master and slave LIMs.

This requires that each host in the cluster share a common configuration directory or an exact replica of the directory.

Change default LIM behavior

The parameter LSF_MASTER_LIST in lsf.conf identifies to the LSF system which hosts can become masters. Hosts not listed in LSF_MASTER_LIST are considered slave-only hosts and will never become master.

Set LSF_MASTER_LIST (lsf.conf)
  1. Edit lsf.conf and set the parameter LSF_MASTER_LIST to indicate hosts that are candidates to become the master host. For example:

     LSF_MASTER_LIST="hostA hostB hostC"

     The order in which you specify hosts in LSF_MASTER_LIST is the preferred order for selecting hosts to become the master LIM.

  2. Save your changes.
  3. Reconfigure the cluster:

     lsadmin reconfig
     badmin mbdrestart
    
Reconfiguration and LSF_MASTER_LIST
If you change LSF_MASTER_LIST

Whenever you change the parameter LSF_MASTER_LIST, reconfigure the cluster with lsadmin reconfig and badmin mbdrestart.

If you change lsf.cluster.cluster_name or lsf.shared

If you make changes that do not affect load report messages such as adding or removing slave-only hosts, you only need to restart the LIMs on all master candidates with the command lsadmin limrestart and the specific host names.

For example:

lsadmin limrestart hostA hostB hostC 

If you make changes that affect load report messages such as load indices, you must restart all the LIMs in the cluster. Use the command lsadmin reconfig.

How LSF works with LSF_MASTER_LIST
LSF_MASTER_LIST undefined

In this example, lsf.shared and lsf.cluster.cluster_name are shared among all LIMs through an NFS file server. The preferred master host is the first available server host in the cluster list in lsf.cluster.cluster_name.

Any slave LIM can become the master LIM. Whenever you reconfigure, all LIMs read lsf.shared and lsf.cluster.cluster_name to get updated information.

For this example, slave LIMs read local lsf.conf files.

LSF_MASTER_LIST defined

The files lsf.shared and lsf.cluster.cluster_name are shared only among LIMs listed as candidates to be elected master with the parameter LSF_MASTER_LIST.

The preferred master host is no longer the first host in the cluster list in lsf.cluster.cluster_name, but the first host in the list specified by LSF_MASTER_LIST in lsf.conf.

Whenever you reconfigure, only master LIM candidates read lsf.shared and lsf.cluster.cluster_name to get updated information. The elected master LIM sends configuration information to slave LIMs.

The order in which you specify hosts in LSF_MASTER_LIST is the preferred order for selecting hosts to become the master LIM.

Considerations

Generally, the files lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates should be identical.

When the cluster is started up or reconfigured, LSF rereads configuration files and compares lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates.

In some cases where identical files are not shared, the files may be out of sync. This section describes the situations that may arise when the lsf.cluster.cluster_name and lsf.shared files of the master candidates are not identical to those of the elected master host.

LSF_MASTER_LIST not defined

When LSF_MASTER_LIST is not defined, LSF rejects candidate master hosts from the cluster if their lsf.cluster.cluster_name and lsf.shared files are different from the files of the elected master. Even if only comment lines are different, hosts are rejected.

A warning is logged in the log file lim.log.master_host_name and the cluster continues to run, but without the hosts that were rejected.

If you want the hosts that were rejected to be part of the cluster, ensure lsf.cluster.cluster_name and lsf.shared are identical for all hosts and restart all LIMs in the cluster with the command:

lsadmin limrestart all 
LSF_MASTER_LIST defined

When LSF_MASTER_LIST is defined, LSF rejects a master candidate listed in LSF_MASTER_LIST only if the number of load indices in its lsf.cluster.cluster_name or lsf.shared file differs from the number of load indices in the corresponding files of the elected master.

A warning is logged in the log file lim.log.master_host_name and the cluster continues to run, but without the hosts that were rejected.

If you want the hosts that were rejected to be part of the cluster, ensure the number of load indices in lsf.cluster.cluster_name and lsf.shared are identical for all master candidates and restart LIMs on the master and all master candidates:

lsadmin limrestart hostA hostB hostC

LSF_MASTER_LIST defined, and master host goes down

If LSF_MASTER_LIST is defined and the elected master host goes down, and if the number of load indices in lsf.cluster.cluster_name or lsf.shared for the new elected master is different from the number of load indices in the files of the master that went down, LSF will reject all master candidates that do not have the same number of load indices in their files as the newly elected master. LSF will also reject all slave-only hosts. This could cause a situation in which only the newly elected master is considered part of the cluster.

A warning is logged in the log file lim.log.new_master_host_name and the cluster continues to run, but without the hosts that were rejected.

To resolve this, from the current master host, restart all LIMs:

lsadmin limrestart all

All slave-only hosts will be considered part of the cluster. Master candidates with a different number of load indices in their lsf.cluster.cluster_name or lsf.shared files will be rejected.

When the master that was down comes back up, you will have the same situation as described in LSF_MASTER_LIST defined. You will need to ensure load indices defined in lsf.cluster.cluster_name and lsf.shared for all master candidates are identical and restart LIMs on all master candidates.

Improving performance of mbatchd query requests on UNIX

You can improve mbatchd query performance on UNIX systems using the following methods:


How mbatchd works without multithreading

Ports

By default, mbatchd uses the port defined by the parameter LSB_MBD_PORT in lsf.conf or looks into the system services database for port numbers to communicate with LIM and job request commands.

It uses this port number to receive query requests from clients.

Servicing requests

For every query request received, mbatchd forks a child mbatchd to service the request. Each child mbatchd processes the request and then exits.

Configure mbatchd to use multithreading

When mbatchd has a dedicated port specified by the parameter LSB_QUERY_PORT in lsf.conf, it forks a child mbatchd which in turn creates threads to process query requests.

As soon as mbatchd has forked a child mbatchd, the child mbatchd takes over and listens on the port to process more query requests. For each query request, the child mbatchd creates a thread to process it.

The child mbatchd continues to listen to the port number specified by LSB_QUERY_PORT and creates threads to service requests until the job status changes, a new job is submitted, or until the time specified in MBD_REFRESH_TIME in lsb.params has passed.

Use MBD_REFRESH_TIME in lsb.params to specify the time interval, in seconds, after which mbatchd forks a new child mbatchd to service query requests, so that the information sent back to clients stays up to date. A child mbatchd processes query requests by creating threads.

MBD_REFRESH_TIME has the following syntax:

MBD_REFRESH_TIME=seconds [min_refresh_time]

where seconds specifies how often a new child mbatchd is forked (the valid range is 0-300; the default is 5 seconds), and min_refresh_time defines the minimum time, in seconds, that the child mbatchd stays alive to handle queries (the default is 10 seconds).
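For example, the following hypothetical lsb.params fragment forks a new child mbatchd every 60 seconds and keeps each child alive for at least 10 seconds; the values are illustrative, not defaults:

```
# Illustrative lsb.params fragment.
Begin Parameters
MBD_REFRESH_TIME=60 10
End Parameters
```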

If you use the bjobs command and do not get up-to-date information, you may want to decrease the value of MBD_REFRESH_TIME or MIN_REFRESH_TIME in lsb.params so that successive job queries are more likely to pick up newly submitted job information.

note:  
Lowering the value of MBD_REFRESH_TIME or MIN_REFRESH_TIME increases the load on mbatchd and might negatively affect performance.
  1. Specify a query-dedicated port for mbatchd by setting LSB_QUERY_PORT in lsf.conf. See Set a query-dedicated port for mbatchd.
  2. Optional: Set a time interval after which a new child mbatchd is forked by setting MBD_REFRESH_TIME in lsb.params. The default value of MBD_REFRESH_TIME is 5 seconds; valid values are 0-300 seconds. See Specify an expiry time for child mbatchds (optional).
  3. Optional: Set NEWJOB_REFRESH=Y in lsb.params to enable a child mbatchd to get up-to-date new job information from the parent mbatchd. See Configure mbatchd to push new job information to child mbatchd.

Set a query-dedicated port for mbatchd

To change the default mbatchd behavior so that mbatchd forks a child mbatchd that can create threads, specify a port number with LSB_QUERY_PORT in lsf.conf.

tip:  
This configuration only works on UNIX platforms that support thread programming.
  1. Log on to the host as the primary LSF administrator.
  2. Edit lsf.conf.
  3. Add the LSB_QUERY_PORT parameter and specify a port number that will be dedicated to receiving query requests from hosts.
  4. Save the lsf.conf file.
  5. Reconfigure the cluster:

     badmin mbdrestart
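For example, lsf.conf might contain the following; port 6891 is an assumed free port chosen for illustration, not an LSF default:

```
# Illustrative lsf.conf fragment: dedicate an assumed free port
# to mbatchd query requests, then run: badmin mbdrestart
LSB_QUERY_PORT=6891
```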

Specify an expiry time for child mbatchds (optional)

Use MBD_REFRESH_TIME in lsb.params to define how often mbatchd forks a new child mbatchd.

  1. Log on to the host as the primary LSF administrator.
  2. Edit lsb.params.
  3. Add the MBD_REFRESH_TIME parameter and specify a time interval, in seconds, after which a new child mbatchd is forked. The default value is 5 seconds; valid values are 0 to 300 seconds.
  4. Save the lsb.params file.
  5. Reconfigure the cluster as follows:

     badmin reconfig

Specify hard CPU affinity

You can specify the master host CPUs on which mbatchd child query processes can run (hard CPU affinity). This improves mbatchd scheduling and dispatch performance by binding query processes to specific CPUs so that higher priority mbatchd processes can run more efficiently.

When you define this parameter, LSF runs mbatchd child query processes only on the specified CPUs. The operating system can assign other processes to run on the same CPU, however, if utilization of the bound CPU is lower than utilization of the unbound CPUs.

  1. Identify the CPUs on the master host that will run mbatchd child query processes.
  2. In the file lsb.params, define the parameter MBD_QUERY_CPUS. For example, if you specify:

     MBD_QUERY_CPUS=1 2

     the mbatchd child query processes will run only on CPU numbers 1 and 2 on the master host.

     You can specify CPU affinity only for master hosts that run an operating system that supports processor binding, such as Linux or Solaris.

  3. Verify that the mbatchd child query processes are bound to the correct CPUs on the master host.
    1. Start up a query process by running a query command such as bjobs.
    2. Check that the query process is bound to the correct CPU:
      • Linux: Run the command taskset -p <pid>
      • Solaris: Run the command ps -AP
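A sketch of the lsb.params fragment follows; the CPU numbers are illustrative and depend on your master host:

```
# Illustrative lsb.params fragment: bind mbatchd child query
# processes to CPUs 1 and 2 on the master host.
Begin Parameters
MBD_QUERY_CPUS=1 2
End Parameters
```

After saving, reconfigure with badmin reconfig, then verify the binding as described above (for example, taskset -p <pid> on Linux).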
Configure mbatchd to push new job information to child mbatchd

Prerequisites: LSB_QUERY_PORT must be defined in lsf.conf.

If you have enabled multithreaded mbatchd support, the bjobs command may not display up-to-date information if two consecutive query commands are issued before a child mbatchd expires, because child mbatchd job information is not updated. Set NEWJOB_REFRESH=Y in lsb.params to enable a child mbatchd to get up-to-date new job information from the parent mbatchd.

When NEWJOB_REFRESH=Y the parent mbatchd pushes new job information to a child mbatchd. Job queries with bjobs display new jobs submitted after the child mbatchd was created.

  1. Log on to the host as the primary LSF administrator.
  2. Edit lsb.params.
  3. Add NEWJOB_REFRESH=Y. Set MBD_REFRESH_TIME in lsb.params to a value greater than 10 seconds.
  4. Save the lsb.params file.
  5. Reconfigure the cluster as follows:

     badmin reconfig
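Putting the pieces together, a hypothetical lsb.params fragment might read as follows; the values are illustrative, with MBD_REFRESH_TIME kept above 10 seconds as advised above:

```
# Illustrative lsb.params fragment: push new job information to
# child mbatchds; keep MBD_REFRESH_TIME above 10 seconds.
Begin Parameters
NEWJOB_REFRESH=Y
MBD_REFRESH_TIME=60 10
End Parameters
```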


Platform Computing Inc.
www.platform.com