How the System Works

LSF can be configured in different ways that affect the scheduling of jobs. By default, this is how LSF handles a new job:

  1. Receive the job. Create a job file. Return the job ID to the user.
  2. Schedule the job and select the best available host.
  3. Dispatch the job to a selected host.
  4. Set the environment on the host.
  5. Start the job.

Job Submission

The life cycle of a job starts when you submit the job to LSF. On the command line, you use the bsub command to submit jobs, and you can specify many options to bsub to modify the default behavior, including the use of a JSDL file. Every job must be submitted to a queue.
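
For example, a submission that overrides a few defaults might look like this (the queue name, resource requirement, and executable are illustrative):

bsub -q normal -n 4 -R "rusage[mem=512]" -o out.%J ./my_app

This submits my_app to the normal queue, requests 4 job slots and 512 MB of memory, and appends the job's standard output to a file named after the job ID.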

Queues

Queues represent a set of pending jobs, lined up in a defined order and waiting for their opportunity to use resources. Queues implement different job scheduling and control policies. All jobs submitted to the same queue share the same scheduling and control policy. Queues do not correspond to individual hosts; each queue can use all server hosts in the cluster, or a configured subset of the server hosts.

A queue is a network-wide holding place for jobs. Jobs enter the queue via the bsub command. LSF can be configured to have one or more default queues. Jobs that are not submitted to a specific queue are assigned to the first default queue that accepts them. Queues have a number of attributes associated with them, including a name, a priority, job limits, scheduling thresholds, and the users and hosts that can use them, as shown in the following example:

Example queue
Begin Queue
QUEUE_NAME = normal
PRIORITY = 30
STACKLIMIT = 2048
DESCRIPTION = For normal low priority jobs, running only if hosts are lightly loaded.
QJOB_LIMIT = 60     # job limit of the queue
PJOB_LIMIT = 2     # job limit per processor
ut = 0.2
io = 50/240
USERS = all
HOSTS = all 
NICE = 20
End Queue 
Queue priority

Defines the order in which queues are searched to determine which job will be processed. Queues are assigned a priority by the LSF administrator, where a higher number has a higher priority. Queues are serviced by LSF in order of priority from the highest to the lowest. If multiple queues have the same priority, LSF schedules all the jobs from these queues in first-come, first-served order.
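
To see the priorities of the queues in the cluster, you can use the bqueues command (output abbreviated; queue names and values are illustrative):

bqueues
QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP
owners           43  Open:Active       -    -    -    -     0     0     0     0
priority         40  Open:Active       -    -    -    -     6     4     2     0
normal           30  Open:Active       -    -    -    -     1     1     0     0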

Automatic queue selection

Typically, a cluster has multiple queues. When you submit a job to LSF, you can specify the queue the job will enter. If you submit a job without specifying a queue name, LSF considers the requirements of the job and automatically chooses a suitable queue from a list of candidate default queues. If you did not define any candidate default queues, LSF creates a new queue using all the default settings, and submits the job to that queue.

Viewing default queues

Use bparams to display default queues:

bparams
Default Queues: normal
... 

The user can override this list by defining the environment variable LSB_DEFAULTQUEUE.
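
For example, to make a queue named priority your personal default queue (the queue name is illustrative):

export LSB_DEFAULTQUEUE=priority     # sh/ksh/bash
setenv LSB_DEFAULTQUEUE priority     # csh/tcsh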

How automatic queue selection works

LSF selects a suitable queue according to constraints such as user access restrictions, host restrictions, queue status (closed queues are not considered), exclusive execution restrictions, and the job's requested resources, which must be within the resource allocation limits of the queue.

If multiple queues satisfy these requirements, then the first queue listed in the candidate queues (as defined by the DEFAULT_QUEUE parameter or the LSB_DEFAULTQUEUE environment variable) that satisfies the requirements is selected.
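
For example, candidate default queues can be listed, in order of preference, in the Parameters section of lsb.params (queue names are illustrative):

Begin Parameters
DEFAULT_QUEUE = normal short
End Parameters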

Job files

When a batch job is submitted to a queue, LSF Batch holds it in a job file until conditions are right for it to be executed. Then the job file is used to execute the job.

UNIX

The job file is a Bourne shell script run at execution time.

Windows

The job file is a batch file processed at execution time.

Job Scheduling and Dispatch

Submitted jobs sit in queues until they are scheduled and dispatched to a host for execution. When a job is submitted to LSF, many factors control when and where the job starts to run, including the scheduling policies described below, the resource requirements of the job, the availability of eligible hosts, and various job slot limits.

Scheduling policies

First-Come, First-Served (FCFS) scheduling

By default, jobs in a queue are dispatched in first-come, first-served (FCFS) order. This means that jobs are dispatched according to their order in the queue. Since jobs are ordered according to job priority, this does not necessarily mean that jobs will be dispatched in the order of submission. The order of jobs in the queue can also be modified by the user or administrator.

Service level agreement (SLA) scheduling

An SLA in LSF is a "just-in-time" scheduling policy that defines an agreement between LSF administrators and LSF users. The SLA scheduling policy defines how many jobs should be run from each SLA to meet the configured goals.

Fairshare scheduling and other policies

If a fairshare scheduling policy has been specified for the queue or if host partitions have been configured, jobs are dispatched in accordance with these policies instead. To solve diverse problems, LSF allows multiple scheduling policies in the same cluster. LSF has several queue scheduling policies such as exclusive, preemptive, fairshare, and hierarchical fairshare.

Scheduling and dispatch

Jobs are scheduled at regular intervals (5 seconds by default, configured by the parameter JOB_SCHEDULING_INTERVAL in lsb.params). Once jobs are scheduled, they can be immediately dispatched to hosts.

To prevent overloading any host, LSF waits a short time between dispatching jobs to the same host. The delay is configured by the JOB_ACCEPT_INTERVAL parameter in lsb.params or lsb.queues. JOB_ACCEPT_INTERVAL controls the number of seconds to wait after dispatching a job to a host before dispatching a second job to the same host. The default is 60 seconds. If JOB_ACCEPT_INTERVAL is set to zero, more than one job can be started on a host at a time.
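
A minimal sketch of the corresponding lsb.params settings (values are illustrative, not recommendations):

Begin Parameters
JOB_SCHEDULING_INTERVAL = 5    # seconds between job scheduling sessions
JOB_ACCEPT_INTERVAL = 0        # 0 allows several jobs to start on a host at once
End Parameters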

For large clusters, define LSF_SERVER_HOSTS in lsf.conf to decrease the load on the master LIM.
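
For example, in lsf.conf (host names are illustrative):

LSF_SERVER_HOSTS="hostd hoste hostf"

Client commands then contact the LIMs on the listed server hosts rather than the master LIM.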

Some operating systems, such as Linux and AIX, let you increase the number of file descriptors that can be allocated to the master host. You do not need to limit the number of file descriptors to 1024 if you want fast job dispatching. To take advantage of the greater number of file descriptors, you must set the parameter LSB_MAX_JOB_DISPATCH_PER_SESSION in lsf.conf to a value greater than 300 and less than or equal to one-half the value of MAX_SBD_CONNS defined in lsb.params. LSB_MAX_JOB_DISPATCH_PER_SESSION defines the maximum number of jobs that mbatchd can dispatch during one job scheduling session. You must restart mbatchd and sbatchd when you change the value of this parameter for the change to take effect.
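
A sketch of the related settings (values are illustrative and must satisfy the constraints described above):

# lsf.conf
LSB_MAX_JOB_DISPATCH_PER_SESSION=400

# lsb.params
MAX_SBD_CONNS = 800

After changing these parameters, restart the daemons, for example with badmin mbdrestart and badmin hrestart all.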

Dispatch order

Jobs are not necessarily dispatched in order of submission.

Each queue has a priority number set by an LSF Administrator when the queue is defined. LSF tries to start jobs from the highest priority queue first.

By default, LSF considers jobs for dispatch in the following order:

  1. For each queue, from highest to lowest priority. If multiple queues have the same priority, LSF schedules all the jobs from these queues in first-come, first-served order.
  2. For each job in the queue, according to FCFS order.

Jobs can be dispatched out of turn if pre-execution conditions are not met, specific hosts or resources are busy or unavailable, or a user has reached the user job slot limit.

Viewing job order in queue

Use bjobs to see the order in which jobs in a queue will actually be dispatched for the FCFS policy.
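
For example (output abbreviated; job IDs, users, and hosts are illustrative):

bjobs -u all
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1004    userA   RUN   normal     hostA       hostB       sleep 60   Oct 23 10:16
1005    userB   PEND  normal     hostA                   myjob      Oct 23 10:17

Pending jobs are listed in the order in which they will be considered for dispatch.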

Changing job order in queue (btop and bbot)

Use the btop and bbot commands to change the job order in the queue.
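
For example, to move job 1005 to the top of its queue, or to the bottom (the job ID is illustrative):

btop 1005
bbot 1005

A regular user can only change the position of a job relative to their own jobs; the LSF administrator can move any job in the queue.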

See Changing Job Order Within Queues for more information.

Host Selection

Each time LSF attempts to dispatch a job, it checks to see which hosts are eligible to run the job. A number of conditions determine whether a host is eligible, including its dispatch windows, the resource requirements of the job and of the queue, and the current load levels and job slot limits of the host.

A host is only eligible to run a job if all the conditions are met. If a job is queued and there is an eligible host for that job, the job is placed on that host. If more than one host is eligible, the job is started on the best host based on both the job and the queue resource requirements.

Host load levels

A host is available if the values of the load indices (such as r1m, pg, mem) of the host are within the configured scheduling thresholds. There are two sets of scheduling thresholds: host and queue. If any load index on the host exceeds the corresponding host threshold or queue threshold, the host is not eligible to run any job.

Viewing host load levels

Use the bhosts -l command to display the load levels of a host compared to its configured scheduling thresholds.

Eligible hosts

When LSF tries to place a job, it obtains current load information for all hosts.

The load levels on each host are compared to the scheduling thresholds configured for that host in the Host section of lsb.hosts, as well as the per-queue scheduling thresholds configured in lsb.queues.

If any load index exceeds either its per-queue or its per-host scheduling threshold, no new job is started on that host.

Viewing eligible hosts

The bjobs -lp command displays the names of hosts that cannot accept a job at the moment together with the reasons the job cannot be accepted.

Resource requirements

Resource requirements at the queue level can also be used to specify scheduling conditions (for example, r1m<0.4 && pg<3).
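
For example, a queue could be configured in lsb.queues so that its jobs are dispatched only to lightly loaded hosts; a sketch, with an illustrative queue name:

Begin Queue
QUEUE_NAME = lightload
RES_REQ    = r1m<0.4 && pg<3
End Queue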

A higher priority or earlier batch job is only bypassed if no hosts are available that meet the requirements of that job.

If a host is available but is not eligible to run a particular job, LSF looks for a later job to start on that host. LSF starts the first job found for which that host is eligible.

Job Execution Environment

When LSF runs your jobs, it tries to make the execution as transparent to the user as possible. By default, the execution environment is maintained to be as close to the submission environment as possible. LSF copies the environment from the submission host to the execution host, including the environment variables needed by the job, the current working directory, the file creation mask (umask), and the resource limits.

Since a network can be heterogeneous, it is often impossible or undesirable to reproduce the submission host's environment exactly on the execution host. For example, if the home directory is not shared between the submission and execution hosts, LSF runs the job in /tmp on the execution host. If the DISPLAY environment variable is something like Unix:0.0 or :0.0, it must be processed before it can be used on the execution host. LSF handles these cases automatically.

To change the default execution environment, use the -L option of bsub or configure a job starter.
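
For example, to reinitialize the execution environment with a login shell instead of copying the submission environment (the shell path is illustrative):

bsub -L /bin/sh myjob

The -L option makes LSF emulate a login with the specified shell before starting the job.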

For resource control, LSF also changes some parts of the execution environment of jobs, such as nice values and resource usage limits; other changes can be made by configuring a job starter.

Shared user directories

LSF works best when user home directories are shared across all hosts in the cluster. To provide transparent remote execution, you should share user home directories on all LSF hosts.

To provide transparent remote execution, LSF commands determine the user's current working directory and use that directory on the remote host.

For example, if the command cc file.c is executed remotely, cc only finds the correct file.c if the remote command runs in the same directory.

LSF automatically creates an .lsbatch subdirectory in the user's home directory on the execution host. This directory is used to store temporary input and output files for jobs.

Executables and the PATH environment variable

Search paths for executables (the PATH environment variable) are passed to the remote execution host unchanged. In mixed clusters, LSF works best when the user binary directories (for example, /usr/bin, /usr/local/bin) have the same path names on different host types. This makes the PATH variable valid on all hosts.

LSF configuration files are normally stored in a shared directory. This makes administration easier. There is little performance penalty for this, because the configuration files are not frequently read.

Fault Tolerance

LSF is designed to continue operating even if some of the hosts in the cluster are unavailable. One host in the cluster acts as the master, but if the master host becomes unavailable another host takes over. LSF is available as long as there is one available host in the cluster.

LSF can tolerate the failure of any host or group of hosts in the cluster. When a host crashes, all jobs running on that host are lost. No other pending or running jobs are affected. Important jobs can be submitted to LSF with an option to automatically restart if the job is lost because of a host failure.

Dynamic master host

The LSF master host is chosen dynamically. If the current master host becomes unavailable, another host takes over automatically. The failover master host is selected from the list defined in LSF_MASTER_LIST in lsf.conf (specified in install.config at installation). The first available host in the list acts as the master. LSF might be unavailable for a few minutes while hosts are waiting to be contacted by the new master.
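
For example, in lsf.conf (host names are illustrative):

LSF_MASTER_LIST="hosta hostb hostc hostd"

If hosta becomes unavailable, hostb takes over as master, and so on down the list.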

Running jobs are managed by sbatchd on each server host. When the new mbatchd starts, it polls the sbatchd on each host and finds the current status of its jobs. If sbatchd fails but the host is still running, jobs running on the host are not lost. When sbatchd is restarted it regains control of all jobs running on the host.

Network failure

If the cluster is partitioned by a network failure, a master LIM takes over on each side of the partition. Interactive load-sharing remains available, as long as each host still has access to the LSF executables.

Event log file (lsb.events)

Fault tolerance in LSF depends on the event log file, lsb.events, which is kept on the primary file server. Every event in the system is logged in this file, including all job submissions and job and host status changes. If the master host becomes unavailable, a new master is chosen by lim. sbatchd on the new master starts a new mbatchd. The new mbatchd reads the lsb.events file to recover the state of the system.

For sites not wanting to rely solely on a central file server for recovery information, LSF can be configured to maintain a duplicate event log by keeping a replica of lsb.events. The replica is stored on the file server, and used if the primary copy is unavailable. When using LSF's duplicate event log function, the primary event log is stored on the first master host, and re-synchronized with the replicated copy when the host recovers.
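
Duplicate event logging is enabled by defining LSB_LOCALDIR in lsf.conf as a local directory on the master host; a sketch, with an illustrative path:

LSB_LOCALDIR=/usr/local/lsf/work/cluster1/logdir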

Partitioned network

If the network is partitioned, only one of the partitions can access lsb.events, so batch services are only available on one side of the partition. A lock file is used to make sure that only one mbatchd is running in the cluster.

Host failure

If an LSF server host fails, jobs running on that host are lost. No other jobs are affected. Jobs can be submitted as rerunnable, so that they automatically run again from the beginning or as checkpointable, so that they start again from a checkpoint on another host if they are lost because of a host failure.
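
For example (the executable and checkpoint directory are illustrative):

bsub -r ./my_app
bsub -k "/share/ckpt 30" ./my_app

The first command submits a rerunnable job that restarts from the beginning after a host failure; the second submits a checkpointable job that writes a checkpoint to /share/ckpt every 30 minutes, from which it can be restarted on another host.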

If all of the hosts in a cluster go down, all running jobs are lost. When a host comes back up and takes over as master, it reads the lsb.events file to get the state of all batch jobs. Jobs that were running when the systems went down are assumed to have exited, and email is sent to the submitting user. Pending jobs remain in their queues, and are scheduled as hosts become available.

Job exception handling

You can configure hosts and queues so that LSF detects exceptional conditions while jobs are running, and takes appropriate action automatically. You can customize which exceptions are detected and the corresponding actions. By default, LSF does not detect any exceptions.

See Handling Host-level Job Exceptions and Handling Job Exceptions in Queues for more information about job-level exception management.

