Interactive Jobs with bsub
Contents
- About Interactive Jobs
- Submitting Interactive Jobs
- Performance Tuning for Interactive Batch Jobs
- Interactive Batch Job Messaging
- Running X Applications with bsub
- Writing Job Scripts
- Registering utmp File Entries for Interactive Batch Jobs
About Interactive Jobs
It is sometimes desirable from a system management point of view to control all workload through a single centralized scheduler.
Running an interactive job through the LSF batch system allows you to take advantage of batch scheduling policies and host selection features for resource-intensive jobs. You can submit a job and the least loaded host is selected to run the job.
Since all interactive batch jobs are subject to LSF policies, you will have more control over your system. For example, you may dedicate two servers as interactive servers, and disable interactive access to all other servers by defining an interactive queue that only uses the two interactive servers.
Scheduling policies
Running an interactive batch job allows you to take advantage of batch scheduling policies and host selection features for resource-intensive jobs.
An interactive batch job is scheduled using the same policy as all other jobs in a queue. This means an interactive job can wait for a long time before it gets dispatched. If fast response time is required, interactive jobs should be submitted to high-priority queues with loose scheduling constraints.
Interactive queues
You can configure a queue to be interactive-only, batch-only, or both interactive and batch with the parameter INTERACTIVE in lsb.queues.
See the Platform LSF Configuration Reference for information about configuring interactive queues in the lsb.queues file.
Interactive jobs with non-batch utilities
Non-batch utilities such as lsrun and lsgrun use LIM simple placement advice for host selection when running interactive tasks. For more details on using non-batch utilities to run interactive tasks, see Running Interactive and Remote Tasks.
Submitting Interactive Jobs
Use the bsub -I option to submit batch interactive jobs, and the bsub -Is and -Ip options to submit batch interactive jobs in pseudo-terminals.
Pseudo-terminals are not supported for Windows.
For more details, see the bsub command.
Finding out which queues accept interactive jobs
Before you submit an interactive job, use the bqueues -l command to find out which queues accept interactive jobs.
If the output of this command contains the following, this is a batch-only queue. This queue does not accept interactive jobs:
SCHEDULING POLICIES: NO_INTERACTIVE
If the output contains the following, this is an interactive-only queue:
SCHEDULING POLICIES: ONLY_INTERACTIVE
If neither of the above is defined, or if SCHEDULING POLICIES is not in the output of bqueues -l, both interactive and batch jobs are accepted by the queue.
You configure interactive queues in the lsb.queues file.
Submit an interactive job
- Use the bsub -I option to submit an interactive batch job.
For example:
% bsub -I ls
Submits a batch interactive job which displays the output of ls at the user's terminal.
% bsub -I -q interactive -n 4,10 lsmake
<<Waiting for dispatch ...>>
This example starts Platform Make on 4 to 10 processors and displays the output on the terminal.
A new job cannot be submitted until the interactive job is completed or terminated.
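A submission wrapper can check the target queue before it calls bsub -I. The following is a minimal sketch, assuming the SCHEDULING POLICIES line appears in bqueues -l output exactly as described above. The function name classify_queue is hypothetical and is not part of LSF.

```shell
# classify_queue: read `bqueues -l` output on standard input and report
# whether the queue accepts interactive jobs.
# Hypothetical helper for illustration; not an LSF command.
classify_queue() {
    # Keep only the SCHEDULING POLICIES line, if there is one.
    policies=$(grep 'SCHEDULING POLICIES' || true)
    case "$policies" in
        *NO_INTERACTIVE*)   echo "batch-only" ;;
        *ONLY_INTERACTIVE*) echo "interactive-only" ;;
        *)                  echo "interactive-and-batch" ;;
    esac
}
```

On a host where LSF is installed, running bqueues -l interactive | classify_queue would print one of the three labels for the queue named interactive.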
When an interactive job is submitted, a message is displayed while the job is awaiting scheduling. The bsub command stops display of output from the shell until the job completes, and no mail is sent to the user by default. A user can issue a ctrl-c at any time to terminate the job.
Interactive jobs cannot be checkpointed.
Interactive batch jobs cannot be rerunnable (bsub -r).
You can submit interactive batch jobs to rerunnable queues (RERUNNABLE=y in lsb.queues) or rerunnable application profiles (RERUNNABLE=y in lsb.applications).
Submit an interactive job by using a pseudo-terminal
Submission of interactive jobs using a pseudo-terminal is not supported on Windows for either the lsrun or the bsub LSF command.
bsub -Ip
- To submit a batch interactive job by using a pseudo-terminal, use the bsub -Ip option.
For example:
% bsub -Ip vi myfile
Submits a batch interactive job to edit myfile.
When you specify the -Ip option, bsub submits a batch interactive job and creates a pseudo-terminal when the job starts. Some applications, such as vi, require a pseudo-terminal in order to run correctly.
bsub -Is
- To submit a batch interactive job and create a pseudo-terminal with shell mode support, use the bsub -Is option.
For example:
% bsub -Is csh
Submits a batch interactive job that starts up csh as an interactive shell.
When you specify the -Is option, bsub submits a batch interactive job and creates a pseudo-terminal with shell mode support when the job starts. This option should be specified for submitting interactive shells, or applications which redefine the CTRL-C and CTRL-Z keys (for example, jove).
Submit an interactive job and redirect streams to files
bsub -i, -o, -e
You can use the -I option together with the -i, -o, and -e options of bsub to selectively redirect streams to files. For more details, see the bsub(1) man page.
- To save the standard error stream in the job.err file, while standard input and standard output come from the terminal:
% bsub -I -q interactive -e job.err lsmake
Split stdout and stderr
If in your environment there is a wrapper around bsub and LSF commands so that end-users are unaware of LSF and LSF-specific options, you can redirect standard output and standard error of batch interactive jobs to a file with the > operator.
By default, both standard error messages and output messages for batch interactive jobs are written to stdout on the submission host.
- To write both stderr and stdout to mystdout:
bsub -I myjob 2>mystderr 1>mystdout
- To redirect stdout and stderr to different files, set LSF_INTERACTIVE_STDERR=y in lsf.conf or as an environment variable.
For example, with LSF_INTERACTIVE_STDERR set:
bsub -I myjob 2>mystderr 1>mystdout
stderr is redirected to mystderr, and stdout to mystdout.
See the Platform LSF Configuration Reference for more details on LSF_INTERACTIVE_STDERR.
Submit an interactive job, redirect streams to files, and display streams
When using any of the interactive bsub options (for example: -I, -Is, -ISs) as well as the -o or -e options, you can also have your output displayed on the console by using the -tty option.
- To run an interactive job, redirect the error stream to file, and display the stream to the console:
% bsub -I -q interactive -e job.err -tty lsmake
Performance Tuning for Interactive Batch Jobs
LSF is often used on systems that support both interactive and batch users. On one hand, users are often concerned that load sharing will overload their workstations and slow down their interactive tasks. On the other hand, some users want to dedicate some machines for critical batch jobs so that they have guaranteed resources. Even if all your workload is batch jobs, you still want to reduce resource contention and operating system overhead to maximize the use of your resources.
Numerous parameters can be used to control your resource allocation and to avoid undesirable contention.
Types of load conditions
Because interference between jobs is often reflected in the load indices, LSF responds to load changes to avoid or reduce contention. LSF can take actions on jobs to reduce interference before or after jobs are started. These actions are triggered by different load conditions. Most of the conditions can be configured at both the queue level and at the host level. Conditions defined at the queue level apply to all hosts used by the queue, while conditions defined at the host level apply to all queues using the host.
Scheduling conditions
These conditions, if met, trigger the start of more jobs. The scheduling conditions are defined in terms of load thresholds or resource requirements.
At the queue level, scheduling conditions are configured as either resource requirements or scheduling load thresholds, as described in lsb.queues. At the host level, the scheduling conditions are defined as scheduling load thresholds, as described in lsb.hosts.
Suspending conditions
These conditions affect running jobs. When these conditions are met, a SUSPEND action is performed on a running job.
At the queue level, suspending conditions are defined as STOP_COND, as described in lsb.queues, or as a suspending load threshold. At the host level, suspending conditions are defined as a stop load threshold, as described in lsb.hosts.
Resuming conditions
These conditions determine when a suspended job can be resumed. When these conditions are met, a RESUME action is performed on a suspended job.
At the queue level, resume conditions are defined by RESUME_COND in lsb.queues, or by the loadSched thresholds for the queue if RESUME_COND is not defined.
Types of load indices
To reduce interference between jobs effectively, choose the appropriate load indices and set their thresholds carefully. Below are a few frequently used load indices.
Paging rate (pg)
The paging rate (pg) load index relates strongly to the perceived interactive performance. If a host is paging applications to disk, the user interface feels very slow.
The paging rate is also a reflection of a shortage of physical memory. When an application is being paged in and out frequently, the system is spending a lot of time performing overhead, resulting in reduced performance.
The paging rate load index can be used as a threshold to either stop sending more jobs to the host, or to suspend an already running batch job to give priority to interactive users.
This parameter can be used in different configuration files to achieve different purposes. When you define a paging rate threshold in lsf.cluster.cluster_name, a host that exceeds the threshold becomes busy from LIM's point of view; therefore, LIM advises no more jobs to run on this host.
By including paging rate in queue or host scheduling conditions, jobs can be prevented from starting on machines with a heavy paging rate, or can be suspended or even killed if they are interfering with the interactive user on the console.
A job suspended due to the pg threshold will not be resumed, even if the resume conditions are met, unless the machine is interactively idle for more than PG_SUSP_IT seconds.
Interactive idle time (it)
Strict control can be achieved using the idle time (it) index. This index measures the number of minutes since any interactive terminal activity. Interactive terminals include hard wired ttys, rlogin and lslogin sessions, and X shell windows such as xterm. On some hosts, LIM also detects mouse and keyboard activity.
This index is typically used to prevent batch jobs from interfering with interactive activities. By defining the suspending condition in the queue as it<1 && pg>50, a job from this queue will be suspended if the machine is not interactively idle and the paging rate is higher than 50 pages per second. Furthermore, by defining the resuming condition as it>5 && pg<10 in the queue, a suspended job from the queue will not resume unless the host has been idle for at least five minutes and the paging rate is less than ten pages per second.
The it index is only non-zero if no interactive users are active. Setting the it threshold to five minutes allows a reasonable amount of think time for interactive users, while making the machine available for load sharing if the users are logged in but absent.
For lower priority batch queues, it is appropriate to set an it suspending threshold of two minutes and scheduling threshold of ten minutes in the lsb.queues file. Jobs in these queues are suspended while the execution host is in use, and resume after the host has been idle for a longer period. For hosts where all batch jobs, no matter how important, should be suspended, set a per-host suspending threshold in the lsb.hosts file.
CPU run queue length (r15s, r1m, r15m)
Running more than one CPU-bound process on a machine (or more than one process per CPU for multiprocessors) can reduce the total throughput because of operating system overhead, as well as interfering with interactive users. Some tasks such as compiling can create more than one CPU-intensive task.
Queues should normally set CPU run queue scheduling thresholds below 1.0, so that hosts already running compute-bound jobs are left alone. LSF scales the run queue thresholds for multiprocessor hosts by using the effective run queue lengths, so multiprocessors automatically run one job per processor in this case.
For short to medium-length jobs, the r1m index should be used. For longer jobs, you might want to add an r15m threshold. The exception is high priority queues, where turnaround time is more important than total throughput. For high priority queues, an r1m scheduling threshold of 2.0 is appropriate.
See Load Indices for the concept of effective run queue length.
CPU utilization (ut)
The ut parameter measures the amount of CPU time being used. When all the CPU time on a host is in use, there is little to gain from sending another job to that host unless the host is much more powerful than others on the network. A ut threshold of 90% prevents jobs from going to a host where the CPU does not have spare processing cycles.
If a host has very high pg but low ut, then it may be desirable to suspend some jobs to reduce the contention.
Some commands report ut as a percentage from 0-100, and some report it as a decimal number between 0 and 1. The ut parameter in the lsf.cluster.cluster_name file and the other configuration files takes a fraction in the range from 0 to 1, while the bsub -R resource requirement string takes an integer from 1-100.
The command bhist shows the execution history of batch jobs, including the time spent waiting in queues or suspended because of system load.
The command bjobs -p shows why a job is pending.
Scheduling conditions and resource thresholds
Three parameters, RES_REQ, STOP_COND and RESUME_COND, can be specified in the definition of a queue. Scheduling conditions are a more general way of specifying job dispatching conditions at the queue level. These parameters take resource requirement strings as values, which allows you to specify conditions more flexibly than with the loadSched or loadStop thresholds.
Interactive Batch Job Messaging
LSF can display messages to stderr or the Windows console when the following changes occur with interactive batch jobs:
- Job state
- Pending reason
- Suspend reason
Other job status changes, like switching the job's queue, are not displayed.
Limitations
Interactive batch job messaging is not supported in a MultiCluster environment.
Windows
Interactive batch job messaging is not fully supported on Windows. Only changes in the job state that occur before the job starts running are displayed. No messages are displayed after the job starts.
Configure interactive batch job messaging
Messaging for interactive batch jobs can be specified cluster-wide or in the user environment.
Cluster level
- To enable interactive batch job messaging for all users in the cluster, the LSF administrator configures the following parameters in lsf.conf:
- LSB_INTERACT_MSG_ENH=Y
- (Optional) LSB_INTERACT_MSG_INTVAL
LSB_INTERACT_MSG_INTVAL specifies the time interval, in seconds, in which LSF updates messages about any changes to the pending status of the job. The default interval is 60 seconds. LSB_INTERACT_MSG_INTVAL is ignored if LSB_INTERACT_MSG_ENH is not set.
User level
- To enable messaging for interactive batch jobs, LSF users can define LSB_INTERACT_MSG_ENH and LSB_INTERACT_MSG_INTVAL as environment variables.
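As an illustration of the user-level setting, the variables could be exported in a Bourne-compatible shell before submitting jobs. This is a sketch only: the value 30 is an arbitrary example interval, and csh users would use setenv instead of export.

```shell
# Enable interactive batch job messaging for jobs submitted from this shell.
export LSB_INTERACT_MSG_ENH=Y
# Update pending-status messages every 30 seconds.
# 30 is an illustrative value; the documented default is 60.
export LSB_INTERACT_MSG_INTVAL=30
```

Interactive jobs submitted afterwards from the same shell, for example with bsub -Is csh, would then be expected to use these settings.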
The user-level definition of LSB_INTERACT_MSG_ENH overrides the definition in lsf.conf.
Example messages
Job in pending state
The following example shows messages displayed when a job is in pending state:
bsub -Is -R "ls < 2" csh
Job <2812> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<< Job's resource requirements not satisfied: 2 hosts; >>
<< Load information unavailable: 1 host; >>
<< Just started a job recently: 1 host; >>
<< Load information unavailable: 1 host; >>
<< Job's resource requirements not satisfied: 1 host; >>
Job terminated by user
The following example shows messages displayed when a job in pending state is terminated by the user:
bsub -m hostA -b 13:00 -Is sh
Job <2015> is submitted to default queue <normal>.
Job will be scheduled after Fri Nov 19 13:00:00 1999
<<Waiting for dispatch ...>>
<< New job is waiting for scheduling >>
<< The job has a specified start time >>
bkill 2015
<< Job <2015> has been terminated by user or administrator >>
<<Terminated while pending>>
Job suspended then resumed
The following example shows messages displayed when a job is dispatched, suspended, and then resumed:
bsub -m hostA -Is sh
Job <2020> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<< New job is waiting for scheduling >>
<<Starting on hostA>>
bstop 2020
<< The job was suspended by user >>
bresume 2020
<< Waiting for re-scheduling after being resumed by user >>
Running X Applications with bsub
You can start an X session on the least loaded host by submitting it as a batch job:
bsub xterm
An xterm is started on the least loaded host in the cluster.
When you run X applications using lsrun or bsub, the environment variable DISPLAY is handled properly for you. It behaves as if you were running the X application on the local machine.
Writing Job Scripts
You can build a job file one line at a time, or create it from another file, by running bsub without specifying a job to submit. When you do this, you start an interactive session in which bsub reads command lines from the standard input and submits them as a single batch job. You are prompted with bsub> for each line.
You can use the bsub -Zs command to spool a file.
For more details on bsub options, see the bsub(1) man page.
Writing a job file one line at a time
UNIX example
% bsub -q simulation
bsub> cd /work/data/myhomedir
bsub> myjob arg1 arg2 ......
bsub> rm myjob.log
bsub> ^D
Job <1234> submitted to queue <simulation>.
In the above example, the 3 command lines run as a Bourne shell (/bin/sh) script. Only valid Bourne shell command lines are acceptable in this case.
Windows example
C:\> bsub -q simulation
bsub> cd \\server\data\myhomedir
bsub> myjob arg1 arg2 ......
bsub> del myjob.log
bsub> ^Z
Job <1234> submitted to queue <simulation>.
In the above example, the 3 command lines run as a batch file (.BAT). Note that only valid Windows batch file command lines are acceptable in this case.
Specifying job options in a file
In this example, options to run the job are specified in the options_file.
% bsub -q simulation < options_file
Job <1234> submitted to queue <simulation>.
UNIX
On UNIX, the options_file must be a text file that contains Bourne shell command lines. It cannot be a binary executable file.
Windows
On Windows, the options_file must be a text file containing Windows batch file command lines.
Spooling a job command file
Use bsub -Zs to spool a job command file to the directory specified by the JOB_SPOOL_DIR parameter in lsb.params, and use the spooled file as the command file for the job.
Use the bmod -Zsn command to modify or remove the command file after the job has been submitted. Removing or modifying the original input file does not affect the submitted job.
Redirecting a script to bsub standard input
You can redirect a script to the standard input of the bsub command:
% bsub < myscript
Job <1234> submitted to queue <test>.
In this example, the myscript file contains job submission options as well as command lines to execute. When the bsub command reads a script from its standard input, the script is spooled, so you can modify the file right after bsub returns, for example to prepare the next job submission.
When the script is specified on the bsub command line, the script is not spooled:
% bsub myscript
Job <1234> submitted to default queue <normal>.
In this case the command line myscript is spooled, instead of the contents of the myscript file. Later modifications to the myscript file can affect job behavior.
Specifying embedded submission options
You can specify job submission options in scripts read from standard input by the bsub command using lines starting with #BSUB:
% bsub -q simulation
bsub> #BSUB -q test
bsub> #BSUB -o outfile -R "mem>10"
bsub> myjob arg1 arg2
bsub> #BSUB -J simjob
bsub> ^D
Job <1234> submitted to queue <simulation>.
Note that:
- Command-line options override embedded options. In this example, the job is submitted to the simulation queue rather than the test queue.
- Submission options can be specified anywhere in the standard input. In the above example, the -J option of bsub is specified after the command to be run.
- More than one option can be specified on one line, as shown in the example above.
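The same embedded options can be kept in a file and redirected to bsub instead of being typed at the bsub> prompt. The following is a minimal sketch that reuses the file name myscript and the options from the example above. The guard around bsub is an addition for illustration, so that the snippet also runs on a host without LSF.

```shell
# Write a job script whose #BSUB lines carry the submission options.
cat > myscript <<'EOF'
#BSUB -q test
#BSUB -o outfile -R "mem>10"
#BSUB -J simjob
myjob arg1 arg2
EOF

# Submit only where LSF is installed. The command-line -q simulation
# overrides the embedded "#BSUB -q test" line.
if command -v bsub >/dev/null 2>&1; then
    bsub -q simulation < myscript
fi
```

Because the script is read from standard input, its contents are spooled at submission time, as described in the previous section.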
Running a job under a particular shell
By default, LSF runs batch jobs using the Bourne (/bin/sh) shell. You can specify the shell under which a job is to run. This is done by specifying an interpreter in the first line of the script.
For example:
% bsub
bsub> #!/bin/csh -f
bsub> set coredump=`ls |grep core`
bsub> if ( "$coredump" != "") then
bsub> mv core core.`date | cut -d" " -f1`
bsub> endif
bsub> myjob
bsub> ^D
Job <1234> is submitted to default queue <normal>.
The bsub command must read the job script from standard input to set the execution shell. If you do not specify a shell in the script, the script is run using /bin/sh. If the first line of the script starts with a # not immediately followed by an exclamation mark (!), then /bin/csh is used to run the job.
For example:
% bsub
bsub> # This is a comment line. This tells the system to use /bin/csh to
bsub> # interpret the script.
bsub>
bsub> setenv DAY `date | cut -d" " -f1`
bsub> myjob
bsub> ^D
Job <1234> is submitted to default queue <normal>.
If running jobs under a particular shell is required frequently, you can specify an alternate shell using a command-level job starter and run your jobs interactively. See Controlling Execution Environment Using Job Starters for more details.
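The shell selection rules in this section can be summarized as a small check on the first line of a script. The sketch below encodes only the three rules as stated here and makes no claim about how LSF implements them internally. The function name job_shell is hypothetical.

```shell
# job_shell: print the interpreter implied by the first line of a job script,
# following only the rules stated in this section.
# Hypothetical helper for illustration; not an LSF command.
job_shell() {
    first_line=
    # Read the first line; tolerate a file with no trailing newline.
    IFS= read -r first_line < "$1" || true
    case "$first_line" in
        '#!'*) printf '%s\n' "${first_line#??}" ;;  # explicit interpreter line
        '#'*)  echo "/bin/csh" ;;                   # leading # not followed by !
        *)     echo "/bin/sh" ;;                    # default Bourne shell
    esac
}
```

Running job_shell on a script before redirecting it to bsub gives a quick way to confirm which shell the first line is expected to select.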
Registering utmp File Entries for Interactive Batch Jobs
LSF administrators can configure the cluster to track user and account information for interactive batch jobs submitted with bsub -Ip or bsub -Is. User and account information is registered as entries in the UNIX utmp file, which holds information for commands such as who. Registering user information for interactive batch jobs in utmp allows more accurate job accounting.
Configuration and operation
To enable utmp file registration, the LSF administrator sets the LSB_UTMP parameter in lsf.conf.
When LSB_UTMP is defined, LSF registers the job by adding an entry to the utmp file on the execution host when the job starts. After the job finishes, LSF removes the entry for the job from the utmp file.
Limitations
- Registration of utmp file entries is supported on the following platforms:
- SGI IRIX (6.4 and later)
- Solaris (all versions)
- HP-UX (all versions)
- Linux (all versions)
- utmp file registration is not supported in a MultiCluster environment.
- Because interactive batch jobs submitted with bsub -I are not associated with a pseudo-terminal, utmp file registration is not supported for these jobs.
Platform Computing Inc.
www.platform.com