Job Arrays
LSF provides a structure called a job array that allows a sequence of jobs that share the same executable and resource requirements, but have different input files, to be submitted, controlled, and monitored as a single unit. Using the standard LSF commands, you can also control and monitor individual jobs and groups of jobs submitted from a job array.
After the job array is submitted, LSF independently schedules and dispatches the individual jobs. Each job submitted from a job array shares the same job ID as the job array and is uniquely referenced using an array index. The dimension and structure of a job array are defined when the job array is created.
Contents
- Create a Job Array
- Handling Input and Output Files
- Job Array Dependencies
- Monitoring Job Arrays
- Controlling Job Arrays
- Requeuing a Job Array
- Job Array Job Slot Limit
Create a Job Array
A job array is created at job submission time using the -J option of bsub.
- For example, the following command creates a job array named myArray made up of 1000 jobs:
bsub -J "myArray[1-1000]" myJob
Job <123> is submitted to default queue <normal>.
Syntax
The bsub syntax used to create a job array is as follows:
bsub -J "arrayName[indexList, ...]" myJob
Where:
-J "arrayName[indexList, ...]"
Names and creates the job array. The square brackets, [ ], around indexList must be entered exactly as shown, and the job array name specification must be enclosed in quotes. Commas (,) are used to separate multiple indexList entries. The maximum length of this specification is 255 characters.
arrayName
User-specified string used to identify the job array. Valid values are any combination of the following characters:
a-z | A-Z | 0-9 | . | - | _
indexList = start[-end[:step]]
Specifies the size and dimension of the job array, where:
start
Specifies the start of a range of indices. Can also be used to specify an individual index. Valid values are unique positive integers. For example, [1-5] and [1, 2, 3, 4, 5] specify 5 jobs with indices 1 through 5.
end
Specifies the end of a range of indices. Valid values are unique positive integers.
step
Specifies the value by which to increment the indices in a range. Indices begin at start, increment by the value of step, and do not increment past the value of end. The default value is 1. Valid values are positive integers. For example, [1-10:2] specifies a range of 1-10 with step value 2, creating indices 1, 3, 5, 7, and 9.
After the job array is created (submitted), individual jobs are referenced using the job array name or job ID and an index value. For example, both of the following series refer to jobs submitted from a job array named myArray, which is made up of 1000 jobs and has a job ID of 123:
myArray[1], myArray[2], myArray[3], ..., myArray[1000]
123[1], 123[2], 123[3], ..., 123[1000]
Change the maximum size of a job array
A large job array allows a user to submit a large number of jobs to the system with a single job submission.
By default, the maximum number of jobs in a job array is 1000, so a job array cannot contain more than 1000 jobs unless this limit is changed.
- To change the maximum job array size, set MAX_JOB_ARRAY_SIZE in lsb.params to any positive integer between 1 and 2147483646. The maximum number of jobs in a job array cannot exceed the value set by MAX_JOB_ARRAY_SIZE.
Handling Input and Output Files
LSF provides methods for coordinating individual input and output files for the multiple jobs created when submitting a job array. These methods require your input files to be prepared uniformly. To accommodate an executable that uses standard input and standard output, LSF provides runtime variables (%I and %J) that are expanded at runtime. To accommodate an executable that reads command line arguments, LSF provides an environment variable (LSB_JOBINDEX) that is set in the execution environment.
Methods
Prepare input files
LSF needs all the input files for the jobs in your job array to be located in the same directory. By default, LSF assumes the current working directory (CWD), which is the directory from which bsub was issued.
- To override CWD, specify an absolute path when submitting the job array.
Each file name consists of two parts: a consistent name string and a variable integer that corresponds directly to an array index. For example, the following file names are valid input file names for a job array. They are made up of the consistent name input and integers that correspond to job array indices from 1 to 1000:
input.1, input.2, input.3, ..., input.1000
Redirecting Standard Input and Output
The variables %I and %J are used as substitution strings to support file redirection for jobs submitted from a job array. At execution time, %I is expanded to provide the job array index value of the current job, and %J is expanded to provide the job ID of the job array.
Redirect standard input
- Use the -i option of bsub and the %I variable when your executable reads from standard input. To use %I, all the input files must be named consistently with a variable part that corresponds to the indices of the job array. For example:
input.1, input.2, input.3, ..., input.N
The following command submits a job array of 1000 jobs whose input files are named input.1, input.2, input.3, ..., input.1000 and located in the current working directory:
bsub -J "myArray[1-1000]" -i "input.%I" myJob
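Because %I only substitutes the index, a missing input.N file is discovered only when that element runs. The minimal pre-submission check below lists the expected file names that are absent from a directory listing. It is a Python sketch: the function name missing_inputs and the choice to pass in a listing are illustrative assumptions and not part of LSF.

```python
import os


def missing_inputs(prefix, indices, existing_names):
    """Return the <prefix>.<index> file names that are not in existing_names.

    Hypothetical helper: it only mirrors the naming scheme used with
    bsub -i "input.%I"; it does not talk to LSF.
    """
    present = set(existing_names)
    missing = []
    for index in indices:
        name = f"{prefix}.{index}"
        if name not in present:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Check the files that myArray[1-1000] would read from the current directory.
    absent = missing_inputs("input", range(1, 1001), os.listdir("."))
    print(f"{len(absent)} of 1000 input files missing")
```

Passing the listing in, rather than probing the file system inside the function, keeps the check easy to run against a remote directory listing as well.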
Redirect standard output and error
- Use the -o option of bsub and the %I and %J variables when your executable writes to standard output and error.
- To create an output file that corresponds to each job submitted from a job array, specify %I as part of the output file name.
For example, the following command submits a job array of 1000 jobs whose output files are put in CWD and named output.1, output.2, output.3, ..., output.1000:
bsub -J "myArray[1-1000]" -o "output.%I" myJob
- To create output files that include the job array job ID as part of the file name, specify %J.
For example, the following command submits a job array of 1000 jobs whose output files are put in CWD and named output.123.1, output.123.2, output.123.3, ..., output.123.1000. The job ID of the job array is 123.
bsub -J "myArray[1-1000]" -o "output.%J.%I" myJob
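You can predict which file each element will write by modeling the %J and %I substitution as plain string replacement. The Python sketch below is a simplified stand-in for what LSF does at execution time. The names expand_lsf_pattern and output_files are hypothetical, and the sketch ignores any other substitutions LSF may support.

```python
def expand_lsf_pattern(pattern, job_id, index):
    """Replace %J with the job array job ID and %I with the array index."""
    return pattern.replace("%J", str(job_id)).replace("%I", str(index))


def output_files(pattern, job_id, indices):
    """List the file names a job array would produce for the given indices."""
    return [expand_lsf_pattern(pattern, job_id, i) for i in indices]


if __name__ == "__main__":
    # Job array 123 submitted with -o "output.%J.%I"
    for name in output_files("output.%J.%I", 123, [1, 2, 1000]):
        print(name)
```

For job ID 123 this prints output.123.1, output.123.2, and output.123.1000, matching the names listed above.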
Passing Arguments on the Command Line
The environment variable LSB_JOBINDEX is used as a substitution string to support passing job array indices on the command line. When the job is dispatched, LSF sets LSB_JOBINDEX in the execution environment to the job array index of the current job. LSB_JOBINDEX is set for all jobs. For non-array jobs, LSB_JOBINDEX is set to zero (0).
To use LSB_JOBINDEX, all the input files must be named consistently and with a variable part that corresponds to the indices of the job array. For example:
input.1, input.2, input.3, ..., input.N
You must escape LSB_JOBINDEX with a backslash, \, to prevent the shell that interprets the bsub command from expanding the variable at submission time. For example, the following command submits a job array of 1000 jobs whose input files are named input.1, input.2, input.3, ..., input.1000 and located in the current working directory. The executable is passed an argument that specifies the name of the input file:
bsub -J "myArray[1-1000]" myJob -f input.\$LSB_JOBINDEX
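An executable can also read LSB_JOBINDEX directly instead of receiving the file name as an argument. The Python sketch below assumes the input.N naming scheme above and is meant to run as the job command, for example bsub -J "myArray[1-1000]" python3 process.py, where process.py is a hypothetical script name. The environ parameter exists only so the logic can be exercised outside LSF.

```python
import os


def array_index(environ=None):
    """Return this job's array index from LSB_JOBINDEX.

    LSF sets LSB_JOBINDEX to 0 for non-array jobs; outside LSF the
    variable is normally unset, which is also treated as 0 here.
    """
    env = os.environ if environ is None else environ
    return int(env.get("LSB_JOBINDEX", "0"))


def input_file_for_index(index, prefix="input"):
    """Map an array index to its input file name, e.g. 7 -> input.7."""
    return f"{prefix}.{index}"


if __name__ == "__main__":
    index = array_index()
    if index == 0:
        print("not running as an element of a job array")
    else:
        name = input_file_for_index(index)
        with open(name) as infile:
            print(f"element {index}: {name} has {len(infile.read())} characters")
```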
Job Array Dependencies
Like all jobs in LSF, a job array can be dependent on the completion or partial completion of a job or another job array. A number of job-array-specific dependency conditions are provided by LSF.
Set a whole array dependency
- To make a job array dependent on the completion of a job or another job array, use the -w "dependency_condition" option of bsub.
For example, to make an array dependent on the completion of a job or job array with job ID 123, use the following command:
bsub -w "done(123)" -J "myArray2[1-1000]" myJob
Set a partial array dependency
- To make a job or job array dependent on an existing job array, use one of the following dependency conditions:
numrun(job_ID, op num)
numpend(job_ID, op num)
numdone(job_ID, op num)
numexit(job_ID, op num)
numended(job_ID, op num)
numhold(job_ID, op num)
numstart(job_ID, op num)
- Use one of the following operators (op) combined with a positive integer (num) to build a condition:
== | > | < | >= | <= | !=
Optionally, an asterisk (*) can be used in place of num to mean all jobs submitted from the job array.
For example, to start myJob when 100 or more elements in a job array with job ID 123 have completed successfully:
bsub -w "numdone(123, >= 100)" myJob
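A condition such as numdone(123, >= 100) compares the number of array elements in the relevant state against the operator and count. The Python sketch below models only that comparison. The name condition_met is hypothetical, and treating * as the total number of jobs in the array is an interpretation of the description above, not LSF's implementation.

```python
import operator

# The comparison operators allowed in num* dependency conditions.
OPERATORS = {
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
}


def condition_met(count, array_size, op, num):
    """Evaluate "count op num", where num may be "*" for all jobs in the array.

    count: elements currently in the state the condition tests
           (for numdone, the number of DONE elements)
    array_size: total number of jobs submitted from the job array
    """
    target = array_size if num == "*" else num
    return OPERATORS[op](count, target)


if __name__ == "__main__":
    # numdone(123, >= 100) for a 1000-job array with 120 elements DONE
    print(condition_met(120, 1000, ">=", 100))  # True
```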
Monitoring Job Arrays
Use bjobs and bhist to monitor the current and past status of job arrays.
Display job array status
- To display summary information about the jobs submitted from a job array, use the -A option of bjobs.
For example, for a job array of 10 jobs with job ID 123:
bjobs -A 123
JOBID  ARRAY_SPEC    OWNER  NJOBS  PEND  DONE  RUN  EXIT  SSUSP  USUSP  PSUSP
123    myArra[1-10]  user1  10     3     3     4    0     0      0      0
Display job array dependencies
- To display job dependency information for an array, use the bjdepinfo command.
For example, to view the jobs that depend on the third element of a job array with job ID 456:
bjdepinfo -c "456[3]"
JOBID   CHILD  CHILD_STATUS  CHILD_NAME  LEVEL
456[3]  300    PEND          job300      1
Individual job status
Display current job status
- To display the status of the individual jobs submitted from a job array, specify the job array job ID with bjobs. For jobs submitted from a job array, JOBID displays the job array job ID, and JOB_NAME displays the job array name and the index value of each job.
For example, to view a job array with job ID 123:
bjobs 123
JOBID  USER   STAT  QUEUE    FROM_HOST  EXEC_HOST  JOB_NAME     SUBMIT_TIME
123    user1  DONE  default  hostA      hostC      myArray[1]   Feb 29 12:34
123    user1  DONE  default  hostA      hostQ      myArray[2]   Feb 29 12:34
123    user1  DONE  default  hostA      hostB      myArray[3]   Feb 29 12:34
123    user1  RUN   default  hostA      hostC      myArray[4]   Feb 29 12:34
123    user1  RUN   default  hostA      hostL      myArray[5]   Feb 29 12:34
123    user1  RUN   default  hostA      hostB      myArray[6]   Feb 29 12:34
123    user1  RUN   default  hostA      hostQ      myArray[7]   Feb 29 12:34
123    user1  PEND  default  hostA                 myArray[8]   Feb 29 12:34
123    user1  PEND  default  hostA                 myArray[9]   Feb 29 12:34
123    user1  PEND  default  hostA                 myArray[10]  Feb 29 12:34
Display past job status
- To display the past status of the individual jobs submitted from a job array, specify the job array job ID with bhist.
For example, to view the history of a job array with job ID 456:
bhist 456
Summary of time in seconds spent in various states:
JOBID    USER   JOB_NAME  PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
456[1]   user1  *rray[1]  14    0      65   0      0      0      79
456[2]   user1  *rray[2]  74    0      25   0      0      0      99
456[3]   user1  *rray[3]  121   0      26   0      0      0      147
456[4]   user1  *rray[4]  167   0      30   0      0      0      197
456[5]   user1  *rray[5]  214   0      29   0      0      0      243
456[6]   user1  *rray[6]  250   0      35   0      0      0      285
456[7]   user1  *rray[7]  295   0      33   0      0      0      328
456[8]   user1  *rray[8]  339   0      29   0      0      0      368
456[9]   user1  *rray[9]  356   0      26   0      0      0      382
456[10]  user1  *ray[10]  375   0      24   0      0      0      399
Specific job status
Display the current status of a specific job
- To display the current status of a specific job submitted from a job array, specify, in quotes, the job array job ID and an index value with bjobs.
For example, to view the status of the 5th job in a job array with job ID 123:
bjobs "123[5]"
JOBID  USER   STAT  QUEUE    FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
123    user1  RUN   default  hostA      hostL      myArray[5]  Feb 29 12:34
Display the past status of a specific job
- To display the past status of a specific job submitted from a job array, specify, in quotes, the job array job ID and an index value with bhist.
For example, to view the history of the 5th job in a job array with job ID 456:
bhist "456[5]"
Summary of time in seconds spent in various states:
JOBID   USER   JOB_NAME  PEND  PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
456[5]  user1  *rray[5]  214   0      29   0      0      0      243
Controlling Job Arrays
You can control the whole array (that is, all the jobs submitted from the job array) with a single command. LSF also provides the ability to control individual jobs and groups of jobs submitted from a job array. When issuing commands against a job array, use the job array job ID instead of the job array name. Job names are not unique in LSF, and issuing a command using a job array name may result in unpredictable behavior.
Most LSF commands allow operation on the whole job array, on individual jobs, and on groups of jobs. These commands include bkill, bstop, bresume, and bmod.
Some commands only allow operation on individual jobs submitted from a job array. These commands include btop, bbot, and bswitch.
- Control a whole array
- Control individual jobs
- Control groups of jobs
Control a whole array
- To control the whole job array, specify the command as you would for a single job using only the job ID.
For example, to kill a job array with job ID 123:
bkill 123
Control individual jobs
- To control an individual job submitted from a job array, specify the command using the job ID of the job array and the index value of the corresponding job. The job ID and index value must be enclosed in quotes.
For example, to kill the 5th job in a job array with job ID 123:
bkill "123[5]"
Control groups of jobs
- To control a group of jobs submitted from a job array, specify the command as you would for an individual job and use indexList syntax to indicate the jobs.
For example, to kill jobs 1-5, 239, and 487 in a job array with job ID 123:
bkill "123[1-5, 239, 487]"
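The same indexList syntax (start[-end[:step]] entries separated by commas) is used both at submission and when selecting groups of jobs. The Python sketch below expands such a list. It also does the reverse, compressing a set of indices into a list suitable for a command such as bkill "123[...]". Both functions are hypothetical helpers written from the rules in Create a Job Array, not LSF's own parser. For example, they do not check MAX_JOB_ARRAY_SIZE or the 255-character limit.

```python
def expand_index_list(spec):
    """Expand an indexList such as "1-5, 239, 487" or "1-10:2"."""
    indices = set()
    for entry in spec.split(","):
        entry = entry.strip()
        if "-" in entry:
            start_text, rest = entry.split("-", 1)
            end_text, _, step_text = rest.partition(":")
            start, end = int(start_text), int(end_text)
            step = int(step_text) if step_text else 1  # default step is 1
        else:
            start = end = int(entry)  # an individual index
            step = 1
        if start < 1 or end < start or step < 1:
            raise ValueError(f"invalid indexList entry: {entry!r}")
        # Indices begin at start, increase by step, and never pass end.
        indices.update(range(start, end + 1, step))
    return sorted(indices)


def compress_indices(indices):
    """Collapse indices into step-1 ranges, e.g. [1, 2, 3, 7] -> "1-3, 7"."""
    ordered = sorted(set(indices))
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next index is consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return ", ".join(parts)


if __name__ == "__main__":
    print(expand_index_list("1-10:2"))  # [1, 3, 5, 7, 9]
    print(compress_indices(expand_index_list("1-5, 239, 487")))  # 1-5, 239, 487
```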
Job Array Chunking
Job arrays in most queues can be chunked across an array boundary (the jobs in a chunk do not all have to belong to the same array). However, if the queue is preemptable or preemptive, jobs are chunked only when they belong to the same array.
For example, consider the jobs job1[1], job1[2], job2[1], and job2[2] in a preemptive queue with CHUNK_JOB_SIZE=3. Then:
- job1[1] and job1[2] are chunked.
- job2[1] and job2[2] are chunked.
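The chunking rule above can be pictured as a greedy grouping pass. The Python sketch below is a deliberately simplified model built on three assumptions. Jobs arrive as (array_name, index) pairs in dispatch order. A chunk closes when it reaches CHUNK_JOB_SIZE. In a preemptive or preemptable queue, a chunk also closes when the array changes. The sketch ignores any other conditions LSF applies when forming chunks, and chunk_jobs is a hypothetical name.

```python
def chunk_jobs(jobs, chunk_size, preemptive):
    """Group (array_name, index) pairs into chunks of at most chunk_size.

    When preemptive is True, a chunk never mixes jobs from different arrays.
    """
    chunks = []
    current = []
    for job in jobs:
        array_name = job[0]
        full = len(current) == chunk_size
        crosses_array = preemptive and bool(current) and current[-1][0] != array_name
        if current and (full or crosses_array):
            chunks.append(current)
            current = []
        current.append(job)
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    jobs = [("job1", 1), ("job1", 2), ("job2", 1), ("job2", 2)]
    print(chunk_jobs(jobs, 3, preemptive=True))
    print(chunk_jobs(jobs, 3, preemptive=False))
```

With a chunk size of 3, the preemptive case yields the two chunks listed above. In this model, the non-preemptive case groups job1[1], job1[2], and job2[1] together and leaves job2[2] on its own.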
Requeuing a Job Array
Use brequeue to requeue a job array. When a job is requeued, it is assigned the PEND status, and its new position in the queue is after other jobs of the same priority. You can requeue:
- Jobs in DONE job state
- Jobs in EXIT job state
- All jobs in a job array, regardless of job state
- EXIT, RUN, and DONE jobs to PSUSP state
- Jobs in RUN job state
brequeue is not supported across clusters.
Requeue jobs in DONE state
- To requeue DONE jobs use the -d option of brequeue.
For example, the command
brequeue -J "myarray[1-10]" -d 123
requeues jobs with job ID 123 and DONE status.
Requeue jobs in EXIT state
- To requeue EXIT jobs use the -e option of brequeue.
For example, the command
brequeue -J "myarray[1-10]" -e 123
requeues jobs with job ID 123 and EXIT status.
Requeue all jobs in an array regardless of job state
- A submitted job array can contain jobs in different job states. To requeue all the jobs in an array regardless of their state, use the -a option of brequeue.
For example, the command
brequeue -J "myarray[1-10]" -a 123
requeues all jobs in a job array with job ID 123 regardless of their job state.
Requeue RUN jobs to PSUSP state
- To requeue RUN jobs to PSUSP state, use the -H option of brequeue.
For example, the command
brequeue -J "myarray[1-10]" -H 123
requeues RUN jobs with job ID 123 to PSUSP status.
Requeue jobs in RUN state
- To requeue RUN jobs use the -r option of brequeue.
For example, the command
brequeue -J "myarray[1-10]" -r 123
requeues jobs with job ID 123 and RUN status.
Job Array Job Slot Limit
The job array job slot limit is used to specify the maximum number of jobs submitted from a job array that are allowed to run at any one time. A job array allows a large number of jobs to be submitted with one command, potentially flooding a system, and job slot limits provide a way to limit the impact a job array may have on a system. Job array job slot limits are specified using the following syntax:
bsub -J "job_array_name[index_list]%job_slot_limit" myJob
where:
%job_slot_limit
Specifies the maximum number of jobs allowed to run at any one time. The percent sign (%) must be entered exactly as shown. Valid values are positive integers less than the maximum index value of the job array.
Setting a job array job slot limit
Set a job array slot limit at submission
- Use the bsub command to set a job slot limit at the time of submission.
For example, to set a job array job slot limit of 100 jobs for a job array of 1000 jobs:
bsub -J "myArray[1-1000]%100" myJob
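To see what a %job_slot_limit means for throughput, the Python sketch below models the limit as a dispatcher that never lets more than job_slot_limit elements run at once. It assumes every element takes the same time and that no other limits, queue policies, or host availability interfere. The batches it returns are therefore an idealization. The names dispatchable and batches are hypothetical.

```python
def dispatchable(pending, running, slot_limit):
    """Return the pending indices that may start without exceeding slot_limit."""
    free_slots = max(slot_limit - running, 0)
    return pending[:free_slots]


def batches(indices, slot_limit):
    """Split indices into the waves that would start together, assuming each
    element runs for the same time and nothing else limits dispatch."""
    if slot_limit < 1:
        raise ValueError("slot_limit must be a positive integer")
    pending = list(indices)
    waves = []
    while pending:
        started = dispatchable(pending, running=0, slot_limit=slot_limit)
        waves.append(started)
        pending = pending[len(started):]
    return waves


if __name__ == "__main__":
    waves = batches(range(1, 1001), 100)  # myArray[1-1000]%100
    print(len(waves), waves[0][:3], waves[-1][-1])  # 10 [1, 2, 3] 1000
```

Under these assumptions, raising the limit to 250 with bmod would cut the 1000-element array from 10 waves to 4.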
Set a job array slot limit after submission
- Use the bmod command to set a job slot limit after submission.
For example, to set a job array job slot limit of 100 jobs for an array with job ID 123:
bmod -J "%100" 123
Change a job array job slot limit
Changing a job array job slot limit is the same as setting it after submission.
- Use the bmod command to change a job slot limit after submission.
For example, to change a job array job slot limit to 250 for a job array with job ID 123:
bmod -J "%250" 123
View a job array job slot limit
- To view job array job slot limits, use the -A and -l options of bjobs. The job array job slot limit is displayed in the Job Name field in the same format in which it was set.
For example, the following output displays the job array job slot limit of 100 for a job array with job ID 123:
bjobs -A -l 123
Job <123>, Job Name <myArray[1-1000]%100>, User <user1>, Project <default>, Status <PEND>, Queue <normal>, Job Priority <20>, Command <myJob>
Wed Feb 29 12:34:56: Submitted from host <hostA>, CWD <$HOME>;
COUNTERS:
NJOBS  PEND  DONE  RUN  EXIT  SSUSP  USUSP  PSUSP
10     9     0     1    0     0      0      0
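Scripts that wait on a job array often need the COUNTERS values shown above. The Python sketch below pairs the counter names with their values and computes how many elements have neither completed (DONE) nor failed (EXIT). It assumes the header and value lines have already been captured from bjobs -A -l output, and both function names are hypothetical.

```python
def parse_counters(header_line, value_line):
    """Pair counter names (NJOBS, PEND, ...) with their integer values."""
    names = header_line.split()
    values = [int(v) for v in value_line.split()]
    if len(names) != len(values):
        raise ValueError("header and value lines have different lengths")
    return dict(zip(names, values))


def unfinished(counters):
    """Elements that are neither DONE nor EXIT."""
    return counters["NJOBS"] - counters["DONE"] - counters["EXIT"]


if __name__ == "__main__":
    counters = parse_counters(
        "NJOBS PEND DONE RUN EXIT SSUSP USUSP PSUSP",
        "10 9 0 1 0 0 0 0",
    )
    print(counters["PEND"], unfinished(counters))  # 9 10
```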
Platform Computing Inc.
www.platform.com