Enabling resizable jobs allows LSF to run a job that requests a minimum and maximum number of slots and to adjust the job's allocation dynamically to the number of slots available at any given time.
By default, if a job specifies minimum and maximum slot requests (bsub -n min,max), LSF makes a one-time allocation and schedules the job. You can configure resizable jobs, where LSF dispatches a job as long as its minimum slot request is satisfied. After the job successfully starts, LSF continues to schedule and allocate additional resources to satisfy the maximum slot request for the job. For example, a job requests -n 4,32 slots. The job starts to run and gets 20 slots at time t0. After that, LSF continues to allocate more resources to the job: for instance, 4 slots at time t1, then another 8 slots at time t2, which finally satisfies the 32-slot requirement (20 + 4 + 8 = 32).
A resizable job is a job whose slot allocation can grow and shrink during its run time. The allocation change request may be triggered automatically or by the bresize command. For example, after the job starts, you can explicitly cancel resize allocation requests or have the job release idle resources back to LSF.
An autoresizable job is a resizable job with a minimum and maximum slot request. LSF automatically schedules and allocates additional resources to satisfy the job's maximum request as the job runs.
For autoresizable jobs, LSF automatically calculates the pending allocation requests. The maximum pending allocation request is the maximum number of requested slots minus the number of allocated slots. The minimum pending allocation request is always 1: because the job is running and its original minimum request is already satisfied, LSF can allocate any number of additional slots to the running job. For instance, if a job requests -n 4,32 and LSF initially allocates 20 slots, its active pending allocation request is 1 to 12, where 1 is the minimum slot request and 12 is the maximum slot request. After LSF assigns another 4 slots, the pending allocation request is 1 to 8.
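The arithmetic above can be sketched in a few lines of shell. This is only an illustration of the calculation, not something LSF exposes: pending_range is a hypothetical helper, and printing "none" once the maximum is reached is an assumption made for the sketch.

```shell
#!/bin/sh
# Sketch of the pending-allocation arithmetic for an autoresizable job.
# pending_range is a hypothetical helper, not an LSF command.
# Usage: pending_range MAX_REQUESTED ALLOCATED
# Prints "MIN MAX" for the active pending request, or "none" (an assumed
# convention for this sketch) once the maximum slot request is satisfied.
pending_range() {
    max_requested=$1
    allocated=$2
    remaining=$((max_requested - allocated))
    if [ "$remaining" -le 0 ]; then
        echo "none"
    else
        # The minimum is always 1 because the job is already running.
        echo "1 $remaining"
    fi
}

# Walk through the -n 4,32 example: 20 slots at start, then +4, then +8.
allocated=0
for granted in 20 4 8; do
    allocated=$((allocated + granted))
    echo "allocated=$allocated pending=$(pending_range 32 "$allocated")"
done
```

With the grants from the example, the loop reports a pending request of 1 to 12 after the initial 20 slots and 1 to 8 after the next 4, matching the figures in the text.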
A pending allocation request is an additional resource request attached to a resizable job. Only running jobs can have pending allocation requests. At any given time, a job has only one allocation request.
LSF creates a new pending allocation request and schedules it after the job physically starts on the remote host (after LSF receives the JOB_EXECUTE event from sbatchd) or after a notification successfully completes.
A notification command is an executable that is invoked on the first execution host of a job in response to an allocation (grow or shrink) event. It can be used to inform the running application of the allocation change. Because applications are implemented differently, each resizable application may have its own notification command provided by the application developer.
The notification command runs under the same user ID, environment, home directory, and working directory as the actual job. The standard input, output, and error of the program are redirected to the NULL device. If the notification command is not in the user's normal execution path (the $PATH variable), the full path name of the command must be specified.
A notification command exits with one of the following values:
LSB_RESIZE_NOTIFY_OK=0
LSB_RESIZE_NOTIFY_FAIL=1
LSF sets these environment variables in the notification command environment. LSB_RESIZE_NOTIFY_OK indicates that the notification succeeded. For both allocation "grow" and "shrink" events, LSF updates the job allocation to reflect the new allocation.
LSB_RESIZE_NOTIFY_FAIL indicates that the notification failed. For an allocation "grow" event, LSF reschedules the pending allocation request. For an allocation "shrink" event, LSF fails the allocation release request.
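A notification command along these lines could be sketched as follows. Only the two exit-value variables come from the text above. Everything else is an assumption made for illustration: the notify_application and resize_notify functions, the APP_CONTROL_FILE variable, and the idea of passing the event type ("grow" or "shrink") as an argument. How a real application is told about the change depends entirely on that application.

```shell
#!/bin/sh
# Sketch of a resize notification command.
# LSF sets LSB_RESIZE_NOTIFY_OK and LSB_RESIZE_NOTIFY_FAIL in the command's
# environment; the defaults below only let the sketch run outside LSF.
: "${LSB_RESIZE_NOTIFY_OK:=0}"
: "${LSB_RESIZE_NOTIFY_FAIL:=1}"

# notify_application is a hypothetical placeholder for whatever mechanism the
# application provides (signal, socket, control file). In this sketch it
# appends the event to the file named by APP_CONTROL_FILE (an assumed
# variable) and fails if that file is unset or cannot be written.
notify_application() {
    event=$1
    control_file=${APP_CONTROL_FILE:-}
    [ -n "$control_file" ] || return 1
    # Standard output and error go to the null device under LSF, so
    # nothing is logged here; errors are silenced explicitly.
    { echo "resize $event" >> "$control_file"; } 2>/dev/null
}

# resize_notify maps the outcome onto the two exit values LSF expects.
resize_notify() {
    if notify_application "$1"; then
        return "$LSB_RESIZE_NOTIFY_OK"
    fi
    return "$LSB_RESIZE_NOTIFY_FAIL"
}
```

A real command would finish by calling resize_notify with the event type and exiting with its status, so that LSF either updates the allocation or, on failure, reschedules the grow request or fails the release request as described above.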
The resizable jobs feature is enabled by defining an application profile using the RESIZABLE_JOBS parameter in lsb.applications.
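As a rough illustration, an application profile in lsb.applications might look like the fragment below. The profile name resizable_app and the description are placeholders. The value Y is shown as an assumption and should be checked against the lsb.applications reference for your LSF release, which lists the accepted values for RESIZABLE_JOBS.

```
Begin Application
NAME           = resizable_app
DESCRIPTION    = Profile for jobs that may grow or shrink at run time
RESIZABLE_JOBS = Y
End Application
```

A job would then be submitted against the profile with something like bsub -app resizable_app -n 4,32 ./my_app, where ./my_app stands in for the actual application.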