Parallel Application Manager – job starter for MPI applications
The Parallel Application Manager (PAM) is the point of control for Platform LSF. PAM is fully integrated with Platform LSF to interface the user application with LSF. PAM acts as the supervisor of a parallel LSF job.
MPI jobs started by pam can only be submitted through the LSF Batch system. PAM cannot be used interactively to start parallel jobs. sbatchd starts PAM on the first execution host.
PAM uses a vendor MPI library or an MPI Parallel Job Launcher (PJL), such as mpirun or poe, to start a parallel job on a specified set of hosts in an LSF cluster.
PAM contacts RES on each execution host allocated to the parallel job.
PAM queries RES periodically to collect resource usage for each parallel task, passes control signals through RES to all process groups and individual running tasks, and cleans up tasks as needed. PAM passes job-level resource usage and process IDs (PIDs and PGIDs) to sbatchd for enforcement, and collects resource usage information and exit status upon termination.
The pam command starts a vendor MPI job on a specified set of hosts in an LSF cluster. Using pam to start an MPI job requires the underlying MPI system to be LSF-aware, using a vendor MPI implementation that supports LSF (SGI IRIX vendor MPI or HP-UX vendor MPI).
PAM uses the vendor MPI library to spawn the child processes needed for the parallel tasks that make up your MPI application. It starts these tasks on the systems allocated by LSF. The allocation includes the number of execution hosts needed, and the number of child processes needed on each host.
The -auto_place option on the pam command line tells the SGI IRIX mpirun library to launch the MPI application according to the resources allocated by LSF.
In the SGI environment, the -mpi option on the bsub and pam command line is equivalent to the mpirun command.
On HP-UX, you can have LSF manage the allocation of hosts to achieve better resource utilization by coordinating the start-up phase with mpirun. To do this, precede the regular HP MPI mpirun command with bsub pam -mpi.
For HP-UX vendor MPI jobs, the -mpi option must be the first option of the pam command.
For example, you can run a single-host job and have LSF select the host by submitting the regular mpirun command through bsub and pam.
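As an illustrative sketch (the task count and the application name a.out are placeholders), an HP MPI job that would normally be started with mpirun -np 4 a.out might be submitted as:

bsub pam -mpi mpirun -np 4 a.out

Because -mpi is the first pam option, pam coordinates the start-up phase with mpirun and LSF selects the execution host.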
The number of processors required to run the parallel application, typically the same as the number of parallel tasks in the job. If the host is a multiprocessor, one host can start several tasks.
You can use both bsub -n and pam -n in the same job submission. The number specified with pam -n should be less than or equal to the number specified with bsub -n; if pam -n specifies more tasks than bsub -n, the pam -n value is ignored.
For example, on SGI IRIX or SGI Altix, you can specify:
bsub -n 5 pam -n 2 -mpi -auto_place a.out
Here, the job requests 5 processors, but PAM only starts 2 parallel tasks.
The name of the MPI application to be run on the listed hosts. This must be the last argument on the command line.
-t
This option tells pam not to print the MPI job task summary report to standard output. By default, the summary report shows the task ID, the host on which it was executed, the command that was executed, the exit status, and the termination time.
-v
Verbose mode. Displays the name of the execution host or hosts.
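As an illustrative sketch combining these options (the task counts and the application name a.out are placeholders), an SGI vendor MPI job might be submitted as:

bsub -n 8 pam -v -n 8 -mpi -auto_place a.out

Here LSF allocates 8 processors, pam starts 8 parallel tasks on the allocated hosts, displays the execution host names, and prints the task summary report when the job terminates.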