- Using LSF HPC with ANSYS
- Using LSF HPC with NCBI BLAST
- Using LSF HPC with FLUENT
- Using LSF HPC with Gaussian
- Using LSF HPC with Lion Bioscience SRS
- Using LSF HPC with LSTC LS-Dyna
- Using LSF HPC with MSC Nastran
Using LSF HPC with ANSYS
LSF HPC supports various ANSYS solvers through a common integration console built into the ANSYS GUI. The only change the average ANSYS user sees is the addition of a Run using LSF? button on the standard ANSYS console.
Using ANSYS with LSF HPC simplifies distribution of jobs, and improves throughput by removing the need for engineers to worry about when or where their jobs run. They simply request job execution and know that their job will be completed as fast as their environment will allow.
Configuring LSF HPC for ANSYS
During installation, lsfinstall adds the Boolean resource ansys to the Resource section of lsf.shared.
If only some of your hosts can accept ANSYS jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.
Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the ansys resource to the hosts that can run ANSYS jobs:
Begin Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
...
hostA      !       !      1        3.5   ()    ()    ()
hostB      !       !      1        3.5   ()    ()    (ansys)
hostC      !       !      1        3.5   ()    ()    ()
...
End Host
Submitting jobs through ANSYS
To start a job, choose the Batch menu item. The batch job submission dialog is displayed, containing the following fields:
- The name given to the job for easier recognition at runtime.
- Specifies the file of ANSYS commands you are submitting for batch execution. You can either type in the desired file name or click on the ... button to display a file selection dialog box.
- Specifies the file to which ANSYS directs text output by the program. If the file name already exists in the working directory, it will be overwritten when the batch job is started.
- The memory requirements for the job.
- Launches ANSYS LSF, a separately licensed product.
- Runs the ANSYS job in background or in foreground mode.
- Includes or excludes the input file listing at the beginning of the output file.
- Additional ANSYS parameters.
- Specifies a start time and date to start the job. This option is active after Run in background? has been changed to Yes. To use this option, you must have permission to run the at command on UNIX systems.
You can also configure additional options to specify LSF job requirements such as queue, host, or desired host architecture:
- Allows users to specify a specific host to run the job on.
- Allows users to specify which queue they desire instead of the default.
- Allows users to specify a specific architecture for their job.
Submitting jobs through the ANSYS command-line
Submitting a command line job requires extra parameters to run correctly through LSF.
bsub -R ansys [bsub_options] ansys_command -b -p productvar < input_name >& output_name
-R ansys
Run the job on hosts with the Boolean resource ansys configured.
bsub_options
Regular options to bsub that specify the job parameters.
ansys_command
The ANSYS executable to be executed on the host (for example, ansys57).
-b
Run the job in ANSYS batch mode.
-p productvar
ANSYS product to use with the job.
< input_name
ANSYS input file. (You can also use the bsub -i option.)
>& output_name
ANSYS output file. (You can also use the bsub -o option.)
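For example, the following sketch submits an ANSYS 5.7 batch job under LSF. The ansys57 executable comes from the option descriptions above; the ane3fl product variable and the cantilever.inp and cantilever.out file names are placeholders for your own product and files:
% bsub -R ansys ansys57 -b -p ane3fl < cantilever.inp >& cantilever.out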
Using LSF HPC with NCBI BLAST
LSF HPC accepts jobs running NCBI BLAST (Basic Local Alignment Search Tool).
Configuring LSF HPC for BLAST jobs
During installation, lsfinstall adds the Boolean resource blast to the Resource section of lsf.shared.
If only some of your hosts can accept BLAST jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.
Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the blast resource to the hosts that can run BLAST jobs:
Begin Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
...
hostA      !       !      1        3.5   ()    ()    ()
hostB      !       !      1        3.5   ()    ()    (blast)
hostC      !       !      1        3.5   ()    ()    ()
...
End Host
Submitting BLAST jobs
Use BLAST parallel provided with LSF HPC to submit BLAST jobs.
BLAST parallel is a Perl program that distributes BLAST searches across a cluster by splitting both the query file and the reference database and merging the result files after all BLAST jobs finish.
See the README in the LSF_MISC/examples/blastparallel/ directory for information about installing, configuring, and using BLAST parallel.
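Independently of BLAST parallel, a single, unsplit BLAST search can also be submitted directly with bsub so that it runs on a host with the blast resource. A minimal sketch, assuming the legacy blastall executable, an nt database, and a query file query.fa are available on the execution hosts (all three names are placeholders):
% bsub -R blast -o query.out blastall -p blastn -d nt -i query.fa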
Using LSF HPC with FLUENT
LSF HPC is integrated with products from Fluent Inc., allowing FLUENT jobs to take advantage of the checkpointing and migration features provided by LSF. This increases the efficiency of the software and means data is processed faster.
FLUENT 5 offers versions based on system vendors' parallel environments (usually MPI, used by the VMPI version of FLUENT 5). Fluent also provides a parallel version of FLUENT 5 based on its own socket-based message passing library (the NET version).
This chapter assumes you are already familiar with using FLUENT software and checkpointing jobs in LSF.
See Administering Platform LSF for more information about checkpointing in LSF.
- A hardware vendor-supplied MPI environment for network computing is required to use the "vmpi" version of FLUENT 5.
Configuring LSF HPC for FLUENT jobs
During installation, lsfinstall adds the Boolean resource fluent to the Resource section of lsf.shared.
LSF HPC also installs the echkpnt.fluent and erestart.fluent files in LSF_SERVERDIR.
If only some of your hosts can accept FLUENT jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.
Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the fluent resource to the hosts that can run FLUENT jobs:
Begin Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
...
hostA      !       !      1        3.5   ()    ()    ()
hostB      !       !      1        3.5   ()    ()    (fluent)
hostC      !       !      1        3.5   ()    ()    ()
...
End Host
Checkpointing in FLUENT
FLUENT 5 is integrated with LSF HPC to use the LSF checkpointing capability. At the end of each iteration, FLUENT looks for the existence of a checkpoint file (check) or a checkpoint exit file (exit). If it detects the checkpoint file, it writes a case and data file, removes the checkpoint file, and continues iterating. If it detects a checkpoint exit file, it writes a case and data file, then exits.
Use the bchkpnt command to create the checkpoint and checkpoint exit files, which forces FLUENT to checkpoint, or checkpoint and exit itself. FLUENT also creates a journal file with instructions to read the checkpointed case and data files, and continue iterating. FLUENT uses this file when it is restarted with the brestart command.
LSF HPC installs echkpnt.fluent and erestart.fluent, which are special versions of echkpnt and erestart that allow checkpointing with FLUENT. Use bsub -a fluent to make sure your job uses these files.
Checkpoint directories
When you submit a checkpointing job, you specify a checkpoint directory.
Before the job starts running, LSF sets the environment variable LSB_CHKPNT_DIR. The value of LSB_CHKPNT_DIR is a subdirectory of the checkpoint directory specified in the command line. This subdirectory is identified by the job ID and only contains files related to the submitted job.
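For example, a submission like the following (taken from the examples later in this section; the job ID 1234 is illustrative):
% bsub -a fluent -k "/home/username 60" fluent 3d -g -i journal_file -lsf
that is assigned job ID 1234 gets LSB_CHKPNT_DIR set to /home/username/1234, and all checkpoint files for that job are written there.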
Checkpoint trigger files
When you checkpoint a FLUENT job, LSF creates a checkpoint trigger file (check) in the job subdirectory, which causes FLUENT to checkpoint and continue running. A special option is used to create a different trigger file (exit) to cause FLUENT to checkpoint and exit the job.
FLUENT uses the LSB_CHKPNT_DIR environment variable to determine the location of checkpoint trigger files. It checks the job subdirectory periodically while running the job. FLUENT does not perform any checkpointing unless it finds the LSF trigger file in the job subdirectory. FLUENT removes the trigger file after checkpointing the job.
Restarting jobs
If a job is restarted, LSF attempts to restart the job with the -restart option appended to the original FLUENT command. FLUENT uses the checkpointed data and case files to restart the process from that checkpoint, rather than repeating the entire process.
Each time a job is restarted, it is assigned a new job ID, and a new job subdirectory is created in the checkpoint directory. Files in the checkpoint directory are never deleted by LSF, but you may choose to remove old files once the FLUENT job is finished and the job history is no longer required.
Submitting FLUENT jobs
Use bsub to submit the job, including parameters required for checkpointing.
The syntax for the bsub command to submit a FLUENT job is:
bsub [-R fluent] -a fluent [-k checkpoint_dir | -k "checkpoint_dir [checkpoint_period]"] [bsub_options] FLUENT_command [FLUENT_options] -lsf
-R fluent
Optional. Specify the fluent shared resource if the FLUENT application is only installed on certain hosts in the cluster.
-a fluent
Use the esub for FLUENT jobs, which automatically sets the checkpoint method to fluent to use the checkpoint and restart programs for FLUENT jobs, echkpnt.fluent and erestart.fluent.
The checkpointing feature for FLUENT jobs requires all of the following parameters:
-k checkpoint_dir
Regular option to bsub that specifies the name of the checkpoint directory.
checkpoint_period
Regular option to bsub that specifies the time interval in minutes at which LSF automatically checkpoints jobs.
FLUENT_command
Regular command used with FLUENT software.
-lsf
Special option to the FLUENT command. Specifies that FLUENT is running under LSF, and causes FLUENT to check for trigger files in the checkpoint directory if the environment variable LSB_CHKPNT_DIR is set.
- Sequential FLUENT batch job
% bsub -a fluent fluent 3d -g -i journal_file -lsf
- Parallel FLUENT net version batch job on 4 CPUs
% bsub -a fluent -n 4 fluent 3d -t0 -pnet -g -i journal_file -lsf
When using the net version of FLUENT 5, pam is not used to launch FLUENT, so the JOB_STARTER argument of the queue should not be set. Instead, LSF sets an environment variable to contain a list of hosts and FLUENT uses this list to launch itself.
Checkpointing, restarting, and migrating FLUENT jobs
bchkpnt [bchkpnt_options] [-k] [job_ID]
-k
Specifies checkpoint and exit. The job will be killed immediately after being checkpointed. When the job is restarted, it continues from the last checkpoint.
- job_ID
Job ID of the FLUENT job. Specifies which job to checkpoint. Each time the job is migrated, the job is restarted and assigned a new job ID.
brestart [brestart_options] checkpoint_directory [job_ID]
- checkpoint_directory
Specifies the checkpoint directory, where the job subdirectory is located.
- job_ID
Job ID of the FLUENT job; specifies which job to restart. The restarted job is assigned a new job ID, and the new job ID is used for checkpointing. The job ID changes each time the job is restarted.
bmig [bsub_options] [job_ID]
- job_ID
Job ID of the FLUENT job; specifies which job to migrate. The migrated job is restarted and assigned a new job ID, and the new job ID is used for checkpointing. The job ID changes each time the job is migrated.
Examples
- Sequential FLUENT batch job with checkpoint and restart
% bsub -a fluent -k "/home/username 60" fluent 3d -g -i journal_file -lsf
Submits a job that uses the checkpoint/restart method echkpnt.fluent and erestart.fluent, /home/username as the checkpoint directory, and a 60-minute period between automatic checkpoints. FLUENT checks whether there is a checkpoint trigger file /home/username/exit or /home/username/check.
% bchkpnt job_ID
echkpnt creates the checkpoint trigger file /home/username/check and waits until the file is removed and the checkpoint is successful. FLUENT writes a case and data file, and a restart journal file at the end of its current iteration. The files are saved in /home/username/job_ID and FLUENT continues to iterate.
Use bjobs to verify that the job is still running after the checkpoint.
% bchkpnt -k job_ID
echkpnt creates the checkpoint trigger file /home/username/exit and waits until the file is removed and the checkpoint is successful. FLUENT writes a case and data file, and a restart journal file at the end of its current iteration. The files are saved in /home/username/job_ID and FLUENT exits.
Use bjobs to verify that the job is not running after the checkpoint.
% brestart /home/username/job_ID
Starts a FLUENT job using the latest case and data files in /home/username/job_ID. The restart journal file /home/username/job_ID/#restart.inp instructs FLUENT to read the latest case and data files and continue iterating.
- Parallel FLUENT VMPI version batch job with checkpoint and restart on 4 CPUs
% bsub -a fluent -k "/home/username 60" -n 4 fluent 3d -t4 -pvmpi -g -i journal_file -lsf
% bchkpnt -k job_ID
Forces FLUENT to write a case and data file, and a restart journal file at the end of its current iteration. The files are saved in /home/username/job_ID and FLUENT exits.
% brestart /home/username/job_ID
Starts a FLUENT job using the latest case and data files in /home/username/job_ID. The restart journal file /home/username/job_ID/#restart.inp instructs FLUENT to read the latest case and data files and continue iterating. The parallel job is restarted using the same number of processors (4) requested in the original bsub submission.
% bmig -m hostA 0
All jobs on hostA are checkpointed and moved to another host.
Using LSF HPC with Gaussian
LSF HPC accepts jobs running the Gaussian electronic structure modeling program.
Configuring LSF HPC for Gaussian jobs
During installation, lsfinstall adds the Boolean resource gaussian to the Resource section of lsf.shared.
If only some of your hosts can accept Gaussian jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.
Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the gaussian resource to the hosts that can run Gaussian jobs:
Begin Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
...
hostA      !       !      1        3.5   ()    ()    ()
hostB      !       !      1        3.5   ()    ()    (gaussian)
hostC      !       !      1        3.5   ()    ()    ()
...
End Host
Submitting Gaussian jobs
Use bsub to submit the job, including parameters required for Gaussian.
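For example, a minimal sketch of a Gaussian submission. The g98 command name and the h2o.com and h2o.log file names are placeholders; substitute the Gaussian executable and input deck used at your site:
% bsub -R gaussian -i h2o.com -o h2o.log g98
The bsub -i and -o options pass the input deck to the job on standard input and collect its standard output in the log file.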
Using LSF HPC with Lion Bioscience SRS
SRS is Lion Bioscience's data integration platform, from which data is extracted by all other Lion Bioscience applications or third-party products. LSF HPC works with the batch queue feature of SRS to provide load sharing and allow users to manage their running and completed jobs.
Configuring LSF HPC for SRS jobs
During installation, lsfinstall adds the Boolean resource lion to the Resource section of lsf.shared.
If only some of your hosts can accept SRS jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.
Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the lion resource to the hosts that can run SRS jobs:
Begin Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
...
hostA      !       !      1        3.5   ()    ()    ()
hostB      !       !      1        3.5   ()    ()    (lion)
hostC      !       !      1        3.5   ()    ()    ()
...
End Host
You must also configure SRS for batch queues. When SRS batch queueing is enabled, users select from the available batch queues displayed next to the application Launch button in the Application Launch page.
See the SRS administration manual for information about setting up a batch queue system. No additional configuration is required in LSF HPC.
Submitting and monitoring SRS jobs
Use bsub to submit the job, including parameters required for SRS.
As soon as the application is submitted, you can monitor the progress of the job. When applications are launched and batch queues are in use, an icon appears. The icon looks like a "new mail" icon in an email program when jobs are running, and looks like a "read mail" icon when all launched jobs are complete. You can click this icon at any time to view and manage your running and completed jobs.
You can also view the application results or launch another application against those results, using the results of the initial job as input for the next job.
See the SRS Administrator's Manual for more information.
Using LSF HPC with LSTC LS-Dyna
LSF HPC is integrated with products from Livermore Software Technology Corporation (LSTC). LS-Dyna jobs can use the checkpoint and restart features of LSF HPC and take advantage of both SMP and distributed MPP parallel computation.
To submit LS-Dyna jobs through LSF HPC, you only need to make sure that your jobs are checkpointable.
See Administering Platform LSF for more information about checkpointing in LSF.
Configuring LSF HPC for LS-Dyna jobs
During installation, lsfinstall adds the Boolean resource ls_dyna to the Resource section of lsf.shared.
LSF HPC also installs the echkpnt.ls_dyna and erestart.ls_dyna files in LSF_SERVERDIR.
If only some of your hosts can accept LS-Dyna jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.
Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the ls_dyna resource to the hosts that can run LS-Dyna jobs:
Begin Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
...
hostA      !       !      1        3.5   ()    ()    ()
hostB      !       !      1        3.5   ()    ()    (ls_dyna)
hostC      !       !      1        3.5   ()    ()    ()
...
End Host
LS-Dyna integration with LSF checkpointing
LS-Dyna is integrated with LSF HPC to use the LSF checkpointing capability. It uses application-level checkpointing, working with the functionality implemented by LS-Dyna. At the end of each time step, LS-Dyna looks for the existence of a checkpoint trigger file, named D3KIL.
LS-Dyna jobs always exit with 0, even when they exit because of a checkpoint, so LSF reports that the job has finished when it has only been checkpointed.
Use the bchkpnt command to create the checkpoint trigger file, D3KIL, which LS-Dyna reads. The file forces LS-Dyna to checkpoint, or checkpoint and exit itself. The existence of a D3KIL file and the checkpoint information that LSF writes to the checkpoint directory specified for the job are all LSF HPC needs to restart the job.
Checkpointing and tracking of resources of SMP jobs is supported.
With pam and Task Starter, you can track resources of MPP jobs, but cannot checkpoint. If you do not use pam and Task Starter, checkpointing of MPP jobs is supported, but tracking is not.
LSF HPC installs echkpnt.ls_dyna and erestart.ls_dyna, which are special versions of echkpnt and erestart that allow checkpointing with LS-Dyna. Use bsub -a ls_dyna to make sure your job uses these files.
The method name ls_dyna uses the esub for LS-Dyna jobs, which sets the checkpointing method LSB_ECHKPNT_METHOD="ls_dyna" to use echkpnt.ls_dyna and erestart.ls_dyna.
When you submit a checkpointing job, you specify a checkpoint directory.
Before the job starts running, LSF sets the environment variable LSB_CHKPNT_DIR to a subdirectory of the checkpoint directory specified in the command line, or the CHKPNT parameter in lsb.queues. This subdirectory is identified by the job ID and only contains files related to the submitted job.
For checkpointing to work when running an LS-Dyna job from LSF, you must change to the directory that LSF sets in $LSB_CHKPNT_DIR after submitting LS-Dyna jobs. You must change to this directory whether you are submitting a single job or multiple jobs. LS-Dyna puts all its output files in this directory.
When you checkpoint a job, LSF creates a checkpoint trigger file named D3KIL in the working directory of the job.
The D3KIL file contains an entry depending on the desired checkpoint outcome:
- sw1. causes the job to checkpoint and exit. LS-Dyna writes to a restart data file d3dump and exits.
- sw3. causes the job to checkpoint and continue running. LS-Dyna writes to a restart data file d3dump and continues running until the next checkpoint.
The other possible LS-Dyna switch parameters are not relevant to LSF checkpointing.
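As an illustration of how the two bchkpnt forms map to these switches (the exact file contents are an assumption; echkpnt.ls_dyna writes the file for you):
% bchkpnt job_ID
leaves a D3KIL file containing sw3. (checkpoint and continue running), while
% bchkpnt -k job_ID
leaves a D3KIL file containing sw1. (checkpoint and exit).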
LS-Dyna does not remove the D3KIL trigger file after checkpointing the job.
If a job is restarted, LSF attempts to restart the job with the -r restart_file option, which replaces any existing -i or -r options in the original LS-Dyna command. LS-Dyna uses the checkpointed data to restart the process from that checkpoint, rather than starting the entire job from the beginning.
Each time a job is restarted, it is assigned a new job ID, and a new job subdirectory is created in the checkpoint directory. Files in the checkpoint directory are never deleted by LSF, but you may choose to remove old files once the LS-Dyna job is finished and the job history is no longer required.
Submitting LS-Dyna jobs
To submit LS-Dyna jobs, redirect a job script to the standard input of bsub, including parameters required for checkpointing. With job scripts, you can manage two limitations of LS-Dyna job submissions:
- When LS-Dyna jobs are restarted from a checkpoint, the job will use the checkpoint environment instead of the job submission environment. You can restore your job submission environment if you submit your job with a job script that includes your environment settings.
- LS-Dyna jobs must run in the directory that LSF sets in the LSB_CHKPNT_DIR environment variable. This lets you submit multiple LS-Dyna jobs from the same directory but is also required if you are submitting one job. If you submit a job from a different directory, you must change to the $LSB_CHKPNT_DIR directory. You can do this if you submit your jobs with a job script.
Whether you are running a single job or multiple jobs, all LS-Dyna jobs must run in the $LSB_CHKPNT_DIR directory.
To submit LS-Dyna jobs with job submission scripts, embed the LS-Dyna job in the job script. Use the following format to run the script:
% bsub < jobscript
Inside your job scripts, the syntax for the bsub command to submit an LS-Dyna job is either of the following:
bsub [-R ls_dyna] -k "checkpoint_dir method=ls_dyna" | -k "checkpoint_dir [checkpoint_period] method=ls_dyna" [bsub_options] LS_Dyna_command [LS_Dyna_options]
OR:
bsub [-R ls_dyna] -a ls_dyna -k "checkpoint_dir" | -k "checkpoint_dir [checkpoint_period]" [bsub_options] LS_Dyna_command [LS_Dyna_options]
-R ls_dyna
Optional. Specify the ls_dyna shared resource if the LS-Dyna application is only installed on certain hosts in the cluster.
method=ls_dyna
Mandatory. Use the esub for LS-Dyna jobs, which automatically sets the checkpoint method to ls_dyna to use the checkpoint and restart programs echkpnt.ls_dyna and erestart.ls_dyna. Alternatively, use bsub -a to specify the ls_dyna esub.
The checkpointing feature for LS-Dyna jobs requires all of the following parameters:
-k checkpoint_dir
Mandatory. Regular option to bsub that specifies the name of the checkpoint directory. Specify the ls_dyna method here if you do not use the bsub -a option.
checkpoint_period
Regular option to bsub that specifies the time interval in minutes at which LSF automatically checkpoints jobs.
LS_Dyna_command
Regular LS-Dyna software command and options.
Preparing your job scripts
Specify any environment variables required for your LS-Dyna jobs. For example:
LS_DYNA_ENV=VAL; export LS_DYNA_ENV
If you do not set your environment variables in the job script, then you must add some lines to the script to restore environment variables. For example:
if [ -f $LSB_CHKPNT_DIR/.envdump ]; then
    . $LSB_CHKPNT_DIR/.envdump
fi
Ensure that your jobs run in the checkpoint directory set by LSF by adding the following line after your bsub commands:
cd $LSB_CHKPNT_DIR
Write the LS-Dyna command you want to run. For example:
/usr/share/ls_dyna_path/ls960 endtime=2 i=/usr/share/ls_dyna_path/airbag.deploy.k ncpu=1
Example job scripts
All scripts must contain the ls_dyna method and the cd command to the checkpoint directory set by LSF.
- Job scripts with SMP LS-Dyna job embedded in the script. Environment variables are set in the script.
% bsub < jobscript
Example job submission script:
#!/bin/sh
#BSUB -J LS_DYNA
#BSUB -k "/usr/share/checkpoint_dir method=ls_dyna"
#BSUB -o "/usr/share/output/output.%J"
cd $LSB_CHKPNT_DIR
LS_DYNA_VAR1=VAL1; export LS_DYNA_VAR1
LS_DYNA_VAR2=VAL2; export LS_DYNA_VAR2
cp /usr/share/datapool/input.data /home/usr1/input.data
/full_path/ls960 i=/home/usr1/input.data
- Job scripts with SMP LS-Dyna job embedded in the script. Environment variables are set in the script.
% bsub < jobscript
Example job submission script:
#!/bin/sh
#BSUB -J LS_DYNA
#BSUB -k "/usr/share/checkpoint_dir method=ls_dyna"
cd $LSB_CHKPNT_DIR
LS_DYNA_ENV=VAL; export LS_DYNA_ENV
/usr/share/ls_dyna_path/ls960 endtime=2 i=/usr/share/ls_dyna_path/airbag.deploy.k ncpu=1
exit $?
- Job scripts with SMP LS-Dyna job embedded in the script. Environment variables are not set in the script, and the settings must be read from a hidden file, .envdump, which the echkpnt.ls_dyna program creates in the $LSB_CHKPNT_DIR directory. The script must source the .envdump file.
% bsub < jobscript
Example job submission script:
#!/bin/sh
#BSUB -J LS_DYNA
#BSUB -k "/usr/share/checkpoint_dir method=ls_dyna"
cd $LSB_CHKPNT_DIR
# after the first checkpoint
if [ -f $LSB_CHKPNT_DIR/.envdump ]; then
    . $LSB_CHKPNT_DIR/.envdump
fi
/usr/share/ls_dyna_path/ls960 endtime=2 i=/usr/share/ls_dyna_path/airbag.deploy.k ncpu=1
exit $?
- Job script with an MPP LS-Dyna job embedded in the script. Without PAM and TaskStarter, the job can be checkpointed, but resource usage tracking and job control are not available.
% bsub < jobscript
Example job submission script:
#!/bin/sh
#BSUB -J LS_DYNA
#BSUB -k "/usr/share/checkpoint_dir method=ls_dyna"
#BSUB -o "/usr/share/output/output.%J"
#BSUB -n 4
cd $LSB_CHKPNT_DIR
ENV1=ENV1_VAL; export ENV1
ENV2=ENV2_VAL; export ENV2
cp /usr/share/datapool/input.data /home/usr1/input.data
mpirun /ls_dyna_mpp_path/mpp960 i=/home/usr1/input.data
- Job script with the lammpi wrapper running an MPP LS-Dyna job embedded in the script. PAM and TaskStarter provide job control and resource usage information, but the job cannot be checkpointed.
% bsub < jobscript
Example job submission script:
#!/bin/sh
#BSUB -J LS_DYNA
#BSUB -q priority
#BSUB -n 1
#BSUB -o /usr/share/output/output.%J
#BSUB -k "/usr/share/checkpoint_dir method=ls_dyna"
export PATH=/usr/share/jdk/bin:$PATH
cd $LSB_CHKPNT_DIR
pam -g 1 lammpirun_wrapper /usr/share/ls_dyna_mpp_path/mpp960 i=/usr/share/DYNA/airbag.deploy.k
See Administering Platform LSF for information about submitting jobs with job scripts.
Checkpointing, restarting, and migrating LS-Dyna jobs
bchkpnt [bchkpnt_options] [-k] [job_ID]
-k
Specifies checkpoint and exit. The job will be killed immediately after being checkpointed. When the job is restarted, it continues from the last checkpoint.
- job_ID
Job ID of the LS-Dyna job. Specifies which job to checkpoint. Each time the job is migrated, the job is restarted and assigned a new job ID.
See Platform LSF Command Reference for more information about bchkpnt.
brestart [brestart_options] checkpoint_directory [job_ID]
- checkpoint_directory
Specifies the checkpoint directory, where the job subdirectory is located. Each job is run in a unique directory.
To change to the checkpoint directory for LSF to restart a job, place the following line in your job script before the LS-Dyna command is called:
cd $LSB_CHKPNT_DIR
- job_ID
Job ID of the LS-Dyna job; specifies which job to restart. After the job is restarted, it is assigned a new job ID, and the new job ID is used for checkpointing. A new job ID is assigned each time the job is restarted.
See Platform LSF Command Reference for more information about brestart.
bmig [bsub_options] [job_ID]
- job_ID
Job ID of the LS-Dyna job; specifies which job to migrate. After the job is migrated, it is restarted and assigned a new job ID. The new job ID is used for checkpointing. A new job ID is assigned each time the job is migrated.
See Platform LSF Command Reference for more information about bmig.
Using LSF HPC with MSC Nastran
MSC Nastran Version 70.7.2 ("Nastran") runs in Distributed Parallel mode, automatically detects a job launched by LSF HPC, and transparently accepts the execution host information from LSF HPC.
The Nastran application checks if the LSB_HOSTS or LSB_MCPU_HOSTS environment variable is set in the execution environment. If either is set, Nastran uses the value of the environment variable to produce a list of execution nodes for the solver command line. Users can override the hosts chosen by LSF HPC to specify their own host list.
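A minimal sketch of the mechanism described above, written as a fragment of a submission script (the host-file path is a placeholder; the complete lsf_nast sample later in this section shows a full working version):
#!/bin/sh
# Convert the LSB_HOSTS list set by LSF into a one-host-per-line file
# that a solver launch script can read.
HOSTFILE=/tmp/nastran_hosts.$$
: > ${HOSTFILE}
for HOST in $LSB_HOSTS
do
    echo ${HOST} >> ${HOSTFILE}
done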
Configuring LSF HPC for Nastran jobs
During installation, lsfinstall adds the Boolean resource nastran to the Resource section of lsf.shared.
No additional executable files are needed.
If only some of your hosts can accept Nastran jobs, configure the Host section of lsf.cluster.cluster_name to identify those hosts.
Edit the LSF_ENVDIR/conf/lsf.cluster.cluster_name file and add the nastran resource to the hosts that can run Nastran jobs:
Begin Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES
...
hostA      !       !      1        3.5   ()    ()    ()
hostB      !       !      1        3.5   ()    ()    (nastran)
hostC      !       !      1        3.5   ()    ()    ()
...
End Host
Submitting Nastran jobs
Use bsub to submit the job, including parameters required for the Nastran command line.
bsub -n num_processors [-R nastran] bsub_options nastran_command
-n num_processors
Number of processors required to run the job.
-R nastran
Optional. Specify the nastran shared resource if the Nastran application is only installed on certain hosts in the cluster.
You must set the Nastran dmp variable to the same number as the number of processors the job is requesting (-n option of bsub).
- Parallel job through LSF HPC requesting 4 processors:
% bsub -n 4 -a nastran -R "nastran" nastran example dmp=4
Note that both the bsub -n 4 and Nastran dmp=4 options are used. The value for -n and dmp must be the same.
- Parallel job through LSF HPC requesting 4 processors, no more than 1 processor per host:
% bsub -n 4 -a nastran -R "nastran span[ptile=1]" nastran example dmp=4
Nastran on Linux using LAM/MPI
You must write a script that picks up the LSB_HOSTS variable and provides the chosen hosts to the Nastran program. You can then submit the script using bsub:
% bsub -a "nastran lammpi" -q hpc_linux -n 2 -o out -e err -R "span[ptile=1]" lsf_nast
This submits a 2-way job that puts its standard output in the file named out and its standard error in a file named err. The ptile=1 option tells LSF to choose at most 1 CPU per node chosen for the job.
The following sample lsf_nast script only represents a starting point, but deals with the host specification for LAM/MPI. This script should be modified at your site before use.
#! /bin/sh
#
# lsf script to use with Nastran and LAM/MPI.
#
#
# Set information for Head node:
DAT=/home/user1/lsf/bc2.dat
#
# Set information for Cluster node:
TMPDIR=/home/user1/temp
#
LOG=${TMPDIR}/log
LSB_HOST_FILE=${TMPDIR}/lsb_hosts
:> ${LOG}
# The local host MUST be in the host file.
echo ${LSB_SUB_HOST} > ${LSB_HOST_FILE}
#
#
# Create the lam hosts file:
for HOST in $LSB_HOSTS
do
    echo $HOST >> ${LSB_HOST_FILE}
done
#
cd ${TMPDIR}
rcp ${LSB_SUB_HOST}:${DAT} .
id
# recon -v ${LSB_HOST_FILE}
#
cat ${LSB_HOST_FILE}
#
pwd
recon -v ${LSB_HOST_FILE} >> ${LOG} 2>&1
lamboot -v ${LSB_HOST_FILE} >> ${LOG} 2>&1
NDMP=`sed -n -e '$=' ${LSB_HOST_FILE}`
HOST="n0"
(( i=1 ))
while (( i < $NDMP )) ; do
    HOST="$HOST:n$i"
    (( i += 1 ))
done
echo DAT=${DAT##*/}
pwd
nast707t2 ${DAT##*/} dmp=${NDMP} scr=yes bat=no hosts=$HOST >> ${LOG} 2>&1
wipe -v ${LSB_HOST_FILE} >> ${LOG} 2>&1
#
# Bring back files:
DATL=${DAT##*/}
rcp ${DATL%.dat}.log ${LSB_SUB_HOST}:${DAT%/*}
rcp ${DATL%.dat}.f04 ${LSB_SUB_HOST}:${DAT%/*}
rcp ${DATL%.dat}.f06 ${LSB_SUB_HOST}:${DAT%/*}
#
# End