Non-Shared File Systems
Contents
- About Directories and Files
- Using LSF with Non-Shared File Systems
- Remote File Access
- File Transfer Mechanism (lsrcp)
About Directories and Files
LSF is designed for networks where all hosts have shared file systems, and files have the same names on all hosts.
LSF includes support for copying user data to the execution host before running a batch job, and for copying results back after the job executes.
In networks where the file systems are not shared, this can be used to give remote jobs access to local data.
Supported file systems
UNIX
On UNIX systems, LSF supports the following shared file systems:
- Network File System (NFS). NFS file systems can be mounted permanently or on demand using automount.
- Andrew File System (AFS)
- Distributed File System (DCE/DFS)
Windows
On Windows, directories containing LSF files can be shared among hosts from a Windows server machine.
Non-shared directories and files
LSF is usually used in networks with shared file space. When shared file space is not available, LSF can copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes. See Remote File Access for more information.
Some networks do not share files between hosts. LSF can still be used on these networks, with reduced fault tolerance. See Using LSF with Non-Shared File Systems for information about using LSF in a network without a shared file system.
Using LSF with Non-Shared File Systems
LSF installation
To install LSF on a cluster without shared file systems, follow the complete installation procedure on every host to install all the binaries, man pages, and configuration files.
Configuration files
After you have installed LSF on every host, you must update the configuration files on all hosts so that they contain the complete cluster configuration. Configuration files must be the same on all hosts.
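One way to confirm that copies of a configuration file match is a byte-for-byte comparison. The following is a minimal sketch, assuming the copies have already been gathered from each host into one place by some other means. The function name check_config_copies and the file layout in the usage comment are hypothetical and are not part of LSF.

```shell
#!/bin/sh
# check_config_copies REFERENCE COPY...
# Hypothetical helper (not an LSF command): compares each COPY with
# REFERENCE byte for byte, prints the copies that differ, and returns 1
# if any copy differs.
check_config_copies() {
    ref=$1
    shift
    rc=0
    for copy in "$@"; do
        # cmp -s is silent and exits non-zero when the files differ
        # or when a file cannot be read.
        if ! cmp -s "$ref" "$copy"; then
            printf 'differs: %s\n' "$copy"
            rc=1
        fi
    done
    return $rc
}

# Example call, assuming copies were collected under ./collected/<host>/:
#   check_config_copies /path/to/lsf.conf collected/*/lsf.conf
```

An empty result with exit status 0 means every collected copy matches the reference file.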
Master host
You must choose one host to act as the LSF master host. LSF configuration files and working directories must be installed on this host, and the master host must be listed first in lsf.cluster.cluster_name.
You can use the parameter LSF_MASTER_LIST in lsf.conf to define which hosts are candidates for election as the master host. In some cases, this may improve performance.
For Windows password authentication in a non-shared file system environment, you must define the parameter LSF_MASTER_LIST in lsf.conf so that jobs run with the correct permissions. If you do not define this parameter, LSF assumes that the cluster uses a shared file system environment.
Fault tolerance
Some fault tolerance can be introduced by choosing more than one host as a possible master host, and using NFS to mount the LSF working directory on only these hosts. All the possible master hosts must be listed first in lsf.cluster.cluster_name. As long as one of these hosts is available, LSF continues to operate.
Remote File Access
Using LSF with non-shared file space
LSF is usually used in networks with shared file space. When shared file space is not available, use the bsub -f command to have LSF copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes.
LSF attempts to run a job in the directory where the bsub command was invoked. If the execution directory is under the user's home directory, sbatchd looks for the path relative to the user's home directory. This handles some common configurations, such as cross-mounting user home directories with the /net automount option.
If the directory is not available on the execution host, the job is run in /tmp. Any files created by the batch job, including the standard output and error files created by the -o and -e options to bsub, are left on the execution host.
LSF provides support for moving user data from the submission host to the execution host before executing a batch job, and from the execution host back to the submitting host after the job completes. The file operations are specified with the -f option to bsub.
LSF uses the lsrcp command to transfer files. lsrcp contacts RES on the remote host to perform file transfer. If RES is not available, the UNIX rcp command is used. See File Transfer Mechanism (lsrcp) for more information.
bsub -f
The -f "[local_file operator [remote_file]]" option to the bsub command copies a file between the submission host and the execution host. To specify multiple files, repeat the -f option.
local_file
File name on the submission host
remote_file
File name on the execution host
The files local_file and remote_file can be absolute or relative file path names. You must specify at least one file name. When the file remote_file is not specified, it is assumed to be the same as local_file. Including local_file without the operator results in a syntax error.
operator
Operation to perform on the file. The operator must be surrounded by white space.
Valid values for operator are:
>
local_file on the submission host is copied to remote_file on the execution host before job execution. remote_file is overwritten if it exists.
<
remote_file on the execution host is copied to local_file on the submission host after the job completes. local_file is overwritten if it exists.
<<
remote_file is appended to local_file after the job completes. local_file is created if it does not exist.
><, <>
Equivalent to performing the > and then the < operation. The file local_file is copied to remote_file before the job executes, and remote_file is copied back, overwriting local_file, after the job completes. <> is the same as ><.
If the submission and execution hosts have different directory structures, you must ensure that the directory where remote_file and local_file will be placed exists. LSF tries to change the directory to the same path name as the directory where the bsub command was run. If this directory does not exist, the job is run in your home directory on the execution host.
You should specify remote_file as a file name with no path when running in non-shared file systems; this places the file in the job's current working directory on the execution host. This way the job will work correctly even if the directory where the bsub command is run does not exist on the execution host. Be careful not to overwrite an existing file in your home directory.
bsub -i
If the input file specified with bsub -i is not found on the execution host, the file is copied from the submission host using the LSF remote file access facility and is removed from the execution host after the job finishes.
bsub -o and bsub -e
The output files specified with the -o and -e arguments to bsub are created on the execution host, and are not copied back to the submission host by default. You can use the remote file access facility to copy these files back to the submission host if they are not on a shared file system.
For example, the following command stores the job output in the job_out file and copies the file back to the submission host:
bsub -o job_out -f "job_out <" myjob
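The same pattern can be applied to the error file by repeating the -f option. The following sketch only prints the command it would run, so it can be tried on a host without LSF. The function name submit_with_logs and the file name job_err are hypothetical.

```shell
#!/bin/sh
# submit_with_logs OUT_FILE ERR_FILE COMMAND...
# Hypothetical dry-run helper: prints a bsub command that writes standard
# output and standard error to files on the execution host and copies
# both files back to the submission host after the job completes.
submit_with_logs() {
    out=$1
    err=$2
    shift 2
    # remote_file is omitted after "<", so it is assumed to be the same
    # as local_file.
    printf 'bsub -o %s -e %s -f "%s <" -f "%s <" %s\n' \
        "$out" "$err" "$out" "$err" "$*"
}

submit_with_logs job_out job_err myjob
# prints: bsub -o job_out -e job_err -f "job_out <" -f "job_err <" myjob
```

This sketch does not quote file names, so it is only suitable for names without white space.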
Example
To submit myjob to LSF, with input taken from the file /data/data3 and the output copied back to /data/out3, run the command:
bsub -f "/data/data3 > data3" -f "/data/out3 < out3" myjob data3 out3
To run the job batch_update, which updates the batch_data file in place, you need to copy the file to the execution host before the job runs and copy it back after the job completes:
bsub -f "batch_data <>" batch_update batch_data
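When a job needs several input and output files, the -f options can be generated instead of typed by hand. The following is a minimal sketch that prints the resulting command rather than submitting it. The function name build_bsub_cmd is hypothetical, and the sketch assumes file names contain no white space. Each remote_file is reduced to a base name so that the file lands in the job's working directory on the execution host.

```shell
#!/bin/sh
# build_bsub_cmd "INPUTS" "OUTPUTS" COMMAND...
# Hypothetical dry-run helper: INPUTS and OUTPUTS are space-separated
# lists of paths on the submission host.
build_bsub_cmd() {
    inputs=$1
    outputs=$2
    shift 2
    cmd="bsub"
    for f in $inputs; do
        # ">" copies local_file to the execution host before the job runs.
        cmd="$cmd -f \"$f > $(basename "$f")\""
    done
    for f in $outputs; do
        # "<" copies remote_file back to the submission host afterwards.
        cmd="$cmd -f \"$f < $(basename "$f")\""
    done
    printf '%s %s\n' "$cmd" "$*"
}

build_bsub_cmd "/data/data3" "/data/out3" myjob data3 out3
# prints: bsub -f "/data/data3 > data3" -f "/data/out3 < out3" myjob data3 out3
```

With the arguments shown, the sketch reproduces the first example command in this section.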
File Transfer Mechanism (lsrcp)
The LSF remote file access mechanism (bsub -f) uses lsrcp to process the file transfer. The lsrcp command tries to connect to RES on the submission host to handle the file transfer.
See Remote File Access for more information about using bsub -f.
Limitations to lsrcp
Because LSF client hosts do not run RES, jobs that are submitted from client hosts should only specify bsub -f if rcp is allowed. You must set up the permissions for rcp if account mapping is used.
File transfer using lsrcp is not supported in the following contexts:
- If LSF account mapping is used; lsrcp fails when running under a different user account
- LSF client hosts do not run RES, so lsrcp cannot contact RES on the submission host
See Authorization options for more information.
Workarounds
In these situations, use the following workarounds:
rcp on UNIX
If lsrcp cannot contact RES on the submission host, it attempts to use rcp to copy the file. You must set up the /etc/hosts.equiv or $HOME/.rhosts file in order to use rcp.
See the rcp(1) and rsh(1) man pages for more information on using the rcp command.
Custom file transfer mechanism
You can replace lsrcp with your own file transfer mechanism as long as it supports the same syntax as lsrcp. This might be done to take advantage of a faster interconnection network, or to overcome limitations with the existing lsrcp. sbatchd looks for the lsrcp executable in the LSF_BINDIR directory as specified in the lsf.conf file.
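A replacement has to interpret the same file operands that lsrcp accepts. The following sketch shows one possible first step, splitting an operand into user, host, and path. It assumes operands take the form [[user_name@]host_name:]path, which this section does not state and which should be checked against the lsrcp man page. The function name split_spec is hypothetical.

```shell
#!/bin/sh
# split_spec OPERAND
# Hypothetical helper for a custom transfer script: prints "user|host|path"
# for an operand assumed to look like [[user_name@]host_name:]path.
# Empty fields mean the part was not given.
split_spec() {
    spec=$1
    user=
    host=
    path=$spec
    case $spec in
        *:*)
            remote=${spec%%:*}   # text before the first colon
            path=${spec#*:}      # text after the first colon
            case $remote in
                *@*) user=${remote%%@*}; host=${remote#*@} ;;
                *)   host=$remote ;;
            esac
            ;;
    esac
    printf '%s|%s|%s\n' "$user" "$host" "$path"
}

split_spec alice@hostA:/tmp/data3
# prints: alice|hostA|/tmp/data3
```

The sketch treats the first colon as the host separator, so it would misread paths that themselves contain a colon, such as Windows drive letters.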
Platform Computing Inc.
www.platform.com