Testing Your LSF Installation
Before you make LSF available to users, you should make sure LSF is installed and operating correctly. This chapter describes how to use some basic LSF commands to do the following:
- Check the cluster configuration
- Start the LSF daemons (LSF services)
- Verify that your new cluster is operating correctly
If you have a mixed UNIX and Windows cluster, make sure you can perform operations from both UNIX and Windows hosts.
Contents
- Checking the license server (permanent LSF license)
- Checking the cluster
- Checking the LSF batch system
Checking the license server (permanent LSF license)
If you are using a DEMO license, proceed to Checking the cluster.
If you are using a permanent LSF license, perform the following steps to check the license server.
Check that the License Server is started
The FLEXlm License Server service is installed as a Windows service that starts automatically.
To check that the License Server is started:
- Select Start > Settings > Control Panel > Services and make sure the FLEXlm License Server service is started.
Display license server status
The lmstat command
Use the lmstat command to check the License Server status and display the number of licenses available. You must use the -c option to specify the path to the LSF license file.
For example, depending on the LSF features installed, the output of the command should look something like the following:
C:\lsf\7.0\etc> lmutil lmstat -a -c %LSF_ENVDIR%/license.dat
lmutil - Copyright (C) 1989-2000 Globetrotter Software, Inc.
Flexible License Manager status on Fri 05/24/2002 13:23
License server status: 1711@hostA
    License file(s) on hostA: f:\winnt\system32\\\hostA\c$\flexlm\license.dat:
    hostA: license server UP (MASTER) v7.0
Vendor daemon status (on hostA):
    lsf_ld: UP v7.0
Feature usage info:
Users of lsf_base:  (Total of 2 licenses available)
Users of lsf_manager:  (Total of 2 licenses available)
...
Display licensed products
Use the lshosts -l command to show what products are licensed for any host in the cluster:
C:\lsf\7.0\bin> lshosts -l hostA
HOST_NAME:  hostA
type     model  cpuf  ncpus  ndisks  maxmem  maxswp  maxtmp  rexpri  server
NTX86    PC450  13.2      1       2    127M    514M    749M       0  Yes
RESOURCES: (win2k)
RUN_WINDOWS: (always open)
LICENSES_ENABLED: (LSF_Base LSF_Manager Platform_HPC LSF_Sched_Fairshare LSF_Sched_Resource_Reservation LSF_Sched_Preemption LSF_Sched_Parallel LSF_Sched_Advance_Reservation)
LICENSE_NEEDED: Class(B)
LOAD_THRESHOLDS:
  r15s   r1m  r15m   ut   pg   io   ls   it  tmp  swp  mem
     -     -     -    -    -    -    -    -    -    -    -
For more information
- Refer to the FLEXlm documentation for more information about the lmstat and lmgrd commands.
- Refer to Administering Platform LSF for more information about configuring and running the FLEXlm license server.
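If you check licenses regularly, a short script can pull the per-feature counts out of lmstat output. The following is a minimal Python sketch, assuming the output uses the `Users of <feature>: (Total of N licenses available)` wording shown in the lmstat example earlier in this section; other FLEXlm versions may word these lines differently. The function name `parse_license_counts` is hypothetical.

```python
import re

# Matches lines such as "Users of lsf_base:  (Total of 2 licenses available)".
# Assumption: the wording matches the lmstat example in this chapter.
_FEATURE_LINE = re.compile(
    r"Users of (?P<feature>[\w-]+):\s*\(Total of (?P<count>\d+) licenses? available\)"
)


def parse_license_counts(lmstat_output: str) -> dict[str, int]:
    """Return a mapping of feature name to the number of licenses available."""
    counts: dict[str, int] = {}
    for match in _FEATURE_LINE.finditer(lmstat_output):
        counts[match.group("feature")] = int(match.group("count"))
    return counts


if __name__ == "__main__":
    sample = """\
Feature usage info:
Users of lsf_base:  (Total of 2 licenses available)
Users of lsf_manager:  (Total of 2 licenses available)
"""
    print(parse_license_counts(sample))  # {'lsf_base': 2, 'lsf_manager': 2}
```

Feed it the captured output of `lmutil lmstat -a -c ...` and compare the counts with the features you expect to be licensed.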
Checking the cluster
Before using any LSF commands, wait a few minutes for the LSF services to start.
To check the cluster, log on to any host in the cluster, and run the LSF commands described in this section.
Every LSF command displays a list of its options when run with the -h option, and a version string when run with the -V option.
Verify cluster configuration
The lsadmin command
Verify the cluster configuration using the lsadmin command. This can be done without LSF daemons running.
The lsadmin command controls the operation of an LSF cluster and administers the LSF services, Platform LIM, Platform RES, and Platform SBD. Use the lsadmin ckconfig command to check the LSF configuration files.
The -v option displays detailed information about the LSF configuration:
C:\LSF_7.0>lsadmin ckconfig -v
Checking configuration files ...
Platform EGO 1.2.3.98817, Nov 2 2007
Copyright (C) 1992-2007 Platform Computing Corporation
binary type: nt-x86
Reading configuration from C:\LSF_7.0\conf\ego\cluster1\kernel/ego.conf
Dec 21 08:38:59 2007 4196:1492 6 7.02 Lim starting...
Dec 21 08:38:59 2007 4196:1492 6 7.02 LIM is running in advanced workload execution mode.
Dec 21 08:38:59 2007 4196:1492 6 7.02 Master LIM is not running in EGO_DISABLE_UNRESOLVABLE_HOST mode.
Dec 21 08:38:59 2007 4196:1492 5 7.02 C:\LSF_7.0\7.0\etc/lim.exe -C
Dec 21 08:38:59 2007 4196:1492 7 7.02 setMyClusterName: searching cluster files...
Dec 21 08:38:59 2007 4196:1492 7 7.02 setMyClusterName: local host hostA belongs to cluster cluster1
Dec 21 08:38:59 2007 4196:1492 3 7.02 domanager(): C:\LSF_7.0\conf/lsf.cluster.cluster1(13): The cluster manager is the invoker <LSF\lsfadmin> in debug mode
Dec 21 08:38:59 2007 4196:1492 6 7.02 reCheckClass: numhosts 1 so reset exchIntvl to 15.00
Dec 21 08:38:59 2007 4196:1492 7 7.02 getDesktopWindow: no Desktop time window configured
Dec 21 08:38:59 2007 4196:1492 6 7.02 Checking Done.
---------------------------------------------------------
No errors found.
The messages shown are typical of normal output from lsadmin ckconfig -v.
Other messages may indicate problems with the Platform LSF configuration. See the Platform LSF Reference for help with some common configuration errors.
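To repeat this check from a script, for example after editing configuration files, you can run the command and look for the `No errors found.` line that ends the normal output above. This Python sketch assumes only that this line marks success; the same check applies to `badmin ckconfig -v`, described later in this chapter. The function names are hypothetical, and the runner is injectable so the logic can be tried without an LSF installation.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argument list and returns the command's combined output.
Runner = Callable[[Sequence[str]], str]


def run_command(args: Sequence[str]) -> str:
    """Run a command and return stdout and stderr merged into one string."""
    result = subprocess.run(
        list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # capture messages from either stream
        text=True,
        check=False,
    )
    return result.stdout


def config_is_clean(output: str) -> bool:
    """Assumption: a clean check prints the line 'No errors found.'"""
    return any(line.strip() == "No errors found." for line in output.splitlines())


def check_config(tool: str = "lsadmin", runner: Runner = run_command) -> bool:
    """Run '<tool> ckconfig -v' (tool is 'lsadmin' or 'badmin'); True if clean."""
    return config_is_clean(runner([tool, "ckconfig", "-v"]))
```

On a host with the LSF commands on the PATH, `check_config()` checks the LSF configuration and `check_config("badmin")` checks the batch configuration. A `False` result means you should read the full output for the reported problems.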
Start the cluster
When you first start the cluster, it takes LSF some time to select an LSF master host. During this time (approximately 20 seconds) the cluster may not be able to locate the master host.
Use the following command to start the LSF cluster:
C:\lsf\7.0\bin> lsfstartup
This command starts the LSF services, Platform LIM, Platform RES, and Platform SBD on all LSF Windows hosts.
Mixed cluster
If you have a mixed UNIX-Windows cluster, log on to a UNIX host and run lsfstartup to start the LSF daemons on the UNIX hosts, then log on to a Windows host and run lsfstartup there to start LSF services on all Windows hosts.
Check the Load Information Manager (LIM)
If all the following commands display correct output, the LIMs are running correctly.
The lsid command
The lsid command displays the cluster name and master host name.
The master name displayed by lsid may vary, but it is usually the first host configured in the Hosts section of the LSF_CONFDIR\lsf.cluster.cluster_name file.
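The cluster and master names appear on fixed-wording lines in the lsid output, as in the example that follows, so a script can extract them with two regular expressions. A minimal Python sketch, assuming the `My cluster name is` and `My master name is` wording shown in that example; `parse_lsid` and `ClusterInfo` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ClusterInfo(NamedTuple):
    cluster: Optional[str]
    master: Optional[str]


def parse_lsid(output: str) -> ClusterInfo:
    """Extract the cluster and master host names from lsid output."""
    cluster = re.search(r"My cluster name is (\S+)", output)
    master = re.search(r"My master name is (\S+)", output)
    return ClusterInfo(
        cluster.group(1) if cluster else None,
        master.group(1) if master else None,
    )


if __name__ == "__main__":
    sample = "My cluster name is cluster1\nMy master name is hostA.platform.com\n"
    print(parse_lsid(sample))
    # ClusterInfo(cluster='cluster1', master='hostA.platform.com')
```

If lsid cannot report a master host, the `master` field is `None`. Right after startup this can simply mean LSF is still selecting the master host, which, as noted under Start the cluster, takes about 20 seconds.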
lsid
Platform LSF 7 Update 5 Aug 01 2008
Copyright 1992-2007 Platform Computing Corporation
My cluster name is cluster1
My master name is hostA.platform.com
The lsinfo command
The lsinfo command displays cluster configuration information about resources, host types, and host models. The information displayed by lsinfo is configured in LSF_CONFDIR\lsf.shared.
Depending on the LSF products installed, and the host types configured in your cluster, the output of the command should look something like the following. The ellipsis (...) indicates where the full output has been shortened for brevity.
In this example, only built-in resources are shown. Refer to Administering Platform LSF for information about configuring custom resources.
lsinfo
RESOURCE_NAME  TYPE     ORDER  DESCRIPTION
r15s           Numeric  Inc    15-second CPU run queue length
r1m            Numeric  Inc    1-minute CPU run queue length (alias: cpu)
r15m           Numeric  Inc    15-minute CPU run queue length
ut             Numeric  Inc    1-minute CPU utilization (0.0 to 1.0)
pg             Numeric  Inc    Paging rate (pages/second)
io             Numeric  Inc    Disk IO rate (Kbytes/second)
ls             Numeric  Inc    Number of login sessions (alias: login)
it             Numeric  Dec    Idle time (minutes) (alias: idle)
tmp            Numeric  Dec    Disk space in /tmp (Mbytes)
swp            Numeric  Dec    Available swap space (Mbytes) (alias: swap)
mem            Numeric  Dec    Available memory (Mbytes)
...
TYPE_NAME
UNKNOWN_AUTO_DETECT  DEFAULT  DigitalUNIX  HPPA  IBMAIX3  NTX86  NTALPHA  SGI6  SUNSOL  WIN95  ...
MODEL_NAME   CPU_FACTOR  ARCHITECTURE
Ultra5S      10.30       SUNWUltra510_270_sparcv9
HP300        1.00
PENT_100     7.00
PC450        13.20       i686_448
NEWS5000     7.00
INDIGOXS24   7.00
SunSparc     12.00
...
The lshosts command
The lshosts command displays configuration information and status of LSF hosts.
The output contains one line for each host in the cluster. Type, model, and resource information is configured in the LSF_CONFDIR\lsf.cluster.cluster_name file. The cpuf matches the CPU factor given for the host model in LSF_CONFDIR\lsf.shared.
lshosts
HOST_NAME  type     model    cpuf  ncpus  maxmem  maxswp  server  RESOURCES
HostA      NTX86    PC450    13.2      1    127M    514M  Yes     (win2k)
HostB      SUNSOL5  DEFAULT   1.0      4   1024M   1934M  Yes     ()
HostC      SGI6     DEFAULT   1.0      -       -       -  Yes     ()
HostD      HPPA     DEFAULT   1.0      1    108M    256M  Yes     ()
The lsload command
The lsload command displays the current load levels of the cluster.
The output contains one line for each host in the cluster. The status should be ok for all hosts in your cluster.
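Because every host should report ok, a script can list the hosts whose status column says anything else. A minimal Python sketch, assuming whitespace-separated columns with the host name first and the status second, as in the lsload example that follows; `hosts_not_ok` is a hypothetical name. The same layout holds for the STATUS column of bhosts, described later in this chapter.

```python
def hosts_not_ok(table: str) -> dict[str, str]:
    """Return {host: status} for every host whose status is not 'ok'.

    Assumes the first non-blank line is a header and each following line
    starts with the host name and the status (as in lsload and bhosts).
    """
    problems: dict[str, str] = {}
    lines = [line for line in table.splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        host, status = fields[0], fields[1]
        if status != "ok":
            problems[host] = status
    return problems


if __name__ == "__main__":
    # Made-up sample: HostB is shown as unavailable.
    sample = """\
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
HostA ok 0.0 0.0 0.0 6% 0.2 2 1365 97M 65M 29M
HostB unavail
"""
    print(hosts_not_ok(sample))  # {'HostB': 'unavail'}
```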
lsload
HOST_NAME  status  r15s  r1m  r15m  ut   pg    ls  it    tmp   swp   mem
HostA      ok      0.0   0.0  0.0   6%   0.2   2   1365  97M   65M   29M
HostB      ok      0.1   0.1  0.2   9%   0.0   4   1     130M  319M  12M
HostC      ok      2.5   2.2  1.9   64%  56.7  50  0     929M  931M  4000M
HostD      ok      0.2   0.2  0.2   1%   0.0   0   367   93M   86M   50M
Check the Remote Execution Server (RES)
Make sure you have entered your user password with lspasswd.
If all the following commands display correct output, RES on all hosts is running correctly.
The lsrun command
The lsrun command runs a command on one LSF host through RES. For example, the following command runs the hostname command on the remote host hostA:
lsrun -v -m hostA hostname
<<Execute hostname on remote host hostA>>
hostA
The lsgrun command
The lsgrun command runs a command on a group of hosts through RES. For example, the following command runs the hostname command on three remote hosts:
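With -v, lsgrun labels each host's output with an `<<Executing ... on host>>` line, so you can confirm that every host you named ran the command and printed its own name. A minimal Python sketch, assuming each marker line is followed by that host's hostname output on the next line, as in the example that follows; `verify_lsgrun_hostname` is a hypothetical name.

```python
import re

# Marker line printed by "lsgrun -v", e.g. "<<Executing hostname on hostA>>".
_MARKER = re.compile(r"^<<Executing .+ on (?P<host>\S+)>>$")


def _short(name: str) -> str:
    """Lower-case a host name and drop any domain suffix."""
    return name.split(".")[0].lower()


def verify_lsgrun_hostname(output: str, expected_hosts: list[str]) -> list[str]:
    """Check 'lsgrun -v -m "..." hostname' output; an empty list means success."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]

    # Map each host named in a marker line to the line printed right after it.
    reported: dict[str, tuple[str, str]] = {}
    for i, line in enumerate(lines):
        match = _MARKER.match(line)
        has_output = i + 1 < len(lines) and not _MARKER.match(lines[i + 1])
        if match and has_output:
            host = match.group("host")
            reported[_short(host)] = (host, lines[i + 1])

    requested = {_short(h) for h in expected_hosts}
    problems = []
    for host in expected_hosts:
        entry = reported.get(_short(host))
        if entry is None:
            problems.append(f"{host}: no output")
        elif _short(entry[1]) != _short(host):
            problems.append(f"{host}: reported {entry[1]}")
    for key, (host, _) in reported.items():
        if key not in requested:
            problems.append(f"{host}: not requested")
    return problems


if __name__ == "__main__":
    sample = """\
<<Executing hostname on hostA>>
hostA
<<Executing hostname on hostB>>
hostB
"""
    print(verify_lsgrun_hostname(sample, ["hostA", "hostB", "hostC"]))
    # ['hostC: no output']
```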
lsgrun -v -m "hostA hostB hostC" hostname
<<Executing hostname on hostA>>
hostA
<<Executing hostname on hostB>>
hostB
<<Executing hostname on hostC>>
hostC
The lsclusters command
The lsclusters command displays cross-cluster configuration information. The status should be ok for your cluster.
lsclusters -l
CLUSTER_NAME  STATUS  MASTER_HOST  ADMIN     HOSTS  SERVERS
cluster1      ok      HostA        lsfadmin      4        4
LSF administrators: lsfadmin
Available resources: win2k
Available host types: WINX86
Available host models: UNKNOWN_AUTO_DETECT PC450
Accept jobs from this cluster: yes
Send jobs to this cluster: yes
For more information
- For more information about LSF commands, refer to Administering Platform LSF and the Platform LSF Reference.
LSF on Platform EGO
LSF on Platform EGO allows EGO to serve as the central resource broker, enabling enterprise applications to benefit from sharing of resources across the enterprise grid.
See Administering Platform LSF for more information about LSF on Platform EGO.
See Administering and Using Platform EGO for detailed information about EGO administration.
How to handle parameters in lsf.conf with corresponding parameters in ego.conf
When EGO is enabled, existing LSF parameters (parameter names beginning with LSB_ or LSF_) that are set only in lsf.conf operate as usual because LSF daemons and commands read both lsf.conf and ego.conf.
Some existing LSF parameters have corresponding EGO parameter names in ego.conf (LSF_CONFDIR\lsf.conf is a separate file from LSF_CONFDIR\ego\cluster_name\kernel\ego.conf). You can keep your existing LSF parameters in lsf.conf, or you can set the corresponding EGO parameters in ego.conf for any that have not already been set in lsf.conf.
You cannot set LSF parameters in ego.conf, but you can set the following EGO parameters related to LIM, PIM, and ELIM in either lsf.conf or ego.conf:
- EGO_DAEMONS_CPUS
- EGO_DEFINE_NCPUS
- EGO_SLAVE_CTRL_REMOTE_HOST
- EGO_WORKDIR
- EGO_PIM_SWAP_REPORT
You cannot set any other EGO parameters (parameter names beginning with EGO_) in lsf.conf. If EGO is not enabled, you can only set these parameters in lsf.conf.
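The placement rules in this section are mechanical enough to check with a script before you restart LSF. The following Python sketch encodes them as stated here: parameters beginning with LSF_ or LSB_ cannot be set in ego.conf, the only EGO_ parameters allowed in lsf.conf are the five listed above, and without EGO those five belong in lsf.conf. Reading parameter names out of the files is left out; `misplaced_parameters` is a hypothetical helper that takes the names you found in each file.

```python
# EGO parameters that may appear in either lsf.conf or ego.conf (list above).
EGO_PARAMS_ALLOWED_IN_LSF_CONF = {
    "EGO_DAEMONS_CPUS",
    "EGO_DEFINE_NCPUS",
    "EGO_SLAVE_CTRL_REMOTE_HOST",
    "EGO_WORKDIR",
    "EGO_PIM_SWAP_REPORT",
}


def misplaced_parameters(
    lsf_conf: set[str], ego_conf: set[str], ego_enabled: bool = True
) -> list[str]:
    """Return messages for parameter names that break the placement rules."""
    problems = []
    for name in sorted(ego_conf):
        if name.startswith(("LSF_", "LSB_")):
            problems.append(f"{name}: LSF parameters cannot be set in ego.conf")
        elif not ego_enabled and name in EGO_PARAMS_ALLOWED_IN_LSF_CONF:
            problems.append(f"{name}: EGO is not enabled, so set it in lsf.conf")
    for name in sorted(lsf_conf):
        if name.startswith("EGO_") and name not in EGO_PARAMS_ALLOWED_IN_LSF_CONF:
            problems.append(f"{name}: this EGO parameter cannot be set in lsf.conf")
    return problems


if __name__ == "__main__":
    print(misplaced_parameters({"LSF_LOG_MASK", "EGO_WORKDIR"}, {"LSB_DEBUG"}))
    # ['LSB_DEBUG: LSF parameters cannot be set in ego.conf']
```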
Note:
If you specify a parameter in lsf.conf and also specify the corresponding parameter in ego.conf, the value in ego.conf takes precedence over the conflicting value in lsf.conf.
If the parameter is set in neither lsf.conf nor ego.conf, the default that takes effect depends on whether EGO is enabled: if EGO is not enabled, the LSF default takes effect; if EGO is enabled, the EGO default takes effect. In most cases, the two defaults are the same.
Some parameters in lsf.conf do not have exactly the same behaviour, valid values, syntax, or default value as the corresponding parameter in ego.conf, so in general you should not set them in both files. If you need LSF parameters for backwards compatibility, set them only in lsf.conf.
If you have LSF 6.2 hosts in your cluster, they can only read lsf.conf, so you must set LSF parameters only in lsf.conf.
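The precedence rules in the note can be written as a small lookup. This Python sketch is illustrative only: it assumes you have already paired each LSF parameter with its corresponding EGO parameter and looked up both defaults, and that ego.conf is consulted only when EGO is enabled. The names `effective_value`, `LSF_EXAMPLE`, and `EGO_EXAMPLE` are hypothetical.

```python
from typing import Mapping, Optional


def effective_value(
    lsf_name: str,
    ego_name: str,
    lsf_conf: Mapping[str, str],
    ego_conf: Mapping[str, str],
    ego_enabled: bool,
    lsf_default: Optional[str],
    ego_default: Optional[str],
) -> Optional[str]:
    """Resolve a parameter that has both an LSF and an EGO name."""
    # A value in ego.conf takes precedence over a conflicting value in lsf.conf.
    # Assumption: ego.conf is only read when EGO is enabled.
    if ego_enabled and ego_name in ego_conf:
        return ego_conf[ego_name]
    if lsf_name in lsf_conf:
        return lsf_conf[lsf_name]
    # Set in neither file: the default depends on whether EGO is enabled.
    return ego_default if ego_enabled else lsf_default


if __name__ == "__main__":
    # Hypothetical parameter pair and values, for illustration only.
    value = effective_value(
        "LSF_EXAMPLE", "EGO_EXAMPLE",
        lsf_conf={"LSF_EXAMPLE": "1"}, ego_conf={"EGO_EXAMPLE": "2"},
        ego_enabled=True, lsf_default="0", ego_default="0",
    )
    print(value)  # 2 (the ego.conf value wins)
```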
LSF and EGO corresponding parameters
The following table summarizes existing LSF parameters that have corresponding EGO parameter names. You must continue to set other LSF parameters in lsf.conf.
Parameters that have changed in LSF
The default for LSF_LIM_PORT has changed to accommodate the EGO default port configuration. On EGO, default ports start with lim at 7869 and are numbered consecutively for pem, vemkd, and egosc.
This differs from previous LSF releases, where the default LSF_LIM_PORT was 6879. The res, mbatchd, and sbatchd daemons continue to use the pre-version 7 default ports 6878, 6881, and 6882, respectively.
Upgrade installation preserves existing port settings for lim, res, sbatchd, and mbatchd. EGO pem, vemkd, and egosc use default EGO ports starting at 7870, if they do not conflict with existing lim, res, sbatchd, and mbatchd ports.
EGO connection ports and base port
On every host, a set of connection ports must be free for use by LSF and EGO components.
LSF and EGO require exclusive use of certain ports for communication. EGO uses the same four consecutive ports on every host in the cluster. The first of these is called the base port.
The default EGO base connection port is 7869, so by default EGO uses the four ports 7869-7872.
You can change the ports EGO uses by changing the base port. For example, if the base port is 6880, EGO uses ports 6880-6883.
LSF and EGO need the same ports on every host, so you must specify the same base port on every host.
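The port arithmetic above can be checked before you choose a base port. The following Python sketch computes the four EGO ports for a given base port and reports any that collide with ports used by other components; the function names are hypothetical, and the set of other ports in the usage example is simply the pre-version 7 defaults 6878, 6881, and 6882 mentioned earlier in this section.

```python
EGO_PORT_COUNT = 4  # EGO uses four consecutive ports starting at the base port
DEFAULT_EGO_BASE_PORT = 7869


def ego_ports(base_port: int = DEFAULT_EGO_BASE_PORT) -> list[int]:
    """Return the consecutive ports EGO uses for a given base port."""
    if not 1 <= base_port <= 65535 - (EGO_PORT_COUNT - 1):
        raise ValueError(f"base port {base_port} leaves no room for {EGO_PORT_COUNT} ports")
    return list(range(base_port, base_port + EGO_PORT_COUNT))


def port_conflicts(base_port: int, other_ports: set[int]) -> list[int]:
    """Return the EGO ports that are also used by other components."""
    return [port for port in ego_ports(base_port) if port in other_ports]


if __name__ == "__main__":
    print(ego_ports())      # [7869, 7870, 7871, 7872]
    print(ego_ports(6880))  # [6880, 6881, 6882, 6883]
    legacy_ports = {6878, 6881, 6882}
    print(port_conflicts(6880, legacy_ports))  # [6881, 6882]
```

As the last line shows, a base port of 6880 would overlap two of those ports, so if your cluster still uses them you would pick a base port outside that range.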
Checking the LSF batch system
To check the LSF batch system, complete the following steps:
- Verify the LSF batch daemon configuration using the badmin command.
- Check the LSF batch system by running a few basic commands: bhosts, bqueues, bsub, bjobs.
- To perform these checks, LIM and mbatchd must be running on the master host and on the submission host, which is the host from which you are running the command. See Start the cluster for information about starting LSF services.
- Refer to the Platform LSF Reference for an explanation of the output of the LSF commands discussed in this section.
Verify the LSF batch daemon configuration
The badmin command
The badmin command controls and monitors the operation of the LSF Batch system. Use the badmin ckconfig command to check the LSF Batch configuration files. The -v option displays detailed information about the configuration:
C:\LSF_7.0>badmin ckconfig -v
Checking configuration files ...
---------------------------------------------------------
No errors found.
The messages shown above are the normal output from badmin ckconfig -v. Other messages may indicate problems with the Platform LSF Batch configuration. Refer to the Platform LSF Reference for help with some common configuration errors.
Display batch hosts
The bhosts command
The bhosts command displays the status of batch server hosts in the cluster. The status should be ok for all hosts in your cluster.
C:\lsf\bin>bhosts
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA      ok      -     -    0      0    0      0      0
hostB      ok      -     -    0      0    0      0      0
hostC      ok      -     -    0      0    0      0      0
hostD      ok      -     -    0      0    0      0      0
Display batch queues
The bqueues command
The bqueues command displays the available queues and their configuration parameters. For a queue to accept and dispatch jobs, its status must be Open:Active. Queue information displayed by bqueues is configured in LSB_CONFDIR\cluster_name\configdir\lsb.queues.
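Since a queue accepts and dispatches jobs only when its status is Open:Active, a script can flag any queue in another state. A minimal Python sketch, assuming the bqueues column layout shown in the example that follows (a header row with QUEUE_NAME and STATUS columns, and '-' in empty cells); `inactive_queues` is a hypothetical name.

```python
def inactive_queues(bqueues_output: str) -> dict[str, str]:
    """Return {queue: status} for every queue whose STATUS is not Open:Active.

    Assumes the first non-blank line is the bqueues header row and that
    columns are separated by whitespace, with '-' filling empty cells.
    """
    lines = [line for line in bqueues_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    name_col = header.index("QUEUE_NAME")  # raises ValueError if the header differs
    status_col = header.index("STATUS")
    result: dict[str, str] = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > status_col and fields[status_col] != "Open:Active":
            result[fields[name_col]] = fields[status_col]
    return result


if __name__ == "__main__":
    # Made-up sample: the night queue is closed and inactive.
    sample = """\
QUEUE_NAME PRIO STATUS MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
normal 30 Open:Active - - - - 0 0 0 0
night 40 Closed:Inact - - - - 0 0 0 0
"""
    print(inactive_queues(sample))  # {'night': 'Closed:Inact'}
```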
C:\lsf\bin>bqueues
QUEUE_NAME       PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
owners           43    Open:Active  -    6     -     -     0      0     0    0
priority         43    Open:Active  -    -     -     -     0      0     0    0
night            40    Open:Active  -    -     -     -     0      0     0    0
chkpnt_rerun_qu  40    Open:Active  -    -     -     -     0      0     0    0
short            35    Open:Active  -    -     -     -     0      0     0    0
license          33    Open:Active  -    -     -     -     0      0     0    0
normal           30    Open:Active  -    -     -     -     0      0     0    0
idle             20    Open:Active  -    -     -     -     0      0     0    0
Display the default batch queue
The bparams command
The bparams command displays information about the LSF Batch configuration parameters. Use bparams to display the name of the default queue:
C:\lsf\bin>bparams
Default Queues:  normal
Job Dispatch Interval:  20 seconds
Job Checking Interval:  15 seconds
Job Accepting Interval:  20 seconds
The DEFAULT_QUEUE parameter in LSB_CONFDIR\cluster_name\configdir\lsb.params defines which queue is the default queue.
Submit a test job
The bsub command
The bsub command submits jobs to LSF queues.
For example, the following command submits a sleep job to the default queue named normal:
C:\lsf\7.0\bin> bsub sleep 60
Job <1> is submitted to default queue <normal>.
Display batch jobs
The bjobs command
The bjobs command displays job status. The bjobs -l option displays jobs in the batch system in long format, with detailed information. Use bjobs -w to display the full user name, including the domain name.
C:\lsf\7.0\bin> bjobs
JOBID  USER      STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
1      lsfadmin  RUN   normal  hostA      hostB      sleep 60  Jan 5 17:39:58
If all hosts are busy, the job is not started immediately and the STAT column says PEND. The job sleep 60 should take one minute to run. When the job completes, LSF sends mail reporting the job completion.
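The submit-and-watch test in this section can be automated. The following Python sketch runs the same `bsub sleep 60` command, reads the job ID from the `Job <N> is submitted` line, and polls bjobs until the job leaves the PEND and RUN states. It assumes bjobs prints a header and job row like the example above when given a job ID, and that a finished job is then reported as DONE (or EXIT if it failed); the polling interval, timeout, and helper names are hypothetical. The runner is injectable so the logic can be exercised without a cluster.

```python
import re
import subprocess
import time
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], str]


def run_command(args: Sequence[str]) -> str:
    """Run a command and return its stdout and stderr as one string."""
    result = subprocess.run(list(args), stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True, check=False)
    return result.stdout


def submit(command: Sequence[str], runner: Runner = run_command) -> int:
    """Submit a job with bsub and return its job ID."""
    output = runner(["bsub", *command])
    match = re.search(r"Job <(\d+)> is submitted", output)
    if match is None:
        raise RuntimeError(f"unexpected bsub output: {output!r}")
    return int(match.group(1))


def job_status(job_id: int, runner: Runner = run_command) -> Optional[str]:
    """Return the STAT column for job_id from bjobs output, or None if absent."""
    for line in runner(["bjobs", str(job_id)]).splitlines():
        fields = line.split()
        # Job rows look like: JOBID USER STAT QUEUE FROM_HOST EXEC_HOST ...
        if len(fields) > 2 and fields[0] == str(job_id):
            return fields[2]
    return None


def wait_for_job(job_id: int, runner: Runner = run_command,
                 poll_seconds: float = 10.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll bjobs until the job is no longer PEND or RUN; return the last status."""
    deadline = time.monotonic() + timeout
    status = job_status(job_id, runner)
    while status in ("PEND", "RUN") and time.monotonic() < deadline:
        sleep(poll_seconds)
        status = job_status(job_id, runner)
    return status
```

For example, `wait_for_job(submit(["sleep", "60"]))` should return DONE about a minute after the job starts. Any other result, or `None` if bjobs no longer lists the job, is worth checking by hand with bjobs -l.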
For more information
- For more information about LSF commands, refer to Administering Platform LSF and Platform LSF Reference.
Test the Platform Management Console (PMC)
- Browse to the web server URL and log in to the PMC as user Admin with password Admin.
  - If you have only one management host (the master host), the web server URL is http://master_host:8080/platform.
  - If you have multiple management hosts, locate the web server:
    - Log on as lsfadmin and run egosh client view. This command locates the PMC; it is only needed if EGO is enabled.
    - Scan the client list for a name beginning with GUIURL, such as GUIURL_HostW.
    - The additional information shows the web server URL; for example, http://Host_W:8080/platform.
- As a security measure, use the PMC to change the Admin and Guest account passwords from the simple default passwords, Admin and Guest.
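After logging in once by hand, you may want a quick reachability check for the PMC URL, for example from a monitoring script. This Python sketch builds the single-management-host URL form given above and requests it with the standard library; it only confirms that a web server answers over HTTP, not that the login works. `pmc_url` and `pmc_reachable` are hypothetical names, and `master_host` stands for your own host name.

```python
import urllib.error
import urllib.request


def pmc_url(master_host: str, port: int = 8080) -> str:
    """Build a PMC URL of the form http://master_host:8080/platform."""
    return f"http://{master_host}:{port}/platform"


def pmc_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a web server answers the URL with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, just with an error status code
    except OSError:
        return False  # connection refused, timed out, unknown host, ...
```

For example, `pmc_reachable(pmc_url("hostA"))` returns `False` if nothing is listening on port 8080 of hostA.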
Platform Computing Inc.
www.platform.com