Platform Computing Corp.

Managing LSF on Platform EGO

Contents

About LSF on Platform EGO

LSF on Platform EGO allows EGO to serve as the central resource broker, enabling enterprise applications to benefit from sharing of resources across the enterprise grid.

What is Platform EGO?

Platform Enterprise Grid Orchestrator (EGO) allows developers, administrators, and users to treat a collection of distributed software and hardware resources on a shared computing infrastructure (cluster) as parts of a single virtual computer.

EGO assesses the demands of competing business services (consumers) operating within a cluster and dynamically allocates resources so as to best meet a company's overriding business objectives. These objectives might include

Platform EGO also provides a full suite of services to support and manage resource orchestration. These include cluster management, configuration and auditing of service-level plans, resource facilitation to provide failover if a master host goes down, monitoring, and data distribution.

EGO is only sensitive to the resource requirements of business services; EGO has no knowledge of any run-time dynamic parameters that exist for them. This means that EGO does not interfere with how a business service chooses to use the resources it has been allocated.

How does Platform EGO work?

Platform products work in various ways to match business service (consumer) demands for resources with an available supply of resources. While a specific clustered application manager or consumer (for example, an LSF cluster) identifies what its resource demands are, Platform EGO is responsible for supplying those resources. Platform EGO determines the number of resources each consumer is entitled to, takes into account a consumer's priority and overall objectives, and then allocates the number of required resources (for example, the number of slots, virtual machines, or physical machines).

Once the consumer receives its allotted resources from Platform EGO, the consumer applies its own rules and policies. How the consumer decides to balance its workload across the fixed resources allotted to it is not the responsibility of EGO.

So how does Platform EGO know the demand? Administrators or developers use various EGO interfaces (such as the SDK or CLI) to tell EGO what constitutes a demand for more resources. When Platform EGO identifies that there is demand, it then distributes the required resources based on the resource plans given to it by the administrator or developer.

For all of this to happen smoothly, various components are built into Platform EGO. Each EGO component performs a specific job.

Platform EGO components

Platform EGO comprises a collection of cluster orchestration software components. The following figure shows the overall architecture and how these components fit within a larger system installation and interact with each other:

Key EGO concepts

Consumers

A consumer represents an entity that can demand resources from the cluster. A consumer might be a business service, a business process that is a complex collection of business services, an individual user, or an entire line of business.

EGO resources

Resources are physical and logical entities that can be requested by a client. For example, an application (client) requests a processor (resource) in order to run.

Resources also have attributes. For example, a host has attributes such as memory, processor utilization, and operating system type.

Resource distribution tree

The resource distribution tree identifies consumers of the cluster resources, and organizes them into a manageable structure.

Resource groups

Resource groups are logical groups of hosts. Resource groups provide a simple way of organizing and grouping resources (hosts) for convenience; instead of creating policies for individual resources, you can create and apply them to an entire group. Groups can be made of resources that satisfy a specific requirement in terms of OS, memory, swap space, CPU factor and so on, or that are explicitly listed by name.

Resource distribution plans

The resource distribution plan, or resource plan, defines how cluster resources are distributed among consumers. The plan takes into account the differences between consumers and their needs, resource properties, and various other policies concerning consumer rank and the allocation of resources.

The distribution priority is to satisfy each consumer's reserved ownership, then distribute remaining resources to consumers that have demand.

Services

A service is a self-contained, continuously running process that accepts one or more requests and returns one or more responses. Services may have multiple concurrent service instances running on multiple hosts. All Platform EGO services are enabled by default at installation.

Run egosh to check service status.

If EGO is disabled, the egosh command cannot find ego.conf or cannot contact vemkd (which is not started), and the following message is displayed:

You cannot run the egosh command because the administrator has chosen 
not to enable EGO in lsf.conf: LSF_ENABLE_EGO=N. 
EGO user accounts

A user account is a Platform system user who can be assigned to any role for any consumer in the tree. User accounts include optional contact information, a name, and a password.

LSF and EGO directory structure

The following tables describe the purpose of each subdirectory and whether it is writable or non-writable by LSF.

LSF_TOP

Directory Path
Description
Attribute
LSF_TOP/7.0
LSF 7.0 binaries and other machine-dependent files
Non-writable
LSF_TOP/conf
LSF 7.0 configuration files
Writable by the LSF administrator, master host, and master candidate hosts (you must be the LSF administrator or root to edit files in this directory)
LSF_TOP/log
LSF 7.0 log files
Writable by all hosts in the cluster
LSF_TOP/work
LSF 7.0 working directory
Writable by the master host and master candidate hosts, and accessible to slave hosts

EGO, GUI, and PERF directories

Directory Path
Description
Attribute
LSF_BINDIR
EGO binaries and other machine-dependent files
Non-writable
LSF_LOGDIR/ego/cluster_name/eservice
(EGO_ESRVDIR)
EGO services configuration and log files.
Writable
LSF_LOGDIR/ego/cluster_name/kernel
(EGO_CONFDIR, LSF_EGO_ENVDIR)
EGO kernel configuration, log files and working directory, including conf/log/work
Writable
LSB_SHAREDIR/cluster_name/ego (EGO_WORKDIR)
EGO working directory
Writable
LSF_TOP/perf/1.2
PERF commands, library and schema
Non-writable
LSF_LOGDIR/perf/cluster_name/conf
(PERF_CONFDIR)
PERF configuration
Writable
LSB_SHAREDIR/cluster_name/perf/data
(PERF_DATADIR)
PERF embedded data files for the Derby database
Writable
LSF_TOP/perf/1.2/etc
PERF script command for services
Non-writable
LSF_TOP/log/perf
(PERF_LOGDIR)
PERF log files
Writable
LSB_SHAREDIR/cluster_name/perf
(PERF_WORKDIR)
PERF working directory
Writable
LSF_TOP/jre
Java Runtime Environment
Non-writable
LSF_TOP/gui
GUI
Non-writable
LSF_LOGDIR/gui/cluster_name/conf
(GUI_CONFDIR)
GUI configuration
Writable
LSB_SHAREDIR/cluster_name/gui
(CATALINA_WORKDIR, CATALINA_TMPDIR)
GUI working directory
Writable
LSF_TOP/log/gui
(GUI_LOGDIR)
GUI log files
Writable
LSF_TOP/gui/2.0/
GUI binaries and tomcat
Non-writable
LSF_TOP/gui/2.0/tomcat
Tomcat web server
Writable

note:  
Several directories under LSF_TOP/gui/2.0/tomcat are writable by the Tomcat server. You should install the whole Tomcat directory on a writable file system.

Example directory structures

UNIX and Linux

The following figures show typical directory structures for a new UNIX or Linux installation with lsfinstall. Depending on which products you have installed and platforms you have selected, your directory structure may vary.

Microsoft Windows

The following diagram shows an example directory structure for a Windows installation.

Configuring LSF and EGO

EGO configuration files for LSF daemon management (res.xml and sbatchd.xml)

The following files are located in EGO_ESRVDIR/esc/conf/services/:

When LSF daemon control through EGO Service Controller is configured, lsadmin uses the reserved EGO service name res to control the LSF res daemon, and badmin uses the reserved EGO service name sbatchd to control the LSF sbatchd daemon.

How to handle parameters in lsf.conf with corresponding parameters in ego.conf

When EGO is enabled, existing LSF parameters (parameter names beginning with LSB_ or LSF_) that are set only in lsf.conf operate as usual because LSF daemons and commands read both lsf.conf and ego.conf.

Some existing LSF parameters have corresponding EGO parameter names in ego.conf (LSF_CONFDIR/lsf.conf is a separate file from LSF_CONFDIR/ego/cluster_name/kernel/ego.conf). You can keep your existing LSF parameters in lsf.conf, or you can set the corresponding EGO parameters in ego.conf if they have not already been set in lsf.conf.

You cannot set LSF parameters in ego.conf, but you can set the following EGO parameters related to LIM, PIM, and ELIM in either lsf.conf or ego.conf:

You cannot set any other EGO parameters (parameter names beginning with EGO_) in lsf.conf. If EGO is not enabled, you can set these LIM, PIM, and ELIM parameters only in lsf.conf.

note:  
If you specify a parameter in lsf.conf and you also specify the corresponding parameter in ego.conf, the parameter value in ego.conf takes precedence over the conflicting parameter in lsf.conf.
If the parameter is not set in either lsf.conf or ego.conf, the default that takes effect depends on whether EGO is enabled. If EGO is not enabled, the LSF default takes effect. If EGO is enabled, the EGO default takes effect. In most cases, the defaults are the same.
Some parameters in lsf.conf do not have exactly the same behavior, valid values, syntax, or default value as the corresponding parameter in ego.conf, so in general, you should not set them in both files. If you need LSF parameters for backwards compatibility, you should set them only in lsf.conf.
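As a purely hypothetical illustration of this precedence rule (the parameter pair and values below are made up for the example, not read from any real configuration file), the effective value resolves like this:

```shell
# Hypothetical values for a corresponding LSF/EGO parameter pair.
lsf_conf_value="LOG_WARNING"   # e.g. LSF_LOG_MASK as set in lsf.conf
ego_conf_value="LOG_DEBUG"     # e.g. EGO_LOG_MASK as set in ego.conf

# When both are set, the ego.conf value takes precedence;
# otherwise the lsf.conf value is used.
effective=${ego_conf_value:-$lsf_conf_value}
echo "effective log mask: $effective"
```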

If you have LSF 6.2 hosts in your cluster, they can only read lsf.conf, so you must set LSF parameters only in lsf.conf.

LSF and EGO corresponding parameters

The following table summarizes existing LSF parameters that have corresponding EGO parameter names. You must continue to set other LSF parameters in lsf.conf.

lsf.conf parameter
ego.conf parameter
LSF_API_CONNTIMEOUT
EGO_LIM_CONNTIMEOUT
LSF_API_RECVTIMEOUT
EGO_LIM_RECVTIMEOUT
LSF_CLUSTER_ID (Windows)
EGO_CLUSTER_ID (Windows)
LSF_CONF_RETRY_INT
EGO_CONF_RETRY_INT
LSF_CONF_RETRY_MAX
EGO_CONF_RETRY_MAX
LSF_DEBUG_LIM
EGO_DEBUG_LIM
LSF_DHPC_ENV
EGO_DHPC_ENV
LSF_DYNAMIC_HOST_TIMEOUT
EGO_DYNAMIC_HOST_TIMEOUT
LSF_DYNAMIC_HOST_WAIT_TIME
EGO_DYNAMIC_HOST_WAIT_TIME
LSF_ENABLE_DUALCORE
EGO_ENABLE_DUALCORE
LSF_GET_CONF
EGO_GET_CONF
LSF_GETCONF_MAX
EGO_GETCONF_MAX
LSF_LIM_DEBUG
EGO_LIM_DEBUG
LSF_LIM_PORT
EGO_LIM_PORT
LSF_LOCAL_RESOURCES
EGO_LOCAL_RESOURCES
LSF_LOG_MASK
EGO_LOG_MASK
LSF_MASTER_LIST
EGO_MASTER_LIST
LSF_PIM_INFODIR
EGO_PIM_INFODIR
LSF_PIM_SLEEPTIME
EGO_PIM_SLEEPTIME
LSF_PIM_SLEEPTIME_UPDATE
EGO_PIM_SLEEPTIME_UPDATE
LSF_RSH
EGO_RSH
LSF_STRIP_DOMAIN
EGO_STRIP_DOMAIN
LSF_TIME_LIM
EGO_TIME_LIM

Parameters that have changed in LSF 7

The default for LSF_LIM_PORT has changed to accommodate the EGO default port configuration. On EGO, default ports start with lim at 7869, and are numbered consecutively for pem, vemkd, and egosc.

This is different from previous LSF releases where the default LSF_LIM_PORT was 6879. res, sbatchd, and mbatchd continue to use the default pre-version 7 ports 6878, 6881, and 6882.

Upgrade installation preserves any existing port settings for lim, res, sbatchd, and mbatchd. EGO pem, vemkd, and egosc use default EGO ports starting at 7870, if they do not conflict with existing lim, res, sbatchd, and mbatchd ports.

EGO connection ports and base port

On every host, a set of connection ports must be free for use by LSF and EGO components.

LSF and EGO require exclusive use of certain ports for communication. EGO uses the same four consecutive ports on every host in the cluster. The first of these is called the base port.

The default EGO base connection port is 7869, so by default EGO uses ports 7869-7872.

You can customize the ports by changing the base port. For example, if the base port is 6880, EGO uses ports 6880-6883.

LSF and EGO need the same ports on every host, so you must specify the same base port on every host.
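The port arithmetic above can be sketched as follows (7869 is the documented default base port; the assignment of the three following ports to pem, vemkd, and egosc follows the consecutive numbering described earlier):

```shell
# Derive the four consecutive EGO connection ports from the base port.
# lim listens on the base port; pem, vemkd, and egosc take the next three.
base_port=7869   # documented default base port
ports="$base_port $((base_port + 1)) $((base_port + 2)) $((base_port + 3))"
echo "EGO connection ports: $ports"
# Changing the base port shifts the whole range, e.g. base 6880 gives 6880-6883.
```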

Special resource groups for LSF master hosts

By default, Platform LSF installation defines a special resource group named ManagementHosts for the Platform LSF master host. (In general, Platform LSF master hosts are dedicated hosts; the ManagementHosts EGO resource group serves this purpose.)

Platform LSF master hosts must not be subject to any lend, borrow, or reclaim policies. They must be exclusively owned by the Platform LSF consumer.

The default Platform EGO configuration is such that the LSF_MASTER_LIST hosts and the execution hosts are in different resource groups so that different resource plans can be applied to each group.

Managing LSF daemons through EGO

EGO daemons

Daemons in LSF_SERVERDIR
Description
vemkd
Started by lim on master host
pem
Started by lim on every host
egosc
Started by vemkd on master host

LSF daemons

Daemons in LSF_SERVERDIR
Description
lim
lim runs on every host. On UNIX, lim is either started by lsadmin through rsh/ssh or started through the rc file. On Windows, lim is started as a Windows service.
pim
Started by lim on every host
mbatchd
Started by sbatchd on master host
mbschd
Started by mbatchd on master host
sbatchd
Under OS startup mode, sbatchd is either started by lsadmin through rsh/ssh or started through the rc file on UNIX. On Windows, sbatchd is started as a Windows service.
Under EGO Service Controller mode, sbatchd is started by pem as an EGO service on every host.
res
Under OS startup mode, res is either started by lsadmin through rsh/ssh or started through the rc file on UNIX. On Windows, res is started as a Windows service.
Under EGO Service Controller mode, res is started by pem as an EGO service on every host.

Operating System daemon control

Operating system startup mode is the same as in previous releases:

EGO Service Controller daemon control

Under EGO Service Control mode, administrators configure the EGO Service Controller to start res and sbatchd, and restart them if they fail.

You can still run lsadmin and badmin to start LSF manually, but internally, lsadmin and badmin communicate with the EGO Service Controller, which actually starts sbatchd and res as EGO services.

If EGO Service Controller management is configured and you run badmin hshutdown and lsadmin resshutdown to manually shut down LSF, the LSF daemons are not restarted automatically by EGO. You must run lsadmin resstartup and badmin hstartup to start the LSF daemons manually.

Permissions required for daemon control

To control all daemons in the cluster, you must

Bypass EGO login at startup (lsf.sudoers)

Prerequisites: You must be the LSF administrator (lsfadmin) or root to configure lsf.sudoers.

When LSF daemon control through EGO Service Controller is configured, users must have EGO credentials for EGO to start res and sbatchd services. By default, lsadmin and badmin invoke the egosh user logon command to prompt for the user name and password of the EGO administrator to get EGO credentials.

  1. Configure lsf.sudoers to bypass EGO login to start res and sbatchd automatically.
  2. Set the following parameters:
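For example, lsf.sudoers might then contain entries like the following (LSF_EGO_ADMIN_USER and LSF_EGO_ADMIN_PASSWD are the parameter names used by LSF 7 for this purpose; verify them against your release, and keep lsf.sudoers owned by and readable only by root):

```
LSF_EGO_ADMIN_USER=Admin
LSF_EGO_ADMIN_PASSWD=<EGO administrator password>
```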

EGO control of PMC and PERF services

When EGO is enabled in the cluster, EGO may control services for components such as the Platform Management Console (PMC) or LSF Reports (PERF). This is recommended. It allows failover among multiple management hosts, and allows EGO cluster commands to start, stop, and restart the services.

PMC not controlled by EGO

For PMC, if it is not controlled by EGO, you must specify the host to run PMC. Use the pmcadmin command to start and stop PMC. Use the pmcsetrc.sh command to enable automatic startup on the host (the daemon will restart if the host is restarted).

PERF services not controlled by EGO

For PERF, if the services are not controlled by EGO, you must specify the host to run PERF services plc, jobdt, and purger. Use the perfadmin command to start and stop these services on the host. Use the perfsetrc.sh command to enable automatic startup of these services on the host (the daemons will restart if the host is restarted). If the PERF host is not the same as the Derby database host, run the same commands on the Derby database host to control derbydb.

Administrative Basics

See Administering and Using Platform EGO for detailed information about EGO administration.

Set the command-line environment

On Linux hosts, set the environment before you run any LSF or EGO commands. You need to do this once for each session you open. root, lsfadmin, and egoadmin accounts use LSF and EGO commands to configure and start the cluster.

You need to reset the environment if the environment changes during your session, for example, if you run egoconfig mghost, which changes the location of some configuration files.
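In an sh-family shell, the setup step typically looks like this (the LSF_TOP path is illustrative; substitute your installation path, and C shell users source cshrc.lsf instead):

```shell
# Source the LSF environment setup script once per session.
# /usr/share/lsf is an illustrative LSF_TOP, not a required location.
LSF_TOP=/usr/share/lsf
if [ -f "$LSF_TOP/conf/profile.lsf" ]; then
  . "$LSF_TOP/conf/profile.lsf"
else
  echo "profile.lsf not found under $LSF_TOP/conf" >&2
fi
```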

If Platform EGO is enabled in the LSF cluster (LSF_ENABLE_EGO=Y and LSF_EGO_ENVDIR are defined in lsf.conf), cshrc.lsf and profile.lsf set the following environment variables:

See the Platform EGO Reference for more information about these variables.

See the Platform LSF Configuration Reference for more information about cshrc.lsf and profile.lsf.

Logging and troubleshooting

LSF log files

LSF event and account log location

LSF uses directories for temporary work files, log files, transaction files, and spooling.

LSF keeps track of all jobs in the system by maintaining a transaction log in the work subtree. The LSF log files are found in the directory LSB_SHAREDIR/cluster_name/logdir.

The following files maintain the state of the LSF system:

lsb.events

LSF uses the lsb.events file to keep track of the state of all jobs. Each job is a transaction from job submission to job completion. The LSF system keeps track of everything associated with the job in the lsb.events file.

lsb.events.n

The events file is automatically trimmed and old job events are stored in lsb.events.n files. When mbatchd starts, it refers only to the lsb.events file, not the lsb.events.n files. The bhist command can refer to these files.

LSF error log location

If the optional LSF_LOGDIR parameter is defined in lsf.conf, error messages from LSF servers are logged to files in this directory.

If LSF_LOGDIR is defined, but the daemons cannot write to files there, the error log files are created in /tmp.

If LSF_LOGDIR is not defined, errors are logged to the system error logs (syslog) using the LOG_DAEMON facility. syslog messages are highly configurable, and the default configuration varies widely from system to system. Start by looking for the file /etc/syslog.conf, and read the man pages for syslog(3) and syslogd(1).

If the error log is managed by syslog, it is probably already being automatically cleared.

If LSF daemons cannot find lsf.conf when they start, they will not find the definition of LSF_LOGDIR. In this case, error messages go to syslog. If you cannot find any error messages in the log files, they are likely in the syslog.
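The fallback order described above can be sketched as shell logic (illustrative only; the daemons implement this internally):

```shell
# Illustrative decision: where LSF daemon error messages end up.
LSF_LOGDIR=$(mktemp -d)   # stand-in for a configured, writable log directory

if [ -z "$LSF_LOGDIR" ]; then
  dest="syslog"           # LSF_LOGDIR not defined: errors go to syslog
elif [ ! -w "$LSF_LOGDIR" ]; then
  dest="/tmp"             # defined but not writable: log files fall back to /tmp
else
  dest="$LSF_LOGDIR"      # defined and writable: error log files go here
fi
echo "error messages go to: $dest"
```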

LSF daemon error logs

LSF log files are reopened each time a message is logged, so if you rename or remove a daemon log file, the daemons will automatically create a new log file.

The LSF daemons log messages when they detect problems or unusual situations.

The daemons can be configured to put these messages into files.

The error log file names for the LSF system daemons are:

LSF daemons log error messages in different levels so that you can choose to log all messages, or only log messages that are deemed critical. Message logging for LSF daemons is controlled by the parameter LSF_LOG_MASK in lsf.conf. Possible values for this parameter can be any log priority symbol that is defined in /usr/include/sys/syslog.h. The default value for LSF_LOG_MASK is LOG_WARNING.
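For example, to log debug-level and more serious messages while troubleshooting, set the following in lsf.conf:

```
LSF_LOG_MASK=LOG_DEBUG
```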

LSF log directory permissions and ownership

Ensure that the LSF_LOGDIR directory is writable by root. The LSF administrator must own LSF_LOGDIR.

EGO log files

Log files contain important run-time information about the general health of EGO daemons, workload submissions, and other EGO system events. Log files are an essential troubleshooting tool during production and testing.

The naming convention for most EGO log files is the name of the daemon plus the host name the daemon is running on.

The following table outlines the daemons and their associated log file names. Log files on Windows hosts have a .txt extension.

Daemon
Log file name
ESC (EGO Service Controller)
esc.log.hostname
named
named.log.hostname
PEM (Process Execution Manager)
pem.log.hostname
VEMKD (Platform EGO kernel daemon)
vemkd.log.hostname
WSM (Platform Management Console/WEBGUI)
wsm.log.hostname
WSG (Web Service Gateway)
wsg.log

Most log entries are informational in nature. It is not uncommon to have a large (and growing) log file and still have a healthy cluster.

EGO log file locations

By default, most Platform EGO log files are found in LSF_LOGDIR.

EGO log entry format

Log file entries follow the format

date time_zone log_level [process_id:thread_id] action:description/message 

where the date is expressed in YYYY-MM-DD hh:mm:ss.sss.

For example, 2006-03-14 11:02:44.000 Eastern Standard Time ERROR [2488:1036] vemkdexit: vemkd is halting.
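As a quick illustration of this format, the log level can be extracted from the example entry with standard text tools (the sed pattern keys on the [process_id:thread_id] field that follows the level):

```shell
entry='2006-03-14 11:02:44.000 Eastern Standard Time ERROR [2488:1036] vemkdexit: vemkd is halting.'

# The log level is the token immediately before the [process_id:thread_id] field.
level=$(printf '%s\n' "$entry" | sed -n 's/.* \([A-Z_]*\) \[[0-9]*:[0-9]*\].*/\1/p')
echo "log level: $level"
```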

EGO log classes

Every log entry belongs to a log class. You can use log class as a mechanism to filter log entries by area. Log classes in combination with log levels allow you to troubleshoot using log entries that only address, for example, configuration.

Log classes are adjusted at run time using egosh debug.

Valid logging classes are as follows:

Class
Description
LC_ALLOC
Logs messages related to the resource allocation engine
LC_AUTH
Logs messages related to users and authentication
LC_CLIENT
Logs messages related to clients
LC_COMM
Logs messages related to communications
LC_CONF
Logs messages related to configuration
LC_CONTAINER
Logs messages related to activities
LC_EVENT
Logs messages related to the event notification service
LC_MEM
Logs messages related to memory allocation
LC_PEM
Logs messages related to the process execution manager (pem)
LC_PERF
Logs messages related to performance
LC_QUERY
Logs messages related to client queries
LC_RECOVER
Logs messages related to recovery and data persistence
LC_RSRC
Logs messages related to resources, including host status changes
LC_SYS
Logs messages related to system calls
LC_TRACE
Logs the steps of the program

EGO log levels

There are nine log levels that allow administrators to control the level of event information that is logged.

When you are troubleshooting, increase the log level to obtain as much detailed information as you can. When you are finished troubleshooting, decrease the log level to prevent the log files from becoming too large.

Valid logging levels are as follows:

Number
Level
Description
0
LOG_EMERG
Log only those messages indicating that the system is unusable.
1
LOG_ALERT
Log only those messages for which action must be taken immediately.
2
LOG_CRIT
Log only those messages that are critical.
3
LOG_ERR
Log only those messages that indicate error conditions.
4
LOG_WARNING
Log only those messages that are warnings or more serious messages. This is the default level of debug information.
5
LOG_NOTICE
Log those messages that indicate normal but significant conditions or warnings and more serious messages.
6
LOG_INFO
Log all informational messages and more serious messages.
7
LOG_DEBUG
Log all debug-level messages.
8
LOG_TRACE
Log all available messages.

EGO log level and class information retrieved from configuration files

When EGO is enabled, the pem and vemkd daemons read ego.conf to retrieve the following information (as corresponds to the particular daemon):

The wsm daemon reads wsm.conf to retrieve the following information:

The wsg daemon reads wsg.conf to retrieve the following information:

The service director daemon (named) reads named.conf to retrieve the following information:

Why do log files grow so quickly?

Every time an EGO system event occurs, a log file entry is added to a log file. Most entries are informational in nature, except when there is an error condition. If your log levels provide entries for all information (for example, if you have set them to LOG_DEBUG), the files will grow quickly.

Suggested settings:

tip:  
If your log files are too long, you can rename them for archive purposes. New log files are then created automatically and log all new events.
How often should I maintain log files?

The growth rate of the log files is dependent on the log level and the complexity of your cluster. If you have a large cluster, daily log file maintenance may be required.

We recommend using a log file rotation utility to perform unattended maintenance of your log files. Failure to perform timely maintenance could result in a full file system, which hinders system performance and operation.
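A minimal manual rotation step might look like this (the directory and file names are illustrative; a production cluster would typically use logrotate or a similar utility). Because the daemons reopen their log files on each message, renaming a live log file is safe:

```shell
# Illustrative rotation of a daemon log file in a scratch directory.
logdir=$(mktemp -d)                         # stands in for LSF_LOGDIR
echo "old entries" > "$logdir/vemkd.log.hostA"

# Archive the current file; the daemon creates a fresh one on its next message.
mv "$logdir/vemkd.log.hostA" "$logdir/vemkd.log.hostA.$(date +%Y%m%d)"
```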

Troubleshoot using multiple EGO log files

EGO log file locations and content

If a service does not start as expected, open the appropriate service log file and review the run-time information contained within it to discover the problem. Look for relevant entries such as insufficient disk space, lack of memory, or network problems that result in unavailable hosts.

Log file
Default location
What it contains
catalina.out
Linux: LSF_LOGDIR/gui/catalina.out
Windows: LSF_LOGDIR\gui\catalina.out
Logs system errors and debug information from Tomcat web server startup.
esc.log
Linux: LSF_LOGDIR/ego/cluster_name/eservice/esc/log/esc.log.hostname
Windows: LSF_LOGDIR\ego\cluster_name\eservice\esc\log\esc.log.hostname
Logs service failures and service instance restarts based on availability plans. Errors surrounding Platform Management Console startup are logged here.
named.log
Linux: LSF_LOGDIR/ego/cluster_name/eservice/esd/conf/named/namedb/named.log.hostname
Windows: LSF_LOGDIR\ego\cluster_name\eservice\esd\conf\named\namedb\named.log.hostname
Logs information gathered during the updating and querying of service instance location; logged by BIND, a DNS server.
pem.log
Linux: LSF_LOGDIR/pem.log.hostname
Windows: LSF_LOGDIR\pem.log.hostname
Logs remote operations (start, stop, control activities, failures). Logs tracked results for resource utilization of all processes associated with the host, and information for accounting or chargeback.
vemkd.log
Linux: LSF_LOGDIR/vemkd.log.hostname
Windows: LSF_LOGDIR\vemkd.log.hostname
Logs aggregated host information about the state of individual resources, the status of allocation requests, the consumer hierarchy, the assignment of resources to consumers, and started operating system-level processes.
wsg.log
Linux: LSF_LOGDIR/ego/cluster_name/eservice/wsg/log/wsg.log.hostname
Windows: LSF_LOGDIR\ego\cluster_name\eservice\wsg\log\wsg.log.hostname
Logs service failures surrounding web services interfaces for web service clients (applications).
wsm.log
Linux: LSF_LOGDIR/gui/wsm.log.hostname
Windows: LSF_LOGDIR\gui\wsm.log.hostname
Logs information collected by the web server monitor daemon. Failures of the WEBGUI service that runs the Platform Management Console are logged here.

Matching service error messages and corresponding log files

If you receive this message...
This may be the problem...
Review this log file
failed to create vem working directory
Cannot create work directory during startup
vemkd
failed to open lock file
Cannot get lock file during startup
vemkd
failed to open host event file
Cannot recover during startup because cannot open event file
vemkd
lim port is not defined
EGO_LIM_PORT in ego.conf is not defined
lim
master candidate can not set GET_CONF=lim
Wrong parameter defined for master candidate host (for example, EGO_GET_CONF=LIM)
lim
there is no valid host in EGO_MASTER_LIST
No valid host in master list
lim
ls_getmyhostname fails
Cannot get local host name during startup
pem
temp directory (%s) not exist or not accessible, exit
Tmp directory does not exist
pem
incorrect EGO_PEM_PORT value %s, exit
EGO_PEM_PORT is a negative number
pem
chdir(%s) fails
Tmp directory does not exist
esc
cannot initialize the listening TCP port %d
Socket error
esc
cannot log on
Log on to vemkd failed
esc
JAVA_HOME is not defined, exit
WEBGUI service profile is wrong
wsm
failed to get hostname: %s
Host name configuration problem
wsm
event_init ( ) failed
EGO event plugin configuration problem in ego.conf file
wsm
ego.conf_loadeventplug ( ) failed
Event library problem
wsm
cannot write to child
Web server is down or there is no response
wsm
child no reply
Web server is down or there is no response
wsm
vem_register: error in invoking vem_register function
VEM service registration failed
wsg
you are not authorized to unregister a service
Either you are not authorized to unregister a service, or there is no registry client
wsg
request has invalid signature: TSIG service.ego: tsig verify failure (BADTIME)
Resource record updating failed
named

For more information

Frequently asked questions

Question

Does LSF 7 on EGO support a grace period when reclamation is configured in the resource plan?

Answer

No. Resources are immediately reclaimed even if you set a resource reclaim grace period.

Question

Does LSF 7 on EGO support upgrade of the master host only?

Answer

Yes

Question

Under EGO Service Controller daemon management mode on Windows, does PEM start sbatchd and res directly or does it ask Windows to start sbatchd and res as Windows Services?

Answer

On Windows, LSF still installs sbatchd and res as Windows services. If EGO Service Controller daemon control is selected during installation, the Windows services are set to Manual startup, and PEM starts sbatchd and res directly, not as Windows services.

Question

What's the benefit of LSF daemon management through the EGO Service Controller?

Answer

The EGO Service Controller provides high availability services to sbatchd and res, and faster cluster startup than with lsadmin and badmin.

Question

How does the hostsetup script work in LSF 7?

Answer

The LSF 7 hostsetup script functions essentially the same as in previous versions. It sets up a host to use the LSF cluster and configures LSF daemons to start automatically. In LSF 7, running hostsetup --top=/path --boot="y" checks the EGO service definition files sbatchd.xml and res.xml. If res and sbatchd startup is set to "Automatic", the host rc setting only starts lim. If it is set to "Manual", the host rc setting starts lim, sbatchd, and res as in previous versions.

Question

Is non-shared mixed cluster installation supported, for example, adding UNIX hosts to a Windows cluster, or adding Windows hosts to a UNIX cluster?

Answer

In LSF 7, non-shared installation is supported. For example, to add a UNIX host to a Windows cluster, set up the Windows cluster first, then run lsfinstall -s -f slave.config. In slave.config, put the Windows hosts in LSF_MASTER_LIST. After startup, the UNIX host becomes an LSF host. Adding a Windows host is even simpler: run the Windows installer and enter the current UNIX master host name. After installation, all daemons start automatically and the host joins the cluster.

Question

As EGO and LSF share base configuration files, how are other resources handled in EGO in addition to hosts and slots?

Answer

The same as in previous releases. LSF 7 mbatchd still communicates with LIM to get available resources. By default, LSF can schedule jobs to make use of all resources started in the cluster. If EGO-enabled SLA scheduling is configured, LSF only schedules jobs to use resources on hosts allocated by EGO.

Question

How about compatibility for external scripts and resources like elim, melim, esub and others?

Answer

LSF 7 supports full compatibility for these external executables. elim.xxx is started under LSF_SERVERDIR as usual. By default, LIM is located under LSF_SERVERDIR.

Question

Can Platform LSF MultiCluster share one EGO base?

Answer

No, each LSF cluster must run on top of one EGO cluster.

Question

Can EGO consumer policies replace MultiCluster lease mode?

Answer

Conceptually, both define resource borrowing and lending policies. However, current EGO consumer policies can only work with slot resources within one EGO cluster. MultiCluster lease mode supports other load indices and external resources between multiple clusters. If you are using MultiCluster lease mode to share only slot resources between clusters, and you are able to merge those clusters into a single cluster, you should be able to use EGO consumer policy and submit jobs to EGO-enabled SLA scheduling to achieve the same goal.


Platform Computing Inc.
www.platform.com