Knowledge Center Contents Previous Next |
lsb_readjobinfo_cond()
Returns the next job information record for condensed host groups in mbatchd.
DESCRIPTION
lsb_readjobinfo_cond() reads the job information records made available by a previous call to lsb_openjobinfo() or lsb_openjobinfo_a(). The number of records is given by the more parameter, which receives its value from either of those functions. Each call to lsb_readjobinfo_cond() returns one record from mbatchd, so call it in a loop and use more to determine how many times to repeat the loop.
lsb_readjobinfo_cond() differs from lsb_readjobinfo() in that, if jInfoHExt is not NULL, lsb_readjobinfo_cond() reports the host group (hostGroup) in place of the individual job execution hosts when that host group is a condensed host group.
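The read loop described above can be sketched as follows. To stay compilable without the LSF headers, this sketch models the reader as a function pointer with the same calling pattern as lsb_readjobinfo_cond() and uses a cut-down, hypothetical stand-in for struct jobInfoEnt; fake_reader() only simulates mbatchd. In real code you would include <lsf/lsbatch.h>, take the record count from lsb_openjobinfo() or lsb_openjobinfo_a(), call lsb_readjobinfo_cond(&more, jInfoHExt) inside the loop, and call lsb_closejobinfo() afterwards.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct jobInfoEnt (see lsbatch.h). */
struct job_rec {
    long long jobId;
};

/* Same calling pattern as lsb_readjobinfo_cond(): one record per call,
 * NULL when no record is available. `hdr` stands in for the
 * struct jobInfoHeadExt * argument. */
typedef struct job_rec *(*job_reader_fn)(int *more, void *hdr);

/* Calls `reader` at most numJobs times, where numJobs is the record count
 * obtained when the connection was opened, and copies up to maxIds job IDs
 * into ids[]. Stops early if the reader returns NULL. Returns the number of
 * IDs stored. */
static int read_job_ids(job_reader_fn reader, void *hdr, int numJobs,
                        long long *ids, int maxIds)
{
    int more = numJobs;
    int n = 0;

    for (int i = 0; i < numJobs && n < maxIds; i++) {
        struct job_rec *job = reader(&more, hdr);
        if (job == NULL)    /* with LSF, check lsberrno: LSBE_EOF means no more records */
            break;
        ids[n++] = job->jobId;
    }
    return n;
}

/* Simulated mbatchd, for this sketch only: serves three fixed records. */
static struct job_rec fake_jobs[] = { {101}, {102}, {103} };
static int fake_next = 0;

static struct job_rec *fake_reader(int *more, void *hdr)
{
    (void)more;
    (void)hdr;
    if (fake_next >= 3)
        return NULL;
    return &fake_jobs[fake_next++];
}
```

Bounding the loop by the announced count and also stopping on NULL keeps the loop safe if fewer records arrive than expected. Because the real function takes a struct jobInfoHeadExt * rather than void *, real code would call lsb_readjobinfo_cond() directly in the loop instead of through this typedef.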
SYNOPSIS
#include <lsf/lsbatch.h>
#include <time.h>
#include <lsf/lsf.h>

struct jobInfoEnt *lsb_readjobinfo_cond(int *more,
                                        struct jobInfoHeadExt *jInfoHExt);

struct jobInfoEnt {
    LS_LONG_INT jobId;
    char *user;
    int status;
    int *reasonTb;
    int numReasons;
    int reasons;
    int subreasons;
    int jobPid;
    time_t submitTime;
    time_t reserveTime;
    time_t startTime;
    time_t predictedStartTime;
    time_t endTime;
    time_t lastEvent;
    time_t nextEvent;
    int duration;
    float cpuTime;
    int umask;
    char *cwd;
    char *subHomeDir;
    char *fromHost;
    char **exHosts;
    int numExHosts;
    float cpuFactor;
    int nIdx;
    float *loadSched;
    float *loadStop;
    struct submit submit;
    int exitStatus;
    int execUid;
    char *execHome;
    char *execCwd;
    char *execUsername;
    time_t jRusageUpdateTime;
    struct jRusage runRusage;
    int jType;
    char *parentGroup;
    char *jName;
    int counter[NUM_JGRP_COUNTERS];
    u_short port;
    int jobPriority;
    int numExternalMsg;
    struct jobExternalMsgReply **externalMsg;
    int clusterId;
    char *detailReason;
    float idleFactor;
    int exceptMask;
    char *additionalInfo;
    int exitInfo;
    int warningTimePeriod;
    char *warningAction;
    char *chargedSAAP;
    char *execRusage;
    time_t rsvInActive;
    int numLicense;
    char **licenseNames;
    float aps;
    int runTime;
    int reserveCnt;
    struct reserveItem *items;
    float adminFactorVal;
    int resizeMin;
    int resizeMax;
    time_t resizeReqTime;
    int jStartNumExHosts;
    char **jStartExHosts;
    time_t lastResizeTime;
};

struct submit {
    int options;
    int options2;
    char *jobName;
    char *queue;
    int numAskedHosts;
    char **askedHosts;
    char *resReq;
    int rLimits[LSF_RLIM_NLIMITS];
    char *hostSpec;
    int numProcessors;
    char *dependCond;
    char *timeEvent;
    time_t beginTime;
    time_t termTime;
    int sigValue;
    char *inFile;
    char *outFile;
    char *errFile;
    char *command;
    char *newCommand;
    time_t chkpntPeriod;
    char *chkpntDir;
    int nxf;
    struct xFile *xf;
    char *preExecCmd;
    char *mailUser;
    int delOptions;
    int delOptions2;
    char *projectName;
    int maxNumProcessors;
    char *loginShell;
    char *userGroup;
    char *exceptList;
    int userPriority;
    char *rsvId;
    char *jobGroup;
    char *sla;
    char *extsched;
    int warningTimePeriod;
    char *warningAction;
    char *licenseProject;
    int options3;
    int delOptions3;
    char *app;
    int jsdlFlag;
    char *jsdlDoc;
    void *correlator;
    char *apsString;
    char *postExecCmd;
    char *cwd;
    int runtimeEstimation;
    char *requeueEValues;
    int initChkpntPeriod;
    int migThreshold;
    char *notifyCmd;
};

struct jRusage {
    int mem;
    int swap;
    int utime;
    int stime;
    int npids;
    struct pidInfo *pidInfo;
    int npgids;
    int *pgid;
    int nthreads;
};

struct pidInfo {
    int pid;
    int ppid;
    int pgid;
    int jobid;
};

struct reserveItem {
    char *resName;
    int nHost;
    float *value;
    int shared;
};
PARAMETERS
*more
Number of job records in the master batch daemon.
*jInfoHExt
Job information header for condensed host groups. If NULL, execution hosts are not replaced by condensed host group names.
RETURN VALUES
jobInfoEnt
On success, a pointer to a jobInfoEnt structure that holds the next job record.
The fields in the jobInfoEnt structure have the following meaning:
jobId
The job ID that the LSF system assigned to the job.
user
The name of the user who submitted the job.
status
The current status of the job. Possible values are:
JOB_STAT_PEND
The job is pending, i.e., it has not been dispatched yet.
JOB_STAT_PSUSP
The pending job was suspended by its owner or the LSF system administrator.
JOB_STAT_RUN
The job is running.
JOB_STAT_SSUSP
The running job was suspended by the system because an execution host was overloaded or the queue run window closed. (See lsb_queueinfo(), lsb_hostinfo(), and lsb.queues.)
JOB_STAT_USUSP
The running job was suspended by its owner or the LSF system administrator.
JOB_STAT_EXIT
The job has terminated with a non-zero status; it may have been aborted due to an error in its execution, or killed by its owner or by the LSF system administrator.
JOB_STAT_DONE
The job has terminated with status 0.
JOB_STAT_UNKWN
The slave batch daemon (sbatchd) on the host on which the job is processed has lost contact with the master batch daemon (mbatchd).
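As a rough illustration, the status values above can be mapped to short names. The fallback constant values below exist only so the sketch builds standalone; they are assumed to match lsbatch.h, and real code should include <lsf/lsbatch.h> instead. Testing bits rather than comparing for equality is a defensive choice in case status carries additional flag bits.

```c
#include <stdbool.h>

/* Fallback values for a standalone build only, assumed to match lsbatch.h.
 * In real code, include <lsf/lsbatch.h> and drop these. */
#ifndef JOB_STAT_PEND
#define JOB_STAT_PEND  0x01
#define JOB_STAT_PSUSP 0x02
#define JOB_STAT_RUN   0x04
#define JOB_STAT_SSUSP 0x08
#define JOB_STAT_USUSP 0x10
#define JOB_STAT_EXIT  0x20
#define JOB_STAT_DONE  0x40
#define JOB_STAT_UNKWN 0x10000
#endif

/* Maps jobInfoEnt.status to a short name; "?" if no known bit is set. */
static const char *job_status_name(int status)
{
    if (status & JOB_STAT_UNKWN) return "UNKWN";
    if (status & JOB_STAT_DONE)  return "DONE";
    if (status & JOB_STAT_EXIT)  return "EXIT";
    if (status & JOB_STAT_USUSP) return "USUSP";
    if (status & JOB_STAT_SSUSP) return "SSUSP";
    if (status & JOB_STAT_RUN)   return "RUN";
    if (status & JOB_STAT_PSUSP) return "PSUSP";
    if (status & JOB_STAT_PEND)  return "PEND";
    return "?";
}

/* True once the job has terminated, with status 0 (DONE) or not (EXIT). */
static bool job_finished(int status)
{
    return (status & (JOB_STAT_DONE | JOB_STAT_EXIT)) != 0;
}
```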
reasonTb
Pending or suspension reasons for the job.
numReasons
Length of the reasonTb vector.
reasons
The reason a job is pending or suspended.
If status is JOB_STAT_PEND, the values of reasons and subreasons are explained by lsb_pendreason(). If status is JOB_STAT_PSUSP, the values of reasons and subreasons are explained by lsb_suspreason().
When reasons is PEND_HOST_LOAD or SUSP_LOAD_REASON, subreasons indicates the load indices that are out of bounds. If reasons is PEND_HOST_LOAD, subreasons is the same as busySched in the hostInfoEnt structure; if reasons is SUSP_LOAD_REASON, subreasons is the same as busyStop in the hostInfoEnt structure. (See lsb_hostinfo().)
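A minimal sketch of decoding subreasons, assuming (as an illustration, not a documented guarantee) that bit i is set when load index i is out of bounds, in the same way as busySched and busyStop. Which index number corresponds to which load index is described in lsb_hostinfo(). The helper name busy_load_indices() is hypothetical.

```c
#include <limits.h>

/* Assumption for this sketch: bit i of subreasons is set when load index i
 * is out of bounds. Writes the indices of the set bits to out[] (up to
 * maxOut entries) and returns how many were written. */
static int busy_load_indices(int subreasons, int *out, int maxOut)
{
    unsigned int mask = (unsigned int)subreasons;
    int n = 0;

    for (int i = 0; i < (int)(sizeof mask * CHAR_BIT) && n < maxOut; i++) {
        if (mask & (1u << i))
            out[n++] = i;
    }
    return n;
}
```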
submitTime
The time the job was submitted, in seconds since 00:00:00 GMT, Jan. 1, 1970.
reserveTime
Time when job slots are reserved.
startTime
The time that the job started running, if it has been dispatched.
predictedStartTime
Predicted start time of the job.
endTime
The termination time of the job, if it has completed.
lastEvent
Last time event.
nextEvent
Next time event.
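Because the time fields are time_t values counted in seconds since the epoch, standard C functions such as difftime() and ctime() apply directly. The sketch below derives pending and run durations; the convention that an unset time is 0 (for example, startTime before dispatch) is an assumption of this sketch, and the helper names are hypothetical.

```c
#include <time.h>

/* Seconds the job waited before dispatch. Assumption: startTime is 0 while
 * the job has not been dispatched, in which case the wait so far is
 * measured against `now`. */
static double pending_seconds(time_t submitTime, time_t startTime, time_t now)
{
    time_t until = (startTime != 0) ? startTime : now;
    return difftime(until, submitTime);
}

/* Wall-clock run time of a completed job, or -1 if startTime or endTime
 * is unset (again assuming unset times are 0). */
static double run_seconds(time_t startTime, time_t endTime)
{
    if (startTime == 0 || endTime == 0)
        return -1.0;
    return difftime(endTime, startTime);
}
```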
duration
Duration time (in minutes).
cpuTime
The CPU time that the job has used.
umask
The file creation mask when the job was submitted.
cwd
The current working directory when the job was submitted.
subHomeDir
Home directory on submission host.
fromHost
The name of the host from which the job was submitted.
exHosts
The array of names of hosts on which the job executes.
numExHosts
The number of hosts on which the job executes.
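A sketch of walking exHosts for numExHosts entries and counting distinct names. It tolerates duplicates in case a host name appears more than once (for example, once per job slot; treat this as an assumption and check the behavior of your LSF version). When jInfoHExt is not NULL, an entry may be a condensed host group name rather than a host name. count_distinct_hosts() is a hypothetical helper.

```c
#include <string.h>

/* Counts distinct names among the numExHosts entries of exHosts.
 * O(n^2) comparison, adequate for typical host counts. */
static int count_distinct_hosts(char **exHosts, int numExHosts)
{
    int distinct = 0;

    for (int i = 0; i < numExHosts; i++) {
        int seen = 0;
        for (int j = 0; j < i; j++) {
            if (strcmp(exHosts[i], exHosts[j]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            distinct++;
    }
    return distinct;
}
```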
cpuFactor
The CPU factor for normalizing CPU and wall clock time limits.
nIdx
The number of load indices in the loadSched and loadStop arrays.
loadSched & loadStop
The loadSched and loadStop arrays are assigned to the job according to those of the queue and hosts to control job suspension and resumption.
The values in the loadSched array specify the thresholds for the corresponding load indices. Only if the current values of all specified load indices of a host are within (below or above, depending on the meaning of the load index) their corresponding thresholds may the suspended job be resumed on this host.
Similarly, the values in the loadStop array specify the thresholds for job suspension; if any of the current load index values of the host crosses its threshold, the job will be suspended.
For an explanation of the entries in the loadSched and loadStop arrays, see lsb_hostinfo().
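The suspend and resume rules above (suspend if any loadStop threshold is crossed; resume only if no loadSched threshold is crossed) can be sketched as follows. Two conventions here are assumptions of the sketch, not LSF's representation: a per-index flag saying whether a higher value means a busier host, and NAN meaning "threshold not specified". See lsb_hostinfo() for how LSF itself encodes thresholds.

```c
#include <math.h>
#include <stdbool.h>

/* True if `value` is beyond threshold `thr`. NAN marks an unspecified
 * threshold (a convention of this sketch only). */
static bool crosses(float value, float thr, bool higher_is_busier)
{
    if (isnan(thr))
        return false;
    return higher_is_busier ? value > thr : value < thr;
}

/* Suspend if ANY load index crosses its loadStop threshold. */
static bool should_suspend(const float *load, const float *loadStop,
                           const bool *higher_is_busier, int nIdx)
{
    for (int i = 0; i < nIdx; i++)
        if (crosses(load[i], loadStop[i], higher_is_busier[i]))
            return true;
    return false;
}

/* Resume only if ALL load indices are within their loadSched thresholds. */
static bool may_resume(const float *load, const float *loadSched,
                       const bool *higher_is_busier, int nIdx)
{
    for (int i = 0; i < nIdx; i++)
        if (crosses(load[i], loadSched[i], higher_is_busier[i]))
            return false;
    return true;
}
```

For example, a run-queue index would use higher_is_busier = true, while a free-memory index would use false.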
submit
Structure for lsb_submit() call.
exitStatus
Job exit status.
execUid
Mapped UNIX user ID on the execution host.
execHome
Home directory for the job on the execution host.
execCwd
Current working directory for the job on the execution host.
execUsername
Mapped user name on the execution host.
jRusageUpdateTime
Time of the last job resource usage update.
runRusage
Resource usage information for the job, stored in a jRusage structure.
jType
Job type.
parentGroup
The parent job group of a job or job group.
jName
If jType is JGRP_NODE_GROUP, jName is the job group name. Otherwise, it is the job name.
counter[NUM_JGRP_COUNTERS]
Array of job counters, indexed by the following constants. Only used for job arrays:
- JGRP_COUNT_NJOBS: total jobs in the array
- JGRP_COUNT_PEND: number of pending jobs in the array
- JGRP_COUNT_NPSUSP: number of held jobs in the array
- JGRP_COUNT_NRUN: number of running jobs in the array
- JGRP_COUNT_NSSUSP: number of jobs suspended by the system in the array
- JGRP_COUNT_NUSUSP: number of jobs suspended by the user in the array
- JGRP_COUNT_NEXIT: number of exited jobs in the array
- JGRP_COUNT_NDONE: number of successfully completed jobs in the array
- JGRP_COUNT_NJOBS_SLOTS: total slots in the array
- JGRP_COUNT_PEND_SLOTS: number of pending slots in the array
- JGRP_COUNT_RUN_SLOTS: number of running slots in the array
- JGRP_COUNT_SSUSP_SLOTS: number of slots suspended by the system in the array
- JGRP_COUNT_USUSP_SLOTS: number of slots suspended by the user in the array
- JGRP_COUNT_RESV_SLOTS: number of reserved slots in the array
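A sketch of summarizing a job array from its counter array. The fallback index values 0 to 7 exist only for a standalone build and assume the order listed above; real code should take the constants from <lsf/lsbatch.h>. The helper names are hypothetical.

```c
/* Fallback indices for a standalone build only, assuming the listed order.
 * In real code, include <lsf/lsbatch.h> and drop these. */
#ifndef JGRP_COUNT_NJOBS
#define JGRP_COUNT_NJOBS   0
#define JGRP_COUNT_PEND    1
#define JGRP_COUNT_NPSUSP  2
#define JGRP_COUNT_NRUN    3
#define JGRP_COUNT_NSSUSP  4
#define JGRP_COUNT_NUSUSP  5
#define JGRP_COUNT_NEXIT   6
#define JGRP_COUNT_NDONE   7
#endif

/* Percentage of array elements that have finished (exited or done).
 * Returns 0.0 for an empty array. */
static double array_percent_finished(const int *counter)
{
    int total = counter[JGRP_COUNT_NJOBS];
    if (total <= 0)
        return 0.0;
    int finished = counter[JGRP_COUNT_NEXIT] + counter[JGRP_COUNT_NDONE];
    return 100.0 * finished / total;
}

/* Elements still active: pending, held, running, or suspended. */
static int array_active(const int *counter)
{
    return counter[JGRP_COUNT_PEND] + counter[JGRP_COUNT_NPSUSP] +
           counter[JGRP_COUNT_NRUN] + counter[JGRP_COUNT_NSSUSP] +
           counter[JGRP_COUNT_NUSUSP];
}
```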
port
Service port of the job.
jobPriority
Job dynamic priority.
numExternalMsg
The number of external messages in the job.
externalMsg
Array of pointers to jobExternalMsgReply structures (numExternalMsg entries). Each structure contains the information required to define an external message reply.
clusterId
MultiCluster cluster ID. If clusterId is greater than or equal to 0, the job is a pending remote job, and lsb_readjobinfo() checks for host_name@cluster_name. If the host name is needed, find it in jInfoH->remoteHosts. If the remote host name is not available, the constant string remoteHost is used.
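When a name arrives in the host_name@cluster_name form mentioned above, it can be split at the '@'. split_remote_host() is a hypothetical helper, not part of the LSF API, and it assumes the first '@' separates the host from the cluster.

```c
#include <string.h>

/* Splits "host_name@cluster_name" into its two parts.
 * Returns 0 on success, -1 if there is no '@' or a part does not fit. */
static int split_remote_host(const char *s, char *host, size_t hostLen,
                             char *cluster, size_t clusterLen)
{
    const char *at = strchr(s, '@');
    if (at == NULL)
        return -1;

    size_t h = (size_t)(at - s);
    size_t c = strlen(at + 1);
    if (h + 1 > hostLen || c + 1 > clusterLen)
        return -1;

    memcpy(host, s, h);
    host[h] = '\0';
    memcpy(cluster, at + 1, c + 1);   /* copies the terminating NUL too */
    return 0;
}
```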
detailReason
Detailed reason field.
idleFactor
Idle factor for job exception handling. If the job idle factor is less than the specified threshold, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job idle exception.
exceptMask
Job exception handling mask.
additionalInfo
Placement information of LSF HPC jobs.
exitInfo
Job termination reason. See lsbatch.h.
warningTimePeriod
Job warning time period in seconds; -1 if unspecified.
warningAction
Job warning action, SIGNAL | CHKPNT | command; NULL if unspecified.
chargedSAAP
SAAP charged for job.
execRusage
The rusage satisfied at job runtime.
rsvInActive
The time when advance reservation expired or was deleted.
numLicense
The number of licenses reported from License Scheduler.
licenseNames
License Scheduler license names.
aps
Absolute priority scheduling (APS) priority value.
adminAps
Absolute priority scheduling (APS) string set by administrators to denote static system APS value.
adminFactorVal
Absolute priority scheduling (APS) string set by administrators to denote ADMIN factor APS value.
runTime
The real runtime on the execution host.
reserveCnt
Number of kinds of resources reserved by this job.
items
Detailed reservation information for each kind of resource: an array of reserveCnt reserveItem structures.
reserveItem
The reserveItem structure contains the following fields:
resName
Name of the resource to reserve.
nHost
The number of hosts on which this resource is reserved.
value
Amount reserved on each host (one entry per host). Some hosts may reserve 0.
shared
Flag indicating whether the resource is shared or host-based.
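A sketch of totaling a reservation across hosts. It assumes, as the field descriptions suggest, that items holds reserveCnt entries and that each value array holds nHost entries. The local struct mirrors the reserveItem declaration in the SYNOPSIS so the sketch builds standalone; total_reserved() is a hypothetical helper.

```c
#include <string.h>

/* Local copy of struct reserveItem as declared in the SYNOPSIS. */
struct reserveItem {
    char  *resName;
    int    nHost;
    float *value;
    int    shared;
};

/* Total amount of resource `name` reserved across all hosts, scanning the
 * reserveCnt entries of jobInfoEnt.items. Returns 0 if it is not reserved. */
static float total_reserved(const struct reserveItem *items, int reserveCnt,
                            const char *name)
{
    float total = 0.0f;

    for (int i = 0; i < reserveCnt; i++) {
        if (strcmp(items[i].resName, name) != 0)
            continue;
        for (int h = 0; h < items[i].nHost; h++)
            total += items[i].value[h];   /* some hosts may reserve 0 */
    }
    return total;
}
```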
resizeMin
Minimum of the pending resize request; 0 if no resize is pending.
resizeMax
Maximum of the pending resize request; 0 if no resize is pending.
resizeReqTime
Time when the pending resize request was issued.
jStartNumExHosts
Number of execution hosts when the job started.
jStartExHosts
Execution host list when the job started.
lastResizeTime
Last time the job allocation changed.
The fields in the submit structure:
submit
The submit field holds the submit structure that was passed to lsb_submit() when the job was submitted.
See lsb_submit() for descriptions of the submit structure fields.
The fields in the runRusage structure have the following meaning:
runRusage
runRusage uses the jRusage structure to provide the total resident memory usage in KB, total virtual memory usage in KB, cumulative total CPU time in seconds and a list of currently active process group IDs and process IDs in a job.
The jRusage structure contains the following fields:
mem
Total resident memory usage in KB of all currently running processes in given process groups.
swap
Total virtual memory usage in KB of all currently running processes in given process groups.
utime
Cumulative total user time in seconds.
stime
Cumulative total system time in seconds.
npids
Number of currently active processes in given process groups.
npgids
Number of currently active process groups.
pgid
Array of currently active process group IDs.
nthreads
Number of currently active threads in given process groups.
The fields in the pidInfo structure have the following meaning:
pidInfo
Structure containing information about an active process.
pid
Process ID.
ppid
Parent process ID.
pgid
Process group ID.
jobid
Cray job ID of the process (Cray only).
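A sketch of summarizing runRusage. It assumes the pidInfo member of jRusage points to an array of npids pidInfo entries and that pgid holds npgids entries; the local structs mirror the field lists above so the sketch builds standalone, and the helper names are hypothetical.

```c
/* Local copies of the structures, for a standalone build. */
struct pidInfo {
    int pid;
    int ppid;
    int pgid;
    int jobid;
};

struct jRusage {
    int mem;                  /* KB */
    int swap;                 /* KB */
    int utime;                /* seconds */
    int stime;                /* seconds */
    int npids;
    struct pidInfo *pidInfo;  /* assumed: array of npids entries */
    int npgids;
    int *pgid;
    int nthreads;
};

/* Cumulative CPU time (user + system) in seconds. */
static int rusage_cpu_seconds(const struct jRusage *ru)
{
    return ru->utime + ru->stime;
}

/* Resident memory in MB (mem is reported in KB). */
static double rusage_mem_mb(const struct jRusage *ru)
{
    return ru->mem / 1024.0;
}

/* Number of active processes that belong to process group `pgid`. */
static int rusage_procs_in_group(const struct jRusage *ru, int pgid)
{
    int n = 0;
    for (int i = 0; i < ru->npids; i++)
        if (ru->pidInfo[i].pgid == pgid)
            n++;
    return n;
}
```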
ERRORS
If there are no more records, lsb_readjobinfo_cond() returns NULL and sets lsberrno to LSBE_EOF.
SEE ALSO
Related API
lsb_openjobinfo() - Opens a job information connection to mbatchd
lsb_openjobinfo_a() - Provides the name and number of jobs and hosts in mbatchd
lsb_closejobinfo() - Closes job information connection with mbatchd
lsb_hostinfo() - Returns information about job server hosts
lsb_pendreason() - Explains why a job is pending
lsb_queueinfo() - Returns information about batch queues
lsb_readjobinfo() - Returns the next job information record in mbatchd
lsb_suspreason() - Explains why a job was suspended
Equivalent line command
none
Files
lsb.queues
Platform Computing Inc.
www.platform.com