Platform Computing Corp.

lsb_readjobinfo()

Returns the next job information record in mbatchd.

DESCRIPTION

Each call to lsb_readjobinfo() returns one job information record from mbatchd. The more parameter receives its value from either lsb_openjobinfo() or lsb_openjobinfo_a() and defines the number of records to read. Call lsb_readjobinfo() in a loop, and use more to determine how many times to repeat the loop to retrieve all job information records.
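The loop described above can be sketched as follows. Because the LSF headers are not assumed to be installed, the sketch uses hypothetical stand-ins (struct demo_job, demo_openjobinfo(), demo_readjobinfo()) in place of the real API; the stand-in also assumes that each read updates more to the number of records still unread.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins so this sketch compiles without LSF installed.
 * A real program includes <lsf/lsbatch.h> and calls lsb_openjobinfo(),
 * lsb_readjobinfo(), and lsb_closejobinfo() instead of the demo_* helpers. */
struct demo_job {
    long long   jobId;
    const char *user;
};

static struct demo_job demo_records[] = {
    { 101, "alice" }, { 102, "bob" }, { 103, "carol" }
};
static const int demo_total = 3;
static int demo_next = 0;

/* Stand-in for lsb_openjobinfo(): returns the number of available records. */
int demo_openjobinfo(void)
{
    demo_next = 0;
    return demo_total;
}

/* Stand-in for lsb_readjobinfo(): returns one record per call and stores
 * the number of records still unread in *more (an assumption of this sketch). */
struct demo_job *demo_readjobinfo(int *more)
{
    if (demo_next >= demo_total)
        return NULL;
    *more = demo_total - demo_next - 1;
    return &demo_records[demo_next++];
}

/* Reads every record, one per call, and returns how many were read. */
int read_all_jobs(void)
{
    int more = demo_openjobinfo();
    int count = 0;

    while (more > 0) {
        struct demo_job *job = demo_readjobinfo(&more);
        if (job == NULL)
            break;              /* no record returned: stop early */
        printf("job %lld submitted by %s\n", job->jobId, job->user);
        count++;
    }
    return count;
}
```

In a real program the loop body would read fields of the returned jobInfoEnt record, and lsb_closejobinfo() would be called once the loop finishes.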

SYNOPSIS

#include <lsf/lsbatch.h> 
#include <time.h> 
#include <lsf/lsf.h> 
struct jobInfoEnt *lsb_readjobinfo(int *more); 
struct jobInfoEnt { 
    LS_LONG_INT jobId; 
    char    *user; 
    int     status; 
    int     *reasonTb; 
    int     numReasons; 
    int     reasons; 
    int     subreasons; 
    int     jobPid; 
    time_t  submitTime; 
    time_t  reserveTime; 
    time_t  startTime; 
    time_t  predictedStartTime; 
    time_t  endTime; 
    time_t  lastEvent; 
    time_t  nextEvent; 
    int     duration; 
    float   cpuTime; 
    int     umask; 
    char    *cwd; 
    char    *subHomeDir; 
    char    *fromHost; 
    char    **exHosts; 
    int     numExHosts; 
    float   cpuFactor; 
    int     nIdx; 
    float   *loadSched; 
    float   *loadStop; 
    struct  submit submit; 
    int     exitStatus; 
    int     execUid; 
    char    *execHome; 
    char    *execCwd; 
    char    *execUsername; 
    time_t  jRusageUpdateTime; 
    struct  jRusage runRusage; 
    int     jType; 
    char    *parentGroup; 
    char    *jName; 
    int     counter[NUM_JGRP_COUNTERS]; 
    u_short port; 
    int     jobPriority; 
    int     numExternalMsg; 
    struct  jobExternalMsgReply **externalMsg; 
    int     clusterId; 
    char   *detailReason; 
    float   idleFactor; 
    int     exceptMask; 
    char   *additionalInfo; 
    int     exitInfo; 
    int    warningTimePeriod; 
    char   *warningAction; 
    char   *chargedSAAP; 
    char   *execRusage; 
    time_t rsvInActive; 
    int    numLicense; 
    char   **licenseNames; 
    float  aps; 
    float  adminAps; 
    int     runTime; 
    int     reserveCnt; 
    struct  reserveItem *items; 
    float   adminFactorVal; 
    int     resizeMin; 
    int     resizeMax; 
    time_t  resizeReqTime; 
    int     jStartNumExHosts; 
    char    **jStartExHosts; 
    time_t  lastResizeTime; 
}; 
struct submit { 
    int     options; 
    int     options2; 
    char    *jobName; 
    char    *queue; 
    int     numAskedHosts; 
    char    **askedHosts; 
    char    *resReq; 
    int     rLimits[LSF_RLIM_NLIMITS]; 
    char    *hostSpec; 
    int     numProcessors; 
    char    *dependCond; 
    char    *timeEvent; 
    time_t  beginTime; 
    time_t  termTime; 
    int     sigValue; 
    char    *inFile; 
    char    *outFile; 
    char    *errFile; 
    char    *command; 
    char    *newCommand; 
    time_t  chkpntPeriod; 
    char    *chkpntDir; 
    int     nxf; 
    struct xFile *xf; 
    char    *preExecCmd; 
    char    *mailUser; 
    int     delOptions; 
    int     delOptions2; 
    char    *projectName; 
    int     maxNumProcessors; 
    char    *loginShell; 
    char    *userGroup; 
    char    *exceptList; 
    int     userPriority; 
    char    *rsvId; 
    char    *jobGroup; 
    char    *sla; 
    char    *extsched; 
    int     warningTimePeriod; 
    char    *warningAction; 
    char    *licenseProject; 
    int     options3; 
    int     delOptions3; 
    char    *app; 
    int     jsdlFlag; 
    char    *jsdlDoc; 
    void    *correlator; 
    char    *apsString; 
    char    *postExecCmd; 
    char    *cwd; 
    int     runtimeEstimation; 
    char    *requeueEValues; 
    int     initChkpntPeriod; 
    int     migThreshold; 
    char    *notifyCmd; 
}; 
struct jRusage{ 
    int mem; 
    int swap; 
    int utime; 
    int stime; 
    int npids; 
    struct pidInfo *pidInfo; 
    int npgids; 
    int *pgid; 
    int nthreads; 
}; 
struct pidInfo{ 
    int pid; 
    int ppid; 
    int pgid; 
    int jobid; 
}; 
struct reserveItem { 
    char    *resName;  
    int     nHost;  
    float   *value;  
    int     shared;  
}; 

PARAMETERS

*more

Number of job records in the master batch daemon.

RETURN VALUES

jobInfoEnt

Function was successful.

The fields in the jobInfoEnt structure have the following meaning:

jobId

The job ID that the LSF system assigned to the job.

user

The name of the user who submitted the job.

status

The current status of the job. Possible values are:

JOB_STAT_PEND

The job is pending, i.e., it has not been dispatched yet.

JOB_STAT_PSUSP

The pending job was suspended by its owner or the LSF system administrator.

JOB_STAT_RUN

The job is running.

JOB_STAT_SSUSP

The running job was suspended by the system because an execution host was overloaded or the queue run window closed. (See lsb_queueinfo(), lsb_hostinfo(), and lsb.queues.)

JOB_STAT_USUSP

The running job was suspended by its owner or the LSF system administrator.

JOB_STAT_EXIT

The job has terminated with a non-zero status - it may have been aborted due to an error in its execution, or killed by its owner or by the LSF system administrator.

JOB_STAT_DONE

The job has terminated with status 0.

JOB_STAT_PDONE

Post-job processing completed successfully.

JOB_STAT_PERR

Post-job processing ended with an error.

JOB_STAT_WAIT

The chunk job is waiting for its turn to execute.

JOB_STAT_UNKWN

The slave batch daemon (sbatchd) on the host on which the job is processed has lost contact with the master batch daemon (mbatchd).
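A status value is usually turned into a short label before it is displayed. The sketch below shows one way to do that with a lookup table. The DEMO_STAT_* values are placeholders chosen for this example and are assumed to be distinct bit flags; a real program takes the JOB_STAT_* constants from lsbatch.h. The function name job_status_label() is hypothetical.

```c
#include <stddef.h>

/* Placeholder flag values for this sketch only. Real code uses the
 * JOB_STAT_* constants from <lsf/lsbatch.h> instead. */
enum {
    DEMO_STAT_PEND  = 0x0001,
    DEMO_STAT_PSUSP = 0x0002,
    DEMO_STAT_RUN   = 0x0004,
    DEMO_STAT_SSUSP = 0x0008,
    DEMO_STAT_USUSP = 0x0010,
    DEMO_STAT_EXIT  = 0x0020,
    DEMO_STAT_DONE  = 0x0040,
    DEMO_STAT_PDONE = 0x0080,
    DEMO_STAT_PERR  = 0x0100,
    DEMO_STAT_WAIT  = 0x0200,
    DEMO_STAT_UNKWN = 0x0400
};

struct stat_label {
    int         flag;
    const char *label;
};

/* One entry per status described above, checked in this order. */
static const struct stat_label stat_labels[] = {
    { DEMO_STAT_PEND,  "PEND"  },
    { DEMO_STAT_PSUSP, "PSUSP" },
    { DEMO_STAT_RUN,   "RUN"   },
    { DEMO_STAT_SSUSP, "SSUSP" },
    { DEMO_STAT_USUSP, "USUSP" },
    { DEMO_STAT_EXIT,  "EXIT"  },
    { DEMO_STAT_DONE,  "DONE"  },
    { DEMO_STAT_PDONE, "PDONE" },
    { DEMO_STAT_PERR,  "PERR"  },
    { DEMO_STAT_WAIT,  "WAIT"  },
    { DEMO_STAT_UNKWN, "UNKWN" }
};

/* Returns the label of the first flag set in status, or "?" if no
 * known flag is set. */
const char *job_status_label(int status)
{
    size_t i;

    for (i = 0; i < sizeof stat_labels / sizeof stat_labels[0]; i++) {
        if (status & stat_labels[i].flag)
            return stat_labels[i].label;
    }
    return "?";
}
```

Because the first matching entry wins, the table order decides which label is reported if more than one flag is set.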

reasonTb

Pending or suspending reasons of the job.

numReasons

Length of reasonTb vector.

reasons

The reason a job is pending or suspended.

subreasons

The reason a job is pending or suspended. If status is JOB_STAT_PEND, the values of reasons and subreasons are explained by lsb_pendreason(). If status is JOB_STAT_PSUSP, the values of reasons and subreasons are explained by lsb_suspreason().

When reasons is PEND_HOST_LOAD or SUSP_LOAD_REASON, subreasons indicates the load indices that are out of bounds. If reasons is PEND_HOST_LOAD, subreasons is the same as busySched in the hostInfoEnt structure; if reasons is SUSP_LOAD_REASON, subreasons is the same as busyStop in the hostInfoEnt structure. (See lsb_hostinfo().)

jobPid

The job process ID.

submitTime

The time the job was submitted, in seconds since 00:00:00 GMT, Jan. 1, 1970.

reserveTime

The time when job slots were reserved.

startTime

The time that the job started running, if it has been dispatched.

predictedStartTime

The job's predicted start time.

endTime

The termination time of the job, if it has completed.

lastEvent

Last time event.

nextEvent

Next time event.

duration

Duration time (minutes).

cpuTime

The CPU time that the job has used.

umask

The file creation mask when the job was submitted.

cwd

The current working directory when the job was submitted.

subHomeDir

Home directory on submission host.

fromHost

The name of the host from which the job was submitted.

exHosts

The array of names of hosts on which the job executes.

numExHosts

The number of hosts on which the job executes.

cpuFactor

The CPU factor for normalizing CPU and wall clock time limits.

nIdx

The number of load indices in the loadSched and loadStop arrays.

loadSched & loadStop

The loadSched and loadStop arrays are assigned to the job according to those of the queue and hosts to control job suspension and resumption.

The values in the loadSched array specify the thresholds for the corresponding load indices. Only if the current values of all specified load indices of a host are within (below or above, depending on the meaning of the load index) their corresponding thresholds may the suspended job be resumed on this host.

Similarly, the values in the loadStop array specify the thresholds for job suspension; if any of the current load index values of the host crosses its threshold, the job will be suspended.

For an explanation of the entries in the loadSched and loadStop arrays, see lsb_hostinfo().

submit

Structure for lsb_submit() call.

exitStatus

Job exit status.

execUid

Mapped UNIX user ID on the execution host.

execHome

Home directory for the job on the execution host.

execCwd

Current working directory for the job on the execution host.

execUsername

Mapped user name on the execution host.

jRusageUpdateTime

Time of the last job resource usage update.

runRusage

Contains resource usage information for the job, in a jRusage structure.

jType

Job type.

parentGroup

The parent job group of a job or job group.

jName

If jType is JGRP_NODE_GROUP, then it is the job group name. Otherwise, it is the job name.

counter[NUM_JGRP_COUNTERS]

Index into the counter array. Only used for job arrays.

port

Service port of the job.

jobPriority

Job dynamic priority.

numExternalMsg

The number of external messages in the job.

externalMsg

The external messages of the job. Each entry is a jobExternalMsgReply structure, which contains the information required to define an external message reply.

clusterId

MultiCluster cluster ID. If clusterId is greater than or equal to 0, the job is a pending remote job, and lsb_readjobinfo() checks for host_name@cluster_name. If the host name is needed, it should be found in jInfoH->remoteHosts. If the remote host name is not available, the constant string remoteHost is used.

detailReason

Detailed reason field.

idleFactor

Idle factor for job exception handling. If the job idle factor is less than the specified threshold, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job idle exception.

exceptMask

Job exception handling mask.

additionalInfo

Placement information of LSF HPC jobs.

exitInfo

Job termination reason. See lsbatch.h.

warningTimePeriod

Job warning time period in seconds; -1 if unspecified.

warningAction

Job warning action, SIGNAL | CHKPNT | command; NULL if unspecified.

chargedSAAP

SAAP charged for job.

execRusage

The rusage satisfied at job runtime.

rsvInActive

The time when advance reservation expired or was deleted.

numLicense

The number of licenses reported from License Scheduler.

licenseNames

License Scheduler license names.

aps

Absolute priority scheduling (APS) priority value.

adminAps

Absolute priority scheduling (APS) string set by administrators to denote static system APS value.

adminFactorVal

Absolute priority scheduling (APS) string set by administrators to denote ADMIN factor APS value.

runTime

The real runtime on the execution host.

reserveCnt

The number of kinds of resources reserved by this job.

items

Detailed reservation information for each kind of resource.

reserveItem

The reserveItem structure contains the following fields:

resName

Name of the resource to reserve.

nHost

The number of hosts on which to reserve this resource.

value

The amount reserved on each host. Some hosts may reserve 0.

shared

Flag indicating whether the resource is shared or host-based.
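As an illustration of how the reservation fields fit together, the following sketch totals the per-host amounts of each reserved resource. The reserveItem structure is copied from the SYNOPSIS so that the example compiles without the LSF headers. The helper names are hypothetical, and the sketch assumes that value holds nHost entries and that a non-zero shared flag marks a shared resource.

```c
#include <stdio.h>

/* Copied from the SYNOPSIS so the sketch compiles without LSF headers. */
struct reserveItem {
    char  *resName;
    int    nHost;
    float *value;
    int    shared;
};

/* Sums the per-host amounts of one reserved resource.
 * Assumes value holds nHost entries. */
double total_reserved(const struct reserveItem *item)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < item->nHost; i++)
        sum += item->value[i];
    return sum;
}

/* Prints one line per reserved resource. In a real program, items and
 * reserveCnt come from the jobInfoEnt record. Treating a non-zero shared
 * flag as "shared" is an assumption of this sketch. */
void print_reservations(const struct reserveItem *items, int reserveCnt)
{
    int i;

    for (i = 0; i < reserveCnt; i++)
        printf("%s: %.2f reserved across %d host(s)%s\n",
               items[i].resName, total_reserved(&items[i]),
               items[i].nHost, items[i].shared ? " (shared)" : "");
}
```

Hosts that reserve 0 contribute nothing to the total but still count toward nHost.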

resizeMin

Pending resize min. 0, if no resize pending.

resizeMax

Pending resize max. 0, if no resize pending.

resizeReqTime

Time when pending request was issued.

jStartNumExHosts

Number of hosts when job starts.

jStartExHosts

Host list when job starts.

lastResizeTime

Last time when job allocation changed.
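Several of the fields above (submitTime, startTime, endTime, and the other time_t members) hold seconds since 00:00:00 GMT, Jan. 1, 1970. The sketch below formats such a value for display using only the standard C library. The function name format_job_time() is hypothetical, and treating 0 as "not set" is an assumption of this example rather than documented behavior.

```c
#include <stdio.h>
#include <time.h>

/* Writes t into buf as "YYYY-MM-DD HH:MM:SS" (UTC) and returns buf.
 * A value of 0 is shown as "-" (assumed here to mean "not set").
 * If the conversion fails, "?" is written instead. */
const char *format_job_time(time_t t, char *buf, size_t len)
{
    struct tm *utc;

    if (t == 0) {
        snprintf(buf, len, "-");
        return buf;
    }
    utc = gmtime(&t);
    if (utc == NULL ||
        strftime(buf, len, "%Y-%m-%d %H:%M:%S", utc) == 0) {
        snprintf(buf, len, "?");
    }
    return buf;
}
```

For example, a job record could be summarized by formatting its submitTime and startTime into two separate buffers of at least 20 bytes each.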

The fields in the submit structure:

submit

submit uses the submit structure provided by the invoker of lsb_submit().

See lsb_submit() for descriptions of the submit structure fields.

The fields in the runRusage structure have the following meaning:

runRusage

runRusage uses the jRusage structure to provide the total resident memory usage in KB, total virtual memory usage in KB, cumulative total CPU time in seconds, and a list of currently active process group IDs and process IDs in a job.

The jRusage structure contains the following fields:

mem

Total resident memory usage in KB of all currently running processes in given process groups.

swap

Total virtual memory usage in KB of all currently running processes in given process groups.

utime

Cumulative total user time in seconds.

stime

Cumulative total system time in seconds.

npids

Number of currently active processes in given process groups.

npgids

Number of currently active process groups.

pgid

Array of currently active process group IDs.

nthreads

Number of currently active threads in given process groups.

The fields in the pidInfo structure have the following meaning:

pidInfo

Structure containing information about an active process.

pid

Process ID.

ppid

Parent's process ID.

pgid

Process group ID.

jobid

Process Cray job ID (only on Cray).
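The resource usage fields can be combined into a short summary. The sketch below copies the jRusage and pidInfo structures from the SYNOPSIS so that it compiles without the LSF headers; the helper names are hypothetical. It assumes that the pidInfo array holds npids entries.

```c
#include <stdio.h>

/* Copied from the SYNOPSIS so the sketch compiles without LSF headers. */
struct pidInfo {
    int pid;
    int ppid;
    int pgid;
    int jobid;
};

struct jRusage {
    int mem;
    int swap;
    int utime;
    int stime;
    int npids;
    struct pidInfo *pidInfo;
    int npgids;
    int *pgid;
    int nthreads;
};

/* Cumulative CPU time in seconds: user time plus system time. */
int total_cpu_seconds(const struct jRusage *ru)
{
    return ru->utime + ru->stime;
}

/* Counts the active processes that belong to one process group.
 * Assumes pidInfo holds npids entries. */
int count_pids_in_group(const struct jRusage *ru, int pgid)
{
    int i, n = 0;

    for (i = 0; i < ru->npids; i++) {
        if (ru->pidInfo[i].pgid == pgid)
            n++;
    }
    return n;
}

/* Prints memory (converted from KB to MB), CPU time, and a per-group
 * process count. */
void print_rusage(const struct jRusage *ru)
{
    int i;

    printf("mem %.1f MB, swap %.1f MB, cpu %d s, %d thread(s)\n",
           ru->mem / 1024.0, ru->swap / 1024.0,
           total_cpu_seconds(ru), ru->nthreads);
    for (i = 0; i < ru->npgids; i++)
        printf("  pgid %d: %d process(es)\n",
               ru->pgid[i], count_pids_in_group(ru, ru->pgid[i]));
}
```

In a real program the structure would be the runRusage member of the jobInfoEnt record returned by lsb_readjobinfo().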

ERRORS

If there are no more records, then lsberrno is set to LSBE_EOF.

SEE ALSO

Related API

lsb_openjobinfo() - Opens a job information connection to mbatchd

lsb_openjobinfo_a() - Provides the name and number of jobs and hosts in mbatchd

lsb_closejobinfo() - Closes job information connection with mbatchd

lsb_hostinfo() - Returns information about job server hosts

lsb_pendreason() - Explains why a job is pending

lsb_queueinfo() - Returns information about batch queues

lsb_suspreason() - Explains why a job was suspended

Equivalent line command

none

Files

lsb.queues


Platform Computing Inc.
www.platform.com