As the job scheduler and grid endpoint process a
batch job, the job state updates in the job scheduler database.
The diagram shows the relationship between states, and the following
table lists the possible batch job states and the events that trigger
transitions between states. You can view the current state of a batch
job from the job management console,
or retrieve it using the command line or Enterprise JavaBeans (EJB) interface. If a failure
occurs before a batch step initializes, the batch job goes into the
execution_failed state. Otherwise, a failed job goes into the restartable state.

Table 1. Batch job states. The
table includes each batch start state with its client command, system
action, special condition, numeric return code, and end state. An
empty table cell indicates that there is not a client command, system
action, condition, or return code for the start state.

Start state | Client command | System action | Special condition | Return code | End state |
non-existent (delayed submit) | submit | | | | pending submit |
non-existent | submit | | | | submitted |
submitted | | dispatch | | 0 | executing |
submitted | cancel | | | 0 | restartable |
executing | stop | | | 0 | restartable |
executing | cancel | | | 4 | cancel_pending |
executing | | caught application error* | | 4 | restartable |
executing | | | Infrastructure problem** | 4 | restartable/unknown |
executing | suspend | | | 4 | suspend_pending |
executing | | job completed | | 4 | ended |
executing | | | Infrastructure problem in job setup*** | 4 | restartable |
suspend_pending | | checkpoint | | 2 | suspended |
suspend_pending | | | Infrastructure problem** | 2 | restartable/unknown |
suspended | resume | | | 5 | resume_pending |
suspended | cancel | | | 5 | cancel_pending |
suspended | | | Infrastructure problem** | 5 | restartable/unknown |
resume_pending | | job resumed | | 2 | executing |
resume_pending | | | Infrastructure problem** | 2 | restartable/unknown |
restartable | restart | | | 8 | submitted |
cancel_pending | | job canceled | | 1 | restartable |
cancel_pending | | | Infrastructure problem** | 1 | restartable/unknown |
restartable | purge | | | 8 | non-existent |
execution_failed | purge | | | 9 | non-existent |
ended | purge | | | 7 | non-existent |

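
The transitions in Table 1 can be read as a lookup keyed by start state and trigger. The following is a minimal sketch of that idea, not part of the product: the names `Transition`, `TRANSITIONS`, and `next_state` are hypothetical, and the sketch covers only the client commands and system actions in the table, leaving out the infrastructure problem rows because their end state (restartable or unknown) depends on conditions described in the notes.

```python
from typing import NamedTuple, Optional


class Transition(NamedTuple):
    """One row of the batch job states table (hypothetical helper type)."""
    return_code: Optional[int]  # None where the table cell is empty
    end_state: str


# (start state, client command or system action) -> (return code, end state).
# Values are copied from Table 1; infrastructure problem rows are omitted.
TRANSITIONS = {
    ("non-existent (delayed submit)", "submit"): Transition(None, "pending submit"),
    ("non-existent", "submit"): Transition(None, "submitted"),
    ("submitted", "dispatch"): Transition(0, "executing"),
    ("submitted", "cancel"): Transition(0, "restartable"),
    ("executing", "stop"): Transition(0, "restartable"),
    ("executing", "cancel"): Transition(4, "cancel_pending"),
    ("executing", "caught application error"): Transition(4, "restartable"),
    ("executing", "suspend"): Transition(4, "suspend_pending"),
    ("executing", "job completed"): Transition(4, "ended"),
    ("suspend_pending", "checkpoint"): Transition(2, "suspended"),
    ("suspended", "resume"): Transition(5, "resume_pending"),
    ("suspended", "cancel"): Transition(5, "cancel_pending"),
    ("resume_pending", "job resumed"): Transition(2, "executing"),
    ("restartable", "restart"): Transition(8, "submitted"),
    ("cancel_pending", "job canceled"): Transition(1, "restartable"),
    ("restartable", "purge"): Transition(8, "non-existent"),
    ("execution_failed", "purge"): Transition(9, "non-existent"),
    ("ended", "purge"): Transition(7, "non-existent"),
}


def next_state(state: str, trigger: str) -> str:
    """Return the end state for a start state and trigger, per the table."""
    try:
        return TRANSITIONS[(state, trigger)].end_state
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {trigger!r}") from None


def walk(state: str, triggers: list) -> str:
    """Apply a sequence of triggers and return the resulting state."""
    for trigger in triggers:
        state = next_state(state, trigger)
    return state
```

For example, `walk("non-existent", ["submit", "dispatch", "suspend", "checkpoint", "resume", "job resumed", "job completed", "purge"])` follows a job through a suspend and resume cycle to completion and purge. A pair that has no row in the table, such as resuming a job that has already ended, raises `ValueError`.
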
Table 2. Notes for the batch job states
table. The table includes each note with a description.

Note | Description |
* Application error | The batch application failed at run time. The grid endpoints detected this failure. |
** Infrastructure problem | An unexpected error has occurred. See the following example for infrastructure problem in job setup. |
*** Infrastructure problem in job setup | An unexpected error that occurs when the grid endpoints set up a batch job for the first time. For example, if there is an unexpected database failure, the job goes into execution_failed state.
- In this condition, the batch job is running for the first time and no steps have been processed yet. Batch jobs go into the restartable state under most failure conditions so that they can restart from checkpointed positions if the failure condition can be overcome. In this case, however, the batch job goes into execution_failed state and cannot be restarted. Because the failure occurs during job setup, before the batch job has processed any work, no batch work is lost.
- If jobs are in a non-final state on the endpoint, the scheduler puts the jobs into an unknown state when the endpoint loses communication with the scheduler or the endpoint goes down. If the endpoint comes back up, the scheduler synchronizes the job status with the endpoint. If the endpoint went down, all batch jobs are put into a restartable state and all compute-intensive jobs are put into an execution_failed state. If the endpoint only lost communication with the scheduler and the jobs continued to run, the scheduler updates its status to match the state of the jobs running on the endpoint at that point.
|
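
The synchronization rule for jobs in the unknown state can be sketched as a small decision function. This is an illustrative sketch only: the function name `resolve_unknown_job`, its parameters, and the job type strings `"batch"` and `"compute-intensive"` are assumptions made for the example, not product interfaces.

```python
from typing import Optional


def resolve_unknown_job(
    job_type: str,
    endpoint_went_down: bool,
    endpoint_state: Optional[str] = None,
) -> str:
    """Return the state a job in the unknown state takes after synchronization.

    job_type: "batch" or "compute-intensive" (labels assumed for this sketch).
    endpoint_went_down: True if the endpoint went down, False if it only
        lost communication with the scheduler.
    endpoint_state: the state reported by the endpoint; needed only when
        the endpoint kept running.
    """
    if endpoint_went_down:
        # Batch jobs can restart from a checkpoint; compute-intensive jobs cannot.
        if job_type == "batch":
            return "restartable"
        if job_type == "compute-intensive":
            return "execution_failed"
        raise ValueError(f"unknown job type: {job_type!r}")

    # Communication loss only: the scheduler adopts the endpoint's view.
    if endpoint_state is None:
        raise ValueError("endpoint_state is required when the endpoint kept running")
    return endpoint_state
```

A batch job on an endpoint that went down resolves to restartable, while the same job on an endpoint that only lost contact takes whatever state the endpoint reports, for example ended if it finished in the meantime.
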