In service integration, exception conditions fall into four categories:
those that do not require a messaging engine to restart, those that
require an automatic restart of the messaging engine, those that are
detected by explicit health monitoring and handled by the HAManager,
and those that require user intervention.
Recovery with the messaging engine running
A messaging engine can handle certain exception conditions without
restarting or failing over. The exception condition
is corrected automatically and an entry is added to the system error
log that explains the exception and suggests any user actions. The
messaging engine continues to run and to honor the quality of service
specified for the messages it is processing.
Recovery with automatic restart of the messaging engine (local exceptions)
A messaging engine can recover from local exceptions by an automatic
restart, either on its current server or on an alternative server. For
example, if a messaging engine cannot connect to its data store, the
cause might be that the server on which the messaging engine runs
cannot create a connection to the data store, although another server
in the same cluster can. In a high availability configuration (that is,
one in which failover is enabled), the HAManager fails over the
messaging engine to a new server and shuts down the server on which
it was running. In a configuration without failover, for example a
single server rather than a cluster, the server is shut down and the
messaging engine is restarted only after the server is restarted.
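The decision described above can be sketched in a few lines of Python. This is only an illustrative model, not the HAManager implementation: the `Server` class, the `recover_from_local_exception` function, and the action strings are hypothetical names invented for this example. The behavior when failover is enabled but no other server can reach the data store is also an assumption; the sketch treats it like the no-failover case.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Server:
    """Hypothetical model of an application server (not a product API)."""
    name: str
    running: bool = True
    can_reach_data_store: bool = True


def recover_from_local_exception(
    current: Server, cluster: List[Server], failover_enabled: bool
) -> Tuple[List[str], Optional[Server]]:
    """Return the recovery actions and the new host, if any."""
    actions: List[str] = []
    target: Optional[Server] = None

    if failover_enabled:
        # Look for another running server that can connect to the data store.
        target = next(
            (s for s in cluster
             if s is not current and s.running and s.can_reach_data_store),
            None,
        )

    if target is not None:
        actions.append(f"fail over messaging engine to {target.name}")

    # In both configurations the server that hit the exception is shut down.
    current.running = False
    actions.append(f"shut down {current.name}")

    if target is None:
        # No failover (or, by assumption, no usable target): the engine
        # comes back only after the server itself is restarted.
        actions.append(f"restart messaging engine after {current.name} restarts")

    return actions, target
```

For a two-server cluster in which only the second server can reach the data store, the sketch fails the engine over to the second server and shuts down the first; for a single server it shuts the server down and waits for a restart.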
Recovery from exceptions detected by explicit health monitoring
A messaging engine cannot itself detect exceptions such as a spinning
thread (a thread that is trapped in a loop and no longer performs
useful work) or a deadlock (two threads that block each other), but
explicit health monitoring can. The HAManager provides such monitoring
and periodically tests the health of the messaging engine. If the
HAManager detects that the messaging engine cannot run properly, it
shuts down the server that is hosting the messaging engine. If the
server is in a cluster, the HAManager restarts the messaging engine on
an alternative server, provided that the policy of the messaging engine
allows failover. The node agent restarts the server that was shut
down. If the server is not in a cluster, the server must be restarted;
the messaging engine then restarts on that server.
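One common way to detect a spinning or deadlocked component from outside is a progress watchdog: the monitored component records progress, and the monitor treats a long silence as a failure. The following Python sketch illustrates that general idea together with the recovery steps described above. It is an assumption made for illustration, not a description of how the HAManager tests health; `HealthMonitor`, `check_and_recover`, the timeout value, and the action strings are all hypothetical.

```python
import time
from typing import Callable, List


class HealthMonitor:
    """Hypothetical watchdog: unhealthy if no progress within the timeout."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_progress = clock()

    def record_progress(self) -> None:
        # Called by the monitored component whenever it does useful work.
        self._last_progress = self._clock()

    def is_healthy(self) -> bool:
        # A spinning or deadlocked component stops recording progress.
        return (self._clock() - self._last_progress) <= self._timeout


def check_and_recover(monitor: HealthMonitor, in_cluster: bool,
                      policy_allows_failover: bool) -> List[str]:
    """Return the recovery actions for one periodic health test."""
    if monitor.is_healthy():
        return []

    actions = ["shut down the server hosting the messaging engine"]
    if in_cluster:
        if policy_allows_failover:
            actions.append("restart the messaging engine on an alternative server")
        actions.append("node agent restarts the server that was shut down")
    else:
        actions.append("restart the server, then the messaging engine restarts on it")
    return actions
```

Passing the clock in as a parameter keeps the sketch testable without real delays; a caller can substitute a fake clock and advance it by hand.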
Recovery that requires user intervention (global exceptions)
A messaging engine cannot recover from global exceptions by restarting
or failing over. For example, if the data store for a messaging engine
becomes corrupted, running the messaging engine on a different server
does not resolve the problem, because the messaging engine encounters
the same corrupted data store there. If a messaging engine in this situation were to
be failed over, the messaging engine would be continually failed over
because it could not run in any server. There would be unwanted disruption
to the cluster as servers attempted to run the messaging engine and
were shut down. To avoid such a situation, if a global exception occurs,
the messaging engine logs an error, stops processing messages, and
is not failed over. The messaging engine cannot be restarted until
you correct the global exception condition and restart the server.
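The reasoning behind this rule can be made concrete with a small simulation. The Python sketch below is a toy model only; `simulate_naive_failover`, `handle_global_exception`, and the server names are hypothetical. It counts how often each server would be shut down if a global exception were handled like a local one, and contrasts that with the actions the text describes.

```python
from typing import Dict, List


def simulate_naive_failover(servers: List[str], attempts: int) -> Dict[str, int]:
    """Count shutdowns per server if a global exception triggered failover.

    The fault (for example, a corrupted data store) follows the messaging
    engine to every host, so each attempt ends with that host shut down.
    """
    shutdowns = {name: 0 for name in servers}
    for attempt in range(attempts):
        host = servers[attempt % len(servers)]  # next server tries to run the engine
        shutdowns[host] += 1                    # the engine fails there as well
    return shutdowns


def handle_global_exception() -> List[str]:
    """Actions taken instead of failover when the exception is global."""
    return ["log an error", "stop processing messages", "do not fail over"]
```

In this model, six failover attempts across three servers shut each server down twice without ever producing a working messaging engine, which is the cluster disruption that stopping the engine in place avoids.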