The following sections briefly describe CICS® recovery processing after a communication failure, a transaction failure, and a system failure.
Whenever possible, CICS attempts to contain the effects of a failure, typically by terminating only the offending task while all other tasks continue normally. The updates performed by a prematurely terminated task can be backed out automatically (see CICS recovery processing following a transaction failure).
Causes of communication failure include:
There are two aspects to processing following a communications failure:
If the link fails and is later reestablished, CICS and its partners use the SNA set-and-test-sequence-numbers (STSN) command to find out what they were doing (backout or commit) at the time of link failure. For more information on link failure, see the CICS Intercommunication Guide.
When communication fails, the communication system access method either retries the transmission or notifies CICS. If a retry is successful, CICS is not informed. Information about the error can be recorded by the operating system. If the retries are not successful, CICS is notified.
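The retry-then-notify behavior described above can be sketched as a small model. This is an illustrative simplification, not CICS or access method code: the function name `send_with_retries`, the `transmit` callback, the retry limit, and the list standing in for the operating system's error record are all hypothetical.

```python
from typing import Callable, List


def send_with_retries(
    transmit: Callable[[], bool],
    max_attempts: int,
    os_error_log: List[str],
    notify_cics: Callable[[str], None],
) -> bool:
    """Model an access method that retries before involving CICS.

    transmit      -- hypothetical callback; returns True on success
    max_attempts  -- assumed limit on attempts (not a real parameter)
    os_error_log  -- stands in for error data recorded by the operating system
    notify_cics   -- called only when every attempt has failed
    """
    for attempt in range(1, max_attempts + 1):
        if transmit():
            # A successful retry is invisible to CICS.
            return True
        # The error can still be recorded outside CICS.
        os_error_log.append(f"transmission error on attempt {attempt}")
    # Retries exhausted: only now is CICS told about the failure.
    notify_cics("retries exhausted")
    return False
```

In this model, a transmission that fails twice and then succeeds leaves two entries in `os_error_log` and never calls `notify_cics`.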
When CICS detects a communication failure, it gives control to one of two programs: the node error program (NEP) or the terminal error program (TEP).
Both dummy and sample versions of these programs are provided by CICS. The dummy versions do nothing; they simply allow the default actions selected by CICS to proceed. The sample versions show how to write your own NEP or TEP to change the default actions.
The types of processing that might be in a user-written NEP or TEP are:
For more information about NEPs and TEPs, see Communication error processing.
Loss of communication between CICS regions can be caused by the loss of an MVS image in which CICS regions are running. If the regions are communicating over XCF/MRO links, the loss of connectivity may not be immediately apparent because XCF waits for a reply to a message it issues.
The loss of an MVS image in a sysplex is detected by XCF in another MVS, and XCF issues message IXC402D. If the failed MVS is running CICS regions connected through XCF/MRO to CICS regions in another MVS, tasks running in the active regions are initially suspended in an IRLINK WAIT state.
XCF/MRO-connected regions do not detect the loss of an MVS image and its resident CICS regions until an operator replies to the XCF IXC402D message. When the operator replies to IXC402D, the CICS interregion communication program, DFHIRP, is notified, the suspended tasks are abended, and the MRO connections are closed. Until the reply is issued to IXC402D, an INQUIRE CONNECTION command continues to show connections to regions in the failed MVS as in service and normal.
When the failed MVS image and its CICS regions are restarted, the interregion communication links are reopened automatically.
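The sequence of events above can be followed more easily as a small state model. This is a sketch for illustration only: the class `MroLinkModel`, its method names, and the status strings are hypothetical and do not correspond to any CICS interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MroLinkModel:
    """Toy model of an XCF/MRO link to a region in another MVS image."""

    # What an inquiry on the connection would report in this model.
    reported_status: str = "in service and normal"
    suspended_tasks: List[str] = field(default_factory=list)
    abended_tasks: List[str] = field(default_factory=list)

    def partner_mvs_lost(self, running_tasks: List[str]) -> None:
        # XCF has issued IXC402D, but no reply has been given yet.
        # Tasks using the link are suspended; the reported status
        # deliberately does not change at this point.
        self.suspended_tasks.extend(running_tasks)

    def operator_replies_to_ixc402d(self) -> None:
        # Only now is the loss acted on: suspended tasks are abended
        # and the connection is closed.
        self.abended_tasks.extend(self.suspended_tasks)
        self.suspended_tasks.clear()
        self.reported_status = "closed"

    def partner_restarted(self) -> None:
        # Links are reopened automatically after the restart.
        self.reported_status = "in service and normal"
```

The point the model makes is that between `partner_mvs_lost` and `operator_replies_to_ixc402d` the reported status is misleading: tasks are already suspended while the connection still appears healthy.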
Causes of a transaction failure include:
During normal execution of a transaction working with recoverable resources, CICS stores recovery information in the system log. If the transaction fails, CICS uses the information from the system log to back out the changes made by the interrupted UOW. Recoverable resources are thus not left in a partially updated or inconsistent state. Backing out an individual transaction is called dynamic transaction backout.
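The idea behind dynamic transaction backout can be illustrated with a minimal undo log. This sketch assumes that the recovery information consists of before-images of each updated record; the class `RecoverableStore` and its methods are hypothetical and are not how CICS is implemented.

```python
# Sentinel meaning "the record did not exist before the update".
_MISSING = object()


class RecoverableStore:
    """Toy recoverable resource with a before-image log."""

    def __init__(self, initial=None):
        self.data = dict(initial or {})
        # Each entry: (uow_id, key, before_image)
        self.system_log = []

    def update(self, uow_id, key, value):
        # Log the before-image first, then apply the change.
        self.system_log.append((uow_id, key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def backout(self, uow_id):
        # Undo this unit of work's changes in reverse order so that
        # the oldest before-image is the one finally restored.
        for logged_uow, key, before in reversed(self.system_log):
            if logged_uow != uow_id:
                continue
            if before is _MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = before
        # Drop the log records for the backed-out unit of work.
        self.system_log = [r for r in self.system_log if r[0] != uow_id]
```

If a unit of work changes record `A` twice and creates record `B` before failing, `backout` restores `A` to its original value and removes `B`, so the store is not left partially updated.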
After dynamic transaction backout has completed, the transaction can restart automatically without the operator being aware of it. This function is especially useful when the cause of the transaction failure is temporary and an attempt to rerun the transaction is likely to succeed (for example, a DL/I program isolation deadlock). The conditions under which a transaction can be automatically restarted are described under Abnormal termination of a task.
If dynamic transaction backout fails, perhaps because of an I/O error on a VSAM data set, CICS backout failure processing shunts the UOW and converts the locks that are held on the backout-failed records into retained locks. The data set remains open for use, allowing the shunted UOW to be retried. If backout keeps failing because the data set is damaged, you can create a new data set using a backup copy and then perform forward recovery, using a utility such as CICSVR. When the data set is recovered, retry the shunted unit of work to complete the failed backout and release the locks.
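The shunt-and-retry cycle described above can be sketched as follows. This is an illustrative model under stated assumptions: the `UnitOfWork` class, the state names, the lock labels, and the `write_record` callback are all hypothetical, and an `OSError` stands in for the I/O error on the data set.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class UnitOfWork:
    # key -> before-image that still has to be written back
    pending: Dict[str, object]
    # key -> "active" or "retained"
    locks: Dict[str, str] = field(default_factory=dict)
    state: str = "backing out"


def attempt_backout(uow: UnitOfWork,
                    write_record: Callable[[str, object], None]) -> str:
    """Try to restore every pending before-image; shunt on failure."""
    for key in list(uow.pending):
        try:
            write_record(key, uow.pending[key])
        except OSError:
            # Backout failed for this record: keep it protected with a
            # retained lock and shunt the unit of work for a later retry.
            uow.locks[key] = "retained"
            uow.state = "shunted"
            continue
        # Record restored: nothing left to undo, so release its lock.
        del uow.pending[key]
        uow.locks.pop(key, None)
    if not uow.pending:
        uow.state = "backed out"
    return uow.state
```

Calling `attempt_backout` again after the data set has been recovered models the retry of the shunted unit of work: the remaining before-images are written and the retained locks are released.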
Unit of work recovery and abend processing gives more details about CICS processing of a transaction failure.
Causes of a system failure include:
During normal execution, CICS stores recovery information on its system log stream, which is managed by the MVS system logger. If you specify START=AUTO, CICS automatically performs an emergency restart when it restarts after a system failure.
During an emergency restart, the CICS log manager reads the system log backward and passes information to the CICS recovery manager.
The CICS recovery manager then uses the information retrieved from the system log to:
A special case of CICS processing following a system failure is covered in CICS emergency restart.
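Reading the log backward during an emergency restart can be illustrated with a simplified model. This sketch assumes a log of hypothetical `("update", uow, key, before_image)` and `("commit", uow)` records, with `None` as the before-image of a record that did not previously exist; the real system log format and recovery manager logic are not shown in this section and are certainly more involved.

```python
def emergency_restart(system_log, data):
    """Back out units of work that were in flight at the failure.

    system_log -- list of hypothetical log records, oldest first
    data       -- dict holding the state of the recoverable resource
    Returns the set of unit-of-work ids that were backed out.
    """
    committed = set()
    in_flight = set()
    # Scanning newest-to-oldest means a commit record is always seen
    # before the updates that belong to it.
    for record in reversed(system_log):
        kind, uow = record[0], record[1]
        if kind == "commit":
            committed.add(uow)
        elif kind == "update" and uow not in committed:
            _, _, key, before = record
            in_flight.add(uow)
            if before is None:
                data.pop(key, None)   # record was created by the UOW
            else:
                data[key] = before    # restore the before-image
    return in_flight
```

The backward direction is what keeps the scan simple: by the time an update record is reached, the model already knows whether its unit of work committed, so uncommitted changes can be undone in a single pass.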