This section describes the functions and interfaces provided by CICS® for recovery after a communication failure, or a CICS system failure.
Not all CICS releases provide the same level of support; this section describes MRO and APPC parallel-session connections to other CICS Transaction Server for z/OS® systems. Much of it applies also to other types of connection, but with some restrictions. For information about the restrictions for connections to non-CICS Transaction Server for z/OS systems, and for LU6.1 and APPC single-session connections, see Connections that do not fully support shunting.
This section also assumes that each CICS system is restarted correctly (that is, that AUTO is coded on the START system initialization parameter). If an initial start is performed there are implications for connected systems; these are described in Initial and cold starts.
If CICS is left in-doubt about a unit of work due to a communication failure, it can do one of two things (how you can influence which of the two actions CICS takes is described in The in-doubt attributes of the transaction definition):
There is a trade-off between the two functions: the suspension of in-doubt UOWs causes updated data to be locked against subsequent access; this disadvantage has to be weighed against the possibility of corruption of the consistency of data, which could result from taking unilateral decisions. When unilateral decisions are taken, there may be application-dependent processes, such as reconciliation jobs, that can restore consistency, but there is no general method that can be provided by CICS.
This section summarizes the resource definition options, system programming commands, and CICS-supplied transactions that you can use to control and investigate units of work that fail during the in-doubt period. For definitive information about defining resources, system programming commands, and CEMT transactions, see the CICS Resource Definition Guide, the CICS System Programming Reference manual, and the CICS Supplied Transactions manual, respectively.
You can control the action that CICS takes after a communication failure during the in-doubt period by specifying in-doubt attributes when you define the transaction, using the WAIT, WAITTIME, and ACTION options of the TRANSACTION definition. These options are honored when communication is lost with the coordinator and the UOW is in the in-doubt period.
You can use WAIT and WAITTIME to allow an opportunity for normal recovery and resynchronization to take place, while ensuring that a unit of work releases locks within a reasonable time.
The action is dependent on the WAIT attribute. If WAIT specifies YES, ACTION has no effect unless the interval specified on the WAITTIME option expires before recovery from the failure.
Whether you specify BACKOUT or COMMIT is likely to depend on the kinds of changes that the transaction makes to resources in the remote system--see Specifying in-doubt attributes--an example.
As an illustration of specifying the in-doubt attributes of a transaction, consider the following simple example:
A transaction is given a part number; it checks the entry in a local file to see whether the part is in stock, decrements the quantity in stock by updating the stock file, and sends a record to a remote transient data queue to initiate the dispatch of the part.
The update to the local file should take place only if the addition is made to the remote transient data (TD) queue, and the TD queue should only be updated if an update is made to the local file. The first step towards achieving this is to specify both the file and the TD queue as recoverable resources. This ensures synchronization of the changes to the resources (that is, both changes will either be backed out or committed) in all cases except for a session or system failure during the in-doubt period of syncpoint processing.
To deal with a communications failure--for example, a failure of the remote system--during the in-doubt period, specify on the local transaction definition, WAIT(YES), ACTION(BACKOUT), and a WAITTIME long enough to allow the remote system to be recycled. This enables resynchronization to take place automatically, if communication is restored within the specified time limit. During the WAITTIME period, until resynchronization takes place, the local UOW is shunted, and a lock is held on the stock-file record.
If communication is not restored within the time limit, changes made to the stock file on the local system are backed out. The addition to the TD queue on the remote system may or may not have been committed; this must be investigated after communication is restored.
The CEMT and EXEC CICS interfaces provide a set of inquiry commands that you can use to investigate the execution of distributed units of work, and diagnose problems. In summary, the commands are:
Note that INQUIRE UOW returns information about a local UOW--that is, for a distributed UOW it returns information only about the work required on the local system. You can assemble information about a distributed UOW by matching the network-wide identifier returned in the NETUOWID field against the identifiers of local UOWs on other systems. For an example of how to do this, see Resolving a resynchronization failure.
For a local UOW, INQUIRE UOWLINK returns a list of tokens (UOW-links) representing connections to the systems that are involved in the distributed UOW. For each UOW-link, INQUIRE UOWLINK returns:
For examples of the use of these commands to diagnose problems with distributed units of work, see Problem determination examples.
In exceptional cases, you may need to override the in-doubt action normally controlled by the transaction definition. For example, a connected system may take longer than expected to restart. If the connected system is the coordinator of any UOWs, you can use the EXEC CICS or CEMT SET CONNECTION UOWACTION(FORCE|COMMIT|BACKOUT) command to force the UOWs to take a local, unilateral decision to commit or back out.
The following commands are described in The exchange lognames process and Managing connection definitions: