Recovery functions and interfaces

This section describes the functions and interfaces provided by CICS® for recovery after a communication failure or a CICS system failure.

Important

Not all CICS releases provide the same level of support; this section describes MRO and APPC parallel-session connections to other CICS Transaction Server for z/OS® systems. Much of it applies also to other types of connection, but with some restrictions. For information about the restrictions for connections to non-CICS Transaction Server for z/OS systems, and for LU6.1 and APPC single-session connections, see Connections that do not fully support shunting.

This section also assumes that each CICS system is restarted correctly (that is, that AUTO is coded on the START system initialization parameter). If an initial start is performed there are implications for connected systems; these are described in Initial and cold starts.

Recovery functions

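If a communication failure leaves CICS in doubt about a unit of work (UOW), it can do one of two things (how you can influence which of the two actions CICS takes is described in The in-doubt attributes of the transaction definition):

Suspend the UOW until resynchronization with the coordinator is possible; that is, shunt it.
Take a unilateral decision to commit or back out the changes made to local recoverable resources.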

There is a trade-off between the two functions. Suspending in-doubt UOWs keeps updated data locked against subsequent access; this disadvantage has to be weighed against the risk of data inconsistency that can result from unilateral decisions. When unilateral decisions are taken, application-dependent processes, such as reconciliation jobs, may be able to restore consistency, but CICS cannot provide a general method.

Recovery interfaces

This section summarizes the resource definition options, system programming commands, and CICS-supplied transactions that you can use to control and investigate units of work that fail during the in-doubt period. For definitive information about defining resources, system programming commands, and CEMT transactions, see the CICS Resource Definition Guide, the CICS System Programming Reference manual, and the CICS Supplied Transactions manual, respectively.

The in-doubt attributes of the transaction definition

You can control the action that CICS takes after a communication failure during the in-doubt period by specifying in-doubt attributes when you define the transaction, using the WAIT, WAITTIME, and ACTION options of the TRANSACTION definition. These options are honored when communication is lost with the coordinator and the UOW is in the in-doubt period.

WAIT({YES|NO})
Specifies whether or not a unit of work is to wait, pending recovery from a failure that occurred after it had entered the in-doubt period, before taking the action specified by ACTION.
YES
The UOW is to wait, pending recovery from the failure, to resolve its in-doubt state and determine whether recoverable resources are to be backed out or committed. In other words, it is to be shunted.
NO
The UOW is not to wait. CICS immediately takes whatever action is specified on the ACTION attribute.
Note:
The setting of the WAIT option can be overridden by other system settings; see the description of DEFINE TRANSACTION in the CICS Resource Definition Guide.
WAITTIME({00,00,00|dd,hh,mm})
Specifies, if WAIT(YES) is specified, how long (in days, hours, and minutes) the transaction is to wait before taking the action specified by ACTION.

You can use WAIT and WAITTIME to allow an opportunity for normal recovery and resynchronization to take place, while ensuring that a unit of work releases locks within a reasonable time.

ACTION({BACKOUT|COMMIT})
Specifies the action to be taken when communication with the coordinator of the unit of work is lost, and the UOW has entered the in-doubt period.
BACKOUT
All changes made to recoverable resources are backed out, and the resources are returned to the state they were in before the start of the UOW.
COMMIT
All changes made to recoverable resources are committed and the UOW is marked as completed.

The action is dependent on the WAIT attribute. If WAIT specifies YES, ACTION has no effect unless the interval specified on the WAITTIME option expires before recovery from the failure.

Whether you specify BACKOUT or COMMIT is likely to depend on the kinds of changes that the transaction makes to resources in the remote system--see Specifying in-doubt attributes--an example.
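
The way WAIT, WAITTIME, and ACTION combine can be modeled in a few lines. The following Python sketch is illustrative only and is not CICS code: the names parse_waittime, IndoubtAttributes, and resolve are hypothetical, and it assumes that a zero WAITTIME (the default 00,00,00) means that no time limit applies.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


def parse_waittime(value: str) -> Optional[timedelta]:
    """Parse a 'dd,hh,mm' string; return None for a zero interval."""
    days, hours, minutes = (int(part) for part in value.split(","))
    interval = timedelta(days=days, hours=hours, minutes=minutes)
    # Assumption: a zero interval (the default 00,00,00) means "no time limit".
    return interval if interval else None


@dataclass
class IndoubtAttributes:
    """Hypothetical holder for the in-doubt attributes of a transaction."""
    wait: bool                   # WAIT(YES) -> True, WAIT(NO) -> False
    waittime: str = "00,00,00"   # WAITTIME(dd,hh,mm)
    action: str = "BACKOUT"      # ACTION(BACKOUT|COMMIT)


def resolve(attrs: IndoubtAttributes, elapsed: timedelta) -> str:
    """Return the local outcome for a UOW still in doubt after `elapsed`."""
    if not attrs.wait:
        return attrs.action      # WAIT(NO): take the ACTION immediately
    limit = parse_waittime(attrs.waittime)
    if limit is not None and elapsed >= limit:
        return attrs.action      # WAITTIME expired before recovery
    return "SHUNTED"             # keep waiting for resynchronization
```

For example, resolve(IndoubtAttributes(wait=True, waittime="00,00,10"), timedelta(minutes=5)) leaves the UOW shunted, whereas the same call with an elapsed time of ten minutes or more returns the ACTION value.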

Specifying in-doubt attributes--an example

As an illustration of specifying the in-doubt attributes of a transaction, consider the following simple example:

Example

A transaction is given a part number; it checks the entry in a local file to see whether the part is in stock, decrements the quantity in stock by updating the stock file, and sends a record to a remote transient data queue to initiate the dispatch of the part.

The update to the local file should take place only if the addition is made to the remote transient data (TD) queue, and the TD queue should only be updated if an update is made to the local file. The first step towards achieving this is to specify both the file and the TD queue as recoverable resources. This ensures synchronization of the changes to the resources (that is, both changes will either be backed out or committed) in all cases except for a session or system failure during the in-doubt period of syncpoint processing.

To deal with a communication failure (for example, a failure of the remote system) during the in-doubt period, specify WAIT(YES), ACTION(BACKOUT), and a WAITTIME long enough to allow the remote system to be recycled on the local transaction definition. This enables resynchronization to take place automatically if communication is restored within the specified time limit. During the WAITTIME period, until resynchronization takes place, the local UOW is shunted, and a lock is held on the stock-file record.

If communication is not restored within the time limit, changes made to the stock file on the local system are backed out. The addition to the TD queue on the remote system may or may not have been committed; this must be investigated after communication is restored.
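
The investigation mentioned above is application-dependent. As one possible shape for it, the following Python sketch compares the two sides after communication is restored. It is not a CICS facility: the function name and the sample data are hypothetical, and it assumes that the application records a UOW identifier with each stock update and each dispatch record, which the example does not state.

```python
from typing import Dict, List, Set


def find_unmatched_dispatches(
    stock_updates: Set[str],
    dispatch_records: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Return dispatch records whose UOW has no committed stock update."""
    return [rec for rec in dispatch_records if rec["uow"] not in stock_updates]


# Hypothetical data: UOW identifiers of committed local stock-file updates ...
committed_stock_updates = {"UOW-001", "UOW-002"}
# ... and records found on the remote TD queue after communication is restored.
remote_queue = [
    {"uow": "UOW-001", "part": "P100"},
    {"uow": "UOW-003", "part": "P200"},  # local update was backed out
]

# Dispatches that were committed remotely although the local update was not.
suspect = find_unmatched_dispatches(committed_stock_updates, remote_queue)
```

Here suspect contains only the record for UOW-003, which is the case where the remote addition was committed but the local stock update was backed out.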

INQUIRE commands

The CEMT and EXEC CICS interfaces provide a set of inquiry commands that you can use to investigate the execution of distributed units of work, and diagnose problems. In summary, the commands are:

INQUIRE CONNECTION RECOVSTATUS
Use it to find out whether any resynchronization work is outstanding between the local system and the connected system. The returned CVDA values are:
NORECOVDATA
Neither side has recovery information outstanding.
NOTAPPLIC
This is neither an APPC parallel-session connection nor a CICS-to-CICS MRO connection, and does not support two-phase commit protocols.
NRS
CICS does not have recovery outstanding for the connection, but the partner may have.
RECOVDATA
There are in-doubt units of work associated with the connection, or there are outstanding resyncs awaiting FORGET on the connection. Resynchronization takes place when the connection next becomes active, or when the UOW is unshunted.
INQUIRE CONNECTION PENDSTATUS
Use it to discover whether there are any UOWs for which resynchronization is impossible because of an initial start by the connected system.
INQUIRE CONNECTION XLNSTATUS (APPC parallel-sessions only)
Use it to discover whether the link is currently able to support syncpoint (synclevel 2) work. See The exchange lognames process for more information.
INQUIRE UOW
Use it to discover why a unit of work is waiting or shunted. If the reason is a connection failure (the WAITCAUSE option returns a CVDA value of CONNECTION), the SYSID and LINK options return the sysid and netname of the remote system that caused the UOW to wait or be shunted.

Note that INQUIRE UOW returns information about a local UOW--that is, for a distributed UOW it returns information only about the work required on the local system. You can assemble information about a distributed UOW by matching the network-wide identifier returned in the NETUOWID field against the identifiers of local UOWs on other systems. For an example of how to do this, see Resolving a resynchronization failure.

INQUIRE UOWLINK
This command allows you to inquire about the resynchronization needs of individual UOWs. Use it to discover information about connections involved in a distributed UOW.

For a local UOW, INQUIRE UOWLINK returns a list of tokens (UOW-links) representing connections to the systems that are involved in the distributed UOW. For each UOW-link, INQUIRE UOWLINK returns:

For examples of the use of these commands to diagnose problems with distributed units of work, see Problem determination examples.
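
Matching NETUOWID values across systems, as described for INQUIRE UOW, amounts to grouping local UOWs by their network-wide identifier. The following Python sketch shows that grouping on data that is assumed to have been collected already from each system. It does not issue any CICS commands, and the function name, tuple layout, and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_netuowid(
    local_uows: Iterable[Tuple[str, str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (system, local_uow, netuowid) rows by network-wide identifier."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for system, local_uow, netuowid in local_uows:
        groups[netuowid].append((system, local_uow))
    return dict(groups)


# Hypothetical rows gathered from INQUIRE UOW output on two systems.
rows = [
    ("SYSA", "A1", "NET.X.1"),
    ("SYSB", "B7", "NET.X.1"),
    ("SYSA", "A2", "NET.Y.9"),
]

# Keep only identifiers seen on more than one system: the distributed UOWs.
distributed = {
    netuowid: members
    for netuowid, members in group_by_netuowid(rows).items()
    if len(members) > 1
}
```

In this sample, only NET.X.1 appears on both systems, so distributed holds a single entry that lists the local UOWs A1 on SYSA and B7 on SYSB.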

SET CONNECTION command

In exceptional cases, you may need to override the in-doubt action normally controlled by the transaction definition. For example, a connected system may take longer than expected to restart. If the connected system is the coordinator of any UOWs, you can use the EXEC CICS or CEMT SET CONNECTION UOWACTION(FORCE|COMMIT|BACKOUT) command to force the UOWs to take a local, unilateral decision to commit or back out.

The following commands are described in The exchange lognames process and Managing connection definitions:

Related concepts
Syncpoint exchanges
Initial and cold starts
Connections that do not fully support shunting
APPC connection quiesce processing
Related tasks
Managing connection definitions
Problem determination
Related reference
Terminology