Unit of work concepts

When resources are being changed, there comes a point when the changes are complete and do not need backout if a failure occurs later.

Unit of work

The period between the start of a particular set of changes and the point at which they are complete is called a unit of work (UOW). The unit of work is a fundamental concept of all CICS® backout mechanisms.

From the application designer's point of view, a UOW is a sequence of actions that needs to be complete before any of the individual actions can be regarded as complete. To ensure data integrity, a unit of work must be atomic, consistent, isolated, and durable (see ACID properties in the CICS Transaction Server for z/OS® Glossary).

The CICS recovery manager operates with units of work. If a transaction that consists of multiple UOWs fails, or the CICS region fails, committed UOWs are not backed out.

Unit of work states

A unit of work can be in one of the following states:

The shunted state

A shunted UOW is one awaiting resolution of an in-doubt failure, a commit failure, or a backout failure. (See Unit of work recovery for descriptions of these types of UOW failure.) The CICS recovery manager attempts to complete a shunted UOW when the failure that caused it to be shunted has been resolved.

A UOW can be unshunted and then shunted again (in theory, any number of times). For example, a UOW could go through the following stages:

  1. A UOW fails in-doubt and is shunted.
  2. After resynchronization, CICS finds that the decision is to back out the in-doubt UOW.
  3. Recovery manager unshunts the UOW to perform backout.
  4. If backout fails, the UOW is shunted again.
  5. Recovery manager unshunts the UOW to retry the backout.
  6. Steps 4 and 5 can occur several times until the backout succeeds.
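The stages listed above can be traced with a small simulation. This is a minimal sketch only, not CICS code: the function name shunt_cycle, the backout_results argument, and the state labels are all hypothetical, chosen to mirror the numbered steps.

```python
def shunt_cycle(backout_results):
    """Return the sequence of states a UOW passes through.

    backout_results is a hypothetical list of booleans, one per backout
    attempt: True means the attempt succeeded, False means it failed.
    """
    # Step 1: the UOW fails in-doubt and is shunted.
    trace = ["shunted (in-doubt failure)"]
    # Step 2: resynchronization decides that the UOW must be backed out.
    for succeeded in backout_results:
        # Steps 3 and 5: recovery manager unshunts the UOW to attempt backout.
        trace.append("unshunted")
        if succeeded:
            trace.append("backed out")
            break
        # Step 4: the backout failed, so the UOW is shunted again.
        trace.append("shunted (backout failure)")
    return trace


if __name__ == "__main__":
    # Two failed backout attempts followed by a successful one.
    for state in shunt_cycle([False, False, True]):
        print(state)
```

Each False entry in the list corresponds to one repetition of steps 4 and 5. If the list ends without a True entry, the UOW is left shunted.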

These situations can persist for some time, depending on how long it takes to resolve the cause of the failure. Because it is undesirable for transaction resources to be held up for too long, CICS attempts to release as many resources as possible while a UOW is shunted. This is generally achieved by abending the user task to which the UOW belongs, resulting in the release of the following:

The resources CICS retains include:

Locks

For files opened in RLS mode, VSAM maintains a single central lock structure using the lock-assist mechanism of the MVS™ coupling facility. This central lock structure provides sysplex-wide locking at a record level; control interval (CI) locking is not used.

The locks for files accessed in non-RLS mode, the scope of which is limited to a single CICS region, are file-control managed locks. Initially, when CICS processes a read-for-update request, CICS obtains a CI lock. File control then issues an ENQ request to the enqueue domain to acquire a CICS lock on the specific record. This enables file control to notify VSAM to release the CI lock before returning control to the application program. Releasing the CI lock minimizes the potential for deadlocks to occur.
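The non-RLS sequence described above (obtain the CI lock, enqueue on the record, release the CI lock) can be modeled roughly as follows. This is an illustrative sketch under stated assumptions, not a representation of file control or VSAM internals: the class NonRlsFileLocks, its methods, the integer record numbers, and the mapping of records to control intervals by integer division are all hypothetical.

```python
class NonRlsFileLocks:
    """Toy model of CI locks and record locks within one region."""

    def __init__(self, records_per_ci=4):
        self.records_per_ci = records_per_ci  # assumed fixed CI capacity
        self.ci_locks = {}       # CI number -> UOW (held only briefly)
        self.record_locks = {}   # record number -> UOW (held to end of UOW)

    def read_for_update(self, record, uow):
        """Return True if the record lock is obtained, False otherwise."""
        holder = self.record_locks.get(record)
        if holder is not None and holder != uow:
            return False  # simplification: report the conflict, do not wait
        ci = record // self.records_per_ci
        self.ci_locks[ci] = uow          # 1. the CI lock is obtained
        self.record_locks[record] = uow  # 2. the record-level lock is enqueued
        del self.ci_locks[ci]            # 3. the CI lock is released
        return True

    def end_uow(self, uow):
        """Release every record lock held by the given UOW."""
        self.record_locks = {
            rec: owner for rec, owner in self.record_locks.items() if owner != uow
        }


if __name__ == "__main__":
    locks = NonRlsFileLocks()
    print(locks.read_for_update(0, "UOW1"))  # record 0 is in CI 0
    print(locks.read_for_update(1, "UOW2"))  # record 1 is also in CI 0
    print(locks.ci_locks)                    # no CI lock remains held
```

In this model, two units of work can hold locks on different records in the same control interval, because only the record-level lock outlives the request.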

For coupling facility data tables updated under the locking model, the coupling facility data table server stores the lock with its record in the CFDT. As in the case of RLS locks, storing the lock with its record in the coupling facility list structure that holds the coupling facility data table ensures sysplex-wide locking at record level.

For both RLS and non-RLS recoverable files, CICS releases all locks on completion of a unit of work. For recoverable coupling facility data tables, the locks are released on completion of a unit of work by the CFDT server.

Active and retained states for locks

CICS supports active and retained states for locks.

When a lock is first acquired, it is an active lock. It remains active until the unit of work completes successfully, at which point it is released. If the unit of work fails, or if there is a CICS or SMSVSAM failure, the lock is converted into a retained lock.

Converting active locks into retained locks not only protects data integrity, but also ensures that new requests for locks owned by the failed unit of work do not wait and are instead rejected with the LOCKED response.
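The difference between active and retained locks, as seen by a competing request, can be sketched as follows. This is a simplified model, not the CICS or SMSVSAM implementation: the LockTable class and its methods are hypothetical, and the return values "GRANTED" and "WAIT" are illustrative labels. Only the LOCKED response is taken from the description above.

```python
class LockTable:
    """Toy lock table that distinguishes active and retained locks."""

    def __init__(self):
        self._locks = {}  # record key -> (owning UOW, "active" or "retained")

    def acquire(self, key, uow):
        held = self._locks.get(key)
        if held is None:
            self._locks[key] = (uow, "active")  # a new lock starts as active
            return "GRANTED"
        owner, state = held
        if owner == uow:
            return "GRANTED"
        # A requester waits on an active lock but is rejected by a retained one.
        return "WAIT" if state == "active" else "LOCKED"

    def complete(self, uow):
        """Successful completion of the UOW: release all of its locks."""
        self._locks = {k: v for k, v in self._locks.items() if v[0] != uow}

    def fail(self, uow):
        """Failure of the UOW: convert its active locks into retained locks."""
        for key, (owner, _state) in list(self._locks.items()):
            if owner == uow:
                self._locks[key] = (owner, "retained")


if __name__ == "__main__":
    table = LockTable()
    print(table.acquire("REC1", "UOW1"))  # first request obtains the lock
    print(table.acquire("REC1", "UOW2"))  # competing request on an active lock
    table.fail("UOW1")
    print(table.acquire("REC1", "UOW2"))  # competing request on a retained lock
```

Rejecting the request immediately allows the requesting program to handle the condition, rather than waiting for a failure that might take some time to resolve.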

Synchronization points

The end of a UOW is indicated to CICS by a synchronization point (usually abbreviated to syncpoint).

A syncpoint arises in the following ways:

It follows from this that a unit of work starts:

A UOW that does not change a recoverable resource has no meaningful effect for the CICS recovery mechanisms. Nonrecoverable resources are never backed out.

A unit of work can also be ended by backout, which causes a syncpoint in one of the following ways:

Examples

In Figure 1, task A is a nonconversational (or pseudoconversational) task with one UOW, and task B is a multiple UOW task (typically a conversational task in which each UOW accepts new data from the user). The figure shows how UOWs end at syncpoints. During the task, the application program can issue syncpoints explicitly, and, at the end, CICS issues a syncpoint.

Figure 1. Units of work and syncpoints
 This diagram shows the elapsed time, as two lines, for task A and task B. In each case, the ends of the line indicate the start and end of the task, and end-of-task also shows a syncpoint. However, the whole of task A from start to finish represents only one UOW, and the end-of-task syncpoint is the only one. The line for task B, on the other hand, is much longer than the line for task A and is divided into a number of separate UOWs, with the task taking a syncpoint at three intervals in addition to the end-of-task syncpoint.

Figure 2 shows that database changes made by a task are not committed until a syncpoint is executed. If task processing is interrupted because of a failure of any kind, changes made within the abending UOW are automatically backed out.

If there is a system failure at time X:

Figure 2. Backout of units of work
 This diagram shows the elapsed time, as three horizontal lines, for tasks A, B, and C, showing that each task modifies some recoverable resources. Tasks A and C do not take explicit syncpoints and rely on the implicit syncpoint taken by CICS at end-of-task. Task B comprises four separate UOWs, with each UOW performing updates, indicated as Mod1 through Mod4. Before tasks B and C can complete, there is a system failure (represented by a vertical line that cuts through tasks B and C). Task A completes before the system failure, therefore the syncpoint is taken and the updates are committed. In task B the system failure occurs during UOW 3, and in task C the failure occurs after both updates but before the end-of-task syncpoint. The recovery for tasks B and C is as indicated in the text.
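The principle that changes are committed only at a syncpoint, and that uncommitted changes are backed out after a failure, can be modeled in a few lines. This is a minimal sketch that assumes a simple in-memory store. The Task class, its methods, and the keys such as "B.Mod1" are hypothetical names used for illustration. It also assumes that task A makes a single update and that task B has made its Mod3 update when the failure occurs.

```python
class Task:
    """Toy task whose changes become permanent only at a syncpoint."""

    def __init__(self, store):
        self.store = store   # committed data, shared by all tasks
        self.pending = {}    # changes made in the current UOW

    def modify(self, key, value):
        self.pending[key] = value

    def syncpoint(self):
        """End the current UOW and commit its changes."""
        self.store.update(self.pending)
        self.pending = {}

    def fail(self):
        """Back out the current UOW; committed UOWs are not affected."""
        self.pending = {}


def simulate():
    store = {}
    task_a, task_b, task_c = Task(store), Task(store), Task(store)

    # Task A completes before the failure; end-of-task is a syncpoint.
    task_a.modify("A.Mod1", "updated")
    task_a.syncpoint()

    # Task B commits UOW 1 and UOW 2, then the failure occurs during UOW 3.
    task_b.modify("B.Mod1", "updated")
    task_b.syncpoint()
    task_b.modify("B.Mod2", "updated")
    task_b.syncpoint()
    task_b.modify("B.Mod3", "updated")

    # Task C makes both updates but has not reached its end-of-task syncpoint.
    task_c.modify("C.Mod1", "updated")
    task_c.modify("C.Mod2", "updated")

    # System failure: the UOWs that are still in flight are backed out.
    task_b.fail()
    task_c.fail()
    return store


if __name__ == "__main__":
    print(sorted(simulate()))
```

In this model, the store retains the update made by task A and the updates from the first two UOWs of task B. The updates made in UOW 3 of task B and in task C are discarded.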