In the event of a communications failure, the connected systems may resolve their local parts of a distributed unit of work in ways that are inconsistent with each other. To warn of this possibility, when a CICS region loses communication with a partner, for each session on which the UOW is in the in-doubt period, it issues a DFHRMxxxx message. The message may appear at the time of a session failure, a failure of the partner, or during emergency restart.
When the connection has been reestablished, on each affected session the UOW is unshunted, its state is determined, and another message is issued. For LUTYPE6.1 conversations, these messages may appear only on the initiator side.
All messages contain the following information, which enables them to be correlated:
The messages associated with intersystem session failure and recovery are shown in three tables. Table 18 and Table 19 show the messages that can be produced when contact is lost with the coordinator of the UOW: Table 18 shows the messages produced when WAIT(YES) is specified on the transaction definition and shunting is possible; Table 19 shows the messages produced when WAIT(NO) is specified, or when shunting is not possible. Table 20 shows the messages produced when contact is lost with a subordinate in the UOW. Full details are in the CICS Messages and Codes manual.
Sequence of messages | Circumstances | Messages issued | Meaning of messages |
---|---|---|---|
Stage 1 | Session failure | DFHRM0106 | Intersystem session failure. Resource changes will not be committed or backed out until session recovery. |
Stage 1 | System failure/restart | -- | -- |
Stage 2 | Session recovery successful. | DFHRM0108 | Intersystem session recovery. Suspended resource changes now being committed. |
Stage 2 | Session recovery successful. | DFHRM0109 | Intersystem session recovery. Suspended resource changes now being backed out. |
Stage 2 | Wait time exceeded or SET UOW ACTION issued. | DFHRM0104 DFHRM0105 | See next table. |
Stage 2 | SET CONNECTION NOTPENDING ¢ or XLNACTION (FORCE) ¢ or NORECOVDATA $ issued. | DFHRM0125 DFHRM0126 | Local resources committed or backed out. |
Stage 2 | Session recovery after a cold start of local resources. | DFHRM0209 | UOW backed out. |
Stage 2 | Session recovery after a cold start of local resources. | DFHRM0208 | UOW committed. |
Stage 2 | Session recovery error -- for example, partner cold-started. * | DFHRM0112 DFHRM0113 DFHRM0115 DFHRM0116 DFHRM0118 DFHRM0119 DFHRM0121 DFHRM0122 | Intersystem recovery error. Local resource changes are committed or backed out. |
Key:
Sequence of messages | Circumstances | Messages issued | Meaning of messages |
---|---|---|---|
Stage 1 | Session failure | DFHRM0104 DFHRM0105 | Intersystem session failure. Resource changes are being committed or backed out and may be out of sync with partner. |
Stage 1 | System failure/restart | -- | -- |
Stage 2 | Session recovery successful. | DFHRM0110 | Intersystem session recovery. Resource updates found to be synchronized. |
Stage 2 | Session recovery successful. | DFHRM0111 | Intersystem session recovery. Resource updates found to be out of sync. |
Stage 2 | SET CONNECTION NOTPENDING ¢ or XLNACTION (FORCE) ¢ or NORECOVDATA $ issued. | DFHRM0127 | SET NOTPENDING issued. |
Stage 2 | Session recovery error -- for example, partner cold-started. * | DFHRM0112 DFHRM0113 DFHRM0115 DFHRM0116 DFHRM0118 DFHRM0119 DFHRM0121 DFHRM0122 | Local resource changes committed or backed out. |
Key:
Sequence of messages | Circumstances | Messages issued | Meaning of messages |
---|---|---|---|
Stage 1 | UOW shunted due to failure of session to coordinator ¢ | -- | -- |
Stage 1 | Session failure | DFHRM0107 | Intersystem session failure. Notification of decision may not reach the remote system. |
Stage 1 | System failure/restart | -- | -- |
Stage 2 | Session recovery successful. | DFHRM0135 DFHRM0148 + | Intersystem session recovery. Resource updates found to be synchronized. |
Stage 2 | Session recovery successful. | DFHRM0110 | Intersystem session recovery. Resource updates found to be synchronized, after a unilateral decision on the remote system. |
Stage 2 | Session recovery successful. | DFHRM0111 DFHRM0124 + | Intersystem session recovery. Resource updates found to be out of sync, after a unilateral decision on the remote system. |
Stage 2 | SET CONNECTION NOTPENDING ¢ or XLNACTION (FORCE) ¢ or NORECOVDATA $ issued. | DFHRM0127 | SET NOTPENDING issued. |
Stage 2 | Session recovery error -- for example, partner cold-started. * | DFHRM0114 DFHRM0117 DFHRM0120 DFHRM0123 | Intersystem session recovery error. Resource changes may be out of sync. |
Key:
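When scanning a log for these messages, a small lookup keyed on the message number can help relate a message to its table and stage. The following Python sketch is illustrative only: it encodes a subset of the rows from the three tables above, and `ROWS`, `explain`, and the sample log line are hypothetical names and data, not part of CICS.

```python
import re

# Subset of rows transcribed from Tables 18 to 20 above:
# (message, table, stage, meaning). This is not a complete list.
ROWS = [
    ("DFHRM0106", 18, 1, "Session failure; changes held until session recovery"),
    ("DFHRM0108", 18, 2, "Session recovery; suspended changes being committed"),
    ("DFHRM0109", 18, 2, "Session recovery; suspended changes being backed out"),
    ("DFHRM0104", 19, 1, "Session failure; changes committed or backed out, may be out of sync"),
    ("DFHRM0105", 19, 1, "Session failure; changes committed or backed out, may be out of sync"),
    ("DFHRM0110", 19, 2, "Session recovery; updates found to be synchronized"),
    ("DFHRM0111", 19, 2, "Session recovery; updates found to be out of sync"),
    ("DFHRM0107", 20, 1, "Session failure; decision may not reach the remote system"),
    ("DFHRM0110", 20, 2, "Session recovery; synchronized after a unilateral remote decision"),
]

MESSAGE_ID = re.compile(r"DFHRM\d{4}")


def explain(log_line):
    """Return every table row whose message number appears in log_line."""
    found = set(MESSAGE_ID.findall(log_line))
    return [row for row in ROWS if row[0] in found]


# Hypothetical log line; the real message text is in the Messages and Codes manual.
for msg, table, stage, meaning in explain("IYM51 DFHRM0106 (sample text)"):
    print(f"{msg}: Table {table}, stage {stage}: {meaning}")
```

Because some message numbers appear in more than one table, the lookup returns all matching rows and leaves it to the reader to pick the one that fits the circumstances.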
This section contains examples of how to resolve in-doubt and resynchronization failures.
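This section is an example of how to resolve a unit of work that fails during the in-doubt period. It uses the following commands: CEMT INQUIRE TASK, CEMT INQUIRE UOWENQ, CEMT INQUIRE UOW, CEMT INQUIRE UOWLINK, CEMT INQUIRE CONNECTION, and CEMT SET UOW.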
A user reports that their task has hung on region IYM51. A CEMT INQUIRE TASK command shows the following:
    INQUIRE TASK
    STATUS: RESULTS - OVERTYPE TO MODIFY
     Tas(0000061) Tra(RTD1) Fac(S254) Sus Ter Pri( 001 )
        Sta(TO) Use(CICSUSER) Uow(AB1DF09A54115600) Hty(ENQUEUE ) Hva(TDNQ )
     Tas(0000064) Tra(CEMT) Fac(S255) Run Ter Pri( 255 )
        Sta(TO) Use(CICSUSER) Uow(AB1DF16E3B78B403)
The hanging task is 61, tranid RTD1. It is waiting on an enqueue for a transient data resource. A CEMT INQUIRE UOWENQ command shows:
    INQUIRE UOWENQ
    STATUS: RESULTS
     Uow(AB1DF0804B0F5801) Tra(RFS4) Tas(0000060) Ret Tsq Own
        Res(RMLTSQ ) Rle(008) Enq(00000000)
     Uow(AB1DF0804B0F5801) Tra(RFS4) Tas(0000060) Ret Dat Own
        Res(DCXISCG.IYLX1.RMLFILE ) Rle(021) Enq(00000000)
     Uow(AB1DF0804B0F5801) Tra(RFS4) Tas(0000060) Act Tdq Own
        Res(QILR ) Rle(004) Enq(00000000)
     Uow(AB1DF0804B0F5801) Tra(RFS4) Tas(0000060) Act Tdq Own
        Res(QILR ) Rle(004) Enq(00000000)
     Uow(AB1DF09A54115600) Tra(RTD1) Tas(0000061) Act Tdq Wai
        Res(QILR ) Rle(004) Enq(00000000)
In this instance, task 61 is the only waiter, and task 60 is the only owner, simplifying the task of identifying the enqueue owner. Task 60 owns one enqueue of type TSQUEUE, one of type DATASET, and two of type TDQ. These enqueues are owned on resources RMLTSQ, DCXISCG.IYLX1.RMLFILE and QILR respectively.
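On a busier system, where several owners and waiters are listed, the same matching can be done mechanically. The following Python sketch is one way to do it, assuming the INQUIRE UOWENQ rows have been captured as text with one enqueue per line; the names `ROW`, `parse_rows`, and `owners_blocking` are hypothetical, and the sample rows are copied from the display above.

```python
import re

# Rows copied from the INQUIRE UOWENQ display in this example,
# joined so that each enqueue is on one line.
UOWENQ_ROWS = """\
Uow(AB1DF0804B0F5801) Tra(RFS4) Tas(0000060) Ret Tsq Own Res(RMLTSQ ) Rle(008) Enq(00000000)
Uow(AB1DF0804B0F5801) Tra(RFS4) Tas(0000060) Ret Dat Own Res(DCXISCG.IYLX1.RMLFILE ) Rle(021) Enq(00000000)
Uow(AB1DF0804B0F5801) Tra(RFS4) Tas(0000060) Act Tdq Own Res(QILR ) Rle(004) Enq(00000000)
Uow(AB1DF0804B0F5801) Tra(RFS4) Tas(0000060) Act Tdq Own Res(QILR ) Rle(004) Enq(00000000)
Uow(AB1DF09A54115600) Tra(RTD1) Tas(0000061) Act Tdq Wai Res(QILR ) Rle(004) Enq(00000000)
"""

# Assumed row layout, based only on the display shown in this example.
ROW = re.compile(
    r"Uow\((?P<uow>\w+)\)\s+Tra\((?P<tran>\w+)\)\s+Tas\((?P<task>\d+)\)\s+"
    r"(?P<state>Act|Ret)\s+(?P<type>\w+)\s+(?P<relation>Own|Wai)\s+"
    r"Res\((?P<resource>[^)]*)\)"
)


def parse_rows(text):
    """Parse each matching line into a dict of its fields."""
    rows = []
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            row = match.groupdict()
            row["resource"] = row["resource"].strip()
            rows.append(row)
    return rows


def owners_blocking(rows):
    """Map each (waiting task, resource) to the set of UOWs owning that resource."""
    blocking = {}
    for waiter in (r for r in rows if r["relation"] == "Wai"):
        key = (waiter["type"], waiter["resource"])
        owners = {
            r["uow"]
            for r in rows
            if r["relation"] == "Own" and (r["type"], r["resource"]) == key
        }
        blocking[(waiter["task"], waiter["resource"])] = owners
    return blocking


for (task, resource), owners in owners_blocking(parse_rows(UOWENQ_ROWS)).items():
    print(f"task {int(task)} waits on {resource}; owned by UOW(s) {sorted(owners)}")
```

For the rows above, the sketch reports that task 61 is waiting on QILR, which is owned by UOW AB1DF0804B0F5801.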
The CEMT INQUIRE TASK screen shows that task 60 has ended. You can use the CEMT INQUIRE UOW command to return information about the status of units of work that are associated with tasks which have ended, as well as with tasks that are still active.
    INQUIRE UOW
    STATUS: RESULTS - OVERTYPE TO MODIFY
     Uow(AB1DD0FE5F219205) Inf Act Tra(CSSY) Tas(0000005) Age(00002569)
        Use(CICSUSER)
     Uow(AB1DD0FE5FEF9C00) Inf Act Tra(CSSY) Tas(0000006) Age(00002569)
        Use(CICSUSER)
     Uow(AB1DD0FE7FB82600) Inf Act Tra(CSTP) Tas(0000008) Age(00002569)
        Use(CICSUSER)
     Uow(AB1DD98323E1C005) Inf Act Tra(CSNC) Tas(0000018) Age(00000282)
        Use(CICSUSER)
     Uow(AB1DF0804B0F5801) Ind Shu Tra(RFS4) Tas(0000060) Age(00002699)
        Ter(S255) Netn(IGCS255 ) Use(CICSUSER) Con Lin(IYM52 )
     Uow(AB1DF09A54115600) Inf Act Tra(RTD1) Tas(0000061) Age(00002673)
        Ter(S254) Netn(IGCS254 ) Use(CICSUSER)
     Uow(AB1DF0B309126800) Inf Act Tra(CSNE) Tas(0000021) Age(00002647)
        Use(CICSUSER)
     Uow(AB1DF16E3B78B403) Inf Act Tra(CEMT) Tas(0000064) Age(00002451)
        Ter(S255) Netn(IGCS255 ) Use(CICSUSER)
The CEMT INQUIRE UOW command can be filtered so that a UOW for a particular task is displayed. For example, CEMT INQUIRE UOW TASK(60) shows:
    INQUIRE UOW TASK(60)
    STATUS: RESULTS - OVERTYPE TO MODIFY
     Uow(AB1DF0804B0F5801) Ind Shu Tra(RFS4) Tas(0000060) Age(00002699)
        Ter(S255) Netn(IGCS255 ) Use(CICSUSER) Con Lin(IYM52 )
In order to see more information for a particular UOW, position the cursor alongside the UOW and press ENTER:
    INQUIRE UOW
    RESULT - OVERTYPE TO MODIFY
      Uow(AB1DF0804B0F5801)
      Uowstate( Indoubt )
      Waitstate(Shunted)
      Transid(RFS4)
      Taskid(0000060)
      Age(00002801)
      Termid(S255)
      Netname(IGCS255)
      Userid(CICSUSER)
      Waitcause(Connection)
      Link(IYM52)
      Sysid(ISC2)
      Netuowid(..GBIBMIYA.IGCS255 .0......)
The UOW in question is AB1DF0804B0F5801. The Waitstate is Shunted, which means that syncpoint processing has been deferred and locks are retained until resource integrity can be ensured. In this case, the Uowstate is Indoubt, which means that task 60 failed during syncpoint processing while in the in-doubt window.
The reason for the UOW being shunted is given by Waitcause--in this case, it is Connection. The UOW has been shunted due to a failure of connection ISC2. The associated Link (or netname) for the connection is IYM52.
A CEMT INQUIRE UOWLINK command shows information about connections involved in distributed UOWs:
    INQUIRE UOWLINK
    STATUS: RESULTS
     Uowl(02EC0011) Uow(AB1DF0804B0F5801) Con Lin(IYM52 ) Coo Appc Una
        Sys(ISC2) Net(..GBIBMIYA.IGCS255 .0......)
To see more information for the link, position the cursor alongside the UOW-link and press ENTER:
    INQUIRE UOWLINK
    RESULT
      Uowlink(02EC0011)
      Uow(AB1DF0804B0F5801)
      Type(Connection)
      Link(IYM52)
      Action( )
      Role(Coordinator)
      Protocol(Appc)
      Resyncstatus(Unavailable)
      Sysid(ISC2)
      Rmiqfy()
      Netuowid(..GBIBMIYA.IGCS255 .0......)
In this example, we can see that the connection ISC2 to system IYM52 is the syncpoint Coordinator for this UOW. The Resyncstatus is Unavailable, which means that the connection is not currently acquired.
A CEMT INQUIRE CONNECTION command confirms our findings:
    I INQUIRE CONNECTION
    STATUS: RESULTS - OVERTYPE TO MODIFY
     Con(ISC2) Net(IYM52 ) Ins Rel Vta Appc Rec
     Con(ISC4) Net(IYM54 ) Ins Acq Vta Appc Xok Unk
     Con(ISC5) Net(IYM55 ) Ins Acq Vta Appc Xok Unk
To see more information for connection ISC2, position the cursor alongside the connection and press ENTER:
    INQUIRE CONNECTION
    RESULT
      Connection(ISC2)
      Netname(IYM52)
      Pendstatus( Notpending )
      Servstatus( Inservice )
      Connstatus( Released )
      Accessmethod(Vtam)
      Protocol(Appc)
      Purgetype( )
      Xlnstatus()
      Recovstatus( Recovdata )
      Uowaction( )
      Grname()
      Membername()
      Affinity( )
      Remotesystem()
      Rname()
      Rnetname()
This shows that the connection ISC2 is Released with Recovstatus Recovdata, indicating that resynchronization is outstanding for this connection.
At this stage, if it is possible to acquire the connection to system IYM52, resynchronization will take place automatically, UOW AB1DF0804B0F5801 will be unshunted and its enqueues will be released, allowing task 61 to complete. However, if it is not possible to acquire the connection, you may decide to unshunt the UOW and override normal resynchronization. To decide whether to commit or back out the UOW, you need to inquire on the associated UOW on system IYM52. A CEMT INQUIRE UOW command on system IYM52 shows:
    INQUIRE UOW
    STATUS: RESULTS - OVERTYPE TO MODIFY
     Uow(AB1DD01221BA6E01) Inf Act Tra(CSSY) Tas(0000005) Age(00003191)
        Use(CICSUSER)
     Uow(AB1DD0122276C201) Inf Act Tra(CSSY) Tas(0000006) Age(00003191)
        Use(CICSUSER)
     Uow(AB1DD01248A7B005) Inf Act Tra(CSTP) Tas(0000008) Age(00003191)
        Use(CICSUSER)
     Uow(AB1DD9057B8DD800) Inf Act Tra(CSNC) Tas(0000018) Age(00000789)
        Use(CICSUSER)
     Uow(AB1DF0805E76B400) Com Wai Tra(CSM3) Tas(0000079) Age(00003003)
        Ter(-AC3) Netn(IYM51 ) Use(CICSUSER) Wai
     Uow(AB1DF0B2FDD36400) Inf Act Tra(CSNE) Tas(0000019) Age(00003024)
        Use(CICSUSER)
     Uow(AB1DF15502238000) Inf Act Tra(CEMT) Tas(0000086) Age(00002853)
        Ter(S25C) Netn(IGCS25C ) Use(CICSUSER)
For transactions started at a terminal, the CEMT INQUIRE UOW command can be filtered using Netuowid, so that only UOWs associated with transactions executed from a particular terminal are displayed. In this case, task 60 on system IYM51 was executed at terminal S255. The Netuowid of UOW AB1DF0804B0F5801 on system IYM51 contains the luname of terminal S255.
Because Netuowids are identical for all UOWs which are connected within a single distributed unit of work, the Netuowid is a useful way of tying these UOWs together. In this example, the command CEMT INQUIRE UOW NETUOWID(*S255*) filters the CEMT INQUIRE UOW command as follows:
    INQUIRE UOW NETUOWID(*S255*)
    STATUS: RESULTS - OVERTYPE TO MODIFY
     Uow(AB1DF0805E76B400) Com Wai Tra(CSM3) Tas(0000079) Age(00003003)
        Ter(-AC3) Netn(IYM51 ) Use(CICSUSER) Wai
To see more information for UOW AB1DF0805E76B400, position the cursor alongside the UOW and press ENTER:
    INQUIRE UOW
    RESULT - OVERTYPE TO MODIFY
      Uow(AB1DF0805E76B400)
      Uowstate( Commit )
      Waitstate(Waiting)
      Transid(CSM3)
      Taskid(0000079)
      Age(00003003)
      Termid(-AC3)
      Netname(IYM51 )
      Userid(CICSUSER)
      Waitcause(Waitforget)
      Link( )
      Sysid( )
      Netuowid(..GBIBMIYA.IGCS255 .0......)
We can see that UOW AB1DF0805E76B400 is associated with a mirror task used in function shipping. The Uowstate Commit means that the UOW has been committed, and the Waitstate Waiting means that it is waiting because the decision has not been communicated to IYM51. This allows us to commit the shunted UOW on system IYM51 safely, in the knowledge that resource updates will be synchronized with those on IYM52 for this distributed unit of work. You can use the CEMT SET UOW command to commit the shunted UOW. Once the shunted UOW is committed, its enqueues are released and task 61 is allowed to continue.
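The reasoning in this example can be summarized as a small decision rule. The Python sketch below is illustrative only: `UowView` and `suggested_action` are hypothetical names, the field values are typed in from the two expanded INQUIRE UOW displays, and the Backout case is an assumption made by analogy with the Commit case shown in this example. Comparing the displayed Netuowid text is also a simplification, because unprintable characters are shown as periods.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UowView:
    """Selected fields from an expanded CEMT INQUIRE UOW display."""
    system: str
    uow: str
    uowstate: str
    waitstate: str
    netuowid: str


def suggested_action(local: UowView, partner: UowView) -> Optional[str]:
    """Suggest how to resolve a shunted in-doubt UOW from its partner's state.

    Returns None when the two UOWs do not belong to the same distributed
    unit of work, or when the partner's state does not settle the question.
    """
    if local.netuowid != partner.netuowid:
        return None  # not part of the same distributed unit of work
    if (local.uowstate, local.waitstate) != ("Indoubt", "Shunted"):
        return None  # nothing to resolve on the local side
    # Commit is the case shown in this example; Backout is assumed by analogy.
    return {"Commit": "COMMIT", "Backout": "BACKOUT"}.get(partner.uowstate)


local = UowView("IYM51", "AB1DF0804B0F5801", "Indoubt", "Shunted",
                "..GBIBMIYA.IGCS255 .0......")
partner = UowView("IYM52", "AB1DF0805E76B400", "Commit", "Waiting",
                  "..GBIBMIYA.IGCS255 .0......")
print(suggested_action(local, partner))
```

Any other partner state returns no suggestion, which reflects that the decision then needs further investigation rather than a mechanical rule.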
Another possible scenario is that IYM52 is not available. If it is not practical to wait for IYM52 to become available and you are prepared to accept the risk to data integrity, you can use the CEMT SET CONNECTION command to commit, back out, or force all UOWs that have failed in-doubt due to the failure of connection ISC2.
In this example, transaction RTD1 was suspended on an ENQUEUE for a transient data queue. An active lock for the queue was owned by UOW AB1DF0804B0F5801, which had failed in-doubt. To avoid tasks being suspended in this way, you could define the transient data queue with the WAITACTION option set to REJECT (the default WAITACTION). If you do this, an in-doubt failure of a task updating the queue results in a retained lock being held by the shunted UOW. Requests for the retained lock are then rejected with the LOCKED condition.
For detailed information about CEMT commands, see the CICS Supplied Transactions manual.
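This section is an example of how to resolve a resynchronization failure. It uses the following commands: CEMT INQUIRE CONNECTION, CEMT INQUIRE UOWLINK, CEMT INQUIRE UOW, CEMT INQUIRE UOWENQ, CEMT SET UOW, and CEMT SET CONNECTION.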
A user has reported that their transaction on system IYLX1 (which involves function shipping requests to system IYLX4) is failing with a 'SYSIDERR'. A CEMT INQUIRE CONNECTION command on system IYLX1 shows the following:
    INQUIRE CONNECTION
    STATUS: RESULTS - OVERTYPE TO MODIFY
     Con(ISC2) Net(IYLX2 ) Ins Rel Vta Appc Unk
     Con(ISC4) Net(IYLX4 ) Pen Ins Acq Vta Appc Xno Unk
     Con(ISC5) Net(IYLX5 ) Ins Acq Vta Appc Xok Unk
The connection to system IYLX4 is an APPC connection called ISC4. To see more information about this connection, put the cursor on the ISC4 line and press ENTER--see Figure 82.
    INQUIRE CONNECTION
    RESULT - OVERTYPE TO MODIFY
      Connection(ISC4)
      Netname(IYLX4)
      Pendstatus( Pending )
      Servstatus( Inservice )
      Connstatus( Acquired )
      Accessmethod(Vtam)
      Protocol(Appc)
      Purgetype( )
      Xlnstatus(Xnotdone)
      Recovstatus( Nrs )
      Uowaction( )
      Grname()
      Membername()
      Affinity( )
      Remotesystem()
      Rname()
      Rnetname()
Although the Connstatus of connection ISC4 is Acquired, the Xlnstatus is Xnotdone. The exchange lognames (XLN) flow for this connection has not completed successfully. (When CICS systems connect they exchange lognames. These lognames are verified before resynchronization is attempted, and an exchange lognames failure means that resynchronization is not possible.) For function shipping, a failure for the connection causes a SYSIDERR. Synchronization level 2 conversations are not allowed on this connection until lognames are successfully exchanged. (This restriction does not apply to MRO connections.)
The reason for the exchange lognames failure is reported in the CSMT log. A failure on a CICS Transaction Server for z/OS® system can be caused by:
The Pendstatus for connection ISC4 is Pending, which means that there is resynchronization work outstanding for the connection; this work cannot be completed because of the exchange lognames failure.
At this stage, if we were not concerned about loss of synchronization, we could force all in-doubt UOWs to commit or back out by issuing the SET CONNECTION NOTPENDING command. However, there are commands that allow us to investigate the outstanding resynchronization work that exists before we clear the pending condition.
You can use a CEMT INQUIRE UOWLINK command to display information about UOWs that require resynchronization with system IYLX4:
    INQUIRE UOWLINK LINK(IYLX4)
    STATUS: RESULTS - OVERTYPE TO MODIFY
     Uowl(016C0005) Uow(ABD40B40C1334401) Con Lin(IYLX4 ) Coo Appc Col
        Sys(ISC4) Net(..GBIBMIYA.IYLX150 M. A....)
     Uowl(01680005) Uow(ABD40B40C67C8201) Con Lin(IYLX4 ) Coo Appc Col
        Sys(ISC4) Net(..GBIBMIYA.IYLX151 M. F@b..)
     Uowl(016D0005) Uow(ABD40B40DA5A8803) Con Lin(IYLX4 ) Coo Appc Col
        Sys(ISC4) Net(..GBIBMIYA.IYLX156 M. .!h..)
To see more information for each UOW-link, position the cursor alongside it and press ENTER. For example, the expanded information for UOW-link 016C0005 shows the following:
    I UOWLINK LINK(IYLX4)
    RESULT - OVERTYPE TO MODIFY
      Uowlink(016C0005)
      Uow(ABD40B40C1334401)
      Type(Connection)
      Link(IYLX4)
      Action( )
      Role(Coordinator)
      Protocol(Appc)
      Resyncstatus(Coldstart)
      Sysid(ISC4)
      Rmiqfy()
      Netuowid(..GBIBMIYA.IYLX150 M. A....)
The Resyncstatus of Coldstart confirms that system IYLX4 has been started with a new logname. The Role for this UOW-link is shown as Coordinator, which means that IYLX4 is the syncpoint coordinator.
You could now use a CEMT INQUIRE UOW LINK(IYLX4) command to show all UOWs that are in-doubt and which have system IYLX4 as the coordinator system:
    INQUIRE UOW LINK(IYLX4)
    STATUS: RESULTS - OVERTYPE TO MODIFY
     Uow(ABD40B40C1334401) Ind Shu Tra(RFS1) Tas(0000674) Age(00003560)
        Ter(X150) Netn(IYLX150 ) Use(CICSUSER) Con Lin(IYLX4 )
     Uow(ABD40B40C67C8201) Ind Shu Tra(RFS1) Tas(0000675) Age(00003465)
        Ter(X151) Netn(IYLX151 ) Use(CICSUSER) Con Lin(IYLX4 )
     Uow(ABD40B40DA5A8803) Ind Shu Tra(RFS1) Tas(0000676) Age(00003462)
        Ter(X156) Netn(IYLX156 ) Use(CICSUSER) Con Lin(IYLX4 )
To see more information for each in-doubt UOW, position the cursor on its line and press ENTER. For example, the expanded information for UOW ABD40B40C1334401 shows the following:
    INQUIRE UOW LINK(IYLX4)
    RESULT - OVERTYPE TO MODIFY
      Uow(ABD40B40C1334401)
      Uowstate( Indoubt )
      Waitstate(Shunted)
      Transid(RFS1)
      Taskid(0000674)
      Age(00003906)
      Termid(X150)
      Netname(IYLX150)
      Userid(CICSUSER)
      Waitcause(Connection)
      Link(IYLX4)
      Sysid(ISC4)
      Netuowid(..GBIBMIYA.IYLX150 M. A....)
This UOW cannot be resynchronized by system IYLX4--its status is shown as Indoubt, because IYLX4 does not know whether the associated UOW that ran on IYLX4 committed or backed out.
You can use the CEMT INQUIRE UOWENQ command to display the resources that have been locked by all shunted UOWs (those that own retained locks):
    INQUIRE UOWENQ OWN RETAINED
    STATUS: RESULTS
     Uow(ABD40B40C1334401) Tra(RFS1) Tas(0000674) Ret Tsq Own
        Res(RFS1X150 ) Rle(008) Enq(00000008)
     Uow(ABD40B40C67C8201) Tra(RFS1) Tas(0000675) Ret Tsq Own
        Res(RFS1X151 ) Rle(008) Enq(00000008)
     Uow(ABD40B40DA5A8803) Tra(RFS1) Tas(0000676) Ret Tsq Own
        Res(RFS1X156 ) Rle(008) Enq(00000008)
You can filter the INQUIRE UOWENQ command so that only enqueues that are owned by a particular UOW are displayed. For example, to filter for enqueues owned by UOW ABD40B40C1334401:
    INQUIRE UOWENQ OWN UOW(*4401)
    STATUS: RESULTS
     Uow(ABD40B40C1334401) Tra(RFS1) Tas(0000674) Ret Tsq Own
        Res(RFS1X150 ) Rle(008) Enq(00000008)
To see more information for this UOWENQ, position the cursor alongside it and press ENTER:
    INQUIRE UOWENQ OWN UOW(*4401)
    RESULT
      Uowenq
      Uow(ABD40B40C1334401)
      Transid(RFS1)
      Taskid(0000674)
      State(Retained)
      Type(Tsq)
      Relation(Owner)
      Resource(RFS1X150)
      Rlen(008)
      Enqfails(00000008)
      Netuowid(..GBIBMIYA.IYLX150 M. A....)
      Qualifier()
      Qlen(000)
With knowledge of the application, it may now be possible to decide whether updates to the locked resources should be committed or backed out. In the case of UOW ABD40B40C1334401, the locked resource is the temporary storage queue RFS1X150. This resource has an ENQFAILS value of 8, which is the number of tasks that have received the LOCKED response due to this enqueue being held in retained state.
You can use the SET UOW command to commit, back out, or force the uncommitted updates made by the shunted UOWs. Next, you must use the SET CONNECTION(ISC4) NOTPENDING command to clear the pending condition and allow synchronization level 2 conversations (including the function shipping requests which were previously failing with SYSIDERR).
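The order of these two steps matters: each shunted UOW is resolved first, and the pending condition is cleared last. The Python sketch below only builds the command strings in that order, for review before they are entered; it does not issue them. `resolution_commands` is a hypothetical helper, the per-UOW decisions in the usage example are invented for illustration, and the exact command spelling should be checked against the CICS Supplied Transactions manual.

```python
VALID_ACTIONS = ("COMMIT", "BACKOUT", "FORCE")


def resolution_commands(decisions, connection):
    """Build CEMT command strings: one SET UOW per UOW, then SET CONNECTION.

    decisions maps a UOW identifier to COMMIT, BACKOUT, or FORCE.
    """
    commands = []
    for uow, action in decisions.items():
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action {action!r} for UOW {uow}")
        commands.append(f"CEMT SET UOW({uow}) {action}")
    # Clear the pending condition only after every UOW has been resolved.
    commands.append(f"CEMT SET CONNECTION({connection}) NOTPENDING")
    return commands


# Hypothetical decisions for the three shunted UOWs in this example.
decisions = {
    "ABD40B40C1334401": "COMMIT",
    "ABD40B40C67C8201": "COMMIT",
    "ABD40B40DA5A8803": "BACKOUT",
}
for command in resolution_commands(decisions, "ISC4"):
    print(command)
```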
You can use the XLNACTION option of the CONNECTION definition to control the effect of an exchange lognames failure. In this example, the XLNACTION for the connection ISC4 is KEEP. This meant that:
An XLNACTION of FORCE for connection ISC4 would have caused the SET CONNECTION NOTPENDING command to be issued automatically when the cold/warm log mismatch occurred. This would have forced the shunted UOWs to commit or back out, according to the ACTION option of the associated transaction definition. The connection ISC4 would then not have been placed into Pending status. However, setting XLNACTION to FORCE allows no investigation of shunted UOWs following an exchange lognames failure, and therefore represents a greater risk to data integrity than setting XLNACTION to KEEP.