ACF/SNA Data Communications Reference

Host IPL Considerations

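After the TPF system performs an IPL, the action it takes for the RTP connections depends on whether any HPR-capable links remained active across the IPL and on whether the SNA tables were reloaded from file.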

When the TPF system takes an outage, whether the network remains active depends on how long it takes the TPF system to IPL. If the TPF system is down for too long and the automatic network shutdown (ANS) timers expire, the network deactivates the links to the TPF system, causing all PU 5 and PU 2.1 LU-LU sessions across those links to fail. If the links remain active across the IPL, the LU-LU sessions remain active as well.

Because there is no way of knowing how long the TPF system has been down for a hardware IPL, and because you can define the ANS timer values for each link, the TPF system verifies the status of each link after the IPL. For HPR LU-LU sessions, a link failure does not cause those sessions to fail. Instead, it triggers a path switch of the RTP connections that were using that link. If the TPF system is down for a long period of time, the network will clean up all RTP connections that went to the TPF system and, therefore, all HPR LU-LU sessions as well. When the TPF system comes back up, it needs to be able to detect this condition and internally clean up all RTP connections and HPR LU-LU sessions that were active before the outage.

If the network has cleaned up its RTP connections, the TPF system could eventually detect this on its own after the IPL: no responses would be received on those RTP connections, path switches would be started, and when the path switch attempts timed out, the TPF system would clean up the RTP connections. Rather than wait for all of this to happen, the TPF system checks whether any HPR-capable links remained active across the IPL. If none did, the TPF system has been down too long, and it cleans up all RTP connections and HPR LU-LU sessions.

If at least one HPR-capable link remained active across the IPL of the TPF system and the SNA tables were reloaded from file (which always occurs for a hardware IPL), information in the RTPCB entries will likely be old. If RTPRSYNC=YES is coded on the SNAKEY macro in CTK2, the RTP connections remain active and the TPF system performs the RTP connection resynchronization process for each RTP connection. If RTPRSYNC=NO is coded on the SNAKEY macro in CTK2, the TPF system ends all RTP connections.

If at least one HPR-capable link remained active across the IPL of the TPF system and the core copies of the SNA tables were reused, the RTP connections remain active. The RTP connection resynchronization process is not necessary when this occurs because the information in the RTPCB entries is accurate.
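The decisions described in this section can be summarized in a short Python sketch. It is only an illustration of the rules stated above: the function, parameter, and enumeration names are hypothetical and are not TPF interfaces.

```python
from enum import Enum


class IplAction(Enum):
    CLEAN_UP_ALL = "clean up all RTP connections and HPR LU-LU sessions"
    RESYNCHRONIZE = "keep RTP connections active and resynchronize each one"
    END_ALL = "end all RTP connections"
    KEEP_ACTIVE = "keep RTP connections active without resynchronization"


def rtp_action_after_ipl(hpr_link_stayed_active: bool,
                         tables_reloaded_from_file: bool,
                         rtprsync: bool) -> IplAction:
    """Return the action taken for RTP connections after an IPL."""
    # No HPR-capable link survived the IPL: the outage was too long.
    if not hpr_link_stayed_active:
        return IplAction.CLEAN_UP_ALL
    # Tables reloaded from file (always the case for a hardware IPL):
    # the RTPCB entries are probably old, so RTPRSYNC decides.
    if tables_reloaded_from_file:
        return IplAction.RESYNCHRONIZE if rtprsync else IplAction.END_ALL
    # Core copies of the SNA tables were reused: RTPCB entries are accurate.
    return IplAction.KEEP_ACTIVE
```

Note that the RTPRSYNC setting matters only when at least one HPR-capable link stayed active and the tables were reloaded from file.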

RTP Connection Resynchronization Process

The RTP connection resynchronization process is the method used to keep RTP connections active across a hardware IPL of the TPF system. After a hardware IPL, the SNA tables, including the RTPCB table, are reloaded from file. The RTPCB table on file is likely to be several seconds old. Therefore, the TPF system does not know the current input or output byte sequence number (BSN) values for an RTP connection, nor does it know the current SYNC and ECHO values. See SYNC and ECHO Numbers for more information about how SYNC and ECHO numbers are used by HPR support. The following example shows the problems that can result:

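  1. An RTP connection is active. Time-initiated keypointing files out the RTPCB entry, which contains the following values: SYNC number 103, last SYNC number received (ECHO) 85, output BSN 200, input BSN 500.
  2. Messages are sent and received on the RTP connection. The RTPCB entry now contains the following values: SYNC number 105, last SYNC number received (ECHO) 88, output BSN 450, input BSN 622.
  3. A hardware IPL of the TPF system is done. SNA restart reloads the SNA tables from file. The RTPCB entry after the IPL contains the values that were filed out in step 1: SYNC number 103, last SYNC number received (ECHO) 85, output BSN 200, input BSN 500.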

All of the values in the RTPCB entry are old, which can lead to different problems:

  1. If the TPF system sends an NLP containing a STATUS segment, the remote RTP endpoint discards the control information in that NLP because the SYNC number (103) in the NLP is old. The current SYNC number is now 105. The ECHO number (85) in the STATUS segment is also old (it should be 88).
  2. If the remote RTP endpoint sends an NLP containing a STATUS segment, the TPF system would accept the control information because the SYNC number (88) in the NLP is equal to or greater than the last SYNC number received (85). The problem is that the STATUS segment would acknowledge receiving messages up to BSN value 450, but the TPF system thinks it has not yet sent bytes 200-449. This would be treated as a protocol violation and cause the RTP connection to be taken down.
  3. If the TPF system sends an NLP containing data, the remote RTP endpoint discards the data thinking that it is duplicate data because its BSN (200) is older than the next expected BSN (which is 450).
  4. If the TPF system sends an NLP with BSN=200 and a length of 300 bytes, the first 250 bytes of data would be discarded (bytes 200-449) and the last 50 bytes would be treated as the next expected message. Because these 50 bytes are really the middle of a message, they would not have the correct start-of-message header settings. This will cause the remote RTP endpoint to break the RTP connection because of a protocol violation.
  5. If the remote RTP endpoint sends an NLP containing data, the BSN in the NLP will be 622. Because the TPF system is expecting data starting with BSN=500, the TPF system queues the NLP and asks the remote RTP endpoint to retransmit bytes 500-621. Because the TPF system already acknowledged receipt of bytes 500-621 before the IPL, the remote RTP endpoint does not have that data anymore and will break the RTP connection.
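The arithmetic behind these problems can be checked with a small Python sketch. The helper names are hypothetical, the comparisons are simplified (sequence number wraparound is ignored), and the sketch assumes that the last SYNC number the remote RTP endpoint has seen from the TPF system is 105.

```python
def control_info_accepted(sync_in_nlp: int, last_sync_received: int) -> bool:
    """Control information is used only if its SYNC number is not old."""
    return sync_in_nlp >= last_sync_received


def duplicate_and_new_bytes(bsn: int, length: int, next_expected_bsn: int):
    """Split inbound data into (bytes discarded as duplicate, bytes kept)."""
    duplicate = min(length, max(0, next_expected_bsn - bsn))
    return duplicate, length - duplicate


def missing_range(bsn: int, next_expected_bsn: int):
    """Return the (first, last) BSN to be retransmitted, or None if no gap."""
    if bsn > next_expected_bsn:
        return next_expected_bsn, bsn - 1
    return None


# Problem 1: the stale SYNC number 103 is rejected by the remote endpoint.
print(control_info_accepted(103, 105))          # False
# Problem 2: SYNC number 88 from the remote endpoint is accepted (88 >= 85).
print(control_info_accepted(88, 85))            # True
# Problem 4: 300 bytes at BSN=200 against a next expected BSN of 450.
print(duplicate_and_new_bytes(200, 300, 450))   # (250, 50)
# Problem 5: data arrives at BSN=622 while the TPF system expects BSN=500.
print(missing_range(622, 500))                  # (500, 621)
```

Problem 3 is the case where every byte falls below the next expected BSN, so the whole message is discarded as a duplicate.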

The RTP connection resynchronization process prevents all of these problems. The first step after reloading the RTPCB table from file is to increase the SYNC number value of the RTP connection by a large amount to make sure it is current. Using the previous example, the SYNC number in the file copy of the RTPCB entry is 103, but the real current SYNC number is 105. The RTP connection resynchronization process increases the SYNC number in the RTPCB entry by a large amount (for example, by 100), so that the new value (203) is guaranteed to be greater than the current SYNC number. This way, control information sent by the TPF system will be accepted.

The next step is to set a flag in the RTPCB entry indicating that, when the first NLP is received after the IPL, the TPF system is to assume that the BSN in that NLP is the BSN of the next expected message.

The final step is the most complicated part of the RTP connection resynchronization process. The TPF system sends out an HPR control message (an NLP with no data) to ask the remote RTP endpoint the BSN of the next message it is expecting. Until the response to that control message is received, the TPF system cannot send any data on this RTP connection. When the response is received, the next expected BSN value is copied into the output BSN field in the RTPCB entry and data traffic continues.
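The three steps can be sketched in Python as follows. This is a simplified model, not TPF code: the RtpcbEntry class, its field names, the function names, and the fixed increment of 100 are all assumptions made for illustration. The sample values are the ones used in the numbered problems earlier in this section.

```python
from dataclasses import dataclass

SYNC_INCREMENT = 100  # assumed; the text gives 100 only as an example


@dataclass
class RtpcbEntry:
    """Hypothetical, simplified stand-in for an RTPCB entry."""
    sync: int                 # SYNC number sent by the TPF system
    last_sync_received: int   # returned to the partner as the ECHO number
    bsn_sent: int             # output BSN
    bsn_received: int         # input BSN (next expected)
    state: str = "CONNECTED"
    first_nlp_pending: bool = False


def begin_resync(cb: RtpcbEntry) -> dict:
    """Increase SYNC, set the first-NLP flag, and build a status request."""
    cb.sync += SYNC_INCREMENT
    cb.first_nlp_pending = True
    cb.state = "RESYNC"
    # The status request is an NLP that carries no user data.
    return {"status_requested": True, "sync": cb.sync, "data": b""}


def on_nlp(cb: RtpcbEntry, bsn: int) -> None:
    """Accept the BSN of the first NLP received after the IPL as current."""
    if cb.first_nlp_pending:
        cb.bsn_received = bsn
        cb.first_nlp_pending = False


def on_status(cb: RtpcbEntry, echo: int, rseq: int) -> bool:
    """Complete resynchronization only on the reply to our own request."""
    if cb.state == "RESYNC" and echo == cb.sync:
        cb.bsn_sent = rseq
        cb.state = "CONNECTED"
        return True
    return False


def can_send_data(cb: RtpcbEntry) -> bool:
    """No data can be sent until the status reply has been received."""
    return cb.state == "CONNECTED"


cb = RtpcbEntry(sync=103, last_sync_received=85, bsn_sent=200, bsn_received=500)
request = begin_resync(cb)          # SYNC becomes 203; data is blocked
on_nlp(cb, bsn=622)                 # input BSN is corrected to 622
on_status(cb, echo=203, rseq=450)   # output BSN is corrected to 450
```

Matching the ECHO number against the increased SYNC number is what lets the sketch ignore status replies that were already in flight before the IPL.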

The following figure shows an example of the events leading up to a hardware IPL of the TPF system:

Figure 80. RTP Connection Resynchronization, before the IPL

In Figure 80:

  1. An RTP connection is active. SNA time-initiated keypointing files out the RTPCB entry with the values shown.
  2. The TPF system sends two 50-byte messages and receives a 20-byte message.
  3. The TPF system sends three more 50-byte messages, the first of which asks for a status reply.
  4. The RTPCB entry for this RTP connection now contains the values shown.
  5. A hardware IPL of the TPF system occurs.

The following figure shows an example of the steps involved in the RTP connection resynchronization process:

Figure 81. RTP Connection Resynchronization, after the IPL

In Figure 81:

  1. The RTPCB table is reloaded from file after the IPL during SNA restart. The values in the RTPCB entry are old.
  2. The RTP connection resynchronization process begins. The SYNC number is increased by a large amount (from 103 to 203) and the connection is placed in RESYNC state.
  3. When the TPF system is cycled up, an NLP is sent asking for a status reply. This NLP contains no user data.
  4. The TPF system receives an NLP containing 20 bytes of data and a STATUS segment. Because this is the first NLP received after the IPL, the BSN RECEIVED field in the RTPCB entry is set to the BSN value of this NLP (520). The data message is processed normally.

    The ECHO number (104) in the STATUS segment does not match the current SYNC number (203); therefore, the RTP connection resynchronization process continues. The STATUS segment in this NLP is the reply to the status request sent out just before the IPL.

  5. Another NLP is received containing a STATUS segment. This time the ECHO number (203) matches the current SYNC number; therefore, this is the reply to the status request sent out by the RTP connection resynchronization process. The RSEQ value in the STATUS segment indicates that the next message the remote RTP endpoint expects starts with a BSN value of 450. The TPF system sets its output BSN (BSN SENT) field to 450 and places the connection back in CONNECTED state.
  6. The RTPCB entry contains the values as shown. The RTP connection resynchronization process is completed successfully and outbound data traffic continues.

This example shows that the first STATUS segment received after the IPL does not necessarily contain the latest information. The RSEQ value of the first STATUS segment was 350, but NLPs with BSN values 350-449 were already sent before the IPL. The RTP connection resynchronization process must send its own status request after the IPL and wait for the reply to that status request to determine the correct RSEQ value.
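The selection rule in this example can be expressed as a short Python sketch. The function name and the representation of a STATUS segment as an (ECHO, RSEQ) pair are assumptions made for illustration; the values are taken from the description of Figure 81.

```python
def rseq_after_resync(status_segments, current_sync):
    """Return the RSEQ from the first STATUS segment whose ECHO number
    matches the SYNC number sent by the resynchronization process."""
    for echo, rseq in status_segments:
        if echo == current_sync:
            return rseq
    return None  # still waiting for the reply to the status request


# STATUS segments received after the IPL, in arrival order: (ECHO, RSEQ).
replies = [(104, 350), (203, 450)]
print(rseq_after_resync(replies, current_sync=203))      # 450
print(rseq_after_resync(replies[:1], current_sync=203))  # None
```

The first reply is skipped because its ECHO number (104) answers the status request sent before the IPL, so its RSEQ value (350) is out of date.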

Enabling the RTP Resynchronization Process

The RTPRSYNC parameter on the SNAKEY macro in CTK2 enables the RTP resynchronization process for the TPF system. You can also use the ZNKEY command with the RTPRSYNC parameter specified to enable the RTP resynchronization process. See TPF ACF/SNA Network Generation for more information about the SNAKEY macro. See TPF Operations for more information about the ZNKEY command.