Multi-Processor Interconnect Facility Reference

MPIF Error Processing

Generally speaking, MPIF failures can occur at any one of several levels, each described under its own heading below.

MPIF Failures

Failures in MPIF C-type code are considered catastrophic and must be followed by a re-IPL of the system. Failures in MPIF E-type code are treated individually, as appropriate for the condition and the function being performed. Generally, E-type failures do not cause a re-IPL of the system.

Data Transfer Failures

During data transfers via MPIF from one user to another, many key MPIF resources are involved.

To ensure that these key resources are freed if a failure occurs, MPIF enforces disconnection for all users of the path. This processing includes invoking the user error exits to process the queued items.
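The forced-disconnect behavior described above can be pictured with a small model. This is only an illustrative sketch: the `Path` class, its methods, and the error exit signature are hypothetical and do not correspond to actual MPIF interfaces. It assumes each user registers one error exit and that the exit receives whatever items were still queued for that user.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# Hypothetical signature: an error exit receives the user name and the
# items that were still queued for that user when the failure occurred.
ErrorExit = Callable[[str, List[str]], None]


class Path:
    """Toy model of a path shared by several users (not a real MPIF API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.exits: Dict[str, ErrorExit] = {}
        self.queued: Dict[str, Deque[str]] = {}

    def connect(self, user: str, error_exit: ErrorExit) -> None:
        self.exits[user] = error_exit
        self.queued[user] = deque()

    def queue_item(self, user: str, item: str) -> None:
        self.queued[user].append(item)

    def fail_transfer(self) -> List[str]:
        """Disconnect every user of the path and hand queued items to each exit."""
        disconnected: List[str] = []
        for user in list(self.exits):
            error_exit = self.exits.pop(user)
            items = list(self.queued.pop(user))
            error_exit(user, items)  # the user decides what to do with the items
            disconnected.append(user)
        return disconnected
```

In this model a single transfer failure empties the path's bookkeeping for every connected user, not just the user whose transfer failed, which mirrors the statement that disconnection is enforced for all users of the path.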

MPIF User Failures

Each MPIF user is allocated MPIF resources, primarily in the form of dedicated control blocks and dynamically allocated MPIF resources. These resources are shared by the users during MPIF processing of their requests (for example, during send or receive).

For example, whenever a user issues a DISCONNECT or FORGET request, all MPIF resources allocated to the user are reclaimed for subsequent use by MPIF for other potential users. When a user fails without affecting the functioning of MPIF (that is, an ECB-driven system utility function), its allocated resources can become unavailable for reuse until the system is re-IPLed. Such users (for example, copies of some system utility connected to another common user or point) must provide for recovery of these allocated resources by scheduling a recovery process for failed users (for example, a time-initiated monitor of such users). Neither MPIF nor TPF is able to detect a failure for this type of MPIF user.
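A time-initiated monitor of the kind mentioned above might be organized as follows. This is a minimal sketch under stated assumptions, not an MPIF facility: the `ResourceMonitor` class, the heartbeat convention, and the timeout value are all hypothetical. It assumes each cooperating user periodically records that it is alive, and that a timer-driven sweep treats a user as failed once it has been silent for longer than the timeout.

```python
import time
from typing import Dict, List, Optional


class ResourceMonitor:
    """Toy time-initiated monitor for MPIF users (hypothetical, not an MPIF API)."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: Dict[str, float] = {}
        self.released: List[str] = []

    def heartbeat(self, user: str, now: Optional[float] = None) -> None:
        """Record that a user is still alive."""
        self.last_seen[user] = time.monotonic() if now is None else now

    def sweep(self, now: Optional[float] = None) -> List[str]:
        """Run on a timer: release resources of users silent for too long."""
        current = time.monotonic() if now is None else now
        stale = [
            user
            for user, seen in self.last_seen.items()
            if current - seen > self.timeout_seconds
        ]
        for user in stale:
            del self.last_seen[user]
            self.release(user)
        return stale

    def release(self, user: str) -> None:
        # Assumption: a real recovery process would take whatever action
        # reclaims the failed user's resources; here we only record the name.
        self.released.append(user)
```

The `now` parameter exists only so the sweep can be exercised with fixed timestamps; in normal use both methods fall back to `time.monotonic()`. The monitor is needed because, as stated above, neither MPIF nor TPF detects this kind of user failure on its own.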

Path Failures

A path is a critical shared MPIF resource with broad implications to both MPIF and its users.

The status of a path depends on the proper operation of the using processes (for example, send and receive), the correct functioning of the channel programs, and the correct and continued operation of the channel and hardware devices. A failure in any of these might result in path termination. After termination actions have been completed, a path restart is scheduled to ensure availability of the MPIF transport mechanism.

In addition, the path can be stopped (gracefully or not) with a command. If the STOP PATH command is issued with the quiesce option, path termination is started, but is not complete until all using connections have been given time to quiesce their activity and disconnect. Refer to the STOP PATH command in TPF Operations for a description of this interval.

If the STOP PATH command is issued with the purge option, path termination completes regardless of the status of the connections; some user data might be lost in the process. The user is informed of the abnormal disconnect. In this case, the path is not automatically restarted.
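The difference between failure-driven termination and the two STOP PATH options can be summarized as a small state model. The sketch below is illustrative only: `PathModel`, `PathState`, and the method names are hypothetical, and the quiesce interval is not modeled (termination simply completes when the last connection disconnects). The source does not say whether a restart follows a quiesce stop; the sketch does not schedule one, which is an assumption.

```python
from enum import Enum
from typing import List, Set


class PathState(Enum):
    ACTIVE = "active"
    TERMINATING = "terminating"
    STOPPED = "stopped"


class PathModel:
    """Toy model of path termination (hypothetical, not an MPIF API)."""

    def __init__(self) -> None:
        self.state = PathState.ACTIVE
        self.connections: Set[str] = set()
        self.restart_scheduled = False
        self.abnormal_disconnects: List[str] = []

    def connect(self, user: str) -> None:
        self.connections.add(user)

    def fail(self) -> None:
        """A failure terminates the path, then a restart is scheduled."""
        self._drop_all()
        self.state = PathState.STOPPED
        self.restart_scheduled = True

    def stop(self, option: str) -> None:
        """Model STOP PATH with the quiesce or purge option."""
        if option == "quiesce":
            # Termination starts but waits for connections to disconnect.
            self.state = PathState.TERMINATING
            self._complete_if_idle()
        elif option == "purge":
            # Termination completes regardless of connection status.
            self._drop_all()
            self.state = PathState.STOPPED
            self.restart_scheduled = False  # no automatic restart after purge
        else:
            raise ValueError(f"unknown option: {option}")

    def disconnect(self, user: str) -> None:
        self.connections.discard(user)
        self._complete_if_idle()

    def _complete_if_idle(self) -> None:
        if self.state is PathState.TERMINATING and not self.connections:
            self.state = PathState.STOPPED

    def _drop_all(self) -> None:
        # Remaining users are informed of an abnormal disconnect.
        self.abnormal_disconnects.extend(sorted(self.connections))
        self.connections.clear()
```

A fuller model would also force completion once the quiesce interval expires; the actual interval is described under the STOP PATH command in TPF Operations.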

The following are a set of conditions that might cause path startup problems:

Connection Failures

The connection process is a complex, multistage, asynchronous process involving the coordinated actions of the two connecting users and the two MPIF systems. The process of terminating the connection (disconnection) at the request of one of the users involves similar asynchronous actions by these four parties.

Connection termination can be initiated by a user through a DISCONNECT request or by MPIF. A connection is a critical MPIF resource on which numerous MPIF processes depend. A connection is itself highly dependent on other critical MPIF resources in order to be fully functional; for example, a failure of any of the aforementioned parties, or of the path to which the connection is allocated, causes the connection to terminate. In addition, the failure of any process using the connection (send or receive) can require termination of the connection to ensure proper recovery from the error.
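The dependency relationship described above can be sketched as follows. The `Connection` class and the component labels are hypothetical and chosen only for illustration. The sketch treats the two users, the two MPIF systems, and the path as dependencies whose failure terminates the connection, and it treats termination as a one-time event.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Connection:
    """Toy model of a connection and its dependencies (not an MPIF API)."""

    dependencies: Set[str]
    active: bool = True
    reasons: List[str] = field(default_factory=list)

    def disconnect(self, requester: str) -> None:
        """User-initiated termination through a DISCONNECT request."""
        self._terminate(f"DISCONNECT requested by {requester}")

    def component_failed(self, component: str) -> None:
        """Termination initiated because something the connection depends on failed."""
        if component in self.dependencies:
            self._terminate(f"failure of {component}")

    def _terminate(self, reason: str) -> None:
        # Only the first cause is recorded; later events find the
        # connection already terminated.
        if self.active:
            self.active = False
            self.reasons.append(reason)
```

A connection built with the dependency set `{"user-1", "user-2", "mpif-1", "mpif-2", "path"}` terminates on the first call to `component_failed` that names one of those members, and ignores failures of unrelated components. The multistage, asynchronous nature of real connection and disconnection processing is deliberately left out.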